The information revolution is spawning systems that require very frequent decisions and provide high volumes of data concerning past outcomes. Fueling the design of algorithms used in such systems is a vibrant research area at the intersection of sequential decision-making and machine learning that addresses how to balance exploration and exploitation and learn over time to make increasingly effective decisions.
This talk will consider a broad family of such problems, which contains the classical multi-armed bandit problem as a special case, but also captures more complicated settings where samples of one action can inform the decision-maker's assessment of other actions. I'll describe the rising importance of this problem class, and then discuss two recent methodological advances. One advance is Thompson sampling, a simple and tractable approach that is provably efficient for many relevant problems. The other is information-directed sampling, a new algorithm we propose that is inspired by an information-theoretic perspective and can offer greatly superior statistical efficiency. We provide new insight into both algorithms and establish general theoretical guarantees.
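To make the first advance concrete, here is a minimal sketch of Thompson sampling for the classical Bernoulli multi-armed bandit, the special case mentioned above. This toy implementation (the function name, arm probabilities, and Beta-Bernoulli conjugate model are illustrative choices, not taken from the talk) maintains a Beta posterior per arm, samples a mean-reward estimate from each posterior, and plays the arm with the highest sample:

```python
import random

def thompson_sampling(true_probs, horizon=10000, seed=0):
    """Beta-Bernoulli Thompson sampling on a toy multi-armed bandit.

    true_probs: unknown success probability of each arm (used only to
    simulate rewards). Returns total reward and per-arm success/failure
    counts, which parameterize the Beta posteriors.
    """
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [0] * k  # posterior for arm a is Beta(successes[a]+1, failures[a]+1)
    failures = [0] * k
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean reward for each arm from its posterior...
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(k)]
        # ...and exploit the arm whose sampled mean is largest.
        arm = max(range(k), key=lambda a: samples[a])
        reward = 1 if rng.random() < true_probs[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total_reward += reward
    return total_reward, successes, failures
```

Randomizing according to the posterior is what makes the approach both simple and tractable: arms that might be best are played often enough to resolve the uncertainty, and clearly inferior arms are abandoned automatically as the posteriors concentrate.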
This talk is based on joint work with Benjamin Van Roy.