January 11, 2017
When: 12:00pm (noon) - 1:00pm
It is known that the optimal policy for Markovian multi-armed bandit problems is the Gittins index policy. For bandit problems with Bayesian learning (Bayesian bandits), however, computing the Gittins index is intractable. In this paper, we introduce the concept of an approximate learning trajectory as a new approach to approximating the dynamics of future learning, motivated by the Bernstein-von Mises Theorem. Using this approximation, we show how the expected cost-to-go terms in the dynamic programming equations for Bayesian bandits can be easily obtained, which allows for efficient computation of an approximate Gittins index. We prove that the approximate Gittins index is asymptotically optimal in that it approaches the true Gittins index in certain limiting signal and prior standard deviation regimes. Moreover, we discuss how easy computation of the approximate Gittins index allows us to decompose the Gittins index into tractable terms that characterize how the value of exploration and exploitation depends on primitives of the problem such as upside variability, signal quality, and effective time-horizon.
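For intuition on why computing the Gittins index is expensive, the following sketch computes it numerically for the simplest tractable case, a Bernoulli arm with a Beta posterior, using the classical retirement (calibration) formulation rather than the paper's approximate-learning-trajectory method: the index is the retirement reward at which one is indifferent between stopping and continuing to sample the arm. The truncation horizon and all parameter names here are illustrative assumptions, not from the talk.

```python
def gittins_index(a, b, beta=0.9, horizon=50, tol=1e-4):
    """Approximate Gittins index of a Bernoulli arm with Beta(a, b) posterior,
    via the retirement formulation: binary-search the retirement reward lam
    at which stopping and continuing have equal value. The optimal-stopping
    value is computed by backward induction on a truncated horizon (a
    finite-horizon surrogate for the intractable infinite-horizon problem)."""
    def value(lam):
        retire = lam / (1.0 - beta)  # value of retiring forever on lam
        # Terminal approximation: after `horizon` pulls with s successes,
        # posterior mean is (a+s)/(a+b+horizon); play forever on it or retire.
        V = [max(retire, (a + s) / (a + b + horizon) / (1.0 - beta))
             for s in range(horizon + 1)]
        # Backward induction: at depth t with s successes, continuing yields
        # a Bernoulli(p) reward and moves the posterior up or down.
        for t in range(horizon - 1, -1, -1):
            newV = []
            for s in range(t + 1):
                p = (a + s) / (a + b + t)
                cont = p * (1.0 + beta * V[s + 1]) + (1.0 - p) * beta * V[s]
                newV.append(max(retire, cont))
            V = newV
        return V[0]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if value(lam) > lam / (1.0 - beta) + 1e-12:
            lo = lam  # continuing still strictly better: index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```

Even in this one-dimensional conjugate case, each index evaluation requires a full dynamic program inside a root-finding loop; with richer (e.g. Gaussian, non-conjugate) learning models the state space grows and this becomes intractable, which is the gap the approximate Gittins index in the talk addresses.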
Snacks and refreshments will be provided. Hope to see you there!