In “Generalized Thompson sampling for sequential decision-making and causal inference”, Ortega and Braun (2013) give a short history of Thompson sampling and report on the relationship between intelligent agents, evolutionary game theory, Bayesian inference, KL divergence, and Thompson sampling. They develop a generalization of Thompson sampling based on a Bayesian prior over a distribution of environments. From the abstract:

“Recently, it has been shown how sampling actions from the predictive distribution over the optimal action (sometimes called Thompson sampling) can be applied to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution can then be constructed by a Bayesian superposition of the optimal policies weighted by their posterior probability that is updated by Bayesian inference and causal calculus. Here we discuss three important features of this approach. First, we discuss in how far such Thompson sampling can be regarded as a natural consequence of the Bayesian modeling of policy uncertainty. Second, we show how Thompson sampling can be used to study interactions between multiple adaptive agents, thus opening up an avenue of game-theoretic analysis. Third, we show how Thompson sampling can be applied to infer causal relationships when interacting with an environment in a sequential fashion. In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.”
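Since the abstract compresses a lot, a concrete toy version may help. Below is a minimal sketch of classic Thompson sampling for a Bernoulli bandit in Python; the three arms, their hidden success rates, and the Beta(1, 1) priors are my own illustrative assumptions, not details from Ortega and Braun, whose generalization works with a posterior over whole environments.

    # Minimal sketch of Thompson sampling for a Bernoulli bandit.
    # The arms, their hidden success rates, and the Beta(1, 1)
    # priors are illustrative assumptions, not from the paper.
    import random

    true_rates = [0.04, 0.05, 0.07]   # unknown to the algorithm
    alpha = [1.0, 1.0, 1.0]           # Beta posterior: successes + 1
    beta = [1.0, 1.0, 1.0]            # Beta posterior: failures + 1

    for t in range(10000):
        # Draw one plausible success rate per arm from its posterior
        # and act greedily on the draw.  Averaged over draws, each arm
        # is played with the posterior probability that it is optimal:
        # the “Bayesian superposition” of policies from the abstract.
        draws = [random.betavariate(a, b) for a, b in zip(alpha, beta)]
        arm = draws.index(max(draws))
        reward = 1 if random.random() < true_rates[arm] else 0
        alpha[arm] += reward          # standard Beta-Bernoulli update
        beta[arm] += 1 - reward

    # Posterior mean rate per arm; the best arm dominates the plays.
    print([round(a / (a + b), 3) for a, b in zip(alpha, beta)])

In the paper’s generalized setting, the Beta posteriors over arm parameters are replaced by a posterior over environments, each of which comes with a known optimal policy; sampling an environment and executing its optimal policy plays the same role as the sample-then-act step above.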
Andrew Gelman at Statistical Modeling, Causal Inference, and Social Science pointed me towards the paper “Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained” by Ron Kohavi, Alex Deng, Brian Frasca, Roger Longbotham, Toby Walker, and Ya Xu, all of whom seem to be affiliated with Microsoft. The paper itself recounted five online statistical experiments, mostly done at Microsoft, that had informative counter-intuitive results: “…we often joke that our job, as the team that builds the experimentation platform, is to tell our clients that their new baby is ugly, …”

– Short-term effects may be diametrically opposed to long-term effects. Specifically, a high number of clicks or queries per session could be indicative of a bug rather than success, so it’s important to choose the right metric. The authors ended up focusing on “sessions per user” as a metric, as opposed to “queries per month”, partly due to a bug which increased (in the short term) queries and revenues while degrading the user’s experience. (A toy numeric sketch of this effect follows the list.)
– Initial results are strongly affected by “Primacy and Novelty”. In the beginning, experienced users may click on a new option just because it is new, not because it’s good. On the other hand, experienced users may be initially slowed by a new format even if the new format is “better”.
– If reality is constantly changing, the experiment’s length may not improve its accuracy. The underlying behavior of the users may change every month, and a short-term experiment may only capture a short-term behavior. Rather than running the experiment for years, the best option may be to run several short-term experiments and adapt the website to the changing behavior as soon as the new behavior is observed.
– If the same user is presented with the same experiment repeatedly, her reaction to the experiment is a function of the number of times she has been exposed to it. This effect must be considered when interpreting experimental results.
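As a toy illustration of the first point (the numbers and the imagined ranking bug are mine, not data from Kohavi et al.): suppose a bug degrades result quality, so frustrated users issue more queries per session in the short term, while some of them gradually return less often. Raw query counts then rise even though sessions per user, the metric the authors favor, falls.

    # Hypothetical numbers (not from the paper): a quality-degrading
    # bug inflates queries per session while sessions per user drops.
    def totals(users, sessions_per_user, queries_per_session):
        sessions = users * sessions_per_user
        queries = sessions * queries_per_session
        return queries, sessions

    before = totals(users=1000, sessions_per_user=4.0, queries_per_session=2.0)
    after = totals(users=1000, sessions_per_user=3.5, queries_per_session=3.0)

    print("queries: %.0f -> %.0f" % (before[0], after[0]))   # 8000 -> 10500, a fake “win”
    print("sessions: %.0f -> %.0f" % (before[1], after[1]))  # 4000 -> 3500, the real loss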
The paper is easy to read, well written, and rather informative. It is especially good for web analytics and for anyone new to experimental statistics.