Convergent learning algorithms for unknown reward games

Archie C. Chapman, David S. Leslie, Alex Rogers, Nicholas R. Jennings

Research output: Contribution to journalArticle (Academic Journal)peer-review

36 Citations (Scopus)
488 Downloads (Pure)


In this paper, we address the problem of convergence to Nash equilibria in games with rewards that are initially unknown and must be estimated over time from noisy observations. These games arise in many real-world applications, whenever rewards for actions cannot be prespecified and must be learned online, but standard results in game theory do not consider such settings. For this problem, we derive a multiagent version of Q-learning to estimate the reward functions using novel forms of the e-greedy learning policy. Using these Q-learning schemes to estimate reward functions, we then provide conditions guaranteeing the convergence of adaptive play and the better-reply processes to Nash equilibria in potential games and games with more general forms of acyclicity, and of regret matching to the set of correlated equilibria in generic games. A secondary result is that we prove the strong ergodicity of stochastic adaptive play and stochastic better-reply processes in the case of vanishing perturbations. Finally, we illustrate the efficacy of the algorithms in a set of randomly generated three-player coordination games and show the practical necessity of our results by demonstrating that violations to the derived learning parameter conditions can cause the algorithms to fail to converge.
Original languageEnglish
Pages (from-to)3154-3180
Number of pages27
JournalSIAM Journal on Control and Optimization
Issue number4
Publication statusPublished - 6 Aug 2013


  • potential games
  • distributed optimization
  • reinforcement learning
  • strong ergodicity


Dive into the research topics of 'Convergent learning algorithms for unknown reward games'. Together they form a unique fingerprint.

Cite this