Convergent multiple-timescales reinforcement learning algorithms in normal form games

DS Leslie, EJ Collins

Research output: Contribution to journal › Article (Academic Journal)

54 Citations (Scopus)

Abstract

We consider reinforcement learning algorithms in normal form games. Using two-timescale stochastic approximation, we introduce a model-free algorithm that is asymptotically equivalent to the smooth fictitious play algorithm, in that both result in asymptotic pseudotrajectories to the flow defined by the smooth best response dynamics. Both of these algorithms are shown to converge almost surely to a Nash distribution in two-player zero-sum games and N-player partnership games. However, there are simple games for which these, and most other adaptive processes, fail to converge; in particular, we consider the N-player matching pennies game and Shapley's variant of the rock–scissors–paper game. By extending stochastic approximation results to multiple timescales we can allow each player to learn at a different rate. We show that this extension will converge for two-player zero-sum games and two-player partnership games, as well as for the two special cases we consider.
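To make the abstract's setup concrete, here is a minimal illustrative sketch, not the paper's exact algorithm: each player runs a model-free, smoothed-best-response (logit) learner with its own step-size schedule in two-player matching pennies. The payoff matrix, temperature `tau`, and the step-size exponents are illustrative assumptions; the point is only the structure — each player updates an action-value estimate from its own observed payoff, and the two players learn on different timescales.

```python
import math
import random

def softmax(q, tau):
    """Smoothed best response: logit choice probabilities over action values."""
    m = max(v / tau for v in q)
    exps = [math.exp(v / tau - m) for v in q]
    s = sum(exps)
    return [e / s for e in exps]

def sample(probs):
    """Draw an action index from a probability vector."""
    r, c = random.random(), 0.0
    for i, p in enumerate(probs):
        c += p
        if r < c:
            return i
    return len(probs) - 1

# Matching pennies payoffs for the row player (zero-sum: column player gets the negative).
PAYOFF = [[1.0, -1.0],
          [-1.0, 1.0]]

def run(n_steps=50_000, tau=0.5, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # each player's action-value estimates
    for n in range(1, n_steps + 1):
        # Player-specific step sizes decaying on different timescales
        # (the exponents 0.6 and 0.9 are illustrative choices).
        alphas = (n ** -0.6, n ** -0.9)
        pi = [softmax(q[i], tau) for i in range(2)]
        a = [sample(pi[i]) for i in range(2)]
        r0 = PAYOFF[a[0]][a[1]]
        rewards = (r0, -r0)
        for i in range(2):
            # Model-free update: only the played action's estimate moves,
            # driven by the player's own observed payoff.
            q[i][a[i]] += alphas[i] * (rewards[i] - q[i][a[i]])
    return [softmax(q[i], tau) for i in range(2)]

strategies = run()
```

At the mixed equilibrium of matching pennies both actions earn zero, so the value estimates flatten and the logit strategies drift toward the uniform Nash distribution; the separated step sizes are what the multiple-timescales analysis exploits.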
Translated title of the contribution: Convergent multiple-timescales reinforcement learning algorithms in normal form games
Original language: English
Pages (from-to): 1231-1251
Number of pages: 21
Journal: Annals of Applied Probability
Volume: 13 (4)
DOIs
Publication status: Published - Nov 2003

Bibliographical note

Publisher: Institute of Mathematical Statistics

