Asymptotic Optimality for Decentralised Bandits

Research output: Contribution to conference, Conference Paper, peer-review

Abstract

We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. We present an algorithm that improves upon the Gossip Insert-Eliminate method of Chawla et al. (2020). We provide a regret bound showing that our algorithm is asymptotically optimal, and present empirical results demonstrating lower regret on simulated data.

Original language: English
Publication status: Accepted/In press, 28 May 2021
Event: Reinforcement Learning in Networks and Queues, Sigmetrics 2021
Duration: 14 Jun 2021 → …

Workshop

Workshop: Reinforcement Learning in Networks and Queues, Sigmetrics 2021
Abbreviated title: RLNQ
Period: 14/06/21 → …
