Abstract
We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. We present an algorithm which improves upon the Gossip Insert-Eliminate method of Chawla et al. (2020). We provide a regret bound which shows that our algorithm is asymptotically optimal and present empirical results demonstrating lower regret on simulated data.
Original language | English |
---|---|
Publication status | Accepted/In press - 28 May 2021 |
Event | Reinforcement Learning in Networks and Queues, Sigmetrics 2021 - Duration: 14 Jun 2021 → … |
Workshop
Workshop | Reinforcement Learning in Networks and Queues, Sigmetrics 2021 |
---|---|
Abbreviated title | RLNQ |
Period | 14/06/21 → … |