Conor J. Newton*, Ayalvadi Ganesh, Henry W.J. Reeve
Research output: Contribution to journal › Article (Academic Journal) › peer-review
We consider a large number of agents collaborating on a multi-armed bandit problem with a large number of arms. The goal is to minimise the regret of each agent in a communication-constrained setting. We present a decentralised algorithm which builds upon and improves the Gossip-Insert-Eliminate method of Chawla et al. (International Conference on Artificial Intelligence and Statistics, pp. 3471–3481, 2020). We provide a theoretical analysis of the regret incurred which shows that our algorithm is asymptotically optimal. In fact, our regret guarantee matches the asymptotically optimal rate achievable in the full communication setting. Finally, we present empirical results which support our conclusions.
| Original language | English |
|---|---|
| Pages (from-to) | 307-325 |
| Journal | Dynamic Games and Applications |
| Volume | 13 |
| Early online date | 20 Jun 2022 |
| DOIs | |
| Publication status | Published - 1 Mar 2023 |