The k-Nearest Neighbour UCB Algorithm for Multi-Armed Bandits with Covariates

Henry Reeve, Joe Mellor, Gavin Brown

Research output: Contribution to journalArticle (Academic Journal)peer-review

14 Citations (Scopus)
103 Downloads (Pure)


In this paper we propose and explore the k-Nearest Neighbour UCB algorithm for multiarmed bandits with covariates. We focus on a setting where covariates are supported on a subspace of low intrinsic dimension, such as a manifold within a high dimensional ambient feature space. Unlike previous methods, such as the UCBogram and Adaptively Binned Successive Elimination, the k-Nearest Neighbour UCB algorithm does not require prior knowledge of either the time horizon or the intrinsic dimension. We prove a regret bound for the k-Nearest Neighbour UCB algorithm which is minimax optimal up to logarithmicfactors. In particular, the algorithm automatically takes advantage of both low intrinsic dimensionality of the marginal distribution over the covariates and low noise in the data, expressed as a margin condition. In addition, focusing on the case of bounded rewards, we give corresponding regret bounds for the k-Nearest Neighbour KL-UCB algorithm, which is an analogue of the KL-UCB algorithm adapted to the setting of multi-armed bandits with covariates. Finally, we present empirical results which demonstrate the ability of both the k-Nearest Neighbour UCB and k-Nearest Neighbour KL-UCB to take advantage of situations where the data is supported on an unknown sub-manifold of a high-dimensionalfeature space.
Original languageEnglish
Number of pages28
JournalProceedings of Machine Learning Research
Publication statusPublished - 9 Apr 2018
EventAlgorithmic Learning Theory 2018 - Lanzarote, Spain
Duration: 7 Apr 20189 Apr 2018


Dive into the research topics of 'The k-Nearest Neighbour UCB Algorithm for Multi-Armed Bandits with Covariates'. Together they form a unique fingerprint.

Cite this