KernelUCB (Kernel Upper Confidence Bound)
This is a kernelised version of contextual linear bandit algorithms. The goal is to maximise rewards over a large but finite set of actions described by contexts. It is useful when the number of actions is too large to sample each of them even once. However, it assumes that we have cheap access to the similarities between the actions’ contexts and that the expected reward is an arbitrary linear function of the contexts’ images in the associated reproducing kernel Hilbert space (RKHS).
KernelUCB takes as input a kernel that relates the contexts, the number of actions N, and the number of pulls T. At each time t, the algorithm selects an action given the previously observed contexts, actions, and rewards, so as to minimise the contextual cumulative regret.
[Finite-Time Analysis of Kernelised Contextual Bandits, Michal Valko, Nathan Korda, Rémi Munos, Ilias Flaounas, Nello Cristianini, UAI 2013]
Software: KernelUCB (Matlab)