Presentation

Overview

The RELIANT project studies sequential decision making from the standpoint of reinforcement learning (RL) and multi-armed bandit (MAB) theory. In short, a bandit algorithm adaptively collects samples from a pool of actions (arms) that yield random rewards. In RL, often modeled by a Markov Decision Process (MDP), there is also an underlying state that is impacted by the chosen actions. Different objectives can be considered, often related to maximizing rewards (equivalently, minimizing regret) or to learning a good policy. Building on over a decade of leading expertise in advancing MAB and RL theory, our two teams have also developed interactions with practitioners (e.g., in healthcare) in recent projects, in the quest to bring modern bandit theory to real societal applications.
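As a minimal illustration of the bandit setting described above (not part of the project itself), the classical UCB1 algorithm adaptively balances exploration and exploitation by pulling the arm with the highest optimistic reward estimate. The Bernoulli arm means and horizon below are arbitrary choices for the sketch:

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on simulated Bernoulli arms; return total reward and pull counts."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # pull each arm once to initialize
        else:
            # optimism in the face of uncertainty:
            # empirical mean plus an exploration bonus shrinking with the pull count
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                                    + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts

total, counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
# over time, the best arm (mean 0.8) receives the bulk of the pulls,
# which is exactly the "adaptive sampling" behavior mentioned above
```

The logarithmic exploration bonus guarantees that suboptimal arms are pulled only O(log T) times, which is the kind of regret guarantee the project aims to extend to the harder settings listed below.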
This quest for real-world reinforcement learning, rather than work in simulated, toy environments, is today's main grand challenge of the field and hinders applications to society and industry (see, e.g., the RWRL workshop \url{https://sites.google.com/view/neurips2020rwrl}). MABs are acknowledged to be the most applicable building block of RL, largely thanks to several successes in online content optimization. However, as experts interacting with practitioners from different fields, we have identified a number of key bottlenecks on which joining our efforts is expected to significantly impact the applicability of MABs to the real world. These are related to the typically small sample sizes that arise in medical applications, the complicated types of reward distributions that arise, e.g., in applications to agriculture, the numerous constraints (such as fairness) that should be taken into account to speed up learning, and possible non-stationary aspects.

Research directions

  • Optimality under practical constraints: fairness, ethics, and structures.
  • Dealing with small sample sizes.
  • Relaxing typical assumptions on the rewards.
  • Facing non-stationarity, exploiting recurrence.
