Speaker: Diego Goldsztajn, postdoctoral researcher in the NEO team, working with Konstantin Avrachenkov
Title: Asymptotically optimal policies for weakly coupled Markov decision processes
Date and Time: 12 November 2024, 3 p.m., Inria (Lagrange Gris), Sophia Antipolis
Abstract: We will consider the problem of maximizing the expected average reward obtained over an infinite time horizon by 𝑛 weakly coupled Markov decision processes; this setup is a generalization of the multi-armed restless bandit problem that allows for multiple actions and constraints. We will establish a connection with a deterministic and continuous-variable control problem where the objective is to maximize the average reward derived from an occupancy measure that represents the empirical distribution of the states of the processes in the limit as 𝑛 → ∞. We will prove that a solution of this fluid control problem can be used to construct simple policies for the weakly coupled processes that achieve the maximum expected average reward as 𝑛 → ∞, and we will give sufficient conditions for the existence of such solutions. Under certain assumptions on the constraints, we will prove that these conditions are automatically satisfied if the unconstrained single-process problem admits a suitable unichain and aperiodic policy. In particular, these assumptions hold for multi-armed restless bandits and a broad class of problems with multiple actions and inequality constraints. Moreover, we will show that the policies can be constructed in an explicit way in these cases.
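To illustrate the kind of relaxation the abstract refers to: in the restless-bandit special case, the fluid control problem for a single representative process reduces to a linear program over occupancy measures, with a normalization constraint, flow-balance constraints, and an activation budget. The following is a minimal sketch on a toy two-state, two-action instance (the matrices `P`, rewards `r`, and budget `alpha` are hypothetical, not from the talk), assuming SciPy is available; it is not the construction presented in the seminar, only a standard LP of this type.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy instance: 2 states, 2 actions (0 = passive, 1 = active).
S, A = 2, 2
# P[a, s, s2]: probability of moving from state s to s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.4, 0.6]],   # passive dynamics
    [[0.2, 0.8], [0.1, 0.9]],   # active dynamics
])
r = np.array([[0.0, 1.0], [0.0, 2.0]])  # r[s, a]: reward in state s under action a
alpha = 0.4  # budget: long-run fraction of time the active action may be used

# Decision variable y[s, a]: long-run fraction of time spent in state s
# taking action a (the occupancy measure), flattened via index s*A + a.
c = -r.flatten()  # linprog minimizes, so negate the reward

# Equalities: normalization sum y = 1, and flow balance
# sum_a y(s, a) = sum_{s', a'} P[a', s', s] y(s', a') for every state s.
A_eq = np.zeros((1 + S, S * A))
b_eq = np.zeros(1 + S)
A_eq[0, :] = 1.0
b_eq[0] = 1.0
for s in range(S):
    for a in range(A):
        A_eq[1 + s, s * A + a] += 1.0          # outflow from s
        for s2 in range(S):
            A_eq[1 + s, s2 * A + a] -= P[a, s2, s]  # inflow into s

# Inequality: activation budget sum_s y(s, 1) <= alpha.
A_ub = np.zeros((1, S * A))
A_ub[0, 1::A] = 1.0
b_ub = np.array([alpha])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
y = res.x.reshape(S, A)
print("optimal average reward:", -res.fun)
print("occupancy measure:\n", y)
```

An optimal occupancy measure of such an LP is what the talk's fluid solutions generalize; the abstract's contribution concerns turning such solutions into policies for the 𝑛 coupled processes that are asymptotically optimal as 𝑛 → ∞.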
Video: Available soon!
Slides: