M2 Course, ENS Lyon: Optimal Decision Making and Online Optimization

Optimal Decision Making
and Online Optimization

Bruno Gaujal
Panayotis Mertikopoulos

Course Description
Learning theory is a catch-all term for automated prediction, decision-making and inference from large datasets that need to be understood and processed quickly – one of the key challenges of the so-called “Big Data” paradigm. In this course, we will provide an introduction to learning theory focusing on the prediction and optimal decision-making aspects. Specifically, we will cover the basics of Markov decision processes, Bellman’s optimality principle, multi-armed bandits and regret minimization (for both continuous and discrete problems).Throughout the course, we will focus on the algorithmic aspects of the theory and we will hold regular implementation tutorials for the course’s algorithmic content.

Course outline

I Dynamic optimization under uncertainty
• Markov decision processes
• Bellman equations
• Value/policy iteration algorithm

II Online learning
• The bandit problem
• Regret minimization
• Upper/lower regret bounds and how to achieve them

III Online convex optimization
• Static convex optimization (review)
• Algorithms
• Limited feedback

basic knowledge of probability theory, linear algebra and analysis over the reals

In addition to homework and in-class exercices, students will be given a research article to analyze and review under the form of a written report and an oral presentation. The final grade will be a function of all these.


1. Sébastien Bubeck and Nicolò Cesa-Bianchi, Regret analysis of stochastic and nonstochastic multi- armed bandit problems, Foundations and Trends in Machine Learning 5.
2. Stephen P. Boyd and Lieven Vandenberghe, Convex optimization, Cambridge University Press, Cambridge, UK, 2004.
3. Nicolò Cesa-Bianchi and Gábor Lugosi, Prediction, learning, and games, Cambridge University Press, 2006.
4. Yurii Nesterov, Introductory lectures on convex optimization: A basic course, Applied Optimization, no.87,Kluwer Academic Publishers, 2004.
5. Shai Shalev-Shwartz,Online learning and online convex optimization,Foundations andTrends in Machine Learning 4.

Comments are closed.