Postdoc: High Performance Deep Reinforcement Learning

  • Requirement: PhD in Computer Science
  • Location: Grenoble or Lille
  • Hosting Teams:
    • Sequel (INRIA Lille): https://team.inria.fr/sequel/
    • DataMove (INRIA Grenoble): https://team.inria.fr/datamove
  • Contact: Bruno.Raffin@inria.fr and Philippe.Preux@inria.fr
  • Period: Flexible starting date (early 2021)
  • Duration: 24 months
  • To apply:  jobs.inria.fr
The goal of reinforcement learning (RL) is to self-learn a task by trying to maximise a reward (a game score, for instance). The learning process interacts with a simulation code to explore the space of possible states. As exhaustive exploration is infeasible because the state space is too large, the key to success is building an efficient exploration strategy that balances exploration (testing new states) against exploitation (replaying actions known to lead to high rewards). Using deep neural networks to encode the decision process has led to significant progress; this is often referred to as Deep Reinforcement Learning (DRL). A classical benchmark where DRL thrives is the suite of Atari games. The most visible success of DRL is probably AlphaGo Zero, which outperformed the best human players (and itself) after being trained without any data from human games, solely through reinforcement learning. The process requires an advanced infrastructure for the training phase: AlphaGo Zero trained for more than 70 hours using 64 GPU workers and 19 CPU parameter servers, playing 4.9 million games of generated self-play and using 1,600 simulations for each Monte Carlo Tree Search.
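The exploration/exploitation balance described above can be illustrated with a minimal, framework-free sketch: an epsilon-greedy agent on a toy multi-armed bandit. All names and parameters here (`epsilon_greedy`, `train_bandit`, the reward values) are illustrative choices, not part of any specific DRL system mentioned in this posting.

```python
import random

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon, explore a random action; otherwise exploit the best-known one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))          # exploration: try a new action
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploitation

def train_bandit(true_rewards, steps=5000, epsilon=0.1, seed=0):
    """Tabular value estimation on a noisy multi-armed bandit."""
    rng = random.Random(seed)
    q = [0.0] * len(true_rewards)    # estimated value of each action
    counts = [0] * len(true_rewards) # how often each action was tried
    for _ in range(steps):
        a = epsilon_greedy(q, epsilon, rng)
        r = true_rewards[a] + rng.gauss(0.0, 0.1)  # noisy reward from the environment
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]             # incremental mean update
    return q

q = train_bandit([0.1, 0.5, 1.0])
```

After training, the estimated values in `q` should single out the action with the highest true reward; DRL replaces the table `q` with a deep neural network, but the same balancing act remains.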
The general workflow is the following. To speed up the learning process and enable a wide but thorough exploration of the parameter space, the learning neural network interacts in parallel with several actor instances, each one consisting of a simulation of the task being learned and a neural network that interacts with this simulation through the best winning strategy it knows. Periodically, the actor neural networks are updated with the weights of the learner network. This workflow has evolved through various research works combining parallelisation, asynchronism, replay buffers and learning strategies (GORILA, A3C, IMPALA, …).
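The actor/learner pattern with periodic weight synchronisation can be sketched as follows. This is a toy, single-process illustration of the data flow only (the `Learner` and `Actor` classes and their "gradient step" are placeholders, not the API of GORILA, A3C or IMPALA); real systems run actors as distributed workers.

```python
import copy

class Learner:
    """Holds the master policy weights and improves them from actor experience."""
    def __init__(self):
        self.weights = {"step": 0}

    def update(self, experience_batch):
        # Placeholder for a gradient step: just count consumed transitions.
        self.weights["step"] += len(experience_batch)

class Actor:
    """Runs its own environment with a (possibly stale) local copy of the weights."""
    def __init__(self, learner):
        self.local_weights = copy.deepcopy(learner.weights)

    def rollout(self, n=4):
        # Placeholder for interacting with a simulation: collect n transitions.
        return [("state", "action", "reward")] * n

    def sync(self, learner):
        # Periodic refresh of the actor's policy from the learner.
        self.local_weights = copy.deepcopy(learner.weights)

learner = Learner()
actors = [Actor(learner) for _ in range(3)]
for it in range(10):
    for actor in actors:
        learner.update(actor.rollout())  # actors feed experience to the learner
    if it % 5 == 4:
        for actor in actors:
            actor.sync(learner)          # periodic weight broadcast to the actors
```

Between syncs the actors act on stale weights; how much staleness and asynchronism a learning rule tolerates is precisely one of the large-scale questions this postdoc targets.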
Recent developments have shown that massive parallelism is a key enabler for addressing more complex problems. The RLlib framework is designed to automatically distribute RL environments at scale, and Google/DeepMind's recent announcement of the Menger framework goes in the same direction.
The goal of this postdoc is to investigate novel training strategies, relying on massive parallelism, to learn more complex tasks more rapidly (multiple heterogeneous tasks at once, non-deterministic games, simulations of complex industrial or living systems). This postdoc is very flexible in the directions it can take: we expect the candidate to bring their own experience and views on these topics. The focus can include (but is not limited to):
  1. Learning novel problems, typically taken from traditional scientific domains like physics or biology, where mature, often large-scale simulation codes already exist;
  2. Developing novel learning rules specifically designed for large scale, where relaxed synchronisation requirements are critical;
  3. Addressing middleware and system issues in deploying and running very large scale DRL;
  4. Developing novel parallelisation algorithms for some of the DRL components (replay buffer, model/data-parallel training);
  5. Applying DRL as an adaptive strategy for smart parametric search-space exploration in ensemble-run scenarios such as data assimilation, hyperparameter search, and uncertainty quantification.
This work will be performed in close collaboration between the SequeL INRIA team, specialised in DRL (https://team.inria.fr/sequel/), and the DataMove team, specialised in HPC (https://team.inria.fr/datamove). DataMove and SequeL are involved in an INRIA group focused on the convergence between HPC, AI and Big Data (https://project.inria.fr/hpcbigdata/); the candidate will participate in that group too.
The SequeL team is a leading research group on reinforcement learning, deep or not, ranging from theoretical aspects to applications. For instance, SequeL organised the international Summer School on RL in 2019 (https://rlss.inria.fr). Among other projects, SequeL has collaborated with Mila (Montréal) to design and develop the GuessWhat?! experiment (https://guesswhat.ai/). As early as 2006, SequeL worked on the game of Go and designed the first Go program (Crazy Stone) able to challenge a human expert player (https://www.remi-coulom.fr/CrazyStone/).
DataMove has long experience in high performance computing and data analytics, and has been using machine learning within an HPC context for some time now (https://hal.archives-ouvertes.fr/hal-01221186). DataMove is also developing Melissa (https://melissa-sa.github.io/), a solution to manage large ensembles of parallel simulations and aggregate their data on-line in a parallel server. Melissa stands out by its flexibility, efficiency and resilience; it has enabled running tens of thousands of simulations on up to 30,000 cores, and has been used for computing statistics and training deep surrogate models. We expect it to be a sound base for a DRL workflow.

We are looking for a candidate with a PhD in deep learning, reinforcement learning or high performance computing (a combination of these areas of expertise would be ideal) for a 24-month contract at INRIA. The candidate will have the possibility to join either the SequeL team in Lille or the DataMove team in Grenoble.

The postdoc will have access to large supercomputers equipped with multiple GPUs for experiments. We expect this work to lead to international publications backed by advanced software prototypes.

