Title

On-line optimization in dynamic real-time systems

Keywords

Constrained Markov Decision Processes, On-line Optimisation, Q-learning, Probabilistic Guarantees.

Background

Classical real-time programs consist of recurring tasks that are not intended to terminate but instead loop endlessly, each instance coming with a cyclic deadline. In a sense, such systems can be viewed as static: they do not undergo unexpected changes over time, and their analysis is likewise static and performed off-line.

A modern approach to real-time systems changes this paradigm. The change comes from the theoretical side: new techniques such as weakly-hard constraints [1] or probabilistic constraints [2] have been introduced to break the rigidity of the classical framework. It is also called for by applications: modern embedded systems, such as autonomous cars [3] or drones, must deal with online changes (failures, updates, maintenance, reconfigurations). They must also accommodate the less stringent timing constraints of new applications such as control functions and displays.

Traditionally, real-time tasks are classified as being either hard or soft. For hard real-time tasks, it is imperative that no deadline be missed, while for soft real-time tasks, it is acceptable to miss some of the deadlines: finishing the task still has value, even if it is late. In the latter case, the acceptable deadline misses can be expressed, for instance, as a maximum number within some time interval (e.g., 3 deadline misses every hour).
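Such a bound on misses per window is often written as an (m, k) weakly-hard constraint: at most m deadline misses in any window of k consecutive job instances. As an illustration only (the function and variable names below are ours, not taken from [1]), such a constraint can be checked over a trace of job outcomes with a sliding window:

```python
from collections import deque

def satisfies_weakly_hard(outcomes, m, k):
    """Check the (m, k) weakly-hard constraint: at most m deadline
    misses in every window of (at most) k consecutive job instances.
    `outcomes` is a sequence of booleans, True meaning a missed deadline."""
    window = deque(maxlen=k)   # last k outcomes
    misses = 0                 # misses inside the current window
    for missed in outcomes:
        if len(window) == k and window[0]:
            misses -= 1        # oldest instance is about to leave the window
        window.append(missed)
        if missed:
            misses += 1
        if misses > m:
            return False
    return True
```

For example, `satisfies_weakly_hard([False, True, False, True], 2, 4)` holds (2 misses in a window of 4), while three consecutive misses violate a (2, 3) constraint.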

One way to implement real-time systems with hard and soft real-time tasks is to solve a constrained optimization problem: The deadlines of hard tasks make up the constraints while the optimization is done over the deadlines of the soft tasks (see for example [4]).

Another dynamic aspect of modern real-time systems concerns dynamic allocation under energy or temperature constraints. When a node (server) is overloaded, its temperature may rise to a point where a breakdown becomes possible. At this point, a possible action is to redispatch its load to other nodes.

Yet another context where a dynamic approach will be profitable in real-time systems is that of systems subject to reconfigurations, maintenance, or updates. In this case, the optimization often concerns the optimal time at which to start a maintenance operation or install an update.

Objectives

This PhD proposal fits within this general framework and aims to address two important issues. First, soft guarantees will be replaced by probabilistic constraints. For example, in a stochastic environment, a soft constraint can be of the form $P(E > D) \leq \varepsilon$, where $D$ is the deadline of the task, $E$ is its execution time, and $\varepsilon$ is a given threshold that can be adapted.
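When the distribution of $E$ is unknown in closed form, such a constraint can at least be checked empirically by sampling. A minimal Monte-Carlo sketch (the function name and the uniform execution-time model are illustrative assumptions, not part of the proposal):

```python
import random

def estimate_miss_probability(sample_execution_time, deadline, n=100_000, seed=0):
    """Monte-Carlo estimate of P(E > D): draw n execution times and
    count how often the deadline D is exceeded."""
    rng = random.Random(seed)
    misses = sum(sample_execution_time(rng) > deadline for _ in range(n))
    return misses / n

# Illustrative model: execution time uniform on [1, 3] ms, deadline
# D = 2.5 ms, so the true miss probability is (3 - 2.5) / 2 = 0.25.
p = estimate_miss_probability(lambda rng: rng.uniform(1.0, 3.0), 2.5)
```

Comparing the estimate `p` against the threshold $\varepsilon$ then decides whether the probabilistic soft constraint is (empirically) satisfied.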

Setting such optimization problems in a probabilistic framework will bring analytic tools to bear on dynamic real-time systems. While this approach is not entirely new (see for example [5]), we believe it still has a lot of unexploited potential for the analysis of real-time systems. For instance, we shall seek to establish probabilistic guarantees on the deadlines of the soft tasks (e.g., bounds on the expected tardiness, or more precise results such as large deviations of the tardiness [6]).

The overall framework, encompassing both real-time constraints (for example, hard real-time deadlines) and optimization in a probabilistic setting, can often be modeled as a Constrained Markov Decision Process [7]. This framework will help to design optimal strategies whose degree of randomness depends on the number of constraints [7].
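The need for randomness already shows up in the simplest instance, a one-state CMDP with a single constraint (a constrained bandit): the optimal policy may have to randomize between two actions to make the cost constraint tight. A toy sketch under our own assumed naming (not code from [7]):

```python
def best_randomized_policy(rewards, costs, budget):
    """One-state CMDP with two actions: choose action 0 with probability p
    to maximize expected reward p*r0 + (1-p)*r1 subject to the expected
    cost constraint p*c0 + (1-p)*c1 <= budget.  This 1-D linear program
    is solved by inspecting the endpoints p = 0, 1 and the point where
    the cost constraint is tight."""
    r0, r1 = rewards
    c0, c1 = costs
    candidates = [p for p in (0.0, 1.0)
                  if p * c0 + (1 - p) * c1 <= budget + 1e-12]
    if c0 != c1:
        p_tight = (budget - c1) / (c0 - c1)  # constraint holds with equality
        if 0.0 <= p_tight <= 1.0:
            candidates.append(p_tight)
    if not candidates:
        raise ValueError("infeasible constraint")
    return max(candidates, key=lambda p: p * r0 + (1 - p) * r1)

# Action 0: reward 1, cost 1 (e.g., run at full speed); action 1:
# reward 0, cost 0 (idle).  With a cost budget of 0.5, neither pure
# action is optimal: the best policy runs with probability 0.5.
p_star = best_randomized_policy((1.0, 0.0), (1.0, 0.0), 0.5)
```

With more constraints, the optimal policy may need to randomize in more states, which is the dependence mentioned above [7].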

The second question we wish to address is even more ambitious. We aim to combine machine learning and online optimization for real-time systems. Indeed, certain features and/or parameters of the system are often unknown to the application (e.g., the speed of the processors, the initial offsets of the clocks, the starting times of the tasks, their execution times, …), or are only partially known (e.g., an execution time may only be known to lie within a given interval, possibly with a probability distribution). The learning approach is made possible by the cyclic nature of real-time task executions: the past behavior of each task helps to understand and learn several features of the system. This information can then be exploited to better optimize the scheduling of the tasks in the future.

Therefore, the goal will be to design online optimization algorithms for the scheduling of real-time systems that provide so-called no-regret solutions [8], meaning that the scheduling decisions taken at the beginning of the system's life (when almost nothing is known) have a negligible impact on the average performance over the total lifetime of the system.
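A classical way to obtain such guarantees is the exponential-weights (Hedge) algorithm, whose cumulative regret against the best fixed choice in hindsight grows only as $O(\sqrt{T \log k})$, so the per-round regret vanishes. A self-contained sketch (the loss model for the two candidate scheduling policies is an illustrative assumption):

```python
import math
import random

def hedge(losses, eta):
    """Exponential-weights algorithm: at each round, play the current
    weight distribution over candidate policies, suffer its expected
    loss, then reweight each policy by exp(-eta * its loss).
    Returns the algorithm's cumulative expected loss."""
    k = len(losses[0])
    weights = [1.0] * k
    total = 0.0
    for round_losses in losses:
        z = sum(weights)
        total += sum(w / z * l for w, l in zip(weights, round_losses))
        weights = [w * math.exp(-eta * l)
                   for w, l in zip(weights, round_losses)]
    return total

# Two candidate scheduling policies; policy 0 is better in every round.
rng = random.Random(0)
T = 2000
losses = [[rng.uniform(0.0, 0.3), rng.uniform(0.4, 1.0)] for _ in range(T)]
best = min(sum(l[i] for l in losses) for i in range(2))
regret = hedge(losses, eta=math.sqrt(8 * math.log(2) / T)) - best
# regret stays below sqrt(T * log(2) / 2): sublinear in T, so the
# average per-round regret vanishes as T grows.
```

The early rounds, played in near-total ignorance, cost a bounded amount overall, which is exactly the no-regret property sought above.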

In the Markov decision framework, this approach is known as Q-learning (introduced in [9] and used extensively ever since in control problems). Our proposal here is to perform Q-learning over constrained Markov decision processes (which is, to the best of our knowledge, new) and to combine it with a no-regret criterion, with applications to real-time systems.
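For reference, tabular Q-learning in the unconstrained case fits in a few lines: it learns state-action values from sampled transitions, without knowing the transition model. A toy sketch (the two-state environment below is our own illustrative example, not a real-time workload):

```python
import random

def q_learning(step, n_states, n_actions, episodes=2000, horizon=20,
               alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.
    `step(s, a, rng)` returns the next state and the reward."""
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            a = (rng.randrange(n_actions) if rng.random() < eps
                 else max(range(n_actions), key=lambda x: q[s][x]))
            s2, r = step(s, a, rng)
            # Temporal-difference update toward r + gamma * max_a' Q(s', a')
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q

# Toy two-state chain: action 1 moves toward state 1, where taking
# action 1 again earns reward 1; action 0 stays put for no reward.
def step(s, a, rng):
    if a == 1:
        return 1, (1.0 if s == 1 else 0.0)
    return s, 0.0

q = q_learning(step, n_states=2, n_actions=2)
# The greedy policy extracted from q selects action 1 in both states.
```

The open question raised above is how to carry such updates over to the constrained setting while keeping no-regret guarantees.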

Positioning

The research directions proposed here are original in the context of real-time systems, whose classical objectives (worst-case analysis, static analysis) do not mix well with probabilistic and dynamic models. The role of probabilistic models in real-time systems has grown recently, for example in the work of Cucu-Grosjean and her co-authors [2]. However, that work is mainly concerned with the computation of tail distributions of worst-case execution times (WCET) and does not address online optimization or adaptation to unknown dynamic environments.

Context

The thesis will take place in the INRIA/LIG POLARIS and SPADES teams in Grenoble, France, in collaboration with G-SCOP. The PhD thesis will be funded by PERSYVAL-Lab as part of a project-team involving three research labs in the Grenoble area.

Requirements

A strong background in formal methods and probability is expected; knowledge of embedded systems technology and of Markov Decision Processes would be a plus. Good written and oral communication skills in English are required (French is not necessary).

Candidates are invited to send their application via email to Bruno Gaujal (bruno.gaujal@inria.fr), Jean-Philippe Gayon (jean-philippe.gayon@grenoble-inp.fr), and Alain Girault (alain.girault@inria.fr). Please include a CV, a motivation letter and contact information for 2 referees.

References

[1] Zain A. H. Hammadeh, Sophie Quinton, and Rolf Ernst. Extending typical worst-case analysis using response-time dependencies to bound deadline misses. In 14th International Conference on Embedded Software (EMSOFT), New Delhi, India, October 2014.

[2] S. Altmeyer, L. Cucu-Grosjean, and R. Davis. Static probabilistic timing analysis. Real-Time Systems Journal, pages 77–123, 2015.

[3] Richard Matthaei. Functional systems architecture for an autonomous driving on-road motor vehicle. Presentation at the Institut für Regelungstechnik, Technische Universität Braunschweig, 2016.

[4] N. Navet and B. Gaujal. Traité I2C Systèmes Temps-Réel, volume 2, chapter Ordonnancement temps-réel et minimisation de la consommation d’énergie, chap. 4. Hermès Science, 2006.

[5] N. Navet. Quantile-based performance evaluation on CAN. In 14th International CAN Conference, Paris, 2013.

[6] N. Navet, L. Cucu, and R. Schott. Probabilistic estimation of response times through large deviations. In WiP of the 28th IEEE Real-Time Systems Symposium (RTSS), Tucson, Arizona, USA, 2007.

[7] E. Altman. Constrained Markov Decision Processes. Stochastic Modeling. Chapman & Hall/CRC, 1999.

[8] J. Hannan. Contributions to the Theory of Games, Volume III, volume 39 of ser. Annals of Mathematics Studies, chapter Approximation to Bayes risk in repeated play, pages 97-139. Princeton University Press, M. Dresher, A.W. Tucker, and P. Wolfe, eds. edition, 1957.

[9] C.J.C.H. Watkins. Learning from Delayed Rewards. PhD thesis, Cambridge University, England, 1989.
