Presentation

Overview

The topic of our cooperation is the development of proved multicore schedulers. An OS kernel thread scheduler decides which thread runs at a given moment on a given core. A thread scheduler is a key OS service since any bad decision that it makes may lead to cores wasting cycles and may increase application response times [3]. Recent work has shown that the property of work conservation (no core remains idle when work is ready to be scheduled) is sometimes violated by Linux’s CFS scheduler [3], as well as by recent versions of FreeBSD’s ULE scheduler [4]. Indeed, developing a production thread scheduler is very difficult. A thread scheduler must operate under stringent performance constraints [5], support concurrency, adapt to complex hardware specificities, such as NUMA, anticipate thread behavior, and address not only scheduling of the threads within cores but also balancing the load across cores. As a result, while schedulers were services of a few hundred lines of code twenty years ago when machines were single core, they have grown into highly complex pieces of software in the era of multicore NUMA machines. As an example, CFS contains more than 20,000 lines of code in Linux 4.13, released in 2017.

Over the last two years, we have explored a novel approach for developing provable multicore schedulers based on the identification of key scheduling abstractions and the realization of these abstractions as a Domain-Specific Language (DSL), Ipanema. While DSLs are typically proposed to facilitate programming in a particular domain [6], this aspect is secondary in our work. Instead, we rely on the fact that our scheduling abstractions have a clean semantics that is enforced by the DSL design, and the DSL compiler. Expressing a scheduling policy in terms of these abstractions, via the Ipanema DSL, then implies that we can assemble the properties guaranteed by the abstractions into proofs of properties of the complete scheduling algorithm. As part of this work, we introduce a concurrency model that relies on execution of scheduling events in mutual execution locally on a core, but that still permits reading the state of other cores without requiring locks. Furthermore, we show that work conservation cannot be ensured in a non-stable system and introduce Weak Work Conservation (WWC), a property that is provable even when threads are created, unblocked, blocked, or terminated during the execution, while remaining sufficiently strong to characterize functionally correct schedulers. We prove WWC in the presence of concurrent thread events while performing load balancing. This is a key feature for allowing a CFS-like WWC scheduler to achieve the same performance as the original CFS scheduler.

So far, our methodology for proving schedulers has been published at HotOS 2017 [1]. We also made a comparison of the CFS and ULE schedulers, to understand their precise behaviors, and the results of this study appeared at Usenix ATC 2018 [2].

Research Topics

In the three next years, we will leverage on our existing results towards the following directions:

  • Better understanding of what should the best scheduler for a given multicore application. This topic is mainly funded by the Oracle donation, which support an engineer for the current year. As part of this topic, we are developing tools for understanding scheduler behavior.
  • Proving the correctness of the C code generated from the DSL policy and of the Ipanema abstract machine. This topic is funded by the ANR VeriAmos (2018-2021). VeriAmos is lead by Xavier Rival (Inria-Antique). The idea here is to apply static analyses designed by Antique so that we can check that the C code that runs within the OS is correct. In terms of systems, the main challenge here is to design an abstract machine that can be proved without loss of efficiency. This will be done mainly by a post-doc to be hired on VeriAmos.
  • Extend the Ipanema DSL to the domain of I/O request scheduling. This topic is founded by the ANR VeriAmos. We will extend the abstractions provided by the Ipanema DSL, which currently targets thread scheduling, so that we can capture the needs of I/O request scheduling.
  • Design of a provable complete concurrent kernel. This is the long term goal of the cooperation. We want to apply the lessons learn on specific services such as scheduling to a complete concurrent kernel.

References

  1. The Battle of the Schedulers: FreeBSD ULE vs. Linux CFS. Justinien Bouron, Sebastien Chevalley, Baptiste Lepers, Willy Zwaenepoel, Redha Gouicem, Julia Lawall, Gilles Muller, Julien Sopena. USENIX ATC 2018
  2. Towards Proving Optimistic Multicore Schedulers. Baptiste Lepers, Willy Zwaenepoel, Jean-Pierre Lozi, Nicolas Palix, Redha Gouicem, Julien Sopena, Julia Lawall, Gilles Muller. HotOS 2017
  3. The Linux scheduler: a decade of wasted cores. Jean-Pierre Lozi, Baptiste Lepers, Justin R. Funston, Fabien Gaud, Vivien Quéma, Alexandra Fedorova, EuroSys 2016
  4. [PATCH] Fix bug in which the long term ULE load balancer is executed only once. https://bugs:freebsd.org/bugzilla/show_bug.cgi?id=223914.
  5. KANEV, S., DARAGO, J. P., HAZELWOOD, K., RANGANATHAN, P., MOSELEY, T., WEI, G.-Y., AND BROOKS, D. Profiling a warehouse-scale computer. In Computer Architecture (ISCA), 2015 ACM/IEEE 42nd Annual International Symposium on (2015), IEEE, pp. 158–169.
  6. MULLER, G., CONSEL, C., MARLET, R., BARRETO, L. P., MERILLON, F., AND REVEILLERE, L. Towards robust OSes for appliances: A new approach based on domain-specific languages. In Proceedings of the 9th workshop on ACM SIGOPS European workshop (2000), ACM, pp. 19–24.

Comments are closed.