In brief, the long-term goal of the PACAP project-team is about performance of computing systems, that is: how fast programs run. We intend to contribute to the ongoing race for exponentially increasing performance and for performance guarantees.
Traditionally, the term “performance” is understood as “how much time is needed to complete execution”. Latency-oriented techniques focus on minimizing the average-case execution time (ACET). We are also interested in other definitions of performance. Throughput-oriented techniques are interested in how many units of computations can be completed per unit of time. This is more relevant on manycores and GPUs where many computing nodes are available, and latency is less critical. Finally, we also study worst-case execution times (WCET). They are extremely important for critical real-time systems where designers must guarantee that deadlines are met, in any situation.
Given the complexity of current systems, simply assessing their performance has become a non-trivial task which we also plan to tackle.
We occasionally consider other metrics related to performance, such as power efficiency, total energy, overall complexity, and real-time response guarantee. Our ultimate goal is to propose solutions that make computing systems more efficient, taking into account current and envisioned applications, compilers, runtimes, operating systems, and microarchitectures. And since increased performance often comes at the expense of another metric, identifying the related trade-offs is of interest to PACAP.
The ALF project-team witnessed the end of the “magically” increasing clock frequency and the introduction of commodity multicore processors. PACAP will likely experience the end of Moore’s law (which states that the number of transistors in a circuit doubles approximately every two years), and the generalization of commodity heterogeneous manycore processors. This impacts how performance is increased and how it can be guaranteed. It is also a time where exogenous parameters should be promoted to first-class citizens:
- the existence of faults, whose impact is becoming increasingly important when the photo-lithography feature size decreases;
- the need for security at all levels of computing systems;
- green computing, or the growing concern of power consumption.
We strive to address performance in a way as transparent as possible for users. For example, instead of proposing any new language, we consider existing applications (written for example in standard C), and we develop compiler optimizations that immediately benefit programmers; we propose microarchitectural features as opposed to changes in processor instruction sets; we analyze and re-optimize binary programs automatically, without any user intervention. The perimeter of research directions of the PACAP project-team derives from the intersection of two axes: on the one hand, our high-level research objectives, derived from the overall panorama of computing systems, on the other hand the existing expertise and background of the team members on key technology.
Improving the ACET of general purpose systems has been the core “business” of CAPS and ALF for two decades. We plan to pursue this line of research, acting at all levels: compilation, dynamic optimizations, and microarchitecture.
The goal is to maximize the performance-to-power ratio. We will leverage the execution model of throughput-oriented architectures (such as GPUs) and extend it towards general purpose systems. To address the memory wall issue, we will consider bandwidth saving techniques, such as cache and memory compression.
Real-Time Systems — WCET
Designers of real-time systems must provide an upper bound of the worst-case execution time of the tasks within their systems. By definition this bound must be safe (i.e. greater than any possible execution time). To be useful, WCET estimates have to be as tight as possible. The process of obtaining a WCET bound consists in analyzing a binary executable, modeling the hardware, and then maximizing an objective function that takes into account all possible flows of execution and their respective execution times. Our research will consider the following directions: 1) better modeling of hardware to either improve tightness, or handle more complex hardware (e.g. multicores); 2) eliminate unfeasible paths from the analysis; 3) consider probabilistic approaches where WCET estimates are provided with a confidence level.
Moore’s law drives the complexity of processor micro-architectures, which impacts all other layers: hypervisors, operating systems, compilers and applications follow similar trends. While a small category of experts is able to comprehend (parts of) the behavior of the system, the vast majority of users are only exposed to — and interested in — the bottom line: how fast their applications are actually running. In the presence of virtual machines and cloud computing, multi-programmed workload add yet another degree of non-determinism to the measure of performance. We plan to research how application performance can be characterized and presented to a final user: behavior of the microarchitecture, relevant metrics, possibly visual rendering. Targeting our own community, we also research techniques appropriate for fast and accurate ways to simulate future architectures, including heterogeneous designs, such as latency/throughput platforms. Once diagnosed, the way bottlenecks are addressed depends on the level of expertise of users. Experts can typically be left with a diagnostic as they probably know better how to fix the issue. Less knowledgeable users must be guided to a better solution. We plan to rely on iterative compilation to generate multiple versions of critical code regions, to be used in various runtime conditions. To avoid the code bloat resulting from multiversioning, we will leverage split-compilation to embed code generation “recipes” to be applied just-in-time, or even at rutime thanks to dynamic binary translation. Finally, we will explore the applicability of auto-tuning, where programmers expose which parameters of their code can be modified to generate alternate versions of the program (for example trading energy consumption for quality of service) and let a global orchestrator make decisions.
Dealing with Faults — Reliability
Semiconductor technology evolution suggests that permanent failure rates will increase dramatically with scaling. While well-known approaches, such as error correcting codes, exist to recover from failures and provide fault-free chips, the exponential growth of the number of faults will make them unaffordable in the future. Consequently, other approaches like fine-grained disabling and reconfiguration of hardware elements (e.g. individual functional units or cache blocks) will become economically necessary. This fine-grained disabling will degrade performance compared to a fault-free execution. This evolution impacts performance (both ACET and WCET). We plan to address this evolution, and propose new techniques, which can be developed at any level. For example, at microarchitecture level, one might consider designing part of a cache in an older technology to guarantee a minimum level of performance; at compile-time, one might generate redundant code for critical sections; at run-time, one can monitor faults and apply corrective measures to the software, or hardware. Solutions involving multiple levels are also very promising.
Dealing with Attacks — Security
Computer systems are under constant attack, from young hackers trying to show their skills, to “professional” criminals stealing credit card information, and even government agencies with virtually unlimited resources. A vast amount of techniques have been proposed in the literature to circumvent attacks. Many of them cause significant slowdowns due to additional checks and countermeasures. Thanks to our expertise in microarchitecture and compilation techniques, we will be able to significantly improve efficiency, robustness and coverage of security mechanism, as well as to partner with field experts to design innovative solutions.
Power consumption has become a major concern of computing systems, at all form factors, ranging from energy-scavenging sensors for IoT, to battery powered embedded systems and laptops, and up to supercomputers operating in the tens of megawatts. Execution time and energy are often related optimization goals. Optimizing for performance under a given power cap, however, introduces new challenges. It also turns out that technologists introduce new solutions (e.g. magnetic RAM) which, in turn, result in new trade-offs and optimization opportunities.
International and industrial relations
PACAP is involved in the following collaborations. For more details, please check the publications section of the web site.
- CONTINUUM: Design Continuum for Next Generation Energy-Efficient Compute Nodes – ANR project, Oct 2015 to Apr 2019.
- SECODE: Secure Codes to thwart Cyber-physical Attacks, ANR CHIST-ERA, Jan 2016 to Dec 2018.
- ANTAREX: AutoTuning and Adaptivity appRoach for Energy efficient eXascale HPC systems, H2020 FET HPC, Sep 2015 to Aug 2018.
- W-SEPT: “WCET: SEmantics, Precision and Traceability” – ANR project.
- Capacites: Calcul Parallèle pour Applications Critiques en Temps et Sûreté – Parallel computations for safty critical real-time applications, projet “Investissement d’avenir”, Oct 2014 to Feb 2018.
- Large scale multicore virtualization for performance scaling and portability – Inria Project Lab.
- Nano2017 – PSAIC (Performance and Size Auto-tuning through Iterative Compilation) – Programme de recherche & développement coopératif – Inria / STMicroelectronics
- ARGO (WCET-Aware Parallelization of Model-Based Applications for Heterogeneous Parallel Systems) H2020 project.
We are proud participants of: