• Dynamic vectorization for heterogeneous multi-core processors with single instruction set
  • ANR JCJC project
  • April 2020 to October 2023
  • Most of today’s computer systems have CPU cores and GPU cores on the same chip. Though both are general-purpose, CPUs and GPUs still have fundamentally different software stacks and programming models, starting from the instruction set architecture. Indeed, GPUs rely on static vectorization of parallel applications, which demands vector instruction sets instead of CPU scalar instruction sets. We advocate a disruptive change in both CPU and GPU architecture by introducing Dynamic Vectorization at the hardware level. Dynamic Vectorization will combine the efficiency of GPUs with the programmability and compatibility of CPUs by bringing them together into heterogeneous general-purpose multi-cores. It will enable processor architectures of the next decades to provide (1) high performance on sequential program sections thanks to latency-optimized cores, (2) energy efficiency on parallel sections thanks to throughput-optimized cores, and (3) programmability, binary compatibility and portability.

NOPE: Normally-Off Platforms for Embedded Systems

  • Exploratory Action of the Cominlabs LabEx
  • Jan-Dec 2019
  • Partners: IETR and LS2N
  • In normally-off computing, most components of the system are turned off when no processing is running, and an instant-on function is provided so that processing starts as soon as power is on. Keeping most components powered off drastically reduces energy consumption. Instant-on makes it possible to apply this policy without impairing performance too much. Normally-off computing is made possible by the advent of new non-volatile memory (NVM) technologies based on spintronics. NOPE focuses on normally-off computing for small embedded systems. These systems are typically built using an ultra-low-power System-on-Chip (SoC) and an embedded operating system (OS) tailored to their specific needs (limited memory and processing power, limited memory hierarchy, no hardware support for virtual memory, real-time scheduling). In this domain, pushing energy efficiency further makes it possible to envision transiently powered systems that work without any battery, using energy harvested from the environment. In such systems, designers must consider that computations are made with unreliable power sources. This paradigm, known as transient computing, calls for a systematic and automated solution to accommodate power losses. NVM technologies are now sufficiently advanced for components to be commercialized, and even integrated in off-the-shelf ultra-low-power SoCs alongside a conventional memory hierarchy. Questions remain open concerning more complex integration schemes in the architecture of SoCs, and work is still ongoing. Moreover, very few studies address the impact of this integration on platform software, or the new opportunities that could be exploited at this level.
By combining skills in compilation, RTOS design, and hardware architecture, we investigate how a holistic approach encompassing both hardware and software points of view could be used to fully exploit normally-off/instant-on capabilities and push forward energy efficiency in autonomous embedded systems.

ARMOUR: dynAmic binaRy optiMizatiOn cyber-secURity

  • 2018 – 2020
  • ARMOUR aims at improving the security of computing systems at the software level. Our contribution will be twofold: (1) identify vulnerabilities in existing software, and (2) develop adaptive countermeasure mechanisms against attacks. We will rely on dynamic binary rewriting (DBR), which consists of observing a program and modifying its binary representation in memory while it runs. DBR does not require the source code of the programs it manipulates, making it convenient for commercial and legacy applications. We will study the feasibility of an adaptive security agent that monitors target applications and deploys (or removes) countermeasures based on dynamic conditions. Lightweight monitoring is appropriate when the threat level is low; heavy countermeasures will be dynamically woven into the code when an attack is detected. Vulnerability analysis will be based on advanced fuzzing. DBR makes it possible to monitor and modify deeply embedded variables, inaccessible to traditional monitoring systems, and also to detect unexpected or suspicious values taken by variables and act before the application crashes.

Hybrid SIMD architectures

  • 2018 – 2019
  • The project objective is to define new parallel computer architectures that offer high parallel performance on high-regularity workloads while keeping the flexibility to run more irregular parallel workloads, inspired by both GPU and SIMD or vector architectures.

ZEP: ZEro Power computing systems

  • Inria Project Lab
  • Partners: Inria teams PACAP, CAIRN, SOCRATE, CORSE, as well as CEA LETI & LIST
  • 2017 – 2020
  • ZEP addresses the issue of designing tiny wireless, batteryless computing objects that harvest energy from the environment. Because the harvested energy level is very low, very frequent energy shortages are expected. To maintain a consistent state, the new system will be based on a new architecture embedding non-volatile RAM (NVRAM). Software mechanisms will be designed to benefit from the hardware innovations related to energy harvesting and NVRAM. On the one hand, a compilation pass will compute a worst-case energy consumption. On the other hand, dedicated runtime mechanisms will allow:
    1. to manage efficiently and correctly the NVRAM-based hardware architecture;
    2. to use energy intelligently, by computing the worst-case energy consumption.

    The main application target is the Internet of Things (IoT).
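
    As an illustration only, the interplay between a compile-time worst-case energy bound and a runtime decision can be sketched as follows. The task names, energy figures, and the `plan` function are hypothetical and are not taken from the ZEP project; the sketch simply assumes that each task carries a worst-case energy bound and that saving state to NVRAM has a fixed cost.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wcec_uj: float  # assumed worst-case energy bound in microjoules


def plan(tasks, stored_uj, checkpoint_cost_uj):
    """Return the actions taken with the currently stored energy.

    A task is started only if the stored energy covers its worst-case
    bound plus the cost of a later checkpoint; otherwise the state is
    saved to NVRAM and execution stops until more energy is harvested.
    """
    actions = []
    for task in tasks:
        if stored_uj >= task.wcec_uj + checkpoint_cost_uj:
            actions.append(f"run {task.name}")
            stored_uj -= task.wcec_uj  # pessimistic: charge the full bound
        else:
            actions.append("checkpoint")
            break
    return actions


if __name__ == "__main__":
    tasks = [Task("sense", 30), Task("compute", 50), Task("send", 80)]
    print(plan(tasks, stored_uj=100, checkpoint_cost_uj=10))
```

    Charging the full bound after each task is deliberately pessimistic: it guarantees that enough energy always remains for a checkpoint, at the price of sometimes stopping earlier than strictly necessary.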

SECODE: Secure Codes to thwart Cyber-physical Attacks

  • ANR CHIST-ERA Project
  • Jan 2016 to Dec 2019
  • Partners: Télécom ParisTech, Paris 8, Université Catholique de Louvain, Sabancı University
  • In the SECODE project, we specify and design error correction codes suitable for an efficient protection of sensitive information in the context of the Internet of Things (IoT) and connected objects. Such codes mitigate passive attacks, like memory disclosure, and active attacks, like stack smashing. The innovation of this project is to leverage these codes for protecting against both cyber and physical attacks. The main advantage is a 360° coverage of attacks on the connected embedded system, which is considered both as a smart connected device and as a physical device. The outcome of the project is first a method to generate and execute cyber-resilient software, and second a way to protect data and its manipulation from physical threats like side-channel attacks. These results are demonstrated using a smart sensor application with hardened embedded firmware and a tamper-proof hardware platform.


We are proud participants in HiPEAC, the European Network of Excellence on High Performance and Embedded Architecture and Compilation.

Former projects


  • CONTINUUM: Design Continuum for Next Generation Energy-Efficient Compute Nodes
  • ANR Project
  • Oct 2015 to Apr 2019
  • Partners: LIRMM and Cortus SAS
  • The CONTINUUM project aims to address the energy-efficiency challenge in future computing systems by investigating a design continuum for compute nodes, which seamlessly goes from software to technology levels via hardware architecture. Power saving opportunities exist at each of these levels, but the real measurable gains will come from the synergistic focus on all these levels as considered in this project. To that end, the project promotes a cross-disciplinary collaboration between computer science and microelectronics to achieve two main breakthroughs: (i) the combination of state-of-the-art heterogeneous adaptive embedded multicore architectures with emerging communication and memory technologies, and (ii) power-aware dynamic compilation techniques that suitably match such a platform.


  • ARGO: WCET-Aware Parallelization of Model-Based Applications for Heterogeneous Parallel Systems
  • European project – H2020 RIA
  • Jan 2016 to Dec 2018
  • Partners: Karlsruher Institut fuer Technologie (KIT), SCILAB enterprises SAS, Recore Systems BV, Université de Rennes 1, Technologiko Ekpaideftiko Idryma (TEI) Dytikis Elladas, Absint GmbH, Deutsches Zentrum fuer Luft und Raumfahrt EV, Fraunhofer
  • Increasing performance and reducing cost, while maintaining safety levels and programmability, are the key demands for embedded and cyber-physical systems in European domains such as aerospace, automation, and automotive. For many applications, the necessary performance with low energy consumption can only be provided by customized computing platforms based on heterogeneous many-core architectures. However, programming such platforms in parallel for time-critical embedded applications suffers from a complex toolchain and programming process. The ARGO research project will address this challenge with a holistic approach for programming heterogeneous multi- and many-core architectures using automatic parallelization of model-based real-time applications. ARGO will enhance WCET-aware automatic parallelization by a cross-layer programming approach combining automatic tool-based and user-guided parallelization to reduce the need for expertise in programming parallel heterogeneous architectures. The ARGO approach will be assessed and demonstrated by prototyping comprehensive time-critical applications from both aerospace and industrial automation domains on customized heterogeneous many-core platforms.


  • ANTAREX: AutoTuning and Adaptivity appRoach for Energy efficient eXascale HPC systems
  • European project – H2020 FET HPC
  • Sep 2015 to Nov 2018
  • Partners: Politecnico di Milano, ETH Zürich, Universidade do Porto, CINECA, IT4Innovations, Dompé Farmaceutici Spa, Sygic a.s.
  • The main goal of the ANTAREX project is to provide a breakthrough approach to map, manage at runtime, and autotune applications for green and heterogeneous High Performance Computing systems up to the Exascale level. One key innovation of the proposed approach consists of introducing a separation of concerns (where self-adaptivity and energy-efficient strategies are specified separately from application functionalities) promoted by the definition of a Domain Specific Language (DSL) inspired by aspect-oriented programming concepts for heterogeneous systems. The new DSL will be introduced for expressing the adaptivity/energy/performance strategies and for enforcing application autotuning and resource and power management at runtime. The goal is to support the parallelism, scalability and adaptability of a dynamic workload by exploiting the full system capabilities (including energy management) for emerging large-scale and extreme-scale systems, while reducing the Total Cost of Ownership (TCO) for companies and public organizations.


  • Calcul Parallèle pour Applications Critiques en Temps et Sûreté – Parallel computations for safety-critical real-time applications
  • National funding – projet “Investissement d’avenir”
  • Oct 2014 to Feb 2018
  • Partners: Kalray (lead), Airbus, Open-Wide, Safran Sagem, IS2T, Real Time at Work, Dassault Aviation, Eurocopter, MBDA, Supersonic Imagine, ProbaYes, IRIT, Onera, Verimag, Inria, Irisa, Tima and Armines
  • The project objective is to develop a hardware and software platform based on manycore architectures, and to demonstrate the relevance of these manycore architectures (and more specifically the Kalray manycore) for several industrial applications. The Kalray MPPA manycore architecture is currently the only one able to meet the needs of embedded systems simultaneously requiring high performance, low power consumption, and the ability to meet the requirements of critical systems (low-latency I/O, deterministic processing times, and dependability).

Nano2017 PSAIC

  • Performance and Size Auto-tuning through Iterative Compilation
  • National funding – Programme de recherche & développement coopératif
  • Sep 2014 to Dec 2017
  • Partners: STMicroelectronics, Inria teams CAMUS, CORSE, AriC
  • The PSAIC (Performance and Size Auto-tuning through Iterative Compilation) project concerns the automation of program optimization through the combination of several tools and techniques such as compiler optimization, profiling, trace analysis, iterative optimization and binary analysis/rewriting. For any given application, the objective is to devise, through a fully automated process, a compiler profile optimized for performance and code size. For this purpose, we are developing instrumentation techniques that can be focused on, and specialized to, the specific part of the application to be monitored. PACAP contributes program analyses at the binary level, as well as binary transformations. We will also study the synergy between static (compiler-level) and dynamic (run-time) analyses.
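
    The general idea of iterative compilation can be sketched in a few lines. This is a toy illustration, not the PSAIC tool chain: the flag names are existing GCC options, but their speed and size factors below are invented, and `evaluate` stands in for an actual "compile, run, and measure" step.

```python
from itertools import combinations

# Made-up effect of each flag as (time factor, size factor); illustration only.
FLAG_EFFECTS = {
    "-funroll-loops": (0.90, 1.20),
    "-finline-functions": (0.85, 1.30),
    "-fmerge-constants": (1.01, 0.95),
}


def evaluate(flags):
    """Stand-in for compiling and measuring: returns (time, size)."""
    time, size = 1.0, 1.0
    for flag in flags:
        time_factor, size_factor = FLAG_EFFECTS[flag]
        time *= time_factor
        size *= size_factor
    return time, size


def best_profile(weight_time=0.5):
    """Try every flag subset and keep the one with the lowest weighted cost."""
    candidates = [
        subset
        for r in range(len(FLAG_EFFECTS) + 1)
        for subset in combinations(FLAG_EFFECTS, r)
    ]

    def cost(flags):
        time, size = evaluate(flags)
        return weight_time * time + (1 - weight_time) * size

    return min(candidates, key=cost)


if __name__ == "__main__":
    print(best_profile(weight_time=0.5))  # balance speed and code size
    print(best_profile(weight_time=1.0))  # optimize for speed only
```

    Exhaustive enumeration is only workable for a handful of flags; a realistic search space would call for sampling or a guided search, and the weighted sum is just one simple way to trade performance against code size.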


  • W-SEPT: WCET: SEmantics, Precision and Traceability
  • ANR project, grant ANR-12-INSE-0001
  • Oct 2012 to Dec 2016
  • Partners: Verimag, IRIT, Continental
  • W-SEPT is a collaborative research project focusing on worst-case execution time guarantees. The main goal is to improve the precision and the traceability of semantic information through the compilation flow, from high-level descriptions (such as the Lustre synchronous language) down to C and binary levels. It is supported by the competitiveness clusters (pôles de compétitivité) Aerospace Valley and Minalogic.

Inria Project Lab Multicore

  • Large scale virtualization for performance scaling and portability
  • Inria Project Lab
  • 2013 to 2016
  • Partners: Inria teams ALGORILLE, CAMUS, REGAL, RUNTIME, as well as DALI.
  • Multicore processors are becoming the norm in most computing systems. However, supporting them efficiently is still a scientific challenge. This large-scale initiative introduces a novel approach based on virtualization and dynamicity, in order to mask hardware heterogeneity and to let performance scale with the number and nature of cores. It aims to build collaborative virtualization mechanisms that achieve essential tasks related to parallel execution and data management. We want to unify the analysis and transformation processes of programs and accompanying data into a single virtual machine. We hope to deliver a solution for compute-intensive applications running on general-purpose standard computers.


  • COST action TACLe: Timing Analysis at Code Level
  • Oct 2012 to Sep 2016
  • Embedded systems increasingly permeate our daily lives. Many of those systems are business- or safety-critical, with strict timing requirements. Code-level timing analysis (used to analyze the timing properties of software running on some given hardware) is an indispensable technique for ascertaining whether or not these requirements are met. However, recent developments in hardware, especially multi-core processors, and in software organization make this analysis increasingly difficult, thus challenging the evolution of timing analysis techniques.
    New principles for building “timing-composable” embedded systems are needed in order to make timing analysis tractable in the future. This requires improved contacts within the timing analysis community, as well as with related communities dealing with other forms of analysis such as model-checking and type-inference, and with computer architectures and compilers. The goal of this COST Action is to gather these forces in order to develop industrial-strength code-level timing analysis techniques for future-generation embedded systems, through several working groups:

    • WG1 Timing models for multi-cores and timing composability
    • WG2 Tooling aspects
    • WG3 Early-stage timing analysis
    • WG4 Resources other than time
