Embedded System Design
Typical embedded software applications display a mix of multimedia signal/data processing with modal interfaces, resulting in heterogenous concurrent data-flow streaming models, and often stringent real-time constraints. Similarly, embedded architectural platforms are becoming increasingly parallel, with dedicated hardware accelators and manycore processors. The optimized compilation of such kinds of applications onto such execution platforms involves complex mapping issues, both in terms of spatial distribution and in terms of temporal scheduling. Currently, it is far from being a fully automatic compilation process as in the case of commodity PC applications. Models are thus needed, both as formal mathematical objects from theoretical computer science to provide foundations for embedded system design, and also as engineering models to support an effective design flow.
Our general approach is directly inspired from the theories of synchronous languages, process networks, and of real-time distributed scheduling. We insist on the introduction of logical time as functional design ingredient to be explicitly considered as first-class modeling element of systems. Logical time is based on logical clocks, where such a clock can be defined as any meaningful sequence of event occurrences, usually meant as activation/triggering conditions for actions and operations in the systems. So logical time can be multiform, a global partial order built from local total orders of clocks. In the course of the design flow time refinement takes place, as decison are made towards placement and timing of various tasks and operations. This solves in part the constraints between clocks, committing to schedule and placement decisions. The final version should be totally ordered, and then subjet to physical timing verification as to physical constraints.
The general (logical) Time Model has been standardized as part of the OMG profile for Modeling and Analysis of Real-Time Embedded systems (MARTE).
Work on polychronous formalisms (descending from Esterel), on a Clock Constraint Specification Language (CCSL) handling logical time, on Application-Architecture Adequation approach and real-time scheduling results has been progressed over the years, resulting in sofware environments such as SynDEx or TimeSquare.
Last activity report : 2016
CCSL as a Logical Clock Calculus Algebra: expressiveness and analysis techniques
CCSL is a simple, half-declarative and half-imperative language describing relations and constraints between sequences of events considered as Logical Clocks. The usage of CCSL for specification of embedded systems is powerful in that it defers the precise setting of physical timing until later implementation design phases (which may vary according to circumstances), see 8.
Early this year we established the universal recursive expressivity of CCSL, by encoding the dynamics of Petri Nets with inhibitor arcs in our framework (still unpublished). Those results were presented by Robert de Simone in a keynote talk at Memocode 2016. This result prompts the use of non-automatic methods for establishing actual schedules as solutions of CCSL specifications seen as schedulability constraints. Steps in that direction were made in [article].
We also considered the extension of CCSL towards stochastic modeling of potential input clocks as my emerge from the Cyber-Physical world (mixing probabilistic modeling of external events with discrete transformations by discrete cyber digital controlers). This work was initiated in [article], and should be further extended in the ongoing PhD thesis of Dongdong An.
Finally, we have also investigated to decide on specific schedules (e.g. periodic schedules) valid for a subset of CCSL. We have established a sufficient static condition for the existence of such a periodic schedule as well as a practical implementation to build such a solution [article] based on a SMT solver.
Industrial design flow for Embedded System Engineering
As part of the PIA LEOC Clarity collaborative project we considered the inytroduction of formal methods into a high-level model-based design environment for embedded systems, named CAPELLA (https://
Our activities consisted in demonstrating how the theoretical models of Logical Time and derives Models of Computation could be used to give precise semantics and provide simulation benefits, when applied to the modeling paradigms used in CAPELLA and advanced in Clarity. In particular we focused on the connection between timing/performance properties and other kinds of non-functional properties, including model variability.
This year we focused on two mains tasks:
First, we clarified and extended the notion of Modes and States in the Capella system engineering language. Specifically, a specific diagram has been introduced to deal with the system modes. The notion of mode is then used to specify different configurations of the system, mainly in terms of the active functions, their data dependencies, their deployment on the logical and physical architecture as well as the scenario to be verified in this specific mode. In consequence, the behavioural semantics of the mode diagram strongly interacts with the behavioral semantics of the other diagrams. The execution semantics was given by promoting our contributions in GEMOC and BCOoL (see 57).
Second, Capella proposes a consistent multi-view approach accross different engineering domains. At some step in the refinement process, these different views are extracted to a domain specific tool (like Simulink for instance). It is then required 1) to verify that the manipulation done in the domaine specific tool respect the original semantics expected by the architect, and 2) to understand the impact of the decisions made in domain specific tools on the interaction with the other views. To do so we provided a generic aproach to confront the race to the behavioral semantics we formally defined in Capella. We are curretly working on a theoretical approach to improve the overal performance of such approach.
While BCOoL and Gemoc only considers discrete models, the PhD thesis of Hui Zhao, which started in March 2016, explores a possible extension that specifically targets Cyber-Physical Systems where we different timed models combined, including both discrete and dense timed models. In this thesis, we also explore the impact of such an heterogeneous modeling framework to guarantee security and safety properties of the combined models. This is done in collaboration with Ludovic Apvrille (who is co-advisor of the thesis) from Telecom ParisTech.
Coordination of heterogeneous Models of Computation as Domain-Specific Languages
Our work this on coordination of heterogeneous languages produced two major results. The first one is the development of BCOoL (Behavioral Coordination Operator Language. BCOoL is a language dedicated to the specification of coordination patterns between heterogeneous languages. It comes with a tool chain allowing the generation of the coordination given a BCOoL operator and specific models. Our second result is the development of an heterogeneous execution engine, integrated to Gemoc studio, to run conjointly different models. Both works re extensively reported in Matias Vara Larsen PhD thesis [article].
SoC multiview (meta)modeling for performance, power, and thermal aspects
In the framework of the ANR HOPE project we progressed the definition of multiview metamodels for the design of Systems-on-Chip) (SoC systems integrating performance, power and thermal aspects. The main concern was to stress regularity and commonalitbetween those views, each developed on “domains” defined as partitions of the original block diagram (clock domains, voltage domains, floorplans,…), and with finite state machine controlers setting the levels of these domains; links between distinct views are originally provided by laws of physics, but then usually identified with discrete allowed values(such as OPP, Operating Performance Points, providing the available frequency-voltage levels for processor clocks).
The corresponding methodology, named MuArch, was reported as Ameni Khacharem PhD document [article].
MoCs and novel architectures
In the context of the FUI Clistine project we considered the links betwen formal Models of Computation and parallel programming models (MPI mainly). The objective is to figure to what level an abstraction of MPI processes as concuurent communicating processes can help for the AAA design process being applied to theselection of adequate MPI communications. This topic reflects the ongoing PhD thesis of Amine Oueslati, and the engineering work of Albert Savary in the first semester.
Solving AAA constraints analytically
We experimented on the use of SMT solvers to compute efficient mappings (bothe schedules and placement allocations) for concurrent embedded applications onto specific embedded architectures of big.LITTLE features (where alocation and migration of tasks can follow concern for low-power consumption). In fact, the work consisted greatly in a study of how the various models could be encoded to scale up, allowing the solvesr to provide results in reasonalble time. The results have been presented [article], [article], and will soon appear as E. Kofman PhD thesis.
Coupling SystemC and FMI for co-simulation of Cyber-Physical Systems
In collaboration with Professor Davide Quaglia, from the University of Verona, we are studying the proper joint modeling of interactions between different domains involved in a cyber-physical system (CPS), and specifically between the cyber and physical parts. In our first work, realized in the context of Stefano Centomo master internship, we investigated how an event based hardware description language can be used in an emerging industry standard for co-simulation (FMI/FMU developed orginally in a Modelica framework). Preliminary results were published [article], and we hope to start a PhD as follow-up of these results.
Behavioural Semantics if Open pNets
We have extended our preliminary work on Parameterised Networks of Automata (pNets), by looking at the behavioural semantics and at bisimulation equivalences for open pNet systems. These can be used to encode operators of various process algebras, construct of distributed or reactive system programming languages, or even parallel algorithmic skeletons, and generic distributed algorithms. As a first step, we studied the properties of a strong bisimulation equivalence based on logical hypotheses about the behaviour of process variables in the open systems. This has been published in [article], [article] and an extended version as an Inria research report [article]. We are now implementing algorithms for computing the symbolic behavioural semantics of open pNets, and checking strong bisimulation, using a SAT engine for reasoning on the hypotheses.
In order to understand better this behavioural semantics, we also have defined another version with a denotational flavour, namely using a “Universal Theory of Processes (UTP)” style. There we express the communication actions of pNets using traces of interaction events, and we were able to prove axiomatic properties of some simple (open) pNets. This was published in [article]. In the long term, it could be interesting to study the relations between the FH-bisimulation and the UTP semantics, relating both behavioural, denotational and algebraic semantics of pNets.
Behavioural semantics for GCM components
With Ludovic Henrio (Comred/I3S) and Rabea Ameur-Boulifa (Labsoc/Telecom-Paristech), we have pursued our research on the Behavioural semantics, in terms of pNets, of the core concepts of Grid Component Model (GCM). The results are currently submitted for publication as a journal paper, under revision.
Performance analysis and optimisation of an HPC scientific application
In the context of the internatinal Internship of Luis Agustin Nieto we conducted a large-scale experiment of source code optimization for HPC application. This work is meant to identify potential approaches that may be automatized in the future. The current use case was an application named CONVIV. CONVIV is a computer code implementing the VMFCI Method to solve the stationary Schrödinger equation for a set of distinguishable degrees of freedom (https://
This application is very computer-intensive (many hours of computation on a high performance grid computer). We have been given its source code (fortran with OpenMP), and we have been asked to analyse its performance and to optimise its execution time.
We did an extensive set of experiments for this application on many computers, and mainly on the cicada.unice.fr shared grid computer used for scientific parallel computing at UNS). We varied many parameters in our experiments:
The number of threads was 2, 4, 6, 8, 16 threads. We also analysed the sequential code version.
The thread affinity strategies for scheduling were: none (linux scheduler), scatter, compact.
We repeated each experience 35 times to analyse performance stability.
We used 2 compilers (gfortran, ifort) with -O3.
We did a precise performance profiling using the Intel Vtune tool.
During our experiments we observed that, even with all the parameters above kept fixed, repeating the executions 35 times shows grat variability between best and worst execution times (more than double in some cases). The critical-path functions remained the same for each configuration choice, including in particular specific matrix computation functions.
After investigation and experiments, we succeed in getting a spectacular performance improvement by applying the following optimisations:
Replace one of the matrix computation function by an MKL one (highly optimised and tuned function done by Intel).
Use the compact thread scheduling strategy (OpenMP parameter).
By using gfortran compiler with -O3, we reduced the execution time from 18400 seconds to 820 seconds (speedup=22).
By using the ifort compiler with -03, we reduced the execution time from 21000 seconds to 620 seconds (speedup=33).
Formal translation validation of multi-processor real-time schedules
This research direction is mainly represented by the PhD thesis of Keryan Didier, and takes place in the framework of the ITEA3 ASSUME project. The technical focus of the ASSUME project is on formal compiler verification and on correct real-time implementation for parallel applications. The objective of this PhD thesis is to formally prove the correctness of (part of) the automatic code generation technology of Lopht, considering the respect of non-functional requirements, and in particular real-time requirements such as release dates, deadlines and periods.
During this first year of work we have:
Simplified the allocation and scheduling algorithms of Lopht to facilitate proof while still being able to handle the industrial use case. The resulting algorithms consider all the aspects pertaining to functional specification and non-functional requirements, but make simplifying assumptions on the execution platform (by not taking into account memory access interferences during parallel execution).
Developed a formally proved translation validation tool to determine the correctness of schedules produced by the algorithms at point (1). The tool is developed and proved in Coq. Coq code extraction is used to produce OCaML code that integrates in the allocation and scheduling flow.
Evaluated the tool on a large-scale industrial use case from Airbus (6000 Scade nodes). We demonstrated the tool to our project partners and during the ASSUME project evaluation. This evaluation showed that our scheduling and formally proved validation tools scale up to the size of large applications.
The main limitation of the current work is that it does not take into account the interferences due to concurrent memory accesses. This gives the main research direction for the next year.
We are currently writing a paper on this subject.
Lopht back-end for TTEthernet-based distributed systems
The global objective of this activity is a large-scale, ongoing effort to assess the possibility of automatically synthesizing full real-time implementations, including the so-called “bus frame” (the network configuration) on complex industrial platforms and for complex functional and non-functional specifications. We worked this year in the context of the post-doctoral position of Raul Gorcitz, funded by the ITEA3 ASSUME project, but also in the framework of our collaboration with CNES and Airbus DS.
The chosen platform was an industry-level evaluation platform using several Single-Board Computers (SBCs) running the VxWorks 653 OS, and connected through a Time-Triggered Ethernet (TTE) network. This platform was provided by CNES, as typical target for embedded applications. TTE is a standardized commmercial communication network, on top of a switched Ethernet basis, ommmercialized by TTTech. TTE adds support for realtime and fault tolerant communications, allows multiple communications of mixed criticalities to share a single physical medium. This is ensured by means of dedicated hardware using a set of configuration files describing the system architecture and behavior. These configurations are synthesized by the proprietary TTEplan tool starting from a global network description file.
The main scientific difficulty was the formal modeling of the behavior of the TTE network, followed by the extension of scheduling algorithms to consider such a network. While preliminary results were obtained and published last year, we completed and demonstrated this work to our industrial partners, and we are currently writing a second paper on the subject.
Uniprocessor Real-Time Scheduling
In the context of the master internship of Mehdi Mezouak, we thoroughly tested the offline time triggered scheduler implemented on an ARM Cortex M4 last year. We remind that this scheduler, intended for safety critical applications, uses a scheduling table containing the instants when the scheduler will be called through interruptions triggered by a timer. This table is generated by a uniprocessor offline schedulability analysis which accounts accurately for the scheduler cost itself, and for the cost of all preemptions the data dependent tasks are subjected to. This approach allows accounting for preemptions induced by the cost of other preemptions. We implemented a time measurement system on a LPC4080 microcontroller board of NXP which includes the ARM Cortex M4 and several timers, to determine on the one hand the actual cost of the scheduler and the cost of one preemption, and on the other hand start, resume and completion times of every task of the task sets. For the ARM Cortex M4 with a 120Mhz clock we obtained 142 cycles (2.3
Multiprocessor Real-Time Scheduling
Always in the context of the master internship of Mehdi Mezouak, we studied the extension to multiprocessor of our offline time triggered scheduler. Since we chose the partitioned multiprocessor scheduling approach rather than the global one which is not suited to safety critical applications due to the prohibitive cost of task migrations, the uniprocessor schedulability analysis is easily extended. Indeed, the main modification consists, for every processor, in accounting for the cost of inter-processor communications and synchronizations due to data dependences when a producer task is allocated to a processor which is different from the one the corresponding consumer task is allocated to. Therefore, new scheduler calls are added to the scheduling table corresponding to instants when awaited data are available, i.e. produced and then transfered. Of course, there are as many scheduling tables, and thus schedulers, as there are processors, and these scheduling tables are supposed to share a unique global time. The implementation of this global time raises a complex problem since it is not possible to dispatch a unique physical clock to all the processors. Among various solutions, we chose to use a physical clock rather than a logical one like in the Lamport’s timestamp approach since we are interested in safety critical real-time. In addition, we chose the Berkeley’s algorithm based on a master-slave approach where the clock server is maintained by one of the processor of the multiprocessor. This algorithm is more robust to failures than other algorithms based on an external clock server. Finally, using the measurement system mentioned previously, we measured accurately the cost of inter-processor communications according to the number of transfered data, in the case of an ethernet network that we experimented last year to connect several LPC4080 microcontroller boards.
During the second year of the PhD thesis of Salah Eddine Saidi, we continued to study the parallelization on multi-core of FMI-based co-simulation of numerical models, that is increasingly used for the design of Cyber-Physical Systems. Such model developed according to the FMI standard is defined by a number of C functions, called “operations”, for computing its variables (inputs, outputs, state) and data dependences between these variables. Each model has an associated integration step and exchanges data with the other models according to its communication step which can be larger or equal to its integration step. These models are represented by a dataflow graph of operations [article] that is compliant with the conditioned repetitive dataflow model of our AAA methodology for functional specification. Our work mainly focused on two aspects. First, we proposed a graph transformation algorithm in order to allow handling multi-rate co-simulation, i.e. where connected models have different communication steps. This algorithm is based on the concept of graph unfolding similarly to the unrolling algorithm of our AAA methodology. The new graph is represented over the hyper-step which is equal to the least common multiple of the communication steps of all the models. Each operation is repeated in the graph according to the ratio between the hyper-step and its communication step. Then, rather than adding edges connecting all the repetitions of dependent operations, specific rules are used to define the repetitions that have to be connected by edges. These rules ensure correct data exchange between the operations as requested in the context of simulation. Second, some FMI functions called to compute model variables may not be “thread-safe”, i.e. they cannot be executed in parallel as they may share some resource (e.g. variables). Consequently, if two or more operations belonging to the same model are executed on different cores, a mechanism that ensures these operations are executed in strictly disjoint time intervals must be set up. We proposed an acyclic orientation heuristic to solve this problem. This heuristic adds non directed edges between the operations that belong to the same model, and then assigns directions to these edges with the aim of minimizing the critical path of the resulting graph and subject to the constraint that no cycle is generated in the graph.
Probabilistic Solutions for Hard Real-Time Systems
The probabilistic solutions for hard real-time systems are built under the hypothesis that worst case values and worst case execution scenarios have extremely low probability of appearance. While continuing the estimation of bounds for the worst case execution times of a program [article], [article], we have proposed the first utilisation of probabilistic description for mixed-criticality systems [article]. Our result is exploiting the heavy tails of the execution times of a program to propose efficient scheduling solutions. Moreover since the feasibility intervals [article] for a probabilistic real-time system is not formally identified, we have formulated the first feasibility reasoning for such systems [article] under fixed-priority assignment policies [article]. Another important problem for probabilistic real-time systems concerns the feasibility in presence of precedence constraints, often used by our industry partners. The introduction of precedence constraints requires the comparison of probabilistic arrivals and we showed that existing measures are not correct in this context and we proposed and proved correct new measures [article].