HIPEAC vision 2011/12 “The advent of 3D stacking enables higher levels of integration and reduced costs for off-chip communications. The overall complexity is managed due to the separation in different dies, independently designed.”
FPGAs (Field Programmable Gate Arrays) are configurable circuits that have emerged as a privileged target platform for intensive signal processing applications. FPGAs take advantage of the latest technological developments in circuits. For example, the FVirtex7 from Xilinx offers a 28-nanometer integration, which is only one or two generations behind the latest processors. 3D-Stacked Integrated Circuits (3D SICs) consist of two or more conventional 2D circuits stacked on the top of each other and built into the same IC. Recently, 3D SICs have been released by Xilinx for the Virtex 7 FPGA family. 3D integration will vastly increase the integration capabilities of FPGA circuits. The convergence of massive parallelism and dynamic reconfiguration in inevitable: we believe it is one of the main challenges in computing for the current decade.
By incorporating the configuration and/or data/program memory on the top of the FPGA fabric, with fast and numerous connections between memory and elementary logic blocks, it will be possible to obtain dynamically reconfigurable computing platforms with a very high reconfiguration rate (~10000 connections between dies). Such a high dynamic reconfiguration rate was not possible before, due to the serial nature of the interface between the configuration memory and the FPGA fabric itself. The FPGA technology also enables massively parallel architectures due to the large number of programmable logic fabrics available on the chip. For instance, Xilinx demonstrated 3600 8-bit picoBlaze softcore processors running simultaneously on the Virtex-7 2000T FPGA. For specific applications, picoBlaze can be replaced by specialized hardware accelerators or other IPs (Intellectual Property) components. This opens the possibility of creating massively parallel IP-based machines.
Multi-softcore on 3D FPGA
From the 2010 Xilinx white paper on FPGAs: “Unlike a processor, in which architecture of the ALU is fixed and designed in a general-purpose manner to execute various operations, the CLBs (configurable logic blocks ) can be programmed with just the operations needed by the application… The FPGA architecture provides the flexibility to create a massive array of application-specific ALUs..The new solution enables high-bandwidth connectivity between multiple die by providing a much greater number of connections… enabling the integration of massive quantities of interconnect logic resources within a single package.”
Softcore processors are processors implemented using hardware synthesis. Proprietary solutions include PicoBlaze, MicroBlaze, Nios, and Nios II; open-source solutions include Leon, OpenRisk, and FC16. The choice is wide and many new solutions emerge, including multi-softcore implementations on FPGAs. An alternative to softcores are hardware accelerators on FPGAs, which are dedicated circuits that are an order of magnitude faster than softcores. Between these two approaches, there are other various approaches that connect IPs to softcores: therefore, the processor’s machine-code language is extended, and IP invocations become new instructions. A new class of softcores (we call them reflexive softcores) should emerge, where almost everything is implemented in IPs; only the control flow is assigned to the softcore itself. The partial dynamic reconfiguration of next-generation FPGAs makes such dynamic IP management possible in practice. Thus, a reflexive softcore dynamically adds, removes, and replaces IPs in the application running on it. Moreover, we believe that we in order to obtain efficient massively parallel implementations of softcores on the new 3D-FPGA,s one should make the softcore as small as possible, and replace low-performance generic hardware components (ALU, registers, memory, I/O…) by dedicated high-performance IPs.
The reflexive softcore HoMade that we have started developing in 2012 (http://www.lifl.fr/~dekeyser/Homade) follows these ideas.The current version is a monoprocessor. Users have to add to it the functionality they need in their applications via static or dynamic IPs. We develop a library of IPs for basic processor functions (ALU, registers, …).
Another challenge we identify is extending the (mono) reflexive softcores, described above, to multi-reflexive softcores, in which some softcores are slaves and other are masters. Massively parallel dynamically reconfigurable architectures of softcores can thus be envisaged. This requires a parallel management of the partial dynamic reconfiguration system. This can be done for chosen subsets of softcores: a massively parallel reconfiguration will replace the current replication of a given IP with the replication of a new IP. Thanks to the new 3D-FPGAs this task can be performed efficiently and in parallel using the large number of 3D communication links (Through-Silicon-Vias). Our roadmap for HoMade is to make it evolve towards this multi-reflexive softcore model.
When Hardware Meets Software
HIPEAC vision 2011/12 ‘The number of cores and instruction set extensions increases with every new generation, requiring changes in the software to effectively exploit the new features’
When the new massively parallel dynamically reconfigurable architectures become reality users will need languages for programming on them. The languages will be themselves dynamic and parallel, in order to reflect and to fully exploit the dynamicity and parallelism of the architectures. Thus, developers will be able to use reconfiguration and parallel constructs in their programs. This expressiveness comes with a cost, however, because new classes of bugs can be induced by the interaction between dynamic reconfiguration and parallelism; for example, deadlocks due to waiting for output from an IP that does not exist any more due to a reconfiguration. The early detection of such bugs and their and elimination before deployment is paramount for cost-effectiveness and safety reasons.
Thus, we shall build an environment for developing software on parallel, dynamically reconfigurable architectures that will include languages and adequate formal analyses and verification tools for them, in addition to more traditional tools (emulators, compilers, etc). To this end we shall be using formal-semantics frameworks associated with easy-to-use formal verification tools in order to formally define the semantics of the language and allow users to formally verify their programs. The K semantic framework (http://k-framework.org, developed jointly by Univs. Urbana Champaign, USA, and Iasi, Romania) is one such framework, which is mature enough (it has allowed defining a formal semantics of the largest subset of the C language to date and is familiar to us from previous work.
HiPEAC vision 2011-2012: “reconfiguration, customization and runtime adaptation techniques will facilitate switching between tasks during the deployment of smart camera networks”.
We shall focus on embedded systems performing intensive computations, in particular smart-camera systems, set-up-boxes, and transportation/aerospace/avionics applications of companies with whom we have ongoing collaborations. Some of the targeted classes of applications are safety-critical, and formal verification is essential for them. For the other ones, formal verification provides an added value in terms of quality of service.
Reflexive Camera Networks
A Smart Camera (SC) is a vision system which, in addition to image-capturing capabilities, is able to extract application-specific information from the captured images and to automatically make intelligent decisions based on them. Dynamicity is inherent in SCs: processing may change depending on the specific observations they make and on the context. For example, an SC may use a low-quality face recognition IP while observing an office during the day, but switch to a high-quality one if it detects an intrusion during the night. Moreover, image processing requires high-performance computing, which is achieved by using parallelism. Thus, the integration of dynamic reconfiguration and parallelism, which is addressed by our project, is naturally present in SCs. Previous work in the DaRT team has already explored efficient uses of FPGAs in an SC network deployed in a retail store. A new proposal concerns embedded reflexive camera in the Smart Cities multidisciplinary project developed on the University Lille 1 campus.
Television sets and set-top-boxes are forming a symbiotic connection, which relies on common standards and protocols such as DLNA, Web standards, Web 2.0, H264, HEVC… As a result, the hardware platform on which applications run is becoming less importantt: commonly used ISAs like x86 are not mandatory anymore. Dedicated pieces of hardware could efficiently provide specific services according to user requests. End-users expect platforms supporting many services with maximum performance, but do not require all of them at the same time. Dynamic reconfiguration is here too a good compromise, and it is efficient enough to support high performance algorithms like H264 or HEVC. It could also provide a ground for supporting on-the-fly codec switching. This may occur because the broadcaster decides to change the encoding of its video signal for safety reasons. Dreampal has started a collaboration with Kalray (http://www.kalray.eu) to develop a massively parallel language (without dynamic reconfiguration facilities, for now) on their MPSoC. H264 will be tested on this chip and also on special FPGA boards with dedicated extensions for multimedia applications like the Xilinx Zynq.
Transportation, Aerospace and Avionic Industries
Intensive signal-processing is becoming widely used nowadays in aerospace, defence, avionics, and transportation applications. Such systems often operate in uncertain environments. The essential feature of such systems to reconfigure themselves at run-time induces an additional complexity on the designs. Dreampal continues current industrial collaborations with EADS, Eurocopter, and Nolam Embedded Systems. We are also planning to cooperate with the European institute RAILENIUM, which deals with the competitiveness requirements related to the rapidly evolving market conditions of the rail industry.
Diverse and Complementary Backgrounds for One Project
To summarize our proposal: we shall exploit the features of the upcoming 3D-FPGAs in order to design dynamically reconfigurable, massively parallel architectures as well as languages and software tools for programming applications on these architectures and formally verifying them. The target application domain is embedded systems with intensive computation. A visual description of our scientific objectives together with the main participants is shown below.
1. HoMade: a Reflexive Softcore
Our contributions to the integration of dynamic reconfiguration and massive parallelism are centered around a softcore processor called HoMade. It is an ultra-Risc stack-based processor, with only 12 instructions, essentially control ones (i.e. jumps, procedure call/return, master/slaves invocation…). One other instruction is used to trigger IPs via their identifier. The IPs can be arbitrary complex, including ALU functions, register files, load/store units, or even other processors. A user pre-selects a set of IPs that will at some point be used by the application, to be invoked at runtime by the IP trigger instruction. Thanks to the partial dynamic reconfiguration offered by recent FPGAs, the same IP identifier can be used with software (HoMade executable code) and hardware (VHDL) IP instantiations. HoMade is a reflexive processor that extends intercession to hardware: “Intercession is the ability for a program to modify its own execution state or alter its own interpretation or meaning”.
Some benefits of reflexivity are:
a reduced silicon surface, because only the IPs actually present on the FPGA use space;
a reduced power consumption, due to the same reason as above, but for energy consumption;
a reduced execution/response time, by switching from software code to hardware IPs at runtime.
These savings are important for embedded systems that often need to operate with limited resources. We shall develop performance and power-consumption evaluations for HoMade-based architectures, based on runtime distributed and dynamic monitoring of the processor activities (mainly the IPs). At the end of the project this evaluation system will constitute the foundations of self-adaptive reconfigurable parallel architectures, to be explored in future projects.
Short Term — HoMade Core
Our primary objective is to strengthen the foundations of HoMade and to validate its basic concepts. HoMade 2.2 is the current version distributed on the web (http://www.lifl.fr/~dekeyser/Homade). It implements a 3-stage pipeline to improve processor frequency. Intercession is already supported via the WIM (Write Instruction Memory – HoMade is an Harvard architecture) and allows several software IPs and one hardware IP to successively use the same IP identifier. There is no dynamic reconfiguration of hardware IPs yet.
The earlier version HoMade 1.4 was used by 100 master students during the 2012-2013 academic year. Version 2.2 is used by some students from Univ. Lille and from several universities in Tunisia for their practice period. We are developing the web site with documentations and downloading page for the current and future versions. We are also writing a book on HoMade, which could be published next year.
Porting HoMade to several Xilinx boards The current version of HoMade is running on a Xilinx Nexys3 board with a Spartan6 FPGA. Portability between FPGAs is not trivial, even between those from the same constructor. In particular all the IPs depending on the board, at least the I/O ones, have to be rewritten, and all the clock and mapping constraints are specific to the FPGA. In the close future we shall port HoMade to Virtex6 and to the new Virtex7 and Kintex7. This will allow us to identify what is dependent of the chip and what is not, in order to facilitate future portings.
Hardware performance improvement A softcore is efficient if its clock period is short and if the number of cycle it uses per instruction is small. To achieve efficiency HoMade has the so-called Short_IP and Long_IP invocations. A Short_IP invocation guarantees a one-cycle execution of the invoked IP. By contrast, Long_IP invocations have a undefined execution time; they send a signal to the processor when they complete their task. To improve the global performances of HoMade we increase the clock frequency and reduce the pipeline-branching penalties.
Performance and power estimation In order to estimate the performance of an application at run-time we will rely on the on-board measurements to track the execution time of each IP. Their power consumption can be also measured using a dedicated power analyser. This approach will allow us to obtain the global performances of an HoMade-based implementation and to incorporate these measurements into analytical models.
Validation and comparison with other softcores To validate HoMade a large campaign of validation and comparison with others softcore on the same FPGA will be undertaken. A first validation will use a mono-softcore implementation. Frequency, power consumption, surface and execution times are the criteria for this comparative test.
Middle Term — Integrating Dynamic Reconfiguration and Massive Parallelism
Our objectives are to support dynamic reconfiguration and massive parallelism on the upcoming FPGA releases. We organize the development as a step-by-step improvement of successive HoMade versions. The ordering of these versions will depend on the progress made on the following tasks.
IPs for Dynamic reconfiguration of IPs Dynamic reconfiguration management will be itself performed by dedicated IPs. An ICAP-like (Internal Configuration Access Port) primitive will encapsulated within an IP. Using an IP to perform reconfiguration gives us more flexibility for performing reconfiguration, in particular, with respect to how and by whom reconfiguration may be triggered.. Of course, the targeted FPGA should support this functionality.
Performance and power monitoring wrapper As a first step toward on-chip performance analysis, we will implement a so-called wrapper around each IP in order to monitor its performance and power consumption estimation at run-time. We will introduce dedicated sensors for capturing the local activities affecting the overall performances of one HoMade. This wrapper will be developed as an IP and integrated in the execution model of HoMade..
Master/Slave HoMade to HoMade Parallel Execution Two HoMade instructions are reserved for master-to-slave and slave-to-master interaction. First, all selected slaves run a program with a common start address. Second, a synchronisation barrier detects the end of the processing of a subset of the slaves. We mimic inside the selected subset the SPMD execution model. Between disjoint subsets of slaves, we propose a multi-SPMD execution, each softcore subset running in a local SPMD mode. This is not supported by the version 2.2 of HoMade. A substantial effort has to be done to support a large number of slaves without degrading the clock frequency.
Slave activity management IPs To manage subsets of slaves and the multi-SPMD model we need to manage activity of each slave. Again, an IP will be proposed by default to manage the activity locally on each slave and globally on the master. Users can redefine their own activity managers and can change the IP during execution if needed.
Slave-to-slave communication IPs Parallel computing requires inter-processor communication. In line with the HoMade philosophy, communications between slaves will be implemented via specific IPs. One can use the default IPs provided with HoMade or develop one’s own IPs and even change the communication system at runtime thanks to dynamic reconfiguration. Like most Network on Chip 2D-Grids will be the main target topology for HoMade. A NEWS IP (North, East, West, South) grid will be proposed by default.
Parallel I/O and communication To take benefit of the pool of HoMade slaves we should provide a large data splitting on the grid. Different solutions exist in the literature. IP based-design allows users to implement some of them. By default we shall propose to connect parallel I/O buffers to the border of the NEWS network (à la Intel Paragon machine).
Validation and comparison with static MPSoC During all the development process we shall compare our Multi-HoMade with available MPSoCs one realistic examples. Some MPSoC are reconfigurable, but none is reflexive (to our best knowledge).
Long Term — Advanced Reconfiguration
Long-term objectives concern the combination of dynamicity and parallelism, and new extensions of the model of computation towards even more dynamicity.
Parallel dynamic reconfiguration system To achieve this objective we have to distribute the dynamic reconfiguration management between slaves and master (via bitstreams). The ICAP-like primitive needs a master IP implemented in the master to start a reconfiguration phase, and, locally, each slave has to manage its own hardware reconfiguration when bitstreams arrive. Bitstream flow is addressed in the next section.
Dynamicity and 2D-grid To change the grain of parallelism we envisage to dynamically change the size of the grid at runtime. Adding a line or column, deleting a line or a column need to organise the regions of the FPGA in such a way the communication links are predefined and the surface is used efficiently for any IPs of any HoMade slaves. We have already tested this approach for communications, but integrating the IPs mapping is much more difficult and FPGA-dependent.
Dynamic reconfiguration performance monitoring As a part of the reconfiguration model we have to observe the multi-HoMade behaviour and to adapt the performance on chip analyser to take care of both massively parallelism (communications and computations) and dynamic reconfiguration. A performance hierarchical monitoring strategy could help to post-verify satisfaction of real-time and power-consumption constraints. A self-adaptive Multi-HoMade architecture is a future story.
Constellation on chip with HoMade hierarchies A slave can be master of other slaves. HoMade can play both roles. When it executes code, some instructions are sent to its slaves. Slaves can reproduce the same model. The resulting hierarchy is not necessarily regular. A constellation of HoMades could cover one or more dies of the FPGA.
2. Hardware Support for Partial Dynamic Parallel Reconfiguration
In HoMade-based architectures everything is done by IPs, including the task that handles reconfiguration. In dynamically reconfigurable systems one may need to remove then reuse IPs at runtime, thus, such systems have to ensure consistency by managing context switching (storage and restore). These functions are mandatory as we target dynamic systems that allow reuse of IP intermediary results from previous instantiations. Moreover, IPs may be as simple as an addition and as complex as an H264 decoder. Thus, the resource occupation on an FPGA device can be very different from an IP to another. As placement and routing Xilinx tools need specifying sizes and positions of all reconfigurable regions at compile time, these have to be defined before any execution. The storage and restore functionalities will constitute the hardware support in HoMade-based architectures on physical FPGA devices.
Short Term — 2D FPGA and Multi-Devices
Generic and flexible wrappers for reconfigurable IPs context switch We shall develop a flexible and generic model for a hardware wrapper to encapsulate each reconfigurable partial bitstream. Such a component will communicate with the IP that it encapsulates in a transparent manner for the programmer and will serve as a support for commands such as: stopping an IP, replacing the current IP by another one, storing the current context of the IP, and changing the current context by another one (available in a die memory). This wrapper will be static and attached to each reconfigurable region (one IP per region), with a HoMade-compliant interface in one side and a PLB (xilinx Processor Local Bus) one in the other side, to enable using the complete Xilinx dynamic reconfiguration flow.
Xilinx-compliant physical IP deployment model on partially reconfigurable FPGAs A bitstream is not only the result of an IP synthesis targeting a specific device, but it is highly dependent of the location and the available resources of the region where it is supposed to be implemented. In order to implement any bitstream of the system, we can define fixe size reconfigurable regions that fit the largest bitstream of the system. This solution will allow us to obtain partially reconfigurable HoMade-based systems that will constitute a first integration and test platform. This first solution has to improved in terms of flexibility and performance.
Multi-FPGA Emulation Board The first validation of our massively parallel dynamically reconfigurable architecture will be performed using currently available Xilinx FPGAs which are not currently supporting parallel reconfiguration. For this purpose, we propose to use an emulation platform in order to implement massively parallel and dynamic architectures and to reconfigure several cores/regions in parallel. Currently, we are designing together with the Nolam company (www.nolam.com) a multi-FPGA board featuring a parallel reconfiguration mechanism built around a PCIe switch. This board has 4 FPGA circuits that can communicate through a shared-memory or a message-passing mechanism. A master FPGA can initiate a parallel reconfiguration using the broadcast option of the PCIe switch, allowing high reconfiguration throughputs. This platform will serve as a reference for evaluating performance and power consumption.
Middle Term — Parallel Dynamic Reconfiguration
Optimized wrappers for reconfigurable IPs context switch As context handling involves data and global memory access, the wrappers may cause substantial time and resource consumption overheads. Thus, in the middle term, instead of using external memories, we shall introduce local context stacks in reconfigurable regions, that we shall use in a similar way as caches. Each stored context requested by a wrapper will be called by its number, which at a later stage will then be translated into an address using a naming table that allows finding the right up to date context.
Regular regions/bitstreams for Multi-SPMD systems Designers handle homogeneous components (SPMD-based), thus, all bitstreams are derived from the same IP. An interesting challenge is how to make this kind of reconfigurations as fast as possible on existing 2D or 3D FPGAs. The second step of our work on the bitstream deployment model is to take advantage of the repetitive aspect of HoMade-based architectures. By taking into account the repetitive aspect one can manage a single bitstream for a single IP independently of the region where it will be located. The region characteristics is included on the fly in the bitstream when reconfiguration is performed.
Hardware performance analyser The latest Xilinx devices provide on-board monitoring to measure supplied voltage and device temperature. A monitoring system can be instantiated to access these measurements from FPGA applications. The information is mainly global for the device, and, by combining it with a local performance analyser, a more accurate estimation could be achieved for local power consumption and temperature analysis of each IP.
Long Term — Towards 3D FPGAs
Master/slave wrappers for Multi-SPMD reconfigurable systems Master/slave wrappers will likely constitute an interesting issue for regular multi-SPMD reconfigurable systems. Such a multi-SPMD model can take advantage of the homogeneity and repetitive slave characteristics in order to attach a complete wrapper to each homogeneous SPMD grid (several reconfiguration regions) that will communicate with simpler and less area-consuming wrappers attached to each reconfigurable region of the grid.
Runtime bitstream relocations for topology reconfiguration In order to improve performances of HoMade-based systems and reduce reconfiguration time, we shall develop an IP using bitstream-sliding techniques (FlexTiles project, http://flextiles.eu/) over the CLB grid on the FPGA device. This can be very relevant in the case of regular interconnection topologies, because it will allow moving all IPs composing a regular grid at the same time.
3D FPGAs become reality 3D packaging is the next innovative technology for FPGAs. The inter- and intra-layer positioning of communication and logic resources is of utmost importance. We anticipate that multiple stacked layers can be used for a fast and massively parallel reconfiguration over the whole chip. Recent works confirm this vision. As soon as 3D FPGAs become available, Multi-HoMade could be deployed taking benefits of all previous results.
3. Languages, Compilation, Formal Verification
When the HoMade and multi-HoMade architectures become reality users will need languages for programming applications on them. We shall develop languages that are reflective and parallel, effectively raising these features from the hardware to the software level, so that reflexivity and parallelism traverse the whole project and become a unifying factor for our research. The interaction of reflexivity and parallelism in the envisaged languages must be understood in all its details, hence the need for formally defined languages. We are planning to use the K framework (http://k-framework.org) for this. Once a language is formally defined, programs in that language can be formally verified against user-defined properties. Its compilation to other, lower-level languages will also need to be formally verified in order to ensure that properties established at the high level also hold on the low-level. The languages will be part of an environment that will consist of the following components, the development of which constitute tasks in our project:
a formally defined machine language of the HoMade processor in the K framework. K formal definitions are executable and generate, for free, an emulator for machine-code programs (where hardware IPs are emulated in software);
a formally defined assembly language for the HoMade processor in the K framework (and the emulator generated from the K executable definition);
a high-level, dynamically extensible, parallel, and programmer-friendly language called HiHope, also formally defined in the K framework (and its emulator generated from the K definition);
a formally verified compiler from HiHope to the HoMade assembler, which ensures that any compiled HoMade program has the same behaviour as the corresponding HiHope source;
a formally verified assembler of the HoMade assembly language to machine code, with a similar motivation;
a program-property specification and verification engine for HiHope programs, such that the properties specified are preserved by the compiler and the assembler, and thus also hold in the HoMade assembly and down to the machine code.
Creating these components requires overcoming the following scientific and technical barriers:
designing means for writing, executing and formally specifying programs that have underspecified hardware components, while staying above the hardware abstraction level;
identifying classes of properties relevant to our dynamic reconfiguration/parallelism setting.
Short Term — K formal definitions of languages
Formally defining in the K framework the HoMade machine language. The machine code of the processor will be formally defined in K, with special care to precisely follow the low-level aspects of machine code (i.e., alignment of instructions to word size) and how code is executed in the HoMade processor (including the handling of hardware IPs). Hardware IPs will be modelled as K rewrite rules, either concrete rules if the full behaviour of the IP is known, or symbolic ones if the behaviour is only partially known. (At the very least, one knows how many elements an IP pushes and pops on the HoMade stack). Since K definitions are executable these definitions will generate an emulator for the machine language (in which the hardware: HoMade and the IPs are emulated).
Formally defining in the K framework the HoMade assembly language. This language already exists: it was developed in order to allow writing some experimental IPs for the current version of HoMade (the machine code was too impractical for this). Compared to the machine code, where everything is represented in binary format, the assembly language has mnemonic instructions and labels for branching, with no concern regarding, e.g., how many words an instruction uses and the impact of word size when branching to a label. We will also define this language in K and obtain an emulator for HoMade assembly programs. With this definition the writing of simple mixed hardware/software applications and their emulation in software becomes possible. In particular, this relieves developers from waiting for a re-synthesis of all the hardware before testing each change in their code.
Designing and formally defining in K a dynamically extensible language: HiHope, which will contain higher-level constructions (loops, functions) ensuring a comfortable programming. In the short term we will first define the dynamic extensibility features of HiHope, which consist in the ability to define functions at runtime, to call them, and to redefine them on a by-need basis. This is another dynamic-reconfiguration feature of HiHope, the other one being the dynamical invocation of hardware IPs. Thus, in HiHope, programs will be able to switch at runtime between performing software-based computations (slower, but safer since software can be in principle fully verified), or hardware IP-based computations (faster, but with fewer guarantees, since functional correctness relies on the trust that IPs meet their specifications). Our objective is to make the resulting HiHope language to be a specialization of the FORTH language (http://www.forth.org), a dynamically exensible and reflexive language, which has a standard (http://www.forth200x.org/forth200x.html) and is used in embedded-system programming. Since FORTH is self-extensible, any program in a specialization of FORTH is a FORTH program as well. The goal is to make the HiHope language readily usable by FORTH embedded-system developers.
Middle Term — parallelism, compilation, verification
Incorporating parallelism into HiHope HiHope also needs to be a parallel language, i.e., consistent with the parallel nature of the hardware it runs on — the multi-HoMade reflexive softcore (which is also a middle-term objective). We shall thus extend HiHope to a multi-SPMD language. Such a language is an extension of data-parallel languages, in which two kind of domains are considered: a scalar domain and one or several parallel domains. A parallel domain is defined by a set of HoMade slaves. Structures defined on a parallel domain are the union of a local instance of the structure on each slave of the domain. An instruction executed on a parallel domain is locally executed on each slave of the domain. A HiHope parallel program is a succession of scalar and parallel blocks. Semantically, the execution of a parallel block triggers a parallel execution on the domain, the end of the block defines a synchronisation point between all the slaves and the master that executes the scalar code. From one HiHope instruction stream, the compiler will be in charge of the generation of both the slave’s codes and the master’s code. Moreover, parallel HiHope will extend the scalar control flow to the parallel dimension with a notion of activity attached to each slave. A slave takes part in a parallel block if and only if it is active. The slave activity management IP shall be generated from the parallel control flows.
The language will also integrate a possible definition of communication topologies matching the HoMade 2D-grids. Such a definition will provide a set of ports that will be used by the constructor of communication of the language. These high level constructor will generate call to the HoMade communication IPs.
In a second stage, the parallel structure, activity management, and communication instructions of the language has to be extended to multiple domains to reflect the multi-SPMD nature of the HoMade system.
Furthermore, the dynamic extensibility features of HiHope introduced in relation to the reflective nature of the HoMade system will also be extended with a data-parallel version that will allow to mix in a single code the dynamic configuration of the parallel architecture and its programming.
A compiler from HiHope to HoMade assembly and its formal verification for correctness Compiler correctness here means that the HiHope source and compiled HoMade assembly programs have the same behaviour. We shall formally defined define the compiler in K and then formally verify the compiler thus defined. For this we shall design an adequate, formal notion of symbolic program equivalence (symbolic HiHope programs represent, in particular, all the basic syntactical HiHope constructs), a logic for writing formulas expressing the chosen notion of equivalence, and a deductive system for proving the validity of formulas in the logic. We shall base this on our work on a logic for program equivalence as well as on symbolic program execution (Arusoaie, Lucanu, Rusu, 2012). We shall extend our logic and its deductive system for dealing with parallel HiHope and the resulting HoMade assembly programs, and shall build the extended logic’s deductive system on top of an extension of our symbolic execution.
An assembler from HoMade assembly to Homade machine language and its formal verification. This task resembles the previous one. The challenge here is to encode the assembler’s correctness in our automatic equivalence verification techniques suggested above. We have empirical evidence that the techniques work for the compiler, but need to investigate this for the assembler, which performs tasks that are different from the compiler (i.e. transforms jumps to symbolic labels to jumps to binary addresses, which it computes based on the binary representation and size of instructions). If the assembler verification does not fit in our techniques, fall-back solutions are either to adapt our techniques or to combine it with more standard, interactive techniques based on, e.g., interactive theorem proving.
Enhancing HiHope with a formal verification engine for reasoning about program correctness This involves the design of a logic suitable for specifying and verifying HiHope programs; the program verifier is the implementation of the logic’s deductive system. For this, we are planning to adapt Reachability Logic, a language-independent logic for program correctness. Reachability logic was proved sound and relatively complete with respect to program properties (in particular, invariants and pre/post-conditions, generically known as contracts). Thus, all that its deductive system proves are valid properties of the code, and all valid properties can, in principle, be proved. It makes sense to verify program properties at the HiHope level because, there, loop invariants and function pre/post-conditions make sense (since there are loops and functions in the first place). Conversely, the HoMade assembly and the machine code are too low-level for verification.
We note that a simple adaptation of reachability logic for HiHope will not suffice, because our goal is to make sure that what is proven in HiHope remains true after compilation to Homade assembly. Therefore, significant effort is required to identify exactly which properties are preserved by the compiler. Furthermore, the nature of the HiHope language involves additional challenges (how to model and reason about reconfiguration and parallelism in the logic).
Long Term — Towards multi-level SPMD languages and verification
A longer-term objective is tailoring the languages and the formal specification and verification tools to the longer-term evolutions of HoMade, namely, the multi-SPMD execution model integrated with the envisioned dynamic topologies and hierarchical constellations of Homade processors. Some parallel languages have been defined in K but not, to our knowledge, parallelism following the multi-SPMD execution model, so this remains a challenge. Languages with dynamic topologies and hierarchy (in the sense of dynamic communication channel creation and destruction between hierarchically defined processes) have been defined in K, so we could take inspiration from them. Finally, we need to identify specific challenges to the formal verification of programs written in such future languages: what are the application-independent, but architecture-depended properties to be verified? How to extend the proposed specification logic and its deductive verification system to handle such properties ? These are some of the open questions for the future. Other open questions, more theoretical in nature, concern unifying the symbolic program-property reachability logic with the symbolic logic used to specify the correctness of compiler single logic in which program equivalence (for compiler correctness) and program properties are mutually consistent with each other by construction.
4. Industrial Cases Studies
With the HoMade processor it will be possible to prototype dynamically reconfigurable massively parallel hardware and to test it on case studies. We have presented earlier in the document three application domains where we think that dynamic reconfiguration can help either for improving performance or for reducing resource consumption: reflexive camera networks, set-top-boxes, and transportation and avionics applications. Our goal is to demonstrate the relevance of our approach on the most common types of applications in these domains. Some of the targeted classes of applications are safety-critical, and formal verification is essential for them. For the other ones, formal verification provides an added value in terms of quality of service.
The first step is to map the algorithms specific to those domains onto our model of execution and to create the IPs that are building blocks of any signal and image-processing applications. We present here a few steps towards the design of our use-cases.
Short Term — Hardware/Software Partitioning
For highly parallel applications, when a hardware parallel IP does not exist, we need to exploit the intrinsic parallelism of the applications in order to propose efficient hardware implementation of the repetitive tasks using the massively parallel master/slave HoMade configuration. We also need to define the dynamic reconfiguration scenarios according to the application’s requirements, the system’s resources, and the context. Because all our applications belong to the same class (signal and image processing) some parts of the applications will be reusable. For example the video encoder/decoder H.264 will be used by several applications as it is considered as the main standard for video compression. Thus, this will be the first HoMade code dedicated to signal processing that we shall develop. In the new project Smart-cities we have already started the design of this program component using the Xilinx Zynq-based platform, a scalable FPGA equipped with I/O video features making it very appropriate for various multimedia prototypes.
Middle Term — Signal and image-processing platform
Our intention is to build a full experimental signal and image-processing platform. We need to integrate several IPs (stream acquisition, VGA or DVI output…) with the HoMade processor in charge of scheduling, parallel, and dynamic reconfiguration control. For example, in smart cameras and set-top-boxes we want to display on screen a non-encrypted video sent from acquisition devices. If the H.264 codec is encouraging, we shall then add other IPs: for example HEVC, also called H.265. In order to change dynamically the stream encoding (or switch the source) the HoMade processor can change the corresponding IPs on-the-fly to adapt itself to the new encoding. This is where the integrated massive parallelism and dynamic reconfiguration becomes very relevant..
Long Term — Complete Architectures
Finally, a complete chain of IPs controlled and scheduled by a HoMade processor will implement our our vision for embedded, high-performance, reflexive, and safe computing. We will experimentally validate our approach on cas studies proposed by our industrial partners.