Reconfigurable Processor Extension Generation

The core computational unit of an embedded system is usually an embedded processor, often extended with specialized hardware. Designing such systems draws on many different design methods. We have defined an original, generic approach, based on constraint programming, to the automatic generation of reconfigurable processor extensions and to application compilation. We proposed a method for automatically selecting application-dependent processor extensions and for scheduling applications on the resulting ASIP (Application-Specific Instruction-set Processor) models. The extension can be an architecture composed of a single run-time functionally reconfigurable processing unit tightly coupled to a processor core. It can also be parallel, composed of a set of heterogeneous processing units communicating through a crossbar network, where the number of registers and the structure of the interconnections are application-dependent. The proposed compilation framework compiles C programs into a Hierarchical Conditional Dependency Graph internal representation, which is used to identify computational patterns under various constraints: critical path length, number of inputs and outputs, and hardware resources such as the number of operators. The set of identified patterns is then used in the mapping and scheduling step, where a subset of patterns is selected for hardware implementation. If the extension must execute several computational patterns, the corresponding run-time reconfigurable functional unit can be synthesized using merging techniques. The DURASE framework combines algorithms for graph matching and graph merging with constraint programming methods.
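As an illustration of pattern identification under constraints, the following minimal Python sketch enumerates the connected subgraphs of a small data-flow graph and keeps those that respect bounds on the number of inputs, the number of outputs and the critical path length. It is not the DURASE implementation: the framework relies on constraint programming, whereas this sketch uses brute-force enumeration, which is only practical for very small graphs. The graph `DFG`, the function names and the bounds are all hypothetical, and checks such as convexity or pattern isomorphism are left out for brevity.

```python
from itertools import combinations

# Hypothetical data-flow graph: operation -> list of its operands.
# Names that are not keys of DFG ("a" .. "e") are external inputs.
DFG = {
    "m1": ["a", "b"],    # m1 = a * b
    "m2": ["c", "d"],    # m2 = c * d
    "s1": ["m1", "m2"],  # s1 = m1 + m2
    "s2": ["s1", "e"],   # s2 = s1 >> e
}


def successors(graph):
    """Map each operation to the set of operations that consume its result."""
    succ = {n: set() for n in graph}
    for n, operands in graph.items():
        for p in operands:
            if p in graph:
                succ[p].add(n)
    return succ


def is_connected(nodes, graph, succ):
    """True if the candidate pattern is connected (edge direction ignored)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        n = stack.pop()
        for m in ((set(graph[n]) | succ[n]) & nodes) - seen:
            seen.add(m)
            stack.append(m)
    return seen == nodes


def inputs_of(nodes, graph):
    """Values read by the pattern but produced outside of it."""
    return {p for n in nodes for p in graph[n] if p not in nodes}


def outputs_of(nodes, succ):
    """Pattern nodes whose value is used outside the pattern or is a final result."""
    nodes = set(nodes)
    return {n for n in nodes if not succ[n] or succ[n] - nodes}


def depth(nodes, graph):
    """Critical path length of the pattern, counted in operations."""
    nodes = set(nodes)
    memo = {}

    def d(n):
        if n not in memo:
            memo[n] = 1 + max((d(p) for p in graph[n] if p in nodes), default=0)
        return memo[n]

    return max(d(n) for n in nodes)


def find_patterns(graph, max_in, max_out, max_depth):
    """Enumerate every connected subgraph that satisfies the three bounds."""
    succ = successors(graph)
    found = []
    for size in range(1, len(graph) + 1):
        for nodes in combinations(sorted(graph), size):
            if (is_connected(nodes, graph, succ)
                    and len(inputs_of(nodes, graph)) <= max_in
                    and len(outputs_of(nodes, succ)) <= max_out
                    and depth(nodes, graph) <= max_depth):
                found.append(nodes)
    return found


patterns = find_patterns(DFG, max_in=4, max_out=1, max_depth=2)
```

With these bounds, the multiply-accumulate shape `("m1", "m2", "s1")` is accepted, while any candidate containing the chain from a multiplication down to `s2` is rejected because its critical path has three operations.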

 

Compilation for Run-Time Reconfigurable Architectures. To support run-time reconfigurable architectures such as the ROMA processor, extensions to the DURASE framework have been developed. We have defined constraint programming models that solve application scheduling, binding and routing for two abstract architecture models of the ROMA processor. The first is a non-pipelined execution model of the architecture and targets execution time optimization. The second enables a pipelined execution mode, anticipating a possible evolution of the ROMA architecture; here the goal is to minimize the latency of the pipeline. The compilation flow was applied to the MediaBench and MiBench benchmark sets. In most cases, our system provides optimal results, confirming the quality of our scheduling, binding and routing system.
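To make the scheduling and binding problem concrete, here is a minimal Python sketch for a non-pipelined execution model. It assumes unit-latency operations, a fixed number of identical operators and an acyclic dependency graph, and it uses greedy list scheduling as a stand-in: the models described above are solved with constraint programming and also cover routing, which this sketch ignores. The graph `DEPS` and the function names are hypothetical.

```python
# Hypothetical dependency graph: operation -> operations it depends on.
DEPS = {
    "m1": [],
    "m2": [],
    "s1": ["m1", "m2"],
    "s2": ["s1"],
}


def list_schedule(deps, num_units):
    """Greedy schedule and binding: returns operation -> (cycle, unit).

    Assumes unit latency, identical units and an acyclic graph.
    """
    schedule = {}
    cycle = 0
    while len(schedule) < len(deps):
        # An operation is ready once all its predecessors ran in an earlier cycle.
        ready = [
            op for op in sorted(deps)
            if op not in schedule
            and all(p in schedule and schedule[p][0] < cycle for p in deps[op])
        ]
        # Bind at most num_units ready operations to distinct units.
        for unit, op in enumerate(ready[:num_units]):
            schedule[op] = (cycle, unit)
        cycle += 1
    return schedule


def makespan(schedule):
    """Execution time in cycles: index of the last busy cycle plus one."""
    return 1 + max(cycle for cycle, _ in schedule.values())


one_unit = list_schedule(DEPS, num_units=1)
two_units = list_schedule(DEPS, num_units=2)
```

On this small graph, a second operator lets the two independent operations `m1` and `m2` run in the same cycle, which shortens the schedule by one cycle. A greedy heuristic of this kind gives no optimality guarantee in general, which is one motivation for an exact constraint programming formulation.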