Research

Dynamic parallelization and optimization of binary code

(Current PhD work of Nabil Hallou. Advisors: Erven Rohou & Philippe Clauss)
The advent of multicore processors, and their growing heterogeneity and complexity, puts high pressure on development and execution support tools. It becomes increasingly difficult to guarantee satisfactory execution performance of an application on these varied and complex processor architectures, for several important reasons:

  • the application code has to exhibit sufficient and suitable parallelism in order to benefit from running simultaneously on several cores
  • it has to take advantage of specific instruction set features such as vectorization, multiply-add, …
  • it has to take advantage of cores dedicated to specific kinds of computations in order to accelerate suitable sub-parts of the code
  • it has to adapt to the currently available hardware resources, which may be concurrently shared with other applications
  • it has to be portable in order to guarantee satisfactory performance on various computer architectures
  • for all these reasons, it has to adapt dynamically to the underlying execution platform while it is running

These issues cannot be fully handled during the development process. On the hardware side, all features of future execution platforms cannot be known in advance: the lifespan of software is generally longer than that of hardware. On the software side, some relevant properties cannot be known before the program has been executed. For instance, precise knowledge of the performed memory accesses is essential to apply valid parallelizing transformations to a code. However, these accesses are often expressed in the source code through pointers or indirections that cannot be disambiguated at compile time. Another important issue is that an optimizing transformation can be more or less efficient depending on the volume and the nature of the data processed by the application.
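
As a concrete illustration, consider the following C loop (the function and variable names are purely illustrative). Whether it can be safely vectorized or parallelized depends entirely on whether dst and src may alias, which is generally unknown at compile time but observable at runtime:

    /* If dst and src point to disjoint regions, every iteration is
     * independent and the loop can be vectorized or parallelized.
     * If, e.g., dst == src + 1, each iteration reads a value written
     * by the previous one, so a static compiler must conservatively
     * assume a loop-carried dependence.  At runtime, the actual
     * addresses are known and independence can be proven. */
    void scale(float *dst, const float *src, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            dst[i] = 2.0f * src[i];
    }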

Hence it is essential for an application to be able to adapt continuously to its current execution environment, characterized by both hardware and software conditions. A solution is to assist the application with a runtime system dedicated to analyzing and advantageously transforming its code while it is running.

In this thesis, we propose to develop such a runtime system to manage any x86 binary code. This system will be able to handle any executable by first identifying optimization and parallelization opportunities, then applying the corresponding code transformations, and finally monitoring the resulting performance in order to adjust the previously applied transformations if needed.
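
To give a rough idea of the monitoring-and-adjustment part, the sketch below is a minimal, purely illustrative version-selection scheme (kernel_t, run_timed and dispatch are hypothetical names, not part of the actual system): two functionally equivalent versions of a code region are timed once, and subsequent calls are dispatched to whichever is currently faster.

    #include <time.h>

    typedef void (*kernel_t)(void);   /* hypothetical signature of a code region */

    static double run_timed(kernel_t k)
    {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        k();
        clock_gettime(CLOCK_MONOTONIC, &t1);
        return (t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec);
    }

    /* Execute the region 'calls' times (assumes calls >= 2): sample both
     * versions once, then stick to the faster one for the remaining calls. */
    static void dispatch(kernel_t original, kernel_t transformed, int calls)
    {
        double t_orig = run_timed(original);
        double t_new  = run_timed(transformed);
        kernel_t best = (t_new < t_orig) ? transformed : original;
        for (int i = 2; i < calls; i++)
            best();
    }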

Technically, three different approaches will be investigated:

  • binary to binary: a binary code is transformed in situ by patching some of its instructions. Only slight modifications can be applied this way.
  • binary to intermediate to binary: the parts of the code that have been identified as good candidates for optimization are disassembled to a higher-level intermediate form, on which advanced transformations are applied. The resulting intermediate code is then recompiled into x86 code, which is plugged into the main execution path. We will use the LLVM intermediate form and the LLVM Just-In-Time (JIT) compiler to re-generate x86 code (a minimal sketch of this JIT flow is given after this list).
  • static binary to intermediate to binary: before execution, the binary code file is analyzed as precisely as possible, and some code transformations are applied immediately when they are considered definitely beneficial. Some partial transformations will also be applied, generating incomplete versions that will have to be completed at runtime, once additional information becomes available. Code snippets in intermediate form will be embedded in the executable file so that they can be JIT-compiled at runtime.
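
To give a flavour of the second approach, the following sketch uses the LLVM C API and the MCJIT execution engine to build a trivial function in LLVM intermediate form and compile it to native x86 code at runtime. It is only a minimal example: in the actual system the intermediate code would come from disassembled binary regions rather than be built from scratch, and error handling is omitted.

    #include <llvm-c/Core.h>
    #include <llvm-c/ExecutionEngine.h>
    #include <llvm-c/Target.h>
    #include <stdio.h>

    int main(void)
    {
        LLVMLinkInMCJIT();
        LLVMInitializeNativeTarget();
        LLVMInitializeNativeAsmPrinter();

        /* Build a module containing: i32 add(i32 a, i32 b) { return a + b; } */
        LLVMModuleRef mod = LLVMModuleCreateWithName("jit_region");
        LLVMTypeRef params[] = { LLVMInt32Type(), LLVMInt32Type() };
        LLVMTypeRef fnty = LLVMFunctionType(LLVMInt32Type(), params, 2, 0);
        LLVMValueRef fn = LLVMAddFunction(mod, "add", fnty);
        LLVMBuilderRef b = LLVMCreateBuilder();
        LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(fn, "entry"));
        LLVMValueRef sum = LLVMBuildAdd(b, LLVMGetParam(fn, 0),
                                        LLVMGetParam(fn, 1), "sum");
        LLVMBuildRet(b, sum);

        /* JIT-compile the module and obtain a callable native entry point. */
        LLVMExecutionEngineRef ee;
        char *err = NULL;
        LLVMCreateExecutionEngineForModule(&ee, mod, &err);
        int (*add)(int, int) = (int (*)(int, int))LLVMGetFunctionAddress(ee, "add");
        printf("%d\n", add(2, 3));   /* prints 5 */

        LLVMDisposeBuilder(b);
        LLVMDisposeExecutionEngine(ee);
        return 0;
    }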

Notice also that some optimizing and parallelizing transformations will be speculative, since they will be based on predictions that can only be verified afterwards, at the end of the execution of the targeted code. Hence a verification and rollback mechanism will be embedded in the runtime system.
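
As a simplified illustration of this verification/rollback scheme (the function name, the loop shape and the speculated predicate are hypothetical, not the system's actual mechanism), the sketch below runs a scatter-update loop in parallel under the speculation that the index array contains no duplicates. The speculation is checked on the fly; if it fails, the parallel result is discarded, the saved state is restored, and the loop is re-executed sequentially.

    #include <stdlib.h>
    #include <string.h>

    /* Speculative parallel version of:  for (i) a[idx[i]] += b[i];
     * with a of size m.  The speculation is that idx contains no
     * duplicate index, in which case all iterations are independent. */
    void spec_scatter_add(double *a, const double *b, const int *idx, int n, int m)
    {
        double *backup = malloc(m * sizeof *a);   /* rollback copy of a[0..m-1] */
        char *touched = calloc(m, 1);             /* conflict detection flags  */
        int conflict = 0;

        memcpy(backup, a, m * sizeof *a);

        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            int k = idx[i];
            if (__sync_lock_test_and_set(&touched[k], 1))  /* element already written? */
                __sync_fetch_and_or(&conflict, 1);         /* speculation failed       */
            a[k] += b[i];
        }

        if (conflict) {            /* rollback: restore state, re-execute sequentially */
            memcpy(a, backup, m * sizeof *a);
            for (int i = 0; i < n; i++)
                a[idx[i]] += b[i];
        }
        free(touched);
        free(backup);
    }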

Software implementations will be achieved using the LLVM compilation framework and the OpenMP runtime library GOMP for code parallelization. Parallelization of loops will be particularly targeted using the polytope model, a framework dedicated to advanced loop analysis and transformation, which will have to be adapted to dynamic usage.
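
As an example of the kind of transformation the polytope model enables, consider the classic loop nest below, in which every iteration depends on its left and upper neighbours (the function wavefront and the parameters N, M, A are illustrative). After a skewing transformation, all iterations on the same anti-diagonal t = i + j are independent and can be run in parallel with an OpenMP pragma:

    /* Original nest: neither loop is directly parallel.
     *
     *   for (int i = 1; i < N; i++)
     *     for (int j = 1; j < M; j++)
     *       A[i][j] = A[i-1][j] + A[i][j-1];
     *
     * Skewed (wavefront) form: all points with the same t = i + j
     * are independent of each other. */
    void wavefront(int N, int M, double A[N][M])
    {
        for (int t = 2; t <= N + M - 2; t++) {
            int lo = (t - (M - 1) > 1) ? t - (M - 1) : 1;
            int hi = (t - 1 < N - 1) ? t - 1 : N - 1;
            #pragma omp parallel for
            for (int i = lo; i <= hi; i++) {
                int j = t - i;
                A[i][j] = A[i - 1][j] + A[i][j - 1];
            }
        }
    }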

Conditional code optimization & parallelization

(Master's work of Harrison Capera, future PhD project. Advisors: Fabrice Rastello, Alain Ketterlin & Philippe Clauss)