Research


Overall objectives

Team Ecuador studies Algorithmic Differentiation (AD) of computer programs, blending:

  • AD theory: We study software engineering techniques to analyze and transform programs mechanically. Algorithmic Differentiation (AD) transforms a program P that computes a function F into a program P’ that computes analytical derivatives of F. We put emphasis on the adjoint mode of AD, a sophisticated transformation that yields gradients for optimization at a remarkably low cost.
  • AD application to Scientific Computing: We adapt the strategies of Scientific Computing to take full advantage of AD. We validate our work on real-size applications.
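To make the two AD modes concrete, here is a minimal, self-contained sketch of tangent and adjoint differentiation of a tiny program. This is illustrative hand-written code, not Tapenade output, and all names are invented for the example:

```python
# Minimal sketch of the two AD modes on F(x1, x2) = sin(x1 * x2) + x2.
# Hand-written for illustration; not Tapenade-generated code.
import math

def f(x1, x2):
    v = x1 * x2
    return math.sin(v) + x2

def f_tangent(x1, x2, dx1, dx2):
    """Tangent mode: propagate one directional derivative forward."""
    v = x1 * x2
    dv = dx1 * x2 + x1 * dx2
    return math.sin(v) + x2, math.cos(v) * dv + dx2

def f_adjoint(x1, x2):
    """Adjoint mode: a forward sweep stores intermediates, a reverse
    sweep propagates the output adjoint back, yielding the full
    gradient in a single pass."""
    v = x1 * x2                      # forward sweep (store v)
    y = math.sin(v) + x2
    ybar = 1.0                       # reverse sweep
    vbar = math.cos(v) * ybar
    x1bar = vbar * x2
    x2bar = vbar * x1 + ybar
    return y, (x1bar, x2bar)
```

Tangent mode needs one run per input direction, whereas the adjoint run above delivers the whole gradient at once; this is what makes the adjoint mode so attractive for optimization with many inputs.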

We aim to produce AD code that can compete with hand-written sensitivity and adjoint programs used in the industry. We implement our algorithms into the tool Tapenade, one of the most popular AD tools at present.

Our research directions:

  • Efficient adjoint AD of frequent dialects, e.g. Fixed-Point loops.
  • Development of the adjoint AD model towards Dynamic Memory Management.
  • Evolution of the adjoint AD model to keep pace with modern programming language constructs.
  • Optimal shape design and optimal control for steady and unsteady simulations. Higher-order derivatives for uncertainty quantification.
  • Adjoint-driven mesh adaptation.

Last activity report: 2024

Results

New results

Profiling for improving Checkpointing schemes

Data-Flow inversion is at the heart of the computation of gradients through AD. This amounts to providing (many of) the intermediate values computed at run-time, in reverse order. The bottleneck is the memory space needed to store these values before retrieving them in reverse order. Alternatively, these values may be recomputed when needed, in which case the bottleneck becomes a quadratic increase of run time. The classical answer is a memory/recomputation trade-off known as checkpointing. By applying checkpointing to a well-chosen collection of nested portions of the code, one can compute its gradient at reasonable memory and run-time costs. However, there is no general approach to find a good (let alone optimal) set of nested checkpoints.
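The memory/recomputation trade-off can be illustrated by a toy sketch comparing a store-all reversal with a simple evenly-spaced checkpointing scheme. The `step`/`step_adjoint` functions are hypothetical stand-ins, and nested checkpointing as used in practice generalizes this flat layout:

```python
# Sketch: reversing a chain of n steps x_{i+1} = step(x_i).
# "Store-all" keeps every intermediate state (O(n) memory);
# checkpointing keeps a few snapshots and recomputes the rest.

def step(x):                 # hypothetical simulation step
    return x * x + 1.0

def step_adjoint(x, xbar):   # its adjoint, which needs the input state x
    return 2.0 * x * xbar

def reverse_store_all(x0, n, ybar):
    states = [x0]
    for _ in range(n):                 # forward: store every state
        states.append(step(states[-1]))
    for i in range(n - 1, -1, -1):     # reverse: retrieve in reverse order
        ybar = step_adjoint(states[i], ybar)
    return ybar

def reverse_checkpointed(x0, n, ybar, every=4):
    snaps = {}
    x = x0
    for i in range(n):                 # forward: store only checkpoints
        if i % every == 0:
            snaps[i] = x
        x = step(x)
    for i in range(n - 1, -1, -1):     # reverse: recompute state i from
        c = (i // every) * every       # the nearest checkpoint below it
        xi = snaps[c]
        for j in range(c, i):
            xi = step(xi)
        ybar = step_adjoint(xi, ybar)
    return ybar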

A checkpoint can be placed on any closed sequence of code instructions, such as a procedure or any part of a procedure with a single entry location and a single exit location. The number of possible sets of nested checkpoints is immense, and an exhaustive search for the optimum is therefore out of reach. Yet we observe that the performance of gradient evaluation strongly depends on the set of checkpoints: we have measured factors of 10 or more on large applications.

In order to iteratively improve the performance of a gradient computation by changing the choice of checkpoints, we developed a run-time profiling mechanism. Based on the current selection of checkpoints, profiling can estimate the effect of removing each checkpoint, in terms of run-time and memory gain or loss. This estimation is performed during a single run of the gradient code, with minimal overhead.
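The kind of estimate involved can be sketched schematically. The quantities and formulas below are illustrative assumptions chosen for the example, not the actual internals of the Tapenade profiler:

```python
# Schematic sketch (hypothetical quantities, NOT the Tapenade profiler):
# during one checkpointed run, suppose we measure for each checkpoint C
#   snap  : memory of the snapshot taken for C,
#   tape  : tape memory C's segment would use if C were removed,
#   rerun : run time spent re-executing C's segment.
# Removing C then trades snapshot memory and recomputation time
# against a larger tape.

def removal_estimates(checkpoints):
    out = {}
    for name, m in checkpoints.items():
        out[name] = {
            "time_gain": m["rerun"],                # no more re-execution
            "memory_delta": m["tape"] - m["snap"],  # tape grows, snapshot freed
        }
    return out

profile = {  # invented numbers, e.g. in MB and seconds
    "solver_loop": {"snap": 40.0, "tape": 900.0, "rerun": 12.5},
    "flux_assembly": {"snap": 8.0, "tape": 30.0, "rerun": 4.1},
}
for name, est in removal_estimates(profile).items():
    print(name, est)
```

Ranking checkpoints by such estimated deltas suggests, after each run, which checkpoint to remove (or reinstate) next in the iterative improvement loop.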

We implemented this profiling tool inside our AD tool Tapenade and experimented on two large applications taken from the MIT-GCM test suites. We observed run-time gains of up to a factor of 4. This work 13 was presented at the 8th International Conference on AD, Chicago, USA, September 2024.

Frontiers of the AD model

AD suffers from numerous limitations. Some come from the constraints of application languages, or from coding styles that are adverse to AD or at least make it inefficient. A crude distinction, debatable but still quite valid, can be made between AD by execution of a new, transformed source code on one hand, and AD by delayed evaluation of an execution trace on the other hand. This distinction is all the more important for the reverse AD mode, which computes gradients.

Creating a new source code that computes the gradient requires a complex tool akin to a compiler, implying a costly development that depends on the targeted application language. However, the resulting source code can be optimized at compile time, and its performance is only limited by the unavoidable cost of data-flow inversion (see 7.1). On the other hand, an execution trace can be built by a relatively simple instrumentation of the original code, almost independently of the application language, be it heavily templated C++ or one of the newer popular dynamic languages. However, the resulting execution trace (or “tape”) is in general huge, as it contains not only the data needed for data-flow inversion, but also the full unrolled sequence of run-time instructions.
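A toy trace-based (operator-overloading) reverse AD illustrates the tape approach. Real tools are far more elaborate and work hard to shrink the tape, but the principle is the same: instrumentation records every run-time operation, then a backward pass interprets the tape in reverse.

```python
# Toy trace-based reverse AD. The tape records the full unrolled
# sequence of run-time operations, plus the data (local partial
# derivatives) needed for data-flow inversion.

tape = []   # list of (input vars, local partials, output var)

class Var:
    def __init__(self, value):
        self.value = value
        self.grad = 0.0
    def __mul__(self, other):
        out = Var(self.value * other.value)
        tape.append(((self, other), (other.value, self.value), out))
        return out
    def __add__(self, other):
        out = Var(self.value + other.value)
        tape.append(((self, other), (1.0, 1.0), out))
        return out

def backward(out):
    out.grad = 1.0
    for inputs, partials, result in reversed(tape):
        for v, p in zip(inputs, partials):
            v.grad += p * result.grad

x = Var(3.0); y = Var(2.0)
z = x * y + x          # every executed operation lands on the tape
backward(z)
print(x.grad, y.grad)  # dz/dx = y + 1 = 3.0, dz/dy = x = 3.0
```

Note that the tape grows with the number of operations *executed*, loops included, which is why this approach is so memory-hungry compared to a transformed source that keeps the loops rolled.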

This year, we started to explore AD by source-transformation for a dynamic language such as Julia. Our goal is to explore, and hopefully lift, the restrictions that the constructs of Julia impose on source-transformation AD. We are still at a preliminary stage. Difficulties arise in particular from:

  • the absence of explicit type declarations.
  • the type structure and the special flavor of overloading known as “multiple dispatch”.
  • the interpreted nature of the language, combined with hidden JIT compilation.
  • the presence of macros that dynamically rewrite the source.

No differentiated code has been produced yet, but a few test codes could be parsed and analyzed.
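The dispatch difficulty in particular can be illustrated in Python, whose `functools.singledispatch` plays a role loosely analogous to Julia's multiple dispatch (a simplified, hypothetical example, not one of our Julia test codes):

```python
# Sketch of why dynamic dispatch hinders source transformation: which
# method runs is only known at run time, so a static differentiator
# must either differentiate every candidate method or rely on type
# inference to narrow the possibilities.
from functools import singledispatch

@singledispatch
def scale(x, a):
    raise TypeError(type(x))

@scale.register
def _(x: float, a):
    return a * x                     # one derivative rule applies here

@scale.register
def _(x: list, a):
    return [a * v for v in x]        # a different rule applies here

# A source transformer seeing `scale(u, 2.0)` cannot pick the right
# differentiated method without knowing the run-time type of `u`.
print(scale(3.0, 2.0), scale([1.0, 2.0], 2.0))
```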

Other limitations of AD are of a more abstract nature, coming for instance from the mathematical equations underlying the code, or from the fundamental difference between a computer's floating-point numbers and real numbers. Categorizing these limitations is difficult, and a (very long) catalogue may prove useless. Jan Hueckelheim considered the question of what one can reasonably expect from AD. The assumption is that problematic cases are often not about AD producing wrong derivatives, but rather about AD producing meaningful derivatives for a model different from the one the user has in mind. Accurate knowledge of the context in which AD derivatives are meaningful will help users build realistic expectations for AD tools. A joint article, “A taxonomy of automatic differentiation pitfalls” 11, was published this year.
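A classical instance of such a pitfall can be sketched as follows (a hypothetical example, not taken from the cited article): near a branch, AD differentiates the branch actually taken, which defines a model that can differ from the mathematical function the user has in mind.

```python
# A branch guarding sqrt(0) silently changes the differentiated model.
import math

def guarded_norm(x, y):
    # Mathematically f = sqrt(x^2 + y^2), with a guard at the origin.
    s = x * x + y * y
    if s == 0.0:
        return 0.0          # tangent AD of THIS branch yields derivative 0
    return math.sqrt(s)

def directional_fd(f, x, y, dx, dy, h=1e-8):
    # Finite-difference slope along direction (dx, dy), for comparison
    return (f(x + h * dx, y + h * dy) - f(x, y)) / h

print(directional_fd(guarded_norm, 0.0, 0.0, 1.0, 0.0))  # slope ~ 1.0
```

At the origin the mathematical norm has directional slope 1 along (1, 0), as the finite difference shows, yet tangent AD of the `return 0.0` branch returns exactly 0: a meaningful derivative, but of the piecewise model that the code defines rather than of the function the user intended.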

Application to large industrial codes

We support users of Algorithmic Differentiation of large codes.

On CFD, we continue to support the use of Tapenade to provide gradient computations in the ONERA platform “SoNice”. SoNice is a new CFD platform developed at ONERA by Bourasseau, Maugars, Content and colleagues, as a successor to Elsa. In the modular architecture of SoNice, Tapenade is used to build the differentiated counterpart of each module or operator. The implementation of each module is devised to be platform-independent, ultimately targeting several multithreaded architectures through e.g. OpenMP or CUDA. This imposes technical constraints on the language differentiated by Tapenade, at various abstraction levels that range from AD models for multithreaded code down to ways of handling fpp and cpp directives, and automatically setting entry points and labels in the differentiated code to allow for some necessary post-processing.

On Earth science, Shreyas Gaikwad continued building gradient computations on more test cases of the MIT-GCM, using Tapenade, with support from Krishna Narayanan from Argonne. These applications range from atmospheric to oceanic science and glaciology. This work is now published in 22. Also on Earth science, we support adjoint differentiation of the BEPS 29 code, a “biosphere” simulation model taking vegetation into account. This code is developed in part by the small company The Inversion Lab in Hamburg, Germany.

In Particle Physics, Krishna Narayanan used Tapenade to differentiate, in tangent AD mode, the code HFBTHO from Lawrence Livermore National Lab., a nuclear energy density functional (EDF) solver used to model the structure of atomic nuclei. This is a large Fortran code that unveiled a few technical issues but no major concern for Tapenade. An article is in preparation. We also successfully differentiated one test configuration of QCDNUM, a simulation code for Quantum Chromodynamics (QCD), in both tangent and reverse AD modes. This experiment, suggested by Lukas Heinrich and his team at TU Munich and the Max Planck Institute, is expected to lead to further collaboration.

Aeroacoustics / project Norma

The progress in highly accurate schemes for compressible flows on unstructured meshes, together with advances in massive parallelization of these schemes, makes it possible to solve problems previously out of reach, such as aeroacoustics around rotating machines. The four-year programme Norma, associating

  • IMAG of Montpellier University (B. Koobus, coordinator),
  • Ecuador of INRIA Sophia-Antipolis,

and supported by the ANR, was active until September 2024. Norma is a cooperation on the subject of extending Computational AeroAcoustics methods to simulate the noise emitted by rotating machines (helicopters, aerial vehicles, unmanned aerial vehicles, wind turbines…). A final synthesis of the work is available as 30.

Flows around rotating machines are turbulent and often characterized by laminar-turbulent transitions. The turbulence model must be able to take these phenomena into account while remaining reasonably inexpensive and of low dissipation, in order to propagate acoustic waves accurately. The new turbulence model developed is an intermittency-based hybrid RANS/LES model (Reynolds-Averaged Navier-Stokes, Large Eddy Simulation) that addresses the targeted objectives. Flow simulation is based on the combination of an anisotropic mesh adaptation method with a Chimera or a multiple reference frame (MRF) method. In order to improve the quality and efficiency of the prediction of unsteady flows, a new adaptation method based on a space-time metric is developed. Furthermore, a new adaptation technique, which introduces an adaptation criterion controlling in particular the LES error model, is also proposed for calculations that use hybrid turbulence models, which are those favored in this programme. Finally, a finite-volume numerical approximation scheme with quadratic or cubic reconstruction of the solution is developed in order to obtain high-order accuracy on unstructured meshes.

A detailed description with progress reports is available at this location.

Rotating machines

An important output of the Norma project is the search for an efficient combination of the multiple reference frame (MRF) rotating formulation with anisotropic mesh adaptation. A first version of this combination has been defined and validated by Bastien Sauvage. With this new method, Bastien Sauvage has finalised two unsteady mesh-adaptive computations, one of a single Caradonna-Tung helicopter rotor and one of the combination of the Caradonna-Tung rotor with the Robin fuselage (see Fig. 2). The computation of the single Caradonna-Tung rotor shows a well-formed vortex shedding at the tips of the two rotor blades, corresponding to high-frequency sound emission.


Figure 2: Example simulation of rotating machines: hybrid/DDES computation around the Caradonna-Tung helicopter rotor: adapted mesh, vorticity field, iso-surface of the Q-factor, and one detail of it.

Turbulence models

The prediction of academic and industrial turbulent flows progresses with each decade. For example, in 21 most results were 2D, RANS computations were very far from mesh convergence and far from experimental measurements, and they did not apply well to many unsteady flows. Since then, new VLES (Very Large Eddy Simulation) and hybrid models have appeared to address a large class of unsteady flows. At the beginning of the 2000s, a first generation of VLES/hybrid methods, including Detached Eddy Simulation, showed interesting results. It is interesting to measure how much these new physical models and numerical methods have improved matters one or two decades later. In association with IMAG, a wide series of test cases was recomputed in order to measure on these cases the progress in turbulence computation obtained thanks to the three novelties introduced by the Montpellier-Sophia cooperation:

  • a novel hybrid LES-RANS model with intermittency,
  • a fourth-order CENO4 advective approximation for RANS and hybrid calculations,
  • the new unsteady mesh adaptation for RANS and hybrid calculations.

An article in preparation (“Computing the flow past a cylinder: influence of models and numerics” by B. Sauvage, F. Miralles, S. Wornom, B. Koobus, A. Dervieux) discusses and contrasts these simulations.

Space-time mesh adaptation

The INRIA teams GammaO (Saclay) and Ecuador (Sophia-Antipolis) have continued to study the application of anisotropic mesh adaptation to unsteady flow simulation. The baseline approach is the Global Transient Fixed Point algorithm combined with a time step relying on a Courant-type stability condition for explicit time-advancing. This remains the best approach for many transient flows. For many unsteady turbulent flows at high Reynolds number, however, it is inefficient, since the very small cells in the boundary layer impose a very small explicit time step. Implicit time stepping is then compulsory: the time steps that can be used are notably larger than those permitted by explicit time advancing. Large time steps bring higher CPU efficiency, but the accuracy of the time approximation becomes an issue: too large a time step degrades the prediction, too small a time step increases the computational cost. We developed a novel space and time adaptation method for the implicit time advancing used in turbulence computations. The new method has been extended to a new Global Transient Fixed Point mesh adaptation algorithm addressing turbulent flows, with several meshes applied to the time interval. This work has been published in 12.
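The explicit time-step restriction can be illustrated with a back-of-the-envelope sketch; the cell sizes and wave speed below are illustrative numbers, not taken from the cited simulations:

```python
# Why adapted boundary-layer meshes force implicit time stepping:
# an explicit scheme's global step is bounded by its SMALLEST cell
# through a Courant-type condition dt <= CFL * h / |wave speed|.
# All numbers are illustrative.

def explicit_dt(cell_sizes, wave_speed, cfl=0.8):
    return cfl * min(cell_sizes) / wave_speed

coarse_mesh = [1e-2] * 1000                   # uniform cells (sizes in m)
adapted_mesh = [1e-2] * 990 + [1e-6] * 10     # a few tiny wall cells
a = 340.0                                     # acoustic wave speed, m/s

print(explicit_dt(coarse_mesh, a))    # step set by the 1e-2 cells
print(explicit_dt(adapted_mesh, a))   # ~10^4 times smaller step
```

A handful of boundary-layer cells thus shrinks the admissible explicit step by four orders of magnitude, which is the motivation for implicit time advancing and, in turn, for adapting the time step together with the mesh.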

Mesh adaptation for LES

With statistical turbulence models, only a part of turbulent industrial flows can be predicted at an affordable cost. The rest may involve large detached flow regions, which cannot be accurately described by statistical modeling, or noise-producing vortices that the engineer wants to predict accurately. Large Eddy Simulation (LES) is a model which predicts a part of the vortices of industrial interest. LES relies on filtering out the smallest vortices and on modeling their effect on the larger ones. But this approach is one or two orders of magnitude more computationally intensive than statistical modeling and therefore cannot be routinely used by engineers. Germano proposed an analysis using two different filters in order to measure the efficiency of a family of LES models. Recently, Toosi and Larsson demonstrated that the Germano analysis in fact deals with the source term of the LES modeling error. We propose a novel adaptation method based on the Germano dynamic-LES analysis, taking the LES model error into account. This method was validated on a set of test cases and exhibits better efficiency. An article is being prepared (“A metric-based mesh adaptation for hybrid RANS/LES flow calculations” by B. Sauvage, B. Koobus, F. Alauzet, A. Dervieux).
