Team Ecuador studies Algorithmic Differentiation (AD) of computer programs, blending:
AD theory: We study software engineering techniques to analyze and transform programs mechanically. Algorithmic Differentiation (AD) transforms a program P that computes a function F into a program P’ that computes analytical derivatives of F. We put emphasis on the adjoint mode of AD, a sophisticated transformation that yields gradients for optimization at a remarkably low cost.
AD application to Scientific Computing: We adapt the strategies of Scientific Computing to take full advantage of AD. We validate our work on real-size applications.
We aim to produce AD code that can compete with hand-written sensitivity and adjoint programs used in the industry. We implement our algorithms into the tool Tapenade, one of the most popular AD tools at present.
Our research directions:
- Efficient adjoint AD of frequent dialects, e.g. fixed-point loops.
- Development of the adjoint AD model towards Dynamic Memory Management.
- Evolution of the adjoint AD model to keep pace with modern programming language constructs.
- Optimal shape design and optimal control for steady and unsteady simulations. Higher-order derivatives for uncertainty quantification.
- Adjoint-driven mesh adaptation.
Last activity report: 2023
- 2023 : PDF – HTML
- 2022 : PDF – HTML
- 2021 : PDF – HTML
- 2020 : PDF – HTML
- 2019 : PDF – HTML
- 2018 : PDF – HTML
- 2017 : PDF – HTML
- 2016 : PDF – HTML
- 2015 : PDF – HTML
- 2014 : PDF – HTML
Asynchronous file offload of the AD stack
Data-Flow inversion is at the heart of the computation of gradients through AD. This amounts to providing (many of) the intermediate values computed at run-time, in reverse order. Most approaches use a stack, whose size may grow dramatically. Earth science applications such as climatology run on meshes of several million cells on thousands of time steps, yielding challenging stack sizes. Even using classical storage/recomputation tradeoffs, stack size may exceed the available main memory.
Most tools deal with this in an ad-hoc manner, identifying locations that consume a large part of the stack, typically the “snapshots” taken before selected time steps. These snapshots are then stored in a specific way, i.e., directly to file. The way Tapenade’s stack is organized allows for a more general and transparent approach. All storage needed by data-flow inversion goes to a unique stack. Without any intervention at the application level, the stack internally detects that its size has become excessive, which triggers offloading of its deepest part to distant memory. Conversely, the stack is restored from distant memory when needed. Offloading and restoring are triggered by the status of the stack, and are implemented asynchronously, allowing offloading/restoring on one hand, and derivative computation with elementary stack push/pop operations on the other hand, to overlap in time.
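As a rough illustration of the mechanism, here is a synchronous, single-threaded sketch with illustrative names and sizes (the actual Tapenade stack performs the file traffic asynchronously in a worker thread, so that it overlaps with the derivative computation):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a push/pop stack that spills its deepest bytes to a
 * backing file once it outgrows a threshold, and pulls them back
 * on demand.  Synchronous here for brevity; names, sizes and the
 * API are illustrative, not Tapenade's. */

#define CAPACITY 64   /* in-memory stack size, in bytes   */
#define BLOCK    32   /* bytes moved per offload/restore  */

static unsigned char buf[CAPACITY];
static size_t top = 0;       /* next free byte in buf          */
static long   offloaded = 0; /* bytes currently in the file    */
static FILE  *swapf = NULL;  /* backing storage for deep bytes */

void stack_init(void) { swapf = tmpfile(); }

static void offload_block(void) {
    /* write the deepest BLOCK bytes to file, slide the rest down */
    if (fwrite(buf, 1, BLOCK, swapf) != BLOCK) abort();
    offloaded += BLOCK;
    memmove(buf, buf + BLOCK, top - BLOCK);
    top -= BLOCK;
}

static void restore_block(void) {
    /* bring back the most recently offloaded block */
    memmove(buf + BLOCK, buf, top);
    if (fseek(swapf, offloaded - BLOCK, SEEK_SET) != 0) abort();
    if (fread(buf, 1, BLOCK, swapf) != BLOCK) abort();
    offloaded -= BLOCK;
    fseek(swapf, offloaded, SEEK_SET);  /* future writes land here */
    top += BLOCK;
}

void push(const void *p, size_t n) {    /* assumes n <= BLOCK */
    if (top + n > CAPACITY) offload_block();
    memcpy(buf + top, p, n);
    top += n;
}

void pop(void *p, size_t n) {
    if (top < n && offloaded > 0) restore_block();
    top -= n;
    memcpy(p, buf + top, n);
}
```

The point of the design is that `push`/`pop` keep their cheap constant-time interface: the application-level differentiated code never sees whether a value sits in main memory or on file.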
We implemented an experimental version of this mechanism and started to use it on the adjoint of Dan Goldberg’s “halfpipe-streamice” test case in the MIT GCM (General Circulation Model). The much larger stack size thus permitted lets us experiment with different checkpointing patterns, resulting in a significant speedup. More experiments are needed to validate this promising improvement.
In parallel, Jan Hueckelheim is examining this asynchronous stack, written in C using POSIX pthreads and mutexes, to experiment with a code analysis tool that detects potential conflicts and data races in multithreaded C code.
Frontiers of the AD model
AD suffers from numerous limitations. Some come from the constraints of application languages, or from coding styles that are adverse to AD or at least make it inefficient. In this category of programming-related limitations, we have studied the question of data-flow reversal in languages with Garbage Collection (GC). Data-flow reversal means in particular restoring structures that have been deallocated. This is hard in the presence of addresses, pointers, and even C pointer arithmetic. In a language with GC, how can we detect and undo a deallocation when it happens in a hidden manner and asynchronously? We proposed an extension of the AD model that deals with GC, validated it on a simple Navier-Stokes solver application, and measured its performance. The article [13] describing this model and experiment was published this year in the ACM TOMS journal.
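For explicit deallocation (the non-GC case), the classical treatment can be sketched as follows; `adFree` and `adFree_back` are hypothetical names for illustration, not Tapenade's actual primitives. The forward sweep saves a block's contents before freeing it; the reverse sweep re-allocates and refills it so the adjoint computation can read the values the primal code once held.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: data-flow reversal across an explicit deallocation.
 * Real AD tools must additionally remap stored pointers, since
 * the re-allocated block generally lives at a new address. */

typedef struct { double *data; size_t n; } saved_block;

/* replaces free(p) in the differentiated forward sweep */
saved_block adFree(double *p, size_t n) {
    saved_block s = { malloc(n * sizeof(double)), n };
    memcpy(s.data, p, n * sizeof(double));  /* save contents */
    free(p);                                /* then deallocate */
    return s;
}

/* reverse-sweep counterpart: undo the deallocation */
double *adFree_back(saved_block s) {
    double *p = malloc(s.n * sizeof(double));
    memcpy(p, s.data, s.n * sizeof(double));
    free(s.data);
    return p;
}
```

With GC, the difficulty discussed above is precisely that no such `free` call exists in the source for the tool to instrument.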
Other limitations are of a more abstract nature, and may come for instance from the mathematical equations underlying the code, or from the fundamental difference between a computer’s floating-point numbers and real numbers. Categorizing these limitations is difficult, and a (very long) catalogue may prove useless. Jan Hueckelheim considered the question of what one can reasonably expect from AD. The assumption is that problematic cases are often not about AD producing wrong derivatives, but rather about AD producing meaningful derivatives, only for a model different from the one the user has in mind. Accurate knowledge of the context in which AD derivatives are meaningful will help users build realistic expectations for AD tools. A joint article “Understanding Automatic Differentiation Pitfalls” is in preparation. A poster was presented at the ICML 2023 workshop “Differentiable Almost Everything” in June.
Application to large industrial codes
We support users of Algorithmic Differentiation of large codes, mostly in CFD and Earth science.
In CFD, we continue to support the use of Tapenade to provide gradient computations in the ONERA platform “SoNice”. SoNice is a new CFD platform developed at ONERA by Bourasseau, Maugars, Content and colleagues, as a successor to elsA. In the modular architecture of SoNice, Tapenade is used to build the differentiated counterpart of each module or operator. The implementation of each module is devised to be platform-independent, eventually targeting several multithreaded architectures through e.g. OpenMP or CUDA. This imposes technical constraints on the language differentiated by Tapenade, at various abstraction levels that range from AD models for multithreaded code, down to ways of handling fpp and cpp directives and of automatically setting entry points and labels in the differentiated code to allow for some necessary post-processing.
We also continued to support the Tapenade differentiation of the plasma Magneto-Hydro-Dynamics code SOLPS-ITER with Stefano Carli of KU Leuven. A joint article [11] describing this work was accepted this year by JCP.
In Earth science, this year’s main application is addressed by Shreyas Gaikwad, a PhD student at the University of Texas at Austin supervised by Patrick Heimbach, with support from Krishna Narayanan from Argonne. The goal is to produce the adjoint/gradient of all applications based on the MIT GCM code. These applications range from atmospheric and oceanic science to glaciology. Last year’s successful tangent and adjoint differentiation of MIT GCM was extended to several new validation test cases, and this year’s work improved the performance of the resulting differentiated code by carefully using the code optimization options of Tapenade. Still on MIT GCM, Dan Goldberg, University of Edinburgh, applies Tapenade to the glaciology test case “halfpipe-streamice”. Successful adjoint differentiation required improvements and extensions to Tapenade’s handling of fixed-point loops. We are now working on performance improvement (in memory usage and CPU time) by exploring the available options for “checkpointing”, a classical storage/recomputation tradeoff of AD. This will also serve as an application for our new asynchronous stack save/restore mechanism (see section 7.1).
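The storage/recomputation tradeoff behind “checkpointing” can be sketched on a toy time loop (illustrative code, not Tapenade output; practical schemes such as binomial checkpointing organize the snapshots hierarchically): instead of storing all N intermediate states for the reverse sweep, one stores a snapshot every K steps and recomputes the missing states from the nearest snapshot.

```c
#define N 100   /* number of time steps  */
#define K 10    /* checkpoint interval   */

/* toy primal step and its hand-written adjoint */
static double step(double x)                { return x + 0.01 * x * x; }
static double step_bar(double x, double xb) { return (1.0 + 0.02 * x) * xb; }

/* Adjoint of the time loop x_{i+1} = step(x_i), i = 0..N-1,
 * storing only N/K+1 snapshots instead of N states.  For each
 * reverse step i, x_i is recomputed from the nearest snapshot:
 * memory drops from O(N) to O(N/K), at the price of up to K-1
 * extra primal steps per adjoint step. */
double adjoint_with_checkpoints(double x0, double yb) {
    double snap[N / K + 1];
    double x = x0;
    /* forward sweep: run the primal, keep only snapshots */
    for (int i = 0; i < N; i++) {
        if (i % K == 0) snap[i / K] = x;
        x = step(x);
    }
    /* reverse sweep: recompute x_i, then apply the adjoint step */
    double xb = yb;
    for (int i = N - 1; i >= 0; i--) {
        double xi = snap[i / K];
        for (int j = (i / K) * K; j < i; j++) xi = step(xi);
        xb = step_bar(xi, xb);
    }
    return xb;   /* d(x_N)/d(x_0), seeded with yb */
}
```

Choosing where to place snapshots, and whether to nest them recursively, is exactly the kind of option space explored on the “halfpipe-streamice” adjoint.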
Still in Earth science, we support the adjoint differentiation of the BEPS [32] code, a “biosphere” simulation model taking vegetation into account. This code is developed in part by The Inversion Lab, a small company in Hamburg, Germany. We collaborate with Michael Vossbeck, engineer at The Inversion Lab, to provide the adjoint of BEPS with Tapenade. Differentiation with Tapenade is becoming a strategic part of the business of The Inversion Lab. Michael Vossbeck is progressively getting acquainted with the source code of Tapenade, with the objective of contributing to its development and of being able to maintain it. This year, Michael Vossbeck improved the flexibility of Tapenade’s built-in auto-validation tool.
The progress in highly accurate schemes for compressible flows on unstructured meshes (together with advances in massive parallelization of these schemes) makes it possible to solve problems previously out of reach. The four-year programme Norma, associating:
- IMAG of Montpellier University (B. Koobus, coordinator),
- Computational AeroAcoustics Laboratory (CAALAB) of Keldysh Institute of Moscow (T. Kozubskaya, head), and
- Ecuador of INRIA Sophia-Antipolis.
is supported by the French ANR and by the Russian Science Foundation, and is active until September 2024. Norma is a cooperation on the subject of extending Computational AeroAcoustics methods to simulate the noise emitted by rotating machines (helicopters, aerial vehicles, unmanned aerial vehicles, wind turbines…). A detailed description with progress reports is available at this location.
Modelling turbulence is an essential aspect of CFD. The purpose of our work in hybrid RANS/LES (Reynolds-Averaged Navier-Stokes / Large Eddy Simulation) is to develop new approaches for industrial applications of LES-based analyses. In the applications targeted (aeronautics, hydraulics), the Reynolds number can be as high as several tens of millions, far too high for pure LES models. However, certain regions of the flow can be predicted better with LES than with usual statistical RANS models. These are mainly vortical separated regions, as assumed in one of the most popular hybrid models, the Detached Eddy Simulation (DES) model. Here, “hybrid” means that a blending is applied between LES and RANS. The development of hybrid models, in particular DES, has raised in the literature the question of the domain of validity of these models. According to theory, these models should not be applied to flows involving laminar boundary layers (BL). But industrial flows are complex and often combine, in a single flow, regions of laminar BL, regions of fully developed turbulent BL, and regions of non-equilibrium vortical BL. It is then mandatory for industrial use that the new hybrid models give a reasonable prediction for all these types of flow. We concentrated on evaluating the behavior of hybrid models for laminar BL and for vortical wakes. While less predictive than pure LES on laminar BL, some hybrid models still give reasonable predictions for rather low Reynolds numbers.
An important result of this year is the proposal by Florian Miralles of a novel intermittency-based statistical two-equation model, and its extension to a novel intermittency-based hybrid RANS/LES model. The new model accurately predicts a larger class of turbulent flows, as demonstrated by its ability to predict (on relatively coarse grids) the drag crisis and the increase in Strouhal number when applied to the simulation of the flow around a circular cylinder from sub-critical to super-critical flow regimes (cf. [35]). This work is part of the PhD thesis of Florian Miralles, supported by the Norma programme and defended in November 2023 at the University of Montpellier.
The Montpellier-Sophia team has been studying this new model in combination with the mesh adaptation method developed by Bastien Sauvage, for higher Reynolds number and wing geometries. A Norma report and a paper are in preparation.
The physical problem addressed by Norma involves a computational domain made of at least two components in different rotating motions. The numerical problem of combining them gave birth to many specialized schemes, such as the so-called sliding method, the Chimera method, and the immersed boundary method (IBM). The Ecuador team is studying a novel Chimera method, in cooperation with Lemma engineering (Sophia-Antipolis).
In 2023, B. Sauvage and Didier Chargy made progress in the computation of the Caradonna-Tung rotor, with a new adapted calculation on 3 million vertices. Results were presented in [15].
High order approximations
Thanks to anisotropic mesh adaptation, high-fidelity industrial calculations can be made to converge at second order (in the steady case), even for rather singular cases, and the convergence error can be evaluated. We are investigating how to extend this to higher order. Our research relies on a fourth-order accurate three-dimensional vertex-centered Central Essentially Non-Oscillatory (CENO) scheme. In 2023, the consolidation and validation of the fourth-order accurate CENO scheme in a mesh-adaptive version of NiceFlow was advanced. A theory is being developed for an h-p approach (a simultaneous adaptation of the mesh and of the truncation order) combining the CENO scheme and our anisotropic mesh adaptation approach.
Control of approximation errors
Reducing approximation errors as much as possible by modifying the mesh is a particular kind of optimal control problem. We formulate it exactly this way when we look for the optimal metric of the mesh, which minimizes a user-specified functional (goal-oriented mesh adaptation). In that case, the usual methods of optimal control apply, using adjoint states that can be produced by Algorithmic Differentiation.
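Schematically, the adjoint state weights the local residual of the discrete scheme to estimate the error committed on the goal functional (a standard sketch of the estimate, recalled for reference, not the exact formula of our works):

```latex
% j   : scalar output of interest,   u   : exact state,
% u_h : discrete state,              u^* : adjoint state,
% R_h : discrete residual,           \Pi_h : interpolation onto the mesh.
\delta j \;=\; j(u) - j(u_h) \;\approx\; \bigl( u^{*},\, R_h(\Pi_h u) \bigr)
```

Minimizing a bound of this weighted estimate over all mesh metrics of prescribed complexity then yields the optimal goal-oriented metric.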
The second volume of the book [25] on mesh adaptation by Alauzet, Loseille, Koobus and Dervieux was published by Wiley this year.
Space-time mesh adaptation
The INRIA teams GammaO (Saclay) and Ecuador (Sophia-Antipolis) have continued to study the application of anisotropic mesh adaptation to unsteady flow simulation. The baseline approach is the Global Fixed Point algorithm combined with a time step derived from a Courant-type stability condition for explicit time advancing. This remains the best approach for many transient flows. For many unsteady turbulent flows at high Reynolds number, however, it is inefficient, since the very small cells in the boundary layer impose a very small explicit time step. Implicit time stepping is then compulsory: the time steps it permits are notably larger than those allowed by explicit time advancing. Large time steps bring higher CPU efficiency, but time approximation accuracy becomes an issue: too large a time step degrades the prediction, too small a time step increases the computational cost. The new development focuses on obtaining directly the space-time discretization that minimizes the global error, under the constraint of a prescribed space-time discretization complexity. Application to vortex shedding flows (see figure 2) illustrates the interest of this novel method (cf. [18], [14], and B. Sauvage, F. Alauzet, A. Dervieux, “A space and time fixed point mesh adaptation method”, in preparation).
Mesh adaptation for LES
With statistical turbulence models, only a part of turbulent industrial flows can be predicted at an affordable cost. The rest may involve large detached regions which cannot be accurately described by statistical modelling, or vortices producing noise that the engineer wants to predict accurately. Large Eddy Simulation (LES) is a model which predicts a part of the vortices of industrial interest. LES relies on filtering out the smallest vortices and on modelling their effect on the larger ones. But this approach is one or two orders of magnitude more computationally intensive than statistical modelling and therefore cannot be routinely used by engineers. Germano proposed an analysis using two different filters in order to measure the efficiency of a family of LES models. Recently, Toosi and Larsson demonstrated that the Germano analysis in fact deals with the source term of the LES modelling error. We introduce this error term in our mesh adaptation process for RANS flows in order to obtain an approach for adapting the mesh to hybrid RANS/LES flow calculations. Applications to vortex shedding flows are being performed (cf. B. Sauvage, B. Koobus, F. Alauzet, A. Dervieux, “A metric-based mesh adaptation for hybrid RANS/LES flow calculations”, in preparation).
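For reference, the two-filter analysis mentioned above rests on the classical Germano identity: with a grid filter (overbar) and a coarser test filter (hat), the resolved stress, computable from the resolved field alone, links the subgrid stresses at the two filter levels (standard form, recalled here; our use of it follows the Toosi-Larsson reinterpretation):

```latex
% \tau_{ij} and T_{ij}: subgrid-scale stresses at the grid-filter
% and test-filter levels respectively; L_{ij} is computable.
L_{ij} \;=\; \widehat{\bar{u}_i \bar{u}_j} \;-\; \hat{\bar{u}}_i \hat{\bar{u}}_j
       \;=\; T_{ij} \;-\; \hat{\tau}_{ij}
```

Because every term of L_{ij} is available from the resolved field, it provides a computable handle on the modelling-error source term that drives our mesh adaptation.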
Control of errors in optimization
It appears today that, with the new algorithms for mesh adaptation, efficient and accurate automatic tools are available for maximizing the quality of high-fidelity prediction. In the near future, the advantages of these tools will probably make them mandatory in industrial practice. For design optimization, automatic tools are already available. However, these tools today rely on a generation of non-adaptive or poorly adaptive CFD. It is important to propose (see [16] and figure 3) a composite algorithm that combines the novel mesh adaptation technology with the most recent design optimization loops.