PhD thesis defense of Mawussi Zounon

Title: “Numerical resilience in linear algebra” (original French title: “La résilience numérique en algèbre linéaire”)

Date & venue: April 1 at 2:00 pm, Ada Lovelace room (3rd floor, INRIA Bordeaux Sud-Ouest building)

Jury members:
– Mike Heroux (reviewer)
– Peter Arbenz (reviewer)
– Karl Meerbergen (examiner)
– Frédéric Vivien (examiner)
– Emmanuel Agullo (thesis co-advisor)
– Luc Giraud (thesis advisor)

Abstract:
As the computational power of high performance computing (HPC) systems continues to increase through the use of huge numbers of cores or specialized processing units, HPC applications are increasingly prone to faults. This study covers a new class of application-level numerical fault tolerance algorithms that require no extra resources, i.e., computational units or computing time, when no fault occurs.
Assuming that a separate mechanism ensures fault detection, we propose numerical algorithms that extract relevant information from the data still available after a fault. After data extraction, a well-chosen part of the missing data is regenerated through interpolation strategies to form meaningful inputs for numerically restarting the algorithm.
We have designed these methods, called interpolation-restart techniques, for numerical linear algebra problems such as the solution of linear systems or eigenproblems, which are the innermost numerical kernels in many scientific and engineering applications and are often among their most time-consuming parts. In the framework of Krylov subspace linear solvers, the lost entries of the iterate are interpolated from the entries available on the surviving nodes to define a new initial guess before the Krylov method is restarted. In particular, we consider two interpolation policies that preserve key numerical properties of well-known linear solvers, namely the monotonic decrease of the A-norm of the error for the conjugate gradient method and the decrease of the residual norm for GMRES. We assess the impact of the interpolation-restart techniques, the fault rate, and the amount of lost data on the robustness of the resulting linear solvers.
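As an illustration only, the interpolation-restart idea for the conjugate gradient method can be sketched in Python with NumPy. The interpolation used here, solving the local linear system formed by the rows and columns of the lost entries, is an assumption: it was chosen because it cannot increase the A-norm of the error when A is symmetric positive definite. The abstract does not spell out the two policies, so this should not be read as the exact method of the thesis. The function names, the 1D Laplacian test matrix, and the set of lost indices are all hypothetical.

```python
import numpy as np

def conjugate_gradient(A, b, x0, max_iters, tol=1e-12):
    """Plain CG for a symmetric positive definite A, started from x0."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iters):
        if np.sqrt(rs) <= tol * b_norm:  # residual small enough: stop
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def interpolate_lost_entries(A, b, x, lost):
    """Assumed interpolation policy: rebuild x[lost] by solving
    A[lost, lost] x[lost] = b[lost] - A[lost, alive] x[alive].
    Only the surviving entries x[alive] are read."""
    alive = np.setdiff1d(np.arange(len(b)), lost)
    rhs = b[lost] - A[np.ix_(lost, alive)] @ x[alive]
    x_new = x.copy()
    x_new[lost] = np.linalg.solve(A[np.ix_(lost, lost)], rhs)
    return x_new

def a_norm_error(A, x, x_exact):
    """A-norm of the error, sqrt(e^T A e)."""
    e = x - x_exact
    return float(np.sqrt(e @ A @ e))

# Hypothetical test problem: 1D Laplacian with a known solution.
n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
x_exact = np.linspace(1.0, 2.0, n)
b = A @ x_exact

# Run a few CG steps, then pretend the node owning entries 20..29 fails.
x_before = conjugate_gradient(A, b, np.zeros(n), max_iters=10)
lost = np.arange(20, 30)

# Interpolate the lost entries and restart CG from the new initial guess.
x_restart = interpolate_lost_entries(A, b, x_before, lost)
x_final = conjugate_gradient(A, b, x_restart, max_iters=200)

err_before = a_norm_error(A, x_before, x_exact)
err_restart = a_norm_error(A, x_restart, x_exact)
err_final = a_norm_error(A, x_final, x_exact)
```

In this sketch `err_restart` is no larger than `err_before`, because the local solve minimizes the A-norm of the error over the lost entries while the surviving entries are held fixed. This is one way to keep the monotonic decrease mentioned above across a fault.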
For eigensolvers, we have also proposed variants of the interpolation-restart techniques for subspace iteration, Arnoldi, implicitly restarted Arnoldi, and Jacobi-Davidson methods, exploiting the numerical features of each algorithm to tune the interpolation-restart mechanism to it. We have also studied the numerical behavior of our resilient eigensolvers and demonstrated through intensive experiments that, in the presence of faults, they exhibit a numerical robustness close to that of fault-free calculations.
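The same idea can be sketched for an eigenvector, again as an assumption rather than the thesis's actual scheme. If an approximate eigenpair (lam, u) satisfies A u ≈ lam u, the rows of that equation belonging to the failed node give a small linear system for the lost entries of u. The sketch below assumes that the eigenvalue estimate `lam` survives the fault, for instance because it is replicated on every node. All names are hypothetical, and an exact eigenpair is used so that the recovery can be checked.

```python
import numpy as np

def interpolate_eigvec_entries(A, lam, u, lost):
    """Assumed policy: rebuild u[lost] from the eigen-equation rows of the
    failed node, (A[lost, lost] - lam I) u[lost] = -A[lost, alive] u[alive]."""
    alive = np.setdiff1d(np.arange(A.shape[0]), lost)
    local = A[np.ix_(lost, lost)] - lam * np.eye(len(lost))
    rhs = -A[np.ix_(lost, alive)] @ u[alive]
    u_new = u.copy()
    u_new[lost] = np.linalg.solve(local, rhs)
    return u_new / np.linalg.norm(u_new)  # renormalize before restarting

# Hypothetical test problem: dominant eigenpair of a 1D Laplacian.
n = 30
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
w, V = np.linalg.eigh(A)      # eigenvalues in ascending order
lam, u = w[-1], V[:, -1]      # largest eigenvalue and its unit eigenvector

# Simulate the loss of entries 10..14, then interpolate them back.
lost = np.arange(10, 15)
damaged = u.copy()
damaged[lost] = 0.0
recovered = interpolate_eigvec_entries(A, lam, damaged, lost)
```

With an exact eigenpair the lost entries are recovered up to rounding. During an actual iteration, `lam` and `u` would only be approximations, and the interpolated vector would serve as the starting vector for the restart.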
