## Seminars

For any questions, please contact Antoine Commaret, David Loiseaux, Laure Ferraris or Wojciech Reise. The talks are streamed online and recorded; recordings of talks from previous years can be found on our YouTube channel.

15/06 | Laurent Oudre |

25/05 | Christophe Pichon |

04/05 | Clément Maria |

08/06 | Clément Levrard |

30/03 | Jae-Hun Jung |

16/03 | Olympio Hacquard |

09/03 | Umut Şimşekli |

23/02 | Ximena Fernandez |

11/05 | Michael Ghil, Denisse Sciamarella |

26/01 | Laure Ferraris |

05/01 | Claire Brécheteau |

17/11 | Wolfgang Polonik |

08/12 | Alice Le Brigant |

For the duration of the thematic trimester GESDA at IHP, the regular seminars were paused.

*15/06 (11h00 CEST)*

**Speaker:** Laurent Oudre

**Title:** Studying gait with inertial measurement units: from signal processing to topological data analysis

This talk will revisit the SmartCheck project, which aims to quantify and study gait in clinical consultations using inertial measurement units. We will give an overview of the different approaches used over the years to analyze these data and to compare two recordings, both for inter-individual comparison and for longitudinal follow-up. The first methods, rooted in signal processing, aimed to locate events of interest and changes of behavior using pattern recognition and change-point detection techniques. More recently, we have asked whether these steps are really necessary, and whether a more global approach based on topological data analysis could instead provide a suitable distance between recordings.

*25/05 (11h00 CEST)*

**Speaker:** Christophe Pichon

**Title:** DisPerSE: automatic identification of persistent structures in 2 and 3D complexes

DisPerSE stands for Discrete Persistent Structures Extractor. Its main purpose is to identify persistent topological features such as peaks, voids, walls and, in particular, filamentary structures within sampled distributions in 2D and 3D. It was developed with observational and numerical cosmology in mind (for the study of the so-called cosmic web in the Universe). I will present a few interesting applications and related theoretical breakthroughs from astrophysics.

*04/05 (11h00 CEST)*

**Speaker:** Clément Maria

**Title:** Aspects of Algorithmic Quantum Topology

In this talk, I will introduce notions from quantum topology through the algorithmic and computational complexity lenses. Quantum topology uses tools from mathematical physics to design topological invariants of knots and 3-manifolds. Computationally, the algorithmic complexity of these invariants is connected to the complexity class #P of counting problems and to the quantum complexity class BQP. More recently, efficient and often optimal (under #ETH) algorithms have been designed using methods from parameterized complexity, where parameters may be of a combinatorial or topological nature. I will try to give an overview of the state of the art and of the future challenges of the field.

~~06/04~~ 08/06 (15h45 CEST)

**Speaker:** Clément Levrard

**Title:** Optimal reach estimation via metric estimation

In geometric inference, where the goal is to approximate the support of a probability distribution P from a sample, the reach of the support is a key parameter: it provides a local scale below which the support can be considered “flat”, and it therefore appears both in support estimation rates and in practical support reconstruction algorithms.

In this talk I will present the different characterizations of the reach, as well as its links with other kinds of local scales used in computational geometry, some of which can be estimated and some of which cannot. I will conclude by presenting results obtained with Eddie Aamari and Clément Berenfeld, namely that the metric characterization of the reach (a relatively recent result of Jean-Daniel Boissonnat, André Lieutier and Mathijs Wintraecken) provides an “optimal” route to reach estimation (in the minimax sense).

~~23/03~~ 30/03 (11h00 CEST)

**Speaker:** Jae-Hun Jung

**Title:** Topological data analysis of time-series data

Time-series data analysis arises in various applications dealing with sequential data over a given interval, e.g. of time. In this talk, we discuss time-series data analysis based on topological data analysis (TDA). The commonly used TDA method for time series relies on embedding techniques such as sliding window embedding: the given data points are translated into a point cloud in the embedding space, and persistent homology is applied to the resulting point cloud. We first show some examples of time-series analysis with TDA. The first example comes from music data, for which the dynamical processes in time are summarized by a low-dimensional representation based on persistent homology. The second is the gravitational wave detection problem, where we discuss how to concatenate the real signal and topological features. We then introduce our recent work on exact and fast multi-parameter persistent homology (EMPH). The EMPH method is based on the Fourier transform of the data and on exact persistence barcodes. EMPH is highly advantageous for time-series analysis in that its computational complexity is as low as O(N log N) and it provides various topological inferences almost instantly. The presented works are in collaboration with Mai Lan Tran, Chris Bresten and Keunsu Kim.
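As a concrete illustration of the sliding window embedding mentioned in the abstract, here is a minimal pure-Python sketch (function name and parameters are ours, not the speaker's; a library such as GUDHI or Ripser would then compute the persistent homology of the resulting point cloud):

```python
import math

def sliding_window_embedding(signal, dim, tau):
    """Map a 1-D signal to a point cloud in R^dim.

    Point i is (signal[i], signal[i+tau], ..., signal[i+(dim-1)*tau]),
    the standard delay embedding used before computing persistent
    homology of a time series.
    """
    n = len(signal) - (dim - 1) * tau
    return [tuple(signal[i + j * tau] for j in range(dim)) for i in range(n)]

# A pure cosine traces out a closed loop in the embedding space, which is
# why its persistence diagram shows a prominent 1-dimensional class.
signal = [math.cos(2 * math.pi * t / 20) for t in range(100)]
cloud = sliding_window_embedding(signal, dim=2, tau=5)
```

With `tau` equal to a quarter period, each point is (cos θ, −sin θ), so the embedded cloud lies on the unit circle.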

*16/03 (11h00 CET)*

**Speaker:** Olympio Hacquard

**Title:** Statistical learning on measures: an application to persistence diagrams

We consider a binary supervised classification problem where, instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in $\{0, 1\}$. Given a set $\mathcal{F}$ of functions on $\mathcal{X}$, we build corresponding classifiers in the space of measures, and we provide upper and lower bounds on the Rademacher complexity of this new class of classifiers that can be expressed simply in terms of corresponding quantities for the class $\mathcal{F}$. If the measures $\mu_i$ are uniform over a finite set, this classification task boils down to a multi-instance learning problem; our approach, however, allows more flexibility and diversity in the input data. While such a framework has many possible applications, this work focuses on classifying data via topological descriptors called persistence diagrams. These objects are discrete measures on $\mathbb{R}^2$, where the coordinates of each point correspond to the range of scales at which a topological feature exists. We will present several classifiers on measures and show, heuristically and theoretically, how they can achieve good classification performance in various settings in the case of persistence diagrams.
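To make the idea of classifiers built on measures concrete, here is a minimal sketch, with hypothetical helper names, of how a discrete measure such as a persistence diagram can be mapped to a finite-dimensional feature vector by integrating a family of functions $\mathcal{F}$ against it:

```python
def integrate(measure, f):
    """Integrate f against a discrete measure given as [(point, weight), ...]."""
    return sum(w * f(x) for x, w in measure)

def featurize(measure, functions):
    """Map a measure to the vector (mu(f_1), ..., mu(f_k)); a classifier on
    measures can then be any classifier on these finite-dimensional features."""
    return [integrate(measure, f) for f in functions]

# A persistence diagram seen as a discrete measure on R^2 (unit weights):
diagram = [((0.1, 0.9), 1.0), ((0.2, 0.3), 1.0)]
# Hypothetical feature functions: total persistence and total mass.
features = featurize(diagram, [lambda p: p[1] - p[0], lambda p: 1.0])
# features is approximately [0.9, 2.0]
```

The choice of the family $\mathcal{F}$ is exactly what the Rademacher-complexity bounds in the abstract control.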

*09/03 (15h45 CET)*

**Speaker:** Umut Şimşekli

**Title:** Generalization in SGD: Heavy-Tails and Fractal Structure

In this talk, we will first derive a generalization bound for SGD under the assumption that it can be well-modeled by a heavy-tailed stochastic differential equation. To do so, we will use tools from fractal geometry and geometric measure theory and make a connection between heavy tails and the generalization error through fractal geometry. Then, we will extend this framework by using the so-called “persistent homology dimension”, a notion of fractal dimension defined through topological data analysis (TDA). Thanks to this connection, we will show that generalization in neural networks can be efficiently computed using TDA tools. This talk is based on the following papers: arXiv:2006.09313 and arXiv:2111.13171.

*23/02 (11h00 CET)*

**Speaker:** Ximena Fernandez

**Title:** Intrinsic persistent homology via density-based metric learning


In this talk, I will present a recent density-based method to address the problem of estimating topological features from data in high-dimensional Euclidean spaces under the manifold assumption. The key to this approach is to consider a sample metric known as the Fermat distance to robustly infer the homology of the space of data points. We prove that such a sample metric space GH-converges almost surely to the manifold itself, endowed with an intrinsic (Riemannian) metric that accounts for both the geometry of the manifold and the density that produces the sample. This fact, combined with the stability properties of persistent homology, implies the convergence of the associated persistence diagrams, which enjoy advantageous properties: they are robust to the presence of (geometric) outliers in the input data and less sensitive to the particular embedding of the underlying manifold in the ambient space. Finally, I will exhibit concrete applications of these ideas to time series analysis, with examples on real data.

This is joint work with E. Borghini, G. Mindlin and P. Groisman, Intrinsic persistent homology via density-based metric learning, arXiv preprint arXiv:2012.07621, 2020.
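The sample Fermat distance underlying this line of work admits a short graph-based sketch: for an exponent p ≥ 1, it is the cheapest path through the sample where a hop of Euclidean length ℓ costs ℓ^p, so that for p > 1 paths through dense regions (many short hops) are favored. This is a minimal illustration with our own function names, not the authors' implementation:

```python
import heapq
import math

def fermat_distance(points, src, dst, p=2.0):
    """Sample Fermat distance between points[src] and points[dst]:
    Dijkstra shortest path on the complete graph over the sample,
    with edge cost |x - y|^p."""
    n = len(points)
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, i = heapq.heappop(heap)
        if i == dst:
            return d
        if d > dist.get(i, math.inf):
            continue  # stale heap entry
        for j in range(n):
            if j != i:
                nd = d + math.dist(points[i], points[j]) ** p
                if nd < dist.get(j, math.inf):
                    dist[j] = nd
                    heapq.heappush(heap, (nd, j))
    return math.inf

# Three collinear points: with p=2, going through the middle point is
# cheaper (1^2 + 1^2 = 2) than taking the direct edge (2^2 = 4).
pts = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
d = fermat_distance(pts, 0, 2, p=2.0)
```

Rescaled appropriately, this estimator converges to the intrinsic density-weighted metric discussed in the abstract.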

~~16/02~~ 11/05 (11h00 CET)

**Speaker:** Michael Ghil, Denisse Sciamarella

**Title:** Chaos, stochasticity and algebraic topology: new methods and their application to climate


*26/01 (11h00 CET)*

**Speaker:** Laure Ferraris

**Title:** Mimicking light for topological data analysis

This talk is about metric learning.

I will first present the family of so-called Fermat distances. These distances define a shortest path between two points which, like light, crosses different media at different speeds. Interpreting a subset of R^D as the support of a probability measure, in this model the path does not leave the support and is accelerated when it crosses high-density regions. The distance thus adapts both to the geometry and to the distribution of mass. This effect is tuned by a parameter whose delicate choice may depend on the application context, but also on statistical constraints or on issues of computational complexity.

I will then introduce the family of estimators of the Fermat distance and the first results on their statistical properties. Finally, I will discuss the open questions driving ongoing research in collaboration with Matthieu Jonckheere, Facundo Sapienza, Pablo Groisman, Frédéric Pascal and Frédéric Chazal.

In a second part, by way of an opening, I will present my research project under the supervision of Frédéric Chazal: “Introducing a new distance between two points, the DTM-Fermat distance, and its applications in topological data analysis”.

*05/01 (15h45 CET)*

**Speaker:** Claire Brécheteau (Université de Nantes)

**Title:** Approximating data with unions of balls or ellipsoids, and clustering

In this talk, we will construct a proxy for the distance function to a compact set, from a cloud of points generated on this compact set, with noise.

This proxy is built from a k-means-type criterion with a Bregman divergence. Its sublevel sets are unions of balls. I will present the use of this proxy for data clustering.

This talk is based on work published in:

– Claire Brécheteau and Clément Levrard, A k-points-based distance for robust geometric inference. Bernoulli 2020, Vol. 26, No. 4, 3017-3050

– Claire Brécheteau and Aurélie Fischer and Clément Levrard, Robust Bregman Clustering. Annals of Statistics 2021, Vol. 49, No. 3, 1679-1701

– Claire Brécheteau, Robust anisotropic power-functions-based filtrations for clustering. Symposium on Computational Geometry 2020, 23:1-23:15
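A closely related, simpler quantity is the distance-to-measure (DTM), whose k-means-type approximations discussed in the talk have sublevel sets that are unions of balls. A minimal sketch of the DTM itself (names are illustrative):

```python
import math

def dtm(x, sample, k):
    """Distance-to-measure: root mean squared distance from x to its k
    nearest sample points.  It is a noise-robust proxy for the distance
    to the support; the k-means-type estimators discussed in the talk
    replace it by a minimum of finitely many power functions, whose
    sublevel sets are unions of balls."""
    d2 = sorted(math.dist(x, s) ** 2 for s in sample)[:k]
    return math.sqrt(sum(d2) / k)

sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]  # one outlier
# The plain distance from (0, 0) to the sample is 0; the DTM with k=3
# averages over neighbours and is less sensitive to any single point.
val = dtm((0.0, 0.0), sample, k=3)
```

With k = 1 the DTM reduces to the ordinary distance to the sample; larger k trades spatial resolution for robustness to noise and outliers.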

*17/11 (15h45 CET)*

**Speaker:** Wolfgang Polonik (UC Davis)

**Title:** Multiscale Geometric Feature Extraction

A method for extracting multiscale geometric features from a data cloud is presented. Each pair of data points is mapped into a real-valued feature function, whose construction is based on geometric considerations and the novel notion of a distribution of (local) data depths. The collection of these feature functions is then used for further data analysis. In contrast to the popular kernel trick, our feature functions are functions of a one-dimensional parameter, and thus they can be used to visualize geometric aspects of a high-dimensional data cloud. Besides visualization, applications include classification and anomaly detection. The performance of the methodology is illustrated through applications to real data sets, and some theoretical guarantees supporting the performance of the proposed methodology are presented. This is joint work with G. Chandler.

~~10/11~~ (15h45 CET) **Rescheduled to 08/12**

**Speaker:** Alice Le Brigant

**Title:** Fisher information geometry of Dirichlet distributions

The Fisher information can be used to define a Riemannian metric to compare probability distributions inside a parametric family. The most well-known example is the case of (univariate) normal distributions, where the Fisher information induces hyperbolic geometry. In this talk we will investigate the Fisher information geometry of Dirichlet distributions, and beta distributions as a particular case. We show that it is negatively curved and geodesically complete. This guarantees the uniqueness of the notion of mean distribution, and makes it a suitable geometry to apply the K-means algorithm, e.g. to compare and classify histograms.
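For the univariate normal family, the hyperbolic geometry mentioned in the abstract can be made explicit; a short derivation sketch:

```latex
% Fisher information of N(\mu, \sigma^2) in the coordinates (\mu, \sigma):
% I(\mu, \sigma) = \mathrm{diag}(1/\sigma^2,\ 2/\sigma^2), hence the metric
ds^2 = \frac{d\mu^2 + 2\, d\sigma^2}{\sigma^2}.
% Setting \mu = \sqrt{2}\, x and \sigma = y gives
ds^2 = 2\, \frac{dx^2 + dy^2}{y^2},
% i.e. (up to the constant factor 2) the Poincaré half-plane metric of
% constant negative curvature: the hyperbolic geometry referred to above.
```

For Dirichlet and beta families no such closed form is available, which is why establishing negative curvature and geodesic completeness, as in the talk, requires more work.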

## Previously

**2021-2022**

30/06 | Edouard Bonnet |

23/06 | Théo Lacombe |

16/06 | Quentin Mérigot & Mathijs Wintraecken & André Lieutier |

02/06 | Dmitriy Morozov |

19/05 | Thomas Bonis |

28/04 | Jean Feydy & Anna Song |

21/04 | Anthea Monod |

07/04 | Marc Hoffmann |

10/03 | Jérémie Capitao-Miniconi |

24/02 | Vadim Lebovici |

17/02 | Nicolas Berkouk & Raphael Reinauer |

03/02 | Mathieu Carriere |

13/01 | Olympio Hacquard |

06/01 | Catherine Aaron |

09/12 | Vadim Lebovici |

25/11 | Eduardo Mendes |

18/11 | Jonathan Spreer |

14/10 | Marine Le Morvan |

07/10 | Etienne Lasalle |

23/09 | Daniel Perez |

*30/06 (11h00 CEST)*

**Speaker:** Édouard Bonnet

**Title:** The geometry of twin-width

We introduce and survey the key features of twin-width and, more generally, of so-called contraction sequences, insisting on what is and what is not geometric or topological about them.

*23/06 (14h00 CEST)*

**Speaker:** Théo Lacombe

**Title:** Optimal Transport: Entropic Regularization and Homogeneity

The optimal transport problem, whose origins go back to Monge, provides a metric to compare probability measures via an optimization problem. While this problem enjoys many important theoretical properties, its numerical resolution quickly becomes expensive, and its restriction to probability measures is constraining.

This talk will offer an introduction to the optimal transport problem, first in its most classical form, then in the modern variants behind its current success, notably in machine learning: entropic regularization and unbalanced transport. In recent work [1], T. Séjourné and his co-authors propose a model unifying these two formalisms. In certain (non-standard) cases, this new model exhibits inhomogeneities that can lead to numerical inconsistencies. We will shed light on this phenomenon and propose a possible correction, leading to a homogeneous unbalanced regularized transport model introduced in [2].

[1] T Séjourné, J Feydy, FX Vialard, A Trouvé, G Peyré : Sinkhorn divergences for unbalanced optimal transport.

[2] T. Lacombe, An Homogeneous Unbalanced Regularized Optimal Transport model with application to Optimal Transport with Boundary.
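As a minimal illustration of the entropic regularization discussed in the abstract, here is the classical balanced Sinkhorn iteration (not the unbalanced or homogeneous models of [1] and [2]; all names are ours):

```python
import math

def sinkhorn(a, b, cost, eps, iters=200):
    """Entropic optimal transport between histograms a and b (each summing
    to 1) with cost matrix `cost`: alternately normalize the rows and
    columns of the Gibbs kernel K = exp(-cost/eps)."""
    n, m = len(a), len(b)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v); its marginals match a and b.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

a, b = [0.5, 0.5], [0.5, 0.5]
cost = [[0.0, 1.0], [1.0, 0.0]]
P = sinkhorn(a, b, cost, eps=0.1)  # mass stays on the cheap diagonal
```

The unbalanced variants relax the marginal constraints into penalties; the homogeneity issue addressed in [2] arises precisely in how those penalties scale with the total mass.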

*16/06 (15h45 CEST)*

**Speaker:** Quentin Mérigot

**Title:** Hidden convexity in the problem of optimal uniform quantization

In machine learning and in inverse problems, it is sometimes necessary to generate (or deform) a point cloud so as to approximate a model probability measure $\rho$. A natural way to achieve this is to minimize the Wasserstein distance between the uniform measure on the point cloud and the model distribution:

$$\min_{x_1, \ldots, x_N} \; W_2^2\Big(\frac{1}{N}\sum_{i=1}^N \delta_{x_i}, \, \rho\Big).$$

This minimization problem, in which the unknowns are the positions of the atoms (and not their masses), is not convex, and it admits critical points whose energy is much larger than that of the minimizer. Yet, in most cases, an adapted version of Lloyd's algorithm, a fixed-step gradient descent method, leads to low-energy configurations. We explain this behavior quantitatively, showing in particular that if the initial points are not too close to one another, then a single step of Lloyd's algorithm is enough to obtain a good approximation of $\rho$. I will also discuss a more recent, qualitative result showing that in dimension $d = 2$ the quantization energy of stable critical points is in fact commensurate with the energy of the minimizer. This talk is based on joint work with A. Figalli, F. Santambrogio and C. Sarrazin.
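A minimal sketch of one step of the Lloyd-type scheme described above, with $\rho$ approximated by a finite sample (names and setup are illustrative, not the speaker's implementation):

```python
import math

def lloyd_step(points, sample):
    """One Lloyd iteration for uniform quantization: assign each sample of
    the target distribution to its nearest point, then move each point to
    the centroid of its cell."""
    cells = {i: [] for i in range(len(points))}
    for s in sample:
        i = min(range(len(points)), key=lambda i: math.dist(points[i], s))
        cells[i].append(s)
    new_points = []
    for i, pts in cells.items():
        if pts:
            new_points.append(tuple(sum(c) / len(pts) for c in zip(*pts)))
        else:
            new_points.append(points[i])  # empty cell: keep the point
    return new_points

# Two points quantizing four samples on a line:
sample = [(0.0,), (1.0,), (9.0,), (10.0,)]
pts = lloyd_step([(2.0,), (8.0,)], sample)  # -> [(0.5,), (9.5,)]
```

Each step never increases the quantization energy; the talk's point is that, from well-spread initializations, even a single step already lands close to the minimizer.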

*16/06 (11h20 CEST)*

**Speaker:** André Lieutier

**Title:** Delaunay-like triangulation of smooth orientable submanifolds by least L1-norm minimization

We focus on one particular instance of the shape reconstruction problem, in which the shape we wish to reconstruct is an orientable smooth submanifold of Euclidean space. Assuming we have as input a simplicial complex K that approximates the submanifold (such as the Čech complex or the Rips complex), we recast the reconstruction problem as a least L1-norm minimization problem in which the optimization variable is a chain of K. Provided that K satisfies some reasonable conditions, we prove that the considered minimization problem has a unique solution, whose support is a triangulation of the manifold called the flat Delaunay complex. If the point sample on which the Čech complex or the Rips complex is built satisfies similar reasonable conditions, this triangulation is the tangential Delaunay complex introduced by Boissonnat et al. Since the objective is a weighted L1-norm and the constraints are linear, the triangulation process can thus be implemented by linear programming.

By Dominique Attali (CNRS) and André Lieutier.

*16/06 (10h30 CEST)*

**Speaker:** Mathijs Wintraecken

**Title:** Optimal homotopy reconstruction results à la Niyogi, Smale, and Weinberger

We show that the proof of the homotopy reconstruction result by Niyogi, Smale, and Weinberger can be streamlined considerably using Federer’s work on the reach and several geometric observations. While Niyogi, Smale, and Weinberger restricted themselves to $C^2$ manifolds $M$ with positive reach, our proof extends to sets $S$ of positive reach.

Furthermore, the sample we consider does not have to lie directly on the set of positive reach. Instead, we assume that the two one-sided Hausdorff distances are bounded: δ from the set $S$ to the sample $P$, and ε in the other direction. We provide explicit bounds in terms of ε and δ that guarantee the existence of a parameter $r$ such that the union of balls $\bigcup_{p\in P} B(p, r)$ deformation retracts to $S$. We provide even better bounds in the manifold case. In both cases, our bounds improve considerably on the state of the art. In fact, in all cases except when the set is a one-dimensional manifold (with boundary), our bounds are optimal.

Joint work with Dominique Attali, Hana Dal Poz Kouřimská, Chris Fillmore, Ishika Ghosh, André Lieutier, and Elizabeth Stephenson.

*02/06 (17h00 CEST)*

**Speaker:** Dmitriy Morozov

**Title:** Topological Optimization with Big Steps

Using persistent homology to guide optimization has emerged as a novel application of topological data analysis. Existing methods treat persistence calculation as a black box and backpropagate gradients only onto the simplices involved in particular pairs. We show how the cycles and chains used in the persistence calculation can be used to prescribe gradients to larger subsets of the domain. In particular, we show that in a special case, which serves as a building block for general losses, the problem can be solved exactly in linear time. We present empirical experiments that show the practical benefits of our algorithm: the number of steps required for the optimization is reduced by an order of magnitude.

Joint work with Arnur Nigmetov.

*19/05 (15h45)*

**Speaker:** Thomas Bonis

**Title:** Sampling with Stein variational gradient descent

Many methods in Bayesian statistics require approximating intractable integrals. After presenting the usual methods used to deal with this issue, namely variational inference and Monte-Carlo approaches, along with their respective shortcomings, I will introduce a novel algorithm called Stein variational gradient descent and present known theoretical results regarding this algorithm.
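A minimal one-dimensional sketch of the standard SVGD update with an RBF kernel (names are ours; this illustrates the generic algorithm, not any specific result from the talk):

```python
import math

def svgd_step(xs, grad_log_p, h=1.0, step=0.1):
    """One Stein variational gradient descent update for 1-D particles:
    x_i <- x_i + step * mean_j [k(x_j, x_i) * grad_log_p(x_j)
                                + d/dx_j k(x_j, x_i)]
    with kernel k(x, y) = exp(-(x - y)^2 / h).  The first term drives
    particles toward high target density; the second is a repulsive term
    that keeps them spread out."""
    n = len(xs)
    new = []
    for xi in xs:
        phi = 0.0
        for xj in xs:
            k = math.exp(-((xi - xj) ** 2) / h)
            phi += k * grad_log_p(xj) + (2.0 * (xi - xj) / h) * k
        new.append(xi + step * phi / n)
    return new

# Target N(0, 1): grad log p(x) = -x.  Particles drift toward 0 while
# the repulsive term prevents them from collapsing onto a single point.
xs = [-3.0, -1.0, 2.0]
for _ in range(100):
    xs = svgd_step(xs, lambda x: -x)
```

With a single particle the repulsion vanishes and the update reduces to plain gradient ascent on log p, which is one way to see how SVGD interpolates between optimization and sampling.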

*28/04 (11h00 – 12h00)*

**Speaker:** Jean Feydy

**Title:** Fast geometric libraries for vision and data sciences

From 3D point clouds to high-dimensional samples, sparse representations have a key position in the data science toolbox. They complement 2D images and 3D volumes effectively, enabling fast geometric computations for e.g. Gaussian processes and shape analysis.

In this talk, I will present extensions for PyTorch, NumPy, Matlab and R that speed up fundamental computations on (generalized) point clouds by several orders of magnitude, when compared with PyTorch or JAX GPU baselines. These software tools allow researchers to break through major computational bottlenecks in the field and have been downloaded more than 100k times over the last few years.

The presentation may be of interest to all researchers who deal with point clouds, time series and segmentation maps, with a special focus on:

1. Fast and scalable computations with (generalized) distance matrices.

2. Efficient and robust solvers for the optimal transport (= “Earth Mover’s”) problem.

3. Applications to shape analysis and geometric deep learning, with a case study on the “pixel-perfect” registration of lung vessel trees.

References:

– “Geometric data analysis, beyond convolutions”

– “Fast geometric learning with symbolic matrices”

– KeOps library (geometric computations)

– GeomLoss library (optimal transport)

*28/04 (10h00 – 11h00)*

**Speaker:** Anna Song

**Title:** Analyzing branching shapes with cycle matching and signed distance persistent homology.

*07/04 (15h45 CET)*

**Speaker:** Marc Hoffmann

**Title:** Some geometric inference questions around the reconstruction of a density or its support

In a somewhat general talk, we will take stock of several questions in geometric inference. Given an observed sample drawn from a probability distribution P whose support is a compact submanifold M of dimension d < D, where D is the dimension of the ambient Euclidean space, we are interested in several questions when P is absolutely continuous with respect to the volume form of M: minimax reconstruction of M for the Hausdorff distance; reconstruction of the “geometry” of M through its Laplacian; and reconstruction of topological properties of M, such as its homology. We will detail some recent results obtained in collaboration with C. Berenfeld, J. Harvey and K. Shankar on (i) the estimation of the reach (in the sense of Federer) of M, an invariant combining the maximal curvature and the bottleneck distance of the submanifold, which must necessarily be controlled for the identifiability of the underlying statistical model, and (ii) the (adaptive) estimation of the density of P in a general setting, notably when M is unknown.

~~31/03 (11h CET)~~ **Postponed to 21st of April**

**Speaker:** Anthea Monod

**Title:** Approximating persistent homology for large datasets

Persistent homology is an important methodology from topological data analysis which adapts theory from algebraic topology to data settings and has been successfully implemented in many applications; it produces a summary in the form of a persistence diagram, which captures the shape and size of the data. Despite its popularity, persistent homology is simply impossible to compute for very large datasets, which prohibits its widespread use in many big data settings. What can we do if we would like a representative persistence diagram for a very large dataset whose persistent homology we cannot compute due to size restrictions? We adapt here the classical statistical method of bootstrapping, namely, drawing and studying smaller subsamples from the large dataset. We show that the mean of the persistence diagrams of subsamples is a valid approximation of the persistence diagram of the large dataset. Specifically, we show that the expected persistence diagram, taken as a persistence measure computed from the subsamples, is a consistent estimator of the true persistence diagram of the larger dataset. Studying the bias-variance decomposition serves to determine the number and size of the subsamples. Quantizing the persistence measure then gives back a persistence diagram representation. We demonstrate our approach on synthetic and real data; furthermore, we give an example of the utility of our approach in a shape clustering problem, where we obtain accurate results with only 2% subsampled from the original dataset.

This is joint work with Yueqi Cao (Imperial College London).

*10/03 (15h15 CET)*

**Speaker:** Jérémie Capitao-Miniconi

**Title:** Deconvolution of spherical data corrupted with unknown noise.

We consider the deconvolution problem for densities supported on a $d$-dimensional sphere with unknown center and unknown radius, in the situation where the distribution of the noise is unknown and without any other observations. We propose estimators of the radius, of the center, and of the density of the signal on the sphere that are proved consistent without further information. The estimator of the radius is proved to have almost parametric convergence rate for any dimension $d$. When $d=2$, the estimator of the density is proved to achieve the same rate of convergence over Sobolev regularity classes of densities as when the noise distribution is known.

This is a work in collaboration with Elisabeth Gassiat (LMO).

*24/02 (11h00 CET)*

**Speaker:** Vadim Lebovici

**Title:** Persistent magnitude

In this talk, I will present an invariant of persistence modules recently introduced by Govc and Hepworth, called the “persistent magnitude”. Simple to define, the magnitude of a persistence module is the sum of $(-1)^d (e^{-a} - e^{-b})$ over each bar $[a, b)$ of the barcode in homological degree $d$. This invariant enjoys many formal properties, such as additivity with respect to exact sequences, compatibility with a certain tensor product, and an explicit expression in the case of sublevel-set filtrations. Moreover, its link with the Euler characteristic makes it faster to compute than the barcode. Finally, it generalizes naturally to multiparameter persistence modules without losing the properties above. Time permitting, I will explain the links between the persistent magnitude and the magnitude of finite metric spaces, which originally motivated its definition by Govc and Hepworth.
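The defining formula is short enough to compute directly from a barcode; a minimal sketch (the barcode encoding is ours):

```python
import math

def persistent_magnitude(barcode):
    """Persistent magnitude of a barcode given as a list of
    (homological_degree, birth, death) triples:
        sum over bars of (-1)^degree * (exp(-birth) - exp(-death)),
    the invariant of Govc and Hepworth discussed in the talk."""
    return sum((-1) ** d * (math.exp(-a) - math.exp(-b)) for d, a, b in barcode)

# Toy barcode: one H0 bar [0, inf) and one H1 bar [1, 3).
barcode = [(0, 0.0, math.inf), (1, 1.0, 3.0)]
m = persistent_magnitude(barcode)  # 1 - (e^-1 - e^-3)
```

Infinite bars contribute $e^{-a}$ since $e^{-\infty} = 0$, and each bar's contribution decays with its birth time, so late features are discounted.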

*17/02 (11h00 CET)*

**Speaker:** Raphael Reinauer

**Title:** Persformer: A Transformer Architecture for Topological Machine Learning

One of the main challenges of Topological Data Analysis (TDA) is to extract meaningful features from persistence diagrams directly usable by machine learning algorithms. Indeed, persistence diagrams are intrinsically multisets of points in $\mathbb{R}^2$ and cannot straightforwardly be seen as vectors.

This talk introduces Persformer, the first Transformer neural network architecture that accepts persistence diagrams as input. The Persformer architecture significantly outperforms previous topological neural network architectures on classical synthetic benchmark datasets and has competitive performance on graph datasets.

Furthermore, we introduce an interpretability method to topological machine learning to provide meaningful insights into the importance of the topological features in the persistence diagram of classical datasets. We show that having an expressive model like the Persformer allows us to find meaningful vector representations of persistence diagrams and analyze them using our interpretability method.

We expect our topological interpretability method to be helpful as an exploratory tool to understand use cases where the shape of a dataset is essential, such as biology or material science.

*03/02 (15h15 CET)*

**Speaker:** Mathieu Carrière

**Title:** Statistical analysis of Mapper for stochastic and multivariate filters

Reeb spaces, as well as their discretized versions called Mappers, are common descriptors used in Topological Data Analysis, with plenty of applications in various fields of science, such as computational biology and data visualization, among others. The stability and the rate of convergence of the Mapper to the Reeb space have been studied extensively in recent work, focusing on the case where a scalar-valued filter is used for the computation of Mapper. On the other hand, much less is known in the multivariate case, when the codomain of the filter is $\mathbb{R}^p$, and in the general case, when it is a general metric space $(Z, d_Z)$ instead of $\mathbb{R}$. In this talk, I will introduce a slight modification of the usual Mapper construction and give risk bounds for estimating the Reeb space using this estimator. Our approach applies in particular to the setting where the filter function used to compute the Mapper is also estimated from data, such as the eigenfunctions of PCA. I will also provide preliminary applications of this estimator in statistics and machine learning for different kinds of target filters.

This is joint work with Bertrand Michel.
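For intuition, here is a toy version of the usual scalar-filter Mapper construction (a naive sketch with illustrative names, not the modified estimator from the talk): cover the filter range with overlapping intervals, cluster each preimage, and connect clusters that share points.

```python
import math

def mapper_1d(points, filt, intervals, overlap, radius):
    """Toy Mapper with a scalar filter: cover the filter range by
    `intervals` intervals extended by `overlap` on each side, cluster
    each preimage by single-linkage at scale `radius`, and connect
    clusters sharing points.  Returns (nodes, edges); each node is a
    frozenset of point indices."""
    fvals = [filt(p) for p in points]
    lo, hi = min(fvals), max(fvals)
    length = (hi - lo) / intervals
    nodes = []
    for k in range(intervals):
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        idx = [i for i, f in enumerate(fvals) if a <= f <= b]
        clusters = [{i} for i in idx]
        merged = True
        while merged:  # merge clusters linked by an edge shorter than radius
            merged = False
            for x in range(len(clusters)):
                for y in range(x + 1, len(clusters)):
                    if any(math.dist(points[i], points[j]) <= radius
                           for i in clusters[x] for j in clusters[y]):
                        clusters[x] |= clusters.pop(y)
                        merged = True
                        break
                if merged:
                    break
        nodes.extend(frozenset(c) for c in clusters)
    edges = [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:] if u & v]
    return nodes, edges

# A circle with the height filter: the Mapper graph is itself a cycle.
pts = [(math.cos(2 * math.pi * t / 30), math.sin(2 * math.pi * t / 30))
       for t in range(30)]
nodes, edges = mapper_1d(pts, lambda p: p[1], intervals=4, overlap=0.3, radius=0.5)
```

The multivariate and metric-space settings in the talk replace the interval cover of $\mathbb{R}$ with a cover of $\mathbb{R}^p$ or of $(Z, d_Z)$, which is where the new estimator and risk bounds come in.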

*13/01 (11h00 CET)*

**Speaker:** Olympio Hacquard

**Title:** Topologically penalized regression on manifolds

We study a regression problem on a compact manifold M. In order to take advantage of the underlying geometry and topology of the data, the regression task is performed on the basis of the first several eigenfunctions of the Laplace-Beltrami operator of the manifold, which are regularized with topological penalties. The proposed penalties are based on the topology of the sublevel sets of either the eigenfunctions or the estimated function. The overall approach is shown to yield promising and competitive performance on various applications to both synthetic and real data sets. We also provide theoretical guarantees on the regression function estimates, covering both their prediction error and their smoothness (in a topological sense). Taken together, these results support the relevance of our approach in the case where the targeted function is “topologically smooth”.

*06/01 (at 15h15 CET!)*

**Speaker:** Catherine Aaron

**Title:** Local convex hull for support, density and level set estimation

In “A local nearest-neighbor convex-hull construction of home ranges and utilization distributions”, the authors proposed an algorithm for “home ranges” and “utilization distributions” that can be translated into statistics as support and density estimation.

In the first part, we give the properties of the associated support estimator when the support is “full dimensional”, but also when it is a lower-dimensional manifold (with or without boundary).

The second part is dedicated to density (and level set) estimation. When the local convex hull idea is applied for that purpose, the initial estimator is biased, and this bias can be reduced. We will present the bias reduction method and show that it yields a density estimator adapted to an (unknown) compact support. An extension of the proposed estimator to distributions supported on an (unknown) lower-dimensional manifold, with or without boundary, is also given.

*09/12 (at 16h30 CET!)*

**Speaker:** Vadim Lebovici

**Title:** Hybrid transforms of constructible functions

Euler calculus — integration of constructible functions with respect to the Euler characteristic — is of increasing interest in TDA. For instance, the (constructible) Radon transform has provided an answer to the following important question: are two subsets of R^n with the same persistent homology in all degrees and for all height filtrations equal?

In this talk I will introduce integral transforms combining Lebesgue integration and Euler calculus. We will see that Lebesgue integration gives access to regularity results, while Euler calculus conveys topological information. For instance, we get mean formulae for these transforms in the context of sublevel-sets persistent homology for random filtrations.

*25/11 (at 15h45 CET!)*

**Speaker:** Eduardo Fonseca Mendes

**Title:** Generalized Information Criteria for Structured Sparse Models

Regularized M-estimators are widely used due to their ability to recover a low-dimensional model in high-dimensional scenarios. Some recent efforts on this subject have focused on creating a unified framework for establishing oracle bounds and deriving conditions for support recovery. Within this same framework, we propose a new Generalized Information Criterion (GIC) that takes into consideration the sparsity pattern one wishes to recover. We obtain non-asymptotic model selection bounds and sufficient conditions for model selection consistency of the GIC. Furthermore, we show that one may use the GIC to select the regularization parameter in such a way that the sequence of model subspaces contains the true model with probability converging to one. This allows practical use of the GIC for model selection in high-dimensional scenarios. We illustrate these conditions on examples including group LASSO generalized linear regression and low-rank matrix regression.

This is joint work with a former student, Gabriel Pinto (PhD student at Northwestern University).

*18/11*

**Speaker:** Jonathan Spreer

**Title:** An algorithm to compute the crosscap number of a knot

The crosscap number of a knot is the non-orientable counterpart of its genus. It is defined as the minimum of one minus the Euler characteristic of S, taken over all non-orientable surfaces S bounding the knot. Computing the crosscap number of a knot is tricky, since normal surface theory (the usual tool to prove computability of problems in 3-manifold topology) does not deliver the answer “out-of-the-box”.

In this talk, I will review the strengths and weaknesses of normal surface theory, focusing on why we need to work to obtain an algorithm to compute the crosscap number. I will then explain the theorem stating that an algorithm due to Burton and Ozlen can be used to give us the answer.

This is joint work with Jaco, Rubinstein, and Tillmann.

*14/10*

**Speaker:** Marine Le Morvan

**Title:** What’s a good imputation to predict with missing values?

How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments with a finite number of samples confirm that joint imputation and regression through NeuMiss outperforms various two-step procedures.

*07/10*

**Speaker:** Etienne Lasalle

**Title:** Heat diffusion distance processes: a statistically founded method to analyze graph data sets.

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. This means that heat diffuses faster along edges with higher weights. Given initial conditions, one can use the way heat diffuses to compare graphs. But choosing a relevant and informative diffusion time is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. For that, we define real-valued processes indexed by all the diffusion times in [0, T] for some T > 0, namely the Heat Kernel Distance (HKD) process and the Heat Persistence Distance (HPD) process. Borrowing tools from TDA, the HPD process is able to compare graphs without known node correspondence, or even graphs of different sizes. In this talk, I will introduce these processes and present their statistical properties: we prove, under mild assumptions, that they satisfy a functional central limit theorem and admit a Gaussian approximation. Moreover, I will present potential applications of these processes (the construction of confidence bands and two-sample tests, and the study of neural networks through their activation graphs).
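
The basic object behind these processes can be sketched in a few lines: each graph's heat kernel expm(-tL), compared across a grid of diffusion times. This is only a minimal illustration of the idea, assuming known node correspondence; the toy graphs and the Frobenius comparison are illustrative choices, not the exact HKD/HPD constructions of the talk.

```python
import numpy as np
from scipy.linalg import expm

def heat_kernel(adjacency, t):
    """Heat kernel expm(-t L) for the combinatorial Laplacian L = D - A."""
    degrees = np.diag(adjacency.sum(axis=1))
    laplacian = degrees - adjacency
    return expm(-t * laplacian)

def heat_distance_process(adj1, adj2, times):
    """Frobenius distance between the two heat kernels at each diffusion time."""
    return np.array([
        np.linalg.norm(heat_kernel(adj1, t) - heat_kernel(adj2, t), "fro")
        for t in times
    ])

# Two weighted triangles differing in a single edge weight
A1 = np.array([[0., 1., 1.], [1., 0., 1.], [1., 1., 0.]])
A2 = np.array([[0., 1., 1.], [1., 0., 2.], [1., 2., 0.]])
times = np.linspace(0.0, 2.0, 21)
distances = heat_distance_process(A1, A2, times)
print(distances[0])          # 0.0: before any diffusion the kernels coincide
print(distances[-1] > 0.0)   # the weight difference shows up as heat diffuses
```

Scanning the whole curve, rather than picking one diffusion time, is what avoids the time-selection problem mentioned above.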

*23/09*

**Speaker:** Daniel Perez

**Title:** On zeta functions and their application to the study of the topology of superlevel sets of stochastic processes

We introduce the notion of stochastic zeta function, which for some classes of processes shares many properties with its counterparts in analytic number theory. These zeta functions, defined in terms of the barcode, are related to a dual variable counting the number of bars in the barcode of length ≥ ε. A “prime number theorem” can then be proved for this dual variable in terms of the zeta function of the considered process.
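
The dual variable (the number of bars of length ≥ ε) can be computed directly from the H0 barcode of a sampled path. The following is a minimal sketch, not the zeta-function machinery of the talk: it computes sublevel-set H0 persistence of a 1-D sequence with a union-find sweep, and a scaled random walk is an illustrative stand-in for the Markov processes considered.

```python
import numpy as np

def h0_bars(values):
    """0-dimensional sublevel-set persistence barcode of a 1-D sequence."""
    order = np.argsort(values)
    parent, comp_min, bars = {}, {}, []

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in order:                 # sweep points by increasing value
        parent[i] = i
        comp_min[i] = values[i]
        for j in (i - 1, i + 1):    # try to merge with already-seen neighbors
            if j in parent:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # the component with the higher minimum dies at the current value
                young, old = (ri, rj) if comp_min[ri] > comp_min[rj] else (rj, ri)
                bars.append((comp_min[young], values[i]))
                parent[young] = old
                if comp_min[young] < comp_min[old]:
                    comp_min[old] = comp_min[young]
    bars.append((min(values), np.inf))  # essential bar of the global minimum
    return bars

rng = np.random.default_rng(42)
walk = np.cumsum(rng.standard_normal(1000)) / np.sqrt(1000)  # scaled random walk
lengths = np.array([d - b for b, d in h0_bars(walk) if np.isfinite(d)])
eps = 0.1
print((lengths >= eps).sum())  # N(eps): number of bars of length >= eps
```

Zero-length bars (non-minima added as singleton components) are discarded automatically once a positive threshold ε is applied.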

**2020-2021**

*01/04*

**Speaker:** Pierre Baudot

**Title:** Cohomological characterization of Information and Higher Order Statistical Structures – Machine Learning and Statistical Physics Aspects

We will visit methods that quantify the statistical interaction structure within a given data set using the characterization of information theory in cohomology, and provide their expression in terms of statistical physics and machine learning. In the first part, we will look at the formalism of Information Cohomology, obtained with Bennequin and extended by Vigneaux to Tsallis entropies, constructed on complexes of random variables. It allows entropy to be characterized uniquely as the first class of the cohomology, and multivariate mutual informations as coboundaries.

In the second part, we will look at the application of the computationally tractable simplicial subcase to genetic expression and to standard datasets in machine learning. Mutual statistical independence is equivalent to the vanishing of all k-MI (I_k = 0), leading to the conclusion that the I_k provide refined measures of statistical dependencies, and that the cohomology quantifies the obstruction to statistical factorization. The application to genetic expression and cell-type classification recovers the known differential expressions and co-regulations of genetic modules. The complex has a direct interpretation in terms of unsupervised and supervised deep learning, where the neural network architecture is given by the chain complex and backpropagation becomes forward (cohomological). Some preliminary examples of applications are given on classical datasets.

*25/03*

**Speaker:** John Harvey

**Title:** Estimating the reach of a submanifold

The reach is an important geometric invariant of submanifolds of Euclidean space. It is a real-valued global invariant incorporating information about the second fundamental form of the embedding and the location of the first critical point of the distance from the submanifold. In the subject of geometric inference, the reach plays a crucial role. I will give a new method of estimating the reach of a submanifold, developed jointly with Clément Berenfeld, Marc Hoffmann and Krishnan Shankar. This results in improved convergence rates, but a minimax optimal estimator remains to be found.

*11/03*

**Speaker:** Amit Moscovich

**Title:** Nonparametric estimation of high-dimensional shape spaces with applications to structural biology

Over the last twenty years, there have been major advances in non-linear dimensionality reduction, or manifold learning, and nonparametric regression of high-dimensional datasets with low intrinsic dimensionality. A key idea in this field is the use of data-dependent Fourier-like basis vectors given by the eigenvectors of a graph Laplacian. In this talk, I will discuss the application of these methods for mapping spaces of volumetric shapes with continuous motion. Three lines of research will be presented:

(i) High-dimensional nonparametric estimation of distributions of volumetric signals from noisy linear measurements.

(ii) Leveraging the Wasserstein optimal transport metric for manifold learning and clustering.

(iii) Non-linear independent component analysis for analyzing independent motions.

A key motivation for this work comes from structural biology, where breakthrough advances in cryo-electron microscopy have led to thousands of atomic-resolution reconstructions of various proteins in their native states. However, the success of this field has been mostly limited to the solution of the inverse problem aimed at recovering a single rigid structure, while many important macromolecules contain several parts that can move in a continuous fashion, thus forming a manifold of conformations. The methods described in this talk present progress towards the solution of this grand challenge, namely the extension of reconstruction methods which estimate a single 3D structure to methods that recover entire manifolds of conformations.

*18/02*

**Speaker:** Bastian Rieck

**Title:** Topological Representation Learning for Structured and Unstructured Data

Topological data analysis emerged as an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of novel algorithms that incorporate topological characteristics. This talk will focus on learning appropriate topological representations for unstructured data sets, such as point clouds, and structured data sets, such as graphs. Specifically, we will discuss how to use topological data analysis to obtain topologically faithful embeddings. We will also study graph classification, which is of particular interest since graphs are inherently amenable to a topological description in terms of their connected components and cycles. Next to describing ‘deep’ approaches for such tasks, this talk will also build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ and ‘aggregation’ functions, which can be learned end-to-end.

*28/01*

**Speaker:** Ery Arias-Castro

**Title:** On Using Graph Distances to Estimate Euclidean and Related Distances

Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances, and as such have been used to embed a graph (the multidimensional scaling problem). The talk will include a partial review of the literature and then present more recent developments, including on the minimax estimation of distances on a surface and consequences for manifold learning, on the estimation of curvature-constrained distances on a surface, and on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph.

*21/01*

**Speaker:** Antonin Guilloux and Guillaume Moroz

**Title:** Visualizing a fibration of the figure-eight knot complement

The complement of the figure-eight knot in the sphere S^3 is a 3-dimensional manifold known to be homeomorphic to a bundle of punctured tori over the circle. By analogy with Milnor fibrations, we use a parametrization of the knot as the zero locus of two real polynomials to describe the fibers explicitly in R^3. Each fiber is then a smooth hypersurface in R^3, with boundary the knot, defined by the vanishing of a polynomial with real coefficients. We can then compute a mesh for sufficiently many of these surfaces and render them.

This leads to a visualization, in the form of a movie, of this fibration and of a mathematically significant curve. We will present the various stages of this work.

*07/01*

**Speaker:** Owen Rouillé

**Title:** Computation of Large Asymptotics of 3-Manifold Quantum Invariants

Quantum topological invariants have played an important role in computational topology, and they are at the heart of major modern mathematical conjectures. In this talk, we study the experimental problem of computing large r values of the Turaev-Viro invariants TV_r. We base our approach on an optimized backtracking algorithm, consisting of enumerating combinatorial data on a triangulation of a 3-manifold. We design an easily computable parameter to estimate the complexity of the enumeration space, based on lattice point counting in polytopes, and show its accuracy experimentally. We apply this parameter to a preprocessing strategy on the triangulation, and combine it with multi-precision arithmetic in order to compute the Turaev-Viro invariants. We finally study the improvements brought by these optimizations compared to state-of-the-art implementations, and experimentally verify Chen and Yang’s volume conjecture on a census of closed 3-manifolds.

*17/12*

**Speaker:** Vincent Divol

**Title:** Density estimation on manifolds: an optimal transport approach

Density estimation is one of the most classical problems in nonparametric statistics: given i.i.d. samples X_1,…,X_n from a distribution \mu with density f on R^D, the goal is to reconstruct the underlying density (say, for instance, for the L_p norm). This problem is known to become intractable in high dimension D >> 1. We propose to overcome this issue by assuming that the distribution \mu is actually supported around a low-dimensional unknown shape M, of dimension d << D. After showing that this problem is degenerate for a large class of standard losses (L_p, total variation, etc.), we focus on the Wasserstein loss, for which we build a minimax estimator, based on kernel density estimation, whose rate of convergence depends on d and on the regularity of the underlying density, but not on the ambient dimension D.
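
A toy sketch of why the Wasserstein loss remains meaningful here: samples concentrated near a circle in R^2 admit no ambient density, yet empirical measures can still be compared exactly in 2-Wasserstein distance via optimal assignment. The sampling setup below is an illustrative assumption, not the estimator of the talk.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

def circle_sample(n, noise=0.0):
    """n points on the unit circle in R^2, optionally perturbed by Gaussian noise."""
    theta = rng.uniform(0.0, 2.0 * np.pi, n)
    pts = np.column_stack([np.cos(theta), np.sin(theta)])
    return pts + noise * rng.standard_normal((n, 2))

def wasserstein2_empirical(x, y):
    """Exact 2-Wasserstein distance between equal-size empirical measures,
    computed as an optimal assignment (Hungarian algorithm)."""
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    rows, cols = linear_sum_assignment(cost)
    return np.sqrt(cost[rows, cols].mean())

n = 200
clean = circle_sample(n)
noisy = circle_sample(n, noise=0.05)
far = circle_sample(n) + np.array([3.0, 0.0])
d_small = wasserstein2_empirical(clean, noisy)
d_large = wasserstein2_empirical(clean, far)
print(d_small)  # small: both samples live near the same circle
print(d_large)  # close to 3: dominated by the translation
```

An L_p comparison of densities would be meaningless in this setting, since the target measure is singular with respect to the Lebesgue measure of R^2.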

*03/12*

**Speaker:** Serguei Barannikov

**Title:** Canonical forms = Persistence diagrams.

Any filtered complex over a field F can be brought, by a linear transformation preserving the filtration, to a so-called canonical form: a canonically defined direct sum of filtered complexes of two types, one-dimensional complexes with trivial differential d(e_{t_i})=0 and two-dimensional complexes with trivial homology d(e_{s_j})=e_{r_j}. The talk is devoted to the proof of this theorem, which was first published in the speaker’s 1994 paper “Framed Morse complex and its invariants”, AMS, Advances in Soviet Mathematics 21: 93–115. In that work, these canonical form invariants were applied to Morse complexes, which calculate the homology of sublevel sets. Starting from the middle of the 2000s, these invariants became widely popular in Applied Mathematics under the names of “persistence diagrams” and “persistence barcodes”. The mentioned classification theorem is usually referred to in Applied Mathematics as the Persistent Homology Main (or Structure, or Principal) Theorem. It is curious that the first dissertation in Applied Mathematics on the subject of these persistence invariants, which appeared in 2001 with parts published in 2000, also considered as its principal example of filtered complexes exactly the same Morse complexes, in the particular case of two dimensions. It also had a big interesting part devoted to “canonical bases”. Many other filtered complexes are nowadays considered (Čech, simplicial, cubical, etc.) for the sublevel homology of a function on a topological space. In recent years, the filtered complexes most often considered are those describing the sublevel homology of the distance function to a point cloud. The algorithms of the more than 10 software platforms that currently exist for the computation of persistence diagrams are based on the algorithm, described in the mentioned 1994 paper, that brings filtered complexes to the canonical form.
After the proof of this theorem, the final part of the talk is devoted to some applications of this theorem in pure mathematics, in particular in algebraic topology and differential geometry.

*26/11*

**Speaker:** Wojciech Reise

**Title:** Towards Pattern Detection using Dynamic Time Warping

This talk will be centered around pattern detection and Dynamic Time Warping. Using the problem of velocity estimation from magnetic data in navigation systems, I will motivate the need for detecting repeated patterns in time series and the use of Dynamic Time Warping to that aim. I will recall the definition as the technique was originally introduced, and present a selection of more recent variants, including a differentiable version.
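
A minimal sketch of the classic dynamic-programming formulation of DTW, as originally introduced (the toy signals are illustrative assumptions: a pattern and a time-warped copy of it):

```python
import numpy as np

def dtw(x, y):
    """Classic DTW: minimal cumulative |x_i - y_j| cost over monotone alignments."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# The same pattern traversed at varying speed: DTW tolerates the
# re-parametrization, while a pointwise comparison does not.
t = np.linspace(0.0, 1.0, 100)
pattern = np.sin(2 * np.pi * t)
warped = np.sin(2 * np.pi * t ** 2)
dtw_cost = dtw(pattern, warped)
pointwise_cost = np.abs(pattern - warped).sum()
print(dtw_cost < pointwise_cost)  # the warp-invariant comparison is much tighter
```

This invariance to time re-parametrization is exactly what makes DTW attractive for detecting repeated patterns traversed at varying speeds, as in the magnetic-navigation setting above.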

*12/11*

**Speaker:** Nina Otter

**Title:** Magnitude homology: overview and current directions

Magnitude is an isometric invariant of metric spaces that was introduced by Tom Leinster in 2010, and is currently the object of intense research, since it has been shown to encode many invariants of a metric space such as volume, dimension and capacity. Magnitude homology is a homology theory for metric spaces that was introduced by Hepworth-Willerton and Leinster-Shulman, and categorifies magnitude in a similar way as the singular homology of a topological space categorifies its Euler characteristic.

In this talk I will first introduce magnitude and magnitude homology. I will then give an overview of existing results and current research in this area, explain how magnitude homology is related to persistent homology, and finally discuss some open questions and work in progress.

*05/11*

**Speaker:** Mathieu Carriere

**Title:** (Multiparameter) Persistent Homology for Quantitative Multiplex Immunofluorescence Images.

In this talk, I will present a novel application of persistent homology to characterize the spatial arrangement of immune and epithelial (tumor) cells within the breast cancer immune microenvironment. More specifically, quantitative and robust characterizations are built by computing persistence diagrams from a staining technique (quantitative multiplex immunofluorescence) which makes it possible to obtain spatial coordinates and stain intensities for individual cells. The resulting persistence diagrams are evaluated as characteristic biomarkers of cancer subtype and prognostic biomarkers of overall survival, and they outperform and complement the usual descriptors, which capture spatial relationships with nearest-neighbor analysis. I will also present an extension of persistence diagrams that we used to capture the behaviour of multiple stain intensities at once, and which is based on multiparameter persistence.

*29/10*

**Speaker:** Kristof Huszar

**Title:** Combinatorial width parameters for 3-manifolds

Algorithms in computational 3-manifold topology typically take a triangulation as input and return topological information about the underlying 3-manifold. However, extracting the desired information from a triangulation (e.g., evaluating an invariant) is often computationally very expensive. In recent years this complexity barrier has been successfully tackled in some cases by importing ideas from the theory of parameterized algorithms into the realm of 3-manifolds. In this talk we focus on the key combinatorial parameter in this context, the treewidth of a compact, orientable 3-manifold, and relate it to classical topological invariants (e.g., Heegaard genus) in a quantitative way.

Joint work with Jonathan Spreer and Uli Wagner.

*22/10*

**Speaker:** Olympio Hacquard

**Title:** Regression on the Laplace Beltrami operator eigenfunctions using a topological penalty

We study a regression setting on a compact Riemannian manifold without boundary. We observe an n-sample (x1, y1), …, (xn, yn), where the xi lie on the manifold and the yi are real numbers. The purpose is to estimate the regression function in order to predict the value at a new point on the manifold. The regression is performed on the first eigenfunctions of the Laplace Beltrami operator. Since these functions are hard to estimate, we use a graph approximation of the manifold and replace the spectrum of the Laplace Beltrami operator by the spectrum of the graph Laplacian. We will discuss some of the problems that arise when going from a continuous problem to a discrete one. As the Laplacian eigenfunctions are oscillating, they tend to overfit the data, and it is necessary to introduce a penalty into the model. The novelty of the method described here is the introduction of a topological penalty, which is equivalent to performing a Lasso weighted by the topological persistence of each eigenfunction, based on the heuristic idea that functions that oscillate a lot have high persistence. The method provides promising experimental results for predicting new data from noisy input; finding theoretical guarantees on the performance of the estimator is still in progress.
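
A minimal numpy sketch of this pipeline, under loudly stated assumptions: a cycle graph stands in for the manifold, a weighted ridge penalty stands in for the persistence-weighted Lasso of the talk, and the eigenvalue-based weights are a hypothetical proxy for topological persistence.

```python
import numpy as np

rng = np.random.default_rng(1)

# Cycle graph on n nodes as a stand-in for the manifold; its Laplacian
# eigenvectors play the role of the Laplace-Beltrami eigenfunctions.
n, k = 200, 20
idx = np.arange(n)
A = np.zeros((n, n))
A[idx, (idx + 1) % n] = A[(idx + 1) % n, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A
eigvals, eigvecs = np.linalg.eigh(L)
Phi = eigvecs[:, :k]                      # design matrix: first k eigenvectors

# Noisy observations of a smooth function on the circle
theta = 2 * np.pi * idx / n
y = np.cos(theta) + 0.3 * rng.standard_normal(n)

# Weighted penalty (ridge here for brevity, Lasso in the talk): eigenfunctions
# with larger eigenvalues oscillate more and are shrunk harder.
lam = 0.1
weights = 1.0 + eigvals[:k]               # hypothetical stand-in for persistence weights
coef = np.linalg.solve(Phi.T @ Phi + lam * np.diag(weights), Phi.T @ y)
fitted = Phi @ coef

mse_fit = np.mean((fitted - np.cos(theta)) ** 2)
mse_raw = np.mean((y - np.cos(theta)) ** 2)
print(mse_fit, mse_raw)  # the penalized fit is much closer to the truth than the raw data
```

Restricting to the first k eigenvectors and shrinking the oscillating ones is what controls the overfitting discussed above.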

*15/10*

**Speaker:** Vincent Rouvreau

**Title:** What is new in Gudhi 3.3.0?

In this talk, I will present Edge Collapse and Persistence of Flag Complexes (https://drops.dagstuhl.de/opus/volltexte/2020/12177) and how to use it with Gudhi. I will then show benchmark results and my conclusions on which algorithm to use depending on the data characteristics.

Finally, we will play with ToMaTo (https://dl.acm.org/doi/10.1145/2535927), now that it is integrated into the Gudhi library.

*08/10*

**Speaker:** Raphaël Tinarrage

**Title:** Topological inference from measures and vector bundles (defense rehearsal)

We contribute to the theory of topological inference, based on the theory of persistent homology, by proposing three families of filtrations. For each of them, we prove consistency results (that is, quality of approximation of an underlying geometric object) and stability results (that is, robustness against initial measurement errors). We propose concrete algorithms in order to use these methods in practice. The first family, the DTM-filtrations, is a robust alternative to the usual Cech filtration when the point cloud is noisy or contains anomalous points. It is based on the notion of distance to measure, which yields stability in the sense of the Wasserstein distance. Secondly, we propose the lifted filtrations, which make it possible to estimate the homology of immersed manifolds, even when their reach is zero. We introduce the notion of normal reach, and show that it makes it possible to control geometric quantities associated to the manifold. We study the estimation of tangent spaces by local covariance matrices. Thirdly, we develop a theoretical framework for vector bundle filtrations, and define the persistent Stiefel-Whitney classes. We show that the persistent Stiefel-Whitney classes associated to the Cech bundle filtrations are Hausdorff-stable and consistent. To allow their algorithmic implementation, we introduce the notion of weak star condition.

*01/10*

**Speaker:** Daniel Perez

**Title:** Trees and the persistent homology of Markov processes

The TDA community has long been concerned with the probabilistic aspects of persistence diagrams. An interesting question is: given a manifold M, what does the barcode of a random function on M look like? We will give an answer to this question in the simple one-dimensional case, under the assumption that the processes are Markov. This covers a relatively wide range of well-studied processes, such as Itô processes and Brownian motion. To study this question, we depart from the traditional persistent homology setting by establishing a functorial link between diagrams and trees, and studying metric invariants of the latter. These turn out to be related to the almost sure regularity of the processes studied and, if the process is Lévy, to the self-similarity of the process. In particular, we will give the law of the number of bars longer than ε and establish some asymptotic results concerning this random variable. We will illustrate these results by examining the particular case of Brownian motion, for which we have an almost full description of the barcode.

*17/09*

**Speaker:** Nicolas Berkouk

**Title:** Persistence and Sheaves: from Theory to Applications (defense rehearsal)

*10/09*

**Speaker:** Barak Sober

**Title:** The Manifold Moving Least-Squares (Manifold-MLS) Framework: estimating manifolds and reconstructing their atlas from high-dimensional point clouds

The goal of *Manifold Learning* is to project the data into a lower-dimensional Euclidean domain, while maintaining some notion of distance (e.g., geodesic or diffusion).

In this talk, we introduce an alternative approach named the Manifold Moving Least-Squares (*Manifold-MLS*) that, given a finite set of samples, reconstructs an atlas of charts and provides an approximation of the manifold itself. Under certain (non-restrictive) sampling assumptions, we prove that the *Manifold-MLS* produces a smooth Riemannian manifold approximating the sampled one, even in the case of noisy samples. We show that the approximation converges to the sampled manifold as the number of samples tends to infinity, and give the exact convergence rates. Furthermore, we show that in the case of clean samples we obtain a high order of approximation (up to the degree of smoothness) for both the manifold and geodesic distances (i.e., the procedure we describe is nearly isometric when the resolution is sufficiently fine).

*03/09*

**Speaker:** Théo Lacombe

**Title:** Statistical tools for topological descriptors using optimal transport (defense rehearsal)

Topological data analysis (TDA) allows one to extract rich information from structured data (such as graphs or time series) that occur in modern machine learning problems. This information is represented as descriptors such as persistence diagrams, which can be described as point measures supported on a half-plane. While persistence diagrams are not elements of a vector space, they can still be compared using partial matching metrics. The similarities between these metrics and those routinely used in optimal transport (another field of mathematics) have been known for a long time, but a formal connection between the two fields was yet to come. The purpose of this thesis is to clarify this connection and develop new theoretical and computational tools to manipulate persistence diagrams, targeting statistical applications. First, we show how optimal partial transport with boundary, a variation of classic optimal transport theory, provides a formalism that encompasses standard metrics in TDA. We then showcase the benefits of this connection in different situations: a theoretical study and the development of an algorithm to perform fast estimation of barycenters of persistence diagrams; the characterization of continuous linear representations of persistence diagrams and how to learn such representations using a neural network; and eventually a stability result in the context of linearly averaging random persistence diagrams.