DataShape Seminar

Weekly seminars


For any questions, please contact Théo Lacombe or Owen Rouillé. Some of the talks are recorded and can be accessed on our YouTube Channel.



Speaker: Bastian Rieck
Title: Topological Representation Learning for Structured and Unstructured Data

Topological data analysis emerged as an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of novel algorithms that incorporate topological characteristics. This talk will focus on learning appropriate topological representations for unstructured data sets, such as point clouds, and structured data sets, such as graphs. Specifically, we will discuss how to use topological data analysis to obtain topologically faithful embeddings. We will also study graph classification, which is of particular interest since graphs are inherently amenable to a topological description in terms of their connected components and cycles. Next to describing ‘deep’ approaches for such tasks, this talk will also build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ and ‘aggregation’ functions, which can be learned end-to-end.


Speaker: Ery Arias-Castro
Title: On Using Graph Distances to Estimate Euclidean and Related Distances

Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances, and as such have been used to embed a graph (the multidimensional scaling problem). The talk will include a partial review of the literature and then present more recent developments, including on the minimax estimation of distances on a surface and consequences for manifold learning, on the estimation of curvature-constrained distances on a surface, and on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph.


Speaker: Antonin Guilloux and Guillaume Moroz
Title: Visualisation d’une fibration du complémentaire du noeud de huit

Le complémentaire du noeud de huit dans la sphère S^3 est une variété de dimension 3 dont on sait qu’elle est homéomorphe à un fibré en tores épointés sur le cercle. De manière analogue aux fibrations de Milnor, nous utilisons une paramétrisation du noeud comme lieu des zéros de deux polynômes réels pour décrire explicitement dans R^3 les fibres. Chaque fibre est alors une hypersurface lisse à bord le noeud dans R^3, définie par l’annulation d’un polynôme à coefficients réels. Nous pouvons alors calculer un mesh pour suffisamment de ces surfaces et les représenter.

Cela mène à une visualisation, sous forme de film, de cette fibration et d’une courbe significative mathématiquement. Nous présenterons les diverses étapes de ce travail.


Speaker: Owen Rouillé
Title: Computation of Large Asymptotics of 3-Manifold Quantum Invariants

Quantum topological invariants have played an important role in computational topology, and they are at the heart of major modern mathematical conjectures. In this article, we study the experimental problem of computing large r values of Turaev-Viro invariants TV_r. We base our approach on an optimized backtracking algorithm, consisting of enumerating combinatorial data on a triangulation of a 3-manifold. We design an easily computable parameter to estimate the complexity of the enumeration space, based on lattice point counting in polytopes, and show experimentally its accuracy. We apply this parameter to a preprocessing strategy on the triangulation, and combine it with multi-precision arithmetics in order to compute the Turaev-Viro invariants. We finally study the improvements brought by these optimizations compared to state-of-the-art implementations, and verify experimentally Chen and Yang’s volume conjecture on a census of closed 3-manifold.


Speaker: Vincent Divol
Title: Density estimation on manifolds: an optimal transport approach

Density estimation is one of the most classical problem in nonparametric statistics: given i.i.d. samples X_1,…,X_n from a distribution \mu with density f on R^D, the goal is to reconstruct the underlying density (say for instance for the L_p norm). This problem is known to become untractable in high dimension D >> 1. We propose to overcome this issue by assuming that the distribution \mu is actually supported around a low dimensional unknown shape M, of dimension d << D. After showing that this problem is degenerate for a large class of standard losses (L_p, total variation, etc.), we focus on the Wasserstein loss, for which we build a minimax estimator, based on kernel density estimation, whose rate of convergence depends on d, and on the regularity of the underlying density, but not on the ambient dimension D.


Speaker: Serguei Barannikov
Title: Canonical forms = Persistence diagrams.

Any filtered complex over a field F can be brought by a linear transformation preserving the filtration to so called canonical form, a canonically defined direct sum of filtered complexes of two types: one-dimensional complexes with trivial differential d(e_{t_i})=0 and two-dimensional complexes with trivial homology d(e_{s_j})=e_{r_j}. The talk is devoted to the proof of this theorem, that was first published in the speaker’s 1994 paper “Framed Morse complex and its invariants”, AMS, Advances in Soviet Mathematics 21: 93–115. In that work these canonical form invariants were applied to Morse complexes which calculate homology of sublevel sets. Starting from middle of 2000s these invariants became widely popular in Applied Mathematics under the name of “persistence diagrams” and “persistence barcodes”. The mentioned classification theorem is usually referred in Applied Mathematics as the Persistent homology Main (or Structure, or Principal) Theorem. It is curious that the first dissertation in Applied Mathematics on the subject of these persistence invariants, which appeared in 2001 with parts published in 2000, also considered as principal example of filtered complexes exactly the same Morse complexes, in the particular case of two dimensions. It also had a big interesting part devoted to “canonical bases”. Many other filtered complexes are nowadays considered (Cech, simplicial, cubical etc) for sublevel homology of a function on topological space. Most often, the filtered complexes describing sublevel homology of the distance to a point-cloud set function are considered over last years. The algorithms of more than 10 different software platforms, that exist actually for computation of persistence diagrams, are based on the described in the mentioned 1994 paper algorithm bringing filtered complexes to the canonical form. After the proof of this theorem, final part of the talk is devoted to some applications of this theorem in pure mathematics, in particular in algebraic topology and differential geometry.


Speaker: Wojciech Reise
Title: Towards Pattern Detection using Dynamic Time Warping

This talk will be centered around pattern detection and Dynamic Time Warping. Using the problem of velocity estimation from magnetic data in navigation systems, I will motivate the need for detecting repeated patterns in time series and the use of Dynamic Time Warping to that aim. I will recall the definition as the technique was originally introduced, and present a selection of more recent variants, including a differentiable version.


Speaker: Nina Otter
Title: Magnitude homology: overview and current directions

Magnitude is an isometric invariant of metric spaces that was introduced by Tom Leinster in 2010, and is currently the object of intense research, since it has been shown to encode many invariants of a metric space such as volume, dimension and capacity. Magnitude homology is a homology theory for metric spaces that was introduced by Hepworth-Willerton and Leinster-Shulman, and categorifies magnitude in a similar way as the singular homology of a topological space categorifies its Euler characteristic.

In this talk I will first introduce magnitude and magnitude homology. I will then give an overview of existing results and current research in this area, explain how magnitude homology is related to persistent homology, and finally discuss some open questions and work in progress.


Speaker: Mathieu Carriere
Title: (Multiparameter) Persistent Homology for Quantitative Multiplex Immunofluorescence Images.

In this talk, I will present a novel application of persistent homology to characterize the spatial arrangement of immune and epithelial (tumor) cells within the breast cancer immune microenvironment. More specifically, quantitative and robust characterizations are built by computing persistence diagrams out of a staining technique (quantitative multiplex immunofluorescence) which allows to obtain spatial coordinates and stain intensities on individual cells. The resulting persistence diagrams are evaluated as characteristic biomarkers of cancer subtype and prognostic biomarker of overall survival, and outperform and complement the usual descriptors which capture spatial relationships with nearest neighbor analysis. I will also present an extension of persistence diagrams that we used to capture the behaviour of multiple stain intensities at once, and which is based on multiparameter persistence.


Speaker: Kristof Huszar
Title: Combinatorial width parameters for 3-manifolds

Algorithms in computational 3-manifold topology typically take a triangulation as an input and return topological information about the underlying 3-manifold. However, extracting the desired information from a triangulation (e.g., evaluating an invariant) is often computationally very expensive. In recent years this complexity barrier has been successfully tackled in some cases by importing ideas from the theory of parameterized algorithms into the realm of 3-manifolds. In this talk we focus on the key combinatorial parameter in this context (i.e., the treewidth of a compact, orientable 3-manifold), and relate it with classical topological invariants (e.g., Heegaard genus) in a quantitative way.

Joint work with Jonathan Spreer and Uli Wagner.


Speaker: Olympio Hacquard
Title: Regression on the Laplace Beltrami operator eigenfunctions using a topological penalty

We study a regression setting on a compact Riemannian manifold without boundary. We observe a n-sample (x1, y1), … (xn, yn) where the xi lie on the manifold and the yi are real numbers. The purpose is to estimate the regression function in order to predict the value when given a new point on the manifold. The regression is performed on the first eigenfunctions of the Laplace Beltrami operator. Since these functions are hard to estimate, we use a graph approximation of the manifold and replace the spectrum of the Laplace Beltrami operator by the spectrum of the Laplacian of the graph. We will discuss some of the problems that arise when going from a continuous problem to a discrete one. As the Laplacian eigenfunctions are oscillating, they tend to overfit the data and it is necessary to introduce a penalty to the model. The novelty of the method described here is the introduction of a topological penalty, which is equivalent to performing a Lasso weighted by the topological persistence of each eigenfunction, based on the heuristic idea that the functions that oscillate a lot have a high persistence. This method seems to provide promising experimental results to predict new data when given a noisy input and finding theoretical guarantees on the performance of the estimator is still under progress.


Speaker: Vincent Rouvreau
Title: What is new in Gudhi 3.3.0 ?

In this talk, I will present Edge Collapse and Persistence of Flag Complexes ( and how to use it with Gudhi. Then I will show benchmark results and my conclusions on which algorithm to use in function of the data characteristics.
And finally, we will play with ToMaTo (, now it is integrated in the Gudhi library.


Speaker: Raphaël Tinarrage
Title: Topological inference from measures and vector bundles (defense rehearsal)

We contribute to the theory of topological inference, based on the theory of persistent homology, by proposing three families of filtrations. For each of them, we prove consistency results—that is, quality of approximation of an underlying geometric object—, and stability results—that is, robustness against initial measurement errors. We propose concrete algorithms in order to use these methods in practice. The first family, the DTM-filtrations, is a robust alternative to the usual Cech filtration when the point cloud is noisy or contains anomalous points. It is based on the notion of distance to measure, which allows to obtain stability in the sense of the Wasserstein distance. Secondly, we propose the lifted filtrations, which make it possible to estimate the homology of immersed manifolds, even when their reach is zero. We introduce the notion of normal reach, and show that it allows to control geometric quantities associated to the manifold. We study the estimation of tangent spaces by local covariance matrices. Thirdly, we develop a theoretical framework for vector bundle filtrations, and define the persistent Stiefel-Whitney classes. We show that the persistent Stiefel-Whitney classes associated to the Cech bundle filtrations are Hausdorff-stable and consistent. To allow their algorithmic implementation, we introduce the notion of weak star condition.


Speaker: Daniel Perez
Title: Trees and the persistent homology of Markov processes

The TDA community has long been concerned with probabilistic aspect of persistence diagrams. An interesting question is : given a manifold M, what does the barcode of a random function on M look like ? We will give an answer to this question for the simple case of 1D, with the assumption that the processes are Markov. This covers a relatively wide range of well-studied processes such as Itô processes and Brownian motion. To study this question, we depart from the traditional persistent homology setting, by establishing a functorial link between diagrams and trees and studying metric invariants of the latter. These turn out to be related to the almost sure regularity of the processes studied, and if the process is Lévy, to the self-similarity of the process. In particular, we will give the law of the number of bars greater than epsilon and establish some asymptotic results concerning this random variable. We will illustrate these results by examining the particular case of Brownian motion, for which we have an almost full description of the barcode.


Speaker: Nicolas Berkouk
Title: Persistence and Sheaves: from Theory to Applications (defense rehearsal)

Topological data analysis is a recent field of research aiming at using techniques coming from algebraic topology to define descriptors of datasets.To be useful in practice, these descriptors must be computable, and coming with a notion of metric, in order to express their stability properties with respect to the noise that always comes with real world data. Persistance theory was elaborated in the early 2000’s as a first theoretical setting to define such descriptors – the now famous so-called barcodes. However very well suited to be implemented in a computer, persistence theory has certain limitations. In this work, we establish explicit links between the theory of derived sheaves equipped with the convolution distance (after Kashiwara-Schapira) and persistence theory in order to enjoy best of both’s worlds: computer friendliness of persistence in the one hand, theoretical deepness of sheaf theory on the other one.


Speaker: Barak Sober
Title: The Manifold Moving Least-Squares (Manifold-MLS) Framework: estimating manifolds and reconstructing their atlas from high-dimensional point clouds

Differentiable manifolds are an indispensable ‘language’ in modern physics and mathematics. As such, there are a plethora of analytic tools designed to investigate manifold-based models (e.g., connections, differential forms, curvature tensors, parallel transport, bundles). However, in order to facilitate these tools, one normally assumes access to the manifold’s atlas of charts (i.e., local parametrizations). In recent years, manifold-based modeling has permeated into data analysis as well, usually in order to avoid working in high dimensions. However, in these data-driven models, charts are not accessible and the only information at hand is the samples themselves. As a result, a common practice in Manifold Learning is to project the data into a lower-dimensional Euclidean domain, while maintaining some notion of distance (e.g., geodesic or diffusion).

In this talk, we introduce an alternative approach named the Manifold Moving Least-Squares (Manifold-MLS) that, given a finite set of samples, reconstructs an atlas of charts and provides an approximation of the manifold itself. Under certain (non-restrictive) sampling assumptions, we prove that the Manifold-MLS produces a smooth Riemannian manifold approximating the sampled one, even in case of noisy samples. We show that the approximation converges to the sampled manifold in case the number of samples tends to infinity, and show the exact convergence rates. Furthermore, we show that in the case of clean samples we obtain a high order of approximation (up to the degree of smoothness) for both the manifold and geodesic distances (i.e., the procedure we describe is nearly isometric when the resolution is sufficiently fine).


Speaker: Théo Lacombe
Title: Statistical tools for topological descriptors using optimal transport (defense rehearsal)

Topological data analysis (TDA) allows one to extract rich information from structured data (such as graphs or time series) that occurs in modern machine learning problems. This information will be represented as descriptors such as persistence diagrams, which can be described as point measures supported on a half-plane. While persistence diagrams are not elements of a vector space, they can still be compared using partial matching metrics. The similarities between these metrics and those routinely used in optimal transport—another field of mathematics—are known for long, but a formal connection between these two fields is yet to come. The purpose of this thesis is to clarify this connection and develop new theoretical and computational tools to manipulate persistence diagrams, targeting statistical applications. First, we show how optimal partial transport with boundary, a variation of classic optimal transport theory, provides a formalism that encompasses standard metrics in TDA. We then showcase the benefits of this connection in different situations: a theoretical study and the development of an algorithm to perform fast estimation of barycenters of persistence diagrams, the characterization of continuous linear representations of persistence diagrams and how to learn such representations using a neural network, and eventually a stability result in the context of linearly averaging random persistence diagrams.


For Saclay
For Sophia


Comments are closed.