DataShape Seminar

Seminars

For any questions, please contact Antoine Commaret, David Loiseaux, Nina Otter or Charles Arnal. The talks are transmitted online and recorded. The recordings of talks from previous years can be found on our YouTube Channel.

08/11/2023Luis Scoccola
09/11/2023Vincent Divol
15/11/2023Bartek Blaszczyszyn
22/11/2023François Petit
30/11/2023Katharine Turner
13/12/2023Wolfgang Polonik
21/12/2023Céline Duval
10/01/2024Charles Arnal
24/01/2024Vincent Divol
31/01/2024Henrique Ennes and Raphaël Tinarrage
08/02/2024Eric Goubault
13/03/2024Alexandra Rören and Adrien Beaud
 20/03/2024 Sara Kalisnik and Miguel O’Malley

08/11/2023 (11h00 CET)
Speaker: Luis Scoccola
Title: What do we want from invariants of multiparameter persistence modules?

Various constructions relevant to practical problems such as clustering and graph classification give rise to multiparameter persistence modules (MPPM), that is, linear representations of non-totally ordered sets. Much of the mathematical interest in multiparameter persistence comes from the fact that there exists no tractable classification of MPPM up to isomorphism, meaning that there is a lot of room for devising invariants of MPPM that strike a good balance between discriminating power and complexity of their computation. However, there is no consensus on what type of information we want these invariants to provide us with, and, in particular, there seems to be no good notion of “global” or “high persistence” features of MPPM.
With the goal of substantiating these claims, as well as making them more precise, I will start with an overview of the theory of multiparameter persistence, including joint works with Bauer and Oudot. I will then briefly outline recent work of Bjerkevik, which contains relevant open questions and which will help us make sense of the notion of global feature in multiparameter persistence.


09/11/2023 (15h45 CET)
Speaker: Vincent Divol
Title: Spectral estimation of the Laplace-Beltrami operator

Graphs Laplacians are used for various tasks in machine learning, e.g. for spectral clustering, or to find adapted bases of eigenfunctions that can be used as feature maps. In the case where the underlying graph is a neighborhood graph built on top of n i.i.d. points sampled on a submanifold M, the discrete Laplacian operator is known to converge towards a weighted Laplace operator on M. We focus on exhibiting rates of convergence for the eigenvalues of the operators. Namely, if the density p is of regularity s (large enough) and the manifold is of dimension d, we show that the estimation of the eigenvalues of the p-weighted Laplace operator can be done as quickly as the estimation of the density itself, with rates of order n^(-s/(2s+d)). We are also able to show that this rate is minimax optimal in the case of a curve (d=1). Joint work with Clément Berenfeld and Yann Chaubet.


22/11/2023 (11 CET)
Speaker: François Petit (CRESS, Inserm)
Title: Projected barcodes and distances for multi-parameter persistence modules

In this talk, I will present the notion of projected barcodes and projected distances for multi-parameter persistence modules. Projected barcodes are defined as derived pushforward of persistence modules onto R and provide descriptor for multiprameter persistence modules. Projected distances come in two flavors: the integral sheaf metrics (ISM) and the sliced convolution distances (SCD).

In the case where the persistence module considered is the sublevel-sets persistence modules of a function f : X -> R^n, we will explain how, under mild conditions, the projected barcode of this module by a linear map u : R^n \to R is the collection of sublevel-sets barcodes of the composition uf . In particular, it can be computed using software dedicated to one-parameter persistence modules. This is joint work with Nicolas Berkouk.


30/11/2023 (11 CET)
Speaker: Katharine Turner (Australian National University)
Title: Representing Vineyard Modules

Time-series of persistence diagrams, more commonly known as vineyards, are a useful way to capture how multi-scale topological features vary over time. However, as the persistent homology is calculated and considered at each time step independently we do lose significant information in how the individual persistent homology classes evolve over time. A natural algebraic version of vineyards is a time series of persistence modules equipped with interleaving maps between the persistence modules at different time stamps. Let’s call this a vineyard module. I will set up the framework for representing a vineyard module via an indexed set of vines alongside a collection of matrices. Furthermore I will outline an algorithmic way to transform the bases of the persistence modules at each time step within the vineyard module to make the matrices within this representation as simple as possible. With some reasonable assumptions (analogous to those in Cerf theory) on the vineyard modules, this simplified representation can be completely described (up to isomorphism) by the underlying vineyard and a vector of finite length. While this vector representation is not in general guaranteed to be unique we can prove that it will be always zero when the vineyard module is isomorphic to the direct sum of vine modules.


13/12/2023 (11 CET)
Speaker: Wolfgang Polonik (UC Davis)
Title: Inference in Topological Data Analysis

This talk presents some novel contributions to statistical inference for Topological Data Analysis (TDA). The presented inference methods consist of bootstrap based confidence regions for (persistent) Betti numbers and Euler characteristic curves. In contrast to most of the other existing inference methods for TDA, our methods are based on one data set of size n, and  large sample guarantees are thus established for n tending to infinity. The presented results provide insights into how the sampling distribution affects the persistence diagram. This is joint work with Benjamin Roycraft and Johannes Krebs.


21/12/2023 (3.45pm CET) (joint with Probability and Statistics seminar)
Speaker: Céline Duval (Université de Lilles)
Title: Geometry of excursion sets: computing the surface area from discretized points

The excursion sets of a smooth random field carries relevant information in its various geometric measures. After an introduction of these geometrical quantities showing how they are related to the parameters of the field, we focus on the problem of discretization. From a computational viewpoint, one never has access to the continuous observation of the excursion set, but rather to observations at discrete points in space. It has been reported that for specific regular lattices of points in dimensions 2 and 3, the usual estimate of the surface area of the excursions remains biased even when the lattice becomes dense in the domain of observation. We show that this limiting bias is invariant to the locations of the observation points and that it only depends on the ambient dimension. (based on joint works with H. Biermé, R. Cotsakis, E. Di Bernardino and A. Estrade).


10/01/2024 (11am CET)
Speaker: Charles Arnal (Inria-Saclay)
Title: Critical points of generic submanifolds

[Work by David Cohen-Steiner, Vincent Divol, and Charles Arnal]
Given a closed set S in Euclidean space, one can consider the (generalized) gradient of the distance function to S. We are interested in the study of this gradient, and especially in the (generalized) critical points Z(S) of the distance function, which play an important role in computational geometry, and in particular in the study of the Cech complex of S. In general, the set of critical points Z(S) can be poorly behaved; however, we show that Z(S) becomes very regular and stable when S=M is a generically embedded submanifold. More precisely, for any compact abstract manifold M, the set of embeddings of M into a Euclidean space such that their image satisfies our strong regularity and stability conditions is open and dense in the Whitney C2-topology. When those conditions are satisfied, the distance function to the embedded submanifold essentially behaves like a Morse function in terms of the homotopy type of its sublevels. In this talk, I will detail this result and explain the core ideas of the proof, including our application of Thom’s transversality theorem. In his talk on the 24th of January, my coauthor Vincent Divol will detail the consequences of these results in terms of diagrams of subsamples of a generic submanifold. 
 

24/01/2024 (11am CET)
Speaker: Vincent Divol (Université PSL)
Title: Wasserstein convergence of persistence diagrams on generic manifolds

Persistence diagrams (PDs) allow one to describe the topology of a point cloud in a multiscale fashion. When the point cloud is sampled on an m-dimensional submanifold M of R^d, the bottleneck stability theorem ensures that the the PD of the point cloud is close with respect to the bottleneck distance to the PD of the underlying manifold M. However, closeness with respect to this distance does not give any indication on the structure of the points in the PDs that are close to the diagonal. For a generic submanifold M, we are able to give a precise analysis on the shape of these points by providing laws of large numbers. This generalizes previous results by Wolfgang Polonik and me in the case of points sampled in the cube [0,1]^m. (Joint work with Charles Arnal and David Cohen-Steiner)


31/01/2024 (11am CET)
Speaker: Henrique Ennes (Inria Sophia Antipolis) and Raphaël Tinarrage (FGV EMAp)
Title: Detection of representation orbits of compact Lie groups on point clouds

Arguably, the most impactful realization in early last-century quantitative sciences was the combination of the loose notions of symmetry with the precise definition of algebraic groups. From important results in cryptography, passing through the description of the hydrogen atom and the standard model of particles, arriving at the formalization of Klein’s Erlanger program, or even visual psychology, the group-theoretic point of view on symmetries allowed for unprecedented developments in many theoretical and practical fields. In particular, in Machine Learning, there is a long history of interest in detecting actions of groups. For instance, identifying symmetries of the special Euclidean group SE(n) has been tackled since the 1980s for planar data, and more recently 3D objects. Other Lie groups have also been studied, Abelian or not. Once the information of acting Lie groups is determined, it can be used to increase the efficiency of other inference and machine learning tasks, through the use of equivariant algorithms. However, the methods of representation detection found in the literature do not address the question of estimating the exact representation type, that is, its decomposition into irreducible representations. Indeed, the representation is often found through its Lie algebra, which is approximated as if it were a linear subspace. Consequently, the information regarding the commutators is not exact, and the approximation may not be stable by Lie bracket. In this work, we tackle the question of projecting onto the closest Lie algebra, as raised by Cahill, Mixon, and Parshall, through employing tools from optimization on matrix manifolds. Namely, the manifolds involved are the Stiefel and Grassmannian of Lie subalgebras. We propose a careful analysis of our algorithm, allowing to obtain precise theoretical guarantees for the convergence of the optimization. By bridging the gap between compact Lie groups representations and symmetry detection in practice, our algorithm allows the reconstruction of the orbits, and the identification of the group that generates the action. Our algorithm is general for any compact Lie group, but we implement it and focus specifically on SO(2), T^d, SU(2), and SO(3). We illustrate its accuracy in the context of three data science problems, including image analysis, harmonic analysis, and classical mechanics systems.


08/02/2024 (11am CET)
Speaker: Eric Goubault (LIX)
Title: Directed homology and persistence modules

In this talk, we will try to give a self-contained account on a construction for a directed homology theory based on modules over algebras, linking it to both persistence homology and natural homology, that was originally proposed as a convenient directed homology theory, based on natural systems of modules. 

Persistence modules have been introduced originally for topological data analysis, where the data set seen at different «  resolutions »  is organized as a filtration of spaces. This has been further generalized to multidimensional persistence and « generalized » persistence, where a persistence module was defined to be any functor from a partially ordered set, or more generally a preordered set, to an arbitrary category (in general, a category of vector spaces). 

Directed topology has its roots in a very different application area, concurrency and distributed systems theory rather than data analysis. Its goal is to study (multi-dimensional) dynamical systems, discrete (precubical sets, for application to concurrency theory) or continuous time (differential inclusions, for e.g. applications in control), that appear in the study of multi-agent systems. In this framework, topological spaces are «  directed », meaning they have «  preferred directions », for instance a cone in the tangent space, if we are considering a manifold, or the canonical local coordinate system in a precubical set. Natural homology, an invariant for directed topology, defines a natural system of modules, a further categorical generalization of (bi)modules, describing the evolution of the standard (simplicial, or singular) homology of certain path spaces, along their endpoints. Indeed, this is, in spirit, similar to persistence homology. 

This talk will be concerned with a more « classical » construction of directed homology, mostly for precubical sets here, based on (bi)modules over (path) algebras, making it closer to classical homology with value in modules over rings, and of the techniques introduced for persistence modules. Still, this construction retains the essential information that natural homology is unveiling. Of particular interest will be the role of restriction and extension of scalars functors, that will be central to the discussion of relative homology and Mayer-Vietoris sequences. If time permits as well, we will discuss a Kunneth formula and some « tameness » issues, for dealing with practical calculations. 


13/03/2024 (10am CET)

Speaker: Alexandra Rören and Adrien Beaud (CRESS, Inserm)

Title: Exploring methods for shoulder movement quality assessment using IMUs, in individuals with Subacromial Pain Syndrome

 


20/03/2024 (1.30pm CET)

Speaker: Sara Kalisnik (ETH)

Title: Persistent Homology for Ellipsoid Complexes

(the first part is joint work with Davorin Lesnik, the second part is joint work with Bastian Rieck and Ana Zegarac)

A seminal result by Niyogi, Smale and Weinberger states that if a sample of a closed smooth submanifold of a Euclidean space is dense enough (relative to the reach of the manifold), there exists an interval of radii, for which the union of closed balls around sample points deformation retracts to the manifold. A tangent space is a good local approximation of a manifold, so we can expect that an object, elongated in the tangent direction, will better approximate the manifold than a ball. I will briefly review a result we proved with Davorin Lešnik that the union of ellipsoids of suitable size around sample points deformation retracts to the manifold while requiring much smaller density than in the case of union of balls. Then I will focus on work-in-progress with Ana Zegarac and Bastian Rieck on implementing ellipsoid complexes, and in particular, I will share some of the preliminary results.


20/03/2024 (3pm CET)

Speaker: Miguel O’Malley (MPI MiS)

Title: Alpha magnitude and dimension

(joint work with Sara Kalisnik and Nina Otter)
 
Magnitude, an isometric invariant of metric spaces, is known to bear rich connections to other desirable invariants, such as dimension, volume, and curvature. Connections between magnitude and persistent homology, a method to observe topological features in datasets, are well studied and fruitful. We leverage one such connection, persistent magnitude, to introduce alpha magnitude, a new invariant which bears many of the same properties of magnitude. We show in particular a strong connection to the Minkowski dimensions of compact subspaces of R^n and conjecture the connection exists in general.

 


Previously

2022-2023

15/06Laurent Oudre
25/05Christophe Pichon
04/05Clément Maria
06/04 08/06Clément Levrard
23/03 30/03Jae-Hun Jung
16/03Olympio Hacquard
09/03Umut Şimşekli
23/02Ximena Fernandez
16/02 11/05
Michael Ghil, Denisse Sciamarella
26/01Laure Ferraris
05/01Claire Brécheteau
17/11Wolfgang Polonik
10/11 8/12
Alice le Brigant

For the duration of the thematic trimester GESDA at IHP, the regular seminars were paused.

15/06 (11h00 CEST)
Speaker: Laurent Oudre
Title: Étude de la marche grâce à des centrales inertielles : du traitement du signal à l’analyse topologique des données

Cet exposé reviendra sur le projet SmartCheck, qui vise à quantifier et étudier la marche en consultation grâce à des centrales inertielles. Nous présenterons un panorama des différentes approches utilisées au fil des années pour analyser ces données et pouvoir comparer deux enregistrements à des fins de comparaison inter-indivuelle et de suivi longitudinal. Les premières méthodes, issues du traitement du signal visaient à localiser les événements d’intérêt et les changements de comportements, grâce à des techniques de reconnaissance de formes et de détection de ruptures. Plus récemment, nous nous sommes demandés si ces étapes étaient nécessaires, et si une approche plus globale issue de l’analyse topologique des données ne permettrait pas de construire une distance adaptée entre enregistrements.

25/05 (11h00 CEST)
Speaker: Christophe Pichon
Title: DisPerSE: automatic identification of persistent structures in 2 and 3D complexes

DisPerSE stands for Discrete Persistent Structures Extractor. Its main purpose is to identify persistent topological features such as peaks, voids, walls and in particular filamentary structures within sampled distributions in 2D and 3D. It was developed with observational and numerical cosmology in mind (for the study of the so-called comic web in the Universe). I will present a few interesting applications and related theoretical breakthroughs from astrophysics.

04/05 (11h00 CEST)
Speaker: Clément Maria
Title: Aspects of Algorithmic Quantum Topology

In this talk, I will introduce notions from quantum topology through the algorithmic and computational complexity lenses. Quantum topology uses tools from mathematical physics to design topological invariants of knots and 3-manifolds. Computationally, the algorithmic complexity of these invariants are in connection with the complexity class #P of counting problems, and the quantum complexity class BQP. More recently, efficient and often optimal (under #ETH) algorithms have been designed using methods from parameterized complexity, where, in this framework, parameters may be of combinatorial or topological nature. I will try to give an overview of the state of the art and the future challenges of the field.

06/04 08/06 (15h45 CEST)
Speaker: Clément Levrard
Title: Estimation optimale du Reach via estimation de métrique

Dans un contexte d’inférence géométrique, où le but est d’approcher le support d’une loi de probabilité P à partir d’un échantillon, le reach du support est un paramètre clé: il fournit une échelle locale en dessous de laquelle le support peut être considéré comme “plat”, et intervient de ce fait dans les vitesses d’estimation de support ainsi que dans les algorithmes de reconstruction effectifs de ce support.

Dans ce exposé je présenterai les différentes caractérisations de ce reach, ainsi que ses liens avec d’autres types d’échelles locales utilisées en géométrie algorithmique, certaines pouvant être estimées, d’autres non. Je conclurai en présentant les résultats obtenus avec Eddie Aamari et Clément Berenfeld, à savoir que la caractérisation métrique du reach (résultat relativement récent de Jean-Daniel Boissonnat, André Lieutier et Mathijs Wintraecken) fournit une voie d’accès “optimale” pour l’estimation du reach (au sens minimax).

23/03 30/03 (11h00 CEST)
Speaker: Jae-Hun Jung
Title: Topological data analysis of time-series data

Time-series data analysis is found in various applications that deal with sequential data over the given interval of, e.g. time. In this talk, we discuss time-series data analysis based on topological data analysis (TDA). The commonly used TDA method for time-series data analysis utilizes the embedding techniques such as sliding window embedding. With sliding window embedding the given data points are translated into the point cloud in the embedding space and the method of persistent homology is applied to the obtained point cloud. In this talk, we first show some examples of time-series data analysis with TDA. The first example is from music data for which the dynamic processes in time is summarized by low dimensional representation based on persistence homology. The second is the example of the gravitational wave detection problem and we will discuss how we concatenate the real signal and topological features. Then we will introduce our recent work of exact and fast multi-parameter persistent homology (EMPH) theory. The EMPH method is based on the Fourier transform of the data and the exact persistent barcodes. The EMPH is highly advantageous for time-series data analysis in that its computational complexity is as low as O(N log N) and it provides various topological inferences almost in no time. The presented works are in collaboration with Mai Lan Tran, Chris Bresten and Keunsu Kim.

16/03 (11h00 CET)
Speaker: Olympio Hacquard
Title: Statistical learning on measures: an application to persistence diagrams

We consider a binary supervised learning classification problem where instead of having data in a finite-dimensional Euclidean space, we observe measures on a compact space $\mathcal{X}$. Formally, we observe data $D_N = (\mu_1, Y_1), \ldots, (\mu_N, Y_N)$ where $\mu_i$ is a measure on $\mathcal{X}$ and $Y_i$ is a label in ${0, 1}$. Given a set $\mathcal{F}$ of functions on $\mathcal{X}$, we build corresponding classifiers in the space of measures, and we provide upper and lower bounds on the Rademacher complexity of this new class of classifiers that can be expressed simply in terms of corresponding quantities for the class $\mathcal{F}$. If the measures $\mu_i$ are uniform over a finite set, this classification task boils down to a multi-instance learning problem. However, our approach allows more flexibility and diversity in the input data we can deal with. While such a framework has many possible applications, this work strongly emphasizes on classifying data via topological descriptors called persistence diagrams. These objects are discrete measures on $\mathbb{R}^2$, where the coordinates of each point correspond to the range of scales at which a topological feature exists. We will present several classifiers on measures and show how they can heuristically and theoretically enable a good classification performance in various settings in the case of persistence diagrams.

09/03 (15h45 CET)
Speaker: Umut Şimşekli
Title: Generalization in SGD: Heavy-Tails and Fractal Structure

In this talk, we will first derive a generalization bound for SGD under the assumption that it can be well-modeled by a heavy-tailed stochastic differential equation. To do so, we will use tools from fractal geometry and geometric measure theory and make a connection between heavy-tails and generalization error through fractal geometry. Then, we will extend this framework by using the so called “persistent homology dimension”, which is a notion of fractal dimension that is defined through topological data analysis (TDA) notions. Thanks to this connection, we will show that generalization in neural networks can be efficiently computed by using TDA tools. This talk is based on the following papers: arXiv:2006.09313 and arXiv:2111.13171.

23/02 (11h00 CET)
Speaker: Ximena Fernandez
Title: Intrinsic persistent homology via density-based metric learning
Slides

In this talk, I will present a recent density-based method to address the problem of estimating topological features from data in high dimensional Euclidean spaces under the manifold assumption. The key of this approach is to consider a sample metric known as Fermat distance to robustly infer the homology of the space of data points . We prove that such sample metric space GH-converges almost surely to the manifold itself endowed with an intrinsic (Riemannian) metric that accounts for both the geometry of the manifold and the density that produces the sample. This fact, joint with the stability properties of persistent homology, implies the convergence of the associated persistence diagrams, which present advantageous properties. We show that they are robust to the presence of (geometric) outliers in the input data and less sensitive to the particular embedding of the underlying manifold in the ambient space. Finally, we exhibit concrete applications of these ideas to time series analysis, with examples in real data.

This is joint work with E. Borghini, G. Mindlin and P. Groisman, Intrinsic persistent homology via density-based metric learning, arXiv preprint arXiv: 2012.07621, 2020.

16/02 11/05 (11h00 CET)
Speaker: Michael Ghil, Denisse Sciamarella
Title: Chaos, stochasticité et topologie algébrique : nouvelles méthodes et leur application au climat

Abstract here.

26/01 (11h00 CET)
Speaker: Laure Ferraris
Title: Imiter la lumière pour l’analyse topologique des données

Dans cet exposé, il sera question de “metric learning”.
Dans un premier temps je présenterai la famille des distances dites de Fermat. Ces distances définissent un plus court chemin entre deux points qui, à la manière de la lumière, traverse différents milieux plus ou moins rapidement. Interprétant un sous-ensemble de R^D comme le support d’une mesure de probabilité, dans ce modèle, le chemin ne sort pas du support et est accéléré lorsqu’il traverse des régions à forte densité. Ainsi la distance s’adapte d’une part, à la géométrie et d’autre part, à la distribution de la masse. Cet effet est modulable par un paramètre dont le choix, délicat, peut dépendre du contexte d’application mais aussi de contraintes statistiques ou encore du problème de la complexité numérique.
J’introduirai la famille des estimateurs de la distance Fermat et les premiers résultats sur ses propriétés statistiques. Enfin, j’évoquerai les questions ouvertes qui occupent un travail de recherche en cours en collaboration avec Matthieu Jonckheere, Facundo Sapienza, Pablo Groisman, Frédéric Pascal et Frédéric Chazal.
Dans un second temps, en guise d’ouverture, j’évoquerai mon projet de recherche sous la direction de Frédéric Chazal: “L’introduction d’une nouvelle distance entre deux points la DTM-Fermat et ses applications en analyse topologique des données”.

05/01 (15h45 CET)
Speaker: Claire Brécheteau (Université de Nantes)
Title: Approcher des données par une union de boules ou d’ellipsoïdes et partitionnement

Dans cet exposé, il sera question de construire un proxy de le fonction distance à un compact, à partir d’un nuage de points générés sur ce compact, avec du bruit.
Ce proxy sera construit à partir d’un critère de type k-means, avec une divergence de Bregman. Ses sous-niveaux seront des unions de boules. Je présenterai l’utilisation de ce proxy à des fins de partitionnement de données.
Il s’agit de travaux publiés dans :
– Claire Brécheteau and Clément Levrard, A k-points-based distance for robust geometric inference. Bernoulli 2020, Vol. 26, No. 4, 3017-3050
– Claire Brécheteau and Aurélie Fischer and Clément Levrard, Robust Bregman Clustering. Annals of Statistics 2021, Vol. 49, No. 3, 1679-1701
– Claire Brécheteau, Robust anisotropic power-functions-based filtrations for clustering.
Symposium on Computational Geometry 2020, 23:1-23:15

17/11 (15h45 CET)
Speaker: Wolfgang Polonik (UC Davis)
Title: Multiscale Geometric Feature Extraction

A method for extracting multiscale geometric features from a data cloud is presented. Each pair of data points is mapped into a real-valued feature function, whose construction is based on geometric considerations and the novel notion of a distribution of (local) data depths. The collection of these feature functions is then being used for further data analysis. In contrast to the popular kernel-trick, our feature functions are functions of a one-dimensional parameter, and thus they can be used to visualize geometric aspects of a high-dimensional data cloud. Besides visualization, applications include classification and anomaly detection. The performance of the methodology is illustrated through applications to real data sets, and some theoretical guarantees supporting the performance of the proposed methodology are presented. This is joint work with G. Chandler.

10/11 (15h45 CET) Rescheduled to 08/12
Speaker: Alice Le Brigant
Title: Fisher information geometry of Dirichlet distributions

The Fisher information can be used to define a Riemannian metric to compare probability distributions inside a parametric family. The most well-known example is the case of (univariate) normal distributions, where the Fisher information induces hyperbolic geometry. In this talk we will investigate the Fisher information geometry of Dirichlet distributions, and beta distributions as a particular case. We show that it is negatively curved and geodesically complete. This guarantees the uniqueness of the notion of mean distribution, and makes it a suitable geometry to apply the K-means algorithm, e.g. to compare and classify histograms.

2021-2022

30/06Edouard Bonnet
23/06Théo Lacombe
16/06Quentin Mérigot & Mathijs Wintraecken & André Lieutier
02/06Dmitriy Morozov
19/05Thomas Bonis
28/04Jean Feydy & Anna Song
21/04Anthea Monod
07/04Marc Hoffman
10/03Jeremy Capitao
24/02Vadim Lebovici
17/02Nicolas Berkouk & Raphael Reinhauer
03/02Mathieu Carriere
13/01Olympio Hacquard
06/01Catherine Aaron
09/12Vadim Lebovici
25/11Eduardo Mendes
18/11Jonathan Spreer
14/10Marine le Morvan
07/10Etienne Lasalle
23/09Daniel Perez

30/06 (11h00 CEST)
Speaker: Édouard Bonnet
Title: The geometry of twin-width

We introduce and survey the key features of twin-width, and more generally, of the so-called contraction sequences, insisting on what is and what’s not geometric or topological about them.

Recording


23/06 (14h00 CEST)
Speaker: Théo Lacombe
Title: Transport Optimal : Régularisation Entropique et Homogénéité

Le problème de transport optimal, dont les origines remontent à Monge, fournit une métrique pour comparer des mesures de probabilités via un problème d’optimisation. Si ce problème possède de nombreuses propriétés théoriques importantes, sa résolution numérique devient rapidement lourde et sa limitation aux mesures de probabilités est contraignante.
Cet exposé proposera une introduction au problème de transport optimal, d’abord dans sa forme la plus classique, puis dans ses variantes modernes qui font son succès aujourd’hui notamment en apprentissage : la régularisation entropique et le transport déséquilibré. Dans un travail récent [1], T.Séjourné et ses co-auteurs proposent un modèle unifiant ces deux formalismes. Dans certains cas (non-standard), ce nouveau modèle fait apparaître des inhomogénéités qui peuvent mener à des incohérences numériques. Nous ferons la lumière sur ce phénomène et proposerons une correction possible pour obtenir un modèle de transport régularisé déséquilibré homogène, introduit dans [2].

[1] T Séjourné, J Feydy, FX Vialard, A Trouvé, G Peyré : Sinkhorn divergences for unbalanced optimal transport.
[2] TL. An Homogeneous Unbalanced Regularized Optimal Transport model with application to Optimal Transport with Boundary.

Recording


16/06 (15h45 CEST)
Speaker: Quentin Mérigot
Title: Convexité cachée pour le problème de la quantification optimale uniforme

En apprentissage automatique et en problèmes inverses, il est parfois nécessaire de générer (ou déformer) un nuage de points de sorte à approcher une mesure de probabilité modèle 𝜌. Une manière naturelle d’y parvenir est de chercher à minimiser la distance de Wasserstein de la mesure uniforme sur le nuage de points par rapport à la distribution modèle :

Ce problème de minimisation, dans lequel les inconnues sont les positions des atomes (et non leur masse), n’est pas convexe. Il admet des points critiques dont l’énergie est beaucoup plus grande que celle du minimiseur. Pourtant, dans la plupart des cas, une version adaptée de l’algorithme de Lloyd — une méthode de descente de gradient à pas constant — conduit à des configurations présentant une faible énergie. Nous expliquons quantitativement ce comportement, en montrant en particulier que si les points initiaux ne sont pas trop proches les uns des autres, alors une seule étape de l’algorithme de Lloyd est suffisante pour obtenir une bonne approximation de de 𝜌. Je parlerai également d’un résultat plus récent, qualitatif, montrant en dimension 𝑑=2 que l’énergie de quantification des points critiques stables est en réalité commensurable à
l’énergie du minimiseur. Cet exposé est basé sur des travaux en commun avec A. Figalli, F. Santambrogio et C. Sarrazin.

Recording


16/06 (11h20 CEST)
Speaker: André Lieutier
Title: Delaunay-like triangulation of smooth orientable submanifolds by least L1-norm minimization

We focus on one particular instance of the shape reconstruction problem, in which  the shape we wish to reconstruct is an orientable smooth submanifold of the Euclidean space. Assuming we have as input a simplicial complex K that approximates the submanifold (such as the Čech complex or the Rips complex), we recast the reconstruction problem as a least L1-norm minimization problem in which the optimization variable is a chain of K. Providing that K satisfies some reasonable conditions, we prove that the considered minimization problem has a unique solution, whose support is a triangulation of the manifold called the flat Delaunay complex. If the point sample on which the  Čech complex or the Rips complex is built satisfies similar « reasonable conditions », this triangulation is the tangential Delaunay complex introduced  by Boissonnat et al.. Since the objective is a weighted L1-norm and the contraints are linear, the triangulation process can thus be implemented by linear programming.

By Dominique Attali (CNRS)  and André Lieutier.


16/06 (10h30 CEST)
Speaker: Mathijs Wintraecken
Title: Optimal homotopy reconstruction results à la Niyogi, Smale, and Weinberger

We show that the proof of the homotopy reconstruction result by Niyogi, Smale, and Weinberger can be streamlined considerably using Federer’s work on the reach and several geometric observations. While Niyogi, Smale, and Weinberger restricted themselves to C² manifolds $(M)$ with positive reach, our proof extends to sets $(S)$ of positive reach.

Furthermore, the sample we consider does not have to lie directly on the set of positive reach. Instead, we assume that the two one-sided Hausdorff distances (δ from the set $S$ to the sample $P$ and ε the other Hausdorff distance) are bounded. We provide explicit bounds in terms of ε and δ, that guarantee that there exists a parameter $r$ such that the union of balls $\bigcup_{p\in P} B(p, r) $ deformation retracts to $S$. We provide even better bounds for the manifold case. In both cases, our bounds improve considerably on the state-of-the-art. In fact in all cases, except where the set is a one dimensional manifold (with boundary), our bounds are optimal.

Joint work with  Dominique Attali, Hana Dal Poz Kouřimská, Chris Fillmore, Ishika Ghosh, André Lieutier, and Elizabeth Stephenson.


02/06 (17h00 CEST)
Speaker: Dmitriy Morozov
Title: Topological Optimization with Big Steps

Using persistent homology to guide optimization has emerged as a novel application of topological data analysis. Existing methods treat persistence calculation as a black box and backpropagate gradients only onto the simplices involved in particular pairs. We show how the cycles and chains used in the persistence calculation can be used to prescribe gradients to larger subsets of the domain. In particular, we show that in a special case, which serves as a building block for general losses, the problem can be solved exactly in linear time. We present empirical experiments that show the practical benefits of our algorithm: the number of steps required for the optimization is reduced by an order of magnitude.
Joint work with Arnur Nigmetov.

Recording


19/05 (15h45)
Speaker: Thomas Bonis
Title: Sampling with Stein variational gradient descent

Many methods in Bayesian statistics require approximating intractable integrals. After presenting the usual methods used to deal with this issue, namely variational inference and Monte-Carlo approaches, along with their respective shortcomings, I will introduce a novel algorithm called Stein variational gradient descent and present known theoretical results regarding this algorithm.

Recording


28/04 (11h00 – 12h00)
Speaker: Jean Feydy
Title: Fast geometric libraries for vision and data sciences

From 3D point clouds to high-dimensional samples, sparse representations have a key position in the data science toolbox. They complement 2D images and 3D volumes effectively, enabling fast geometric computations for e.g. Gaussian processes and shape analysis.

In this talk, I will present extensions for PyTorch, NumPy, Matlab and R that speed up fundamental computations on (generalized) point clouds by several orders of magnitude, when compared with PyTorch or JAX GPU baselines. These software tools allow researchers to break through major computational bottlenecks in the field and have been downloaded more than 100k times over the last few years.
The presentation may be of interest to all researchers who deal with point clouds, time series and segmentation maps, with a special focus on:

1. Fast and scalable computations with (generalized) distance matrices.
2. Efficient and robust solvers for the optimal transport (= “Earth Mover’s”) problem.
3. Applications to shape analysis and geometric deep learning, with a case study on the “pixel-perfect” registration of lung vessel trees.

References:
“Geometric data analysis, beyond convolutions”
“Fast geometric learning with symbolic matrices”
KeOps library (geometric computations)
GeomLoss library (optimal transport)

28/04 (10h00 – 11h00)
Speaker: Anna Song
Title: Analyzing branching shapes with cycle matching and signed distance persistent homology.

Abstract in pdf.

Recording


07/04 (15h45 CET)
Speaker: Marc Hoffman
Title: Quelques questions d’inférence géométrique autour de la reconstruction d’une densité ou de son support

Nous ferons un point dans un exposé un peu généraliste sur quelques questions d’inférence géométrique. Etant donné l’observation de la réalisation d’un éŽchantillon d’une loi P de probabilité dont le support est une sous-variété M compacte de dimension d<D, où D est la dimension de l’espace euclidien ambiant, nous nous intéressons à plusieurs questions lorsque P est absolument continue par rapport à la forme volume de M: reconstruction minimax de M pour la distance de Hausdorff reconstruction de la “géométrie” de M à travers son laplacien, reconstruction de propriétes topologiques de M, comme son homologie. Nous préciserons certains résultats récents obtenus en collaboration avec C. Berenfeld, J. Harvey et K. Shankar sur (i) l’estimation du reach (au sens de Federer) de M, invariant qui combine courbure maximale et distance d’étranglement de la sous-variété et qui doit nécessairement être contrôlé en vue de l’identifiabilité du modèle statistique sous-jacent et (ii) l’estimation (adaptative) de la densité de P dans un contexte général, notamment lorsque M est inconnue.

Recording


31/03 (11h CET)  Postoponed to 21st of April
Speaker: Anthea Monod
Title: Approximating persistent homology for large datasets

Persistent homology is an important methodology from topological data analysis which adapts theory from algebraic topology to data settings and has been successfully implemented in many applications; it produces a summary in the form of a persistence diagram, which captures the shape and size of the data.  Despite its popularity, persistent homology is simply impossible to compute for very large datasets which prohibits its widespread use in many big data settings.  What can we do if we would like a representative persistence diagram for a very large dataset whose persistent homology we cannot compute due to size restrictions?  We adapt here the classical statistical method of bootstrapping, namely, drawing and studying smaller subsamples from the large dataset.  We show that the mean of the persistence diagrams of subsamples is a valid approximation of the persistence diagram of the large dataset.  Specifically, we show that the expected persistence diagram—taken as a persistence measure computed from the subsamples—is a consistent estimator of the true persistence diagram of the larger dataset.  Studying the bias-variance decomposition serves as a determiner for the number and size of subsamples.  Quantizing the persistence measure then gives us back persistence diagram representation.  We demonstrate our approach on synthetic and real data; furthermore, we give an example of the utility of our approach in a shape clustering problem where we are able to obtain accurate results with only 2% subsampled from the original dataset.

This is joint work with Yueqi Cao (Imperial College London).

Recording


10/03 (15h15 CET)
Speaker: Jeremie Capitao Miniconi
Title: Deconvolution of spherical data corrupted with unknown noise.

We consider the deconvolution problem for densities supported on a $d$-dimensional sphere with unknown center and unknown radius, in the situation where the distribution of the noise is unknown and without any other observations. We propose estimators of the radius, of the center, and of the density of the signal on the sphere that are proved consistent without further information. The estimator of the radius is proved to have almost parametric convergence rate for any dimension $d$. When $d=2$, the estimator of the density is proved to achieve the same rate of convergence over Sobolev regularity classes of densities as when the noise distribution is known.

This is a work in collaboration with Elisabeth Gassiat (LMO).

Recording


24/02 (11h00 CET)
Speaker: Vadim Lebovici
Title: Magnitude persistante

Dans cet exposé, je présenterai un invariant des modules de persistance introduit récemment par Govc et Hepworth et appelé la “magnitude persistante”. Simple à définir, la magnitude d’un module de persistance est la somme des (-1)^d . (e^-a – e^-b) pour chaque barre [a,b) du code-barres de degré homologique d. Cet invariant possède de nombreuses propriétés formelles, comme l’additivité par rapport aux suites exactes, la compatibilité à un certain produit tensoriel, ou une expression explicite dans le cas de filtrations de sous-niveaux. De plus, son lien avec la caractéristique d’Euler en fait un invariant plus rapide à calculer que le code-barres. Enfin, il se généralise naturellement aux modules multipersistants sans perdre les propriétés mentionnées ci-dessus. Si le temps le permet, j’expliquerai les liens de la magnitude persistante avec la magnitude des espaces métriques finis qui ont originellement motivé sa définition par Govc et Hepworth.

Recording


17/02 (11h00 CET)
Speaker: Raphael Reinauer
Title: Persformer: A Transformer Architecture for Topological Machine Learning

One of the main challenges of Topological Data Analysis (TDA) is to extract meaningful features from persistent diagrams directly usable by machine learning algorithms. Indeed, persistence diagrams are intrinsically (multisets of points in $\mathbb R^2$ and cannot be seen straightforwardly as vectors).
This talk introduces Persformer, the first Transformer neural network architecture that accepts persistence diagrams as input. The Persformer architecture significantly outperforms previous topological neural network architectures on classical synthetic benchmark datasets and has competitive performance on graph datasets.
Furthermore, we introduce an interpretability method to topological machine learning to provide meaningful insights into the importance of the topological features in the persistence diagram of classical datasets. We show that having an expressive model like the Persformer allows us to find meaningful vector representations of persistence diagrams and analyze them using our interpretability method.
We expect our topological interpretability method to be helpful as an exploratory tool to understand use cases where the shape of a dataset is essential, such as biology or material science.

Recording


03/02 (15h15 CET)
Speaker: Mathieu Carrière
Title: Statistical analysis of Mapper for stochastic and multivariate filters

Reeb spaces, as well as their discretized versions called Mappers, are common descriptors used in Topological Data Analysis, with plenty of applications in various fields of science, such as computational biology and data visualization, among others. The stability and quantification of the rate of convergence of the Mapper to the Reeb space has been studied a lot in recent works, focusing on the case where a scalar-valued filter is used for the computation of Mapper. On the other hand, much less is known in the multivariate case, when the codomain of the filter is Rp, and in the general case, when it is a general metric space (Z, dZ), instead of R. In this talk, I will introduce a slight modification of the usual Mapper construction and give risk bounds for estimating the Reeb space using this estimator. Our approach applies in particular to the setting where the filter function used to compute Mapper is also estimated from data, such as the eigenfunctions of PCA. I will also provide preliminary applications of this estimator in statistics and machine learning for different kinds of target filters.

This is joint work with Bertrand Michel.

Recording


13/01 (11h00 CET)
Speaker: Olympio Hacquard
Title: Topologically penalized regression on manifolds

We study a regression problem on a compact manifold M. In order to take advantage of the underlying geometry and topology of the data, the regression task is performed on the basis of the first several eigenfunctions of the Laplace-Beltrami operator of the manifold, that are regularized with topological penalties. The proposed penalties are based on the topology of the sub-level sets of either the eigenfunctions or the estimated function. The overall approach is shown to yield promising and competitive performance on various applications to both synthetic and real data sets. We also provide theoretical guarantees on the regression function estimates, on both its prediction error and its smoothness (in a topological sense). Taken together, these results support the relevance of our approach in the case where the targeted function is “topologically smooth”.

Recording


06/01 (at 15h15 CET!)
Speaker: Catherine Aaron
Title: Local convex hull for support, density and level set estimation

In “A local nearest-neighbor convex-hull construction of home ranges and utilization distributions” authors proposed an algorithm for “home ranges” and “utilization distributions” that can be translated into statistics as support and density estimation.

In a first part we give the properties of the associated support estimator when the support is “full dimensional” but also when the support is a lower dimensional manifold (with or without boundary).

A second part is dedicated to the density (and level set) estimation. When applying the local convex hull idea for that purpose the initial estimator is biased and this bias can be reduced. We will present the bias reduction method and show that it allows to have density estimator adapted to (unknown) compact support. An extension of the proposed estimator when the distribution is supported by a (unknown) lower dimensional manifold with or without boundary is also given.

Recording


09/12 (at 16h30 CET!)
Speaker: Vadim Lebovici
Title: Hybrid transforms of constructible functions

Euler calculus — integration of constructible functions with respect to the Euler characteristic — is of increasing interest in TDA. For instance, the (constructible) Radon transform has provided an answer to the following important question: are two subsets of R^n with same persistent homology in all degrees and for all height filtrations equal?

In this talk I will introduce integral transforms combining Lebesgue integration and Euler calculus. We will see that Lebesgue integration gives access to regularity results, while Euler calculus conveys topological information. For instance, we get mean formulae for these transforms in the context of sublevel-sets persistent homology for random filtrations.

Recording


25/11 (at 15h45 CET!)
Speaker: Eduardo Fonseca Mendes
Title: Generalized Information Criteria for Structured Sparse Models

Regularized M-estimators are widely used due to their ability of recovering a low-dimensional model in high-dimensional scenarios. Some recent efforts on this subject focused on creating a unified framework for establishing oracle bounds and deriving conditions for support recovery. Under this same framework, we propose a new Generalized Information Criteria that takes into consideration the sparsity pattern one wishes to recover. We obtain non-asymptotic model selection bounds and sufficient conditions for model selection consistency of the GIC. Furthermore, we show that one may use the GIC for selecting the regularization parameter in a way that the sequence of model subspaces contains the true model with probability converging to one. This allows practical use of the GIC for model selection in high-dimensional scenarios. We illustrate those conditions on examples including group LASSO generalized linear regression and low rank matrix regression.

It is joint work with a former student (Gabriel Pinto PhD Student @ Northwestern University).

Recording


18/11
Speaker: Jonathan Spreer
Title: An algorithm to compute the crosscap number of a knot

The crosscap number of a knot is the non-orientable counterpart of its
genus. It is defined as the minimum of one minus the Euler characteristic of
S, taken over all non-orientable surfaces S bounding the knot. Computing the
crosscap number of a knot is tricky, since normal surface theory – the usual
tool to prove computability of problems in 3-manifold topology, does not
deliver the answer “out-of-the-box”.

In this talk, I will review the strengths and weaknesses of normal surface
theory, focusing on why we need to work to obtain an algorithm to compute
the crosscap number. I will then explain the theorem stating that an
algorithm due to Burton and Ozlen can be used to give us the answer.

This is joint work with Jaco, Rubinstein, and Tillmann.

Recording


14/10
Speaker: Marine Le Morvan
Title: What’s a good imputation to predict with missing values ?

How to learn a good predictor on data with missing values ? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments confirm that joint imputation and regression through NeuMiss is better than various two step procedures in our experiments with finite number of samples.

Recording


07/10
Speaker: Etienne Lasalle
Title: Heat diffusion distance processes: a statistically founded method to analyze graph data sets.

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. This means that heat diffuses faster along edges with higher weights. Given initial conditions, one can use the way heat diffuses to compare graphs. But choosing a relevant and informative diffusion time is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. For that, we define real-valued processes indexed by all the diffusion times in [0, T] for some T > 0, namely the Heat Kernel Distance (HKD) process and the Heat Persistence Distance (HPD) process. Borrowing tools from TDA, the HPD process is able to compare graphs without known node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. Namely, we proved under mild assumptions that they verify a functional central limit theorem and admit a gaussian approximation. Moreover, I will present potential applications of these processes (the construction of confidence bands and two-sample tests, the study of neural networks through their activation graphs).

Recording


23/09
Speaker: Daniel Perez
Title: On zeta functions and their application to the study of the topology of superlevel sets of stochastic processes

We introduce the notion of stochastic zeta function, which for some classes of processes share many properties with their counterparts in analytic number theory. These zeta functions, defined in terms of the barcode, are related to a dual variable counting the number of bars in the bars in the barcode of length ε. A “prime number theorem” can then be proved for this dual variable in terms of the zeta function of the considered process.

Recording

2020-2021

01/04
Speaker: Pierre Baudot
Title: Cohomological characterization of Information and Higher Order Statistical Structures – Machine Learning and Statistical Physic Aspects

We will visit methods that quantify the statistical interactions structure within a given data set using the characterization of information theory in cohomology, and provide their expression in term of statistical physic and machine learning. In a first part, we will have a look at the formalism of Information Cohomology obtained with Bennequin and extended by Vigneaux to Tsallis entropies constructed on complexes of random variables. It allows to characterize entropy uniquely as the first class of the cohomology and Multivariate Mutual Informations as coboundary.
In a second part, we will have a look at the application of the computationally tractable simplicial subcase to genetic expression and standard dataset in machine learning. Mutual statistical independence is equivalent to the vanishing of all k-MI (Ik=0), leading to the conclusion that the Ik provide refined meepiasures of statistical dependencies and that the cohomology quantifies the obstruction to statistical factorization. The application to genetic expression and cell-type classification recovers the known differential expressions and co-regulations of genetic modules. The complex has a direct interpretation in terms of unsupervised and supervised deep learning where the neural network architecture is given by the chain complex and backpropagation becomes forward (cohomological). Some preliminary examples of applications are given on classical datasets.


25/03
Speaker: John Harvey
Title: Estimating the reach of a submanifold

The reach is an important geometric invariant of submanifolds of Euclidean space. It is a real-valued global invariant incorporating information about the second fundamental form of the embedding and the location of the first critical point of the distance from the submanifold. In the subject of geometric inference, the reach plays a crucial role. I will give a new method of estimating the reach of a submanifold, developed jointly with Clément Berenfeld, Marc Hoffmann and Krishnan Shankar. This results in improved convergence rates, but a minimax optimal estimator remains to be found.


11/03
Speaker: Amit Moscovich
Title: Nonparametric estimation of high-dimensional shape spaces with applications to structural biology

Over the last twenty years, there have been major advances in non-linear dimensionality reduction, or manifold learning, and nonparametric regression of high-dimensional datasets with low intrinsic dimensionality. A key idea in this field is the use of data-dependent Fourier-like basis vectors given by the eigenvectors of a graph Laplacian. In this talk, I will discuss the application of these methods for mapping spaces of volumetric shapes with continuous motion. Three lines of research will be presented:

(i) High-dimensional nonparametric estimation of distributions of volumetric signals from noisy linear measurements.

(ii) Leveraging the Wasserstein optimal transport metric for manifold learning and clustering.

(iii) Non-linear independent component analysis for analyzing independent motions.

A key motivation for this work comes from structural biology, where breakthrough advances in cryo-electron microscopy have led to thousands of atomic-resolution reconstructions of various proteins in their native states. However, the success of this field has been mostly limited to the solution of the inverse problem aimed at recovering a single rigid structure, while many important macromolecules contain several parts that can move in a continuous fashion, thus forming a manifold of conformations. The methods described in this talk present progress towards the solution of this grand challenge, namely the extension of reconstruction methods which estimate a single 3D structure to methods that recover entire manifolds of conformations.


18/02
Speaker: Bastian Rieck
Title: Topological Representation Learning for Structured and Unstructured Data

Topological data analysis emerged as an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of novel algorithms that incorporate topological characteristics. This talk will focus on learning appropriate topological representations for unstructured data sets, such as point clouds, and structured data sets, such as graphs. Specifically, we will discuss how to use topological data analysis to obtain topologically faithful embeddings. We will also study graph classification, which is of particular interest since graphs are inherently amenable to a topological description in terms of their connected components and cycles. Next to describing ‘deep’ approaches for such tasks, this talk will also build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ and ‘aggregation’ functions, which can be learned end-to-end.


28/01
Speaker: Ery Arias-Castro
Title: On Using Graph Distances to Estimate Euclidean and Related Distances

Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances, and as such have been used to embed a graph (the multidimensional scaling problem). The talk will include a partial review of the literature and then present more recent developments, including on the minimax estimation of distances on a surface and consequences for manifold learning, on the estimation of curvature-constrained distances on a surface, and on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph.


21/01
Speaker: Antonin Guilloux and Guillaume Moroz
Title: Visualisation d’une fibration du complémentaire du noeud de huit

Le complémentaire du noeud de huit dans la sphère S^3 est une variété de dimension 3 dont on sait qu’elle est homéomorphe à un fibré en tores épointés sur le cercle. De manière analogue aux fibrations de Milnor, nous utilisons une paramétrisation du noeud comme lieu des zéros de deux polynômes réels pour décrire explicitement dans R^3 les fibres. Chaque fibre est alors une hypersurface lisse à bord le noeud dans R^3, définie par l’annulation d’un polynôme à coefficients réels. Nous pouvons alors calculer un mesh pour suffisamment de ces surfaces et les représenter.

Cela mène à une visualisation, sous forme de film, de cette fibration et d’une courbe significative mathématiquement. Nous présenterons les diverses étapes de ce travail.


07/01
Speaker: Owen Rouillé
Title: Computation of Large Asymptotics of 3-Manifold Quantum Invariants

Quantum topological invariants have played an important role in computational topology, and they are at the heart of major modern mathematical conjectures. In this article, we study the experimental problem of computing large r values of Turaev-Viro invariants TV_r. We base our approach on an optimized backtracking algorithm, consisting of enumerating combinatorial data on a triangulation of a 3-manifold. We design an easily computable parameter to estimate the complexity of the enumeration space, based on lattice point counting in polytopes, and show experimentally its accuracy. We apply this parameter to a preprocessing strategy on the triangulation, and combine it with multi-precision arithmetics in order to compute the Turaev-Viro invariants. We finally study the improvements brought by these optimizations compared to state-of-the-art implementations, and verify experimentally Chen and Yang’s volume conjecture on a census of closed 3-manifold.


17/12
Speaker: Vincent Divol
Title: Density estimation on manifolds: an optimal transport approach

Density estimation is one of the most classical problem in nonparametric statistics: given i.i.d. samples X_1,…,X_n from a distribution \mu with density f on R^D, the goal is to reconstruct the underlying density (say for instance for the L_p norm). This problem is known to become untractable in high dimension D >> 1. We propose to overcome this issue by assuming that the distribution \mu is actually supported around a low dimensional unknown shape M, of dimension d << D. After showing that this problem is degenerate for a large class of standard losses (L_p, total variation, etc.), we focus on the Wasserstein loss, for which we build a minimax estimator, based on kernel density estimation, whose rate of convergence depends on d, and on the regularity of the underlying density, but not on the ambient dimension D.


03/12
Speaker: Serguei Barannikov
Title: Canonical forms = Persistence diagrams.

Any filtered complex over a field F can be brought by a linear transformation preserving the filtration to so called canonical form, a canonically defined direct sum of filtered complexes of two types: one-dimensional complexes with trivial differential d(e_{t_i})=0 and two-dimensional complexes with trivial homology d(e_{s_j})=e_{r_j}. The talk is devoted to the proof of this theorem, that was first published in the speaker’s 1994 paper “Framed Morse complex and its invariants”, AMS, Advances in Soviet Mathematics 21: 93–115. In that work these canonical form invariants were applied to Morse complexes which calculate homology of sublevel sets. Starting from middle of 2000s these invariants became widely popular in Applied Mathematics under the name of “persistence diagrams” and “persistence barcodes”. The mentioned classification theorem is usually referred in Applied Mathematics as the Persistent homology Main (or Structure, or Principal) Theorem. It is curious that the first dissertation in Applied Mathematics on the subject of these persistence invariants, which appeared in 2001 with parts published in 2000, also considered as principal example of filtered complexes exactly the same Morse complexes, in the particular case of two dimensions. It also had a big interesting part devoted to “canonical bases”. Many other filtered complexes are nowadays considered (Cech, simplicial, cubical etc) for sublevel homology of a function on topological space. Most often, the filtered complexes describing sublevel homology of the distance to a point-cloud set function are considered over last years. The algorithms of more than 10 different software platforms, that exist actually for computation of persistence diagrams, are based on the described in the mentioned 1994 paper algorithm bringing filtered complexes to the canonical form. After the proof of this theorem, final part of the talk is devoted to some applications of this theorem in pure mathematics, in particular in algebraic topology and differential geometry.


26/11
Speaker: Wojciech Reise
Title: Towards Pattern Detection using Dynamic Time Warping

This talk will be centered around pattern detection and Dynamic Time Warping. Using the problem of velocity estimation from magnetic data in navigation systems, I will motivate the need for detecting repeated patterns in time series and the use of Dynamic Time Warping to that aim. I will recall the definition as the technique was originally introduced, and present a selection of more recent variants, including a differentiable version.


12/11
Speaker: Nina Otter
Title: Magnitude homology: overview and current directions

Magnitude is an isometric invariant of metric spaces that was introduced by Tom Leinster in 2010, and is currently the object of intense research, since it has been shown to encode many invariants of a metric space such as volume, dimension and capacity. Magnitude homology is a homology theory for metric spaces that was introduced by Hepworth-Willerton and Leinster-Shulman, and categorifies magnitude in a similar way as the singular homology of a topological space categorifies its Euler characteristic.

In this talk I will first introduce magnitude and magnitude homology. I will then give an overview of existing results and current research in this area, explain how magnitude homology is related to persistent homology, and finally discuss some open questions and work in progress.


05/11
Speaker: Mathieu Carriere
Title: (Multiparameter) Persistent Homology for Quantitative Multiplex Immunofluorescence Images.

In this talk, I will present a novel application of persistent homology to characterize the spatial arrangement of immune and epithelial (tumor) cells within the breast cancer immune microenvironment. More specifically, quantitative and robust characterizations are built by computing persistence diagrams out of a staining technique (quantitative multiplex immunofluorescence) which allows to obtain spatial coordinates and stain intensities on individual cells. The resulting persistence diagrams are evaluated as characteristic biomarkers of cancer subtype and prognostic biomarker of overall survival, and outperform and complement the usual descriptors which capture spatial relationships with nearest neighbor analysis. I will also present an extension of persistence diagrams that we used to capture the behaviour of multiple stain intensities at once, and which is based on multiparameter persistence.


29/10
Speaker: Kristof Huszar
Title: Combinatorial width parameters for 3-manifolds

Algorithms in computational 3-manifold topology typically take a triangulation as an input and return topological information about the underlying 3-manifold. However, extracting the desired information from a triangulation (e.g., evaluating an invariant) is often computationally very expensive. In recent years this complexity barrier has been successfully tackled in some cases by importing ideas from the theory of parameterized algorithms into the realm of 3-manifolds. In this talk we focus on the key combinatorial parameter in this context (i.e., the treewidth of a compact, orientable 3-manifold), and relate it with classical topological invariants (e.g., Heegaard genus) in a quantitative way.

Joint work with Jonathan Spreer and Uli Wagner.


22/10
Speaker: Olympio Hacquard
Title: Regression on the Laplace Beltrami operator eigenfunctions using a topological penalty

We study a regression setting on a compact Riemannian manifold without boundary. We observe a n-sample (x1, y1), … (xn, yn) where the xi lie on the manifold and the yi are real numbers. The purpose is to estimate the regression function in order to predict the value when given a new point on the manifold. The regression is performed on the first eigenfunctions of the Laplace Beltrami operator. Since these functions are hard to estimate, we use a graph approximation of the manifold and replace the spectrum of the Laplace Beltrami operator by the spectrum of the Laplacian of the graph. We will discuss some of the problems that arise when going from a continuous problem to a discrete one. As the Laplacian eigenfunctions are oscillating, they tend to overfit the data and it is necessary to introduce a penalty to the model. The novelty of the method described here is the introduction of a topological penalty, which is equivalent to performing a Lasso weighted by the topological persistence of each eigenfunction, based on the heuristic idea that the functions that oscillate a lot have a high persistence. This method seems to provide promising experimental results to predict new data when given a noisy input and finding theoretical guarantees on the performance of the estimator is still under progress.


15/10
Speaker: Vincent Rouvreau
Title: What is new in Gudhi 3.3.0 ?

In this talk, I will present Edge Collapse and Persistence of Flag Complexes (https://drops.dagstuhl.de/opus/volltexte/2020/12177) and how to use it with Gudhi. Then I will show benchmark results and my conclusions on which algorithm to use in function of the data characteristics.
And finally, we will play with ToMaTo (https://dl.acm.org/doi/10.1145/2535927), now it is integrated in the Gudhi library.


08/10
Speaker: Raphaël Tinarrage
Title: Topological inference from measures and vector bundles (defense rehearsal)

We contribute to the theory of topological inference, based on the theory of persistent homology, by proposing three families of filtrations. For each of them, we prove consistency results—that is, quality of approximation of an underlying geometric object—, and stability results—that is, robustness against initial measurement errors. We propose concrete algorithms in order to use these methods in practice. The first family, the DTM-filtrations, is a robust alternative to the usual Cech filtration when the point cloud is noisy or contains anomalous points. It is based on the notion of distance to measure, which allows to obtain stability in the sense of the Wasserstein distance. Secondly, we propose the lifted filtrations, which make it possible to estimate the homology of immersed manifolds, even when their reach is zero. We introduce the notion of normal reach, and show that it allows to control geometric quantities associated to the manifold. We study the estimation of tangent spaces by local covariance matrices. Thirdly, we develop a theoretical framework for vector bundle filtrations, and define the persistent Stiefel-Whitney classes. We show that the persistent Stiefel-Whitney classes associated to the Cech bundle filtrations are Hausdorff-stable and consistent. To allow their algorithmic implementation, we introduce the notion of weak star condition.


01/10
Speaker: Daniel Perez
Title: Trees and the persistent homology of Markov processes

The TDA community has long been concerned with probabilistic aspect of persistence diagrams. An interesting question is : given a manifold M, what does the barcode of a random function on M look like ? We will give an answer to this question for the simple case of 1D, with the assumption that the processes are Markov. This covers a relatively wide range of well-studied processes such as Itô processes and Brownian motion. To study this question, we depart from the traditional persistent homology setting, by establishing a functorial link between diagrams and trees and studying metric invariants of the latter. These turn out to be related to the almost sure regularity of the processes studied, and if the process is Lévy, to the self-similarity of the process. In particular, we will give the law of the number of bars greater than epsilon and establish some asymptotic results concerning this random variable. We will illustrate these results by examining the particular case of Brownian motion, for which we have an almost full description of the barcode.


17/09
Speaker: Nicolas Berkouk
Title: Persistence and Sheaves: from Theory to Applications (defense rehearsal)

Topological data analysis is a recent field of research aiming at using techniques coming from algebraic topology to define descriptors of datasets.To be useful in practice, these descriptors must be computable, and coming with a notion of metric, in order to express their stability properties with respect to the noise that always comes with real world data. Persistance theory was elaborated in the early 2000’s as a first theoretical setting to define such descriptors – the now famous so-called barcodes. However very well suited to be implemented in a computer, persistence theory has certain limitations. In this work, we establish explicit links between the theory of derived sheaves equipped with the convolution distance (after Kashiwara-Schapira) and persistence theory in order to enjoy best of both’s worlds: computer friendliness of persistence in the one hand, theoretical deepness of sheaf theory on the other one.

10/09
Speaker: Barak Sober
Title: The Manifold Moving Least-Squares (Manifold-MLS) Framework: estimating manifolds and reconstructing their atlas from high-dimensional point clouds

Differentiable manifolds are an indispensable ‘language’ in modern physics and mathematics. As such, there are a plethora of analytic tools designed to investigate manifold-based models (e.g., connections, differential forms, curvature tensors, parallel transport, bundles). However, in order to facilitate these tools, one normally assumes access to the manifold’s atlas of charts (i.e., local parametrizations). In recent years, manifold-based modeling has permeated into data analysis as well, usually in order to avoid working in high dimensions. However, in these data-driven models, charts are not accessible and the only information at hand is the samples themselves. As a result, a common practice in Manifold Learning is to project the data into a lower-dimensional Euclidean domain, while maintaining some notion of distance (e.g., geodesic or diffusion).

In this talk, we introduce an alternative approach named the Manifold Moving Least-Squares (Manifold-MLS) that, given a finite set of samples, reconstructs an atlas of charts and provides an approximation of the manifold itself. Under certain (non-restrictive) sampling assumptions, we prove that the Manifold-MLS produces a smooth Riemannian manifold approximating the sampled one, even in case of noisy samples. We show that the approximation converges to the sampled manifold in case the number of samples tends to infinity, and show the exact convergence rates. Furthermore, we show that in the case of clean samples we obtain a high order of approximation (up to the degree of smoothness) for both the manifold and geodesic distances (i.e., the procedure we describe is nearly isometric when the resolution is sufficiently fine).


03/09
Speaker: Théo Lacombe
Title: Statistical tools for topological descriptors using optimal transport (defense rehearsal)

Topological data analysis (TDA) allows one to extract rich information from structured data (such as graphs or time series) that occurs in modern machine learning problems. This information will be represented as descriptors such as persistence diagrams, which can be described as point measures supported on a half-plane. While persistence diagrams are not elements of a vector space, they can still be compared using partial matching metrics. The similarities between these metrics and those routinely used in optimal transport—another field of mathematics—are known for long, but a formal connection between these two fields is yet to come. The purpose of this thesis is to clarify this connection and develop new theoretical and computational tools to manipulate persistence diagrams, targeting statistical applications. First, we show how optimal partial transport with boundary, a variation of classic optimal transport theory, provides a formalism that encompasses standard metrics in TDA. We then showcase the benefits of this connection in different situations: a theoretical study and the development of an algorithm to perform fast estimation of barycenters of persistence diagrams, the characterization of continuous linear representations of persistence diagrams and how to learn such representations using a neural network, and eventually a stability result in the context of linearly averaging random persistence diagrams.

Previously

For Saclay
For Sophia

Comments are closed.