DataShape Seminar

Weekly seminars

 

For any questions, please contact Antoine Commaret, David Loiseaux, Daniel Perez or Wojciech Reise. The talks are transmitted online and recorded. The recordings of talks from previous years can be found on our YouTube Channel.

30/06 Edouard Bonnet
23/06 Théo Lacombe
16/06 Quentin Mérigot & Mathijs Wintraecken & André Lieutier
02/06 Dmitriy Morozov
19/05 Thomas Bonis
28/04 Jean Feydy & Anna Song
21/04 Anthea Monod
07/04 Marc Hoffman
10/03 Jeremy Capitao
24/02 Vadim Lebovici
17/02 Nicolas Berkouk & Raphael Reinhauer
03/02 Mathieu Carriere
13/01 Olympio Hacquard
06/01 Catherine Aaron
09/12 Vadim Lebovici
25/11 Eduardo Mendes
18/11 Jonathan Spreer
14/10 Marine le Morvan
07/10 Etienne Lasalle
23/09 Daniel Perez

 

30/06 (11h00 CEST)
Speaker: Édouard Bonnet
Title: The geometry of twin-width

We introduce and survey the key features of twin-width, and more generally, of the so-called contraction sequences, insisting on what is and what’s not geometric or topological about them.


23/06 (14h00 CEST)
Speaker: Théo Lacombe
Title: Transport Optimal : Régularisation Entropique et Homogénéité

Le problème de transport optimal, dont les origines remontent à Monge, fournit une métrique pour comparer des mesures de probabilités via un problème d’optimisation. Si ce problème possède de nombreuses propriétés théoriques importantes, sa résolution numérique devient rapidement lourde et sa limitation aux mesures de probabilités est contraignante.
Cet exposé proposera une introduction au problème de transport optimal, d’abord dans sa forme la plus classique, puis dans ses variantes modernes qui font son succès aujourd’hui notamment en apprentissage : la régularisation entropique et le transport déséquilibré. Dans un travail récent [1], T.Séjourné et ses co-auteurs proposent un modèle unifiant ces deux formalismes. Dans certains cas (non-standard), ce nouveau modèle fait apparaître des inhomogénéités qui peuvent mener à des incohérences numériques. Nous ferons la lumière sur ce phénomène et proposerons une correction possible pour obtenir un modèle de transport régularisé déséquilibré homogène, introduit dans [2].

[1] T Séjourné, J Feydy, FX Vialard, A Trouvé, G Peyré : Sinkhorn divergences for unbalanced optimal transport.
[2] TL. An Homogeneous Unbalanced Regularized Optimal Transport model with application to Optimal Transport with Boundary.

Recording


16/06 (15h45 CEST)
Speaker: Quentin Mérigot
Title: Convexité cachée pour le problème de la quantification optimale uniforme

En apprentissage automatique et en problèmes inverses, il est parfois nécessaire de générer (ou déformer) un nuage de points de sorte à approcher une mesure de probabilité modèle 𝜌. Une manière naturelle d’y parvenir est de chercher à minimiser la distance de Wasserstein de la mesure uniforme sur le nuage de points par rapport à la distribution modèle :

Ce problème de minimisation, dans lequel les inconnues sont les positions des atomes (et non leur masse), n’est pas convexe. Il admet des points critiques dont l’énergie est beaucoup plus grande que celle du minimiseur. Pourtant, dans la plupart des cas, une version adaptée de l’algorithme de Lloyd — une méthode de descente de gradient à pas constant — conduit à des configurations présentant une faible énergie. Nous expliquons quantitativement ce comportement, en montrant en particulier que si les points initiaux ne sont pas trop proches les uns des autres, alors une seule étape de l’algorithme de Lloyd est suffisante pour obtenir une bonne approximation de de 𝜌. Je parlerai également d’un résultat plus récent, qualitatif, montrant en dimension 𝑑=2 que l’énergie de quantification des points critiques stables est en réalité commensurable à
l’énergie du minimiseur. Cet exposé est basé sur des travaux en commun avec A. Figalli, F. Santambrogio et C. Sarrazin.

Recording


16/06 (11h20 CEST)
Speaker: André Lieutier
Title: Delaunay-like triangulation of smooth orientable submanifolds by least L1-norm minimization

We focus on one particular instance of the shape reconstruction problem, in which  the shape we wish to reconstruct is an orientable smooth submanifold of the Euclidean space. Assuming we have as input a simplicial complex K that approximates the submanifold (such as the Čech complex or the Rips complex), we recast the reconstruction problem as a least L1-norm minimization problem in which the optimization variable is a chain of K. Providing that K satisfies some reasonable conditions, we prove that the considered minimization problem has a unique solution, whose support is a triangulation of the manifold called the flat Delaunay complex. If the point sample on which the  Čech complex or the Rips complex is built satisfies similar « reasonable conditions », this triangulation is the tangential Delaunay complex introduced  by Boissonnat et al.. Since the objective is a weighted L1-norm and the contraints are linear, the triangulation process can thus be implemented by linear programming.

By Dominique Attali (CNRS)  and André Lieutier.


16/06 (10h30 CEST)
Speaker: Mathijs Wintraecken
Title: Optimal homotopy reconstruction results à la Niyogi, Smale, and Weinberger

We show that the proof of the homotopy reconstruction result by Niyogi, Smale, and Weinberger can be streamlined considerably using Federer’s work on the reach and several geometric observations. While Niyogi, Smale, and Weinberger restricted themselves to C² manifolds $(M)$ with positive reach, our proof extends to sets $(S)$ of positive reach.

Furthermore, the sample we consider does not have to lie directly on the set of positive reach. Instead, we assume that the two one-sided Hausdorff distances (δ from the set $S$ to the sample $P$ and ε the other Hausdorff distance) are bounded. We provide explicit bounds in terms of ε and δ, that guarantee that there exists a parameter $r$ such that the union of balls $\bigcup_{p\in P} B(p, r) $ deformation retracts to $S$. We provide even better bounds for the manifold case. In both cases, our bounds improve considerably on the state-of-the-art. In fact in all cases, except where the set is a one dimensional manifold (with boundary), our bounds are optimal.

Joint work with  Dominique Attali, Hana Dal Poz Kouřimská, Chris Fillmore, Ishika Ghosh, André Lieutier, and Elizabeth Stephenson.


02/06 (17h00 CEST)
Speaker: Dmitriy Morozov
Title: Topological Optimization with Big Steps

Using persistent homology to guide optimization has emerged as a novel application of topological data analysis. Existing methods treat persistence calculation as a black box and backpropagate gradients only onto the simplices involved in particular pairs. We show how the cycles and chains used in the persistence calculation can be used to prescribe gradients to larger subsets of the domain. In particular, we show that in a special case, which serves as a building block for general losses, the problem can be solved exactly in linear time. We present empirical experiments that show the practical benefits of our algorithm: the number of steps required for the optimization is reduced by an order of magnitude.
Joint work with Arnur Nigmetov.

Recording


19/05 (15h45)
Speaker: Thomas Bonis
Title: Sampling with Stein variational gradient descent

Many methods in Bayesian statistics require approximating intractable integrals. After presenting the usual methods used to deal with this issue, namely variational inference and Monte-Carlo approaches, along with their respective shortcomings, I will introduce a novel algorithm called Stein variational gradient descent and present known theoretical results regarding this algorithm.

Recording


28/04 (11h00 – 12h00)
Speaker: Jean Feydy
Title: Fast geometric libraries for vision and data sciences

From 3D point clouds to high-dimensional samples, sparse representations have a key position in the data science toolbox. They complement 2D images and 3D volumes effectively, enabling fast geometric computations for e.g. Gaussian processes and shape analysis.

In this talk, I will present extensions for PyTorch, NumPy, Matlab and R that speed up fundamental computations on (generalized) point clouds by several orders of magnitude, when compared with PyTorch or JAX GPU baselines. These software tools allow researchers to break through major computational bottlenecks in the field and have been downloaded more than 100k times over the last few years.
The presentation may be of interest to all researchers who deal with point clouds, time series and segmentation maps, with a special focus on:

1. Fast and scalable computations with (generalized) distance matrices.
2. Efficient and robust solvers for the optimal transport (= “Earth Mover’s”) problem.
3. Applications to shape analysis and geometric deep learning, with a case study on the “pixel-perfect” registration of lung vessel trees.

References:
“Geometric data analysis, beyond convolutions”
“Fast geometric learning with symbolic matrices”
KeOps library (geometric computations)
GeomLoss library (optimal transport)

28/04 (10h00 – 11h00)
Speaker: Anna Song
Title: Analyzing branching shapes with cycle matching and signed distance persistent homology.

Abstract in pdf.

Recording


07/04 (15h45 CET)
Speaker: Marc Hoffman
Title: Quelques questions d’inférence géométrique autour de la reconstruction d’une densité ou de son support

Nous ferons un point dans un exposé un peu généraliste sur quelques questions d’inférence géométrique. Etant donné l’observation de la réalisation d’un éŽchantillon d’une loi P de probabilité dont le support est une sous-variété M compacte de dimension d<D, où D est la dimension de l’espace euclidien ambiant, nous nous intéressons à plusieurs questions lorsque P est absolument continue par rapport à la forme volume de M: reconstruction minimax de M pour la distance de Hausdorff reconstruction de la “géométrie” de M à travers son laplacien, reconstruction de propriétes topologiques de M, comme son homologie. Nous préciserons certains résultats récents obtenus en collaboration avec C. Berenfeld, J. Harvey et K. Shankar sur (i) l’estimation du reach (au sens de Federer) de M, invariant qui combine courbure maximale et distance d’étranglement de la sous-variété et qui doit nécessairement être contrôlé en vue de l’identifiabilité du modèle statistique sous-jacent et (ii) l’estimation (adaptative) de la densité de P dans un contexte général, notamment lorsque M est inconnue.

Recording


31/03 (11h CET)  Postoponed to 21st of April
Speaker: Anthea Monod
Title: Approximating persistent homology for large datasets

Persistent homology is an important methodology from topological data analysis which adapts theory from algebraic topology to data settings and has been successfully implemented in many applications; it produces a summary in the form of a persistence diagram, which captures the shape and size of the data.  Despite its popularity, persistent homology is simply impossible to compute for very large datasets which prohibits its widespread use in many big data settings.  What can we do if we would like a representative persistence diagram for a very large dataset whose persistent homology we cannot compute due to size restrictions?  We adapt here the classical statistical method of bootstrapping, namely, drawing and studying smaller subsamples from the large dataset.  We show that the mean of the persistence diagrams of subsamples is a valid approximation of the persistence diagram of the large dataset.  Specifically, we show that the expected persistence diagram—taken as a persistence measure computed from the subsamples—is a consistent estimator of the true persistence diagram of the larger dataset.  Studying the bias-variance decomposition serves as a determiner for the number and size of subsamples.  Quantizing the persistence measure then gives us back persistence diagram representation.  We demonstrate our approach on synthetic and real data; furthermore, we give an example of the utility of our approach in a shape clustering problem where we are able to obtain accurate results with only 2% subsampled from the original dataset.

This is joint work with Yueqi Cao (Imperial College London).

Recording


10/03 (15h15 CET)
Speaker: Jeremie Capitao Miniconi
Title: Deconvolution of spherical data corrupted with unknown noise.

We consider the deconvolution problem for densities supported on a $d$-dimensional sphere with unknown center and unknown radius, in the situation where the distribution of the noise is unknown and without any other observations. We propose estimators of the radius, of the center, and of the density of the signal on the sphere that are proved consistent without further information. The estimator of the radius is proved to have almost parametric convergence rate for any dimension $d$. When $d=2$, the estimator of the density is proved to achieve the same rate of convergence over Sobolev regularity classes of densities as when the noise distribution is known.

This is a work in collaboration with Elisabeth Gassiat (LMO).

Recording


24/02 (11h00 CET)
Speaker: Vadim Lebovici
Title: Magnitude persistante

Dans cet exposé, je présenterai un invariant des modules de persistance introduit récemment par Govc et Hepworth et appelé la “magnitude persistante”. Simple à définir, la magnitude d’un module de persistance est la somme des (-1)^d . (e^-a – e^-b) pour chaque barre [a,b) du code-barres de degré homologique d. Cet invariant possède de nombreuses propriétés formelles, comme l’additivité par rapport aux suites exactes, la compatibilité à un certain produit tensoriel, ou une expression explicite dans le cas de filtrations de sous-niveaux. De plus, son lien avec la caractéristique d’Euler en fait un invariant plus rapide à calculer que le code-barres. Enfin, il se généralise naturellement aux modules multipersistants sans perdre les propriétés mentionnées ci-dessus. Si le temps le permet, j’expliquerai les liens de la magnitude persistante avec la magnitude des espaces métriques finis qui ont originellement motivé sa définition par Govc et Hepworth.

Recording


17/02 (11h00 CET)
Speaker: Raphael Reinauer
Title: Persformer: A Transformer Architecture for Topological Machine Learning

One of the main challenges of Topological Data Analysis (TDA) is to extract meaningful features from persistent diagrams directly usable by machine learning algorithms. Indeed, persistence diagrams are intrinsically (multisets of points in $\mathbb R^2$ and cannot be seen straightforwardly as vectors).
This talk introduces Persformer, the first Transformer neural network architecture that accepts persistence diagrams as input. The Persformer architecture significantly outperforms previous topological neural network architectures on classical synthetic benchmark datasets and has competitive performance on graph datasets.
Furthermore, we introduce an interpretability method to topological machine learning to provide meaningful insights into the importance of the topological features in the persistence diagram of classical datasets. We show that having an expressive model like the Persformer allows us to find meaningful vector representations of persistence diagrams and analyze them using our interpretability method.
We expect our topological interpretability method to be helpful as an exploratory tool to understand use cases where the shape of a dataset is essential, such as biology or material science.

Recording


03/02 (15h15 CET)
Speaker: Mathieu Carrière
Title: Statistical analysis of Mapper for stochastic and multivariate filters

Reeb spaces, as well as their discretized versions called Mappers, are common descriptors used in Topological Data Analysis, with plenty of applications in various fields of science, such as computational biology and data visualization, among others. The stability and quantification of the rate of convergence of the Mapper to the Reeb space has been studied a lot in recent works, focusing on the case where a scalar-valued filter is used for the computation of Mapper. On the other hand, much less is known in the multivariate case, when the codomain of the filter is Rp, and in the general case, when it is a general metric space (Z, dZ), instead of R. In this talk, I will introduce a slight modification of the usual Mapper construction and give risk bounds for estimating the Reeb space using this estimator. Our approach applies in particular to the setting where the filter function used to compute Mapper is also estimated from data, such as the eigenfunctions of PCA. I will also provide preliminary applications of this estimator in statistics and machine learning for different kinds of target filters.

This is joint work with Bertrand Michel.

Recording


13/01 (11h00 CET)
Speaker: Olympio Hacquard
Title: Topologically penalized regression on manifolds

We study a regression problem on a compact manifold M. In order to take advantage of the underlying geometry and topology of the data, the regression task is performed on the basis of the first several eigenfunctions of the Laplace-Beltrami operator of the manifold, that are regularized with topological penalties. The proposed penalties are based on the topology of the sub-level sets of either the eigenfunctions or the estimated function. The overall approach is shown to yield promising and competitive performance on various applications to both synthetic and real data sets. We also provide theoretical guarantees on the regression function estimates, on both its prediction error and its smoothness (in a topological sense). Taken together, these results support the relevance of our approach in the case where the targeted function is “topologically smooth”.

Recording


06/01 (at 15h15 CET!)
Speaker: Catherine Aaron
Title: Local convex hull for support, density and level set estimation

In “A local nearest-neighbor convex-hull construction of home ranges and utilization distributions” authors proposed an algorithm for “home ranges” and “utilization distributions” that can be translated into statistics as support and density estimation.

In a first part we give the properties of the associated support estimator when the support is “full dimensional” but also when the support is a lower dimensional manifold (with or without boundary).

A second part is dedicated to the density (and level set) estimation. When applying the local convex hull idea for that purpose the initial estimator is biased and this bias can be reduced. We will present the bias reduction method and show that it allows to have density estimator adapted to (unknown) compact support. An extension of the proposed estimator when the distribution is supported by a (unknown) lower dimensional manifold with or without boundary is also given.

Recording


09/12 (at 16h30 CET!)
Speaker: Vadim Lebovici
Title: Hybrid transforms of constructible functions

Euler calculus — integration of constructible functions with respect to the Euler characteristic — is of increasing interest in TDA. For instance, the (constructible) Radon transform has provided an answer to the following important question: are two subsets of R^n with same persistent homology in all degrees and for all height filtrations equal?

In this talk I will introduce integral transforms combining Lebesgue integration and Euler calculus. We will see that Lebesgue integration gives access to regularity results, while Euler calculus conveys topological information. For instance, we get mean formulae for these transforms in the context of sublevel-sets persistent homology for random filtrations.

Recording


25/11 (at 15h45 CET!)
Speaker: Eduardo Fonseca Mendes
Title: Generalized Information Criteria for Structured Sparse Models

Regularized M-estimators are widely used due to their ability of recovering a low-dimensional model in high-dimensional scenarios. Some recent efforts on this subject focused on creating a unified framework for establishing oracle bounds and deriving conditions for support recovery. Under this same framework, we propose a new Generalized Information Criteria that takes into consideration the sparsity pattern one wishes to recover. We obtain non-asymptotic model selection bounds and sufficient conditions for model selection consistency of the GIC. Furthermore, we show that one may use the GIC for selecting the regularization parameter in a way that the sequence of model subspaces contains the true model with probability converging to one. This allows practical use of the GIC for model selection in high-dimensional scenarios. We illustrate those conditions on examples including group LASSO generalized linear regression and low rank matrix regression.

It is joint work with a former student (Gabriel Pinto PhD Student @ Northwestern University).

Recording


18/11
Speaker: Jonathan Spreer
Title: An algorithm to compute the crosscap number of a knot

The crosscap number of a knot is the non-orientable counterpart of its
genus. It is defined as the minimum of one minus the Euler characteristic of
S, taken over all non-orientable surfaces S bounding the knot. Computing the
crosscap number of a knot is tricky, since normal surface theory – the usual
tool to prove computability of problems in 3-manifold topology, does not
deliver the answer “out-of-the-box”.

In this talk, I will review the strengths and weaknesses of normal surface
theory, focusing on why we need to work to obtain an algorithm to compute
the crosscap number. I will then explain the theorem stating that an
algorithm due to Burton and Ozlen can be used to give us the answer.

This is joint work with Jaco, Rubinstein, and Tillmann.

Recording


14/10
Speaker: Marine Le Morvan
Title: What’s a good imputation to predict with missing values ?

How to learn a good predictor on data with missing values ? Most efforts focus on first imputing as well as possible and second learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation may not be needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting instead the imputation so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Experiments confirm that joint imputation and regression through NeuMiss is better than various two step procedures in our experiments with finite number of samples.

Recording


07/10
Speaker: Etienne Lasalle
Title: Heat diffusion distance processes: a statistically founded method to analyze graph data sets.

When working with weighted graphs, one can interpret weights as the thermal conductivity of edges. This means that heat diffuses faster along edges with higher weights. Given initial conditions, one can use the way heat diffuses to compare graphs. But choosing a relevant and informative diffusion time is often essential and challenging. To circumvent this issue, we choose to take into account the whole diffusion process. For that, we define real-valued processes indexed by all the diffusion times in [0, T] for some T > 0, namely the Heat Kernel Distance (HKD) process and the Heat Persistence Distance (HPD) process. Borrowing tools from TDA, the HPD process is able to compare graphs without known node correspondence or even graphs with different sizes. In this talk, I will introduce these processes and present their statistical properties. Namely, we proved under mild assumptions that they verify a functional central limit theorem and admit a gaussian approximation. Moreover, I will present potential applications of these processes (the construction of confidence bands and two-sample tests, the study of neural networks through their activation graphs).

Recording


23/09
Speaker: Daniel Perez
Title: On zeta functions and their application to the study of the topology of superlevel sets of stochastic processes

We introduce the notion of stochastic zeta function, which for some classes of processes share many properties with their counterparts in analytic number theory. These zeta functions, defined in terms of the barcode, are related to a dual variable counting the number of bars in the bars in the barcode of length ε. A “prime number theorem” can then be proved for this dual variable in terms of the zeta function of the considered process.

Recording

Previously

2020-2021

01/04
Speaker: Pierre Baudot
Title: Cohomological characterization of Information and Higher Order Statistical Structures – Machine Learning and Statistical Physic Aspects

We will visit methods that quantify the statistical interactions structure within a given data set using the characterization of information theory in cohomology, and provide their expression in term of statistical physic and machine learning. In a first part, we will have a look at the formalism of Information Cohomology obtained with Bennequin and extended by Vigneaux to Tsallis entropies constructed on complexes of random variables. It allows to characterize entropy uniquely as the first class of the cohomology and Multivariate Mutual Informations as coboundary.
In a second part, we will have a look at the application of the computationally tractable simplicial subcase to genetic expression and standard dataset in machine learning. Mutual statistical independence is equivalent to the vanishing of all k-MI (Ik=0), leading to the conclusion that the Ik provide refined meepiasures of statistical dependencies and that the cohomology quantifies the obstruction to statistical factorization. The application to genetic expression and cell-type classification recovers the known differential expressions and co-regulations of genetic modules. The complex has a direct interpretation in terms of unsupervised and supervised deep learning where the neural network architecture is given by the chain complex and backpropagation becomes forward (cohomological). Some preliminary examples of applications are given on classical datasets.

 


25/03
Speaker: John Harvey
Title: Estimating the reach of a submanifold

The reach is an important geometric invariant of submanifolds of Euclidean space. It is a real-valued global invariant incorporating information about the second fundamental form of the embedding and the location of the first critical point of the distance from the submanifold. In the subject of geometric inference, the reach plays a crucial role. I will give a new method of estimating the reach of a submanifold, developed jointly with Clément Berenfeld, Marc Hoffmann and Krishnan Shankar. This results in improved convergence rates, but a minimax optimal estimator remains to be found.

 


11/03
Speaker: Amit Moscovich
Title: Nonparametric estimation of high-dimensional shape spaces with applications to structural biology

Over the last twenty years, there have been major advances in non-linear dimensionality reduction, or manifold learning, and nonparametric regression of high-dimensional datasets with low intrinsic dimensionality. A key idea in this field is the use of data-dependent Fourier-like basis vectors given by the eigenvectors of a graph Laplacian. In this talk, I will discuss the application of these methods for mapping spaces of volumetric shapes with continuous motion. Three lines of research will be presented:

(i) High-dimensional nonparametric estimation of distributions of volumetric signals from noisy linear measurements.

(ii) Leveraging the Wasserstein optimal transport metric for manifold learning and clustering.

(iii) Non-linear independent component analysis for analyzing independent motions.

A key motivation for this work comes from structural biology, where breakthrough advances in cryo-electron microscopy have led to thousands of atomic-resolution reconstructions of various proteins in their native states. However, the success of this field has been mostly limited to the solution of the inverse problem aimed at recovering a single rigid structure, while many important macromolecules contain several parts that can move in a continuous fashion, thus forming a manifold of conformations. The methods described in this talk present progress towards the solution of this grand challenge, namely the extension of reconstruction methods which estimate a single 3D structure to methods that recover entire manifolds of conformations.

 


18/02
Speaker: Bastian Rieck
Title: Topological Representation Learning for Structured and Unstructured Data

Topological data analysis emerged as an effective tool in machine learning, supporting the analysis of neural networks, but also driving the development of novel algorithms that incorporate topological characteristics. This talk will focus on learning appropriate topological representations for unstructured data sets, such as point clouds, and structured data sets, such as graphs. Specifically, we will discuss how to use topological data analysis to obtain topologically faithful embeddings. We will also study graph classification, which is of particular interest since graphs are inherently amenable to a topological description in terms of their connected components and cycles. Next to describing ‘deep’ approaches for such tasks, this talk will also build a bridge to graph neural networks and demonstrate a topological variant of ‘readout’ and ‘aggregation’ functions, which can be learned end-to-end.

 


28/01
Speaker: Ery Arias-Castro
Title: On Using Graph Distances to Estimate Euclidean and Related Distances

Graph distances have proven quite useful in machine learning/statistics, particularly in the estimation of Euclidean or geodesic distances, and as such have been used to embed a graph (the multidimensional scaling problem). The talk will include a partial review of the literature and then present more recent developments, including on the minimax estimation of distances on a surface and consequences for manifold learning, on the estimation of curvature-constrained distances on a surface, and on the estimation of Euclidean distances based on an unweighted and noisy neighborhood graph.

 


21/01
Speaker: Antonin Guilloux and Guillaume Moroz
Title: Visualisation d’une fibration du complémentaire du noeud de huit

Le complémentaire du noeud de huit dans la sphère S^3 est une variété de dimension 3 dont on sait qu’elle est homéomorphe à un fibré en tores épointés sur le cercle. De manière analogue aux fibrations de Milnor, nous utilisons une paramétrisation du noeud comme lieu des zéros de deux polynômes réels pour décrire explicitement dans R^3 les fibres. Chaque fibre est alors une hypersurface lisse à bord le noeud dans R^3, définie par l’annulation d’un polynôme à coefficients réels. Nous pouvons alors calculer un mesh pour suffisamment de ces surfaces et les représenter.

Cela mène à une visualisation, sous forme de film, de cette fibration et d’une courbe significative mathématiquement. Nous présenterons les diverses étapes de ce travail.

 


07/01
Speaker: Owen Rouillé
Title: Computation of Large Asymptotics of 3-Manifold Quantum Invariants

Quantum topological invariants have played an important role in computational topology, and they are at the heart of major modern mathematical conjectures. In this article, we study the experimental problem of computing large r values of Turaev-Viro invariants TV_r. We base our approach on an optimized backtracking algorithm, consisting of enumerating combinatorial data on a triangulation of a 3-manifold. We design an easily computable parameter to estimate the complexity of the enumeration space, based on lattice point counting in polytopes, and show experimentally its accuracy. We apply this parameter to a preprocessing strategy on the triangulation, and combine it with multi-precision arithmetics in order to compute the Turaev-Viro invariants. We finally study the improvements brought by these optimizations compared to state-of-the-art implementations, and verify experimentally Chen and Yang’s volume conjecture on a census of closed 3-manifold.

 


17/12
Speaker: Vincent Divol
Title: Density estimation on manifolds: an optimal transport approach

Density estimation is one of the most classical problem in nonparametric statistics: given i.i.d. samples X_1,…,X_n from a distribution \mu with density f on R^D, the goal is to reconstruct the underlying density (say for instance for the L_p norm). This problem is known to become untractable in high dimension D >> 1. We propose to overcome this issue by assuming that the distribution \mu is actually supported around a low dimensional unknown shape M, of dimension d << D. After showing that this problem is degenerate for a large class of standard losses (L_p, total variation, etc.), we focus on the Wasserstein loss, for which we build a minimax estimator, based on kernel density estimation, whose rate of convergence depends on d, and on the regularity of the underlying density, but not on the ambient dimension D.

 


03/12
Speaker: Serguei Barannikov
Title: Canonical forms = Persistence diagrams.

Any filtered complex over a field F can be brought by a linear transformation preserving the filtration to so called canonical form, a canonically defined direct sum of filtered complexes of two types: one-dimensional complexes with trivial differential d(e_{t_i})=0 and two-dimensional complexes with trivial homology d(e_{s_j})=e_{r_j}. The talk is devoted to the proof of this theorem, that was first published in the speaker’s 1994 paper “Framed Morse complex and its invariants”, AMS, Advances in Soviet Mathematics 21: 93–115. In that work these canonical form invariants were applied to Morse complexes which calculate homology of sublevel sets. Starting from middle of 2000s these invariants became widely popular in Applied Mathematics under the name of “persistence diagrams” and “persistence barcodes”. The mentioned classification theorem is usually referred in Applied Mathematics as the Persistent homology Main (or Structure, or Principal) Theorem. It is curious that the first dissertation in Applied Mathematics on the subject of these persistence invariants, which appeared in 2001 with parts published in 2000, also considered as principal example of filtered complexes exactly the same Morse complexes, in the particular case of two dimensions. It also had a big interesting part devoted to “canonical bases”. Many other filtered complexes are nowadays considered (Cech, simplicial, cubical etc) for sublevel homology of a function on topological space. Most often, the filtered complexes describing sublevel homology of the distance to a point-cloud set function are considered over last years. The algorithms of more than 10 different software platforms, that exist actually for computation of persistence diagrams, are based on the described in the mentioned 1994 paper algorithm bringing filtered complexes to the canonical form. After the proof of this theorem, final part of the talk is devoted to some applications of this theorem in pure mathematics, in particular in algebraic topology and differential geometry.

 


26/11
Speaker: Wojciech Reise
Title: Towards Pattern Detection using Dynamic Time Warping

This talk will be centered around pattern detection and Dynamic Time Warping. Using the problem of velocity estimation from magnetic data in navigation systems, I will motivate the need for detecting repeated patterns in time series and the use of Dynamic Time Warping to that aim. I will recall the definition as the technique was originally introduced, and present a selection of more recent variants, including a differentiable version.

 


12/11
Speaker: Nina Otter
Title: Magnitude homology: overview and current directions

Magnitude is an isometric invariant of metric spaces that was introduced by Tom Leinster in 2010, and is currently the object of intense research, since it has been shown to encode many invariants of a metric space such as volume, dimension and capacity. Magnitude homology is a homology theory for metric spaces that was introduced by Hepworth-Willerton and Leinster-Shulman, and categorifies magnitude in a similar way as the singular homology of a topological space categorifies its Euler characteristic.

In this talk I will first introduce magnitude and magnitude homology. I will then give an overview of existing results and current research in this area, explain how magnitude homology is related to persistent homology, and finally discuss some open questions and work in progress.

 


05/11
Speaker: Mathieu Carriere
Title: (Multiparameter) Persistent Homology for Quantitative Multiplex Immunofluorescence Images.

In this talk, I will present a novel application of persistent homology to characterize the spatial arrangement of immune and epithelial (tumor) cells within the breast cancer immune microenvironment. More specifically, quantitative and robust characterizations are built by computing persistence diagrams out of a staining technique (quantitative multiplex immunofluorescence) which allows to obtain spatial coordinates and stain intensities on individual cells. The resulting persistence diagrams are evaluated as characteristic biomarkers of cancer subtype and prognostic biomarker of overall survival, and outperform and complement the usual descriptors which capture spatial relationships with nearest neighbor analysis. I will also present an extension of persistence diagrams that we used to capture the behaviour of multiple stain intensities at once, and which is based on multiparameter persistence.

 


29/10
Speaker: Kristof Huszar
Title: Combinatorial width parameters for 3-manifolds

Algorithms in computational 3-manifold topology typically take a triangulation as an input and return topological information about the underlying 3-manifold. However, extracting the desired information from a triangulation (e.g., evaluating an invariant) is often computationally very expensive. In recent years this complexity barrier has been successfully tackled in some cases by importing ideas from the theory of parameterized algorithms into the realm of 3-manifolds. In this talk we focus on the key combinatorial parameter in this context (i.e., the treewidth of a compact, orientable 3-manifold), and relate it with classical topological invariants (e.g., Heegaard genus) in a quantitative way.

Joint work with Jonathan Spreer and Uli Wagner.

 


22/10
Speaker: Olympio Hacquard
Title: Regression on the Laplace Beltrami operator eigenfunctions using a topological penalty

We study a regression setting on a compact Riemannian manifold without boundary. We observe a n-sample (x1, y1), … (xn, yn) where the xi lie on the manifold and the yi are real numbers. The purpose is to estimate the regression function in order to predict the value when given a new point on the manifold. The regression is performed on the first eigenfunctions of the Laplace Beltrami operator. Since these functions are hard to estimate, we use a graph approximation of the manifold and replace the spectrum of the Laplace Beltrami operator by the spectrum of the Laplacian of the graph. We will discuss some of the problems that arise when going from a continuous problem to a discrete one. As the Laplacian eigenfunctions are oscillating, they tend to overfit the data and it is necessary to introduce a penalty to the model. The novelty of the method described here is the introduction of a topological penalty, which is equivalent to performing a Lasso weighted by the topological persistence of each eigenfunction, based on the heuristic idea that the functions that oscillate a lot have a high persistence. This method seems to provide promising experimental results to predict new data when given a noisy input and finding theoretical guarantees on the performance of the estimator is still under progress.

 


15/10
Speaker: Vincent Rouvreau
Title: What is new in Gudhi 3.3.0 ?

In this talk, I will present Edge Collapse and Persistence of Flag Complexes (https://drops.dagstuhl.de/opus/volltexte/2020/12177) and how to use it with Gudhi. Then I will show benchmark results and my conclusions on which algorithm to use in function of the data characteristics.
And finally, we will play with ToMaTo (https://dl.acm.org/doi/10.1145/2535927), now it is integrated in the Gudhi library.

 


08/10
Speaker: Raphaël Tinarrage
Title: Topological inference from measures and vector bundles (defense rehearsal)

We contribute to the theory of topological inference, based on the theory of persistent homology, by proposing three families of filtrations. For each of them, we prove consistency results—that is, quality of approximation of an underlying geometric object—, and stability results—that is, robustness against initial measurement errors. We propose concrete algorithms in order to use these methods in practice. The first family, the DTM-filtrations, is a robust alternative to the usual Cech filtration when the point cloud is noisy or contains anomalous points. It is based on the notion of distance to measure, which allows to obtain stability in the sense of the Wasserstein distance. Secondly, we propose the lifted filtrations, which make it possible to estimate the homology of immersed manifolds, even when their reach is zero. We introduce the notion of normal reach, and show that it allows to control geometric quantities associated to the manifold. We study the estimation of tangent spaces by local covariance matrices. Thirdly, we develop a theoretical framework for vector bundle filtrations, and define the persistent Stiefel-Whitney classes. We show that the persistent Stiefel-Whitney classes associated to the Cech bundle filtrations are Hausdorff-stable and consistent. To allow their algorithmic implementation, we introduce the notion of weak star condition.

 


01/10
Speaker: Daniel Perez
Title: Trees and the persistent homology of Markov processes

The TDA community has long been concerned with probabilistic aspect of persistence diagrams. An interesting question is : given a manifold M, what does the barcode of a random function on M look like ? We will give an answer to this question for the simple case of 1D, with the assumption that the processes are Markov. This covers a relatively wide range of well-studied processes such as Itô processes and Brownian motion. To study this question, we depart from the traditional persistent homology setting, by establishing a functorial link between diagrams and trees and studying metric invariants of the latter. These turn out to be related to the almost sure regularity of the processes studied, and if the process is Lévy, to the self-similarity of the process. In particular, we will give the law of the number of bars greater than epsilon and establish some asymptotic results concerning this random variable. We will illustrate these results by examining the particular case of Brownian motion, for which we have an almost full description of the barcode.

 


17/09
Speaker: Nicolas Berkouk
Title: Persistence and Sheaves: from Theory to Applications (defense rehearsal)

Topological data analysis is a recent field of research aiming at using techniques coming from algebraic topology to define descriptors of datasets.To be useful in practice, these descriptors must be computable, and coming with a notion of metric, in order to express their stability properties with respect to the noise that always comes with real world data. Persistance theory was elaborated in the early 2000’s as a first theoretical setting to define such descriptors – the now famous so-called barcodes. However very well suited to be implemented in a computer, persistence theory has certain limitations. In this work, we establish explicit links between the theory of derived sheaves equipped with the convolution distance (after Kashiwara-Schapira) and persistence theory in order to enjoy best of both’s worlds: computer friendliness of persistence in the one hand, theoretical deepness of sheaf theory on the other one.

 


10/09
Speaker: Barak Sober
Title: The Manifold Moving Least-Squares (Manifold-MLS) Framework: estimating manifolds and reconstructing their atlas from high-dimensional point clouds

Differentiable manifolds are an indispensable ‘language’ in modern physics and mathematics. As such, there are a plethora of analytic tools designed to investigate manifold-based models (e.g., connections, differential forms, curvature tensors, parallel transport, bundles). However, in order to facilitate these tools, one normally assumes access to the manifold’s atlas of charts (i.e., local parametrizations). In recent years, manifold-based modeling has permeated into data analysis as well, usually in order to avoid working in high dimensions. However, in these data-driven models, charts are not accessible and the only information at hand is the samples themselves. As a result, a common practice in Manifold Learning is to project the data into a lower-dimensional Euclidean domain, while maintaining some notion of distance (e.g., geodesic or diffusion).

In this talk, we introduce an alternative approach named the Manifold Moving Least-Squares (Manifold-MLS) that, given a finite set of samples, reconstructs an atlas of charts and provides an approximation of the manifold itself. Under certain (non-restrictive) sampling assumptions, we prove that the Manifold-MLS produces a smooth Riemannian manifold approximating the sampled one, even in case of noisy samples. We show that the approximation converges to the sampled manifold in case the number of samples tends to infinity, and show the exact convergence rates. Furthermore, we show that in the case of clean samples we obtain a high order of approximation (up to the degree of smoothness) for both the manifold and geodesic distances (i.e., the procedure we describe is nearly isometric when the resolution is sufficiently fine).

 


03/09
Speaker: Théo Lacombe
Title: Statistical tools for topological descriptors using optimal transport (defense rehearsal)

Topological data analysis (TDA) allows one to extract rich information from structured data (such as graphs or time series) that occurs in modern machine learning problems. This information will be represented as descriptors such as persistence diagrams, which can be described as point measures supported on a half-plane. While persistence diagrams are not elements of a vector space, they can still be compared using partial matching metrics. The similarities between these metrics and those routinely used in optimal transport—another field of mathematics—are known for long, but a formal connection between these two fields is yet to come. The purpose of this thesis is to clarify this connection and develop new theoretical and computational tools to manipulate persistence diagrams, targeting statistical applications. First, we show how optimal partial transport with boundary, a variation of classic optimal transport theory, provides a formalism that encompasses standard metrics in TDA. We then showcase the benefits of this connection in different situations: a theoretical study and the development of an algorithm to perform fast estimation of barycenters of persistence diagrams, the characterization of continuous linear representations of persistence diagrams and how to learn such representations using a neural network, and eventually a stability result in the context of linearly averaging random persistence diagrams.

Previously

For Saclay
For Sophia

 

Comments are closed.