# Workshop GUDHI / TOP DATA – October 20-24, 2015

## Keynote Speakers:

• Jessica CISEWSKI, Department of Statistics, Yale University, USA
“Persistent Homology in Astrostatistics”
Abstract:
Data exhibiting complicated spatial structures are common in many areas of science (e.g. cosmology, biology), but can be difficult to analyze. Persistent homology offers a promising way to represent, visualize, and interpret complex data by extracting topological features, which can be used to infer properties of the underlying structures.  There are a number of problems in astrostatistics that have benefited, or could benefit, from inference methods developed from these topological summaries.  I will discuss several of these problems in astrostatistics along with some possibilities for carrying out hypothesis tests.
• Olivier GUEDON, Université Paris-Est Marne-la-Vallée, France
“On concentration phenomena in high dimensional geometry”
• Axel MUNK, IMS Göttingen, Germany
“Multiscale Change Point Inference”
Abstract: SMUCE (Simultaneous MUltiscale Change-point Estimator) recovers an unknown step function from noisy data  by minimizing the number of change-points over the acceptance region of a multiscale statistical test. The probability of overestimating the true number of change-points K is controlled by the asymptotic null distribution of the multiscale test statistic. Further, we derive exponential bounds for the probability of underestimating K. Balancing these quantities allows to maximize the probability of correctly estimating K. Based on these bounds, we construct honest confidence sets for the unknown step function and its change-points. It is shown that SMUCE asymptotically achieves the optimal detection rate of vanishing signals in a multiscale setting. We illustrate how dynamic programming techniques can be employed for efficient computation of estimators and confidence regions. The performance of the proposed multiscale approach is illustrated by simulations and in several applications including ion channel recordings, CGH array analysis, and photoemission spectroscopy. Several extensions are discussed including topological data analysis for signals.

## Talks:

• Alfredo Hubard, Inria Sophia
“Non embeddability of complexes in Euclidean space”
Abstract: The following conjecture has fascinated me for a long time:Let X be a simplicial complex and f_i(X) the number of i-dimensional faces of X. There exist a constant c_d>0 such that if f_d(X)>c_d f_{d-1}(X), then for every continuous map g: X to R^2d there are two disjoint faces \sigma and \tau, such that g(\sigma)\cap g(\tau) is not empty. The only known case of this conjecture is c_1=6, for which follows easily from Euler’s formula and a double counting argument. I will explain the main results around this conjecture and reduce it to a different problem.

• Ruqi Huang , Inria Saclay
“Discover differences between networks using functional map”
• Bertrand Michel, Université Paris VI
“Robust geometric inference with the distance to measure”
• Max Sommerfield , IMS Göttingen, Germany
• Alba Chiara de Vitis, Inria Sophia
“Learning a Mixture of Distributions using Concentration of Measure”
• Aurélie Fischer , Université Paris Diderot
• Kunal Dutta, Inria Sophia
“On the Gaussian Correlation Conjecture”
Abstract:
The Gaussian correlation conjecture, studied since 1955, says that for any isotropic gaussian measure $\mu=\mu_n$ on $\mathbb{R}^n$, given two convex bodies $K$ and $L$ – both symmetric about the origin,  $\mu(K\cap L) \geq \mu(K)\cdot\mu(L)$. An equivalent version states that given jointly gaussian random variables $X_1,\cdots,X_n$, $k\in [n]$, and $a\in \mathbb{R}^+$,
$$\prob{\max_{1\leq i \leq n} |X_i| \leq a} \geq \prob{\max_{i\leq k} |X_i| \leq a}\cdot\prob{\max_{k<i\leq n} |X_i| \leq a}.$$
An important special case, when $L$ is a symmetric slab (i.e. k=1), was proved independently by Khatri (1967) and Sidak (1968). I shall briefly survey some important partial results, and mention some applications.
Next, I shall present some recent work in progress on the conjecture, and also state and prove a stability result for the guassian measure, which partially extends the Khatri-Sidak lemma to the asymmetric case, and also partially extends a result of Szarek and Werner (1999).
This is joint work (in progress) with Arijit Ghosh (ISI-India) and Nabil Mustafa (Univ. Paris-Est).
• Harry Lang, Inria Saclay
“Accurate and Low-Space Techniques for k-Median Clustering on Insertion-Only Streams”
Abstract:
The k-median problem for insertion-only data streams is an active area of research.  In 2003, Charikar et al provided the first poly(k, log n)-space constant-approximation using the Online Facility Location algorithm of Meyerson.  However, the accuracy was quite poor (~5500 times the optimum).  In the current work, we introduce a new technique that maintains a low-constant approximation (~20) while requiring only O(k log n)- storage.
• Mathieu Carrière , Inria Saclay
“A theoretical framework for Mapper”
Abstract:
Mapper algorithm is a famous tool in Topological Data Analysis. In short, it allows to represent the topology of high dimensional data in the form of simplicial complexes through a covering of the image of some filter function defined on the data. In this talk, I will present recent progresses in the theoretical treatment of the classical Mapper algorithm. In particular, I will present its extension with simplicial posets, the so-called MultiNerve Mapper, that greatly alleviates the mathematical framework that is necessary to derive theoretical guarantees on the output. Formal treatments with schemes of proofs will be discussed, such as the connections with Reeb graphs for instance. I will also introduce a measure of dissimilarity between the MultiNerve Mapper objects, that is similar to the bottleneck distance, leading to stability results, while taking the covering into account. Finally, I will present different possible ways to implement the algorithm in the discrete case.
• Vincent Rouvreau, Inria Saclay
“The GUDHI Library”
• Thomas Bonis , Inria Saclay
“Guarantees for the Langevin Monte Carlo sampling algorithm in terms of Wasserstein distance.”
Abstract:
We study the problem of sampling from a density $\mu$ using the Langevin Monte Carlo algorithm. This algorithm samples from a measure $\hat{mu}$ by approximating a continuous diffusion process which has $\mu$ as stationary measure.  In this talk, I will briefly present known results regarding the theoretical guarantees and the complexity of this algorithm in terms of total variation distance for log-concave measures $\mu$. I will then provide a study of this algorithm in terms of Wasserstein distance which, as we will see,  allows a better analysis of the asymptotical convergence of the algorithm which hold under weaker assumptions on the target measures. We also show how our result can be used to obtain a dynamic discretization scheme for the algorithm.
• Housen Li, IMS Göttingen, Germany
“Variational Multiscale Nonparametric Regression of Smooth Functions”
Abstract:
We consider the problem of nonparametric regression of smooth functions for sub-Gaussian data. We propose and analyze a constrained variational approach, which we call the MultIscale Nemirovski-Dantzig (MIND) estimator. It minimizes a Sobolev norm under the constraint that the multiresolution norm of the residual is bounded by a threshold depending only on the sample size. We derive the convergence rates of MIND with respect to $L^q$-loss, $1 \le q \le \infty$, both almost sure and in expectation. A remarkable consequence is that MIND attains almost minimax optimal rates simultaneously for a large range of Sobolev and Besov classes, which provides certain adaptation of MIND. Our results are based on an approximate source condition in terms of distance function, and an interpolation inequality for the multiresolution norm and Sobolev norms. We also illustrate the performance of MIND on some data examples. This is a joint work together with Markus Grasmair (Norwegian University of Science and Technology, Norway), and Axel Munk (University of G\”{o}ttingen, Germany).
• Xavier Goaoc, UPEM
“On Generalized Heawood Inequalities for Manifolds: A Van Kampen-Flores-type Nonembeddability Result”
• Ilaria Giulini ,
“Spectral clustering and reproducing kernels”
• Marc Glisse , Inria Saclay
“Efficient and Compact Data Structures for Simplicial Complexes”
• Frédéric Cazals, Inria Sophia
“Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces”
Abstract: Comparing two sets of multivariate samples is a central problem in data analysis. From a statistical standpoint, the simplest way to perform such a comparison is to resort to a non-parametric two-sample test (TST), which checks whether the two sets can be seen as i.i.d. samples of an identical unknown distribution (the null hypothesis). If the null is rejected, one wishes to identify regions accounting for this difference. This paper presents a two-stage method providing feedback on this difference, based upon a combination of statistical learning (regression) and computational topology methods. Consider two populations, each given as a point cloud in R^d. In the first step, we assign a label to each set and we compute, for each sample point, a discrepancy measure based on comparing an estimate of the conditional probability distribution of the label given a position versus the global unconditional label distribution. In the second step, we study the height function defined at each point by the aforementioned estimated discrepancy. Topological persistence is used to identify persistent local minima of this height function, their basins defining regions of points with high discrepancy and in spatial proximity. Experiments are reported both on synthetic and real data (satellite images and handwritten digit images), ranging in dimension from d = 2 to d = 784, illustrating the ability of our method to localize discrepancies. On a general perspective, the ability to provide feedback downstream TST may prove of ubiquitous interest in exploratory statistics and data science.
• Clément Levrard , Université Paris Diderot
“Haussdorff convergence rates for manifold reconstruction using the Tangential Delaunay complex”

## Registration:

For organisational reasons, please fill the registration form as soon as possible.

## Practical informations:

##### Accommodation:

The number of single room is limited.

Accommodation in double room for students.

Oher participants in double room or single, depending on availability.

##### Mission:

Agents from Inria Sophia: ERC GUDHI – Allocation 8349 – DB 04RECH6016-S

Agents from Inria Saclay: see with Christine – DB 11RECH1046-S

##### Sojourn:

The fees will be cover by the project from Tuesday October 20 (lunch included) til Saturday October 24 (lunch included).

##### Transportation:
###### From Saclay:

Plane: Tuesday October 20, Paris/Toulon available flights:  7:20 –> 8:40 or 8:40 –> 10:00

Taxi: Toulon Airport/Tour Fondue Giens – Around 25 € – it takes about 20 mn

Bus: Toulon Airport/Tour Fondue Giens – Lines : 63 then 67 – it takes about 40 mn. More information

###### From Sophia:

By car: A8 exit  Hyères/Toulon/Le Luc –  around 1h45 mn

You can leave your vehicule in the parking near the jetty of Tour Fondue – Cost around 15 € / 24h

By train: from Nice to Toulon Station around 1h40 mn – from Antibes around 1h20 mn

from Nice to Hyères Station around 3 h – from Antibes around 2h40 mn

Bus: Toulon station/Tour Fondue – Lines 9-39 then 67 – more than 2 h.

Hyères station/Tour Fondue Giens – Line 67 – It takes about 20 mn.