Homepage – Alix LHERITIER

Alix LHERITIER

Inria Sophia Antipolis – Méditerranée
2004, route des Lucioles – BP93
06902 Sophia Antipolis

E-mail: Alix.Lheritier@inria.fr
Telephone: +33.4.92.38.79.73

Thesis

Subject: Nonparametric methods for learning and detecting multivariate statistical dissimilarity.

Abstract

In this thesis, we study problems related to learning and detecting multivariate statistical dissimilarity, which are of paramount importance for many statistical learning methods now used in an increasing number of fields. This thesis makes three contributions related to these problems.

The first contribution introduces a notion of multivariate nonparametric effect size, shedding light on the nature of the dissimilarity detected between two datasets. Our two-step method first decomposes a dissimilarity measure (the Jensen-Shannon divergence) in order to localize the dissimilarity in the data embedding space, and then aggregates points of high discrepancy that are in spatial proximity into clusters.
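
As a rough illustration of what "decomposing" the Jensen-Shannon divergence can mean, the sketch below works on two discrete histograms rather than on point clouds, which is a simplifying assumption and not the estimator developed in the thesis. The function name js_contributions is hypothetical. Each bin receives a nonnegative contribution, the contributions sum to the divergence, and bins with large contributions indicate where the two distributions differ.

```python
import math

def js_contributions(p, q):
    """Per-bin contributions to the Jensen-Shannon divergence, in bits.

    p and q are discrete distributions over the same bins (illustrative
    assumption: the thesis works with point clouds, not histograms).
    """
    contribs = []
    for pi, qi in zip(p, q):
        mi = 0.5 * (pi + qi)  # average (mixture) mass in this bin
        c = 0.0
        if pi > 0:
            c += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            c += 0.5 * qi * math.log2(qi / mi)
        contribs.append(c)
    return contribs

p = [0.5, 0.5, 0.0]
q = [0.5, 0.0, 0.5]
contribs = js_contributions(p, q)
print(contribs)       # the first bin, where p and q agree, contributes nothing
print(sum(contribs))  # total Jensen-Shannon divergence
```

With base-2 logarithms the total lies between 0 (identical distributions) and 1 (disjoint supports).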

The second contribution presents the first sequential nonparametric two-sample test. That is, instead of being given two sets of observations of fixed size, observations can be treated one at a time and, when strong enough evidence has been found, the test can be stopped, yielding a more flexible procedure while keeping guaranteed type I error control. Additionally, under certain conditions, when the number of observations tends to infinity, the test has a vanishing probability of type II error.
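
To make the idea of stopping early while keeping type I error control concrete, here is a minimal sketch under stated assumptions; it is not the test from the thesis. It assumes one-dimensional observations in [lo, hi), that each observation comes from either sample with probability 1/2 under the null hypothesis, and a simple fixed-histogram label predictor with add-1/2 smoothing. The function name and all parameters are hypothetical. The running likelihood ratio between the predictor and the fair-coin null is a nonnegative martingale under the null, so by Ville's inequality it exceeds 1/alpha with probability at most alpha.

```python
import math

def sequential_two_sample_test(stream, alpha=0.05, n_bins=4, lo=0.0, hi=1.0):
    """Process (x, label) pairs one at a time; label is 0 or 1 (sample of origin).

    Returns (rejected, n_seen). Assumes labels are fair coin flips under the null.
    """
    counts = [[0, 0] for _ in range(n_bins)]  # per-bin label counts
    log_ratio = 0.0                           # log likelihood ratio vs. the null
    threshold = math.log(1.0 / alpha)
    n = 0
    for x, label in stream:
        n += 1
        b = min(n_bins - 1, max(0, int((x - lo) / (hi - lo) * n_bins)))
        c0, c1 = counts[b]
        # Predict the label BEFORE using it (add-1/2 smoothed frequency).
        p1 = (c1 + 0.5) / (c0 + c1 + 1.0)
        p = p1 if label == 1 else 1.0 - p1
        log_ratio += math.log(p / 0.5)
        counts[b][label] += 1
        if log_ratio >= threshold:
            return True, n  # enough evidence: stop and reject the null
    return False, n

# Samples that occupy different regions are told apart after a few observations.
separated = [(0.1, 0), (0.9, 1)] * 50
print(sequential_two_sample_test(separated))

# Labels unrelated to position give no evidence, so the test never stops early.
mixed = [(0.1, 0), (0.1, 1)] * 50
print(sequential_two_sample_test(mixed))
```

The predictor can be replaced by any online method that outputs a label probability before seeing the label; the error guarantee comes from the martingale argument, not from the quality of the predictor.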

The third contribution presents a semi-supervised sequential nonparametric two-sample test, this time based on random spatial partitioning. The test also exhibits a vanishing type II error, under certain conditions on the partitions. This test automatically detects multi-scale differences. Processing a new observation has logarithmic time complexity with respect to the size of the unlabeled training dataset, and a negligible memory footprint, which makes it suitable for streaming data.

Keywords

Statistics, Information theory, Jensen-Shannon divergence, Data analysis, Data comparison, Point clouds, Nonparametric two-sample test, Effect size, Divergence, Conditional probability estimation, Regression, Topological persistence, Hypothesis testing, Bayes factor, Sequential prediction, Bayesian mixtures, Online learning, Streaming