Organizers: Alain Celisse and Hemant Tyagi
Vera Shalaeva
-
Date: May 12, 2020 (Tuesday) at 11.00 (Online seminar)
-
Affiliation: Inria Lille
-
Webpage:
-
Title: Improved PAC-Bayesian bounds for Linear Regression.
-
Abstract: This presentation will consider PAC-Bayesian theory as a framework for deriving generalization guarantees for learning algorithms. We will give an overview of the general PAC-Bayesian theorem and its variations for classification and regression problems. We will then focus in particular on the linear regression problem and introduce a tighter version of the PAC-Bayesian generalization bound, as well as an extension that does not assume the data to be sampled independently.
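For reference, one standard form of the general PAC-Bayesian theorem (a McAllester-type bound for a loss bounded in [0, 1], not necessarily the exact variant presented in the talk) states that, with probability at least 1 − δ over an i.i.d. sample of size n, simultaneously for all posteriors ρ and for a prior π fixed in advance,

L(ρ) ≤ \hat{L}_n(ρ) + sqrt( (KL(ρ || π) + ln(2 sqrt(n)/δ)) / (2n) ),

where L(ρ) and \hat{L}_n(ρ) denote the expected and empirical risks averaged over ρ. The bounds for linear regression presented in the talk tighten and extend this type of statement.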
Filippo Antonazzo
-
Date: April 28, 2020 (Tuesday) at 11.00 (Online seminar)
-
Affiliation: Inria Lille
-
Webpage:
-
Title: Estimation of Gaussian mixtures with binned data for huge datasets.
-
Abstract: The popularity of Gaussian mixture models is magnified by the regular increase of sample sizes, which provides an opportunity to reveal information previously out of reach. However, the volume of data raises issues related to prohibitive computation times, high energy consumption, and the need for large computational resources. In this talk we will propose an estimation approach employing binned data derived directly from the original (raw) data.
This data transformation is done by imposing a grid on the sample space and adapting it optimally, in order to address such green-computing issues without harming the related estimation quality. We will consider the behaviour of binned data in the context of univariate Gaussian mixtures, including a numerical illustration and some theoretical advances. The talk will end with insights into multivariate mixtures, a natural but non-trivial extension of the previous problem and a starting point towards a scalable method for more realistic unsupervised learning tasks.
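As a rough illustration of the binned-data idea (a simplified sketch, not the estimator developed in the talk: a proper binned-data EM integrates each Gaussian density over the bins, whereas the sketch below uses bin midpoints weighted by bin counts):

import numpy as np

def binned_gmm_em(data, n_bins=100, K=2, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    counts, edges = np.histogram(data, bins=n_bins)   # grid imposed on the sample space
    mids = 0.5 * (edges[:-1] + edges[1:])             # one representative per bin
    w = counts / counts.sum()                         # bin weights replace the n raw points
    pi = np.full(K, 1.0 / K)                          # crude initialization
    mu = rng.choice(mids, size=K, replace=False)
    sigma2 = np.full(K, data.var())
    for _ in range(n_iter):
        # E-step over the B bins instead of the n observations
        dens = np.exp(-0.5 * (mids[:, None] - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True) + 1e-300
        # M-step with bin counts as weights
        Nk = (w[:, None] * resp).sum(axis=0)
        pi = Nk
        mu = (w[:, None] * resp * mids[:, None]).sum(axis=0) / Nk
        sigma2 = (w[:, None] * resp * (mids[:, None] - mu) ** 2).sum(axis=0) / Nk
    return pi, mu, sigma2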
Bernhard Stankewitz
-
Date: March 10, 2020 (Tuesday) at 11.00 (Plenary room)
-
Affiliation: HU Berlin
-
Webpage: Link
-
Title: Smoothed residual stopping for statistical inverse problems via truncated SVD estimation
-
Abstract: This work examines under what circumstances adaptivity for truncated SVD estimation can be achieved by an early stopping rule based on the smoothed residuals \|(A A^T)^{\alpha/2} (Y − A \hat{\mu}^{(m)})\|^2. Lower and upper bounds for the risk are derived, which show that moderate smoothing of the residuals can be used to adapt over classes of signals with varying smoothness, while oversmoothing yields suboptimal convergence rates. The theoretical results are illustrated by Monte-Carlo simulations.
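For illustration, a minimal sketch of such a smoothed-residual stopping rule for truncated SVD, assuming a known design A, observations Y, a smoothing exponent alpha and a user-chosen threshold kappa (the calibration of kappa from the paper is not reproduced here):

import numpy as np

def smoothed_residual_stop(A, y, alpha, kappa):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = U.T @ y                      # observations in the singular basis
    r = len(s)
    for m in range(r + 1):
        # truncated SVD estimator using the m largest singular values
        mu_hat = Vt[:m].T @ (coeffs[:m] / s[:m])
        resid = y - A @ mu_hat
        # smoothed residual \|(A A^T)^{alpha/2} (Y - A mu_hat)\|^2
        smoothed = np.sum((s ** alpha * (U.T @ resid)) ** 2)
        if smoothed <= kappa:             # stop at the first truncation level below kappa
            return m, mu_hat
    return r, mu_hat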
Alain Celisse
-
Date: February 4, 2020 (Tuesday) at 11.00 (Plenary room)
-
Affiliation: University of Lille and Inria Lille
-
Webpage: Link
-
Title: How smoothing can improve on the discrepancy principle for early stopping
-
Abstract: Early stopping rules refer to new strategies for choosing when to interrupt an iterative learning process (gradient descent, …). This question of deciding when to stop the iterations is of primary importance, since a bad choice would result in poor statistical properties and/or additional unnecessary computations. The purpose of this talk is to describe the early-stopping challenge, illustrate the behavior of classical discrepancy-based stopping rules, and discuss some of their deficiencies. This motivates considering smoothing-based strategies such as the one inspired by Tikhonov regularization. For such rules, we prove several theoretical (non-)asymptotic guarantees, and also illustrate their practical behavior on simulation experiments carried out by means of spectral filter algorithms.
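For concreteness, in the inverse-problem notation Y = A\mu + noise used above, a plain-vanilla form of the classical discrepancy principle (with noise level δ, sample size n and a user-chosen constant c, not the exact calibration studied in the talk) stops an iterative estimator \hat{\mu}^{(t)} at

t_stop = min{ t ≥ 0 : \|Y − A \hat{\mu}^{(t)}\| ≤ c δ \sqrt{n} },

i.e. at the first iteration whose residual has shrunk to the noise level; the smoothing-based rules discussed in the talk instead monitor a smoothed residual norm such as \|(A A^T)^{\alpha/2} (Y − A \hat{\mu}^{(t)})\|.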
Yaroslav Averyanov
-
Date: January 28, 2020 (Tuesday) at 11.00 (Room A11)
-
Affiliation: Inria Lille – Nord Europe
-
Webpage:
-
Title: Early stopping: some new ideas towards optimality
-
Abstract: Early stopping is a regularization technique based on choosing a stopping time for an iterative algorithm. Its main advantages are a significant reduction in computational resources and, possibly, better statistical performance. In this talk we will focus on the nonparametric regression setting in a reproducing kernel Hilbert space (RKHS). We perform an empirical risk minimization procedure in the RKHS with two iterative learning algorithms: gradient descent and (iterative) kernel ridge regression. In order to avoid overfitting for these two algorithms, we consider the so-called discrepancy principle early stopping rule. We quantify its performance by means of an oracle-type inequality and show that, under some mild conditions, this stopping rule is not appropriate for various kernel classes. To solve this problem we propose a new stopping rule, called the smoothed discrepancy principle, and show that it can be proved optimal in the minimax sense under some conditions. The talk will end with simulation experiments validating the theoretical results.
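As a point of reference, a minimal sketch under simplifying assumptions (a Gaussian kernel with a hypothetical bandwidth, a known noise standard deviation sigma, and the plain rather than smoothed discrepancy principle): gradient descent on kernelized least squares, stopped as soon as the residual norm drops to the noise level.

import numpy as np

def gaussian_kernel(X, Z, bandwidth=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def kernel_gd_discrepancy(X, y, sigma, max_iter=10000):
    n = len(y)
    K = gaussian_kernel(X, X)
    step = 1.0 / np.linalg.eigvalsh(K).max()     # keeps the iteration stable
    alpha = np.zeros(n)
    threshold = sigma * np.sqrt(n)               # expected residual size at the true function
    for t in range(max_iter):
        resid = y - K @ alpha
        if np.linalg.norm(resid) <= threshold:   # discrepancy principle: stop here
            break
        alpha += step * resid                    # Landweber-type kernel gradient step
    return alpha, t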
Marianne Clausel
-
Date: January 21, 2020 (Tuesday) at 11.00 (Plenary room)
-
Affiliation: University of Lorraine
-
Webpage: Link
-
Title: Decision Tree for uncertainty measures
-
Abstract: Ensemble methods are popular machine learning techniques that are powerful for both classification and prediction problems. A set of classifiers is constructed, and the classification or prediction of a new data instance is obtained by taking a weighted vote. These classifiers can be regression or classification trees. In practice, a tree is a piecewise-constant estimator on partitions obtained from the data. These partitions are induced by recursive dyadic splits of the set of input variables. For example, CART (Classification And Regression Trees) is an algorithm which constructs such a tree. The goal is to partition the space of input-variable values into the K most "homogeneous" disjoint regions possible. More precisely, each partitioning value has to minimize a risk function. In practice, however, experimental measurements can be observed with uncertainty. This work proposes to extend the CART algorithm to this kind of data. We present an induced model adapted to uncertain data, together with a prediction rule and a split rule for tree construction that take into account the uncertainty of each quantitative observation in the database. Joint work with Sami ALKHOURY, Myriam TAMI, Emilie DEVIJVER, Eric GAUSSIER and Total SA.
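By way of illustration, a minimal sketch of the classical CART split search for regression (squared-error risk), which the talk extends to uncertain measurements; this is a generic illustration, not the proposed method:

import numpy as np

def best_split(X, y):
    n, d = X.shape
    best = (None, None, np.inf)                 # (variable index, threshold, risk)
    for j in range(d):
        for threshold in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= threshold
            right = ~left
            # summed within-child squared error, the risk minimized by each split
            risk = left.sum() * y[left].var() + right.sum() * y[right].var()
            if risk < best[2]:
                best = (j, threshold, risk)
    return best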
Luxin Zhang
-
Date: January 14, 2020 (Tuesday) at 11.00 (Room A11)
-
Affiliation: Inria Lille – Nord Europe
-
Webpage:
-
Title: Optimal Transport and its Application to Domain Adaptation
-
Abstract: Optimal transport was first formalized by Monge in the 18th century. The initial problem consists of finding the best plan to move a mass of dirt into a given hole while minimizing the effort. Recently, the problem and the associated Wasserstein distance have attracted a lot of attention and have been used in various machine learning tasks. This talk will focus on the application of optimal transport to domain adaptation problems. In such a scenario, one is given a set of points sampled from a source distribution and wants to adapt them to match the distribution of the target domain. Optimal transport will be compared with other domain adaptation methods, and some computational details will also be discussed.
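For illustration, a minimal numpy sketch (not necessarily the algorithm discussed in the talk) of entropic optimal transport computed with Sinkhorn iterations, together with the barycentric mapping commonly used to adapt source samples to the target domain; the regularization strength and iteration count are arbitrary choices:

import numpy as np

def sinkhorn_transport(Xs, Xt, reg=0.1, n_iter=500):
    ns, nt = len(Xs), len(Xt)
    a, b = np.full(ns, 1.0 / ns), np.full(nt, 1.0 / nt)   # uniform sample weights
    M = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # squared-Euclidean cost
    M = M / M.max()                                       # rescale to limit underflow
    K = np.exp(-M / reg)
    u = np.ones(ns)
    for _ in range(n_iter):                               # Sinkhorn matrix scaling
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]                    # approximate transport plan
    Xs_adapted = (plan @ Xt) / plan.sum(axis=1, keepdims=True)  # barycentric mapping
    return plan, Xs_adapted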
Stéphane Chrétien
-
Date: December 3, 2019 (Tuesday) at 11.00 (Plenary Room)
-
Affiliation: University of Lyon 2
-
Webpage: Link
-
Title: A new and simpler approach to the analysis of Robust PCA
-
Abstract: In standard PCA, one collects n observed vectors x_1, ..., x_n in R^d and forms the matrix X = [x_1, ..., x_n]. Then, one finds the best rank-r approximation by thresholding the singular values as prescribed by the Eckart-Young theorem. One of the main drawbacks of PCA is that the singular vectors are very sensitive to outliers. Robust PCA (RPCA) is a kind of matrix decomposition which allows one to recover a low-rank matrix corrupted by both noise and outliers of potentially very large magnitude. RPCA has been used extensively in a very wide range of applications, from genetics to video processing. One of the main ideas behind Robust PCA is to reconstruct the low-rank component L_0 and the sparse outlier component S_0 by solving the convex programming problem
min ||L||_{∗} + λ ||S||_1 s. t. ||X − L − S|| ≤ η.
The original theoretical analysis of this estimator's accuracy is well known to be technically very involved. The goal of the present work is to propose an elementary analysis of Robust PCA using the descent cone approach of Amelunxen et al. and a null-space-type property on the eigenvectors of the original low-rank matrix.
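As a rough illustration only (an alternating proximal scheme for a penalized surrogate of the constrained problem above, not the estimator whose analysis is presented in the talk), the low-rank and sparse components can be recovered by alternating singular-value thresholding and soft thresholding:

import numpy as np

def soft(A, tau):
    # entrywise soft thresholding: proximal operator of tau * ||.||_1
    return np.sign(A) * np.maximum(np.abs(A) - tau, 0.0)

def svt(A, tau):
    # singular-value thresholding: proximal operator of tau * ||.||_*
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def robust_pca(X, lam=None, mu=1.0, n_iter=200):
    # alternating minimization of ||L||_* + lam ||S||_1 + (1/(2 mu)) ||X - L - S||_F^2
    n, d = X.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(n, d))   # common default choice for the sparsity weight
    L = np.zeros_like(X, dtype=float)
    S = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        L = svt(X - S, mu)
        S = soft(X - L, lam * mu)
    return L, S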
Apoorv Vikram Singh
-
Date: October 15, 2019 (Tuesday) at 11.00 (Plenary Room)
-
Affiliation: Indian Institute of Science (IISc)
-
Webpage:
-
Title: Euclidean k-Means with Center Proximity
-
Abstract: Clustering is an important tool in any data science toolkit. Most popular clustering algorithms partition the given data into disjoint clusters by optimizing a certain global objective, such as the k-means objective. The implicit assumption in doing so is that an optimal solution for this objective would recover the underlying ground-truth clustering. However, many such objectives are NP-hard to optimize in the worst case, e.g., k-centre, k-median, k-means. Moreover, an optimal solution need not satisfy certain properties desired from the ground-truth clustering, e.g., balance, stability. In practice, however, Lloyd’s algorithm, k-means++, and their variants perform well on most real-world data sets. This dichotomy between theoretical intractability and empirically observed efficiency has led to the CDNM thesis: “Clustering is Difficult only when it does Not Matter”! In most real-world data sets, the underlying ground-truth clustering is unambiguous and stable under small perturbations of the data. We will highlight these issues with the example of k-means clustering. In this talk, we will explore a notion of stability called centre-proximity and give an algorithm and lower bounds for stable instances. This is joint work with Dr Amit Deshpande (Microsoft Research, India) and Dr Anand Louis (Indian Institute of Science, India).
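For reference, a minimal sketch of the two practical algorithms named in the abstract, k-means++ seeding followed by Lloyd's iterations (the parameters are illustrative choices; this is not the stability-based algorithm of the talk):

import numpy as np

def kmeans_pp_lloyd(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # k-means++ seeding: sample new centers proportionally to squared distance
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min(((X[:, None, :] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    # Lloyd's iterations: assign to the nearest center, then recompute the means
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels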
-
Date: October 8, 2019 (Tuesday) at 11.00 (Plenary Room)
-
Affiliation: SINTEF
-
Webpage: Link.
-
Title: Machine learning in the real world
-
Abstract: Machine learning algorithms are flexible and powerful, but their data requirements are high and rarely met by the available data. Real-world data is often medium-sized (relative to the problem size), noisy and full of missing values. At the same time, in order to deploy machine learning models in industrial settings, they must be robust, explainable and come with quantified uncertainties. I will show practical examples of these challenges from our recent projects and some case-by-case solutions, but also highlight remaining issues.
Han Bao
-
Date: September 23, 2019 (Monday) at 11.00 (Room A00)
-
Affiliation: University of Tokyo
-
Webpage: Link.
-
Title: Unsupervised Domain Adaptation Based on Source-guided Discrepancy
-
Abstract: Unsupervised domain adaptation is the problem setting where data generating distributions in the source and target domains are different, and labels in the target domain are unavailable. One important question in unsupervised domain adaptation is how to measure the difference between the source and target domains. A previously proposed discrepancy that does not use the source domain labels requires high computational cost to estimate and may lead to a loose generalization error bound in the target domain. To mitigate these problems, we propose a novel discrepancy called source-guided discrepancy (S-disc), which exploits labels in the source domain. As a consequence, S-disc can be computed efficiently with a finite sample convergence guarantee. In addition, we show that S-disc can provide a tighter generalization error bound than the one based on an existing discrepancy. Finally, we report experimental results that demonstrate the advantages of S-disc over the existing discrepancies.
-
Slides: Link
Michaël Fanuel
-
Date: September 11, 2019 (Wednesday) at 14.00 (Plenary Room)
-
Affiliation: KU Leuven
-
Webpage: Link.
-
Title: Landmark sampling, diversity and kernel methods
-
Abstract: In machine learning, there is a revived interest for kernel methods, e.g. for designing interpretable convolutional networks or in the context of Gaussian processes. More generally, in kernel-based learning, a central question concerns large scale approximations of the kernel matrix. A popular method for finding a low rank approximation of kernel matrices is the so-called Nystrom method, which relies on the sampling of ‘good’ landmark points in a dataset. We will discuss an approach for selecting ‘diverse’ landmarks with some theoretical guarantees. Our work makes a connection between kernelized Christoffel functions, ridge leverage scores and determinantal point processes.