**Usual day**: Tuesday at 11.00.

**Place**: Inria Lille – Nord Europe.

**How to get there**: en français, in English.

**Organizer**: Hemant Tyagi

**Calendar feed**: iCalendar (hosted by the seminar platform of the University of Lille)

Most slides are available: check past sessions and archives.

**Archives**: 2021-2022, 2020-2021, 2019-2020, 2018-2019, 2017-2018, 2016-2017, 2015-2016, 2014-2015, 2013-2014

## Upcoming

### Gaurav Dhar

**Date**: April 30, 2024 **(Tuesday)** at **11h** (online seminar)

**Affiliation**: Michigan Tech Research Institute (MTRI)

**Webpage**: Link

**Title**: An Introduction to Calibration Methods

**Abstract**: The goal of standard classification algorithms is to classify an object into one of several classes with reasonable accuracy. However, many applications not only require high accuracy but also a reliable estimate of *predictive uncertainty*, i.e. how well a classifier is aware of what it does not know. In this talk, I will focus on *calibration methods*, which attempt to match the class probability of the classifier with the empirical accuracy of its prediction. I will give some examples from my recent work with SAR (Synthetic Aperture Radar) image data.
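For readers unfamiliar with the notion, here is a minimal sketch (my illustration, not taken from the talk) of the expected calibration error (ECE), a standard way to quantify the gap between a classifier's confidence and its empirical accuracy: predictions are binned by confidence, and the per-bin confidence/accuracy gaps are averaged with bin weights.

```python
import numpy as np

def expected_calibration_error(conf, correct, n_bins=10):
    # ECE: gap between mean confidence and empirical accuracy per bin,
    # weighted by the fraction of predictions falling in each bin
    bins = np.clip((conf * n_bins).astype(int), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(conf[mask].mean() - correct[mask].mean())
    return ece

# perfectly calibrated toy predictions: 80% confidence, 80% accuracy
conf = np.array([0.8] * 10)
correct = np.array([1] * 8 + [0] * 2)
ece = expected_calibration_error(conf, correct)
```

A calibration method such as temperature scaling would then adjust the classifier's scores so that this quantity shrinks on held-out data.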

## Past talks

### Steffen Grünewälder

**Date**: January 16, 2024 **(Tuesday)** at **10.30** (meeting room, building M2, Univ. Lille)

**Abstract**: I will discuss some of my recent results on compressing the empirical measure in the context of finite dimensional reproducing kernel Hilbert spaces (RKHSs). The aim is to significantly reduce the size of the sample while preserving minimax optimal rates of convergence. Such a reduction in size is of crucial importance when working with kernel methods in the context of large-scale data since kernel methods scale poorly with the sample size. In the RKHS context, an embedding of the empirical measure is contained in a convex set within an RKHS and can be approximated by using convex optimization techniques. Such an approximation gives rise to a small core-set of data points. A key quantity that controls the size of such a core-set is the size of the largest ball that fits within the convex set and which is centred at the embedding of the empirical measure. I will give an overview of how high probability lower bounds on the size of such a ball can be derived before discussing how the approach can be adapted to standard problems such as non-linear regression. (The talk will be based on an extended version of https://arxiv.org/pdf/2204.08847.pdf).
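As a toy illustration of the core-set idea (my own sketch, not the speaker's construction), kernel herding greedily selects a few sample points whose mean embedding tracks the empirical mean embedding in a Gaussian RKHS; the quality of the compression can be checked with the squared MMD between the full sample and the core-set.

```python
import numpy as np

def gaussian_gram(X, Y, bw=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw ** 2))

def herd_coreset(X, m, bw=1.0):
    # greedy kernel herding: track mu_hat = (1/n) sum_i k(x_i, .)
    K = gaussian_gram(X, X, bw)
    mu = K.mean(axis=1)          # <mu_hat, k(x_i, .)> for each sample point
    chosen, score = [], mu.copy()
    for _ in range(m):
        chosen.append(int(np.argmax(score)))
        score = mu - K[:, chosen].sum(axis=1) / (len(chosen) + 1)
    return chosen

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
S = herd_coreset(X, 10)
# squared MMD between the empirical measure and the core-set measure
K = gaussian_gram(X, X)
mmd2 = K.mean() - 2 * K[:, S].mean() + K[np.ix_(S, S)].mean()
```

The convex-optimization approach in the talk replaces this greedy rule with updates inside the convex hull of embedded points, but the quantity being controlled is the same.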

### Maria Alonso Pena

**Date**: October 10, 2023 **(Tuesday)** at **11.00** (Room A11)

**Abstract**: In this work we study the estimation of the regression multifunction when considering circular data. Circular observations are data that can be expressed as points of the unit circumference, such as angles or directions. We will start the talk by introducing the basic concepts regarding these observations and showing why classical statistical and inferential tools are not appropriate for circular variables.
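A standard example of why ordinary summaries fail on the circle (my illustration, not from the talk): the arithmetic mean of the angles 350° and 10° points in the opposite direction of the data, while the circular mean, computed through the embedding θ ↦ e^{iθ}, gives the expected answer.

```python
import numpy as np

angles = np.deg2rad([350.0, 10.0])   # two directions close to 0 degrees
arith_mean = angles.mean()           # about pi: points the wrong way
circ_mean = np.angle(np.exp(1j * angles).mean())  # about 0: correct direction
```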

### Sebastian Kühnert

**Date**: September 12, 2023 **(Tuesday)** at **11.00** (Room A21)

**Affiliation**: UC Davis

**Webpage**: Link

**Title**: Novelties in operator estimation of significant classes of functional time series

**Abstract**: Invertible linear processes naturally occur in functional time series analysis. Knowledge of the operators in the linear and inverted representations is of high interest, so consistent operator estimates are of great importance. Explicit asymptotic upper bounds for the full operators in the linear as well as for the finite-dimensional projections and the full operators in the inverted representation of Hilbert space-valued processes have recently been derived.

This talk deals with the novelties of our current article, in which consistent estimates for the finite-dimensional projections and the full operators are deduced under milder conditions, both in the linear and the inverted representations of Hilbert space-valued invertible linear processes. We also derive exact constants that appear in the consistency results. Moreover, based on these results, we derive consistency results for operator estimates for Hilbert space-valued MA, AR, and ARMA processes, for the finite-dimensional projections and the full operators, with explicit rates and constants.

In this presentation, we also review other interesting functional time series models where our results are potentially applicable, namely those with periodic character, with conditional heteroskedasticity, and with spatio-temporal structure.
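To fix intuition for the finite-dimensional projections mentioned above (my own toy setup, not the paper's estimator), operator estimation for an AR(1) process in a projected space reduces to a lag-covariance (Yule-Walker type) computation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 5000
A = 0.5 * np.eye(d) + 0.1 * rng.standard_normal((d, d))  # contraction-like AR operator
x = np.zeros((n, d))
for t in range(1, n):
    x[t] = A @ x[t - 1] + rng.standard_normal(d)

# Yule-Walker type estimate: A_hat = C1 C0^{-1} from lag-0 and lag-1 covariances
C0 = x[:-1].T @ x[:-1] / (n - 1)
C1 = x[1:].T @ x[:-1] / (n - 1)
A_hat = C1 @ np.linalg.inv(C0)
```

The functional results in the talk concern the infinite-dimensional analogue, where the inversion of the covariance operator is ill-posed and rates and constants require care.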

**Date**: July 11, 2023 **(Tuesday)** at **11.00** (Room A00)

**Affiliation**: University College Dublin

**Webpage**: Link

**Title**: Physics-Informed Functional Data Analysis

**Abstract**: Functional Data Analysis (FDA) has emerged as a powerful statistical framework for analyzing data collected in the form of curves, surfaces, or more generally, functions. It has found applications in various fields such as medicine, biology, finance, and environmental sciences. However, in many real-world problems, the underlying processes generating the functional data are influenced by physical laws or principles that need to be incorporated into the analysis.

In this talk, we explore the integration of physics-based knowledge into FDA through the paradigm of Physics-Informed Functional Data Analysis (PIFDA). PIFDA aims to exploit the known or assumed physical laws governing the system under study to enhance the analysis and interpretation of functional data. By combining the principles of FDA with physics-based constraints, PIFDA provides a promising framework for extracting meaningful information, improving predictions, and uncovering hidden patterns within functional datasets.

We discuss the key concepts and methodologies in PIFDA, including the incorporation of physics-based constraints as regularization terms, the integration of physical laws as differential equations or partial differential equations, and the utilization of domain knowledge to guide functional data analysis. We explore various examples from different domains, such as fluid dynamics, structural mechanics, and signal processing, where PIFDA has been successfully applied to address challenging problems.

**Date**: April 11, 2023 **(Tuesday)** at **11.00** (Room A11)

**Affiliation**: University of Manchester, UK

**Webpage**: Link

**Title**: Multi-Means Gaussian Processes: A novel probabilistic framework for multi-correlated longitudinal data

**Abstract**: Modelling and forecasting time series, even with a probabilistic flavour, is a common and well-handled problem nowadays. However, suppose now that one is collecting data from hundreds of individuals, each of them gathering thousands of gene-related measurements, all evolving continuously over time. Such a context, frequently arising in biological or medical studies, quickly leads to highly correlated datasets where dependencies come from different sources (temporal trend, gene or individual similarities, for instance). Explicit modelling of overly large covariance matrices accounting for these underlying correlations is generally unreachable due to theoretical and computational limitations. Therefore, practitioners often need to restrict their analysis by working on subsets of data or making arguable assumptions (fixing time, studying genes or individuals independently, …). To tackle these issues, we recently proposed a new framework for multi-task Gaussian processes, tailored to handle multiple time series simultaneously. By sharing information between tasks through a mean process instead of an explicit covariance structure, this method leads to a learning and forecasting procedure with linear complexity in the number of tasks. The resulting predictions remain Gaussian distributions and thus offer an elegant probabilistic approach to deal with correlated time series. Finally, we will present the current development of an extended framework in which as many sources of correlation as desired can be considered (multiple individuals and genes could be handled jointly, for example). Intuitively, the approach relies on the definition of multiple latent mean processes, each being estimated with an adequate subset of data, and leads to an adaptive prediction associated with a mean process specific to the considered correlation.

**Date**: March 21, 2023 **(Tuesday)** at **11.00** (Plenary room)

**Affiliation**: University of Colorado at Boulder, USA

**Webpage**: Link

**Title**: Optimal nonparametric estimation of distribution functions, convergence rates for Fourier inversion theorems and applications

**Abstract**: We obtain two sets of results in this paper. First, we consider broad classes of kernel based nonparametric estimators of an unrestricted distribution function $F$. We develop improved lower and upper bounds for the bias of the estimators at points of continuity of $F$ as well as for jumps of $F$. Second, we provide new Fourier inversion theorems with rates of convergence and use them to obtain new convergence results for deconvolution estimators for distribution functions under measurement error.
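For intuition about the estimators in question (my sketch, not the paper's full class), the basic kernel distribution function estimator replaces the empirical CDF's indicator with an integrated kernel: with a Gaussian kernel, F̂(x) = (1/n) Σᵢ Φ((x − Xᵢ)/h).

```python
import math
import numpy as np

def kernel_cdf(x, sample, h=0.2):
    # smoothed empirical CDF: average the Gaussian kernel's integral
    # Phi((x - X_i)/h) over the sample, with bandwidth h
    z = (x - sample) / h
    return float(np.mean([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z]))

rng = np.random.default_rng(1)
sample = rng.normal(size=2000)
est = kernel_cdf(0.0, sample)   # true value is F(0) = 0.5 for a standard normal
```

The bias bounds in the paper quantify how this smoothing behaves both at continuity points and at jumps of $F$.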

**Date**: March 17, 2023 **(Friday)** at **11.00** (Plenary room)

**Affiliation**: Instituto de Estadística, Chile

**Webpage**: Link

**Title**: Diagnostic tests for stocks with time-varying zero returns probability

**Abstract**: The first- and second-order serial correlations of illiquid stocks' price changes are studied, allowing for unconditional heteroscedasticity and a time-varying zero-returns probability. Depending on the setup, we investigate how the usual autocorrelations can be accommodated to deliver an accurate representation of the price changes' serial correlations. We shed some light on the properties of the different tools by means of Monte Carlo experiments. The theoretical arguments are illustrated using shares from the Chilean stock market and Facebook 1-minute returns.

**Date**: February 14, 2023 **(Tuesday)** at **11.00** (Plenary room)

**Affiliation**: Inria Lille

**Webpage**:

**Title**: Classification of multivariate functional data on different domains with Partial Least Squares approaches

**Abstract**: Classification (supervised learning) of multivariate functional data is considered when the elements of the underlying random functional vector are defined on different domains. In this setting, PLS classification and tree PLS-based methods for multivariate functional data are presented. From a computational point of view, we show that the PLS components of the regression with multivariate functional data can be obtained using only the PLS methodology with univariate functional data. This offers an alternative way to present the PLS algorithm for multivariate functional data.
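A minimal NIPALS-style PLS1 sketch on discretized curves (my own illustration; the talk's multivariate-functional algorithm is richer): each curve is a row of X, and components are extracted to maximize covariance with the response, which in the classification setting can be a class indicator (PLS-DA style).

```python
import numpy as np

def pls_components(X, y, n_comp=2):
    # NIPALS-style PLS1: each weight vector maximizes covariance with y,
    # then X and y are deflated before extracting the next component
    X = X - X.mean(axis=0)
    y = y - y.mean()
    scores = []
    for _ in range(n_comp):
        w = X.T @ y
        w /= np.linalg.norm(w)
        t = X @ w
        scores.append(t)
        p = X.T @ t / (t @ t)          # loading used to deflate X
        X = X - np.outer(t, p)
        y = y - t * (y @ t) / (t @ t)
    return np.column_stack(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))           # each row: one discretized curve
beta = rng.normal(size=30)
y = X @ beta + 0.1 * rng.normal(size=100)  # y could also be a class indicator
T = pls_components(X, y, n_comp=2)
```

In the multivariate functional setting, the observation in the abstract is that the same components can be obtained by running univariate functional PLS blockwise, which is what makes the reduction computationally attractive.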

**Date**: January 10, 2023 **(Tuesday)** at **11.00** (Plenary room)

**Affiliation**: Université de Lille

**Webpage**: Link

**Title**: Clustering of recurrent event data

**Abstract**: Nowadays data are often timestamped, thus, when analysing the events which may occur several times (recurrent events), it is desirable to model the whole dynamics of the counting process rather than to focus on a total number of events. Such kind of data can be encountered in hospital re-admissions, disease recurrences or repeated failures of industrial systems. Recurrent events can be analysed in the counting process framework, as in the Andersen-Gill model, assuming that the baseline intensity depends on time and on covariates, as in the Cox model. However, observed covariates are often insufficient to explain the observed heterogeneity in the data.

**Date**: November 29, 2022 **(Tuesday)** at **11.00** (online seminar)

**Affiliation**: IIT Delhi, India

**Webpage**: Link

**Title**: Graph Dimensionality Reduction with Guarantees and its Applications

**Abstract**: Graph coarsening is a dimensionality reduction technique that aims to learn a smaller, tractable graph while preserving the properties of the original input graph. However, many real-world graphs also have features or contexts associated with each node. The existing graph coarsening methods do not consider the node features and rely solely on a graph matrix (e.g., adjacency or Laplacian) to coarsen graphs. In this talk, we introduce a novel optimization-based framework for graph coarsening that takes both the graph matrix and the node features as the input and learns the coarsened graph matrix and the coarsened feature matrix jointly while ensuring desired properties. We also provide a guarantee that the learned coarsened graph is $\epsilon \in (0, 1)$ similar to the original graph. Extensive experiments with both real and synthetic benchmark datasets elucidate the efficacy and the applicability of the proposed framework for numerous graph-based applications including graph clustering, stochastic block model identification, and graph summarization.

This talk is based on the following work: A Unified Framework for Optimization-Based Graph Coarsening – https://arxiv.org/pdf/2210.00437.pdf
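To make the joint use of structure and features concrete, here is a toy coarsening sketch (my own, not the paper's optimization framework): cluster the node features, form a normalized partition matrix P, and coarsen both the adjacency and the feature matrix through P.

```python
import numpy as np

def coarsen(A, X, k, iters=20, seed=0):
    # toy feature-aware coarsening: k-means on node features gives a
    # partition matrix P; then A_c = P^T A P and X_c = P^T X
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centers = X[rng.choice(n, size=k, replace=False)].copy()
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(axis=0)
    P = np.zeros((n, k))
    P[np.arange(n), labels] = 1.0
    P /= np.sqrt(np.maximum(P.sum(axis=0, keepdims=True), 1.0))
    return P.T @ A @ P, P.T @ X

rng = np.random.default_rng(1)
A = rng.random((10, 10)); A = (A + A.T) / 2   # symmetric toy adjacency
X = rng.normal(size=(10, 2))                  # node features
A_c, X_c = coarsen(A, X, k=3)
```

The framework in the talk replaces the two-stage clustering with a single optimization that learns the coarsened graph and feature matrices jointly, with the $\epsilon$-similarity guarantee.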

### Serguei Dachian

**Date**: November 8, 2022 **(Tuesday)** at **11.00** (Plenary room)

**Affiliation**: University of Lille

**Webpage**: Link

**Title**: On Smooth Change-Point Location Estimation for Poisson Processes and Skorokhod Topologies

**Abstract**: We consider the problem of estimation of the location of what we call “smooth change-point” from $n$ independent observations of an inhomogeneous Poisson process. The “smooth change-point” is a transition of the intensity function of the process from one level to another which happens smoothly, but over such a small interval, that its length $\delta_n$ is considered to be decreasing to 0 as $n$ goes to infinity.

We study the maximum likelihood estimator (MLE) and the Bayesian estimators (BEs), and show that there is a "phase transition" in the asymptotic behavior of the estimators depending on the rate at which $\delta_n$ goes to 0; more precisely, on whether it is slower (slow case) or faster (fast case) than $1/n$.

It should be noted that all these results were obtained using the likelihood ratio analysis method developed by Ibragimov and Khasminskii, which equally yields the convergence of polynomial moments of the considered estimators. On the other hand, for the study of the MLE, this method needs the convergence of the normalized likelihood ratio in some functional space and, to the best of our knowledge, it has until now only been applied using either the space of continuous functions equipped with the topology induced by the supremum norm, or the space of càdlàg functions equipped with the usual Skorokhod topology (called "J_1" by Skorokhod himself). However, we will see that in the fast case of our problem this convergence cannot take place in either of these topologies. So, in order to apply the Ibragimov-Khasminskii method in this case, we extend it to the weaker topology "M_1" (also introduced by Skorokhod).
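A simulation sketch of the object under study (my own toy parameters, not those of the talk): an inhomogeneous Poisson process whose intensity ramps from one level to another over a short window of length δ, generated by Lewis-Shedler thinning.

```python
import numpy as np

def smooth_changepoint_intensity(t, lam0=2.0, lam1=6.0, tau=5.0, delta=0.5):
    # intensity rises linearly from lam0 to lam1 over [tau, tau + delta]
    ramp = np.clip((t - tau) / delta, 0.0, 1.0)
    return lam0 + (lam1 - lam0) * ramp

def simulate(T=10.0, lam_max=6.0, seed=0):
    # thinning: propose events at rate lam_max, accept with prob lam(t)/lam_max
    rng = np.random.default_rng(seed)
    n_prop = rng.poisson(lam_max * T)
    t = np.sort(rng.uniform(0.0, T, size=n_prop))
    keep = rng.uniform(size=n_prop) < smooth_changepoint_intensity(t) / lam_max
    return t[keep]

events = simulate()
```

In the asymptotics of the talk, $n$ independent copies of such a process are observed while $\delta_n \to 0$, and the estimators of the change-point location $\tau$ exhibit the phase transition described above.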

### Cristian Preda

**Date**: October 25, 2022 **(Tuesday)** at **11.00** (Plenary room)

**Affiliation**: University of Lille

**Webpage**: Link

**Title**: One dimensional scan statistics associated to some dependent models

**Abstract**: The one-dimensional scan statistic is presented in the context of block-factor dependent models. The longest success run statistic is related to the scan statistic in this framework. An application to the moving-average process is also presented.

The presentation is based on the papers:

1) Amarioarei, A.; Preda, C. One Dimensional Discrete Scan Statistics for Dependent Models and Some Related Problems. Mathematics 2020, 8, 576. https://doi.org/10.3390/math8040576 (open access)

2) G. Haiman, C. Preda (2013), One dimensional scan statistics generated by some dependent stationary sequences, Statistics and Probability Letters, Volume 83, Issue 5, 1457-1463.
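For reference, the discrete scan statistic itself is easy to state (my sketch): over a 0/1 sequence, it is the maximum number of events in any window of m consecutive trials, computable with a sliding-window sum.

```python
import numpy as np

def scan_statistic(x, m):
    # maximum number of events over all windows of m consecutive trials
    window_sums = np.convolve(x, np.ones(m, dtype=int), mode="valid")
    return int(window_sums.max())

x = np.array([0, 1, 1, 1, 0, 0, 1, 1, 0, 1])
```

The difficulty addressed in the papers is not the computation but the distribution of this maximum when the underlying sequence is dependent (block-factor or moving-average models) rather than i.i.d.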

### Ayush Bhandari

**Date**: September 9, 2022 **(Friday)** at **11.00** (Room A11)

**Affiliation**: Imperial College, London

**Webpage**: Link

**Title**: Digital Acquisition via Modulo Folding: Revisiting the Legacy of Shannon-Nyquist

**Abstract**: Digital data capture is the backbone of all modern-day systems, and the "Digital Revolution" has been aptly termed the Third Industrial Revolution. Underpinning the digital representation is the Shannon-Nyquist sampling theorem and more recent developments such as compressive sensing approaches. The fact that there is a physical limit to which sensors can measure amplitudes poses a fundamental bottleneck when it comes to leveraging the performance guaranteed by recovery algorithms. In practice, whenever a physical signal exceeds the maximum recordable range, the sensor saturates, resulting in permanent information loss. Examples include (a) dosimeter saturation during the Chernobyl reactor accident, reporting radiation levels far lower than the true value, and (b) loss of visual cues in self-driving cars coming out of a tunnel (due to sudden exposure to light).

To reconcile this gap between theory and practice, we introduce a computational sensing approach—the Unlimited Sensing framework (USF)—that is based on a co-design of hardware and algorithms. On the hardware front, our work is based on a radically different analog-to-digital converter (ADC) design, which allows for the ADCs to produce modulo or folded samples. On the algorithms front, we develop new, mathematically guaranteed recovery strategies.

In the first part of this talk, we prove a sampling theorem akin to the Shannon-Nyquist criterion. Despite the non-linearity in the sensing pipeline, the sampling rate only depends on the signal’s bandwidth. Our theory is complemented with a stable recovery algorithm. Beyond the theoretical results, we also present a hardware demo that shows the modulo ADC in action.
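A one-dimensional sketch of the folding nonlinearity and a naive recovery (my simplification; the USF guarantees are built on higher-order differences): if successive samples of the true signal move by less than the threshold λ, fold events show up as jumps of roughly 2λ in the first differences and can be undone exactly.

```python
import numpy as np

def fold(x, lam):
    # centered modulo nonlinearity: maps any amplitude into [-lam, lam)
    return np.mod(x + lam, 2 * lam) - lam

def unwrap(y, lam):
    # naive recovery, valid when successive true samples differ by < lam:
    # detect fold events in the first differences and subtract them out
    d = np.diff(y)
    jumps = np.round(d / (2 * lam))
    return np.concatenate(([y[0]], y[0] + np.cumsum(d - 2 * lam * jumps)))

x = 3.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, 200))  # exceeds the threshold
y = fold(x, 1.0)        # what a modulo ADC would record
x_rec = unwrap(y, 1.0)  # recovered signal
```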

Building on the basic sampling theory result, we consider certain variations on the theme that arise from practical implementation of the USF. This leads to a new Fourier-domain recovery algorithm that is empirically robust to noise and allows for recovery of signals up to 25 times the modulo threshold when working with modulo ADC hardware.

Moving further, we reinterpret the USF as a generalized linear model that motivates a new class of inverse problems. We conclude this talk by presenting a research overview in the context of sparse super-resolution, single-shot high-dynamic-range (HDR) imaging and sensor array processing.