Data science with statistical learning

Part of the research at Soda addresses the statistical foundations of machine learning, in particular for problems important to data science. The goal is to use machine-learning models as non-parametric estimators for common data-science problems. Beyond mere prediction accuracy, questions of statistical control arise.

Statistical learning with missing values

Statistical inference with missing values has been studied for decades, but modern machine-learning practice brings new trade-offs. In particular, we have shown that the classical view on imputation may not give the best-performing predictors, and that missing-not-at-random settings can be tackled by machine-learning models.
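As a rough illustration of why imputation alone may fall short when values are missing not at random, here is a minimal sketch on a toy dataset. The data, the self-masking rule (large values are hidden), and the two strategies compared are all assumptions made for this example; they are not the methods of the publications listed below.

```python
from statistics import mean

# Hypothetical toy data: the outcome simply equals the feature (y = x).
x_full = [1.0, 2.0, 3.0, 4.0, 10.0, 12.0]
y = list(x_full)

# Assumed missing-not-at-random mechanism: large values are never observed.
x_obs = [x if x < 10 else None for x in x_full]


def mse(pred, truth):
    """Mean squared error between predictions and true outcomes."""
    return mean((p - t) ** 2 for p, t in zip(pred, truth))


# Strategy A: impute with the mean of the observed values, then apply
# the complete-data predictor (here the identity, since y = x).
obs_mean = mean(x for x in x_obs if x is not None)
pred_impute = [x if x is not None else obs_mean for x in x_obs]

# Strategy B: treat "missing" as its own pattern and predict the mean
# outcome of the rows where the feature is missing.
y_missing_mean = mean(t for x, t in zip(x_obs, y) if x is None)
pred_pattern = [x if x is not None else y_missing_mean for x in x_obs]

mse_impute = mse(pred_impute, y)
mse_pattern = mse(pred_pattern, y)
print(f"impute-then-predict MSE: {mse_impute:.3f}")
print(f"pattern-aware MSE:       {mse_pattern:.3f}")
```

On this toy data the pattern-aware predictor has a much lower error, because the fact that a value is missing is itself informative. The errors are computed on the same rows used for fitting, so the numbers only illustrate the mechanism and say nothing about generalization.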

Machine learning for causal inference

Modern causal inference builds on estimating response functions for treated and untreated individuals, or the probability of treatment or of trial inclusion. We study the use of machine-learning models to estimate these quantities. Indeed, as we deal with increasingly complex data, such as Electronic Health Records, simple parametric models are no longer enough to leverage the data: it is spread across multiple tables, with many missing values and non-normalized text inputs.
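To make these quantities concrete, the sketch below estimates the response functions and the probability of treatment on a hypothetical dataset with a single binary covariate, then uses each to estimate an average treatment effect. Stratum means stand in for the machine-learning models discussed above; the data and the function names `response` and `propensity` are assumptions for this example.

```python
from statistics import mean

# Hypothetical observational data: (covariate x, treatment t, outcome y).
# Units with x = 1 are treated more often and have larger outcomes.
data = (
    [(0, 1, 3.0)] + 3 * [(0, 0, 1.0)]
    + 3 * [(1, 1, 6.0)] + [(1, 0, 2.0)]
)


def response(x, t):
    """Response function: mean outcome for covariate x under treatment t."""
    return mean(y for xi, ti, y in data if xi == x and ti == t)


def propensity(x):
    """Probability of treatment given covariate x."""
    return mean(ti for xi, ti, _ in data if xi == x)


# Naive contrast: ignores that x drives both treatment and outcome.
naive = (
    mean(y for _, t, y in data if t == 1)
    - mean(y for _, t, y in data if t == 0)
)

# Plug-in estimate: average the difference of the two response functions.
plug_in = mean(response(x, 1) - response(x, 0) for x, _, _ in data)

# Inverse-propensity weighting: reweight outcomes by 1 / P(received arm | x).
ipw = mean(
    t * y / propensity(x) - (1 - t) * y / (1 - propensity(x))
    for x, t, y in data
)

print(f"naive={naive:.2f}  plug-in={plug_in:.2f}  ipw={ipw:.2f}")
```

With a single discrete covariate the plug-in and weighting estimates coincide, while the naive contrast is inflated by confounding. With high-dimensional or messy covariates, the stratum means would be replaced by flexible learned models.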

One specific problem that we have focused on is generalizing an effect inferred on a study sample that has a selection bias with respect to the target population. This question relates to the external validity of a study.
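A minimal sketch of the idea, assuming a single binary effect modifier observed in both the trial and the target population: estimate the effect within each stratum of the trial, then reweight the strata by their frequency in the target population (post-stratification, a simple discrete form of reweighting). All data and names below are hypothetical.

```python
from collections import Counter
from statistics import mean


def stratum_effects(trial):
    """Treated-minus-control mean outcome within each covariate stratum."""
    effects = {}
    for x in sorted({row["x"] for row in trial}):
        treated = [r["y"] for r in trial if r["x"] == x and r["t"] == 1]
        control = [r["y"] for r in trial if r["x"] == x and r["t"] == 0]
        effects[x] = mean(treated) - mean(control)
    return effects


def weighted_effect(effects, covariates):
    """Average the stratum effects over a given covariate distribution."""
    counts = Counter(covariates)
    n = len(covariates)
    return sum(effects[x] * counts[x] / n for x in counts)


# Hypothetical randomized trial that over-represents the stratum x = 1.
trial = (
    [{"x": 0, "t": 1, "y": 2.0}, {"x": 0, "t": 0, "y": 1.0}]
    + 4 * [{"x": 1, "t": 1, "y": 5.0}, {"x": 1, "t": 0, "y": 2.0}]
)
# Hypothetical target population, where x = 1 is rare.
target_x = [0] * 8 + [1] * 2

effects = stratum_effects(trial)
trial_ate = weighted_effect(effects, [r["x"] for r in trial])
target_ate = weighted_effect(effects, target_x)
print(f"effect in trial: {trial_ate:.2f}, reweighted to target: {target_ate:.2f}")
```

Here the effect measured in the trial overstates the effect in the target population, because the stratum with the larger effect is over-sampled. The sketch relies on the assumption that every covariate driving both trial inclusion and treatment effect is observed in the two samples.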

Publications

Missing values

Title: On the consistency of supervised learning with missing values
Authors: Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux
Reference: 2024
Full text: https://hal.science/hal-02024202/file/main.pdf

Title: Causal effect on a target population: a sensitivity analysis to handle missing covariates
Authors: Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet
Reference: Journal of Causal Inference, 2022, 10 (1), pp.372-414. ⟨10.1515/jci-2021-0059⟩
Full text: https://hal.science/hal-03473691/file/JCI-version-finale.pdf

Title: Benchmarking missing-values approaches for predictive models on health databases
Authors: Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline
Reference: GigaScience, In press, ⟨10.1093/gigascience/giac013⟩
Full text: https://hal.science/hal-03526292/file/Benchmarking%20missing-values%20approaches%20for%20predictive%20models%20on%20health%20databases.pdf

Title: What’s a good imputation to predict with missing values?
Authors: Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux
Reference: NeurIPS 2021 – 35th Conference on Neural Information Processing Systems, Dec 2021, Virtual, France. ⟨10.48550/arXiv.2106.00311⟩
Full text: https://hal.science/hal-03243931/file/LeMorvan2021_ImputeThenRegress.pdf

Title: NeuMiss networks: differentiable programming for supervised learning with missing values
Authors: Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux
Reference: NeurIPS 2020 – 34th Conference on Neural Information Processing Systems, Dec 2020, Vancouver / Virtual, Canada
Full text: https://hal.science/hal-02888867/file/main.pdf

Title: Linear predictor on linearly-generated data with missing values: non consistency and solutions
Authors: Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux
Reference: AISTATS 2020 – International Conference on Artificial Intelligence and Statistics, Aug 2020, Online, France. pp.3165-3174
Full text: https://hal.science/hal-02464569/file/aistats.pdf

Causal inference

Title: Risk ratio, odds ratio, risk difference… Which causal measure is easier to generalize?
Authors: Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet
Reference: 2024
Full text: https://arxiv.org/pdf/2303.16008

Title: Causal inference methods for combining randomized trials and observational studies: a review
Authors: Bénédicte Colnet, Imke Mayer, Guanhua Chen, Awa Dieng, Ruohong Li, Gaël Varoquaux, Jean-Philippe Vert, Julie Josse, Shu Yang
Reference: Statistical Science, In press
Full text: https://hal.science/hal-03008276/file/main.pdf

Title: Decrease of the spatial variability and local dimension of the Euro-Atlantic eddy-driven jet stream with global warming
Authors: Robin Noyelle, Vivien Guette, Akim Viennet, Bénédicte Colnet, Davide Faranda, Andreia Hisi, Pascal Yiou
Reference: Climate Dynamics, 2023, ⟨10.1007/s00382-023-07022-z⟩
Full text: https://hal.science/hal-04337045/file/Investigating_the_variability_of_the_North_Atlantic_jet_stream_using_dynamical_indicators_removed.pdf

Title: Reweighting the RCT for generalization: finite sample error and variable selection
Authors: Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet
Reference: 2022
Full text: https://hal.science/hal-03822662/file/covariate_selection_generalization_November2022.pdf

Title: Causal effect on a target population: a sensitivity analysis to handle missing covariates
Authors: Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet
Reference: Journal of Causal Inference, 2022, 10 (1), pp.372-414. ⟨10.1515/jci-2021-0059⟩
Full text: https://hal.science/hal-03473691/file/JCI-version-finale.pdf
