In many scientific fields, hardware improvements in data acquisition devices have increased the resolution at which phenomena can be observed, leading to ever larger datasets. While the dimensionality of the resulting measurements has grown, the number of available samples often remains limited by physical or financial constraints. This becomes a problem when one needs to infer which of these measurements are important for explaining a variable of interest. Statistical inference on datasets with a very large number of features is still an open problem, yet it is needed in many fields of data science where models require rigorous statistical assessment. The ANR project FAST-BIG (efficient statistical testing for high-dimensional models) aims to develop theoretical results and practical estimation procedures that make statistical inference feasible in such hard cases. We will develop the corresponding software and assess the novel inference schemes on two applications: genomics and brain imaging.
Link to the project’s website: https://project.inria.fr/fastbig/