Population imaging relates features of brain images to rich descriptions of the subjects such as behavioral and clinical assessments. We use predictive analysis pipelines to extract functional biomarkers of brain disorders from large-scale datasets of resting-state functional Magnetic Resonance Imaging (R-fMRI), Magnetoencephalography (MEG) and Electroencephalography (EEG). We also built tools for automated data analysis which facilitate processing large datasets at scale. Some of our results are highlighted below.
NeuroLang: a probabilistic programming language for knowledge representation
NeuroLang is a probabilistic programming language for knowledge representation based on Datalog. Its query language lets the user write queries as a simple program in a straightforward non-composite syntax. NeuroLang merges heterogeneous datasets and their analysis using a near-English textual syntax built on first-order logic. A key design goal is for the language to be intuitive for researchers outside computer science or unaccustomed to high-level programming. A general form of recursion makes it possible to express complex relationships that combine many datasets.
These features enable NeuroLang to map cortical neuroanatomy by formally describing sulcal relationships, and they support the incremental identification of cortical landmarks in a top-down order that is intuitive to neuroanatomists. NeuroLang allows the user to map subject-specific cortical landmarks with sulcus-specific queries. The primary sulci are gold-standard cortical landmarks and form the starting blueprint. From these, lower-level sulci can be identified through their relations to the primary sulci and through sulcal characteristics embedded into the language as predicates.
Figure 1: Primary sulci which form the starting blueprint from which to map the rest of the sulci, on the left hemisphere.
Figure 2: Second and third level sulci identified from primary sulci relations and sulcal predicates, on the left hemisphere.
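The recursive, rule-based style of querying described for NeuroLang can be illustrated without NeuroLang itself. The sketch below is plain Python, not NeuroLang syntax or its API. The `anterior_of` relation and the facts it contains are hypothetical toy data. It evaluates a recursive Datalog-style rule bottom-up until no new fact can be derived.

```python
# Toy Datalog-style evaluation in plain Python (not the NeuroLang API).
# Hypothetical facts: (a, b) means "sulcus a lies directly anterior to b".
anterior_of = {
    ("precentral", "central"),
    ("central", "postcentral"),
    ("postcentral", "intraparietal"),
}


def transitive_closure(facts):
    """Naive bottom-up evaluation of the recursive rule

        anterior(x, z) :- anterior_of(x, z).
        anterior(x, z) :- anterior(x, y), anterior_of(y, z).

    The loop stops when an iteration derives nothing new (a fixed point).
    """
    derived = set(facts)
    while True:
        new = {(x, z) for (x, y) in derived for (y2, z) in facts if y == y2}
        if new <= derived:
            return derived
        derived |= new


anterior = transitive_closure(anterior_of)

# Query: which sulci lie anterior to the intraparietal sulcus?
answer = sorted(x for (x, z) in anterior if z == "intraparietal")
print(answer)
```

The recursion is what lets a short rule cover chains of any length: only direct neighbours are stated as facts, and the indirect relations are derived.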
Non-invasive imaging at the cellular level could lead us to quantify tissue cytoarchitecture, which has so far been accessible only through histology. Being able to characterize a tissue in vivo would help us define the cytoarchitectonic boundaries and link anatomical and functional information in the cerebral cortex.
Large-scale regression and classification with M/EEG
M/EEG provides a unique window on brain function as it non-invasively captures large-scale neuronal dynamics in real time across multiple time scales, from seconds to less than a millisecond. However, only recently has the availability of large datasets made M/EEG an option for population-level predictive modeling in clinical neuroscience research. While some important preprocessing steps can nowadays be automated, cross-person, cross-protocol and cross-site learning introduces particular challenges related to domain adaptation and hierarchical variance components.
In this research project we develop methods to improve learning across subjects with heterogeneous data and conduct applied research in areas such as biomarker development and automation of medical diagnosis.
Brain rhythms are a key source of information in electrophysiological modeling. Yet, they are highly exposed to geometric distortions when learning from non-invasive recordings. Addressing these distortions typically requires biophysical source modeling, which depends on the availability of MRI recordings and specialized human expertise. In this paper we provide consistency proofs for appropriate generative models for learning from brain rhythms. We demonstrate with empirical M/EEG data that the consistent regression models are also more robust to the model violations induced by the cross-subject learning setting.
This paper is supported by the joint Inserm-Inria 2018 project, has been accepted at NeurIPS 2019, and won the JDSE best paper award.
Robust cross-site and cross-protocol classification of EEG-based diagnosis in disorders of consciousness
Diagnosis in severely brain-injured patients is notoriously hard, and the heterogeneity of clinical data poses severe challenges for machine learning approaches to gain traction. In this work we demonstrate that combining multiple state-of-the-art EEG markers of consciousness with robust tree-based classification methods enables out-of-the-box generalization between data from different EEG protocols and hospitals.
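One way to check generalization across hospitals is to hold out every recording from one site at a time. The sketch below uses scikit-learn with synthetic data. The marker values, the site offsets and the choice of `ExtraTreesClassifier` are assumptions made for illustration, not the markers or the exact model used in this work.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.RandomState(0)
n_patients, n_markers, n_sites = 120, 10, 3

# Synthetic stand-ins: one row of EEG markers per patient, a binary
# diagnosis, and the hospital (site) each recording came from.
y = rng.randint(0, 2, n_patients)
sites = rng.randint(0, n_sites, n_patients)
X = rng.randn(n_patients, n_markers)
X[:, 0] += 2.0 * y             # one informative marker
X += 0.3 * sites[:, None]      # site-specific offset (protocol effect)

clf = ExtraTreesClassifier(n_estimators=200, random_state=0)

# Each fold trains on all sites but one and tests on the held-out site.
scores = cross_val_score(clf, X, y, groups=sites, cv=LeaveOneGroupOut())
print(scores.mean())
```

Passing `groups` to the splitter guarantees that no patient from the test hospital is seen during training, which is the setting that matters for deployment at a new site.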
The IMaging-PsychiAtry Challenge (IMPAC) is a data challenge on Autism Spectrum Disorder (ASD). ASD is a severe psychiatric disorder that affects 1 in 166 children.
There is evidence that ASD is reflected in individuals' brain networks and anatomy. Yet, it remains unclear how systematic these effects are and how large their predictive power is. The large cohort assembled here can bring some answers. Predicting autism from brain imaging will provide biomarkers and shed some light on the mechanisms of the pathology.
Here we propose to jointly predict behavioral scores that make up the individual profiles from neuroimaging data with multi-output models. This approach boosts prediction accuracy by capturing latent shared information across scores. We demonstrate the efficiency of multi-output models on two rs-fMRI datasets targeting different brain disorders (Alzheimer’s Disease and schizophrenia).
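A minimal sketch of joint prediction of several scores follows, assuming scikit-learn and synthetic data. The features, the scores and the use of `RandomForestRegressor` are illustrative assumptions. The source does not specify which multi-output model was used.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n_subjects, n_features, n_scores = 200, 30, 4

# Synthetic stand-ins: imaging features X and behavioral scores Y that all
# depend on one shared latent factor plus score-specific noise.
X = rng.randn(n_subjects, n_features)
latent = X[:, :5].sum(axis=1)
loadings = rng.uniform(0.5, 1.5, n_scores)
Y = latent[:, None] * loadings + 0.5 * rng.randn(n_subjects, n_scores)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# A single forest fit on the 2D target picks splits that reduce the error
# averaged over all scores, so every score shares the same trees.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, Y_train)

Y_pred = model.predict(X_test)        # one column per behavioral score
r2 = model.score(X_test, Y_test)      # R^2 averaged over the scores
print(Y_pred.shape, r2)
```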
Here we demonstrate the feasibility of inter-site classification of neuropsychiatric status from functional connectivity, with an application to the Autism Brain Imaging Data Exchange (ABIDE) database, a large (N=871) multi-site autism dataset. We predict this neuropsychiatric status for participants from the same acquisition sites or from different, unseen ones. Prediction accuracy improves as we include more subjects, up to the maximum number of subjects available.
Here we systematically study resting-state functional-connectivity (FC)-based prediction across six different cohorts (ADNI, COBRE, ACPI, ADNIDOD, ABIDE, HCP), a total of 2000 individuals. We explore various methodological choices (ROI set selection, FC metrics, and linear and non-linear classifiers) to compare the dominant strategies in terms of prediction accuracy. We observe that: (i) tangent embedding performs better than correlation or partial correlation in all datasets; (ii) l2-regularized classifiers (SVC and Ridge) are more accurate than the l1-regularized SVC; (iii) with regard to brain atlases, decomposition methods (ICA, DictLearn) are generally the best choices, though with striking cross-dataset differences.
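The tangent embedding compared above can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions: the reference point is the Euclidean mean of the covariances (a geometric mean is the more usual choice), the time series are synthetic, and the helper names are hypothetical.

```python
import numpy as np


def _sym_fun(mat, fun):
    """Apply a scalar function to the eigenvalues of a symmetric matrix."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * fun(vals)) @ vecs.T


def tangent_embedding(timeseries):
    """timeseries: list of (n_timepoints, n_regions) arrays, one per subject.

    Returns an array with one row per subject holding the upper triangle
    of that subject's tangent-space connectivity matrix.
    """
    covs = np.array([np.cov(ts, rowvar=False) for ts in timeseries])
    # Reference point: Euclidean mean, for brevity (an assumption).
    whitener = _sym_fun(covs.mean(axis=0), lambda v: 1.0 / np.sqrt(v))
    rows, cols = np.triu_indices(covs.shape[1])
    features = []
    for cov in covs:
        # Whiten by the reference, then take the matrix logarithm.
        tangent = _sym_fun(whitener @ cov @ whitener, np.log)
        features.append(tangent[rows, cols])
    return np.array(features)


rng = np.random.RandomState(0)
timeseries = [rng.randn(150, 5) for _ in range(8)]   # synthetic subjects
features = tangent_embedding(timeseries)
print(features.shape)
```

Each row can be fed to a linear classifier. A subject whose covariance equals the reference maps to the zero vector, so the features encode deviations from the group.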
Removing artifacts from EEG and MEG signals is a common and necessary step in data analysis and, unfortunately, has historically demanded a significant investment of human attention. I developed and evaluated a novel algorithm, termed autoreject, for detecting and handling contaminated MEG and EEG data segments. Autoreject is described in Jas et al. 2017, is readily usable in a “plug and play” manner in a wide array of situations, and has been validated on more than 250 datasets, including a reanalysis of the Human Connectome Project MEG data. Notably, using it successfully does not require a deep understanding of the method: it relies on machine learning to handle artifact rejection in a data-driven manner, which reduces human processing time. It will soon be disseminated through the MNE software. The code is accessible on GitHub.
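The data-driven idea can be illustrated with a much simplified sketch: choose a peak-to-peak rejection threshold by cross-validation rather than by eye. This is NumPy only and does not use the autoreject package or its API. The function name, the candidate thresholds and the synthetic epochs are assumptions, and the sensor-level thresholds and repair steps of the published method are left out.

```python
import numpy as np


def pick_threshold(epochs, candidates, n_folds=5):
    """epochs: (n_epochs, n_channels, n_times) array.

    Returns the candidate peak-to-peak threshold with the lowest
    cross-validated error.
    """
    # Worst-channel peak-to-peak amplitude of each epoch.
    ptp = (epochs.max(axis=2) - epochs.min(axis=2)).max(axis=1)
    folds = np.array_split(np.arange(len(epochs)), n_folds)
    errors = []
    for thresh in candidates:
        fold_errors = []
        for val_idx in folds:
            train = np.ones(len(epochs), dtype=bool)
            train[val_idx] = False
            good = train & (ptp <= thresh)
            if not good.any():           # threshold rejects everything
                fold_errors.append(np.inf)
                continue
            # Mean of the retained training epochs against a robust
            # (median) summary of the untouched validation epochs.
            diff = epochs[good].mean(axis=0) - np.median(epochs[val_idx], axis=0)
            fold_errors.append(np.sqrt(np.mean(diff ** 2)))
        errors.append(np.mean(fold_errors))
    return candidates[int(np.argmin(errors))]


rng = np.random.RandomState(0)
epochs = rng.randn(60, 4, 100)                   # synthetic clean epochs
bad = rng.choice(60, 10, replace=False)
epochs[bad] += 20 * rng.randn(10, 4, 100)        # high-amplitude artifacts

best = pick_threshold(epochs, np.array([2.0, 8.0, 200.0]))
print(best)
```

A threshold that is too strict leaves no data to average, and one that is too lenient lets artifacts into the average. The validation error penalizes both.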
Here we investigate scale-free dynamics in brain activity. The temporal structure of macroscopic brain activity displays both oscillatory and scale-free dynamics. While the functional relevance of neural oscillations has been extensively investigated, both the nature and the role of scale-free dynamics in brain processing have been disputed. Relying on the wavelet-leader multifractal formalism, we estimated self-similarity and multifractal exponents from resting-state and task MEG recordings.
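The wavelet-leader formalism is beyond the scope of a short example. As a simpler stand-in, the sketch below estimates the exponent beta of a power spectrum that decays as 1/f^beta, using a log-log fit on the periodogram. It is a swapped-in technique, not the method used in this work. The function name, the frequency range and the synthetic signals are assumptions.

```python
import numpy as np


def spectral_exponent(signal, sfreq=1.0, fmax_fraction=0.1):
    """Estimate beta in PSD(f) ~ 1 / f**beta by a log-log least-squares fit.

    Only frequencies up to `fmax_fraction` of the Nyquist frequency are
    used, as a crude choice of scaling range.
    """
    signal = signal - signal.mean()
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sfreq)
    power = np.abs(np.fft.rfft(signal)) ** 2
    # Skip the zero-frequency bin and keep the low-frequency range.
    keep = (freqs > 0) & (freqs <= fmax_fraction * sfreq / 2)
    slope, _ = np.polyfit(np.log(freqs[keep]), np.log(power[keep]), deg=1)
    return -slope


rng = np.random.RandomState(0)
white = rng.randn(2 ** 14)       # flat spectrum, beta close to 0
brown = np.cumsum(white)         # random walk, beta close to 2

beta_white = spectral_exponent(white)
beta_brown = spectral_exponent(brown)
print(beta_white, beta_brown)
```

A single exponent of this kind only captures self-similarity. Multifractal analysis additionally describes how scaling varies over time, which this sketch does not attempt.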