Presentation
Core statistics and ML-development
Machine learning for inverse problems
From linear inverse problems to simulation based inference
Bi-level optimization
Reinforcement learning for active k-space sampling
Heterogeneous Data & Knowledge Bases
Learning coordinated representations
Probabilistic Knowledge Representation
Statistics and causal inference in high dimension
Conditional inference in high dimension
Post-selection inference on image data
Causal inference for population analysis
Machine Learning on spatio-temporal signals
Injecting structural priors with Physics-informed data augmentation
Learning structural priors with self-supervised learning
Revealing spatio-temporal structures in physiological signals
Application domains
MIND is driven by various applications in data-driven neuroscience, which largely fall within the team members' expertise.
Population modeling, large-scale predictive modeling
Unveiling Cognition Through Population Modelling
Imaging for health in the general population
Proxy measures of brain health
Studying brain age using electrophysiology
Proxy measures of mental health beyond brain aging
Mapping cognition & brain networks
Modeling clinical endpoints
EEG-based modeling of clinical endpoints
MRI-based modeling of clinical endpoints
From brain images and bio-signals to quantitative biology and physics
Activity
Results
New results
Accelerated acquisition in MRI
MRI is a widely used neuroimaging technique that probes brain tissues and their structure, and provides diagnostic insights into the functional organization of the brain as well as the layout of brain vessels. However, MRI relies on an inherently slow imaging process. Reducing acquisition time has been a major challenge in high-resolution MRI, and Compressed Sensing (CS) theory has successfully addressed it. However, most Fourier encoding schemes under-sample existing k-space trajectories, which will never adequately encode all the necessary information. Recently, the Mind team has addressed this crucial issue by proposing the Spreading Projection Algorithm for Rapid K-space sampLING (SPARKLING) for 2D/3D non-Cartesian T2* and susceptibility weighted imaging (SWI) at 3 and 7 Tesla (T) 115, 116, 4. These advances have interesting applications in cognitive and clinical neuroscience, as we have already adapted this approach to high-resolution functional and metabolic (sodium, 23Na) MR imaging at 7T, a very challenging feat 38, 40. Fig. 1 illustrates the SPARKLING application to anatomical, functional and metabolic imaging. Additionally, we have shown that the SPARKLING under-sampling strategy can be used to estimate the static B0 field inhomogeneities internally, which removes the need for additional scans prior to correcting the off-resonance artifacts caused by these inhomogeneities. This finding has been published in 16 and a patent application has been filed in the US (US Patent App. 63/124,911). Ongoing extensions such as Minimized Off Resonance SPARKLING (MORE-SPARKLING) aim to avoid such lengthy processing by introducing a more temporally coherent sampling pattern in k-space, and then correcting these off-resonance effects directly during data acquisition 42.
Accelerated acquisition in MRI using the optimization-driven SPARKLING approach.
Left: Sketch explaining how the iterative SPARKLING algorithm works, alternating a gradient descent step to match a target sampling density in k-space with a projection step onto the hardware constraints (gradient magnitude and slew rate) trajectory-wise or shot-wise.
Right: Numerous applications where SPARKLING has been implemented in real MR pulse sequences: anatomical susceptibility weighted imaging (center) at 3 Tesla, sodium imaging (bottom) and high-resolution functional MRI (right) at 7 Tesla. Pictures of the main team members (current and former PhD students) and external collaborators involved in this project.
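The alternation described in the caption (a density-matching descent step followed by a step enforcing the hardware constraints) can be illustrated with a toy, single-shot 2D sketch. It is not the SPARKLING implementation: the target density is only given through samples, the energy is an attraction/repulsion kernel discrepancy, and the exact projection onto the gradient-amplitude and slew-rate constraints is replaced by a simpler greedy sequential mapping. All function names and parameter values (`alpha`, `beta`, `lr`, the Gaussian target) are hypothetical and expressed in arbitrary k-space units per sample.

```python
import numpy as np


def discrepancy_grad(x, target_samples):
    """Gradient (up to a global 1/N factor) of a kernel discrepancy with kernel -|u - v|:
    attraction toward samples of the target density minus repulsion between shot samples."""
    d_t = x[:, None, :] - target_samples[None, :, :]
    attraction = (d_t / (np.linalg.norm(d_t, axis=-1, keepdims=True) + 1e-12)).mean(axis=1)
    d_x = x[:, None, :] - x[None, :, :]
    repulsion = (d_x / (np.linalg.norm(d_x, axis=-1, keepdims=True) + 1e-12)).mean(axis=1)
    return attraction - repulsion


def clip_norm(v, radius):
    """Euclidean projection of a vector onto the ball of the given radius."""
    n = np.linalg.norm(v)
    return v if n <= radius else v * (radius / n)


def make_feasible(x, alpha, beta):
    """Greedy sequential mapping (NOT an exact projection) enforcing
    |x[k+1] - x[k]| <= alpha (gradient amplitude) and
    |(x[k+1] - x[k]) - (x[k] - x[k-1])| <= beta (slew rate), starting at rest."""
    out = np.empty_like(x)
    out[0] = x[0]
    prev = np.zeros(x.shape[1])
    for k, step in enumerate(np.diff(x, axis=0)):
        step = prev + clip_norm(step - prev, beta)  # bound the change of step (slew rate)
        step = clip_norm(step, alpha)               # bound the step itself (gradient amplitude)
        out[k + 1] = out[k] + step
        prev = step
    return out


def sparkling_like(target_samples, n_samples=64, alpha=0.05, beta=0.01,
                   lr=0.05, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    x = np.cumsum(rng.normal(scale=alpha / 2, size=(n_samples, 2)), axis=0)
    x = make_feasible(x - x[0], alpha, beta)        # start the shot at the k-space center
    for _ in range(n_iter):
        g = discrepancy_grad(x, target_samples)
        g[0] = 0.0                                   # keep the first sample at the center
        x = make_feasible(x - lr * g, alpha, beta)   # descent step, then constraint step
    return x


rng = np.random.default_rng(1)
target = rng.normal(scale=0.15, size=(400, 2))  # hypothetical center-weighted density
traj = sparkling_like(target)
```

The greedy mapping only guarantees feasibility (each step stays within `alpha`, each change of step within `beta`); unlike a true projection, it does not return the closest feasible trajectory, which is why the actual algorithm relies on a dedicated projection step.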
Deep learning for MR image reconstruction and artifact correction
Although CS is used extensively, it suffers from a very slow image reconstruction process, which is detrimental to both patients and rapid diagnosis. To counteract this delay and improve image quality, we use deep learning, as explained in Sec. 3.1. In 2020, we secured second place in the brain fastMRI challenge (1.5 and 3T data) 131 with the XPDNet deep learning architecture (a Primal Dual Network, where X acts as a wildcard). Additionally, we assessed XPDNet's transfer learning capacity on 7T NeuroSpin T2 images. However, this DL reconstruction process was limited to Cartesian encoding, and thus incompatible with our SPARKLING-related technological push. In 2022, we therefore went further by proposing the NC-PDNet deep learning architecture for non-Cartesian imaging. NC-PDNet stands for Non-Cartesian Primal Dual Network and can handle both 2D and 3D non-Cartesian k-space data, such as those collected with the full 3D SPARKLING encoding scheme 6. This progress allowed us to make a significant leap in image quality for high-resolution imaging while maintaining a high acceleration rate (e.g. an 8-fold reduction in scan time). Fig. 2 shows how NC-PDNet outperforms its competitors through an ablation study in 2D spiral and radial imaging and some preliminary results in 3D anatomical
Non-Cartesian Primal Dual network for MR image reconstruction.
Top left: Table comparing the different 2D reconstruction models with respect to image quality metrics (PSNR and SSIM scores) from 4-fold undersampled k-space data. The density-compensated (DCp) adjoint is the extension of the zero-filled inverse Fourier transform to non-Cartesian data, accounting for variable density sampling. DIP stands for the Deep Image Prior model. U-net is a standard convolutional neural network (CNN), combined here with the DCp mechanism.
Bottom left: Box plots showing that the best PSNR (in dB) and SSIM scores are achieved by the 2D NC-PDNet architecture for both Proton Density (PD) weighted images (top row) and Fat Saturated PD images (bottom row), in both spiral and radial imaging.
Top right: Box plot associated with the 3D imaging results, confirming the superiority of NC-PDNet over the same competitors.
Bottom right: Sagittal view of an anatomical
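The density-compensated (DCp) adjoint mentioned in the caption can be sketched with an explicit non-uniform DFT matrix, which is only tractable for tiny images (practical pipelines use NUFFT libraries instead). The radial trajectory, the ramp weights `|k| + 0.5` used as an approximation of the inverse sampling density, and all names are assumptions for illustration; NC-PDNet itself is not reproduced.

```python
import numpy as np


def nudft_matrix(kspace, n):
    """Explicit non-uniform DFT for an n x n image on a centered pixel grid.
    kspace: (M, 2) sample locations in cycles per field of view, within [-n/2, n/2)."""
    grid = np.arange(n) - n // 2
    yy, xx = np.meshgrid(grid, grid, indexing="ij")
    pos = np.stack([yy.ravel(), xx.ravel()], axis=1)   # (n*n, 2) pixel coordinates
    return np.exp(-2j * np.pi * kspace @ pos.T / n)    # (M, n*n)


def radial_trajectory(n_spokes, n_readout, n):
    """Hypothetical radial pattern: densely sampled near the k-space center."""
    angles = np.pi * np.arange(n_spokes) / n_spokes
    radii = np.linspace(-n / 2, n / 2, n_readout, endpoint=False)
    k = np.stack([np.outer(np.cos(angles), radii),
                  np.outer(np.sin(angles), radii)], axis=-1)
    return k.reshape(-1, 2)


def dcp_adjoint(A, y, weights, n):
    """Density-compensated adjoint: weight each sample by (an estimate of) the
    inverse sampling density before applying A^H."""
    return (A.conj().T @ (weights * y)).reshape(n, n)


n = 32
k = radial_trajectory(n_spokes=32, n_readout=64, n=n)
A = nudft_matrix(k, n)                                 # (2048, 1024)
image = np.zeros((n, n))
image[8:24, 12:20] = 1.0                               # toy phantom
y = A @ image.ravel()
recon_plain = (A.conj().T @ y).reshape(n, n)           # no density compensation
recon_dcp = dcp_adjoint(A, y, np.linalg.norm(k, axis=1) + 0.5, n)
```

For radial sampling, the plain adjoint over-weights the densely sampled k-space center and yields a blurred image; the ramp weights partially compensate for it. With uniform Cartesian sampling and unit weights, the same operator reduces (up to a factor n²) to the inverse DFT, i.e. the zero-filled reconstruction.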
Once validated for 3D MR image reconstruction, the NC-PDNet architecture was combined with a physics-driven model to speed up the correction of off-resonance effects induced by the inhomogeneities of the static magnetic field.
Deep learning physics-informed correction of off-resonance artifacts during MR image reconstruction using the NC-PDNet architecture. Illustration on a single SWI volume (high resolution: 600
Left (red): Compressed Sensing (CS) reconstruction with no artifact correction, computed in 25 min.
Center left: CS reconstruction using a reduced non-Fourier forward model to correct for
Middle (purple): NC-PDNet-based reconstruction without physics-based knowledge to correct for
Center right (blue): NC-PDNet-based reconstruction with physics-based knowledge to correct for
Right (green): Classical CS reconstruction with off-resonance artifact correction based on an extended non-Fourier model, which takes 8 hours of computation. The best result is the one in the blue frame.
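The difference between the Fourier model and the extended non-Fourier model mentioned in the caption can be illustrated with the standard off-resonance signal equation, in which each voxel accrues an extra phase proportional to its field offset and to the acquisition time of each sample. The sketch below is a hypothetical, explicit-matrix version for a tiny image, with a toy 10 ms readout and a 50 Hz field offset; it is not the operator used in the reported experiments.

```python
import numpy as np


def extended_forward(image, kspace, times, delta_omega):
    """Signal model with off-resonance (hypothetical explicit-matrix version):
    s(t_m) = sum_r rho(r) * exp(-2i*pi*k_m.r/n) * exp(-i*delta_omega(r)*t_m)

    image, delta_omega: (n, n) arrays, delta_omega in rad/s
    kspace: (M, 2) sample locations in cycles per field of view
    times: (M,) acquisition time of each sample, in seconds
    """
    n = image.shape[0]
    grid = np.arange(n) - n // 2
    yy, xx = np.meshgrid(grid, grid, indexing="ij")
    pos = np.stack([yy.ravel(), xx.ravel()], axis=1)               # (n*n, 2)
    fourier = np.exp(-2j * np.pi * kspace @ pos.T / n)             # Fourier part
    off_res = np.exp(-1j * np.outer(times, delta_omega.ravel()))   # field-induced phase
    return (fourier * off_res) @ image.ravel()


# Toy example: a Cartesian grid traversed during a hypothetical 10 ms readout.
n = 16
g = np.arange(n) - n // 2
k = np.stack(np.meshgrid(g, g, indexing="ij"), axis=-1).reshape(-1, 2).astype(float)
t = np.linspace(0.0, 10e-3, len(k))
rho = np.zeros((n, n))
rho[4:12, 4:12] = 1.0
field = np.zeros((n, n))
field[6:10, 6:10] = 2 * np.pi * 50.0                               # 50 Hz offset
s_fourier = extended_forward(rho, k, t, np.zeros((n, n)))          # plain Fourier model
s_extended = extended_forward(rho, k, t, field)
```

Because the extra phase depends jointly on time and position, the model no longer reduces to a Fourier transform and cannot be applied with a single FFT/NUFFT, which is one reason why classical correction is so costly.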
Neuroimaging Meta-analyses with NeuroLang: Harnessing the Power of Probabilistic Logic Languages
Inferring reliable brain-behavior associations requires synthesizing evidence from thousands of functional neuroimaging studies through meta-analysis. However, existing meta-analysis tools are limited to investigating simple neuroscience concepts and can express only a restricted range of questions. Here, we expand the scope of neuroimaging meta-analysis by designing NeuroLang: a domain-specific language to express and test hypotheses using probabilistic first-order logic programming. This result is a development of our main objective on Probabilistic Knowledge Representation, described in Subsec. 3.2.2. By leveraging formalisms found at the crossroads of artificial intelligence and knowledge representation, NeuroLang provides the expressivity to address a larger repertoire of hypotheses in a meta-analysis, while seamlessly modeling the uncertainty inherent to neuroimaging data. We demonstrate the language's capabilities in conducting comprehensive neuroimaging meta-analyses through use-case examples that address questions of structure-function associations. The schematic and results of this work are shown in Fig. 4.
Specifically, we have produced three main advances. First, we have formally defined and implemented NeuroLang, a scalable query answering system that covers the functional requirements of neuroimaging meta-analyses. This system is described in Zanitti et al. 29. Second, we showed the capabilities of this language by performing a variety of neuroimaging meta-analyses that confirm and challenge current knowledge on the relationship between different brain regions and networks, and cognitive tasks 8. Finally, we have used NeuroLang to shed light on the organization of the lateral prefrontal cortex 9 and, within the context of our project LargeSmallBrainNets (see 8.1.1), on the learning process of children with mathematical disabilities 13.
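NeuroLang expresses meta-analytic questions as probabilistic logic programs. The NumPy sketch below does not use NeuroLang's syntax or API; it only computes by hand the kind of probability that a simple query, such as "how likely is a region to be reported, given that a study mentions a term?", evaluates. It assumes that a study is drawn uniformly at random, that term and activation facts are independent given the study, and that the existential "some reported focus falls in the region" is evaluated as a noisy-OR; all names and numbers are hypothetical.

```python
import numpy as np


def region_reported(p_focus, region_mask):
    """P(at least one reported focus of each study lies in the region).
    p_focus: (n_studies, n_voxels) probability that a study reports each voxel.
    The existential quantifier over voxels becomes a noisy-OR under independence."""
    return 1.0 - np.prod(1.0 - p_focus[:, region_mask], axis=1)


def prob_region_given_term(p_term, p_region):
    """P(region | term) when one study is drawn uniformly at random and the
    term and region facts are independent given that study."""
    joint = np.mean(p_term * p_region)   # P(term and region)
    return joint / np.mean(p_term)       # divided by P(term)


# Hypothetical toy data: 3 studies, 3 voxels, a region made of the first two voxels.
p_focus = np.array([[0.5, 0.5, 0.0],
                    [0.0, 0.0, 1.0],
                    [0.0, 0.0, 0.0]])
region = np.array([True, True, False])
p_term = np.array([1.0, 0.5, 0.0])                 # e.g. term weights mapped to [0, 1]
p_region = region_reported(p_focus, region)        # [0.75, 0.0, 0.0]
answer = prob_region_given_term(p_term, p_region)  # 0.25 / 0.5 = 0.5
```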
Efficient Bilevel optimization solvers
In recent years, bi-level optimization (solving an optimization problem that depends on the solution of another optimization problem) has raised much interest in the machine learning community, in particular for hyper-parameter tuning, meta-learning and dictionary learning. The problem is particularly hard because computing its gradient can be computationally expensive: it requires solving the inner problem as well as a large linear system. Several solvers have recently been proposed to mitigate this cost, in particular by efficiently approximating the gradient. This year, we proposed two approaches that advance the state of the art for such problems. First, we proposed a solver that shares an inverse Hessian estimate between the resolution of the inner problem and that of the linear system, efficiently leveraging the structure of the problem to reduce computations. This result was presented in Ramzi et al. 36. Second, we proposed a stochastic solver for bi-level problems with variance reduction (SABA, see Fig. 5), and showed that this algorithm has the same convergence rate as its single-level counterpart. This algorithm was presented in Dagréou et al. 3 and was selected for an oral presentation (acceptance rate < 5%). These results are prerequisites for scaling bi-level optimization to larger applications, such as those in neuroscience.
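The cost discussed above can be made concrete on a small bi-level problem: tuning the ridge penalty λ of an inner regression problem to minimize a validation loss. The sketch below computes the exact hypergradient with the implicit function theorem, which indeed requires solving the inner problem and then a linear system with the inner Hessian; the solvers described above approximate or share these computations and are not reproduced here. The problem and all names are hypothetical.

```python
import numpy as np


def inner_solution(X, y, lam):
    """Inner problem: w*(lam) = argmin_w 0.5*||Xw - y||^2 + 0.5*lam*||w||^2."""
    H = X.T @ X + lam * np.eye(X.shape[1])   # Hessian of the inner objective
    return np.linalg.solve(H, X.T @ y), H


def outer_loss(w, X_val, y_val):
    """Outer problem: validation loss F evaluated at the inner solution."""
    return 0.5 * np.sum((X_val @ w - y_val) ** 2)


def hypergradient(X, y, X_val, y_val, lam):
    """dF/dlam via the implicit function theorem:
    H dw/dlam = -w  =>  dF/dlam = -grad_w F^T H^{-1} w."""
    w, H = inner_solution(X, y, lam)         # 1) solve the inner problem
    grad_w = X_val.T @ (X_val @ w - y_val)   # gradient of the outer loss w.r.t. w
    v = np.linalg.solve(H, grad_w)           # 2) solve the linear system H v = grad_w
    return -v @ w


rng = np.random.default_rng(0)
X, X_val = rng.normal(size=(50, 10)), rng.normal(size=(30, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.5 * rng.normal(size=50)
y_val = X_val @ w_true + 0.5 * rng.normal(size=30)

lam, eps = 1.0, 1e-6
hg = hypergradient(X, y, X_val, y_val, lam)
# Central finite difference, only used here to check the implicit formula.
fd = (outer_loss(inner_solution(X, y, lam + eps)[0], X_val, y_val)
      - outer_loss(inner_solution(X, y, lam - eps)[0], X_val, y_val)) / (2 * eps)
```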
Left: hyperparameter selection for
Right: data hyper-cleaning on MNIST with
Benchopt: Reproducible, efficient and collaborative optimization benchmarks
Numerical validation is at the core of machine learning research as it allows researchers in this field to assess the actual impact of new methods, and to confirm the agreement between theory and practice. Yet, the rapid development of the field poses several challenges: researchers are confronted with a profusion of methods to compare, limited transparency and consensus on best practices, as well as tedious re-implementation work. As a result, validation is often very partial, which can lead to wrong conclusions that slow down the progress of research. We proposed Benchopt, a collaborative framework to automate, reproduce and publish optimization benchmarks in machine learning across programming languages and hardware architectures (see Fig. 6). Benchopt simplifies benchmarking for the community by providing an off-the-shelf tool for running, sharing and extending experiments. To demonstrate its broad usability, we showcased benchmarks on many standard learning tasks including
Each Solver is run (in parallel) on each Dataset and each variant of the Objective. Results are exported as a CSV file that is easily shared and can be automatically plotted as interactive HTML visualizations or PDF figures.
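The workflow in the caption (every solver run on every dataset and every objective variant, with results exported as CSV) can be mimicked in a few lines of NumPy and standard-library Python. This is not Benchopt's API, which relies on its own classes and command-line interface; the datasets, the ridge objective and the two solvers are hypothetical placeholders, and the runs are sequential rather than parallel.

```python
import csv
import io
import itertools
import time

import numpy as np


def make_datasets(seed=0):
    """Two hypothetical regression datasets: one tall, one wide."""
    rng = np.random.default_rng(seed)
    out = {}
    for name, (n, p) in {"small": (50, 5), "wide": (40, 80)}.items():
        X = rng.normal(size=(n, p))
        out[name] = (X, X @ rng.normal(size=p) + 0.1 * rng.normal(size=n))
    return out


def ridge_objective(X, y, w, lam):
    return 0.5 * np.sum((X @ w - y) ** 2) + 0.5 * lam * np.sum(w ** 2)


def gradient_descent(X, y, lam, n_iter=200):
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 + lam)   # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w -= step * (X.T @ (X @ w - y) + lam * w)
    return w


def closed_form(X, y, lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)


SOLVERS = {"gd": gradient_descent, "closed_form": closed_form}
OBJECTIVE_VARIANTS = [{"lam": 0.1}, {"lam": 10.0}]


def run_benchmark():
    """Run the full (dataset x solver x objective variant) grid, return rows and CSV text."""
    rows = []
    for (d_name, (X, y)), (s_name, solver), params in itertools.product(
            make_datasets().items(), SOLVERS.items(), OBJECTIVE_VARIANTS):
        t0 = time.perf_counter()
        w = solver(X, y, **params)
        rows.append({"dataset": d_name, "solver": s_name, "lam": params["lam"],
                     "time": time.perf_counter() - t0,
                     "objective": ridge_objective(X, y, w, **params)})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return rows, buf.getvalue()
```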
Comprehensive decoding mental processes from Web repositories of functional brain images
Associating brain systems with mental processes requires statistical analysis of brain activity across many cognitive processes. These analyses typically face a difficult compromise between scope (from domain-specific to system-level analysis) and accuracy. Using all the functional Magnetic Resonance Imaging (fMRI) statistical maps of the largest data repository available, we trained machine-learning models that decode the cognitive concepts probed in unseen studies. For this, we leveraged two comprehensive resources: NeuroVault, an open repository of fMRI statistical maps with unconstrained annotations, and Cognitive Atlas, an ontology of cognition. We labeled NeuroVault images with the Cognitive Atlas concepts occurring in their associated metadata. We trained neural networks to predict these cognitive labels on tens of thousands of brain images. Overcoming the heterogeneity, imbalance and noise in the training data, we successfully decoded more than 50 classes of mental processes on a large test set. This success demonstrates that image-based meta-analyses can be undertaken at scale and with minimal manual data curation. It enables broad reverse inferences, that is, drawing conclusions about mental processes from the observed brain activity.
Decoding exactly-matched labels. We evaluated the AUC of the NNoD model on 37 labels matched in the IBC collection, after training it to decode 96 labels across collections. On the top, we show decoding maps for some example terms. Terms that are well decoded, such as "place maintenance", have meaningful maps, whereas terms such as "working memory", whose neural correlates are poorly captured, get low AUC scores. As the decoding maps do not have a meaningful scale, we threshold them arbitrarily at the 95th percentile for visualization. Using pre-trained GCLDA and NeuroSynth models, we compared NNoD results for the labels that also appear in the vocabulary recognized by these models (NNoD AUCs for terms in the vocabulary intersections are shown in light orange). Furthermore, NNoD outperforms the other methods for most labels.
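The evaluation and visualization steps in the caption rely on two simple operations: a per-label ROC AUC, and an arbitrary percentile threshold applied to decoding maps for display. The sketch below computes both with NumPy only (the AUC through its Mann-Whitney formulation); it assumes that a trained decoder has already produced one score per image and label, and the NNoD model itself is not reproduced.

```python
import numpy as np


def roc_auc(scores, labels):
    """AUC as the probability that a random positive is scored higher than a
    random negative (Mann-Whitney U statistic; ties count 1/2)."""
    pos, neg = scores[labels == 1], scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (len(pos) * len(neg))


def per_label_auc(score_matrix, label_matrix):
    """One AUC per label (column) for a multi-label decoder."""
    return np.array([roc_auc(score_matrix[:, j], label_matrix[:, j])
                     for j in range(label_matrix.shape[1])])


def threshold_map(decoding_map, percentile=95):
    """Zero out values below the given percentile, for display purposes only."""
    cut = np.percentile(decoding_map, percentile)
    return np.where(decoding_map >= cut, decoding_map, 0.0)


# Hypothetical decoder outputs for 4 images and 2 labels.
scores = np.array([[0.10, 0.9], [0.40, 0.2], [0.35, 0.8], [0.80, 0.1]])
labels = np.array([[0, 1], [0, 0], [1, 1], [1, 0]])
aucs = per_label_auc(scores, labels)   # [0.75, 1.0]
```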
Notip: Non-parametric True Discovery Proportion control for brain imaging
Cluster-level inference procedures are widely used for brain mapping. These methods compare the size of clusters obtained by thresholding brain maps to an upper bound under the global null hypothesis, computed using Random Field Theory or permutations. However, the guarantees obtained by this type of inference (i.e., that at least one voxel in the cluster is truly activated) are not informative regarding the strength of the signal therein. There is thus a need for methods that assess the amount of signal within clusters; yet such methods have to take into account that clusters are defined from the data, which creates circularity in the inference scheme. This has motivated the use of post hoc estimates that allow statistically valid estimation of the proportion of activated voxels in clusters. In the context of fMRI data, the All-Resolutions Inference framework introduced in 148 provides post hoc estimates of the proportion of activated voxels. However, this method relies on parametric threshold families, which results in conservative inference. In this work, we leverage randomization methods to adapt to data characteristics and obtain tighter false discovery control. The result is Notip: a powerful, non-parametric method that yields statistically valid estimates of the proportion of activated voxels in data-derived clusters. Numerical experiments on 36 fMRI datasets demonstrate substantial power gains compared with state-of-the-art methods. The conditions under which the proposed method brings benefits are also discussed.
Comparison of the number of detections between ARI, calibrated Simes and learned template on fMRI data. Considering the brain activity difference for a pair of functional Magnetic Resonance Imaging contrasts, "look negative cue" vs "look negative rating", we compute the largest possible region whose False Discovery Proportion is controlled, for the three possible templates: All-Resolutions Inference, calibrated Simes template and learned template (our solution). Notice that the number of detections is markedly higher (+77%) using the learned template than with the calibrated Simes template, and almost three times that of the All-Resolutions Inference procedure.
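The bound behind the compared detections can be sketched as follows, under the assumptions of the joint-error-rate (JER) framework that post hoc procedures such as ARI and Notip build on: if, with probability at least 1 - α, no k-th smallest null p-value falls below the k-th template threshold t_k, then the number of false positives in any region S is at most min_k(#{i in S: p_i >= t_k} + k - 1). The learned template is taken here as λ-quantiles of sorted permuted p-values, with λ calibrated on a separate batch of permutations; the permuted p-values are simulated, and this sketch is not the Notip reference implementation.

```python
import numpy as np


def max_false_positives(p_region, thresholds):
    """Upper bound on false positives among a region's p-values, valid on the
    JER event: min_k (#{p >= t_k} + k - 1), capped at the region size."""
    counts = np.array([np.sum(p_region >= t) for t in thresholds])
    k = np.arange(1, len(thresholds) + 1)
    return int(min(len(p_region), np.min(counts + k - 1)))


def tdp_lower_bound(p_region, thresholds):
    """Lower bound on the True Discovery Proportion of the region."""
    return 1.0 - max_false_positives(p_region, thresholds) / len(p_region)


def learned_template(perm_pvals, lam):
    """t_k(lam): lam-quantile, over permutations, of the k-th smallest p-value.
    perm_pvals: (n_permutations, m) p-values obtained under permutations."""
    return np.quantile(np.sort(perm_pvals, axis=1), lam, axis=0)


def calibrate(perm_calib, perm_template, alpha, grid=np.linspace(0.0, 0.5, 501)):
    """Largest lam whose empirical JER on independent permutations is <= alpha
    (None if even the smallest lam of the grid is not admissible)."""
    sorted_calib = np.sort(perm_calib, axis=1)
    best = None
    for lam in grid:
        t = learned_template(perm_template, lam)
        jer = np.mean(np.any(sorted_calib < t, axis=1))
        if jer <= alpha:
            best = lam  # the template grows with lam, so keep the largest admissible value
    return best


# Hypothetical region with two strong and two weak effects, Simes template (alpha = 0.1).
p = np.array([0.001, 0.002, 0.5, 0.9])
simes = 0.1 * np.arange(1, 5) / 4
bound = max_false_positives(p, simes)   # 2
tdp = tdp_lower_bound(p, simes)         # 0.5
```

Larger thresholds t_k reduce the counts #{p_i >= t_k} and hence tighten the bound, as long as the calibration keeps the JER below α; this is the intuition for why a data-adapted template can increase the number of detections.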
Data augmentation for machine learning on EEG
The use of deep learning for electroencephalography (EEG) classification tasks has grown rapidly in recent years, yet its application has been limited by the relatively small size of EEG datasets. Data augmentation, which consists in artificially increasing the size of the dataset during training, can be employed to alleviate this problem. While a few augmentation transformations for EEG data have been proposed in the literature, their positive impact on performance is often evaluated on a single dataset and compared to only one or two competing augmentation methods. In two works published in 2022, we made progress towards their use in EEG research. First, in 28, we evaluated 13 data augmentation approaches through a unified and exhaustive analysis in two application contexts (sleep medicine and BCI systems). We demonstrated that employing adequate data augmentations can bring accuracy improvements of up to 45% in low-data regimes compared to the same model trained without any augmentation. Our experiments also show that there is no single best augmentation strategy, as the good augmentations differ for each task and dataset. This led to our second major contribution on this topic: in 37, 45, we proposed two innovative approaches to automatically learn augmentation policies from data. The AugNet method, published at NeurIPS 2022, is illustrated in Fig. 9. In this model, the parameters of the augmentation policy are learned end-to-end with a supervised task through back-propagation, which reveals the invariances present in the data.
General architecture of AugNet. Input data is copied C times and randomly augmented by the augmentation layers forming the augmentation module. Each copy is then mapped by the trunk model f, whose predictions are averaged by the aggregation module. Parameters of both f and the augmentation layers are learned together from the training set.
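The forward pass of this architecture can be sketched in a few lines of NumPy. In AugNet, the augmentation magnitudes and the trunk are learned jointly by back-propagation; here they are fixed, the two augmentations (random amplitude scaling and circular time shift) and the linear trunk on log-variance features are hypothetical choices, and no training is performed.

```python
import numpy as np


def amplitude_scale(x, magnitude, rng):
    """Multiply each copy by a random factor in [1 - magnitude, 1 + magnitude]."""
    return x * rng.uniform(1 - magnitude, 1 + magnitude, size=(x.shape[0], 1, 1))


def time_shift(x, magnitude, rng):
    """Circularly shift each copy by up to magnitude * n_times samples."""
    max_shift = int(magnitude * x.shape[-1])
    shifts = rng.integers(-max_shift, max_shift + 1, size=x.shape[0])
    return np.stack([np.roll(c, s, axis=-1) for c, s in zip(x, shifts)])


def augnet_forward(x, trunk, layers, magnitudes, n_copies, rng):
    """x: (n_channels, n_times) EEG window. Copy it C times (augmentation module),
    map each augmented copy with the trunk f, then average (aggregation module)."""
    copies = np.repeat(x[None], n_copies, axis=0)
    for layer, mag in zip(layers, magnitudes):
        copies = layer(copies, mag, rng)
    return np.mean([trunk(c) for c in copies], axis=0)


rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # hypothetical 3-class linear trunk on 4 channels


def trunk(c):
    """Softmax of a linear map of per-channel log-variance."""
    z = W @ np.log(np.var(c, axis=-1))
    e = np.exp(z - z.max())
    return e / e.sum()


x = rng.normal(size=(4, 256))
probs = augnet_forward(x, trunk, [amplitude_scale, time_shift], [0.2, 0.1],
                       n_copies=8, rng=rng)
```

With all magnitudes set to zero, every copy is identical and the output reduces to `trunk(x)`; learning the magnitudes amounts to learning how much of each transformation the task is invariant to.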
Language processing in deep neural networks and the human brain
Deep language algorithms, like GPT-2, have demonstrated remarkable abilities to process text, and now constitute the backbone of automatic translation, summarization and dialogue. However, whether and how these models operate in a way similar to the human brain remains controversial. In 12, we showed that the representations of GPT-2 not only map onto the brain responses to spoken stories, but also predict the extent to which subjects understand the corresponding narratives. To this end, we analyzed 101 subjects recorded with functional Magnetic Resonance Imaging while they listened to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2's activations, and showed that this mapping reliably correlates with subjects' comprehension scores, as assessed for each story. Overall, this study, illustrated in Fig. 10, shows how deep language models help clarify the brain computations underlying language comprehension.
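The linear mapping and the resulting "brain score" (the Pearson correlation between predicted and actual activity on held-out data, as defined in the figure caption) can be sketched with a closed-form ridge regression. The sketch assumes zero-mean, already aligned features and voxel time series, a single train/test split and no hemodynamic delays, whereas the study used a spatio-temporal model; the synthetic data and names are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression mapping features X (n, d) to voxels Y (n, v)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)


def pearson_per_column(a, b):
    """Pearson correlation between matching columns of a and b."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    return (a * b).sum(axis=0) / np.sqrt((a ** 2).sum(axis=0) * (b ** 2).sum(axis=0))


def brain_scores(X, Y, alpha=1.0, train_frac=0.75):
    """Fit on the first part of the time series and score on the held-out rest:
    one Pearson correlation per voxel."""
    n_train = int(train_frac * len(X))
    W = fit_ridge(X[:n_train], Y[:n_train], alpha)
    return pearson_per_column(X[n_train:] @ W, Y[n_train:])


# Synthetic stand-ins: 200 time points, 10 model features, 5 voxels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
Y = X @ rng.normal(size=(10, 5)) + 0.1 * rng.normal(size=(200, 5))
scores = brain_scores(X, Y)
```

Averaging these scores over voxels for each (subject, narrative) pair and correlating the averages with the comprehension scores, e.g. with `np.corrcoef`, gives the kind of brain-behavior relationship reported above.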
While this work offers interesting insights into how the brain processes language, it does not address how the brain learns it. Indeed, while several deep neural networks have been shown to generate activations similar to those of the brain in response to the same input, these algorithms remain largely implausible: they require (1) extraordinarily large amounts of data, (2) unobtainable supervised labels, (3) textual rather than raw sensory input, and/or (4) implausibly large memory (e.g. thousands of contextual words). Focusing on speech processing, in 32 we tested whether self-supervised algorithms trained on the raw waveform constitute a promising candidate. Specifically, we compared a recent self-supervised architecture, Wav2Vec 2.0, to the brain activity of 412 English, French, and Mandarin speakers recorded with functional Magnetic Resonance Imaging (fMRI) while they listened to 1 h of audiobooks. With this work, we showed that, first, this algorithm learns brain-like representations with as little as 600 hours of unlabelled speech, a quantity comparable to what infants can be exposed to during language acquisition. Second, its functional hierarchy aligns with the cortical hierarchy of speech processing. Third, different training regimes reveal a functional specialization akin to that of the cortex: Wav2Vec 2.0 learns sound-generic, speech-specific and language-specific representations similar to those of the prefrontal and temporal cortices. These elements, resulting from the largest neuroimaging benchmark to date, show how self-supervised learning can account for a rich organization of speech processing in the brain.
Deep language algorithms predict semantic comprehension from brain activity.
A. 101 subjects listen to narratives (70 min of unique audio stimulus in total) while their brain signal is recorded using functional MRI. At the end of each story, a questionnaire is submitted to each subject to assess their understanding, and the answers are summarized into a comprehension score specific to each (narrative, subject) pair (grey box). In parallel (blue box on the left), we measure the mapping between the subject's brain activations and the activations of GPT2, a deep network trained to predict a word given its past context, both elicited by the same narrative. To this end, a linear spatio-temporal model is fitted to predict the brain activity of each voxel, given GPT2 activations as input. The degree of mapping, called "brain score", is defined for each voxel as the Pearson correlation between predicted and actual brain activity on held-out data.
B. Brain scores (fMRI predictability) of the activations of the eighth layer of GPT2. Only significant regions are displayed.
C. Brain scores, averaged across fMRI voxels, for different activation spaces: phonological features (word rate, phoneme rate, phonemes, tone and stress, in green), the non-contextualized word embedding of GPT2 ("Word", light blue) and the activations of the contextualized layers of GPT2 (from layer one to layer twelve, in blue).
D. Comprehension and GPT2 brain scores, averaged across voxels, for each (subject, narrative) pair. In red, Pearson's correlation between the two (denoted
E. Correlations (
F. Correlation scores (
G. Relationship between the average GPT2-to-brain mapping (eighth layer) per region of interest (similar to B.), and the corresponding correlation with comprehension (
Overall objectives
The Mind team, which originates from the Parietal team, is uniquely equipped to impact the fields of statistical machine learning and artificial intelligence (AI) in the service of understanding brain structure and function, in both healthy and pathological conditions.
AI, with recent progress in statistical machine learning (ML), is currently poised to revolutionize how experimental science is conducted, by using data as the driver of new theoretical insights and scientific hypotheses. Supervised learning and predictive models are then used to assess predictability. We thus face challenging questions such as "Can cognitive operations be predicted from neural signals?" or "Can the use of anesthesia be a causal predictor of later cognitive decline or impairment?"
To study brain structure and function, cognitive and clinical neuroscientists have access to various neuroimaging techniques. The Mind team specifically relies on non-invasive modalities: on the one hand, magnetic resonance imaging (MRI) at ultra-high magnetic field, to reach high spatial resolution, and, on the other hand, electroencephalography (EEG) and magnetoencephalography (MEG), which record the electric and magnetic activity of neural populations and make it possible to follow brain activity in real time. Extracting new neuroscientific knowledge from such neuroimaging data, however, raises a number of methodological challenges, in particular in inverse problems, statistics and computer science. The Mind project aims to develop the theory and software technology to study the brain, from cognitive to clinical endpoints, using cutting-edge MRI (functional MRI, diffusion weighted MRI) and MEG/EEG data. To uncover the most valuable information from such data, we need to solve a wide range of inverse problems using a hybrid approach in which machine or deep learning is combined with physics-informed constraints.
Once functional imaging data are collected, the challenge of statistical analysis becomes apparent. Beyond the standard questions (Where, when and how can statistically significant neural activity be identified?), Mind is particularly interested in identifying the driving effect, or cause, of such activity in a given cortical region. Answering these basic questions with computer programs requires developing methodologies built on the latest research on causality, knowledge bases and high-dimensional statistics.
The field of neuroscience is now embracing more open science standards and community efforts to address the so-called "replication crisis", as well as the growing complexity of data analysis pipelines in neuroimaging. The Mind team is ideally positioned to address these issues from both angles, by providing reliable statistical inference schemes as well as open-source software compliant with international standards.
The impact of Mind will be driven by the data analysis challenges in neuroscience, but also by the fundamental discoveries in neuroscience that presently inspire the development of novel AI algorithms. The Parietal team has proved in the past that this scientific positioning leads to impactful research. Hence, the newly created Mind team, formed by computer scientists and statisticians with a deep understanding of the field of neuroscience, from data acquisition to clinical needs, offers a unique opportunity to explore uncharted territories more fully.