Controlling a confound in predictive models
Predictive models applied to brain images can extract imaging biomarkers of pathologies or psychological traits.
However, a successful prediction may be driven by a confounding effect that is correlated with the effect of interest.
For instance:
- fluid intelligence is strongly impacted by age
- age is well predicted from brain images
- hence successful prediction of fluid intelligence from brain images might have captured nothing more than a biomarker of aging.
We introduce a non-parametric approach to control for a confounding effect in a predictive model. It is based on crafting a test set on which the effect of interest is independent of the confounding effect.
We name this strategy “anti mutual-information subsampling”.
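As a rough illustration of what building such a test set can look like, the sketch below greedily discards subjects until the target and the confound are close to independent in the remaining subset. It is not the authors' algorithm. The histogram-based mutual information estimate, the greedy removal rule, the function names (`binned_mutual_information`, `subsample_low_dependence`), the parameters `n_bins` and `step`, and the synthetic age and score data are all assumptions made for illustration.

```python
import numpy as np


def binned_mutual_information(y_bins, z_bins, n_bins):
    """Histogram estimate of the mutual information (in nats) and per-cell PMI."""
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (y_bins, z_bins), 1)           # count subjects per cell
    joint /= joint.sum()
    p_y = joint.sum(axis=1, keepdims=True)          # marginal of the target
    p_z = joint.sum(axis=0, keepdims=True)          # marginal of the confound
    independent = p_y * p_z
    occupied = joint > 0
    pmi = np.zeros_like(joint)                      # pointwise mutual information
    pmi[occupied] = np.log(joint[occupied] / independent[occupied])
    return np.sum(joint * pmi), pmi


def quantile_bins(values, n_bins):
    """Assign each value to one of n_bins quantile bins (labels 0..n_bins-1)."""
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)


def subsample_low_dependence(y, z, n_keep, n_bins=4, step=10):
    """Greedy sketch: repeatedly drop the subjects lying in the cells where
    y and z are most over-represented together (highest PMI)."""
    y_bins, z_bins = quantile_bins(y, n_bins), quantile_bins(z, n_bins)
    keep = np.arange(len(y))
    while len(keep) > n_keep:
        _, pmi = binned_mutual_information(y_bins[keep], z_bins[keep], n_bins)
        sample_pmi = pmi[y_bins[keep], z_bins[keep]]
        n_drop = min(step, len(keep) - n_keep)
        keep = np.delete(keep, np.argsort(sample_pmi)[-n_drop:])
    return keep


# Synthetic data (assumption): a score that decreases with age, plus noise.
rng = np.random.default_rng(0)
n_subjects = 600
age = rng.uniform(18, 88, size=n_subjects)
age_std = (age - age.mean()) / age.std()
score = -0.6 * age_std + 0.8 * rng.normal(size=n_subjects)

keep = subsample_low_dependence(score, age, n_keep=300)

n_bins = 4
mi_full, _ = binned_mutual_information(
    quantile_bins(score, n_bins), quantile_bins(age, n_bins), n_bins)
mi_sub, _ = binned_mutual_information(
    quantile_bins(score, n_bins)[keep], quantile_bins(age, n_bins)[keep], n_bins)
print(f"mutual information, all subjects: {mi_full:.3f} nats")
print(f"mutual information, subsample:    {mi_sub:.3f} nats")
```

In this toy setting the subset indexed by `keep` would serve as the test set, and a model trained on the remaining subjects would be scored on it. Because the score and age are nearly independent there, a model that only tracks age should score close to chance.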
We demonstrate the approach on a large sample of resting-state fMRI and psychometric data from healthy aging subjects (n = 608).
We show that using a linear model to remove the effect of age from the brain signals (“deconfounding”) leads to pessimistic scores, as previously reported. Anti mutual-information subsampling does not require removing from the brain signals the variance shared between aging and fluid intelligence, and hence does not display this pessimistic behavior.
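For comparison, linear deconfounding can be sketched as an ordinary least-squares regression of each brain feature on age, followed by subtraction of the fitted values. The snippet below is a minimal sketch on synthetic data, not the pipeline used in the paper. The feature matrix `X`, the train/test split, and the helper `design` are hypothetical, and fitting the regression on the training subjects only is an assumption of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_subjects, n_features = 200, 5

# Synthetic data (assumption): every feature is partly linear in age.
age = rng.uniform(18, 88, size=n_subjects)
age_std = (age - age.mean()) / age.std()
loadings = rng.normal(size=n_features)
X = np.outer(age_std, loadings) + rng.normal(size=(n_subjects, n_features))

train = np.arange(0, 150)
test = np.arange(150, n_subjects)


def design(z):
    """Design matrix with an intercept column and the confound."""
    return np.column_stack([np.ones_like(z), z])


# Fit the confound model on the training subjects only.
coef, *_ = np.linalg.lstsq(design(age[train]), X[train], rcond=None)

# Subtract the age-predicted part from both sets.
X_train_clean = X[train] - design(age[train]) @ coef
X_test_clean = X[test] - design(age[test]) @ coef

# On the training set, the cleaned features are uncorrelated with age.
residual_corr = np.array([
    np.corrcoef(age[train], X_train_clean[:, j])[0, 1]
    for j in range(n_features)
])
print(np.abs(residual_corr).max())
```

The subtraction removes every component of the signal that is linear in age, including any part that also carries information about fluid intelligence.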
References
- Darya Chyzhyk, Gaël Varoquaux, Bertrand Thirion, Michael Milham. Controlling a confound in predictive models with a test set minimizing its effect. PRNI 2018 – 8th International Workshop on Pattern Recognition in Neuroimaging, Jun 2018, Singapore, Singapore. pp.1-4. The paper is available at HAL