Effect of data harmonization of multicentric dataset in ASD/TD classification
Abstract
Machine Learning (ML) is now an essential tool in the analysis of Magnetic Resonance Imaging (MRI) data, in particular for identifying brain correlates of neurological and neurodevelopmental disorders. ML requires training datasets of appropriate size, which in neuroimaging are typically obtained by collecting data from multiple acquisition centers. However, analyzing large multicentric datasets can introduce bias due to differences between acquisition centers. ComBat harmonization is commonly used to address these batch effects, but it can lead to data leakage when the entire dataset is used to estimate the harmonization model parameters. In this study, structural and functional MRI data from the Autism Brain Imaging Data Exchange (ABIDE) collection were used to classify subjects with Autism Spectrum Disorders (ASD) versus Typically Developing controls (TD). We compared three approaches: the classical approach (external harmonization), in which harmonization is performed before the train/test split; a harmonization estimated only on the train set (internal harmonization); and no harmonization. Harmonization using the whole dataset achieved higher discrimination performance, while non-harmonized data and harmonization using only the train set showed similar results, for both structural and connectivity features. We also showed that the higher performance of external harmonization is not due to the larger sample available for estimating the harmonization model; hence, the improved performance obtained with the entire dataset may be ascribed to data leakage. To prevent this leakage, we recommend defining the harmonization model using the train set only.
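The difference between internal and external harmonization can be illustrated with a minimal sketch. This is not ComBat and not the authors' pipeline: it uses a simplified per-site location/scale adjustment as a stand-in, on synthetic data, and the function names `fit_site_harmonizer` and `apply_site_harmonizer` are hypothetical. The point it shows is only the order of operations: the harmonization parameters are estimated on the train set and then applied, unchanged, to the test set.

```python
import numpy as np


def fit_site_harmonizer(X, sites):
    """Estimate pooled and per-site mean/std from the given samples only.

    Simplified stand-in for a harmonization model (NOT ComBat).
    """
    X = np.asarray(X, dtype=float)
    sites = np.asarray(sites)
    params = {
        "grand_mean": X.mean(axis=0),
        "grand_std": X.std(axis=0) + 1e-12,  # avoid division by zero
        "site": {},
    }
    for s in np.unique(sites):
        Xs = X[sites == s]
        params["site"][s] = (Xs.mean(axis=0), Xs.std(axis=0) + 1e-12)
    return params


def apply_site_harmonizer(X, sites, params):
    """Map each site's features onto the pooled mean/std using fitted params."""
    X = np.asarray(X, dtype=float)
    sites = np.asarray(sites)
    out = np.empty_like(X)
    for s in np.unique(sites):
        if s not in params["site"]:
            raise KeyError(f"site {s!r} was not seen when fitting")
        mean_s, std_s = params["site"][s]
        mask = sites == s
        out[mask] = (X[mask] - mean_s) / std_s * params["grand_std"] + params["grand_mean"]
    return out


# Synthetic multicentric data: three sites with different additive offsets.
rng = np.random.default_rng(0)
n_per_site, n_features = 50, 4
sites = np.repeat([0, 1, 2], n_per_site)
offsets = np.array([0.0, 2.0, -1.5])
X = rng.normal(size=(sites.size, n_features)) + offsets[sites][:, None]

# Train/test split performed BEFORE any harmonization parameters are estimated.
idx = rng.permutation(sites.size)
train_idx, test_idx = idx[:105], idx[105:]

# Internal harmonization: fit on the train set only, then apply to both sets.
params = fit_site_harmonizer(X[train_idx], sites[train_idx])
X_train_h = apply_site_harmonizer(X[train_idx], sites[train_idx], params)
X_test_h = apply_site_harmonizer(X[test_idx], sites[test_idx], params)
```

External harmonization would instead call `fit_site_harmonizer(X, sites)` on the full dataset, so the test samples would influence the parameters later used to transform them; that is the leakage path the abstract describes.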
Research Square Platform LLC