Valid inference for machine learning-assisted GWAS
Abstract
Machine learning (ML) has revolutionized analytical strategies in almost all scientific disciplines, including human genetics and genomics. Because of challenges in sample collection and precise phenotyping, the ML-assisted genome-wide association study (GWAS), which uses sophisticated ML to impute phenotypes and then performs GWAS on the imputed outcomes, has quickly gained popularity in complex trait genetics research. However, the validity of associations identified from ML-assisted GWAS has not been carefully evaluated. In this study, we report pervasive risks of false-positive associations in ML-assisted GWAS and introduce POP-GWAS, a novel statistical framework that reimagines GWAS on ML-imputed outcomes. POP-GWAS provides valid statistical inference irrespective of the quality of imputation or of the variables and algorithms used for imputation. It requires only GWAS summary statistics as input. We employed POP-GWAS to perform the largest GWAS of bone mineral density (BMD) derived from dual-energy X-ray absorptiometry imaging at 14 skeletal sites, identifying 89 novel loci reaching genome-wide significance and revealing skeletal site-specific genetic architecture of BMD. Our framework may fundamentally reshape the analytical strategies in future ML-assisted GWAS.
Cold Spring Harbor Laboratory
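The abstract does not give the POP-GWAS formulas, so the following is only a minimal sketch of one generic way to combine GWAS summary statistics on observed and ML-imputed phenotypes: a control-variate (prediction-powered style) estimator for a single variant. It should not be read as the authors' implementation. The function name, its parameters, and the assumptions that the labeled and unlabeled samples are independent and that `r` (the correlation between observed and imputed phenotype in the labeled sample) approximates the correlation between the two labeled-sample effect estimates are all hypothetical choices made for illustration.

```python
import math


def combine_summary_stats(beta_y_lab, se_y_lab,
                          beta_yhat_lab, se_yhat_lab,
                          beta_yhat_unlab, se_yhat_unlab,
                          r):
    """Illustrative control-variate combination for one variant.

    Inputs are per-variant GWAS effect sizes and standard errors for:
      - the observed phenotype in the labeled sample (beta_y_lab),
      - the imputed phenotype in the labeled sample (beta_yhat_lab),
      - the imputed phenotype in the unlabeled sample (beta_yhat_unlab).
    r is the ASSUMED correlation between the two labeled-sample estimates.
    This is a sketch, not the POP-GWAS implementation.
    """
    # Assumed covariance between the two estimates from the labeled sample.
    cov = r * se_y_lab * se_yhat_lab
    # Variance of (beta_yhat_lab - beta_yhat_unlab), samples assumed independent.
    var_diff = se_yhat_lab ** 2 + se_yhat_unlab ** 2
    # Weight that minimizes the variance of the combined estimator.
    w = cov / var_diff
    # The correction term has mean zero whatever the imputation quality,
    # so the combined estimate still targets the observed-phenotype effect.
    beta = beta_y_lab - w * (beta_yhat_lab - beta_yhat_unlab)
    se = math.sqrt(se_y_lab ** 2 - cov ** 2 / var_diff)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return beta, se, p


# Usage with made-up numbers (not results from the paper).
beta, se, p = combine_summary_stats(0.10, 0.05, 0.08, 0.05, 0.09, 0.01, r=0.8)
print(f"beta={beta:.4f} se={se:.4f} p={p:.3g}")
```

In this sketch, a poor imputation (`r` near 0) drives the weight to zero and the result falls back to the labeled-sample GWAS, while a good imputation shrinks the standard error. That is one way to see how inference could stay valid regardless of imputation quality.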

