Applying gradient tree boosting to QTL mapping with Shapley additive explanations
Abstract
Mapping quantitative trait loci (QTLs) is one of the major goals of quantitative genetics; however, identifying the interactions between QTLs (i.e., epistasis) remains challenging. Recently developed machine learning methods, such as deep learning and gradient boosting, are being widely applied to real-world problems. These methods could advance QTL mapping methodologies because of their high capability for capturing complex relationships among features. One problem with applying such complex models to QTL mapping is the evaluation of feature importance. In this study, XGBoost, a popular gradient tree boosting algorithm, was applied to QTL mapping in biparental populations together with Shapley additive explanations (SHAP). SHAP is a local (i.e., instance-wise) importance index with properties that are desirable in a feature importance index. The SHAP-assisted XGBoost (SHAP-XGB) was compared with conventional methods, including composite interval mapping (CIM), multiple interval mapping (MIM), inclusive CIM (ICIM), and BayesC, using simulations and rice heading date data. SHAP-XGB performed comparably to CIM, MIM, ICIM, and BayesC in mapping main QTL effects and was superior to MIM, ICIM, and BayesC in mapping QTL interaction effects. Because SHAP evaluates local importance, interactions between markers can be visualized by plotting SHAP interaction values for each instance (plant/line). These results illustrate the strength of SHAP-XGB in detecting and interpreting epistatic QTLs and suggest that SHAP-XGB can complement conventional methods.
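To make the notions of a local importance index and of SHAP interaction values concrete, the following is a minimal sketch that computes exact Shapley values and pairwise SHAP interaction values by brute-force enumeration. It is not the study's implementation (which uses XGBoost): the three-marker phenotype function, the effect sizes, the 0/1 genotype coding, and the uniform background population are all hypothetical and chosen only for illustration.

```python
from itertools import combinations, product
from math import factorial

N_MARKERS = 3

def phenotype(g):
    """Hypothetical toy model: marker 0 has a main effect,
    markers 1 and 2 act only through their interaction (epistasis)."""
    return 2.0 * g[0] + 3.0 * g[1] * g[2]

# Assumed background population: every 0/1 genotype equally likely.
BACKGROUND = list(product([0, 1], repeat=N_MARKERS))

def coalition_value(x, subset):
    """Expected phenotype when the markers in `subset` are fixed to the
    values of instance x and the rest are drawn from the background."""
    total = 0.0
    for b in BACKGROUND:
        g = tuple(x[k] if k in subset else b[k] for k in range(N_MARKERS))
        total += phenotype(g)
    return total / len(BACKGROUND)

def shapley_value(x, i):
    """Exact Shapley value of marker i for one instance (plant/line) x."""
    others = [k for k in range(N_MARKERS) if k != i]
    phi = 0.0
    for size in range(len(others) + 1):
        weight = (factorial(size) * factorial(N_MARKERS - size - 1)
                  / factorial(N_MARKERS))
        for s in combinations(others, size):
            s = set(s)
            phi += weight * (coalition_value(x, s | {i})
                             - coalition_value(x, s))
    return phi

def shap_interaction(x, i, j):
    """Pairwise SHAP interaction value for markers i != j. The factor 2 in
    the denominator splits the interaction evenly between (i, j) and (j, i)."""
    others = [k for k in range(N_MARKERS) if k not in (i, j)]
    total = 0.0
    for size in range(len(others) + 1):
        weight = (factorial(size) * factorial(N_MARKERS - size - 2)
                  / (2 * factorial(N_MARKERS - 1)))
        for s in combinations(others, size):
            s = set(s)
            delta = (coalition_value(x, s | {i, j})
                     - coalition_value(x, s | {i})
                     - coalition_value(x, s | {j})
                     + coalition_value(x, s))
            total += weight * delta
    return total

if __name__ == "__main__":
    x = (1, 1, 1)  # one instance carrying allele 1 at all three markers
    print("Shapley values:", [shapley_value(x, i) for i in range(N_MARKERS)])
    print("Interaction (1, 2):", shap_interaction(x, 1, 2))
    print("Interaction (0, 1):", shap_interaction(x, 0, 1))
```

For the instance (1, 1, 1), the three Shapley values sum to the difference between that instance's phenotype and the background mean, which is the additivity property that makes the index interpretable per plant or line. The interaction value is nonzero only for the epistatic pair (1, 2). Enumeration costs grow exponentially with the number of markers, so this approach is useful only for understanding the definitions, not for real marker data.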
Related Results
Development of doubled haploid population and QTL mapping for Fusarium stalk rot (FSR) resistance in tropical maize
Abstract
Fusarium stalk rot disease (FSR) caused by Fusarium verticillioides is emerging as the major production constraint in maize across the world. As a prelude to develo...
Mapping of QTL for resistance to fusarium stalk rot (FSR) in tropical maize (Zea mays L.)
Fusarium stalk rot disease (FSR) caused by Fusarium verticillioides is emerging as the major production constraint in maize across the world. As a prelude to developing maize hybrids ...
QTL and Candidate Genes: Techniques and Advancement in Abiotic Stress Resistance Breeding of Major Cereals
At least 75% of the world’s grain production comes from the three most important cereal crops: rice (Oryza sativa), wheat (Triticum aestivum), and maize (Zea mays). However, abioti...
Costly Resistance to Parasitism
Abstract
Information on the molecular basis of resistance and the evolution of resistance is crucial to an understanding of the appearance, spread, and distribution ...
A Theory of Heterosis
Abstract
Heterosis refers to the superior performance of a hybrid over its parents. It is the basis for hybrid breeding, particularly for maize and rice. Genetically it is due to int...
Detection of Quantitative Trait Loci (QTL) associated with the spring regrowth vigor trait in alfalfa (Medicago sativa L.)
Abstract
Background: Alfalfa (Medicago sativa L.) is a perennial forage legume with a reputation as being the “queen of forage”. Spring regrowth vigor refers to the proces...
Supplementary Python Jupyter and R/qtl Notebooks
Part 2: Supplementary Python Jupyter and R/qtl Notebooks is the essential companion to the main volume QTL Mapping with Python and R/qtl: A Reproducible Pipeline for Crop Genetics....

