CALDERA: Finding all significant de Bruijn subgraphs for bacterial GWAS
Abstract
Genome-wide association studies (GWAS), which aim to find genetic variants associated with a trait, have been widely used in bacteria to identify genetic determinants of drug resistance or hypervirulence. Recent bacterial GWAS methods usually rely on k-mers, whose presence in a genome can denote variants ranging from single nucleotide polymorphisms to mobile genetic elements. Since many bacterial species include genes that are not shared among all strains, this approach avoids relying on a common reference genome. However, the same gene can exist in slightly different versions across strains, which dilutes its effect when testing its association with a phenotype through k-mer-based GWAS. Here we propose to overcome this by testing covariates built from closed connected subgraphs of the de Bruijn graph (DBG) defined over genomic k-mers. These covariates can capture polymorphic genes as a single entity, improving k-mer-based GWAS in terms of power and interpretability. As the number of subgraphs is exponential in the number of nodes in the DBG, a method naively testing all possible subgraphs would have very low statistical power due to multiple testing correction, and merely exploring these subgraphs would quickly become computationally intractable. The concept of testable hypotheses has been used successfully to address both problems in similar contexts. We leverage this concept to test all closed connected subgraphs, proposing a novel enumeration scheme for these objects that fully exploits the pruning opportunity offered by testability and drastically improves computational efficiency. We illustrate this on both real and simulated datasets and demonstrate how considering subgraphs leads to a more powerful and interpretable method. Our method integrates with existing visual tools to facilitate interpretation. We also provide an implementation of our method, as well as code to reproduce all results, at https://github.com/HectorRDB/Caldera_Recomb.
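To make the objects in the abstract concrete, here is a minimal Python sketch (not taken from the CALDERA implementation) that builds a node-centric de Bruijn graph over the k-mers of a few toy genomes, checks that a set of nodes is connected, and turns it into a per-genome covariate. Several simplifications are assumptions for illustration only: k-mers are not canonicalized (reverse complements are ignored), a genome is counted as carrying a subgraph if it contains at least one of the subgraph's k-mers, and the closure condition and the enumeration scheme are not implemented. The sequences and the helper names `kmers`, `de_bruijn_graph`, `is_connected`, and `subgraph_covariate` are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Return the distinct k-mers of seq (no reverse-complement canonicalization)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def de_bruijn_graph(genomes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Node-centric DBG: one node per k-mer seen in any genome, and an edge u -> v
    whenever the (k-1)-suffix of u equals the (k-1)-prefix of v."""
    nodes = set().union(*(kmers(seq, k) for seq in genomes.values()))
    by_prefix: dict[str, set[str]] = defaultdict(set)
    for v in nodes:
        by_prefix[v[:-1]].add(v)
    return {u: set(by_prefix.get(u[1:], ())) for u in nodes}


def is_connected(graph: dict[str, set[str]], subgraph: set[str]) -> bool:
    """True if the nodes in `subgraph` induce a weakly connected subgraph."""
    if not subgraph:
        return False
    undirected: dict[str, set[str]] = defaultdict(set)
    for u in subgraph:
        for v in graph.get(u, set()) & subgraph:
            undirected[u].add(v)
            undirected[v].add(u)
    start = next(iter(subgraph))
    seen, stack = {start}, [start]
    while stack:  # depth-first traversal ignoring edge directions
        for v in undirected[stack.pop()] - seen:
            seen.add(v)
            stack.append(v)
    return seen == subgraph


def subgraph_covariate(genomes: dict[str, str], k: int, subgraph: set[str]) -> dict[str, int]:
    """Assumed aggregation: a genome carries the subgraph if it has any of its k-mers."""
    return {name: int(bool(kmers(seq, k) & subgraph)) for name, seq in genomes.items()}


# Toy data (hypothetical): g1 and g2 carry two alleles of one gene differing by a SNP.
genomes = {"g1": "ACGTACGA", "g2": "ACGTTCGA", "g3": "GGGGCCCC"}
k = 3
dbg = de_bruijn_graph(genomes, k)
# The SNP bubble: shared node CGT plus the allele-specific paths GTA-TAC and GTT-TTC-TCG.
bubble = {"CGT", "GTA", "TAC", "GTT", "TTC", "TCG"}
print(is_connected(dbg, bubble))                # True
print(subgraph_covariate(genomes, k, {"GTA"}))  # {'g1': 1, 'g2': 0, 'g3': 0}
print(subgraph_covariate(genomes, k, bubble))   # {'g1': 1, 'g2': 1, 'g3': 0}
```

In this toy case the single k-mer GTA separates the two alleles, whereas the connected bubble marks both carriers with one covariate, which illustrates how a subgraph can treat a polymorphic gene as a single entity.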
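The "testable hypothesis" idea mentioned in the abstract is commonly attributed to Tarone's modified Bonferroni procedure for discrete tests. For a binary phenotype tested with Fisher's exact test, the smallest p-value a covariate can ever reach depends only on its support (how many genomes carry it) and on the numbers of cases and controls, so covariates that can never be significant can be left out of the correction. The sketch below is the textbook procedure under those assumptions (two-sided Fisher test, family-wise error rate control), not CALDERA's implementation; the function names and toy supports are hypothetical.

```python
import math


def min_attainable_pvalue(x: int, n: int, n_cases: int) -> float:
    """Smallest two-sided Fisher exact p-value reachable by any 2x2 table with
    n samples, n_cases cases, and a covariate carried by x samples.

    Uses exact integer hypergeometric weights C(n_cases, a) * C(n_controls, x - a),
    where a is the number of cases carrying the covariate.
    """
    n_controls = n - n_cases
    lo, hi = max(0, x - n_controls), min(x, n_cases)
    weights = [math.comb(n_cases, a) * math.comb(n_controls, x - a) for a in range(lo, hi + 1)]
    w_min = min(weights)
    # A table's two-sided p-value sums all tables at most as probable as it,
    # so the least probable table attains the minimum.
    return sum(w for w in weights if w <= w_min) / math.comb(n, x)


def tarone_threshold(supports: list[int], n: int, n_cases: int, alpha: float = 0.05):
    """Tarone's procedure: find the smallest k such that at most k hypotheses have a
    minimal attainable p-value <= alpha / k; return (k, alpha / k)."""
    psi = [min_attainable_pvalue(x, n, n_cases) for x in supports]
    k = 1
    while sum(p <= alpha / k for p in psi) > k:
        k += 1
    return k, alpha / k


n, n_cases = 20, 10
print(min_attainable_pvalue(1, n, n_cases))  # 1.0: a singleton covariate can never be significant
supports = [1, 2, 10, 10, 3, 19]
k, threshold = tarone_threshold(supports, n, n_cases)
print(k, threshold)  # 2 0.025
```

Only the two covariates with support 10 can reach a p-value below 0.05 / 2, so the corrected threshold is 0.025 instead of the Bonferroni value 0.05 / 6 (about 0.0083). When covariates are built by OR-aggregating k-mers, adding a node can only enlarge a subgraph's support, which is the kind of monotonicity that testability-based pruning exploits; this sketch does not implement any enumeration or pruning.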

