Pangenome References Improve Biomarker Estimation from Tumor Sequencing Data
Abstract
It has recently been shown that patients of non-European ancestry are at higher risk of inappropriate clinical intervention because of inaccurate biomarker estimation, which arises from the reference bias inherent in standard methods for determining the tumor genome from sequencing data. Here we demonstrate that these inaccuracies can be reduced by using a pangenome reference appropriate for the patient’s population.
We constructed a novel secondary analysis workflow in which the pangenome reference serves as a scaffold for mapping the sequencing reads and is also included in the ‘panel of normals’ needed to discriminate between germline and somatic mutations in tumor-only sequencing. On a standard benchmark tumor sample, HCC1395, this approach detects known somatic mutations in tumor-only sequencing more accurately than the standard GATK somatic calling workflow, which is prevalent in diagnostic settings for the analysis of tumor-only assays (33% relative increase in F1 score).
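As a reading aid for the "relative increase in F1 score" quoted above, the following minimal sketch shows how an F1 score is computed from true-positive, false-positive and false-negative call counts, and how a relative increase between two workflows would be derived. The function names and all counts are hypothetical illustrations and are not taken from the study.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def relative_increase(new: float, baseline: float) -> float:
    """Fractional change of `new` relative to `baseline` (0.25 means 25% higher)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical call counts against a truth set (NOT the HCC1395 results).
baseline_f1 = f1_score(tp=60, fp=50, fn=40)    # stand-in for a baseline workflow
candidate_f1 = f1_score(tp=75, fp=25, fn=25)   # stand-in for an alternative workflow

print(f"baseline F1  = {baseline_f1:.3f}")
print(f"candidate F1 = {candidate_f1:.3f}")
print(f"relative increase = {relative_increase(candidate_f1, baseline_f1):.1%}")
```

Note that a relative increase is expressed as a fraction of the baseline score, so it is distinct from an absolute difference in F1 points.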
We also assessed the expected clinical impact of our approach by comparing the Tumor Mutational Burden (TMB) calculated from missense somatic mutations called in tumor/normal samples from patients self-reported as belonging to African (6), Asian (1) and European (3) populations. We find that the TMB values calculated from tumor-only sequencing data analyzed by our workflow more closely approximate the TMB values calculated from the tumor-normal analysis of the same sample, being 35% higher on average, whereas GATK tumor-only analysis generates TMB values 56% higher on average than the tumor-normal analysis of the same sample. Tumor-normal TMB values calculated by the two methods do not vary as drastically, with GATK-generated values being 13% higher on average. This indicates that GATK tumor-only analysis leads to significant overestimation of TMB values, which can be largely corrected by using our workflow when tumor-normal sequencing is not available.
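The TMB comparison above can be illustrated with a short sketch. It assumes the common definition of TMB as the number of qualifying (here, missense) somatic mutations per megabase of sequenced region, and it assumes "higher on average" means the mean of per-sample percent differences; the abstract does not state either detail, so both are assumptions. All names and numbers below are hypothetical and are not data from the study.

```python
from statistics import mean


def tmb(missense_count: int, covered_mb: float) -> float:
    """Mutations per megabase; `covered_mb` is the assumed size of the sequenced region."""
    if covered_mb <= 0:
        raise ValueError("covered_mb must be positive")
    return missense_count / covered_mb


def mean_percent_excess(tumor_only: list[float], tumor_normal: list[float]) -> float:
    """Mean over samples of 100 * (tumor_only - tumor_normal) / tumor_normal."""
    if len(tumor_only) != len(tumor_normal):
        raise ValueError("both lists must describe the same samples")
    return mean(100 * (to - tn) / tn for to, tn in zip(tumor_only, tumor_normal))


# Hypothetical missense counts for three samples over a hypothetical 30 Mb region.
covered_mb = 30.0
tn_values = [tmb(n, covered_mb) for n in (300, 600, 150)]   # tumor-normal calls
to_values = [tmb(n, covered_mb) for n in (450, 780, 210)]   # tumor-only calls

print("tumor-normal TMB:", tn_values)
print("tumor-only TMB:  ", to_values)
print(f"mean excess: {mean_percent_excess(to_values, tn_values):.1f}%")
```

Because the same region size is applied to both analyses of a sample, the percent excess depends only on the mutation counts; the normalization matters when TMB values are compared against an absolute clinical threshold.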
These results indicate that pangenome-based analysis has the potential to become the new standard for unbiased processing of somatic sequencing samples, following its increased adoption for germline sequencing analysis.