Pangenome References Improve Biomarker Estimation from Tumor Sequencing Data
Abstract
Recent work has shown that patients of non-European ancestry are at higher risk of inappropriate clinical intervention because of inaccurate biomarker estimation, which arises from the reference bias inherent in standard methods for determining the tumor genome from sequencing data. Here we demonstrate that these inaccuracies can be reduced by using a pangenome reference appropriate for the patient’s population.
We constructed a novel secondary analysis workflow in which the pangenome reference serves as a scaffold for mapping the sequencing reads and is also included in the ‘panel of normals’ needed to discriminate between germline and somatic mutations in tumor-only sequencing. On a standard benchmark tumor sample, HCC1395, this approach detects known somatic mutations in tumor-only sequencing more accurately than the standard GATK somatic calling workflow, which is prevalent in diagnostic settings for the analysis of tumor-only assays (33% relative increase in F1 score).
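As a reading aid for the "relative increase in F1 score" reported above, the following minimal sketch shows how an F1 score and a relative increase between two callers are conventionally computed. The function names and the variant counts are hypothetical illustrations, not values or code from the study.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall for a set of variant calls."""
    if true_positives == 0:
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


def relative_increase(new_value: float, baseline_value: float) -> float:
    """Fractional change of new_value with respect to baseline_value."""
    return (new_value - baseline_value) / baseline_value


# Hypothetical counts against a truth set (NOT taken from the paper).
baseline_f1 = f1_score(true_positives=50, false_positives=50, false_negatives=50)
candidate_f1 = f1_score(true_positives=60, false_positives=40, false_negatives=40)
print(f"relative F1 increase: {relative_increase(candidate_f1, baseline_f1):.1%}")
```

A relative increase is measured against the baseline score, so it is distinct from the absolute difference in F1 points.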
We also assessed the expected clinical impact of our approach by comparing the Tumor Mutational Burden (TMB) calculated from missense somatic mutations called in tumor/normal samples from patients self-reported as belonging to African (6), Asian (1), and European (3) populations. We find that the TMB values calculated from tumor-only sequencing data analyzed by our workflow more closely approximate the TMB values calculated from the tumor-normal analysis of the same sample, being 35% higher on average, whereas GATK tumor-only analysis generates TMB values 56% higher on average than the tumor-normal analysis of the same sample. Tumor-normal TMB values calculated by the two methods differ less, with GATK-generated values being 13% higher on average. This indicates that GATK tumor-only analysis significantly overestimates TMB values, and that this overestimation can be largely corrected by using our workflow when tumor-normal sequencing is not available.
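The comparison above can be sketched as follows, under two assumptions that the abstract does not state: that TMB is expressed as missense somatic mutations per megabase of covered sequence, and that "higher on average" means the mean of per-sample percent differences. All names and numbers below are hypothetical.

```python
from statistics import mean


def tmb(missense_count: int, covered_megabases: float) -> float:
    """Mutations per megabase (assumed definition; the denominator is an assumption)."""
    if covered_megabases <= 0:
        raise ValueError("covered_megabases must be positive")
    return missense_count / covered_megabases


def mean_percent_excess(tumor_only: list[float], tumor_normal: list[float]) -> float:
    """Mean per-sample percent by which tumor-only TMB exceeds tumor-normal TMB."""
    if len(tumor_only) != len(tumor_normal):
        raise ValueError("both lists must cover the same samples")
    return mean(
        100.0 * (only - paired) / paired
        for only, paired in zip(tumor_only, tumor_normal)
    )


# Hypothetical per-sample TMB values (NOT taken from the paper).
tumor_normal_tmb = [10.0, 10.0]
tumor_only_tmb = [15.0, 11.0]
print(f"mean excess: {mean_percent_excess(tumor_only_tmb, tumor_normal_tmb):.0f}%")
```

Averaging per-sample percent differences weights each patient equally; averaging the raw TMB values first would instead let high-TMB samples dominate.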
These results indicate that pangenome-based analysis has the potential to become the new standard for unbiased processing of somatic sequencing samples, following its increased adoption for germline sequencing analysis.

