Pangenome References Improve Biomarker Estimation from Tumor Sequencing Data

Abstract It has recently been shown that patients of non-European ancestry are at higher risk of inappropriate clinical intervention because of inaccurate biomarker estimation, which arises from the reference bias inherent in standard methods for determining the tumor genome from sequencing data. Here we demonstrate that these inaccuracies can be reduced by using a pangenome reference appropriate for the patient’s population. We constructed a novel secondary analysis workflow in which the pangenome reference serves as a scaffold for mapping the sequencing reads and is also included in the ‘panel of normals’ needed to discriminate between germline and somatic mutations in tumor-only sequencing. On a standard benchmark tumor sample, HCC1395, this approach detects known somatic mutations in tumor-only sequencing more accurately than the standard GATK somatic calling workflow, which is prevalent in diagnostic settings for analyzing data from tumor-only assays (33% relative increase in F1 score). We also assessed the expected clinical impact of our approach by comparing the Tumor Mutational Burden (TMB) calculated from missense somatic mutations called in tumor/normal samples from patients self-reported as belonging to African (6), Asian (1), and European (3) populations. TMB values calculated from tumor-only sequencing data analyzed by our workflow more closely approximate the TMB values from tumor-normal analysis of the same sample: they are 35% higher on average, whereas GATK tumor-only analysis generates TMB values 56% higher on average than the tumor-normal analysis of the same sample. Tumor-normal TMB values calculated by the two methods do not differ as drastically, with GATK-generated values being 13% higher on average. This indicates that GATK tumor-only analysis leads to significant overestimation of TMB, which can be largely corrected by using our workflow when tumor-normal sequencing is not available.
These results indicate that pangenome-based analysis has the potential to become the new standard for unbiased processing of somatic sequencing samples, following its increased adoption for germline sequencing analysis.
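The comparisons reported in the abstract rest on three simple quantities: TMB, the relative difference between a tumor-only estimate and its tumor-normal counterpart, and the F1 score used for benchmarking. The sketch below illustrates how such numbers could be computed. It is not the authors' code: the function names, the 30 Mb callable region, and the mutation counts are hypothetical values chosen for illustration, and TMB is assumed to follow the common definition of mutations per megabase of sequenced region.

```python
def tmb_per_megabase(missense_count: int, callable_bases: int) -> float:
    """TMB as missense somatic mutations per megabase of callable sequence."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return missense_count / (callable_bases / 1_000_000)


def relative_difference(estimate: float, reference: float) -> float:
    """Fractional deviation of an estimate from a reference value."""
    if reference == 0:
        raise ValueError("reference must be non-zero")
    return (estimate - reference) / reference


def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, written in terms of counts."""
    denominator = 2 * true_pos + false_pos + false_neg
    return 2 * true_pos / denominator if denominator else 0.0


# Hypothetical inputs, not taken from the paper.
CALLABLE_BASES = 30_000_000      # assumed size of the sequenced region
tumor_normal_missense = 120      # assumed count from tumor-normal calling
tumor_only_missense = 162        # assumed count from tumor-only calling

tmb_reference = tmb_per_megabase(tumor_normal_missense, CALLABLE_BASES)
tmb_tumor_only = tmb_per_megabase(tumor_only_missense, CALLABLE_BASES)
overestimate = relative_difference(tmb_tumor_only, tmb_reference)

print(f"tumor-normal TMB: {tmb_reference:.2f} mutations/Mb")
print(f"tumor-only TMB:   {tmb_tumor_only:.2f} mutations/Mb")
print(f"relative overestimation: {overestimate:.0%}")
```

A relative increase in F1 score, such as the one reported for HCC1395, could be obtained the same way by passing the two workflows' F1 values to `relative_difference`.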

Related Results

Complex Collision Tumors: A Systematic Review
Abstract Introduction: A collision tumor consists of two distinct neoplastic components located within the same organ, separated by stromal tissue, without histological intermixing...
Cluster-efficient pangenome graph construction with nf-core/pangenome
Abstract Motivation Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. ...
Cluster efficient pangenome graph construction with nf-core/pangenome
Abstract Motivation Pangenome graphs offer a comprehensive way of capturing genomic variability across multiple genomes. Howeve...
Giant Sacrococcygeal Teratoma in Infant: Systematic Review
Abstract Introduction Sacrococcygeal teratoma (SCT) is a rare embryonal tumor that occurs in the sacrococcygeal region, with an incidence of about 1 in 35,000 to 40,000 live births...
SVPG: A pangenome-based structural variant detection approach and rapid augmentation of pangenome graphs with new samples
Abstract Breakthrough advances in long-read sequencing technologies have opened unprecedented opportunities to study genetic variations through comprehensive pangen...
Panaln: indexing pangenome for read alignment
Abstract Motivation Pangenome indexing is a critical supporting technology in biological sequence analysis such as read alignmen...