Abstract B8: Why is there no consensus on GBM subgroups? Understanding the nature of biological and statistical variability in The Cancer Genome Atlas GBM data and the implications for molecular tumor classification
Abstract
Introduction: Clinical differences among patients with glioblastoma (GBM) suggest the existence of discrete subgroups of this disease. Such groups are not recognized histologically, engendering interest in molecular classification strategies for GBM. Numerous studies have described molecular fingerprints characteristic of GBM subclasses with unique genotypes and phenotypes, but these classifications have been difficult to reproduce. Accordingly, there remains little consensus regarding GBM subclasses, their characteristic molecular signatures, and the clinically relevant genotype-phenotype correlations of the putative classes. We hypothesize that a combination of biological and mathematical factors confounds interpretation of these data, and we have undertaken a comprehensive investigation into the nature and relative contributions of these factors to inconsistencies in the molecular classification of GBMs.
Methods: We analyzed gene expression and clinical data for all 340 GBMs in The Cancer Genome Atlas (TCGA) profiled using the Affymetrix HT-HG-U133A platform. We created a logic model for systematically analyzing the sources of biological, technical, and mathematical variability inherent in this dataset and in the analytic strategies commonly employed in its interpretation. We then used standard linear classifiers and linear dimensionality reduction algorithms in conjunction with our logic model to investigate the nature and relative contributions of each factor.
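As a minimal sketch of the kind of pipeline described in the Methods, the following pairs a linear dimensionality reduction (principal component analysis via SVD) with a simple linear classifier (nearest centroid). The abstract does not name the specific algorithms used, so both choices are assumptions, as are the function names and the synthetic data, which stand in for the expression matrix and are not TCGA values. The reported accuracy is computed on the training samples only.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # center each variable (gene)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def nearest_centroid_fit(Z, y):
    """Return the class labels and the mean point (centroid) of each class."""
    labels = np.unique(y)
    centroids = np.array([Z[y == k].mean(axis=0) for k in labels])
    return labels, centroids


def nearest_centroid_predict(Z, labels, centroids):
    """Assign each row of Z to the class with the closest centroid."""
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return labels[dists.argmin(axis=1)]


def demo(seed=0):
    """Training accuracy on synthetic two-group data (hypothetical, not TCGA)."""
    rng = np.random.default_rng(seed)
    n, p = 100, 50  # samples per group, number of simulated genes
    group0 = rng.normal(0.0, 1.0, size=(n, p))
    group1 = rng.normal(0.0, 1.0, size=(n, p))
    group1[:, :5] += 3.0  # the groups differ by a mean shift in five genes
    X = np.vstack([group0, group1])
    y = np.array([0] * n + [1] * n)

    Z = pca_project(X, n_components=2)
    labels, centroids = nearest_centroid_fit(Z, y)
    predictions = nearest_centroid_predict(Z, labels, centroids)
    return float((predictions == y).mean())


if __name__ == "__main__":
    print(f"training accuracy: {demo():.3f}")
```

When the groups differ by a clean mean shift, as in this toy data, a linear pipeline separates them easily; the abstract's argument concerns cases where that assumption does not hold.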
Results: Gene expression data can be used in conjunction with unbiased linear classifiers to distinguish GBMs from other tumors, suggesting that the dataset contains valid biological information. However, the same classifiers fail to segregate GBMs into clinically relevant molecular subgroups. Further investigation reveals that commonly described sources of classification error, including individual sample characteristics, batch effects, and analytic and platform (technical) noise, make a measurable but proportionally minor contribution to inaccurate classification. Instead, our analysis suggests that three previously underappreciated classes of variability may account for a larger fraction of classification errors: biologic variability (noise) among tumors, of which we describe three types; skewed data distributions incorrectly assumed to be normal; and inherent nonlinear/nonorthogonal relationships among the variables (genes) used in conjunction with classification algorithms that assume linearity.
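One of the factors named above, skewed data incorrectly assumed to be normal, can be illustrated with a simple diagnostic. The sketch below computes the sample skewness of a simulated right-skewed variable before and after a log transform. The log-normal data, the function names, and the choice of the moment-based skewness estimator are illustrative assumptions, not the authors' procedure or TCGA measurements.

```python
import numpy as np


def sample_skewness(x):
    """Moment-based sample skewness: third central moment over variance**1.5."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float((d ** 3).mean() / (d ** 2).mean() ** 1.5)


def demo(seed=0):
    """Skewness of simulated right-skewed values, raw and log-transformed."""
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for one gene's expression across 5000 samples.
    raw = rng.lognormal(mean=0.0, sigma=1.0, size=5000)
    return sample_skewness(raw), sample_skewness(np.log(raw))


if __name__ == "__main__":
    raw_skew, log_skew = demo()
    print(f"skewness (raw): {raw_skew:.2f}")
    print(f"skewness (log): {log_skew:.2f}")
```

A skewness near zero is consistent with a symmetric distribution, while a large positive value signals a long right tail that methods assuming normality would not account for.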
Conclusions: Technical sources of variability are often assumed to be the primary source of inaccurate molecular classification of GBMs. Our analysis of the TCGA data suggests a contributory role for these factors, and we believe that additional research in modeling this error is critical to improving classification accuracy. Nevertheless, our analysis also suggests that three rarely discussed factors (biological variability, non-normal data distributions, and nonlinear relationships among genes) may together be responsible for a proportionally larger component of classification error. Additional research is necessary to better characterize the nature and relative magnitude of each of these effects. Subsequent efforts can then be made to develop strategies capable of more appropriately identifying and addressing these factors, thereby improving the accuracy and precision of future molecular classifiers for GBM.
Citation Information: Clin Cancer Res 2010;16(7 Suppl):B8
American Association for Cancer Research (AACR)

