
Machine learning framework for molecular classification of hematologic malignancies using transcriptome data

Abstract

BACKGROUND
Accurate characterization of a hematologic malignancy is the first step toward correct and tailored treatment. Tumor whole transcriptome sequencing (WTS) has recently emerged as a universal technique that accurately identifies not only all possible fusion transcripts, but also point mutations, copy number alterations, and gene overexpression. It is especially useful for diagnosing B-cell acute lymphoblastic leukemia (ALL). Since WTS provides the gene expression landscape of the tumor sample, we investigated whether expression signatures could also be used to accurately classify the full spectrum of pediatric hematologic malignancies.

METHODS
We selected hematologic disorders with a minimum of 3 samples per condition from the St. Jude Children's Research Hospital and Princess Maxima Center for Pediatric Oncology repositories. In total, this yielded WTS count data for 2914 samples across 69 distinct hematologic malignancy subtypes. The conditions spanned acute myeloid and lymphoid leukemias, Hodgkin and non-Hodgkin lymphomas, myelodysplastic syndrome, chronic myeloid leukemia, and others, including rare entities such as acute myeloid leukemia with KMT2A-partial tandem duplication or T-ALL with TLX1 activation. Raw gene counts from the WTS data were used as model inputs. To focus on genes relevant to tumorigenesis, a list of 1245 known tumor drivers was selected based on OncoKB and CosmicDB. To improve classifier performance, gene counts were enriched with statistical features (total sum, mean, kurtosis and skew). Different machine learning models (linear regression, random forest, support vector classifier, multilayer perceptron and 3 types of gradient boosting models) were trained using 3-fold cross-validation (chosen because of the minimum of 3 samples per condition). The F1 score (harmonic mean of precision and recall) was used as the primary metric to account for the imbalance between hematologic conditions. Finally, external model validation was performed on an independent dataset of 28 hematologic malignancy samples covering 9 conditions from Children's Clinical University Hospital, Riga, Latvia.

RESULTS
The best performing model was a voting ensemble combining 7 different classifiers, with an internal cross-validation accuracy of 0.82 and an F1 score of 0.59. Performance on the independent test set was similar, with an overall accuracy of 0.82 and an F1 score of 0.53. WTS in the test set identified all fusions that were also detected by the standard method. The model additionally detected clinically relevant events, such as a case of B-ALL with PAX5-JAK2 fusion that was missed by the standard method. Conditions with similar expression patterns, such as B-ALL with ETV6-RUNX1 fusion and B-ALL ETV6-RUNX1-like, were the most challenging to distinguish. Notably, acute megakaryoblastic leukemia cases, characterized by HOXA9 gene rearrangements, were classified as KMT2A-rearranged AML, likely due to the shared transcriptional activation of HOXA9.

CONCLUSION
We propose a method to identify hematologic malignancies based on WTS gene counts. This approach enables the classification of diverse disease subtypes without the need for direct fusion detection and can complement existing diagnostic modalities. Finally, the classifier results also reveal transcriptional convergence between certain disease subtypes, such as the KMT2A- and HOXA9-driven leukemias, which may represent a potential shared target for emerging compounds such as menin inhibitors.
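The feature enrichment, ensemble voting, and scoring steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation. The function names are hypothetical, and several details are assumptions the abstract does not specify: population moments with excess kurtosis, hard (majority) voting, and macro averaging of per-class F1.

```python
from collections import Counter
from statistics import fmean


def enrich_with_summary_stats(counts):
    """Append total, mean, kurtosis and skew to one sample's gene counts.

    Assumes a non-empty sequence of counts. Population moments and
    excess kurtosis are illustrative choices, not taken from the abstract.
    """
    total = float(sum(counts))
    mean = total / len(counts)
    # Second, third and fourth central moments of the per-gene counts.
    m2 = fmean([(c - mean) ** 2 for c in counts])
    m3 = fmean([(c - mean) ** 3 for c in counts])
    m4 = fmean([(c - mean) ** 4 for c in counts])
    if m2 == 0:
        # All counts identical: shape statistics are undefined, use 0.
        kurtosis, skew = 0.0, 0.0
    else:
        kurtosis = m4 / m2 ** 2 - 3.0
        skew = m3 / m2 ** 1.5
    return list(counts) + [total, mean, kurtosis, skew]


def majority_vote(labels):
    """Hard voting: return the label predicted by the most classifiers.

    Ties go to the label that appears first in the input (an assumption).
    """
    return Counter(labels).most_common(1)[0][0]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit.
        scores.append(2 * tp / denom if denom else 0.0)
    return fmean(scores)


if __name__ == "__main__":
    # Toy, made-up labels: one common subtype and two rare ones.
    y_true = ["B-ALL"] * 8 + ["AML", "T-ALL"]
    y_pred = ["B-ALL"] * 10
    print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

On the toy labels above, where every sample is predicted as the common subtype, accuracy is 0.80 while macro F1 is about 0.30. This is one way an accuracy of 0.82 and an F1 score of 0.59 can coexist when many subtypes have only a handful of samples.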

Related Results

Are Cervical Ribs Indicators of Childhood Cancer? A Narrative Review
Abstract A cervical rib (CR), also known as a supernumerary or extra rib, is an additional rib that forms above the first rib, resulting from the overgrowth of the transverse proce...

Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
BACKGROUND As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...

Iron Overload after Hematopoietic Stem Cell Transplantation
Abstract Introduction Iron overload (IOL) is a common complication after HSCT, mainly due to iterative red blood cell (RBC) transfusions with other mechanisms as ine...

CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...

Categorizing Molecular Mutations in MDS and AML
Abstract Introduction: A huge amount of data on genetic alterations has been compiled by high throughput sequencing studies in several hematologic mal...

1378. Clinical Characteristics of Tuberculosis Among Patients with Cancer in an Endemic Country
Abstract Background Tuberculosis (TB) is an infection caused by reactivation of Mycobacterium tuberculosis. Decreasing host immu...

Tetraspanins set the stage for bone marrow microenvironment–induced chemoprotection in hematologic malignancies
Abstract Despite recent advances in the treatment of hematologic malignancies, relapse still remains a consistent issue. One of the primary contributors to relapse i...

Complications of Hematologic Malignancies in the Emergency Department: A Primer for the Radiologist
Hematologic malignancies, including diseases such as Hodgkin and non-Hodgkin lymphoma, acute and chronic lymphocytic and myelogenous leukemia, and multiple myeloma, comprise a set ...
