Abstract 4966: Machine learning and large language model approach to pancancer data elements

Abstract

Introductory Statement: The goal is to use machine learning (ML) and a large language model (LLM) to augment the manual curation of cancer data elements.

Introduction: Memorial Sloan Kettering Cancer Center (MSKCC) has approximately 100,000 cancer patients with genomic testing, and the number is growing. Clinicians use genomic data for research but lack the clinical data to analyze alongside it. We use a vendor, VASTA Global, to hire curators who manually curate cancer patients’ core clinical data elements (CCDE) from unstructured paragraph text in electronic medical record (EMR) notes. CCDE encompasses 122 data elements covering a patient’s full cancer history, which can take up to one working day to curate. We collaborated with the vendor Realyze Intelligence Healthcare Solutions to use their AI pipeline to generate the data elements that are otherwise curated manually. Realyze generated CCDE data elements such as histology, pathology site, MMR, TNM staging, ECOG, and KPS for a pilot lung cancer cohort of 150 patients. We manually validated the generated data for 74 of the 150 patients.

Methods: The Realyze platform uses a combination of LLMs, ML algorithms, and standard terminologies to create a cancer patient model. These models are flexible enough to address the unique needs and challenges of a pan-cancer oncology model. Using a standardized FHIR export, results were delivered to a data lake solution and written into a REDCap database to enable human review.

Summary: We manually assessed 74 patients. The NLP output was concordant with manual review for MMR, KPS, and TNM staging in 100% of instances. For MMR, these were all null values, with false negative (FN) of 100% accuracy. Pathology site had 92.15% accuracy, while histology had 97.5% accuracy.

Conclusion: We will work on refining the ICD-O-3 lists for pathology site and histology to increase accuracy. Once Realyze refines their model for these data elements, we will re-run it on a larger cohort of cancer patients and calculate the accuracy.
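The accuracy figures in the summary can be read as per-element concordance between the generated values and the manually reviewed values. The sketch below illustrates that calculation under stated assumptions: the function name `accuracy_by_element`, the tuple layout, and the toy records are hypothetical and are not taken from the Realyze pipeline or the study data. It also assumes that a null generated value matching a null reviewed value counts as concordant, and that the denominator is the number of reviewed instances for each element.

```python
from collections import defaultdict


def accuracy_by_element(pairs):
    """Return percent agreement per data element.

    pairs: iterable of (element, generated_value, reviewed_value).
    A pair is concordant when the two values are equal; None == None
    is treated as concordant (assumption, see lead-in).
    """
    agree = defaultdict(int)
    total = defaultdict(int)
    for element, generated, reviewed in pairs:
        total[element] += 1
        if generated == reviewed:
            agree[element] += 1
    return {e: round(100.0 * agree[e] / total[e], 2) for e in total}


# Made-up illustrative records, not study data.
reviewed_pairs = [
    ("ECOG", "1", "1"),
    ("ECOG", "0", "1"),                   # discordant
    ("KPS", None, None),                  # both null: concordant
    ("Histology", "8140/3", "8140/3"),    # ICD-O-3 morphology code
]

print(accuracy_by_element(reviewed_pairs))
```

Keeping the tally per element rather than per patient allows each element to have its own number of reviewed instances.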
Accuracy Results

Clinical data elements (74 patients assessed)    Accuracy %
ECOG                                             98.6
KPS                                              100
T (path)                                         100
T (clinical)                                     100
N (path)                                         100
N (clinical)                                     100
M (path)                                         100
M (clinical)                                     100
MMR                                              100
Histology (path)                                 97.5
Path site                                        92.15

Citation Format: Andrew Niederhausern, Nadia S. Bahadur, Gary Wallace, Gilan E. Saadawi, John Philip. Machine learning and large language model approach to pancancer data elements [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 4966.

American Association for Cancer Research (AACR)

Andrew Niederhausern Nadia S. Bahadur Gary Wallace Gilan E. Saadawi John Philip

Cancer Research

2024
