Harnessing large language models (LLMs) for candidate gene prioritization and selection
Abstract

Background
Feature selection is a critical step for translating advances afforded by systems-scale molecular profiling into actionable clinical insights. While data-driven methods are commonly utilized for selecting candidate genes, knowledge-driven methods must contend with the challenge of efficiently sifting through extensive volumes of biomedical information. This work aimed to assess the utility of large language models (LLMs) for knowledge-driven gene prioritization and selection.

Methods
In this proof of concept, we focused on 11 blood transcriptional modules associated with an Erythroid cells signature. We evaluated four leading LLMs across multiple tasks. Next, we established a workflow leveraging LLMs. The steps consisted of:
(1) Selecting one of the 11 modules;
(2) Identifying functional convergences among constituent genes using the LLMs;
(3) Scoring candidate genes across six criteria capturing the gene’s biological and clinical relevance;
(4) Prioritizing candidate genes and summarizing justifications;
(5) Fact-checking justifications and identifying supporting references;
(6) Selecting a top candidate gene based on validated scoring justifications; and
(7) Factoring in transcriptome profiling data to finalize the selection of the top candidate gene.

Results
Of the four LLMs evaluated, OpenAI's GPT-4 and Anthropic's Claude demonstrated the best performance and were chosen for the implementation of the candidate gene prioritization and selection workflow. This workflow was run in parallel for each of the 11 erythroid cell modules by participants in a data mining workshop. Module M9.2 served as an illustrative use case. The 30 candidate genes forming this module were assessed, and the top five scoring genes were identified as BCL2L1, ALAS2, SLC4A1, CA1, and FECH. Researchers carefully fact-checked the summarized scoring justifications, after which the LLMs were prompted to select a top candidate based on this information. GPT-4 initially chose BCL2L1, while Claude selected ALAS2. When transcriptional profiling data from three reference datasets were provided for additional context, GPT-4 revised its initial choice to ALAS2, whereas Claude reaffirmed its original selection for this module.

Conclusions
Taken together, our findings highlight the ability of LLMs to prioritize candidate genes with minimal human intervention. This suggests the potential of this technology to boost productivity, especially for tasks that require leveraging extensive biomedical knowledge.
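
The scoring and prioritization steps of the workflow (steps 3 and 4) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the six criteria, the score scale, or how scores were combined, so the criterion names, the unweighted sum, the tie-breaking rule, and the gene labels and numbers below are all hypothetical placeholders rather than results from the study.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical placeholder names: the abstract states that six criteria were
# used but does not name them.
CRITERIA = tuple(f"criterion_{i}" for i in range(1, 7))


@dataclass(frozen=True)
class GeneScore:
    gene: str
    scores: Dict[str, float]  # one score per criterion (scale is an assumption)

    def total(self) -> float:
        # Unweighted sum; the source does not say how scores were combined.
        return sum(self.scores[c] for c in CRITERIA)


def prioritize(candidates: List[GeneScore], top_n: int = 5) -> List[GeneScore]:
    # Highest total first; ties are broken alphabetically so the order is stable.
    return sorted(candidates, key=lambda g: (-g.total(), g.gene))[:top_n]


# Made-up genes and scores, for illustration only.
candidates = [
    GeneScore("GENE_A", dict(zip(CRITERIA, (5, 4, 3, 2, 1, 0)))),   # total 15
    GeneScore("GENE_B", dict(zip(CRITERIA, (10, 9, 8, 7, 6, 5)))),  # total 45
    GeneScore("GENE_C", dict(zip(CRITERIA, (3, 3, 3, 3, 3, 0)))),   # total 15
]
ranked = prioritize(candidates, top_n=2)
print([g.gene for g in ranked])  # ['GENE_B', 'GENE_A']
```

In the workflow described above, the per-criterion scores would come from the LLMs and would be fact-checked by researchers before a top candidate is selected; a ranking like this one only orders the candidates for that review.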
Springer Science and Business Media LLC

