Abstract LB108: CarD-T: Interpreting Carcinomic Lexicon via Transformers
Abstract
The identification and classification of carcinogens is critical in cancer epidemiology. We introduce the Carcinogen Detection via Transformers (CarD-T) framework, which combines transformer-based machine learning with probabilistic analysis to efficiently nominate potential carcinogens from scientific texts. Trained on 60% of established carcinogens, CarD-T correctly identifies all remaining known carcinogens and nominates ∼1,600 potential new carcinogens. Compared against GPT-4, CarD-T achieves comparable precision (0.894 vs 0.903) and superior recall (0.857 vs 0.705), implying an improved ability to classify carcinogens not listed in major databases. Additionally, CarD-T highlights 554 entities with conflicting evidence, which are analyzed using Bayesian Probabilistic Carcinogenic Denomination (PCarD). The framework reveals significant shifts in research focus from chemical carcinogens to broader categories, including environmental factors (18%), biological agents (10%), and emerging threats such as COVID-19, supported by 577 publications since 2020. This framework enhances the agility of public health responses to carcinogen identification, setting a new benchmark for automated, scalable toxicological investigations.
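The precision and recall figures reported for CarD-T and GPT-4 can be condensed into a single F1 score, the harmonic mean of precision and recall. The abstract does not report F1; the sketch below only derives it from the published numbers, and the function name `f1_score` and the `REPORTED` table are illustrative choices, not part of the CarD-T framework.

```python
"""Derive F1 (harmonic mean of precision and recall) from the precision and
recall values reported in the abstract. F1 itself is not reported by the
authors; this is plain arithmetic on their published figures."""


def f1_score(precision: float, recall: float) -> float:
    """Return the F1 score: 2PR / (P + R), or 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs exactly as reported in the abstract.
REPORTED = {
    "CarD-T": (0.894, 0.857),
    "GPT-4": (0.903, 0.705),
}

if __name__ == "__main__":
    for model, (p, r) in REPORTED.items():
        print(f"{model}: precision={p:.3f} recall={r:.3f} F1={f1_score(p, r):.3f}")
```

Run as a script, this prints an F1 of 0.875 for CarD-T and 0.792 for GPT-4. CarD-T's slightly lower precision is outweighed by its higher recall, which is consistent with the abstract's emphasis on recall.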
Citation Format:
James (Jamey) ONeill, Parag A. Katira. CarD-T: Interpreting Carcinomic Lexicon via Transformers [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2025; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2025 Apr 25-30; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2025;85(8_Suppl_2):Abstract nr LB108.
American Association for Cancer Research (AACR)