
DeBERTa-Based SMILES Encoders for ADMET-Aware Drug Design

Multi-modal drug discovery frameworks increasingly rely on robust encoders capable of representing chemical structures alongside other data modalities. In this study, we fine-tuned a DeBERTa-based SMILES encoder to improve its predictive capacity for 22 ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) endpoints while maintaining strong comprehension of molecular syntax. Starting from a pretrained ZINC-based DeBERTa checkpoint, we trained on a 300K PubChem-ADMET dataset using a multi-label regression scheme with a focal MAE loss to stabilize learning across diverse properties. Our encoder achieved top-10 rankings on 16 TDC benchmark tasks, including notable improvements of 14–30% on critical endpoints such as bioavailability and CYP2C9-substrate. Compared with BERT- and RoBERTa-based molecular encoders, our approach preserved significantly higher MLM accuracy (>89%) over training, indicating robust retention of chemical language understanding. Additional analysis using an ADMET path length metric revealed that DeBERTa produced more disentangled latent representations, underscoring its suitability for property-specific molecular manipulation. These results demonstrate that a disentangled, ADMET-aware DeBERTa encoder can serve as a powerful component for future multi-modal pipelines in AI-driven drug design, effectively balancing structural fluency with predictive specialization.
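The abstract names a focal MAE loss for multi-label regression but does not define it. The sketch below shows one plausible reading, not the authors' formulation: each absolute error is scaled by its size relative to the largest error in the batch, raised to a power `gamma`, so that poorly fit endpoints dominate the average. The function name `focal_mae_loss`, the `gamma` and `eps` parameters, and the use of `None` to mark unmeasured endpoints are all assumptions made for illustration.

```python
def focal_mae_loss(preds, targets, gamma=2.0, eps=1e-8):
    """Mean of focally weighted absolute errors over observed labels.

    preds, targets: lists of rows, one row per molecule and one column
    per endpoint. A target of None (an assumed convention) marks an
    endpoint that was not measured for that molecule.
    """
    # Collect absolute errors for observed (molecule, endpoint) pairs only.
    errors = [
        abs(p - t)
        for pred_row, target_row in zip(preds, targets)
        for p, t in zip(pred_row, target_row)
        if t is not None
    ]
    if not errors:
        return 0.0

    # Normalise by the largest error in the batch (assumed scheme).
    max_err = max(errors) + eps

    # Focal weight: errors near the batch maximum keep a weight near 1,
    # while small errors are down-weighted toward 0.
    weighted = [((e / max_err) ** gamma) * e for e in errors]
    return sum(weighted) / len(errors)


# Two molecules, two endpoints; one label is missing.
preds = [[1.0, 2.0], [3.0, 4.0]]
targets = [[0.0, None], [3.0, 2.0]]
loss = focal_mae_loss(preds, targets, gamma=1.0)
```

With `gamma=0` every weight is 1 and the function reduces to a plain masked MAE, which makes it straightforward to compare the two settings on the same batch.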

Related Results

Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
Background: As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...
A video polysomnographic study of spontaneous smiling during sleep in newborns
Abstract: The objective of the present study was to confirm the link between spontaneous smiling and active sleep in newborns, and to identify the role of the cortex in the generatio...
Multiclass Sentiment Analysis of Electric Vehicle Incentive Policies Using IndoBERT and DeBERTa Algorithms
The electric vehicle (EV) incentive policy in Indonesia has generated various public reactions, particularly on social media platforms. This study aims to classify public sentiment...
Reward, affiliation, and dominance smiles communicate different social motives following trust violations
Others’ facial expressions can influence whether we trust them. For example, smiles tend to elicit positive impressions and increased cooperation. But how are smiles perceived when...
The Functional SMILES Perspective
Simplified Molecular-Input Line-Entry System or SMILES is a notation scheme for representing chemical structures in a single line of text, encoding atom connectivity and stereochem...
Design
Conventional definitions of design rarely capture its reach into our everyday lives. The Design Council, for example, estimates that more than 2.5 million people use design-related...
Evaluation of existence of anthropometric proportions in dentitions of females who are satisfied with their smile: a cross sectional study
Aim: To evaluate Preston's ratio and Recurrent Esthetic Dental (RED) proportion in smiles of female patients who are satisfied with their smiles. Methodology: 86 subjects who fulfi...
Depression subtype classification from social media posts: few-shot prompting vs. fine-tuning of large language models
Background: Social media provides timely proxy signals of mental health, but reliable tweet-level classification of depression subtypes remains challenging due t...
