DeBERTa-Based SMILES Encoders for ADMET-Aware Drug Design

Multi-modal drug discovery frameworks increasingly rely on robust encoders that can represent chemical structures alongside other data modalities. In this study, we fine-tuned a DeBERTa-based SMILES encoder to improve its predictive capacity for 22 ADMET (Absorption, Distribution, Metabolism, Excretion, and Toxicity) endpoints while maintaining strong comprehension of molecular syntax. Starting from a DeBERTa checkpoint pretrained on ZINC, we trained on a 300K PubChem-ADMET dataset using a multi-label regression scheme with a focal MAE (mean absolute error) loss to stabilize learning across diverse properties. Our encoder achieved top-10 rankings on 16 Therapeutics Data Commons (TDC) benchmark tasks, including notable improvements of 14–30% on critical endpoints such as bioavailability and CYP2C9-substrate. Compared with BERT- and RoBERTa-based molecular encoders, our approach preserved significantly higher masked language modeling (MLM) accuracy (>89%) throughout training, indicating robust retention of chemical language understanding. Additional analysis using an ADMET path length metric revealed that DeBERTa produced more disentangled latent representations, underscoring its suitability for property-specific molecular manipulation. These results demonstrate that a disentangled, ADMET-aware DeBERTa encoder can serve as a powerful component for future multi-modal pipelines in AI-driven drug design, balancing structural fluency with predictive specialization.
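The abstract names a focal MAE loss for multi-label regression but does not give its formula. As a rough illustration only, the sketch below shows one possible focal-style weighting of the absolute error, in which each error is scaled by `tanh(beta * |error|) ** gamma` so that well-fit endpoints contribute less. The function name `focal_mae`, the `tanh` modulation, the `gamma` and `beta` parameters, and the use of `None` to mark a molecule with no label for an endpoint are all assumptions made for this example, not details taken from the paper.

```python
import math


def focal_mae(preds, targets, gamma=1.0, beta=1.0):
    """Focal-style mean absolute error over labeled (molecule, endpoint) pairs.

    preds, targets: lists of rows, one row per molecule, one column per endpoint.
    A target of None marks a missing label and is skipped (an assumption made
    here for illustration; the paper's handling of missing labels is not stated).
    """
    total, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):
        for pred, target in zip(pred_row, target_row):
            if target is None:
                continue  # no label for this endpoint on this molecule
            err = abs(pred - target)
            # Hypothetical focal modulation: small errors get weights near 0,
            # large errors get weights near 1. gamma=0 gives weight 1 everywhere.
            weight = math.tanh(beta * err) ** gamma
            total += weight * err
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Two molecules, two endpoints; the first molecule lacks the second label.
    preds = [[1.0, 2.0], [0.0, 5.0]]
    targets = [[1.5, None], [0.0, 3.0]]
    print("plain MAE:", focal_mae(preds, targets, gamma=0.0))
    print("focal MAE:", focal_mae(preds, targets, gamma=1.0))
```

With `gamma=0` every weight is 1 and the function reduces to a plain masked MAE, which makes it easy to compare the two settings on the same batch.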

American Chemical Society (ACS)

Jong hyeon Lim, Myounwoo Kim, Youngmahn Han, Jin Yong Lee

2025

