A quantitative model of structure-based virtual screening performance

In recent studies, large library docking has predicted novel ligands with high “hit-rates” (number active/number experimentally tested) when chosen from molecules top-ranked in the screen. Our focus on hit-rates reflects the widespread view that while docking can succeed as a loose classifier, distinguishing likely ligands from non-binders, its scores do not meaningfully relate to affinity owing to well-known weaknesses in docking scoring functions. Here, we investigate this by analyzing large-scale experiments for three docking campaigns, where 2,544 ligands were synthesized and tested across the scoring landscape (poor scores, mediocre scores, high scores). We find that the observed experimental hit-rate curves can be accurately reproduced by a simple bivariate normal distribution model, where dock score is interpreted as a noisy predictor of binding free energy. To account for the plateauing and subsequent drop in hit-rates often seen at highly favorable docking scores, we add a term for high-ranking docking artifacts, a well-documented phenomenon observed across targets. From this simple model, three predictions emerge. First, while the model anticipates the improved hit-rates and affinities as libraries have grown into the billions of molecules, it also predicts that even slight improvements in scoring accuracy would substantially improve both hit-rates and hit affinities; equivalent hit-rates could also be achieved with smaller libraries if scoring functions were improved. Second, while the nature and prevalence of artifacts are hard to anticipate, left unconsidered, they can come to dominate top-scoring lists as libraries grow. This emphasizes the importance of physically testing molecules across a range of log-normalized ranks (here called pProp) to identify the peak hit-rate of the docking model. Third, the virtual library’s intrinsic hit-rate, reflecting the percentage of molecules that would be active if all were tested, has a large impact on docking performance. Thus, pre-filtering a library for molecules with even grossly appropriate features (e.g., charge, hydrophobicity) can meaningfully boost performance with tera-scale libraries. These predictions are consistent with observations from ultra-large library docking to date, and can help optimize future docking campaigns and interpret their results.
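
The kind of model described above can be illustrated with a minimal sketch. It is not the authors' fitted model; it only assumes what the abstract states (dock score as a noisy, bivariate-normal predictor of binding free energy, plus an artifact term) and fills in the rest with labeled assumptions. Scores and energies are standardized so that lower is more favorable, `rho` is the assumed score/energy correlation, and a molecule counts as active when its energy falls below the cutoff implied by the intrinsic hit-rate. The two-component mixture used for artifacts, the reading of pProp as the negative log10 of the library fraction scoring at least as well, and every parameter value are hypothetical choices made for illustration.

```python
from math import log10, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal for standardized scores and energies


def clean_hit_rate(z_score, rho, intrinsic_hit_rate):
    """P(active | docking z-score) under a standardized bivariate normal.

    Lower values are more favorable for both score and energy. A molecule
    is active if its energy is below the cutoff set by the intrinsic hit-rate.
    """
    g_cut = STD.inv_cdf(intrinsic_hit_rate)
    # Energy given score is normal with mean rho * z and variance 1 - rho**2.
    return STD.cdf((g_cut - rho * z_score) / sqrt(1.0 - rho ** 2))


def artifact_fraction(z_score, prevalence, art_mean, art_sd):
    """Share of molecules at this score that are artifacts.

    ASSUMPTION: artifacts are a rare second population whose scores are
    shifted toward favorable values (a two-component mixture).
    """
    art = prevalence * NormalDist(art_mean, art_sd).pdf(z_score)
    real = (1.0 - prevalence) * STD.pdf(z_score)
    return art / (art + real)


def observed_hit_rate(z_score, rho, intrinsic_hit_rate,
                      prevalence, art_mean, art_sd):
    """Hit-rate once artifacts (assumed never active) dilute the real hits."""
    share = artifact_fraction(z_score, prevalence, art_mean, art_sd)
    return (1.0 - share) * clean_hit_rate(z_score, rho, intrinsic_hit_rate)


def pprop(z_score, prevalence, art_mean, art_sd):
    """ASSUMED pProp: -log10 of the library fraction scoring at least as well."""
    frac = ((1.0 - prevalence) * STD.cdf(z_score)
            + prevalence * NormalDist(art_mean, art_sd).cdf(z_score))
    return -log10(frac)


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to make the qualitative shape visible.
    rho, intrinsic = 0.5, 0.01
    artifacts = dict(prevalence=1e-5, art_mean=-6.0, art_sd=1.0)
    for z in (-2.0, -3.0, -4.0, -4.5, -5.0, -6.0):
        print(f"z={z:5.1f}  pProp={pprop(z, **artifacts):5.2f}  "
              f"clean={clean_hit_rate(z, rho, intrinsic):.3f}  "
              f"observed={observed_hit_rate(z, rho, intrinsic, **artifacts):.3f}")
```

Under these assumed parameters the artifact-free hit-rate keeps rising with more favorable scores, while the observed hit-rate peaks and then falls once artifacts outnumber real molecules at the top of the list. Raising `rho` slightly or raising the intrinsic hit-rate lifts the whole curve, which is the qualitative behavior behind the first and third predictions.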

American Chemical Society (ACS)

Laust Moesgaard, Brian K. Shoichet, Olivier Mailhot

2025
