A quantitative model of structure-based virtual screening performance
In recent studies, large library docking has predicted novel ligands with high “hit-rates” (number active/number experimentally tested) when chosen from molecules top-ranked in the screen. Our focus on hit-rates reflects the widespread view that while docking can succeed as a loose classifier, distinguishing likely ligands from non-binders, its scores do not meaningfully relate to affinity owing to well-known weaknesses in docking scoring functions. Here, we investigate this by analyzing large-scale experiments for three docking campaigns, where 2,544 ligands were synthesized and tested across the scoring landscape (poor scores, mediocre scores, high scores). We find that the observed experimental hit-rate curves can be accurately reproduced by a simple bivariate normal distribution model, where dock score is interpreted as a noisy predictor of binding free energy. To account for the plateauing and subsequent drop in hit-rates often seen at highly favorable docking scores, we add a term for high-ranking docking artifacts, a well-documented phenomenon observed across targets. From this simple model, three predictions emerge. First, while the model anticipates the improved hit-rates and affinities as libraries have grown into the billions of molecules, it also predicts that even slight improvements in scoring accuracy would substantially improve both hit-rates and hit affinities; equivalent hit-rates could also be achieved with smaller libraries if scoring functions were improved. Second, while the nature and prevalence of artifacts are hard to anticipate, left unconsidered they can come to dominate top-scoring lists as the libraries grow. This emphasizes the importance of physically testing molecules across a range of log-normalized ranks (here called pProp) to identify the peak hit-rate of the docking model. Third, the virtual library’s intrinsic hit-rate, reflecting the percentage of molecules that would be active if all were tested, has a large impact on docking performance. Thus, pre-filtering a library for molecules with even grossly appropriate features (e.g., charge, hydrophobicity) can meaningfully boost performance with tera-scale libraries. These predictions are consistent with observations from ultra-large library docking to date, and can help us optimize future work to improve and understand results.
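
The bivariate normal picture can be illustrated with a minimal sketch. It is not the authors' fitted model: it assumes that dock score and binding free energy are standardized and jointly normal with correlation `rho`, that a molecule counts as active when its free energy falls below a cutoff chosen to match the library's intrinsic hit-rate, and that pProp is the negative base-10 logarithm of the fractional rank. The abstract does not spell out any of these definitions, and the function name `hit_rate_at_pprop` and the parameter values are hypothetical. The artifact term is omitted because its functional form is not given here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def hit_rate_at_pprop(pprop: float, rho: float, intrinsic_hit_rate: float) -> float:
    """Probability that a molecule at a given log-normalized rank is active.

    Assumptions (not taken from the source):
      - pprop = -log10(rank / library size), so pprop must be > 0
      - score and free energy are standard bivariate normal with correlation rho
      - more negative values are better for both score and free energy
      - 0 <= rho < 1 and 0 < intrinsic_hit_rate < 1
    """
    # Standardized score of the molecule sitting at rank fraction 10**-pprop.
    z_score = STD_NORMAL.inv_cdf(10.0 ** -pprop)
    # Free-energy cutoff such that a fraction intrinsic_hit_rate of the library is active.
    z_cut = STD_NORMAL.inv_cdf(intrinsic_hit_rate)
    # Free energy given score is normal with mean rho * z_score and sd sqrt(1 - rho^2).
    return STD_NORMAL.cdf((z_cut - rho * z_score) / sqrt(1.0 - rho ** 2))


if __name__ == "__main__":
    # Illustrative values only: compare two hypothetical scoring accuracies.
    for rho in (0.3, 0.4):
        rates = [hit_rate_at_pprop(p, rho, 1e-3) for p in (3, 6, 9)]
        print(rho, [round(r, 4) for r in rates])
```

Under these assumptions the predicted hit-rate rises with pProp, rises with `rho` at a fixed pProp, and scales with the intrinsic hit-rate, which mirrors the first and third predictions qualitatively. Without an artifact term the curve cannot reproduce the plateau and drop at the most favorable scores.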