
PubChem and ChEMBL Beyond Lipinski

Seven million of the 94 million entries currently in the PubChem database break at least one of the four Lipinski constraints for oral bioavailability, and 183,185 of these are also found in the ChEMBL database. These non-Lipinski PubChem (NLP) and ChEMBL (NLC) subsets are interesting because they contain new modalities that can display biological properties not accessible to small-molecule drugs. Unfortunately, the current search tools in PubChem and ChEMBL are designed for small molecules and are not well suited to exploring these subsets, which therefore remain poorly appreciated. Herein we report MXFP (macromolecule extended atom-pair fingerprint), a 217-dimensional fingerprint tailored to analyze large molecules in terms of molecular shape and pharmacophores. We implement MXFP in two web-based applications: the first visualizes NLP and NLC interactively using Faerun (http://faerun.gdb.tools/), and the second performs MXFP nearest-neighbor searches in NLP (http://similaritysearch.gdb.tools/). We show that these tools provide meaningful insight into the diversity of large molecules in NLP and NLC. The interactive tools presented here are publicly available at http://gdb.unibe.ch and can be used freely to explore and better understand the diversity of non-Lipinski molecules in PubChem and ChEMBL.
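The "non-Lipinski" criterion used above (a molecule breaks at least one of the four Lipinski constraints) can be illustrated with a minimal sketch. It assumes the standard thresholds (molecular weight at most 500 Da, calculated logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors) and assumes the descriptors have already been computed by some cheminformatics toolkit. The `Descriptors` class, the function names, and the two example inputs are hypothetical and are not taken from the paper or from the PubChem or ChEMBL tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Descriptors:
    """Precomputed properties for one molecule (hypothetical container)."""
    mol_weight: float       # molecular weight in Da
    logp: float             # calculated octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> List[str]:
    """Return labels for the Lipinski limits that the molecule exceeds."""
    checks = [
        ("MW > 500", d.mol_weight > 500),
        ("logP > 5", d.logp > 5),
        ("HBD > 5", d.h_bond_donors > 5),
        ("HBA > 10", d.h_bond_acceptors > 10),
    ]
    return [label for label, violated in checks if violated]


def is_non_lipinski(d: Descriptors) -> bool:
    """True if the molecule breaks at least one of the four constraints."""
    return len(lipinski_violations(d)) >= 1


# Illustrative inputs only (assumed values, not data from the paper).
small = Descriptors(mol_weight=180.2, logp=1.2, h_bond_donors=1, h_bond_acceptors=4)
large = Descriptors(mol_weight=1200.0, logp=3.0, h_bond_donors=5, h_bond_acceptors=12)

for name, d in [("small", small), ("large", large)]:
    print(name, is_non_lipinski(d), lipinski_violations(d))
```

Note that the classical rule of five is often applied with one violation tolerated; the sketch follows the stricter reading in the abstract, where a single violation is enough to place a molecule in the non-Lipinski subset.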

Related Results

Towards a Universal SMILES representation - A standard method to generate canonical SMILES based on the InChI
Abstract. Background: There are two line notations of chemical structures that have established themselves in the field: the SMILES string and the InChI string. The InChI aims to provi...
Hybrid Approach to Identifying Druglikeness Leading Compounds against COVID-19 3CL Protease
SARS-CoV-2 is a positive single-strand RNA-based macromolecule that has caused the death of more than 6.3 million people since June 2022. Moreover, by disturbing global supply chai...
Drug Analogs from Fragment Based Long Short-Term Memory Generative Neural Networks
Several recent reports have shown that long short-term memory generative neural networks (LSTM) of the type used for grammar learning efficiently learn to write SMILES of drug-like...
Support Vector Machine model for hERG inhibitory activities based on the integrated hERG database using descriptor selection by NSGA-II
Abstract: Assessing the hERG liability in the early stages of drug discovery programs is important. The recent increase of hERG-related information in public databases enabled variou...
