Context-dependent similarity searching for small molecular fragments
Abstract
Similarity searching is a mainstay in cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships due to feature sparseness. As an alternative, we have adapted the concept of context-dependent word pair similarity from natural language processing to evaluate similarity relationships between substituents (R-groups) taking latent characteristics into account. Context-dependent similarity assessment is based on vector embeddings as fragment representations generated using neural networks. With active analogue series as a model system to establish a global structure–activity context, we demonstrate that this approach is applicable to systematic similarity searching for substituents and increases the performance of standard descriptor representations. Context-dependent similarity searching is capable of detecting remote and functionally relevant similarity relationships between substituents. Alternative search queries are introduced focusing on individual substituents within a global substituent context or individual sequences of substituents establishing a local context. For similarity searching, different structural or structure–property contexts can be established, providing opportunities for various applications.
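To make the idea of searching over embedding vectors concrete, the following is a minimal sketch, not the authors' implementation. It assumes each substituent already has a learned embedding vector and that cosine similarity is used to rank candidates against a query; the abstract does not state the similarity measure, so that choice is an assumption. The function names (`cosine_similarity`, `rank_substituents`) and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_substituents(query, embeddings, top_k=3):
    """Rank all substituents other than the query by similarity to the query embedding."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    # Highest similarity first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional vectors invented for illustration; real embeddings would be
# produced by a neural network trained on substituents in their series context.
toy_embeddings = {
    "methyl": [0.9, 0.1, 0.0],
    "ethyl": [0.8, 0.2, 0.1],
    "chloro": [0.1, 0.9, 0.2],
    "bromo": [0.0, 0.8, 0.3],
}

ranking = rank_substituents("methyl", toy_embeddings, top_k=2)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setting the query "methyl" ranks "ethyl" first because their vectors point in nearly the same direction. With learned embeddings, the same ranking step would instead reflect whatever context the embeddings were trained on.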
Springer Science and Business Media LLC

