Context-dependent similarity searching for small molecular fragments

Abstract: Similarity searching is a mainstay in cheminformatics that is generally used to identify compounds with desired properties. For small molecular fragments, similarity calculations based on standard descriptors often have limited utility for establishing meaningful similarity relationships due to feature sparseness. As an alternative, we have adapted the concept of context-dependent word pair similarity from natural language processing to evaluate similarity relationships between substituents (R-groups), taking latent characteristics into account. Context-dependent similarity assessment is based on vector embeddings as fragment representations generated using neural networks. With active analogue series as a model system to establish a global structure–activity context, we demonstrate that this approach is applicable to systematic similarity searching for substituents and increases the performance of standard descriptor representations. Context-dependent similarity searching is capable of detecting remote and functionally relevant similarity relationships between substituents. Alternative search queries are introduced, focusing either on individual substituents within a global substituent context or on individual sequences of substituents that establish a local context. For similarity searching, different structural or structure–property contexts can be established, providing opportunities for various applications.
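
As a rough illustration of embedding-based similarity searching for substituents, the sketch below ranks R-groups by cosine similarity to a query R-group. It is not the authors' implementation: the abstract does not state which similarity measure or embedding dimensionality was used, so cosine similarity is an assumption, and the function names (`cosine_similarity`, `rank_substituents`) and the three-dimensional vectors in `toy_embeddings` are hypothetical, hand-made values standing in for embeddings that a neural network would learn from analogue series.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_substituents(query, embeddings):
    """Return (name, similarity) pairs for all other substituents, most similar first."""
    query_vec = embeddings[query]
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy vectors, invented for illustration only.
# Real embeddings would be learned from data and have many more dimensions.
toy_embeddings = {
    "F": [0.9, 0.1, 0.0],
    "Cl": [0.8, 0.2, 0.1],
    "OCH3": [0.1, 0.9, 0.2],
    "CH3": [0.2, 0.3, 0.9],
}

ranking = rank_substituents("F", toy_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

With learned embeddings, two substituents that share few explicit structural features can still score as similar if they occur in comparable contexts, which is the kind of remote relationship the abstract refers to.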

Related Results

Similarity Search with Data Missing
Similarity search is a fundamental research problem with broad applications in various research fields, including data mining, information retrieval, and machine learning. The core...
Using covariance weighted euclidean distance to assess the dissimilarity between integral experiments
Integral experiments, especially criticality experiments, help a lot in designing either a new nuclear reactor or a criticality assembly. The calculation uncertainty of the integral para...
MoTSE: an interpretable task similarity estimator for small molecular property prediction tasks
Understanding the molecular properties (e.g., physical, chemical or physiological characteristics and biological activities) of small molecules plays essential roles in bio...
Improved Cosine Similarity Measures for q-Rung Orthopair Fuzzy Sets
In this paper, we introduce some novel cosine similarity measures for \(q\)-rung orthopair fuzzy sets (\(q\)-ROFSs), which capture both direction and magnitude aspects of fuzzy set...
Searching and reporting in Campbell Collaboration systematic reviews: A systematic assessment of current methods
The search methods used in systematic reviews provide the foundation for establishing the body of literature from which conclusions are drawn and recommendations made. Sear...
Similarity Criteria of Water Drive Physical Simulation of Pressure-Sensitive Fractured Reservoirs
A mathematical equation of water drive physical simulation of pressure-sensitive fractured reservoirs was established based on previous research results. In this study, the similar...
Analysis of a Similarity Measure for Non-Overlapped Data
A similarity measure is a measure evaluating the degree of similarity between two fuzzy data sets and has become an essential tool in many applications including data mining, patte...
Satisfaction in Counseling Alumni and Students
The purpose of this study was to investigate satisfaction in counseling majors. The four independent variables investigated were gender, program status, employment status, and age....