
Assembly and reasoning over semantic mappings at scale for biomedical data integration

Motivation: Hundreds of resources assign identifiers to biomedical concepts including genes, small molecules, biological processes, diseases, and cell types. Often, these resources overlap by assigning identifiers to the same or related concepts. This creates a data interoperability bottleneck, as integrating data sets and knowledge bases that use identifiers from different resources for the same concepts requires such identifiers to be mapped to each other. However, available mappings are incomplete and fragmented across individual resources, motivating their large-scale integration.

Results: We developed SeMRA, a software tool that integrates mappings from multiple sources into a graph data structure. Using graph algorithms, it infers missing mappings implied by available ones while keeping track of provenance and confidence. This allows connecting identifier spaces between which direct mapping was previously not possible. SeMRA is customizable and takes as input a declarative specification that describes the sources to integrate, along with additional configuration parameters. We make available an aggregated mappings resource produced by SeMRA consisting of 43.4 million mappings from 127 sources that jointly cover identifiers from 445 ontologies and databases. We also describe benchmarks on specific use cases such as integrating mappings between resources cataloging diseases or cell types.

Availability: The code is available under the MIT license at https://github.com/biopragmatics/semra. The mappings database assembled by SeMRA is available at https://zenodo.org/records/15208251.
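To make the idea of inferring missing mappings from available ones more concrete, here is a minimal sketch in plain Python. It is not SeMRA's API or algorithm: the `Mapping` class, the `infer_chains` function, the identifiers, and the source names are all hypothetical, and multiplying confidences along a chain is an assumption chosen for illustration only. It shows one round of transitive inference (if A maps to B and B maps to C, propose A to C) while carrying provenance along.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mapping:
    """A hypothetical equivalence mapping between two identifiers."""
    subject: str
    obj: str
    confidence: float
    provenance: Tuple[str, ...]  # names of the sources supporting the mapping


def infer_chains(mappings: List[Mapping]) -> List[Mapping]:
    """Propose A=C whenever A=B and B=C exist but A=C does not."""
    # Treat mappings as undirected edges: node -> {neighbor: supporting mapping}.
    neighbors: Dict[str, Dict[str, Mapping]] = {}
    for m in mappings:
        neighbors.setdefault(m.subject, {})[m.obj] = m
        neighbors.setdefault(m.obj, {})[m.subject] = m

    inferred: Dict[Tuple[str, str], Mapping] = {}
    for adjacent in neighbors.values():
        # Every pair of neighbors of the same middle node is a candidate.
        for left, right in combinations(sorted(adjacent), 2):
            if right in neighbors[left]:
                continue  # a direct mapping already exists
            first, second = adjacent[left], adjacent[right]
            candidate = Mapping(
                subject=left,
                obj=right,
                # Assumption: confidence of a chain is the product of its links.
                confidence=first.confidence * second.confidence,
                provenance=first.provenance + second.provenance,
            )
            best = inferred.get((left, right))
            if best is None or candidate.confidence > best.confidence:
                inferred[(left, right)] = candidate
    return list(inferred.values())


# Made-up identifiers and sources, for illustration only.
known = [
    Mapping("a:1", "b:1", 0.9, ("source_x",)),
    Mapping("b:1", "c:1", 0.8, ("source_y",)),
]
inferred = infer_chains(known)
for m in inferred:
    print(m.subject, m.obj, round(m.confidence, 2), m.provenance)
```

In this toy input, no source maps `a:1` to `c:1` directly, but both map to `b:1`, so the sketch proposes the missing link and records both supporting sources. A real system would also need to handle longer chains, non-equivalence relations, and conflicting evidence, none of which are covered here.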

Related Results

Biomappings: Community curation of mappings between biomedical entities
Many related biomedical resources propose their own identifiers for genes, proteins, chemicals, biological processes, and other entities of biological interest. The integration of ...
A Semantic Orthogonal Mapping Method Through Deep-Learning for Semantic Computing
In order to realize an artificial intelligent system, a basic mechanism should be provided for expressing and processing the semantic. We have presented semantic computing models i...
Prediction and Curation of Missing Biomedical Identifier Mappings with Biomappings
Motivation: Biomedical identifier resources (ontologies, taxonomies, controlled vocabularies) commonly overlap in scope...
Logical Challenges in Artificial General Intelligence
The present thesis pertains to the research area of logic for artificial intelligence (AI), and is motivated by the critical role of automated reasoning in AI, particularly by the ...
Prediction and curation of missing biomedical identifier mappings with Biomappings
Motivation: Biomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under diffe...
Investigation of Prospective Science Teachers’ Understandings on Ergastic Substances with the Semantic Mappings
The aim of this study was to investigate prospective science teachers’ understandings of ergastic substances using semantic mappings. This study was phenomenological research m...
Topological Mappings Based on SPG*-Closed Set
In this paper, we introduce the concept of SPG*-closed mapping and continuous mapping among which SPG-closed mappings, SPG*-closed mappings and SPG**-closed mappings and the relati...
