Biomappings: Community curation of mappings between biomedical entities

Many biomedical resources assign their own identifiers to genes, proteins, chemicals, biological processes, and other entities of biological interest. Integrating data and knowledge across these resources relies fundamentally on mappings between equivalent entities. While some resources maintain and distribute mappings to external namespaces (e.g., HGNC provides mappings to Entrez Gene identifiers), there are systematic gaps in the availability of mappings between widely used resources. Automated approaches, including lexical and structural alignment, have been used to find missing mappings, but most neither store important metadata such as mapping confidence, nor retain important curation artifacts such as proposed mappings that were curated as (nontrivially) incorrect, nor provide interfaces for curating (reviewing, then confirming or rejecting) predicted mappings. We introduce Biomappings, an open repository that makes expert-curated mappings (currently 5,700) and predicted mappings (currently 28,000) available with their associated metadata in an intuitive tab-delimited format. It is released under the permissive CC0 license to encourage community contributions and restriction-free integration back into primary resources. Biomappings provides a web-based interface for curating predictions and adding manually constructed mappings, a Python package for interacting with the mappings, and several workflow examples for generating new mappings. We applied the Biomappings curation workflow to missing mappings between the Medical Subject Headings and several other ontologies, including the Disease Ontology, ChEBI, and the Gene Ontology. We also used Biomappings to curate an exhaustive set of proposed mappings (constructed automatically based on lexical overlap) between entries representing cancer cell lines across three previously unmapped resources: the Cancer Cell Line Encyclopedia, the Experimental Factor Ontology, and Cellosaurus.
All data and code are available at https://github.com/biopragmatics/biomappings.
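
The workflow described in the abstract (predict candidate mappings by lexical overlap, attach a confidence, let a curator confirm or reject each one, and store the result as tab-delimited text) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the Biomappings package or its actual file schema: the `Mapping` fields, the `predict_mappings`, `curate`, and `to_tsv` helpers, the prefixes `resA`/`resB`, the identifiers, the fixed confidence value, and the curator decisions are all hypothetical.

```python
import csv
import io
import re
from dataclasses import asdict, dataclass, fields, replace


@dataclass(frozen=True)
class Mapping:
    """One proposed equivalence between two entries (hypothetical schema)."""
    source_prefix: str
    source_id: str
    source_name: str
    target_prefix: str
    target_id: str
    target_name: str
    confidence: float
    status: str = "predicted"  # "predicted", "correct", or "incorrect"


def normalize(name):
    """Lowercase and drop non-alphanumerics so 'MCF-7' and 'MCF7' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def predict_mappings(source_prefix, source, target_prefix, target, confidence=0.9):
    """Propose a mapping wherever a source name and a target name normalize identically."""
    index = {}
    for target_id, target_name in target.items():
        index.setdefault(normalize(target_name), []).append((target_id, target_name))
    predictions = []
    for source_id, source_name in source.items():
        for target_id, target_name in index.get(normalize(source_name), []):
            predictions.append(Mapping(source_prefix, source_id, source_name,
                                       target_prefix, target_id, target_name,
                                       confidence))
    return predictions


def curate(mapping, accept):
    """Record a curator decision; rejected mappings are kept, not discarded."""
    return replace(mapping, status="correct" if accept else "incorrect")


def to_tsv(mappings):
    """Serialize mappings and their metadata as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(Mapping)],
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    for mapping in mappings:
        writer.writerow(asdict(mapping))
    return buffer.getvalue()


# Toy name tables; identifiers are placeholders, not real accessions.
source = {"A1": "MCF-7", "A2": "HeLa", "A3": "KM-H2"}
target = {"B1": "MCF7", "B2": "HELA", "B3": "KMH-2", "B4": "A549"}

predictions = predict_mappings("resA", source, "resB", target)

# Illustrative curator decisions: the A3/B3 pair is rejected as a lexical
# match that is treated here as a different entity.
curated = [curate(p, accept=(p.source_id != "A3")) for p in predictions]
tsv = to_tsv(curated)
print(tsv)
```

Keeping the rejected pair in the output with `status` set to `incorrect`, rather than dropping it, reflects the point made in the abstract that nontrivially incorrect proposals are themselves useful curation artifacts: a later prediction run can skip pairs that a curator has already reviewed.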

Related Results

Prediction and Curation of Missing Biomedical Identifier Mappings with Biomappings
Abstract. Motivation: Biomedical identifier resources (ontologies, taxonomies, controlled vocabularies) commonly overlap in scope...
Prediction and curation of missing biomedical identifier mappings with Biomappings
Abstract. Motivation: Biomedical identifier resources (such as ontologies, taxonomies, and controlled vocabularies) commonly overlap in scope and contain equivalent entries under diffe...
Digital Curation and Doctoral Research
This article considers digital curation in doctoral study and the role of the doctoral supervisor and institution in facilitating students’ acquisition of digital curation skills...
Big data curation framework: Curation actions and challenges
Big data curation is an emerging topic of inquiry but is still in an early phase of its adoption curve. The term big data itself is a nebulous concept, and the differences ...
Topological Mappings Based on SPG*-Closed Set
In this paper, we introduce the concept of SPG*-closed mapping and continuous mapping among which SPG-closed mappings, SPG*-closed mappings and SPG**-closed mappings and the relati...
Digital Curation and its Costs: A Study of Practices and Insights
Introduction - Regarding the concerns that prompted the emergence of digital curation, this study takes as an example the context of the production of large volumes of scientific i...
Assembly and reasoning over semantic mappings at scale for biomedical data integration
Motivation: Hundreds of resources assign identifiers to biomedical concepts including genes, small molecules, biological processes, diseases, and cell types. Often, these resources...
Extended multi-valued pseudocontractive mappings and extended Mann and Ishikawa iteration schemes for finite family of mappings
In this paper, new classes of extended multi-valued pseudocontractive mappings are introduced. It is established that the type-one subclass of the extended strictly pseudocon...