BioPaxCOMP: an efficient system for integrating, compressing, and querying BioPAX
Abstract truncated at 3,000 characters - the full version is available in the pdf file.
Biological networks and, in particular, biological pathways are composed of thousands of nodes and edges, posing several challenges for analysis and storage. The primary format used to represent pathway data is BioPAX (http://biopax.org). BioPAX is a standard language that aims to enable the integration, exchange, visualization and analysis of biological pathway data. It is an open, collaborative effort by a community of researchers, software developers, and institutions, and it specifically supports data exchange between pathway data groups. BioPAX is defined in OWL and is represented in the RDF/XML format. OWL (Web Ontology Language) is a W3C standard designed for use by applications that need to process the content of information instead of just presenting information to humans. RDF is a standard model for data interchange on the Web. Although OWL allows a standard representation of pathways, it is based on XML and is therefore a verbose and redundant language, so stored pathways can be very large, which prevents efficient transmission and sharing of the data. The typical size of a pathway data set depends on the organism; for example, the Homo sapiens pathways from the Reactome database occupy close to 200 MB on disk. Moreover, integrating pathway data from different data sources may require gigabytes of space. A second problem concerns integrating information from different data sources so that up-to-date information is available in one centralized place. Several pathway databases exist, each emphasizing different aspects of the same pathway, so it can be useful to integrate and annotate pathways from different databases to obtain centralized and more informative pathway data.
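To illustrate the redundancy of RDF/XML noted above, the following minimal Python sketch builds a small synthetic document and compresses it with the general-purpose zlib (DEFLATE) library. This is not the BioPaxCOMP compression method, which the abstract does not describe. The XML fragment is a hand-written, hypothetical imitation of BioPAX-style markup, and the helper names `build_document` and `compression_ratio` are assumptions made for this example.

```python
import zlib

# Hypothetical, hand-written fragment in the style of BioPAX RDF/XML.
# It is NOT taken from Reactome or any other real database.
TEMPLATE = (
    '<bp:Protein rdf:ID="Protein{i}">\n'
    '  <bp:displayName rdf:datatype="http://www.w3.org/2001/XMLSchema#string">'
    'protein {i}</bp:displayName>\n'
    '  <bp:entityReference rdf:resource="#ProteinReference{i}"/>\n'
    '</bp:Protein>\n'
)


def build_document(n_entities: int) -> bytes:
    """Repeat the template n_entities times to mimic a verbose pathway file."""
    body = "".join(TEMPLATE.format(i=i) for i in range(n_entities))
    return ("<rdf:RDF>\n" + body + "</rdf:RDF>\n").encode("utf-8")


def compression_ratio(data: bytes) -> float:
    """Raw size divided by compressed size, using zlib at its highest level."""
    return len(data) / len(zlib.compress(data, 9))


doc = build_document(1000)

# Compression is lossless: decompressing returns the original bytes.
restored = zlib.decompress(zlib.compress(doc, 9))
assert restored == doc

print(f"{len(doc)} bytes raw, compression ratio {compression_ratio(doc):.1f}x")
```

Because the same tag names and datatype URIs recur in every element, even a generic compressor shrinks such a file considerably. A format-aware scheme could in principle exploit the known structure further, but that is outside what this sketch shows.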
The principal obstacle to integrating, storing and exchanging such data is the extreme growth in size when several pathway data sets are merged, which poses challenges for both computation and archiving. Pathway data can readily be classified as big data because they meet all five Vs (Volume, Velocity, Variety, Veracity, Value) that characterize it; hence the need to integrate and compress pathway data efficiently. The methodology for pathway data integration is based on the following steps: i) local aggregation and validation of data from several pathway databases, ii) identification and normalization of compound and reaction identifiers, and iii) integration. Integration occurs at the level of physical entities, such as proteins and small molecules. It is accomplished by linking interaction and pathway records that use the same physical entities (for example, UniProt entries for proteins) and by adding annotation data from UniProt or GeneOntology.
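The normalization and linking steps described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the record layout, the source names `db_a` and `db_b`, the pathway names, and the functions `normalize` and `link_by_entity` are all hypothetical, and the identifiers serve only as sample values.

```python
from collections import defaultdict

# Hypothetical records from two pathway sources. Field names, source names
# and pathway names are illustrative assumptions, not real database content.
records = [
    {"source": "db_a", "pathway": "Apoptosis",
     "entities": ["uniprot:P04637", "UniProt:Q07812"]},
    {"source": "db_b", "pathway": "p53 signaling",
     "entities": ["UNIPROT:p04637 "]},
    {"source": "db_b", "pathway": "Glycolysis",
     "entities": ["chebi:15422"]},
]


def normalize(identifier: str) -> str:
    """Step ii: reduce spelling variants of an identifier to one canonical key.

    The rule used here (lowercase prefix, uppercase accession, trimmed
    whitespace) is an assumption made for this sketch.
    """
    prefix, _, accession = identifier.strip().partition(":")
    return f"{prefix.lower()}:{accession.upper()}"


def link_by_entity(recs):
    """Step iii: group (source, pathway) pairs by shared physical entity."""
    index = defaultdict(set)
    for rec in recs:
        for ident in rec["entities"]:
            index[normalize(ident)].add((rec["source"], rec["pathway"]))
    # Keep only entities that appear in more than one record.
    return {key: sorted(pairs) for key, pairs in index.items() if len(pairs) > 1}


links = link_by_entity(records)
for entity, pathways in links.items():
    print(entity, "->", pathways)
```

Here the two spellings of the same UniProt accession collapse to a single key, so the records from `db_a` and `db_b` that mention it become linked, while entities seen in only one record are left out of the result.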

