Automated annotation in UniProt

UniProt is a high-quality, comprehensive protein resource whose core activity is the expert review and annotation of proteins whose function has been experimentally investigated. The UniProt database also contains large numbers of proteins that are predicted to exist from gene models but have no associated experimental evidence indicating their function. UniProt commits significant resources to developing computational methods for the functional annotation of these predicted proteins, based on the data in entries that have gone through the expert review process. We will describe the two main automated annotation systems currently in use. The first is UniRule, an established UniProt system in which curators manually develop annotation rules. The second is ARBA (Association-Rule-Based Annotator), a multi-class learning system that uses rule-mining techniques to generate concise annotation models. ARBA employs a data exclusion algorithm that censors data not suitable for computational annotation, and it generates human-readable rules for each UniProt release. As part of our interest in engaging with the machine learning community, we will also introduce the contribution of ProtNLM (Protein Natural Language Model) from Google Research, which annotates proteins that have "uncharacterised" names. We will also introduce UniFIRE, open-source software that enables researchers to annotate their own protein datasets using the above-mentioned annotation systems. To make the tool easy to download and set up, we have containerised UniFIRE together with all its dependencies and the latest set of UniRule and ARBA rules. In this webinar, we will show how to create functional predictions for protein sequences using this container image.
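To make the idea of rule mining more concrete, the following is a minimal sketch of generic association-rule mining over annotated entries. It is not ARBA's actual algorithm and it omits the data exclusion step entirely. The attribute names (`IPR_A`, `Bacteria`, and so on), the annotations, the thresholds, and the function names `mine_rules` and `annotate` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for reviewed entries: (attribute set, annotation).
# Every value here is invented for illustration; none comes from UniProt.
REVIEWED = [
    ({"IPR_A", "Bacteria"}, "kinase"),
    ({"IPR_A", "Bacteria"}, "kinase"),
    ({"IPR_A", "Archaea"}, "kinase"),
    ({"IPR_B", "Bacteria"}, "transporter"),
    ({"IPR_B", "Bacteria"}, "transporter"),
    ({"IPR_B", "Archaea"}, "hydrolase"),
]


def mine_rules(entries, min_support=2, min_confidence=1.0, max_size=2):
    """Return (condition, annotation) pairs that meet both thresholds.

    support    = number of entries matching the condition and the annotation
    confidence = support / number of entries matching the condition
    """
    cond_counts = Counter()
    pair_counts = Counter()
    for attrs, label in entries:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(attrs), size):
                cond = frozenset(combo)
                cond_counts[cond] += 1
                pair_counts[(cond, label)] += 1
    return [
        (cond, label)
        for (cond, label), n in pair_counts.items()
        if n >= min_support and n / cond_counts[cond] >= min_confidence
    ]


def annotate(attrs, rules):
    """Return the sorted annotations of all rules whose condition is a subset of attrs."""
    return sorted({label for cond, label in rules if cond <= attrs})


RULES = mine_rules(REVIEWED)

if __name__ == "__main__":
    # Print the mined rules in a human-readable "IF ... THEN ..." form.
    for cond, label in RULES:
        print("IF", " AND ".join(sorted(cond)), "THEN", label)
    print(annotate({"IPR_B", "Bacteria"}, RULES))
```

On this toy data, a condition such as `IPR_B` alone is rejected because its entries disagree on the annotation, while `IPR_B` combined with `Bacteria` is kept. Requiring a confidence of 1.0 is a deliberately strict choice for the example; a real system would tune such thresholds against held-out reviewed data.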
