FAMUS: A Few-Shot Learning Framework for Large-Scale Protein Annotation
Predicting gene function is a pivotal and challenging step in genomic and metagenomic data analysis. Current automatic annotation tools typically rely on the single most similar sequence from the query database. Because few sequences are available per annotation, it is difficult to confidently assign function to underrepresented genes. Here, we present a contrastive learning framework for functional annotation. FAMUS (Functional Annotation Method Using Supervised contrastive learning) compares query sequences to profile Hidden Markov Model databases and transforms the similarity scores into a condensed vector space that minimizes the distance between proteins from the same family. A query is represented by its similarity scores to all profiles rather than by its top-ranking hit alone. In a protein family assignment task, FAMUS outperformed KEGG's native KofamScan for KEGG Orthology annotation and InterPro's InterProScan for PANTHER family annotation. We thus created four protein annotation models using protein families from the KEGG Orthology, InterPro family, OrthoDB, and EggNOG databases. All four models are available as a conda package and via our user-friendly web server, allowing users to annotate large-scale datasets. FAMUS is the first comprehensive and modular annotation framework based on contrastive learning. It supports both pre-defined and user-specific databases for tailored annotation, and can be easily integrated into any genomic and metagenomic analysis pipeline to facilitate accurate, large-scale functional annotation.
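The core idea in the abstract, representing each protein by its scores against every profile and learning a space where same-family proteins sit close together, can be illustrated with a small sketch. This is not the FAMUS implementation: the toy score matrix, the family labels, the temperature value, and the function names are all assumptions made for illustration, and the raw score vectors stand in for the output of a trained encoder. The loss shown is the standard supervised contrastive loss, which may differ from the objective the authors actually use.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Standard supervised contrastive loss over one batch.

    embeddings: (n, d) array, one row per protein. In this sketch the rows are
                raw profile-score vectors; a real model would pass the scores
                through a trained encoder first (assumption).
    labels:     length-n sequence of family identifiers.
    """
    z = l2_normalize(np.asarray(embeddings, dtype=float))
    labels = np.asarray(labels)
    n = z.shape[0]

    sim = (z @ z.T) / temperature
    not_self = ~np.eye(n, dtype=bool)

    # Log of the softmax denominator over all other samples (self excluded),
    # computed with the usual max-shift for numerical stability.
    masked = np.where(not_self, sim, -np.inf)
    row_max = masked.max(axis=1, keepdims=True)
    log_denom = row_max + np.log(np.exp(masked - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom

    # Positives: other samples in the batch that share the anchor's family.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_counts = positives.sum(axis=1)
    has_pos = pos_counts > 0
    if not has_pos.any():
        raise ValueError("batch needs at least two samples from the same family")

    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    mean_pos_log_prob = pos_log_prob[has_pos] / pos_counts[has_pos]
    return float(-mean_pos_log_prob.mean())


# Toy data (made up): rows are query proteins, columns are similarity scores
# against three hypothetical profile HMMs. The full row is the representation,
# not just the best-scoring column.
scores = np.array([
    [120.0, 15.0, 2.0],
    [110.0, 20.0, 5.0],
    [10.0, 95.0, 3.0],
    [8.0, 105.0, 1.0],
])
families = ["famA", "famA", "famB", "famB"]

print(supervised_contrastive_loss(scores, families))
```

The loss is low when proteins with the same label already have similar score profiles and high when they do not, which is the signal a trained encoder would use to pull family members together in the embedding space.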