
GO2Sum: Generating Human Readable Functional Summary of Proteins from GO Terms

Abstract
Understanding the biological functions of proteins is of fundamental importance in modern biology. The Gene Ontology (GO), a controlled vocabulary, is frequently used to represent protein function because computer programs can handle it easily without having to interpret open-ended text. In particular, most current protein function prediction methods rely on GO terms. However, the long list of GO terms that describes a protein's function can be difficult for biologists to interpret. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions of UniProt entries, enabling it to recreate function descriptions from concatenated GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model, which was pre-trained on a large web-crawled corpus, in generating the Function, Subunit Structure, and Pathway paragraphs of UniProt entries.
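The abstract describes fine-tuning T5 on pairs made of concatenated GO term descriptions (input) and UniProt free-text descriptions (target). The sketch below shows one way such training pairs could be assembled. It is illustrative only. The small `GO_TERM_NAMES` table, the `"summarize: "` prefix, the `"; "` separator, the sorting of terms, and the helper names `build_source` and `make_example` are all hypothetical assumptions. They are not GO2Sum's documented input format.

```python
from dataclasses import dataclass

# Hypothetical, hand-written subset of GO term names, for illustration only.
# A real pipeline would load the full ontology (e.g. from a go.obo file).
GO_TERM_NAMES: dict[str, str] = {
    "GO:0003677": "DNA binding",
    "GO:0003700": "DNA-binding transcription factor activity",
    "GO:0005634": "nucleus",
}


@dataclass
class TrainingExample:
    source: str  # model input: concatenated GO term descriptions
    target: str  # model output: free-text function description


def build_source(go_ids: list[str], prefix: str = "summarize: ", sep: str = "; ") -> str:
    """Concatenate the names of the given GO terms into one input string.

    The prefix and separator are assumptions, not GO2Sum's documented format.
    Duplicate IDs are removed, unknown IDs are skipped, and IDs are sorted so
    the same set of terms always yields the same input string.
    """
    names = [GO_TERM_NAMES[g] for g in sorted(set(go_ids)) if g in GO_TERM_NAMES]
    return prefix + sep.join(names)


def make_example(go_ids: list[str], description: str) -> TrainingExample:
    """Pair a GO term set with a UniProt-style description as one fine-tuning example."""
    return TrainingExample(source=build_source(go_ids), target=description.strip())


if __name__ == "__main__":
    ex = make_example(
        ["GO:0005634", "GO:0003677", "GO:0003700"],
        "Transcription factor that binds DNA in the nucleus.",  # made-up target text
    )
    print(ex.source)
    # summarize: DNA binding; DNA-binding transcription factor activity; nucleus
```

The GO terms assigned to a protein form an unordered set, so the sketch sorts the IDs to make the input string deterministic. The abstract does not say whether GO2Sum orders terms in this way. Pairs like these could then be passed to a standard sequence-to-sequence fine-tuning loop for T5, for example with the Hugging Face `transformers` library. That step is omitted here.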

Related Results

GO2Sum: generating human-readable functional summary of proteins from GO terms
Abstract Understanding the biological functions of proteins is of fundamental importance in modern biology. To represent a function of proteins, Gene Ontology (GO), a controlled voc...
Do evidence summaries increase health policy‐makers' use of evidence from systematic reviews? A systematic review
This review summarizes the evidence from six randomized controlled trials that judged the effectiveness of systematic review summaries on policymakers' decision making, or the most...
Identification of heparin‐binding proteins in bovine seminal plasma
Abstract A group of four similar proteins, BSP‐A1, BSP‐A2, BSP‐A3, and BSP‐30‐kDa, represent the major acidic proteins found in bovine seminal plasma (BSP). These proteins are secre...
Deciphering the immunogenic potential of wheat flour: a reference map of the salt-soluble proteome from the U.S. wheat Butte 86
Abstract Background Within the complex wheat flour proteome, the gluten proteins have attracted most of the attention because of their importance in determining the functional prop...
Comprehensive host-pathogen protein-protein interaction network analysis
Abstract Background Infectious diseases are a cruel assassin with millions of victims around the world each year. Understanding infectious mechanism...
SCIENTIFIC AND THEORETICAL FOUNDATIONS OF HUMAN-GEOGRAPHICAL TERMINOLOGY CREATION
The importance of human-geographical terminology-knowledge as an area of the theory of geographical science is emphasized in this article. Human-geographical terminology-knowledge highlig...
Cover Picture: Proteomics 13'09
Abstract Human Growth Hormone: Variants vary in sugar coating. In the last issue we commented on the age and variants of the heat shock proteins, in this issue we take a brief look at...
