GO2Sum: generating human-readable functional summary of proteins from GO terms
Abstract
Understanding the biological functions of proteins is of fundamental importance in modern biology. Protein function is frequently represented with the Gene Ontology (GO), a controlled vocabulary, because computer programs can handle it easily without open-ended text interpretation. In particular, the majority of current protein function prediction methods rely on GO terms. However, the extensive list of GO terms that describes a protein's function can be difficult for biologists to interpret. In response to this issue, we developed GO2Sum (Gene Ontology terms Summarizer), a model that takes a set of GO terms as input and generates a human-readable summary using the T5 large language model. GO2Sum was developed by fine-tuning T5 on GO term assignments and free-text function descriptions for UniProt entries, enabling it to recreate function descriptions from concatenated GO term descriptions. Our results demonstrated that GO2Sum significantly outperforms the original T5 model, which was trained on a large web corpus, in generating Function, Subunit Structure, and Pathway paragraphs for UniProt entries.
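The input side of this approach, turning a set of GO terms into one text string by concatenating their descriptions, can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `build_model_input`, the small `GO_DESCRIPTIONS` table (GO term names used as stand-ins for full descriptions), the sentence-style joining, and the `summarize: ` prefix are all assumptions made for illustration. The abstract does not specify the exact input format.

```python
from typing import Dict, Iterable

# Illustrative lookup table: GO term names stand in for full GO descriptions.
# A real pipeline would read descriptions from the Gene Ontology itself.
GO_DESCRIPTIONS: Dict[str, str] = {
    "GO:0005524": "ATP binding",
    "GO:0004672": "protein kinase activity",
    "GO:0006468": "protein phosphorylation",
}


def build_model_input(
    go_ids: Iterable[str],
    descriptions: Dict[str, str] = GO_DESCRIPTIONS,
    prefix: str = "summarize: ",  # assumed task prefix, not taken from the paper
) -> str:
    """Concatenate GO term descriptions into one string for a text-to-text model."""
    unique_ids = sorted(set(go_ids))  # deduplicate and fix the order
    if not unique_ids:
        raise ValueError("at least one GO term is required")
    # A KeyError here means a GO ID is missing from the lookup table.
    parts = [descriptions[go_id].strip().rstrip(".") for go_id in unique_ids]
    return prefix + ". ".join(parts) + "."


if __name__ == "__main__":
    print(build_model_input(["GO:0006468", "GO:0005524", "GO:0005524"]))
```

The resulting string would then be passed to a fine-tuned T5 checkpoint to generate the summary paragraph; that step is omitted here because it depends on model weights not described in this text.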
Springer Science and Business Media LLC