Harnessing Transformers to Generate Protein Sequences Prone to Liquid Liquid Phase Separation
Abstract
Understanding the molecular grammar that governs protein phase separation is essential for advancements in bioinformatics and protein engineering. This study leverages Generative Pre-trained Transformer (GPT)-based Protein Language Models (PLMs) to decode the complex grammar of proteins prone to liquid-liquid phase separation (LLPS). We trained three distinct GPT models on datasets comprising amino acid sequences with varying LLPS propensities: highly predisposed (LLPS+ GPT), moderate (LLPS- GPT), and resistant (PDB* GPT). As training progressed, the LLPS-prone model began to learn embeddings that were distinct from those in LLPS-resistant sequences. These models generated 18,000 protein sequences ranging from 20 to 200 amino acids, which exhibited low similarity to known sequences in the SwissProt database. Statistical analysis revealed subtle but significant differences in amino acid occurrence probabilities between sequences from LLPS-prone and LLPS-resistant models, suggesting distinct molecular grammar underlying their phase separation abilities. Notably, sequences from LLPS+ GPT showed fewer aromatic residues and a higher fraction of charge decoration. Short peptides (20-25 amino acids) generated from LLPS+ GPT underwent computational and wet-lab validation, demonstrating their ability to form phase-separated states in vitro. The generated sequences enriched the existing database and enabled the development of a robust classifier that accurately distinguishes LLPS-prone from non-LLPS sequences. This research marks a significant advancement in using computational models to explore and engineer the vast protein sequence space associated with LLPS-prone proteins.
Cold Spring Harbor Laboratory
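
The composition comparison described in the abstract (amino acid occurrence probabilities, aromatic content, charged content) could be approximated with a few lines of Python. This is only a minimal sketch under stated assumptions, not the authors' analysis: the function names are hypothetical, aromatic residues are taken to be F, W and Y, and the plain fraction of charged residues (D, E, K, R) is used as a simple stand-in because the abstract does not define its charge decoration measure. The example sequences are made up for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AROMATIC = set("FWY")                 # assumption: histidine is not counted
CHARGED = set("DEKR")                 # assumption: simple proxy for charge content


def residue_frequencies(sequences):
    """Pooled occurrence probability of each standard residue in a set of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def residue_fraction(sequence, residues):
    """Fraction of a single sequence made up of the given residue set."""
    seq = [ch for ch in sequence.upper() if ch in AMINO_ACIDS]
    if not seq:
        raise ValueError("no standard residues found")
    return sum(ch in residues for ch in seq) / len(seq)


if __name__ == "__main__":
    # Hypothetical sequences, not taken from the study.
    group_a = ["GRGDSPKEGRGDSPKEGRGD", "KEKEGSGSRDRDGSGSKEKE"]
    group_b = ["FWYAVLIFWYAVLIFWYAVL", "AVLIFMWYAVLIFMWYAVLI"]
    for name, group in (("group_a", group_a), ("group_b", group_b)):
        freqs = residue_frequencies(group)
        aromatic = sum(freqs[aa] for aa in AROMATIC)
        charged = sum(freqs[aa] for aa in CHARGED)
        print(f"{name}: aromatic={aromatic:.3f} charged={charged:.3f}")
```

Comparing the per-residue frequencies of two generated sets in this way gives only a descriptive picture; testing whether the differences are significant would need a separate statistical step that the abstract does not specify.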

