Transferable deep generative modeling of intrinsically disordered protein conformations
ABSTRACT
Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent from the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.
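To make the two-stage idea in the abstract concrete, the following is a minimal sketch of sampling with a latent diffusion model: noise is iteratively denoised in the encoded space and the result is passed to a decoder. It is not the idpSAM implementation. The noise schedule, the DDPM-style update rule, and the names `toy_noise_predictor`, `toy_decoder`, and `target_latent` are all assumptions made for illustration; in particular, the transformer networks are replaced by placeholders, and the "training data" is a single made-up latent so that the result can be checked without training.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns per-step betas and cumulative alpha products."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def toy_noise_predictor(z_t, t, alpha_bars, target):
    """Placeholder for a trained, sequence-conditioned denoising network.

    Returns the exact noise for a "dataset" holding the single latent
    `target` (an assumption that keeps the example checkable).
    """
    a = alpha_bars[t]
    return [(z - math.sqrt(a) * m) / math.sqrt(1.0 - a) for z, m in zip(z_t, target)]

def sample_latent(target, num_steps=50, seed=0):
    """DDPM-style ancestral sampling in the latent space."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    z = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(z, t, alpha_bars, target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(zi - coef * ei) / math.sqrt(1.0 - betas[t]) for zi, ei in zip(z, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        z = [mi + sigma * rng.gauss(0.0, 1.0) for mi in mean]
    return z

def toy_decoder(latent, dim=3):
    """Placeholder decoder: regroups the flat latent into per-residue 3D points."""
    return [tuple(latent[i:i + dim]) for i in range(0, len(latent), dim)]

# Hypothetical encoding of a two-residue conformation (illustrative values only).
target_latent = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
coords = toy_decoder(sample_latent(target_latent))
```

Because the placeholder predictor knows the exact noise for its single-point dataset, the sampler recovers `target_latent` up to floating-point error. A trained network would instead approximate the noise for a whole distribution of encoded conformations, so repeated sampling with different seeds would yield an ensemble rather than one structure.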
AUTHOR SUMMARY
Proteins are essential molecules in living organisms, and some of them have highly dynamic structures, which makes understanding their biological roles challenging. Disordered proteins can be studied through a combination of computer simulations and experiments. Computer simulations are often resource-intensive. Recently, machine learning has been used to make this process more efficient. The strategy is to learn from previous simulations to model the heterogeneous conformations of proteins. However, such methods still suffer from poor transferability, meaning that they tend to make incorrect predictions on proteins not seen in training data. In this study, we present idpSAM, a method based on generative artificial intelligence for modeling the structures of disordered proteins. The model was trained using a vast dataset and, thanks to its architecture and training procedure, it not only performs well on proteins in the training set but also achieves high levels of transferability to proteins unseen in training. This advancement is a step forward in modeling biologically relevant disordered proteins. It shows how the combination of generative modeling and large training sets can help us understand how dynamic proteins behave.

