Transferable deep generative modeling of intrinsically disordered protein conformations

ABSTRACT Intrinsically disordered proteins have dynamic structures through which they play key biological roles. The elucidation of their conformational ensembles is a challenging problem requiring an integrated use of computational and experimental methods. Molecular simulations are a valuable computational strategy for constructing structural ensembles of disordered proteins but are highly resource-intensive. Recently, machine learning approaches based on deep generative models that learn from simulation data have emerged as an efficient alternative for generating structural ensembles. However, such methods currently suffer from limited transferability when modeling sequences and conformations absent in the training data. Here, we develop a novel generative model that achieves high levels of transferability for intrinsically disordered protein ensembles. The approach, named idpSAM, is a latent diffusion model based on transformer neural networks. It combines an autoencoder to learn a representation of protein geometry and a diffusion model to sample novel conformations in the encoded space. IdpSAM was trained on a large dataset of simulations of disordered protein regions performed with the ABSINTH implicit solvent model. Thanks to the expressiveness of its neural networks and its training stability, idpSAM faithfully captures 3D structural ensembles of test sequences with no similarity to sequences in the training set. Our study also demonstrates the potential for generating full conformational ensembles from datasets with limited sampling and underscores the importance of training set size for generalization. We believe that idpSAM represents significant progress in transferable protein ensemble modeling through machine learning.

AUTHOR SUMMARY Proteins are essential molecules in living organisms, and some of them have highly dynamic structures, which makes understanding their biological roles challenging. Disordered proteins can be studied through a combination of computer simulations and experiments. Computer simulations are often resource-intensive. Recently, machine learning has been used to make this process more efficient. The strategy is to learn from previous simulations to model the heterogeneous conformations of proteins. However, such methods still suffer from poor transferability, meaning that they tend to make incorrect predictions on proteins not seen in training data. In this study, we present idpSAM, a method based on generative artificial intelligence for modeling the structures of disordered proteins. The model was trained using a vast dataset and, thanks to its architecture and training procedure, it not only performs well on proteins in the training set but also achieves high levels of transferability to proteins unseen in training. This advancement is a step forward in modeling biologically relevant disordered proteins. It shows how the combination of generative modeling and large training sets can help us understand how dynamic proteins behave.
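
The abstract describes idpSAM as a latent diffusion model: a diffusion model samples in an encoded space, and a decoder maps the samples back to protein geometry. The sketch below illustrates only that general two-stage sampling idea and is not the authors' implementation. Every name in it (`make_schedule`, `toy_noise_predictor`, `toy_decoder`, `generate_ensemble`) is hypothetical, the linear noise schedule is an assumed, commonly used choice, the "network" is an analytic stand-in that is exact only if the latent data are standard normal, and the decoder is a random linear map. Sequence conditioning and the transformer networks are omitted entirely.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM-style noise schedule (an assumption, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(z_t, t, alpha_bars):
    """Hypothetical stand-in for a trained denoising network.

    If clean latents were distributed as N(0, I), the optimal noise
    estimate given z_t would be sqrt(1 - alpha_bar_t) * z_t.
    """
    return np.sqrt(1.0 - alpha_bars[t]) * z_t

def sample_latents(n_samples, n_residues, n_channels, rng, num_steps=50):
    """Ancestral (DDPM-style) sampling in the latent space."""
    betas, alphas, alpha_bars = make_schedule(num_steps)
    # Start from pure Gaussian noise, one latent vector per residue.
    z = rng.standard_normal((n_samples, n_residues, n_channels))
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_noise_predictor(z, t, alpha_bars)
        # Posterior mean of the previous, less noisy latent.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No fresh noise is added at the final step.
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return z

def toy_decoder(z, weights):
    """Hypothetical decoder: a linear map from latent channels to 3D coordinates."""
    return z @ weights  # (n, L, c) @ (c, 3) -> (n, L, 3)

def generate_ensemble(n_samples=8, n_residues=20, n_channels=16, seed=0):
    """Sample latents by reverse diffusion, then decode them to coordinates."""
    rng = np.random.default_rng(seed)
    weights = rng.standard_normal((n_channels, 3))  # untrained placeholder weights
    z = sample_latents(n_samples, n_residues, n_channels, rng)
    return toy_decoder(z, weights)

if __name__ == "__main__":
    ensemble = generate_ensemble()
    print(ensemble.shape)  # one (n_residues, 3) coordinate array per sampled conformation
```

In a real latent diffusion model, both the noise predictor and the decoder would be trained networks, and the sampler would be conditioned on the amino acid sequence. The coordinates produced by this toy version have no physical meaning.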

Related Results

Molecular dynamics studies of intrinsically disordered peptides and proteins
A tremendous amount of evidence has accumulated in regards to the importance of intrinsically disordered proteins (IDPs) in the functioning of the cell and their role in human dise...
Local conformations in ordered and Intrinsically disordered proteins
Protein structures are highly dynamic macromolecules. This dynamics is often analysed with a limited number of proteins. In our study, molecular dynamics (MDs) simulations were per...
Local conformations analyses in ordered and intrinsically disordered proteins
Protein structures are highly dynamic macromolecules. This dynamics is often analysed with a limited number of proteins. In our study, molecular dynamics (MDs) simulations were per...
Structure modeling of disordered protein interactions
Disordered protein-protein interactions (PPIs), those involving a folded protein and an intrinsically disordered protein (IDP), are prevalent in the cell, including important signa...
Endothelial Protein C Receptor
Introduction: The protein C anticoagulant pathway plays a critical role in the negative regulation of the blood clotting response. The pathway is triggered by thrombin, which allows ...
AlphaFold2 modeling and molecular dynamics simulations of an intrinsically disordered protein
Abstract: We use AlphaFold2 (AF2) to model the monomer and dimer structures of an intrinsically disordered protein (IDP), Nvjp-1, assisted by molecular dynamics (MD) simulations. We o...
Experimental and Computational Characterization of Disordered States of Proteins
Disordered states of proteins include (i) the unfolded states of folded proteins and (ii) the biologically functional intrinsically disordered proteins. Due to the highly dynamic a...