Ensemble-conditioned protein sequence design with Caliby
Structure-conditioned sequence design models aim to design a protein sequence that will fold into a given target structure. Deep-learning-based approaches for sequence design have proven highly successful for various protein design applications, but many non-idealized backbones remain out of reach for current models under typical in silico success criteria. We hypothesize that training objectives prioritizing native sequence recovery unintentionally push models to reproduce non-structural signals (e.g. phylogenetic relatedness, neutral drift, or dataset sampling biases) rather than a broadly generalizable structure-sequence mapping. Inspired by recent work bridging sequence likelihood and fitness prediction in protein language models, we introduce Caliby, a Potts-model-based sequence design method capable of conditioning on an ensemble of structures. Conditioning on a synthetic ensemble generated from an input backbone allows sampling of sequences consistent with the structural constraints of the ensemble while averaging out undesired biases toward the native sequence. Ensemble-conditioned sequence design with Caliby reduces native sequence recovery while substantially improving AlphaFold2 self-consistency, outperforming the state-of-the-art models ProteinMPNN and ChromaDesign on both native and de novo backbones. Finally, we train a variant of Caliby only on soluble proteins and demonstrate in silico that Protpardelle-1c binder designs previously deemed undesignable by SolubleMPNN are in fact designable under SolubleCaliby, highlighting limitations of existing filtering pipelines. These results suggest that Caliby can expand the de novo design space beyond highly idealized backbones.
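The abstract does not specify how Caliby parameterizes its Potts model or how it combines conformers. Purely as a hedged illustration of the general idea, the sketch below assumes a Potts energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), assumes ensemble conditioning is done by averaging per-conformer fields h and couplings J (a hypothetical choice, not necessarily Caliby's), draws a sequence by Gibbs sampling, and scores native sequence recovery. All function names (`potts_energy`, `average_potts`, `gibbs_sample`, `sequence_recovery`, `toy_params`) and the toy parameters are invented for illustration; in a real design model, h and J would be predicted from each structure by a trained network.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 canonical amino acids
Q = len(ALPHABET)


def pair_term(J, i, a, j, b):
    """Coupling J_ij(a, b); each pair is stored once under the key (min, max)."""
    return J[(i, j)][a][b] if i < j else J[(j, i)][b][a]


def potts_energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); lower energy = more favorable."""
    idx = [ALPHABET.index(c) for c in seq]
    energy = -sum(h[i][a] for i, a in enumerate(idx))
    energy -= sum(J[(i, j)][idx[i]][idx[j]] for (i, j) in J)
    return energy


def average_potts(params):
    """Hypothetical ensemble conditioning: elementwise mean of per-conformer (h, J)."""
    n, L = len(params), len(params[0][0])
    h = [[sum(p[0][i][a] for p in params) / n for a in range(Q)] for i in range(L)]
    J = {
        key: [[sum(p[1][key][a][b] for p in params) / n for b in range(Q)] for a in range(Q)]
        for key in params[0][1]
    }
    return h, J


def gibbs_sample(h, J, n_sweeps=50, temperature=0.1, seed=0):
    """Draw one sequence from p(s) proportional to exp(-E(s) / temperature)."""
    rng = random.Random(seed)
    L = len(h)
    partners = {i: [] for i in range(L)}
    for (i, j) in J:
        partners[i].append(j)
        partners[j].append(i)
    idx = [rng.randrange(Q) for _ in range(L)]  # random initial sequence
    for _ in range(n_sweeps):
        for i in range(L):
            # Conditional logits at site i: h_i(a) + sum_j J_ij(a, s_j), scaled by 1/T.
            logits = [
                (h[i][a] + sum(pair_term(J, i, a, j, idx[j]) for j in partners[i])) / temperature
                for a in range(Q)
            ]
            top = max(logits)  # subtract the max for numerical stability
            weights = [math.exp(x - top) for x in logits]
            idx[i] = rng.choices(range(Q), weights=weights)[0]
    return "".join(ALPHABET[a] for a in idx)


def sequence_recovery(designed, native):
    """Fraction of positions where the designed residue matches the native one."""
    return sum(a == b for a, b in zip(designed, native)) / len(native)


def toy_params(preferred, strength):
    """Hypothetical per-conformer parameters: a field favoring `preferred`, zero couplings."""
    h = [[0.0] * Q for _ in preferred]
    for i, c in enumerate(preferred):
        h[i][ALPHABET.index(c)] = strength
    L = len(preferred)
    J = {(i, j): [[0.0] * Q for _ in range(Q)] for i in range(L) for j in range(i + 1, L)}
    return h, J


# Two toy conformers agree at sites 1-3 but disagree at site 0 (M vs. A).
native = "MKLV"
conformers = [toy_params("MKLV", 5.0), toy_params("AKLV", 10.0)]
h, J = average_potts(conformers)
design = gibbs_sample(h, J)
print(design, sequence_recovery(design, native))
```

Under this averaging assumption, residues favored by every conformer keep a strong field, while a site where conformers disagree gets a blended preference; in this toy case the averaged field at site 0 favors A over the native M, so sampled sequences tend to recover less of the native sequence. This only loosely mirrors the trade-off the abstract describes and is not a reproduction of Caliby's method.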
Cold Spring Harbor Laboratory

