Ensemble-conditioned protein sequence design with Caliby
Structure-conditioned sequence design models aim to design a protein sequence that will fold into a given target structure. Deep-learning-based approaches for sequence design have proven highly successful for various protein design applications, but many non-idealized backbones still remain out of reach for current models under typical in silico success criteria. We hypothesize that training objectives prioritizing native sequence recovery unintentionally push models to reproduce non-structural signals (e.g. phylogenetic relatedness, neutral drift, or dataset sampling biases), rather than a broadly generalizable structure-sequence mapping. Inspired by recent work bridging sequence likelihood and fitness prediction in protein language models, we introduce Caliby, a Potts model-based sequence design method capable of conditioning on an ensemble of structures. Conditioning on a synthetic ensemble generated from an input backbone allows sampling of sequences consistent with the structural constraints of the ensemble while averaging out undesired biases towards the native sequence. Ensemble-conditioned sequence design with Caliby reduces native sequence recovery while substantially improving AlphaFold2 self-consistency, outperforming state-of-the-art models ProteinMPNN and ChromaDesign on both native and de novo backbones. Finally, we train a variant of Caliby on only soluble proteins and demonstrate in silico that Protpardelle-1c binder designs that were previously deemed undesignable by SolubleMPNN are actually designable under SolubleCaliby, highlighting limitations of existing filtering pipelines. These results suggest that Caliby can expand the de novo design space beyond highly idealized backbones.
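To make the idea of conditioning a Potts model on an ensemble concrete, the following is a minimal sketch and not Caliby's implementation. It assumes each structure in the ensemble has already yielded its own Potts parameters (site fields `h` and pairwise couplings `J`), and that these are combined by a plain arithmetic mean; the abstract does not state how Caliby actually aggregates them. The function names `potts_energy`, `average_potts`, and `gibbs_sample` are hypothetical, and the Gibbs sampler is a generic stand-in for whatever sampling procedure the method uses.

```python
import math
import random


def potts_energy(seq, h, J):
    """Energy of an integer-encoded sequence under a Potts model.

    h[i][a] is the field for state a at site i;
    J[(i, j)][a][b] is the coupling between sites i < j.
    """
    energy = sum(h[i][a] for i, a in enumerate(seq))
    energy += sum(Jij[seq[i]][seq[j]] for (i, j), Jij in J.items())
    return energy


def average_potts(models):
    """Combine per-structure (h, J) pairs into one model.

    Assumption: a plain mean over the ensemble, with every model
    sharing the same sites, states, and coupling keys.
    """
    n = len(models)
    h0, J0 = models[0]
    h = [[sum(m[0][i][a] for m in models) / n for a in range(len(h0[i]))]
         for i in range(len(h0))]
    J = {key: [[sum(m[1][key][a][b] for m in models) / n
                for b in range(len(J0[key][a]))]
               for a in range(len(J0[key]))]
         for key in J0}
    return h, J


def gibbs_sample(h, J, n_sweeps=50, temperature=1.0, seed=0):
    """Draw one sequence by Gibbs sampling (lower energy = more probable)."""
    rng = random.Random(seed)
    length, n_states = len(h), len(h[0])
    seq = [rng.randrange(n_states) for _ in range(length)]
    for _ in range(n_sweeps):
        for i in range(length):
            # Energy of the sequence for each candidate state at site i.
            energies = [potts_energy(seq[:i] + [a] + seq[i + 1:], h, J)
                        for a in range(n_states)]
            lowest = min(energies)  # subtract the minimum for numerical stability
            weights = [math.exp(-(e - lowest) / temperature) for e in energies]
            seq[i] = rng.choices(range(n_states), weights=weights)[0]
    return seq
```

Because the Potts energy is linear in its parameters, the energy of a sequence under the averaged model equals the mean of its energies under the individual models. Under this assumed aggregation, a preference that appears in only one ensemble member is diluted, while preferences shared across the ensemble persist.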
Cold Spring Harbor Laboratory

