Javascript must be enabled to continue!
Predicting enzyme substrate chemical structure with protein language models
View through CrossRef
AbstractThe number of unannotated or orphan enzymes vastly outnumber those for which the chemical structure of the substrates are known. While a number of enzyme function prediction algorithms exist, these often predict Enzyme Commission (EC) numbers or enzyme family, which limits their ability to generate experimentally testable hypotheses. Here, we harness protein language models, cheminformatics, and machine learning classification techniques to accelerate the annotation of orphan enzymes by predicting their substrate’s chemical structural class. We use the orphan enzymes ofMycobacterium tuberculosisas a case study, focusing on two protein families that are highly abundant in its proteome: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine (SAM)-dependent methyltransferases. Training machine learning classification models that take as input the protein sequence embeddings obtained from a pre-trained, self-supervised protein language model results in excellent accuracy for a wide variety of prediction tasks. These include redox cofactor preference for SDRs; small-molecule vs. polymer (i.e. protein, DNA or RNA) substrate preference for SAM-dependent methyltransferases; as well as more detailed chemical structural predictions for the preferred substrates of both enzyme families. We then use these trained classifiers to generate predictions for the full set of unannotated SDRs and SAM-methyltransferases in the proteomes ofM. tuberculosisand other mycobacteria, generating a set of biochemically testable hypotheses. Our approach can be extended and generalized to other enzyme families and organisms, and we envision it will help accelerate the annotation of a large number of orphan enzymes.Graphical abstract
Cold Spring Harbor Laboratory
Title: Predicting enzyme substrate chemical structure with protein language models
Description:
AbstractThe number of unannotated or orphan enzymes vastly outnumber those for which the chemical structure of the substrates are known.
While a number of enzyme function prediction algorithms exist, these often predict Enzyme Commission (EC) numbers or enzyme family, which limits their ability to generate experimentally testable hypotheses.
Here, we harness protein language models, cheminformatics, and machine learning classification techniques to accelerate the annotation of orphan enzymes by predicting their substrate’s chemical structural class.
We use the orphan enzymes ofMycobacterium tuberculosisas a case study, focusing on two protein families that are highly abundant in its proteome: the short-chain dehydrogenase/reductases (SDRs) and the S-adenosylmethionine (SAM)-dependent methyltransferases.
Training machine learning classification models that take as input the protein sequence embeddings obtained from a pre-trained, self-supervised protein language model results in excellent accuracy for a wide variety of prediction tasks.
These include redox cofactor preference for SDRs; small-molecule vs.
polymer (i.
e.
protein, DNA or RNA) substrate preference for SAM-dependent methyltransferases; as well as more detailed chemical structural predictions for the preferred substrates of both enzyme families.
We then use these trained classifiers to generate predictions for the full set of unannotated SDRs and SAM-methyltransferases in the proteomes ofM.
tuberculosisand other mycobacteria, generating a set of biochemically testable hypotheses.
Our approach can be extended and generalized to other enzyme families and organisms, and we envision it will help accelerate the annotation of a large number of orphan enzymes.
Graphical abstract.
Related Results
Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas
Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas
<p><em><span style="font-size: 11.0pt; font-family: 'Times New Roman',serif; mso-fareast-font-family: 'Times New Roman'; mso-ansi-language: EN-US; mso-fareast-langua...
Endothelial Protein C Receptor
Endothelial Protein C Receptor
IntroductionThe protein C anticoagulant pathway plays a critical role in the negative regulation of the blood clotting response. The pathway is triggered by thrombin, which allows ...
On the relationships between the Michaelis–Menten kinetics, reverse Michaelis–Menten kinetics, equilibrium chemistry approximation kinetics, and quadratic kinetics
On the relationships between the Michaelis–Menten kinetics, reverse Michaelis–Menten kinetics, equilibrium chemistry approximation kinetics, and quadratic kinetics
Abstract. The Michaelis–Menten kinetics and the reverse Michaelis–Menten kinetics are two popular mathematical formulations used in many land biogeochemical models to describe how ...
On the relationships between Michaelis–Menten kinetics, reverse Michaelis–Menten kinetics, Equilibrium Chemistry Approximation kinetics and quadratic kinetics
On the relationships between Michaelis–Menten kinetics, reverse Michaelis–Menten kinetics, Equilibrium Chemistry Approximation kinetics and quadratic kinetics
Abstract. The Michaelis–Menten kinetics and the reverse Michaelis–Menten kinetics are two popular mathematical formulations used in many land biogeochemical models to describe how ...
A Wideband mm-Wave Printed Dipole Antenna for 5G Applications
A Wideband mm-Wave Printed Dipole Antenna for 5G Applications
<span lang="EN-MY">In this paper, a wideband millimeter-wave (mm-Wave) printed dipole antenna is proposed to be used for fifth generation (5G) communications. The single elem...
Rodnoosjetljiv jezik na primjeru njemačkih časopisa Brigitte i Der Spiegel
Rodnoosjetljiv jezik na primjeru njemačkih časopisa Brigitte i Der Spiegel
On the basis of the comparative analysis of texts of the German biweekly magazine Brigitte and the weekly magazine Der Spiegel and under the presumption that gender-sensitive langu...
Aviation English - A global perspective: analysis, teaching, assessment
Aviation English - A global perspective: analysis, teaching, assessment
This e-book brings together 13 chapters written by aviation English researchers and practitioners settled in six different countries, representing institutions and universities fro...
Decoding Substrate Specificity in a Promiscuous Biocatalyst by Enzyme Proximity Sequencing
Decoding Substrate Specificity in a Promiscuous Biocatalyst by Enzyme Proximity Sequencing
Abstract
Substrate specificity is a defining feature of enzyme function, but its molecular underpinnings remain difficult to decode and engineer. Here, we leveraged enz...

