Machine Learning Differentiates Enzymatic and Non-Enzymatic Metals in Proteins
Abstract
Metalloenzymes account for 40% of all enzymes and can perform all seven classes of enzyme reactions. Because the active sites of metalloenzymes are physicochemically similar to inactive metal binding sites, it is challenging to differentiate between them. Yet distinguishing these two classes is critical for the identification of both native and designed enzymes. Given these similarities, physicochemical features that do distinguish active from inactive metal sites can indicate aspects that are critical to enzyme function. In this work, we develop the largest structural dataset of enzymatic and non-enzymatic metalloprotein sites to date. We then use a decision-tree ensemble machine learning model to classify metals bound to proteins as enzymatic or non-enzymatic with 92.2% precision and 90.1% recall. Our model scores electrostatic and pocket lining features as more important than pocket volume, despite the fact that volume is the most quantitatively different feature between enzymatic and non-enzymatic sites. Finally, we find our model has overall better performance in a side-by-side comparison against other methods that differentiate enzymatic from non-enzymatic sequences. We anticipate that our model’s ability to correctly identify which metal sites are responsible for enzymatic activity could enable identification of new enzymatic mechanisms and de novo enzyme design.
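To make the reported metrics concrete, the following is a minimal sketch of how precision and recall are computed for a binary enzymatic (1) versus non-enzymatic (0) classification. The function name `precision_recall` and the toy labels are assumptions invented for illustration; they are not the authors' code or data, and the numbers do not reproduce the 92.2% and 90.1% figures above.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for the positive class, labeled 1.

    Precision = TP / (TP + FP): of the sites predicted enzymatic, the fraction that are.
    Recall    = TP / (TP + FN): of the truly enzymatic sites, the fraction recovered.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = enzymatic metal site, 0 = non-enzymatic metal site.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Reporting both values matters when the two classes are of unequal size: a classifier can reach high precision by labeling very few sites as enzymatic, at the cost of recall, and the reverse.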

