Prediction of Protein Subcellular Localization Based on Fusion of Multi-view Features
The prediction of protein subcellular localization is critical for inferring protein function, gene regulation, and protein-protein interactions. With advances in high-throughput sequencing technologies and proteomic methods, the protein sequences of numerous yeasts have become publicly available, which makes it possible to predict yeast protein subcellular localization computationally. However, widely used protein sequence representations, such as amino acid composition and Chou’s pseudo amino acid composition (PseAAC), struggle to capture adequate information about the interactions between residues and the positional distribution of each residue. Novel sequence representations are therefore still needed. In this study, we present two novel protein sequence representations: Generalized Chaos Game Representation (GCGR), based on the frequency and distribution of residues in the protein primary sequence, and novel statistics and information theory (NSI), which reflects local position information of the sequence. In the GCGR + NSI representation, a protein primary sequence is represented by a 5-dimensional feature vector, whereas other popular methods such as PseAAC and dipeptide composition use feature vectors with hundreds of dimensions. In practice, this feature representation is highly efficient at predicting protein subcellular localization. Even without machine learning-based classifiers, a simple model based on the feature vector achieves prediction accuracies of 0.8825 and 0.7736 on the CL317 and ZW225 datasets, respectively. To further evaluate the proposed encoding schemes, we introduce a multi-view feature-based method that combines the two features above with other well-known features, including PseAAC and dipeptide composition, and uses a support vector machine as the classifier to predict protein subcellular localization.
This model achieves prediction accuracies of 0.927 and 0.871 on the CL317 and ZW225 datasets, respectively, outperforming other existing methods in jackknife tests. The results suggest that the GCGR and NSI features usefully complement popular protein sequence representations for predicting yeast protein subcellular localization. Finally, we validate a few newly predicted protein subcellular localizations against evidence from published articles in authoritative journals and books.
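The multi-view fusion and jackknife evaluation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define GCGR or NSI, so only two well-known views are computed here (amino acid composition and dipeptide composition), and a nearest-centroid rule stands in for the support vector machine so that the example needs no third-party libraries. All function names, the toy sequences, and the labels are hypothetical.

```python
from itertools import product
from math import dist

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def amino_acid_composition(seq):
    """Return the 20-dimensional vector of residue frequencies."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues) or 1  # avoid division by zero on empty input
    return [residues.count(a) / n for a in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return the 400-dimensional vector of adjacent-pair frequencies."""
    clean = "".join(r for r in seq.upper() if r in AMINO_ACIDS)
    pairs = [clean[i:i + 2] for i in range(len(clean) - 1)]
    n = len(pairs) or 1
    counts = {d: 0 for d in DIPEPTIDES}
    for p in pairs:
        counts[p] += 1
    return [counts[d] / n for d in DIPEPTIDES]


def fuse_views(seq):
    """Fuse the views by concatenation into one 420-dimensional vector."""
    return amino_acid_composition(seq) + dipeptide_composition(seq)


def nearest_centroid_predict(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest class centroid."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: dist(centroids[lab], query))


def jackknife_accuracy(features, labels):
    """Hold each sample out once, fit on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical toy data, not taken from CL317 or ZW225.
toy_sequences = ["AAAAAA", "AAAAAC", "WWWWWW", "WWWWWY"]
toy_labels = ["nucleus", "nucleus", "cytoplasm", "cytoplasm"]
toy_features = [fuse_views(s) for s in toy_sequences]
toy_accuracy = jackknife_accuracy(toy_features, toy_labels)
```

Concatenation is the simplest way to fuse views; a 5-dimensional GCGR + NSI vector, if available, could be appended in `fuse_views` in the same way. In a fuller experiment the nearest-centroid rule would be replaced by an SVM, with the jackknife loop left unchanged.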
Related Results
The Nuclear Fusion Award
The Nuclear Fusion Award ceremony for 2009 and 2010 award winners was held during the 23rd IAEA Fusion Energy Conference in Daejeon. This time, both 2009 and 2010 award winners w...
Indoor Localization System Based on RSSI-APIT Algorithm
An indoor localization system based on the RSSI-APIT algorithm is designed in this study. Integrated RSSI (received signal strength indication) and non-ranging APIT (approximate pe...
Deep generative model for protein subcellular localization prediction
Abstract: Protein sequence determines not only its structure but also its subcellular localization. Although a series of artificial intelligence models have been reported to predict ...
Endothelial Protein C Receptor
Introduction: The protein C anticoagulant pathway plays a critical role in the negative regulation of the blood clotting response. The pathway is triggered by thrombin, which allows ...
Nonproliferation and fusion power plants
Abstract: The world now appears to be on the brink of realizing commercial fusion. As fusion energy progresses towards near-term commercial deployment, the question arises a...
Validating subcellular localization prediction tools with mycobacterial proteins
Abstract. Background: The computational prediction of mycobacterial proteins' subcellular localization is of key importance for proteome annotation...
PreSubLncR: Predicting Subcellular Localization of Long Non-Coding RNA Based on Multi-Scale Attention Convolutional Network and Bidirectional Long Short-Term Memory Network
The subcellular localization of long non-coding RNA (lncRNA) provides important insights and opportunities for an in-depth understanding of cell biology, revealing disease mechanis...
Volume 10, Index
Vol 10, No 1 (2015) http://www.world-education-center.org/index...

