Hist2Vec: Kernel-Based Embeddings for Biological Sequence Classification
Abstract: Biological sequence classification is vital in fields such as genomics and bioinformatics. Advances in genomic sequencing and its falling cost have drawn researchers' attention to protein and nucleotide sequence classification. Numerous machine-learning models have been proposed for this task, but traditional approaches struggle to capture the intricate relationships and hierarchical structures inherent in genomic sequences. In this work, we propose Hist2Vec, a novel kernel-based embedding generation approach for capturing sequence similarities. Hist2Vec combines histogram-based kernel matrices with Gaussian kernel functions. It constructs histogram-based representations from the unique k-mers present in the sequences. Using Gaussian kernels, Hist2Vec maps these representations into a high-dimensional feature space while preserving important sequence information. In doing so, it aims to address the limitations of existing methods and to provide a robust and efficient framework for classification. We apply kernel Principal Component Analysis (PCA) to generate embeddings, which are then classified with standard machine-learning algorithms. Experimental evaluations on protein and nucleotide datasets show that Hist2Vec outperforms state-of-the-art methods, achieving accuracies above 76% on the DNA datasets and above 83% on the protein datasets. Hist2Vec thus provides a robust framework for biological sequence classification and opens promising avenues for further analysis of biological data.
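The pipeline described in the abstract (k-mer histograms, a Gaussian kernel, then kernel PCA) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed DNA alphabet, the bandwidth `sigma`, the value of `k`, and the toy sequences are all assumptions made for illustration, and the abstract does not specify how Hist2Vec sets any of them.

```python
from collections import Counter
from itertools import product

import numpy as np


def kmer_histogram(seq, k, alphabet="ACGT"):
    """Count each k-mer of `seq` over a fixed alphabet (assumed here to be DNA)."""
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    hist = np.zeros(len(index))
    for kmer, c in counts.items():
        if kmer in index:  # ignore k-mers with characters outside the alphabet
            hist[index[kmer]] = c
    return hist


def gaussian_kernel_matrix(X, sigma=1.0):
    """K[i, j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for the rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def kernel_pca_embed(K, n_components):
    """Embed the training points using the top eigenvectors of the centered kernel."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    # Double-center the kernel matrix (centering in feature space).
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Projection of training point i onto component j is sqrt(lambda_j) * v_j[i].
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


# Toy usage: four short DNA strings (made up for illustration).
seqs = ["ACGTACGT", "ACGTACGA", "TTTTGGGG", "TTTTGGGC"]
X = np.array([kmer_histogram(s, k=3) for s in seqs])
K = gaussian_kernel_matrix(X, sigma=2.0)
Z = kernel_pca_embed(K, n_components=2)  # one 2-dimensional embedding per sequence
```

The rows of `Z` could then be passed to any standard classifier. For protein sequences, the `alphabet` argument would need to be replaced with the amino-acid alphabet, which makes the histogram dimension grow as the alphabet size raised to the power `k`.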
Related Results
Genetic Variation in Potential Kernel Size Affects Kernel Growth and Yield of Sorghum
Large‐seededness can increase grain yield in sorghum [Sorghum bicolor (L.) Moench] if larger kernel size more than compensates for the associated reduction in kernel number. The ai...
Sorghum Kernel Weight
The influence of genotype and panicle position on sorghum [Sorghum bicolor (L.) Moench] kernel growth is poorly understood. In the present study, sorghum kernel weight (KW) differe...
Physicochemical Properties of Wheat Fractionated by Wheat Kernel Thickness and Separated by Kernel Specific Density
Abstract: Two wheat cultivars, soft white winter wheat Yang‐mai 11 and hard white winter wheat Zheng‐mai 9023, were fractionated by kernel thickness into five sections; the fractiona...
Spectral-Similarity-Based Kernel of SVM for Hyperspectral Image Classification
Spectral similarity measures can be regarded as potential metrics for kernel functions, and can be used to generate spectral-similarity-based kernels. However, spectral-similarity-...
Cost-sensitive multi-kernel ELM based on reduced expectation kernel auto-encoder
ELM (extreme learning machine) has drawn great attention due to its high training speed and outstanding generalization performance. To solve the problem that the long training time of...
Polyphenol Oxidase in Wheat Grain: Whole Kernel and Bran Assays for Total and Soluble Activity
Abstract: Color is a key quality trait of wheat products, and polyphenol oxidase (PPO) is implicated as playing a significant role in darkening and discoloration. In this study, tota...
The Meaning of Semic and Symbolic Codes (Roland Barthes' Semiotics)
The problems addressed in this paper are formulated as follows: which semiotic codes appear in the novel Aroma Karsa by Dee Lestari, and how...
Multivariate Plots Using Kernel Principal Component Analysis (KPCA) with a Power Kernel Function
Kernel PCA is PCA applied to input data that has been transformed into a feature space. Let F: Rn → F be a function that maps all input data xi ∈ Rn, such that F...

