Javascript must be enabled to continue!
Comment text clustering algorithm based on improved DEC
View through CrossRef
Aiming at the problem that the initial number of clusters and cluster centers obtained by the clustering layer in the original deep embedding clustering (DEC) algorithm are highly random, thus affecting the effect of the DEC algorithm, a comment text clustering algorithm based on improved DEC is proposed to perform unsupervised clustering on e-commerce comment data without category annotations. Firstly, the vectorized representation of the BERT-LDA dataset that integrates sentence embedding vectors and topic distribution vectors is obtained; then the DEC algorithm is improved, and the dimension reduction is performed through an autoencoder. A clustering layer is stacked after the encoder, in which the number of clusters in the clustering layer is selected based on topic coherence, and the topic feature vector is used as a custom clustering center. The encoder and clustering layer are then jointly trained to improve the accuracy of clustering; finally, the clustering effect is intuitively displayed using a visualization tool. To verify the effectiveness of the algorithm, the algorithm is compared with 6 comparison algorithms for unsupervised clustering training on an unlabeled product review dataset. The results show that the algorithm achieves the best results of 0.2135 and 2958.18 in the silhouette coefficient and Calinski-Harabaz index, respectively. This shows that it can effectively process e-commerce review data and reflect users' attention to products.
Title: Comment text clustering algorithm based on improved DEC
Description:
Aiming at the problem that the initial number of clusters and cluster centers obtained by the clustering layer in the original deep embedding clustering (DEC) algorithm are highly random, thus affecting the effect of the DEC algorithm, a comment text clustering algorithm based on improved DEC is proposed to perform unsupervised clustering on e-commerce comment data without category annotations.
Firstly, the vectorized representation of the BERT-LDA dataset that integrates sentence embedding vectors and topic distribution vectors is obtained; then the DEC algorithm is improved, and the dimension reduction is performed through an autoencoder.
A clustering layer is stacked after the encoder, in which the number of clusters in the clustering layer is selected based on topic coherence, and the topic feature vector is used as a custom clustering center.
The encoder and clustering layer are then jointly trained to improve the accuracy of clustering; finally, the clustering effect is intuitively displayed using a visualization tool.
To verify the effectiveness of the algorithm, the algorithm is compared with 6 comparison algorithms for unsupervised clustering training on an unlabeled product review dataset.
The results show that the algorithm achieves the best results of 0.
2135 and 2958.
18 in the silhouette coefficient and Calinski-Harabaz index, respectively.
This shows that it can effectively process e-commerce review data and reflect users' attention to products.
Related Results
The Kernel Rough K-Means Algorithm
The Kernel Rough K-Means Algorithm
Background:
Clustering is one of the most important data mining methods. The k-means
(c-means ) and its derivative methods are the hotspot in the field of clustering research in re...
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...
E-Press and Oppress
E-Press and Oppress
From elephants to ABBA fans, silicon to hormone, the following discussion uses a new research method to look at printed text, motion pictures and a te...
MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization
MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization
<p>Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...
Image clustering using exponential discriminant analysis
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
On Flores Island, do "ape-men" still exist? https://www.sapiens.org/biology/flores-island-ape-men/
On Flores Island, do "ape-men" still exist? https://www.sapiens.org/biology/flores-island-ape-men/
<span style="font-size:11pt"><span style="background:#f9f9f4"><span style="line-height:normal"><span style="font-family:Calibri,sans-serif"><b><spa...
Research on a microseismic signal picking algorithm based on GTOA clustering
Research on a microseismic signal picking algorithm based on GTOA clustering
Abstract. Clustering is one of the challenging problems in machine learning. Adopting clustering methods for the picking of microseismic signals has emerged as a new approach. Howe...
Exploring the topical structure of short text through probability models : from tasks to fundamentals
Exploring the topical structure of short text through probability models : from tasks to fundamentals
Recent technological advances have radically changed the way we communicate. Today’s
communication has become ubiquitous and it has fostered the need for information that is easie...

