
Comment text clustering algorithm based on improved DEC

The clustering layer of the original deep embedding clustering (DEC) algorithm starts from a number of clusters and a set of cluster centers that are highly random, which degrades clustering quality. To address this, a comment text clustering algorithm based on improved DEC is proposed for unsupervised clustering of e-commerce comment data that has no category annotations. First, a BERT-LDA vectorized representation of the dataset is obtained by fusing sentence embedding vectors with topic distribution vectors. Then the DEC algorithm is improved: an autoencoder performs dimensionality reduction, and a clustering layer is stacked after the encoder, with the number of clusters selected by topic coherence and the topic feature vectors used as custom cluster centers. The encoder and clustering layer are then trained jointly to improve clustering accuracy. Finally, the clustering result is displayed with a visualization tool. To verify its effectiveness, the algorithm is compared with 6 baseline algorithms in unsupervised clustering on an unlabeled product review dataset. It achieves the best results on both metrics, with a silhouette coefficient of 0.2135 and a Calinski-Harabasz index of 2958.18, which indicates that it can process e-commerce review data effectively and reflect what users pay attention to in products.
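The abstract does not give the equations of the clustering layer. As a minimal sketch, assuming the improved algorithm keeps the soft assignment and target distribution of the original DEC formulation (Student's t kernel with one degree of freedom, KL-divergence loss), the clustering step with custom centers could look like the following. The toy embeddings and the two centers are illustrative assumptions standing in for encoder outputs and topic feature vectors; they are not data from the paper.

```python
import math

def soft_assign(embeddings, centers, alpha=1.0):
    """Student's t soft assignment q[i][j], as in the original DEC formulation."""
    q = []
    for z in embeddings:
        row = []
        for mu in centers:
            dist_sq = sum((a - b) ** 2 for a, b in zip(z, mu))
            row.append((1.0 + dist_sq / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])
    return q

def target_distribution(q):
    """Sharpened target p[i][j], proportional to q[i][j]^2 / f[j], f[j] = sum_i q[i][j]."""
    n_clusters = len(q[0])
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    p = []
    for row in q:
        weights = [row[j] ** 2 / freq[j] for j in range(n_clusters)]
        total = sum(weights)
        p.append([w / total for w in weights])
    return p

def kl_divergence(p, q):
    """KL(P || Q) summed over all samples and clusters (the DEC clustering loss)."""
    return sum(
        pij * math.log(pij / qij)
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
        if pij > 0
    )

# Hypothetical 2-D encoder outputs for four comments (assumption, not paper data).
embeddings = [[0.0, 0.1], [0.2, 0.0], [3.0, 3.1], [2.9, 3.0]]
# Hypothetical custom centers; in the paper these come from topic feature vectors.
centers = [[0.0, 0.0], [3.0, 3.0]]

q = soft_assign(embeddings, centers)
p = target_distribution(q)
labels = [max(range(len(row)), key=row.__getitem__) for row in q]
loss = kl_divergence(p, q)
```

In joint training, the gradient of this loss would update both the encoder weights and the centers. Starting from fixed, meaningful centers rather than random ones is the point the abstract makes about the improved initialization.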

Cresta Press

Chen Kejia Xia Ruidong Lin Hongxi

Scientific Insights and Discoveries Review

2024


Related Results

The Kernel Rough K-Means Algorithm

Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...

Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm

In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...

E-Press and Oppress

From elephants to ABBA fans, silicon to hormone, the following discussion uses a new research method to look at printed text, motion pictures and a te...

MR-DBIFOA: a parallel Density-based Clustering Algorithm by Using Improve Fruit Fly Optimization

Clustering is an important technique for data analysis and knowledge discovery. In the context of big data, the density-based clustering algorithm faces three challenging ...

On Flores Island, do "ape-men" still exist? https://www.sapiens.org/biology/flores-island-ape-men/


Image clustering using exponential discriminant analysis

Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...

Research on a microseismic signal picking algorithm based on GTOA clustering

Abstract. Clustering is one of the challenging problems in machine learning. Adopting clustering methods for the picking of microseismic signals has emerged as a new approach. Howe...

Exploring the topical structure of short text through probability models : from tasks to fundamentals

Recent technological advances have radically changed the way we communicate. Today’s communication has become ubiquitous and it has fostered the need for information that is easie...
