Javascript must be enabled to continue!
Comment text clustering algorithm based on improved DEC
View through CrossRef
Aiming at the problem that the initial number of clusters and cluster centers obtained by the clustering layer in the original deep embedding clustering (DEC) algorithm are highly random, thus affecting the effect of the DEC algorithm, a comment text clustering algorithm based on improved DEC is proposed to perform unsupervised clustering on e-commerce comment data without category annotations. Firstly, the vectorized representation of the BERT-LDA dataset that integrates sentence embedding vectors and topic distribution vectors is obtained; then the DEC algorithm is improved, and the dimension reduction is performed through an autoencoder. A clustering layer is stacked after the encoder, in which the number of clusters in the clustering layer is selected based on topic coherence, and the topic feature vector is used as a custom clustering center. The encoder and clustering layer are then jointly trained to improve the accuracy of clustering; finally, the clustering effect is intuitively displayed using a visualization tool. To verify the effectiveness of the algorithm, the algorithm is compared with 6 comparison algorithms for unsupervised clustering training on an unlabeled product review dataset. The results show that the algorithm achieves the best results of 0.2135 and 2958.18 in the silhouette coefficient and Calinski-Harabaz index, respectively. This shows that it can effectively process e-commerce review data and reflect users' attention to products.
Title: Comment text clustering algorithm based on improved DEC
Description:
Aiming at the problem that the initial number of clusters and cluster centers obtained by the clustering layer in the original deep embedding clustering (DEC) algorithm are highly random, thus affecting the effect of the DEC algorithm, a comment text clustering algorithm based on improved DEC is proposed to perform unsupervised clustering on e-commerce comment data without category annotations.
Firstly, the vectorized representation of the BERT-LDA dataset that integrates sentence embedding vectors and topic distribution vectors is obtained; then the DEC algorithm is improved, and the dimension reduction is performed through an autoencoder.
A clustering layer is stacked after the encoder, in which the number of clusters in the clustering layer is selected based on topic coherence, and the topic feature vector is used as a custom clustering center.
The encoder and clustering layer are then jointly trained to improve the accuracy of clustering; finally, the clustering effect is intuitively displayed using a visualization tool.
To verify the effectiveness of the algorithm, the algorithm is compared with 6 comparison algorithms for unsupervised clustering training on an unlabeled product review dataset.
The results show that the algorithm achieves the best results of 0.
2135 and 2958.
18 in the silhouette coefficient and Calinski-Harabaz index, respectively.
This shows that it can effectively process e-commerce review data and reflect users' attention to products.
Related Results
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
<span style="color: #000000; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; ...
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
<span style="color: #000000; font-family: Verdana, Arial, Helvetica, sans-serif; font-size: 10px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; ...
The Kernel Rough K-Means Algorithm
The Kernel Rough K-Means Algorithm
Background:
Clustering is one of the most important data mining methods. The k-means
(c-means ) and its derivative methods are the hotspot in the field of clustering research in re...
The Canberra Bubble
The Canberra Bubble
According to the ABC television program Four Corners, “Parliament House in Canberra is a hotbed of political intrigue and high tension … . It’s known as the ‘Canberra Bubble’ and i...
Application of Dec 1 for Decolorization of Other Colored Substances
Application of Dec 1 for Decolorization of Other Colored Substances
Molasses is a primary carbon source, especially in the microbial industry;
however, molasses includes many colored substances, like melanoidins, which become
concentrated by the Ma...
Bounds on the sum of broadcast domination number and strong metric dimension of graphs
Bounds on the sum of broadcast domination number and strong metric dimension of graphs
Let [Formula: see text] be a connected graph of order at least two with vertex set [Formula: see text]. For [Formula: see text], let [Formula: see text] denote the length of an [Fo...
ANALYSIS OF READING MATERIALS IN TEXTBOOK FOR GRADE XI SENIOR HIGH SCHOOL
ANALYSIS OF READING MATERIALS IN TEXTBOOK FOR GRADE XI SENIOR HIGH SCHOOL
This study aims to find out the GI and LD level, the text which has the highest GI and LD and what make the text has the highest GI and LD of Advanced Learning English 2 textbook. ...
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...

