Search engine for discovering works of Art, research articles, and books related to Art and Culture

Semantic clustering method using integration of advanced LDA algorithm and BERT algorithm

Description:
The subject of the study is an in-depth semantic data analysis based on a modification of the Latent Dirichlet Allocation (LDA) methodology and its integration with Bidirectional Encoder Representations from Transformers (BERT).
Relevance.
Latent Dirichlet Allocation (LDA) is a fundamental topic modeling technique that is widely used in a variety of text analysis applications.
Although its usefulness is widely recognized, traditional LDA models often face limitations such as rigid topic distributions and an inadequate representation of the semantic nuances inherent in natural language.
The purpose and main idea of the study is to improve the adequacy and accuracy of semantic analysis by extending the basic LDA mechanism to integrate adaptive Dirichlet priors and exploit the deep semantic capacity of BERT embeddings.
Research methods: 1) selection of textual datasets; 2) data preprocessing; 3) improvement of the LDA algorithm; 4) integration with BERT embeddings; 5) comparative analysis.
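No implementation accompanies the abstract; the following minimal sketch only illustrates what such a pipeline could look like, assuming gensim for the LDA component (its learned asymmetric prior, alpha="auto", standing in for the adaptive Dirichlet priors described above), sentence-transformers for the BERT embeddings, and scikit-learn for the clustering step. The corpus, model checkpoint, and cluster counts are placeholders, not the authors' choices.

```python
# Illustrative sketch: combine LDA topic vectors with BERT embeddings and
# cluster the joint representation. Library choices are assumptions.
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

docs = [
    "sample document about topic modelling and text clustering",
    "another short document about latent dirichlet allocation",
    "a document mentioning transformers and contextual embeddings",
    "a final document about evaluating clusters with silhouette scores",
]  # placeholder corpus; a real experiment would use a full preprocessed dataset
tokenized = [d.lower().split() for d in docs]  # simplified preprocessing step

# LDA with a learned asymmetric Dirichlet prior (alpha="auto") as a stand-in
# for the adaptive priors described in the abstract.
dictionary = Dictionary(tokenized)
bows = [dictionary.doc2bow(toks) for toks in tokenized]
lda = LdaModel(corpus=bows, id2word=dictionary, num_topics=10,
               alpha="auto", passes=10, random_state=0)
topic_vecs = np.array([
    [p for _, p in lda.get_document_topics(bow, minimum_probability=0.0)]
    for bow in bows
])

# BERT-family sentence embeddings capturing contextual semantics.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
bert_vecs = encoder.encode(docs)

# Joint representation: normalize both views and concatenate them.
joint = np.hstack([normalize(topic_vecs), normalize(bert_vecs)])

# Final clustering of the joint semantic space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(joint)
```

Concatenating normalized topic proportions with normalized contextual embeddings is one common way to combine the two views; the integration scheme actually used in the paper may differ.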
Research objectives: 1) theoretical substantiation of the LDA modification; 2) implementation of the integration with BERT; 3) evaluation of the method's efficiency; 4) comparative analysis; 5) development of an architectural solution.
The results of the research: first, the theoretical foundations of both the standard and the modified LDA model are outlined, and the extended formulation is presented in detail.
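The extended formulation itself appears in the paper rather than here; as a reference point, the generative process of standard LDA, which the modification builds on, can be stated as follows (an adaptive-prior variant would replace the fixed hyperparameter alpha with document-adapted values, which is an assumption based on the description above).

```latex
% Standard LDA generative process (the baseline being extended).
\begin{align*}
  \theta_d &\sim \mathrm{Dirichlet}(\alpha)
    && \text{per-document topic proportions}\\
  \varphi_k &\sim \mathrm{Dirichlet}(\beta)
    && \text{per-topic word distribution}\\
  z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d)
    && \text{topic of word } n \text{ in document } d\\
  w_{d,n} \mid z_{d,n} &\sim \mathrm{Categorical}\!\left(\varphi_{z_{d,n}}\right)
    && \text{observed word}
\end{align*}
```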
Through a series of experiments on text datasets characterized by different emotional states, we highlight the key advantages of the proposed approach.
Based on a comparative analysis of indicators such as intra- and inter-cluster distances and the silhouette coefficient, we demonstrate the improved coherence, interpretability, and adaptability of the modified LDA model.
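The short sketch below shows how such indicators are commonly computed, using scikit-learn's silhouette_score and straightforward centroid-based definitions of the intra- and inter-cluster distances; it reuses the joint representation and labels from the pipeline sketch above and is not the paper's evaluation code.

```python
# Illustrative evaluation of a clustering with common definitions of the
# indicators named above; not the authors' evaluation code.
import numpy as np
from sklearn.metrics import silhouette_score

def cluster_quality(X, labels):
    """Return mean intra-cluster distance, mean inter-centroid distance,
    and the silhouette coefficient for a labelled clustering."""
    clusters = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in clusters])
    # Intra-cluster: average distance of each point to its own centroid.
    intra = np.mean([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    # Inter-cluster: average pairwise distance between cluster centroids.
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    inter = np.mean([np.linalg.norm(centroids[i] - centroids[j]) for i, j in pairs])
    return intra, inter, silhouette_score(X, labels)

# Example: evaluate the clustering produced in the pipeline sketch above.
intra, inter, sil = cluster_quality(joint, labels)
print(f"intra={intra:.3f}  inter={inter:.3f}  silhouette={sil:.3f}")
```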
An architectural solution for implementing the method is proposed.
Conclusions.
The empirical results indicate a significant improvement in the detection of subtle complexities and thematic structures in textual data, a step forward in the evolution of topic modeling methodologies.
In addition, the results of the research not only open up the possibility of applying LDA to more complex linguistic scenarios, but also outline ways to improve it further for unsupervised text analysis.

Related Results

A METHOD OF SEMANTIC DATA ANALYSIS FOR DETERMINING MARKER WORDS WHEN PROCESSING VISITOR EVALUATION RESULTS IN INTERACTIVE ART
The subject of the study is an in-depth semantic data analysis based on the integration of the Latent Dirichlet Allocation (LDA) methodology and the bidirectional encoder representation...
Image clustering using exponential discriminant analysis
Local learning based image clustering models are usually employed to deal with images sampled from the non‐linear manifold. Recently, linear discriminant analysis (LDA) based vario...
The Kernel Rough K-Means Algorithm
Background: Clustering is one of the most important data mining methods. The k-means (c-means) and its derivative methods are the hotspot in the field of clustering research in re...
A Semantic Orthogonal Mapping Method Through Deep-Learning for Semantic Computing
In order to realize an artificial intelligent system, a basic mechanism should be provided for expressing and processing the semantic. We have presented semantic computing models i...
A Pre-Training Technique to Localize Medical BERT and to Enhance Biomedical BERT
Abstract Background: Pre-training large-scale neural language models on raw texts has been shown to make a significant contribution to a strategy for transfer learning in n...
Parallel density clustering algorithm based on MapReduce and optimized cuckoo algorithm
In the process of parallel density clustering, the boundary points of clusters with different densities are blurred and there is data noise, which affects the clustering performanc...
Comment text clustering algorithm based on improved DEC
Aiming at the problem that the initial number of clusters and cluster centers obtained by the clustering layer in the original deep embedding clustering (DEC) algorithm are highly ...
