
A Hybrid BERT-ALBERT Model for Text Classification: Improving Accuracy in Document Analysis

In document analysis, text classification is an essential task that supports automatic content categorisation, sentiment analysis, and effective information retrieval. This paper explores a hybrid of BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) to improve classification accuracy while reducing computational complexity. The model uses transformer encoder blocks and bidirectional position encoding for text processing. Handling large volumes of text requires automation, and attention mechanisms and transformers are becoming viable approaches. The model was evaluated on a custom dataset of 80,000 Turkish-language documents spanning 30 categories, including finance, education, healthcare, and travel. The dataset was split into 70% training, 15% validation, and 15% testing. Pretrained FastText embeddings were used alongside BERT and ALBERT to capture rich semantic features. BERT provides deep contextual embeddings, while ALBERT improves efficiency through parameter reduction and cross-layer parameter sharing. In the experimental evaluation, the BERT+ALBERT hybrid model outperforms the standalone neural baselines (BERT, ALBERT, and LSTM) as well as classic machine learning models, with the proposed model attaining an accuracy of 96.6%, precision of 95.7%, recall of 95.1%, and an F1-score of 95.5%. The statistical significance of the performance gains was confirmed using t-tests (p < 0.05) across five independent runs. The training and validation curves show strong generalisation and little overfitting. These results demonstrate the benefits of combining multiple transformer architectures for document categorisation, offering a trade-off between computational efficiency and accuracy. Further optimisations, such as domain-specific fine-tuning and more sophisticated attention mechanisms, can be investigated in future research.
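To make the 70% / 15% / 15% split concrete, here is a minimal sketch in plain Python. The abstract does not say how the split was drawn, so the per-class (stratified) sampling, the perfectly balanced label list, the seed, and the function name `stratified_split` are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Return (train_idx, val_idx, test_idx), splitting each class separately.

    Hypothetical helper: stratification is an assumption, not something
    the abstract states.
    """
    rng = random.Random(seed)

    # Group document indices by their category label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        test_idx += indices[n_train + n_val:]  # remainder goes to the test set
    return train_idx, val_idx, test_idx


# Placeholder labels: 80,000 documents spread evenly over 30 categories.
# The real class distribution is not given in the abstract.
labels = [i % 30 for i in range(80_000)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Under these assumptions the three partitions are disjoint and contain 56,000, 12,000, and 12,000 documents respectively, which is what a 70/15/15 split of 80,000 documents implies.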
Slovenian Association Informatika

