A Hybrid BERT-ALBERT Model for Text Classification: Improving Accuracy in Document Analysis

In document analysis, text classification is an essential task that enables automatic content categorisation, sentiment analysis, and effective information retrieval. This paper explores a hybrid of BERT (Bidirectional Encoder Representations from Transformers) and ALBERT (A Lite BERT) to improve classification accuracy while reducing computational complexity. The model uses transformer encoder blocks and bidirectional position encoding for text processing. Handling large volumes of text requires automation, for which attention mechanisms and transformers have become viable approaches. The model was evaluated on a custom dataset of 80,000 Turkish-language documents spanning 30 categories, including finance, education, healthcare, and travel. The dataset was split into 70% training, 15% validation, and 15% testing. Pretrained FastText embeddings were used alongside BERT and ALBERT to capture rich semantic features. BERT provides deep contextual embeddings, while ALBERT improves efficiency through parameter reduction and cross-layer parameter sharing. In the experimental evaluation, the BERT+ALBERT hybrid model outperforms the standalone deep learning baselines (BERT, ALBERT, and LSTM) as well as classic machine learning models, attaining an accuracy of 96.6%, precision of 95.7%, recall of 95.1%, and F1-score of 95.5%. The statistical significance of the performance gains was confirmed using t-tests (p < 0.05) across five independent runs. The training and validation curves show strong generalisation and little overfitting. These results demonstrate the benefits of combining multiple transformer architectures for document categorisation, balancing computational efficiency against accuracy. Further optimisations, such as domain-specific fine-tuning and more sophisticated attention mechanisms, can be investigated in future research.
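The 70% / 15% / 15% partition described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the split was drawn, so stratifying by category, the fixed seed, the function name `stratified_split`, and the synthetic placeholder labels are all assumptions made here for illustration, not the author's procedure.

```python
import random
from collections import defaultdict


def stratified_split(labels, train=0.70, val=0.15, seed=0):
    """Return (train_idx, val_idx, test_idx) index lists.

    Assumption: the split is stratified, i.e. each category keeps
    roughly the same 70/15/15 proportions. The abstract only gives
    the overall ratios.
    """
    rng = random.Random(seed)

    # Group document indices by their category label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, val_idx, test_idx = [], [], []
    for indices in by_label.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train)
        n_val = round(len(indices) * val)
        train_idx += indices[:n_train]
        val_idx += indices[n_train:n_train + n_val]
        # Whatever remains in this category goes to the test set.
        test_idx += indices[n_train + n_val:]
    return train_idx, val_idx, test_idx


# Hypothetical placeholder labels: 80,000 documents over 30 categories.
# The real dataset and its category distribution are not available here.
labels = [i % 30 for i in range(80_000)]
train_idx, val_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(val_idx), len(test_idx))
```

Stratifying is one reasonable choice for a 30-category problem because it prevents a rare category from being under-represented in the validation or test portion; a plain shuffled split would follow the same overall ratios without that guarantee.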

Slovenian Association Informatika

Xiaokui Liu

Informatica

2026
