
TF-IDF Based Classification of Uzbek Educational Texts

This paper presents an approach to automatic text classification for Uzbek, a morphologically rich, low-resource language. The approach combines Term Frequency–Inverse Document Frequency (TF-IDF) representations with conventional machine learning and similarity-based methods. The aim is to categorize learning materials by school grade level so that materials can be better aligned with student learning outcomes. For the study, a dataset of 5th to 11th grade school textbooks covering different subjects was collected. The texts were preprocessed using standard natural language processing (NLP) tools and transformed into TF-IDF vectors. These vectors were used with three common classification methods: Logistic Regression (LR), k-Nearest Neighbors (k-NN), and Cosine Similarity (CS). Each new input text is compared with the grade-level textbook corpus, and the grade with the highest similarity is selected. This provides an estimate of the appropriate intellectual level for the material. The experimental findings indicate that Logistic Regression achieved 82% accuracy and that Cosine Similarity performed slightly better at 85.7%. In contrast, the k-NN method achieved only 22% accuracy, indicating its low applicability to Uzbek text classification. Overall, the proposed approach demonstrates practical value for pedagogical purposes and potential applicability to broader document analysis tasks.
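The similarity-based step described in the abstract (compare a new text with a grade-level corpus in TF-IDF space and pick the most similar grade) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the abstract does not specify the tokenizer, the IDF variant, or the tooling, so the whitespace tokenizer, the smoothed IDF formula, the function names (`tokenize`, `build_idf`, `tfidf`, `cosine`, `predict_grade`), and the tiny English toy corpus below are all assumptions made purely for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split. Real Uzbek preprocessing
    # (e.g. handling rich morphology) would be considerably more involved.
    return text.lower().split()

def build_idf(docs_tokens):
    # Document frequency: in how many documents each term appears.
    n = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    # One common smoothed IDF variant (an assumption, not taken from the paper).
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(tokens, idf):
    # Raw term count times IDF; terms unseen in the reference corpus are ignored.
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def predict_grade(text, grade_corpus):
    # grade_corpus maps a grade label to the concatenated text for that grade.
    grades = list(grade_corpus)
    docs = [tokenize(grade_corpus[g]) for g in grades]
    idf = build_idf(docs)
    vectors = {g: tfidf(d, idf) for g, d in zip(grades, docs)}
    query = tfidf(tokenize(text), idf)
    scores = {g: cosine(query, vectors[g]) for g in grades}
    # The grade with the highest similarity is selected.
    return max(scores, key=scores.get), scores

# Hypothetical toy corpus (made-up English strings, not data from the paper).
TOY_CORPUS = {
    5: "add subtract numbers count apples simple sums",
    8: "linear equations variables solve algebra graphs",
    11: "derivatives integrals limits functions calculus",
}

grade, scores = predict_grade("solve the linear equations with variables", TOY_CORPUS)
print(grade, scores)
```

In this toy run the query shares vocabulary only with the grade 8 text, so grade 8 receives the only non-zero score. With real textbooks, vocabulary overlaps across grades, and the quality of preprocessing would matter far more than it does here.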

Related Results

TF-IDF-Based Classification of Uzbek Educational Texts
This paper presents a baseline study on automatic Uzbek text classification. Uzbek is a morphologically rich and low-resource language, which makes reliable preprocessing and evalu...
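Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika [Genre Analysis of Maritime Law Texts and the Realization of Translation Universals in Their Translations from English]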
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...
Software Requirements Classification Using Machine Learning Algorithms
The correct classification of requirements has become an essential task within software engineering. This study shows a comparison among the text feature extraction techniques, and...
Comparative Analysis of Developed Rainfall Intensity–Duration–Frequency Curves for Erbil with Other Iraqi Urban Areas
Rainfall Intensity–Duration–Frequency (IDF) relationships are widely used in water infrastructure design and construction. IDF curves represent the relationship between rainfall in...
Biblical Texts and Interpretations in the Dead Sea Scrolls: Biblical Texts
The introduction to this entry places the Dead Sea Scrolls in their historical and chronological context and discusses the popularity and provenance of the texts found in the Judea...
Modelling of Intensity-Duration Frequency curves for Upper Cauvery Karnataka through Normal Distribution
The IDF Curves accessible are for the most part done by fitting arrangement of yearly greatest precipitation force to parametric dispersions. Intensity-duration-frequency (IDF) cur...
JIS Definition Identified More Malaysian Adults with Metabolic Syndrome Compared to the NCEP-ATP III and IDF Criteria
Metabolic syndrome (MetS) is a steering force for the cardiovascular diseases epidemic in Asia. This study aimed to compare the prevalence of MetS in Malaysian adults using NCEP-AT...
Physicochemical properties of dietary fiber of bergamot and its effect on diabetic mice
Bergamot (Citrus medica L. var. sarcodactylis) contains different bioactive compounds, and their effects remain unclear. Therefore, the structural and bio-function of bergamot diet...
