Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches

Aim/Purpose: This paper addresses a gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates with deep learning techniques. It evaluates how effectively three text augmentation strategies (character-level, token-level, and semantic-level) improve the classification accuracy of these certificates, which are crucial for bridging the skills gap in the digital economy.

Background: Traditional education systems often overlook skills gained through non-formal learning, creating a gap between industry needs and academic qualifications. This paper addresses that gap by using BERT-based deep learning models to classify non-formal learning certificates, with text augmentation applied to improve accuracy in mapping them to formal academic standards.

Methodology: This study uses Bidirectional Encoder Representations from Transformers (BERT) to classify non-formal learning certificates into seven core computer science courses, and applies text augmentation at the character, token, and semantic levels to improve classification accuracy. A dataset of 525 certificates was collected. Optical Character Recognition (OCR) was used to extract text from the PDF documents, and the extracted text was cleaned and augmented before training the BERT model.

Contribution: This paper addresses the growing need for efficient RPL in a period of rapidly advancing knowledge, particularly in the AI era, where non-formal learning is becoming increasingly important. We present a novel approach to automating the classification and validation of non-formal learning certificates using deep learning techniques. The study evaluates and compares character-level, token-level, and semantic-level text augmentation methods for improving the accuracy of certificate classification. What sets this research apart is its systematic assessment of which augmentation method best enhances model performance for RPL tasks. The findings aim to reduce human error and improve the efficiency of RPL implementation, offering a scalable way to integrate or convert non-formal learning into formal educational systems.

Findings: Token-level augmentations, particularly word insertion and word deletion, significantly improved classification accuracy, with validation accuracies exceeding 88%. Character-level augmentations also contributed to model performance, but with slightly lower accuracy. Semantic-level augmentation via back translation showed the least impact. These results indicate that token-level text augmentation is the most effective of the three strategies for classifying non-formal learning certificates in the context of RPL.

Recommendations for Practitioners: Practitioners should focus on token-level text augmentation techniques, such as word insertion and deletion, to improve the accuracy of machine learning models that classify non-formal learning certificates, enabling better integration into formal education and employment pathways.

Recommendations for Researchers: Researchers should explore combining multiple augmentation techniques (e.g., token-level and semantic-level) and investigate advanced models such as BERT-large or multilingual variants for improved classification accuracy. Examining the impact of different OCR tools and preprocessing strategies could further improve non-formal learning certificate recognition.

Impact on Society: The findings have significant implications for improving access to education and employment opportunities. By enhancing the recognition of prior learning through automated classification of non-formal learning certificates, this research supports a more inclusive and equitable education system. It can help individuals, particularly those with non-traditional educational backgrounds, gain recognition for their skills, ultimately bridging the skills gap in the workforce and promoting lifelong learning in the digital economy.

Future Research: Future research should expand the dataset to include multilingual certificates, which would improve the model’s ability to generalize across languages and cultural contexts. Researchers could also investigate hybrid models that combine BERT with other machine learning techniques to further improve classification accuracy. Integrating real-world data sources, such as employer-verified work experience and additional non-formal learning formats, could provide a more comprehensive approach to recognizing prior learning.
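To make the character-level and token-level augmentations named in the abstract concrete, here is a minimal Python sketch using only the standard library. It is an illustration under assumptions, not the authors' implementation: the function names (`delete_words`, `insert_words`, `swap_chars`), the probabilities, and the sample text are all hypothetical, and word insertion here simply re-inserts a word drawn from the same text, whereas the paper may use a different insertion source. Semantic-level back translation is left out because it requires a translation model.

```python
import random


def delete_words(text, p=0.1, rng=None):
    """Token-level: drop each word with probability p, always keeping at least one."""
    if rng is None:
        rng = random.Random()
    words = text.split()
    kept = [w for w in words if rng.random() >= p]
    # Fall back to the first word if every word was dropped.
    return " ".join(kept if kept else words[:1])


def insert_words(text, n=1, rng=None):
    """Token-level: insert n words at random positions.

    Assumption: inserted words are sampled from the text itself.
    """
    if rng is None:
        rng = random.Random()
    words = text.split()
    if not words:
        return text
    for _ in range(n):
        position = rng.randint(0, len(words))  # inclusive upper bound: may append
        words.insert(position, rng.choice(words))
    return " ".join(words)


def swap_chars(text, n=1, rng=None):
    """Character-level: swap n randomly chosen pairs of adjacent characters."""
    if rng is None:
        rng = random.Random()
    chars = list(text)
    if len(chars) < 2:
        return text
    for _ in range(n):
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sample = "certificate of completion introduction to database systems"  # hypothetical OCR text
    print(delete_words(sample, p=0.2, rng=rng))
    print(insert_words(sample, n=2, rng=rng))
    print(swap_chars(sample, n=3, rng=rng))
```

Each function returns a perturbed copy of the cleaned OCR text, so augmented samples can be appended to the training set with the label of the certificate they came from.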

Informing Science Institute

I Gede Susrama Mas Diyasa, Eva Yulia Puspaningrum, Dimas Saputra, Wan Suryani Wan Awang

Interdisciplinary Journal of Information, Knowledge, and Management

2025
