Enhancing Non-Formal Learning Certificate Classification with Text Augmentation: A Comparison of Character, Token, and Semantic Approaches

Aim/Purpose: This paper addresses a gap in the recognition of prior learning (RPL) by automating the classification of non-formal learning certificates with deep learning. It evaluates how effectively three text augmentation strategies (character-level, token-level, and semantic-level) improve the classification accuracy of these certificates, which are crucial for bridging the skills gap in the digital economy.

Background: Traditional education systems often overlook skills gained through non-formal learning, creating a gap between industry needs and academic qualifications. This paper addresses that gap by using BERT-based deep learning models to classify non-formal learning certificates, with text augmentation used to improve accuracy in mapping them to formal academic standards.

Methodology: The study uses Bidirectional Encoder Representations from Transformers (BERT) to classify non-formal learning certificates into seven core computer science courses. Text augmentation is applied at the character, token, and semantic levels to improve classification accuracy. A dataset of 525 certificates was collected; the text was extracted from the PDF documents with Optical Character Recognition (OCR), then cleaned and augmented before training the BERT model.

Contribution: This paper addresses the growing need for efficient Recognition of Prior Learning (RPL) in the AI era, where knowledge advances rapidly and non-formal learning is becoming increasingly important. We present a novel approach to automating the classification and validation of non-formal learning certificates using deep learning. The study evaluates and compares character-level, token-level, and semantic-level text augmentation methods for improving the accuracy of certificate classification. What sets this research apart is its systematic assessment of which augmentation method best enhances model performance for RPL tasks. The findings aim to reduce human error and improve the efficiency of RPL implementation, offering a scalable way to integrate or convert non-formal learning into formal educational systems.

Findings: Token-level augmentations, particularly word insertion and word deletion, significantly improved classification accuracy, with validation accuracies exceeding 88%. Character-level augmentations also improved model performance, but with slightly lower accuracy. Semantic-level augmentation via back translation showed the least impact. These results indicate that token-level text augmentation is the most effective of the three strategies for classifying non-formal learning certificates in the context of RPL.

Recommendations for Practitioners: Practitioners should focus on token-level text augmentation techniques, such as word insertion and deletion, to improve the accuracy of machine learning models that classify non-formal learning certificates, enabling better integration into formal education and employment pathways.

Recommendations for Researchers: Researchers should explore combining multiple augmentation techniques (e.g., token-level and semantic-level) and investigate larger or multilingual models, such as BERT-large or multilingual BERT variants, for improved classification accuracy. Examining the impact of different OCR tools and preprocessing strategies could further improve non-formal learning certificate recognition.

Impact on Society: The findings have significant implications for improving access to education and employment opportunities. By enhancing the recognition of prior learning through automated classification of non-formal learning certificates, this research supports a more inclusive and equitable education system. It can help individuals, particularly those with non-traditional educational backgrounds, gain recognition for their skills, ultimately bridging the skills gap in the workforce and promoting lifelong learning in the digital economy.

Future Research: Future research should expand the dataset to include multilingual certificates, which would enhance the model’s ability to generalize across languages and cultural contexts. Researchers could also investigate hybrid models that combine BERT with other machine learning techniques to further improve classification accuracy. Integrating real-world data sources, such as employer-verified work experience and additional non-formal learning formats, could provide a more comprehensive approach to recognizing prior learning.
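
To make the character-level and token-level augmentations named in the Methodology and Findings more concrete, the following is a minimal sketch in Python using only the standard library. The abstract does not say how the authors implemented augmentation, so the function names (`char_swap`, `word_delete`, `word_insert`), the deletion probability `p`, and the choice to insert words copied from the text itself are all illustrative assumptions rather than the study's method. Semantic-level back translation is omitted because it requires a translation model.

```python
import random


def char_swap(text: str, rng: random.Random) -> str:
    """Character-level: swap two adjacent characters inside one word."""
    words = text.split()
    # Only words with at least two characters can have a swap applied.
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return text
    i = rng.choice(candidates)
    chars = list(words[i])
    j = rng.randrange(len(chars) - 1)
    chars[j], chars[j + 1] = chars[j + 1], chars[j]
    words[i] = "".join(chars)
    return " ".join(words)


def word_delete(text: str, rng: random.Random, p: float = 0.1) -> str:
    """Token-level: drop each word independently with probability p (assumed value)."""
    words = text.split()
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty string for non-empty input: keep one random word.
    if not kept and words:
        kept = [rng.choice(words)]
    return " ".join(kept)


def word_insert(text: str, rng: random.Random, n: int = 1) -> str:
    """Token-level: insert n words at random positions.

    Assumption: inserted words are copied from the text itself; a real
    pipeline might draw them from synonyms or a language model instead.
    """
    words = text.split()
    if not words:
        return text
    for _ in range(n):
        words.insert(rng.randrange(len(words) + 1), rng.choice(words))
    return " ".join(words)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    sample = "certificate of completion introduction to database systems"
    for augment in (char_swap, word_delete, word_insert):
        print(f"{augment.__name__}: {augment(sample, rng)}")
```

Each function returns a perturbed copy of the input that would keep the original class label, so augmented certificates can be added to the training set alongside the originals. Note that all three functions normalize whitespace to single spaces as a side effect of splitting and rejoining words.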

Informing Science Institute

I Gede Susrama Mas Diyasa, Eva Yulia Puspaningrum, Dimas Saputra, Wan Suryani Wan Awang

Interdisciplinary Journal of Information, Knowledge, and Management

2025
