An Enhanced Neural Word Embedding Model for Transfer Learning

As data generation expands, a growing number of natural language processing (NLP) tasks need to be solved, and word representation plays a vital role in them. Computation-based word embeddings have proven very useful for many high-resource languages. Until now, however, low-resource languages such as Bangla have had very limited resources in terms of models, toolkits, and datasets. To address this gap, this paper develops an enhanced BanglaFastText word embedding model in Python, together with two large pre-trained Bangla FastText models (Skip-gram and CBOW). These pre-trained models were trained on a large collected Bangla corpus of around 20 million data points, where each paragraph of text counts as one data point. BanglaFastText outperformed Facebook’s FastText by a significant margin. To evaluate and analyze the pre-trained models, the work performs text classification on three popular Bangla text datasets, building models with various classical machine learning approaches as well as a deep neural network. The evaluations showed superior performance over existing word embedding techniques and over Facebook’s pre-trained Bangla FastText model for Bangla NLP. In addition, performance compared with the original works on these textual datasets is excellent. A Python toolkit is also proposed that makes it convenient to access the models and use them for word embedding; for obtaining word-to-word or sentence-to-sentence semantic relationships; for sentence embedding with classical machine learning approaches; and for unsupervised fine-tuning on any Bangla linguistic dataset.
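The abstract mentions sentence embedding for classical machine learning and sentence-to-sentence semantic relationships, but does not describe the toolkit's interface. The sketch below illustrates one common way such features can be derived from word vectors: averaging the vectors of a sentence's words and comparing sentences by cosine similarity. It is not the paper's toolkit. The names `TOY_VECTORS`, `sentence_embedding`, and `cosine_similarity` are hypothetical, and the three-dimensional vectors are made-up values standing in for a pre-trained model.

```python
import math

# Hypothetical toy embedding table with made-up values. A real setup would
# look words up in a pre-trained model instead of this dictionary.
TOY_VECTORS = {
    "ভালো": [0.9, 0.1, 0.0],    # "good"
    "খারাপ": [-0.8, 0.2, 0.1],  # "bad"
    "বই": [0.1, 0.9, 0.3],      # "book"
    "সিনেমা": [0.2, 0.8, 0.4],  # "cinema"
}
DIM = 3


def sentence_embedding(sentence, vectors=TOY_VECTORS, dim=DIM):
    """Mean-pool the vectors of whitespace-separated tokens.

    Tokens missing from the table are skipped here. An all-unknown
    sentence yields a zero vector of length `dim`.
    """
    found = [vectors[tok] for tok in sentence.split() if tok in vectors]
    if not found:
        return [0.0] * dim
    return [sum(column) / len(found) for column in zip(*found)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    good_book = sentence_embedding("ভালো বই")
    good_film = sentence_embedding("ভালো সিনেমা")
    bad_book = sentence_embedding("খারাপ বই")
    print(cosine_similarity(good_book, good_film))
    print(cosine_similarity(good_book, bad_book))
```

The resulting fixed-length vectors can be passed as features to a classical classifier. Skipping unknown tokens is a simplification of this toy table; FastText models can also build vectors for unseen words from character n-grams.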

