An Enhanced Neural Word Embedding Model for Transfer Learning

As the volume of generated data grows, more and more natural language processing (NLP) tasks need to be solved, and word representation plays a vital role in solving them. Computational word embeddings have proven very useful for many high-resource languages. However, low-resource languages such as Bangla have so far had very limited resources available in terms of models, toolkits, and datasets. To address this gap, this paper develops an enhanced BanglaFastText word embedding model using Python and two large pre-trained Bangla FastText models (Skip-gram and CBOW). These pre-trained models were trained on a large collected Bangla corpus of around 20 million text data points, where every paragraph of text counts as one data point. BanglaFastText outperformed Facebook’s FastText by a significant margin. To evaluate and analyze the performance of these pre-trained models, the proposed work carried out text classification on three popular Bangla text datasets, building models with various classical machine learning approaches as well as a deep neural network. The evaluations showed superior performance over existing word embedding techniques and over the Facebook Bangla FastText pre-trained model for Bangla NLP. In addition, compared with the performance reported in the original works on these datasets, the models achieve excellent results. A Python toolkit is also proposed that makes it convenient to access the models and use them for word embedding; for obtaining semantic relationships word-by-word or sentence-by-sentence; for sentence embedding in classical machine learning approaches; and for unsupervised fine-tuning on any Bangla linguistic dataset.

MDPI AG

Md. Kowsher, Md. Shohanur Islam Sobuj, Md. Fahim Shahriar, Nusrat Jahan Prottasha, Mohammad Shamsul Arefin, Pranab Kumar Dhar, Takeshi Koshiba

Applied Sciences

2022
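
The abstract mentions sentence embedding for classical machine learning approaches and sentence-by-sentence semantic relationships. One common way to obtain both from word vectors is to average the vectors of a sentence's tokens and compare sentences by cosine similarity. The sketch below illustrates that idea only; it is not the paper's toolkit or its API. The names `TOY_VECTORS`, `sentence_embedding`, and `cosine_similarity`, the romanized placeholder tokens, and the 3-dimensional vector values are all assumptions made for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors for romanized placeholder tokens.
# The values are invented for illustration, not taken from any trained model.
TOY_VECTORS = {
    "ami": [0.9, 0.1, 0.0],
    "tumi": [0.8, 0.2, 0.1],
    "boi": [0.1, 0.9, 0.2],
    "pori": [0.2, 0.8, 0.3],
}


def sentence_embedding(sentence, vectors, dim=3):
    """Average the vectors of the known whitespace-separated tokens.

    Unknown tokens are skipped here for simplicity; a real FastText model
    can instead build vectors for unseen words from character n-grams.
    """
    known = [vectors[token] for token in sentence.split() if token in vectors]
    if not known:
        return [0.0] * dim  # no known tokens: fall back to a zero vector
    return [sum(column) / len(known) for column in zip(*known)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    first = sentence_embedding("ami boi pori", TOY_VECTORS)
    second = sentence_embedding("tumi boi pori", TOY_VECTORS)
    print(cosine_similarity(first, second))
```

The resulting fixed-length sentence vectors can be passed as features to a classical classifier, which is the kind of use the abstract describes for its sentence embeddings.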


