An Enhanced Neural Word Embedding Model for Transfer Learning
As the volume of generated data grows, more and more natural language processing (NLP) tasks need to be solved, and word representation plays a vital role in solving them. Computational word embeddings are very useful and widely available for high-resource languages. However, low-resource languages such as Bangla have so far had very limited resources in terms of models, toolkits, and datasets. To address this gap, this paper develops BanglaFastText, an enhanced word embedding model built with Python, together with two large pre-trained Bangla FastText models (Skip-gram and CBOW). These models were trained on a large collected Bangla corpus of around 20 million data points, where each paragraph of text counts as one data point. BanglaFastText outperformed Facebook’s FastText by a significant margin. To evaluate and analyze the pre-trained models, the work performs text classification on three popular Bangla text datasets, building classifiers with various classical machine learning approaches as well as a deep neural network. The evaluations show superior performance over existing word embedding techniques and over the Facebook Bangla FastText pre-trained model for Bangla NLP. The results are also excellent when set against the performance reported in the original works on these datasets. Finally, a Python toolkit is proposed that makes it convenient to access the models and use them for word embedding, for obtaining semantic relationships word-by-word or sentence-by-sentence, for sentence embedding in classical machine learning approaches, and for unsupervised fine-tuning on any Bangla linguistic dataset.
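The abstract does not show the toolkit's API, so the following is only a minimal sketch of two ideas it relies on, under stated assumptions. FastText represents a word through character n-grams with `<` and `>` boundary markers, and a common way to obtain a sentence embedding for a classical classifier is to average word vectors and compare sentences by cosine similarity. All function names, the toy vectors, and the romanized tokens are hypothetical illustrations, not the BanglaFastText toolkit or its trained weights; real FastText additionally hashes n-grams into buckets and includes the whole word as its own unit, which this sketch omits.

```python
import math


def char_ngrams(word, n_min=3, n_max=6):
    """Return FastText-style character n-grams of `word`, using < and > as boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def sentence_vector(tokens, vectors, dim):
    """Average the vectors of known tokens; return a zero vector if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[k] for v in known) / len(known) for k in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors, invented for illustration only.
# Keys are romanized placeholders standing in for Bangla tokens.
TOY_VECTORS = {
    "bhalo": [0.9, 0.1, 0.0],    # "good"
    "kharap": [-0.8, 0.2, 0.1],  # "bad"
    "boi": [0.1, 0.9, 0.2],      # "book"
    "cinema": [0.2, 0.8, 0.3],   # "film"
}

if __name__ == "__main__":
    print(char_ngrams("boi", 3, 3))
    s1 = sentence_vector(["bhalo", "boi"], TOY_VECTORS, 3)
    s2 = sentence_vector(["bhalo", "cinema"], TOY_VECTORS, 3)
    s3 = sentence_vector(["kharap", "boi"], TOY_VECTORS, 3)
    print(cosine(s1, s2), cosine(s1, s3))
```

The averaged sentence vectors can be passed as fixed-length features to any classical classifier. With a trained model, the toy dictionary would be replaced by lookups into the pre-trained Skip-gram or CBOW embeddings.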