Better Word Representation Vectors Using Syllabic Alphabet: A Case Study of Swahili

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, a low-resource language that is widely spoken in East and Central Africa. This study proposed novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the problem of word representation for agglutinative and syllabic-based languages. Inspired by the way Swahili is taught in beginner classes, we encoded the syllables of each word, instead of its characters, character n-grams or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE was demonstrated by the state-of-the-art results of the syllable-aware language model on both the small dataset (31.229 perplexity value) and the medium dataset (45.859 perplexity value), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. Therefore, the main contributions of the study are a syllabic alphabet, WEFSE, a syllable-aware language model and a word analogy dataset for Swahili.
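The idea of composing a word vector from syllable vectors can be illustrated with a minimal sketch. It is not the authors' implementation: the segmentation rule (every syllable ends at a vowel), the random initialisation, the filter width and the function names `syllabify`, `make_table`, `make_filters` and `word_embedding` are all assumptions made for illustration, and the abstract does not specify the actual syllabic alphabet or network configuration. The sketch embeds each syllable, slides convolution filters over adjacent syllables, and keeps the maximum response of each filter.

```python
import math
import random

VOWELS = set("aeiou")


def syllabify(word):
    """Split a word into chunks that each end at a vowel (simplified heuristic).

    Assumption: this ignores syllabic nasals (e.g. "m-tu") and is not the
    syllabic alphabet defined in the study.
    """
    syllables, current = [], ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:  # trailing consonants, e.g. in loanwords
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


def make_table(words, dim, seed=0):
    """Assign a small random vector to every syllable seen in `words`."""
    rng = random.Random(seed)
    inventory = sorted({s for w in words for s in syllabify(w)})
    return {s: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for s in inventory}


def make_filters(count, width, dim, seed=1):
    """Create `count` random filters, each spanning `width` adjacent syllables."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(width * dim)]
            for _ in range(count)]


def word_embedding(word, table, filters, width):
    """Convolve filters over the syllable vectors, then max-pool over positions."""
    vectors = [table[s] for s in syllabify(word)]
    dim = len(next(iter(table.values())))
    # Zero-pad so that words shorter than the filter width yield one window.
    padded = vectors + [[0.0] * dim] * max(0, width - len(vectors))
    # Each window is the concatenation of `width` consecutive syllable vectors.
    windows = [sum(padded[i:i + width], [])
               for i in range(len(padded) - width + 1)]
    return [max(math.tanh(sum(w * x for w, x in zip(f, win))) for win in windows)
            for f in filters]


words = ["kitabu", "nyumba", "wanafunzi"]
table = make_table(words, dim=4)
filters = make_filters(count=6, width=2, dim=4)
print(syllabify("wanafunzi"))
print(len(word_embedding("kitabu", table, filters, width=2)))
```

In a trained model the syllable vectors and filters would be learned jointly with the language model rather than drawn at random. The output dimension equals the number of filters, so words of any length map to vectors of the same size.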