
Uyghur–Kazakh–Kirghiz Text Keyword Extraction Based on Morpheme Segmentation

In this study, we investigated a keyword extraction method, built on a morpheme segmentation framework, for Uyghur, Kazakh and Kirghiz texts. These three languages have similar grammatical and lexical structures: a word is formed by joining affixes to a stem. The stem carries the notional meaning, while the affixes perform grammatical functions. Because of this derivational morphology, the vocabularies of these languages are very large, so pre-processing is a necessary step in NLP tasks for Uyghur, Kazakh and Kirghiz. Morpheme segmentation allowed us to strip the suffixes, which act as auxiliary units, while retaining the meaningful stems, and this reduced the dimension of the feature space in the keyword extraction task. We cast morpheme segmentation as a sequence labeling problem: a Bi-LSTM network captures the positional features of the character sequence in both directions, and a CRF layer learns the dependencies between preceding and following labels. With the resulting highly accurate Bi-LSTM_CRF morpheme segmentation model, we prepared morpheme-based experimental text sets. We then trained stem embedding vectors with the Doc2vec algorithm, used the similarity between stem vectors to modify the TextRank algorithm, and performed a text keyword extraction experiment. The highest F1 scores obtained on the three datasets were 43.8%, 44% and 43.9%. The experimental results show that the morpheme-based approach performs much better than the word-based approach, which indicates that stem vector similarity weighting is an effective method for text keyword extraction and that morpheme sequences are well suited to morphologically derivative languages.
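The abstract frames morpheme segmentation as labeling each character of a word. A minimal sketch of that encoding is given here, assuming a B/M/E/S tag set (begin, middle, end, single-character morpheme). The abstract does not name the tag set, so the scheme, the function names and the Latin-script example word are all illustrative assumptions. In the described pipeline the tags would be predicted by the Bi-LSTM_CRF model; here they are written by hand.

```python
from typing import List, Tuple


def morphemes_to_tags(morphemes: List[str]) -> List[Tuple[str, str]]:
    """Turn a segmented word into (character, tag) training pairs.

    Assumed tag set: B = first character of a morpheme, M = inner character,
    E = last character, S = morpheme made of a single character.
    """
    tagged: List[Tuple[str, str]] = []
    for morpheme in morphemes:
        if not morpheme:
            continue  # skip empty segments instead of failing
        if len(morpheme) == 1:
            tagged.append((morpheme, "S"))
            continue
        tagged.append((morpheme[0], "B"))
        tagged.extend((ch, "M") for ch in morpheme[1:-1])
        tagged.append((morpheme[-1], "E"))
    return tagged


def tags_to_morphemes(chars: List[str], tags: List[str]) -> List[str]:
    """Rebuild morphemes from a tag sequence (for example, model output)."""
    morphemes: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a morpheme boundary follows this character
            morphemes.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without E or S
        morphemes.append(current)
    return morphemes


if __name__ == "__main__":
    # Illustrative word: a stem followed by one suffix.
    pairs = morphemes_to_tags(["kitab", "lar"])
    print(pairs)
    chars = [ch for ch, _ in pairs]
    tags = [tag for _, tag in pairs]
    print(tags_to_morphemes(chars, tags))
```

Once a word is decoded back into morphemes, keeping only the first segment is one simple way to obtain the stem when the affixes are all suffixes, which is the case the abstract describes.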
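The abstract says that the similarity between stem vectors is used to modify TextRank. One plausible reading, sketched here, is to weight each co-occurrence edge of the TextRank graph by the cosine similarity of the two stems' vectors. The abstract does not give the actual weighting formula, so this is an assumption, as are the function names, the window size and the toy vectors (which stand in for stem embeddings trained with Doc2vec).

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_stems(
    stems: List[str],
    vectors: Dict[str, List[float]],
    window: int = 2,
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Score stems with TextRank on a similarity-weighted co-occurrence graph."""
    # Undirected graph: stems at most `window` positions apart are linked,
    # and the edge weight is their (non-negative) cosine similarity.
    weights: Dict[str, Dict[str, float]] = {s: {} for s in set(stems)}
    for i, a in enumerate(stems):
        for b in stems[i + 1 : i + 1 + window]:
            if a == b:
                continue
            w = max(cosine(vectors[a], vectors[b]), 0.0)
            weights[a][b] = w
            weights[b][a] = w

    scores = {s: 1.0 for s in weights}
    for _ in range(iterations):
        updated: Dict[str, float] = {}
        for s in weights:
            incoming = 0.0
            for neighbor, w in weights[s].items():
                out_weight = sum(weights[neighbor].values())
                if out_weight > 0:
                    # The neighbor passes on its score in proportion to the edge weight.
                    incoming += w / out_weight * scores[neighbor]
            updated[s] = (1 - damping) + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    # Placeholder stems and hand-written 2-d vectors, for illustration only.
    doc = ["kitab", "oqu", "mektep", "kitab", "oqu"]
    toy_vectors = {"kitab": [1.0, 0.2], "oqu": [0.9, 0.4], "mektep": [0.3, 1.0]}
    ranked = sorted(rank_stems(doc, toy_vectors).items(), key=lambda kv: -kv[1])
    print(ranked)
```

The highest-scoring stems would be taken as keyword candidates. With plain TextRank every co-occurrence edge has weight 1; the similarity weighting instead lets semantically close stems reinforce each other more strongly.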

Related Results

ARCHIVAL MATERIALS ABOUT KAZAKH BATYRS (XVIII - FIRST HALF OF XIX CENTURIES)
This article describes reliable historical archival materials about Kazakh batyrs (knights, heroes) from the XVIII century to the first half of the XIX century. All historical eve...
LEXICAL PARADIGMATICS OF OLD UYGHUR AND KAZAKH LANGUAGES
This article examines the lexical paradigms of the Old Uyghur and Kazakh languages. Homonyms, synonyms, and antonyms of the two related languages – ancient and modern – are compare...
UYGUR LITERATURE IN INDEPENDENT KAZAKHSTAN: PAST AND PRESENT
The article provides a brief overview of the socio-political periods of the development of Uyghur literature in Kazakhstan, some of its thematic and genre features. The main goal o...
Sleep Habits and Occurrence of Lowback Pain among Craftsmen
DEIKSIS PERSONA DALAM BAHASA MUNA
Abstract: The purpose of this study is to describe the form and meaning of person deixis in the Muna language. This is a qualitative descriptive study. Quali...
An Analysis Of Derivational And Inflectional English Morphemes
Derivation and inflection. A morpheme is one of the elements studied in the field of morphology. Morphology is the study of morphemes, and morphemes are elements of languag...
