Korean Subword vocabulary optimization by removing compositional words in neural machine translation
Subword segmentation is an effective solution to the vocabulary problem in neural machine translation, and Byte Pair Encoding (BPE) is widely recognized as an effective approach for machine translation across multiple languages. However, in morphologically rich languages such as Korean, BPE can lead to excessive segmentation, which harms word semantics and creates semantic confusion during training. This semantic confusion ultimately degrades overall translation quality. This paper proposes a method to optimize the Korean subword vocabulary for neural machine translation, based on the observation that a Korean subword vocabulary created with the BPE training algorithm contains many compositional subwords. In experiments, the optimized Korean subword vocabulary yields more stable translation performance by maintaining a balanced distribution while removing unnecessary compositional subwords.
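The abstract does not spell out how compositional subwords are detected or which ones count as unnecessary, so the following is only a minimal sketch under stated assumptions, not the paper's method. It assumes a subword is "compositional" when it can be written as the concatenation of two other entries in the same vocabulary, and it assumes a frequency threshold decides which of those to drop. The function names `find_compositional` and `prune_vocab`, the `min_freq` parameter, and the toy vocabulary and counts are all hypothetical. Word-boundary markers used by real BPE tools are ignored.

```python
def find_compositional(vocab):
    """Map each subword to one (left, right) split whose parts are both in vocab.

    Assumption: "compositional" means decomposable into exactly two other
    vocabulary entries. The paper may define it differently.
    """
    entries = set(vocab)
    compositional = {}
    for token in vocab:
        # Try every split point; keep the first split whose halves both exist.
        for i in range(1, len(token)):
            left, right = token[:i], token[i:]
            if left in entries and right in entries:
                compositional[token] = (left, right)
                break
    return compositional


def prune_vocab(vocab, freq, min_freq):
    """Drop compositional subwords seen fewer than min_freq times.

    Assumption: frequency decides what is "unnecessary"; frequent
    compositional subwords are kept so common words stay unsegmented.
    """
    compositional = find_compositional(vocab)
    return [
        token
        for token in vocab
        if not (token in compositional and freq.get(token, 0) < min_freq)
    ]


# Toy, made-up vocabulary and counts for illustration only.
vocab = ["학", "교", "학교", "에서", "학교에서", "는"]
freq = {"학": 40, "교": 35, "학교": 120, "에서": 300, "학교에서": 3, "는": 500}

print(find_compositional(vocab))
print(prune_vocab(vocab, freq, min_freq=10))
```

In this toy case both "학교" and "학교에서" are flagged as compositional, but only the rare "학교에서" is removed, since the segmenter can still produce it as "학교" followed by "에서". A real pipeline would also need to re-segment the training corpus with the pruned vocabulary.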
Related Results
Attention-enabled Multi-layer Subword Joint Learning for Chinese Word Embedding
In recent years, Chinese word embeddings have attracted significant attention in the field of natural language processing (NLP). The complex structures and diverse...
Constitutional Policy Protection of North Korean Residents: Focusing on Improvement of North Korean Human Rights Act
This paper examines the protection of the constitutional rights of North Korean residents, focusing on ways to improve the North Korean Human Rights Act, which has faced various di...
SPECIFIC TRAITS OF HUNGARIAN-UKRAINIAN POETRY TRANSLATION (BASED ON YURII SHKROBYNETS’ TRANSLATIONS)
The article addresses matters related to the peculiarities of Hungarian-Ukrainian poetic translation. It was noted that the quality, complexity and overall mastery of literary tran...
Framing Buku Pernah Tenggelam Terhadap Fenomena Korean Wave
Abstract. Nowadays, Korean wave is growing in Indonesia, but there are also various phenomena of Korean wave that are contrary to Islamic law. Fuadh Naim, a former Korean wave fanb...
Translation
The theoretical, empirical, and pedagogic study of translation is the concern of the interdisciplinary and international field of scholarship known, since 1972, as translation stud...
Compositional Space Parameterization for Flow Simulation
Thermodynamic equilibrium (flash) calculations in compositional simulators are used to find the partitioning of components among fluid phases. The basic ...
Editorial Introduction: Translating the Future: Exploring the Impact of Technology and AI on Modern Translation Studies
We have entered into the era of artificial intelligence, neural machine translation, and especially large language models which have dramatically changed the landscape of human tra...
Žanrovska analiza pomorskopravnih tekstova i ostvarenje prijevodnih univerzalija u njihovim prijevodima s engleskoga jezika
Genre implies formal and stylistic conventions of a particular text type, which inevitably affects the translation process. This „force of genre bias“ (Prieto Ramos, 2014) has been...

