Attention-enabled Multi-layer Subword Joint Learning for Chinese Word Embedding

Abstract: In recent years, Chinese word embeddings have attracted significant attention in the field of natural language processing (NLP). The complex structures and diverse influences of Chinese characters present distinct challenges for semantic representation. As a result, Chinese word embeddings are primarily investigated in conjunction with characters and their subcomponents. Previous research has demonstrated that word vectors frequently fail to capture the subtle semantics embedded within the complex structure of Chinese characters. Furthermore, they often neglect the varying contributions that subword information at different levels makes to semantics. To tackle these challenges, we present a weight-based word vector model that takes into account the internal structure of Chinese words at various levels. The model divides the internal structure of Chinese words into six layers of subword information: words, characters, components, pinyin, strokes, and structures. The semantics of a Chinese word can then be derived by integrating the subword information from these layers. Moreover, the model considers the varying contribution of each subword layer to the semantics of Chinese words. It uses an attention mechanism to determine the weights both between and within the subword layers, facilitating the comprehensive extraction of word semantics. The word-level subwords act as the attention query for the subwords in the other layers to learn semantic bias. Experimental results show that the proposed word vector model achieves improvements on word similarity, word analogy, and text categorization tasks, as well as in case studies.

Springer Science and Business Media LLC

Pengpeng Xue, Liang Tan, Jing Xiong, Zhongzhu Liu, Kanglong Liu

2024
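
The attention scheme outlined in the abstract (the word-level vector as query, with weights learned both within and between the other subword layers) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring function, the training objective, or how the fused vector is combined with the word vector, so this sketch assumes plain dot-product attention with a softmax, uses random vectors in place of learned embeddings, and introduces the hypothetical helper names `attend` and `compose_word`.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of scores."""
    shifted = scores - np.max(scores)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attend(query, keys):
    """Dot-product attention (an assumption; the abstract names no scoring function).

    query: (d,) vector; keys: (n, d) matrix whose rows are also used as values.
    Returns the weighted sum of the rows and the attention weights.
    """
    weights = softmax(keys @ query)
    return weights @ keys, weights


def compose_word(word_vec, layers):
    """Fuse subword layers using the word vector as the query at both levels.

    layers: dict mapping a layer name to an (n_i, d) array of subword vectors.
    """
    names = list(layers)
    # Within-layer attention: one summary vector per subword layer.
    layer_vecs = np.stack([attend(word_vec, layers[name])[0] for name in names])
    # Between-layer attention: weight the layer summaries against each other.
    fused, layer_weights = attend(word_vec, layer_vecs)
    return fused, dict(zip(names, layer_weights))


# Toy data: random vectors stand in for learned embeddings (illustrative only).
rng = np.random.default_rng(0)
dim = 8
word_vec = rng.normal(size=dim)
layer_sizes = [("characters", 2), ("components", 4), ("pinyin", 2),
               ("strokes", 5), ("structures", 2)]
layers = {name: rng.normal(size=(count, dim)) for name, count in layer_sizes}

fused, layer_weights = compose_word(word_vec, layers)
print(fused.shape)
print(layer_weights)
```

The five non-word layers listed in the abstract are attended over here, while the sixth (the word layer) supplies the query. The printed layer weights sum to one and show how much each layer contributes to the fused vector for this toy word.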
