Attention-enabled Multi-layer Subword Joint Learning for Chinese Word Embedding
Springer Science and Business Media LLC
Abstract
In recent years, Chinese word embeddings have attracted significant attention in natural language processing (NLP). The complex structure of Chinese characters and the diverse influences on their meaning pose distinct challenges for semantic representation. As a result, Chinese word embeddings are mainly studied together with characters and their subcomponents. Previous research has shown that such word vectors often fail to capture the subtle semantics embedded in the complex structure of Chinese characters. They also tend to ignore the fact that subword information at different levels contributes differently to meaning. To address these problems, we present a weight-based word vector model that accounts for the internal structure of Chinese words at multiple levels. The model divides this internal structure into six layers of subword information: words, characters, components, pinyin, strokes, and structures. It derives the semantics of a Chinese word by integrating the subword information from all layers. Because each subword layer contributes differently to a word's semantics, the model uses an attention mechanism to learn weights both between and within the subword layers, so that word semantics are extracted more comprehensively. The word-level subword serves as the attention query for the subwords in the other layers in order to learn their semantic bias. Experimental results show that the proposed model improves performance across several evaluations: word similarity, word analogy, text categorization, and case studies.
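The abstract describes the composition step only at a high level. The sketch below shows one plausible reading of it and is not the paper's published formulation. It makes several assumptions. Every (layer, subword) pair gets its own vector. The word-level vector is the query. Scaled dot-product attention pools the subwords within each of the other five layers (intra-layer weights). A second attention over the five pooled layer vectors produces the inter-layer weights. The helper names (`embed`, `attend`, `compose`), the vector size `DIM`, the random initialization, the scaled dot-product scoring, and the layer labels are all hypothetical. The training objective is omitted entirely. The example word 明天 ("tomorrow") decomposes into 明 (日 + 月, left-right) and 天 (a single-component character). Its stroke strings use the common five-category code (1 horizontal, 2 vertical, 3 left-falling, 4 right-falling/dot, 5 turning), which is also an assumption.

```python
import math
import random

# Toy six-layer decomposition of 明天 ("tomorrow"); labels and codes are illustrative.
SUBWORDS = {
    "word": ["明天"],
    "character": ["明", "天"],
    "component": ["日", "月", "天"],
    "pinyin": ["ming2", "tian1"],
    "stroke": ["25113511", "1134"],  # assumed five-category stroke code
    "structure": ["left-right", "single"],
}

DIM = 8  # hypothetical embedding size
rng = random.Random(0)
_table = {}  # hypothetical embedding table keyed by (layer, token)


def embed(layer, token):
    """Return a (randomly initialized) vector for a subword in a given layer."""
    key = (layer, token)
    if key not in _table:
        _table[key] = [rng.gauss(0.0, 0.1) for _ in range(DIM)]
    return _table[key]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, vectors):
    """Single-query scaled dot-product attention: returns (weights, weighted sum)."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, v) / scale for v in vectors])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(len(query))]
    return weights, pooled


def compose(subwords):
    """Word vector queries each non-word layer, then queries the pooled layers."""
    query = embed("word", subwords["word"][0])
    names, layer_vecs, intra = [], [], {}
    for layer, tokens in subwords.items():
        if layer == "word":
            continue
        weights, pooled = attend(query, [embed(layer, t) for t in tokens])
        intra[layer] = weights  # weights within this layer
        names.append(layer)
        layer_vecs.append(pooled)
    inter_weights, context = attend(query, layer_vecs)  # weights between layers
    return context, intra, dict(zip(names, inter_weights))


context, intra, inter = compose(SUBWORDS)
for layer, weight in inter.items():
    print(f"{layer:10s} {weight:.3f}")
```

With untrained random vectors the printed weights are close to uniform. In a trained model they would reflect how much each layer contributes to the word. The sketch returns `context` on its own. How the paper combines the word vector with the attended subword context is not stated here.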
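The abstract reports gains on word similarity and word analogy but does not spell out the protocol here. This sketch assumes a common setup. Word similarity is scored as the Spearman rank correlation between model cosine similarities and human ratings. Analogies of the form a : b :: c : ? are solved with the 3CosAdd rule. The (word1, word2, human_score) triple format, the function names, and the toy two-dimensional vectors in the usage note are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def ranks(xs):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(xs, ys):
    """Spearman correlation as the Pearson correlation of the rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def word_similarity_score(vectors, pairs):
    """pairs: (word1, word2, human_score); pairs with unknown words are skipped."""
    model, human = [], []
    for w1, w2, gold in pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(gold)
    return spearman(model, human)


def analogy(vectors, a, b, c):
    """3CosAdd: the word d (not a, b, or c) maximizing cos(d, b - a + c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))
```

For example, suppose the toy vectors are 男人 = [1, 0], 女人 = [1, 1], 国王 = [3, 0], 女王 = [3, 1], and 苹果 = [0, 5]. Then `analogy(vectors, "男人", "国王", "女人")` builds the target [3, 1] and returns "女王". Excluding the three query words is standard practice, because otherwise one of them is often the nearest neighbor.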

