MLNet: a multi-level multimodal named entity recognition architecture
In the field of human–computer interaction, accurately identifying the objects under discussion helps robots accomplish downstream tasks such as decision-making or recommendation; object determination is therefore of great interest as a prerequisite task. Whether the task is named entity recognition (NER) in natural language processing (NLP) or object detection (OD) in computer vision (CV), the essence is object recognition. Multimodal approaches are now widely used in basic image recognition and natural language processing tasks. Multimodal architectures can perform entity recognition more accurately, but we find that image-text-based multimodal named entity recognition (MNER) architectures still leave room for optimization when faced with short texts and noisy images. In this study, we propose a new multi-level multimodal named entity recognition architecture, a network that extracts useful visual information to improve semantic understanding and, in turn, entity identification. Specifically, we first encode the image and the text separately and then build a symmetric Transformer-based neural network architecture for multimodal feature fusion. We use a gating mechanism to filter for visual information that is significantly related to the textual content, in order to enhance text understanding and achieve semantic disambiguation. Furthermore, we incorporate character-level vector encoding to reduce text noise. Finally, we employ Conditional Random Fields for the label classification task. Experiments on the Twitter dataset show that our model increases the accuracy of the MNER task.
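The gating step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the gate's exact form, so everything here is an assumption made for illustration: the function name `gated_fusion`, the weight matrices `w_text` and `w_visual`, the bias, and the common formulation in which a sigmoid gate computed from both modalities scales the visual vector before it is added to the text vector. It is not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def gated_fusion(text_vec, visual_vec, w_text, w_visual, bias):
    """Fuse one token's text vector with a visual vector through a gate.

    Assumed formulation (hypothetical, not taken from the paper):
        gate  = sigmoid(W_t @ text + W_v @ visual + b)
        fused = text + gate * visual      (element-wise)
    A gate value near 0 suppresses the visual feature; a value near 1
    lets it through unchanged.
    """
    t_proj = matvec(w_text, text_vec)
    v_proj = matvec(w_visual, visual_vec)
    gate = [sigmoid(a + b + c) for a, b, c in zip(t_proj, v_proj, bias)]
    fused = [t + g * v for t, g, v in zip(text_vec, gate, visual_vec)]
    return fused, gate


if __name__ == "__main__":
    # Toy 2-dimensional example with hand-picked (untrained) parameters.
    text = [1.0, 2.0]
    visual = [4.0, -2.0]
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    fused, gate = gated_fusion(text, visual, zeros, zeros, [0.0, 0.0])
    print("gate:", gate)
    print("fused:", fused)
```

In a trained model the weights and bias would be learned, so the gate would open for image regions that help disambiguate the text and close for noisy ones; with a strongly negative bias the output falls back to the text representation alone.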
Frontiers Media SA
Related Results
Imagined worldviews in John Lennon’s “Imagine”: a multimodal re-performance / Visões de mundo imaginadas no “Imagine” de John Lennon: uma re-performance multimodal
Abstract: This paper addresses the issue of multimodal re-performance, a concept developed by us, in view of the fact that the famous song “Imagine”, by John Lennon, was published ...
A Phase 1b, Dose-Finding Study Of Ruxolitinib Plus Panobinostat In Patients With Primary Myelofibrosis (PMF), Post–Polycythemia Vera MF (PPV-MF), Or Post–Essential Thrombocythemia MF (PET-MF): Identification Of The Recommended Phase 2 Dose
Abstract
Background
Myelofibrosis (MF) is a myeloproliferative neoplasm associated with progressive, debilitating symptoms that ...
Few-Shot Named Entity Recognition with Hybrid Multi-Prototype Learning
Abstract
Information extraction provides the basic technical support for knowledge graph construction and Web applications. Named entity recognition (NER) is one of the fund...
Dynamics of Mutations in Patients with ET Treated with Imetelstat
Abstract
Background: Imetelstat, a first in class specific telomerase inhibitor, induced hematologic responses in all patients (pts) with essential thrombocythemia (...
Chinese medical named entity recognition based on multimodal information fusion and hybrid attention mechanism
Chinese Medical Named Entity Recognition (CMNER) seeks to identify and extract medical entities from unstructured medical texts. Existing methods often depend on single-modality re...
Unsupervised entity linking using graph-based semantic similarity
Nowadays, human textual data constitutes a large proportion of shared information resources such as the World Wide Web (WWW). Social networks, news and learning resources as we...
Joint Extraction of Entities and Relations Based on Hybrid Feature Representations
Abstract
Although the fine-tuning pre-training model technique has obtained tremendous success in the domains of named entity recognition and relation extraction, re...
Nested Entity Recognition Method Based On Multidimensional Features And Fuzzy Localization
Abstract
Nested named entity recognition (NNER) aims to identify possibly overlapping named entities, which is a crucial and challenging sub-task in the field of named enti...

