Named Entity Recognition in Statistical Dataset Search Queries
Search engines must understand user queries to provide relevant search results. One way to improve their understanding of user intent is named entity recognition (NER), which identifies the entities in a query; knowing the types of those entities is a first step toward interpreting search intent. In this research, a dataset was constructed from the search query history of the Statistics Indonesia (Badan Pusat Statistik, BPS) website, and NER in query modeling was employed to extract entities from search queries related to statistical datasets. The research stages were query data collection, query data preprocessing, query data labeling, NER in query modeling, and model evaluation. A conditional random field (CRF) model was used for NER in query modeling under two scenarios: CRF with basic features, and CRF with basic features plus part-of-speech (POS) features. CRF was chosen for its well-known effectiveness in natural language processing (NLP), particularly in sequence labeling tasks such as NER. The basic CRF model and the CRF model with POS features achieved F1-scores of 0.9139 and 0.9110, respectively, so adding POS tagging features did not significantly improve performance. A case study on a Linked Open Data (LOD) statistical dataset indicated that searches using synonym query expansion on the entities extracted by NER in query produced better results than regular searches without query expansion. Future research is therefore recommended to explore deep learning approaches.
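The two CRF scenarios in the abstract differ only in the per-token features fed to the model. The following is a minimal sketch of how such features might be built for a search query. The abstract does not list the "basic" features, so the feature set here (lowercased word, digit and title-case flags, suffix, neighboring words, query boundaries) is an assumption, and the example query and POS tags are hypothetical rather than taken from the BPS dataset. The output is a list of feature dictionaries per query, a format accepted by CRF toolkits such as sklearn-crfsuite.

```python
from typing import Dict, List, Optional


def token_features(
    tokens: List[str],
    i: int,
    pos_tags: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Build a feature dict for the token at position i of a query.

    The feature names are illustrative assumptions, not the paper's list.
    """
    word = tokens[i]
    last = len(tokens) - 1
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isdigit": word.isdigit(),   # e.g. years such as "2020"
        "word.istitle": word.istitle(),
        "suffix3": word[-3:].lower(),
        "BOS": i == 0,                    # token starts the query
        "EOS": i == last,                 # token ends the query
    }
    # Context features: neighboring words, when they exist.
    if i > 0:
        feats["prev.lower"] = tokens[i - 1].lower()
    if i < last:
        feats["next.lower"] = tokens[i + 1].lower()
    # Second scenario: add POS features on top of the basic ones.
    if pos_tags is not None:
        feats["pos"] = pos_tags[i]
        if i > 0:
            feats["prev.pos"] = pos_tags[i - 1]
        if i < last:
            feats["next.pos"] = pos_tags[i + 1]
    return feats


def query_features(
    tokens: List[str],
    pos_tags: Optional[List[str]] = None,
) -> List[Dict[str, object]]:
    """Return one feature dict per token for a whole query."""
    if pos_tags is not None and len(pos_tags) != len(tokens):
        raise ValueError("pos_tags must have one tag per token")
    return [token_features(tokens, i, pos_tags) for i in range(len(tokens))]


# Hypothetical query ("population count Jakarta 2020") and hypothetical tags.
example_tokens = ["jumlah", "penduduk", "jakarta", "2020"]
example_pos = ["NOUN", "NOUN", "PROPN", "NUM"]

basic_scenario = query_features(example_tokens)
pos_scenario = query_features(example_tokens, example_pos)
```

Keeping the POS features behind an optional argument makes it straightforward to train and compare both scenarios on identical data, which is the comparison behind the reported F1-scores.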
Universitas Gadjah Mada

