
Named Entity Recognition in Statistical Dataset Search Queries

Search engines must understand user queries to return relevant results. One way to improve their grasp of user intent is named entity recognition (NER), which identifies the entities in a query; knowing the entity types is a first step toward understanding search intent. In this research, a dataset was constructed from the search query history of the Statistics Indonesia (Badan Pusat Statistik, BPS) website, and NER in query modeling was used to extract entities from search queries related to statistical datasets. The research stages were query data collection, preprocessing, labeling, NER in query modeling, and model evaluation. A conditional random field (CRF) model was used for NER in query modeling in two scenarios: CRF with basic features, and CRF with basic features plus part-of-speech (POS) features. CRF was chosen for its well-known effectiveness in natural language processing (NLP), particularly for sequence labeling tasks such as NER. The basic CRF and the CRF with POS features achieved F1-scores of 0.9139 and 0.9110, respectively, so adding POS tagging features did not yield a significant improvement. A case study on a Linked Open Data (LOD) statistical dataset indicated that searches using synonym query expansion on the entities found by NER in query produced better results than regular searches without query expansion. Future research is therefore recommended to explore deep learning approaches.
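The two CRF scenarios differ only in the per-token features fed to the model. The abstract does not list the actual features, so the following is a minimal sketch under assumptions: the feature names, the example query, and its POS tags are all hypothetical and chosen for illustration. It builds one feature dictionary per token, which is the kind of input that CRF toolkits such as sklearn-crfsuite accept, and adds the POS tag only when one is supplied.

```python
from typing import Dict, List, Optional, Union

FeatureValue = Union[str, bool, float]


def token_features(
    tokens: List[str], i: int, pos_tags: Optional[List[str]] = None
) -> Dict[str, FeatureValue]:
    """Build a feature dict for token i (hypothetical feature set)."""
    tok = tokens[i]
    feats: Dict[str, FeatureValue] = {
        "bias": 1.0,
        "lower": tok.lower(),
        "is_digit": tok.isdigit(),       # years such as "2020" are common in dataset queries
        "is_title": tok.istitle(),
        "prefix3": tok[:3].lower(),
        "suffix3": tok[-3:].lower(),
        "BOS": i == 0,                   # beginning of the query
        "EOS": i == len(tokens) - 1,     # end of the query
    }
    # Context features from the neighbouring tokens, when they exist.
    if i > 0:
        feats["prev_lower"] = tokens[i - 1].lower()
    if i < len(tokens) - 1:
        feats["next_lower"] = tokens[i + 1].lower()
    # Second scenario: basic features plus the POS tag of the token.
    if pos_tags is not None:
        feats["pos"] = pos_tags[i]
    return feats


def query_features(
    tokens: List[str], pos_tags: Optional[List[str]] = None
) -> List[Dict[str, FeatureValue]]:
    """Feature sequence for a whole query, with or without POS features."""
    if pos_tags is not None and len(pos_tags) != len(tokens):
        raise ValueError("pos_tags must have one tag per token")
    return [token_features(tokens, i, pos_tags) for i in range(len(tokens))]


# Made-up example query and tags, not taken from the BPS dataset.
query = ["jumlah", "penduduk", "jakarta", "2020"]
pos = ["NOUN", "NOUN", "PROPN", "NUM"]

basic = query_features(query)          # scenario 1: basic features only
with_pos = query_features(query, pos)  # scenario 2: basic + POS features
```

Keeping the POS tag as an optional argument means both scenarios share the same extraction code, so any difference in F1-score can be attributed to the extra feature alone.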

Related Results

Graph-based interactive bibliographic information retrieval systems
In the big data era, we have witnessed the explosion of scholarly literature. This explosion has imposed challenges to the retrieval of bibliographic information. Retrieval of inte...
Unsupervised entity linking using graph-based semantic similarity
Nowadays, the human textual data constitutes a great proportion of the shared information resources such as World Wide Web (WWW). Social networks, news and learning resources as we...
HDI Corpus: A Dataset for Named Entity Recognition for In-Context Herb-Drug Interactions
This article proposes a new dataset for Named Entity Recognition based on PubMed articles and aiming to address the problem of Herb-Drug Interactio...
Joint Extraction of Entities and Relations Based on Hybrid Feature Representations
Although the fine-tuning pre-training model technique has obtained tremendous success in the domains of named entity recognition and relation extraction, re...
Active learning for Named Entity Recognition in Kannada
Named Entity Recognition (NER) task aims at automatically recognising and classifying named entities in a given natural language input. Majority of the studies related to ...
