Handling Compound Hindi OOV words in web queries
Abstract
Handling out-of-vocabulary (OOV) words remains an open problem in NLP. A word is called OOV when the morphological analyser cannot find a morpheme for it. If such words are not identified, they can obscure the meaning of the sentence, and they can severely affect IR systems that process queries containing them. Detecting and identifying OOV words in information retrieval is a challenging task, and it becomes harder still in cross-lingual information retrieval (CLIR) because of issues in query translation. The objective of this paper is to understand how web queries containing these words affect the retrieval effectiveness of web searches. We also propose an algorithm to detect and handle compound OOV words in Hindi web queries. Our results show an increase in precision of 8.53% for one-word web queries consisting of a single OOV word and 15.68% for multi-word queries containing at least one OOV word.
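The abstract does not spell out the proposed algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy lexicon (`LEXICON`) standing in for the morphological analyser's vocabulary, treats any token missing from it as OOV, and tries to split a compound OOV token into known constituents by plain concatenation. The function names `is_oov`, `split_compound`, and `expand_query` are hypothetical, and the sketch ignores sandhi and other morphological changes at compound boundaries.

```python
from typing import List, Optional, Set

# Hypothetical toy lexicon; a real system would query a morphological analyser.
LEXICON: Set[str] = {"डाक", "घर", "समाचार", "पत्र", "राज", "मार्ग"}


def is_oov(word: str, lexicon: Set[str] = LEXICON) -> bool:
    """A word is treated as OOV when the lexicon has no entry for it."""
    return word not in lexicon


def split_compound(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Segment `word` into lexicon entries, longest prefix first.

    Returns the list of constituents, or None when no full segmentation exists.
    Backtracks to shorter prefixes if the remainder cannot be segmented.
    """
    if not word:
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


def expand_query(query: str, lexicon: Set[str] = LEXICON) -> List[str]:
    """Replace each compound OOV token with its constituents when a split exists."""
    terms: List[str] = []
    for token in query.split():
        if is_oov(token, lexicon):
            parts = split_compound(token, lexicon)
            # Keep the original token when it cannot be segmented.
            terms.extend(parts if parts else [token])
        else:
            terms.append(token)
    return terms
```

In this sketch, a query token such as डाकघर is absent from the toy lexicon, so it is flagged as OOV and split into डाक and घर, while an OOV token with no valid segmentation is passed through unchanged.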

