
Neural Embedding-Based Metrics for Pre-retrieval Query Performance Prediction

<p>Pre-retrieval Query Performance Prediction (QPP) methods are oblivious to the performance of the retrieval model, as they predict query difficulty before observing the set of documents retrieved for the query. Among pre-retrieval query performance predictors, specificity-based metrics investigate how corpus-level, query-level and corpus-query-level statistics can be used to predict the performance of the query. In this thesis, we explore how neural embeddings can be utilized to define corpus-independent and semantics-aware specificity metrics. Our metrics are based on the intuition that a term that is closely surrounded by other terms in the embedding space is more likely to be specific, while a term surrounded by less closely related terms is more likely to be generic. On this basis, we leverage geometric properties between embedded terms to define four groups of metrics: (1) neighborhood-based, (2) graph-based, (3) cluster-based and (4) vector-based metrics. Moreover, we employ learning-to-rank techniques to analyze the importance of individual specificity metrics. To evaluate the proposed metrics, we curated and publicly share a test collection of term specificity measurements defined based on the Wikipedia category hierarchy and the DMOZ taxonomy. We report extensive experiments on the effectiveness of our metrics, covering metric comparison, an ablation study and comparison against state-of-the-art baselines. Our experiments show that the proposed set of pre-retrieval QPP metrics, based on the properties of pre-trained neural embeddings, is more effective for performance prediction than state-of-the-art methods. Our findings are based on the Robust04, ClueWeb09 and Gov2 corpora and their associated TREC topics.</p>
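The intuition behind the neighborhood-based group can be illustrated with a minimal sketch. It is not the thesis's actual metric definition: the function name `neighborhood_specificity`, the choice of mean cosine similarity to the `k` nearest neighbours, and the toy 2-D vectors are all assumptions made for illustration. A real setting would use pre-trained embeddings rather than hand-written vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood_specificity(term, embeddings, k=2):
    """Hypothetical score: mean cosine similarity between `term`
    and its k most similar terms in `embeddings`.

    A higher value means the term is more tightly surrounded by
    its neighbours, which the abstract associates with specificity.
    """
    target = embeddings[term]
    sims = sorted(
        (cosine(target, vec) for other, vec in embeddings.items() if other != term),
        reverse=True,
    )
    top = sims[:k]
    return sum(top) / len(top)


# Toy vectors invented for this example (not taken from any real model).
toy_embeddings = {
    "poodle": [0.90, 0.10],
    "beagle": [0.88, 0.14],
    "terrier": [0.91, 0.08],
    "thing": [0.10, 0.90],
    "music": [-0.70, 0.30],
    "tax": [0.20, -0.80],
}

scores = {
    term: neighborhood_specificity(term, toy_embeddings)
    for term in ("poodle", "thing")
}
for term, score in scores.items():
    print(f"{term}: {score:.3f}")
```

In this toy space, "poodle" sits close to "beagle" and "terrier" and therefore scores higher than "thing", whose nearest neighbours are only loosely related. The score uses only the embedding vectors, with no corpus statistics, which is what makes metrics of this kind corpus-independent.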
Ryerson University Library and Archives
