
A prediction model for web search hit counts using word frequencies

A search engine user with a well-defined information need is interested not in getting thousands of hits but in a few hits that are all highly relevant to their search. Often search words need to be refined and augmented to narrow results to more relevant pages. However, an overly specific query may lead to no hits at all, while most typical queries lead to thousands or even millions of them; both are undesirable outcomes. This paper suggests a query rewriting method for generating alternative query strings and proposes a hit count prediction model for predicting the number of search engine hits for each alternative query string, based on the English language frequencies of the words in the search terms. Using the hit count prediction model, different types of search strategies, such as a lowest hit count query preference, can be utilized to improve users’ search experience. We present an evaluation experiment of the hit count prediction model for three major search engines. We also discuss and quantify how far the Google, Yahoo! and Bing search engines diverge from monotonic behaviour, considering negative and positive search terms separately.

SAGE Publications

Tian Tian, Soon Ae Chun, James Geller

Journal of Information Science

2011
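The abstract does not give the model's formula, so the following Python sketch is only a rough illustration of the idea under stated assumptions, not the authors' method. It assumes words occur in pages independently, so a query's predicted hit count is an index size times the product of per-word page fractions, using 1 - p for a negated term. The word fractions, the index size, the subset-dropping rewrite rule and the `min_hits` threshold are all hypothetical and are not taken from the paper; a real model would derive the fractions from English word-frequency data.

```python
from itertools import combinations
from math import prod

# HYPOTHETICAL estimates of the fraction of indexed pages containing each word.
# Made up for illustration; a real model would derive them from English
# word-frequency data.
DOC_FRACTION = {
    "jaguar": 0.002,
    "car": 0.05,
    "speed": 0.03,
    "animal": 0.01,
    "habitat": 0.004,
}

INDEX_SIZE = 10_000_000_000  # HYPOTHETICAL number of pages in the index


def predict_hits(positive, negative=()):
    """Predict hits for an AND query, ASSUMING words occur independently.

    Each positive term keeps a fraction p of pages; each negative term
    (e.g. -car) keeps a fraction 1 - p. Unknown words raise KeyError.
    """
    keep = prod(DOC_FRACTION[w] for w in positive)
    keep *= prod(1 - DOC_FRACTION[w] for w in negative)
    return round(INDEX_SIZE * keep)


def alternatives(positive, negative=()):
    """HYPOTHETICAL rewrite rule: every non-empty subset of the positive
    terms, always keeping all negative terms."""
    for k in range(1, len(positive) + 1):
        for subset in combinations(positive, k):
            yield subset, tuple(negative)


def lowest_hit_query(positive, negative=(), min_hits=1):
    """Lowest-hit-count preference: return (hits, positive, negative) for the
    alternative with the smallest predicted count that still reaches
    min_hits (an ASSUMED threshold to avoid zero-hit queries), or None."""
    scored = [
        (predict_hits(pos, neg), pos, neg)
        for pos, neg in alternatives(positive, negative)
    ]
    viable = [s for s in scored if s[0] >= min_hits]
    return min(viable) if viable else None


if __name__ == "__main__":
    # Query: jaguar speed habitat -car, requiring at least 10,000 predicted hits.
    print(lowest_hit_query(("jaguar", "speed", "habitat"), ("car",), min_hits=10_000))
```

Under this independence assumption, adding a positive term multiplies the prediction by p ≤ 1 and adding a negative term multiplies it by 1 - p ≤ 1, so predictions never increase as terms are added. Measured hit counts from real engines need not behave this way, which is presumably the kind of divergence from monotonic behaviour the paper quantifies.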

