Automatic new topic identification in search engine transaction logs
Purpose: Content analysis of search engine user queries is an important task, since successful exploitation of the content of queries can lead to more efficient information retrieval algorithms for search engines, which can in turn offer custom‐tailored services to the web user. Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. The purpose of this study is to address these issues.

Design/methodology/approach: This study applies genetic algorithms and Dempster‐Shafer theory, as proposed by He et al., to automatically identify topic changes in a user session by using statistical characteristics of queries, such as time intervals and query reformulation patterns. The two techniques are applied to a sample data log from the Norwegian search engine FAST (currently owned by Overture) to identify topic changes in the log.

Findings: 97.7 percent of topic shifts and 87.2 percent of topic continuations were estimated correctly. The findings are consistent with the previous application of Dempster‐Shafer theory and genetic algorithms to a different search engine data log. This result can be interpreted as an indication that content‐ignorant topic identification, using query patterns and time intervals, is a promising line of research.

Originality/value: The study examines an important dimension of user behavior in information retrieval.
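
To make the evidence-combination idea concrete, the following is a minimal sketch of Dempster's rule of combination applied to two sources of evidence about a query: its time interval and its reformulation pattern. It is an illustration only, not the procedure used in the study. The names `combine`, `SHIFT`, `CONT`, `EITHER`, and every mass value are hypothetical. The abstract does not say how the genetic algorithm enters the method, so that step is not sketched here.

```python
from itertools import product

# Frame of discernment: a query either starts a new topic or continues one.
SHIFT = frozenset({"shift"})
CONT = frozenset({"continuation"})
EITHER = SHIFT | CONT  # mass placed here expresses ignorance


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Contradictory hypotheses: their joint mass counts as conflict.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("evidence sources are in total conflict")
    # Renormalize so the remaining masses sum to one.
    return {subset: mass / (1.0 - conflict) for subset, mass in combined.items()}


# Hypothetical masses (assumed for illustration, not taken from the study).
time_evidence = {SHIFT: 0.6, CONT: 0.1, EITHER: 0.3}
pattern_evidence = {SHIFT: 0.5, CONT: 0.2, EITHER: 0.3}

belief = combine(time_evidence, pattern_evidence)
for subset, mass in belief.items():
    print(sorted(subset), round(mass, 3))
```

With these assumed inputs both sources lean toward a topic shift, so the combined mass on a shift is higher than either source assigns on its own.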

