Automatic thesaurus for enhanced Chinese text retrieval
Asian languages such as Japanese, Korean and, in particular, Chinese are gaining popularity in the information retrieval (IR) domain. The quality of an IR system has traditionally been judged by its retrieval effectiveness, which in turn is commonly measured by recall and precision. This paper proposes and describes a process for automatically generating a Chinese thesaurus that can supply terms related to a user’s query and so enhance retrieval effectiveness. Because no automatic Chinese thesauri existed, techniques used in English thesaurus generation were evaluated and adapted to generate a Chinese equivalent. The thesaurus is generated by computing co‐occurrence values between domain‐specific terms found in a document collection. These co‐occurrence values are in turn derived from the term frequencies and document frequencies of the terms. A set of experiments was then carried out on a document test set to evaluate the applicability of the thesaurus. The results confirmed that such an automatically generated thesaurus can improve the retrieval effectiveness of a Chinese IR system.
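The abstract does not give the co‐occurrence formula, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes the documents are already segmented into terms, uses a tf-idf style term weight, and uses an asymmetric co‐occurrence weight (shared weight normalised by the source term's total weight). The function names `build_thesaurus` and `expand_query`, the weighting scheme, and the sample documents are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_thesaurus(docs):
    """docs: list of token lists (already segmented).

    Returns {term: {related_term: weight}} with asymmetric weights.
    """
    n_docs = len(docs)
    tf = [Counter(doc) for doc in docs]               # term frequency per document
    df = Counter(t for counts in tf for t in counts)  # document frequency per term

    # Assumed tf-idf style weight of term t in document i (not from the paper).
    def weight(i, t):
        return tf[i][t] * (math.log(n_docs / df[t]) + 1.0)

    total = defaultdict(float)   # summed weight of each term over all documents
    shared = defaultdict(float)  # summed min(weight a, weight b) per co-occurring pair
    for i, counts in enumerate(tf):
        terms = sorted(counts)
        for a in terms:
            total[a] += weight(i, a)
        for x, a in enumerate(terms):
            for b in terms[x + 1:]:
                shared[(a, b)] += min(weight(i, a), weight(i, b))

    thesaurus = defaultdict(dict)
    for (a, b), s in shared.items():
        thesaurus[a][b] = s / total[a]  # how strongly a suggests b
        thesaurus[b][a] = s / total[b]  # how strongly b suggests a
    return thesaurus


def expand_query(query_terms, thesaurus, top_k=2):
    """Append up to top_k of the strongest related terms for each query term."""
    expanded = list(query_terms)
    for term in query_terms:
        related = sorted(thesaurus.get(term, {}).items(),
                         key=lambda kv: (-kv[1], kv[0]))
        for other, _ in related[:top_k]:
            if other not in expanded:
                expanded.append(other)
    return expanded


# Hypothetical, pre-segmented sample documents.
docs = [
    ["信息", "检索", "系统"],
    ["信息", "检索", "词库"],
    ["词库", "生成"],
]
thesaurus = build_thesaurus(docs)
print(expand_query(["检索"], thesaurus, top_k=1))
```

Normalising by the source term's own total weight makes the relation directional: a rare term can point strongly to a frequent one without the reverse holding. Whether that matches the weighting used in the paper cannot be determined from the abstract alone.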

