Robust knowledge extraction over large text collections

Automatic knowledge extraction over large text collections is challenging because of several constraints: the need for large annotated training data, the extensive manual processing of data that is required, and the large number of domain-specific terms. To address these constraints, this study proposes and develops a complete solution for extracting knowledge from large text collections with minimal human intervention. As a testbed, a novel, robust, high-quality knowledge extraction system called RIKE (Robust Iterative Knowledge Extraction) has been developed. RIKE consists of two major components: DocSpotter and HiMMIE. DocSpotter queries the text collection and retrieves promising documents for extraction. HiMMIE extracts target entities from the documents selected by DocSpotter, using a Mixture Hidden Markov Model.

Three research questions are examined to evaluate RIKE: 1) How accurately does RIKE retrieve promising documents for information extraction from large text collections such as MEDLINE or TREC? 2) Does an ontology enhance RIKE's extraction accuracy in retrieving promising documents? 3) How well does RIKE extract target entities from a large medical text collection, MEDLINE?

The major contributions of this study are: 1) an automatic, unsupervised query generation method for effective retrieval from text databases, proposed and evaluated; 2) Mixture Hidden Markov Models for automatic instance extraction, proposed and tested; 3) three ontology-driven query expansion algorithms, proposed and evaluated; and 4) the adoption of object-oriented methodologies for the knowledge extraction system.

Extensive experiments show that RIKE is a robust, high-quality knowledge extraction technique. DocSpotter outperforms other leading techniques for retrieving promising documents for extraction by 15.5% to 35.34% in P@20 (precision over the top 20 retrieved documents). HiMMIE improves extraction accuracy by 9.43% to 24.67% in F-measure.
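HiMMIE's extraction step rests on hidden Markov model decoding: each token is assigned a hidden state, and tokens labeled with an entity state become extracted instances. The sketch below is a deliberate simplification, assuming a single HMM decoded with the standard Viterbi algorithm rather than the mixture model HiMMIE is built on; the abstract does not describe how the mixture components are combined. The state names (`O`, `GENE`), the toy probabilities, and the `viterbi` helper are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy model for illustration only; NOT HiMMIE's parameters.
# This is a single HMM, not the Mixture HMM described in the abstract.
STATES = ["O", "GENE"]  # O = token outside any target entity
START = {"O": 0.8, "GENE": 0.2}
TRANS = {
    "O": {"O": 0.7, "GENE": 0.3},
    "GENE": {"O": 0.6, "GENE": 0.4},
}
EMIT = {
    "O": {"the": 0.5, "gene": 0.3, "mutation": 0.2},
    "GENE": {"BRCA1": 0.6, "TP53": 0.4},
}


def viterbi(
    tokens: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
    unk_p: float = 1e-6,
) -> List[str]:
    """Return the most likely state sequence for `tokens` (log-space Viterbi).

    Assumes all start and transition probabilities are > 0; tokens unseen
    in a state's emission table get the smoothing probability `unk_p`.
    """
    if not tokens:
        return []

    def emit(state: str, token: str) -> float:
        return math.log(emit_p[state].get(token, unk_p))

    # score[s]: best log-probability of any state path ending in s so far.
    score = {s: math.log(start_p[s]) + emit(s, tokens[0]) for s in states}
    backptrs: List[Dict[str, str]] = []
    for token in tokens[1:]:
        prev_best: Dict[str, str] = {}
        new_score: Dict[str, float] = {}
        for s in states:
            # Best predecessor of s under the transition model.
            p = max(states, key=lambda q: score[q] + math.log(trans_p[q][s]))
            prev_best[s] = p
            new_score[s] = score[p] + math.log(trans_p[p][s]) + emit(s, token)
        backptrs.append(prev_best)
        score = new_score

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for prev_best in reversed(backptrs):
        path.append(prev_best[path[-1]])
    path.reverse()
    return path


print(viterbi(["the", "BRCA1", "gene"], STATES, START, TRANS, EMIT))
# prints ['O', 'GENE', 'O']
```

In this toy run the token labeled `GENE` (here `BRCA1`) would be the extracted instance. Working in log space avoids numeric underflow on long documents, which matters for the large collections the abstract targets.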
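The results above are reported in P@20 (for DocSpotter's retrieval) and F-measure (for HiMMIE's extraction). As a minimal sketch, assuming binary relevance judgments for retrieved documents and exact-match comparison of extracted entities against a gold set (the abstract does not specify RIKE's exact evaluation protocol), the two metrics can be computed as follows. The function names and the toy data are hypothetical.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """P@k: fraction of the top-k ranked documents that are relevant.

    Follows the common convention of dividing by k even when fewer than
    k documents were retrieved.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def f_measure(extracted: Set[str], gold: Set[str], beta: float = 1.0) -> float:
    """F-beta over exact-match entity sets (beta=1 gives the usual F1)."""
    true_pos = len(extracted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(extracted)
    recall = true_pos / len(gold)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical toy data, not results from the study.
ranked = [f"d{i}" for i in range(1, 26)]
relevant = {"d1", "d3", "d5", "d7", "d9"}
print(precision_at_k(ranked, relevant))  # 5 hits / 20 = 0.25
print(f_measure({"BRCA1", "TP53", "p53", "EGFR"}, {"BRCA1", "TP53", "KRAS"}))  # 4/7, about 0.571
```

With beta = 1 the F-measure is the harmonic mean of precision and recall, so a system cannot score well by maximizing only one of them.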
Drexel University Libraries

Related Results

Utilizing Large Language Models for Geoscience Literature Information Extraction
Extracting information from unstructured and semi-structured geoscience literature is a crucial step in conducting geological research. The traditional machine learning extraction ...
On Flores Island, do "ape-men" still exist? https://www.sapiens.org/biology/flores-island-ape-men/
E-Press and Oppress
From elephants to ABBA fans, silicon to hormone, the following discussion uses a new research method to look at printed text, motion pictures and a te...
Optimization of ultrasonic extraction of Lycium barbarum polysaccharides using response surface methodology
Abstract Ultrasonic extraction was a new development method to achieve high-efficiency extraction of Lycium barbarum polysaccharides instead of hot water extraction....
Λc Physics at BESIII
In 2014 BESIII collected a data sample of 567 pb⁻¹ at √s = 4.6 GeV, which is just above the Λc⁺Λ̄c⁻ pair production threshold. By analyz...
KNOWLEDGE IN PRACTICE
Knowledge is an understanding of someone or something, such as facts, information, descriptions or skills, which is acquired by individuals through education, learning, experience ...
Strong vb-dominating and vb-independent sets of a graph
Let [Formula: see text] be a graph. A vertex [Formula: see text] strongly (weakly) b-dominates block [Formula: see text] if [Formula: see text] ([Formula: see text]) for every vert...
Enhanced Scene Text Extraction through “Texture Analysis and Deep Convolutional Networks"
The widespread use of portable cameras and advancements in visual computing have made extracting text from images captured in natural settings an increasingly important area of res...
