Unsupervised Evaluation of Entity Resolution
Entity resolution is the problem of identifying records that refer to the same entity within one database or across multiple databases. Applications of entity resolution range from health and social science research to national security and online commerce. Entity resolution can be viewed as a classification task where pairs of records are classified as matches (referring to the same entity) or non-matches (referring to different entities). Alternatively, clustering-based entity resolution methods generate clusters of records such that each cluster refers to one entity, and each entity is represented by one cluster. If ground truth data in the form of known matches and non-matches are available, then performance measures such as precision, recall, and the F-measure are commonly used to evaluate the quality of entity resolution methods. In practical applications, however, ground truth data are often unavailable, incomplete, or biased, making quality evaluation challenging. To address this gap, we develop multiple methods to evaluate the quality of an entity resolution result without the need for ground truth data by estimating the numbers of true matches, false matches, and missed matches. These estimates allow the calculation of estimated precision, recall, and F-measure. Our methods are based either on analysing links (classified record pairs) or on the clustering structure provided by an entity resolution method. We validate our methods on multiple data sets from diverse domains, showing they can obtain precision and recall estimates close to their true values.
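The relationship between the three counts named in the abstract and the resulting quality measures can be illustrated with a minimal Python sketch. This is not the authors' estimation procedure, which the abstract does not describe; it only shows the standard formulas applied once counts of true, false, and missed matches are in hand (whether estimated or taken from ground truth). The function names `cluster_links` and `quality_from_counts` are hypothetical, and the F-measure is assumed here to be the balanced harmonic mean of precision and recall.

```python
from itertools import combinations


def cluster_links(clusters):
    """Return the set of record pairs (links) implied by a clustering.

    Every pair of records that share a cluster is treated as a match.
    Singleton clusters contribute no links.
    """
    links = set()
    for cluster in clusters:
        # Sort so each pair has one canonical orientation.
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links


def quality_from_counts(true_matches, false_matches, missed_matches):
    """Compute precision, recall, and F-measure from three counts.

    The counts may be estimates; this function only applies the
    standard definitions and guards against division by zero.
    """
    predicted = true_matches + false_matches   # all pairs classified as matches
    actual = true_matches + missed_matches     # all pairs that truly match
    precision = true_matches / predicted if predicted else 0.0
    recall = true_matches / actual if actual else 0.0
    denom = precision + recall
    # Assumption: balanced F-measure (harmonic mean of precision and recall).
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical clustering of four records.
    print(cluster_links([{"r1", "r2", "r3"}, {"r4"}]))
    # Hypothetical counts, for illustration only.
    print(quality_from_counts(80, 20, 20))
```

With the hypothetical counts above (80 true, 20 false, 20 missed matches), precision and recall both come out at about 0.8, and so does the F-measure.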
Association for Computing Machinery (ACM)

