Unsupervised Evaluation of Entity Resolution
Entity resolution is the problem of identifying records, within one database or across multiple databases, that refer to the same entity. Applications of entity resolution range from health and social science research to national security and online commerce. Entity resolution can be viewed as a classification task in which pairs of records are classified as matches (referring to the same entity) or non-matches (referring to different entities). Alternatively, clustering-based entity resolution methods generate clusters of records such that each cluster refers to one entity, and each entity is represented by one cluster. If ground truth data in the form of known matches and non-matches are available, then performance measures such as precision, recall, and the F-measure are commonly used to evaluate the quality of entity resolution methods. In practical applications, however, ground truth data are often unavailable, incomplete, or biased, making quality evaluation challenging. To address this gap, we develop multiple methods to evaluate the quality of an entity resolution result without the need for ground truth data by estimating the numbers of true matches, false matches, and missed matches. These estimates allow the calculation of estimated precision, recall, and F-measure. Our methods are based on analysing either the links (classified record pairs) or the clustering structure provided by an entity resolution method. We validate our methods on multiple data sets from diverse domains, showing that they can obtain precision and recall estimates close to the true values.
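As a rough illustration of the quantities named in the abstract, the sketch below shows how precision, recall, and the F-measure follow from counts of true, false, and missed matches, and how a clustering can be expanded into links (record pairs). The abstract does not describe how the counts are estimated, so this sketch simply assumes they are already available; the function names `cluster_links` and `quality_estimates` and the example numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def cluster_links(clusters):
    """Expand clusters into the set of within-cluster record pairs (links).

    Each cluster is an iterable of record identifiers. Pairs are stored
    in sorted order so that (a, b) and (b, a) count as the same link.
    """
    links = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links


def quality_estimates(true_matches, false_matches, missed_matches):
    """Return (precision, recall, F-measure) from three counts.

    The counts may be exact (from ground truth) or estimated; this
    function does not care how they were obtained (an assumption of
    this sketch, not a statement about the paper's method).
    """
    predicted = true_matches + false_matches   # links the method reported
    actual = true_matches + missed_matches     # links that truly exist
    precision = true_matches / predicted if predicted else 0.0
    recall = true_matches / actual if actual else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical clustering of four records into two entities.
    print(sorted(cluster_links([{"r1", "r2", "r3"}, {"r4"}])))

    # Hypothetical estimated counts: 80 true, 20 false, 40 missed matches.
    p, r, f = quality_estimates(80, 20, 40)
    print(f"precision={p:.3f} recall={r:.3f} f_measure={f:.3f}")
```

The link expansion means that a clustering result and a pairwise classification result can be scored with the same three counts, which is one plausible reason to treat both views through a common pair-based interface.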
Association for Computing Machinery (ACM)

