
Evaluation measure for group-based record linkage

Introduction: The robustness of record linkage evaluation measures is highly important, because linkage techniques are assessed on the basis of these measures. However, little research has examined whether existing evaluation measures are suitable for linking groups of records. Linkage quality is generally evaluated with traditional measures such as precision and recall. As we show, these traditional measures are not suitable for evaluating groups of linked records, because they assess the quality of individual record pairs rather than the quality of records grouped into clusters.

Objectives: We highlight the shortcomings of traditional evaluation measures and then propose a novel method to evaluate clustering quality in the context of group-based record linkage.

Methods: The proposed linkage evaluation method assesses how well individual records have been allocated to predicted groups (clusters) with respect to ground-truth data. We first identify the best representative predicted cluster for each ground-truth cluster; based on the resulting mapping, each record in a ground-truth cluster is assigned to one of seven categories. These categories reflect how well the linkage technique assigned records to groups.

Results: We empirically evaluate the proposed method on real-world data and show that it better reflects the quality of the clusters generated by three group-based record linkage techniques. We also show that traditional measures such as precision and recall can produce ambiguous results, whereas our method does not.

Conclusions: The proposed evaluation method provides unambiguous results for the assessed group-based record linkage approaches. The method comprises seven categories that reflect how each record was predicted, providing more detailed information about the quality of the linkage result. This supports better-informed decisions about which linkage technique is best suited to a given linkage application.
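The pair-based view criticised in the Introduction can be made concrete with a small sketch. It computes classic pairwise precision and recall from two cluster assignments (dictionaries mapping a hypothetical record identifier to a cluster label) and shows two quite different toy clusterings that receive identical scores: one recovers three of four records in each of two groups, the other recovers one group perfectly and leaves the second group completely unlinked. The function names and toy data are illustrative assumptions, not the data or code used in the study.

```python
from itertools import combinations


def pairs_within_clusters(labels):
    """Return the set of unordered record pairs that share a cluster label."""
    clusters = {}
    for record, label in labels.items():
        clusters.setdefault(label, []).append(record)
    pairs = set()
    for members in clusters.values():
        # Sorting makes each pair canonical, so (a, b) and (b, a) count once.
        pairs.update(combinations(sorted(members), 2))
    return pairs


def pairwise_precision_recall(truth, predicted):
    """Classic pair-based precision and recall of a predicted clustering."""
    true_pairs = pairs_within_clusters(truth)
    pred_pairs = pairs_within_clusters(predicted)
    tp = len(true_pairs & pred_pairs)  # pairs linked in both truth and prediction
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    return precision, recall


# Hypothetical toy data: two ground-truth groups of four records each.
truth = {r: "A" for r in ("a1", "a2", "a3", "a4")}
truth.update({r: "B" for r in ("b1", "b2", "b3", "b4")})

# Clustering X: three of four records recovered in each group.
pred_x = {"a1": 1, "a2": 1, "a3": 1, "a4": 2, "b1": 3, "b2": 3, "b3": 3, "b4": 4}
# Clustering Y: group A recovered perfectly, group B left as singletons.
pred_y = {"a1": 1, "a2": 1, "a3": 1, "a4": 1, "b1": 2, "b2": 3, "b3": 4, "b4": 5}

print(pairwise_precision_recall(truth, pred_x))  # (1.0, 0.5)
print(pairwise_precision_recall(truth, pred_y))  # (1.0, 0.5)
```

Both clusterings produce 6 correct pairs out of 6 predicted and 12 true pairs, so the pair counts alone cannot tell them apart.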
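The mapping step described under Methods can be sketched as follows. The abstract does not say how the best representative predicted cluster is chosen, so this sketch assumes it is the predicted cluster containing the most records of the ground-truth cluster, with ties going to the cluster that holds fewer records from outside that ground-truth cluster. The seven record categories are not reproduced here; the hypothetical helper `placed_in_representative` only flags whether each record landed in its cluster's representative, as a much coarser stand-in. All names and toy data are illustrative.

```python
from collections import Counter, defaultdict


def group(labels):
    """Invert a record -> cluster-label mapping into cluster-label -> set of records."""
    groups = defaultdict(set)
    for record, label in labels.items():
        groups[label].add(record)
    return groups


def best_representatives(truth, predicted):
    """Map each ground-truth cluster to an assumed 'best representative' predicted cluster.

    Assumption: the representative is the predicted cluster holding the most records
    of the ground-truth cluster; ties go to the one with fewer outside records, then
    to the first one reached in sorted record order.
    """
    pred_groups = group(predicted)
    mapping = {}
    for true_label, members in group(truth).items():
        # How many of this ground-truth cluster's records fall in each predicted cluster.
        overlap = Counter(predicted[r] for r in sorted(members))
        mapping[true_label] = max(
            overlap,
            key=lambda p: (overlap[p], -len(pred_groups[p] - members)),
        )
    return mapping


def placed_in_representative(truth, predicted):
    """Hypothetical coarse stand-in for the seven categories: True if a record
    sits in the representative cluster of its ground-truth cluster."""
    mapping = best_representatives(truth, predicted)
    return {r: predicted[r] == mapping[truth[r]] for r in truth}


# Hypothetical toy data: two ground-truth groups of four records each.
truth = {r: "A" for r in ("a1", "a2", "a3", "a4")}
truth.update({r: "B" for r in ("b1", "b2", "b3", "b4")})
pred_x = {"a1": 1, "a2": 1, "a3": 1, "a4": 2, "b1": 3, "b2": 3, "b3": 3, "b4": 4}
pred_y = {"a1": 1, "a2": 1, "a3": 1, "a4": 1, "b1": 2, "b2": 3, "b3": 4, "b4": 5}

print(best_representatives(truth, pred_x))  # {'A': 1, 'B': 3}
print(sum(placed_in_representative(truth, pred_x).values()))  # 6
print(sum(placed_in_representative(truth, pred_y).values()))  # 5
```

On this toy data both clusterings score pairwise precision 1.0 and recall 0.5, yet the per-record flags differ (six versus five records placed in their representative cluster), which hints at why a record-level breakdown can separate outcomes that pair counts cannot.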

Related Results

Federated Data Linkage in Practice
In recent years, great strides have been made towards the deployment of federated systems for data research, including exploring federated trusted research environments (TREs). The...
Perspectives on linkage to care for patients diagnosed with HIV: A qualitative study at a rural health center in South Western Uganda
Linkage to care for newly diagnosed human immunodeficiency virus (HIV) patients is important to ensure that patients have good access to care. However, there is little information ...
Effects of herbal tea (Platostoma palustre) on the Hyperlipidemia in vivo
Platostoma palustre jelly is a traditional food. Platostoma palustre has been used as folk medicine and is effective against heat-shock, hypertension and diabetes. Therefore, the a...
Abstract 1341: Identification of significant linkage evidence for lethal prostate cancer on chromosome arm 11p15.
Abstract We performed genome wide linkage analysis in a set of high-risk prostate cancer pedigrees, each with 3 or more sampled cases whose death certificate indicat...
Linking Sensitive Data – Applications, Techniques, and Challenges
Introduction: The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging fro...
DLforum – A multidisciplinary online discussion forum for data linkage researchers and practitioners
Data linkage, the process of identifying records that refer to the same entities across databases, is a crucial component of Population Data Science. Data linkage has a history goi...
Towards Streamlined Transparent Data Linkage
Linked data is a powerful resource within data analytics and population-level research. However, methods for linkage vary and the choice of approach can impact downstream usage of ...
An Evaluation Framework for Privacy-Preserving Record Linkage
Privacy-preserving record linkage (PPRL) addresses the problem of identifying matching records from different databases that correspond to the same real-world entities using quasi-...
