Towards Streamlined Transparent Data Linkage

Linked data is a powerful resource within data analytics and population-level research. However, linkage methods vary, and the choice of approach can affect downstream use of the data by introducing assumptions and biases into the resulting links. Stringent linkage methods strengthen confidence in the links they identify, at the risk of missing true links; lenient rules or ill-considered comparisons may instead introduce false-positive links. Choosing an approach is therefore non-trivial: it requires careful selection of preprocessing steps, model development and quality review to ensure suitable outputs, which can demand significant human expertise and insight. Real-world population-scale linkage can benefit from the automation and scalability offered by modern data centres, with many tasks eligible for pipelining, such as applying predefined cleaning routines, training defined models, and generating mapping tables. Despite this, there are still pinch points that require human interaction, such as selecting appropriate linkage fields, blocking rules and comparison methods, and reviewing the quality of predictions. We present an approach that provides scalable automation in linkage pipelines whilst retaining transparency of the linkage process for downstream users, providing them with a dataset’s life history. The output for a given dataset is a versioned catalogue documenting the dataset’s journey, with transparent reporting of data origin, linkage settings, routines, and privacy-preserving quality analysis for inspection. This gives researchers insight into how the linkage process may affect their data and provides confidence in data usage. The transparency also works in both directions, allowing users to provide feedback and iteratively refine linkage approaches.
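As an illustration only, one way a versioned catalogue entry of this kind might be represented is sketched below. The abstract does not specify a schema, so every class name, field name and example value here (`LinkageSettings`, `CatalogueEntry`, `version_id`, the dataset and rules) is hypothetical. The sketch derives a deterministic version identifier from the recorded origin, routines, settings and aggregate quality metrics, so that any change to the linkage configuration yields a new version.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass
class LinkageSettings:
    # Hypothetical fields: the human-selected choices named in the abstract.
    linkage_fields: tuple
    blocking_rules: tuple
    comparison_methods: dict


@dataclass
class CatalogueEntry:
    # Hypothetical schema for one dataset's documented "life history".
    dataset_name: str
    data_origin: str
    cleaning_routines: tuple
    settings: LinkageSettings
    quality_summary: dict  # aggregate metrics only, no record-level data

    def version_id(self) -> str:
        """Return a short identifier derived deterministically from the content."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Example values are invented purely to show the shape of an entry.
entry = CatalogueEntry(
    dataset_name="example_admissions",
    data_origin="example source system",
    cleaning_routines=("normalise_names", "standardise_dates"),
    settings=LinkageSettings(
        linkage_fields=("surname", "date_of_birth", "postcode"),
        blocking_rules=("same date_of_birth",),
        comparison_methods={"surname": "jaro_winkler", "postcode": "exact"},
    ),
    quality_summary={"estimated_precision": 0.98, "estimated_recall": 0.91},
)
print(entry.version_id())
```

Hashing the serialised entry is one simple design choice: two users holding the same identifier know they are looking at data linked under identical documented settings, and a refined blocking rule or comparison method shows up as a different version.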
