Blinded Anonymization: a method for evaluating cancer prevention programs under restrictive data protection regulations
Evaluating cancer prevention programs requires collecting and linking case-specific data from multiple sources within the healthcare system. This linkage must comply with data protection regulations, which are restrictive in Germany and will likely become stricter across Europe. To facilitate the mortality evaluation of the German mammography screening program, with more than 10 million eligible women, we developed a method that does not require written individual consent and complies with existing privacy regulations. Our setup is composed of different data owners, a data collection center (DCC) and an evaluation center (EC). Each data owner uses dedicated software that preprocesses plain-text personal identifiers (IDAT) and plain-text evaluation data (EDAT) so that only irreversibly encrypted record assignment numbers (RAN) and pre-aggregated, reversibly encrypted EDAT are transmitted to the DCC. The DCC uses the RANs to perform a probabilistic record linkage based on an established and evaluated algorithm. For potentially identifying attributes within the EDAT (‘quasi-identifiers’), we developed a novel process named ‘blinded anonymization’. It allows the DCC to select a specific generalization from the pre-processed and encrypted attribute aggregations and thereby create a new data set with assured k-anonymity, without using any plain-text information. The anonymized data is transferred to the EC, where the EDAT is decrypted and used for evaluation. Our concept was approved by German data protection authorities. We implemented a prototype and tested it with more than 1.5 million simulated records containing realistically distributed IDAT. The core processes worked well with regard to performance parameters. We created different generalizations and calculated the respective suppression rates.
We discuss modalities, implications and limitations for large data sets in the cancer registry domain, as well as approaches for further improvements like l-diversity and automatic computation of ‘optimal’ generalizations.
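
The idea of choosing a generalization over encrypted attribute aggregations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the attribute names, the generalization levels, the function names and the key are all hypothetical, and a keyed hash (HMAC) stands in for the reversible encryption described in the abstract, so the tokens here could not be decrypted by the EC. The sketch only shows that equal generalized values yield equal opaque tokens, which is enough to group records, enforce k-anonymity and compute a suppression rate without seeing plain text.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical key known to the data owners but not to the DCC (assumption).
SECRET_KEY = b"demo-key-not-for-production"


def blind(value: str) -> str:
    """Return an opaque, deterministic token for a generalized value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def generalize_birth_year(year: int) -> list:
    """Hypothetical levels: exact year, 5-year band, 10-year band."""
    return [str(year), f"{year - year % 5}/5", f"{year - year % 10}/10"]


def generalize_postcode(code: str) -> list:
    """Hypothetical levels: full code, first three digits, first digit."""
    return [code, code[:3] + "**", code[:1] + "****"]


def preprocess(record: dict) -> dict:
    """Data-owner side: blind every generalization level of each quasi-identifier."""
    return {
        "birth_year": [blind("by:" + g) for g in generalize_birth_year(record["birth_year"])],
        "postcode": [blind("pc:" + g) for g in generalize_postcode(record["postcode"])],
    }


def anonymize(blinded: list, levels: dict, k: int):
    """DCC side: pick one level per attribute and suppress classes smaller than k."""
    keys = [tuple(rec[attr][lvl] for attr, lvl in levels.items()) for rec in blinded]
    sizes = Counter(keys)
    kept = [key for key in keys if sizes[key] >= k]
    suppression_rate = 1 - len(kept) / len(keys) if keys else 0.0
    return kept, suppression_rate


# Simulated records (invented for illustration only).
records = [
    {"birth_year": 1950, "postcode": "48149"},
    {"birth_year": 1952, "postcode": "48151"},
    {"birth_year": 1953, "postcode": "48149"},
    {"birth_year": 1961, "postcode": "50667"},
]
blinded = [preprocess(r) for r in records]

# No generalization: every record is unique, so all are suppressed for k = 2.
_, rate_exact = anonymize(blinded, {"birth_year": 0, "postcode": 0}, k=2)

# 5-year bands and 3-digit postcodes: three records share a class, one is suppressed.
kept_coarse, rate_coarse = anonymize(blinded, {"birth_year": 1, "postcode": 1}, k=2)
print(rate_exact, rate_coarse)
```

Comparing the suppression rate across candidate level combinations, as in the last lines, is one simple way to weigh information loss from generalization against records lost to suppression.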

