
Privacy Attack on Multiple Dynamic Match-key based Privacy-Preserving Record Linkage

Introduction: Over the last decade, the demand for linking records about people across databases has increased in various domains. Privacy challenges associated with linking sensitive information led to the development of privacy-preserving record linkage techniques. The multiple dynamic match-key encoding approach recently proposed by Randall et al. (IJPDS, 2019) is one such technique, aimed at providing sufficient privacy for linkage applications while obtaining high linkage quality. However, using this encoding on large databases can reveal frequency information that allows encoded values to be re-identified.

Objectives: We propose a frequency-based attack to evaluate the privacy guarantees of multiple dynamic match-key encoding. We then present two improvements to this match-key encoding approach to prevent such a privacy attack.

Methods: The proposed attack analyses the frequency distributions of individual match-keys in order to identify the attributes used for each match-key, where we assume the adversary has access to a plain-text database with characteristics similar to those of the encoded database. We employ a set of statistical correlation tests to compare the frequency distributions of match-key values between the encoded and plain-text databases. Once the attribute combinations used for the match-keys are discovered, we re-identify encoded sensitive values using a frequency alignment method. Next, we propose two modifications to the match-key encoding: one to alter the original frequency distributions and another to make the frequency distributions uniform. Both help to prevent frequency-based attacks.

Results: We evaluate our privacy attack using two large real-world databases. The results show that in certain situations the attack can successfully re-identify a set of sensitive values encoded using the multiple dynamic match-key encoding approach. On the databases used in our experiments, the attack re-identifies plain-text values with precision and recall both reaching up to 98%. Furthermore, we show that our proposed improvements make this attack harder to perform with only a small reduction in linkage quality.

Conclusions: Our proposed privacy attack demonstrates the weaknesses of multiple match-key encoding that should be taken into consideration when linking databases that contain sensitive personal information. Our proposed modifications ensure that the multiple dynamic match-key encoding approach can be used securely while retaining high linkage quality.
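The two attack steps described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything named here is an assumption made for illustration: the HMAC-SHA256 encoding of concatenated attribute values, the Pearson coefficient standing in for the unspecified "statistical correlation tests", the helper names (`encode_match_key`, `guess_attributes`, `frequency_alignment`), the secret key, and the toy records. The adversary's plain-text database is simply a reordered, doubled copy of the sensitive one, so both share the same frequency profile.

```python
import hashlib
import hmac
from collections import Counter
from itertools import combinations
from math import sqrt

def plain_value(record, attributes):
    """Concatenate the chosen attribute values into one match-key string."""
    return "|".join(record[a] for a in attributes)

def encode_match_key(record, attributes, secret):
    """Keyed hash of one match-key (assumed encoding, for illustration only)."""
    message = plain_value(record, attributes).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def sorted_counts(values):
    """Frequency profile: occurrence counts sorted from largest to smallest."""
    return sorted(Counter(values).values(), reverse=True)

def pearson(xs, ys):
    """Pearson correlation of two count lists, zero-padded to equal length."""
    n = max(len(xs), len(ys))
    if n == 0:
        return 0.0
    xs = xs + [0] * (n - len(xs))
    ys = ys + [0] * (n - len(ys))
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom else 0.0

def guess_attributes(encoded_values, plain_db, attribute_names):
    """Step 1: pick the attribute combination whose plain-text frequency
    profile correlates best with that of the encoded match-key."""
    enc_counts = sorted_counts(encoded_values)
    scores = {}
    for size in range(1, len(attribute_names) + 1):
        for combo in combinations(attribute_names, size):
            plain_counts = sorted_counts(plain_value(r, combo) for r in plain_db)
            scores[combo] = pearson(enc_counts, plain_counts)
    return max(scores, key=scores.get), scores

def frequency_alignment(encoded_values, plain_values):
    """Step 2: pair encoded and plain-text values of equal frequency rank,
    skipping ranks whose count is tied and therefore ambiguous."""
    enc = Counter(encoded_values).most_common()
    plain = Counter(plain_values).most_common()
    enc_ties = Counter(count for _, count in enc)
    plain_ties = Counter(count for _, count in plain)
    guesses = {}
    for (code, enc_count), (value, plain_count) in zip(enc, plain):
        if enc_ties[enc_count] == 1 and plain_ties[plain_count] == 1:
            guesses[code] = value
    return guesses

# Hypothetical toy data; real attacks rely on large databases.
SECRET = b"hypothetical-shared-secret"
sensitive_db = [
    {"first_name": "anna", "last_name": "smith"},
    {"first_name": "ben", "last_name": "smith"},
    {"first_name": "anna", "last_name": "smith"},
    {"first_name": "cara", "last_name": "smith"},
    {"first_name": "anna", "last_name": "jones"},
    {"first_name": "ben", "last_name": "jones"},
    {"first_name": "dave", "last_name": "brown"},
]
plain_db = list(reversed(sensitive_db)) * 2  # adversary's similar database

# The adversary sees only the encoded match-key values.
encoded = [encode_match_key(r, ("last_name",), SECRET) for r in sensitive_db]
best, scores = guess_attributes(encoded, plain_db, ("first_name", "last_name"))
guesses = frequency_alignment(encoded, [plain_value(r, best) for r in plain_db])
```

In this sketch the adversary never uses `SECRET`: the keyed hash is deterministic, so equal plain-text values produce equal codes and the frequency profile survives encoding. That is the property both proposed modifications target, by altering or flattening the frequency distributions.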

Related Results

Linking Sensitive Data – Applications, Techniques, and Challenges
Introduction: The linking of sensitive databases containing personal identifying information across organisations is an increasingly important task in application domains ranging fro...
Federated Data Linkage in Practice
In recent years, great strides have been made towards the deployment of federated systems for data research, including exploring federated trusted research environments (TREs). The...
Evaluation measure for group-based record linkage
Introduction: The robustness of record linkage evaluation measures is of high importance since linkage techniques are assessed based on these. However, minimal research has been con...
Taxonomy of Attacks on Privacy-Preserving Record Linkage
Record linkage is the process of identifying records that correspond to the same real-world entities across different databases. Due to the absence of unique entity identifiers, r...
An Evaluation Framework for Privacy-Preserving Record Linkage
Privacy-preserving record linkage (PPRL) addresses the problem of identifying matching records from different databases that correspond to the same real-world entities using quasi-...
Augmented Differential Privacy Framework for Data Analytics
Abstract: Differential privacy has emerged as a popular privacy framework for providing privacy-preserving noisy query answers based on statistical properties of databases. ...
Blood Cross Matching Without Anti-Human Globulin (AHG) and Bovine Serum: A New Interest for an Old Idea
Abstract  Introduction Transfusion medicine promotes the safety of blood transfusions by rigorously testing to eliminate risks of infection and hemolytic. The efficacy (to correct ...
Parameterized Strings: Algorithms and Applications
The parameterized string (p-string), a generalization of the traditional string, is composed of constant and parameter symbols. A parameterized match (p-match) exists between two p...
