Evaluating hardening techniques against cryptanalysis attacks on Bloom filter encodings for record linkage
Introduction
Due to privacy concerns, personal identifiers used for linking data often have to be encoded (masked) before being linked across organisations. Bloom filter (BF) encoding is a popular privacy technique that is now employed in real-world linkage applications. Recent research has, however, shown that BFs are vulnerable to cryptanalysis attacks.
Objectives and Approach
Attacks on BFs either exploit the fact that encoding frequent plain-text values (such as common names) results in correspondingly frequent BFs, or they apply pattern mining to identify co-occurring BF bit positions that correspond to frequent encoded q-grams (sub-strings). In this study we empirically evaluated the privacy of individuals encoded in BFs against two recent cryptanalysis attack methods by Christen et al. (2017/2018). For our evaluation we used two snapshots of the North Carolina Voter Registration database, selecting pairs of records that correspond to the same voter (with name or address variations). This resulted in files with 222,251 BFs and 224,061 plain-text records, respectively.
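The encoding step that these attacks target can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the BF length (`bf_len`), the number of hash functions (`num_hash`), the q-gram length (`q`), and the double-hashing scheme built from SHA-1 and SHA-256 are all assumptions chosen for the example.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of q-grams (sub-strings of length q) of a string."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_encode(values, bf_len=1000, num_hash=10, q=2):
    """Encode the q-grams of one or more field values into one bit list.

    All parameter defaults are illustrative assumptions.
    """
    bits = [0] * bf_len
    for value in values:
        for gram in qgrams(value, q):
            data = gram.encode("utf-8")
            # Double hashing: position_i = (h1 + i * h2) mod bf_len
            h1 = int(hashlib.sha1(data).hexdigest(), 16)
            h2 = int(hashlib.sha256(data).hexdigest(), 16)
            for i in range(num_hash):
                bits[(h1 + i * h2) % bf_len] = 1
    return bits


# Encode two fields of one (made-up) record into a single BF.
bf = bloom_encode(["john", "smith"])
```

Because the mapping is deterministic, identical plain-text values always produce identical BFs, and each q-gram always sets the same bit positions. These are the two regularities that the frequency-based and pattern-mining attacks exploit.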
Results
We encoded between two and four of the fields (first name, last name, street, and city) into one BF per record. For combinations of three and four fields, all plain-text values and BFs were unique, which poses a challenge for any frequency-based attack. To harden the BFs, we applied several methods suggested in the literature: balancing, random hashing, XOR folding, BLIP, and salting.
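Two of the hardening methods named above can be sketched briefly. As commonly described in the record linkage literature, XOR folding splits a BF into two halves and combines them with a bitwise XOR, while balancing appends the bitwise complement of the BF and then permutes the result, so that every hardened BF has the same number of 1-bits. The abstract does not give the study's implementation or parameters; the function names and the `seed` parameter below are assumptions made for this illustration.

```python
import random


def xor_fold(bits):
    """Fold a BF of even length in half using bitwise XOR."""
    if len(bits) % 2 != 0:
        raise ValueError("BF length must be even for XOR folding")
    half = len(bits) // 2
    return [a ^ b for a, b in zip(bits[:half], bits[half:])]


def balance(bits, seed=42):
    """Append the complement of the BF, then apply a fixed permutation.

    The seed stands in for a secret shared by the data owners
    (an assumption for this sketch).
    """
    doubled = bits + [1 - b for b in bits]
    positions = list(range(len(doubled)))
    random.Random(seed).shuffle(positions)
    return [doubled[p] for p in positions]


example = [1, 0, 1, 1, 0, 0, 1, 0]
folded = xor_fold(example)     # [1, 0, 0, 1]
balanced = balance(example)    # 16 bits, exactly 8 of them set
```

Both transformations break the direct correspondence between a q-gram and a fixed set of bit positions, which is what pattern-mining attacks rely on.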
Without any hardening applied, up to 20.7% and 5% of plain-text values were correctly re-identified as 1-to-1 matches by the pattern-mining and frequency-based attack methods, respectively. With the frequency-based attack on BFs encoding two fields, no more than 5% correct 1-to-1 re-identification matches were achieved when balancing, random hashing, or XOR folding was applied, while the pattern-mining based attack achieved no correct re-identifications under any hardening technique.
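The intuition behind frequency-based re-identification can be shown with a toy rank alignment. This is a deliberately simplified sketch and not the attack method of Christen et al.: it only pairs the most frequent encodings with the most frequent plain-text values by rank. The function name, the `top_k` parameter, and the sample data are hypothetical.

```python
from collections import Counter


def rank_align(encoded, plaintext, top_k=10):
    """Pair the top_k most frequent encodings with the top_k most
    frequent plain-text values, rank by rank."""
    enc_ranked = [e for e, _ in Counter(encoded).most_common(top_k)]
    plain_ranked = [p for p, _ in Counter(plaintext).most_common(top_k)]
    return dict(zip(enc_ranked, plain_ranked))


# Made-up data: encodings are shown as opaque labels.
encoded = ["A"] * 3 + ["B"] * 2 + ["C"]
plaintext = ["smith"] * 3 + ["jones"] * 2 + ["lee"]
guesses = rank_align(encoded, plaintext)
```

When every encoding and every plain-text value occurs exactly once, as reported above for combinations of three and four fields, all ranks are tied and an alignment of this kind carries no information.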
Conclusion/Implications
Given that BF encoding is now being employed in real-world linkage applications, it is important to study the limits of this privacy technique. Our experimental evaluation shows that although basic BFs without any hardening technique are susceptible to cryptanalysis attacks, some hardening techniques are able to protect BFs against these attacks.


