
Differential privacy learned index

Indexes are fundamental components of database management systems, traditionally implemented through structures such as B-tree, hash, and bitmap indexes. These structures map keys to data records, making searches within a database efficient. Recent advances in machine learning have introduced learned indexes, in which models such as neural networks predict the position or existence of a record from the learned distribution of the data. Exploratory research in this area posits that traditional index structures can be replaced with learned models, potentially offering significant performance improvements. Initial findings indicate that neural-network-based indexes can outperform cache-optimized B-trees in lookup speed while using less memory on several real-world datasets.

The growing frequency of data breaches makes strong privacy safeguards a crucial consideration for database management systems. A prominent example is the 2013 Yahoo data breach, widely regarded as one of the largest in history. Attackers exploited a vulnerability in Yahoo's cookie infrastructure to gain unauthorized access to the personal information (including names, birthdates, email addresses, and passwords) of the entire user base of 3 billion Yahoo accounts. The breach was disclosed in 2016, while Verizon was in the process of acquiring the company, and led Verizon to reduce its offer by USD 350 million. This incident highlights the pressing need for stronger data security and privacy in database management. At the same time, machine learning faces the challenge of balancing the extraction of useful information from data against the protection of confidentiality. Differential privacy has emerged as a rigorous framework for protecting the privacy of individual records while still permitting data analysis.

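
To make the learned-index idea concrete, the following is a minimal Python sketch, assuming a single linear model over a sorted in-memory array of numeric keys. This is a simplification: the learned indexes described above use neural networks or hierarchies of models, and the class name `LinearLearnedIndex` and its methods are hypothetical, not the thesis's implementation. The model predicts a position from a key, and the largest under- and over-predictions observed on the stored keys bound a small binary-search window, so a lookup becomes one model evaluation plus a local search instead of a B-tree traversal.

```python
import bisect
import math
import random


class LinearLearnedIndex:
    """Hypothetical single-model learned index over a sorted key array.

    A least-squares line approximates the CDF (key -> position); the
    worst-case prediction errors seen on the stored keys bound the search.
    """

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        if n == 0:
            raise ValueError("need at least one key")
        mean_x = sum(self.keys) / n
        mean_y = (n - 1) / 2  # mean of positions 0..n-1
        var_x = sum((x - mean_x) ** 2 for x in self.keys)
        cov = sum((x - mean_x) * (y - mean_y) for y, x in enumerate(self.keys))
        self.slope = cov / var_x if var_x > 0 else 0.0
        self.intercept = mean_y - self.slope * mean_x
        # Residuals (true position - predicted position) over all stored keys.
        residuals = [y - self.predict(x) for y, x in enumerate(self.keys)]
        self.err_lo = min(residuals)  # largest over-prediction (<= 0)
        self.err_hi = max(residuals)  # largest under-prediction (>= 0)

    def predict(self, key):
        """Model output: an estimated (fractional) position for `key`."""
        return self.slope * key + self.intercept

    def lookup(self, key):
        """Return the position of `key`, or None if it is absent."""
        n = len(self.keys)
        p = self.predict(key)
        # Every stored key is guaranteed to lie inside this window.
        lo = max(0, math.floor(p + self.err_lo))
        hi = min(n, math.ceil(p + self.err_hi) + 1)
        if lo >= hi:
            return None
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


if __name__ == "__main__":
    rng = random.Random(42)
    idx = LinearLearnedIndex(rng.sample(range(1_000_000), 10_000))
    print(idx.lookup(idx.keys[1234]))  # 1234 (the sampled keys are distinct)
    print(idx.err_hi - idx.err_lo)     # width of the search window, in positions
```

Because the error bounds are measured on the stored keys themselves, any stored key always falls inside its window; smoother key distributions give tighter bounds and therefore shorter searches.
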
This thesis investigates the incorporation of differential privacy into learned index structures, analyzing privacy-preserving machine learning algorithms and learning-based mechanisms for data release. We examine the theoretical limits of differential privacy in machine learning, including upper bounds on the loss of differentially private learning algorithms. The intersection of learned indexes and differential privacy presents unique challenges and opportunities. This research addresses key issues such as incorporating public data, handling missing data in private datasets, and the impact of differential privacy on the utility of machine learning algorithms as data volume increases. Our work aims to demonstrate that differentially private learned indexes can achieve utility comparable to their non-private counterparts while providing robust privacy protection. This thesis provides a comprehensive overview of the potential for integrating learned indexes with differential privacy, paving the way for more secure and efficient data management systems.
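
One way to see how differential privacy and a learned index can fit together is the sketch below, which is an illustration only and not the method developed in this thesis. It assumes the model to be protected is a piecewise-linear CDF built from a histogram of the keys and released through the Laplace mechanism with privacy parameter epsilon, while the sorted keys themselves stay on the server. The names `DPHistogramIndex`, `laplace_noise`, and `mean_relative_error`, and the use of a fixed, public key domain and bucket count, are assumptions. Lookups use exponential search from the predicted slot, so they remain exact; privacy noise only affects how far predictions land from the true positions.

```python
import bisect
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


class DPHistogramIndex:
    """Hypothetical learned index whose released model (a noisy CDF) is epsilon-DP.

    The domain [lo, hi) and num_buckets must be fixed without looking at the data.
    random.Random is not a secure noise source; use a vetted DP library in practice.
    """

    def __init__(self, keys, lo, hi, num_buckets, epsilon, rng):
        self.keys = sorted(keys)  # private data, kept on the server
        self.lo, self.b = lo, num_buckets
        self.width = (hi - lo) / num_buckets
        counts = [0] * num_buckets
        for k in self.keys:
            # Out-of-range keys are clamped into the edge buckets.
            i = min(max(int((k - lo) / self.width), 0), num_buckets - 1)
            counts[i] += 1
        # Adding or removing one record changes one count by 1 (L1 sensitivity 1),
        # so Laplace(1/epsilon) noise per bucket makes the histogram epsilon-DP.
        noisy = [max(0.0, c + laplace_noise(1.0 / epsilon, rng)) for c in counts]
        # Clamping and prefix sums are post-processing, so the model stays epsilon-DP.
        self.prefix = [0.0]
        for c in noisy:
            self.prefix.append(self.prefix[-1] + c)

    def predict(self, key):
        # Linear interpolation of the noisy CDF inside the key's bucket.
        x = min(max((key - self.lo) / self.width, 0.0), float(self.b))
        i = min(int(x), self.b - 1)
        return self.prefix[i] + (x - i) * (self.prefix[i + 1] - self.prefix[i])

    def lookup(self, key):
        """Exact lookup: exponential search from the prediction, then binary search."""
        n = len(self.keys)
        if n == 0:
            return None
        pos = min(max(round(self.predict(key)), 0), n - 1)
        step = 1
        if self.keys[pos] < key:  # answer lies to the right of pos
            lo = pos + 1
            while pos + step < n and self.keys[pos + step] < key:
                lo = pos + step + 1
                step *= 2
            hi = min(pos + step, n)
        else:  # answer is pos or lies to its left
            hi = pos
            while pos - step >= 0 and self.keys[pos - step] >= key:
                hi = pos - step
                step *= 2
            lo = max(pos - step, 0)
        i = bisect.bisect_left(self.keys, key, lo, hi)
        return i if i < n and self.keys[i] == key else None


def mean_relative_error(index):
    # Mean |predicted - true position| / n. Reads the private keys: offline evaluation only.
    n = len(index.keys)
    return sum(abs(index.predict(k) - i) for i, k in enumerate(index.keys)) / (n * n)


if __name__ == "__main__":
    rng = random.Random(7)
    for n in (1_000, 100_000):
        keys = [rng.random() for _ in range(n)]
        idx = DPHistogramIndex(keys, 0.0, 1.0, num_buckets=100, epsilon=1.0, rng=rng)
        print(n, mean_relative_error(idx))
```

For a fixed epsilon, the noise added to each bucket does not grow with the number of keys, so the prediction error measured relative to n tends to shrink as the data volume increases, which illustrates the kind of data-volume effect on utility that the thesis studies.
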

University of Massachusetts Dartmouth

Tilak Mudgal

2025


Related Results

Differential privacy has emerged as a popular privacy framework for providing privacy preserving noisy query answers based on statistical properties of databases. ...

Privacy and Security for Digital Health: Assessing Risks and Harms to Users

Electronic Health (e-Health), such as mobile health (mHealth) and Health Information Systems (HIS), benefits healthcare consumers and professionals. However, it also poses potentia...

A Privacy Protection Method for Power User Profiles That Integrates Improved Differential Privacy and Secret Sharing

In response to the privacy leakage risks inherent in the big data processing of power user personas, propose a collaborative optimiz...

Privacy Risk in Recommender Systems

Nowadays, recommender systems are mostly used in many online applications to filter information and help users in selecting their relevant requirements. It avoids users to become o...

The Security and Privacy Measuring System for the Internet of Things Devices

The purpose of the article: elimination of the gap in existing need in the set of clear and objective security and privacy metrics for the IoT devices users and manufacturers and a...

Privacy Threats and Privacy Preservation in Multiple Data Releases of High-Dimensional Datasets

A major challenge is when datasets are released to be utilized in the outside scope of data-collecting organizations, it is how to balance data utilities and data privacy. To achie...

Factors Affecting Students’ Privacy Paradox and Privacy Protection Behavior

In this exploratory study, we investigate the factors affecting two opposite types of online privacy behavior: 1) online privacy paradox, i.e. a mismatch between users’ onl...

Heterogeneous Differential Privacy

The massive collection of personal data by personalization systems has rendered the preservation of privacy of individuals more and more difficult. Most of the proposed approaches ...