
Differential privacy learned index

University of Massachusetts Dartmouth
Title: Differential privacy learned index
Description:
Indexes are fundamental components of database management systems, traditionally implemented through structures like B-Tree, Hash, and BitMap indexes.
These index structures map keys to data records, optimizing search efficiency within databases.
Recent advancements in machine learning have introduced the concept of learned indexes, where models such as neural networks predict the position or existence of records based on the learned data distribution.
This exploratory research posits that traditional index structures can be replaced with learned models, potentially offering significant performance improvements.
Initial findings indicate that neural network-based indexes can outperform cache-optimized B-Trees in speed while reducing memory usage across various real-world datasets.
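A learned index of this kind can be sketched under simplifying assumptions: sorted integer keys, and a single linear model standing in for a neural network. All names below are illustrative, not from the thesis.

```python
import numpy as np

# Illustrative learned-index sketch: fit a linear model to the rank of each
# sorted key (i.e., approximate the empirical CDF), predict a record's
# position from its key, and correct the prediction error by a local search.
keys = np.sort(np.random.default_rng(0).integers(0, 1_000_000, size=10_000))
positions = np.arange(len(keys))

# Linear approximation of the key -> position mapping.
slope, intercept = np.polyfit(keys, positions, deg=1)

def lookup(key: int) -> int:
    """Predict the position, then widen a window until the key is bracketed."""
    guess = int(round(slope * key + intercept))
    guess = min(max(guess, 0), len(keys) - 1)
    lo, hi = guess, guess
    while lo > 0 and keys[lo] > key:          # move the lower bound down
        lo = max(0, lo - 64)
    while hi < len(keys) - 1 and keys[hi] < key:  # move the upper bound up
        hi = min(len(keys) - 1, hi + 64)
    return lo + int(np.searchsorted(keys[lo:hi + 1], key))

idx = lookup(keys[1234])
```

A real learned index would bound the model's maximum prediction error at build time so the correction search is provably short; the fixed step of 64 here is only a stand-in.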
The growing frequency of data breaches makes strong privacy safeguards a critical consideration in database management systems.
A notable example is the 2013 Yahoo data breach, widely regarded as one of the most significant in history.
Attackers exploited a vulnerability in Yahoo’s cookie infrastructure to gain unauthorized access to personal information, including names, birthdates, email addresses, and passwords, across the entire user base of 3 billion Yahoo accounts.
The full scale of the breach was disclosed in 2016, while Verizon was in the process of acquiring the company, and reduced Verizon’s proposed offer by USD 350 million.
This incident underscores the urgent need for stronger data security and privacy in database management.
At the same time, machine learning faces the challenge of balancing the extraction of useful information from data with the protection of confidentiality.
Differential privacy has emerged as a rigorous framework for protecting individual privacy while still enabling data analysis.
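As a concrete illustration of the framework, the standard Laplace mechanism answers a counting query with epsilon-differential privacy by adding noise calibrated to the query's sensitivity. The dataset and function names below are illustrative, not from the thesis.

```python
import numpy as np

# Laplace mechanism for a counting query: a count has L1 sensitivity 1
# (adding or removing one record changes it by at most 1), so releasing
# count + Laplace(1/epsilon) noise is epsilon-differentially private.
def laplace_count(data, predicate, epsilon: float, rng=None) -> float:
    if rng is None:
        rng = np.random.default_rng()
    true_count = sum(1 for x in data if predicate(x))
    sensitivity = 1.0
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

ages = [23, 35, 41, 29, 52, 60, 38]
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller epsilon means larger noise and stronger privacy; the released value is random, so repeated queries consume additional privacy budget.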
This thesis investigates the incorporation of differential privacy into learned index structures, analyzing privacy-preserving machine learning algorithms and learning-based data release mechanisms.
We examine the theoretical limits of differential privacy in machine learning, including upper bounds on the loss of differentially private algorithms.
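One standard way such guarantees are obtained in practice is gradient perturbation, as in DP-SGD: clip each per-example gradient and add calibrated Gaussian noise. Below is a minimal sketch of one such update for least-squares regression, with illustrative parameter names (C for the clipping norm, sigma for the noise multiplier); it is not the thesis's algorithm.

```python
import numpy as np

# One DP-SGD-style update: clip each per-example gradient to L2 norm C,
# sum the clipped gradients, add Gaussian noise with scale sigma * C,
# then take an averaged gradient step.
def private_gradient_step(w, X, y, lr=0.1, C=1.0, sigma=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    grads = (X @ w - y)[:, None] * X            # per-example gradients
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / C)  # clip to norm at most C
    noisy_sum = grads.sum(axis=0) + rng.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy_sum / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = private_gradient_step(np.zeros(3), X, y)
```

The added noise is exactly what degrades utility relative to non-private training, which is why upper bounds on the loss of private learners are a central theoretical question.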
The intersection of learned indexes and differential privacy presents unique challenges and opportunities.
This research addresses key issues such as the incorporation of public data, handling missing data in private datasets, and the impact of differential privacy on the utility of machine learning algorithms as data volume increases.
Our work aims to demonstrate that differentially private learned indexes can achieve comparable utility to non-private counterparts while ensuring robust privacy protections.
This thesis provides a comprehensive overview of the potential for integrating learned indexes with differential privacy, paving the way for more secure and efficient data management systems.

Related Results

Augmented Differential Privacy Framework for Data Analytics
Abstract Differential privacy has emerged as a popular privacy framework for providing privacy preserving noisy query answers based on statistical properties of databases. ...
Privacy Risk in Recommender Systems
Nowadays, recommender systems are mostly used in many online applications to filter information and help users in selecting their relevant requirements. It avoids users to become o...
THE SECURITY AND PRIVACY MEASURING SYSTEM FOR THE INTERNET OF THINGS DEVICES
The purpose of the article: elimination of the gap in existing need in the set of clear and objective security and privacy metrics for the IoT devices users and manufacturers and a...
Heterogeneous Differential Privacy
The massive collection of personal data by personalization systems has rendered the preservation of privacy of individuals more and more difficult. Most of the proposed approaches ...
Per-instance Differential Privacy
We consider a refinement of differential privacy --- per instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. ...
Privacy in online advertising platforms
Online advertising is consistently considered as the pillar of the "free" content on the Web since it is commonly the funding source of websites. Furthermore, the option of deliver...
Privacy awareness in generative AI: the case of ChatGPT
Purpose Generative AI, like ChatGPT, uses large language models that process human language and learn from patterns identified in large data sets. Despite the great benefits offere...
Privacy-Preserving Data Analytics in Internet of Medical Things
The healthcare sector has changed dramatically in recent years due to depending more and more on big data to improve patient care, enhance or improve operational effectiveness, and...
