
Differential privacy learned index

University of Massachusetts Dartmouth
Title: Differential privacy learned index
Description:
Indexes are fundamental components of database management systems, traditionally implemented through structures like B-Tree, Hash, and BitMap indexes.
These index structures map keys to data records, optimizing search efficiency within databases.
Recent advancements in machine learning have introduced the concept of learned indexes, where models such as neural networks predict the position or existence of records based on the learned data distribution.
This exploratory research posits that traditional index structures can be replaced with learned models, potentially offering significant performance improvements.
Initial findings indicate that neural network-based indexes can outperform cache-optimized B-Trees in speed while reducing memory usage across various real-world datasets.
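A learned index of this kind can be sketched under simplifying assumptions: sorted integer keys, and a single linear model standing in for a neural network. All names below are illustrative, not from the thesis.

```python
import numpy as np

# Illustrative learned-index sketch: fit a linear model to the rank of each
# sorted key (i.e., approximate the empirical CDF), predict a record's
# position from its key, and correct the prediction error by a local search.
keys = np.sort(np.random.default_rng(0).integers(0, 1_000_000, size=10_000))
positions = np.arange(len(keys))

# Linear approximation of the key -> position mapping.
slope, intercept = np.polyfit(keys, positions, deg=1)

def lookup(key: int) -> int:
    """Predict the position, then widen a window until the key is bracketed."""
    guess = int(round(slope * key + intercept))
    guess = min(max(guess, 0), len(keys) - 1)
    lo, hi = guess, guess
    while lo > 0 and keys[lo] > key:          # move the lower bound down
        lo = max(0, lo - 64)
    while hi < len(keys) - 1 and keys[hi] < key:  # move the upper bound up
        hi = min(len(keys) - 1, hi + 64)
    return lo + int(np.searchsorted(keys[lo:hi + 1], key))

idx = lookup(keys[1234])
```

A real learned index would bound the model's maximum prediction error at build time so the correction search is provably short; the fixed step of 64 here is only a stand-in.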
The growing frequency of data breaches makes strong privacy safeguards a critical consideration in database management systems.
A notable example is the 2013 Yahoo data breach, widely regarded as one of the most significant in history.
Attackers exploited a vulnerability in Yahoo’s cookie infrastructure to gain unauthorized access to personal information, including names, birthdates, email addresses, and passwords, across the entire user base of 3 billion Yahoo accounts.
The full scale of the breach was disclosed in 2016, while Verizon was in the process of acquiring the company, and reduced Verizon’s proposed offer by USD 350 million.
This incident underscores the urgent need for stronger data security and privacy in database management.
At the same time, machine learning faces the challenge of balancing the extraction of useful information from data with the protection of confidentiality.
Differential privacy has emerged as a rigorous framework for protecting individual privacy while still enabling data analysis.
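As a concrete illustration of the framework, the standard Laplace mechanism answers a counting query with epsilon-differential privacy by adding noise calibrated to the query's sensitivity. The dataset and function names below are illustrative, not from the thesis.

```python
import numpy as np

# Laplace mechanism for a counting query: a count has L1 sensitivity 1
# (adding or removing one record changes it by at most 1), so releasing
# count + Laplace(1/epsilon) noise is epsilon-differentially private.
def laplace_count(data, predicate, epsilon: float, rng=None) -> float:
    if rng is None:
        rng = np.random.default_rng()
    true_count = sum(1 for x in data if predicate(x))
    sensitivity = 1.0
    return true_count + rng.laplace(0.0, sensitivity / epsilon)

ages = [23, 35, 41, 29, 52, 60, 38]
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=0.5)
```

Smaller epsilon means larger noise and stronger privacy; the released value is random, so repeated queries consume additional privacy budget.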
This thesis investigates the incorporation of differential privacy into learned index structures, analyzing privacy-preserving machine learning algorithms and learning-based data release mechanisms.
We examine the theoretical limits of differential privacy in machine learning, including upper bounds on the loss of differentially private algorithms.
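One standard way such guarantees are obtained in practice is gradient perturbation, as in DP-SGD: clip each per-example gradient and add calibrated Gaussian noise. Below is a minimal sketch of one such update for least-squares regression, with illustrative parameter names (C for the clipping norm, sigma for the noise multiplier); it is not the thesis's algorithm.

```python
import numpy as np

# One DP-SGD-style update: clip each per-example gradient to L2 norm C,
# sum the clipped gradients, add Gaussian noise with scale sigma * C,
# then take an averaged gradient step.
def private_gradient_step(w, X, y, lr=0.1, C=1.0, sigma=1.0, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    grads = (X @ w - y)[:, None] * X            # per-example gradients
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads / np.maximum(1.0, norms / C)  # clip to norm at most C
    noisy_sum = grads.sum(axis=0) + rng.normal(0.0, sigma * C, size=w.shape)
    return w - lr * noisy_sum / len(X)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = private_gradient_step(np.zeros(3), X, y)
```

The added noise is exactly what degrades utility relative to non-private training, which is why upper bounds on the loss of private learners are a central theoretical question.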
The intersection of learned indexes and differential privacy presents unique challenges and opportunities.
This research addresses key issues such as the incorporation of public data, handling missing data in private datasets, and the impact of differential privacy on the utility of machine learning algorithms as data volume increases.
Our work aims to demonstrate that differentially private learned indexes can achieve comparable utility to non-private counterparts while ensuring robust privacy protections.
This thesis provides a comprehensive overview of the potential for integrating learned indexes with differential privacy, paving the way for more secure and efficient data management systems.

Related Results

Augmented Differential Privacy Framework for Data Analytics
Abstract Differential privacy has emerged as a popular privacy framework for providing privacy preserving noisy query answers based on statistical properties of databases. ...
Privacy Risk in Recommender Systems
Nowadays, recommender systems are mostly used in many online applications to filter information and help users in selecting their relevant requirements. It avoids users to become o...
THE SECURITY AND PRIVACY MEASURING SYSTEM FOR THE INTERNET OF THINGS DEVICES
The purpose of the article: elimination of the gap in existing need in the set of clear and objective security and privacy metrics for the IoT devices users and manufacturers and a...
Heterogeneous Differential Privacy
The massive collection of personal data by personalization systems has rendered the preservation of privacy of individuals more and more difficult. Most of the proposed approaches ...
Per-instance Differential Privacy
We consider a refinement of differential privacy --- per instance differential privacy (pDP), which captures the privacy of a specific individual with respect to a fixed data set. ...
Privacy in online advertising platforms
Online advertising is consistently considered as the pillar of the "free" content on the Web since it is commonly the funding source of websites. Furthermore, the option of deliver...
Privacy awareness in generative AI: the case of ChatGPT
Purpose Generative AI, like ChatGPT, uses large language models that process human language and learn from patterns identified in large data sets. Despite the great benefits offere...
Privacy-Preserving Data Analytics in Internet of Medical Things
The healthcare sector has changed dramatically in recent years due to depending more and more on big data to improve patient care, enhance or improve operational effectiveness, and...
