skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Purging Compliance from Database Backups by Encryption
Data compliance laws establish rules intended to protect privacy. These define both retention durations (how long data must be kept) and purging deadlines (when the data must be destroyed in storage). To comply with the laws and to minimize liability, companies must destroy data that must be purged or is no longer needed. However, database backups generally cannot be edited to purge ``expired'' data and erasing the entire backup is impractical. To maintain compliance, data curators need a mechanism to support targeted destruction of data in backups. In this paper, we present a cryptographic erasure framework that can purge data from across database backups. We demonstrate how different purge policies can be defined through views and enforced without violating database constraints.  more » « less
Award ID(s):
2016548
PAR ID:
10380892
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Journal of Data Intelligence
Volume:
3
Issue:
1
ISSN:
2577-610X
Page Range / eLocation ID:
149 to 168
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Data retention laws establish rules intended to protect privacy. These define both retention durations (how long data must be kept) and purging deadlines (when the data must be destroyed in storage). To comply with the laws and to minimize liability, companies should destroy data that must be purged or is no longer needed. However, database backups generally cannot be edited to purge “expired” data and erasing the entire backup is impractical. To maintain compliance, data curators need a mechanism to support targeted destruction of data in backups. In this paper, we present a cryptographic erasure framework that can purge data from all database backups. Our approach can be transparently integrated into existing database backup processes. We demonstrate how different purge policies can be defined through views and enforced by triggers without violating database constraints. 
    more » « less
  2. Most organizations rely on relational database(s) for their day-to-day business functions. Data management policies fall under the umbrella of IT Operations, dictated by a combination of internal organizational policies and government regulations. Many privacy laws (such as Europe’s General Data Protection Regulation and California’s Consumer Privacy Act) establish policy requirements for organizations, requiring the preservation or purging of certain customer data across their systems. Organization disaster recovery policies also mandate backup policies to prevent data loss. Thus, the data in these databases are subject to a range of policies, including data retention and data purging rules, which may come into conflict with the need for regular backups. In this paper, we discuss the trade-offs between different compliance mechanisms to maintain IT Operational policies. We consider the practical availability of data in an active relational database and in a backup, including: 1) supporting data privacy rules with respect to preserving or purging customer data, and 2) the application performance impact caused by the database policy implementation. We first discuss the state of data privacy compliance in database systems. We then look at enforcement of common IT operational policies with regard to database backups. We consider different implementations used to enforce privacy rule compliance combined with a detailed discussion for how these approaches impact the performance of a database at different phases. We demonstrate that naive compliance implementations will incur a prohibitively high cost and impose onerous restrictions on backup and restore process, but will not affect daily user query transaction cost. However, we also show that other solutions can achieve a far lower backup and restore costs at a price of a small (<5%) overhead to non-SELECT queries. 
    more » « less
  3. Compliance with data retention laws and legislation is an important aspect of data management. As new laws governing personal data management are introduced (e.g., California Consumer Privacy Act enacted in 2020) and a greater emphasis is placed on enforcing data privacy law compliance, data retention support must be an inherent part of data management systems. However, relational databases do not currently offer functionality to enforce retention compliance. In this paper, we propose a framework that integrates data retention support into any relational database. Using SQL-based mechanisms, our system supports an intuitive definition of data retention policies. We demonstrate that our approach meets the legal requirements of retention and can be implemented to transparently guarantee compliance. Our framework streamlines compliance support without requiring database schema changes, while incurring an average 6.7% overhead compared to the current state-of-the-art solution. 
    more » « less
  4. Best practices in data management and privacy mandate that old data must be irreversibly destroyed. However, due to performance optimization reasons, old (deleted or updated) data is not immediately purged from active database storage. Database backups that typically work by backing up table and index pages (rather than logical rows) greatly exacerbate the privacy problem of the old surviving data. Copying such deleted data into backups ensures that unknown quantities of old data can be stored indefinitely. In this paper, we quantify the amount of deleted data retained in backups by four major representative databases, comparing the default behavior versus an explicit defrag operation. We review the defrag options available in these databases and discuss the impact they have on eliminating old data from backups. We demonstrate that each database has a defrag mechanism that can eliminate most of old deleted data (although in Oracle pre-update content may survive defrag). Finally, we outline the factors that organizations should consider when deciding whether to apply defrag prior to executing their backups. 
    more » « less
  5. Data privacy policy requirements are a quickly evolving part of the data management domain. Healthcare (e.g., HIPAA), financial (e.g., GLBA), and general laws such as GDPR or CCPA impose controls on how personal data should be managed. Relational databases do not offer built-in features to support data management features to comply with such laws. As a result, many organizations implement ad-hoc solutions or use third party tools to ensure compliance with privacy policies. However, external compliance framework can conflict with the internal activity in a database (e.g., trigger side-effects or aborted transactions). In our prior work, we introduced a framework that integrates data retention and data purging compliance into the database itself, requiring only the support for triggers and encryption, which are already available in any mainstream database engine. In this demonstration paper, we introduce DBCompliant – a tool that demonstrates how our approach can seamlessly integrate comprehensive policy compliance (defined via SQL queries). Although we use PostgreSQL as our back-end, DBCompliant could be adapted to any other relational database. Finally, our approach imposes low (less than 5%) user query overhead. 
    more » « less