skip to main content


This content will become publicly available on July 10, 2024

Title: K9db: Privacy-Compliant Storage For Web Applications By Construction
Data privacy laws like the EU’s GDPR grant users new rights, such as the right to request access to and deletion of their data. Manual compliance with these requests is error-prone and imposes costly burdens especially on smaller organizations, as non-compliance risks steep fines. K9db is a new, MySQL-compatible database that complies with privacy laws by construction. The key idea is to make the data ownership and sharing semantics explicit in the storage system. This requires K9db to capture and enforce applications’ complex data ownership and sharing semantics, but in exchange simplifies privacy compliance. Using a small set of schema annotations, K9db infers storage organization, generates procedures for data retrieval and deletion, and reports compliance errors if an application risks violating the GDPR. Our K9db prototype successfully expresses the data sharing semantics of real web applications, and guides developers to getting privacy compliance right. K9db also matches or exceeds the performance of existing storage systems, at the cost of a modest increase in state size.  more » « less
Award ID(s):
2045170 2039354
NSF-PAR ID:
10486260
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
USENIX
Date Published:
Journal Name:
Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI)
Format(s):
Medium: X
Location:
Boston, MA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. New laws such as the European Union’s General Data Protection Regulation (GDPR) grant users unprecedented control over personal data stored and processed by businesses. Compliance can require expensive manual labor or retrofitting of existing systems, e.g., to handle data retrieval and removal requests. We argue for treating these new requirements as an opportunity for new system designs. These designs should make data ownership a first-class concern and achieve compliance with privacy legislation by construction. A compliant-by-construction system could build a shared database, with similar performance as current systems, from personal databases that let users contribute, audit, retrieve, and remove their personal data through easy-to-understand APIs. Realizing compliant-by-construction systems requires new cross-cutting abstractions that make data dependencies explicit and that augment classic data processing pipelines with ownership information. We suggest what such abstractions might look like, and highlight existing technologies that we believe make compliant-by-construction systems feasible today. We believe that progress towards such systems is at hand, and highlight challenges for researchers to address to make them a reality. 
    more » « less
  2. In recent years, well-known cyber breaches have placed growing pressure on organizations to implement proper privacy and data protection standards. Attacks involving the theft of employee and customer personal information have damaged the reputations of well-known brands, resulting in significant financial costs. As a result, governments across the globe are actively examining and strengthening laws to better protect the personal data of its citizens. The General Data Protection Regulation (GDPR) updates European privacy law with an array of provisions that better protect consumers and require organizations to focus on accounting for privacy in their business processes through “privacy-by-design” and “privacy by default” principles. In the US, the National Privacy Research Strategy (NPRS), makes several recommendations that reinforce the need for organizations to better protect data. In response to these rapid developments in privacy compliance, data flow mapping has emerged as a valuable tool. Data flow mapping depicts the flow of data through a system or process, enumerating specific data elements handled, while identifying the risks at different stages of the data lifecycle. This Article explains the critical features of a data flow map and discusses how mapping may improve the transparency of the data lifecycle, while recognizing the limitations in building out data flow maps and the difficulties of maintaining updated maps. The Article then explores how data flow mapping may support data collection, transfer, storage, and destruction practices pursuant to various privacy regulations. Finally, a hypothetical case study is presented to show how data flow mapping was used by an organization to stay compliant with privacy rules and to improve the transparency of information flows 
    more » « less
  3. The General Data Protection Regulation (GDPR) in the European Union contains directions on how user data may be collected, stored, and when it must be deleted. As similar legislation is developed around the globe, there is the potential for repercussions across multiple fields of research, including educational data mining (EDM). Over the past two decades, the EDM community has taken consistent steps to protect learner privacy within our research, whilst pursuing goals that will benefit their learning. However, recent privacy legislation may cause our practices to need to change. The right to be forgotten states that users have the right to request that all their data (including deidentified data generated by them) be removed. In this paper, we discuss the potential challenges of this legislation for EDM research, including impacts on Open Science practices, Data Modeling, and Data sharing. We also consider changes to EDM best practices that may aid compliance with this new legislation. 
    more » « less
  4. New privacy laws like the European Union's General Data Protection Regulation (GDPR) require database administrators (DBAs) to identify all information related to an individual on request, e.g. , to return or delete it. This requires time-consuming manual labor today, particularly for legacy schemas and applications. In this paper, we investigate what it takes to provide mostly-automated tools that assist DBAs in GDPR-compliant data extraction for legacy databases. We find that a combination of techniques is needed to realize a tool that works for the databases of real-world applications, such as web applications, which may violate strict normal forms or encode data relationships in bespoke ways. Our tool, GDPRizer, relies on foreign keys, query logs that identify implied relationships, data-driven methods, and coarse-grained annotations provided by the DBA to extract an individual's data. In a case study with three popular web applications, GDPRizer achieves 100% precision and 96--100% recall. GDPRizer saves work compared to hand-written queries, and while manual verification of its outputs is required, GDPRizer simplifies privacy compliance. 
    more » « less
  5. null (Ed.)
    Soteria is a user right management system designed to safeguard user-data privacy in a transparent and provable manner in compliance to regulations such as GDPR and CCPA. Soteria represents user data rights as formal executable sharing agreements, which can automatically be translated into a human readable form and enforced as data are queried. To support revocation and to prove compliance, an indelible, audited trail of the hash of data access and sharing agreements are stored on a two-layer distributed ledger. The main chain ensures partition tolerance and availability (PA) properties while side chains ensure consistency and availability (CA), thus providing the three properties of the CAP (consistency, availability, and partition tolerance) theorem. Besides depicting the two-layer architecture of Soteria, this paper evaluates representative consensus protocols and recommends side-chain and inter-chain management strategies for improving latency and throughput. 
    more » « less