The landscape of privacy laws and regulations around the world is complex and ever-changing. National and supranational laws, agreements, decrees, and other government-issued rules form a patchwork that companies must follow to operate internationally. To examine the status and evolution of this patchwork, we introduce the Privacy Law Corpus, a collection of 1,043 privacy laws, regulations, and guidelines covering 183 jurisdictions. This corpus enables a large-scale quantitative and qualitative examination of legal focus on privacy. We examine the temporal distribution of when privacy laws were created and illustrate the dramatic increase in privacy legislation over the past 50 years, although a finer-grained examination reveals that the rate of increase varies depending on the personal data types that privacy laws address. Our exploration also demonstrates that most privacy laws each address relatively few personal data types. Additionally, topic modeling results show the prevalence of common themes in privacy laws, such as finance, healthcare, and telecommunications. Finally, we release the corpus to the research community to promote further study.
This content will become publicly available on June 17, 2025
Creation and Analysis of an International Corpus of Privacy Laws
- Award ID(s): 1914486
- PAR ID: 10523176
- Publisher / Repository: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
- Date Published:
- Format(s): Medium: X
- Location: https://aclanthology.org/2024.lrec-main.365/
- Sponsoring Org: National Science Foundation
More Like this
-
In recent years, well-known cyber breaches have placed growing pressure on organizations to implement proper privacy and data protection standards. Attacks involving the theft of employee and customer personal information have damaged the reputations of well-known brands, resulting in significant financial costs. As a result, governments across the globe are actively examining and strengthening laws to better protect the personal data of their citizens. The General Data Protection Regulation (GDPR) updates European privacy law with an array of provisions that better protect consumers and require organizations to focus on accounting for privacy in their business processes through “privacy by design” and “privacy by default” principles. In the US, the National Privacy Research Strategy (NPRS) makes several recommendations that reinforce the need for organizations to better protect data. In response to these rapid developments in privacy compliance, data flow mapping has emerged as a valuable tool. Data flow mapping depicts the flow of data through a system or process, enumerating the specific data elements handled and identifying the risks at different stages of the data lifecycle. This Article explains the critical features of a data flow map and discusses how mapping may improve the transparency of the data lifecycle, while recognizing the limitations in building out data flow maps and the difficulties of maintaining updated maps. The Article then explores how data flow mapping may support data collection, transfer, storage, and destruction practices pursuant to various privacy regulations. Finally, a hypothetical case study is presented to show how data flow mapping was used by an organization to stay compliant with privacy rules and to improve the transparency of information flows.
-
This work examines privacy laws and regulations that limit disclosure of personal data, and explores whether and how these restrictions apply when participants use cryptographically secure multi-party computation (MPC). By protecting data during use, MPC offers the promise of conducting data science in a way that (in some use cases) meets or even exceeds most people’s conceptions of data privacy. With MPC, it is possible to correlate individual records across multiple datasets without revealing the underlying records, to conduct aggregate analysis across datasets which parties are otherwise unwilling to share for competitive reasons, and to analyze aggregate statistics across datasets which no individual party may lawfully hold. However, most adoptions of MPC to date involve data that is not subject to privacy protection under the law. We posit that a major impediment to the adoption of MPC—on the data that society has deemed most worthy of protection—is the difficulty of mapping this new technology onto the design principles of data privacy laws. While a computer scientist might reasonably believe that transforming any data analysis into its privacy-protective variant using MPC is a clear win, we show in this work that the technological guarantees of MPC do not directly imply compliance with privacy laws. Specifically, a lawyer will likely want to ask several important questions about the pre-conditions that are necessary for MPC to succeed, the risk that data might inadvertently or maliciously be disclosed to someone other than the output party, and what recourse to take if this bad event occurs. We have two goals for this work: explaining why the privacy law questions are nuanced and that the lawyer is correct to proceed cautiously, and providing a framework that lawyers can use to reason systematically about whether and how MPC implicates data privacy laws in the context of a specific use case. 
Our framework revolves around three questions: a definitional question on whether the encodings still constitute ‘personal data,’ a process question about whether the act of executing MPC constitutes a data disclosure event, and a liability question about what happens if something goes wrong. We conclude by providing advice to regulators and suggestions to early adopters to spur uptake of MPC. It is our hope that this work provides the first step toward a methodology that organizations can use when contemplating the use of MPC.
-
Nearly all software built today impinges upon end-user privacy and needs to comply with relevant regulations. Therefore, there have been increasing calls for integrating considerations of compliance with privacy regulations throughout the software engineering lifecycle. However, software engineers are typically trained in technical fields and lack sufficient knowledge and support for sociotechnical considerations of privacy. Privacy ideation cards attempt to address this issue by making privacy compliance understandable and actionable for software developers. However, the application of privacy ideation cards in real-world software projects has not yet been systematically investigated. The effectiveness of ideation cards as a pedagogical tool has not yet been examined either. We address these gaps by studying how teams of undergraduate students applied privacy ideation cards in capstone projects that involved building real-world software for industry sponsors. We found that privacy ideation cards fostered greater consideration and understanding of the extent to which the projects aligned with privacy regulations. We identified three main themes from student discussions of privacy compliance: (i) defining personal data; (ii) assigning responsibility for privacy compliance; and (iii) determining and exercising autonomy. The results suggest that application of the cards for real-world projects requires careful consideration of intersecting factors such as the stage at which the cards are used and the autonomy available to the developers. Pedagogically, ideation cards can facilitate low-level cognitive engagement (especially the cognitive processes of meaning construction and interpretation) for specific components within a project. Higher-level cognitive processes were comparatively rare in ideation sessions.
These findings provide important insight to help enhance capstone instruction and to improve privacy ideation cards to increase their impact on the privacy properties of the developed software.