The General Data Protection Regulation (GDPR) in the European Union sets out rules for how user data may be collected and stored, and for when it must be deleted. As similar legislation is developed around the globe, there is the potential for repercussions across multiple fields of research, including educational data mining (EDM). Over the past two decades, the EDM community has taken consistent steps to protect learner privacy within our research, whilst pursuing goals that will benefit learning. However, recent privacy legislation may require changes to our practices. The right to be forgotten gives users the right to request that all of their data (including deidentified data generated by them) be removed. In this paper, we discuss the potential challenges this legislation poses for EDM research, including impacts on open science practices, data modeling, and data sharing. We also consider changes to EDM best practices that may aid compliance with this new legislation.
Navigating the United States Legislative Landscape on Voice Privacy: Existing Laws, Proposed Bills, Protection for Children, and Synthetic Data for AI
Privacy is a hot topic for policymakers across the globe, including in the United States. Advances in AI and emerging concerns about the misuse of personal data have pushed policymakers to draft legislation on trustworthy AI and privacy protection for citizens. This paper presents the state of privacy legislation in the U.S. Congress and outlines how voice data is treated in the legislation's definitions. It also reviews additional privacy protections for children. The paper offers a holistic review of enacted and proposed privacy laws across the fifty U.S. states, with attention to voice data and guidelines for processing children's data in those laws. As an alternative to actual human data, ethically generated synthetic data offers considerable flexibility to keep AI innovation in progress. Because policymakers' consideration of synthetic data in AI legislation is relatively new compared with that of privacy laws, the paper also reviews regulatory considerations for synthetic data.
- Award ID(s):
- 2234916
- PAR ID:
- 10610082
- Publisher / Repository:
- ISCA
- Date Published:
- Page Range / eLocation ID:
- 91 to 95
- Subject(s) / Keyword(s):
- voice privacy; privacy laws; children's privacy; synthetic data; AI legislation
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Silicon Valley, and the U.S. tech sector more broadly, have changed the world in part by embracing a “move fast and break things” mentality popularized by Mark Zuckerberg. While it is true that the tech sector has attempted to break with such a reactive and flippant response to security concerns, including at Microsoft itself through its Security Development Lifecycle, cyberattacks continue at an alarming rate. As a result, there are growing calls from regulators around the world to change the risk equation. An example is the 2023 U.S. National Cybersecurity Strategy, which argues that “[w]e must hold the stewards of our data accountable for the protection of personal data; drive the development of more secure connected devices; and reshape laws that govern liability for data losses and harm caused by cybersecurity errors, software vulnerabilities, and other risks created by software and digital technologies.” What exact form such liability should take is up for debate. The defect model of products liability law is one clear option, and courts across the United States have already been applying it using both strict liability and risk utility framings in a variety of cases. This Article delves into the debates by considering how other cyber powers around the world—including the European Union—are extending products liability law to cover software, and it examines the lessons these efforts hold for U.S. policymakers with case studies focusing on liability for AI-generated content and Internet-connected critical infrastructure.
-
Compliance with data retention laws and legislation is an important aspect of data management. As new laws governing personal data management are introduced (e.g., the California Consumer Privacy Act enacted in 2020) and a greater emphasis is placed on enforcing data privacy law compliance, data retention support must be an inherent part of data management systems. However, relational databases do not currently offer functionality to enforce retention compliance. In this paper, we propose a framework that integrates data retention support into any relational database. Using SQL-based mechanisms, our system supports an intuitive definition of data retention policies. We demonstrate that our approach meets the legal requirements of retention and can be implemented to transparently guarantee compliance. Our framework streamlines compliance support without requiring database schema changes, while incurring an average 6.7% overhead compared to the current state-of-the-art solution.
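The idea of declaring retention policies in SQL and enforcing them transparently can be sketched in a few lines. The example below is a minimal illustration of the general approach, not the paper's actual framework: the table names, columns, and the 30-day policy are all hypothetical, and enforcement is shown as an explicit function rather than the transparent mechanism the paper describes.

```python
import sqlite3

# A policy table maps each data table to a timestamp column and a
# maximum age; enforce_retention() deletes rows past their retention
# period. All names here are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_events (
        user_id INTEGER,
        payload TEXT,
        created_at TEXT  -- ISO-8601 timestamp
    );
    CREATE TABLE retention_policy (
        table_name TEXT PRIMARY KEY,
        ts_column TEXT,
        max_age_days INTEGER
    );
    INSERT INTO retention_policy VALUES ('user_events', 'created_at', 30);
""")

def enforce_retention(conn):
    """Delete rows older than each table's configured retention period."""
    policies = conn.execute(
        "SELECT table_name, ts_column, max_age_days FROM retention_policy"
    ).fetchall()
    for table, ts_col, max_days in policies:
        # Identifiers come from the trusted policy table, so f-string
        # interpolation is acceptable here; values use placeholders.
        conn.execute(
            f"DELETE FROM {table} WHERE {ts_col} < datetime('now', ?)",
            (f"-{max_days} days",),
        )
    conn.commit()

conn.execute("INSERT INTO user_events VALUES (1, 'old', datetime('now', '-60 days'))")
conn.execute("INSERT INTO user_events VALUES (2, 'new', datetime('now'))")
enforce_retention(conn)
remaining = conn.execute("SELECT user_id FROM user_events").fetchall()
print(remaining)  # the 60-day-old row is purged; only the recent row remains
```

A production design would more likely enforce this via triggers or query rewriting so that expired rows can never be read, which is closer to the transparent guarantee the abstract describes.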
-
As new laws governing management of personal data are introduced, e.g., the European Union's General Data Protection Regulation of 2016 and the California Consumer Privacy Act of 2018, compliance with data governance legislation is becoming an increasingly important aspect of data management. An important component of many data privacy laws is that they require companies to only use an individual's data for a purpose the individual has explicitly consented to. Prior methods for enforcing consent for aggregate queries either use access control to eliminate data without consent from query evaluation or apply differential privacy algorithms to inject synthetic noise into the outcomes of queries (or input data) to ensure that the anonymity of non-consenting individuals is preserved with high probability. Both approaches return query results that differ from the ground truth results corresponding to the full input containing data from both consenting and non-consenting individuals. We present an alternative framework for group-by aggregate queries, tailored for applications, e.g., medicine, where even a small deviation from the correct answer to a query cannot be tolerated. Our approach uses provenance to determine, for each output tuple of a group-by aggregate query, which individual's data was used to derive the result for this group. We then use statistical tests to determine how likely it is that the presence of data for a non-consenting individual will be revealed by such an output tuple. We filter out tuples for which this test fails, i.e., which are deemed likely to reveal non-consenting data. Thus, our approach always returns a subset of the ground truth query answers. In our experiments, this approach returns only 100% accurate results in instances where access control or differential privacy would have returned either fewer or less accurate results.
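The filtering strategy described above can be illustrated with a toy sketch. This is not the paper's actual method: the statistical disclosure test is replaced here by a simple group-size threshold `k` as a stand-in, and the function, data, and consent map are all hypothetical. The key property it preserves is that every returned tuple is an exact ground-truth aggregate; groups deemed risky are suppressed rather than perturbed.

```python
from collections import defaultdict

def groupby_count_with_consent(rows, consent, k=5):
    """Group-by COUNT over all rows (ground truth), suppressing any
    output group whose provenance includes a non-consenting individual
    and whose size falls below k. The threshold k is a crude stand-in
    for the statistical test in the paper; kept counts are always exact.
    """
    groups = defaultdict(list)  # provenance: contributing ids per group
    for uid, group_key in rows:
        groups[group_key].append(uid)
    result = {}
    for key, ids in groups.items():
        has_nonconsenting = any(not consent[uid] for uid in ids)
        if has_nonconsenting and len(ids) < k:
            continue  # likely to reveal a non-consenting individual's data
        result[key] = len(ids)  # exact count over the full input
    return result

# Hypothetical data: (user_id, group) rows and a consent map.
rows = [(1, 'A'), (2, 'A'), (3, 'A'), (4, 'A'), (5, 'A'),
        (6, 'B'), (7, 'B')]
consent = {1: True, 2: True, 3: True, 4: True, 5: False,
           6: True, 7: False}
print(groupby_count_with_consent(rows, consent, k=5))
# Group 'A' (size 5, meets k) is kept with its exact count, even though
# it includes a non-consenting user; group 'B' (size 2) is suppressed.
```

Contrast this with differential privacy, which would return noisy counts for both groups, and with access control, which would drop non-consenting rows and return an undercount for group 'A'.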
-
Patient-generated health data (PGHD), created and captured from patients via wearable devices and mobile apps, are proliferating outside of clinical settings. Examples include sleep tracking, fitness trackers, continuous glucose monitors, and RFID-enabled implants, with many additional biometric or health surveillance applications in development or envisioned. These data are included in growing stockpiles of personal health data being mined for insight via big data analytics and artificial intelligence/deep learning technologies. Governing these data resources to facilitate patient care and health research while preserving individual privacy and autonomy will be challenging, as PGHD are the least regulated domains of digitalized personal health data (U.S. Department of Health and Human Services, 2018). When patients themselves collect digitalized PGHD using “apps” provided by technology firms, these data fall outside of conventional health data regulation, such as HIPAA. Instead, PGHD are maintained primarily on the information technology infrastructure of vendors, and data are governed under the IT firm’s own privacy policies and within the firm’s intellectual property rights. Dominant narratives position these highly personal data as valuable resources to transform healthcare, stimulate innovation in medical research, and engage individuals in their health and healthcare. However, ensuring privacy, security, and equity of benefits from PGHD will be challenging. PGHD can be aggregated and, despite putative “deidentification,” be linked with other health, economic, and social data for predictive analytics. As large tech companies enter the healthcare sector (e.g., Google Health is partnering with Ascension Health to analyze the PHI of millions of people across 21 U.S. states), the lack of harmonization between regulatory regimes may render existing safeguards to preserve patient privacy and control over their PHI ineffective. 
While healthcare providers are bound to adhere to health privacy laws, Big Tech comes under more relaxed regulatory regimes that will facilitate monetizing PGHD. We explore three existing data protection regimes relevant to PGHD in the United States that are currently in tension with one another: federal and state health-sector laws, data use and reuse for research and innovation, and industry self-regulation by large tech companies. We then identify three types of structures (organizational, regulatory, technological/algorithmic) which synergistically could help enact needed regulatory oversight while limiting the friction and economic costs of regulation. This analysis provides a starting point for further discussions and negotiations among stakeholders and regulators.