The computer science literature on identifying people from personal information describes a wide spectrum, from aggregate information that contains no information about individual people to information that by itself identifies a person. Privacy laws and regulations, however, often distinguish between only two types, commonly called personally identifiable information and de-identified information. We show that collapsing this technological spectrum of identifiability into only two legal definitions fails to encourage privacy-preserving practices. We propose a set of legal definitions that spans the spectrum.
We start with anonymous information. Computer science has created anonymization algorithms, including differential privacy, that provide mathematical guarantees that a person cannot be identified. Although the California Consumer Privacy Act (CCPA) defines aggregate information, it treats aggregate information the same as de-identified information. We propose a definition of anonymous information based on the technological possibility of logical association of the information with other information, and we argue for the exclusion of anonymous information from notice and consent requirements.
We next consider de-identified information. Computer science has created de-identification algorithms, including generalization, that minimize (but do not eliminate) the risk of re-identification. The GDPR defines anonymous information but not de-identified information; the CCPA defines de-identified information but not anonymous information; and the two definitions do not align. We propose a definition of de-identified information based on the reasonableness of association with other information, along with legal controls to protect against re-identification. We argue for the inclusion of de-identified information in notice requirements, but for its exclusion from choice requirements.
We next address the distinction between trackable and non-trackable information. Computer science has shown how one-time identifiers can be used to protect reasonably linkable information from being tracked over time. Although both the GDPR and the CCPA discuss profiling, neither formally defines it as a form of personal information, and thus both fail to adequately protect against it. We propose definitions of trackable and non-trackable information based on the likelihood of association with information from other contexts, along with a set of legal controls to protect against tracking. We argue for requiring stronger forms of user choice for trackable information, which will encourage the use of non-trackable information.
Finally, we address the distinction between pseudonymous and reasonably identifiable information. Computer science has shown how pseudonyms can be used to reduce identification. Neither the GDPR nor the CCPA distinguishes between pseudonymous and reasonably identifiable information. We propose definitions based on the reasonableness of identifiability of the information, along with a set of legal controls to protect against identification. We argue for requiring stronger forms of user choice for reasonably identifiable information, which will encourage the use of pseudonymous information.
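To make the notion of a mathematical guarantee concrete, here is a minimal sketch of the Laplace mechanism, the canonical differentially private primitive for counting queries. The records, field names, and epsilon value are illustrative assumptions, not artifacts of the work summarized above.

```python
import numpy as np

def dp_count(records, predicate, epsilon: float) -> float:
    """Release a count with epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person
    changes the count by at most 1), so Laplace noise with scale
    1/epsilon yields an epsilon-differentially private answer.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical records; a smaller epsilon means more noise and more privacy.
people = [{"age": 34}, {"age": 51}, {"age": 29}]
print(dp_count(people, lambda r: r["age"] >= 30, epsilon=0.5))
```

Because the guarantee holds regardless of what side information an attacker has, the output of such a mechanism is a candidate for the anonymous-information category rather than merely de-identified information.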
Our definitions of anonymous information, de-identified information, non-trackable information, trackable information, pseudonymous information, and reasonably identifiable information can replace the oversimplified distinction between personally identifiable information and de-identified information. We hope that this full spectrum of definitions can be used in a comprehensive privacy law to tailor notice and consent requirements to the characteristics of each type of information.
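The trackable versus non-trackable distinction can likewise be grounded in code. The sketch below is an illustration rather than anything prescribed by the paper: it contrasts a stable keyed pseudonym, which remains linkable across sessions (trackable), with a one-time identifier that is fresh per session (non-trackable). The key and user ID are placeholders.

```python
import hmac
import hashlib
import secrets

SECRET_KEY = b"server-side-secret"  # placeholder; manage real keys securely

def stable_pseudonym(user_id: str) -> str:
    """Keyed hash of a user ID: the same user always yields the same
    pseudonym, so records stay linkable (trackable) across contexts."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:16]

def one_time_identifier() -> str:
    """Fresh random identifier: records from different sessions cannot
    be linked to one another, making the information non-trackable."""
    return secrets.token_hex(8)

uid = "alice@example.com"
print(stable_pseudonym(uid) == stable_pseudonym(uid))  # True: linkable
print(one_time_identifier() == one_time_identifier())  # False: unlinkable
```

Under the proposed definitions, the first identifier produces pseudonymous, trackable information, while the second supports non-trackable processing.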
DiffAudit: Auditing Privacy Practices of Online Services for Children and Adolescents
Children’s and adolescents’ online data privacy is regulated by laws such as the Children’s Online Privacy Protection Act (COPPA) and the California Consumer Privacy Act (CCPA). Online services that are directed towards general audiences (i.e., including children, adolescents, and adults) must comply with these laws. In this paper, first, we present DiffAudit, a platform-agnostic privacy auditing methodology for general audience services. DiffAudit performs differential analysis of network traffic data flows to compare data processing practices (i) between child, adolescent, and adult users and (ii) before and after consent is given and user age is disclosed. We also present a data type classification method that utilizes GPT-4 and our data type ontology based on COPPA and CCPA, allowing us to identify considerably more data types than prior work. Second, we apply DiffAudit to a set of popular general audience mobile and web services and observe a rich set of behaviors across over 440K outgoing requests, from which we extracted and classified 3,968 unique data types. We reveal problematic data processing practices prior to consent and age disclosure, lack of differentiation between age-specific data flows, inconsistent privacy policy disclosures, and sharing of linkable data with third parties, including advertising and tracking services.
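The core of the approach is differential analysis of observed data flows across user personas and consent states. A minimal sketch of that comparison logic might look like the following; the flow tuples, persona labels, and helper names are assumptions for illustration, not the paper's actual implementation (real traffic capture and GPT-4-based data type classification are out of scope here).

```python
# A "flow" is modeled as a (destination_domain, data_type) pair.

def diff_flows(flows_a: set, flows_b: set) -> dict:
    """Compare two sets of observed flows, e.g., child vs. adult
    profiles, or pre-consent vs. post-consent sessions."""
    return {
        "only_in_a": flows_a - flows_b,
        "only_in_b": flows_b - flows_a,
        "shared": flows_a & flows_b,
    }

child_flows = {("ads.example.com", "advertising_id"),
               ("api.example.com", "app_usage")}
adult_flows = {("ads.example.com", "advertising_id"),
               ("ads.example.com", "geolocation"),
               ("api.example.com", "app_usage")}

report = diff_flows(child_flows, adult_flows)
# If "shared" still contains ad-related flows for the child profile,
# the service may not be differentiating age-specific data flows.
print(report["shared"], report["only_in_b"])
```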
- PAR ID: 10585587
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400705922
- Page Range / eLocation ID: 488 to 504
- Format(s): Medium: X
- Location: Madrid, Spain
- Sponsoring Org: National Science Foundation
More Like this
User engagement with data privacy and security through consent banners has become a ubiquitous part of interacting with internet services. While previous work has addressed consent banners from interaction design, legal, or ethics-focused perspectives, little research addresses the connections among multiple disciplinary approaches, including tensions and opportunities that transcend disciplinary boundaries. In this paper, we draw together perspectives and commentary from the HCI, design, privacy and data protection, and legal research communities, using the language and strategies of “dark patterns” to perform an interaction criticism reading of three different types of consent banners. Our analysis builds upon designer, interface, user, and social context lenses to surface tensions and synergies that arise together in complex, contingent, and conflicting ways in the act of designing consent banners. We conclude with opportunities for transdisciplinary dialogue across legal, ethical, computer science, and interactive systems scholarship to translate matters of ethical concern into public policy.
While the United States currently has no comprehensive privacy law, the Children’s Online Privacy Protection Act (“COPPA”) has been in effect for over twenty years. As a result, the study of compliance issues among child-directed online services can yield important lessons for future enforcement efforts and can inform the design of state and federal privacy laws that protect people of all ages. This Essay describes research conducted to understand privacy compliance issues and how that research has led the author to several recommendations for improving privacy enforcement more generally. While these recommendations are informed by the study of child-directed services’ compliance with COPPA, they are applicable to future state and federal privacy laws aimed at protecting the general public (i.e., not just children). Despite evidence of thousands of COPPA violations (e.g., one study found that a majority of child-directed mobile apps appeared to be violating COPPA in various ways), the Federal Trade Commission (“FTC”) and state attorneys general, the only entities with enforcement authority under the law, pursue few enforcement efforts each year. Despite having competent personnel, these organizations are heavily constrained and under-resourced; as a result, enforcement by regulators is simply not seen as a credible threat by software developers. Research has found that developers are much more concerned with apps being removed from app stores (i.e., due to enforcement of platforms’ terms of service) than with the largely theoretical threat of regulatory enforcement. Yet the burden of COPPA compliance largely rests on numerous individual app developers. Thus, shifting enforcement efforts to the far fewer platforms that distribute the apps (and make representations about their privacy and security properties) and to data recipients (who ultimately receive consumers’ identifiable data) would likely yield better outcomes for consumers, while allowing the FTC to focus its enforcement efforts and have greater impact. Based on these observations, this Essay proposes a new enforcement framework in which compliance burdens are shifted away from the numerous individual online services and onto the fewer, bigger players best positioned to comply: platforms and third-party data recipients. The FTC’s limited resources can then focus on those entities at the top of the data food chain. Enforcement targeting the other, more numerous, individual online services could be left to a novel mechanism that uses a private right of action to foster more robust industry self-regulation through FTC-approved certification programs.
This article is an exploratory analysis of the impact of the California Consumer Privacy Act (CCPA) on data breaches that expose sensitive private data of consumers. The CCPA applies to large for-profit businesses that collect and disseminate the personal information of Californian consumers. It provides for consumer rights and imposes notification and security requirements on businesses that collect private information. We analyzed how the CCPA affects the data breach notifications required by the state’s Office of Auditor General for the period 2012 to 2023. The analysis provides interesting insights into the impact of the CCPA on the pattern of data breaches. Our principal finding is that privacy breaches declined to some extent after the CCPA took effect. Importantly, the CCPA has contributed to an overall improvement in the reporting of privacy breaches. We surmise that the CCPA brought more data breaches to light.
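A before/after comparison of the kind described can be sketched in a few lines of pandas. The CSV schema and file name below are assumptions for illustration, and the CCPA's January 1, 2020 effective date is used as the cut point.

```python
import pandas as pd

CCPA_EFFECTIVE_YEAR = 2020  # the CCPA took effect January 1, 2020

# Hypothetical schema: one row per reported breach, with a report date.
breaches = pd.read_csv("ca_breach_notifications.csv", parse_dates=["reported"])
window = breaches[(breaches["reported"] >= "2012-01-01") &
                  (breaches["reported"] <= "2023-12-31")]

# Mean annual breach counts before and after the CCPA took effect.
yearly = window.groupby(window["reported"].dt.year).size()
pre = yearly[yearly.index < CCPA_EFFECTIVE_YEAR].mean()
post = yearly[yearly.index >= CCPA_EFFECTIVE_YEAR].mean()
print(f"mean breaches/year pre-CCPA: {pre:.1f}, post-CCPA: {post:.1f}")
```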
As new laws governing the management of personal data are introduced, e.g., the European Union’s General Data Protection Regulation of 2016 and the California Consumer Privacy Act of 2018, compliance with data governance legislation is becoming an increasingly important aspect of data management. An important component of many data privacy laws is that they require companies to use an individual’s data only for purposes the individual has explicitly consented to. Prior methods for enforcing consent for aggregate queries either use access control to eliminate data without consent from query evaluation or apply differential privacy algorithms to inject synthetic noise into the outcomes of queries (or the input data) to ensure that the anonymity of non-consenting individuals is preserved with high probability. Both approaches return query results that differ from the ground truth results corresponding to the full input containing data from both consenting and non-consenting individuals. We present an alternative framework for group-by aggregate queries, tailored for applications, e.g., medicine, where even a small deviation from the correct answer to a query cannot be tolerated. Our approach uses provenance to determine, for each output tuple of a group-by aggregate query, which individuals’ data was used to derive the result for that group. We then use statistical tests to determine how likely it is that the presence of data for a non-consenting individual will be revealed by such an output tuple. We filter out tuples for which this test fails, i.e., which are deemed likely to reveal non-consenting data. Thus, our approach always returns a subset of the ground truth query answers. In our experiments, our approach returns only 100% accurate results in instances where access control or differential privacy would have returned either fewer or less accurate results.
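The filtering step can be sketched as follows: compute a group-by aggregate over the full input, track which individuals contributed to each group (its provenance), and suppress groups whose exact answer would likely reveal a non-consenting individual. The simple threshold below is a stand-in for the paper's statistical test, and all names and data are illustrative.

```python
from collections import defaultdict

# Hypothetical rows: (person_id, group_key, value, consented)
rows = [
    ("p1", "cardiology", 4, True),
    ("p2", "cardiology", 7, True),
    ("p3", "cardiology", 5, False),
    ("p4", "oncology",   9, False),
]

MIN_CONSENTING = 2  # stand-in for the paper's statistical disclosure test

# Provenance: remember which individuals feed each output tuple.
groups = defaultdict(list)
for pid, key, value, consented in rows:
    groups[key].append((pid, value, consented))

results = {}
for key, members in groups.items():
    consenting = sum(1 for _, _, c in members if c)
    non_consenting = len(members) - consenting
    # Suppress groups where the exact answer would likely reveal the
    # presence of a non-consenting individual; kept answers are exact
    # ground truth, so the output is a subset of the true results.
    if non_consenting == 0 or consenting >= MIN_CONSENTING:
        results[key] = sum(v for _, v, _ in members)

print(results)  # {'cardiology': 16}; 'oncology' is suppressed
```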
