Title: Automated Analysis of Changes in Privacy Policies: A Structured Self-Attentive Sentence Embedding Approach
The increasing societal concern for consumer information privacy has led to the enforcement of privacy regulations worldwide. In an effort to adhere to privacy regulations such as the General Data Protection Regulation (GDPR), many companies’ privacy policies have become increasingly lengthy and complex. In this study, we adopted the computational design science paradigm to design a novel privacy policy evolution analytics framework to help identify how companies change and present their privacy policies based on privacy regulations. The framework includes a self-attentive annotation system (SAAS) that automatically annotates paragraph-length segments in privacy policies to help stakeholders identify data practices of interest for further investigation. We rigorously evaluated SAAS against state-of-the-art machine learning (ML) and deep learning (DL)-based methods on a well-established privacy policy dataset, OPP-115. SAAS outperformed conventional ML and DL models in terms of F1-score by statistically significant margins. We demonstrate the proposed framework’s practical utility with an in-depth case study of GDPR’s impact on Amazon’s privacy policies. The case study results indicate that Amazon’s post-GDPR privacy policy potentially violates a fundamental principle of GDPR by causing consumers to exert more effort to find information about first-party data collection. Given the increasing importance of consumer information privacy, the proposed framework has important implications for regulators and companies. We discuss several design principles followed by the SAAS that can help guide future design science-based e-commerce, health, and privacy research.
Award ID(s):
2338479 2319325 1946537
PAR ID:
10611541
Publisher / Repository:
MIS Quarterly
Journal Name:
MIS Quarterly
Volume:
48
Issue:
4
ISSN:
0276-7783
Page Range / eLocation ID:
1453 to 1482
Sponsoring Org:
National Science Foundation
More Like this
  1. The European General Data Protection Regulation (GDPR) mandates a data controller (e.g., an app developer) to provide all information specified in Articles (Arts.) 13 and 14 to data subjects (e.g., app users) regarding how their data are being processed and what their rights are. While some studies have started to detect the fulfillment of GDPR requirements in a privacy policy, their exploration only focused on a subset of mandatory GDPR requirements. In this paper, our goal is to explore the state of GDPR-completeness violations in mobile apps' privacy policies. To achieve our goal, we design the PolicyChecker framework by taking a rule and semantic role based approach. PolicyChecker automatically detects completeness violations in privacy policies based not only on all mandatory GDPR requirements but also on all if-applicable GDPR requirements that will become mandatory under specific conditions. Using PolicyChecker, we conduct the first large-scale GDPR-completeness violation study on 205,973 privacy policies of Android apps in the UK Google Play store. PolicyChecker identified 163,068 (79.2%) privacy policies containing data collection statements; therefore, such policies are regulated by GDPR requirements. However, the majority (99.3%) of them failed to achieve the GDPR-completeness with at least one unsatisfied requirement; 98.1% of them had at least one unsatisfied mandatory requirement, while 73.0% of them had at least one unsatisfied if-applicable requirement logic chain. We conjecture that controllers' lack of understanding of some GDPR requirements and their poor practices in composing a privacy policy can be the potential major causes behind the GDPR-completeness violations. We further discuss recommendations for app developers to improve the completeness of their apps' privacy policies to provide a more transparent personal data processing environment to users.
  2. Data privacy regulations like GDPR and CCPA define a right of access empowering consumers to view the data companies store about them. Companies satisfy these requirements in part via data downloads, or downloadable archives containing this information. Data downloads vary in format, organization, comprehensiveness, and content. It is unknown, however, whether current data downloads actually achieve the transparency goals embodied by the right of access. In this paper, we report on the first exploration of the design of data downloads. Through 12 focus groups involving 42 participants, we gathered reactions to six companies’ data downloads. Using co-design techniques, we solicited ideas for future data download designs, formats, and tools. Most participants indicated that current offerings need improvement to be useful, emphasizing the need for better filtration, visualization, and summarization to help them home in on key information.
  3. The development of tools and techniques to analyze and extract organizations’ data habits from privacy policies is critical for scalable regulatory compliance audits. Unfortunately, these tools are becoming increasingly limited in their ability to identify compliance issues and fixes. After all, most were developed using regulation-agnostic datasets of annotated privacy policies obtained from a time before the introduction of landmark privacy regulations such as the EU’s GDPR and California’s CCPA. In this paper, we describe the first open regulation-aware dataset of expert-annotated privacy policies, C3PA (CCPA Privacy Policy Provision Annotations), aimed to address this challenge. C3PA contains over 48K expert-labeled privacy policy text segments associated with responses to CCPA-specific disclosure mandates from 411 unique organizations. We demonstrate that the C3PA dataset is uniquely suited for aiding automated audits of compliance with CCPA-related disclosure mandates.
  4. Villata, S. (Ed.)
    The European Union’s General Data Protection Regulation (GDPR) has compelled businesses and other organizations to update their privacy policies to state specific information about their data practices. Simultaneously, researchers in natural language processing (NLP) have developed corpora and annotation schemes for extracting salient information from privacy policies, often independently of specific laws. To connect existing NLP research on privacy policies with the GDPR, we introduce a mapping from GDPR provisions to the OPP-115 annotation scheme, which serves as the basis for a growing number of projects to automatically classify privacy policy text. We show that assumptions made in the annotation scheme about the essential topics for a privacy policy reflect many of the same topics that the GDPR requires in these documents. This suggests that OPP-115 continues to be representative of the anatomy of a legally compliant privacy policy, and that the legal assumptions behind it represent the elements of data processing that ought to be disclosed within a policy for transparency. The correspondences we show between OPP-115 and the GDPR suggest the feasibility of bridging existing computational and legal research on privacy policies, benefiting both areas. 