Title: Assessing Purpose-Extraction for Automated Corpora Annotations
Privacy policies contain important information regarding the collection and use of users’ data. As Internet of Things (IoT) devices have become popular in recent years, these policies have become important for protecting IoT users from unwanted use of private data collected through them. However, IoT policies tend to be long, which discourages users from reading them. In this paper, we seek to create an automatically annotated corpus of IoT privacy policies using natural language processing techniques. Our method extracts the purpose from privacy policies and allows users to quickly find the important information relevant to the collection and use of their data.
Award ID(s):
1950416
NSF-PAR ID:
10387038
Journal Name:
Proceedings of 2022 REUNS Workshop
Page Range / eLocation ID:
718 to 719
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Privacy policies, despite the important information they provide about the collection and use of one's data, tend to be skipped over by most Internet users. In this paper, we seek to make privacy policies more accessible by automatically classifying web privacy policies. We use natural language processing techniques and multiple machine learning models, and we compare how effective each is at the classification task. We also explore how effectively these methods classify the privacy policies of Internet of Things (IoT) devices.
  2. Furnell, Steven (Ed.)
    A huge amount of personal and sensitive data is shared on Facebook, which makes it a prime target for attackers. Adversaries can exploit third-party applications connected to a user’s Facebook profile (i.e., Facebook apps) to gain access to this personal information. Users’ lack of knowledge and the varying privacy policies of these apps make them further vulnerable to information leakage. However, little has been done to identify mismatches between users’ perceptions and the privacy policies of Facebook apps. We address this challenge in our work. We conducted a lab study with 31 participants, where we received data on how they share information in Facebook, their Facebook-related security and privacy practices, and their perceptions on the privacy aspects of 65 frequently-used Facebook apps in terms of data collection, sharing, and deletion. We then compared participants’ perceptions with the privacy policy of each reported app. Participants also reported their expectations about the types of information that should not be collected or shared by any Facebook app. Our analysis reveals significant mismatches between users’ privacy perceptions and reality (i.e., privacy policies of Facebook apps), where we identified over-optimism not only in users’ perceptions of information collection, but also on their self-efficacy in protecting their information in Facebook despite experiencing negative incidents in the past. To the best of our knowledge, this is the first study on the gap between users’ privacy perceptions around Facebook apps and the reality. The findings from this study offer directions for future research to address that gap through designing usable, effective, and personalized privacy notices to help users to make informed decisions about using Facebook apps. 
  3. Smart home devices transmit highly sensitive usage information to servers owned by vendors or third-parties as part of their core functionality. Hence, it is necessary to provide users with the context in which their device data is collected and shared, to enable them to weigh the benefits of deploying smart home technology against the resulting loss of privacy. As privacy policies are generally expected to precisely convey this information, we perform a systematic and data-driven analysis of the current state of smart home privacy policies, with a particular focus on three key questions: (1) how hard privacy policies are for consumers to obtain, (2) how existing policies describe the collection and sharing of device data, and (3) how accurate these descriptions are when compared to information derived from alternate sources. Our analysis of 596 smart home vendors, affecting 2,442 smart home devices, yields 17 findings that impact millions of users, demonstrate gaps in existing smart home privacy policies, as well as challenges and opportunities for automated analysis.
  4. The European General Data Protection Regulation (GDPR) mandates a data controller (e.g., an app developer) to provide all information specified in Articles (Arts.) 13 and 14 to data subjects (e.g., app users) regarding how their data are being processed and what their rights are. While some studies have started to detect the fulfillment of GDPR requirements in a privacy policy, their exploration only focused on a subset of mandatory GDPR requirements. In this paper, our goal is to explore the state of GDPR-completeness violations in mobile apps' privacy policies. To achieve our goal, we design the PolicyChecker framework by taking a rule and semantic role based approach. PolicyChecker automatically detects completeness violations in privacy policies based not only on all mandatory GDPR requirements but also on all if-applicable GDPR requirements that will become mandatory under specific conditions. Using PolicyChecker, we conduct the first large-scale GDPR-completeness violation study on 205,973 privacy policies of Android apps in the UK Google Play store. PolicyChecker identified 163,068 (79.2%) privacy policies containing data collection statements; therefore, such policies are regulated by GDPR requirements. However, the majority (99.3%) of them failed to achieve the GDPR-completeness with at least one unsatisfied requirement; 98.1% of them had at least one unsatisfied mandatory requirement, while 73.0% of them had at least one unsatisfied if-applicable requirement logic chain. We conjecture that controllers' lack of understanding of some GDPR requirements and their poor practices in composing a privacy policy can be the potential major causes behind the GDPR-completeness violations. We further discuss recommendations for app developers to improve the completeness of their apps' privacy policies to provide a more transparent personal data processing environment to users.
  5. The dominant privacy framework of the information age relies on notions of “notice and consent.” That is, service providers will disclose, often through privacy policies, their data collection practices, and users can then consent to their terms. However, it is unlikely that most users comprehend these disclosures, which is due in no small part to ambiguous, deceptive, and misleading statements. By comparing actual collection and sharing practices to disclosures in privacy policies, we demonstrate the scope of the problem. Through analysis of 68,051 apps from the Google Play Store, their corresponding privacy policies, and observed data transmissions, we investigated the potential misrepresentations of apps in the Designed For Families (DFF) program, inconsistencies in disclosures regarding third-party data sharing, as well as contradictory disclosures about secure data transmissions. We find that of the 8,030 DFF apps (i.e., apps directed at children), 9.1% claim that their apps are not directed at children, while 30.6% claim to have no knowledge that the received data comes from children. In addition, we observe that 10.5% of 68,051 apps share personal identifiers with third-party service providers, yet do not declare any in their privacy policies, and only 22.2% of the apps explicitly name third parties. This ultimately makes it not only difficult, but in most cases impossible, for users to establish where their personal data is being processed. Furthermore, we find that 9,424 apps do not use TLS when transmitting personal identifiers, yet 28.4% of these apps claim to take measures to secure data transfer. Ultimately, these divergences between disclosures and actual app behaviors illustrate the ridiculousness of the notice and consent framework. 