skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: A User-Centric and Sentiment Aware Privacy-Disclosure Detection Framework based on Multi-input Neural Network
Data and information privacy is a major concern of today’s world. More specifically, users’ digital privacy has become one of the most important issues to deal with, as advancements are being made in information sharing technology. An increasing number of users are sharing information through text messages, emails, and social media without proper awareness of privacy threats and their consequences. One approach to prevent the disclosure of private information is to identify them in a conversation and warn the dispatcher before the conveyance happens between the sender and the receiver. Another way of preventing information (sensitive) loss might be to analyze and sanitize a batch of offline documents when the data is already accumulated somewhere. However, automating the process of identifying user-centric privacy disclosure in textual data is challenging. This is because the natural language has an extremely rich form and structure with different levels of ambiguities. Therefore, we inquire after a potential framework that could bring this challenge within reach by precisely recognizing users’ privacy disclosures in a piece of text by taking into account - the authorship and sentiment (tone) of the content alongside the linguistic features and techniques. The proposed framework is considered as the supporting plugin to help text classification systems more accurately identify text that might disclose the author’s personal or private information.  more » « less
Award ID(s):
1657774
PAR ID:
10222649
Author(s) / Creator(s):
;
Date Published:
Journal Name:
the PrivateNLP 2020 Workshop on Privacy and Natural Language Processing Colocated with 13th ACM International WSDM Conference, 2020, in Houston, Texas, USA.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The concern regarding users’ data privacy has risen to its highest level due to the massive increase in communication platforms, social networking sites, and greater users’ participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Researchers in the field of Natural Language Processing (NLP) have concentrated on creating tools and strategies to identify, categorize, and sanitize private information in text data since a substantial amount of data is exchanged in textual form. However, most of the detection methods solely rely on the existence of pre-identified keywords in the text and disregard the inference of underlying meaning of the utterance in a specific context. Hence, in some situations these tools and algorithms fail to detect disclosure, or the produced results are miss classified. In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer-learning, linguistics, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content in terms of the context of situation. We trained and evaluated our model on a human-annotated ground truth dataset, containing a total of 5,400 tweets. The results show that the proposed model was able to identify privacy disclosure through tweets with an accuracy of 77.4% while classifying the information type of those tweets with an impressive accuracy of 99%, by jointly learning for two separate tasks. 
    more » « less
  2. null (Ed.)
    An increasing number of people are sharing information through text messages, emails, and social media without proper privacy checks. In many situations, this could lead to serious privacy threats. This paper presents a methodology for providing extra safety precautions without being intrusive to users. We have developed and evaluated a model to help users take control of their shared information by automatically identifying text (i.e., a sentence or a transcribed utterance) that might contain personal or private disclosures. We apply off-the-shelf natural language processing tools to derive linguistic features such as part-of-speech, syntactic dependencies, and entity relations. From these features, we model and train a multichannel convolutional neural network as a classifier to identify short texts that have personal, private disclosures. We show how our model can notify users if a piece of text discloses personal or private information, and evaluate our approach in a binary classification task with 93% accuracy on our own labeled dataset, and 86% on a dataset of ground truth. Unlike document classification tasks in the area of natural language processing, our framework is developed keeping the sentence level context into consideration. 
    more » « less
  3. Voluntary sharing of personal information is at the heart of user engagement on social media and central to platforms' business models. From the users' perspective, so-called self-disclosure is closely connected with both privacy risks and social rewards. Prior work has studied contextual influences on self-disclosure, from platform affordances and interface design to user demographics and perceived social capital. Our work takes a mixed-methods approach to understand the contextual information which might be integrated in the development of privacy-enhancing technologies. Through observational study of several Reddit communities, we explore the ways in which topic of discussion, group norms, peer effects, and audience size are correlated with personal information sharing. We then build and test a prototype privacy-enhancing tool that exposes these contextual factors. Our work culminates in a browser extension that automatically detects instances of self-disclosure in Reddit posts at the time of posting and provides additional context to users before they post to support enhanced privacy decision-making. We share this prototype with social media users, solicit their feedback, and outline a path forward for privacy-enhancing technologies in this space. 
    more » « less
  4. null (Ed.)
    Abstract Smartphone location sharing is a particularly sensitive type of information disclosure that has implications for users’ digital privacy and security as well as their physical safety. To understand and predict location disclosure behavior, we developed an Android app that scraped metadata from users’ phones, asked them to grant the location-sharing permission to the app, and administered a survey. We compared the effectiveness of using self-report measures commonly used in the social sciences, behavioral data collected from users’ mobile phones, and a new type of measure that we developed, representing a hybrid of self-report and behavioral data to contextualize users’ attitudes toward their past location-sharing behaviors. This new type of measure is based on a reflective learning paradigm where individuals reflect on past behavior to inform future behavior. Based on data from 380 Android smartphone users, we found that the best predictors of whether participants granted the location-sharing permission to our app were: behavioral intention to share information with apps, the “FYI” communication style, and one of our new hybrid measures asking users whether they were comfortable sharing location with apps currently installed on their smartphones. Our novel, hybrid construct of self-reflection on past behavior significantly improves predictive power and shows the importance of combining social science and computational science approaches for improving the prediction of users’ privacy behaviors. Further, when assessing the construct validity of the Behavioral Intention construct drawn from previous location-sharing research, our data showed a clear distinction between two different types of Behavioral Intention: self-reported intention to use mobile apps versus the intention to share information with these apps. This finding suggests that users desire the ability to use mobile apps without being required to share sensitive information, such as their location. These results have important implications for cybersecurity research and system design to meet users’ location-sharing privacy needs. 
    more » « less
  5. null (Ed.)
    With the growth of Internet in many different aspects of life, users are required to share private information more than ever. Hence, users need a privacy management tool that can enforce complex and customized privacy policies. In this paper, we propose a privacy management system that not only allows users to define complex privacy policies for data sharing actions, but also monitors users' behavior and relationships to generate realistic policies. In addition, the proposed system utilizes formal modeling and model-checking approach to prove that information disclosures are valid and privacy policies are consistent with one another 
    more » « less