skip to main content

Title: A User-Centric and Sentiment Aware Privacy-Disclosure Detection Framework based on Multi-input Neural Network
Data and information privacy is a major concern of today’s world. More specifically, users’ digital privacy has become one of the most important issues to deal with, as advancements are being made in information sharing technology. An increasing number of users are sharing information through text messages, emails, and social media without proper awareness of privacy threats and their consequences. One approach to prevent the disclosure of private information is to identify them in a conversation and warn the dispatcher before the conveyance happens between the sender and the receiver. Another way of preventing information (sensitive) loss might be to analyze and sanitize a batch of offline documents when the data is already accumulated somewhere. However, automating the process of identifying user-centric privacy disclosure in textual data is challenging. This is because the natural language has an extremely rich form and structure with different levels of ambiguities. Therefore, we inquire after a potential framework that could bring this challenge within reach by precisely recognizing users’ privacy disclosures in a piece of text by taking into account - the authorship and sentiment (tone) of the content alongside the linguistic features and techniques. The proposed framework is considered as the supporting plugin to help text classification systems more accurately identify text that might disclose the author’s personal or private information.
Authors:
;
Award ID(s):
1657774
Publication Date:
NSF-PAR ID:
10222649
Journal Name:
the PrivateNLP 2020 Workshop on Privacy and Natural Language Processing Colocated with 13th ACM International WSDM Conference, 2020, in Houston, Texas, USA.
Sponsoring Org:
National Science Foundation
More Like this
  1. The concern regarding users’ data privacy has risen to its highest level due to the massive increase in communication platforms, social networking sites, and greater users’ participation in online public discourse. An increasing number of people exchange private information via emails, text messages, and social media without being aware of the risks and implications. Researchers in the field of Natural Language Processing (NLP) have concentrated on creating tools and strategies to identify, categorize, and sanitize private information in text data since a substantial amount of data is exchanged in textual form. However, most of the detection methods solely rely onmore »the existence of pre-identified keywords in the text and disregard the inference of underlying meaning of the utterance in a specific context. Hence, in some situations these tools and algorithms fail to detect disclosure, or the produced results are miss classified. In this paper, we propose a multi-input, multi-output hybrid neural network which utilizes transfer-learning, linguistics, and metadata to learn the hidden patterns. Our goal is to better classify disclosure/non-disclosure content in terms of the context of situation. We trained and evaluated our model on a human-annotated ground truth dataset, containing a total of 5,400 tweets. The results show that the proposed model was able to identify privacy disclosure through tweets with an accuracy of 77.4% while classifying the information type of those tweets with an impressive accuracy of 99%, by jointly learning for two separate tasks.

    « less
  2. An increasing number of people are sharing information through text messages, emails, and social media without proper privacy checks. In many situations, this could lead to serious privacy threats. This paper presents a methodology for providing extra safety precautions without being intrusive to users. We have developed and evaluated a model to help users take control of their shared information by automatically identifying text (i.e., a sentence or a transcribed utterance) that might contain personal or private disclosures. We apply off-the-shelf natural language processing tools to derive linguistic features such as part-of-speech, syntactic dependencies, and entity relations. From these features,more »we model and train a multichannel convolutional neural network as a classifier to identify short texts that have personal, private disclosures. We show how our model can notify users if a piece of text discloses personal or private information, and evaluate our approach in a binary classification task with 93% accuracy on our own labeled dataset, and 86% on a dataset of ground truth. Unlike document classification tasks in the area of natural language processing, our framework is developed keeping the sentence level context into consideration.« less
  3. Abstract Smartphone location sharing is a particularly sensitive type of information disclosure that has implications for users’ digital privacy and security as well as their physical safety. To understand and predict location disclosure behavior, we developed an Android app that scraped metadata from users’ phones, asked them to grant the location-sharing permission to the app, and administered a survey. We compared the effectiveness of using self-report measures commonly used in the social sciences, behavioral data collected from users’ mobile phones, and a new type of measure that we developed, representing a hybrid of self-report and behavioral data to contextualize users’more »attitudes toward their past location-sharing behaviors. This new type of measure is based on a reflective learning paradigm where individuals reflect on past behavior to inform future behavior. Based on data from 380 Android smartphone users, we found that the best predictors of whether participants granted the location-sharing permission to our app were: behavioral intention to share information with apps, the “FYI” communication style, and one of our new hybrid measures asking users whether they were comfortable sharing location with apps currently installed on their smartphones. Our novel, hybrid construct of self-reflection on past behavior significantly improves predictive power and shows the importance of combining social science and computational science approaches for improving the prediction of users’ privacy behaviors. Further, when assessing the construct validity of the Behavioral Intention construct drawn from previous location-sharing research, our data showed a clear distinction between two different types of Behavioral Intention: self-reported intention to use mobile apps versus the intention to share information with these apps. This finding suggests that users desire the ability to use mobile apps without being required to share sensitive information, such as their location. These results have important implications for cybersecurity research and system design to meet users’ location-sharing privacy needs.« less
  4. The emergence of mobile apps (e.g., location-based services, geo-social networks, ride-sharing) led to the collection of vast amounts of trajectory data that greatly benefit the understanding of individual mobility. One problem of particular interest is next-location prediction, which facilitates location-based advertising, point-of-interest recommendation, traffic optimization,etc. However, using individual trajectories to build prediction models introduces serious privacy concerns, since exact whereabouts of users can disclose sensitive information such as their health status or lifestyle choices. Several research efforts focused on privacy-preserving next-location prediction, but they have serious limitations: some use outdated privacy models (e.g., k-anonymity), while others employ learning models withmore »limited expressivity (e.g., matrix factorization). More recent approaches(e.g., DP-SGD) integrate the powerful differential privacy model with neural networks, but they provide only generic and difficult-to-tune methods that do not perform well on location data, which is inherently skewed and sparse.We propose a technique that builds upon DP-SGD, but adapts it for the requirements of next-location prediction. We focus on user-level privacy, a strong privacy guarantee that protects users regardless of how much data they contribute. Central to our approach is the use of the skip-gram model, and its negative sampling technique. Our work is the first to propose differentially-private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to accelerate learning and improve model accuracy. Experiments conducted on real datasets demonstrate that our approach significantly boosts prediction accuracy compared to existing DP-SGD techniques.« less
  5. The emergence of mobile apps (e.g., location-based services,geo-social networks, ride-sharing) led to the collection of vast amounts of trajectory data that greatly benefit the understanding of individual mobility. One problem of particular interest is next-location prediction, which facilitates location-based advertising, point-of-interest recommendation, traffic optimization,etc. However, using individual trajectories to build prediction models introduces serious privacy concerns, since exact whereabouts of users can disclose sensitive information such as their health status or lifestyle choices. Several research efforts focused on privacy-preserving next-location prediction, but they have serious limitations: some use outdated privacy models (e.g., k-anonymity), while others employ learning models with limitedmore »expressivity (e.g., matrix factorization). More recent approaches(e.g., DP-SGD) integrate the powerful differential privacy model with neural networks, but they provide only generic and difficult-to-tune methods that do not perform well on location data, which is inherently skewed and sparse.We propose a technique that builds upon DP-SGD, but adapts it for the requirements of next-location prediction. We focus on user-level privacy, a strong privacy guarantee that protects users regardless of how much data they contribute. Central toour approach is the use of the skip-gram model, and its negative sampling technique. Our work is the first to propose differentially-private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to acceleratelearning and improve model accuracy. Experiments conducted on real datasets demonstrate that our approach significantly boosts prediction accuracy compared to existing DP-SGD techniques.« less