Title: Pairing Users in Social Media via Processing Meta-data from Conversational Files
Massive amounts of data are generated today by users engaging on social media. Despite knowing that whatever they post on social media can be viewed, downloaded and analyzed by unauthorized entities, a large number of people are still willing to compromise their privacy. This trend may change, however. Improved awareness of protecting content on social media, coupled with governments creating and enforcing data protection laws, means that in the near future users may become increasingly protective of what they share. Furthermore, new laws could limit what data social media companies can use without explicit consent from users. In this paper, we present and address a relatively new problem in privacy-preserved mining of social media logs. Specifically, the problem is the feasibility of deriving the topology of network communications (i.e., matching senders and receivers in a social network) using only the meta-data of conversational files shared by users, after anonymizing all identities and content. More explicitly, if users are willing to share only (a) whether a message was sent or received, (b) the temporal ordering of messages and (c) the length of each message (after anonymizing everything else, including usernames, from their social media logs), how can the underlying topology of sender-receiver patterns be generated? To address this problem, we present a Dynamic Time Warping based solution that models the meta-data as a time series sequence. We present a formal algorithm and interesting results in multiple scenarios wherein users may or may not delete content arbitrarily before sharing. Our performance results are very favorable when applied in the context of Twitter. Towards the end of the paper, we also present interesting practical applications of our problem and solutions.
To the best of our knowledge, the problem we address and the solution we propose are unique, and could provide important future perspectives on learning from privacy-preserving mining of social media logs.
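The abstract does not give the formal algorithm, so the following is only a minimal sketch of how Dynamic Time Warping could be used to pair users from the three shared fields (direction, order, length). Everything in it is an assumption made for illustration: the function names `dtw_distance`, `to_series` and `best_match`, the `"S"`/`"R"` direction codes, and the choice to encode each log as a series of signed message lengths. It should not be read as the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

Log = List[Tuple[str, int]]  # hypothetical format: (direction "S" or "R", message length), in time order


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Textbook dynamic time warping distance with absolute-difference cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def to_series(log: Log) -> List[float]:
    """Assumed encoding: sent messages as +length, received messages as -length."""
    return [float(length) if direction == "S" else -float(length) for direction, length in log]


def best_match(target: Log, candidates: Dict[str, Log]) -> Tuple[str, Dict[str, float]]:
    """Score each candidate by DTW against the target; a true partner's log
    should look like the target's log with sent and received swapped."""
    target_series = to_series(target)
    scores = {
        name: dtw_distance(target_series, [-x for x in to_series(log)])
        for name, log in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Toy anonymized logs (made-up values, not data from the paper).
user_a = [("S", 12), ("R", 40), ("S", 7)]
candidates = {
    "B": [("R", 12), ("S", 40), ("R", 7)],   # mirrors user_a exactly
    "C": [("S", 30), ("R", 5), ("R", 22)],   # unrelated conversation
}
partner, scores = best_match(user_a, candidates)
print(partner, scores)
```

Because DTW allows one sequence to stretch against the other, a log with some messages deleted can still align with its counterpart at a nonzero cost, which is one plausible reason a warping-based distance suits the deletion scenarios the abstract mentions.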
Award ID(s):
1718071
PAR ID:
10179453
Author(s) / Creator(s):
Date Published:
Journal Name:
Big-Data Analytics (BDA)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The volume, variety, and velocity of different data, e.g., simulation data, observation data, and social media data, are growing ever faster, posing grand challenges for data discovery. An increasing trend in data discovery is to mine hidden relationships among users and metadata from web usage logs to support the data discovery process. Web usage log mining is the process of reconstructing sessions from raw logs and finding interesting patterns or implicit linkages. The mining results play an important role in improving the quality of search-related components, e.g., ranking, query suggestion, and recommendation. Although research has been done in the data discovery domain, collecting and analyzing logs efficiently remains a challenge because (1) the volume of web usage logs continues to grow as long as users access the data; (2) the dynamic volume of logs requires on-demand computing resources for mining tasks; and (3) the mining process is compute-intensive and time-intensive. To speed up the mining process, we propose a cloud-based log-mining framework using Apache Spark and Elasticsearch. In addition, a data partition paradigm, logPartitioner, is designed to solve the data imbalance problem in data parallelism. As a proof of concept, oceanographic data search and access logs are chosen to validate the performance of the proposed parallel log-mining framework.
  2. Despite recent widespread deployment of differential privacy, relatively little is known about what users think of differential privacy. In this work, we seek to explore users' privacy expectations related to differential privacy. Specifically, we investigate (1) whether users care about the protections afforded by differential privacy, and (2) whether they are therefore more willing to share their data with differentially private systems. Further, we attempt to understand (3) users' privacy expectations of the differentially private systems they may encounter in practice and (4) their willingness to share data in such systems. To answer these questions, we use a series of rigorously conducted surveys (n=2424). We find that users care about the kinds of information leaks against which differential privacy protects and are more willing to share their private information when the risks of these leaks are less likely to happen. Additionally, we find that the ways in which differential privacy is described in-the-wild haphazardly set users' privacy expectations, which can be misleading depending on the deployment. We synthesize our results into a framework for understanding a user's willingness to share information with differentially private systems, which takes into account the interaction between the user's prior privacy concerns and how differential privacy is described.
  3. Interdependent privacy (IDP) violations occur when users share personal information about others without permission, resulting in potential embarrassment, reputation loss, or harassment. There are several strategies that can be applied to protect IDP, but little is known regarding how social media users perceive IDP threats or how they prefer to respond to them. We utilized a mixed-method approach with a replication study to examine user beliefs about various government-, platform-, and user-level strategies for managing IDP violations. Participants reported that IDP represented a 'serious' online threat, and identified themselves as primarily responsible for responding to violations. IDP strategies that felt more familiar and provided greater perceived control over violations (e.g., flagging, blocking, unfriending) were rated as more effective than platform or government driven interventions. Furthermore, we found users were more willing to share on social media if they perceived their interactions as protected. Findings are discussed in relation to control paradox theory. 
  4. ‘Interdependent’ privacy violations occur when users share private photos and information about other people in social media without permission. This research investigated user characteristics associated with interdependent privacy perceptions, by asking social media users to rate photo-based memes depicting strangers on the degree to which they were too private to share. Users also completed questionnaires measuring social media usage and personality. Separate groups rated the memes on shareability, valence, and entertainment value. Users were less likely to share memes that were rated as private, except when the meme was entertaining or when users exhibited dark triad characteristics. Users with dark triad characteristics demonstrated a heightened awareness of interdependent privacy and increased sharing of others’ photos. A model is introduced that highlights user types and characteristics that correspond to different privacy preferences: privacy preservers, ignorers, and violators. We discuss how interventions to support interdependent privacy must effectively influence diverse users. 
  5. Understanding human mobility has become an important aspect of location-based services in tasks such as personalized recommendation and individual moving pattern recognition, enabled by the large volumes of data from geo-tagged social media (GTSM). Prior studies mainly focus on analyzing human historical footprints collected by GTSM and assume the veracity of the data, which need not hold when some users are not willing to share their real footprints due to privacy concerns, thereby affecting reliability/authenticity. In this study, we address the problem of Inferring Real Mobility (IRMo) of users from their unreliable historical traces. Tackling IRMo is a non-trivial task due to: (1) sparsity of check-in data; (2) suspicious counterfeit check-in behaviors; and (3) unobserved dependencies in human trajectories. To address these issues, we develop a novel Graph-enhanced Attention model called IRMoGA, which attempts to capture underlying mobility patterns and check-in correlations by exploiting the unreliable spatio-temporal data. Specifically, we incorporate the attention mechanism (rather than solely relying on traditional recursive models) to understand the regularity of human mobility, while employing a graph neural network to understand the mutual interactions from human historical check-ins and leveraging prior knowledge to alleviate the inferring bias. Our experiments conducted on four real-world datasets demonstrate the superior performance of IRMoGA over several state-of-the-art baselines, e.g., up to 39.16% improvement regarding the Recall score on Foursquare.