skip to main content


Title: Privacy-preserving Travel Time Prediction with Uncertainty Using GPS Trace Data
The rapid growth of GPS technology and mobile devices has led to a massive accumulation of location data, bringing considerable benefits to individuals and society. One of the major usages of such data is travel time prediction, a typical service provided by GPS navigation devices and apps. Meanwhile, the constant collection and analysis of the individual location data also pose unprecedented privacy threats. We leverage the notion of geo-indistinguishability, an extension of differential privacy to the location privacy setting, and propose a procedure for privacy-preserving travel time prediction without collecting actual individual GPS trace data. We propose new concepts to examine the impact of the geo-indistinguishability sanitization on the usefulness of GPS traces and provide analytical and experimental utility analysis for privacy-preserving travel time prediction. We also propose new metrics to measure the adversary error in learning individual GPS traces from the collected sanitized data. Our experiment results suggest that the proposed procedure provides travel time analysis with satisfactory accuracy at reasonably small privacy costs.  more » « less
Award ID(s):
1717417
NSF-PAR ID:
10311800
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Mobile Computing
ISSN:
1536-1233
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The emergence of mobile apps (e.g., location-based services, geo-social networks, ride-sharing) led to the collection of vast amounts of trajectory data that greatly benefit the understanding of individual mobility. One problem of particular interest is next-location prediction, which facilitates location-based advertising, point-of-interest recommendation, traffic optimization,etc. However, using individual trajectories to build prediction models introduces serious privacy concerns, since exact whereabouts of users can disclose sensitive information such as their health status or lifestyle choices. Several research efforts focused on privacy-preserving next-location prediction, but they have serious limitations: some use outdated privacy models (e.g., k-anonymity), while others employ learning models with limited expressivity (e.g., matrix factorization). More recent approaches(e.g., DP-SGD) integrate the powerful differential privacy model with neural networks, but they provide only generic and difficult-to-tune methods that do not perform well on location data, which is inherently skewed and sparse.We propose a technique that builds upon DP-SGD, but adapts it for the requirements of next-location prediction. We focus on user-level privacy, a strong privacy guarantee that protects users regardless of how much data they contribute. Central to our approach is the use of the skip-gram model, and its negative sampling technique. Our work is the first to propose differentially-private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to accelerate learning and improve model accuracy. Experiments conducted on real datasets demonstrate that our approach significantly boosts prediction accuracy compared to existing DP-SGD techniques. 
    more » « less
  2. The emergence of mobile apps (e.g., location-based services,geo-social networks, ride-sharing) led to the collection of vast amounts of trajectory data that greatly benefit the understanding of individual mobility. One problem of particular interest is next-location prediction, which facilitates location-based advertising, point-of-interest recommendation, traffic optimization,etc. However, using individual trajectories to build prediction models introduces serious privacy concerns, since exact whereabouts of users can disclose sensitive information such as their health status or lifestyle choices. Several research efforts focused on privacy-preserving next-location prediction, but they have serious limitations: some use outdated privacy models (e.g., k-anonymity), while others employ learning models with limited expressivity (e.g., matrix factorization). More recent approaches(e.g., DP-SGD) integrate the powerful differential privacy model with neural networks, but they provide only generic and difficult-to-tune methods that do not perform well on location data, which is inherently skewed and sparse.We propose a technique that builds upon DP-SGD, but adapts it for the requirements of next-location prediction. We focus on user-level privacy, a strong privacy guarantee that protects users regardless of how much data they contribute. Central toour approach is the use of the skip-gram model, and its negative sampling technique. Our work is the first to propose differentially-private learning with skip-grams. In addition, we devise data grouping techniques within the skip-gram framework that pool together trajectories from multiple users in order to acceleratelearning and improve model accuracy. Experiments conducted on real datasets demonstrate that our approach significantly boosts prediction accuracy compared to existing DP-SGD techniques. 
    more » « less
  3. Summary

    The ubiquitous use of location‐based services (LBS) through smart devices produces massive amounts of location data. An attacker, with an access to such data, can reveal sensitive information about users. In this paper, we study location inference attacks based on the probability distribution of historical location data, travel time information between locations using knowledge of a map, and short and long‐term observation of privacy‐preserving queries. We show that existing privacy‐preserving approaches are vulnerable to such attacks. In this context, we propose a novel location privacy‐preserving approach, called KLAP, based on the three fundamental obfuscation requirements: minimumk‐locations,l‐diversity, and privacyareapreservation. KLAP adopts a personalized privacy preference for sporadic, frequent, and continuous LBS use cases. Specifically, it generates a secure concealing region (CR) to obfuscate the user's location and directs that CR to the service provider. The main contribution of this work is twofold. First, a CR pruning technique is devised to establish a balance between privacy and delay in LBS usage. Second, a new attack model called a long‐term obfuscated location tracking attack, and its countermeasure is proposed and evaluated both theoretically and empirically. We assess KLAP with two real‐world datasets. Experimental results show that it can achieve better privacy, reduced delay, and lower communication costs than existing state‐of‐the‐art methods.

     
    more » « less
  4. null (Ed.)
    Abstract Background Personal privacy is a significant concern in the era of big data. In the field of health geography, personal health data are collected with geographic location information which may increase disclosure risk and threaten personal geoprivacy. Geomasking is used to protect individuals’ geoprivacy by masking the geographic location information, and spatial k-anonymity is widely used to measure the disclosure risk after geomasking is applied. With the emergence of individual GPS trajectory datasets that contains large volumes of confidential geospatial information, disclosure risk can no longer be comprehensively assessed by the spatial k-anonymity method. Methods This study proposes and develops daily activity locations (DAL) k-anonymity as a new method for evaluating the disclosure risk of GPS data. Instead of calculating disclosure risk based on only one geographic location (e.g., home) of an individual, the new DAL k-anonymity is a composite evaluation of disclosure risk based on all activity locations of an individual and the time he/she spends at each location abstracted from GPS datasets. With a simulated individual GPS dataset, we present case studies of applying DAL k-anonymity in various scenarios to investigate its performance. The results of applying DAL k-anonymity are also compared with those obtained with spatial k-anonymity under these scenarios. Results The results of this study indicate that DAL k-anonymity provides a better estimation of the disclosure risk than does spatial k-anonymity. In various case-study scenarios of individual GPS data, DAL k-anonymity provides a more effective method for evaluating the disclosure risk by considering the probability of re-identifying an individual’s home and all the other daily activity locations. Conclusions This new method provides a quantitative means for understanding the disclosure risk of sharing or publishing GPS data. It also helps shed new light on the development of new geomasking methods for GPS datasets. Ultimately, the findings of this study will help to protect individual geoprivacy while benefiting the research community by promoting and facilitating geospatial data sharing. 
    more » « less
  5. Abstract Objective

    Emerging technologies (eg, wearable devices) have made it possible to collect data directly from individuals (eg, time-series), providing new insights on the health and well-being of individual patients. Broadening the access to these data would facilitate the integration with existing data sources (eg, clinical and genomic data) and advance medical research. Compared to traditional health data, these data are collected directly from individuals, are highly unique and provide fine-grained information, posing new privacy challenges. In this work, we study the applicability of a novel privacy model to enable individual-level time-series data sharing while maintaining the usability for data analytics.

    Methods and materials

    We propose a privacy-protecting method for sharing individual-level electrocardiography (ECG) time-series data, which leverages dimensional reduction technique and random sampling to achieve provable privacy protection. We show that our solution provides strong privacy protection against an informed adversarial model while enabling useful aggregate-level analysis.

    Results

    We conduct our evaluations on 2 real-world ECG datasets. Our empirical results show that the privacy risk is significantly reduced after sanitization while the data usability is retained for a variety of clinical tasks (eg, predictive modeling and clustering).

    Discussion

    Our study investigates the privacy risk in sharing individual-level ECG time-series data. We demonstrate that individual-level data can be highly unique, requiring new privacy solutions to protect data contributors.

    Conclusion

    The results suggest our proposed privacy-protection method provides strong privacy protections while preserving the usefulness of the data.

     
    more » « less