skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, February 13 until 2:00 AM ET on Friday, February 14 due to maintenance. We apologize for the inconvenience.


Title: TransRisk: Mobility Privacy Risk Prediction based on Transferred Knowledge
Human mobility data may lead to privacy concerns because a resident can be re-identified from these data by malicious attacks even with anonymized user IDs. For an urban service collecting mobility data, an efficient privacy risk assessment is essential for the privacy protection of its users. The existing methods enable efficient privacy risk assessments for service operators to fast adjust the quality of sensing data to lower privacy risk by using prediction models. However, for these prediction models, most of them require massive training data, which has to be collected and stored first. Such a large-scale long-term training data collection contradicts the purpose of privacy risk prediction for new urban services, which is to ensure that the quality of high-risk human mobility data is adjusted to low privacy risk within a short time. To solve this problem, we present a privacy risk prediction model based on transfer learning, i.e., TransRisk, to predict the privacy risk for a new target urban service through (1) small-scale short-term data of its own, and (2) the knowledge learned from data from other existing urban services. We envision the application of TransRisk on the traffic camera surveillance system and evaluate it with real-world mobility datasets already collected in a Chinese city, Shenzhen, including four source datasets, i.e., (i) one call detail record dataset (CDR) with 1.2 million users; (ii) one cellphone connection data dataset (CONN) with 1.2 million users; (iii) a vehicular GPS dataset (Vehicles) with 10 thousand vehicles; (iv) an electronic toll collection transaction dataset (ETC) with 156 thousand users, and a target dataset, i.e., a camera dataset (Camera) with 248 cameras. The results show that our model outperforms the state-of-the-art methods in terms of RMSE and MAE. Our work also provides valuable insights and implications on mobility data privacy risk assessment for both current and future large-scale services.  more » « less
Award ID(s):
1849238 2047822 1951890 1952096 2003874 1932223 2002985
PAR ID:
10436081
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
6
Issue:
2
ISSN:
2474-9567
Page Range / eLocation ID:
1 to 19
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Urban anomalies have a large impact on passengers' travel behavior and city infrastructures, which can cause uncertainty on travel time estimation. Understanding the impact of urban anomalies on travel time is of great value for various applications such as urban planning, human mobility studies and navigation systems. Most existing studies on travel time have been focused on the total riding time between two locations on an individual transportation modality. However, passengers often take different modes of transportation, e.g., taxis, subways, buses or private vehicles, and a significant portion of the travel time is spent in the uncertain waiting. In this paper, we study the fine-grained travel time patterns in multiple transportation systems under the impact of urban anomalies. Specifically, (i) we investigate implicit components, including waiting and riding time, in multiple transportation systems; (ii) we measure the impact of real-world anomalies on travel time components; (iii) we design a learning-based model for travel time component prediction with anomalies. Different from existing studies, we implement and evaluate our measurement framework on multiple data sources including four city-scale transportation systems, which are (i) a 14-thousand taxicab network, (ii) a 13-thousand bus network, (iii) a 10-thousand private vehicle network, and (iv) an automatic fare collection system for a public transit network (i.e., subway and bus) with 5 million smart cards. 
    more » « less
  2. Recently, the ubiquity of mobile devices leads to an increasing demand of public network services, e.g., WiFi hot spots. As a part of this trend, modern transportation systems are equipped with public WiFi devices to provide Internet access for passengers as people spend a large amount of time on public transportation in their daily life. However, one of the key issues in public WiFi spots is the privacy concern due to its open access nature. Existing works either studied location privacy risk in human traces or privacy leakage in private networks such as cellular networks based on the data from cellular carriers. To the best of our knowledge, none of these work has been focused on bus WiFi privacy based on large-scale real-world data. In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions how likely bus WiFi users can be uniquely re-identified if partial usage information is leaked and how we can protect users from the leaked information. To understand the above questions, we conduct a case study in a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users during a two-month period. Technically, we design two models for our uniqueness analyses and protection, i.e., a PB-FIND model to identify the probability a user can be uniquely re-identified from leaked information; a PB-HIDE model to protect users from potentially leaked information. Specifically, we systematically measure the user uniqueness on users' finger traces (i.e., connection URL and domain), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces). Our measurement results reveal (i) 97.8% users can be uniquely re-identified by 4 random domain records of their finger traces and 96.2% users can be uniquely re-identified by 5 random locations on buses; (ii) 98.1% users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show our PB-HIDE algorithm protects more than 95% users from the potentially leaked information by inserting only 1.5% synthetic records in the original dataset to preserve their data utility. 
    more » « less
  3. With the trend of vehicles becoming increasingly connected and potentially autonomous, vehicles are being equipped with rich sensing and communication devices. Various vehicular services based on shared real-time sensor data of vehicles from a fleet have been proposed to improve the urban efficiency, e.g., HD-live map, and traffic accident recovery. However, due to the high cost of data uploading (e.g., monthly fees for a cellular network), it would be impractical to make all well-equipped vehicles to upload real-time sensor data constantly. To better utilize these limited uploading resources and achieve an optimal road segment sensing coverage, we present a real-time sensing task scheduling framework, i.e., RISC, for Resource-Constraint modeling for urban sensing by scheduling sensing tasks of commercial vehicles with sensors based on the predictability of vehicles' mobility patterns. In particular, we utilize the commercial vehicles, including taxicabs, buses, and logistics trucks as mobile sensors to sense urban phenomena, e.g., traffic, by using the equipped vehicular sensors, e.g., dash-cam, lidar, automotive radar, etc. We implement RISC on a Chinese city Shenzhen with one-month real-world data from (i) a taxi fleet with 14 thousand vehicles; (ii) a bus fleet with 13 thousand vehicles; (iii) a truck fleet with 4 thousand vehicles. Further, we design an application, i.e., track suspect vehicles (e.g., hit-and-run vehicles), to evaluate the performance of RISC on the urban sensing aspect based on the data from a regular vehicle (i.e., personal car) fleet with 11 thousand vehicles. The evaluation results show that compared to the state-of-the-art solutions, we improved sensing coverage (i.e., the number of road segments covered by sensing vehicles) by 10% on average. 
    more » « less
  4. One of the most popular location privacy-preserving mechanisms applied in location-based services (LBS) is location obfuscation, where mobile users are allowed to report obfuscated locations instead of their real locations to services. Many existing obfuscation approaches consider mobile users that can move freely over a region. However, this is inadequate for protecting the location privacy of vehicles, as their mobility is restricted by external factors, such as road networks and traffic flows. This auxiliary information about external factors helps an attacker to shrink the search range of vehicles' locations, increasing the risk of location exposure. In this paper, we propose a vehicle traffic flow aware attack that leverages public traffic flow information to recover a vehicle's real location from obfuscated location. As a countermeasure, we then develop an adaptive strategy to obfuscate a vehicle's location by a "fake" trajectory that follows a realistic traffic flow. The fake trajectory is designed to not only hide the vehicle's real location but also guarantee the quality of service (QoS) of LBS. Our experimental results demonstrate that 1) the new threat model can accurately track vehicles' real locations, which have been obfuscated by two state-of-the-art algorithms, and 2) the proposed obfuscation method can effectively protect vehicles' location privacy under the new threat model without compromising QoS. 
    more » « less
  5. Mobile fitness tracking apps allow users to track their workouts and share them with friends through online social networks. Although the sharing of personal data is an inherent risk in all social networks, the dangers presented by sharing personal workouts comprised of geospatial and health data may prove especially grave. While fitness apps offer a variety of privacy features, at present it is unclear if these countermeasures are sufficient to thwart a determined attacker, nor is it clear how many of these services’ users are at risk. In this work, we perform a systematic analysis of privacy behaviors and threats in fitness tracking social networks. Collecting a month-long snapshot of public posts of a popular fitness tracking service (21 million posts, 3 million users), we observe that 16.5% of users make use of Endpoint Privacy Zones (EPZs), which conceal fitness activity near user-designated sensitive locations (e.g., home, office). We go on to develop an attack against EPZs that infers users’ protected locations from the remaining available information in public posts, discovering that 95.1% of moderately active users are at risk of having their protected locations extracted by an attacker. Finally, we consider the efficacy of state-of-the-art privacy mechanisms through adapting geo-indistinguishability techniques as well as developing a novel EPZ fuzzing technique. The affected companies have been notified of the discovered vulnerabilities and at the time of publication have incorporated our proposed countermeasures into their production systems. 
    more » « less