skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Fang, Zhihan"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Understanding human mobility is of great significance for sustainable transportation planning. Long-term travel delay change is a key metric to measure human mobility evolution in cities. However, it is challenging to quantify the long-term travel delay because it happens in different modalities, e.g., subway, taxi, bus, and personal cars, with implicated coupling. More importantly, the data for long-term multi-modal delay modeling is challenging to obtain in practice. As a result, the existing travel delay measurements mainly focus on either single-modal system or short-term mobility patterns, which cannot reveal the long-term travel dynamics and the impact among multi-modal systems. In this paper, we perform a travel delay measurement study to quantify and understand long-term multi-modal travel delay. Our measurement study utilizes a 5-year dataset of 8 million residents from 2013 to 2017 including a subway system with 3 million daily passengers, a 15 thousand taxi system, a 10 thousand personal car system, and a 13 thousand bus system in the Chinese city Shenzhen. We share new observations as follows: (1) the aboveground system has a higher delay increase overall than that of the underground system but the increase of it is slow down; (2) the underground system infrastructure upgrades decreases the aboveground system travel delay increase in contrast to the increase the underground system travel delay caused by the aboveground system infrastructure upgrades; (3) the travel delays of the underground system decreases in the higher population region and during the peak hours. 
    more » « less
  2. Human mobility data may lead to privacy concerns because a resident can be re-identified from these data by malicious attacks even with anonymized user IDs. For an urban service collecting mobility data, an efficient privacy risk assessment is essential for the privacy protection of its users. The existing methods enable efficient privacy risk assessments for service operators to fast adjust the quality of sensing data to lower privacy risk by using prediction models. However, for these prediction models, most of them require massive training data, which has to be collected and stored first. Such a large-scale long-term training data collection contradicts the purpose of privacy risk prediction for new urban services, which is to ensure that the quality of high-risk human mobility data is adjusted to low privacy risk within a short time. To solve this problem, we present a privacy risk prediction model based on transfer learning, i.e., TransRisk, to predict the privacy risk for a new target urban service through (1) small-scale short-term data of its own, and (2) the knowledge learned from data from other existing urban services. We envision the application of TransRisk on the traffic camera surveillance system and evaluate it with real-world mobility datasets already collected in a Chinese city, Shenzhen, including four source datasets, i.e., (i) one call detail record dataset (CDR) with 1.2 million users; (ii) one cellphone connection data dataset (CONN) with 1.2 million users; (iii) a vehicular GPS dataset (Vehicles) with 10 thousand vehicles; (iv) an electronic toll collection transaction dataset (ETC) with 156 thousand users, and a target dataset, i.e., a camera dataset (Camera) with 248 cameras. The results show that our model outperforms the state-of-the-art methods in terms of RMSE and MAE. Our work also provides valuable insights and implications on mobility data privacy risk assessment for both current and future large-scale services. 
    more » « less
  3. Data from the cellular network have been proved as one of the most promising way to understand large-scale human mobility for various ubiquitous computing applications due to the high penetration of cellphones and low collection cost. Existing mobility models driven by cellular network data suffer from sparse spatial-temporal observations because user locations are recorded with cellphone activities, e.g., calls, text, or internet access. In this paper, we design a human mobility recovery system called CellSense to take the sparse cellular billing data (CBR) as input and outputs dense continuous records to recover the sensing gap when using cellular networks as sensing systems to sense the human mobility. There is limited work on this kind of recovery systems at large scale because even though it is straightforward to design a recovery system based on regression models, it is very challenging to evaluate these models at large scale due to the lack of the ground truth data. In this paper, we explore a new opportunity based on the upgrade of cellular infrastructures to obtain cellular network signaling data as the ground truth data, which log the interaction between cellphones and cellular towers at signal levels (e.g., attaching, detaching, paging) even without billable activities. Based on the signaling data, we design a system CellSense for human mobility recovery by integrating collective mobility patterns with individual mobility modeling, which achieves the 35.3% improvement over the state-of-the-art models. The key application of our recovery model is to take regular sparse CBR data that a researcher already has, and to recover the missing data due to sensing gaps of CBR data to produce a dense cellular data for them to train a machine learning model for their use cases, e.g., next location prediction. 
    more » « less
  4. Accurate and up-to-date digital road maps are the foundation of many mobile applications, such as navigation and autonomous driving. A manually-created map suffers from the high cost for creation and maintenance due to constant road network updating. Recently, the ubiquity of GPS devices in vehicular systems has led to an unprecedented amount of vehicle sensing data for map inference. Unfortunately, accurate map inference based on vehicle GPS is challenging for two reasons. First, it is challenging to infer complete road structures due to the sensing deviation, sparse coverage, and low sampling rate of GPS of a fleet of vehicles with similar mobility patterns, e.g., taxis. Second, a road map requires various road properties such as road categories, which is challenging to be inferred by just GPS locations of vehicles. In this paper, we design a map inference system called coMap by considering multiple fleets of vehicles with Complementary Mobility Features. coMap has two key components: a graph-based map sketching component, a learning-based map painting component. We implement coMap with the data from four type-aware vehicular sensing systems in one city, which consists of 18 thousand taxis, 10 thousand private vehicles, 6 thousand trucks, and 14 thousand buses. We conduct a comprehensive evaluation of coMap with two state-of-the-art baselines along with ground truth based on OpenStreetMap and a commercial map provider, i.e., Baidu Maps. The results show that (i) for the map sketching, our work improves the performance by 15.9%; (ii) for the map painting, our work achieves 74.58% of average accuracy on road category classification. 
    more » « less
  5. We are witnessing a rapid growth of electrified vehicles due to the ever-increasing concerns on urban air quality and energy security. Compared to other types of electric vehicles, electric buses have not yet been prevailingly adopted worldwide due to their high owning and operating costs, long charging time, and the uneven spatial distribution of charging facilities. Moreover, the highly dynamic environment factors such as unpredictable traffic congestion, different passenger demands, and even the changing weather can significantly affect electric bus charging efficiency and potentially hinder the further promotion of large-scale electric bus fleets. To address these issues, in this article, we first analyze a real-world dataset including massive data from 16,359 electric buses, 1,400 bus lines, and 5,562 bus stops. Then, we investigate the electric bus network to understand its operating and charging patterns, and further verify the necessity and feasibility of a real-time charging scheduling. With such understanding, we design busCharging , a pricing-aware real-time charging scheduling system based on Markov Decision Process to reduce the overall charging and operating costs for city-scale electric bus fleets, taking the time-variant electricity pricing into account. To show the effectiveness of busCharging , we implement it with the real-world data from Shenzhen, which includes GPS data of electric buses, the metadata of all bus lines and bus stops, combined with data of 376 charging stations for electric buses. The evaluation results show that busCharging dramatically reduces the charging cost by 23.7% and 12.8% of electricity usage simultaneously. Finally, we design a scheduling-based charging station expansion strategy to verify our busCharging is also effective during the charging station expansion process. 
    more » « less
  6. Human mobility modeling has many applications in location-based services, mobile networking, city management, and epidemiology. Previous sensing approaches for human mobility are mainly categorized into two types: stationary sensing systems (e.g., surveillance cameras and toll booths) and mobile sensing systems (e.g., smartphone apps and vehicle tracking devices). However, stationary sensing systems only provide mobility information of human in limited coverage (e.g., camera-equipped roads) and mobile sensing systems only capture a limited number of people (e.g., people using a particular smartphone app). In this work, we design a novel system Mohen to model human mobility with a heterogeneous sensing system. The key novelty of Mohen is to fundamentally extend the sensing coverage of a large-scale stationary sensing system with a small-scale sensing system. Based on the evaluation on data from real-world urban sensing systems, our system outperforms them by 35% and achieves a competitive result to an Oracle method. 
    more » « less
  7. With the trend of vehicles becoming increasingly connected and potentially autonomous, vehicles are being equipped with rich sensing and communication devices. Various vehicular services based on shared real-time sensor data of vehicles from a fleet have been proposed to improve the urban efficiency, e.g., HD-live map, and traffic accident recovery. However, due to the high cost of data uploading (e.g., monthly fees for a cellular network), it would be impractical to make all well-equipped vehicles to upload real-time sensor data constantly. To better utilize these limited uploading resources and achieve an optimal road segment sensing coverage, we present a real-time sensing task scheduling framework, i.e., RISC, for Resource-Constraint modeling for urban sensing by scheduling sensing tasks of commercial vehicles with sensors based on the predictability of vehicles' mobility patterns. In particular, we utilize the commercial vehicles, including taxicabs, buses, and logistics trucks as mobile sensors to sense urban phenomena, e.g., traffic, by using the equipped vehicular sensors, e.g., dash-cam, lidar, automotive radar, etc. We implement RISC on a Chinese city Shenzhen with one-month real-world data from (i) a taxi fleet with 14 thousand vehicles; (ii) a bus fleet with 13 thousand vehicles; (iii) a truck fleet with 4 thousand vehicles. Further, we design an application, i.e., track suspect vehicles (e.g., hit-and-run vehicles), to evaluate the performance of RISC on the urban sensing aspect based on the data from a regular vehicle (i.e., personal car) fleet with 11 thousand vehicles. The evaluation results show that compared to the state-of-the-art solutions, we improved sensing coverage (i.e., the number of road segments covered by sensing vehicles) by 10% on average. 
    more » « less
  8. Recently, the ubiquity of mobile devices leads to an increasing demand of public network services, e.g., WiFi hot spots. As a part of this trend, modern transportation systems are equipped with public WiFi devices to provide Internet access for passengers as people spend a large amount of time on public transportation in their daily life. However, one of the key issues in public WiFi spots is the privacy concern due to its open access nature. Existing works either studied location privacy risk in human traces or privacy leakage in private networks such as cellular networks based on the data from cellular carriers. To the best of our knowledge, none of these work has been focused on bus WiFi privacy based on large-scale real-world data. In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions how likely bus WiFi users can be uniquely re-identified if partial usage information is leaked and how we can protect users from the leaked information. To understand the above questions, we conduct a case study in a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users during a two-month period. Technically, we design two models for our uniqueness analyses and protection, i.e., a PB-FIND model to identify the probability a user can be uniquely re-identified from leaked information; a PB-HIDE model to protect users from potentially leaked information. Specifically, we systematically measure the user uniqueness on users' finger traces (i.e., connection URL and domain), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces). Our measurement results reveal (i) 97.8% users can be uniquely re-identified by 4 random domain records of their finger traces and 96.2% users can be uniquely re-identified by 5 random locations on buses; (ii) 98.1% users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show our PB-HIDE algorithm protects more than 95% users from the potentially leaked information by inserting only 1.5% synthetic records in the original dataset to preserve their data utility. 
    more » « less