Title: MobilityMirror: Bias-Adjusted Transportation Datasets
We describe customized synthetic datasets for publishing mobility data. Private companies are providing new transportation modalities, and their data is of high value for integrative transportation research, policy enforcement, and public accountability. However, these companies are disincentivized from sharing data, not only to protect the privacy of individuals (drivers and/or passengers) but also to protect their own competitive advantage. Moreover, demographic biases arising from how the services are delivered may be amplified if the released data is used in other contexts. We describe a model and an algorithm for releasing origin-destination histograms that remove selected biases in the data using causality-based methods. We compute the origin-destination histogram of the original dataset, then adjust the counts to remove undesirable causal relationships that can lead to discrimination or violate contractual obligations with data owners. We evaluate the utility of the algorithm on real data from a dockless bike share program in Seattle and taxi data in New York, and show that these adjusted transportation datasets can retain utility while removing bias in the underlying data.
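The two steps described above (build the origin-destination histogram, then adjust its counts) can be sketched in Python. This is not the paper's algorithm: as a hypothetical stand-in for its causality-based adjustment, the sketch removes the statistical dependence between a trip's destination and an assumed per-trip attribute `group` (for example, a demographic category) within each origin, by replacing the counts with their conditional-independence projection. The function names and the `(origin, destination, group)` trip format are illustrative assumptions.

```python
from collections import Counter

# Hypothetical stand-in for illustration; NOT the MobilityMirror algorithm.
# Each trip is assumed to be an (origin, destination, group) tuple.


def od_histogram(trips):
    """Count trips per (origin, destination) pair."""
    return Counter((o, d) for o, d, _ in trips)


def remove_group_dependence(trips):
    """Make destination independent of group within each origin.

    Replaces every count with n'(o, d, g) = n(o, d) * n(o, g) / n(o),
    the conditional-independence projection, which leaves the OD histogram
    n(o, d) and the per-origin group totals n(o, g) unchanged.
    """
    trips = list(trips)  # iterated several times below
    n_o = Counter(o for o, _, _ in trips)
    n_od = Counter((o, d) for o, d, _ in trips)
    n_og = Counter((o, g) for o, _, g in trips)
    adjusted = {}
    for (o, d), c_od in n_od.items():
        for (o2, g), c_og in n_og.items():
            if o2 == o:  # only combine counts from the same origin
                adjusted[(o, d, g)] = c_od * c_og / n_o[o]
    return adjusted


# Toy data: within origin "A", group g1 always goes to X and g2 always to Y.
trips = [("A", "X", "g1"), ("A", "X", "g1"), ("A", "Y", "g2"), ("A", "Y", "g2")]
print(remove_group_dependence(trips)[("A", "X", "g1")])  # 1.0
```

Because the projection preserves both the OD histogram and each origin's group totals, aggregate flows stay the same while, within any origin, the destination no longer depends on the group.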
Award ID(s):
1740996
NSF-PAR ID:
10074167
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Workshop on Big Social Data and Urban Computing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mobility-as-a-service systems are becoming increasingly important in the context of smart cities, and public agencies face challenges in obtaining data from private operators. City agencies typically receive only limited mobility data, which is not enough to support their decision-making. This study proposes an entropy-maximizing gravity model to predict origin–destination patterns of both passengers and mobility fleets from only partial operator data. An iterative balancing algorithm is proposed to reach the entropy-maximization state efficiently. Two calibration applications, each using a different type of available trip length distribution data, are discussed and validated with a small-scale numerical example. Tests were also conducted to verify that the proposed model and algorithm apply to large-scale real data from Chicago transportation network companies. Both shared-ride and single-ride trips were forecast with the calibrated model, and the single-ride predictions are more accurate. The proposed solution and calibration algorithms are also efficient enough to handle large scenarios. Additional analyses of the north and south sub-areas of Chicago revealed different travel patterns in the two sub-areas.
  2. Cire, A.A. (Ed.)
    Wildlife trafficking (WT), the illegal trade of wild fauna, flora, and their parts, directly threatens biodiversity and the conservation of trafficked species, while also harming human health, national security, and economic development. Wildlife traffickers hide their activities in plain sight by exploiting large, legal, globally linked transportation networks. To complicate matters, defensive interdiction resources are limited, datasets are fragmented and rarely interoperable, and interventions such as checkpoints place a burden on legal transportation. Interpretable predictions of the routes wildlife traffickers are likely to take can therefore help target defensive efforts and reveal what traffickers may consider when selecting routes. We propose a data-driven model for predicting trafficking routes on the global commercial flight network, a transportation network for which we have some historical seizure data and a specification of the possible routes that traffickers may take. Although seizure data has limitations, such as data bias and dependence on the deployed defensive resources, this is a first step towards predicting wildlife trafficking routes on real-world data. Our seizure data documents the planned commercial flight itineraries of trafficked and successfully interdicted wildlife. We aim to predict highly trafficked flight paths for known origin-destination pairs, with plausible explanations that illuminate how traffickers make decisions based on the presence of criminal actors, markets, and resilience systems.
    We propose a model that first predicts, from input features, the likelihood that each commercial flight out of a given airport will be taken, and then finds the highest-likelihood flight path from origin to destination using a differentiable shortest-path solver. This lets us automatically align the model's loss with the overall goal of correctly predicting the full flight itinerary from a given source to a destination. We evaluate the proposed model's predictions and interpretations both quantitatively and qualitatively, showing that the predicted paths align with observed held-out seizures and can be interpreted by policy-makers.
  3. Non-pharmacologic interventions (NPIs) promote protective actions that lessen the risk of exposure to COVID-19 by reducing mobility. However, the mechanisms underlying reduced mobility are poorly understood, especially for socially vulnerable populations. The research examines two fine-grained datasets for five urban locations. Through exploratory network, statistical, and spatial clustering analyses, it investigates how much exposure risk was reduced for socially vulnerable populations, specifically lower-income and non-white populations, after NPIs were implemented. The mobility dataset tracks population movement across ZIP codes for an origin–destination (O–D) network analysis. The population activity dataset uses visits from census block groups (CBGs) to points of interest (POIs) for a network analysis of population-facility interactions. The mobility dataset comes from a collaboration with StreetLight Data, a company focusing on transportation analytics, and the population activity dataset from a collaboration with SafeGraph, a company focusing on POI data. Both datasets indicate that low-income and non-white populations faced higher exposure risk. These findings can help emergency planners and public health officials understand how well different populations are able to take protective actions, and can inform more equitable, data-driven NPI policies for future epidemics.
  4. Recently, the ubiquity of mobile devices has led to increasing demand for public network services, e.g., WiFi hotspots. As part of this trend, modern transportation systems are equipped with public WiFi devices that provide Internet access to passengers, since people spend a large amount of time on public transportation in their daily lives. However, a key issue with public WiFi hotspots is privacy, owing to their open-access nature. Existing works have studied either location privacy risk in human traces or privacy leakage in private networks, such as cellular networks, based on data from cellular carriers. To the best of our knowledge, none of these works has focused on bus WiFi privacy using large-scale real-world data. In this paper, to explore the privacy risk in bus WiFi systems, we focus on two key questions: how likely bus WiFi users are to be uniquely re-identified if partial usage information is leaked, and how we can protect users from the leaked information. To answer these questions, we conduct a case study of a large-scale bus WiFi system, which contains 20 million connection records and 78 million location records from 770 thousand bus WiFi users over a two-month period. Technically, we design two models for our uniqueness analysis and protection: a PB-FIND model that identifies the probability that a user can be uniquely re-identified from leaked information, and a PB-HIDE model that protects users from potentially leaked information. Specifically, we systematically measure user uniqueness on users' finger traces (i.e., connection URLs and domains), foot traces (i.e., locations), and hybrid traces (i.e., both finger and foot traces).
    Our measurement results reveal that (i) 97.8% of users can be uniquely re-identified by 4 random domain records of their finger traces, and 96.2% of users by 5 random locations on buses; and (ii) 98.1% of users can be uniquely re-identified by only 2 random records if both their connection records and locations are leaked to attackers. Moreover, the evaluation results show that our PB-HIDE algorithm protects more than 95% of users from the potentially leaked information by inserting only 1.5% synthetic records into the original dataset, which preserves its data utility.
  5. Given a spatial graph, an origin and a destination, and on-board diagnostics (OBD) data, the energy-efficient path selection problem aims to find the path with the least expected energy consumption (EEC). Two main objectives of smart cities are sustainability and prosperity, both of which benefit from reducing the energy consumption of transportation. The challenges of the problem include the dependence of EEC on vehicles' physical parameters, the autocorrelation of EEC across path segments, the high computational cost of EEC estimation, and potentially negative EEC. However, current cost estimation models for the path selection problem do not consider vehicles' physical parameters. Moreover, current path selection algorithms follow the “path + edge” pattern when exploring candidate paths, resulting in redundant computation. Our preliminary work introduced a physics-guided energy consumption model and proposed a maximal-frequented-path-graph shortest-path algorithm using that model. In this work, we propose an informed algorithm using an admissible heuristic, as well as an algorithm to handle negative EEC. We analyze the proposed algorithms theoretically and evaluate them via experiments with real-world and synthetic data. We also conduct two case studies using real-world data and a road test to validate the proposed method.
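The Chicago study in the first related item mentions an iterative balancing algorithm for an entropy-maximizing gravity model. A common form of such a model is the doubly constrained gravity model, T_ij = A_i O_i B_j D_j exp(-β c_ij), whose balancing factors A_i and B_j can be found by alternately rescaling rows and columns (the Furness method, a form of iterative proportional fitting). The sketch below assumes an exponential deterrence function, known origin totals `origins`, destination totals `dests`, a cost matrix `cost`, and a given `beta`; it is a generic illustration, not the study's calibrated model or its exact algorithm.

```python
import math


def doubly_constrained_gravity(origins, dests, cost, beta, tol=1e-9, max_iter=1000):
    """Doubly constrained gravity model (sketch, exponential deterrence assumed).

    T[i][j] = A[i] * origins[i] * B[j] * dests[j] * exp(-beta * cost[i][j]),
    with balancing factors A and B found by alternately rescaling rows and
    columns. Assumes sum(origins) == sum(dests).
    """
    n, m = len(origins), len(dests)
    f = [[math.exp(-beta * cost[i][j]) for j in range(m)] for i in range(n)]
    A, B = [1.0] * n, [1.0] * m
    T = []
    for _ in range(max_iter):
        # Row step: choose A so that row sums would match origins for the current B.
        A = [1.0 / sum(B[j] * dests[j] * f[i][j] for j in range(m)) for i in range(n)]
        # Column step: choose B so that column sums match dests exactly.
        B = [1.0 / sum(A[i] * origins[i] * f[i][j] for i in range(n)) for j in range(m)]
        T = [[A[i] * origins[i] * B[j] * dests[j] * f[i][j] for j in range(m)]
             for i in range(n)]
        # After the column step only the row sums can still be off.
        if max(abs(sum(T[i]) - origins[i]) for i in range(n)) < tol:
            break
    return T


T = doubly_constrained_gravity([100, 50], [80, 70], [[1, 2], [2, 1]], beta=0.5)
print([round(sum(row), 6) for row in T])  # [100.0, 50.0]
```

Calibrating `beta` against an observed trip length distribution, as the study describes, would wrap this balancing loop in an outer search; that step is omitted here.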
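The wildlife-trafficking study in the second related item finds the highest-likelihood flight path with a differentiable shortest-path solver. The underlying reduction can be shown with ordinary, non-differentiable Dijkstra: maximizing a product of per-flight probabilities is equivalent to minimizing the sum of their negative logarithms, which are non-negative edge weights. The `edge_probs` dictionary and the airport labels are hypothetical; in the study, such probabilities would come from the learned model.

```python
import heapq
import math


def most_likely_path(edge_probs, source, target):
    """Return (path, probability) of the most likely path from source to target.

    edge_probs maps airport -> {next_airport: probability in (0, 1]}.
    Maximizing a product of probabilities equals minimizing the sum of
    -log(p), which is non-negative, so plain Dijkstra applies.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, p in edge_probs.get(u, {}).items():
            nd = d - math.log(p)  # add the -log(p) edge weight
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, 0.0
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return path, math.exp(-dist[target])


# Hypothetical flight probabilities between airports A, B, C, D.
flights = {"A": {"B": 0.9, "C": 0.2}, "B": {"D": 0.5}, "C": {"D": 0.9}}
path, p = most_likely_path(flights, "A", "D")
print(path, round(p, 2))  # ['A', 'B', 'D'] 0.45
```

A differentiable solver replaces the hard argmin above with a smoothed version so that gradients can flow back into the probability model; the log transform is the same.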
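The bus WiFi study in the fourth related item measures how many users can be uniquely re-identified from k random records. A simple way to estimate such a uniqueness rate is sketched below; it is a generic estimator under stated assumptions, not the paper's PB-FIND model. Each user's trace is assumed to be a set of comparable records (for example, domain strings or location IDs), and a user counts as unique when no other user's trace contains all k sampled records. The function name and the `trials` and `seed` parameters are illustrative.

```python
import random


def uniqueness(traces, k, trials=10, seed=0):
    """Estimate the fraction of users uniquely re-identified by k random records.

    traces maps a user id to a set of records (e.g. domains or locations).
    Generic unicity-style estimator for illustration, not PB-FIND.
    """
    rng = random.Random(seed)
    unique = total = 0
    for user, records in traces.items():
        if len(records) < k:
            continue  # too few records to draw k of them
        for _ in range(trials):
            # Sort first so that sampling is reproducible for a given seed.
            sample = set(rng.sample(sorted(records), k))
            # Count users whose trace contains every sampled record (includes `user`).
            matches = sum(1 for r in traces.values() if sample <= r)
            if matches == 1:
                unique += 1
            total += 1
    return unique / total if total else 0.0


# Toy traces: every single record is shared, but every pair is unique.
traces = {"u1": {"a", "b"}, "u2": {"a", "c"}, "u3": {"b", "c"}}
print(uniqueness(traces, 1), uniqueness(traces, 2))  # 0.0 1.0
```

The pairwise scan over all users is quadratic; at the study's scale (770 thousand users), an inverted index from record to users would be the natural replacement.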
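The energy-efficient path study in the fifth related item proposes an informed search with an admissible heuristic. A generic A* search illustrates the idea. The heuristic here (straight-line distance times an assumed minimum energy per kilometer, `min_energy_per_km`) is a hypothetical choice, not the paper's, and the sketch assumes non-negative edge energies, so it does not cover the negative-EEC case the study also handles.

```python
import heapq
import math


def astar(graph, coords, source, target, min_energy_per_km):
    """A* search for the least-energy path (sketch).

    graph maps node -> {neighbor: energy cost >= 0}; coords maps node -> (x, y) in km.
    The heuristic below is consistent (hence admissible) only if every edge's
    energy is at least min_energy_per_km times that edge's straight-line length.
    """
    def h(node):
        (x1, y1), (x2, y2) = coords[node], coords[target]
        return math.hypot(x2 - x1, y2 - y1) * min_energy_per_km

    g = {source: 0.0}  # best known energy from source
    prev = {}
    heap = [(h(source), source)]
    closed = set()
    while heap:
        _, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return path[::-1], g[target]
        if u in closed:
            continue  # stale heap entry
        closed.add(u)
        for v, w in graph.get(u, {}).items():
            ng = g[u] + w
            if ng < g.get(v, math.inf):
                g[v] = ng
                prev[v] = u
                heapq.heappush(heap, (ng + h(v), v))
    return None, math.inf


coords = {"A": (0, 0), "B": (1, 0), "C": (2, 0), "D": (1, 1)}
graph = {"A": {"B": 1.0, "C": 3.0, "D": 2.0}, "B": {"C": 1.0}, "D": {"C": 2.0}}
print(astar(graph, coords, "A", "C", min_energy_per_km=1.0))  # (['A', 'B', 'C'], 2.0)
```

With a zero heuristic this reduces to Dijkstra; a tighter admissible heuristic lets A* expand fewer nodes while still returning a least-energy path.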