This content will become publicly available on May 16, 2026

Title: Using Multiple Biased Data Sets to Recover Missing Trips with a Behaviorally Informed Model
Trip generation, a critical first step in travel demand forecasting, requires not only estimating trips from the observed sample data but also calculating the total number of trips in the population, including both the observed trips and the trips missed by the sample (which we call missing trips in this paper). The latter problem, recovering missing trips, has received little attention in the academic literature, and the state-of-the-art practice is to apply sample weights that extrapolate from observed trips to the population total. In recent years, big location-based service (LBS) data have become a promising alternative data source (in addition to household travel survey data) for trip generation. Because users self-select into the different mobile services that generate LBS data, the LBS data contain selection bias, and the kinds of trips included or excluded differ systematically among data sources. This study addresses this issue by developing a behaviorally informed approach to quantify the selection biases and recover missing trips. The key idea is that because the biases reflected in different data sources are likely to differ, integrating multiple biased data sources will mitigate those biases. This is achieved by formulating a capture probability, which specifies the probability that a trip is captured in a data set as a function of various behavioral factors (e.g., socio-demographics and area-related factors), and by estimating the associated parameters through maximum likelihood or Bayesian methods. The approach is evaluated through experimental studies that test the effects of data and model uncertainty on its ability to recover missing trips. The model is also applied to two real-world case studies: one using the 2017 National Household Travel Survey data and the other using two LBS data sets.
Our results demonstrate the robustness of the model in recovering missing trips, even when the analyst completely mis-specifies the underlying trip generation process and the capture probability functions (used to quantify selection biases). The developed methodology scales to any number of data sets and is applicable to both big and small data sets.
History: This paper has been accepted for the Transportation Science Special Issue on Machine Learning Methods for Urban Mobility.
Funding: This work was supported by the Division of Civil, Mechanical and Manufacturing Innovation [Grant 2114260], the National Institute of General Medical Sciences [Grant 1R01GM108731-01A1], and the U.S. Department of Transportation [Grant 69A3551747116].
Supplemental Material: The online appendix is available at https://doi.org/10.1287/trsc.2024.0550.
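The capture-probability idea in the abstract can be illustrated with a minimal capture-recapture sketch. This sketch rests on several assumptions that do not come from the paper. Each of K data sets captures a trip independently, given a single covariate x. The capture probability is logistic in x. Parameters are estimated with a Huggins-style conditional likelihood that uses only the observed trips, and the population total is recovered with a Horvitz-Thompson estimator. These are standard textbook techniques swapped in for illustration, not the authors' model. The function names, parameter values and simulated data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def capture_probs(theta, x):
    """theta: (K, 2) array of [intercept, slope] per data set; x: (n,) covariate.
    Returns an (n, K) array: probability that trip i is captured by data set k."""
    return sigmoid(theta[:, 0][None, :] + np.outer(x, theta[:, 1]))

def fit_conditional_mle(h, x, iters=5000, lr=1.0):
    """Gradient ascent on the conditional log-likelihood of the observed trips:
    sum_i [ sum_k h_ik log p_ik + (1 - h_ik) log(1 - p_ik) - log(1 - prod_k (1 - p_ik)) ].
    h: (n, K) 0/1 capture histories; every row has at least one 1."""
    n, K = h.shape
    theta = np.zeros((K, 2))
    for _ in range(iters):
        p = capture_probs(theta, x)
        q = np.prod(1.0 - p, axis=1, keepdims=True)   # P(trip missed by every data set)
        g = (h - p) - q * p / (1.0 - q)                # d loglik / d linear predictor
        grad = np.stack([g.sum(axis=0), (g * x[:, None]).sum(axis=0)], axis=1) / n
        theta += lr * grad
    return theta

def horvitz_thompson_total(theta, x):
    """Weight each observed trip by 1 / P(captured by at least one data set)."""
    p = capture_probs(theta, x)
    p_seen = 1.0 - np.prod(1.0 - p, axis=1)
    return float(np.sum(1.0 / p_seen))

# Hypothetical simulation: 3 data sets whose capture probability depends on x differently.
rng = np.random.default_rng(0)
N_true = 20_000
x_all = rng.normal(size=N_true)
theta_true = np.array([[-1.0, 0.8], [-0.5, -0.6], [-1.5, 0.3]])
h_all = rng.random((N_true, 3)) < capture_probs(theta_true, x_all)
seen = h_all.any(axis=1)                  # trips present in at least one data set
h, x = h_all[seen].astype(float), x_all[seen]

theta_hat = fit_conditional_mle(h, x)
N_hat = horvitz_thompson_total(theta_hat, x)
print(f"observed trips: {seen.sum()}, estimated total: {N_hat:.0f}, true total: {N_true}")
```

The data sets are biased in opposite directions with respect to x. Because of this, their overlap carries the information needed to estimate how many trips all of them missed. A single biased data set would not identify this quantity without extra assumptions.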
Award ID(s):
2114260
PAR ID:
10615876
Author(s) / Creator(s):
; ;
Publisher / Repository:
INFORMS
Date Published:
Journal Name:
Transportation Science
ISSN:
0041-1655
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Driverless or fully automated vehicles (AVs) are expected to fundamentally change how individuals and households travel and how vehicles use roadway infrastructure. The first goal of this study is to develop a modeling framework of activity-constrained household travel in a future multi-modal network with private AVs, shared-use AVs, transit, and intermodal AV-transit travel options. The second goal is to analyze the potential impacts of AVs, including intermodal AV-transit travel, on (a) household-level travel behavior, (b) household travel costs, (c) demand for transport modes, including transit, and (d) vehicle kilometers traveled (VKT). To meet the first goal, we propose and formulate the Household Activity Pattern Problem with AV-enabled Intermodal Trips (HAPP-AV-IT), which incorporates AV deadheading and intermodal AV-transit trips. The modeling framework extends prior HAPP-based formulations that model household-level travel decisions as vehicle (and person) routing and scheduling problems, similar to the pickup and delivery problem with time windows. To meet the second goal, we apply the HAPP-AV-IT to two case studies and conduct many computational experiments. We use synthetic activity location data for synthetic households and a fictitious medium-sized network comprising a road network, a transit network, residential locations, activity locations, and parking locations. The computational results illustrate (a) the critical role that household AV ownership plays in household travel decisions, modal demand, and VKT, (b) that with AVs, deadheading accounts for 30–40% of vehicle operating distances, (c) that around 10% of households in the study region benefit from AV-based intermodal trips, and (d) that those 10% of households see, on average, 5% reductions in household travel costs and 25% reductions in VKT in the most transit-friendly scenario.
This last finding suggests that intermodal AV-transit trips may exist in a driverless vehicle future, and therefore, transit agencies and transportation planners should consider how to serve this market. We also propose and test a simple heuristic algorithm that quickly solves HAPP-AV-IT problem instances. 
  2. Urban heat exposure is an increasing health risk among urban dwellers. Many cities are considering accommodating active mobility, especially walking and biking, to reduce greenhouse gas emissions. However, promoting active mobility without proper planning and transportation infrastructure to combat extreme heat exposure may cause more heat-related morbidity and mortality, particularly in the future under projected climate change. This study estimated the effectiveness of mitigating heat exposure on active trips through built environment and travel behavior change. Simulations of the Phoenix metro region's 624,987 active trips were conducted using the activity-based travel model (ABM), mean radiant temperature (T_MRT, net human radiation exposure), the transportation network, and local climate zones. Two scenarios were designed to reduce traveler exposure: one focuses on built environment change (making neighborhoods cooler) and the other on travel behavior change (switching from routes with shorter travel time but higher exposure to routes with longer travel time but cooler conditions). Without environmental or behavioral change, travelers experienced T_MRT heat exposure ranging from 29°C to 76°C (84°F to 168°F). Active trip T_MRT exposures were reduced by an average of 1.2–3.7°C when the built environment was changed from a hotter to a cooler design. Behavioral changes cooled up to 10 times more trips than built environment changes did. The marginal benefit of cooling decreased as more corridors were cooled. When the most traveled 10 km of corridors were cooled, the marginal benefit exceeded 1,000 trips/km. However, cooling all corridors yields marginal benefits as low as 1 trip/km. The results reveal that heavily traveled corridors should be prioritized when resources are limited, and that the best cooling results come from combining built environment and travel behavior change.
The results show how to surgically invest in travel behavior and built environment change to most effectively protect active travelers. 
  3. Ride-hailing can potentially provide a variety of benefits, relative to other travel modes, to individuals who need to chain several activities together within a single trip chain. Using household travel diary/survey data, the goal of this study is to assess the role ride-hailing currently plays within trip chains. Specifically, the study aims to determine, within trip chains, who uses ride-hailing services, for what trip/activity purposes, and to/from what types of areas, as well as the characteristics of trip chains that involve ride-hailing segments. To meet these objectives, the study estimates a binary logit model using 2017 National Household Travel Survey data, where the dependent variable denotes the inclusion of at least one ride-hailing trip within a trip chain. Similar to the non-trip-chaining ride-hailing literature, this study indicates that trip chains with ride-hailing legs are positively associated with travelers who are younger, live in high-income households, frequently use transit, and reside in high-density areas. However, this study includes novel findings indicating statistically significant relationships between ride-hailing and trip chains that end in healthcare and social/recreational activities. Moreover, trip chains with ride-hailing tend to have fewer stops and longer activity durations than trip chains without ride-hailing. This study also includes nested logit choice models, wherein the dependent variable denotes the primary mode (ride-hailing, transit, personal vehicle, or non-motorized transport) of a trip chain. These model results provide additional insights into the role of ride-hailing within trip chains, as they allow for cross-mode comparisons. The paper discusses the potential transportation planning and policy implications of the model results as well as future research directions.
  4. For transportation system analysis in a new space dimension defined by individual trips' remaining distances, vehicle trip demand has two main components: the departure time and the trip distance. In particular, the trip distance distribution (TDD) is a direct input to the bathtub model in this new space dimension and is an important variable in many applications, such as the development of distance-based congestion pricing strategies or mileage taxes. To understand the demand pattern well, both the distribution of trip initiation and the distribution of trip distance should be calibrated from real data. In this paper, the demand pattern is assumed to be described by the joint distribution of trip distance and departure time. In other words, the TDD is assumed to be time-dependent, and a calibration and validation methodology for the joint probability is proposed, based on log-likelihood maximization and the Kolmogorov–Smirnov test. The calibration method is applied to empirical for-hire vehicle trip data from Chicago, and it is concluded that the TDD varies more within a day than across weekdays. The hypothesis that the TDD follows a negative exponential, log-normal, or Gamma distribution is rejected. However, the best fit is systematically obtained with the time-dependent log-normal probability density function. Future work should consider other trip distance distributions and explore non-parametric probability density estimation for a better understanding of the demand pattern.
  5. Agent-based models have been extensively used to simulate the behavior of travelers in transportation systems because they allow realistic and versatile modeling of interactions. However, traditional agent-based models suffer from high computational costs and rely on tracking physical locations, which raises privacy concerns. This paper proposes an efficient formulation for the agent-based bathtub model (AB2M) in the relative space, where each agent's trajectory is represented by a time series of the remaining distance to its destination. The AB2M can be understood as a microscopic model that tracks individual trips' initiation, progression, and completion, and it is an exact numerical solution of the bathtub model for generic (time-dependent) trip distance distributions. The model can be solved for a deterministic set of trips with a given demand pattern (defined by the start time and distance of each trip), or it can be used to run Monte Carlo simulations that capture the average behavior and variations of stochastic demand patterns. To enhance computational efficiency, we introduce a priority queue formulation for AB2M, which eliminates the need to update trip positions at each time step and allows us to run large-scale scenarios with millions of individual trips in seconds. We systematically explore the scaling properties of AB2M and discuss how biases and numerical errors are introduced. Finally, we analyze the upper bound of the computational complexity of AB2M and the benefits of the priority queue formulation and downscaling on computational cost. This systematic exploration of the scaling properties of modeling individual agents in the relative space with AB2M further enhances its applicability to large-scale transportation systems and opens up opportunities for studying travel time reliability, scheduling, and mode choice.
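Item 4 in the list above calibrates a time-dependent trip distance distribution by log-likelihood maximization and checks the fit with a Kolmogorov–Smirnov test. The following is a minimal standard-library sketch of that workflow. It rests on assumptions that do not come from the paper. Trips are grouped by departure hour, and a log-normal is fitted for each hour with its closed-form maximum likelihood estimate. A negative exponential is fitted for comparison; the Gamma is omitted because its estimate needs an iterative solver. The synthetic distances and per-hour parameters are hypothetical. The value 1.36/sqrt(n) is the approximate 5% KS critical value for a fully specified distribution. When parameters are estimated from the same data, this value is conservative, and a Lilliefors-type correction would be stricter.

```python
import math
import random
from collections import defaultdict

def fit_lognormal(d):
    """Closed-form MLE for a log-normal: mean and (biased) std of log distances."""
    logs = [math.log(v) for v in d]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs))
    return mu, sigma

def lognormal_cdf(mu, sigma):
    return lambda v: 0.5 * (1.0 + math.erf((math.log(v) - mu) / (sigma * math.sqrt(2.0))))

def lognormal_loglik(d, mu, sigma):
    return sum(-math.log(v * sigma * math.sqrt(2.0 * math.pi))
               - (math.log(v) - mu) ** 2 / (2.0 * sigma ** 2) for v in d)

def exponential_loglik(d):
    rate = len(d) / sum(d)                  # MLE of the rate is 1 / sample mean
    return sum(math.log(rate) - rate * v for v in d)

def ks_statistic(d, cdf):
    """Max distance between the empirical CDF and a candidate CDF."""
    xs = sorted(d)
    n = len(xs)
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(xs))

# Hypothetical demand: (departure hour, trip distance) with hour-specific log-normal parameters.
rng = random.Random(42)
true_params = {7: (1.2, 0.6), 13: (1.6, 0.8), 18: (1.0, 0.5)}
trips = [(h, rng.lognormvariate(mu, s)) for h, (mu, s) in true_params.items() for _ in range(2000)]

by_hour = defaultdict(list)
for hour, dist in trips:
    by_hour[hour].append(dist)

results = {}
for hour, d in sorted(by_hour.items()):
    mu, sigma = fit_lognormal(d)
    ks = ks_statistic(d, lognormal_cdf(mu, sigma))
    ll_ln, ll_exp = lognormal_loglik(d, mu, sigma), exponential_loglik(d)
    results[hour] = (mu, sigma, ks, ll_ln, ll_exp)
    print(f"{hour:02d}h mu={mu:.2f} sigma={sigma:.2f} KS={ks:.3f} "
          f"(crit ~{1.36 / math.sqrt(len(d)):.3f}) logL lognormal={ll_ln:.0f} exp={ll_exp:.0f}")
```

Fitting each hour separately is the simplest way to make the distribution time-dependent. A smoother alternative would let mu and sigma vary continuously with departure time.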
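Item 5 in the list above mentions a priority queue formulation of AB2M that avoids updating every trip's position at each time step. One way to realize this idea is sketched below; the construction is our own assumption, not the authors' implementation. In the bathtub model, all active trips share a speed v(n) that depends only on the accumulation n. Let X(t) be the cumulative distance traveled at that common speed. Then a trip starting at s with distance d finishes once X(t) ≥ X(s) + d. That key is fixed when the trip enters, so it can sit in a min-heap untouched. The speed-accumulation function, time step, and demand below are hypothetical.

```python
import heapq
import random

def simulate_ab2m(trips, speed, dt):
    """trips: list of (start_time, distance, trip_id); speed(n) gives the common
    speed at accumulation n. Returns {trip_id: completion_time} (resolution dt)."""
    pending = sorted(trips)   # process trip starts in time order
    heap = []                 # (cumulative distance at which the trip ends, trip_id)
    X = 0.0                   # distance travelled at the common speed since t = 0
    t, k, done = 0.0, 0, {}
    while k < len(pending) or heap:
        while k < len(pending) and pending[k][0] <= t:
            _, dist, tid = pending[k]
            heapq.heappush(heap, (X + dist, tid))   # key fixed at entry, never updated
            k += 1
        X += speed(len(heap)) * dt                  # every active trip advances equally
        t += dt
        while heap and heap[0][0] <= X:             # only finished trips are touched
            _, tid = heapq.heappop(heap)
            done[tid] = t
    return done

V_FREE = 50.0   # hypothetical free-flow speed (km/h)

def speed(n):
    return V_FREE / (1.0 + n / 200.0)   # hypothetical speed-accumulation relation

rng = random.Random(1)
trips = [(rng.uniform(0.0, 1.0), rng.uniform(1.0, 10.0), i) for i in range(1000)]
done = simulate_ab2m(trips, speed, dt=0.001)
mean_tt = sum(done[i] - s for s, _, i in trips) / len(trips)
print(f"completed {len(done)} trips, mean travel time {mean_tt:.3f} h")
```

Each time step costs one speed evaluation plus one heap operation per starting or finishing trip, rather than one position update per active trip. This cost saving is the benefit the priority queue formulation targets.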