Title: Towards a Better Understanding of Public Transportation Traffic: A Case Study of the Washington, DC Metro
Traffic prediction is central to many applications, ranging from individual trip planning to urban planning. Existing work mainly focuses on traffic prediction on road networks. Yet public transportation accounts for a significant share of overall human mobility and passenger volume. For example, the Washington, DC metro carries on average 600,000 passengers on a weekday. In this work, we address the problem of modeling, classifying, and predicting such passenger volume in public transportation systems. We study the case of the Washington, DC metro using fare card data, specifically passenger in- and outflow at stations. To reduce the dimensionality of the data, we apply principal component analysis to extract latent features for different stations and for different calendar days. Our unsupervised clustering results demonstrate that these latent features are highly discriminative. They allow us to derive different station types (residential, commercial, and mixed) and to effectively classify and identify the passenger flow of “unknown” stations. Finally, we show that this classification can be applied to predict the passenger volume at stations. By learning latent features of stations over a period of time, we are able to predict the flow for the following hours. Extensive experiments comparing against a baseline neural network and two naïve periodicity approaches show a considerable accuracy improvement when using the latent-feature-based approach.
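The pipeline described in the abstract (principal component analysis on station flows, followed by unsupervised clustering into station types) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the flow data are synthetic, the hourly profile shapes are hypothetical, and plain k-means stands in for the clustering step, which the abstract does not name. It is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24)

def peak(center, width=1.5):
    """Bell-shaped ridership peak around a given hour (hypothetical shape)."""
    return np.exp(-0.5 * ((hours - center) / width) ** 2)

# Synthetic hourly inflow templates for three assumed station types.
morning, evening = peak(8), peak(17)
templates = {
    "residential": 1000 * morning + 200 * evening,
    "commercial": 200 * morning + 1000 * evening,
    "mixed": 600 * morning + 600 * evening,
}
# 10 noisy stations per type: rows 0-9, 10-19, 20-29 (30 stations x 24 hours).
flows = np.vstack([
    template + rng.normal(0, 20, size=(10, 24))
    for template in templates.values()
])

def latent_features(data, n_components=2):
    """Project each station profile onto the top principal components."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=50):
    """Lloyd's algorithm with farthest-point initialisation."""
    centers = [points[0]]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest center.
        gaps = np.linalg.norm(points[:, None] - centers[None], axis=2)
        labels = np.argmin(gaps, axis=1)
        # Recompute centers; keep the old one if a cluster is empty.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(updated, centers):
            break
        centers = updated
    return labels

features = latent_features(flows, n_components=2)
labels = kmeans(features, k=3)
print(labels.reshape(3, 10))  # one row per synthetic station type
```

On this toy data the two latent features are enough to separate the three synthetic groups. Real fare card data would be noisier, and the number of components and clusters would need to be chosen from the data.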
Award ID(s):
1637541
NSF-PAR ID:
10110163
Journal Name:
Urban Science
Volume:
2
Issue:
3
ISSN:
2413-8851
Page Range / eLocation ID:
65
Sponsoring Org:
National Science Foundation
More Like this
  1. Urban public transit planning is crucial in reducing traffic congestion and enabling green transportation. However, there is no systematic way to integrate passengers' personal preferences in planning public transit routes and schedules so as to achieve high occupancy rates and efficiency gains from ride-sharing. In this paper, we take the first step to extract passengers' preferences in planning from historical public transit data. We propose a data-driven method to construct a Markov decision process model that characterizes the process of passengers making sequential public transit choices among bus routes, subway lines, and transfer stops/stations. Using the model, we integrate softmax policy iteration into maximum entropy inverse reinforcement learning to infer the passenger's reward function from observed trajectory data. The inferred reward function will enable an urban planner to predict passengers' route planning decisions given some proposed transit plans, for example, opening a new bus route or subway line. Finally, we demonstrate the correctness and accuracy of our modeling and inference methods on large-scale (three months) passenger-level public transit trajectory data from Shenzhen, China. Our method contributes to smart transportation design and human-centric urban planning.
  2. Effective road traffic assessment and estimation is crucial not only for traffic management applications, but also for long-term transportation and, more generally, urban planning. Traditionally, this task has been achieved by using a network of stationary traffic count sensors. These costly and unreliable sensors have been replaced with so-called Probe Vehicle Data (PVD), which relies on sampling individual vehicles in traffic, using for example smartphones, to assess the overall traffic condition. While PVD provides uniform road network coverage, it does not capture the actual traffic flow. On the other hand, stationary sensors capture the absolute traffic flow only at discrete locations. Furthermore, these sensors are often unreliable; temporary malfunctions create gaps in their time series of measurements. This work bridges the gap between these two data sources by learning the time-dependent fraction of vehicles captured by GPS-based probe data at discrete stationary sensor locations. We can then account for the gaps in the traffic-loop measurements by using the PVD data to estimate the actual total flow. In this work, we show that the PVD flow capture changes significantly over time in the Washington, DC area. Exploiting this information, we are able to derive tight confidence intervals of the traffic volume for areas with no stationary sensor coverage.
  3. Individual passenger travel patterns have significant value in understanding passenger behavior, such as learning the hidden clusters of locations, time, and passengers. The learned clusters further enable commercially beneficial actions such as customized services, promotions, data-driven urban-use planning, peak hour discovery, and so on. However, individualized passenger modeling is very challenging for the following reasons: 1) the individual passenger travel data are multi-dimensional spatiotemporal big data, including at least the origin, destination, and time dimensions; 2) individualized passenger travel patterns usually depend on the external environment, such as the distances and functions of locations, which are ignored in most current works. This work proposes a multi-clustering model to learn the latent clusters along the multiple dimensions of Origin, Destination, Time, and eventually, Passenger (ODT-P). We develop a graph-regularized tensor Latent Dirichlet Allocation (LDA) model by first extending the traditional LDA model into a tensor version and then applying it to individual travel data. Then, the external information of stations is formulated as semantic graphs and incorporated as Laplacian regularizations. Furthermore, to improve the model scalability when dealing with massive data, an online stochastic learning method based on a tensorized variational Expectation-Maximization algorithm is developed. Finally, a case study based on passengers in the Hong Kong metro system is conducted and demonstrates that a better clustering performance is achieved compared to state-of-the-art methods, with the improvement in point-wise mutual information index and algorithm convergence speed by a factor of two.
  4. Effective real-time estimation of vehicle travel time using Automatic Vehicle Locators (AVL) has added a new dimension to smart city planning. In this paper, the authors use data collected over several months from a transit agency and show how this data can potentially be used to learn patterns of travel time during specially planned events like NFL (National Football League) games and music award ceremonies. The impact of NFL games, along with other factors such as weather, traffic conditions, and distance, is discussed together with their relative importance to the prediction of travel time. Statistical learning models are used to predict travel time and subsequently assess the cascading effects of delay. The model performance is determined based on its predictive accuracy according to the out-of-sample error. In addition, the models help identify the most significant variables that influence the delay in the transit system. In order to compare the actual and predicted travel time for days having special events, heat maps are generated showing the delay impacts in different time windows between two timepoint-segments in comparison to a non-game day. This work focuses on the prediction and visualization of the delay in the public transit system and the analysis of its cascading effects on the entire transportation network. According to the study results, the authors are able to explain more than 80% of the variance in the bus travel time at each segment and can make future travel predictions during planned events with an out-of-sample error of 2.0 minutes using information on the bus schedule, traffic, weather, and scheduled events. According to the variable importance analysis, traffic information is most significant in predicting the delay in the transit system.
  5. Many studies have reported associations between respiratory symptoms and residents' proximity to traffic. However, only a few have documented information about the relationship between traffic volume and air quality in local areas. This study investigates the impact of traffic volume on air quality at different geographical locations in the state of South Carolina using multilevel linear mixed models and Grey Systems. Historical traffic volume and air quality data between 2006 and 2016 are obtained from the South Carolina Department of Transportation (SCDOT) and the United States Environmental Protection Agency (EPA) monitoring stations. The data are used to develop prediction models that relate Air Quality Index (AQI) to traffic volume for selected counties and schools. For the counties, two models are developed, one with ozone (O3) and one with PM2.5 as the dependent variable. For the schools, only one model is developed, with O3 as the dependent variable. The number of counties and schools studied is limited by the availability of air monitoring stations dedicated to measuring O3 and PM2.5. Several types of models were investigated. They include the linear regression model (LM), linear mixed-effect regression model (LMER), Grey Systems (GM), error corrected GM (EGM), Grey Verhulst (GV), error corrected GV (EGV), and LMER + EGM. The LM model produced the least accurate estimate while the LMER + EGM model produced the most accurate estimate (average RMSE is less than 5%). The models’ estimates suggest that air quality in South Carolina will continue to get worse in the coming years due to increasing Annual Average Daily Traffic (AADT). An interesting finding of this study is that some counties and schools will have higher levels of O3 or PM2.5 when AADT decreases. This finding suggests that factors other than AADT influence the air quality in these counties and schools.