skip to main content

Title: Data-driven Bus Crowding Prediction Models Using Context-specific Features
Public transit is one of the first things that come to mind when someone talks about “smart cities.” As a result, many technologies, applications, and infrastructure have already been deployed to bring the promise of the smart city to public transportation. Most of these have focused on answering the question, “When will my bus arrive?”; little has been done to answer the question, “How full will my next bus be?” which also dramatically affects commuters’ quality of life. In this article, we consider the bus fullness problem. In particular, we propose two different formulations of the problem, develop multiple predictive models, and evaluate their accuracy using data from the Pittsburgh region. Our predictive models consistently outperform the baselines (by up to 8 times).
Award ID(s):
Publication Date:
Journal Name:
ACM/IMS Transactions on Data Science
Page Range or eLocation-ID:
1 to 33
Sponsoring Org:
National Science Foundation
More Like this
  1. An effective real-time estimation of the travel time for vehicles, using AVL (Automatic Vehicle Locators) has added a new dimension to the smart city planning. In this paper, the authors used data collected over several months from a transit agency and show how this data can be potentially used to learn patterns of travel time during specially planned events like NFL (National Football League) games and music award ceremonies. The impact of NFL games along with consideration of other factors like weather, traffic condition, distance is discussed with their relative importance to the prediction of travel time. Statistical learning modelsmore »are used to predict travel time and subsequently assess the cascading effects of delay. The model performance is determined based on its predictive accuracy according to the out-of-sample error. In addition, the models help identify the most significant variables that influence the delay in the transit system. In order to compare the actual and predicted travel time for days having special events, heat maps are generated showing the delay impacts in different time windows between two timepoint-segments in comparison to a non-game day. This work focuses on the prediction and visualization of the delay in the public transit system and the analysis of its cascading effects on the entire transportation network. According to the study results, the authors are able to explain more than 80% of the variance in the bus travel time at each segment and can make future travel predictions during planned events with an out-of-sample error of 2.0 minutes using information on the bus schedule, traffic, weather, and scheduled events. According to the variable importance analysis, traffic information is most significant in predicting the delay in the transit system.« less
  2. Effective public transit operations are one of the fundamental requirements for a modern community. Recently, a number of transit agencies have started integrating automated vehicle locators in their fleet, which provides a real-time estimate of the time of arrival. In this paper, we use the data collected over several months from one such transit system and show how this data can be potentially used to learn long term patterns of travel time. More specifically, we study the effect of weather and other factors such as traffic on the transit system delay. These models can later be used to understand themore »seasonal variations and to design adaptive and transient transit schedules. Towards this goal, we also propose an online architecture called DelayRadar. The novelty of DelayRadar lies in three aspects: (1) a data store that collects and integrates real-time and static data from multiple data sources, (2) a predictive statistical model that analyzes the data to make predictions on transit travel time, and (3) a decision making framework to develop an optimal transit schedule based on variable forecasts related to traffic, weather, and other impactful factors. This paper focuses on identifying the model with the best predictive accuracy to be used in DelayRadar. According to the preliminary study results, we are able to explain more than 70% of the variance in the bus travel time and we can make future travel predictions with an out-of-sample error of 4.8 minutes with information on the bus schedule, traffic, and weather.« less
  3. Smart city projects aim to enhance the management of city infrastructure by enabling government entities to monitor, control and maintain infrastructure efficiently through the deployment of Internet-of-things (IoT) devices. However, the financial burden associated with smart city projects is a detriment to prospective smart cities. A noteworthy factor that impacts the cost and sustainability of smart city projects is providing cellular Internet connectivity to IoT devices. In response to this problem, this paper explores the use of public transportation network nodes and mules, such as bus-stops as buses, to facilitate connectivity via device-to-device communication in order to reduce cellular connectivitymore »costs within a smart city. The data mules convey non-urgent data from IoT devices to edge computing hardware, where data can be processed or sent to the cloud. Consequently, this paper focuses on edge node placement in smart cities that opportunistically leverage public transit networks for reducing reliance on and thus costs of cellular connectivity. We introduce an algorithm that selects a set of edge nodes that provides maximal sensor coverage and explore another that selects a set of edge nodes that provide minimal delivery delay within a budget. The algorithms are evaluated for two public transit network data-sets: Chapel Hill, North Carolina and Louisville, Kentucky. Results show that our algorithms consistently outperform edge node placement strategies that rely on traditional centrality metrics (betweenness and in-degree centrality) by over 77% reduction in coverage budget and over 20 minutes reduction in latency.« less
  4. Systems exhibiting nonlinear dynamics, including but not limited to chaos, are ubiquitous across Earth Sciences such as Meteorology, Hydrology, Climate and Ecology, as well as Biology such as neural and cardiac processes. However, System Identification remains a challenge. In climate and earth systems models, while governing equations follow from first principles and understanding of key processes has steadily improved, the largest uncertainties are often caused by parameterizations such as cloud physics, which in turn have witnessed limited improvements over the last several decades. Climate scientists have pointed to Machine Learning enhanced parameter estimation as a possible solution, with proof-of-concept methodologicalmore »adaptations being examined on idealized systems. While climate science has been highlighted as a "Big Data" challenge owing to the volume and complexity of archived model-simulations and observations from remote and in-situ sensors, the parameter estimation process is often relatively a "small data" problem. A crucial question for data scientists in this context is the relevance of state-of-the-art data-driven approaches including those based on deep neural networks or kernel-based processes. Here we consider a chaotic system - two-level Lorenz-96 - used as a benchmark model in the climate science literature, adopt a methodology based on Gaussian Processes for parameter estimation and compare the gains in predictive understanding with a suite of Deep Learning and strawman Linear Regression methods. Our results show that adaptations of kernel-based Gaussian Processes can outperform other approaches under small data constraints along with uncertainty quantification; and needs to be considered as a viable approach in climate science and earth system modeling.« less
  5. Public transit agencies are focused on making their fixed-line bus systems more energy efficient by introducing electric (EV) and hybrid (HV) vehicles to their eets. However, because of the high upfront cost of these vehicles, most agencies are tasked with managing a mixed-fleet of internal combustion vehicles (ICEVs), EVs, and HVs. In managing mixed-fleets, agencies require accurate predictions of energy use for optimizing the assignment of vehicles to transit routes, scheduling charging, and ensuring that emission standards are met. The current state-of-the-art is to develop separate neural network models to predict energy consumption for each vehicle class. Although different vehiclemore »classes’ energy consumption depends on a varied set of covariates, we hypothesize that there are broader generalizable patterns that govern energy consumption and emissions. In this paper, we seek to extract these patterns to aid learning to address two problems faced by transit agencies. First, in the case of a transit agency which operates many ICEVs, HVs, and EVs, we use multi-task learning (MTL) to improve accuracy of forecasting energy consumption. Second, in the case where there is a significant variation in vehicles in each category, we use inductive transfer learning (ITL) to improve predictive accuracy for vehicle class models with insufficient data. As this work is to be deployed by our partner agency, we also provide an online pipeline for joining the various sensor streams for xed-line transit energy prediction. We find that our approach outperforms vehicle-specific baselines in both the MTL and ITL settings.« less