Title: Data-driven Bus Crowding Prediction Models Using Context-specific Features
Public transit is one of the first things that come to mind when someone talks about “smart cities.” As a result, many technologies, applications, and infrastructure have already been deployed to bring the promise of the smart city to public transportation. Most of these have focused on answering the question, “When will my bus arrive?”; little has been done to answer the question, “How full will my next bus be?” which also dramatically affects commuters’ quality of life. In this article, we consider the bus fullness problem. In particular, we propose two different formulations of the problem, develop multiple predictive models, and evaluate their accuracy using data from the Pittsburgh region. Our predictive models consistently outperform the baselines (by up to 8 times).
Award ID(s):
1739413
NSF-PAR ID:
10229761
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ACM/IMS Transactions on Data Science
Volume:
1
Issue:
3
ISSN:
2691-1922
Page Range / eLocation ID:
1 to 33
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Effective real-time estimation of vehicle travel time using Automatic Vehicle Locators (AVL) has added a new dimension to smart city planning. In this paper, the authors use data collected over several months from a transit agency and show how this data can potentially be used to learn patterns of travel time during specially planned events like NFL (National Football League) games and music award ceremonies. The impact of NFL games, along with other factors such as weather, traffic conditions, and distance, is discussed together with their relative importance to the prediction of travel time. Statistical learning models are used to predict travel time and subsequently assess the cascading effects of delay. Model performance is determined based on predictive accuracy according to the out-of-sample error. In addition, the models help identify the most significant variables that influence the delay in the transit system. In order to compare the actual and predicted travel time for days having special events, heat maps are generated showing the delay impacts in different time windows between two timepoint-segments in comparison to a non-game day. This work focuses on the prediction and visualization of the delay in the public transit system and the analysis of its cascading effects on the entire transportation network. According to the study results, the authors are able to explain more than 80% of the variance in the bus travel time at each segment and can make future travel predictions during planned events with an out-of-sample error of 2.0 minutes using information on the bus schedule, traffic, weather, and scheduled events. According to the variable importance analysis, traffic information is most significant in predicting the delay in the transit system.
  2. Smart city projects aim to enhance the management of city infrastructure by enabling government entities to monitor, control and maintain infrastructure efficiently through the deployment of Internet-of-things (IoT) devices. However, the financial burden associated with smart city projects is a detriment to prospective smart cities. A noteworthy factor that impacts the cost and sustainability of smart city projects is providing cellular Internet connectivity to IoT devices. In response to this problem, this paper explores the use of public transportation network nodes and mules, such as bus stops and buses, to facilitate connectivity via device-to-device communication in order to reduce cellular connectivity costs within a smart city. The data mules convey non-urgent data from IoT devices to edge computing hardware, where data can be processed or sent to the cloud. Consequently, this paper focuses on edge node placement in smart cities that opportunistically leverage public transit networks for reducing reliance on and thus costs of cellular connectivity. We introduce an algorithm that selects a set of edge nodes that provides maximal sensor coverage and explore another that selects a set of edge nodes that provide minimal delivery delay within a budget. The algorithms are evaluated for two public transit network data-sets: Chapel Hill, North Carolina and Louisville, Kentucky. Results show that our algorithms consistently outperform edge node placement strategies that rely on traditional centrality metrics (betweenness and in-degree centrality) by over 77% reduction in coverage budget and over 20 minutes reduction in latency.
  3. Effective public transit operations are one of the fundamental requirements for a modern community. Recently, a number of transit agencies have started integrating automated vehicle locators in their fleet, which provides a real-time estimate of the time of arrival. In this paper, we use the data collected over several months from one such transit system and show how this data can be potentially used to learn long term patterns of travel time. More specifically, we study the effect of weather and other factors such as traffic on the transit system delay. These models can later be used to understand the seasonal variations and to design adaptive and transient transit schedules. Towards this goal, we also propose an online architecture called DelayRadar. The novelty of DelayRadar lies in three aspects: (1) a data store that collects and integrates real-time and static data from multiple data sources, (2) a predictive statistical model that analyzes the data to make predictions on transit travel time, and (3) a decision making framework to develop an optimal transit schedule based on variable forecasts related to traffic, weather, and other impactful factors. This paper focuses on identifying the model with the best predictive accuracy to be used in DelayRadar. According to the preliminary study results, we are able to explain more than 70% of the variance in the bus travel time and we can make future travel predictions with an out-of-sample error of 4.8 minutes with information on the bus schedule, traffic, and weather. 
  4. Urban anomalies have a large impact on passengers' travel behavior and city infrastructures, which can cause uncertainty on travel time estimation. Understanding the impact of urban anomalies on travel time is of great value for various applications such as urban planning, human mobility studies and navigation systems. Most existing studies on travel time have been focused on the total riding time between two locations on an individual transportation modality. However, passengers often take different modes of transportation, e.g., taxis, subways, buses or private vehicles, and a significant portion of the travel time is spent in the uncertain waiting. In this paper, we study the fine-grained travel time patterns in multiple transportation systems under the impact of urban anomalies. Specifically, (i) we investigate implicit components, including waiting and riding time, in multiple transportation systems; (ii) we measure the impact of real-world anomalies on travel time components; (iii) we design a learning-based model for travel time component prediction with anomalies. Different from existing studies, we implement and evaluate our measurement framework on multiple data sources including four city-scale transportation systems, which are (i) a 14-thousand taxicab network, (ii) a 13-thousand bus network, (iii) a 10-thousand private vehicle network, and (iv) an automatic fare collection system for a public transit network (i.e., subway and bus) with 5 million smart cards. 
  5. Public transit is a critical component of a smart and connected community. As such, citizens expect and require accurate information about real-time arrival/departures of transportation assets. As transit agencies enable large-scale integration of real-time sensors and support back-end data-driven decision support systems, the dynamic data-driven applications systems (DDDAS) paradigm becomes a promising approach to make the system smarter by providing online model learning and multi-time scale analytics as part of the decision support system that is used in the DDDAS feedback loop. In this paper, we describe a system in use in Nashville and illustrate the analytic methods developed by our team. These methods use both historical as well as real-time streaming data for online bus arrival prediction. The historical data is used to build classifiers that enable us to create expected performance models as well as identify anomalies. These classifiers can be used to provide schedule adjustment feedback to the metro transit authority. We also show how these analytics services can be packaged into modular, distributed and resilient micro-services that can be deployed on both cloud back ends as well as edge computing resources. 