Title: Predicting Taxi and Uber Demand in Cities: Approaching the Limit of Predictability
Utilizing large-scale urban data sets to predict taxi and Uber passenger demand in cities is valuable for designing better taxi dispatch systems and improving taxi services. In this paper, we predict taxi and Uber demand using two real-world data sets. Our approach consists of two key steps. First, we use temporal-correlated entropy to measure the regularity of demand and obtain its maximum predictability. Second, we implement five well-known representative predictors (Markov, LZW, ARIMA, MLP and LSTM) and assess how closely they approach the maximum predictability. The results show that, on average, the maximum predictability can be as high as 83%, indicating a high temporal regularity of taxi demand in cities. In areas with low maximum predictability (Π_max < 0.83), the deep learning predictor LSTM can achieve high prediction accuracy by capturing hidden long-term temporal dependency. In areas with high maximum predictability (Π_max ⩾ 0.83), the Markov predictor can infer taxi demand with 86% accuracy, 14% better than LSTM, while requiring only 0.02% of LSTM's computation time. These findings suggest that the maximum predictability can help determine which predictor to use in terms of accuracy and computational cost.
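The abstract does not spell out how temporal-correlated entropy is computed. A common way to turn a symbolic time series into a maximum predictability is a two-step method. First, estimate the series' entropy rate S with a Lempel-Ziv estimator. Then invert Fano's inequality, S = H(Π_max) + (1 - Π_max) log2(N - 1), where H is the binary entropy and N is the number of distinct states. The sketch below assumes that approach and assumes demand has already been quantized into N discrete levels. It also pairs the estimate with a simple online order-1 Markov predictor so the two numbers can be compared. All function names are hypothetical, and this is an illustration under those assumptions, not the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def _occurs(history, pattern):
    """Return True if `pattern` appears as a contiguous run inside `history`."""
    m = len(pattern)
    return any(history[j:j + m] == pattern for j in range(len(history) - m + 1))


def lz_entropy(seq):
    """Lempel-Ziv estimate of the entropy rate in bits per symbol.

    Lambda_i is the length of the shortest substring starting at position i
    that does not occur anywhere in seq[:i]; S = n log2(n) / sum(Lambda_i).
    """
    n = len(seq)
    total = 0
    for i in range(n):
        k = 1
        # Grow the substring until it is new (or runs past the end of seq).
        while i + k <= n and _occurs(seq[:i], seq[i:i + k]):
            k += 1
        total += k  # equals n - i + 1 if the substring never becomes new
    return n * math.log2(n) / total


def _binary_entropy(p):
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_predictability(entropy, n_states, tol=1e-9):
    """Solve S = H(P) + (1 - P) log2(N - 1) for P in [1/N, 1] by bisection."""
    if entropy >= math.log2(n_states):
        return 1.0 / n_states  # no better than guessing uniformly
    if entropy <= 0.0:
        return 1.0  # fully regular series
    lo, hi = 1.0 / n_states, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # The right-hand side decreases monotonically in P on [1/N, 1].
        if _binary_entropy(mid) + (1 - mid) * math.log2(n_states - 1) > entropy:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def markov_accuracy(seq):
    """Online order-1 Markov predictor: guess the most frequent successor."""
    successors = defaultdict(Counter)
    hits = 0
    for t in range(1, len(seq)):
        prev = seq[t - 1]
        if successors[prev]:
            guess = successors[prev].most_common(1)[0][0]
        else:
            guess = prev  # unseen state: assume demand level stays the same
        hits += guess == seq[t]
        successors[prev][seq[t]] += 1  # learn only from past transitions
    return hits / (len(seq) - 1)
```

Consider a perfectly periodic hypothetical series, levels = [0, 1, 2] * 50. For it, lz_entropy(levels) is small, and max_predictability(lz_entropy(levels), 3) and markov_accuracy(levels) both come out above 0.9. That is the high-Π_max regime, where the abstract reports the Markov predictor doing well. The brute-force substring search is only meant for short illustrative series; long series would need a suffix-based data structure.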
Award ID(s): 1827505
PAR ID: 10185775
Journal Name: IEEE Transactions on Knowledge and Data Engineering
ISSN: 1041-4347
Page Range / eLocation ID: 1 to 1
Sponsoring Org: National Science Foundation
More like this
  1. Mobile sensing and information technology have enabled us to collect large amounts of mobility data from human decision-makers, for example, GPS trajectories from taxis and Uber cars, and trip data from bus and train passengers. Understanding and learning human decision-making strategies from such data can potentially promote individuals' well-being and improve transportation service quality. Existing methods for human strategy learning, such as inverse reinforcement learning, all model the decision-making process as a Markov decision process and thus assume the Markov property. In this work, we show that the Markov property does not hold in real-world human decision-making processes. To tackle this challenge, we develop a Trajectory Generative Adversarial Imitation Learning (TrajGAIL) framework. It captures long-term decision dependency by modeling human decision processes as variable-length Markov decision processes (VLMDPs), and uses a deep-neural-network-based framework to inversely learn the decision-making strategy from the human agent's historical data. We validate our framework using two real-world, human-generated spatial-temporal datasets: taxi drivers' passenger-seeking decision data and public transit trip data. Results demonstrate a significant improvement in the accuracy of learning human decision-making strategies compared with baselines that assume the Markov property.
  2. Urban dispersal events are processes in which an unusually large number of people leave the same area in a short period. Early prediction of dispersal events is important for mitigating congestion and safety risks and for making better dispatching decisions for taxi and ride-sharing fleets. Existing work mostly focuses on predicting taxi demand in the near future by learning patterns from historical data. However, these methods fail in abnormal situations, because dispersal events with abnormally high demand are non-repetitive and violate common assumptions such as smooth changes in demand over time. Instead, in this paper we argue that dispersal events follow a complex pattern of past trips and other related features, which can be used to predict such events. We therefore formulate dispersal event prediction as a survival analysis problem. We propose a two-stage framework (DILSA), in which a deep learning model combined with survival analysis predicts the probability of a dispersal event and its demand volume. We conduct extensive case studies and experiments on the NYC Yellow taxi dataset from 2014-2016. Results show that DILSA can predict events in the next 5 hours with an F1-score of 0.7 and an average time error of 18 minutes. It is orders of magnitude better than the state-of-the-art deep learning approaches for taxi demand prediction.
  3. This study illustrates the considerable improvement in accuracy achievable for long-lead (18-month) forecasts of the Ocean Niño Index (ONI) by using a long short-term memory (LSTM) machine learning algorithm. The research assesses the predictive potential of eight predictors from tropical and extratropical regions, constructed from sea surface temperature (SST), outgoing longwave radiation, sea surface height, and zonal and meridional wind anomalies. The LSTM model outperforms linear regression forecasts for both the tropical and the extratropical predictor sets. Among all the predictors, the western North Pacific (WNP) index demonstrates the highest prediction skill in ONI forecasts, followed by the North Tropical Atlantic (NTA) index and then the sea surface height index. While the other predictors help the LSTM model forecast either the phase variation or the amplitude variation of the observed ONI, the extratropical WNP predictor enables the LSTM model to forecast both. This superiority can be attributed to the involvement of SST anomalies in the WNP region in both tropical and extratropical El Niño–Southern Oscillation (ENSO) dynamics, which allows the model to use predictive potential from both components of ENSO dynamics. The study also concludes that extratropical ENSO dynamics provide a robust source of predictability for long-lead ENSO forecasts, which can be effectively harnessed using the LSTM model.
  4. Public transit is a vital mode of transportation in urban areas, and its efficiency is crucial for the daily commute of millions of people. To improve the reliability and predictability of transit systems, researchers have developed separate single-task learning models to predict the occupancy and delay of buses at the stop or route level. However, these models provide a narrow view of delay and occupancy at each stop and do not account for the correlation between the two. We propose a novel approach that leverages broader, generalizable patterns governing delay and occupancy for improved prediction. We introduce a multitask learning toolchain that takes into account General Transit Feed Specification feeds, Automatic Passenger Counter data, and contextual temporal and spatial information. The toolchain predicts transit delay and occupancy at the stop level, improving the accuracy of predictions for these two features of a trip given sparse and noisy data. We also show that, once trained on previous routes and trips, our toolchain adapts to new transit data with fewer samples than state-of-the-art methods require. Finally, we validate our approach using actual data from Chattanooga, Tennessee. Compared with state-of-the-art methods, treating occupancy and delay as related problems improves prediction accuracy: our approach improves delay prediction by as much as 4% in F1 score while producing equivalent or better results for occupancy.
  5. Raza, Mudassar (Ed.)
    Cosegmentation is a newly emerging computer vision technique that segments an object from the background by processing multiple images at the same time. Traditional plant phenotyping analysis uses thresholding segmentation methods, which result in high segmentation accuracy. Although machine learning and deep learning algorithms have been proposed for plant segmentation, their predictions rely on the specific features being present in the training set. A multi-featured dataset and analytics for cosegmentation therefore become critical to better understand and predict plants' responses to the environment. High-throughput phenotyping produces an abundance of data that can be leveraged to improve segmentation accuracy and plant phenotyping. This paper introduces four datasets covering two plant species, buckwheat and sunflower, each split into control and drought conditions. Each dataset has three modalities (fluorescence, infrared, and visible) with 7 to 14 temporal images collected in a high-throughput facility at the University of Nebraska-Lincoln. The four datasets, collected in this paper under the CosegPP data repository, are evaluated using three cosegmentation algorithms (Markov random field-based, clustering-based, and deep learning-based cosegmentation) and one segmentation approach commonly used in plant phenotyping. The integration of CosegPP with advanced cosegmentation methods will provide the latest benchmark for comparing segmentation accuracy and finding areas of improvement for cosegmentation methodology.
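The DILSA record in item 2 frames dispersal-event prediction as a survival analysis problem. The minimal sketch below is a generic discrete-time survival formulation, not DILSA's actual model. It assumes a trained network emits a per-hour hazard rate λ_h, the probability that the event starts in hour h given that it has not started earlier. From those hazards, the survival curve is S(h) = Π_{j≤h}(1 - λ_j). The probability of an event within a horizon H is 1 - S(H), and the probability that the event first occurs in hour h is λ_h · S(h - 1). The function names and the hazard values in the usage note are hypothetical.

```python
def survival_curve(hazards):
    """S(h) = prod_{j<=h} (1 - lambda_j): probability of no event through hour h."""
    curve, s = [], 1.0
    for lam in hazards:
        s *= 1.0 - lam
        curve.append(s)
    return curve


def event_probability(hazards, horizon):
    """Probability that the event starts within the first `horizon` hours."""
    curve = survival_curve(hazards[:horizon])
    return 1.0 - curve[-1] if curve else 0.0


def event_time_pmf(hazards):
    """P(T = h) = lambda_h * S(h - 1): the event first happens in hour h."""
    pmf, s = [], 1.0
    for lam in hazards:
        pmf.append(s * lam)
        s *= 1.0 - lam
    return pmf


def expected_event_hour(hazards):
    """Expected (1-based) event hour, conditional on an event within the horizon."""
    pmf = event_time_pmf(hazards)
    total = sum(pmf)
    if total == 0:
        return None  # no event predicted anywhere in the horizon
    return sum((h + 1) * p for h, p in enumerate(pmf)) / total
```

As a usage example, suppose the hazards over a 5-hour horizon are [0.1, 0.2, 0.3, 0.4, 0.5] (made-up numbers). Then event_probability(hazards, 5) is 1 - 0.9 · 0.8 · 0.7 · 0.6 · 0.5 = 0.8488, and expected_event_hour gives an event-time estimate that could be compared against observed start times. Working with hazards rather than one probability per hour keeps the per-hour event probabilities consistent: they can never sum to more than 1.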