Title: Predicting Taxi and Uber Demand in Cities: Approaching the Limit of Predictability
Utilizing large-scale urban data sets to predict taxi and Uber passenger demand in cities is valuable for designing better taxi dispatch systems and improving taxi services. In this paper, we predict taxi and Uber demand using two real-world data sets. Our approach consists of two key steps. First, we use temporal-correlated entropy to measure the demand regularity and obtain the maximum predictability. Second, we implement and assess five well-known representative predictors (Markov, LZW, ARIMA, MLP and LSTM) in achieving the maximum predictability. The results show that, on average, the maximum predictability can be as high as 83%, indicating a high temporal regularity of taxi demand in cities. In areas with low maximum predictability (Π_max < 0.83), the deep learning predictor LSTM can achieve high prediction accuracy by capturing hidden long-term temporal dependencies. In areas with high maximum predictability (Π_max ≥ 0.83), the Markov predictor can infer taxi demand with 86% accuracy, 14% better than LSTM, while requiring only 0.02% of the computation time. These findings suggest that the maximum predictability can help determine which predictor to use in terms of accuracy and computational cost.
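As a rough illustration of the first step, the sketch below estimates the temporal-correlated entropy of a discretized demand series with a Lempel-Ziv-style estimator and then solves Fano's inequality for the maximum predictability Π_max. The hourly binning into demand levels, the particular estimator variant, and all names are illustrative assumptions, not the authors' implementation.

```python
import math

def lz_entropy(seq):
    """Lempel-Ziv-style estimate of the temporal-correlated entropy (bits per symbol).
    Lambda_i is the length of the shortest substring starting at i that has not
    appeared anywhere in seq[:i]; the estimate is n * log2(n) / sum(Lambda_i)."""
    n = len(seq)
    lambdas = []
    for i in range(n):
        k = 0
        while i + k < n and any(tuple(seq[j:j + k + 1]) == tuple(seq[i:i + k + 1])
                                for j in range(i)):
            k += 1
        lambdas.append(k + 1)
    return n * math.log2(n) / sum(lambdas)

def max_predictability(S, N):
    """Solve Fano's inequality S = H(p) + (1 - p) * log2(N - 1) for p = Pi_max,
    where H is the binary entropy and N is the number of distinct demand levels."""
    if N <= 1 or S <= 0:
        return 1.0
    def gap(p):
        h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
        return h + (1 - p) * math.log2(N - 1) - S
    lo, hi = 1.0 / N, 1.0 - 1e-9
    for _ in range(100):  # bisection; the left-hand side is decreasing in p on [1/N, 1)
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if gap(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

# Hypothetical hourly demand for one area, discretized into four demand levels.
demand = [0, 1, 2, 3, 3, 2, 1, 0] * 6
S = lz_entropy(demand)
print(f"entropy ~ {S:.2f} bits, Pi_max ~ {max_predictability(S, len(set(demand))):.2f}")
```

Under the paper's findings, an area whose estimated Π_max falls below 0.83 would be a candidate for the LSTM predictor, while an area above that threshold could be served by the much cheaper Markov predictor.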
Award ID(s):
1827505
NSF-PAR ID:
10185775
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Knowledge and Data Engineering
ISSN:
1041-4347
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mobile sensing and information technology have enabled us to collect a large amount of mobility data from human decision-makers, for example, GPS trajectories from taxis and Uber cars, and passenger trip data from buses and trains. Understanding and learning human decision-making strategies from such data can potentially promote individuals' well-being and improve transportation service quality. Existing works on human strategy learning, such as inverse reinforcement learning, model the decision-making process as a Markov decision process, thus assuming the Markov property. In this work, we show that the Markov property does not hold in real-world human decision-making processes. To tackle this challenge, we develop a Trajectory Generative Adversarial Imitation Learning (TrajGAIL) framework. It captures long-term decision dependencies by modeling human decision processes as variable-length Markov decision processes (VLMDPs), and designs a deep-neural-network-based framework to inversely learn the decision-making strategy from the human agent's historical dataset. We validate our framework using two real-world human-generated spatio-temporal datasets: taxi driver passenger-seeking decision data and public transit trip data. Results demonstrate significant accuracy improvements in learning human decision-making strategies compared to baselines that assume the Markov property.
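As a minimal, hypothetical illustration of why a fixed one-step Markov assumption can be too restrictive (the motivation for variable-length Markov decision processes above), the snippet below tabulates next-decision counts conditioned on histories of different lengths; when the order-1 and higher-order conditionals differ substantially, a single step of history is not enough. This is not the TrajGAIL implementation, and the decision sequence is made up.

```python
from collections import Counter, defaultdict

def next_decision_counts(trajectory, order):
    """Tabulate next-decision counts conditioned on the last `order` decisions."""
    counts = defaultdict(Counter)
    for i in range(order, len(trajectory)):
        counts[tuple(trajectory[i - order:i])][trajectory[i]] += 1
    return counts

# Hypothetical discretized decision sequence (e.g., zones a driver chooses to search next).
traj = list("ABABCABABCABDABABCABD")
print(next_decision_counts(traj, order=1)[("B",)])           # conditioned on the last decision only
print(next_decision_counts(traj, order=3)[("C", "A", "B")])  # conditioned on a longer history
```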
  2. Public transit is a vital mode of transportation in urban areas, and its efficiency is crucial for the daily commute of millions of people. To improve the reliability and predictability of transit systems, researchers have developed separate single-task learning models to predict the occupancy and delay of buses at the stop or route level. However, these models provide a narrow view of delay and occupancy at each stop and do not account for the correlation between the two. We propose a novel approach that leverages broader generalizable patterns governing delay and occupancy for improved prediction. We introduce a multitask learning toolchain that takes into account General Transit Feed Specification feeds, Automatic Passenger Counter data, and contextual temporal and spatial information. The toolchain predicts transit delay and occupancy at the stop level, improving the accuracy of predictions for these two features of a trip given sparse and noisy data. We also show that, once trained on previous routes and trips, our toolchain adapts to new transit data with fewer samples than state-of-the-art methods. Finally, we use real data from Chattanooga, Tennessee, to validate our approach. We compare our approach against state-of-the-art methods and show that treating occupancy and delay as related problems improves prediction accuracy: delay prediction improves significantly, by as much as 4% in F1 score, while producing equivalent or better results for occupancy.
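A minimal sketch of the multitask idea, assuming a shared trunk over stop-level features with one regression head for delay and one classification head for occupancy level, so that gradients from both tasks shape the shared representation. The feature count, layer sizes, and number of occupancy classes are placeholders, not the authors' toolchain.

```python
import torch
import torch.nn as nn

class DelayOccupancyNet(nn.Module):
    """Shared trunk over stop-level features (e.g., schedule, APC counts, time-of-day
    encodings) feeding a delay-regression head and an occupancy-classification head."""
    def __init__(self, n_features, n_occupancy_levels=3, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.delay_head = nn.Linear(hidden, 1)                        # predicted delay (minutes)
        self.occupancy_head = nn.Linear(hidden, n_occupancy_levels)   # occupancy-level logits

    def forward(self, x):
        h = self.trunk(x)
        return self.delay_head(h).squeeze(-1), self.occupancy_head(h)

# Joint training step: both losses backpropagate into the shared trunk.
model = DelayOccupancyNet(n_features=16)
x = torch.randn(32, 16)
delay_true, occ_true = torch.randn(32), torch.randint(0, 3, (32,))
delay_pred, occ_logits = model(x)
loss = nn.functional.mse_loss(delay_pred, delay_true) + nn.functional.cross_entropy(occ_logits, occ_true)
loss.backward()
```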
  3. Urban dispersal events are processes in which an unusually large number of people leave the same area in a short period. Early prediction of dispersal events is important for mitigating congestion and safety risks and making better dispatching decisions for taxi and ride-sharing fleets. Existing work mostly focuses on predicting taxi demand in the near future by learning patterns from historical data. However, these methods fail in abnormal situations because dispersal events with abnormally high demand are non-repetitive and violate common assumptions such as smoothness in demand change over time. Instead, in this paper we argue that dispersal events follow a complex pattern of trips and other related features in the past, which can be used to predict such events. Therefore, we formulate the dispersal event prediction problem as a survival analysis problem. We propose a two-stage framework (DILSA), in which a deep learning model combined with survival analysis predicts the probability of a dispersal event and its demand volume. We conduct extensive case studies and experiments on the NYC Yellow taxi dataset from 2014 to 2016. Results show that DILSA can predict events in the next 5 hours with an F1-score of 0.7 and an average time error of 18 minutes, orders of magnitude better than state-of-the-art deep learning approaches for taxi demand prediction.
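As a hedged sketch of pairing a neural network with survival analysis, the model below outputs a discrete-time hazard for each of the next few hours and converts it into the probability that a dispersal event has occurred by each horizon. It is a generic formulation under assumed feature and horizon sizes, not DILSA itself, and it omits the demand-volume prediction stage.

```python
import torch
import torch.nn as nn

class DiscreteTimeHazardNet(nn.Module):
    """From recent trip/demand features, output a hazard (event probability) for each
    of the next `horizon` hours; survival to hour t is the product of (1 - hazard)."""
    def __init__(self, n_features, horizon=5, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon),
        )

    def forward(self, x):
        hazards = torch.sigmoid(self.net(x))           # P(event in hour t | no event before t)
        survival = torch.cumprod(1.0 - hazards, dim=-1)
        return hazards, 1.0 - survival                 # P(event occurs by hour t)

model = DiscreteTimeHazardNet(n_features=24, horizon=5)
features = torch.randn(8, 24)                          # hypothetical recent-trip features for 8 areas
hazards, event_prob_by_hour = model(features)
print(event_prob_by_hour.shape)                        # (8, 5): event probability within 1..5 hours
```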
  4. Solar energy is now the cheapest form of electricity in history. Unfortunately, significantly increasing the electric grid's fraction of solar energy remains challenging due to its variability, which makes balancing electricity supply and demand more difficult. While thermal generators have a finite ramp rate (the maximum rate at which they can change their energy generation), solar energy's ramp rate is essentially infinite. Thus, accurate near-term solar forecasting, or nowcasting, is important for providing advance warning to adjust thermal generator output in response to variations in solar generation and ensure a balanced supply and demand. To address this problem, this paper develops a general model for solar nowcasting from abundant and readily available multispectral satellite data using self-supervised learning. Specifically, we develop deep auto-regressive models using convolutional neural networks (CNNs) and long short-term memory networks (LSTMs) that are trained globally across multiple locations to predict raw future observations of the spatio-temporal spectral data collected by the recently launched GOES-R series of satellites. Our model estimates a location's near-term future solar irradiance based on satellite observations, which we feed to a regression model trained on smaller site-specific solar data to provide near-term solar photovoltaic (PV) forecasts that account for site-specific characteristics. We evaluate our approach for different coverage areas and forecast horizons across 25 solar sites and show that it yields errors close to those of a model using ground-truth observations.
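A minimal sketch of an auto-regressive CNN+LSTM nowcaster, assuming small multispectral patches as input and the next raw frame as the prediction target; the channel count, patch size, and layer widths are placeholders and do not reproduce the authors' GOES-R architecture or self-supervised training setup.

```python
import torch
import torch.nn as nn

class SatNowcaster(nn.Module):
    """Encode each multispectral frame with a small CNN, model the frame sequence
    with an LSTM, and predict the next frame from the last hidden state."""
    def __init__(self, channels=4, patch=16, embed=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),    # downsample to patch/2
            nn.Flatten(),
            nn.Linear(32 * (patch // 2) ** 2, embed),
        )
        self.lstm = nn.LSTM(embed, embed, batch_first=True)
        self.decoder = nn.Linear(embed, channels * patch * patch)
        self.channels, self.patch = channels, patch

    def forward(self, frames):                  # frames: (batch, time, channels, patch, patch)
        b, t, c, h, w = frames.shape
        z = self.encoder(frames.reshape(b * t, c, h, w)).reshape(b, t, -1)
        out, _ = self.lstm(z)
        nxt = self.decoder(out[:, -1])          # one-step-ahead prediction
        return nxt.reshape(b, c, h, w)

model = SatNowcaster()
history = torch.randn(2, 6, 4, 16, 16)          # 2 locations, 6 past multispectral patches
next_frame = model(history)                     # predicted next patch, to feed a site-specific PV regressor
```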
  5. Raza, Mudassar (Ed.)
    Cosegmentation is an emerging computer vision technique that segments an object from the background by processing multiple images at the same time. Traditional plant phenotyping analysis uses thresholding segmentation methods, which result in high segmentation accuracy. Although machine learning and deep learning algorithms have been proposed for plant segmentation, their predictions rely on the specific features present in the training set. The need for a multi-featured dataset and analytics for cosegmentation is therefore critical to better understand and predict plants' responses to the environment. High-throughput phenotyping produces an abundance of data that can be leveraged to improve segmentation accuracy and plant phenotyping. This paper introduces four datasets covering two plant species, buckwheat and sunflower, each split into control and drought conditions. Each dataset has three modalities (fluorescence, infrared, and visible) with 7 to 14 temporal images collected in a high-throughput facility at the University of Nebraska-Lincoln. The four datasets (referred to in this paper as the CosegPP data repository) are evaluated using three cosegmentation algorithms: Markov random field-based, clustering-based, and deep learning-based cosegmentation, as well as one segmentation approach commonly used in plant phenotyping. The integration of CosegPP with advanced cosegmentation methods will provide a new benchmark for comparing segmentation accuracy and identifying areas of improvement for cosegmentation methodology.