skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Individualized passenger travel pattern multi-clustering based on graph regularized tensor latent dirichlet allocation
Abstract Individual passenger travel patterns have significant value in understanding passenger’s behavior, such as learning the hidden clusters of locations, time, and passengers. The learned clusters further enable commercially beneficial actions such as customized services, promotions, data-driven urban-use planning, peak hour discovery, and so on. However, the individualized passenger modeling is very challenging for the following reasons: 1) The individual passenger travel data are multi-dimensional spatiotemporal big data, including at least the origin, destination, and time dimensions; 2) Moreover, individualized passenger travel patterns usually depend on the external environment, such as the distances and functions of locations, which are ignored in most current works. This work proposes a multi-clustering model to learn the latent clusters along the multiple dimensions of Origin, Destination, Time, and eventually, Passenger (ODT-P). We develop a graph-regularized tensor Latent Dirichlet Allocation (LDA) model by first extending the traditional LDA model into a tensor version and then applies to individual travel data. Then, the external information of stations is formulated as semantic graphs and incorporated as the Laplacian regularizations; Furthermore, to improve the model scalability when dealing with massive data, an online stochastic learning method based on tensorized variational Expectation-Maximization algorithm is developed. Finally, a case study based on passengers in the Hong Kong metro system is conducted and demonstrates that a better clustering performance is achieved compared to state-of-the-arts with the improvement in point-wise mutual information index and algorithm convergence speed by a factor of two.  more » « less
Award ID(s):
1830363
PAR ID:
10391850
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Data Mining and Knowledge Discovery
Volume:
36
Issue:
4
ISSN:
1384-5810
Page Range / eLocation ID:
1247 to 1278
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Traditionally vehicles act only as servers in transporting passengers and goods. With increasing sensor equipment in vehicles, including automated vehicles, there is a need to test algorithms that consider the dual role of vehicles as both servers and sensors. The paper formulates a sequential route selection problem as a shortest path problem with on-time arrival reliability under a multi-armed bandit setting, a type of reinforcement learning model. A decision-maker has to make a finite set of decisions sequentially on departure time and path between a fixed origin-destination pair such that on-time reliability is maximized while travel time is minimized. The upper confidence bound algorithm is extended to handle this problem. Several tests are conducted. First, simulated data successfully verifies the method, then a real-data scenario is constructed of a hotel shuttle service from midtown Manhattan in New York City providing hourly access to John F. Kennedy International Airport. Results suggest that route selection with multi-armed bandit learning algorithms can be effective but neglecting passenger scheduling constraints can have negative effects on on-time arrival reliability by as much as 4.8% and combined reliability and travel time by 66.1%. 
    more » « less
  2. The problem of traffic prediction is paramount in a plethora of applications, ranging from individual trip planning to urban planning. Existing work mainly focuses on traffic prediction on road networks. Yet, public transportation contributes a significant portion to overall human mobility and passenger volume. For example, the Washington, DC metro has on average 600,000 passengers on a weekday. In this work, we address the problem of modeling, classifying and predicting such passenger volume in public transportation systems. We study the case of the Washington, DC metro exploring fare card data, and specifically passenger in- and outflow at stations. To reduce dimensionality of the data, we apply principal component analysis to extract latent features for different stations and for different calendar days. Our unsupervised clustering results demonstrate that these latent features are highly discriminative. They allow us to derive different station types (residential, commercial, and mixed) and to effectively classify and identify the passenger flow of “unknown” stations. Finally, we also show that this classification can be applied to predict the passenger volume at stations. By learning latent features of stations for some time, we are able to predict the flow for the following hours. Extensive experimentation using a baseline neural network and two naïve periodicity approaches shows the considerable accuracy improvement when using the latent feature based approach. 
    more » « less
  3. Urban anomalies have a large impact on passengers' travel behavior and city infrastructures, which can cause uncertainty on travel time estimation. Understanding the impact of urban anomalies on travel time is of great value for various applications such as urban planning, human mobility studies and navigation systems. Most existing studies on travel time have been focused on the total riding time between two locations on an individual transportation modality. However, passengers often take different modes of transportation, e.g., taxis, subways, buses or private vehicles, and a significant portion of the travel time is spent in the uncertain waiting. In this paper, we study the fine-grained travel time patterns in multiple transportation systems under the impact of urban anomalies. Specifically, (i) we investigate implicit components, including waiting and riding time, in multiple transportation systems; (ii) we measure the impact of real-world anomalies on travel time components; (iii) we design a learning-based model for travel time component prediction with anomalies. Different from existing studies, we implement and evaluate our measurement framework on multiple data sources including four city-scale transportation systems, which are (i) a 14-thousand taxicab network, (ii) a 13-thousand bus network, (iii) a 10-thousand private vehicle network, and (iv) an automatic fare collection system for a public transit network (i.e., subway and bus) with 5 million smart cards. 
    more » « less
  4. Mobility-as-a-service systems are becoming increasingly important in the context of smart cities, with challenges arising for public agencies to obtain data from private operators. Only limited mobility data are typically provided to city agencies, which are not enough to support their decision-making. This study proposed an entropy-maximizing gravity model to predict origin–destination patterns of both passenger and mobility fleets with only partial operator data. An iterative balancing algorithm was proposed to efficiently reach the entropy maximization state. With different trip length distributions data available, two calibration applications were discussed and validated with a small-scale numerical example. Tests were also conducted to verify the applicability of the proposed model and algorithm to large-scale real data from Chicago transportation network companies. Both shared-ride and single-ride trips were forecast based on the calibrated model, and the prediction of single-ride has a higher level of accuracy. The proposed solution and calibration algorithms are also efficient to handle large scenarios. Additional analyses were conducted for north and south sub-areas of Chicago and revealed different travel patterns in these two sub-areas. 
    more » « less
  5. null (Ed.)
    Although urban transit systems (UTS) often have fixed vehicle capacity and relatively constant departure headways, they may need to accommodate dramatically fluctuating passenger demands over space and time, resulting in either excessive passenger waiting or vehicle capacity and energy waste. Therefore, on the one hand, optimal operations of UTS rely on accurate modeling of passenger queuing dynamics, which is particularly complex on a multistop transit corridor. On the other hand, capacities of transit vehicles can be made variable and adaptive to time-variant passenger demand so as to minimize energy waste, especially with the emergence of modular vehicle technologies. This paper investigates operations of a multistop transit corridor in which vehicles may have different capacities across dispatches. We specify skewed time coordinates to simplify the problem structure while incorporating traffic congestion. Then, we propose a mixed integer linear programming model that determines the optimal dynamic headways and vehicle capacities over the study time horizon to minimize the overall system cost for the transit corridor. In particular, the proposed model considers a realistic multistop first-in, first-out (MSFIFO) rule that gives the same boarding priority to passengers arriving at a station in the same time interval yet with different destinations. A customized dynamic programming (DP) algorithm is proposed to solve this model efficiently. To circumvent the rapid increase of the state space of a typical DP algorithm, we analyze the theoretical properties of the investigated problem and identify upper and lower bounds to a feasible solution. The bounds largely reduce the state space during the DP iterations and greatly improve the efficiency of the proposed DP algorithm. The state dimensions are also reduced with the MSFIFO rule such that all queues with different destinations at the same origin can be tracked with a single boarding position state variable at each stage. A hypothetical example and a real-world case study show that the proposed DP algorithm greatly outperforms a state-of-the-art commercial solver (Gurobi) in both solution quality and time. 
    more » « less