

Search for: All records

Creators/Authors contains: "He, Suining"


  1. Understanding and learning the actor-to-X interactions (AXIs), such as those between the focal vehicle (the actor) and other traffic participants (e.g., other vehicles and pedestrians) as well as traffic environments (e.g., the city or road map), is essential for developing decision-making models and simulations of autonomous driving. Existing practices in imitation learning (IL) for autonomous driving simulation, despite advances in model learnability, have not accounted for fusing and differentiating the heterogeneous AXIs in complex road environments. Furthermore, how to explain the hierarchical structures within the complex AXIs remains largely under-explored.

    To meet these challenges, we propose HGIL, an interaction-aware and hierarchically-explainable Heterogeneous Graph-based Imitation Learning approach for autonomous driving simulation. We have designed a novel heterogeneous interaction graph (HIG) to provide local and global representation as well as awareness of the AXIs. Integrating the HIG as the state embeddings, we have designed a hierarchically-explainable generative adversarial imitation learning approach, with local sub-graph and global cross-graph attention, to capture the interaction behaviors and driving decision-making processes. Our data-driven simulation and explanation studies based on the Argoverse v2 dataset (with a total of 40,000 driving scenes) have corroborated the accuracy (e.g., lower displacement errors compared to the state-of-the-art (SOTA) approaches) and explainability of HGIL in learning and capturing the complex AXIs.
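    As a rough illustration only (not the authors' implementation), the snippet below sketches how a heterogeneous interaction graph could yield a state embedding via local per-type attention plus a global cross-graph attention query; the PyTorch modules, feature sizes, and node types are all assumptions.

```python
# Hypothetical sketch of a heterogeneous-graph state embedding with
# local (per-type) and global (cross-type) attention, loosely following
# the HIG description above. Dimensions and layer choices are assumptions.
import torch
import torch.nn as nn

class HIGStateEmbedding(nn.Module):
    def __init__(self, dim=64, node_types=("vehicle", "pedestrian", "map")):
        super().__init__()
        self.encoders = nn.ModuleDict({t: nn.Linear(8, dim) for t in node_types})
        self.local_attn = nn.ModuleDict(
            {t: nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
             for t in node_types})
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.actor_query = nn.Parameter(torch.randn(1, 1, dim))

    def forward(self, nodes_by_type):
        # nodes_by_type: {type: (batch, n_nodes, 8) raw node features}
        summaries = []
        for t, x in nodes_by_type.items():
            h = self.encoders[t](x)                        # per-type projection
            h, _ = self.local_attn[t](h, h, h)             # local sub-graph attention
            summaries.append(h.mean(dim=1, keepdim=True))  # pool each sub-graph
        ctx = torch.cat(summaries, dim=1)                  # (batch, n_types, dim)
        q = self.actor_query.expand(ctx.size(0), -1, -1)
        state, attn = self.cross_attn(q, ctx, ctx)         # global cross-graph attention
        return state.squeeze(1), attn                      # attn weights aid explanation

emb = HIGStateEmbedding()
batch = {t: torch.randn(2, 5, 8) for t in ("vehicle", "pedestrian", "map")}
state, attn = emb(batch)
print(state.shape, attn.shape)  # torch.Size([2, 64]) torch.Size([2, 1, 3])
```

    The returned cross-graph attention weights hint at the kind of hierarchical explanation the abstract describes: they indicate how much each sub-graph (vehicles, pedestrians, map) contributed to the actor's state.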

    Free, publicly-accessible full text available December 12, 2025
  2. Electric (e-)scooters have emerged as a popular, ubiquitous first/last-mile micromobility transportation option within and across many cities worldwide. With increasing situation-awareness and on-board computational capability, such intelligent micromobility has become a critical means of understanding the rider's interactions with other traffic constituents (called Rider-to-X Interactions, RXIs), such as pedestrians, cars, and other micromobility vehicles, as well as road environments, including curbs, road infrastructure, and traffic signs. How to interpret these complex, dynamic, and context-dependent RXIs, particularly for rider-centric understanding across different data modalities (such as visual, behavioral, and textual data), is essential for enabling a safer and more comfortable micromobility riding experience and for the greater good of urban transportation networks.

    Under a naturalistic riding setting (i.e., without any unnatural constraint on the rider's decision-making and maneuvering), we have designed, implemented, and evaluated a pilot Cross-modality E-scooter Naturalistic Riding Understanding System, namely CENRUS, from a human-centered AI perspective. We have conducted an extensive study with CENRUS in sensing, analyzing, and understanding the behavioral, visual, and textual annotation data of RXIs during naturalistic riding. We have also designed a novel, efficient, and usable disentanglement mechanism to conceptualize and understand the e-scooter naturalistic riding processes, and conducted extensive human-centered AI model studies. We have performed multiple downstream tasks enabled by the core model within CENRUS to derive human-centered AI understandings and insights of complex RXIs, showcasing downstream tasks such as efficient information retrieval and scene understanding. CENRUS can serve as a foundational system for safe and easy-to-use micromobility rider assistance as well as accountable use of micromobility vehicles.
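    As one concrete (and purely hypothetical) reading of the information-retrieval downstream task, the sketch below embeds textual annotations and sensed riding segments into a shared space and ranks segments by cosine similarity; the encoders, dimensions, and pooling are assumptions, not the CENRUS design.

```python
# Hypothetical sketch of cross-modality retrieval in a shared embedding
# space, as one of the downstream tasks described above. All encoder
# architectures and input dimensions are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim, dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit norm for cosine similarity

text_enc = ModalityEncoder(in_dim=300)   # e.g., pooled annotation embeddings
sensor_enc = ModalityEncoder(in_dim=64)  # e.g., pooled IMU segment features

queries = text_enc(torch.randn(4, 300))    # 4 textual annotations
gallery = sensor_enc(torch.randn(10, 64))  # 10 riding segments
scores = queries @ gallery.T               # cosine similarities
print(scores.argmax(dim=1))                # best-matching segment per annotation
```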

    Free, publicly-accessible full text available August 22, 2025
  3. Accurate prediction of citywide crowd activity levels (CALs), i.e., the numbers of participants in citywide crowd activities under different venue categories at given times and locations, is essential for city management, personal service applications, and entrepreneurs' commercial strategic planning. Existing studies have not thoroughly taken into account the complex spatial and temporal interactions among different categories of CALs and their extreme occurrences, lowering the adaptivity and accuracy of their models. To address the above concerns, we have proposed IE-CALP, a novel spatio-temporal Interactive attention-based and Extreme-aware model for Crowd Activity Level Prediction. The tasks of IE-CALP consist of (a) forecasting the spatial distributions of various CALs at different city regions (spatial CALs), and (b) predicting the number of participants per category of the CALs (categorical CALs). To realize the above, we have designed a novel spatial CAL-POI interaction-attentive learning component in IE-CALP to model the spatial interactions across different CAL categories, as well as those among the spatial urban regions and CALs. In addition, IE-CALP incorporates the multi-level trends (e.g., daily and weekly levels of temporal granularity) of CALs through a multi-level temporal feature learning component. Furthermore, to enhance the model's adaptivity to extreme CALs (e.g., during extreme urban events or weather conditions), we take into account the extreme value theory and model the impacts of historical CALs upon the occurrences of extreme CALs. Extensive experiments upon a total of 738,715 CAL records and 246,660 POIs in New York City (NYC), Los Angeles (LA), and Tokyo have further validated the accuracy, adaptivity, and effectiveness of IE-CALP's interaction-attentive and extreme-aware CAL predictions.
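    To make the extreme-value-theory ingredient concrete, here is a minimal peaks-over-threshold sketch with SciPy: exceedances over a high quantile of (synthetic) CALs are fitted with a generalized Pareto distribution, from which a rare-event level can be extrapolated. The threshold, data, and quantiles are assumptions, not the IE-CALP procedure.

```python
# Hypothetical peaks-over-threshold sketch illustrating the extreme value
# theory idea mentioned above; synthetic data and thresholds are assumptions.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
cals = rng.lognormal(mean=3.0, sigma=0.6, size=5000)  # synthetic activity levels
threshold = np.quantile(cals, 0.95)                   # extreme-CAL threshold
exceedances = cals[cals > threshold] - threshold

# Fit a generalized Pareto distribution to the exceedances (loc fixed at 0).
shape, loc, scale = genpareto.fit(exceedances, floc=0)

# Extrapolate a 1-in-1000 observation level: P(X > z) = 0.001, given that
# P(X > threshold) = 0.05, so F(z - threshold) = 1 - 0.001 / 0.05 = 0.98.
p_exceed = 0.05
tail_q = 1 - 0.001 / p_exceed
z = threshold + genpareto.ppf(tail_q, shape, loc=0, scale=scale)
print(f"threshold={threshold:.1f}, estimated extreme level={z:.1f}")
```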

    Free, publicly-accessible full text available July 29, 2025
  4. Accurate citywide crowd activity prediction (CAP) can enable proactive crowd mobility management and timely responses to urban events, which has become increasingly important for a myriad of smart city planning and management purposes. However, complex correlations across the crowd activities, spatial and temporal urban environment features and their interactive dependencies, and relevant external factors (e.g., weather conditions) make it highly challenging to predict crowd activities accurately in terms of different venue categories (for instance, venues related to dining, services, and residence) and varying degrees (e.g., daytime and nighttime).

    To address the above concerns, we propose STICAP, a citywide spatio-temporal interactive crowd activity prediction approach. In particular, STICAP takes in location-based social network check-in data (e.g., from Foursquare/Gowalla) as the model inputs and forecasts the crowd activity within each time step for each venue category. Furthermore, we have integrated multiple levels of temporal discretization to interactively capture the relations with historical data. Then, three parallel Residual Spatial Attention Networks (RSAN) in the Spatial Attention Component exploit the hourly, daily, and weekly spatial features of crowd activities, which are further fused and processed by the Temporal Attention Component for interactive CAP. Along with other external factors such as weather conditions and holidays, STICAP adaptively and accurately forecasts the final crowd activities per venue category, enabling potential activity recommendation and other smart city applications. Extensive experimental studies based on three different real-world crowd activity datasets have demonstrated that our proposed STICAP outperforms the baseline and state-of-the-art algorithms in CAP accuracy, with an average error reduction of 35.02%.
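    The multi-level temporal discretization can be pictured with a small (hypothetical) slicing helper: for a target hour, it gathers recent hourly steps plus the same hour on previous days and on previous weeks, the three views that the parallel RSAN branches would consume. The window sizes here are arbitrary assumptions.

```python
# Hypothetical sketch of multi-level temporal discretization for one venue
# category's hourly check-in counts. Window sizes are assumptions.
import numpy as np

def multi_level_slices(series, t, n_hours=3, n_days=3, n_weeks=2):
    """series: 1-D hourly counts; t: index of the target hour."""
    hourly = series[t - n_hours:t]                                 # recent hours
    daily = series[[t - 24 * d for d in range(n_days, 0, -1)]]     # same hour, prior days
    weekly = series[[t - 168 * w for w in range(n_weeks, 0, -1)]]  # same hour, prior weeks
    return hourly, daily, weekly

series = np.random.poisson(lam=20, size=24 * 7 * 4).astype(float)  # 4 weeks of data
hourly, daily, weekly = multi_level_slices(series, t=24 * 7 * 3 + 12)
print(hourly.shape, daily.shape, weekly.shape)  # (3,) (3,) (2,)
```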

    Free, publicly-accessible full text available March 31, 2025
  5. Driver maneuver interaction learning (DMIL) refers to the classification task with the goal of identifying different driver-vehicle maneuver interactions (e.g., left/right turns). Existing studies have largely focused on the centralized collection of sensor data from the drivers' smartphones (e.g., inertial measurement units, or IMUs, such as the accelerometer and gyroscope). Such a centralized mechanism might be precluded by data regulatory constraints. Furthermore, how to enable an adaptive and accurate DMIL framework remains challenging due to (i) the complexity of heterogeneous driver maneuver patterns, and (ii) the impacts of anomalous driver maneuvers caused by, for instance, aggressive driving styles and behaviors.

    To overcome the above challenges, we propose AF-DMIL, an Anomaly-aware Federated Driver Maneuver Interaction Learning system. We focus on real-world IMU sensor datasets (e.g., collected by smartphones) for our pilot case study. In particular, we have designed three heterogeneous representations for AF-DMIL regarding spectral, time series, and statistical features derived from the IMU sensor readings. We have designed a novel heterogeneous representation attention network (HetRANet) based on spectral channel attention, temporal sequence attention, and statistical feature learning mechanisms, jointly capturing and identifying the complex patterns within driver maneuver behaviors. Furthermore, we have designed a densely-connected convolutional neural network in HetRANet to enable complex feature extraction and enhance the computational efficiency of HetRANet. In addition, we have designed within AF-DMIL a novel anomaly-aware federated learning approach for decentralized DMIL in response to anomalous maneuver data. To ease extraction of the maneuver patterns and evaluation of their mutual differences, we have designed an embedding projection network that projects the high-dimensional driver maneuver features into a low-dimensional space, and further derives the exemplars that represent the driver maneuver patterns for mutual comparison. Then, AF-DMIL leverages the mutual differences of the exemplars to identify those that exhibit anomalous patterns and deviate from the others, and mitigates their impacts upon the federated DMIL. We have conducted extensive driver data analytics and experimental studies on three real-world datasets (one harvested on our own) to evaluate the prototype of AF-DMIL, demonstrating AF-DMIL's accuracy and effectiveness compared to the state-of-the-art DMIL baselines (on average, more than 13% improvement in DMIL accuracy), as well as fewer communication rounds (on average 29.20% fewer than existing distributed learning mechanisms).
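    As a loose sketch of the anomaly-aware federated idea (not the actual AF-DMIL algorithm), the code below down-weights clients whose low-dimensional exemplars deviate most from a robust center before averaging model parameters; the distance measure, softmax weighting, and exemplar statistics are assumptions.

```python
# Hypothetical anomaly-aware federated averaging: clients whose exemplars
# deviate most from the others are down-weighted before the parameter
# average. Weighting scheme and exemplar statistics are assumptions.
import torch

def anomaly_aware_average(client_params, client_exemplars, temp=1.0):
    """client_params: list of state_dicts; client_exemplars: (n, d) tensor."""
    center = client_exemplars.median(dim=0).values   # robust exemplar center
    dist = (client_exemplars - center).norm(dim=1)   # deviation per client
    weights = torch.softmax(-dist / temp, dim=0)     # anomalous -> small weight
    avg = {}
    for key in client_params[0]:
        stacked = torch.stack([p[key] for p in client_params])
        w = weights.view(-1, *([1] * (stacked.dim() - 1)))
        avg[key] = (w * stacked).sum(dim=0)          # weighted parameter average
    return avg, weights

models = [torch.nn.Linear(4, 2) for _ in range(5)]
exemplars = torch.randn(5, 8)
exemplars[4] += 5.0  # client 4 behaves anomalously
avg, w = anomaly_aware_average([m.state_dict() for m in models], exemplars)
print(w)  # client 4 receives the smallest weight
```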

  6. Understanding and learning the actor-to-X interactions (AXIs), such as those between the focal vehicles (actor) and other traffic participants (e.g., other vehicles, pedestrians) as well as traffic environments (e.g., city/road map), is essential for the development of a decision-making model and simulation of autonomous driving (AD). Existing practices on imitation learning (IL) for AD simulation, despite the advances in the model learnability, have not accounted for fusing and differentiating the heterogeneous AXIs in complex road environments. Furthermore, how to further explain the hierarchical structures within the complex AXIs remains largely under-explored. To overcome these challenges, we propose HGIL, an interaction-aware and hierarchically-explainable Heterogeneous Graph-based Imitation Learning approach for AD simulation. We have designed a novel heterogeneous interaction graph (HIG) to provide local and global representation as well as awareness of the AXIs. Integrating the HIG as the state embeddings, we have designed a hierarchically-explainable generative adversarial imitation learning approach, with local sub-graph and global cross-graph attention, to capture the interaction behaviors and driving decision-making processes. Our data-driven simulation and explanation studies have corroborated the accuracy and explainability of HGIL in learning and capturing the complex AXIs.
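    Complementing the graph-embedding sketch under item 1, here is a minimal, hypothetical GAIL-style discriminator step that such state embeddings could feed: the discriminator separates expert from policy state-action pairs, and its output defines an imitation reward. The shapes, reward form, and optimizer are assumptions, not the HGIL training loop.

```python
# Hypothetical sketch of one generative adversarial imitation learning
# (GAIL-style) discriminator update. State dim 64 and action dim 2 are
# assumptions chosen to match the embedding sketch under item 1.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(nn.Linear(64 + 2, 64), nn.Tanh(), nn.Linear(64, 1))
opt = torch.optim.Adam(disc.parameters(), lr=1e-3)

def disc_step(expert_sa, policy_sa):
    """One discriminator update on (state, action) batches of shape (n, 66)."""
    logits_e, logits_p = disc(expert_sa), disc(policy_sa)
    loss = (F.binary_cross_entropy_with_logits(logits_e, torch.ones_like(logits_e))
            + F.binary_cross_entropy_with_logits(logits_p, torch.zeros_like(logits_p)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Imitation reward for the policy update: -log(1 - D(s, a)).
    return -F.logsigmoid(-disc(policy_sa)).detach()

reward = disc_step(torch.randn(32, 66), torch.randn(32, 66))
print(reward.shape)  # torch.Size([32, 1])
```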
  7. Learning the human-mobility interaction (HMI) in interactive scenes (e.g., how a vehicle turns at an intersection in response to traffic lights and other oncoming vehicles) can enhance the safety, efficiency, and resilience of smart mobility systems (e.g., autonomous vehicles) and many other ubiquitous computing applications. Towards ubiquitous and understandable HMI learning, this paper considers both spoken language (e.g., human textual annotations) and unspoken language (e.g., visual and sensor-based behavioral mobility information related to the HMI scenes) in terms of information modalities from real-world HMI scenarios. We aim to extract the important but possibly implicit HMI concepts (as the named entities) from the textual annotations (provided by human annotators) through a novel human language and sensor data co-learning design.

    To this end, we propose CG-HMI, a novel Cross-modality Graph fusion approach for extracting important Human-Mobility Interaction concepts from co-learning of textual annotations as well as the visual and behavioral sensor data. In order to fuse both unspoken and spoken languages, we have designed a unified representation called the human-mobility interaction graph (HMIG) for each modality related to the HMI scenes, i.e., textual annotations, visual video frames, and behavioral sensor time-series (e.g., from the on-board or smartphone inertial measurement units). The nodes of the HMIG in these modalities correspond to the textual words (tokenized for ease of processing) related to HMI concepts, the detected traffic participant/environment categories, and the vehicle maneuver behavior types determined from the behavioral sensor time-series. To extract the inter- and intra-modality semantic correspondences and interactions in the HMIG, we have designed a novel graph interaction fusion approach with differentiable pooling-based graph attention. The resulting graph embeddings are then processed to identify and retrieve the HMI concepts within the annotations, which can benefit the downstream human-computer interaction and ubiquitous computing applications. We have developed and implemented CG-HMI into a system prototype, and performed extensive studies upon three real-world HMI datasets (two on car driving and one on e-scooter riding). We have corroborated the excellent performance (on average 13.11% higher accuracy than the other baselines in terms of precision, recall, and F1 measure) and effectiveness of CG-HMI in recognizing and extracting the important HMI concepts through cross-modality learning. Our CG-HMI studies also provide real-world implications (e.g., road safety and driving behaviors) about the interactions between the drivers and other traffic participants.
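    The differentiable pooling-based graph attention can be illustrated with a minimal DiffPool-style coarsening over one modality's HMIG: a learned soft assignment pools node features and adjacency into a few cluster embeddings that could then be fused across modalities. The cluster count and feature sizes are assumptions, not the CG-HMI configuration.

```python
# Hypothetical DiffPool-style sketch: a learned soft assignment coarsens a
# graph's node features and adjacency into cluster-level representations.
# Cluster count and feature dimensions are assumptions.
import torch
import torch.nn as nn

class DiffPool(nn.Module):
    def __init__(self, dim=16, n_clusters=4):
        super().__init__()
        self.assign = nn.Linear(dim, n_clusters)  # soft cluster assignment

    def forward(self, x, adj):
        # x: (n_nodes, dim) node features; adj: (n_nodes, n_nodes) adjacency
        s = torch.softmax(self.assign(x), dim=-1)  # (n_nodes, n_clusters)
        x_pooled = s.T @ x                         # cluster embeddings
        adj_pooled = s.T @ adj @ s                 # cluster-level adjacency
        return x_pooled, adj_pooled

pool = DiffPool()
x, adj = torch.randn(9, 16), (torch.rand(9, 9) > 0.6).float()
xp, ap = pool(x, adj)
print(xp.shape, ap.shape)  # torch.Size([4, 16]) torch.Size([4, 4])
```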
