An ideal traffic simulator replicates the realistic, long-term point-to-point trips that a self-driving system experiences during deployment. Prior models and benchmarks focus on closed-loop motion simulation for the agents initially present in a scene. This is problematic for long-term simulation: agents enter and exit the scene as the ego vehicle drives into new regions. We propose InfGen, a unified next-token prediction model that interleaves closed-loop motion simulation with scene generation, automatically switching between the two modes to enable stable long-horizon rollouts. InfGen matches the state of the art in short-term (9 s) traffic simulation and significantly outperforms all other methods in long-term (30 s) simulation.
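The interleaving described above can be pictured as a single token stream in which a token's type decides whether the model is simulating motion or editing the scene. The sketch below is a toy illustration of that control flow, not the paper's model: the token names and the `predict_next_token` stub (which spawns an agent every 10 steps) are invented stand-ins for a learned transformer.

```python
# Toy sketch of an interleaved rollout loop in the spirit of InfGen.
# MOTION tokens advance existing agents (closed-loop motion mode);
# SPAWN/DESPAWN tokens inject or remove agents (scene-generation mode).
MOTION, SPAWN, DESPAWN = "motion", "spawn", "despawn"

def predict_next_token(history):
    # Placeholder policy: spawn a new agent every 10 steps, otherwise
    # emit a motion token. A real model would be a learned transformer.
    step = len(history)
    return (SPAWN, f"agent_{step}") if step % 10 == 0 else (MOTION, None)

def rollout(num_steps):
    history, agents = [], set()
    for _ in range(num_steps):
        kind, payload = predict_next_token(history)
        if kind == SPAWN:        # scene-generation mode: inject an agent
            agents.add(payload)
        elif kind == DESPAWN:    # scene-generation mode: remove an agent
            agents.discard(payload)
        else:                    # closed-loop motion mode: advance agents
            pass
        history.append((kind, payload))
    return history, agents

history, agents = rollout(30)
```

Because mode switching happens token by token, a single autoregressive rollout can keep running indefinitely as the agent population changes, which is what makes long-horizon simulation stable.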
Language Conditioned Traffic Generation
Simulation forms the backbone of modern self-driving development. Simulators help develop, test, and improve driving systems without putting humans, vehicles, or their environment at risk. However, simulators face a major challenge: they rely on realistic, scalable, yet interesting content. While recent advances in rendering and scene reconstruction have made great strides in creating static scene assets, modeling their layout, dynamics, and behaviors remains challenging. In this work, we turn to language as a source of supervision for dynamic traffic scene generation. Our model, LCTGen, combines a large language model with a transformer-based decoder architecture that selects likely map locations from a dataset of maps and produces an initial traffic distribution as well as the dynamics of each vehicle. LCTGen outperforms prior work in both unconditional and conditional traffic scene generation in terms of realism and fidelity.
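The pipeline described above (language prompt → structured scene specification → map retrieval → agent placement) can be sketched end to end. The following is a minimal illustration of that staging only; every function here, and the fields of the spec, are hypothetical stand-ins for LCTGen's learned components.

```python
# Illustrative three-stage pipeline in the spirit of LCTGen:
# 1) an LLM stand-in parses the prompt into a structured spec,
# 2) retrieval picks a compatible map from a map bank,
# 3) a decoder stand-in places the requested agents.

def parse_prompt(prompt):
    # LLM stage stand-in: extract a crude structured specification.
    spec = {"num_vehicles": 2, "intersection": "intersection" in prompt}
    for word in prompt.split():
        if word.isdigit():
            spec["num_vehicles"] = int(word)
    return spec

def retrieve_map(spec, map_bank):
    # Return the first map whose features match the requested layout.
    for m in map_bank:
        if m["has_intersection"] == spec["intersection"]:
            return m
    return map_bank[0]

def place_agents(spec, road_map):
    # Decoder stand-in: one (x, y, heading) tuple per requested vehicle.
    return [(i * 5.0, 0.0, 0.0) for i in range(spec["num_vehicles"])]

map_bank = [{"name": "straight", "has_intersection": False},
            {"name": "4way", "has_intersection": True}]
spec = parse_prompt("3 cars approaching an intersection")
scene = place_agents(spec, retrieve_map(spec, map_bank))
```

Separating map retrieval from agent placement mirrors the division of labor in the abstract: the language model fixes *what* the scene should contain, and the decoder fixes *where* and *how* agents move on the chosen map.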
- Award ID(s):
- 1845485
- PAR ID:
- 10522916
- Publisher / Repository:
- CoRL
- Date Published:
- Format(s):
- Medium: X
- Location:
- Atlanta, GA
- Sponsoring Org:
- National Science Foundation
More Like this
-
GrokWalks: A Portable Virtual Reality Platform to Facilitate Studying Driver-Pedestrian Interactions
Driving simulators are vital for human-centered automotive research, offering safe, replicable environments for studying human interaction with transportation technology interfaces and behaviors. However, traditional driving simulators are not well suited to studying traffic interactions with various degrees of freedom in a way that captures the nuances of implicit and explicit interactions, e.g., gestures, body language, and movement. We developed a multi-participant virtual reality (VR) driving simulation platform to study these interactions. This portable system supports cross-cultural experiments by modeling diverse scenarios, generating analyzable data, and capturing human behaviors in traffic. Our interactive demo allows participants to experience roles as drivers or pedestrians in a shared virtual environment, with the goal of providing a hands-on experience with this open-source VR simulator and demonstrating its affordability and scalability for traffic interaction studies to researchers and practitioners.
-
Large-scale driving datasets such as Waymo Open Dataset and nuScenes substantially accelerate autonomous driving research, especially for perception tasks such as 3D detection and trajectory forecasting. Since the driving logs in these datasets contain HD maps and detailed object annotations that accurately reflect the real-world complexity of traffic behaviors, we can harvest a massive number of complex traffic scenarios and recreate their digital twins in simulation. Compared to the hand-crafted scenarios often used in existing simulators, data-driven scenarios collected from the real world can facilitate many research opportunities in machine learning and autonomous driving. In this work, we present ScenarioNet, an open-source platform for large-scale traffic scenario modeling and simulation. ScenarioNet defines a unified scenario description format and collects a large-scale repository of real-world traffic scenarios from the heterogeneous data in various driving datasets, including the Waymo, nuScenes, Lyft L5, Argoverse, and nuPlan datasets. These scenarios can be further replayed and interacted with in multiple views, from a bird's-eye-view layout to realistic 3D rendering in the MetaDrive simulator. This provides a benchmark for evaluating the safety of autonomous driving stacks in simulation before their real-world deployment. We further demonstrate the strengths of ScenarioNet on large-scale scenario generation, imitation learning, and reinforcement learning in both single-agent and multi-agent settings. Code, demo videos, and website are available at https://metadriverse.github.io/scenarionet.
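A unified scenario description format, as described above, amounts to normalizing each dataset's records into one common layout. The sketch below is a guess at what such a converter might look like; the field names (`length`, `tracks`, `map_features`) are illustrative, not ScenarioNet's actual schema, which is documented on the project page.

```python
# Hypothetical converter from a dataset-specific record into a common
# scenario layout, in the spirit of ScenarioNet's unified format.

def to_unified(source, raw):
    """Normalize a dataset-specific record into one common layout."""
    return {
        "source": source,                      # e.g. "waymo", "nuscenes"
        "length": raw["num_frames"],           # rollout length in frames
        "tracks": {tid: {"type": t["type"], "states": t["xyz"]}
                   for tid, t in raw["objects"].items()},
        "map_features": raw.get("map", {}),    # lanes, crosswalks, ...
    }

raw_waymo = {"num_frames": 91,
             "objects": {"veh_1": {"type": "VEHICLE",
                                   "xyz": [(0, 0, 0), (1, 0, 0)]}},
             "map": {"lane_0": {"polyline": [(0, 0), (10, 0)]}}}
scenario = to_unified("waymo", raw_waymo)
```

Once every source dataset maps into the same layout, downstream tools (replay, rendering, benchmarking) only need to be written once against the unified format.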
-
In complex traffic environments, understanding how a focal vehicle interacts (e.g., maneuvers) with various traffic elements (e.g., other vehicles, pedestrians, and road infrastructure), i.e., vehicle-to-X interactions (VXIs), is essential for developing advanced driving support and intelligent vehicles. To support VXI scene understanding, reasoning, and decision making (e.g., suggesting a cautious move in response to a pedestrian crossing the street), this work draws on recent advances in multi-modality large language models (MLLMs). We develop VXI-SUR, a novel VXI Scene Understanding and Reasoning system based on vision-language modeling. VXI-SUR takes in a visual VXI scene and generates structured textual responses that interpret the scene and suggest an appropriate decision (e.g., braking, slowing down). Within VXI-SUR we have designed a VXI memory mechanism with both scene and knowledge augmentation, and enabled scene-knowledge co-learning to capture complex correspondences across scenes and decisions. Extensive evaluations on an open-source dataset with ∼17k VXI scenes corroborate VXI-SUR's awareness, description preciseness, semantic matching, and quality in understanding and reasoning about complex VXI scenes.
-
Numerous solutions have been proposed for Traffic Signal Control (TSC) tasks, aiming to provide efficient transportation and alleviate traffic congestion. Recently, promising results have been attained by Reinforcement Learning (RL) methods through trial and error in simulators, bringing confidence in solving cities' congestion problems. However, performance gaps still exist when simulator-trained policies are deployed to the real world, mainly due to differences in system dynamics between training simulators and real-world environments. In this work, we leverage the knowledge of Large Language Models (LLMs) to understand and profile the system dynamics via a prompt-based grounded action transformation that bridges the performance gap. Specifically, this paper exploits the pre-trained LLM's inference ability to understand how traffic dynamics change with weather conditions, traffic states, and road types. Aware of these changes, the policy's actions are grounded in realistic dynamics, helping the agent learn a more realistic policy. We conduct experiments on four different scenarios to show the effectiveness of the proposed PromptGAT in mitigating the performance gap of reinforcement learning from simulation to reality (sim-to-real).
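The grounded action transformation idea above can be reduced to a simple shape: profile how real-world conditions shift the dynamics, then adjust the simulator-optimal action accordingly before executing it. The sketch below is only that shape; the condition table standing in for the LLM's profiling step and the scaling factors are invented for illustration, not PromptGAT's actual outputs.

```python
# Toy sketch of grounded action transformation: an action chosen under
# simulator dynamics is adjusted ("grounded") by a profile of real-world
# conditions before execution.

def profile_dynamics(weather, road_type):
    # Stand-in for the LLM inference step: map conditions to a factor
    # describing how real dynamics deviate from the simulator's.
    table = {("clear", "asphalt"): 1.0,
             ("rain", "asphalt"): 0.7,
             ("snow", "asphalt"): 0.5}
    return table.get((weather, road_type), 1.0)

def ground_action(sim_action, weather, road_type):
    # Scale the simulator-optimal green-phase duration by the profiled
    # factor so the executed action suits the real conditions.
    factor = profile_dynamics(weather, road_type)
    return {"phase": sim_action["phase"],
            "duration": sim_action["duration"] / factor}

grounded = ground_action({"phase": "NS_green", "duration": 21.0},
                         "snow", "asphalt")
```

In this toy version, worse conditions (a smaller factor) lengthen the green phase, mimicking how a policy must compensate when real-world traffic flows more slowly than the simulator predicts.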