This content will become publicly available on June 10, 2026

Title: Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene
Self-driving cars that rely solely on ego-centric perception are limited in what they can sense and often fail to detect occluded or faraway objects. Collaborative autonomous driving (CAV) is a promising direction, but collecting data for its development is non-trivial: it requires placing multiple sensor-equipped agents in a real-world driving scene simultaneously. As a result, existing datasets are limited in locations and agents. We introduce a novel surrogate: generating realistic perception from different viewpoints in a driving scene, conditioned on a real-world sample, namely the ego-car’s sensory data. This surrogate could turn any ego-car dataset into a collaborative driving one and thereby scale up the development of CAV. We present the very first solution, using a combination of simulated collaborative data and real ego-car data. Our method, Transfer Your Perspective (TYP), learns a conditioned diffusion model whose output samples are not only realistic but also consistent in both semantics and layouts with the given ego-car data. Empirical results demonstrate TYP’s effectiveness in a CAV setting. In particular, TYP enables us to (pre-)train collaborative perception algorithms like early and late fusion with little or no real-world collaborative data, greatly facilitating downstream CAV applications.
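For readers unfamiliar with the two fusion strategies named in the abstract, the following is a minimal sketch of the general idea, not TYP or any implementation from the paper. It assumes 2D points, detections reduced to centre coordinates, and a known relative pose between the agents; all function names and the `merge_radius` parameter are hypothetical.

```python
import math

def to_ego_frame(points, pose):
    """Rigidly transform 2D points from another agent's frame into the ego frame.

    pose = (tx, ty, yaw): the other agent's position and heading expressed
    in the ego frame (assumed known here; in practice it comes from localization).
    """
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def early_fusion(ego_points, other_points, other_pose):
    """Early fusion: merge raw sensor points in the ego frame before detection."""
    return list(ego_points) + to_ego_frame(other_points, other_pose)

def late_fusion(ego_dets, other_dets, other_pose, merge_radius=1.0):
    """Late fusion: each agent detects on its own data, then detections are merged.

    Detections are centre points only. A received detection is dropped when it
    lies within merge_radius of a detection that has already been kept.
    """
    fused = list(ego_dets)
    for det in to_ego_frame(other_dets, other_pose):
        if all(math.dist(det, kept) > merge_radius for kept in fused):
            fused.append(det)
    return fused

# Hypothetical scene: the other agent is 10 m ahead of the ego car, facing it.
pose = (10.0, 0.0, math.pi)
merged_points = early_fusion([(1.0, 0.0)], [(2.0, 0.0)], pose)
merged_dets = late_fusion([(8.0, 0.1)], [(2.0, 0.0), (5.0, 0.0)], pose)
```

Early fusion shares raw data and lets a single detector see the combined scene, while late fusion shares only detector outputs, which costs far less bandwidth but cannot recover objects that neither agent detected alone.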
Award ID(s):
2107077
PAR ID:
10639178
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
12027 to 12036
Format(s):
Medium: X
Location:
2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Accurate 3D object detection in real-world environments requires a huge amount of annotated data with high quality. Acquiring such data is tedious and expensive, and often needs repeated effort when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share the predictions with the ego agent (e.g., car). Naively using the received predictions as ground-truths to train the detector for the ego car, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a handful of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments including several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions. 
  2. Trajectory prediction forecasts nearby agents’ moves based on their historical trajectories. Accurate trajectory prediction (or prediction for short) is crucial for autonomous vehicles (AVs). Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack approach that induces prediction errors via attacks against the perception module of a victim AV. Although it has been shown that physically realizable attacks against LiDAR-based perception are possible by placing a few objects at strategic locations, it is still an open challenge to find an object location from the vast search space in order to launch effective attacks against prediction under varying victim AV velocities. Through analysis, we observe that a prediction model is prone to an attack focusing on a single point in the scene. Consequently, we propose a novel two-stage attack framework to realize the single-point attack. The first stage, prediction-side attack, efficiently identifies, guided by the distribution of detection results under object-based attacks against perception, the state perturbations for the prediction model that are effective and velocity-insensitive. In the second stage, location matching, we match the feasible object locations with the found state perturbations. Our evaluation using a public autonomous driving dataset shows that our attack causes a collision rate of up to 63% and various hazardous responses of the victim AV. The effectiveness of our attack is also demonstrated on a real testbed car. To the best of our knowledge, this study is the first security analysis spanning from LiDAR-based perception to prediction in autonomous driving, leading to a realistic attack on prediction. To counteract the proposed attack, potential defenses are discussed.
  3. An ideal traffic simulator replicates the realistic long-term point-to-point trip that a self-driving system experiences during deployment. Prior models and benchmarks focus on closed-loop motion simulation for initial agents in a scene. This is problematic for long-term simulation. Agents enter and exit the scene as the ego vehicle enters new regions. We propose InfGen, a unified next-token prediction model that performs interleaved closed-loop motion simulation and scene generation. InfGen automatically switches between closed-loop motion simulation and scene generation mode. It enables stable long-term rollout simulation. InfGen performs at the state-of-the-art in short-term (9s) traffic simulation, and significantly outperforms all other methods in long-term (30s) simulation. 
  4. Connected Autonomous Vehicles (CAVs) are expected to enable reliable, efficient, and intelligent transportation systems. Most motion planning algorithms for multi-agent systems implicitly assume that all vehicles/agents will execute the expected plan with a small error and evaluate their safety constraints based on this assumption. This assumption, however, is hard to maintain for CAVs since they may have to change their plan (e.g., to yield to another vehicle) or may be forced to stop (e.g., a CAV may break down). While it is desired that a CAV never gets involved in an accident, it may be hit by other vehicles, and sometimes preventing the accident is impossible (e.g., getting hit from behind while waiting at a red light). Responsibility-Sensitive Safety (RSS) is a set of safety rules that defines a CAV’s objective in terms of blame rather than safety. Thus, instead of developing a CAV algorithm that will avoid any accident, it ensures that the ego vehicle will not be blamed for any accident it is a part of. The original RSS rules, however, are hard to evaluate for merge, intersection, and unstructured road scenarios, and they do not prevent deadlock situations among vehicles. In this paper, we propose a new formulation for RSS rules that can be applied to any driving scenario. We integrate the proposed RSS rules with the CAV’s motion planning algorithm to enable cooperative driving of CAVs. We use Control Barrier Functions to enforce safety constraints and compute the energy-optimal trajectory for the ego CAV. Finally, to ensure liveness, our approach detects and resolves deadlocks in a decentralized manner. We have conducted different experiments to verify that the ego CAV does not cause an accident regardless of when other CAVs slow down or stop. We also showcase our deadlock detection and resolution mechanism using our simulator. Finally, we compare the average velocity and fuel consumption of vehicles when they drive autonomously with the case in which they are both autonomous and connected.
  5. We introduce a novel vision-and-language navigation (VLN) task of learning to provide real-time guidance to a blind follower situated in complex dynamic navigation scenarios. Towards exploring real-time information needs and fundamental challenges in our novel modeling task, we first collect a multi-modal real-world benchmark with in-situ Orientation and Mobility (O&M) instructional guidance. Subsequently, we leverage the real-world study to inform the design of a larger-scale simulation benchmark, thus enabling comprehensive analysis of limitations in current VLN models. Motivated by how sighted O&M guides seamlessly and safely support the awareness of individuals with visual impairments when collaborating on navigation tasks, we present ASSISTER, an imitation-learned agent that can embody such effective guidance. The proposed assistive VLN agent is conditioned on navigational goals and commands for generating instructional sentences that are coherent with the surrounding visual scene, while also carefully accounting for the immediate assistive navigation task. Altogether, our introduced evaluation and training framework takes a step towards scalable development of the next generation of seamless, human-like assistive agents. 