NSF PAR Search | NSF Public Access Repository

Learning 3D Perception from Others' Predictions

Yoo, Jinsu; Feng, Zhenyang; Pan, Tai-Yu; Sun, Yihong; Phoo, Cheng Perng; Chen, Xiangyu; Campbell, Mark; Weinberger, Kilian Q; Hariharan, Bharath; Chao, Wei-Lun (April 2025, International Conference on Learning Representations)

Accurate 3D object detection in real-world environments requires a large amount of high-quality annotated data. Acquiring such data is tedious and expensive, and the effort often must be repeated when a new sensor is adopted or when the detector is deployed in a new environment. We investigate a new scenario to construct 3D object detectors: learning from the predictions of a nearby unit that is equipped with an accurate detector. For example, when a self-driving car enters a new area, it may learn from other traffic participants whose detectors have been optimized for that area. This setting is label-efficient, sensor-agnostic, and communication-efficient: nearby units only need to share their predictions with the ego agent (e.g., car). Naively using the received predictions as ground truth to train the detector for the ego car, however, leads to inferior performance. We systematically study the problem and identify viewpoint mismatches and mislocalization (due to synchronization and GPS errors) as the main causes, which unavoidably result in false positives, false negatives, and inaccurate pseudo labels. We propose a distance-based curriculum, first learning from closer units with similar viewpoints and subsequently improving the quality of other units' predictions via self-training. We further demonstrate that an effective pseudo label refinement module can be trained with a small amount of annotated data, largely reducing the data quantity necessary to train an object detector. We validate our approach on the recently released real-world collaborative driving dataset, using reference cars' predictions as pseudo labels for the ego car. Extensive experiments covering several scenarios (e.g., different sensors, detectors, and domains) demonstrate the effectiveness of our approach toward label-efficient learning of 3D perception from other units' predictions.
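The distance-based curriculum described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Frame` record, the `ref_distance_m` field, the threshold values, and the `train` and `relabel` callables are all hypothetical names introduced here, and the sketch assumes the thresholds are given in ascending order.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Frame:
    """One ego frame whose pseudo labels came from a reference car (hypothetical record)."""
    frame_id: str
    ref_distance_m: float  # assumed field: ego-to-reference-car distance in meters


def curriculum_stages(frames: Sequence[Frame],
                      thresholds_m: Sequence[float]) -> List[List[Frame]]:
    """Group frames into stages of increasing ego-to-reference distance.

    A frame goes into the first stage whose threshold it does not exceed.
    Frames farther than the last threshold are dropped.
    """
    stages: List[List[Frame]] = [[] for _ in thresholds_m]
    for frame in sorted(frames, key=lambda f: f.ref_distance_m):
        for k, limit in enumerate(thresholds_m):
            if frame.ref_distance_m <= limit:
                stages[k].append(frame)
                break
    return stages


def run_curriculum(stages: Sequence[List[Frame]],
                   train: Callable[[List[Frame]], None],
                   relabel: Callable[[Frame], Frame]) -> None:
    """Train on the closest stage first, then self-train on farther stages."""
    for k, stage in enumerate(stages):
        # Stage 0 uses the received predictions as-is; later stages are first
        # re-labeled by the detector trained so far (the self-training step).
        batch = stage if k == 0 else [relabel(frame) for frame in stage]
        train(batch)
```

In this sketch, `train` would wrap one round of detector training and `relabel` would replace a frame's received boxes with the current detector's own predictions; both are left as callables because the abstract does not specify them.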
Free, publicly-accessible full text available April 28, 2026
Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene

https://doi.org/10.1109/CVPR52734.2025.01123

Pan, Tai-Yu; Jeon, Sooyoung; Fan, Mengdi; Yoo, Jinsu; Feng, Zhenyang; Campbell, Mark; Weinberger, Kilian Q; Hariharan, Bharath; Chao, Wei-Lun (June 2025, IEEE)

Self-driving cars relying solely on ego-centric perception face limitations in sensing, often failing to detect occluded, faraway objects. Collaborative autonomous driving (CAV) seems like a promising direction, but collecting data for development is non-trivial: it requires placing multiple sensor-equipped agents in a real-world driving scene simultaneously. As such, existing datasets are limited in locations and agents. We introduce a novel surrogate: generating realistic perception from different viewpoints in a driving scene, conditioned on a real-world sample, namely the ego-car’s sensory data. This surrogate could turn any ego-car dataset into a collaborative driving one to scale up the development of CAV. We present the very first solution, using a combination of simulated collaborative data and real ego-car data. Our method, Transfer Your Perspective (TYP), learns a conditioned diffusion model whose output samples are not only realistic but also consistent in both semantics and layouts with the given ego-car data. Empirical results demonstrate TYP’s effectiveness in a CAV setting. In particular, TYP enables us to (pre-)train collaborative perception algorithms like early and late fusion with little or no real-world collaborative data, greatly facilitating downstream CAV applications.
Free, publicly-accessible full text available June 10, 2026
Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

Pan, Tai-Yu; Ma, Chenyang; Chen, Tianle; Phoo, Cheng Perng; Luo, Katie Z; You, Yurong; Campbell, Mark; Weinberger, Kilian Q; Hariharan, Bharath; Chao, Wei-Lun (May 2024, International Conference on Learning Representations)

Full Text Available
Learning with Free Object Segments for Long-Tailed Instance Segmentation

Zhang, Cheng; Pan, Tai-Yu; Chen, Tianle; Zhong, Jike; Fu, Wenjin; Chao, Wei-Lun (October 2022, European Conference on Computer Vision)

One fundamental challenge in building an instance segmentation model for a large number of classes in complex scenes is the lack of training examples, especially for rare objects. In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose a simple and scalable framework FreeSeg for extracting and leveraging these “free” object foreground segments to facilitate model training in long-tailed instance segmentation. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high-quality object segments can then be used to augment the existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FreeSeg yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories. Our code is publicly available at https://github.com/czhang0528/FreeSeg.
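The copy-and-paste augmentation mentioned in this abstract can be sketched as follows. This is only an illustration of mask-based pasting under simplifying assumptions (grayscale images stored as nested lists, a binary mask, no blending or rescaling); it is not taken from the FreeSeg code, and `paste_segment` and its parameters are hypothetical names.

```python
from typing import List

Image = List[List[int]]   # grayscale image as rows of pixel values
Mask = List[List[bool]]   # True where the segment covers the pixel


def paste_segment(image: Image, segment: Image, mask: Mask,
                  top: int, left: int) -> Image:
    """Return a copy of `image` with `segment` pasted at row `top`, column `left`.

    Only pixels where `mask` is True are copied, and pixels that would fall
    outside the image are skipped. The input image is left unchanged.
    """
    out = [row[:] for row in image]
    height = len(out)
    width = len(out[0]) if out else 0
    for r, mask_row in enumerate(mask):
        for c, keep in enumerate(mask_row):
            y, x = top + r, left + c
            if keep and 0 <= y < height and 0 <= x < width:
                out[y][x] = segment[r][c]
    return out
```

A full augmentation pipeline would also add the pasted mask and its class label to the training annotations of the target image; that bookkeeping is omitted here.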
Full Text Available
Learning with Free Object Segments for Long-Tailed Instance Segmentation

https://doi.org/10.1007/978-3-031-20080-9_38

Zhang, Cheng; Pan, Tai-Yu; Chen, Tianle; Zhong, Jike; Fu, Wenjin; Chao, Wei-Lun (October 2022, European Conference on Computer Vision (ECCV))

Full Text Available
One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

https://doi.org/10.1109/CVPR52688.2022.01504

Song, Chan Hee; Kil, Jihyung; Pan, Tai-Yu; Sadler, Brian M.; Chao, Wei-Lun; Su, Yu (June 2022, Conference on Computer Vision and Pattern Recognition)

Full Text Available
Learning with Free Object Segments for Long-Tailed Instance Segmentation

Zhang, Cheng; Pan, Tai-Yu; Chen, Tianle; Zhong, Jike; Fu, Wenjin; Chao, Wei-Lun (January 2022, L3D-IVU: Workshop on Learning with Limited Labeled Data for Image and Video Understanding, in conjunction with the IEEE / CVF Computer Vision and Pattern Recognition Conference)

In this paper, we explore the possibility of increasing the number of training examples without laborious data collection and annotation for long-tailed instance segmentation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose a simple and scalable framework FREESEG for extracting and leveraging these “free” object segments to facilitate model training. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high-quality object segments can then be used to augment the existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FREESEG yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories.
Full Text Available
One Step at a Time: Long-Horizon Vision-and-Language Navigation With Milestones

Song, Chan Hee; Kil, Jihyung; Pan, Tai-Yu; Sadler, Brian M.; Chao, Wei-Lun; Su, Yu (January 2022, IEEE / CVF Computer Vision and Pattern Recognition Conference)

We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in the middle of the long instructions and eventually fail the task. To address this challenge, we propose a model-agnostic milestone-based task tracker (M-TRACK) to guide the agent and monitor its progress. Specifically, we propose a milestone builder that tags the instructions with navigation and interaction milestones which the agent needs to complete step by step, and a milestone checker that systematically checks the agent’s progress in its current milestone and determines when to proceed to the next. On the challenging ALFRED dataset, our M-TRACK leads to a notable 33% and 52% relative improvement in unseen success rate over two competitive base models.
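The builder-and-checker idea in this abstract can be pictured with a small sketch. It is not the M-TRACK implementation: the `Milestone` and `MilestoneTracker` classes are hypothetical, and the completion rules (a navigation milestone is done once its target is visible, an interaction milestone once the agent has interacted with its target) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Milestone:
    kind: str    # "navigation" or "interaction"
    target: str  # object the milestone refers to


class MilestoneTracker:
    """Walks through milestones in order and advances when one is completed."""

    def __init__(self, milestones: Iterable[Milestone]) -> None:
        self.milestones: List[Milestone] = list(milestones)
        self.index = 0

    @property
    def current(self) -> Optional[Milestone]:
        if self.index < len(self.milestones):
            return self.milestones[self.index]
        return None

    @property
    def finished(self) -> bool:
        return self.index >= len(self.milestones)

    def check(self, visible_objects: Set[str],
              interacted_object: Optional[str] = None) -> bool:
        """Advance to the next milestone if the current one is completed.

        Returns True when the tracker advanced. The completion rules below
        are assumed for this sketch.
        """
        milestone = self.current
        if milestone is None:
            return False
        if milestone.kind == "navigation":
            done = milestone.target in visible_objects
        else:
            done = interacted_object == milestone.target
        if done:
            self.index += 1
        return done
```

An agent loop would call `check` after each action and condition its next action on `tracker.current`, so that it works on one milestone at a time instead of the full instruction.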
Full Text Available
On Model Calibration for Long-Tailed Object Detection and Instance Segmentation

Pan, Tai-Yu; Zhang, Cheng; Li, Yandong; Hu, Hexiang; Xuan, Dong; Changpinyo, Soravit; Gong, Boqing; Chao, Wei-Lun (December 2021, Conference on Neural Information Processing Systems)

Full Text Available
