skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Recent advances in machine learning have shown that Reinforcement Learning from Human Feedback (RLHF) can improve machine learning models and align them with human preferences. Although very successful for Large Language Models (LLMs), these advancements have not had a comparable impact in research for autonomous vehicles—where alignment with human expectations can be imperative. In this paper, we propose to adapt similar RL-based methods to unsupervised object discovery, i.e. learning to detect objects from LiDAR points without any training labels. Instead of labels, we use simple heuristics to mimic human feedback. More explicitly, we combine multiple heuristics into a simple reward function that positively correlates its score with bounding box accuracy, i.e., boxes containing objects are scored higher than those without. We start from the detector’s own predictions to explore the space and reinforce boxes with high rewards through gradient updates. Empirically, we demonstrate that our approach is not only more accurate, but also orders of magnitudes faster to train compared to prior works on object discovery. Code is available at https://github.com/katieluo88/DRIFT.  more » « less
Award ID(s):
2107161 2305532 1830497
PAR ID:
10546174
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Publisher / Repository:
Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
Date Published:
ISBN:
9781713899921
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Leonardis, A; Ricci, E; Roth, S; Russakovsky, O; Sattler, T; Varol, G (Ed.)
    Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised instance detection and segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results in over-/under- segmentation and irrelevant objects. Inspired by human visual system and practical applications, we posit that the key missing cue for un- supervised detection is motion: objects of interest are typically mobile objects that frequently move and their motions can specify separate in- stances. In this paper, we propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only. We begin with instance pseudo- labels derived from motion segmentation, but introduce a novel training paradigm to progressively discover small objects and static-but-mobile objects that are missed by motion segmentation. As a result, though only learned from unlabeled videos, MOD-UV can detect and segment mo- bile objects from a single static image. Empirically, we achieve state-of- the-art performance in unsupervised mobile object detection on Waymo Open, nuScenes, and KITTI Datasets without using any external data or supervised models. Code is available at github.com/YihongSun/MOD-UV. 
    more » « less
  2. Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised instance detection and segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results in over-/under- segmentation and irrelevant objects. Inspired by human visual system and practical applications, we posit that the key missing cue for un- supervised detection is motion: objects of interest are typically mobile objects that frequently move and their motions can specify separate in- stances. In this paper, we propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only. We begin with instance pseudo- labels derived from motion segmentation, but introduce a novel training paradigm to progressively discover small objects and static-but-mobile objects that are missed by motion segmentation. As a result, though only learned from unlabeled videos, MOD-UV can detect and segment mo- bile objects from a single static image. Empirically, we achieve state-of- the-art performance in unsupervised mobile object detection on Waymo Open, nuScenes, and KITTI Datasets without using any external data or supervised models. Code is available at github.com/YihongSun/MOD-UV. 
    more » « less
  3. With emerging vision-based autonomous driving (AD) systems, it becomes increasingly important to have datasets to evaluate their correct operation and identify potential security flaws. However, when collecting a large amount of data, either human experts manually label potentially hundreds of thousands of image frames or systems use machine learning algorithms to label the data, with the hope that the accuracy is good enough for the application. This can become especially problematic when tracking the context information, such as the location and velocity of surrounding objects, useful to evaluate the correctness and improve stability and robustness of the AD systems. In this paper, we introduce DRIVETRUTH, a data collection framework built on CARLA, an open-source simulator for AD research, which constructs datasets with automatically generated accurate object labels, bounding boxes of objects and their contextual information through accessing simulation state and using semantic LiDAR raycasts. By leveraging the actual state of the simulation and the agents within it, we guarantee complete accuracy in all labels and gathered contextual information. Further, the use of the simulator provides easily collecting data in diverse environmental conditions and agent behaviors, with lighting, weather, and traffic behavior being configurable within the simulation. Through this effort, we provide users a means to extracting actionable simulated data from CARLA to test and explore attacks and defenses for AD systems. 
    more » « less
  4. Humans are the final decision makers in critical tasks that involve ethical and legal concerns, ranging from recidivism prediction, to medical diagnosis, to fighting against fake news. Although machine learning models can sometimes achieve impressive performance in these tasks, these tasks are not amenable to full automation. To realize the potential of machine learning for improving human decisions, it is important to understand how assistance from machine learning models affects human performance and human agency. In this paper, we use deception detection as a testbed and investigate how we can harness explanations and predictions of machine learning models to improve human performance while retaining human agency. We propose a spectrum between full human agency and full automation, and develop varying levels of machine assistance along the spectrum that gradually increase the influence of machine predictions. We find that without showing predicted labels, explanations alone slightly improve human performance in the end task. In comparison, human performance is greatly improved by showing predicted labels (>20% relative improvement) and can be further improved by explicitly suggesting strong machine performance. Interestingly, when predicted labels are shown, explanations of machine predictions induce a similar level of accuracy as an explicit statement of strong machine performance. Our results demonstrate a tradeoff between human performance and human agency and show that explanations of machine predictions can moderate this tradeoff. 
    more » « less
  5. null (Ed.)
    Computing goal-directed behavior is essential to designing efficient AI systems. Due to the computational complexity of planning, current approaches rely primarily upon hand-coded symbolic action models and hand-coded heuristic function generators for efficiency. Learned heuristics for such prob- lems have been of limited utility as they are difficult to apply to problems with objects and object quantities that are signif- icantly different from those in the training data. This paper develops a new approach for learning generalized heuristics in the absence of symbolic action models using deep neural networks that utilize an input predicate vocabulary but are agnostic to object names and quantities. It uses an abstract state representation to facilitate data-efficient, generalizable learning. Empirical evaluation on a range of benchmark do- mains shows that in contrast to prior approaches, generalized heuristics computed by this method can be transferred easily to problems with different objects and with object quantities much larger than those in the training data. 
    more » « less