Title: Learning to Detect Mobile Objects from LiDAR Scans Without Labels
Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data. Although of high quality, the generation of such data is laborious and costly, restricting them to a few specific locations and object types. This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. Our approach leverages several simple common sense heuristics to create an initial set of approximate seed labels. For example, relevant traffic participants are generally not persistent across multiple traversals of the same route, do not fly, and are never under ground. We demonstrate that these seed labels are highly effective to bootstrap a surprisingly accurate detector through repeated self-training without a single human annotated label. Code is available at https://github.com/YurongYou/MODEST.
Award ID(s):
2107161
NSF-PAR ID:
10350994
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data. Although of high quality, the generation of such data is laborious and costly, restricting them to a few specific locations and object types. This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth. Our approach leverages several simple common sense heuristics to create an initial set of approximate seed labels. For example, relevant traffic participants are generally not persistent across multiple traversals of the same route, do not fly, and are never under ground. We demonstrate that these seed labels are highly effective to bootstrap a surprisingly accurate detector through repeated self-training without a single human annotated label. Code is available at https://github.com/YurongYou/MODEST. 
  2. Meila, Marina ; Zhang, Tong (Ed.)
    The label noise transition matrix, characterizing the probabilities of a training instance being wrongly annotated, is crucial to designing popular solutions to learning with noisy labels. Existing works heavily rely on finding “anchor points” or their approximates, defined as instances belonging to a particular class almost surely. Nonetheless, finding anchor points remains a non-trivial task, and the estimation accuracy is also often throttled by the number of available anchor points. In this paper, we propose an alternative option to the above task. Our main contribution is the discovery of an efficient estimation procedure based on a clusterability condition. We prove that with clusterable representations of features, using up to third-order consensuses of noisy labels among neighbor representations is sufficient to estimate a unique transition matrix. Compared with methods using anchor points, our approach uses substantially more instances and benefits from a much better sample complexity. We demonstrate the estimation accuracy and advantages of our estimates using both synthetic noisy labels (on CIFAR-10/100) and real human-level noisy labels (on Clothing1M and our self-collected human-annotated CIFAR-10). Our code and human-level noisy CIFAR-10 labels are available at https://github.com/UCSC-REAL/HOC. 
  3. An important task in human-computer interaction is to rank speech samples according to their expressive content. A preference learning framework is appropriate for obtaining an emotional rank for a set of speech samples. However, obtaining reliable labels for training a preference learning framework is a challenging task. Most existing databases provide sentence-level absolute attribute scores annotated by multiple raters, which have to be transformed to obtain preference labels. Previous studies have shown that evaluators anchor their absolute assessments on previously annotated samples. Hence, this study proposes a novel formulation for obtaining preference learning labels by only considering annotation trends assigned by a rater to consecutive samples within an evaluation session. The experiments show that the use of the proposed anchor-based ordinal labels leads to significantly better performance than models trained using existing alternative labels. 
  4. Living in a data-driven world with rapidly growing machine learning techniques, it is apparent that utilizing these methods is necessary to achieve state-of-the-art performance in object detection. Recent novel approaches in the deep-learning field have boasted real-time object segmentation methods given the algorithm is connected to a large validation dataset. Knowing that these algorithms are restricted to a given dataset, it is apparent that the need for data generating algorithms is on the rise. As some object detection problems may suffice with a statically trained deep-learning model, it is true that others will not. Given the no free lunch theorem, we know that no machine learning algorithm can truly generalize to data it has not been trained on; therefore, deep learning models trained on images of cats will not necessarily classify dogs correctly. With modern deep learning libraries being ported for mobile devices, a wide range of utility has been made apparent for plant researchers around the world. One such usage of these real-time approaches is to count and classify seed kernels, replacing monotonous, error-prone manual tasks. Plant scientists around the world have daily jobs of counting seeds by hand or using multi-thousand dollar devices to automate the task. It is apparent that many third world countries, where such consumer devices do not exist or require too many resources, could benefit from such an automated task. PhenoApps, an organization started within Kansas State University, has been supplying a subset of these countries with modern phones for such uses. With the following seed segmentation algorithm and the usage of modern mobile devices, scientists can count seeds with the click of a button and produce results in split-seconds. The algorithms proposed in this paper achieve multiple novel implementations. Mainly, Rice’s Theorem was used to show that object detection in clusters is an undecidable task for Turing Machines. Along with this, the novel implementations include an Android application which can segment seed kernels and a machine learning algorithm which can accurately generate contour data sets. The data generator provided in this paper is an effective start for the later usage of deep learning models and is the first step for a real-time dynamic and static seed counter. 
  5. This work investigates how different forms of input elicitation obtained from crowdsourcing can be utilized to improve the quality of inferred labels for image classification tasks, where an image must be labeled as either positive or negative depending on the presence/absence of a specified object. Five types of input elicitation methods are tested: binary classification (positive or negative); the (x, y)-coordinate of the position participants believe a target object is located; level of confidence in binary response (on a scale from 0 to 100%); what participants believe the majority of the other participants' binary classification is; and participant's perceived difficulty level of the task (on a discrete scale). We design two crowdsourcing studies to test the performance of a variety of input elicitation methods and utilize data from over 300 participants. Various existing voting and machine learning (ML) methods are applied to make the best use of these inputs. In an effort to assess their performance on classification tasks of varying difficulty, a systematic synthetic image generation process is developed. Each generated image combines items from the MPEG-7 Core Experiment CE-Shape-1 Test Set into a single image using multiple parameters (e.g., density, transparency, etc.) and may or may not contain a target object. The difficulty of these images is validated by the performance of an automated image classification method. Experiment results suggest that more accurate results can be achieved with smaller training datasets when both the crowdsourced binary classification labels and the average of the self-reported confidence values in these labels are used as features for the ML classifiers. Moreover, when a relatively larger properly annotated dataset is available, in some cases augmenting these ML algorithms with the results (i.e., probability of outcome) from an automated classifier can achieve even higher performance than what can be obtained by using any one of the individual classifiers. Lastly, supplementary analysis of the collected data demonstrates that other performance metrics of interest, namely reduced false-negative rates, can be prioritized through special modifications of the proposed aggregation methods. 