skip to main content


Title: Semi-supervised 3D Object Detection via Temporal Graph Neural Networks
3D object detection plays an important role in autonomous driving and other robotics applications. However, these detectors usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging large amounts of unlabeled point cloud videos by semi-supervised learning of 3D object detectors via temporal graph neural networks. Our insight is that temporal smoothing can create more accurate detection results on unlabeled data, and these smoothed detections can then be used to retrain the detector. We learn to perform this temporal reasoning with a graph neural network, where edges represent the relationship between candidate detections in different time frames.  more » « less
Award ID(s):
1849154
NSF-PAR ID:
10322232
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
International Conference on 3D Vision (3DV)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For a self-driving car to operate reliably, its perceptual system must generalize to the end-user's environment---ideally without additional annotation efforts. One potential solution is to leverage unlabeled data (eg, unlabeled LiDAR point clouds) collected from the end-users' environments (ie target domain) to adapt the system to the difference between training and testing environments. While extensive research has been done on such an unsupervised domain adaptation problem, one fundamental problem lingers: there is no reliable signal in the target domain to supervise the adaptation process. To overcome this issue we observe that it is easy to collect unsupervised data from multiple traversals of repeated routes. While different from conventional unsupervised domain adaptation, this assumption is extremely realistic since many drivers share the same roads. We show that this simple additional assumption is sufficient to obtain a potent signal that allows us to perform iterative self-training of 3D object detectors on the target domain. Concretely, we generate pseudo-labels with the out-of-domain detector but reduce false positives by removing detections of supposedly mobile objects that are persistent across traversals. Further, we reduce false negatives by encouraging predictions in regions that are not persistent. We experiment with our approach on two large-scale driving datasets and show remarkable improvement in 3D object detection of cars, pedestrians, and cyclists, bringing us a step closer to generalizable autonomous driving. 
    more » « less
  2. null (Ed.)
    3D object trackers usually require training on large amounts of annotated data that is expensive and time-consuming to collect. Instead, we propose leveraging vast unlabeled datasets by self-supervised metric learning of 3D object trackers, with a focus on data association. Large scale annotations for unlabeled data are cheaply obtained by automatic object detection and association across frames. We show how these self-supervised annotations can be used in a principled manner to learn point-cloud embeddings that are effective for 3D tracking. We estimate and incorporate uncertainty in self-supervised tracking to learn more robust embeddings, without needing any labeled data. We design embeddings to differentiate objects across frames, and learn them using uncertainty-aware self-supervised training. Finally, we demonstrate their ability to perform accurate data association across frames, towards effective and accurate 3D tracking. 
    more » « less
  3. Abstract

    Marine megafauna are difficult to observe and count because many species travel widely and spend large amounts of time submerged. As such, management programmes seeking to conserve these species are often hampered by limited information about population levels.

    Unoccupied aircraft systems (UAS, aka drones) provide a potentially useful technique for assessing marine animal populations, but a central challenge lies in analysing the vast amounts of data generated in the images or video acquired during each flight. Neural networks are emerging as a powerful tool for automating object detection across data domains and can be applied to UAS imagery to generate new population‐level insights. To explore the utility of these emerging technologies in a challenging field setting, we used neural networks to enumerate olive ridley turtlesLepidochelys olivaceain drone images acquired during a mass‐nesting event on the coast of Ostional, Costa Rica.

    Results revealed substantial promise for this approach; specifically, our model detected 8% more turtles than manual counts while effectively reducing the manual validation burden from 2,971,554 to 44,822 image windows. Our detection pipeline was trained on a relatively small set of turtle examples (N = 944), implying that this method can be easily bootstrapped for other applications, and is practical with real‐world UAS datasets.

    Our findings highlight the feasibility of combining UAS and neural networks to estimate population levels of diverse marine animals and suggest that the automation inherent in these techniques will soon permit monitoring over spatial and temporal scales that would previously have been impractical.

     
    more » « less
  4. To alleviate the cost of collecting and annotating large-scale "3D object" point cloud data, we propose an unsupervised learning approach to learn features from an unlabeled point cloud dataset by using part contrasting and object clustering with deep graph convolutional neural networks (GCNNs). In the contrast learning step, all the samples in the 3D object dataset are cut into two parts and put into a "part" dataset. Then a contrast learning GCNN (ContrastNet) is trained to verify whether two randomly sampled parts from the part dataset belong to the same object. In the cluster learning step, the trained ContrastNet is applied to all the samples in the original 3D object dataset to extract features, which are used to group the samples into clusters. Then another GCNN for clustering learning (ClusterNet) is trained from the orignal 3D data to predict the cluster IDs of all the training samples. The contrasting learning forces the ContrastNet to learn semantic features of objects, while the ClusterNet improves the quality of learned features by being trained to discover objects that belong to the same semantic categories by using cluster IDs. We have conducted extensive experiments to evaluate the proposed framework on point cloud classification tasks. The proposed unsupervised learning approach obtains comparable performance to the state-of-the-art with heavier shape auto-encoding unsupervised feature extraction methods. We have also tested the networks on object recognition using partial 3D data, by simulating occlusions and perspective views, and obtained practically useful results. The code of this work is publicly available at: https://github.com/lingzhang1/ContrastNet. 
    more » « less
  5. Obeid, Iyad ; Selesnick, Ivan ; Picone, Joseph (Ed.)
    Scalp electroencephalograms (EEGs) are the primary means by which phy-sicians diagnose brain-related illnesses such as epilepsy and seizures. Au-tomated seizure detection using clinical EEGs is a very difficult machine learning problem due to the low fidelity of a scalp EEG signal. Neverthe-less, despite the poor signal quality, clinicians can reliably diagnose ill-nesses from visual inspection of the signal waveform. Commercially avail-able automated seizure detection systems, however, suffer from unaccepta-bly high false alarm rates. Deep learning algorithms that require large amounts of training data have not previously been effective on this task due to the lack of big data resources necessary for building such models and the complexity of the signals involved. The evolution of big data science, most notably the release of the Temple University EEG (TUEG) Corpus, has mo-tivated renewed interest in this problem. In this chapter, we discuss the application of a variety of deep learning ar-chitectures to automated seizure detection. Architectures explored include multilayer perceptrons, convolutional neural networks (CNNs), long short-term memory networks (LSTMs), gated recurrent units and residual neural networks. We use the TUEG Corpus, supplemented with data from Duke University, to evaluate the performance of these hybrid deep structures. Since TUEG contains a significant amount of unlabeled data, we also dis-cuss unsupervised pre-training methods used prior to training these com-plex recurrent networks. Exploiting spatial and temporal context is critical for accurate disambigua-tion of seizures from artifacts. We explore how effectively several conven-tional architectures are able to model context and introduce a hybrid system that integrates CNNs and LSTMs. The primary error modalities observed by this state-of-the-art system were false alarms generated during brief delta range slowing patterns such as intermittent rhythmic delta activity. A varie-ty of these types of events have been observed during inter-ictal and post-ictal stages. Training models on such events with diverse morphologies has the potential to significantly reduce the remaining false alarms. This is one reason we are continuing our efforts to annotate a larger portion of TUEG. Increasing the data set size significantly allows us to leverage more ad-vanced machine learning methodologies. 
    more » « less