Title: Optical flow training under limited label budget via active learning
Supervised training of optical flow predictors generally yields better accuracy than unsupervised training. However, the improved performance comes at an often high annotation cost. Semi-supervised training trades off accuracy against annotation cost. We use a simple yet effective semi-supervised training method to show that even a small fraction of labels can improve flow accuracy by a significant margin over unsupervised training. In addition, we propose active learning methods based on simple heuristics to further reduce the number of labels required to achieve the same target accuracy. Our experiments on both synthetic and real optical flow datasets show that our semi-supervised networks generally need around 50% of the labels to achieve close to full-label accuracy, and only around 20% with active learning on Sintel. We also analyze and show insights on the factors that may influence active learning performance. Code is available at https://github.com/duke-vision/optical-flow-active-learning-release.
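The abstract does not spell out the active-learning heuristics, so the following is only a minimal PyTorch sketch of one plausible criterion: rank unlabeled frame pairs by the photometric error of the current flow estimate and request annotation for the highest-scoring fraction. The model interface, the loader format, and the 20% budget below are assumptions for illustration, not necessarily the paper's exact method.

import torch
import torch.nn.functional as F

def warp(img, flow):
    # Backward-warp img with the predicted flow field using bilinear sampling.
    b, _, h, w = img.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(img.device)      # (2, H, W), x then y
    coords = base.unsqueeze(0) + flow                               # (B, 2, H, W)
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0                         # normalize to [-1, 1]
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    return F.grid_sample(img, torch.stack((gx, gy), dim=-1), align_corners=True)

def photometric_score(model, img1, img2):
    # Heuristic uncertainty proxy: residual between img1 and img2 warped by the flow.
    flow = model(img1, img2)                                        # assumed to return (B, 2, H, W)
    return (img1 - warp(img2, flow)).abs().mean(dim=(1, 2, 3))      # one score per pair

def select_for_labeling(model, unlabeled_loader, budget_fraction=0.2):
    # Score every unlabeled pair and return the indices of the top fraction to annotate.
    scores, indices = [], []
    model.eval()
    with torch.no_grad():
        for idx, img1, img2 in unlabeled_loader:                    # assumed loader format
            scores.append(photometric_score(model, img1, img2))
            indices.append(idx)
    scores, indices = torch.cat(scores), torch.cat(indices)
    k = max(1, int(budget_fraction * len(scores)))
    return indices[scores.topk(k).indices]                          # highest-error pairs first

In a semi-supervised loop, the selected pairs would be annotated, added to the labeled pool, and the network retrained on the mixed labeled/unlabeled data.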
Award ID(s):
1909821
PAR ID:
10377791
Author(s) / Creator(s):
Date Published:
Journal Name:
European Conference on Computer Vision
Page Range / eLocation ID:
410-427
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Although the seismic industry has spent decades investigating automatic first-break picking, substantial challenges remain, and there is still no solid solution that spares geophysicists from picking the data manually. With the rise of deep learning and powerful hardware, many of these challenges can be overcome. In this work, we propose a deep semi-supervised neural network for automatic first-break picking in seismic data. The network is designed to work with both unlabeled data and a limited amount of labeled real data. The initial feature representation is learned in a discriminative, unsupervised manner on real datasets without labels. Since no assumptions are made about the difference between the underlying distributions of synthetic and real data, our model has more room to compensate for distribution drift than supervised learning models. In addition, the network can update itself through continuous learning: the system identifies labeling anomalies on site and updates the model through active learning. In simulation, we show that the proposed deep semi-supervised neural network achieves high accuracy on first-break picking. Compared with supervised neural networks, our network shows the advantage of using both labeled and unlabeled data to achieve higher accuracy.
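The abstract gives no explicit training objective, so the sketch below is only a generic semi-supervised update of the kind it alludes to: a supervised loss on the few labeled traces plus a consistency term between predictions on clean and perturbed unlabeled traces. The batch formats, the discretized pick targets, and the weighting are assumptions.

import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_batch, unlabeled_batch, optimizer, lam=0.5):
    # One generic semi-supervised update (assumed objective, not the paper's exact loss).
    traces_l, picks_l = labeled_batch           # labeled traces and first-break picks as class indices
    traces_u, traces_u_aug = unlabeled_batch    # an unlabeled trace and a perturbed copy

    sup_loss = F.cross_entropy(model(traces_l), picks_l)

    with torch.no_grad():
        target_u = model(traces_u).softmax(dim=-1)          # "teacher" prediction on clean traces
    logits_u = model(traces_u_aug)
    cons_loss = F.kl_div(logits_u.log_softmax(dim=-1), target_u, reduction="batchmean")

    loss = sup_loss + lam * cons_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()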
  2. In recent years, deep learning has achieved tremendous success in image segmentation for computer vision applications. The performance of these models heavily relies on the availability of large-scale high-quality training labels (e.g., PASCAL VOC 2012). Unfortunately, such large-scale high-quality training data are often unavailable in many real-world spatial or spatiotemporal problems in earth science and remote sensing (e.g., mapping the nationwide river streams for water resource management). Although extensive efforts have been made to reduce the reliance on labeled data (e.g., semi-supervised or unsupervised learning, few-shot learning), the complex nature of geographic data such as spatial heterogeneity still requires sufficient training labels when transferring a pre-trained model from one region to another. On the other hand, it is often much easier to collect lower-quality training labels that are imperfectly aligned with earth imagery pixels (e.g., through interpreting coarse imagery by non-expert volunteers). However, directly training a deep neural network on imperfect labels with geometric annotation errors could significantly impact model performance. Existing research that overcomes imperfect training labels either focuses on errors in label class semantics or characterizes label location errors at the pixel level. These methods do not fully incorporate the geometric properties of label location errors in the vector representation. To fill the gap, this article proposes a weakly supervised learning framework to simultaneously update deep learning model parameters and infer hidden true vector label locations. Specifically, we model label location errors in the vector representation to partially preserve geometric properties (e.g., spatial contiguity within line segments). Evaluations on real-world datasets in the National Hydrography Dataset (NHD) refinement application illustrate that the proposed framework outperforms baseline methods in classification accuracy.
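The article's actual formulation is not reproduced here; the skeleton below only illustrates the alternation the abstract describes, updating model parameters on labels derived from the current vector estimate and then re-estimating the vector label locations from the model's output. Every helper is passed in as a hypothetical callable.

def refine_labels_and_model(model, imagery, noisy_vectors,
                            rasterize, train_step, snap_vectors, iterations=10):
    # Alternate between (a) a supervised update on the current label estimate and
    # (b) inferring new vector label locations from the model's probability map.
    vectors = noisy_vectors
    for _ in range(iterations):
        masks = rasterize(vectors)                   # hypothetical: vector labels -> pixel masks
        train_step(model, imagery, masks)            # hypothetical: one or more supervised updates
        prob_map = model(imagery)                    # per-pixel stream probabilities
        vectors = snap_vectors(vectors, prob_map)    # hypothetical: move vertices toward
                                                     # high-probability pixels, keeping segments contiguous
    return model, vectors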
  3. Adaptive optics-optical coherence tomography (AO-OCT) allows for the three-dimensional visualization of retinal ganglion cells (RGCs) in the living human eye. Quantitative analyses of RGCs have significant potential for improving the diagnosis and monitoring of diseases such as glaucoma. Recent advances in machine learning (ML) have made possible the automatic identification and analysis of RGCs within the complex three-dimensional retinal volumes obtained with such imaging. However, the current state-of-the-art ML approach relies on fully supervised training, which demands large amounts of training labels; each volume requires many hours of expert manual annotation. Here, two semi-supervised training schemes are introduced: (i) cross-consistency training and (ii) cross pseudo supervision. Both utilize unlabeled AO-OCT volumes together with a minimal set of labels, vastly reducing the labeling demands. Moreover, these methods outperformed their fully supervised counterpart and achieved accuracy comparable to that of human experts.
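Cross pseudo supervision is a published semi-supervised scheme in which two differently initialized networks supervise each other with hard pseudo-labels on unlabeled data; the PyTorch sketch below shows that loss in generic form. The tensor shapes, class layout, and trade-off weight are assumptions, and the actual AO-OCT pipeline may differ.

import torch
import torch.nn.functional as F

def cps_loss(net_a, net_b, labeled, unlabeled, weight=0.1):
    # Supervised loss on the labeled volumes plus mutual pseudo-label supervision
    # on the unlabeled volumes (generic cross pseudo supervision form).
    vols_l, masks_l = labeled                        # labeled AO-OCT volumes and RGC masks
    vols_u = unlabeled

    sup = F.cross_entropy(net_a(vols_l), masks_l) + F.cross_entropy(net_b(vols_l), masks_l)

    logits_a, logits_b = net_a(vols_u), net_b(vols_u)
    pseudo_a = logits_a.argmax(dim=1).detach()       # net A's hard pseudo-label
    pseudo_b = logits_b.argmax(dim=1).detach()       # net B's hard pseudo-label
    cross = F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)

    return sup + weight * cross                      # trade-off weight is an assumption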
  4. In this work we explore confidence elicitation methods for crowdsourcing "soft" labels, e.g., probability estimates, to reduce the annotation costs for domains with ambiguous data. Machine learning research has shown that such "soft" labels are more informative and can reduce the data requirements when training supervised machine learning models. By reducing the number of required labels, we can reduce the costs of slow annotation processes such as audio annotation. In our experiments we evaluated three confidence elicitation methods: 1) "No Confidence" elicitation, 2) "Simple Confidence" elicitation, and 3) a "Betting" mechanism for confidence elicitation, at both the individual (i.e., per participant) and aggregate (i.e., crowd) levels. In addition, we evaluated the interaction between confidence elicitation methods, annotation types (binary, probability, and z-score derived probability), and "soft" versus "hard" (i.e., binarized) aggregate labels. Our results show that both confidence elicitation mechanisms result in higher annotation quality than the "No Confidence" mechanism for binary annotations at both the participant and recording levels. In addition, when aggregating labels at the recording level, we can match the quality of 10-participant aggregate annotations with fewer annotators if we aggregate "soft" labels instead of "hard" labels. These results suggest that, for binary audio annotation, using a confidence elicitation mechanism and aggregating continuous labels yields higher annotation quality and more informative labels, with the quality differences more pronounced when fewer participants are involved. Finally, we propose a way of integrating these confidence elicitation methods into a two-stage, multi-label annotation pipeline.
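As a purely illustrative contrast between the "soft" and "hard" aggregation the abstract compares, the NumPy snippet below averages annotators' probability estimates versus binarizing each annotator and taking a majority vote; the numbers are hypothetical and unrelated to the study's data.

import numpy as np

def aggregate_soft(probabilities):
    # Soft aggregation: average the annotators' probability estimates per item.
    return np.mean(probabilities, axis=0)            # (annotators, items) -> (items,)

def aggregate_hard(probabilities, threshold=0.5):
    # Hard aggregation: binarize each annotator first, then take a majority vote.
    votes = (np.asarray(probabilities) >= threshold).astype(int)
    return (votes.mean(axis=0) >= 0.5).astype(int)

# Hypothetical example: three annotators, two recordings.
probs = [[0.6, 0.2],
         [0.7, 0.4],
         [0.4, 0.3]]
print(aggregate_soft(probs))    # approx. [0.57, 0.30] -- retains graded confidence
print(aggregate_hard(probs))    # [1, 0]               -- discards it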
  5. Recent advancements in two-photon calcium imaging have enabled scientists to record the activity of thousands of neurons with cellular resolution. This scope of data collection is crucial to understanding the next generation of neuroscience questions, but analyzing these large recordings requires automated methods for neuron segmentation. Supervised methods for neuron segmentation achieve state-of-the-art accuracy and speed but currently require large amounts of manually generated ground truth training labels. We reduced the required number of training labels by designing a semi-supervised pipeline. Our pipeline used neural network ensembling to generate pseudolabels to train a single shallow U-Net. We tested our method on three publicly available datasets and compared our performance to three widely used segmentation methods. Our method outperformed other methods when trained on a small number of ground truth labels and could achieve state-of-the-art accuracy after training on approximately a quarter as many ground truth labels as supervised methods. When trained on many ground truth labels, our pipeline attained higher accuracy than that of state-of-the-art methods. Overall, our work will help researchers accurately process large neural recordings while minimizing the time and effort needed to generate manual labels.
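The pipeline details are not given in this abstract, so the following is only a rough PyTorch sketch of the ensemble-to-pseudolabel step it describes: the ensemble's averaged probabilities are thresholded into masks that then supervise a single smaller network. The model interfaces and the 0.5 threshold are assumptions.

import torch
import torch.nn.functional as F

def make_pseudolabels(ensemble, videos, threshold=0.5):
    # Average the ensemble's segmentation probabilities and threshold them into
    # binary neuron-vs-background pseudolabels.
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(net(videos)) for net in ensemble]).mean(dim=0)
    return (probs >= threshold).float()

def train_student(student, videos, pseudolabels, optimizer):
    # One supervised update of the shallow U-Net on the ensemble's pseudolabels.
    loss = F.binary_cross_entropy_with_logits(student(videos), pseudolabels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()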