skip to main content

Title: Benefits of Synthetically Pre-trained Depth-Prediction Networks for Indoor/Outdoor Image Classification
Ground truth depth information is necessary for many computer vision tasks. Collecting this information is chal-lenging, especially for outdoor scenes. In this work, we propose utilizing single-view depth prediction neural networks pre-trained on synthetic scenes to generate relative depth, which we call pseudo-depth. This approach is a less expen-sive option as the pre-trained neural network obtains ac-curate depth information from synthetic scenes, which does not require any expensive sensor equipment and takes less time. We measure the usefulness of pseudo-depth from pre-trained neural networks by training indoor/outdoor binary classifiers with and without it. We also compare the difference in accuracy between using pseudo-depth and ground truth depth. We experimentally show that adding pseudo-depth to training achieves a 4.4% performance boost over the non-depth baseline model on DIODE, a large stan-dard test dataset, retaining 63.8% of the performance boost achieved from training a classifier on RGB and ground truth depth. It also boosts performance by 1.3% on another dataset, SUN397, for which ground truth depth is not avail-able. Our result shows that it is possible to take information obtained from a model pre-trained on synthetic scenes and successfully apply it beyond the synthetic domain to real-world data.  more » « less
Award ID(s):
2211784 1911230
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
Date Published:
Journal Name:
2023 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW)
Page Range / eLocation ID:
360 to 369
Medium: X
Waikoloa, HI, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Reliable probability estimation is of crucial importance in many real-world applications where there is inherent (aleatoric) uncertainty. Probability-estimation models are trained on observed outcomes (e.g. whether it has rained or not, or whether a patient has died or not), because the ground-truth probabilities of the events of interest are typically unknown. The problem is therefore analogous to binary classification, with the difference that the objective is to estimate probabilities rather than predicting the specific outcome. This work investigates probability estimation from high-dimensional data using deep neural networks. There exist several methods to improve the probabilities generated by these models but they mostly focus on model (epistemic) uncertainty. For problems with inherent uncertainty, it is challenging to evaluate performance without access to ground-truth probabilities. To address this, we build a synthetic dataset to study and compare different computable metrics. We evaluate existing methods on the synthetic data as well as on three real-world probability estimation tasks, all of which involve inherent uncertainty: precipitation forecasting from radar images, predicting cancer patient survival from histopathology images, and predicting car crashes from dashcam videos. We also give a theoretical analysis of a model for high-dimensional probability estimation which reproduces several of the phenomena evinced in our experiments. Finally, we propose a new method for probability estimation using neural networks, which modifies the training process to promote output probabilities that are consistent with empirical probabilities computed from the data. The method outperforms existing approaches on most metrics on the simulated as well as real-world data. 
    more » « less
  2. Transparent objects are a very challenging problem in computer vision. They are hard to segment or classify due to their lack of precise boundaries, and there is limited data available for training deep neural networks. As such, current solutions for this problem employ rigid synthetic datasets, which lack flexibility and lead to severe performance degradation when deployed on real-world scenarios. In particular, these synthetic datasets omit features such as refraction, dispersion and caustics due to limitations in the rendering pipeline. To address this issue, we present SuperCaustics, a real-time, open-source simulation of transparent objects designed for deep learning applications. SuperCaustics features extensive modules for stochastic environment creation; uses hardware ray-tracing to support caustics, dispersion, and refraction; and enables generating massive datasets with multi-modal, pixel-perfect ground truth annotations. To validate our proposed system, we trained a deep neural network from scratch to segment transparent objects in difficult lighting scenarios. Our neural network achieved performance comparable to the state-of-the-art on a real-world dataset using only 10% of the training data and in a fraction of the training time. Further experiments show that a model trained with SuperCaustics can segment different types of caustics, even in images with multiple overlapping transparent objects. To the best of our knowledge, this is the first such result for a model trained on synthetic data. Both our open-source code and experimental data are freely available online. 
    more » « less
  3. null (Ed.)
    Arguably one of the top success stories of deep learning is transfer learning. The finding that pre-training a network on a rich source set (e.g., ImageNet) can help boost performance once fine-tuned on a usually much smaller target set, has been instrumental to many applications in language and vision. Yet, very little is known about its usefulness in 3D point cloud understanding. We see this as an opportunity considering the effort required for annotating data in 3D. In this work, we aim at facilitating research on 3D representation learning. Different from previous works, we focus on high-level scene understanding tasks. To this end, we select a suit of diverse datasets and tasks to measure the effect of unsupervised pre-training on a large source set of 3D scenes. Our findings are extremely encouraging: using a unified triplet of architecture, source dataset, and contrastive loss for pre-training, we achieve improvement over recent best results in segmentation and detection across 6 different benchmarks for indoor and outdoor, real and synthetic datasets – demonstrating that the learned representation can generalize across domains. Furthermore, the improvement was similar to supervised pre-training, suggesting that future efforts should favor scaling data collection over more detailed annotation. We hope these findings will encourage more research on unsupervised pretext task design for 3D deep learning. 
    more » « less
  4. Abstract Purpose

    Synthetic digital mammogram (SDM) is a 2D image generated from digital breast tomosynthesis (DBT) and used as a substitute for a full‐field digital mammogram (FFDM) to reduce the radiation dose for breast cancer screening. The previous deep learning‐based method used FFDM images as the ground truth, and trained a single neural network to directly generate SDM images with similar appearances (e.g., intensity distribution, textures) to the FFDM images. However, the FFDM image has a different texture pattern from DBT. The difference in texture pattern might make the training of the neural network unstable and result in high‐intensity distortion, which makes it hard to decrease intensity distortion and increase perceptual similarity (e.g., generate similar textures) at the same time. Clinically, radiologists want to have a 2D synthesized image that feels like an FFDM image in vision and preserves local structures such as both mass and microcalcifications (MCs) in DBT because radiologists have been trained on reading FFDM images for a long time, while local structures are important for diagnosis. In this study, we proposed to use a deep convolutional neural network to learn the transformation to generate SDM from DBT.


    To decrease intensity distortion and increase perceptual similarity, a multi‐scale cascaded network (MSCN) is proposed to generate low‐frequency structures (e.g., intensity distribution) and high‐frequency structures (e.g., textures) separately. The MSCN consist of two cascaded sub‐networks: the first sub‐network is used to predict the low‐frequency part of the FFDM image; the second sub‐network is used to generate a full SDM image with textures similar to the FFDM image based on the prediction of the first sub‐network. The mean‐squared error (MSE) objective function is used to train the first sub‐network, termed low‐frequency network, to generate a low‐frequency SDM image. The gradient‐guided generative adversarial network's objective function is to train the second sub‐network, termed high‐frequency network, to generate a full SDM image with textures similar to the FFDM image.


    1646 cases with FFDM and DBT were retrospectively collected from the Hologic Selenia system for training and validation dataset, and 145 cases with masses or MC clusters were independently collected from the Hologic Selenia system for testing dataset. For comparison, the baseline network has the same architecture as the high‐frequency network and directly generates a full SDM image. Compared to the baseline method, the proposed MSCN improves the peak‐to‐noise ratio from 25.3 to 27.9 dB and improves the structural similarity from 0.703 to 0.724, and significantly increases the perceptual similarity.


    The proposed method can stabilize the training and generate SDM images with lower intensity distortion and higher perceptual similarity.

    more » « less
  5. Inference accuracy of deep neural networks (DNNs) is a crucial performance metric, but can vary greatly in practice subject to actual test datasets and is typically unknown due to the lack of ground truth labels. This has raised significant concerns with trustworthiness of DNNs, especially in safety-critical applications. In this paper, we address trustworthiness of DNNs by using post-hoc processing to monitor the true inference accuracy on a user’s dataset. Concretely, we propose a neural network-based accuracy monitor model, which only takes the deployed DNN’s softmax probability output as its input and directly predicts if the DNN’s prediction result is correct or not, thus leading to an estimate of the true inference accuracy. The accuracy monitor model can be pre-trained on a dataset relevant to the target application of interest, and only needs to actively label a small portion (1% in our experiments) of the user’s dataset for model transfer. For estimation robustness, we further employ an ensemble of monitor models based on the Monte-Carlo dropout method. We evaluate our approach on different deployed DNN models for image classification and traffic sign detection over multiple datasets (including adversarial samples). The result shows that our accuracy monitor model provides a close-to-true accuracy estimation and outperforms the existing baseline methods. 
    more » « less