

Title: Efficient multilevel architecture for depth estimation from a single image
Monocular depth estimation is an important task in scene understanding, with applications in pose estimation, segmentation, and autonomous navigation. Current deep learning methods rely on features extracted at multiple levels to capture the local information needed to infer depth from a single RGB image. We present an efficient architecture that uses features from multiple levels with fewer connections than previous networks. Our model achieves comparable accuracy on monocular depth estimation while reducing both memory requirements and computational burden.
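The abstract gives no implementation details, so the following is only a hypothetical, toy illustration of the general idea it describes: features are extracted at several resolutions and each level is fused into the decoder through a single connection. This NumPy sketch uses average pooling as a stand-in encoder and nearest-neighbor upsampling as a stand-in decoder; it is not the paper's architecture.

```python
import numpy as np

def avg_pool2(x):
    """Downsample a 2D map by 2x via average pooling."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Upsample a 2D map by 2x via nearest-neighbor repetition."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def multilevel_depth(image):
    """Toy multilevel pipeline: extract 'features' at three scales,
    then decode coarse-to-fine, fusing each level through one connection."""
    f1 = image             # full-resolution features (hypothetical)
    f2 = avg_pool2(f1)     # 1/2-resolution features
    f3 = avg_pool2(f2)     # 1/4-resolution features
    d = f3                 # coarsest depth estimate
    d = upsample2(d) + f2  # single skip connection from level 2
    d = upsample2(d) + f1  # single skip connection from level 1
    return d

depth = multilevel_depth(np.ones((8, 8)))
print(depth.shape)  # (8, 8)
```

The point of the sketch is the wiring, not the operators: each encoder level feeds the decoder exactly once, rather than through the denser cross-level connections of earlier multilevel networks.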
Award ID(s):
1749376
PAR ID:
10140798
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Electronic Imaging
ISSN:
2470-1173
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamic scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, a unifying approach that disambiguates dynamic motion by jointly learning monocular depth, a 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimate of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on the Waymo Open [34] and nuScenes [3] datasets, with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
  2. Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamic scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, a unifying approach that disambiguates dynamic motion by jointly learning monocular depth, a 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimate of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on the Waymo Open and nuScenes datasets, with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
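The ambiguity described in the abstract above can be made concrete: apparent pixel motion is modeled as rigid ego-motion flow plus an independent flow that applies only where the motion-segmentation mask is active. The one-dimensional NumPy sketch below is a hypothetical toy (using a simplified inverse-depth parallax model), not the paper's formulation:

```python
import numpy as np

# Toy 1D scene: per-pixel predicted depth and a motion-segmentation mask.
depth = np.array([10.0, 10.0, 5.0, 5.0])      # predicted depth (arbitrary units)
ego_flow = 1.0 / depth                        # rigid flow from camera motion
                                              # (toy inverse-depth parallax model)
ind_flow = np.array([0.0, 0.0, 0.3, 0.3])     # predicted independent flow (projected)
mask = np.array([0.0, 0.0, 1.0, 1.0])         # motion segmentation: 1 = moving object

# Observed apparent motion is explained jointly by depth, ego-motion, and
# independent motion -- not by depth alone:
total_flow = ego_flow + mask * ind_flow
print(total_flow)
```

Without the mask, the same observed flow on the last two pixels could be "explained" by predicting the wrong depth, which is exactly the failure mode the abstract describes.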
  3. While 360° images are becoming ubiquitous due to the popularity of panoramic content, they cannot directly be used with most existing depth estimation techniques, which were developed for perspective images. In this paper, we present a deep-learning-based framework for estimating depth from 360° images. We present an adaptive depth refinement procedure that refines depth estimates using normal estimates and pixel-wise uncertainty scores. We introduce a double quaternion approximation to combine the losses of the joint estimation of depth and surface normal. Furthermore, we use the double quaternion formulation to also measure stereo consistency between horizontally displaced depth maps, leading to a new loss function for training a depth estimation CNN. Results show that the new double-quaternion-based loss and the adaptive depth refinement procedure lead to better network performance. Our proposed method can be used with monocular as well as stereo images. When evaluated on several datasets, our method surpasses state-of-the-art methods on most metrics.
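The adaptive refinement step above is described only at a high level. As a hypothetical, heavily simplified sketch of the idea: pixels with high uncertainty are replaced by a prediction propagated from confident neighbors (a crude stand-in for the paper's normal-guided refinement). All numbers below are illustrative only.

```python
import numpy as np

depth = np.array([2.0, 2.1, 9.0, 2.3])        # raw depth estimates along a scanline
uncertainty = np.array([0.1, 0.1, 0.9, 0.1])  # per-pixel uncertainty scores in [0, 1]

# Replace uncertain pixels with the surface implied by confident neighbors;
# in this toy the "normal-consistent" prediction is just the neighbor mean.
neighbor_pred = 0.5 * (np.roll(depth, 1) + np.roll(depth, -1))
refined = np.where(uncertainty > 0.5, neighbor_pred, depth)
print(refined)
```

The uncertain outlier at index 2 is pulled back onto the locally smooth surface, while confident estimates are left untouched.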
  4. The primary objective of this study was to utilize state-of-the-art deep-learning-based monocular depth estimation models to assist UAS pilots in rainforest canopy data collection and navigation. Monocular depth estimation models provide a complementary technique to other depth measurement and estimation techniques, extending their range and improving measurements. Several state-of-the-art models were evaluated using a novel dataset composed of data from a simulated rainforest environment. In the evaluation, MiDaS outperformed the other models, and a segmentation pipeline was designed using this model to identify the highest areas of the canopies. The segmentation pipeline was evaluated using 1080p and 360p input videos from the simulated rainforest dataset. It achieved an IoU of 0.848 and 0.826 and an F1 score of 0.915 and 0.902 at each resolution, respectively. We incorporated the proposed depth-estimation-based segmentation pipeline into an example application and deployed it on an edge system. Experimental results demonstrate the capabilities of a UAS using the segmentation pipeline for rainforest data collection.
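The IoU and F1 scores reported above are standard binary-segmentation metrics. The NumPy sketch below shows how they are computed, with a hypothetical toy example that thresholds a relative (inverse) depth map to segment the "highest" pixels; the values are illustrative, not from the paper.

```python
import numpy as np

def iou_f1(pred, gt):
    """Intersection-over-Union and F1 score for binary segmentation masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union
    precision = inter / pred.sum()
    recall = inter / gt.sum()
    f1 = 2 * precision * recall / (precision + recall)
    return iou, f1

# Toy example: segment the highest canopy pixels by thresholding inverse depth.
inv_depth = np.array([[0.9, 0.8], [0.2, 0.7]])  # hypothetical relative depth map
pred = inv_depth > 0.5
gt = np.array([[True, True], [False, False]])   # hypothetical ground truth
iou, f1 = iou_f1(pred, gt)
print(iou, f1)
```

IoU penalizes every disagreeing pixel through the union, while F1 balances precision and recall, which is why the two scores differ for the same masks.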
  5. In monocular visual-inertial navigation, it is desirable to initialize the system as quickly and robustly as possible. A state-of-the-art initialization method typically constructs a linear system to find a closed-form solution using image features and inertial measurements, then refines the states with a nonlinear optimization. These methods generally require a few seconds of data, which can be reduced to under a second by adding constraints from a robust but only up-to-scale monocular depth network in the nonlinear optimization. To further accelerate this process, in this work we leverage the scale-less depth measurements in the linear initialization step performed prior to the nonlinear one, which requires only a single depth image for the first frame. Importantly, we show that the typical estimation of all feature states independently in the closed-form solution can be modeled as estimating only the scale and bias parameters of the learned depth map. As such, our formulation enables building a smaller minimal problem than the state of the art, which can be seamlessly integrated into RANSAC for robust estimation. Experiments show that our method achieves state-of-the-art initialization performance in simulation as well as on popular real-world datasets (TUM-VI and EuRoC MAV). On TUM-VI, in simulation as well as in the real world, we demonstrate superior initialization performance with only a 0.3 s window of data, the smallest ever reported, and validate that our method can initialize more often, more robustly, and more accurately in different challenging scenarios.
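The core idea in the abstract above — reducing per-feature estimation to just the scale and bias of a learned depth map — yields a two-parameter model that a two-point minimal solver inside RANSAC can handle. The NumPy sketch below is a hypothetical, simplified stand-in (fitting d_metric ≈ s · d_net + b directly, with synthetic data), not the paper's full linear initialization:

```python
import numpy as np

def fit_scale_bias(d_net, d_metric):
    """Closed-form least squares for d_metric ~= s * d_net + b (2 unknowns)."""
    A = np.stack([d_net, np.ones_like(d_net)], axis=1)
    (s, b), *_ = np.linalg.lstsq(A, d_metric, rcond=None)
    return s, b

def ransac_scale_bias(d_net, d_metric, iters=50, thresh=0.1, seed=0):
    """RANSAC over the 2-point minimal problem, then refit on inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(d_net), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(d_net), size=2, replace=False)
        if d_net[i] == d_net[j]:
            continue  # degenerate sample, cannot solve for scale
        s = (d_metric[j] - d_metric[i]) / (d_net[j] - d_net[i])
        b = d_metric[i] - s * d_net[i]
        inliers = np.abs(s * d_net + b - d_metric) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_scale_bias(d_net[best_inliers], d_metric[best_inliers])

# Synthetic check: up-to-scale network depths vs. metric depths with one outlier.
d_net = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
d_metric = 2.0 * d_net + 0.5      # hypothetical true scale s=2, bias b=0.5
d_metric[2] += 5.0                # corrupt one measurement
s, b = ransac_scale_bias(d_net, d_metric)
print(s, b)  # recovers s close to 2.0 and b close to 0.5 despite the outlier
```

Because the minimal sample is only two points, far fewer RANSAC iterations are needed than when every feature state is estimated independently, which is the efficiency argument the abstract makes.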