Unsupervised monocular depth estimation techniques have demonstrated encour- aging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be ex- plained by hypothesizing the object’s independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, an unifying approach that disambiguates dynamical motion by jointly learning monocular depth, 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimation of motion seg- mentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on Waymo Open [34] and nuScenes [3] Dataset with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
more »
« less
Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes
Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, an unifying approach that disambiguates dynamical motion by jointly learning monocular depth, 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimation of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on Waymo Open and nuScenes Dataset with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
more »
« less
- Award ID(s):
- 2107161
- PAR ID:
- 10546173
- Publisher / Repository:
- Advances in Neural Information Processing Systems 36 (NeurIPS 2023)
- Date Published:
- ISBN:
- 9781713899921
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The primary objective of this study was to utilize state-of-the-art deep learning-based monocular depth estimation models to assist UAS pilots in rainforest canopy data collection and navigation. Monocular depth estimation models provide a complementary technique to other depth measurement and estimation techniques to extend the range and improve mea- surements. Several state-of-the-art models were evaluated using a novel dataset composed of data from a simulated rainforest environment. In the evaluation, MiDaS outperformed the other models, and a segmentation pipeline was designed using this model to identify the highest areas of the canopies. The segmen- tation pipeline was evaluated using 1080p and 360p input videos from the simulated rainforest dataset. It was able to achieve an IoU of 0.848 and 0.826 and an F1 score of 0.915 and 0.902 at each resolution, respectively. We incorporated the proposed depth-estimation-based segmentation pipeline into an example application and deployed it on an edge system. Experimental results display the capabilities of a UAS using the segmentation pipeline for rainforest data collection.more » « less
-
Abstract—Robotic geo-fencing and surveillance systems require accurate monitoring of objects if/when they violate perimeter restrictions. In this paper, we seek a solution for depth imaging of such objects of interest at high accuracy (few tens of cm) over extended ranges (up to 300 meters) from a single vantage point, such as a pole mounted platform. Unfortunately, the rich literature in depth imaging using camera, lidar and radar in isolation struggles to meet these tight requirements in real-world conditions. This paper proposes Metamoran, a solution that explores long-range depth imaging of objects of interest by fusing the strengths of two complementary technologies: mmWave radar and camera. Unlike cameras, mmWave radars offer excellent cm-scale depth resolution even at very long ranges. However, their angular resolution is at least 10× worse than camera systems. Fusing these two modalities is natural, but in scenes with high clutter and at long ranges, radar reflections are weak and experience spurious artifacts. Metamoran’s core contribution is to leverage image segmentation and monocular depth estimation on camera images to help declutter radar and discover true object reflections.We perform a detailed evaluation of Metamoran’s depth imaging capabilities in 400 diverse scenarios. Our evaluation shows that Metamoran estimates the depth of static objects up to 90 m away and moving objects up to 305 m away and with a median error of 28 cm, an improvement of 13× over a naive radar+camera baseline and 23× compared to monocular depth estimation.more » « less
-
Monocular depth estimation is an important task in scene understanding with applications to pose, segmentation and autonomous navigation. Deep Learning methods relying on multilevel features are currently used for extracting local information that is used to infer depth from a single RGB image. We present an efficient architecture that utilizes the features from multiple levels with fewer connections compared to previous networks. Our model achieves comparable scores for monocular depth estimation with better efficiency on the memory requirements and computational burden.more » « less
-
Recent advances in object segmentation have demonstrated that deep neural networks excel at object segmentation for specific classes in color and depth images. However, their performance is dictated by the number of classes and objects used for training, thereby hindering generalization to never seen objects or zero-shot samples. To exacerbate the problem further, object segmentation using image frames rely on recognition and pattern matching cues. Instead, we utilize the ‘active’ nature of a robot and their ability to ‘interact’ with the environment to induce additional geometric constraints for segmenting zero-shot samples. In this paper, we present the first framework to segment unknown objects in a cluttered scene by repeatedly ‘nudging’ at the objects and moving them to obtain additional motion cues at every step using only a monochrome monocular camera. We call our framework NudgeSeg. These motion cues are used to refine the segmentation masks. We successfully test our approach to segment novel objects in various cluttered scenes and provide an extensive study with image and motion segmentation methods. We show an impressive average detection rate of over 86% on zero-shot objects.more » « less
An official website of the United States government

