skip to main content

Title: Efficient multilevel architecture for depth estimation from a single image
Monocular depth estimation is an important task in scene understanding with applications to pose, segmentation and autonomous navigation. Deep Learning methods relying on multilevel features are currently used for extracting local information that is used to infer depth from a single RGB image. We present an efficient architecture that utilizes the features from multiple levels with fewer connections compared to previous networks. Our model achieves comparable scores for monocular depth estimation with better efficiency on the memory requirements and computational burden.
; ;
Award ID(s):
Publication Date:
Journal Name:
Electronic Imaging
Sponsoring Org:
National Science Foundation
More Like this
  1. While 360° images are becoming ubiquitous due to popularity of panoramic content, they cannot directly work with most of the existing depth estimation techniques developed for perspective images. In this paper, we present a deep-learning-based framework of estimating depth from 360° images. We present an adaptive depth refinement procedure that refines depth estimates using normal estimates and pixel-wise uncertainty scores. We introduce double quaternion approximation to combine the loss of the joint estimation of depth and surface normal. Furthermore, we use the double quaternion formulation to also measure stereo consistency between the horizontally displaced depth maps, leading to a new loss function for training a depth estimation CNN. Results show that the new double-quaternion-based loss and the adaptive depth refinement procedure lead to better network performance. Our proposed method can be used with monocular as well as stereo images. When evaluated on several datasets, our method surpasses state-of-the-art methods on most metrics.
  2. Abstract—Robotic geo-fencing and surveillance systems require accurate monitoring of objects if/when they violate perimeter restrictions. In this paper, we seek a solution for depth imaging of such objects of interest at high accuracy (few tens of cm) over extended ranges (up to 300 meters) from a single vantage point, such as a pole mounted platform. Unfortunately, the rich literature in depth imaging using camera, lidar and radar in isolation struggles to meet these tight requirements in real-world conditions. This paper proposes Metamoran, a solution that explores long-range depth imaging of objects of interest by fusing the strengths of two complementary technologies: mmWave radar and camera. Unlike cameras, mmWave radars offer excellent cm-scale depth resolution even at very long ranges. However, their angular resolution is at least 10× worse than camera systems. Fusing these two modalities is natural, but in scenes with high clutter and at long ranges, radar reflections are weak and experience spurious artifacts. Metamoran’s core contribution is to leverage image segmentation and monocular depth estimation on camera images to help declutter radar and discover true object reflections.We perform a detailed evaluation of Metamoran’s depth imaging capabilities in 400 diverse scenarios. Our evaluation shows that Metamoran estimates the depthmore »of static objects up to 90 m away and moving objects up to 305 m away and with a median error of 28 cm, an improvement of 13× over a naive radar+camera baseline and 23× compared to monocular depth estimation.« less
  3. This work describes a monocular visual odometry framework, which exploits the best attributes of edge features for illumination-robust camera tracking, while at the same time ameliorating the performance degradation of edge mapping. In the front-end, an ICP-based edge registration provides robust motion estimation and coarse data association under lighting changes. In the back-end, a novel edge-guided data association pipeline searches for the best photometrically matched points along geometrically possible edges through template matching, so that the matches can be further refined in later bundle adjustment. The core of our proposed data association strategy lies in a point-to-edge geometric uncertainty analysis, which analytically derives (1) a probabilistic search length formula that significantly reduces the search space and (2) a geometric confidence metric for mapping degradation detection based on the predicted depth uncertainty. Moreover, a match confidence based patch size adaption strategy is integrated into our pipeline to reduce matching ambiguity. We present extensive analysis and evaluation of our proposed system on synthetic and real- world benchmark datasets under the influence of illumination changes and large camera motions, where our proposed system outperforms current state-of-art algorithms.
  4. Edge-assisted Augmented Reality (AR) which offloads computeintensive Deep Neural Network (DNN)-based AR tasks to edge servers faces an important design challenge: how to pick the DNN model out of many choices proposed for each AR task for offloading. For each AR task, e.g., depth estimation, many DNN-based models have been proposed over time that vary in accuracy and complexity. In general, more accurate models are also more complex; they are larger and have longer inference time. Thus choosing a larger model in offloading can provide higher accuracy for the offloaded frames but also incur longer turnaround time, during which the AR app has to reuse the estimation result from the last offloaded frame, which can lead to lower average accuracy. In this paper, we experimentally study this design tradeoff using depth estimation as a case study. We design optimal offloading schedule and further consider the impact of numerous factors such as on-device fast tracking, frame downsizing and available network bandwidth. Our results show that for edge-assisted monocular depth estimation, with proper frame downsizing and fast tracking, compared to small models, the improved accuracy of large models can offset its longer turnaround time to provide higher average estimation accuracy across framesmore »under both LTE and 5G mmWave.« less
  5. We present a personalized, comprehensive eye-tracking solution based on tracking higher-order Purkinje images, suited explicitly for eyeglasses-style AR and VR displays. Existing eye-tracking systems for near-eye applications are typically designed to work for an on-axis configuration and rely on pupil center and corneal reflections (PCCR) to estimate gaze with an accuracy of only about 0.5°to 1°. These are often expensive, bulky in form factor, and fail to estimate monocular accommodation, which is crucial for focus adjustment within the AR glasses. Our system independently measures the binocular vergence and monocular accommodation using higher-order Purkinje reflections from the eye, extending the PCCR based methods. We demonstrate that these reflections are sensitive to both gaze rotation and lens accommodation and model the Purkinje images’ behavior in simulation. We also design and fabricate a user-customized eye tracker using cheap off-the-shelf cameras and LEDs. We use an end-to-end convolutional neural network (CNN) for calibrating the eye tracker for the individual user, allowing for robust and simultaneous estimation of vergence and accommodation. Experimental results show that our solution, specifically catering to individual users, outperforms state-of-the-art methods for vergence and depth estimation, achieving an accuracy of 0.3782°and 1.108 cm respectively.