

Title: Exploring mmWave Radar and Camera Fusion for High-Resolution and Long-Range Depth Imaging
Abstract—Robotic geo-fencing and surveillance systems require accurate monitoring of objects if and when they violate perimeter restrictions. In this paper, we seek a solution for depth imaging of such objects of interest at high accuracy (a few tens of cm) over extended ranges (up to 300 meters) from a single vantage point, such as a pole-mounted platform. Unfortunately, the rich literature in depth imaging using camera, lidar and radar in isolation struggles to meet these tight requirements in real-world conditions. This paper proposes Metamoran, a solution that explores long-range depth imaging of objects of interest by fusing the strengths of two complementary technologies: mmWave radar and camera. Unlike cameras, mmWave radars offer excellent cm-scale depth resolution even at very long ranges. However, their angular resolution is at least 10× worse than that of camera systems. Fusing these two modalities is natural, but in scenes with high clutter and at long ranges, radar reflections are weak and experience spurious artifacts. Metamoran's core contribution is to leverage image segmentation and monocular depth estimation on camera images to help declutter radar and discover true object reflections. We perform a detailed evaluation of Metamoran's depth imaging capabilities in 400 diverse scenarios. Our evaluation shows that Metamoran estimates the depth of static objects up to 90 m away and moving objects up to 305 m away, with a median error of 28 cm, an improvement of 13× over a naive radar+camera baseline and 23× compared to monocular depth estimation.
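
To make the fusion idea concrete, here is a minimal Python/numpy sketch of camera-gated radar decluttering: a segmentation mask from the camera bounds the object's azimuth extent, and only radar returns inside that gate are considered when reading off depth. The function name, the range-azimuth map layout, and the choice of simply taking the strongest gated return are illustrative assumptions, not Metamoran's actual algorithm.

```python
import numpy as np

def gated_depth_estimate(range_azimuth_map, ranges, azimuths,
                         gate_az_min, gate_az_max):
    """Estimate object depth from a radar range-azimuth power map,
    restricted to the azimuth extent of a camera segmentation mask.

    range_azimuth_map : (n_ranges, n_azimuths) power map
    ranges            : (n_ranges,) range bin centers in meters
    azimuths          : (n_azimuths,) azimuth bin centers in radians
    gate_az_min/max   : azimuth extent of the segmented object (radians)
    """
    # Keep only the radar columns that overlap the camera-derived gate,
    # discarding clutter everywhere else in the scene.
    cols = np.where((azimuths >= gate_az_min) & (azimuths <= gate_az_max))[0]
    if cols.size == 0:
        raise ValueError("segmentation gate covers no radar azimuth bins")
    gated = range_azimuth_map[:, cols]

    # Report the strongest reflection inside the gate as the object.
    r_idx, a_idx = np.unravel_index(np.argmax(gated), gated.shape)
    return ranges[r_idx], azimuths[cols[a_idx]]
```

This sketch only captures the gating intuition; the paper's pipeline additionally contends with weak reflections and spurious artifacts at long range.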
Award ID(s):
1823235
PAR ID:
10359547
Author(s) / Creator(s):
Date Published:
Journal Name:
IROS
ISSN:
0166-5464
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The ubiquity of millimeter-wave (mmWave) technology could bring through-obstruction imaging to portable, mobile systems. Existing through-obstruction imaging systems rely on the Synthetic Aperture Radar (SAR) technique, but emulating the SAR principle on hand-held devices has been challenging. We propose ViSAR, a portable platform that integrates an optical camera and mmWave radar to emulate the SAR principle and enable through-obstruction 3D imaging. ViSAR synchronizes the devices at the software level and uses the Time Domain Backprojection algorithm to generate vision-augmented mmWave images. We have experimentally evaluated ViSAR by imaging several indoor objects.
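
The Time Domain Backprojection algorithm that ViSAR builds on has a compact textbook form: every range-compressed pulse is sampled at each pixel's round-trip delay, phase-corrected, and coherently summed. The sketch below is a generic monostatic implementation under assumed variable names and data layout, not ViSAR's code.

```python
import numpy as np

C = 3e8  # propagation speed, m/s

def backproject(pulses, antenna_positions, pixel_grid, fc, range_bins):
    """Generic monostatic time-domain backprojection.

    pulses            : (n_pulses, n_bins) complex range-compressed data
    antenna_positions : (n_pulses, 3) antenna phase-center positions, m
    pixel_grid        : (n_pixels, 3) scene points to image, m
    fc                : carrier frequency, Hz
    range_bins        : (n_bins,) increasing range axis of the pulses, m
    """
    image = np.zeros(len(pixel_grid), dtype=complex)
    for pulse, pos in zip(pulses, antenna_positions):
        # Distance from this aperture position to every pixel.
        r = np.linalg.norm(pixel_grid - pos, axis=1)
        # Sample the pulse at each pixel's delay (real and imaginary
        # parts separately, since np.interp is real-valued).
        s = np.interp(r, range_bins, pulse.real) \
            + 1j * np.interp(r, range_bins, pulse.imag)
        # Undo the two-way carrier phase, then accumulate coherently.
        image += s * np.exp(1j * 4 * np.pi * fc * r / C)
    return np.abs(image)
```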
  2. Unsupervised monocular depth estimation techniques have demonstrated encouraging results but typically assume that the scene is static. These techniques suffer when trained on dynamical scenes, where apparent object motion can equally be explained by hypothesizing the object's independent motion, or by altering its depth. This ambiguity causes depth estimators to predict erroneous depth for moving objects. To resolve this issue, we introduce Dynamo-Depth, a unifying approach that disambiguates dynamical motion by jointly learning monocular depth, 3D independent flow field, and motion segmentation from unlabeled monocular videos. Specifically, we offer our key insight that a good initial estimation of motion segmentation is sufficient for jointly learning depth and independent motion despite the fundamental underlying ambiguity. Our proposed method achieves state-of-the-art performance on monocular depth estimation on the Waymo Open [34] and nuScenes [3] datasets, with significant improvement in the depth of moving objects. Code and additional results are available at https://dynamo-depth.github.io.
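
The ambiguity this abstract describes shows up in how total 2-D motion is composed: ego-motion acting on depth produces a rigid flow, and anything left over must be attributed to independent object motion, which the motion segmentation gates. Below is a hedged PyTorch sketch of that composition under standard pinhole conventions; the function names and tensor shapes are illustrative, not the paper's implementation (see https://dynamo-depth.github.io for the actual code).

```python
import torch

def rigid_flow(depth, K, K_inv, T):
    """2-D flow induced purely by camera ego-motion on a static scene:
    back-project pixels with depth, move them by the relative pose T,
    and re-project. depth: (H, W); K, K_inv: (3, 3); T: (4, 4)."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32),
                          indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)   # (H, W, 3)
    pts = depth[..., None] * (pix @ K_inv.T)                # camera-frame 3D
    pts = torch.cat([pts, torch.ones(H, W, 1)], dim=-1) @ T.T
    proj = pts[..., :3] @ K.T
    uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)
    return uv - pix[..., :2]

def composite_flow(rigid, independent, motion_mask):
    # Static pixels are explained by ego-motion alone; only where the
    # motion segmentation fires is independent object flow added.
    return rigid + motion_mask * independent
```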
  3. Millimeter-wave (mmWave) radar can enable high-resolution human pose estimation with low cost and computational requirements. However, the mmWave point cloud, the primary input to processing algorithms, is highly sparse and carries significantly less information than alternatives such as video frames. Furthermore, the scarcity of labeled mmWave data impedes the development of machine learning (ML) models that can generalize to unseen scenarios. We propose a fast and scalable human pose estimation (FUSE) framework that combines multi-frame representation and meta-learning to address these challenges. Experimental evaluations show that FUSE adapts to unseen scenarios 4× faster than current supervised learning approaches and estimates human joint coordinates with about 7 cm mean absolute error.
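
One common way to realize the "multi-frame representation" idea mentioned above is to stack several consecutive sparse point clouds and tag each point with its frame age, giving the model a denser input. This is a generic numpy sketch of that trick with hypothetical names; FUSE's actual representation may differ.

```python
import numpy as np

def multi_frame_cloud(frames, num_frames=5):
    """Stack the most recent sparse mmWave point clouds into one denser
    input, tagging each point with its frame age (0 = newest) so a
    downstream model can weight fresh returns over stale ones.

    frames : list of (n_i, 3) xyz arrays, ordered oldest to newest
    returns: (sum of n_i, 4) array of [x, y, z, age]
    """
    stacked = []
    for age, cloud in enumerate(reversed(frames[-num_frames:])):
        tagged = np.hstack([cloud, np.full((len(cloud), 1), float(age))])
        stacked.append(tagged)
    return np.vstack(stacked)
```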
  4. Millimeter wave (mmWave) sensing has recently gained attention for its robustness in challenging environments. When visual sensors such as cameras fail to perform, mmWave radars can be used to provide reliable performance. However, the poor scattering performance and lack of texture at millimeter wavelengths can make it difficult for radars to precisely identify objects in some situations. In this paper, we take insight from camera fiducials, which are very easily identifiable by a camera, and present R-fiducial tags, which smartly augment the current infrastructure to enable myriad applications with mmWave radars. R-fiducial acts as a fiducial for mmWave sensing, similar to camera fiducials, and can be reliably identified by an mmWave radar. We identify a set of requirements for millimeter-wave fiducials and show how R-fiducial meets them all. R-fiducial uses a novel spread-spectrum modulation technique to provide low latency with high reliability. Our evaluations show that R-fiducial can be reliably detected with a 100% detection rate up to 25 meters, with a 120-degree field of view and a few milliseconds of latency. We also conduct experiments and case studies in adverse and low-visibility conditions to demonstrate the potential of R-fiducial in a variety of applications.
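
A spread-spectrum fiducial is typically detected by correlating the radar's slow-time signal against the tag's known spreading code and thresholding the resulting peak. The sketch below shows that generic detector in numpy; the code sequence, threshold, and signal layout are assumptions for illustration, since the abstract does not specify R-fiducial's exact modulation.

```python
import numpy as np

def detect_fiducial(rx, code, threshold=8.0):
    """Detect a spread-spectrum tag in a radar's slow-time signal by
    correlating against the tag's known spreading code.

    rx        : 1-D complex slow-time samples from one range bin
    code      : 1-D +/-1 spreading sequence
    threshold : required peak-to-noise-floor ratio
    """
    corr = np.abs(np.correlate(rx, code.astype(complex), mode="valid"))
    noise_floor = np.median(corr) + 1e-12   # robust noise estimate
    peak = int(corr.argmax())
    detected = corr[peak] / noise_floor > threshold
    return detected, peak   # peak index gives the code's time offset
```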