- PAR ID:
- 10475942
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE Robotics and Automation Letters
- Volume:
- 8
- Issue:
- 10
- ISSN:
- 2377-3774
- Page Range / eLocation ID:
- 6843 to 6850
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Video scene analysis is a well-investigated area where researchers have devoted efforts to detecting and classifying people and objects in the scene. However, real-life scenes are more complex: the intrinsic states of objects (e.g., machine operating states or human vital signs) are often overlooked by vision-based scene analysis. Recent work has proposed a radio frequency (RF) sensing technique, wireless vibrometry, that employs wireless signals to sense subtle vibrations from objects and infer their internal states. We envision that combining video scene analysis with wireless vibrometry forms a more comprehensive understanding of the scene, namely "rich scene analysis". However, the RF sensors used in wireless vibrometry provide only time series, and it is challenging to associate these data with multiple real-world objects. We propose a real-time RF-vision sensor fusion system, Capricorn, that efficiently builds a cross-modal correspondence between visual pixels and RF time series to better understand the complex nature of a scene. The vision sensors in Capricorn model the surrounding environment in 3D and obtain the distances of different objects. In the RF domain, distance is proportional to the signal time-of-flight (ToF), and we can leverage the ToF to separate the RF time series corresponding to each object. The RF-vision sensor fusion in Capricorn brings multiple benefits. The vision sensors provide environmental context to guide the processing of RF data, which helps us select the most appropriate algorithms and models. Meanwhile, the RF sensor yields additional information that is invisible to vision sensors, providing insight into objects' intrinsic states. Our extensive evaluations show that Capricorn monitors the operating status of multiple appliances in real time with over 97% accuracy and recovers vital signs such as respiration from multiple people. A video (https://youtu.be/b-5nav3Fi78) demonstrates the capability of Capricorn.
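As a concrete illustration of the ToF-based association step described in this abstract, the minimal Python sketch below shows how vision-measured object distances could be used to pick out the RF time series at the matching range bins. The array shapes, the `range_resolution_m` parameter, and the object names are illustrative assumptions, not Capricorn's actual interfaces.

```python
import numpy as np

# Minimal sketch (not Capricorn's implementation): associating vision-measured
# object distances with range-resolved RF time series. Assumes the radar front
# end already produces a range profile per frame, i.e. a (num_frames, num_bins)
# array whose bin index maps linearly to distance via `range_resolution_m`.

def extract_object_series(range_profiles: np.ndarray,
                          object_distances_m: dict,
                          range_resolution_m: float) -> dict:
    """Return one RF time series per object, selected by its vision-measured distance."""
    series = {}
    num_bins = range_profiles.shape[1]
    for name, d in object_distances_m.items():
        bin_idx = int(round(d / range_resolution_m))   # distance -> ToF -> range bin
        bin_idx = min(max(bin_idx, 0), num_bins - 1)
        series[name] = range_profiles[:, bin_idx]      # vibration signal of that object
    return series

# Hypothetical usage: two objects reported by the depth camera.
profiles = np.random.randn(1000, 256)                  # stand-in for real radar data
objects = {"fridge": 1.8, "person": 3.2}               # metres, from the vision pipeline
per_object = extract_object_series(profiles, objects, range_resolution_m=0.04)
print({k: v.shape for k, v in per_object.items()})
```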
-
Conventional continuous-wave amplitude-modulated time-of-flight (CWAM ToF) cameras suffer from a fundamental trade-off between light throughput and depth of field (DoF): a larger lens aperture allows more light collection but results in significantly lower DoF. However, both high light throughput, which increases signal-to-noise ratio, and a wide DoF, which enlarges the system’s applicable depth range, are valuable for CWAM ToF applications. In this work, we propose EDoF-ToF, an algorithmic method to extend the DoF of large-aperture CWAM ToF cameras by using a neural network to deblur objects outside of the lens’s narrow focal region and thus produce an all-in-focus measurement. A key component of our work is the proposed large-aperture ToF training data simulator, which models the depth-dependent blurs and partial occlusions caused by such apertures. Contrary to conventional image deblurring where the blur model is typically linear, ToF depth maps are nonlinear functions of scene intensities, resulting in a nonlinear blur model that we also derive for our simulator. Unlike extended DoF for conventional photography where depth information needs to be encoded (or made depth-invariant) using additional hardware (phase masks, focal sweeping, etc.), ToF sensor measurements naturally encode depth information, allowing a completely software solution to extended DoF. We experimentally demonstrate that EDoF-ToF increases the DoF of a conventional ToF system by 3.6×, effectively achieving the DoF of a smaller lens aperture that allows 22.1× less light. Ultimately, EDoF-ToF enables CWAM ToF cameras to enjoy the benefits of both high light throughput and a wide DoF.
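To see why the blur model for ToF depth is nonlinear, note that in the standard four-bucket decoding used by CWAM ToF cameras, depth is an arctangent of the raw correlation samples, so blurring the raw samples and then decoding is not the same as blurring the decoded depth map. The hedged sketch below illustrates this with two pixels of different reflectance; the modulation frequency and values are assumptions for illustration, not the paper's simulator.

```python
import numpy as np

C = 3e8              # speed of light (m/s)
F_MOD = 20e6         # assumed modulation frequency (Hz); unambiguous range ~7.5 m

def depth_from_correlations(c0, c90, c180, c270):
    """Decode depth from four phase-shifted correlation samples (nonlinear in the raw data)."""
    phase = np.mod(np.arctan2(c90 - c270, c0 - c180), 2 * np.pi)
    return C * phase / (4 * np.pi * F_MOD)

# Two neighbouring pixels: depths 1 m and 3 m, with very different reflectance.
depths = np.array([1.0, 3.0])
amps = np.array([1.0, 0.2])
phi = 4 * np.pi * F_MOD * depths / C
c0, c90 = amps * np.cos(phi), amps * np.cos(phi - np.pi / 2)
c180, c270 = amps * np.cos(phi - np.pi), amps * np.cos(phi - 3 * np.pi / 2)

blur = lambda x: np.full_like(x, x.mean())       # toy 2-pixel "defocus" blur

# Blurring the raw samples and then decoding is biased toward the brighter pixel...
print(depth_from_correlations(blur(c0), blur(c90), blur(c180), blur(c270)))  # ~[1.24, 1.24]
# ...whereas blurring the decoded depth map gives the plain average depth.
print(blur(depth_from_correlations(c0, c90, c180, c270)))                    # [2.0, 2.0]
```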
-
We introduce Doppler time-of-flight (D-ToF) rendering, an extension of ToF rendering for dynamic scenes, with applications in simulating D-ToF cameras. D-ToF cameras use high-frequency modulation of illumination and exposure, and measure the Doppler frequency shift to compute the radial velocity of dynamic objects. The time-varying scene geometry and high-frequency modulation functions used in such cameras make it challenging to accurately and efficiently simulate their measurements with existing ToF rendering algorithms. We overcome these challenges in two ways. To achieve accuracy, we derive path integral expressions for D-ToF measurements under global illumination and form unbiased Monte Carlo estimates of these integrals. To achieve efficiency, we develop a tailored time-path sampling technique that combines antithetic time sampling with correlated path sampling. We show experimentally that our sampling technique achieves up to two orders of magnitude lower variance compared to naive time-path sampling. We provide an open-source simulator that serves as a digital twin for D-ToF imaging systems, allowing imaging researchers, for the first time, to investigate the impact of modulation functions, material properties, and global illumination on D-ToF imaging performance.
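For intuition about the quantity a D-ToF camera measures, the textbook Doppler relation links the frequency shift of the modulated illumination to radial velocity. The short sketch below uses this standard relation with an assumed modulation frequency; it is not the paper's path-integral estimator.

```python
# Back-of-the-envelope sketch of the Doppler relation a D-ToF camera exploits:
# a target moving with radial velocity v shifts the modulation frequency f_mod
# of the returned signal by roughly delta_f = 2 * v * f_mod / c, so measuring
# delta_f recovers v. The numbers are illustrative assumptions.

C = 3e8          # speed of light (m/s)
F_MOD = 30e6     # assumed illumination modulation frequency (Hz)

def radial_velocity(delta_f_hz: float, f_mod_hz: float = F_MOD) -> float:
    """Radial velocity (m/s) from a measured Doppler shift of the modulation."""
    return delta_f_hz * C / (2.0 * f_mod_hz)

# A 2 Hz shift at 30 MHz modulation corresponds to a 10 m/s radial velocity.
print(radial_velocity(2.0))   # -> 10.0
```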
-
Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves the robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors now available on modern smartphones.
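A generic continuous-wave ToF image formation model of the kind such a method can supervise against looks like the sketch below: each pixel's raw correlation samples are sinusoidal in a depth-dependent phase, scaled by reflectance. The symbols, modulation frequency, and shapes here are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch of a generic CW-ToF measurement model: raw correlation samples
# b_k = A * cos(phi + psi_k) + offset, where phi encodes depth and A encodes
# reflectance. Values are assumptions for illustration only.

C = 3e8
F_MOD = 30e6                                   # assumed modulation frequency (Hz)

def raw_tof_measurements(depth_m, amplitude, offset=0.0,
                         phase_steps=(0.0, np.pi / 2, np.pi, 3 * np.pi / 2)):
    """Simulate raw correlation samples for each pixel and phase step."""
    phi = 4 * np.pi * F_MOD * np.asarray(depth_m) / C          # depth -> phase
    return np.stack([amplitude * np.cos(phi + psi) + offset for psi in phase_steps])

depth = np.array([[1.2, 2.5], [4.0, 6.9]])     # toy 2x2 depth map (metres); the 6.9 m pixel
                                               # exceeds the ~5 m unambiguous range at 30 MHz,
                                               # illustrating the phase-wrapping issue above
amp = np.array([[1.0, 0.8], [0.3, 0.05]])      # low reflectance in the last pixel
print(raw_tof_measurements(depth, amp).shape)  # (4, 2, 2): one frame per phase step
```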