Video scene analysis is a well-studied area in which researchers have devoted substantial effort to detecting and classifying the people and objects in a scene. However, real-life scenes are more complex: the intrinsic states of objects (e.g., machine operating states or human vital signs) are often overlooked by vision-based scene analysis. Recent work has proposed a radio frequency (RF) sensing technique, wireless vibrometry, that employs wireless signals to sense subtle vibrations from objects and infer their internal states. We envision that combining video scene analysis with wireless vibrometry forms a more comprehensive understanding of the scene, namely "rich scene analysis". However, the RF sensors used in wireless vibrometry only provide time series, and it is challenging to associate these time series with multiple real-world objects. We propose a real-time RF-vision sensor fusion system, Capricorn, that efficiently builds a cross-modal correspondence between visual pixels and RF time series to better understand the complex nature of a scene. The vision sensors in Capricorn model the surrounding environment in 3D and obtain the distances of different objects. In the RF domain, distance is proportional to the signal time-of-flight (ToF), so we can leverage the ToF to separate the RF time series corresponding to each object. The RF-vision sensor fusion in Capricorn brings multiple benefits. The vision sensors provide environmental context to guide the processing of RF data, which helps us select the most appropriate algorithms and models. Meanwhile, the RF sensor yields additional information that is invisible to vision sensors, providing insight into objects' intrinsic states. Our extensive evaluations show that Capricorn monitors the operating status of multiple appliances in real time with over 97% accuracy and recovers vital signals such as respiration from multiple people. A video (https://youtu.be/b-5nav3Fi78) demonstrates the capability of Capricorn.
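To make the ToF-based separation concrete, the sketch below shows one way it could work, assuming a radar that reports a range-profile matrix (slow-time frames × range bins) and a vision pipeline that reports each object's distance in meters. The function names, range resolution, and frame rate are illustrative assumptions, not Capricorn's actual interface.

```python
# A minimal sketch of ToF-based separation: vision-measured distance
# selects the radar range bin whose slow-time series carries that
# object's vibration. All parameters below are illustrative.
import numpy as np

def extract_object_series(profiles, object_dist_m, range_res, frame_rate):
    """Slice out the slow-time series at the range bin matching an
    object's vision-measured distance, then estimate its dominant
    vibration frequency."""
    bin_idx = int(round(object_dist_m / range_res))   # distance -> ToF bin
    series = profiles[:, bin_idx]                     # slow-time signal
    series = series - series.mean()                   # remove DC component
    spectrum = np.abs(np.fft.rfft(series))
    freqs = np.fft.rfftfreq(len(series), d=1.0 / frame_rate)
    peak_hz = freqs[np.argmax(spectrum[1:]) + 1]      # skip the 0 Hz bin
    return series, peak_hz

# Example: two objects reported by the vision pipeline at 1.2 m and 2.5 m.
rng = np.random.default_rng(0)
profiles = rng.standard_normal((512, 128))            # stand-in radar data
for dist in (1.2, 2.5):
    _, f = extract_object_series(profiles, dist, range_res=0.05, frame_rate=100)
    print(f"object at {dist} m: dominant vibration ~{f:.1f} Hz")
```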
Demo abstract: Towards Real-Time Rich Scene Analysis Using Vision-Guided Wireless Vibrometry
Intelligent systems commonly employ vision sensors such as cameras to analyze a scene. Recent work has proposed a wireless sensing technique, wireless vibrometry, to enrich the scene analysis produced by vision sensors. Wireless vibrometry employs wireless signals to sense subtle vibrations from objects and infer their internal states. However, it is difficult for pure radio-frequency (RF) sensing systems to obtain objects' visual appearances (e.g., object types and locations), especially when an object is inactive. Thus, most existing wireless vibrometry systems assume that the number and types of objects in the scene are known. The key to removing these assumptions is to build a connection between wireless sensor time series and vision sensor images. We present Capricorn, a vision-guided wireless vibrometry system. In Capricorn, the object type information from vision sensors guides the wireless vibrometry system to select the most appropriate signal processing pipeline. The object tracking capability of computer vision also helps the wireless system efficiently detect and separate vibrations from multiple objects in real time.
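The pipeline-selection step naturally maps to a dispatch table keyed by the detected object class. Below is a minimal sketch under assumed labels and band limits (respiration near 0.1-0.5 Hz for people, tens of hertz for appliance motors); none of these names or numbers come from the paper.

```python
# A minimal sketch of vision-guided pipeline selection: the detector's
# object label picks the signal-processing pipeline applied to the
# object's RF time series. Labels and band limits are assumptions.
from scipy.signal import butter, sosfiltfilt

def bandpass(series, lo_hz, hi_hz, fs):
    """Zero-phase band-pass filter for a slow-time vibration series."""
    sos = butter(4, [lo_hz, hi_hz], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, series)

# Object class (from the vision detector) -> RF processing pipeline.
PIPELINES = {
    "person":    lambda s, fs: bandpass(s, 0.1, 0.5, fs),    # respiration band
    "appliance": lambda s, fs: bandpass(s, 20.0, 120.0, fs), # motor vibration
}

def process(detected_label, rf_series, fs):
    pipeline = PIPELINES.get(detected_label)
    if pipeline is None:
        return None            # unknown object: skip rather than guess
    return pipeline(rf_series, fs)

# Usage: route a 20 s, 50 Hz slow-time series detected as a person.
import numpy as np
resp = process("person", np.random.randn(1000), fs=50)
```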
- Award ID(s): 1705135
- PAR ID: 10494741
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the 20th ACM Conference on Embedded Networked Sensor Systems
- ISBN: 9781450398862
- Page Range / eLocation ID: 786 to 787
- Format(s): Medium: X
- Location: Boston, Massachusetts
- Sponsoring Org: National Science Foundation
More Like this
- This paper presents methods for improved teleoperation in dynamic environments in which the objects to be manipulated are moving, but vision may not meet size, biocompatibility, or maneuverability requirements. In such situations, the object could be tracked through non-geometric means, such as heat, radioactivity, or other markers. In order to safely explore a region, we use an optical time-of-flight pretouch sensor to detect (and range) target objects prior to contact. Information from these sensors is presented to the user via haptic virtual fixtures. This combination of techniques allows the teleoperator to "feel" the object without an actual contact event between the robot and the target object. Thus it provides the perceptual benefits of touch interaction to the operator, without incurring the negative consequences of the robot contacting unknown geometrical structures; premature contact can lead to damage or unwanted displacement of the target. The authors propose that as the geometry of the scene transitions from completely unknown to partially explored, haptic virtual fixtures can both prevent collisions and guide the user towards areas of interest, thus improving exploration speed. Experimental results show that for situations that are not amenable to vision, haptically-presented pretouch sensor information allows operators to more effectively explore moving objects.
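As a concrete reading of the virtual-fixture idea, the sketch below maps a single pretouch range measurement to a repulsive haptic force that engages inside a standoff distance. The stiffness, standoff distance, and function names are assumptions for illustration, not values from the paper.

```python
# A minimal sketch of rendering a pretouch range reading as a haptic
# virtual fixture: inside an assumed standoff distance the operator
# feels a spring-like force opposing the approach direction.
import numpy as np

K_FIXTURE = 200.0      # N/m, fixture stiffness (assumed)
D_SAFE = 0.03          # m, standoff distance at which the fixture engages

def fixture_force(range_m, approach_dir):
    """Map one time-of-flight range reading to a repulsive force vector
    opposing the approach direction."""
    if range_m >= D_SAFE:
        return np.zeros(3)                    # outside fixture: free motion
    penetration = D_SAFE - range_m
    return -K_FIXTURE * penetration * np.asarray(approach_dir)

print(fixture_force(0.01, [0.0, 0.0, 1.0]))   # pushes back along -z
```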
- In this work, we propose a novel method to supervise 3D Gaussian Splatting (3DGS) scenes using optical tactile sensors. Optical tactile sensors have become widespread in their use in robotics for manipulation and object representation; however, raw optical tactile sensor data is unsuitable to directly supervise a 3DGS scene. Our representation leverages a Gaussian Process Implicit Surface to implicitly represent the object, combining many touches into a unified representation with uncertainty. We merge this model with a monocular depth estimation network, which is aligned in a two-stage process: coarsely aligning with a depth camera and then finely adjusting to match our touch data. For every training image, our method produces a corresponding fused depth and uncertainty map. Utilizing this additional information, we propose a new loss function, variance-weighted depth supervised loss, for training the 3DGS scene model. We leverage the DenseTact optical tactile sensor and RealSense RGB-D camera to show that combining touch and vision in this manner leads to quantitatively and qualitatively better results than vision or touch alone in few-view scene synthesis on opaque as well as reflective and transparent objects. Please see our project page at armlabstanford.github.io/touch-gs.
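One plausible form of the variance-weighted depth supervision is an L1 depth error weighted by the inverse of the fused uncertainty, so pixels far from any touch contribute less. The sketch below implements that reading; it is not necessarily the authors' exact loss, and the tensor shapes are illustrative.

```python
# A plausible form of variance-weighted depth supervision: per-pixel
# depth error down-weighted by the fused uncertainty. A sketch of the
# general idea, not the authors' exact loss.
import torch

def variance_weighted_depth_loss(pred_depth, fused_depth, variance, eps=1e-6):
    """L1 depth error, weighted by inverse variance so uncertain pixels
    (e.g., far from any touch) contribute less to the gradient."""
    weight = 1.0 / (variance + eps)
    return (weight * (pred_depth - fused_depth).abs()).mean()

# Example with random stand-in maps for one training view.
pred = torch.rand(1, 480, 640, requires_grad=True)
fused = torch.rand(1, 480, 640)
var = torch.rand(1, 480, 640) * 0.1
loss = variance_weighted_depth_loss(pred, fused, var)
loss.backward()
```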
- We present a method for reconstructing the 3D shape of arbitrary Lambertian objects based on measurements by miniature, energy-efficient, low-cost single-photon cameras. These cameras, operating as time-resolved image sensors, illuminate the scene with a very fast pulse of diffuse light and record the shape of that pulse as it returns back from the scene at a high temporal resolution. We propose to model this image formation process, account for its non-idealities, and adapt neural rendering to reconstruct 3D geometry from a set of spatially distributed sensors with known poses. We show that our approach can successfully recover complex 3D shapes from simulated data. We further demonstrate 3D object reconstruction from real-world captures, utilizing measurements from a commodity proximity sensor. Our work draws a connection between image-based modeling and active range scanning and is a step towards 3D vision with single-photon cameras.
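The image-formation model such sensors imply can be sketched compactly: the recorded photon histogram is the emitted pulse delayed by the round-trip time 2d/c and attenuated with distance. The bin width, histogram length, and Gaussian pulse below are illustrative assumptions, not the paper's calibrated model.

```python
# A minimal sketch of the transient image-formation model: one pixel's
# photon histogram is the emitted pulse shifted by the round-trip ToF
# and scaled by a 1/d^2 falloff. Constants are illustrative.
import numpy as np

C = 3e8                       # speed of light, m/s
BIN_S = 100e-12               # 100 ps histogram bins (assumed)
N_BINS = 256

def transient_histogram(depth_m, albedo, pulse):
    """Render one pixel's photon histogram for a Lambertian point."""
    t_bins = np.arange(N_BINS) * BIN_S
    delay = 2.0 * depth_m / C                      # round-trip ToF
    shifted = np.interp(t_bins - delay, np.arange(len(pulse)) * BIN_S, pulse,
                        left=0.0, right=0.0)       # delayed pulse shape
    return albedo * shifted / max(depth_m ** 2, 1e-6)  # 1/d^2 falloff

pulse = np.exp(-0.5 * ((np.arange(32) - 8) / 2.0) ** 2)  # Gaussian pulse
hist = transient_histogram(depth_m=1.5, albedo=0.7, pulse=pulse)
print("peak bin:", hist.argmax(), "->", hist.argmax() * BIN_S * C / 2, "m")
```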
- Multi-sensor fusion has been widely used by autonomous vehicles (AVs) to integrate perception results from different sensing modalities, including LiDAR, camera, and radar. Despite the rapid development of multi-sensor fusion systems in autonomous driving, their vulnerability to malicious attacks has not been well studied. Although some prior works have studied attacks against the perception systems of AVs, they only consider a single sensing modality or a camera-LiDAR fusion system and thus cannot attack a sensor fusion system based on LiDAR, camera, and radar. To fill this research gap, in this paper we present the first study on the vulnerability of multi-sensor fusion systems that employ LiDAR, camera, and radar. Specifically, we propose a novel attack method that can simultaneously attack all three sensing modalities using a single type of adversarial object. The adversarial object can be easily fabricated at low cost, and the proposed attack can be easily performed with high stealthiness and flexibility in practice. Extensive experiments based on a real-world AV testbed show that the proposed attack can continuously hide a target vehicle from the perception system of a victim AV using only two small adversarial objects.