This project introduces a framework that enables robots to recognize human hand signals, a reliable, device-free means of communication in noisy environments such as construction sites and airport ramps, to facilitate efficient human-robot collaboration. Hand signal systems are adopted by many small groups for specific purposes, such as aircraft marshalling on airport ramps and crane operations on construction sites. Robots must be robust to unpredictable conditions, including varying backgrounds and human appearances, a severe challenge imposed by open environments. To address these challenges, we propose Instant Hand Signal Recognition (IHSR), a learning-based framework that embeds world knowledge of human gestures and allows robots to learn novel hand signals from a few samples. It also offers robust zero-shot generalization for recognizing learned signals in novel scenarios. Extensive experiments show that IHSR can learn a novel hand signal from only 50 samples, over 30 times more sample-efficient than the state-of-the-art method, and that a learned model can be deployed in unseen environments to recognize hand signals from unseen human users.
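A minimal sketch of the few-shot idea described in this abstract: a frozen, pretrained gesture encoder supplies embeddings, and a lightweight prototype (nearest-centroid) classifier is fit on roughly 50 samples per new signal. The encoder and data here are stand-ins; IHSR's actual architecture is not specified in the abstract.

```python
# Hypothetical few-shot hand-signal recognition sketch (not the IHSR implementation).
import numpy as np

def encode(pose_sequence: np.ndarray) -> np.ndarray:
    """Stand-in for a pretrained encoder: maps a (T, J, 2) keypoint sequence
    to a fixed-length embedding. A real system would use a learned network."""
    return pose_sequence.reshape(pose_sequence.shape[0], -1).mean(axis=0)

class PrototypeClassifier:
    """Learn one prototype (mean embedding) per hand-signal class."""
    def fit(self, embeddings: np.ndarray, labels: np.ndarray) -> None:
        self.classes_ = np.unique(labels)
        self.prototypes_ = np.stack(
            [embeddings[labels == c].mean(axis=0) for c in self.classes_]
        )

    def predict(self, embeddings: np.ndarray) -> np.ndarray:
        dists = np.linalg.norm(
            embeddings[:, None, :] - self.prototypes_[None, :, :], axis=-1
        )
        return self.classes_[dists.argmin(axis=1)]

# Example: 50 synthetic support samples for each of two new signals.
rng = np.random.default_rng(0)
support = [rng.normal(loc=c, size=(30, 17, 2)) for c in (0.0, 1.0) for _ in range(50)]
labels = np.array([c for c in (0, 1) for _ in range(50)])
emb = np.stack([encode(s) for s in support])

clf = PrototypeClassifier()
clf.fit(emb, labels)
print(clf.predict(emb[:5]))  # predictions for the first few support samples
```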
ViSig: Automatic Interpretation of Visual Body Signals Using On-Body Sensors
Visual body signals are designated body poses that deliver an application-specific message. Such signals are widely used for fast message communication in sports (signaling by umpires and referees), transportation (naval officers and aircraft marshallers), and construction (signaling by riggers and crane operators), to list a few examples. Automatic interpretation of such signals can help maintain safer operations in these industries, support record-keeping for auditing or accident investigation, and serve as a score-keeper in sports. When automation of these signals is desired, it is traditionally performed from a viewer's perspective by running computer vision algorithms on camera feeds. However, computer vision based approaches suffer from performance deterioration under lighting variations, occlusions, and similar conditions, may face resolution limitations, and can be challenging to install. Our work, ViSig, breaks with tradition by instead deploying on-body sensors for signal interpretation. Our key innovation is the fusion of ultra-wideband (UWB) sensors for capturing on-body distance measurements, inertial sensors (IMU) for capturing the orientation of a few body segments, and photodiodes for finger signal recognition, enabling robust interpretation of signals. By deploying only a small number of sensors, we show that body signals can be interpreted unambiguously in many different settings, including games of cricket, baseball, and football, and in operational safety use-cases such as crane operations and flag semaphores for maritime navigation, with > 90% accuracy. Overall, we have seen substantial promise in this approach and expect a large body of future follow-on work to use UWB and IMU fused modalities for more general human pose estimation problems.
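A minimal sketch of the fusion idea described in the ViSig abstract: on-body UWB distance measurements, IMU segment orientations, and photodiode readings are concatenated into one feature vector and matched against per-signal templates. The feature layout, template values, and nearest-template rule are illustrative assumptions, not the paper's actual pipeline.

```python
# Hedged sensor-fusion sketch for body-signal classification (illustrative only).
import numpy as np

def make_feature(uwb_distances_m, imu_pitch_roll_deg, photodiode_levels):
    """Concatenate the three modalities into one normalized feature vector."""
    return np.concatenate([
        np.asarray(uwb_distances_m) / 2.0,        # body-scale distances -> ~[0, 1]
        np.asarray(imu_pitch_roll_deg) / 180.0,   # angles -> [-1, 1]
        np.asarray(photodiode_levels),            # already 0/1 per finger
    ])

# Illustrative templates for two crane signals (values are made up).
templates = {
    "hoist": make_feature([0.4, 0.4, 1.6], [85, 0, -10, 0], [1, 1, 0, 0, 0]),
    "swing": make_feature([1.5, 0.5, 0.9], [5, 0, 80, 0], [0, 0, 0, 0, 1]),
}

def classify(feature, templates):
    """Nearest-template classification of an observed body signal."""
    return min(templates, key=lambda name: np.linalg.norm(feature - templates[name]))

observation = make_feature([0.45, 0.38, 1.55], [80, 2, -8, 1], [1, 1, 0, 0, 0])
print(classify(observation, templates))  # -> "hoist"
```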
- Award ID(s): 2145278
- PAR ID: 10409365
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Volume: 7
- Issue: 1
- ISSN: 2474-9567
- Page Range / eLocation ID: 1 to 27
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Sensors in and around the environment have become ubiquitous, ushering in the concept of smart animal agriculture, which can greatly improve animal health and productivity through remote health monitoring, a necessity at a time of great demand for animal products. Data gathered from sensors dwelling in animal agriculture settings have made farms part of the Internet of Things, prompting active research into efficient communication methodologies for farm networks. This study focuses on the first hop of any such farm network, where data from inside the body of the animal must be communicated to a node outside the body. In this paper, we use novel experimental methods to measure the channel loss at sub-GHz frequencies of 100-900 MHz and characterize the in-body to out-of-body communication channel in large animals. A first-of-its-kind 3D bovine model, built with computer vision techniques to capture detailed morphological features of the animal body, is used to perform Finite Element Method based electromagnetic simulations. The simulation results are experimentally validated, yielding a complete channel modeling methodology for in-body to out-of-body animal body communication. The experimentally validated 3D bovine model is publicly available at https://github.com/SparcLab/Bovine-FEM-Model.git on GitHub. The results show that an in-body to out-of-body communication channel from the rumen to the collar of ruminants is realizable with ≤ 90 dB path loss at sub-GHz frequencies (100-900 MHz), making communication feasible. The methodology is illustrated for ruminants but can also be used for other in-body to out-of-body studies. Using the developed channel modeling technique, an efficient communication architecture can be designed for in-body to out-of-body communication in animals, paving the way for future smart animal agriculture systems.
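A hedged back-of-the-envelope link-budget check motivated by the reported ≤ 90 dB rumen-to-collar path loss. The transmit power, receiver sensitivity, and fade margin below are assumed example values, not numbers from the paper.

```python
# Simple link-budget feasibility check (illustrative values).
def link_feasible(tx_power_dbm: float, path_loss_db: float,
                  rx_sensitivity_dbm: float, margin_db: float = 10.0) -> bool:
    """A link closes when received power exceeds sensitivity plus a fade margin."""
    rx_power_dbm = tx_power_dbm - path_loss_db
    return rx_power_dbm >= rx_sensitivity_dbm + margin_db

# Example: 0 dBm in-body transmitter, 90 dB channel loss, -110 dBm receiver.
print(link_feasible(tx_power_dbm=0.0, path_loss_db=90.0,
                    rx_sensitivity_dbm=-110.0))  # True -> communication feasible
```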
-
Evolution has honed predatory skills in the natural world, where localizing and intercepting fast-moving prey is required. The current generation of robotic systems mimics these biological systems using deep learning. High-speed processing of camera frames with convolutional neural networks (CNNs) (the frame pipeline) on constrained aerial edge-robots is resource-limited, and adding more compute eventually caps throughput at the camera's frame rate, so frame-only systems fail to capture the detailed temporal dynamics of the environment. Bio-inspired event cameras and spiking neural networks (SNNs) provide an asynchronous sensor-processor pair (the event pipeline) that captures the continuous temporal details of the scene at high speed but lags in accuracy. In this work, we propose a target localization system that combines event-camera and SNN-based high-speed target estimation with frame-based camera and CNN-driven reliable object detection, fusing the complementary spatio-temporal strengths of the event and frame pipelines. One of our main contributions is an SNN filter that borrows from the neural mechanism for ego-motion cancellation in houseflies: it fuses vestibular sensing with vision to cancel the activity corresponding to the predator's self-motion. We also integrate the neuro-inspired multi-pipeline processing with the task-optimized multi-neuronal pathway structure found in primates and insects. The system is validated to outperform CNN-only processing in prey-predator drone simulations in realistic 3D virtual environments, and is then demonstrated in a real-world multi-drone setup with emulated event data. Subsequently, we use recorded sensory data from a multi-camera and inertial measurement unit (IMU) assembly to show the desired behavior while tolerating realistic noise in the vision and IMU sensors. We analyze the design space to identify optimal parameters for the spiking neurons and CNN models and to check their effect on the performance metrics of the fused system. Finally, we map the throughput-controlling SNN and fusion network onto an edge-compatible Zynq-7000 FPGA, showing a potential 264 outputs per second even under constrained resource availability. This work may open new research directions by coupling multiple sensing and processing modalities, inspired by discoveries in neuroscience, to break fundamental trade-offs in frame-based computer vision.
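An illustrative sketch of the two-pipeline fusion described above: a slow but reliable frame/CNN detection corrects a fast event/SNN position estimate whenever a new frame result arrives, in the style of a simple complementary filter. The rates, gain, and 1-D state are simplifying assumptions, not the paper's fusion network.

```python
# Toy fusion of fast (event/SNN) and slow-but-accurate (frame/CNN) target estimates.
def fuse_target_estimate(event_estimates, frame_detections, gain=0.6):
    """event_estimates: per-timestep positions from the event/SNN pipeline.
    frame_detections: dict {timestep: position} from the frame/CNN pipeline."""
    fused, offset = [], 0.0
    for t, ev in enumerate(event_estimates):
        if t in frame_detections:                    # slow, accurate correction
            offset += gain * (frame_detections[t] - (ev + offset))
        fused.append(ev + offset)                    # fast, drift-corrected output
    return fused

# Example: the event pipeline drifts by +2.0; frames every 30 steps pull it back.
events = [0.1 * t + 2.0 for t in range(90)]
frames = {t: 0.1 * t for t in range(0, 90, 30)}
print(fuse_target_estimate(events, frames)[-1])
```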
-
This study presents a mobile app that helps undergraduate students learn data science through their own full-body motions. Leveraging the built-in camera of a mobile device, the proposed app captures the user and feeds the images into an open-source computer-vision algorithm that localizes the key joint points of the human body. Because students participate in the entire data collection process, the obtained motion data is context-rich and personally relevant to them. The app uses the collected motion data to explain various concepts and methods in data science in the context of human movement, and visualizes the geometric interpretation of data through visual aids such as interactive graphs and figures. In this study, we use principal component analysis, a commonly used dimensionality reduction method, as an example to demonstrate the proposed learning framework, and we discuss strategies to encompass other learning modules for further improvement.
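A minimal sketch of the PCA module described above: each captured body-pose frame (keypoint coordinates) is flattened into a row, and the motion data is projected onto its first two principal components for visualization. The keypoint count and random data are placeholders; the app's real pipeline uses an open-source pose estimator on camera frames.

```python
# PCA on flattened pose keypoints (placeholder data, illustrative only).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
n_frames, n_joints = 200, 17                     # e.g., COCO-style keypoints
poses = rng.normal(size=(n_frames, n_joints, 2)) # (frame, joint, x/y)

X = poses.reshape(n_frames, -1)                  # one row per captured frame
pca = PCA(n_components=2)
scores = pca.fit_transform(X)                    # 2-D coordinates to plot

print(scores.shape)                              # (200, 2)
print(pca.explained_variance_ratio_)             # variance captured by each component
```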
-
Intelligent systems commonly employ vision sensors such as cameras to analyze a scene. Recent work has proposed a wireless sensing technique, wireless vibrometry, to enrich the scene analysis generated by vision sensors. Wireless vibrometry uses wireless signals to sense subtle vibrations of objects and infer their internal states. However, it is difficult for pure radio-frequency (RF) sensing systems to obtain objects' visual appearances (e.g., object types and locations), especially when an object is inactive; thus, most existing wireless vibrometry systems assume that the number and types of objects in the scene are known. The key to removing these assumptions is to build a connection between wireless sensor time series and vision sensor images. We present Capricorn, a vision-guided wireless vibrometry system. In Capricorn, object type information from the vision sensors guides the wireless vibrometry system to select the most appropriate signal processing pipeline, and the object tracking capability of computer vision helps the wireless system efficiently detect and separate vibrations from multiple objects in real time.
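A hedged sketch of the vision-guided selection idea in the Capricorn abstract: the object type reported by the camera pipeline picks which vibration-processing routine is applied to the RF time series for that object. The object classes, routines, and frequency bands below are illustrative assumptions, not Capricorn's actual pipelines.

```python
# Illustrative routing of an RF vibration signal by vision-detected object type.
import numpy as np

def machinery_pipeline(signal, fs):
    """Dominant low-frequency vibration (e.g., a motor RPM band)."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs > 5) & (freqs < 200)
    return {"dominant_hz": float(freqs[band][spectrum[band].argmax()])}

def speaker_pipeline(signal, fs):
    """Broadband energy, a stand-in for audio-content analysis."""
    return {"rms": float(np.sqrt(np.mean(np.square(signal))))}

PIPELINES = {"machine": machinery_pipeline, "speaker": speaker_pipeline}

def analyze(detected_type, rf_time_series, fs=1000):
    """Route the wireless vibrometry signal based on the vision-detected type."""
    return PIPELINES[detected_type](rf_time_series, fs)

# Example: a 50 Hz machine vibration observed through the RF channel.
t = np.arange(0, 1, 1 / 1000)
sig = np.sin(2 * np.pi * 50 * t)
print(analyze("machine", sig))  # -> {'dominant_hz': 50.0}
```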