Title: ViSig: Automatic Interpretation of Visual Body Signals Using On-Body Sensors
Visual body signals are designated body poses that deliver an application-specific message. Such signals are widely used for fast message communication in sports (signaling by umpires and referees), transportation (naval officers and aircraft marshallers), and construction (signaling by riggers and crane operators), to list a few examples. Automatic interpretation of such signals can help maintain safer operations in these industries, support record-keeping for auditing or accident investigation, and serve as a score-keeper in sports. When automation of these signals is desired, it is traditionally performed from a viewer's perspective by running computer vision algorithms on camera feeds. However, computer-vision-based approaches suffer from performance deterioration under lighting variations, occlusions, and similar conditions, may face resolution limitations, and can be challenging to install. Our work, ViSig, breaks with tradition by instead deploying on-body sensors for signal interpretation. Our key innovation is the fusion of ultra-wideband (UWB) sensors for capturing on-body distance measurements, inertial sensors (IMUs) for capturing the orientation of a few body segments, and photodiodes for finger signal recognition, enabling robust interpretation of signals. By deploying only a small number of sensors, we show that body signals can be interpreted unambiguously in many different settings, including games of cricket, baseball, and football, and operational safety use-cases such as crane operations and flag semaphores for maritime navigation, with > 90% accuracy. Overall, we see substantial promise in this approach and expect a large body of follow-on work to use UWB and IMU fused modalities for more general human pose estimation problems.
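The abstract gives no implementation detail, so the following is only a minimal sketch, with invented sensor counts, template values, and signal names, of one way UWB distances, IMU orientations, and photodiode readings could be fused into a feature vector and matched against known signal templates:

# Minimal sketch, NOT the authors' implementation: fuse hypothetical
# UWB inter-node distances, IMU segment orientations, and photodiode
# readings into one feature vector and match it to the nearest template.
import numpy as np

# Hypothetical calibrated templates: 2 UWB distances, 2 IMU pitch angles,
# 1 photodiode level, normalized as in fuse_features() below.
SIGNAL_TEMPLATES = {
    "out":     np.array([0.90, 0.30, 0.50, 0.00, 0.10]),
    "no_ball": np.array([0.40, 0.80, 0.00, 0.50, 0.90]),
}

def fuse_features(uwb_distances_m, imu_pitch_deg, photodiode_levels):
    """Concatenate crudely normalized readings from the three modalities."""
    return np.concatenate([
        np.asarray(uwb_distances_m) / 2.0,    # wingspan-scale distances -> ~[0, 1]
        np.asarray(imu_pitch_deg) / 180.0,    # degrees -> [-1, 1]
        np.asarray(photodiode_levels),        # assumed already in [0, 1]
    ])

def classify(feature_vec):
    """Return the signal whose template is nearest in Euclidean distance."""
    return min(SIGNAL_TEMPLATES,
               key=lambda s: np.linalg.norm(SIGNAL_TEMPLATES[s] - feature_vec))

# Example: two wrist-to-hip distances, two forearm pitches, one photodiode.
print(classify(fuse_features([1.7, 0.5], [85.0, 5.0], [0.2])))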
Award ID(s):
2145278
NSF-PAR ID:
10409365
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
Volume:
7
Issue:
1
ISSN:
2474-9567
Page Range / eLocation ID:
1 to 27
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Evolution has honed predatory skills in the natural world, where localizing and intercepting fast-moving prey is required. The current generation of robotic systems mimics these biological systems using deep learning. High-speed processing of camera frames with convolutional neural networks (CNNs) (the frame pipeline) quickly becomes resource-limited on constrained aerial edge robots. Adding more compute eventually caps throughput at the camera's frame rate, and frame-only systems fail to capture the detailed temporal dynamics of the environment. Bio-inspired event cameras and spiking neural networks (SNNs) provide an asynchronous sensor-processor pair (the event pipeline) that captures the continuous temporal details of the scene at high speed but lags in accuracy. In this work, we propose a target localization system that combines event-camera plus SNN-based high-speed target estimation with frame-based camera plus CNN-driven reliable object detection, fusing the complementary spatio-temporal strengths of the event and frame pipelines. One of our main contributions is the design of an SNN filter that borrows from the neural mechanism for ego-motion cancellation in houseflies: it fuses vestibular sensing with vision to cancel the activity corresponding to the predator's self-motion. We also integrate neuro-inspired multi-pipeline processing modeled on the task-optimized multi-neuronal pathway structures of primates and insects. The system is validated to outperform CNN-only processing using prey-predator drone simulations in realistic 3D virtual environments. The system is then demonstrated in a real-world multi-drone set-up with emulated event data. Subsequently, we use recorded sensory data from a multi-camera and inertial measurement unit (IMU) assembly to show the desired behavior while tolerating realistic noise in the vision and IMU sensors. We analyze the design space to identify optimal parameters for spiking neurons and CNN models and to check their effect on the performance metrics of the fused system. Finally, we map the throughput-controlling SNN and the fusion network onto an edge-compatible Zynq-7000 FPGA, showing a potential 264 outputs per second even under constrained resource availability. This work may open new research directions by coupling multiple sensing and processing modalities inspired by discoveries in neuroscience to break fundamental trade-offs in frame-based computer vision.
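As a rough, hypothetical illustration of the decision-level side of such a fusion (this is not the paper's SNN filter or fusion network), the sketch below blends a fast event-pipeline target estimate with an occasional, more reliable frame-pipeline detection; the weight and coordinates are assumptions:

# Minimal sketch, not the paper's fusion network: blend a fast but noisier
# event-pipeline estimate with an occasional, more reliable frame-pipeline
# detection. The rate, weight, and coordinates are illustrative assumptions.
import numpy as np

FRAME_WEIGHT = 0.7   # trust placed in the CNN detection when it is available

def fuse_target_estimate(event_xy, frame_xy=None):
    """Blend the per-event SNN estimate with the latest CNN detection, if any."""
    event_xy = np.asarray(event_xy, dtype=float)
    if frame_xy is None:                      # CNN result not ready this cycle
        return event_xy
    frame_xy = np.asarray(frame_xy, dtype=float)
    return FRAME_WEIGHT * frame_xy + (1.0 - FRAME_WEIGHT) * event_xy

# High-rate event estimates arrive every cycle; frame detections only sometimes.
print(fuse_target_estimate([120.0, 64.0]))                 # event-only cycle
print(fuse_target_estimate([120.0, 64.0], [118.0, 66.0]))  # fused cycle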
  2. The objective of this research is to evaluate vision-based pose estimation methods for on-site construction robots. The prospect of human-robot collaborative work on construction sites introduces new workplace hazards that must be mitigated to ensure safety. Human workers performing tasks alongside construction robots must perceive the interaction to be safe to ensure team identification and trust. Detecting the robot pose in real time is thus a key requirement, both to inform the workers and to enable autonomous operation. Vision-based (marker-less or marker-based) and sensor-based (IMU, UWB) approaches are the two main methods for estimating robot pose. The marker-based and sensor-based methods require additional preinstalled sensors or markers, whereas the marker-less method only requires an on-site camera system, which is common on modern construction sites. In this research, we develop a marker-less pose estimation system based on a convolutional neural network (CNN) human pose estimation algorithm: stacked hourglass networks. The system is trained on image data collected in a factory setup environment, labeled with excavator poses. We use a KUKA robot arm with a bucket mounted on the end-effector to represent a robotic excavator in our experiment. We evaluate the marker-less method and compare the results with the robot's ground truth pose. The preliminary results show that the marker-less method is capable of estimating the pose of the excavator based on a state-of-the-art human pose estimation algorithm.
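As an illustration of the kind of comparison against ground truth described above (the specific metric, joint set, and numbers here are assumptions, not taken from the paper), a mean per-joint position error can be computed as follows:

# Minimal sketch: compare predicted 2D keypoints of the robot arm against
# ground truth with the mean per-joint position error. Joint names and
# coordinates are illustrative assumptions.
import numpy as np

def mean_per_joint_error(pred_joints, gt_joints):
    """Average Euclidean distance (in pixels) between predicted and true joints."""
    pred = np.asarray(pred_joints, dtype=float)   # shape (num_joints, 2)
    gt = np.asarray(gt_joints, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=1).mean())

# Hypothetical 3-joint pose (base, elbow, bucket tip) in image coordinates.
pred = [[102, 240], [180, 151], [252, 98]]
gt   = [[100, 238], [176, 150], [255, 101]]
print(f"mean per-joint error: {mean_per_joint_error(pred, gt):.2f} px")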
  3. The growing ubiquity of sensors in and around the environment has ushered in the concept of smart animal agriculture, which has the potential to greatly improve animal health and productivity through remote health monitoring, a necessity at a time of great demand for animal products. Data gathered by sensors dwelling in animal agriculture settings has made farms part of the Internet of Things, which has led to active research on efficient communication methodologies for farm networks. This study focuses on the first hop of any such farm network, where data from inside an animal's body must be communicated to a node outside the body. In this paper, we use novel experimental methods to measure channel signal loss at sub-GHz frequencies of 100-900 MHz to characterize the in-body to out-of-body communication channel in large animals. A first-of-its-kind 3D bovine model, built with computer vision techniques to capture detailed morphological features of the animal body, is used to perform Finite Element Method (FEM) based electromagnetic simulations. The simulation results are experimentally validated to yield a complete channel modeling methodology for in-body to out-of-body animal body communication. The experimentally validated 3D bovine model is publicly available on GitHub at https://github.com/SparcLab/Bovine-FEM-Model.git. The results illustrate that an in-body to out-of-body communication channel is realizable from the rumen to the collar of ruminants with ≤ 90 dB path loss at sub-GHz frequencies (100-900 MHz), making communication feasible. The developed methodology is illustrated for ruminants but can also be used for other related in-body to out-of-body studies. Using the developed channel modeling technique, an efficient communication architecture can be formed for in-body to out-of-body communication in animals, paving the way for the design and development of future smart animal agriculture systems.
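For context only, and not the paper's FEM-based tissue model, the sketch below computes a standard free-space path-loss baseline across the same 100-900 MHz band; the rumen-to-collar distance used is an assumed value, and through-tissue losses would add substantially on top of this:

# Minimal sketch: free-space path loss FSPL(dB) = 20*log10(4*pi*d*f/c),
# shown only to illustrate how loss scales with frequency and distance.
import math

C = 3.0e8  # speed of light, m/s

def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB; tissue attenuation would add to this."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C)

for freq_mhz in (100, 500, 900):
    loss = free_space_path_loss_db(0.5, freq_mhz * 1e6)   # ~0.5 m assumed
    print(f"{freq_mhz} MHz: {loss:.1f} dB (free space only)")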
  4. Production innovations are occurring faster than ever. Manufacturing workers thus need to frequently learn new methods and skills. In fast-changing, largely uncertain production systems, manufacturers able to comprehend workers' behavior and assess their operation performance in near real time will achieve better performance than their peers. Action recognition can serve this purpose. Although human action recognition has been an active field of study in machine learning, limited work has been done on recognizing worker actions in manufacturing tasks that involve complex, intricate operations. Using data captured by one sensor, or a single type of sensor, to recognize those actions lacks reliability. This limitation can be overcome by sensor fusion at the data, feature, and decision levels. This paper presents a study that developed a multimodal sensor system and used sensor fusion methods to enhance the reliability of action recognition. One step in assembling a Bukito 3D printer, composed of a sequence of 7 actions, was used to illustrate and assess the proposed method. Two wearable Myo armband sensors captured both inertial measurement unit (IMU) and electromyography (EMG) signals from assembly workers, while Microsoft Kinect, a vision-based sensor, simultaneously tracked their predefined skeleton joints. The collected IMU, EMG, and skeleton data were used to train five individual Convolutional Neural Network (CNN) models. Various fusion methods were then implemented to integrate the predictions of the independent models into a final prediction. The study also identified the reasons sensor fusion achieves better performance.
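As a minimal sketch of decision-level fusion of the kind described above (the class probabilities and the averaging rule are illustrative assumptions, not the paper's exact method), per-modality CNN outputs can be combined before taking the final decision:

# Minimal sketch: average the softmax outputs of per-modality CNN models
# (IMU, EMG, skeleton) and take the argmax. All probabilities are invented.
import numpy as np

ACTIONS = [f"action_{i}" for i in range(1, 8)]   # the 7-action assembly step

def fuse_decisions(per_model_probs):
    """Average class probabilities across models and return the winning action."""
    avg = np.mean(np.asarray(per_model_probs, dtype=float), axis=0)
    return ACTIONS[int(np.argmax(avg))], avg

imu_probs  = [0.05, 0.60, 0.10, 0.05, 0.05, 0.10, 0.05]
emg_probs  = [0.10, 0.30, 0.40, 0.05, 0.05, 0.05, 0.05]
skel_probs = [0.05, 0.55, 0.20, 0.05, 0.05, 0.05, 0.05]
label, probs = fuse_decisions([imu_probs, emg_probs, skel_probs])
print(label)   # -> action_2 under these made-up probabilities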
  5. El Asmar, Mounir ; Grau, David ; Tang, Pingbo (Ed.)
    As a proactive means of preventing struck-by accidents in construction, many studies have presented proximity monitoring applications using wireless sensors (e.g., RFID, UWB, and GPS) or computer vision methods. Most prior research has emphasized proximity detection rather than prediction. However, prediction can be more effective and important for contact-driven accident prevention, particularly given that the sooner workers (e.g., equipment operators and workers on foot) are informed of their proximity to each other, the more likely they are to avoid an impending collision. In earlier studies, the authors presented a trajectory prediction method leveraging a deep neural network to examine the feasibility of proximity prediction in real-world applications. In this study, we enhance the accuracy of that trajectory prediction. Specifically, we improve the trajectory prediction model by tuning its pre-trained weight parameters with construction data. Moreover, an inherent-movement-driven post-processing algorithm is developed to refine a target's predicted trajectory according to its inherent movement patterns, such as final position, predominant direction, and average velocity. In a test on real-site operations data, the proposed approach demonstrates the improvement in accuracy: for a 5.28-second prediction horizon, it achieves a 0.39-meter average displacement error, a 51.43% improvement over the previous result (0.84 meters). The improved trajectory prediction method can support predicting potential contact-driven hazards in advance, allowing prompt feedback (e.g., visual, acoustic, and vibration alarms) to equipment operators and workers on foot. Such proactive intervention can lead workers to take prompt evasive action, thereby reducing the chance of an impending collision.
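For reference, an average displacement error like the one cited above is conventionally computed as the mean Euclidean distance between predicted and actual positions over the prediction horizon; the sketch below (with invented trajectories) shows that calculation:

# Minimal sketch of the average displacement error (ADE) metric: the mean
# Euclidean distance between predicted and true positions. Sample data invented.
import numpy as np

def average_displacement_error(pred_traj, true_traj):
    """Mean Euclidean distance (m) between predicted and true positions."""
    pred = np.asarray(pred_traj, dtype=float)   # shape (timesteps, 2)
    true = np.asarray(true_traj, dtype=float)
    return float(np.linalg.norm(pred - true, axis=1).mean())

# Hypothetical worker positions (x, y in meters) over a short horizon.
predicted = [[0.0, 0.0], [0.5, 0.1], [1.0, 0.3], [1.6, 0.4]]
actual    = [[0.0, 0.0], [0.4, 0.2], [0.9, 0.5], [1.4, 0.7]]
print(f"ADE: {average_displacement_error(predicted, actual):.2f} m")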