In this paper, we are interested in startle reflex detection with WiFi signals. We propose that two parameters related to the received signal bandwidth, maximum normalized bandwidth and bandwidth-intense duration, can successfully detect reflexes and robustly differentiate them from non-reflex events, even from those that involve intense body motions (e.g., certain exercises). In order to confirm this, we need a massive RF reflex dataset, which would be prohibitively laborious to collect. On the other hand, many reflex/non-reflex videos are available online. We thus propose an efficient way of translating the content of a video to the bandwidth of the corresponding received RF signal that would have been measured if there were a link near the event in the video, by drawing analogies between our problem and the classic bandwidth modeling work of J. Carson in the context of analog FM radios (Carson's Rule). This allows us to translate online reflex/non-reflex videos into an instant, large RF bandwidth dataset, and to characterize optimal 2D reflex/non-reflex decision regions accordingly, to be used during real operation with WiFi. We extensively test our approach with 203 reflex events and 322 non-reflex events (including 142 intense body motion events), over four areas (including several through-wall ones), and with 15 participants, achieving a correct reflex detection rate of 90.15% and a false alarm rate of 2.49% (all events are natural). While our approach is extensively tested with startle reflexes, it is also applicable to sport-type reflexes, and we test it with sport-related reflexes as well. We further show reflex detection with multiple people simultaneously engaged in a series of activities. Optimality of the proposed design is also demonstrated experimentally. Finally, we conduct experiments to show the potential of our approach for providing cost-effective and quantifiable metrics in sports, by quantifying a goalkeeper's reaction. Overall, our results demonstrate a fast, robust, and cost-effective reflex detection system that requires neither collecting any RF training data nor training a neural network.
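To make the two features concrete, the sketch below (Python) estimates a per-window occupied bandwidth from a CSI magnitude stream, derives the maximum normalized bandwidth and the bandwidth-intense duration, and applies a simple rectangular 2D decision region. All constants (window length, power fraction, and the `intense_th`/`bw_th`/`dur_th` thresholds) are illustrative assumptions, and the occupied-bandwidth estimator is only a stand-in; the paper characterizes the actual decision regions from the video-derived dataset rather than fixing them by hand.

```python
import numpy as np
from scipy.signal import stft

def reflex_features(csi_mag, fs, win_s=0.5, power_frac=0.95, intense_th=0.3):
    """Estimate per-window occupied bandwidth of a CSI magnitude stream.

    csi_mag : 1D array of CSI magnitude samples (one link), fs in Hz.
    Returns (maximum normalized bandwidth, bandwidth-intense duration in s).
    Window length, power fraction, and thresholds are illustrative choices.
    """
    nper = max(int(win_s * fs), 8)
    f, t, Z = stft(csi_mag - np.mean(csi_mag), fs=fs, nperseg=nper)
    psd = np.abs(Z) ** 2                                   # (freq, time) power
    cum = np.cumsum(psd, axis=0) / (np.sum(psd, axis=0, keepdims=True) + 1e-12)
    # occupied bandwidth: lowest frequency capturing `power_frac` of the power
    idx = np.argmax(cum >= power_frac, axis=0)
    bw_norm = f[idx] / (fs / 2.0)                          # normalize by Nyquist (assumed)
    dt = t[1] - t[0] if len(t) > 1 else win_s
    intense_dur = float(np.sum(bw_norm > intense_th)) * dt
    return float(bw_norm.max()), intense_dur

def is_reflex(max_bw, intense_dur, bw_th=0.5, dur_th=0.8):
    """Toy 2D decision region: a wideband burst that does not persist."""
    return (max_bw >= bw_th) and (intense_dur <= dur_th)
```

Intuitively, a startle reflex appears as a very wideband but short-lived burst, whereas exercise-type motions keep the bandwidth high for much longer, which is why the duration feature helps separate the two.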
Teaching RF to Sense without RF Training Measurements
In this paper, we propose a novel, generalizable, and scalable idea that eliminates the need for collecting Radio Frequency (RF) measurements when training RF sensing systems for human-motion-related activities. Existing learning-based RF sensing systems require collecting massive RF training data, which depends heavily on the particular sensing setup/involved activities. Thus, new data needs to be collected when the setup/activities change, significantly limiting the practical deployment of RF sensing systems. On the other hand, recent years have seen a growing, massive number of online videos involving various human activities/motions. In this paper, we propose to translate such already-available online videos to instant simulated RF data for training any human-motion-based RF sensing system, in any given setup. To validate our proposed framework, we conduct a case study of gym activity classification, where CSI magnitude measurements of three WiFi links are used to classify a person's activity from 10 different physical exercises. We utilize YouTube gym activity videos and translate them to RF by simulating the WiFi signals that would have been measured if the person in the video had been performing the activity near the transceivers. We then train a classifier on the simulated data, and extensively test it with real WiFi data of 10 subjects performing the activities in 3 areas. Our system achieves a classification accuracy of 86% on activity periods, each containing an average of 5.1 exercise repetitions, and 81% on individual repetitions of the exercises. This demonstrates that our approach can generate reliable RF training data from already-available videos, and can successfully train an RF sensing system without any real RF measurements. The proposed pipeline can also be used beyond training, for the analysis and design of RF sensing systems, without the need for massive RF data collection.
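As a rough illustration of the video-to-RF translation step, the sketch below models the received signal on one WiFi link as a static path plus one scattered path per video-derived body point, with the phase of each path set by its Tx-body-Rx length. The carrier frequency, gains, and the single-bounce scattering model are simplifying assumptions for illustration, and the per-frame 3D body points (`joints_xyz`) are assumed to come from an off-the-shelf video pose/mesh estimator; the paper's actual simulator is not reproduced here.

```python
import numpy as np

C = 3e8
FREQ = 5.18e9                 # assumed WiFi channel; wavelength ~5.8 cm
LAM = C / FREQ

def simulate_csi_mag(joints_xyz, tx, rx, static_gain=1.0, scatter_gain=0.1):
    """Very simplified sketch of translating video-derived motion to RF.

    joints_xyz : array of shape (T, J, 3), per-frame 3D body-point positions
                 (assumed to be extracted from a video beforehand).
    tx, rx     : 3-vectors, transceiver positions in the target deployment.
    Models the received signal as a static (line-of-sight) path plus one
    scattered path per body point, with phase set by the Tx-body-Rx path
    length.  Gains and the scattering model itself are illustrative.
    """
    d = (np.linalg.norm(joints_xyz - tx, axis=2) +
         np.linalg.norm(joints_xyz - rx, axis=2))          # (T, J) path lengths
    scattered = scatter_gain * np.exp(-1j * 2 * np.pi * d / LAM)
    h = static_gain + scattered.sum(axis=1)                 # (T,) complex CSI
    return np.abs(h)                                        # magnitude stream
```

The simulated magnitude streams can then be featurized and used to train a standard classifier exactly as real CSI would be, which is what removes the need for RF measurements during training.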
- Award ID(s): 1816931
- PAR ID: 10311534
- Date Published:
- Journal Name: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Volume: 4
- Issue: 4
- ISSN: 2474-9567
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The lack of adequate training data is one of the major hurdles in WiFi-based activity recognition systems. In this paper, we propose Wi-Fringe, a WiFi CSI-based device-free human gesture recognition system that recognizes named gestures, i.e., activities and gestures that have a semantically meaningful name in the English language, as opposed to arbitrary free-form gestures. Given a list of activities (only their names in English text), along with zero or more training examples (WiFi CSI values) per activity, Wi-Fringe is able to detect all activities at runtime. We show for the first time that by utilizing the state-of-the-art semantic representation of English words, learned from datasets like Wikipedia (e.g., Google's word2vec [1]), together with verb attributes learned from how a word is defined (e.g., the American Heritage Dictionary), we can enhance the capability of WiFi-based named gesture recognition systems that lack adequate training examples per class. We propose a novel cross-domain knowledge transfer algorithm between radio frequency (RF) and text to relieve developers and end-users of the tedious task of collecting data for all possible activities. To evaluate Wi-Fringe, we collect data from four volunteers in a multi-person apartment and an office building for a total of 20 activities. We empirically quantify the trade-off between the accuracy and the number of unseen activities.
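A minimal sketch of the cross-domain idea, under the assumption that a pretrained word-embedding lookup (`embed`) is available: a regressor maps RF/CSI features of seen activities into the semantic space of their English names, and an unseen activity is then predicted by the nearest class-name embedding. The regressor choice and cosine scoring are illustrative, not Wi-Fringe's exact algorithm.

```python
import numpy as np
from sklearn.linear_model import Ridge

def train_rf_to_text(X_rf, names_seen, embed):
    """Learn a mapping from RF features into a word-embedding space.

    X_rf       : (N, D) RF/CSI features for *seen* activities.
    names_seen : list of N activity names (English words).
    embed      : callable returning a fixed-size vector for a word, e.g. a
                 pretrained word2vec lookup (assumed available).
    """
    Y = np.stack([embed(n) for n in names_seen])
    return Ridge(alpha=1.0).fit(X_rf, Y)        # RF -> semantic space

def predict_named_gesture(reg, x_rf, candidate_names, embed):
    """Zero/few-shot prediction: the nearest class-name embedding wins."""
    z = reg.predict(x_rf[None, :])[0]
    E = np.stack([embed(n) for n in candidate_names])
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    z = z / (np.linalg.norm(z) + 1e-12)
    return candidate_names[int(np.argmax(E @ z))]
```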
Many technologies for human-computer interaction have been designed for hearing individuals and depend upon vocalized speech, precluding users of American Sign Language (ASL) in the Deaf community from benefiting from these advancements. While great strides have been made in ASL recognition with video or wearable gloves, the use of video in homes has raised privacy concerns, and wearable gloves severely restrict movement and infringe on daily life. Methods: This paper proposes the use of RF sensors for HCI applications serving the Deaf community. A multi-frequency RF sensor network is used to acquire non-invasive, non-contact measurements of ASL signing irrespective of lighting conditions. The unique patterns of motion present in the RF data due to the micro-Doppler effect are revealed using time-frequency analysis with the Short-Time Fourier Transform. Linguistic properties of RF ASL data are investigated using machine learning (ML). Results: The information content, measured by fractal complexity, of ASL signing is shown to be greater than that of other upper body activities encountered in daily living. This can be used to differentiate daily activities from signing, while features from RF data show that imitation signing by non-signers is 99% differentiable from native ASL signing. Feature-level fusion of RF sensor network data is used to achieve 72.5% accuracy in classification of 20 native ASL signs. Implications: RF sensing can be used to study dynamic linguistic properties of ASL and design Deaf-centric smart environments for non-invasive, remote recognition of ASL. ML algorithms should be benchmarked on native, not imitation, ASL data.
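For reference, a micro-Doppler spectrogram of the kind described here can be obtained with a Short-Time Fourier Transform of the complex radar return; the sketch below uses illustrative window and overlap settings and is not tied to the paper's specific sensors.

```python
import numpy as np
from scipy.signal import stft

def micro_doppler_spectrogram(iq, fs, win_s=0.1, overlap=0.9):
    """Short-Time Fourier Transform of a complex radar return (sketch).

    iq : 1D complex I/Q samples from one RF sensor channel, fs in Hz.
    Window length and overlap are illustrative choices.
    Returns Doppler frequencies, time bins, and a dB-scaled spectrogram in
    which limb micro-motions appear as micro-Doppler signatures.
    """
    nper = int(win_s * fs)
    f, t, Z = stft(iq, fs=fs, nperseg=nper,
                   noverlap=int(overlap * nper), return_onesided=False)
    S = 20 * np.log10(np.abs(np.fft.fftshift(Z, axes=0)) + 1e-12)
    return np.fft.fftshift(f), t, S
```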
Human activity recognition provides insights into physical and mental well-being by monitoring patterns of movement and behavior, facilitating personalized interventions and proactive health management. Radio Frequency (RF)-based human activity recognition (HAR) is gaining attention due to its lower privacy exposure and non-contact characteristics. However, it suffers from data scarcity problems and is sensitive to environmental changes. Collecting and labeling such data is labor-intensive and time-consuming. The limited training data makes generalizability challenging when the sensor is deployed in a very different relative view in the real world. Synthetic data generation from abundant videos presents a potential solution to the data scarcity issue, yet the domain gaps between synthetic and real data constrain its benefit. In this paper, we first share our investigations and insights on the intrinsic limitations of existing video-based data synthesis methods. We then present M4X, a method using metric learning to extract effective view-independent features from the more abundant synthetic data despite its domain gaps, thus enhancing cross-view generalizability. We explore two main design issues: the mining strategies used to construct contrastive pairs/triplets, and the form of the loss function. We find that the best choices are offline triplet mining with real data as anchors, balanced triplets, and a triplet loss function without hard negative mining for higher discriminative power. Comprehensive experiments show that M4X consistently outperforms baseline methods in cross-view generalizability. In the most challenging case of the least amount of real training data, M4X outperforms three baselines by 7.9-16.5% on all views, and by 18.9-25.6% on a view with only synthetic but no real data during training. This proves its effectiveness in extracting view-independent features from synthetic data despite their domain gaps. We also observe that, given limited sensor deployments, a participant-facing viewpoint and another at a large angle (e.g., 60°) tend to produce much better performance.
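The stated recipe (offline, class-balanced triplets with real data as anchors and no hard-negative mining) can be summarized by a standard triplet loss; the sketch below assumes precomputed embeddings and uses a Euclidean distance with an illustrative margin.

```python
import torch
import torch.nn.functional as F

def triplet_loss_real_anchor(f_anchor_real, f_pos_syn, f_neg_syn, margin=1.0):
    """Triplet loss with real-data anchors (a sketch of the stated recipe).

    f_anchor_real : embeddings of real RF samples (anchors).
    f_pos_syn     : embeddings of synthetic samples, same activity class.
    f_neg_syn     : embeddings of synthetic samples, different class.
    The margin and Euclidean distance are assumptions; offline, balanced
    triplets are assumed to have been mined before this call.
    """
    d_pos = F.pairwise_distance(f_anchor_real, f_pos_syn)
    d_neg = F.pairwise_distance(f_anchor_real, f_neg_syn)
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()
```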
RF sensing based human activity and hand gesture recognition (HGR) methods have gained enormous popularity with the development of small-package, high-frequency radar systems and powerful machine learning tools. However, most HGR experiments in the literature have been conducted on individual gestures and in isolation from preceding and subsequent motions. This paper considers the problem of American Sign Language (ASL) recognition in the context of daily living, which involves sequential classification of a continuous stream of signing mixed with daily activities. In particular, this paper investigates the efficacy of different RF input representations and fusion techniques for ASL and trigger gesture recognition tasks in a daily living scenario, which can potentially be used for sign-language-sensitive human-computer interfaces (HCI). The proposed approach involves first detecting and segmenting periods of motion, followed by feature-level fusion of the range-Doppler map, micro-Doppler spectrogram, and envelope for classification with a bi-directional long short-term memory (BiLSTM) recurrent neural network. Results show 93.3% accuracy in identification of 6 activities and 4 ASL signs, as well as a trigger sign detection rate of 0.93.
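A minimal sketch of the classification stage: a bi-directional LSTM over per-frame features formed by concatenating (feature-level fusion) quantities derived from the range-Doppler map, micro-Doppler spectrogram, and envelope. The feature dimension, hidden size, and the use of the last time step for classification are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FusedBiLSTMClassifier(nn.Module):
    """Classify a motion segment from fused per-frame RF features (sketch)."""

    def __init__(self, feat_dim=96, hidden=128, n_classes=10):
        super().__init__()
        # per-frame input: concatenated range-Doppler / micro-Doppler /
        # envelope features (dimensions here are illustrative)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                   # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)               # (batch, time, 2 * hidden)
        return self.head(out[:, -1, :])     # logits from the last time step

# Example: 6 activities + 4 ASL signs = 10 classes, as in the abstract
model = FusedBiLSTMClassifier(feat_dim=96, hidden=128, n_classes=10)
logits = model(torch.randn(8, 120, 96))     # 8 segments of 120 frames each
```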