Title: Wi-Fringe: Leveraging Text Semantics in WiFi CSI-Based Device-Free Named Gesture Recognition
The lack of adequate training data is one of the major hurdles in WiFi-based activity recognition systems. In this paper, we propose Wi-Fringe, a WiFi CSI-based device-free human gesture recognition system that recognizes named gestures, i.e., activities and gestures that have a semantically meaningful name in the English language, as opposed to arbitrary free-form gestures. Given a list of activities (only their names in English text), along with zero or more training examples (WiFi CSI values) per activity, Wi-Fringe is able to detect all activities at runtime. We show for the first time that by utilizing state-of-the-art semantic representations of English words, learned from datasets like Wikipedia (e.g., Google's word-to-vector [1]), and verb attributes learned from how a word is defined (e.g., the American Heritage Dictionary), we can enhance the capability of WiFi-based named gesture recognition systems that lack adequate training examples per class. We propose a novel cross-domain knowledge transfer algorithm between radio frequency (RF) and text to lessen the burden on developers and end-users of the tedious task of data collection for all possible activities. To evaluate Wi-Fringe, we collect data from four volunteers in a multi-person apartment and an office building for a total of 20 activities. We empirically quantify the trade-off between accuracy and the number of unseen activities.
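
The abstract does not include code, but the core idea (projecting CSI features into a word-embedding space so that gestures with zero CSI training examples can still be recognized by their names) can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the authors' implementation: the feature dimensions, the ridge regressor, and the toy random embeddings standing in for pretrained word2vec vectors.

```python
# Minimal zero-shot sketch of the Wi-Fringe idea: map CSI features into a
# word-embedding space and classify by nearest class-name vector, so classes
# with zero CSI training examples remain recognizable. All names and shapes
# below are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 300-d word vectors for each named gesture
# (in practice these would come from a pretrained word2vec model).
classes = ["push", "pull", "wave", "clap", "kick"]
word_vecs = {c: rng.normal(size=300) for c in classes}

# Seen classes have labeled CSI feature vectors; "kick" is unseen at training.
seen = classes[:-1]
X_train = rng.normal(size=(200, 64))            # toy CSI feature vectors
y_train = rng.choice(len(seen), size=200)       # seen-class labels
T_train = np.stack([word_vecs[seen[i]] for i in y_train])

# Learn a linear map from CSI feature space to the semantic space.
reg = Ridge(alpha=1.0).fit(X_train, T_train)

def predict(x_csi):
    """Project a CSI feature vector into word space, return nearest class."""
    z = reg.predict(x_csi[None, :])[0]
    sims = {c: np.dot(z, v) / (np.linalg.norm(z) * np.linalg.norm(v))
            for c, v in word_vecs.items()}      # cosine similarity
    return max(sims, key=sims.get)

print(predict(rng.normal(size=64)))  # may output any class, including unseen "kick"
```

At test time the candidate set includes unseen class names, which is how the accuracy-versus-number-of-unseen-activities trade-off in the abstract arises.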
Award ID(s):
1840131
NSF-PAR ID:
10198358
Author(s) / Creator(s):
Date Published:
Journal Name:
2020 16th International Conference on Distributed Computing in Sensor Systems (DCOSS)
Page Range / eLocation ID:
35 to 42
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Gesture recognition has become increasingly important in human-computer interaction and can support applications such as smart homes, VR, and gaming. Traditional approaches usually rely on dedicated sensors worn by the user or on cameras that require line of sight. In this paper, we present fine-grained finger gesture recognition using commodity WiFi, without requiring the user to wear any sensors. Our system takes advantage of the fine-grained Channel State Information (CSI) available from commodity WiFi devices and the prevalence of WiFi network infrastructure. It senses and identifies subtle finger movements by examining the unique patterns exhibited in the detailed CSI. We devise an environmental noise removal mechanism to mitigate the effect of signal dynamics caused by environmental changes. Moreover, we propose to capture intrinsic gesture behavior to deal with individual diversity and gesture inconsistency. Lastly, we utilize multiple WiFi links and the larger bandwidth at 5 GHz to achieve finger gesture recognition in multi-user scenarios. Our experimental evaluation in different environments demonstrates that the system achieves over 90% recognition accuracy and is robust to both environmental changes and individual diversity. Results also show that the system provides accurate gesture recognition under different scenarios.
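
The abstract above mentions an environmental noise removal mechanism without spelling it out; a common first step in CSI-based sensing is low-pass filtering each subcarrier's amplitude stream, since finger motion occupies low frequencies while much of the noise does not. A hedged sketch under that assumption follows (the sampling rate and cutoff are assumed values, not the paper's):

```python
# Illustrative CSI pre-processing, not the paper's exact mechanism: low-pass
# filter each subcarrier's amplitude stream to suppress high-frequency noise
# while keeping the slow variations caused by finger motion.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0        # assumed CSI sampling rate (packets/s)
cutoff = 60.0      # assumed cutoff: finger-motion energy sits below ~60 Hz

b, a = butter(4, cutoff / (fs / 2), btype="low")

def denoise_csi(csi_amp):
    """csi_amp: (n_packets, n_subcarriers) amplitude matrix -> filtered copy."""
    return filtfilt(b, a, csi_amp, axis=0)

csi = np.abs(np.random.default_rng(1).normal(size=(4000, 30)))  # toy CSI window
clean = denoise_csi(csi)
print(clean.shape)  # (4000, 30)
```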
  2. In this paper, we propose a novel, generalizable, and scalable idea that eliminates the need for collecting Radio Frequency (RF) measurements when training RF sensing systems for human-motion-related activities. Existing learning-based RF sensing systems require collecting massive RF training data, which depends heavily on the particular sensing setup and the involved activities. Thus, new data needs to be collected whenever the setup or activities change, significantly limiting the practical deployment of RF sensing systems. On the other hand, recent years have seen a massive and growing number of online videos involving various human activities and motions. In this paper, we propose to translate such already-available online videos into instant simulated RF data for training any human-motion-based RF sensing system, in any given setup. To validate our proposed framework, we conduct a case study of gym activity classification, where CSI magnitude measurements of three WiFi links are used to classify a person's activity among 10 different physical exercises. We utilize YouTube gym activity videos and translate them to RF by simulating the WiFi signals that would have been measured if the person in the video were performing the activity near the transceivers. We then train a classifier on the simulated data and extensively test it with real WiFi data of 10 subjects performing the activities in 3 areas. Our system achieves a classification accuracy of 86% on activity periods, each containing an average of 5.1 exercise repetitions, and 81% on individual repetitions of the exercises. This demonstrates that our approach can generate reliable RF training data from already-available videos and can successfully train an RF sensing system without any real RF measurements. The proposed pipeline can also be used beyond training, for the analysis and design of RF sensing systems, without the need for massive RF data collection.
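
The video-to-RF translation in item 2 can be approximated, in highly simplified form, by treating body joints from video pose estimation as point scatterers and summing reflected paths whose phase tracks the time-varying TX-to-joint-to-RX distance. The toy model below rests on that assumption; the geometry, carrier frequency, and path-loss form are illustrative, not the paper's simulator:

```python
# Toy version of the video-to-RF idea: body joints act as point scatterers,
# and the simulated channel is a superposition of reflected paths whose phase
# depends on the time-varying path length TX -> joint -> RX.
import numpy as np

c = 3e8
fc = 5.18e9                     # assumed 5 GHz WiFi carrier
lam = c / fc

tx = np.array([0.0, 0.0, 1.0])  # assumed transceiver positions (meters)
rx = np.array([4.0, 0.0, 1.0])

def simulate_csi(joints):
    """joints: (n_frames, n_joints, 3) trajectories -> complex CSI per frame."""
    d = (np.linalg.norm(joints - tx, axis=2) +
         np.linalg.norm(joints - rx, axis=2))          # per-path lengths
    paths = np.exp(-2j * np.pi * d / lam) / d**2       # phase + simple path loss
    return paths.sum(axis=1)                           # superpose reflections

# Toy "video": one joint oscillating vertically, as in an exercise repetition.
t = np.linspace(0, 2, 400)[:, None, None]
joints = np.array([2.0, 1.0, 1.0]) + 0.3 * np.sin(2 * np.pi * t) * np.array([0, 0, 1])
csi = simulate_csi(joints)
print(np.abs(csi)[:5])          # magnitude stream a classifier would train on
```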
  3.
    Gestures that accompany speech are an essential part of natural and efficient embodied human communication. The automatic generation of such co‐speech gestures is a long‐standing problem in computer animation and is considered an enabling technology for creating believable characters in film, games, and virtual social spaces, as well as for interaction with social robots. The problem is made challenging by the idiosyncratic and non‐periodic nature of human co‐speech gesture motion, and by the great diversity of communicative functions that gestures encompass. The field of gesture generation has seen surging interest in the last few years, owing to the emergence of more and larger datasets of human gesture motion, combined with strides in deep‐learning‐based generative models that benefit from the growing availability of data. This review article summarizes co‐speech gesture generation research, with a particular focus on deep generative models. First, we articulate the theory describing human gesticulation and how it complements speech. Next, we briefly discuss rule‐based and classical statistical gesture synthesis, before delving into deep learning approaches. We employ the choice of input modalities as an organizing principle, examining systems that generate gestures from audio, text, and non‐linguistic input. Concurrent with the exposition of deep learning approaches, we chronicle the evolution of the related training datasets in terms of size, diversity, motion quality, and collection method (e.g., optical motion capture or pose estimation from video). Finally, we identify key research challenges in gesture generation, including data availability and quality; producing human‐like motion; grounding the gesture in the co‐occurring speech, in interaction with other speakers, and in the environment; performing gesture evaluation; and integration of gesture synthesis into applications. We highlight recent approaches to tackling these key challenges, as well as their limitations, and point toward areas of future development.

     
  4. While inferring human activities from sensors embedded in mobile devices using machine learning algorithms has been studied, current research relies primarily on sensor data collected in controlled settings, often with healthy individuals. There is currently a gap in research on how to design activity recognition models based on sensor data collected with chronically ill individuals in free-living environments. In this paper, we focus on a situation where free-living activity data are collected continuously, the activity vocabulary (i.e., class labels) is not known a priori, and sensor data are annotated by end-users through an active learning process. By analyzing sensor data collected in a clinical study involving patients with cardiovascular disease, we demonstrate significant challenges that arise while inferring physical activities in uncontrolled environments. In particular, we observe that activity labels that are distinct in syntax can refer to semantically identical behaviors, resulting in a sparse label space. To construct a meaningful label space, we propose LabelMerger, a framework for restructuring the label space created through active learning in uncontrolled environments in preparation for training activity recognition models. LabelMerger combines the semantic meaning of activity labels with physical attributes of the activities (i.e., domain knowledge) to generate a flexible and meaningful representation of the labels. Specifically, our approach merges labels using both word embedding techniques from the natural language processing domain and activity intensity from physical activity research. We show that the new representation of the sensor data obtained by LabelMerger results in more accurate activity recognition models than when the original label space is used to learn recognition models.
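
The merging rule described in item 4, collapsing labels that are semantically close and share a physical-intensity level into one class, can be sketched as follows. The embeddings, intensity levels, and similarity threshold here are illustrative stand-ins, not the LabelMerger implementation:

```python
# Sketch of the LabelMerger idea (not the authors' code): merge free-form
# activity labels that are close in a word-embedding space AND share an
# intensity level. Embeddings, levels, and threshold are all assumptions.
import numpy as np

rng = np.random.default_rng(2)
emb = {w: rng.normal(size=50) for w in
       ["walking", "strolling", "jogging", "resting"]}
emb["strolling"] = emb["walking"] + 0.05 * rng.normal(size=50)  # near-synonyms

intensity = {"walking": "light", "strolling": "light",
             "jogging": "vigorous", "resting": "sedentary"}

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def merge_labels(labels, thresh=0.9):
    """Greedy single-pass merge: semantically close + same intensity level."""
    groups = []
    for lab in labels:
        for g in groups:
            rep = g[0]  # compare against the group's first (representative) label
            if intensity[lab] == intensity[rep] and cos(emb[lab], emb[rep]) > thresh:
                g.append(lab)
                break
        else:
            groups.append([lab])
    return groups

print(merge_labels(list(emb)))  # e.g. [['walking', 'strolling'], ['jogging'], ['resting']]
```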
  5. The rapid pace of urbanization and socioeconomic development encourages people to spend more time together, and monitoring of human dynamics is therefore of great importance, especially in facilities for elder care and venues hosting multiple activities. Traditional approaches are limited by high deployment costs and privacy concerns (e.g., camera-based surveillance or sensor-attachment-based solutions). In this work, we propose to provide a fine-grained, comprehensive view of human dynamics using the existing WiFi infrastructure that is often available in indoor venues. Our approach is low-cost and device-free and does not require any active human participation. Our system aims to provide smart human dynamics monitoring through participant number estimation, human density estimation, and walking speed and direction derivation. A semi-supervised learning approach leveraging a non-linear regression model is developed to significantly reduce training effort and to accommodate different monitoring environments. We further derive participant number and density estimates based on the statistical distribution of Channel State Information (CSI) measurements. In addition, people's walking speed and direction are estimated using a frequency-based mechanism. Extensive experiments over 12 months demonstrate that our system can perform fine-grained, effective human dynamics monitoring with over 90% accuracy in estimating participant number, density, and walking speed and direction in various indoor environments.
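
As a rough illustration of the occupancy-estimation idea in item 5: one can regress participant count from summary statistics of CSI amplitude, on the premise that more moving people produce larger signal fluctuation. Note the paper's approach is semi-supervised; the sketch below is plain supervised regression on synthetic data, and the feature choices are assumptions:

```python
# Simplified stand-in for the paper's pipeline: regress participant count
# from CSI-amplitude summary statistics. Data here is synthetic.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)

def csi_features(n_people):
    """Synthetic CSI window whose fluctuation grows with occupancy."""
    amp = 1.0 + 0.2 * n_people * rng.normal(size=(500, 30))   # toy signal model
    return np.array([amp.std(), np.percentile(np.abs(np.diff(amp, axis=0)), 90)])

counts = rng.integers(0, 10, size=300)
X = np.stack([csi_features(k) for k in counts])

model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
model.fit(X, counts)
print(round(model.predict(csi_features(4)[None, :])[0]))  # expect a value near 4
```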