Title: A Comparative Study of Hand Gesture Recognition Devices in the Context of Game Design
Gesture recognition devices provide a new means for natural human-computer interaction. However, when selecting these devices for games, designers might find it challenging to decide which gesture recognition device will work best. In the present research, we compare three vision-based, hand-gesture devices: Leap Motion, Microsoft's Kinect, and Intel's RealSense. We developed a simple hand-gesture-based game to evaluate the performance, cognitive demand, comfort, and player experience of using these gesture devices. We found that participants preferred and performed much better using Leap Motion and Kinect compared to using RealSense. Leap Motion also outperformed or was equivalent to Kinect. These findings suggest that not all gesture recognition devices are suitable for games and that designers need to make careful decisions when selecting gesture recognition devices and designing gesture-based games to ensure the usability, accuracy, and comfort of such games.
Award ID(s):
1651532 1619273
NSF-PAR ID:
10174280
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM Digital Library
Date Published:
Journal Name:
Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces
Page Range / eLocation ID:
397 - 402
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gesture recognition devices provide a new means for natural human-computer interaction. However, when selecting these devices to be used in games, designers might find it challenging to decide which gesture recognition device will work best. In the present research, we compare three vision-based, hand-gesture devices: Leap Motion, Microsoft’s Kinect, and Intel’s RealSense. The comparison provides game designers with an understanding of the main factors to consider when selecting these devices and how to design games that use them. We developed a simple hand-gesture-based game to evaluate performance, cognitive demand, comfort, and player experience of using these gesture devices. We found that participants preferred and performed much better using Leap Motion and Kinect compared to using RealSense. Leap Motion also outperformed or was equivalent to Kinect. These findings were supported by players’ accounts of their experiences using these gesture devices. Based on these findings, we discuss how such devices can be used by game designers and provide them with a set of design cautions that provide insights into the design of gesture-based games. 
  2. Researchers, educators, and multimedia designers need to better understand how mixing physical tangible objects with virtual experiences affects learning and science identity. In this novel study, a 3D-printed tangible that is an accurate facsimile of the sort of expensive glassware that chemists use in real laboratories is tethered to a laptop with a digitized lesson. As interactive educational content is increasingly being placed online, it is important to understand the educational boundary conditions associated with passive haptics and 3D-printed manipulables. Cost-effective printed objects would be particularly welcome in rural and low socioeconomic status (SES) classrooms. A Mixed Reality (MR) experience was created that used a physical 3D-printed haptic burette to control a computer-based chemistry titration experiment. This randomized controlled trial with 136 college students had two conditions: 1) low-embodied control (using keyboard arrows), and 2) high-embodied experimental (physically turning a valve/stopcock on the 3D-printed burette). Although both groups displayed similar significant gains on the declarative knowledge test, deeper analyses revealed nuanced Aptitude by Treatment Interactions (ATIs). These interactions favored the high-embodied experimental group that used the MR device for both titration-specific posttest knowledge questions and for science efficacy and science identity. Students with higher prior science knowledge displayed higher titration knowledge scores after using the experimental 3D-printed haptic device. A multi-modal linguistic and gesture analysis revealed that during recall the experimental participants used the stopcock-turning gesture significantly more often, and their recalls produced a significantly different Epistemic Network Analysis (ENA). ENA is a type of 2D projection of the recall data; stronger connections were seen in the high-embodied group, mainly centering on the key hand-turning gesture. Instructors and designers should consider the multi-modal and multi-dimensional nature of the user interface, and how the addition of another sensory-based learning signal (haptics) might differentially affect students with lower prior knowledge. One hypothesis is that haptically manipulating novel devices during learning may create more cognitive load. For low prior knowledge students, it may be advantageous to begin learning content on a more ubiquitous interface (e.g., keyboard) before moving to more novel, multi-modal MR devices/interfaces.
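The aptitude-by-treatment interaction (ATI) analyses described above amount to asking whether the effect of the treatment condition changes with a learner's prior knowledge. The sketch below is a minimal, generic illustration of that idea using an OLS model with an interaction term; the variable names and the simulated data are placeholders and do not come from the study.

```python
# Minimal sketch of an aptitude-by-treatment interaction (ATI) test.
# All variable names and the simulated data below are hypothetical placeholders,
# not the measures or values collected in the study described above.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 136  # the study reports 136 students; the values themselves are simulated

condition = rng.choice(["keyboard", "haptic"], size=n)   # low- vs. high-embodied
prior = rng.normal(50, 10, size=n)                       # placeholder pretest scores
# Simulate an outcome in which the benefit of the haptic device grows with prior knowledge.
posttest = (20 + 0.4 * prior
            + (condition == "haptic") * 0.3 * (prior - 50)
            + rng.normal(0, 5, size=n))

df = pd.DataFrame({"condition": condition,
                   "prior_knowledge": prior,
                   "posttest": posttest})

# The interaction term asks whether the treatment effect depends on prior knowledge,
# which is the defining question of an ATI analysis.
model = smf.ols("posttest ~ C(condition) * prior_knowledge", data=df).fit()
print(model.summary())
```

A reliably nonzero coefficient on the condition-by-prior-knowledge term is the statistical signature of an ATI like the one reported above.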
  3. The PoseASL dataset consists of color and depth videos collected from ASL signers at the Linguistic and Assistive Technologies Laboratory under the direction of Matt Huenerfauth, as part of a collaborative research project with researchers at the Rochester Institute of Technology, Boston University, and the University of Pennsylvania.
    Access: After becoming an authorized user of Databrary, please contact Matt Huenerfauth if you have difficulty accessing this volume. We have collected a new dataset consisting of color and depth videos of fluent American Sign Language signers performing sequences of ASL signs and sentences. Given interest among sign-recognition and other computer-vision researchers in red-green-blue-depth (RGBD) video, we release this dataset for use by the research community. In addition to the video files, we share depth data files from a Kinect v2 sensor, as well as additional motion-tracking files produced through post-processing of these data.
    Organization of the Dataset: The dataset is organized into sub-folders with codenames such as "P01" or "P16". These codenames refer to the specific human signers recorded in this dataset. Please note that there was no participant P11 or P14; those numbers were accidentally skipped during the process of making appointments to collect video stimuli.
    Task: During the recording session, the participant was met by a member of our research team who was a native ASL signer. No other individuals were present during the data collection session. After signing the informed consent and video release document, participants responded to a demographic questionnaire. Next, the data-collection session consisted of English word stimuli and cartoon videos. The recording session began by showing participants stimuli consisting of slides that displayed English words and photos of items, and participants were asked to produce the sign for each (PDF included in the materials subfolder). Next, participants viewed three short animated cartoons, which they were asked to recount in ASL:
    - Canary Row, Warner Brothers Merrie Melodies 1950 (the 7-minute video divided into seven parts)
    - Mr. Koumal Flies Like a Bird, Studio Animovaneho Filmu 1969
    - Mr. Koumal Battles his Conscience, Studio Animovaneho Filmu 1971
    The word list and cartoons were selected because they are identical to the stimuli used in the collection of the Nicaraguan Sign Language video corpora; see: Senghas, A. (1995). Children's Contribution to the Birth of Nicaraguan Sign Language. Doctoral dissertation, Department of Brain and Cognitive Sciences, MIT.
    Demographics: All 14 of our participants were fluent ASL signers. As screening, we asked our participants: Did you use ASL at home growing up, or did you attend a school as a very young child where you used ASL? All the participants responded affirmatively to this question. A total of 14 DHH participants were recruited on the Rochester Institute of Technology campus. Participants included 7 men and 7 women, aged 21 to 35 (median = 23.5). All of our participants reported that they began using ASL when they were 5 years old or younger, with 8 reporting ASL use since birth and 3 others reporting ASL use since age 18 months.
    Filetypes: *.avi, *_dep.bin: The PoseASL dataset was captured using a Kinect v2 RGBD camera. The output of this camera system includes multiple channels: RGB, depth, skeleton joints (25 joints for every video frame), and HD face (1,347 points). The video resolution is 1920 x 1080 pixels for the RGB channel and 512 x 424 pixels for the depth channel. Due to limitations on the filetypes accepted for sharing on Databrary, the binary *_dep.bin files directly produced by the Kinect v2 camera system could not be shared on that platform. If your research requires the original binary *_dep.bin files, please contact Matt Huenerfauth. *_face.txt, *_HDface.txt, *_skl.txt: To make it easier for future researchers to make use of this dataset, we have also performed some post-processing of the Kinect data. To extract the skeleton coordinates from the RGB videos, we used the OpenPose system, which is capable of detecting body, hand, facial, and foot keypoints of multiple people in single images in real time. The output of OpenPose includes estimates of 70 keypoints for the face, including eyes, eyebrows, nose, mouth, and face contour. The software also estimates 21 keypoints for each of the hands (Simon et al., 2017), including 3 keypoints for each finger, as shown in Figure 2. Additionally, 25 keypoints are estimated for the body pose (and feet) (Cao et al., 2017; Wei et al., 2016).
    Reporting Bugs or Errors: Please contact Matt Huenerfauth to report any bugs or errors that you identify in the corpus. We appreciate your help in improving the quality of the corpus over time.
    Acknowledgement: This material is based upon work supported by the National Science Foundation under award 1749376: "Collaborative Research: Multimethod Investigation of Articulatory and Perceptual Constraints on Natural Language Evolution."
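For readers who want to work with the post-processed keypoint files, the sketch below shows one way such data might be loaded. The column layout of the *_skl.txt files is not documented in this summary, so the reshape into (frame, keypoint, x/y/confidence) is an assumption modeled on OpenPose's usual body-keypoint ordering, and the filename in the usage comment is hypothetical; check the dataset's own documentation before relying on this layout.

```python
# Hedged sketch for loading one of the post-processed keypoint files.
# The exact column layout of the *_skl.txt files is not documented in this summary,
# so the (frame, keypoint, x/y/confidence) reshape below is an assumption modeled on
# OpenPose's usual 25-keypoint body output; verify it against the dataset's own docs.
import numpy as np

N_BODY_KEYPOINTS = 25  # OpenPose body model (including feet)

def load_skeleton_file(path):
    """Return (xy, confidence) arrays of shape (frames, 25, 2) and (frames, 25)."""
    data = np.atleast_2d(np.loadtxt(path))      # assumed: one video frame per row
    frames = data.reshape(data.shape[0], N_BODY_KEYPOINTS, 3)
    xy = frames[:, :, :2]                       # pixel coordinates in the RGB video
    confidence = frames[:, :, 2]                # per-keypoint detection confidence
    return xy, confidence

# Hypothetical usage (the filename is illustrative, not an actual file in the corpus):
# xy, conf = load_skeleton_file("P01/P01_cartoon1_skl.txt")
```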
  4. Raynal, Ann M. ; Ranney, Kenneth I. (Ed.)
    Most research in technologies for the Deaf community has focused on translation using either video or wearable devices. Sensor-augmented gloves have been reported to yield higher gesture recognition rates than camera-based systems; however, they cannot capture information expressed through head and body movement. Gloves are also intrusive and inhibit users in their pursuit of normal daily life, while cameras can raise concerns over privacy and are ineffective in the dark. In contrast, RF sensors are non-contact, non-invasive, and do not reveal private information even if hacked. Although RF sensors are unable to measure facial expressions or hand shapes, which would be required for complete translation, this paper aims to exploit RF sensors for near real-time ASL recognition in the design of smart Deaf spaces. In this way, we hope to enable the Deaf community to benefit from advances in technologies that could generate tangible improvements in their quality of life. More specifically, this paper investigates near real-time implementation of machine learning and deep learning architectures for the purpose of sequential ASL signing recognition. We utilize a 60 GHz RF sensor which transmits a frequency modulated continuous wave (FMCW) waveform. RF sensors can acquire a unique source of information that is inaccessible to optical or wearable devices: namely, a visual representation of the kinematic patterns of motion via the micro-Doppler signature. Micro-Doppler refers to frequency modulations that appear about the central Doppler shift, which are caused by rotational or vibrational motions that deviate from the principal translational motion. In prior work, we showed that fractal complexity computed from RF data could be used to discriminate signing from daily activities and that RF data could reveal linguistic properties, such as coarticulation. We have also shown that machine learning can discriminate with 99% accuracy the signing of native Deaf ASL users from copysigning (or imitation signing) by hearing individuals; therefore, imitation signing data is not effective for directly training deep models. However, adversarial learning can be used to transform imitation signing to resemble native signing, or, alternatively, physics-aware generative models can be used to synthesize ASL micro-Doppler signatures for training deep neural networks. With such approaches, we have achieved over 90% recognition accuracy on 20 ASL signs. In natural environments, however, near real-time implementations of classification algorithms are required, as well as an ability to process data streams in a continuous and sequential fashion. In this work, we focus on extensions of our prior work towards this aim and compare the efficacy of various approaches for embedding deep neural networks (DNNs) on platforms such as a Raspberry Pi or Jetson board. We examine methods for optimizing the size and computational complexity of DNNs for embedded micro-Doppler analysis, methods for network compression, and the resulting sequential ASL recognition performance.
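The micro-Doppler signature described above is, in practice, a time-frequency representation of the radar return. The sketch below is a minimal, self-contained illustration using a synthetic complex slow-time signal and a short-time Fourier transform; the pulse-repetition frequency, window sizes, and simulated motion components are placeholders, not the 60 GHz sensor parameters or data from this work.

```python
# Minimal sketch of a micro-Doppler spectrogram, the time-frequency representation
# described above. The signal is synthetic; the PRF, window length, and motion
# components are placeholders, not parameters or data from this work.
import numpy as np
from scipy import signal
import matplotlib.pyplot as plt

fs = 1000.0                                   # placeholder pulse-repetition frequency (Hz)
t = np.arange(0, 5, 1 / fs)

# Synthetic slow-time return: a steady torso Doppler line plus an oscillating arm
# component whose sinusoidal phase modulation produces the micro-Doppler sidebands.
iq = (np.exp(1j * 2 * np.pi * 20 * t)
      + 0.5 * np.exp(1j * 30 * np.sin(2 * np.pi * 1.5 * t)))

# A short-time Fourier transform over slow time gives the micro-Doppler signature.
f, tt, Sxx = signal.spectrogram(iq, fs=fs, nperseg=128, noverlap=96,
                                return_onesided=False)
Sxx_db = 10 * np.log10(np.abs(np.fft.fftshift(Sxx, axes=0)) + 1e-12)

plt.pcolormesh(tt, np.fft.fftshift(f), Sxx_db, shading="auto")
plt.xlabel("Time (s)")
plt.ylabel("Doppler frequency (Hz)")
plt.title("Micro-Doppler signature (synthetic data)")
plt.show()
```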
  5. The goal of this research is to provide much-needed empirical data on how the fidelity of popular hand-gesture-tracked pointing metaphors, versus commodity controller-based input, affects the efficiency and speed-accuracy tradeoff of users' spatial selection in personal-space interactions in VR. We conduct two experiments in which participants select spherical targets arranged in a circle in personal space, or near-field (within their maximum arm's reach), in VR. Both experiments required participants to select the targets with either a VR controller or with their dominant hand's index finger, which was tracked with one of two popular contemporary tracking methods. In the first experiment, the targets are arranged in a flat circle in accordance with the ISO 9241-9 Fitts' law standard, and the simulation selected random combinations of 3 target amplitudes and 3 target widths. Targets were centered around the users' eye level, and the arrangement was placed at either the 60%, 75%, or 90% depth plane of the users' maximum arm's reach. In experiment 2, the targets varied in depth randomly from one depth plane to another within the same configuration of 13 targets within a trial set, resembling a button selection task in hierarchical menus at differing depth planes in the near-field. The study was conducted using the HTC Vive head-mounted display with three input conditions: a VR controller (HTC Vive), low-fidelity virtual pointing (Leap Motion), or high-fidelity virtual pointing (tracked VR glove). Our results revealed that low-fidelity pointing performed worse than both high-fidelity pointing and the VR controller. Overall, target selection performance was worse in depth planes closer to the maximum arm's reach than at middle and nearer distances.
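For context, ISO 9241-9 style analyses of such selection tasks typically summarize the speed-accuracy tradeoff with Fitts' index of difficulty and throughput. The sketch below shows the standard Shannon formulation with illustrative numbers; the amplitude, width, and movement time are placeholders, not values or results from this study.

```python
# Minimal sketch of the Fitts' law quantities behind an ISO 9241-9 style analysis.
# The amplitude, width, and movement time below are illustrative placeholders,
# not values or results from this study.
import math

def index_of_difficulty(amplitude, width):
    """Shannon formulation: ID = log2(A / W + 1), in bits."""
    return math.log2(amplitude / width + 1)

def throughput(amplitude, width, movement_time_s):
    """Throughput in bits per second for one amplitude/width condition."""
    return index_of_difficulty(amplitude, width) / movement_time_s

# Example: 0.30 m target amplitude, 0.04 m target width, 0.85 s mean selection time.
ID = index_of_difficulty(0.30, 0.04)
tp = throughput(0.30, 0.04, 0.85)
print(f"ID = {ID:.2f} bits, throughput = {tp:.2f} bits/s")
```

Full ISO 9241-9 analyses usually substitute an effective width derived from the spread of selection endpoints (We = 4.133 x SDx); the nominal-width form above is the simpler variant.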