PoseASL: An RGBD Dataset of American Sign Language

Huenerfauth, Matt

doi:10.17910/b7.1279

{"Abstract":["The PoseASL dataset consists of color and depth videos collected from ASL signers at the Linguistic and Assistive Technologies Laboratory under the direction of Matt Huenerfauth, as part of a collaborative research project with researchers at the Rochester Institute of Technology, Boston University, and the University of Pennsylvania.\n\nAccess: After becoming an authorized user of Databrary, please contact Matt Huenerfauth if you have difficulty accessing this volume.\n\nWe have collected a new dataset consisting of color and depth videos of fluent American Sign Language signers performing sequences of ASL signs and sentences. Given interest among sign-recognition and other computer-vision researchers in red-green-blue-depth (RGBD) video, we release this dataset for use by the research community. In addition to the video files, we share depth data files from a Kinect v2 sensor, as well as additional motion-tracking files produced through post-processing of this data.\n\nOrganization of the Dataset: The dataset is organized into sub-folders with codenames such as "P01" or "P16". Each codename refers to a specific human signer who was recorded in this dataset. Please note that there is no participant P11 or P14; those numbers were accidentally skipped during the process of making appointments to collect video stimuli.\n\nTask: During the recording session, the participant was met by a member of our research team who was a native ASL signer. No other individuals were present during the data collection session. After signing the informed consent and video release document, participants responded to a demographic questionnaire. Next, the data-collection session consisted of English word stimuli and cartoon videos. The recording session began by showing participants slides that displayed English words and photos of items, and participants were asked to produce the sign for each (PDF included in materials subfolder).
Next, participants viewed three videos of short animated cartoons, which they were asked to recount in ASL:\n- Canary Row, Warner Brothers Merrie Melodies 1950 (the 7-minute video divided into seven parts)\n- Mr. Koumal Flies Like a Bird, Studio Animovaneho Filmu 1969\n- Mr. Koumal Battles his Conscience, Studio Animovaneho Filmu 1971\nThe word list and cartoons were selected because they are identical to the stimuli used in the collection of the Nicaraguan Sign Language video corpora - see: Senghas, A. (1995). Children\u2019s Contribution to the Birth of Nicaraguan Sign Language. Doctoral dissertation, Department of Brain and Cognitive Sciences, MIT.\n\nDemographics: All 14 of our participants were fluent ASL signers. As a screening question, we asked our participants: Did you use ASL at home growing up, or did you attend a school as a very young child where you used ASL? All the participants responded affirmatively to this question. A total of 14 Deaf and Hard of Hearing (DHH) participants were recruited on the Rochester Institute of Technology campus. Participants included 7 men and 7 women, aged 21 to 35 (median = 23.5). All of our participants reported that they began using ASL when they were 5 years old or younger, with 8 reporting ASL use since birth, and 3 others reporting ASL use since age 18 months.\n\nFiletypes:\n\n*.avi, *_dep.bin: The PoseASL dataset was captured using a Kinect 2.0 RGBD camera. The output of this camera system includes multiple channels: RGB, depth, skeleton joints (25 joints for every video frame), and HD face (1,347 points). The video resolution is 1920 x 1080 pixels for the RGB channel and 512 x 424 pixels for the depth channel. Due to limitations on the filetypes accepted for sharing on Databrary, we could not share on the Databrary platform the binary *_dep.bin files produced directly by the Kinect v2 camera system.
If your research requires the original binary *_dep.bin files, then please contact Matt Huenerfauth.\n\n*_face.txt, *_HDface.txt, *_skl.txt: To make it easier for future researchers to make use of this dataset, we have also performed some post-processing of the Kinect data. To extract skeleton coordinates from the RGB videos, we used the OpenPose system, which is capable of detecting body, hand, facial, and foot keypoints of multiple people on single images in real time. The output of OpenPose includes an estimate of 70 keypoints for the face, including eyes, eyebrows, nose, mouth and face contour. The software also estimates 21 keypoints for each of the hands (Simon et al., 2017), including 3 keypoints for each finger, as shown in Figure 2. Additionally, there are 25 keypoints estimated for the body pose (and feet) (Cao et al., 2017; Wei et al., 2016).\n\nReporting Bugs or Errors:\n\nPlease contact Matt Huenerfauth to report any bugs or errors that you identify in the corpus. We appreciate your help in improving the quality of the corpus over time by identifying any errors.\n\nAcknowledgement: This material is based upon work supported by the National Science Foundation under award 1749376: "Collaborative Research: Multimethod Investigation of Articulatory and Perceptual Constraints on Natural Language Evolution.""]}
