Title: The Haptic Video Player: Using Mobile Robots to Create Tangible Video Annotations
Video and animation are common ways of delivering concepts that cannot be easily communicated through text. This visual information is often inaccessible to blind and visually impaired people, and alternative representations such as Braille and audio may leave out important details. Audio-haptic displays, along with supplemental descriptions, allow for the presentation of complex spatial information. We introduce the Haptic Video Player, a system for authoring and presenting audio-haptic content from videos. The Haptic Video Player presents video using mobile robots that can be touched as they move over a touchscreen. We describe the design of the Haptic Video Player system and present user studies with educators and blind individuals that demonstrate the ability of this system to render dynamic visual content non-visually.
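As a rough illustration of the kind of playback mapping such a system needs (a minimal sketch, not the authors' implementation; the Keypoint class, robot_target_at function, and screen dimensions are all hypothetical), the snippet below interpolates an authored annotation path and converts normalized video coordinates into touchscreen coordinates for a robot target:

```python
# Hypothetical sketch: map an authored video annotation -- (timestamp,
# normalized x/y) keypoints for one on-screen object -- to target positions
# for a tabletop robot during playback. Not the Haptic Video Player code.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Keypoint:
    t: float   # seconds into the video
    x: float   # horizontal position, normalized to [0, 1]
    y: float   # vertical position, normalized to [0, 1]

def robot_target_at(path: List[Keypoint], t: float,
                    screen_w_mm: float, screen_h_mm: float) -> Tuple[float, float]:
    """Linearly interpolate the annotated path at playback time t and
    convert normalized video coordinates to touchscreen millimetres.
    Assumes path has at least one keypoint, sorted by time."""
    if t <= path[0].t:
        kp = path[0]
    elif t >= path[-1].t:
        kp = path[-1]
    else:
        for a, b in zip(path, path[1:]):
            if a.t <= t <= b.t:
                f = (t - a.t) / (b.t - a.t)
                x = a.x + f * (b.x - a.x)
                y = a.y + f * (b.y - a.y)
                return x * screen_w_mm, y * screen_h_mm
    return kp.x * screen_w_mm, kp.y * screen_h_mm
```

In a full player, one would presumably query this target at the video frame rate and forward it to the robot's motion controller while the corresponding audio description plays.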
Award ID(s):
1652907 1619384
PAR ID:
10094096
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 2018 ACM International Conference on Interactive Surfaces and Spaces (ISS 2018)
Page Range / eLocation ID:
203 to 211
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    iASSIST is an iPhone-based assistive sensor solution for independent and safe travel for people who are blind or visually impaired, or those who simply face challenges in navigating an unfamiliar indoor environment. The solution integrates information from Bluetooth beacons, data connectivity, visual models, and user preferences. Hybrid models of interiors are created in a modeling stage from these multimodal data, which are collected and mapped to the floor plan as the modeler walks through the building. A client-server architecture allows scaling to large areas by lazy-loading models according to beacon signals and/or adjacent-region proximity. During the navigation stage, a user with the navigation app is localized within the floor plan using visual, connectivity, and user preference data, and guided along an optimal route to their destination. User interfaces for both modeling and navigation use multimedia channels, including visual, audio, and haptic feedback, for the targeted users. The design of human subject test experiments is also described, along with some preliminary experimental results. A brief sketch of the beacon-driven lazy-loading scheme appears after this list.
  2. With the advancement and dominance of Internet video services, content-based video deduplication has become an essential infrastructure on which those services depend. However, the explosively growing volume of video data on the Internet challenges the system's design and implementation for scalability in several ways. (1) Although quantization-based indexing techniques are effective for searching visual features at a large scale, costly re-training over the complete dataset must be done periodically. (2) The high-dimensional vectors for visual features demand increasingly large SSD space, degrading I/O performance. (3) Videos crawled from the Internet are diverse, and visually similar videos are not necessarily duplicates, increasing deduplication complexity. (4) Most videos are edited, so duplicate content is more likely to appear as clips inside the videos, demanding processing techniques that attend to fine detail. To address these issues, we propose Maze, a full-fledged video deduplication system. Maze has an ANNS layer that indexes and searches the high-dimensional feature vectors. The architecture of the ANNS layer supports efficient reads and writes and eliminates the data migration caused by re-training. Maze adopts CNN-based and ORB features as its visual features, optimized for the video deduplication task. The features are compact and fully reside in memory. Acoustic features are also incorporated in Maze so that visually similar videos with different audio tracks can be distinguished. A clip-based matching algorithm is developed to discover duplicate content at a fine granularity; a sketch of clip-level matching at this granularity appears after this list. Maze has been deployed as a production system for two years. It has indexed 1.3 billion videos and is indexing ~800 thousand videos per day. For the ANNS layer, the average read latency is 4 seconds and the average write latency is at most 4.84 seconds. Re-training over the complete dataset is no longer required, no matter how much new data is added, eliminating costly data migration between nodes. Maze recognizes duplicate live-streaming videos with both similar appearance and similar audio at a recall of 98%. Most importantly, Maze is also cost-effective: the compact feature design saves 5800 SSDs, and the computation resources devoted to running the whole system decrease to 250K standard cores per billion videos.
  3.
    Research has provided evidence of the value of producing multiple representations of content for learners (e.g., verbal, visual, etc.). However, much of this research has acknowledged changes in visual technologies while not recognizing or utilizing related audio innovations. For instance, teacher education students who were once taught through two-dimensional video are now being presented with interactive, three-dimensional content (e.g., simulations or 360 video). Users of both old and new formats, however, still typically receive monophonic sound. Few research studies have examined the impact of pairing three-dimensional sound with three-dimensional video in learning environments. The purpose of this study was to respond to this gap by comparing the outcomes of watching 360 video with either monophonic or ambisonic audio. Results provided evidence that ambisonic audio increased perceived presence for those familiar with the content being taught, led to differences in what ambisonic viewers noticed compared to monophonic groups, and improved participant focus while watching the 360 video. Implications for development and implementation in virtual worlds are discussed.
  4. The joint analysis of audio and video is a powerful tool that can be applied to various contexts, including action, speech, and sound recognition, audio-visual video parsing, emotion recognition in affective computing, and self-supervised training of deep learning models. Solving these problems often involves tackling core audio-visual tasks, such as audio-visual source localization, audio-visual correspondence, and audio-visual source separation, which can be combined in various ways to achieve the desired results. This paper provides a review of the literature in this area, discussing the advancements, history, and datasets of audio-visual learning methods for various application domains. It also presents an overview of the reported performances on standard datasets and suggests promising directions for future research. 
  5. Abstract We present an experimental investigation of spatial audio feedback using smartphones to support direction localization in pointing tasks for people with visual impairments (PVIs). We do this using a mobile game based on a bow-and-arrow metaphor. Our game provides a combination of spatial and non-spatial (sound beacon) audio to help the user locate the direction of the target. Our experiments with sighted, sighted-blindfolded, and visually impaired users show that (a) the efficacy of spatial audio is relatively higher for PVIs than for blindfolded sighted users during the initial reaction time for direction localization, (b) the general behavior of PVIs and blindfolded individuals is statistically similar, and (c) the lack of spatial audio significantly reduces localization performance even in sighted blindfolded users. Based on our findings, we discuss the system and interaction design implications for making future mobile-based spatial interactions accessible to PVIs. A sketch of one way to turn angular error into a stereo cue appears after this list.
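For item 1, a minimal sketch of the beacon-driven lazy-loading idea described in the iASSIST abstract, assuming a hypothetical RegionModelCache client and a server object exposing fetch_model (this is not the actual iASSIST codebase): the client keeps only the model of the region whose beacon is currently strongest, plus its adjacent regions.

```python
# Hypothetical client-side cache for region models, loaded lazily from a
# server according to the strongest observed beacon and region adjacency.

from typing import Dict, List

class RegionModelCache:
    def __init__(self, server, adjacency: Dict[str, List[str]]):
        self.server = server          # exposes fetch_model(region_id) -> model
        self.adjacency = adjacency    # region_id -> neighbouring region ids
        self.loaded: Dict[str, object] = {}

    def on_beacon_update(self, rssi_by_region: Dict[str, float]) -> None:
        """Called whenever new Bluetooth beacon readings arrive."""
        if not rssi_by_region:
            return
        current = max(rssi_by_region, key=rssi_by_region.get)
        wanted = {current, *self.adjacency.get(current, [])}
        # Fetch models we do not yet hold; evict models we no longer need.
        for region in wanted - self.loaded.keys():
            self.loaded[region] = self.server.fetch_model(region)
        for region in self.loaded.keys() - wanted:
            del self.loaded[region]
```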
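For item 2, an illustrative sketch of clip-level matching at a fine granularity, under the assumption that each video has already been reduced to a sequence of per-second frame signatures (the matching_clips function is hypothetical and is not Maze's production algorithm):

```python
# Hypothetical clip matcher: report aligned runs of identical per-second
# frame signatures between two videos as candidate duplicate clips.

from typing import Dict, List, Tuple

def matching_clips(sig_a: List[int], sig_b: List[int],
                   min_len: int = 5) -> List[Tuple[int, int, int]]:
    """Return (start_a, start_b, length) for every aligned run of identical
    signatures lasting at least min_len seconds."""
    positions: Dict[int, List[int]] = {}
    for j, s in enumerate(sig_b):
        positions.setdefault(s, []).append(j)

    # run_len[(i, j)] = length of the matching run ending at sig_a[i], sig_b[j]
    run_len: Dict[Tuple[int, int], int] = {}
    for i, s in enumerate(sig_a):
        for j in positions.get(s, []):
            run_len[(i, j)] = run_len.get((i - 1, j - 1), 0) + 1

    results = []
    for (i, j), length in run_len.items():
        is_maximal = (i + 1, j + 1) not in run_len
        if is_maximal and length >= min_len:
            results.append((i - length + 1, j - length + 1, length))
    return results
```

A production system would of course match approximate, high-dimensional features via an ANNS index rather than exact integer signatures; this sketch only shows how clip boundaries can be recovered once per-segment matches are available.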
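For item 5, a hedged illustration of one way the angular error between the phone's heading and the target direction could be turned into a stereo cue (the stereo_gains function and constant-power panning choice are assumptions; the abstract does not describe the study's actual audio rendering):

```python
# Hypothetical constant-power panner: convert the angular error between the
# phone's pointing direction and the target direction into left/right gains.

import math

def stereo_gains(heading_deg: float, target_deg: float) -> tuple:
    """Zero error centres the sound; errors to the left or right shift it
    toward that ear. Errors beyond +/-90 degrees are fully lateralized."""
    error = (target_deg - heading_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
    pan = max(-1.0, min(1.0, error / 90.0))
    angle = (pan + 1.0) * math.pi / 4.0        # 0 .. pi/2
    return math.cos(angle), math.sin(angle)    # (left gain, right gain)

# Example: target 30 degrees to the right of the current heading.
left, right = stereo_gains(heading_deg=0.0, target_deg=30.0)
```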