Title: EyeDescribe: Combining Eye Gaze and Speech to Automatically Create Accessible Touch Screen Artwork
Abstract:
Many images on the Web, including photographs and artistic images, feature spatial relationships between objects that are inaccessible to someone who is blind or visually impaired, even when a text description is provided. While some tools exist to manually create accessible image descriptions, this work is time-consuming and requires specialized tools. We introduce an approach that automatically creates spatially registered image labels based on how a sighted person naturally interacts with the image. Our system collects behavioral data from sighted viewers of an image, specifically eye gaze data and spoken descriptions, and uses them to generate a spatially indexed accessible image that can then be explored using an audio-based touch screen application. We describe our approach to assigning text labels to locations in an image based on eye gaze. We then report on two formative studies with blind users testing EyeDescribe. Our approach resulted in correct labels for all objects in our image set. Participants were better able to recall the location of objects when given both object labels and spatial locations. This approach provides a new method for creating accessible images with minimal required effort.
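The core idea the abstract describes, aligning time-stamped gaze samples with time-aligned spoken words so that each described object gets an (x, y) anchor in the image, could be sketched roughly as below. This is a minimal illustration rather than the authors' implementation; the data layout (gaze as (t, x, y) tuples, transcript words as (t_start, t_end, word) tuples) and the fixed alignment window are assumptions.

```python
from statistics import mean

# Assumed input formats (not from the paper):
#   gaze:  list of (t, x, y) samples, t in seconds, x/y in image pixels
#   words: list of (t_start, t_end, word) from a time-aligned transcript
def label_image(gaze, words, window=0.5):
    """Attach each spoken word to the average gaze position observed
    while (and shortly before) it was being said."""
    labels = []
    for t_start, t_end, word in words:
        # Gaze samples inside the word's utterance window, extended
        # slightly earlier because speakers tend to look before they speak.
        nearby = [(x, y) for t, x, y in gaze
                  if t_start - window <= t <= t_end]
        if nearby:
            labels.append({
                "word": word,
                "x": mean(x for x, _ in nearby),
                "y": mean(y for _, y in nearby),
            })
    return labels

# Example: "dog" spoken at 1.2-1.6 s while gaze hovered near (410, 220)
gaze = [(1.0, 400, 215), (1.3, 415, 222), (1.5, 412, 224)]
words = [(1.2, 1.6, "dog")]
print(label_image(gaze, words))
```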
Award ID(s):
1652907
PAR ID:
10165065
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
ISS '19: Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces
Page Range / eLocation ID:
101 to 112
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Text correction on mobile devices usually requires precise and repetitive manual control. In this paper, we present EyeSayCorrect, an eye-gaze- and voice-based hands-free text correction method for mobile devices. To correct text with EyeSayCorrect, the user first uses their gaze location on the screen to select a word, then speaks the new phrase. EyeSayCorrect then infers the user’s correction intention from these inputs and the text context. We used a Bayesian approach for determining the selected word given an eye-gaze trajectory: for each sampling point in the trajectory, the posterior probability of selecting each candidate word is calculated and accumulated, and the target word is selected once its accumulated posterior exceeds a threshold. Misspelled words are given higher priors. Our user studies showed that using priors for misspelled words reduced task completion time by up to 23.79% and text selection time by up to 40.35%, and that EyeSayCorrect is a feasible hands-free text correction method on mobile devices. (A rough sketch of this accumulated-posterior selection appears after this list.)
  2. We present an experimental investigation of spatial audio feedback using smartphones to support direction localization in pointing tasks for people with visual impairments (PVIs). We do this using a mobile game based on a bow-and-arrow metaphor. Our game provides a combination of spatial and non-spatial (sound beacon) audio to help the user locate the direction of the target (a sketch of such a bearing-to-audio mapping appears after this list). Our experiments with sighted, sighted-blindfolded, and visually impaired users show that (a) the efficacy of spatial audio is relatively higher for PVIs than for blindfolded sighted users during the initial reaction time for direction localization, (b) the general behavior of PVIs and blindfolded individuals is statistically similar, and (c) the lack of spatial audio significantly reduces localization performance even for sighted blindfolded users. Based on our findings, we discuss the system and interaction design implications for making future mobile-based spatial interactions accessible to PVIs.
  3. In the last decade, there has been a surge in the development and mainstream adoption of Artificial Intelligence (AI) systems that can generate textual image descriptions from images. However, only a few of these, such as Microsoft’s SeeingAI, are specifically tailored to the needs of people who are blind screen reader users, and none have been brought to bear on the particular challenges faced by parents who want image descriptions of children’s picture books. Such images have distinct qualities, but no research has explored the current state of the art or the opportunities to improve image-to-text AI systems for this problem domain. We conducted a content analysis of the image descriptions generated for a sample of 20 images, selected from 17 recently published children’s picture books, using five AI systems: asticaVision, BLIP, SeeingAI, TapTapSee, and VertexAI (a minimal example of invoking one of these systems, BLIP, appears after this list). We found that descriptions varied widely in their accuracy and completeness, with only 13% meeting both criteria. Overall, our findings suggest a need for AI image-to-text generation systems that are trained on the types, contents, styles, and layouts characteristic of children’s picture book images, towards increased accessibility for blind parents.
  4. Evaluating the quality of accessible image captions with human raters is challenging: a visually impaired user may not know how comprehensive a caption is, while a sighted assistant may not know what information the user will need from it. To explore how image captioners and caption consumers assess caption content, we conducted a series of collaborative captioning sessions in which six pairs, each consisting of a blind person and their sighted partner, worked together to discuss, create, and evaluate image captions. By making captioning a collaborative task, we were able to observe captioning strategies, elicit questions and answers about image captions, and explore blind users’ caption preferences. Our findings provide insight into the process of creating good captions and serve as a case study of cross-ability collaboration between blind and sighted people.
  5. Most programmers rely on visual tools (block-based editors, auto-indentation, bracket matching, syntax highlighting, etc.) that are inaccessible to visually impaired programmers. While prior language-specific, downloadable tools have demonstrated benefits for visually impaired programmers, we lack language-independent, cloud-based tools, both of which are critically needed. We present a new toolkit for building fully accessible, browser-based programming environments for multiple languages. Given a parser that meets certain specifications, the toolkit generates a block editor that is familiar to sighted users, communicates the structure of a program using spoken descriptions, and allows navigation using standard (accessible) keyboard shortcuts (a small illustrative sketch of spoken structure descriptions appears after this list). This paper presents the toolkit and a first evaluation of it. While the toolkit allows full editing of code, we chose to focus strictly on navigation for this evaluation, using the navigation-only study design of Baker, Milne, and Ladner. Visually impaired programmers completed several tasks with and without our tool, and we compared their results and experience. Users had improved accuracy when completing tasks, were significantly better able to orient themselves when reading code, and felt better about completing the tasks when using the tool. Moreover, these improvements came with no significant change in task completion time over plain text, even for experienced programmers who navigate text using screen readers set to high words-per-minute rates.
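A rough sketch of the accumulated-posterior word selection described in the first related record (EyeSayCorrect). The Gaussian gaze-likelihood model, the prior boost for misspelled words, and the threshold value are illustrative assumptions; the paper's exact model may differ.

```python
import math

def gaze_select(gaze_points, words, threshold=3.0, sigma=40.0, misspelled_boost=3.0):
    """Accumulate, over an eye-gaze trajectory, the posterior that each
    on-screen word is the selection target; return the first word whose
    accumulated posterior crosses the threshold (or None).

    words: list of dicts with keys 'text', 'x', 'y', 'misspelled'.
    gaze_points: list of (x, y) gaze samples in screen pixels.
    sigma: assumed gaze-noise spread in pixels.
    """
    # Priors: misspelled words are more likely to be correction targets.
    priors = [misspelled_boost if w["misspelled"] else 1.0 for w in words]
    total = sum(priors)
    priors = [p / total for p in priors]

    accumulated = [0.0] * len(words)
    for gx, gy in gaze_points:
        # Gaussian likelihood of this gaze sample given each word's centre.
        likelihoods = [
            math.exp(-((gx - w["x"]) ** 2 + (gy - w["y"]) ** 2) / (2 * sigma ** 2))
            for w in words
        ]
        evidence = sum(p * l for p, l in zip(priors, likelihoods)) or 1e-12
        for i, (p, l) in enumerate(zip(priors, likelihoods)):
            accumulated[i] += p * l / evidence  # posterior for this sample
            if accumulated[i] > threshold:
                return words[i]["text"]
    return None

# Toy example: gaze dwells near the misspelled word "teh".
words = [{"text": "teh", "x": 100, "y": 50, "misspelled": True},
         {"text": "cat", "x": 300, "y": 50, "misspelled": False}]
gaze = [(105, 48), (98, 52), (110, 55), (102, 47), (99, 51)]
print(gaze_select(gaze, words))  # -> "teh"
```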
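For the second record, the basic mapping from a target's relative bearing to stereo gains, plus a faster non-spatial beacon as the user's aim improves, could look like the sketch below. The constant-power pan law and the beep-rate mapping are illustrative assumptions, not the study's implementation.

```python
import math

def direction_audio(device_heading_deg, target_bearing_deg):
    """Map the angular error between where the phone points and where the
    target lies to (left_gain, right_gain, beep_interval_s)."""
    # Signed error in (-180, 180]: negative means the target is to the left.
    error = (target_bearing_deg - device_heading_deg + 180) % 360 - 180
    # Constant-power pan: full left at -90 degrees, full right at +90 degrees.
    pan = max(-1.0, min(1.0, error / 90.0))
    left_gain = math.cos((pan + 1) * math.pi / 4)
    right_gain = math.sin((pan + 1) * math.pi / 4)
    # Non-spatial beacon: beep faster as the aiming error shrinks.
    beep_interval = 0.1 + 0.9 * abs(error) / 180.0
    return left_gain, right_gain, beep_interval

# Target 45 degrees to the right: right channel louder, moderate beep rate.
print(direction_audio(0, 45))
```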
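One of the five systems analysed in the third record, BLIP, is openly available; a caption for a scanned picture-book page could be generated with its Hugging Face checkpoint roughly as below. The file name is a placeholder, and this is not the study's evaluation pipeline.

```python
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

# Publicly released BLIP captioning checkpoint on the Hugging Face Hub.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# "page.jpg" is a placeholder for a scanned picture-book page.
image = Image.open("page.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(caption_ids[0], skip_special_tokens=True))
```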
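For the last record, the idea of announcing program structure from a parse tree so a screen reader user can step through it can be sketched with Python's own ast module. The phrasing of the spoken descriptions is made up for illustration and is not the toolkit's actual output.

```python
import ast

def describe(node):
    """Produce a short spoken-style description of one AST node."""
    if isinstance(node, ast.FunctionDef):
        return f"function definition named {node.name} with {len(node.args.args)} parameters"
    if isinstance(node, ast.If):
        return "if statement"
    if isinstance(node, ast.For):
        return "for loop"
    if isinstance(node, ast.Return):
        return "return statement"
    return type(node).__name__

def walk_spoken(node, depth=0):
    """Depth-first walk yielding (depth, description) pairs that a screen
    reader could speak as the user navigates into or across blocks."""
    yield depth, describe(node)
    for child in ast.iter_child_nodes(node):
        yield from walk_spoken(child, depth + 1)

source = """
def greet(name):
    if name:
        return "hello " + name
"""
for depth, text in walk_spoken(ast.parse(source)):
    print("  " * depth + text)
```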