Title: EyeDescribe: Combining Eye Gaze and Speech to Automatically Create Accessible Touch Screen Artwork
Many images on the Web, including photographs and artistic images, feature spatial relationships between objects that are inaccessible to someone who is blind or visually impaired, even when a text description is provided. While some tools exist to manually create accessible image descriptions, this work is time-consuming and requires specialized tools. We introduce an approach that automatically creates spatially registered image labels based on how a sighted person naturally interacts with the image. Our system collects behavioral data from sighted viewers of an image, specifically eye gaze data and spoken descriptions, and uses them to generate a spatially indexed accessible image that can then be explored using an audio-based touch screen application. We describe our approach to assigning text labels to locations in an image based on eye gaze. We then report on two formative studies with blind users testing EyeDescribe. Our approach resulted in correct labels for all objects in our image set, and participants were better able to recall the location of objects when given both object labels and spatial locations. This approach provides a new method for creating accessible images with minimal required effort.
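The abstract does not spell out the gaze-speech alignment algorithm, but the core idea, pairing each spoken word with the gaze fixation that was active when the word was uttered, can be sketched roughly as follows. The data structures, field names, and the fixed 0.3-second gaze-to-speech lag are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): pair timestamped spoken
# words with the gaze fixation that was active shortly before each word was
# uttered, producing spatially registered labels for an image.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fixation:
    x: float        # normalized image coordinates, 0..1
    y: float
    t_start: float  # seconds
    t_end: float

@dataclass
class SpokenWord:
    text: str
    t_start: float
    t_end: float

@dataclass
class SpatialLabel:
    text: str
    x: float
    y: float

def _fixation_at(fixations: List[Fixation], t: float) -> Optional[Fixation]:
    """Return the fixation whose time span contains t, if any."""
    for f in fixations:
        if f.t_start <= t <= f.t_end:
            return f
    return None

def label_image(fixations: List[Fixation],
                words: List[SpokenWord],
                lag: float = 0.3) -> List[SpatialLabel]:
    """Attach each spoken word to the fixation active `lag` seconds before
    the word onset (speech typically trails the eyes slightly)."""
    labels = []
    for w in words:
        fix = _fixation_at(fixations, w.t_start - lag)
        if fix is not None:
            labels.append(SpatialLabel(w.text, fix.x, fix.y))
    return labels
```

A touch-screen explorer could then answer a touch at a given (x, y) by speaking the nearest resulting label.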
Award ID(s): 1652907
PAR ID: 10165065
Author(s) / Creator(s):
Date Published:
Journal Name: ISS '19: Proceedings of the 2019 ACM International Conference on Interactive Surfaces and Spaces
Page Range / eLocation ID: 101 to 112
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
1. Text correction on mobile devices usually requires precise and repetitive manual control. In this paper, we present EyeSayCorrect, an eye-gaze- and voice-based hands-free text correction method for mobile devices. To correct text with EyeSayCorrect, the user first uses their gaze location on the screen to select a word, then speaks the new phrase. EyeSayCorrect then infers the user's correction intention from these inputs and the text context. We use a Bayesian approach to determine the selected word given an eye-gaze trajectory: for each sampling point in the trajectory, the posterior probability of selecting each word is calculated and accumulated, and the target word is selected when its accumulated probability exceeds a threshold. Misspelt words are given higher priors. Our user studies showed that using priors for misspelt words reduced task completion time by up to 23.79% and text selection time by up to 40.35%, and that EyeSayCorrect is a feasible hands-free text correction method on mobile devices.
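A compact illustration of the accumulation scheme described above is sketched below. This is not the paper's code: the Gaussian likelihood around each word's on-screen center, the doubled prior weight for misspelt words, and the threshold value are assumptions made purely for the example.

```python
# Sketch of per-word posterior accumulation over a gaze trajectory.
# Likelihood model, prior weights, and threshold are illustrative only.

import math
from typing import Dict, List, Optional, Set, Tuple

def gaussian_likelihood(gaze: Tuple[float, float],
                        center: Tuple[float, float],
                        sigma: float = 30.0) -> float:
    """Likelihood of a gaze sample given a word centered at `center` (pixels)."""
    dx, dy = gaze[0] - center[0], gaze[1] - center[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))

def select_word(gaze_trajectory: List[Tuple[float, float]],
                word_centers: Dict[str, Tuple[float, float]],
                misspelt: Set[str],
                threshold: float = 5.0) -> Optional[str]:
    """Accumulate each word's normalized posterior over gaze samples and
    return the first word whose accumulated score crosses the threshold."""
    priors = {w: (2.0 if w in misspelt else 1.0) for w in word_centers}
    scores = {w: 0.0 for w in word_centers}
    for gaze in gaze_trajectory:
        # Unnormalized posterior for each word at this sample.
        post = {w: priors[w] * gaussian_likelihood(gaze, c)
                for w, c in word_centers.items()}
        z = sum(post.values()) or 1.0
        for w in word_centers:
            scores[w] += post[w] / z
            if scores[w] > threshold:
                return w
    return None
```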
2. We present an experimental investigation of spatial audio feedback using smartphones to support direction localization in pointing tasks for people with visual impairments (PVIs). We do this using a mobile game based on a bow-and-arrow metaphor. Our game provides a combination of spatial and non-spatial (sound beacon) audio to help the user locate the direction of the target. Our experiments with sighted, sighted-blindfolded, and visually impaired users show that (a) the efficacy of spatial audio is higher for PVIs than for blindfolded sighted users during the initial reaction time for direction localization, (b) the general behavior of PVIs and blindfolded individuals is statistically similar, and (c) the lack of spatial audio significantly reduces localization performance even for blindfolded sighted users. Based on our findings, we discuss system and interaction design implications for making future mobile-based spatial interactions accessible to PVIs.
3. In the last decade, there has been a surge in development and mainstream adoption of Artificial Intelligence (AI) systems that can generate textual image descriptions from images. However, only a few of these, such as Microsoft's SeeingAI, are specifically tailored to the needs of blind screen-reader users, and none of them have been brought to bear on the particular challenges faced by parents who desire image descriptions of children's picture books. Such images have distinct qualities, but no research has explored the current state of the art or the opportunities to improve image-to-text AI systems for this problem domain. We conducted a content analysis of the image descriptions generated for a sample of 20 images selected from 17 recently published children's picture books, using five AI systems: asticaVision, BLIP, SeeingAI, TapTapSee, and VertexAI. We found that descriptions varied widely in their accuracy and completeness, with only 13% meeting both criteria. Overall, our findings suggest a need for AI image-to-text generation systems that are trained on the types, contents, styles, and layouts characteristic of children's picture book images, towards increased accessibility for blind parents.
4. Geovisualizations are powerful tools for exploratory spatial analysis, enabling sighted users to discern patterns, trends, and relationships within geographic data. However, these visual tools have remained largely inaccessible to screen-reader users. We introduce AltGeoViz, a new interactive geovisualization approach that dynamically generates alt-text descriptions based on the user's current map view, providing voiceover summaries of spatial patterns and descriptive statistics. In a remote user study with five screen-reader users, we found that participants were able to interact with spatial data in previously infeasible ways, demonstrated a clear understanding of data summaries and their location context, and could synthesize spatial understandings of their explorations. Moreover, we identified key areas for improvement, such as the addition of spatial navigation controls and comparative analysis features.
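As a rough illustration of viewport-driven alt-text generation in the spirit of the description above (not AltGeoViz's actual implementation), the sketch below summarizes the point features visible in the current map view with simple descriptive statistics and a coarse statement of where higher values concentrate. The feature format, the quadrant heuristic, and the wording are all assumptions.

```python
# Sketch: build an alt-text summary for the features inside the current
# map viewport. Data layout and phrasing are illustrative assumptions.

from statistics import mean
from typing import Dict, List, Tuple

Feature = Dict[str, float]  # expects keys: "lat", "lon", "value"

def viewport_alt_text(features: List[Feature],
                      bounds: Tuple[float, float, float, float]) -> str:
    """bounds = (min_lat, min_lon, max_lat, max_lon) of the current view."""
    min_lat, min_lon, max_lat, max_lon = bounds
    visible = [f for f in features
               if min_lat <= f["lat"] <= max_lat
               and min_lon <= f["lon"] <= max_lon]
    if not visible:
        return "No data in the current view."

    values = [f["value"] for f in visible]
    mid_lat = (min_lat + max_lat) / 2
    mid_lon = (min_lon + max_lon) / 2

    # Group visible features into view quadrants to describe the spatial pattern.
    quadrants: Dict[str, List[float]] = {}
    for f in visible:
        ns = "north" if f["lat"] >= mid_lat else "south"
        ew = "east" if f["lon"] >= mid_lon else "west"
        quadrants.setdefault(f"{ns}{ew}", []).append(f["value"])
    hottest = max(quadrants, key=lambda q: mean(quadrants[q]))

    return (f"{len(visible)} areas in view. Values range from "
            f"{min(values):.1f} to {max(values):.1f} "
            f"(average {mean(values):.1f}). Higher values are "
            f"concentrated in the {hottest} of the current view.")
```

The string returned by such a function would be re-generated whenever the user pans or zooms, then read aloud by the screen reader.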
5. Human-robot collaboration systems benefit from recognizing people's intentions. This capability is especially useful for collaborative manipulation applications, in which users operate robot arms to manipulate objects. For collaborative manipulation, systems can determine users' intentions by tracking eye gaze and identifying gaze fixations on particular objects in the scene (i.e., semantic gaze labeling). Translating 2D fixation locations (from eye trackers) into 3D fixation locations (in the real world) is a technical challenge. One approach is to assign each fixation to the object closest to it. However, calibration drift, head motion, and the extra dimension required for real-world interactions make this position-matching approach inaccurate. In this work, we introduce velocity features that compare the relative motion between subsequent gaze fixations and a finite set of known points, and assign each fixation to one of those known points. We validate our approach on synthetic data to demonstrate that classifying with velocity features is more robust than position matching. In addition, we show that a classifier using velocity features improves semantic labeling on a real-world dataset of human-robot assistive manipulation interactions.
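The velocity-feature idea can be sketched as follows. This is not the authors' code: it uses 2D coordinates and a simple nearest-match rule in place of the trained classifier described in the paper, and the data layout is assumed for illustration.

```python
# Sketch: assign a gaze fixation to a known point by comparing motions,
# not positions. Instead of picking the object nearest to the fixation
# (which suffers from calibration drift), compare the displacement between
# consecutive fixations with each candidate object's displacement over the
# same interval and pick the best motion match.

import numpy as np
from typing import Dict

def velocity_features(prev_gaze: np.ndarray, curr_gaze: np.ndarray,
                      prev_points: Dict[str, np.ndarray],
                      curr_points: Dict[str, np.ndarray]) -> Dict[str, float]:
    """For each known point, the mismatch between the gaze displacement and
    that point's displacement over the same interval (smaller = better match)."""
    gaze_motion = curr_gaze - prev_gaze
    return {name: float(np.linalg.norm(
                gaze_motion - (curr_points[name] - prev_points[name])))
            for name in curr_points}

def assign_fixation(prev_gaze, curr_gaze, prev_points, curr_points) -> str:
    """Label the current fixation with the known point whose motion best
    matches the gaze motion."""
    feats = velocity_features(np.asarray(prev_gaze), np.asarray(curr_gaze),
                              {k: np.asarray(v) for k, v in prev_points.items()},
                              {k: np.asarray(v) for k, v in curr_points.items()})
    return min(feats, key=feats.get)
```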