Title: Revisiting Blind Photography in the Context of Teachable Object Recognizers
For people with visual impairments, photography is essential for identifying objects through remote sighted help and image recognition apps. This is especially the case for teachable object recognizers, where recognition models are trained on users' photos. Here, we propose real-time feedback for communicating the location of an object of interest in the camera frame. Our audio-haptic feedback is powered by a deep learning model that estimates the object's center location based on its proximity to the user's hand. To evaluate our approach, we conducted a lab-based user study in which participants with visual impairments (N=9) used our feedback to train and test their object recognizer in vanilla and cluttered environments. We found that very few photos excluded the object (2% in the vanilla environment and 8% in the cluttered one), and recognition performance was promising even for participants with no prior camera experience. Participants tended to trust the feedback even though they knew it could be wrong. Our cluster analysis indicates that better feedback is associated with photos that include the entire object. Our results provide insights into factors that can degrade feedback and recognition performance in teachable interfaces.
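As a rough illustration of how such feedback might work (a minimal sketch, not the paper's implementation; the function name, margin threshold, and cue vocabulary are assumptions), a normalized object-center estimate like the one the model produces could be mapped to a coarse directional cue:

```python
# Hypothetical sketch: turning a predicted object center into a directional
# cue. Thresholds and cue names are assumptions, not the authors' code.
# Assumes image coordinates normalized to 0..1, with y increasing downward.

def feedback_cue(center_x: float, center_y: float, margin: float = 0.15) -> str:
    """Map a normalized object-center estimate to a coarse directional cue."""
    dx = center_x - 0.5  # horizontal offset from the frame center
    dy = center_y - 0.5  # vertical offset from the frame center
    if abs(dx) <= margin and abs(dy) <= margin:
        return "centered"  # object roughly centered: signal success
    # Report only the dominant axis so the audio-haptic cue stays short.
    if abs(dx) >= abs(dy):
        return "left" if dx < 0 else "right"
    return "up" if dy < 0 else "down"
```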
Award ID(s):
1816380
PAR ID:
10180846
Author(s) / Creator(s):
Date Published:
Journal Name:
The 21st International ACM SIGACCESS Conference on Computers and Accessibility
Page Range / eLocation ID:
83 to 95
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Teachable object recognizers provide a solution for a very practical need of blind people: instance-level object recognition. However, they assume that users can visually inspect the photos they provide for training, a critical and inaccessible step for those who are blind. In this work, we engineer data descriptors that address this challenge. They indicate in real time whether the object in the photo is cropped or too small, whether a hand is included, whether the photo is blurred, and how much the photos vary from one another. Our descriptors are built into an open-source testbed iOS app called MYCam. In a remote user study in the homes of blind participants (N = 12), we show how the descriptors, even when error-prone, support experimentation and have a positive impact on the quality of the training set that can translate to model performance, though this gain is not uniform. Participants found the app simple to use, indicating that they could effectively train it and that the descriptors were useful. However, many found the training tedious, opening discussions around the need to balance information, time, and cognitive load.
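To make the descriptor idea concrete, here is a minimal sketch of three such checks (blur, "object too small", and "object cropped") using standard OpenCV calls; the thresholds and the source of the object bounding box are assumptions, not the MYCam implementation:

```python
# Minimal sketch of photo-level descriptors: blur, too-small, and cropped.
# Thresholds and the bounding-box source are assumptions, not MYCam's code.
import cv2

def is_blurred(image_bgr, threshold: float = 100.0) -> bool:
    """Classic blur heuristic: low variance of the Laplacian."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var() < threshold

def is_too_small(box, frame_w: int, frame_h: int,
                 min_fraction: float = 0.05) -> bool:
    """Flag the object if its bounding box covers too little of the frame.
    `box` is (x, y, w, h) from whatever detector supplies the object region."""
    x, y, w, h = box
    return (w * h) / (frame_w * frame_h) < min_fraction

def is_cropped(box, frame_w: int, frame_h: int) -> bool:
    """Flag the object if its bounding box touches the frame boundary."""
    x, y, w, h = box
    return x <= 0 or y <= 0 or x + w >= frame_w or y + h >= frame_h
```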
  2. Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet. Further, they are quantitatively accurate models of temporally-averaged responses of neurons in the primate brain's visual system. However, biological visual systems have two ubiquitous architectural features not shared with typical CNNs: local recurrence within cortical areas, and long-range feedback from downstream areas to upstream areas. Here we explored the role of recurrence in improving classification performance. We found that standard forms of recurrence (vanilla RNNs and LSTMs) do not perform well within deep CNNs on the ImageNet task. In contrast, novel cells that incorporated two structural features, bypassing and gating, were able to boost task accuracy substantially. We extended these design principles in an automated search over thousands of model architectures, which identified novel local recurrent cells and long-range feedback connections useful for object recognition. Moreover, these task-optimized ConvRNNs matched the dynamics of neural activity in the primate visual system better than feedforward networks, suggesting a role for the brain's recurrent connections in performing difficult visual behaviors. 
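An illustrative sketch of the two structural features named above, gating and bypassing, in a convolutional recurrent cell (PyTorch; a generic toy cell under our own assumptions, not the authors' searched architectures):

```python
# Toy convolutional recurrent cell with a gated state update and a bypass
# (skip) path; illustrative only, not the authors' ConvRNN cells.
import torch
import torch.nn as nn

class GatedBypassConvCell(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2
        self.gate = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.update = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, x: torch.Tensor, h: torch.Tensor):
        xh = torch.cat([x, h], dim=1)            # combine input and state
        g = torch.sigmoid(self.gate(xh))         # gating: how much to rewrite state
        candidate = torch.tanh(self.update(xh))  # candidate new state
        h_new = g * candidate + (1 - g) * h      # gated state update
        out = h_new + x                          # bypass: input skips the recurrence
        return out, h_new
```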
  3. In this paper, we design and evaluate a novel form of visually-simulated haptic feedback cue for communicating weight in robot teleoperation. We propose that a visuo-proprioceptive cue results from inconsistencies created between the user's visual and proprioceptive senses when the robot's movement differs from the movement of the user's input. In a user study where participants teleoperate a six-DoF robot arm, we demonstrate the feasibility of using such a cue for communicating weight in four telemanipulation tasks to enhance user experience and task performance. 
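A minimal sketch of the underlying idea (our own reading, with made-up gains; not the study's controller): render the robot's visible motion as a weight-dependent attenuation of the user's commanded motion, so vision and proprioception disagree in proportion to the simulated weight.

```python
# Sketch: attenuate the robot's rendered motion as simulated weight grows,
# creating a visuo-proprioceptive mismatch. All gains are assumptions.

def rendered_displacement(commanded_mm: float, weight_kg: float,
                          max_weight_kg: float = 5.0,
                          min_gain: float = 0.4) -> float:
    """Heavier simulated payloads shrink the visible motion per unit of
    user input; a weightless object tracks the input one-to-one."""
    w = min(max(weight_kg, 0.0), max_weight_kg) / max_weight_kg  # 0..1
    gain = 1.0 - w * (1.0 - min_gain)  # 1.0 at zero weight, min_gain at max
    return commanded_mm * gain
```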
  4. Visual qualitative methodologies enhance the richness of data and make participants experts on the object of interest. Visual data brings another dimension to the evaluation process, beyond surveys and interviews, as well as depth and breadth to participants' reactions to specific program activities. Visual data consist of images such as photos, drawings, and artwork. To explore a different approach to assessing the impact of an educational activity, an exercise was designed in which participants were asked to take photos documenting a site visit to an area impacted by a swarm of earthquakes in 2019. The exercise required taking five photos of objects, persons, scenery, structures, or anything else that captured their attention during the visit, and writing a reflective essay answering three questions: 1) How do these photos represent your site visit experience? 2) Based on the content of your photos, write about what you learned, discovered, new knowledge acquired, emotions, changes in your way of thinking, etc. 3) What did you learn or discover from doing this exercise? Twenty-two undergraduate engineering and architecture students from the RISE-UP Program, enrolled in a curricular sequence in design and construction of resilient and sustainable structures, completed the exercise. Analyses of the data include the frequency of captured images and a content analysis of the reflective essays to determine instances where each of the four proposed learning objectives was present. Results show that, across essays, 32% include text demonstrating impact related to the first objective, 59% for the second, 73% for the third, and 86% for the fourth. Forty-five percent of the essays included text considered relevant but not related to an objective. Personal, social, and career insights were categorized as unintended results. The photos taken by students represent what they considered relevant during the visit and also evidence the achievement of the proposed learning objectives. In general, three major categories emerged from the photo content: 1) photos related to the design and construction of structures and specific earthquake damage observed; 2) photos of classmates, professors, and group activities; and 3) other photos that do not share a theme. Both the photos and the essays demonstrate that the learning objectives were successfully achieved and encourage the use of visual data as an alternative for evaluating educational activities.
  5. Navigating unfamiliar websites is challenging for users with visual impairments. Although many websites offer visual cues to facilitate access to the pages/features most websites are expected to have (e.g., log in at the top right), such visual shortcuts are not accessible to users with visual impairments. Moreover, although such pages serve the same functionality across websites (e.g., to log in, to sign up), the location, wording, and navigation path of links to these pages vary from one website to another. Such inconsistencies are challenging for users with visual impairments, especially users of screen readers, who often need to listen linearly to a page's content to figure out how to access certain website features. To study how to improve access to main website features, we iteratively designed and tested a command-based approach via a browser extension powered by machine learning and human input. The browser extension gives users a way to access high-level website features (e.g., log in, find stores, contact) via keyboard commands. We tested the browser extension in a lab setting with 15 Internet users, including 9 with visual impairments and 6 without. Our study showed that commands for main website features can greatly improve the experience of users with visual impairments. People without visual impairments also found command-based access helpful when visiting unfamiliar, cluttered, or infrequently visited websites, suggesting that this approach can support users with visual impairments while also benefiting other user groups (i.e., universal design). Our study also revealed concerns about the handling of unsupported commands and the availability and trustworthiness of human input. We discuss how websites, browsers, and assistive technologies could incorporate a command-based paradigm to enhance web accessibility and provide more consistency on the web, benefiting users with varied abilities when navigating unfamiliar or complex websites.
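As a sketch of the command-to-page mapping idea (hypothetical: the extension pairs a machine learning model with human input and runs in the browser, whereas this stand-in uses simple keyword matching):

```python
# Hypothetical stand-in for the link classifier behind command-based access:
# map a link's text to a high-level website feature, if any. A trained text
# classifier plus human corrections would replace this keyword lookup.
from typing import Optional

COMMAND_KEYWORDS = {
    "log in": ("log in", "login", "sign in"),
    "sign up": ("sign up", "register", "create account"),
    "contact": ("contact", "support", "help center"),
    "find stores": ("store locator", "find a store", "our stores"),
}

def classify_link(link_text: str) -> Optional[str]:
    """Return the high-level command a link most likely serves, or None."""
    text = link_text.strip().lower()
    for command, keywords in COMMAND_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return command
    return None
```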