This content will become publicly available on June 25, 2026

Title: Identifying the Desired Word Suggestion in Simultaneous Audio
We explore a method for presenting word suggestions for non-visual text input using simultaneous voices. We conduct two perceptual studies and investigate the impact of different presentations of voices on a user's ability to detect which voice, if any, spoke their desired word. Our sets of words simulated the word suggestions of a predictive keyboard during real-world text input. We find that when voices are simultaneous, user accuracy decreases significantly with each added word suggestion. However, adding a slight 0.15s delay between the start of each subsequent word allows two simultaneous words to be presented with no significant decrease in accuracy compared to presenting two words sequentially (84% simultaneous versus 86% sequential). This allows two word suggestions to be presented to the user 32% faster than sequential playback without decreasing accuracy.
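The timing trade-off described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation. The function names and the 0.5 s word durations are hypothetical, and the sketch assumes that word i starts i × 0.15 s after the first word. The saving it computes depends entirely on the assumed durations and is not meant to reproduce the 32% figure reported above.

```python
def sequential_duration(durations):
    """Total playback time when each word starts after the previous one ends."""
    if not durations:
        raise ValueError("need at least one word duration")
    return sum(durations)


def staggered_duration(durations, onset_delay=0.15):
    """Total playback time when word i starts i * onset_delay seconds
    after the first word, so the words overlap (assumed playback model)."""
    if not durations:
        raise ValueError("need at least one word duration")
    return max(i * onset_delay + d for i, d in enumerate(durations))


def time_saving(durations, onset_delay=0.15):
    """Fraction of sequential playback time saved by staggered playback."""
    seq = sequential_duration(durations)
    stag = staggered_duration(durations, onset_delay)
    return 1.0 - stag / seq


if __name__ == "__main__":
    # Hypothetical spoken durations (seconds) for two word suggestions.
    words = [0.5, 0.5]
    print(f"sequential: {sequential_duration(words):.2f} s")
    print(f"staggered:  {staggered_duration(words):.2f} s")
    print(f"saving:     {time_saving(words):.0%}")
```

Under this model, the saving shrinks as more words are added or as the onset delay grows, because the last word's start time sets a floor on the total duration.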
Award ID(s):
1909248
PAR ID:
10628759
Author(s) / Creator(s):
;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400714023
Page Range / eLocation ID:
154 to 160
Format(s):
Medium: X
Location:
Corfu Island Greece
Sponsoring Org:
National Science Foundation
More Like this
  1. Typing every character in a text message may require more time or effort than strictly necessary. Skipping spaces or other characters may be able to speed input and also reduce a user's physical input effort. This can be particularly important for people with motor impairments. In a large crowdsourced study, we found workers frequently abbreviated text by omitting mid-word vowels. We designed a recognizer optimized for noisy input where users often omit spaces and mid-word vowels. We show using neural language models for selecting training text and rescoring sentences improved accuracy. On noisy touchscreen data collected from hundreds of users, we found accurate abbreviated input was possible even if a third of characters were omitted. Finally, in a study where users had to dwell for a second on each key, sentence abbreviated input was competitive with a conventional keyboard with word predictions. After practice, users wrote abbreviated sentences at 9.6 words-per-minute versus word input at 9.9 words-per-minute.
  2. Text input on mobile devices without physical keys can be challenging for people who are blind or low-vision. We interview 12 blind adults about their experiences with current mobile text input to provide insights into what sorts of interface improvements may be the most beneficial. We identify three primary themes that were experiences or opinions shared by participants: the poor accuracy of dictation, difficulty entering text in noisy environments, and difficulty correcting errors in entered text. We also discuss an experimental non-visual text input method with each participant to solicit opinions on the method and probe their willingness to learn a novel method. We find that the largest concern was the time required to learn a new technique. We find that the majority of our participants do not use word predictions while typing but instead find it faster to finish typing words manually. Finally, we distill five future directions for non-visual text input: improved dictation, less reliance on or improved audio feedback, improved error correction, reducing the barrier to entry for new methods, and more fluid non-visual word predictions. 
  3. Text correction on mobile devices usually requires precise and repetitive manual control. In this paper, we present EyeSayCorrect, an eye-gaze- and voice-based hands-free text correction method for mobile devices. To correct text with EyeSayCorrect, the user first uses the gaze location on the screen to select a word, then speaks the new phrase. EyeSayCorrect then infers the user’s correction intention based on the inputs and the text context. We used a Bayesian approach for determining the selected word given an eye-gaze trajectory. For each sampling point in an eye-gaze trajectory, the posterior probability of selecting a word is calculated and accumulated. The target word is selected when its accumulated interest exceeds a threshold. Misspelt words are given higher priors. Our user studies showed that using priors for misspelt words reduced the task completion time by up to 23.79% and the text selection time by up to 40.35%, and that EyeSayCorrect is a feasible hands-free text correction method on mobile devices.
  4. Pre-trained language models (PLMs) aim to learn universal language representations by conducting self-supervised training tasks on large-scale corpora. Since PLMs capture word semantics in different contexts, the quality of word representations highly depends on word frequency, which usually follows a heavy-tailed distribution in the pre-training corpus. Therefore, the embeddings of rare words on the tail are usually poorly optimized. In this work, we focus on enhancing language model pre-training by leveraging definitions of the rare words in dictionaries (e.g., Wiktionary). To incorporate a rare word definition as a part of input, we fetch its definition from the dictionary and append it to the end of the input text sequence. In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word and sentence-level alignment between input text sequence and rare word definitions to enhance language modeling representation with a dictionary. We evaluate the proposed Dict-BERT model on the language understanding benchmark GLUE and eight specialized domain benchmark datasets. Extensive experiments demonstrate that Dict-BERT can significantly improve the understanding of rare words and boost model performance on various NLP downstream tasks.
  5. State-of-the-art text spotting systems typically aim to detect isolated words or word-by-word text in images of natural scenes and ignore the semantic coherence within a region of text. However, when interpreted together, seemingly isolated words may be easier to recognize. On this basis, we propose a novel "semantic-based text recognition" (STR) deep learning model that reads text in images with the help of understanding context. STR consists of several modules. We introduce the Text Grouping and Arranging (TGA) algorithm to connect and order isolated text regions. A text-recognition network interprets isolated words. Benefiting from semantic information, a sequence-to-sequence network model efficiently corrects inaccurate and uncertain phrases produced earlier in the STR pipeline. We present experiments on two new distinct datasets that contain scanned catalog images of interior designs and photographs of protesters with hand-written signs, respectively. Our results show that our STR model outperforms a baseline method that uses state-of-the-art single-word recognition techniques on both datasets. STR yields a high accuracy rate of 90% on the catalog images and 71% on the more difficult protest images, suggesting its generality in recognizing text. 