Title: A Dataset of Noisy Typing on QWERTY Keyboards
Text entry is a common and important part of many intelligent user interfaces. However, inferring a user’s intended text from their input can be challenging: motor actions can be imprecise, input sensors can be noisy, and situations or disabilities can hamper a user’s perception of interface feedback. Numerous prior studies have explored input on touchscreen phones, smartwatches, in midair, and on desktop keyboards. Based on these prior studies, we are releasing a large and diverse data set of noisy typing input consisting of thousands of sentences written by hundreds of users on QWERTY-layout keyboards. This paper describes the various subsets contained in this new research dataset as well as the data format.
Award ID(s):
1909248 1750193
NSF-PAR ID:
10404000
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Companion Proceedings of the 28th International Conference on Intelligent User Interfaces
Page Range / eLocation ID:
251 to 254
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Keystroke dynamics study the way in which users input text via their keyboards, which is unique to each individual, and can form a component of a behavioral biometric system to improve existing account security. Keystroke dynamics systems on free-text data use n-graphs that measure the timing between consecutive keystrokes to distinguish between users. Many algorithms require 500, 1,000, or more keystrokes to achieve EERs of below 10%. In this paper, we propose an instance-based graph comparison algorithm to reduce the number of keystrokes required to authenticate users. Commonly used features such as monographs and digraphs are investigated. Feature importance is determined and used to construct a fused classifier. Detection error tradeoff (DET) curves are produced with different numbers of keystrokes. The fused classifier outperforms the state-of-the-art with EERs of 7.9%, 5.7%, 3.4%, and 2.7% for test samples of 50, 100, 200, and 500 keystrokes. 
  2. Bottoni, Paolo ; Panizzi, Emanuele (Ed.)
    Many questions regarding single-hand text entry on modern smartphones (in particular, large-screen smartphones) remain under-explored, such as (i) do the prevailing single-handed keyboards suit large-screen smartphone users? and (ii) does individual customization improve single-handed keyboard performance? In this paper we study single-handed typing behaviors on several representative keyboards on large-screen mobile devices. We found that (i) the user-adaptable-shape curved keyboard performs best among all the studied keyboards; (ii) users’ familiarity with the Qwerty layout plays a significant role at the beginning, but after several sessions of training, the user-adaptable curved keyboard has the best learning curve and performs best; (iii) statistical decoding algorithms using spatial and language models can generally handle the input noise from single-handed typing well.
  3. Typing every character in a text message may require more time or effort than strictly necessary. Skipping spaces or other characters may be able to speed input and also reduce a user's physical input effort. This can be particularly important for people with motor impairments. In a large crowdsourced study, we found workers frequently abbreviated text by omitting mid-word vowels. We designed a recognizer optimized for noisy input where users often omit spaces and mid-word vowels. We show using neural language models for selecting training text and rescoring sentences improved accuracy. On noisy touchscreen data collected from hundreds of users, we found accurate abbreviated input was possible even if a third of characters were omitted. Finally, in a study where users had to dwell for a second on each key, sentence abbreviated input was competitive with a conventional keyboard with word predictions. After practice, users wrote abbreviated sentences at 9.6 words-per-minute versus word input at 9.9 words-per-minute.
  4. We present a method for mining the web for text entered on mobile devices. Using searching, crawling, and parsing techniques, we locate text that can be reliably identified as originating from 300 mobile devices. This includes 341,000 sentences written on iPhones alone. Our data enables a richer understanding of how users type “in the wild” on their mobile devices. We compare text and error characteristics of different device types, such as touchscreen phones, phones with physical keyboards, and tablet computers. Using our mined data, we train language models and evaluate these models on mobile test data. A mixture model trained on our mined data, Twitter, blog, and forum data predicts mobile text better than baseline models. Using phone and smartwatch typing data from 135 users, we demonstrate our models improve the recognition accuracy and word predictions of a state-of-the-art touchscreen virtual keyboard decoder. Finally, we make our language models and mined dataset available to other researchers.
  5. One longstanding complication with Earth data discovery involves understanding a user’s search intent from the input query. Most geospatial data portals use keyword-based matching to search data. Little attention has focused on the spatial and temporal information in a query or on understanding the query with an ontology. No research in the geospatial domain has investigated user queries in a systematic way. Here, we propose a query understanding framework and apply it to fill the gap by better interpreting a user’s search intent for Earth data search engines and adopting knowledge that was mined from metadata and user query logs. The proposed query understanding tool contains four components: spatial and temporal parsing; concept recognition; Named Entity Recognition (NER); and semantic query expansion. Spatial and temporal parsing detects the spatial bounding box and temporal range from a query. Concept recognition isolates clauses from free text and provides the search engine with phrases instead of a list of words. Named entity recognition detects entities in the query, which informs the search engine to query the entities detected. The semantic query expansion module expands the original query by adding, to phrases in the query, synonyms and acronyms that were discovered from Web usage data and metadata. The four modules interact to parse a user’s query from multiple perspectives, with the goal of understanding the user’s search intent for data. As a proof-of-concept, the framework is applied to oceanographic data discovery. It is demonstrated that the proposed framework accurately captures a user’s intent.