skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Friday, September 13 until 2:00 AM ET on Saturday, September 14 due to maintenance. We apologize for the inconvenience.


Title: Enhancing the Composition Task in Text Entry Studies: Eliciting Difficult Text and Improving Error Rate Calculation
Participants in text entry studies usually copy phrases or compose novel messages. A composition task mimics actual user behavior and can allow researchers to better understand how a system might perform in reality. A problem with composition is that participants may gravitate towards writing simple text, that is, text containing only common words. Such simple text is insufficient to explore all factors governing a text entry method, such as its error correction features. We contribute to enhancing composition tasks in two ways. First, we show participants can modulate the difficulty of their compositions based on simple instructions. While it took more time to compose difficult messages, they were longer, had more difficult words, and resulted in more use of error correction features. Second, we compare two methods for obtaining a participant’s intended text, comparing both methods with a previously proposed crowdsourced judging procedure. We found participant-supplied references were more accurate.  more » « less
Award ID(s):
1909248
NSF-PAR ID:
10237296
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 2021 CHI Conference on Human Factors in Computing Systems
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In many situations, it may be impractical or impossible to enter text by selecting precise locations on a physical or touchscreen keyboard. We present an ambiguous keyboard with four character groups that has potential applications for eyes-free text entry, as well as text entry using a single switch or a brain-computer interface.We develop a procedure for optimizing these character groupings based on a disambiguation algorithm that leverages a long-span language model. We produce both alphabetically-constrained and unconstrained character groups in an offline optimization experiment and compare them in a longitudinal user study. Our results did not show a significant difference between the constrained and unconstrained character groups after four hours of practice. As expected, participants had significantly more errors with the unconstrained groups in the first session, suggesting a higher barrier to learning the technique.We therefore recommend the alphabetically-constrained character groups, where participants were able to achieve an average entry rate of 12.0 words per minute with a 2.03% character error rate using a single hand and with no visual feedback. 
    more » « less
  2. Given that depression is one of the most prevalent mental illnesses, developing effective and unobtrusive diagnosis tools is of great importance. Recent work that screens for depression with text messages leverage models relying on lexical category features. Given the colloquial nature of text messages, the performance of these models may be limited by formal lexicons. We thus propose a strategy to automatically construct alternative lexicons that contain more relevant and colloquial terms. Specifically, we generate 36 lexicons from fiction, forum, and news corpuses. These lexicons are then used to extract lexical category features from the text messages. We utilize machine learning models to compare the depression screening capabilities of these lexical category features. Out of our 36 constructed lexicons, 14 achieved statistically significantly higher average F1 scores over the pre-existing formal lexicon and basic bag-of-words approach. In comparison to the pre-existing lexicon, our best performing lexicon increased the average F1 scores by 10%. We thus confirm our hypothesis that less formal lexicons can improve the performance of classification models that screen for depression with text messages. By providing our automatically constructed lexicons, we aid future machine learning research that leverages less formal text. 
    more » « less
  3. null (Ed.)
    Text entry makes up about one-fourth of the smartphone interaction events, and is known to be challenging and difficult. However, there has been little study about the characteristics of text entry in the context of smartphone app usage. In this paper, we present a mixed-method in-situ study conducted in 2016 with 17 active smartphone users to better understand text entry in smartphone app usage. Our results show 80% of text was entered into communication apps, with different apps exhibiting distinct usage patterns. We found that structured data such as URLs and email addresses are rarely typed but instead are auto-completed or replaced with search, copy-and-paste is rarely used, and sessions of smartphone usage with text entry involve more apps and last longer. We conclude with a discussion about the implications on the development of systems to better support mobile interaction. 
    more » « less
  4. As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text. 
    more » « less
  5. As text generated by large language models proliferates, it becomes vital to understand how humans engage with such text, and whether or not they are able to detect when the text they are reading did not originate with a human writer. Prior work on human detection of generated text focuses on the case where an entire passage is either human-written or machine-generated. In this paper, we study a more realistic setting where text begins as human-written and transitions to being generated by state-of-the-art neural language models. We show that, while annotators often struggle at this task, there is substantial variance in annotator skill and that given proper incentives, annotators can improve at this task over time. Furthermore, we conduct a detailed comparison study and analyze how a variety of variables (model size, decoding strategy, fine-tuning, prompt genre, etc.) affect human detection performance. Finally, we collect error annotations from our participants and use them to show that certain textual genres influence models to make different types of errors and that certain sentence-level features correlate highly with annotator selection. We release the RoFT dataset: a collection of over 21,000 human annotations paired with error classifications to encourage future work in human detection and evaluation of generated text. 
    more » « less