

Title: NoteWordy: Investigating Touch and Speech Input on Smartphones for Personal Data Capture
Speech as a natural and low-burden input modality has great potential to support personal data capture. However, little is known about how people use speech input, together with traditional touch input, to capture different types of data in self-tracking contexts. In this work, we designed and developed NoteWordy, a multimodal self-tracking application integrating touch and speech input, and deployed it in the context of productivity tracking for two weeks (N = 17). Our participants used the two input modalities differently, depending on the data type as well as personal preferences, error tolerance for speech recognition issues, and social surroundings. Additionally, we found speech input reduced participants' diary entry time and enhanced the data richness of the free-form text. Drawing from the findings, we discuss opportunities for supporting efficient personal data capture with multimodal input and implications for improving the user experience with natural language input to capture various self-tracking data.
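As a concrete illustration of the touch-plus-speech capture pattern described in this abstract, the following TypeScript sketch shows one way a single diary field could accept either modality, using the browser's Web Speech API. It is a minimal sketch only, not the authors' NoteWordy implementation; the names (DiaryEntry, onTouchInput, startDictation) are hypothetical.

```typescript
// Minimal sketch of touch-or-speech capture for one diary field.
// Not the NoteWordy implementation; all names here are illustrative only.
// Assumes a browser exposing the Web Speech API (SpeechRecognition /
// webkitSpeechRecognition); otherwise capture falls back to touch alone.

type EntryField = "task" | "duration" | "note";

interface DiaryEntry {
  task?: string;
  duration?: string;
  note?: string;
}

const entry: DiaryEntry = {};

// Touch path: an ordinary text field writes straight into the entry.
function onTouchInput(field: EntryField, value: string): void {
  entry[field] = value;
}

// Speech path: dictate into a chosen field; the transcript stays editable
// by touch, so recognition errors can be corrected afterwards.
function startDictation(field: EntryField): void {
  const Recognition =
    (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;
  if (!Recognition) {
    return; // no speech support: rely on the touch path only
  }

  const recognition = new Recognition();
  recognition.lang = "en-US";
  recognition.interimResults = false;

  recognition.onresult = (event: any) => {
    const transcript: string = event.results[0][0].transcript;
    entry[field] = transcript; // user can still revise via touch
  };

  recognition.start();
}
```

Keeping the dictated transcript editable by touch mirrors the abstract's observation that tolerance for speech recognition errors shaped which modality participants chose for a given field.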
Award ID(s):
1753452
NSF-PAR ID:
10394083
Date Published:
Journal Name:
Proceedings of the ACM on Human-Computer Interaction
Volume:
6
Issue:
ISS
ISSN:
2573-0142
Page Range / eLocation ID:
568 to 591
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge potential benefits, mobile visualization research in the personal data context is sparse. This work aims to empower people to easily navigate and compare their personal health data on smartphones by enabling flexible time manipulation with speech. We designed and developed Data@Hand, a mobile app that leverages the synergy of two complementary modalities: speech and touch. Through an exploratory study with 13 long-term Fitbit users, we examined how multimodal interaction helps participants explore their own health data. Participants successfully adopted multimodal interaction (i.e., speech and touch) for convenient and fluid data exploration. Based on the quantitative and qualitative findings, we discuss design implications and opportunities with multimodal interaction for better supporting visual data exploration on mobile devices. 
  2. The factors influencing people’s food decisions, such as one’s mood and eating environment, are important for fostering self-reflection and developing a personalized healthy diet. However, it is difficult to collect them consistently due to the heavy data capture burden. In this work, we examine how speech input supports capturing everyday food practice through a week-long data collection study (N = 11). We deployed FoodScrap, a speech-based food journaling app that allows people to capture food components, preparation methods, and food decisions. Using speech input, participants detailed their meal ingredients and elaborated on their food decisions by describing the eating moments, explaining their eating strategies, and assessing their food practices. Participants recognized that speech input facilitated self-reflection, but expressed concerns around re-recording, mental load, social constraints, and privacy. We discuss how speech input can support low-burden and reflective food journaling, and opportunities for effectively processing and presenting large amounts of speech data.
  3. Effective interactions between humans and robots are vital to achieving shared tasks in collaborative processes. Robots can utilize diverse communication channels to interact with humans, such as hearing, speech, sight, touch, and learning. Our focus, amidst the various means of interaction between humans and robots, is on three emerging frontiers that significantly impact the future directions of human–robot interaction (HRI): (i) human–robot collaboration inspired by human–human collaboration, (ii) brain-computer interfaces, and (iii) emotionally intelligent perception. First, we explore advanced techniques for human–robot collaboration, covering a range of methods from compliance and performance-based approaches to synergistic and learning-based strategies, including learning from demonstration, active learning, and learning from complex tasks. Then, we examine innovative uses of brain-computer interfaces for enhancing HRI, with a focus on applications in rehabilitation, communication, and brain state and emotion recognition. Finally, we investigate emotional intelligence in robotics, focusing on translating human emotions to robots via facial expressions, body gestures, and eye-tracking for fluid, natural interactions. Recent developments in these emerging frontiers and their impact on HRI are detailed and discussed. We highlight contemporary trends and emerging advancements in the field. Ultimately, this paper underscores the necessity of a multimodal approach in developing systems capable of adaptive behavior and effective interaction between humans and robots, thus offering a thorough understanding of the diverse modalities essential for maximizing the potential of HRI.
  4. In this paper, we present work on bringing multimodal interaction to Minecraft. The platform, Multicraft, incorporates speech-based input, eye tracking, and natural language understanding to facilitate more equitable gameplay in Minecraft. We tested the platform with elementary, middle school, and college students through a collection of studies. Students found each of the provided modalities to be a compelling way to play Minecraft. Additionally, we discuss the ways that these different types of multimodal data can be used to identify the meaningful spatial reasoning practices that students demonstrate while playing Minecraft. Collectively, this paper emphasizes the opportunity to bridge a multimodal interface with a means for collecting rich data that can better support diverse learners in non-traditional learning environments.
  5. Smart speakers such as Amazon Echo present promising opportunities for exploring voice interaction in the domain of in-home exercise tracking. In this work, we examine if and how voice interaction complements and augments a mobile app in promoting consistent exercise. We designed and developed TandemTrack, which combines a mobile app and an Alexa skill to support exercise regimens, data capture, feedback, and reminders. We then conducted a four-week between-subjects study deploying TandemTrack to 22 participants who were instructed to follow a short daily exercise regimen: one group used only the mobile app, and the other group used both the app and the skill. We collected rich data on individuals' exercise adherence and performance, and their use of voice and visual interactions, while examining how TandemTrack as a whole influenced their exercise experience. Reflecting on these data, we discuss the benefits and challenges of incorporating voice interaction to assist daily exercise, and implications for designing effective multimodal systems to support self-tracking and promote consistent exercise.