Most mobile health apps employ data visualization to help people view their health and activity data, but these apps provide limited support for visual data exploration. Furthermore, despite its huge potential benefits, mobile visualization research in the personal data context is sparse. This work aims to empower people to easily navigate and compare their personal health data on smartphones by enabling flexible time manipulation with speech. We designed and developed Data@Hand, a mobile app that leverages the synergy of two complementary modalities: speech and touch. Through an exploratory study with 13 long-term Fitbit users, we examined how multimodal interaction helps participants explore their own health data. Participants successfully adopted multimodal interaction (i.e., speech and touch) for convenient and fluid data exploration. Based on the quantitative and qualitative findings, we discuss design implications and opportunities with multimodal interaction for better supporting visual data exploration on mobile devices.
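As a rough, hypothetical illustration (not Data@Hand's actual implementation, which the abstract does not detail), the Python sketch below shows how a few spoken time expressions could be resolved to concrete date ranges that a touch-driven chart might then display; the function name and the small set of supported phrases are assumptions made for this example.

```python
from datetime import date, timedelta

def spoken_phrase_to_range(phrase: str, today: date) -> tuple[date, date]:
    """Map a small set of spoken time expressions to (start, end) date ranges.

    Hypothetical sketch: a real speech-driven time grammar would be far richer
    (explicit month names, comparisons such as "this month vs. last month", etc.).
    """
    phrase = phrase.lower().strip()
    if phrase == "last week":
        # Assuming Monday-start weeks: the full week before the current one.
        start_of_this_week = today - timedelta(days=today.weekday())
        return (start_of_this_week - timedelta(days=7),
                start_of_this_week - timedelta(days=1))
    if phrase == "this month":
        return today.replace(day=1), today
    if phrase == "last month":
        first_of_this_month = today.replace(day=1)
        last_of_prev_month = first_of_this_month - timedelta(days=1)
        return last_of_prev_month.replace(day=1), last_of_prev_month
    raise ValueError(f"Unrecognized time expression: {phrase!r}")

if __name__ == "__main__":
    today = date(2020, 9, 15)
    for phrase in ["last week", "this month", "last month"]:
        print(phrase, "->", spoken_phrase_to_range(phrase, today))
```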
NoteWordy: Investigating Touch and Speech Input on Smartphones for Personal Data Capture
Speech as a natural and low-burden input modality has great potential to support personal data capture. However, little is known about how people use speech input, together with traditional touch input, to capture different types of data in self-tracking contexts. In this work, we designed and developed NoteWordy, a multimodal self-tracking application integrating touch and speech input, and deployed it in the context of productivity tracking for two weeks (N = 17). Our participants used the two input modalities differently, depending on the data type as well as personal preferences, error tolerance for speech recognition issues, and social surroundings. Additionally, we found speech input reduced participants' diary entry time and enhanced the data richness of the free-form text. Drawing from the findings, we discuss opportunities for supporting efficient personal data capture with multimodal input and implications for improving the user experience with natural language input to capture various self-tracking data.
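To illustrate the idea of capturing different data types with different modalities (a hypothetical sketch; NoteWordy's actual data schema is not specified in the abstract), the Python snippet below models a diary entry whose fields can be filled by either speech or touch, while recording which modality supplied each value.

```python
from dataclasses import dataclass, field

@dataclass
class DiaryEntry:
    """A hypothetical productivity-diary entry for mixed-modality capture.

    Field names are illustrative only. Each assignment records the modality
    that supplied the value, so per-field modality preferences can be analyzed.
    """
    activity: str                      # short categorical value, e.g. "writing"
    duration_minutes: int              # numeric value, easy to enter via touch
    notes: str = ""                    # free-form text, where speech tends to help
    modality_by_field: dict = field(default_factory=dict)

    def set_from_speech(self, name: str, value) -> None:
        setattr(self, name, value)
        self.modality_by_field[name] = "speech"

    def set_from_touch(self, name: str, value) -> None:
        setattr(self, name, value)
        self.modality_by_field[name] = "touch"

if __name__ == "__main__":
    entry = DiaryEntry(activity="writing", duration_minutes=0)
    entry.set_from_touch("duration_minutes", 45)
    entry.set_from_speech("notes", "Drafted the related-work section today.")
    print(entry.modality_by_field)  # {'duration_minutes': 'touch', 'notes': 'speech'}
```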
- Award ID(s): 1753452
- PAR ID: 10394083
- Date Published:
- Journal Name: Proceedings of the ACM on Human-Computer Interaction
- Volume: 6
- Issue: ISS
- ISSN: 2573-0142
- Page Range / eLocation ID: 568 to 591
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
The factors influencing people's food decisions, such as one's mood and eating environment, are important information for fostering self-reflection and developing a personalized healthy diet. However, they are difficult to collect consistently due to the heavy data capture burden. In this work, we examine how speech input supports capturing everyday food practice through a week-long data collection study (N = 11). We deployed FoodScrap, a speech-based food journaling app that allows people to capture food components, preparation methods, and food decisions. Using speech input, participants detailed their meal ingredients and elaborated on their food decisions by describing the eating moments, explaining their eating strategies, and assessing their food practice. Participants recognized that speech input facilitated self-reflection, but expressed concerns around re-recording, mental load, social constraints, and privacy. We discuss how speech input can support low-burden and reflective food journaling, as well as opportunities for effectively processing and presenting large amounts of speech data.
In this paper, we present work on bringing multimodal interaction to Minecraft. The platform, Multicraft, incorporates speech-based input, eye tracking, and natural language understanding to facilitate more equitable gameplay in Minecraft. We tested the platform with elementary school, middle school, and college students through a collection of studies. Students found each of the provided modalities to be a compelling way to play Minecraft. Additionally, we discuss the ways that these different types of multimodal data can be used to identify the meaningful spatial reasoning practices that students demonstrate while playing Minecraft. Collectively, this paper emphasizes the opportunity to bridge a multimodal interface with a means for collecting rich data that can better support diverse learners in non-traditional learning environments.
This research establishes a better understanding of the syntax choices in speech interactions and of how speech, gesture, and multimodal gesture-and-speech interactions are produced by users in unconstrained object manipulation environments using augmented reality. The work presents a multimodal elicitation study conducted with 24 participants. The canonical referents for translation, rotation, and scale were used along with some abstract referents (create, destroy, and select). In this study, time windows for gesture and speech multimodal interactions are developed using the start and stop times of gestures and speech, as well as the stroke times for gestures. While gestures commonly precede speech by 81 ms, we find that the stroke of the gesture commonly falls within 10 ms of the start of speech, indicating that the information content of a gesture and its co-occurring speech are well aligned with each other. Lastly, we examine the trends across the most common proposals for each modality, showing that disagreement between proposals is often caused by variation in hand posture or syntax. This allows us to present aliasing recommendations to increase the percentage of users' natural interactions captured by future multimodal interactive systems.
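To make the timing analysis concrete, here is a minimal, hypothetical Python sketch of how gesture-speech alignment could be computed from per-trial timestamps; the Trial fields and the toy values are assumptions for illustration, not the study's data or code.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Trial:
    """Timestamps (in milliseconds) for one multimodal proposal (hypothetical fields)."""
    gesture_start: float
    gesture_stroke: float   # the expressive peak of the gesture
    gesture_end: float
    speech_start: float
    speech_end: float

def onset_lead(trial: Trial) -> float:
    """How far the gesture onset precedes speech onset (positive = gesture first)."""
    return trial.speech_start - trial.gesture_start

def stroke_offset(trial: Trial) -> float:
    """Signed gap between the gesture stroke and the start of speech."""
    return trial.speech_start - trial.gesture_stroke

if __name__ == "__main__":
    # Toy data roughly consistent with the reported trends: gesture onsets lead
    # speech by tens of milliseconds, and strokes land near the start of speech.
    trials = [
        Trial(0, 78, 600, 81, 900),
        Trial(0, 95, 550, 90, 820),
        Trial(0, 60, 700, 72, 1000),
    ]
    print("median onset lead (ms):", median(onset_lead(t) for t in trials))
    print("median stroke offset (ms):", median(stroke_offset(t) for t in trials))
```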
Previous research suggests that individuals with weaker receptive language show increased reliance on lexical information for speech perception relative to individuals with stronger receptive language, which may reflect a difference in how acoustic-phonetic and lexical cues are weighted for speech processing. Here we examined whether this relationship is the consequence of conflict between acoustic-phonetic and lexical cues in speech input, which has been found to mediate lexical reliance in sentential contexts. Two groups of participants completed standardized measures of language ability and a phonetic identification task to assess lexical recruitment (i.e., a Ganong task). In the high conflict group, the stimulus input distribution removed natural correlations between acoustic-phonetic and lexical cues, thus placing the two cues in high competition with each other; in the low conflict group, these correlations were present and thus competition was reduced as in natural speech. The results showed that 1) the Ganong effect was larger in the low compared to the high conflict condition in single-word contexts, suggesting that cue conflict dynamically influences online speech perception, 2) the Ganong effect was larger for those with weaker compared to stronger receptive language, and 3) the relationship between the Ganong effect and receptive language was not mediated by the degree to which acoustic-phonetic and lexical cues conflicted in the input. These results suggest that listeners with weaker language ability down-weight acoustic-phonetic cues and rely more heavily on lexical knowledge, even when stimulus input distributions reflect characteristics of natural speech input.
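For readers unfamiliar with the measure, the following hypothetical Python sketch shows one common way a Ganong effect can be quantified: as the difference in the proportion of word-consistent responses between two lexically biasing contexts. The continuum, contexts, and response data shown are assumed for illustration and are not taken from this study.

```python
# Assumed setup (not from the paper): listeners hear a /g/-/k/ voice-onset-time
# continuum spliced into two lexical contexts, "_ift" (where /g/ makes the word
# "gift") and "_iss" (where /k/ makes the word "kiss"), and respond "g" or "k".

def proportion_g(responses):
    """Proportion of trials identified as /g/."""
    return sum(r == "g" for r in responses) / len(responses)

def ganong_effect(responses_ift, responses_iss):
    """Lexical shift: more /g/ responses expected in the "gift"-biasing context.

    A larger positive value indicates heavier reliance on lexical knowledge.
    """
    return proportion_g(responses_ift) - proportion_g(responses_iss)

if __name__ == "__main__":
    # Toy responses for a single ambiguous continuum step.
    ift_context = ["g", "g", "g", "k", "g", "g", "k", "g"]   # "gift" bias
    iss_context = ["k", "k", "g", "k", "k", "g", "k", "k"]   # "kiss" bias
    print("Ganong effect:", round(ganong_effect(ift_context, iss_context), 2))
```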