This content will become publicly available on January 1, 2025
- Award ID(s): 1943072
- PAR ID: 10503737
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Communications of the ACM
- Volume: 67
- Issue: 1
- ISSN: 0001-0782
- Page Range / eLocation ID: 123 to 131
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Augmentative and alternative communication (AAC) devices enable speech-based communication. However, AAC devices do not support nonverbal communication, which allows people to take turns, regulate conversation dynamics, and express intentions. Nonverbal communication requires motion, which is often challenging for AAC users to produce due to motor constraints. In this work, we explore how socially assistive robots, framed as "sidekicks," might provide augmented communicators (ACs) with a nonverbal channel of communication to support their conversational goals. We developed and conducted an accessible co-design workshop that involved two ACs, their caregivers, and three motion experts. We identified goals for conversational support, co-designed prototypes depicting possible sidekick forms, and enacted different sidekick motions and behaviors to achieve speakers' goals. We contribute guidelines for designing sidekicks that support ACs according to three key parameters: attention, precision, and timing. We show how these parameters manifest in appearance and behavior and how they can guide future designs for augmented nonverbal communication.
-
Making good letter or word predictions can help accelerate the communication of users of high-tech AAC devices. This is particularly important for real-time person-to-person conversations. We investigate whether performing speech recognition on the speaking side of a conversation can improve language-model-based predictions. We compare the accuracy of three plausible microphone deployment options and the accuracy of two commercial speech recognition engines (Google and IBM Watson). We found that despite recognition word error rates of 7-16%, our ensemble of N-gram and recurrent neural network language models made predictions nearly as good as when they used the reference transcripts.
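The ensemble described above combines predictions from multiple language models conditioned on the conversation context. Below is a minimal sketch of such an interpolation for next-word prediction; the bigram and unigram models, the toy transcript, and the weight `lam` are illustrative stand-ins, not the paper's actual N-gram and RNN models or tuned weights.

```python
# Minimal sketch of interpolating two language models for next-word
# prediction. The second model here is a unigram stand-in (kept simple
# so the example is self-contained); the paper uses an RNN instead.
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count next-word frequencies for each preceding word."""
    bigrams = defaultdict(Counter)
    for sent in corpus:
        for prev, nxt in zip(sent, sent[1:]):
            bigrams[prev][nxt] += 1
    return bigrams

def bigram_probs(bigrams, prev):
    counts = bigrams.get(prev, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def unigram_probs(corpus):
    counts = Counter(w for sent in corpus for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def ensemble_predict(prev_word, bigrams, unigrams, lam=0.7, k=3):
    """Linearly interpolate the two models; return the top-k candidates."""
    bi = bigram_probs(bigrams, prev_word)
    vocab = set(unigrams) | set(bi)
    scores = {w: lam * bi.get(w, 0.0) + (1 - lam) * unigrams.get(w, 0.0)
              for w in vocab}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy transcript standing in for the speaking-side ASR output.
corpus = [["how", "are", "you", "today"],
          ["are", "you", "coming", "today"],
          ["you", "are", "welcome"]]
bigrams = train_bigram(corpus)
unigrams = unigram_probs(corpus)
print(ensemble_predict("are", bigrams, unigrams))  # e.g. ['you', ...]
```

In practice the interpolation weight would be tuned on held-out conversational data, and the context would extend beyond the single preceding word.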
-
Losing the ability to communicate inhibits social contact, creates feelings of frustration and isolation, and complicates personal comfort and medical care. Progressive diseases such as amyotrophic lateral sclerosis (ALS) and multiple sclerosis (MS) can cause severe motor disabilities that make communication through traditional means difficult, slow, and exhausting, even with the support of augmentative and alternative communication (AAC) systems. Using a design science research approach, we seek to improve the communication process for individuals with severe motor disabilities. We develop a series of design requirements to inform the creation and evaluation of an artefact, an AAC system that incorporates context-aware user profiles to improve the communication process for individuals with severe motor disabilities. We derive prescriptive knowledge through the creation of design principles based on our findings and justify these design principles using the lens of media synchronicity theory (MST). This research identifies opportunities for further research related to MST and provides insights to inform those designing communication systems for individuals who rely on AAC systems.
-
Humans convey their intentions through both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to consider not only the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN), which models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations.
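The core operation summarized above is shifting a word's embedding by a vector computed from co-occurring nonverbal features. The sketch below illustrates that idea with NumPy; the dimensions, random projection matrices, and fixed norm-ratio gate are illustrative assumptions rather than the published RAVEN architecture.

```python
# Minimal sketch of shifting a word embedding with a nonverbal context
# vector, in the spirit of the model described above. All weights here
# are random placeholders; RAVEN learns them end to end.
import numpy as np

rng = np.random.default_rng(0)
d_word, d_visual, d_acoustic = 8, 4, 4

# Illustrative inputs: one word embedding plus pooled (attended)
# visual and acoustic features for the same word segment.
word = rng.normal(size=d_word)
visual = rng.normal(size=d_visual)
acoustic = rng.normal(size=d_acoustic)

# Random projections stand in for learned weight matrices.
W_v = rng.normal(size=(d_word, d_visual))
W_a = rng.normal(size=(d_word, d_acoustic))

# Nonverbal shift vector: map both modalities into the word space.
shift = W_v @ visual + W_a @ acoustic

# Scale the shift so it perturbs rather than overwhelms the word vector
# (the paper learns a gate; a fixed norm ratio is used here for brevity).
alpha = min(1.0, np.linalg.norm(word) / np.linalg.norm(shift))
shifted_word = word + alpha * shift

print(shifted_word.round(3))
```

The gating step matters: without it, noisy nonverbal features could dominate the lexical signal instead of modulating it.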
-
For nearly 25 years, researchers have recognized the rich and numerous facets of native perception of non-native speech, driving a large and growing body of work that has shed light on how native listeners understand non-native speech. The bulk of this work, however, has focused on the talker. That is, most researchers have asked what perception of non-native speech tells us about the non-native speaker, or about interacting with non-native speakers more generally. It is clear that listeners perceive speech not only in terms of the acoustic signal, but also with their own experience and biases driving their perception. It is also clear that native listeners can improve their perception of non-native speech for both familiar and unfamiliar accents. Therefore, it is imperative that research in non-native communication also consider an active role for the listener. To truly understand communication between native and non-native speakers, it is critically important to understand both the properties of non-native speech and how this speech is perceived. In the present review, we describe non-native speech and then review previous research, examining the methodological shift from using native listeners as tools to understand properties of non-native speech to understanding listeners as partners in conversation. We discuss how current models not only limit our understanding of non-native speech, but also limit what types of questions researchers set out to answer. We demonstrate that while non-native speakers are capable of shifting their productions to be better understood by listeners, native listeners are also capable of shifting their perception to more accurately perceive non-native speech. We conclude by setting forth a series of recommendations for future research, emphasizing the contributions of native listeners and non-native speakers as equally important for communicative success.