

Title: Nonverbal Communication Cue Recognition: A Pathway to More Accessible Communication
Nonverbal communication, such as body language, facial expressions, and hand gestures, is crucial to human communication, as it often conveys more about emotions and attitudes than spoken words. However, individuals who are blind or have low vision (BLV) may not have access to this channel of communication, leading to asymmetry in conversations. Developing systems that recognize nonverbal communication cues (NVCs) for the BLV community would enhance communication and understanding for both parties. This paper focuses on developing a multimodal computer vision system to recognize and detect NVCs. Toward that objective, we are collecting a dataset focused on nonverbal communication cues. Here, we propose a baseline model for recognizing NVCs and present initial results on the Aff-Wild2 dataset. Our baseline model achieved an accuracy of 68% and an F1-score of 64% on the Aff-Wild2 validation set, comparable with previous state-of-the-art results. Finally, we discuss the challenges associated with NVC recognition as well as the limitations of our current work.
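The abstract reports accuracy and F1-score on the Aff-Wild2 validation set. As a minimal sketch of how those two metrics are computed for a multi-class cue classifier, the snippet below implements accuracy and macro-averaged F1 from scratch; the cue labels and predictions are hypothetical illustrations, not data from the paper.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed classes."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Hypothetical cue labels for illustration only.
y_true = ["smile", "nod", "smile", "shrug", "nod", "smile"]
y_pred = ["smile", "nod", "nod",   "shrug", "nod", "smile"]
print(accuracy(y_true, y_pred))           # 5 of 6 correct
print(round(macro_f1(y_true, y_pred), 3)) # mean of per-class F1 scores
```

Macro averaging treats each cue class equally, which matters when some cues (e.g. a subtle nod) are much rarer than others in the dataset.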
Award ID(s):
2041307
NSF-PAR ID:
10428814
Author(s) / Creator(s):
Date Published:
Journal Name:
In proceedings of Women in Computer Vision Workshop in conjunction with IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Blind and low-vision (BLV) people watch sports through radio broadcasts that offer a play-by-play description of the game. However, recent trends show a decline in the availability and quality of radio broadcasts, driven by the rise of internet video streaming platforms and the cost of hiring professional announcers. As a result, sports broadcasts have become even more inaccessible to BLV people. In this work, we present Immersive A/V, a technique for making sports broadcasts (in our case, tennis broadcasts) accessible and immersive to BLV viewers by automatically extracting gameplay information and conveying it through an added layer of spatialized audio cues. Immersive A/V conveys players' positions and actions as detected by computer vision-based video analysis, allowing BLV viewers to visualize the action. We designed Immersive A/V based on results from a formative study with BLV participants. We conclude by outlining our plans for evaluating Immersive A/V and the future implications of this research.
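One simple way to spatialize a detected court position, as described above, is to map the player's horizontal position to stereo panning gains. The sketch below uses constant-power panning; it is an illustration of the general idea of spatialized audio cues, not the Immersive A/V implementation.

```python
import math

def pan_gains(x_norm):
    """Constant-power stereo panning for a normalized horizontal position.

    x_norm is in [0, 1], where 0 is the far left of the court and 1 the far
    right. Returns (left_gain, right_gain); squared gains always sum to 1,
    so perceived loudness stays constant as the cue moves across the field.
    """
    theta = x_norm * math.pi / 2
    return math.cos(theta), math.sin(theta)

# A player detected at the left edge, center, and right edge of the frame.
for x in (0.0, 0.5, 1.0):
    left, right = pan_gains(x)
    print(f"x={x:.1f}  L={left:.3f}  R={right:.3f}")
```

A real system would feed these gains into an audio engine (or use platform spatializers such as Web Audio's `StereoPannerNode`) and update them per video frame.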
  2. Augmentative and alternative communication (AAC) devices enable speech-based communication, but generating speech is not the only resource needed for a successful conversation. Being able to signal that one wishes to take a turn, by raising a hand or providing some other cue, is critical to securing a turn to speak. Experienced conversation partners know how to recognize the nonverbal communication an augmented communicator (AC) displays, but these same nonverbal gestures can be hard to interpret by people who meet an AC for the first time. Prior work has identified motion, through robots and expressive objects, as a modality that can support communication. In this work, we collaborate closely with an AAC user to understand how motion through a physical expressive object can support their communication. We present our co-design process and the lessons learned about both the designed object and the process itself.
  3. Humans convey their intentions through the usage of both verbal and nonverbal behaviors during face-to-face communication. Speaker intentions often vary dynamically depending on different nonverbal contexts, such as vocal patterns and facial expressions. As a result, when modeling human language, it is essential to not only consider the literal meaning of the words but also the nonverbal contexts in which these words appear. To better model human language, we first model expressive nonverbal representations by analyzing the fine-grained visual and acoustic patterns that occur during word segments. In addition, we seek to capture the dynamic nature of nonverbal intents by shifting word representations based on the accompanying nonverbal behaviors. To this end, we propose the Recurrent Attended Variation Embedding Network (RAVEN) that models the fine-grained structure of nonverbal subword sequences and dynamically shifts word representations based on nonverbal cues. Our proposed model achieves competitive performance on two publicly available datasets for multimodal sentiment analysis and emotion recognition. We also visualize the shifted word representations in different nonverbal contexts and summarize common patterns regarding multimodal variations of word representations. 
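The core idea above, shifting a word representation based on accompanying nonverbal behaviors, can be sketched as follows. This is a simplified illustration of the shift concept, not the RAVEN architecture itself: the gating matrices `W_v` and `W_a` and the norm-preserving rescaling are assumptions made for the sketch.

```python
import numpy as np

def shift_word_representation(word_vec, visual_vec, acoustic_vec, W_v, W_a):
    """Shift a word embedding toward its nonverbal context.

    Sigmoid gates, conditioned on the word together with each nonverbal
    modality, decide how much of the visual and acoustic vectors to mix in.
    The shift is rescaled to the norm of the original word vector so the
    nonverbal context nudges, rather than overwhelms, the word meaning.
    """
    gate_v = 1.0 / (1.0 + np.exp(-(W_v @ np.concatenate([word_vec, visual_vec]))))
    gate_a = 1.0 / (1.0 + np.exp(-(W_a @ np.concatenate([word_vec, acoustic_vec]))))
    shift = gate_v * visual_vec + gate_a * acoustic_vec
    scale = np.linalg.norm(word_vec) / (np.linalg.norm(shift) + 1e-8)
    return word_vec + scale * shift

# Toy dimensions and random vectors, purely for illustration.
d = 8
rng = np.random.default_rng(0)
word = rng.normal(size=d)
visual = rng.normal(size=d)     # e.g. facial-expression features
acoustic = rng.normal(size=d)   # e.g. vocal-pattern features
W_v = rng.normal(size=(d, 2 * d)) * 0.1
W_a = rng.normal(size=(d, 2 * d)) * 0.1
shifted = shift_word_representation(word, visual, acoustic, W_v, W_a)
```

In the full model these gates would be learned end-to-end, and the nonverbal vectors would come from encoders over fine-grained subword-aligned visual and acoustic sequences.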
  4. In this paper, we propose a novel method for underwater robot-to-human communication using the motion of the robot as “body language”. To evaluate this system, we develop simulated examples of the system's body language gestures, called kinemes, and compare them to a baseline system using flashing colored lights through a user study. Our work shows evidence that motion can be used as a successful communication vector which is accurate, easy to learn, and quick enough to be used, all without requiring any additional hardware to be added to our platform. We thus contribute to “closing the loop” for human-robot interaction underwater by proposing and testing this system, suggesting a library of possible body language gestures for underwater robots, and offering insight on the design of nonverbal robot-to-human communication methods. 
  5. A person's appearance, identity, and other nonverbal cues can substantially influence how one is perceived by a negotiation counterpart, potentially impacting the outcome of the negotiation. With recent advances in technology, it is now possible to alter such cues through real-time video communication. In many cases, a person's physical presence can explicitly be replaced by 2D/3D representations in live interactive media. In other cases, technologies such as deepfakes can subtly and implicitly alter many nonverbal cues, including a person's appearance and identity, in real time. In this article, we look at some state-of-the-art technological advances that can enable such explicit and implicit alterations of nonverbal cues. We also discuss the implications of such technology for the negotiation landscape and highlight ethical considerations that warrant deep, ongoing attention from stakeholders.

     