


Search for: All records where Creators/Authors contains: "Vaidyanathan, Preethi"

Note: Clicking a Digital Object Identifier (DOI) number will take you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo period.

Some links on this page may lead to non-federal websites, whose policies may differ from this site's.

  1. We present a visualization system capable of displaying gaze and speech data elicited from pairs of subjects interacting in a discussion. We collected such conversation data in a first experiment, in which two participants were tasked with reaching a consensus on questions involving images. We validated the system in a second experiment that tested whether a person could determine which question had elicited a given visualization. The visualization system allows users to explore reasoning behavior and participation during multimodal dialogue interactions.
  2. Humans routinely extract important information from images and videos, relying on their gaze. In contrast, computational systems still have difficulty annotating important visual information in a human-like manner, in part because human gaze is often not included in the modeling process. Human input is also particularly relevant for processing and interpreting affective visual information. To address this challenge, we captured human gaze, spoken language, and facial expressions simultaneously in an experiment with visual stimuli characterized by subjective and affective content. Observers described the content of complex emotional images and videos depicting positive and negative scenarios, as well as their feelings about the imagery being viewed. We explore patterns across these modalities, for example by comparing the affective nature of participant-elicited linguistic tokens with image valence. Additionally, we expand a framework for generating automatic alignments between the gaze and spoken language modalities for visual annotation of images. Multimodal alignment is challenging because of the varying temporal offset between the modalities. We explore alignment robustness when images have affective content and whether image valence influences alignment results. We also study whether word frequency-based filtering impacts results: both the unfiltered and filtered scenarios performed better than baseline comparisons, and filtering substantially decreased the alignment error rate. We provide visualizations of the resulting annotations from multimodal alignment. This work has implications for areas such as image understanding, media accessibility, and multimodal data fusion.
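The abstract's core idea of aligning spoken words with gaze despite a temporal offset can be illustrated with a minimal sketch. This is not the paper's actual framework (which extends a full alignment model); the data format, `max_offset` value, and function name are all hypothetical:

```python
def align_words_to_fixations(words, fixations, max_offset=1.5):
    """Pair each spoken word with the gaze fixation closest in time.

    words:      list of (token, onset_seconds) -- hypothetical format
    fixations:  list of (region_label, start_seconds) -- hypothetical format
    max_offset: tolerated eye-voice latency in seconds (assumed value;
                gaze may lead or trail the corresponding word)
    """
    pairs = []
    for token, onset in words:
        # Consider only fixations within the tolerated temporal offset.
        candidates = [(abs(onset - start), region)
                      for region, start in fixations
                      if abs(onset - start) <= max_offset]
        if candidates:
            # Pick the fixation with the smallest temporal distance.
            _, region = min(candidates)
            pairs.append((token, region))
    return pairs
```

A frequency-based filter, as studied in the paper, would simply drop high-frequency function words from `words` before alignment.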
  3. Analyzing different modalities of expression can provide insights into the ways that humans interpret, label, and react to images. Such insights have the potential not only to advance our understanding of how humans coordinate these expressive modalities but also to enhance existing methodologies for common AI tasks such as image annotation and classification. We conducted an experiment that co-captured the facial expressions, eye movements, and spoken language data that observers produce while examining images of varying emotional content and responding to description-oriented vs. affect-oriented questions about those images. We analyzed the facial expressions produced by the observers to determine the connection between those expressions and an image's emotional content. We also explored the relationship between the valence of an image and the verbal responses to that image, and how that relationship depends on the nature of the prompt, using low-level lexical features and more complex affective features extracted from the observers' verbal responses. Finally, to integrate this multimodal data, we extended an existing bitext alignment framework to create meaningful pairings between narrated observations about images and the image regions indicated by eye movement data. The resulting annotations of image regions with words from observers' responses demonstrate the potential of bitext alignment for multimodal data integration and, from an application perspective, for annotation of open-domain images. In addition, we found that while responses to affect-oriented questions appear useful for image understanding, their holistic nature seems less helpful for image region annotation.
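The bitext-alignment idea above — pairing narrated words with gazed image regions — can be sketched, in greatly simplified form, as a co-occurrence count across observer sessions. The paper extends a full bitext alignment framework; this sketch, with its hypothetical data format, only shows the underlying intuition that words and regions attended together across sessions become annotation pairs:

```python
from collections import Counter, defaultdict

def annotate_regions(sessions):
    """Assign each image region the word that co-occurs with it most often.

    sessions: list of (words, regions) pairs, one per observer session --
    hypothetical format standing in for aligned speech and fixation data.
    """
    cooc = defaultdict(Counter)
    for words, regions in sessions:
        # Count every word/region pairing observed in the same session.
        for w in words:
            for r in regions:
                cooc[r][w] += 1
    # Label each region with its most frequently co-occurring word.
    return {r: counts.most_common(1)[0][0] for r, counts in cooc.items()}
```

Real bitext alignment models (e.g. IBM-style translation models) refine such counts iteratively instead of taking a single most-frequent pairing.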