Title: Embodied Multimodal Agents to Bridge the Understanding Gap
In this paper we argue that embodied multimodal agents, i.e., avatars, can play an important role in moving natural language processing toward “deep understanding.” Fully featured interactive agents model encounters between two “people,” whereas a language-only agent has little environmental or situational awareness. Multimodal agents bring new opportunities for interpreting visuals, locational information, gestures, and so on: additional axes along which to communicate. We propose that multimodal agents, by facilitating an embodied form of human-computer interaction, provide additional structure that can be used to train models that move NLP systems closer to genuine “understanding” of grounded language, and we discuss ongoing studies using existing systems.
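As a concrete illustration of the extra communicative axes an embodied agent gains, the following is a minimal Python sketch (our own illustration, not drawn from the paper; the `Signal` and `MultimodalUtterance` names and the time-overlap heuristic are assumptions) of how co-temporal speech and gesture signals could jointly constrain reference in a way a text-only channel cannot:

```python
from dataclasses import dataclass, field

@dataclass
class Signal:
    """One time-stamped observation on a single communicative channel."""
    channel: str      # e.g. "speech", "gesture", "gaze", "location"
    start: float      # seconds from interaction onset
    end: float
    content: object   # channel-specific payload (text, deixis target, ...)

@dataclass
class MultimodalUtterance:
    """A bundle of co-temporal signals that together ground one meaning."""
    signals: list = field(default_factory=list)

    def co_temporal(self, a, b, slack=0.5):
        """Pairs of signals on channels a and b that overlap in time
        (within `slack` seconds): candidate grounding constraints that a
        language-only agent, seeing just the speech channel, never gets."""
        return [(x, y)
                for x in self.signals if x.channel == a
                for y in self.signals if y.channel == b
                if x.start - slack <= y.end and y.start - slack <= x.end]

# "Put that one there" plus a pointing gesture: the temporal overlap lets
# the agent bind the demonstrative to the indicated object.
u = MultimodalUtterance([
    Signal("speech", 0.0, 0.8, "put that one there"),
    Signal("gesture", 0.2, 0.6, {"type": "point", "target": "block_3"}),
])
print(u.co_temporal("speech", "gesture"))
```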
Award ID(s):
2019805
PAR ID:
10494303
Author(s) / Creator(s):
;
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Journal Name:
Proceedings of the First Workshop on Bridging Human–Computer Interaction and Natural Language Processing
Format(s):
Medium: X
Location:
Online
Sponsoring Org:
National Science Foundation
More Like this
  1. The EngageAI Institute focuses on AI-driven narrative-centered learning environments that create engaging story-based problem-solving experiences to support collaborative learning. The institute's research has three complementary strands. First, the institute creates narrative-centered learning environments that generate interactive story-based problem scenarios to elicit rich communication, encourage coordination, and spark collaborative creativity. Second, the institute creates virtual embodied conversational agent technologies with multiple modalities for communication (speech, facial expression, gesture, gaze, and posture) to support student learning. Embodied conversational agents are driven by advances in natural language understanding, natural language generation, and computer vision. Third, the institute creates an innovative multimodal learning analytics framework that analyzes parallel streams of multimodal data derived from students' conversations, gaze, facial expressions, gesture, and posture as they interact with each other, with teachers, and with embodied conversational agents. Woven throughout the institute's activities is a strong focus on ethics, with an emphasis on creating AI-augmented learning that is deeply informed by considerations of fairness, accountability, transparency, trust, and privacy. The institute emphasizes broad participation and diverse perspectives to ensure that advances in AI-augmented learning address inequities in STEM. The institute brings together a multistate network of universities, diverse K-12 school systems, science museums, and nonprofit partners. Key to all of these endeavors is an emphasis on diversity, equity, and inclusion.
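To suggest what a multimodal learning analytics pipeline of this kind might look like at its simplest, here is a toy Python sketch (not the institute's actual framework; the event format and the five-second window width are assumptions) that buckets parallel, time-stamped speech, gaze, gesture, and posture events into shared analysis windows:

```python
from collections import defaultdict

def window_streams(events, width=5.0):
    """Bucket time-stamped (t, stream, value) events into fixed-width
    windows so co-occurring speech, gaze, gesture, and posture can be
    analyzed together."""
    windows = defaultdict(lambda: defaultdict(list))
    for t, stream, value in events:
        windows[int(t // width)][stream].append(value)
    return windows

# Toy parallel streams from one collaborative episode (illustrative data).
events = [
    (1.2, "speech", "what if we move it here"),
    (1.4, "gaze", "partner"),
    (2.0, "gesture", "point:screen"),
    (7.5, "speech", "run it again"),
    (7.9, "posture", "lean_in"),
]
for w, streams in sorted(window_streams(events).items()):
    print(f"window {w}: {dict(streams)}")
```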
  2. Previous research has established that embodied modeling (role-playing agents in a system) can support learning about complexity. Separately, research has demonstrated that increasing the multimodal resources available to students can support sensemaking, particularly for students classified as English Learners. This study bridges these two bodies of research to consider how embodied models can strengthen an interconnected system of multimodal models created by a classroom. We explore how iteratively refining embodied modeling activities strengthened connections to other models, real-world phenomena, and multimodal representations. Through design-based research in a sixth grade classroom studying ecosystems, we refined embodied modeling activities initially conceived as supports for computational thinking and modeling. Across three iterative cycles, we illustrate how the conceptual and epistemic relationship between the computational and embodied model shifted, and we analyze how these shifts shaped opportunities for learning and participation by: (1) recognizing each student’s perspectives as critical for making sense of the model, (2) encouraging students to question and modify the “code” for the model, and (3) leveraging multimodal resources, including graphs, gestures, and student-generated language, for meaning-making. Through these shifts, the embodied model became a full-fledged component of the classroom’s model system and created more equitable opportunities for learning and participation. 
  3. We present a five-year retrospective on the development of the VoxWorld platform, first introduced as a multimodal platform for modeling motion language, which has evolved into a platform for rapidly building and deploying embodied agents with contextual and situational awareness, capable of interacting with humans in multiple modalities and exploring their environments. In particular, we discuss the evolution from the theoretical underpinnings of the VoxML modeling language to a platform that accommodates both neural and symbolic inputs to build agents capable of multimodal interaction and hybrid reasoning. We focus on three distinct agent implementations and the functionality needed to accommodate all of them: Diana, a virtual collaborative agent; Kirby, a mobile robot; and BabyBAW, an agent that self-guides its own exploration of the world.
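The hybrid neural/symbolic design described above can be suggested with a toy Python sketch (our illustration, not VoxWorld or VoxML code; the `perceive` and `reason` functions and the fact format are assumptions): a neural module distills raw perception into symbolic facts, and a symbolic module reasons over those facts to choose an action:

```python
def perceive(frame):
    """Stand-in for a neural perception module: distills raw sensor data
    into symbolic facts (here, hard-coded detections for illustration)."""
    return [("block", "red", (0.4, 0.1)), ("block", "blue", (0.6, 0.3))]

def reason(facts, goal):
    """Stand-in for symbolic reasoning: pick an action whose preconditions
    the perceived facts satisfy, or fall back to asking the human."""
    for _kind, color, pos in facts:
        if goal == ("grasp", color):
            return {"action": "grasp", "at": pos}
    return {"action": "ask_clarification"}

# One step of the hybrid loop: neural input in, symbolic action out.
print(reason(perceive(frame=None), goal=("grasp", "red")))
```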
  4. Education is poised for a transformative shift with the advent of neurosymbolic artificial intelligence (NAI), which will redefine how we support deeply adaptive and personalized learning experiences. The integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), a significant and popular form of NAI, presents a promising avenue for advancing personalized instruction via neurosymbolic educational agents. By leveraging structured knowledge, these agents can provide individualized learning experiences that align with specific learner preferences and desired learning paths, while also mitigating biases inherent in traditional AI systems. NAI-powered education systems will be capable of interpreting complex human concepts and contexts while employing advanced problem-solving strategies, all grounded in established pedagogical frameworks. In this paper, we propose a system that leverages the unique affordances of KGs, LLMs, and pedagogical agents (embodied characters designed to enhance learning) as critical components of a hybrid NAI architecture. We discuss the rationale for our system design and the preliminary findings of our work. We conclude that education in the era of NAI will make learning more accessible, equitable, and aligned with real-world skills, opening a new depth of understanding in educational tools.
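A minimal Python sketch of the KG-plus-LLM pattern described above (illustrative only, not the proposed system; the triple store, `retrieve` helper, and prompt format are assumptions) might ground a tutoring prompt in structured facts about a learner like this:

```python
# Toy knowledge graph: (subject, relation, object) triples about a learner.
KG = [
    ("learner_42", "prefers", "visual_examples"),
    ("learner_42", "completed", "fractions_unit_1"),
    ("fractions_unit_2", "requires", "fractions_unit_1"),
]

def retrieve(entity):
    """Pull every triple mentioning the entity, to ground the prompt."""
    return [t for t in KG if entity in (t[0], t[2])]

def build_prompt(learner, question):
    """Compose an LLM prompt constrained by KG facts, so the generated
    tutoring move respects what the graph records about this learner."""
    facts = "\n".join(f"- {s} {r} {o}" for s, r, o in retrieve(learner))
    return (f"Known facts:\n{facts}\n\n"
            f"As a tutor, answer for this learner:\n{question}")

print(build_prompt("learner_42", "What should I study next?"))
```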