


Title: The AI Institute for Engaged Learning
Abstract

The EngageAI Institute focuses on AI‐driven narrative‐centered learning environments that create engaging story‐based problem‐solving experiences to support collaborative learning. The institute's research has three complementary strands. First, the institute creates narrative‐centered learning environments that generate interactive story‐based problem scenarios to elicit rich communication, encourage coordination, and spark collaborative creativity. Second, the institute creates virtual embodied conversational agent technologies with multiple modalities for communication (speech, facial expression, gesture, gaze, and posture) to support student learning. Embodied conversational agents are driven by advances in natural language understanding, natural language generation, and computer vision. Third, the institute is creating an innovative multimodal learning analytics framework that analyzes parallel streams of multimodal data derived from students’ conversations, gaze, facial expressions, gesture, and posture as they interact with each other, with teachers, and with embodied conversational agents. Woven throughout the institute's activities is a strong focus on ethics, with an emphasis on creating AI‐augmented learning that is deeply informed by considerations of fairness, accountability, transparency, trust, and privacy. The institute emphasizes broad participation and diverse perspectives to ensure that advances in AI‐augmented learning address inequities in STEM. The institute brings together a multistate network of universities, diverse K‐12 school systems, science museums, and nonprofit partners. Key to all of these endeavors is an emphasis on diversity, equity, and inclusion.
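As a concrete, purely illustrative reading of the third strand, the sketch below aligns parallel multimodal streams into shared time windows for joint analysis; the stream names, the window length, and the grouping step are assumptions for exposition rather than the institute's framework.

```python
# Illustrative sketch: aligning parallel multimodal streams (speech, gaze,
# facial expression, gesture, posture) into shared time windows so they can
# be analyzed together. The stream names, the 5-second window, and the
# simple grouping are assumptions for exposition, not the institute's code.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Observation:
    stream: str        # e.g. "speech", "gaze", "facial", "gesture", "posture"
    timestamp: float   # seconds from the start of the session
    value: str         # symbolic label produced by an upstream detector

def fuse_windows(observations, window_s=5.0):
    """Group observations from all streams into fixed-length time windows."""
    windows = defaultdict(lambda: defaultdict(list))
    for obs in observations:
        idx = int(obs.timestamp // window_s)
        windows[idx][obs.stream].append(obs.value)
    # Each window becomes one record combining whatever each stream produced.
    return [{"window": idx, **dict(streams)} for idx, streams in sorted(windows.items())]

if __name__ == "__main__":
    demo = [
        Observation("speech", 1.2, "asks clarifying question"),
        Observation("gaze", 1.4, "looks at partner"),
        Observation("gesture", 3.9, "points at shared screen"),
        Observation("posture", 6.1, "leans forward"),
    ]
    for record in fuse_windows(demo):
        print(record)
```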

 
Award ID(s): 2112635
NSF-PAR ID: 10500513
Publisher / Repository: Wiley
Journal Name: AI Magazine
Volume: 45
Issue: 1
ISSN: 0738-4602
Page Range / eLocation ID: 69 to 76
Sponsoring Org: National Science Foundation
More Like This
  1. In this paper, we argue that, as HCI becomes more multimodal with the integration of gesture, gaze, posture, and other nonverbal behavior, it is important to understand the role played by affordances and their associated actions in human-object interactions (HOI), so as to facilitate reasoning in HCI and HRI environments. We outline the requirements and challenges involved in developing a multimodal semantics for human-computer and human-robot interactions. Unlike unimodal interactive agents (e.g., text-based chatbots or voice-based personal digital assistants), multimodal HCI and HRI inherently require a notion of embodiment, or an understanding of the agent's placement within the environment and that of its interlocutor. We present a dynamic semantics for the language VoxML, which models human-computer, human-robot, and human-human interactions by creating multimodal simulations of both the communicative content and the agents' common ground, and we show how VoxML information reified within the environment aids computational understanding of objects for HOI.
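Purely as an illustration of the kind of affordance information such a semantics reifies, the hypothetical sketch below encodes an object concept, its habitat, and the actions its affordances license; the structure and field names are assumptions for exposition and do not reproduce VoxML markup.

```python
# Hypothetical encoding of an object concept, its habitat, and the actions its
# affordances license, for reasoning about human-object interaction (HOI).
# Field names and structure are illustrative only; they do not reproduce
# actual VoxML markup.
from dataclasses import dataclass, field

@dataclass
class Affordance:
    action: str        # action the object affords, e.g. "grasp"
    precondition: str  # situational requirement, e.g. "agent is within reach"
    result: str        # resulting state, e.g. "agent holds cup"

@dataclass
class ObjectConcept:
    name: str
    habitat: str                                   # canonical placement in the scene
    affordances: list[Affordance] = field(default_factory=list)

    def actions_available(self, facts: set[str]) -> list[str]:
        """Return actions whose preconditions hold in the current situation."""
        return [a.action for a in self.affordances if a.precondition in facts]

cup = ObjectConcept(
    name="cup",
    habitat="upright on a supporting surface",
    affordances=[
        Affordance("grasp", "agent is within reach", "agent holds cup"),
        Affordance("fill", "cup is upright", "cup contains liquid"),
    ],
)
print(cup.actions_available({"cup is upright", "agent is within reach"}))
# -> ['grasp', 'fill']
```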
  2. Creating engaging interactive story-based experiences that dynamically respond to individual player choices poses significant challenges for narrative-centered games. Recent advances in pre-trained large language models (LLMs) have the potential to revolutionize procedural content generation for narrative-centered games. Historically, interactive narrative generation has specified pivotal events in the storyline, often utilizing planning-based approaches to achieve narrative coherence and maintain the story arc. However, manual authorship is typically used to create the detail and variety in non-player character (NPC) interaction needed to specify and instantiate plot events. This paper proposes SCENECRAFT, a narrative scene generation framework that automates the NPC interaction crucial to unfolding plot events. SCENECRAFT interprets natural language instructions about scene objectives, NPC traits, location, and narrative variations. It then employs large language models to generate game scenes aligned with authorial intent, producing branching conversation paths that adapt to player choices while adhering to the author's interaction goals. LLMs generate interaction scripts, semantically extract character emotions and gestures to align with the script, and convert dialogues into a game scripting language. The generated script can then be played using an existing narrative-centered game framework. Through empirical evaluation using automated and human assessments, we demonstrate SCENECRAFT's effectiveness in creating narrative experiences, assessed on creativity, adaptability, and alignment with the intended author instructions.
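As a rough sketch of the prompt-then-parse pipeline such a framework implies, the snippet below builds a prompt from authorial scene constraints, queries a language model, and parses the reply into scene nodes; the prompt wording, the call_llm stub, and the JSON fields are illustrative assumptions, not the SCENECRAFT implementation.

```python
# Rough sketch of a prompt-then-parse step for LLM-driven scene generation:
# build a prompt from authorial constraints, query a language model, and parse
# the reply into scene nodes. The prompt wording, the call_llm stub, and the
# JSON fields are illustrative assumptions, not the SCENECRAFT implementation.
import json

def build_prompt(objective: str, npc_traits: str, location: str) -> str:
    return (
        "Write a short branching conversation for a game scene.\n"
        f"Objective: {objective}\nNPC traits: {npc_traits}\nLocation: {location}\n"
        "Return JSON with a list 'nodes'; each node has 'speaker', 'line', "
        "'emotion', 'gesture', and 'choices' (ids of follow-up nodes)."
    )

def call_llm(prompt: str) -> str:
    # Placeholder for a call to a pre-trained large language model.
    raise NotImplementedError("plug in an LLM client here")

def generate_scene(objective: str, npc_traits: str, location: str) -> list[dict]:
    raw = call_llm(build_prompt(objective, npc_traits, location))
    scene = json.loads(raw)  # expected to follow the JSON shape requested above
    # Downstream steps would align gestures and emotions with the script and
    # translate each node into the game's scripting language.
    return scene["nodes"]
```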

     
  3. Effective storytelling relies on engagement and interaction. This work develops an automated software platform for telling stories to children and investigates the impact of two design choices on children's engagement and willingness to interact with the system: story distribution and the use of complex gesture. A storyteller condition compares stories told in a third-person narrator voice with those distributed between a narrator and first-person story characters. Basic gestures are used in all our storytellings, but, in a second factor, some are augmented with gestures that indicate conversational turn changes and references to other characters, and that prompt children to ask questions. An analysis of eye gaze indicates that children attend more to the story when a distributed storytelling model is used. Gesture prompts appear to encourage children to ask questions, which they did, but at a relatively low rate. Interestingly, the children most frequently asked “why” questions. Gaze switching happened more quickly when the story characters began to speak than for narrator turns. These results have implications for future agent-based storytelling system research.
  4. Extended reality (XR) technologies, such as virtual reality (VR) and augmented reality (AR), provide users, their avatars, and embodied agents with a shared platform to collaborate in a spatial context. Traditional face-to-face communication is limited by users' proximity: another human's non-verbal embodied cues become more difficult to perceive the farther one is from that person. Researchers and practitioners have therefore started to look into ways to accentuate or amplify such embodied cues and signals to counteract the effects of distance with XR technologies. In this article, we describe and evaluate the Big Head technique, in which a human's head in VR/AR is scaled up relative to their distance from the observer as a mechanism for enhancing the visibility of non-verbal facial cues, such as facial expressions or eye gaze. To better understand and explore this technique, we present two complementary human-subject experiments. In the first experiment, we conducted a VR study with a head-mounted display to understand the impact of increased or decreased head scales on participants' ability to perceive facial expressions, as well as their sense of comfort and feeling of “uncanniness,” over distances of up to 10 m. We explored two different scaling methods and compared perceptual thresholds and user preferences. The second experiment was performed in an outdoor AR environment with an optical see-through head-mounted display. Participants were asked to estimate facial expressions and eye gaze, and to identify a virtual human, over large distances of 30, 60, and 90 m. In both experiments, our results show significant differences in minimum, maximum, and ideal head scales for different distances and tasks related to perceiving faces, facial expressions, and eye gaze, and we also found that participants were more comfortable with slightly bigger heads at larger distances. We discuss our findings with respect to the technologies used, and we discuss implications and guidelines for practical applications that aim to leverage XR-enhanced facial cues.
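One plausible reading of such a distance-dependent head-scaling rule is sketched below; the linear ramp and the particular distance and scale limits are assumptions for illustration, not the scaling methods evaluated in the article.

```python
# Sketch of a distance-dependent head-scale rule in the spirit of the Big Head
# technique: the head is enlarged as the observed person gets farther away so
# facial cues stay legible. The linear ramp and the example limits below are
# illustrative assumptions, not the scaling methods evaluated in the article.
def head_scale(distance_m: float,
               near_m: float = 1.0, far_m: float = 90.0,
               min_scale: float = 1.0, max_scale: float = 8.0) -> float:
    """Return a head scale factor that grows with distance, clamped to limits."""
    if distance_m <= near_m:
        return min_scale
    if distance_m >= far_m:
        return max_scale
    t = (distance_m - near_m) / (far_m - near_m)  # 0..1 across the ramp
    return min_scale + t * (max_scale - min_scale)

for d in (1, 10, 30, 60, 90):
    print(d, "m ->", round(head_scale(d), 2), "x head scale")
```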