Creating engaging interactive story-based experiences that respond dynamically to individual player choices poses significant challenges for narrative-centered games. Recent advances in pre-trained large language models (LLMs) have the potential to revolutionize procedural content generation for such games. Historically, interactive narrative generation has specified pivotal events in the storyline, often using planning-based approaches to achieve narrative coherence and maintain the story arc. However, the detail and variety in the non-player character (NPC) interactions that specify and instantiate plot events are typically created through manual authorship. This paper proposes SCENECRAFT, a narrative scene generation framework that automates the NPC interactions crucial to unfolding plot events. SCENECRAFT interprets natural language instructions about scene objectives, NPC traits, location, and narrative variations, and then employs large language models to generate game scenes aligned with authorial intent. It generates branching conversation paths that adapt to player choices while adhering to the author's interaction goals. LLMs generate interaction scripts, semantically extract character emotions and gestures to align with the script, and convert dialogues into a game scripting language. The generated script can then be played in an existing narrative-centered game framework. Through empirical evaluation using automated and human assessments, we demonstrate SCENECRAFT's effectiveness in creating narrative experiences, judged on creativity, adaptability, and alignment with the intended author instructions.
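The abstract describes a staged pipeline: an LLM drafts a branching interaction script from authorial instructions, emotions and gestures are extracted to match the script, and the dialogue is converted into a game scripting language. The sketch below illustrates that general shape only; it is not the SCENECRAFT implementation, and `call_llm`, the prompt wording, and the `SceneSpec` fields are hypothetical placeholders.

```python
# Minimal sketch of an LLM-driven scene-generation pipeline (hypothetical;
# not the SCENECRAFT implementation). `call_llm` stands in for any chat-model API.
from dataclasses import dataclass

@dataclass
class SceneSpec:
    objective: str      # author's goal for the scene
    npc_traits: dict    # e.g. {"Mira": "gruff blacksmith, secretly kind"}
    location: str
    variations: int     # number of branching paths to produce

def call_llm(prompt: str) -> str:
    """Placeholder for a large-language-model completion call."""
    raise NotImplementedError

def generate_scene(spec: SceneSpec) -> str:
    # 1. Generate a branching interaction script from the author's instructions.
    script = call_llm(
        f"Write a branching dialogue scene at {spec.location} with NPCs "
        f"{spec.npc_traits}. Scene objective: {spec.objective}. "
        f"Provide {spec.variations} player-choice branches."
    )
    # 2. Annotate each line with emotions and gestures inferred from the script.
    annotated = call_llm(
        "For each dialogue line below, add an emotion and gesture tag:\n" + script
    )
    # 3. Convert the annotated dialogue into the target game scripting language.
    return call_llm(
        "Convert this annotated dialogue into the game scripting language:\n" + annotated
    )
```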
Facial Emotion Expression Corpora for Training Game Character Neural Network Models
The emergence of photorealistic and cinematic non-player character (NPC) animation presents new challenges for video game developers. Player expectations of cinematic acting styles demand a more sophisticated aesthetic in the representation of social interaction. New methods can streamline workflow by integrating actor-driven character design into the development of game character AI and animation. A workflow that tracks actor performance through to final neural network (NN) design depends on a rigorous method of producing single-actor video corpora from which to train emotion AI NN models. While numerous video corpora have been developed to study facial emotion elicitation, test theoretical models, and train neural networks to recognize emotion, developing single-actor corpora to train NNs for NPCs in video games is uncommon. A class of facial emotion recognition (FER) products has enabled the production of single-actor video corpora that use emotion analysis data. This paper introduces a single-actor game character corpus workflow for game character developers. The proposed method uses a single-actor video corpus and dataset to train and implement an NN in an off-the-shelf video game engine for facial animation of an NPC. The efficacy of using an NN-driven animation controller has already been demonstrated (Schiffer, 2021; Kozasa et al., 2006). This paper focuses on using a single-actor video corpus to train such an NN-driven animation controller.
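As a rough illustration of the training stage in such a workflow, the sketch below fits a small classifier to emotion-labeled frames from a single-actor corpus. It is a minimal sketch under stated assumptions: the 48x48 grayscale crops, seven-emotion label set, and network size are illustrative choices, not the paper's actual setup, and a production NPC animation controller might predict continuous animation parameters rather than discrete labels.

```python
# Minimal sketch (assumptions throughout): train a small classifier on frames from a
# single-actor corpus whose per-frame emotion labels come from an off-the-shelf FER tool.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

EMOTIONS = ["neutral", "happy", "sad", "angry", "fearful", "surprised", "disgusted"]

def build_model(n_classes: int = len(EMOTIONS)) -> nn.Module:
    # Tiny CNN over 48x48 grayscale face crops.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(),
        nn.Linear(32 * 12 * 12, n_classes),
    )

def train(frames: torch.Tensor, labels: torch.Tensor, epochs: int = 10) -> nn.Module:
    # frames: (N, 1, 48, 48) float tensor; labels: (N,) long tensor of emotion indices.
    model = build_model()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(frames, labels), batch_size=64, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()
    return model  # export (e.g. to ONNX) for use in a game-engine animation controller
```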
- Award ID(s): 1852516
- PAR ID: 10423957
- Date Published:
- Journal Name: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP)
- Volume: 2
- Page Range / eLocation ID: 197 to 208
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
In this paper, the authors explore different approaches to animating 3D facial emotions, some using manual keyframe animation and some using machine learning. To compare the approaches, the authors conducted an experiment consisting of side-by-side comparisons of animation clips generated by skeleton, blendshape, audio-driven, and vision-based capture facial animation techniques. Ninety-five participants viewed twenty facial animation clips of characters expressing five distinct emotions (anger, sadness, happiness, fear, and neutral) created using the four facial animation techniques. After viewing each clip, the participants were asked to identify the emotion the character appeared to be conveying and to rate its naturalness. Findings showed that the naturalness ratings of the happy emotion were consistent across the four methods, whereas the naturalness ratings of the fear emotion created with skeletal animation were significantly higher than those of the other methods. Recognition of the sad and neutral emotions was very low for all methods compared with the other emotions. Overall, the skeleton approach received significantly higher naturalness ratings and a higher recognition rate than the other methods.
3D facial animation synthesis from audio has been a focus of recent research. However, most existing work maps audio to visual content, providing limited insight into the relationship between emotion in audio and expressive facial animation. This work generates audio-matching facial animations conditioned on a specified emotion label. In such a task, we argue that separating content from audio is indispensable: the proposed model must learn to generate facial content from the audio content while deriving expressions from the specified emotion. We achieve this with an adaptive instance normalization module that isolates the content in the audio and combines it with the emotion embedding derived from the specified label. The joint content-emotion embedding is then used to generate 3D facial vertices and texture maps. We compare our method with state-of-the-art baselines, including facial segmentation-based and voice conversion-based disentanglement approaches, and conduct a user study to evaluate the performance of emotion conditioning. The results indicate that the proposed method outperforms the baselines in animation quality and expression categorization accuracy.
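A minimal sketch of the adaptive instance normalization idea described above, under assumed feature shapes (this is not the paper's architecture): the audio-content features are normalized to remove their own statistics and then re-scaled and shifted using statistics predicted from the emotion embedding, yielding a joint content-emotion representation for a downstream decoder.

```python
# Hypothetical sketch of content-emotion fusion via adaptive instance normalization
# (AdaIN); feature sizes and module names are assumptions, not the paper's.
import torch
import torch.nn as nn

class AdaINFusion(nn.Module):
    def __init__(self, content_dim: int = 256, emotion_dim: int = 64):
        super().__init__()
        # Predict per-channel scale and shift from the emotion-label embedding.
        self.affine = nn.Linear(emotion_dim, 2 * content_dim)

    def forward(self, content: torch.Tensor, emotion: torch.Tensor) -> torch.Tensor:
        # content: (batch, time, content_dim) audio-content features
        # emotion: (batch, emotion_dim) embedding of the specified emotion label
        mean = content.mean(dim=1, keepdim=True)
        std = content.std(dim=1, keepdim=True) + 1e-5
        normalized = (content - mean) / std  # strip the content's own statistics
        scale, shift = self.affine(emotion).chunk(2, dim=-1)
        # Re-style the content features with emotion-conditioned statistics.
        return normalized * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

# The fused features would then feed a decoder that predicts per-frame 3D facial
# vertices (and texture maps).
fusion = AdaINFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 64))  # -> (2, 100, 256)
```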
The paper reports ongoing research toward the design of multimodal affective pedagogical agents that are effective for different types of learners and applications. In particular, the work investigated the extent to which the type of character design (realistic versus stylized) affects students' perception of an animated agent's facial emotions, and whether the effects are moderated by learner characteristics (e.g., gender). Eighty-two participants viewed 10 animation clips featuring a stylized character exhibiting five emotions (happiness, sadness, fear, surprise, and anger; 2 clips per emotion) and 10 clips featuring a realistic character portraying the same emotional states. The participants were asked to name the emotions and rate their sincerity, intensity, and typicality. The results indicated that participants were slightly more likely to recognize the emotions displayed by the stylized agent, although the difference was not statistically significant. The stylized agent was on average rated significantly higher for facial emotion intensity, whereas the differences in ratings for typicality and sincerity across all emotions were not statistically significant. Significant differences favoring the stylized agent were found for sadness (typicality), happiness (sincerity), and fear, anger, sadness, and happiness (intensity). Gender was not a significant correlate across all emotions or for individual emotions.
Background: Autism spectrum disorder (ASD) is a developmental disorder characterized by deficits in social communication and interaction, and restricted and repetitive behaviors and interests. The incidence of ASD has increased in recent years; it is now estimated that approximately 1 in 40 children in the United States is affected. Due in part to increasing prevalence, access to treatment has become constrained. Hope lies in mobile solutions that provide therapy through artificial intelligence (AI) approaches, including facial and emotion detection AI models developed by mainstream cloud providers and available directly to consumers. However, these solutions may not be sufficiently trained for use in pediatric populations.
Objective: If the emotion classifiers available off-the-shelf to the general public through Microsoft, Amazon, Google, and Sighthound are well-suited to the pediatric population, they could be used to develop mobile therapies targeting aspects of social communication and interaction, perhaps accelerating innovation in this space. This study aimed to test these classifiers directly with image data from children with parent-reported ASD recruited through crowdsourcing.
Methods: We used a mobile game called Guess What? that challenges a child to act out a series of prompts displayed on the screen of a smartphone held on the forehead of his or her care provider. The game is intended to be a fun and engaging way for the child and parent to interact socially; for example, the parent attempts to guess what emotion the child is acting out (e.g., surprised, scared, or disgusted). During a 90-second game session, as many as 50 prompts are shown while the child acts, and the video records the child's actions and expressions. Due in part to the fun nature of the game, it is a viable way to remotely engage pediatric populations, including the autism population, through crowdsourcing. We recruited 21 children with ASD to play the game and gathered 2602 emotive frames following their game sessions. These data were used to evaluate the accuracy and performance of four state-of-the-art facial emotion classifiers to assess the feasibility of these platforms for pediatric research.
Results: All classifiers performed poorly for every evaluated emotion except happy. None of the classifiers correctly labeled more than 60.18% (1566/2602) of the evaluated frames. Moreover, none of the classifiers correctly identified more than 11% (6/51) of the angry frames or 14% (10/69) of the disgust frames.
Conclusions: The findings suggest that commercial emotion classifiers may be insufficiently trained for use in digital approaches to autism treatment and treatment tracking. Secure, privacy-preserving methods of increasing labeled training data are needed to boost the models' performance before they can be used in AI-enabled approaches to social therapy of the kind that is common in autism treatments.
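As a sketch of the per-emotion evaluation reported above, the snippet below scores an arbitrary emotion classifier against the acted label of each frame. `classify_frame` is a hypothetical placeholder, not any specific vendor's API, and the frame and label handling is assumed.

```python
# Hypothetical per-emotion accuracy evaluation for an off-the-shelf emotion classifier.
from __future__ import annotations
from collections import Counter
from typing import Callable

def per_emotion_accuracy(
    frames: list,                              # decoded game-session video frames
    labels: list[str],                         # acted emotion for each frame
    classify_frame: Callable[[object], str],   # placeholder emotion classifier
) -> dict[str, float]:
    # Tally correct predictions per acted emotion label.
    correct, total = Counter(), Counter()
    for frame, label in zip(frames, labels):
        total[label] += 1
        if classify_frame(frame) == label:
            correct[label] += 1
    return {emotion: correct[emotion] / total[emotion] for emotion in total}
```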