Title: Hierarchically-Attentive RNN for Album Summarization and Storytelling
We address the problem of end-to-end visual storytelling. Given a photo album, our model first selects the most representative (summary) photos, and then composes a natural language story for the album. For this task, we make use of the Visual Storytelling dataset and a model composed of three hierarchically-attentive Recurrent Neural Nets (RNNs) to: encode the album photos, select representative (summary) photos, and compose the story. Automatic and human evaluations show our model achieves better performance on selection, generation, and retrieval than baselines.
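As a rough illustration of the selection step described in the abstract, the sketch below turns per-photo relevance scores into attention weights with a softmax and keeps the k highest-weighted photos in album order. This is not the authors' model: the abstract does not say how scores are computed, so `relevance_scores`, `select_summary_photos`, and the example values are hypothetical stand-ins for whatever the selection RNN would produce.

```python
import math


def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_summary_photos(relevance_scores, k):
    """Pick the k photos with the largest attention weight.

    relevance_scores: one hypothetical score per photo, in album order.
    Returns the selected photo indices (restored to album order) and
    the attention weights for every photo.
    """
    weights = softmax(relevance_scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k]), weights


# Hypothetical scores for a five-photo album (assumed values, not real data).
album_scores = [0.2, 2.1, -0.5, 1.7, 0.9]
selected, weights = select_summary_photos(album_scores, k=3)
print(selected)
```

Restoring album order after ranking keeps the selected photos in chronological sequence, which is what a story generator reading them one at a time would presumably need.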
Award ID(s):
1633295
PAR ID:
10066885
Journal Name:
Empirical Methods in Natural Language Processing
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper explores avatar identification in creative storytelling applications where users create their own story and environment. We present a study that investigated the effects of avatar facial similarity to the user on the quality of the story product they create. The children told a story using a digital puppet-based storytelling system by interacting with a physical puppet box that was augmented with a real-time video feed of the puppet enactment. We used a facial morphing technique to manipulate avatar facial similarity to the user. The resulting morphed image was applied to each participant's puppet character, thus creating a custom avatar for each child to use in story creation. We hypothesized that the more familiar avatars appeared to participants, the stronger the sense of character identification would be, resulting in higher story quality. The proposed rationale is that visual familiarity may lead participants to draw richer story details from their past real-life experiences. Qualitative analysis of the stories supported our hypothesis. Our results contribute to avatar design in children's creative storytelling applications.
  2. Background: Reminiscence, a therapy that uses materials such as old photos and videos to stimulate long-term memory, can improve the emotional well-being and life satisfaction of older adults, including those who are cognitively intact. However, providing personalized reminiscence therapy can be challenging for caregivers and family members. Objective: This study aimed to achieve three objectives: (1) design and develop the GoodTimes app, an interactive multimodal photo album that uses artificial intelligence (AI) to engage users in personalized conversations and storytelling about their pictures, encompassing family, friends, and special moments; (2) examine the app’s functionalities in various scenarios using use-case studies and assess the app’s usability and user experience through the user study; and (3) investigate the app’s potential as a supplementary tool for reminiscence therapy among cognitively intact older adults, aiming to enhance their psychological well-being by facilitating the recollection of past experiences. Methods: We used state-of-the-art AI technologies, including image recognition, natural language processing, knowledge graph, logic, and machine learning, to develop GoodTimes. First, we constructed a comprehensive knowledge graph that models the information required for effective communication, including photos, people, locations, time, and stories related to the photos. Next, we developed a voice assistant that interacts with users by leveraging the knowledge graph and machine learning techniques. Then, we created various use cases to examine the functions of the system in different scenarios. Finally, to evaluate GoodTimes’ usability, we conducted a study with older adults (N=13; age range 58-84, mean 65.8 years). The study ran from January to March 2023. Results: The use-case tests demonstrated the performance of GoodTimes in handling a variety of scenarios, highlighting its versatility and adaptability. For the user study, the feedback from our participants was highly positive, with 92% (12/13) reporting a positive experience conversing with GoodTimes. All participants mentioned that the app invoked pleasant memories and aided in recollecting loved ones, resulting in a sense of happiness for the majority (11/13, 85%). Additionally, a significant majority found GoodTimes to be helpful (11/13, 85%) and user-friendly (12/13, 92%). Most participants (9/13, 69%) expressed a desire to use the app frequently, although some (4/13, 31%) indicated a need for technical support to navigate the system effectively. Conclusions: Our AI-based interactive photo album, GoodTimes, was able to engage users in browsing their photos and conversing about them. Preliminary evidence supports GoodTimes’ usability and benefits cognitively intact older adults. Future work is needed to explore its potential positive effects among older adults with cognitive impairment.
  3. In strong story experience management problems, an automated storytelling agent balances player autonomy with narrative structure in the context of an interactive story game world. However, it is possible for the game world to get softlocked in states outside narrative structures specified by the game designer. These states are called dead-ends. In this paper, we revisit adversarial strong story experience management, a framing of the experience management problem that models interactive storytelling as an adversarial game where dead-ends are losses. This framing is adversarial against narrative softlocks, not necessarily the player. We present a novel agent based on adversarial search and deep reinforcement learning, which is trained to avoid dead-ends while preserving player autonomy. We compare our approach to a reactive, narrative plan-based mediation system on a test set of games compatible with current narrative planning techniques. We show that our adversarial architecture outperforms narrative mediation on a suite of dead-end metrics during game trace and breadth-first tests of state transition system exploration, using classical and intentional planning domains. 
  4. Effective storytelling relies on engagement and interaction. This work develops an automated software platform for telling stories to children and investigates the impact of two design choices on children’s engagement and willingness to interact with the system: story distribution and the use of complex gesture. A storyteller condition compares stories told in a third person, narrator voice with those distributed between a narrator and first-person story characters. Basic gestures are used in all our storytellings, but, in a second factor, some are augmented with gestures that indicate conversational turn changes, references to other characters and prompt children to ask questions. An analysis of eye gaze indicates that children attend more to the story when a distributed storytelling model is used. Gesture prompts appear to encourage children to ask questions, something that children did, but at a relatively low rate. Interestingly, the children most frequently asked “why” questions. Gaze switching happened more quickly when the story characters began to speak than for narrator turns. These results have implications for future agent-based storytelling system research. 