

Title: Entertaining and opinionated but too controlling: a large-scale user study of an open domain Alexa prize system
Conversational systems typically focus on functional tasks such as scheduling appointments or creating todo lists. Instead we design and evaluate SlugBot (SB), one of 8 semifinalists in the 2018 Alexa Prize, whose goal is to support casual open-domain social interaction. This novel application requires both broad topic coverage and engaging interactive skills. We developed a new technical approach to meet this demanding situation by crowd-sourcing novel content and introducing playful conversational strategies based on storytelling and games. We collected over 10,000 conversations during August 2018 as part of the Alexa Prize competition. We also conducted an in-lab follow-up qualitative evaluation. Overall users found SB moderately engaging; conversations averaged 3.6 minutes and involved 26 user turns. However, users reacted very differently to different conversation subtypes. Storytelling and games were evaluated positively; these were seen as entertaining with predictable interactive structure. They also led users to impute personality and intelligence to SB. In contrast, search and general Chit-Chat induced coverage problems; here users found it hard to infer what topics SB could understand, with these conversations seen as being too system-driven. Theoretical and design implications suggest a move away from conversational systems that simply provide factual information. Future systems should be designed to have their own opinions with personal stories to share, and SB provides an example of how we might achieve this.
Award ID(s):
1748056
NSF-PAR ID:
10124050
Journal Name:
Proceedings of the 1st International Conference on Conversational User Interfaces
Page Range / eLocation ID:
24-30
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent years have seen a growing recognition of the importance of enabling K-12 students to engage in computational thinking, particularly in elementary grades where students' dispositions toward STEM are developing. Block-based programming has emerged as an effective tool for engaging these novice learners in computational thinking. At the same time, digital storytelling has emerged as a promising avenue for creating motivating problem-solving scenarios that engage students in science investigations. Although block-based programming and digital storytelling are in many ways synergistic, there is a lingering question of how to design block-based languages at an age-appropriate level to enable effective and engaging storytelling. In this work, we review design principles from prior block-based and digital storytelling systems as well as propose the design of block-based programming language features to enable the creation of rich, interactive science narratives by upper elementary students. 
  2. Aim/Purpose: The purpose of this paper is to explore the efficacy of simulated interactive virtual conversations (chatbots) for mentoring underrepresented minority doctoral engineering students who are considering pursuing a career in the professoriate or in industry. Background: Chatbots were developed under the National Science Foundation INCLUDES Design and Developments Launch Pilot award (17-4458) and provide career advice with responses from a pre-programmed database populated by renowned emeriti engineering faculty. Chatbots have been engineered to fulfill a myriad of roles, such as undergraduate student advisement, but no research has been found that addresses their use with supplemental future faculty mentoring for doctoral students. Methodology: Chatbot efficacy is examined through a phenomenological design with focus groups with underrepresented minority doctoral engineering students. No theoretical or conceptual frameworks exist relative to chatbots designed for future faculty mentoring; therefore, an adaptation and implementation of the conceptual model posited on movie recommendations was utilized to ground this study. The four-stage process of phenomenological data analysis was followed: epoché, horizontalization, imaginative variation, and synthesis. Contribution: No studies have investigated the utility of chatbots in providing supplemental mentoring to future faculty. This phenomenological study contributes to this area of investigation and provides greater consideration into the unmet mentoring needs of these students, as well as the potential of utilizing chatbots for supplementary mentoring, particularly for those who lack access to high quality mentoring. Findings: Following the data analysis process, the essence of the findings was, while underrepresented minority doctoral engineering students have ample unmet mentoring needs and overall are satisfied with the user interface and trustworthiness of chatbots, their intent to use them is mixed due to a lack of personalization in this type of supplemental mentoring relationship. Recommendations for Practitioners: One of the major challenges faced by underrepresented doctoral engineering students is securing quality mentoring relationships that socialize them into the engineering culture and community of practice. While creating opportunities for students and incentivizing faculty to engage in the work of mentoring is needed, we must also consider the ways in which to leverage technology to offer supplemental future faculty mentoring virtually. Recommendation for Researchers: Additional research on the efficacy of chatbots in providing career-focused mentoring to future faculty is needed, as well as how to enhance the functionality of chatbots to create personal connections and networking opportunities, which are hallmarks of traditional mentoring relationships. Impact on Society: An understanding of the conceptual pathway that can lead to greater satisfaction with chatbots may serve to expand their use in the realm of mentoring. Scaling virtual faculty mentoring opportunities may be an important breakthrough in meeting mentoring needs across higher education. Future Research: Future chatbot research must focus on connecting chatbot users with human mentors; standardizing the process for response creation through additional data collection with a cadre of diverse, renowned faculty; engaging subject matter experts to conduct quality verification checks on responses; testing new responses with potential users; and launching the chatbots for a broad array of users.
  3. Background

    The worldwide population of older adults will soon exceed the capacity of assisted living facilities. Accordingly, we aim to understand whether appropriately designed robots could help older adults stay active at home.

    Methods

    Building on related literature as well as guidance from experts in game design, rehabilitation, and physical and occupational therapy, we developed eight human-robot exercise games for the Baxter Research Robot, six of which involve physical human-robot contact. After extensive iteration, these games were tested in an exploratory user study including 20 younger adult and 20 older adult users.

    Results

    Only socially and physically interactive games fell in the highest ranges for pleasantness, enjoyment, engagement, cognitive challenge, and energy level. Our games successfully spanned three different physical, cognitive, and temporal challenge levels. User trust and confidence in Baxter increased significantly between pre- and post-study assessments. Older adults experienced higher exercise, energy, and engagement levels than younger adults, and women rated the robot more highly than men on several survey questions.

    Conclusions

    The results indicate that social-physical exercise with a robot is more pleasant, enjoyable, engaging, cognitively challenging, and energetic than similar interactions that lack physical touch. In addition to this main finding, researchers working in similar areas can build on our design practices, our open-source resources, and the age-group and gender differences that we found.

     
  4. Backchanneling behaviors on a robot, such as nodding, can make talking to a robot feel more natural and engaging by giving a sense that the robot is actively listening. For backchanneling to be effective, it is important that the timing of such cues is appropriate given the humans’ conversational behaviors. Recent progress has shown that these behaviors can be learned from datasets of human-human conversations. However, recent data-driven methods tend to overfit to the human speakers that are seen in training data and fail to generalize well to previously unseen speakers. In this paper, we explore the use of data augmentation for effective nodding behavior in a robot. We show that, by augmenting the input speech and visual features, we can produce data-driven models that are more robust to unseen features without collecting additional data. We analyze the efficacy of data-driven backchanneling in a realistic human-robot conversational setting with a user study, showing that users perceived the data-driven model to be better at listening as compared to rule-based and random baselines. 
  5. Many AI system designers grapple with how best to collect human input for different types of training data. Online crowds provide a cheap on-demand source of intelligence, but they often lack the expertise required in many domains. Experts offer tacit knowledge and more nuanced input, but they are harder to recruit. To explore this trade-off, we compared novices and experts in terms of performance and perceptions on human intelligence tasks in the context of designing a text-based conversational agent. We developed a preliminary chatbot that simulates conversations with someone seeking mental health advice to help educate volunteer listeners at 7cups.com. We then recruited experienced listeners (domain experts) and MTurk novice workers (crowd workers) to conduct tasks of differing complexity to improve the chatbot. Novice crowds perform comparably to experts on tasks that only require natural language understanding, such as correcting how the system classifies a user statement. For more generative tasks, like creating new lines of chatbot dialogue, the experts demonstrated higher quality, novelty, and emotion. We also uncovered a motivational gap: crowd workers enjoyed the interactive tasks, while experts found the work to be tedious and repetitive. We offer design considerations for allocating crowd workers and experts on input tasks for AI systems, and for better motivating experts to participate in low-level data work for AI.