Speakers build rapport as they align their conversational behaviors with each other. Rapport built with a teachable agent while instructing it in domain material has been shown to promote learning. Past work on lexical alignment in education is limited both in the measures used to quantify alignment and in the types of interactions in which alignment with agents has been studied. In this paper, we apply alignment measures based on a data-driven notion of shared expressions (possibly composed of multiple words) and compare alignment in one-on-one human-robot (H-R) interactions with the H-R portions of collaborative human-human-robot (H-H-R) interactions. We find that students in the H-R setting align with a teachable robot more than in the H-H-R setting, and that the relationship between lexical alignment and rapport is more complex than predicted by previous theoretical and empirical work.
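To make the expression-based measure concrete, here is a minimal sketch of how repetition of shared (possibly multi-word) expressions between a student and the robot could be quantified. The n-gram extraction, the `alignment_score` function, and the toy utterances are illustrative assumptions rather than the exact measure used in the paper.

```python
from collections import Counter

def expressions(utterances, max_len=4):
    """Collect word n-grams (length 1..max_len) as candidate shared expressions."""
    counts = Counter()
    for utt in utterances:
        tokens = utt.lower().split()
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def alignment_score(speaker_utts, partner_utts, max_len=4):
    """Fraction of the speaker's expression occurrences that the partner has also
    produced -- a rough stand-in for repetition of shared expressions."""
    spk = expressions(speaker_utts, max_len)
    par = expressions(partner_utts, max_len)
    shared = sum(count for expr, count in spk.items() if expr in par)
    total = sum(spk.values())
    return shared / total if total else 0.0

# Toy usage: how much does the student reuse expressions the robot has produced?
robot_utts = ["the angle of the ramp changes how fast the car goes"]
student_utts = ["so the angle of the ramp makes the car go faster"]
print(alignment_score(student_utts, robot_utts))
```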
Learning Backchanneling Behaviors for a Social Robot via Data Augmentation from Human-Human Conversations
Backchanneling behaviors on a robot, such as nodding, can make talking to a robot feel more natural and engaging by giving a sense that the robot is actively listening. For backchanneling to be effective, the timing of such cues must be appropriate given the human's conversational behaviors. Recent progress has shown that these behaviors can be learned from datasets of human-human conversations. However, such data-driven methods tend to overfit to the human speakers seen in the training data and fail to generalize well to previously unseen speakers. In this paper, we explore the use of data augmentation for effective nodding behavior in a robot. We show that, by augmenting the input speech and visual features, we can produce data-driven models that are more robust to unseen features without collecting additional data. We analyze the efficacy of data-driven backchanneling in a realistic human-robot conversational setting with a user study, showing that users perceived the data-driven model to be a better listener than rule-based and random baselines.
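As an illustration of the augmentation idea, the sketch below perturbs a block of speech/visual feature windows with additive noise and per-dimension scaling to mimic unseen-speaker variation. The specific perturbations, array shapes, and the `augment_features` name are assumptions for illustration, not the paper's exact augmentation pipeline.

```python
import numpy as np

def augment_features(features, noise_std=0.05, scale_range=(0.9, 1.1), rng=None):
    """Perturb a [windows, frames, dims] block of speech/visual features with
    additive noise and per-dimension scaling to mimic unseen-speaker variation."""
    rng = rng or np.random.default_rng()
    scale = rng.uniform(*scale_range, size=features.shape[-1])
    noise = rng.normal(0.0, noise_std, size=features.shape)
    return features * scale + noise

# Toy usage: double a small training set of feature windows labelled nod / no-nod.
X = np.random.rand(32, 100, 40)        # 32 windows, 100 frames, 40-dim features
y = np.random.randint(0, 2, size=32)   # 1 = produce a backchannel (nod)
X_aug = np.concatenate([X, augment_features(X)], axis=0)
y_aug = np.concatenate([y, y], axis=0) # augmentation leaves labels unchanged
```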
- Award ID(s): 1925043
- PAR ID: 10446626
- Date Published:
- Journal Name: Proceedings of the 5th Conference on Robot Learning (PMLR)
- Volume: 164
- Page Range / eLocation ID: 513-525
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
We study two approaches for predicting an appropriate pose for a robot to take part in group formations typical of social human conversations, subject to the physical layout of the surrounding environment. One method is model-based and explicitly encodes key geometric aspects of conversational formations. The other method is data-driven: it implicitly models key properties of spatial arrangements using graph neural networks and an adversarial training regimen. We evaluate the proposed approaches through quantitative metrics designed for this problem domain and via a human experiment. Our results suggest that the proposed methods are effective at reasoning about the environment layout and conversational group formations. They can also be used repeatedly to simulate conversational spatial arrangements despite being designed to output a single pose at a time. However, the methods showed different strengths. For example, the geometric approach was more successful at avoiding poses generated in non-free areas of the environment, but the data-driven method was better at capturing the variability of conversational spatial formations. We discuss ways to address open challenges for the pose generation problem and other interesting avenues for future work.
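The sketch below gives the flavor of a model-based (geometric) approach under simplifying assumptions: it places the robot on a circle around the group's centre, rejects candidates that fall in non-free space, and faces the result toward the group. The `propose_pose` function, its parameters, and the circle heuristic are hypothetical stand-ins, not the paper's actual model.

```python
import numpy as np

def propose_pose(member_positions, free_space_fn, radius=None, n_candidates=64):
    """Geometric sketch: place the robot on a circle around the group's centre
    (a rough stand-in for an F-formation's o-space), keep it in free space,
    and face it toward the centre. free_space_fn(x, y) -> bool encodes the layout."""
    members = np.asarray(member_positions, dtype=float)
    center = members.mean(axis=0)
    if radius is None:
        radius = float(np.linalg.norm(members - center, axis=1).mean())  # match group spread
    best = None
    for theta in np.linspace(0.0, 2.0 * np.pi, n_candidates, endpoint=False):
        candidate = center + radius * np.array([np.cos(theta), np.sin(theta)])
        if not free_space_fn(*candidate):
            continue  # reject poses that fall in non-free areas of the layout
        clearance = np.linalg.norm(members - candidate, axis=1).min()  # stay clear of people
        if best is None or clearance > best[0]:
            dx, dy = center - candidate
            best = (clearance, (candidate[0], candidate[1], np.arctan2(dy, dx)))
    return None if best is None else best[1]

# Toy usage: join two people talking in an open room.
pose = propose_pose([(0.0, 0.0), (1.2, 0.0)], free_space_fn=lambda x, y: True)
```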
As conversational agents and digital assistants become increasingly pervasive, understanding their synthetic speech becomes increasingly important. Simultaneously, speech synthesis is becoming more sophisticated and manipulable, providing the opportunity to optimize speech rate to save users time. However, little is known about people’s abilities to understand fast speech. In this work, we provide the first large-scale study on human listening rates. Run on LabintheWild, it used volunteer participants, was screen reader accessible, and measured listening rate by accuracy at answering questions spoken by a screen reader at various rates. Our results show that blind and low-vision people, who often rely on audio cues and access text aurally, generally have higher listening rates than sighted people. The findings also suggest a need to expand the range of rates available on personal devices. These results demonstrate the potential for users to learn to listen to faster rates, expanding the possibilities for human-conversational agent interaction.
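As a rough illustration of the accuracy-based measurement, the sketch below aggregates a listener's per-rate question-answering accuracy and reports the fastest rate that still meets a comprehension threshold. The `fastest_comprehensible_rate` helper, the 0.8 threshold, and the toy trials are assumptions, not the study's actual analysis.

```python
import numpy as np

def fastest_comprehensible_rate(trials, threshold=0.8):
    """Given (speech_rate_wpm, answered_correctly) trials for one listener,
    return the fastest rate at which mean accuracy still meets the threshold."""
    by_rate = {}
    for rate, correct in trials:
        by_rate.setdefault(rate, []).append(correct)
    passing = [rate for rate, answers in by_rate.items() if np.mean(answers) >= threshold]
    return max(passing) if passing else None

# Toy usage: one participant's question-answering accuracy at three speech rates.
trials = [(200, 1), (200, 1), (300, 1), (300, 0), (400, 0), (400, 0)]
print(fastest_comprehensible_rate(trials))  # -> 200 with this toy data
```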
When interacting with a robot, humans form conceptual models (of varying quality) which capture how the robot behaves. These conceptual models form just from watching or interacting with the robot, with or without conscious thought. Some methods select and present robot behaviors to improve human conceptual model formation; nonetheless, these methods and HRI more broadly have not yet consulted cognitive theories of human concept learning. These validated theories offer concrete design guidance to support humans in developing conceptual models more quickly, accurately, and flexibly. Specifically, Analogical Transfer Theory and the Variation Theory of Learning have been successfully deployed in other fields, and offer new insights for the HRI community about the selection and presentation of robot behaviors. Using these theories, we review and contextualize 35 prior works in human-robot teaching and learning, and we assess how these works incorporate or omit the design implications of these theories. From this review, we identify new opportunities for algorithms and interfaces to help humans more easily learn conceptual models of robot behaviors, which in turn can help humans become more effective robot teachers and collaborators.
Understanding human perceptions of robot performance is crucial for designing socially intelligent robots that can adapt to human expectations. Current approaches often rely on surveys, which can disrupt ongoing human–robot interactions. As an alternative, we explore predicting people’s perceptions of robot performance using non-verbal behavioral cues and machine learning techniques. We contribute the SEAN TOGETHER Dataset, consisting of observations of an interaction between a person and a mobile robot in Virtual Reality, together with perceptions of robot performance provided by users on a 5-point scale. We then analyze how well humans and supervised learning techniques can predict perceived robot performance based on different observation types (such as facial expression and spatial behavior features). Our results suggest that facial expressions alone provide useful information, but in the navigation scenarios that we considered, reasoning about spatial features in context is critical for the prediction task. Also, supervised learning techniques outperformed humans’ predictions in most cases. Further, when predicting robot performance as a binary classification task on unseen users’ data, the F1-Score of machine learning models more than doubled that of predictions on a 5-point scale. This suggests good generalization capabilities, particularly in identifying performance directionality over exact ratings. Based on these findings, we conducted a real-world demonstration where a mobile robot uses a machine learning model to predict how a human who follows it perceives it. Finally, we discuss the implications of our results for implementing these supervised learning models in real-world navigation. Our work paves the way toward automatically enhancing robot behavior based on observations of users and inferences about their perceptions of a robot.
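To illustrate the kind of supervised prediction described above, the sketch below trains a classifier on synthetic facial-expression and spatial features and evaluates binary perceived-performance prediction on held-out (unseen) users with the F1 score. The random-forest model, feature layout, and generated data are assumptions for illustration, not the SEAN TOGETHER setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-in data: per-observation facial-expression and spatial features, a
# binary "perceived performance was good" label, and a user id per row so the
# model is evaluated on users never seen during training.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))           # e.g., expression scores, distance/heading to robot
y = (X[:, 0] + X[:, 5] + rng.normal(scale=0.5, size=500) > 0).astype(int)
users = rng.integers(0, 25, size=500)

train_idx, test_idx = next(
    GroupShuffleSplit(test_size=0.3, random_state=0).split(X, y, groups=users))
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X[train_idx], y[train_idx])
print("F1 on unseen users:", f1_score(y[test_idx], clf.predict(X[test_idx])))
```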