Title: Surprise! Predicting Infant Visual Attention in a Socially Assistive Robot Contingent Learning Paradigm
Early intervention to address developmental disability in infants has the potential to promote improved outcomes in neurodevelopmental structure and function [1]. Researchers are starting to explore Socially Assistive Robotics (SAR) as a tool for delivering early interventions that are synergistic with and enhance human-administered therapy. For SAR to be effective, the robot must be able to consistently attract the infant's attention in order to engage the infant in a desired activity. This work presents an analysis of eye-gaze tracking data from five 6- to 8-month-old infants interacting with a Nao robot that kicked its leg as a contingent reward for infant leg movement. We evaluate a Bayesian model of low-level surprise, applied to video data from the infants' head-mounted camera and to the timing of robot behaviors, as a predictor of infant visual attention. The results demonstrate that over 67% of infant gaze locations were in areas the model evaluated to be more surprising than average. We also present an initial exploration of using surprise to predict the extent to which the robot attracts infant visual attention during specific intervals in the study. This work is the first to validate the surprise model on infants; our results indicate the potential for using surprise to inform robot behaviors that attract infant attention during SAR interactions.
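The abstract does not spell out the surprise computation; the sketch below is a minimal illustration of per-location Bayesian surprise in the spirit of such models, assuming a simplified per-pixel Gaussian intensity model. The function names, prior/observation variances, and single-channel features are assumptions made for this example, not the authors' implementation.

```python
import numpy as np

def gaussian_kl(mu1, var1, mu0, var0):
    """Elementwise KL divergence KL(N(mu1, var1) || N(mu0, var0))."""
    return 0.5 * (np.log(var0 / var1) + (var1 + (mu1 - mu0) ** 2) / var0 - 1.0)

def surprise_maps(frames, obs_var=0.01, prior_var=0.05, var_floor=1e-4):
    """Per-pixel Bayesian surprise over grayscale frames scaled to [0, 1].

    Each pixel's intensity is modeled as Gaussian with known variance obs_var
    and a Gaussian prior over its mean. For every new frame, the prior is
    updated to a posterior, and surprise is the KL divergence between the two,
    so locations whose data force a large belief update score as surprising.
    """
    mu = frames[0].astype(float)          # prior mean, seeded with the first frame
    var = np.full_like(mu, prior_var)     # prior variance
    maps = []
    for frame in frames[1:]:
        x = frame.astype(float)
        post_var = 1.0 / (1.0 / var + 1.0 / obs_var)          # posterior variance
        post_mu = post_var * (mu / var + x / obs_var)          # posterior mean
        maps.append(gaussian_kl(post_mu, post_var, mu, var))  # surprise (nats)
        mu, var = post_mu, np.maximum(post_var, var_floor)     # posterior becomes next prior
    return maps
```

A gaze location could then be scored by comparing the surprise value at that location against the frame's mean surprise, analogous to the more-surprising-than-average comparison reported above.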
Award ID(s): 1706964
NSF-PAR ID: 10354827
Journal Name: RO-MAN 2019: International Conference on Robot and Human Interactive Communication
Sponsoring Org: National Science Foundation
More Like this
  1. Agents must monitor their partners' affective states continuously in order to understand and engage in social interactions. However, methods for evaluating affect recognition do not account for changes in classification performance that may occur during occlusions or transitions between affective states. This paper addresses temporal patterns in affect classification performance in the context of an infant-robot interaction, where infants’ affective states contribute to their ability to participate in a therapeutic leg movement activity. To support robustness to facial occlusions in video recordings, we trained infant affect recognition classifiers using both facial and body features. Next, we conducted an in-depth analysis of our best-performing models to evaluate how performance changed over time as the models encountered missing data and changing infant affect. During time windows when features were extracted with high confidence, a unimodal model trained on facial features achieved the same optimal performance as multimodal models trained on both facial and body features. However, multimodal models outperformed unimodal models when evaluated on the entire dataset. Additionally, model performance was weakest when predicting an affective state transition and improved after multiple predictions of the same affective state. These findings emphasize the benefits of incorporating body features in continuous affect recognition for infants. Our work highlights the importance of evaluating variability in model performance both over time and in the presence of missing data when applying affect recognition to social interactions. 
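As a rough illustration of the unimodal-versus-multimodal comparison described above, the sketch below trains a face-only and a face-plus-body classifier on frames whose features were extracted with high confidence. The feature arrays, confidence threshold, and choice of a random forest are hypothetical placeholders, not the study's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def compare_modalities(face_feats, body_feats, conf, labels, conf_threshold=0.8):
    """Compare face-only vs. face+body affect classifiers on high-confidence frames.

    face_feats: (N, Df) per-frame facial features
    body_feats: (N, Db) per-frame body-pose features
    conf:       (N,) feature-extraction confidence per frame
    labels:     (N,) affective-state labels
    """
    keep = conf >= conf_threshold                  # drop frames with unreliable features
    y = labels[keep]
    face_only = face_feats[keep]
    face_body = np.hstack([face_feats[keep], body_feats[keep]])

    make_clf = lambda: RandomForestClassifier(n_estimators=200, random_state=0)
    face_score = cross_val_score(make_clf(), face_only, y, cv=5).mean()
    multi_score = cross_val_score(make_clf(), face_body, y, cv=5).mean()
    return face_score, multi_score
```

On the full recording, where faces are sometimes occluded, a multimodal model would additionally need a strategy for missing facial features (e.g., imputation or a body-only fallback), which is where the abstract reports the multimodal advantage.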
  2. Observing how infants and mothers coordinate their behaviors can highlight meaningful patterns in early communication and infant development. While dyads often differ in the modalities they use to communicate, especially in the first year of life, it remains unclear how to capture coordination across multiple types of behaviors using existing computational models of interpersonal synchrony. This paper explores Dynamic Mode Decomposition with control (DMDc) as a method of integrating multiple signals from each communicating partner into a model of multimodal behavioral coordination. We used an existing video dataset to track the head pose, arm pose, and vocal fundamental frequency of infants and mothers during the Face-to-Face Still-Face (FFSF) procedure, a validated 3-stage interaction paradigm. For each recorded interaction, we fit both unimodal and multimodal DMDc models to the extracted pose data. The resulting dynamic characteristics of the models were analyzed to evaluate trends in individual behaviors and dyadic processes across infant age and stages of the interactions. Results demonstrate that observed trends in interaction dynamics across stages of the FFSF protocol were stronger and more significant when models incorporated both head and arm pose data, rather than a single behavior modality. Model output showed significant trends across age, identifying changes in infant movement and in the relationship between infant and mother behaviors. Models that included mothers’ audio data demonstrated similar results to those evaluated with pose data, confirming that DMDc can leverage different sets of behavioral signals from each interacting partner. Taken together, our results demonstrate the potential of DMDc toward integrating multiple behavioral signals into the measurement of multimodal interpersonal coordination. 
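For readers unfamiliar with DMDc, the sketch below shows the core least-squares fitting step for a model of the form x_{t+1} ≈ A x_t + B u_t, where x could stack one partner's behavioral features and u the other partner's signals. This is the standard, untruncated DMDc formulation; the paper's exact preprocessing, feature choices, and any SVD truncation are not shown, and the variable names are illustrative.

```python
import numpy as np

def dmdc_fit(states, controls):
    """Fit x_{t+1} ≈ A x_t + B u_t by least squares (DMD with control).

    states:   (n, m) array, one state snapshot per column
              (e.g., an infant's head/arm pose features over time)
    controls: (k, m) array, the partner's signals treated as control input
    Returns the state matrix A (n x n) and input matrix B (n x k).
    """
    X, X_next = states[:, :-1], states[:, 1:]
    U = controls[:, :-1]
    omega = np.vstack([X, U])              # stacked state and input snapshots
    G = X_next @ np.linalg.pinv(omega)     # [A  B] via the pseudoinverse
    n = states.shape[0]
    return G[:, :n], G[:, n:]

# The eigenvalues of A summarize the modeled dynamics and can be compared
# across interaction stages or infant ages, e.g.:
# A, B = dmdc_fit(infant_pose, mother_pose)
# eigvals = np.linalg.eigvals(A)
```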
  3. Most research on early language learning focuses on the objects that infants see and the words they hear in their daily lives, although growing evidence suggests that motor development is also closely tied to language development. To study the real‐time behaviors required for learning new words during free‐flowing toy play, we measured infants’ visual attention and manual actions on to‐be‐learned toys. Parents and 12‐to‐26‐month‐old infants wore wireless head‐mounted eye trackers, allowing them to move freely around a home‐like lab environment. After the play session, infants were tested on their knowledge of object‐label mappings. We found that how often parents named objects during play did not predict learning, but instead, it was infants’ attention during and around a labeling utterance that predicted whether an object‐label mapping was learned. More specifically, we found that infant visual attention alone did not predict word learning. Instead, coordinated, multimodal attention–when infants’ hands and eyes were attending to the same object–predicted word learning. Our results implicate a causal pathway through which infants’ bodily actions play a critical role in early word learning.
  4. We (Meltzoff et al., 2018) described how Oostenbroek et al.’s (2016) design likely dampened infant imitation. In their commentary, Oostenbroek et al. (2018) argue that our points are post hoc. It is important for readers to know that they are not. Our paper restated “best practices” described in published papers. Based on the literature, the design used by Oostenbroek et al. (2016) would be predicted to dampen infant imitation.

    First, Oostenbroek et al.’s (2016) test periods were too brief. The stimulus presentation for each type of gesture was too short to ensure that neonates saw the display, and the response measurement period did not allow neonates sufficient time to organize a motor response. Meltzoff and Moore (1983a, 1994) introduced experimental procedures specifically designed to address these issues (see also Simpson, Murray, Paukner, & Ferrari, 2014). Oostenbroek et al. did not capitalize on these procedural advances.

    Second, Oostenbroek et al. allowed uncontrolled experimenter–infant interactions during the test session itself. Previous papers on imitation provided analyses of how uncontrolled interactions with the experimenter can introduce “noise” in experiments of facial imitation (Meltzoff & Moore, 1983b, 1994).

    Third, Oostenbroek et al. used suboptimal eliciting conditions. Neonates cannot support their own heads; in Oostenbroek et al., infants’ heads were allowed to flop from side to side unsupported on the experimenter’s lap while the experimenter gestured with both hands. In addition, prior papers have listed techniques for maximizing visual attention, such as controlled lighting and a homogeneous background (Meltzoff & Moore, 1989, 1994). Oostenbroek et al. tested infants on a couch in the home.

    Despite a design that would blunt imitation, our reanalysis of Oostenbroek et al.’s data showed a response pattern that is consistent with the imitation of tongue protrusion (TP). In their commentary, Oostenbroek et al. (2018) now propose limiting analyses to a subset of their original controls. We reanalyzed their data accordingly. Again, the results support early imitation. Their cross-sectional data (Oostenbroek et al., 2016, Table S4) collapsed across age show significantly more infant TP in response to the TP demonstration than to the mean of the six dynamic face controls (mouth, happy, sad, mmm, ee, and click): t(104) = 4.62, p = 0.00001. The results are also significant using a narrower subset of stimuli (mouth, happy, and sad): t(104) = 3.20, p = 0.0018. These results rule out arousal, because the adult TP demonstration was significantly more effective in eliciting infant tongue protrusions than the category of dynamic face controls. Tongue protrusion matching is a robust phenomenon successfully elicited in more than two dozen studies (reviews: Meltzoff & Moore, 1997; Nagy, Pilling, Orvos, & Molnar, 2013; Simpson et al., 2014).

    There are more general lessons to be drawn. Psychology is experiencing what some call a “replication crisis.” Those who attempt to reproduce effects have scientific responsibilities, as do original authors. Both can help psychology become a more cumulative science. It is crucial for investigators to label whether or not a study is a direct replication attempt. If it is not a direct replication, procedural alterations and associated limitations should be discussed. It sows confusion to use procedures that are already predicted to dampen effects without alerting readers. Psychology will be advanced by more stringent standards for reporting and evaluating studies aimed at reproducing published effects. Infant imitation is a fundamental skill prior to language and contributes to the development of social cognition. On this both Oostenbroek et al. and we agree.
  5. Parental responsiveness to infant behaviors is a strong predictor of infants’ language and cognitive outcomes. The mechanisms underlying this effect, however, are relatively unknown. We examined the effects of parent speech on infants’ visual attention, manual actions, hand‐eye coordination, and dyadic joint attention during parent‐infant free play. We report on two studies that used head‐mounted eye trackers in increasingly naturalistic laboratory environments. In Study 1, 12‐to‐24‐month‐old infants and their parents played on the floor of a seminaturalistic environment with 24 toys. In Study 2, a different sample of dyads played in a home‐like laboratory with 10 toys and no restrictions on their movement. In both studies, we present evidence that responsive parent speech extends the duration of infants’ multimodal attention. This social “boost” of parent speech impacts multiple behaviors that have been linked to later outcomes: visual attention, manual actions, hand‐eye coordination, and joint attention. Further, the amount that parents talked during the interaction was negatively related to the effects of parent speech on infant attention. Together, these results provide evidence of a trade‐off between quantity of speech and its effects, suggesting multiple pathways through which parents impact infants’ multimodal attention to shape the moment‐by‐moment dynamics of an interaction.