skip to main content

Title: Baby Intuitions Benchmark (BIB): Discerning the goals, preferences, and actions of others.
To achieve human-like common sense about everyday life, machine learning systems must understand and reason about the goals, preferences, and actions of other agents in the environment. By the end of their first year of life, human infants intuitively achieve such common sense, and these cognitive achievements lay the foundation for humans' rich and complex understanding of the mental states of others. Can machines achieve generalizable, commonsense reasoning about other agents like human infants? The Baby Intuitions Benchmark (BIB) challenges machines to predict the plausibility of an agent's behavior based on the underlying causes of its actions. Because BIB's content and paradigm are adopted from developmental cognitive science, BIB allows for direct comparison between human and machine performance. Nevertheless, recently proposed, deep-learning-based agency reasoning models fail to show infant-like reasoning, leaving BIB an open challenge.  more » « less
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
Advances in neural information processing systems
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Highlights

    In the present experiments, 3‐month‐old prereaching infants learned to attribute either object goals or place goals to other people's reaching actions.

    Prereaching infants view agents’ actions as goal‐directed, but do not expect these acts to be directed to specific objects, rather than to specific places.

    Prereaching infants are open‐minded about the specific goal states that reaching actions aim to achieve.

    more » « less
  2. One hallmark of human reasoning is that we can bring to bear a diverse web of common-sense knowledge in any situation. The vastness of our knowledge poses a challenge for the practical implementation of reasoning systems as well as for our cognitive theories – how do people represent their common-sense knowledge? On the one hand, our best models of sophisticated reasoning are top-down, making use primarily of symbolically-encoded knowledge. On the other, much of our understanding of the statistical properties of our environment may arise in a bottom-up fashion, for example through asso- ciationist learning mechanisms. Indeed, recent advances in AI have enabled the development of billion-parameter language models that can scour for patterns in gigabytes of text from the web, picking up a surprising amount of common-sense knowledge along the way—but they fail to learn the structure of coherent reasoning. We propose combining these approaches, by embedding language-model-backed primitives into a state- of-the-art probabilistic programming language (PPL). On two open-ended reasoning tasks, we show that our PPL models with neural knowledge components characterize the distribution of human responses more accurately than the neural language models alone, raising interesting questions about how people might use language as an interface to common-sense knowledge, and suggesting that building probabilistic models with neural language-model components may be a promising approach for more human-like AI. 
    more » « less
  3. null (Ed.)
    We review recent theoretical and empirical work on the emergence of relational reasoning, drawing connections among the fields of comparative psychology, developmental psychology, cognitive neuroscience, cognitive science, and machine learning. Relational learning appears to involve multiple systems: a suite of Early Systems that are available to human infants and are shared to some extent with nonhuman animals; and a Late System that emerges in humans only, at approximately age three years. The Late System supports reasoning with explicit role-governed relations, and is closely tied to the functions of a frontoparietal network in the human brain. Recent work in cognitive science and machine learning suggests that humans (and perhaps machines) may acquire abstract relations from nonrelational inputs by means of processes that enable re-representation. 
    more » « less
  4. null (Ed.)
    People who design, use, and are affected by autonomous artificially intelligent agents want to be able to trust such agents—that is, to know that these agents will perform correctly, to understand the reasoning behind their actions, and to know how to use them appropriately. Many techniques have been devised to assess and influence human trust in artificially intelligent agents. However, these approaches are typically ad hoc and have not been formally related to each other or to formal trust models. This article presents a survey of algorithmic assurances , i.e., programmed components of agent operation that are expressly designed to calibrate user trust in artificially intelligent agents. Algorithmic assurances are first formally defined and classified from the perspective of formally modeled human-artificially intelligent agent trust relationships. Building on these definitions, a synthesis of research across communities such as machine learning, human-computer interaction, robotics, e-commerce, and others reveals that assurance algorithms naturally fall along a spectrum in terms of their impact on an agent’s core functionality, with seven notable classes ranging from integral assurances (which impact an agent’s core functionality) to supplemental assurances (which have no direct effect on agent performance). Common approaches within each of these classes are identified and discussed; benefits and drawbacks of different approaches are also investigated. 
    more » « less
  5. Abstract

    Advances in artificial intelligence have raised a basic question about human intelligence: Is human reasoning best emulated by applying task‐specific knowledge acquired from a wealth of prior experience, or is it based on the domain‐general manipulation and comparison of mental representations? We address this question for the case of visual analogical reasoning. Using realistic images of familiar three‐dimensional objects (cars and their parts), we systematically manipulated viewpoints, part relations, and entity properties in visual analogy problems. We compared human performance to that of two recent deep learning models (Siamese Network and Relation Network) that were directly trained to solve these problems and to apply their task‐specific knowledge to analogical reasoning. We also developed a new model using part‐based comparison (PCM) by applying a domain‐general mapping procedure to learned representations of cars and their component parts. Across four‐term analogies (Experiment 1) and open‐ended analogies (Experiment 2), the domain‐general PCM model, but not the task‐specific deep learning models, generated performance similar in key aspects to that of human reasoners. These findings provide evidence that human‐like analogical reasoning is unlikely to be achieved by applying deep learning with big data to a specific type of analogy problem. Rather, humans do (and machines might) achieve analogical reasoning by learning representations that encode structural information useful for multiple tasks, coupled with efficient computation of relational similarity.

    more » « less