How do people perform general-purpose physical reasoning across the variety of scenarios in everyday life? Across two studies with seven different physical scenarios, we asked participants to predict whether or where two objects will make contact. People achieved high accuracy and were highly consistent with each other in their predictions. We hypothesize that this robust generalization is a consequence of mental simulations of noisy physics. We designed an “intuitive physics engine” model to capture this generalizable simulation. We found that this model generalized in human-like ways to unseen stimuli and to a different prediction query. We evaluated several state-of-the-art deep learning and scene feature models on the same task and found that they could not explain human predictions as well. This study provides evidence that humans’ robust generalization in physical prediction is supported by a probabilistic simulation model, and suggests the need for structure in learned dynamics models.
Tangled Physics: Knots Strain Intuitive Physical Reasoning
Abstract Whereas decades of research have cataloged striking errors in physical reasoning, a resurgence of interest in intuitive physics has revealed humans’ remarkable ability to successfully predict the unfolding of physical scenes. A leading interpretation intended to resolve these opposing results is that physical reasoning recruits a general-purpose mechanism that reliably models physical scenarios (explaining recent successes), but overly contrived tasks or impoverished and ecologically invalid stimuli can produce poor performance (accounting for earlier failures). But might there be tasks that persistently strain physical understanding, even in naturalistic contexts? Here, we explore this question by introducing a new intuitive physics task: evaluating the strength of knots and tangles. Knots are ubiquitous across cultures and time periods, and evaluating them correctly often spells the difference between safety and peril. Despite this, five experiments show that observers fail to discern even very large differences in strength between knots. In a series of two-alternative forced-choice tasks, observers viewed a variety of simple “bends” (knots joining two pieces of thread) and decided which would require more force to undo. Though the strength of these knots is well documented, observers’ judgments completely failed to reflect these distinctions, across naturalistic photographs (E1), idealized renderings (E2), dynamic videos (E3), and even when accompanied by schematic diagrams of the knots’ structures (E4). Moreover, these failures persisted despite accurate identification of the topological differences between the knots (E5); in other words, even when observers correctly perceived the underlying structure of the knot, they failed to correctly judge its strength. These results expose a blind spot in physical reasoning, placing new constraints on general-purpose theories of scene understanding.
- Award ID(s): 2021053
- PAR ID: 10559099
- Publisher / Repository: Open Mind
- Date Published:
- Journal Name: Open Mind
- Volume: 8
- ISSN: 2470-2986
- Page Range / eLocation ID: 1170 to 1190
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Large-scale, pre-trained language models (LMs) have achieved human-level performance on a breadth of language understanding tasks. However, evaluations based only on end-task performance shed little light on machines’ true ability in language understanding and reasoning. In this paper, we highlight the importance of evaluating the underlying reasoning process in addition to end performance. Toward this goal, we introduce Tiered Reasoning for Intuitive Physics (TRIP), a novel commonsense reasoning dataset with dense annotations that enable multi-tiered evaluation of machines’ reasoning process. Our empirical results show that while large LMs can achieve high end performance, they struggle to back their predictions with valid supporting evidence. The TRIP dataset and our baseline results will motivate verifiable evaluation of commonsense reasoning and facilitate future research toward developing better language understanding and reasoning models.
-
Despite increasing emphasis in the United States on promoting student engagement and achievement in science, technology, engineering, and mathematics (STEM) fields, the origins of scientific literacy remain poorly understood. We begin to address this limitation by considering the potential contributions of two distinct domain-general skills to early scientific literacy. Given their relevance to making predictions and evaluating evidence, we consider the degree to which causal reasoning skills relate to scientific literacy (as measured by an adaptive standardized test specifically designed for preschoolers). We also consider executive function (EF) as a potentially more fundamental contributor. While previous research has demonstrated that EF is predictive of achievement in other core academic domains like reading and math, its relationship to scientific literacy, particularly in early childhood, has received little attention. To examine how causal reasoning and EF together potentially relate to the development of scientific literacy in young children, we recruited 125 3-year-olds to complete three causal reasoning tasks, three EF tasks, and the aforementioned measure of scientific literacy. Results from a series of hierarchical regressions revealed that EF, and one measure of causal reasoning (causal inferencing) were related to scientific literacy, even after controlling for age, ethnicity, maternal education, and vocabulary knowledge. Moreover, causal inferencing ability was a significant partial mediator between EF and scientific literacy. Although additional research will be required to further specify the nature of these relationships, the current work suggests that EF has the potential to support scientific literacy, perhaps in part, by scaffolding causal reasoning skills.
-
The Brønsted–Lowry acid–base model is fundamental when discussing acid and base strength in organic chemistry, since many reactions involve a competing proton-transfer reaction. This model requires evaluating chemical stability via a consideration of electronic granularity. The purpose of this study is to identify students’ mental models of acid and base strength in terms of granularity and stability. Fourteen students enrolled in organic chemistry participated in this case study. Data were collected through semi-structured interviews including total case comparison tasks on stability, acidity, and basicity. Analysis of the data revealed four groups of students differentiated by their reasoning: (1) acid and base strength through structure without association to stability, (2) acid and base strength through electronics without association to stability, (3) acid strength associated with electronically centered stability, and (4) acid and base strength associated with electronically centered stability. This characterization can support teaching and research to promote reasoning that leads to a more consistent mental model across acid and base strength.
-
Over the course of introductory calculus-based physics, students are often expected to build conceptual understanding and to develop and refine skills in problem solving and qualitative inferential reasoning. Many of the research-based materials developed over the past 30 years by the physics education research community use sequences of scaffolded questions to step students through a qualitative inferential reasoning chain. It is often tacitly assumed that, in addition to building conceptual understanding, such materials improve qualitative reasoning skills. However, clear documentation of the impact of such materials on qualitative reasoning skills is critical. New methodologies are needed to better study reasoning processes and to disentangle, to the extent possible, processes related to physics content from processes general to all human reasoning. We have therefore employed network analysis methodologies to examine student responses to reasoning-related tasks in order to gain deeper insight into the nature of student reasoning in physics. In this paper, we show that network analysis metrics are both interpretable and valuable when applied to student reasoning data generated from . We also demonstrate that documentation of improvements in the articulation of specific lines of reasoning can be obtained from a network analysis of responses to reasoning chain construction tasks. Published by the American Physical Society 2024

