skip to main content


This content will become publicly available on July 24, 2025

Title: Probabilistic simulation supports generalizable intuitive physics
How do people perform general-purpose physical reasoning across a variety of scenarios in everyday life? Across two stud ies with seven different physical scenarios, we asked participants to predict whether or where two objects will make contact. People achieved high accuracy and were highly consistent with each other in their predictions. We hypothesize that this robust generalization is a consequence of mental simulations of noisy physics. We designed an “intuitive physics engine” model to capture this generalizable simulation. We find that this model generalized in human-like ways to unseen stimuli and to a different query of predictions. We evaluated several state-of-the-art deep learning and scene feature models on the same task and found that they could not explain human predictions as well. This study provides evidence that human’s robust generalization in physics predictions are supported by a probabilistic simulation model, and suggests the need for structure in learned dynamics models.  more » « less
Award ID(s):
2121009
PAR ID:
10542746
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
Cognitive Science Society
Date Published:
Format(s):
Medium: X
Location:
https://escholarship.org/content/qt93j3f86q/qt93j3f86q_noSplash_cc592224fe0e74bb9c5a779349e62558.pdf
Sponsoring Org:
National Science Foundation
More Like this
  1. The social media have been increasingly used for disaster management (DM) via providing real time data on a broad scale. For example, some smartphone applications (e.g. Disaster Alert and Federal Emergency Management Agency (FEMA) App) can be used to increase the efficiency of prepositioning supplies and to enhance the effectiveness of disaster relief efforts. To maximize utilities of these apps, it is imperative to have robust human behavior models in social networks with detailed expressions of individual decision-making processes and of the interactions among people. In this paper, we introduce a hierarchical human behavior model by associating extended Decision Field Theory (e-DFT) with the opinion formation and innovation diffusion models. Particularly, its expressiveness and validity are addressed in three ways. First, we estimate individual’s choice patterns in social networks by deriving people’s asymptotic choice probabilities within e-DFT. Second, by analyzing opinion formation models and innovation diffusion models in different types of social networks, the effects of neighbor’s opinions on people and their interactions are demonstrated. Finally, an agent-based simulation is used to trace agents’ dynamic behaviors in different scenarios. The simulated results reveal that the proposed models can be used to establish better disaster management strategies in natural disasters. 
    more » « less
  2. Abstract

    Streamflow prediction is a long‐standing hydrologic problem. Development of models for streamflow prediction often requires incorporation of catchment physical descriptors to characterize the associated complex hydrological processes. Across different scales of catchments, these physical descriptors also allow models to extrapolate hydrologic information from one catchment to others, a process referred to as “regionalization”. Recently, in gauged basin scenarios, deep learning models have been shown to achieve state of the art regionalization performance by building a global hydrologic model. These models predict streamflow given catchment physical descriptors and weather forcing data. However, these physical descriptors are by their nature uncertain, sometimes incomplete, or even unavailable in certain cases, which limits the applicability of this approach. In this paper, we show that by assigning a vector of random values as a surrogate for catchment physical descriptors, we can achieve robust regionalization performance under a gauged prediction scenario. Our results show that the deep learning model using our proposed random vector approach achieves a predictive performance comparable to that of the model using actual physical descriptors. The random vector approach yields robust performance under different data sparsity scenarios and deep learning model selections. Furthermore, based on the use of random vectors, high‐dimensional characterization improves regionalization performance in gauged basin scenario when physical descriptors are uncertain, or insufficient.

     
    more » « less
  3. While current vision algorithms excel at many challenging tasks, it is unclear how well they understand the physical dynamics of real-world environments. Here we introduce Physion, a dataset and benchmark for rigorously evaluating the ability to predict how physical scenarios will evolve over time. Our dataset features realistic simulations of a wide range of physical phenomena, including rigid and soft-body collisions, stable multi-object configurations, rolling, sliding, and projectile motion, thus providing a more comprehensive challenge than previous benchmarks. We used Physion to benchmark a suite of models varying in their architecture, learning objective, input-output structure, and training data. In parallel, we obtained precise measurements of human prediction behavior on the same set of scenarios, allowing us to directly evaluate how well any model could approximate human behavior. We found that vision algorithms that learn object-centric representations generally outperform those that do not, yet still fall far short of human performance. On the other hand, graph neural networks with direct access to physical state information both perform substantially better and make predictions that are more similar to those made by humans. These results suggest that extracting physical representations of scenes is the main bottleneck to achieving human-level and human-like physical understanding in vision algorithms. We have publicly released all data and code to facilitate the use of Physion to benchmark additional models in a fully reproducible manner, enabling systematic evaluation of progress towards vision algorithms that understand physical environments as robustly as people do. 
    more » « less
  4. Learners that are exposed to the same training data might generalize differently due to differing inductive biases. In neural network models, inductive biases could in theory arise from any aspect of the model architecture. We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection. For both tasks, the training set is consistent with a generalization based on hierarchical structure and a generalization based on linear order. All architectural factors that we investigated qualitatively affected how models generalized, including factors with no clear connection to hierarchical structure. For example, LSTMs and GRUs displayed qualitatively different inductive biases. However, the only factor that consistently contributed a hierarchical bias across tasks was the use of a tree-structured model rather than a model with sequential recurrence, suggesting that human-like syntactic generalization requires architectural syntactic structure. 
    more » « less
  5. Query driven cardinality estimation models learn from a historical log of queries. They are lightweight, having low storage requirements, fast inference and training, and are easily adaptable for any kind of query. Unfortunately, such models can suffer unpredictably bad performance under workload drift, i.e., if the query pattern or data changes. This makes them unreliable and hard to deploy. We analyze the reasons why models become unpredictable due to workload drift, and introduce modifications to the query representation and neural network training techniques to make query-driven models robust to the effects of workload drift. First, we emulate workload drift in queries involving some unseen tables or columns by randomly masking out some table or column features during training. This forces the model to make predictions with missing query information, relying more on robust features based on up-to-date DBMS statistics that are useful even when query or data drift happens. Second, we introduce join bitmaps, which extends sampling-based features to be consistent across joins using ideas from sideways information passing. Finally, we show how both of these ideas can be adapted to handle data updates.

    We show significantly greater generalization than past works across different workloads and databases. For instance, a model trained with our techniques on a simple workload (JOBLight-train), with 40ksynthetically generated queries of at most 3 tables each, is able to generalize to the much more complex Join Order Benchmark, which include queries with up to 16 tables, and improve query runtimes by 2× over PostgreSQL. We show similar robustness results with data updates, and across other workloads. We discuss the situations where we expect, and see, improvements, as well as more challenging workload drift scenarios where these techniques do not improve much over PostgreSQL. However, even in the most challenging scenarios, our models never perform worse than PostgreSQL, while standard query driven models can get much worse than PostgreSQL.

     
    more » « less