

Title: Synthesizing theories of human language with Bayesian program induction
Abstract: Automated, data-driven construction and evaluation of scientific models and theories is a long-standing challenge in artificial intelligence. We present a framework for algorithmically synthesizing models of a basic part of human language: morpho-phonology, the system that builds word forms from sounds. We integrate Bayesian inference with program synthesis and representations inspired by linguistic theory and cognitive models of learning and discovery. Across 70 datasets from 58 diverse languages, our system synthesizes human-interpretable models for core aspects of each language’s morpho-phonology, sometimes approaching models posited by human linguists. Joint inference across all 70 datasets automatically synthesizes a meta-model encoding interpretable cross-language typological tendencies. Finally, the same algorithm captures few-shot learning dynamics, acquiring new morphophonological rules from just one or a few examples. These results suggest routes to more powerful machine-enabled discovery of interpretable models in linguistics and other scientific domains.
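The abstract says Bayesian inference is used to choose among synthesized programs. This record does not describe the authors' system in detail, so the following is only a minimal sketch of the underlying trade-off, not their method. It scores three hand-written candidate rules against hypothetical toy data patterned on word-final obstruent devoicing. The log prior penalizes longer rules, and the log likelihood rewards rules that reproduce the observed surface forms. The rules, their description lengths in bits, and the noise rate `EPSILON` are all illustrative assumptions.

```python
import math

# Hypothetical toy data: (underlying form, observed surface form).
# The pattern mimics word-final obstruent devoicing; it is illustrative only.
DATA = [("hund", "hunt"), ("hunde", "hunde"), ("tag", "tak"), ("tage", "tage"), ("rat", "rat")]

DEVOICE = {"b": "p", "d": "t", "g": "k", "v": "f", "z": "s"}
EPSILON = 0.01  # assumed probability that an observed form is noise


def identity(form: str) -> str:
    """Null model: the surface form equals the underlying form."""
    return form


def final_devoicing(form: str) -> str:
    """Devoice a voiced obstruent only at the end of the word."""
    if form and form[-1] in DEVOICE:
        return form[:-1] + DEVOICE[form[-1]]
    return form


def devoice_everywhere(form: str) -> str:
    """Over-general model: devoice every voiced obstruent."""
    return "".join(DEVOICE.get(ch, ch) for ch in form)


# Candidate programs paired with assumed (hypothetical) description lengths in bits.
CANDIDATES = {
    "identity": (identity, 1),
    "final_devoicing": (final_devoicing, 10),
    "devoice_everywhere": (devoice_everywhere, 8),
}


def log_posterior(rule, length_bits, data) -> float:
    """Unnormalized log posterior: log prior + log likelihood."""
    log_prior = -length_bits * math.log(2)  # P(rule) proportional to 2^-length
    log_lik = 0.0
    for underlying, surface in data:
        hit = rule(underlying) == surface
        log_lik += math.log(1 - EPSILON) if hit else math.log(EPSILON)
    return log_prior + log_lik


scores = {name: log_posterior(fn, bits, DATA) for name, (fn, bits) in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(best, {name: round(s, 2) for name, s in scores.items()})
```

With these settings, `final_devoicing` costs more bits than `identity` but explains all five forms, so it receives the highest score. `devoice_everywhere` is shorter but mispredicts the suffixed forms.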
Award ID(s):
1918839
NSF-PAR ID:
10403562
Author(s) / Creator(s):
Date Published:
Journal Name:
Nature Communications
Volume:
13
Issue:
1
ISSN:
2041-1723
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Nutrients, such as nitrogen and phosphorus, provide vital support for human life, but overloading the Earth system with nutrients leads to environmental concerns, such as water and air pollution on local scales and climate change on the global scale. With an urgent need to feed the world's growing population and the growing concern over nutrient pollution and climate change, sustainable nutrient management has become a major challenge for this century. To address this challenge, the growing body of research on nutrient budgets, namely the nutrient inputs and outputs of a given system, has provided great opportunities for improving scientific knowledge of the complex nutrient cycles in coupled human and natural systems. This knowledge can help inform stakeholders, such as farmers, consumers, and policy makers, in their decisions related to nutrient management. This paper systematically reviews major challenges, as well as opportunities, in defining, quantifying, and applying nutrient budgets. Nutrient budgets have been defined for various systems with different research or application purposes, but the lack of consistency in the system definition and its budget terms has hindered intercomparison among studies and experience‐sharing among researchers and regions. Our review synthesizes existing nutrient budgets under a framework with five systems (i.e., Soil‐Plant system, Animal system, Animal‐Plant‐Soil system, Agro‐Food system, and Landscape system) and four spatial scales (i.e., Plot and Farm, Watershed, National, and Global scales). We define these systems and identify issues of nitrogen and phosphorus budgets within each. Few nutrient budgets have been well balanced at any scale, due to the large uncertainties in the quantification of several major budget terms. The type and level of challenges vary across spatial scales and also differ among nutrients.
    Improvement in nutrient budgets will rely not only on technological advances in scientific observations and models but also on better bookkeeping of human activity data. While some nutrient budget terms may need decades, or even centuries, of research to be quantified within desirable levels of uncertainty, it is imperative to communicate our understanding of nutrient budgets effectively to interested stakeholders so that scientists and a variety of stakeholders can work together to address the sustainable nutrient management challenge of this century.

  2. Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. This paradigm, while very effective, suffers from three key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world; second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS). By reasoning about the physical dimensions of state variables and dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results to demonstrate that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) enables tractable synthesis for robot action selection policies not possible with state-of-the-art synthesis techniques, 3) requires two orders of magnitude fewer training examples than a comparable NN representation, and 4) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
  3. Action selection policies (ASPs), used to compose low-level robot skills into complex high-level tasks, are commonly represented as neural networks (NNs) in the state of the art. This paradigm, while very effective, suffers from three key problems: 1) NNs are opaque to the user and hence not amenable to verification, 2) they require significant amounts of training data, and 3) they are hard to repair when the domain changes. We present two key insights about ASPs for robotics. First, ASPs need to reason about physically meaningful quantities derived from the state of the world; second, there exists a layered structure for composing these policies. Leveraging these insights, we introduce layered dimension-informed program synthesis (LDIPS). By reasoning about the physical dimensions of state variables and dimensional constraints on operators, LDIPS directly synthesizes ASPs in a human-interpretable domain-specific language that is amenable to program repair. We present empirical results to demonstrate that LDIPS 1) can synthesize effective ASPs for robot soccer and autonomous driving domains, 2) requires two orders of magnitude fewer training examples than a comparable NN representation, and 3) can repair the synthesized ASPs with only a small number of corrections when transferring from simulation to real robots.
  4. The roundworm Caenorhabditis elegans exhibits robust escape behavior in response to rapidly rising temperature. The behavior lasts for a few seconds, shows history dependence, involves both sensory and motor systems, and is too complicated to model mechanistically using currently available knowledge. Instead, we model the process phenomenologically, and we use the Sir Isaac dynamical inference platform to infer the model in a fully automated fashion directly from experimental data. The inferred model requires incorporation of an unobserved dynamical variable and is biologically interpretable. The model makes accurate predictions about the dynamics of the worm's behavior, and it can be used to characterize the functional logic of the dynamical system underlying the escape response. This work illustrates the power of modern artificial intelligence to aid in the discovery of accurate and interpretable models of complex natural systems.

  5. Abstract

    There is an opportunity for deep learning to revolutionize science and technology by revealing its findings in a human-interpretable manner. To do this, we develop a novel data-driven approach for creating a human–machine partnership to accelerate scientific discovery. By collecting physical system responses under excitations drawn from a Gaussian process, we train rational neural networks to learn Green’s functions of hidden linear partial differential equations. These functions reveal human-understandable properties and features, such as linear conservation laws and symmetries, along with shock and singularity locations, boundary effects, and dominant modes. We illustrate the technique on several examples and capture a range of physics, including advection–diffusion, viscous shocks, and Stokes flow in a lid-driven cavity.

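The fifth related abstract relies on a standard fact about linear PDEs with homogeneous boundary conditions: the response to a forcing f is u(x) = ∫ G(x, y) f(y) dy. The NumPy sketch below illustrates that relation and is not the authors' method. It makes three assumptions:

- It uses the 1D Poisson problem -u'' = f on [0, 1] with u(0) = u(1) = 0, whose Green's function is known in closed form.
- Each forcing is a vector of independent Gaussian nodal values rather than a Gaussian-process draw.
- It fits the operator on a fixed grid with plain least squares instead of rational neural networks.

It only shows that input-output pairs determine the discretized operator, and that a property such as the symmetry G(x, y) = G(y, x) can then be read off the recovered kernel.

```python
import numpy as np


def greens_function(x, y):
    """Green's function of -u'' = f on [0, 1] with u(0) = u(1) = 0."""
    return np.where(x <= y, x * (1.0 - y), y * (1.0 - x))


n = 101
grid = np.linspace(0.0, 1.0, n)
h = grid[1] - grid[0]

# Trapezoid-rule weights for the integral over y.
w = np.full(n, h)
w[0] = w[-1] = h / 2

X, Y = np.meshgrid(grid, grid, indexing="ij")
G_true = greens_function(X, Y)  # G_true[i, j] = G(x_i, y_j)
A = G_true * w                  # discrete solution operator: u = A @ f

# Sanity check: the constant forcing f = 1 should give u(x) = x(1 - x) / 2.
u_const = A @ np.ones(n)

# Simulated "experiments": random forcings (assumed i.i.d. Gaussian nodal values)
# and the responses the hidden operator produces.
rng = np.random.default_rng(0)
m = 200
F = rng.standard_normal((n, m))
U = A @ F

# Recover the operator from the data by least squares: solve F.T @ A_hat.T = U.T.
A_hat = np.linalg.lstsq(F.T, U.T, rcond=None)[0].T
G_hat = A_hat / w  # undo the quadrature weights column by column

print("max |G_hat - G_true|:", np.abs(G_hat - G_true).max())
print("max asymmetry of G_hat:", np.abs(G_hat - G_hat.T).max())
```

Because there are more forcings (m = 200) than grid points (n = 101), the least-squares system is overdetermined and recovers the discrete kernel to floating-point accuracy. A learned function approximator, such as the rational networks the abstract mentions, is what would let G be evaluated off the grid. That step is beyond this sketch.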