Title: Learning useful representations for shifting tasks and distributions
Does the dominant approach to learning representations (as a side effect of optimizing an expected cost for a single training distribution) remain a good approach when we are dealing with multiple distributions? Our thesis is that such scenarios are better served by representations that are richer than those obtained with a single optimization episode. We support this thesis with simple theoretical arguments and with experiments utilizing an apparently naïve ensembling technique: concatenating the representations obtained from multiple training episodes using the same data, model, algorithm, and hyper-parameters, but different random seeds. These independently trained networks perform similarly. Yet, in a number of scenarios involving new distributions, the concatenated representation performs substantially better than an equivalently sized network trained with a single training run. This proves that the representations constructed by multiple training episodes are in fact different. Although their concatenation carries little additional information about the training task under the training distribution, it becomes substantially more informative when tasks or distributions change. Meanwhile, a single training episode is unlikely to yield such a redundant representation because the optimization process has no reason to accumulate features that do not incrementally improve the training performance.
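To make the recipe concrete, here is a minimal sketch of the concatenation ensemble in PyTorch. The helpers `make_model` and `train_one_episode` are hypothetical stand-ins for whatever architecture and training loop one already uses; the point is that only the random seed varies across episodes.

```python
import torch

def concatenated_features(make_model, train_one_episode, data, seeds):
    """Train identical networks that differ only in their random seed,
    then expose the concatenation of their learned representations."""
    encoders = []
    for seed in seeds:
        torch.manual_seed(seed)         # the only thing that varies
        model = make_model()            # same architecture and hyper-parameters
        train_one_episode(model, data)  # same data and training algorithm
        encoders.append(model)          # keep each trained feature extractor

    def features(x):
        with torch.no_grad():
            return torch.cat([enc(x) for enc in encoders], dim=-1)
    return features
```

A linear probe fitted on `features(x)` under a new task or distribution can then be compared against an equivalently sized single-run network, which is the comparison the abstract describes.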
Award ID(s): 1922658
NSF-PAR ID: 10437758
Author(s) / Creator(s):
Date Published:
Journal Name: ICML 2023
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. INTRODUCTION
    Solving quantum many-body problems, such as finding ground states of quantum systems, has far-reaching consequences for physics, materials science, and chemistry. Classical computers have facilitated many profound advances in science and technology, but they often struggle to solve such problems. Scalable, fault-tolerant quantum computers will be able to solve a broad array of quantum problems but are unlikely to be available for years to come. Meanwhile, how can we best exploit our powerful classical computers to advance our understanding of complex quantum systems? Recently, classical machine learning (ML) techniques have been adapted to investigate problems in quantum many-body physics. So far, these approaches are mostly heuristic, reflecting the general paucity of rigorous theory in ML. Although they have been shown to be effective in some intermediate-size experiments, these methods are generally not backed by convincing theoretical arguments to ensure good performance.

    RATIONALE
    A central question is whether classical ML algorithms can provably outperform non-ML algorithms in challenging quantum many-body problems. We provide a concrete answer by devising and analyzing classical ML algorithms for predicting the properties of ground states of quantum systems. We prove that these ML algorithms can efficiently and accurately predict ground-state properties of gapped local Hamiltonians, after learning from data obtained by measuring other ground states in the same quantum phase of matter. Furthermore, under a widely accepted complexity-theoretic conjecture, we prove that no efficient classical algorithm that does not learn from data can achieve the same prediction guarantee. By generalizing from experimental data, ML algorithms can solve quantum many-body problems that could not be solved efficiently without access to experimental data.

    RESULTS
    We consider a family of gapped local quantum Hamiltonians, where the Hamiltonian H(x) depends smoothly on m parameters (denoted by x). The ML algorithm learns from a set of training data consisting of sampled values of x, each accompanied by a classical representation of the ground state of H(x). These training data could be obtained from either classical simulations or quantum experiments. During the prediction phase, the ML algorithm predicts a classical representation of ground states for Hamiltonians different from those in the training data; ground-state properties can then be estimated using the predicted classical representation. Specifically, our classical ML algorithm predicts expectation values of products of local observables in the ground state, with a small error when averaged over the value of x. The run time of the algorithm and the amount of training data required both scale polynomially in m and linearly in the size of the quantum system. Our proof of this result builds on recent developments in quantum information theory, computational learning theory, and condensed matter theory. Furthermore, under the widely accepted conjecture that nondeterministic polynomial-time (NP)-complete problems cannot be solved in randomized polynomial time, we prove that no polynomial-time classical algorithm that does not learn from data can match the prediction performance achieved by the ML algorithm. In a related contribution using similar proof techniques, we show that classical ML algorithms can efficiently learn how to classify quantum phases of matter. In this scenario, the training data consist of classical representations of quantum states, where each state carries a label indicating whether it belongs to phase A or phase B. The ML algorithm then predicts the phase label for quantum states that were not encountered during training. The classical ML algorithm not only classifies phases accurately but also constructs an explicit classifying function. Numerical experiments verify that our proposed ML algorithms work well in a variety of scenarios, including Rydberg atom systems, two-dimensional random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.

    CONCLUSION
    We have rigorously established that classical ML algorithms, informed by data collected in physical experiments, can effectively address some quantum many-body problems. These rigorous results boost our hopes that classical ML trained on experimental data can solve practical problems in chemistry and materials science that would be too hard to solve using classical processing alone. Our arguments build on the concept of a succinct classical representation of quantum states derived from randomized Pauli measurements. Although some quantum devices lack the local control needed to perform such measurements, we expect that other classical representations could be exploited by classical ML with similarly powerful results. How can we make use of accessible measurement data to predict properties reliably? Answering such questions will expand the reach of near-term quantum platforms.

    Figure caption: Classical algorithms for quantum many-body problems. Classical ML algorithms learn from training data obtained from either classical simulations or quantum experiments. The ML algorithm then produces a classical representation for the ground state of a physical system that was not encountered during training. Classical algorithms that do not learn from data may require substantially longer computation time to achieve the same task.
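As a loose illustration of the "learning from data" setting (not the paper's actual algorithm, which works from classical shadow representations obtained via randomized Pauli measurements and uses specially designed kernels), one can regress a ground-state property directly onto the Hamiltonian parameters x. The training labels below are fabricated placeholders standing in for measured expectation values.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
m = 5                                    # number of Hamiltonian parameters
X_train = rng.uniform(-1, 1, (200, m))   # sampled parameter vectors x
# Placeholder labels standing in for measured expectation values of a
# fixed local observable in the ground state of H(x).
y_train = np.cos(X_train @ rng.normal(size=m))

model = KernelRidge(kernel="rbf", alpha=1e-3, gamma=1.0)
model.fit(X_train, y_train)              # learn x -> ground-state property

X_new = rng.uniform(-1, 1, (10, m))      # Hamiltonians unseen in training
y_pred = model.predict(X_new)            # predicted ground-state properties
```

The paper's guarantee has this flavor: with training-set size and run time polynomial in m, the prediction error is small when averaged over x.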
  2. There is often a dilemma between ease of optimization and robust out-of-distribution (OoD) generalization. For instance, many OoD methods rely on penalty terms whose optimization is challenging: the penalties are either too strong to optimize reliably or too weak to achieve their goals. We propose to initialize the networks with a rich representation containing a palette of potentially useful features, ready to be used by even simple models. On the one hand, a rich representation provides a good initialization for the optimizer. On the other hand, it also provides an inductive bias that helps OoD generalization. Such a representation is constructed with the Rich Feature Construction (RFC) algorithm, also called the Bonsai algorithm, which consists of a succession of training episodes. During discovery episodes, we craft a multi-objective optimization criterion and its associated datasets in a manner that prevents the network from using the features constructed in the previous iterations. During synthesis episodes, we use knowledge distillation to force the network to simultaneously represent all the previously discovered features. Initializing the networks with Bonsai representations consistently helps six OoD methods achieve top performance on the ColoredMNIST benchmark. The same technique substantially outperforms comparable results on the Wilds Camelyon17 task, eliminates the high result variance that plagues other methods, and makes hyperparameter tuning and model selection more reliable.
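A schematic sketch of the two alternating phases follows; `make_network`, `train_discovery`, and `distill` are caller-supplied hypothetical helpers, since the concrete multi-objective criteria and dataset construction are specified in the paper itself.

```python
def bonsai(make_network, train_discovery, distill, data, n_rounds):
    """Sketch of Rich Feature Construction (Bonsai): alternating episodes.

    `make_network`, `train_discovery`, and `distill` are supplied by the
    caller; the paper defines the concrete criteria behind each of them."""
    discovered = []                       # feature extractors found so far
    student = make_network()
    for _ in range(n_rounds):
        # Discovery episode: a fresh network is trained under an objective
        # crafted so that features already in `discovered` cannot be reused.
        explorer = make_network()
        train_discovery(explorer, data, forbid=discovered)
        discovered.append(explorer)
        # Synthesis episode: knowledge distillation forces one network to
        # represent all previously discovered features simultaneously.
        student = make_network()
        distill(student, teachers=discovered, data=data)
    return student                        # rich initialization for OoD methods
```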
  3. Introduction
    The notion of a single localized store of word representations has become increasingly less plausible as evidence has accumulated for a widely distributed neural representation of wordform, grounded in motor, perceptual, and conceptual processes. Here, we attempt to combine machine learning methods and neurobiological frameworks to propose a computational model of the brain systems potentially responsible for wordform representation. We tested the hypothesis that the functional specialization of word representation in the brain is driven partly by computational optimization. This hypothesis directly addresses the distinct problems of mapping sound to articulation versus mapping sound to meaning.

    Results
    We found that artificial neural networks trained on the mapping between sound and articulation performed poorly in recognizing the mapping between sound and meaning, and vice versa. Moreover, a network trained on both tasks simultaneously failed to discover the features required for efficient mapping between sound and higher-level cognitive states, in contrast to the two single-task models. Furthermore, these networks developed internal representations reflecting specialized task-optimized functions without explicit training.

    Discussion
    Together, these findings demonstrate that different task-directed representations lead to more focused responses and better performance of a machine or algorithm and, hypothetically, the brain. We therefore suggest that the functional specialization of word representation mirrors a computational optimization strategy, given the nature of the tasks the human brain faces.
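A toy version of the cross-task comparison is sketched below; the synthetic tensors and dimensions are invented placeholders for the study's sound, articulation, and semantic representations.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, sound_dim, artic_dim, meaning_dim = 1000, 80, 24, 300
sounds = torch.randn(n, sound_dim)
artic = torch.randn(n, artic_dim)       # placeholder articulation targets
meaning = torch.randn(n, meaning_dim)   # placeholder semantic targets

def make_net(d_out):
    return nn.Sequential(nn.Linear(sound_dim, 256), nn.Tanh(),
                         nn.Linear(256, d_out))

def fit(net, x, y, steps=200):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(x), y)
        loss.backward()
        opt.step()
    return loss.item()

net_artic = make_net(artic_dim)         # trained on sound -> articulation
net_meaning = make_net(meaning_dim)     # trained on sound -> meaning
fit(net_artic, sounds, artic)
fit(net_meaning, sounds, meaning)

# Cross-task probe: freeze the articulation network's hidden layer and fit
# a linear readout for meaning; a large residual error relative to
# net_meaning mirrors the specialization finding described above.
hidden = net_artic[:-1](sounds).detach()
probe_err = fit(nn.Linear(256, meaning_dim), hidden, meaning)
```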
  4. Abstract

    Warm rain collision‐coalescence has been persistently difficult to parameterize in bulk microphysics schemes. We use a flexible bulk microphysics scheme with bin scheme process parameterizations, called AMP, to investigate reasons for the difficulty. AMP is configured in a variety of ways to mimic bulk schemes and is compared to simulations with the bin scheme upon which AMP is built. We find that an important limitation in traditional bulk schemes is the use of separate cloud and rain categories. When the drop size distribution is instead represented by a continuous distribution, the simulation of cloud‐to‐rain conversion is substantially improved. We also find large sensitivity to the threshold size to distinguish cloud and rain in traditional schemes; substantial improvement is found by decreasing the threshold from 40 to 25 μm. Neither the use of an assumed functional form for the size distribution nor the choice of predicted distribution moments has a large impact on the ability of AMP to simulate rain production. When predicting four total moments of the liquid drop size distribution, either with a traditional two‐category, two‐moment scheme with a reduced size threshold, or a four‐moment single‐category scheme, errors in the evolution of mass and the cloud size distribution are similar, but the single‐category scheme has a substantially better representation of the rain size distribution. Optimal moment combinations for the single‐category approach are investigated and appear to be linked more to the information content they provide for constraining the size distributions than to their correlation with collision‐coalescence rates.
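For intuition about the quantities these schemes evolve, here is a small sketch of size-distribution moments and the cloud/rain split at a threshold diameter; the gamma-shaped drop size distribution is an invented placeholder, not AMP's internal state.

```python
import numpy as np

D = np.linspace(1e-6, 3e-3, 20000)              # drop diameter [m]
dD = D[1] - D[0]
# Invented gamma-shaped number distribution n(D) [m^-3 per m of diameter].
n = 1e8 * (D / 2e-5) ** 2 * np.exp(-D / 2e-5)

def moment(k, mask=1.0):
    """k-th moment M_k = integral of n(D) * D^k dD over the masked sizes."""
    return np.sum(n * mask * D ** k) * dD

threshold = 25e-6                               # 25 micron cloud/rain boundary
cloud, rain = D < threshold, D >= threshold
total_number = moment(0)                        # total drop concentration
cloud_mass = moment(3, cloud)                   # proportional to cloud water
rain_mass = moment(3, rain)                     # proportional to rain water
```

A traditional two-category scheme evolves moments of the two masked pieces separately, whereas the single-category approach evolves several moments (e.g., four, as in the abstract) of one continuous distribution.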

     
  5. Abstract

    Conservation translocation projects must carefully balance multiple, potentially competing objectives (e.g. population viability, retention of genetic diversity, delivery of key ecological services) against conflicting stakeholder values and severe time and cost constraints. Advanced decision support tools would facilitate identifying practical solutions.

We examined how to achieve compromise across competing objectives in conservation translocations via an examination of giant tortoises in the Galapagos Islands with ancestry from the extinct Floreana Island species (Chelonoidis niger). Efforts have begun to populate Floreana Island with tortoises genetically similar to its historical inhabitants while balancing three potentially competing objectives – restoring ecosystem services (sustaining a high tortoise population size), maximizing genome representation of the extinct C. niger species and maintaining a genetically diverse population – under realistic cost constraints.

    We developed a novel approach to this conservation decision problem by coupling an individual‐based simulation model with generalized additive models and global optimization. We identified several incompatibilities among programme objectives, with quasi‐optimal single‐objective solutions (sets of management actions) differing substantially in programme duration, translocation age, incubation temperature (determinant of sex ratio) and the number of individuals directly translocated from the source population.

Quasi‐optimal single‐objective solutions were able to produce outcomes (i.e. population size and measures of genetic diversity and C. niger genome representation) to within 75% of their highest simulated outcomes (e.g. highest population size achieved across all simulations) within a cost constraint of c. $2m USD, but these solutions resulted in severe declines (up to 74% reduction) in outcomes for non‐focal objectives. However, when all programme objectives were equally weighted to produce a multi‐objective solution, all objectives were met to within 90% of the highest achievable mean values across all cost constraints.

    Synthesis and applications. Multi‐objective conservation translocations are likely to encounter complex trade‐offs and conflicts among programme objectives. Here, we developed a novel combination of modelling approaches to identify optimal management strategies. We found that solutions that simultaneously addressed multiple, competing objectives performed better than single‐objective solutions. Our model‐based decision support tool demonstrates that timely, cost‐effective solutions can be identified in cases where management objectives appear to be incompatible.
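A stripped-down sketch of the equal-weighting step follows; toy closed-form surrogates replace the paper's individual-based simulation and generalized additive models, and scipy's differential_evolution stands in for the global optimizer.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Toy surrogates for the three programme outcomes as functions of two
# management variables (invented stand-ins for, e.g., translocation age
# and programme duration); note the first two deliberately conflict.
def population_size(u):   return np.exp(-((u[0] - 0.2) / 0.3) ** 2)
def genome_capture(u):    return np.exp(-((u[0] - 0.9) / 0.3) ** 2)
def genetic_diversity(u): return np.exp(-((u[1] - 0.5) / 0.3) ** 2)

objectives = [population_size, genome_capture, genetic_diversity]
bounds = [(0.0, 1.0), (0.0, 1.0)]

# Best achievable value of each objective when optimized on its own.
single_best = [
    -differential_evolution(lambda u, f=f: -f(u), bounds, seed=0).fun
    for f in objectives
]

# Equal-weight score: mean fraction of each objective's single-objective
# optimum (the abstract reports compromise solutions within ~90% of these).
def multi_score(u):
    return float(np.mean([f(u) / b for f, b in zip(objectives, single_best)]))

best = differential_evolution(lambda u: -multi_score(u), bounds, seed=0)
print(best.x, multi_score(best.x))
```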

     