It is critical that machine learning (ML) model predictions be trustworthy for high-throughput catalyst discovery approaches. Uncertainty quantification (UQ) methods allow estimation of the trustworthiness of an ML model, but these methods have not been well explored in the field of heterogeneous catalysis. Herein, we investigate different UQ methods applied to a crystal graph convolutional neural network to predict adsorption energies of molecules on alloys from the Open Catalyst 2020 dataset, the largest existing heterogeneous catalyst dataset. We apply three UQ methods to the adsorption energy predictions, namely
Neural networks (NN) have become an important tool for prediction tasks—both regression and classification—in environmental science. Since many environmental-science problems involve life-or-death decisions and policy making, it is crucial to provide not only predictions but also an estimate of the uncertainty in the predictions. Until recently, very few tools were available to provide uncertainty quantification (UQ) for NN predictions. However, in recent years the computer-science field has developed numerous UQ approaches, and several research groups are exploring how to apply these approaches in environmental science. We provide an accessible introduction to six of these UQ approaches, then focus on tools for the next step, namely, to answer the question:
Neural networks are used for many environmental-science applications, some involving life-or-death decision-making. In recent years new methods have been developed to provide much-needed uncertainty estimates for NN predictions. We seek to accelerate the adoption of these methods in the environmental-science community with an accessible introduction to 1) methods for computing uncertainty estimates in NN predictions and 2) methods for evaluating such estimates.
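One of the UQ approaches commonly covered in such introductions, Monte Carlo dropout, can be sketched in a few lines: dropout is kept active at inference time, and the spread of repeated stochastic forward passes serves as the uncertainty estimate. The sketch below uses a tiny untrained numpy network; the weights, layer width, dropout rate, and sample counts are all illustrative choices, not details from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small fixed two-layer network (weights are arbitrary, for illustration only).
W1 = rng.normal(size=(1, 32))
W2 = rng.normal(size=(32, 1)) / np.sqrt(32)

def forward(x, drop_rate=0.2):
    """One stochastic forward pass with dropout kept ON at inference time."""
    h = np.maximum(0.0, x @ W1)              # ReLU hidden layer
    mask = rng.random(h.shape) > drop_rate   # random dropout mask
    h = h * mask / (1.0 - drop_rate)         # inverted-dropout scaling
    return h @ W2

def mc_dropout_predict(x, n_samples=200):
    """Monte Carlo dropout: repeat stochastic passes, summarize the spread."""
    draws = np.stack([forward(x) for _ in range(n_samples)])
    return draws.mean(axis=0), draws.std(axis=0)  # predictive mean, uncertainty

x = np.array([[0.5], [1.0], [2.0]])
mean, std = mc_dropout_predict(x)
print(mean.shape, std.shape)  # (3, 1) (3, 1)
```

With a trained network the same loop applies unchanged; only the weights differ.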
 Award ID(s):
 1934668
 NSFPAR ID:
 10405615
 Publisher / Repository:
 American Meteorological Society
 Date Published:
 Journal Name:
 Artificial Intelligence for the Earth Systems
 Volume:
 2
 Issue:
 2
 ISSN:
 2769-7525
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Abstract k-fold ensembling, Monte Carlo dropout, and evidential regression. The effectiveness of each UQ method is assessed based on accuracy, sharpness, dispersion, calibration, and tightness. Evidential regression is demonstrated to be a powerful approach for rapidly obtaining tunable, competitively trustworthy UQ estimates for heterogeneous catalysis applications when using neural networks. Recalibration of model uncertainties is shown to be essential in practical screening applications of catalysts using uncertainties.
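As a rough illustration of the ensembling idea, the sketch below trains k models on k-fold splits of synthetic 1-D data (ordinary polynomial fits standing in for the paper's graph neural network) and uses the spread across ensemble members as the uncertainty estimate. The data, model class, and fold count are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for adsorption-energy data: 1-D feature -> noisy target.
X = rng.uniform(-2, 2, size=200)
y = 1.5 * X - 0.3 * X**2 + rng.normal(scale=0.2, size=200)

def kfold_ensemble(X, y, k=5, degree=2):
    """Train one model per fold (each on the other k-1 folds); keep all k."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    models = []
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        models.append(np.polyfit(X[train], y[train], degree))
    return models

def predict_with_uq(models, x_new):
    """Ensemble mean is the prediction; std across members is the UQ estimate."""
    preds = np.stack([np.polyval(m, x_new) for m in models])
    return preds.mean(axis=0), preds.std(axis=0)

models = kfold_ensemble(X, y)
mean, std = predict_with_uq(models, np.array([0.0, 1.0, 3.0]))
# Extrapolation (x = 3.0, outside the training range) shows larger disagreement.
```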
Deep Learning (DL) methods have been transforming computer vision with innovative adaptations to other domains, including climate change. For DL to pervade Science and Engineering (S&E) applications where risk management is a core component, well-characterized uncertainty estimates must accompany predictions. However, S&E observations and model simulations often follow heavily skewed distributions and are not well modeled with DL approaches, since these usually optimize a Gaussian, or Euclidean, likelihood loss. Recent developments in Bayesian Deep Learning (BDL), which attempts to capture uncertainties both from noisy observations (aleatoric) and from unknown model parameters (epistemic), provide us a foundation. Here we present a discrete-continuous BDL model with Gaussian and lognormal likelihoods for uncertainty quantification (UQ). We demonstrate the approach by developing UQ estimates on "DeepSD", a super-resolution-based DL model for Statistical Downscaling (SD) in climate, applied to precipitation, which follows an extremely skewed distribution. We find that the discrete-continuous models outperform a basic Gaussian distribution in terms of predictive accuracy and uncertainty calibration. Furthermore, we find that the lognormal distribution, which can handle skewed distributions, produces quality uncertainty estimates at the extremes. Such results may be important across S&E, as well as in other domains such as finance and economics, where extremes are often of significant interest. Furthermore, to our knowledge, this is the first UQ model in SD where both aleatoric and epistemic uncertainties are characterized.
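The core of the likelihood comparison can be sketched directly: the negative log-likelihood functions below follow the standard Gaussian and lognormal density formulas, and the synthetic skewed sample stands in for precipitation data. The distribution parameters and sample size are illustrative choices, not values from the paper.

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood under N(mu, sigma^2)."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (y - mu) ** 2 / (2 * sigma**2)

def lognormal_nll(y, mu, sigma):
    """Negative log-likelihood under LogNormal(mu, sigma^2); requires y > 0."""
    return (np.log(y * sigma * np.sqrt(2 * np.pi))
            + (np.log(y) - mu) ** 2 / (2 * sigma**2))

# A heavily right-skewed sample, as with precipitation amounts.
rng = np.random.default_rng(1)
y = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# Fit each distribution's parameters by maximum likelihood.
g_mu, g_sigma = y.mean(), y.std()
ln_mu, ln_sigma = np.log(y).mean(), np.log(y).std()

g_score = gaussian_nll(y, g_mu, g_sigma).mean()
ln_score = lognormal_nll(y, ln_mu, ln_sigma).mean()
# The lognormal fit scores a lower (better) mean NLL on skewed data.
print(f"Gaussian: {g_score:.3f}  Lognormal: {ln_score:.3f}")
```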

INTRODUCTION Solving quantum many-body problems, such as finding ground states of quantum systems, has far-reaching consequences for physics, materials science, and chemistry. Classical computers have facilitated many profound advances in science and technology, but they often struggle to solve such problems. Scalable, fault-tolerant quantum computers will be able to solve a broad array of quantum problems but are unlikely to be available for years to come. Meanwhile, how can we best exploit our powerful classical computers to advance our understanding of complex quantum systems? Recently, classical machine learning (ML) techniques have been adapted to investigate problems in quantum many-body physics. So far, these approaches are mostly heuristic, reflecting the general paucity of rigorous theory in ML. Although they have been shown to be effective in some intermediate-size experiments, these methods are generally not backed by convincing theoretical arguments to ensure good performance. RATIONALE A central question is whether classical ML algorithms can provably outperform non-ML algorithms in challenging quantum many-body problems. We provide a concrete answer by devising and analyzing classical ML algorithms for predicting the properties of ground states of quantum systems. We prove that these ML algorithms can efficiently and accurately predict ground-state properties of gapped local Hamiltonians, after learning from data obtained by measuring other ground states in the same quantum phase of matter. Furthermore, under a widely accepted complexity-theoretic conjecture, we prove that no efficient classical algorithm that does not learn from data can achieve the same prediction guarantee. By generalizing from experimental data, ML algorithms can solve quantum many-body problems that could not be solved efficiently without access to experimental data.
RESULTS We consider a family of gapped local quantum Hamiltonians, where the Hamiltonian H(x) depends smoothly on m parameters (denoted by x). The ML algorithm learns from a set of training data consisting of sampled values of x, each accompanied by a classical representation of the ground state of H(x). These training data could be obtained from either classical simulations or quantum experiments. During the prediction phase, the ML algorithm predicts a classical representation of ground states for Hamiltonians different from those in the training data; ground-state properties can then be estimated using the predicted classical representation. Specifically, our classical ML algorithm predicts expectation values of products of local observables in the ground state, with a small error when averaged over the value of x. The run time of the algorithm and the amount of training data required both scale polynomially in m and linearly in the size of the quantum system. Our proof of this result builds on recent developments in quantum information theory, computational learning theory, and condensed matter theory. Furthermore, under the widely accepted conjecture that nondeterministic polynomial-time (NP)-complete problems cannot be solved in randomized polynomial time, we prove that no polynomial-time classical algorithm that does not learn from data can match the prediction performance achieved by the ML algorithm. In a related contribution using similar proof techniques, we show that classical ML algorithms can efficiently learn how to classify quantum phases of matter. In this scenario, the training data consist of classical representations of quantum states, where each state carries a label indicating whether it belongs to phase A or phase B. The ML algorithm then predicts the phase label for quantum states that were not encountered during training.
The classical ML algorithm not only classifies phases accurately but also constructs an explicit classifying function. Numerical experiments verify that our proposed ML algorithms work well in a variety of scenarios, including Rydberg atom systems, two-dimensional random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases. CONCLUSION We have rigorously established that classical ML algorithms, informed by data collected in physical experiments, can effectively address some quantum many-body problems. These rigorous results boost our hopes that classical ML trained on experimental data can solve practical problems in chemistry and materials science that would be too hard to solve using classical processing alone. Our arguments build on the concept of a succinct classical representation of quantum states derived from randomized Pauli measurements. Although some quantum devices lack the local control needed to perform such measurements, we expect that other classical representations could be exploited by classical ML with similarly powerful results. How can we make use of accessible measurement data to predict properties reliably? Answering such questions will expand the reach of near-term quantum platforms. Classical algorithms for quantum many-body problems: classical ML algorithms learn from training data, obtained from either classical simulations or quantum experiments. Then, the ML algorithm produces a classical representation for the ground state of a physical system that was not encountered during training. Classical algorithms that do not learn from data may require substantially longer computation time to achieve the same task.
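The randomized-Pauli-measurement representation mentioned above can be illustrated in its simplest setting: a single-qubit "classical shadow", where each measurement in a random Pauli basis is inverted into an unbiased snapshot of the state, and snapshots are averaged to estimate observables. This toy sketch shows only the measurement primitive, not the paper's algorithm; the state, shot count, and observable are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(7)

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# Rotations mapping each measurement basis to the computational (Z) basis.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)          # X basis
HSd = np.array([[1, -1j], [1, 1j]], dtype=complex) / np.sqrt(2)      # Y basis
BASIS_ROT = {"X": H, "Y": HSd, "Z": I2}

def shadow_snapshot(state, basis):
    """Measure `state` in one Pauli basis; invert into a shadow snapshot."""
    U = BASIS_ROT[basis]
    amps = U @ state                      # amplitudes in the rotated basis
    b = 0 if rng.random() < abs(amps[0]) ** 2 else 1   # Born-rule outcome
    ket = np.zeros(2, dtype=complex); ket[b] = 1.0
    v = U.conj().T @ ket                  # post-measurement state, original frame
    return 3.0 * np.outer(v, v.conj()) - I2   # invert the depolarizing channel

psi = np.array([1.0, 0.0], dtype=complex)     # |0>, so <Z> = 1 exactly
vals = [np.trace(Z @ shadow_snapshot(psi, rng.choice(["X", "Y", "Z"])))
        for _ in range(3000)]
est = float(np.mean(vals).real)               # shadow estimate of <Z>
print(round(est, 2))  # close to 1
```

Averaging snapshots reproduces the state in expectation, which is why simple observables can be estimated from relatively few randomized measurements.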

Abstract Probabilistic near‐term forecasting facilitates evaluation of model predictions against observations and is of pressing need in ecology to inform environmental decision‐making and effect societal change. Despite this imperative, many ecologists are unfamiliar with the widely used tools for evaluating probabilistic forecasts developed in other fields. We address this gap by reviewing the literature on probabilistic forecast evaluation from diverse fields including climatology, economics, and epidemiology. We present established practices for selecting evaluation data (end‐sample hold-out), graphical forecast evaluation (time‐series plots with uncertainty, probability integral transform plots), quantitative evaluation using scoring rules (log, quadratic, spherical, and ranked probability scores), and comparing scores across models (skill score, Diebold–Mariano test). We cover common approaches, highlight mathematical concepts to follow, and note decision points to allow application of general principles to specific forecasting endeavors. We illustrate these approaches with an application to a long‐term rodent population time series currently used for ecological forecasting and discuss how ecology can continue to learn from and drive the cross‐disciplinary field of forecasting science.
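Two of the evaluation tools mentioned above, probability integral transform (PIT) values and the log score, are short enough to sketch for Gaussian forecasts. The forecast distributions and observations below are synthetic, and the two sigma values are arbitrary choices made to show how the log score penalizes overconfident (too-sharp) forecasts.

```python
import math
import numpy as np

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def pit_values(obs, mus, sigmas):
    """PIT: evaluate each forecast CDF at the observed value.
    A calibrated probabilistic forecast yields ~ Uniform(0, 1) PIT values."""
    return np.array([normal_cdf(y, m, s) for y, m, s in zip(obs, mus, sigmas)])

def mean_log_score(obs, mus, sigmas):
    """Mean negative log predictive density (lower is better)."""
    dens = (np.exp(-(obs - mus) ** 2 / (2 * sigmas**2))
            / (sigmas * np.sqrt(2 * np.pi)))
    return float(-np.mean(np.log(dens)))

rng = np.random.default_rng(3)
mus = rng.normal(size=1000)
obs = mus + rng.normal(scale=1.0, size=1000)   # truth scatters around forecasts

pit = pit_values(obs, mus, np.full(1000, 1.0))       # roughly uniform here
honest = mean_log_score(obs, mus, np.full(1000, 1.0))    # correct sigma
overconf = mean_log_score(obs, mus, np.full(1000, 0.3))  # too-narrow sigma
# The overconfident forecast receives a clearly worse (higher) mean log score.
```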

Rationale Many insect species undertake multigenerational migrations in the Afro-tropical and Palearctic ranges, and understanding their migratory connectivity remains challenging due to their small size, short life span and large population sizes. Hydrogen isotopes (δ²H) can be used to reconstruct the movement of dispersing or migrating insects, but applying δ²H for provenance requires a robust isotope baseline map (i.e. isoscape) for the Afro-Palearctic.
Methods We analyzed the δ²H in the wings (δ²H_wing) of 142 resident butterflies from 56 sites across the Afro-Palearctic. The δ²H_wing values were compared to the predicted local growing-season precipitation δ²H values (δ²H_GSP) using a linear regression model to develop an insect wing δ²H isoscape. We used multivariate linear mixed models and high-resolution and time-specific remote sensing climate and environmental data to explore the controls of the residual δ²H_wing variability.
Results A strong linear relationship was found between δ²H_wing and δ²H_GSP values (r² = 0.53). The resulting isoscape showed strong patterns across the Palearctic but limited variation and high uncertainty for the Afro-tropics. Positive residuals of this relationship were correlated with dry conditions for the month preceding sampling, whereas negative residuals were correlated with more wet days for the month preceding sampling. High intra-site δ²H_wing variance was associated with lower relative humidity for the month preceding sampling and higher elevation.
Conclusion The δ²H_wing isoscape is applicable for tracing herbivorous lepidopteran insects that migrate across the Afro-Palearctic range but has limited geolocation potential in the Afro-tropics. The spatial analysis of uncertainty using high-resolution climatic data demonstrated that many African regions with highly variable evaporation rates and relative humidity have δ²H_wing values that are less related to δ²H_GSP values. Increasing geolocation precision will require new modeling approaches using more time-specific environmental data and/or independent geolocation tools.
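The calibration step described in the Methods, a linear regression of wing isotope values against growing-season precipitation isotope values, can be sketched with ordinary least squares. All numbers below (slope, intercept, noise level, value range) are invented for illustration; only the sample size of 142 echoes the study.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic stand-in: precipitation d2H values and simulated wing d2H values.
d2H_gsp = rng.uniform(-120, 20, size=142)    # per-mil range, illustrative only
true_slope, true_intercept = 0.6, -40.0      # hypothetical transfer function
d2H_wing = (true_slope * d2H_gsp + true_intercept
            + rng.normal(scale=15, size=142))  # site-level scatter

# Ordinary least-squares calibration: wing d2H as a function of precipitation d2H.
slope, intercept = np.polyfit(d2H_gsp, d2H_wing, 1)
pred = slope * d2H_gsp + intercept

# Coefficient of determination (r^2) from residual and total sums of squares.
ss_res = np.sum((d2H_wing - pred) ** 2)
ss_tot = np.sum((d2H_wing - d2H_wing.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

print(f"slope={slope:.2f}, intercept={intercept:.1f}, r^2={r2:.2f}")
```

The residuals of such a fit are what the mixed models in the Methods then relate to climate covariates.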