

Search for: All records

Award ID contains: 2053804

Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the publisher's embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. We present an online post-hoc calibration method, called Online Platt Scaling (OPS), which combines the Platt scaling technique with online logistic regression. We demonstrate that OPS smoothly adapts between i.i.d. and non-i.i.d. settings with distribution drift. Further, in scenarios where the best Platt scaling model is itself miscalibrated, we enhance OPS by incorporating a recently developed technique called calibeating to make it more robust. Theoretically, our resulting OPS+calibeating method is guaranteed to be calibrated for adversarial outcome sequences. Empirically, it is effective on a range of synthetic and real-world datasets, with and without distribution drifts, achieving superior performance without hyperparameter tuning. Finally, we extend all OPS ideas to the beta scaling method. 
    Free, publicly-accessible full text available July 1, 2024
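    To make the OPS idea concrete, here is a minimal sketch (not the authors' code) of post-hoc recalibration via online logistic regression on a base model's logits: the Platt parameters (a, b) are updated by online gradient descent on the log-loss as each (logit, outcome) pair arrives. The class name, learning rate, and simulated data stream are illustrative assumptions.

```python
import numpy as np

class OnlinePlattScaler:
    """Recalibrate logits with online logistic regression (illustrative)."""
    def __init__(self, lr=0.1):
        self.a, self.b = 1.0, 0.0   # start at the identity recalibration
        self.lr = lr

    def predict(self, logit):
        return 1.0 / (1.0 + np.exp(-(self.a * logit + self.b)))

    def update(self, logit, outcome):
        # Log-loss gradient for p = sigmoid(a * logit + b) is (p - y) * (logit, 1)
        p = self.predict(logit)
        self.a -= self.lr * (p - outcome) * logit
        self.b -= self.lr * (p - outcome)

# Usage on a simulated stream: forecast first, then observe and update.
ops = OnlinePlattScaler()
rng = np.random.default_rng(0)
for t in range(1000):
    z = rng.normal()                                     # base model's raw logit
    p = ops.predict(z)                                   # forecast before seeing y
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-0.5 * z)))  # stand-in outcome process
    ops.update(z, y)
```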
  2. Free, publicly-accessible full text available June 1, 2024
  3. Abstract: Because geostationary satellite (Geo) imagery provides a high temporal resolution window into tropical cyclone (TC) behavior, we investigate the viability of its application to short-term probabilistic forecasts of TC convective structure to subsequently predict TC intensity. Here, we present a prototype model that is trained solely on two inputs: Geo infrared imagery leading up to the synoptic time of interest and intensity estimates up to 6 h prior to that time. To estimate future TC structure, we compute cloud-top temperature radial profiles from infrared imagery and then simulate the evolution of an ensemble of those profiles over the subsequent 12 h by applying a deep autoregressive generative model (PixelSNAIL). To forecast TC intensities at hours 6 and 12, we input operational intensity estimates up to the current time (0 h) and simulated future radial profiles up to +12 h into a "nowcasting" convolutional neural network. We limit our inputs to demonstrate the viability of our approach and to enable quantification of value added by the observed and simulated future radial profiles beyond operational intensity estimates alone. Our prototype model achieves a marginally higher error than the National Hurricane Center's official forecasts despite excluding environmental factors, such as vertical wind shear and sea surface temperature. We also demonstrate that it is possible to reasonably predict short-term evolution of TC convective structure via radial profiles from Geo infrared imagery, resulting in interpretable structural forecasts that may be valuable for TC operational guidance.
     Significance Statement: This work presents a new method of short-term probabilistic forecasting for tropical cyclone (TC) convective structure and intensity using infrared geostationary satellite observations. Our prototype model's performance indicates that there is some value in observed and simulated future cloud-top temperature radial profiles for short-term intensity forecasting. The nonlinear nature of machine learning tools can pose an interpretation challenge, but structural forecasts produced by our model can be directly evaluated and, thus, may offer helpful guidance to forecasters regarding short-term TC evolution. Since forecasters are time limited in producing each advisory package despite a growing wealth of satellite observations, a tool that captures recent TC convective evolution and potential future changes may support their assessment of TC behavior in crafting their forecasts.
    Free, publicly-accessible full text available June 1, 2024
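    One concrete step in this pipeline is reducing each infrared image to a cloud-top temperature radial profile around the TC center. A minimal sketch of that reduction, assuming a brightness-temperature array and a known center pixel (the function name and binning choices are illustrative, not the authors' code):

```python
import numpy as np

def radial_profile(bt_image, center, r_max_px, bin_px=4):
    """Mean cloud-top brightness temperature in annuli around the TC center.

    bt_image: 2D array of brightness temperatures; center: (row, col) pixel.
    """
    ny, nx = bt_image.shape
    rows, cols = np.mgrid[0:ny, 0:nx]
    r = np.hypot(rows - center[0], cols - center[1])   # pixel distance from center
    edges = np.arange(0, r_max_px + bin_px, bin_px)
    profile = np.array([
        bt_image[(r >= lo) & (r < hi)].mean()
        for lo, hi in zip(edges[:-1], edges[1:])
    ])
    return edges[:-1], profile   # inner radius of each annulus, profile values
```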
  4. Abstract: Atmospheric aerosols influence the Earth's climate, primarily by affecting cloud formation and scattering visible radiation. However, aerosol-related physical processes in climate simulations are highly uncertain. Constraining these processes could help improve model-based climate predictions. We propose a scalable statistical framework for constraining the parameters of expensive climate models by comparing model outputs with observations. Using the C3.AI Suite, a cloud computing platform, we efficiently train a surrogate model on a perturbed parameter ensemble of the UKESM1 climate model. A method for estimating a data-driven model discrepancy term is described. The strict bounds method is applied to quantify parametric uncertainty in a principled way. We demonstrate the scalability of this framework with 2 weeks' worth of simulated aerosol optical depth data over the South Atlantic and Central African region, written from the model every 3 hr and matched in time to twice-daily MODIS satellite observations. When constraining the model using real satellite observations, we establish constraints on combinations of two model parameters using much higher time-resolution outputs from the climate model than previous studies. This result suggests that, within the limits imposed by an imperfect climate model, potentially very powerful constraints may be achieved when our framework is scaled to the analysis of more observations and for longer time periods.
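    A minimal sketch of the emulation step described above, assuming a small perturbed-parameter design and a scikit-learn Gaussian-process surrogate (the paper's actual surrogate model, parameters, and outputs may differ):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# theta: (n_ensemble, n_params) perturbed-parameter settings
# y:     (n_ensemble,) model output matched to an observation time/place
rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=(50, 2))   # stand-in PPE design
y = np.sin(3 * theta[:, 0]) + 0.5 * theta[:, 1] + rng.normal(0, 0.05, 50)

gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
gp.fit(theta, y)

# The emulator now predicts model output (with uncertainty) at untried
# parameter values, which is what makes constraining the parameters
# against observations computationally feasible.
mean, std = gp.predict(rng.uniform(0, 1, size=(5, 2)), return_std=True)
```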
  5. Ruiz, F.; Dy, J.; van de Meent, J.-W. (Eds.)
    Prediction algorithms, such as deep neural networks (DNNs), are used in many domain sciences to directly estimate internal parameters of interest in simulator-based models, especially in settings where the observations include images or complex high-dimensional data. In parallel, modern neural density estimators, such as normalizing flows, are becoming increasingly popular for uncertainty quantification, especially when both parameters and observations are high-dimensional. However, parameter inference is an inverse problem and not a prediction task; thus, an open challenge is to construct conditionally valid and precise confidence regions, with a guaranteed probability of covering the true parameters of the data-generating process, no matter what the (unknown) parameter values are, and without relying on large-sample theory. Many simulator-based inference (SBI) methods are indeed known to produce biased or overly confident parameter regions, yielding misleading uncertainty estimates. This paper presents WALDO, a novel method to construct confidence regions with finite-sample conditional validity by leveraging prediction algorithms or posterior estimators that are currently widely adopted in SBI. WALDO reframes the well-known Wald test statistic and uses a computationally efficient regression-based machinery for classical Neyman inversion of hypothesis tests. We apply our method to a recent high-energy physics problem, where prediction with DNNs has previously led to estimates with prediction bias. We also illustrate how our approach can correct overly confident posterior regions computed with normalizing flows.
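    The core quantity in WALDO is a Wald-style statistic built from estimates of the conditional mean and covariance of the parameters given the data. A minimal sketch, assuming those estimates are already in hand from a prediction algorithm or posterior estimator (the critical-value step is only indicated in a comment):

```python
import numpy as np

def waldo_statistic(post_mean, post_cov, theta0):
    """Wald-style statistic: squared Mahalanobis distance from the estimated
    conditional mean of theta given data D to the null value theta0, scaled
    by the estimated conditional covariance."""
    diff = np.atleast_1d(post_mean - theta0)
    cov = np.atleast_2d(post_cov)
    return float(diff @ np.linalg.solve(cov, diff))

# Neyman inversion (schematic): a parameter value theta0 enters the
# confidence region when waldo_statistic(...) falls below an estimated
# conditional critical value C(theta0), itself fit by regression on
# simulated (theta, D) pairs -- omitted here for brevity.
```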
  6. Abstract: Unfolding is an ill-posed inverse problem in particle physics aiming to infer a true particle-level spectrum from smeared detector-level data. For computational and practical reasons, these spaces are typically discretized using histograms, and the smearing is modeled through a response matrix corresponding to a discretized smearing kernel of the particle detector. This response matrix depends on the unknown shape of the true spectrum, leading to a fundamental systematic uncertainty in the unfolding problem. To handle the ill-posed nature of the problem, common approaches regularize the problem either directly via methods such as Tikhonov regularization, or implicitly by using wide bins in the true space that match the resolution of the detector. Unfortunately, both of these methods lead to a non-trivial bias in the unfolded estimator, thereby hampering frequentist coverage guarantees for confidence intervals constructed from these methods. We propose two new approaches to addressing the bias in the wide-bin setting through methods called One-at-a-time Strict Bounds (OSB) and Prior-Optimized (PO) intervals. The OSB intervals are a bin-wise modification of an existing guaranteed-coverage procedure, while the PO intervals are based on a decision-theoretic view of the problem. Importantly, both approaches provide well-calibrated frequentist confidence intervals even in constrained and rank-deficient settings. These methods are built upon a more general answer to the wide-bin bias problem, involving unfolding with fine bins first, followed by constructing confidence intervals for linear functionals of the fine-bin counts. We test and compare these methods to other available methodologies in a wide-bin deconvolution example and a realistic particle physics simulation of unfolding a steeply falling particle spectrum.
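    A minimal sketch of the strict-bounds idea underlying the OSB intervals, for a single linear functional and written with cvxpy (the slack radius and its calibration here are illustrative; the paper's bin-wise construction differs in detail):

```python
import numpy as np
import cvxpy as cp
from scipy.stats import chi2

def strict_bounds(y, K, Sigma_inv_sqrt, h, alpha=0.32):
    """Min/max of h @ x over non-negative fine-bin spectra x whose smeared
    prediction K @ x is statistically compatible with the observed counts y."""
    n = K.shape[1]
    x = cp.Variable(n, nonneg=True)           # fine-bin spectrum
    resid = Sigma_inv_sqrt @ (y - K @ x)      # whitened residual
    q = chi2.ppf(1 - alpha, df=len(y))        # slack radius (illustrative choice)
    cons = [cp.sum_squares(resid) <= q]
    lo = cp.Problem(cp.Minimize(h @ x), cons).solve()
    hi = cp.Problem(cp.Maximize(h @ x), cons).solve()
    return lo, hi
```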
  7. We study the problem of making calibrated probabilistic forecasts for a binary sequence generated by an adversarial nature. Following the seminal paper of Foster and Vohra (1998), nature is often modeled as an adaptive adversary who sees all activity of the forecaster except the randomization that the forecaster may deploy. A number of papers have proposed randomized forecasting strategies that achieve an ϵ-calibration error rate of O(1/√T), which we prove is tight in general. On the other hand, it is well known that it is not possible to be calibrated without randomization, or if nature also sees the forecaster's randomization; in both cases the calibration error could be Ω(1). Inspired by the equally seminal works on the "power of two choices" and imprecise probability theory, we study a small variant of the standard online calibration problem. The adversary gives the forecaster the option of making two nearby probabilistic forecasts, or equivalently an interval forecast of small width, and the endpoint closest to the revealed outcome is used to judge calibration. This power of two choices, or imprecise forecast, accords the forecaster significant power: we show that a faster ϵ-calibration rate of O(1/T) can be achieved even without deploying any randomization.
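    A small illustration of how calibration is judged under this variant: the forecaster reports an interval of width w, and the endpoint closer to the revealed binary outcome is the one scored. The simulated stream below is a stylized stand-in for the adversary, not the paper's forecasting algorithm:

```python
import numpy as np

def scored_endpoint(lo, hi, y):
    """Return the interval endpoint closest to the revealed outcome y."""
    return hi if abs(hi - y) <= abs(lo - y) else lo

rng = np.random.default_rng(0)
w = 0.1                                   # interval width
scored, outcomes = [], []
for t in range(1000):
    p = rng.uniform(w / 2, 1 - w / 2)     # stand-in interval centre
    y = rng.binomial(1, 0.5)              # stylized adversary (a fair coin here)
    scored.append(scored_endpoint(p - w / 2, p + w / 2, y))
    outcomes.append(y)
# Calibration is then assessed on the scored endpoints versus the outcomes,
# e.g. by comparing average outcomes within bins of the scored forecasts.
```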
  8. When deployed in the real world, machine learning models inevitably encounter changes in the data distribution, and certain—but not all—distribution shifts could result in significant performance degradation. In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially, making interventions by a human expert (or model retraining) unnecessary. While several works have developed tests for distribution shifts, these typically either use non-sequential methods, or detect arbitrary shifts (benign or harmful), or both. We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate. In this work, we design simple sequential tools for testing if the difference between source (training) and target (test) distributions leads to a significant increase in a risk function of interest, like accuracy or calibration. Recent advances in constructing time-uniform confidence sequences allow efficient aggregation of statistical evidence accumulated during the tracking process. The designed framework is applicable in settings where (some) true labels are revealed after the prediction is performed, or when batches of labels become available in a delayed fashion. We demonstrate the efficacy of the proposed framework through an extensive empirical study on a collection of simulated and real datasets. 
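    A minimal sketch of this monitoring scheme, using one standard time-uniform (stitched-Hoeffding) confidence-sequence radius for losses bounded in [0, 1]; the constants, the tolerance eps, and the alarm rule below are illustrative choices, not the paper's exact procedure:

```python
import numpy as np

def cs_radius(t, alpha=0.05, sigma=0.5):
    """Time-uniform (anytime-valid) radius; sigma = 1/2 for losses in [0, 1]."""
    return 1.7 * sigma * np.sqrt(
        (np.log(np.log(2 * t)) + 0.72 * np.log(5.2 / alpha)) / t
    )

def monitor(losses, source_risk, eps=0.05, alpha=0.05):
    """Alarm when the CS lower bound on target risk exceeds source_risk + eps."""
    s = 0.0
    for t, loss in enumerate(losses, start=1):
        s += loss
        if t >= 2 and s / t - cs_radius(t, alpha) > source_risk + eps:
            return t                      # harmful shift detected at time t
    return None                           # no alarm: shift absent or benign
```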
  9. A multiclass classifier is said to be top-label calibrated if the reported probability for the predicted class -- the top-label -- is calibrated, conditioned on the top-label. This conditioning on the top-label is absent in the closely related and popular notion of confidence calibration, which we argue makes confidence calibration difficult to interpret for decision-making. We propose top-label calibration as a rectification of confidence calibration. Further, we outline a multiclass-to-binary (M2B) reduction framework that unifies confidence, top-label, and class-wise calibration, among others. As its name suggests, M2B works by reducing multiclass calibration to numerous binary calibration problems, each of which can be solved using simple binary calibration routines. We instantiate the M2B framework with the well-studied histogram binning (HB) binary calibrator, and prove that the overall procedure is multiclass calibrated without making any assumptions on the underlying data distribution. In an empirical evaluation with four deep net architectures on CIFAR-10 and CIFAR-100, we find that the M2B + HB procedure achieves lower top-label and class-wise calibration error than other approaches such as temperature scaling. 
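    A minimal sketch of the M2B idea instantiated with histogram binning: for each class c, the calibration points whose predicted top label is c are used to recalibrate the reported top probability binwise. Equal-width bins and the bin count are illustrative simplifications of the paper's HB calibrator:

```python
import numpy as np

def fit_top_label_hb(top_probs, top_labels, true_labels, n_bins=10):
    """Per-class binwise recalibration tables for the reported top probability."""
    edges = np.linspace(0, 1, n_bins + 1)
    tables = {}
    for c in np.unique(top_labels):
        mask = top_labels == c
        idx = np.clip(np.digitize(top_probs[mask], edges) - 1, 0, n_bins - 1)
        hits = (true_labels[mask] == c).astype(float)   # top label was correct
        tables[c] = np.array([
            hits[idx == b].mean() if np.any(idx == b)
            else (edges[b] + edges[b + 1]) / 2          # empty bin: use midpoint
            for b in range(n_bins)
        ])
    return edges, tables

def calibrate(p, c, edges, tables):
    """Calibrated probability that predicted class c is correct, given raw p."""
    b = min(np.digitize(p, edges) - 1, len(edges) - 2)
    return tables[c][b]
```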