skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Reweighting Monte Carlo predictions and automated fragmentation variations in Pythia 8
This work reports on a method for uncertainty estimation in simulated collider-event predictions. The method is based on a Monte Carlo-veto algorithm, and extends previous work on uncertainty estimates in parton showers by including uncertainty estimates for the Lund string-fragmentation model. This method is advantageous from the perspective of simulation costs: a single ensemble of generated events can be reinterpreted as though it was obtained using a different set of input parameters, where each event now is accompanied with a corresponding weight. This allows for a robust exploration of the uncertainties arising from the choice of input model parameters, without the need to rerun full simulation pipelines for each input parameter choice. Such explorations are important when determining the sensitivities of precision physics measurements. Accompanying code is available at https://gitlab.com/uchep/mlhad-weights-validation.  more » « less
Award ID(s):
2209769 2103889 2417682 2411215
PAR ID:
10529151
Author(s) / Creator(s):
; ; ; ; ; ; ;
Publisher / Repository:
arxiv.org
Date Published:
Journal Name:
SciPost Physics
Volume:
16
Issue:
5
ISSN:
2542-4653
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Discrete-event simulation models generate random variates from input distributions and compute outputs according to the simulation logic. The input distributions are typically fitted to finite real-world data and thus are subject to estimation errors that can propagate to the simulation outputs: an issue commonly known as input uncertainty (IU). This paper investigates quantifying IU using the output confidence intervals (CIs) computed from bootstrap quantile estimators. The standard direct bootstrap method has overcoverage due to convolution of the simulation error and IU; however, the brute-force way of washing away the former is computationally demanding. We present two new bootstrap methods to enhance direct resampling in both statistical and computational efficiencies using shrinkage strategies to down-scale the variabilities encapsulated in the CIs. Our asymptotic analysis shows how both approaches produce tight CIs accounting for IU under limited input data and simulation effort along with the simulation sample-size requirements relative to the input data size. We demonstrate performances of the shrinkage strategies with several numerical experiments and investigate the conditions under which each method performs well. We also show advantages of nonparametric approaches over parametric bootstrap when the distribution family is misspecified and over metamodel approaches when the dimension of the distribution parameters is high. History: Accepted by Bruno Tuffin, Area Editor for Simulation. Funding: This work was supported by the National Science Foundation [CAREER CMMI-1834710, CAREER CMMI-2045400, DMS-1854659, and IIS-1849280]. Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2022.0044 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2022.0044 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ . 
    more » « less
  2. Abstract Estimating uncertainty in flood model predictions is important for many applications, including risk assessment and flood forecasting. We focus on uncertainty in physics‐based urban flooding models. We consider the effects of the model's complexity and uncertainty in key input parameters. The effect of rainfall intensity on the uncertainty in water depth predictions is also studied. As a test study, we choose the Interconnected Channel and Pond Routing (ICPR) model of a part of the city of Minneapolis. The uncertainty in the ICPR model's predictions of the floodwater depth is quantified in terms of the ensemble variance using the multilevel Monte Carlo (MC) simulation method. Our results show that uncertainties in the studied domain are highly localized. Model simplifications, such as disregarding the groundwater flow, lead to overly confident predictions, that is, predictions that are both less accurate and uncertain than those of the more complex model. We find that for the same number of uncertain parameters, increasing the model resolution reduces uncertainty in the model predictions (and increases the MC method's computational cost). We employ the multilevel MC method to reduce the cost of estimating uncertainty in a high‐resolution ICPR model. Finally, we use the ensemble estimates of the mean and covariance of the flood depth for real‐time flood depth forecasting using the physics‐informed Gaussian process regression method. We show that even with few measurements, the proposed framework results in a more accurate forecast than that provided by the mean prediction of the ICPR model. 
    more » « less
  3. We consider a simulation-based ranking and selection (R&S) problem with input uncertainty, in which unknown input distributions can be estimated using input data arriving in batches of varying sizes over time. Each time a batch arrives, additional simulations can be run using updated input distribution estimates. The goal is to confidently identify the best design after collecting as few batches as possible. We first introduce a moving average estimator for aggregating simulation outputs generated under heterogenous input distributions. Then, based on a sequential elimination framework, we devise two major R&S procedures by establishing exact and asymptotic confidence bands for the estimator. We also extend our procedures to the indifference zone setting, which helps save simulation effort for practical usage. Numerical results show the effectiveness and necessity of our procedures in controlling error from input uncertainty. Moreover, the efficiency can be further boosted through optimizing the “drop rate” parameter, which is the proportion of past simulation outputs to discard, of the moving average estimator. 
    more » « less
  4. Abstract The heavy‐tailed behavior of the generalized extreme‐value distribution makes it a popular choice for modeling extreme events such as floods, droughts, heatwaves, wildfires and so forth. However, estimating the distribution's parameters using conventional maximum likelihood methods can be computationally intensive, even for moderate‐sized datasets. To overcome this limitation, we propose a computationally efficient, likelihood‐free estimation method utilizing a neural network. Through an extensive simulation study, we demonstrate that the proposed neural network‐based method provides generalized extreme value distribution parameter estimates with comparable accuracy to the conventional maximum likelihood method but with a significant computational speedup. To account for estimation uncertainty, we utilize parametric bootstrapping, which is inherent in the trained network. Finally, we apply this method to 1000‐year annual maximum temperature data from the Community Climate System Model version 3 across North America for three atmospheric concentrations: 289 ppm (pre‐industrial), 700 ppm (future conditions), and 1400 ppm , and compare the results with those obtained using the maximum likelihood approach. 
    more » « less
  5. Random parameter logit models address unobserved preference heterogeneity in discrete choice analysis. The latent class logit model assumes a discrete heterogeneity distribution, by combining a conditional logit model of economic choices with a multinomial logit (MNL) for stochastic assignment to classes. Whereas point estimation of latent class logit models is widely applied in practice, stochastic assignment of individuals to classes needs further analysis. In this paper we analyze the statistical behavior of six competing class assignment strategies, namely: maximum prior MNL probabilities, class drawn from prior MNL probabilities, maximum posterior assignment, drawn posterior assignment, conditional individual-specific estimates, and conditional individual estimates combined with the Krinsky–Robb method to account for uncertainty. Using both a Monte Carlo study and two empirical case studies, we show that assigning individuals to classes based on maximum MNL probabilities behaves better than randomly drawn classes in market share predictions. However, randomly drawn classes have higher accuracy in predicted class shares. Finally, class assignment based on individual-level conditional estimates that account for the sampling distribution of the assignment parameters shows superior behavior for a larger number of choice occasions per individual. 
    more » « less