
Title: An Error Analysis Toolkit for Binned Counting Experiments
Abstract: We introduce the MINERvA Analysis Toolkit (MAT), a utility for centralizing the handling of systematic uncertainties in HEP analyses. The fundamental utilities of the toolkit are the MnvHnD, a powerful histogram container class, and the systematic Universe classes, which provide a modular implementation of the many-universe error analysis approach. These products can be used stand-alone or as part of a complete error analysis prescription. They support the propagation of systematic uncertainty through all stages of analysis and provide flexibility for an arbitrary level of user customization. This extensible solution to error analysis enables the standardization of systematic uncertainty definitions across an experiment and provides a transparent user interface that lowers the barrier to entry for new analyzers.
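The many-universe approach the abstract refers to can be stated concretely: each systematic source is sampled into N shifted "universes", the analysis is repeated (typically by reweighting events) in each universe, and the spread of the resulting histograms around the central value gives the systematic covariance. Below is a minimal NumPy sketch of that bookkeeping. It is not the MAT/ROOT API (MnvHnD and the Universe classes are C++); the event sample, the reweighting function, and the ±5% shift scale are invented for illustration.

```python
# Minimal sketch of many-universe error propagation for a binned
# counting experiment. NOT the MAT API; the toy weighting scheme
# and all parameters are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(seed=0)

N_BINS, N_UNIVERSES = 10, 100
edges = np.linspace(0.0, 10.0, N_BINS + 1)

# Toy "events": a measured quantity per event.
events = rng.exponential(scale=3.0, size=5000)

# Central value: nominal weights of 1.
cv_hist, _ = np.histogram(events, bins=edges)

# Each universe shifts a systematic parameter and reweights events.
shifts = rng.normal(loc=0.0, scale=0.05, size=N_UNIVERSES)  # +/-5% at 1 sigma
universe_hists = np.empty((N_UNIVERSES, N_BINS))
for u, s in enumerate(shifts):
    weights = 1.0 + s * (events / events.mean())  # toy energy-dependent reweight
    universe_hists[u], _ = np.histogram(events, bins=edges, weights=weights)

# Systematic covariance: spread of the universes around the central value.
deviations = universe_hists - cv_hist
cov = deviations.T @ deviations / N_UNIVERSES
sys_err = np.sqrt(np.diag(cov))

print("bin contents:", cv_hist)
print("systematic error per bin:", np.round(sys_err, 2))
```

The point of centralizing this pattern in a histogram container is that every fill call can transparently fill all universes at once, so downstream operations (division, background subtraction, unfolding) carry the full error band along.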
Authors:
Editors:
Biscarat, C.; Campana, S.; Hegner, B.; Roiser, S.; Rovelli, C.I.; Stewart, G.A.
Award ID(s):
1806849
Publication Date:
NSF-PAR ID:
10356300
Journal Name:
EPJ Web of Conferences
Volume:
251
Page Range or eLocation-ID:
03046
ISSN:
2100-014X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Satellite precipitation products, like all quantitative estimates, come with some inherent degree of uncertainty. Associating a quantitative value of the uncertainty with each individual estimate requires error modeling. Most error models proposed so far compute the uncertainty as a function of precipitation intensity only, and only at one specific spatiotemporal scale. We propose a spectral error model that incorporates the neighboring space–time dynamics of precipitation into the uncertainty quantification. Systematic distortions of the precipitation signal and random errors are characterized distinctly in every frequency–wavenumber band in the Fourier domain, to accurately characterize error across scales. The systematic distortions are represented as a deterministic space–time linear filtering term. The random errors are represented as a nonstationary additive noise. The spectral error model is applied to the IMERG multisatellite precipitation product, and its parameters are estimated empirically through a system identification approach using the GV-MRMS gauge–radar measurements as reference (“truth”) over the eastern United States. The filtering term is found to be essentially low-pass (attenuating the fine-scale variability). While traditional error models attribute most of the error variance to random errors, it is found here that the systematic filtering term explains 48% of the error variance at the native resolution of IMERG. This confirms that, at high resolution, filtering effects in satellite precipitation products cannot be ignored, and that the error cannot be represented as a purely random additive or multiplicative term. An important consequence is that precipitation estimates derived from different sources should not be expected to automatically have statistically independent errors.

    Significance Statement

    Satellite precipitation products are nowadays widely used for climate and environmental research, water management, risk analysis, and decision support at the local, regional, and global scales. For all these applications, knowledge about the accuracy of the products is critical for their usability. However, products are not systematically provided with a quantitative measure of the uncertainty associated with each individual estimate. Various parametric error models have been proposed for uncertainty quantification, mostly assuming that the uncertainty is only a function of the precipitation intensity at the pixel and time of interest. By projecting satellite precipitation fields and their retrieval errors into the Fourier frequency–wavenumber domain, we show that we can explicitly take into account the neighboring space–time multiscale dynamics of precipitation and compute a scale-dependent uncertainty.
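    The error model described above has the generic form "estimate = linear filter applied to truth + additive noise", and its parameters can be identified from paired estimate/reference data via cross-spectra: H(f) = S_xy(f)/S_xx(f), with the noise spectrum given by S_nn(f) = S_yy(f) − |H(f)|² S_xx(f). A 1-D toy sketch of that identification follows; the paper works in the 2-D/3-D space–time Fourier domain with IMERG and GV-MRMS data, and the synthetic rainfall, low-pass filter, and noise level below are assumptions made purely for illustration.

```python
# Toy 1-D cross-spectral system identification: recover the filtering
# term H(f) and the noise spectrum from paired (truth, estimate) series.
# All signals and parameters here are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_seg, seg_len = 200, 256

def lowpass(x, alpha=0.3):
    """Simple exponential smoother standing in for the filtering term."""
    y = np.empty_like(x)
    y[0] = x[0]
    for i in range(1, len(x)):
        y[i] = alpha * x[i] + (1 - alpha) * y[i - 1]
    return y

S_xx = np.zeros(seg_len)
S_yy = np.zeros(seg_len)
S_xy = np.zeros(seg_len, dtype=complex)
for _ in range(n_seg):
    truth = rng.gamma(shape=0.5, scale=2.0, size=seg_len)   # intermittent "rain"
    est = lowpass(truth) + rng.normal(0, 0.3, seg_len)      # filtered + noisy
    X, Y = np.fft.fft(truth), np.fft.fft(est)
    S_xx += np.abs(X) ** 2
    S_yy += np.abs(Y) ** 2
    S_xy += np.conj(X) * Y

# Deterministic filtering term, band by band.
H = S_xy / S_xx
# Random-error spectrum: the power H*X does not explain.
S_nn = S_yy - np.abs(H) ** 2 * S_xx

print("gain at low / high frequency:", abs(H[1]).round(2), abs(H[seg_len // 2]).round(2))
```

Run on this toy, the recovered gain is near 1 at low frequency and well below 1 near the Nyquist frequency, reproducing the low-pass (fine-scale-attenuating) behavior the abstract reports for IMERG.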

  2. Abstract

    We evaluate the consistency between lensing and clustering based on measurements from the Baryon Oscillation Spectroscopic Survey (BOSS), combined with galaxy–galaxy lensing from Dark Energy Survey (DES) Year 3, Hyper Suprime-Cam Subaru Strategic Program (HSC) Year 1, and the Kilo-Degree Survey (KiDS-1000). We find good agreement between these lensing data sets. We model the observations using the Dark Emulator and fit the data at two fixed cosmologies: Planck (S8 = 0.83) and a Lensing cosmology (S8 = 0.76). For a joint analysis limited to large scales, we find that both cosmologies provide an acceptable fit to the data. Full utilization of the higher signal-to-noise small-scale measurements is hindered by uncertainty in the impact of baryon feedback and assembly bias, which we account for with a reasoned theoretical error budget. We incorporate a systematic inconsistency parameter, A, for each redshift bin that decouples the lensing and clustering. With a wide range of scales, we find different results for the consistency between the two cosmologies. Limiting the analysis to the bins for which the impact of the lens sample selection is expected to be minimal, for the Lensing cosmology the measurements are consistent with A = 1: A = 0.91 ± 0.04 (A = 0.97 ± 0.06) using DES+KiDS (HSC). For the Planck case, we find a discrepancy: A = 0.79 ± 0.03 (A = 0.84 ± 0.05) using DES+KiDS (HSC). We demonstrate that a kinematic Sunyaev–Zeldovich-based estimate for baryonic effects alleviates some of the discrepancy in the Planck cosmology. This analysis demonstrates the statistical power of small-scale measurements; however, caution is still warranted given modelling uncertainties and foreground sample selection effects.
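    Fitting a single amplitude that rescales a model prediction m against data d with covariance C is a standard construction; under a Gaussian likelihood the best-fit value and its error have the closed forms A = mᵀC⁻¹d / mᵀC⁻¹m and σ_A = (mᵀC⁻¹m)^(−1/2). The sketch below demonstrates that textbook estimator on toy vectors; it is not the paper's per-redshift-bin pipeline, and the prediction, covariance, and injected A = 0.9 are made up.

```python
# Generalized-least-squares amplitude fit: one standard way to define a
# consistency parameter A between a prediction and data. Toy inputs only.
import numpy as np

rng = np.random.default_rng(2)
n = 8
m = 1.0 / np.linspace(1.0, 5.0, n)                      # toy signal prediction
C = np.diag((0.05 * m) ** 2)                            # toy diagonal covariance
d = 0.9 * m + rng.multivariate_normal(np.zeros(n), C)   # data drawn with A_true = 0.9

Cinv = np.linalg.inv(C)
A_hat = (m @ Cinv @ d) / (m @ Cinv @ m)                 # best-fit amplitude
A_err = 1.0 / np.sqrt(m @ Cinv @ m)                     # Gaussian 1-sigma error

print(f"A = {A_hat:.3f} +/- {A_err:.3f}")
```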

  3. Abstract

    Cosmological analyses of samples of photometrically identified type Ia supernovae (SNe Ia) depend on understanding the effects of ‘contamination’ from core-collapse and peculiar SN Ia events. We employ a rigorous analysis using the photometric classifier SuperNNova on state-of-the-art simulations of SN samples to determine cosmological biases due to such ‘non-Ia’ contamination in the Dark Energy Survey (DES) 5-year SN sample. Depending on the non-Ia SN models used in the SuperNNova training and testing samples, contamination ranges from 0.8–3.5 per cent, with a classification efficiency of 97.7–99.5 per cent. Using the Bayesian Estimation Applied to Multiple Species (BEAMS) framework and its extension BBC (‘BEAMS with Bias Correction’), we produce a redshift-binned Hubble diagram marginalised over contamination and corrected for selection effects, and use it to constrain the dark energy equation-of-state, w. Assuming a flat universe with a Gaussian Ω_M prior of 0.311 ± 0.010, we show that biases on w are <0.008 when using SuperNNova, with systematic uncertainties associated with contamination around 10 per cent of the statistical uncertainty on w for the DES-SN sample. An alternative approach of discarding contaminants using outlier rejection techniques (e.g., Chauvenet’s criterion) in place of SuperNNova leads to biases on w that are larger but still modest (0.015–0.03). Finally, we measure biases due to contamination on w_0 and w_a (assuming a flat universe), and find these to be <0.009 in w_0 and <0.108 in w_a, 5 to 10 times smaller than the statistical uncertainties for the DES-SN sample.
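    Chauvenet's criterion, the outlier-rejection alternative named above, has a simple closed form: a point is rejected when the expected number of equally extreme points in a Gaussian sample of size N, namely N·erfc(|z|/√2), falls below 0.5. A toy sketch on Hubble-residual-like values follows; the data are synthetic and this is a generic textbook implementation, not the DES analysis code.

```python
# Chauvenet's criterion: reject a point if, in a Gaussian sample of this
# size, fewer than half an event as extreme would be expected. Toy data.
import math
import numpy as np

def chauvenet_mask(x):
    """Boolean mask that is False for points Chauvenet rejects."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    z = np.abs(x - x.mean()) / x.std(ddof=1)
    # Expected count of deviations at least this large in n Gaussian draws.
    expected = n * np.array([math.erfc(zi / math.sqrt(2)) for zi in z])
    return expected >= 0.5

residuals = np.concatenate([np.random.default_rng(3).normal(0, 0.1, 50), [0.9]])
mask = chauvenet_mask(residuals)
print("rejected:", residuals[~mask])  # flags the injected 0.9 contaminant
```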
  4. Sparse tensor factorization is a popular tool in multi-way data analysis and is used in applications such as cybersecurity, recommender systems, and social network analysis. In many of these applications, the tensor is not known a priori and instead arrives in a streaming fashion for a potentially unbounded amount of time. Existing approaches for streaming sparse tensors are not practical for unbounded streaming because they rely on maintaining the full factorization of the data, which grows linearly with time. In this work, we present CP-stream, an algorithm for streaming factorization in the model of the canonical polyadic decomposition which does not grow linearly in time or space, and is thus practical for long-term streaming. Additionally, CP-stream incorporates user-specified constraints such as non-negativity which aid in the stability and interpretability of the factorization. An evaluation of CP-stream demonstrates that it converges faster than state-of-the-art streaming algorithms while achieving lower reconstruction error by an order of magnitude. We also evaluate it on real-world sparse datasets and demonstrate its usability in both network traffic analysis and discussion tracking. Our evaluation uses exclusively public datasets and our source code is released to the public as part of SPLATT, an open source high-performance tensor factorization toolkit.
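    The core per-step work in a streaming canonical polyadic (CP) factorization can be sketched compactly: with the non-temporal factor matrices held from previous steps, each arriving slice X_t determines one new row of the temporal factor by linear least squares against the Khatri-Rao product of the other factors. The toy below shows only that step, on a small dense tensor; CP-stream additionally refreshes the other factors with a forgetting factor, exploits sparsity, and supports constraints, and nothing here is SPLATT code.

```python
# Toy per-slice step of a streaming CP factorization: solve for the new
# temporal-factor row given fixed mode-1 and mode-2 factors. Dense, tiny,
# and purely illustrative.
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product: (I*J) x R, row index = i*J + j."""
    I, R = A.shape
    J, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(I * J, R)

rng = np.random.default_rng(4)
I, J, R = 6, 5, 3
A_true, B_true = rng.normal(size=(I, R)), rng.normal(size=(J, R))

A_hat, B_hat = A_true.copy(), B_true.copy()   # pretend these were already learned
M = khatri_rao(A_hat, B_hat)                  # design matrix over (i, j) pairs

for t in range(3):                            # slices arriving in a stream
    s_true = rng.normal(size=R)
    X_t = (A_true * s_true) @ B_true.T + rng.normal(0, 0.01, size=(I, J))
    s_t, *_ = np.linalg.lstsq(M, X_t.reshape(-1), rcond=None)
    print(f"t={t} max abs error in recovered temporal row:",
          np.max(np.abs(s_t - s_true)).round(3))
```

Because only the factor matrices (and not past slices) are retained, memory stays constant in time, which is the property that makes long-term streaming practical.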
  5. Abstract

    We perform the first simultaneous Bayesian parameter inference and optimal reconstruction of the gravitational lensing of the cosmic microwave background (CMB), using 100 deg² of polarization observations from the SPTpol receiver on the South Pole Telescope. These data reach noise levels as low as 5.8 μK-arcmin in polarization, which are low enough that the typically used quadratic estimator (QE) technique for analyzing CMB lensing is significantly suboptimal. Conversely, the Bayesian procedure extracts all lensing information from the data and is optimal at any noise level. We infer the amplitude of the gravitational lensing potential to be A_ϕ = 0.949 ± 0.122 using the Bayesian pipeline, consistent with our QE pipeline result, but with 17% smaller error bars. The Bayesian analysis also provides a simple way to account for systematic uncertainties, performing a similar job as frequentist “bias hardening” or linear bias correction, and reducing the systematic uncertainty on A_ϕ due to polarization calibration from almost half of the statistical error to effectively zero. Finally, we jointly constrain A_ϕ along with A_L, the amplitude of lensing-like effects on the CMB power spectra, demonstrating that the Bayesian method can be used to easily infer parameters both from an optimal lensing reconstruction and from the delensed CMB, while exactly accounting for the correlation between the two. These results demonstrate the feasibility of the Bayesian approach on real data, and pave the way for future analysis of deep CMB polarization measurements with SPT-3G, Simons Observatory, and CMB-S4, where improvements relative to the QE can reach 1.5 times tighter constraints on A_ϕ and seven times lower effective lensing reconstruction noise.
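    The "simple way to account for systematic uncertainties" has a generic Bayesian shape: promote the systematic (here, a polarization calibration) to a nuisance parameter with a prior, evaluate the joint posterior, and marginalize. The grid-based toy below illustrates only that pattern; the template, noise level, 2% calibration prior, and the cal² scaling are invented for illustration and bear no relation to the actual SPTpol lensing pipeline.

```python
# Toy joint inference of an amplitude A and a calibration nuisance, then
# marginalization over the calibration. Grid-based and purely illustrative.
import numpy as np

rng = np.random.default_rng(5)
template = np.sin(np.linspace(0, 3, 20)) ** 2 + 0.5     # toy signal template
sigma = 0.1
data = 0.95 * template + rng.normal(0, sigma, size=template.size)

A_grid = np.linspace(0.5, 1.5, 201)
cal_grid = np.linspace(0.9, 1.1, 101)                   # calibration grid
logpost = np.empty((A_grid.size, cal_grid.size))
for i, A in enumerate(A_grid):
    for j, c in enumerate(cal_grid):
        resid = data - A * c ** 2 * template            # assumed cal^2 scaling
        logpost[i, j] = (-0.5 * np.sum(resid ** 2) / sigma ** 2
                         - 0.5 * ((c - 1.0) / 0.02) ** 2)  # 2% Gaussian prior

post = np.exp(logpost - logpost.max())
post_A = post.sum(axis=1)                               # marginalize calibration
post_A /= post_A.sum()
mean_A = np.sum(A_grid * post_A)
std_A = np.sqrt(np.sum((A_grid - mean_A) ** 2 * post_A))
print(f"A = {mean_A:.3f} +/- {std_A:.3f} (calibration marginalized)")
```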