Title: Uncertainty quantification for wide-bin unfolding: one-at-a-time strict bounds and prior-optimized confidence intervals
Abstract: Unfolding is an ill-posed inverse problem in particle physics aiming to infer a true particle-level spectrum from smeared detector-level data. For computational and practical reasons, these spaces are typically discretized using histograms, and the smearing is modeled through a response matrix corresponding to a discretized smearing kernel of the particle detector. This response matrix depends on the unknown shape of the true spectrum, leading to a fundamental systematic uncertainty in the unfolding problem. To handle the ill-posed nature of the problem, common approaches regularize the problem either directly via methods such as Tikhonov regularization, or implicitly by using wide bins in the true space that match the resolution of the detector. Unfortunately, both of these methods lead to a non-trivial bias in the unfolded estimator, thereby hampering frequentist coverage guarantees for confidence intervals constructed from these methods. We propose two new approaches to addressing the bias in the wide-bin setting through methods called One-at-a-time Strict Bounds (OSB) and Prior-Optimized (PO) intervals. The OSB intervals are a bin-wise modification of an existing guaranteed-coverage procedure, while the PO intervals are based on a decision-theoretic view of the problem. Importantly, both approaches provide well-calibrated frequentist confidence intervals even in constrained and rank-deficient settings. These methods are built upon a more general answer to the wide-bin bias problem, involving unfolding with fine bins first, followed by constructing confidence intervals for linear functionals of the fine-bin counts. We test and compare these methods to other available methodologies in a wide-bin deconvolution example and a realistic particle physics simulation of unfolding a steeply falling particle spectrum.
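The general idea named in the abstract, unfolding on fine bins and then forming an interval for a linear functional of the fine-bin counts, can be illustrated with a minimal sketch. This is not the OSB or PO procedure: it assumes a full-column-rank response matrix, approximately Gaussian counts with variance close to the observed counts, and no shape constraints. The function name `wide_bin_interval`, the matrix `K`, and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def wide_bin_interval(K, y, h, z=1.96):
    """Gaussian interval for the linear functional h @ lam after fine-bin unfolding.

    Assumptions (not from the paper): K has full column rank, the bin counts y
    are roughly Gaussian with variance ~ y, and no shape constraints are used.
    """
    w = 1.0 / np.maximum(y, 1.0)        # inverse-variance weights
    fisher = K.T @ (w[:, None] * K)     # K^T W K
    cov = np.linalg.inv(fisher)         # covariance of the fine-bin estimate
    lam_hat = cov @ (K.T @ (w * y))     # weighted least-squares unfolding
    est = h @ lam_hat                   # plug-in estimate of the functional
    se = np.sqrt(h @ cov @ h)           # standard error of the functional
    return est - z * se, est + z * se

# Hypothetical example: 4 fine bins with mild smearing between neighbours.
K = 0.8 * np.eye(4) + 0.1 * (np.eye(4, k=1) + np.eye(4, k=-1))
lam_true = np.array([400.0, 200.0, 100.0, 50.0])
y = K @ lam_true                        # noise-free expected counts, for illustration
h = np.array([1.0, 1.0, 0.0, 0.0])      # wide bin = sum of the first two fine bins
lo, hi = wide_bin_interval(K, y, h)
```

Here the wide-bin count is the functional `h @ lam`, so the interval targets the aggregated quantity directly rather than a biased wide-bin estimator. In constrained or rank-deficient settings this simple inversion is not available, which is the regime the abstract says the OSB and PO intervals are designed for.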
Award ID(s):
2053804 2020295
NSF-PAR ID:
10430508
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Instrumentation
Volume:
17
Issue:
10
ISSN:
1748-0221
Page Range / eLocation ID:
P10013
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop a simple Quantile Spacing (QS) method for accurate probabilistic estimation of one-dimensional entropy from equiprobable random samples, and compare it with the popular Bin-Counting (BC) and Kernel Density (KD) methods. In contrast to BC, which uses equal-width bins with varying probability mass, the QS method uses estimates of the quantiles that divide the support of the data generating probability density function (pdf) into equal-probability-mass intervals. And, whereas BC and KD each require optimal tuning of a hyper-parameter whose value varies with sample size and shape of the pdf, QS only requires specification of the number of quantiles to be used. Results indicate, for the class of distributions tested, that the optimal number of quantiles is a fixed fraction of the sample size (empirically determined to be ~0.25–0.35), and that this value is relatively insensitive to distributional form or sample size. This provides a clear advantage over BC and KD since hyper-parameter tuning is not required. Further, unlike KD, there is no need to select an appropriate kernel-type, and so QS is applicable to pdfs of arbitrary shape, including those with discontinuous slope and/or magnitude. Bootstrapping is used to approximate the sampling variability distribution of the resulting entropy estimate, and is shown to accurately reflect the true uncertainty. For the four distributional forms studied (Gaussian, Log-Normal, Exponential and Bimodal Gaussian Mixture), expected estimation bias is less than 1% and uncertainty is low even for samples of as few as 100 data points; in contrast, for KD the small sample bias can be as large as −10% and for BC as large as −50%. We speculate that estimating quantile locations, rather than bin-probabilities, results in more efficient use of the information in the data to approximate the underlying shape of an unknown data generating pdf.
  2. Energy-resolving photon-counting detectors (PCDs) separate photons from a polychromatic X-ray source into a number of separate energy bins. This spectral information from PCDs would allow advancements in X-ray imaging, such as improving image contrast, quantitative imaging, and material identification and characterization. However, aspects like detector spectral distortions and scattered photons from the object can impede these advantages if left unaccounted for. Scattered X-ray photons act as noise in an image and reduce image contrast, thereby significantly hindering PCD utility. In this paper, we explore and outline several important characteristics of spectral X-ray scatter with examples of soft-material imaging (such as cancer imaging in mammography or explosives detection in airport security). Our results showed critical spectral signatures of scattered photons that depend on a few adjustable experimental factors. Additionally, energy bins over a large portion of the spectrum exhibit lower scatter-to-primary ratio in comparison to what would be expected when using a conventional energy-integrating detector. These important findings allow flexible choice of scatter-correction methods and energy-bin utilization when using PCDs. Our findings also propel the development of efficient spectral X-ray scatter correction methods for a wide range of PCD-based applications. 
  3. Abstract

    Despite the wide application of meta‐analysis in ecology, some of the traditional methods used for meta‐analysis may not perform well given the type of data characteristic of ecological meta‐analyses.

    We reviewed published meta‐analyses on the ecological impacts of global climate change, evaluating the number of replicates used in the primary studies (ni) and the number of studies or records (k) that were aggregated to calculate a mean effect size. We used the results of the review in a simulation experiment to assess the performance of conventional frequentist and Bayesian meta‐analysis methods for estimating a mean effect size and its uncertainty interval.

    Our literature review showed that ni and k were highly variable, their distributions were right‐skewed, and their values were generally small (median ni = 5, median k = 44). Our simulations show that the choice of method for calculating uncertainty intervals was critical for obtaining appropriate coverage (close to the nominal value of 0.95). When k was low (<40), 95% coverage was achieved by a confidence interval (CI) based on the t distribution that uses an adjusted standard error (the Hartung–Knapp–Sidik–Jonkman, HKSJ), or by a Bayesian credible interval, whereas bootstrap or z distribution CIs had lower coverage. Despite the importance of the method to calculate the uncertainty interval, 39% of the meta‐analyses reviewed did not report the method used, and of the 61% that did, 94% used a potentially problematic method, which may be a consequence of software defaults.

    In general, for a simple random‐effects meta‐analysis, the performance of the best frequentist and Bayesian methods was similar for the same combinations of factors (k and mean replication), though the Bayesian approach had higher than nominal (>95%) coverage for the mean effect when k was very low (k < 15). Our literature review suggests that many meta‐analyses that used z distribution or bootstrapping CIs may have overestimated the statistical significance of their results when the number of studies was low; more appropriate methods need to be adopted in ecological meta‐analyses.

     
  4. Abstract

    Modeling and drawing inference on the joint associations between single‐nucleotide polymorphisms and a disease has sparked interest in genome‐wide association studies. In the motivating Boston Lung Cancer Survival Cohort (BLCSC) data, the presence of a large number of single‐nucleotide polymorphisms of interest, though smaller than the sample size, challenges inference on their joint associations with the disease outcome. In similar settings, we find that neither the debiased lasso approach (van de Geer et al., 2014), which assumes sparsity on the inverse information matrix, nor the standard maximum likelihood method can yield confidence intervals with satisfactory coverage probabilities for generalized linear models. Under this “large n, diverging p” scenario, we propose an alternative debiased lasso approach by directly inverting the Hessian matrix without imposing the matrix sparsity assumption, which further reduces bias compared to the original debiased lasso and ensures valid confidence intervals with nominal coverage probabilities. We establish the asymptotic distributions of any linear combinations of the parameter estimates, which lays the theoretical ground for drawing inference. Simulations show that the proposed refined debiased estimating method performs well in removing bias and yields honest confidence interval coverage. We use the proposed method to analyze the aforementioned BLCSC data, a large‐scale hospital‐based epidemiology cohort study investigating the joint effects of genetic variants on lung cancer risks.

     
  5. Abstract Background

    Spectral CT material decomposition provides quantitative information but is challenged by the instability of the inversion into basis materials. We have previously proposed the constrained One‐Step Spectral CT Image Reconstruction (cOSSCIR) algorithm to stabilize the material decomposition inversion by directly estimating basis material images from spectral CT data. cOSSCIR was previously investigated on phantom data.

    Purpose

    This study investigates the performance of cOSSCIR using head CT datasets acquired on a clinical photon‐counting CT (PCCT) prototype. This is the first investigation of cOSSCIR for large‐scale, anatomically complex, clinical PCCT data. The cOSSCIR decomposition is preceded by a spectrum estimation and nonlinear counts correction calibration step to address nonideal detector effects.

    Methods

    Head CT data were acquired on an early prototype clinical PCCT system using an edge‐on silicon detector with eight energy bins. Calibration data of a step wedge phantom were also acquired and used to train a spectral model to account for the source spectrum and detector spectral response, and also to train a nonlinear counts correction model to account for pulse pileup effects. The cOSSCIR algorithm optimized the bone and adipose basis images directly from the photon counts data, while placing a grouped total variation (TV) constraint on the basis images. For comparison, basis images were also reconstructed by a two‐step projection‐domain approach of Maximum Likelihood Estimation (MLE) for decomposing basis sinograms, followed by filtered backprojection (MLE + FBP) or a TV minimization algorithm (MLE + TVmin) to reconstruct basis images. We hypothesize that the cOSSCIR approach will provide a more stable inversion into basis images compared to two‐step approaches. To investigate this hypothesis, the noise standard deviation in bone and soft‐tissue regions of interest (ROIs) in the reconstructed images was compared between cOSSCIR and the two‐step methods for a range of regularization constraint settings.

    Results

    cOSSCIR reduced the noise standard deviation in the basis images by a factor of two to six compared to that of MLE + TVmin, when both algorithms were constrained to produce images with the same TV. The cOSSCIR images demonstrated qualitatively improved spatial resolution and depiction of fine anatomical detail. The MLE + TVmin algorithm resulted in lower noise standard deviation than cOSSCIR for the virtual monoenergetic images (VMIs) at higher energy levels and constraint settings, while the cOSSCIR VMIs resulted in lower noise standard deviation at lower energy levels and overall higher qualitative spatial resolution. There were no statistically significant differences in the mean values within the bone region of images reconstructed by the studied algorithms. There were statistically significant differences in the mean values within the soft‐tissue region of the reconstructed images, with cOSSCIR producing mean values closer to the expected values.

    Conclusions

    The cOSSCIR algorithm, combined with our previously proposed spectral model estimation and nonlinear counts correction method, successfully estimated bone and adipose basis images from high resolution, large‐scale patient data from a clinical PCCT prototype. The cOSSCIR basis images were able to depict fine anatomical details with a factor of two to six reduction in noise standard deviation compared to that of the MLE + TVmin two‐step approach.

     