

Search for: All records

Creators/Authors contains: "Barber, Rina Foygel"


  1. Abstract

    Algorithmic stability is a concept from learning theory that expresses the degree to which changes to the input data (e.g. removal of a single data point) may affect the outputs of a regression algorithm. Knowing an algorithm’s stability properties is often useful for many downstream applications—for example, stability is known to lead to desirable generalization properties and predictive inference guarantees. However, many modern algorithms currently used in practice are too complex for a theoretical analysis of their stability properties, and thus we can only attempt to establish these properties through an empirical exploration of the algorithm’s behaviour on various datasets. In this work, we lay out a formal statistical framework for this kind of black-box testing without any assumptions on the algorithm or the data distribution, and establish fundamental bounds on the ability of any black-box test to identify algorithmic stability.
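
    As a rough illustration of the kind of black-box probing this abstract refers to (and not the authors' formal testing framework), the sketch below refits a generic regression algorithm with single training points removed and records how much a prediction at a fixed test point moves. The algorithm (a random forest), the function name, and the perturbation measure are placeholder assumptions.

```python
# A minimal sketch (not the authors' formal test) of probing stability
# empirically: refit a black-box regression algorithm with single training
# points removed and record how much a prediction at a fixed test point moves.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def empirical_stability_probe(X, y, x_test, n_trials=50, seed=None):
    """Return leave-one-out prediction perturbations at a single test point."""
    rng = np.random.default_rng(seed)
    n = len(y)
    base_model = RandomForestRegressor(random_state=0).fit(X, y)
    base_pred = base_model.predict(x_test[None, :])[0]
    deltas = []
    for _ in range(n_trials):
        i = rng.integers(n)                      # drop one random training point
        keep = np.delete(np.arange(n), i)
        perturbed = RandomForestRegressor(random_state=0).fit(X[keep], y[keep])
        deltas.append(abs(perturbed.predict(x_test[None, :])[0] - base_pred))
    return np.array(deltas)                      # large values suggest instability
```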

     
  2. Abstract

    Model-X knockoffs is a flexible wrapper method for high-dimensional regression algorithms, which provides guaranteed control of the false discovery rate (FDR). Due to the randomness inherent to the method, different runs of model-X knockoffs on the same dataset often result in different sets of selected variables, which is undesirable in practice. In this article, we introduce a methodology for derandomising model-X knockoffs with provable FDR control. The key insight of our proposed method lies in the discovery that the knockoffs procedure is in essence an e-BH procedure. We make use of this connection and derandomise model-X knockoffs by aggregating the e-values resulting from multiple knockoff realisations. We prove that the derandomised procedure controls the FDR at the desired level, without any additional conditions (in contrast, previously proposed methods for derandomisation are not able to guarantee FDR control). The proposed method is evaluated with numerical experiments, where we find that the derandomised procedure achieves comparable power and dramatically decreased selection variability when compared with model-X knockoffs.
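
    To make the aggregation step concrete, here is a minimal sketch, assuming the per-run e-values have already been computed from the knockoff statistics as in the paper: they are averaged across runs and the e-BH procedure is applied to the averages. The function names and the `evals_per_run` input are illustrative, not the authors' code.

```python
# A minimal sketch of the aggregation step described above: average e-values
# across knockoff runs, then apply the e-BH procedure. The per-run e-values
# (`evals_per_run`, shape n_runs x p) are assumed to be given.
import numpy as np

def ebh(e, alpha=0.1):
    """e-BH: reject the k hypotheses with the largest e-values, where k is the
    largest value such that the k-th largest e-value is at least p / (alpha * k)."""
    e = np.asarray(e, dtype=float)
    p = len(e)
    order = np.argsort(e)[::-1]              # hypotheses sorted by decreasing e-value
    ks = np.arange(1, p + 1)
    passing = e[order] >= p / (alpha * ks)
    if not passing.any():
        return np.array([], dtype=int)       # no rejections
    k = ks[passing].max()
    return np.sort(order[:k])                # indices of selected variables

def derandomized_selection(evals_per_run, alpha=0.1):
    e_avg = np.mean(evals_per_run, axis=0)   # aggregate e-values over knockoff runs
    return ebh(e_avg, alpha=alpha)           # FDR control is inherited from e-BH
```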

     
  3. Free, publicly-accessible full text available April 1, 2024
  4. Abstract

    Background

    Spectral CT material decomposition provides quantitative information but is challenged by the instability of the inversion into basis materials. We have previously proposed the constrained One‐Step Spectral CT Image Reconstruction (cOSSCIR) algorithm to stabilize the material decomposition inversion by directly estimating basis material images from spectral CT data. cOSSCIR was previously investigated on phantom data.

    Purpose

    This study investigates the performance of cOSSCIR using head CT datasets acquired on a clinical photon‐counting CT (PCCT) prototype. This is the first investigation of cOSSCIR for large‐scale, anatomically complex, clinical PCCT data. The cOSSCIR decomposition is preceded by a spectrum estimation and nonlinear counts correction calibration step to address nonideal detector effects.

    Methods

    Head CT data were acquired on an early prototype clinical PCCT system using an edge-on silicon detector with eight energy bins. Calibration data of a step wedge phantom were also acquired and used to train a spectral model to account for the source spectrum and detector spectral response, and also to train a nonlinear counts correction model to account for pulse pileup effects. The cOSSCIR algorithm optimized the bone and adipose basis images directly from the photon counts data, while placing a grouped total variation (TV) constraint on the basis images. For comparison, basis images were also reconstructed by a two-step projection-domain approach of Maximum Likelihood Estimation (MLE) for decomposing basis sinograms, followed by filtered backprojection (MLE + FBP) or a TV minimization algorithm (MLE + TVmin) to reconstruct basis images. We hypothesize that the cOSSCIR approach will provide a more stable inversion into basis images compared to two-step approaches. To investigate this hypothesis, the noise standard deviation in bone and soft-tissue regions of interest (ROIs) in the reconstructed images was compared between cOSSCIR and the two-step methods for a range of regularization constraint settings.

    Results

    cOSSCIR reduced the noise standard deviation in the basis images by a factor of two to six compared to that of MLE + TVmin, when both algorithms were constrained to produce images with the same TV. The cOSSCIR images demonstrated qualitatively improved spatial resolution and depiction of fine anatomical detail. The MLE + TVmin algorithm resulted in lower noise standard deviation than cOSSCIR for the virtual monoenergetic images (VMIs) at higher energy levels and constraint settings, while the cOSSCIR VMIs resulted in lower noise standard deviation at lower energy levels and overall higher qualitative spatial resolution. There were no statistically significant differences in the mean values within the bone region of images reconstructed by the studied algorithms. There were statistically significant differences in the mean values within the soft-tissue region of the reconstructed images, with cOSSCIR producing mean values closer to the expected values.

    Conclusions

    The cOSSCIR algorithm, combined with our previously proposed spectral model estimation and nonlinear counts correction method, successfully estimated bone and adipose basis images from high-resolution, large-scale patient data from a clinical PCCT prototype. The cOSSCIR basis images were able to depict fine anatomical details with a factor of two to six reduction in noise standard deviation compared to that of the MLE + TVmin two-step approach.
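
    As a conceptual illustration of the grouped total variation (TV) constraint mentioned in the Methods above (and not the cOSSCIR implementation itself), the sketch below pools image gradients across the basis-material channels before taking the norm, which is what couples edges in the bone and adipose images. The function name and array layout are assumptions for the example.

```python
# A conceptual sketch of a "grouped" total variation penalty: image gradients
# are pooled across basis-material channels (e.g. bone and adipose) before
# taking the norm, so edges are encouraged to align across materials.
import numpy as np

def grouped_tv(basis_images):
    """basis_images: array of shape (n_materials, height, width)."""
    dx = np.diff(basis_images, axis=2)[:, :-1, :]   # horizontal finite differences
    dy = np.diff(basis_images, axis=1)[:, :, :-1]   # vertical finite differences
    # pool squared gradients over materials and directions, then take the norm
    magnitude = np.sqrt((dx ** 2 + dy ** 2).sum(axis=0))
    return magnitude.sum()
```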

     
  5. Spatial population genetic data often exhibits ‘isolation-by-distance,’ where genetic similarity tends to decrease as individuals become more geographically distant. The rate at which genetic similarity decays with distance is often spatially heterogeneous due to variable population processes like genetic drift, gene flow, and natural selection. Petkova et al. (2016) developed a statistical method called Estimating Effective Migration Surfaces (EEMS) for visualizing spatially heterogeneous isolation-by-distance on a geographic map. While EEMS is a powerful tool for depicting spatial population structure, it can suffer from slow runtimes. Here, we develop a related method called Fast Estimation of Effective Migration Surfaces (FEEMS). FEEMS uses a Gaussian Markov Random Field model in a penalized likelihood framework that allows for efficient optimization and output of effective migration surfaces. Further, the efficient optimization facilitates the inference of migration parameters per edge in the graph, rather than per node (as in EEMS). With simulations, we show conditions under which FEEMS can accurately recover effective migration surfaces with complex gene-flow histories, including those with anisotropy. We apply FEEMS to population genetic data from North American gray wolves and show it performs favorably in comparison to EEMS, with solutions obtained orders of magnitude faster. Overall, FEEMS expands the ability of users to quickly visualize and interpret spatial structure in their data.
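
    As a minimal sketch of the modelling idea described in this abstract (not the FEEMS package itself), the code below builds a graph Laplacian from per-edge migration weights and a simple smoothness penalty that encourages edges sharing a node to carry similar log-migration rates. The node indexing, weight parameterisation, and penalty form are illustrative assumptions.

```python
# A minimal sketch (not the FEEMS implementation): a spatial graph with one
# migration weight per edge, the corresponding weighted graph Laplacian, and a
# penalty that pushes neighbouring edges toward similar log-migration rates.
import numpy as np

def laplacian_from_edges(n_nodes, edges, log_w):
    """Weighted graph Laplacian; edges is a list of (i, j) node-index pairs,
    log_w holds one log-migration weight per edge."""
    L = np.zeros((n_nodes, n_nodes))
    for (i, j), lw in zip(edges, log_w):
        w = np.exp(lw)              # effective migration rate on this edge
        L[i, j] -= w
        L[j, i] -= w
        L[i, i] += w
        L[j, j] += w
    return L

def smoothness_penalty(edges, log_w, lam=1.0):
    """Penalize differences in log-weight between edges that share a node."""
    pen = 0.0
    for a, (i1, j1) in enumerate(edges):
        for b in range(a + 1, len(edges)):
            i2, j2 = edges[b]
            if {i1, j1} & {i2, j2}:      # adjacent edges on the graph
                pen += (log_w[a] - log_w[b]) ** 2
    return lam * pen
```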