

Search for: All records

Creators/Authors contains: "Ipsen, Ilse"


  1. Free, publicly-accessible full text available March 31, 2024
  2. Abstract: Linear regression is a classic method of data analysis. In recent years, sketching (a method of dimension reduction using random sampling, random projections, or both) has gained popularity as an effective computational approximation when the number of observations greatly exceeds the number of variables. In this paper, we address the following question: how does sketching affect the statistical properties of the solution and key quantities derived from it? To answer this question, we present a projector-based approach to sketched linear regression that is exact and that requires minimal assumptions on the sketching matrix. Therefore, downstream analyses hold exactly and generally for all sketching schemes. Additionally, a projector-based approach enables derivation of key quantities from classic linear regression that account for the combined model- and algorithm-induced uncertainties. We demonstrate the usefulness of a projector-based approach in quantifying, and providing insight into, excess uncertainties and bias-variance decompositions for sketched linear regression. Finally, we demonstrate how the insights from our projector-based analyses can be used to produce practical sketching diagnostics to aid the design of judicious sketching schemes.
  3. The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting genome-wide gene-based G×E analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.
  4. This paper presents a probabilistic perspective on iterative methods for approximating the solution x in R^d of a nonsingular linear system Ax = b. Classically, an iterative method produces a sequence x_m of approximations that converge to x in R^d. Our approach, instead, lifts a standard iterative method to act on the set of probability distributions, P(R^d), outputting a sequence of probability distributions mu_m in P(R^d). The output of a probabilistic iterative method can provide both a “best guess” for x, for example by taking the mean of mu_m, and also probabilistic uncertainty quantification for the value of x when it has not been exactly determined. A comprehensive theoretical treatment is presented in the case of a stationary linear iterative method, where we characterise both the rate of contraction of mu_m to an atomic measure on x and the nature of the uncertainty quantification being provided. We conclude with an empirical illustration that highlights the potential for probabilistic iterative methods to provide insight into solution uncertainty.
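
The idea of lifting a stationary linear iterative method to probability distributions, as described in the abstract on probabilistic iterative methods above, can be illustrated with a small sketch. This is not the authors' implementation. It assumes a Gaussian initial distribution mu_0 = N(mean0, cov0) and a Jacobi splitting M = diag(A); both choices, and every name in the code, are hypothetical and picked only for illustration. Under these assumptions the iteration x_{m+1} = G x_m + f is an affine map, so each mu_m stays Gaussian with mean G mean_m + f and covariance G cov_m G^T.

```python
import numpy as np


def probabilistic_jacobi(A, b, mean0, cov0, num_iters):
    """Push a Gaussian N(mean0, cov0) through num_iters Jacobi steps.

    Illustrative assumption: the splitting matrix is M = diag(A), so the
    iteration is x_{m+1} = G x_m + f with G = I - M^{-1} A and f = M^{-1} b.
    """
    M_inv = np.diag(1.0 / np.diag(A))
    G = np.eye(A.shape[0]) - M_inv @ A
    f = M_inv @ b

    mean, cov = mean0.copy(), cov0.copy()
    for _ in range(num_iters):
        mean = G @ mean + f   # mean of the pushed-forward Gaussian
        cov = G @ cov @ G.T   # covariance of the pushed-forward Gaussian
    return mean, cov


# Small diagonally dominant system, for which the Jacobi iteration converges.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

mean, cov = probabilistic_jacobi(A, b, np.zeros(2), np.eye(2), num_iters=50)
x_true = np.linalg.solve(A, b)

print("mean of mu_m:    ", mean)
print("direct solution: ", x_true)
print("trace of cov_m:  ", np.trace(cov))
```

In this sketch the mean plays the role of the "best guess" for x, and the covariance shrinks toward zero as the distribution contracts to a point mass on the solution, provided the spectral radius of G is below 1.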