skip to main content

Title: Sparse confidence sets for normal mean models

In this paper, we propose a new framework to construct confidence sets for a $d$-dimensional unknown sparse parameter ${\boldsymbol \theta }$ under the normal mean model ${\boldsymbol X}\sim N({\boldsymbol \theta },\sigma ^{2}\bf{I})$. A key feature of the proposed confidence set is its capability to account for the sparsity of ${\boldsymbol \theta }$, thus named as sparse confidence set. This is in sharp contrast with the classical methods, such as the Bonferroni confidence intervals and other resampling-based procedures, where the sparsity of ${\boldsymbol \theta }$ is often ignored. Specifically, we require the desired sparse confidence set to satisfy the following two conditions: (i) uniformly over the parameter space, the coverage probability for ${\boldsymbol \theta }$ is above a pre-specified level; (ii) there exists a random subset $S$ of $\{1,...,d\}$ such that $S$ guarantees the pre-specified true negative rate for detecting non-zero $\theta _{j}$’s. To exploit the sparsity of ${\boldsymbol \theta }$, we allow the confidence interval for $\theta _{j}$ to degenerate to a single point 0 for any $j\notin S$. Under this new framework, we first consider whether there exist sparse confidence sets that satisfy the above two conditions. To address this question, we establish a non-asymptotic minimax lower bound for the non-coverage probability over a suitable class of sparse confidence sets. The lower bound deciphers the role of sparsity and minimum signal-to-noise ratio (SNR) in the construction of sparse confidence sets. Furthermore, under suitable conditions on the SNR, a two-stage procedure is proposed to construct a sparse confidence set. To evaluate the optimality, the proposed sparse confidence set is shown to attain a minimax lower bound of some properly defined risk function up to a constant factor. Finally, we develop an adaptive procedure to the unknown sparsity. Numerical studies are conducted to verify the theoretical results.

more » « less
Award ID(s):
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Page Range / eLocation ID:
p. 1193-1247
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Dalalyan, Aynak (Ed.)
    Topic models have become popular tools for dimension reduction and exploratory analysis of text data which consists in observed frequencies of a vocabulary of p words in n documents, stored in a p×n matrix. The main premise is that the mean of this data matrix can be factorized into a product of two non-negative matrices: a p×K word-topic matrix A and a K×n topic-document matrix W. This paper studies the estimation of A that is possibly element-wise sparse, and the number of topics K is unknown. In this under-explored context, we derive a new minimax lower bound for the estimation of such A and propose a new computationally efficient algorithm for its recovery. We derive a finite sample upper bound for our estimator, and show that it matches the minimax lower bound in many scenarios. Our estimate adapts to the unknown sparsity of A and our analysis is valid for any finite n, p, K and document lengths. Empirical results on both synthetic data and semi-synthetic data show that our proposed estimator is a strong competitor of the existing state-of-the-art algorithms for both non-sparse A and sparse A, and has superior performance is many scenarios of interest. 
    more » « less
  2. Low-rank matrix recovery is a fundamental problem in machine learning with numerous applications. In practice, the problem can be solved by convex optimization namely nuclear norm minimization, or by non-convex optimization as it is well-known that for low-rank matrix problems like matrix sensing and matrix completion, all local optima of the natural non-convex objectives are also globally optimal under certain ideal assumptions. In this paper, we study new approaches for matrix sensing in a semi-random model where an adversary can add any number of arbitrary sensing matrices. More precisely, the problem is to recover a low-rank matrix $X^\star$ from linear measurements $b_i = \langle A_i, X^\star \rangle$, where an unknown subset of the sensing matrices satisfies the Restricted Isometry Property (RIP) and the rest of the $A_i$'s are chosen adversarially. It is known that in the semi-random model, existing non-convex objectives can have bad local optima. To fix this, we present a descent-style algorithm that provably recovers the ground-truth matrix $X^\star$. For the closely-related problem of semi-random matrix completion, prior work [CG18] showed that all bad local optima can be eliminated by reweighting the input data. However, the analogous approach for matrix sensing requires reweighting a set of matrices to satisfy RIP, which is a condition that is NP-hard to check. Instead, we build on the framework proposed in [KLL$^+$23] for semi-random sparse linear regression, where the algorithm in each iteration reweights the input based on the current solution, and then takes a weighted gradient step that is guaranteed to work well locally. Our analysis crucially exploits the connection between sparsity in vector problems and low-rankness in matrix problems, which may have other applications in obtaining robust algorithms for sparse and low-rank problems. 
    more » « less
  3. Summary

    The upper bounds on the coverage probabilities of the confidence regions based on blockwise empirical likelihood and non-standard expansive empirical likelihood methods for time series data are investigated via studying the probability of violating the convex hull constraint. The large sample bounds are derived on the basis of the pivotal limit of the blockwise empirical log-likelihood ratio obtained under fixed b asymptotics, which has recently been shown to provide a more accurate approximation to the finite sample distribution than the conventional χ2-approximation. Our theoretical and numerical findings suggest that both the finite sample and the large sample upper bounds for coverage probabilities are strictly less than 1 and the blockwise empirical likelihood confidence region can exhibit serious undercoverage when the dimension of moment conditions is moderate or large, the time series dependence is positively strong or the block size is large relative to the sample size. A similar finite sample coverage problem occurs for non-standard expansive empirical likelihood. To alleviate the coverage bound problem, we propose to penalize both empirical likelihood methods by relaxing the convex hull constraint. Numerical simulations and data illustrations demonstrate the effectiveness of our proposed remedies in terms of delivering confidence sets with more accurate coverage. Some technical details and additional simulation results are included in on-line supplemental material.

    more » « less
  4. null (Ed.)
    Driven by a wide range of applications, several principal subspace estimation problems have been studied individually under different structural constraints. This paper presents a uni- fied framework for the statistical analysis of a general structured principal subspace estima- tion problem which includes as special cases sparse PCA/SVD, non-negative PCA/SVD, subspace constrained PCA/SVD, and spectral clustering. General minimax lower and up- per bounds are established to characterize the interplay between the information-geometric complexity of the constraint set for the principal subspaces, the signal-to-noise ratio (SNR), and the dimensionality. The results yield interesting phase transition phenomena concern- ing the rates of convergence as a function of the SNRs and the fundamental limit for consistent estimation. Applying the general results to the specific settings yields the mini- max rates of convergence for those problems, including the previous unknown optimal rates for sparse SVD, non-negative PCA/SVD and subspace constrained PCA/SVD. 
    more » « less
  5. Abstract

    Predicting sets of outcomes—instead of unique outcomes—is a promising solution to uncertainty quantification in statistical learning. Despite a rich literature on constructing prediction sets with statistical guarantees, adapting to unknown covariate shift—a prevalent issue in practice—poses a serious unsolved challenge. In this article, we show that prediction sets with finite-sample coverage guarantee are uninformative and propose a novel flexible distribution-free method, PredSet-1Step, to efficiently construct prediction sets with an asymptotic coverage guarantee under unknown covariate shift. We formally show that our method is asymptotically probably approximately correct, having well-calibrated coverage error with high confidence for large samples. We illustrate that it achieves nominal coverage in a number of experiments and a data set concerning HIV risk prediction in a South African cohort study. Our theory hinges on a new bound for the convergence rate of the coverage of Wald confidence intervals based on general asymptotically linear estimators.

    more » « less