Estimation and inference in statistics pose significant challenges when data are collected adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to be asymptotically normal for single-coordinate estimation, and its error can be inflated. This issue is highlighted by a recent minimax lower bound, which shows that the error of estimating a single coordinate can be enlarged by a multiple of $$\sqrt{d}$$ when data are allowed to be arbitrarily adaptive, compared with the case when they are i.i.d. Our work explores this striking difference in estimation performance between i.i.d. and adaptively collected data. We investigate how the degree of adaptivity in data collection impacts the performance of estimating a low-dimensional parameter component in high-dimensional linear models. We identify conditions on the data collection mechanism under which the estimation error for a low-dimensional parameter component matches its counterpart in the i.i.d. setting, up to a factor that depends on the degree of adaptivity. We show that OLS, or OLS on centered data, can achieve this matching error. In addition, we propose a novel estimator for single-coordinate inference via solving a Two-stage Adaptive Linear Estimating equation (TALE). Under a weaker form of adaptivity in data collection, we establish an asymptotic normality property of the proposed estimator.
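As a rough, self-contained illustration of the setting described in this abstract, the sketch below simulates one possible adaptive data-collection rule and compares plain OLS with OLS on centered data for estimating a single coordinate. The collection rule, dimensions, noise level, and the naive column-centering step are illustrative assumptions of this toy example, not the construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_once(n=500, d=10, sigma=1.0):
    """One run: adaptively collected covariates, then two estimators of theta[0]."""
    theta = np.zeros(d)
    theta[0] = 1.0                      # target coordinate
    X = np.zeros((n, d))
    y = np.zeros(n)
    running_mean = 0.0
    for t in range(n):
        # Illustrative adaptive rule: the first covariate depends on the
        # running average of past responses; the rest are i.i.d. Gaussian.
        x = rng.normal(size=d)
        x[0] = 1.0 if running_mean >= 0 else -1.0
        X[t] = x
        y[t] = x @ theta + sigma * rng.normal()
        running_mean = y[: t + 1].mean()
    # Plain OLS on the adaptively collected data.
    ols = np.linalg.lstsq(X, y, rcond=None)[0]
    # A naive form of "OLS on centered data": center each covariate column
    # and the response before solving the least-squares problem.
    Xc = X - X.mean(axis=0, keepdims=True)
    yc = y - y.mean()
    ols_centered = np.linalg.lstsq(Xc, yc, rcond=None)[0]
    return abs(ols[0] - theta[0]), abs(ols_centered[0] - theta[0])

errs = np.array([simulate_once() for _ in range(200)])
print("mean |error| of theta_1:  OLS = %.4f,  centered OLS = %.4f"
      % (errs[:, 0].mean(), errs[:, 1].mean()))
```

The simulation only illustrates how adaptively chosen covariates enter the two estimators; it does not reproduce the paper's theoretical guarantees.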
Minimax estimation of low-rank quantum states and their linear functionals
In classical statistics, a well-known paradigm consists in establishing asymptotic equivalence between an experiment of i.i.d. observations and a Gaussian shift experiment, with the aim of obtaining optimal estimators in the former, more complicated model from the latter, simpler model. In particular, a statistical experiment consisting of n i.i.d. observations from d-dimensional multinomial distributions can be well approximated by an experiment consisting of (d − 1)-dimensional Gaussian distributions. In a quantum version of the result, it has been shown that a collection of n qudits (d-dimensional quantum states) of full rank can be well approximated by a quantum system containing a classical part, which is a (d − 1)-dimensional Gaussian distribution, and a quantum part containing an ensemble of d(d − 1)/2 shifted thermal states. In this paper, we obtain a generalization of this result when the qudits are not of full rank. We show that when the rank of the qudits is r, the limiting experiment consists of an (r − 1)-dimensional Gaussian distribution and an ensemble of both shifted pure and shifted thermal states. For estimation purposes, we establish an asymptotic minimax result in the limiting Gaussian model. Analogous results are then obtained for estimation of a low-rank qudit from an ensemble of identically prepared, independent quantum systems, using the local asymptotic equivalence result. We also consider the problem of estimation of a linear functional of the quantum state. We construct an estimator for the functional, analyze the risk and use quantum local asymptotic equivalence to show that our estimator is also optimal in the minimax sense.
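As a point of reference for the classical statement above, recall the standard central limit heuristic behind the multinomial-to-Gaussian approximation (a textbook fact, not a result of this paper): if $$\hat{p}$$ denotes the empirical frequencies of $$n$$ i.i.d. draws from a $$d$$-category multinomial with probability vector $$p$$, then

$$\sqrt{n}\,(\hat{p}-p) \;\xrightarrow{d}\; \mathcal{N}\bigl(0,\ \operatorname{diag}(p)-pp^{\top}\bigr),$$

a limit whose covariance has rank $$d-1$$ (the coordinates of $$\hat{p}-p$$ sum to zero), which is why the approximating Gaussian experiment is $$(d-1)$$-dimensional. The local asymptotic equivalence used in the paper is a stronger statement than this convergence in distribution.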
- Award ID(s): 1915884
- PAR ID: 10487185
- Publisher / Repository: Bernoulli Society
- Date Published:
- Journal Name: Bernoulli
- Volume: 30
- Issue: 1
- ISSN: 1350-7265
- Subject(s) / Keyword(s): Functional estimation; low rank states; quantum local asymptotic normality; quantum minimax estimation
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- In this paper, we develop a novel procedure for low-rank tensor regression, namely Importance Sketching Low-rank Estimation for Tensors (ISLET). The central idea behind ISLET is importance sketching, i.e., carefully designed sketches based on both the responses and the low-dimensional structure of the parameter of interest. We show that the proposed method is sharply minimax optimal in terms of the mean-squared error under low-rank Tucker assumptions and under the randomized Gaussian ensemble design. In addition, if a tensor is low-rank with group sparsity, our procedure also achieves minimax optimality. Further, we show through numerical studies that ISLET achieves mean-squared error performance comparable to or better than existing state-of-the-art methods while having substantial storage and run-time advantages, including capabilities for parallel and distributed computing. In particular, our procedure performs reliable estimation with tensors of dimension $p = O(10^8)$ and is one to two orders of magnitude faster than baseline methods.
- Estimating the mean of a probability distribution using i.i.d. samples is a classical problem in statistics, wherein finite-sample optimal estimators are sought under various distributional assumptions. In this paper, we consider the problem of mean estimation when independent samples are drawn from $$d$$-dimensional non-identical distributions possessing a common mean. When the distributions are radially symmetric and unimodal, we propose a novel estimator, which is a hybrid of the modal interval, shorth and median estimators and whose performance adapts to the level of heterogeneity in the data. We show that our estimator is near optimal when data are i.i.d. and when the fraction of ‘low-noise’ distributions is as small as $$\varOmega \left (\frac{d \log n}{n}\right )$$, where $$n$$ is the number of samples. We also derive minimax lower bounds on the expected error of any estimator that is agnostic to the scales of individual data points. Finally, we extend our theory to linear regression. In both the mean estimation and regression settings, we present computationally feasible versions of our estimators that run in time polynomial in the number of data points.
- We propose an efficient estimator for the coefficients in censored quantile regression using the envelope model. The envelope model uses dimension reduction techniques to identify material and immaterial components in the data, and forms the estimator based only on the material component, thus reducing the variability of estimation. We demonstrate the guaranteed asymptotic efficiency gain of our proposed envelope estimator over the traditional estimator for censored quantile regression. Our analysis begins with the local weighting approach that traditionally relies on semiparametric estimation involving the conditional Kaplan–Meier estimator. We instead invoke the independent and identically distributed (i.i.d.) representation of the Kaplan–Meier estimator, which eliminates this infinite-dimensional nuisance and transforms our objective function into a process indexed by only a Euclidean parameter. The modified estimation problem becomes entirely parametric and hence more amenable to analysis. We also reconsider the i.i.d. representation of the conditional Kaplan–Meier estimator.
- We study the fundamental problem of estimating the mean of a $$d$$-dimensional distribution with covariance $$\Sigma \preceq \sigma^2 I_d$$ given $$n$$ samples. When $$d=1$$, \cite{catoni} showed an estimator with error $$(1+o(1))\cdot\sigma\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$$, with probability $$1-\delta$$, matching the Gaussian error rate. For $$d>1$$, a natural estimator outputs the center of the minimum enclosing ball of one-dimensional confidence intervals to achieve a $$1-\delta$$ confidence radius of $$\sqrt{\frac{2d}{d+1}}\cdot\sigma\left(\sqrt{\frac{d}{n}}+\sqrt{\frac{2\log\frac{1}{\delta}}{n}}\right)$$, incurring a $$\sqrt{\frac{2d}{d+1}}$$-factor loss over the Gaussian rate (a toy numerical illustration of this radius appears in the sketch following this list). When the $$\sqrt{\frac{d}{n}}$$ term dominates by a $$\sqrt{\log\frac{1}{\delta}}$$ factor, \cite{lee2022optimal-highdim} showed an improved estimator matching the Gaussian rate. This raises a natural question: is the $$\sqrt{\frac{2d}{d+1}}$$ loss necessary when the $$\sqrt{\frac{2\log\frac{1}{\delta}}{n}}$$ term dominates? We show that the answer is no: we construct an estimator that improves over the above naive estimator by a constant factor. We also consider robust estimation, where an adversary is allowed to corrupt an $$\epsilon$$-fraction of samples arbitrarily; in this case, we show that the above strategy of combining one-dimensional estimates and incurring the $$\sqrt{\frac{2d}{d+1}}$$ factor is optimal in the infinite-sample limit.
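As a toy illustration of the naive bound quoted in the last item above, the sketch below plugs the radius $$\sqrt{2d/(d+1)}\cdot\sigma(\sqrt{d/n}+\sqrt{2\log(1/\delta)/n})$$ into a small Monte Carlo coverage check. The Gaussian data, the parameter choices, and the use of the plain empirical mean as a stand-in for the center-of-enclosing-ball estimator are all simplifying assumptions for illustration, not the construction from that paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def naive_radius(d, n, sigma, delta):
    """Confidence radius quoted in the abstract above:
    sqrt(2d/(d+1)) * sigma * ( sqrt(d/n) + sqrt(2*log(1/delta)/n) )."""
    return np.sqrt(2 * d / (d + 1)) * sigma * (
        np.sqrt(d / n) + np.sqrt(2 * np.log(1 / delta) / n)
    )

# Monte Carlo check: how often does the empirical mean (a simple stand-in
# for the center-of-enclosing-ball estimator) fall within that radius of
# the true mean for Gaussian data?
d, n, sigma, delta, trials = 20, 200, 1.0, 0.05, 2000
mu = rng.normal(size=d)
radius = naive_radius(d, n, sigma, delta)
hits = 0
for _ in range(trials):
    sample = mu + sigma * rng.normal(size=(n, d))
    if np.linalg.norm(sample.mean(axis=0) - mu) <= radius:
        hits += 1
print(f"radius = {radius:.3f}, empirical coverage = {hits / trials:.3f} "
      f"(target >= {1 - delta})")
```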