Summary We consider peer effect estimation in social network models where some network links are incorrectly measured. We show that if the number or magnitude of mismeasured links does not grow too quickly with the sample size, then standard instrumental variables estimators that ignore these measurement errors remain consistent, and standard asymptotic inference methods remain valid. These results hold even when the link measurement errors are correlated with regressors or with structural errors in the model. Simulations and real data experiments confirm our results in finite samples. These findings imply that researchers can ignore small numbers of mismeasured links in networks.
more »
« less
This content will become publicly available on January 1, 2026
With random regressors, least squares inference is robust to correlated errors with unknown correlation structure
Abstract Linear regression is arguably the most widely used statistical method. With fixed regressors and correlated errors, the conventional wisdom is to modify the variance-covariance estimator to accommodate the known correlation structure of the errors. We depart from existing literature by showing that with random regressors, linear regression inference is robust to correlated errors with unknown correlation structure. The existing theoretical analyses for linear regression are no longer valid because even the asymptotic normality of the least squares coefficients breaks down in this regime. We first prove the asymptotic normality of the t statistics by establishing their Berry–Esseen bounds based on a novel probabilistic analysis of self-normalized statistics. We then study the local power of the corresponding t tests and show that, perhaps surprisingly, error correlation can even enhance power in the regime of weak signals. Overall, our results show that linear regression is applicable more broadly than the conventional theory suggests, and they further demonstrate the value of randomization for ensuring robustness of inference.
more »
« less
- Award ID(s):
- 1945136
- PAR ID:
- 10625613
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Biometrika
- Volume:
- 112
- Issue:
- 1
- ISSN:
- 1464-3510
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance of the least squares estimator while obtaining asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.more » « less
-
Sequential data collection has emerged as a widely adopted technique for enhancing the efficiency of data gathering processes. Despite its advantages, such data collection mechanism often introduces complexities to the statistical inference procedure. For instance, the ordinary least squares (OLS) estimator in an adaptive linear regression model can exhibit non-normal asymptotic behavior, posing challenges for accurate inference and interpretation. In this paper, we propose a general method for constructing debiased estimator which remedies this issue. It makes use of the idea of adaptive linear estimating equations, and we establish theoretical guarantees of asymptotic normality, supplemented by discussions on achieving near-optimal asymptotic variance. A salient feature of our estimator is that in the context of multi-armed bandits, our estimator retains the non-asymptotic performance of the least square estimator while obtaining asymptotic normality property. Consequently, this work helps connect two fruitful paradigms of adaptive inference: a) non-asymptotic inference using concentration inequalities and b) asymptotic inference via asymptotic normality.more » « less
-
Estimation and inference in statistics pose significant challenges when data are collected adaptively. Even in linear models, the Ordinary Least Squares (OLS) estimator may fail to exhibit asymptotic normality for single coordinate estimation and have inflated error. This issue is highlighted by a recent minimax lower bound, which shows that the error of estimating a single coordinate can be enlarged by a multiple of $$\sqrt{d}$$ when data are allowed to be arbitrarily adaptive, compared with the case when they are i.i.d. Our work explores this striking difference in estimation performance between utilizing i.i.d. and adaptive data. We investigate how the degree of adaptivity in data collection impacts the performance of estimating a low-dimensional parameter component in high-dimensional linear models. We identify conditions on the data collection mechanism under which the estimation error for a low-dimensional parameter component matches its counterpart in the i.i.d. setting, up to a factor that depends on the degree of adaptivity. We show that OLS or OLS on centered data can achieve this matching error. In addition, we propose a novel estimator for single coordinate inference via solving a Two-stage Adaptive Linear Estimating equation (TALE). Under a weaker form of adaptivity in data collection, we establish an asymptotic normality property of the proposed estimator.more » « less
-
Summary Combining dependent $ p $-values poses a long-standing challenge in statistical inference, particularly when aggregating findings from multiple methods to enhance signal detection. Recently, $ p $-value combination tests based on regularly-varying-tailed distributions, such as the Cauchy combination test and harmonic mean $ p $-value, have attracted attention for their robustness to unknown dependence. This paper provides a theoretical and empirical evaluation of these methods under an asymptotic regime where the number of $ p $-values is fixed and the global test significance level approaches zero. We examine two types of dependence among the $ p $-values. First, when $ p $-values are pairwise asymptotically independent, such as with bivariate normal test statistics with no perfect correlation, we prove that these combination tests are asymptotically valid. However, they become equivalent to the Bonferroni test as the significance level tends to zero for both one-sided and two-sided $ p $-values. Empirical investigations suggest that this equivalence can emerge at moderately small significance levels. Second, under pairwise quasi-asymptotic dependence, such as with bivariate $ t $-distributed test statistics, our simulations suggest that these combination tests can remain valid and exhibit notable power gains over the Bonferroni test, even as the significance level diminishes. These findings highlight the potential advantages of these combination tests in scenarios where $ p $-values exhibit substantial dependence. Our simulations also examine how test performance depends on the support and tail heaviness of the underlying distributions.more » « less
An official website of the United States government
