Abstract: Linear regression is arguably the most widely used statistical method. With fixed regressors and correlated errors, the conventional wisdom is to modify the variance-covariance estimator to accommodate the known correlation structure of the errors. We depart from existing literature by showing that with random regressors, linear regression inference is robust to correlated errors with unknown correlation structure. The existing theoretical analyses for linear regression are no longer valid because even the asymptotic normality of the least squares coefficients breaks down in this regime. We first prove the asymptotic normality of the t statistics by establishing their Berry–Esseen bounds based on a novel probabilistic analysis of self-normalized statistics. We then study the local power of the corresponding t tests and show that, perhaps surprisingly, error correlation can even enhance power in the regime of weak signals. Overall, our results show that linear regression is applicable more broadly than the conventional theory suggests, and they further demonstrate the value of randomization for ensuring robustness of inference.
Consistency of the Hill Estimator for Time Series Observed with Measurement Errors
We investigate the asymptotic and finite sample behavior of the Hill estimator applied to time series contaminated by measurement or other errors. We show that for all discrete time models used in practice, whose non‐contaminated marginal distributions are regularly varying, the Hill estimator is consistent. Essentially, the only assumption on the errors is that they have lighter tails than the underlying unobservable process. The asymptotic justification, however, depends on the specific class of models assumed for the underlying unobservable process. We show by means of a simulation study that the asymptotic robustness of the Hill estimator is clearly manifested in finite samples. We further illustrate this robustness by a numerical study of the interarrival times of anomalies in a backbone internet network, the Internet2 in the United States; the anomaly arrival times are computed with a roundoff error.
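For readers unfamiliar with the estimator, the following is a minimal sketch of the standard Hill estimator, not the authors' code. It assumes an i.i.d. Pareto sample with tail index 2 rather than a time series, and it mimics roundoff contamination by rounding to one decimal place; the function names `hill_estimator` and `demo_estimates`, the sample size, and the choice of `k` are all illustrative assumptions.

```python
import numpy as np


def hill_estimator(x, k):
    """Hill estimate of 1/alpha from the k largest observations.

    Computes (1/k) * sum_{i=1..k} log(X_(i) / X_(k+1)), where
    X_(1) >= X_(2) >= ... are the descending order statistics.
    """
    x = np.asarray(x, dtype=float)
    if not 1 <= k < x.size:
        raise ValueError("k must satisfy 1 <= k < len(x)")
    top = np.sort(x)[::-1][: k + 1]  # k+1 largest values, descending
    if top[-1] <= 0:
        raise ValueError("the k+1 largest observations must be positive")
    return float(np.mean(np.log(top[:k] / top[k])))


def demo_estimates(seed=0, n=20_000, k=500, alpha=2.0):
    """Compare estimates on a clean and a rounded Pareto sample.

    Assumption: i.i.d. data and rounding to one decimal stand in for the
    contaminated time series studied in the paper.
    """
    rng = np.random.default_rng(seed)
    clean = rng.pareto(alpha, size=n) + 1.0  # classical Pareto, minimum 1
    rounded = np.round(clean, 1)  # bounded (light-tailed) roundoff error
    return hill_estimator(clean, k), hill_estimator(rounded, k)


if __name__ == "__main__":
    clean_est, rounded_est = demo_estimates()
    print(f"clean: {clean_est:.3f}  rounded: {rounded_est:.3f}  target: 0.5")
```

Under these assumptions both estimates should land near 1/alpha = 0.5, which is the finite-sample robustness the abstract describes. In practice the result depends on the number of upper order statistics `k`, so the estimate is usually inspected over a range of `k` values.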
- PAR ID: 10246705
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Journal of Time Series Analysis
- Volume: 41
- Issue: 3
- ISSN: 0143-9782
- Page Range / eLocation ID: p. 421-435
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper proposes a nonuniform subsampling method for finite mixtures of regression models to reduce the computational burden of large data sets. A general estimator based on a subsample is investigated, and its asymptotic normality is established. We assign optimal subsampling probabilities to data points that minimize the asymptotic mean squared errors of the general estimator and linearly transformed estimators. Since the proposed probabilities depend on unknown parameters, an implementable algorithm is developed. We first approximate the optimal subsampling probabilities using a pilot sample. After that, we select a subsample using the approximated subsampling probabilities and compute estimates using the subsample. We evaluate the proposed method in a simulation study and present a real data example using appliance energy data.
- Summary: Panel count data arise when the number of recurrent events experienced by each subject is observed intermittently at discrete examination times. The examination time process can be informative about the underlying recurrent event process even after conditioning on covariates. We consider a semiparametric accelerated mean model for the recurrent event process and allow the two processes to be correlated through a shared frailty. The regression parameters have a simple marginal interpretation of modifying the time scale of the cumulative mean function of the event process. A novel estimation procedure for the regression parameters and the baseline rate function is proposed based on a conditioning technique. In contrast to existing methods, the proposed method is robust in the sense that it requires neither the strong Poisson-type assumption for the underlying recurrent event process nor a parametric assumption on the distribution of the unobserved frailty. Moreover, the distribution of the examination time process is left unspecified, allowing for arbitrary dependence between the two processes. Asymptotic consistency of the estimator is established, and the variance of the estimator is estimated by a model-based smoothed bootstrap procedure. Numerical studies demonstrated that the proposed point estimator and variance estimator perform well with practical sample sizes. The methods are applied to data from a skin cancer chemoprevention trial.
- We propose a two-stage estimation procedure for a copula-based model with semi-competing risks data, where the non-terminal event is subject to dependent censoring by the terminal event, and both events are subject to independent censoring. With a copula-based model, the marginal survival functions of individual event times are specified by semiparametric transformation models, and the dependence between the bivariate event times is specified by a parametric copula function. For the estimation procedure, in the first stage, the parameters associated with the marginal of the terminal event are estimated using only the corresponding observed outcomes, and in the second stage, the marginal parameters for the non-terminal event time and the copula parameter are estimated together by maximizing a pseudo-likelihood function based on the joint distribution of the bivariate event times. We derived the asymptotic properties of the proposed estimator and provided an analytic variance estimator for inference. Through simulation studies, we showed that our approach leads to consistent estimates with less computational cost and more robustness than the one-stage procedure developed in Chen (2012), where all parameters were estimated simultaneously. In addition, our approach demonstrates more desirable finite-sample performance than another existing two-stage estimation method proposed in Zhu et al. (2021). An R package PMLE4SCR is developed to implement our proposed method.
- Abstract: Structural nested mean models (SNMMs) are useful for causal inference of treatment effects in longitudinal observational studies. Most existing works assume that the data are collected at prefixed time points for all subjects, which, however, may be restrictive in practice. To deal with irregularly spaced observations, we assume a class of continuous‐time SNMMs and a martingale condition of no unmeasured confounding (NUC) to identify the causal parameters. We develop the semiparametric efficiency theory and locally efficient estimators for continuous‐time SNMMs. This task is nontrivial due to the restrictions from the NUC assumption imposed on the SNMM parameter. In the presence of ignorable censoring, we show that the complete‐case estimator is optimal among a class of weighting estimators including the inverse probability of censoring weighting estimator, and it achieves a double robustness feature in that it is consistent if at least one of the models for the potential outcome mean function and the treatment process is correctly specified. The new framework allows us to conduct causal analysis respecting the underlying continuous‐time nature of data processes. The simulation study shows that the proposed estimator outperforms existing approaches. We estimate the effect of time to initiate highly active antiretroviral therapy on the CD4 count at year 2 from the observational Acute Infection and Early Disease Research Program database.
