
Title: Inference for Linear Models with Dependent Errors

The paper is concerned with inference for linear models with fixed regressors and weakly dependent stationary time series errors. Theoretically, we obtain asymptotic normality for the M-estimator of the regression parameter under mild conditions and establish a uniform Bahadur representation for recursive M-estimators. Methodologically, we extend the recently proposed self-normalized approach of Shao from stationary time series to the regression set-up, where the sequence of response variables is typically non-stationary in mean. Since the limiting distribution of the self-normalized statistic depends on the design matrix and its corresponding critical values are case dependent, we develop a simulation-based approach to approximate the critical values consistently. Through a simulation study, we demonstrate favourable finite sample performance of our method in comparison with a block-bootstrap-based approach. Empirical illustrations using two real data sets are also provided.

Publisher / Repository: Oxford University Press
Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
Pages: 323-343
Sponsoring Org: National Science Foundation
More Like this
  1. Summary

    We propose a new method to construct confidence intervals for quantities that are associated with a stationary time series, which avoids direct estimation of the asymptotic variances. Unlike the existing tuning-parameter-dependent approaches, our method has the attractive convenience of being free of any user-chosen number or smoothing parameter. The interval is constructed on the basis of an asymptotically distribution-free self-normalized statistic, in which the normalizing matrix is computed by using recursive estimates. Under mild conditions, we establish the theoretical validity of our method for a broad class of statistics that are functionals of the empirical distribution of fixed or growing dimension. From a practical point of view, our method is conceptually simple, easy to implement and can be readily used by the practitioner. Monte Carlo simulations are conducted to compare the finite sample performance of the new method with those delivered by the normal approximation and the block bootstrap approach.
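    The construction described above can be sketched for the simplest functional, the mean of a stationary series. Everything here is an illustrative reading of the summary, not the paper's exact procedure: recursive estimates are the running sample means, the normalizing quantity is built from them without any bandwidth, and the critical value of the pivotal (but non-standard) limiting distribution is approximated by simulating Brownian motion. Function and variable names are mine.

```python
import numpy as np

def sn_confidence_interval(x, level=0.95, reps=2000, grid=500, seed=0):
    """Self-normalized confidence interval for the mean of a stationary
    series, using recursive estimates; an illustrative sketch only."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Recursive estimates: sample mean of the first t observations.
    rec = np.cumsum(x) / np.arange(1, n + 1)
    xbar = rec[-1]
    # Self-normalizer built from the recursive estimates; note that no
    # user-chosen smoothing parameter appears anywhere.
    t = np.arange(1, n + 1)
    D = np.sum((t * (rec - xbar)) ** 2) / n**2
    # The statistic n*(xbar - mu)^2 / D converges to the distribution-free
    # limit W(1)^2 / int_0^1 (W(r) - r*W(1))^2 dr, whose critical value we
    # approximate by Monte Carlo simulation of Brownian motion.
    rng = np.random.default_rng(seed)
    dt = 1.0 / grid
    W = np.cumsum(rng.standard_normal((reps, grid)) * np.sqrt(dt), axis=1)
    r = np.arange(1, grid + 1) / grid
    bridge = W - r * W[:, -1:]
    denom = np.sum(bridge**2, axis=1) * dt
    crit = np.quantile(W[:, -1] ** 2 / denom, level)
    half = np.sqrt(crit * D / n)
    return xbar - half, xbar + half
```

    Because the normalizer uses only the recursive means, the interval is tuning-parameter free; the price is a non-standard limit whose quantiles must be simulated once.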

  2. Summary

    We develop a new class of continuous-time autoregressive fractionally integrated moving average (CARFIMA) models which are useful for modelling regularly spaced and irregularly spaced discrete time long memory data. We derive the autocovariance function of a stationary CARFIMA model and study maximum likelihood estimation of a regression model with CARFIMA errors, based on discrete time data and via the innovations algorithm. It is shown that the maximum likelihood estimator is asymptotically normal, and its finite sample properties are studied through simulation. The efficacy of the approach proposed is demonstrated with a data set from an environmental study.
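    The innovations algorithm mentioned in the summary is the classical recursion of Brockwell and Davis for one-step prediction from an autocovariance sequence; in the CARFIMA setting the autocovariances would come from the model's derived autocovariance function, which this sketch does not attempt to reproduce. The recursion itself, in a minimal numpy form:

```python
import numpy as np

def innovations(gamma, n):
    """Innovations algorithm: given autocovariances gamma[0..n] of a
    stationary process, compute the one-step prediction coefficients
    theta[m, j] and prediction variances v[m].  Illustrative sketch."""
    gamma = np.asarray(gamma, dtype=float)
    theta = np.zeros((n + 1, n + 1))
    v = np.zeros(n + 1)
    v[0] = gamma[0]
    for m in range(1, n + 1):
        for k in range(m):
            s = sum(theta[k, k - j] * theta[m, m - j] * v[j] for j in range(k))
            theta[m, m - k] = (gamma[m - k] - s) / v[k]
        v[m] = gamma[0] - sum(theta[m, m - k] ** 2 * v[k] for k in range(m))
    return theta, v
```

    For an MA(1) process with coefficient 0.5 and unit innovation variance (autocovariances 1.25, 0.5, 0, ...), the coefficients theta[m, 1] converge to 0.5 and the prediction variances v[m] converge to 1, which is how the recursion feeds a Gaussian likelihood.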

  3. Summary

    We provide a new definition of breakdown in finite samples, with an extension to asymptotic breakdown. Previous definitions centre on defining a critical region for either the parameter or the objective function. If for a particular outlier configuration the critical region is entered, breakdown is said to occur. In contrast with the traditional approach, we leave the definition of the critical region implicit. Our proposal encompasses previous definitions of breakdown in linear and non-linear regression settings. In some cases, it leads to a different and more intuitive notion of breakdown than other procedures that are available. An important advantage of our new definition is that it also applies to models for dependent observations where current definitions of breakdown typically fail. We illustrate our suggestion by using examples from linear and non-linear regression, and time series.

  4. Background

    Metamodels can address some of the limitations of complex simulation models by formulating a mathematical relationship between input parameters and simulation model outcomes. Our objective was to develop and compare the performance of a machine learning (ML)–based metamodel against a conventional metamodeling approach in replicating the findings of a complex simulation model.


    We constructed 3 ML-based metamodels using random forest, support vector regression, and artificial neural networks and a linear regression-based metamodel from a previously validated microsimulation model of the natural history of hepatitis C virus (HCV) consisting of 40 input parameters. Outcomes of interest included societal costs and quality-adjusted life-years (QALYs), the incremental cost-effectiveness ratio (ICER) of HCV treatment versus no treatment, cost-effectiveness acceptability curve (CEAC), and expected value of perfect information (EVPI). We evaluated metamodel performance using root mean squared error (RMSE) and Pearson's R² on the normalized data.


    The R² values for the linear regression metamodel for QALYs without treatment, QALYs with treatment, societal cost without treatment, societal cost with treatment, and ICER were 0.92, 0.98, 0.85, 0.92, and 0.60, respectively. The corresponding R² values for our ML-based metamodels were 0.96, 0.97, 0.90, 0.95, and 0.49 for support vector regression; 0.99, 0.83, 0.99, 0.99, and 0.82 for artificial neural network; and 0.99, 0.99, 0.99, 0.99, and 0.98 for random forest. Similar trends were observed for RMSE. The CEAC and EVPI curves produced by the random forest metamodel matched the results of the simulation output more closely than the linear regression metamodel.


    ML-based metamodels generally outperformed traditional linear regression metamodels at replicating results from complex simulation models, with random forest metamodels performing best.


    Decision-analytic models are frequently used by policy makers and other stakeholders to assess the impact of new medical technologies and interventions. However, complex models can impose limitations on conducting probabilistic sensitivity analysis and value-of-information analysis, and may not be suitable for developing online decision-support tools. Metamodels, which accurately formulate a mathematical relationship between input parameters and model outcomes, can replicate complex simulation models and address these limitations. The machine learning-based random forest model can outperform linear regression in replicating the findings of a complex simulation model. Such a metamodel can be used for conducting cost-effectiveness and value-of-information analyses or developing online decision support tools.
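    The metamodeling workflow described above can be illustrated with a toy, numpy-only sketch: sample input-parameter sets, run an "expensive" simulator on them, fit a linear metamodel and a flexible one, and compare test-set R². The simulator here is a made-up nonlinear function, and k-nearest-neighbour averaging stands in for the study's random forest / SVR / ANN metamodels purely to keep the example dependency-free.

```python
import numpy as np

def simulator(theta):
    """Toy stand-in for a complex microsimulation: a nonlinear map from
    3 input parameters to an outcome (e.g. QALYs). Purely illustrative."""
    return np.sin(theta[:, 0]) * theta[:, 1] + 0.5 * theta[:, 2] ** 2

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 3))   # sampled input-parameter sets
y = simulator(X)                          # "expensive" model runs

# Linear metamodel: ordinary least squares on [1, X].
A = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def r2(y_true, y_pred):
    ss = np.sum((y_true - y_pred) ** 2)
    return 1.0 - ss / np.sum((y_true - y_true.mean()) ** 2)

# Held-out parameter sets to score both metamodels.
Xt = rng.uniform(-2, 2, size=(300, 3))
yt = simulator(Xt)
r2_lin = r2(yt, np.column_stack([np.ones(len(Xt)), Xt]) @ beta)

# Flexible metamodel: k-nearest-neighbour averaging, a simple stand-in
# for the random forest / SVR / ANN metamodels used in the study.
def knn_predict(Xtr, ytr, Xte, k=10):
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return ytr[idx].mean(axis=1)

r2_knn = r2(yt, knn_predict(X, y, Xt))
```

    On a nonlinear simulator like this one, the linear metamodel misses the interaction and quadratic structure while the flexible metamodel recovers it, mirroring the pattern the study reports for random forests versus linear regression.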

  5. Abstract

    We consider the proportional hazards model in which the covariates include the discretized categories of a continuous time‐dependent exposure variable measured with error. Naively ignoring the measurement error in the analysis may cause biased estimation and erroneous inference. Although various approaches have been proposed to deal with measurement error when the hazard depends linearly on the time‐dependent variable, it has not yet been investigated how to correct when the hazard depends on the discretized categories of the time‐dependent variable. To fill this gap in the literature, we propose a smoothed corrected score approach based on approximation of the discretized categories after smoothing the indicator function. The consistency and asymptotic normality of the proposed estimator are established. The observation times of the time‐dependent variable are allowed to be informative. For comparison, we also extend to this setting two approximate approaches, the regression calibration and the risk‐set regression calibration. The methods are assessed by simulation studies and by application to data from an HIV clinical trial.
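    The key smoothing device the abstract alludes to, replacing the non-differentiable indicator of a discretized exposure category with a smooth surrogate, can be sketched as follows. The logistic kernel and the bandwidth h below are illustrative choices, not the paper's exact construction.

```python
import numpy as np

def smooth_ind(x, c, h):
    """Logistic smoothing of the indicator 1{x > c}; as the bandwidth
    h -> 0 this recovers the hard indicator.  Illustrative choice."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(x, dtype=float) - c) / h))

def smooth_category(x, c_lo, c_hi, h):
    """Smoothed version of 1{c_lo < x <= c_hi}, i.e. membership in one
    discretized exposure category, as a difference of smoothed indicators."""
    return smooth_ind(x, c_lo, h) - smooth_ind(x, c_hi, h)
```

    Smoothing makes the category membership differentiable in the underlying exposure, which is what allows a corrected-score argument to handle the measurement error in the time-dependent variable.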
