Title: Robust estimation for linear panel data models
In fields of application including, but not limited to, the behavioral, environmental, and medical sciences and econometrics, panel data regression models have become increasingly popular as a general framework for making meaningful statistical inferences. However, when the ordinary least squares (OLS) method is used to estimate the model parameters, the presence of outliers may significantly compromise the adequacy of such models by producing biased and inefficient estimates. In this work, we propose a new, weighted-likelihood-based robust estimation procedure for linear panel data models with fixed and random effects. The finite-sample performance of the proposed estimators is illustrated through an extensive simulation study as well as an application to a blood pressure dataset. Our study demonstrates that the proposed estimators perform significantly better than the traditional methods in the presence of outliers and remain competitive with the OLS-based estimates when no outliers are present in the dataset.
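The abstract does not specify the weighted-likelihood estimator itself, so the sketch below only illustrates the general idea it improves on: apply the fixed-effects (within) transformation, then downweight outlying observations. The Huber-type IRLS weighting, the tuning constant, and all names are illustrative assumptions, not the authors' method.

```python
import numpy as np

def within_transform(y, X, groups):
    """Demean y and X within each panel unit (the fixed-effects transformation)."""
    yt = y.astype(float).copy()
    Xt = X.astype(float).copy()
    for g in np.unique(groups):
        m = groups == g
        yt[m] -= yt[m].mean()
        Xt[m] -= Xt[m].mean(axis=0)
    return yt, Xt

def huber_irls(y, X, c=1.345, n_iter=50):
    """Generic Huber-type iteratively reweighted least squares (illustrative)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]                  # OLS start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r - np.median(r))) / 0.6745 + 1e-12  # MAD scale
        u = np.maximum(np.abs(r) / s, 1e-12)
        w = np.where(u <= c, 1.0, c / u)                         # Huber weights
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta
```

On contaminated panel data this robust fit stays close to the true slope where the plain OLS fit on the demeaned data is pulled toward the outliers.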
Award ID(s):
1811384
PAR ID:
10455875
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Statistics in Medicine
Volume:
39
Issue:
29
ISSN:
0277-6715
Format(s):
Medium: X
Size(s):
p. 4421-4438
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: This article presents generalized semiparametric regression models for conditional cumulative incidence functions with competing risks data when covariates are missing by sampling design or happenstance. A doubly robust augmented inverse probability weighted (AIPW) complete-case approach to estimation and inference is investigated. This approach modifies IPW complete-case estimating equations by exploiting the key features in the relationship between the missing covariates and the phase-one data to improve efficiency. An iterative numerical procedure is derived to solve the nonlinear estimating equations. The asymptotic properties of the proposed estimators are established. A simulation study examining the finite-sample performance of the proposed estimators shows that the AIPW estimators are more efficient than the IPW estimators. The developed method is applied to the RV144 HIV-1 vaccine efficacy trial to investigate vaccine-induced IgG binding antibodies to HIV-1 as correlates of acquisition of HIV-1 infection while taking account of whether the HIV-1 sequences are near or far from the HIV-1 sequences represented in the vaccine construct.
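The article's AIPW estimator targets cumulative incidence functions under competing risks, which is beyond a short sketch; the toy version below shows the same doubly robust idea for the simplest target, a mean E[Y] with outcomes missing at random. All names and the simulation design are illustrative assumptions.

```python
import numpy as np

def aipw_mean(y, r, pi_hat, mu_hat):
    """Augmented IPW (doubly robust) estimate of E[Y] when Y is observed
    only for units with R == 1, missing at random given covariates.
    pi_hat : estimated P(R = 1 | X);  mu_hat : outcome model E[Y | X].
    Consistent if EITHER model is correct."""
    y0 = np.where(r == 1, y, 0.0)          # unobserved Y never enters
    return np.mean(r * y0 / pi_hat - (r - pi_hat) / pi_hat * mu_hat)
```

The augmentation term recenters the IPW term using the outcome model, which is the source of both the double robustness and the efficiency gain over plain IPW that the simulation study in the abstract documents.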
  2. Abstract: Randomized experiments are the gold standard for causal inference and enable unbiased estimation of treatment effects. Regression adjustment provides a convenient way to incorporate covariate information for additional efficiency. This article provides a unified account of its utility for improving estimation efficiency in multiarmed experiments. We start with the commonly used additive and fully interacted models for regression adjustment in estimating average treatment effects (ATE), and clarify the trade-offs between the resulting ordinary least squares (OLS) estimators in terms of finite sample performance and asymptotic efficiency. We then move on to regression adjustment based on restricted least squares (RLS), and establish for the first time its properties for inferring ATE from the design-based perspective. The resulting inference has multiple guarantees. First, it is asymptotically efficient when the restriction is correctly specified. Second, it remains consistent as long as the restriction on the coefficients of the treatment indicators, if any, is correctly specified and separate from that on the coefficients of the treatment-covariate interactions. Third, it can have better finite sample performance than the unrestricted counterpart even when the restriction is moderately misspecified. It is thus our recommendation when the OLS fit of the fully interacted regression risks large finite sample variability in case of many covariates, many treatments, yet a moderate sample size. In addition, the newly established theory of RLS also provides a unified way of studying OLS-based inference from general regression specifications. As an illustration, we demonstrate its value for studying OLS-based regression adjustment in factorial experiments. Importantly, although we analyse inferential procedures that are motivated by OLS, we do not invoke any assumptions required by the underlying linear models.
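The fully interacted OLS adjustment the abstract starts from can be sketched for a two-arm experiment: regress the outcome on treatment, centered covariates, and their interactions, and read off the treatment coefficient as the ATE estimate. Variable names and the simulation design are illustrative, and this two-arm sketch does not cover the multiarmed or RLS results of the article.

```python
import numpy as np

def lin_adjusted_ate(y, z, X):
    """ATE estimate from the fully interacted OLS adjustment in a
    two-arm experiment.  With covariates centered at their sample means,
    the coefficient on the treatment indicator z is the ATE estimate."""
    Xc = X - X.mean(axis=0)                # center covariates
    D = np.column_stack([np.ones(len(y)), z, Xc, z[:, None] * Xc])
    beta = np.linalg.lstsq(D, y, rcond=None)[0]
    return beta[1]
```

Centering is what makes the z coefficient equal the difference of the two group-specific adjusted means; with uncentered covariates the coefficient would instead be the effect at X = 0.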
  3. Abstract: We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning. These popular doubly robust estimators combine outcome modelling with balancing weights—weights that achieve covariate balance directly instead of estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine those of the original outcome model with those from unpenalized ordinary least-squares (OLS). Under certain choices of regularization parameters, the augmented estimator in fact collapses to the OLS estimator alone. We then extend these results to specific outcome and weighting models. We first show that the augmented estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression—implying a novel analysis of undersmoothing. When the weighting model is instead lasso-penalized, we demonstrate a familiar ‘double selection’ property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.
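The collapse the abstract describes can be checked numerically in a toy case: with min-norm weights that exactly balance the covariate means and an unpenalized linear outcome model, the augmentation term vanishes and the estimator equals plain OLS imputation. This is a simplified hedged instance, not the paper's general (possibly infinite-basis) framework; all names are illustrative.

```python
import numpy as np

def augmented_balancing(y, t, X):
    """Augmented balancing-weights estimate of the counterfactual control
    mean for the treated, E[Y(0) | T = 1].  Controls receive min-norm
    weights that exactly balance (1, X) to the treated means; the outcome
    model is unpenalized OLS fit on the controls."""
    Ac = np.column_stack([np.ones((t == 0).sum()), X[t == 0]])
    At = np.column_stack([np.ones((t == 1).sum()), X[t == 1]])
    yc = y[t == 0]
    b = At.mean(axis=0)                          # balance targets: 1, treated X-means
    w = Ac @ np.linalg.solve(Ac.T @ Ac, b)       # min-norm exact-balance weights
    beta = np.linalg.lstsq(Ac, yc, rcond=None)[0]  # linear outcome model
    return b @ beta + w @ (yc - Ac @ beta)       # outcome model + augmentation
```

Because OLS residuals are orthogonal to the columns of Ac, the augmentation term w @ (yc - Ac @ beta) is exactly zero here, which is the collapse to OLS in this linear special case.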
  4. Growth curve models (GCMs), with their ability to directly investigate within-subject change over time and between-subject differences in change for longitudinal data, are widely used in social and behavioral sciences. While GCMs are typically studied under the normal distribution assumption, empirical data often violate normality in applications. Failure to account for the deviation from normality in the data distribution may lead to unreliable model estimation and misleading statistical inferences. A robust GCM based on conditional medians was recently proposed and was shown to outperform traditional growth curve modeling when outliers induce nonnormality. However, this robust approach performed less satisfactorily when leverage observations were present. In this work, we propose a robust double medians growth curve modeling approach (DOME GCM) to thoroughly disentangle the influence of data contamination on model estimation and inferences, where two conditional medians are employed for the distributions of the within-subject measurement errors and of the random effects, respectively. Model estimation and inferences are conducted in the Bayesian framework, and Laplace distributions are used to convert the optimization problem of median estimation into a problem of obtaining the maximum likelihood estimator for a transformed model. A Monte Carlo simulation study evaluating the numerical performance of the proposed approach shows that it yields more accurate and efficient parameter estimates when data contain outliers or leverage observations. The application of the developed robust approach is illustrated using a real dataset from the Virginia Cognitive Aging Project to study the change of memory ability.
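The Laplace device mentioned in the abstract rests on a standard fact: maximizing a Laplace likelihood in the regression function is equivalent to least-absolute-deviations (conditional-median) regression. A minimal frequentist IRLS sketch of that equivalence is below; it is not the authors' Bayesian DOME GCM, and all names and settings are illustrative.

```python
import numpy as np

def lad_irls(X, y, n_iter=100, eps=1e-6):
    """Least-absolute-deviations fit via iteratively reweighted least squares.
    Maximizing a Laplace likelihood in the location (regression) parameter
    is the same as minimizing sum |y - X b|, i.e. conditional-median fitting."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]       # OLS starting values
    for _ in range(n_iter):
        r = y - X @ beta
        w = 1.0 / np.maximum(np.abs(r), eps)          # IRLS weights for the L1 loss
        Xw = X * w[:, None]
        beta = np.linalg.solve(X.T @ Xw, Xw.T @ y)    # weighted normal equations
    return beta
```

With 10% contaminated observations, the median-based fit stays near the true slope while OLS does not, mirroring the outlier robustness the abstract attributes to conditional-median GCMs.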
  5. Abstract: Computational models of the cardiovascular system are increasingly used for the diagnosis, treatment, and prevention of cardiovascular disease. Before being used for translational applications, the predictive abilities of these models need to be thoroughly demonstrated through verification, validation, and uncertainty quantification. When results depend on multiple uncertain inputs, sensitivity analysis is typically the first step required to separate relevant from unimportant inputs, and is key to determining an initial reduction in problem dimensionality that will significantly affect the cost of all downstream analysis tasks. For computationally expensive models with numerous uncertain inputs, sample-based sensitivity analysis may become impractical due to the substantial number of model evaluations it typically necessitates. To overcome this limitation, we consider recently proposed Multifidelity Monte Carlo estimators for Sobol’ sensitivity indices, and demonstrate their applicability to an idealized model of the common carotid artery. Variance reduction is achieved by combining a small number of three-dimensional fluid–structure interaction simulations with affordable one- and zero-dimensional reduced-order models. These multifidelity Monte Carlo estimators are compared with traditional Monte Carlo and polynomial chaos expansion estimates. Specifically, we show consistent sensitivity ranks for both bi- (1D/0D) and tri-fidelity (3D/1D/0D) estimators, and superior variance reduction compared to traditional single-fidelity Monte Carlo estimators for the same computational budget. As the computational burden of Monte Carlo estimators for Sobol’ indices is significantly affected by the problem dimensionality, polynomial chaos expansion is found to have lower computational cost for idealized models with smooth stochastic response.
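The multifidelity estimators themselves are beyond a short sketch, but the single-fidelity Monte Carlo baseline they improve upon can be written compactly. This pick-freeze sketch of first-order Sobol' indices assumes independent U(0,1) inputs; the particular estimator variant and all names are illustrative, not taken from the article.

```python
import numpy as np

def sobol_first_order(f, d, n, rng):
    """Pick-freeze Monte Carlo estimate of first-order Sobol' indices for a
    model f with d independent U(0,1) inputs (Saltelli-style estimator).
    Costs n * (d + 2) model evaluations."""
    A = rng.random((n, d))
    B = rng.random((n, d))
    fA, fB = f(A), f(B)
    var = np.concatenate([fA, fB]).var()
    S = np.empty(d)
    for i in range(d):
        ABi = A.copy()
        ABi[:, i] = B[:, i]                       # "freeze" input i from B
        S[i] = np.mean(fB * (f(ABi) - fA)) / var  # first-order index estimate
    return S
```

The n * (d + 2) evaluation cost is exactly what makes this baseline impractical for expensive 3D simulations and motivates replacing most evaluations with cheap reduced-order models.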