Title: Relative efficiency of using summary versus individual data in random‐effects meta‐analysis
Abstract

Meta‐analysis is a statistical methodology for combining information from diverse sources so that a more reliable and efficient conclusion can be reached. It can be conducted by either synthesizing study‐level summary statistics or drawing inference from an overarching model for individual participant data (IPD) if available. The latter is often viewed as the “gold standard.” For random‐effects models, however, it remains not fully understood whether the use of IPD indeed gains efficiency over summary statistics. In this paper, we examine the relative efficiency of the two methods under a general likelihood inference setting. We show theoretically and numerically that summary‐statistics‐based analysis is at most as efficient as IPD analysis, provided that the random effects follow the Gaussian distribution, and maximum likelihood estimation is used to obtain summary statistics. More specifically, (i) the two methods are equivalent in an asymptotic sense; and (ii) summary‐statistics‐based inference can incur an appreciable loss of efficiency if the sample sizes are not sufficiently large. Our results are established under the assumption that the between‐study heterogeneity parameter remains constant regardless of the sample sizes, which differs from the assumption made in a previous study. Our findings are confirmed by the analyses of simulated data sets and a real‐world study of alcohol interventions.
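The equivalence claim can be illustrated with a small simulation. The sketch below is not the paper's code: the variable names, the simulated setup, and the choice to treat the between‐study variance tau² and within‐study variance sigma² as known are all illustrative assumptions (the paper estimates the heterogeneity parameter by maximum likelihood).

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
mu_true, tau2, sigma2 = 1.0, 0.25, 1.0   # overall mean, between-/within-study variances
sizes = rng.integers(20, 80, size=20)    # unequal study sizes

# Simulate IPD: theta_i ~ N(mu, tau2), y_ij ~ N(theta_i, sigma2)
theta = rng.normal(mu_true, np.sqrt(tau2), len(sizes))
ipd = [rng.normal(t, np.sqrt(sigma2), n) for t, n in zip(theta, sizes)]

# Summary-statistics route: per-study ML estimate (the sample mean) combined
# with inverse-variance weights under the random-effects model
ybar = np.array([y.mean() for y in ipd])
w = 1.0 / (sigma2 / sizes + tau2)
mu_summary = np.sum(w * ybar) / np.sum(w)

# IPD route: maximize the marginal Gaussian likelihood of all observations,
# whose within-study marginal covariance is sigma2*I + tau2*J
def ipd_negloglik(m):
    nll = 0.0
    for y in ipd:
        n = len(y)
        V = sigma2 * np.eye(n) + tau2 * np.ones((n, n))
        r = y - m
        nll += 0.5 * r @ np.linalg.solve(V, r)   # only the mu-dependent part
    return nll

mu_ipd = minimize_scalar(ipd_negloglik, bounds=(-5.0, 5.0), method="bounded").x
```

With both variance components held fixed, the summary‐statistics estimator and the IPD maximum likelihood estimator of the overall mean coincide up to optimizer tolerance, consistent with the asymptotic‐equivalence result; the efficiency loss discussed in the abstract arises once the variance components must be estimated from limited per‐study samples.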

 
NSF-PAR ID:
10455841
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrics
Volume:
76
Issue:
4
ISSN:
0006-341X
Format(s):
Medium: X
Size(s):
p. 1319-1329
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Case-crossover designs are widely used to study short-term exposure effects on the risk of acute adverse health events. While the frequentist literature on this topic is vast, there is no Bayesian work in this general area. The contribution of this paper is twofold. First, the paper establishes Bayesian equivalence results that require characterization of the set of priors under which the posterior distributions of the risk ratio parameters based on a case-crossover and time-series analysis are identical. Second, the paper studies inferential issues under case-crossover designs in a Bayesian framework. Traditionally, a conditional logistic regression is used for inference on risk-ratio parameters in case-crossover studies. We consider instead a more general full likelihood-based approach which makes less restrictive assumptions on the risk functions. Formulation of a full likelihood leads to growth in the number of parameters proportional to the sample size. We propose a semi-parametric Bayesian approach using a Dirichlet process prior to handle the random nuisance parameters that appear in a full likelihood formulation. We carry out a simulation study to compare the Bayesian methods based on full and conditional likelihood with the standard frequentist approaches for case-crossover and time-series analysis. The proposed methods are illustrated through the Detroit Asthma Morbidity, Air Quality and Traffic study, which examines the association between acute asthma risk and ambient air pollutant concentrations.
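As context for the conditional logistic baseline mentioned above, here is a minimal sketch of the conditional log‐likelihood for matched case‐crossover strata. The data layout and names are hypothetical; a real analysis would use an established conditional logistic routine rather than this toy function.

```python
import numpy as np

def cond_logistic_negloglik(beta, strata):
    """Negative conditional log-likelihood over matched strata.
    strata: list of (exposure_vector, case_index) pairs, one per event;
    within a stratum, the case period is compared against all periods."""
    nll = 0.0
    for x, case_idx in strata:
        eta = beta * x                                   # linear predictor per period
        nll -= eta[case_idx] - np.log(np.sum(np.exp(eta)))
    return nll

# One illustrative stratum: the case period (index 0) was exposed,
# the two referent periods for the same subject were not
strata = [(np.array([1.0, 0.0, 0.0]), 0)]
```

At beta = 0 each of the three periods is equally likely to be the case period, so the stratum contributes log 3 to the negative log‐likelihood; the full likelihood approach in the paper instead models the stratum‐specific baseline risks directly, which is what creates the growing set of nuisance parameters handled by the Dirichlet process prior.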

     
  2. Abstract

    Children exposed to mixtures of endocrine disrupting compounds such as phthalates are at high risk of experiencing significant friction in their growth and sexual maturation. This article is primarily motivated by a study that aims to assess the toxicants‐modified effects of risk factors related to the hazards of early or delayed onset of puberty among children living in Mexico City. To address the hypothesis of potential nonlinear modification of covariate effects, we propose a new Cox regression model with multiple functional covariate‐environment interactions, which allows covariate effects to be altered nonlinearly by mixtures of exposed toxicants. This new class of models is rather flexible and includes many existing semiparametric Cox models as special cases. To achieve efficient estimation, we develop the global partial likelihood method of inference, in which we establish key large‐sample results, including estimation consistency, asymptotic normality, semiparametric efficiency and the generalized likelihood ratio test for both parameters and nonparametric functions. The proposed methodology is examined via simulation studies and applied to the analysis of the motivating data, where maternal exposures to phthalates during the third trimester of pregnancy are found to be important risk modifiers for the age of attaining the first stage of puberty. The Canadian Journal of Statistics 47: 204–221; 2019 © 2019 Statistical Society of Canada
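The idea of a covariate effect that varies with an exposure mixture can be conveyed with a toy Cox partial likelihood. This is only a crude stand‐in: it uses a linear basis beta(u) = b0 + b1·u for the varying coefficient, whereas the paper estimates the interaction functions nonparametrically via global partial likelihood; all names and the example data are made up.

```python
import numpy as np

def cox_partial_negloglik(b, time, event, x, u):
    """Negative Cox partial log-likelihood with a varying coefficient
    beta(u) = b0 + b1*u modifying the effect of covariate x."""
    b0, b1 = b
    order = np.argsort(time)                  # sort so risk sets are suffixes
    eta = ((b0 + b1 * u) * x)[order]
    event = np.asarray(event)[order]
    nll = 0.0
    for i in np.where(event == 1)[0]:
        nll -= eta[i] - np.log(np.sum(np.exp(eta[i:])))   # risk set at t_(i)
    return nll

# Tiny illustrative data set: 4 subjects, 2 events, exposure scores u
time  = np.array([3.0, 1.0, 4.0, 2.0])
event = np.array([1, 1, 0, 0])
x     = np.array([1.0, 0.0, 1.0, 0.0])
u     = np.array([0.5, 0.2, 0.9, 0.1])
```

At b = (0, 0) the two events contribute log 4 and log 2 (their risk‐set sizes), so the negative partial log‐likelihood equals log 8; maximizing over (b0, b1) would recover a parametric approximation to the exposure‐modified effect.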

     
  3. We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time. We first provide a central limit theorem for the Adaptive Augmented Inverse-Probability Weighted estimator, which is semiparametric efficient, under weaker assumptions than those previously made in the literature. This central limit theorem enables efficient inference at fixed sample sizes. We then consider a sequential inference setting, deriving both asymptotic and nonasymptotic confidence sequences that are considerably tighter than previous methods. These anytime-valid methods enable inference under data-dependent stopping times (sample sizes). Additionally, we use propensity score truncation techniques from the recent off-policy estimation literature to reduce the finite sample variance of our estimator without affecting the asymptotic variance. Empirical results demonstrate that our methods yield narrower confidence sequences than those previously developed in the literature while maintaining time-uniform error control. 
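The non‐adaptive AIPW score underlying this line of work can be sketched as follows. The function names, the plug‐in standard error, and the truncation constant are illustrative assumptions; the adaptive version studied in the paper additionally accounts for time‐varying treatment policies.

```python
import numpy as np

def aipw_ate(y, a, pi, mu1, mu0, eps=0.01):
    """AIPW estimate of the average treatment effect with a plug-in SE.
    y: outcomes, a: 0/1 treatment, pi: propensity P(A=1|X),
    mu1/mu0: outcome-model predictions under treatment/control.
    pi is clipped away from 0 and 1, mirroring propensity truncation."""
    pi = np.clip(pi, eps, 1.0 - eps)
    s1 = mu1 + a * (y - mu1) / pi            # augmented IPW term, treated arm
    s0 = mu0 + (1 - a) * (y - mu0) / (1 - pi)  # augmented IPW term, control arm
    s = s1 - s0                              # per-subject influence values
    return s.mean(), s.std(ddof=1) / np.sqrt(len(s))

# Degenerate check: if the outcome models are exactly correct and y = a,
# every influence value equals 1, so the estimate is 1 with zero SE
a = np.array([1, 0, 1, 0])
est, se = aipw_ate(y=a.astype(float), a=a, pi=np.full(4, 0.5),
                   mu1=np.ones(4), mu0=np.zeros(4))
```

The standard error here is the fixed‐sample plug‐in; the confidence sequences in the paper replace it with anytime‐valid boundaries so that inference remains correct under data‐dependent stopping.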
  4. Abstract

    This paper deals with making inference on parameters of a two-level model matching the design hierarchy of a two-stage sample. In a pioneering paper, Scott and Smith (Journal of the American Statistical Association, 1969, 64, 830–840) proposed a Bayesian model-based or prediction approach to estimating a finite population mean under two-stage cluster sampling. We provide a brief account of their pioneering work. We review two methods for the analysis of two-level models based on matching two-stage samples. Those methods are based on pseudo maximum likelihood and pseudo composite likelihood taking account of design weights. We then propose a new method for analysis of two-level models based on a normal approximation to the estimated cluster effects and taking account of design weights. This method does not require cluster sizes to be constants or unrelated to cluster effects. We evaluate the relative performance of the three methods in a simulation study. Finally, we apply the methods to real data obtained from the 2011 Nepal Demographic and Health Survey (NDHS).
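The role of design weights at both stages can be shown with a minimal Hájek‐type estimator of a finite‐population mean under two‐stage sampling. This is a generic sketch, not any of the three methods compared in the paper; the data layout is hypothetical.

```python
import numpy as np

def two_stage_weighted_mean(clusters):
    """Design-weighted (Hajek-type) mean from a two-stage sample.
    clusters: list of (w_cluster, w_within, y) where w_cluster is the
    first-stage weight and w_within, y are per-unit arrays for that cluster."""
    num = sum(wc * np.sum(ww * y) for wc, ww, y in clusters)
    den = sum(wc * np.sum(ww) for wc, ww, y in clusters)
    return num / den

# With equal weights everywhere the estimator reduces to the overall mean
clusters = [(1.0, np.ones(2), np.array([1.0, 3.0])),
            (1.0, np.ones(2), np.array([2.0, 4.0]))]
est = two_stage_weighted_mean(clusters)
```

The pseudo‐likelihood methods reviewed in the paper insert these same weights into the two‐level log‐likelihood contributions rather than into a simple mean, which is what lets them target the finite‐population model parameters under informative sampling.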

     
  5. Summary

    In longitudinal data analysis one frequently encounters non-Gaussian data that are repeatedly collected for a sample of individuals over time. The repeated observations could be binomial, Poisson or of another discrete type or could be continuous. The timings of the repeated measurements are often sparse and irregular. We introduce a latent Gaussian process model for such data, establishing a connection to functional data analysis. The functional methods proposed are non-parametric and computationally straightforward as they do not involve a likelihood. We develop functional principal components analysis for this situation and demonstrate the prediction of individual trajectories from sparse observations. This method can handle missing data and leads to predictions of the functional principal component scores which serve as random effects in this model. These scores can then be used for further statistical analysis, such as inference, regression, discriminant analysis or clustering. We illustrate these non-parametric methods with longitudinal data on primary biliary cirrhosis and show in simulations that they are competitive in comparisons with generalized estimating equations and generalized linear mixed models.
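The functional principal components decomposition at the heart of this approach can be sketched for the simple dense, regular‐grid case. Note the paper's contribution is the sparse, irregular setting handled by conditional expectation of the scores; the discretized eigendecomposition below only conveys the mean‐plus‐components structure, and all names are illustrative.

```python
import numpy as np

def fpca(Y, n_components=2):
    """Discretized FPCA for curves observed on a common dense grid.
    Y: (n_subjects, n_timepoints) array of trajectories."""
    mu = Y.mean(axis=0)                       # estimated mean function
    C = np.cov(Y - mu, rowvar=False)          # covariance surface on the grid
    vals, vecs = np.linalg.eigh(C)
    idx = np.argsort(vals)[::-1][:n_components]
    phi = vecs[:, idx]                        # leading eigenfunctions
    scores = (Y - mu) @ phi                   # FPC scores (the random effects)
    return mu, vals[idx], phi, scores

# Illustrative data: 30 subjects observed at 10 common time points
rng = np.random.default_rng(1)
Y = rng.normal(size=(30, 10))
mu_hat, eigvals, phi, scores = fpca(Y, n_components=3)
```

The per‐subject scores returned here are what the paper treats as random effects and predicts from sparse observations by conditioning; downstream analyses (regression, clustering, discriminant analysis) then operate on these low‐dimensional scores instead of the raw trajectories.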

     