Title: A bias correction method in meta‐analysis of randomized clinical trials with no adjustments for zero‐inflated outcomes
Summary

Many clinical endpoint measures, such as the number of standard drinks consumed per week or the number of days that patients stayed in the hospital, are count data with excessive zeros. However, the zero‐inflated nature of such outcomes is sometimes ignored in analyses of clinical trials. Ignoring it leads to biased estimates of the study‐level intervention effect and, consequently, a biased estimate of the overall intervention effect in a meta‐analysis. The current study proposes a novel statistical approach, the Zero‐inflation Bias Correction (ZIBC) method, that corrects the bias introduced when the Poisson regression model is used despite a high rate of inflated zeros in the outcome distribution of a randomized clinical trial. The correction requires only summary information from individual studies to adjust intervention effect estimates as if they had been estimated appropriately with the zero‐inflated Poisson regression model, so it is attractive for meta‐analysis when individual participant‐level data are not available for some studies. Simulation studies and real data analyses showed that the ZIBC method performed well in correcting zero‐inflation bias in most situations.
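The source of the bias described above can be illustrated with a short sketch. It is not the ZIBC method itself, whose correction formula is not given in this summary. It only shows, under assumed arm-level parameters, why a Poisson regression with a single treatment indicator (whose fitted coefficient equals the log ratio of the arm sample means) estimates a different quantity than the count part of a zero‐inflated Poisson model. All names and parameter values (`PI_CONTROL`, `LAM_TREATED`, `draw_zip`, and so on) are hypothetical and chosen for illustration.

```python
import math
import random


def zip_mean(pi_zero, lam):
    """Marginal mean of a zero-inflated Poisson outcome: (1 - pi) * lambda."""
    return (1.0 - pi_zero) * lam


def draw_zip(rng, pi_zero, lam):
    """Draw one zero-inflated Poisson count (Knuth's method for the Poisson part)."""
    if rng.random() < pi_zero:
        return 0  # structural (excess) zero
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def log_ratio_of_means(treated, control):
    """Treatment coefficient of a Poisson regression with only an arm indicator."""
    mean_treated = sum(treated) / len(treated)
    mean_control = sum(control) / len(control)
    return math.log(mean_treated / mean_control)


# Hypothetical parameters (assumptions, not values from the paper):
# excess-zero probability and Poisson rate in each arm.
PI_CONTROL, LAM_CONTROL = 0.5, 4.0
PI_TREATED, LAM_TREATED = 0.3, 3.0

# Effect on the count part, which a zero-inflated Poisson model targets.
count_part_effect = math.log(LAM_TREATED / LAM_CONTROL)

# Effect on the marginal mean, which the plain Poisson model targets.
marginal_effect = math.log(
    zip_mean(PI_TREATED, LAM_TREATED) / zip_mean(PI_CONTROL, LAM_CONTROL)
)

# The two differ by the log ratio of the non-excess-zero proportions.
gap = math.log((1.0 - PI_TREATED) / (1.0 - PI_CONTROL))

# Simulated trial: the naive estimate tracks the marginal effect.
rng = random.Random(2021)
n_per_arm = 50_000
control = [draw_zip(rng, PI_CONTROL, LAM_CONTROL) for _ in range(n_per_arm)]
treated = [draw_zip(rng, PI_TREATED, LAM_TREATED) for _ in range(n_per_arm)]
simulated_effect = log_ratio_of_means(treated, control)

print(f"count-part log rate ratio: {count_part_effect:.3f}")
print(f"marginal log mean ratio:   {marginal_effect:.3f}")
print(f"difference (gap):          {gap:.3f}")
print(f"simulated naive estimate:  {simulated_effect:.3f}")
```

With these assumed values, the count-part effect is log(0.75) ≈ -0.288 while the marginal effect is log(1.05) ≈ 0.049, so the naive analysis would suggest a slight increase where the count part shows a decrease. The difference, log(1.4) ≈ 0.336, depends only on the excess-zero proportions in the two arms, which is the kind of summary-level quantity a correction could plausibly draw on.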

 
Award ID(s):
2027855 2015373 1812048 1737857
NSF-PAR ID:
10449979
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Statistics in Medicine
Volume:
40
Issue:
26
ISSN:
0277-6715
Page Range / eLocation ID:
p. 5894-5909
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    This study proposes a two-stage approach to characterize individual developmental trajectories of health risk behaviours and to delineate their time varying effects on short-term or long-term health outcomes. Our model can accommodate longitudinal covariates with zero-inflated counts and discrete outcomes. The longitudinal data of a well-known study of youths at high risk of substance abuse are presented as a motivating example to demonstrate the effectiveness of the model in delineating critical developmental periods of prevention and intervention. Our simulation study shows that the performance of the model proposed improves as the sample size or number of time points increases. When there are excess 0s in the data, the regular Poisson model cannot estimate either the longitudinal covariate process or its time varying effect well. This result, therefore, emphasizes the important role that the model proposed plays in handling zero inflation in the data.

     
  2. Abstract. Background: Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy due to a too crude aggregation at those two levels. Results: We avoid the crude approximations entailed by such aggregation through proposing an independent Poisson distribution (IPD) particularly at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods. Conclusions: This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.
  3. Abstract. Background: Management actions that address local-scale stressors on coral reefs can rapidly improve water quality and reef ecosystem condition. In response to reef managers who need actionable thresholds for coastal runoff and dredging, we conducted a systematic review and meta-analysis of experimental studies that explore the effects of sediment on corals. We identified exposure levels that ‘adversely’ affect corals while accounting for sediment bearing (deposited vs. suspended), coral life-history stage, and species, thus providing empirically based estimates of stressor thresholds on vulnerable coral reefs. Methods: We searched online databases and grey literature to obtain a list of potential studies, assess their eligibility, and critically appraise them for validity and risk of bias. Data were extracted from eligible studies and grouped by sediment bearing and coral response to identify thresholds in terms of the lowest exposure levels that induced an adverse physiological and/or lethal effect. Meta-regression estimated the dose–response relationship between exposure level and the magnitude of a coral’s response, with random-effects structures to estimate the proportion of variance explained by factors such as study and coral species. Review findings: After critical appraisal of over 15,000 records, our systematic review of corals’ responses to sediment identified 86 studies to be included in meta-analyses (45 studies for deposited sediment and 42 studies for suspended sediment). The lowest sediment exposure levels that caused adverse effects in corals were well below the levels previously described as ‘normal’ on reefs: for deposited sediment, adverse effects occurred as low as 1 mg/cm²/day for larvae (limited settlement rates) and 4.9 mg/cm²/day for adults (tissue mortality); for suspended sediment, adverse effects occurred as low as 10 mg/L for juveniles (reduced growth rates) and 3.2 mg/L for adults (bleaching and tissue mortality).
Corals take at least 10 times longer to experience tissue mortality from exposure to suspended sediment than to comparable concentrations of deposited sediment, though physiological changes manifest 10 times faster in response to suspended sediment than to deposited sediment. Threshold estimates derived from continuous response variables (magnitude of adverse effect) largely matched the lowest-observed adverse-effect levels from a summary of studies, or otherwise helped us to identify research gaps that should be addressed to better quantify the dose–response relationship between sediment exposure and coral health. Conclusions: We compiled a global dataset that spans three oceans, over 140 coral species, decades of research, and a range of field- and lab-based approaches. Our review and meta-analysis inform the no-observed and lowest-observed adverse-effect levels (NOAEL, LOAEL) that are used in management consultations by U.S. federal agencies. In the absence of more location- or species-specific data to inform decisions, our results provide the best available information to protect vulnerable reef-building corals from sediment stress. Based on gaps and limitations identified by our review, we make recommendations to improve future studies and recommend future synthesis to disentangle the potentially synergistic effects of multiple coral-reef stressors.
  4. Abstract

    Population geneticists often use multiple independent hypothesis tests of Hardy–Weinberg Equilibrium (HWE), Linkage Disequilibrium (LD), and population differentiation to make broad inferences about their systems of choice. However, correction for Family‐Wise Error Rates (FWER), which are inflated by multiple comparisons, is sparingly reported in the current literature. A meta‐analysis in this issue of Molecular Ecology Resources covers 215 population genetics studies published between 2011 and 2013 and shows (i) scarce use of FWER corrections across all three classes of tests, and (ii) when corrections are used, inconsistent application of correction methods, with a clear bias towards less‐conservative corrections for tests of population differentiation than for tests of HWE and LD. Here we replicate this meta‐analysis using 205 population genetics studies published between 2013 and 2018 and show the same continued disuse and inconsistencies. We hope that both studies serve as a wake‐up call to population geneticists, reviewers, and editors to be rigorous about consistently correcting for FWER inflation.

     
  5. Machine learning (ML) methods for causal inference have gained popularity due to their flexibility to predict the outcome model and the propensity score. In this article, we provide a within-group approach for ML-based causal inference methods in order to robustly estimate average treatment effects in multilevel studies when there is cluster-level unmeasured confounding. We focus on one particular ML-based causal inference method based on the targeted maximum likelihood estimation (TMLE) with an ensemble learner called SuperLearner. Through our simulation studies, we observe that training TMLE within groups of similar clusters helps remove bias from cluster-level unmeasured confounders. Also, using within-group propensity scores estimated from fixed effects logistic regression increases the robustness of the proposed within-group TMLE method. Even if the propensity scores are partially misspecified, the within-group TMLE still produces robust ATE estimates due to double robustness with flexible modeling, unlike parametric-based inverse propensity weighting methods. We demonstrate our proposed methods and conduct sensitivity analyses against the number of groups and individual-level unmeasured confounding to evaluate the effect of taking an eighth-grade algebra course on math achievement in the Early Childhood Longitudinal Study.

     