Background: Outcome measures that are count variables with excessive zeros are common in health behaviors research. Examples include the number of standard drinks consumed or alcohol-related problems experienced over time. There is a lack of empirical data about the relative performance of prevailing statistical models for assessing the efficacy of interventions when outcomes are zero-inflated, particularly compared with recently developed marginalized count regression approaches for such data.
Methods: The current simulation study examined five commonly used approaches for analyzing count outcomes, including two linear models (with outcomes on raw and log-transformed scales, respectively) and three prevailing count distribution-based models (i.e., Poisson, negative binomial, and zero-inflated Poisson (ZIP) models). We also considered the marginalized zero-inflated Poisson (MZIP) model, a novel alternative that estimates the overall effects on the population mean while adjusting for zero-inflation. Motivated by alcohol misuse prevention trials, extensive simulations were conducted to evaluate and compare the statistical power and Type I error rate of the statistical models and approaches across data conditions that varied in sample size ( to 500), zero rate (0.2 to 0.8), and intervention effect sizes.
Results: Under zero-inflation, the Poisson model failed to control the Type I error rate, resulting in higher than expected false positive results. When the intervention effects on the zero (vs. non-zero) and count parts were in the same direction, the MZIP model had the highest statistical power, followed by the linear model with outcomes on the raw scale, the negative binomial model, and the ZIP model. The performance of the linear model with a log-transformed outcome variable was unsatisfactory.
Conclusions: The MZIP model demonstrated better statistical properties in detecting true intervention effects and controlling false positive results for zero-inflated count outcomes. The MZIP model may serve as an appealing analytical approach to evaluating overall intervention effects in studies with count outcomes marked by excessive zeros.
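To make the modeling contrast concrete, here is a minimal simulation sketch in the spirit of the study: zero-inflated drink counts for a two-arm trial, fit with an ordinary Poisson model and a ZIP model via statsmodels. The effect sizes and variable names are illustrative assumptions, and the MZIP model itself is omitted because it is not available in common Python statistics libraries.

```python
# Illustrative sketch (assumed parameters): simulate a zero-inflated count
# outcome for a two-arm trial and fit Poisson vs. zero-inflated Poisson (ZIP).
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedPoisson

rng = np.random.default_rng(42)
n = 500
trt = rng.integers(0, 2, n)                      # 0 = control, 1 = intervention

# Both the structural-zero probability and the Poisson mean depend on treatment,
# with effects in the same direction (intervention reduces drinking overall).
p_zero = 1 / (1 + np.exp(-(-0.5 + 0.6 * trt)))   # intervention -> more abstinence
lam = np.exp(1.2 - 0.3 * trt)                    # intervention -> fewer drinks
y = np.where(rng.random(n) < p_zero, 0, rng.poisson(lam))

X = sm.add_constant(trt.astype(float))
poisson_fit = sm.Poisson(y, X).fit(disp=False)           # ignores zero-inflation
zip_fit = ZeroInflatedPoisson(y, X, exog_infl=X).fit(disp=False)

print(poisson_fit.summary())
print(zip_fit.summary())   # separate zero (logit) and count (Poisson) parts
```

Under repeated simulation, the Poisson fit's treatment p-values exhibit the inflated false-positive behavior the abstract describes, while the ZIP fit separates the abstinence and consumption effects.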
Adjacency-Clustering and Its Application for Yield Prediction in Integrated Circuit Manufacturing
Accurate yield prediction in integrated circuit manufacturing enables reliable estimation of production cost and early detection of processing problems. It is known that defects tend to be clustered, and a chip is likely to be defective if its neighbors are defective. This neighborhood effect is not well captured in traditional yield modeling approaches. We propose a new yield prediction model, called adjacency-clustering, which addresses the neighborhood effect for the first time and delivers prediction results that are significantly better than those of state-of-the-art methods. The adjacency-clustering (AC) model is a form of the Markov random field (MRF) minimum energy model, primarily known in the context of image segmentation. The AC model is a novel use of MRF for identifying defect patterns that enable diagnosis of failure causes in the manufacturing process. In this paper we utilize the defect patterns obtained by the AC model for yield prediction. We compare the performance of the AC model to that of leading yield prediction models, including the Poisson, negative binomial, Poisson regression, and negative binomial regression models, on both real and simulated data sets. The results demonstrate that the adjacency-clustering model captures the neighborhood effect and delivers superior prediction accuracy. Moreover, the concept and methodology of adjacency-clustering are not limited to integrated circuit manufacturing. Rather, they apply in any context where a neighborhood effect is present, such as disease risk mapping and energy consumption prediction. The e-companion is available at https://doi.org/10.1287/opre.2018.1741 .
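For intuition about the minimum-energy idea, the sketch below runs iterated conditional modes (ICM), a simple MRF energy minimizer, on a simulated wafer map. This is a generic illustration of the neighborhood effect, not the paper's adjacency-clustering algorithm, and all parameters are assumptions.

```python
# Generic MRF minimum-energy sketch (iterated conditional modes) on a simulated
# wafer map: each chip's label trades off agreement with its noisy observation
# against agreement with its four grid neighbors. Illustration only; this is
# not the adjacency-clustering algorithm from the paper.
import numpy as np

rng = np.random.default_rng(0)
true = np.zeros((30, 30), dtype=int)
true[8:14, 10:18] = 1                                   # a clustered defect region
obs = np.where(rng.random(true.shape) < 0.15, 1 - true, true)  # 15% label noise

unary, pairwise = 1.0, 0.6    # data-fit cost vs. neighbor-disagreement cost
labels = obs.copy()
H, W = labels.shape
for _ in range(5):            # a few ICM sweeps usually suffice to converge
    for i in range(H):
        for j in range(W):
            nbrs = [labels[x, y]
                    for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= x < H and 0 <= y < W]
            costs = [unary * (lab != obs[i, j])
                     + pairwise * sum(lab != nb for nb in nbrs)
                     for lab in (0, 1)]
            labels[i, j] = int(np.argmin(costs))

print("noisy errors:", (obs != true).sum(), "-> ICM errors:", (labels != true).sum())
```

Because defective chips tend to cluster, the neighbor term cleans up isolated misclassifications, which is the effect the AC model exploits for yield prediction.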
- Award ID(s): 1760102
- PAR ID: 10357247
- Date Published:
- Journal Name: Operations Research
- Volume: 66
- Issue: 6
- ISSN: 0030-364X
- Page Range / eLocation ID: 1571 to 1585
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Low-photon count imaging has typically been modeled by Poisson statistics. This discrete probability distribution model assumes that the mean and variance of a signal are equal. In the presence of greater variability in a dataset than what is expected, the negative binomial distribution is a suitable overdispersed alternative to the Poisson distribution. In this work, we present a framework for reconstructing sparse signals in these low-count overdispersed settings. Specifically, we describe a gradient-based sequential quadratic optimization approach that minimizes the negative log-likelihood corresponding to the negative binomial distribution coupled with a sparsity-promoting regularization term. Numerical experiments on 1D and 2D sparse/compressible signals are presented.
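A minimal sketch of this kind of penalized estimation, with assumed stand-ins: a linear forward model, a known dispersion parameter, a nonnegativity constraint (under which the l1 penalty reduces to a plain sum), and scipy's general-purpose L-BFGS-B solver rather than the authors' sequential quadratic method.

```python
# Sketch: sparse recovery under a negative binomial (overdispersed) count model.
# Assumptions: forward model A, known dispersion r, nonnegative signal f.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(1)
n, m, r, tau = 80, 60, 5.0, 0.5        # signal length, measurements, dispersion, penalty
f_true = np.zeros(n)
f_true[rng.choice(n, 5, replace=False)] = 10.0
A = rng.random((m, n)) * 0.5
y = rng.negative_binomial(r, r / (r + A @ f_true))   # NB counts with mean A @ f_true

def objective(f):
    mu = A @ f + 1e-8                  # predicted NB mean, kept strictly positive
    nll = -(gammaln(y + r) - gammaln(r) - gammaln(y + 1)
            + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu))).sum()
    return nll + tau * f.sum()         # exact l1 penalty since f is nonnegative

f_hat = minimize(objective, np.full(n, 0.5), method="L-BFGS-B",
                 bounds=[(0.0, None)] * n).x
print("true support:", sorted(np.flatnonzero(f_true)))
print("largest recovered entries:", sorted(np.argsort(f_hat)[-5:]))
```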
Defect prediction aims to automatically identify potentially defective code with minimal human intervention and has been widely studied in the literature. Just-in-Time (JIT) defect prediction focuses on program changes rather than whole programs, and has been widely adopted in continuous testing. CC2Vec, a state-of-the-art JIT defect prediction tool, first constructs a hierarchical attention network (HAN) to learn distributed vector representations of both code additions and deletions, and then concatenates them with two other embedding vectors representing commit messages and overall code changes extracted by the existing DeepJIT approach to train a model for predicting whether a given commit is defective. Although CC2Vec has been shown to be the state of the art for JIT defect prediction, it was only evaluated on a limited dataset and not compared with all representative baselines. Therefore, to further investigate the efficacy and limitations of CC2Vec, this paper performs an extensive study of CC2Vec on a large-scale dataset with over 310,370 changes (8.3X larger than the original CC2Vec dataset). More specifically, we also empirically compare CC2Vec against DeepJIT and representative traditional JIT defect prediction techniques. The experimental results show that CC2Vec cannot consistently outperform DeepJIT, and neither of them can consistently outperform traditional JIT defect prediction. We also investigate the impact of individual traditional defect prediction features and find that the added-line-number feature outperforms other traditional features. Inspired by this finding, we construct a simplistic JIT defect prediction approach that simply adopts the added-line-number feature with a logistic regression classifier. Surprisingly, such a simplistic approach can outperform CC2Vec and DeepJIT in defect prediction, and can be 81kX/120kX faster in training/testing. Furthermore, the paper also provides various practical guidelines for advancing JIT defect prediction in the near future.
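As a hypothetical illustration of that simplistic baseline, here is a one-feature logistic regression in scikit-learn; the data is synthetic, standing in for the added-line counts that would really be parsed from each commit's diff.

```python
# Sketch of the one-feature JIT baseline described above: logistic regression
# on the number of added lines per commit. Synthetic data; in practice the
# feature is extracted from real commit diffs.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
added_lines = rng.poisson(30, 5000)                      # synthetic commit sizes
p_defect = 1 / (1 + np.exp(-(0.03 * added_lines - 2)))   # bigger change, riskier
defective = rng.random(5000) < p_defect

X_tr, X_te, y_tr, y_te = train_test_split(
    added_lines.reshape(-1, 1), defective, test_size=0.2, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```

The appeal of such a baseline is exactly what the study reports: training a single-feature linear classifier costs almost nothing compared with a deep model.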
Vladislav Sergeevich Sorokin (Ed.)
Phononic crystals can develop defects during manufacturing that alter the desired dynamic response and bandgap behavior. This change in frequency behavior can enable successful defect inspection if the characteristic defect response is known. In this study, the behavior of a defective square unit cell comprising a freed and shortened leg is studied using a wave finite element method and an approximate continuous-lumped model to elucidate the defect-induced qualitative dynamical features. These metrics are a computationally inexpensive alternative to modeling a defective unit cell within a large pristine array entirely in finite elements. The accuracy of these models is validated by comparing the results to a full finite element model. The impact of a shortened unit cell leg on the behavior of an infinite array of defective cells and of a finite array with a single defect is successfully predicted through dispersion curves and frequency response functions, respectively. These methods reveal defect-induced modes that split the local resonance bandgap of the pristine cell, as well as new anti-resonances resulting from the shortened leg. The study uses both approaches to evaluate the effect of defects in complex phononic crystal geometries and provides a comparative evaluation of the results of each model.
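For a flavor of what a dispersion curve and bandgap look like in the simplest lumped setting, here is a textbook 1D diatomic mass-spring chain sketch; all parameters are assumptions, and this generic stand-in is not the paper's square unit cell or its wave finite element formulation.

```python
# Generic sketch: dispersion of a 1D diatomic mass-spring chain, showing the
# bandgap between the acoustic and optical branches that defects can perturb.
import numpy as np

k, m1, m2 = 1.0, 1.0, 2.0                 # spring stiffness and the two masses
q = np.linspace(1e-6, np.pi, 400)         # nondimensional wavenumber over the zone
s = (m1 + m2) / (m1 * m2)
root = np.sqrt(s**2 - 4 * np.sin(q / 2)**2 / (m1 * m2))
w_acoustic = np.sqrt(k * (s - root))      # lower (acoustic) branch
w_optical = np.sqrt(k * (s + root))       # upper (optical) branch

# No propagating solutions exist between the branch edges: the bandgap.
print(f"bandgap: {w_acoustic.max():.3f} .. {w_optical.min():.3f}")
```

A defect such as a freed or shortened leg locally changes the effective mass or stiffness, which is how defect-induced modes can appear inside a gap like this one.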
There has been a growing number of datasets exhibiting an excess of zero values that cannot be adequately modeled using standard probability distributions. For example, microbiome data and single-cell RNA sequencing data consist of count measurements in which the proportion of zeros exceeds what can be captured by standard distributions such as the Poisson or negative binomial, while also requiring appropriate modeling of the nonzero counts. Several models have been proposed to address zero-inflated datasets, including the zero-inflated negative binomial, the hurdle negative binomial model, and the truncated latent Gaussian copula model. This study aims to compare various models and determine which one performs optimally under different conditions using both simulation studies and real data analyses. We are particularly interested in investigating how dependence among the variables, the level of zero-inflation or deflation, and the variance of the data affect model selection.
KEYWORDS: Zero-Inflated Models; Hurdle Models; Truncated Latent Gaussian Copula Model; Microbiome Data; Gene-Sequencing Data; Zero-Inflation; Negative Binomial; Zero-Deflation
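A quick, model-free way to see the kind of excess-zero mismatch this study examines is to compare a sample's observed zero fraction with the zero probability implied by moment-matched Poisson and negative binomial fits. The sketch below is a simple diagnostic under assumed simulation settings, not the study's model-selection procedure.

```python
# Sketch: diagnose excess zeros by comparing the observed zero fraction with
# what moment-matched Poisson and negative binomial fits would predict.
import numpy as np

rng = np.random.default_rng(3)
lam = rng.gamma(2.0, 2.0, 2000)                      # heterogeneous per-sample means
y = np.where(rng.random(2000) < 0.4, 0, rng.poisson(lam))  # 40% structural zeros

mu, var = y.mean(), y.var()
r = mu**2 / (var - mu)                               # NB dispersion via moments
print("observed zero fraction:", (y == 0).mean())
print("Poisson-implied zeros: ", np.exp(-mu))        # P(Y=0) under Poisson(mu)
print("NB-implied zeros:      ", (r / (r + mu))**r)  # P(Y=0) under NB(mu, r)
```

When the observed fraction clearly exceeds both implied values, zero-inflated or hurdle variants of these distributions become the natural candidates.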