Title: Exact Inference for Common Odds Ratio in Meta-Analysis with Zero-Total-Event Studies
Abstract Stemming from the high-profile publication of Nissen and Wolski (N Engl J Med 356:2457–2471, 2007) and the subsequent discussions with divergent views on how to handle observed zero-total-event studies, defined as studies that observe zero events in both the treatment and control arms, the common odds ratio model with zero-total-event studies remains an unresolved problem in meta-analysis. In this article, we address this problem by proposing a novel repro samples method to handle zero-total-event studies and make inference for the common odds ratio. The development explicitly accounts for the sampling scheme that generates the observed data and does not rely on any large-sample approximations. It is theoretically justified with guaranteed finite-sample performance. Simulation studies are designed to demonstrate the empirical performance of the proposed method. They show that the proposed confidence set, although slightly conservative, achieves the desired empirical coverage rate in all situations. The development also shows that zero-total-event studies contain meaningful information and impact the inference for the common odds ratio. The proposed method is used to perform a meta-analysis of the 48 trials reported in Nissen and Wolski (N Engl J Med 356:2457–2471, 2007) as well
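As a point of reference only, the sketch below computes the classical Mantel-Haenszel common odds ratio over a few hypothetical 2x2 tables. It is not the repro samples method proposed in the paper; it simply makes concrete why zero-total-event studies drop out of conventional estimators (they add zero to both the numerator and the denominator), which is the practice the exact approach above calls into question.

```python
# Minimal sketch (not the paper's repro samples method): the classical
# Mantel-Haenszel common odds ratio for K independent 2x2 tables.
# Each study is (events_trt, nonevents_trt, events_ctl, nonevents_ctl).
# Hypothetical counts; the last study is a zero-total-event study.

studies = [
    (2, 98, 1, 99),
    (0, 150, 3, 147),
    (1, 49, 0, 50),
    (0, 200, 0, 200),   # zero-total-event study
]

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio estimate."""
    num = den = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

print(mantel_haenszel_or(studies))
# The zero-total-event study adds 0 to both the numerator and the denominator,
# so it has no effect on this classical estimate -- in contrast to the exact
# approach described in the abstract, which shows such studies do carry
# information about the common odds ratio.
```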
Cui, Xia; Li, Runze; Yang, Guangren; Zhou, Wang
(, Biometrika)
Summary This paper is concerned with empirical likelihood inference on the population mean when the dimension $p$ and the sample size $n$ satisfy $p/n\rightarrow c\in [1,\infty)$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $p/n>1/2$ because the convex hull of the $n$ observations in $\mathbb{R}^p$ becomes too small to cover the true mean value. Moreover, when $p>n$, the sample covariance matrix becomes singular, and this results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for high-dimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data.
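To illustrate the convex-hull failure described above (without implementing the paper's adjusted empirical likelihood statistic), the following sketch checks with a small linear program whether a candidate mean lies in the convex hull of the observations; the simulation settings are arbitrary illustration values.

```python
# Illustrative only: check whether a candidate mean mu lies in the convex hull
# of n observations in R^p, the condition whose failure breaks standard
# empirical likelihood when p/n is large. This is NOT the adjusted statistic
# proposed in the paper; it only demonstrates the geometric obstruction.
import numpy as np
from scipy.optimize import linprog

def mean_in_convex_hull(X, mu):
    """Feasibility LP: does there exist w >= 0 with sum(w) = 1 and X.T @ w = mu?"""
    n, p = X.shape
    A_eq = np.vstack([np.ones((1, n)), X.T])          # (p + 1) x n constraints
    b_eq = np.concatenate([[1.0], mu])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
for p in (5, 40, 80):                  # n = 50 fixed, so p/n grows
    X = rng.standard_normal((50, p))
    print(p, mean_in_convex_hull(X, np.zeros(p)))
# For small p/n the true mean (zero here) typically lies inside the hull; for
# large p/n it typically falls outside, which is when ordinary EL breaks down.
```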
Cvencek, Dario; Meltzoff, Andrew N.; Maddox, Craig D.; Nosek, Brian A.; Rudman, Laurie A.; Devos, Thierry; Dunham, Yarrow; Baron, Andrew S.; Steffens, Melanie C.; Lane, Kristin; et al
(, Personality and Social Psychology Bulletin)
This meta-analysis evaluated theoretical predictions from balanced identity theory (BIT) and evaluated the validity of zero points of Implicit Association Test (IAT) and self-report measures used to test these predictions. Twenty-one researchers contributed individual subject data from 36 experiments (total N = 12,773) that used both explicit and implicit measures of the social–cognitive constructs. The meta-analysis confirmed predictions of BIT’s balance–congruity principle and simultaneously validated interpretation of the IAT’s zero point as indicating absence of preference between two attitude objects. Statistical power afforded by the sample size enabled the first confirmations of balance–congruity predictions with self-report measures. Beyond these empirical results, the meta-analysis introduced a within-study statistical test of the balance–congruity principle, finding that it had greater efficiency than the previous best method. The meta-analysis’s full data set has been publicly archived to enable further studies of interrelations among attitudes, stereotypes, and identities.
Villanustre, Flavio; Chala, Arjuna; Dev, Roger; Xu, Lili; Shaw, Jesse; Furht, Borko; Khoshgoftaar, Taghi
(, Journal of Big Data)
Abstract This project is funded by the US National Science Foundation (NSF) through its NSF RAPID program under the title “Modeling Corona Spread Using Big Data Analytics.” The project is a joint effort between the Department of Computer & Electrical Engineering and Computer Science at FAU and a research group from LexisNexis Risk Solutions. The novel coronavirus Covid-19 originated in China in early December 2019 and has rapidly spread to many countries around the globe, with the number of confirmed cases increasing every day. Covid-19 is officially a pandemic. It is a novel infection with serious clinical manifestations, including death, and it has reached at least 124 countries and territories. Although the ultimate course and impact of Covid-19 are uncertain, it is not merely possible but likely that the disease will produce enough severe illness to overwhelm the worldwide health care infrastructure. Emerging viral pandemics can place extraordinary and sustained demands on public health and health systems and on providers of essential community services. Modeling the Covid-19 pandemic spread is challenging, but there are data that can be used to project resource demands. Estimates of the reproductive number (R) of SARS-CoV-2 show that at the beginning of the epidemic, each infected person spreads the virus to at least two others, on average (Emanuel et al. in N Engl J Med. 2020, Livingston and Bucher in JAMA 323(14):1335, 2020). A conservatively low estimate is that 5 % of the population could become infected within 3 months. Preliminary data from China and Italy regarding the distribution of case severity and fatality vary widely (Wu and McGoogan in JAMA 323(13):1239–42, 2020). A recent large-scale analysis from China suggests that 80 % of those infected either are asymptomatic or have mild symptoms, a finding that implies that demand for advanced medical services might apply to only 20 % of the total infected. Of patients infected with Covid-19, about 15 % have severe illness and 5 % have critical illness (Emanuel et al. in N Engl J Med. 2020). Overall, mortality ranges from 0.25 % to as high as 3.0 % (Emanuel et al. in N Engl J Med. 2020, Wilson et al. in Emerg Infect Dis 26(6):1339, 2020). Case fatality rates are much higher for vulnerable populations, such as persons over the age of 80 years (> 14 %) and those with coexisting conditions (10 % for those with cardiovascular disease and 7 % for those with diabetes) (Emanuel et al. in N Engl J Med. 2020). Overall, Covid-19 is substantially deadlier than seasonal influenza, which has a mortality of roughly 0.1 %. Public health efforts depend heavily on predicting how diseases such as those caused by Covid-19 spread across the globe. During the early days of a new outbreak, when reliable data are still scarce, researchers turn to mathematical models that can predict where people who could be infected are going and how likely they are to bring the disease with them. These computational methods use known statistical equations that calculate the probability of individuals transmitting the illness. Modern computational power allows these models to quickly incorporate multiple inputs, such as a given disease’s ability to pass from person to person and the movement patterns of potentially infected people traveling by air and land. This process sometimes involves making assumptions about unknown factors, such as an individual’s exact travel pattern.
By plugging in different possible versions of each input, however, researchers can update the models as new information becomes available and compare their results to observed patterns for the illness. In this paper we describe the development of a model of Corona spread using innovative big data analytics techniques and tools. We leveraged our experience from research in modeling Ebola spread (Shaw et al., Modeling Ebola Spread and Using HPCC/KEL System, in: Big Data Technologies and Applications, pp. 347-385, Springer, Cham, 2016) to model Corona spread, obtain new results, and help reduce the number of Corona patients. We closely collaborated with LexisNexis, which is a leading US data analytics company and a member of our NSF I/UCRC for Advanced Knowledge Enablement. The lack of a comprehensive view and informative analysis of the status of the pandemic can also cause panic and instability within society. Our work proposes the HPCC Systems Covid-19 tracker, which provides a multi-level view of the pandemic with informative virus-spreading indicators in a timely manner. The system embeds a classical epidemiological model known as SIR and spreading indicators based on a causal model. The data solution of the tracker is built on top of the Big Data processing platform HPCC Systems, from the ingestion and tracking of various data sources to fast delivery of the data to the public. The HPCC Systems Covid-19 tracker presents the Covid-19 data on a daily, weekly, and cumulative basis, from the global level down to the county level. It also provides statistical analyses for each level, such as new cases per 100,000 population. The primary analyses, such as Contagion Risk and Infection State, are based on a causal model with a seven-day sliding window. Our work has been released as a publicly available website to the world and has attracted a great volume of traffic. The project is open-sourced and available on GitHub. The system was developed on the LexisNexis HPCC Systems platform, which is briefly described in the paper.
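The tracker embeds the classical SIR compartmental model mentioned above. A generic discrete-time SIR sketch is given below for reference; it is not the HPCC Systems Covid-19 tracker's implementation, and the population size, beta, and gamma values are hypothetical illustration values chosen so that beta/gamma = 2.5, in line with the early reproductive-number estimates cited in the abstract.

```python
# Generic discrete-time SIR model sketch. This is the textbook compartmental
# formulation referenced above, not the HPCC Systems Covid-19 tracker's code;
# all parameter values below are hypothetical.
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Return daily (S, I, R) counts for a simple SIR epidemic."""
    s, i, r = population - initial_infected, float(initial_infected), 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population   # S -> I transitions
        new_recoveries = gamma * i                   # I -> R transitions
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history

# beta / gamma = 2.5 here, matching the rough "each case infects at least two
# others" estimate quoted in the abstract.
trajectory = simulate_sir(population=1_000_000, initial_infected=10,
                          beta=0.25, gamma=0.1, days=180)
peak_day, peak = max(enumerate(t[1] for t in trajectory), key=lambda x: x[1])
print(f"peak infections ~{peak:,.0f} on day {peak_day}")
```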
ABSTRACT We introduce a novel meta-analysis framework to combine dependent tests under a general setting, and utilize it to synthesize various microbiome association tests that are calculated from the same dataset. Our development builds upon the classical meta-analysis methods of aggregating P-values and a more recent general method of combining confidence distributions, but generalizes them to handle dependent tests. The proposed framework ensures rigorous statistical guarantees, and we provide a comprehensive study and compare it with various existing dependent combination methods. Notably, we demonstrate that the widely used Cauchy combination method for dependent tests, referred to as the vanilla Cauchy combination in this article, can be viewed as a special case within our framework. Moreover, the proposed framework provides a way to address the problem when the distributional assumptions underlying the vanilla Cauchy combination are violated. Our numerical results demonstrate that ignoring the dependence among the to-be-combined components may lead to severe size distortion. Compared with existing P-value combination methods, including the vanilla Cauchy combination, the proposed framework is flexible, can be adapted to handle the dependence accurately, and utilizes the information efficiently to construct tests with accurate size and enhanced power. The development is applied to microbiome association studies, where we aggregate information from multiple existing tests using the same dataset. The combined tests harness the strengths of each individual test across a wide range of alternative spaces, enabling more efficient and meaningful discoveries of vital microbiome associations.
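As a point of reference for the special case mentioned above: the vanilla Cauchy combination aggregates p-values p_1, ..., p_K into the statistic T = sum_k w_k tan{(0.5 - p_k) pi} and converts T back to a p-value via the standard Cauchy distribution. The sketch below implements only this baseline (with hypothetical inputs), not the proposed generalized dependent-combination framework.

```python
# Vanilla Cauchy combination of p-values, given here only as the baseline
# special case discussed in the abstract; it is not the proposed generalized
# framework for dependent tests.
import math

def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy combination statistic."""
    k = len(pvalues)
    if weights is None:
        weights = [1.0 / k] * k          # equal weights summing to 1
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, pvalues))
    # Survival function of the standard Cauchy distribution evaluated at t.
    return 0.5 - math.atan(t) / math.pi

print(cauchy_combination([0.01, 0.20, 0.03, 0.50]))  # illustrative p-values
```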
Yu, Zhe; Chakraborty, Joymallya; Menzies, Tim
(, IEEE Transactions on Software Engineering)
This research seeks to benefit the software engineering community by providing a simple yet effective pre-processing approach to achieve equalized odds fairness in machine learning software. Fairness issues have attracted increasing attention since machine learning software is increasingly used for high-stakes and high-risk decisions. It is the responsibility of all software developers to make their software accountable by ensuring that the machine learning software does not perform differently across sensitive demographic groups, that is, by satisfying equalized odds. Different from prior works, which either optimize for an equalized-odds-related metric during the learning process like a black box or manipulate the training data following some intuition, this work studies the root cause of the violation of equalized odds and how to tackle it. We found that equalizing the class distribution in each demographic group with sample weights is a necessary condition for achieving equalized odds without modifying the normal training process. In addition, an important partial condition for equalized odds (zero average odds difference) can be guaranteed when the class distributions are weighted to be not only equal but also balanced (1:1). Based on these analyses, we proposed FairBalance, a pre-processing algorithm which balances the class distribution in each demographic group by assigning calculated weights to the training data. On eight real-world datasets, our empirical results show that, at low computational overhead, the proposed pre-processing algorithm FairBalance can significantly improve equalized odds without much, if any, damage to the utility. FairBalance also outperforms existing state-of-the-art approaches in terms of equalized odds. To facilitate reuse, reproduction, and validation, we made our scripts available at https://github.com/hil-se/FairBalance.
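A simplified sketch of the weighting idea described above follows: each training example is weighted inversely to the size of its (demographic group, class label) cell, which makes the weighted class distribution within every group 1:1. This is an illustration of the pre-processing principle under that stated assumption, not a copy of the released FairBalance scripts; the toy data are hypothetical.

```python
# Simplified sketch of group-wise class balancing with sample weights, in the
# spirit of the pre-processing idea described above; see the authors' released
# scripts for the actual FairBalance implementation.
from collections import Counter

def balancing_weights(groups, labels):
    """Weight each example by 1 / count(group, label), then rescale so the
    average weight is 1. Within each group the weighted classes become 1:1."""
    cell_counts = Counter(zip(groups, labels))
    raw = [1.0 / cell_counts[(g, y)] for g, y in zip(groups, labels)]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]

# Hypothetical toy data: group A is label-imbalanced, group B is not.
groups = ["A", "A", "A", "A", "B", "B"]
labels = [1, 0, 0, 0, 1, 0]
for g, y, w in zip(groups, labels, balancing_weights(groups, labels)):
    print(g, y, round(w, 3))
# The weighted positive/negative mass is now equal within each group, the
# necessary condition for equalized odds discussed above.
```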
Chen, Xiaolin, Cheng, Jerry, Tian, Lu, and Xie, Minge. Exact Inference for Common Odds Ratio in Meta-Analysis with Zero-Total-Event Studies. Statistics in Biosciences. Web. doi:10.1007/s12561-024-09443-8.
Chen, Xiaolin, Cheng, Jerry, Tian, Lu, & Xie, Minge. Exact Inference for Common Odds Ratio in Meta-Analysis with Zero-Total-Event Studies. Statistics in Biosciences, (). https://doi.org/10.1007/s12561-024-09443-8
Chen, Xiaolin, Cheng, Jerry, Tian, Lu, and Xie, Minge.
"Exact Inference for Common Odds Ratio in Meta-Analysis with Zero-Total-Event Studies". Statistics in Biosciences (). Country unknown/Code not available: Springer Science + Business Media. https://doi.org/10.1007/s12561-024-09443-8.https://par.nsf.gov/biblio/10524288.
@article{osti_10524288,
title = {Exact Inference for Common Odds Ratio in Meta-Analysis with Zero-Total-Event Studies},
url = {https://par.nsf.gov/biblio/10524288},
DOI = {10.1007/s12561-024-09443-8},
journal = {Statistics in Biosciences},
publisher = {Springer Science + Business Media},
author = {Chen, Xiaolin and Cheng, Jerry and Tian, Lu and Xie, Minge},
}