skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Catch me if you can: signal localization with knockoff e -values
Abstract We consider problems where many, somewhat redundant, hypotheses are tested and we are interested in reporting the most precise rejections, with false discovery rate (FDR) control. This is the case, for example, when researchers are interested both in individual hypotheses as well as group hypotheses corresponding to intersections of sets of the original hypotheses, at several resolution levels. A concrete application is in genome-wide association studies, where, depending on the signal strengths, it might be possible to resolve the influence of individual genetic variants on a phenotype with greater or lower precision. To adapt to the unknown signal strength, analyses are conducted at multiple resolutions and researchers are most interested in the more precise discoveries. Assuring FDR control on the reported findings with these adaptive searches is, however, often impossible. To design a multiple comparison procedure that allows for an adaptive choice of resolution with FDR control, we leverage e-values and linear programming. We adapt this approach to problems where knockoffs and group knockoffs have been successfully applied to test conditional independence hypotheses. We demonstrate its efficacy by analysing data from the UK Biobank.  more » « less
Award ID(s):
2210392
PAR ID:
10512708
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
87
Issue:
1
ISSN:
1369-7412
Format(s):
Medium: X Size: p. 56-73
Size(s):
p. 56-73
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Adaptive multiple testing with covariates is an important research direction that has gained major attention in recent years. It has been widely recognised that leveraging side information provided by auxiliary covariates can improve the power of false discovery rate (FDR) procedures. Currently, most such procedures are devised with p-values as their main statistics. However, for two-sided hypotheses, the usual data processing step that transforms the primary statistics, known as p-values, into p-values not only leads to a loss of information carried by the main statistics, but can also undermine the ability of the covariates to assist with the FDR inference. We develop a p-value based covariate-adaptive (ZAP) methodology that operates on the intact structural information encoded jointly by the p-values and covariates. It seeks to emulate the oracle p-value procedure via a working model, and its rejection regions significantly depart from those of the p-value adaptive testing approaches. The key strength of ZAP is that the FDR control is guaranteed with minimal assumptions, even when the working model is misspecified. We demonstrate the state-of-the-art performance of ZAP using both simulated and real data, which shows that the efficiency gain can be substantial in comparison with p-value-based methods. Our methodology is implemented in the R package zap. 
    more » « less
  2. null (Ed.)
    We consider the problem of asynchronous online testing, aimed at providing control of the false discovery rate (FDR) during a continual stream of data collection and testing, where each test may be a sequential test that can start and stop at arbitrary times. This setting increasingly characterizes real-world applications in science and industry, where teams of researchers across large organizations may conduct tests of hypotheses in a decentralized manner. The overlap in time and space also tends to induce dependencies among test statistics, a challenge for classical methodology, which either assumes (overly optimistically) independence or (overly pessimistically) arbitrary dependence between test statistics. We present a general framework that addresses both of these issues via a unified computational abstraction that we refer to as “conflict sets.” We show how this framework yields algorithms with formal FDR guarantees under a more intermediate, local notion of dependence. We illustrate our algorithms in simulations by comparing to existing algorithms for online FDR control. 
    more » « less
  3. Abstract This study introduces a novelp-value-based multiple testing approach tailored for generalized linear models. Despite the crucial role of generalized linear models in statistics, existing methodologies face obstacles arising from the heterogeneous variance of response variables and complex dependencies among estimated parameters. Our aim is to address the challenge of controlling the false discovery rate (FDR) amidst arbitrarily dependent test statistics. Through the development of efficient computational algorithms, we present a versatile statistical framework for multiple testing. The proposed framework accommodates a range of tools developed for constructing a new model matrix in regression-type analysis, including random row permutations and Model-X knockoffs. We devise efficient computing techniques to solve the encountered non-trivial quadratic matrix equations, enabling the construction of pairedp-values suitable for the two-step multiple testing procedure proposed by Sarkar and Tang (Biometrika 109(4): 1149–1155, 2022). Theoretical analysis affirms the properties of our approach, demonstrating its capability to control the FDR at a given level. Empirical evaluations further substantiate its promising performance across diverse simulation settings. 
    more » « less
  4. Background Epigenome-wide association studies (EWAS), which seek the association between epigenetic marks and an outcome or exposure, involve multiple hypothesis testing. False discovery rate (FDR) control has been widely used for multiple testing correction. However, traditional FDR control methods do not use auxiliary covariates, and they could be less powerful if the covariates could inform the likelihood of the null hypothesis. Recently, many covariate-adaptive FDR control methods have been developed, but application of these methods to EWAS data has not yet been explored. It is not clear whether these methods can significantly improve detection power, and if so, which covariates are more relevant for EWAS data. Results In this study, we evaluate the performance of five covariate-adaptive FDR control methods with EWAS-related covariates using simulated as well as real EWAS datasets. We develop an omnibus test to assess the informativeness of the covariates. We find that statistical covariates are generally more informative than biological covariates, and the covariates of methylation mean and variance are almost universally informative. In contrast, the informativeness of biological covariates depends on specific datasets. We show that the independent hypothesis weighting (IHW) and covariate adaptive multiple testing (CAMT) method are overall more powerful, especially for sparse signals, and could improve the detection power by a median of 25% and 68% on real datasets, compared to the ST procedure. We further validate the findings in various biological contexts. Conclusions Covariate-adaptive FDR control methods with informative covariates can significantly increase the detection power for EWAS. For sparse signals, IHW and CAMT are recommended. 
    more » « less
  5. Abstract E-values have gained attention as potential alternatives to p-values as measures of uncertainty, significance and evidence. In brief, e-values are realized by random variables with expectation at most one under the null; examples include betting scores, (point null) Bayes factors, likelihood ratios and stopped supermartingales. We design a natural analogue of the Benjamini-Hochberg (BH) procedure for false discovery rate (FDR) control that utilizes e-values, called the e-BH procedure, and compare it with the standard procedure for p-values. One of our central results is that, unlike the usual BH procedure, the e-BH procedure controls the FDR at the desired level—with no correction—for any dependence structure between the e-values. We illustrate that the new procedure is convenient in various settings of complicated dependence, structured and post-selection hypotheses, and multi-armed bandit problems. Moreover, the BH procedure is a special case of the e-BH procedure through calibration between p-values and e-values. Overall, the e-BH procedure is a novel, powerful and general tool for multiple testing under dependence, that is complementary to the BH procedure, each being an appropriate choice in different applications. 
    more » « less