Controlling false discovery rate (FDR) is crucial for variable selection, multiple testing, among other signal detection problems. In literature, there is certainly no shortage of FDR control strategies when selecting individual features, but the relevant works for structural change detection, such as profile analysis for piecewise constant coefficients and integration analysis with multiple data sources, are limited. In this paper, we propose a generalized knockoff procedure (GKnockoff) for FDR control under such problem settings. We prove that the GKnockoff possesses pairwise exchangeability, and is capable of controlling the exact FDR under finite sample sizes. We further explore GKnockoff under high dimensionality, by first introducing a new screening method to filter the high-dimensional potential structural changes. We adopt a data splitting technique to first reduce the dimensionality via screening and then conduct GKnockoff on the refined selection set. Furthermore, the powers of proposed methods are systematically studied. Numerical comparisons with other methods show the superior performance of GKnockoff, in terms of both FDR control and power. We also implement the proposed methods to analyze a macroeconomic dataset for detecting changes of driven effects of economic development on the secondary industry.
more »
« less
Compositional knockoff filter for high‐dimensional regression analysis of microbiome data
Abstract A critical task in microbiome data analysis is to explore the association between a scalar response of interest and a large number of microbial taxa that are summarized as compositional data at different taxonomic levels. Motivated by fine‐mapping of the microbiome, we propose a two‐step compositional knockoff filter to provide the effective finite‐sample false discovery rate (FDR) control in high‐dimensional linear log‐contrast regression analysis of microbiome compositional data. In the first step, we propose a new compositional screening procedure to remove insignificant microbial taxa while retaining the essential sum‐to‐zero constraint. In the second step, we extend the knockoff filter to identify the significant microbial taxa in the sparse regression model for compositional data. Thereby, a subset of the microbes is selected from the high‐dimensional microbial taxa as related to the response under a prespecified FDR threshold. We study the theoretical properties of the proposed two‐step procedure, including both sure screening and effective false discovery control. We demonstrate these properties in numerical simulation studies to compare our methods to some existing ones and show power gain of the new method while controlling the nominal FDR. The potential usefulness of the proposed method is also illustrated with application to an inflammatory bowel disease data set to identify microbial taxa that influence host gene expressions.
more »
« less
- Award ID(s):
- 1953189
- PAR ID:
- 10544988
- Publisher / Repository:
- Wiley
- Date Published:
- Journal Name:
- Biometrics
- Volume:
- 77
- Issue:
- 3
- ISSN:
- 0006-341X
- Page Range / eLocation ID:
- 984 to 995
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The knockoff filter is a recent false discovery rate (FDR) control method for high-dimensional linear models. We point out that knockoff has three key components: ranking algorithm, augmented design, and symmetric statistic, and each component admits multiple choices. By considering various combinations of the three components, we obtain a collection of variants of knockoff. All these variants guarantee finite-sample FDR control, and our goal is to compare their power. We assume a Rare and Weak signal model on regression coeffi- cients and compare the power of different variants of knockoff by deriving explicit formulas of false positive rate and false negative rate. Our results provide new insights on how to improve power when controlling FDR at a targeted level. We also compare the power of knockoff with its propotype - a method that uses the same ranking algorithm but has access to an ideal threshold. The comparison reveals the additional price one pays by finding a data-driven threshold to control FDR.more » « less
-
Abstract Differential abundance analysis is at the core of statistical analysis of microbiome data. The compositional nature of microbiome sequencing data makes false positive control challenging. Here, we show that the compositional effects can be addressed by a simple, yet highly flexible and scalable, approach. The proposed method, LinDA, only requires fitting linear regression models on the centered log-ratio transformed data, and correcting the bias due to compositional effects. We show that LinDA enjoys asymptotic FDR control and can be extended to mixed-effect models for correlated microbiome data. Using simulations and real examples, we demonstrate the effectiveness of LinDA.more » « less
-
We propose a deep learning–based knockoffs inference framework, DeepLINK, that guarantees the false discovery rate (FDR) control in high-dimensional settings. DeepLINK is applicable to a broad class of covariate distributions described by the possibly nonlinear latent factor models. It consists of two major parts: an autoencoder network for the knockoff variable construction and a multilayer perceptron network for feature selection with the FDR control. The empirical performance of DeepLINK is investigated through extensive simulation studies, where it is shown to achieve FDR control in feature selection with both high selection power and high prediction accuracy. We also apply DeepLINK to three real data applications to demonstrate its practical utility.more » « less
-
Abstract Microbiome data from sequencing experiments contain the relative abundance of a large number of microbial taxa with their evolutionary relationships represented by a phylogenetic tree. The compositional and high-dimensional nature of the microbiome mediator challenges the validity of standard mediation analyses. We propose a phylogeny-based mediation analysis method called PhyloMed to address this challenge. Unlike existing methods that directly identify individual mediating taxa, PhyloMed discovers mediation signals by analyzing subcompositions defined on the phylogenic tree. PhyloMed produces well-calibrated mediation test p -values and yields substantially higher discovery power than existing methods.more » « less
An official website of the United States government

