Abstract A primary challenge in the analysis of RNA-seq data is to identify differentially expressed genes or transcripts while controlling for technical biases. Ideally, a statistical testing procedure should incorporate the inherent uncertainty of the abundance estimates arising from the quantification step. Most popular methods for RNA-seq differential expression analysis fit a parametric model to the counts for each gene or transcript, and a subset of methods can incorporate uncertainty. Previous work has shown that nonparametric models for RNA-seq differential expression may have better control of the false discovery rate, and adapt well to new data types without requiring reformulation of a parametric model. Existing nonparametric models do not take into account inferential uncertainty, leading to an inflated false discovery rate, in particular at the transcript level. We propose a nonparametric model for differential expression analysis using inferential replicate counts, extending the existing SAMseq method to account for inferential uncertainty. We compare our method, Swish, with popular differential expression analysis methods. Swish has improved control of the false discovery rate, in particular for transcripts with high inferential uncertainty. We apply Swish to a single-cell RNA-seq dataset, assessing differential expression between sub-populations of cells, and compare its performance to the Wilcoxon test.
more »
« less
This content will become publicly available on May 1, 2026
Shifted BH methods for controlling false discovery rate in multiple testing of the means of correlated normals against two-sided alternatives
For simultaneous testing of multivariate normal means with known correlation matrix against two-sided alternatives, this paper introduces new methods with proven finite-sample control of false discovery rate. The methods are obtained by shifting each p-value to the left and considering a Benjamini–Hochberg-type linear step-up procedure based on these shifted p-values. The amount of shift for each -value is appropriately determined from the correlation matrix to achieve the desired false discovery rate control. Simulation studies and real-data application show favorable performances of the proposed methods when compared with relevant competitors.
more »
« less
- Award ID(s):
- 2210687
- PAR ID:
- 10608763
- Publisher / Repository:
- Elsevier BV
- Date Published:
- Journal Name:
- Journal of Statistical Planning and Inference
- Volume:
- 236
- Issue:
- C
- ISSN:
- 0378-3758
- Page Range / eLocation ID:
- 106238
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Biological research often involves testing a growing number of null hypotheses as new data are accumulated over time. We study the problem of online control of the familywise error rate, that is testing an a priori unbounded sequence of hypotheses ( p-values) one by one over time without knowing the future, such that with high probability there are no false discoveries in the entire sequence. This paper unifies algorithmic concepts developed for offline (single batch) familywise error rate control and online false discovery rate control to develop novel online familywise error rate control methods. Though many offline familywise error rate methods (e.g., Bonferroni, fallback procedures and Sidak’s method) can trivially be extended to the online setting, our main contribution is the design of new, powerful, adaptive online algorithms that control the familywise error rate when the p-values are independent or locally dependent in time. Our numerical experiments demonstrate substantial gains in power, that are also formally proved in an idealized Gaussian sequence model. A promising application to the International Mouse Phenotyping Consortium is described.more » « less
-
Differential privacy provides a rigorous framework for privacy-preserving data analysis. This paper proposes the first differentially private procedure for controlling the false discovery rate (FDR) in multiple hypothesis testing. Inspired by the Benjamini-Hochberg procedure (BHq), our approach is to first repeatedly add noise to the logarithms of the p-values to ensure differential privacy and to select an approximately smallest p-value serving as a promising candidate at each iteration; the selected p-values are further supplied to the BHq and our private procedure releases only the rejected ones. Moreover, we develop a new technique that is based on a backward submartingale for proving FDR control of a broad class of multiple testing procedures, including our private procedure, and both the BHq step- up and step-down procedures. As a novel aspect, the proof works for arbitrary dependence between the true null and false null test statistics, while FDR control is maintained up to a small multiplicative factor.more » « less
-
The reproducibility debate has caused a renewed interest in changing how one reports uncertainty, from 𝑝-value for testing a null hypothesis to a confidence interval (CI) for the corresponding parameter. When CIs for multiple selected parameters are being reported, the analog of the false discovery rate (FDR) is the false coverage rate (FCR), which is the expected ratio of number of reported CIs failing to cover their respective parameters to the total number of reported CIs. Here, we consider the general problem of FCR control in the online setting, where one encounters an infinite sequence of fixed unknown parameters ordered by time. We propose a novel solution to the problem which only requires the scientist to be able to construct marginal CIs. As special cases, our framework yields algorithms for online FDR control and online sign-classification procedures that control the false sign rate (FSR). All of our methodology applies equally well to prediction intervals, having particular implications for selective conformal inference.more » « less
-
The knockoff filter is a recent false discovery rate (FDR) control method for high-dimensional linear models. We point out that knockoff has three key components: ranking algorithm, augmented design, and symmetric statistic, and each component admits multiple choices. By considering various combinations of the three components, we obtain a collection of variants of knockoff. All these variants guarantee finite-sample FDR control, and our goal is to compare their power. We assume a Rare and Weak signal model on regression coeffi- cients and compare the power of different variants of knockoff by deriving explicit formulas of false positive rate and false negative rate. Our results provide new insights on how to improve power when controlling FDR at a targeted level. We also compare the power of knockoff with its propotype - a method that uses the same ranking algorithm but has access to an ideal threshold. The comparison reveals the additional price one pays by finding a data-driven threshold to control FDR.more » « less
An official website of the United States government
