We study the problem of estimating the distribution of effect sizes (the mean of the test statistic under the alternative hypothesis) in a multiple testing setting. Knowing this distribution allows us to calculate the power (one minus the type II error rate) of any experimental design. We show that it is possible to estimate this distribution using an inexpensive pilot experiment, which takes significantly fewer samples than would be required by an experiment that identifies the discoveries. Our estimator can be used to guarantee the number of discoveries that will be made using a given experimental design in a future experiment. We prove that this simple and computationally efficient estimator enjoys a number of favorable theoretical properties, and demonstrate its effectiveness on data from a gene knockout experiment on influenza inhibition in Drosophila.
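To illustrate how a known effect-size distribution translates into power, here is a minimal sketch. It is not the estimator from the paper. It assumes unit-variance Gaussian observations, one-sided z-tests, and a Bonferroni correction, none of which are stated in the abstract. The function name `expected_discoveries`, its parameters, and the example effect sizes are hypothetical, and each effect size here is taken as a per-sample mean, so the test statistic has mean `mu * sqrt(n_samples)`.

```python
from statistics import NormalDist


def expected_discoveries(effect_sizes, n_samples, n_hypotheses, alpha=0.05):
    """Return (expected number of discoveries, per-effect power).

    Assumes one-sided z-tests on unit-variance Gaussian data with a
    Bonferroni correction over n_hypotheses tests (illustrative only).
    """
    std_normal = NormalDist()
    # Bonferroni-corrected rejection threshold on the z-statistic.
    threshold = std_normal.inv_cdf(1 - alpha / n_hypotheses)
    # Under the alternative, the z-statistic is N(mu * sqrt(n), 1),
    # so power is the probability that it exceeds the threshold.
    powers = [
        1 - std_normal.cdf(threshold - mu * n_samples ** 0.5)
        for mu in effect_sizes
    ]
    return sum(powers), powers


# Hypothetical effect sizes, standing in for a pilot-experiment estimate.
pilot_effects = [0.2, 0.5, 1.0]
total, powers = expected_discoveries(pilot_effects, n_samples=50, n_hypotheses=1000)
print(f"expected discoveries: {total:.3f}")
```

Rerunning the function with different values of `n_samples` shows how the expected number of discoveries changes with the size of the planned experiment, which is the kind of design question the abstract describes.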
Estimating the number and effect sizes of non-null hypotheses
- Award ID(s): 1907907
- PAR ID: 10186595
- Date Published:
- Journal Name: Proceedings of Machine Learning Research
- ISSN: 2640-3498
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Ma, S (Ed.) The Maximally Informative Next Experiment (MINE) is a new experimental design approach for experiments, such as those in omics, in which the number of effects or parameters p greatly exceeds the number of samples n (p > n). Classical experimental design presumes n > p for inference about parameters, and its application to p > n can lead to over-fitting. To overcome p > n, MINE is an ensemble method, which makes predictions about future experiments from an existing ensemble of models consistent with available data in order to select the most informative next experiment. Its advantages are in exploring the data for new relationships with n < p and in being able to integrate smaller and more tractable experiments that adaptively replace one large classic experiment as discoveries are made. Thus, MINE is model-guided and adaptive over time in a large omics study. Here, MINE is illustrated on two distinct multi-year experiments, one involving genetic networks in Neurospora crassa and a second one involving a Genome Wide Association Study (GWAS) in Sorghum bicolor, as a comparison to classic experimental design in an agricultural setting.
- The assumption of normality is usually tied to the design and analysis of an experimental study. However, when dealing with lifetime testing and censoring at fixed time intervals, we can no longer assume that the outcomes will be normally distributed. This generally requires the use of optimal design techniques to construct the test plan for the specific distribution of interest. Optimal designs in this situation depend on the parameters of the distribution, which are generally unknown a priori. A Bayesian approach can be used by placing a prior distribution on the parameters, thereby leading to an appropriate selection of experimental design. This, along with the model and number of predictors, can be used to derive the D‐optimal design for an allowed number of experimental runs. This paper explores using this Bayesian approach on various lifetime regression models to select appropriate D‐optimal designs in regular and irregular design regions.
- Recent research has developed several Monte Carlo methods for estimating the normalization constant (partition function) based on the idea of annealing. This means sampling successively from a path of distributions that interpolate between a tractable "proposal" distribution and the unnormalized "target" distribution. Prominent estimators in this family include annealed importance sampling and annealed noise-contrastive estimation (NCE). Such methods hinge on a number of design choices: which estimator to use, which path of distributions to use and whether to use a path at all; so far, there is no definitive theory on which choices are efficient. Here, we evaluate each design choice by the asymptotic estimation error it produces. First, we show that using NCE is more efficient than the importance sampling estimator, but in the limit of infinitesimal path steps, the difference vanishes. Second, we find that using the geometric path brings down the estimation error from an exponential to a polynomial function of the parameter distance between the target and proposal distributions. Third, we find that the arithmetic path, while rarely used, can offer optimality properties over the universally used geometric path. In fact, in a particular limit, the optimal path is arithmetic. Based on this theory, we finally propose a two-step estimator to approximate the optimal path in an efficient way.
- In classical statistics, a well-known paradigm consists in establishing asymptotic equivalence between an experiment of i.i.d. observations and a Gaussian shift experiment, with the aim of obtaining optimal estimators in the former complicated model from the latter simpler model. In particular, a statistical experiment consisting of n i.i.d. observations from d-dimensional multinomial distributions can be well approximated by an experiment consisting of d − 1 dimensional Gaussian distributions. In a quantum version of the result, it has been shown that a collection of n qudits (d-dimensional quantum states) of full rank can be well approximated by a quantum system containing a classical part, which is a d − 1 dimensional Gaussian distribution, and a quantum part containing an ensemble of d(d − 1)/2 shifted thermal states. In this paper, we obtain a generalization of this result when the qudits are not of full rank. We show that when the rank of the qudits is r, then the limiting experiment consists of an r − 1 dimensional Gaussian distribution and an ensemble of both shifted pure and shifted thermal states. For estimation purposes, we establish an asymptotic minimax result in the limiting Gaussian model. Analogous results are then obtained for estimation of a low rank qudit from an ensemble of identically prepared, independent quantum systems, using the local asymptotic equivalence result. We also consider the problem of estimation of a linear functional of the quantum state. We construct an estimator for the functional, analyze the risk and use quantum local asymptotic equivalence to show that our estimator is also optimal in the minimax sense.

