Title: Sequential Monte Carlo testing by betting
Abstract: In a Monte Carlo test, the observed dataset is fixed, and several resampled or permuted versions of the dataset are generated in order to test a null hypothesis that the original dataset is exchangeable with the resampled/permuted ones. Sequential Monte Carlo tests aim to save computational resources by generating these additional datasets sequentially one by one and potentially stopping early. While earlier tests yield valid inference at a particular prespecified stopping rule, our work develops a new anytime-valid Monte Carlo test that can be continuously monitored, yielding a p-value or e-value at any stopping time possibly not specified in advance. It generalizes the well-known method by Besag and Clifford, allowing it to stop at any time, but also encompasses new sequential Monte Carlo tests that tend to stop sooner under the null and alternative without compromising power. The core technical advance is the development of new test martingales for testing exchangeability against a very particular alternative, based on a testing-by-betting technique. The proposed betting strategies are guided by the derivation of a simple log-optimal betting strategy, have closed-form expressions for the wealth process, provable guarantees on resampling risk, and display excellent power in practice.
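To make the betting construction concrete, here is a minimal Python sketch of an anytime-valid Monte Carlo test by betting. It illustrates the general technique, not the paper's exact strategy: under exchangeability of the observed statistic with the resampled ones (and no ties), the indicator that the next resampled statistic exceeds the observed one has conditional null probability (s + 1)/(n + 2), where s counts exceedances so far; multiplying the wealth by the likelihood ratio of any predictable bet q against this probability yields a nonnegative martingale, and Ville's inequality makes the threshold stopping rule valid at any time. The Beta-mixture bet and the callable draw_stat are our illustrative choices.

def smc_test_by_betting(t_obs, draw_stat, alpha=0.05, a=0.1, b=1.0, max_steps=100_000):
    """Anytime-valid Monte Carlo test by betting (illustrative sketch).
    t_obs: observed statistic; draw_stat: callable returning one fresh
    resampled/permuted statistic, exchangeable with t_obs under H0."""
    wealth, s = 1.0, 0
    for n in range(max_steps):
        p0 = (s + 1) / (n + 2)        # null conditional prob(next resample > t_obs)
        q = (s + a) / (n + a + b)     # Beta(a, b)-mixture bet; any predictable q is valid
        exceed = draw_stat() > t_obs
        wealth *= q / p0 if exceed else (1 - q) / (1 - p0)
        s += exceed
        if wealth >= 1 / alpha:       # Ville's inequality: anytime-valid at level alpha
            return "reject H0", n + 1, wealth
    return "no rejection", max_steps, wealth

For a permutation test, draw_stat would compute the statistic on a freshly permuted dataset, e.g. draw_stat = lambda: stat(rng.permutation(labels), X).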
Award ID(s):
2310718 2053804
PAR ID:
10581195
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
87
Issue:
4
ISSN:
1369-7412
Format(s):
Medium: X
Size(s):
p. 1200-1220
Sponsoring Org:
National Science Foundation
More Like this
  1. Suppose that we are given sample access to an unknown distribution p over n elements and an explicit distribution q over the same n elements. We would like to reject the null hypothesis "p = q" after seeing as few samples as possible when p ≠ q, while we never want to reject the null when p = q. Well-known results show that $$\Theta(\sqrt{n}/\epsilon^2)$$ samples are necessary and sufficient for distinguishing whether p equals q versus p is $$\epsilon$$-far from q in total variation distance. However, this requires the distinguishing radius $$\epsilon$$ to be fixed prior to deciding how many samples to request. Our goal is instead to design sequential hypothesis testers, i.e. online algorithms that request i.i.d. samples from p and stop as soon as they can confidently reject the hypothesis p = q, without being given a lower bound on the distance between p and q when p ≠ q. In particular, we want to minimize the number of samples requested by our tests as a function of the distance between p and q, and if p = q we want the algorithm, with high probability, to never reject the null. Our work is motivated by and addresses the practical challenge of sequential A/B testing in statistics. We show that, when n = 2, any sequential hypothesis test must see $$\Omega\!\left(\frac{1}{d_{tv}(p,q)^2}\log\log\frac{1}{d_{tv}(p,q)}\right)$$ samples, with high (constant) probability, before it rejects p = q, where $$d_{tv}(p,q)$$ is the total variation distance between p and q, unknown to the tester. We match the dependence of this lower bound on $$d_{tv}(p,q)$$ by proposing a sequential tester that rejects p = q using at most $$O\!\left(\frac{\sqrt{n}}{d_{tv}(p,q)^2}\log\log\frac{1}{d_{tv}(p,q)}\right)$$ samples with high (constant) probability. The $$\Omega(\sqrt{n})$$ dependence on the support size n is also known to be necessary. We similarly provide two-sample sequential hypothesis testers, when sample access is given to both p and q, and discuss applications to sequential A/B testing.
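    As a toy illustration of such a sequential tester for the n = 2 case (a coin with unknown heads probability p, tested against an explicit q), the sketch below stops as soon as a time-uniform confidence band around the empirical mean excludes q. The union-bound boundary used here scales like sqrt(log t / t) and is looser than the paper's log log rate; all constants and names are ours.

import math

def sequential_coin_test(draw, q, delta=0.05, max_samples=1_000_000):
    """Sequentially test H0: P(heads) = q; draw() returns 0 or 1.
    Uses a Hoeffding bound at each t with failure probability
    6*delta/(pi^2 * t^2); summing over t gives total error <= delta."""
    heads = 0
    for t in range(1, max_samples + 1):
        heads += draw()
        p_hat = heads / t
        # Under H0, |p_hat - q| < radius holds for all t simultaneously w.p. >= 1 - delta.
        radius = math.sqrt(math.log(math.pi**2 * t**2 / (3 * delta)) / (2 * t))
        if abs(p_hat - q) >= radius:
            return "reject p=q", t
    return "no rejection", max_samples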
  2. Robinson, Emma Claire (Ed.)
    Many research questions in sensory neuroscience involve determining whether the neural representation of a stimulus property is invariant or specific to a particular stimulus context (e.g., Is object representation invariant to translation? Is the representation of a face feature specific to the context of other face features?). Between these two extremes, representations may also be context-tolerant or context-sensitive. Most neuroimaging studies have used operational tests in which a target property is inferred from a significant test against the null hypothesis of the opposite property. For example, the popular cross-classification test concludes that representations are invariant or tolerant when the null hypothesis of specificity is rejected. A recently developed neurocomputational theory suggests two insights regarding such tests. First, tests against the null of context-specificity, and for the alternative of context-invariance, are prone to false positives due to the way in which the underlying neural representations are transformed into indirect measurements in neuroimaging studies. Second, jointly performing tests against the nulls of invariance and specificity allows one to reach more precise and valid conclusions about the underlying representations, particularly when the null of invariance is tested using the fine-grained information from classifier decision variables rather than only accuracies (i.e., using the decoding separability test). Here, we provide empirical and computational evidence supporting both of these theoretical insights. In our empirical study, we use encoding of orientation and spatial position in primary visual cortex as a case study, as previous research has established that these properties are encoded in a context-sensitive way. Using fMRI decoding, we show that the cross-classification test produces false-positive conclusions of invariance, but that more valid conclusions can be reached by jointly performing tests against the null of invariance. The results of two simulations further support both of these conclusions. We conclude that more valid inferences about invariance or specificity of neural representations can be reached by jointly testing against both hypotheses, and using neurocomputational theory to guide the interpretation of results. 
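    A schematic of the joint-testing recipe described above (the function names and the simplified statistics are ours, not the paper's exact procedures): train a decoder in one context and test it in the other to probe the null of specificity, then compare the classifier's decision-variable distributions across contexts to probe the null of invariance.

import numpy as np
from scipy.stats import binomtest, ks_2samp
from sklearn.linear_model import LogisticRegression

def joint_invariance_tests(X_a, y_a, X_b, y_b, alpha=0.05):
    """Jointly test against the nulls of specificity and invariance
    (illustrative sketch; binary labels assumed, no multiplicity correction)."""
    clf = LogisticRegression(max_iter=1000).fit(X_a, y_a)
    acc = clf.score(X_b, y_b)                      # cross-classification accuracy
    n = len(y_b)
    # Above-chance cross-classification rejects the null of (complete) specificity.
    p_spec = binomtest(int(round(acc * n)), n, p=0.5, alternative="greater").pvalue
    d_a, d_b = clf.decision_function(X_a), clf.decision_function(X_b)
    # A cross-context difference in decision-variable distributions (per class)
    # rejects the null of invariance; a crude stand-in for decoding separability.
    p_inv = min(ks_2samp(d_a[y_a == c], d_b[y_b == c]).pvalue for c in np.unique(y_a))
    return {"reject_specificity": p_spec < alpha, "reject_invariance": p_inv < alpha}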
  3. Accurate detection of infected individuals is one of the critical steps in stopping any pandemic. When the underlying infection rate of the disease is low, testing people in groups, instead of testing each individual in the population, can be more efficient. In this work, we consider noisy adaptive group testing designs, with specified test sensitivity and specificity, that select the optimal group to test next, given previous test results, according to a pre-selected utility function. As in prior studies on group testing, we model this problem as sequential Bayesian Optimal Experimental Design (BOED), adaptively designing the group for each test. We analyze the required number of group tests when using the updated posterior over infection statuses and the corresponding Mutual Information (MI) as the utility function for selecting new groups. More importantly, we study how potential bias in the assumed test noise, relative to the ground truth, may affect the sample complexity of group testing.
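    A minimal sketch of the mutual-information utility for one adaptive step, assuming a particle approximation of the posterior over infection vectors; the function names and the default sensitivity/specificity values are illustrative, not from the paper.

import numpy as np

def entropy(p):
    """Binary entropy, safe at 0 and 1; works elementwise on arrays."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_group_by_mi(particles, weights, candidates, sens=0.95, spec=0.98):
    """Pick the candidate group maximizing MI(test outcome; infection status).
    particles: (k, n) boolean posterior samples; candidates: index tuples."""
    best, best_mi = None, -np.inf
    for g in candidates:
        any_inf = particles[:, list(g)].any(axis=1)        # group has an infection?
        p_pos_given_s = np.where(any_inf, sens, 1 - spec)  # P(test + | state)
        p_pos = np.dot(weights, p_pos_given_s)             # marginal P(test +)
        mi = entropy(p_pos) - np.dot(weights, entropy(p_pos_given_s))
        if mi > best_mi:
            best, best_mi = g, mi
    return best, best_mi

def update_posterior(particles, weights, group, outcome, sens=0.95, spec=0.98):
    """Bayes-update the particle weights after observing a test outcome (0/1)."""
    any_inf = particles[:, list(group)].any(axis=1)
    p_pos = np.where(any_inf, sens, 1 - spec)
    w = weights * (p_pos if outcome else 1 - p_pos)
    return w / w.sum()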
  4. Scholkopf, Bernhard; Uhler, Caroline; Zhang, Kun (Ed.)
    In order to test if a treatment is perceptibly different from a placebo in a randomized experiment with covariates, classical nonparametric tests based on ranks of observations/residuals have been employed (e.g., by Rosenbaum), with finite-sample valid inference enabled via permutations. This paper proposes a different principle on which to base inference: if, with access to all covariates and outcomes, but without access to any treatment assignments, one can form a ranking of the subjects that is sufficiently nonrandom (e.g., mostly treated followed by mostly control), then we can confidently conclude that there must be a treatment effect. Based on a more nuanced, quantifiable version of this principle, we design an interactive test called i-bet: the analyst forms a single permutation of the subjects one element at a time, and at each step the analyst bets toy money on whether that subject was actually treated or not, and learns the truth immediately after. The wealth process forms a real-valued measure of evidence against the global causal null, and we may reject the null at level α if the wealth ever crosses 1/α. Apart from providing a fresh "game-theoretic" principle on which to base the causal conclusion, the i-bet has other statistical and computational benefits, for example (A) allowing a human to adaptively design the test statistic based on increasing amounts of data being revealed (along with any working causal models and prior knowledge), and (B) not requiring permutation resampling, instead noting that under the null, the wealth forms a nonnegative martingale, and the type-1 error control of the aforementioned decision rule follows from a tight inequality by Ville. Further, if the null is not rejected, new subjects can later be added and the test can be simply continued, without any corrections (unlike with permutation p-values). Numerical experiments demonstrate good power under various heterogeneous treatment effects. We first describe the i-bet test for two-sample comparisons with unpaired data, and then adapt it to paired data, multi-sample comparison, and sequential settings; these may be viewed as interactive martingale variants of the Wilcoxon, Kruskal-Wallis, and Friedman tests.
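    A simplified, non-interactive version of the betting idea (ours, not the authors' full i-bet, which lets the analyst adapt as data are revealed): reveal subjects in an outcome-based order and bet on each label against its exchangeable-permutation conditional probability; the clip parameter controlling bet aggressiveness is our illustrative choice.

import numpy as np

def i_bet_sketch(outcomes, treated, alpha=0.05, clip=0.4):
    """Bet that high-outcome subjects are treated.  Under the global null the
    labels are an exchangeable permutation with known counts, so the conditional
    probability that the next revealed subject is treated is
    (#treated remaining)/(#remaining), and the wealth is a nonnegative martingale."""
    order = np.argsort(-np.asarray(outcomes))   # guess: large outcomes => treated
    n_t, n = int(np.sum(treated)), len(treated)
    wealth, seen_t = 1.0, 0
    for i, idx in enumerate(order):
        p0 = (n_t - seen_t) / (n - i)           # null prob next subject is treated
        q = min(p0 + clip * min(p0, 1 - p0), 1.0)  # bet that it IS treated
        is_t = bool(treated[idx])
        wealth *= q / p0 if is_t else (1 - q) / (1 - p0)
        seen_t += is_t
        if wealth >= 1 / alpha:                 # Ville's inequality: level alpha
            return "reject global null", wealth
        if p0 in (0.0, 1.0):                    # remaining labels are determined
            break
    return "no rejection", wealth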
  5. Summary: Sequential Monte Carlo algorithms are widely accepted as powerful computational tools for making inference with dynamical systems. A key step in sequential Monte Carlo is resampling, which plays the role of steering the algorithm towards the future dynamics. Several strategies have been used in practice, including multinomial resampling, residual resampling, optimal resampling, stratified resampling and optimal transport resampling. In one-dimensional cases, we show that optimal transport resampling is equivalent to stratified resampling on the sorted particles, and both strategies minimize the resampling variance as well as the expected squared energy distance between the original and resampled empirical distributions. For general $$d$$-dimensional cases, we show that if the particles are first sorted using the Hilbert curve, the variance of stratified resampling is $$O(m^{-(1+2/d)})$$, an improvement over the best previously known rate of $$O(m^{-(1+1/d)})$$, where $$m$$ is the number of resampled particles. We show that this improved rate is optimal for ordered stratified resampling schemes, as conjectured in Gerber et al. (2019). We also present an almost-sure bound on the Wasserstein distance between the original and Hilbert-curve-resampled empirical distributions. In light of these results, we show that for dimension $$d>1$$ the mean square error of sequential quasi-Monte Carlo with $$n$$ particles can be $$O(n^{-1-4/\{d(d+4)\}})$$ if Hilbert curve resampling is used and a specific low-discrepancy set is chosen. To our knowledge, this is the first known convergence rate lower than $$o(n^{-1})$$.
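    A sketch of stratified resampling on sorted particles for the one-dimensional case discussed in the abstract; for $$d>1$$ the particles would first be ordered along the Hilbert curve (not implemented here), and the function name is ours.

import numpy as np

def stratified_resample_sorted(x, w, m, rng=None):
    """Stratified resampling on sorted particles (1-D).  Per the abstract, this
    is equivalent to optimal transport resampling in one dimension and
    minimizes the resampling variance.  x: particle values, w: weights,
    m: number of resampled particles."""
    rng = rng or np.random.default_rng()
    x, w = np.asarray(x), np.asarray(w, dtype=float)
    order = np.argsort(x)                          # sort particles by value
    x_sorted = x[order]
    cdf = np.cumsum(w[order] / w.sum())
    u = (np.arange(m) + rng.uniform(size=m)) / m   # one uniform per stratum ((i-1)/m, i/m]
    idx = np.minimum(np.searchsorted(cdf, u), len(x) - 1)  # invert weighted CDF
    return x_sorted[idx]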