

Title: Finite-Sample Bounds for Two-Distribution Hypothesis Tests
With the rapid growth of large language models, big data, and malicious online attacks, it has become increasingly important to have tools for anomaly detection that can distinguish machine from human, fair from unfair, and dangerous from safe. Prior work has shown that two-distribution (specified complexity) hypothesis tests are useful tools for such tasks, aiding in detecting bias in datasets and providing artificial agents with the ability to recognize artifacts that are likely to have been designed by humans and pose a threat. However, existing work on two-distribution hypothesis tests requires exact values for the specification function, which can often be costly or impossible to compute. In this work, we prove novel finite-sample bounds that allow for two-distribution hypothesis tests with only estimates of required quantities, such as specification function values. Significantly, the resulting bounds do not require knowledge of the true distribution, distinguishing them from traditional p-values. We apply our bounds to detect student cheating on multiple-choice tests, as an example where the exact specification function is unknown. We additionally apply our results to detect representational bias in machine-learning datasets and provide artificial agents with intention perception, showing that our results are consistent with prior work despite only requiring a finite sample of the space. Finally, we discuss additional applications and provide guidance for those applying these bounds to their own work.
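To make the test concrete, here is a minimal sketch of the exact-value two-distribution test from the prior work the abstract builds on, assuming the common specified-complexity ("kardis") formulation: kappa(x) = r * p(x) / nu(x), where p is the proposed chance distribution, nu >= 0 is the specification function, and r is any constant with r >= sum_y nu(y), which gives P(kappa(X) <= alpha) <= alpha under p. Function names are illustrative, and the paper's actual contribution (valid bounds when nu and related quantities are only estimated) is not reproduced here.

```python
def kardis(r, p_x, nu_x):
    """Specified-complexity ("kardis") test statistic from prior work:
    kappa(x) = r * p(x) / nu(x), with p the proposed chance
    distribution, nu >= 0 the specification function, and r any
    constant satisfying r >= sum over y of nu(y)."""
    return r * p_x / nu_x

def rejects_chance(r, p_x, nu_x, alpha=0.05):
    """Under p, P(kappa(X) <= alpha) <= alpha, so a small kardis value
    rejects the hypothesis that x was drawn from p at level alpha,
    without any knowledge of the true generating distribution (unlike
    a traditional p-value)."""
    return kardis(r, p_x, nu_x) <= alpha

# Toy usage: a low-probability but highly specified outcome is flagged.
print(rejects_chance(r=100.0, p_x=1e-6, nu_x=50.0))  # True: kappa = 2e-6
```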
Award ID(s): 1950885
PAR ID: 10499937
Author(s) / Creator(s): ; ;
Publisher / Repository: IEEE
Date Published:
Journal Name: 2023 IEEE 10th International Conference on Data Science and Advanced Analytics (DSAA)
ISBN: 979-8-3503-4503-2
Page Range / eLocation ID: 1 to 11
Format(s): Medium: X
Location: Thessaloniki, Greece
Sponsoring Org: National Science Foundation
More Like this
  1. We study the problem of out-of-distribution (OOD) detection, that is, detecting whether a machine learning (ML) model's output can be trusted at inference time. While a number of tests for OOD detection have been proposed in prior work, a formal framework for studying this problem is lacking. We propose a definition for the notion of OOD that includes both the input distribution and the ML model, which provides insights for the construction of powerful tests for OOD detection. We also propose a procedure, inspired by multiple hypothesis testing, that systematically combines any number of different statistics from the ML model using conformal p-values. We further provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that threshold-based tests proposed in prior work perform well in specific settings, but not uniformly well across different OOD instances. In contrast, our proposed method, which combines multiple statistics, performs uniformly well across different datasets and neural network architectures.
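As an illustration of the combination step described above, the sketch below computes conformal p-values from held-out in-distribution scores and merges K of them with a simple Bonferroni correction. The Bonferroni rule is a stand-in chosen for brevity; the paper's actual multiple-testing procedure may differ, and all names here are illustrative.

```python
import numpy as np

def conformal_pvalue(cal_scores, test_score):
    """Conformal p-value: the rank of the test statistic among
    statistics computed on held-out in-distribution data. It is
    super-uniform whenever the test point is exchangeable with the
    calibration set, with no assumptions on the score itself."""
    cal_scores = np.asarray(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (len(cal_scores) + 1)

def combined_ood_flag(cal_score_lists, test_scores, alpha=0.05):
    """Combine K different statistics: flag a point as OOD if any
    statistic's conformal p-value falls below alpha / K (Bonferroni),
    which keeps the in-distribution false-alarm rate at most alpha."""
    pvals = [conformal_pvalue(c, t) for c, t in zip(cal_score_lists, test_scores)]
    return any(p < alpha / len(pvals) for p in pvals), pvals

# Toy usage with two statistics and tiny calibration sets.
flag, pvals = combined_ood_flag([[0.1, 0.2, 0.3, 0.4], [1.0, 1.1, 1.2, 1.3]],
                                test_scores=[0.9, 0.9])
print(flag, pvals)
```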
  2. Peer prediction mechanisms incentivize agents to truthfully report their signals, even in the absence of verification, by comparing agents' reports with those of their peers. In the detail-free multi-task setting, agents are asked to respond to multiple independent and identically distributed tasks, and the mechanism does not know the prior distribution of agents' signals. The goal is to provide an epsilon-strongly truthful mechanism, where truth-telling rewards agents "strictly" more than any other strategy profile (up to epsilon additive error) even for heterogeneous agents, and to do so while requiring as few tasks as possible. We design a family of mechanisms with a scoring function that maps a pair of reports to a score. The mechanism is strongly truthful if the scoring function is "prior ideal". Moreover, the mechanism is epsilon-strongly truthful as long as the scoring function used is sufficiently close to the ideal scoring function. This reduces the above mechanism-design problem to a learning problem: specifically, learning an ideal scoring function. Because learning the prior distribution is sufficient (but not necessary) to learn the scoring function, we can apply standard learning-theory techniques that leverage side information about the prior (e.g., that it is close to some parametric model). Furthermore, we derive a variational representation of an ideal scoring function and reduce the learning problem to empirical risk minimization. We leverage this reduction to obtain very general results for peer prediction in the multi-task setting. Specifically: Sample complexity. We show how to derive good bounds on the number of tasks required for different types of priors, in some cases exponentially improving previous results. In particular, we can upper bound the required number of tasks for parametric models with bounded learning complexity. Furthermore, our reduction applies to myriad continuous signal-space settings; to the best of our knowledge, this is the first peer-prediction mechanism on continuous signals designed for the multi-task setting. Connection to machine learning. We show how to turn a soft predictor of an agent's signals (given the other agents' signals) into a mechanism. This allows the practical use of machine-learning algorithms that give good results even when many agents provide noisy information. Stronger properties. In the finite setting, we obtain epsilon-strongly truthful mechanisms for any stochastically relevant prior. Prior works either apply only to more restrictive settings or achieve a weaker notion of truthfulness (informed truthfulness).
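A minimal sketch of the payment structure in this mechanism family may help: one shared "bonus" task is scored directly, and a mismatched pair of "penalty" tasks is subtracted, with the pair-scoring function left as a parameter (in the paper it is learned; here any callable works, e.g., simple agreement). The helper name and task bookkeeping are illustrative, not the paper's code.

```python
import random

def multitask_payment(reports_i, reports_j, score, rng=random):
    """Multi-task peer prediction payment (sketch): draw one shared
    "bonus" task b and two distinct "penalty" tasks p1, p2, then pay
    the score on the bonus pair minus the score on the mismatched
    penalty pair. With a prior-ideal scoring function this family is
    strongly truthful; a learned, near-ideal scoring function gives
    epsilon-strong truthfulness."""
    b, p1, p2 = rng.sample(list(reports_i), 3)
    return score(reports_i[b], reports_j[b]) - score(reports_i[p1], reports_j[p2])

# Toy usage with a plain agreement score over three shared tasks.
agree = lambda a, b: float(a == b)
pay = multitask_payment({"t1": "A", "t2": "B", "t3": "A"},
                        {"t1": "A", "t2": "B", "t3": "B"}, agree)
print(pay)
```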
  3. In general, obtaining the exact steady-state distribution of queue lengths is not feasible. Therefore, we focus on establishing bounds for the tail probabilities of queue lengths. We examine queueing systems under Heavy Traffic (HT) conditions and provide exponentially decaying bounds for the probability P(εq > x), where ε is the HT parameter denoting how far the load is from the maximum allowed load. Our bounds are not limited to asymptotic cases and are applicable even for finite values of ε, and they get sharper as ε → 0. Consequently, we derive non-asymptotic convergence rates for the tail probabilities. Furthermore, our results offer bounds on the exponential rate of decay of the tail, given by -(1/x) log P(εq > x), for any finite value of x. These can be interpreted as non-asymptotic versions of Large Deviation (LD) results. To obtain our results, we use an exponential Lyapunov function to bound the moment-generating function of queue lengths and apply Markov's inequality. We demonstrate our approach by presenting tail bounds for a continuous-time Join-the-Shortest-Queue (JSQ) system.
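The MGF-plus-Markov step the abstract describes can be previewed on a toy example where the moment-generating function is known in closed form. The sketch below uses an M/M/1 queue (geometric steady state) rather than the general systems and Lyapunov argument of the paper, and numerically optimizes the resulting Chernoff-style bound over θ.

```python
import numpy as np

def chernoff_tail_bound(rho, x, n_grid=10_000):
    """Toy version of the key step: for an M/M/1 queue with load rho,
    the steady-state queue length is geometric, so its MGF is
    E[exp(theta*q)] = (1 - rho) / (1 - rho*exp(theta)) for theta < log(1/rho).
    Markov's inequality gives P(q > x) <= exp(-theta*x) * MGF(theta);
    we minimize the right-hand side over theta on a grid."""
    theta = np.linspace(1e-6, np.log(1.0 / rho) - 1e-6, n_grid)
    mgf = (1.0 - rho) / (1.0 - rho * np.exp(theta))
    return float(np.min(np.exp(-theta * x) * mgf))

rho, x = 0.9, 50
print(chernoff_tail_bound(rho, x))  # exponentially decaying bound on P(q > x)
print(rho ** (x + 1))               # exact geometric tail, for comparison
```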
  4. Many two-level nested simulation applications involve the conditional expectation of some response variable, where the expected response is the quantity of interest, and the expectation is with respect to the inner-level random variables, conditioned on the outer-level random variables. The latter typically represent random risk factors, and risk can be quantified by estimating the probability density function (pdf) or cumulative distribution function (cdf) of the conditional expectation. Much prior work has considered a naïve estimator that uses the empirical distribution of the sample averages across the inner-level replicates. This results in a biased estimator, because the distribution of the sample averages is over-dispersed relative to the distribution of the conditional expectation when the number of inner-level replicates is finite. Whereas most prior work has focused on allocating the numbers of outer- and inner-level replicates to balance the bias/variance tradeoff, we develop a bias-corrected pdf estimator. Our approach is based on the concept of density deconvolution, which is widely used to estimate densities with noisy observations but has not previously been considered for nested simulation problems. For a fixed computational budget, the bias-corrected deconvolution estimator allows more outer-level and fewer inner-level replicates to be used, which substantially improves the efficiency of the nested simulation. 
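The over-dispersion driving the naive estimator's bias is easy to see numerically. The toy sketch below assumes a linear conditional mean and Gaussian inner noise purely for illustration, so the law of total variance, Var(Ybar) = Var(mu(Z)) + sigma^2 / n_inner, shows up directly; the paper's deconvolution estimator itself is not reproduced.

```python
import numpy as np

# Two-level nested simulation toy: outer risk factors Z with conditional
# mean mu(Z) = Z; each outer scenario gets n_inner noisy inner replicates.
rng = np.random.default_rng(0)
n_outer, n_inner, sigma = 100_000, 10, 2.0

Z = rng.normal(size=n_outer)                           # outer-level draws
Y = Z[:, None] + sigma * rng.normal(size=(n_outer, n_inner))
Ybar = Y.mean(axis=1)                                  # naive per-scenario estimates

# The naive sample averages are over-dispersed by sigma^2 / n_inner, so
# their empirical pdf/cdf is a biased estimate of the law of E[Y | Z].
print(Z.var())     # ~1.0  (variance of the true conditional expectation)
print(Ybar.var())  # ~1.4  (inflated by sigma**2 / n_inner = 0.4)
# Density deconvolution removes this known noise component from the pdf of
# Ybar instead of growing n_inner, freeing budget for more outer scenarios.
```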
  5. Data that is gathered adaptively, via bandit algorithms for example, exhibits bias. This is true both when gathering simple numeric-valued data (the empirical means kept track of by stochastic bandit algorithms are biased downwards) and when gathering more complicated data (running hypothesis tests on complex data gathered via contextual bandit algorithms leads to false discovery). In this paper, we show that this problem is mitigated if the data-collection procedure is differentially private. This lets us both bound the bias of simple numeric-valued quantities (like the empirical means of stochastic bandit algorithms) and correct the p-values of hypothesis tests run on the adaptively gathered data. Moreover, there exist differentially private bandit algorithms with near-optimal regret bounds: we apply existing theorems in the simple stochastic case and give a new analysis for linear contextual bandits. We complement our theoretical results with experiments validating our theory.
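A stylized simulation makes the premise tangible. The setup below (uniform rewards with identical means, a single Laplace-noised release of each arm's mean, no composition accounting, and selection of the empirically worst arm) is an illustrative stand-in, not the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(1)

def selection_bias(eps_dp=None, n_arms=10, pulls=200, trials=2000):
    """Average bias of the chosen arm's empirical mean when the choice
    is made adaptively from the data. All arms have true mean 0.5; with
    eps_dp set, each released mean gets Laplace(1 / (eps_dp * pulls))
    noise (eps_dp-DP for a single release of a mean of [0,1]-bounded
    rewards), which decouples the selection from the sampling noise."""
    total = 0.0
    for _ in range(trials):
        rewards = rng.uniform(size=(n_arms, pulls))    # true means all 0.5
        means = rewards.mean(axis=1)
        released = (means if eps_dp is None
                    else means + rng.laplace(scale=1.0 / (eps_dp * pulls),
                                             size=n_arms))
        total += means[np.argmin(released)] - 0.5      # bias of the chosen arm
    return total / trials

print(selection_bias())            # clearly negative: adaptive selection bias
print(selection_bias(eps_dp=0.1))  # much closer to zero under private release
```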