skip to main content


Title: Optimal adaptive algorithms for estimating mixtures of unknown coins.
Given a mixture between two populations of coins, “positive” coins that each have unknown and potentially different—bias ≥ 1 + ∆ and “negative” coins with bias ≤ 2 − ∆, we consider the task of estimating the fraction ρ of positive coins to within additive error E. We achieve an upper and lower bound of Θ( ρ log 1 ) samples for a 1 −δ probability of success, where crucially, our lower bound applies to all fully-adaptive algorithms. Thus, our sample complexity bounds have tight dependence for every relevant problem parameter. A crucial component of our lower bound proof is a decomposition lemma (Lemma 5.2) showing how to assemble partially-adaptive bounds into a fully-adaptive bound, which may be of independent interest: though we invoke it for the special case of Bernoulli random variables (coins), it applies to general distributions. We present sim- ulation results to demonstrate the practical efficacy of our approach for realistic problem parameters for crowdsourcing applications, focusing on the “rare events” regime where ρ is small. The fine-grained adaptive flavor of both our algo- rithm and lower bound contrasts with much previous workin distributional testing and learning.  more » « less
Award ID(s):
1900460
NSF-PAR ID:
10273447
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Proceedings of the 2021 ACM- SIAM Symposium on Discrete Algorithms (SODA), pages 414–433.
Volume:
SIAM 2021
Issue:
2021
Page Range / eLocation ID:
414-433
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Lysyanskaya, Anna ; Handschuh, Helena (Ed.)
    We study the black-box function inversion problem, which is the problem of finding x[N] such that f(x)=y, given as input some challenge point y in the image of a function f:[N][N], using T oracle queries to f and preprocessed advice 01S depending on f. We prove a number of new results about this problem, as follows. 1. We show an algorithm that works for any T and S satisfying TS2maxST=(N3) . In the important setting when ST, this improves on the celebrated algorithm of Fiat and Naor [STOC, 1991], which requires TS3N3. E.g., Fiat and Naor's algorithm is only non-trivial for SN23 , while our algorithm gives a non-trivial tradeoff for any SN12 . (Our algorithm and analysis are quite simple. As a consequence of this, we also give a self-contained and simple proof of Fiat and Naor's original result, with certain optimizations left out for simplicity.) 2. We show a non-adaptive algorithm (i.e., an algorithm whose ith query xi is chosen based entirely on and y, and not on the f(x1)f(xi−1)) that works for any T and S satisfying S=(Nlog(NT)) giving the first non-trivial non-adaptive algorithm for this problem. E.g., setting T=Npolylog(N) gives S=(NloglogN). This answers a question due to Corrigan-Gibbs and Kogan [TCC, 2019], who asked whether it was possible for a non-adaptive algorithm to work with parameters T and S satisfying T+SlogNo(N) . We also observe that our non-adaptive algorithm is what we call a guess-and-check algorithm, that is, it is non-adaptive and its final output is always one of the oracle queries x1xT. For guess-and-check algorithms, we prove a matching lower bound, therefore completely characterizing the achievable parameters (ST) for this natural class of algorithms. (Corrigan-Gibbs and Kogan showed that any such lower bound for arbitrary non-adaptive algorithms would imply new circuit lower bounds.) 3. We show equivalence between function inversion and a natural decision version of the problem in both the worst case and the average case, and similarly for functions f:[N][M] with different ranges. All of the above results are most naturally described in a model with shared randomness (i.e., random coins shared between the preprocessing algorithm and the online algorithm). However, as an additional contribution, we show (using a technique from communication complexity due to Newman [IPL, 1991]) how to generically convert any algorithm that uses shared randomness into one that does not. 
    more » « less
  2. The noise sensitivity of a Boolean function f: {0,1}^n - > {0,1} is one of its fundamental properties. For noise parameter delta, the noise sensitivity is denoted as NS_{delta}[f]. This quantity is defined as follows: First, pick x = (x_1,...,x_n) uniformly at random from {0,1}^n, then pick z by flipping each x_i independently with probability delta. NS_{delta}[f] is defined to equal Pr [f(x) != f(z)]. Much of the existing literature on noise sensitivity explores the following two directions: (1) Showing that functions with low noise-sensitivity are structured in certain ways. (2) Mathematically showing that certain classes of functions have low noise sensitivity. Combined, these two research directions show that certain classes of functions have low noise sensitivity and therefore have useful structure. The fundamental importance of noise sensitivity, together with this wealth of structural results, motivates the algorithmic question of approximating NS_{delta}[f] given an oracle access to the function f. We show that the standard sampling approach is essentially optimal for general Boolean functions. Therefore, we focus on estimating the noise sensitivity of monotone functions, which form an important subclass of Boolean functions, since many functions of interest are either monotone or can be simply transformed into a monotone function (for example the class of unate functions consists of all the functions that can be made monotone by reorienting some of their coordinates [O'Donnell, 2014]). Specifically, we study the algorithmic problem of approximating NS_{delta}[f] for monotone f, given the promise that NS_{delta}[f] >= 1/n^{C} for constant C, and for delta in the range 1/n <= delta <= 1/2. For such f and delta, we give a randomized algorithm performing O((min(1,sqrt{n} delta log^{1.5} n))/(NS_{delta}[f]) poly (1/epsilon)) queries and approximating NS_{delta}[f] to within a multiplicative factor of (1 +/- epsilon). Given the same constraints on f and delta, we also prove a lower bound of Omega((min(1,sqrt{n} delta))/(NS_{delta}[f] * n^{xi})) on the query complexity of any algorithm that approximates NS_{delta}[f] to within any constant factor, where xi can be any positive constant. Thus, our algorithm's query complexity is close to optimal in terms of its dependence on n. We introduce a novel descending-ascending view of noise sensitivity, and use it as a central tool for the analysis of our algorithm. To prove lower bounds on query complexity, we develop a technique that reduces computational questions about query complexity to combinatorial questions about the existence of "thin" functions with certain properties. The existence of such "thin" functions is proved using the probabilistic method. These techniques also yield new lower bounds on the query complexity of approximating other fundamental properties of Boolean functions: the total influence and the bias. 
    more » « less
  3. In the k -cut problem, we want to find the lowest-weight set of edges whose deletion breaks a given (multi)graph into k connected components. Algorithms of Karger and Stein can solve this in roughly O ( n 2k ) time. However, lower bounds from conjectures about the k -clique problem imply that Ω ( n (1- o (1)) k ) time is likely needed. Recent results of Gupta, Lee, and Li have given new algorithms for general k -cut in n 1.98k + O(1) time, as well as specialized algorithms with better performance for certain classes of graphs (e.g., for small integer edge weights). In this work, we resolve the problem for general graphs. We show that the Contraction Algorithm of Karger outputs any fixed k -cut of weight α λ k with probability Ω k ( n - α k ), where λ k denotes the minimum k -cut weight. This also gives an extremal bound of O k ( n k ) on the number of minimum k -cuts and an algorithm to compute λ k with roughly n k polylog( n ) runtime. Both are tight up to lower-order factors, with the algorithmic lower bound assuming hardness of max-weight k -clique. The first main ingredient in our result is an extremal bound on the number of cuts of weight less than 2 λ k / k , using the Sunflower lemma. The second ingredient is a fine-grained analysis of how the graph shrinks—and how the average degree evolves—in the Karger process. 
    more » « less
  4. This paper considers a generalized version of the coin weighing problem with a spring scale that lies at the intersection of group testing and compressed sensing problems. Given a collection of n ≥ 2 coins of total weight d (for a known integer d), where the weight of each coin is an unknown integer in the range of {0, 1, ..., k} (for a known integer k ≥ 1), the goal is to determine the weight of each coin by weighing subsets of coins in a spring scale. The problem is to devise a weighing strategy that minimizes the average number of weighings over all possible weight configurations. For d = k = 1, an adaptive bisecting weighing strategy is known to be optimal. However, even the simplest non-trivial case of the problem, i.e., d = k = 2, is still open. For this case, we propose and analyze a simple and effective adaptive weighing strategy. Our analysis shows that the proposed strategy requires about 1.365log2n-0.5 weighings on average. As n grows unbounded, the proposed strategy, when compared to an optimal strategy within the commonly-used class of nested strategies, requires about 31.75% less number of weighings on average; and in comparison with the information-theoretic lower bound, it requires at most about 8.16% extra number of weighings on average. 
    more » « less
  5. The noise sensitivity of a Boolean function f:{0,1}n→{0,1} is one of its fundamental properties. A function of a positive noise parameter δ, it is denoted as NSδ[f]. Here we study the algorithmic problem of approximating it for monotone f, such that NSδ[f]≥1/nC for constant C, and where δ satisfies 1/n≤δ≤1/2. For such f and δ, we give a randomized algorithm performing O(min(1,n√δlog1.5n)NSδ[f]poly(1ϵ)) queries and approximating NSδ[f] to within a multiplicative factor of (1±ϵ). Given the same constraints on f and δ, we also prove a lower bound of Ω(min(1,n√δ)NSδ[f]⋅nξ) on the query complexity of any algorithm that approximates NSδ[f] to within any constant factor, where ξ can be any positive constant. Thus, our algorithm's query complexity is close to optimal in terms of its dependence on n. We introduce a novel descending-ascending view of noise sensitivity, and use it as a central tool for the analysis of our algorithm. To prove lower bounds on query complexity, we develop a technique that reduces computational questions about query complexity to combinatorial questions about the existence of "thin" functions with certain properties. The existence of such "thin" functions is proved using the probabilistic method. These techniques also yield previously unknown lower bounds on the query complexity of approximating other fundamental properties of Boolean functions: the total influence and the bias. 
    more » « less