skip to main content


Title: Estimating Normalizing Constants for Log-Concave Distributions: Algorithms and Lower Bounds
Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics. In this work, we consider the problem of estimating the normalizing constant to within a multiplication factor of 1 ± ε for a μ-strongly convex and L-smooth function f, given query access to f(x) and ∇f(x). We give both algorithms and lowerbounds for this problem. Using an annealing algorithm combined with a multilevel Monte Carlo method based on underdamped Langevin dynamics, we show that O(d^{4/3}/\eps^2) queries to ∇f are sufficient. Moreover, we provide an information theoretic lowerbound, showing that at least d^{1-o(1)}/\eps^{2-o(1)} queries are necessary. This provides a first nontrivial lowerbound for the problem.  more » « less
Award ID(s):
1845171 1704656
NSF-PAR ID:
10161659
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
STOC 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The problem of testing monotonicity for Boolean functions on the hypergrid, $f:[n]^d \to \{0,1\}$ is a classic topic in property testing. When $n=2$, the domain is the hypercube. For the hypercube case, a breakthrough result of Khot-Minzer-Safra (FOCS 2015) gave a non-adaptive, one-sided tester making $\otilde(\eps^{-2}\sqrt{d})$ queries. Up to polylog $d$ and $\eps$ factors, this bound matches the $\widetilde{\Omega}(\sqrt{d})$-query non-adaptive lower bound (Chen-De-Servedio-Tan (STOC 2015), Chen-Waingarten-Xie (STOC 2017)). For any $n > 2$, the optimal non-adaptive complexity was unknown. A previous result of the authors achieves a $\otilde(d^{5/6})$-query upper bound (SODA 2020), quite far from the $\sqrt{d}$ bound for the hypercube. In this paper, we resolve the non-adaptive complexity of monotonicity testing for all constant $n$, up to $\poly(\eps^{-1}\log d)$ factors. Specifically, we give a non-adaptive, one-sided monotonicity tester making $\otilde(\eps^{-2}n\sqrt{d})$ queries. From a technical standpoint, we prove new directed isoperimetric theorems over the hypergrid $[n]^d$. These results generalize the celebrated directed Talagrand inequalities that were only known for the hypercube. 
    more » « less
  2. The noise sensitivity of a Boolean function f: {0,1}^n - > {0,1} is one of its fundamental properties. For noise parameter delta, the noise sensitivity is denoted as NS_{delta}[f]. This quantity is defined as follows: First, pick x = (x_1,...,x_n) uniformly at random from {0,1}^n, then pick z by flipping each x_i independently with probability delta. NS_{delta}[f] is defined to equal Pr [f(x) != f(z)]. Much of the existing literature on noise sensitivity explores the following two directions: (1) Showing that functions with low noise-sensitivity are structured in certain ways. (2) Mathematically showing that certain classes of functions have low noise sensitivity. Combined, these two research directions show that certain classes of functions have low noise sensitivity and therefore have useful structure. The fundamental importance of noise sensitivity, together with this wealth of structural results, motivates the algorithmic question of approximating NS_{delta}[f] given an oracle access to the function f. We show that the standard sampling approach is essentially optimal for general Boolean functions. Therefore, we focus on estimating the noise sensitivity of monotone functions, which form an important subclass of Boolean functions, since many functions of interest are either monotone or can be simply transformed into a monotone function (for example the class of unate functions consists of all the functions that can be made monotone by reorienting some of their coordinates [O'Donnell, 2014]). Specifically, we study the algorithmic problem of approximating NS_{delta}[f] for monotone f, given the promise that NS_{delta}[f] >= 1/n^{C} for constant C, and for delta in the range 1/n <= delta <= 1/2. For such f and delta, we give a randomized algorithm performing O((min(1,sqrt{n} delta log^{1.5} n))/(NS_{delta}[f]) poly (1/epsilon)) queries and approximating NS_{delta}[f] to within a multiplicative factor of (1 +/- epsilon). Given the same constraints on f and delta, we also prove a lower bound of Omega((min(1,sqrt{n} delta))/(NS_{delta}[f] * n^{xi})) on the query complexity of any algorithm that approximates NS_{delta}[f] to within any constant factor, where xi can be any positive constant. Thus, our algorithm's query complexity is close to optimal in terms of its dependence on n. We introduce a novel descending-ascending view of noise sensitivity, and use it as a central tool for the analysis of our algorithm. To prove lower bounds on query complexity, we develop a technique that reduces computational questions about query complexity to combinatorial questions about the existence of "thin" functions with certain properties. The existence of such "thin" functions is proved using the probabilistic method. These techniques also yield new lower bounds on the query complexity of approximating other fundamental properties of Boolean functions: the total influence and the bias. 
    more » « less
  3. A Boolean {\em $k$-monotone} function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical {\em monotone} functions, which are the {\em $1$-monotone} functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following: \begin{enumerate} \item We demonstrate a separation between testing $k$-monotonicity and testing monotonicity, on the hypercube domain $\{0,1\}^d$, for $k\geq 3$; \item We demonstrate a separation between testing and learning on $\{0,1\}^d$, for $k=\omega(\log d)$: testing $k$-monotonicity can be performed with $2^{O(\sqrt d \cdot \log d\cdot \log{1/\eps})}$ queries, while learning $k$-monotone functions requires $2^{\Omega(k\cdot \sqrt d\cdot{1/\eps})}$ queries (Blais et al. (RANDOM 2015)). \item We present a tolerant test for functions $f\colon[n]^d\to \{0,1\}$ with complexity independent of $n$, which makes progress on a problem left open by Berman et al. (STOC 2014). \end{enumerate} Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid $[n]^d$, and draw connections to distribution testing techniques. 
    more » « less
  4. We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal error guarantees for several natural structured distributions. Our main contribution is to develop faster algorithms for this problem whose running time nearly matches that of computing the empirical covariance. Given N = Ω(d^2/\eps^2) samples from a d-dimensional Gaussian distribution, an \eps-fraction of which may be arbitrarily corrupted, our algorithm runs in time O(d^{3.26}/ poly(\eps)) and approximates the unknown covariance matrix to optimal error up to a logarithmic factor. Previous robust algorithms with comparable error guarantees all have runtimes Ω(d^{2ω}) when \eps = Ω(1), where ω is the exponent of matrix multiplication. We also provide evidence that improving the running time of our algorithm may require new algorithmic techniques. 
    more » « less
  5. In this work, we show that for a nontrivial hypothesis class C, we can estimate the distance of a target function f to C (estimate the error rate of the best h∈C) using substantially fewer labeled examples than would be needed to actually {\em learn} a good h∈C. Specifically, we show that for the class C of unions of d intervals on the line, in the active learning setting in which we have access to a pool of unlabeled examples drawn from an arbitrary underlying distribution D, we can estimate the error rate of the best h∈C to an additive error ϵ with a number of label requests that is {\em independent of d} and depends only on ϵ. In particular, we make O((1/ϵ^6)log(1/ϵ)) label queries to an unlabeled pool of size O((d/ϵ^2)log(1/ϵ)). This task of estimating the distance of an unknown f to a given class C is called {\em tolerant testing} or {\em distance estimation} in the testing literature, usually studied in a membership query model and with respect to the uniform distribution. Our work extends that of Balcan et al. (2012) who solved the {\em non}-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than ϵ). We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled and given as training data to A? We focus on k-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem). 
    more » « less