We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include as special cases random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$\footnote{Throughout, $\widetilde{O}(\cdot)$ hides polylogarithmic factors in $n$.} time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing. This is the first nearlylinear runtime in the output size, which is clearly optimal. For a determinantal point process on $k$sized subsets of a ground set of $n$ elements, defined via an $n\times n$ kernel matrix, we show how to approximately sample in $\widetilde{O}(k^\omega)$ time after an initial $\widetilde{O}(nk^{\omega1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. The time to compute just the weight of the output set is simply $\simeq k^\omega$, a natural barrier that suggests our runtime might be optimal for determinantal point processes as well. As a corollary, we even improve the state of the art for obtaining a single sample from a determinantal point process, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^\omega\})$ to $\widetilde{O}(nk^{\omega1})$.
In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, the domain size can be reduced to nearly linear in the output size $t=\widetilde{O}(k)$, improving the state of the art from $t= \widetilde{O}(k^2)$ for general strongly Rayleigh distributions and the more specialized $t=\widetilde{O}(k^{1.5})$ for spanning tree distributions. Our reduction involves sampling from $\widetilde{O}(1)$ domainsparsified distributions, all of which can be produced efficiently assuming approximate overestimates for marginals of $\mu$ are known and stored in a convenient data structure. Having access to marginals is the discrete analog of having access to the mean and covariance of a continuous distribution, or equivalently knowing ``isotropy'' for the distribution, the key behind optimal samplers in the continuous setting based on the famous KannanLov\'aszSimonovits (KLS) conjecture. We view our result as analogous in spirit to the KLS conjecture and its consequences for sampling, but rather for discrete strongly Rayleigh measures.
more »
« less
Domain Sparsification of Discrete Distributions Using Entropic Independence
We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of marginals $\mathbb{P}_{S\sim \mu}[i\in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size $k$ subsets of a ground set of only $n^{1\alpha}\cdot \operatorname{poly}(k)$ elements. Here, $1/\alpha\in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly $0$) external field to $\mu$, an operation that for many distributions $\mu$ of interest, retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a onetime cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference.
For a wide range of distributions where $\alpha=\Omega(1)$, our result reduces the domain size, and as a corollary, the costpersample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomerdimer system, nonsymmetric determinantal point processes, and partitionconstrained Strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Derezi\'nski who obtained domain sparsification for distributions with a logconcave generating polynomial (corresponding to $\alpha=1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of logconcave polynomials; roughly speaking, we show that constantfactor approximation is enough for domain sparsification, improving over $O(1/k)$ relative error established in prior work.
more »
« less
 Award ID(s):
 2045354
 NSFPAR ID:
 10393961
 Editor(s):
 Braverman, Mark
 Date Published:
 Journal Name:
 Leibniz international proceedings in informatics
 Volume:
 215
 ISSN:
 18688969
 Page Range / eLocation ID:
 5:15:23
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We develop a framework for sampling from discrete distributions $\mu$ on the hypercube $\{\pm 1\}^n$ by sampling from continuous distributions supported on $\mathbb{R}^n$ obtained by convolution with spherical Gaussians. We show that for wellstudied families of discrete distributions $\mu$, convolving $\mu$ with Gaussians yields wellconditioned logconcave distributions, as long as the variance of the Gaussian is above an $O(1)$ threshold. We then reduce the task of sampling from $\mu$ to sampling from Gaussianconvolved distributions. Our reduction is based on a stochastic process widely studied under different names: backward diffusion in diffusion models, and stochastic localization. We discretize this process in a novel way that allows for high accuracy and parallelism. As our main application, we resolve open questions Anari, Hu, Saberi, and Schild raised on the parallel sampling of distributions that admit parallel counting. We show that determinantal point processes can be sampled via RNC algorithms, that is in time $\log(n)^{O(1)}$ using $n^{O(1)}$ processors. For a wider class of distributions, we show our framework yields QuasiRNC sampling, i.e., $\log(n)^{O(1)}$ time using $n^{O(\log n)}$ processors. This wider class includes nonsymmetric determinantal point processes and random Eulerian tours in digraphs, the latter nearly resolving another open question raised by prior work. Of potentially independent interest, we introduce and study a notion of smoothness for discrete distributions that we call transport stability, which we use to control the propagation of error in our framework. Additionally, we connect transport stability to constructions of optimally mixing local random walks and concentration inequalities.more » « less

We introduce a notion called entropic independence that is an entropic analog of spectral notions of highdimensional expansion. Informally, entropic independence of a background distribution $\mu$ on $k$sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $S$, the relative entropy of a single element of $S$ drawn uniformly at random carries at most $O(1/k)$ fraction of the relative entropy of $S$. Entropic independence is the analog of the notion of spectral independence, if one replaces variance by entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponentialsized state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified logSobolev inequalities, for downup random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence. We furthermore extend our theory to the case where spectral independence does not hold under arbitrary external fields. To do this, we introduce a framework for obtaining tight mixing time bounds for Markov chains based on what we call restricted modified logSobolev inequalities, which guarantee entropy contraction not for all distributions, but for those in a sufficiently large neighborhood of the stationary distribution. To derive our results, we relate entropic independence to properties of polynomials: $\mu$ is entropically independent exactly when a transformed version of the generating polynomial of $\mu$ is upper bounded by its linear tangent; this property is implied by concavity of the said transformation, which was shown by prior work to be locally equivalent to spectral independence. We apply our results to obtain (1) tight modified logSobolev inequalities and mixing times for multistep downup walks on fractionally logconcave distributions, (2) the tight mixing time of $O(n\log n)$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $1$, improving upon the prior quadratic dependence on $n$, and (3) nearlylinear time $\widetilde O_{\delta}(n)$ samplers for the hardcore and Ising models on $n$node graphs that have $\delta$relative gap to the treeuniqueness threshold. In the last application, our bound on the running time does not depend on the maximum degree $\Delta$ of the graph, and is therefore optimal even for highdegree graphs, and in fact, is sublinear in the size of the graph for highdegree graphs.more » « less

Loh, PoLing ; Raginsky, Maxim (Ed.)We establish a connection between sampling and optimization on discrete domains. For a family of distributions $\mu$ defined on size $k$ subsets of a ground set of elements, that is closed under external fields, we show that rapid mixing of natural local random walks implies the existence of simple approximation algorithms to find $\max \mu(\cdot)$. More precisely, we show that if $t$step downup random walks have spectral gap at least inverse polynomially large, then $t$step local search finds $\max \mu(\cdot)$ within a factor of $k^{O(k)}$. As the main application of our result, we show that $2$step local search achieves a nearlyoptimal $k^{O(k)}$factor approximation for MAP inference on nonsymmetric $k$DPPs. This is the first nontrivial multiplicative approximation algorithm for this problem. In our main technical result, we show that an exchange inequality, a concept rooted in discrete convex analysis, can be derived from fast mixing of local random walks. We further advance the state of the art on the mixing of random walks for nonsymmetric DPPs and more generally sectorstable distributions, by obtaining the tightest possible bound on the step size needed for polynomialtime mixing of random walks. We bring the step size down by a factor of $2$ compared to prior works, and consequently get a quadratic improvement on the runtime of local search steps; this improvement is potentially of independent interest in sampling applications.more » « less

An \ell _p oblivious subspace embedding is a distribution over r \times n matrices \Pi such that for any fixed n \times d matrix A , \[ \Pr _{\Pi }[\textrm {for all }x, \ \Vert Ax\Vert _p \le \Vert \Pi Ax\Vert _p \le \kappa \Vert Ax\Vert _p] \ge 9/10,\] where r is the dimension of the embedding, \kappa is the distortion of the embedding, and for an n dimensional vector y , \Vert y\Vert _p = (\sum _{i=1}^n y_i^p)^{1/p} is the \ell _p norm. Another important property is the sparsity of \Pi , that is, the maximum number of nonzero entries per column, as this determines the running time of computing \Pi A . While for p = 2 there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of 1 \le p \lt 2 , much less was known. In this article, we obtain nearly optimal tradeoffs for \ell _1 oblivious subspace embeddings, as well as new tradeoffs for 1 \lt p \lt 2 . Our main results are as follows: (1) We show for every 1 \le p \lt 2 , any oblivious subspace embedding with dimension r has distortion \[ \kappa = \Omega \left(\frac{1}{\left(\frac{1}{d}\right)^{1 / p} \log ^{2 / p}r + \left(\frac{r}{n}\right)^{1 / p  1 / 2}}\right).\] When r = {\operatorname{poly}}(d) \ll n in applications, this gives a \kappa = \Omega (d^{1/p}\log ^{2/p} d) lower bound, and shows the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for p = 1 is optimal up to {\operatorname{poly}}(\log (d)) factors. (2) We give sparse oblivious subspace embeddings for every 1 \le p \lt 2 . Importantly, for p = 1 , we achieve r = O(d \log d) , \kappa = O(d \log d) and s = O(\log d) nonzero entries per column. The best previous construction with s \le {\operatorname{poly}}(\log d) is due to Woodruff and Zhang (COLT, 2013), giving \kappa = \Omega (d^2 {\operatorname{poly}}(\log d)) or \kappa = \Omega (d^{3/2} \sqrt {\log n} \cdot {\operatorname{poly}}(\log d)) and r \ge d \cdot {\operatorname{poly}}(\log d) ; in contrast our r = O(d \log d) and \kappa = O(d \log d) are optimal up to {\operatorname{poly}}(\log (d)) factors even for dense matrices. We also give (1) \ell _p oblivious subspace embeddings with an expected 1+\varepsilon number of nonzero entries per column for arbitrarily small \varepsilon \gt 0 , and (2) the first oblivious subspace embeddings for 1 \le p \lt 2 with O(1) distortion and dimension independent of n . Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as entrywise \ell _p lowrank approximation. Our results give improved algorithms for these applications.more » « less