Title: Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include as special cases random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing (throughout, $\widetilde{O}(\cdot)$ hides polylogarithmic factors in $n$). This is the first runtime nearly linear in the output size, which is clearly optimal. For a determinantal point process on $k$-sized subsets of a ground set of $n$ elements, defined via an $n\times n$ kernel matrix, we show how to approximately sample in $\widetilde{O}(k^\omega)$ time after an initial $\widetilde{O}(nk^{\omega-1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. The time to compute just the weight of the output set is already $\simeq k^\omega$, a natural barrier that suggests our runtime might be optimal for determinantal point processes as well. As a corollary, we even improve the state of the art for obtaining a single sample from a determinantal point process, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^\omega\})$ to $\widetilde{O}(nk^{\omega-1})$.

In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, the domain size can be reduced to nearly linear in the output size, $t=\widetilde{O}(k)$, improving the state of the art from $t=\widetilde{O}(k^2)$ for general strongly Rayleigh distributions and the more specialized $t=\widetilde{O}(k^{1.5})$ for spanning tree distributions. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently, assuming approximate overestimates for the marginals of $\mu$ are known and stored in a convenient data structure. Having access to marginals is the discrete analog of having access to the mean and covariance of a continuous distribution, or equivalently knowing "isotropy" for the distribution, the key behind optimal samplers in the continuous setting based on the famous Kannan-Lovász-Simonovits (KLS) conjecture. We view our result as analogous in spirit to the KLS conjecture and its consequences for sampling, but for discrete strongly Rayleigh measures.
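For context, the classic exact sampler that such results are benchmarked against is easy to state. Below is a minimal sketch of Wilson's algorithm, the standard loop-erased random walk sampler for uniform spanning trees. To be clear, this is background rather than the $\widetilde{O}(\lvert V\rvert)$-per-sample algorithm of the paper, and the function name and interface are ours.

```python
import random

def wilson_uniform_spanning_tree(adj, root=None):
    """Sample a uniformly random spanning tree of a connected graph
    using Wilson's algorithm (loop-erased random walks).

    adj: dict mapping each vertex to a list of its neighbors.
    Returns the tree as a list of (child, parent) edges.
    """
    vertices = list(adj)
    if root is None:
        root = vertices[0]
    in_tree = {root}
    parent = {}
    for start in vertices:
        # Random walk from `start` until the current tree is hit;
        # overwriting parent[u] at each exit implicitly loop-erases.
        u = start
        while u not in in_tree:
            parent[u] = random.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return [(u, parent[u]) for u in vertices if u != root]

# Toy usage: a uniform spanning tree of the 4-cycle.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(wilson_uniform_spanning_tree(cycle))
```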
Award ID(s):
2045354
PAR ID:
10393964
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)
Page Range / eLocation ID:
123 to 134
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Braverman, Mark (Ed.)
    We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of the marginals $\mathbb{P}_{S\sim \mu}[i\in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size-$k$ subsets of a ground set of only $n^{1-\alpha}\cdot \operatorname{poly}(k)$ elements. Here, $1/\alpha\in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly $0$) external field to $\mu$, an operation that for many distributions $\mu$ of interest retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference. For a wide range of distributions where $\alpha=\Omega(1)$, our result reduces the domain size, and as a corollary the cost-per-sample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha=1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
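To make the shape of this reduction concrete, here is a minimal illustrative sketch (ours, not pseudocode from the paper) of the first step: given overestimates of the marginals, a small candidate ground set is drawn by i.i.d. sampling proportional to those overestimates. The actual reduction then applies a sparse external field supported on this subsample and corrects the law of the output; that correction, which carries the analysis, is omitted here.

```python
import random

def sparsified_ground_set(q, t):
    """Draw a candidate sub-ground-set of ~t elements by sampling
    indices i.i.d. with probabilities proportional to marginal
    overestimates q[i] >= P_{S~mu}[i in S]. Illustrative only: the
    full reduction tilts mu by an external field on this subsample
    and corrects the resulting distribution.
    """
    picks = random.choices(range(len(q)), weights=q, k=t)
    return sorted(set(picks))

# Toy usage: overestimated marginals of a size-2 distribution on
# 6 elements (they sum to roughly k = 2).
q = [0.9, 0.9, 0.05, 0.05, 0.05, 0.05]
print(sparsified_ground_set(q, t=4))
```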
  2. We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel counting does not automatically translate to parallel sampling, as classic reductions between the two are inherently sequential. We show that a nearly quadratic parallel speedup over sequential sampling can be achieved for all the aforementioned distributions. If the distribution is supported on subsets of size $k$ of a ground set, we show how to approximately produce a sample in $\widetilde{O}(k^{\frac{1}{2}+c})$ time with polynomially many processors for any constant $c>0$. In the two special cases of symmetric determinantal point processes and planar perfect matchings, our bound improves to $\widetilde{O}(\sqrt{k})$ and we show how to sample exactly in these cases. As our main technical contribution, we fully characterize the limits of batching for the steps of sampling-to-counting reductions. We observe that only $O(1)$ steps can be batched together if we strive for exact sampling, even in the case of nonsymmetric determinantal point processes. However, we show that for approximate sampling, $\widetilde{\Omega}(k^{\frac{1}{2}-c})$ steps can be batched together, for any entropically independent distribution, which includes all mentioned classes of determinantal point processes. Entropic independence and related notions have been the source of breakthroughs in Markov chain analysis in recent years, so we expect our framework to prove useful for distributions beyond those studied in this work.
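As background for what "batching the steps" means, here is a hedged sketch of the classic sequential sampling-to-counting reduction whose steps the paper batches. The oracle `count_given` is an assumption of this sketch: for determinantal point processes it would be implemented by a determinant computation; the inherently sequential part is the element-by-element loop.

```python
import random
from itertools import combinations

def sample_via_counting(ground_set, k, count_given):
    """Classic sequential sampling-to-counting reduction (the baseline
    whose steps the paper shows can be batched). count_given(s_in,
    s_out) is an assumed oracle returning the total weight of size-k
    subsets containing s_in and avoiding s_out. One element is decided
    per step, via a ratio of two counts.
    """
    fixed_in, fixed_out = set(), set()
    for i in ground_set:
        if len(fixed_in) == k:
            break
        total = count_given(fixed_in, fixed_out)
        with_i = count_given(fixed_in | {i}, fixed_out)
        if random.random() < with_i / total:
            fixed_in.add(i)
        else:
            fixed_out.add(i)
    return fixed_in

# Toy oracle: brute-force counting of size-3 subsets of {0,...,5}
# under the forced inclusions/exclusions (uniform weights).
ground = range(6)
def brute_count(s_in, s_out):
    return sum(1 for c in combinations(ground, 3)
               if s_in <= set(c) and not (s_out & set(c)))
print(sample_via_counting(ground, 3, brute_count))
```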
  3. A bipartite graph $H = (V_1, V_2; E)$ with $\lvert V_1\rvert + \lvert V_2\rvert = n$ is semilinear if $V_i \subseteq \mathbb{R}^{d_i}$ for some $d_i$ and the edge relation $E$ consists of the pairs of points $(x_1, x_2) \in V_1 \times V_2$ satisfying a fixed Boolean combination of $s$ linear equalities and inequalities in $d_1 + d_2$ variables, for some $s$. We show that for a fixed $k$, the number of edges in a $K_{k,k}$-free semilinear $H$ is almost linear in $n$, namely $\lvert E\rvert = O_{s,k,\varepsilon}(n^{1+\varepsilon})$ for any $\varepsilon > 0$; and more generally, $\lvert E\rvert = O_{s,k,r,\varepsilon}(n^{r-1+\varepsilon})$ for a $K_{k,\dotsc,k}$-free semilinear $r$-partite $r$-uniform hypergraph. As an application, we obtain the following incidence bound: given $n_1$ points and $n_2$ open boxes with axis-parallel sides in $\mathbb{R}^d$ such that their incidence graph is $K_{k,k}$-free, there can be at most $O_{k,\varepsilon}(n^{1+\varepsilon})$ incidences. The same bound holds if, instead of boxes, one takes polytopes cut out by the translates of an arbitrary fixed finite set of half-spaces. We also obtain matching upper and (superlinear) lower bounds in the case of dyadic boxes on the plane, and point out some connections to the model-theoretic trichotomy in $o$-minimal structures (showing that the failure of an almost-linear bound for some definable graph allows one to recover the field operations from that graph in a definable manner).
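As a concrete instance of the definition (our illustration, built from the box application stated in the abstract): point-box incidence is semilinear, since membership of a point in an open axis-parallel box is a conjunction of $2d$ linear inequalities.

```latex
% A point p = (p_1, \dots, p_d) is incident to an open axis-parallel
% box b with corner coordinates (a_1, \dots, a_d) and (c_1, \dots, c_d)
% iff
\[
  (p, b) \in E
  \iff
  \bigwedge_{j=1}^{d} \bigl( a_j < p_j \;\wedge\; p_j < c_j \bigr),
\]
% a Boolean combination of s = 2d linear inequalities in the
% coordinates of p and b; the theorem then bounds the number of
% K_{k,k}-free incidences by O_{k,\varepsilon}(n^{1+\varepsilon}).
```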
  4. We develop a framework for sampling from discrete distributions $\mu$ on the hypercube $\{\pm 1\}^n$ by sampling from continuous distributions supported on $\mathbb{R}^n$ obtained by convolution with spherical Gaussians. We show that for well-studied families of discrete distributions $\mu$, convolving $\mu$ with Gaussians yields well-conditioned log-concave distributions, as long as the variance of the Gaussian is above an $O(1)$ threshold. We then reduce the task of sampling from $\mu$ to sampling from Gaussian-convolved distributions. Our reduction is based on a stochastic process widely studied under different names: backward diffusion in diffusion models, and stochastic localization. We discretize this process in a novel way that allows for high accuracy and parallelism. As our main application, we resolve open questions raised by Anari, Hu, Saberi, and Schild on the parallel sampling of distributions that admit parallel counting. We show that determinantal point processes can be sampled via RNC algorithms, that is, in time $\log(n)^{O(1)}$ using $n^{O(1)}$ processors. For a wider class of distributions, we show our framework yields quasi-RNC sampling, i.e., $\log(n)^{O(1)}$ time using $n^{O(\log n)}$ processors. This wider class includes non-symmetric determinantal point processes and random Eulerian tours in digraphs, the latter nearly resolving another open question raised by prior work. Of potentially independent interest, we introduce and study a notion of smoothness for discrete distributions that we call transport stability, which we use to control the propagation of error in our framework. Additionally, we connect transport stability to constructions of optimally mixing local random walks and concentration inequalities.
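To spell out the convolution step in symbols (our paraphrase of the construction described above):

```latex
% Convolving \mu on \{\pm 1\}^n with a spherical Gaussian of
% variance \sigma^2 yields the continuous density on \mathbb{R}^n
\[
  p_{\sigma}(y) \;\propto\; \sum_{x \in \{\pm 1\}^n} \mu(x)\,
  \exp\!\Bigl( -\frac{\lVert y - x \rVert^2}{2\sigma^2} \Bigr),
\]
% which, per the abstract, is well-conditioned log-concave once
% \sigma^2 exceeds an O(1) threshold; a sample y ~ p_\sigma is then
% carried back to a sample from \mu by the reverse diffusion process.
```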
  5. Chaudhuri, Kamalika (Ed.)
    We study the problem of reinforcement learning (RL) with low (policy) switching cost, a problem well-motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be kept low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of only $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S$ and $A$ denote the numbers of states and actions in an $H$-horizon episodic Markov decision process model with unknown transitions, and $T$ is the number of steps. As a byproduct of our new techniques, we also derive a reward-free exploration algorithm with a switching cost of $O(HSA)$. Furthermore, we prove a pair of information-theoretic lower bounds which say that (1) any no-regret algorithm must have a switching cost of $\Omega(HSA)$; and (2) any $\widetilde{O}(\sqrt{T})$-regret algorithm must incur a switching cost of $\Omega(HSA\log\log T)$. Both our algorithms are thus optimal in their switching costs.
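For readability, the bounds claimed above can be lined up side by side (a restatement of the abstract only, no new content):

```latex
\[
\begin{array}{ll}
\text{This paper:} &
  \text{regret } \widetilde{O}\bigl(\sqrt{H^4 S^2 A T}\bigr),\quad
  \text{switching cost } O(HSA \log\log T) \\[4pt]
\text{Prior best at } \widetilde{O}(\sqrt{T}) \text{ regret:} &
  \text{switching cost } O(H^2 S A \log T) \\[4pt]
\text{Lower bounds:} &
  \Omega(HSA) \text{ for any no-regret algorithm},\quad
  \Omega(HSA \log\log T) \text{ for } \widetilde{O}(\sqrt{T}) \text{ regret}
\end{array}
\]
```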