Title: Optimal Sublinear Sampling of Spanning Trees and Determinantal Point Processes via Average-Case Entropic Independence
We design fast algorithms for repeatedly sampling from strongly Rayleigh distributions, which include as special cases random spanning tree distributions and determinantal point processes. For a graph $G=(V, E)$, we show how to approximately sample uniformly random spanning trees from $G$ in $\widetilde{O}(\lvert V\rvert)$ time per sample after an initial $\widetilde{O}(\lvert E\rvert)$ time preprocessing (throughout, $\widetilde{O}(\cdot)$ hides polylogarithmic factors in $n$). This is the first runtime nearly linear in the output size, which is clearly optimal. For a determinantal point process on $k$-sized subsets of a ground set of $n$ elements, defined via an $n\times n$ kernel matrix, we show how to approximately sample in $\widetilde{O}(k^\omega)$ time after an initial $\widetilde{O}(nk^{\omega-1})$ time preprocessing, where $\omega<2.372864$ is the matrix multiplication exponent. The time to compute just the weight of the output set is already $\simeq k^\omega$, a natural barrier that suggests our runtime might be optimal for determinantal point processes as well. As a corollary, we even improve the state of the art for obtaining a single sample from a determinantal point process, from the prior runtime of $\widetilde{O}(\min\{nk^2, n^\omega\})$ to $\widetilde{O}(nk^{\omega-1})$.

In our main technical result, we achieve the optimal limit on domain sparsification for strongly Rayleigh distributions. In domain sparsification, sampling from a distribution $\mu$ on $\binom{[n]}{k}$ is reduced to sampling from related distributions on $\binom{[t]}{k}$ for $t\ll n$. We show that for strongly Rayleigh distributions, the domain size can be reduced to nearly linear in the output size, $t=\widetilde{O}(k)$, improving the state of the art from $t=\widetilde{O}(k^2)$ for general strongly Rayleigh distributions and the more specialized $t=\widetilde{O}(k^{1.5})$ for spanning tree distributions. Our reduction involves sampling from $\widetilde{O}(1)$ domain-sparsified distributions, all of which can be produced efficiently, assuming approximate overestimates for the marginals of $\mu$ are known and stored in a convenient data structure. Having access to marginals is the discrete analog of having access to the mean and covariance of a continuous distribution, or equivalently knowing "isotropy" for the distribution, the key behind optimal samplers in the continuous setting based on the famous Kannan-Lovász-Simonovits (KLS) conjecture. We view our result as analogous in spirit to the KLS conjecture and its consequences for sampling, but for discrete strongly Rayleigh measures.
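For context, the classic exact sampler that such results are benchmarked against is easy to state. Below is a minimal sketch of Wilson's algorithm, the standard loop-erased random walk sampler for uniform spanning trees. To be clear, this is background rather than the $\widetilde{O}(\lvert V\rvert)$-per-sample algorithm of the paper, and the function name and interface are ours.

```python
import random

def wilson_uniform_spanning_tree(adj, root=None):
    """Sample a uniformly random spanning tree of a connected graph
    using Wilson's algorithm (loop-erased random walks).

    adj: dict mapping each vertex to a list of its neighbors.
    Returns the tree as a list of (child, parent) edges.
    """
    vertices = list(adj)
    if root is None:
        root = vertices[0]
    in_tree = {root}
    parent = {}
    for start in vertices:
        # Random walk from `start` until the current tree is hit;
        # overwriting parent[u] at each exit implicitly loop-erases.
        u = start
        while u not in in_tree:
            parent[u] = random.choice(adj[u])
            u = parent[u]
        # Retrace the loop-erased path and attach it to the tree.
        u = start
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return [(u, parent[u]) for u in vertices if u != root]

# Toy usage: a uniform spanning tree of the 4-cycle.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(wilson_uniform_spanning_tree(cycle))
```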
Award ID(s):
2045354
PAR ID:
10393964
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)
Page Range / eLocation ID:
123 to 134
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Braverman, Mark (Ed.)
    We present a framework for speeding up the time it takes to sample from discrete distributions $\mu$ defined over subsets of size $k$ of a ground set of $n$ elements, in the regime where $k$ is much smaller than $n$. We show that if one has access to estimates of the marginals $\mathbb{P}_{S\sim \mu}[i\in S]$, then the task of sampling from $\mu$ can be reduced to sampling from related distributions $\nu$ supported on size-$k$ subsets of a ground set of only $n^{1-\alpha}\cdot \operatorname{poly}(k)$ elements. Here, $1/\alpha\in [1, k]$ is the parameter of entropic independence for $\mu$. Further, our algorithm only requires sparsified distributions $\nu$ that are obtained by applying a sparse (mostly $0$) external field to $\mu$, an operation that for many distributions $\mu$ of interest retains algorithmic tractability of sampling from $\nu$. This phenomenon, which we dub domain sparsification, allows us to pay a one-time cost of estimating the marginals of $\mu$, and in return reduce the amortized cost needed to produce many samples from the distribution $\mu$, as is often needed in upstream tasks such as counting and inference. For a wide range of distributions where $\alpha=\Omega(1)$, our result reduces the domain size, and as a corollary the cost-per-sample, by a $\operatorname{poly}(n)$ factor. Examples include monomers in a monomer-dimer system, non-symmetric determinantal point processes, and partition-constrained strongly Rayleigh measures. Our work significantly extends the reach of prior work of Anari and Dereziński, who obtained domain sparsification for distributions with a log-concave generating polynomial (corresponding to $\alpha=1$). As a corollary of our new analysis techniques, we also obtain a less stringent requirement on the accuracy of marginal estimates even for the case of log-concave polynomials; roughly speaking, we show that constant-factor approximation is enough for domain sparsification, improving over the $O(1/k)$ relative error established in prior work.
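To make the shape of this reduction concrete, here is a minimal illustrative sketch (ours, not pseudocode from the paper) of the first step: given overestimates of the marginals, a small candidate ground set is drawn by i.i.d. sampling proportional to those overestimates. The actual reduction then applies a sparse external field supported on this subsample and corrects the law of the output; that correction, which carries the analysis, is omitted here.

```python
import random

def sparsified_ground_set(q, t):
    """Draw a candidate sub-ground-set of ~t elements by sampling
    indices i.i.d. with probabilities proportional to marginal
    overestimates q[i] >= P_{S~mu}[i in S]. Illustrative only: the
    full reduction tilts mu by an external field on this subsample
    and corrects the resulting distribution.
    """
    picks = random.choices(range(len(q)), weights=q, k=t)
    return sorted(set(picks))

# Toy usage: overestimated marginals of a size-2 distribution on
# 6 elements (they sum to roughly k = 2).
q = [0.9, 0.9, 0.05, 0.05, 0.05, 0.05]
print(sparsified_ground_set(q, t=4))
```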
  2. We study the problem of parallelizing sampling from distributions related to determinants: symmetric, nonsymmetric, and partition-constrained determinantal point processes, as well as planar perfect matchings. For these distributions, the partition function, a.k.a. the count, can be obtained via matrix determinants, a highly parallelizable computation; Csanky proved it is in NC. However, parallel counting does not automatically translate to parallel sampling, as classic reductions between the two are inherently sequential. We show that a nearly quadratic parallel speedup over sequential sampling can be achieved for all the aforementioned distributions. If the distribution is supported on subsets of size $k$ of a ground set, we show how to approximately produce a sample in $\widetilde{O}(k^{\frac{1}{2}+c})$ time with polynomially many processors for any constant $c>0$. In the two special cases of symmetric determinantal point processes and planar perfect matchings, our bound improves to $\widetilde{O}(\sqrt{k})$ and we show how to sample exactly in these cases. As our main technical contribution, we fully characterize the limits of batching for the steps of sampling-to-counting reductions. We observe that only $O(1)$ steps can be batched together if we strive for exact sampling, even in the case of nonsymmetric determinantal point processes. However, we show that for approximate sampling, $\widetilde{\Omega}(k^{\frac{1}{2}-c})$ steps can be batched together, for any entropically independent distribution, which includes all mentioned classes of determinantal point processes. Entropic independence and related notions have been the source of breakthroughs in Markov chain analysis in recent years, so we expect our framework to prove useful for distributions beyond those studied in this work.
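As background for what "batching the steps" means, here is a hedged sketch of the classic sequential sampling-to-counting reduction whose steps the paper batches. The oracle `count_given` is an assumption of this sketch: for determinantal point processes it would be implemented by a determinant computation; the inherently sequential part is the element-by-element loop.

```python
import random
from itertools import combinations

def sample_via_counting(ground_set, k, count_given):
    """Classic sequential sampling-to-counting reduction (the baseline
    whose steps the paper shows can be batched). count_given(s_in,
    s_out) is an assumed oracle returning the total weight of size-k
    subsets containing s_in and avoiding s_out. One element is decided
    per step, via a ratio of two counts.
    """
    fixed_in, fixed_out = set(), set()
    for i in ground_set:
        if len(fixed_in) == k:
            break
        total = count_given(fixed_in, fixed_out)
        with_i = count_given(fixed_in | {i}, fixed_out)
        if random.random() < with_i / total:
            fixed_in.add(i)
        else:
            fixed_out.add(i)
    return fixed_in

# Toy oracle: brute-force counting of size-3 subsets of {0,...,5}
# under the forced inclusions/exclusions (uniform weights).
ground = range(6)
def brute_count(s_in, s_out):
    return sum(1 for c in combinations(ground, 3)
               if s_in <= set(c) and not (s_out & set(c)))
print(sample_via_counting(ground, 3, brute_count))
```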
  3. A bipartite graph $H = (V_1, V_2; E)$ with $\lvert V_1\rvert + \lvert V_2\rvert = n$ is semilinear if $V_i \subseteq \mathbb{R}^{d_i}$ for some $d_i$ and the edge relation $E$ consists of the pairs of points $(x_1, x_2) \in V_1 \times V_2$ satisfying a fixed Boolean combination of $s$ linear equalities and inequalities in $d_1 + d_2$ variables, for some $s$. We show that for a fixed $k$, the number of edges in a $K_{k,k}$-free semilinear $H$ is almost linear in $n$, namely $\lvert E\rvert = O_{s,k,\varepsilon}(n^{1+\varepsilon})$ for any $\varepsilon > 0$; and more generally, $\lvert E\rvert = O_{s,k,r,\varepsilon}(n^{r-1+\varepsilon})$ for a $K_{k,\dotsc,k}$-free semilinear $r$-partite $r$-uniform hypergraph. As an application, we obtain the following incidence bound: given $n_1$ points and $n_2$ open boxes with axis-parallel sides in $\mathbb{R}^d$ such that their incidence graph is $K_{k,k}$-free, there can be at most $O_{k,\varepsilon}(n^{1+\varepsilon})$ incidences. The same bound holds if, instead of boxes, one takes polytopes cut out by the translates of an arbitrary fixed finite set of half-spaces. We also obtain matching upper and (superlinear) lower bounds in the case of dyadic boxes on the plane, and point out some connections to the model-theoretic trichotomy in $o$-minimal structures (showing that the failure of an almost-linear bound for some definable graph allows one to recover the field operations from that graph in a definable manner).
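As a concrete instance of the definition (our illustration, built from the box application stated in the abstract): point-box incidence is semilinear, since membership of a point in an open axis-parallel box is a conjunction of $2d$ linear inequalities.

```latex
% A point p = (p_1, \dots, p_d) is incident to an open axis-parallel
% box b with corner coordinates (a_1, \dots, a_d) and (c_1, \dots, c_d)
% iff
\[
  (p, b) \in E
  \iff
  \bigwedge_{j=1}^{d} \bigl( a_j < p_j \;\wedge\; p_j < c_j \bigr),
\]
% a Boolean combination of s = 2d linear inequalities in the
% coordinates of p and b; the theorem then bounds the number of
% K_{k,k}-free incidences by O_{k,\varepsilon}(n^{1+\varepsilon}).
```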
  4. We develop a framework for sampling from discrete distributions $\mu$ on the hypercube $\{\pm 1\}^n$ by sampling from continuous distributions supported on $\mathbb{R}^n$ obtained by convolution with spherical Gaussians. We show that for well-studied families of discrete distributions $\mu$, convolving $\mu$ with Gaussians yields well-conditioned log-concave distributions, as long as the variance of the Gaussian is above an $O(1)$ threshold. We then reduce the task of sampling from $\mu$ to sampling from Gaussian-convolved distributions. Our reduction is based on a stochastic process widely studied under different names: backward diffusion in diffusion models, and stochastic localization. We discretize this process in a novel way that allows for high accuracy and parallelism. As our main application, we resolve open questions raised by Anari, Hu, Saberi, and Schild on the parallel sampling of distributions that admit parallel counting. We show that determinantal point processes can be sampled via RNC algorithms, that is, in time $\log(n)^{O(1)}$ using $n^{O(1)}$ processors. For a wider class of distributions, we show our framework yields quasi-RNC sampling, i.e., $\log(n)^{O(1)}$ time using $n^{O(\log n)}$ processors. This wider class includes non-symmetric determinantal point processes and random Eulerian tours in digraphs, the latter nearly resolving another open question raised by prior work. Of potentially independent interest, we introduce and study a notion of smoothness for discrete distributions that we call transport stability, which we use to control the propagation of error in our framework. Additionally, we connect transport stability to constructions of optimally mixing local random walks and concentration inequalities.
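To spell out the convolution step in symbols (our paraphrase of the construction described above):

```latex
% Convolving \mu on \{\pm 1\}^n with a spherical Gaussian of
% variance \sigma^2 yields the continuous density on \mathbb{R}^n
\[
  p_{\sigma}(y) \;\propto\; \sum_{x \in \{\pm 1\}^n} \mu(x)\,
  \exp\!\Bigl( -\frac{\lVert y - x \rVert^2}{2\sigma^2} \Bigr),
\]
% which, per the abstract, is well-conditioned log-concave once
% \sigma^2 exceeds an O(1) threshold; a sample y ~ p_\sigma is then
% carried back to a sample from \mu by the reverse diffusion process.
```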
  5. Chaudhuri, Kamalika (Ed.)
    We study the problem of reinforcement learning (RL) with low (policy) switching cost, a problem well-motivated by real-life RL applications in which deployments of new policies are costly and the number of policy updates must be kept low. In this paper, we propose a new algorithm based on stage-wise exploration and adaptive policy elimination that achieves a regret of $\widetilde{O}(\sqrt{H^4S^2AT})$ while requiring a switching cost of only $O(HSA \log\log T)$. This is an exponential improvement over the best-known switching cost $O(H^2SA\log T)$ among existing methods with $\widetilde{O}(\mathrm{poly}(H,S,A)\sqrt{T})$ regret. In the above, $S$ and $A$ denote the numbers of states and actions in an $H$-horizon episodic Markov decision process model with unknown transitions, and $T$ is the number of steps. As a byproduct of our new techniques, we also derive a reward-free exploration algorithm with a switching cost of $O(HSA)$. Furthermore, we prove a pair of information-theoretic lower bounds which say that (1) any no-regret algorithm must have a switching cost of $\Omega(HSA)$; and (2) any $\widetilde{O}(\sqrt{T})$-regret algorithm must incur a switching cost of $\Omega(HSA\log\log T)$. Both our algorithms are thus optimal in their switching costs.
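For readability, the bounds claimed above can be lined up side by side (a restatement of the abstract only, no new content):

```latex
\[
\begin{array}{ll}
\text{This paper:} &
  \text{regret } \widetilde{O}\bigl(\sqrt{H^4 S^2 A T}\bigr),\quad
  \text{switching cost } O(HSA \log\log T) \\[4pt]
\text{Prior best at } \widetilde{O}(\sqrt{T}) \text{ regret:} &
  \text{switching cost } O(H^2 S A \log T) \\[4pt]
\text{Lower bounds:} &
  \Omega(HSA) \text{ for any no-regret algorithm},\quad
  \Omega(HSA \log\log T) \text{ for } \widetilde{O}(\sqrt{T}) \text{ regret}
\end{array}
\]
```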