An $$\ell_p$$ oblivious subspace embedding is a distribution over $$r \times n$$ matrices $$\Pi$$ such that for any fixed $$n \times d$$ matrix $$A$$, $$\Pr_{\Pi}[\text{for all } x,\ \Vert Ax\Vert_p \le \Vert \Pi A x\Vert_p \le \kappa \Vert Ax\Vert_p] \ge 9/10,$$ where $$r$$ is the dimension of the embedding, $$\kappa$$ is the distortion of the embedding, and, for an $$n$$-dimensional vector $$y$$, $$\Vert y\Vert_p = (\sum_{i=1}^n |y_i|^p)^{1/p}$$ is the $$\ell_p$$-norm. Another important property is the sparsity of $$\Pi$$, that is, the maximum number of non-zero entries per column, as this determines the running time of computing $$\Pi A$$. While for $$p = 2$$ there are nearly optimal tradeoffs in terms of the dimension, distortion, and sparsity, for the important case of $$1 \le p < 2$$ much less was known. In this article, we obtain nearly optimal tradeoffs for $$\ell_1$$ oblivious subspace embeddings, as well as new tradeoffs for $$1 < p < 2$$. Our main results are as follows: (1) We show that for every $$1 \le p < 2$$, any oblivious subspace embedding with dimension $$r$$ has distortion $$\kappa = \Omega\left(\frac{1}{\left(\frac{1}{d}\right)^{1/p} \log^{2/p} r + \left(\frac{r}{n}\right)^{1/p - 1/2}}\right).$$ When $$r = \operatorname{poly}(d) \ll n$$, as in applications, this gives a $$\kappa = \Omega(d^{1/p}\log^{-2/p} d)$$ lower bound, and shows that the oblivious subspace embedding of Sohler and Woodruff (STOC, 2011) for $$p = 1$$ is optimal up to $$\operatorname{poly}(\log d)$$ factors. (2) We give sparse oblivious subspace embeddings for every $$1 \le p < 2$$. Importantly, for $$p = 1$$ we achieve $$r = O(d \log d)$$, $$\kappa = O(d \log d)$$, and $$s = O(\log d)$$ non-zero entries per column.
The best previous construction with $$s \le \operatorname{poly}(\log d)$$ is due to Woodruff and Zhang (COLT, 2013), giving $$\kappa = \Omega(d^2 \operatorname{poly}(\log d))$$ or $$\kappa = \Omega(d^{3/2} \sqrt{\log n} \cdot \operatorname{poly}(\log d))$$ and $$r \ge d \cdot \operatorname{poly}(\log d)$$; in contrast, our $$r = O(d \log d)$$ and $$\kappa = O(d \log d)$$ are optimal up to $$\operatorname{poly}(\log d)$$ factors even for dense matrices. We also give (1) $$\ell_p$$ oblivious subspace embeddings with an expected $$1+\varepsilon$$ non-zero entries per column for arbitrarily small $$\varepsilon > 0$$, and (2) the first oblivious subspace embeddings for $$1 \le p < 2$$ with $$O(1)$$ distortion and dimension independent of $$n$$. Oblivious subspace embeddings are crucial for distributed and streaming environments, as well as for entrywise $$\ell_p$$ low-rank approximation. Our results give improved algorithms for these applications.
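As a toy numerical companion to the sparsity discussion above, the following sketch builds a CountSketch-style matrix with $$s$$ non-zero entries per column and compares $$\Vert Ax\Vert_1$$ with $$\Vert \Pi A x\Vert_1$$ on random data. This only illustrates the objects being traded off (dimension, distortion, sparsity); it is not the embedding constructed in the article, and all sizes and the $$1/s$$ scaling are arbitrary choices.

```python
import random

random.seed(0)

n, d, r, s = 2000, 5, 200, 5  # toy sizes; s non-zeros per column of Pi

# Sparse r x n sketch: each column of Pi gets s random rows, each holding
# a random sign scaled by 1/s.  A CountSketch-style stand-in, NOT the
# paper's construction.
cols = []
for j in range(n):
    rows = random.sample(range(r), s)
    cols.append([(i, random.choice((-1.0, 1.0)) / s) for i in rows])

# Random tall matrix A (n x d) and a test vector x.
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
x = [random.gauss(0, 1) for _ in range(d)]

# y = A x (an n-vector), then z = Pi y (an r-vector) via the sparse columns.
y = [sum(A[i][k] * x[k] for k in range(d)) for i in range(n)]
z = [0.0] * r
for j in range(n):
    for (i, sgn) in cols[j]:
        z[i] += sgn * y[j]

l1_true = sum(abs(v) for v in y)
l1_sketch = sum(abs(v) for v in z)
distortion = l1_true / l1_sketch
print(f"||Ax||_1 = {l1_true:.1f}, ||Pi A x||_1 = {l1_sketch:.1f}, ratio = {distortion:.2f}")
```

Note that computing $$\Pi A$$ touches each column of $$\Pi$$ only $$s$$ times, which is why column sparsity governs the embedding time.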
This content will become publicly available on December 31, 2026
Speeding up Langevin Dynamics by Mixing
We study an overdamped Langevin equation on the $$d$$-dimensional torus with stationary distribution proportional to $$p = e^{-U / \kappa}$$. When $$U$$ has multiple wells, the mixing time of the associated process is exponentially large (of size $$e^{O(1/\kappa)}$$). We add a drift to the Langevin dynamics (without changing the stationary distribution) and obtain quantitative estimates on the mixing time. Our main result shows that the mixing time of the Langevin system can be made arbitrarily small by adding a drift that is sufficiently mixing. We provide one construction of a mixing drift, and our main result can be applied by using this drift with a large amplitude. For numerical purposes, it is useful to keep the size of the imposed drift small, and we show that the smallest allowable amplitude ensures that the mixing time is $$O( d/\kappa^2)$$, which is an order of magnitude smaller than $$e^{O(1/\kappa)}$$.
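A minimal simulation of the idea, under an illustrative potential and amplitude rather than the paper's construction: the perpendicular gradient $$J\nabla U = (-U_y, U_x)$$ is divergence-free and orthogonal to $$\nabla U$$, so adding it as a drift leaves the stationary density $$\propto e^{-U/\kappa}$$ unchanged while stirring the dynamics.

```python
import math, random

random.seed(1)

kappa = 0.5   # temperature; stationary density proportional to exp(-U/kappa)
A_amp = 5.0   # amplitude of the added incompressible drift (illustrative)
dt, steps = 1e-3, 20000

def gradU(x, y):
    # U(x, y) = cos(2x) + cos(y): a multi-well potential on the 2-torus.
    return (-2.0 * math.sin(2.0 * x), -math.sin(y))

def step(x, y):
    ux, uy = gradU(x, y)
    # Perpendicular gradient J grad U = (-U_y, U_x): divergence-free and
    # orthogonal to grad U, so exp(-U/kappa) stays stationary.
    bx, by = -A_amp * uy, A_amp * ux
    noise = math.sqrt(2.0 * kappa * dt)
    x = (x + (-ux + bx) * dt + noise * random.gauss(0, 1)) % (2 * math.pi)
    y = (y + (-uy + by) * dt + noise * random.gauss(0, 1)) % (2 * math.pi)
    return x, y

x, y = 0.1, 0.1
traj = []
for _ in range(steps):
    x, y = step(x, y)
    traj.append((x, y))
print(f"final position: ({x:.3f}, {y:.3f})")
```

With `A_amp = 0` this reduces to plain Euler-Maruyama for the overdamped Langevin equation; increasing the amplitude is the knob the theorem analyzes.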
- PAR ID:
- 10656767
- Publisher / Repository:
- SIAM
- Date Published:
- Journal Name:
- Multiscale Modeling & Simulation
- Volume:
- 23
- Issue:
- 4
- ISSN:
- 1540-3459
- Page Range / eLocation ID:
- 1696 to 1743
- Subject(s) / Keyword(s):
- mixing Langevin Monte Carlo enhanced dissipation
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
We study the dissipation enhancement by cellular flows. Previous work by Iyer, Xu, and Zlatoš produces a family of cellular flows that can enhance dissipation by an arbitrarily large amount. We improve this result by providing quantitative bounds on the dissipation enhancement in terms of the flow amplitude, cell size, and diffusivity. Explicitly, we show that the *mixing time* is bounded by the exit time from one cell when the flow amplitude is large enough, and by the reciprocal of the effective diffusivity when the flow amplitude is small. This agrees with the optimal heuristics. We also prove a general result relating the *dissipation time* of incompressible flows to the *mixing time*. The main idea behind the proof is to study the dynamics probabilistically and construct a successful coupling.
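A rough Monte Carlo sketch of the exit-time quantity mentioned above, using the classical cellular stream function $$H = A \sin x \sin y$$; the amplitude, diffusivity, step size, and sample count are illustrative only, not values from the paper.

```python
import math, random

random.seed(2)

A_amp = 20.0     # flow amplitude (illustrative)
nu = 0.05        # diffusivity (illustrative)
dt = 5e-4
max_steps = 100000

def velocity(x, y):
    # Cellular flow with stream function H = A sin(x) sin(y);
    # u = (H_y, -H_x) is incompressible, and the cell boundary is a streamline.
    return (A_amp * math.sin(x) * math.cos(y),
            -A_amp * math.cos(x) * math.sin(y))

def exit_time():
    # Diffusing particle started at the centre of the cell (0, pi)^2;
    # return the first time it leaves the cell (capped at max_steps).
    x, y = math.pi / 2, math.pi / 2
    noise = math.sqrt(2.0 * nu * dt)
    for k in range(max_steps):
        ux, uy = velocity(x, y)
        x += ux * dt + noise * random.gauss(0, 1)
        y += uy * dt + noise * random.gauss(0, 1)
        if not (0 < x < math.pi and 0 < y < math.pi):
            return (k + 1) * dt
    return max_steps * dt

times = [exit_time() for _ in range(10)]
mean_exit = sum(times) / len(times)
print(f"mean exit time from one cell: {mean_exit:.3f}")
```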
-
In this paper, we examine the computational complexity of sampling from a Bayesian posterior (or pseudo-posterior) using the Metropolis-adjusted Langevin algorithm (MALA). MALA first employs a discrete-time Langevin SDE to propose a new state, and then adjusts the proposed state using Metropolis-Hastings rejection. Most existing theoretical analyses of MALA rely on the smoothness and strong log-concavity of the target distribution, which are often lacking in practical Bayesian problems. Our analysis hinges on statistical large-sample theory, which constrains the deviation of the Bayesian posterior from being smooth and log-concave in a very specific way. In particular, we introduce a new technique for bounding the mixing time of a Markov chain with a continuous state space via the $$s$$-conductance profile, offering improvements over existing techniques in several aspects. By employing this new technique, we establish the optimal parameter dimension dependence of $$d^{1/3}$$ and condition number dependence of $$\kappa$$ in the non-asymptotic mixing time upper bound for MALA after the burn-in period, under a standard Bayesian setting where the target posterior distribution is close to a $$d$$-dimensional Gaussian distribution with a covariance matrix having condition number $$\kappa$$. We also prove a matching mixing time lower bound for sampling from a multivariate Gaussian via MALA to complement the upper bound.
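For reference, a bare-bones MALA sampler on a toy 2-d Gaussian target; the step size and target are illustrative stand-ins, while the abstract above concerns far more general Bayesian posteriors.

```python
import math, random

random.seed(3)

def log_pi(x):
    # Toy target: standard 2-d Gaussian, up to an additive constant.
    return -0.5 * sum(v * v for v in x)

def grad_log_pi(x):
    return [-v for v in x]

def mala_step(x, h):
    # Langevin proposal: y = x + h * grad log pi(x) + sqrt(2h) * N(0, I).
    g = grad_log_pi(x)
    y = [x[i] + h * g[i] + math.sqrt(2 * h) * random.gauss(0, 1)
         for i in range(len(x))]

    def log_q(a, b):
        # Log density (up to a constant) of proposing a from b.
        gb = grad_log_pi(b)
        return -sum((a[i] - b[i] - h * gb[i]) ** 2
                    for i in range(len(a))) / (4 * h)

    # Metropolis-Hastings correction makes pi exactly stationary.
    log_alpha = log_pi(y) + log_q(x, y) - log_pi(x) - log_q(y, x)
    if math.log(random.random()) < log_alpha:
        return y, True
    return x, False

x, accepted = [3.0, -3.0], 0
samples = []
for _ in range(5000):
    x, ok = mala_step(x, 0.2)
    accepted += ok
    samples.append(x)
print(f"acceptance rate: {accepted / 5000:.2f}")
```

The burn-in period mentioned in the abstract corresponds to discarding the initial portion of `samples` before the chain forgets its start at `[3.0, -3.0]`.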
-
We present a weighted approach to compute a maximum cardinality matching in an arbitrary bipartite graph. Our main result is a new algorithm that takes as input a weighted bipartite graph $$G(A \cup B, E)$$ with edge weights of 0 or 1. Let $$w \le n$$ be an upper bound on the weight of any matching in $$G$$. Consider the subgraph induced by all the edges of $$G$$ with weight 0. Suppose every connected component in this subgraph has $$O(r)$$ vertices and $$O(mr/n)$$ edges. We present an algorithm to compute a maximum cardinality matching in $$G$$ in $$\tilde{O}(m(\sqrt{w} + \sqrt{r} + wr/n))$$ time. When all the edge weights are 1 (symmetrically, when all weights are 0), our algorithm is identical to the well-known Hopcroft-Karp (HK) algorithm, which runs in $$O(m\sqrt{n})$$ time. However, if we can carefully assign weights of 0 and 1 to the edges such that both $$w$$ and $$r$$ are sub-linear in $$n$$ and $$wr = O(n^{\gamma})$$ for $$\gamma < 3/2$$, then we can compute a maximum cardinality matching in $$G$$ in $$o(m\sqrt{n})$$ time. Using our algorithm, we obtain a new $$\tilde{O}(n^{4/3}/\varepsilon^4)$$ time algorithm to compute an $$\varepsilon$$-approximate bottleneck matching of $$A, B \subset \mathbb{R}^2$$, and a $$\frac{1}{\varepsilon^{O(d)}} n^{1+\frac{d-1}{2d-1}} \operatorname{poly}\log n$$ time algorithm for computing an $$\varepsilon$$-approximate bottleneck matching in $$d$$ dimensions. All previous algorithms take $$\Omega(n^{3/2})$$ time. Given any graph $$G(A \cup B, E)$$ that has an easily computable balanced vertex separator of size $$|V'|^{\delta}$$ for every subgraph $$G'(V', E')$$, for $$\delta \in [1/2, 1)$$, we can apply our algorithm to compute a maximum matching in $$\tilde{O}(m n^{\delta/(1+\delta)})$$ time, improving upon the $$O(m\sqrt{n})$$ time taken by the HK algorithm.
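For contrast with the weighted algorithm described above, here is the standard unweighted augmenting-path routine (Kuhn's algorithm) that Hopcroft-Karp refines; it is included only as a baseline sketch, not the paper's method.

```python
def max_bipartite_matching(adj, n_left, n_right):
    """Maximum matching via augmenting paths (Kuhn's algorithm).

    adj[u] lists the right-vertices adjacent to left-vertex u.
    Runs in O(V * E); Hopcroft-Karp improves this to O(E sqrt(V)) by
    augmenting along many shortest paths per phase.
    """
    match_right = [-1] * n_right  # match_right[v] = left partner of v, or -1

    def try_augment(u, seen):
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                # v is free, or its current partner can be re-matched elsewhere.
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matching = 0
    for u in range(n_left):
        if try_augment(u, [False] * n_right):
            matching += 1
    return matching

# 4 + 4 vertex example in which a perfect matching exists.
adj = [[0, 1], [0], [2, 3], [2]]
print(max_bipartite_matching(adj, 4, 4))  # -> 4
```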
-
We introduce a notion called entropic independence that is an entropic analog of spectral notions of high-dimensional expansion. Informally, entropic independence of a background distribution $$\mu$$ on $$k$$-sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $$S$$, the relative entropy of a single element of $$S$$ drawn uniformly at random carries at most an $$O(1/k)$$ fraction of the relative entropy of $$S$$. Entropic independence is the analog of the notion of spectral independence, if one replaces variance by entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponential-sized state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified log-Sobolev inequalities, for down-up random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence. We furthermore extend our theory to the case where spectral independence does not hold under arbitrary external fields. To do this, we introduce a framework for obtaining tight mixing time bounds for Markov chains based on what we call restricted modified log-Sobolev inequalities, which guarantee entropy contraction not for all distributions, but for those in a sufficiently large neighborhood of the stationary distribution. To derive our results, we relate entropic independence to properties of polynomials: $$\mu$$ is entropically independent exactly when a transformed version of the generating polynomial of $$\mu$$ is upper bounded by its linear tangent; this property is implied by concavity of the said transformation, which was shown by prior work to be locally equivalent to spectral independence.
We apply our results to obtain (1) tight modified log-Sobolev inequalities and mixing times for multi-step down-up walks on fractionally log-concave distributions, (2) the tight mixing time of $$O(n\log n)$$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $$1$$, improving upon the prior quadratic dependence on $$n$$, and (3) nearly-linear time $$\widetilde O_{\delta}(n)$$ samplers for the hardcore and Ising models on $$n$$-node graphs that have $$\delta$$-relative gap to the tree-uniqueness threshold. In the last application, our bound on the running time does not depend on the maximum degree $$\Delta$$ of the graph, and is therefore optimal even for high-degree graphs, and in fact, is sublinear in the size of the graph for high-degree graphs.
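A minimal Glauber dynamics loop for an Ising model, for readers unfamiliar with the chain analyzed in application (2); the cycle graph, temperature, and step budget here are illustrative only and carry none of the abstract's guarantees.

```python
import math, random

random.seed(4)

def glauber_step(spins, adj, beta):
    # Resample one uniformly random site from its conditional distribution
    # given its neighbours: P(+1 | rest) = 1 / (1 + exp(-2 * beta * field)).
    i = random.randrange(len(spins))
    field = sum(spins[j] for j in adj[i])
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * beta * field))
    spins[i] = 1 if random.random() < p_plus else -1

# Ising model on a cycle of n sites at inverse temperature beta.
n, beta = 30, 0.3
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
spins = [random.choice((-1, 1)) for _ in range(n)]
for _ in range(20 * n * round(math.log(n))):  # an O(n log n)-style step budget
    glauber_step(spins, adj, beta)
print(f"magnetization: {sum(spins) / n:+.2f}")
```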
