Analysis of stochastic Lanczos quadrature for spectrum approximation
The cumulative empirical spectral measure (CESM) $\Phi[\mathbf{A}] : \mathbb{R} \to [0,1]$ of an $n\times n$ symmetric matrix $\mathbf{A}$ is defined as the fraction of eigenvalues of $\mathbf{A}$ less than or equal to a given threshold, i.e., $\Phi[\mathbf{A}](x) := \sum_{i=1}^{n} \frac{1}{n} {\large\unicode{x1D7D9}}[ \lambda_i[\mathbf{A}]\leq x]$. Spectral sums $\operatorname{tr}(f[\mathbf{A}])$ can be computed as the Riemann–Stieltjes integral of $f$ against $\Phi[\mathbf{A}]$, so the task of estimating the CESM arises frequently in applications, including machine learning. We present an error analysis for stochastic Lanczos quadrature (SLQ). We show that SLQ obtains an approximation to the CESM within a Wasserstein distance of $t \: | \lambda_{\text{max}}[\mathbf{A}] - \lambda_{\text{min}}[\mathbf{A}] |$ with probability at least $1-\eta$, by applying the Lanczos algorithm for $\lceil 12 t^{-1} + \frac{1}{2} \rceil$ iterations to $\lceil 4 ( n+2 )^{-1}t^{-2} \ln(2n\eta^{-1}) \rceil$ vectors sampled independently and uniformly from the unit sphere. We additionally provide (matrix-dependent) a posteriori error bounds for the Wasserstein and Kolmogorov–Smirnov distances between the output of this algorithm and the true CESM. The quality of our bounds is demonstrated using numerical experiments.
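The procedure described in the abstract can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function names (`slq_parameters`, `lanczos_tridiagonal`, `slq_cesm`) are hypothetical, the use of full reorthogonalization and the breakdown tolerance are assumptions made here for numerical stability, and the quadrature weights are taken as the squared first components of the eigenvectors of the Lanczos tridiagonal matrix, which is the standard Gaussian quadrature construction.

```python
import math
import numpy as np

def slq_parameters(n, t, eta):
    """Iteration count and sample count quoted in the abstract."""
    k = math.ceil(12 / t + 0.5)
    n_v = math.ceil(4 / ((n + 2) * t**2) * math.log(2 * n / eta))
    return k, n_v

def lanczos_tridiagonal(A, v, k):
    """Run up to k Lanczos steps on symmetric A from start vector v.

    Uses full reorthogonalization (an assumption of this sketch) and
    returns the symmetric tridiagonal matrix T.
    """
    n = v.shape[0]
    k = min(k, n)
    Q = np.zeros((n, k))
    alpha, beta = [], []
    q = v / np.linalg.norm(v)
    for j in range(k):
        Q[:, j] = q
        w = A @ q
        alpha.append(q @ w)
        # Orthogonalize against all Lanczos vectors computed so far.
        w = w - Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)
        b = np.linalg.norm(w)
        if j == k - 1 or b < 1e-12:  # last step, or invariant subspace found
            break
        beta.append(b)
        q = w / b
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

def slq_cesm(A, k, n_v, rng):
    """Return jump locations x and cumulative values of the SLQ estimate."""
    n = A.shape[0]
    nodes, weights = [], []
    for _ in range(n_v):
        # A normalized Gaussian vector is uniform on the unit sphere;
        # the normalization happens inside lanczos_tridiagonal.
        v = rng.standard_normal(n)
        T = lanczos_tridiagonal(A, v, k)
        theta, S = np.linalg.eigh(T)
        nodes.append(theta)
        weights.append(S[0, :] ** 2 / n_v)  # quadrature weights, averaged
    nodes = np.concatenate(nodes)
    weights = np.concatenate(weights)
    order = np.argsort(nodes)
    return nodes[order], np.cumsum(weights[order])
```

For example, `slq_parameters(1000, 0.1, 0.01)` evaluates the two ceilings in the abstract for $n = 1000$, $t = 0.1$, $\eta = 0.01$, and `slq_cesm(A, k, n_v, np.random.default_rng(0))` returns a step function whose final value is 1 up to rounding.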
NSF-PAR ID:
10324443
Journal Name:
Proceedings of the 38th International Conference on Machine Learning
Volume:
139
National Science Foundation
##### More Like this
1. The classic graphical Cheeger inequalities state that if $M$ is an $n\times n$ symmetric doubly stochastic matrix, then $\frac{1-\lambda_{2}(M)}{2}\leq\phi(M)\leq\sqrt{2\cdot(1-\lambda_{2}(M))}$ where $\phi(M)=\min_{S\subseteq[n],|S|\leq n/2}\left(\frac{1}{|S|}\sum_{i\in S,j\not\in S}M_{i,j}\right)$ is the edge expansion of $M$, and $\lambda_{2}(M)$ is the second largest eigenvalue of $M$. We study the relationship between $\phi(A)$ and the spectral gap $1-\operatorname{Re}\lambda_{2}(A)$ for any doubly stochastic matrix $A$ (not necessarily symmetric), where $\lambda_{2}(A)$ is a nontrivial eigenvalue of $A$ with maximum real part. Fiedler showed that the upper bound on $\phi(A)$ is unaffected, i.e., $\phi(A)\leq\sqrt{2\cdot(1-\operatorname{Re}\lambda_{2}(A))}$. With regard to the lower bound on $\phi(A)$, there are known constructions with $\phi(A)\in\Theta\left(\frac{1-\operatorname{Re}\lambda_{2}(A)}{\log n}\right)$, indicating that at least a mild dependence on $n$ is necessary to lower bound $\phi(A)$. In our first result, we provide an exponentially better construction of $n\times n$ doubly stochastic matrices $A_{n}$, for which $\phi(A_{n})\leq\frac{1-\operatorname{Re}\lambda_{2}(A_{n})}{\sqrt{n}}$. In fact, all nontrivial eigenvalues of our matrices are $0$, even though the matrices are highly nonexpanding. We further show that this bound is in the correct range (up to the exponent of $n$), by showing that for any doubly stochastic matrix $A$, $\phi(A)\geq\frac{1-\operatorname{Re}\lambda_{2}(A)}{35\cdot n}$. As a consequence, unlike the symmetric case, there is a (necessary) loss of a ...
2. Consider the linear transport equation in 1D under an external confining potential $\Phi$:

For $\Phi = \frac{x^2}{2} + \frac{\varepsilon x^4}{2}$ (with $\varepsilon > 0$ small), we prove phase mixing and quantitative decay estimates for $\partial_t \varphi := - \Delta^{-1} \int_{\mathbb{R}} \partial_t f \, \mathrm{d} v$, with an inverse polynomial decay rate $O(\langle t \rangle^{-2})$. In the proof, we develop a commuting vector field approach, suitably adapted to this setting. We will explain why we hope this is relevant for the nonlinear stability of the zero solution for the Vlasov–Poisson system in $1$D under the external potential $\Phi$.

3. A long-standing conjecture by Kotzig, Ringel, and Rosa states that every tree admits a graceful labeling. That is, for any tree $T$ with $n$ edges, it is conjectured that there exists a labeling $f\colon V(T) \to \{0,1,\ldots,n\}$ such that the set of induced edge labels $\bigl\{ |f(u)-f(v)| : \{u,v\}\in E(T) \bigr\}$ is exactly $\{1,2,\ldots,n\}$. We extend this concept to allow for multigraphs with edge multiplicity at most $2$. A 2-fold graceful labeling of a graph (or multigraph) $G$ with $n$ edges is a one-to-one function $f\colon V(G) \to \{0,1,\ldots,n\}$ such that the multiset of induced edge labels consists of two copies of each element in $\bigl\{ 1,2,\ldots, \lfloor n/2 \rfloor \bigr\}$ and, if $n$ is odd, one copy of $\lceil n/2 \rceil$. When $n$ is even, this concept is similar to that of 2-equitable labelings, which were introduced by Bloom and have been studied for several classes of graphs. We show that caterpillars, cycles of length $n \not\equiv 1 \pmod{4}$, and complete bipartite graphs admit 2-fold graceful labelings. We also show that under certain conditions, the join of a tree and an empty graph (i.e., a graph with vertices but no edges) is $2$-fold graceful.
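The two definitions in this abstract are easy to check mechanically. The following Python sketch tests whether a given labeling is graceful or 2-fold graceful; the function names and the edge-list representation (a list of vertex pairs, with a dictionary `f` mapping vertices to labels) are choices made for illustration and do not come from the paper.

```python
from collections import Counter

def induced_edge_labels(edges, f):
    """Multiset of |f(u) - f(v)| over all edges (repeated edges allowed)."""
    return Counter(abs(f[u] - f[v]) for u, v in edges)

def _valid_vertex_labels(edges, f):
    """Labels must be distinct and lie in {0, 1, ..., n}, n = number of edges."""
    n = len(edges)
    values = list(f.values())
    return len(set(values)) == len(values) and all(0 <= x <= n for x in values)

def is_graceful(edges, f):
    """Induced edge labels are exactly {1, ..., n}, each appearing once."""
    n = len(edges)
    target = Counter(range(1, n + 1))
    return _valid_vertex_labels(edges, f) and induced_edge_labels(edges, f) == target

def is_2fold_graceful(edges, f):
    """Two copies of each of 1..floor(n/2), plus one copy of ceil(n/2) if n is odd."""
    n = len(edges)
    target = Counter({d: 2 for d in range(1, n // 2 + 1)})
    if n % 2 == 1:
        target[(n + 1) // 2] += 1
    return _valid_vertex_labels(edges, f) and induced_edge_labels(edges, f) == target

# A path with 3 edges and a cycle with 4 edges (4 is not congruent to 1 mod 4).
path = [("a", "b"), ("b", "c"), ("c", "d")]
cycle = path + [("d", "a")]
graceful_path = {"a": 0, "b": 3, "c": 1, "d": 2}   # differences 3, 2, 1
twofold_path = {"a": 1, "b": 0, "c": 2, "d": 3}    # differences 1, 2, 1
twofold_cycle = {"a": 0, "b": 1, "c": 3, "d": 2}   # differences 1, 2, 1, 2
```

With these labelings, `is_graceful(path, graceful_path)`, `is_2fold_graceful(path, twofold_path)`, and `is_2fold_graceful(cycle, twofold_cycle)` all hold, while `graceful_path` is not 2-fold graceful because the difference 3 is not permitted.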
4. We continue the program of proving circuit lower bounds via circuit satisfiability algorithms. So far, this program has yielded several concrete results, proving that functions in $\mathsf{Quasi}\text{-}\mathsf{NP} = \mathsf{NTIME}[n^{(\log n)^{O(1)}}]$ and other complexity classes do not have small circuits (in the worst case and/or on average) from various circuit classes $\mathcal{C}$, by showing that $\mathcal{C}$ admits non-trivial satisfiability and/or #SAT algorithms which beat exhaustive search by a minor amount. In this paper, we present a new strong lower bound consequence of having a non-trivial #SAT algorithm for a circuit class $\mathcal{C}$. Say that a symmetric Boolean function $f(x_1,\ldots,x_n)$ is sparse if it outputs 1 on $O(1)$ values of $\sum_{i} x_{i}$. We show that for every sparse $f$, and for all “typical” $\mathcal{C}$, faster #SAT algorithms for $\mathcal{C}$ circuits imply lower bounds against the circuit class $f \circ \mathcal{C}$, which may be stronger than $\mathcal{C}$ itself. In particular:

#SAT algorithms for $n^k$-size $\mathcal{C}$-circuits running in $2^n/n^k$ time (for all $k$) imply NEXP does not have $(f \circ \mathcal{C})$-circuits of polynomial size.

#SAT algorithms for $2^{n^{\varepsilon}}$-size $\mathcal{C}$-circuits running in $2^{n-n^{\varepsilon}}$ time (for some $\varepsilon > 0$) imply Quasi-NP does not have $(f \circ \mathcal{C})$-circuits of polynomial size.

Applying #SAT algorithms from the literature, one immediate corollary of our results is that Quasi-NP does not have $\mathsf{EMAJ} \circ \mathsf{ACC}^0 \circ \mathsf{THR}$ circuits of polynomial ...

5. The kernelized Gram matrix $W$ constructed from data points $\{x_i\}_{i=1}^N$ as $W_{ij}= k_0( \frac{ \| x_i - x_j \|^2} {\sigma ^2} )$ is widely used in graph-based geometric data analysis and unsupervised learning. An important question is how to choose the kernel bandwidth $\sigma$, and a common practice called the self-tuned kernel adaptively sets a $\sigma _i$ at each point $x_i$ by the $k$-nearest neighbor (kNN) distance. When the $x_i$ are sampled from a $d$-dimensional manifold embedded in a possibly high-dimensional space, unlike with fixed-bandwidth kernels, theoretical results on graph Laplacian convergence with self-tuned kernels have been incomplete. This paper proves the convergence of the graph Laplacian operator $L_N$ to the manifold (weighted) Laplacian for a new family of kNN self-tuned kernels $W^{(\alpha )}_{ij} = k_0( \frac{ \| x_i - x_j \|^2}{ \epsilon \hat{\rho }(x_i) \hat{\rho }(x_j)})/\hat{\rho }(x_i)^\alpha \hat{\rho }(x_j)^\alpha$, where $\hat{\rho }$ is the bandwidth function estimated by kNN and the limiting operator is also parametrized by $\alpha$. When $\alpha = 1$, the limiting operator is the weighted manifold Laplacian $\varDelta _p$. Specifically, we prove the point-wise convergence of $L_N f$ and convergence of the graph Dirichlet form with rates. Our analysis is based on first establishing a $C^0$ ...