skip to main content


Title: Analysis of stochastic Lanczos quadrature for spectrum approximation
The cumulative empirical spectral measure (CESM) $\Phi[\mathbf{A}] : \mathbb{R} \to [0,1]$ of a $n\times n$ symmetric matrix $\mathbf{A}$ is defined as the fraction of eigenvalues of $\mathbf{A}$ less than a given threshold, i.e., $\Phi[\mathbf{A}](x) := \sum_{i=1}^{n} \frac{1}{n} {\large\unicode{x1D7D9}}[ \lambda_i[\mathbf{A}]\leq x]$. Spectral sums $\operatorname{tr}(f[\mathbf{A}])$ can be computed as the Riemann–Stieltjes integral of $f$ against $\Phi[\mathbf{A}]$, so the task of estimating CESM arises frequently in a number of applications, including machine learning. We present an error analysis for stochastic Lanczos quadrature (SLQ). We show that SLQ obtains an approximation to the CESM within a Wasserstein distance of $t \: | \lambda_{\text{max}}[\mathbf{A}] - \lambda_{\text{min}}[\mathbf{A}] |$ with probability at least $1-\eta$, by applying the Lanczos algorithm for $\lceil 12 t^{-1} + \frac{1}{2} \rceil$ iterations to $\lceil 4 ( n+2 )^{-1}t^{-2} \ln(2n\eta^{-1}) \rceil$ vectors sampled independently and uniformly from the unit sphere. We additionally provide (matrix-dependent) a posteriori error bounds for the Wasserstein and Kolmogorov–Smirnov distances between the output of this algorithm and the true CESM. The quality of our bounds is demonstrated using numerical experiments.  more » « less
Award ID(s):
1945652
NSF-PAR ID:
10324443
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 38th International Conference on Machine Learning
Volume:
139
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For an $r$-uniform hypergraph $H$, let $\nu^{(m)}(H)$ denote the maximum size of a set $M$ of edges in $H$ such that every two edges in $M$ intersect in less than $m$ vertices, and let $\tau^{(m)}(H)$ denote the minimum size of a collection $C$ of $m$-sets of vertices such that every edge in $H$ contains an element of $C$. The fractional analogues of these parameters are denoted by $\nu^{*(m)}(H)$ and $\tau^{*(m)}(H)$, respectively. Generalizing a famous conjecture of Tuza on covering triangles in a graph, Aharoni and Zerbib conjectured that for every $r$-uniform hypergraph $H$, $\tau^{(r-1)}(H)/\nu^{(r-1)}(H) \leq \lceil{\frac{r+1}{2}}\rceil$. In this paper we prove bounds on the ratio between the parameters $\tau^{(m)}$ and $\nu^{(m)}$, and their fractional analogues. Our main result is that, for every $r$-uniform hypergraph~$H$,\[ \tau^{*(r-1)}(H)/\nu^{(r-1)}(H) \le \begin{cases} \frac{3}{4}r - \frac{r}{4(r+1)} &\text{for }r\text{ even,}\\\frac{3}{4}r - \frac{r}{4(r+2)} &\text{for }r\text{ odd.} \\\end{cases} \]This improves the known bound of $r-1$.We also prove that, for every $r$-uniform hypergraph $H$, $\tau^{(m)}(H)/\nu^{*(m)}(H) \le \operatorname{ex}_m(r, m+1)$, where the Turán number $\operatorname{ex}_r(n, k)$ is the maximum number of edges in an $r$-uniform hypergraph on $n$ vertices that does not contain a copy of the complete $r$-uniform hypergraph on $k$ vertices. Finally, we prove further bounds in the special cases $(r,m)=(4,2)$ and $(r,m)=(4,3)$. 
    more » « less
  2. Abstract

    We study the problem of estimating a $k$-sparse signal ${\boldsymbol \beta }_{0}\in{\mathbb{R}}^{p}$ from a set of noisy observations $\mathbf{y}\in{\mathbb{R}}^{n}$ under the model $\mathbf{y}=\mathbf{X}{\boldsymbol \beta }+w$, where $\mathbf{X}\in{\mathbb{R}}^{n\times p}$ is the measurement matrix the row of which is drawn from distribution $N(0,{\boldsymbol \varSigma })$. We consider the class of $L_{q}$-regularized least squares (LQLS) given by the formulation $\hat{{\boldsymbol \beta }}(\lambda )=\text{argmin}_{{\boldsymbol \beta }\in{\mathbb{R}}^{p}}\frac{1}{2}\|\mathbf{y}-\mathbf{X}{\boldsymbol \beta }\|^{2}_{2}+\lambda \|{\boldsymbol \beta }\|_{q}^{q}$, where $\|\cdot \|_{q}$  $(0\le q\le 2)$ denotes the $L_{q}$-norm. In the setting $p,n,k\rightarrow \infty $ with fixed $k/p=\epsilon $ and $n/p=\delta $, we derive the asymptotic risk of $\hat{{\boldsymbol \beta }}(\lambda )$ for arbitrary covariance matrix ${\boldsymbol \varSigma }$ that generalizes the existing results for standard Gaussian design, i.e. $X_{ij}\overset{i.i.d}{\sim }N(0,1)$. The results were derived from the non-rigorous replica method. We perform a higher-order analysis for LQLS in the small-error regime in which the first dominant term can be used to determine the phase transition behavior of LQLS. Our results show that the first dominant term does not depend on the covariance structure of ${\boldsymbol \varSigma }$ in the cases $0\le q\lt 1$ and $1\lt q\le 2,$ which indicates that the correlations among predictors only affect the phase transition curve in the case $q=1$ a.k.a. LASSO. To study the influence of the covariance structure of ${\boldsymbol \varSigma }$ on the performance of LQLS in the cases $0\le q\lt 1$ and $1\lt q\le 2$, we derive the explicit formulas for the second dominant term in the expansion of the asymptotic risk in terms of small error. Extensive computational experiments confirm that our analytical predictions are consistent with numerical results.

     
    more » « less
  3. The classic graphical Cheeger inequalities state that if $M$ is an $n\times n$ \emph{symmetric} doubly stochastic matrix, then \[ \frac{1-\lambda_{2}(M)}{2}\leq\phi(M)\leq\sqrt{2\cdot(1-\lambda_{2}(M))} \] where $\phi(M)=\min_{S\subseteq[n],|S|\leq n/2}\left(\frac{1}{|S|}\sum_{i\in S,j\not\in S}M_{i,j}\right)$ is the edge expansion of $M$, and $\lambda_{2}(M)$ is the second largest eigenvalue of $M$. We study the relationship between $\phi(A)$ and the spectral gap $1-\re\lambda_{2}(A)$ for \emph{any} doubly stochastic matrix $A$ (not necessarily symmetric), where $\lambda_{2}(A)$ is a nontrivial eigenvalue of $A$ with maximum real part. Fiedler showed that the upper bound on $\phi(A)$ is unaffected, i.e., $\phi(A)\leq\sqrt{2\cdot(1-\re\lambda_{2}(A))}$. With regards to the lower bound on $\phi(A)$, there are known constructions with \[ \phi(A)\in\Theta\left(\frac{1-\re\lambda_{2}(A)}{\log n}\right), \] indicating that at least a mild dependence on $n$ is necessary to lower bound $\phi(A)$. In our first result, we provide an \emph{exponentially} better construction of $n\times n$ doubly stochastic matrices $A_{n}$, for which \[ \phi(A_{n})\leq\frac{1-\re\lambda_{2}(A_{n})}{\sqrt{n}}. \] In fact, \emph{all} nontrivial eigenvalues of our matrices are $0$, even though the matrices are highly \emph{nonexpanding}. We further show that this bound is in the correct range (up to the exponent of $n$), by showing that for any doubly stochastic matrix $A$, \[ \phi(A)\geq\frac{1-\re\lambda_{2}(A)}{35\cdot n}. \] As a consequence, unlike the symmetric case, there is a (necessary) loss of a factor of $n^{\alpha}$ for $\frac{1}{2}\leq\alpha\leq1$ in lower bounding $\phi$ by the spectral gap in the nonsymmetric setting. Our second result extends these bounds to general matrices $R$ with nonnegative entries, to obtain a two-sided \emph{gapped} refinement of the Perron-Frobenius theorem. Recall from the Perron-Frobenius theorem that for such $R$, there is a nonnegative eigenvalue $r$ such that all eigenvalues of $R$ lie within the closed disk of radius $r$ about $0$. Further, if $R$ is irreducible, which means $\phi(R)>0$ (for suitably defined $\phi$), then $r$ is positive and all other eigenvalues lie within the \textit{open} disk, so (with eigenvalues sorted by real part), $\re\lambda_{2}(R) more » « less
  4. Consider the linear transport equation in 1D under an external confining potential \begin{document}$ \Phi $\end{document}:

    For \begin{document}$ \Phi = \frac {x^2}2 + \frac { \varepsilon x^4}2 $\end{document} (with \begin{document}$ \varepsilon >0 $\end{document} small), we prove phase mixing and quantitative decay estimates for \begin{document}$ {\partial}_t \varphi : = - \Delta^{-1} \int_{ \mathbb{R}} {\partial}_t f \, \mathrm{d} v $\end{document}, with an inverse polynomial decay rate \begin{document}$ O({\langle} t{\rangle}^{-2}) $\end{document}. In the proof, we develop a commuting vector field approach, suitably adapted to this setting. We will explain why we hope this is relevant for the nonlinear stability of the zero solution for the Vlasov–Poisson system in \begin{document}$ 1 $\end{document}D under the external potential \begin{document}$ \Phi $\end{document}.

     
    more » « less
  5. null (Ed.)
    A long-standing conjecture by Kotzig, Ringel, and Rosa states that every tree admits a graceful labeling. That is, for any tree $T$ with $n$~edges, it is conjectured that there exists a labeling $f\colon V(T) \to \{0,1,\ldots,n\}$ such that the set of induced edge labels $\bigl\{ |f(u)-f(v)| : \{u,v\}\in E(T) \bigr\}$ is exactly $\{1,2,\ldots,n\}$. We extend this concept to allow for multigraphs with edge multiplicity at most~$2$. A \emph{2-fold graceful labeling} of a graph (or multigraph) $G$ with $n$~edges is a one-to-one function $f\colon V(G) \to \{0,1,\ldots,n\}$ such that the multiset of induced edge labels is comprised of two copies of each element in $\bigl\{ 1,2,\ldots, \lfloor n/2 \rfloor \bigr\}$ and, if $n$ is odd, one copy of $\bigl\{ \lceil n/2 \rceil \bigr\}$. When $n$ is even, this concept is similar to that of 2-equitable labelings which were introduced by Bloom and have been studied for several classes of graphs. We show that caterpillars, cycles of length $n \not\equiv 1 \pmod{4}$, and complete bipartite graphs admit 2-fold graceful labelings. We also show that under certain conditions, the join of a tree and an empty graph (i.e., a graph with vertices but no edges) is $2$-fold graceful. 
    more » « less