
This content will become publicly available on December 10, 2024

Title: Structured Semidefinite Programming for Recovering Structured Preconditioners
We develop a general framework for finding approximately optimal preconditioners for solving linear systems. Leveraging this framework, we obtain improved runtimes for fundamental preconditioning and linear system solving problems, including the following.

- Diagonal preconditioning. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star, \epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix.
- Structured linear systems. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time.

Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$, where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery.
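To see what diagonal preconditioning buys on a concrete instance, here is a hedged toy sketch. It uses plain Jacobi scaling $\mathbf{D}^{-1/2}\mathbf{K}\mathbf{D}^{-1/2}$ with $\mathbf{D} = \mathrm{diag}(\mathbf{K})$ as a classical baseline, not the paper's SDP-based $\epsilon$-optimal algorithm; the $2 \times 2$ matrix is made up for illustration, with eigenvalues computed in closed form.

```python
import math

def sym2x2_eigs(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - rad, mean + rad

def cond2x2(a, b, c):
    """Condition number (ratio of extreme eigenvalues) of a PD 2x2 matrix."""
    lo, hi = sym2x2_eigs(a, b, c)
    return hi / lo

# Positive definite K with a very uneven diagonal (illustrative data).
a, b, c = 100.0, 9.0, 1.0
kappa_orig = cond2x2(a, b, c)

# Jacobi scaling: D = diag(K), rescale to D^{-1/2} K D^{-1/2}.
da, dc = 1.0 / math.sqrt(a), 1.0 / math.sqrt(c)
ra, rb, rc = a * da * da, b * da * dc, c * dc * dc  # rescaled entries
kappa_scaled = cond2x2(ra, rb, rc)

print(kappa_orig, kappa_scaled)  # the rescaled condition number is far smaller
```

Here the rescaled matrix has unit diagonal, and its condition number drops by more than an order of magnitude; the paper's algorithm finds a diagonal rescaling within $\epsilon$ of the best possible $\kappa^\star$.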
Publisher / Repository:
Curran Associates, Inc.
Journal Name:
Advances in Neural Information Processing Systems
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the $\ell_p$ regression problem, which requires finding $\mathbf{x}\in\mathbb R^{d}$ that minimizes $\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$ for a matrix $\mathbf{A}\in\mathbb R^{n \times d}$ and response vector $\mathbf{b}\in\mathbb R^{n}$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $n$ is very large. However, all known subsampling approaches have run times that depend exponentially on $p$, typically $d^{\mathcal{O}(p)}$, which can be prohibitively expensive. We improve on this work by showing that for a large class of common \emph{structured matrices}, such as combinations of low-rank matrices, sparse matrices, and Vandermonde matrices, there are subsampling based methods for $\ell_p$ regression that depend polynomially on $p$. For example, we give an algorithm for $\ell_p$ regression on Vandermonde matrices that runs in time $\mathcal{O}(n\log^3 n+(dp^2)^{0.5+\omega}\cdot\text{polylog}\,n)$, where $\omega$ is the exponent of matrix multiplication. The polynomial dependence on $p$ crucially allows our algorithms to extend naturally to efficient algorithms for $\ell_\infty$ regression, via approximation of $\ell_\infty$ by $\ell_{\mathcal{O}(\log n)}$. Of practical interest, we also develop a new subsampling algorithm for $\ell_p$ regression for arbitrary matrices, which is simpler than previous approaches for $p \ge 4$.
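For contrast with the subsampling approach above, a minimal sketch of the classical alternative it competes with: iteratively reweighted least squares (IRLS) for $\ell_p$ regression. To keep it self-contained, the sketch restricts to a single coefficient so the weighted least-squares step has a closed form; the data, iteration count, and damping factor $1/(p-1)$ (a standard stabilization for $p > 2$) are illustrative assumptions, not from the paper.

```python
def lp_objective(x, a, b, p):
    """sum_i |a_i * x - b_i|^p for a scalar coefficient x."""
    return sum(abs(ai * x - bi) ** p for ai, bi in zip(a, b))

def irls_lp(a, b, p, iters=1000, eps=1e-12):
    """Damped IRLS for min_x sum_i |a_i*x - b_i|^p (scalar x, p >= 2).
    Weights w_i = |residual_i|^(p-2); damping 1/(p-1) keeps p > 2 stable."""
    x = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)  # l_2 start
    for _ in range(iters):
        w = [max(abs(ai * x - bi), eps) ** (p - 2) for ai, bi in zip(a, b)]
        x_wls = (sum(wi * ai * bi for wi, ai, bi in zip(w, a, b))
                 / sum(wi * ai * ai for wi, ai in zip(w, a)))
        x = x + (x_wls - x) / (p - 1)  # damped update toward the WLS solution
    return x

a = [1.0, 2.0, 3.0, 1.0]   # design (one column), illustrative
b = [1.1, 1.9, 3.2, 0.7]   # responses, illustrative
x2 = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
x4 = irls_lp(a, b, p=4)    # l_4 solution; no better than l_2 on the l_4 loss? check below
```

Each IRLS pass costs a full weighted regression over all $n$ rows, which is exactly the cost the subsampling methods above aim to avoid when $n$ is large.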
  2.

    This paper will study almost everywhere behaviors of functions on partition spaces of cardinals possessing suitable partition properties. Almost everywhere continuity and monotonicity properties for functions on partition spaces will be established. These results will be applied to distinguish the cardinality of certain subsets of the power set of partition cardinals.

    The following summarizes the main results proved under suitable partition hypotheses.

If $\kappa$ is a cardinal, $\epsilon < \kappa$, ${\mathrm {cof}}(\epsilon ) = \omega$, $\kappa \rightarrow _* (\kappa )^{\epsilon \cdot \epsilon }_2$ and $\Phi : [\kappa ]^\epsilon _* \rightarrow \mathrm {ON}$, then $\Phi$ satisfies the almost everywhere short length continuity property: There is a club $C \subseteq \kappa$ and a $\delta < \epsilon$ so that for all $f,g \in [C]^\epsilon _*$, if $f \upharpoonright \delta = g \upharpoonright \delta$ and $\sup (f) = \sup (g)$, then $\Phi (f) = \Phi (g)$.

    If $\kappa$ is a cardinal, $\epsilon$ is countable, $\kappa \rightarrow _* (\kappa )^{\epsilon \cdot \epsilon }_2$ holds and $\Phi : [\kappa ]^\epsilon _* \rightarrow \mathrm {ON}$, then $\Phi$ satisfies the strong almost everywhere short length continuity property: There is a club $C \subseteq \kappa$ and finitely many ordinals $\delta _0, \ldots, \delta _k \leq \epsilon$ so that for all $f,g \in [C]^\epsilon _*$, if for all $0 \leq i \leq k$, $\sup (f \upharpoonright \delta _i) = \sup (g \upharpoonright \delta _i)$, then $\Phi (f) = \Phi (g)$.

    If $\kappa$ satisfies $\kappa \rightarrow _* (\kappa )^\kappa _2$, $\epsilon \leq \kappa$ and $\Phi : [\kappa ]^\epsilon _* \rightarrow \mathrm {ON}$, then $\Phi$ satisfies the almost everywhere monotonicity property: There is a club $C \subseteq \kappa$ so that for all $f,g \in [C]^\epsilon _*$, if for all $\alpha < \epsilon$, $f(\alpha ) \leq g(\alpha )$, then $\Phi (f) \leq \Phi (g)$.

    Suppose dependent choice ($\mathsf {DC}$), ${\omega _1} \rightarrow _* ({\omega _1})^{\omega _1}_2$ and the almost everywhere short length club uniformization principle for ${\omega _1}$ hold. Then every function $\Phi : [{\omega _1}]^{\omega _1}_* \rightarrow {\omega _1}$ satisfies a finite continuity property with respect to closure points: Let $\mathfrak {C}_f$ be the club of $\alpha < {\omega _1}$ so that $\sup (f \upharpoonright \alpha ) = \alpha$. There is a club $C \subseteq {\omega _1}$ and finitely many functions $\Upsilon _0, \ldots, \Upsilon _{n - 1} : [C]^{\omega _1}_* \rightarrow {\omega _1}$ so that for all $f \in [C]^{\omega _1}_*$, for all $g \in [C]^{\omega _1}_*$, if $\mathfrak {C}_g = \mathfrak {C}_f$ and for all $i < n$, $\sup (g \upharpoonright \Upsilon _i(f)) = \sup (f \upharpoonright \Upsilon _i(f))$, then $\Phi (g) = \Phi (f)$.

    Suppose $\kappa$ satisfies $\kappa \rightarrow _* (\kappa )^\epsilon _2$ for all $\epsilon < \kappa$. For all $\chi < \kappa$, $[\kappa ]^{<\kappa }$ does not inject into ${}^\chi \mathrm {ON}$, the class of $\chi$-length sequences of ordinals, and therefore, $|[\kappa ]^\chi | < |[\kappa ]^{<\kappa }|$. As a consequence, under the axiom of determinacy ($\mathsf {AD}$), these two cardinality results hold when $\kappa$ is one of the following weak or strong partition cardinals of determinacy: ${\omega _1}$, $\omega _2$, $\boldsymbol {\delta }_n^1$ (for all $1 \leq n < \omega$) and $\boldsymbol {\delta }^2_1$ (assuming in addition $\mathsf {DC}_{\mathbb {R}}$).

  3.
    We study fast algorithms for computing basic properties of an n x n positive semidefinite kernel matrix K corresponding to n points x_1,...,x_n in R^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. These are some of the most basic problems defined over kernel matrices. We show that the sum of matrix entries can be estimated up to a multiplicative factor of 1+epsilon in time sublinear in n and linear in d for many popular kernel functions, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and a witnessing approximate eigenvector) can be approximated to a multiplicative factor of 1+epsilon in time sub-quadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.
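The flavor of the entrywise-sum problem can be seen in a toy Monte-Carlo version. This is a hedged sketch using uniform sampling of kernel entries, which is only an unbiased baseline; the paper's sublinear algorithms are more refined than uniform sampling (which alone does not give the 1+epsilon guarantee). The point set, bandwidth, and sample count are illustrative assumptions.

```python
import math
import random

def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two 1-d points."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

n = 40
points = [i / 4.0 for i in range(n)]   # fixed 1-d grid, illustrative data

# Exact sum of all n^2 kernel-matrix entries (O(n^2) work, ground truth here).
exact = sum(gaussian_kernel(p, q) for p in points for q in points)

# Unbiased Monte-Carlo estimate from s uniformly sampled entries.
random.seed(0)
s = 4000
sampled = [gaussian_kernel(random.choice(points), random.choice(points))
           for _ in range(s)]
estimate = (n * n) * sum(sampled) / s
```

Each sampled entry costs O(d) kernel evaluations (here d = 1), so the estimator's cost is independent of n^2; the challenge the paper addresses is controlling the variance well enough to get a true multiplicative guarantee.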
  4. We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov decision processes (MDPs), which are more appropriate for applications that involve continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, theoretical understanding of model-free RL algorithms is relatively inadequate in the average-reward setting. In this paper, we consider both the online setting and the setting with access to a simulator. We develop computationally efficient model-free algorithms that achieve sharper guarantees on regret/sample complexity compared with existing results. In the online setting, we design an algorithm, UCB-AVG, based on an optimistic variant of variance-reduced Q-learning. We show that UCB-AVG achieves a regret bound $\widetilde{O}(S^5A^2sp(h^*)\sqrt{T})$ after $T$ steps, where $S\times A$ is the size of the state-action space and $sp(h^*)$ is the span of the optimal bias function. Our result provides the first computationally efficient model-free algorithm that achieves the optimal dependence on $T$ (up to log factors) for weakly communicating MDPs, which is necessary for low regret. In contrast, prior results either are suboptimal in $T$ or require strong assumptions of ergodicity or uniform mixing of MDPs. In the simulator setting, we adapt the idea of UCB-AVG to develop a model-free algorithm that finds an $\epsilon$-optimal policy with sample complexity $\widetilde{O}(SA\,sp^2(h^*)\epsilon^{-2} + S^2A\,sp(h^*)\epsilon^{-1})$. This sample complexity is near-optimal for weakly communicating MDPs, in view of the minimax lower bound $\Omega(SA\,sp(h^*)\epsilon^{-2})$. Existing work mainly focuses on ergodic MDPs, and the results typically depend on $t_{mix}$, the worst-case mixing time induced by a policy. We remark that the diameter $D$ and mixing time $t_{mix}$ are both lower bounded by $sp(h^*)$, and that $t_{mix}$ can be arbitrarily large for certain MDPs.
On the technical side, our approach integrates two key ideas: learning a $\gamma$-discounted MDP as an approximation, and leveraging reference-advantage decomposition for variance reduction in optimistic Q-learning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is maintaining estimates of the value difference between state pairs to provide a sharper bound on the variance of the reference advantage. We also crucially use a careful choice of the discount factor $\gamma$ to balance the approximation error due to discounting against the statistical learning error, and we are able to maintain a good-quality reference value function with $O(SA)$ space complexity.
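The first technical ingredient, approximating an average-reward MDP by a $\gamma$-discounted one, can be checked on a toy chain. A hedged sketch, not UCB-AVG itself: the two-state chain, rewards, and $\gamma$ are made up for illustration. For this chain the stationary distribution is $(0.5, 0.5)$, so the gain is $\rho = 0.5$, and $(1-\gamma)V_\gamma \to \rho$ as $\gamma \to 1$.

```python
# Two-state Markov chain with a single action; the average reward has a
# closed form, and (1 - gamma) * V_gamma approaches it as gamma -> 1.
P = [[0.5, 0.5], [0.5, 0.5]]   # transition probabilities (illustrative)
r = [1.0, 0.0]                 # rewards; stationary dist (0.5, 0.5) => gain 0.5
gamma = 0.99                   # discount factor close to 1

V = [0.0, 0.0]
for _ in range(3000):          # value iteration on the discounted MDP
    V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(2))
         for s in range(2)]

gain_estimate = (1.0 - gamma) * V[0]   # approx. 0.5, up to O(1 - gamma) bias
```

The residual bias here is $(1-\gamma)\,h(s)$-sized, which is why the choice of $\gamma$ in the paper must trade this approximation error off against the statistical error of learning the discounted MDP.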
  5. We give two new quantum algorithms for solving semidefinite programs (SDPs) providing quantum speed-ups. We consider SDP instances with m constraint matrices, each of dimension n, rank at most r, and sparsity s. The first algorithm assumes an input model where one is given access to an oracle to the entries of the matrices at unit cost. We show that it has run time O~(s^2 (sqrt{m} epsilon^{-10} + sqrt{n} epsilon^{-12})), with epsilon the error of the solution. This gives an optimal dependence in terms of m and n, and a quadratic improvement over previous quantum algorithms (when m ~~ n). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is O~(sqrt{m}+poly(r))*poly(log m,log n,B,epsilon^{-1}), with B an upper bound on the trace-norm of all input matrices. In particular, the complexity depends only polylogarithmically on n and polynomially on r. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: Given m measurements and a supply of copies of an unknown state rho with rank at most r, we show we can find in time sqrt{m}*poly(log m,log n,r,epsilon^{-1}) a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as rho on the m measurements, up to error epsilon. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a poly-logarithmic dependence on its dimension, which is based on ideas developed in quantum principal component analysis.
We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. [Harrow et al., 2017]. We believe both techniques might be of independent interest.
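The classical object being "quantized" here is the matrix multiplicative weights update, which maintains a Gibbs (density) matrix rho proportional to exp(-eta * sum_t M_t). A hedged 2x2 sketch of just that update, with made-up loss matrices and step size; a real SDP solver would choose each M_t from violated constraints, and the quantum algorithms prepare the Gibbs state directly rather than via explicit matrix exponentials.

```python
import math

def sym2x2_expm(a, b, c):
    """Matrix exponential of the symmetric matrix [[a, b], [b, c]],
    via closed-form eigendecomposition."""
    mean = (a + c) / 2.0
    rad = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    if rad < 1e-15 or abs(b) < 1e-15:      # (near-)diagonal shortcut
        return [[math.exp(a), 0.0], [0.0, math.exp(c)]]
    l1, l2 = mean - rad, mean + rad
    vx, vy = b, l2 - a                      # eigenvector for the top eigenvalue l2
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    e1, e2 = math.exp(l1), math.exp(l2)
    # exp(M) = e2 * v v^T + e1 * (I - v v^T)
    return [[e1 + (e2 - e1) * vx * vx, (e2 - e1) * vx * vy],
            [(e2 - e1) * vx * vy, e1 + (e2 - e1) * vy * vy]]

def gibbs_state(H, eta):
    """Density matrix exp(-eta * H) / Tr(exp(-eta * H)) for symmetric 2x2 H."""
    E = sym2x2_expm(-eta * H[0][0], -eta * H[0][1], -eta * H[1][1])
    tr = E[0][0] + E[1][1]
    return [[E[0][0] / tr, E[0][1] / tr], [E[1][0] / tr, E[1][1] / tr]]

# MMW-style loop: accumulate "loss" matrices, re-derive the Gibbs state.
losses = [[[1.0, 0.2], [0.2, 0.0]],
          [[0.0, -0.5], [-0.5, 1.0]]]      # illustrative loss matrices
H = [[0.0, 0.0], [0.0, 0.0]]
rho = gibbs_state(H, eta=0.5)              # starts at the maximally mixed state
for M in losses:
    H = [[H[i][j] + M[i][j] for j in range(2)] for i in range(2)]
    rho = gibbs_state(H, eta=0.5)
```

By construction rho stays positive semidefinite with unit trace at every step, which is the invariant the quantum Gibbs sampler in the paper reproduces without ever writing the matrix down.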