This content will become publicly available on December 10, 2024.
Award ID(s): 2045590
NSF-PAR ID: 10495901
Publisher / Repository: Curran Associates, Inc.
Journal Name: Advances in Neural Information Processing Systems
Sponsoring Org: National Science Foundation
More Like this

We study the $\ell_p$ regression problem, which requires finding $\mathbf{x}\in\mathbb{R}^{d}$ that minimizes $\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_p$ for a matrix $\mathbf{A}\in\mathbb{R}^{n \times d}$ and response vector $\mathbf{b}\in\mathbb{R}^{n}$. There has been recent interest in developing subsampling methods for this problem that can outperform standard techniques when $n$ is very large. However, all known subsampling approaches have run time that depends exponentially on $p$, typically $d^{\mathcal{O}(p)}$, which can be prohibitively expensive. We improve on this work by showing that for a large class of common \emph{structured matrices}, such as combinations of low-rank matrices, sparse matrices, and Vandermonde matrices, there are subsampling-based methods for $\ell_p$ regression that depend polynomially on $p$. For example, we give an algorithm for $\ell_p$ regression on Vandermonde matrices that runs in time $\mathcal{O}(n\log^3 n+(dp^2)^{0.5+\omega}\cdot\text{polylog}\,n)$, where $\omega$ is the exponent of matrix multiplication. The polynomial dependence on $p$ crucially allows our algorithms to extend naturally to efficient algorithms for $\ell_\infty$ regression, via approximation of $\ell_\infty$ by $\ell_{\mathcal{O}(\log n)}$. Of practical interest, we also develop a new subsampling algorithm for $\ell_p$ regression for arbitrary matrices, which is simpler than previous approaches for $p \ge 4$.
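The $\ell_\infty$-by-$\ell_{\mathcal{O}(\log n)}$ approximation mentioned above rests on the standard norm sandwich $\|v\|_\infty \le \|v\|_p \le n^{1/p}\|v\|_\infty$, so taking $p = \ln n$ gives $\|v\|_p \le e\,\|v\|_\infty$. A minimal numerical check of this bound (plain NumPy; the vector and its size are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
v = rng.standard_normal(n)

p = np.log(n)                              # p = ln(n), so n^(1/p) = e
linf = np.max(np.abs(v))                   # ell_infinity norm
lp = np.sum(np.abs(v) ** p) ** (1.0 / p)   # ell_p norm

# Sandwich bound: ||v||_inf <= ||v||_p <= n^(1/p) * ||v||_inf = e * ||v||_inf
assert linf <= lp <= np.e * linf
print(f"ratio ||v||_p / ||v||_inf = {lp / linf:.3f} (must lie in [1, e])")
```

So a constant-factor $\ell_\infty$ approximation follows from solving the $\ell_p$ problem at $p = \mathcal{O}(\log n)$, which is exactly where a polynomial (rather than exponential) dependence on $p$ matters.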

This paper will study almost everywhere behaviors of functions on partition spaces of cardinals possessing suitable partition properties. Almost everywhere continuity and monotonicity properties for functions on partition spaces will be established. These results will be applied to distinguish the cardinality of certain subsets of the power set of partition cardinals.
The following summarizes the main results proved under suitable partition hypotheses.
If $\kappa$ is a cardinal, $\epsilon < \kappa$, ${\mathrm{cof}}(\epsilon) = \omega$, $\kappa \rightarrow_* (\kappa)^{\epsilon \cdot \epsilon}_2$ and $\Phi : [\kappa]^\epsilon_* \rightarrow \mathrm{ON}$, then $\Phi$ satisfies the almost everywhere short length continuity property: There is a club $C \subseteq \kappa$ and a $\delta < \epsilon$ so that for all $f, g \in [C]^\epsilon_*$, if $f \upharpoonright \delta = g \upharpoonright \delta$ and $\sup(f) = \sup(g)$, then $\Phi(f) = \Phi(g)$.

If $\kappa$ is a cardinal, $\epsilon$ is countable, $\kappa \rightarrow_* (\kappa)^{\epsilon \cdot \epsilon}_2$ holds and $\Phi : [\kappa]^\epsilon_* \rightarrow \mathrm{ON}$, then $\Phi$ satisfies the strong almost everywhere short length continuity property: There is a club $C \subseteq \kappa$ and finitely many ordinals $\delta_0, ..., \delta_k \leq \epsilon$ so that for all $f, g \in [C]^\epsilon_*$, if for all $0 \leq i \leq k$, $\sup(f \upharpoonright \delta_i) = \sup(g \upharpoonright \delta_i)$, then $\Phi(f) = \Phi(g)$.

If $\kappa$ satisfies $\kappa \rightarrow_* (\kappa)^\kappa_2$, $\epsilon \leq \kappa$ and $\Phi : [\kappa]^\epsilon_* \rightarrow \mathrm{ON}$, then $\Phi$ satisfies the almost everywhere monotonicity property: There is a club $C \subseteq \kappa$ so that for all $f, g \in [C]^\epsilon_*$, if for all $\alpha < \epsilon$, $f(\alpha) \leq g(\alpha)$, then $\Phi(f) \leq \Phi(g)$.

Suppose dependent choice ($\mathsf{DC}$), ${\omega_1} \rightarrow_* ({\omega_1})^{\omega_1}_2$ and the almost everywhere short length club uniformization principle for ${\omega_1}$ hold. Then every function $\Phi : [{\omega_1}]^{\omega_1}_* \rightarrow {\omega_1}$ satisfies a finite continuity property with respect to closure points: Let $\mathfrak{C}_f$ be the club of $\alpha < {\omega_1}$ so that $\sup(f \upharpoonright \alpha) = \alpha$. There is a club $C \subseteq {\omega_1}$ and finitely many functions $\Upsilon_0, ..., \Upsilon_{n-1} : [C]^{\omega_1}_* \rightarrow {\omega_1}$ so that for all $f \in [C]^{\omega_1}_*$, for all $g \in [C]^{\omega_1}_*$, if $\mathfrak{C}_g = \mathfrak{C}_f$ and for all $i < n$, $\sup(g \upharpoonright \Upsilon_i(f)) = \sup(f \upharpoonright \Upsilon_i(f))$, then $\Phi(g) = \Phi(f)$.

Suppose $\kappa$ satisfies $\kappa \rightarrow_* (\kappa)^\epsilon_2$ for all $\epsilon < \kappa$. For all $\chi < \kappa$, $[\kappa]^{<\kappa}$ does not inject into ${}^\chi \mathrm{ON}$, the class of length $\chi$ sequences of ordinals, and therefore, $|[\kappa]^\chi| < |[\kappa]^{<\kappa}|$. As a consequence, under the axiom of determinacy ($\mathsf{AD}$), these two cardinality results hold when $\kappa$ is one of the following weak or strong partition cardinals of determinacy: ${\omega_1}$, $\omega_2$, $\boldsymbol{\delta}_n^1$ (for all $1 \leq n < \omega$) and $\boldsymbol{\delta}^2_1$ (assuming in addition $\mathsf{DC}_{\mathbb{R}}$).
We study fast algorithms for computing basic properties of an n x n positive semidefinite kernel matrix K corresponding to n points x_1,...,x_n in R^d. In particular, we consider estimating the sum of kernel matrix entries, along with its top eigenvalue and eigenvector. These are some of the most basic problems defined over kernel matrices. We show that the sum of matrix entries can be estimated up to a multiplicative factor of 1+epsilon in time sublinear in n and linear in d for many popular kernel functions, including the Gaussian, exponential, and rational quadratic kernels. For these kernels, we also show that the top eigenvalue (and a witnessing approximate eigenvector) can be approximated to a multiplicative factor of 1+epsilon in time subquadratic in n and linear in d. Our algorithms represent significant advances in the best known runtimes for these problems. They leverage the positive definiteness of the kernel matrix, along with a recent line of work on efficient kernel density estimation.
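The paper's sublinear-time estimator is more refined, but the quantity it targets — the sum of all n^2 Gaussian kernel entries — can be illustrated with a plain uniform-sampling Monte Carlo estimate. This is a hedged simplification, not the authors' algorithm; the point sizes, bandwidth, and sample count below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 400, 3
X = rng.standard_normal((n, d))

# Exact sum of all n^2 Gaussian kernel entries K_ij = exp(-||x_i - x_j||^2 / 2).
# This is the O(n^2 d) computation one wants to avoid at scale.
K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
exact = K.sum()

# Monte Carlo: average the kernel over uniformly sampled index pairs (i, j),
# then scale by n^2. Unbiased, and only touches m pairs instead of n^2.
m = 20_000
I, J = rng.integers(0, n, m), rng.integers(0, n, m)
est = n * n * np.mean(np.exp(-((X[I] - X[J]) ** 2).sum(-1) / 2.0))

print(f"exact {exact:.1f}  estimate {est:.1f}  rel. err {abs(est - exact) / exact:.3%}")
```

Uniform sampling needs many samples when the entries are very non-uniform; the kernel-density-estimation machinery cited above is precisely what lets the paper's method do better.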

We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov decision processes (MDPs), which are more appropriate for applications that involve continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, theoretical understanding of model-free RL algorithms is relatively inadequate for the average-reward setting. In this paper, we consider both the online setting and the setting with access to a simulator. We develop computationally efficient model-free algorithms that achieve sharper guarantees on regret/sample complexity compared with existing results. In the online setting, we design an algorithm, UCB-AVG, based on an optimistic variant of variance-reduced Q-learning. We show that UCB-AVG achieves a regret bound $\widetilde{O}(S^5A^2sp(h^*)\sqrt{T})$ after $T$ steps, where $S\times A$ is the size of the state-action space, and $sp(h^*)$ the span of the optimal bias function. Our result provides the first computationally efficient model-free algorithm that achieves the optimal dependence in $T$ (up to log factors) for weakly communicating MDPs, which is necessary for low regret. In contrast, prior results either are suboptimal in $T$ or require strong assumptions of ergodicity or uniform mixing of MDPs. In the simulator setting, we adapt the idea of UCB-AVG to develop a model-free algorithm that finds an $\epsilon$-optimal policy with sample complexity $\widetilde{O}(SAsp^2(h^*)\epsilon^{-2} + S^2Asp(h^*)\epsilon^{-1})$. This sample complexity is near-optimal for weakly communicating MDPs, in view of the minimax lower bound $\Omega(SAsp(h^*)\epsilon^{-2})$. Existing work mainly focuses on ergodic MDPs and the results typically depend on $t_{mix}$, the worst-case mixing time induced by a policy. We remark that the diameter $D$ and mixing time $t_{mix}$ are both lower bounded by $sp(h^*)$, and $t_{mix}$ can be arbitrarily large for certain MDPs.
On the technical side, our approach integrates two key ideas: learning a $\gamma$-discounted MDP as an approximation, and leveraging reference-advantage decomposition for variance reduction in optimistic Q-learning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is maintaining estimates of value differences between state pairs to provide a sharper bound on the variance of the reference advantage. We also crucially use a careful choice of the discount factor $\gamma$ to balance the approximation error due to discounting and the statistical learning error, and we are able to maintain a good-quality reference value function with $O(SA)$ space complexity.
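UCB-AVG itself is an optimistic, variance-reduced scheme, but the average-reward objects it controls — the optimal gain and the bias function $h^*$ whose span $sp(h^*)$ appears in every bound above — can be illustrated with plain relative value iteration on a toy MDP. This is a hedged sketch under made-up two-state dynamics, not the paper's algorithm:

```python
import numpy as np

# Toy deterministic MDP: states {0, 1}, actions {stay, switch}.
# Staying in state 0 pays 1; switching out of state 1 pays 0.5; all else pays 0.
# Optimal policy: reach state 0 and stay there, so the optimal gain is 1.
R = np.array([[1.0, 0.0],     # R[s, a]: reward for action a in state s
              [0.0, 0.5]])
NXT = np.array([[0, 1],       # next state under (s, a): stay keeps s, switch flips
                [1, 0]])

h = np.zeros(2)               # relative (bias) values, reference state 0
for _ in range(100):
    Q = R + h[NXT]            # Bellman backup: Q[s, a] = r(s, a) + h(s')
    v = Q.max(axis=1)
    gain, h = v[0], v - v[0]  # subtract the reference value to keep iterates bounded

print(f"gain ~ {gain:.3f}, bias h = {h}")  # converges to gain 1.0, h = [0.0, -0.5]
```

Here $sp(h^*) = 0.5$: the bias span measures how much transient state matters relative to the long-run average, which is why it (rather than the diameter or mixing time) is the right complexity parameter in the bounds above.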

We give two new quantum algorithms for solving semidefinite programs (SDPs) providing quantum speedups. We consider SDP instances with m constraint matrices, each of dimension n, rank at most r, and sparsity s. The first algorithm assumes an input model where one is given access to an oracle to the entries of the matrices at unit cost. We show that it has run time O~(s^2 (sqrt{m} epsilon^{-10} + sqrt{n} epsilon^{-12})), with epsilon the error of the solution. This gives an optimal dependence in terms of m and n, and a quadratic improvement over previous quantum algorithms (when m ~~ n). The second algorithm assumes a fully quantum input model in which the input matrices are given as quantum states. We show that its run time is O~(sqrt{m}+poly(r))*poly(log m, log n, B, epsilon^{-1}), with B an upper bound on the trace norm of all input matrices. In particular, the complexity depends only polylogarithmically on n and polynomially on r. We apply the second SDP solver to learn a good description of a quantum state with respect to a set of measurements: Given m measurements and a supply of copies of an unknown state rho with rank at most r, we show we can find in time sqrt{m}*poly(log m, log n, r, epsilon^{-1}) a description of the state as a quantum circuit preparing a density matrix which has the same expectation values as rho on the m measurements, up to error epsilon. The density matrix obtained is an approximation to the maximum entropy state consistent with the measurement data considered in Jaynes' principle from statistical mechanics. As in previous work, we obtain our algorithm by "quantizing" classical SDP solvers based on the matrix multiplicative weight update method. One of our main technical contributions is a quantum Gibbs state sampler for low-rank Hamiltonians, given quantum states encoding these Hamiltonians, with a polylogarithmic dependence on its dimension, which is based on ideas developed in quantum principal component analysis.
We also develop a "fast" quantum OR lemma with a quadratic improvement in gate complexity over the construction of Harrow et al. [Harrow et al., 2017]. We believe both techniques might be of independent interest.
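Classically, the Gibbs states at the heart of the matrix multiplicative weight method are just normalized matrix exponentials, rho = exp(-H)/Tr(exp(-H)) — always a valid density matrix (unit trace, positive semidefinite). A small dense-matrix sketch of that object (NumPy/SciPy; the random low-rank "Hamiltonian" is a stand-in for illustration, not anything from the paper, and this classical computation is exactly what the quantum sampler avoids doing explicitly):

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)
n, r = 16, 3

# Random rank-r symmetric "Hamiltonian" H = B B^T (rank <= r by construction).
B = rng.standard_normal((n, r))
H = B @ B.T

# Gibbs state rho = exp(-H) / Tr(exp(-H)).
E = expm(-H)
rho = E / np.trace(E)

# Sanity checks: rho is a density matrix (trace 1, positive semidefinite).
eigs = np.linalg.eigvalsh(rho)
assert np.isclose(np.trace(rho), 1.0) and eigs.min() >= -1e-12
print(f"trace = {np.trace(rho):.6f}, min eigenvalue = {eigs.min():.2e}")
```

Forming exp(-H) densely costs roughly O(n^3); the point of the paper's Gibbs sampler is to prepare this state with only polylogarithmic dependence on n when H has low rank.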