This content will become publicly available on February 22, 2023.
Publication Date:
NSFPAR ID: 10326696
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
ISSN: 2159-5399
Sponsoring Org: National Science Foundation
More Like this

We consider the development of practical stochastic quasi-Newton methods, and in particular Kronecker-factored block-diagonal BFGS and L-BFGS methods, for training deep neural networks (DNNs). In DNN training, the number of variables and components of the gradient n is often of the order of tens of millions, and the Hessian has n^2 elements. Consequently, computing and storing a full n × n BFGS approximation, or storing a modest number of (step, change in gradient) vector pairs for use in an L-BFGS implementation, is out of the question. In our proposed methods, we approximate the Hessian by a block-diagonal matrix and use the structure of the gradient and Hessian to further approximate these blocks, each of which corresponds to a layer, as the Kronecker product of two much smaller matrices. This is analogous to the approach in KFAC, which computes a Kronecker-factored block-diagonal approximation to the Fisher matrix in a stochastic natural gradient method. Because of the indefinite and highly variable nature of the Hessian in a DNN, we also propose a new damping approach to keep the upper as well as the lower bounds of the BFGS and L-BFGS approximations bounded. In tests on autoencoder feed-forward network models with either…
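To illustrate why a Kronecker-factored block is tractable when a dense Hessian block is not, the sketch below (plain NumPy, with made-up layer sizes and illustrative SPD factors — this is not the paper's update rule) compares the storage of the two representations and applies the factored block to a vector without ever forming the n × n matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: a (20 x 30) weight matrix has n = 600 parameters,
# so the corresponding Hessian block would be n x n = 600 x 600.
out_dim, in_dim = 20, 30
n = out_dim * in_dim

# Illustrative Kronecker factors: the block is approximated as A ⊗ B
# with A (in x in) and B (out x out), both symmetric positive definite.
A = rng.standard_normal((in_dim, in_dim))
A = A @ A.T + in_dim * np.eye(in_dim)
B = rng.standard_normal((out_dim, out_dim))
B = B @ B.T + out_dim * np.eye(out_dim)

# Storage comparison: two small factors vs. one dense block.
dense_entries = n * n                      # 360 000 entries
factored_entries = in_dim**2 + out_dim**2  # 1 300 entries

# Apply the factored block without forming A ⊗ B, via the identity
# (A ⊗ B) vec(X) = vec(B X Aᵀ), where vec stacks columns.
v = rng.standard_normal(n)
X = v.reshape(out_dim, in_dim, order="F")
fast = (B @ X @ A.T).flatten(order="F")

# Dense reference (only feasible here because n is tiny by DNN standards).
slow = np.kron(A, B) @ v
assert np.allclose(fast, slow)
```

The same reshaping trick is what makes Kronecker-factored curvature methods practical: every product with the approximate Hessian block touches only the two small factors.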

We investigate the approximability of the following optimization problem. The input is an n × n matrix A = (A_ij) with real entries and an origin-symmetric convex body K ⊂ ℝ^n that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑_{i=1}^n ∑_{j=1}^n A_ij x_i x_j = ⟨x, Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, the Grothendieck/non-commutative Grothendieck inequalities, small-set expansion, and more. While the literature has studied these special cases using case-specific reasoning, here we develop a general methodology for the treatment of the approximability and inapproximability aspects of these questions. The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polynomial-time constant-approximability necessitates that K has a type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle. However, the link that we establish between optimization and geometry of Banach spaces allows us…
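For concreteness, here is a toy instance of the problem (my own example, not from the paper): take K to be the cube [−1, 1]^3 and A symmetric with zero diagonal. The form ⟨x, Ax⟩ is then linear in each coordinate, so its maximum over K is attained at a vertex x ∈ {−1, +1}^3, which is the special case that connects this family to max-cut; for such a small n, brute force over the vertices is exact.

```python
import numpy as np
from itertools import product

# Symmetric matrix with zero diagonal: <x, A x> is linear in each x_i,
# so the maximum over the cube [-1, 1]^3 sits at a {-1, +1}^3 vertex.
A = np.array([[ 0.0, 1.0, -1.0],
              [ 1.0, 0.0,  1.0],
              [-1.0, 1.0,  0.0]])

best_val, best_x = -np.inf, None
for signs in product([-1.0, 1.0], repeat=3):
    x = np.array(signs)
    val = x @ A @ x          # the quadratic form <x, A x>
    if val > best_val:
        best_val, best_x = val, x
```

The hardness results in the abstract concern exactly this question when n is large and K is a general origin-symmetric body given only by a membership oracle, where such enumeration is hopeless.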

We develop effective approximation methods for unitary matrices. In our formulation, a unitary matrix is represented as a product of rotations in two-dimensional subspaces, so-called Givens rotations. Instead of the quadratic dimension dependence incurred when applying a dense matrix, applying such an approximation scales with the number of factors, each of which can be implemented efficiently. Consequently, in settings where an approximation is computed once and then applied many times, such an effective representation becomes advantageous. Although efficient Givens factorizations are not possible for generic unitary operators, we show that minimizing a sparsity-inducing objective with a coordinate descent algorithm on the unitary group yields good factorizations for structured matrices. Canonical applications of such a setup are orthogonal basis transforms. We demonstrate that our methods improve the approximate representation of the graph Fourier transform, the matrix obtained when diagonalizing a graph Laplacian.
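A minimal sketch of the payoff (generic NumPy, not the authors' coordinate-descent factorization; the rotation list here is random placeholder data): applying k Givens rotations to a vector costs O(k), versus O(n²) for a dense matrix–vector product, and any product of Givens rotations is automatically orthogonal, so norms are preserved exactly.

```python
import numpy as np

def apply_givens(v, i, j, theta):
    """Rotate coordinates (i, j) of v by angle theta — O(1) work per factor."""
    c, s = np.cos(theta), np.sin(theta)
    vi, vj = v[i], v[j]
    v[i] = c * vi - s * vj
    v[j] = s * vi + c * vj
    return v

rng = np.random.default_rng(1)
n = 8

# A hypothetical factorization: k = 12 rotations (i, j, theta) with i != j,
# standing in for the output of a structured-matrix factorization.
factors = []
for _ in range(12):
    i = int(rng.integers(n))
    j = int((i + 1 + rng.integers(n - 1)) % n)  # guarantees j != i
    factors.append((i, j, float(rng.uniform(0.0, 2 * np.pi))))

v = rng.standard_normal(n)
w = v.copy()
for i, j, theta in factors:
    apply_givens(w, i, j, theta)

# Orthogonality of the implicit product: the Euclidean norm is unchanged.
assert np.isclose(np.linalg.norm(w), np.linalg.norm(v))
```

When the same approximate transform is applied to many vectors — the "compute once, apply many times" setting in the abstract — this per-application saving is what makes the representation worthwhile.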

In this paper, we propose a new class of operator factorization methods to discretize the integral fractional Laplacian $(-\Delta)^{\frac{\alpha}{2}}$ for $\alpha \in (0, 2)$. One main advantage is that our method can easily increase numerical accuracy by using high-degree Lagrange basis functions, while its scheme structure and computer implementation remain unchanged. Moreover, it results in a symmetric (multilevel) Toeplitz differentiation matrix, enabling efficient computation via the fast Fourier transform. If constant or linear basis functions are used, our method has an accuracy of ${\mathcal O}(h^2)$, while ${\mathcal O}(h^4)$ for quadratic basis functions, with $h$ a small mesh size. This accuracy can be achieved for any $\alpha \in (0, 2)$ and can be further increased if higher-degree basis functions are chosen. Numerical experiments are provided to approximate the fractional Laplacian and solve the fractional Poisson problems. They show that if the solution of the fractional Poisson problem satisfies $u \in C^{m, l}(\bar{\Omega})$ for $m \in {\mathbb N}$ and $0 < l < 1$, our method has an accuracy of ${\mathcal O}(h^{\min\{m+l, \, 2\}})$ for constant and linear basis functions, while ${\mathcal O}(h^{\min\{m+l, \, 4\}})$ for quadratic basis functions. Additionally, our method can be readily applied to approximate the generalized fractional Laplacians with symmetric kernel function, and a numerical study on the tempered fractional Poisson problem demonstrates its efficiency.
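The FFT-based computation mentioned above can be sketched as follows: a symmetric Toeplitz matrix–vector product via circulant embedding, which is the generic technique behind the efficiency claim (the entries below are random placeholders, not the paper's differentiation matrix).

```python
import numpy as np

def toeplitz_matvec(c, x):
    """Multiply the symmetric Toeplitz matrix with first column c by x
    in O(n log n) by embedding it in a 2n x 2n circulant and using the FFT."""
    n = len(c)
    # First column of the circulant embedding: [c0 .. c_{n-1}, 0, c_{n-1} .. c1].
    col = np.concatenate([c, [0.0], c[-1:0:-1]])
    # Circulant matvec = circular convolution = pointwise product in Fourier space;
    # fft(x, 2n) zero-pads x to length 2n.
    y = np.fft.ifft(np.fft.fft(col) * np.fft.fft(x, 2 * n))
    return y[:n].real

rng = np.random.default_rng(2)
n = 6
c = rng.standard_normal(n)  # placeholder stencil, not the paper's entries
x = rng.standard_normal(n)

# Dense reference: T[i, j] = c[|i - j|] for a symmetric Toeplitz matrix.
T = np.array([[c[abs(i - j)] for j in range(n)] for i in range(n)])
assert np.allclose(toeplitz_matvec(c, x), T @ x)
```

For a multilevel Toeplitz matrix, as in the abstract, the same embedding is applied dimension by dimension with a multidimensional FFT, keeping the overall cost quasilinear in the number of grid points.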