NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Stable tensor neural networks for efficient deep learning

https://doi.org/10.3389/fdata.2024.1363978

Newman, Elizabeth; Horesh, Lior; Avron, Haim; Kilmer, Misha E (May 2024, Frontiers in Big Data)
Zhang, Yanqing (Ed.)
Learning from complex, multidimensional data has become central to computational mathematics, and among the most successful high-dimensional function approximators are deep neural networks (DNNs). Training DNNs is posed as an optimization problem to learn network weights or parameters that well-approximate a mapping from input to target data. Multiway data or tensors arise naturally in myriad ways in deep learning, in particular as input data and as high-dimensional weights and features extracted by the network, with the latter often being a bottleneck in terms of speed and memory. In this work, we leverage tensor representations and processing to efficiently parameterize DNNs when learning from high-dimensional data. We propose tensor neural networks (t-NNs), a natural extension of traditional fully-connected networks, that can be trained efficiently in a reduced, yet more powerful parameter space. Our t-NNs are built upon matrix-mimetic tensor-tensor products, which retain algebraic properties of matrix multiplication while capturing high-dimensional correlations. Mimeticity enables t-NNs to inherit desirable properties of modern DNN architectures. We exemplify this by extending recent work on stable neural networks, which interpret DNNs as discretizations of differential equations, to our multidimensional framework. We provide empirical evidence of the parametric advantages of t-NNs on dimensionality reduction using autoencoders and classification using fully-connected and stable variants on benchmark imaging datasets MNIST and CIFAR-10.
more » « less
Full Text Available
slimTrain---A Stochastic Approximation Method for Training Separable Deep Neural Networks

https://doi.org/10.1137/21M1452512

Newman, Elizabeth; Chung, Julianne; Chung, Matthias; Ruthotto, Lars (August 2022, SIAM Journal on Scientific Computing)

Full Text Available
hessQuik: Fast Hessian computation of compositefunctions

https://doi.org/10.21105/joss.04171

Newman, Elizabeth; Ruthotto, Lars (April 2022, Journal of Open Source Software)

Full Text Available
slimTrain -- A Stochastic Approximation Method for Training Separable Deep Neural Networks

Newman, Elizabeth; Chung, Julianne; Chung, Matthias; Ruthotto, Lars (July 2022, SIAM journal on scientific computing)

Deep neural networks (DNNs) have shown their success as high-dimensional function approximators in many applications; however, training DNNs can be challenging in general. DNN training is commonly phrased as a stochastic optimization problem whose challenges include non-convexity, non-smoothness, insufficient regularization, and complicated data distributions. Hence, the performance of DNNs on a given task depends crucially on tuning hyperparameters, especially learning rates and regularization parameters. In the absence of theoretical guidelines or prior experience on similar tasks, this requires solving many training problems, which can be time-consuming and demanding on computational resources. This can limit the applicability of DNNs to problems with non-standard, complex, and scarce datasets, e.g., those arising in many scientific applications. To remedy the challenges of DNN training, we propose slimTrain, a stochastic optimization method for training DNNs with reduced sensitivity to the choice hyperparameters and fast initial convergence. The central idea of slimTrain is to exploit the separability inherent in many DNN architectures; that is, we separate the DNN into a nonlinear feature extractor followed by a linear model. This separability allows us to leverage recent advances made for solving large-scale, linear, ill-posed inverse problems. Crucially, for the linear weights, slimTrain does not require a learning rate and automatically adapts the regularization parameter. Since our method operates on mini-batches, its computational overhead per iteration is modest. In our numerical experiments, slimTrain outperforms existing DNN training methods with the recommended hyperparameter settings and reduces the sensitivity of DNN training to the remaining hyperparameters.
more » « less
Full Text Available
Tensor-tensor algebra for optimal representation and compression of multiway data

https://doi.org/10.1073/pnas.2015851118

Kilmer, Misha E.; Horesh, Lior; Avron, Haim; Newman, Elizabeth (July 2021, Proceedings of the National Academy of Sciences)

With the advent of machine learning and its overarching pervasiveness it is imperative to devise ways to represent large datasets efficiently while distilling intrinsic features necessary for subsequent analysis. The primary workhorse used in data dimensionality reduction and feature extraction has been the matrix singular value decomposition (SVD), which presupposes that data have been arranged in matrix format. A primary goal in this study is to show that high-dimensional datasets are more compressible when treated as tensors (i.e., multiway arrays) and compressed via tensor-SVDs under the tensor-tensor product constructs and its generalizations. We begin by proving Eckart–Young optimality results for families of tensor-SVDs under two different truncation strategies. Since such optimality properties can be proven in both matrix and tensor-based algebras, a fundamental question arises: Does the tensor construct subsume the matrix construct in terms of representation efficiency? The answer is positive, as proven by showing that a tensor-tensor representation of an equal dimensional spanning space can be superior to its matrix counterpart. We then use these optimality results to investigate how the compressed representation provided by the truncated tensor SVD is related both theoretically and empirically to its two closest tensor-based analogs, the truncated high-order SVD and the truncated tensor-train SVD.
more » « less
Train Like a (Var)Pro: Efficient Training of Neural Networks with Variable Projection

https://doi.org/10.1137/20M1359511

Newman, Elizabeth; Ruthotto, Lars; Hart, Joseph; van Bloemen Waanders, Bart (January 2021, SIAM Journal on Mathematics of Data Science)

Full Text Available

Search for: All records