-
We introduce the CRONOS algorithm for convex optimization of two-layer neural networks. CRONOS is the first algorithm capable of scaling to high-dimensional datasets such as ImageNet, which are ubiquitous in modern deep learning. This significantly improves upon prior work, which has been restricted to downsampled versions of MNIST and CIFAR-10. Taking CRONOS as a primitive, we then develop CRONOS-AM, which combines CRONOS with alternating minimization to train multilayer networks with arbitrary architectures. Our theoretical analysis proves that CRONOS converges to the global minimum of the convex reformulation under mild assumptions. In addition, we validate the efficacy of CRONOS and CRONOS-AM through extensive large-scale numerical experiments with GPU acceleration in JAX. Our results show that CRONOS-AM can obtain comparable or better validation accuracy than predominant tuned deep learning optimizers on vision and language tasks with benchmark datasets such as ImageNet and IMDb. To the best of our knowledge, CRONOS is the first algorithm that utilizes the convex reformulation to enhance performance on large-scale learning tasks.
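To make the convex reformulation concrete, the sketch below illustrates the gated-ReLU-style convex program that this line of work builds on: sample gate vectors, form per-gate activation masks, and solve a group-lasso problem over the masked data. This is a minimal illustration, not the CRONOS solver itself; the random gates, proximal gradient solver, and hyperparameters are assumptions chosen for brevity.

```python
# Minimal JAX sketch of a gated-ReLU convex reformulation of a two-layer
# network. NOT the CRONOS algorithm: CRONOS's solver is replaced here by
# plain proximal gradient descent with group soft-thresholding.
import jax
import jax.numpy as jnp

def make_masks(X, num_gates, key):
    """Sample random gate vectors g_p and form activation masks D_p = 1[X g_p >= 0]."""
    G = jax.random.normal(key, (X.shape[1], num_gates))
    return (X @ G >= 0).astype(X.dtype)            # (n, P)

def fit_convex_gated_relu(X, y, masks, lam=1e-3, lr=1e-3, iters=2000):
    """Solve min_V 0.5*||sum_p diag(D_p) X v_p - y||^2 + lam * sum_p ||v_p||_2."""
    P, d = masks.shape[1], X.shape[1]
    V = jnp.zeros((P, d))

    def predict(V):
        # sum_p (D_p * X) v_p, computed for all gates at once
        return jnp.einsum('np,nd,pd->n', masks, X, V)

    loss = lambda V: 0.5 * jnp.sum((predict(V) - y) ** 2)
    grad = jax.grad(loss)

    @jax.jit
    def step(V):
        V = V - lr * grad(V)                       # gradient step on the smooth part
        norms = jnp.linalg.norm(V, axis=1, keepdims=True)
        shrink = jnp.maximum(0.0, 1.0 - lr * lam / jnp.maximum(norms, 1e-12))
        return V * shrink                          # group soft-threshold (prox of group lasso)

    for _ in range(iters):
        V = step(V)
    return V

# Tiny usage example on synthetic data
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (200, 10))
y = jnp.sin(X[:, 0])
masks = make_masks(X, num_gates=32, key=jax.random.PRNGKey(1))
V = fit_convex_gated_relu(X, y, masks)
print(jnp.mean((jnp.einsum('np,nd,pd->n', masks, X, V) - y) ** 2))
```

Because the objective is convex in V, any solver that converges reaches the global minimum; CRONOS's contribution is making this tractable at ImageNet scale.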
-
Matrices are exceptionally useful in various fields of study as they provide a convenient framework to organize and manipulate data in a structured manner. However, modern matrices can involve billions of elements, making their storage and processing quite demanding in terms of computational resources and memory usage. Although prohibitively large, such matrices are often approximately low rank. We propose an algorithm that exploits this structure to obtain a low rank decomposition of any matrix A as A ≈ LR, where L and R are the low rank factors. The total number of elements in L and R can be significantly less than that in A. Furthermore, the entries of L and R are quantized to low precision formats, compressing A by giving us a low rank and low precision factorization. Our algorithm first computes an approximate basis of the range space of A by randomly sketching its columns, followed by a quantization of the vectors constituting this basis. It then computes approximate projections of the columns of A onto this quantized basis. We derive upper bounds on the approximation error of our algorithm and analyze the impact of the target rank and quantization bit-budget. The tradeoff between compression ratio and approximation accuracy allows for flexibility in choosing these parameters based on specific application requirements. We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LLaMA-7B. Our results illustrate that we can achieve compression ratios as aggressive as one bit per matrix coordinate, all while surpassing or maintaining the performance of traditional compression techniques.
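The three steps described above (randomly sketch the columns to get an approximate range basis, quantize that basis, then project the columns of A onto the quantized basis) map directly onto code. The sketch below is a minimal JAX illustration assuming a Gaussian sketching matrix and a simple uniform scalar quantizer; the paper's exact quantization scheme and error bounds are not reproduced here.

```python
# Minimal JAX sketch of a randomized low-rank, low-precision factorization.
import jax
import jax.numpy as jnp

def quantize(M, bits):
    """Uniform scalar quantization to 2**bits levels, dequantized back to float."""
    lo, hi = M.min(), M.max()
    scale = (hi - lo) / (2 ** bits - 1)
    q = jnp.round((M - lo) / scale)
    return q * scale + lo

def lowrank_lowprec(A, rank, bits_L=4, bits_R=4, key=jax.random.PRNGKey(0)):
    """Return quantized factors (L, R) with A approximately equal to L @ R."""
    n, d = A.shape
    # 1) Randomized sketch of the column space: Y = A @ Omega, then orthonormalize
    Omega = jax.random.normal(key, (d, rank))
    Q, _ = jnp.linalg.qr(A @ Omega)                # (n, rank) approximate range basis
    # 2) Quantize the basis vectors
    L = quantize(Q, bits_L)
    # 3) Project the columns of A onto the quantized basis (least squares)
    R, _, _, _ = jnp.linalg.lstsq(L, A)
    return L, quantize(R, bits_R)

# Tiny usage example on a synthetic approximately low-rank matrix
k1, k2 = jax.random.split(jax.random.PRNGKey(1))
A = jax.random.normal(k1, (500, 20)) @ jax.random.normal(k2, (20, 300))
L, R = lowrank_lowprec(A, rank=20)
print(jnp.linalg.norm(A - L @ R) / jnp.linalg.norm(A))   # relative approximation error
```

Storing L (n x rank) and R (rank x d) at a few bits per entry is what drives the compression ratio reported above; the target rank and bit-budget trade accuracy for size.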
-
In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via ℓ1 regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.
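To give a flavor of what a "wedge product feature" looks like in code (this is an illustration, not the paper's exact construction), the sketch below treats each (d-1)-subset of training samples as defining a linear feature x -> det([x, x_{i_1}, ..., x_{i_{d-1}}]), i.e. the signed volume of the parallelotope generated by x and those samples, and fits an ℓ1-regularized model over these features with a simple ISTA solver. The feature enumeration, loss, and solver are illustrative assumptions.

```python
# Illustrative JAX sketch of signed-volume ("wedge product") features plus an
# l1-regularized convex fit. A simplification for intuition, not the paper's method.
from itertools import combinations
import jax
import jax.numpy as jnp

def wedge_features(X, anchors, subsets):
    """Signed volumes det([x, a_{i_1}, ..., a_{i_{d-1}}]) for each row x of X."""
    d = X.shape[1]
    feats = []
    for S in subsets:
        A = anchors[jnp.array(S)]                                    # (d-1, d)
        stacked = jnp.concatenate(
            [X[:, None, :], jnp.broadcast_to(A, (X.shape[0], d - 1, d))], axis=1)
        feats.append(jnp.linalg.det(stacked))                        # (n,) signed volumes
    return jnp.stack(feats, axis=1)                                  # (n, num_subsets)

def lasso_ista(F, y, lam=1e-2, iters=3000):
    """Minimize 0.5*||F w - y||^2 + lam*||w||_1 with ISTA."""
    lr = 1.0 / (jnp.linalg.norm(F, ord=2) ** 2)                      # 1 / Lipschitz constant
    w = jnp.zeros(F.shape[1])
    @jax.jit
    def step(w):
        w = w - lr * (F.T @ (F @ w - y))
        return jnp.sign(w) * jnp.maximum(jnp.abs(w) - lr * lam, 0.0)  # soft-threshold
    for _ in range(iters):
        w = step(w)
    return w

# Tiny 2D usage example: each feature is a signed parallelogram area
key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (100, 2))
y = jnp.sign(X[:, 0] - 0.5 * X[:, 1])
subsets = list(combinations(range(10), 1))        # (d-1) = 1 anchor per feature in 2D
F = wedge_features(X, X[:10], subsets)
w = lasso_ista(F, y)
print(jnp.mean(jnp.sign(F @ w) == y))             # training accuracy of the sparse fit
```

The ℓ1 penalty zeroes out most features, so only a small subset of sample combinations survives, mirroring the sparse selection of relevant wedge product features described above.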
-
We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training problem can be represented as optima of sub-sampled convex programs, our work provides a general expression for all critical points of the non-convex objective. We then leverage our results to provide an optimal pruning algorithm for computing minimal networks, establish conditions for the regularization path of ReLU networks to be continuous, and develop sensitivity results for minimal ReLU networks.
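A hedged sketch of the pruning idea: solutions of the convex parameterization are block-sparse, and a small ReLU network can be read off by discarding the zero blocks and rescaling the rest. The rescaling map and tolerance below follow the standard convex-reformulation convention and are an illustrative simplification, not the paper's full optimal pruning algorithm.

```python
# Minimal JAX sketch: read a small two-layer ReLU network off a block-sparse
# solution V of the convex parameterization by dropping zero blocks.
import jax.numpy as jnp

def prune_to_minimal_network(V, tol=1e-8):
    """Keep nonzero blocks of V (P, d) and map each block v_p to a neuron
    (u_p, alpha_p) = (v_p / ||v_p||^{1/2}, ||v_p||^{1/2})."""
    norms = jnp.linalg.norm(V, axis=1)
    keep = norms > tol
    V_active, n_active = V[keep], norms[keep]
    W1 = V_active / jnp.sqrt(n_active)[:, None]   # first-layer weights (m, d)
    w2 = jnp.sqrt(n_active)                       # second-layer weights (m,)
    return W1, w2

def relu_network(X, W1, w2):
    """Two-layer ReLU network prediction with the pruned weights."""
    return jnp.maximum(X @ W1.T, 0.0) @ w2

# Usage: V would come from a convex solver; a toy block-sparse solution stands in here.
V = jnp.zeros((50, 10)).at[jnp.array([3, 17, 42])].set(1.0)
W1, w2 = prune_to_minimal_network(V)
print(W1.shape, w2.shape)                         # (3, 10) (3,): a minimal 3-neuron network
print(relu_network(jnp.ones((5, 10)), W1, w2))
```

Because the optimal set of the convex program is polyhedral, many optima share the same sparsity pattern, which is what makes a principled notion of a "minimal" network well defined.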