NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Spectral Adapter: Fine-Tuning in Spectral Space

Fangzhao, Zhang; Pilanci, Mert (December 2024, Neural Information Processing Systems (Neurips) 2024)

Full Text Available
Compressing Large Language Models using Low Rank and Low Precision Decomposition

Saha, Rajarshi; Sagan, Naomi; Srivastava, Naomi; Goldsmith, Andrea; Pilanci, Mert (December 2024, Neural Information Processing Systems (Neurips) 2024)

Full Text Available
Sketching the Krylov subspace: faster computation of the entire ridge regularization path

https://doi.org/10.1007/s11227-023-05309-w

Wang, Yifei; Pilanci, Mert (January 2023, The Journal of Supercomputing)

We propose a fast algorithm for computing the entire ridge regression regularization path in nearly linear time. Our method constructs a basis on which the solution of ridge regression can be computed instantly for any value of the regularization parameter. Consequently, linear models can be tuned via cross-validation or other risk estimation strategies with substantially better efciency. The algorithm is based on iteratively sketching the Krylov subspace with a binomial decomposition over the regularization path. We provide a convergence analysis with various sketching matrices and show that it improves the state-of-the-art computational complexity. We also provide a technique to adaptively estimate the sketching dimension. This algorithm works for both the over-determined and under-determined problems. We also provide an extension for matrix-valued ridge regression. The numerical results on real medium and large-scale ridge regression tasks illustrate the efectiveness of the proposed method compared to standard baselines which require super-linear computational time.
more » « less
Full Text Available
Optimal Sets and Solution Paths of ReLU Networks

Mishkin, Aaron; Pilanci, Mert (January 2023, International Conference on Machine Learning)

We develop an analytical framework to characterize the set of optimal ReLU neural networks by reformulating the non-convex training problem as a convex program. We show that the global optima of the convex parameterization are given by a polyhedral set and then extend this characterization to the optimal set of the non-convex training objective. Since all stationary points of the ReLU training problem can be represented as optima of sub-sampled convex programs, our work provides a general expression for all critical points of the non-convex objective. We then leverage our results to provide an optimal pruning algorithm for computing minimal networks, establish conditions for the regularization path of ReLU networks to be continuous, and develop sensitivity results for minimal ReLU networks.
more » « less
Full Text Available
Optimal Shrinkage for Distributed Second-Order Optimization

Zhang, Fangzhao; Pilanci, Mert (January 2023, International Conference on Machine Learning)

In this work, we address the problem of Hessian inversion bias in distributed second-order optimization algorithms. We introduce a novel shrinkage-based estimator for the resolvent of gram matrices that is asymptotically unbiased, and characterize its non-asymptotic convergence rate in the isotropic case. We apply this estimator to bias correction of Newton steps in distributed second-order optimization algorithms, as well as randomized sketching based methods. We examine the bias present in the naive averaging-based distributed Newton’s method using analytical expressions and contrast it with our proposed biasfree approach. Our approach leads to significant improvements in convergence rate compared to standard baselines and recent proposals, as shown through experiments on both real and synthetic datasets.
more » « less
Full Text Available
The Convex Geometry of Backpropagation: Neural Network Gradient Flows Converge to Extreme Points of the Dual Convex Program

Wang, Y.; Pilanci, M. (October 2022, International Conference on Learning Representations)

Full Text Available
Demystifying Batch Normalization in ReLU Networks: Equivalent Convex Optimization Models and Implicit Regularization

Ergen, T.; Sahiner, A.; Ozturkler, B.; Pauly, J.; Mardani, M.; Pilanci, M. (October 2022, International Conference on Learning Representations)

Full Text Available
The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions

Wang, Y.; Lacotte, J.; Pilanci M. (October 2022, International Conference on Learning Representations)

Full Text Available
Hidden Convexity of Wasserstein GANs: Interpretable Generative Models with Closed-Form Solutions

Sahiner, A.; Ergen, T.; Ozturkler, B.; Bartan, B.; Pauly, J.; Mardani, M.; Pilanci, M. (October 2022, International Conference on Learning Representations)

Full Text Available
Unraveling Attention via Convex Duality: Analysis and Interpretations of Vision Transformers

Sahiner, A.; Ergen, T.; Ozturkler, B.; Pauly, J.; Mardani, M.; Pilanci, M. (April 2022, International Conference on Machine Learning)

Full Text Available

Search for: All records