NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Ghosh, Avrajit; Kwon, Soo Min; Wang, Rongrong; Ravishankar, Saiprasad; Qu, Qing (March 2025, International Conference on Learning Representations)

Deep neural networks trained using gradient descent with a fixed learning rate eta often operate in the regime of ``edge of stability'' (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold 2/eta. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) within the deep matrix factorization loss beyond EOS. For DLNs, loss oscillations beyond EOS follow a period-doubling route to chaos. We theoretically analyze the regime of the 2-period orbit and show that the loss oscillations occur within a small subspace, with the dimension of the subspace precisely characterized by the learning rate. The crux of our analysis lies in showing that the symmetry-induced conservation law for gradient flow, defined as the balancing gap among the singular values across layers, breaks at EOS and decays monotonically to zero. Overall, our results contribute to explaining two key phenomena in deep networks: (i) shallow models and simple tasks do not always exhibit EOS; and (ii) oscillations occur within top features}. We present experiments to support our theory, along with examples demonstrating how these phenomena occur in nonlinear networks and how they differ from those which have benign landscapes such as in DLNs.
more » « less
Full Text Available
Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Yaras, Can; Chen, Siyi; Wang, Peng; Qu, Qing (March 2025, The Second Conference on Parsimony and Learning)

Full Text Available
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Ghosh, Avrajit; Kwon, Soo Min; Wang, Rongrong; Ravishankar, Saiprasad; Qu, Qing (March 2025, The Thirteenth International Conference on Learning Representations)

Full Text Available
Exploring Low-Dimensional Subspace in Diffusion Models for Controllable Image Editing

Chen, Siyi; Zhang, Huijie; Guo, Minzhe; Lu, Yifu; Wang, Peng; Qu, Qing (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
BLAST: Block-Level Adaptive Structured Matrices for Efficient Deep Neural Network Inference

Lee, Changwoo; Kwon, Soo Min; Qu, Qing; Kim, Hun-Seok (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure

Li, Xiang; Dai, Yixiang; Qu, Qing (December 2024, Advances in Neural Information Processing Systems)

Full Text Available
Sim2Real in reconstructive spectroscopy: Deep learning with augmented device-informed data simulation

https://doi.org/10.1063/5.0209339

Chen, Jiyi; Li, Pengyu; Wang, Yutong; Ku, Pei-Cheng; Qu, Qing (September 2024, APL Machine Learning)

This work proposes a deep learning (DL)-based framework, namely Sim2Real, for spectral signal reconstruction in reconstructive spectroscopy, focusing on efficient data sampling and fast inference time. The work focuses on the challenge of reconstructing real-world spectral signals in an extreme setting where only device-informed simulated data are available for training. Such device-informed simulated data are much easier to collect than real-world data but exhibit large distribution shifts from their real-world counterparts. To leverage such simulated data effectively, a hierarchical data augmentation strategy is introduced to mitigate the adverse effects of this domain shift, and a corresponding neural network for the spectral signal reconstruction with our augmented data is designed. Experiments using a real dataset measured from our spectrometer device demonstrate that Sim2Real achieves significant speed-up during the inference while attaining on-par performance with the state-of-the-art optimization-based methods.
more » « less
Full Text Available
Compressible Dynamics in Deep Overparameterized Low-Rank Learning & Adaptation

Yaras, Can; Wang, Peng; Balzano, Laura; Qu, Qing (June 2024, International Conference on Machine Learning)

While overparameterization in machine learning models offers great benefits in terms of optimization and generalization, it also leads to increased computational requirements as model sizes grow. In this work, we show that by leveraging the inherent low-dimensional structures of data and compressible dynamics within the model parameters, we can reap the benefits of overparameterization without the computational burdens. In practice, we demonstrate the effectiveness of this approach for deep low-rank matrix completion as well as fine-tuning language models. Our approach is grounded in theoretical findings for deep overparameterized low-rank matrix recovery, where we show that the learning dynamics of each weight matrix are confined to an invariant low-dimensional subspace. Consequently, we can construct and train compact, highly compressed factorizations possessing the same benefits as their overparameterized counterparts. In the context of deep matrix completion, our technique substantially improves training efficiency while retaining the advantages of overparameterization. For language model fine-tuning, we propose a method called "Deep LoRA", which improves the existing low-rank adaptation (LoRA) technique, leading to reduced overfitting and a simplified hyperparameter setup, while maintaining comparable efficiency. We validate the effectiveness of Deep LoRA on natural language tasks, particularly when fine-tuning with limited data.
more » « less
Full Text Available
Neural Collapse in Multi-label Learning with Pick-all-label Loss

Li, Pengyu; Li, Xiao; Wang, Yutong; Qu, Qing (June 2024, International Conference on Machine Learning)

We study deep neural networks for the multi-label classification (MLab) task through the lens of neural collapse (NC). Previous works have been restricted to the multi-class classification setting and discovered a prevalent NC phenomenon comprising of the following properties for the last-layer features: (i) the variability of features within every class collapses to zero, (ii) the set of feature means form an equi-angular tight frame (ETF), and (iii) the last layer classifiers collapse to the feature mean upon some scaling. We generalize the study to multi-label learning, and prove for the first time that a generalized NC phenomenon holds with the "pick-all-label'' formulation, which we term as MLab NC. While the ETF geometry remains consistent for features with a single label, multi-label scenarios introduce a unique combinatorial aspect we term the "tag-wise average" property, where the means of features with multiple labels are the scaled averages of means for single-label instances. Theoretically, under proper assumptions on the features, we establish that the only global optimizer of the pick-all-label cross-entropy loss satisfy the multi-label NC. In practice, we demonstrate that our findings can lead to better test performance with more efficient training techniques for MLab learning.
more » « less
Full Text Available
A Global Geometric Analysis of Maximal Coding Rate Reduction

Wang, Peng; Liu, Huikang; Pai, Druv; Yu, Yaodong; Zhu, Zhihui; Qu, Qing; Ma, Yi (June 2024, International Conference on Machine Learning)

The maximal coding rate reduction (MCR2) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape has not been studied. In this work, we give a complete characterization of the properties of all its local and global optima, as well as other types of critical points. Specifically, we show that each (local or global) maximizer of the MCR2 problem corresponds to a low-dimensional, discriminative, and diverse representation, and furthermore, each critical point of the objective is either a local maximizer or a strict saddle point. Such a favorable landscape makes MCR2 a natural choice of objective for learning diverse and discriminative representations via first-order optimization methods. To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets.
more » « less
Full Text Available

« Prev Next »

Search for: All records