NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

A Closer Look at Model Collapse: From a Generalization-to-Memorization Perspective

Shi, Lianghe; Wu, Meng; Zhang, Huijie; Zhang, Zekai; Tao, Molei; Qu, Qing (September 2025, NeurIPS)

Free, publicly-accessible full text available September 18, 2026
SITCOM: Step-wise Triple-Consistent Diffusion Sampling For Inverse Problems

Alkhouri, Ismail; Liang, Shijun; Huang, Cheng-Han; Dai, Jimmy; Qu, Qing; Ravishankar, Saiprasad; Wang, Rongrong (July 2025, International Conference on Machine Learning)

Diffusion models (DMs) are a class of generative models that allow sampling from a distribution learned over a training set. When DMs are applied to inverse problems, the reverse sampling steps are modified to approximately sample from a measurement-conditioned distribution. However, these modifications may be unsuitable for certain settings (e.g., the presence of measurement noise) and for non-linear tasks, as they often struggle to correct errors from earlier steps and generally require a large number of optimization and/or sampling steps. To address these challenges, we state three conditions for achieving measurement-consistent diffusion trajectories. Building on these conditions, we propose a new optimization-based sampling method that not only enforces the standard data manifold measurement consistency and forward diffusion consistency seen in previous studies, but also incorporates our proposed step-wise, network-regularized backward diffusion consistency, which maintains a diffusion trajectory by optimizing over the input of the pre-trained model at every sampling step. By enforcing these conditions (implicitly or explicitly), our sampler requires significantly fewer reverse steps. We therefore refer to our method as Step-wise Triple-Consistent Sampling (SITCOM). Across several linear and non-linear tasks (with natural and medical images), our experiments demonstrate that SITCOM achieves competitive or superior results compared with state-of-the-art (SOTA) baselines in terms of standard similarity metrics and run-time.
Free, publicly-accessible full text available July 14, 2026
Attention-Only Transformers via Unrolled Subspace Denoising

Wang, Peng; Lu, Yifu; Yu, Yaodong; Pai, Druv; Qu, Qing; Ma, Yi (May 2025, International Conference on Machine Learning)

Free, publicly-accessible full text available May 31, 2026
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Ghosh, Avrajit; Kwon, Soo Min; Wang, Rongrong; Ravishankar, Saiprasad; Qu, Qing (May 2025, International Conference on Learning Representations)

Free, publicly-accessible full text available May 1, 2026
Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Yaras, Can; Chen, Siyi; Wang, Peng; Qu, Qing (March 2025, The Second Conference on Parsimony and Learning)

Free, publicly-accessible full text available March 7, 2026
Explaining and Mitigating the Modality Gap in Contrastive Multimodal Learning

Yaras, Can; Chen, Siyi; Wang, Peng; Qu, Qing (March 2025, Second Conference on Parsimony and Learning (CPAL 2025))

Multimodal learning has recently gained significant popularity, demonstrating impressive performance across various zero-shot classification tasks and a range of perceptive and generative applications. Models such as Contrastive Language–Image Pretraining (CLIP) are designed to bridge different modalities, such as images and text, by learning a shared representation space through contrastive learning. Despite their success, the working mechanisms of multimodal learning remain poorly understood. Notably, these models often exhibit a modality gap, where different modalities occupy distinct regions within the shared representation space. In this work, we conduct an in-depth analysis of how the modality gap emerges by characterizing the gradient flow learning dynamics. Specifically, we identify the critical roles of mismatched data pairs and a learnable temperature parameter in causing and perpetuating the modality gap during training. We further validate our theoretical insights through experiments on practical CLIP models. These findings provide principled guidance for mitigating the modality gap, including strategies such as appropriate temperature scheduling and modality swapping. Additionally, we demonstrate that closing the modality gap leads to improved performance on tasks such as image-text retrieval.
Free, publicly-accessible full text available March 7, 2026
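As a rough illustration of the quantities this abstract discusses, the following is a minimal NumPy sketch, not the authors' code, of a CLIP-style symmetric contrastive loss with a temperature parameter, together with one common way to quantify the modality gap: the distance between the centroids of the L2-normalized image and text embeddings. The function names `clip_loss` and `modality_gap` are hypothetical, the temperature is passed as a fixed argument rather than learned, and the centroid-distance metric is an assumption, not necessarily the measure used in the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Project each row onto the unit sphere, as CLIP does before computing similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def _mean_diagonal_xent(logits: np.ndarray) -> float:
    """Cross-entropy where row i's correct class is column i (the matched pair)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def clip_loss(img: np.ndarray, txt: np.ndarray, temperature: float) -> float:
    """Symmetric InfoNCE loss over a batch of n paired (image, text) embeddings.

    img, txt: arrays of shape (n, d); row i of img is paired with row i of txt.
    A smaller temperature sharpens the softmax over cosine similarities.
    """
    sims = l2_normalize(img) @ l2_normalize(txt).T  # (n, n) cosine similarities
    logits = sims / temperature
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (_mean_diagonal_xent(logits) + _mean_diagonal_xent(logits.T))


def modality_gap(img: np.ndarray, txt: np.ndarray) -> float:
    """Euclidean distance between the centroids of the normalized embeddings (hypothetical metric choice)."""
    return float(np.linalg.norm(l2_normalize(img).mean(axis=0) - l2_normalize(txt).mean(axis=0)))


# Example: identical embeddings have zero gap; adding a shared offset to the
# text side pushes the two modalities into different regions of the sphere.
img = np.eye(4)
print(modality_gap(img, img))            # 0.0
print(modality_gap(img, img + 1.0) > 0)  # True
```

In this toy setting the gap is created by hand; the abstract's point is that mismatched pairs and the learnable temperature cause such a gap to arise and persist during training, which a static sketch like this does not model.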
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Ghosh, Avrajit; Kwon, Soo Min; Wang, Rongrong; Ravishankar, Saiprasad; Qu, Qing (March 2025, International Conference on Learning Representations)

Deep neural networks trained using gradient descent with a fixed learning rate η often operate in the regime of "edge of stability" (EOS), where the largest eigenvalue of the Hessian equilibrates about the stability threshold 2/η. In this work, we present a fine-grained analysis of the learning dynamics of (deep) linear networks (DLNs) on the deep matrix factorization loss beyond EOS. For DLNs, loss oscillations beyond EOS follow a period-doubling route to chaos. We theoretically analyze the regime of the 2-period orbit and show that the loss oscillations occur within a small subspace, whose dimension is precisely characterized by the learning rate. The crux of our analysis lies in showing that the symmetry-induced conservation law for gradient flow, defined as the balancing gap among the singular values across layers, breaks at EOS and decays monotonically to zero. Overall, our results contribute to explaining two key phenomena in deep networks: (i) shallow models and simple tasks do not always exhibit EOS; and (ii) oscillations occur within top features. We present experiments to support our theory, along with examples demonstrating how these phenomena occur in nonlinear networks and how they differ from those in networks with benign landscapes, such as DLNs.
Free, publicly-accessible full text available March 25, 2026
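The stability threshold 2/η in this abstract comes from a standard calculation that can be checked on the simplest possible model. The plain-Python sketch below (not taken from the paper) runs gradient descent on the one-dimensional quadratic f(x) = 0.5·λ·x², where the curvature λ stands in for the Hessian's largest eigenvalue. Each step multiplies x by (1 - ηλ), so the iterates shrink when λ < 2/η, oscillate with period 2 exactly at λ = 2/η, and diverge beyond it. The function names are hypothetical, and this quadratic toy does not capture the bounded oscillations or the period-doubling route to chaos that the paper analyzes for deep matrix factorization.

```python
def gd_on_quadratic(curvature: float, eta: float, x0: float = 1.0, steps: int = 50) -> list[float]:
    """Gradient descent on f(x) = 0.5 * curvature * x**2; returns all iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        grad = curvature * x  # f'(x)
        xs.append(x - eta * grad)  # equivalently x * (1 - eta * curvature)
    return xs


def is_stable(curvature: float, eta: float) -> bool:
    """GD on the quadratic converges iff |1 - eta * curvature| < 1, i.e. 0 < curvature < 2 / eta."""
    return abs(1.0 - eta * curvature) < 1.0


eta = 0.1  # stability threshold 2 / eta = 20
for curvature in (19.0, 20.0, 21.0):
    last = gd_on_quadratic(curvature, eta)[-1]
    print(curvature, is_stable(curvature, eta), abs(last))
```

At curvature 20 the iterates alternate between 1.0 and -1.0, the period-2 behavior at the threshold; just below it they decay, and just above it they grow without bound.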
Learning Dynamics of Deep Matrix Factorization Beyond the Edge of Stability

Ghosh, Avrajit; Kwon, Soo Min; Wang, Rongrong; Ravishankar, Saiprasad; Qu, Qing (March 2025, The Thirteenth International Conference on Learning Representations)

Free, publicly-accessible full text available March 5, 2026
Analysis of Deep Image Prior and Exploiting Self-Guidance for Image Reconstruction

https://doi.org/10.1109/TCI.2025.3540706

Liang, Shijun; Bell, Evan; Qu, Qing; Wang, Rongrong; Ravishankar, Saiprasad (February 2025, IEEE Transactions on Computational Imaging)

Free, publicly-accessible full text available February 19, 2026
Understanding Generalizability of Diffusion Models Requires Rethinking the Hidden Gaussian Structure

Li, Xiang; Dai, Yixiang; Qu, Qing (December 2024, 38th Conference on Neural Information Processing Systems (NeurIPS 2024))

In this work, we study the generalizability of diffusion models by examining the hidden properties of the learned score functions, which are essentially a series of deep denoisers trained at various noise levels. We observe that as diffusion models transition from memorization to generalization, their corresponding nonlinear diffusion denoisers exhibit increasing linearity. This discovery leads us to investigate the linear counterparts of the nonlinear diffusion models, which are a series of linear models trained to match the function mappings of the nonlinear diffusion denoisers. Surprisingly, these linear denoisers are approximately the optimal denoisers for a multivariate Gaussian distribution characterized by the empirical mean and covariance of the training dataset. This finding implies that diffusion models have an inductive bias toward capturing and utilizing the Gaussian structure (covariance information) of the training dataset for data generation. We empirically demonstrate that this inductive bias is a unique property of diffusion models in the generalization regime, which becomes increasingly evident when the model's capacity is relatively small compared to the training dataset size. When the model is highly overparameterized, this inductive bias emerges during the initial training phases, before the model fully memorizes its training data. Our study provides crucial insights into the notably strong generalization recently observed in real-world diffusion models.
Full Text Available
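The abstract's key object, the optimal denoiser for a multivariate Gaussian with the training set's empirical mean μ and covariance Σ, has a closed form. Assuming the additive noise model y = x + σε with ε ~ N(0, I) (a variance-exploding convention chosen here for simplicity; other diffusion parameterizations would rescale y first), the minimum-mean-squared-error estimate is E[x | y] = μ + Σ(Σ + σ²I)⁻¹(y - μ), which is linear in y. The NumPy sketch below builds this denoiser from a data matrix; the function name `gaussian_denoiser` is hypothetical, and this is not the procedure the authors used to fit linear surrogates to trained networks.

```python
import numpy as np


def gaussian_denoiser(data: np.ndarray, sigma: float):
    """Return the MMSE denoiser for N(mu, cov) fitted to `data` (rows are samples, d >= 2 columns).

    Assumes noisy inputs y = x + sigma * eps with eps ~ N(0, I).
    """
    mu = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)  # empirical covariance (unbiased, n - 1 normalization)
    d = data.shape[1]
    # cov and (cov + sigma^2 I) commute, so solve(cov + sigma^2 I, cov) equals
    # cov @ inv(cov + sigma^2 I) without forming an explicit inverse.
    gain = np.linalg.solve(cov + sigma**2 * np.eye(d), cov)

    def denoise(y) -> np.ndarray:
        # Shrink y toward the mean; along each eigenvector of cov with eigenvalue lam,
        # the shrinkage factor is lam / (lam + sigma^2).
        return mu + (np.asarray(y, dtype=float) - mu) @ gain.T

    return denoise


# Example: a toy 2-D "training set" whose empirical covariance is diagonal.
data = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
denoise = gaussian_denoiser(data, sigma=1.0)
print(denoise([1.0, 1.0]))  # here each coordinate is scaled by var / (var + sigma^2)
```

As σ shrinks toward zero the gain approaches the identity (for full-rank Σ), so the denoiser returns its input unchanged; at large σ it collapses toward the mean μ.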
