NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization

Harun, MY; Kanan, C (August 2025, Proc. Conference on Lifelong Learning Agents (CoLLAs))

To adapt to real-world data streams, continual learning (CL) systems must rapidly learn new concepts while preserving and utilizing prior knowledge. When it comes to adding new information to continually-trained deep neural networks (DNNs), classifier weights for newly encountered categories are typically initialized randomly, leading to high initial training loss (spikes) and instability. Consequently, achieving optimal convergence and accuracy requires prolonged training, increasing computational costs. Inspired by Neural Collapse (NC), we propose a weight initialization strategy to improve learning efficiency in CL. In DNNs trained with mean-squared-error, NC gives rise to a Least-Square (LS) classifier in the last layer, whose weights can be analytically derived from learned features. We leverage this LS formulation to initialize classifier weights in a data-driven manner, aligning them with the feature distribution rather than using random initialization. Our method mitigates initial loss spikes and accelerates adaptation to new tasks. We evaluate our approach in large-scale CL settings, demonstrating faster adaptation and improved CL performance.
more » « less
Free, publicly-accessible full text available August 11, 2026
Improving Multimodal Large Language Models Using Continual Learning

Srivastava, S; Harun, MY; Singh, R; Kanan, C (August 2025, Proc. Conference on Lifelong Learning Agents (CoLLAs))

Generative large language models (LLMs) exhibit impressive capabilities, which can be further augmented by integrating a pre-trained vision model into the original LLM to create a multimodal LLM (MLLM). However, this integration often significantly decreases performance on natural language understanding and generation tasks, compared to the original LLM. This study investigates this issue using the LLaVA MLLM, treating the integration as a continual learning problem. We evaluate five continual learning methods to mitigate forgetting and identify a technique that enhances visual understanding while minimizing linguistic performance loss. Our approach reduces linguistic performance degradation by up to 15% over the LLaVA recipe, while maintaining high multimodal accuracy. We also demonstrate the robustness of our method through continual learning on a sequence of vision-language tasks, effectively preserving linguistic skills while acquiring new multimodal capabilities.
more » « less
Free, publicly-accessible full text available August 11, 2026
Controlling Neural Collapse Enhances Out-of-Distribution Detection and Transfer Learning

Harun, MY; Gallardo, J; Kanan, C (July 2025, Proc. International Conference on Machine Learning (ICML))

Out-of-distribution (OOD) detection and OOD generalization are widely studied in Deep Neural Networks (DNNs), yet their relationship remains poorly understood. We empirically show that the degree of Neural Collapse (NC) in a network layer is inversely related with these objectives: stronger NC improves OOD detection but degrades generalization, while weaker NC enhances generalization at the cost of detection. This trade-off suggests that a single feature space cannot simultaneously achieve both tasks. To address this, we develop a theoretical framework linking NC to OOD detection and generalization. We show that entropy regularization mitigates NC to improve generalization, while a fixed Simplex ETF projector enforces NC for better detection. Based on these insights, we propose a method to control NC at different DNN layers. In experiments, our method excels at both tasks across OOD datasets and DNN architectures.
more » « less
Free, publicly-accessible full text available July 13, 2026
Temporal Chunking Enhances Recognition of Implicit Sequential Patterns

Dey, J; Soures, N; Gonzales, M; Lerner, I; Kanan, C; Kudithipudi, D (May 2025, arxiv)
arxiv (Ed.)
Free, publicly-accessible full text available May 31, 2026
What Variables Affect Out-of-Distribution Generalization in Pretrained Models?

Harun, M Y; Lee, K; Gallardo, J; Krishnan, G; Kanan, C (December 2024, Neural Information Processing Systems (NeurIPS))

Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary widely. We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which is closely related to intermediate neural collapse. This hypothesis suggests that deeper DNN layers compress representations and hinder OOD generalization. Contrary to earlier work, our experiments show this is not a universal phenomenon. We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability. We identify that training with high-resolution datasets containing many classes greatly reduces representation compression and improves transferability. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts.
more » « less
Full Text Available
Overcoming the Stability Gap in Continual Learning

Harun, M Y; Kanan, C (September 2024, Transactions on machine learning research)

Pre-trained deep neural networks (DNNs) are being widely deployed by industry for making business decisions and to serve users; however, a major problem is model decay, where the DNN's predictions become more erroneous over time, resulting in revenue loss or unhappy users. To mitigate model decay, DNNs are retrained from scratch using old and new data. This is computationally expensive, so retraining happens only once performance significantly decreases. Here, we study how continual learning (CL) could potentially overcome model decay in large pre-trained DNNs and greatly reduce computational costs for keeping DNNs up-to-date. We identify the "stability gap" as a major obstacle in our setting. The stability gap refers to a phenomenon where learning new data causes large drops in performance for past tasks before CL mitigation methods eventually compensate for this drop. We test two hypotheses to investigate the factors influencing the stability gap and identify a method that vastly reduces this gap. In large-scale experiments for both easy and hard CL distributions (e.g., class incremental learning), we demonstrate that our method reduces the stability gap and greatly increases computational efficiency. Our work aligns CL with the goals of the production setting, where CL is needed for many applications.
more » « less
Full Text Available
PositCL: Compact Continual Learning with Posit Aware Quantization

https://doi.org/10.1145/3649476.3660371

Karia, Vedant; Zyarah, Abdullah; Kudithipudi, Dhireesha (June 2024, ACM)

Full Text Available
Advancing Neuro-Inspired Lifelong Learning for Edge with Co-Design

https://doi.org/10.1609/aaaiss.v3i1.31226

Soures, Nicholas; Karia, Vedant; Kudithipudi, Dhireesha (May 2024, Proceedings of the AAAI Symposium Series)

Lifelong learning, which refers to an agent's ability to continuously learn and enhance its performance over its lifespan, is a significant challenge in artificial intelligence (AI), that biological systems tackle efficiently. This challenge is further exacerbated when AI is deployed in untethered environments with strict energy and latency constraints. We take inspiration from neural plasticity and investigate how to leverage and build energy-efficient lifelong learning machines. Specifically, we study how a combination of neural plasticity mechanisms, namely neuromodulation, synaptic consolidation, and metaplasticity, enhance the continual learning capabilities of AI models. We further co-design architectures that leverage compute-in-memory topologies and sparse spike-based communication with quantization for the edge. Aspects of this co-design can be transferred to federated lifelong learning scenarios.
more » « less
Full Text Available

Search for: All records