Title: Overcoming the Stability Gap in Continual Learning
Pre-trained deep neural networks (DNNs) are widely deployed by industry to make business decisions and serve users; however, a major problem is model decay, where the DNN's predictions become more erroneous over time, resulting in revenue loss or unhappy users. To mitigate model decay, DNNs are retrained from scratch using old and new data. This is computationally expensive, so retraining happens only once performance has significantly decreased. Here, we study how continual learning (CL) could overcome model decay in large pre-trained DNNs and greatly reduce the computational cost of keeping DNNs up-to-date. We identify the "stability gap" as a major obstacle in our setting: learning new data causes large drops in performance on past tasks before CL mitigation methods eventually compensate for the drop. We test two hypotheses about the factors influencing the stability gap and identify a method that vastly reduces it. In large-scale experiments on both easy and hard CL distributions (e.g., class-incremental learning), we demonstrate that our method reduces the stability gap and greatly increases computational efficiency. Our work aligns CL with the goals of the production setting, where CL is needed for many applications.
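The stability gap described above can be quantified directly from a per-step accuracy trace on previously learned tasks. The following is a minimal illustrative sketch (my own, not the paper's code; the function name and the trace values are hypothetical): the gap is the depth of the transient drop below the accuracy held just before the new task arrives.

```python
# Toy sketch of measuring the stability gap from an accuracy trace on old tasks.
# All names and numbers here are illustrative assumptions, not the paper's code.

def stability_gap(old_task_acc):
    """Depth of the transient drop: accuracy just before the new task arrives
    minus the minimum accuracy observed while learning it."""
    baseline = old_task_acc[0]          # old-task accuracy before new data
    return baseline - min(old_task_acc)

# Hypothetical trace: old-task accuracy dips sharply when a new task is
# introduced, then partially recovers as the CL method compensates.
trace = [0.90, 0.55, 0.62, 0.74, 0.83, 0.87]
print(stability_gap(trace))  # drop of roughly 0.35 before recovery
```

A method that "reduces the stability gap" in the abstract's sense would shrink this quantity while leaving the final recovered accuracy unchanged or improved.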
Award ID(s):
2326491 1909696 2317706
PAR ID:
10552022
Author(s) / Creator(s):
;
Publisher / Repository:
Transactions on Machine Learning Research
Date Published:
Journal Name:
Transactions on machine learning research
ISSN:
2835-8856
Subject(s) / Keyword(s):
Continual Learning Model Decay Green AI
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To adapt to real-world data streams, continual learning (CL) systems must rapidly learn new concepts while preserving and utilizing prior knowledge. When it comes to adding new information to continually-trained deep neural networks (DNNs), classifier weights for newly encountered categories are typically initialized randomly, leading to high initial training loss (spikes) and instability. Consequently, achieving optimal convergence and accuracy requires prolonged training, increasing computational costs. Inspired by Neural Collapse (NC), we propose a weight initialization strategy to improve learning efficiency in CL. In DNNs trained with mean-squared-error, NC gives rise to a Least-Square (LS) classifier in the last layer, whose weights can be analytically derived from learned features. We leverage this LS formulation to initialize classifier weights in a data-driven manner, aligning them with the feature distribution rather than using random initialization. Our method mitigates initial loss spikes and accelerates adaptation to new tasks. We evaluate our approach in large-scale CL settings, demonstrating faster adaptation and improved CL performance. 
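The least-squares initialization described in the abstract above can be sketched in a few lines: instead of random classifier weights, solve a least-squares problem mapping stored penultimate-layer features to one-hot targets, the solution that Neural Collapse induces under MSE training. This is my own minimal numpy illustration under assumed shapes, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): data-driven classifier
# initialization via a least-squares fit from features to one-hot targets.
import numpy as np

rng = np.random.default_rng(0)

n, d, c = 200, 16, 4                    # samples, feature dim, classes (assumed)
feats = rng.normal(size=(n, d))         # stand-in for stored penultimate features
labels = rng.integers(0, c, size=n)
targets = np.eye(c)[labels]             # one-hot targets, as in MSE training

# LS classifier: W minimizes ||feats @ W - targets||^2 over all weight matrices.
W, *_ = np.linalg.lstsq(feats, targets, rcond=None)

init_loss = np.mean((feats @ W - targets) ** 2)
rand_loss = np.mean((feats @ rng.normal(scale=0.1, size=(d, c)) - targets) ** 2)
print(init_loss < rand_loss)  # True: the LS solution starts below any random init
```

Because `W` is the global minimizer of the squared error, the new classifier head starts at a lower loss than any random initialization, which is the mechanism behind the mitigated loss spikes the abstract reports.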
  2. Deep neural networks (DNNs) are increasingly used in critical applications like autonomous vehicles and medical diagnosis, where accuracy and reliability are crucial. However, debugging DNNs is challenging and expensive, often leaving unpredictable behavior and performance issues unresolved. Identifying and diagnosing bugs in DNNs is difficult because failure symptoms are complex and obscure, and diagnosis is data-driven and compute-intensive. To address this, we propose TransBug, a framework that combines transformer models for feature extraction with deep learning models for classification to detect and diagnose bugs in DNNs. We employ a transformer model pre-trained on programming languages to extract semantic features from both faulty and correct DNN models. We then feed these features to a separate deep-learning model to determine whether the code contains bugs; if a bug is detected, the model further classifies its type. By leveraging the powerful feature-extraction capabilities of transformers, we capture relevant characteristics of the code, which a deep learning model then uses to identify and classify various types of bugs. This combination of transformer-based feature extraction and deep learning classification allows our method to accurately link bug symptoms to their causes, enabling developers to take targeted corrective actions. Empirical results show that TransBug achieves an accuracy of 81% for binary classification and 91% for classifying bug types.
  3. Deep neural networks (DNNs) demonstrate significant advantages in improving ranking performance in retrieval tasks. Driven by recent developments in the optimization and generalization of DNNs, learning a neural ranking model online from its interactions with users becomes possible. However, the required exploration for model learning has to be performed in the entire neural network parameter space, which is prohibitively expensive and limits the application of such online solutions in practice. In this work, we propose an efficient exploration strategy for online interactive neural ranker learning based on bootstrapping. Our solution is based on an ensemble of ranking models trained with perturbed user click feedback. The proposed method eliminates explicit confidence-set construction and the associated computational overhead, which enables online neural ranker training to be efficiently executed in practice with theoretical guarantees. Extensive comparisons with an array of state-of-the-art OL2R algorithms on two public learning-to-rank benchmark datasets demonstrate the effectiveness and computational efficiency of our proposed neural OL2R solution.
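The bootstrapping idea in the abstract above, an ensemble of rankers trained on perturbed copies of the click feedback, can be sketched with linear rankers. This is my own simplified illustration under stated assumptions (linear ridge rankers, Gaussian perturbation noise, synthetic clicks), not the paper's algorithm; the real method uses neural rankers and a principled perturbation scheme.

```python
# Illustrative sketch (assumptions, not the paper's method): exploration via an
# ensemble of linear rankers, each fit to its own perturbed copy of the click
# signal, in place of explicit confidence-set construction.
import numpy as np

rng = np.random.default_rng(1)
d, n_docs, ensemble_size = 8, 50, 5

X = rng.normal(size=(n_docs, d))                # document feature vectors
true_w = rng.normal(size=d)
clicks = (X @ true_w + rng.normal(size=n_docs) > 0).astype(float)  # synthetic feedback

# Each ensemble member solves ridge regression on its own perturbed clicks;
# disagreement across members supplies the exploration signal.
rankers = []
for _ in range(ensemble_size):
    noisy = clicks + rng.normal(scale=0.5, size=n_docs)    # perturbed feedback
    w = np.linalg.solve(X.T @ X + np.eye(d), X.T @ noisy)  # ridge solution
    rankers.append(w)

# Bootstrap-style exploration step: rank with a randomly sampled member.
w = rankers[rng.integers(ensemble_size)]
ranking = np.argsort(-(X @ w))                  # indices of docs, best first
print(ranking[:5])
```

Sampling one member per query approximates posterior sampling over rankers, which is what lets the method avoid the cost of maintaining an explicit confidence set.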
  4. Due to the large computational cost of data classification using deep learning, resource-limited devices, e.g., smartphones, PCs, etc., offload their classification tasks to a cloud server, which offers extensive hardware resources. Unfortunately, since the cloud is an untrusted third party, users may be reluctant to share their private data with the cloud for data classification. Differential privacy has been proposed as a way of securely classifying data at the cloud using deep learning. In this approach, users conceal their data before uploading it to the cloud using a local obfuscation deep learning model, which is based on a data classification model hosted by the cloud. However, because the obfuscation model assumes that the pre-trained model at the cloud is static, it suffers significant performance degradation under realistic classification models that are constantly being updated. In this paper, we investigate the performance of differentially-private data classification under a dynamic pre-trained model and a constant obfuscation model. We find that the classification performance decreases as the pre-trained model evolves. We then investigate the classification performance under an obfuscation model that is updated alongside the pre-trained model. We find that with a modest computational effort the obfuscation model can be updated to significantly improve the classification performance under a dynamic pre-trained model.
  5. Embeddings produced by pre-trained deep neural networks (DNNs) are widely used; however, their efficacy for downstream tasks can vary widely. We study the factors influencing transferability and out-of-distribution (OOD) generalization of pre-trained DNN embeddings through the lens of the tunnel effect hypothesis, which is closely related to intermediate neural collapse. This hypothesis suggests that deeper DNN layers compress representations and hinder OOD generalization. Contrary to earlier work, our experiments show this is not a universal phenomenon. We comprehensively investigate the impact of DNN architecture, training data, image resolution, and augmentations on transferability. We identify that training with high-resolution datasets containing many classes greatly reduces representation compression and improves transferability. Our results emphasize the danger of generalizing findings from toy datasets to broader contexts. 