skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, May 23 until 2:00 AM ET on Friday, May 24 due to maintenance. We apologize for the inconvenience.


Title: Theoretical Understanding of the Information Flow on Continual Learning Performance
Continual learning (CL) requires a model to continually learn new tasks with incremental available information while retaining previous knowledge. Despite the numerous previous approaches to CL, most of them still suffer forgetting, expensive memory cost, or lack sufficient theoretical understanding. While different CL training regimes have been extensively studied empirically, insufficient attention has been paid to the underlying theory. In this paper, we establish a probabilistic framework to analyze information flow through layers in networks for sequential tasks and its impact on learning performance. Our objective is to optimize the information preservation between layers while learning new tasks. This manages task-specific knowledge passing throughout the layers while maintaining model performance on previous tasks. Our analysis provides novel insights into information adaptation within the layers during incremental task learning. We provide empirical evidence and practically highlight the performance improvement across multiple tasks. Code is available at https://github.com/Sekeh-Lab/InformationFlow-CL.  more » « less
Award ID(s):
2053480
NSF-PAR ID:
10427624
Author(s) / Creator(s):
;
Date Published:
Journal Name:
European Conference on Computer Vision
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Task‐incremental learning (Task‐IL) aims to enable an intelligent agent to continuously accumulate knowledge from new learning tasks without catastrophically forgetting what it has learned in the past. It has drawn increasing attention in recent years, with many algorithms being proposed to mitigate neural network forgetting. However, none of the existing strategies is able to completely eliminate the issues. Moreover, explaining and fully understanding what knowledge and how it is being forgotten during the incremental learning process still remains under‐explored. In this paper, we propose KnowledgeDrift, a visual analytics framework, to interpret the network forgetting with three objectives: (1) to identify when the network fails to memorize the past knowledge, (2) to visualize what information has been forgotten, and (3) to diagnose how knowledge attained in the new model interferes with the one learned in the past. Our analytical framework first identifies the occurrence of forgetting by tracking the task performance under the incremental learning process and then provides in‐depth inspections of drifted information via various levels of data granularity. KnowledgeDrift allows analysts and model developers to enhance their understanding of network forgetting and compare the performance of different incremental learning algorithms. Three case studies are conducted in the paper to further provide insights and guidance for users to effectively diagnose catastrophic forgetting over time.

     
    more » « less
  2. This paper studies continual learning (CL) of a sequence of aspect sentiment classification (ASC) tasks. Although some CL techniques have been proposed for document sentiment classification, we are not aware of any CL work on ASC. A CL system that incrementally learns a sequence of ASC tasks should address the following two issues: (1) transfer knowledge learned from previous tasks to the new task to help it learn a better model, and (2) maintain the performance of the models for previous tasks so that they are not forgotten. This paper proposes a novel capsule network based model called B-CL to address these issues. B-CL markedly improves the ASC performance on both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of B-CL is demonstrated through extensive experiments. 
    more » « less
  3. Malicious software (malware) classification offers a unique challenge for continual learning (CL) regimes due to the volume of new samples received on a daily basis and the evolution of malware to exploit new vulnerabilities. On a typical day, antivirus vendors receive hundreds of thousands of unique pieces of software, both malicious and benign, and over the course of the lifetime of a malware classifier, more than a billion samples can easily accumulate. Given the scale of the problem, sequential training using continual learning techniques could provide substantial benefits in reducing training and storage overhead. To date, however, there has been no exploration of CL applied to malware classification tasks. In this paper, we study 11 CL techniques applied to three malware tasks covering common incremental learning scenarios, including task, class, and domain incremental learning (IL). Specifically, using two realistic, large-scale malware datasets, we evaluate the performance of the CL methods on both binary malware classification (Domain-IL) and multi-class malware family classification (Task-IL and Class-IL) tasks. To our surprise, continual learning methods significantly underperformed naive Joint replay of the training data in nearly all settings – in some cases reducing accuracy by more than 70 percentage points. A simple approach of selectively replaying 20% of the stored data achieves better performance, with 50% of the training time compared to Joint replay. Finally, we discuss potential reasons for the unexpectedly poor performance of the CL techniques, with the hope that it spurs further research on developing techniques that are more effective in the malware classification domain. 
    more » « less
  4. null (Ed.)
    Due to insufficient training data and the high computational cost to train a deep neural network from scratch, transfer learning has been extensively used in many deep-neural-network-based applications. A commonly used transfer learning approach involves taking a part of a pre-trained model, adding a few layers at the end, and re-training the new layers with a small dataset. This approach, while efficient and widely used, imposes a security vulnerability because the pre-trained model used in transfer learning is usually publicly available, including to potential attackers. In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. We assume that the attacker has no access to any target-specific information, including samples from target classes, re-trained model, and probabilities assigned by Softmax to each class, and thus making the attack target-agnostic. These assumptions render all previous attack models inapplicable, to the best of our knowledge. To evaluate the proposed attack, we perform a set of experiments on face recognition and speech recognition tasks and show the effectiveness of the attack. Our work reveals a fundamental security weakness of the Softmax layer when used in transfer learning settings 
    more » « less
  5. Due to insufficient training data and the high computational cost to train a deep neural network from scratch, transfer learning has been extensively used in many deep-neural-network-based applications. A commonly used transfer learning approach involves taking a part of a pre-trained model, adding a few layers at the end, and re-training the new layers with a small dataset. This approach, while efficient and widely used, imposes a security vulnerability because the pre-trained model used in transfer learning is usually publicly available, including to potential attackers. In this paper, we show that without any additional knowledge other than the pre-trained model, an attacker can launch an effective and efficient brute force attack that can craft instances of input to trigger each target class with high confidence. We assume that the attacker has no access to any target-specific information, including samples from target classes, re-trained model, and probabilities assigned by Softmax to each class, and thus making the attack target-agnostic. These assumptions render all previous attack models inapplicable, to the best of our knowledge. To evaluate the proposed attack, we perform a set of experiments on face recognition and speech recognition tasks and show the effectiveness of the attack. Our work reveals a fundamental security weakness of the Softmax layer when used in transfer learning settings. 
    more » « less