This content will become publicly available on April 25, 2023

TRGP: Trust Region Gradient Projection for Continual Learning
Authors:
; ; ;
Award ID(s):
Publication Date:
NSF-PAR ID:
10352359
Journal Name:
The Tenth International Conference on Learning Representations
1. By learning a sequence of tasks continually, an agent in continual learning (CL) can improve the learning performance of both a new task and old' tasks by leveraging the forward knowledge transfer and the backward knowledge transfer, respectively. However, most existing CL methods focus on addressing catastrophic forgetting in neural networks by minimizing the modification of the learnt model for old tasks. This inevitably limits the backward knowledge transfer from the new task to the old tasks, because judicious model updates could possibly improve the learning performance of the old tasks as well. To tackle this problem, we first theoretically analyze the conditions under which updating the learnt model of old tasks could be beneficial for CL and also lead to backward knowledge transfer, based on the gradient projection onto the input subspaces of old tasks. Building on the theoretical analysis, we next develop a ContinUal learning method with Backward knowlEdge tRansfer (CUBER), for a fixed capacity neural network without data replay. In particular, CUBER first characterizes the task correlation to identify the positively correlated old tasks in a layer-wise manner, and then selectively modifies the learnt model of the old tasks when learning the new task. Experimental studies showmore »
3. The recent success of deep neural networks in prediction tasks on wearable sensor data is evident. However, in more practical online learning scenarios, where new data arrive sequentially, neural networks suffer severely from the catastrophic forgetting` problem. In real-world settings, given a pre-trained model on the old data, when we collect new data, it is practically infeasible to re-train the model on both old and new data because the computational costs will increase dramatically as more and more data arrive in time. However, if we fine-tune the model only with the new data because the new data might be different from the old data, the neural network parameters will change to fit the new data. As a result, the new parameters are no longer suitable for the old data. This phenomenon is known as catastrophic forgetting, and continual learning research aims to overcome this problem with minimal computational costs. While most of the continual learning research focuses on computer vision tasks, implications of catastrophic forgetting in wearable computing research and potential avenues to address this problem have remained unexplored. To address this knowledge gap, we study continual learning for activity recognition using wearable sensor data. We show that the catastrophicmore »