In supervised continual learning, a deep neural network (DNN) is updated with an ever-growing data stream. Unlike the offline setting where data is shuffled, we cannot make any distributional assumptions about the data stream. Ideally, only one pass through the dataset is needed for computational efficiency. However, existing methods are inadequate and make many assumptions that cannot be made for real-world applications, while simultaneously failing to improve computational efficiency. In this paper, we propose a novel continual learning method, SIESTA, based on a wake/sleep framework for training, which is well aligned with the needs of on-device learning. The major goal of SIESTA is to advance compute-efficient continual learning so that DNNs can be updated efficiently using far less time and energy. The principal innovations of SIESTA are: 1) rapid online updates using a rehearsal-free, backpropagation-free, and data-driven network update rule during its wake phase, and 2) expedited memory consolidation using a compute-restricted rehearsal policy during its sleep phase. For memory efficiency, SIESTA adapts latent rehearsal using memory indexing from REMIND. Compared to REMIND and prior methods, SIESTA is far more computationally efficient, enabling continual learning on ImageNet-1K in under 2 hours on a single GPU; moreover, in the augmentation-free setting it matches the performance of the offline learner, a milestone critical to driving adoption of continual learning in real-world applications.
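The wake/sleep procedure described above can be summarized in a short, illustrative sketch: the wake phase performs a backpropagation-free running update of the output layer and caches latent features, while the sleep phase rehearses from that cache under a fixed compute budget. The class and method names, the running-class-mean update rule, and the unquantized latent buffer below are simplifying assumptions for illustration, not SIESTA's released implementation (which stores REMIND-style quantized latents).

```python
import torch
import torch.nn.functional as F

class SiestaLikeLearner:
    """Illustrative wake/sleep continual learner (not the official SIESTA code)."""

    def __init__(self, frozen_backbone, plastic_net, num_classes, feat_dim, device="cpu"):
        self.backbone = frozen_backbone.eval()     # fixed lower layers used as a feature extractor
        self.plastic_net = plastic_net             # trainable upper layers, updated only during sleep
        # Running class means act as a backprop-free output layer during the wake phase.
        self.class_means = torch.zeros(num_classes, feat_dim, device=device)
        self.class_counts = torch.zeros(num_classes, device=device)
        self.latent_buffer = []                    # stand-in for a REMIND-style quantized latent store

    @torch.no_grad()
    def wake_update(self, x, y):
        """Rehearsal-free, backpropagation-free online update from one labeled example."""
        z = self.backbone(x.unsqueeze(0)).squeeze(0)              # latent feature
        self.class_counts[y] += 1
        step = 1.0 / self.class_counts[y]
        self.class_means[y] += step * (z - self.class_means[y])   # running class mean
        self.latent_buffer.append((z.cpu(), y))                   # cache latent for later consolidation

    def sleep(self, optimizer, rehearsal_budget=1024, batch_size=64):
        """Expedited memory consolidation: rehearse cached latents under a fixed sample budget."""
        for _ in range(rehearsal_budget // batch_size):
            idx = torch.randint(len(self.latent_buffer), (batch_size,)).tolist()
            z = torch.stack([self.latent_buffer[i][0] for i in idx])
            y = torch.tensor([self.latent_buffer[i][1] for i in idx])
            loss = F.cross_entropy(self.plastic_net(z), y)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

    @torch.no_grad()
    def predict(self, x):
        """During wake, classify by the nearest class mean in latent space."""
        z = self.backbone(x.unsqueeze(0))
        return torch.argmin(torch.cdist(z, self.class_means)).item()
```

In a full system the buffer would hold compressed (product-quantized) latents to bound memory, and sleep would be triggered only periodically rather than after every sample.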
Online Continual Learning for Embedded Devices
Real-time on-device continual learning is needed for new applications such as home robots, user personalization on smartphones, and augmented/virtual reality headsets. However, this setting poses unique challenges: embedded devices have limited memory and compute capacity, and conventional machine learning models suffer from catastrophic forgetting when updated on non-stationary data streams. While several online continual learning models have been developed, their effectiveness for embedded applications has not been rigorously studied. In this paper, we first identify the criteria that online continual learners must meet to effectively perform real-time, on-device learning. We then study the efficacy of several online continual learning methods when used with mobile neural networks. We measure their performance, memory usage, compute requirements, and ability to generalize to out-of-domain inputs.
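As a rough illustration of the kind of on-device measurements this study calls for, the sketch below times a single-example update of a mobile network and reports its exact parameter memory footprint. The backbone choice, optimizer, and input shape are placeholder assumptions, not the paper's evaluation protocol.

```python
import time

import torch
from torchvision.models import mobilenet_v3_small

# Placeholder setup: a mobile backbone updated one labeled example at a time.
model = mobilenet_v3_small(num_classes=100)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

def online_update(x, y):
    """Perform one single-example update and return its wall-clock latency in seconds."""
    start = time.perf_counter()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
    return time.perf_counter() - start

# Memory footprint of the learner's parameters, computed exactly from tensor sizes.
param_bytes = sum(p.numel() * p.element_size() for p in model.parameters())

x = torch.randn(1, 3, 224, 224)   # one incoming image from the stream
y = torch.tensor([3])             # its label
latency = online_update(x, y)
print(f"update latency: {latency * 1e3:.1f} ms")
print(f"parameter memory: {param_bytes / 2**20:.1f} MiB (roughly doubled once gradients are stored)")
```

Compute requirements and out-of-domain generalization would be measured analogously, e.g., by counting multiply-accumulate operations per update and by evaluating on a shifted test set.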
- PAR ID: 10333634
- Date Published:
- Journal Name: Conference on Lifelong Learning Agents
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Advances in embedded systems have enabled the integration of many lightweight sensory devices into our daily lives. In particular, this trend has given rise to the continuous expansion of wearable sensors in a broad range of applications, from health and fitness monitoring to social networking and military surveillance. Wearables leverage machine learning techniques to profile the behavioral routines of their end-users through activity recognition algorithms. Current research assumes that such machine learning algorithms are trained offline. In reality, however, wearables demand continuous reconfiguration of their computational algorithms due to their highly dynamic operation. Developing a personalized and adaptive machine learning model requires real-time reconfiguration of the model. Due to the stringent computation and memory constraints of these embedded sensors, the training/re-training of the computational algorithms needs to be memory- and computation-efficient. In this paper, we propose a framework, based on the notion of online learning, for real-time and on-device machine learning training. We propose to transform the activity recognition problem from a multi-class classification problem into a hierarchical model of binary decisions using cascading online binary classifiers. Our results, based on Pegasos online learning, demonstrate that the proposed approach achieves 97% accuracy in detecting activities of varying intensities using limited memory, while the power usage of the system is reduced by more than 40%.
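A minimal sketch of the cascading-online-binary-classifier idea described in this abstract, using the standard Pegasos update rule; the two-level activity hierarchy and the label names are illustrative assumptions, not the paper's exact cascade.

```python
import numpy as np

class PegasosBinary:
    """Online linear SVM trained with the Pegasos update rule (labels in {-1, +1})."""

    def __init__(self, dim, lam=1e-4):
        self.w = np.zeros(dim)
        self.lam = lam
        self.t = 0

    def update(self, x, y):
        self.t += 1
        eta = 1.0 / (self.lam * self.t)           # step size 1/(lambda * t)
        margin = y * self.w.dot(x)
        self.w *= 1.0 - eta * self.lam            # regularization shrink
        if margin < 1.0:                          # hinge-loss violation
            self.w += eta * y * x                 # gradient step on this example

    def predict(self, x):
        return 1 if self.w.dot(x) >= 0 else -1

class CascadedActivityClassifier:
    """Multi-class activity recognition as a hierarchy of binary decisions.

    Hypothetical two-level cascade: sedentary vs. active, then (if active)
    low- vs. high-intensity; the activity labels are placeholders.
    """

    def __init__(self, dim):
        self.is_active = PegasosBinary(dim)       # level 1: sitting (-1) vs. active (+1)
        self.is_intense = PegasosBinary(dim)      # level 2: walking (-1) vs. running (+1)

    def update(self, x, label):
        self.is_active.update(x, -1 if label == "sitting" else 1)
        if label != "sitting":
            self.is_intense.update(x, 1 if label == "running" else -1)

    def predict(self, x):
        if self.is_active.predict(x) == -1:
            return "sitting"
        return "running" if self.is_intense.predict(x) == 1 else "walking"
```

Each node keeps only a single weight vector and is updated in constant time per sample, which is what makes a cascade of this kind attractive under tight memory and power budgets.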
-
Due to the separate memory and computation units in the traditional von Neumann architecture, massive data transfer dominates the overall computing system's power and latency, known as the 'memory wall' issue. With ever-increasing deep-learning-based AI model sizes and computing complexity, this has become the bottleneck for state-of-the-art AI computing systems. To address this challenge, In-Memory Computing (IMC) based neural network accelerators have been widely investigated to support AI computing within memory. However, most of those works focus only on inference; on-device training and continual learning have not yet been well explored. In this work, for the first time, we introduce on-device continual learning with an STT-assisted-SOT (SAS) Magnetic Random Access Memory (MRAM) based IMC system. On the hardware side, we have fabricated a SAS-MRAM device prototype with four Magnetic Tunnel Junctions (MTJs, each 100 nm × 50 nm) sharing a common heavy-metal layer, achieving significantly improved memory writing and area efficiency compared to traditional SOT-MRAM. Next, we designed fully digital IMC circuits with our SAS-MRAM to support both neural network inference and on-device learning. To enable efficient on-device continual learning on new task data, we present an 8-bit integer (INT8) continual learning algorithm that uses our SAS-MRAM IMC-supported bit-serial digital in-memory convolution operations to train a small parallel reprogramming network (Rep-Net) while freezing the major backbone model. Extensive studies are presented based on our fabricated SAS-MRAM device prototype, cross-layer device-circuit benchmarking and simulation, and an evaluation of the on-device continual learning system.
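The freeze-the-backbone, train-a-small-parallel-branch idea can be sketched in software as follows. The layer sizes, the fusion point, and the fake INT8 activation quantization are assumptions made for illustration; the actual Rep-Net training runs on the bit-serial SAS-MRAM in-memory hardware described in the abstract.

```python
import torch
import torch.nn as nn

class RepNetStyleModel(nn.Module):
    """Frozen backbone plus a small trainable parallel branch (illustrative only)."""

    def __init__(self, backbone, feat_dim, num_classes, rep_dim=32):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():      # backbone weights stay frozen on device
            p.requires_grad = False
        self.rep_branch = nn.Sequential(          # small parallel reprogramming branch
            nn.Linear(feat_dim, rep_dim), nn.ReLU(), nn.Linear(rep_dim, feat_dim)
        )
        self.head = nn.Linear(feat_dim, num_classes)

    @staticmethod
    def fake_int8(x):
        """Simulate 8-bit activations (symmetric, per-tensor) to mimic an INT8 datapath."""
        scale = x.abs().max().clamp(min=1e-8) / 127.0
        return torch.round(x / scale).clamp(-128, 127) * scale

    def forward(self, x):
        with torch.no_grad():
            z = self.backbone(x)                  # frozen features, no gradients needed
        z = self.fake_int8(z)
        z = z + self.rep_branch(z)                # branch adapts the frozen features
        return self.head(z)

# Only the branch and head are updated when new task data arrives.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(784, 128), nn.ReLU())  # toy frozen backbone
model = RepNetStyleModel(backbone, feat_dim=128, num_classes=10)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(trainable, lr=0.01)

x = torch.randn(8, 1, 28, 28)                    # toy batch of new-task data
y = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```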
-
Supervised continual learning involves updating a deep neural network (DNN) from an ever-growing stream of labeled data. While most work has focused on overcoming catastrophic forgetting, one of the major motivations behind continual learning is being able to efficiently update a network with new information, rather than retraining from scratch on the training dataset as it grows over time. Despite recent continual learning methods largely solving the catastrophic forgetting problem, little attention has been paid to the efficiency of these algorithms. Here, we study recent methods for incremental class learning and illustrate that many are highly inefficient in terms of compute, memory, and storage. Some methods even require more compute than training from scratch! We argue that for continual learning to have real-world applicability, the research community cannot ignore the resources used by these algorithms. There is more to continual learning than mitigating catastrophic forgetting.
-
Machine learning is an expanding field with an ever-increasing role in everyday life, with its utility in the industrial, agricultural, and medical sectors being undeniable. Recently, this utility has come in the form of machine learning implementations on embedded system devices. While there have been steady advances in the performance, memory, and power consumption of embedded devices, most machine learning algorithms still have very high power consumption and computational demands, making the implementation of embedded machine learning somewhat difficult. However, different devices can be chosen for different applications based on their overall processing power and performance. This paper presents an overview of several implementations of machine learning on embedded systems, categorized by device, application, machine learning algorithm, and sensors. We mainly focus on NVIDIA Jetson and Raspberry Pi devices, along with a few less commonly used embedded computers, as well as which of these devices were more commonly used for specific applications in different fields. We also briefly analyze the ML models most commonly implemented on these devices and the sensors used to gather input from the field. All of the papers included in this review were selected using Google Scholar and the IEEE Xplore database. The selection criterion was the use of embedded computing systems in either a theoretical study or a practical implementation of machine learning models. The papers needed to report at least one, and preferably all, of the following results: the overall accuracy of the models on the system, the overall power consumption of the embedded machine learning system, and the inference time of the models on the embedded system. Embedded machine learning is expanding rapidly in both scale and scope, driven by advances in system performance and machine learning models as well as the greater affordability and accessibility of both. Improvements are noted in quality, power usage, and effectiveness.