skip to main content


Title: GeneSys: Enabling Continuous Learning through Neural Network Evolution in Hardware
Modern deep learning systems rely on (a) a handtuned neural network topology, (b) massive amounts of labelled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energy-efficiency due to (i) tight power and energy budgets, (ii) continuous / lifelong interaction with the environment, (iii) intermittent or no connectivity to the cloud to offloadheavy-weight processing. To address this need, we present GENESYS, a HW-SW prototype of a EA-based learning system, that comprises of a closed loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropogation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple unique avenues of parallelism unique to EAs that we term “gene”- level parallelism, and “population”-level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems.  more » « less
Award ID(s):
1755876
NSF-PAR ID:
10080204
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Annual IEEE/ACM International Symposium on Microarchitecture Workshops
ISSN:
2470-7791
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Objective.Spiking neural networks (SNNs) are powerful tools that are well suited for brain machine interfaces (BMI) due to their similarity to biological neural systems and computational efficiency. They have shown comparable accuracy to state-of-the-art methods, but current training methods require large amounts of memory, and they cannot be trained on a continuous input stream without pausing periodically to perform backpropagation. An ideal BMI should be capable training continuously without interruption to minimize disruption to the user and adapt to changing neural environments.Approach.We propose a continuous SNN weight update algorithm that can be trained to perform regression learning with no need for storing past spiking events in memory. As a result, the amount of memory needed for training is constant regardless of the input duration. We evaluate the accuracy of the network on recordings of neural data taken from the premotor cortex of a primate performing reaching tasks. Additionally, we evaluate the SNN in a simulated closed loop environment and observe its ability to adapt to sudden changes in the input neural structure.Main results.The continuous learning SNN achieves the same peak correlation (ρ=0.7) as existing SNN training methods when trained offline on real neural data while reducing the total memory usage by 92%. Additionally, it matches state-of-the-art accuracy in a closed loop environment, demonstrates adaptability when subjected to multiple types of neural input disruptions, and is capable of being trained online without any prior offline training.Significance.This work presents a neural decoding algorithm that can be trained rapidly in a closed loop setting. The algorithm increases the speed of acclimating a new user to the system and also can adapt to sudden changes in neural behavior with minimal disruption to the user.

     
    more » « less
  2. Generative Adversarial Network (GAN) has emerged as one of the most promising semi-supervised learning methods where two neural nets train themselves in a competitive environment. In this paper, as far as we know, we are the first to present a statistically trained Ternarized Generative Adversarial Network (TGAN) with fully ternarized weights (i.e. -1,0,+1) to massively reduce the need for computation and storage resources in the conventional GAN structures. In the proposed TGAN, the computationally expensive convolution operations (i.e. Multiplication and Accumulation) in both generator and discriminator's forward path are converted into hardware-friendly Addition/Subtraction operations. Accordingly, we propose a Processing-in-Memory accelerator for TGAN called (PIM-TGAN) based on Spin-Orbit Torque Magnetic Random Access Memory (SOT-MRAM) computational sub-arrays to efficiently accelerate the training process of GAN within non-volatile memory. In addition, we propose a parallelism technique to further enhance the training efficiency of TGAN. Our device-to-architecture co-simulation results show that, with almost the same inception score to the baseline GAN with floating point number weights on different data-sets, the proposed PIM-TGAN can obtain ~25.6× better energy-efficiency and 22× speedup compared to GPU platform averagely, and, 9.2× better energy-efficiency and 5.4× speedup over the best processing-in-ReRAM accelerators. 
    more » « less
  3. Generative Adversarial Networks (GANs) have recently drawn tremendous attention in many artificial intelligence (AI) applications including computer vision, speech recognition, and natural language processing. While GANs deliver state-of-the-art performance on these AI tasks, it comes at the cost of high computational complexity. Although recent progress demonstrated the promise of using ReRMA-based Process-In-Memory for acceleration of convolutional neural networks (CNNs) with low energy cost, the unique training process required by GANs makes them difficult to run on existing neural network acceleration platforms: two competing networks are simultaneously co-trained in GANs, and hence, significantly increasing the need of memory and computation resources. In this work, we propose ReGAN – a novel ReRAM-based Process-In-Memory accelerator that can efficiently reduce off-chip memory accesses. Moreover, ReGAN greatly increases system throughput by pipelining the layer-wise computation. Two techniques, namely, Spatial Parallelism and Computation Sharing are particularly proposed to further enhance training efficiency of GANs. Our experimental results show that ReGAN can achieve 240X performance speedup compared to GPU platform averagely, with an average energy saving of 94X. 
    more » « less
  4. null (Ed.)
    Abstract Deep neural networks (DNNs) have substantial computational requirements, which greatly limit their performance in resource-constrained environments. Recently, there are increasing efforts on optical neural networks and optical computing based DNNs hardware, which bring significant advantages for deep learning systems in terms of their power efficiency, parallelism and computational speed. Among them, free-space diffractive deep neural networks (D 2 NNs) based on the light diffraction, feature millions of neurons in each layer interconnected with neurons in neighboring layers. However, due to the challenge of implementing reconfigurability, deploying different DNNs algorithms requires re-building and duplicating the physical diffractive systems, which significantly degrades the hardware efficiency in practical application scenarios. Thus, this work proposes a novel hardware-software co-design method that enables first-of-its-like real-time multi-task learning in D 2 2NNs that automatically recognizes which task is being deployed in real-time. Our experimental results demonstrate significant improvements in versatility, hardware efficiency, and also demonstrate and quantify the robustness of proposed multi-task D 2 NN architecture under wide noise ranges of all system components. In addition, we propose a domain-specific regularization algorithm for training the proposed multi-task architecture, which can be used to flexibly adjust the desired performance for each task. 
    more » « less
  5. For energy-assisted compression ignition (EACI) engine propulsion at high-altitude operating conditions using sustainable jet fuels with varying cetane numbers, it is essential to develop an efficient engine control system for robust and optimal operation. Control systems are typically trained using experimental data, which can be costly and time consuming to generate due to setup time of experiments, unforeseen delays/issues with manufacturing, mishaps/engine failures and the consequent repairs (which can take weeks), and errors in measurements. Computational fluid dynamics (CFD) simulations can overcome such burdens by complementing experiments with simulated data for control system training. Such simulations, however, can be computationally expensive. Existing data-driven machine learning (ML) models have shown promise for emulating the expensive CFD simulator, but encounter key limitations here due to the expensive nature of the training data and the range of differing combustion behaviors (e.g. misfires and partial/delayed ignition) observed at such broad operating conditions. We thus develop a novel physics-integrated emulator, called the Misfire-Integrated GP (MInt-GP), which integrates important auxiliary information on engine misfires within a Gaussian process surrogate model. With limited CFD training data, we show the MInt-GP model can yield reliable predictions of in-cylinder pressure evolution profiles and subsequent heat release profiles and engine CA50 predictions at a broad range of input conditions. We further demonstrate much better prediction capabilities of the MInt-GP at different combustion behaviors compared to existing data-driven ML models such as kriging and neural networks, while also observing up to 80 times computational speed-up over CFD, thus establishing its effectiveness as a tool to assist CFD for fast data generation in control system training.

     
    more » « less