skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: On-Chip Optimization and Deep Reinforcement Learning in Memristor Based Computing
Reinforcement learning (RL) has shown its viability to learn when an agent interacts continually with the environment to optimize a policy. This work presents a memristor-based deep reinforcement learning (Mem-DRL) system for on-chip training, where the learning process takes place in a dynamic cartpole environment. Memristor device variability is taken into account to make the study more realistic. The proposed system utilized an analog ReLu module to reduce analog to digital converter usage. The analog Mem-DRL system consumed 191 times less energy than an optimized digital FP16 computing system. Our Mem-DRL system reduced the ADC usages by 40%, which led to reduced the overall system energy by 42%. Mem-DRL is 2.4 times faster than the FP16 system and performs 9.27 GOPS during DRL training. The system exhibited an energy efficiency of 23.8 TOPS/W.  more » « less
Award ID(s):
1718633
PAR ID:
10517369
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent research has highlighted the effectiveness of advanced building controls in reducing the energy consumption of heating, ventilation, and air-conditioning (HVAC) systems. Among advanced building control strategies, deep reinforcement learning control (DRL) shows the potential to achieve energy savings for HVAC systems and has emerged as a promising strategy. However, training DRL requires an interactive environment for the agent, which is challenging to achieve with real buildings due to time and response speed constraints. To address this challenge, a simulation environment serving as a training environment is needed, even though the DRL algorithm does not necessarily need a model. The error between the model and the real building is inevitable in this process, which may influence the efficiency of the DRL controller. To investigate the impact of model error, a virtual testbed was established. A high- fidelity Modelica-based model is developed serving as the virtual building. Three reduced-order models (ROMs) (i.e., 3R2C, Light Gradient Boosting Machine (LightGBM) and artificial neural network (ANN) models) were trained with the historical data generated from the virtual building and were embedded in the training environments of DRL. The sensitivity of ROMs and the Modelica model to random and periodical actions were tested and compared. Deploying the policy trained based on a ROM-based environment, which stands for a surrogate model in reality, into the Modelica-based virtual building testing environment, which stands for real-building, is a practical approach to implementing the DRL control. The performance of the practical DRL controller is compared with rule-based control (RBC) and an ideal DRL controller which was trained and deployed both in the virtual building environment. In the final episode with best rewards of the case study, the 3R2C, LightGBM, and ANN-based DRL outperform the RBC by 7.4%, 14.4%, and 11.4%, respectively in terms of the reward, comprising the weighted sum of energy cost, temperature violations, and the slew rate of the control signal, but falls short of the ideal Modelica-based DRL controller which outperforms RBC by 29.5%. The DRL controllers based on data-driven models are highly unstable with higher maximum rewards but much lower average rewards which might be caused by the significant prediction defect in certain action regions of the data-driven model. 
    more » « less
  2. Deep reinforcement learning (DRL) has demonstrated significant potential in various applications, including gaming AI, robotics, and system scheduling. DRL algorithms produce, sample, and learn from training data online through a trial-and-error process, demanding considerable time and computational resources. To address this, distributed DRL algorithms and paradigms have been developed to expedite training using extensive resources. Through carefully designed experiments, we are the first to observe that strategically increasing the actor-environment interactions by spawning more concurrent actors at certain training rounds within ephemeral time frames can significantly enhance training efficiency. Yet, current distributed DRL solutions, which are predominantly server-based (or serverful), fail to capitalize on these opportunities due to their long startup times, limited adaptability, and cumbersome scalability. This paper proposesNitro, a generic training engine for distributed DRL algorithms that enforces timely and effective boosting with concurrent actors instantaneously spawned by serverless computing. With serverless functions,Nitroadjusts data sampling strategies dynamically according to the DRL training demands.Nitroseizes the opportunity of real-time boosting by accurately and swiftly detecting an empirical metric. To achieve cost efficiency, we design a heuristic actor scaling algorithm to guideNitrofor cost-aware boosting budget allocation. We integrateNitrowith state-of-the-art DRL algorithms and frameworks and evaluate them on AWS EC2 and Lambda. Experiments with Mujoco and Atari benchmarks show thatNitroimproves the final rewards (i.e., training quality) by up to 6× and reduces training costs by up to 42%. 
    more » « less
  3. Deep reinforcement learning (DRL) has demonstrated impressive success in solving complex control tasks by synthesizing control policies from data. However, the safety and stability of applying DRL to safety-critical systems remain a primary concern and challenging problem. To address the problem, we propose the Phy-DRL: a novel physics-model regulated deep reinforcement learning framework. The Phy-DRL is novel in two architectural designs: a physics-model-regulated reward and residual control, which integrates physics-model-based control and data-driven control. The concurrent designs enable the Phy-DRL to mathematically provable safety and stability guarantees. Finally, the effectiveness of the Phy-DRL is validated by an inverted pendulum system. Additionally, the experimental results demonstrate that the Phy-DRL features remarkably accelerated training and enlarged reward. 
    more » « less
  4. null (Ed.)
    While Deep Reinforcement Learning has emerged as a de facto approach to many complex experience-driven networking problems, it remains challenging to deploy DRL into real systems. Due to the random exploration or half-trained deep neural networks during the online training process, the DRL agent may make unexpected decisions, which may lead to system performance degradation or even system crash. In this paper, we propose PnP-DRL, an offline-trained, plug and play DRL solution, to leverage the batch reinforcement learning approach to learn the best control policy from pre-collected transition samples without interacting with the system. After being trained without interaction with systems, our Plug and Play DRL agent will start working seamlessly, without additional exploration or possible disruption of the running systems. We implement and evaluate our PnP-DRL solution on a prevalent experience-driven networking problem, Dynamic Adaptive Streaming over HTTP (DASH). Extensive experimental results manifest that 1) The existing batch reinforcement learning method has its limits; 2) Our approach PnP-DRL significantly outperforms classical adaptive bitrate algorithms in average user Quality of Experience (QoE); 3) PnP-DRL, unlike the state-of-the-art online DRL methods, can be off and running without learning gaps, while achieving comparable performances. 
    more » « less
  5. Abstract The constant drive to achieve higher performance in deep neural networks (DNNs) has led to the proliferation of very large models. Model training, however, requires intensive computation time and energy. Memristor‐based compute‐in‐memory (CIM) modules can perform vector‐matrix multiplication (VMM) in place and in parallel, and have shown great promises in DNN inference applications. However, CIM‐based model training faces challenges due to non‐linear weight updates, device variations, and low‐precision. In this work, a mixed‐precision training scheme is experimentally implemented to mitigate these effects using a bulk‐switching memristor‐based CIM module. Low‐precision CIM modules are used to accelerate the expensive VMM operations, with high‐precision weight updates accumulated in digital units. Memristor devices are only changed when the accumulated weight update value exceeds a pre‐defined threshold. The proposed scheme is implemented with a system‐onchip of fully integrated analog CIM modules and digital sub‐systems, showing fast convergence of LeNet training to 97.73%. The efficacy of training larger models is evaluated using realistic hardware parameters and verifies that CIM modules can enable efficient mix‐precision DNN training with accuracy comparable to full‐precision software‐trained models. Additionally, models trained on chip are inherently robust to hardware variations, allowing direct mapping to CIM inference chips without additional re‐training. 
    more » « less