Abstract: High-efficiency energy conversion systems have become increasingly important due to their wide use in electronic systems such as data centers, smart mobile devices, electric vehicles, and medical instruments. Complex and interdependent parameters make optimal power converter designs challenging to obtain. Recent research has shown that machine learning (ML) algorithms, such as reinforcement learning (RL), hold great promise for designing such converter circuits. A trained RL agent can search for optimal design parameters for power conversion circuit topologies under targeted application requirements. Training an RL agent requires numerous circuit simulations, and significantly more training iterations are needed once the tolerance of circuit components due to manufacturing inconsistency, aging, and temperature variation is considered. As a result, training may take days to complete, primarily because of slow time-domain circuit simulation. This paper proposes a new FPGA architecture that accelerates circuit simulation and hence substantially speeds up the RL-based design method for power converters. The architecture supports all power electronic converter circuits and their variations, and it substantially improves the training speed of RL-based design methods. High-level synthesis (HLS) was used to build the accelerator on an Amazon Web Services (AWS) F1 instance, with an AWS virtual PC hosting the training algorithm. During training iterations, the host interacts with the FPGA accelerator by updating the circuit parameters, initiating simulation, and collecting the simulation results. A host-side script converts a netlist containing the circuit topology and parameters into the core matrices used by the FPGA accelerator. Experimental results showed a 60× overall speedup of our RL-based design method in comparison with a popular commercial simulator, PowerSim.
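A minimal host-side sketch of the training loop described above is given below. The `FpgaSimulator` proxy, the parameter names, and the reward terms are illustrative assumptions rather than the paper's actual AWS F1 interface, and a random-search stand-in replaces the RL agent to keep the example self-contained.

```python
import random

class FpgaSimulator:
    """Hypothetical stand-in for the host<->FPGA proxy: update parameters, run, collect results."""
    def run(self, params):
        # In the real system this would write the parameters to the accelerator,
        # start the time-domain simulation, and read back the resulting metrics.
        return {"efficiency": random.random(), "ripple": random.random()}

def reward(metrics):
    # Hypothetical reward: favor high efficiency, penalize output-voltage ripple.
    return metrics["efficiency"] - 0.5 * metrics["ripple"]

def search_design(num_iters=1000):
    sim = FpgaSimulator()
    best_params, best_r = None, float("-inf")
    for _ in range(num_iters):
        # An RL agent would propose parameters from its learned policy; random
        # sampling is used here only to keep the sketch self-contained.
        params = {"L_uH": random.uniform(1, 100), "C_uF": random.uniform(1, 1000)}
        r = reward(sim.run(params))
        if r > best_r:
            best_params, best_r = params, r
    return best_params

print(search_design())
```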
An Efficient Meta-Reinforcement Learning Approach for Circuit Linearity Calibration via Style Injection
Circuit linearity calibration can present a set of high-dimensional search problems when observability is limited. Linearity calibration of digital-to-time converters (DTCs), an essential building block of modern digital phase-locked loops (DPLLs), is one such problem: the difficulty of measuring picosecond-scale delays hinders prior methods that calibrate stage by stage. Moreover, a calibrated DTC can become nonlinear again due to changes in temperature (T) and power supply voltage (V). Prior work reports a deep reinforcement learning framework capable of performing DTC linearity calibration with nonlinear calibration banks; however, it does not address maintaining calibration in the face of temperature and supply voltage variations. In this paper, we present a meta-reinforcement learning (RL) method that enables the RL agent to quickly adapt to a new environment when the temperature and/or voltage change. Inspired by Style Generative Adversarial Networks (StyleGANs), we propose to treat temperature and voltage changes as the styles of the circuits. In contrast to traditional methods that employ circuit sensors to detect changes in T and V, we use a machine learning (ML) sensor to implicitly infer a wide range of environmental changes. The style information from the ML sensor is then injected into a small portion of the policy network, modulating its weights. As a proof of concept, we first designed a 5-bit DTC at the normal voltage (1 V) and normal temperature (27℃) corner (NVNT) as the environment. The RL agent begins its training in the NVNT environment and is then tasked with adapting to environments with different temperatures and supply voltages. Our results show that the proposed technique can reduce the Integral Non-Linearity (INL) to less than 0.5 LSB within 10,000 search steps in a changed environment. Compared to starting from a randomly initialized policy and from a previously trained policy, the proposed meta-RL approach takes 63% and 47% fewer steps, respectively, to complete the linearity calibration. Our method is also applicable to the calibration of many other kinds of analog and RF circuits.
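The PyTorch sketch below illustrates the style-injection idea as described above: a style vector produced by the ML sensor modulates the weights of one small layer of the policy network, in the spirit of StyleGAN weight modulation. Layer sizes, names, and the exact modulation rule are assumptions for illustration, not the paper's implementation.

```python
import torch
import torch.nn as nn

class StyleModulatedLinear(nn.Module):
    """Linear layer whose weights are rescaled per input channel by a style vector."""
    def __init__(self, in_dim, out_dim, style_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))
        self.to_scale = nn.Linear(style_dim, in_dim)  # style -> per-input scales

    def forward(self, x, style):
        # x: (batch, in_dim), style: (batch, style_dim)
        scale = self.to_scale(style) + 1.0                    # modulate around identity
        w = self.weight.unsqueeze(0) * scale.unsqueeze(1)     # (batch, out_dim, in_dim)
        return torch.bmm(w, x.unsqueeze(-1)).squeeze(-1) + self.bias

class StyledPolicy(nn.Module):
    """Policy network in which only one small layer is modulated by the style."""
    def __init__(self, obs_dim, act_dim, style_dim, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.styled = StyleModulatedLinear(hidden, hidden, style_dim)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, style):
        h = self.backbone(obs)
        h = torch.relu(self.styled(h, style))
        return self.head(h)  # action logits

# Usage: the style vector would come from the ML sensor observing the circuit.
policy = StyledPolicy(obs_dim=8, act_dim=32, style_dim=4)
logits = policy(torch.randn(2, 8), torch.randn(2, 4))  # (2, 32)
```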
- Award ID(s):
- 1823235
- PAR ID:
- 10488831
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- Midwest Symposium on Circuits and Systems
- ISBN:
- 979-8-3503-0210-3
- Page Range / eLocation ID:
- 10 to 14
- Subject(s) / Keyword(s):
- Meta-Reinforcement Learning, Circuit Linearity Calibration, Style Generative Adversarial Networks, Fast Adaptation.
- Format(s):
- Medium: X
- Location:
- Tempe, AZ, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
Due to repetitive trial-and-error interactions between agents and a fixed traffic environment during policy learning, existing Reinforcement Learning (RL)-based Traffic Signal Control (TSC) methods suffer from long RL training times and poor adaptability of RL agents to other complex traffic environments. To address these problems, we propose a novel Adversarial Inverse Reinforcement Learning (AIRL)-based pre-training method named InitLight, which enables effective initial model generation for TSC agents. Unlike traditional RL-based TSC approaches that train a large number of agents simultaneously for a specific multi-intersection environment, InitLight pre-trains only one initial model based on multiple single-intersection environments together with their expert trajectories. Since the reward function learned by InitLight can recover ground-truth TSC rewards for different intersections at optimality, the pre-trained agent can be deployed at intersections of any traffic environment as an initial model to accelerate subsequent overall global RL training. Comprehensive experimental results show that the initial model generated by InitLight not only significantly accelerates convergence with far fewer episodes, but also exhibits superior generalization ability to accommodate various kinds of complex traffic environments.
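As a rough sketch of the deployment idea, the snippet below copies one pre-trained initial policy into every intersection's agent before multi-intersection fine-tuning. The `PolicyNet` architecture and checkpoint file name are assumptions, not InitLight's actual code.

```python
import copy
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Placeholder per-intersection policy architecture (assumed for illustration)."""
    def __init__(self, obs_dim=16, n_phases=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_phases))

    def forward(self, obs):
        return self.net(obs)  # logits over signal phases

def init_agents(num_intersections, pretrained_path="initlight_pretrained.pt"):
    template = PolicyNet()
    # One pre-trained initial model is loaded once ...
    template.load_state_dict(torch.load(pretrained_path))
    # ... and copied to every intersection, which then fine-tunes in its own environment.
    return [copy.deepcopy(template) for _ in range(num_intersections)]
```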
-
Decision-making under uncertainty (DMU) is present in many important problems. An open challenge is DMU in non-stationary environments, where the dynamics of the environment can change over time. Reinforcement Learning (RL), a popular approach for DMU problems, learns a policy by interacting with a model of the environment offline. Unfortunately, if the environment changes, the policy can become stale and take sub-optimal actions, and relearning the policy for the updated environment takes time and computational effort. An alternative is online planning approaches such as Monte Carlo Tree Search (MCTS), which perform their computation at decision time. Given the current environment, MCTS plans using high-fidelity models to determine promising action trajectories. These models can be updated as soon as environmental changes are detected to immediately incorporate them into decision making. However, MCTS's convergence can be slow for domains with large state-action spaces. In this paper, we present a novel hybrid decision-making approach that combines the strengths of RL and planning while mitigating their weaknesses. Our approach, called Policy Augmented MCTS (PA-MCTS), integrates a policy's action-value estimates into MCTS, using the estimates to seed the action trajectories favored by the search. We hypothesize that PA-MCTS will converge more quickly than standard MCTS while making better decisions than the policy can make on its own when faced with non-stationary environments. We test our hypothesis by comparing PA-MCTS with pure MCTS and an RL agent applied to the classical CartPole environment. We find that PA-MCTS can achieve higher cumulative rewards than the policy in isolation under several environmental shifts while converging in significantly fewer iterations than pure MCTS.
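A hedged sketch of the blending step is shown below: the action score at a tree node mixes the pre-trained policy's action-value estimate with the value observed by the search, plus a standard exploration bonus. The exact weighting and exploration term used in the paper may differ; `alpha` is an assumed blend factor.

```python
import math

def pa_mcts_score(q_policy, q_search, visits, parent_visits, alpha=0.5, c_uct=1.4):
    """Blend the policy's action-value estimate with the tree-search statistics."""
    exploration = c_uct * math.sqrt(math.log(parent_visits + 1) / (visits + 1))
    return alpha * q_policy + (1 - alpha) * q_search + exploration

def select_action(actions, q_policy, q_search, visits, parent_visits):
    # Standard MCTS tree-policy step, but scored with the blended estimate above.
    return max(actions, key=lambda a: pa_mcts_score(
        q_policy[a], q_search[a], visits[a], parent_visits))

# Example: the policy favors action 1 while early search returns favor action 0.
print(select_action([0, 1],
                    q_policy={0: 0.2, 1: 0.9},
                    q_search={0: 0.6, 1: 0.1},
                    visits={0: 3, 1: 1},
                    parent_visits=4))
```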
-
Interactive reinforcement learning (IRL) agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, such a signal can be difficult to generalize on its own. In this work, we explore how natural language advice can provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known IRL technique. Policy shaping usually employs a human feedback policy to help an agent learn how to achieve its goal. In our case, we replace this human feedback policy with a policy generated from natural language advice. We aim to examine whether the generated natural language reasoning helps a deep RL agent decide its actions successfully in any given environment. We therefore design our model with three networks: an experience-driven network, an advice generator, and an advice-driven network. While the experience-driven RL agent chooses its actions based on the environmental reward, the advice-driven network uses the feedback produced by the advice generator for each new state to select actions that assist the RL agent through better policy shaping.
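To make the policy-shaping step concrete, the sketch below fuses the experience-driven and advice-driven action distributions multiplicatively, which is the classic policy-shaping combination rule; the paper's exact fusion mechanism may differ, and the example distributions are invented.

```python
import numpy as np

def shape_policy(p_experience, p_advice, eps=1e-8):
    """Fuse the experience-driven and advice-driven action distributions."""
    fused = (np.asarray(p_experience) + eps) * (np.asarray(p_advice) + eps)
    return fused / fused.sum()

# Example: experience mildly prefers action 0, advice strongly prefers action 2;
# the shaped distribution shifts most of its mass toward action 2.
print(shape_policy([0.4, 0.3, 0.3], [0.1, 0.1, 0.8]))
```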
-
Pedagogical planners can provide adaptive support to students in narrative-centered learning environments by dynamically scaffolding student learning and tailoring problem scenarios. Reinforcement learning (RL) is frequently used for pedagogical planning in narrative-centered learning environments. However, RL-based pedagogical planning raises significant challenges due to the scarcity of data for training RL policies. Most prior work has relied on limited-size datasets and offline RL techniques for policy learning. Unfortunately, offline RL techniques do not support on-demand exploration and evaluation, which can adversely impact the quality of induced policies. To address the limitations of data scarcity and offline RL, we propose INSIGHT, an online RL framework for training data-driven pedagogical policies that optimize student learning in narrative-centered learning environments. The INSIGHT framework consists of three components: a narrative-centered learning environment simulator, a simulated student agent, and an RL-based pedagogical planner agent, which uses a reward metric associated with effective student learning processes. The framework enables the generation of synthetic data for on-demand exploration and evaluation of RL-based pedagogical planning. We have implemented INSIGHT with OpenAI Gym for a narrative-centered learning environment testbed with rule-based simulated student agents and a deep Q-learning-based pedagogical planner. Our results show that online deep RL algorithms can induce near-optimal pedagogical policies in the INSIGHT framework, while offline deep RL algorithms only find suboptimal policies even with large amounts of data.
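As a rough illustration of how such a simulator might be exposed through the classic OpenAI Gym interface, the sketch below wraps a toy rule-based student model as a Gym environment. The state variables, actions, and reward are invented for illustration and follow the pre-0.26 Gym step/reset signatures, not INSIGHT's actual testbed.

```python
import gym
import numpy as np
from gym import spaces

class TutoringEnv(gym.Env):
    """Pedagogical planner chooses a scaffolding action; a rule-based student responds."""
    def __init__(self):
        super().__init__()
        self.action_space = spaces.Discrete(4)  # e.g. hint, scaffold, new scenario, no-op
        self.observation_space = spaces.Box(0.0, 1.0, shape=(3,), dtype=np.float32)
        self.state = None

    def reset(self):
        # state = [knowledge, engagement, narrative progress]
        self.state = np.array([0.2, 0.5, 0.0], dtype=np.float32)
        return self.state

    def step(self, action):
        # Rule-based simulated student: supportive actions raise knowledge faster.
        knowledge_gain = 0.05 if action in (0, 1) else 0.01
        self.state = np.clip(
            self.state + np.array([knowledge_gain, 0.0, 0.1], dtype=np.float32), 0.0, 1.0)
        reward = float(self.state[0])          # reward tied to student learning
        done = bool(self.state[2] >= 1.0)      # episode ends when the narrative completes
        return self.state, reward, done, {}
```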