Title: A Novel Approach to Error Resilience in Online Reinforcement Learning
Online reinforcement learning (RL)-based systems are increasingly deployed in a variety of safety-critical applications, ranging from drone control to medical robotics. These systems typically run RL onboard rather than relying on remote operation from high-performance datacenters. Due to the dynamic nature of the environments they work in, onboard RL hardware is vulnerable to soft errors from radiation, thermal effects, and electrical noise that corrupt the results of computations. Existing approaches to online error resilience in machine learning systems rely on the availability of large training datasets to configure resilience parameters, which is not necessarily feasible for online RL systems. Similarly, other approaches involving specialized hardware or modifications to training algorithms are difficult to implement for onboard RL applications. In contrast, we present a novel error resilience approach for online RL that uses running statistics collected across the (real-time) RL training process to configure error detection thresholds, without access to a reference training dataset. In this methodology, statistical concentration bounds computed from the running statistics are used to diagnose neuron outputs as erroneous; these erroneous neuron outputs are then set to zero (suppressed). Our approach is compared against the state of the art and validated on several RL algorithms, using multiple concentration bounds, on both CPU and GPU hardware.
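As a rough illustration of the flavor of this approach (a hedged sketch, not the authors' exact method; the class name, the Chebyshev-style bound, and the multiplier k are assumptions), the code below maintains Welford-style running statistics per neuron during training and zeroes out any output that falls outside a concentration bound derived from those statistics:

```python
import numpy as np

class RunningStats:
    """Welford's online mean/variance, updated as training proceeds."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return np.sqrt(self.m2 / self.n) if self.n > 1 else np.inf

def suppress_errors(outputs, stats, k=10.0):
    """Zero any neuron output outside a Chebyshev-style bound of k std devs.

    By Chebyshev's inequality, P(|X - mean| >= k*std) <= 1/k**2 for any
    distribution, so values outside the bound are likely soft errors.
    """
    cleaned = outputs.copy()
    for i, (x, s) in enumerate(zip(outputs, stats)):
        if abs(x - s.mean) > k * s.std:
            cleaned[i] = 0.0        # suppress the suspect neuron output
        else:
            s.update(x)             # only trusted values update the stats
    return cleaned

# usage: one RunningStats object per neuron in a layer
stats = [RunningStats() for _ in range(4)]
for step in range(1000):
    layer_out = np.random.default_rng(step).normal(0.0, 1.0, 4)
    layer_out = suppress_errors(layer_out, stats)
```

With Chebyshev's inequality, values outside mean ± k·std occur with probability at most 1/k² regardless of the output distribution, which is why a distribution-free concentration bound is attractive when no reference training dataset is available.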
Award ID(s):
2128419
NSF-PAR ID:
10453096
Author(s) / Creator(s):
Date Published:
Journal Name:
International On-Line Testing Symposium
Page Range / eLocation ID:
1-6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep learning techniques have been widely adopted in daily life, with applications ranging from face recognition to recommender systems. The substantial overhead of conventional error-tolerance techniques precludes their widespread use, while approaches involving median filtering and invariant generation rely on alterations to DNN training that may be difficult to achieve for larger networks on larger datasets. To address this issue, this paper presents a novel approach that takes advantage of the statistics of neuron output gradients to identify and suppress erroneous neuron values. By using the statistics of neurons' gradients with respect to their neighbors, tighter statistical thresholds are obtained than with neuron output values alone. The approach is modular and is combined with accurate, low-overhead error detection methods to ensure it is used only when needed, further reducing its cost. Deep learning models can be trained using standard methods, and our error-correction module is fit to a trained DNN, achieving performance comparable or superior to baseline error-correction methods at comparable hardware overhead, without modifying DNN training or requiring specialized hardware architectures.
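    A hedged sketch of the gradient-statistics idea (the neighbor definition, the unbounded history, and the threshold multiplier k are illustrative assumptions, not the paper's implementation): differences between adjacent neuron outputs are tracked over time, and a neuron whose neighbor difference falls far outside the running statistics is suppressed.

```python
import numpy as np

def neighbor_gradient_filter(layer_out, history, k=6.0):
    """Suppress neurons whose output differs from neighbors far more than usual.

    layer_out: 1-D array of neuron outputs for one layer.
    history:   list of past neighbor-difference arrays (the statistics).
    """
    grads = np.abs(np.diff(layer_out))        # differences between adjacent neurons
    cleaned = layer_out.copy()
    if history:
        past = np.concatenate(history)
        mean, std = past.mean(), past.std() + 1e-9
        bad = np.where(grads > mean + k * std)[0]
        cleaned[bad + 1] = 0.0                # suppress the right-hand neuron of each flagged pair
    history.append(grads)
    return cleaned

history = []
for step in range(100):
    out = np.random.default_rng(step).normal(0.0, 1.0, 32)
    out = neighbor_gradient_filter(out, history)
```

    Because neighboring neurons in a trained layer tend to vary together, the spread of their pairwise differences is narrower than the spread of raw outputs, which is the intuition behind the tighter thresholds claimed above.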
  2. Growth-transform (GT) neurons and their population models allow for independent control over the spiking statistics and the transient population dynamics while optimizing a physically plausible distributed energy functional involving continuous-valued neural variables. In this paper, we describe a backpropagation-less learning approach to train a network of spiking GT neurons by enforcing sparsity constraints on the overall network spiking activity. The key features of the model and the proposed learning framework are: (a) spike responses are generated as a result of constraint violation and hence can be viewed as Lagrangian parameters; (b) the optimal parameters for a given task can be learned using neurally relevant local learning rules and in an online manner; (c) the network optimizes itself to encode the solution with as few spikes as possible (sparsity); (d) the network optimizes itself to operate at a solution with the maximum dynamic range and away from saturation; and (e) the framework is flexible enough to incorporate additional structural and connectivity constraints on the network. As a result, the proposed formulation is attractive for designing neuromorphic tinyML systems that are constrained in energy, resources, and network structure. In this paper, we show how the approach can be used for unsupervised and supervised learning such that minimizing the training error is equivalent to minimizing the overall spiking activity across the network. We then build on this framework to implement three different multi-layer spiking network architectures with progressively increasing flexibility in training and, consequently, sparsity. We demonstrate the applicability of the proposed algorithm for resource-efficient learning using a publicly available machine olfaction dataset with unique challenges such as sensor drift and a wide range of stimulus concentrations. In all of these case studies, we show that a GT network trained with the proposed learning approach minimizes network-level spiking activity while producing classification accuracies comparable to standard approaches on the same dataset.
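    As a loose toy illustration of feature (a) above, spikes as constraint-violation events (this is not the growth-transform model itself; the leak, bound, and penalty parameters are invented for illustration):

```python
import numpy as np

def run_neuron(drive, bound=1.0, leak=0.9, penalty=1.0):
    """Toy neuron: a spike is emitted only when the continuous state
    violates its bound, so spikes act like Lagrangian penalty events."""
    v, spikes = 0.0, []
    for x in drive:
        v = leak * v + x          # continuous-valued internal state
        if v > bound:             # constraint violated ...
            spikes.append(1)      # ... so emit a spike
            v -= penalty          # feedback pulls the state back inside the bound
        else:
            spikes.append(0)
    return np.array(spikes)

rng = np.random.default_rng(0)
weak = run_neuron(rng.uniform(0.0, 0.05, 200))
strong = run_neuron(rng.uniform(0.0, 0.30, 200))
print(weak.sum(), strong.sum())   # stronger drive -> more violations -> more spikes
```

    In this toy, lowering the input drive directly lowers the spike count, mirroring the sparsity objective: a network that encodes its solution with less constraint violation spikes less.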
  3. Abstract: For practical considerations, reinforcement learning has proven to be a difficult task outside of simulation when applied to a physical experiment. Here we derive an optimal approach to model-free reinforcement learning, achieved entirely online, through careful experimental design and algorithmic decision making. We design a reinforcement learning scheme to implement traditionally episodic algorithms for an unstable 1-dimensional mechanical environment. The training scheme is completely autonomous, requiring no human to be present throughout the learning process. We show that the pseudo-episodic technique allows for additional learning updates with off-policy actor-critic and experience replay methods. We show that including these additional updates between periods of traditional training episodes can improve the speed and consistency of learning. Furthermore, we validate the procedure on experimental hardware. In the physical environment, several algorithm variants learned rapidly, each surpassing the baseline maximum reward. The algorithms in this research are model-free and use only information obtained from an onboard sensor during training.
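    A minimal sketch of the pseudo-episodic idea (the ToyEnv/ToyAgent interfaces, dynamics, and update counts are invented stand-ins, not the paper's code): between traditional episodes, while the physical rig would be resetting, the agent keeps performing off-policy updates from an experience-replay buffer.

```python
import random
from collections import deque

class ToyEnv:
    """Stand-in for an unstable 1-D mechanical environment (illustrative)."""
    def reset(self):
        self.x, self.t = 0.0, 0
        return self.x
    def step(self, action):
        self.x += 0.1 * action + random.uniform(-0.05, 0.05)
        self.t += 1
        done = abs(self.x) > 1.0 or self.t >= 200
        return self.x, -abs(self.x), done

class ToyAgent:
    """Stand-in actor-critic; act/update signatures are assumptions."""
    def act(self, obs):
        return -obs + random.uniform(-0.2, 0.2)   # crude stabilizing policy
    def update(self, transition):
        pass                                       # learning step would go here

replay = deque(maxlen=100_000)                     # experience-replay buffer

def pseudo_episodic_training(env, agent, episodes=10, extra_updates=256):
    for _ in range(episodes):
        obs, done = env.reset(), False
        while not done:                            # traditional training episode
            act = agent.act(obs)
            nxt, rew, done = env.step(act)
            replay.append((obs, act, rew, nxt, done))
            agent.update(replay[-1])
            obs = nxt
        # Between episodes, keep learning off-policy from stored experience.
        for _ in range(min(extra_updates, len(replay))):
            agent.update(random.choice(replay))

pseudo_episodic_training(ToyEnv(), ToyAgent())
```

    The design point is that reset time on physical hardware is otherwise wasted; replay-based updates let the agent keep improving during those gaps without requiring fresh interaction.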
  4. Abstract

    Gridded monthly rainfall estimates can be used for a number of research applications, including hydrologic modeling and weather forecasting. Automated interpolation algorithms, such as the "autoKrige" function in R, can produce gridded rainfall estimates that validate well but exhibit unrealistic spatial patterns. In this work, an optimized geostatistical kriging approach is used to interpolate relative rainfall anomalies, which are then combined with long-term means to develop the gridded estimates. The optimization consists of the following: 1) determining the most appropriate offset (constant) to use when log-transforming data (a sketch of this step appears after this item); 2) eliminating poor-quality data prior to interpolation; 3) detecting erroneous maps using a machine learning algorithm; and 4) selecting the most appropriate parameterization scheme for fitting the model used in the interpolation. Results of this effort include a 30-yr (1990–2019), high-resolution (250-m) gridded monthly rainfall time series for the state of Hawai‘i. Leave-one-out cross validation (LOOCV) is performed using an extensive network of 622 observation stations. LOOCV results are in good agreement with observations (R² = 0.78; MAE = 55 mm month⁻¹; 1.4%); however, predictions can underestimate high rainfall observations (bias = 34 mm month⁻¹; −1%) due to a well-known smoothing effect that occurs with kriging. This research highlights the fact that validation statistics should not be the sole source of error assessment and that default parameterizations for automated interpolation may need to be modified to produce realistic gridded rainfall surfaces. Data products can be accessed through the Hawai‘i Climate Data Portal (HCDP; http://www.hawaii.edu/climate-data-portal).

    Significance Statement

    A new method is developed to map rainfall in Hawai‘i using an optimized geostatistical kriging approach. A machine learning technique is used to detect erroneous rainfall maps and several conditions are implemented to select the optimal parameterization scheme for fitting the model used in the kriging interpolation. A key finding is that optimization of the interpolation approach is necessary because maps may validate well but have unrealistic spatial patterns. This approach demonstrates how, with a moderate amount of data, a low-level machine learning algorithm can be trained to evaluate and classify an unrealistic map output.
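    A minimal sketch of the log-offset step referenced above (the candidate offsets and the skewness-based selection criterion are assumptions; the authors' criterion may differ):

```python
import numpy as np
from scipy.stats import skew

def log_transform(anomaly, offset):
    """Log-transform rainfall anomalies; the offset keeps zeros finite."""
    return np.log(anomaly + offset)

def back_transform(z, offset):
    return np.exp(z) - offset

def choose_offset(anomalies, candidates=(0.001, 0.01, 0.1, 1.0)):
    """Pick the offset whose transformed data looks most Gaussian
    (smallest absolute skewness) -- one plausible selection criterion."""
    return min(candidates, key=lambda c: abs(skew(log_transform(anomalies, c))))

# anomaly = observed rainfall / long-term station mean (unitless)
anoms = np.random.default_rng(0).lognormal(0.0, 0.8, 500)
off = choose_offset(anoms)
z = log_transform(anoms, off)          # interpolate z with kriging ...
recovered = back_transform(z, off)     # ... then back-transform
```

    The back-transformed anomaly surface would then be multiplied by the long-term mean rainfall surface to recover estimates in mm per month, as described in the abstract.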
