Title: Soft Error Resilience Analysis of LSTM Networks
Long Short-Term Memory (LSTM) deep neural networks can accomplish diverse tasks, such as image captioning and speech recognition. However, they remain susceptible to transient faults when deployed in environments with high-energy particles or radiation, and how such transient faults impact LSTM models remains largely unknown. We therefore investigate the resilience of the weights and biases of these networks through four implementations of the original LSTM network. Based on observations from fault injection into these networks, we propose an effective fault-mitigation method based on Hamming encoding of selected weights and biases in a given network.
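The mitigation above relies on Hamming encoding of selected weights and biases. As a hedged illustration of the underlying mechanism (the abstract does not specify the exact code parameters used), a Hamming(7,4) code protects each 4-bit chunk of a quantized parameter and corrects any single bit flip:

```python
def hamming74_encode(nibble):
    """Encode a 4-bit value into a 7-bit Hamming(7,4) codeword (as an int)."""
    d = [(nibble >> i) & 1 for i in range(4)]   # d[0] is the LSB
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    bits = [p1, p2, d[0], p3, d[1], d[2], d[3]]  # codeword positions 1..7
    return sum(b << i for i, b in enumerate(bits))

def hamming74_decode(codeword):
    """Decode a 7-bit codeword, correcting any single flipped bit."""
    c = [(codeword >> i) & 1 for i in range(7)]  # c[0] = position 1
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s3 << 2)        # 0 means no error
    if syndrome:
        c[syndrome - 1] ^= 1                     # flip the erroneous bit
    return c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
```

A single-bit upset in a stored codeword is located via the syndrome and corrected before the parameter value is reconstructed.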
Award ID(s):
2019511
PAR ID:
10529732
Author(s) / Creator(s):
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400706059
Page Range / eLocation ID:
328 to 332
Subject(s) / Keyword(s):
Deep Neural Networks, Fault Tolerance, LSTM
Format(s):
Medium: X
Location:
Clearwater FL USA
Sponsoring Org:
National Science Foundation
More Like This
  1. Landmark universal function approximation results for neural networks with trained weights and biases provided the impetus for the ubiquitous use of neural networks as learning models in neuroscience and Artificial Intelligence (AI). Recent work has extended these results to networks in which a smaller subset of weights (e.g., output weights) are tuned, leaving other parameters random. However, it remains an open question whether universal approximation holds when only biases are learned, despite evidence from neuroscience and AI that biases significantly shape neural responses. The current paper answers this question. We provide theoretical and numerical evidence demonstrating that feedforward neural networks with fixed random weights can approximate any continuous function on compact sets. We further show an analogous result for the approximation of dynamical systems with recurrent neural networks. Our findings are relevant to neuroscience, where they demonstrate the potential for behaviourally relevant changes in dynamics without modifying synaptic weights, as well as for AI, where they shed light on recent fine-tuning methods for large language models, like bias and prefix-based approaches. 
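The bias-only approximation result above can be illustrated with a toy experiment: fix random weights in a one-hidden-layer ReLU network and run gradient descent on the hidden biases alone. This is a minimal sketch, not the paper's construction; the network size, target function, and learning rate are arbitrary illustrative choices.

```python
import math
import random

random.seed(0)

# Fixed random weights; ONLY the hidden biases b are trained.
H = 80                                          # hidden units (assumed size)
W = [random.gauss(0, 1) for _ in range(H)]      # fixed input weights
v = [random.gauss(0, 1) / H for _ in range(H)]  # fixed output weights
xs = [i / 16 - 1 for i in range(33)]            # grid on the compact set [-1, 1]
ys = [math.sin(3 * x) for x in xs]              # target function

def predict(b, x):
    """One-hidden-layer ReLU network; only b varies."""
    return sum(v[j] * max(W[j] * x + b[j], 0.0) for j in range(H))

def mse(b):
    return sum((predict(b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

b = [0.0] * H
lr = 2.0
loss_before = mse(b)
for _ in range(500):
    grad = [0.0] * H                            # dMSE/db via the ReLU subgradient
    for x, y in zip(xs, ys):
        err = predict(b, x) - y
        for j in range(H):
            if W[j] * x + b[j] > 0:
                grad[j] += 2 * err * v[j] / len(xs)
    b = [bj - lr * gj for bj, gj in zip(b, grad)]
loss_after = mse(b)
```

Training the biases alone shifts where each fixed random feature activates, which is enough to reduce the approximation error on the grid.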
  2. SUMMARY Earthquakes come in clusters formed mostly of aftershock sequences, swarms and occasional foreshock sequences. This clustering is thought to result either from stress transfer among faults, a process referred to as cascading, or from transient loading by aseismic slip (pre-slip, afterslip or slow slip events). The ETAS statistical model is often used to quantify the fraction of clustering due to stress transfer and to assess the possible need for aseismic slip to explain foreshocks or swarms. Another popular model of clustering relies on the earthquake nucleation model derived from experimental rate-and-state friction. According to this model, earthquakes cluster because they are time-advanced by the stress change imparted by the mainshock. This model ignores stress interactions among aftershocks and cannot explain foreshocks or swarms in the absence of transient loading. Here, we analyse foreshock, swarm and aftershock sequences resulting from cascades in a Discrete Fault Network model governed by rate-and-state friction. We show that the model produces realistic swarms, foreshocks and aftershocks. The Omori law, characterizing the temporal decay of aftershocks, emerges in all simulations independently of the assumed initial condition. In our simulations, the Omori law results from the earthquake nucleation process due to rate-and-state friction and from the heterogeneous stress changes due to the coseismic stress transfers. By contrast, the inverse Omori law, which characterizes the accelerating rate of foreshocks, emerges only in the simulations with a dense enough fault system. A high-density complex fault zone favours fault interactions and the emergence of an accelerating sequence of foreshocks. Seismicity catalogues generated with our discrete fault network model can generally be fitted with the ETAS model, but with some material differences.
In the discrete fault network simulations, fault interactions are weaker in aftershock sequences because they occur in a broader zone of lower fault density and because of the depletion of critically stressed faults. The productivity of the cascading process is, therefore, significantly higher in foreshocks than in aftershocks if fault zone complexity is high. This effect is not captured by the ETAS model of fault interactions. It follows that a foreshock acceleration stronger than expected from ETAS statistics does not necessarily require aseismic slip preceding the mainshock (pre-slip). It can be a manifestation of a cascading process enhanced by the topological properties of the fault network. Similarly, earthquake swarms might not always imply transient loading by aseismic slip, as they can emerge from stress interactions. 
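The Omori law referenced above describes how the aftershock rate decays with time after the mainshock; its modified form is n(t) = K/(c + t)^p. A minimal numerical sketch (K, c, and p are illustrative assumptions, not values from the study):

```python
# Modified Omori law: aftershock rate n(t) = K / (c + t)**p.
# K, c, p below are illustrative values, not fitted to any catalogue.
K, c, p = 100.0, 0.1, 1.1

def omori_rate(t):
    """Aftershock rate at time t after the mainshock."""
    return K / (c + t) ** p

def omori_count(t1, t2):
    """Expected number of aftershocks in [t1, t2] (closed form, valid for p != 1)."""
    return K / (1 - p) * ((c + t2) ** (1 - p) - (c + t1) ** (1 - p))
```

The closed form follows from integrating n(t); a direct numerical integration of the rate should agree with it.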
  3. Abstract Slow slip is part of the earthquake cycle, but the processes controlling this phenomenon in space and time are poorly constrained. Hematite, common in continental fault zones, exhibits unique textures and (U-Th)/He thermochronometry data patterns reflecting different slip rates. We investigated networks of small hematite-coated slip surfaces in basement fault damage of exhumed strike-slip faults that connect to the southern San Andreas fault in a flower structure in the Mecca Hills, California, USA. Scanning electron microscopy shows these millimeter-thick surfaces exhibit basal hematite injection veins and layered veinlets comprising nanoscale, high-aspect-ratio hematite plates akin to phyllosilicates. Combined microstructural and hematite (U-Th)/He data (n = 64 new, 24 published individual analyses) record hematite mineralization events ca. 0.8 Ma to 0.4 Ma at <1.5 km depth. We suggest these hematite faults formed via fluid overpressure, and then hematite localized repeated subseismic slip, creating zones of shallow off-fault damage as far as 4 km orthogonal to the trace of the southern San Andreas fault. Distributed hematite slip surfaces develop by, and then accommodate, transient slow slip, potentially dampening or distributing earthquake energy in shallow continental faults. 
  4. The inductive biases of trained neural networks are difficult to understand and, consequently, to adapt to new settings. We study the inductive biases of linearizations of neural networks, which we show to be surprisingly good summaries of the full network functions. Inspired by this finding, we propose a technique for embedding these inductive biases into Gaussian processes through a kernel designed from the Jacobian of the network. In this setting, domain adaptation takes the form of interpretable posterior inference, with accompanying uncertainty estimation. This inference is analytic and free of local optima issues found in standard techniques such as fine-tuning neural network weights to a new task. We develop significant computational speed-ups based on matrix multiplies, including a novel implementation for scalable Fisher vector products. Our experiments on both image classification and regression demonstrate the promise and convenience of this framework for transfer learning, compared to neural network fine-tuning.
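The linearization idea above replaces a network with its first-order Taylor expansion in the parameters, and the kernel for the Gaussian process is built from the parameter Jacobian, k(x, x') = J(x)·J(x')ᵀ. A minimal sketch with a toy one-unit model and a numerical Jacobian (the model, its parameters, and the unscaled kernel are illustrative assumptions, not the paper's implementation):

```python
import math

# Tiny model f(x; theta) with theta = (w, b, v); purely illustrative.
def f(x, theta):
    w, b, v = theta
    return v * math.tanh(w * x + b)

def jacobian(x, theta, eps=1e-6):
    """Numerical Jacobian of f(x; .) with respect to the parameters."""
    base = f(x, theta)
    J = []
    for i in range(len(theta)):
        t = list(theta)
        t[i] += eps
        J.append((f(x, t) - base) / eps)
    return J

theta0 = [0.5, -0.2, 1.3]   # assumed "trained" parameters

# Linearization: f(x; theta0 + d) ~ f(x; theta0) + J(x) . d
def f_lin(x, d):
    J = jacobian(x, theta0)
    return f(x, theta0) + sum(Ji * di for Ji, di in zip(J, d))

# The Jacobian-based kernel that embeds the network's inductive bias in a GP:
def kernel(x1, x2):
    J1, J2 = jacobian(x1, theta0), jacobian(x2, theta0)
    return sum(a * b for a, b in zip(J1, J2))
```

For small parameter perturbations the linearized model tracks the full model closely, and the resulting kernel is symmetric and positive semi-definite by construction.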
  5. Smaller transistor feature sizes have made integrated circuits (ICs) more vulnerable to permanent faults, leading to shorter lifetimes and an increased risk of faults that cause catastrophic errors. Fortunately, Artificial Neural Networks (ANNs) are error resilient, as their accuracy can be maintained through, e.g., fault-aware re-training. One problem with previous work, though, is that it requires a re-design of the individual neuron processing element structure in order to deal efficiently with these faults. In this work, we propose a novel architecture combined with a design flow that performs fault-aware weight re-assignment to minimize the effect of permanent faults on the accuracy of ANNs mapped to an AI accelerator, without the need for time-consuming fault-aware re-training or neuron processing element re-design. In particular, we target Tensor Processing Units (TPUs), although our proposed approach is extensible to other architectures. Experimental results show that our proposed approach can be executed efficiently on a fast dedicated hardware re-binding unit or in software.
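As a hedged sketch of the re-assignment idea (not the paper's actual algorithm), the effect of permanent faults can be reduced by permuting which weight rows map onto which accelerator lanes, so that the least-important rows land on lanes with assumed stuck-at-zero faults:

```python
# Illustrative fault-aware weight re-assignment: rows of a weight matrix are
# mapped to accelerator lanes; rows with the smallest L1 norm (a simple,
# assumed importance proxy) are assigned to the permanently faulty lanes.

def reassign(weights, faulty_lanes):
    """Return an assignment list: lane index -> weight-row index."""
    n = len(weights)
    # Row indices ordered from least to most important (by L1 norm).
    importance = sorted(range(n), key=lambda r: sum(abs(w) for w in weights[r]))
    faulty = sorted(faulty_lanes)
    healthy = [lane for lane in range(n) if lane not in faulty_lanes]
    assignment = [None] * n
    # Least-important rows absorb the faulty lanes...
    for lane, row in zip(faulty, importance[:len(faulty)]):
        assignment[lane] = row
    # ...and the remaining rows fill the healthy lanes.
    for lane, row in zip(healthy, importance[len(faulty):]):
        assignment[lane] = row
    return assignment
```

Because only the lane-to-row mapping changes, such a re-binding can run on a small dedicated unit or in software without re-training the network.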