<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Dynamic Reliability Management in Neuromorphic Computing</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>07/19/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10317041</idno>
					<idno type="doi">10.1145/3462330</idno>
					<title level='j'>ACM Journal on Emerging Technologies in Computing Systems</title>
<idno>1550-4832</idno>
<biblScope unit="volume">17</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Shihao Song</author><author>Jui Hanamshet</author><author>Adarsha Balaji</author><author>Anup Das</author><author>Jeffrey L. Krichmar</author><author>Nikil D. Dutt</author><author>Nagarajan Kandasamy</author><author>Francky Catthoor</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Neuromorphic computing systems execute machine learning tasks designed with spiking neural networks. These systems are embracing non-volatile memory to implement high-density and low-energy synaptic storage. Elevated voltages and currents needed to operate non-volatile memories cause aging of CMOS-based transistors in each neuron and synapse circuit in the hardware, drifting the transistor’s parameters from their nominal values. If these circuits are used continuously for too long, the parameter drifts cannot be reversed, resulting in permanent degradation of circuit performance over time, eventually leading to hardware faults. Aggressive device scaling increases power density and temperature, which further accelerates the aging, challenging the reliable operation of neuromorphic systems. Existing reliability-oriented techniques periodically de-stress all neuron and synapse circuits in the hardware at fixed intervals, assuming worst-case operating conditions, without actually tracking their aging at run-time. To de-stress these circuits, normal operation must be interrupted, which introduces latency in spike generation and propagation, impacting the inter-spike interval and hence, performance (e.g., accuracy). We observe that in contrast to long-term aging, which permanently damages the hardware, short-term aging in scaled CMOS transistors is mostly due to bias temperature instability. The latter is heavily workload-dependent and, more importantly, partially reversible. We propose a new architectural technique to mitigate the aging-related reliability problems in neuromorphic systems by designing an intelligent run-time manager (NCRTM), which dynamically de-stresses neuron and synapse circuits in response to the short-term aging in their CMOS transistors during the execution of machine learning workloads, with the objective of meeting a reliability target. 
NCRTM de-stresses these circuits only when absolutely necessary; otherwise, it reduces the performance impact by scheduling de-stress operations off the critical path. We evaluate NCRTM with state-of-the-art machine learning workloads on neuromorphic hardware. Our results demonstrate that NCRTM significantly improves the reliability of neuromorphic hardware with a marginal impact on performance.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Spiking Neural Networks (SNNs) <ref type="bibr">[60]</ref> are machine learning approaches designed with spike-based computations <ref type="bibr">[46]</ref> and bio-inspired learning algorithms <ref type="bibr">[15]</ref> (see Appendix A for background on SNNs). SNN-based workloads are typically executed on event-driven neuromorphic hardware such as TrueNorth <ref type="bibr">[35]</ref>, Loihi <ref type="bibr">[34]</ref>, and DYNAP-SE <ref type="bibr">[64]</ref>. These hardware platforms are extremely energy-efficient, thanks to their event-driven activation and their tile-based distributed architecture with in-place neural computation and synaptic storage <ref type="bibr">[72]</ref>. We investigated the internal architecture of neurons and synapses in DYNAP-SE (see Figures <ref type="figure">3b</ref> and <ref type="figure">4b</ref>) and found that these circuits consist of transistors built using bulk CMOS or FinFET technologies <ref type="bibr">[1,</ref><ref type="bibr">18,</ref><ref type="bibr">44]</ref>. <ref type="foot">1</ref> When a transistor is operated at high voltage and temperature, its parameters drift strongly from their nominal values. This is called aging. In scaled technology nodes, aging occurs even under nominal conditions and from the very start of device usage, leading to the so-called soft breakdown. The most important breakdown mechanism is Bias Temperature Instability (BTI) <ref type="bibr">[50,</ref><ref type="bibr">51,</ref><ref type="bibr">99]</ref>. Because it depends strongly on the workload, BTI is highly variable, and it is largely reversible under nominal conditions once the stress voltage is removed. It therefore leads only to parametric time-dependent variability, affecting mainly delay and leakage power. 
If the neurons and synapses in neuromorphic hardware are used continuously for a long duration under elevated operating conditions, the parameter drifts cannot be reversed <ref type="bibr">[100]</ref>, leading to permanent functional degradation of the circuit and, eventually, hardware faults <ref type="bibr">[53,</ref><ref type="bibr">68,</ref><ref type="bibr">91]</ref>. The permanent fault rate of integrated circuits (ICs) over time can be described by the bathtub curve shown in Figure <ref type="figure">1</ref>. Immediately after manufacturing, ICs exhibit high failure rates; manufacturing tests such as stuck-at, at-speed, and burn-in filter out defective circuits and circuits with short lifetimes. The probability that the surviving circuits last longer increases, so the failure rate decreases over time. This phase is known as the infant mortality period. It is followed by a period of constant failure rate, often referred to as the useful life. The last phase, known as the wear-out or aging phase, is characterized by an increasing fault rate. Recent reliability studies reveal that if wear-out is not addressed from an early stage of device usage (e.g., the beginning of the useful life period), circuits can age faster than anticipated, with the wear-out phase setting in earlier (shown by the red dashed line in the figure).</p><p>To address time-dependent variability or aging, circuit designers often set worst-case, and hence highly pessimistic, reliability-related design margins, which unnecessarily constrain performance. 
Our objective is to analyze circuit aging in neuromorphic hardware in real time and to take corrective measures at the architecture level that reverse the parameter drifts, based on how a machine learning workload utilizes the neuron and synapse circuits.</p><p>Recently, Non-Volatile Memory (NVM) has been used in neuromorphic hardware to implement high-density and low-energy synaptic storage <ref type="bibr">[14]</ref>. Several NVMs have been explored for this purpose: Oxide-based Resistive RAM (OxRRAM) <ref type="bibr">[62]</ref>, Phase Change Memory (PCM) <ref type="bibr">[66]</ref>, Ferro-Electric RAM <ref type="bibr">[65]</ref>, and Spin-Transfer Torque or Spin-Orbit Torque Magnetic RAM (STT- and SOT-MRAM) <ref type="bibr">[97]</ref>. <ref type="foot">2</ref> NVMs require either high voltage (OxRRAM, PCM, and FeFET) or high current (MRAM) to operate, which accelerates the aging of transistors in the neuron and synapse circuits of neuromorphic hardware <ref type="bibr">[2,</ref><ref type="bibr">55,</ref><ref type="bibr">83,</ref><ref type="bibr">84,</ref><ref type="bibr">88]</ref>. Aggressive device scaling increases power density and temperature, which makes reliability even worse. Circuit aging is therefore emerging as one of the primary reliability concerns for neuromorphic hardware designed with NVMs <ref type="bibr">[16]</ref>.</p><p>The reliability problem we address in this work is caused by the high-voltage operation of NVMs. This problem can also occur in other system contexts, <ref type="foot">3</ref> but it is a particular issue for SNNs for the following reason. To address the high-voltage NVM problem, the peripheral circuits must be de-stressed periodically, which impacts the inter-spike interval (ISI) when machine learning models are executed on these circuits. The performance (e.g., accuracy) of SNNs depends on ISI. 
Therefore, the reliability issues of NVMs lead to performance issues in SNNs.</p><p>Prior works on mapping machine learning workloads to neuromorphic hardware have mostly focused on compilation techniques, with the objective of improving machine learning performance on hardware. Examples of such approaches include hardware utilization-based mapping <ref type="bibr">[3,</ref><ref type="bibr">4,</ref><ref type="bibr">8,</ref><ref type="bibr">10,</ref><ref type="bibr">11,</ref><ref type="bibr">24,</ref><ref type="bibr">40,</ref><ref type="bibr">47,</ref><ref type="bibr">48,</ref><ref type="bibr">86]</ref>, energy-based mapping <ref type="bibr">[9,</ref><ref type="bibr">32,</ref><ref type="bibr">94]</ref>, and endurance-based mapping <ref type="bibr">[92,</ref><ref type="bibr">93,</ref><ref type="bibr">95]</ref>. The recently proposed RENEU <ref type="bibr">[88]</ref> is the only compile-time technique that maps neurons and synapses to the hardware to improve long-term (i.e., lifetime) reliability. Although compile-time aging mitigation approaches have unique advantages such as low computation overhead, predictability, and performance guarantees, they are often conservative and may therefore miss significant opportunities to improve performance and reliability. Dynamic approaches are flexible, adaptive, and potentially more effective in highly dynamic environments, such as those where the inference data deviate strongly from the training examples. We show that both performance and reliability can be improved significantly if neuron and synapse circuits are de-stressed at run-time based on current data.</p><p>Very few approaches address the run-time management of neuromorphic computing. <ref type="foot">4</ref> In <ref type="bibr">[10]</ref>, the authors propose a fast approach to remap online-learning SNNs on neuromorphic hardware after every learning epoch to improve model performance. 
In DTRO <ref type="bibr">[2]</ref>, the authors propose a hybrid approach that estimates the reliability degradation of machine learning workloads at design time using training data and uses this estimate to de-stress all hardware circuits at run-time at fixed intervals, without actually tracking circuit aging. This approach is limited to supervised techniques, and its effectiveness depends on the availability of representative training data. In this work, we make the following three key observations.</p><p>&#8226; Observation 1: The workload, which includes the synaptic weights and their activation on neuromorphic hardware, is specific to the machine learning task being executed and to its input.</p><p>&#8226; Observation 2: De-stressing all circuits in the hardware periodically, without tracking the actual aging, introduces long latency in spike generation and propagation, which impacts the inter-spike interval and leads to information loss in SNNs.</p><p>&#8226; Observation 3: Compared to long-term aging under elevated stress conditions, which is permanent and irreversible, short-term aging under nominal conditions is heavily workload-dependent (and hence, to some extent, controllable) and partially reversible.</p><p>Based on these three observations, we introduce NCRTM, a run-time reliability manager for neuromorphic hardware that de-stresses neuron and synapse circuits only when needed, by dynamically tracking their short-term aging during the execution of machine learning tasks. NCRTM extends our earlier work DTRO <ref type="bibr">[2]</ref> with the following new contributions.</p><p>&#8226; We introduce an intelligent run-time manager, NCRTM, which improves the long-term reliability of neuromorphic hardware by controlling its short-term aging when executing machine learning tasks.</p><p>&#8226; We develop a run-time performance monitoring and reliability estimation framework using statistics collected from the neuromorphic hardware.</p><p>&#8226; We show that NCRTM can be applied to both supervised and unsupervised machine learning approaches and to scenarios where the number of training examples is limited.</p><p>&#8226; We evaluate NCRTM with machine learning workloads designed using Convolutional Neural Network (CNN), Multi-layer Perceptron (MLP), and Recurrent Neural Network (RNN) models on a state-of-the-art neuromorphic hardware simulator.</p><p>Overall, NCRTM mitigates aging-related reliability problems in neuromorphic computing by dynamically de-stressing neuron and synapse circuits in response to their short-term aging, with the objective of meeting a reliability target. NCRTM de-stresses these circuits only when absolutely necessary; otherwise, it reduces the performance impact by scheduling de-stress operations off the critical path, tracking their latency impact on the inter-spike interval (ISI), a key performance measure in SNNs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">COMPARISON WITH STATE-OF-THE-ART</head><p>Figure <ref type="figure">2</ref> illustrates how the proposed approach differs from two reliability-oriented state-of-the-art approaches. Figure <ref type="figure">2a</ref> illustrates a design-time approach such as RENEU <ref type="bibr">[88]</ref>, where neurons and synapses are mapped to the hardware to increase long-term (i.e., lifetime) reliability. This approach estimates the aging of neuron and synapse circuits using representative training examples. No corrective online measures are in place to control the aging should it exceed a critical threshold or should the workload behavior change, for instance, when unseen data are encountered at run-time. Figure <ref type="figure">2b</ref> illustrates a hybrid approach such as DTRO <ref type="bibr">[2]</ref>, where neurons and synapses are mapped to the hardware using a reliability-oriented mapping technique (e.g., RENEU <ref type="bibr">[88]</ref>). Additionally, all neuron and synapse circuits in the hardware are periodically de-stressed to control the aging, with the de-stress interval determined using training examples. Such an approach has two drawbacks. First, because it does not track the actual aging in real time, it can introduce significant latency by interrupting normal operation even when the aging is well below the critical threshold. This is especially critical for SNNs, because the performance of machine learning workloads (e.g., their accuracy) depends on the precise timing of spikes (see Appendix A). Second, the effectiveness of such a hybrid approach depends heavily on the training data, which may not always be representative. In fact, hybrid approaches have significant limitations for unsupervised applications and for applications with limited training data. 
Figure <ref type="figure">2c</ref> illustrates NCRTM, the proposed run-time approach for reliability management in neuromorphic hardware. NCRTM tracks the aging of neuron and synapse circuits in real time during the execution of machine learning workloads and de-stresses these circuits only when their aging exceeds a critical threshold. Because aging is tracked and controlled at run-time, NCRTM makes its de-stress decisions based on current data. NCRTM is therefore applicable to both supervised and unsupervised machine learning approaches.</p><p>To the best of our knowledge, NCRTM is the first work on run-time reliability management of neuromorphic hardware. In Section 7, we evaluate NCRTM against these state-of-the-art reliability management approaches using both supervised and unsupervised machine learning workloads.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">BACKGROUND</head><p>In this section, we introduce the background necessary to understand our proposed run-time manager NCRTM. Background on SNNs is provided in Appendix A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Neuromorphic Hardware</head><p>We consider tile-based neuromorphic hardware <ref type="bibr">[6,</ref><ref type="bibr">17]</ref>, where the tiles are interconnected using a network-on-chip (NoC) or a Segmented Bus <ref type="bibr">[7]</ref>. This is similar to several contemporary neuromorphic architectures such as DYNAP-SE <ref type="bibr">[64]</ref>, Loihi <ref type="bibr">[34]</ref>, and TrueNorth <ref type="bibr">[35]</ref>. Each tile consists of a crossbar for synaptic storage, a set of input and output neurons, and a performance monitoring unit, which in its simplest form is a spike counter (SC). A crossbar, shown in Figure <ref type="figure">3a</ref>, is a 2D organization of top electrodes (TEs), which form the rows, and bottom electrodes (BEs), which form the columns. A synaptic cell is connected at each crosspoint, i.e., at the intersection of a row and a column, via an access transistor, as shown in Figure <ref type="figure">3b</ref>. Pre-synaptic neurons are mapped along the TEs and post-synaptic neurons along the BEs. The synaptic weight between a pre- and a post-synaptic neuron is programmed as the conductance of the corresponding synaptic cell at the crosspoint. A pre-synaptic neuron's voltage <formula notation="TeX">V_i</formula>, applied on the TE, is multiplied by the conductance <formula notation="TeX">G_i</formula> to generate the current <formula notation="TeX">I_i = V_i \cdot G_i</formula> (Ohm's Law). This current propagates to the post-synaptic neuron to raise its membrane potential. When excitation from several pre-synaptic neurons is integrated, the currents are summed on each BE according to Kirchhoff's Current Law, implementing <formula notation="TeX">\sum_i I_i = \sum_i V_i \cdot G_i</formula>. This is the in-memory multiply-accumulate operation implemented inside a crossbar. 
Figure <ref type="figure">3a</ref> illustrates the integration of input excitation from two pre-synaptic neurons into one post-synaptic neuron via the synaptic weights <formula notation="TeX">w_1</formula> and <formula notation="TeX">w_2</formula>, respectively. This forms the data plane of the neuromorphic hardware. The control plane consists of control signals that enable specific access transistors (see Figure <ref type="figure">3b</ref>) to allow current to flow in the crossbar. The NVM device of a synaptic cell, shown as a resistive element in Figure <ref type="figure">3b</ref>, can be implemented, for instance, with HfO2-based OxRRAM or chalcogenide-based PCM, as shown in Figure <ref type="figure">3c</ref>. However, our approach is not limited to these specific NVM technologies. Figure <ref type="figure">4a</ref> shows the currents and voltages on the path from the pre-synaptic neuron <formula notation="TeX">N_i</formula> to the post-synaptic neuron <formula notation="TeX">N_j</formula>. The input current <formula notation="TeX">I_i^{in}</formula> is converted into the voltage <formula notation="TeX">V_i^{spk}</formula> by neuron <formula notation="TeX">N_i</formula>. This voltage is multiplied by the conductance <formula notation="TeX">G_i</formula> (representing the synaptic weight) to generate the current <formula notation="TeX">I_j^{in}</formula>, which is converted into the voltage <formula notation="TeX">V_j^{spk}</formula> by neuron <formula notation="TeX">N_j</formula>. Figure <ref type="figure">4b</ref> illustrates the internal architecture of a Leaky-Integrate-and-Fire neuron <ref type="bibr">[45]</ref>. The current <formula notation="TeX">I_{inj}</formula> injected into the neuron is proportional to the weighted sum of excitation from all of its pre-synaptic connections. The PMOS and NMOS transistors in the neuron and the reference voltages raise the neuron's membrane voltage. When the voltage crosses a threshold, a spike is generated. 
The spike voltage (<formula notation="TeX">V_{spk}</formula>) must be sufficiently high to propagate current through the synaptic cell connected at the output of the neuron.</p><p>Certain NVMs, such as PCM, OxRRAM, and FeFET, require high voltages to operate. We consider the case of PCM-based neuromorphic hardware, which requires &#8776; 3V <ref type="bibr">[87]</ref> to propagate current through the cell. This high voltage causes aging of the access transistor in each synaptic cell of a crossbar, as well as of the transistors in each neuron connected along the TEs and BEs of the crossbar.</p></div>
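<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>The data-plane computation described in this subsection can be illustrated with a short Python sketch. It is an idealized, hypothetical model. The function and class names, the conductances and voltages, and the discrete-time leaky integrate-and-fire update (leak factor, gain, threshold, reset to zero) are assumptions chosen for illustration, not parameters of DYNAP-SE or of the circuit in Figure 4b. Each column current is the sum of the cell currents V_i G_i of the rows driving it, and this current is then injected into a post-synaptic neuron.</p><ab type="code"><![CDATA[
```python
from typing import List


def crossbar_column_currents(v_rows: List[float], g: List[List[float]]) -> List[float]:
    """Idealized crossbar multiply-accumulate.

    v_rows[i]: voltage applied on top electrode (row) i by a pre-synaptic neuron.
    g[i][j]:   conductance (siemens) of the synaptic cell at row i, column j.
    Each cell contributes I = V * G (Ohm's law); the contributions are summed on
    each bottom electrode (Kirchhoff's current law).
    """
    n_cols = len(g[0])
    currents = [0.0] * n_cols
    for v, row in zip(v_rows, g):
        for j in range(n_cols):
            currents[j] += v * row[j]
    return currents


class LIFNeuron:
    """Discrete-time leaky integrate-and-fire neuron with illustrative parameters."""

    def __init__(self, threshold: float = 1.0, leak: float = 0.9, gain: float = 1.0e4):
        self.threshold = threshold  # membrane voltage at which a spike is emitted
        self.leak = leak            # fraction of the membrane voltage kept per time step
        self.gain = gain            # converts injected current (A) into voltage per step
        self.v_mem = 0.0

    def step(self, i_inj: float) -> bool:
        """Integrate one time step of injected current; return True if the neuron spikes."""
        self.v_mem = self.leak * self.v_mem + self.gain * i_inj
        if self.v_mem >= self.threshold:
            self.v_mem = 0.0  # reset the membrane after firing
            return True
        return False


if __name__ == "__main__":
    # Two pre-synaptic neurons drive one post-synaptic neuron, as in Figure 3a.
    g = [[50e-6], [20e-6]]  # assumed conductances encoding w1 and w2
    i_col = crossbar_column_currents([0.5, 1.0], g)
    neuron = LIFNeuron()
    print(i_col, [neuron.step(i_col[0]) for _ in range(6)])
```
]]></ab><p>With the assumed values, the column current is 0.5 V × 50 µS + 1.0 V × 20 µS = 45 µA, and the neuron crosses its threshold on every third time step. Device non-idealities such as wire resistance and the voltage drop across the access transistor are deliberately ignored.</p></div>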
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Transistor Aging in Neuromorphic Hardware</head><p>High-voltage operation of transistors introduces several reliability issues, such as Time-Dependent Dielectric Breakdown (TDDB), Bias Temperature Instability (BTI), and Hot-Carrier Injection (HCI). These are the dominant causes of aging in scaled technology nodes (45 nm and below) <ref type="bibr">[61]</ref>. In older nodes, Electromigration (EM) also plays a key role <ref type="bibr">[23, 25-30, 69, 79]</ref>.</p><p>Transistor aging is accelerated when the transistor is stressed, i.e., exposed to a high overdrive voltage. The overdrive voltage is the gate-to-source voltage (<formula notation="TeX">V_{GS}</formula>) in excess of the threshold voltage (<formula notation="TeX">V_{th}</formula>), where <formula notation="TeX">V_{th}</formula> is the minimum gate-to-source voltage required to turn the transistor on. With this understanding, we provide a brief background on these three failure mechanisms.</p><p>&#8226; TDDB: This is a failure mechanism in which the gate oxide of a CMOS device breaks down as a result of the long-term application of a relatively low electric field (as opposed to immediate breakdown, which is caused by a strong electric field) <ref type="bibr">[76]</ref>. The lifetime of a CMOS device is measured in terms of its mean time to failure (MTTF) as</p><p>where <formula notation="TeX">A</formula> and <formula notation="TeX">\gamma</formula> are material-related constants, and <formula notation="TeX">V</formula> is the overdrive gate voltage of the CMOS device.</p><p>&#8226; BTI: This is a failure mechanism in which positive charges are trapped at the oxide-semiconductor boundary underneath the gate of a CMOS device <ref type="bibr">[41]</ref>. BTI manifests as 1) a decrease in drain current and transconductance, and 2) an increase in off current and threshold voltage. 
The BTI lifetime of the device is</p><p>where <formula notation="TeX">A</formula> and <formula notation="TeX">\gamma</formula> are material-related constants, <formula notation="TeX">E_a</formula> is the activation energy, <formula notation="TeX">K</formula> is the Boltzmann constant, <formula notation="TeX">T</formula> is the temperature, and <formula notation="TeX">V</formula> is the overdrive gate voltage of the CMOS device.</p><p>&#8226; HCI: This is a failure mechanism in which a carrier (electron or hole) gains sufficient kinetic energy to overcome the potential barrier of the conducting channel and gets trapped in the gate dielectric, permanently changing the switching characteristics of the CMOS device <ref type="bibr">[98]</ref>. Unlike TDDB and BTI, for which silicon-characterized reliability models are available from foundries, characterized models for HCI are still in development for scaled nodes.</p><p>Among these failure mechanisms, BTI is generally accepted as the most important for sub-10 nm nodes <ref type="bibr">[59,</ref><ref type="bibr">73,</ref><ref type="bibr">78]</ref>. In these nodes, HCI occurs mostly under strong voltage/current conditions, and TDDB has become less important because technologists have stopped pursuing ultra-high-k values in the dielectric.</p></div>
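<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>Two quantities from this subsection can be illustrated numerically: the overdrive voltage and the temperature sensitivity of the BTI lifetime. The sketch assumes only that the BTI lifetime contains an Arrhenius factor exp(E_a/(KT)), consistent with the constants listed for it (E_a, K, and T). Under that assumption, the ratio of lifetimes at two temperatures does not depend on A, gamma, or V, provided these stay the same. The function names, activation energy, and voltages below are illustrative assumptions, not values from this work.</p><ab type="code"><![CDATA[
```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant K in eV/K


def overdrive_voltage(v_gs: float, v_th: float) -> float:
    """Gate-source voltage in excess of the threshold voltage (0 when the device is off)."""
    return max(0.0, v_gs - v_th)


def arrhenius_lifetime_ratio(e_a_ev: float, t_ref_k: float, t_hot_k: float) -> float:
    """Lifetime at temperature t_hot_k divided by lifetime at t_ref_k.

    Assumes only that lifetime is proportional to exp(E_a / (K * T)) while A, gamma,
    and the overdrive voltage V stay the same, so those terms cancel in the ratio.
    """
    return math.exp(e_a_ev / BOLTZMANN_EV_PER_K * (1.0 / t_hot_k - 1.0 / t_ref_k))


if __name__ == "__main__":
    print(overdrive_voltage(3.0, 0.4))                  # stressed transistor (assumed V_th)
    print(overdrive_voltage(0.3, 0.4))                  # transistor off: no overdrive
    print(arrhenius_lifetime_ratio(0.5, 300.0, 350.0))  # assumed E_a = 0.5 eV
```
]]></ab><p>Under these assumptions, raising the temperature from 300 K to 350 K with E_a = 0.5 eV reduces the lifetime to roughly 6% of its value at 300 K. This illustrates why the higher power density and temperature of scaled devices accelerate aging.</p></div>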
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">OBSERVATIONS LEADING TO NCRTM</head><p>We expand on the three observations made in Section 1.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Observation 1: Workload-dependent Activation</head><p>To illustrate application- and input-dependent neuron activation, Figure <ref type="figure">5</ref> plots the spike firing rate of 100 randomly selected neurons in AlexNet <ref type="bibr">[52]</ref>, a state-of-the-art CNN used for ImageNet classification. We report results for two randomly selected training and test images. We observe that the spike firing rates of neurons depend on the image presented to the AlexNet CNN. Therefore, reliability improvement strategies based on design-time analysis with training examples may not be optimal when they are applied at run-time to process in-field data, which is a limitation of our prior work <ref type="bibr">[2]</ref>. We address this limitation with our proposed run-time framework NCRTM, which adapts its decisions based on current data.</p><p>To demonstrate the workload-dependent nature of the spike firing rate, Figure <ref type="figure">6</ref> plots the minimum, maximum, and average spike rates of all neurons in 10 machine learning workloads (see Section 6) for test examples from their respective datasets. We observe that the spike rates of neurons are strongly workload-dependent; therefore, a workload-specific strategy is needed to control reliability optimally. This is precisely the objective of NCRTM.</p></div>
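<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>The statistics plotted in Figures 5 and 6 can be computed from recorded spike times, as sketched below: the spike rate of each neuron, and the minimum, maximum, and average rate over all neurons. The spike times are made-up values for three hypothetical neurons, and the helper names are illustrative.</p><ab type="code"><![CDATA[
```python
from statistics import mean
from typing import Dict, List, Tuple


def firing_rates(spike_times_s: Dict[int, List[float]], window_s: float) -> Dict[int, float]:
    """Spike rate (spikes per second) of each neuron over an observation window."""
    return {nid: len(times) / window_s for nid, times in spike_times_s.items()}


def rate_summary(rates: Dict[int, float]) -> Tuple[float, float, float]:
    """Minimum, maximum, and average spike rate across all neurons of a workload."""
    values = list(rates.values())
    return min(values), max(values), mean(values)


if __name__ == "__main__":
    # Hypothetical spike times (s) of three neurons for two different input images.
    image_a = {0: [0.01, 0.02, 0.03, 0.04], 1: [0.05], 2: []}
    image_b = {0: [0.09], 1: [0.01, 0.03, 0.05, 0.07], 2: [0.02, 0.06]}
    for name, spikes in (("image A", image_a), ("image B", image_b)):
        rates = firing_rates(spikes, window_s=0.1)
        print(name, rates, rate_summary(rates))
```
]]></ab><p>In this made-up example, neuron 0 fires at 40 spikes/s for the first input and at 10 spikes/s for the second. A design-time analysis based on training examples cannot anticipate this kind of input dependence.</p></div>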
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Observation 2: Performance Trade-off in Reliability Improvement</head><p>In SNNs, information is encoded in spike times, and the inter-spike interval (ISI) serves as a key performance metric. To demonstrate how ISI is impacted by reliability-oriented decisions, Figure <ref type="figure">7</ref>(a) shows the spike train generated by a neuron in AlexNet when processing a reference image. Each spike injects current into the crossbar, which flows through the NVM cell. Figure <ref type="figure">7</ref>(b) illustrates the voltage of the on-chip charge pump that supplies the reference voltages in the neuron to generate this spike train. The charge pump is operated at 1.8 V for the entire 60 ms interval, boosting its voltage to 3 V only to generate spikes. The aging of the transistors in the neuron is 8.3 units (see Section 5.3 for the aging computation), and the average ISI is 5.9 ms (see Section 5.4 for the ISI computation). Figure <ref type="figure">7</ref>(c) illustrates the charge pump's operating voltage when it is discharged to 1.2 V after generating every spike and boosted again to 1.8 V before generating the next one. This de-stresses the transistors in the neuron. While being de-stressed, the neuron is unavailable to generate spikes, which introduces latency in spike generation and changes the ISI. Although it is possible to estimate the worst-case ISI degradation at compile time using SpiNe-Map <ref type="bibr">[9]</ref> and similar approaches, such estimates can deviate significantly from the actual case in a highly dynamic environment, where the test data differ from the training examples (see Figure <ref type="figure">5</ref>). Therefore, a run-time manager is desirable to dynamically adapt reliability-oriented decisions and limit ISI degradation.</p></div>
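<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>The effect of de-stress latency on ISI can be sketched as follows. This is a hypothetical illustration, not the ISI formulation of Section 5.4. A neuron that is de-stressed after a spike is assumed to be offline for a fixed latency, and any spike requested during that time is postponed until the neuron is available again. ISI distortion is approximated here as the absolute change in average ISI. All names and numbers are assumptions.</p><ab type="code"><![CDATA[
```python
from typing import Iterable, List


def average_isi(spike_times_ms: List[float]) -> float:
    """Mean inter-spike interval (ms) of a time-ordered spike train."""
    if len(spike_times_ms) in (0, 1):
        raise ValueError("at least two spikes are needed to define an ISI")
    gaps = [later - earlier for earlier, later in zip(spike_times_ms, spike_times_ms[1:])]
    return sum(gaps) / len(gaps)


def replay_with_destress(request_ms: List[float], destress_after: Iterable[int],
                         latency_ms: float) -> List[float]:
    """Return actual spike times when the neuron is de-stressed after selected spikes.

    destress_after: indices of spikes after which a de-stress starts (assumption).
    While being de-stressed the neuron is offline for latency_ms; a spike requested
    during that time is postponed until the neuron is available again.
    """
    offline_after = set(destress_after)
    available_ms = float("-inf")
    emitted = []
    for k, t_req in enumerate(request_ms):
        t = max(t_req, available_ms)
        emitted.append(t)
        if k in offline_after:
            available_ms = t + latency_ms
    return emitted


def isi_distortion(original_ms: List[float], actual_ms: List[float]) -> float:
    """Simple proxy for ISI distortion: absolute change in average ISI (ms)."""
    return abs(average_isi(actual_ms) - average_isi(original_ms))


if __name__ == "__main__":
    requested = [0.0, 6.0, 12.0, 18.0]  # hypothetical spike requests every 6 ms
    for latency in (4.0, 8.0):
        actual = replay_with_destress(requested, range(len(requested)), latency)
        print(latency, actual, isi_distortion(requested, actual))
```
]]></ab><p>In this example, spikes are requested every 6 ms and the neuron is de-stressed after every spike. A latency of 4 ms leaves the ISI unchanged. A latency of 8 ms postpones each subsequent spike and raises the average ISI from 6 ms to 8 ms. Whether de-stress hurts ISI therefore depends on the idle time available at run-time.</p></div>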
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.3">Observation 3: Short-term vs. Long-term Aging</head><p>In this work, we demonstrate our approach for BTI-related failures; the general principle applies to other failure mechanisms as well. BTI aging manifests as 1) a decrease in drain current and transconductance, and 2) an increase in off current and threshold voltage. When a transistor is operated at high voltage and temperature, these parameters drift strongly from their nominal values. In scaled technology nodes, BTI aging occurs even under nominal conditions and from the very start of device usage, leading to the so-called soft breakdown <ref type="bibr">[33,</ref><ref type="bibr">50,</ref><ref type="bibr">51,</ref><ref type="bibr">99]</ref>. Recent works such as <ref type="bibr">[41,</ref><ref type="bibr">42,</ref><ref type="bibr">50,</ref><ref type="bibr">51,</ref><ref type="bibr">70,</ref><ref type="bibr">77,</ref><ref type="bibr">99]</ref> suggest that BTI is the collective response of two independent types of defects: as-grown hole traps (AHTs) and generated defects (GDs). AHTs and a small proportion of GDs can be recovered by annealing at high temperatures if the BTI stress voltage is removed (de-stress). Figure <ref type="figure">8</ref> illustrates the stress and recovery of the threshold voltage of a CMOS transistor upon application of a high voltage (<formula notation="TeX">V_{spk}</formula>) and a low voltage (<formula notation="TeX">V_{idle}</formula>), respectively. We observe that both stress and recovery depend on the time of exposure to the corresponding voltage level. This implies that the BTI aging of a neuron partially recovers while the neuron is idle. Figure <ref type="figure">9a</ref> shows the shift in threshold voltage of an NMOS transistor in a neuron with a constant firing rate of 50 Hz, with the neuron circuit de-stressed once every second (see Section 6 for the simulation setup). 
Figure <ref type="figure">9b</ref> shows the results for the same setup, but with the neuron circuit de-stressed once every 100 ms. As these figures show, with a longer de-stress interval (once every second), the transistor aging becomes irreversible, so the threshold-voltage shift is higher than with a shorter de-stress interval (once every 100 ms).</p></div>
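<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>A deliberately simple toy model reproduces the qualitative trend of Figure 9. It is not the BTI model used in this work (see Section 6 for the simulation setup), and it makes three assumptions. First, a recoverable component (loosely analogous to AHTs) grows by a fixed amount per spike and relaxes slowly while idle. Second, a small fraction of that recoverable drift locks into a permanent component (loosely analogous to GDs) every millisecond. Third, an idealized de-stress removes the recoverable component entirely. The function name and all parameter values are arbitrary.</p><ab type="code"><![CDATA[
```python
from typing import Tuple


def simulate_bti(destress_period_ms: int, total_ms: int = 5000, spike_period_ms: int = 20,
                 shift_per_spike: float = 1.0, idle_recovery: float = 1e-3,
                 lock_in: float = 1e-4) -> Tuple[float, float]:
    """Toy two-component model of threshold-voltage drift (arbitrary units).

    recoverable: drift that relaxes slowly and is removed by a de-stress
                 (loosely analogous to as-grown hole traps).
    permanent:   drift that no longer recovers (loosely analogous to generated
                 defects); each millisecond, a fraction lock_in of the recoverable
                 drift becomes permanent.
    Returns the (recoverable, permanent) drift at the end of the simulation.
    """
    recoverable, permanent = 0.0, 0.0
    for t in range(1, total_ms + 1):          # 1 ms time steps
        if t % spike_period_ms == 0:          # 20 ms period = 50 Hz firing
            recoverable += shift_per_spike    # stress while the spike voltage is applied
        locked = lock_in * recoverable
        permanent += locked
        recoverable -= locked + idle_recovery * recoverable  # slow partial relaxation
        if t % destress_period_ms == 0:
            recoverable = 0.0                 # idealized de-stress: full recovery
    return recoverable, permanent


if __name__ == "__main__":
    for period in (1000, 100):                # de-stress every 1 s vs. every 100 ms
        print(period, simulate_bti(period))
```
]]></ab><p>Both runs use the same 50 Hz firing. De-stressing every 100 ms leaves less recoverable drift in place for the lock-in mechanism to act on, so after 5 s the permanent component is smaller than with a 1 s de-stress interval.</p></div>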
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">RUN-TIME MANAGER FOR NEUROMORPHIC COMPUTING (NCRTM)</head></div><div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.1">A Motivating Example Showing the Need for Run-time Reliability Management</head><p>Figure <ref type="figure">10</ref> shows an example where four spikes (R1, R2, R3, and R4) are generated by a neuron, with some idle time between them that depends on its input excitation. The figure illustrates the hybrid approach DTRO <ref type="bibr">[2]</ref> (see Fig. <ref type="figure">2b</ref>), where the neuron is de-stressed at run-time after generating three spikes. This fixed number is decided statically, based on the neuron's activation for a training example. This is illustrated in the top right corner of the figure, where the BTI aging (<formula notation="TeX">A_{BTI}</formula>) exceeds the aging threshold of 10 units after three spikes, given the idle periods between spikes in the training example.</p><p>Using this static approach, the de-stress operation is initiated upon generating R3, which delays the generation of R4 because the de-stress operation has non-zero latency. This changes the ISI, which may lead to performance loss in SNNs (see Appendix B). Yet, when the neuron circuit is being de-stressed, its BTI aging is below the threshold, because a CMOS transistor partially recovers from BTI stress when idle. As this example shows, the idle periods at run-time can differ in length from those used in the design-time analysis. In such situations, the static approach introduces an unnecessary performance penalty. 
Using a fixed time interval for de-stress (instead of counting spikes) leads to a similar situation, because the number of spikes within the de-stress interval depends on the input excitation and therefore remains unknown until run-time.</p><p>Figure <ref type="figure">10</ref> also shows NCRTM, our dynamic reliability management policy, where the de-stress operation of the neuron circuit is initiated by tracking its aging. NCRTM can generate spike R4 on time because the aging of the neuron is below the aging threshold when the spike is generated; NCRTM models both the stress and the recovery of circuit aging at run-time. The ISI therefore remains unchanged, and NCRTM outperforms the static approach in terms of both reliability and performance. This example demonstrates a scenario with sparse neuron activation. In the opposite scenario, where the neuron is activated very frequently, the static policy can violate the critical threshold because it cannot adapt the de-stress interval at run-time. NCRTM, in contrast, adjusts its de-stress interval at run-time by tracking aging (both stress and recovery). In Section 7.4, we show that this has only a marginal performance impact for workloads with frequent activation. NCRTM is therefore better than the static policy at managing workload-specific circuit aging.</p></div>
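<div xmlns="http://www.tei-c.org/ns/1.0" type="example"><p>The scenario of Figure 10 can be replayed with a toy model to compare the two policies. This is a hypothetical sketch, not the aging and ISI formulation of NCRTM (Sections 5.3 and 5.4). It assumes that each spike adds a fixed amount of aging and that idle time recovers aging linearly. A de-stress is assumed to reset aging to zero and keep the neuron offline for a fixed latency. All numeric values are illustrative. The static policy de-stresses after every n spikes, as a DTRO-like schedule would. The tracking policy de-stresses only when the next spike would push the tracked aging above the threshold.</p><ab type="code"><![CDATA[
```python
from typing import List, Optional, Tuple


def run_policy(request_ms: List[float], stress_per_spike: float, recovery_per_ms: float,
               latency_ms: float, threshold: float,
               static_every_n: Optional[int] = None) -> Tuple[List[float], float]:
    """Replay a spike train under a de-stress policy; return (spike times, peak aging).

    static_every_n given: static policy, de-stress after every n spikes (DTRO-like).
    static_every_n None:  tracking policy, de-stress only if the next spike would push
                          the tracked aging above the threshold (NCRTM-like).
    Toy aging model (assumption): +stress_per_spike per spike, linear recovery while
    idle, and a de-stress resets aging to 0 and takes the neuron offline for latency_ms.
    """
    aging, peak, last_ms, available_ms = 0.0, 0.0, 0.0, 0.0
    emitted: List[float] = []
    for k, t_req in enumerate(request_ms, start=1):
        t = max(t_req, available_ms)  # an offline neuron spikes only once it is back
        aging = max(0.0, aging - recovery_per_ms * (t - last_ms))
        if static_every_n is None and aging + stress_per_spike > threshold:
            available_ms = t + latency_ms  # de-stress on demand, postponing this spike
            aging, t = 0.0, available_ms
        aging += stress_per_spike
        peak = max(peak, aging)
        emitted.append(t)
        last_ms = t
        if static_every_n is not None and k % static_every_n == 0:
            available_ms = t + latency_ms  # fixed schedule, regardless of actual aging
            aging = 0.0
    return emitted, peak


if __name__ == "__main__":
    params = dict(stress_per_spike=4.0, recovery_per_ms=0.5, latency_ms=8.0, threshold=10.0)
    runtime = [0.0, 5.0, 10.0, 15.0]  # run-time idle gaps longer than in training
    print(run_policy(runtime, static_every_n=3, **params))  # ([0.0, 5.0, 10.0, 18.0], 7.0)
    print(run_policy(runtime, **params))                     # ([0.0, 5.0, 10.0, 15.0], 8.5)
```
]]></ab><p>Here n = 3, because aging exceeds 10 units after three training spikes spaced 1 ms apart. With these values, the static policy postpones the fourth spike from 15 ms to 18 ms. The tracking policy emits all four spikes on time, and its peak aging of 8.5 units stays below the threshold.</p></div>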
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2">High-level Overview</head><p>Based on the three observations in Section 4 and the motivating example in Section 5.1, we introduce NCRTM, a run-time reliability manager for neuromorphic hardware. As shown in Figure <ref type="figure">11</ref>, NCRTM is implemented as a controller that mitigates the aging of the neuron circuits in each tile. To do so, NCRTM estimates the maximum aging of the neurons in each tile by recording the number of spikes they generate within a time window (see our aging formulation in Section 5.3). If the aging of a tile exceeds a threshold $th_a$,<ref type="foot">5</ref> NCRTM schedules a de-stress of the tile by placing an entry in the de-stress queue (DSQ). The tile is not necessarily de-stressed immediately: NCRTM de-stresses a tile opportunistically, by estimating the change in ISI (called ISI distortion) that taking the tile offline would cause (see our ISI formulation in Section 5.4). During a de-stress operation, a very low voltage is applied to all the neurons in the tile for a duration $t_{DSC}$ (the discharge cycle time). This allows the transistors in the neurons to reverse their threshold voltage drift $\Delta V_{th}$. The recovery time $t_{DSC}$ is modeled using the framework presented in <ref type="bibr">[100]</ref>.</p><p>Central to both the aging and the ISI computation in NCRTM is a technique to estimate the number of spikes generated by each neuron. A spike counter (SC) in each tile can count the spikes within a time interval. However, not all neuromorphic hardware is equipped with SCs. Therefore, we present an alternative, software-based technique for spike counting.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.2.1">Spike Counting in Software</head><p>To understand spike counting in software, we first describe the spike communication mechanism in neuromorphic hardware. 
Spikes from the post-synaptic neurons in a tile are converted into the address-event representation (AER) and broadcast on the interconnect by the AER encoder. Figure <ref type="figure">12</ref> illustrates the principles behind AER with an example in which four neurons in a tile spike at times 3, 0, 1, and 2, respectively. The encoder encodes these four spikes so that they can be communicated on the interconnect.</p></div>
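<div xmlns="http://www.tei-c.org/ns/1.0"><p>The example of Figure <ref type="figure">12</ref> can be sketched in Python as follows. This is an illustration under stated assumptions, not the DYNAP-SE encoder: neuron addresses 0 to 3 are assigned in order, each event is a hypothetical (time, address) tuple, and simultaneous spikes are ordered by address so that the output is deterministic.</p><eg><![CDATA[
```python
from typing import List, Tuple

# One AER event: (spike time, neuron address). Hypothetical layout for illustration.
Event = Tuple[int, int]


def aer_encode(spike_times: List[int]) -> List[Event]:
    """Serialize one spike per neuron into AER events ordered by spike time.

    `spike_times[a]` is the time at which the neuron with address `a` spikes.
    Ties are broken by address so the output order is deterministic.
    """
    return sorted((t, addr) for addr, t in enumerate(spike_times))


def addresses_on_bus(events: List[Event]) -> List[int]:
    """Addresses in the order in which they appear on the interconnect."""
    return [addr for _, addr in events]


# Figure 12: four neurons of a tile spike at times 3, 0, 1 and 2.
events = aer_encode([3, 0, 1, 2])
print(events)                    # [(0, 1), (1, 2), (2, 3), (3, 0)]
print(addresses_on_bus(events))  # [1, 2, 3, 0]
```
]]></eg><p>The interconnect thus carries the addresses 1, 2, 3, 0: the neuron that spikes at time 0 is sent first and the neuron that spikes at time 3 is sent last.</p></div>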
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Address event representation protocol</head><p>Address-event representation (AER) is a communication protocol originally proposed to communicate sparse neural events between neuromorphic chips. Using time-division multiplexing, the massive interconnections among individual neurons or neuron clusters are mapped onto a reduced number of channels. In this protocol, each spike is represented by its location (the address of the neuron that generated it) and its spike time.</p><p>We propose to count the spikes of each neuron by snooping on the interconnect. We implement one counter for each neuron in the hardware. When a de-stress operation is initiated for a tile, the counters of all the neurons in that tile are reset so that they count the spikes of the next interval. With 128 post-synaptic neurons per tile and one 16-bit counter per neuron, the total storage overhead of software-based spike counting for the 12-tile architecture of Figure <ref type="figure">11</ref> is 12 &#215; 128 &#215; 16 bits = 24 Kb. However, continuous snooping on the bus can introduce performance overhead. In the future, we plan to extend the interconnect routers to record the spike packets, which will allow NCRTM to poll these records periodically. With this background, we now introduce our models for estimating aging (Section 5.3) and ISI (Section 5.4).</p></div>
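<div xmlns="http://www.tei-c.org/ns/1.0"><p>The snooping-based counters can be sketched as follows. The class <code>SpikeCounters</code> and its methods are hypothetical names introduced for illustration, and the saturating behavior of a full counter is an assumption, since the text does not specify how overflow is handled. The storage computation reproduces the 12 &#215; 128 &#215; 16-bit estimate given above.</p><eg><![CDATA[
```python
class SpikeCounters:
    """Per-neuron spike counters fed by AER events snooped on the interconnect.

    Hypothetical sketch: counters saturate at their maximum value instead of
    wrapping around, and a whole tile is reset when it is de-stressed.
    """

    def __init__(self, num_tiles, neurons_per_tile, counter_bits=16):
        self.counter_bits = counter_bits
        self.max_count = (1 << counter_bits) - 1
        self.counts = [[0] * neurons_per_tile for _ in range(num_tiles)]

    def snoop(self, tile, neuron):
        """Record one spike event of (tile, neuron) seen on the interconnect."""
        if self.counts[tile][neuron] < self.max_count:
            self.counts[tile][neuron] += 1

    def reset_tile(self, tile):
        """Start a new counting interval for a tile that is being de-stressed."""
        self.counts[tile] = [0] * len(self.counts[tile])

    def max_spikes(self, tile):
        """Largest spike count in the tile, the input to its maximum-aging estimate."""
        return max(self.counts[tile])

    def storage_bits(self):
        """Total counter storage: one counter_bits-wide counter per neuron."""
        return sum(len(row) for row in self.counts) * self.counter_bits


counters = SpikeCounters(num_tiles=12, neurons_per_tile=128)
for tile, neuron in [(0, 5), (0, 5), (3, 7)]:  # snooped AER events
    counters.snoop(tile, neuron)
print(counters.max_spikes(0), counters.max_spikes(3))  # 2 1
counters.reset_tile(0)
print(counters.max_spikes(0))  # 0
print(counters.storage_bits(), counters.storage_bits() // 1024)  # 24576 24
```
]]></eg><p>Keeping only the per-tile maximum count is enough for the maximum-aging estimate described in Section 5.2, while the full per-neuron counts remain available for the ISI estimate.</p></div>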
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.3">Aging Computation</head><p>Equation 2 gives the MTTF of a CMOS transistor for a given overdrive voltage. BTI failures can also be modeled using the Weibull distribution with a scale parameter $\alpha$ and a slope parameter $\beta$. The reliability at time $t$ can be written as <formula notation="TeX">R(t) = \exp\left[-\left(\frac{t}{\alpha}\right)^{\beta}\right]</formula></p><p>with the corresponding MTTF computed as <formula notation="TeX">\mathrm{MTTF} = \alpha \, \Gamma\left(1 + \frac{1}{\beta}\right)</formula></p><p>where $\Gamma$ is the Gamma function. Using the expressions for MTTF from Equations 2 and 3 and rearranging, we obtain the expression for the scale parameter $\alpha$ as <formula notation="TeX" n="5">\alpha(V) = \frac{\mathrm{MTTF}(V)}{\Gamma\left(1 + \frac{1}{\beta}\right)}</formula></p><p>The aging $A$, i.e., the degradation of the CMOS transistor, can be expressed as <formula notation="TeX" n="6">A = \sum_{i} \frac{t_i}{\alpha(V_i)}</formula></p><p>where $t_i$ is the time for which the transistor is stressed at voltage $V_i$, and the scale parameter $\alpha(V_i)$ can be calculated using Equation 5.</p><p>We note that a neuron ages when it generates a spike. Each spike has a fixed voltage $V_{spk}$ (see Figure <ref type="figure">4b</ref>) and a fixed duration on the order of a few ms. Therefore, both $V_i$ and $t_i$ in Equation 6 are constant, which allows us to express the aging as <formula notation="TeX" n="7">A = \sum_{i=1}^{n} \frac{\Delta t}{\alpha(V)} = n \cdot \frac{\Delta t}{\alpha(V)}</formula></p><p>where $n$ is the number of spikes generated by the neuron, $\Delta t$ is the fixed spike duration, and $V = V_{spk}$ is the fixed spike voltage. Equation 7 represents the aging in terms of the number of spikes generated by a neuron and the unit aging parameter $\Delta t / \alpha(V)$, which represents the aging per spike. This simplified formulation allows us to estimate the aging of each neuron by simply counting the spikes it generates, so the performance overhead of aging estimation can be kept negligible.</p></div>
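<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Weibull relations and Equations 5 to 7 translate directly into code. The sketch below assumes that the voltage-dependent MTTF of Equation 2 is already available as a number, because Equation 2 is not reproduced in this section. The 1 ms spike duration is a hypothetical value within the few-ms range mentioned above, and &#946; = 2 with a 2-year MTTF follows Section 6.5.</p><eg><![CDATA[
```python
import math


def reliability(t, alpha, beta):
    """Weibull reliability R(t) = exp(-(t / alpha) ** beta)."""
    return math.exp(-((t / alpha) ** beta))


def weibull_mttf(alpha, beta):
    """Weibull MTTF = alpha * Gamma(1 + 1 / beta)."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


def scale_from_mttf(mttf, beta):
    """Equation 5: alpha = MTTF / Gamma(1 + 1 / beta).

    `mttf` stands in for the voltage-dependent MTTF of Equation 2, which is
    taken here as a given number.
    """
    return mttf / math.gamma(1.0 + 1.0 / beta)


def aging_general(stress_intervals):
    """Equation 6: A = sum_i t_i / alpha(V_i), given (t_i, alpha(V_i)) pairs."""
    return sum(t_i / alpha_i for t_i, alpha_i in stress_intervals)


def aging_from_spikes(n_spikes, spike_duration, alpha):
    """Equation 7: A = n * dt / alpha(V) for a fixed spike voltage and duration."""
    return n_spikes * spike_duration / alpha


# Illustrative numbers only: beta = 2 and a 2-year MTTF as in Section 6.5,
# with a hypothetical 1 ms spike duration (all times in seconds).
beta = 2.0
alpha = scale_from_mttf(2 * 365 * 24 * 3600, beta)
unit_aging = aging_from_spikes(1, 1e-3, alpha)  # aging per spike, dt / alpha(V)
print(alpha, unit_aging)
```
]]></eg><p>Because the unit aging &#916;t/&#945;(V) is a constant, tracking aging at run-time reduces to multiplying a spike count by this constant, which is why NCRTM only needs spike counters.</p></div>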
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.4">ISI Computation</head><p>To define ISI, we consider a tile consisting of $N$ post-synaptic neurons and a finite time interval $[0, T]$ during which the tile is active without undergoing a de-stress operation. The post-synaptic neurons generate $K$ spikes in this interval, which are organized by their generation time and source neuron as <formula notation="TeX" n="8">\mathcal{T} = \left\{ t^{n}_{i} \;:\; 1 \le n \le N,\; 1 \le i \le k_n \right\}</formula> where $t^{n}_{i}$ is the time of the $i$-th spike generated by the $n$-th neuron, $k_n$ is the number of spikes generated by the $n$-th neuron, and $K = \sum_{n=1}^{N} k_n$. The instantaneous ISI of the spike train from the $n$-th neuron is <ref type="bibr">[43]</ref> <formula notation="TeX" n="9">\mathrm{ISI}^{n}_{\mathrm{inst}}(i) = t^{n}_{i+1} - t^{n}_{i}, \quad 1 \le i \le k_n - 1</formula></p><p>To estimate the impact of a de-stress operation on ISI, we compute two statistics for each neuron: the instantaneous ISI (Equation <ref type="formula">9</ref>) and the average ISI, computed as the average of all ISIs of the neuron. With $t_{DSC}$ as the time to de-stress a tile, an instantaneous ISI of the $n$-th neuron that spans a de-stress operation becomes $\Delta \mathrm{ISI}^{n}_{\mathrm{inst}}(i)$, and the average ISI of the $n$-th neuron changes by $\Delta \mathrm{ISI}^{n}_{\mathrm{avg}}$, where <formula notation="TeX" n="10">\Delta \mathrm{ISI}^{n}_{\mathrm{inst}}(i) = \mathrm{ISI}^{n}_{\mathrm{inst}}(i) + t_{DSC} \quad \text{and} \quad \Delta \mathrm{ISI}^{n}_{\mathrm{avg}} = \frac{t_{DSC}}{k_n}</formula></p></div>
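<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ISI statistics of Equations 9 and 10, together with the opportunistic use of the de-stress queue (DSQ) described in Section 5.2, can be sketched as follows. The acceptance rule in <code>can_destress_now</code> (every neuron's estimated average-ISI change $t_{DSC}/k_n$ must stay within a hypothetical distortion budget <code>isi_budget</code>) and the queue-draining order in <code>schedule_destress</code> are assumptions for illustration; the actual NCRTM scheduling criterion may differ.</p><eg><![CDATA[
```python
from collections import deque


def instantaneous_isi(spike_times):
    """Equation 9: ISI_inst(i) = t_{i+1} - t_i for one neuron's sorted spike train."""
    return [b - a for a, b in zip(spike_times, spike_times[1:])]


def average_isi(spike_times):
    """Average of all instantaneous ISIs of one neuron (needs at least two spikes)."""
    isis = instantaneous_isi(spike_times)
    return sum(isis) / len(isis)


def delay_after(spike_times, index, t_dsc):
    """Spike train if a de-stress of length t_dsc starts right after spike `index`.

    Every later spike is pushed back by t_dsc, so exactly one ISI grows by t_dsc.
    """
    return [t if i <= index else t + t_dsc for i, t in enumerate(spike_times)]


def can_destress_now(spike_counts, t_dsc, isi_budget):
    """Hypothetical check: allow a tile de-stress if, for every neuron in the tile,
    the estimated average-ISI change t_dsc / k_n (Equation 10) fits the budget.
    Neurons without spikes have no ISI to distort."""
    return all(k == 0 or t_dsc / k <= isi_budget for k in spike_counts)


def schedule_destress(dsq, counts_per_tile, t_dsc, isi_budget):
    """Drain the DSQ: de-stress tiles that pass the check now and keep the
    others queued, in their original order, for a later attempt."""
    started, deferred = [], deque()
    while dsq:
        tile = dsq.popleft()
        if can_destress_now(counts_per_tile[tile], t_dsc, isi_budget):
            started.append(tile)
        else:
            deferred.append(tile)
    dsq.extend(deferred)
    return started


train = [0.0, 10.0, 20.0, 30.0]
print(instantaneous_isi(train))                       # [10.0, 10.0, 10.0]
print(instantaneous_isi(delay_after(train, 1, 5.0)))  # [10.0, 15.0, 10.0]

dsq = deque([0, 1, 2])  # tiles whose aging exceeded th_a
counts = {0: [100, 80], 1: [2, 90], 2: [50, 60]}
print(schedule_destress(dsq, counts, t_dsc=10.0, isi_budget=0.5), list(dsq))  # [0, 2] [1]
```
]]></eg><p>In the example, delaying the spikes after the second spike by $t_{DSC} = 5$ lengthens exactly one instantaneous ISI by 5. Averaged over the $k_n - 1 = 3$ intervals, the exact increase in the average ISI is $t_{DSC}/(k_n - 1)$, which approaches the $t_{DSC}/k_n$ of Equation 10 for long spike trains.</p></div>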
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">EVALUATION METHODOLOGY</head><p>Figure <ref type="figure">13</ref> illustrates our simulation framework. An SNN-based application is simulated using CARLsim <ref type="bibr">[20]</ref>, a GPU-accelerated SNN simulator used to train and test SNN models. CARLsim reports spike times for every synapse in the SNN. The spike times are used to explore mappings that optimize an objective such as performance (PyCARL <ref type="bibr">[8]</ref>) or reliability (RENEU <ref type="bibr">[88]</ref>). We use NeuroXplorer <ref type="bibr">[12]</ref>, a cycle-accurate simulator of neuromorphic hardware such as DYNAP-SE <ref type="bibr">[64]</ref>.</p><p>The neuron and synapse mapping obtained from the mapping exploration framework is applied in NeuroXplorer to perform cycle-accurate simulation of the application on the hardware model, using current data. NCRTM is implemented inside NeuroXplorer to estimate the circuit aging and to control it by de-stressing the circuits when the aging exceeds a threshold. The change in spike latency due to the de-stress operation can be precisely modeled in NeuroXplorer, as shown in <ref type="bibr">[12]</ref>. All simulations are conducted on a system with 8 CPUs, 32 GB of RAM, and an NVIDIA Tesla GPU, running Ubuntu 16.04.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Evaluated Applications</head><p>We evaluated 10 machine learning applications that are representative of the three most commonly used classes of neural networks: convolutional neural network (CNN), multi-layer perceptron (MLP), and recurrent neural network (RNN). These applications are 1) LeNet <ref type="bibr">[56]</ref> based handwritten digit recognition with 28 &#215; 28 images of handwritten digits from the MNIST dataset <ref type="bibr">[37]</ref>; 2) AlexNet <ref type="bibr">[52]</ref> for ImageNet classification <ref type="bibr">[36]</ref>; 3) VGG16 <ref type="bibr">[82]</ref>, also for ImageNet classification <ref type="bibr">[36]</ref>; 4) ECG-based heart-beat classification (HeartClass) <ref type="bibr">[5,</ref><ref type="bibr">31]</ref> using electrocardiogram (ECG) data from the Physionet database <ref type="bibr">[63]</ref>; 5) multi-layer perceptron (MLP)-based handwritten digit recognition (MLP-MNIST) <ref type="bibr">[38]</ref> using the MNIST database; 6) image smoothing (ImgSmooth) <ref type="bibr">[20]</ref> on 64 &#215; 64 images; 7) edge detection (EdgeDet) <ref type="bibr">[20]</ref> on 64 &#215; 64 images using difference-of-Gaussian; 8) heart-rate estimation (HeartEstm) <ref type="bibr">[22]</ref> using ECG data; 9) gender classification using speech data (SpeechRecog) <ref type="bibr">[39]</ref>; and 10) RNN-based predictive visual pursuit (VisualPursuit) <ref type="bibr">[49]</ref>. The first seven are supervised applications, and the last three are unsupervised applications. Table <ref type="table">1</ref> summarizes the topology, the number of neurons and synapses of these applications, and their baseline accuracy on DYNAP-SE using PyCARL <ref type="bibr">[8]</ref>.<ref type="foot">6</ref></p><p>Table <ref type="table">1</ref>. Applications used to evaluate our approach NCRTM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">Hardware Models</head><p>In our cycle-accurate simulator, we model the architecture of the DYNAP-SE neuromorphic hardware <ref type="bibr">[64]</ref> with the following configurations.</p><p>&#8226; A tiled array, with each tile accommodating 128 input and 128 output neurons. There are 65,536 crosspoints in each tile.</p><p>&#8226; Spikes are digitized and communicated between cores through a mesh routing network using the Address Event Representation (AER) protocol.</p><p>The DYNAP-SE platform uses static random access memory (SRAM) to implement the synaptic cells in each crossbar. However, in our simulator, we use Phase-Change Memory (PCM) as the synaptic element. Table <ref type="table">2</ref> reports the major hardware parameters.</p><p>In the future, we will demonstrate NCRTM on a real NVM-based silicon neuromorphic system.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Evaluated State-of-the-art Techniques</head><p>We evaluate the following three approaches.</p><p>&#8226; PyCARL <ref type="bibr">[8]</ref>: This is a performance-oriented approach to map SNN-based applications to neuromorphic hardware. This approach first generates clusters of neurons and synapses, where each cluster can fit onto the resources of a tile in the hardware. It then uses an optimization algorithm to place these clusters on the hardware, maximizing the performance of the learning application on the hardware. CMOS circuits are not de-stressed at run-time.</p><p>&#8226; RENEU <ref type="bibr">[88]</ref>: This is a reliability-oriented approach to map SNN-based applications to neuromorphic hardware. This approach also generates clusters of neurons and synapses from an application, but maps the clusters to the hardware to minimize the maximum aging, considering only the training data. CMOS circuits are not de-stressed at run-time.</p><p>&#8226; DTRO <ref type="bibr">[2]</ref>: This is a reliability-oriented approach in which the neuron and synapse circuits of the neuromorphic hardware are de-stressed at run-time at a fixed interval. This interval is decided based on analysis performed at design-time using training data.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">Evaluated Metrics</head><p>We evaluate the following metrics.</p><p>&#8226; Aging: This is the maximum circuit aging in DYNAP-SE for each machine learning workload.</p><p>&#8226; ISI: This is the average ISI of each machine learning workload.</p><p>&#8226; Application Performance (accuracy): For image-based CNN and MLP applications, performance (e.g., accuracy) is defined in terms of the misclassification rate. For RNN applications that use time-series data, performance is measured in terms of the error rate <ref type="bibr">[22]</ref>.</p><p>&#8226; Aging per unit ISI distortion: This is a unified metric reporting the aging per unit ISI distortion for each workload, defined as</p><p>In formulating the optimization objective of Equation <ref type="formula">11</ref>, NCRTM aims to optimize (i.e., minimize) the circuit aging $A$ for a given constraint on the ISI distortion. In our earlier works <ref type="bibr">[9,</ref><ref type="bibr">32]</ref>, we have shown that application performance (e.g., accuracy) depends on ISI, owing to inter-spike interval-based information encoding in SNNs. Therefore, any distortion in ISI may reduce performance <ref type="bibr">[9,</ref><ref type="bibr">32]</ref>. Correspondingly, the above optimization problem essentially reduces to minimizing the aging $A$ for a given constraint on SNN accuracy.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.5">Aging Parameters</head><p>To compute aging, the slope parameter of the Weibull distribution is set to $\beta = 2$, and the operating temperature is set to 300 K. The other fitting parameters are adjusted to achieve an MTTF of 2 years in the baseline system (PyCARL), corresponding to a threshold voltage shift of 10%, which is typically accepted by technologists as the maximum allowed degradation before timing errors begin to appear.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">RESULTS AND DISCUSSION</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.1">Summary of Results</head><p>Table <ref type="table">3</ref> summarizes the key results.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.2">Circuit Aging</head><p>Figure <ref type="figure">14</ref> plots the aging of the neuron and synapse circuits in DYNAP-SE during the execution of the machine learning applications for each evaluated approach, normalized to PyCARL. We make the following three key observations. First, the aging due to RENEU is lower than that of PyCARL by an average of 2.5%. This improvement is due to the aging-aware neuron and synapse mapping policy of RENEU, which balances the aging of all tiles in the hardware. PyCARL, which balances the utilization of the tiles in the hardware, has higher aging. However, both PyCARL and RENEU are design-time policies, i.e., they do not make any run-time decisions. Second, DTRO is a hybrid approach, which uses the neuron and synapse mapping of RENEU. Unlike PyCARL and RENEU, DTRO periodically de-stresses all circuits in the hardware at run-time, with a de-stress interval determined at design-time by analyzing the training data. The aging of DTRO is therefore lower than that of both PyCARL (average 35% lower) and RENEU (average 33.5% lower). Third, NCRTM, which is a run-time approach, has the lowest aging of all. The average aging of NCRTM is 74% lower than PyCARL, 73% lower than RENEU, and 60% lower than DTRO. This improvement comes from NCRTM's precise tracking of aging at run-time using current data, with the objective of achieving a target MTTF.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.3">Threshold Voltage Shift</head><p>Circuit aging manifests as a shift in threshold voltage. Figure <ref type="figure">15</ref> plots the shift in threshold voltage ($\Delta V_{th}$) in DYNAP-SE after executing each machine learning application continuously until the end of its lifetime of 2 years. We normalize the results so that the threshold voltage shift due to PyCARL is 10%. With this normalization, the average threshold voltage shift is 9.75% with RENEU, 7.0% with DTRO, and only 4.8% with NCRTM. The threshold voltage shift is lowest with NCRTM because NCRTM has the lowest aging of all the approaches, as reported in Section 7.2. An increase in threshold voltage reduces the transistor drive current, which in turn degrades the performance of the neuron and synapse circuits in the neuromorphic hardware over time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.4">Change in ISI</head><p>Figure <ref type="figure">16</ref> plots the ISI of the machine learning workloads on DYNAP-SE for each evaluated approach, normalized to PyCARL. We make the following five key observations. First, the ISI obtained with RENEU is similar to that of PyCARL. This is because RENEU generates a mapping of the workload to the hardware that improves reliability without significantly hurting performance, and since neither approach makes run-time decisions, their run-time performance is similar. Second, the ISI obtained with DTRO is higher than that of RENEU by an average of 10%. This increase occurs because DTRO de-stresses the neuron and synapse circuits periodically at run-time to control their aging. This increases latency, which increases the average ISI (see Equation <ref type="formula">10</ref>). The advantage in this case is lower aging, which we analyzed in Section 7.2, leading to a lower drift of the threshold voltage (see Section 7.3). Third, the ISI using NCRTM is higher than that of RENEU by an average of 12%. This increase is due to the run-time de-stress operations in NCRTM (similar to DTRO), which introduce latency and thereby affect the ISI. The ISI using NCRTM is only 2% higher than that of DTRO. This increase occurs because NCRTM does not allow the aging to reach critical levels and, by tracking aging precisely at run-time, schedules more de-stress operations. However, because NCRTM schedules de-stress operations by tracking their latency impact on ISI, it keeps the change in ISI marginal. The impact of this ISI change on accuracy is discussed in Section 7.5. Fourth, the ISI of NCRTM is lower than that of DTRO for MLP-MNIST. For this application, circuit aging is generally lower because spike generation in the workload is sparse, so the transistors partially recover from BTI stress during the idle periods. DTRO cannot track this recovery and therefore applies a conservative control, unnecessarily constraining performance. Finally, for the three unsupervised applications (HeartEstm, SpeechRecog, and VisualPursuit), the ISI using NCRTM is on average 10% lower than that of DTRO. This is because, in the absence of training data for these applications, DTRO applies a conservative policy that de-stresses the neuron and synapse circuits frequently to prevent their aging from reaching a critical value. NCRTM, on the other hand, tracks the aging at run-time based on the data that these models encounter and de-stresses the circuits only when needed.</p><p>We conclude that for machine learning workloads with sparse activation, NCRTM is significantly better than design-time based approaches in terms of both reliability and performance. For dense activation, NCRTM improves reliability significantly compared to these approaches, with a marginal impact on performance. Furthermore, NCRTM outperforms design-time based policies when the availability of training data is limited.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.5">Application Accuracy</head><p>A change in ISI manifests as a loss in the accuracy of a machine learning workload. Table <ref type="table">4</ref> reports the accuracy of each machine learning workload on DYNAP-SE using the evaluated approaches. We make the following three key observations. First, RENEU and PyCARL achieve the same accuracy. This is because RENEU maps the neurons and synapses of an SNN workload to the hardware resources statically to minimize aging, while ensuring that the spike communication latency on the interconnect does not change the ISI compared to that obtained using PyCARL. Second, the accuracy of NCRTM is on average 4.52% lower than that of PyCARL and RENEU, and 0.76% lower than that of DTRO. This reduction in accuracy is a direct result of the change in ISI analyzed in Section 7.4. Third, although NCRTM results in 4.8% lower accuracy than the baseline (PyCARL) for AlexNet (68.2% with NCRTM compared to the baseline accuracy of 71.7%), it reduces circuit aging by 62% compared to the baseline (see Section 7.2).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.6">Platform Exploration</head><p>Figure <ref type="figure">17</ref> illustrates the reliability impact of increasing the number of tiles in neuromorphic hardware. The figure plots the aging results of NCRTM on DYNAP-SE with 16 and 32 tiles, normalized to the aging on DYNAP-SE with the baseline configuration of 12 tiles. We observe that the average aging with 16 and 32 tiles is 18% and 51% lower, respectively, than with the baseline configuration of 12 tiles. Circuit aging is lower with more tiles because fewer neurons and synapses are mapped to each tile. Each tile therefore generates fewer spikes, which lowers the aging of its neuron and synapse circuits.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7.7">Temperature Dependency of Reliability</head><p>Figure <ref type="figure">18</ref> illustrates the temperature dependency of aging in neuromorphic hardware. We report the aging results of NCRTM at two elevated temperatures, 320 K and 340 K, for each of our machine learning applications, normalized to NCRTM at 300 K. Aging increases with temperature: aging at 320 K and 340 K is higher than at 300 K by an average of 7% and 30%, respectively. This is due to the exponential dependency of circuit aging on temperature (Equation <ref type="formula">6</ref>). The same equation shows that aging also depends on the voltage needed to operate the neurons and synapses in the hardware when generating and propagating spikes. Therefore, VGG16, ImgSmooth, and VisualPursuit, which generate more spikes, have higher aging at the elevated temperatures than all other applications. Higher aging leads to a larger threshold voltage shift in the transistors.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">CONCLUSIONS</head><p>This paper introduces NCRTM, a run-time reliability manager for neuromorphic computing. We observe that neurons and synapses in neuromorphic hardware are exposed to high voltages and/or currents because of the operating requirements of the non-volatile memories (NVMs) used for high-density, low-energy synaptic storage in the hardware. When exposed to these elevated conditions too often, the CMOS transistors in the neuron and synapse circuits suffer strong aging, leading to hard breakdown. Moreover, in aggressively scaled sub-10nm technology nodes, parametric soft breakdown mechanisms start drifting the transistor parameters from their nominal values even under normal workloads. In contrast to long-term aging, which permanently damages the hardware, short-term aging in scaled CMOS transistors is mostly due to BTI, which is heavily workload-dependent and, more importantly, partially reversible. Based on these observations, NCRTM dynamically de-stresses neuron and synapse circuits in response to the short-term aging of their CMOS transistors during the execution of machine learning tasks, with the objective of meeting a reliability target. NCRTM de-stresses these circuits only when necessary, and it reduces the performance impact by scheduling de-stress operations off the critical path. We evaluate NCRTM with supervised and unsupervised machine learning applications on neuromorphic hardware. Our results demonstrate that for machine learning workloads with sparse activation, NCRTM is significantly better than design-time based approaches in terms of both reliability and performance. For dense activation, NCRTM improves reliability significantly compared to these approaches, with only a marginal impact on performance. We believe that NCRTM can be extended to incorporate other failure mechanisms. In the future, we plan to implement NCRTM on an NVM-based DYNAP-SE when such a board becomes publicly available.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">ACKNOWLEDGMENTS</head><p>This work is supported by the National Science Foundation Faculty Early Career Development Award CCF-1942697 (CAREER: Facilitating Dependable Neuromorphic Computing: Vision, Architecture, and Impact on Programmability).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A INTRODUCTION TO SPIKING NEURAL NETWORKS</head><p>Spiking Neural Networks (SNNs) <ref type="bibr">[60]</ref> are regarded as the third generation of neural networks (see Figure <ref type="figure">20a</ref>). SNNs consist of spiking neurons, which are implemented using the integrate-and-fire <ref type="bibr">[13]</ref> model. In this model, a neuron fires a spike when its membrane voltage exceeds a threshold, after which the membrane voltage is reset. The moment of threshold crossing defines the neuron's firing time. After firing, the neuron enters a refractory state, in which it cannot be excited to generate a second action potential, no matter how intense the input stimulus is (see Figure <ref type="figure">20b</ref>). Spiking neurons are interconnected via synapses, as shown in Figure <ref type="figure">20a</ref>.</p><p>Information Encoding in SNNs: Information in SNNs can be encoded using different techniques <ref type="bibr">[80]</ref>, prominent among which are rate coding <ref type="bibr">[96]</ref> and temporal coding <ref type="bibr">[75]</ref>. Rate coding encodes information as the number of spikes within an encoding window, without considering the temporal characteristics of the signal. Temporal coding encodes information in the inter-spike interval (ISI), capturing the spatio-temporal structure of the input signal.</p><p>Machine Learning Approaches using SNNs: SNNs can be used to implement many machine learning approaches. One example is the supervised approach, where an SNN is first trained with examples from the field and then used for inference with current data. SNNs can also implement unsupervised, semi-supervised, and reinforcement learning-based approaches.</p><p>Learning Algorithms in SNNs: Currently, spike-based learning rules are limited compared to the wide range of learning rules available for analog or rate-based artificial neural networks (ANNs). Most learning rules are unsupervised correlational rules, such as spike-timing-dependent plasticity (STDP) <ref type="bibr">[21]</ref>, short-term plasticity (STP), and long-term plasticity (LTP). Other approaches include a localized version of backpropagation suitable for SNNs; however, these supervised variants take a long time to converge. Attempts have been made to add a reinforcer to STDP, based on the idea that dopamine in the brain carries a reward prediction error signal. In practice, dopamine-modulated STDP (DA-STDP) takes a long time before the network has a strong enough signal to drive behavior <ref type="bibr">[19]</ref>. Recently, a reward-modulated STDP (R-STDP) learning rule was developed to train SNN controllers for obstacle-avoidance behavior in mobile robots <ref type="bibr">[58]</ref>.</p></div>
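<div xmlns="http://www.tei-c.org/ns/1.0"><p>The integrate-and-fire behavior described above (threshold crossing, reset, refractory period) can be sketched as a discrete-time neuron. Two details are assumptions that this appendix does not state: a multiplicative leak, which makes this a leaky integrate-and-fire neuron so that input timing matters, and all parameter values, which are hypothetical. The usage lines mimic, only qualitatively, the ISI-distortion example of Appendix B: delaying one input spike can suppress the output spike. The spike times are made up and do not reproduce Figure <ref type="figure">21</ref>.</p><eg><![CDATA[
```python
def lif_output_spikes(input_spikes, weight, threshold, leak, refractory, duration):
    """Discrete-time leaky integrate-and-fire neuron (one step = 1 ms).

    `input_spikes` maps each input neuron to its spike times (in steps); every
    input spike adds `weight` to the membrane voltage. At each step the voltage
    is first scaled by `leak` (0 < leak <= 1). When the voltage reaches
    `threshold`, the neuron fires, the voltage is reset to 0, and inputs are
    ignored for the next `refractory` steps.
    """
    arrivals = [0] * duration
    for times in input_spikes.values():
        for t in times:
            if 0 <= t < duration:
                arrivals[t] += 1
    v, blocked_until, out = 0.0, 0, []
    for t in range(duration):
        if t < blocked_until:
            continue  # refractory period: the neuron cannot be excited
        v = v * leak + weight * arrivals[t]
        if v >= threshold:
            out.append(t)  # firing time = moment of threshold crossing
            v = 0.0  # reset the membrane voltage
            blocked_until = t + refractory + 1
    return out


params = dict(weight=1.0, threshold=2.4, leak=0.5, refractory=2, duration=40)
# Input 3's second spike coincides with input 2 at t = 21: the output fires.
print(lif_output_spikes({1: [20], 2: [21], 3: [5, 21]}, **params))  # [21]
# The same spike delayed to t = 24 (ISI distortion): no output spike.
print(lif_output_spikes({1: [20], 2: [21], 3: [5, 24]}, **params))  # []
```
]]></eg></div>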
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B IMPACT OF ISI DISTORTION ON PERFORMANCE OF SNNS</head><p>To illustrate how ISI distortion impacts accuracy, we consider a small SNN example in which three input neurons are connected to an output neuron. Figure <ref type="figure">21</ref> illustrates the impact of ISI distortion on the output spike. In the top sub-figure, the spikes from the input neurons cause the output neuron to generate a spike at 22 ms. In the bottom sub-figure, the second spike from input 3 is delayed, i.e., it has ISI distortion. As a result of this distortion, no output spike is generated. Missing spikes can reduce application accuracy, because spike timings encode information in SNNs.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>We believe that other designs, such as TrueNorth <ref type="bibr">[35]</ref> and Loihi <ref type="bibr">[34]</ref>, have similar architectures.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Besides neuromorphic computing, NVMs are also used as main memory for conventional computing<ref type="bibr">[54,</ref><ref type="bibr">57,</ref><ref type="bibr">71,</ref><ref type="bibr">85,</ref><ref type="bibr">87,</ref><ref type="bibr">89,</ref><ref type="bibr">90]</ref>.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>BTI issues are also a reliability concern for standard DRAM and SRAM memories<ref type="bibr">[67]</ref>. However, due to the use of transistors as access devices, the peripheral circuits in DRAM and SRAM can use lower operating voltages &#8776; 1.2V. BTI-related reliability issues in DRAM and SRAM are therefore less severe than in NVM contexts<ref type="bibr">[87,</ref><ref type="bibr">90]</ref>.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3"><p>There are works that address run-time management for conventional multi-core systems<ref type="bibr">[81]</ref>.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_4"><p>ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2021.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_5"><p>The aging threshold $th_a$ is a user-defined parameter used to achieve a given reliability target.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_6"><p>The CNN models LeNet, AlexNet, and VGG16 are converted to spiking domain using our previously proposed converter<ref type="bibr">[5]</ref>. For the inference performance of the original model, readers are referred to<ref type="bibr">[74]</ref>.</p></note>
		</body>
		</text>
</TEI>
