<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Toward Comprehensive Shifting Fault Tolerance for Domain-Wall Memories with PIETT</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>07/04/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10344839</idno>
					<idno type="doi">10.1109/TC.2022.3188206</idno>
					<title level='j'>IEEE Transactions on Computers</title>
<idno>0018-9340</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Sebastien Ollivier</author><author>Stephen Longofono</author><author>Prayash Dutta</author><author>Jingtong Hu</author><author>Sanjukta Bhanja</author><author>Alex K. Jones</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Spintronic domain-wall memories (DWMs) offer improved memory density and energy compared to conventional memories but are susceptible to shifting faults. We propose PIETT (Pinning, Insertion, Erasure, and Translation-fault Tolerance) for improved misalignment correction versus the state of the art. PIETT combines derived error correction with a multi-domain access approach to detect and correct a minimum of three misalignment faults after an arbitrary shift distance. Moreover, we characterize the rate of both misalignment and pinning faults in DWM nanowires and demonstrate that pinning faults are a significant concern for DWM. As such, PIETT is the first method to combine correction of misalignment and pinning faults in random access DWMs. It also introduces PIETT Transverse Access Points (TAPs) that use a novel write access mode, which can set/reset multiple domains in a single intrinsic operation and can store shift distance detection codes. By allowing checks between shifts of the intrinsic shift distance (e.g., 3 domains), a single TAP per nanowire expands misalignment protection and determines the corrective shifts needed to correct faults in all nanowires. Two TAPs expand misalignment protection to correct misalignment by more than one position and detect pinning by detecting different shift distances at each extremity of the nanowire. PIETT leverages knowledge of pinned nanowire locations to guide a modified SECDED ECC with one additional parity bit stored in additional parity nanowires. Thus, PIETT in TAP mode can correct unlimited, potentially multi-position, misalignment faults and either up to three pinning faults or up to two pinning faults with up to one bit-flip fault using scrubbing. 
PIETT provides eight to 21 orders of magnitude improvement in mean-time-to-failure with similar or better area overhead and only a 1% system performance degradation compared to state-of-the-art DWM misalignment correction.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">INTRODUCTION</head><p>Spin-Transfer Torque Magnetic memory (STT-MRAM) has gained traction for on-chip memory deployment due to its near-SRAM performance, CMOS compatibility, low static power, and good endurance <ref type="bibr">[1]</ref>. Unfortunately, STT-MRAM has insufficient density for main memory or secondary storage applications. Spintronic domain-wall memory (DWM), also referred to as "Racetrack" memory and originally proposed and demonstrated by IBM <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>, retains the static energy benefits of STT-MRAM with a 10&#215; density improvement <ref type="bibr">[4]</ref>. DWM has a theoretical area per bit as small as 2F², where F is the technology feature size <ref type="bibr">[5]</ref>. Moreover, DWM avoids endurance challenges by providing &#8805; 10¹⁶ write cycles <ref type="bibr">[6]</ref>, compared to other emerging memory candidates such as phase-change <ref type="bibr">[7]</ref> and resistive <ref type="bibr">[8]</ref> memories at 10⁸-10⁹ and 10¹¹-10¹² write cycles, respectively <ref type="bibr">[6]</ref>, <ref type="bibr">[9]</ref>.</p><p>DWM is constructed from ferromagnetic nanowires (also referred to as tapes or racetracks) that are separated into domains and connected to one or more access transistors to create access ports. Data is stored by magnetic orientation and accessed by shifting the magnetic domains along the nanowire to align the target domain with a fixed access device <ref type="bibr">[2]</ref>, <ref type="bibr">[10]</ref>. After alignment, data access is similar to that of an STT-MRAM Magneto-Tunnel Junction (MTJ). Thus, DWM has been proposed for non-uniform access structures such as Non-Uniform Cache Access (NUCA) caches <ref type="bibr">[11]</ref>.</p><p>Unfortunately, slight fluctuations in shifting current can cause shifting faults. These faults include misalignment and pinning faults. 
Misalignment takes the form of over- and under-shifting, with rates ranging from 5&#8226;10⁻⁵ to 10⁻³ depending on shift distance <ref type="bibr">[12]</ref>. Pinning occurs due to imperfections in the domain wall caused by process variation. It most commonly manifests as an erasure, where the pinning point acts as a barrier that prevents shifting within the nanowire <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>. Theoretically, an insertion is also possible, where the pinning point replicates itself and shifting continues through the whole nanowire. Either pinning fault puts the nanowire in an unrecoverable state.</p><p>In memory structures built from DWMs, multiple racetracks are bundled, accessed in parallel, and shifted together <ref type="bibr">[15]</ref>. Additional racetracks storing Error Correction Codes (ECC) could be added to the bundle to correct data perturbed by misalignment or pinning faults. Unfortunately, this form of ECC alone is insufficient to determine when a shifting fault has occurred or to guide its correction. ECC cannot detect faults occurring in parts of the nanowire that are not being read, or when the faulty data matches the expected parity value, e.g., when neighboring data contains the same value. Additionally, fault discovery provides no insight into the type of fault, such as misalignment, pinning, or even a bit flip, as each nanowire is only sampled at a single point.</p><p>Several recent approaches have been proposed to mitigate misalignment in DWMs. Hi-fi proposes a Johnson code stored in additional synchronization domains to detect alignment <ref type="bibr">[12]</ref>. This can result in significant area and performance overheads due to the additional domains and access ports required. GreenFlag proposed to correct misalignment using communication theory <ref type="bibr">[16]</ref>, and was later extended as Foosball to add single bit-flip protection <ref type="bibr">[17]</ref>. 
Unfortunately, these approaches require the entire nanowire to be accessed in sequence, making them unsuitable for random access memory. Moreover, none of Hi-fi, GreenFlag, or Foosball can correct pinning faults.</p><p>To provide a more complete solution, we propose PIETT (Pinning, Insertion, Erasure, and Translation-fault Tolerance) to correct faults from misalignment and pinning. PIETT includes a high-performance mode that corrects misalignment-only faults during shifting using a Derived Error Correction Coding (DECC) methodology. PIETT-DECC uses a Multi-Domain Reading (MDR) methodology <ref type="bibr">[18]</ref>, <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref> that can determine the number of '1's in multiple adjacent domains of the nanowire. PIETT-DECC uses MDR to access the data signature, or number of '1's in the data domains, and stores '1's in the overhead domains to the right of the data domains to record the nanowire position. DECC stores parity bits for the signature externally to detect and correct these misalignment faults.</p><p>In the presence of both misalignment and pinning faults, PIETT extends the MDR concept to introduce special Transverse Access Points (TAPs), deployed in extended padding bits at both extremities of the nanowire, and uses them to detect shifting faults. A TAP, conceptually akin to an STT-MRAM Multi-Level Cell (MLC) with t free layers, is constructed from t domains of the nanowire. In one shift operation, all t domains can be preset to '1's or reset to '0's, and the number of '1's can be determined with MDR. To detect faults, both TAPs are reset to a known state prior to a shift and read after the shift. If the shift succeeds, both TAPs report the correct alignment state. If there is misalignment, both TAPs report the same incorrect alignment state and the nanowire can be correctively shifted. 
If pinning occurs, it is detected by mismatched TAP alignment.</p><p>In this mode, PIETT can independently correct an unlimited number of misalignment faults, including multi-position misalignment. Using Single Error Correction Double Error Detection (SECDED) ECC parity nanowires to protect a group of racetracks, PIETT can correct at least three pinning faults within this group. PIETT supports multi-domain intrinsic shifts and is compatible with bit-flip correction, correcting two pinning faults combined with a bit-flip fault. To the best of our knowledge, PIETT is the first scheme to detect and correct both misalignment and pinning faults in DWMs.</p><p>In particular, we make the following contributions:</p><p>&#8226; We estimate the shift and pinning fault probability from process variation of domain-wall notch width and depth, characterized using micromagnetic device simulation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>We propose DECC which leverages '1's counting to detect and correct at least three misalignment only faults after arbitrary shift distances in the nanowire.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>We propose TAPs which introduce multi-domain shiftwriting and leverage MDR within a DWM.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>We demonstrate how TAPs combined with padding bit encoding can be used to detect alignment or pinning faults and directly used to correct misalignment through corrective shifts.</p><p>&#8226;</p><p>We demonstrate directed scrubbing based on SECDED ECC guided by TAP-based pinning detection to correct up to three simultaneously pinned nanowires or up to two pinned nanowires in the presence of up to one bit flip fault per data location.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>&#8226;</head><p>We provide a detailed analysis of PIETT to evaluate the fault tolerance, performance, energy, and area overheads for a range of incident pinning fault rates.</p><p>DECC provides similar fault tolerance to Hi-fi <ref type="bibr">[12]</ref> while providing area improvement and more than 50% reduction in dynamic energy. When considering pinning faults, PIETT provides 21 orders of magnitude improvement in mean-time-to-failure based on the 10 -8 pinning fault rate determined by our model and scales well to higher fault rates, multiposition alignment faults, and longer nanowires. PIETT does increase shift latency, but has only a 1% system performance degradation. PIETT corrects misalignment and pinning with a similar area overhead to fault tolerance schemes with merely misalignment protection.</p><p>The remainder of this paper is organized as follows. Section 2 presents more detail on DWM, its shifting challenges, relevant novel access modes, leading solutions for mitigating shift faults, and other related work. The derived error correction mode of PIETT to solve misalignment faults is presented in Section 3. Section 4 explores pinning faults explaining the theory and presenting magnetic simulation results for pinning fault probability. TAPs are described in detail in Section 5. Section 6 demonstrates how PIETT can detect and correct misalignment and pinning with TAPs. The experimental setup and reliability, area, performance, and energy results of PIETT are described in Sections 7 and 8, respectively. Finally, we relate conclusions in Section 9.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">BACKGROUND AND RELATED WORK</head><p>An example of a planar (2D) DWM nanowire with shift-write ports is shown in Fig. <ref type="figure">1</ref> <ref type="bibr">[21]</ref>. The value of each domain is determined by its polarization, illustrated by arrow direction. During read access, a domain is aligned with the fixed layer (dark blue) of the access port. The resistance is detected by a current applied orthogonally through the nanowire across the fixed layer of the access port. As in STT-MRAM, the resistance is lower if the polarization is in the same direction as the fixed layer (parallel) and higher if it is opposite (antiparallel). Writing uses a much larger current, often by an order of magnitude. Alternatively, shift writing, shown at the read/write port in Fig. <ref type="figure">1</ref>, can improve both the speed and energy of writing <ref type="bibr">[21]</ref>.</p><p>An example of DWM data access is shown in Fig. <ref type="figure">2</ref>. The cross section of the R/W port using shift-based writes is shown in Fig. <ref type="figure">2</ref>(a) <ref type="bibr">[21]</ref>.<note place="foot" n="2">The design shown in Figs. <ref type="figure">1</ref> and <ref type="figure">2</ref>(a) differs from prior work <ref type="bibr">[21]</ref> by adding a second WWL (T3) because, while not needed for reading/writing, it is needed for correct shifting to prevent sneak paths between BLB and BL.</note> Presuming the nanowire starts in the center position, …</p><p>DWM demonstrations of memory array structures <ref type="bibr">[22]</ref> and Content Addressable Memories (CAMs) <ref type="bibr">[23]</ref> show fabrication feasibility with great potential for density, performance, and power consumption. 
Moreover, DWM technology has been proposed for a variety of positions in the memory hierarchy, including networks-on-chip <ref type="bibr">[24]</ref>, the last-level cache <ref type="bibr">[11]</ref>, multiple cache levels including L1 <ref type="bibr">[25]</ref>, GPGPU registers <ref type="bibr">[26]</ref> and caches <ref type="bibr">[27]</ref>, and fast main memory <ref type="bibr">[28]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>DWM-based memories typically use a traditional hierarchical memory organization of ranks, banks, sub-arrays, tiles, etc. Because a bundle of nanowires contains multiple rows/words of data whose width is determined by the number of nanowires in the bundle, the bundle is treated as a domain block cluster (DBC) <ref type="bibr">[29]</ref>, <ref type="bibr">[30]</ref>, shown at cache-line granularity in Fig. <ref type="figure">3</ref>. Thus, a memory access can directly select the appropriate DBC in the peripheral circuitry, but accessing the actual row/word requires shifting all the nanowires into alignment with the access point.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Shifting Faults</head><p>While the DBC is being shifted, one or more nanowires may experience an over- or under-shift misalignment fault and/or a pinning fault.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1">Misalignment Faults</head><p>Misalignment faults arise from variation in the operating conditions of the system, typically fluctuations in the shifting current <ref type="bibr">[2]</ref>. In this case, the entire nanowire over- or under-shifts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.2">Pinning Faults</head><p>Unlike misalignment faults, pinning faults arise from operating conditions combined with fabrication imperfections, i.e., where the nanowire is not formed properly due to process variation. As discussed in Section 1, pinning can take the form of an erasure, where shifting stops at the pinning point of the nanowire <ref type="bibr">[13]</ref>, or an insertion, where the value is replicated at the pinning point. These behaviors occur when the shifting current drifts near the lower or upper bound of its tolerance and a variation defect has affected the local domain wall.</p><p>When a defect causes an erasure fault, domain motion stops at the pin point and the pinned domain can be overwritten by the domain that follows. We provide a conceptual example of this fault in Fig. <ref type="figure">4(b)</ref>. When shifting left from position (a)(i) with the intent of reaching position (a)(ii), one bit, <formula notation="TeX">d_2</formula>, disappears at the pin point (shown in red) and the remaining domains in the nanowire stop moving.</p><p>In the case of an insertion fault, all domains start moving at the same speed; however, as they interact with a defect, the distance traveled is affected. When the domain is sufficiently stretched, a replicated (inserted) domain is created. We show this conceptually in Fig. <ref type="figure">4(c)</ref>. The domain at the pin point (<formula notation="TeX">d_3</formula>) becomes pinned and replicates itself into the adjacent location. Both types of pinning can be detected because the domain motion at the two extremities of the nanowire will appear as different alignments.</p></div>
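<div xmlns="http://www.tei-c.org/ns/1.0"><p>The extremity-comparison rule above can be sketched as a small decision function. This is a hypothetical illustration rather than PIETT's circuit: we assume each end of the nanowire reports the signed distance it actually moved (for example, as read from an access point at each end), and the names classify_shift and ShiftStatus are our own. A rigid over- or under-shift moves both ends equally, while pinning makes them disagree.</p>
<ab><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ShiftStatus:
    kind: str                  # "ok", "misalignment", or "pinning"
    correction: Optional[int]  # extra shift in the requested direction; None if pinned


def classify_shift(requested: int, left_observed: int, right_observed: int) -> ShiftStatus:
    """Classify a shift from the displacement seen at each end of one nanowire.

    Hypothetical model: `requested` is the signed shift distance that was
    commanded; the observations are the signed distances that the left and
    right extremities actually moved.
    """
    if left_observed != right_observed:
        # The two ends moved by different amounts, so the wire did not move
        # rigidly: the signature of a pinning fault (erasure or insertion).
        return ShiftStatus("pinning", None)
    error = requested - left_observed
    if error == 0:
        return ShiftStatus("ok", 0)
    # Both ends agree but are off by the same amount: a rigid over- or
    # under-shift that a corrective shift of `error` positions undoes.
    return ShiftStatus("misalignment", error)


if __name__ == "__main__":
    print(classify_shift(3, 2, 2))  # under-shift by one position
    print(classify_shift(3, 4, 4))  # over-shift by one position
    print(classify_shift(1, 1, 0))  # ends disagree: pinning
```
]]></ab><p>A positive correction means shifting further in the requested direction and a negative one means backing off; a pinned wire gets no correction because, as noted in Section 1, pinning leaves the nanowire data in an unrecoverable state that must be handled by other means.</p></div>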
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Misalignment Fault Tolerance</head><p>Two main techniques have been proposed to detect and correct misalignment: one based on a dedicated code and access points (Hi-fi) <ref type="bibr">[12]</ref>, and one based on data encoding using Varshamov-Tenengolts (VT) codes (GreenFlag/Foosball) <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>. Hi-fi, like PIETT, targets 2D random access DWMs with DBCs such as the one in Fig. <ref type="figure">3</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.1">Hi-fi</head><p>Hi-fi presents two techniques, p-ECC and p-ECC-O, which leverage additional access points and encoding techniques for misalignment detection and correction. Fig. <ref type="figure">5</ref> shows an example of SECDED misalignment protection for both approaches. Hi-fi corrects faults by encoding the auxiliary domains with a pattern of alternating groups of two '1's and two '0's. Using two adjacent read heads, the system reads two values from the auxiliary bits and compares them against the expected system state. For example, if the system expects to read "00" but instead reads "01", the tape is misaligned one position too far left. Similarly, reading "10" signifies one position too far right. Reading "11" indicates misalignment by two positions, but not in which direction.</p><p>The main difference between the two Hi-fi techniques is where the auxiliary information is stored in the racetrack. In Fig. <ref type="figure">5</ref>(a), p-ECC adds dedicated domains and two associated read-only ports to access the information, but accommodates multiple shifts between checks. In contrast, p-ECC-O, shown in Fig. <ref type="figure">5(b)</ref>, stores the auxiliary information in the padding domains that are already required. Unfortunately, this requires one read and one write head at each end of the device to maintain and check the pattern, and allows only a single shift between checks.</p><p>Both schemes may be scaled to detect misalignment by two or more positions by modifying the code and the number of read heads for the auxiliary information. N-domain misalignment correction with (N+1)-domain misalignment detection requires a total of N+1 read ports.</p></div>
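<div xmlns="http://www.tei-c.org/ns/1.0"><p>The decoding of the Hi-fi auxiliary pattern described above can be sketched as follows. This minimal model rests on our own assumptions: the pattern 0011 repeats along the auxiliary domains, two adjacent heads sample it, and displacement +1 is taken to be the "one position too far left" case from the example above. The sign convention and the names heads_read and decode_misalignment are hypothetical.</p>
<ab><![CDATA[
```python
# Hi-fi style auxiliary pattern: alternating groups of two '0's and two '1's.
PATTERN = (0, 0, 1, 1)


def heads_read(offset: int) -> tuple:
    """Bits seen by two adjacent read heads when the pattern sits at `offset`."""
    n = len(PATTERN)
    return (PATTERN[offset % n], PATTERN[(offset + 1) % n])


def decode_misalignment(expected_offset: int, observed: tuple) -> set:
    """Displacements in [-2, 2] that are consistent with the observed bits.

    A single element identifies the misalignment; {-2, 2} means "off by two,
    direction unknown", because the pattern repeats every four domains.
    """
    return {d for d in range(-2, 3)
            if heads_read(expected_offset + d) == observed}


if __name__ == "__main__":
    for bits in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(bits, decode_misalignment(0, bits))
```
]]></ab><p>With an expected reading of "00", this model reproduces the cases in the text: "01" and "10" each map to a unique one-position displacement, while "11" is ambiguous between +2 and -2, which is why wider misalignment requires a longer code and more read heads.</p></div>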
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.2">GreenFlag and Foosball</head><p>In GreenFlag <ref type="bibr">[16]</ref>, the entire nanowire must be read in sequence, with a shift and read operation to access each data bit. If an undershift occurs, a bit is read twice, and if an overshift occurs, a bit is lost, similar to what could happen in a communication channel. GreenFlag uses VT codes and delimiters to recover missing bits and eliminate redundant bits. Thus, when writing, much of the nanowire must be rewritten with the new encoding. Foosball <ref type="bibr">[17]</ref> extends GreenFlag with a new 8-bit delimiter capable of detecting a misalignment of up to two domains, and detects a bit flip by adding parity nanowires.</p></div>
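<div xmlns="http://www.tei-c.org/ns/1.0"><p>The principle GreenFlag builds on can be illustrated with a Varshamov-Tenengolts code VT_a(n): the set of length-n binary words whose checksum, the sum of i·x_i for i = 1..n, equals a modulo n+1. Each such code corrects a single deletion (a bit lost to an overshift) or a single insertion (a bit read twice after an undershift). The sketch below uses a brute-force decoder rather than Levenshtein's efficient decoding algorithm and omits GreenFlag's delimiters entirely, so it illustrates VT codes, not GreenFlag itself; the function names are hypothetical.</p>
<ab><![CDATA[
```python
def vt_syndrome(bits):
    """Varshamov-Tenengolts checksum: sum of i * x_i (i from 1) modulo n + 1."""
    n = len(bits)
    return sum(i * b for i, b in enumerate(bits, start=1)) % (n + 1)


def _unique(candidates):
    # A VT code corrects one deletion or insertion, so at most one codeword fits.
    if len(candidates) != 1:
        raise ValueError("word is not one edit away from a VT codeword")
    return list(next(iter(candidates)))


def recover_deletion(received, a=0):
    """Overshift model: one bit was skipped. Try every single-bit insertion."""
    candidates = set()
    for pos in range(len(received) + 1):
        for bit in (0, 1):
            word = received[:pos] + [bit] + received[pos:]
            if vt_syndrome(word) == a:
                candidates.add(tuple(word))
    return _unique(candidates)


def recover_insertion(received, a=0):
    """Undershift model: one bit was read twice. Try every single-bit deletion."""
    candidates = set()
    for pos in range(len(received)):
        word = received[:pos] + received[pos + 1:]
        if vt_syndrome(word) == a:
            candidates.add(tuple(word))
    return _unique(candidates)


if __name__ == "__main__":
    # [1, 0, 0, 0, 0, 1] is in VT_0(6): 1*1 + 6*1 = 7, which is 0 mod 7.
    print(recover_deletion([0, 0, 0, 0, 1]))          # first bit was lost
    print(recover_insertion([1, 1, 0, 0, 0, 0, 1]))   # first bit read twice
```
]]></ab><p>The brute-force search is quadratic in the word length at worst, which is fine for illustration; it also makes clear why the scheme reads the whole nanowire in sequence, since the checksum is defined over the entire word.</p></div>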
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.3">Suitability for Pinning Protection</head><p>Unfortunately, neither Hi-fi nor Foosball handles pinning faults. Like bit flips and unlike misalignment, pinning is destructive, as it changes the data stored in the nanowire, making it particularly difficult to correct. Foosball does handle bit flips, but it does not address pinning. Moreover, it assumes that the entire nanowire is accessed in sequence for each access. PIETT is designed for 2D planar DBC structures that support parallel access. As Foosball's access mode is approximately 18&#215; slower and more energy-intensive than these DBCs, we focus on comparisons with more closely related techniques that also target similar DBCs, namely p-ECC and p-ECC-O.</p><p>We were unable to find an obvious way to adapt p-ECC to detect or correct pinning. For p-ECC this can be easily grasped from …</p><p>Fig. <ref type="figure">6</ref>. Transverse read (a) from the right to the access port and (b) from the left to the access port.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Multi-Domain Reads</head><p>Multi-domain reads determine the number of parallel or antiparallel domains in a segment of a DWM nanowire. The first technique proposed to implement this function for DWMs is the transverse read (TR) <ref type="bibr">[19]</ref>. TR applies a smaller current in the same direction as the shift current through a portion of the nanowire, as shown in Fig. <ref type="figure">6</ref>. The current is injected at the end of the nanowire (as shown in the figure) or at an access point, and exits through the MTJ of an access point. This allows an access akin to a multi-level STT-MRAM cell, where multiple free layers are stacked on top of a single fixed layer. Thus, the tunneling magnetoresistance (TMR) of multiple domains affects the voltage sensed at the access port through changes in the resistance state. TR has been demonstrated to distinguish the number of parallel or antiparallel domains within four adjacent domains as distinct resistance levels <ref type="bibr">[19]</ref>.</p><p>While MDR may also be measurable through the Anomalous Hall Effect (AHE) <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, a multi-domain MTJ was recently proposed as a scalable alternative to TR for MDR <ref type="bibr">[20]</ref>. The multi-domain MTJ creates an access port across multiple domains, as shown in Fig. <ref type="figure">7</ref>. When a read current is applied, the domains act as parallel resistors, producing different resistance levels depending on the number of parallel and antiparallel domains. This work demonstrates resilience to process variation and scalability to seven domains <ref type="bibr">[20]</ref>.</p><p>For an MDR over an arbitrary nanowire segment of D domains, <formula notation="TeX">R_{MD} = \sum_{i=0}^{D-1} X_i</formula>, where <formula notation="TeX">X_i</formula> is the value of domain i. Determining <formula notation="TeX">R_{MD}</formula> using TR for the system in Fig. <ref type="figure">6</ref> requires two TRs and a standard read. By placing access points appropriately, the number of '1's in an arbitrarily long segment of the nanowire can be determined through parallel segmented TRs in three steps <ref type="bibr">[19]</ref>. Using multi-domain MTJs in the configuration of Fig. <ref type="figure">7</ref>, we directly obtain <formula notation="TeX">R_{MD} = \sum_{i=0}^{4} X_i</formula>. By alternating placement of the MTJs on the top and bottom of the nanowire, the number of '1's in an arbitrarily long segment of the nanowire can be determined in two steps.</p><p>In Section 4 and beyond, we characterize the pinning fault probability for a representative DWM nanowire and discuss methods to detect and correct pinning faults. First, however, the next section describes PIETT's advancement in misalignment-only shift correction using derived error correction. </p></div>
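<div xmlns="http://www.tei-c.org/ns/1.0"><p>The parallel-resistor behavior of a multi-domain MTJ, and how the count of '1's can be recovered from it, can be sketched with an idealized model. The resistances r_p and r_ap are hypothetical values, mapping parallel domains to '1' is an assumption, and process variation and sensing noise are ignored.</p>
<ab><![CDATA[
```python
def mdr_resistance(domains, r_p=1.0e3, r_ap=2.0e3):
    """Equivalent resistance of a multi-domain MTJ modeled as parallel resistors.

    Each domain contributes r_p if parallel ('1' here) or r_ap if antiparallel ('0').
    Conductances of parallel resistors add, so R = 1 / sum(1 / R_i).
    """
    conductance = sum(1.0 / (r_p if d else r_ap) for d in domains)
    return 1.0 / conductance


def count_ones(resistance, width, r_p=1.0e3, r_ap=2.0e3):
    """Infer how many of `width` domains are '1' by picking the closest level."""
    levels = {k: 1.0 / (k / r_p + (width - k) / r_ap) for k in range(width + 1)}
    return min(levels, key=lambda k: abs(levels[k] - resistance))


if __name__ == "__main__":
    segment = [1, 0, 1, 1, 0]
    r = mdr_resistance(segment)
    print(round(r, 1), "ohms ->", count_ones(r, len(segment)), "ones")
```
]]></ab><p>Because the equivalent resistance depends only on how many domains are parallel, not on which ones, the measurement yields exactly the sum of the domain values; the spacing between adjacent levels shrinks as the segment widens, which is one reason the number of domains per access is limited in practice.</p></div>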
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">PIETT WITH DERIVED ERROR CORRECTION</head><p>DECC relies on MDR to count the number of '1's in a segment of the nanowire. The values stored in the padding bits are encoded so that they report the position of the nanowire. In DECC, each nanowire is constructed with a fixed domain representing '1' on the right end and another representing '0' on the left end. Thus, during left and right shifts, '1's and '0's are shifted into the padding bits on the right and left sides of the nanowire, respectively. The number of '1's indicates the position of the data within the nanowire. As a result, if an under- or over-shift fault occurs, the measured number of '1's will differ from the expected value.</p><p>Using the difference from the expected value, the fault can be detected and ultimately corrected.</p><p>A DECC example is shown in Fig. <ref type="figure">8</ref>, where the data bits <formula notation="TeX">d_i</formula> are shown in blue and the data bit aligned with the access port is shown in navy (dark blue). The padding bits on the left side (purple) contain '0's and those on the right side (beige) contain '1's. The position of the tape corresponds to the number of '1's in the padding bits. DECC uses an MDR to check the number of '1's, then validates and, if necessary, corrects the alignment.</p><p>Consider the case where the racetrack begins in position 1 [Fig. <ref type="figure">8(a)</ref>] and attempts to shift left by one position to reach position 2 [Fig. <ref type="figure">8(b)</ref>]. The total number of '1's prior to the shift is <formula notation="TeX">\mathit{TOT} = R_{MD}</formula>. After the shift, the new total <formula notation="TeX">\mathit{TOT}'</formula> should be one lower, i.e., <formula notation="TeX">\mathit{TOT}' = \mathit{TOT} - 1</formula>. If an under-shift occurs, <formula notation="TeX">\mathit{TOT}' > \mathit{TOT} - 1</formula>, requiring an additional left shift to balance the equation. If an over-shift occurs, the tape moves to position 3 [Fig. <ref type="figure">8(c)</ref>], <formula notation="TeX">\mathit{TOT}' &lt; \mathit{TOT} - 1</formula>, and a right shift corrects the misalignment.</p></div>
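<div xmlns="http://www.tei-c.org/ns/1.0"><p>The DECC check above reduces to comparing the measured count against the expected one. The sketch below assumes, extending the one-position example, that a left shift of k positions lowers TOT by k; the function names are hypothetical and the wire is modeled only by its count of '1's.</p>
<ab><![CDATA[
```python
def decc_correction(tot_before: int, tot_after: int, left_shift: int) -> int:
    """Corrective shift implied by the DECC count of '1's.

    Assumed convention (from the example above): a left shift of k positions
    lowers the MDR count TOT by k; a negative k denotes a right shift.
    Returns the number of further left shifts needed (negative = shift right).
    """
    expected = tot_before - left_shift
    return tot_after - expected


def simulate(tot_before: int, left_shift: int, actual_move: int) -> int:
    """Hypothetical wire whose count drops by the distance it actually moved left."""
    tot_after = tot_before - actual_move
    return decc_correction(tot_before, tot_after, left_shift)


if __name__ == "__main__":
    print(simulate(5, 1, 0))  # under-shift: one more left shift needed
    print(simulate(5, 1, 2))  # over-shift: one right shift needed
    print(simulate(5, 1, 1))  # correct shift: no correction
```
]]></ab><p>The returned value is simply the requested distance minus the distance actually moved, so the same comparison also covers multi-position misalignment as long as the count itself is read correctly.</p></div>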
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Three Misalignment Correction Guarantee</head><p><formula notation="TeX">\mathit{TOT}</formula> combines the Hamming weight (H0) of the data bits, <formula notation="TeX">H0 = \sum_{i=0}^{4} d_i</formula>, with the position of the racetrack, which is encoded by the '1's in the auxiliary bits. Thus, <formula notation="TeX">\mathit{TOT} - H0</formula> can be used to verify the racetrack position. We define H0 as the data signature. Rather than storing the signature using log₂(n) bits for each racetrack, signatures are generated on demand after a shifting operation has concluded. We store parity bits and SECDED ECC of the generated signatures in the DBC using STT-MRAM auxiliary bits, shown in red in Fig. <ref type="figure">9</ref>.</p><p>The method to protect against three single-domain misalignments using this parity information is described in Fig. <ref type="figure">10</ref>. We use reflected binary Gray codes to represent the signature so that, if the shift alignment is off by only one position, the signature differs by only one bit. Thus, the parity bits detect misalignment by one position, and the ECC is used to repair the signatures of the misaligned racetracks to guide corrective shifts, where LCL is the length of a cache line [Fig. <ref type="figure">10(a)</ref>]. One or more single misalignment errors with signature deviations in independent columns can all be detected and repaired, as shown in orange in Fig. <ref type="figure">10(b)</ref>. SECDED detects the presence of two errors in the same column, and their location is dictated by the parity bits, as shown in purple in Fig. <ref type="figure">10(c)</ref>. This also works with a third error in another column, shown in orange. If three errors affect the same SECDED column, the ECC correction may point to the wrong location (shown in red), but the three parity bits will guide the location of errors for correction, as shown in Fig. <ref type="figure">10(d)</ref>.</p></div>
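<div xmlns="http://www.tei-c.org/ns/1.0"><p>The Gray-code property DECC relies on can be checked directly: consecutive reflected binary Gray codes differ in exactly one bit, so a signature that is off by one flips its parity. This is a minimal sketch; the signature width and the helper names are our own choices.</p>
<ab><![CDATA[
```python
def gray(n: int) -> int:
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def parity(x: int) -> int:
    """Parity (0 or 1) of the number of set bits in x."""
    return bin(x).count("1") % 2


def signature_bits(weight: int, width: int) -> list:
    """Gray-coded data signature (Hamming weight H0) as `width` bits, MSB first."""
    g = gray(weight)
    return [(g >> i) & 1 for i in reversed(range(width))]


if __name__ == "__main__":
    # Adjacent weights differ in one signature bit, so their parities differ.
    for w in range(6):
        print(w, signature_bits(w, 3), "parity", parity(gray(w)))
```
]]></ab><p>A stored parity bit per racetrack therefore flags any one-position deviation of that racetrack's signature, and the column-wise SECDED over the signatures supplies the information needed to repair it.</p></div>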
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Pinning</head><p>The signature DECC uses to determine misalignment is not guaranteed to change if pinning occurs. For example, consider the pinning examples in Fig. <ref type="figure">4</ref>. If an erasure occurs, the value <formula notation="TeX">d_2</formula> is lost, while the signature is expected to be incremented by '1' due to the left shift. However, if <formula notation="TeX">d_2</formula> is '0', Fig. <ref type="figure">4</ref>(a)(ii) is indistinguishable from Fig. <ref type="figure">4</ref>(b) to an MDR, and DECC will not detect the fault. Similarly, for insertion faults, DECC may be unable to distinguish Fig. <ref type="figure">4</ref>(c) from the expected position in Fig. <ref type="figure">4</ref>(a)(ii).</p><p>Additional details on DECC, including a synthetic uncorrectable fault limit, appear in a preliminary version of this paper <ref type="bibr">[18]</ref>. In the next section, however, we demonstrate the presence of runtime pinning faults, followed by a discussion of how PIETT improves upon DECC to correct them.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">PINNING FAULT MODELING</head><p>To create the domain walls that separate domains in a DWM nanowire, equally spaced notches are fabricated to act as pinning sites. The strength, or pinning potential, of a pinning site depends on the geometry of the notch, which can be modeled as described in Eq. 1, where <formula notation="TeX">q_{pin}</formula> is the pinning site, <formula notation="TeX">V_{pin}</formula> is the pinning potential at that particular location, and <formula notation="TeX">M_s</formula> is the saturation magnetization of the material used <ref type="bibr">[33]</ref>, <ref type="bibr">[34]</ref>, <ref type="bibr">[35]</ref>. <formula notation="TeX">\sigma_d</formula> is the domain-wall width <ref type="bibr">[36]</ref>, <ref type="bibr">[37]</ref> and <formula notation="TeX">E_{pin}</formula> is the notch energy density <ref type="bibr">[33]</ref> presented in Eq. 2, where <formula notation="TeX">A_{ex}</formula>, <formula notation="TeX">K_u</formula>, a, and M are the exchange coefficient, magneto-crystalline anisotropy, material lattice constant, and magnetization amplitude, respectively. A current pulse with adequate amplitude, governed by the pinning potential, can depin the wall from the notch and cause it to travel along the nanowire to the next pinning site. This is governed by the Landau-Lifshitz-Gilbert (LLG) equation <ref type="bibr">[38]</ref> in Eq. 3, where <formula notation="TeX">H_{eff}</formula>, &#945;, &#947;, and &#946; are the effective field, Gilbert damping constant, gyromagnetic ratio, and non-adiabatic spin-torque coefficient, respectively.</p><p>To examine the impact of variation, we studied a nanowire with 16 domains, each 200 nm long, for a total length of 3200 nm; the width and thickness were set to 100 nm and 4 nm, respectively. The material properties are listed in Table <ref type="table">1</ref>. We used triangular notches, the most common shape, which resist depinning from thermal perturbation and require minimal shift current. The notches are 50 nm wide and 30 nm deep. Using Eq. 3, we evaluated the ideal critical current for the given nanowire dimensions and material parameters.</p><p>Table 1. Material parameters (recoverable values: 2.0&#215;10¹¹, 6.5&#215;10⁵, 0.02, 10⁶, and a current pulse width of 0.5 ns).</p><p>We then modeled the nanowire using the micromagnetic simulation program MuMax3, a widely used GPU-accelerated finite-difference solver for space- and time-dependent magnetization dynamics in nano-sized ferromagnets such as DWM nanowires <ref type="bibr">[39]</ref>, which has been validated against industry-standard simulators such as the Object Oriented MicroMagnetic Framework (OOMMF) <ref type="bibr">[40]</ref>. We characterized the change in the critical shift current density as we varied the notch width and depth by 5% at each notch position along the nanowire, as described in previous modeling work in the literature <ref type="bibr">[33]</ref>, <ref type="bibr">[41]</ref>.</p><p>For any given notch, there is a lower-bound shift current density <formula notation="TeX">J_L</formula> and an upper-bound shift current density <formula notation="TeX">J_U</formula> to depin and shift one position. For a shift current density <formula notation="TeX">J_S</formula> in A/m², if <formula notation="TeX">J_S &lt; J_L</formula> the domain wall will not depin, and if <formula notation="TeX">J_S > J_U</formula> it will travel more than one notch position. We determined the critical shift current density for each variation of notch width and depth by simulating the shifting behavior in MuMax3 over a range of shift current densities. The characterized results showed that the nominal shift current increases monotonically as the notches lie farther along the nanowire from the current source, as predicted by Eq. 3.</p><p>To determine a fault, we consider the relationship of <formula notation="TeX">J_S</formula> to <formula notation="TeX">J_L</formula> and <formula notation="TeX">J_U</formula> at all notches in the nanowire, using a methodology similar to prior work <ref type="bibr">[42]</ref>. Over notch positions i, if <formula notation="TeX">\forall i:\ J_S &lt; J_{i,L}</formula> or <formula notation="TeX">\forall i:\ J_S > J_{i,U}</formula>, then a misalignment fault (undershift or overshift, respectively) has occurred. 
If, due to variation, $J_S &gt; J_{k-1,L}$ but $J_S &lt; J_{k,L}$ for some notch $k$, domain-wall motion stops at notch $k$ and pinning (erasure) has occurred. Insertion can occur in the analogous situation near $J_U$.</p><p>To quantify the erasure fault probability, we use the total differential method to define the maximum uncertainty of the actual critical shift current density in terms of each tested system parameter. Through characterization, our simulation models determine the partial derivative of $J_L$ with respect to each input parameter. We assume that process variation on these parameters is normally distributed, so $J_L$ is modeled with mean $\mu$ equal to the nominal value and standard deviation $\sigma$ equal to the overall uncertainty. $J_U$ is calculated in the same way.</p><p>Since a correct shift operation requires all domain walls to shift in lockstep, for the $n$th domain wall to shift properly, domain walls $1, \ldots, n-1$ must also have shifted properly. Counting starts at one because, at position zero, a current below $J_{0,L}$ is categorized as an under-shift. Thus, the probability of fault-free shifting at position $n$ can be defined as $P(n) = \prod_{i=1}^{n} Q(i)$, where $Q(i)$ is the probability that $J_{i,L} \le J_S$. A successful full nanowire shift has probability $P(m)$, where $m$ is the total number of notches in the nanowire, and the probability of erasure fault(s) is $1 - P(m)$. Using a similar approach with $J_U$, we can define the probability of insertion faults.</p><p>Using this model, we verified a misalignment probability of the same order as prior work <ref type="bibr">[12]</ref> and obtained the pinning fault probability reported in Table <ref type="table">2</ref>. In the following section, we propose a circuit design for a transverse access point (TAP), which forms the foundation for both pinning and misalignment detection in PIETT.</p></div>
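<div xmlns="http://www.tei-c.org/ns/1.0"><p>The erasure and insertion probabilities above can be evaluated numerically. The following minimal Python sketch assumes, as in the text, that each notch's lower- and upper-bound current densities are normally distributed. It additionally assumes that notches vary independently, so that the fault-free probability $P(m)$ is the product of the per-notch probabilities $Q(i)$. The function names and the numbers in the usage block are hypothetical and are not taken from Table 1 or Table 2.</p><eg><![CDATA[
```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def erasure_probability(j_s, mu_l, sigma_l):
    """Return 1 - P(m), where P(m) = prod_i Q(i) and Q(i) = P(J_{i,L} <= J_S).

    mu_l[i] and sigma_l[i] are the assumed mean and standard deviation of the
    lower-bound depinning current density at notch i + 1. Notches are treated
    as independent (an assumption of this sketch).
    """
    p_fault_free = 1.0
    for mu, sigma in zip(mu_l, sigma_l):
        p_fault_free *= normal_cdf(j_s, mu, sigma)  # Q(i)
    return 1.0 - p_fault_free


def insertion_probability(j_s, mu_u, sigma_u):
    """Analogous model with J_U: every notch must satisfy J_S <= J_{i,U}."""
    p_fault_free = 1.0
    for mu, sigma in zip(mu_u, sigma_u):
        p_fault_free *= 1.0 - normal_cdf(j_s, mu, sigma)
    return 1.0 - p_fault_free


if __name__ == "__main__":
    # Hypothetical 16-notch nanowire: nominal J_L rises 1% per notch
    # (monotonic, as in the characterization) with 2% uncertainty.
    mu_l = [1.0e11 * (1 + 0.01 * i) for i in range(16)]
    sigma_l = [0.02 * mu for mu in mu_l]
    for j_s in (1.2e11, 1.3e11, 1.4e11):
        p = erasure_probability(j_s, mu_l, sigma_l)
        print(f"J_S = {j_s:.1e} A/m^2 -> P(erasure) = {p:.3e}")
```
]]></eg><p>In this model, raising $J_S$ lowers the erasure probability but raises the insertion probability. The shift current therefore has to sit between the two bounds at every notch along the nanowire.</p></div>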
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">TRANSVERSE ACCESS POINTS</head><p>To enable PIETT's combined misalignment and pinning detection, we propose the TAP circuit shown in Fig. <ref type="figure">11(a)</ref>. The TAP circuit is related to the shift-write access point <ref type="bibr">[21]</ref> but is designed along the nanowire to create a segmented, MLC-like device. Our TAP circuit is constructed at the extremity of the nanowire. A fixed domain (in this case aligned right, which we correlate to logic '1') sits at the very end and is connected to the shift line (SLB). At the other end of the TAP, we place a fixed left/'0' domain, separated by a standard domain wall, orthogonal to the nanowire and connected to the bit line (BL) through a MOSFET controlled by the VS signal.</p><p>The TAP is set by activating VS and driving current between SLB and BL while leaving SL upstream off; domain-wall motion occurs opposite to the direction of current. This sets the free domains between the fixed '1' layer and the out-of-plane '0' layer to '1's, as shown in Fig. <ref type="figure">11(b)</ref>. With sufficient current, this can occur in a single intrinsic operation and can be slightly overdriven to prevent under-shift. Over-shift is not a problem because shifting an extra '1' through the sink results in the same preset configuration. Reversing the polarity of BL and SLB resets these bits to '0', as shown in Fig. <ref type="figure">11(c</ref>). Thus, the novel programming concept behind the TAP is that multiple domains can be set or reset in a single intrinsic operation. To verify this capability, we conducted a magnetic simulation of the TAP circuit from Fig. <ref type="figure">11</ref> using the LLG micromagnetic simulator <ref type="bibr">[43]</ref>, as shown in Fig. <ref type="figure">12</ref>. In the simulation, the free domains to the left of the TAP contain, reading from right to left, a '1' (red) adjacent to the TAP, two '0's (blue), and a '1' at the far left. Fig.
<ref type="figure">12(a)</ref> shows the alignment after a shift current is applied between BL and SLB: all free domains in the TAP are preset, while free domains outside the TAP remain undisturbed. Fig. <ref type="figure">12</ref>(b) shows resetting to '0's, again without disturbing the free domains outside the TAP.</p><p>To conduct an MDR in the TAP, we show the design for two options: TR and MD-MTJ. For TR, we place another fixed layer orthogonal to the nanowire, separated by an insulator (e.g., MgO), shown in hashed red in Fig. <ref type="figure">11</ref>. This layer is connected to the bit line (BLB) through a MOSFET controlled by the MDR signal. To conduct the MDR, the VS MOSFET is turned off, the MDR transistor is turned on, and a potential is applied between BLB and SLB. Alternatively, we can directly add an MD-MTJ above or below the nanowire, shown in green in Fig. <ref type="figure">11</ref>, connected to BLB through the MDR transistor and to GND. MDR is off during preset and reset.</p><p>Standard domain-wall motion through the entire nanowire, including the TAPs, is still possible by turning off both the VS and MDR MOSFETs and allowing current to flow in the appropriate direction between SL and SLB. Should the wire shift left, '1's are added to the nanowire similar to the process shown in Fig. <ref type="figure">11(b</ref>), but they may proceed beyond the fixed '0' domain.</p><p>The discussion and simulation above are for a single TAP added to the right end of a DWM nanowire. A second, mirrored TAP can be built on the left end and operates entirely independently of, and in parallel with, the first. Moreover, for either the right or left TAP, we can swap the alignment to place the fixed '0' at the extremity and the fixed '1' on the internal end of the TAP. The fabrication feasibility of TAPs is similar to that of fixed magnetic fin-based writes using access transistors connected to BL and BLB, for which a CMOS layout has been demonstrated <ref type="bibr">[21]</ref>.</p></div>
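<div xmlns="http://www.tei-c.org/ns/1.0"><p>The preset/reset behavior described above, including the observation that overdriving is harmless, can be illustrated with a small behavioral model. In this sketch, a TAP is a list of free-domain bits. Preset shifts '1's in from the fixed '1' end, modeled as the right end of the list. Reset shifts '0's in from the fixed '0' end, the left end. Bits pushed past the opposite fixed domain are discarded. The orientation, the function names, and the treatment of an MDR result as a count of '1's (as in the read counts of Section 6.1) are illustrative assumptions, not a circuit model.</p><eg><![CDATA[
```python
from typing import List


def preset(tap: List[int], extra_shifts: int = 0) -> List[int]:
    """Shift '1's in from the fixed '1' end (right end of the list).

    len(tap) shifts fill every free domain; extra_shifts models overdriving,
    where surplus '1's simply pass through to the sink.
    """
    tap = list(tap)
    for _ in range(len(tap) + extra_shifts):
        tap = tap[1:] + [1]  # leftmost bit is absorbed at the fixed '0' end
    return tap


def reset(tap: List[int], extra_shifts: int = 0) -> List[int]:
    """Reversed polarity: shift '0's in from the fixed '0' end (left end)."""
    tap = list(tap)
    for _ in range(len(tap) + extra_shifts):
        tap = [0] + tap[:-1]  # rightmost bit is absorbed at the fixed '1' end
    return tap


def mdr_count(tap: List[int]) -> int:
    """MDR result modeled as the number of '1's in the TAP."""
    return sum(tap)


if __name__ == "__main__":
    tap = [0, 1, 0, 0]  # arbitrary prior contents of a 4-domain TAP
    print(preset(tap), preset(tap, extra_shifts=2))  # overdrive changes nothing
    print(reset(tap), mdr_count(preset(tap)))
```
]]></eg></div>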
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">PIETT WITH TRANSVERSE ACCESS POINTS</head><p>Using the TAPs from Section 5, PIETT can discover relative position information after conducting a shift of the nanowire. This section describes how TAPs detect both misalignment and pinning faults. Misalignments can be repaired straightforwardly by corrective shifts, but pinning requires a different remedy, so we also describe a scrubbing technique that corrects pinning faults, or a mixture of pinning and bit-flip faults.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.1">Shift Fault Detection with TAPs</head><p>When over-shifting is possible, even with detection, an additional padding bit must be added at each end of the nanowire. Otherwise, an over-shift while attempting to reach the leftmost or rightmost data domain would lose data at the other end of the nanowire, and corrective shifting would no longer be effective. The TAPs must then be added to each end beyond this additional padding domain. Each TAP must also contain $n + 1$ free domains, where $n$ is the maximum intrinsic shift distance possible in the system.</p><p>The TAPs, shown in Fig. <ref type="figure">13</ref>(a), comprise the outer four padding domains on each side (i.e., $n = 3$). To detect and distinguish among under-shift, over-shift, and pinning faults, the TAP bits are prepared prior to shifting. Based on their interaction with the other padding bits and the external fixed domain during the shift, it is possible to determine whether a fault has occurred. If misalignment occurs, both TAPs report it simultaneously, each indicating motion that exceeds or falls short of the desired shift distance. Pinning is indicated if part of the nanowire moves a different distance than the other part, as revealed by different motion reported by each TAP.</p><p>In PIETT, all non-TAP padding bits to the left of the data are set to '0' and those to the right of the data are set to '1'. Consider shifting the nanowire from the position in Fig. <ref type="figure">13(a</ref>), where $d_2$ is accessed, so that $d_3$ can be accessed; this requires a left shift by one domain. Both TAPs are preset to all '1's by shifting both TAPs from left to right by four positions in parallel (see Section 5), as shown in Fig. <ref type="figure">13(b</ref>). Note that if either TAP were queried at this point with an MDR, the reported value would be four '1's, as shown in the figure. Upon a successful shift, the nanowire ends up in the position shown in Fig.
<ref type="figure">13(c</ref>). Note that both TAPs now report "1110", a read count of three '1's. On the left, one of the preset bits was evicted at the left extremity while a '0' padding bit entered the TAP. On the right, a '0' entered the TAP from the fixed '0' domain at the right extremity.</p><p>Fig. <ref type="figure">13(d)</ref> shows an under-shift, indicated by both TAPs reporting four instead of the expected three '1's, which requires a corrective left shift. If over-shifting had occurred, each TAP would read "1100" and report a count of two '1's, requiring a corrective right shift, as shown in Fig. <ref type="figure">13(e</ref>). Because a TAP contains $n + 1$ free domains, a single TAP allows shifting by $n$ domains in a single step while protecting against an over- or under-shift by $k = 1$. In a system free of pinning faults, PIETT with both TAPs protects against a multi-position over-shift $k &gt; 1$, limited only by potential data loss from exceeding the padding bits. If the over-shift in the example is more than four domains ($k &gt; 3$), the system shifts back one position at a time until a '1' from the padding bits re-enters the right TAP, placing the system in a known state; a single corrective shift then completes the correction. However, given that the probability of misalignment by $k \ge 2$ is below $10^{-20}$ <ref type="bibr">[12]</ref>, a pinning fault is more likely to occur. Fig. <ref type="figure">13</ref>(f) shows an erasure pinning fault where the pinning point, shown in red, is to the left of the head. Domain-wall motion proceeds only up to the pin point and then stops. Thus, the right TAP reports "1110", indicating motion by one position, while the left TAP reports "1111", indicating no motion. This mismatch between the TAPs signals that pinning has occurred. In the insertion pinning fault shown in Fig. <ref type="figure">13</ref>(g), the right portion of the nanowire does not move, but domain-wall motion starts beyond the pin point.
The left TAP reports "1110" while the right TAP reports "1111", again indicating pinning. In both cases, the pin point cannot be determined, and scrubbing is the only remedy.</p><p>Right shifting follows the same principle, except that both TAPs are reset to '0's. '1's are then introduced into the TAPs by domain-wall motion from the leftmost fixed '1' domain, or from the padding domains right of the data bits but left of the right TAP. We can guarantee that the left non-TAP padding bits hold '0's and the right non-TAP padding bits hold '1's by examining the behavior of the system at the extremities. For example, when shifting left to the data extremity $d_4$, all of the '1's preset into the right TAP make their way left into the non-TAP padding bits [Fig. <ref type="figure">13(h)]</ref>, with similar (complemented) behavior when shifting to $d_0$.</p><p>The shift steps are:
(1) the TAPs are preset for a left shift or reset for a right shift;
(2) the shift occurs;
(3) if this is a read access and the shift has reached the final access location, the read proceeds;
(4) the TAPs are tested and report a correct shift, misalignment, or pinning;
(5) if necessary, misalignment is corrected by repeating steps (1), (2), and (4) until no nanowire is misaligned (under-shifts may omit repeating step (1));
(6) if necessary, pinned nanowires are corrected.
Once these steps are completed, a replacement read can be conducted, or a write or subsequent shift is cleared to proceed. Because the read in step (3) proceeds before the TAPs are tested, we assume that if errors are later detected, the system can flush the access and pipeline until the corrected value is determined and returned. This is standard practice to hide fault-tolerance delay for fault-free accesses.</p><p>The technique for step (6) is described in Section 6.3. First, the next section describes a special case of pinning detection for 1-bit TAPs, which can be applied to P-ECC-O from Hi-fi.</p></div>
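<div xmlns="http://www.tei-c.org/ns/1.0"><p>The decoding of TAP read counts in steps (4) and (5) can be summarized as a small decision function. The sketch below is one possible reading of the examples in Fig. 13 and rests on the following assumptions. For a left shift, both TAPs are preset to '1's and each position moved replaces one '1' with a '0'. For a right shift, both TAPs are reset to '0's and each position moved introduces one '1'. Equal motion at both TAPs indicates either a correct shift or a misalignment, and in the latter case the signed corrective shift is returned. Unequal motion indicates pinning. A fully replaced TAP is reported as saturated, which corresponds to the step-back procedure for large over-shifts. The function names and return values are hypothetical.</p><eg><![CDATA[
```python
def observed_motion(count: int, tap_size: int, direction: str) -> int:
    """Positions moved, inferred from a TAP's count of '1's after the shift."""
    if direction == "left":
        return tap_size - count  # TAP was preset to all '1's
    if direction == "right":
        return count  # TAP was reset to all '0's
    raise ValueError("direction must be 'left' or 'right'")


def classify_shift(left_count, right_count, intended, direction, tap_size=4):
    """Return (status, correction) for one nanowire.

    correction is the number of extra positions to shift in the requested
    direction (negative means shift back); it is 0 for a correct shift and
    None for pinning or saturation.
    """
    left = observed_motion(left_count, tap_size, direction)
    right = observed_motion(right_count, tap_size, direction)
    if left != right:
        return ("pinning", None)  # parts of the wire moved different distances
    if left >= tap_size:
        # Large over-shift: step back one position at a time until a
        # padding bit re-enters the TAP, then correct.
        return ("saturated", None)
    if left == intended:
        return ("ok", 0)
    return ("misaligned", intended - left)


if __name__ == "__main__":
    # Left shift by one with 4-domain TAPs (n = 3), as in Fig. 13(b)-(g).
    for counts in [(3, 3), (4, 4), (2, 2), (4, 3), (3, 4)]:
        print(counts, classify_shift(*counts, intended=1, direction="left"))
```
]]></eg><p>For the left-shift example, (4, 4) maps to a corrective left shift of one position, (2, 2) to a corrective right shift, and (4, 3) to the erasure case of Fig. 13(f).</p></div>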
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.2">1-bit TAPs and P-ECC-O</head><p>Unlike P-ECC and DECC, P-ECC-O writes an alternating pattern into the padding bits. This requires access points at each end of the nanowire, as shown in Fig. <ref type="figure">5(b</ref>), which provides an opportunity to use these access points as a pseudo-TAP for pinning detection. However, to preserve the P-ECC-O misalignment functionality, the value written for pinning detection must be a function of the bit at the extremity and the penultimate bit. Fortunately, P-ECC-O has suitable access points to reach these locations in the nanowire after each single-domain shift. Recall that P-ECC-O uses the padding bits in a "110011..." pattern. This lets it detect under- or over-shift by one position, and misalignment by two positions without distinguishing under- from over-shift <ref type="bibr">[12]</ref>. We show how P-ECC-O can be adapted to address pinning through an example in Fig. <ref type="figure">14</ref> for a shift from $d_2$ to $d_3$. In normal P-ECC-O operation, the system, starting in the state from Fig. <ref type="figure">14(a</ref>) without a fault, would transition directly to Fig. <ref type="figure">14(c</ref>). Misalignment (over- or under-shift) is checked by testing the two outside bits in the direction of motion against the expected position in the pattern <ref type="bibr">[12]</ref>. To add pinning protection, we write the complement of the last bit in the direction of domain-wall motion, as shown in Fig. <ref type="figure">14(b)</ref>, where the left '1' is replaced by a '0'. If, after shifting, the overwritten pattern is still present, either a pinning or an under-shift fault has occurred. An over-shift is detected in the normal way. We could treat an under-shift as a pinning fault; however, this conflation may decrease overall fault tolerance. Instead, an additional head can be added on both sides of the nanowire (white outlined heads).
This allows alignment to be detected on both sides of the nanowire, as with a TAP. Thus, Figs. <ref type="figure">14(b</ref>) and (d) can be differentiated. Next, we propose a technique for correcting pinned nanowires.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.3">Directed Scrubbing</head><p>For nanowires that PIETT (or modified P-ECC-O) identifies as misaligned or pinned, misalignment is relatively simple to fix through corrective shifts, but pinning is less straightforward to correct. We propose a technique, called directed scrubbing, that corrects faults caused by pinning.</p><p>Directed scrubbing requires additional nanowires to store parity data based on SECDED ECC for the DBC. First, the DBC is aligned with the leftmost or rightmost data position, whichever is closer, and the data there is read, corrected, and rewritten as necessary. The DBC then moves by single positions, repeating this process until the other extremity is reached. In completing this traversal, the data domains are repaired, and the encoding domains of the pinned nanowires also naturally return to the appropriate encoding, as described in the discussion of Fig. <ref type="figure">13(h)</ref>.</p><p>Of course, during scrubbing there is a probability of additional misalignment and pinning. Misalignment can be checked and corrected during scrubbing without restarting the scrubbing process. If pinning occurs, it can be detected, but scrubbing must start again. Moreover, just as multiple pinning faults may occur during a single intrinsic shift, they may also accumulate during scrubbing, so single-error-correcting ECC may not be sufficient in a system with both scrubbing and misalignment protection. Our approach is inspired by DECC, which enhances correction by using parity bits to identify the nanowires exhibiting misalignment. Analogously, we leverage the locations of the nanowires where pinning has occurred, enabling SECDED with a single additional parity bit to detect and correct as many as three faults.</p></div>
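<div xmlns="http://www.tei-c.org/ns/1.0"><p>The traversal order and restart rule of directed scrubbing can be outlined as follows. This is a minimal control-flow sketch under simplifying assumptions. shift_and_check and scrub_row are hypothetical callbacks standing in for the hardware. The first shifts to a position, applies any corrective shifts for misalignment, and reports whether pinning was detected. The second reads, ECC-corrects, and rewrites one position. After a pinning event, the traversal simply restarts from the position where the pinning was detected.</p><eg><![CDATA[
```python
from typing import Callable, List


def scrub_order(current: int, lo: int, hi: int) -> List[int]:
    """Visit every data position once, starting at the closer extremity."""
    if current - lo <= hi - current:
        return list(range(lo, hi + 1))
    return list(range(hi, lo - 1, -1))


def directed_scrub(
    current: int,
    lo: int,
    hi: int,
    shift_and_check: Callable[[int], str],
    scrub_row: Callable[[int], None],
    max_restarts: int = 10,
) -> bool:
    """Return True once a full traversal completes without new pinning."""
    for _ in range(max_restarts + 1):
        completed = True
        for pos in scrub_order(current, lo, hi):
            # Misalignment is assumed to be fixed inside shift_and_check by
            # corrective shifts, so only pinning forces a restart.
            if shift_and_check(pos) == "pinning":
                current = pos
                completed = False
                break
            scrub_row(pos)
        if completed:
            return True
    return False
```
]]></eg></div>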
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.4">Three Pinning-fault Correction Guarantee</head><p>TAPs report the nanowires that have experienced pinning. However, even if a nanowire has a pinning fault, it may not produce an error during a read while scrubbing. If there are $x$ pinned nanowires, ECC must protect against $x$ errors in the worst case, but fewer than $x$ errors may also occur. SECDED ECC can correct one error when the location of the fault is unknown; however, if the locations of the potential errors are known from the TAPs, more errors can be corrected. We show a variety of error cases during the scrubbing process in Fig. <ref type="figure">15</ref>, where data bits are shown in blue, Hamming code parity bits in red, and the Double Error Detection (DED) bit in gray. Locations of pinned nanowires are illustrated by yellow boxes, and actual errors during this particular access are outlined in red. Because PIETT reports each possibly faulty position by identifying the pinned nanowires, SECDED can correct faults from two pinned nanowires as follows:</p><p>&#8226; ECC reports no faults, and no rewrite is necessary [Fig. <ref type="figure">15(a)</ref>].</p><p>&#8226; There is one pinned nanowire, $d_2$; ECC corrects one fault at position $d_2$, and the corrected bit may be directly rewritten [Fig. <ref type="figure">15(b)</ref>].</p><p>&#8226; There are two pinned nanowires, $d_2$ and $h_1$, and ECC detects two errors (the parity bit reports two errors, shown in green). The correction is made by flipping the two bits belonging to the pinned nanowires [Fig. <ref type="figure">15(c)</ref>].</p><p>This last case is the most interesting. The DED bit, essentially a parity bit over all of the other data and Hamming code parity bits, reports two errors, so the code cannot directly pinpoint which bits are wrong and instead uses the locations of the pinned nanowires to guide correction. With knowledge of the potential error locations, it is in fact possible to correct up to three errors.</p><p>In the previous examples, either the number of errors is known or there is only one error in the data. To correct three errors, we need to reduce the problem to one of these conditions. When the accessed location (cache line) has three simultaneous errors, the DED bit does not signal a double error as it would for two errors; as a consequence, SECDED ECC reports this case as a single error. When combined with the location information of the pinned nanowires, resolution of all three errors is possible.</p><p>In this example, there are three possibly faulty locations due to pinning: bits 2 and 4 of the data and bit 1 of the Hamming code. Thus, for each scrubbing access, the possibilities are as follows:</p><p>&#8226; ECC reports no errors, and no bits are rewritten [Fig. <ref type="figure">15(d)</ref>].</p><p>&#8226; ECC reports one error, and it points to a non-pinned nanowire [Fig. <ref type="figure">15(e)</ref>]. The presumption must be three errors, and all three of $d_2$, $d_4$, $h_1$ must be written.</p><p>&#8226; ECC reports one error, and it points to a pinned nanowire [Fig. <ref type="figure">15(f)</ref>]. The presumption is that $d_2$ must be written. However, if unlucky, there may still be three faults, so the value is updated with $d_2$ and a second ECC check is completed. If the DED bit now indicates two faults, then $d_2$, $d_4$, $h_1$ are written; otherwise, only $d_2$ is written.</p><p>This technique succeeds as long as the nanowire storing the DED bits is not pinned. We solve this by duplicating the DED bits, requiring one additional nanowire per DBC beyond SECDED ECC. Three-fault correction, including when the DED nanowire(s) are pinned, is guaranteed as follows:</p><p>&#8226; Cases with no pinning in either the DED ($p_0$) or DED$_1$ ($p_1$) nanowire, e.g., Fig. <ref type="figure">15</ref>(h), resolve to the cases in Fig. <ref type="figure">15(b)-(g</ref>).</p><p>&#8226; If the $p_0$ nanowire is pinned [Fig. <ref type="figure">15(i)</ref>], $p_0$ is immediately detected and corrected using $p_1$. If there is one other error, in either $d_4$ or $h_1$, it is corrected using the Hamming code. If SECDED with the corrected DED bit reports two errors, both $d_4$ and $h_1$ are written [Fig. <ref type="figure">15(i)</ref>]. This is similar to the simple case of SECDED ECC described in Fig. <ref type="figure">15(c)</ref>. The case where $p_1$ is pinned and $p_0$ is not follows similarly.</p><p>&#8226; If both $p_0$ and $p_1$ are pinned and $p_0 = p_1$ [Fig. <ref type="figure">15(j)</ref>], we cannot know whether the DED value is correct. If $p_0$/$p_1$ report the incorrect parity, we write $p_0$ and $p_1$.</p><p>&#8226; If both $p_0$ and $p_1$ are pinned and there is another pinned nanowire (e.g., $d_4$) [Fig. <ref type="figure">15(k)</ref>], we use the Hamming code to repair $d_4$, then determine the parity and, if necessary, repair the values of $p_0$ and $p_1$.</p></div>
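<div xmlns="http://www.tei-c.org/ns/1.0"><p>A toy code shows why known pinning locations allow SECDED to correct three errors. An extended Hamming (SECDED) code has minimum distance 4. Once the TAPs mark at most three positions as suspect, at most one codeword agrees with the received word outside those positions. The sketch below is a brute-force erasure decoder over an (8,4) extended Hamming code, with index 0 holding the DED bit. It tries every subset of the pinned positions and keeps the flip pattern that yields a valid codeword. This exhaustive search is a swapped-in technique that illustrates the guarantee; it is not the case-by-case procedure above. It also does not model the duplicated DED bit or the 64/73 block size used in the evaluation, and its bit layout and function names are assumptions.</p><eg><![CDATA[
```python
from itertools import combinations
from typing import List, Optional, Sequence


def encode(data: Sequence[int]) -> List[int]:
    """(8,4) extended Hamming code: index 0 is the overall (DED) parity bit,
    indices 1..7 are Hamming(7,4) with parity bits at positions 1, 2, 4."""
    word = [0] * 8
    for pos, bit in zip((3, 5, 6, 7), data):
        word[pos] = bit
    for p in (1, 2, 4):
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2
    return word


def is_codeword(word: Sequence[int]) -> bool:
    """Zero Hamming syndrome and even overall parity."""
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    return syndrome == 0 and sum(word) % 2 == 0


def correct_with_pinned(word: Sequence[int], pinned: Sequence[int]) -> Optional[List[int]]:
    """Flip a subset of the pinned positions so the word becomes a codeword.

    With minimum distance 4 and len(pinned) <= 3, the result is unique
    whenever every error lies on a pinned nanowire.
    """
    for r in range(len(pinned) + 1):
        for subset in combinations(pinned, r):
            trial = list(word)
            for i in subset:
                trial[i] ^= 1
            if is_codeword(trial):
                return trial
    return None


if __name__ == "__main__":
    sent = encode([1, 0, 1, 1])
    received = list(sent)
    for i in (5, 7, 1):  # toy analogue of three pinned nanowires read wrong
        received[i] ^= 1
    print(correct_with_pinned(received, [5, 7, 1]) == sent)
```
]]></eg></div>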
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6.5">Handling Bit Flips</head><p>As noted in prior work <ref type="bibr">[17]</ref>, bit-flip faults are possible in DWM due to communication faults over the memory bus when writing, or due to effects such as the read disturbance observed in DWM's spintronic cousin, STT-MRAM <ref type="bibr">[44]</ref>, <ref type="bibr">[45]</ref>. Using a philosophy similar to Section 6.4, we can still guarantee three-error correction if two of the errors come from pinning and one comes from a bit flip.</p><p>Suppose in Fig. <ref type="figure">15</ref> that the error at $d_2$ is a bit-flip fault, so its location is not known in advance. Like any single error, it can be directly corrected by ECC [Figs. <ref type="figure">15(b)</ref>, <ref type="figure">(f)</ref>]. However, if one error is reported, there could be three errors [Fig. <ref type="figure">15(e)</ref>], so we test again after ECC correction. ECC will then report two errors, because either one actual error was corrected or a new error was introduced. Either way, the parity will not match, signaling that three errors were originally present. ECC is therefore tested again with both pinned locations ($d_4$, $h_1$) corrected, and ECC now corrects the actual flip at $d_2$, so that ultimately $d_2$, $d_4$, $h_1$ are written. In the case of two errors [Fig. <ref type="figure">15</ref>(c),(g)], we flip one pinned location and retest. In case (c), ECC then finds the bit flip at $d_2$, and $h_1$ and $d_2$ are written. In case (g), testing with $d_4$ flipped reduces to case (b), and testing with $h_1$ flipped reduces to case (e), both of which are solved.</p><p>If there is a bit flip in a DED bit, as in Fig. <ref type="figure">15</ref>(i), because $p_0 = p_1$ and $p_1$ reports a parity error, the pinned locations are tested. If, when testing with $d_4$ flipped, ECC points to $h_1$, then $d_4$, $h_1$, and $p_0$ are written; otherwise, $p_1$ is written. The remaining DED cases [Figs.
<ref type="figure">15(j)</ref> and (k)] follow similarly to Section 6.4.</p><p>Thus, $\log_2(\text{data block size}) + 3$ additional nanowires per DBC enable repair of either up to three pinned nanowires or up to two pinned nanowires and one bit flip with scrubbing.</p></div>
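<div xmlns="http://www.tei-c.org/ns/1.0"><p>The nanowire count above can be cross-checked against the block sizes used in Section 7. The sketch below makes three assumptions. A standard Hamming code is used, whose $r$ parity bits satisfy $2^r \ge k + r + 1$ for $k$ data bits. The DED bit is then added to obtain SECDED. Finally, the duplicated DED bit is added for scrubbing. For power-of-two block sizes of at least 4 bits, this gives $\log_2 k + 3$ extra nanowires. The function names are illustrative.</p><eg><![CDATA[
```python
def hamming_parity_bits(k: int) -> int:
    """Smallest r with 2**r >= k + r + 1 (single-error-correcting Hamming code)."""
    r = 0
    while 2 ** r < k + r + 1:
        r += 1
    return r


def secded_check_bits(k: int) -> int:
    return hamming_parity_bits(k) + 1  # plus the DED (overall parity) bit


def piett_scrub_check_bits(k: int) -> int:
    return secded_check_bits(k) + 1  # plus the duplicated DED bit


if __name__ == "__main__":
    for k in (64, 512):
        print(f"{k}/{k + secded_check_bits(k)} SECDED, "
              f"{k}/{k + piett_scrub_check_bits(k)} with scrubbing")
```
]]></eg><p>For 64-bit blocks this yields 64/72 and 64/73, and for 512-bit lines 512/523 and 512/524, matching the configurations listed in Section 7.</p></div>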
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="7">EXPERIMENTAL SETUP</head><p>To evaluate the effectiveness of PIETT, we conducted experiments that study its reliability, area, energy consumption, and performance compared to related schemes. Our DWM memory architecture is based on FusedCache <ref type="bibr">[25]</ref>, which implements a combination of a set-associative L1 and Last-Level Cache (LLC) in DWM. The domains aligned with the access point belong to L1, and all other domains logically belong to the LLC. When L1 misses, shifting occurs in the DBC to access an LLC replacement. Otherwise, FusedCache has an organization similar to TapeCache <ref type="bibr">[11]</ref>. To evaluate the latency and energy of shifting, we used a modified version of NVSIM designed specifically to model DWM memory <ref type="bibr">[46]</ref>, <ref type="bibr">[47]</ref>, <ref type="bibr">[48]</ref>. The static energy impact of PIETT is modeled by including the additional access points for each nanowire, the additional nanowires that store the parity data for each DBC, and the STT-MRAM elements for DECC.</p><p>PIETT protects against up to three faults for misalignment alone, and up to three pinning faults when misalignment and pinning co-occur. Consequently, the size of the protected data block can have a significant impact on reliability. It is conventional to use 64/72 SECDED ECC for a cache line (or memory row) rather than 512/523, where the Length of a Cache Line (LCL) is 512 bits. With the extra parity bit required for scrubbing, these become 64/73 and 512/524, respectively. We present results for 64/72 and 64/73, as these best match the conventional block size.</p><p>To model misalignment and pinning faults during simulation, we consider that each DBC contains R racetracks with n data domains per racetrack, all of which shift simultaneously. We define the probability of misalignment after performing a single shift of distance d as $p_{a,d}$.
Similarly, we define the probability of pinning faults in one racetrack after performing a single shift of distance d as $p_{p,d}$. We use the misalignment and pinning values from Table <ref type="table">2</ref>. The pinning probability is obtained through the process discussed in Section 4, while the misalignment probability is obtained from the literature <ref type="bibr">[12]</ref> and corroborated with the process in Section 4.</p><p>Since fault probability is highly dependent on parameters such as domain size, process variation, and shift current, we also conduct a sensitivity study of $p_{p,d}$, ranging from the results in Table <ref type="table">2</ref> (circa $10^{-8}$) up to $10^{-4}$. Because misalignment and pinning are corrected orthogonally, we can treat $p_{a,d}$ and $p_{p,d}$ independently. Previous work already achieves sufficient misalignment-protection lifetimes under its treatment of $p_{a,d}$ <ref type="bibr">[12]</ref>, <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>. We therefore consider $p_{a,d}$ alone when evaluating PIETT with DECC in the context of energy and area improvements. Furthermore, given $p_{p,d}$, let $m$ be the number of racetracks (out of the R racetracks) that are pinned during an intrinsic shift of the DBC; we can then define the probability of having $m$ racetracks pinned. For PIETT with TAPs, we focus on $p_{p,d}$, as any number of misalignments can be detected and corrected unless they lead to excessive pinning while conducting corrective shifts.</p><p>The memory and fault models were integrated into, and simulated using, the Sniper multi-core simulator <ref type="bibr">[49]</ref>. We studied an architecture with an 8-way 4MB LLC and an 8-way 32KB L1 cache, presuming $n = 32$. Thus, each DBC is composed of 512 × 32 = 16,384 bits.
Access latencies are as follows: the data read latency is 0.98ns, the write latency is 0.65ns, the shift latency is 0.32ns, and the tag access latency is 0.28ns <ref type="bibr">[25]</ref>. The CPU has four out-of-order cores running at 3 GHz. All benchmarks used to profile performance are workloads from SPEC-CPU2006 <ref type="bibr">[50]</ref>.</p></div>
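<div xmlns="http://www.tei-c.org/ns/1.0"><p>The text defines the probability of having $m$ of the $R$ racetracks pinned during an intrinsic shift, but does not state its form here. If each racetrack is assumed to pin independently with probability $p_{p,d}$, a binomial distribution is one natural model. The sketch below uses it to estimate the per-shift probability that some protected block sees more than three pinned racetracks, which is the case PIETT cannot repair. The block width of 73 racetracks, the eight blocks per 512-bit row, and all function names are illustrative assumptions, not the paper's exact fault model. The tail is summed directly, and log1p/expm1 are used, because $1 - P(m \le 3)$ loses all precision in floating point at rates near $10^{-8}$.</p><eg><![CDATA[
```python
import math


def prob_pinned(m: int, racetracks: int, p: float) -> float:
    """Binomial probability that exactly m racetracks pin during one shift
    (assumes racetracks pin independently, each with probability p)."""
    return math.comb(racetracks, m) * p ** m * (1.0 - p) ** (racetracks - m)


def block_failure(width: int, p: float, correctable: int = 3) -> float:
    """Probability that more than `correctable` racetracks of one block pin."""
    return sum(prob_pinned(m, width, p) for m in range(correctable + 1, width + 1))


def row_failure(width: int, blocks: int, p: float, correctable: int = 3) -> float:
    """Probability that at least one of `blocks` independent blocks fails."""
    q = block_failure(width, p, correctable)
    return -math.expm1(blocks * math.log1p(-q))  # 1 - (1 - q) ** blocks, stably


if __name__ == "__main__":
    for p in (1e-8, 1e-6, 1e-5, 1e-4):
        print(f"p_pd = {p:.0e}: P(unrepairable row per shift) = {row_failure(73, 8, p):.2e}")
```
]]></eg></div>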
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8">RESULTS</head><p>Based on the experimental setup in Section 7, we evaluate the reliability of the PIETT approach and examine its impact on energy, performance, and area overheads. In the following sections, P-ECC-O refers to the version modified to also detect and correct pinning faults.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.1">Reliability</head><p>PIETT-DECC (DECC) exceeds our 10-year target, achieving a 15-year lifetime, which is of the same order of magnitude as the 69-year lifetime of SECDED Hi-fi. The tradeoff is that DECC guarantees correction of three misalignments by one position, with improved area and energy, whereas Hi-fi corrects all misalignments by one position. For correcting misalignment faults, PIETT-TAPs (PIETT) provides superior fault tolerance. It can natively correct any number of misalignments by at least four positions, making its lifetime essentially unbounded for the misalignment fault rates in Table <ref type="table">2</ref>. If the fault probability increases, Hi-fi and DECC lifetimes would decrease, while PIETT would remain essentially unaffected. Misalignment by two positions is reported at a $10^{-20}$ fault rate, and misalignment by more than two positions is unmeasurably low <ref type="bibr">[12]</ref>. Therefore, the uncorrectable misalignment fault rate of PIETT is better than that of Hi-fi with double error correction and triple error detection.</p><p>PIETT also detects and corrects faults in up to three pinned nanowires. In contrast, P-ECC-O is the only other approach capable (with modification) of detecting pinning faults. We calculated the Mean-Time-To-Failure (MTTF) for pinning incident fault rates ranging from $10^{-8}$, as obtained from our nanowire model (Table <ref type="table">2</ref>), up to $10^{-4}$ (the same order as misalignment fault rates). Without pinning protection, the system MTTF is between 2 s and 20 &#181;s for pinning fault rates of $10^{-8}$ and $10^{-4}$, respectively. Fig. <ref type="figure">16</ref> shows the MTTF with PIETT protection for 14 workloads over the same range of incident fault rates; the variance across workloads is related to the frequency of LLC accesses that induce shifts. At $10^{-4}$, a particularly high fault rate, PIETT improves MTTF by eight orders of magnitude to 115 days, but still falls short of the 10-year target.
Once the fault rate is $\le 10^{-5}$, PIETT improves the MTTF by 14 orders of magnitude to more than 385 years, well beyond a standard target of 10 years between failures. For a fault rate of $10^{-8}$, the same order as derived from our model, PIETT improves the MTTF by 21 orders of magnitude. In the following result sections, we consider a pinning probability range of $10^{-8}$ to $10^{-5}$ to meet the MTTF target.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.2">Area Comparison</head><p>A standard DWM nanowire consists of data domains, padding domains, and an access point. Any additional domains or access points for latency optimization or fault tolerance decrease the area efficiency of DWM. P-ECC-O adds four extra heads, two read-only and two read/write, to write its alternating pattern and verify its conformity. In comparison, DECC adds STT-MRAM storage. PIETT adds a fixed number of additional padding domains, logic to provide the transverse write and read capabilities, and extra nanowires to store the parity bits for scrubbing. These parity nanowires are also needed for the modified version of P-ECC-O.</p><p>Table <ref type="table">3</ref> provides the decomposition of the area (in units based on feature size) for the different correction schemes for a nanowire size of $n = 32$. The area is broken down into the base DWM area (domains plus heads), the area required to detect and correct misalignment faults, and the overhead to correct pinning faults, when possible. Furthermore, we show two overheads for P-ECC and P-ECC-O, for protection against a misalignment of one or two domains, respectively. DECC has the lowest area overhead of all schemes. PIETT has overhead comparable to P-ECC while also providing pinning protection. It also scales better to larger misalignment protection and requires 23% less area than pinning-modified P-ECC-O.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.3">Performance</head><p>DECC and P-ECC provide similar performance, as both schemes allow shifting to proceed to the final destination before misalignment detection and correction. PIETT's improved fault tolerance allows a multi-domain intrinsic shift but requires a check and write of the TAPs between shift operations. The access latency and system performance in Cycles Per Instruction (CPI), shown in Figs. <ref type="figure">17</ref> and <ref type="figure">18</ref>, respectively, are compared to a no-correction baseline. P-ECC has performance similar to DECC. PIETT and modified P-ECC-O are reported for the fault probabilities from Table <ref type="table">2</ref>, with error bars extending to a pinning probability of $10^{-5}$. On average, PIETT incurs a significant latency increase of 1.9&#215; and 2&#215; at these pinning probabilities, respectively, due to the shift-and-check nature of TAPs. Fortunately, because this impacts only LLC accesses, the resulting CPI degradation for the same incident fault rates is only 1% and 2%, respectively.</p><p>In comparison, modified P-ECC-O, the only other approach that detects pinning, has a latency increase of 5.0&#215; and 5.4&#215;, with a more substantial 7% and 9% CPI degradation, respectively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.4">Energy Comparison</head><p>Fig. <ref type="figure">19</ref> shows the energy improvement of DECC over Hi-fi. For misalignment-only fault protection, DECC reduces energy by an average of 52% relative to P-ECC and 75% relative to P-ECC-O. Fig. <ref type="figure">20</ref> shows the energy overhead of PIETT compared to P-ECC-O, P-ECC, and DECC for the fault probabilities in Table <ref type="table">2</ref>, with an error bar that increases the pinning fault probability to 10<hi rend="superscript">-5</hi>. PIETT is considerably more energy efficient than P-ECC-O, requiring one third of its energy, and reduces energy by more than 35% compared to P-ECC. It does increase energy by about 20% over DECC, but neither P-ECC nor DECC can correct pinning faults; we discuss this comparison further in Section 8.6.</p><p>From these results, we observe a "fixed" energy overhead (similar to the latency overhead) due to the additional operations that prepare and check the TAPs amid shifting, and due to the additional parity tapes, which shift and consume energy in the DBC but are necessary when scrubbing is required. There is also a variable cost from scrubbing the system.</p></div>
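<div xmlns="http://www.tei-c.org/ns/1.0"><p>The fixed-plus-variable structure described above can be written as a simple additive model. This is a sketch under our own assumptions: the fixed overhead is modeled as a constant fraction of baseline energy (covering TAP preparation and checks and the parity tapes), and scrubbing adds a constant energy per scrub pass. The function names and all parameter values are hypothetical, not measurements from Fig. 20.</p>
<p>
```python
def estimated_energy(
    baseline_energy: float,
    fixed_overhead_fraction: float,
    scrub_passes: int,
    energy_per_scrub: float,
) -> float:
    """Hypothetical model: a fixed overhead proportional to the baseline
    (TAP prepare/check operations and parity tapes) plus a scrubbing term."""
    fixed = baseline_energy * (1.0 + fixed_overhead_fraction)
    variable = scrub_passes * energy_per_scrub
    return fixed + variable


def relative_change(new: float, old: float) -> float:
    """Signed fractional change of `new` relative to `old` (negative means savings)."""
    return new / old - 1.0


# Illustrative values only.
no_scrub = estimated_energy(100.0, 0.20, scrub_passes=0, energy_per_scrub=2.0)
with_scrub = estimated_energy(100.0, 0.20, scrub_passes=5, energy_per_scrub=2.0)
print(f"fixed-only: {no_scrub:.1f}, with scrubbing: {with_scrub:.1f}")
print(f"change vs. a 150-unit reference: {relative_change(with_scrub, 150.0):+.0%}")
```
</p>
<p>Separating the two terms distinguishes the overhead that is always paid from the part that grows only when scrubbing is actually performed.</p></div>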
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.5">Bit Flips</head><p>In prior work <ref type="bibr">[16]</ref>, <ref type="bibr">[17]</ref>, bit flips could be misconstrued as misalignment faults. Bit flips could also be problematic for DECC (corrupting the signature or encoding bits) unless protected in some other fashion. Prior work has explored how these bit-flip tradeoffs can be considered together with correction in STT-MRAM <ref type="bibr">[44]</ref>.</p><p>Due to the TAP concept, bit flips cannot be interpreted as shifting faults in PIETT. Fig. <ref type="figure">21</ref> shows the impact on the MTTF of PIETT using the shifting and pinning probabilities from Table <ref type="table">2</ref>, with the same range of bit-flip probabilities [10<hi rend="superscript">-9</hi>, 10<hi rend="superscript">-6</hi>] studied in prior work <ref type="bibr">[17]</ref>. PIETT still exceeds the 10-year target by several orders of magnitude.</p></div>
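<div xmlns="http://www.tei-c.org/ns/1.0"><p>MTTF curves such as Fig. 21 are typically derived from a per-word failure probability. The sketch below shows one standard way to compute it, under assumptions that are ours rather than the paper's model: bit flips are independent with probability p per bit per access, a word of n bits fails only when more than t bits flip (t=1 for SECDED-style correction), and accesses arrive at a constant rate. The function names and the 72-bit word size are hypothetical.</p>
<p>
```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def word_failure_probability(n_bits: int, correctable: int, p_flip: float) -> float:
    """P(more than `correctable` of `n_bits` independent bits flip).

    Sums the binomial upper tail directly instead of computing
    1 - P(success), which avoids cancellation error at very small p_flip.
    """
    return sum(
        math.comb(n_bits, k) * p_flip**k * (1.0 - p_flip) ** (n_bits - k)
        for k in range(correctable + 1, n_bits + 1)
    )


def mttf_years(p_fail_per_access: float, accesses_per_second: float) -> float:
    """Mean time to the first uncorrectable access, assuming a constant access rate."""
    failures_per_second = p_fail_per_access * accesses_per_second
    return 1.0 / failures_per_second / SECONDS_PER_YEAR


# Hypothetical 72-bit word at the top of the studied bit-flip range.
p = 1e-6
no_ecc = word_failure_probability(72, 0, p)
secded = word_failure_probability(72, 1, p)
print(f"single-error correction lowers P(word fail) by ~{no_ecc / secded:.0f}x")
```
</p>
<p>Summing the upper tail directly keeps the result accurate at bit-flip probabilities as small as 10<hi rend="superscript">-9</hi>, where subtracting from one would lose most of the precision of double-precision arithmetic.</p></div>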
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="8.6">Discussion</head><p>PIETT provides two methods for misalignment protection: DECC and PIETT with TAPs. If pinning faults are inconsequential and bit flips can be managed, as assumed in prior work <ref type="bibr">[12]</ref>, then DECC provides a reasonable 15-year misalignment guarantee with dramatic savings in energy and area. If pinning is significant, then PIETT with TAPs provides significant protection against misalignment, pinning, and bit-flip faults while keeping performance overhead to about 1%, with dramatically improved energy and an area overhead similar to that of prior work that cannot mitigate pinning or bit flips. Compared to P-ECC-O modified to address pinning, PIETT is considerably better in area, energy, and performance.</p></div>
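<div xmlns="http://www.tei-c.org/ns/1.0"><p>Knowing which nanowires are pinned helps because of the standard errors-and-erasures property of block codes: a code with minimum distance d can correct e erasures (faults at known positions) plus t unknown errors whenever 2t + e is less than d. The sketch below demonstrates this with a plain [8,4,4] extended Hamming (SECDED) code and brute-force nearest-codeword decoding that ignores erased positions. It illustrates the principle only and is not PIETT's modified SECDED, which adds one more parity bit stored in parity nanowires; treating pinned nanowires as erasures is our simplifying assumption, and the function names are hypothetical.</p>
<p>
```python
from itertools import product


def encode(data: tuple[int, int, int, int]) -> tuple[int, ...]:
    """[8,4,4] extended Hamming (SECDED) encoder: Hamming(7,4) plus overall parity."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    word = [p1, p2, d1, p3, d2, d3, d4]
    return tuple(word + [sum(word) % 2])


# All 16 codewords, mapped back to their data bits.
CODEBOOK = {encode(d): d for d in product((0, 1), repeat=4)}


def decode(received: tuple[int, ...], erased: set[int]):
    """Errors-and-erasures decoding by brute force.

    Positions in `erased` (e.g., bits read from nanowires known to be pinned)
    are ignored; among the remaining positions, pick the codeword with the
    fewest mismatches. Returns the data bits, or None if the closest codeword
    is not unique (too many faults to correct).
    """
    known = [i for i in range(len(received)) if i not in erased]
    ranked = sorted(
        (sum(cw[i] != received[i] for i in known), data)
        for cw, data in CODEBOOK.items()
    )
    (best_dist, best_data), (next_dist, _) = ranked[0], ranked[1]
    return None if best_dist == next_dist else best_data


def flip(word: tuple[int, ...], positions: set[int]) -> tuple[int, ...]:
    """Invert the bits at `positions` to inject faults."""
    return tuple(b ^ 1 if i in positions else b for i, b in enumerate(word))


data = (1, 0, 1, 1)
codeword = encode(data)
# Three faults at known (erased) positions: corrected.
print(decode(flip(codeword, {0, 3, 5}), erased={0, 3, 5}))
# One erasure plus one unknown bit flip: corrected.
print(decode(flip(codeword, {2, 6}), erased={2}))
# Two unknown bit flips: SECDED can only detect, so decoding is ambiguous.
print(decode(flip(codeword, {2, 6}), erased=set()))
```
</p>
<p>The last two cases show why location knowledge matters: the same pair of faults that defeats plain SECDED decoding becomes correctable once one of them is at a known (pinned) position.</p></div>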
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="9">CONCLUSION</head><p>Manufacturing scaled DWMs will introduce more variation and more defects, leading to a higher probability of shifting faults. For DWMs to gain traction in real systems, these faults must be addressed efficiently. We propose PIETT, which addresses misalignment and pinning faults as well as bit flips in random-access DWMs. In its highest-performance, lowest-energy mode, DECC, PIETT provides a 15-year reliability guarantee for misalignment-only faults with &gt;50% energy improvement and reduced area compared to the state of the art. Pinning fault tolerance is more complex than misalignment fault tolerance because pinning is difficult to detect and harder to correct. PIETT with TAPs is a fault-tolerance solution that detects both misalignment and pinning through novel transverse access points placed at the two nanowire extremities, and it uses corrective shifts to repair misalignment. PIETT with TAPs leverages knowledge of the location of pinned nanowires to extend SECDED ECC so that it can repair errors in up to three pinned nanowires, or in two pinned nanowires with no more than one bit flip, per data element. Our demonstrated 10<hi rend="superscript">-8</hi> pinning fault rate indicates that, without pinning protection, DWM devices fail within seconds. In contrast, PIETT provides effective fault tolerance for pinning fault rates as high as 10<hi rend="superscript">-5</hi>, with an MTTF of nearly 400 years. For our modeled fault probabilities (see Table <ref type="table">2</ref>), we can guarantee a lifetime of over 10<hi rend="superscript">11</hi> years against pinning faults, along with superior protection against misalignment, comparable performance, and a 35% energy reduction compared to Hi-fi. 
Important future directions include creating a parameterized fault model for misalignment and pinning of DWM nanowires under different technology nodes, amounts of variation, and material parameters to further guide the design of fault-tolerant DWMs. Scaling multi-domain access to more domains and using MDR and/or TAPs for capabilities beyond fault tolerance are also important future directions.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" xml:id="foot_0"><p>This article has been accepted for publication in IEEE Transactions on Computers. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/TC.2022.3188206 &#169; 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information. Authorized licensed use limited to: University of South Florida. Downloaded on July 31,2022 at 19:35:23 UTC from IEEE Xplore. Restrictions apply.</p></note>
		</body>
		</text>
</TEI>
