

# Magnetic Tunnel Junction Reliability Assessment Under Process Variations and Activity Factors and Mitigation Techniques

Anirudh S. Iyengar<sup>1,\*</sup>, Swaroop Ghosh<sup>1</sup>, and Nitin Rathi<sup>2</sup>

<sup>1</sup>School of Electrical Engineering and Computer Science, State College, PA, 16828, USA

<sup>2</sup>Department of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, 47907, USA

(Received: 14 January 2018; Accepted: 12 February 2018)

Time Dependent Dielectric Breakdown (TDDB) is a major concern for the reliability of magnetic tunnel junctions (MTJs) in Spin-Transfer Torque RAM (STTRAM). We provide a breakdown model to capture the MTJ lifetime under dynamic read/write activity factors. We also propose static and dynamic current throttling, coupled with circuit and system level techniques, to improve the MTJ reliability at the cost of a small performance penalty. Simulations using SPLASH benchmarks demonstrate a $\sim 10^{10} \times$ improvement in MTJ reliability of the most frequently accessed cache sets for a minor IPC loss of $\sim 1.3\%$ for dynamic throttling and $\sim 2\%$ for static throttling under process variations.

**Keywords:** Magnetic Tunnel Junctions, Time Dependent Barrier Breakdown, Lifetime, Mean Time to Failure, Activity Factors, Process Variations.


Copyright: American Scientific Publishers


## 1. INTRODUCTION

Spin-Transfer Torque RAM (STT-RAM),<sup>1</sup> which uses the Magnetic Tunnel Junction (MTJ) as its storage element, is a promising memory technology due to its non-volatility, high speed of operation, very low power consumption, large endurance, high density, and CMOS compatibility. The MTJ employs a thin tunnel oxide barrier (typically MgO) sandwiched between fixed and free magnetic layers. The thickness of the MgO is on the order of $\sim 1$ nm. With technology scaling, the MTJ dimensions shrink without compromising its resistive properties. As the thin oxide experiences a high electric field due to its extremely small thickness ($<1$ nm), the breakdown of the oxide determines the MTJ reliability.<sup>2,3</sup> The high write current accentuates this issue further, as a higher current creates a larger voltage drop across the oxide. Larger memory size and process variation add another dimension to this problem, since one bit out of several MB can experience a very high voltage drop. One such case occurs when the access transistor has an extremely low $V_{th}$ and almost the entire voltage drop appears across the MTJ.

The lifetime of MTJs is usually measured with respect to the Time Dependent Dielectric Breakdown (TDDB) mechanism. The usual experimental technique for estimating TDDB is to subject the MTJ to a constant high voltage or current stress and measure the time to dielectric breakdown. However, under normal operation, the MTJs experience multiple writes and reads with different activity factors (AFs). Furthermore, the operating voltages are different during read and write. A DC model fails to capture these detailed circuit behaviors, resulting in a pessimistic estimate of reliability.

Research has been done to describe and model the effects of oxide barrier breakdown.<sup>3–5</sup> In Ref. [4], the lifetime estimation is formulated using the percolation model,<sup>6</sup> which captures the statistical probability (and time) of the tunneling electrons creating a closed path between the two magnetic layers. Although the model captures the barrier degradation in detail, it does not describe the degradation under normal operation (multiple reads and writes) and varying AF. In Ref. [5], a new model is presented that incorporates the concept of charge trapping due to the tunneling current, which stresses the barrier layer and leads to eventual breakdown. This model takes into account the variation of MTJ reliability under different pulsed regimes. Several studies have been conducted on the reliability and lifetime calculation of the MTJ.<sup>4,7,8</sup> However, lifetime estimation using read/write frequency and AF has not been considered.

The lifetime of non-volatile memories (NVM) such as phase change memory and resistive RAM has been studied extensively and various architectural techniques have been proposed. Broadly, there are two categories: reducing the number of writes and distributing the writes evenly.<sup>9–12</sup> In Ref. [9], a cache replacement mechanism is proposed that reduces the number of writes by giving preference to unmodified entries when choosing a victim for replacement. However, this technique has a negative impact on miss rate and also degrades the latency and energy depending on the preference value. In Ref. [10], a novel algorithm is proposed for achieving a balanced write distribution in NVM by selectively writing to SRAM and NVM. However, this technique does not leverage the properties of NVM to their fullest. In Ref. [12], an integrated wear leveling mechanism with two techniques, namely *row-shifting* and *segment-swapping*, is proposed. This technique, however, suffers from the high overhead of a hardware address mapping table. The above techniques fail to consider the AF and the read/write frequency.

\*Author to whom correspondence should be addressed.  
Email: asi7@psu.edu

We observe that write current is strongly correlated to the MTJ write latency and lifetime. Figure 1 shows the dependence of write latency on write current. Note that the latency increases as the current is lowered, which can result in performance degradation. However, the reliability can improve drastically due to the lower voltage across the MTJ. Figure 1 shows that a 50% reduction in write current can improve the MTJ time to failure (TTF) by $\sim 10^5 \times$ at the cost of 2$\times$ higher latency. We also note that read is a more frequent operation than write. Therefore, considering only write operations for lifetime estimation can lead to a severe underestimation of MTJ lifetime. Furthermore, the write data polarity may also impact the lifetime since the MTJ experiences different voltages in the high and low resistance states.

Based on the above observations we propose a framework for reliability assessment and mitigation (Fig. 2). First, we find the write latencies for different write currents. An LLG (Landau-Lifshitz-Gilbert)<sup>13–14</sup> equation based model<sup>15</sup> is used for accurate circuit simulations. The latency obtained in this step is used in the gem5<sup>21</sup> simulator to determine the read and write profile of the benchmarks. The read/write profiles show that reads and writes are not evenly distributed: a few bits experience a substantially higher number of read/write operations than others. These bits are the potential candidates for breakdown and determine the lifetime of the NVM. The read/write profile of these bits is studied to determine the activity factor, which is then applied to the lifetime model to estimate the mean time to failure. Two techniques are employed to mitigate the degradation: (a) static write current throttling, where the write current is lowered to improve the lifetime at the cost of performance overhead; and (b) dynamic write current throttling, where the write current is lowered only for those bits that are frequently accessed and have low write latency due to process variation. By combining process variation with dynamic current throttling, the performance degradation due to the lower write current is balanced by the low write latency of these bits. The reliability is re-estimated after this step.

Fig. 1. Write latency with respect to write current. The time to failure is also plotted.

Fig. 2. Overview of the proposed reliability analysis and mitigation.

To the best of our knowledge, this is the first attempt to study the effect of the number, AF and polarity of read/write operations on the reliability of the MTJ. We also propose static and dynamic current throttling coupled with process variation for reliability improvement. In particular, we make the following contributions in this paper:

- We provide a time dependent dielectric breakdown model that considers the number of read/write operations, the polarity of the data and the activity factor to calculate MTJ lifetime.
- Our analysis indicates that ignoring read operations and the AF of read and write operations can result in significant under-estimation of MTJ reliability.
- We perform system level analysis and propose static write current throttling to trade off MTJ lifetime against performance.
- We also propose a system level technique to throttle the write current dynamically for frequently accessed bits having low write latency due to process variation. This improves lifetime with minimal performance overhead.

The rest of the paper is organized as follows. In Section 2, we describe the basics of the MTJ and provide the lifetime model. Static and dynamic write current throttling are discussed in Section 3. The simulation framework and results are presented in Section 4. The lifetime dependency on process variation and data polarity is presented in Section 5. Conclusions are drawn in Section 6.

## 2. BASICS OF MTJ AND MODELING

### 2.1. Brief Introduction of MTJ

Figure 3 illustrates an MTJ that is composed of a free magnetic layer and a pinned magnetic layer separated by a thin magnesium oxide (MgO) barrier. The resistance of the MTJ is low (high) if the free layer magnetic orientation is parallel (anti-parallel) with respect to the pinned layer. The orientation of the free layer can be changed from parallel to anti-parallel (or vice versa) by injecting current from bottom-to-top (or top-to-bottom). Read and write operations create a voltage drop across the MTJ. The thin oxide barrier experiences a high electric field, which degrades the reliability of the device. In this context it should be noted that: (a) the voltage across the MTJ is a function of the operation (read or write) and of the MTJ resistance; (b) the write operation is bipolar in nature, so the voltage across the MTJ is positive in one direction and negative in the other; furthermore, the voltage drop is higher during the high-to-low transition than during the low-to-high transition; (c) the read operation is unipolar, so the voltage drop is positive and a function of the MTJ resistance only; and (d) the current needed during a write operation is typically higher than during a read operation, so the voltage drop across the MTJ is higher for write than for read. Consideration of the above factors is important for the reliability assessment of the MTJ under dynamic workload.



Fig. 3. MTJ schematic with the free layer, oxide barrier and pinned layer. The voltages during read/write operations are also shown.

### 2.2. Reliability Model

The E-model of dielectric breakdown<sup>16</sup> is popular for the estimation of MTJ intrinsic barrier breakdown. The applied voltage induces an electric field which causes an overall reduction of the activation energy $E_A$, leading to an increased breakdown probability. A detailed analysis is presented in Ref. [17] to establish that the observed intrinsic dielectric breakdown of the oxide barrier follows the E-model. Other results<sup>18</sup> are also in close agreement with this observation. According to the E-model the breakdown probability is defined as

$$p(t) = \frac{dF(t)/dt}{1 - F(t)} \quad (1)$$

where $F(t)$ denotes the fraction of devices that break intrinsically after a time $t$ and $p(t)$ is defined as

$$p(t) = A \exp\left(\frac{V(t)}{B}\right) \quad (2)$$

where $V(t) = E(t)\, t_{ox}$ is the time-dependent oxide voltage, with $E(t)$ being the electric field and $t_{ox}$ the barrier thickness. If no explicit time dependence is assumed for $p(t)$, then $A \propto A_j \exp(-E_A/kT)$, where $A_j$ is the junction area and $E_A$ is the activation energy for the dielectric breakdown, and $B \propto kT t_{ox}/(a|qZ|)$, where $a \approx 2$ Å is the atomic spacing of MgO<sup>19</sup> and $Z = 2$ for Mg<sup>2+</sup> ions. In the case when $dV/dt$ is a constant, the intrinsic failure fraction $F(t)$ can be given as

$$F(t) = 1 - \exp\left[-p(t)B\left(\frac{dV}{dt}\right)^{-1} + AB\left(\frac{dV}{dt}\right)^{-1}\right] \quad (3)$$

The maximum failure rate, which is the peak of  $dF(t)/dt$ , is found at

$$V_{MAX} = B \ln\left(\frac{dV}{dt} \cdot \frac{1}{AB}\right) \quad (4)$$

where $V_{MAX}$ is the breakdown voltage observed at a large voltage ramp speed $dV/dt$. For a time independent breakdown probability density $p(t)$, the median lifetime is expressed as

$$\tau_{1/2} = \frac{\ln(2)}{p(t)} \quad (5)$$

where $\tau_{1/2}$ is the time at which 50% of the devices will have experienced breakdown. By curve fitting the experimental data from Ref. [19], we derived the values of the constants $A$ and $B$ to be $7 \times 10^8$ and 0.27 respectively, assuming a constant rate of degradation due to both reads and writes. Figure 4 shows the variation of $\tau_{1/2}$ (in s) as a function of applied bias voltage for four different barrier thicknesses (0.85 nm–1.15 nm) and a junction area of $40 \times 40 \text{ nm}^2$. Under typical operating conditions (0.4 V), the lifetime of the MTJ is estimated to be $\sim 10^{8.5} \text{ s}$ or $\sim 10$ years (for $t_{ox} = 0.85 \text{ nm}$), which agrees with the values presented in Ref. [20].
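The step from (1) and (2) to (3) and (5) is compact, so a short derivation sketch may help. It assumes the voltage ramp starts from $V(0) = 0$, which is not stated explicitly in the text. Integrating (1) and changing the integration variable from $t'$ to $V'$ with a constant $dV/dt$ gives

$$-\ln\bigl[1 - F(t)\bigr] = \int_0^t p(t')\,dt' = \left(\frac{dV}{dt}\right)^{-1} \int_0^{V(t)} A \exp\left(\frac{V'}{B}\right) dV' = \bigl[p(t) - A\bigr] B \left(\frac{dV}{dt}\right)^{-1}$$

which, after exponentiating, is the form in (3). For a constant voltage, $p$ is constant and the same integral reduces to $F(t) = 1 - \exp(-pt)$; setting $F(\tau_{1/2}) = 1/2$ then yields $\tau_{1/2} = \ln(2)/p$, i.e., (5).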



Fig. 4. Estimated lifetime using E-model for varying bias voltage for different values of barrier thickness and junction area of  $40 \times 40 \text{ nm}^2$ .
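As an illustration of how (2) and (5) translate a bias voltage into a lifetime, the following minimal Python sketch evaluates $\tau_{1/2}$ for a few voltages. The function names and the numeric values `A_DEMO` and `B_DEMO` are hypothetical placeholders chosen only to exercise the formulas; they are not the fitted constants of this work.

```python
import math


def breakdown_rate(v, a, b):
    """Eq. (2): breakdown probability density p = A * exp(V / B)."""
    return a * math.exp(v / b)


def tau_half(v, a, b):
    """Eq. (5): time at which 50% of the devices have broken down."""
    return math.log(2) / breakdown_rate(v, a, b)


def lifetime_gain(v_from, v_to, b):
    """Ratio tau_half(v_to) / tau_half(v_from); the prefactor A cancels."""
    return math.exp((v_from - v_to) / b)


# Hypothetical placeholder parameters (NOT the fitted values of the paper).
A_DEMO = 1e-9   # 1/s, assumed for illustration
B_DEMO = 0.05   # V, assumed for illustration

for v in (0.2, 0.4, 0.6):
    print(f"V = {v:.1f} V -> tau_1/2 = {tau_half(v, A_DEMO, B_DEMO):.3e} s")

print(f"gain from 0.6 V to 0.4 V: {lifetime_gain(0.6, 0.4, B_DEMO):.3e}x")
```

Because the dependence is exponential, lowering the bias by $B \ln(10)$ multiplies the lifetime by 10 regardless of $A$, which is why modest voltage reductions can yield lifetime gains of many orders of magnitude.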

### 2.3. MTJ Lifetime and Mean Time to Failure

The E-model provides the time to breakdown assuming constant voltage across the MTJ for the entire lifetime. In reality, the voltage depends on various factors as described in Section 2.1 as well as on the dynamic workload that is a mix of read and write operations. In this work we consider four major factors and include them in the proposed model for accurate lifetime estimates.

#### 2.3.1. Consideration for Process Variations

The process variation induced change in threshold voltage in the read and write circuitry severely alters the device lifetime as well. We simulate the MTJ voltage for the typical (TT), fast (FF) and slow (SS) corners of the access transistor using a 22 nm predictive model.<sup>24</sup> For SS (FF) we increase (decrease) the transistor threshold voltage by 250 mV. The write and read voltages for the three corners (SS, TT, FF) are (158 mV, 337 mV, 520 mV) and (45 mV, 192 mV, 267 mV) respectively. The voltage across the MTJ increases in the FF corner due to the smaller drop across the access transistor. Figure 5(a) shows the corresponding difference between the times to failure for both read and write voltages for the three process corners. Both the write time to failure (WTTF) and the read time to failure (RTTF) reduce at faster process corners. Therefore, consideration of process variations is important for the lifetime estimate.

#### 2.3.2. Consideration for Data Polarity

The lifetime of the MTJ depends on the value of the data being written. Writing a ‘1’ degrades the MTJ more than writing a ‘0’. The reason is that when a ‘1’ is written the free layer and pinned layer are anti-parallel, so the resistance of the MTJ is high, which causes more stress across the MTJ. Similarly, reading ‘1’ and reading ‘0’ degrade the MTJ to different extents for the same reason. The corresponding difference is shown in Figure 6 for all three process corners (FF, TT, SS). The MTJ experiences the worst reliability when a ‘1’ is written, whereas reading a ‘0’ causes the least degradation.

Fig. 5. (a) MTTF for purely read/write conditions for three different process corners (SS, TT and FF), (b) MTTF trend with respect to write to read AF ratio, and (c) MTJ life calculation for four different factors of read operations with respect to write.



Fig. 6. Lifetime comparison of MTJ for different data polarities under process variations.

#### 2.3.3. Consideration for Activity Factor

Due to the dynamic workload, the MTJ may not experience constant read and write operations; only a certain number of read and write operations are performed during the workload. The read (write) AF is defined as the ratio of the aggregate number of cycles spent in read (write) operations to the total number of cycles in the workload. Figure 5(b) shows the variation of mean time to failure (MTTF) for varying write to read ratios (write/read AF). We note that the reads tend to dominate the device lifetime only up to a certain point, beyond which the degradation due to writes takes over.

#### 2.3.4. Consideration for Read Operations

Typically, the MTJ lifetime is calculated for write operations only. The rationale is that writes require a higher current (and a correspondingly higher voltage drop across the MTJ) than reads, and the higher voltage tends to degrade the MTJ much faster. Figure 5(c) shows the MTJ lifetime for different fractions of read and write operations. The number of reads in the four scenarios is 0, $10^7 \times$, $10^8 \times$ and $10^9 \times$ the number of writes, respectively. It can be noted that even though degradation due to writes is low, as the number of reads increases, the overall device degradation increases faster than in the write-only case. Therefore, ignoring read operations could result in under-estimation of MTJ reliability for large memory accesses.

To obtain the MTJ lifetime, we consider both read and write operations. For simplicity we assume that all writes and all reads are back-to-back and lumped together. The total numbers of writes and reads that the device can sustain, $W_{\text{tot}}$ and $R_{\text{tot}}$, are then calculated as WTTF/write latency and RTTF/read latency, respectively. Each write (read) consumes $1/W_{\text{tot}} \times 100\%$ ($1/R_{\text{tot}} \times 100\%$) of the device life. Therefore, for a given number of reads and writes, the remaining device life is given by

$$\text{Life (\%)} = \left[ 1 - \left( \frac{\# \text{ writes}}{W_{\text{tot}}} + \frac{\# \text{ reads}}{R_{\text{tot}}} \right) \right] \times 100 \quad (6)$$

In the case of systems that are operating under a certain AF the failure rate (FR) is expressed as

$$FR = \frac{AF_W}{WTTF} + \frac{AF_R}{RTTF} \quad (7)$$

where $AF_W$ and $AF_R$ are the activity factors for write and read operations respectively. The MTTF is the inverse of the failure rate and is used as a reliability measure to estimate the device lifetime. The MTTF is given by

$$MTTF = \frac{1}{FR} \quad (8)$$

Let us consider an example case where the reads and writes are evenly distributed ($1 \times 10^{10}$ each) for a device in the FF corner. The corresponding remaining device life from (6) is $\approx 99.996\%$. Assuming the same system is always ON and operates with an activity factor of 0.5 for both reads and writes, the failure rate is $\approx 1.67 \times 10^{-7}$ s$^{-1}$ (using (7)). The MTTF is therefore $\approx 69.3$ days (from (8)). This large variation of MTTF is due to the exponential relationship between $\tau_{1/2}$ and bias voltage. The above model is used for the reliability analysis of STTRAM arrays.
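The bookkeeping in (6)–(8) can be sketched in a few lines of Python. This is only an illustration: the function names are ours, (6) is read as one minus the sum of the write and read wear fractions, and `WTTF_DEMO`, `RTTF_DEMO` and the latencies passed to `ops_to_failure` are hypothetical inputs rather than values taken from the simulations. The last lines check the arithmetic of the worked example, where a failure rate of $1.67 \times 10^{-7}$ s$^{-1}$ corresponds to about 69.3 days.

```python
SECONDS_PER_DAY = 86_400.0


def ops_to_failure(ttf_s, latency_s):
    """W_tot or R_tot: back-to-back operations that fit in the time to failure."""
    return ttf_s / latency_s


def remaining_life_percent(n_writes, n_reads, w_tot, r_tot):
    """Eq. (6), read as 100% minus the write and read wear fractions."""
    return (1.0 - (n_writes / w_tot + n_reads / r_tot)) * 100.0


def failure_rate(af_w, af_r, wttf_s, rttf_s):
    """Eq. (7): activity-factor-weighted failure rate in 1/s."""
    return af_w / wttf_s + af_r / rttf_s


def mttf(af_w, af_r, wttf_s, rttf_s):
    """Eq. (8): MTTF in seconds is the inverse of the failure rate."""
    return 1.0 / failure_rate(af_w, af_r, wttf_s, rttf_s)


# Hypothetical times to failure, for illustration only.
WTTF_DEMO = 4.0e6   # s
RTTF_DEMO = 1.0e7   # s

w_tot = ops_to_failure(WTTF_DEMO, 3.362e-9)   # assumed write latency
r_tot = ops_to_failure(RTTF_DEMO, 2.702e-9)   # assumed read latency
print(f"remaining life: {remaining_life_percent(1e10, 1e10, w_tot, r_tot):.4f} %")
print(f"MTTF: {mttf(0.5, 0.5, WTTF_DEMO, RTTF_DEMO) / SECONDS_PER_DAY:.1f} days")

# Arithmetic check of the worked example: FR = 1.67e-7 1/s.
example_mttf_days = 1.0 / 1.67e-7 / SECONDS_PER_DAY
print(f"example MTTF: {example_mttf_days:.1f} days")  # about 69.3 days
```

Keeping the write and read terms separate makes it easy to see which operation dominates the failure rate for a given pair of activity factors.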

## 3. RELIABILITY ENHANCEMENT TECHNIQUES

### 3.1. Static Throttling

In this technique we throttle the write current to exploit the exponential relationship between lifetime and write current. Note that this technique incurs an IPC (Instructions per Cycle) loss since the write operations are slowed down to preserve functional correctness. We use a current mirror based write driver to throttle the write current of the column if needed (Fig. 7). A reference write current $I_{\text{ref}}$ (WR) is mirrored on the leg that drives the bit line (BL) or source line (SL). The direction of current flow is controlled by the polarity of the data to be written ($D_{\text{in}}$). The BL (SL) is connected to the current source (VSS) if the data to be written is 1 (0). The sizing of PMOS P1 is ratioed with respect to the reference leg to generate the required write current. We add an extra PMOS transistor P2 with size $1/k$ so that the write current is throttled when the throttle signal is asserted (i.e., $thr = 1$). For nominal conditions P2 is disabled by connecting its gate to $V_{DD}$. The proposed driver needs 4 transistors for the multiplexers and an extra PMOS to generate the throttled current. Since gate leakage is negligible and $thr$ is a DC signal, the multiplexers can be designed using minimum sized transistors. Therefore, the area overhead of the proposed driver can be kept below 1%.

### 3.2. Dynamic Throttling

Static throttling is simple to implement but it incurs a performance penalty due to the higher write latency. We observe that typical workloads tend to access some sets more frequently than others. The numbers of read and write accesses for different sets in the SPLASH benchmarks are shown in Figures 8(a)–(b). It is evident that there is a cliff between different sets in the benchmarks for both read and write operations. This results in uneven device wear-out.

Fig. 7. Current throttling circuitry used in this work.



Fig. 8. Distribution of number of (a) writes, and (b) reads for different sets for SPLASH benchmark suite.

Table I. Parameters used for process variation study.

| Device     | Parameter     | Mean                 | Std. dev. |
|------------|---------------|----------------------|-----------|
| Transistor | $V_{TH}$      | 0.46 V               | 60 mV     |
| MTJ        | MgO thickness | 1.1 nm               | 5%        |
|            | Area          | $P_1 * 25 * 10^{-9}$ | 15%       |

The frequently accessed sets degrade faster than the remaining sets, leading to an eventual system failure even though the majority of the memory cells are healthy. We also note that the most frequently accessed sets are common among different benchmarks (Fig. 11(d)). These two observations are exploited in this work for dynamic current throttling.

We exploit the uneven access of the sets to identify the most frequently accessed sets and throttle their write current to increase the lifetime of the system. Note that throttling the write current could be associated with performance degradation. A minimal overhead is incurred for finding the frequently accessed sets that are also fast under process variation and therefore do not limit the clock cycle. Once these sets are identified using a test routine, we throttle their current while maintaining the same clock cycle. It is also possible that some of the most accessed sets fall in the slow process corner. We exclude those sets from the list for throttling.

To identify the frequently accessed sets, we run a few benchmarks and record the data accesses for each set in gem5.<sup>21</sup> The results shown in a later section reveal that a few sets are accessed considerably more often than the others. We select those sets as the candidates for throttling.

### 3.3. Process Variation Analysis

Process variation results in a distribution of slow and fast bits. To account for process variation in the MTJ, we consider variations in the MgO barrier thickness and the junction surface area. For the access transistor, we take threshold voltage fluctuations into account. The mean and standard deviation of these parameters are given in Table I. We run Monte-Carlo analysis for 5000 simulation points using the MTJ model.<sup>15</sup> We use a curve fitting model in MATLAB to extrapolate the distribution to the $2^{26}$ bits of the 8 MB cache used in this work (Fig. 9). A long tail is observed in the write latency plot. We find the number of bits beyond the (mean + 4$\sigma$) point and distribute them in the cache randomly. Next, we identify the sets containing the slow bits. If the most frequently accessed sets contain slow bits, then the proposed dynamic throttling may result in IPC degradation. We therefore select for throttling only those sets from the most frequently accessed set list that do not contain slow bits. This technique ensures lifetime improvement without compromising IPC.

Fig. 9. Write latency distribution using curve fitting model for 8 MB cache.
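The slow-bit placement step can be sketched as follows. This is a simplified illustration under loudly stated assumptions: the tail count uses a Gaussian, whereas the extrapolated latency distribution has a long tail, so the real count is likely higher; bits are mapped to sets contiguously (8 ways of 512-bit lines per set, per the L2 configuration in Table II); and all function names are hypothetical.

```python
import math
import random

TOTAL_BITS = 8 * 2**20 * 8           # 8 MB cache = 2**26 bits
BITS_PER_SET = 8 * 512               # 8 ways x 512-bit lines (assumed layout)
NUM_SETS = TOTAL_BITS // BITS_PER_SET


def gaussian_tail_count(n_bits, k_sigma):
    """Expected number of bits beyond mean + k*sigma for a Gaussian."""
    tail_prob = 0.5 * math.erfc(k_sigma / math.sqrt(2.0))
    return n_bits * tail_prob


def place_slow_bits(n_slow, rng):
    """Scatter n_slow distinct bits uniformly at random; return the sets hit."""
    slow_bits = rng.sample(range(TOTAL_BITS), n_slow)
    # Assumption: bit index // BITS_PER_SET identifies the owning set.
    return {bit // BITS_PER_SET for bit in slow_bits}


rng = random.Random(0)               # fixed seed so the sketch is repeatable
n_slow = round(gaussian_tail_count(TOTAL_BITS, 4.0))
s_slow = place_slow_bits(n_slow, rng)
print(f"slow bits: {n_slow}, sets containing a slow bit: {len(s_slow)} of {NUM_SETS}")
```

Under these assumptions only a minority of sets contain a slow bit, which is what leaves room to throttle frequently accessed sets without stretching the clock cycle.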

## 4. SIMULATION RESULTS

### 4.1. Cache Organization

Figure 10 shows the 8 MB L2 cache organization for our study. The cache is logically divided as follows: (a) 4 banks, with each bank comprising 2 MB ($2^{24}$ bits); (b) each bank has 8 mats, with each mat comprising $2^{21}$ bits; (c) each mat has 8 sub arrays, with each sub array comprising $2^{18}$ bits. Every sub array contains peripheral circuitry including row decoders, column multiplexers, and output drivers. The sense amplifiers are also placed at the sub array level.



Fig. 10. Cache organization.
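The bit counts of the hierarchy follow directly from the 8 MB capacity and the 4-bank, 8-mat, 8-sub-array split. A small Python check (variable names are ours) makes the arithmetic explicit: 8 MB is $2^{26}$ bits, so each bank holds $2^{24}$ bits.

```python
CACHE_BYTES = 8 * 2**20              # 8 MB
BANKS = 4
MATS_PER_BANK = 8
SUBARRAYS_PER_MAT = 8

total_bits = CACHE_BYTES * 8
bits_per_bank = total_bits // BANKS
bits_per_mat = bits_per_bank // MATS_PER_BANK
bits_per_subarray = bits_per_mat // SUBARRAYS_PER_MAT

levels = [
    ("cache", total_bits),
    ("bank", bits_per_bank),
    ("mat", bits_per_mat),
    ("sub array", bits_per_subarray),
]
for name, bits in levels:
    # bit_length() - 1 is the exponent because every count is a power of two
    print(f"{name}: 2**{bits.bit_length() - 1} bits")
```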

Table II. Processor configuration.

|               |                                                                                                                  |
|---------------|------------------------------------------------------------------------------------------------------------------|
| Processor     | ALPHA, O3, 2 GHz, 4 cores                                                                                        |
| SRAM L1 cache | Private, Icache = 16 KB, Dcache = 16 KB, 512 bit Cache Line, 2 cycle read/write latency, write back              |
| L2 cache      | Shared, 8 MB, 4 banks, 8 ways, 512 bit cache line, write back, 6 cycle read latency, 20 or 7 cycle write latency |
| Main memory   | 512 MB, DDR3, 200 cycle latency                                                                                  |

Table III. Cache design parameters (in 22 nm predictive technology).

| Cache parameter            | $I_1$ (50 $\mu$A)    | $I_2$ (150 $\mu$A)   |
|----------------------------|----------------------|----------------------|
| Write latency              | 9.642 ns             | 3.362 ns             |
| Read latency               | 2.702 ns             | 2.771 ns             |
| Total area                 | 2.279 mm<sup>2</sup> | 2.79 mm<sup>2</sup>  |
| Cell turned-on resistance  | 1.48 k$\Omega$       | 1.48 k$\Omega$       |
| Cell turned-off resistance | 3.82 k$\Omega$       | 3.82 k$\Omega$       |
| Cell aspect ratio          | 0.4                  | 0.4                  |
| Cell area                  | 40 F<sup>2</sup>     | 40 F<sup>2</sup>     |
| Read dynamic energy        | 0.813 nJ             | 0.813 nJ             |
| Write dynamic energy       | 0.369 nJ             | 0.369 nJ             |
| Leakage power              | 0.3 W                | 0.3 W                |



Fig. 11. Analysis of different SPLASH benchmarks w.r.t. (a) IPC; (b) total energy; (c) the MTTF under the two throttling regimes; and (d) number of sets dynamically throttled.

### 4.2. Simulation Setup

We have evaluated the performance of an 8 MB L2 STTRAM cache on a 4 core ALPHA processor (Table II). We have used the gem5<sup>21</sup> full system architectural simulator. Appropriate changes were made in gem5 to incorporate different read/write latencies. We use NVSim,<sup>22</sup> a circuit level model for non-volatile memory, to estimate the overall read and write latencies, area, and dynamic energies (Table III). The read and write latencies were fed to gem5, and the performance of the SPLASH benchmark suite was evaluated for two write latencies (9.64 ns and 3.36 ns) with the same read latency (2.69 ns) (Fig. 11). In the same simulation the numbers of write and read operations to each set in the L2 cache are also recorded. The number of MTJs ($N_{\text{slow}}$) with write latency greater than (mean + $4\sigma$) was determined from the latency distribution obtained from MATLAB. Next, the $N_{\text{slow}}$ bits are randomly distributed among the 64 million MTJs, and the sets ($S_{\text{slow}}$) that are found to contain these bits are noted. To select the target sets for throttling, the sets with the highest number of write operations across all benchmarks are ranked. We select the frequently accessed sets from the top of this ranking until we reach a set that belongs to $S_{\text{slow}}$. By doing this we ensure lifetime improvement without affecting the *IPC*. The same benchmarks are re-simulated with throttling applied to those sub arrays which contain the selected sets.
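The target selection described above can be sketched in Python as follows. This is a minimal illustration, not the simulation code used in this work: the function names, the `top_k` cut-off, and the `sets_per_subarray` mapping (which assumes each set lies entirely within one sub array) are all hypothetical.

```python
def select_throttle_sets(write_counts, slow_sets, top_k=20):
    """Walk sets in descending write count and stop at the first slow set.

    write_counts: dict mapping set id -> number of writes (from profiling)
    slow_sets:    set ids that contain at least one slow bit (S_slow)
    top_k:        how far down the ranking to look (assumed parameter)
    """
    ranked = sorted(write_counts, key=write_counts.get, reverse=True)[:top_k]
    selected = []
    for set_id in ranked:
        if set_id in slow_sets:
            break                    # first slow set ends the selection
        selected.append(set_id)
    return selected


def subarrays_to_throttle(selected_sets, sets_per_subarray):
    """Sub arrays whose shared write drivers must be throttled."""
    return sorted({s // sets_per_subarray for s in selected_sets})


# Toy profile: set 1 is frequently written but contains a slow bit.
demo_counts = {3: 900, 7: 850, 1: 800, 12: 100}
demo_slow = {1}
chosen = select_throttle_sets(demo_counts, demo_slow)
print(chosen)                              # [3, 7]
print(subarrays_to_throttle(chosen, 4))    # [0, 1]
```

Stopping at the first slow set, rather than skipping it and continuing, follows the description in this section; filtering the list instead would be a straightforward variant.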

### 4.3. Simulation Results

The simulations were performed on SPLASH benchmarks<sup>23</sup> for static write currents of 50 $\mu\text{A}$ and 150 $\mu\text{A}$, with write latencies of 9.6 ns and 3.3 ns, which are equivalent to 20 cycles and 7 cycles respectively at 2 GHz. The 150 $\mu\text{A}$ current corresponds to the normal write current whereas 50 $\mu\text{A}$ corresponds to the throttled write current (for both static and dynamic throttling). The corresponding IPC and energy (normalized to the original case) and lifetimes are plotted in Figures 11(a)–(c). The loss in IPC due to static throttling is 2% for a total lifetime improvement of $10^{10}\times$. The energy overhead is 1% on average (due to longer runtime).
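The cycle counts quoted above follow from rounding the latency up to a whole number of clock periods. A short Python sketch (the function name is ours, and rounding up is an assumption that happens to reproduce the quoted values) using the Table III latencies:

```python
import math


def latency_to_cycles(latency_ns, clock_ghz):
    """Round a latency up to whole clock cycles (ns x GHz = cycles)."""
    return math.ceil(latency_ns * clock_ghz)


CLOCK_GHZ = 2.0
throttled_write = latency_to_cycles(9.642, CLOCK_GHZ)   # 50 uA write
nominal_write = latency_to_cycles(3.362, CLOCK_GHZ)     # 150 uA write
read = latency_to_cycles(2.702, CLOCK_GHZ)

print(throttled_write, nominal_write, read)  # 20 7 6
```

The results match the 20-cycle and 7-cycle write latencies and the 6-cycle read latency listed in Table II.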

We also note that the most frequently accessed sets are common across the benchmarks (Fig. 11(d)). For simplicity we have plotted the top 20 most frequently accessed sets. Two benchmarks (radiosity and fft) have all 20 sets in common, while the worst case is limited to six common sets. Using the above results, we implement dynamic current throttling to increase the lifetime of NVM systems.

As discussed before, the sets which are most frequently accessed and do not fall in the $S_{\text{slow}}$ obtained from process variation are selected for dynamic throttling (called $S_{\text{frequent}}$). For dynamic throttling we keep the nominal write current at 150 $\mu\text{A}$, and for sub arrays containing $S_{\text{frequent}}$ sets we reduce the write current dynamically to 50 $\mu\text{A}$. The write drivers are shared among all the sets in a sub array, and therefore the entire sub array is slowed down even if it contains only one of the $S_{\text{frequent}}$ sets. This technique will provide better results if the memory architecture keeps the number of sets in each sub array low and increases the number of sub arrays in each mat.

The IPC and energy for dynamic throttling are shown in Figures 11(a)–(b). We notice an IPC degradation of 1.3% on average and a lifetime improvement similar to static throttling ($\sim 10^{10}\times$). The energy overhead is limited to 1.5% on average. Therefore, dynamic throttling provides significant lifetime improvement with a smaller performance loss than static throttling.

## 5. LIFETIME DEPENDENCE ON PROCESS VARIATION AND DATA POLARITY

### 5.1. Process Variation Dependency

Process variation results in bits that may experience more stress under dynamic workload conditions. Therefore, the reliability of the design will be bounded by the reliability of the weakest bit in the array. To evaluate the effects of process variation we performed analysis for 5000 Monte Carlo simulation points and measured the voltage across the MTJ (Fig. 12). A curve fitting model is used to extrapolate the distribution to the 8 MB cache size. The number of bits above the (mean + 5$\sigma$) voltage is determined and distributed in the array randomly. It is found that the most accessed sets for each benchmark contain highly stressed bits during write operations. Process variations can reduce the lifetime by $10^{11}\times$.

Fig. 12. Distribution of voltage across MTJ due to process variation.

Fig. 13. Lifetime of MTJ for different SPLASH benchmarks under various conditions.

### 5.2. System Level Dependency

The lifetime of the MTJ depends on whether the operation is a write or a read. The write operation requires a higher voltage than the read operation and thus degrades the lifetime of the MTJ more. The lifetime depends not only on the type of operation but also on the data polarity, as shown in Figure 6. We performed an in-depth analysis of these factors on the SPLASH benchmark suite and the results are shown in Figure 13. For the simulations we have considered read/write of all 0's, read/write of all 1's, a 50% distribution of 1's and 0's, and read/write under process variations. The lifetimes under these conditions are compared with the maximum lifetime of the MTJ when the voltage across it is 0 V (i.e., an unstressed bit). It is evident that the lifetime under process variation is the minimum, while writing/reading 0's affects the MTJ life the least. The lifetime does not change drastically between benchmarks because each benchmark stresses the worst-case bit similarly.

## 6. CONCLUSION

In this paper we presented an MTJ reliability model that considers activity factors, the stress voltages of read and write operations, write/read data polarity, and process variation. Our investigations revealed that MTJ reliability models can significantly over- or underestimate the lifetime when these factors are not considered. Furthermore, process variation has the most severe effect on the MTJ lifetime. The proposed model is coupled with system-level analysis and mitigation mechanisms using static and dynamic current throttling. The proposed techniques improved the lifetime by ~10<sup>10</sup>× at the cost of a minor (1–2%) IPC degradation.

## References

1. M. Hosomi, H. Yamagishi, T. Yamamoto, K. Bessho, Y. Higo, K. Yamane, H. Yamada *et al.*, A novel nonvolatile memory with spin torque transfer magnetization switching: Spin-RAM, *IEEE International Electron Devices Meeting, 2005. IEDM Technical Digest*, IEEE (2005), pp. 459–462.
2. B. Oliver, G. Tuttle, Q. He, X. Tang, and J. Nowak, Two breakdown mechanisms in ultrathin alumina barrier magnetic tunnel junctions. *J. Appl. Phys.* 95, 1315 (2004).
3. A. Khan, J. Schmalhorst, A. Thomas, O. Schebaum, and G. Reiss, Dielectric breakdown in Co–Fe–B/MgO/Co–Fe–B magnetic tunnel junction. *J. Appl. Phys.* 103, 123705 (2008).
4. C.-H. Ho, G. D. Panagopoulos, S. Y. Kim, Y. Kim, D. Lee, and K. Roy, A physics-based statistical model for reliability of STT-MRAM considering oxide variability, *2013 International Conference on Simulation of Semiconductor Processes and Devices (SISPAD)*, IEEE (2013), pp. 29–32.
5. S. Amara-Dababi, H. Bea, R. Sousa, K. Mackay, and B. Dieny, Modelling of time-dependent dielectric barrier breakdown mechanisms in MgO-based magnetic tunnel junctions. *Journal of Physics D: Applied Physics* 45, 295002 (2012).
6. M. A. Alam, B. E. Weir, and P. J. Silverman, A study of soft and hard breakdown-Part I: Analysis of statistical percolation conductance. *IEEE Transactions on Electron Devices* 49, 232 (2002).
7. G. Panagopoulos, C. Augustine, and K. Roy, Modeling of dielectric breakdown-induced time-dependent STT-MRAM performance degradation, *2011 69th Annual Device Research Conference (DRC)*, IEEE (2011), pp. 125–126.
8. C. Yoshida, M. Kurasawa, Y. M. Lee, K. Tsunoda, M. Aoki, and Y. Sugiyama, A study of dielectric breakdown mechanism in CoFeB/MgO/CoFeB magnetic tunnel junction, *2009 IEEE International Reliability Physics Symposium*, IEEE (2009), pp. 139–142.
9. A. P. Ferreira, M. Zhou, S. Bock, B. Childers, R. Melhem, and D. Mossé, Increasing PCM main memory lifetime, *Proceedings of the Conference on Design, Automation and Test in Europe*, European Design and Automation Association (2010), pp. 914–919.
10. L. Long, D. Liu, J. Hu, S. Gu, Q. Zhuge, and E. H.-M. Sha, A space-based wear leveling for pcm-based embedded systems, *2013 IEEE 19th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA)*, IEEE (2013), pp. 145–148.
11. N. H. Seong, D. H. Woo, and H.-H. S. Lee, Security refresh: Prevent malicious wear-out and increase durability for phase-change memory with dynamically randomized address mapping. *ACM SIGARCH Computer Architecture News* 38, 383 (2010).
12. P. Zhou, B. Zhao, J. Yang, and Y. Zhang, A durable and energy efficient main memory using phase change memory technology, *ACM SIGARCH Computer Architecture News*, ACM (2009), Vol. 37, pp. 14–23.
13. J. Zhang, P. M. Levy, S. Zhang, and V. Antropov, Identification of transverse spin currents in noncollinear magnetic structures. *Phys. Rev. Lett.* 93, 256602 (2004).
14. A. Thiaville and Y. Nakatani, Domain-wall dynamics in nanowires and nanostrips, *Spin Dynamics in Confined Magnetic Structures III*, Springer, Berlin, Heidelberg (2006), pp. 161–205.
15. X. Fong, S. H. Choday, P. Georgios, C. Augustine, and K. Roy, SPICE Models for Magnetic Tunnel Junctions Based on Monodomain Approximation (2013).
16. J. W. McPherson and H. C. Mogul, Underlying physics of the thermochemical E model in describing low-field time-dependent dielectric breakdown in SiO<sub>2</sub> thin films. *J. Appl. Phys.* 84, 1513 (1998).
17. W. Oepts, H. J. Verhagen, R. Coehoorn, and W. J. M. De Jonge, Analysis of breakdown in ferromagnetic tunnel junctions. *J. Appl. Phys.* 86, 3863 (1999).
18. J. Schmalhorst, H. Brückl, M. Justus, A. Thomas, G. Reiss, M. Vieth, G. Gieres, and J. Wecker, Evolution of the dielectric breakdown in Co/Al<sub>2</sub>O<sub>3</sub>/Co junctions by annealing. *J. Appl. Phys.* 89, 586 (2001).

19. Y. Wang, Y. Zhang, E. Y. Deng, J.-O. Klein, and W. S. Zhao, Compact model of magnetic tunnel junction with stochastic spin transfer torque switching for reliability analyses. *Microelectronics Reliability* 54, 1774 (2014).
20. T. Min, Q. Chen, R. Beach, G. Jan, C. Horng, W. Kula, T. Torng *et al.*, A study of write margin of spin torque transfer magnetic random access memory technology. *IEEE Transactions on Magnetics* 46, 2322 (2010).
21. Gem5, <http://www.gem5.org>.
22. X. Dong, C. Xu, N. Jouppi, and Y. Xie, NVSim: A circuit-level performance, energy, and area model for emerging non-volatile memory, *Emerging Memory Technologies*, Springer, New York, NY (2014), pp. 15–50.
23. Splash, <http://kbarr.net/splash2>.
24. <http://ptm.asu.edu/latest.html>.

### Anirudh S. Iyengar

Anirudh S. Iyengar received his M.Sc. degree in Electrical Engineering from the University of South Florida (USF), Tampa, in 2013, and is currently pursuing a Ph.D. degree in Computer Science and Engineering at The Pennsylvania State University. His research is focused on secure and low-power circuits and systems.

### Swaroop Ghosh

Swaroop Ghosh (SM'13) received the B.E. degree (Hons.) from IIT Roorkee, Roorkee, India, in 2000, the M.S. degree from the University of Cincinnati, Cincinnati, OH, USA, in 2004, and the Ph.D. degree from Purdue University, West Lafayette, IN, USA, in 2008. He was with the Faculty of the University of South Florida from 2012 to 2016. He was a Senior Research and Development Engineer in advanced design with Intel Corporation from 2008 to 2012. Since 2016, he has been an Assistant Professor with Penn State University. At Intel, his research was focused on low power and robust embedded memory design in scaled technologies. His research interests include low-power circuits, hardware security, and digital testing for nanometer technologies. He has served on the technical program committees of ACM/IEEE conferences, such as DAC, ICCAD, CICC, DATE, ISLPED, GLSVLSI, Nanoarch, and ISQED. He organized the ACM/IEEE DAC Ph.D. Forum in 2015 and 2016. He was a recipient of the DARPA Young Faculty Award in 2015, the ACM SIGDA Outstanding New Faculty Award in 2016, the USF Outstanding Research Achievement Award in 2015, and the College of Engineering Outstanding Research Achievement Award in 2015. He served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I and as a Senior Editorial Board Member for the IEEE JOURNAL OF EMERGING TOPICS ON CIRCUITS AND SYSTEMS (JETCAS). He served as a Lead Guest Editor for the IEEE JETCAS.

### Nitin Rathi

Nitin Rathi received his B.Tech degree from West Bengal University of Technology, India, in 2013. He is currently pursuing a Ph.D. in Electrical and Computer Engineering at Purdue University, West Lafayette. His primary research interests lie in low-power computing and emerging devices.