

# A Novel Energy-Efficient Sinusoidal Power Clocking-based Writing Circuitry for the hybrid CMOS/MTJ architecture

Wu Yang, Amit Degada and Himanshu Thapliyal

VLSI Emerging Design And Nano Things Security Lab (VEDANTS-Lab)

Department of Electrical Engineering and Computer Science

University of Tennessee, Knoxville, TN, USA

Email: hthapliyal@utk.edu

**Spin Transfer Torque Magnetic Random Access Memory (STT-MRAM)** offers a promising solution for low-power and high-density memory due to its compatibility with CMOS, higher density, scalable nature, and non-volatility. However, the higher energy required to write bit cells has remained a key challenge for its adoption in battery-operated smart handheld devices. The existing low-energy writing solutions require additional complex control logic mechanisms, further constraining the available area. In this research, we propose a solution to design energy-efficient write circuits by combining two techniques. First, we propose a sinusoidal power clocking mechanism that replaces the DC power supply of the conventional CMOS design. Second, we propose three LUT-based control logic circuits and one write circuit to reduce the area and further minimize energy dissipation. The experimental results are verified over case study implementations of a $4 \times 4$ STT-MRAM macro designed using two bit cell configurations: (i) 1T-1MTJ and (ii) 4T-2MTJ. In post-layout simulations over the frequency range from 250 kHz to 6.25 MHz, the write circuit that uses the proposed LUT-based control logic circuits and a write driver with a sinusoidal power supply shows an average energy saving of more than 65.05% compared to the CMOS counterpart. Furthermore, the write circuit that uses the proposed 6T write driver with the sinusoidal power supply shows an energy saving of more than 70.60% compared to the CMOS counterpart. We also verified that the energy-saving performance remains relatively consistent with changes in temperature and the Tunneling Magnetoresistance (TMR) ratio.

**Index Terms**—Magnetic Tunnel Junction (MTJ), Spin Transfer Torque (STT), hybrid CMOS/MTJ Circuits, Write Circuits, Energy Efficient Write Circuits

## I. INTRODUCTION

In recent years, we have seen an enormous rise in Internet-of-Things (IoT) devices, changing how individuals and communities interact and exchange information. Furthermore, the development in cloud and edge computing technology has sped up the usage of smart hand-held devices in everyday applications, e.g., smart healthcare, wearable, etc. These devices are usually portable and have a finite energy budget. Therefore, energy consumption is a key design metric in hardware design [1]. Spintronics-based novel emerging nanotechnology devices, e.g., Magnetic Tunnel Junction (MTJ) and Ultra-Thin Body-Silicon on Insulator (UTB-SOI), show great potential for low-energy circuits. The MTJs offer significant advantages, such as high density, higher endurance, scalability, and compatibility with CMOS. Researchers have explored hybrid CMOS/MTJ circuits to design in-memory applications, such as lightweight cryptography circuit PRESENT-80 and image processing applications in recent years [2]–[7].

The hybrid CMOS/MTJ architecture (Fig. 1) comprises three primary circuits: (i) a write circuit to change the spin orientation in the MTJ, (ii) a CMOS transistor and MTJ-based circuit implementing the logical function, and (iii) a sense amplifier to read the logic output. The MTJ stores the "logic 1" or "logic 0" information with two distinct resistive states. Spin Transfer Torque (STT) is a promising switching mechanism for MTJs with a size smaller than 100 nm. The magnetization of the free layer can be switched using a CMOS write circuit by injecting a relatively low bi-directional current through the MTJ [8] [9]. In academia as well as industry, the STT mechanism is more widely explored than its counterparts such as Spin-Orbit Torque (SOT), Field Induced Magnetic Switching (FIMS), and Voltage-Controlled Magnetic Anisotropy (VCMA) [10]–[13]. Because of its simple architecture, the STT mechanism for the MTJ can be easily integrated into STT-MRAM (STT-Magnetic Random Access Memory) and Logic-In-Memory (LIM) designs. Despite the suitability of the MTJ device for low-energy circuits, the writing current has remained a key challenge for researchers [14] [15]. Therefore, designing an energy-efficient writing mechanism for the MTJ is an intriguing research direction.

Fig. 1: The top-level view of the hybrid CMOS/MTJ circuit, where our focus is on the write circuit.

TABLE I: Frequency range in healthcare applications [16].

| Reference | Medical Application                                | Frequency range of operation |
|-----------|----------------------------------------------------|------------------------------|
| [17]      | Low frequency inductive implants (pacemakers, ICD) | Less than 200 kHz            |
| [18] [19] | Implant communication                              | 9 - 315 kHz                  |
| [20]      | Bioelectrical impedance meter                      | 50 kHz, 250 kHz              |
| [21]      | Electrical Impedance Myography (EIM)               | 50 kHz                       |
| [22]      | CMOS wearable non-invasive impedance meter         | 100 Hz to 1 MHz              |
| [23]      | Hearing Aid                                        | 32 kHz to 8.00 MHz           |
| [24] [25] | Low data-rate Body-Coupled Communication (BCC)     | 10 kHz to 10 MHz             |
| [26]      | Home Health Hub                                    | 200 kHz to 1.0 MHz           |

The conventional write circuit consists of a write driver that configures the MTJ into the desired state and control logic that controls the write driver. The write driver keeps applying the writing current through the MTJ even after the MTJ has switched, which leads to higher energy dissipation. In the literature, researchers have explored self-terminated write circuits that cut off the writing current just after MTJ switching completes [27], [28]. However, the self-terminated write circuit requires additional circuitry, e.g., a write completion detector and a write terminator, that may not be suitable for area-constrained smart hand-held devices. Therefore, designing an energy-efficient writing mechanism is an intriguing research problem for commercial success.

IoT devices are often area- and resource-constrained. Previously, we proposed a novel sinusoidal power-clocking-based sense amplifier circuit that shows nearly 70% energy saving for the case study implementation of the STT-MRAM [29]. In this work, we approach the problem from two directions. First, we propose a sinusoidal power-clocking mechanism for the writing circuit. Second, we offer novel designs of the write circuit. We performed energy-saving comparisons for the case study implementation of a $4 \times 4$ STT-MRAM macro using two popular bit cell configurations: (i) 1T-1MTJ (1 transistor and 1 MTJ) and (ii) 4T-2MTJ (4 transistors and 2 MTJs). Implementing the write circuit using the proposed designs together results in a minimum of 65.14% and a maximum of 71.98% energy saving compared to the CMOS counterpart. The simulation was performed using Cadence Virtuoso with the TSMC 65nm CMOS and 40nm Perpendicular Magnetic Anisotropy (PMA) Magnetic Tunnel Junctions (MTJs). The application frequencies considered in this work range from 250 kHz to 6.25 MHz. Table I lists some of the applications of this work. Recently, technological advancements have turned the previous generation of bulky impedance meters and impedance myography devices into pocket-size battery-operated devices.

#### A. Key Contribution From This Work

The key contributions of this work are as follows:

- We investigated the energy efficiency of the sinusoidal power-clocking mechanism for the STT switching-based hybrid CMOS/MTJ circuits.
- We developed three Look-Up Table (LUT) based control logic circuits: CMOS-based, 2MTJ-based, and 1MTJ-based control logic.
- We also developed the 6T write driver, which combines the control logic and write driver into one circuit.
- We presented case studies using the proposed designs and the sinusoidal power supply for the $4 \times 4$ STT-MRAM macro.
- We conducted a comparative analysis of the proposed designs against their CMOS counterpart with post-layout simulations.
- We demonstrated the viability and energy efficiency of the $4 \times 4$ STT-MRAM macro using the sinusoidal power supply with the proposed design.

#### B. Organization Of The Paper

This paper is organized as follows: Section II explains the background of MTJs, the conventional CMOS write circuit, and two bit cell designs for the STT-MRAM macro. Section III explains the proposed circuit designs and their functioning during the writing operations. Section IV discusses the experimental setup and post-layout simulation results for the $4 \times 4$ STT-MRAM macro with the proposed write circuit. Section V concludes the paper.

## II. BACKGROUND

The Magnetic Tunnel Junction (MTJ) has emerged as a design option to overcome the constraints of CMOS technology, reduce leakage, and improve energy efficiency. MTJ-based memories allow around ten years of retention time, making them suitable for battery-operated devices. This section describes the background of the MTJ, the existing write circuit, and two popular bit-cell design choices in the literature.

#### A. Magnetic Tunnel Junction

The Magnetic Tunnel Junction (MTJ) mainly consists of a free ferromagnetic layer (*CoFeB*) and a fixed ferromagnetic layer (*CoFeB*) separated by an insulating layer (*MgO*). In this paper, we used a spin-transfer torque Perpendicular Magnetic Anisotropy (PMA) MTJ, as shown in Fig. 2, and its critical parameters are listed in Table II [30]. The orientation of the two ferromagnetic (FM) layers' magnetization determines the state of the MTJ. Each state of the MTJ shows a distinct resistive characteristic: (i) the Parallel (P) state, where the two layers are magnetized in the same direction, exhibits lower resistance; (ii) the Anti-Parallel (AP) state, where the two layers are magnetized in opposite directions, shows higher resistance. Additionally, the free layer maintains its magnetization direction even when the power is removed [31]–[35].

TABLE II: Critical parameters of MTJ

| Parameter               | Value       |
|-------------------------|-------------|
| Surface                 | 40nm x 40nm |
| Oxide barrier thickness | 0.85nm      |
| Free layer thickness    | 2nm         |
| TMR                     | 150%        |
| Resistance area product | $10^{-11}$  |

Fig. 2: Structure of Magnetic Tunnel Junction (MTJ): the two states of perpendicular magnetic anisotropy (PMA) MTJ: Anti-parallel (AP) and Parallel (P).

During the STT writing process, the orientation of the free FM layer changes based on the direction of the current flowing through it, whereas the orientation of the fixed FM layer remains unchanged. This feature enables switching the MTJ from the P-state to the AP-state or vice versa. The difference in resistance between the P-state and AP-state plays an important role in designing the read margin: a larger separation between them results in a more reliable reading process. The Tunneling Magnetoresistance (TMR) ratio, shown in Equation (1), is one of the key parameters describing the resistance difference between the P-state and AP-state. The resistance characteristic depends on each layer's material. The values of $R_P$ and $R_{AP}$ are in the $\text{k}\Omega$ range. In our experiments, the TMR is set to 150% unless otherwise specified.

$$TMR = \frac{R_{AP} - R_P}{R_P} \quad (1)$$
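As a small numerical illustration of Equation (1), the sketch below computes $R_{AP}$ from $R_P$ and the TMR ratio and then recovers the ratio. The P-state resistance of 5 kΩ is a hypothetical value chosen only to land in the kΩ range mentioned above; it is not taken from the MTJ model used in this work.

```python
def tmr_ratio(r_ap: float, r_p: float) -> float:
    """Equation (1): TMR = (R_AP - R_P) / R_P, returned as a fraction."""
    return (r_ap - r_p) / r_p


def r_ap_from_tmr(r_p: float, tmr: float) -> float:
    """Invert Equation (1): R_AP = R_P * (1 + TMR)."""
    return r_p * (1.0 + tmr)


# Hypothetical P-state resistance in ohms (an assumption, not a value from the paper).
R_P = 5_000.0
# TMR of 150% as used in the experiments, written as a fraction.
TMR = 1.5

R_AP = r_ap_from_tmr(R_P, TMR)
print(f"R_AP = {R_AP:.0f} ohm, TMR check = {tmr_ratio(R_AP, R_P):.2f}")
```

With a TMR of 150%, the AP-state resistance is 2.5 times the P-state resistance, which is the separation the sense amplifier relies on during reading.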

#### B. Conventional Write Circuit

The write driver injects spin-polarized current through the STT-MTJ to configure it into the desired state. The top-level view of the write circuit in the prior work is shown in Fig. 3. It consists of control logic and a write driver, as shown in Fig. 4 [36]–[39]. The control logic requires the Write Enable ($WE$) and input data signals to generate the $V_1$, $\bar{V}_1$, $V_2$, and $\bar{V}_2$ signals that control the write driver. When no write is being performed, the control logic sets $\bar{V}_1$ and $\bar{V}_2$ to a logic value of '1' and $V_1$ and $V_2$ to a logic value of '0' to turn OFF all the transistors in the write driver. Once the $WE$ pulse is applied, the control circuit flips a pair of outputs based on the input data, enabling the write driver. The input data determines the direction of the current flow. To set the MTJ into the AP-state, the control logic sets $V_1$, $\bar{V}_1$, $V_2$, and $\bar{V}_2$ to logic values of '1', '0', '0', and '1', respectively, so that the write driver applies the current through the MTJ from the fixed layer to the free layer. Similarly, the MTJ is configured into the P-state by setting $V_1$, $\bar{V}_1$, $V_2$, and $\bar{V}_2$ to logic values of '0', '1', '1', and '0'. The energy dissipated by configuring the MTJ is given by Equation (2):

$$E_{dc} = \int_{\tau_0}^{\tau_1} I \times V_{dd} dt = \int_{\tau_0}^{\tau_1} \frac{V_{dd}^2}{R} dt \quad (2)$$

where $V_{dd}$ is the full swing of the clock, $R$ is the resistance of the MTJ and transistors, $\tau_0$ is the time when the writing operation begins, and $\tau_1$ is the time when the writing operation ends.



Fig. 3: Top-level view of the write circuit in the prior work.



(a) Existing control logic circuit.



(b) Existing write driver

Fig. 4: The write circuit in the prior work [36]–[38].

#### C. Memory Bit Cells In STT-MRAM

The bit cell is the primary unit of memory and stores 1 bit of data. Fig. 5 shows the structure of the 1T-1MTJ and 4T-2MTJ bit cells, in which the transistors serve as access transistors. When reading or writing a particular bit cell, the corresponding Word Line (WL) signal is applied to the access transistors to select the cell. Once the reading or writing operation is completed, the WL is set to a logic value of '0'. The stored data in the MTJ can be extracted with the traditional pre-charge sense amplifier (PCSA). When performing a read operation on the 1T-1MTJ, the PCSA applies current through the bit cell (1T-1MTJ) and the reference cells to sense and compare them. Reading the 4T-2MTJ differs from reading the 1T-1MTJ: the PCSA only applies current through the bit cell (4T-2MTJ) to sense and compare the MTJs within the bit cell (MTJ1 and MTJ2).

## III. PROPOSED DESIGN: SINUSOIDAL CLOCKING BASED WRITE CIRCUIT

Fig. 5: The structure of 1T-1MTJ bit cell and 4T-2MTJ bit cell (added 2 PMOS for sufficient writing current).

When configuring the bit cell during the writing operation, the write driver supplies a continuous current, around tens of $\mu\text{A}$, through the MTJ [40], [41]. Additionally, IoT devices typically operate at lower frequencies, which lengthens the writing period, and the extended duration leads to higher energy dissipation. To reduce the energy dissipation during the writing operation of the bit cell, we explore the use of a sinusoidal power supply with the write driver. The sinusoidal power supply enables the write driver to gradually vary the current supplied through the bit cell during the writing operation rather than supplying a constant high current. The sinusoidal wave swings from $GND$ to $V_{dd}$ and can be expressed as $F(t) = \frac{V_{dd}}{2}(\sin(t) + 1)$. Therefore, the energy dissipation of the write driver with the sinusoidal power supply is calculated by Equation (3).

$$E_{sin} = \int_{\tau_0}^{\tau_1} \frac{1}{R} \left( \frac{V_{dd}}{2} (\sin(t) + 1) \right)^2 dt \quad (3)$$

The equality $\sin(t) + 1 = 2$ holds if and only if $t = \pi/2 + 2k\pi$, where $k$ is an integer; at every other instant, $\sin(t) + 1$ is less than 2. Therefore, we can compare the integrands on the right-hand sides of Equations (3) and (2), as shown in Equation (4). Equation (4) indicates that the sinusoidal power supply consumes less energy than the constant DC power supply over the same writing interval. Further, we can estimate the savings from the difference between the areas under the two power curves during the writing operation. In Fig. 6, the red line indicates the power using a constant DC power supply, and the yellow line indicates the power using the sinusoidal power supply. Therefore, the area with a green dashed line represents the energy saving during the writing operation.

$$\frac{1}{R} \left( \frac{V_{dd}}{2} (\sin(t) + 1) \right)^2 \le \frac{1}{R} \left( \frac{V_{dd}}{2} \times 2 \right)^2 = \frac{V_{dd}^2}{R} \quad (4)$$
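To make the comparison between Equations (2) and (3) concrete, the following sketch integrates both over one full period of the sinusoid. It is an idealized estimate under stated assumptions: $R$ is treated as constant, current is assumed to flow for the whole interval (transistor thresholds are ignored), and the values of `V_DD` and `R` are hypothetical. Under these assumptions the ratio works out analytically to $3/8$, so it should not be read as a prediction of the post-layout results reported later.

```python
import math


def energy_dc(v_dd: float, r: float, t0: float, t1: float) -> float:
    """Equation (2) with constant R: E_dc = (V_dd^2 / R) * (t1 - t0)."""
    return (v_dd ** 2 / r) * (t1 - t0)


def energy_sin(v_dd: float, r: float, t0: float, t1: float, steps: int = 100_000) -> float:
    """Equation (3) evaluated with the midpoint rule."""
    dt = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * dt
        v = 0.5 * v_dd * (math.sin(t) + 1.0)  # F(t) from the text
        total += (v * v / r) * dt
    return total


# Hypothetical values chosen only for illustration.
V_DD = 1.0      # volts
R = 10_000.0    # ohms, lumped MTJ plus transistor resistance

e_dc = energy_dc(V_DD, R, 0.0, 2.0 * math.pi)
e_sin = energy_sin(V_DD, R, 0.0, 2.0 * math.pi)
ratio = e_sin / e_dc
print(f"E_sin / E_dc over one period = {ratio:.4f}")
```

The ratio is independent of `V_DD` and `R` because both cancel, which is why the idealized saving depends only on the waveform shape.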

#### A. Proposed Write Circuit Using Novel Look-up Table-based (LUT) Control Logic and a Write Driver Interfaced With A Sinusoidal Power Supply

Fig. 6: The estimated energy saving of the sinusoidal power supply when compared to the constant DC power supply.

Fig. 7: The top-level view of the proposed write circuit with LUT-based control logic.

To achieve further reductions in energy consumption and circuit size for area- and resource-constrained IoT devices, we use pre-charge-based logic to design the control logic of the write circuit. The top-level view of our proposed write circuit is illustrated in Fig. 7. The proposed write circuit includes a CMOS-based LUT control logic with a constant DC power supply ($V_{dd}$) and a write driver using a gradually varying sinusoidal power supply. By integrating pre-charge logic with MTJs, we can implement Look-up Table (LUT) based logic, leading to our proposed LUT-based control logic. In this section, we explain the proposed circuits and their operations with respect to the time stamps. First, we introduce the three novel LUT-based control logic circuits and the write driver with a sinusoidal power supply. Second, we explore further reductions in the area, leading to the proposed 6T write driver design.

##### 1) Proposed CMOS-based LUT Control Logic

Table III presents the truth table of the control logic of the write circuit. When the Write Enable ( $WE$ ) signal is at the logic value of '0', the outputs  $\bar{V1}$  and  $\bar{V2}$  are set to the logic value of '1', while  $V1$  and  $V2$  are set to the logic value of '0', effectively turning off all transistors in the write driver. Therefore,  $WE$  can be utilized as a pre-charge/sensing signal, and  $Data$  can serve as the input signal. Fig. 8 shows the schematic of the proposed CMOS-based LUT control logic. It consists of the pre-charge sense amplifier, NMOS logic, a footer, and two inverters to generate the required signal for controlling the write driver. Since input  $Data$  and its complementary  $\bar{Data}$  are required, an additional inverter is needed to convert the input signals to ensure proper operation of the circuit. The CMOS-based LUT control logic operates

TABLE III: Truth table of the control logic for the 4T write driver

| WE | Data | V1 | $\bar{V1}$ | V2 | $\bar{V2}$ |
|----|------|----|------------|----|------------|
| 0  | X    | 0  | 1          | 0  | 1          |
| 1  | 0    | 1  | 0          | 0  | 1          |
| 1  | 1    | 0  | 1          | 1  | 0          |



Fig. 8: The schematic of the proposed CMOS-based LUT control logic.

in two stages: (i) during the pre-charge stage, both outputs $\bar{V1}$ and $\bar{V2}$ are charged to the logic value of ‘1’, and $V1$ and $V2$ are set to the logic value of ‘0’; (ii) during the evaluation stage, the circuit evaluates and sets the output $\bar{V1}$ or $\bar{V2}$ to a logic value of ‘0’ and its complement to a logic value of ‘1’, while the other output and its complement remain unchanged. This two-stage operation generates the proper control signals for the write driver while reducing the area of the write circuit.
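The behavior in Table III can be captured as a small behavioral model, which is handy for checking testbench stimuli. This is only a logic-level sketch of the truth table; the function name `control_logic` is hypothetical and nothing here models the transistor-level pre-charge circuit.

```python
def control_logic(we: int, data: int) -> tuple[int, int, int, int]:
    """Return (V1, V1_bar, V2, V2_bar) following Table III.

    we = 0 is the pre-charge stage; we = 1 is the evaluation stage.
    """
    if we == 0:
        # Pre-charge: both complementary outputs high, write driver disabled.
        return (0, 1, 0, 1)
    if data == 0:
        # Evaluation with Data = 0: only the V1 pair flips.
        return (1, 0, 0, 1)
    # Evaluation with Data = 1: only the V2 pair flips.
    return (0, 1, 1, 0)


for we, data in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(we, data, control_logic(we, data))
```

In each evaluation case exactly one output pair changes relative to the pre-charge values, which matches the statement that the other output and its complement remain unchanged.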

##### 2) Proposed 2MTJ-based LUT Control Logic

We explored a hybrid CMOS/MTJ control logic to eliminate the need for the extra inverter in each control logic. The resistance characteristic of the MTJ allows us to realize an LUT-based circuit in which two MTJs store the input data and its complement, respectively. Therefore, we could replace $N_2$ and $N_3$ in the CMOS-based LUT design with MTJs. However, for the hybrid CMOS/MTJ control logic to operate correctly, the MTJs must first be set into the desired states. To avoid this recursive problem, the MTJs are set to fixed states and a bypass transistor is added to assist the evaluation, as shown in Fig. 9. The proposed 2MTJ-based LUT control logic sets MTJ1 to the P-state and MTJ2 to the AP-state by applying the logic value of ‘1’ to $P_0$ and $P_3$, and the logic value of ‘0’ to $N_3$. Since the resistance of the STT-MTJ ($R_{MTJ}$) ranges from a few $k\Omega$ to tens of $k\Omega$, two possible cases arise during the evaluation stage: (i) when $N_2$ is turned ON, $R_{MOSFET} = R_{ON} < R_{MTJ}$, setting $\bar{V1}$ and $V2$ to the logic value of ‘0’, and $V1$ and $\bar{V2}$ to the logic value of ‘1’; (ii) when $N_2$ is turned OFF, $R_{MOSFET} = R_{OFF} > R_{MTJ}$, setting $\bar{V1}$ and $V2$ to the logic value of ‘1’, and $V1$ and $\bar{V2}$ to the logic value of ‘0’.

##### 3) Proposed 1MTJ-based LUT Control Logic

Fig. 9: The schematic of the proposed 2MTJ-based LUT control logic.

To simplify the design and minimize the area, we remove MTJ2 from the 2MTJ-based LUT design. Fig. 10 shows the schematic of our proposed 1MTJ-based LUT control logic and its necessary input signals, namely $WE$ and $Data$. The proposed 1MTJ-based LUT control logic has the same evaluation mechanism as the 2MTJ-based LUT design: (i) when $Data$ is at the logic value of ‘1’, $R_{MOSFET} = R_{ON} < R_{MTJ}$, setting $\bar{V1}$ and $V2$ to the logic value of ‘0’, and $V1$ and $\bar{V2}$ to the logic value of ‘1’; (ii) when $Data$ is at the logic value of ‘0’, $R_{MOSFET} = R_{OFF} > R_{MTJ}$, setting $\bar{V1}$ and $V2$ to the logic value of ‘1’, and $V1$ and $\bar{V2}$ to the logic value of ‘0’. Moreover, the control logic operates correctly regardless of the STT-MTJ state.



Fig. 10: The schematic of the proposed 1MTJ-based LUT control logic.

##### 4) The Operation Of The Write Circuit Using Our Proposed LUT-based Control Logic And The Write Driver Interfaced With A Sinusoidal Power Supply

The CMOS-based LUT control logic stands out among the proposed designs because it has less process variation and does not require back-of-line (BEOL) processes. On the other hand, the MTJ-based LUT control logic offers a reduction in area compared to the CMOS-based LUT control logic. Between the two MTJ-based LUT control logic, the 1MTJ-based LUT control logic emerges as the optimized design, providing a minimized and robust design. All proposed designs have two stages of operation: the pre-charge stage and the evaluation stage. The details of the pre-charge stage are as follows:



Fig. 11: The operation of the write driver interfaced with a sinusoidal power supply.

- **Address & Pre-charge:** At time $t = t_0$, the write circuit receives the input signals *Data* and *WE*. The input signal *WE* is set to the logic value of '0', which turns ON $P_0$ and $P_3$ and turns OFF $N_3$ in the LUT-based control logic. As a result, $\bar{V}1$ and $\bar{V}2$ are charged to the logic value of '1', while $V1$ and $V2$ are set to the logic value of '0'. Note that these output signals deactivate the write driver, since all of its transistors are turned OFF.

Control signals are generated and applied to the write driver during evaluation. The operation is detailed below:

- **Generating Control Signals:** At time $t = t_1$, a positive *WE* pulse is applied to the LUT-based control logic. This triggers the evaluation process, causing $P_0$ and $P_3$ to turn OFF and $N_3$ to turn ON. Since $N_2$ is already ON, $\bar{V}1$ discharges to GND faster than $\bar{V}2$. Consequently, the LUT-based control logic sets $\bar{V}1$ and $V2$ to a logic value of '0' and $\bar{V}2$ and $V1$ to a logic value of '1'. These control signals are then applied to the write driver so that the transistors $P_1$ and $N_2$ are enabled while $P_2$ and $N_1$ remain OFF.
- **Pre-Writing:** During $t_1 < t < t_2$, the voltage of the power clock ($V_{pc}$) starts rising from *GND* toward $V_{dd}$. Notably, $P_1$ of the write driver does not yet conduct, as its source-to-gate voltage ($V_{sg}$) is below the PMOS threshold voltage ($V_{thp}$), as shown in Fig. 11 (a).
- **Writing:** At time $t = t_2$, $V_{pc}$ reaches $V_{thp}$, activating $P_1$. Consequently, the write driver applies current through the bit cell from $P_1$ to $N_2$. The current then gradually increases as $V_{pc}$ rises during $t_2 < t < t_3$, as shown in Fig. 11 (b).
- **Completion:** At time $t = t_3$, $V_{pc}$ drops below $V_{thp}$, deactivating $P_1$ in the write driver. As a result, the write driver stops applying the writing current and completes the writing process.
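The timing of the steps above can be estimated with a short calculation. The sketch assumes the power clock starts at *GND* at the beginning of the cycle, i.e. $V_{pc}(t) = \frac{V_{dd}}{2}(1 - \cos(2\pi f t))$, which is a phase-shifted form of the sinusoid used earlier, and that the PMOS conducts exactly when $V_{pc}$ exceeds $V_{thp}$. The function names and the numeric values of `V_DD` and `V_THP` are hypothetical and are not taken from the simulated technology.

```python
import math


def conduction_window(v_dd: float, v_thp: float, freq_hz: float) -> tuple[float, float]:
    """Return (t2, t3) in seconds, measured from the start of one power-clock period.

    Assumes V_pc(t) = V_dd / 2 * (1 - cos(2*pi*f*t)) and conduction while V_pc > V_thp.
    """
    if not 0.0 < v_thp < v_dd:
        raise ValueError("v_thp must lie strictly between 0 and v_dd")
    period = 1.0 / freq_hz
    # Solve V_pc(t2) = V_thp on the rising half of the cycle.
    phase = math.acos(1.0 - 2.0 * v_thp / v_dd)
    t2 = phase / (2.0 * math.pi) * period
    t3 = period - t2  # the falling crossing is symmetric about the peak
    return t2, t3


def conduction_fraction(v_dd: float, v_thp: float, freq_hz: float) -> float:
    """Fraction of the period during which the write driver can conduct."""
    t2, t3 = conduction_window(v_dd, v_thp, freq_hz)
    return (t3 - t2) * freq_hz


# Hypothetical supply and threshold voltages, at the lowest frequency studied.
V_DD, V_THP, FREQ = 1.0, 0.4, 250e3
t2, t3 = conduction_window(V_DD, V_THP, FREQ)
print(f"t2 = {t2 * 1e6:.3f} us, t3 = {t3 * 1e6:.3f} us, "
      f"fraction = {conduction_fraction(V_DD, V_THP, FREQ):.3f}")
```

A higher threshold narrows the window between $t_2$ and $t_3$, which is one way to see why the sinusoidal supply trades a longer writing time for lower energy.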

Fig. 12: The waveform of the proposed write circuit using the LUT-based control logic and a write driver interfaced with a sinusoidal power supply.

Fig. 12 depicts the waveform of the write circuit using the LUT-based circuit and the write driver with a sinusoidal power supply. The write circuit performs two writing operations, first writing a logic value of '1' and then writing a logic value of '0' to the MTJ. Our proposed write circuit has a lower transistor count than the prior work: the CMOS counterpart has 24 transistors, whereas our proposed write circuit with CMOS-based, 2MTJ-based, and 1MTJ-based LUT control logic has 19, 17, and 16 transistors, respectively.
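The transistor counts quoted above translate into relative reductions as follows. The counts come from the text; the helper name `reduction_percent` is hypothetical, and the labels reuse the write-circuit abbreviations from Table IV.

```python
def reduction_percent(baseline: int, proposed: int) -> float:
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


PRIOR_WORK = 24  # transistors in the CMOS counterpart
PROPOSED = {"CM-LUT": 19, "2M-LUT": 17, "1M-LUT": 16}

for name, count in PROPOSED.items():
    print(f"{name}: {count} transistors, "
          f"{reduction_percent(PRIOR_WORK, count):.1f}% fewer than the prior work")
```

This counts devices only; it says nothing about layout area, which depends on transistor sizing and routing and is reported separately in the area comparison.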

#### B. Proposed Write Circuit Using Novel 6T Write Driver Interfaced With A Sinusoidal Power Supply

To achieve further reductions in energy dissipation and area, we merged the control logic and write driver into one circuit. The proposed write circuit comprises inverters and a 6T write driver with a sinusoidal power supply, as shown in Fig. 14. We incorporated a header and a footer transistor into the conventional write driver, leading to the proposed 6T write driver illustrated in Fig. 15. The header and footer transistors connect the write driver to, or disconnect it from, the power supply and *GND*. Therefore, we can enable or disable the writing process by controlling the input signals *WE* and $\bar{WE}$. Also, we use *Data* and $\bar{Data}$ as the control signals for the write driver, removing the need for the control logic. Extra inverters are included to supply the complementary signals needed by the proposed 6T write driver.



Fig. 13: The operation of the 6T write driver interfaced with a sinusoidal power supply.

Fig. 14: The top-level view of the proposed write circuit with 6T write driver.

Fig. 15: The proposed 6T write driver interfaced with a sinusoidal power supply.

The 6T write driver design has a less complex writing operation, and Fig. 16 shows the writing operation of the proposed design. The detail is explained as follows:

- **Address:** At time $t = t_0$, $WE$ is set to a logic value of '0', so the footer and header transistors are turned OFF, disconnecting the write driver from the sinusoidal power supply and $GND$. Meanwhile, the positive $Data$ pulse is applied, turning the write driver's $N_2$ ON and $N_1$ OFF. Further, $P_1$ and $P_2$ are OFF because the $V_{sg}$ of each PMOS is below $V_{thp}$. Note that there is no conducting path, since $P_3$ and $N_3$ are OFF.
- **Pre-Writing:** At time $t = t_1$, a positive $WE$ pulse is applied to the circuit, activating the footer transistor $N_3$. However, the header transistor $P_3$ remains OFF because $V_{pc}$ is still near $GND$, so its $V_{sg}$ is less than $V_{thp}$. Therefore, the write driver does not write the bit cell during $t_1 < t < t_2$, as depicted in Fig. 13 (a).
- **Writing:** At time $t = t_2$, $V_{pc}$ reaches $V_{thp}$, activating $P_3$ and $P_1$. The write driver then drives current from transistors $P_3$ and $P_1$ through the bit cell and transistors $N_2$ and $N_3$, as shown in Fig. 13 (b). As in the LUT-based write circuit, the writing current gradually increases as $V_{pc}$ rises during $t_2 < t < t_3$, as shown in Fig. 16.
- **Completion:** At time $t = t_3$, $V_{pc}$ falls below $V_{thp}$, deactivating transistors $P_3$ and $P_1$ in the 6T write driver. Consequently, the write driver terminates the writing process.

Fig. 16: The waveform of the proposed write circuit using the proposed 6T write driver interfaced with a sinusoidal power supply.
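The four steps of the 6T write driver can be summarized as a simple behavioral model that reports which devices conduct for a given input combination. This is a logic-level sketch only: the function name `six_t_write_path` is hypothetical, the threshold comparison is idealized, and the path for $Data$ = '0' is assumed to be the mirror image of the described $Data$ = '1' path, which the text does not spell out.

```python
from typing import Optional


def six_t_write_path(we: int, data: int, v_pc: float, v_thp: float) -> Optional[tuple[str, ...]]:
    """Return the conducting path through the 6T write driver, or None if it is idle."""
    if we == 0:
        # Header P3 and footer N3 are OFF: the driver is disconnected.
        return None
    if v_pc <= v_thp:
        # Pre-writing: footer N3 is ON, but the PMOS devices cannot conduct yet.
        return None
    if data == 1:
        # Path described in the text for a positive Data pulse.
        return ("P3", "P1", "bit cell", "N2", "N3")
    # Assumed mirrored path for Data = 0 (not stated explicitly in the text).
    return ("P3", "P2", "bit cell", "N1", "N3")


# Hypothetical voltages, chosen only to exercise each branch.
print(six_t_write_path(0, 1, 0.8, 0.4))  # address step: idle
print(six_t_write_path(1, 1, 0.2, 0.4))  # pre-writing: idle
print(six_t_write_path(1, 1, 0.8, 0.4))  # writing
```

Because the write window is set by $V_{pc}$ crossing $V_{thp}$ rather than by separate control logic, the only inputs this model needs are $WE$, $Data$, and the instantaneous supply voltage.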

Once the writing operation is completed, $WE$ is set to a logic value of ‘0’. Hence, the footer and header transistors cut off the paths to $GND$ and $V_{pc}$. Compared to the prior work, the proposed 6T write driver design reduces the energy dissipation and area at the cost of a longer writing time.

## IV. SIMULATION AND RESULTS

This section compares the energy dissipation of the four proposed write circuits in the memory design against the prior work (listed in Table IV) using post-layout simulations. Larger memories can be implemented using smaller macros and the necessary controlling circuits. We constructed a $4 \times 4$ STT-MRAM macro using each of the proposed prototype write circuits described in the previous sections. The energy dissipation of the proposed designs is compared to the prior work [36]–[38] for the frequency range from 250 kHz to 6.25 MHz. Furthermore, we also present the effect of temperature and TMR variations on the energy dissipation for the case study implementations.

TABLE IV: Case study: write circuit configurations compared against the prior work.

| Write Circuit        | Control Logic                         | Write Driver                              |
|----------------------|---------------------------------------|-------------------------------------------|
| Prior Work [36]–[38] | Conventional Control Logic            | Conventional 4T Write Driver ( $V_{dd}$ ) |
| CM-LUT               | Proposed CMOS based LUT               | Conventional 4T Write Driver ( $V_{pc}$ ) |
| 2M-LUT               | Proposed 2MTJ based LUT               | Conventional 4T Write Driver ( $V_{pc}$ ) |
| 1M-LUT               | Proposed 1MTJ based LUT               | Conventional 4T Write Driver ( $V_{pc}$ ) |
| 6T-WD                | (merged into the write driver)        | Proposed 6T Write Driver ( $V_{pc}$ )     |

#### A. Case Study Implementation: $4 \times 4$ STT-MRAM Macro

The $4 \times 4$ STT-MRAM consists of the row and column decoders, the write circuit, and the bit cells. The decoders select a particular row of bit cells by translating the address into a row and a column. It is important to note that the column decoder usually decodes the higher-order address bits. Fig. 17 shows the architecture of the $4 \times 4$ STT-MRAM considered for the case study implementations in this article. Such macros can be used to construct larger memories. Given the small number of memory bit cells, the column decoder is unnecessary and is dropped from the design. Furthermore, we need a sense amplifier circuit for the reading operations, which can later be used to verify the correctness of the data in the memory. The writing process begins by selecting the row to be written. The row decoder translates the address into the corresponding logic assertion on the word line signals $WL$ and $\bar{WL}$ to select a particular row that consists of 4 distinct bit cells. The control signal generator provides Write Enable ($WE$) to the write circuit for the writing operations.

Fig. 17: Case study implementation of the $4 \times 4$ Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM).

We created two different $4 \times 4$ STT-MRAM macros using two different bit cells, (i) 1T-1MTJ and (ii) 4T-2MTJ, for the four proposed write circuits and their prior-work counterpart for the energy dissipation comparison. To capture more realistic results, we lay out the designs in Table IV and perform post-layout simulations. The layouts of the basic circuits in the $4 \times 4$ STT-MRAM are shown in Fig. 18. Fig. 18 (a) and (b) present the 4T-2MTJ and 1T-1MTJ bit cells, and Fig. 18 (c) presents the pre-charge sense amplifier. The write circuits of the proposed designs are presented in Fig. 18 (d)-(g), while the write circuit of the prior work is shown in Fig. 18 (h). Fig. 18 (i) shows the basic block of the $4 \times 4$ STT-MRAM. We used the TSMC 65 nm technology for the CMOS devices and the Verilog-A model for the MTJ [30] in SPICE simulation software. A load capacitance of 10 fF was assumed. We report the energy dissipation as energy per cycle, which is the average energy dissipated when writing all possible binary values to the selected row. Further, we enable the stochastic thermal noise fluctuations of the STT-MTJ, with a switching duration that follows a Gaussian distribution with a variation of 3%. The read-after-write process was repeated for all possible inputs. We also measured the effect of the temperature and TMR variations on the energy-saving performance.
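The averaging procedure and the stochastic switching duration can be sketched as follows. This is a hedged illustration rather than the simulation setup itself: `toy_energy_pj` is a made-up placeholder for a per-word energy obtained from simulation, the 10 ns mean duration is arbitrary, and the "variation of 3%" is assumed here to mean a standard deviation equal to 3% of the mean.

```python
import itertools
import random
import statistics


def sample_switching_durations(mean_ns, count, rel_sigma=0.03, seed=0):
    """Draw Gaussian switching durations (assumption: sigma = rel_sigma * mean)."""
    rng = random.Random(seed)
    return [rng.gauss(mean_ns, rel_sigma * mean_ns) for _ in range(count)]


def average_energy_per_cycle(energy_of_word, bits=4):
    """Average a per-word energy over every binary word of the given width."""
    words = itertools.product((0, 1), repeat=bits)
    return statistics.mean(energy_of_word(word) for word in words)


def toy_energy_pj(word):
    """Hypothetical energy model: fixed cost plus a cost per written 1."""
    return 10.0 + 2.0 * sum(word)


average_pj = average_energy_per_cycle(toy_energy_pj)
durations = sample_switching_durations(mean_ns=10.0, count=10_000)
print(f"average energy per cycle: {average_pj:.1f} pJ")
print(f"duration mean/std: {statistics.mean(durations):.3f} / "
      f"{statistics.stdev(durations):.3f} ns")
```

For a 4-bit row the average runs over all 16 words, so any data-dependent asymmetry in the write energy is weighted equally.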

Fig. 18: The layouts of the reading circuit, bit cells, and writing circuits, including the proposed designs and their counterpart: (a) 4-bit 4T-2MTJ bit cell, (b) 4-bit 1T-1MTJ bit cell, (c) pre-charge sense amplifier, (d) proposed CM-LUT, (e) proposed 1M-LUT, (f) proposed 2M-LUT, (g) proposed 6T-WD, (h) prior work, (i) one row ($1 \times 4$) of the $4 \times 4$ STT-MRAM.

Table V shows the area and area savings of the proposed write circuits compared to the prior work (shown in Table IV). The three proposed LUT-based write circuit layouts have the same height as the prior work but a smaller width, resulting in a smaller area. This results in area savings of 24.81%, 39.55%, and 39.55% for the CM-LUT, 2M-LUT, and 1M-LUT write circuits, respectively. Furthermore, the proposed 6T-WD is less than half as wide as the prior work and is also shorter in height. Therefore, it has the smallest area among all the designs, saving 63.12% of the area compared to the prior work.

TABLE V: Area (in $\mu m^2$) comparison for the proposed write circuits compared with prior work.

| Write Circuit        | Height ($\mu m$) | Width ($\mu m$) | Area in $\mu m^2$ (Saving in %) |
|----------------------|------------------|-----------------|---------------------------------|
| Prior Work [36]–[38] | 3.57             | 5.75            | 20.53                           |
| CM-LUT               | 3.57             | 5.05            | 18.03 (24.81%)                  |
| 2M-LUT               | 3.57             | 4.06            | 14.49 (39.55%)                  |
| 1M-LUT               | 3.57             | 4.06            | 14.49 (39.55%)                  |
| 6T-WD                | 3.275            | 2.70            | 8.840 (63.12%)                  |

### B. Energy Dissipation Comparison For $4 \times 4$ STT-MRAM Macro Using 1T-1MTJ Bit Cell

The 1T-1MTJ is the most widely used bit cell in STT-MRAM due to its simple design, higher density, and less complex operation. Table VI lists the post-layout simulation results for the energy dissipation comparison of the write circuits in the $4 \times 4$ STT-MRAM for the frequency range of 250 kHz to 6.25 MHz. The temperature and the TMR value are set to 25 °C and 150%, respectively. The proposed write circuits significantly reduce the energy dissipation. For example, at 250 kHz, the prior work consumes 747.8 pJ, whereas the proposed CM-LUT, 2M-LUT, and 1M-LUT dissipate 260.5 pJ, 260.2 pJ, and 260.2 pJ, respectively. The proposed 6T-WD shows the lowest energy consumption among all the designs, which is 209.7 pJ at 250 kHz. Over the frequency range of interest, from 250 kHz to 6.25 MHz, the average energy dissipation of the prior work is 250.1 pJ. Similarly, the average energy dissipations for the proposed CM-LUT, 2M-LUT, 1M-LUT, and 6T-WD are 87.1 pJ, 87.0 pJ, 87.0 pJ, and 70.1 pJ, respectively.

TABLE VI: Energy per cycle (in pJ) performance comparison for proposed sinusoidal clocking writing circuitry for $4 \times 4$ STT-MRAM macro using 1T-1MTJ bit cell.

| Frequency, in Hz         | 250k  | 500k  | 1M    | 1.5M  | 5M    | 6.25M |
|--------------------------|-------|-------|-------|-------|-------|-------|
| **Prior Work [36]–[38]** | 747.8 | 373.9 | 187.0 | 124.7 | 37.44 | 29.96 |
| **Proposed CM-LUT**      | 260.5 | 130.3 | 65.14 | 43.43 | 13.05 | 10.44 |
| **Proposed 2M-LUT**      | 260.2 | 130.1 | 65.08 | 43.39 | 13.04 | 10.44 |
| **Proposed 1M-LUT**      | 260.2 | 130.1 | 65.06 | 43.38 | 13.03 | 10.43 |
| **Proposed 6T-WD**       | 209.7 | 104.9 | 52.44 | 34.96 | 10.49 | 8.397 |

Fig. 19: Energy per cycle saving (%) comparison at different frequencies for the macro using 1T-1MTJ bit cell.

It is essential to verify that the energy-saving characteristic remains the same over the operating frequency range for the proposed circuits with a sinusoidal power supply compared to the prior work with a DC power supply. Fig. 19 shows the energy savings for the proposed designs compared to the prior work. The average energy savings in the proposed CM-LUT, 2M-LUT, 1M-LUT, and 6T-WD are 65.16%, 65.19%, 65.20%, and 71.96%, respectively. It is important to note that the energy saving of the proposed circuits remains relatively constant over the frequency range of interest.
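The average savings quoted above can be reproduced from the Table VI entries. The sketch below assumes the saving at each frequency is defined as $1 - E_{proposed}/E_{prior}$ and that the reported average is the arithmetic mean over the six frequencies; the helper names are illustrative only.

```python
# Energy per cycle in pJ, copied from Table VI (1T-1MTJ macro),
# ordered by frequency: 250k, 500k, 1M, 1.5M, 5M, 6.25M Hz.
PRIOR_WORK = [747.8, 373.9, 187.0, 124.7, 37.44, 29.96]
PROPOSED = {
    "CM-LUT": [260.5, 130.3, 65.14, 43.43, 13.05, 10.44],
    "2M-LUT": [260.2, 130.1, 65.08, 43.39, 13.04, 10.44],
    "1M-LUT": [260.2, 130.1, 65.06, 43.38, 13.03, 10.43],
    "6T-WD": [209.7, 104.9, 52.44, 34.96, 10.49, 8.397],
}


def saving_percent(prior_pj, proposed_pj):
    """Energy saving of a proposed design relative to the prior work, in %."""
    return 100.0 * (1.0 - proposed_pj / prior_pj)


def average_saving(prior_row, proposed_row):
    """Arithmetic mean of the per-frequency savings (assumed averaging rule)."""
    savings = [saving_percent(p, q) for p, q in zip(prior_row, proposed_row)]
    return sum(savings) / len(savings)


average_savings = {
    name: average_saving(PRIOR_WORK, row) for name, row in PROPOSED.items()
}
for name, value in average_savings.items():
    print(f"{name}: {value:.2f}%")
```

Under these assumptions the computed means agree with the quoted averages to within rounding of the tabulated values.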

Temperature variations can have a significant effect on memory performance. An increase in temperature reduces the bandgap of a semiconductor material, which results in a variation in energy dissipation. In this experiment, we varied the temperature from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$ at a frequency of 1.5 MHz and a TMR value of 150%. Table VII shows the post-layout energy dissipation of the write circuits in the $4 \times 4$ STT-MRAM as the temperature varies. All of the proposed designs dissipate more energy at $125^{\circ}\text{C}$ than at $-50^{\circ}\text{C}$. For example, from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$, the proposed CM-LUT experiences a 1.85 pJ increase. Similarly, the proposed 2M-LUT and 1M-LUT experience increases of 1.67 pJ and 1.58 pJ, respectively. The proposed 6T-WD experiences a smaller increase of 0.19 pJ. Overall, the results indicate that higher temperatures lead to higher energy consumption in the proposed write circuits of the $4 \times 4$ STT-MRAM.

TABLE VII: Effect of temperature variations on energy per cycle (in pJ) performance for  $4 \times 4$  STT-MRAM macro using 1T-1MTJ bit cell.

| Temperature, in °C       | -50   | 0     | 25    | 75    | 125   |
|--------------------------|-------|-------|-------|-------|-------|
| **Prior Work [36]–[38]** | 125.3 | 124.9 | 124.7 | 124.4 | 124.9 |
| **Proposed CM-LUT**      | 42.9  | 43.3  | 43.4  | 43.8  | 44.7  |
| **Proposed 2M-LUT**      | 42.8  | 43.2  | 43.4  | 43.8  | 44.5  |
| **Proposed 1M-LUT**      | 42.8  | 43.2  | 43.4  | 43.7  | 44.4  |
| **Proposed 6T-WD**       | 34.8  | 34.9  | 35.0  | 34.9  | 35.0  |

The energy savings of the proposed designs relative to the prior work also vary with temperature. Fig. 20 shows the energy savings of the proposed designs over the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$. The energy savings of the proposed designs decrease as the temperature increases. Across the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$, the proposed CM-LUT, 2M-LUT, 1M-LUT, and 6T-WD show an average of 65.05%, 65.12%, 65.14%, and 72.03% energy saving, respectively. Across the various temperatures, the maximum difference in energy savings for the proposed designs is 1.61%. Therefore, the proposed designs have relatively constant energy savings at different temperatures, which supports consistent memory performance.
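The temperature sensitivity of the savings can be checked against the rounded entries of Table VII. This is a rough sketch: the spread is taken as the difference between the largest and smallest saving of a design over the five temperatures (an assumed definition), and because the table values are rounded to one decimal, the results differ slightly from the figures quoted in the text.

```python
# Energy per cycle in pJ, copied from Table VII (1T-1MTJ macro, 1.5 MHz),
# ordered by temperature: -50, 0, 25, 75, 125 degrees Celsius.
PRIOR_WORK = [125.3, 124.9, 124.7, 124.4, 124.9]
PROPOSED = {
    "CM-LUT": [42.9, 43.3, 43.4, 43.8, 44.7],
    "2M-LUT": [42.8, 43.2, 43.4, 43.8, 44.5],
    "1M-LUT": [42.8, 43.2, 43.4, 43.7, 44.4],
    "6T-WD": [34.8, 34.9, 35.0, 34.9, 35.0],
}


def savings_percent(prior_row, proposed_row):
    """Per-temperature energy saving relative to the prior work, in %."""
    return [100.0 * (1.0 - q / p) for p, q in zip(prior_row, proposed_row)]


summary = {}
for name, row in PROPOSED.items():
    savings = savings_percent(PRIOR_WORK, row)
    summary[name] = {
        "mean": sum(savings) / len(savings),
        "spread": max(savings) - min(savings),  # assumed definition of "difference"
    }

max_spread = max(entry["spread"] for entry in summary.values())
for name, entry in summary.items():
    print(f"{name}: mean {entry['mean']:.2f}%, spread {entry['spread']:.2f}")
print(f"largest spread: {max_spread:.2f} percentage points")
```

The largest spread belongs to the CM-LUT, while the 6T-WD saving stays within a fraction of a percentage point over the whole range.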



Fig. 20: Effect of temperature variations on energy per cycle saving (%) for the macro using 1T-1MTJ bit cell.

The Tunneling Magnetoresistance (TMR) ratio is a key figure of merit for the STT-MRAM. A higher TMR ratio is desirable in Gbit-scale STT-MRAM applications because it improves the read margin: a higher TMR ratio corresponds to a larger resistance difference between the AP-state and the P-state of the MTJ. Therefore, it is important to check the performance of the proposed write circuits for different TMR values. In this experiment, we vary the TMR ratio of the STT-PMA-MTJ from 100% to 200%.
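For reference, the TMR ratio is conventionally defined from the MTJ resistance in the anti-parallel and parallel states. The symbols $R_{AP}$ and $R_{P}$ are notation assumed here for illustration:

$$
\mathrm{TMR} = \frac{R_{AP} - R_{P}}{R_{P}} \times 100\%
\quad\Longrightarrow\quad
R_{AP} = R_{P}\left(1 + \frac{\mathrm{TMR}}{100\%}\right)
$$

Under this definition, and if $R_{P}$ is held fixed (an assumption about the sweep, not a statement about the MTJ model used here), varying the TMR ratio from 100% to 200% corresponds to raising $R_{AP}$ from $2R_{P}$ to $3R_{P}$.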

Table VIII lists the post-layout simulation results, illustrating the impact of TMR variations on the energy dissipation of the $4 \times 4$ STT-MRAM at 1.5 MHz and $25^\circ\text{C}$. The prior work has an energy dissipation of 128.7 pJ at a TMR value of 100%, 124.7 pJ at 150%, and 121.5 pJ at 200%, a change of 5.60% across the TMR variation. Similarly, the proposed LUT-based designs show a change of 6.08% for TMR values ranging from 100% to 200%, and the 6T-WD shows a change of 6.14% in energy dissipation. Therefore, the variation of the TMR value has a higher influence on the energy dissipation than the temperature variation in both the prior work and the proposed designs. However, in terms of energy savings, all the proposed designs remain relatively constant as the TMR ratio changes, as shown in Fig. 21.
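The relative change across the TMR sweep can be estimated from the endpoints of each row of Table VIII. In this sketch the change is assumed to be $(E_{100\%} - E_{200\%})/E_{100\%}$; since the table is rounded to one decimal, the results only approximate the percentages quoted above.

```python
# Energy per cycle in pJ, copied from Table VIII (1T-1MTJ macro, 1.5 MHz, 25 C),
# ordered by TMR ratio: 100, 125, 150, 175, 200 percent.
ENERGY_PJ = {
    "Prior Work": [128.7, 126.5, 124.7, 123.0, 121.5],
    "CM-LUT": [45.0, 44.1, 43.4, 42.8, 42.2],
    "2M-LUT": [44.9, 44.1, 43.4, 42.8, 42.2],
    "1M-LUT": [44.9, 44.1, 43.4, 42.7, 42.2],
    "6T-WD": [36.2, 35.5, 35.0, 34.4, 34.0],
}


def relative_change_percent(row):
    """Drop in energy from the first to the last sweep point, in % of the first."""
    return 100.0 * (row[0] - row[-1]) / row[0]


changes = {name: relative_change_percent(row) for name, row in ENERGY_PJ.items()}
for name, value in changes.items():
    print(f"{name}: {value:.2f}% lower at 200% TMR than at 100% TMR")
```

All five rows decrease monotonically with the TMR ratio, so the endpoint difference captures the full range of each row.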

TABLE VIII: Effect of TMR variations on energy per cycle (in pJ) performance for $4 \times 4$ STT-MRAM macro using 1T-1MTJ bit cell.

| TMR, in %            | 100   | 125   | 150   | 175   | 200   |
|----------------------|-------|-------|-------|-------|-------|
| Prior Work [36]–[38] | 128.7 | 126.5 | 124.7 | 123.0 | 121.5 |
| Proposed CM-LUT      | 45.0  | 44.1  | 43.4  | 42.8  | 42.2  |
| Proposed 2M-LUT      | 44.9  | 44.1  | 43.4  | 42.8  | 42.2  |
| Proposed 1M-LUT      | 44.9  | 44.1  | 43.4  | 42.7  | 42.2  |
| Proposed 6T-WD       | 36.2  | 35.5  | 35.0  | 34.4  | 34.0  |

Fig. 21: Effect of TMR variations on energy per cycle saving (%) for the macro using 1T-1MTJ bit cell.

### C. Energy Dissipation Comparison For $4 \times 4$ STT-MRAM Macro Using 4T-2MTJ Bit Cell

The STT-MRAM has been a promising candidate for designing low-power, higher-density memory. However, STT-MRAM has been shown to be vulnerable to power analysis attacks. The asymmetric resistance values representing logic values can lead to different read or write currents in 1T-1MTJ cells. This difference in resistance characteristics could potentially leak the information stored in the STT-MRAM. The researchers in [42] show that the 2T-2MTJ design is resilient against power analysis attacks. The 4T-2MTJ bit cell is similar to the design proposed in [42] with a slight modification: we added an extra PMOS transistor between each MTJ and its corresponding BL and SL, providing an extra current margin. Therefore, we create a $4 \times 4$ STT-MRAM using the 4T-2MTJ bit cell to evaluate the energy dissipation and savings of the proposed designs compared to the prior work. Table IX lists the post-layout simulation results for the energy dissipation comparison of the write circuits in the $4 \times 4$ STT-MRAM from 250 kHz to 6.25 MHz. As in the previous 1T-1MTJ experiment, the temperature is $25^\circ\text{C}$ and the TMR is 150%. All the proposed designs consume less energy in this experiment than the prior work. For example, at 250 kHz, the energy dissipation is 459.4 pJ for the prior work and 154.3 pJ for the proposed CM-LUT. Similarly, the 2M-LUT and 1M-LUT have an energy dissipation of 154.2 pJ and 154.1 pJ, respectively. Also, the 6T-WD consumes 135.5 pJ at 250 kHz. Over the frequency range of interest, from 250 kHz to 6.25 MHz, the average energy dissipation is 153.7 pJ for the prior work, 51.6 pJ for the CM-LUT, 2M-LUT, and 1M-LUT, and 45.3 pJ for the 6T-WD.

TABLE IX: Energy per cycle (in pJ) performance comparison for proposed sinusoidal clocking writing circuitry for $4 \times 4$ STT-MRAM macro using 4T-2MTJ bit cell.

| Frequency, in Hz         | 250k  | 500k  | 1M    | 1.5M  | 5M    | 6.25M |
|--------------------------|-------|-------|-------|-------|-------|-------|
| **Prior Work [36]–[38]** | 459.4 | 229.7 | 114.9 | 76.60 | 23.02 | 18.43 |
| **Proposed CM-LUT**      | 154.3 | 77.14 | 38.58 | 25.73 | 7.735 | 6.192 |
| **Proposed 2M-LUT**      | 154.2 | 77.09 | 38.56 | 25.71 | 7.733 | 6.191 |
| **Proposed 1M-LUT**      | 154.1 | 77.08 | 38.55 | 25.71 | 7.730 | 6.188 |
| **Proposed 6T-WD**       | 135.5 | 67.74 | 33.87 | 22.58 | 6.780 | 5.425 |

Fig. 22: Energy per cycle saving (%) comparison at different frequencies for the macro using 4T-2MTJ bit cell.

Fig. 22 shows the energy savings of the write circuits in the $4 \times 4$ STT-MRAM when varying the frequency. As with the 1T-1MTJ cell, the energy savings are almost uniform in the range of 250 kHz to 6.25 MHz. The proposed CM-LUT, 2M-LUT, and 1M-LUT show an average of 66.41%, 66.43%, and 66.44% energy saving, respectively. As in the 1T-1MTJ experiment, the proposed 6T-WD has the highest energy saving, 70.53%. Furthermore, the proposed designs with the 4T-2MTJ exhibit a maximum difference of 0.05% in energy saving from 250 kHz to 6.25 MHz, indicating relatively consistent energy saving. Compared with the 1T-1MTJ macro over the same frequency range, however, the 6T-WD saves less energy (70.53% versus 71.96%), while the LUT-based designs save slightly more (about 66.4% versus about 65.2%).
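How uniform the savings are over frequency can be gauged directly from Table IX. The following sketch assumes the saving is $1 - E_{proposed}/E_{prior}$ at each frequency and measures uniformity as the gap between the best and worst saving of a design; the variable names are illustrative.

```python
import statistics

# Energy per cycle in pJ, copied from Table IX (4T-2MTJ macro),
# ordered by frequency: 250k, 500k, 1M, 1.5M, 5M, 6.25M Hz.
PRIOR_4T2MTJ = [459.4, 229.7, 114.9, 76.60, 23.02, 18.43]
PROPOSED_4T2MTJ = {
    "CM-LUT": [154.3, 77.14, 38.58, 25.73, 7.735, 6.192],
    "2M-LUT": [154.2, 77.09, 38.56, 25.71, 7.733, 6.191],
    "1M-LUT": [154.1, 77.08, 38.55, 25.71, 7.730, 6.188],
    "6T-WD": [135.5, 67.74, 33.87, 22.58, 6.780, 5.425],
}

uniformity = {}
for design, row in PROPOSED_4T2MTJ.items():
    per_freq = [100.0 * (1.0 - q / p) for p, q in zip(PRIOR_4T2MTJ, row)]
    uniformity[design] = (
        statistics.mean(per_freq),      # average saving in %
        max(per_freq) - min(per_freq),  # best-to-worst gap in percentage points
    )

for design, (mean_saving, gap) in uniformity.items():
    print(f"{design}: mean saving {mean_saving:.2f}%, gap {gap:.3f}")
```

With the tabulated values, every design keeps its saving within a small fraction of a percentage point across the sweep, in line with the near-uniform behavior described above.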

In this experiment, we evaluate the proposed designs for the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$ while the frequency is set to 1.5 MHz and the TMR ratio is set to 150%. Table X shows the post-layout energy dissipation of the write circuits in the $4 \times 4$ STT-MRAM at various temperatures. The prior work has an average energy dissipation of 76.9 pJ over the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$. Our proposed designs dissipate significantly less energy than the prior work. For example, the proposed CM-LUT, 2M-LUT, 1M-LUT, and 6T-WD have average values of 26.0 pJ, 25.9 pJ, 25.9 pJ, and 22.6 pJ over the same temperature range, respectively. Also, temperature variation has a smaller impact on the 4T-2MTJ than on the 1T-1MTJ. According to Table X, the largest increase among the proposed designs over the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$ is about 0.9 pJ (CM-LUT), compared with 1.85 pJ for the 1T-1MTJ macro.

TABLE X: Effect of temperature variations on energy per cycle (in pJ) performance for $4 \times 4$ STT-MRAM macro using 4T-2MTJ bit cell.

| Temperature, in $^{\circ}\text{C}$ | -50  | 0    | 25   | 75   | 125  |
|------------------------------------|------|------|------|------|------|
| **Prior Work [36]–[38]**           | 78.1 | 77.0 | 76.6 | 76.1 | 76.6 |
| **Proposed CM-LUT**                | 25.8 | 25.7 | 25.7 | 25.9 | 26.7 |
| **Proposed 2M-LUT**                | 25.8 | 25.7 | 25.7 | 25.9 | 26.5 |
| **Proposed 1M-LUT**                | 25.8 | 25.7 | 25.7 | 25.8 | 26.5 |
| **Proposed 6T-WD**                 | 22.8 | 22.7 | 22.6 | 22.5 | 22.5 |

Fig. 23 shows the energy savings of the write circuits in the $4 \times 4$ STT-MRAM for the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$. As with the frequency variation, the temperature variation has no significant impact on the energy savings of the proposed designs. The average energy savings of the CM-LUT, 2M-LUT, and 1M-LUT are 66.23%, 66.30%, and 66.32%, respectively. Over the temperature range from $-50^{\circ}\text{C}$ to $125^{\circ}\text{C}$, the proposed CM-LUT, 2M-LUT, and 1M-LUT show a 1.78%, 1.58%, and 1.52% change in energy saving, respectively. The proposed 6T-WD has an average energy saving of 70.60% and a change of 0.33%. Therefore, our proposed designs with the 4T-2MTJ also have relatively consistent energy savings when varying the temperature.

Fig. 23: Effect of temperature variations on energy per cycle saving (%) for the macro using 4T-2MTJ bit cell.

As noted previously, the TMR ratio is a crucial metric for the design of STT-MRAM. Therefore, we examine the effect of TMR variations on the energy dissipation of the proposed designs. The post-layout simulation results are listed in Table XI at 1.5 MHz and $25^{\circ}\text{C}$, with the TMR ratio varied from 100% to 200%. The energy dissipation of the prior work is 80.9 pJ at 100% TMR and 73.1 pJ at 200% TMR, a change of 9.69%. Similarly, the proposed designs have higher energy dissipation at 100% TMR and lower energy dissipation at 200% TMR, with a maximum change of 10.49% and a minimum change of 9.70%. The proposed designs therefore show roughly the same relative change as the prior work. It is important to note that the prior work consumes around three times as much energy as the proposed designs in this experimental setup.

TABLE XI: Effect of TMR variations on energy per cycle (in pJ) performance for $4 \times 4$ STT-MRAM macro using 4T-2MTJ bit cell.

| TMR, in %            | 100  | 125  | 150  | 175  | 200  |
|----------------------|------|------|------|------|------|
| Prior Work [36]–[38] | 80.9 | 78.6 | 76.6 | 74.8 | 73.1 |
| Proposed CM-LUT      | 27.3 | 26.5 | 25.7 | 25.1 | 24.4 |
| Proposed 2M-LUT      | 27.3 | 26.5 | 25.7 | 25.0 | 24.4 |
| Proposed 1M-LUT      | 27.3 | 26.5 | 25.7 | 25.0 | 24.4 |
| Proposed 6T-WD       | 23.9 | 23.2 | 22.6 | 22.0 | 21.5 |
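The "around three times" observation can be made concrete by dividing the prior-work energy by each proposed design's energy at every TMR point of Table XI. This is a simple arithmetic sketch on the tabulated values; the dictionary and function names are illustrative.

```python
# Energy per cycle in pJ, copied from Table XI (4T-2MTJ macro, 1.5 MHz, 25 C),
# ordered by TMR ratio: 100, 125, 150, 175, 200 percent.
PRIOR_TMR = [80.9, 78.6, 76.6, 74.8, 73.1]
PROPOSED_TMR = {
    "CM-LUT": [27.3, 26.5, 25.7, 25.1, 24.4],
    "2M-LUT": [27.3, 26.5, 25.7, 25.0, 24.4],
    "1M-LUT": [27.3, 26.5, 25.7, 25.0, 24.4],
    "6T-WD": [23.9, 23.2, 22.6, 22.0, 21.5],
}


def energy_ratios(prior_row, proposed_row):
    """Prior-work energy divided by proposed energy at each sweep point."""
    return [p / q for p, q in zip(prior_row, proposed_row)]


ratios = {name: energy_ratios(PRIOR_TMR, row) for name, row in PROPOSED_TMR.items()}
for name, row in ratios.items():
    print(f"{name}: {min(row):.2f}x to {max(row):.2f}x")
```

The ratio barely moves across the TMR sweep for any design, which is another way of seeing that the percentage saving is insensitive to the TMR ratio.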

Fig. 24 shows the energy savings of the write circuits in the $4 \times 4$ STT-MRAM when varying the TMR value. The average energy savings of the proposed CM-LUT, 2M-LUT, 1M-LUT, and 6T-WD are 66.41%, 66.43%, 66.43%, and 70.52%, respectively. For all the proposed designs, the maximum change in the energy saving is 0.30% as the TMR ratio ranges from 100% to 200%. Therefore, the energy saving of the proposed designs is relatively consistent and largely unaffected by TMR variation. This property can enable the memory designer to design STT-MRAMs of various sizes with almost identical energy-saving performance.

Fig. 24: Effect of TMR variations on energy per cycle saving (%) for case study implementation using 4T-2MTJ bit cell.

## V. CONCLUSION

In this article, we introduce a novel approach to improve the energy efficiency of configuring the MTJ in the STT-MRAM for resource- and area-constrained IoT applications. We utilize the sinusoidal power-clocking mechanism and develop a write circuit with the proposed control logic and a write driver with the sinusoidal power supply. We propose LUT-based control logic that helps reduce the write circuit's energy dissipation and area. To achieve further area reduction, we introduce the 6T write driver with the sinusoidal power supply, which eliminates the need for the control logic. The energy dissipation and savings are verified by implementing a $4 \times 4$ STT-MRAM macro with STT-MTJ bit cells (1T-1MTJ or 4T-2MTJ). All the proposed write circuits with LUT-based designs show significant energy savings compared to the prior work when varying the frequency, temperature, and TMR ratio. Further, the proposed write circuit with the 6T write driver shows better energy saving than the LUT-based designs. Based on post-layout simulations, our study concludes that the proposed write circuit using the novel control logic and write driver with a sinusoidal power supply is feasible and energy-efficient for STT-MRAM in meeting the requirements of IoT devices.

## ACKNOWLEDGMENT

This work is partially supported by National Science Foundation CAREER Award No. 2232235.

The authors would like to thank Milad Tanavardi Nasab for the helpful discussion.

## REFERENCES

[1] C. Terrell and H. Thapliyal, "Approximate adder circuits using clocked cmos adiabatic logic (ccal) for iot applications," in *2020 IEEE International Conference on Consumer Electronics (ICCE)*. IEEE, 2020, pp. 1–4.

[2] L. Thomas *et al.*, "Basic principles, challenges and opportunities of stt-mram for embedded memory applications," *MSST 2017*, 2017.

[3] S. Yuasa, A. Fukushima, K. Yakushiji, T. Nozaki, M. Konoto, H. Maehara, H. Kubota, T. Taniguchi, H. Arai, H. Imamura *et al.*, "Future prospects of mram technologies," in *2013 IEEE International Electron Devices Meeting*. IEEE, 2013, pp. 3–1.

[4] S. D. Kumar and H. Thapliyal, "Exploration of non-volatile mtj/cmos circuits for dpa-resistant embedded hardware," *IEEE Transactions on Magnetics*, vol. 55, no. 12, pp. 1–8, 2019.

[5] Z. Kahleifeh and H. Thapliyal, "Ee-acml: Energy-efficient adiabatic cmos/mtj logic for cpa-resistant iot devices," *Sensors*, vol. 21, no. 22, p. 7651, 2021.

[6] Z. He, S. Angizi, and D. Fan, "Exploring stt-mram based in-memory computing paradigm with application of image edge extraction," in *2017 IEEE International Conference on Computer Design (ICCD)*. IEEE, 2017, pp. 439–446.

[7] N. Sayed, F. Oboril, A. Shirvanian, R. Bishnoi, and M. B. Tahoori, "Exploiting stt-mram for approximate computing," in *2017 22nd IEEE European Test Symposium (ETS)*. IEEE, 2017, pp. 1–6.

[8] N. Maciel, E. Marques, L. Naviner, Y. Zhou, and H. Cai, "Magnetic tunnel junction applications," *Sensors*, vol. 20, no. 1, p. 121, 2019.

[9] Y. Zhang, W. Zhao, G. Prenat, T. Devolder, J.-O. Klein, C. Chappert, B. Dieny, and D. Ravelosona, "Electrical modeling of stochastic spin transfer torque writing in magnetic tunnel junctions for memory and logic applications," *IEEE Transactions on Magnetics*, vol. 49, no. 7, pp. 4375–4378, 2013.

[10] Y.-J. Song, J. H. Lee, H. Shin, K. Lee, K. Suh, J. Kang, S. Pyo, H. Jung, S. Hwang, G. Koh *et al.*, "Highly functional and reliable 8mb stt-mram embedded in 28nm logic," in *2016 IEEE International Electron Devices Meeting (IEDM)*. IEEE, 2016, pp. 27–2.

[11] S.-W. Chung, T. Kishi, J. W. Park, M. Yoshikawa, K.-S. Park, T. Nagase, K. Sunouchi, H. Kanaya, G. Kim, K. Noma *et al.*, "4gbit density stt-mram using perpendicular mtj realized with compact cell structure," in *2016 IEEE International Electron Devices Meeting (IEDM)*. IEEE, 2016, pp. 27–1.

[12] H. Yoda, S. Fujita, N. Shimomura, E. Kitagawa, K. Abe, K. Nomura, H. Noguchi, and J. Ito, "Progress of stt-mram technology and the effect on normally-off computing systems," in *2012 International Electron Devices Meeting*. IEEE, 2012, pp. 11–3.

[13] "Everspin Technology, Spin-transfer Torque DDR Products," Accessed Mar. 18, 2023. [Online]. Available: <https://www.everspin.com/spin-transfer-torque-ddr-products>

[14] M. N. I. Khan, S. Bhasin, B. Liu, A. Yuan, A. Chattopadhyay, and S. Ghosh, "Comprehensive study of side-channel attack on emerging non-volatile memories," *Journal of Low Power Electronics and Applications*, vol. 11, no. 4, p. 38, 2021.

[15] A. Iyengar, S. Ghosh, N. Rathi, and H. Naeimi, "Side channel attacks on stt-mram and low-overhead countermeasures," in *2016 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFT)*. IEEE, 2016, pp. 141–146.

[16] A. Degada and H. Thapliyal, "Single-rail adiabatic logic for energy-efficient and cpa-resistant cryptographic circuit in low-frequency medical devices," *IEEE Open Journal of Nanotechnology*, vol. 3, pp. 1–14, 2022.

[17] T. G. Mahn, "Wireless medical technologies: Navigating government regulation in the new medical age," *Fishes Regulatory & Government Affairs Group*, 2013.

[18] "Short Range Devices (SRD); Ultra Low Power Active Medical Implants (ULP-AMI) and accessories (ULP-AMI-P) operating in the frequency range 9 kHz to 315 kHz Harmonised Standard covering the essential requirements of article 3.2 of the Directive 2014/53/EU," ETSI (European Telecommunications Standards Institute), Sophia-Antipolis, France, Standard, Jun. 2016.

[19] S. Hanna, "Regulations and standards for wireless medical applications," in *Proceedings of the 3rd international symposium on medical information and communication technology*. Citeseer, 2009, pp. 23–26.

[20] E. Völgyi, F. A. Tylavsky, A. Lyytikäinen, H. Suominen, M. Alén, and S. Cheng, "Assessing body composition with dxa and bioimpedance: effects of obesity, physical activity, and age," *Obesity*, vol. 16, no. 3, pp. 700–705, 2008.

[21] S. B. Rutkove, K. S. Lee, C. A. Shiffman, and R. Aaron, "Test-retest reproducibility of 50 khz linear-electrical impedance myography," *Clinical Neurophysiology*, vol. 117, no. 6, pp. 1244–1248, 2006.

[22] A. Hedayatipour, S. Aslanzadeh, S. H. Hesari, M. A. Haque, and N. McFarlane, "A wearable cmos impedance to frequency sensing system for non-invasive impedance measurements," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 14, no. 5, pp. 1108–1121, 2020.

[23] L. Gerlach, G. Payá-Vayá, and H. Blume, "A survey on application specific processor architectures for digital hearing aids," *Journal of Signal Processing Systems*, pp. 1–16, 2021.

[24] M. Zhang, A. Raghunathan, and N. K. Jha, "Trustworthiness of medical devices and body area networks," *Proceedings of the IEEE*, vol. 102, no. 8, pp. 1174–1188, 2014.

[25] M. Zhang, A. Raghunathan, and N. K. Jha, "Towards trustworthy medical devices and body area networks," in *Proceedings of the 50th Annual Design Automation Conference*, 2013, pp. 1–6.

[26] "Medical applications user guide." NXP Semiconductors, <https://www.nxp.com/docs/en/user-guide/MDAPPUSGDRM118.pdf>, Last Accessed: 10-1-2021.

[27] M. K. Gupta and M. Hasan, "Self-terminated write-assist technique for stt-ram," *IEEE Transactions on Magnetics*, vol. 52, no. 8, pp. 1–6, 2016.

[28] P. Barla, V. K. Joshi, and S. Bhat, "A novel self write-terminated driver for hybrid stt-mtj/cmos lim structure," *Ain Shams Engineering Journal*, vol. 12, no. 2, pp. 1839–1847, 2021.

[29] W. Yang, A. Degada, and H. Thapliyal, "Adiabatic logic-based stt-mram design for iot," in *2022 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)*. IEEE, 2022, pp. 235–240.

[30] Z. Wang, W. Zhao, E. Deng, J.-O. Klein, and C. Chappert, "Perpendicular-anisotropy magnetic tunnel junction switched by spin-hall-assisted spin-transfer torque," *Journal of Physics D: Applied Physics*, vol. 48, no. 6, p. 065001, 2015.

[31] B. Behin-Aein, J.-P. Wang, and R. Wiesendanger, "Computing with spins and magnets," *MRS Bulletin*, vol. 39, no. 8, pp. 696–702, 2014.

[32] R. Zand, A. Roohi, S. Salehi, and R. F. DeMara, "Scalable adaptive spintronic reconfigurable logic using area-matched mtj design," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 63, no. 7, pp. 678–682, 2016.

[33] S. Sharmin, A. Jaiswal, and K. Roy, "Modeling and design space exploration for bit-cells based on voltage-assisted switching of magnetic tunnel junctions," *IEEE Transactions on Electron Devices*, vol. 63, no. 9, pp. 3493–3500, 2016.

[34] J. Rajpoot and S. Verma, "Area-efficient auto-write-terminate circuit for nv latch and logic-in-memory applications," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 70, no. 7, pp. 2630–2634, 2023.

[35] S. Verma, R. Paul, and M. Shukla, "Non-volatile latch compatible with static and dynamic cmos for logic in memory applications," *IEEE Transactions on Magnetics*, vol. 58, no. 4, pp. 1–8, 2022.

[36] H. Cai, Y. Wang, L. A. de Barros Naviner, J. Yang, and W. Zhao, "Exploring hybrid stt-mtj/cmos energy solution in near-/sub-threshold regime for iot applications," *IEEE Transactions on magnetics*, vol. 54, no. 2, pp. 1–9, 2017.

[37] E. Garzón, B. Zambrano, T. Moposita, R. Taco, L.-M. Prócel, and L. Trojman, "Reconfigurable cmos/stt-mtj non-volatile circuit for logic-in-memory applications," in *2020 IEEE 11th Latin American Symposium on Circuits & Systems (LASCAS)*. IEEE, 2020, pp. 1–4.

[38] S. M. Nair, R. Bishnoi, M. B. Tahoori, G. Tshagharyan, H. Grigoryan, G. Harutyunyan, and Y. Zorian, "Defect injection, fault modeling and test algorithm generation methodology for stt-mram," in *2018 IEEE International Test Conference (ITC)*. IEEE, 2018, pp. 1–10.

[39] V. K. Joshi, P. Barla, S. Bhat, and B. K. Kaushik, "From mtj device to hybrid cmos/mtj circuits: A review," *IEEE Access*, vol. 8, pp. 194105–194146, 2020.

[40] Y. Wang, H. Cai, L. A. de Barros Naviner, Y. Zhang, X. Zhao, E. Deng, J.-O. Klein, and W. Zhao, "Compact model of dielectric breakdown in spin-transfer torque magnetic tunnel junction," *IEEE Transactions on Electron Devices*, vol. 63, no. 4, pp. 1762–1767, 2016.

[41] H. Cai, Y. Wang, L. A. De Barros Naviner, and W. Zhao, "Robust ultra-low power non-volatile logic-in-memory circuits in fd-soi technology," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 64, no. 4, pp. 847–857, 2017.

[42] S. Ben Dodo, R. Bishnoi, and M. B. Tahoori, "Secure stt-mram bit-cell design resilient to differential power analysis attacks," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 28, no. 1, pp. 263–272, 2020.



**Wu Yang** is currently a Ph.D. student in the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA. He has completed a Master of Science from the University of Kentucky, Lexington, USA. His research interest is in the development of Low-Energy and Secure Memory Circuits, Nanotechnology Device-based Emerging Computing, and Side-Channel Analysis attacks.



**Amit Degada** worked as a Post-Doctoral researcher in the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, USA. He completed his Ph.D. at the University of Kentucky, Lexington, USA, and his Master of Technology at Sardar Vallabhbhai National Institute of Technology (NIT) Surat, India. His research interests are in the development of hardware-assisted cybersecurity primitives and Secure Memory Design.



**Himanshu Thapliyal** (SM'16) is currently an Associate Professor with the Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, Tennessee, USA. He received a PhD degree in Computer Science and Engineering from the University of South Florida, Tampa, in 2011 where he received the 'Distinguished Graduate Achievement Award'. From 2012-14, he worked as a designer of processor test solutions at Qualcomm, where he received the Qualcomm QualStar Award for contributions to memory built-in self-test. He joined as an Assistant Professor at the University of Kentucky, Lexington in 2014 where he was promoted to Associate Professor in 2020. He has been ranked in the top 50 among scientists throughout the world in the field of 'Computer Hardware and Architecture' for the calendar Years 2019-2021. He is the recipient of the 2019 NSF CAREER award and the 2020 IEEE-CS TCVLSI Mid-Career Research Achievement Award. In March 2022, he was selected to the Inaugural Class of IEEE Computer Society Distinguished Contributors which recognizes his distinguished contributions to society and the profession. He has authored over 200 journal/conference articles with over 6000 citations (h-index of 46), and 3 US Patents. He received the Best Poster/LBR Award at the 2023 ACM Great Lakes Symposium on VLSI. He received Best Paper awards at the 2021 IEEE International Conference on Consumer Electronics (ICCE), 2020 IEEE World Forum on Internet of Things, 2017 Cyber and Information Security Research Conference (CISR), and 2012 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). He is serving as the Section Editor of the Springer Nature Computer Science and is leading two sections. He has served/serving in editorial boards of journals including IEEE Transactions on Consumer Electronics, IEEE Consumer Electronics Magazine, IEEE Internet of Things Journal, and the Microelectronics Journal. 
He co-founded the International Workshop on Quantum Computing: Circuits Systems Automation and Applications (QC-CSAA). His students have received several awards including Best Papers, outstanding PhD and Master's student awards, and secured positions in companies such as Google, Apple, Intel, Cadence, and Microsoft. His research interests include spintronics circuits, hardware-assisted cybersecurity, and quantum computing.