# Energy-Efficient Adiabatic Circuits Using Transistor-Level Monolithic 3D Integration

Ivan Miketic and Emre Salman
Department of Electrical and Computer Engineering
Stony Brook University (SUNY), Stony Brook, New York 11794
E-mail: [ivan.miketic, emre.salman]@stonybrook.edu

Abstract—Charge-recycling adiabatic circuits have recently received increased attention due to both their high energy efficiency and their higher resistance to side-channel attacks. These characteristics make adiabatic circuits a promising technique for Internet-of-things based applications. One of the important limitations of adiabatic logic is the higher intra-cell interconnect capacitance due to differential outputs and cross-coupled pMOS transistors. Since energy consumption has a quadratic dependence on capacitance in adiabatic circuits (unlike conventional static CMOS, where the dependence is linear), higher interconnect capacitance significantly degrades the overall power savings that can be achieved by adiabatic logic, particularly in nanoscale technologies. In this paper, monolithic 3D integrated adiabatic circuits are introduced where transistor-level monolithic 3D technology is used to implement adiabatic gates. A 45 nm two-tier Mono3D PDK is used to demonstrate the proposed approach. Monolithic inter-tier vias are leveraged to significantly reduce parasitic interconnect capacitance, achieving up to 47% reduction in power-delay product as compared to 2D adiabatic circuits in a 45 nm technology node.

Index Terms—Adiabatic circuits, monolithic 3D ICs, low power, parasitic capacitance.

## I. INTRODUCTION

There is a growing interest in monolithic 3D integrated circuits (ICs) due to the unprecedented device density achieved by tiny vertical interconnects referred to as monolithic inter-tier vias (MIVs) [1]. Unlike through-silicon via (TSV) based 3D integration where multiple wafers are stacked, monolithic 3D ICs are fabricated via a sequential process where multiple silicon tiers are fabricated on a single substrate. There has been significant recent progress in the fabrication of two-tier monolithic 3D ICs by ensuring a relatively low processing temperature for the second tier [2], [3]. Integration of emerging devices such as carbon nanotube based field-effect transistors (CNFETs) has also been explored since CNFETs are more tolerant to higher processing temperatures [4]. Even though run-time thermal integrity is a primary concern for high density 3D systems, it was demonstrated that monolithic 3D ICs are more effective in dissipating heat as compared to TSV based die stacking due to the much shorter vertical distance to the heat sink [5]. Furthermore, monolithic 3D ICs do not suffer from TSV-related reliability issues such as keep-out zone and TSV-to-device noise coupling [6], [7].

Various design methodologies at different granularities have been proposed for monolithic 3D technology, ranging from block-level to transistor-level partitioning of a circuit [8]. Integrating main memory with processing elements is an attractive block-level partitioning option for scenarios where high bandwidth communication with memory is a bottleneck, as in certain deep neural networks [9]. Transistor-level partitioning methods have been considered for fine granularity, where MIVs are utilized inside the standard cells. Typically, the pMOS transistors are placed on the bottom tier whereas the nMOS transistors are located on the top tier since the devices within the top tier suffer from degraded performance due to temperature-related process limitations [3]. Other vertical integration technologies are typically not suitable for such fine granularity partitioning [10].

A design method is proposed in this paper for transistor-level monolithic 3D ICs. Specifically, charge-recycling adiabatic circuits are designed with a transistor-level monolithic 3D technology to mitigate one of the significant challenges in the physical implementation of adiabatic circuits: the high intra-cell interconnect capacitance of the output nets.

The rest of the paper is organized as follows. A brief background on adiabatic circuits and a motivational example are provided in Section II. The proposed method is detailed in Section III. Simulation results utilizing a Mono3D PDK in 45 nm technology are presented in Section IV. Finally, the paper is concluded in Section V.

## II. BACKGROUND AND MOTIVATIONAL EXAMPLE

Adiabatic circuits utilize a variable/AC power supply signal in the form of a trapezoidal or sinusoidal waveform. This signal also behaves as a clock signal for the adiabatic circuit since it synchronizes the flow of data, and is typically referred to as the power-clock signal [11]. Consider the equivalent circuit of an adiabatic operation shown in Fig. 1(a). $R$ represents the on-resistance of the transistor and the interconnect resistance of the output wire, and $C$ represents the output load capacitance. The power supply signal is a trapezoidal waveform with a transition time of $t_r$. If $t_r$ is sufficiently long as compared to the $RC$ time constant, then $v_c(t)$ approximately follows $v_{dd}(t)$, thereby minimizing the power loss across $R$. Under this assumption, the overall switching energy dissipated per cycle (consisting of both charging and discharging) is

$$E_{ad}^{swi} = 2\frac{RC}{t_r}CV_{dd}^2. \tag{1}$$

Fig. 1. Adiabatic switching: (a) equivalent RC circuit of an adiabatic gate driven by a trapezoidal power supply signal, (b) schematic of an adiabatic buffer in efficient charge recovery logic (ECRL).

Unlike conventional static CMOS based operation where switching energy does not depend upon transition time, in adiabatic operation, a larger transition time reduces the overall switching energy, as described by (1). One practical implementation of adiabatic charging is shown in Fig. 1(b), where an efficient charge recovery logic (ECRL) based inverter is illustrated [12]. The operation of an ECRL inverter is described as follows: assume in is high and the power-clock signal (pc1) is rising during the evaluation stage. Then, outbar goes to logic low since M3 is turned on. Conversely, out remains at logic high since M2 is turned on and M4 is off. During the hold stage, the output of the inverter remains constant as the input of the subsequent gate is connected to a power-clock signal with $90^{\circ}$ phase difference, referred to as pc2. Once pc1 enters the recovery stage, charge is recovered from out back to pc1. As pc1 reaches the wait stage, power to the gate is turned off, resulting in logic low at both out and outbar. Adiabatic logic has recently received growing attention, particularly for IoT devices where both efficiency and security are important design objectives [13], [14]. The feasibility of adiabatic circuits for RF-powered applications such as RFIDs and wireless sensor nodes has also been demonstrated [15]-[17].
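As a rough illustration of (1), the sketch below compares the adiabatic switching energy per cycle with the familiar $CV_{dd}^2$ energy per cycle of static CMOS. The values of `R`, `C`, and `VDD` are hypothetical and chosen only to make the trend visible; they are not taken from the paper. Note that (1) holds only when $t_r$ is much longer than $RC$.

```python
def adiabatic_switching_energy(r_ohm, c_farad, t_r_s, v_dd):
    """Energy per cycle (charge + discharge) from Eq. (1): 2 * (RC / t_r) * C * Vdd^2."""
    return 2.0 * (r_ohm * c_farad / t_r_s) * c_farad * v_dd ** 2


def static_cmos_switching_energy(c_farad, v_dd):
    """Static CMOS energy per full charge/discharge cycle: C * Vdd^2."""
    return c_farad * v_dd ** 2


# Hypothetical values, for illustration only (not from the paper).
R = 10e3     # ohms: transistor on-resistance plus wire resistance
C = 1e-15    # farads: output load capacitance
VDD = 1.0    # volts: peak power-clock amplitude

# RC = 10 ps here, so every t_r below satisfies t_r >> RC.
for t_r in (1e-9, 10e-9, 100e-9):
    e_ad = adiabatic_switching_energy(R, C, t_r, VDD)
    e_cmos = static_cmos_switching_energy(C, VDD)
    print(f"t_r = {t_r:.0e} s: E_ad / E_cmos = {e_ad / e_cmos:.4f}")
```

The ratio reduces to $2RC/t_r$, so a tenfold longer transition time gives a tenfold lower adiabatic switching energy, whereas doubling $C$ quadruples it.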

An important consideration in (1) is the quadratic dependence of switching energy on capacitance, which is unlike conventional static CMOS where the dependence on capacitance is linear. An important implication of this stronger dependence is the impact of parasitic interconnect capacitances on the overall switching energy. Particularly for adiabatic logic families with differential outputs and a cross-coupled structure, the higher interconnect capacitance at the output nodes can reduce the switching energy savings. For example, in previous work where we developed a lightweight encryption core using efficient charge recovery logic in 65 nm CMOS technology [18], we observed that the energy savings as compared to static CMOS are approximately $8.2\times$ at the schematic level. When the parasitic interconnect impedances are considered at the post-layout level, the energy savings are reduced to approximately $4.9\times$. Thus, potential energy savings are lost due to the longer output nets in adiabatic logic and the quadratic dependence of switching energy on parasitic capacitance.

Fig. 2. Physical layout of an inverter in 65 nm technology: (a) static CMOS, (b) efficient charge recovery logic based adiabatic implementation.

Fig. 3. Equivalent RC network of the output node after parasitic extraction: (a) static CMOS inverter, (b) efficient charge recovery logic (ECRL) based adiabatic inverter.

To further illustrate this issue, the physical layouts of a static CMOS based inverter and an adiabatic ECRL inverter [see Fig. 1(b)] in 65 nm technology are depicted in Fig. 2. As shown in this figure, the adiabatic gate is not only larger in area, but also has a longer output net. The equivalent *RC* circuit for the output net, as derived after parasitic extraction, is illustrated in Fig. 3. The overall parasitic capacitance at the output net is 0.17 fF for the static CMOS inverter, whereas it is 0.39 fF for the ECRL adiabatic inverter. Since switching energy is quadratically dependent on capacitance for adiabatic logic, this increase significantly degrades the energy savings. Monolithic 3D technology offers an opportunity to mitigate this issue since transistor-level partitioning is feasible using MIVs, as discussed in the following section.
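As a back-of-the-envelope reading of these extracted values, consider how a capacitance increase from 0.17 fF to 0.39 fF would scale the switching energy under a quadratic versus a linear dependence. This assumes, purely for illustration, that the output-net parasitic is the only capacitance in (1) and that $R$, $t_r$, and $V_{dd}$ are unchanged. In a real gate the device and load capacitances also contribute to $C$, so the numbers indicate the trend rather than the actual penalty.

$$\left(\frac{0.39\ \text{fF}}{0.17\ \text{fF}}\right)^{2} \approx 5.3 \quad \text{(quadratic, adiabatic)}, \qquad \frac{0.39\ \text{fF}}{0.17\ \text{fF}} \approx 2.3 \quad \text{(linear, static CMOS)}.$$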

## III. PROPOSED METHOD

The proposed method is based on developing charge-recycling (adiabatic) gates using transistor-level monolithic 3D technology with intra-cell MIVs. This approach can reduce the overall length of the output nets in adiabatic gates, thereby achieving higher energy savings and potentially lower propagation delays. The monolithic 3D technology is based on a process design kit (PDK) with two tiers in a 45 nm technology node [19], [20]. The transistor device characteristics are the same as in 2D FreePDK45 [21]. The pMOS devices are fabricated within the bottom tier whereas the nMOS devices are fabricated within the top tier. Note that most adiabatic logic families do not have a complementary pull-up network. As such, the number of pMOS devices is expected to be much smaller than the number of nMOS devices that form the pull-down network where logic is implemented. This asymmetry may cause white space within the bottom tier, which can be used to implement high quality passive devices for an LC tank based resonant power-clock generator [22]. Since the power efficiency of the power-clock generation circuitry plays an important role in the overall energy efficiency of an adiabatic circuit, monolithic 3D technology not only helps in reducing cell-level parasitic capacitances (as demonstrated in this paper), but can also enable a power-clock generator with higher efficiency.

Fig. 4. Cross-sections of the (a) conventional 2D and (b) transistor-level monolithic 3D technology with two tiers. The top tier hosts the nMOS transistors whereas the pMOS transistors are placed within the bottom tier.

In the Mono3D PDK, two metal layers are allocated to the bottom tier (metal1\_btm and metal2\_btm), as illustrated in Fig. 4. These metal layers are primarily for routing the intra-cell signals. The top tier is separated from the bottom tier by an inter-layer dielectric (ILD) with a thickness of 100 nm. Inter-tier coupling is minimized at this thickness, as experimentally validated in [23]. The 10 metal layers that exist in 2D *FreePDK45* are retained unchanged for the top tier in the Mono3D PDK. The intra-cell connections that span the two tiers are achieved by MIVs. Each MIV has a width of 50 nm and a height of 215 nm [8].

Fig. 5. Physical layouts of inverters in (a) 2D ECRL and (b) Mono3D ECRL.

Each cell is developed with a full-custom design methodology using a cell stacking technique. Power and ground rails at each cell row are connected to the system-level power network through power and ground rings placed during the placement and routing process. A new technology file (.tf) is generated for the Mono3D PDK to include all of the new layers (interconnects, vias, ILD, and MIV). Based on these modifications, a new display resource file (.drf) is generated to develop full-custom layouts of the 3D cells. The design rule check (DRC), layout versus schematic (LVS), and parasitic extraction (PEX) are performed using existing commercial tools. The DRC rule file is modified to include new features for the additional metal layers, vias, transistors, ILD, and MIV. For example, the minimum spacing between two MIVs is equal to 120 nm, producing an MIV pitch of 170 nm. The LVS rule file is also modified for the tool to be able to independently identify transistors located in separate tiers. The extracted netlist with MIVs is analyzed to accurately determine the interconnections between nMOS (within the top tier) and pMOS (within the bottom tier) transistors.

The RC extraction rule file is modified to be able to recognize the new device tier, new metal layers, and MIVs. For metal interconnects, intrinsic plate capacitance, intrinsic fringe capacitance, and near-body (coupling) capacitance are considered between silicon and metal, and between metal and metal. A single MIV is characterized with a resistance of 5.5 $\Omega$ and a capacitance of 0.04 fF, based on [24] where device-level extraction is performed. The only parasitic component that is not considered during the extraction process is the tier-to-tier coupling capacitance. As experimentally demonstrated in [23], this component is negligible when the inter-layer dielectric is 100 nm thick.
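The MIV figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the pitch is simply the MIV width plus the minimum spacing, which is consistent with the 170 nm value in the text, and it computes the RC product of a single MIV from the stated resistance and capacitance. The function and constant names are illustrative and are not part of the Mono3D PDK.

```python
# Values taken from the text; names are illustrative only.
MIV_WIDTH_NM = 50.0      # MIV width
MIV_SPACING_NM = 120.0   # minimum spacing between two MIVs (DRC rule)
MIV_RES_OHM = 5.5        # extracted resistance of a single MIV
MIV_CAP_F = 0.04e-15     # extracted capacitance of a single MIV (0.04 fF)


def miv_pitch_nm(width_nm, spacing_nm):
    """Center-to-center pitch, assuming pitch = width + minimum spacing."""
    return width_nm + spacing_nm


def rc_product_s(r_ohm, c_farad):
    """Lumped RC product of one via, in seconds."""
    return r_ohm * c_farad


pitch = miv_pitch_nm(MIV_WIDTH_NM, MIV_SPACING_NM)
tau = rc_product_s(MIV_RES_OHM, MIV_CAP_F)

print(f"MIV pitch: {pitch:.0f} nm")
print(f"Single-MIV RC product: {tau * 1e15:.2f} fs")
```

The sub-femtosecond RC product suggests that a single MIV adds very little delay of its own compared with the picosecond-scale gate delays reported later.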

The proposed approach is demonstrated in Fig. 5, where the physical layouts of a 2D ECRL inverter and a Mono3D ECRL inverter are illustrated. The two intra-cell MIVs that connect the upper and lower tiers of the output nets are also shown. The Mono3D cell has a 38% smaller footprint than the 2D cell. Thus, a considerable reduction in the length of the output net is achieved, enabling higher energy savings and lower delay, as quantified in the following section.

## IV. SIMULATION RESULTS

ECRL based adiabatic inverter, XOR, AND, and OR gates are designed in both 2D and Mono3D 45 nm technology. All of the gates are powered with a sinusoidal power-clock signal with a peak amplitude of 1 V and an operating frequency of 13.56 MHz. The full-custom layouts were drawn to minimize the interconnect length of the output nets. The Mono3D PDK described in the previous section is utilized.

The average power consumption and propagation delay of each gate are listed, respectively, in Tables I and II. Note that the delay measurement in adiabatic gates is performed with respect to the power-clock signal rather than the input signal since the output changes once the power-clock signal starts rising (low-to-high transition of the output) or falling (high-to-low transition of the output). According to Table I, Mono3D technology achieves a 9.5% to 17.9% decrease in the power consumption of adiabatic ECRL gates. Up to 35.2% reduction in propagation delay is also achieved, as listed in Table II. Finally, the footprint of each cell is listed in Table III for both 2D and Mono3D implementations. Mono3D technology achieves, on average, a 36% reduction in footprint.

TABLE I. COMPARISON OF POWER CONSUMED BY 2D AND MONO3D ADIABATIC ECRL GATES.

|          | 2D ECRL (nW) | 3D ECRL (nW) | % Reduction |
|----------|--------------|--------------|-------------|
| Inverter | 4.587        | 3.764        | 17.9        |
| AND      | 4.706        | 4.101        | 12.9        |
| OR       | 4.655        | 4.028        | 13.5        |
| XOR      | 8.434        | 7.631        | 9.5         |

TABLE II. COMPARISON OF PROPAGATION DELAYS IN 2D AND MONO3D ADIABATIC ECRL GATES.

|          | 2D ECRL (ps) | 3D ECRL (ps) | % Reduction |
|----------|--------------|--------------|-------------|
| Inverter | 19.3         | 12.5         | 35.2        |
| AND      | 22.5         | 16.2         | 28.0        |
| OR       | 25.4         | 19.7         | 22.4        |
| XOR      | 32.2         | 25.0         | 22.4        |

TABLE III. COMPARISON OF CELL FOOTPRINTS IN 2D AND MONO3D ADIABATIC ECRL GATES.

|          | 2D ECRL ($\mu m \times \mu m$) | 3D ECRL ($\mu m \times \mu m$) | % Reduction |
|----------|--------------------------------|--------------------------------|-------------|
| Inverter | 1.34 × 0.79                    | 0.83 × 0.79                    | 38.1        |
| AND      | 1.34 × 1.01                    | 0.87 × 1.01                    | 35.1        |
| OR       | 1.34 × 1.12                    | 0.88 × 1.12                    | 34.3        |
| XOR      | 1.34 × 1.60                    | 0.87 × 1.60                    | 35.1        |
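The power-delay product (PDP) reduction quoted in the abstract and conclusion can be checked against Tables I and II. The sketch below assumes that PDP is the product of the tabulated average power and propagation delay for each gate, which is the usual definition but is not spelled out in the text. The dictionary and function names are illustrative.

```python
# (2D, Mono3D) average power in nW, from Table I.
POWER_NW = {
    "Inverter": (4.587, 3.764),
    "AND": (4.706, 4.101),
    "OR": (4.655, 4.028),
    "XOR": (8.434, 7.631),
}

# (2D, Mono3D) propagation delay in ps, from Table II.
DELAY_PS = {
    "Inverter": (19.3, 12.5),
    "AND": (22.5, 16.2),
    "OR": (25.4, 19.7),
    "XOR": (32.2, 25.0),
}


def pdp_reduction_percent(gate):
    """Percent reduction in power-delay product when moving from 2D to Mono3D."""
    p_2d, p_3d = POWER_NW[gate]
    d_2d, d_3d = DELAY_PS[gate]
    pdp_2d = p_2d * d_2d  # nW * ps, i.e. units of 1e-21 J
    pdp_3d = p_3d * d_3d
    return 100.0 * (1.0 - pdp_3d / pdp_2d)


for gate in POWER_NW:
    print(f"{gate}: {pdp_reduction_percent(gate):.1f}% PDP reduction")
```

Under this assumption, the inverter shows the largest improvement, at roughly 47%, which matches the "up to 47%" figure; the other gates fall between roughly 30% and 37%.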

## V. CONCLUSION

High density intra-cell MIVs in transistor-level monolithic 3D technology are utilized to develop energy-efficient adiabatic circuits. Specifically, it was demonstrated that most of the existing adiabatic logic families suffer from high interconnect capacitance due to differential outputs and cross-coupled pMOS devices. Higher parasitic interconnect capacitance significantly reduces the energy savings since in adiabatic logic, the switching energy increases quadratically with capacitance (unlike static CMOS where the dependence is linear). By implementing adiabatic gates in transistor-level monolithic 3D technology, the overall length of the output nets can be reduced, which reduces both the power consumption and propagation delay of traditional adiabatic circuits. Up to 47% reduction in power-delay product was demonstrated.

## REFERENCES

- [1] P. Batude et al., "3D Sequential Integration Opportunities and Technology Optimization," in Proceedings of the IEEE International Interconnect Technology Conference, May 2014, pp. 373–376.

- [2] L. Brunet et al., "First Demonstration of a CMOS over CMOS 3D VLSI CoolCube Integration on 300mm Wafers," in *Proceedings of the IEEE Symposium on VLSI Technology*, Jun. 2016, pp. 1–2.
- [3] F. B. Claire et al., "FDSOI Bottom MOSFETs Stability versus Top Transistor Thermal Budget Featuring 3D Monolithic Integration," Solid-State Electronics, vol. 113, pp. 2–8, Nov. 2015.
- [4] T. F. Wu, H. Li, P. Huang, A. Rahimi, G. Hills, B. Hodson, W. Hwang, J. M. Rabaey, H. P. Wong, M. M. Shulaker, and S. Mitra, "Hyperdimensional computing exploiting carbon nanotube FETs, resistive RAM, and their monolithic 3D integration," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 11, pp. 3183–3196, 2018.
- [5] P. Shukla, A. K. Coskun, V. F. Pavlidis, and E. Salman, "An overview of thermal challenges and opportunities for monolithic 3D ICs," in *Proc. of the ACM GLSVLSI*, May 2019.
- [6] H. Wang, M. H. Asgari, and E. Salman, "Compact Model to Efficiently Characterize TSV-to-Transistor Noise Coupling in 3D ICs," *Integration, the VLSI Journal*, vol. 47, no. 3, pp. 296–306, June 2014.
- [7] E. Salman, "Noise coupling due to through silicon vias (TSVs) in 3-D integrated circuits," in *IEEE Int. Symp. on Circuits and Systems*, 2011, pp. 1411–1414.
- [8] S. A. Panth, K. Samadi, Y. Du, and S. K. Lim, "Design and CAD Methodologies for Low Power Gate-Level Monolithic 3D ICs," in Proceedings of the ACM International Symposium on Low Power Electronics and Design, Aug. 2014, pp. 171–176.
- [9] Y. Yu and N. K. Jha, "SPRING: A sparsity-aware reduced-precision monolithic 3D CNN accelerator architecture for training and inference," arXiv:1909.00557v2, 2019.
- [10] S. M. Satheesh and E. Salman, "Power Distribution in TSV-Based 3D Processor-Memory Stacks," *IEEE Journal on Emerging and Selected Topics in Circuits and Systems*, vol. 2, no. 4, pp. 692–703, Dec. 2012.
- [11] T.-C. Ou, Z. Zhang, and M. C. Papaefthymiou, "An 821MHz 7.9Gb/s 7.3pJ/b/iteration charge-recovery LDPC decoder," in *Int. Solid-State Circuits Conf.*, 2014, pp. 462–463.
- [12] Y. Moon and D.-K. Jeong, "An efficient charge recovery logic circuit," *IEEE Journal of Solid-State Circuits*, vol. 31, no. 4, pp. 514–522, 1996.
- [13] S. Dinesh Kumar, H. Thapliyal, and A. Mohammad, "FinSAL: FinFET-based secure adiabatic logic for energy-efficient and DPA resistant IoT devices," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 37, no. 1, pp. 110–122, 2018.
- [14] S. Lu, Z. Zhang, and M. Papaefthymiou, "1.32GHz high-throughput charge-recovery AES core with resistance to DPA attacks," in *IEEE Symposium on VLSI Circuits*, June 2015, pp. C246–C247.
- [15] T. Wan, Y. Karimi, M. Stanaćević, and E. Salman, "Perspective paper—can AC computing be an alternative for wirelessly powered IoT devices?" *IEEE Embedded Systems Letters*, vol. 9, no. 1, pp. 13–16, March 2017.
- [16] T. Wan, E. Salman, and M. Stanaćević, "A new circuit design framework for IoT devices: Charge recycling with wireless power harvesting," in *IEEE Int. Symp. on Circuits and Systems*, May 2016.
- [17] T. Wan, Y. Karimi, M. Stanaćević, and E. Salman, "AC computing methodology for RF-powered IoT devices," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 27, no. 5, pp. 1017–1028, May 2019.
- [18] T. Wan and E. Salman, "Ultra low power SIMON core for lightweight encryption," in *IEEE Int. Symp. on Circuits and Systems*, May 2018.
- [19] C. Yan, S. Kontak, H. Wang, and E. Salman, "Open Source Cell Library Mono3D to Develop Large-Scale Monolithic 3D Integrated Circuit," in Proc. of IEEE Int. Symp. on Circuits and Systems, May 2017.
- [20] C. Yan and E. Salman, "Mono3D: Open source cell library for monolithic 3-D integrated circuits," *IEEE Trans. on CAS I: Regular Papers*, vol. 65, no. 3, pp. 1075–1085, 2018.
- [21] "FreePDK45." [Online]. Available: http://www.eda.ncsu.edu/wiki/FreePDK45:Contents
- [22] N. Jeanniot, G. Pillonnet, P. Nouet, N. Azemard, and A. Todri-Sanial, "Synchronised 4-phase resonant power clock supply for energy efficient adiabatic logic," in *IEEE Int. Conf. on Rebooting Computing*, 2017.
- [23] P. Batude et al., "GeOI and SOI 3D Monolithic Cell Integrations for High Density Applications," in Proceedings of the IEEE International Symposium on VLSI Technology, June 2009, pp. 166–167.
- [24] J. Shi et al., "On the Design of Ultra-High Density 14nm Finfet Based Transistor-Level Monolithic 3D ICs," in Proceedings of the IEEE Computer Society Annual Symposium on VLSI, July 2016, pp. 449–454.