

# A 200-GHz Power Amplifier With 18.7-dBm $P_{\text{sat}}$ in 45-nm CMOS SOI: A Model-Based Large-Signal Approach on Cascaded Series-Connected Power Amplification

Saleh Hassanzadehyamchi<sup>✉</sup>, Graduate Student Member, IEEE, Amirreza Alizadeh<sup>✉</sup>, Member, IEEE, Ali M. Niknejad<sup>✉</sup>, Fellow, IEEE, and Omeed Momeni<sup>✉</sup>, Senior Member, IEEE

**Abstract**—This article proposes a novel approach to cascaded series-connected power amplifier (PA) design. High-frequency transistor modeling is employed to analyze the stacked cell, and a methodology is developed to maximize the output power ( $P_{\text{out}}$ ) and power-added efficiency (PAE) of each cell. The $P_{\text{out}}$ and power gain of the cell are studied, and the optimum operation point is determined. A proof-of-concept integrated PA is implemented in a 45-nm CMOS silicon-on-insulator (SOI) process, where stacking and parallel power combining techniques are adopted to achieve 18.7-dBm $P_{\text{out}}$ and 4.8% PAE at 200 GHz. Each PA unit uses three cascaded gain stages where two-stacked, three-stacked, and five-stacked architectures are employed for the first, second, and third stages, respectively. Four PA units are power-combined by a low-loss 4:1 zero-degree combiner. The amplifier consumes 1.4-W dc power and has a small-signal gain of 14.6 dB at 203.2 GHz. The designed PA occupies $1.28 \times 1.05$ mm<sup>2</sup> die area, including all pads. To the authors' knowledge, the designed PA achieves the highest $P_{\text{out}}$ and PAE among all the Si counterparts at 200 GHz.

**Index Terms**—CMOS, G-band, high efficiency, mm-wave, power combining, saturated power, stacked power amplifiers (PAs).

## I. INTRODUCTION

THE need for high-speed communication links and high-resolution radars has made mm-wave bands a promising candidate for the next generation of wireless communication systems since they offer a wide frequency span and a smaller footprint for devices [1], [2]. Power amplifiers (PAs) play a key role in mm-wave transceivers since their output power determines the communication range and overall

Manuscript received 29 March 2023; revised 14 July 2023 and 29 September 2023; accepted 27 November 2023. Date of publication 12 December 2023; date of current version 29 May 2024. This article was approved by Associate Editor Payam Heydari. This work was supported by the NSF under Award 1932821 and Award ECCS-2148021. (Corresponding author: Saleh Hassanzadehyamchi.)

Saleh Hassanzadehyamchi and Omeed Momeni are with the Department of Electrical and Computer Engineering, University of California at Davis, Davis, CA 95616 USA (e-mail: shassanzadehyamchi@ucdavis.edu).

Amirreza Alizadeh is with Keysight Technologies, Santa Rosa, CA 95403 USA.

Ali M. Niknejad is with the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA 94720 USA.

Color versions of one or more figures in this article are available at <https://doi.org/10.1109/JSSC.2023.3339198>.

Digital Object Identifier 10.1109/JSSC.2023.3339198

efficiency of the system [3]. Designing high-power, highly efficient PAs has been a major challenge at mm-wave frequencies due to the limited performance of the transistors and the loss of the passive devices. Traditionally, SiGe and III-V compound semiconductors such as InP have been preferred over CMOS to implement mm-wave PAs [4], [5]. However, with recent improvements in their $f_{\text{max}}$, scaled CMOS technologies have become popular for implementing mm-wave PAs due to lower fabrication cost and higher integration capability. Nonetheless, CMOS transistors suffer from low drain-gate and drain-source breakdown voltages and therefore limited power generation capability [6]. To overcome this issue, transistor stacking has been presented and widely discussed in the literature as a series power combining technique to increase the voltage swing and hence the output power [4], [5], [6], [7]. In a stacked topology, the output voltage swings of the transistors add up in series to provide a high output voltage swing at the load. Nonetheless, the design methods of stacked PAs presented in the literature are not accurate enough for mm-wave frequencies, resulting in suboptimal performance [4], [5], [6].

In this work, we propose a stacked PA design theory compatible with mm-wave operation, where the target, as discussed in [4], [5], and [6], is to maximize the output power and PAE. Modifying the existing approaches, it is shown that the optimum design point can be determined precisely at high frequencies, and maximum output power and PAE are achievable.

The basic theory of the class-A PA is applied to the stacked cell using the transistor's high-frequency equivalent circuit, and the optimal values of the load impedances and gate termination capacitors are calculated. The analysis investigates how transistor parasitics limit the power generation in the stacked cell. The derived theory combines the transistor's small-signal and large-signal models to study the linearity performance of the cell. When cascading stacked cells, all transistors add the same amount of power and have the same PAE, but the power gain decreases as we proceed from the first to the last stage in the chain. The study concludes that impedance transformation by intercell matching networks (IMNs) is required between the neighboring cells to satisfy the optimum impedance.

This approach is used to design and implement a 200-GHz stacked PA in a 45-nm CMOS silicon-on-insulator (SOI) technology. The PA operates at close-to- $f_{\max}$ frequencies and consists of three cascaded stages, where two and three transistors are stacked in the first and second stages, respectively. The last stage employs a five-stacked topology to increase the power and gain. To boost the output power, a four-way zero-degree power combiner is also used to combine the powers of four PA units. The proposed PA achieves 18.7-dBm output power, 14.6-dB gain, and 4.84% PAE at 200 GHz. To the authors' knowledge, this is the highest PAE and output power reported for any silicon-based PA around 200 GHz.

The rest of this article is organized as follows. A brief literature review of stacked PA design and limitations of the existing approaches are presented in Section II. Section III introduces an accurate physics-based high-frequency mm-wave transistor model. The proposed analysis method of cascaded series-connected PA is described in Section IV. The designed G-band PA in SOI CMOS technology is explained in Section V, followed by the discussion of the measurement results in Section VI. Finally, Section VII concludes this article.

## II. REVIEW OF STACKED PA DESIGN METHODOLOGIES

### A. Circuit-Based Design Theory

The operation mechanism of the stacked PA has been described in [5], where all transistors have the same size and operate at the same biasing condition. Then, assuming a linear operation, a small-signal equivalent circuit is used to analyze the behavior of the stacked cell. The design goal in [5] is to realize the optimum voltage and current waveforms across each cell, and the analysis presented gives a good understanding of the circuit operation, from which several takeaways are noticeable.

- 1) The small-signal model includes parasitic capacitors and drain-source intrinsic current source.
- 2) Transconductance ( $g_m$ ) is modeled with a real number.
- 3) The drain-source voltages ( $V_{ds}$ ) of all transistors are assumed to be in phase and equal, which translates to maximum output voltage swing.
- 4) Drain-source currents ( $I_{ds}$ ) of the transistors are assumed to be equal but out-of-phase with  $V_{ds}$ .
- 5) Neglecting the small-signal drain-source resistance and capacitance of the device, optimum load impedances ( $Z_L$ ) for each stacked cell are derived to be primarily real and proportional to class-A load line impedance ( $R_{opt}$ ).
- 6) Gate termination capacitor of each common-gate (CG) transistor is sized to present  $Z_L$  to the preceding cell.
- 7) A single-element matching network is also presented to resonate out part of the parasitics (drain-source and drain-substrate capacitors) in the intercell nodes.

Considering the above limitations, the models in [5] should be revisited, and a revised design methodology is required to design stacked PAs at mm-wave frequencies. In this article, we study the performance of the stacked cell using the basic class-A PA theory. Several key points are included in the analysis, as follows.

- 1) An accurate small-signal model compatible with mm-wave frequencies is used. All parasitic losses are included in the model.

- 2) Transconductance ( $g_m$ ) includes a time delay. Recent studies on FET modeling suggest a time delay in the drain-source conductance of the device [8], [9], [10], [11]. The intrinsic time delay is related to the inertia of the intrinsic output current in response to rapid changes of  $V_{gs}$  [8].
- 3) The analysis finds voltage phase alignment between the stacked transistors unnecessary, which has been traditionally considered indispensable in [4], [5], and [6].
- 4) To realize class-A operation, the effect of all parasitic components is de-embedded.
- 5) The required load and gate termination impedances in a stacked cell are determined based on the transistor equivalent circuit model.
- 6) The IMNs provide full impedance transformation and avoid mismatch in the intercell nodes, which leads to a higher gain and output power compared to those of [5].

### B. Two-Port Network-Based Design Theory

A two-port network-based theory has been presented in [12] for stacked PA design. The method relied on optimum voltage and current distribution across the transistors and provided full impedance matching between the cells. Additionally, the analysis in [12] has successfully predicted the optimum load and gate (base) termination impedances of the transistors, the compression level, and a constant PAE with respect to the gate (base) impedance termination of the transistors.

Despite the excellent results, the approach in [12] depends on two-port models of the devices and gives little insight for modifying or improving the performance of the stacked cell. In other words, the designer gains little knowledge about the transistor parameters that may degrade or improve the performance. The proposed method in this article is built on the network-based approach and produces the same results in output power, PAE, gain, and  $Z_L$ . It presents a more insightful theory that discusses the limitations and highlights the importance of each transistor parameter on the overall performance.

## III. HIGH-FREQUENCY TRANSISTOR MODELING

The proposed analysis of stacked PA utilizes the high-frequency equivalent circuit of the transistor, which requires a precise model incorporating all the associated parasitics. In the following, we first present a high-frequency small-signal model of the transistor and then modify it for the large-signal regime. The model is narrowband and fits the device behavior only at the targeted frequency band.

### A. Small-Signal Modeling

Fig. 1 shows the high-frequency small-signal equivalent circuit of an NMOS FET, where all parameters in the model have explicit physical interpretations. The intrinsic gate-source and gate-drain capacitors are shown by $C_{gs}$ and $C_{gd}$, and $R_{gs}$ and $R_{gd}$ model the distributed channel resistances. The transconductance includes a time delay of $\tau$ and is modeled as $g_m = g_{mo}e^{-j\omega\tau}$. $R_{ds}$ and $C_{ds}$ describe the drain-source channel resistance and capacitance. There is also a time delay associated with the output conductance, necessitating an inductor ( $L_{ds}$ ) connected in series with $R_{ds}$ in the equivalent circuit [8], [9], [10], [11].

Fig. 1. Equivalent small-signal model of NMOS transistor.

TABLE I  
PARAMETER VALUES FOR NMOS TRANSISTORS  
AT 190 GHz IN 45-nm SOI

| Parameter / Transistor size | 20.4 $\mu\text{m}$ | 32.4 $\mu\text{m}$ | 42.7 $\mu\text{m}$ |
|-----------------------------|--------------------|--------------------|--------------------|
| $C_{gs}$ (fF)               | 13                 | 16.7               | 25.6               |
| $R_{gs}$ ( $\Omega$ )       | 251.2              | 168.7              | 111.4              |
| $C_{gd}$ (fF)               | 7.2                | 11.8               | 15.1               |
| $R_{gd}$ ( $\Omega$ )       | 629.3              | 448.9              | 295.5              |
| $C_{ds}$ (fF)               | 9.9                | 17.1               | 21.9               |
| $R_{ds}$ ( $\Omega$ )       | 154.1              | 69.2               | 62.7               |
| $L_{ds}$ (pH)               | 50.1               | 46.7               | 31.3               |
| $g_{mo}$ (mS)               | 27.3               | 40.8               | 56.9               |
| $\tau$ (psec)               | 21.8               | 21.5               | 20.6               |

Bias point: $J = 440 \mu\text{A}/\mu\text{m}$, $V_{GS} = 0.6 \text{ V}$, and $V_{DS} = 1.2 \text{ V}$. Channel length: 40 nm.

To calculate the values of the model parameters, a 20.4- $\mu\text{m}$ NMOS transistor in a 45-nm SOI process is biased in class-A (i.e., $J = 440 \mu\text{A}/\mu\text{m}$, $V_{GS} = 0.6 \text{ V}$, and $V_{DS} = 1.2 \text{ V}$), and its S-parameters are fitted to those of the equivalent circuit of Fig. 1 at a center frequency of 190 GHz. This procedure is repeated for 32.4- and 42.7- $\mu\text{m}$ NMOS transistors, and the resulting values are presented in Table I. The model includes the routing parasitics of the transistors extracted by RC-extraction up to the fourth metal layer ( $C_1$ ).

### B. Large-Signal Modeling

In general, PAs are nonlinear circuits, and the small-signal model of Fig. 1 does not predict the nonlinearity, thereby necessitating modifications to consider the device's large-signal behavior. For this purpose, we will show that modifying $g_{mo}$ based on the compression level of the device is enough to predict the large-signal behavior, and there is no need to change the other circuit parameters in Fig. 1. In some technologies (e.g., GaN), device capacitors are voltage-dependent; they introduce strong nonlinearity and generate higher-order harmonics in the drain-source current [13], [14]. This variation, nevertheless, is negligible for CMOS devices biased in the class-A region, where the device is always on and no higher-order harmonics are present in $I_{ds}$.

To extract $g_{mo}$ at different compression levels of the device, we run harmonic-balance load-pull (LP) simulations at 190 GHz for the CG stacked cell, where the gate node of the transistor is ac-grounded. From the voltage and current distribution across the transistor, $g_{mo}$ is calculated for each compression level, and the results for different device sizes are shown in Fig. 2. As expected, $g_{mo}$ decreases with further device compression, and the designer should choose the corresponding $g_{mo}$ at a specific compression level from Fig. 2 as the reference design parameter.

Fig. 2. $g_{mo}$ versus the compression level at 190 GHz for 20.4-, 32.4-, and 42.7- $\mu\text{m}$ NMOS transistors in 45-nm CMOS SOI ( $L_g = 40 \text{ nm}$ ).

Fig. 3. (a) Output power and gain comparison between simulated transistor (solid black) and its extracted model (red symbols) versus input power at 190 GHz for 32.4- $\mu\text{m}$ NMOS transistor in 45-nm CMOS SOI ( $L_g = 40 \text{ nm}$ ). The two curves are on top of one another; (b) AM-PM curves.

To validate the extracted large-signal model, one can compare the large-signal behavior of the device simulated in the LP setup with that of the extracted model. As an example, the input power for the 32.4- $\mu\text{m}$ transistor is swept, and the output power and gain are plotted in Fig. 3(a), which demonstrates a strong correlation between the transistor and its model. Additionally, it verifies that the nonlinearity caused by higher-order harmonics is not significant, affirming that the primary influential factor in the large-signal performance of the PA is the nonlinearity of $g_{mo}$ and its reduction at high power levels. Similarly, AM-to-PM versus input power is graphed in Fig. 3(b), and the AM-PM difference between the extracted model and the transistor remains below one degree, confirming that the parasitic elements behave linearly.

Fig. 4. Stacked PA topology.

## IV. THEORY OF STACKED POWER AMPLIFIER

Herein, we use the high-frequency model discussed in Section III to study the cascaded series-connected (stacked) PA in linear and nonlinear regions. Output power, gain, efficiency, and linearity are the key metrics to consider. A single transistor has a limited gain and output power. One can make the transistor larger to elevate the output power; however, this stipulates a higher impedance transformation ratio for the output matching network (OMN) and increases the loss. In addition, larger transistors have lower  $f_{max}$  and provide a limited gain at close-to- $f_{max}$  frequencies. Therefore, there is an optimum transistor size for mm-wave PA to achieve maximum output power and efficiency.

Parallel and series power combining are other approaches to improve the gain and output power. The former only raises the output power, whereas the latter improves both output power and gain. Fig. 4 shows the circuit diagram of a stacked PA, where $k$ common-gate cells are cascaded to realize series power combining. The gate node of each CG transistor is terminated with a finite capacitor. The IMNs transform the input impedance of each cell into the optimum load impedance of the preceding cell. For simplicity, we assume that the IMNs are lossless, and all transistors have the same size and operate at the same class-A bias point. Several design principles are considered in the stacked PA of Fig. 4.

- 1) Drain-source and gate-source voltages— $V_{ds}$  and  $V_{gs}$ —must be realized optimally across the transistors (shown by blue color).
- 2) Drain-source voltage and current amplitudes remain constant, which guarantees the same operation point with the same efficiency for all transistors [12].
- 3) Taking the optimum drain-source voltage of the first transistor ( $V_m$ ) as a reference with a phase of zero,  $V_{ds}$  and  $V_{gs}$  of the  $k$ th transistor experience a phase shift of  $\varphi_k$  with respect to the first transistor. The efficiency still remains unchanged due to the equal phase shift in both voltages.
- 4) Since  $V_{ds}$  and  $V_{gs}$  swings are constant, the corresponding modeling parameters of transistors also remain unchanged and are equal to the reference design point chosen initially from Fig. 2.

- 5) The phase-shift  $\varphi_k$  is generated by the IMNs, providing an extra degree of freedom in design. This eliminates the previously held belief of phase alignment requirement in the design of stacked PAs [5].
- 6) The maximum power that a device can potentially generate is constrained by its size/biasing and is equal for all cells having the same size.

Thus, ideally, the  $k$ th stacked cell operates at its peak efficiency and adds a constant amount of power ( $P_{\text{added}}$ ) to the applied input power ( $P_{\text{in},k}$ ). The output power of the  $k$ th stacked cell ( $P_{\text{out},k}$ ) determines the drive power of the next cell ( $P_{\text{in},k+1}$ ). As shown in [12], for the  $k$ th cell, one can write

$$P_{\text{out},k} = P_{\text{in},1} + kP_{\text{added}} = P_{\text{in},k+1} \quad (1)$$

where  $P_{\text{in},1}$  is the input power of the first transistor in the chain. The large-signal gain of the  $k$ th cell is calculated as follows:

$$G_k = 1 + \frac{P_{\text{added}}}{P_{\text{in},1} + (k-1)P_{\text{added}}}. \quad (2)$$

Thus, the large-signal gain of the  $k$ -stacked PA is the multiplication of the gain of cells

$$G_{\text{tot}} = \prod_{i=1}^k G_i = 1 + \frac{kP_{\text{added}}}{P_{\text{in},1}} = kG_1 + (1-k). \quad (3)$$

The PAE of an  $m$ -stacked PA is calculated by employing (1)–(3) as follows:

$$\text{PAE} = \frac{P_{\text{in},1} + mP_{\text{added}} - P_{\text{in},1}}{mP_{\text{DC}}} = \frac{P_{\text{added}}}{P_{\text{DC}}} \quad (4)$$

where  $P_{\text{DC}}$  is the dc power of one transistor.
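The bookkeeping in (1)–(4) can be sketched numerically. The snippet below is an illustrative sketch rather than the authors' tool: the function names are hypothetical, the IMNs are assumed lossless, and the example pairs the 5.5-dBm input power quoted for $M_1$ in Section IV-C with the simulated $P_{\text{added}}$ of the 32.4-µm device from Table II. That pairing is an assumption made here for illustration.

```python
import math


def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to mW."""
    return 10.0 ** (p_dbm / 10.0)


def stacked_chain(p_in1_mw, p_added_mw, k):
    """Return (P_out, gain) of each of the k cells, following (1) and (2)."""
    cells = []
    p_in = p_in1_mw
    for _ in range(k):
        p_out = p_in + p_added_mw      # every cell adds the same power
        cells.append((p_out, p_out / p_in))
        p_in = p_out                   # lossless IMN: P_out,k = P_in,k+1
    return cells


def total_gain(p_in1_mw, p_added_mw, k):
    """Closed-form total gain of the k-stacked PA, following (3)."""
    return 1.0 + k * p_added_mw / p_in1_mw


def pae(p_in1_mw, p_added_mw, p_dc_mw, m):
    """PAE of an m-stacked PA, following (4); p_dc_mw is per transistor."""
    p_out = p_in1_mw + m * p_added_mw
    return (p_out - p_in1_mw) / (m * p_dc_mw)


# Assumed example values (see the lead-in): 5.5 dBm drive, 2.56 mW added power.
p_in1 = dbm_to_mw(5.5)
p_added = 2.56
cells = stacked_chain(p_in1, p_added, 3)
gains_db = [10.0 * math.log10(g) for _, g in cells]
g_tot_db = 10.0 * math.log10(total_gain(p_in1, p_added, 3))
```

With these assumed inputs, the per-cell gains in `gains_db` fall within about 0.05 dB of the 2.35-, 1.55-, and 1.1-dB values tabulated for $M_1$–$M_3$, and `g_tot_db` is close to the tabulated 5-dB total gain. The `pae` helper returns the same value for any `m`, which is the point of (4).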

### A. Design Equations of the Stacked Cell

The foregoing analysis suggests that, in the absence of matching losses, the overall PAE of an $m$ -stacked cell is equal to the PAE of each CG cell, and maximum PAE is achieved by optimal design of each CG cell. Fig. 5(a) shows the circuit diagram of a CG stacked cell terminated with its optimal load impedance, $Z_L$, and Fig. 5(b) shows the equivalent high-frequency small-signal model of this cell, where $I_m$ is the phasor of the intrinsic current source. From the definition of class-A mode [15], $V_m$ and $I_m$ are out-of-phase and their ratio determines the load line impedance of the transistor as follows:

$$R_{\text{opt}} = -\frac{V_m}{I_m}. \quad (5)$$



Fig. 5. (a) AC circuit diagram of a common-gate stacked cell and (b) its equivalent small-signal model.

Considering  $I_m = g_m V_{gs}$ , the gate-source voltage can be expressed as follows:

$$V_{gs} = -\frac{V_m}{g_m R_{opt}}. \quad (6)$$
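As a brief worked step (a sketch, assuming $R_{opt}$ is real and $V_m$ is taken as the zero-phase reference), substituting the delayed transconductance $g_m = g_{mo}e^{-j\omega\tau}$ from Section III-A into (6) separates the magnitude and phase of the required gate-source drive:

$$V_{gs} = -\frac{V_m}{g_{mo} R_{opt}}\, e^{j\omega\tau}, \qquad |V_{gs}| = \frac{V_m}{g_{mo} R_{opt}}, \qquad \angle V_{gs} = \pi + \omega\tau.$$

Under these assumptions, the delay does not change the required drive amplitude; it only rotates $V_{gs}$ by $\omega\tau$ away from the ideal antiphase condition with respect to $V_m$.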

Assuming that  $Z_k$  is reactive, (13)–(15) derived in Appendix A and  $P_{in} = V_{in} I_{in}^* / 2$  are used to calculate  $Z_k$  as follows:

$$Z_k = \frac{2P_{in} - \frac{V_m}{R_{opt}} \Re \left( \frac{I_{in}^*}{g_m} \right)}{j \Im \left( I_k I_{in}^* \right)}. \quad (7)$$

The details of $V_{in}$, $I_{in}$, and $I_k$ are presented in Appendix A. Equation (7) shows the relation between $Z_k$ and $P_{in}$, indicating that a higher input power results in a larger $Z_k$, and thus, a smaller gate capacitor. The lowest input power is achieved when the gate node is ac-grounded as follows:

$$P_{in-min} = \frac{V_m}{2R_{opt}} \Re \left( \frac{I_{in}^*}{g_m} \right). \quad (8)$$
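Comparing (7) with (8) gives a compact restatement, sketched here under the assumption that the gate termination is an ideal capacitor, $Z_k = 1/(j\omega C_k)$. The numerator of (7) is $2P_{in} - 2P_{in-min}$, so

$$Z_k = \frac{2\left(P_{in} - P_{in-min}\right)}{j \Im \left( I_k I_{in}^* \right)}, \qquad C_k = \frac{\Im \left( I_k I_{in}^* \right)}{2\omega \left(P_{in} - P_{in-min}\right)}.$$

Setting $Z_k = 0$ recovers $P_{in} = P_{in-min}$, in agreement with (8). Because $I_{in}$ and $I_k$ are defined in Appendix A and may themselves depend on $Z_k$, this form is best read as a consistency check on (7) and (8), not as an independent closed-form solution.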

The optimum load impedance ( $Z_L$ ) is calculated by applying KCL to the output node and employing (13)–(15) as follows:

$$Z_L = \frac{\left(1 + g_m R_{opt}\right) \left(1 + \frac{Z_k}{Z_{gd}}\right) + \frac{Z_k}{Z_{gs}}}{g_m - \frac{g_m R_{opt}}{Z_{ds}} - \frac{1 + g_m R_{opt}}{Z_{gd}}}. \quad (9)$$

The input impedance ( $Z_{in}$ ) is calculated by combining (13) and (14) as follows:

$$Z_{in} = \frac{V_{in}}{I_{in}} = \frac{Z_{gs} + Z_k + Z_k \left(1 + g_m R_{opt}\right) \frac{Z_{gs}}{Z_{gd}}}{1 + g_m Z_{gs} - g_m R_{opt} \frac{Z_{gs}}{Z_{ds}}}. \quad (10)$$

Finally, we can derive the output power delivered to the load by utilizing (5)–(9) as follows:

$$P_{out} = P_{in} + \frac{V_m}{2} \Re \left[ I_{in}^* - I_k^* \left(1 + \left(g_m R_{opt}\right)^{-1}\right) \right] \quad (11)$$

where the second term on the right-hand side of the equation is $P_{added}$, and it depends only on the transistor model parameters. One can replace $I_{in}$ and $I_k$ in (11) from (13) and (15) and calculate $P_{added}$ in terms of circuit parameters, as shown in (12). $V_m^2 / (2R_{opt})$ in (12) is the power generated by the transistor, whereas the second, third, and fourth terms inside the bracket are due to the losses generated by the device parasitics. These losses will be discussed further in Section IV-B.

TABLE II  
THEORETICAL AND SIMULATED ADDED POWER OF NMOS TRANSISTORS IN 45-nm CMOS SOI AT 1-dB COMPRESSION POINT

| Parameter / Transistor size | 20.4 $\mu\text{m}$ | 32.4 $\mu\text{m}$ | 42.7 $\mu\text{m}$ |
|-----------------------------|--------------------|--------------------|--------------------|
| $P_{added,sim.}$ (mW)       | 1.71               | 2.56               | 2.79               |
| $P_{added,theory}$ (mW)     | 1.73               | 2.57               | 2.84               |

Bias point: $J = 440 \mu\text{A}/\mu\text{m}$, $V_{GS} = 0.6 \text{ V}$, and $V_{DS} = 1.2 \text{ V}$. Operation frequency: 190 GHz.

Fig. 6. $R_{opt}$ and $V_m$ versus the compression level at 190 GHz. NMOS transistors have a channel length of 40 nm and widths of 20.4, 32.4, and 42.7 $\mu\text{m}$.

Referring back to (5), (7), and (9), we notice that $V_m$, $R_{opt}$, and $P_{added}$ are the key parameters of the design and should be extracted at the design frequency of interest (190 GHz in this work). The values of $V_m$, $I_m$, and $g_{mo}$ (shown in Fig. 2) are directly obtained from the simulation results of the LP setup, and $R_{opt}$ is calculated using (5). $P_{added}$ is calculated from (12) and compared with its simulated counterpart in Table II. Theoretical values are in close agreement with simulation results for all transistor sizes. $R_{opt}$ and $V_m$ are functions of the transistor's input power. Fig. 6 presents $R_{opt}$ and $V_m$ versus the compression level of the device for 20.4-, 32.4-, and 42.7- $\mu\text{m}$ NMOS transistors with the minimum channel length. $V_m$ is identical for all transistor sizes and increases as the transistor compresses further. For the compression level of interest, the designer can read $V_m$ and $R_{opt}$ from Fig. 6 and use them along with $g_{mo}$ (Fig. 2) and $P_{added}$ (Table II) to calculate all other unknown parameters and metrics.

### B. Loss Mechanisms Inside the Transistor

As previously discussed, losses incurred by the device parasitics reduce $P_{added}$ and degrade the performance of the stacked cell. There are three main loss contributors in the device: the drain-source ( $\alpha_{ds}$ ), the gate-source ( $\alpha_{gs}$ ), and the gate-drain ( $\alpha_{gd}$ ) parasitic losses. The second, third, and fourth terms inside the bracket in (12) show $\alpha_{\text{ds}}$, $\alpha_{\text{gs}}$, and $\alpha_{\text{gd}}$, respectively. These losses should be mitigated by a proper design of the transistor (i.e., device size selection and layout). Fig. 7 shows the loss components versus the device width when the transistor operates at the 1-dB compression point at 190 GHz. The loss values are normalized to the transistor current-source power, $V_m^2/(2R_{\text{opt}})$. $\alpha_{\text{ds}}$ reduces $P_{\text{added}}$ by 35% and has the highest effect, so the circuit designer should minimize the layout parasitics between the drain and source nodes. Furthermore, $\alpha_{\text{gd}}$ and $\alpha_{\text{gs}}$ decrease $P_{\text{added}}$ by 26% and 12%, respectively. The total loss wastes 72%–76% of $P_{\text{added}}$ and is the main reason the PAE of the transistors is limited at the high end of mm-wave frequencies. It is worth mentioning that this loss increases with higher frequencies due to the changes in $R_{\text{gd}}$, $R_{\text{gs}}$, $R_{\text{ds}}$, and $L_{\text{ds}}$ of the transistor.

Fig. 7. Normalized loss of the device parasitics to the transistor current-source power, $V_m^2/(2R_{\text{opt}})$, for different transistor sizes in a 45-nm CMOS SOI ( $L_g = 40$ nm).
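The three bracketed loss terms of (12) can be evaluated with a few lines of code. The following is a minimal sketch with hypothetical function names. The parasitic values are the 32.4-µm entries of Table I, whereas `r_opt`, `v_m`, and the complex `g_m` are placeholders chosen only for illustration; the actual values must be read from Fig. 6 and Fig. 2 at the chosen compression level. Note that the drain-source term equals $R_{\text{opt}}\,\Re\{1/(R_{\text{ds}} + j\omega L_{\text{ds}})\}$.

```python
import math


def loss_fractions(r_opt, g_m, r_ds, l_ds, r_gs, r_gd, freq_hz):
    """Return (alpha_ds, alpha_gs, alpha_gd), the bracketed loss terms of (12).

    All terms are normalized to the current-source power V_m^2 / (2 R_opt).
    g_m is the complex (delayed) transconductance in siemens.
    """
    w = 2.0 * math.pi * freq_hz
    gm2 = abs(g_m) ** 2
    a_ds = r_opt * r_ds / (r_ds ** 2 + (w * l_ds) ** 2)
    a_gs = 1.0 / (gm2 * r_opt * r_gs)
    # g_m + conj(g_m) = 2 * Re(g_m)
    a_gd = (1.0 + 2.0 * r_opt * g_m.real + gm2 * r_opt ** 2) / (gm2 * r_opt * r_gd)
    return a_ds, a_gs, a_gd


def p_added(v_m, r_opt, g_m, r_ds, l_ds, r_gs, r_gd, freq_hz):
    """Added power in watts from (12)."""
    a_ds, a_gs, a_gd = loss_fractions(r_opt, g_m, r_ds, l_ds, r_gs, r_gd, freq_hz)
    return v_m ** 2 / (2.0 * r_opt) * (1.0 - a_ds - a_gs - a_gd)


# Table I parasitics for the 32.4-um device at 190 GHz.
R_DS, L_DS, R_GS, R_GD = 69.2, 46.7e-12, 168.7, 448.9
# Placeholders (NOT from the paper): load-line resistance, swing, compressed g_m.
r_opt, v_m, g_m = 40.0, 0.9, complex(0.035, -0.009)

alphas = loss_fractions(r_opt, g_m, R_DS, L_DS, R_GS, R_GD, 190e9)
p_add_w = p_added(v_m, r_opt, g_m, R_DS, L_DS, R_GS, R_GD, 190e9)
```

Sweeping `r_ds`, `r_gs`, or `r_gd` in such a sketch is one way to see which parasitic dominates for a candidate device size before replotting Fig. 7 for a different technology.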

### C. Design of the Stacked PA

A 32.4- $\mu\text{m}$ NMOS transistor is selected in this work to be studied in the stacked cell at 190 GHz. PAE, $P_{\text{out}}$, gain, and $Z_k$ of this cell are calculated using (2), (4), (7), and (11), and the results are shown in Fig. 8. As the input power and thus the voltage swing become larger, $Z_k$ should be increased to compensate for the high input voltage swing, avoid device failure, and preserve efficiency. Also, a higher $P_{\text{in}}$ leads to a larger output voltage swing and output power, whereas PAE remains constant, as expected from (4) and (12). Nevertheless, this is achieved at the cost of a lower gain, because a higher $Z_k$ (lower $C_k$ ) creates a stronger feedback from the output to the gate. The optimum load and input impedances and the required $C_k$ for the same transistor size are featured in Fig. 9(a). The real parts of $Z_L$ and $Z_{\text{in}}$ increase as $P_{\text{in}}$ becomes larger. This can potentially reduce matching losses because of the lower impedance transformation ratio.

As a measure of linearity, we study the compression level of the CG cell at 190 GHz. The operating power gain ( $G_p$ ) of the cell demonstrates the linear gain, and it is defined as the ratio of the power delivered to the load to the input power. $G_p$ is calculated using the $Z$ -parameters of the cell (see Appendix B), and both large-signal and small-signal ( $G_p$ ) gains for the chosen 32.4- $\mu\text{m}$ NMOS transistor are plotted in Fig. 9(b). The difference between the compressed gain and $G_p$ is the compression level, which is also illustrated in Fig. 9(b). Note that for each $P_{\text{in}}$, we adjust $C_k$, $Z_L$, and other parameters based on the derived design equations in Section IV-A to achieve the maximum PAE performance. The compression level decreases for higher $P_{\text{in}}$ levels, because a larger $P_{\text{in}}$ requires a smaller $C_k$ [Fig. 9(a)], which results in a stronger feedback and better linearity.

As an example, we apply our developed design methodology to build a three-stacked PA at 190 GHz, as shown in Fig. 8. To maximize the PAE of the PA, all cells should operate at their peak PAE [Fig. 9(a)]. The optimal design point of each cell in the chain is marked in Fig. 8. The chosen points are mapped into the PA circuit diagram. The power flow across the chain, compression level, optimum load and input impedances, and  $C_k$  values are also shown in the figure. The following steps are required to design a class-A stacked PA that uses the same transistor size and achieves its maximum PAE.

- 1) The optimum device size should be chosen to meet the target output power and gain requirements while minimizing the parasitic losses (i.e., $\alpha_{\text{ds}}$, $\alpha_{\text{gs}}$, and $\alpha_{\text{gd}}$ ) for highly efficient performance. Therefore, for a given technology, Fig. 7 should be replotted and used as a reference. Layout parasitics should also be minimized to mitigate the losses, especially $\alpha_{\text{ds}}$.
- 2) Due to the limited gain of the device at mm-wave frequencies and loss of IMNs, the gate node of the first transistor in the chain ( $M_1$ ) should be ac-grounded ( $Z_k \approx 0$ ) to achieve the maximum gain. From Fig. 8, the input power of  $M_1$  for  $Z_k = 0$  is 5.5 dBm. The output power of  $M_1$  determines the input power of the second cell ( $P_{\text{in},2}$ ), and similarly  $P_{\text{out},2} = P_{\text{in},3}$ . The blue arrows in Fig. 8 show the power flow across the cells and the optimal design point for each cell.
- 3) For the marked  $P_{\text{in}}$  points in Fig. 8,  $Z_k$  of each transistor is determined. This guarantees that all cells operate at their maximum PAE.
- 4) Compression level of each cell is determined from Fig. 9(b).
- 5)  $Z_L$  and  $Z_{\text{in}}$  of all cells are extracted from Fig. 9(a) for the specified input powers.
- 6) In the last step, we design the IMNs, which transform  $Z_L$  to  $Z_{\text{in}}$  of the next cell.

The first cell in the stacked PA provides the highest gain, while the last cell produces the lowest gain. Total gain (dB) of the stacked PA from (3) is also illustrated in Fig. 8, featuring a linear relationship with  $P_{\text{in}}$ . More cells can be added to the chain if a higher gain or output power is required.

$$P_{\text{added}} = \frac{V_m^2}{2R_{\text{opt}}} \left[ 1 - R_{\text{opt}} \left( \frac{R_{\text{ds}}}{R_{\text{ds}}^2 + \omega^2 L_{\text{ds}}^2} \right) - \frac{1}{|g_m|^2 R_{\text{opt}}} \left( \frac{1}{R_{\text{gs}}} \right) - \frac{1 + R_{\text{opt}}(g_m + g_m^*) + |g_m|^2 R_{\text{opt}}^2}{|g_m|^2 R_{\text{opt}}} \left( \frac{1}{R_{\text{gd}}} \right) \right] \quad (12)$$



Fig. 8. (a) Output power, PAE, gain, total gain, and  $Z_k$  versus the input power for a 32.4- $\mu\text{m}$  NMOS transistor with 40-nm channel length. Optimum design points for the case of three cascaded stacked cells are specified with vertical lines. (b) Schematic of the designed three-stacked PA with power levels and impedances at different points in the chain.



Fig. 9. (a) Load impedance, input impedance, and gate termination capacitor versus  $P_{in}$  for a 32.4- $\mu\text{m}$  NMOS transistor with 40-nm channel length, and (b) Compression Level (dB) of the stacked cell versus  $P_{in}$  for the same device size. Design parameters are extracted at 1-dB compression point at 190 GHz.

#### D. Lossy IMNs

In practice, IMNs are lossy and degrade the gain of the stacked cell. The IMN loss should not exceed the cell gain in Fig. 8, and therefore, there is a lower bound for  $C_k$ . One may need to tune the selected  $P_{in}$  levels and increase  $C_k$  values to

| Transistor             | $M_1$        | $M_2$        | $M_3$        |
|------------------------|--------------|--------------|--------------|
| $Z_L$                  | $10+j39.7$   | $14.2+j51.3$ | $18.4+j62.6$ |
| $Z_{in}$               | $48.4+j16.7$ | $84+j15.4$   | $119+j14.1$  |
| Compression Level (dB) | 1            | 0.7          | 0.5          |
| Gain (dB)              | 2.35         | 1.55         | 1.1          |
| Total Gain (dB)        | 2.35         | 3.9          | 5            |



Fig. 10. Output power, PAE, gain, total gain, and  $Z_k$  versus the input power for a 32.4- $\mu\text{m}$  NMOS transistor with 40-nm channel length. Optimum design points for the case of three cascaded stacked cells are specified with vertical lines assuming 0.6-dB IMN loss.

compensate for the loss of IMNs. Assuming 0.6-dB loss for the IMNs, the design procedure for Fig. 8 is repeated and the results are shown in Fig. 10. Due to IMN losses, the design points for the second and third cells are shifted to the left, and the total PAE is degraded. The loss becomes more dominant at higher input powers, degrading the PAE significantly.

## V. CIRCUIT DESIGN

In this work, a four-way stacked PA is designed and implemented in a 45-nm CMOS SOI process with eight metal layers and a trap-rich substrate. Thin-oxide floating-body NMOS FETs with 40-nm channel length are used. Significant effort has been made during the layout process to mitigate the impact of interconnect parasitics [16], [17], [18]. Table III summarizes the final layout information of the devices. Channel width ( $W$ ) is optimized to achieve the highest gain at the center frequency and maximum  $f_{max}$ . The widths per finger are in the range of 0.82–0.85  $\mu\text{m}$, and the gates are double-side contacts



Fig. 11. 3-D layout of a 32.4- $\mu\text{m}$  NMOS transistor with a 40-nm channel length.

TABLE III  
LAYOUT INFORMATION OF NMOS TRANSISTORS USED IN THE DESIGN

| Transistor size ( $\mu\text{m}$ ) | 20.4 | 32.4 | 42.7 |
|-----------------------------------|------|------|------|
| Multiplicity                      | 3    | 3    | 4    |
| Number of fingers                 | 8    | 13   | 13   |
| Channel Width (nm)                | 850  | 831  | 821  |

and relaxed-pitch to minimize the resistance. The 3-D layout of two 32.4- $\mu\text{m}$  NMOS transistors in parallel is presented in Fig. 11. The source and drain of the two transistors are connected in the sixth metal layer (OA), and the gates (G1 and G2) are terminated with gate termination capacitors realized in the bottom four layers (M<sub>1</sub>–C<sub>1</sub>) with an interdigitated structure to obtain high  $Q$  and capacitor density [16], [17], [18]. Two other transistors with 20.4- and 42.7- $\mu\text{m}$  sizes are laid out in a similar fashion. The 42.7- $\mu\text{m}$  transistor has an  $f_{\text{max}}$  of 275 GHz when referenced to the fourth metal layer (C<sub>1</sub>). All transmission lines (TLs) are implemented in the top 4.1- $\mu\text{m}$ -thick aluminum layer to minimize the loss. All passive networks realized in UA–LD were simulated using a 2.5-D electromagnetic (EM) simulator.
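The device sizes in Table III can be cross-checked from the finger count, multiplicity, and width per finger. The short Python sketch below does this, assuming that the "Channel Width (nm)" row lists the width of a single finger (consistent with the 0.82–0.85- $\mu\text{m}$  range quoted in the text); the helper name is hypothetical.

```python
# nominal size in um -> (number of fingers, multiplicity, width per finger in nm)
# Values copied from Table III; the per-finger interpretation is an assumption.
devices = {
    20.4: (8, 3, 850),
    32.4: (13, 3, 831),
    42.7: (13, 4, 821),
}


def total_width_um(fingers, multiplicity, finger_width_nm):
    """Total gate width in micrometers."""
    return fingers * multiplicity * finger_width_nm / 1000.0


for nominal_um, (nf, m, wf_nm) in devices.items():
    w_um = total_width_um(nf, m, wf_nm)
    print(f"nominal {nominal_um} um -> computed {w_um:.2f} um")
```

Under this assumption, the computed widths agree with the nominal 20.4-, 32.4-, and 42.7- $\mu\text{m}$  sizes to within rounding.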

The transistors in the stacked cells operate in class-A and are biased at  $J = 440 \mu\text{A}/\mu\text{m}$ ,  $V_{\text{GS}} = 0.6 \text{ V}$ , and  $V_{\text{DS}} = 1.2 \text{ V}$ . Fig. 12 presents the schematic of the PA unit. Four PA units are power combined to increase the output power. The PA unit includes a five-stacked output stage, preceded by three-stacked and two-stacked driver stages to improve the gain. The design details are discussed below.

#### A. Power Stage of the PA Unit

The last (power) stage of the PA unit is a five-stacked stage biased at  $V_{\text{DD3}} = 6 \text{ V}$ . It employs four cascaded CG cells and a common-source (CS) driver to achieve a targeted output power of 11 dBm from each PA unit. The CS cell improves the overall gain of the power stage and uses a  $2 \times 42.7 \mu\text{m}$  transistor. The chosen transistor size for the CS cell is larger than that of the CG cells (i.e.,  $2 \times 32.4 \mu\text{m}$ ) to ensure that it stays linear while the four cascaded CG cells compress. The gates of the two parallel transistors in each CG cell are terminated with 360-fF bypass capacitors, the largest capacitor value in the process whose self-resonance frequency is sufficiently above 190 GHz. As a result, the gain of the cell decreases by 0.3 dB [i.e., 3.1-dB small-signal gain from Fig. 9(b)] compared to that of the cell with the ideal ac-grounded gate. Moreover, the 360-fF capacitor has an impedance of  $(1 - j2.3) \Omega$



Fig. 12. Schematic of the PA unit.



Fig. 13. Power flow and realized impedances across the compressed CG cells in the power stage of the PA unit at 190 GHz.

at 190 GHz, and its ohmic loss causes an additional 0.3-dB gain drop. Therefore, each CG cell achieves a small-signal gain of 2.8 dB at 190 GHz. The CS cell and the following CG cell are designed for conjugate matching and deliver a  $P_{\text{in}}$  of 6.4 dBm to the third cell. Fig. 13 illustrates the power flow as well as the realized impedances across the last three cells, which operate in the compressed region.
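The reactive part of the quoted capacitor impedance and the resulting gain budget can be reproduced with a few lines of Python. This is only a sanity check under the assumption of an ideal capacitor reactance $-1/(\omega C)$; the 1- $\Omega$  series resistance and the two 0.3-dB penalties are taken from the text, not computed.

```python
import math


def capacitor_reactance_ohm(c_farad, f_hz):
    """Reactance of an ideal capacitor, -1/(2*pi*f*C), in ohms."""
    return -1.0 / (2.0 * math.pi * f_hz * c_farad)


x_c = capacitor_reactance_ohm(360e-15, 190e9)
print(f"X_C = {x_c:.2f} ohm")  # compare with the -j2.3 ohm quoted in the text

# Gain budget of one CG cell, using the numbers quoted in the text.
gain_finite_ck_db = 3.1   # small-signal gain with the 360-fF termination
ohmic_penalty_db = 0.3    # additional drop due to the capacitor's ohmic loss
gain_cg_db = gain_finite_ck_db - ohmic_penalty_db
print(f"CG cell gain = {gain_cg_db:.1f} dB")
```

The computed reactance is about $-2.3\ \Omega$, and the budget ends at the 2.8-dB small-signal gain stated for each CG cell.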

From Fig. 9(a),  $Z_L = (10.8 + j41.6)/2 = (5.4 + j20.8) \Omega$ , and thus the values of post-layout load impedances are in agreement with the theoretical counterparts. The third, fourth, and fifth cells of the stage operate at 0.5-, 0.9-, and 1-dB compression points, respectively (Figs. 9(b) and 13). IMNs are designed using a short stub and a series TL and introduce about 0.6-dB dissipative loss at 190 GHz. Further stacking requires  $C_k$  sizing (Fig. 8) and is avoided at 190 GHz because the gain of the added cell drops below 1 dB when the IMN loss is considered.

#### B. Driver Stages in the PA Unit

Two driver stages with similar architecture are added to achieve a targeted 15-dB small-signal gain from the PA unit. The second stage employs a CS cell as a driver and two-cascaded CG cells, while the first stage has a CS cell



Fig. 14. Block diagram of the four-way PA.



Fig. 15. Die photograph of the 200-GHz CMOS PA.

followed by a CG cell. The number of transistors cascaded in each stage, transistor sizes, and the required PA stages (three in this design) are chosen to provide enough gain and power. The gates are similarly terminated by 360-fF capacitors to obtain the highest gain from each CG cell, and all cells are designed for conjugate matching, having 2.6–3.3-dB small-signal gain. Transistor sizes and supply voltages are scaled down in the driver stages to improve the PAE and gain. The first and second stages are biased at  $V_{DD1} = 2.4$  V and  $V_{DD2} = 3.6$  V, respectively. Interstage matching networks (ISMNs) are used between the unit PA stages to provide conjugate matching.

#### C. Four-Way Combiner

Zero-degree power combining has lower loss than transformer-based and Wilkinson power combining at *G*-band [20] and is used to combine the output power of the four PA units. Fig. 14 shows the block diagram of the four-way PA. The PA uses the same network as the power splitter at the input. The power combiner is designed using 60- $\Omega$  TLs with a width of 6.5  $\mu$m. EM simulation of the combiner and pad shows 0.2-dB loss at 190 GHz. A shunt stub is employed to bias the last stages of the unit PAs, and it is absorbed in the OMN. The stub adds 1.6-dB extra loss to that of the power combiner. The total loss of the OMN is 1.8 dB, which degrades the PAE by 34%.
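The 34% figure follows directly from converting the 1.8-dB OMN loss to a linear power ratio. The sketch below makes this explicit; it assumes that PAE scales in proportion to the power delivered to the load (i.e., the input-power term in PAE is neglected), which is a simplification.

```python
def power_fraction_retained(loss_db):
    """Fraction of power that survives a passive loss given in dB."""
    return 10.0 ** (-loss_db / 10.0)


combiner_and_pad_loss_db = 0.2  # from EM simulation, as quoted in the text
bias_stub_loss_db = 1.6         # extra loss of the shunt bias stub
omn_loss_db = combiner_and_pad_loss_db + bias_stub_loss_db

degradation_percent = 100.0 * (1.0 - power_fraction_retained(omn_loss_db))
print(f"OMN loss = {omn_loss_db:.1f} dB -> PAE degradation = {degradation_percent:.1f}%")
```

With 1.8 dB of loss, about 66% of the generated power reaches the load, which corresponds to a relative degradation of roughly 34%.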

## VI. MEASUREMENT RESULTS

The *G*-band stacked PA is implemented in a 45-nm CMOS SOI process. Fig. 15 shows the die photo of the fabricated PA.



Fig. 16. Measured (solid) and simulated (dashed) S-parameters.

Fig. 17.  $\mu$ -factor based on the measured and simulated S-parameters.

The dimensions of the chip are  $430 \times 930 \mu\text{m}$  excluding the dc pads. The S-parameters of the PA were characterized by a Keysight N5247A PNA-X connected to two VDI WR-5.1 frequency extenders. Fig. 16 illustrates the measured and simulated S-parameters. The PA achieves 14.6-dB peak gain at 203.2 GHz with a 3-dB bandwidth of 9.4 GHz from 197.4 to 206.8 GHz. The maximum gain is 1.4 dB lower than the simulated gain. The measured center frequency is shifted toward higher frequencies by about 5.3% compared to the simulation. This shift is likely due to imprecise high-frequency transistor modeling and inaccuracies in the EM simulation of the passive components. To reduce the simulation time, we divided the simulation into three distinct parts: the PA unit, the input network, and the OMN. However, this approach does not account for the coupling among the PA units or between the PA units and the matching networks, which can introduce inaccuracies in the frequency response. The measured input and output reflection coefficients remain lower than  $-9$  dB across the bandwidth. The stability of the PA is assessed using the  $\mu$ -factor. Fig. 17 shows the  $\mu$ -factor of the PA calculated from the measured and simulated S-parameters. The PA is unconditionally stable.
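For reference, the  $\mu$ -factor can be evaluated from two-port S-parameters with the standard geometric stability expression  $\mu = (1 - |S_{11}|^2) / (|S_{22} - \Delta S_{11}^*| + |S_{12}S_{21}|)$ , where  $\Delta = S_{11}S_{22} - S_{12}S_{21}$ ; unconditional stability corresponds to  $\mu > 1$ . The Python sketch below implements this expression. The S-parameter values are hypothetical placeholders chosen for illustration and are not the measured data of this PA.

```python
def mu_factor(s11, s12, s21, s22):
    """Geometric stability factor of a two-port at a single frequency."""
    delta = s11 * s22 - s12 * s21
    numerator = 1.0 - abs(s11) ** 2
    denominator = abs(s22 - delta * s11.conjugate()) + abs(s12 * s21)
    return numerator / denominator


# Hypothetical single-frequency S-parameters (linear, complex), NOT measured data.
s11 = 0.2 + 0.1j
s12 = 0.01 + 0.0j
s21 = 5.0 + 0.0j
s22 = 0.1 - 0.2j

mu = mu_factor(s11, s12, s21, s22)
print(f"mu = {mu:.2f} -> {'unconditionally stable' if mu > 1 else 'potentially unstable'}")
```

In practice the function would be evaluated at every frequency point of the measured data set, and the minimum value over the band would be compared against unity.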

Fig. 18 shows the large-signal measurement setup. The measurement was done in four steps to calibrate the loss of the probes and S-bends. The initial measurement setup consisted of an Anritsu signal generator, followed by a VDI WR-5.1 frequency extender and a VDI WR4.3 amplifier. The amplifier was connected directly to an Erickson PM5 power meter, which measured its output power in the 190–205-GHz range.

TABLE IV  
COMPARISON BETWEEN STATE-OF-THE-ART PAs OPERATING AROUND 200 GHz

| Reference                                          | [19]         | [20]                      | [21]         | [22]        | [23]                      | [24]          | <b>This Work</b>     |
|----------------------------------------------------|--------------|---------------------------|--------------|-------------|---------------------------|---------------|----------------------|
| Technology                                         | 130nm BiCMOS | 130nm BiCMOS              | 130nm BiCMOS | 65nm CMOS   | 65nm CMOS                 | 250nm InP HBT | <b>45nm SOI CMOS</b> |
| Process $f_{\max}$ (GHz)                           | 370          | 500                       | 450          | 395         | 400                       | 650           | <b>355</b>           |
| 3-dB Bandwidth (GHz)                               | 200-220      | 200-255                   | 211-263      | 227.5-257.5 | 195-209                   | 190-210       | <b>197.4-206.8</b>   |
| $f_0/f_{\max}$                                     | 0.57         | 0.45                      | 0.52         | 0.61        | 0.5                       | 0.31          | <b>0.56</b>          |
| Gain (dB)                                          | 25           | 12.5 (15.5 <sup>†</sup> ) | 16.4         | 13.9        | 19.5 (22.5 <sup>†</sup> ) | 22            | <b>14.6</b>          |
| OP1dB (dBm)                                        | 4            | 9 (10.5 <sup>†</sup> )    | 11.5         | -5.1        | 6.3 (7.8 <sup>*</sup> )   | 17.4          | <b>12.1</b>          |
| $P_{\text{sat}}$ (dBm)                             | 9.6          | 12 (13.5 <sup>†</sup> )   | 14.7         | -3.3        | 9.4 (10.9 <sup>*</sup> )  | 18.5          | <b>18.7</b>          |
| PAE <sub>max</sub> (%)                             | 0.5          | 2.14 (3 <sup>‡</sup> )    | 3.13         | 1.6         | 1.03 (1.4 <sup>*</sup> )  | 8.5           | <b>4.84</b>          |
| DC Power (mW)                                      | 1824*        | 740                       | 924          | 23.8        | 732                       | 814           | <b>1398</b>          |
| Active Size (mm <sup>2</sup> )                     | 0.84         | 0.83                      | 0.221        | 0.053       | 0.92 (0.52 <sup>‡</sup> ) | 0.61*         | <b>0.4</b>           |
| $P_{\text{sat}}/\text{Area}$ (mW/mm <sup>2</sup> ) | 10.9         | 19.1 (27 <sup>†</sup> )   | 133.5        | 8.8         | 9.5 (23.7 <sup>†</sup> )  | 116.1         | <b>185.3</b>         |
| FOM <sup>△</sup>                                   | 39.7         | 38.3 (44.3 <sup>†</sup> ) | 47.6         | 23.2        | 34.6 (40.4 <sup>†</sup> ) | 52.6          | <b>45</b>            |

\*Loss of the output balun/power combiner is excluded.

<sup>†</sup>Loss of the input and output balun/power combiner is excluded.

<sup>‡</sup>Active area, excluding the area occupied by input and output baluns and DC pads.

<sup>\*</sup>Calculated from the reported numbers and/or graphs.

<sup>○</sup>Maximum drain efficiency is reported.

<sup>△</sup>FOM =  $P_{\text{sat}}[\text{dBm}] + \text{Gain}[\text{dB}] + 10\log(\text{PAE}_{\text{MAX}}[\%]) + 20\log(\text{Freq}/f_{\max}) + 10\log(\text{BW})$
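The FOM and power-density entries of Table IV can be recomputed from the other rows. The Python sketch below does so for the "This Work" column; it assumes that BW in the FOM definition is expressed in gigahertz (the footnote does not state the unit) and that power density uses the active area. The function names are illustrative.

```python
import math


def pa_fom(p_sat_dbm, gain_db, pae_percent, f0_over_fmax, bw_ghz):
    """FOM as defined in the Table IV footnote (BW assumed to be in GHz)."""
    return (p_sat_dbm + gain_db
            + 10.0 * math.log10(pae_percent)
            + 20.0 * math.log10(f0_over_fmax)
            + 10.0 * math.log10(bw_ghz))


def power_density_mw_per_mm2(p_sat_dbm, area_mm2):
    """Saturated output power (converted to mW) divided by area."""
    return 10.0 ** (p_sat_dbm / 10.0) / area_mm2


# "This Work" column of Table IV.
fom = pa_fom(p_sat_dbm=18.7, gain_db=14.6, pae_percent=4.84,
             f0_over_fmax=0.56, bw_ghz=206.8 - 197.4)
density = power_density_mw_per_mm2(18.7, 0.4)

print(f"FOM = {fom:.1f}")
print(f"P_sat/Area = {density:.1f} mW/mm^2")
```

Under these assumptions the sketch yields an FOM of about 44.8 and a power density of about 185.3 mW/mm<sup>2</sup>, in line with the tabulated 45 and 185.3.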



Fig. 18. Large-signal measurement setup.

Cascade i-220 GSG probes and S-bends were added to the setup. The probes landed on the through line of a WR5 calibration kit and the output power was measured again across the same frequency range. Comparing the two data sets, the combined probe and S-bend losses were extracted. Next, the output power of the PA was measured in the 193–205-GHz band for different input power levels. Finally, the PA's input and output probes and S-bends were swapped, and the previous step was repeated to calibrate the loss difference between the input and output probes and S-bends.

The output power and power gain of the PA were calculated by de-embedding all losses, and the results are shown in Fig. 19. The PA achieves a peak saturated output power of 18.7 dBm with an associated PAE of 4.8% and a power gain of 10.9 dB at 200 GHz. The results are consistent with their simulation counterparts at 188 GHz.  $P_{\text{sat}}$  remains within 2 dB of its maximum from 198 to 202.9 GHz, as shown in Fig. 20, and follows a trend similar to its simulation counterpart apart from the frequency shift. The OP1dB is 12.1 dBm at 200 GHz and remains higher than 10 dBm over 193–205 GHz.
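The reported PAE can be cross-checked from the saturated output power, the associated power gain, and the dc power using  $\text{PAE} = (P_{\text{out}} - P_{\text{in}})/P_{\text{dc}}$ . The sketch below assumes that the 1398-mW dc power listed in Table IV applies at the saturated operating point, which the text does not state explicitly.

```python
def dbm_to_mw(p_dbm):
    """Convert a power level from dBm to milliwatts."""
    return 10.0 ** (p_dbm / 10.0)


def pae_percent(p_out_dbm, power_gain_db, p_dc_mw):
    """Power-added efficiency in percent."""
    p_out_mw = dbm_to_mw(p_out_dbm)
    p_in_mw = dbm_to_mw(p_out_dbm - power_gain_db)
    return 100.0 * (p_out_mw - p_in_mw) / p_dc_mw


# Measured values at 200 GHz; the dc power is taken from Table IV (assumption).
pae = pae_percent(p_out_dbm=18.7, power_gain_db=10.9, p_dc_mw=1398.0)
print(f"PAE = {pae:.2f}%")
```

The result is close to the reported 4.8%; the small residual difference is within the rounding of the quoted power, gain, and dc-power values.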



Fig. 19. Measured (marked) and simulated (dotted) power gain and output power of the PA at 200 GHz.



Fig. 20. Simulated (dashed) and measured (solid) saturated output power and OP1dB versus frequency.

The discrepancy in OP1dB between measurement and simulation at frequencies lower than 195 GHz is due to the lower measured gain compared to its simulation counterpart, which makes OP1dB and  $P_{sat}$  very close to each other at 193 GHz.

Table IV summarizes the performance of the state-of-the-art PAs at  $G$ -band. To the authors' knowledge, this work demonstrates record  $P_{sat}$ , PAE, and power density ( $P_{sat}/\text{Area}$ ) among the silicon-based PAs at 200 GHz. The PA operates at  $0.56f_{\max}$ , showing excellent performance in the close-to- $f_{\max}$  region where device gain is limited.

## VII. CONCLUSION

A novel methodology is developed in this article to predict the power gain, output power, PAE, and compression level of mm-wave stacked PAs. The proposed approach determines the optimum design point of the PA to obtain maximum efficiency and output power. A 200-GHz CMOS stacked PA was designed and implemented in 45-nm CMOS SOI based on the proposed technique. The presented PA achieves record  $P_{sat}$  and PAE performance at close-to- $f_{\max}$  frequencies (200 GHz) in silicon. The PA occupies 0.4 mm<sup>2</sup> active die area and features the highest power density ( $P_{sat}/\text{Area}$ ) among the state-of-the-art CMOS, SiGe BiCMOS, and InP HBT PAs.

## APPENDIX A DESIGN EQUATIONS OF THE STACKED CELL

From (5) and (6), the input current ( $I_{in}$ ), the input voltage ( $V_{in}$ ), and the current flowing to the gate termination ( $I_k$ ) can be derived as follows:

$$I_{in} = V_m \left( \frac{1}{R_{\text{opt}}} - \frac{1}{Z_{ds}} + \frac{1}{g_m R_{\text{opt}} Z_{gs}} \right) \quad (13)$$

$$V_{in} = \frac{V_m}{g_m R_{\text{opt}}} \left( 1 + \frac{Z_k}{Z_{gs}} + \frac{Z_k}{Z_{gd}} + \frac{Z_k g_m R_{\text{opt}}}{Z_{gd}} \right) \quad (14)$$

and

$$I_k = \frac{V_m}{g_m R_{\text{opt}}} \left( \frac{1}{Z_{gs}} + \frac{1 + g_m R_{\text{opt}}}{Z_{gd}} \right) \quad (15)$$

wherein

$$Z_{gs} = \frac{R_{gs}}{1 + s C_{gs} R_{gs}} \quad (16a)$$

$$Z_{gd} = \frac{R_{gd}}{1 + s C_{gd} R_{gd}} \quad (16b)$$

$$Z_{ds} = \frac{R_{ds} + s L_{ds}}{1 + s C_{ds} R_{ds} + s^2 L_{ds} C_{ds}}. \quad (16c)$$
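Equations (13)–(16) are straightforward to evaluate numerically. The Python sketch below is a minimal illustration, not the authors' implementation: all element values ( $R_{gs}$ ,  $C_{gs}$ ,  $R_{gd}$ ,  $C_{gd}$ ,  $R_{ds}$ ,  $L_{ds}$ ,  $C_{ds}$ ,  $g_m$ ,  $R_{\text{opt}}$ ,  $V_m$ , and  $Z_k$ ) are hypothetical placeholders rather than extracted model parameters, and the input impedance is simply taken as the ratio of (14) to (13).

```python
import math


def z_parallel_rc(r, c, s):
    """Eq. (16a)/(16b): resistor in parallel with a capacitor."""
    return r / (1 + s * c * r)


def z_ds_model(r_ds, l_ds, c_ds, s):
    """Eq. (16c): series R-L branch in parallel with a capacitor."""
    return (r_ds + s * l_ds) / (1 + s * c_ds * r_ds + s ** 2 * l_ds * c_ds)


def stacked_cell_signals(v_m, g_m, r_opt, z_gs, z_gd, z_ds, z_k):
    """Eqs. (13)-(15): input current, input voltage, gate-termination current."""
    i_in = v_m * (1 / r_opt - 1 / z_ds + 1 / (g_m * r_opt * z_gs))
    v_in = v_m / (g_m * r_opt) * (1 + z_k / z_gs + z_k / z_gd
                                  + z_k * g_m * r_opt / z_gd)
    i_k = v_m / (g_m * r_opt) * (1 / z_gs + (1 + g_m * r_opt) / z_gd)
    return i_in, v_in, i_k


# Hypothetical placeholder values (NOT extracted from the 45-nm SOI device).
s = 1j * 2 * math.pi * 190e9
z_gs = z_parallel_rc(500.0, 20e-15, s)
z_gd = z_parallel_rc(2000.0, 8e-15, s)
z_ds = z_ds_model(150.0, 20e-12, 10e-15, s)
v_m, g_m, r_opt = 1.0, 0.04, 30.0
z_k = 1 / (s * 100e-15)  # gate termination modeled as an ideal 100-fF capacitor

i_in, v_in, i_k = stacked_cell_signals(v_m, g_m, r_opt, z_gs, z_gd, z_ds, z_k)
z_in = v_in / i_in
print(f"Z_in = {z_in.real:.1f} {z_in.imag:+.1f}j ohm")
```

A useful self-check is that (14) and (15) together satisfy  $V_{in} = V_m/(g_m R_{\text{opt}}) + Z_k I_k$ , i.e., the input voltage is the gate-source voltage plus the drop across the gate termination.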

## APPENDIX B Z-PARAMETERS OF THE STACKED CELL

The Z-parameters of the CG stacked cell are derived using the model presented in Fig. 5(b) as follows:

$$Z_{11} = Z_k + Z_{gs} \parallel \left( \frac{Z_{ds} + Z_{gd}}{1 + g_m Z_{ds}} \right) \quad (17a)$$

$$Z_{12} = Z_k + \frac{Z_{gs} Z_{gd}}{g_m Z_{ds} Z_{gs} + Z_{ds} + Z_{gs} + Z_{gd}} \quad (17b)$$

$$Z_{21} = Z_k + \frac{Z_{gs} Z_{gd} (g_m Z_{ds} + 1)}{g_m Z_{ds} Z_{gs} + Z_{ds} + Z_{gs} + Z_{gd}} \quad (17c)$$

$$Z_{22} = Z_k + Z_{gd} \parallel (Z_{ds} + Z_{gs} + g_m Z_{ds} Z_{gs}). \quad (17d)$$
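The expressions in (17) can be coded directly. The Python sketch below evaluates them for a set of hypothetical impedance values (placeholders, not device data) and relies on a small helper for the parallel operator  $\parallel$ ; the function names are illustrative.

```python
def par(a, b):
    """Parallel combination of two impedances."""
    return a * b / (a + b)


def cg_cell_z_params(g_m, z_gs, z_gd, z_ds, z_k):
    """Eqs. (17a)-(17d): Z-parameters of the CG stacked cell."""
    d = g_m * z_ds * z_gs + z_ds + z_gs + z_gd  # denominator shared by (17b), (17c)
    z11 = z_k + par(z_gs, (z_ds + z_gd) / (1 + g_m * z_ds))
    z12 = z_k + z_gs * z_gd / d
    z21 = z_k + z_gs * z_gd * (g_m * z_ds + 1) / d
    z22 = z_k + par(z_gd, z_ds + z_gs + g_m * z_ds * z_gs)
    return z11, z12, z21, z22


# Hypothetical placeholder impedances in ohms (NOT extracted device data).
g_m = 0.04
z_gs = 3.0 - 40.0j
z_gd = 5.0 - 100.0j
z_ds = 60.0 - 45.0j
z_k = 1.0 - 2.3j

z11, z12, z21, z22 = cg_cell_z_params(g_m, z_gs, z_gd, z_ds, z_k)
for name, z in zip(("Z11", "Z12", "Z21", "Z22"), (z11, z12, z21, z22)):
    print(f"{name} = {z.real:.2f} {z.imag:+.2f}j ohm")
```

Two properties visible in (17) serve as quick checks on such an implementation: the transfer terms satisfy  $Z_{21} - Z_k = (1 + g_m Z_{ds})(Z_{12} - Z_k)$ , and setting  $g_m = 0$  makes the network reciprocal ( $Z_{12} = Z_{21}$ ).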

## ACKNOWLEDGMENT

The authors would like to thank the National Science Foundation (NSF), Intel Corporation, and GlobalFoundries for support.

## REFERENCES

- [1] J. F. Buckwalter et al., "Prospects for high-efficiency silicon and III-V power amplifiers and transmitters in 100–300 GHz bands," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Apr. 2021, pp. 1–7.
- [2] H. Afzal, C. Li, and O. Momeni, "A 17 Gb/s 10.7 pJ/b 4FSK transceiver system for point to point communication in 65 nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2022, pp. 71–74, doi: [10.1109/RFIC54546.2022.9863100](https://doi.org/10.1109/RFIC54546.2022.9863100).
- [3] A. S. H. Ahmed, U. Soylu, M. Seo, M. Urteaga, and M. J. W. Rodwell, "A compact H-band power amplifier with high output power," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2021, pp. 123–126, doi: [10.1109/RFIC51843.2021.9490426](https://doi.org/10.1109/RFIC51843.2021.9490426).
- [4] A. Agah, H. Dabag, B. Hanafi, P. Asbeck, L. Larson, and J. Buckwalter, "A 34% PAE, 18.6 dBm 42–45 GHz stacked power amplifier in 45 nm SOI CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits Symp.*, Jun. 2012, pp. 57–60, doi: [10.1109/RFIC.2012.6242231](https://doi.org/10.1109/RFIC.2012.6242231).
- [5] H.-T. Dabag, B. Hanafi, F. Golcuk, A. Agah, J. F. Buckwalter, and P. M. Asbeck, "Analysis and design of stacked-FET millimeter-wave power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 4, pp. 1543–1556, Apr. 2013, doi: [10.1109/TMTT.2013.2247698](https://doi.org/10.1109/TMTT.2013.2247698).
- [6] M. H. Montaseri, J. Aikio, T. Rahkonen, and A. Pärssinen, "Optimum number of transistors in stacked CMOS millimeter-wave power amplifiers," in *Proc. IEEE Int. Symp. Circuits Syst. (ISCAS)*, May 2018, pp. 1–4, doi: [10.1109/ISCAS.2018.8351160](https://doi.org/10.1109/ISCAS.2018.8351160).
- [7] A. S. H. Ahmed, A. A. Farid, M. Urteaga, and M. J. W. Rodwell, "204 GHz stacked-power amplifiers designed by a novel two-port technique," in *Proc. 13th Eur. Microw. Integr. Circuits Conf. (EuMIC)*, Sep. 2018, pp. 29–32, doi: [10.23919/EuMIC.2018.8539884](https://doi.org/10.23919/EuMIC.2018.8539884).
- [8] G. Crupi, D. M. M.-P. Schreurs, A. Raffo, A. Caddemi, and G. Vannini, "A new millimeter-wave small-signal modeling approach for pHEMTs accounting for the output conductance time delay," *IEEE Trans. Microw. Theory Techn.*, vol. 56, no. 4, pp. 741–746, Apr. 2008, doi: [10.1109/TMTT.2008.918147](https://doi.org/10.1109/TMTT.2008.918147).
- [9] I. Man Kang and H. Shin, "Non-quasi-static small-signal modeling and analytical parameter extraction of SOI FinFETs," *IEEE Trans. Nanotechnol.*, vol. 5, no. 3, pp. 205–210, May 2006.
- [10] S. Qiao, H. Lv, Y. Zhang, Y. Zhang, and P. Ding, "An improved InP HEMT small signal model with RC network," in *Proc. IEEE 13th Int. Conf. ASIC (ASICON)*, Oct. 2019, pp. 1–4, doi: [10.1109/ASICON47005.2019.8983689](https://doi.org/10.1109/ASICON47005.2019.8983689).
- [11] W. Zhang, S. Yin, W. Hu, and Y. Wang, "Novel physics-based small-signal modeling and characterization for advanced RF bulk FinFETs," *IEEE Trans. Electron Devices*, vol. 68, no. 5, pp. 2160–2166, May 2021, doi: [10.1109/TED.2021.3063211](https://doi.org/10.1109/TED.2021.3063211).
- [12] A. Ahmed. (2020). *High-Efficiency Millimeter-Wave Power Amplifiers and Packaging Design*. UC Santa Barbara. [Online]. Available: <https://escholarship.org/uc/item/5zz697g4>
- [13] J. Moon, J. Kim, and B. Kim, "Investigation of a class-J power amplifier with a nonlinear  $C_{out}$  for optimized operation," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 11, pp. 2800–2811, Nov. 2010.
- [14] A. Alizadeh and A. Medi, "A broadband integrated class-J power amplifier in GaAs pHEMT technology," *IEEE Trans. Microw. Theory Techn.*, vol. 64, no. 6, pp. 1822–1830, Jun. 2016.
- [15] S. C. Cripps, *RF Power Amplifiers for Wireless Communications*, 2nd ed. Norwell, MA, USA: Artech House, 2006.
- [16] L. Gao, Q. Ma, and G. M. Rebeiz, "A 4.7 mW W-band LNA with 4.2 dB NF and 12 dB gain using drain to gate feedback in 45 nm CMOS RFSOI technology," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2018, pp. 280–283.
- [17] O. Inac, M. Uzunkol, and G. M. Rebeiz, "45-nm CMOS SOI technology characterization for millimeter-wave applications," *IEEE Trans. Microw. Theory Techn.*, vol. 62, no. 6, pp. 1301–1311, Jun. 2014.
- [18] M. Sayginer and G. M. Rebeiz, "A W-band LNA/phase shifter with 5-dB NF and 24-mW power consumption in 32-nm CMOS SOI," *IEEE Trans. Microw. Theory Techn.*, vol. 66, no. 4, pp. 1973–1982, Apr. 2018.
- [19] N. Sarmah, K. Aufinger, R. Lachner, and U. R. Pfeiffer, "A 200–225 GHz SiGe power amplifier with peak $P_{\text{sat}}$ of 9.6 dBm using wideband power combination," in *Proc. 42nd Eur. Solid-State Circuits Conf.*, Sep. 2016, pp. 193–196, doi: [10.1109/ESSCIRC.2016.7598275](https://doi.org/10.1109/ESSCIRC.2016.7598275).

- [20] M. H. Eissa and D. Kissinger, "A 13.5 dBm fully integrated 200-to-255 GHz power amplifier with a 4-way power combiner in SiGe:C BiCMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2019, pp. 82–84, doi: [10.1109/ISSCC.2019.8662424](https://doi.org/10.1109/ISSCC.2019.8662424).
- [21] J. Yu et al., "A 211-to-263-GHz dual-LC-tank-based broadband power amplifier with 14.7-dBm PSAT and 16.4-dB peak gain in 130-nm SiGe BiCMOS," *IEEE J. Solid-State Circuits*, vol. 58, no. 2, pp. 332–344, Feb. 2023, doi: [10.1109/JSSC.2022.3192043](https://doi.org/10.1109/JSSC.2022.3192043).
- [22] D.-W. Park, D. R. Utomo, B. H. Lam, S.-G. Lee, and J.-P. Hong, "A 230–260-GHz wideband and high-gain amplifier in 65-nm CMOS based on dual-peak  $G_{\max}$ -core," *IEEE J. Solid-State Circuits*, vol. 54, no. 6, pp. 1613–1623, Jun. 2019, doi: [10.1109/JSSC.2019.2899515](https://doi.org/10.1109/JSSC.2019.2899515).
- [23] H. Bameri and O. Momeni, "A 200-GHz power amplifier with a wideband balanced slot power combiner and 9.4-dBm $P_{\text{sat}}$ in 65-nm CMOS: Embedded power amplification," *IEEE J. Solid-State Circuits*, vol. 56, no. 11, pp. 3318–3330, Nov. 2021, doi: [10.1109/JSSC.2021.3091546](https://doi.org/10.1109/JSSC.2021.3091546).
- [24] A. S. H. Ahmed, U. Soylu, M. Seo, M. Urteaga, J. F. Buckwalter, and M. J. W. Rodwell, "A 190–210 GHz power amplifier with 17.7–18.5dBm output power and 6.9–8.5% PAE," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Jun. 2021, pp. 787–790, doi: [10.1109/IMS19712.2021.9574925](https://doi.org/10.1109/IMS19712.2021.9574925).



**Ali M. Niknejad** (Fellow, IEEE) received the B.S.E.E. degree from the University of California at Los Angeles, Los Angeles, CA, USA, in 1994, and the master's and Ph.D. degrees in electrical engineering from the University of California at Berkeley (UC Berkeley), Berkeley, CA, in 1997 and 2000, respectively.

He is currently a Professor with the EECS Department, UC Berkeley, the Faculty Director of the Berkeley Wireless Research Center (BWRC), Berkeley, and the Associate Director of the ComSenseTer, Santa Barbara, CA, USA, a multiuniversity center for converged terahertz communications and sensing. His research interests include wireless and broadband communications, biomedical imaging and sensors, integrated circuit technology (analog, RF, mixed signal, and mm-wave), device physics and compact modeling, and applied electromagnetics.

Prof. Niknejad and his coauthors received the 2017 IEEE Transactions on Circuits and Systems—I: Regular Papers Darlington Best Paper Award, the 2017 Most Frequently Cited Paper Award (2010–2016) at the Symposium on VLSI Circuits, and the CICC 2015 Best Invited Paper Award. He was a recipient of the 2012 ASEE Frederick Emmons Terman Award for his textbook on electromagnetics and RF integrated circuits. He was a co-recipient of the 2013 Jack Kilby Award for Outstanding Student Paper for his work on an efficient Quadrature Digital Spatial Modulator at 60 GHz, the 2010 Jack Kilby Award for Outstanding Student Paper for his work on a 90-GHz pulser with 30 GHz of bandwidth for medical imaging, and the Outstanding Technology Directions Paper at ISSCC 2004 for co-developing a modeling approach for devices up to 65 GHz. He is also a Co-Founder of HMicron and an inventor of the REACH technology, which has the potential to deliver robust wireless solutions to the healthcare industry, and a Co-Founder of RF Pixels, a 5G technology startup.



**Saleh Hassanzadehyamchi** (Graduate Student Member, IEEE) received the B.Sc. degree from the Sharif University of Technology, Tehran, Iran, in 2017. He is currently pursuing the Ph.D. degree with the University of California at Davis, Davis, CA, USA.

His research interests include mm-wave and terahertz integrated circuits as well as system on chips.

Mr. Hassanzadehyamchi was a recipient of the U.C. Davis Graduate Fellowship Award in 2017.



**Omeed Momeni** (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the Isfahan University of Technology, Isfahan, Iran, in 2002, the M.S. degree in electrical engineering from the University of Southern California, Los Angeles, CA, USA, in 2006, and the Ph.D. degree in electrical engineering from Cornell University, Ithaca, NY, USA, in 2011.

He joined the faculty of Electrical and Computer Engineering Department, University of California at Davis, Davis, CA, in 2011 and is currently an Associate Professor. He was a Visiting Professor with the Electrical Engineering and Computer Science Department, University of California, Irvine, Irvine, CA, from 2011 to 2012. From 2004 to 2006, he was with the National Aeronautics and Space Administration (NASA), Jet Propulsion Laboratory (JPL), Pasadena, CA, as an RFIC Designer. His research interests include mm-wave and terahertz integrated circuits and systems.

Prof. Momeni has served as an Associate Editor for the IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS (MWCL) since 2021, and as a Technical Program Committee (TPC) member of the Radio Frequency Integrated Circuits (RFIC) Symposium since 2018. He has also served as a Distinguished Lecturer for Solid-State Circuits Society (SSCS) in 2020–2022, an Associate Editor of the *Transactions on Microwave Theory and Techniques* (TMTT) in 2018–2020, a Steering Committee Member (2020), a Technical Program Review Committee Member (2017–2020) of the International Microwave Symposium (IMS), an Organizing Committee Member of the IEEE International Workshop on Design Automation for Analog and Mixed-Signal Circuits in 2013, and the Chair of the IEEE Ithaca GOLD section in 2008–2011. He was a recipient of the UC Davis Graduate Program Advising and Mentoring Award in 2022, the National Science Foundation CAREER Award in 2015, the Professor of the Year 2014 by IEEE at UC Davis, the Best Ph.D. Thesis Award from the Cornell ECE Department in 2011, the Outstanding Graduate Award from Association of Professors and Scholars of Iranian Heritage (APSIH) in 2011, the Best Student Paper Award at the IEEE Workshop on Microwave Passive Circuits and Filters in 2010, the Cornell University Jacob's fellowship in 2007, and the NASA-JPL fellowship in 2003.



**Amirreza Alizadeh** (Member, IEEE) received the Ph.D. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2017.

From 2018 to 2019, he was an Assistant Professor with the Department of Electrical Engineering, Amir-Kabir University of Technology (Tehran Polytechnic), Tehran. From 2019 to 2021, he was a Post-Doctoral Research Scholar with the School of Electrical, Computer and Energy Engineering, Arizona State University, Tempe, AZ, USA, where he taught courses and performed research on full-duplex radios and phase noise of injection locked VCOs. From 2021 to 2023, he was a Staff Scientist with the Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA, USA where he worked on realization of 210- and 280-GHz communication links. He is currently a Research and Development IC Design Engineer with Keysight Technologies, Santa Rosa, CA, USA. His current research interests include RF and mm-wave integrated circuits and systems, mm-wave antennas in package, and mm-wave module design and packaging.
