

# A Compact Single-Ended Common-Base Doherty PA in 90-nm BiCMOS With 37.3% Peak PAE for 5G Beamforming Arrays

Mohammad Ali Mokri<sup>✉</sup>, *Graduate Student Member, IEEE*, Emad Afshar<sup>✉</sup>, *Graduate Student Member, IEEE*, Md Aminul Hoque<sup>✉</sup>, *Student Member, IEEE*, Soodeh Miraslani<sup>✉</sup>, Shahriar Mirabbasi, *Senior Member, IEEE*, and Deukhyoun Heo<sup>✉</sup>, *Fellow, IEEE*

**Abstract**—This article presents a back-off efficient power amplifier (PA) for mm-wave 5G and upcoming 6G beamforming phased array transceivers (PATs), incorporating advanced circuit designs and novel implementations in both passive and active components. Conventional back-off efficient PAs in the mm-wave frequency range occupy a large chip area, making it hard to fit them into PATs. To overcome this issue, we propose a compact back-off efficient Doherty PA (DPA) with a common base (CB) structure as the core of the PA and small low-loss passive elements. In addition, the proposed architecture moves the role of the input hybrid coupler to the interstage matching network while maintaining DPA functionality. The interstage matching provides the required phases for the main and auxiliary PAs, power division, and impedance matching. The PA prototype is fabricated in the GlobalFoundries 90-nm BiCMOS (9 HP) process. It achieves a peak gain of 20.4 dB at 28.45 GHz with a 1-dB bandwidth of 4.45 GHz. Under large-signal conditions, it achieves $>19.5$ dBm $P_{\text{sat}}$ with $>36\%$ PAE<sub>sat</sub>. Its $P_{1\text{dB}}$ at 26, 28, and 30 GHz are 19.4, 19.3, and 19.3 dBm with 38.5%, 37.3%, and 36.8% PAE<sub>1dB</sub>, respectively. In the 6-dB power back-off region, it reaches efficiencies of 29.1%, 31.1%, and 29.3% at 26, 28, and 30 GHz, respectively. When tested with the NR-FR2 test model at these frequencies, the PA achieves $P_{\text{avg}}$ of 8.25, 8.45, and 8 dBm, and PAE<sub>avg</sub> of 13.9%, 14.5%, and 13.7% for a 400-MHz 1-CC 64-QAM signal, maintaining an rms error vector magnitude (EVM<sub>rms</sub>) of $-25.8$, $-25.8$, and $-25.7$ dB. In addition, in adjacent channel power ratio (ACPR) tests, the PA achieves $-27$, $-26.2$, and $-30.8$ dBc on the lower-side channels and $-28.4$, $-28.5$, and $-27.6$ dBc on the upper-side channels at 26, 28, and 30 GHz, respectively.

**Index Terms**—Beamforming transceivers, Doherty power amplifier (DPA), 5G FR2, load modulation, mm-wave amplifier.

Manuscript received 17 April 2024; revised 12 June 2024; accepted 28 June 2024. Date of publication 5 September 2024; date of current version 7 January 2025. This work was supported in part by the U.S. National Science Foundation under Grant CNS-1955306, Grant EECS-2030159, and Grant CDADIC. (Corresponding author: Deukhyoun Heo.)

Mohammad Ali Mokri, Emad Afshar, Md Aminul Hoque, and Deukhyoun Heo are with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99163 USA (e-mail: dheo@wsu.edu).

Soodeh Miraslani resides in Colorado Springs, CO 80920 USA.

Shahriar Mirabbasi is with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC V6T 1Z4, Canada.

Color versions of one or more figures in this article are available at <https://doi.org/10.1109/TMTT.2024.3432602>.

Digital Object Identifier 10.1109/TMTT.2024.3432602

## I. INTRODUCTION

INTEGRATED silicon phased-array transceivers are highly desirable for mm-wave 5G and future 6G beamforming applications, reducing size while increasing functionality. Within a phased-array beamforming transceiver, multiple transmit and receive elements operate with a limited power budget and silicon area. While mm-wave 5G promises data rates up to 100 Gb/s, future 6G networks and wireless applications are expected to benefit from operation in the 100-GHz to 1-THz frequency range, where even greater data rates will be possible [1]. Massive spatial multiplexing in hub and backhaul communications, as well as very precise sensing [2], imaging [3], spectroscopy, and other applications, are all made possible by the short wavelengths of mm-wave and sub-terahertz (sub-THz) signals. For a beamforming transceiver chip with 256 elements in base stations [4] or 4–8 elements in mobile applications [5], the performance and size of the power amplifier (PA) are more critical than in traditional single-element transmitters. Since the output power of the PAs in these systems defines the number of transmitter and receiver elements needed to meet the requirements of a specific application, the power efficiency and linearity of the PA in the peak and back-off power regions significantly impact the system's total performance. Prior beamforming chips have rarely incorporated back-off efficient PAs; because of their compact size, mostly simple class-AB PAs are implemented [5], [6], [7], [8], [9], [10]. In addition, the PA's instantaneous bandwidth and linearity define the maximum achievable data rate in the transmitter, and its power efficiency significantly impacts the overall system's energy efficiency. Also, the large peak-to-average-power ratio (PAPR) of high-order modulated signals requires very linear PAs.

State-of-the-art phased array transceivers (PATs) show that the area dedicated to a PA is considerably smaller [5], [9], [10] than the area of conventional back-off efficient PAs [11], [12], [13], [14], [15], [16]. Prior mm-wave beamforming chips reported in the literature have used various techniques, including simple class-AB differential pairs [5], [6], [7], [8], differential cascode structures with baluns at the output port [6], and single-ended (SE) PAs [9], [10]. These design choices are driven by considerations such as compactness, moderate output power level (ranging from 10 to 17 dBm), and acceptable peak power efficiency. Back-off efficient mm-wave PAs have been presented using the Doherty structure [11], [12], [13], [17], [18], out-phasing [14], [15], and balanced structures [16]. However, these PAs have not been widely incorporated into PATs due to a specific challenge: the auxiliary amplifier, which modulates the load seen by the main amplifier, requires a separate inductive element. This additional component significantly increases the overall area occupied by the PA. In this article, we develop a compact, linear, and back-off efficient PA based on the Doherty architecture that meets the requirements of mm-wave 5G beamforming chips for higher data rates and lower power consumption.

Fig. 1. Architectures of Doherty power combiners, series and parallel, for mm-wave PAs. (a) Parallel Doherty 1, (b) parallel Doherty 2, (c) series Doherty 1, (d) series Doherty 2, and (e) parallel Doherty 3.

The remainder of this article is organized as follows. Section II compares different topologies of mm-wave Doherty architectures in terms of power generation, size, and bandwidth. Section III studies different active device structures and their effects on power, efficiency, and size. Section IV proposes a compact Doherty combiner structure and presents its performance. Section V describes the design of interstage matching networks and driver amplifiers, focusing on strategies to reduce the circuit size. Section VI presents the full implementation of the Doherty PA (DPA) and its measurement results.

## II. ARCHITECTURES OF DOHERTY POWER COMBINERS

An amplifier achieves peak efficiency when its active devices operate at maximum voltage and current swings. However, when a modulated signal is applied, efficiency may decrease due to lower currents and voltages. To maintain high efficiency, the active device should operate at its highest possible voltage swing, as the current is determined by the input drive level or bias condition. By Ohm's law, maintaining the full voltage swing requires adjusting the load presented to the active device. Consider a case where two similar current sources inject current into a single load: the impedance seen by each device is modulated, increasing by a factor of 2. The Doherty power combiner achieves proper load modulation by adding an impedance inverter to counter the impedance rise in the case of two current sources.
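The factor-of-2 impedance increase described above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming two ideal in-phase current sources driving a shared resistive load; the function name and the 50-Ω load value are illustrative and are not taken from the design.

```python
def seen_impedance(r_load, i_self, i_other):
    """Impedance seen by one ideal current source when a second, in-phase
    source injects current into the same load: Z = V_load / I_self."""
    v_load = r_load * (i_self + i_other)  # Ohm's law on the shared node
    return v_load / i_self


R_LOAD = 50.0  # illustrative load resistance in ohms (assumption)

alone = seen_impedance(R_LOAD, 1.0, 0.0)  # second source switched off
both = seen_impedance(R_LOAD, 1.0, 1.0)   # two equal sources active
print(f"one source: {alone} ohm, two equal sources: {both} ohm")
```

With the second source off, the first source sees the load itself; with two equal sources, each sees twice the load, which is the rise that the impedance inverter in a Doherty combiner is there to counter.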

For the mm-wave frequencies, the structures shown in Fig. 1 are possible candidates. In this analysis, we compare these structures based on output power, silicon area, and bandwidth, and choose the one that best serves our purpose. It is important to note that a limitation of load modulation circuits to date is the requirement for at least two current sources (transistors). Fig. 1(a) shows a parallel Doherty structure that uses two transformers for matching and a quarter wavelength TLine (QTL) on the secondary side of the main transformer. By solving the equations at 0- and 6-dB back-off, we find that $R_{opt} = 2R_L/n^2$ and $Z_{01} = 2R_L$. Note that $R_{opt}$ must be $2R_L$ if $n = 1$, i.e., $100 \Omega$ for a 50-Ω load. Because $R_{opt}$ is fixed in this way, this structure defines the size and output power of the amplifier stages, i.e., main and auxiliary. Basically, $n$ is the only design parameter available to the designer for tuning the PA parameters.

While the basic Doherty combiner assumes ideal current sources, in reality, inductive components are needed to tune out the parasitic capacitance of the transistors. Also, transformers allow simple bias circuit design. The characteristic impedance of the TLine used in this structure is $Z_{01} = 2R_L = 100 \Omega$, which is a large value for IC-level implementations. To implement the QTL, a lumped-element $\pi$-section or doubly tuned transformers can be used. As an example, at 28 GHz, the required inductance to implement a $\pi$-section QTL with $Z_0 = 100 \Omega$ is 568 pH. In either case, the larger the required $Z_0$, the larger the size of the required inductive component. Therefore, the size of the Doherty combiner is also bound to the value of $R_L$. In Fig. 1(b), the transmission line is moved to the primary of the transformer. The required value of $Z_{01}$ becomes $Z_{01} = 2R_L/n^2$ and $R_{opt}$ remains the same. If $n$ is greater than 1, the characteristic impedance requirement is reduced, but practical limitations may keep it relatively large.
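The 568-pH figure quoted above follows from the standard lumped C-L-C $\pi$ equivalent of a quarter-wave line, where the series inductance is $Z_0/\omega_0$ and each shunt capacitance is $1/(\omega_0 Z_0)$. The sketch below assumes that textbook equivalence (the article does not spell it out) and evaluates it for the three $Z_{01}$ values listed in Table I; the function name is illustrative.

```python
import math


def pi_section_qtl(z0_ohm, f_hz):
    """Lumped C-L-C pi equivalent of a quarter-wave line at f_hz.

    Returns (series inductance in henries, each shunt capacitance in farads).
    """
    omega = 2.0 * math.pi * f_hz
    return z0_ohm / omega, 1.0 / (omega * z0_ohm)


F_DESIGN = 28e9  # design frequency from the text

for z0 in (100.0, 53.0, 25.0):  # Z01 values listed in Table I
    ind, cap = pi_section_qtl(z0, F_DESIGN)
    print(f"Z0 = {z0:5.1f} ohm: L = {ind * 1e12:6.1f} pH, C = {cap * 1e15:6.1f} fF")
```

The inductance grows linearly with $Z_0$, which is why a lower characteristic impedance translates directly into a smaller inductive component.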

The series Doherty structure shown in Fig. 1(c) sums the voltages of the secondary windings instead of summing currents. The required value of $R_{opt}$ is $R_L/(2n^2)$, much smaller than in the first two cases, which allows significantly larger output power. However, designers are still constrained by active device size. For $n = 1$ and $R_{opt} = 25 \Omega$, the characteristic impedance, $Z_{01}$, is one-fourth of that in the first two structures, leading to smaller passive components. A limitation of the structures in Fig. 1(b) and (c) is that the output capacitance of the active device, absorbed in the TLine implementation on the primary side (e.g., a lumped $\pi$-section), may not match the capacitance required for $Z_0$. If the capacitance required for $Z_0$ is larger than the device capacitance ($C_{dev}$), adding an extra capacitor is possible.



Fig. 2. Comparison of Doherty power combiner architectures, series and parallel, for mm-wave PAs. (a) PD1, (b) PD2, (c) SD1, (d) SD2, and (e) PD3.

However, if it is smaller, finding suitable component values may not be possible.

The structure shown in Fig. 1(d) moves the TLine to the load side of the combiner and it relieves the limitation of the previous structure, Fig. 1(c); however, it results in an increase in the physical width of this Doherty structure. The physical width of the Doherty combiners limits their usability in PATs.

Fig. 1(e) shows a different structure [19] compared with the previous ones [see Fig. 1(a)–(d)]. The structure enhances the basic Doherty design by adding two QTLs in the auxiliary path, maintaining a 90° phase shift. These additional transmission lines provide design flexibility and improve modulation bandwidth. Although it appears bulky with three transmission lines, efficient implementation of QTLs can make it compact. By solving the equations of this structure, we find that if we pick  $Z_{01} = Z_{02}$  for simplicity then we have

$$Z_{01} = Z_{02} = \sqrt{(2R_{\text{opt}}R_L)} \quad (1)$$

and

$$Z_{03} = 2R_L. \quad (2)$$

From (1), we find that $R_{\text{opt}}$ is less bound to $R_L$. In other words, $R_{\text{opt}} = Z_{01}^2/(2R_L)$, so the required $Z_{01}$ scales with the square root of $R_{\text{opt}}$, and a smaller $R_{\text{opt}}$ calls for a smaller $Z_{01}$. For a given $Z_{01}$, $R_{\text{opt}}$ is inversely proportional to $R_L$, providing more flexibility in choosing active device sizes based on power or efficiency. The second equation requires $Z_{03}$ to be twice the load impedance, similar to previous designs. An efficient implementation uses two transformers and a few capacitors, resulting in a compact size compared with implementations based on separate inductive components [11], [12].
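As a quick numerical illustration of (1) and (2), the sketch below evaluates the line impedances for two values of $R_{\text{opt}}$. The 50-Ω load is an assumption consistent with $2R_L = 100 \Omega$ used earlier in this section, and the second $R_{\text{opt}}$ value is chosen only to show the scaling; the function name is illustrative.

```python
import math


def pd3_line_impedances(r_opt, r_load):
    """Evaluate (1) and (2): returns (Z01 = Z02, Z03) in ohms."""
    z01 = math.sqrt(2.0 * r_opt * r_load)
    z03 = 2.0 * r_load
    return z01, z03


R_LOAD = 50.0  # assumed load, consistent with 2*R_L = 100 ohm in the text

for r_opt in (53.0, 26.5):  # 26.5 ohm is an illustrative halved value
    z01, z03 = pd3_line_impedances(r_opt, R_LOAD)
    print(f"R_opt = {r_opt:4.1f} ohm -> Z01 = Z02 = {z01:.1f} ohm, Z03 = {z03:.0f} ohm")
```

Halving $R_{\text{opt}}$ lowers $Z_{01}$ by a factor of $\sqrt{2}$ rather than 2, while $Z_{03}$ depends only on the load.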

In Section II-A, we compare the performance and physical sizes of these structures with realistic values for components.

### A. Analysis of mm-Wave Doherty Power Combiners

In this section, we compare the architectures shown in Fig. 1, each of which has a designed and simulated circuit implementation. We assume that the active devices' $R_{\text{opt}}$ and $C_{\text{dev}}$ are 53 Ω and 125 fF, respectively, along with reasonable quality factor values for inductors and transformers (27 and 24, respectively, at the design frequency). Section III explains the origins of these numbers.

Fig. 2(a) shows an implementation of parallel Doherty 1 (PD1), Fig. 1(a), and the simulation result of active load modulation from 24 to 30 GHz for the auxiliary (blue curve) and main (red curve) amplifiers. At the design frequency, 28 GHz, the main and auxiliary amplifiers follow optimal trajectories on the $X$-axis of the Smith chart. Ideally, they start at $2R_{\text{opt}}$ and open-circuit terminations at 6-dB back-off, reaching $R_{\text{opt}}$ at 0-dB back-off. However, as the frequency deviates from the design frequency, their trajectories diverge. Table I provides the component values for the Doherty power combiner. PD1 requires a transformer with a primary inductance of 258.5 pH and a secondary inductance of 762 pH with a coupling factor of 0.8. Realizing such a transformer with good performance in IC technologies is challenging, if not impossible. In addition, this architecture necessitates a separate inductor for implementing the lumped QTL, which requires 568 pH and occupies a significant chip area.

In Fig. 2(b), parallel Doherty 2 (PD2) is an implementation of Fig. 1(b). As the simulation result shows, compared with PD1, this architecture provides more condensed trajectories around the $X$-axis of the Smith chart, both at 0- and 6-dB back-off. Also, the required value of $Z_{01}$ is reduced to $53 \Omega$, leading to a reduced $L_1$ (301 pH), which results in a more compact layout compared with PD1. However, the transformers still occupy a large chip area.

TABLE I  
COMPONENT VALUES FOR DOHERTY POWER COMBINER ARCHITECTURES

|     | $R_{opt}$ (Ω) | $C_{dev}$ (fF) | $L_p$ (pH) | $L_s$ (pH) | $k_1$ | $n_1$ | $Z_{01}$ (Ω) | $L_1$ (pH) | $C_1$ (fF) | $C_2$ (fF) | $C_3$ (fF) | $C_4$ (fF) | $C_5$ (fF) |
|-----|-----------|-----------|-------|-------|-------|-------|----------|-------|-------|-------|-------|-------|-------|
| PD1 | 53        | 125       | 258.5 | 762   | 0.8   | 1.374 | 100      | 568.4 | 56.8  | 117.7 | -     | -     | -     |
| PD2 | 53        | 125       | 258.5 | 762   | 0.8   | 1.374 | 53       | 301.2 | 107.2 | 117.8 | -     | -     | -     |
| SD1 | 53        | 125       | 258.5 | 258.5 | 0.65  | 0.68  | 53       | 301.2 | 107.2 | 108.2 | -     | -     | -     |
| SD2 | 53        | 125       | 258.5 | 258.5 | 0.65  | 0.68  | 25       | 142.1 | 227   | 216.5 | -     | -     | -     |
| PD3 | 53        | 125       | 423   | 423   | 0.62  | 0.625 | -        | -     | -     | -     | 68.1  | 68.1  | 56.84 |

In Fig. 2(c), a series Doherty 1 (SD1) architecture utilizes a QTL in the primary side of the auxiliary path and combines secondary transformers. Unlike previous designs, here,  $n$  must be smaller than 1. Achieving this could involve lowering the coupling factor ( $k$ ) or using a smaller secondary inductance. Despite requiring smaller inductances, the main and auxiliary trajectories deviate from the optimal impedance. At 0-dB back-off, the main path exhibits a smaller impedance, while the auxiliary path has a higher impedance compared with  $R_{opt}$ . This imbalance reduces combining efficiency.

Fig. 2(d) presents a variation of SD1 called SD2. In SD2, the QTL is moved to the secondary side of the transformer in the auxiliary path. The inductance values remain unchanged, but the required value for  $L_1$  decreases due to a reduction in  $Z_{01}$ . Both main and auxiliary paths closely approach  $R_{opt}$  at 0-dB back-off compared with the previous designs. An alternative implementation by Pashaeifar et al. [20] suggests a two-section QTL to improve modulation bandwidth, but the use of  $L_1$  increases the physical width of the Doherty combiner.

Fig. 2(e) depicts an implementation of parallel Doherty 3, utilizing three QTLs with two transformers and five capacitors. Device capacitors are integrated into the combiner. Although primary and secondary transformer values are higher than in series Doherty architectures, the absence of a third inductive component allows for a more compact design. Simulation results indicate that main and auxiliary trajectories closely follow the same paths, concentrating around the 28-GHz design frequency. However, a deviation from the optimal impedance point ( $R_{opt}$ ) occurs due to the finite  $Q$  of inductors and transformers.

In this study, theoretical component values are provided for a fair comparison between architectures. Readers are encouraged to verify these values by solving the Doherty power combiner equations at 0- and 6-dB back-off. Some component values may be impractical for the chosen impedances. For instance, achieving $n = 1.374$ with a coupling factor of $k_1 = 0.8$ can be challenging. Also, we neglected the effect of the quality factor of capacitors. The maximum capacitor value used in this study is 227 fF, and the estimated $Q$-factor for this element in our process technology is more than 75 at 28 GHz, which is much higher than the $Q$-factor of the inductors and transformers.

Having compared the potential architectures, it is essential to investigate the impact of active devices on the size of the Doherty combiner to identify the architecture that minimizes the amplifier's footprint while ensuring optimal performance. In Section III, the effect of active device structures on the performance metrics of DPA is analyzed.

## III. ACTIVE DEVICE STRUCTURES FOR DPA

In Section II, different Doherty power combiner architectures were analyzed. In this section, we analyze various structures of active devices for designing DPAs.

Common source/emitter (CS/CE) is a widely used structure in analog circuits, with relatively high input/output impedance and large voltage and current gains. Its differential version is even more favorable. For example, the voltage swing in a differential amplifier is twice that of an SE design, providing higher output power. Another important feature of differential amplifiers is that, in theory, the parasitic capacitances and optimum resistances, $R_{opt}$, of the two active devices appear in series, resulting in double the resistance and half the capacitance. This matters when designing an impedance-matching network with its physical footprint in mind. On the other hand, the common gate/base (CG/CB) is also an essential structure in analog design. Usually, a CG (or CB) device is added on top of a CS/CE device to form a cascode [21], or, in a stacked amplifier, a CG (CB)-like transistor has its gate (base) connected to a nonzero impedance, letting the gate (base) have a finite voltage swing. This results in an in-phase summation of the drain (collector) voltages of the transistors [22].

An important feature of the CG/CB structure is that the parasitic capacitance between input and output is much smaller than in CS (CE). This is important for the stability of the amplifier. In a CE (CS) stage, the capacitance between the input and output nodes is larger, and the voltages on the two sides of this parasitic capacitance swing in opposite directions, causing a large Miller capacitance. Capacitive or inductive neutralization has been proposed for differential and SE CS (CE) structures [23]. Inductive neutralization is narrowband, while capacitive neutralization in differential CE



Fig. 3. Load-pull simulation results at 28 GHz for (a) differential CE, (b) differential CB, (c) SE CE, and (d) SE CB.

TABLE II  
COMPARISON BETWEEN DIFFERENT ACTIVE DEVICE STRUCTURES WITH THE SAME SIZE AND BIAS CONDITIONS

| Pin (dBm) | Differential CE |          | Differential CB |          | Single-Ended CE |          | Single-Ended CB |          |
|-----------|-----------------|----------|-----------------|----------|-----------------|----------|-----------------|----------|
|           | Peak Po         | Peak PAE |
| 19+J11    | 6               | 6        | 31+J35          | 12       | 10+J5           | 12       | 17.5+J11.5      | 12       |
| 12+J30    | 19.6            | 23+J37   | 5-J2            | 33+J43   | 16+J14          | 17.8     | 17.5            | 17.2     |
| 12+J30    | 21.6            | 58.3     | 5-J2            | 60       | 38.2            | 45.3     | 19.5            | 57.5     |
| 43.9      | 22.2            | 56       | 60              | 38.2     | 45.3            | 52       | 19.1            | 1.8      |
| 1.4       | 4               | 0.6      | 1.5             | 1.67     | 4.8             | 0.8      | 1.8             |          |

is wideband. The drawback of differential capacitive neutralization is that it works at odd harmonics but doubles the parasitic capacitance at even harmonics.

For a quantitative comparison between the CE and CB stages in SE and differential modes, load-pull simulations were performed. Fig. 3 shows the load-pull contours of peak power-added efficiency (PAE) and peak power on the Smith chart for the different structures at 28 GHz; the test benches for differential CE, differential CB, and SE CB are also shown in this figure. The simulations are carried out in the 90-nm BiCMOS process with fixed device sizes and bias conditions; each device has a 60-$\mu$m emitter length. Since there are different layout implementations for neutralized differential CE, we use schematic models for these comparisons for simplicity. Note that, since no neutralization capacitor is applied to the CB structures, the effect of layout parasitics on them is smaller than on differential CE. Table II summarizes the results of the load-pull simulations. First, differential structures provide almost 3 dB higher output power than their SE counterparts, as expected. In terms of PAE, the differential CE and CB stages and the SE CB stage all show decent performance; however, in the case of differential CE, there is a significant PAE drop when the optimal impedance for peak power is presented to this stage. To explain this behavior, we need to consider the large-signal operation of the device at the base and collector nodes in the CE structure. The base (gate) and collector (drain) in a CE (CS) structure swing in opposite directions; therefore, for large voltage swings on the base, the base-collector junction becomes forward biased. When the load impedance is large, the voltage swing on the collector side is also large, resulting in early saturation and lower output power. Also, the device's collector swing is limited to the supply voltage, resulting in efficient power generation (for the generated current, the maximum voltage swing is achieved). On the other hand, when the optimum impedance for peak power, which is smaller than the peak-PAE impedance, is presented to the device, the collector voltage swing is smaller than in the previous case. Therefore, the device may enter saturation only at larger base voltages. In this case, the device is limited by its current generation capability. More power can be generated from the device, but this is not optimal in terms of efficiency, since the device may reach the highest voltage swing only at higher currents.

In terms of output power, both the SE and differential CB stages outperform their CE counterparts. For example, differential CB provides almost 2 and 0.6 dB higher output power than differential CE when the peak-PAE and peak-$P_o$ impedances are presented to the structure, respectively. The reason behind this observation is the difference in their operating principles. In a CE stage, the input and output nodes swing in opposite directions, while in the CB stage, they swing in the same direction. When both structures approach



Fig. 4. Modeled two-stage PA for PAE calculations.

saturation, the CE stage leaves the active region sooner than the CB stage. To be more precise, the collector of the CE stage can go down to $V_B - V_{BC,\text{on}}$, and since $V_B$ has a large voltage swing, the device enters the saturation region. Conversely, in the case of the CB stage, the base voltage remains constant and the collector voltage can decrease further, leading to higher output power from the active devices. Also, the CB stage, either differential or SE, exhibits less compression.

Impedances looking into differential CB show larger $R_{\text{opt}}$ compared with CE. As shown in Section II, the value of $R_{\text{opt}}$ defines the required value of $Z_0$ in the Doherty combiner, and for larger $R_{\text{opt}}$, the Doherty combiner becomes larger. For that reason, if a compact Doherty combiner is desired, an SE structure could be a better choice. SE CB and CE have half the $R_{\text{opt}}$ of their differential counterparts. While SE CE shows inferior performance compared with other structures, SE CB can be a good candidate as the power stage of a compact DPA.

Understanding the impact of the low gain of the CB stage on the PAE performance of the Doherty PA is crucial. To clarify this, a two-stage PA is modeled as shown in Fig. 4. Active devices are modeled with their output power ( $P$ ), gain ( $G$ ), and PAE. Passive devices are modeled with their losses ( $L$ ). For the PAE of a two-stage amplifier, we can write

$$\text{PAE}_{\text{total}} = \frac{P_{\text{out}} - P_{\text{in}}}{P_{\text{dc1}} + P_{\text{dc2}}} \quad (3)$$

where $P_{\text{dc1}}$ and $P_{\text{dc2}}$ are the dc power consumption of the output and driver stages, respectively, found from the PAE definition as $P_{\text{dc1}} = [(P_1 - P_{\text{in}1})/\text{PAE}_1]$ and $P_{\text{dc2}} = [(P_2 - P_{\text{in}}/L_3)/\text{PAE}_2]$. Also, $P_{\text{out}}$ can be calculated from the input power, gain, and loss parameters shown in Fig. 4 as follows:

$$P_{\text{out}} = \frac{P_{\text{in}}G_1G_2}{L_1L_2L_3}. \quad (4)$$

Note that $G_1$ and $G_2$ are large-signal gains. By substituting (4) into (3), we have

$$\text{PAE}_{\text{total}} = \frac{1 - \frac{L_1L_2L_3}{G_1G_2}}{\frac{L_1(1-1/G_1)}{\text{PAE}_1} + \frac{L_1L_2(1-1/G_2)}{G_1\text{PAE}_2}}. \quad (5)$$
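Equation (5) is straightforward to evaluate numerically. The sketch below implements it and cross-checks it against a direct power budget built from (3) and (4). The function names are illustrative, the stage efficiencies of 60% and 50% are assumed example inputs rather than measured values, and the gains and losses are the ones assumed in the text for Fig. 5 (the text quotes $G_1 = 9.7$ dB, while the caption of Fig. 5 lists 9.5 dB).

```python
def db_to_lin(x_db):
    """Convert a power gain or loss in dB to a linear power ratio."""
    return 10.0 ** (x_db / 10.0)


def pae_total(pae1, pae2, g1_db, g2_db, l1_db, l2_db, l3_db):
    """Closed form (5). Index 1 is the output stage, index 2 the driver."""
    g1, g2 = db_to_lin(g1_db), db_to_lin(g2_db)
    l1, l2, l3 = db_to_lin(l1_db), db_to_lin(l2_db), db_to_lin(l3_db)
    num = 1.0 - l1 * l2 * l3 / (g1 * g2)
    den = l1 * (1.0 - 1.0 / g1) / pae1 + l1 * l2 * (1.0 - 1.0 / g2) / (g1 * pae2)
    return num / den


def pae_total_budget(pae1, pae2, g1_db, g2_db, l1_db, l2_db, l3_db, p_in=1e-3):
    """Same quantity from (3) and (4), walking the power through the chain."""
    g1, g2 = db_to_lin(g1_db), db_to_lin(g2_db)
    l1, l2, l3 = db_to_lin(l1_db), db_to_lin(l2_db), db_to_lin(l3_db)
    p_in2 = p_in / l3          # power reaching the driver input
    p2 = p_in2 * g2            # driver output power
    p_in1 = p2 / l2            # power reaching the output-stage input
    p1 = p_in1 * g1            # output-stage power
    p_out = p1 / l1            # power delivered after the output network
    p_dc1 = (p1 - p_in1) / pae1
    p_dc2 = (p2 - p_in2) / pae2
    return (p_out - p_in) / (p_dc1 + p_dc2)


# Losses and gains as assumed in the text; PAE1 and PAE2 are example inputs.
args = dict(pae1=0.60, pae2=0.50, g1_db=9.7, g2_db=17.0,
            l1_db=1.1, l2_db=2.0, l3_db=1.0)
print(f"closed form: {pae_total(**args):.4f}")
print(f"power budget: {pae_total_budget(**args):.4f}")
```

Sweeping `pae1` and `pae2` over a grid with the same function would produce contours of the kind shown for the two-stage PA.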

By plotting (5) with reasonable assumptions, we can observe how the PAE of the two-stage PA is influenced by the low gain of the CB stage. Fig. 5 shows PAE contours of the two-stage PA with respect to the PAE of the output (PAE<sub>1</sub>) and driver (PAE<sub>2</sub>) stages. In this plot, we assumed: $L_1 = 1.1$ dB, $L_2 = 2$ dB, $L_3 = 1.0$ dB, $G_1 = 9.7$ dB, and $G_2 = 17$ dB. Although the gain of the CB stage is lower than that of the CE stage, the two-stage PA can achieve higher than 40% PAE with adequate PAE at each stage. This analysis can be extended to more than two stages. However, the effect of prior stages on the PAE of the total amplifier can be considered negligible when the gain of the driver or output stage is sufficiently high.

Fig. 5. Contour plots of PAE for a two-stage PA, assuming $L_1 = 1.1$ dB, $L_2 = 2$ dB, $L_3 = 1.0$ dB, $G_1 = 9.5$ dB, and $G_2 = 17$ dB.

Cascode and stacked amplifier stages can also be considered as active structures for the Doherty amplifier. While they have similarities, their operating principles are different. In a cascode structure, the base of the cascode device is shunted to ground, resulting in zero voltage swing on the base (gate) of the cascode device. Also, the cascode node (collector or drain of the bottom transistor) experiences a very small voltage swing. In a stacked structure, however, the base (gate) of the top device is tied to ground via a finite impedance, resulting in a finite voltage swing on the base or gate of the top transistor. The cascode node also has a voltage swing. In a well-designed stacked structure, the voltage swing on each device is equal. This results in 3-dB higher output power and twice the output impedance compared with a simple differential stage. Higher output power is desirable for amplifier design, but a large output impedance results in a large footprint for the Doherty amplifier. The voltage swing of the cascode structure is almost the same as that of the simple differential structure. Since the voltage swings are similar, the output power and output impedance are almost the same. Therefore, in terms of the Doherty combiner, the cascode has similar requirements to a simple differential structure.

In this section, we analyzed different structures of active devices and their effect on the size, efficiency, and output power of a two-stage Doherty amplifier. We observed that although differential stages provide more output power, their optimum impedances require larger inductive components. Therefore, SE stages are more favorable when area efficiency is of interest. Comparing SE CB and CE, since the CE stage has inferior performance, it is less suited for such a Doherty design.

## IV. DOHERTY PA IMPLEMENTATION

In Sections II and III, we examined various architectures of the Doherty combiners and analyzed the effect of different active device structures on the performance metrics of the PA. We propose an SE CB structure for the output stage and PD3 as the Doherty combiner. For the driver stage, a differential CE stage with high gain is used. In addition, we propose to move the role of the 90° hybrid coupler to the interstage matching network to reduce the length of the PA layout. In Section IV-A, we describe this design in more detail.

### A. Doherty Combiner Design

In Section II, for the analysis of different architectures, we assumed the optimum impedance of the active devices to be $Z_{\text{opt}} = 22 - j26$ $\Omega$ ($53 \Omega \parallel 125 \text{ fF}$), which lies between the maximum-PAE and maximum-output-power ($P_o$) points of a differential CE stage shown in Fig. 3. Now that the SE CB stage has been selected as the active device structure, we can design the Doherty power combiner for its optimum impedances. As mentioned in Section III, the maximum-PAE and maximum-$P_o$ points for this stage are close to each other, enabling near-optimal performance in both power and PAE. If we choose $Z_{\text{opt}} = 17.5 - j14.4$ $\Omega$ ($29.5 \Omega \parallel 159 \text{ fF}$), the component values of the Doherty combiner (PD3) are as follows: $C_1 = C_2 = 159.5 \text{ fF}$, $C_3 = C_4 = 102 \text{ fF}$, $C_5 = 56.8 \text{ fF}$, $L_p = L_s = 355 \text{ pH}$, and $k_m = 0.65$. The theoretical component values are used as a starting point in the design process, followed by electromagnetic (EM) circuit design and optimization. Since the PA is based on SE active devices, the transformers can also be implemented accordingly. Multiturn transformers and inductors are effective ways to reduce the footprint of these components. However, a differential two-turn transformer may need four metal layers. In addition, at mm-wave frequencies, capacitive coupling between metal layers increases the coupling factor and reduces the usable frequency of the transformer.
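The two $Z_{\text{opt}}$ values quoted above are the series equivalents of a resistor in parallel with a capacitor at 28 GHz. The sketch below performs that conversion with Python's built-in complex arithmetic; the function name is illustrative, and the results agree with the quoted impedances to within rounding.

```python
import math


def parallel_rc_to_series(r_ohm, c_farad, f_hz):
    """Series-equivalent impedance of R in parallel with C at f_hz."""
    z_c = 1.0 / (1j * 2.0 * math.pi * f_hz * c_farad)  # capacitor impedance
    return r_ohm * z_c / (r_ohm + z_c)


F_DESIGN = 28e9  # design frequency from the text

for r, c in ((53.0, 125e-15), (29.5, 159e-15)):
    z = parallel_rc_to_series(r, c, F_DESIGN)
    print(f"{r} ohm || {c * 1e15:.0f} fF -> {z.real:.1f} {z.imag:+.1f}j ohm")
```

The negative imaginary part reflects the capacitive device output; the combiner design absorbs this capacitance into the shunt capacitors at the collector nodes.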

To implement the quarter-wavelength line, we propose to use a 1.5-turn SE transformer in the Doherty combiner, as it helps further reduce the combiner footprint. Also, to achieve a relatively low coupling factor between the primary and secondary windings, instead of using two consecutive metal layers, we implement the primary coil in the top metal layer and the secondary coil in the third layer from the top. With this approach, the portion of the coupling factor ( $k_m$ ) that results from capacitive coupling is reduced without changing the magnetic coupling. Comparing the transformer designed with the top and second-top metal layers to the one designed here is not straightforward. Designing a quarter-wavelength line using a transformer with a high coupling factor requires very large primary and secondary inductances according to

$$Z_{0\text{Tr}} = \frac{\omega_0 n (1 - k_m^2) L_p}{k_m} \Rightarrow L_p = \frac{Z_{0\text{Tr}} k_m}{n \omega_0 (1 - k_m^2)} \quad (6)$$

where  $n = \sqrt{L_s/L_p}$  and  $Z_{0\text{Tr}}$  is the characteristic impedance of the quarter-wavelength line implemented by the transformer. For example, consider a case where a transformer-based QTL with a given characteristic impedance needs to be designed. Two transformers can be designed, one with a coupling factor of 0.75 and the other with 0.65. For the coupling factor of 0.75, the required inductance is almost 32% higher than for 0.65. Larger inductive coils need longer metal traces, which results in more resistance per coil. In addition, the physical size of the transformer is another prohibitive factor. Fig. 6(a) and (b) shows the 3-D view of the transformer and the Doherty combiner, respectively. The transformer occupies  $100 \times 100 \mu\text{m}$ . The resulting Doherty combiner using the proposed transformer occupies only  $297 \times 155 \mu\text{m}$ , which is 50% smaller than [11] and 100  $\mu\text{m}$  narrower than [20]. Fig. 7(a) shows the EM simulation results of the transformer implemented in a 90-nm BiCMOS process. The coupling factor is about 0.63 in the band of interest, and the inductance values of the primary and secondary windings are in the range of 380–390 pH. Fig. 7(b) shows the passive efficiency of the designed Doherty combiner versus frequency, demonstrating that the maximum efficiency drop is 4% over a 4-GHz bandwidth around the center frequency.

Fig. 6. Three-dimensional view of (a) 1.5-turn transformer and (b) proposed Doherty power combiner.

Fig. 7. EM simulation results of (a) 1.5-turn transformer and (b) passive efficiency of Doherty power combiner.

Fig. 8. Interstage matching network. (a) Traditional architecture versus proposed. (b) Proposed circuit implementation. (c) EM simulation result of the proposed implementation. (d)–(f) Simulation results of amplitude and phase differences between interstage matching outputs of the main and auxiliary paths versus PA output power at (d) 26, (e) 28, and (f) 30 GHz.
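As a rough numerical companion to (6), the sketch below evaluates the relation in the form  $L_p = Z_{0\text{Tr}} k_m / (n \omega_0 (1 - k_m^2))$ , which is also the form listed in Table III. It assumes a 28-GHz design frequency, and the function names are hypothetical helpers introduced only for illustration. The check at the end uses the  $Z_{02}$  row of Table III, with the turn ratio inferred from the tabulated  $L_s/L_p$ .

```python
import math

F0_HZ = 28e9  # assumed design frequency (Hz)


def primary_inductance(z0_tr, k_m, n, freq_hz=F0_HZ):
    """Primary inductance needed for a transformer-based quarter-wave line."""
    w0 = 2 * math.pi * freq_hz
    return z0_tr * k_m / (n * w0 * (1 - k_m ** 2))


def qtl_impedance(l_p, k_m, n, freq_hz=F0_HZ):
    """Characteristic impedance realized by a given primary inductance."""
    w0 = 2 * math.pi * freq_hz
    return w0 * n * l_p * (1 - k_m ** 2) / k_m


# Table III example: Z02 = 31 ohm, k_m = 0.61, n inferred from Ls/Lp = 102 pH / 293 pH.
n_table = math.sqrt(102 / 293)
l_p = primary_inductance(31.0, 0.61, n_table)
print(f"L_p = {l_p * 1e12:.0f} pH (Table III lists 293 pH)")
print(f"round trip Z0 = {qtl_impedance(l_p, 0.61, n_table):.1f} ohm")

# For a fixed impedance and turn ratio, stronger coupling calls for a larger L_p.
for k_m in (0.65, 0.75):
    print(f"k_m = {k_m}: L_p = {primary_inductance(50.0, k_m, 1.0) * 1e12:.0f} pH")
```

The 50-Ω value in the final loop is an arbitrary illustrative impedance, not a value taken from the design.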

### B. Driver and Interstage Matching Network Design

In this section, we present the design of the driver stage and interstage matching network. As discussed in Section III, the driver stage plays a vital role in determining the overall performance of the PA and needs careful design (Fig. 5). A differential CE stage provides both high gain and decent efficiency. In traditional mm-wave DPA designs, one driver is used for each of the auxiliary and main amplifiers [11], [12], [13], [16], [20], [24]. To distribute the input signal between the auxiliary and main amplifiers, a 90° hybrid coupler is used at the input. The 90° hybrid coupler is added on top of the input matching circuits and occupies a considerable area, increasing the overall size of the Doherty amplifier. On the other hand, as shown in Table II, the input impedance of the CB stage is small, leading to a large impedance transformation ratio (ITR) [Fig. 8(a)]. In this design, a single driver is proposed, and signal division, the 90° phase difference, and matching are performed in the interstage matching network. A single driver amplifier reduces the ITR by a factor of 2. While using separate drivers allows class-C type biasing for the driver in the auxiliary amplifier path, a single driver allows a more compact implementation of the Doherty amplifier [25]. Also, dynamic power splitting methods can further save power in the driver stage [26], [27].

As discussed before, DP3 provides power combining for 90° out-of-phase signals. Therefore, based on the reciprocity theorem, if DP3 is excited from its output node, it generates two 90° out-of-phase signals with 3-dB power division. At the same time, it can be used to match the load to the source impedance. Fig. 8(b) shows the proposed interstage matching network and its circuit implementation. We start the design by looking into the phase difference needed at the inputs of the main and auxiliary amplifiers. The input signal to the main amplifier should have a 90° phase delay compared with the auxiliary. Thus, the number of quarter-wavelength transmission lines in the main path should exceed the number in the auxiliary path by one. As shown in Table II, the input impedance of the SE CB stage is about  $2 \Omega$ , and  $R_{\text{opt}}$  of the driver amplifier is about  $60 \Omega$ , leading to a large ITR. To have enough bandwidth in the interstage matching network, we need to add an extra TLine in each of the main and auxiliary paths. Extra TLines may seem bulky, but as we will see next, the characteristic impedance of the additional TLine is small, requiring a very small series inductance. Here, we consider the input impedance of the output stage and the output impedance of the driver to be purely resistive. Parasitic capacitors can be embedded into the interstage matching design. The design process continues by choosing the characteristic impedance of each TLine in Fig. 8(b). There are two approaches: the first chooses the characteristic impedances based on the theory of small reflections [28], and the second follows the method used for the design of the output combiner. If we choose the first method, the characteristic impedances of the main and auxiliary paths can be calculated for a maximally flat or equal-ripple response. For example, for  $R_s = 60 \Omega$  and  $R_m = R_a = 2 \Omega$ , the characteristic impedances of the main path are  $Z_{01m} = 3.44$ ,  $Z_{02m} = 15.5$ , and  $Z_{03m} = 69.83 \Omega$ , and for the auxiliary path we have  $Z_{01a} = 5.57$  and  $Z_{02a} = 43.12 \Omega$ .
If we choose the second method,  $Z_{03m} = 2R_s$  and  $Z_{02m} = Z_{02a} = Z_{01m} \times (2 \times R_s / R_m)^{1/2}$ , where  $Z_{01m}$  is a design freedom. The lumped-element implementation of quarter-wave TLines can differ. However, we choose the schematic shown in Fig. 8(b) as our proposed implementation. In our proposed implementation,  $Z_{01m}$  and  $Z_{01a}$  are implemented using lumped-element pi-sections,  $Z_{02m}$  and  $Z_{02a}$  are implemented using transformer-based quarter-wave TLines, and  $Z_{03m}$  is implemented using the method presented in [12]. The component values for the above example are given in Table III. These values are used as a starting point for the design of the interstage matching network. Fig. 8(b) combines all the TLine implementations of Table III, and adjacent capacitors are merged.

Fig. 9. Schematic of compact CB DPA.

TABLE III  
COMPONENT VALUES OF A DESIGN EXAMPLE FOR THE INTERSTAGE MATCHING NETWORK DESIGN

| Path                | TLine impedance       | Design equations                                                                                                                                                                                                 | Component values                                                                                                     |
|---------------------|-----------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------|
| Auxiliary Amplifier | $Z_{01a} = 4\Omega$   | $L_1 = \frac{Z_{01a}}{\omega_0}$<br>$C_1 = \frac{1}{\omega_0 Z_{01a}}$                                                                                                                                           | $L_1 = 22.7 \text{ pH}$<br>$C_1 = 1.42 \text{ pF}$                                                                   |
|                     | $Z_{02a} = 31\Omega$  | $L_p = \frac{Z_{02a} k_m}{n_2 \omega_0 (1 - k_m^2)}$<br>$L_s = n_2^2 L_p$<br>$C_p = \frac{1}{\omega_0^2 (1 - k_m^2) L_p}$<br>$C_s = \frac{1}{\omega_0^2 (1 - k_m^2) L_s}$<br>$k_m, n_2 \text{ design freedom}$ | $L_p = 293 \text{ pH}$<br>$L_s = 102 \text{ pH}$<br>$C_p = 509 \text{ fF}$<br>$C_s = 176 \text{ fF}$<br>$k_m = 0.61$ |
| Main Amplifier      | $Z_{01m} = 4\Omega$   | $L_1 = \frac{Z_{01m}}{\omega_0}$<br>$C_1 = \frac{1}{\omega_0 Z_{01m}}$                                                                                                                                           | $L_1 = 22.7 \text{ pH}$<br>$C_1 = 1.42 \text{ pF}$                                                                   |
|                     | $Z_{02m} = 31\Omega$  | $L_p = \frac{Z_{02m} k_m}{n_1 \omega_0 (1 - k_m^2)}$<br>$L_s = n_1^2 L_p$<br>$C_p = \frac{1}{\omega_0^2 (1 - k_m^2) L_p}$<br>$C_s = \frac{1}{\omega_0^2 (1 - k_m^2) L_s}$<br>$k_m, n_1 \text{ design freedom}$ | $L_p = 293 \text{ pH}$<br>$L_s = 102 \text{ pH}$<br>$C_p = 509 \text{ fF}$<br>$C_s = 176 \text{ fF}$<br>$k_m = 0.61$ |
|                     | $Z_{03m} = 120\Omega$ | $C = \frac{1}{Z_{03m} \omega_0}$                                                                                                                                                                                 | $C = 47.4 \text{ fF}$                                                                                                |
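The two ways of choosing the TLine impedances can be sketched numerically. The example below is a minimal illustration under stated assumptions: it uses the approximate binomial (maximally flat) formula from the theory of small reflections, it assumes each path is designed toward  $2R_s$  so that the two paths in parallel present  $R_s$  to the driver, and it assumes a 28-GHz design frequency. The function names are hypothetical. Under these assumptions, the two-section result should match the auxiliary-path values quoted in the text, while the three-section result lands near, but not exactly on, the quoted main-path values, which may come from a different response or an exact synthesis.

```python
import math

F0_HZ = 28e9                       # assumed design frequency (Hz)
W0 = 2 * math.pi * F0_HZ
R_S, R_M, R_A = 60.0, 2.0, 2.0     # driver R_opt and CB input resistances (ohms)


def binomial_sections(z_load, z_source, n_sections):
    """Approximate maximally flat multisection quarter-wave transformer.

    Uses ln(Z[k+1] / Z[k]) = 2**-N * C(N, k) * ln(z_source / z_load),
    stepping from the load side toward the source side.
    """
    step = math.log(z_source / z_load) / 2 ** n_sections
    sections, z = [], z_load
    for k in range(n_sections):
        z *= math.exp(math.comb(n_sections, k) * step)
        sections.append(z)
    return sections


def second_method(r_s, r_load, z01):
    """Impedances from the second approach, with z01 as the free parameter."""
    return z01, z01 * math.sqrt(2 * r_s / r_load), 2 * r_s


def pi_section(z0):
    """Series L and shunt C of a lumped pi-section quarter-wave line."""
    return z0 / W0, 1 / (W0 * z0)


# First approach (assumption: each path is matched toward 2 * R_s).
aux_flat = binomial_sections(R_A, 2 * R_S, 2)
main_flat = binomial_sections(R_M, 2 * R_S, 3)
print("aux  (2 sections):", [round(z, 2) for z in aux_flat])
print("main (3 sections):", [round(z, 2) for z in main_flat])

# Second approach with Z01 = 4 ohm, as in the Table III example.
z01, z02, z03m = second_method(R_S, R_M, z01=4.0)
l1, c1 = pi_section(z01)
c3 = 1 / (W0 * z03m)               # capacitor from the Z03m row of Table III
print(f"Z02 = {z02:.1f} ohm, Z03m = {z03m:.0f} ohm")
print(f"L1 = {l1 * 1e12:.1f} pH, C1 = {c1 * 1e12:.2f} pF, C = {c3 * 1e15:.1f} fF")
```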

Fig. 8(c) shows the passive efficiency of the proposed interstage matching network obtained from EM simulation, along with the phase performance of this network at 0-dB power back-off. The phase difference between the main and auxiliary amplifiers is set to  $-90^\circ$  at 28 GHz. From 24 to 30 GHz, the phase mismatch is less than  $\pm 10^\circ$ . The interstage matching network covers 24 to 34 GHz with an insertion loss of 2 dB at 28 GHz. At low output powers, the auxiliary amplifier is OFF; therefore, it is desirable to deliver more power to the main amplifier and not to waste it in the auxiliary path. The proposed impedance-matching network has two QTLs in the auxiliary path and three QTLs in the main path. The transformed impedance in the auxiliary path is equal to  $Z_{\text{Aux}} = R_A \times Z_{02a}^2 / Z_{01a}^2$ . Since  $R_A$  is large while the auxiliary amplifier is OFF, in low-power regions  $Z_{\text{Aux}}$  is large, while  $Z_{\text{Main}}$  is a finite impedance. Consequently, the driver amplifier's current flows into the main path. As the output power increases, the auxiliary amplifier turns on, both amplifiers present similar impedances to the interstage matching network, and the network performs as intended, providing 3-dB power division, a  $90^\circ$  phase shift, and impedance matching. Fig. 8(d)–(f) shows the simulation results of amplitude and phase differences between the interstage matching outputs of the main and auxiliary paths over PA output power at 26, 28, and 30 GHz. The same bias voltages are used for all frequency points to derive these figures. The PAE contour plot of Fig. 5 assumes a 2-dB loss for the interstage matching network. The reader may use (5) to re-evaluate PAE<sub>total</sub> for different values of interstage matching network loss. For example, if this network could be designed with 0.5 dB less insertion loss, PAE<sub>total</sub> would be 0.8%–0.9% higher for different driver and last-stage PAE values. Another role of the interstage matching network is differential-to-SE conversion. Since the driver amplifier is differential and the power stage is SE, the transformers in the interstage network also provide balanced-to-unbalanced conversion.

Fig. 10. Chip microphotograph of compact CB DPA.

Fig. 11. Small-signal measurements of compact CB DPA.

Fig. 12. Large signal continuous-wave gain, PAE, and amplitude modulation (AM)–phase modulation (PM) simulations, measurements and comparison to class-B efficiency at (a) 26, (b) 28, and (c) 30 GHz.
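The impedance behavior described above can be illustrated with ideal quarter-wave lines, each of which transforms a load  $Z_L$  into  $Z_0^2/Z_L$  at the center frequency. The sketch below uses the Table III impedances and is a simplification that assumes lossless lines and purely resistive terminations. The OFF-state auxiliary input resistance of 50 Ω is a hypothetical placeholder chosen only to show the trend; it is not a value from the paper.

```python
def quarter_wave(z0, z_load):
    """Input impedance of an ideal quarter-wave line at its center frequency."""
    return z0 ** 2 / z_load


def path_impedance(z_load, line_impedances):
    """Cascade ideal quarter-wave lines, listed from the amplifier toward the driver."""
    z = z_load
    for z0 in line_impedances:
        z = quarter_wave(z0, z)
    return z


def parallel(z_a, z_b):
    """Parallel combination of two resistive impedances."""
    return z_a * z_b / (z_a + z_b)


AUX_LINES = [4.0, 31.0]            # Z01a, Z02a from Table III (ohms)
MAIN_LINES = [4.0, 31.0, 120.0]    # Z01m, Z02m, Z03m from Table III (ohms)

# Both amplifiers ON: each CB input is taken as about 2 ohm.
z_aux_on = path_impedance(2.0, AUX_LINES)
z_main_on = path_impedance(2.0, MAIN_LINES)
print(f"ON:  Z_aux = {z_aux_on:.1f}, Z_main = {z_main_on:.1f}, "
      f"driver load = {parallel(z_aux_on, z_main_on):.1f} ohm")

# Auxiliary OFF: hypothetical high input resistance (assumed 50 ohm for illustration).
R_A_OFF = 50.0
z_aux_off = path_impedance(R_A_OFF, AUX_LINES)
main_share = z_aux_off / (z_aux_off + z_main_on)   # current divider at the driver node
print(f"OFF: Z_aux = {z_aux_off:.0f} ohm, main-path current share = {main_share:.2f}")
```

With both amplifiers ON, the two paths each present roughly  $2R_s$ , so their parallel combination is close to the 60-Ω  $R_{\text{opt}}$  of the driver. With the auxiliary path OFF, its transformed impedance grows in proportion to  $R_A$ , and most of the driver current is steered into the main path.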

### C. Circuit Implementation

Fig. 9 shows the full schematic of the DPA. The main and auxiliary PAs use 60- $\mu$ m devices in the CB configuration, while the driver uses a differential neutralized CE stage. Each device in the differential pair has a size of 40  $\mu$ m. The main amplifier is biased at 166  $\mu$ A/ $\mu$ m, while the auxiliary amplifier's quiescent current is negligible. Implementing high-speed, compact adaptive biasing could further enhance the back-off efficiency of Doherty amplifiers [29], [30]. The transistors in the driver amplifier are biased at 125  $\mu$ A/ $\mu$ m. The input of the driver amplifier is matched to the source impedance using a transformer-based matching network. The main and auxiliary amplifiers use a 2-V supply, while the driver amplifier uses a 1.2-V supply.
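For orientation, the bias figures above translate into approximate quiescent currents as follows. This is back-of-the-envelope arithmetic only; it assumes the quoted current densities are normalized to the same device dimension as the quoted device sizes, and it neglects the auxiliary amplifier's quiescent current, as stated in the text.

```python
def quiescent_current_ma(size_um, density_ua_per_um, n_devices=1):
    """Quiescent current in mA for n_devices of a given size and current density."""
    return n_devices * size_um * density_ua_per_um / 1000.0


# Values quoted in the text.
main_ma = quiescent_current_ma(60, 166)                  # single-ended main CB device
driver_ma = quiescent_current_ma(40, 125, n_devices=2)   # differential CE pair

# Quiescent dc power from the stated supplies (2 V output stage, 1.2 V driver).
main_mw = 2.0 * main_ma
driver_mw = 1.2 * driver_ma

print(f"main:   {main_ma:.2f} mA, {main_mw:.1f} mW")
print(f"driver: {driver_ma:.2f} mA, {driver_mw:.1f} mW")
```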

The DPA is designed and fabricated in a 90-nm BiCMOS process from GlobalFoundries. Fig. 10 shows the chip photograph of the fabricated compact CB DPA. It occupies an active area of 640  $\times$  300  $\mu$ m. In the next section, the measurement results of this amplifier are presented.

## V. MEASUREMENT RESULTS

In this section, we present the results of small and large signal measurements, along with modulated signal tests.

Fig. 11 shows the small-signal performance of this amplifier. Small-signal measurements are carried out using an N5047A PNA-X from Agilent and i67-A-GSG-75 probes from Cascade Microtech. Calibration is performed up to the probe tips using the short-open-load-through (SOLT) method and an impedance standard substrate (ISS). The two-stage common-base DPA (CBDPA) achieves a peak small-signal gain of 20.4 dB at 28.45 GHz. Considering a 1-dB drop from the peak S21, the CBDPA achieves 4.45 GHz of bandwidth (26.06–30.51 GHz), which is sufficient to support most mm-wave 5G frequency bands. Good agreement between simulations and measurements is observed.

Fig. 12 shows the continuous-wave large-signal measurements. The  $P_{1\text{dB}}$  values at 26, 28, and 30 GHz are 19.4, 19.3, and 19.3 dBm with PAE<sub>1dB</sub> of 38.5%, 37.3%, and 36.8%, respectively. In the 6-dB power back-off region (from the saturation point), the PA reaches efficiencies of 29.1%, 31.1%, and 29.3% at 26, 28, and 30 GHz, achieving 1.23 $\times$ , 1.29 $\times$ , and 1.28 $\times$  improvement compared with simulated class-B operation, respectively. Moreover, Fig. 13 shows the PAE<sub>sat</sub> and  $P_{\text{sat}}$  performance of the CBDPA versus frequency. It achieves  $>19.5$  dBm  $P_{\text{sat}}$  with  $>36\%$  PAE<sub>sat</sub>, exhibiting wideband performance.

Fig. 13.  $P_{\text{sat}}$ ,  $P_{1\text{dB}}$ , and PAE at saturation, 1- and 6-dB back-off from saturation versus frequency.

Fig. 14. Modulated signal measurements test setup.

Fig. 14 shows the test setup for modulated signal measurements. An evaluation board of the Xilinx UltraScale FPGA (ZCU111) is utilized for signal generation and analysis along with VSA software from Keysight. The ZCU111 has high-speed digital-to-analog converters (DACs) and analog-to-digital converters (ADCs) that are clocked at 3932.16 MHz with 14 and 12 bits of resolution, respectively. The 5G FR2 test model signal, NR-FR2-TM3.1-64QAM, is generated and loaded into the DAC's internal memory. Two DACs are used to generate the in-phase and quadrature-phase signals. The I and Q signals are fed to the microwave upconverter from Analog Devices (ADMV1013-EVALZ). The upconverter is tuned to suppress the image signal by more than 45 dB and the local oscillator (LO) signal by about 40 dB. The LO signal is generated using Agilent's PSG E8257D. Then, the modulated RF signal is applied to the device under test (DUT). The amplified signal is then downconverted using a ZMDB-44H-K+ from Mini-Circuits for digitization. Also, a portion of the RF signal from the DUT is fed to the signal analyzer (Agilent N9030A) for power measurements. The digitized signal is downloaded onto a PC with VSA software for demodulation. The PA is measured with NR-FR2 test model signals from 26 to 30 GHz (Fig. 15). At 26, 28, and 30 GHz, the PA achieves  $P_{avg}$  of 8.25, 8.45, and 8 dBm, respectively. The corresponding  $PAE_{avg}$  values are 13.9%, 14.5%, and 13.7% for a 400-MHz 1-CC 64-QAM signal with an  $EVM_{rms}$  of  $-25.8$ ,  $-25.8$ , and  $-25.7$  dB, respectively.

TABLE IV  
SILICON-BASED mm-WAVE PAs PERFORMANCE COMPARISON

|                              | This work          | Kim, RFIC21 [24] | Hu, ISSCC19 [11] | Huang, ISSCC21 [17] | Qunaj, ISSCC21 [16] | Pashaeifar, JSSC22 [20] | Zong, RFIC20 [13] |
|------------------------------|--------------------|------------------|------------------|---------------------|---------------------|-------------------------|-------------------|
| <b>Architecture</b>          | CBDPA              | Parallel Series  | Multiband        | Coupler DPA         | LMBA                | Series DPA              | Series DPA        |
| <b>Technology</b>            | 90nm SiGe          | 28nm             | 130nm SiGe       | 45nm SOI            | 28nm                | 40nm                    | 22nm FDSOI        |
| <b>Supply (V)</b>            | 1.8                | 1.8              | 1.5              | 2                   | 1                   | 1.8                     | 2.4               |
| <b>Freq. (GHz)</b>           | 26-30.5*           | 24-30            | 28-42            | 26-60               | 36                  | 24-32                   | 28                |
| <b>Area (mm<sup>2</sup>)</b> | 0.192              | 0.16             | 1.76             | 0.67                | 1.44                | 0.37                    | 0.2               |
| <b>Gain (dB)</b>             | 20.4               | 16.5             | 18.2             | 16                  | 18                  | 17.4                    | 26.1              |
| <b>P<sub>sat</sub> (dBm)</b> | 19.91              | 18.8             | 16.8             | 22.62               | 22.6                | 20.4                    | 22.5              |
| <b>PAE<sub>sat</sub> (%)</b> | 37.3               | 30               | 20.3             | 41.9                | 32                  | 38.2                    | 28.5              |
| <b>P<sub>1dB</sub> (dBm)</b> | 19.32              | 17.5             | 15.2             | -                   | 19.6                | 20.2                    | 21.1              |
| <b>PAE<sub>1dB</sub> (%)</b> | 37.1               | -                | 19.5             | -                   | 30.5                | 39.1                    | 26.6              |
| <b>PAE<sub>6dB</sub> (%)</b> | 31.1               | 22               | 13.9             | 31.5                | 24.2                | 34                      | 22.1              |
| <b>Modulation</b>            | 400MHz 5GNR 64-QAM | 8CC OFDM         | 64 QAM           | 64QAM NR            | QAM64               | 64QAM OFDM              | QAM64             |
| <b>Data Rate</b>             | 400M               | 800M             | 3 Gbps           | 200M                | 12 Gbps             | 400M                    | 2.4Gbps           |
| <b>EVM rms (dB)</b>          | -25.7              | -27.1            | -27              | -24                 | -25                 | -24.5                   | -25.1             |
| <b>ACLR (dBc)</b>            | -26.2/-28.5        | -33.4/-32.1      | -28.8            | -28.8               | -                   | -28.8/-28.2             | -28/-31           |
| <b>P<sub>avg</sub> (dBm)</b> | 8.45               | 11.4             | 9.2              | 10.7                | 16                  | 8.8                     | 10.9              |
| <b>PAE<sub>avg</sub> (%)</b> | 14.5               | 18.1             | 18.5             | 14.5                | 22                  | 15                      | 9.2               |

\*1dB Bandwidth

Fig. 15. Modulated signal measurements. (a) 26, (b) 28, and (c) 30 GHz.

Fig. 16. Modulated signal measurements, EVM, PAE<sub>avg</sub>, ACLR<sub>L</sub>, and ACLR<sub>U</sub> versus Pout<sub>avg</sub>. (a) 26, (b) 28, and (c) 30 GHz.

The PA is also tested for adjacent channel power ratio (ACPR). At 26, 28, and 30 GHz, it achieves  $-27$ ,  $-26.2$ , and  $-30.8$  dBc for the lower side and  $-28.4$ ,  $-28.5$ , and  $-27.6$  dBc for the higher side channels, respectively. Fig. 16 shows the modulated signal measurements versus the average output power of the DPA as the modulated signal power is swept. At lower average power, the EVM is limited by the noise generated by the PA and the measurement system. However, at the higher end of the average output power, the EVM is limited by the nonlinearity of the DPA. Table IV compares the performance of the CBDPA with state-of-the-art back-off-efficient PAs at mm-wave frequencies. Compared with [24], the PA prototype provides higher output power and efficiency. On the other hand, compared with [20], it occupies a 45% smaller area.
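The modulated-signal figures are reported in logarithmic units, and a few standard conversions can help when comparing them with other sources. The sketch below is illustrative: the helper names are hypothetical, and the 28-GHz  $P_{\text{sat}}$  of 19.91 dBm is taken from Table IV.

```python
def evm_db_to_percent(evm_db):
    """Convert rms EVM from dB to percent (EVM is an amplitude ratio)."""
    return 100 * 10 ** (evm_db / 20)


def dbm_to_mw(p_dbm):
    """Convert power from dBm to milliwatts."""
    return 10 ** (p_dbm / 10)


# 28-GHz values reported in the text and Table IV.
EVM_DB = -25.8
P_AVG_DBM = 8.45
P_SAT_DBM = 19.91

evm_pct = evm_db_to_percent(EVM_DB)
p_avg_mw = dbm_to_mw(P_AVG_DBM)
backoff_db = P_SAT_DBM - P_AVG_DBM   # average-power back-off from saturation

print(f"EVM = {evm_pct:.2f} %")
print(f"P_avg = {p_avg_mw:.2f} mW")
print(f"back-off from P_sat = {backoff_db:.2f} dB")
```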

## VI. CONCLUSION

In summary, this article investigates the intrinsic constraints of mm-wave DPAs in beamforming applications, particularly when numerous transceivers are consolidated within a limited space. It introduces architectural and circuit-level innovations aimed at reducing the chip area consumed by the entire PA, thereby enhancing integration density and system efficiency in compact mm-wave beamforming applications. Different aspects of mm-wave Doherty combiner architectures, such as area, power, and bandwidth, are analyzed in detail. The effects of active device structures on area, power, and power efficiency are considered as design parameters, and the SE CB stage is proposed as the power stage. An interstage matching network that removes the bulky hybrid coupler at the input is introduced to further reduce the silicon area. Measurement results confirm the applicability of the proposed ideas for mm-wave beamforming ICs.

## REFERENCES

- [1] T. S. Rappaport et al., "Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond," *IEEE Access*, vol. 7, pp. 78729–78757, 2019.
- [2] H. Aggrawal, P. Chen, M. M. Assefzadeh, B. Jamali, and A. Babakhani, "Gone in a picosecond: Techniques for the generation and detection of picosecond pulses and their applications," *IEEE Microw. Mag.*, vol. 17, no. 12, pp. 24–38, Dec. 2016.
- [3] M. Aladsani, A. Alkhateeb, and G. C. Trichopoulos, "Leveraging mmWave imaging and communications for simultaneous localization and mapping," in *Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP)*, May 2019, pp. 4539–4543.
- [4] Y. Yin, Z. Zhang, T. Kanar, S. Zahir, and G. M. Rebeiz, "A 24–29.5 GHz 256-element 5G phased-array with 65.5 dBm peak EIRP and 256-QAM modulation," in *IEEE MTT-S Int. Microw. Symp. Dig.*, Aug. 2020, pp. 687–690.
- [5] S. Shakib, M. Elkholly, J. Dunworth, V. Aparin, and K. Entesari, "A wideband 28-GHz transmit–receive front-end for 5G handset phased arrays in 40-nm CMOS," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 7, pp. 2946–2963, Jul. 2019.
- [6] J. Pang et al., "A 28-GHz CMOS phased-array transceiver based on LO phase-shifting architecture with gain invariant phase tuning for 5G new radio," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1228–1242, May 2019.
- [7] T. Sowlati et al., "A 60-GHz 144-element phased-array transceiver for backhaul application," *IEEE J. Solid-State Circuits*, vol. 53, no. 12, pp. 3640–3659, Dec. 2018.
- [8] Y.-S. Yeh, E. Balboni, and B. Floyd, "A 28-GHz phased-array transceiver with series-fed dual-vector distributed beamforming," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2017, pp. 65–68.
- [9] B. Sadhu et al., "A 28-GHz 32-element TRX phased-array IC with concurrent dual-polarized operation and orthogonal phase and gain control for 5G communications," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3373–3391, Dec. 2017.
- [10] H.-T. Kim et al., "A 28-GHz CMOS direct conversion transceiver with packaged  $2 \times 4$  antenna array for 5G cellular system," *IEEE J. Solid-State Circuits*, vol. 53, no. 5, pp. 1245–1259, May 2018.
- [11] S. Hu, F. Wang, and H. Wang, "A 28-/37-/39-GHz linear Doherty power amplifier in silicon for 5G applications," *IEEE J. Solid-State Circuits*, vol. 54, no. 6, pp. 1586–1599, Jun. 2019.
- [12] F. Wang and H. Wang, "A high-power broadband multi-primary DAT-based Doherty power amplifier for mm-wave 5G applications," *IEEE J. Solid-State Circuits*, vol. 56, no. 6, pp. 1668–1681, Jun. 2021.
- [13] Z. Zong et al., "A 28-GHz SOI-CMOS Doherty power amplifier with a compact transformer-based output combiner," *IEEE Trans. Microw. Theory Techn.*, vol. 69, no. 6, pp. 2795–2808, Jun. 2021.
- [14] B. Rabet and J. Buckwalter, "A high-efficiency 28 GHz outphasing PA with 23 dBm output power using a triaxial balun combiner," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 174–176.
- [15] K. Ning, Y. Fang, N. Hosseinzadeh, and J. F. Buckwalter, "A 30-GHz CMOS SOI outphasing power amplifier with current mode combining for high backoff efficiency and constant envelope operation," *IEEE J. Solid-State Circuits*, vol. 55, no. 5, pp. 1411–1421, May 2020.
- [16] V. Qunaj and P. Reynaert, "26.2 A Doherty-like load-modulated balanced power amplifier achieving 15.5 dBm average pout and 20% average PAE at a data rate of 18 Gb/s in 28 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 356–358.
- [17] T.-Y. Huang, N. S. Mannem, S. Li, D. Jung, M.-Y. Huang, and H. Wang, "26.1 A 26-to-60GHz continuous coupler-Doherty linear power amplifier for over-an-octave back-off efficiency enhancement," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, vol. 64, Feb. 2021, pp. 354–356.
- [18] Z. Zong et al., "A 28GHz voltage-combined Doherty power amplifier with a compact transformer-based output combiner in 22nm FD-SOI," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Aug. 2020, pp. 299–302.
- [19] A. Grebennikov and J. Wong, "A dual-band parallel Doherty power amplifier for wireless applications," *IEEE Trans. Microw. Theory Techn.*, vol. 60, no. 10, pp. 3214–3222, Oct. 2012.
- [20] M. Pashaeifar, L. C. N. de Vreede, and M. S. Alavi, "A millimeter-wave CMOS series-Doherty power amplifier with post-silicon inter-stage passive validation," *IEEE J. Solid-State Circuits*, vol. 57, no. 10, pp. 2999–3013, Oct. 2022.
- [21] P. R. Gray, P. J. Hurst, S. H. Lewis, and R. G. Meyer, *Analysis and Design of Analog Integrated Circuits*, 5th ed. Hoboken, NJ, USA: Wiley, 2009.
- [22] H.-T. Dabag, B. Hanafi, F. Golcuk, A. Agah, J. F. Buckwalter, and P. M. Asbeck, "Analysis and design of stacked-FET millimeter-wave power amplifiers," *IEEE Trans. Microw. Theory Techn.*, vol. 61, no. 4, pp. 1543–1556, Apr. 2013.
- [23] P. Heydari, "Neutralization techniques for high-frequency amplifiers: An overview," *IEEE Solid State Circuits Mag.*, vol. 9, no. 4, pp. 82–89, Fall 2017.
- [24] S. Kim, H.-C. Park, D. Kang, D. Minn, and S.-G. Yang, "A 24.5–29.5 GHz broadband parallel-to-series combined compact Doherty power amplifier in 28-nm bulk CMOS for 5G applications," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2021, pp. 171–174.
- [25] H. Oh et al., "32.2 A 24.25-to-29.5GHz extremely compact Doherty power amplifier with differential-breaking phase offset achieving 23.7% PAEavg for 5G base-station transceivers," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2024, pp. 522–524.

- [26] M. Nick and A. Mortazawi, "Adaptive input-power distribution in Doherty power amplifiers for linearity and efficiency enhancement," *IEEE Trans. Microw. Theory Techn.*, vol. 58, no. 11, pp. 2764–2771, Nov. 2010.
- [27] J. R. Lopera, J. Mayock, Q. Sun, M. Gadringer, W. Bösch, and E. Leitgeb, "A 3.5 GHz high power GaN hybrid Doherty power amplifier with dynamic input power splitting for enhanced power added efficiency at backoff," in *Proc. IEEE Topical Conf. RF/Microwave Power Modelling Radio Wireless Appl. (PAWR)*, Jan. 2021, pp. 1–4.
- [28] D. Pozar, *Microwave Engineering*. Hoboken, NJ, USA: Wiley, 2012. [Online]. Available: <https://books.google.com/books?id=JegbAAAAQBAJ>
- [29] X. Zhang, H. Guo, and T. Chi, "32.1 A 47 GHz 4-way Doherty PA with 23.7 dBm P1dB and 21.7%/13.1% PAE at 6 / 12 dB back-off supporting 2000 MHz 5G NR 64-QAM OFDM," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2024, pp. 520–522.
- [30] C. Elgaard et al., "Efficient wideband mmW transceiver front end for 5G base stations in 22-nm FD-SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 59, no. 2, pp. 321–336, Feb. 2024.



**Soodeh Miraslani** received the B.Sc. degree in electrical engineering from Hamedan University of Technology, Hamedan, Iran, in 2012, and the M.Sc. degree from the K. N. Toosi University of Technology, Tehran, Iran, in 2014.

She was an Instructor with the Department of Electrical and Computer Engineering, Azad University, Tehran, from 2015 to 2019. Her research interests include the analysis of microwave and RF circuits, and high-speed printed circuit board design.



**Mohammad Ali Mokri** (Graduate Student Member, IEEE) received the B.Sc. degree in electrical engineering from Hamedan University of Technology, Hamedan, Iran, in 2012, and the M.Sc. degree from the University of Tehran, Tehran, Iran, in 2015. He is currently pursuing the Ph.D. degree at Washington State University, Pullman, WA, USA.

In February 2024, he joined Keysight Labs., Colorado Springs, CO, USA, as an RFIC Research and Development Design Engineer, focusing on the design of RF and mm-wave circuits for next-generation test and measurement instruments. His research interests include integrated circuits for phased array applications, mm-wave transmitters, and RF/mm-wave power amplifiers.

Mr. Mokri also serves as a reviewer for the IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS.



**Shahriar Mirabbasi** (Senior Member, IEEE) received the B.Sc. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 1990, and the M.A.Sc. and Ph.D. degrees in electrical and computer engineering from the University of Toronto, Toronto, ON, Canada, in 1997 and 2002, respectively.

Since 2002, he has been with the Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada, where he is currently a Professor. His current research interests include analog, mixed-signal, RF, and millimeter-wave integrated circuit and system design with particular emphasis on communication, sensor interface, and biomedical applications.



**Emad Afshar** (Graduate Student Member, IEEE) received the M.Sc. degree in electrical engineering from Tarbiat Modares University, Tehran, Iran, in 2021. He is currently pursuing the Ph.D. degree in electrical engineering at Washington State University, Pullman, WA, USA.

His research interests include power management systems and radio frequency integrated circuit design.



**Deukhyoun Heo** (Fellow, IEEE) received the B.S. degree in electrical engineering from Kyungpuk National University, Daegu, South Korea, in 1989, the M.S. degree in electrical engineering from Pohang University of Science and Technology (POSTECH), Pohang, South Korea, in 1997, and the Ph.D. degree in electrical and computer engineering from the Georgia Institute of Technology, Atlanta, GA, USA, in 2000.

In 2000, he joined National Semiconductor Corporation, Santa Clara, CA, USA, where he was a Senior Design Engineer. In 2003, he joined the faculty of the Electrical Engineering and Computer Science Department, Washington State University, Pullman, WA, USA, where he is currently the Frank Brands Analog Distinguished Professor of electrical engineering. He has authored or co-authored more than 200 publications. He has been primarily interested in mm-wave/sub-terahertz (THz) transceivers for wireless and wireline data communications, wireless sensors and power management systems, beamformers for phased-array communications, and low-power wireless links for biomedical applications.

Dr. Heo was a Technical Committee Member of the IEEE Microwave Theory and Techniques Society (IEEE MTT-S). He was a recipient of the 2000 Best Student Paper Award presented at the IEEE MTT-S IMS and the 2009 National Science Foundation (NSF) CAREER Award. He has served as an Associate Editor for IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: EXPRESS BRIEFS and IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES. He is currently serving as an Associate Editor for IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION SYSTEMS and the IEEE MICROWAVE AND WIRELESS COMPONENTS LETTERS.



**Md Aminul Hoque** (Student Member, IEEE) received the B.S. degree in electrical engineering from the Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh, in 2013, and the M.S. degree in electrical engineering from the University of Idaho, Moscow, ID, USA, in 2017. He is currently pursuing the Ph.D. degree in RF microelectronics at Washington State University, Pullman, WA, USA.

In 2020, he joined Qorvo Inc., Chandler, AZ, USA, as a Mobile Engineering Intern, focusing on research and development of highly linear GaAs PAs for cellular applications. In 2021, he joined Qualcomm Technologies Inc., Tempe, AZ, USA, as an RF-Analog Design Engineering Intern, where he focused on the design and modeling of highly efficient digital transmitters. He is currently with Renesas Electronics America, Tempe, working on high-speed, low-power signal generation systems. His current research interests include the development of RF-analog blocks for use in low-power systems for wireless communications.