

# A Bits-to-Antenna *F*-Band 120-Gb/s CMOS RF-64QAM Transmitter for FutureG Wireless Links

Zisong Wang, Member, IEEE, Huan Wang, Senior Member, IEEE,  
Youssef O. Hassan, Graduate Student Member, IEEE,  
and Payam Heydari, Fellow, IEEE

**Abstract**—This article presents a bits-to-antenna wireless transmitter (TX), fully integrated in 45-nm CMOS SOI, capable of surpassing 100-Gb/s data rates. The unique method of directly forming the 64QAM constellation within the RF domain, using three QPSK sub-TXs with controlled amplitude weighting, effectively mitigates the complications introduced by power amplifier (PA) nonlinearity in high-order modulations. This strategic approach opens avenues for significant enhancements in bandwidth and output power. This article further explores additional advantages of this TX design, such as local oscillator (LO) leakage suppression and improved output power, while going through the specifics of circuit block implementations. With a 40-GHz RF bandwidth, the RF-64QAM TX prototype achieves a measured data rate of 120 Gb/s with an effective isotropic radiated power (EIRP) of 16 dBm.

**Index Terms**—6G, FutureG, mm-wave, RF-64QAM, sub-THz, transmitter (TX).

## I. INTRODUCTION

6G and FutureG [1] aim to achieve data rates of hundreds of gigabits per second (Gb/s), setting the stage for a transformative leap in data transmission technologies [2]. This vision draws attention to the wide contiguous bandwidth available in the (sub-)terahertz (THz) spectrum [3] and to higher order modulation schemes, which raise data rates by enhancing spectral efficiency. However, surpassing 100-Gb/s data rates with traditional transmitters (TXs) poses formidable challenges, particularly in power amplifier (PA) design. The obstacles include degradation in power efficiency and error vector magnitude (EVM) stemming from PA nonlinearities, such as A.M.–A.M. and A.M.–P.M. distortion. These challenges become more pronounced with higher order modulations (e.g., 64QAM), whose higher peak-to-average power ratio (PAPR) keeps the PA from operating fully in saturation and thus compromises the TX output power. Such constraints are particularly serious at mm-wave frequencies, where performance is bounded by the device  $f_{\max}$  and  $G_{\max}$ . The RF power-digital-to-analog-converter (power-DAC) approach addresses efficiency degradation by activating only the DAC segments associated with specific constellation points, leveraging the discrete power levels of digital modulation to maintain high power efficiency [4]. However, attaining the necessary signal-to-noise-and-distortion ratio (SNDR) for wideband, higher order modulation at mm-wave frequencies remains a significant hurdle. The widely adopted stack-up topology of RF power-DAC cells is difficult to apply to wideband designs [5], [6]. In addition, clock skew, which stems from the need to synchronize delays across DAC segments, becomes more severe at sub-THz frequencies [4], casting doubt on the practical viability of this approach.

Received 23 April 2024; revised 9 August 2024 and 10 November 2024; accepted 24 December 2024. This article was approved by Associate Editor Bodhisatwa Sadhu. This work was supported in part by the Microelectronics Commons Program, a DoD Initiative, under Award N00164-23-9-G056; and in part by the NSF Computer and Network Systems (CNS) Medium under Grant CNS-2233783; and in part by San Diego Nanotechnology Infrastructure (SDNI) at the University of California at San Diego, part of the National Nanotechnology Coordinated Infrastructure, through NSF under Grant ECCS-2025752. (Corresponding authors: Zisong Wang; Payam Heydari.)

Zisong Wang was with the Department of EECS, University of California at Irvine, Irvine, CA 92697 USA. He is now with Marvell Technology, Irvine, CA 92618 USA (e-mail: wangzisong@ieee.org).

Huan Wang is with Qualcomm, San Diego, CA 92121 USA.

Youssef O. Hassan and Payam Heydari are with the Department of EECS, University of California at Irvine, Irvine, CA 92697 USA (e-mail: payam@uci.edu).

Color versions of one or more figures in this article are available at <https://doi.org/10.1109/JSSC.2024.3523842>.

Digital Object Identifier 10.1109/JSSC.2024.3523842

To address these issues, our recent analysis suggests that the power-backoff issue encountered in PAs amplifying high-PAPR signals can be substantially alleviated by performing symbol generation/formation and upconversion concurrently in the analog/RF domain [7]. A case in point is the construction of a 64QAM signal from three QPSK signals whose magnitudes successively differ by a factor of two, as demonstrated in Fig. 1 [8]. Because each PA then processes a constant-envelope RF signal, this approach not only circumvents A.M.–A.M. and A.M.–P.M. distortions but also fundamentally avoids the power efficiency degradation caused by the power back-off of high-order modulation. Thus, PA design is reduced from a multidimensional problem to a more manageable three-parameter problem focusing on bandwidth,  $P_{\text{sat}}$ , and stability, enabling new PA design methodologies at sub-THz frequencies [9]. Furthermore, it has been established that  $N$  QPSK signals with identical EVM generate a  $4^N$ QAM signal with the same EVM [7], [10].
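The amplitude-weighted superposition can be checked numerically. The Python sketch below is an illustration, not the authors' implementation: it sums three QPSK symbols with weights 4:2:1 and confirms that the result is the standard 8 × 8 64QAM grid with levels ±1, ±3, ±5, ±7 on each axis. The helper name `rf_qam_constellation` and the unit QPSK alphabet {±1 ± j} are assumptions for illustration.

```python
from itertools import product

# Unit-amplitude QPSK alphabet {+-1 +- 1j} (normalization assumed for illustration)
QPSK = [complex(i, q) for i in (1, -1) for q in (1, -1)]


def rf_qam_constellation(n_subtx):
    """Hypothetical helper: superpose n_subtx QPSK symbols weighted 2^(N-1), ..., 2, 1.

    Halving the amplitude from one sub-TX to the next (a 6-dB step) yields a
    square 4^N-QAM grid.
    """
    weights = [2 ** (n_subtx - 1 - k) for k in range(n_subtx)]
    points = set()
    for symbols in product(QPSK, repeat=n_subtx):
        points.add(sum(w * s for w, s in zip(weights, symbols)))
    return points


const64 = rf_qam_constellation(3)
levels = sorted({p.real for p in const64})
print(len(const64), levels)
```

Calling `rf_qam_constellation(2)` returns a 16-point grid, which mirrors (up to a scale factor) the RF-16QAM mode obtained by switching off one sub-TX.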



Fig. 1. System architecture of the proposed RF-64QAM TX (the depicted GCPW lengths are for illustrative purposes only).

Leveraging its notable advantages in output power, linearity, and EVM, this direct-RF modulation method, referred to as RF-64QAM, is used to generate 64QAM signals at sub-THz frequencies. Fabricated in 45-nm CMOS SOI, the RF-64QAM prototype is the first fully integrated sub-THz TX that integrates all critical stages of the digital/mixed-signal/analog/RF chain, from bit and symbol generation circuits to the antenna, and achieves a wirelessly measured data rate of 120 Gb/s. This article offers a complete study of the RF-64QAM TX, covering design considerations and over-the-air measurements. Section II presents the system-level architecture and foundational design principles. Section III investigates the design considerations and implementation of critical circuit blocks. Section IV details the measurement setups and presents results from the TX prototype. Finally, Section V summarizes the key contributions of this work.

## II. SYSTEM-LEVEL CONSIDERATIONS

### A. System Architecture

From a system-level perspective, decomposing  $4^N$ QAM into  $N$  QPSK sub-TXs relaxes the tradeoff between TX output power and linearity while streamlining the overall design process. The TX power efficiency is maximized through minor adjustments of the bias conditions and supply voltages of the circuit components. In addition, this architecture significantly enhances TX reconfigurability. Specifically, selectively deactivating sub-TXs converts the RF-64QAM TX into either an RF-16QAM or a QPSK TX while maintaining high power efficiency, making the TX adaptable to diverse communication needs.

The RF-64QAM TX architecture, shown in Fig. 1, revolves around three QPSK sub-TXs that share identical circuit blocks to ensure uniformity and scalability while simplifying the design process. The output power is finely controlled by adjusting the dc bias conditions of the three PAs, yielding distinct output saturation power ( $P_{\text{sat}}$ ) levels spaced 6 dB apart (Fig. 1) while maintaining high power-added efficiency (PAE). A key aspect of the design is carrier synchronization and baseband phase alignment among the three sub-TXs, achieved through a carefully laid-out transmission-line-based distribution network and a calibration receiver (RX); the synchronization methods are detailed in Section II-B.

The on-chip local oscillator (LO) chain takes the external 30-GHz single tone and doubles it to 60 GHz with an on-chip frequency doubler, whose output is distributed through Wilkinson power splitters (Fig. 1). On the baseband side, the chip incorporates six  $2^9 - 1$  pseudorandom binary sequence (PRBS) generators for on-chip baseband signal generation and distribution. The dc biasing for all circuit blocks is handled by current-steering DACs under the digital control of an on-chip SPI controller. Through the SPI, the on-chip dc bias voltages can also be monitored via an on-chip multiplexer whose output is available at a single pad. This configuration facilitates a one-time calibration to compensate for process, voltage, and temperature (PVT) variations. Moreover, it enables digital calibration to correct amplitude mismatches among the QPSK sub-TXs and within each IQ pair, thus enhancing the TX performance.
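For reference, a  $2^9 - 1$  PRBS is typically produced by a 9-stage linear-feedback shift register (LFSR). The Python model below assumes feedback taps at stages 9 and 5, i.e., the common  $x^9 + x^5 + 1$  PRBS9 polynomial; the article does not state which polynomial or seeds its six generators use, so both are assumptions here, and `prbs9` is a hypothetical name.

```python
def prbs9(n_bits, seed=0x1FF):
    """Generate n_bits of a PRBS9 sequence (period 2^9 - 1 = 511 bits).

    Assumption: Fibonacci LFSR with feedback from stages 9 and 5
    (the common x^9 + x^5 + 1 PRBS9 polynomial).
    """
    state = seed & 0x1FF
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    bits = []
    for _ in range(n_bits):
        # XOR of stage 9 (bit index 8) and stage 5 (bit index 4)
        new_bit = ((state >> 8) ^ (state >> 4)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x1FF
    return bits


seq = prbs9(2 * 511)
# A maximal-length sequence repeats every 511 bits with 256 ones per period
print(seq[:511] == seq[511:], sum(seq[:511]))
```

Six such generators with different seeds could, for example, supply the I and Q streams of the three sub-TXs; that pairing is an illustration rather than a reported design detail.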

Focusing on each QPSK sub-TX, the 60-GHz tone emerging from the Wilkinson splitter passes through an LO chain whose first stage is a low-pass 8-b phase shifter with a tuning range of approximately  $\pm 10^\circ$  to compensate for PVT variations across the three sub-TXs. The phase-shifter output is fed to a frequency doubler and a buffer/amplifier before entering a miniaturized 90° branch-line coupler that produces the IQ signals. A pair of 120-GHz buffers then compensate for the losses of the coupler and routing and further boost the LO signal prior to the Gilbert-cell-based QPSK modulator. A local D-flip-flop on the baseband signal path compensates for the PRBS routing loss, ensuring adequate conversion gain, while switched-*RC* networks provide basic pulse shaping. The output of the QPSK modulator is coupled via a transformer into a wideband PA with a single-ended output, which finally feeds a three-way power combiner to form the 64QAM signal.

### B. Phase Synchronization of QPSK Sub-TXs

Constructing a 64QAM constellation from three QPSK signals requires phase synchronization among the QPSK sub-TXs. The relationship between EVM and phase mismatch is given by [10]:

$$\text{EVM}_{\text{RF-64QAM},\theta} = \sqrt{\frac{4\left[\left(2\sin\frac{\theta_2}{2} + \sin\frac{\theta_1-\theta_2}{2}\right)^2 - 2\sin\frac{\theta_2}{2}\sin\frac{\theta_1-\theta_2}{2}\right]}{21}} \quad (1)$$

where  $\theta_i$  for  $i \in \{1, 2\}$  represents the phase difference between the largest QPSK signal (QPSK<sub>3</sub>) and QPSK<sub>*i*</sub>. From (1), a −30-dB EVM floor requires each  $\theta_i$  to be within roughly  $\pm 5^\circ$ . To meet this requirement, a three-step strategy is used for LO and baseband phase synchronization across the three sub-TXs.
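Equation (1) is easy to evaluate directly, which shows how sensitive the EVM is to each phase error. The sketch below simply implements (1) as written; the function names are illustrative.

```python
import math


def evm_phase_mismatch(theta1_deg, theta2_deg):
    """Evaluate (1): RF-64QAM EVM (linear) versus sub-TX phase mismatches."""
    s2 = math.sin(math.radians(theta2_deg) / 2)
    s12 = math.sin(math.radians(theta1_deg - theta2_deg) / 2)
    return math.sqrt(4 * ((2 * s2 + s12) ** 2 - 2 * s2 * s12) / 21)


def evm_db(theta1_deg, theta2_deg):
    """EVM of (1) in dB (undefined for zero mismatch)."""
    return 20 * math.log10(evm_phase_mismatch(theta1_deg, theta2_deg))


for t1, t2 in [(5, 0), (0, 5), (5, 5)]:
    print(f"theta1 = {t1} deg, theta2 = {t2} deg -> EVM = {evm_db(t1, t2):.1f} dB")
```

Under (1), a 5° error in  $\theta_2$  alone gives about −29.6 dB, whereas the same error in  $\theta_1$  alone gives about −34.4 dB, so the tolerance is set mainly by  $\theta_2$ .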

The first step, conducted during the design phase, aligns phase offsets among I/Q carrier signals and synchronizes baseband signals across the QPSK sub-TXs. Variations in bias conditions across the three PAs lead to distinct phase responses for each sub-TX. The resulting phase differences are corrected by precise length adjustments of the grounded coplanar waveguides (GCPWs) that deliver the 60-GHz LO signal from the outputs of Wilkinson splitters to each sub-TX. In addition, electromagnetic (EM) simulations of interconnects carrying high-speed baseband signals from PRBS generators are undertaken to ensure modulator input phase alignment.
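To give a sense of scale for the GCPW length adjustments, the sketch below converts a phase correction into extra line length using  $\Delta L = (\Delta\phi/360^\circ)\,\lambda_0/\sqrt{\varepsilon_{\text{eff}}}$ . The effective permittivity (4.0) and the 10° mismatch are assumed values for illustration, not design data from this work. Because each sub-TX doubles its 60-GHz LO, a phase error at the 120-GHz carrier needs only half that correction on the 60-GHz GCPW.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def gcpw_length_for_phase_um(phase_deg, freq_hz, eps_eff):
    """Extra GCPW length (um) that adds phase_deg of delay at freq_hz.

    eps_eff is the GCPW effective permittivity, an assumed input here.
    """
    wavelength_guided = C0 / (freq_hz * math.sqrt(eps_eff))
    return phase_deg / 360.0 * wavelength_guided * 1e6


# The doubler doubles the LO phase, so a 120-GHz error maps to half at 60 GHz.
phase_err_120g_deg = 10.0  # hypothetical mismatch to correct
phase_at_60g_deg = phase_err_120g_deg / 2
length_um = gcpw_length_for_phase_um(phase_at_60g_deg, 60e9, eps_eff=4.0)
print(f"extra GCPW length: {length_um:.1f} um")
```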

The second step uses an on-chip calibration RX at the TX output (Fig. 1) to mitigate PVT variations after fabrication. The 60-GHz LO signal emerging from the splitter is fed to a doubler, whose output downconverts the signal coupled from the combiner output. During calibration, the sub-TXs are activated one at a time while the dc output BB<sub>Cal</sub> of the calibration RX is measured. The 8-b tunable 60-GHz phase shifter at each sub-TX input is tuned until BB<sub>Cal</sub> reaches 0 V dc, achieving phase alignment.
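The null-seeking calibration can be sketched as an exhaustive sweep of the 8-b phase-shifter code. The model below is hypothetical: it assumes a linear code-to-phase map spanning ±10° (referred directly to the detected phase, ignoring the doubler's factor of two) and a calibration-RX dc output proportional to the sine of the residual phase error. Under this model the null does not depend on the sub-TX amplitude, so the same procedure applies to all three sub-TXs even though their output levels differ in 6-dB steps.

```python
import math

N_BITS = 8
TUNE_RANGE_DEG = 10.0  # +/-10 degree tuning range quoted in the text
N_CODES = 2 ** N_BITS
STEP_DEG = 2 * TUNE_RANGE_DEG / (N_CODES - 1)


def code_to_phase_deg(code):
    """Hypothetical linear map: code 0 -> -10 deg, code 255 -> +10 deg."""
    return -TUNE_RANGE_DEG + STEP_DEG * code


def bb_cal(code, phase_err_deg, amplitude):
    """Hypothetical dc output of the calibration RX; crosses 0 V at zero residual phase."""
    residual = math.radians(phase_err_deg + code_to_phase_deg(code))
    return amplitude * math.sin(residual)


def calibrate_subtx(phase_err_deg, amplitude):
    """Sweep all 8-b codes and keep the one whose output is closest to 0 V."""
    return min(range(N_CODES), key=lambda c: abs(bb_cal(c, phase_err_deg, amplitude)))


# Three sub-TXs with 4:2:1 amplitudes and different (hypothetical) phase errors
for amp, err in [(4.0, 3.0), (2.0, -6.2), (1.0, 8.9)]:
    code = calibrate_subtx(err, amp)
    print(f"amp {amp}: code {code}, residual {err + code_to_phase_deg(code):+.3f} deg")
```

In this sketch the residual error is bounded by half an LSB (about 0.04°); real hardware would add detector offset and noise, which the model omits.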

The final step targets IQ phase-mismatch corrections within each QPSK sub-TX, using a pair of 8-b  $\pm 10^\circ$  phase shifters at 120 GHz placed after the 90° hybrid coupler (Fig. 1). A one-time calibration for each sub-constellation mitigates process-induced discrepancies, ensuring precise 64QAM signal formation.



Fig. 2. Proposed LO leakage suppression technique in RF-64QAM TX.

### C. LO Leakage Suppression

Besides the advantages in PA linearity, TX power efficiency, and noise reduction detailed in [7], the RF- $4^N$ QAM architecture intrinsically enhances LO leakage suppression, leading to higher TX dynamic range and linearity. This is readily achieved by swapping the input connections to the PAs in the lower amplitude sub-TXs (Fig. 2). Assuming uniform mismatches and parasitic couplings across the sub-TXs, this configuration reduces the total LO leakage amplitude  $A_{\text{lkg}}$ , as expressed by

$$A_{\text{lkg}} = Ae^{j\Phi} \left( 1 - \sum_{i=1}^{N-1} \frac{1}{2^i} \right) \quad \text{for } N \geq 2 \quad (2)$$

where  $Ae^{j\Phi}$  denotes the LO leakage from the largest amplitude sub-TX. Equation (2) reveals that the RF-QAM TX achieves greater LO leakage suppression with higher order QAM. As *N* grows large, we observe

$$\lim_{N \rightarrow \infty} A_{\text{lkg}} = \lim_{N \rightarrow \infty} Ae^{j\Phi} \left( \frac{1}{2^{N-1}} \right) = 0. \quad (3)$$

This indicates that, in theory, the RF-QAM TX architecture can completely suppress the LO leakage. For the RF-64QAM TX ( $N = 3$ ), this simple architectural variation, illustrated in Fig. 2, results in a theoretical 75% reduction in LO leakage amplitude (a factor of 1/4), corresponding to a 12-dB reduction in LO leakage power.
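The closed form of (2) and (3) can be tabulated exactly with rational arithmetic. The sketch below evaluates (2) for several  $N$ , converting the residual amplitude to a power reduction in dB; it inherits the assumption of (2) that each smaller sub-TX leaks in proportion to its amplitude and that its swapped PA inputs make that leakage subtract.

```python
import math
from fractions import Fraction


def lo_leakage_factor(n_subtx):
    """Residual LO-leakage amplitude relative to the largest sub-TX, per (2)."""
    return Fraction(1) - sum(Fraction(1, 2 ** i) for i in range(1, n_subtx))


for n in range(1, 6):
    f = lo_leakage_factor(n)
    reduction_db = 20 * math.log10(1 / float(f))
    print(f"N = {n}: leakage amplitude x {f}, leakage power reduced by {reduction_db:.2f} dB")
```

Each added sub-TX halves the residual amplitude, i.e., adds about 6 dB of leakage-power suppression, consistent with the  $1/2^{N-1}$  form in (3).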

## III. BLOCK-LEVEL DESIGN CONSIDERATIONS

### A. On-Chip Wideband Antenna

Transmitting a wideband sub-THz modulated signal directly from a chip while maintaining signal integrity poses considerable challenges. Wirebonding an output port above 100 GHz is impractical because the significant losses associated with bonding wires degrade signal integrity. Antenna-in-package (AiP) offers a viable pathway yet faces implementation obstacles [11], [12]. A wide variety of substrate technologies (in terms of materials and processes) have been used, such as high-density interconnect (HDI) PCB using materials with low relative permittivity and loss tangent, low-temperature co-fired ceramic (LTCC), quartz-on-silicon, thin films on glass, wafer-level fan-out package, and multi-layer organic laminate [13]. These technologies offer enhanced fabrication precision in linewidth, spacing, and via size, which is useful for optimal matching and high power efficiency in sub-THz antennas, but come with higher costs and longer turnaround. Feeding wideband high-frequency signals to antennas on interposers via copper pillars or C4 balls requires precise interface modeling to capture the high-frequency impairments induced by routing and vias. In addition, achieving low-loss wideband matching with the 30% fractional bandwidth at sub-THz frequencies mandated in this work, through heterogeneous integration of multiple materials with different loss tangents in an AiP structure, is extremely challenging and remains an open research problem.

Fig. 3. (a) and (b) On-chip antenna with backside radiation silicon lens with simulated radiation pattern in HFSS. (c) Simulated results of antenna gain, matching, and total efficiency.

Given these considerations, we opt for an on-chip antenna. An on-chip antenna with a ground shield underneath is, however, unsuitable for wideband applications due to the substantial interlayer capacitances resulting from the dense metal stackup of the CMOS process. Without a ground shield, the antenna radiates about 97% of its power into the silicon substrate because of the large difference in dielectric constant between air and silicon [14]. Thus, we use backside radiation through a silicon lens.
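The ~97% figure is consistent with a widely used approximation for a small planar antenna on an air-dielectric interface, which radiates roughly  $\varepsilon_r^{3/2}$  times more power into the dielectric than into air. The sketch below evaluates this approximation for silicon ( $\varepsilon_r \approx 11.7$ ); it is an illustrative estimate, not necessarily the analysis used in [14].

```python
def substrate_power_fraction(eps_r):
    """Fraction of radiated power entering the dielectric half-space,
    using the eps_r**1.5 power-ratio approximation for a small planar antenna."""
    ratio = eps_r ** 1.5  # P_dielectric / P_air
    return ratio / (1 + ratio)


print(f"{substrate_power_fraction(11.7):.1%}")  # silicon, eps_r ~ 11.7
```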

For wideband design, a helical antenna is selected for its superior broadband characteristics compared with dipole antennas [15], [16], [17]. The helical antenna is built from the thick metal layers, spiraling down from  $M_7$  to  $M_6$ . An on-chip antenna must have a well-defined ground plane to prevent the radiated wave from using the silicon substrate as a propagation medium, which could jeopardize the functionality of the IC. To this end, a ground route that spirals down alongside the signal route is implemented, as shown in Fig. 3(b). This design secures a well-defined ground path for the signal, while allowing the multiple resonant frequencies of the helical antenna to be adjusted by varying the distance between the signal and ground routes to meet wideband design specifications. To enhance gain and ensure electric field continuity at the ground strips of the CPW feed, the signal path is divided into two symmetrical sub-spirals, enabling effective spatial power combining. Modeled in HFSS with its diced wafer substrate and silicon lens, the antenna's simulated radiation pattern is displayed in Fig. 3(a). The simulation results in Fig. 3(c) also reveal  $\sim 15$ -dBi antenna gain,  $\sim 70\%$  radiation efficiency across the 100–140-GHz range, and good input matching, as indicated by  $S_{11} < -10$  dB.

### B. Front-End Wideband Power Combining

1) *Power Combining in RF-QAM Modulation:* Power combining is a crucial part of the proposed RF-64QAM TX and serves a dual purpose: first, it constructs the 64QAM constellation, and second, it naturally combines the power of several PAs operating in saturation, increasing the equivalent TX saturation power,  $P_{\text{sat},\text{eq}}$ , at the system level. To elucidate, with  $P_{\text{sat}}$  denoting the PA saturation power of QPSK<sub>1</sub>,  $P_{\text{sat},\text{eq}}$  is derived as follows:

$$P_{\text{sat},\text{eq}} = \sum_{i=0}^{N-1} \frac{P_{\text{sat}}}{4^i} \quad (4)$$

where  $N$  denotes the number of sub-TXs contributing to  $4^N$ QAM generation. For  $N = 3$ , (4) yields  $P_{\text{sat},\text{eq}} = 1.3125 P_{\text{sat}}$ , marking a 1.2-dB improvement in equivalent saturation power before any power back-off is applied. As  $N$  approaches infinity,  $P_{\text{sat},\text{eq}}$  asymptotically reaches

$$P_{\text{sat},\text{eq}} = \lim_{N \rightarrow \infty} \sum_{i=0}^{N-1} \frac{P_{\text{sat}}}{4^i} = \frac{4}{3} P_{\text{sat}}. \quad (5)$$

As discussed below, this  $P_{\text{sat}}$  enhancement implies that the RF-QAM TX can provide larger output power than its conventional counterpart even when accounting for the  $10 \lg(N)$ -dB power combining loss. Notably, by adjusting the bias and supply voltage of each sub-TX PA, we can ensure that the PAs always operate at  $P_{\text{sat}}$  (the saturation power of the  $i$ th PA is set to  $P_{\text{sat}}/4^{i-1}$ ), thereby maintaining their highest drain efficiency. This observation underscores the advantage of the proposed RF-QAM TX architecture.

Quantifying the output power difference between an RF-QAM TX and its conventional counterpart requires an analysis of the average output power. In a conventional TX, the PA must handle an upconverted high-order modulated signal with a high PAPR. The PAPR of a  $4^N$ QAM signal is given by [7]

$$\text{PAPR} = \frac{3 \times (2^N - 1)}{2^N + 1}. \quad (6)$$
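Equation (6) can be cross-checked against the superposition construction itself. The sketch below builds the  $4^N$ QAM grid from  $N$  weighted QPSK sub-constellations, assuming equiprobable symbols, and compares its peak-to-average power ratio with (6); the function names are illustrative.

```python
from itertools import product

QPSK = [complex(i, q) for i in (1, -1) for q in (1, -1)]


def papr_from_constellation(n_subtx):
    """PAPR of the grid formed by N QPSK sub-constellations weighted 2^(N-1), ..., 1,
    assuming equiprobable symbols."""
    weights = [2 ** (n_subtx - 1 - k) for k in range(n_subtx)]
    powers = []
    for symbols in product(QPSK, repeat=n_subtx):
        z = sum(w * s for w, s in zip(weights, symbols))
        powers.append(z.real ** 2 + z.imag ** 2)
    return max(powers) / (sum(powers) / len(powers))


def papr_closed_form(n_subtx):
    """Closed-form PAPR from (6)."""
    return 3 * (2 ** n_subtx - 1) / (2 ** n_subtx + 1)


for n in range(1, 6):
    print(n, round(papr_from_constellation(n), 6), round(papr_closed_form(n), 6))
```

For  $N = 3$  both give 7/3 (about 3.7 dB), the PAPR of square 64QAM.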

Fig. 4. Comparison of average TX output power for  $4^N$  QAM modulation between direct-RF modulation scheme and conventional approach.

The PA is typically required to back off from its 1-dB compression point  $P_{1\text{ dB}}$  in a TX handling high-order QAM. This requirement sets the average output power,  $P_{\text{conv}}$ , of the conventional TX as follows:

$$P_{\text{conv}} \text{ (dB)} = \begin{cases} P_{\text{sat}}, & \text{if } N = 1 \\ P_{1 \text{ dB}} - 10 \lg \left[ \frac{3(2^N - 1)}{2^N + 1} \right], & \text{otherwise.} \end{cases} \quad (7)$$

In the proposed RF- $4^N$  QAM TX, the PA output power in each QPSK sub-TX can extend beyond  $P_{1 \text{ dB}}$  to  $P_{\text{sat}}$ . The power back-off in this architecture effectively becomes the insertion loss from the power combiner. Assuming a classic  $N$ -way Wilkinson balanced power combiner [18], the insertion loss can be quantified as  $10 \lg(N)$  dB. Thus, the average output power,  $P_{\text{RF-QAM}}$ , for the RF-QAM scheme is derived as

$$P_{\text{RF-QAM}} \text{ (dB)} = P_{\text{sat}} + 10 \lg \left[ \frac{4}{3} \left( 1 - \frac{1}{4^N} \right) \right] - 10 \lg(N). \quad (8)$$

Fig. 4 compares the average output power of the RF-QAM TX and its conventional counterpart based on (7) and (8). Since the difference between  $P_{1 \text{ dB}}$  and  $P_{\text{sat}}$  depends on the linearity of the PA topology, this analysis uses benchmark differences of 1.5, 2, and 3 dB. For the PA used in QPSK sub-TX<sub>1</sub>, the simulated difference is 2 dB. Most published CMOS PAs operating above 100 GHz show a measured difference of 1.5–4 dB [19], [20]. Fig. 4 shows that the RF-QAM TX achieves higher average output power for modulation orders up to 1024-QAM ( $N \leq 5$ ) even for a difference as small as 1.5 dB. This advantage extends to 16K-QAM ( $N \leq 7$ ) when the difference reaches 3 dB. Notably, our RF-64QAM TX design shows a 2.1-dB advantage in average output power. Moreover, since the  $P_{1 \text{ dB}}$  point often lies above the actual boundary of the PA's linear region [21], a conventional TX must back off further to keep A.M.-to-A.M. and A.M.-to-P.M. effects from raising its EVM floor, which in practice extends the advantage of RF-QAM to even larger  $N$  values. In addition, since 6G and FutureG may impose stringent requirements on the IM3 level, PAs may need to operate with a greater back-off from  $P_{1 \text{ dB}}$ , further highlighting the advantage of the RF-QAM TX. To summarize, by enabling the PA to operate in its  $P_{\text{sat}}$  region for all  $4^N$  QAM modulations, the RF-QAM architecture fundamentally improves the TX output power.
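The crossover points of Fig. 4 follow directly from (7) and (8). The sketch below evaluates both expressions relative to  $P_{\text{sat}}$  for a given gap  $P_{\text{sat}} - P_{1\text{ dB}}$  and finds the largest  $N$  for which the RF-QAM TX keeps an advantage; the function names are illustrative.

```python
import math


def db(x):
    return 10 * math.log10(x)


def p_conv_rel_db(n, psat_minus_p1db_db):
    """Average output power of the conventional TX relative to P_sat, per (7)."""
    if n == 1:
        return 0.0
    papr = 3 * (2 ** n - 1) / (2 ** n + 1)
    return -psat_minus_p1db_db - db(papr)


def p_rfqam_rel_db(n):
    """Average output power of the RF-QAM TX relative to P_sat, per (8)."""
    return db(4 / 3 * (1 - 1 / 4 ** n)) - db(n)


def advantage_db(n, psat_minus_p1db_db):
    return p_rfqam_rel_db(n) - p_conv_rel_db(n, psat_minus_p1db_db)


for delta in (1.5, 2.0, 3.0):
    n_max = max(n for n in range(2, 9) if advantage_db(n, delta) > 0)
    print(f"P_sat - P_1dB = {delta} dB: RF-QAM ahead up to N = {n_max} "
          f"({4 ** n_max}-QAM); 64QAM advantage {advantage_db(3, delta):.2f} dB")
```

With 1.5- and 3-dB gaps the loop reports crossovers at  $N = 5$  and  $N = 7$ , and `advantage_db(3, 2.0)` evaluates to about 2.1 dB, in line with the values quoted above.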



Fig. 5. Schematic of a conventional recombinant, in-phase power combiner.

2) *Miniaturized Three-Way On-Chip Combiner:* Processing wideband uncorrelated signals from three QPSK sub-TXs simultaneously requires a three-way combiner with exceptional isolation and large fractional bandwidth. Traditional three-way Wilkinson combiners, although effective in theory, do not fit planar 2-D designs due to their reliance on  $\Delta$ -resistors or star-resistors [22]. This challenge motivates a more sophisticated recombinant power combiner structure [23], laid out in 2-D, that combines ten transmission lines, each with one of six distinct characteristic impedances, as shown in Fig. 5. Using a Chebyshev transformation for synthesis, these six impedances,  $Z_1$ – $Z_6$ , are derived as

$$Z_1 = \mathcal{T}_1 \quad (9a)$$

$$Z_2 = 2\mathcal{T}_2 \quad (9b)$$

$$Z_3 = \frac{2\mathcal{T}_3 Z_4}{Z_4 - 2\mathcal{T}_3} \quad (9c)$$

$$Z_4 = \mathcal{T}_3 \left[ 2 + \frac{\sqrt{2}K}{\mathcal{T}_4} (Z_6 - \mathcal{T}_4) \right] \quad (9d)$$

$$Z_5 = \frac{2\mathcal{T}_4 Z_6}{Z_6 - \mathcal{T}_4} \quad (9e)$$

$$Z_6 = \frac{\mathcal{T}_1 \mathcal{T}_3 \mathcal{T}_4 (\sqrt{2} - K)}{\mathcal{T}_2 \mathcal{T}_4 \sqrt{2K^2 + 2} - K \mathcal{T}_1 \mathcal{T}_3} \quad (9f)$$

where  $\mathcal{T}_i$  is the Chebyshev transformer sectional impedance, with  $\mathcal{T}_1 = 0.91Z_0$ ,  $\mathcal{T}_2 = 1.4Z_0$ ,  $\mathcal{T}_3 = 0.48Z_0$ ,  $\mathcal{T}_4 = 0.36Z_0$ , and  $K^2$  denotes the sum of the powers at Ports 2 and 4 relative to that at Port 3 [23], [24]. However, these relationships become singular for  $K^2 = 2$  (i.e., an equal three-way split), and the requirement that all ten transmission lines be a quarter wavelength long renders the structure unsuitable for on-chip integration.

Prior work has minimized transmission-line lengths and reduced the number of lines in planar three-way combiners at the PCB level [25], but on-chip integration remains unexplored. To tackle these challenges and tailor the combiner footprint for silicon integration, we gain design flexibility by letting the power combiner provide part of the 12-dB output power difference needed between the QPSK<sub>3</sub> and QPSK<sub>1</sub> sub-TXs. This relaxes the choice of  $K^2$ . Assuming that a –20-dB return loss is satisfactory in this design,  $Z_0$  is selected such that  $Z_5$  and  $Z_6$  are sufficiently close to 50 Ω. Following this approach with  $K = 1.2$  and  $Z_0 = 40\ \Omega$ , (9) yields  $Z_5 = 44\ \Omega$  and  $Z_6 = 41.4\ \Omega$ , which achieves an input reflection of  $-19$  dB and an insertion loss difference of about  $1$  dB between  $S_{31}$  and  $S_{21}$ . Because  $Z_5$  and  $Z_6$  are close to  $50\ \Omega$ , the three input transmission lines, namely  $TL_8$ ,  $TL_9$ , and  $TL_{10}$  in Fig. 5, can be removed.

Fig. 6. (a) Strategies to shorten transmission line length: add serial inductors, reducing  $Z_0$ ; or add parallel capacitors, increasing  $Z_0$ . (b) Equivalent circuits for GCPW discontinuities: tapered GCPW, and T-junction.
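As a consistency check on the chosen parameters, the sketch below takes  $Z_6 = 41.4\ \Omega$  as reported (rather than recomputing it) and evaluates (9a)–(9e) with  $Z_0 = 40\ \Omega$  and  $K = 1.2$ . It reproduces  $Z_5 \approx 44\ \Omega$  and  $Z_4 \approx 99.5\ \Omega$ , and shows that  $Z_2$  and  $Z_4$  fall above the 30–80-Ω GCPW range discussed in this subsection.

```python
import math

Z0 = 40.0   # ohm, chosen in the text
K = 1.2     # chosen in the text
T1, T2, T3, T4 = 0.91 * Z0, 1.4 * Z0, 0.48 * Z0, 0.36 * Z0  # Chebyshev sections
Z6 = 41.4   # ohm, value reported in the text (taken as given)

Z1 = T1                                                # (9a)
Z2 = 2 * T2                                            # (9b)
Z4 = T3 * (2 + math.sqrt(2) * K / T4 * (Z6 - T4))      # (9d)
Z3 = 2 * T3 * Z4 / (Z4 - 2 * T3)                       # (9c)
Z5 = 2 * T4 * Z6 / (Z6 - T4)                           # (9e)

for name, z in [("Z1", Z1), ("Z2", Z2), ("Z3", Z3), ("Z4", Z4), ("Z5", Z5), ("Z6", Z6)]:
    flag = "" if 30.0 <= z <= 80.0 else "  (outside typical GCPW range)"
    print(f"{name} = {z:.1f} ohm{flag}")
```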

Apart from footprint, another concern is the range of characteristic impedances realizable with on-chip transmission lines. Due to the complexity of the signal routing in this TX, GCPW has been selected as the preferred transmission-line structure owing to its well-defined ground return path and its substantial isolation from environmental noise that may disturb the TX output signal. However, the characteristic impedance range of on-chip GCPW is set by the foundry's technology rules, which dictate the maximum and minimum widths for thick-metal routing and the spacing to lower metal layers. This range typically spans from  $30$  to  $80\ \Omega$ .

Designing the recombinant combiner directly from (9) often requires an impedance above  $80\ \Omega$ . For instance, with  $Z_0 = 40\ \Omega$  and  $K = 1.2$ ,  $Z_4$  is calculated to be  $99.5\ \Omega$ . To address this issue, series inductances at the ends of the transmission line prove effective, also reducing the line's overall length, as shown in Fig. 6(a) [26]. The diversity of characteristic impedances within this combiner naturally forms T-junctions at each connection node, where the GCPW discontinuities provide the required series inductance [27] [Fig. 6(b)]. Furthermore, a capacitor added at the T-junction further shortens  $TL_2/TL_3$  while making the characteristic impedance  $Z_2$  more flexible. This adjustment enhances the design versatility of the combiner.
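The series-inductance technique of Fig. 6(a) can be illustrated with a standard ABCD-matrix equivalence (a textbook result, not necessarily the exact procedure used for this combiner): at a single frequency, a quarter-wave line of impedance  $Z_T$  is reproduced by a shorter line of lower impedance  $Z < Z_T$  with electrical length  $\theta = \arcsin(Z/Z_T)$ , flanked by series reactances  $X = Z_T \cos\theta$ . The sketch applies it to  $Z_4 = 99.5\ \Omega$  with an 80-Ω line; lossless lines and a 120-GHz design frequency are assumptions.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def line_abcd(z, theta):
    """ABCD matrix of a lossless line with impedance z and electrical length theta (rad)."""
    return [[math.cos(theta), 1j * z * math.sin(theta)],
            [1j * math.sin(theta) / z, math.cos(theta)]]


def series_reactance_abcd(x):
    """ABCD matrix of a series element with reactance x (ohm)."""
    return [[1, 1j * x], [0, 1]]


def shortened_quarter_wave(z_target, z_line):
    """Length and end reactances that make a z_line section (z_line < z_target)
    behave like a quarter-wave z_target line at the design frequency."""
    theta = math.asin(z_line / z_target)
    return theta, z_target * math.cos(theta)


Z_TARGET = 99.5  # ohm, Z4 from the text
Z_LINE = 80.0    # ohm, upper end of the typical GCPW range
F0 = 120e9       # Hz, assumed design frequency

theta, x_series = shortened_quarter_wave(Z_TARGET, Z_LINE)
xs = series_reactance_abcd(x_series)
abcd = matmul2(matmul2(xs, line_abcd(Z_LINE, theta)), xs)
l_series_ph = x_series / (2 * math.pi * F0) * 1e12
print(f"line length {math.degrees(theta):.1f} deg, series L {l_series_ph:.1f} pH")
```

The resulting section is about 53.5° long instead of 90°, which illustrates why series inductance both lowers the required physical impedance and shortens the line.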

Finally,  $TL_1$  in Fig. 5 at the combiner output is replaced with a  $-14$ -dB coupler (comprising a  $13\text{-fF}$  coupling capacitor, a  $50\text{-}\Omega$  isolation resistor, and an interstage matching circuit) to realize a coupler-embedded combiner [Fig. 7(a)]. This embedded coupling network taps a portion of the TX output power and feeds it to the on-chip calibration RX, with negligible impact on TX performance. The loaded capacitor, together with the capacitance of the output tapered GCPW, is used to reduce the footprint of this transmission-line network [28].

The final schematic of the miniaturized coupler-embedded combiner and its microphotograph are depicted in Fig. 7(a) and (b); the combiner occupies  $460 \times 272\ \mu\text{m}^2$  of die area. The simulated performance, summarized in Fig. 7(c) and (d), demonstrates  $<-10$ -dB input matching,  $<-15$ -dB isolation, and  $6$ -dB insertion loss across the entire  $F$ -band, together with  $-14$ -dB coupling at the carrier frequency. This design is the first three-way combiner integrated in silicon.

Fig. 7. (a) Schematic of a combiner with coupling port. (b) Micro-photograph of the fabricated combiner with coupler. (c) Simulated matching performance. (d) Simulated insertion loss, coupling, and isolation.

### C. Wideband PA

Sub-THz PA design in CMOS faces a significant challenge in achieving high output power due to the intrinsic process limits on  $G_{\max}$  and  $f_{\max}$ . This work focuses on a PA topology that achieves maximum output power over a wide bandwidth. As outlined in Section I, the PAs placed before the power combiner handle constant-envelope QPSK signals, allowing us to focus on improving  $P_{\text{sat}}$  and bandwidth while ensuring stability.

1) *System-Level Considerations*: From a system-level perspective, different approaches can be used for PA design in an RF-64QAM TX to generate the three output amplitudes required for the 64QAM constellation. One approach maximizes efficiency using a distinct optimum drain impedance for each PA, resulting in three different PA topologies. However, this introduces unwanted mismatches among the three QPSK signal paths. PAs with different loads require different transistor sizes, making transistor mismatch compensation more complex. To maintain consistent phase alignment across sub-TXs, it is advantageous to use the same active and passive components. We thus adopt the same PA topology with different bias currents and supply voltages to form the 64QAM constellation, while ensuring that all three PAs operate in the saturation region for maximum efficiency. This approach confines mismatches mainly to PVT variations in transistors and passives. Despite the different bias settings, the mismatches are minimal and can be calibrated.

Fig. 8. (a) Schematic of the PA output stage. (b) Equivalent small-signal circuit for  $Y_x$  calculation, and (c) its simplified version.

CMOS PAs operating above 100 GHz are typically biased in class-A or class-AB modes, as operation near  $f_{\max}$  severely suppresses higher order harmonics [29]. Therefore, the PA for the highest power QPSK sub-TX (QPSK<sub>1</sub> in Fig. 1) is designed to operate in class-AB, while the other two PAs operate in class-A. All the PAs consistently operate in saturation, reliably delivering the power required to form the 64QAM constellation. As long as each PA reaches its saturation power, its output amplitude is set by its  $P_{\text{sat}}$  and is therefore insensitive to variations in its input drive. As for mismatch in phase response, system-level synchronization ensures phase alignment among sub-TXs, as detailed in Section II-B.

Another reason for not scaling the PA device sizes lies in the implications for the PA loadline. The combiner presents a uniform 50-Ω impedance to all the PAs. If the PAs were scaled to different sizes with identical bias, their optimum loadline impedances would scale accordingly. For instance, if the QPSK<sub>1</sub> PA requires a loadline impedance of  $X \Omega$  for optimal efficiency around  $P_{\text{sat}}$ , the QPSK<sub>2</sub> PA, being half the size, would require  $2X \Omega$ . This wide range of loadline impedances complicates the matching network design, leading to narrowband operation if high transformation ratios are needed. Since wide front-end bandwidth is essential to minimize the impact of intersymbol interference (ISI) on EVM, we first design the QPSK<sub>1</sub> PA for good efficiency and bandwidth. For the lower power PAs, we reuse the same PA core but reduce the supply voltage to control  $P_{\text{sat}}$ . Moreover, driving the three combiner ports with similar source impedances minimizes their impact on isolation, further simplifying the combiner design and enhancing performance.
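A first-order sketch of the supply scaling: assuming a simplified class-A relation  $P_{\text{sat}} \approx (V_{DD}-V_{\text{knee}})^2/(2R_L)$  with a fixed load  $R_L$ , each 6-dB step in  $P_{\text{sat}}$  requires halving the available swing  $V_{DD}-V_{\text{knee}}$ . The reference supply and knee voltage below are hypothetical, and the model ignores the bias-current changes and the class-AB operation of the QPSK<sub>1</sub> PA described above.

```python
import math


def supply_for_psat_step(vdd_ref, v_knee, step_db):
    """Supply voltage that lowers P_sat by step_db (power dB) for a fixed load.

    Simplified class-A model (assumption): P_sat ~ (V_DD - V_knee)^2 / (2 R_L),
    so the usable swing V_DD - V_knee scales by 10^(-step_db / 20).
    """
    return v_knee + (vdd_ref - v_knee) * 10 ** (-step_db / 20)


VDD_REF = 2.0                   # V, hypothetical supply of the QPSK1 PA
V_KNEE = 0.4                    # V, hypothetical knee voltage
STEP_DB = 10 * math.log10(4)    # 6.02-dB P_sat step (power ratio of 4)

for i in range(3):
    vdd = supply_for_psat_step(VDD_REF, V_KNEE, i * STEP_DB)
    print(f"QPSK{i + 1} PA: V_DD = {vdd:.2f} V")
```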

2) * $P_{\text{sat}}$  Enhancement*: The following PA discussion focuses on the PA with the largest output power. In CMOS PA design, the maximum output power is often constrained by the technology's breakdown voltage. To mitigate this limitation, we use mutually coupled gate and drain inductors with a coupling factor  $k \sim 0.2$  at the common-gate transistor of the PA's output cascode stage, as shown in Fig. 8(a). Assuming the drain signal of the final stage is  $A \sin(\omega t + \Phi)$ , this mutual coupling introduces a signal at the gate with a coupling factor of  $\alpha$ , resulting in

$$V_{DG2} = V_{DD} - V_{G2} + A(1 - \alpha) \sin(\omega t + \Phi) \quad (10)$$

where  $V_{DD}$  is the supply voltage, and  $V_{G2}$  is the gate bias voltage. To avert voltage breakdown, the maximum amplitude  $A_{\max}$  is derived to be

$$A_{\max} = \frac{V_{BO} - (V_{DD} - V_{G2})}{1 - \alpha} \quad (11)$$

with $V_{BO}$ representing the gate-drain breakdown voltage. Equation (11) shows that, under the same bias conditions, the coupling factor $\alpha$ increases the allowable drain swing, and hence $P_{\text{sat}}$, without risking transistor breakdown. Notably, biasing the gate at $V_{DD}$ yields $A_{\max} = V_{BO}/(1 - \alpha) \simeq (1 + \alpha)V_{BO}$ for small $\alpha$. This indicates that output voltage swings beyond the breakdown-voltage limit of the CMOS process are feasible.
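Equation (11) can be evaluated directly. The sketch below uses hypothetical values: $V_{BO} = 1.5$ V, with the gate biased at $V_{DD} = 1.0$ V. It reports the swing gain as an equivalent $P_{\text{sat}}$ increase. This assumes the load resistance is unchanged, so that power scales with amplitude squared.

```python
import math


def a_max(v_bo: float, v_dd: float, v_g2: float, alpha: float) -> float:
    """Maximum drain amplitude from (11): (V_BO - (V_DD - V_G2)) / (1 - alpha)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    return (v_bo - (v_dd - v_g2)) / (1.0 - alpha)


# Hypothetical values, not taken from the fabricated design.
V_BO, V_DD = 1.5, 1.0
baseline = a_max(V_BO, V_DD, V_DD, 0.0)  # gate biased at V_DD, no coupling
for alpha in (0.0, 0.1, 0.2):
    a = a_max(V_BO, V_DD, V_DD, alpha)
    # Same load resistance assumed, so P_sat scales with A^2.
    gain_db = 20 * math.log10(a / baseline)
    print(f"alpha = {alpha:.1f}: A_max = {a:.3f} V, P_sat gain = {gain_db:.2f} dB")
```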

3) *Bandwidth Enhancement*: The PA bandwidth primarily determines the overall TX bandwidth and, thereby, the achievable data rate. The mutually coupled gate and drain inductors $L_p$ and $L_s$ also contribute to the PA's bandwidth enhancement. This is shown below through a small-signal analysis of the PA's output stage in Fig. 8(a). We adopt the $\pi$ model for the coupled inductors [30] and disregard the intrinsic parasitics of the transformer. The analysis begins by deriving the input admittance seen at the source node of the cascode device, using the small-signal equivalent circuit of Fig. 8(b). The inductances $L_1$–$L_3$ are given by [30]

$$L_1 = \frac{L_p L_s - M^2}{L_s - M} \quad (12a)$$

$$L_2 = \frac{L_p L_s - M^2}{M} = \frac{1 - k^2}{k} \sqrt{L_p L_s} \quad (12b)$$

$$L_3 = \frac{L_p L_s - M^2}{L_p - M} \quad (12c)$$

where $M = k(L_p L_s)^{1/2}$ denotes the mutual inductance between $L_p$ and $L_s$.

Fig. 9. Comparison of simulated $P_{\text{out}}$ when the cascode device has different gate inductance configurations.

Fig. 10. Schematic of wideband differential PA with single-ended output balun.

The equivalent inductor $L_2$ in parallel with $C_{\text{gd}2}$ either resonates out this parasitic capacitance, turning the gate-drain connection into an open circuit, or renders the parallel network a negligibly small capacitance. This leads to the simplified equivalent circuit in Fig. 8(c). Applying KCL, the input admittance $Y_x$ is derived as follows:

$$Y_x(j\omega) \triangleq \frac{I_x}{V_x} = \underbrace{\frac{g_{m2}}{1 - \omega^2 L_1 C_{\text{gs}2}}}_{G_x} + j\omega \underbrace{\frac{C_{\text{gs}2}}{1 - \omega^2 L_1 C_{\text{gs}2}}}_{C_x}. \quad (13)$$

Defining a reference resonant frequency  $\omega_0 \triangleq 1/(L_1 C_{\text{gs}2})^{1/2}$ , (13) reveals that

$$G_x > 0, \quad C_x > C_{\text{gs}2} > 0, \qquad \text{if } \omega < \omega_0 \quad (14a)$$

$$G_x < 0, \quad C_x = -\frac{C_{\text{gs}2}}{(\omega/\omega_0)^2 - 1} < 0, \qquad \text{if } \omega > \omega_0 \quad (14b)$$

where $g_{m2}$ and $C_{\text{gs}2}$ represent the transconductance and the gate-source parasitic capacitance of $M_2$, respectively. Equation (14b) reveals unusual admittance properties above $\omega_0$, namely, a negative conductance and a negative capacitance. From Fig. 8(a), the negative conductance and capacitance can neutralize the effects of the large drain resistance $r_{o1}$ and the total capacitance $C_{\text{dd}}$ seen at the drain of $M_1$, respectively. In the special case of $C_x = -C_{\text{dd}}$, the effective capacitance at the intermediate node of the cascode stage is removed, thereby increasing the bandwidth.
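The sign change between (14a) and (14b), and the $C_x = -C_{\text{dd}}$ condition, can be checked numerically from (13). All component values below are hypothetical, chosen only so that $\omega_0$ falls near the $F$-band. Setting $C_x = -C_{\text{dd}}$ in (14b) gives $(\omega/\omega_0)^2 = 1 + C_{\text{gs}2}/C_{\text{dd}}$.

```python
import math

# Hypothetical device and inductor values, for illustration only.
G_M2 = 50e-3     # transconductance of M2 (S)
C_GS2 = 20e-15   # gate-source capacitance of M2 (F)
L1 = 100e-12     # pi-model inductance L1 (H)
C_DD = 30e-15    # total capacitance at the drain of M1 (F)

OMEGA0 = 1 / math.sqrt(L1 * C_GS2)


def y_x(f_hz: float) -> tuple[float, float]:
    """Return (G_x, C_x) of the admittance in (13) at frequency f_hz."""
    w = 2 * math.pi * f_hz
    denom = 1 - w**2 * L1 * C_GS2
    return G_M2 / denom, C_GS2 / denom


def f_cancel(c_dd: float) -> float:
    """Frequency where C_x = -C_dd, i.e., (w/w0)^2 = 1 + C_gs2/C_dd."""
    return OMEGA0 * math.sqrt(1 + C_GS2 / c_dd) / (2 * math.pi)


print(f"f0 = {OMEGA0 / (2 * math.pi) / 1e9:.1f} GHz")
for f in (90e9, 150e9):
    g, c = y_x(f)
    print(f"{f / 1e9:.0f} GHz: G_x = {g * 1e3:+.1f} mS, C_x = {c * 1e15:+.1f} fF")
print(f"C_x = -C_dd at {f_cancel(C_DD) / 1e9:.1f} GHz")
```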

The mutual coupling introduced between the gate and drain inductances offers distinct advantages over conventional gain-peaking inductor techniques for bandwidth enhancement [31]. Fig. 9 presents large-signal simulation results for a standalone cascode PA operating at $P_{\text{sat}}$, with the same output loading network at the drain, under three gate configurations:

- without gate inductance $L_p$;
- with gate inductance $L_p$ but no coupling;
- with gate inductance $L_p$ and a coupling factor of 0.2.

The simulations show that introducing coupling helps neutralize $C_{\text{gd}2}$. It also produces additional output-power peaking at the upper corner frequency, which compensates for the sharp roll-off and thus extends the bandwidth. Furthermore, this approach yields $\pi$-model inductances $L_1$ and $L_3$ that are larger than $L_p$ and $L_s$, respectively, and an $L_2$ several times larger than $(L_p L_s)^{1/2}$. This enables a more compact inductor layout.<sup>1</sup> The compactness raises the self-resonance frequency (SRF) of the transformer, an essential aspect for sub-THz applications.

Following our analysis in [9], we adopt a transformer-based interstage matching network for its superior bandwidth enhancement over transmission-line designs (Fig. 10), further augmented by stagger tuning. This approach, recently referred to as the “hybrid matching technique” [32], [33], has gained wide acceptance. In addition, the gain-peaking inductor technique is also used at the PA’s first stage, ensuring the wide bandwidth of the entire design.

4) *Stability*: To safeguard the output stage's stability while maximizing bandwidth, the negative resistance seen at the drain of the common-source device $M_1$ in Fig. 8(a) [cf. (14b)] must be designed to be smaller than the output resistance seen at the drain node of $M_1$. Neglecting $C_{\text{gd}1}$ for simplicity, this leads to

$$r_{o1} > \left| \frac{1}{G_x} \right| = \frac{(\omega/\omega_0)^2 - 1}{g_{m2}}. \quad (15)$$

Combining (14b) and (15) establishes a critical design constraint

$$1 < \frac{\omega}{\omega_0} < \sqrt{g_{m2} r_{o1} + 1}. \quad (16)$$

Because $g_{m2} r_{o1}$ is usually large in amplifiers, (16) is easily satisfied. A more comprehensive analysis including the impact of $C_{\text{gd}1}$ is provided in Appendix A.
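For a quick sanity check, the window in (16) can be translated into absolute frequencies once $\omega_0$ is fixed. The values below are hypothetical. They only illustrate how a large $g_{m2} r_{o1}$ pushes the upper bound far above $\omega_0$.

```python
import math


def stability_window_hz(gm2: float, ro1: float, l1: float, cgs2: float) -> tuple[float, float]:
    """Frequency window (f_min, f_max) implied by (16): 1 < w/w0 < sqrt(gm2*ro1 + 1)."""
    f0 = 1 / (2 * math.pi * math.sqrt(l1 * cgs2))
    return f0, f0 * math.sqrt(gm2 * ro1 + 1)


# Hypothetical values for illustration only (gm2 * ro1 = 25).
f_lo, f_hi = stability_window_hz(gm2=50e-3, ro1=500.0, l1=100e-12, cgs2=20e-15)
print(f"(16) window: {f_lo / 1e9:.1f} GHz < f < {f_hi / 1e9:.1f} GHz")
```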

As part of the design process, we initially use the small-signal analysis described above to establish the foundation for our design parameters. We then conduct large-signal S-parameter (LSSP) simulations in Cadence Spectre to obtain all critical large-signal transistor parameters. These LSSP parameters are used to replace the values in the initial small-signal design equations. To ensure PA stability across all operational frequencies, further validations are conducted via large-signal harmonic balance and transient simulations at the design stage.

<sup>1</sup>Proof is shown in Appendix B.

Fig. 11. (a) Layout of a wideband PA. (b) Microphotograph of the fabricated PA. (c) Simulated $P_{\text{sat}}$, PAE, and (d) stability $k$-factor.

An important phenomenon observed during TX simulation is the modulation of the upconverted QPSK signal by low-frequency fluctuations on the supply rail. This unwanted envelope fluctuation also couples through the transformer to the PA input, creating an excess low-frequency offset. The effect is more pronounced when the LO, the modulator, the PA, and the digital components use multiple voltage supplies. Moreover, the subsequent PA stages may experience common-mode stability issues. These arise from potential feedback formed by on-chip bypass capacitors and ground or supply rail inductances from routing (or bondwires at low frequency). These issues can be managed by several design techniques. First, de-Qing the bypass capacitors helps suppress oscillation, since the instability source emerges from within the PA [34]. Second, to reduce potential common-mode instability, we eliminate any inductive coupling between the drain and gate of the cascode device at the input stage, in contrast to the coupling introduced in the last stage. This is achieved by replacing the differential inductor at the gate of the cascode device with two single-ended inductors [Fig. 11(a)]. The biasing sides of the two single-ended inductors are shorted at the SPI-controlled DAC.

The final standalone layout of the PA, excluding the modulator output matching network, is depicted in Fig. 11(a). The micro-photograph of the entire fabricated PA, occupying an area of  $300 \times 113 \mu\text{m}^2$ , is shown in Fig. 11(b). Post-layout simulation results indicate a saturation power ( $P_{\text{sat}}$ ) exceeding 9 dBm and a 3-dB bandwidth surpassing 40 GHz, as shown in Fig. 11(c). LSSP simulation confirms  $k > 1$  across *F*-band, as illustrated in Fig. 11(d), demonstrating unconditional stability of the PA.

#### D. Frequency Doubler and LO Chain

As described in Section II-A, the TX uses an off-chip 30-GHz signal for the LO input, enabling on-chip generation of a 120-GHz signal via two doubler chains. Due to the topological similarities between these chains, only one is illustrated for the sake of brevity. As depicted in Fig. 12(a), the configuration includes a 30-GHz buffer, a doubler, and a 60-GHz buffer. The use of a cascode topology for each LO buffer ensures stability and sufficient output power, while the inclusion of thick-oxide NMOS transistors in the 30-GHz buffer stage enhances the design's durability.

The design reliability is further enhanced by a carefully designed matching network, which integrates 30-GHz ESD-protected input pads and uses tapered GCPW routing with a transformer matching network. This approach effectively mitigates capacitive effects of ESD diodes, ensuring robust transfer of the off-chip LO signal into the chip. The efficacy of the matching network is demonstrated by the simulated input matching shown in Fig. 12(c), where  $S_{11} < -10$  dB is achieved across the 27.5–34-GHz range.

Given the power-hungry nature of the LO chain in the TX design, a multiport waveform shaping technique has been adopted to significantly improve power efficiency [35].



Fig. 12. (a) Schematic of LO doubler and buffers. (b) Layout of LO doubler and buffers. (c) Simulated  $S_{11}$ . (d) Simulated  $S_{22}$  and total efficiency.

As shown in Fig. 12(a), this doubler consumes zero dc power because the gates of a cross-coupled NMOS pair are biased at 0 V dc (class-C operation [36]). The multiport doubler is followed by an additional buffer stage that provides sufficient output power to drive the subsequent stages. Fig. 12(b) shows the detailed layout of the 30-GHz LO doubler chain. The layout features ground-grid distribution and localized bypass capacitors and occupies an area of $470 \times 164 \mu\text{m}^2$. The total efficiency<sup>2</sup> of this doubler chain, which peaks at 11.5% at 60 GHz, is shown in Fig. 12(d). The 60-GHz LO output port exhibits $S_{22} < -10$ dB across the 56–64-GHz range. It provides >10-dBm output power for distribution to the three QPSK sub-TXs through a GCPW and Wilkinson-splitter network.

One question that may arise is why we replicated the entire LO chain from 60 to 120 GHz for each sub-TX rather than using a 1:3 split after 120-GHz signal generation. The split approach may save chip area by reducing the number of doublers and buffers. However, the benefit is marginal, as multi-stage amplifiers and buffers at 120 GHz would still be required. Our decision to replicate the full LO chain is driven by several critical factors. First, the $f_{\max}/G_{\max}$ constraints of the technology significantly limit the available LO signal power when fewer amplifiers or buffers are used. Second, calibrating phase at 120 GHz would require doubling the tuning range of the phase shifter to maintain calibration effectiveness. However, varactors at this frequency exhibit limited $Q$-factors, potentially resulting in insufficient power to drive subsequent stages. Finally, the varactors are modeled with active device parameters, and these models become less reliable above 100 GHz. Considering these challenges, replicating the entire LO chain for each sub-TX offered the most robust and reliable solution.

#### E. LO IQ Generation

The 120-GHz LO IQ generation circuit, detailed in Fig. 13(a), comprises a 90° branch-line coupler followed by a phase shifter for IQ calibration and 120-GHz buffers. In conventional designs, branch-line couplers require quarter-wavelength lines on each side, which poses challenges within our system-level architecture. The critical concern is the long transmission line in the vertical direction, $\theta_2$, which determines the vertical length of the entire QPSK sub-TX. This, in turn, directly sets the routing length from the PA output to the power combiner. The insertion loss of this routing critically affects the total output power $P_{\text{out}}$ of the TX. To address these constraints, unequal line lengths are used in the coupler [37]. The vertical length is then determined by $\theta_2 = \arcsin(Z_0/Z_2)$, where $Z_2$ is the characteristic impedance of the vertical branch line and $Z_0$ is the reference impedance, typically 50 Ω. Increasing $Z_2$ reduces the routing length $\theta_2$. Capacitive loading at both ends of the transmission line is also introduced to further reduce the length while adding another degree of design freedom [Fig. 6(a)] [38].
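The length saving from the unequal-line coupler follows from $\theta_2 = \arcsin(Z_0/Z_2)$. The sketch below converts $\theta_2$ into a physical length at 120 GHz. It assumes a hypothetical effective permittivity of 4 for the on-chip line and ignores the additional shortening from the end capacitive loading.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def vertical_branch_deg(z0: float, z2: float) -> float:
    """Electrical length (degrees) of the vertical branch: theta2 = arcsin(Z0/Z2)."""
    if z2 < z0:
        raise ValueError("requires Z2 >= Z0")
    return math.degrees(math.asin(z0 / z2))


def physical_length_um(theta_deg: float, f_hz: float, eps_eff: float) -> float:
    """Physical length (um) of a line with electrical length theta_deg at f_hz."""
    guided_wavelength = C0 / (f_hz * math.sqrt(eps_eff))
    return theta_deg / 360.0 * guided_wavelength * 1e6


EPS_EFF = 4.0  # hypothetical effective permittivity of the on-chip line
for z2 in (50.0, 70.7, 100.0):
    theta = vertical_branch_deg(50.0, z2)
    print(f"Z2 = {z2:5.1f} ohm: theta2 = {theta:5.1f} deg, "
          f"length = {physical_length_um(theta, 120e9, EPS_EFF):5.1f} um")
```

With $Z_2 = Z_0$, the formula recovers the conventional 90° branch. Doubling $Z_2$ cuts the vertical branch to 30°, a threefold reduction in that dimension.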

<sup>2</sup>Total efficiency $\eta_{\text{total}} = \frac{P_{\text{out}}}{P_{\text{dc}} + P_{\text{in}}} \times 100\%$.

Fig. 13. (a) Schematic of IQ clock generation. Simulated performance of the IQ branch-line coupler: (b) reflection loss, (c) insertion loss, and (d) phase difference. (e) Microphotograph of the fabricated IQ clock generation block (excluding buffers).

To address PVT variations and IQ phase mismatches, a low-pass phase shifter using a 60-pH inductor and a PMOS varactor was developed, offering a 10° tuning range. This enables the phase alignment required by the IQ calibration process described in Section II-B [Fig. 13(a)]. In addition, a 120-GHz buffer following the phase shifter balances the LO amplitude across the IQ paths while providing ample power to drive the modulator. Because this pair of identical buffers operates in saturation, amplitude balance between the I and Q paths is also ensured. Fig. 13(e) shows the microphotograph of the 90° branch-line coupler and the phase shifter, which occupy an area of $326 \times 274 \mu\text{m}^2$. Fig. 13(b)–(d) shows the simulated performance of the branch-line coupler: input matching below −10 dB over 102–140 GHz, an insertion loss of 3.6 dB, and a −90° phase difference at the 120-GHz carrier frequency.
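The sketch below gives a rough illustration of how a single inductor and a varactor yield a tunable phase. It computes the $S_{21}$ phase of a series-L, shunt-C section from its ABCD parameters. Only the 60-pH value comes from the text. The topology and the 10–20-fF varactor range are assumptions, so the resulting span is not expected to reproduce the reported 10° range.

```python
import cmath
import math

Z0 = 50.0       # reference impedance (ohm)
L_SER = 60e-12  # 60-pH series inductor (value from the text)


def s21_phase_deg(c_farad: float, f_hz: float = 120e9,
                  z0: float = Z0, l_henry: float = L_SER) -> float:
    """S21 phase of a series-L followed by a shunt-C section (assumed topology)."""
    w = 2 * math.pi * f_hz
    # ABCD of series Z = jwL cascaded with shunt Y = jwC: [[1 + ZY, Z], [Y, 1]].
    a = 1 - w**2 * l_henry * c_farad
    b = 1j * w * l_henry
    c = 1j * w * c_farad
    d = 1.0
    s21 = 2 / (a + b / z0 + c * z0 + d)
    return math.degrees(cmath.phase(s21))


# Hypothetical varactor tuning range; the actual PMOS varactor values are not given.
c_min, c_max = 10e-15, 20e-15
span = s21_phase_deg(c_min) - s21_phase_deg(c_max)
print(f"phase at C_min = {s21_phase_deg(c_min):.1f} deg, "
      f"at C_max = {s21_phase_deg(c_max):.1f} deg, span = {span:.1f} deg")
```

Increasing the varactor capacitance adds phase lag. Trimming the I and Q paths relative to each other therefore reduces to setting one control voltage per path.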

#### F. PRBS Generation

To fully exploit the capabilities of our TX and ensure a thorough evaluation of baseband signal distribution and upconversion, we designed an on-chip high-speed $2^9 - 1$ PRBS generator. It maintains a uniform constellation distribution while limiting hardware overhead. Each QPSK sub-TX uses a pair of PRBS blocks in its I and Q paths, necessitating six uncorrelated PRBS bit streams, namely, $B(0)$ to $B(5)$. Traditional methods generate uncorrelated PRBS sequences by modifying the XOR input locations within the linear feedback shift register (LFSR) loop (red wires in Fig. 14). However, these methods are constrained by the number of registers, which permits only a limited number of configuration permutations. This limitation leaves a high correlation between adjacent bit streams, $B(i)$ and $B(i + 1)$.

Fig. 14. Proposed $2^9 - 1$ PRBS generation circuit diagram.

To address this challenge, we propose to delay the cyclic pattern by $T_d = \lfloor (2^M - 1)/K \rfloor \cdot T_{\text{PRBS}}$, i.e., by $\lfloor (2^M - 1)/K \rfloor$ bit periods, to minimize the correlation among $B(0)$ to $B(K - 1)$. Here, $M$ is the PRBS order, $K$ is the total number of bit streams, $T_{\text{PRBS}}$ is the PRBS clock period, and $\lfloor \cdot \rfloor$ denotes the floor function. For $M = 9$ and $K = 6$, this delay is 85 bit periods. Fig. 14 shows the proposed PRBS generator; for clarity, only $B(0)$ and $B(1)$ are drawn.

The generator operates as follows:

1. An off-chip RESET signal brings all six LFSR loops to a synchronized logic-zero state.
2. Referring to Fig. 14, timing logic circuits process the RESET signal to generate SET(0) to SET(5), which initiate the six LFSR loops. The delay $T_d$ is generated by a counter driven by $\text{CLK}_{\text{PRBS}}$.
3. Once generated, $B(0)$ to $B(5)$ are routed in pairs through a transmission-line distribution network to each sub-TX, synchronized with the clock signal. The entire distribution network was modeled in Sonnet to guarantee the simultaneous arrival of all six bit streams and the synchronization of the three local PRBS clocks.
4. Before entering the modulator, each $B(i)$ passes through a DEMUX-MUX network that includes local D flip-flop buffers. Controlled by SPI, this network facilitates pulse shaping via a switched-RC mechanism. It also provides an off-chip test pin that allows a single-tone signal to be quadrature-upconverted for characterizing the image-rejection ratio (IRR).

This PRBS design method ensures that the bit streams are highly uncorrelated, allowing for the reuse of the LFSR loops.
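The delayed-copy scheme can be sketched in software. The example assumes the common PRBS9 polynomial $x^9 + x^5 + 1$, since the actual feedback taps of Fig. 14 are not given here. It also uses a nonzero seed, because an all-zero state locks up a plain XOR LFSR; how the hardware's SET signals handle this is not modeled. With $M = 9$ and $K = 6$, $T_d$ is 85 bit periods. Any two distinct cyclic shifts of a maximal-length sequence have a zero-lag periodic correlation of $-1$ (in ±1 units) over one 511-bit period.

```python
def prbs9(n_bits: int, seed: int = 0x1FF) -> list[int]:
    """Fibonacci LFSR for x^9 + x^5 + 1 (assumed polynomial); returns output bits."""
    state = seed & 0x1FF
    if state == 0:
        raise ValueError("an all-zero seed locks up an XOR LFSR")
    bits = []
    for _ in range(n_bits):
        new = ((state >> 8) ^ (state >> 4)) & 1  # taps at stages 9 and 5
        bits.append(new)
        state = ((state << 1) | new) & 0x1FF
    return bits


M, K = 9, 6
N = 2**M - 1      # sequence period: 511
T_D = N // K      # delay in bit periods: floor((2^M - 1) / K)

base = prbs9(N)
# Stream k is the base sequence cyclically delayed by k * T_D bit periods.
streams = [[base[(i - k * T_D) % N] for i in range(N)] for k in range(K)]


def periodic_corr(x: list[int], y: list[int]) -> int:
    """Zero-lag periodic correlation of two 0/1 sequences mapped to +/-1."""
    return sum((1 - 2 * a) * (1 - 2 * b) for a, b in zip(x, y))


print("T_d =", T_D, "bit periods")
print([periodic_corr(streams[0], s) for s in streams])
```

One hedged reading of the $\lfloor (2^M - 1)/K \rfloor$ spacing is that it keeps every pair of streams as far apart in cyclic shift as possible. Zero-lag correlation is already minimal for any distinct shift of an m-sequence, so the spacing mainly guards against streams that are near-copies over short observation windows.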

## IV. MEASUREMENT SETUP AND RESULTS

### A. PCB Design and Chip Assembly

Fabricated in a 45-nm CMOS SOI process, the TX die shown in Fig. 15 occupies an area of $4.5 \times 2.8 \text{ mm}^2$. To realize backside radiation, careful consideration was given to both the PCB design and the assembly process. First, a hyper-hemispherical high-resistivity float-zone silicon (HRFZ-Si) lens with a 12-mm diameter was selected to enhance backside radiation. A fused silica wafer was diced into square tiles of $1.5 \times 1.5 \text{ cm}^2$, each serving as the substrate for assembly. As depicted in Fig. 16(a), a three-dielectric-layer PCB was engineered to evaluate the chip's performance. The bottom dielectric layer uses 1.6-mm-thick FR-4, chosen for its durability and resistance to thermal deformation, which minimizes warping during PCB assembly and chip bonding. The approximately 300-$\mu$m-thick chip introduced wire-bonding challenges, because the height difference between the bonding interfaces must be <0.2 mm.<sup>3</sup> To address this challenge, a 0.1-mm-thick Rogers RO4350B laminate with 0.5-oz copper on the top and bottom sides was adopted as the top layer. This arrangement matches the chip's thickness and includes a $4.8 \times 3.1 \text{ mm}^2$ cavity [Fig. 16(b)]. It ensures that the chip can be positioned within the cavity, in line with the electrical characteristics projected in simulations. The RO4350B material, with a dielectric constant $\epsilon_r$ of approximately 3.6 at 30 GHz, supports good input impedance matching at mm-wave frequencies. An intermediate FR-4 layer enhances the PCB's layout flexibility for dc and low-frequency signal routing. The design's feasibility and structural integrity were verified through SolidWorks modeling; Fig. 16(c) presents dimetric views from both the front and back sides. Non-conductive epoxy (EPO-TEK 353ND) with an $\epsilon_r$ of about 3.2 was used to bond the silica wafer to the chip, PCB, and silicon lens. The cross-sectional view in Fig. 16(a) shows the complete assembly of the TX system.

<sup>3</sup>This requirement may differ among packaging companies.

Fig. 15. Micro-photograph of the bits-to-antenna RF-64QAM TX chip.

Fig. 16. TX chip assembly and PCB stack-up. (a) Cross section diagram for final assembled chip and silicon lens. (b) PCB stack-up and cavity openings' diagram. (c) Dimetric views in SolidWorks for chip assembly.

Fig. 17. Setup for antenna radiation pattern measurement.

### B. Measurement Setups and Results

Fig. 18. Normalized radiation pattern of TX prototype at carrier frequency.

Fig. 19. Measured LO leakage cancellation and IRR results.

Fig. 20. Downconverted zero-IF spectrum of 1-GBd QPSK signal.

1) *Continuous-Wave Measurements*: As depicted in Fig. 17, a continuous-wave test evaluated the TX radiation pattern at the 120-GHz carrier frequency, with the chip mounted on a 3-D rotatable holder at a distance of 10 cm. A Rohde & Schwarz SMA100B supplied the 30-GHz LO input to the chip. A Keysight arbitrary waveform generator (AWG) generated a low-frequency single-tone signal that served as the baseband input, bypassing the on-chip PRBS generator in all continuous-wave tests. The received power was measured using a VDI/Erickson PM5B power meter connected through a WR10-to-WR8.0 waveguide taper (WR8.0TA). The free-space path loss (FSPL) was estimated by the Friis formula [39]

$$FSPL \text{ (dB)} = 20 \lg(d) + 20 \lg(f_c) + 20 \lg\left(\frac{4\pi}{c}\right) \quad (17)$$

where $d$ is the distance between TX and RX, $f_c$ is the carrier frequency, and $c$ is the speed of light. In this setup, the FSPL is calculated to be 54 dB at 120 GHz. The power measured by the PM5B varied between −14 and −13 dBm over multiple test runs. We account for the 24-dBi standard gain of the horn antenna and the 0.2-dB loss of the WR8.0TA reported by VDI. The worst-case measured EIRP is then $-14 + 54 - 24 + 0.2 \approx 16$ dBm. The rotatable holder shown in Fig. 17 enabled 3-D rotation of the TX for comprehensive radiation-pattern measurement. The measured normalized radiation pattern, displayed in Fig. 18, closely follows the HFSS simulation results.
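The link-budget arithmetic behind the 54-dB FSPL and the 16-dBm EIRP can be reproduced from (17). The function names below are illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum (m/s)


def fspl_db(d_m: float, f_hz: float) -> float:
    """Free-space path loss from (17)."""
    return (20 * math.log10(d_m) + 20 * math.log10(f_hz)
            + 20 * math.log10(4 * math.pi / C0))


def eirp_dbm(p_rx_dbm: float, d_m: float, f_hz: float,
             g_rx_dbi: float, extra_loss_db: float) -> float:
    """Back out EIRP from received power, RX antenna gain, and fixture loss."""
    return p_rx_dbm + fspl_db(d_m, f_hz) - g_rx_dbi + extra_loss_db


loss = fspl_db(0.10, 120e9)
print(f"FSPL at 10 cm, 120 GHz = {loss:.2f} dB")
for p_rx in (-14.0, -13.0):
    eirp = eirp_dbm(p_rx, 0.10, 120e9, g_rx_dbi=24.0, extra_loss_db=0.2)
    print(f"P_rx = {p_rx:.0f} dBm -> EIRP = {eirp:.1f} dBm")
```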

TABLE I  
COMPARISON OF THE PROTOTYPE WITH STATE-OF-THE-ART

|                         | This Work                    | [40]             | [41]              | [10]             | [42]         | [43]         | [44]           | [45]                |
|-------------------------|------------------------------|------------------|-------------------|------------------|--------------|--------------|----------------|---------------------|
| Technology              | <b>45nm SOI</b>              | 28nm CMOS        | 22nm FinFET       | 180nm SiGe       | 45nm SOI     | 16nm FinFET  | 45nm SOI       | 22nm FD-SOI+InP HBT |
| Architecture            | <b>Direct Mod.</b>           | Direct Mod.      | I/Q DAC           | Direct Mod.      | I/Q DAC      | I/Q DAC      | I/Q DAC        | I/Q DAC             |
| D/A Interface           | <b>Integrated 1-bit</b>      | Integrated 1-bit | Integrated 2-bit* | Integrated 1-bit | External AWG | External AWG | External AWG   | External AWG        |
| Modulation              | <b>64QAM</b>                 | 16QAM            | 16QAM             | 16QAM            | 64QAM        | 16QAM        | 64QAM          | 64QAM               |
| Carrier Frequency (GHz) | <b>120</b>                   | 135              | 140               | 115              | 149          | 109,135      | 143, 152       | 149                 |
| Antenna                 | <b>On-chip +Silicon Lens</b> | Off-chip         | -                 | Off-chip         | Off-chip     | Off-chip     | PCB +Flat Lens | LTCC                |
| EIRP (dBm)              | <b>16</b>                    | 8                | -                 | N.M.             | 8.3          | N.M.         | 16.4           | 27.5                |
| Pout (dBm)              | <b>1.4</b>                   | 0                | 0.8               | 1                | 0.1          | -5           | -9.6           | 20.5                |
| DR (Gbps)               | <b>120</b>                   | 32               | 160†              | 20               | 84.48        | 120          | 57.6           | 30                  |
| Pdc (mW)                | <b>880</b>                   | 287              | 173               | 520              | 420          | 584.5        | 1580           | 760                 |
| Eff. (pJ/bit)           | <b>7.33</b>                  | 9                | 1.1               | 26               | 4.97         | 4.87         | 27.4           | 25.3                |

N.M. = Not mentioned. \*Insufficient SQNR for practically attainable EVM and INL/DNL [7]. †No wireless measurement.



Fig. 21. Downconverted 64QAM spectrum for 8, 16, and 20 GBd.

Hartley modulator tests were also performed on the TX by transmitting IQ single-tone signals from a Keysight M8195A AWG while bypassing the on-chip PRBS block. To verify our LO leakage suppression concept, Hartley tests were conducted on both the QPSK<sub>1</sub> sub-TX alone and the entire 64-QAM TX. Fig. 19 depicts the outcomes, including LO leakage and the 64-QAM TX IRR, with an exemplary down-converted spectrum<sup>4</sup> shown on its left side. The results show that, compared with a single QPSK sub-TX, the proposed LO leakage cancellation technique reduces the overall LO feedthrough by 7–10 dB in the RF-64QAM scheme. Moreover, the measured IRR of this prototype is >32 dB.

Fig. 22. Setup for real-time oscilloscope measurement.

<sup>4</sup>A signal at $1 + 120 = 121$ GHz was down-converted by a signal of $9.8 \times 12 = 117.6$ GHz from the VDI signal generator frequency extender.

Fig. 23. Measured equalized eye diagrams and constellations for (a) QPSK, (b) 16QAM, and (c) 64QAM.

2) *Modulated-Signal Measurements*: As discussed in Section III-D and illustrated in Fig. 12(d), our LO chain design prioritizes power efficiency over bandwidth. This design choice precludes evaluating the TX bandwidth by sweeping the LO signal across the entire RF spectrum. Instead, we adopted an alternative bandwidth-measurement approach inspired by the methodology in [41]: we activated only a single sub-TX and transmitted a 1-GBd PRBS signal. Fig. 20 shows the measured QPSK spectrum after downconversion to baseband. It drops 3 dB below the ideal sinc function at 20 GHz, indicating a 40-GHz RF bandwidth. This RF bandwidth corresponds to a 20-GHz baseband bandwidth. With ideal Nyquist signaling, this theoretically allows up to 40 GBd, which translates to 240 Gb/s for a 64QAM signal. However, our measurements were capped at 20 GBd, for several reasons:

- The PRBS generator was originally designed for a maximum of 20 GBd. Simulations suggested that overdriving it with a higher $V_{DD}$ could extend its capability to 25 GBd. In practice, $IR$ drops and self-heating prevented the PRBS chain from functioning effectively at higher rates.
- Inadequate bypass capacitance on certain CML buffers along the PRBS signal path led to vertical eye closure at higher data rates, exacerbated by increased “switching noise” on $V_{DD}$. This issue became more pronounced with higher modulation orders, such as 64QAM.
- The lack of advanced on-chip pulse-shaping hardware further constrained performance. The current implementation relies solely on basic $RC$ filters without DSP support and was insufficient to fully mitigate ISI at higher data rates.
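The bandwidth-to-data-rate reasoning above, together with the energy-per-bit figure in Table I, reduces to a few lines of arithmetic. The sketch assumes ideal Nyquist signaling with zero roll-off, which is what the 40-GBd ceiling implies.

```python
# Bits per symbol for the modulations measured in this work.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4, "64QAM": 6}


def max_baud_from_rf_bw(rf_bw_hz: float) -> float:
    """Ideal Nyquist limit: baseband BW = RF BW / 2; with zero roll-off,
    the symbol rate can reach twice the baseband bandwidth."""
    baseband_bw = rf_bw_hz / 2
    return 2 * baseband_bw


def data_rate_bps(baud: float, scheme: str) -> float:
    """Raw bit rate for a given symbol rate and modulation."""
    return baud * BITS_PER_SYMBOL[scheme]


def energy_per_bit_pj(p_dc_w: float, rate_bps: float) -> float:
    """DC energy per transmitted bit, in pJ/bit."""
    return p_dc_w / rate_bps * 1e12


ceiling = data_rate_bps(max_baud_from_rf_bw(40e9), "64QAM")
measured = data_rate_bps(20e9, "64QAM")
print(f"ceiling {ceiling / 1e9:.0f} Gb/s, measured {measured / 1e9:.0f} Gb/s")
print(f"{energy_per_bit_pj(0.880, measured):.2f} pJ/bit at 880 mW")
```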

Fig. 21 displays the measured spectra of the downconverted 64QAM signals at 8, 16, and 20 GBd for a 30-cm TX–RX distance, captured using a Rohde & Schwarz FSVA40 signal analyzer. For the proposed $2^9 - 1$ PRBS pattern, the beat-frequency spacing $\Delta f_B$ is given by

$$\Delta f_B = \frac{f_{BD}}{2^9 - 1} \quad (18)$$

where $f_{BD}$ is the baud rate of the baseband signal. Accordingly, we expect $\Delta f_B$ values of 15.66, 31.31, and 39.14 MHz for baud rates of 8, 16, and 20 GBd, respectively. The right side of Fig. 21 provides a zoomed-in view of the spectrum, confirming these expectations.

In addition, a 3-cm wireless link measurement was performed (Fig. 22). The signal was downconverted through an $F$-band balanced mixer and then demodulated using two instruments: a Keysight DSAV334A 33-GHz real-time oscilloscope for eye diagrams, and a Tektronix DPO77002SX 70-GHz ATI performance oscilloscope for real-time data capture and processing for constellation and EVM analysis. The limited transmission distance is primarily due to the significant path loss at this carrier frequency. It is further exacerbated by the absence of a low-noise amplifier in the off-the-shelf RX, the noise integrated across the 40-GHz RF bandwidth, and the sensitivity limits of the wideband oscilloscope. These factors are critical to obtaining a discernible eye opening for the 64QAM signal.

Fig. 23 displays the measured eye diagrams for QPSK (QPSK<sub>1</sub> sub-TX alone), 16QAM (QPSK<sub>1</sub> and QPSK<sub>2</sub> sub-TXs), and 64QAM. These were obtained with the oscilloscope's internal continuous-time linear equalizers (CTLEs) and baud-rate decision feedback equalizers (DFEs). Up to 40 DFE taps were used to recover the 20-GBd 64QAM signals. The taps primarily compensate for unavoidable group-delay variation, the channel frequency response, and cable loss on the RX side. The distinct eye openings at 48 and 120 Gb/s affirm the performance of the proposed RF-64QAM TX. Furthermore, the equalized 64QAM constellation diagram at 120 Gb/s exhibits an $EVM_{rms}$ of $-18.6$ dB. Table I summarizes the performance of the RF-64QAM prototype and compares it with the state of the art, and Fig. 24 shows the power breakdown. This prototype is a fully integrated bits-to-antenna 64QAM TX capable of $>100$-Gb/s transmission, paving the way for next-generation wireless communication.

Fig. 24. Power breakdown for the RF-64QAM TX.
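The expected tone spacings follow directly from (18). This sketch evaluates it for the three measured baud rates.

```python
def line_spacing_hz(baud: float, prbs_order: int = 9) -> float:
    """Spectral-line spacing of a periodic PRBS of length 2^M - 1, from (18)."""
    return baud / (2**prbs_order - 1)


for baud in (8e9, 16e9, 20e9):
    print(f"{baud / 1e9:.0f} GBd -> {line_spacing_hz(baud) / 1e6:.2f} MHz")
```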

## V. CONCLUSION

A comprehensive CMOS single-chip solution for $>100$-Gb/s 64QAM transmission tailored for 6G applications is presented. The *F*-band RF-64QAM TX prototype was fabricated in 45-nm CMOS SOI. It demonstrated a wirelessly measured data rate of 120 Gb/s with an $EVM_{rms}$ of $-18.6$ dB. The measured EIRP was approximately 16 dBm, with a power efficiency of 7.3 pJ/b. In addition, the TX achieved an LO leakage cancellation of over 7 dB, further enhancing the overall performance of the system.

## APPENDIX A

### STABILITY DISCUSSION WITH CONSIDERATION OF $C_{GD1}$

If $C_{gd1}$ of $M_1$ is considered, the input admittance of $M_1$ can be expressed as

$$Y_{in} = j\omega C_{gs1} + j\omega C_{gd1} \left( \frac{g_{m1} + Y_x}{Y_x + j\omega C_{gd1}} \right). \quad (\text{A.1})$$

Here,  $Y_x$  is as defined in (13). Ensuring unconditional stability of the amplifier requires the real part of  $Y_{in}$  to be positive, leading to

$$(C_x + C_{gd1})(g_{m1} + G_x) > G_x C_x. \quad (\text{A.2})$$

This condition introduces another design constraint

$$\frac{\omega}{\omega_0} > \sqrt{1 + \frac{g_{m1}C_{gs2} + g_{m2}C_{gd1}}{g_{m1}C_{gd1}}}. \quad (\text{A.3})$$

Combining with (16), we derive a comprehensive design guideline for PA stability

$$\sqrt{1 + \frac{g_{m1}C_{gs2} + g_{m2}C_{gd1}}{g_{m1}C_{gd1}}} < \frac{\omega}{\omega_0} < \sqrt{g_{m2}r_{o1} + 1}. \quad (\text{A.4})$$

Equation (A.4) outlines a more detailed constraint for ensuring the stability of the PA in the design process.
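The stability condition can also be checked numerically by evaluating $\mathrm{Re}\{Y_{in}\}$ from (A.1), with $Y_x$ taken from (13). Expanding (A.2) with (13) for $\omega > \omega_0$ gives the threshold $(\omega/\omega_0)^2 - 1 > (g_{m1}C_{gs2} + g_{m2}C_{gd1})/(g_{m1}C_{gd1})$. The gate-source capacitance entering through $C_x$ is that of $M_2$, while $C_{gs1}$ contributes only a purely imaginary term. All element values are hypothetical. With these values, the threshold is $\omega/\omega_0 = 2$, and $\mathrm{Re}\{Y_{in}\}$ should change sign there.

```python
import math

# Hypothetical small-signal values (illustrative only).
GM1 = GM2 = 50e-3  # transconductances (S)
CGS1 = 30e-15      # F; adds only an imaginary term to Y_in
CGS2 = 20e-15      # F
CGD1 = 10e-15      # F
L1 = 100e-12       # H
OMEGA0 = 1 / math.sqrt(L1 * CGS2)


def y_in(ratio: float) -> complex:
    """Y_in from (A.1), with Y_x from (13), evaluated at w = ratio * w0."""
    w = ratio * OMEGA0
    denom = 1 - w**2 * L1 * CGS2
    yx = GM2 / denom + 1j * w * CGS2 / denom
    return 1j * w * CGS1 + 1j * w * CGD1 * (GM1 + yx) / (yx + 1j * w * CGD1)


# Lower bound on w/w0 obtained by expanding (A.2) with (13).
threshold = math.sqrt(1 + (GM1 * CGS2 + GM2 * CGD1) / (GM1 * CGD1))
print(f"threshold w/w0 = {threshold:.3f}")
for r in (0.5, 1.9, 2.1):
    print(f"w/w0 = {r}: Re(Y_in) = {y_in(r).real * 1e3:+.4f} mS")
```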

## APPENDIX B

### $\pi$ -MODEL INDUCTANCES FOR WEAK COUPLING CASE

In this appendix, we aim to demonstrate that in a  $\pi$ -model of a transformer, all the inductances exceed the original values of the two coils when these are weakly coupled. This insight is particularly valuable for designers working at sub-terahertz frequencies, where inductor size or SRF could be design-limiting factors.

Given the $\pi$-model inductance values in (12), we define weak coupling as the condition where the mutual inductance $M < \min(L_s, L_p)$. Examining the difference between $L_1$ and $L_p$, we derive

$$L_1 - L_p = \frac{M(L_p - M)}{L_s - M} > 0. \quad (\text{B.1})$$

Similarly, for  $L_3$  compared with  $L_s$ , we have

$$L_3 - L_s = \frac{M(L_s - M)}{L_p - M} > 0. \quad (\text{B.2})$$

Equations (B.1) and (B.2) confirm that, under weak coupling,  $L_1$  and  $L_3$  in a  $\pi$-model are always larger than the self-inductances of the two coils. This suggests that, by strategically tuning  $L_2$ , the required inductances can be realized with smaller coils, making the inductor footprint more compact and raising the SRF, which is beneficial for sub-terahertz designs.
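The inequalities (B.1) and (B.2) can be checked numerically. The sketch below uses the standard  $\pi$-equivalent of two coupled coils sharing a common ground,  $L_1 = (L_pL_s - M^2)/(L_s - M)$ ,  $L_2 = (L_pL_s - M^2)/M$ , and  $L_3 = (L_pL_s - M^2)/(L_p - M)$ , which reproduces (B.1), (B.2), and the form of  $L_2$  used in (12b). Because (12) itself is not shown in this section, treat that form as an assumption; the coil values are hypothetical.

```python
import math


def pi_model(lp, ls, k):
    """pi-equivalent (shunt L1, series L2, shunt L3) of two coupled coils
    sharing a common ground; inputs and outputs use the same unit."""
    m = k * math.sqrt(lp * ls)      # mutual inductance
    if not 0 < m < min(lp, ls):
        raise ValueError("not weakly coupled: requires 0 < M < min(Lp, Ls)")
    det = lp * ls - m * m           # determinant of the inductance matrix
    l1 = det / (ls - m)             # primary-side shunt inductance
    l2 = det / m                    # series inductance
    l3 = det / (lp - m)             # secondary-side shunt inductance
    return l1, l2, l3, m


if __name__ == "__main__":
    lp, ls, k = 60.0, 80.0, 0.2     # hypothetical coils in pH, k as in the text
    l1, l2, l3, m = pi_model(lp, ls, k)
    print(f"M = {m:.2f} pH")
    print(f"L1 - Lp = {l1 - lp:.2f} pH, (B.1) gives {m * (lp - m) / (ls - m):.2f} pH")
    print(f"L3 - Ls = {l3 - ls:.2f} pH, (B.2) gives {m * (ls - m) / (lp - m):.2f} pH")
```

The explicit check on  $M$  mirrors the weak-coupling definition above: outside it, one of the shunt denominators becomes non-positive and the inequalities no longer hold.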

To assess  $L_2$ , we examine the function  $f(k) = (1 - k^2)/k$  and its first-order derivative

$$\frac{df}{dk} = -\frac{1}{k^2} - 1 < 0 \quad (\text{B.3})$$

which holds for all possible  $k$  values ( $0 < k \le 1$ ). Since, from (12b),  $L_2 = f(k)(L_p L_s)^{1/2}$ ,  $L_2$  increases as  $k$  decreases. Given practical constraints at sub-THz frequencies, including skin and proximity effects as well as parasitics inherent to the technology,  $k$  generally remains below 0.6. Because  $f(0.6) \approx 1.067$ , this results in  $L_2 > 1.067(L_p L_s)^{1/2}$  for sub-THz designs. Specifically, for our case of  $k \simeq 0.2$ ,  $f(0.2) = 4.8$ , so  $L_2$  is roughly 4.8 times the geometric mean  $(L_p L_s)^{1/2}$  of the individual inductances.
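The numerical bounds in this paragraph follow directly from evaluating  $f(k)$ . The short sketch below tabulates  $f(k) = L_2/(L_pL_s)^{1/2}$  together with the derivative from (B.3) over the range  $k \le 0.6$  discussed above; the choice of  $k$  grid is arbitrary.

```python
def f(k):
    """f(k) = (1 - k^2) / k, so that L2 = f(k) * sqrt(Lp * Ls) per (12b)."""
    return (1.0 - k * k) / k


def df_dk(k):
    """Derivative from (B.3): -1/k^2 - 1, negative for every k != 0."""
    return -1.0 / (k * k) - 1.0


if __name__ == "__main__":
    for k in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
        print(f"k = {k:.1f}: L2 / sqrt(Lp*Ls) = {f(k):.3f}, df/dk = {df_dk(k):.2f}")
```

Because  $f$  is monotonically decreasing, the value at the upper end of the practical range,  $f(0.6) \approx 1.067$ , is the smallest ratio one should expect, while the weak coupling used here ( $k \simeq 0.2$ ) gives  $f(0.2) = 4.8$ .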

## ACKNOWLEDGMENT

The authors appreciate GlobalFoundries for their sponsorship in chip fabrication and extend their thanks to Keysight Technologies, Inc., especially Brian McCarver and Mani Peroomal, for their invaluable support with test equipment. Special thanks to Judy Pourazima from Rohde & Schwarz for equipment assistance, along with Bryan Bowman, Michele LaBrecque, and Tony Ambrose from Tektronix for their timely help on the oscilloscope setup and usage. Moreover, they are thankful to Dr. Hamidreza Aghasi for fruitful technical discussions and for providing equipment access, Dr. Yaoyu Cao for the discussion on the PRBS generator, and Xuyang Liu for his role in equipment provision. Finally, they would like to extend their gratitude to Md Hedayatullah Maktoomi for insightful discussions on the combiner and technical assistance with the antenna simulation setup.

## REFERENCES

[1] Department of Defense Research & Engineering Enterprise. *FutureG Office—Advancing Telecommunication for America’s Warfighters*. Accessed: 2024. [Online]. Available: <https://rt.cto.mil/futureg-home/>

[2] Keysight. (2023). *Next-Generation Wireless: A Guide to the Fundamentals of 6G*. [Online]. Available: <https://www.keysight.com/us/en/assets/7123-1050/ebooks/Next-Generation-Wireless-A-Guide-to-the-Fundamentals-of-6G.pdf>

[3] P. Heydari, "Terahertz integrated circuits and systems for high-speed wireless communications: Challenges and design perspectives," *IEEE Open J. Solid-State Circuits Soc.*, vol. 1, pp. 18–36, 2021.

[4] A. M. Niknejad and S. P. Voinigescu, "Digital mm-wave silicon transmitters," in *mm-Wave Silicon Power Amplifiers and Transmitters*, H. Hashemi and S. Raman, Eds., Cambridge, U.K.: Cambridge Univ. Press, 2016, ch. 9, pp. 334–375.

[5] S. Shopov, A. Balteanu, and S. P. Voinigescu, "A 19dBm, 15Gbaud, 9bit SOI CMOS power-DAC cell for high-order QAM W-band transmitters," *IEEE J. Solid-State Circuits*, vol. 49, no. 7, pp. 1653–1664, Jul. 2014.

[6] P. Nazari, S. Jafarrou, and P. Heydari, "A CMOS two-element 170-GHz fundamental-frequency transmitter with direct RF-8PSK modulation," *IEEE J. Solid-State Circuits*, vol. 55, no. 2, pp. 282–297, Feb. 2020.

[7] M. Oveis, H. Wang, and P. Heydari, "A study of a millimeter-wave transmitter architecture realizing QAM directly in RF domain," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 70, no. 6, pp. 2243–2256, Jun. 2023.

[8] Z. Wang, H. Wang, Y. O. Hassan, and P. Heydari, "A CMOS fully integrated 120-Gbps RF-64QAM F-band transmitter with an on-chip antenna for 6G wireless communication," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2024, pp. 343–346.

[9] Z. Wang, H. Wang, and P. Heydari, "CMOS power-amplifier design perspectives for 6G wireless communications," in *Proc. IEEE Int. Midwest Symp. Circuits Syst. (MWSCAS)*, Aug. 2021, pp. 753–756.

[10] H. Wang, H. Mohammadnezhad, and P. Heydari, "Analysis and design of high-order QAM direct-modulation transmitter for high-speed point-to-point mm-wave wireless links," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3161–3179, Nov. 2019.

[11] M. H. Maktoomi, Z. Wang, H. Wang, S. Saadat, P. Heydari, and H. Aghasi, "A sub-terahertz wideband stacked-patch antenna on a flexible printed circuit for 6G applications," *IEEE Trans. Antennas Propag.*, vol. 70, no. 11, pp. 10047–10061, Nov. 2022.

[12] M. H. Maktoomi, S. Saadat, O. Momeni, P. Heydari, and H. Aghasi, "Broadband antenna design for terahertz communication systems," *IEEE Access*, vol. 11, pp. 20897–20911, 2023.

[13] X. Gu, D. Liu, and B. Sadhu, "Packaging and antenna integration for silicon-based millimeter-wave phased arrays: 5G and beyond," *IEEE J. Microw.*, vol. 1, no. 1, pp. 123–134, Jan. 2021.

[14] A. Babakhani, X. Guan, A. Komijani, A. Natarajan, and A. Hajimiri, "A 77-GHz phased-array transceiver with on-chip antennas in silicon: Receiver and antennas," *IEEE J. Solid-State Circuits*, vol. 41, no. 12, pp. 2795–2806, Dec. 2006.

[15] J. Kaiser, "The Archimedean two-wire spiral antenna," *IRE Trans. Antennas Propag.*, vol. 8, no. 3, pp. 312–323, May 1960.

[16] C. A. Balanis, *Antenna Theory: Analysis and Design*, 4th ed., Hoboken, NJ, USA: Wiley, 2016.

[17] M. Ramezani, H. Miladi, M. Shahabadi, S. Mohajerzadeh, and A. H. Tamaddon, "Design and fabrication of on-chip square spiral antenna for ISM band," in *Proc. Int. Conf. Instrum., Commun., Inf. Technol., Biomed. Eng.*, Nov. 2009, pp. 1–4.

[18] E. J. Wilkinson, "An N-way hybrid power divider," *IEEE Trans. Microw. Theory Techn.*, vol. MTT-8, no. 1, pp. 116–118, Jan. 1960.

[19] H. Wang et al. (Feb. 2024). *Power Amplifiers Performance Survey 2000-Present*. [Online]. Available: <https://ideas.ethz.ch/Surveys/pa-survey.html>

[20] T. Maiwald et al., "A review of integrated systems and components for 6G wireless communication in the D-band," *Proc. IEEE*, vol. 111, no. 3, pp. 220–256, Mar. 2023.

[21] S. C. Cripps, *RF Power Amplifiers for Wireless Communications*. Norwood, MA, USA: Artech House, 2006, ch. 9, pp. 240–250.

[22] *N-Way Wilkinson Splitters*. Accessed: 2024. [Online]. Available: <https://www.microwaves101.com/encyclopedia/n-way-wilkinson-splitters>

[23] M. E. Goldfarb, "A recombinant, in-phase power divider," *IEEE Trans. Microw. Theory Techn.*, vol. 39, no. 8, pp. 1438–1440, Aug. 1991.

[24] L. I. Parad and R. L. Moynihan, "Split-tee power divider," *IEEE Trans. Microw. Theory Techn.*, vol. MTT-13, no. 1, pp. 91–95, Jan. 1965.

[25] A. I. Omi, Ahammad, M. H. Maktoomi, M. A. Maktoomi, and P. K. Sekhar, "Miniaturized wideband three-way power dividers with arbitrary band ratio using a new analytical design technique," *IEEE Access*, vol. 11, pp. 72148–72158, 2023.

[26] K. Hettak, G. A. Morin, and M. G. Stubbs, "Compact MMIC CPW and asymmetric CPS branch-line couplers and Wilkinson dividers using shunt and series stub loading," *IEEE Trans. Microw. Theory Techn.*, vol. 53, no. 5, pp. 1624–1635, May 2005.

[27] D. Mirshekar-Syahkal, "Computation of equivalent circuits of CPW discontinuities using quasi-static spectral domain method," *IEEE Trans. Microw. Theory Techn.*, vol. 44, no. 6, pp. 979–984, Jun. 1996.

[28] R. N. Simons and G. E. Ponchak, "Modeling of some coplanar waveguide discontinuities," *IEEE Trans. Microw. Theory Techn.*, vol. 36, no. 12, pp. 1796–1803, Dec. 1988.

[29] D. Simic and P. Reynaert, "A 14.8dBm 20.3dB power amplifier for D-band applications in 40nm CMOS," in *Proc. IEEE Radio Freq. Integr. Circuits (RFIC) Symp.*, Jun. 2018, pp. 232–235.

[30] K. K. Clarke and D. T. Hess, *Communication Circuits: Analysis and Design*. Reading, MA, USA: Addison-Wesley, 1971, ch. 2, pp. 62–64.

[31] Y.-H. Yu, Y.-S. Yang, and Y.-J. E. Chen, "A compact wideband CMOS low noise amplifier with gain flatness enhancement," *IEEE J. Solid-State Circuits*, vol. 45, no. 3, pp. 502–509, Mar. 2010.

[32] P. Guan et al., "A fully integrated bit-to-bit 24/48Gb/s QPSK/16-QAM D-band transceiver with mixed-signal baseband in 28nm CMOS technology," in *Proc. IEEE Asian Solid-State Circuits Conf. (A-SSCC)*, Nov. 2023, pp. 1–3.

[33] P. Guan et al., "A fully integrated QPSK/16-QAM D-band CMOS transceiver with mixed-signal baseband circuitry realizing digital interfaces," *IEEE J. Solid-State Circuits*, vol. 59, no. 10, pp. 3123–3141, Oct. 2024.

[34] D. Chowdhury, P. Reynaert, and A. M. Niknejad, "Transformer-coupled power amplifier stability and power back-off analysis," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 55, no. 6, pp. 507–511, Jun. 2008.

[35] S. Li, T. Chi, T.-Y. Huang, M.-Y. Huang, D. Jung, and H. Wang, "A buffer-less wideband frequency doubler in 45-nm CMOS-SOI with transistor multiport waveform shaping achieving 25% drain efficiency and 46–89 GHz instantaneous bandwidth," *IEEE Solid-State Circuits Lett.*, vol. 2, no. 4, pp. 25–28, Apr. 2019.

[36] Z. Chen and P. Heydari, "An 85–95.2 GHz transformer-based injection-locked frequency tripler in 65nm CMOS," in *IEEE MTT-S Int. Microw. Symp. Dig.*, May 2010, pp. 776–779.

[37] C. Toker, M. Saglam, M. Ozme, and N. Gunalp, "Branch-line couplers using unequal line lengths," *IEEE Trans. Microw. Theory Techn.*, vol. 49, no. 4, pp. 718–721, Apr. 2001.

[38] T. Hirota, A. Minakawa, and M. Muraguchi, "Reduced-size branch-line and rat-race hybrids for uniplanar MMIC's," *IEEE Trans. Microw. Theory Techn.*, vol. 38, no. 3, pp. 270–275, Mar. 1990.

[39] H. T. Friis, "A note on a simple transmission formula," *Proc. IRE*, vol. 34, no. 5, pp. 254–256, May 1946.

[40] C. D'heer and P. Reynaert, "A fully integrated 135-GHz direct-digital 16-QAM wireless and dielectric waveguide link in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 59, no. 3, pp. 889–907, Mar. 2024.

[41] S. Callender et al., "A fully integrated 160-Gb/s D-band transmitter achieving 1.1-pJ/b efficiency in 22-nm FinFET," *IEEE J. Solid-State Circuits*, vol. 57, no. 12, pp. 3582–3598, Dec. 2022.

[42] A. Hamani et al., "A D-band multichannel TX system-in-package achieving 84.48 Gb/s with 64-QAM based on 45-nm CMOS and low-cost PCB technology," *IEEE Trans. Microw. Theory Techn.*, vol. 70, no. 7, pp. 3385–3395, Jul. 2022.

[43] G. C. Dogiamis et al., "A 120-Gb/s 100–145-GHz 16-QAM dual-band dielectric waveguide interconnect with package integrated diplexers in Intel 16," *IEEE Solid-State Circuits Lett.*, vol. 5, pp. 178–181, 2022.

[44] J. L. González-Jiménez et al., "A D-Band transmitter achieving 57.6-Gb/s and 30-dBm EIRP based on channel-aggregation 45-nm ICs and a low-profile flat lens antenna," *IEEE Trans. Microw. Theory Techn.*, vol. 72, no. 1, pp. 836–850, Jan. 2024.

[45] A. A. Farid, A. S. H. Ahmed, A. Dhananjay, and M. J. W. Rodwell, "A fully packaged 135-GHz multiuser MIMO transmitter array tile for wireless communications," *IEEE Trans. Microw. Theory Techn.*, vol. 70, no. 7, pp. 3396–3405, Jul. 2022.



**Zisong Wang** (Member, IEEE) received the B.S. and M.S. degrees from the University of Science and Technology of China (USTC), Hefei, China, in 2012 and 2015, respectively, the M.S. degree from the University of Southern California (USC), Los Angeles, CA, USA, in 2017, and the Ph.D. degree from the University of California at Irvine (UCI), Irvine, CA, USA, in 2024, all in electrical engineering.

From 2012 to 2015, he was a Research Assistant at USTC, focusing on the design of high-purity, high-efficiency oscillators. From 2015 to 2017, his research at USC focused on mixed-signal integrated circuits. His doctoral research at UCI centered on developing high-speed sub-THz wireless transmitters. In the summer of 2023, he interned at Marvell Technology, Inc., Irvine, CA, where he contributed to analog and mixed-signal IC design for optical interconnects; he currently serves there as a Staff Engineer. His research interests include high-speed, high-efficiency, low-noise IC design for next-generation communications.

Dr. Wang was a recipient of the UCI EECS Departmental Fellowship in 2017. He is a member of the Editorial Review Board (ERB) of IEEE SOLID-STATE CIRCUITS LETTERS and serves as a reviewer for the IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE TRANSACTIONS ON MICROWAVE THEORY AND TECHNIQUES, IEEE JOURNAL OF ELECTROMAGNETICS, RF AND MICROWAVES IN MEDICINE AND BIOLOGY, IEEE MICROWAVE AND WIRELESS TECHNOLOGY LETTERS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, the IEEE International Symposium on Circuits and Systems (ISCAS), and the IEEE International Midwest Symposium on Circuits and Systems (MWSCAS).



**Huan Wang** (Senior Member, IEEE) received the B.S. degree from Zhejiang University, Hangzhou, China, in 2011, the M.S. degree from the University of Texas at Austin, Austin, TX, USA, in 2013, and the Ph.D. degree from the University of California at Irvine, Irvine, CA, USA, in 2020, all in electrical engineering.

From 2013 to 2015, he was with Cirrus Logic, Austin, as an Analog Design Engineer, where he designed audio class-D amplifiers for mobile applications. He was an Engineering Intern with Qualcomm Inc., San Diego, CA, USA, in 2012, 2016, and 2019, where he is currently working full-time on analog/RF/mm-wave IC design for wireless cellular communications. His research interests include analog, RF, and mm-wave/THz circuits and system design.

Dr. Wang was a recipient of the UCI EECS Department Fellowship in 2015, the Maschoff Brennan Innovator Scholarship in 2019, and a co-recipient of the 2024 IEEE Circuits and Systems Society Darlington Best Paper Award. He serves as a reviewer for IEEE JOURNAL OF SOLID-STATE CIRCUITS, IEEE SOLID-STATE CIRCUITS LETTERS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS, and IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS.



**Youssef O. Hassan** (Graduate Student Member, IEEE) received the B.S. degree (Hons.) in electrical engineering from Cairo University, Giza, Egypt, in 2019, and the M.S. degree in electrical engineering from the University of California at Irvine, Irvine, CA, USA, in 2023, where he is currently pursuing the Ph.D. degree in electrical engineering.

In 2017, he contributed to research on biologically inspired neuron models at the Opto-Nano-Electronics Laboratory (ONELab), Cairo University. From 2019 to 2020, he was an Analog Physical Design Engineer at Goodix Technology, Inc., Cairo, Egypt. He also completed summer internships at Movandi, Irvine, in 2022, and at Qualcomm, San Diego, CA, USA, in 2023 and 2024, focusing on advanced IC technologies. His doctoral research focuses on the analysis and design of RF and mm-wave/THz integrated circuits.

Mr. Hassan was a recipient of the 2019 Microelectronics Olympiad Participation Award in Armenia. He serves as a reviewer for IEEE JOURNAL OF SOLID-STATE CIRCUITS and IEEE SOLID-STATE CIRCUITS LETTERS.



**Payam Heydari** (Fellow, IEEE) received his B.S. and M.S. degrees in electrical engineering from the Sharif University of Technology in 1992 and 1995, respectively. He received his Ph.D. degree from the University of Southern California in 2001. He is currently the Henry Samueli Faculty Excellence Professor and the University Chancellor's Professor at the University of California, Irvine (UCI). He is the (co)-author of two books, three book chapters, and more than 200 journal and conference papers. He served as a Distinguished Lecturer of both the IEEE Solid-State Circuits Society (SSCS) (2014-2016) and the IEEE Microwave Theory and Technology Society (MTT-S) (2019-2022). The primary focus of Dr. Heydari's research is on mixed-signal, analog, radio-frequency, millimeter-wave, and terahertz integrated circuit design.

Dr. Heydari is a fellow of the National Academy of Inventors. Dr. Heydari and co-authors are the first-place winners of the IEEE Engineering in Medicine and Biology Conference (EMBC) 2024 Young Professional Paper Competition for their EMBC 2024 paper on brain-machine interfaces (BMI). He is the co-recipient of the 2024 IEEE Circuits and Systems Society Darlington Best Paper Award, and is the recipient of the 2023 IEEE MTT-S Distinguished Educator Award, the 2021 IEEE SSCS Innovative Education Award, and the Best Invited Paper Award at the 2021 IEEE Custom Integrated Circuits Conference (CICC). He was selected as the inaugural Faculty Innovation Fellow by UCI's Beall Applied Innovation. He was the recipient of the 2016-2017 UCI School of Engineering Mid-Career Excellence in Research, the 2014 Distinguished Engineering Educator Award from Orange County Engineering Council, the 2010 Faculty of the Year Award from UCI's Engineering Student Council (ESC), the 2009 Business Plan Competition First Place Prize Award and Best Concept Paper Award both from Paul Merage School of Business at UCI, the 2009 School of Engineering Fariborz Maseeh Best Faculty Research Award, the 2007 IEEE Circuits and Systems Society Guillemin-Cauer Award, the 2005 IEEE Circuits and Systems Society Darlington Best Paper Award, the 2005 National Science Foundation (NSF) CAREER Award, the 2005 Henry Samueli School of Engineering Teaching Excellence Award, the Best Paper Award at the 2000 IEEE Int'l Conference on Computer Design (ICCD), and the 2001 Technical Excellence Award from the Association of Professors and Scholars of Iranian Heritage (APSIH). His research on novel low-power multi-purpose multi-antenna RF front-ends received the Low-Power Design Contest Award at the 2008 IEEE Int'l Symposium on Low-Power Electronics and Design (ISLPED).

Dr. Heydari is an Associate Editor (AE) of IEEE JOURNAL OF SOLID-STATE CIRCUITS (JSSC) and IEEE OPEN JOURNAL OF SOLID-STATE CIRCUITS SOCIETY (OJ-SSCS), and was an AE of IEEE SOLID-STATE CIRCUITS LETTERS (SSC-L) and IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS (TCAS-I). He serves on the Technical Program Committee (TPC) of the IEEE International Microwave Symposium (IMS), and was a TPC member of the International Solid-State Circuits Conference (ISSCC), IEEE European Solid-State Circuits Conference (ESSCIRC), and CICC. From 2018 to 2020, he was an AdCom member of the IEEE SSCS. He is the director of the Nanoscale Communication IC (NCIC) Labs.