

# Multi-Mode Spatial Signal Processor With Rainbow-Like Fast Beam Training and Wideband Communications Using True-Time-Delay Arrays

Chung-Ching Lin, *Graduate Student Member, IEEE*, Chase Puglisi, *Member, IEEE*, Veljko Boljanovic, *Graduate Student Member, IEEE*, Han Yan, *Member, IEEE*, Erfan Ghaderi, *Member, IEEE*, Jayce Gaddis, *Student Member, IEEE*, Qiuyan Xu, *Graduate Student Member, IEEE*, Sreeni Poolakkal, *Graduate Student Member, IEEE*, Danijela Cabric, *Fellow, IEEE*, and Subhanshu Gupta, *Senior Member, IEEE*

**Abstract**—Initial access in millimeter-wave (mmW) wireless is critical toward successful realization of the fifth-generation (5G) wireless networks and beyond. Limited bandwidth in existing standards and use of phase-shifters in analog/hybrid phased-antenna arrays (PAAs) are not suited for these emerging standards demanding low-latency direction finding. This work proposes a reconfigurable true-time-delay (TTD)-based spatial signal processor (SSP) with a frequency-division beam training methodology and wideband beam-squint-less data communications. A discrete-time delay-compensated clocking technique is used to support 800-MHz bandwidth with a large unity-gain bandwidth ring-amplifier (RAMP)-based signal combiner. To extensively characterize the proposed SSP across different SSP modes and frequency-angle pairs, an automated testbed is developed using computer vision techniques that significantly speeds up the testing process and minimizes possible human errors. Using seven levels of time-interleaving for each of the four antenna elements, the TTD SSP has a delay range of 3.8 ns over 800 MHz and achieves unique frequency-to-angle mapping in the beam training mode with nearly 12-dB frequency-independent gain in the beamforming mode. The SSP is prototyped in 65-nm CMOS with an area of 1.98 mm<sup>2</sup> consuming only 29 mW excluding buffers. Furthermore, an error vector magnitude (EVM) of 7.3% is realized for 16-QAM modulation at a speed of 614.4 Mb/s.

Manuscript received January 13, 2022; revised March 18, 2022 and April 27, 2022; accepted May 22, 2022. This article was approved by Associate Editor Kenichi Okada. This work was supported in part by the NSF under Grant 1955672, Grant 1705026, and Grant 1944688; and in part by the ComSenTer and CONIX Research Centers, two of six centers in Joint University Microelectronics Program (JUMP), a Semiconductor Research Corporation (SRC) Program sponsored by the Defense Advanced Research Projects Agency (DARPA). (Corresponding author: Subhanshu Gupta.)

Chung-Ching Lin, Chase Puglisi, Erfan Ghaderi, Jayce Gaddis, Qiuyan Xu, Sreeni Poolakkal, and Subhanshu Gupta are with the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA 99164 USA (e-mail: subhanshu.gupta@wsu.edu).

Veljko Boljanovic, Han Yan, and Danijela Cabric are with the Department of Electrical and Computer Engineering, University of California at Los Angeles, Los Angeles, CA 90095 USA.

Color versions of one or more figures in this article are available at <https://doi.org/10.1109/JSSC.2022.3178798>.

Digital Object Identifier 10.1109/JSSC.2022.3178798

**Index Terms**—Fast beam training, frequency-to-angle mapping, ring amplifier (RAMP), true-time-delay (TTD) spatial signal processing.

## I. INTRODUCTION

ABUNDANT spectral resources in the millimeter-wave (mmW) regime have opened new possibilities to deliver Gb/s data. Owing to the large propagation losses at mmW, multi-antenna systems are needed for improved signal-to-noise ratio (SNR) and link quality. However, the resulting narrow beams necessitate fast beam acquisition, especially for communications applications on the move. To acquire fast channel state information (CSI), maximize the received power, and fully benefit from the beamforming technology, a process known as beam training is widely adopted to search for the optimal angles to direct the beams.

The existing beam training methodologies are based on an iterative scanning approach that is non-scalable and time- and/or power-consuming, which poses numerous challenges toward the low-latency requirement for both the base station (BS) and the user equipment (UE). Fig. 1(a) illustrates the time-division-based beam scanning method used in most of the analog phased antenna arrays (PAAs). By configuring the phase shifters inherent to these arrays and sensing the received power, the CSI can be obtained. Finer angle resolution is limited by the aperture, and enlarging it increases the design complexity and consumes large power and silicon area [1]. In contrast to the time-division-based beam scanning method, Fig. 1(b) shows the frequency-division-based beam scanning approach that obtains CSI by observing different array frequency responses within a single orthogonal frequency-division multiplexing (OFDM) symbol, saving critical time needed for beam scanning. The need for a constant group delay in the frequency-division-based beam training limits its realization when only phase shifters are used [2]. With PAAs, the frequency-to-angle mapping created by the beam-squint error primarily depends on the unknown angle-of-arrival



Fig. 1. (a) Time-division beam sweep, (b) frequency-division beam sweep; and their anticipated beam patterns and resource allocation over both frequency and time shown in (c) and (d).



Fig. 2. Beam-squint effect in PAAs. It is assumed that the beam steering angle at  $f_c$  is aligned with AoA.

(AoA) and ratio  $f/f_c$ , where  $f$  and  $f_c$  are the subcarrier and carrier frequencies, respectively, as illustrated in Fig. 2. Thus, PAAs cannot realize the frequency-division-based beam training because the receiver has no knowledge of and control over mapping. Furthermore, it is straightforward to show that the only case when the frequency-to-angle mapping in PAAs covers all angular directions is when the bandwidth is two times larger than the carrier frequency, i.e.,  $BW = 2f_c$ , and AoA is fixed to  $90^\circ$ , which is unrealistic.
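One way to see this is sketched below, under the assumptions that the phase shifters are matched to the AoA $\theta$ at $f_c$ (as in Fig. 2) and that the elements are spaced by $\lambda_c/2$. The symbol $\psi$ is introduced here only for illustration and does not appear elsewhere in this article. The magnitude of the combined output at frequency $f$ is then

$$\left| \sum_{n=1}^{N} \exp\big(-j\pi(n-1)\,\psi(f)\big) \right|, \qquad \psi(f) = \left(\frac{f}{f_c} - 1\right)\sin(\theta)$$

so that each frequency sees the pattern of the $f_c$ beam shifted by $\psi(f)$ in $\sin(\cdot)$ space. For $f \in [f_c - BW/2,\ f_c + BW/2]$, this shift is bounded by

$$|\psi(f)| \leq \frac{BW}{2f_c}\,|\sin(\theta)|.$$

Covering all directions requires $\psi$ to span $[-1, 1]$, i.e., $BW \cdot |\sin(\theta)| = 2f_c$. Since the lowest frequency cannot be negative, $BW \leq 2f_c$, and the condition can only be met with $BW = 2f_c$ and $|\sin(\theta)| = 1$.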

Uniform frequency response across wide modulated bandwidths can be realized using the concept of true-time-delay (TTD) introduced with baseband delay elements. Large delay range-to-resolution ratios realized using this technique [3], [4] have opened opportunities for fast beam training algorithms in large-scale arrays without needing additional external

constructs. For example, Rotman Lens [5]-based beamforming can be leveraged for beam training. However, its angular resolution is limited to three points, and its integration adds cost and makes the system bulky. Benefiting from several GHz of available spectrum, another recent approach used two leaky wave antennas (LWAs) [6] operating at sub-THz frequencies for direction finding. An LWA at a lower frequency band has also been proposed in [7] but with a coverage efficiency of only 50%. Like the previous approach, integration of these antennas can be costly and inefficient, especially when operating in the cm-wave and mmW bands.

The need for a TTD approach is driven by different spatial signal processing modes including beam training and beamforming. Since beam squint is a function of the number of antenna elements and the antenna aperture, phase-shift compensation fails to meet this requirement, especially when scaling to a larger array for data communications. The absence of time-delay units (TDUs), or delay ranges that are too small, prevents rainbow beam training in analog arrays, which affects energy efficiency and search latency. The authors' recent works have shown a reconfigurable TTD-based spatial signal processor (SSP) with a large delay-bandwidth product that creates a unique frequency-to-angle map for fast beam training and reduces the scan time of analog PAAs from several symbols to a single OFDM symbol [2], [8], [9]. Lin *et al.* [10] demonstrated, for the first time, reconfigurable TTD SSP hardware that can be easily reconfigured for both beam training and beamforming over an 800-MHz modulated bandwidth. This work significantly expands on the circuits and systems proposed in [10] with the following distinct contributions.

- 1) In-depth analysis of the hardware architecture when applying the beam training algorithm.
- 2) Hardware design considerations and requirements for beam training algorithm are presented and compared with various TTD implementations from RF to analog/digital basebands.
- 3) Detailed circuit design of TTD SSP including time-interleaved interpolated clock generator and high-speed sampling technique to alleviate jitter and mismatch.
- 4) Expanded measurement methodology detailing the test setup with customized automated validation bench development for single-tone, wideband, and modulated measurements of TTD SSP.

The rest of this article is organized as follows. Section II discusses system design considerations for the two SSP modes outlining hardware challenges. The proposed SSP design and circuit implementation details are presented in Section III. Section IV presents the measurement results for different SSP modes followed by conclusions in Section V.

## II. TTD SSP SYSTEM DESIGN CONSIDERATIONS

We consider a uniform linear array with  $N$  critically spaced antenna elements. The inter-element delay can be expressed as  $\Delta t = \sin(\theta)/2f_c$ , where  $\theta$  is the angle of incidence, and  $f_c$  is the center frequency. Therefore, the array vector of the received signal for the  $n$ th element in the frequency domain with respect to AoA can be expressed as:



Fig. 3. HPBW versus the number of antennas.

$[a(\theta, f)]_n = \exp(-j \cdot 2\pi(n-1)d \cdot f \cdot \sin(\theta)/c)$ , where  $d$  is the inter-element spacing,  $f$  is the frequency, and  $c$  represents the speed of light [10]. Next, we describe the system design considerations for both wideband beam training and data communication mode with associated design parameters and hardware challenges.
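As a quick numerical check of the two expressions above, the following minimal sketch evaluates the inter-element delay and the array response for a four-element array. The 28-GHz carrier is the value given later in Section III. The function names, variable names, and the 30° test angle are illustrative assumptions, not part of the original design.

```python
import cmath
import math

C = 299_792_458.0  # speed of light (m/s)


def inter_element_delay(theta_rad, f_c_hz):
    """Delay between adjacent elements for half-wavelength spacing."""
    return math.sin(theta_rad) / (2.0 * f_c_hz)


def array_response(theta_rad, f_hz, n_elems, d_m):
    """[a(theta, f)]_n for n = 1..N, returned as a list of complex phasors."""
    return [
        cmath.exp(-2j * math.pi * (n - 1) * d_m * f_hz * math.sin(theta_rad) / C)
        for n in range(1, n_elems + 1)
    ]


f_c = 28e9                   # carrier frequency (Hz)
d = C / (2.0 * f_c)          # critical spacing, lambda_c / 2
theta = math.radians(30.0)   # illustrative angle of incidence

delta_t = inter_element_delay(theta, f_c)
a = array_response(theta, f_c, 4, d)

print(f"inter-element delay = {delta_t * 1e12:.2f} ps")
print([f"{z:.2f}" for z in a])
```

At $f = f_c$ and $\theta = 30^\circ$, the phase advances by $-\pi/2$ per element, so the four phasors step around the unit circle in quarter turns.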

#### A. Mode 1: Low-Latency Beam Training

Most of the existing beam training algorithms leverage only phase-shifter-based SSPs for exhaustive beam sweeping where different beams are used sequentially in time to estimate AoA. These algorithms (and consequently the analog PAAs) suffer from long scanning times and are thus limited to smaller array sizes by the number of antennas. The scanning time increases as the array size scales up since the half-power beamwidth (HPBW) decreases (as shown in Fig. 3), implying more iterations (or switching) is required to complete the beam training process. To alleviate the above problems, researchers introduced architectures that can synthesize rainbow-like beams to probe the entire angular range with a single OFDM symbol [2], [9]. The key idea is to configure signal delays in different antenna branches to intentionally exacerbate the beam-squint effect among subcarriers. This creates a frequency-to-angle map where each frequency is associated with a unique spatial beam. In a uniform linear array, the approximate mathematical relationship between the frequency  $f$  and the corresponding angle  $\alpha(f)$  is given as follows [2]:

$$\alpha(f) \approx \sin^{-1}(\text{mod}(2f\Delta\tau + 1, 2) - 1) \quad (1)$$

where  $\text{mod}$  is the modulo operator, and  $\Delta\tau$  is the delay difference between neighboring antennas. To probe the entire angular range  $\alpha \in [-\pi/2, \pi/2]$  with different frequencies, the delay difference needs to be set as  $\Delta\tau = 1/\text{BW}$ , where  $\text{BW}$  is the signal bandwidth. The delay taps of the  $n$ th element in  $N$  antenna branches are then  $\tau_n = (n-1)/\text{BW}$ ,  $n = 1, \dots, N$  [9]. With such delays, however, the maximum TTD delay at the last antenna is  $\tau_N = (N-1)/\text{BW}$  (i.e.,  $n = N$ ), which imposes a significant design challenge for array implementation. It is also worth noting that the frequency-to-angle mapping can be further enhanced by increasing its diversity order  $R$ . In that



Fig. 4. Delay tap requirement of (a) beam training and (b) beamforming mode.

case, the delay taps are $\tau_n = (n-1) \cdot R/\text{BW}$, $n = 1, \dots, N$, which enable $R$ different frequencies to be mapped in a particular probed direction. In this work, we use a unity $R$ for a proof-of-concept implementation.
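The mapping in (1) and the associated delay taps can be evaluated numerically. The sketch below assumes the 800-MHz bandwidth and four elements used in this work, and it assumes that $f$ is measured relative to the band center so that it sweeps $[-\text{BW}/2, \text{BW}/2)$. All function and variable names are illustrative.

```python
import math


def beam_angle_deg(f_hz, delta_tau_s):
    """Equation (1): angle (degrees) probed by the subcarrier at frequency f."""
    s = (2.0 * f_hz * delta_tau_s + 1.0) % 2.0 - 1.0
    return math.degrees(math.asin(s))


BW = 800e6              # signal bandwidth (Hz)
N = 4                   # number of antenna elements
delta_tau = 1.0 / BW    # delay difference that probes the full angular range

# Delay taps tau_n = (n - 1) / BW for n = 1..N
taps = [(n - 1) / BW for n in range(1, N + 1)]
print("taps (ns):", [round(t * 1e9, 3) for t in taps])

# Sweep a few subcarrier offsets across the band
for f in (-300e6, -200e6, 0.0, 200e6, 300e6):
    print(f"f = {f / 1e6:+7.1f} MHz -> alpha = {beam_angle_deg(f, delta_tau):+6.1f} deg")
```

With these values, the largest tap is $3/\text{BW} = 3.75$ ns, in line with the delay range of about 3.8 ns quoted in the abstract.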

Given the described frequency-to-angle map, the information of the dominant AoA is embedded in the signal spectrum and it can be obtained through simple frequency-domain digital signal processing. Specifically, after signal combining, removing the OFDM cyclic prefix, and taking the Fourier transform of the signal, the dominant AoA can be concluded by identifying the subcarrier with the highest power and leveraging the relationship in (1). Let  $W(f)$  be the frequency-domain TTD antenna weight vector associated with subcarrier  $f$ , with the  $n$ th element defined as  $[W(f)]_n = \exp(-j \cdot 2\pi \cdot f \cdot \tau_n)$ . Using a spatial response  $a(\theta, f)$  introduced earlier, the received signal  $y(f)$  at the subcarrier  $f$  can be expressed as

$$y(f) = W^H(f)a(\theta, f). \quad (2)$$

Therefore, an estimate  $\hat{\theta}$  of the dominant AoA  $\theta$  can be expressed as follows:

$$\hat{\theta} = \alpha(f^*), \quad f^* = \underset{f}{\text{argmax}} |y(f)|^2. \quad (3)$$

Later in Section IV-A, we demonstrate the estimation algorithm in (3) for different values of incidence angle  $\theta$ .
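The estimator in (2) and (3) can be illustrated with a noise-free, single-path simulation. This is a minimal sketch under stated assumptions, not the measurement procedure of this work. The TTD weights act on the baseband subcarrier offset, while the array response is evaluated at the RF frequency $f_c + f$. The 1-MHz subcarrier grid and all names are hypothetical.

```python
import cmath
import math

C = 299_792_458.0    # speed of light (m/s)
F_C = 28e9           # carrier frequency (Hz)
BW = 800e6           # signal bandwidth (Hz)
N = 4                # antenna elements
D = C / (2.0 * F_C)  # half-wavelength spacing at f_c


def combined_power(f_bb, theta_rad):
    """|y(f)|^2 from (2) for one baseband subcarrier offset f_bb."""
    y = 0j
    for n in range(1, N + 1):
        tau_n = (n - 1) / BW  # beam-training delay taps
        w_n = cmath.exp(-2j * math.pi * f_bb * tau_n)
        # Assumption: the spatial response is evaluated at the RF frequency.
        a_n = cmath.exp(
            -2j * math.pi * (n - 1) * D * (F_C + f_bb) * math.sin(theta_rad) / C
        )
        y += w_n.conjugate() * a_n
    return abs(y) ** 2


def alpha_deg(f_bb):
    """Frequency-to-angle map (1) with delta_tau = 1/BW."""
    return math.degrees(math.asin((2.0 * f_bb / BW + 1.0) % 2.0 - 1.0))


def estimate_aoa_deg(theta_deg, spacing_hz=1e6):
    """Equation (3): pick the strongest subcarrier and map it to an angle."""
    n_sub = int(BW / spacing_hz)
    grid = [-BW / 2 + k * spacing_hz for k in range(n_sub)]
    theta = math.radians(theta_deg)
    f_star = max(grid, key=lambda f: combined_power(f, theta))
    return alpha_deg(f_star)


for true_angle in (-45.0, 0.0, 30.0):
    print(f"true {true_angle:+6.1f} deg -> estimate {estimate_aoa_deg(true_angle):+6.2f} deg")
```

In this model the estimate deviates slightly from the true angle because (1) is an approximation that neglects the $f/f_c$ term and because the subcarrier grid is discrete.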

#### B. Mode 2: Wideband Communications

After identifying AoA using the beam training algorithm described in Section II-A, the SSP can be switched from the beam training mode to the beamforming mode for data communication. The received signals at the antennas are aligned first by applying suitable time delays and phase shifts (in baseband) and then constructively combined enhancing the SNR by  $N$  over a wideband frequency. Different from the beam training mode, the delay taps  $\tau_n$  need to be designed such that antenna weight,  $W(f)$ , matches with array spatial response,  $a(\theta, f)$ , for all frequencies which requires the delay taps  $\tau_n = (n-1) \cdot d \cdot \sin(\theta)/c$ ,  $n = 1, \dots, N$ . Fig. 4 illustrates the tap delay setting for both the modes that are used to characterize our proposed multi-mode TTD SSP.
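The effect of matching $W(f)$ to $a(\theta, f)$ can be checked with a short model-level sketch. It uses the beamforming taps given above and, for contrast, a phase-shifter-only weight frozen at $f_c$. The 64-element comparison case, the chosen angles, and all names are illustrative assumptions.

```python
import cmath
import math

C = 299_792_458.0    # speed of light (m/s)
F_C = 28e9           # carrier frequency (Hz)
D = C / (2.0 * F_C)  # half-wavelength spacing at f_c


def beamforming_taps(theta_rad, n_elems):
    """tau_n = (n - 1) * d * sin(theta) / c for n = 1..N."""
    return [(n - 1) * D * math.sin(theta_rad) / C for n in range(1, n_elems + 1)]


def gain_db(f_hz, theta_rad, n_elems, true_time_delay=True):
    """Combining gain |W^H a| in dB at frequency f for a source at theta."""
    y = 0j
    for n, tau in enumerate(beamforming_taps(theta_rad, n_elems), start=1):
        a_n = cmath.exp(-2j * math.pi * (n - 1) * D * f_hz * math.sin(theta_rad) / C)
        # A phase shifter applies the phase that is correct only at f_c.
        f_w = f_hz if true_time_delay else F_C
        w_n = cmath.exp(-2j * math.pi * f_w * tau)
        y += w_n.conjugate() * a_n
    return 20.0 * math.log10(abs(y))


edge = F_C + 400e6  # upper band edge for an 800-MHz bandwidth
ttd_4 = gain_db(edge, math.radians(30.0), 4)
ttd_64 = gain_db(edge, math.radians(60.0), 64)
ps_64 = gain_db(edge, math.radians(60.0), 64, true_time_delay=False)

print(f"TTD, N=4:  {ttd_4:.2f} dB")
print(f"TTD, N=64: {ttd_64:.2f} dB, phase shifter, N=64: {ps_64:.2f} dB")
```

For four elements the TTD gain is $20\log_{10}(4) \approx 12$ dB at every frequency, consistent with the nearly 12-dB frequency-independent gain reported in the abstract. The loss of the phase-shifter-only case grows with array size.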

TABLE I  
TIME-DELAY UNIT IMPLEMENTATION CHOICES

|                               | Area        | Power Consumption | Delay Range       | Delay Resolution |
|-------------------------------|-------------|-------------------|-------------------|------------------|
| **Transmission line (xLine)** | High        | Low ($\approx 0$) | Low ($\approx 0$) | Limited          |
| **LC delay line**             | High        | Low ($\approx 0$) | Low ($\approx 0$) | Limited          |
| **RF resampling**             | Low         | Med.              | Med.              | High             |
| **Gm-C filter**               | Med.        | Med.-to-High      | Med.              | Med.             |
| **BB sampling**               | Low         | Low-to-med.       | High              | High             |
| **Digital**                   | Low-to-med. | High              | High              | High             |



Fig. 5. Methods to implement TDUs. (a) Microwave transmission line. (b) LC delay line. (c) RF resampling. (d) Gm-C filter. (e) Baseband (BB) sampling. (f) Digital domain.

## III. PROPOSED TTD SSP CIRCUIT IMPLEMENTATION

As discussed in Section II, the integration of the rainbow-like beam training algorithm with the data communication mode requires a reconfigurable TTD architecture with both constant group delay and a large delay range-to-resolution ratio across a wide bandwidth to enable the two SSP modes.

Recent TDUs have used the following methods: transmission line [11], LC delay network [12], [13], RF re-sampling [14], Gm-C all-pass filter [15], [16], time-interleaved switched capacitor circuit [3], [4], [17], and digital delays [18]. The conceptual implementations of these TDUs are shown in Fig. 5(a)–(f). Table I further provides a high-level comparison of these methods. As seen in Table I, the transmission line and LC delay network provide a passive solution, however, with limited delay resolution and delay range. The delay range can be improved by switching sections of the TDUs and cascading multiple sections but requires a large silicon area. Moderate resolution and delay range can be realized using Gm-C delay cell. Nevertheless, as the bandwidth increases, the power consumption of the single Gm-C unit also increases proportionally. Besides, the Gm-C filter hardly benefits from process scaling due to the analog-intensive architecture. The process, voltage, and temperature effects can be mitigated by including additional

calibration knobs for biasing input pairs or load at the expense of additional power and bandwidth constraint owing to larger parasitic capacitance. RF re-sampling [14] uses digitally controlled delay lines (DDLs) to resample the input signal after a passive mixer and demonstrates a wide delay range with fine resolution. However, interleaving directly at the carrier frequency is prone to jitter and gain errors. Besides, the achievable resolution is determined by the DDL design, which also operates at the carrier frequency, implying that the design challenges and power overhead scale up with frequency. Another alternative is to introduce delay in the digital domain using a digital signal processor (DSP). However, the need for an analog-to-digital converter (ADC) per antenna element results in significant power overhead that increases proportionately for larger arrays. Furthermore, integer delays can be implemented easily in the digital domain, while fractional delays are achieved by interpolation, which incurs additional power/design overhead [18]. In contrast, baseband sampling [3], [17], [19] using an interleaved sampler has a high delay range-to-resolution ratio, and a higher delay range is achieved by increasing the interleaving factor. The digital-mostly nature of the time-interleaved sampling technique with precision on-chip capacitors makes this architecture friendly to process scaling.

Fig. 6 shows the proposed TTD SSP with the detailed architecture implementation. The commercial off-the-shelf (COTS) front-end of the system is expected to operate with a center frequency ($f_c$) of 28 GHz and a target BW of 800 MHz. The received signal is then downconverted to baseband, assuming a low-IF downconverter. The two functionalities, beam training and beamforming, are demonstrated through frequency-independent time delay settings with a multiply-and-accumulate (MAC) operation. The results are observed directly from the output power spectral density (PSD). The MAC structure can also be scaled to construct a higher order matrix in a multi-input–multi-output (MIMO) system with minimal hardware overhead. Leveraging the baseband discrete-time beamforming, it has been proven in [3] that the minimum number of interleaving levels ($M_{BF}$) to meet the required signal BW for the beamforming mode can be expressed as (assuming a 120° field of view (FoV))

$$M_{BF} = 1 + \left[ (d/\lambda_c) \times \sqrt{3}/2 \times (N - 1) \times (BW/f_c) \right]. \quad (4)$$

Note that the time delay in the system is caused by the actual antenna placement, AoA, and carrier frequency and remains the same before and after the mixer [20]. The interleaving level to support the beam training mode is derived in a similar fashion as

$$M_{BT} \geq 1 + 2(N - 1) \quad (5)$$

where  $d$  is the inter-element distance,  $\lambda_c$  is the wavelength, and  $f_c$  is the center frequency. Therefore, the minimum interleaving level is chosen to be  $\max(M_{BF}, M_{BT})$  and, usually, dominated by the beam training requirement. In this work, an interleaving level of 7 is required to support a four-element SSP implementation for both the modes.
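The two interleaving requirements can be compared numerically for the four-element, 800-MHz, 28-GHz case. The sketch below assumes that the square brackets in (4) denote rounding up to the next integer, since the number of interleaving levels must be an integer, and that the elements are spaced by $\lambda_c/2$. Function names are illustrative.

```python
import math


def m_bf(n_elems, bw_hz, f_c_hz, d_over_lambda=0.5):
    """Equation (4), assuming the brackets denote a ceiling operation."""
    term = d_over_lambda * math.sqrt(3.0) / 2.0 * (n_elems - 1) * bw_hz / f_c_hz
    return 1 + math.ceil(term)


def m_bt(n_elems):
    """Equation (5), taken at equality."""
    return 1 + 2 * (n_elems - 1)


N, BW, F_C = 4, 800e6, 28e9
levels = max(m_bf(N, BW, F_C), m_bt(N))
print(f"M_BF = {m_bf(N, BW, F_C)}, M_BT = {m_bt(N)}, required = {levels}")
```

Under these assumptions the beam training requirement dominates and yields the seven interleaving levels used in this work.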



Fig. 6. Proposed SSP modes; architecture; and block-level implementation (inset).



Fig. 7. (a) Unit sampler. (b) Time-interleaved (TI) sampler array. (c) Time interleaver. (d) Simplified timing diagram.

The increased bandwidths with time-interleaved paths in this work pose several challenges for the design of the input network when compared with earlier works handling up to 500-MHz BW [3], [4], [21]. This requires a careful design analysis of the input source follower interfacing with the external 50-$\Omega$ source impedance, the ON-resistance of the sampler switch, the size of the sampling capacitor, and the bandwidth and energy efficiency of the closed-loop summer. Sections III-A–III-C analyze the design requirements for each block in the input network.

#### A. Switched Capacitor Array (SCA) and Clock Generation

The fundamental block of the discrete-time delay compensation method is the sample-and-hold within MAC. The sampler, shown in Fig. 7(a), is adopted from [22] as it offers greater



Fig. 8. Pseudo-differential RAMP in its closed-loop configuration.

immunity to parasitic capacitance, thus alleviating second-order errors. The complete sampler array, illustrated in Fig. 7(b), is constructed using the unit sampler in Fig. 7(a). The SCA requires non-overlapping clocks for the sampling, sum, and reset phases generated by the time interleaver. Each interleaved level has a conversion speed of roughly 228 MHz ($=f_{clk}/M$). After all the signal paths are aligned in time, the sampled inputs from the four elements are combined using a closed-loop ring amplifier (RAMP) before being fed into the quantizer (off-chip) for further processing. The parasitic-insensitive topology is highly beneficial as the interleaved sampled signals from different elements are summed at the opamp virtual ground. The choice of the sampling capacitor, $C_s$ ($=50$ fF), is determined by the thermal noise requirement and the available silicon area. The required time-interleaved clocks are derived from the input clock (IN) by the circuits illustrated in Fig. 7(c). The timing diagram shown in Fig. 7(d) captures the relationship between sampling, summing, and reset phases with twice the number of interleaving levels and eight times the bandwidth. Careful layout with mismatch-optimized post-layout extractions was done to alleviate mismatch in the time-interleaved clock phases, similar to a high-speed time-interleaved ADC [23].
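Two of the numbers in this subsection can be reproduced with simple arithmetic. The sketch below assumes that $f_{clk}$ equals the 1.6-GHz sampling frequency quoted in Section III-B and that the temperature is 300 K. The noise figure is the textbook $\sqrt{kT/C}$ value for a single sampling capacitor, not a measured or simulated result from this work.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant (J/K)


def per_channel_rate_hz(f_clk_hz, m_levels):
    """Conversion rate of each interleaved channel, f_clk / M."""
    return f_clk_hz / m_levels


def ktc_noise_vrms(c_farad, temp_k=300.0):
    """RMS thermal noise voltage sampled onto a capacitor, sqrt(kT/C)."""
    return math.sqrt(K_B * temp_k / c_farad)


rate = per_channel_rate_hz(1.6e9, 7)   # assumed f_clk and M = 7
noise = ktc_noise_vrms(50e-15)         # C_s = 50 fF

print(f"per-channel rate = {rate / 1e6:.1f} MHz")
print(f"kT/C noise       = {noise * 1e6:.0f} uV rms")
```

The first result matches the conversion speed of roughly 228 MHz stated above. The second gives a rough sense of the noise floor that the 50-fF choice trades against silicon area.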

#### B. Signal Combiner

The concept of summer originates from the charge sharing gain stage [3] with capacitive feedback network.

An operational transconductance amplifier (OTA) is used for signal summation at the output of the SCA to combine signals. The large feedback factors due to the summation of different antenna inputs and interleaved channels at the OTA virtual ground significantly constrain the unity-gain bandwidth, affecting settling times for wideband signals and therefore resulting in significant power/performance penalties. For instance, the unity-gain bandwidth ($\omega_u$) requirement of the OTA used within the summer can be derived as [3]

$$\omega_u \geq 2N \cdot \ln(2) \cdot (R + 1) f_s \quad (6)$$

where $N$ is the number of elements, $R$ is the desired resolution, and $f_s$ is the sampling frequency. It is clear from (6) that the required $\omega_u$ increases with the number of antennas, resolution, and bandwidth given the Nyquist sampling criterion. For a 6-bit resolution, four elements, and a 1.6-GHz sampling frequency, the minimum required $f_u$ ($= \omega_u/2\pi$) is around 10 GHz, which is hard to realize using a conventional OTA as used earlier in [3]. Emerging research in RAMP topology [23]–[27] alleviates some of these limitations, enabling realization of a low-voltage high-speed amplifier that scales well with advanced CMOS technology. A RAMP comprises several cascaded inverters that form a multi-stage amplifier. Its initial response behaves analogously to a ring oscillator to charge and discharge the capacitive load, and it is stabilized by placing a dominant output pole through adjusting the biasing points, hence creating a dead zone at the last stage of RAMP. The dead zone enables RAMP to operate the last stage in either subthreshold or cutoff, which increases its output impedance, pushing the dominant pole (at the output) toward the origin and thus stabilizing the amplifier.
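The 10-GHz figure follows directly from (6). The short check below uses the values quoted in the text and converts $\omega_u$ to hertz. The function name is illustrative.

```python
import math


def min_unity_gain_hz(n_elems, resolution_bits, f_s_hz):
    """Lower bound on f_u = omega_u / (2*pi) from equation (6)."""
    omega_u = 2.0 * n_elems * math.log(2.0) * (resolution_bits + 1) * f_s_hz
    return omega_u / (2.0 * math.pi)


f_u = min_unity_gain_hz(n_elems=4, resolution_bits=6, f_s_hz=1.6e9)
print(f"minimum f_u = {f_u / 1e9:.2f} GHz")
```

Because the bound is linear in $N$, doubling the number of elements doubles the required unity-gain bandwidth at the same resolution and sampling rate.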

Fig. 8 shows the top-level open-loop dynamic amplifier design adapted from [25]. It consists of two single-ended RAMPs that form a pseudo-differential configuration and construct a closed-loop summer with an additional feedback capacitor and a reset switch. The bias enhancement and the biasing scheme using an anti-parallel arrangement of CMOS transistors are implemented here to enhance both the bandwidth and the linearity. Though the RAMP behavior has been investigated thoroughly in [23], [25], and [27], the application of RAMP in our SSP poses additional challenges. Recall from Figs. 6 and 8 that, during the SUM phase, the SSP is analogous to a closed-loop integrator. As the number of elements increases, the feedback factor, $\beta$, decreases, which implies that the available loop gain is inversely proportional to the number of elements. In practice, additional parasitic capacitance located at the virtual ground node of RAMP due to routing the time-interleaved channels makes $\beta$ even smaller. Fig. 9 shows the simulated loop gain and phase of the RAMP for different $\beta$; the phase margin is maintained in all cases. Thus, the adoption of RAMP is crucial here compared with the traditional OTA to achieve a large delay range across a wide bandwidth. Besides, the first and second stages of RAMP are made as fast as possible and are eventually process limited. An advanced technology node will further help achieve better gain-bandwidth products, alleviating the $\beta$ constraints. Finally, similar to [27], the active common-mode feedback (CMFB) using two-stage RAMP is used to define the common-mode



Fig. 9. Simulated loop gain and phase response for different feedback factor ( $\beta$ ).



Fig. 10. (a) Inline demux sampling used in the proposed SSP. (b) Simplified RC network for BW prediction.

voltage providing sufficient gain compared with a passive CMFB.

#### C. Input and Output Buffers

The input network including the sample-and-hold also determines the bandwidth and linearity of SSP in addition to the signal combiner. Fig. 10(a) shows the time-interleaved sampling scheme used in the proposed SSP input network. Its  $RC$  model is illustrated in Fig. 10(b) to predict the bandwidth



Fig. 11. Simulated ON-resistance of bootstrapped switch (blue), NMOS switch (red), and complementary switches (black) as a function of input voltage.

where $Z_{\text{out,BUF}}$ is the buffer output impedance, $R_{\text{on}}$ represents the switch ON-resistance, and $C_{\text{par}}$ is the node parasitic capacitance including the switches' capacitance. To achieve fast settling for wideband signals, input buffers with low output impedance are required. Thus, a source follower is designed as the input buffer to provide a low output impedance. The choice of switches in the inline demux network is also of interest. While the MOS-only switch is the most area-efficient, its input-dependent ON-resistance introduces harmonics and deteriorates system linearity. In contrast, the bootstrapped switch offers the best linearity since the gate–source voltage is independent of the input source, at the expense of an area penalty [28]. Fig. 11 shows the ON-resistance for three types of switches and visualizes the ON-resistance variation across different input levels assuming a sampling capacitor ($C_s$). Good linearity can only be achieved by lowering the ON-resistance, and thus by increasing the size of the NMOS transistor, at the cost of increased drain junction capacitance and consequently significantly higher nonlinearities [28]. With the adoption of the inline demux technique, the first rank samples at the full speed and the second rank samples at a lower speed due to the interleaved implementation. Thus, the bootstrapped switch is adopted at the first rank only to maintain the linearity, while the complementary switches are placed at the second rank to save silicon area.

Following the buffer, the sampling switch ( $\text{SW}_1$ ) needs to handle the entire bandwidth of interest with high linearity and is thus designed as a boosted switch.

The last section of the RC network is determined by two principles. First, the interleaved switches are triggered using non-overlapping clocks with a duty cycle of 7.14% (1/14), given an interleaving factor of 7, so that only one switch is on at a particular interval and decides the network bandwidth. Second, the size of the sampling capacitor $C_s$ is constrained by the thermal noise requirement, which sets the noise floor. Thus, the bandwidth will be determined by two times the ON-resistance of $\text{SW}_{11}$. Based on the above design considerations, $C_s$ is chosen to be 50 fF, while the interleaved switches are implemented using transmission gates considering the input level requirement, since their $R_{\text{on}}$ is almost independent of the output voltage. The simplified diagram and transistor-level



Fig. 12. Implementation of the input network. (a) Simplified diagram. (b) Differential source follower buffer.



Fig. 13. Amplitude response of the input network using extracted parameters.

schematic of the critical blocks are shown in Fig. 12(a) and (b), respectively. Fig. 12(b) shows the differential buffers starting with a bias tee network for biasing the input transistor of the buffer independent of the external signal. The bootstrap switch is adapted from [29] for the SSP input network, providing a low ON-resistance of 42.75 Ω. Other parameters in Fig. 12(b) are extracted from simulation (not considering routing parasitics), demonstrating a flat amplitude response with a 3-dB bandwidth of 8.7 GHz that meets the settling requirements over the bandwidth of interest (800 MHz), as shown in Fig. 13. Note that the high-pass network formed by the bias tee is omitted in the bandwidth estimation. The output buffer design is similar to the input buffer except that the $G_m$ of the input pair is set to 20 mS to drive the 50-Ω interface for data acquisition.

## IV. TESTBED AUTOMATION AND MEASUREMENT RESULTS

The proposed TTD SSP is designed and fabricated in TSMC 65-nm CMOS, occupying an area of 1.98 mm<sup>2</sup> including test pads, as shown in Fig. 14(a), and is mounted in a Quad Flat No Leads (QFN) 72-pin package on a prototyped evaluation board for testing. Fig. 14(b) illustrates the signal generation and reception using the Xilinx RFSoC high-speed DACs and ADC, respectively. The differential-to-single-ended conversion is also done using the Xilinx balun board. Fig. 14(c) and (d) shows the complete verification setup and the corresponding test bench photograph. The laptop controls the Xilinx RFSoC GUI for signal generation, DAC81416EVM for dc biasing, and Analog Discovery-2 for SPI control.

Multi-antenna SSP receivers are typically characterized over a large matrix of frequency and AoA combinations requiring significant effort complicated by multiple inputs/outputs. Thus, the development of the test bench and validation method is of interest. The input signal for the TTD SSP device under test (DUT) was generated in MATLAB and then uploaded to a Xilinx ZCU216 RFSoC configured in



Fig. 14. TTD SSP automated closed-loop test bench. (a) Die micrograph. (b) Input signal chain setup. (c) Complete verification setup for characterization. (d) Setup photograph for the signal chain.

multi-tier synchronization mode to precisely synchronize the eight differential channels (16 inputs) before being applied to the DUT through SMA coaxial cables. A 3.2-GHz clock source from an HP8360A provides the external clock to generate the required time-interleaved phases on-chip. The required biases and configuration control are provided by the Texas Instruments DAC (DAC81416EVM) and Digilent Analog Discovery. Finally, the output signal is fed to the GS/s ADCs in the Xilinx RFSoC and observed from its GUI.

Because the input and output signals are generated and acquired through the same GUI, the signal path can be automated to reduce the testing time. For precise synchronization with multi-channel generation, we leverage computer vision techniques to drive the GUI using pre-stored images of its icons. The test bench specifies which GUI components to interact with. The PyAutoGUI library provides an abstraction over OpenCV [30], which supplies the tools to locate those images on the screen. Fig. 15 shows the flowchart for the automated closed-loop testbed. The input can be generated either from pre-generated MATLAB code or directly in Python. A separate application script, Xilinx\_Auto.py, configures the DACs and reads the captured ADC data for additional post-processing.
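The icon-driven GUI automation described above can be sketched as follows. This is a minimal illustration and not the authors' Xilinx\_Auto.py: the icon file names, the timeout values, and the helper names `click_icons` and `pyautogui_backend` are assumptions made for the example. It relies on PyAutoGUI's `locateCenterOnScreen`, whose `confidence` argument requires OpenCV to be installed.

```python
import time
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[int, int]


def click_icons(
    icon_paths: Sequence[str],
    locate: Callable[[str], Optional[Point]],
    click: Callable[[int, int], None],
    timeout_s: float = 5.0,
    poll_s: float = 0.2,
) -> List[Point]:
    """Find each pre-stored icon on screen, in order, and click it.

    `locate` returns the icon centre, or None if it is not visible yet;
    `click` performs the mouse click. Both are injected so the sequencing
    logic can be exercised without a display.
    """
    clicked: List[Point] = []
    for path in icon_paths:
        deadline = time.monotonic() + timeout_s
        pos = locate(path)
        while pos is None:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"icon not found on screen: {path}")
            time.sleep(poll_s)
            pos = locate(path)
        click(*pos)
        clicked.append(pos)
    return clicked


def pyautogui_backend(confidence: float = 0.9):
    """Build (locate, click) callables backed by PyAutoGUI and OpenCV."""
    import pyautogui  # imported lazily: it needs a desktop session

    def locate(path: str) -> Optional[Point]:
        try:
            hit = pyautogui.locateCenterOnScreen(path, confidence=confidence)
        except pyautogui.ImageNotFoundException:
            return None  # newer PyAutoGUI versions raise instead of returning None
        return None if hit is None else (int(hit.x), int(hit.y))

    return locate, pyautogui.click


# Hypothetical usage on the test-bench laptop (icon images are placeholders):
#   locate, click = pyautogui_backend()
#   click_icons(["load_waveform.png", "generate.png", "acquire.png"], locate, click)
```

Separating the click sequencing from the screen-matching backend keeps the flow testable offline; only `pyautogui_backend` touches the real GUI.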

### A. Mode 1: Low-Latency Beam Training Test

In the beam training mode, the input signals of the TTD SSP are generated by the RF DAC, which emulates OFDM symbols received by a critically spaced ( $\lambda/2$ ) linear array at incident angle  $\theta$  and then down-converted to an intermediate frequency, with a BW of 491.32 MHz and a subcarrier spacing of 960 kHz. Although beam training conceptually requires only one pilot OFDM symbol to measure the system frequency response and infer  $\theta$ , the current test bench setup lacks the necessary synchronization. Thus, the same OFDM pilot symbol is repeated and its PSD is observed for analysis. It is clear from the measured response in Fig. 16 that the PSD of the received signal is uniquely determined by the incident angle  $\theta$ . Hence,  $\theta$  can be inferred without a time-consuming sequential beam sweep. To further demonstrate the effectiveness of the beam training algorithm, Fig. 17(a) shows the measured heat map for a sweep from  $-85^\circ$  to  $85^\circ$ , which exhibits a unique frequency-to-angle mapping within the high beamforming gain region. The dark region represents the intentionally squinted region, which matches the single-point measurement shown in Fig. 16. The measured data can also be viewed as a 3-D plot of relative magnitude versus frequency and angle, as illustrated in Fig. 17(b). Again, a peak (i.e., the high beamforming gain region) is clearly observed across different angles. Note that, as in any Nyquist ADC, the sinc behavior of the sample-and-hold results in a signal droop close to Nyquist [31]. Post-processing with a whitening filter to flatten the amplitude response is applied only in the beam training mode so that the AoA is inferred correctly. The calibration method is as follows.

- 1) Measure the beamformed frequency response to characterize the roll-off over the Nyquist BW.
- 2) Design a whitening filter in MATLAB to correct the roll-off.
- 3) Apply the same filter in the beam training mode to derive the AoA as per the proposed algorithm.
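The three calibration steps can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the authors' MATLAB filter: the sample rate, the synthetic sinc droop standing in for the measured response, the Gaussian-shaped PSD, and the two-point peak-frequency-to-angle table are all placeholders chosen for the example.

```python
import numpy as np


def design_whitening_gain(measured_mag, floor=1e-3):
    """Step 2: per-bin amplitude gain that flattens a measured response."""
    mag = np.maximum(np.abs(np.asarray(measured_mag, dtype=float)), floor)
    return mag.max() / mag


def whiten_psd(psd, gain):
    """Step 3: apply the amplitude gain to a power spectrum (hence gain**2)."""
    return np.asarray(psd, dtype=float) * np.asarray(gain) ** 2


def peak_frequency(freqs_hz, psd):
    """Frequency of the bin with the highest power."""
    return float(np.asarray(freqs_hz)[int(np.argmax(psd))])


# Synthetic stand-ins for measured data (all values are assumptions).
fs = 1.6e9                                 # hypothetical sample rate
freqs = np.linspace(0.0, fs / 2, 257)      # DC to Nyquist
droop = np.abs(np.sinc(freqs / fs))        # sample-and-hold (sinc) roll-off

# Step 1: the "measured" beamformed response is simply the droop here.
gain = design_whitening_gain(droop)

# Beam-training PSD with a broad peak at 600 MHz, distorted by the droop.
true_psd = np.exp(-0.5 * ((freqs - 600e6) / 150e6) ** 2)
observed_psd = true_psd * droop ** 2
whitened_psd = whiten_psd(observed_psd, gain)

f_raw = peak_frequency(freqs, observed_psd)
f_white = peak_frequency(freqs, whitened_psd)

# Placeholder peak-frequency-to-angle calibration table (not the paper's map).
cal_freqs_hz = np.array([100e6, 700e6])
cal_angles_deg = np.array([-60.0, 60.0])
aoa_deg = float(np.interp(f_white, cal_freqs_hz, cal_angles_deg))

print(f"raw peak {f_raw / 1e6:.1f} MHz, whitened peak {f_white / 1e6:.1f} MHz")
```

In this synthetic case the droop pulls the uncorrected PSD peak toward lower frequencies, which would bias a frequency-to-angle lookup; whitening restores the peak location before the lookup is applied.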

### B. Mode 2: Wideband Data Communications Tests

In the communication mode, the proposed SSP is configured as illustrated in Fig. 4(b). Three signal types are emulated in the RFSoC and applied to the TTD SSP DUT: single tone, wideband chirp, and quadrature amplitude modulation (QAM). Fig. 18(a) shows the measured single-tone response, with a beamforming gain close to 12 dB for a four-element array and an FoV (the same as the AoA range in this work) close to  $\pm 90^\circ$  across the bandwidth of interest. This demonstrates the capability of the delay compensation technique with the RAMP-based signal combiner across a 720-MHz bandwidth. The proposed array can also be configured in the eight-element mode, demonstrating a maximum beamforming gain of 16 dB, as shown in Fig. 18(b). Though the FoV is reduced to  $50^\circ$  to accommodate the increased number of elements and fractional bandwidth ( $f/f_c$ ), the narrowed FoV can be remedied by increasing the interleaving level at the expense of additional power consumption and area overhead. The effect of the number of elements and the FoV on the required interleaving level is further illustrated in Fig. 19. As observed, a larger array with a wider fractional bandwidth requires proportionally more interleaving [see (4)], which implies a tradeoff among the number of antennas, FoV, fractional bandwidth, and hardware/power overhead. To leverage the proposed technique with a reasonable hardware/power overhead, a sub-array implementation within a hybrid architecture is thus preferred [3], [9].

Fig. 15. Automated testing flow for multi-mode TTD SSP characterization.

Fig. 16. Measured beam training PSD against theoretical results.

Fig. 17. Measured (a) heat map and (b) 3-D beam pattern for beam training.

Fig. 18. Beamforming gain for (a) four-element and (b) eight-element configurations.

Fig. 19. Minimum number of interleaving levels with respect to FoV and the number of antennas under the assumption of a fractional bandwidth of 0.2.

Fig. 20. AoA measurement in a four-element configuration.

For the four cases of AoA, the beam patterns are presented in polar format (see Fig. 20). The frequency-independent beamforming conversion gain and the beam-squint-free beam patterns confirm the TTD-based operation of the SSP. Wideband signal testing is also applied to the proposed SSP, as shown in Fig. 21(a) and (b), for a 720-MHz BW at 0° AoA. Again, a uniform beamforming gain of 12 dB is observed with all the elements enabled in the four-element configuration. The SSP is also measured with QAM-modulated signals, using four-QAM and 16-QAM, in the communication mode. The steps for the error vector magnitude (EVM) measurement are as follows.

- 1) *Single-Channel Loopback*: The test setup is benchmarked by connecting the Xilinx RFSoC DAC, followed by two baluns, back to the Xilinx RFSoC ADC. With the 16-QAM-modulated signal applied, a 2.1% EVM is measured.
- 2) *Four-Element DUT Measurement*: The measurement is performed using the same settings as in the previous step, realizing an EVM of 5.2% [see Fig. 22(a)] and 7.3% [see Fig. 22(b)] for four-QAM and 16-QAM, respectively. The measured 16-QAM data rate is 614.4 Mb/s (signal bandwidth  $\sim 300$  MHz). The reported number does not de-embed the output source follower loss, and thus, the performance is expected to improve further.

TABLE II  
COMPARISON WITH STATE-OF-THE-ART TTD SSP

|                                 | [32]<br>JSSC'07            | [33]<br>RFIC'18 | [16]<br>JSSC'17      | [3]<br>TCAS-I'19                   | [21]<br>TMTT'21        | [4]<br>JSSC'21    | [14]<br>SSCL'20           | [18]<br>JSSC'19        | This<br>work         |
|---------------------------------|----------------------------|-----------------|----------------------|------------------------------------|------------------------|-------------------|---------------------------|------------------------|----------------------|
| <b>TTD Arch.</b>                | RF                         | RF              | BB                   | BB                                 | BB                     | BB                | BB                        | Digital                | BB                   |
| <b>Method</b>                   | LC<br>delay                | LC<br>delay     | Gm-C<br>filter       | TI<br>sampling                     | TI<br>sampling         | RF<br>sampling    | Resampling                | Digital<br>delay       | TI<br>sampling       |
| <b>Domain</b>                   | Cont.-<br>Time             | Cont.-<br>Time  | Cont.-<br>Time       | Discrete-<br>Time                  | Discrete-<br>Time      | Discrete-<br>Time | Discrete-<br>Time         | Digital                | Discrete-<br>Time    |
| <b># Elements</b>               | 4                          | N/A             | N/A                  | 4                                  | 4                      | 4                 | 4                         | 16                     | $4\,(8^{13})$        |
| <b>Supply (V)</b>               | 1.5                        | 2.5             | 1.4                  | 1.0                                | N/A                    | 1.0               | N/A                       | N/A                    | 1.0/1.3 <sup>1</sup> |
| <b>Power (mW)</b>               | 555                        | 285             | 364                  | 47                                 | $70^2$                 | 40                | 122                       | 453                    | $29^3/176$           |
| <b>f<sub>CENTER</sub> (GHz)</b> | 8                          | 11              | N/A                  | N/A                                | 25                     | 1.5               | 0.6–4                     | 1                      | $28^{10}$            |
| <b>BW (MHz)</b>                 | 18000                      | 2000–20000      | 2000                 | 100                                | 100                    | 500               | 800                       | 100                    | 800                  |
| <b>Delay Range (ns)</b>         | 0.3                        | 0.508           | 1.7                  | 15                                 | 10                     | 1                 | 5                         | 7500                   | 3.8                  |
| <b>Delay Resol. (ns)</b>        | 0.015                      | 0.004           | 0.001                | 0.005                              | 0.005                  | 0.015             | 0.035                     | 0.25                   | 0.005                |
| <b>Linearity (dBm)</b>          | -3.2 <sup>4</sup> (P1-dB)  | N/A             | -3.1 (IIP3)          | N/A                                | -28 <sup>5</sup> (EVM) | 7.9 (IIP3)        | -16.5 <sup>6</sup> (IIP3) | -41                    | $14^6$ (IIP3)        |
| <b>Noise (dB)</b>               | 2.9–4.8 <sup>12</sup> (NF) | N/A             | 23 <sup>7</sup> (NF) | $330 \mu\text{V}_{\text{rms,out}}$ | N/A                    | 33 (SNDR)         | 20 <sup>6</sup> (NF)      | 60 <sup>8</sup> (SNDR) | 31.8 <sup>11</sup>   |
| <b>Area (mm<sup>2</sup>)</b>    | 9.92                       | 5.45            | 0.61 <sup>9</sup>    | 0.57 <sup>9</sup>                  | 0.5                    | 0.82              | 1.2                       | 4.42                   | 1.98                 |
| <b>Tech. (nm)</b>               | CMOS<br>130                | BiCMOS<br>130   | CMOS<br>130          | CMOS<br>65                         | CMOS<br>65             | CMOS<br>65        | CMOS<br>28                | CMOS<br>40             | CMOS<br>65           |

<sup>1</sup>1.3 V biases input buffers only; <sup>2</sup>Baseband delay circuit, mixers, and LO; <sup>3</sup>Excluding I/O buffers for testing; <sup>4</sup>Estimated from Fig. 15(d) [33] at 15 GHz; <sup>5</sup>Tested at 270-Mb/s 64-QAM; <sup>6</sup>Worst case for single-channel measurement; <sup>7</sup>Worst case with a filter order of 9; <sup>8</sup>Array SNDR; <sup>9</sup>Active area only; <sup>10</sup>Front-end emulated using Xilinx ZCU216; <sup>11</sup>790-MHz input signal with the 5-dB buffer loss decoupled; <sup>12</sup>Measured from the UWB front-end; \*Estimated; <sup>13</sup>In the eight-element mode, only the 0° test is feasible due to the current implementation constraint.

Fig. 21. Wideband measurement with (a) one-element and (b) four-element enabled.

Fig. 22. Measured EVM for (a) four-QAM and (b) 16-QAM.

Fig. 23. Measured EVM performance for both four- and eight-element configurations using 16-QAM signal.
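The EVM values reported in the two measurement steps are commonly computed as an RMS error relative to the reference constellation. The sketch below shows one such computation; the RMS normalization convention, the least-squares gain and phase correction, and the synthetic noisy symbols are assumptions made for illustration and are not the authors' processing chain or measured data. It also derives the symbol rate implied by the reported 614.4-Mb/s 16-QAM data rate.

```python
import numpy as np


def qam_constellation(order):
    """Square QAM constellation normalised to unit average power."""
    side = int(round(np.sqrt(order)))
    if side * side != order:
        raise ValueError("only square QAM orders are supported")
    levels = np.arange(-(side - 1), side, 2, dtype=float)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))


def evm_percent(received, reference):
    """RMS EVM in percent, normalised to the RMS reference amplitude."""
    received = np.asarray(received, dtype=complex)
    reference = np.asarray(reference, dtype=complex)
    # Remove a single complex gain (amplitude and phase) by least squares.
    g = np.vdot(reference, received) / np.vdot(reference, reference)
    error = received / g - reference
    return 100.0 * np.sqrt(
        np.mean(np.abs(error) ** 2) / np.mean(np.abs(reference) ** 2)
    )


# Synthetic example: 16-QAM symbols through a gain/phase offset plus noise.
rng = np.random.default_rng(0)
const16 = qam_constellation(16)
tx = rng.choice(const16, size=20000)
noise = (rng.standard_normal(tx.size) + 1j * rng.standard_normal(tx.size)) / np.sqrt(2)
rx = 0.5 * np.exp(1j * 0.3) * (tx + 0.05 * noise)  # not measured data
evm = evm_percent(rx, tx)
print(f"synthetic EVM = {evm:.2f} %")

# 16-QAM carries log2(16) = 4 bits per symbol.
bits_per_symbol = int(np.log2(16))
symbol_rate = 614.4e6 / bits_per_symbol  # symbols per second
print(f"implied symbol rate = {symbol_rate / 1e6:.1f} MSym/s")
```

Other normalizations (for example, to the peak constellation amplitude) give different percentages for the same data, so the convention should be stated alongside any reported EVM.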

Fig. 23 further captures the EVM variation across input power levels using a 16-QAM signal. The EVM degrades either because of low SNR (left region of Fig. 23) or because of increased nonlinearity caused by system saturation [34] (right region of Fig. 23), leaving a low-error region at the center for data communications. The EVM performance of the proposed system can be attributed to the following factors: 1) the finite bandwidth of the closed-loop summer including the RAMP (the source followers and the sampler exceed the desired bandwidth by more than 10 $\times$ ); the limited bandwidth results in incomplete settling and makes the symbols deviate from their ideal locations, as described in [35]; 2) signal loss in the input and output buffers used to match the external 50- $\Omega$  impedance, which directly deteriorates the SNR and thus the EVM; and 3) mismatches in the test setup, including multiple connectors, adapters, and baluns. Though less significant than the previous two, the test setup imperfections further limit the EVM. Future work will focus on improving the test bench to improve the EVM performance and meet the requirements of emerging mmW wireless applications. In addition, the eight-element case tends to saturate the system faster than the four-element case, causing the EVM to worsen at higher input power.

The single-channel input third-order intercept point (IIP3) is also measured and extrapolated for the proposed SSP. Two input tones at 766 and 776 MHz, spaced 10 MHz apart, produce third-order intermodulation products at 756 and 786 MHz, demonstrating a 14-dBm IIP3. The higher IIP3 compared with [3] and [4] is attributed to the inclusion of the input source follower circuits followed by the bootstrapped switch. In addition, a higher supply (1.3 V) is used to improve the linearity of the input source follower, with careful design ensuring no device stress across any device terminals.

The four-element TTD SSP consumes 29 mW, including the RAMP, sample phase generation, and switched-capacitor bank. The required interface circuits consume 135, 10, and 2 mW for the input buffer, output buffer, and current mirroring, respectively. Table II summarizes the critical parameters of the proposed TTD SSP and compares them with the state of the art. Though [14] demonstrated a bandwidth similar to that of the proposed work, its use of an RF sampling mixer with a digital delay line limits its application at mmW. In contrast, the proposed work can connect to different RF front-end downconverters, relaxing the overall system design complexity for TTD arrays. The performance with modulated signals in the two processing modes demonstrates the applicability of the proposed SSP to high-speed wireless links for both beam training and beamforming.
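The arithmetic behind the two-tone IIP3 test and the power breakdown can be checked with a short Python sketch. The tone frequencies and power numbers are taken from the text; the helper names and the example power levels passed to `iip3_dbm` are illustrative assumptions, not measured values, and the extrapolation assumes the usual 3:1 slope of the third-order products.

```python
def im3_frequencies(f1_hz, f2_hz):
    """Third-order intermodulation products closest to the two tones."""
    return 2 * f1_hz - f2_hz, 2 * f2_hz - f1_hz


def iip3_dbm(p_in_dbm, p_fund_out_dbm, p_im3_out_dbm):
    """Extrapolated IIP3 from one two-tone point (per-tone powers in dBm)."""
    return p_in_dbm + (p_fund_out_dbm - p_im3_out_dbm) / 2.0


# Tone placement from the text: 766 MHz and 776 MHz.
lower_im3, upper_im3 = im3_frequencies(766e6, 776e6)
print(lower_im3 / 1e6, upper_im3 / 1e6)

# Illustrative levels only: a 60-dB gap between fundamental and IM3.
example_iip3 = iip3_dbm(p_in_dbm=-20.0, p_fund_out_dbm=-20.0, p_im3_out_dbm=-80.0)

# Power breakdown from the text (mW).
core_mw = 29
interface_mw = {"input buffer": 135, "output buffer": 10, "current mirroring": 2}
total_mw = core_mw + sum(interface_mw.values())
print(total_mw)
```

The total of 176 mW agrees with the second power entry for this work in Table II.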

## V. CONCLUSION

This article demonstrates a multi-mode SSP with low-latency beam training and wideband data communications. Frequency-dependent search beams are created to sound all directions simultaneously using TTD arrays, greatly reducing the beam training latency. The proposed method is scalable depending on the multi-antenna front-end specifications. In addition, the proposed architecture supports wideband data communications with a large delay-bandwidth product using a fast-slewing wideband RAMP for efficient signal combining in the baseband switched-capacitor array. A 3.8-ns delay compensation across an 800-MHz bandwidth and a power consumption of 29 mW are demonstrated, with an EVM of <10% supporting 16-QAM.

## REFERENCES

- [1] D. Pepe and D. Zito, "Two mm-wave vector modulator active phase shifters with novel IQ generator in 28 nm FDSOI CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 2, pp. 344–356, Feb. 2017.
- [2] H. Yan, V. Boljanovic, and D. Cabric, "Wideband millimeter-wave beam training with true-time-delay array architecture," in *Proc. 53rd Asilomar Conf. Signals, Syst., Comput.*, Nov. 2019, pp. 1447–1452.
- [3] E. Ghaderi, A. S. Ramani, A. A. Rahimi, D. Heo, S. Shekhar, and S. Gupta, "An integrated discrete-time delay-compensating technique for large-array beamformers," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 66, no. 9, pp. 3296–3306, Sep. 2019.
- [4] E. Ghaderi and S. Gupta, "A four-element 500-MHz 40-mW 6-bit ADC-enabled time-domain spatial signal processor," *IEEE J. Solid-State Circuits*, vol. 56, no. 6, pp. 1784–1794, Jun. 2021.
- [5] Y. Gao, M. Khaliel, F. Zheng, and T. Kaiser, "Rotman lens based hybrid analog-digital beamforming in massive MIMO systems: Array architectures, beam selection algorithms and experiments," *IEEE Trans. Veh. Technol.*, vol. 66, no. 10, pp. 9134–9148, Jun. 2017.
- [6] H. Saeidi, S. Venkatesh, X. Lu, and K. Sengupta, "THz prism: One-shot simultaneous localization of multiple wireless nodes with leaky-wave THz antennas and transceivers in CMOS," *IEEE J. Solid-State Circuits*, vol. 56, no. 12, pp. 3840–3854, Dec. 2021.
- [7] H. Li, M. Wu, Y. Cheng, L. Mei, and C. Zhou, "Leaky-wave antennas as metal rims of mobile handset for mm-wave communications," *IEEE Trans. Antennas Propag.*, vol. 69, no. 7, pp. 4142–4147, Jul. 2021.
- [8] V. Boljanovic, H. Yan, E. Ghaderi, D. Heo, S. Gupta, and D. Cabric, "Design of millimeter-wave single-shot beam training for true-time-delay array," in *Proc. IEEE 21st Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC)*, May 2020, pp. 1–5.
- [9] V. Boljanovic *et al.*, "Fast beam training with true-time-delay arrays in wideband millimeter-wave systems," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 4, pp. 1727–1739, Apr. 2021.
- [10] C.-C. Lin *et al.*, "A 4-element 800 MHz-BW 29 mW true-time-delay spatial signal processor enabling fast beam-training with data communications," in *Proc. IEEE 47th Eur. Solid State Circuits Conf. (ESSCIRC)*, Sep. 2021, pp. 287–290.
- [11] F. Lin and H. Deng, "Continuously tunable true-time-delay phase shifter based on transmission lines with simultaneously reconfigurable impedance and phase constant," *IEEE Trans. Microw. Theory Techn.*, vol. 67, no. 12, pp. 4714–4723, Dec. 2019.
- [12] H. Hashemi, T.-S. Chu, and J. Roderick, "Integrated true-time-delay-based ultra-wideband array processing," *IEEE Commun. Mag.*, vol. 46, no. 9, pp. 162–172, Sep. 2008.
- [13] M. Li *et al.*, "An 800-ps origami true-time-delay-based CMOS receiver front end for 6.5–9-GHz phased arrays," *IEEE Solid-State Circuits Lett.*, vol. 3, pp. 382–385, 2020.
- [14] K. Spoof, M. Zahra, V. Unnikrishnan, K. Stadius, M. Kosunen, and J. Ryynanen, "A 0.6–4.0 GHz RF-resampling beamforming receiver with frequency-scaling true-time-delays up to three carrier cycles," *IEEE Solid-State Circuits Lett.*, vol. 3, pp. 234–237, 2020.
- [15] S. K. Garakoui, E. A. M. Klumperink, B. Nauta, and F. E. van Vliet, "Compact cascadable  $g_m$ -C all-pass true time delay cell with reduced delay variation over frequency," *IEEE J. Solid-State Circuits*, vol. 50, no. 3, pp. 693–703, Mar. 2015.
- [16] I. Mondal and N. Krishnapura, "A 2-GHz bandwidth, 0.25–1.7 ns true-time-delay element using a variable-order all-pass filter architecture in  $0.13\mu\text{m}$  CMOS," *IEEE J. Solid-State Circuits*, vol. 52, no. 8, pp. 2180–2193, Aug. 2017.
- [17] A. Nagulu *et al.*, "A full-duplex receiver with true-time-delay cancelers based on switched-capacitor-networks operating beyond the delay-bandwidth limit," *IEEE J. Solid-State Circuits*, vol. 56, no. 5, pp. 1398–1411, May 2021.
- [18] S. Jang, R. Lu, J. Jeong, and M. Flynn, "A 1-GHz 16-element four-beam true-time-delay digital beamformer," *IEEE J. Solid-State Circuits*, vol. 54, no. 5, pp. 1304–1314, May 2019.
- [19] E. Ghaderi, A. S. Ramani, A. A. Rahimi, D. Heo, S. Shekhar, and S. Gupta, "Four-element wide modulated bandwidth MIMO receiver with >35-dB interference cancellation," *IEEE Trans. Microw. Theory Techn.*, vol. 68, no. 9, pp. 3930–3941, Sep. 2020.
- [20] M. Soer, "Switched-RC beamforming receivers in advanced CMOS: Theory and design," Ph.D. dissertation, Dept. Elect. Eng., Math. Comput. Sci., Univ. Twente, Enschede, The Netherlands, Nov. 2012.
- [21] A. Fikes, P. P. Khial, S. Nooshabadi, and A. Hajimiri, "Programmable active mirror: A scalable decentralized router," *IEEE Trans. Microw. Theory Techn.*, vol. 69, no. 3, pp. 1860–1874, Mar. 2021.
- [22] D. Allstot, R. Brodersen, and P. Gray, "MOS switched capacitor ladder filters," *IEEE J. Solid-State Circuits*, vol. JSSC-13, no. 6, pp. 806–814, Dec. 1978.
- [23] B. Hershberg, S. Weaver, K. Sobue, S. Takeuchi, K. Hamashita, and U.-K. Moon, "Ring amplifiers for switched capacitor circuits," *IEEE J. Solid-State Circuits*, vol. 47, no. 12, pp. 2928–2942, Dec. 2012.
- [24] Y. Lim and M. P. Flynn, "A 1 mW 71.5 dB SNDR 50 MS/s 13 bit fully differential ring amplifier based SAR-assisted pipeline ADC," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 2901–2911, Dec. 2015.
- [25] J. Lagos, B. Hershberg, E. Martens, P. Wambacq, and J. Craninckx, "A single-channel, 600-MS/s, 12-b, ringamp-based pipelined ADC in 28-nm CMOS," *IEEE J. Solid-State Circuits*, vol. 54, no. 2, pp. 403–416, Feb. 2019.

- [26] Y. Lim and M. P. Flynn, "A 100 MS/s, 10.5 bit, 2.46 mW comparatorless pipeline ADC using self-biased ring amplifiers," *IEEE J. Solid-State Circuits*, vol. 50, no. 10, pp. 2331–2341, Oct. 2015.
- [27] J. Lagos, B. P. Hershberg, E. Martens, P. Wambacq, and J. Craninckx, "A 1-GS/s, 12-b, single-channel pipelined ADC with dead-zone-degenerated ring amplifiers," *IEEE J. Solid-State Circuits*, vol. 54, no. 3, pp. 646–658, Mar. 2019.
- [28] B. Razavi, "The bootstrapped switch [a circuit for all seasons]," *IEEE Solid State Circuits Mag.*, vol. 7, no. 3, pp. 12–15, Summer 2015.
- [29] A. M. Abo and P. R. Gray, "A 1.5-V, 10-bit, 14.3-MS/s CMOS pipeline analog-to-digital converter," *IEEE J. Solid-State Circuits*, vol. 34, no. 5, pp. 599–606, May 1999.
- [30] T.-H. Chang, T. Yeh, and R. C. Miller, "GUI testing using computer vision," in *Proc. 28th Int. Conf. Hum. Factors Comput. Syst. (CHI)*, 2010, p. 1535.
- [31] S. W. Smith, *The Scientist and Engineer's Guide to Digital Signal Processing*. San Diego, CA, USA: California Tech. Publishing, 1997. [Online]. Available: <http://www.dspguide.com>
- [32] T.-S. Chu, J. Roderick, and H. Hashemi, "An integrated ultra-wideband timed array receiver in 0.13 $\mu$ m CMOS using a path-sharing true time delay architecture," *IEEE J. Solid-State Circuits*, vol. 42, no. 12, pp. 2834–2850, Dec. 2007.
- [33] M.-K. Cho, I. Song, and J. D. Cressler, "A true time delay-based SiGe bi-directional T/R chipset for large-scale wideband timed array antennas," in *Proc. IEEE Radio Freq. Integr. Circuits Symp. (RFIC)*, Jun. 2018, pp. 272–275.
- [34] R. Bonk *et al.*, "Linear semiconductor optical amplifiers for amplification of advanced modulation formats," *Opt. Exp.*, vol. 20, no. 9, pp. 9657–9672, Apr. 2012.
- [35] H. Wang, H. Mohammadnezhad, and P. Heydari, "Analysis and design of high-order QAM direct-modulation transmitter for high-speed point-to-point mm-wave wireless links," *IEEE J. Solid-State Circuits*, vol. 54, no. 11, pp. 3161–3179, Nov. 2019.



**Veljko Boljanovic** (Graduate Student Member, IEEE) received the B.S. and M.S. degrees in electrical and computer engineering from the University of Novi Sad, Novi Sad, Serbia, in 2015 and 2016, respectively. He is currently pursuing the Ph.D. degree with the University of California at Los Angeles, Los Angeles, CA, USA.

His research interests include system design, network performance optimization, and digital signal processing in wireless millimeter-wave communications.

Mr. Boljanovic was a recipient of the Electrical and Computer Engineering Department Fellowship and the Dissertation Year Fellowship at the University of California at Los Angeles in 2017 and 2021, respectively.



**Han Yan** (Member, IEEE) received the B.E. degree from Zhejiang University, Hangzhou, China, in 2013, and the M.S. and Ph.D. degrees in electrical and computer engineering from the University of California at Los Angeles (UCLA), Los Angeles, CA, USA, in 2015 and 2020, respectively.

He has broad research interests in signal processing and communication system design for millimeter-wave mobile networks, cooperative unmanned aerial vehicles networks, and dynamic spectrum sharing radios.

Dr. Yan was a recipient of the UCLA Dissertation Year Fellowship in 2018, the Qualcomm Innovation Fellowship in 2019, the UCLA ECE Distinguished Ph.D. Dissertation Award in 2020, and the Best Paper Award at the 2020 ACM mmNets Workshop.



**Chung-Ching Lin** (Graduate Student Member, IEEE) received the M.S. degree in communication engineering from Yuan Ze University, Taoyuan, Taiwan, in 2014. He is currently pursuing the Ph.D. degree with Washington State University, Pullman, WA, USA.

His current research interests include low-power and wideband multi-antenna transceivers, frequency synthesizer, and high-speed I/O circuit design.

Mr. Lin was a recipient of the Yu-Ziang Academic Scholarship in 2013, the Southern Methodist University Graduate Student Travel Grant in 2018, the IEEE Circuits and Systems society (CAS) Travel Award in 2019, and the IEEE Custom Integrated Circuits Conference (CICC) Educational Grants Award in 2020. He was the IEEE Radio Frequency Integrated Circuits (RFIC) Symposium Best Student Paper Award Nominee (out of 12 finalists) in 2020.



**Erfan Ghaderi** (Member, IEEE) received the B.Sc. degree in electrical engineering from the Sharif University of Technology, Tehran, Iran, in 2012, the M.Sc. degree in circuits and systems from the University of Tehran, Tehran, in 2015, and the Ph.D. degree from the School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, USA, in 2020.

He is currently an Analog Design Engineer with Intel Corporation, Hillsboro, OR, USA. His research interests include integrated circuit design for high-speed data converters, phased array applications, and time-domain signal processing.



**Chase Puglisi** (Member, IEEE) received the B.S. and M.S. degrees from Washington State University (WSU), Pullman, WA, USA, in 2017 and 2020, respectively.

While working toward his M.S. degree, he worked with the Systems-on-Chip (SoC) Laboratory, WSU. He currently works as an Analog Design Engineer with Texas Instruments, Tucson, AZ, USA. His research interests include time-based circuits, hybrid amplifier design, and multi-channel phased array systems.



**Jayce Gaddis** (Student Member, IEEE) received the dual B.S. degrees (*summa cum laude*) in electrical engineering and computer engineering from Washington State University, Pullman, WA, USA, in 2021.

He currently works as an Electrical Design Engineer with Cadence Neuroscience, Redmond, WA, USA, developing analog/mixed-signal systems for next-generation medical devices. His research interests are signal processing, medical electronics, and analog/mixed-signal circuit design.



**Qiuyan Xu** (Graduate Student Member, IEEE) received the B.E. degree in electronic science and technology from Xi'an Jiaotong University, Xi'an, China, in 2017, and the M.S. degree in electrical engineering from Washington State University, Pullman, WA, USA, in 2019, where she is currently pursuing the Ph.D. degree.

Her research interests include time-domain circuits, high-speed mixed-signal designs, and applications in wireless communication systems.



**Danijela Cabric** (Fellow, IEEE) received the M.S. degree in electrical engineering from the University of California at Los Angeles (UCLA), Los Angeles, CA, USA, in 2001, and the Ph.D. degree in electrical engineering from the University of California (UC) at Berkeley, Berkeley, CA, USA, in 2007.

She is currently a Professor of electrical and computer engineering with UCLA. Her research interests are millimeter-wave communications, distributed communications and sensing for the Internet of Things, and machine learning for wireless networks' coexistence and security.

Dr. Cabric received the Samueli Fellowship in 2008, the Okawa Foundation Research Grant in 2009, the Hellman Fellowship in 2012, the National Science Foundation Faculty Early Career Development (CAREER) Award in 2012, and the Qualcomm Faculty Award in 2020. She has served as an Associate Editor for *IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING*, *IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS*, *IEEE TRANSACTIONS ON MOBILE COMPUTING*, and *IEEE Signal Processing Magazine*, and as an IEEE ComSoc Distinguished Lecturer.



**Sreeni Poolakkal** (Graduate Student Member, IEEE) received the B.Tech. degree from the University of Calicut, Calicut, in 2014, and the M.Tech. degree from IIT Guwahati, Guwahati, India, in 2018. He is currently pursuing the Ph.D. degree with Washington State University, Pullman, WA, USA. He did his master's thesis on low-power and low phase noise oscillators.

Subsequently, he joined Tata Steel, Jamshedpur, India, as an Automation Technologist. His research interests include highly linear and wideband phased array radios.

Mr. Poolakkal was a recipient of the IEEE Solid-State Circuits Society (SSCS), Student Travel Grant Award (STGA) to attend International Solid-State Circuits Conference (ISSCC) 2022.



**Subhanshu Gupta** (Senior Member, IEEE) received the B.E. degree from the National Institute of Technology (NIT), Trichy, India, in 2002, and the M.S. and Ph.D. degrees from the University of Washington, Seattle, WA, USA, in 2006 and 2010, respectively.

He is currently an Assistant Professor of electrical engineering and computer science with Washington State University, Pullman, WA, USA. He has held industrial positions at Maxlinear, Irvine, CA, USA, where he worked on wideband transceivers for SAT-COM and infrastructure applications. His research interests include large-scale phased arrays and wideband transceivers, low-power time-domain circuits and systems, and statistical hardware optimization for next-generation wireless communications, the Internet of Things, and quantum applications.

Dr. Gupta was a recipient of the National Science Foundation CAREER Award in 2020, the Department of Defense DURIP Award in 2021, and the Cisco Faculty Research Award in 2017. He has earlier received the Analog Devices Outstanding Student Designer Award in 2008 and the IEEE RFIC Symposium Best Student Paper Award (Third Place) in 2011. He serves as an Associate Editor for the *IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS* for the term 2020–2021 and also served as a Guest Editor for *IEEE DESIGN AND TEST OF COMPUTERS* in 2019.