

# A $345\mu\text{W}$ 1GHz Process and Temperature Invariant Constant Slope-and-Swing Ramp-based 7-bit Phase Interpolator for True-Time-Delay Spatial Signal Processors

Soumen Mohapatra, Chung-Ching Lin, Mohammad Chahardori, Erfan Ghaderi, Md Aminul Hoque, Subhanshu Gupta, Deukhyoun Heo

School of Electrical Engineering and Computer Science, Washington State University, USA  
soumen.mohapatra@wsu.edu

**Abstract**— In the baseband time delay (TD) elements used for delay compensation in discrete-time beamformers, the phase interpolator (PI) plays a crucial role because its resolution defines the delay resolution of the TD element. In this paper, we present a process- and temperature-invariant, high-resolution, highly linear, low-power PI. The proposed PI uses current integration to generate an adaptable constant slope-and-swing ramp signal, which keeps the power consumption low. Switched-capacitor bias generation enhances the PI linearity, yielding 0.2 LSB DNL and 0.3 LSB INL. The 7-bit PI, realized in 65nm CMOS technology, generates the full delay range with a resolution of 8psec with a 1GHz input. The PI consumes  $345\mu\text{W}$  and occupies an active area of  $0.021\text{mm}^2$ .

**Keywords**— Ramp-rate tracking, constant slope-and-swing, phase interpolator, ramp-based, baseband time delay.

## I. INTRODUCTION

Quantized RF transceivers with phase-locked loops (PLLs) [1], [2] and discrete-time beamformers [3] need full-range, high-resolution, and low-power phase interpolators (PIs) for accurate delay compensation at baseband frequencies. For example, in discrete-time true-time-delay (TTD) spatial signal processors (SSPs), the inter-element delay range between the receiving antennas is compensated precisely in the baseband by sampling with PI-based clock generators instead of digital-to-time converters (DTCs), which have only a finite range [3]. In [3], the PIs and the time-interleaver accounted for more than 90% of the power consumption of the overall 4-element array, making it infeasible to scale to the antenna arrays required in next-generation wireless communication systems. The PI takes clock inputs with a definite phase difference and generates a clock with an interpolated phase that is a weighted sum of the input signal phases determined by the digital code. The PI's linearity depends not only on the time difference between the input signals but also on their rise times. Thus, step inputs to the PI generate more non-linearity, whereas inputs with a finite rise time yield higher linearity. In general, PIs have been implemented using either current-mode logic (CML) [1], [4], [5] (Fig.1(a)) or arrayed inverters [2] (Fig.1(b)). In CML-based PIs, though the dynamic power consumption is lower due to the limited output swing, the linearity and delay resolution are significantly impacted by the DAC. In addition, a slew-rate control buffer is needed to generate a finite rise time at the input, which consumes extra power. In [4], the linearity is improved by interpolating multiple cascaded stages where each stage interpolates two



Fig. 1. The architectures of PI. (a) CML-based PI; (b) inverter-based PI; and (c) the proposed constant slope-and-swing ramp-based PI.

signals with a small phase difference but at the cost of higher power consumption. In inverter-based PIs, the drive strength of PMOS and NMOS transistors in the arrayed inverter is varied based on digital codes to generate interpolated clocks with different delays. Despite good linearity, rail-to-rail swing across the load capacitance results in high dynamic power consumption as also evident in [3].

This work proposes a constant slope-and-swing ramp-based PI to address both the linearity issue in CML-based PI and dynamic power loss in inverter-based PI. As shown in Fig.1(c), a linear ramp is compared with the resistive DAC output generated using digital codes to create interpolated outputs. The dynamic power is reduced by limiting the ramp swing instead of full rail-to-rail swing. A ramp generator with ramp-rate tracking produces a highly linear ramp signal which is stable across process, voltage, and temperature (PVT) variations alleviating linearity issues in earlier PIs. The PI performance is validated in a standalone IC as well as a 500MHz bandwidth 2-element TTD SSP adapted from [6]. The rest of the paper is organized as follows. Section II discusses the proposed PI architecture. Section III presents the measurement setup and results for both standalone PI and the 2-element TTD SSP with three identical PIs followed by the conclusions in section IV.

## II. PROPOSED CONSTANT SLOPE-AND-SWING RAMP-BASED PI

This section presents the architecture for the proposed ramp-based PI. Fig.2(a) shows the PI block diagram



Fig. 2. (a) Top-level architecture of the proposed PI; and (b) the proposed constant slope-and-swing PVT-independent ramp generator with ramp-rate tracking, replica bias, and I/C integrator.

comprising four main components: (i) a digital-to-analog converter (DAC), (ii) a CML-to-CMOS converter, (iii) a ramp generator with ramp-rate tracking circuits, and (iv) a threshold-based comparator along with a 50% on-time generator. Three identical copies of the PI have been implemented for delay compensation in the 2-element TTD SSP. The following sub-sections expand on the design of each block.

1) Resistor-divider-based DAC: The 5-bit digital input code to the PI  $D_{IN}<5:1>$  is applied to a low-power resistor-divider-based DAC generating different reference voltages from 0.55V to 0.65V based on the input codes. The DAC unit resistor,  $R$ , has been chosen to alleviate mismatch issues as it affects both the PI linearity and the DAC power consumption.

2) CML-to-CMOS converter: The external clock input to the PI at 2GHz is first divided by 2 and then applied to a CML-to-CMOS converter to generate 4 quadrature rail-to-rail outputs CLKI, CLKIB, CLKQ, and CLKQB. External input  $D_{IN}<7:6>$  selects one of the above four clock outputs and allows it to go through the ramp generator.

3) Ramp generator with ramp-rate tracking: The ramp generator in Fig.2(b) integrates the capacitor current based on the input clock phase. When the clock is high, the ramp has a positive slope and vice versa with the clock low. External bias voltage  $V_{BIAS}$  determines the magnitude of the integrating current to the capacitor. This integration generates a linear ramp waveform with a reduced peak-to-peak swing ( $V_{ramp}=0.2V$ ) to ensure minimum dynamic power loss. However, mismatches in both the mirrored current for integration and the capacitor in the ramp generator directly



Fig. 3. Ramp generator output with and without ramp-rate tracking circuit across process corners and temperatures.

affect the PI linearity, significantly reducing its signal swing. As a result, the comparator is unable to resolve the difference between the PI ramp and the DAC output. In addition, PVT variations of the current and the capacitor in the ramp generator result in a ramp slope (defined as the ratio of the integrating current to the capacitance) that is not constant. To alleviate variations in the ramp slope across PVT, we propose ramp-rate tracking with replica bias, which ensures that the integrating current is a function of the capacitor.

a) Ramp-rate tracking consists of a switched-capacitor-based voltage-to-current (V2I) converter and a non-overlapping clock generator, as shown in Fig.2(b). The feedback loop in the ramp-rate tracking sets the  $V_{FB}$  node to  $V_{BIAS}$ , which creates the current  $I (=V_{BIAS} \cdot f \cdot C_{BIAS})$  through the V2I converter, where  $V_{BIAS}$  is the bias voltage and  $f$  is the sampling frequency. Because the switched capacitor acts as the load impedance ( $=1/(f \cdot C_{BIAS})$ ) of the V2I converter,  $I$  is proportional to  $C_{BIAS}$ . The non-overlapping clock phases  $\phi_1$  and  $\phi_2$  for the switched-capacitor circuit are generated by dividing one of the unused phases of the CML-to-CMOS converter output. A divider ratio of 32 is selected to relax the bandwidth requirements of the V2I converter. An additional low-pass RC filter is added to the PMOS mirror to mitigate the effect of high-frequency noise on the integrating current.

b) Replica bias, shown in Fig.2(b), sets the DC level of the ramp signal to  $V_{CTRL}$  ( $=0.6V$ ). The feedback loop in the replica bias consumes only  $20\mu W$ , mostly in the opamp, whose sub-threshold input-pair devices help limit the overall power consumption of the PI.

c) I/C integrator multiplies the V2I current from the ramp-rate tracking by  $M$  and generates the integrating current of  $M \cdot V_{BIAS} \cdot f \cdot C_{BIAS}$ . Therefore, the ramp slope,  $m$ , is expressed as:

$$m = \frac{M \cdot V_{BIAS} \cdot f \cdot C_{BIAS}}{C} \quad (1)$$

From the above equation,  $m$  is proportional to  $C_{BIAS}/C$ . As both  $C_{BIAS}$  and  $C$  are implemented using metal-insulator-metal (MIM) capacitors, their ratio stays constant and the effect of process and temperature variations is alleviated. This is verified by simulation across multiple corners, as shown in Fig.3, with the nominal swing of  $V_{ramp}$  set to 0.2V. Without the ramp-rate tracking,  $V_{ramp}$  varies significantly from 70mV (SS, 100°C, max. MIM capacitor) to 335mV (FF, 0°C, min. MIM capacitor). With the



Fig. 4. (a) PI timing diagram for  $DIN<7:6>=00$ , and (b) quadrant selection for the ramp signal input to the comparator using different  $DIN<7:6>$ .

ramp-rate tracking enabled, the signal swing varies by only 35mV from 183mV to 218mV across the same corners.
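The cancellation expressed by Eq. (1) can be illustrated with a minimal numerical sketch. It assumes that the ramp rises for half a clock period, so that the swing equals the slope divided by twice the clock frequency. The component values (`V_BIAS`, `C_BIAS`, `C_INT`) and the derived mirror ratio `M` are illustrative assumptions, not design values from this work. The model is idealized: it captures only the common scaling of the two MIM capacitors, not the residual 183mV to 218mV spread seen in simulation.

```python
def ramp_slope(m_ratio, v_bias, f_sc, c_bias, c_int):
    """Eq. (1): ramp slope in V/s with ramp-rate tracking."""
    return m_ratio * v_bias * f_sc * c_bias / c_int


def ramp_swing(slope, f_clk):
    """Peak-to-peak swing, assuming the ramp rises for half a clock period."""
    return slope / (2.0 * f_clk)


F_CLK = 1e9          # PI clock frequency (from the text)
F_SC = F_CLK / 32    # switched-capacitor clock after the divide-by-32
V_BIAS = 0.6         # assumed bias voltage in volts (hypothetical)
C_BIAS = 100e-15     # assumed bias capacitor in farads (hypothetical)
C_INT = 100e-15      # assumed integrating capacitor in farads (hypothetical)

# Mirror ratio chosen so that the nominal swing is 0.2 V (illustrative only).
M = 0.2 * 2.0 * F_CLK * C_INT / (V_BIAS * F_SC * C_BIAS)
I_FIXED = M * V_BIAS * F_SC * C_BIAS   # nominal current when nothing tracks C

rows = []
for scale in (0.8, 1.0, 1.2):          # common MIM capacitance variation
    tracked = ramp_swing(
        ramp_slope(M, V_BIAS, F_SC, C_BIAS * scale, C_INT * scale), F_CLK)
    untracked = ramp_swing(I_FIXED / (C_INT * scale), F_CLK)
    rows.append((scale, tracked, untracked))
    print(f"C scale {scale:.1f}: tracked {tracked * 1e3:.1f} mV, "
          f"untracked {untracked * 1e3:.1f} mV")
```

In this sketch the tracked swing stays at the nominal value because the capacitor scaling appears in both the numerator and the denominator of Eq. (1), whereas a fixed integrating current makes the swing inversely proportional to the capacitance.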

4) Threshold-based comparator with 50% on-time generator: The comparator compares the DAC output with the ramp signal (RAMP) and produces a pulse-width modulated (PWM) output with variable delay based on the digital codes into the DAC. The comparator only operates in the mid-region of the ramp signal (from 550mV to 650mV) as shown in Fig. 4(a) to reduce its power consumption. To ensure that the entire ramp range is covered while maintaining required comparator performance, we select two differential phases out of the four phases of the CML-to-CMOS output for one quadrant operation. Fig.4(b) shows the selected quadrant for each clock phase. Thus, CLKI is selected for the 1st quadrant ( $45^\circ$ - $135^\circ$ ), CLKQ for the 2nd quadrant ( $135^\circ$ - $225^\circ$ ), CLKIB for the 3rd quadrant ( $225^\circ$ - $315^\circ$ ), and CLKQB for the 4th quadrant ( $315^\circ$ - $45^\circ$ ).
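The mapping from the 7-bit code to an ideal delay can be sketched as follows. The sketch assumes that  $D_{IN}<7:6>$  steps through the four quadrants in increasing order and that the 5-bit DAC code divides each 90° quadrant into 32 equal steps; only the assignment of code 00 to CLKI is stated in the text, so the ordering of the remaining quadrants is an assumption. The function name `ideal_delay_ps` is hypothetical.

```python
T_CLK_PS = 1000.0                     # period of the 1 GHz PI output
QUADRANT_PS = T_CLK_PS / 4            # each selected clock phase covers 90 degrees
STEPS_PER_QUADRANT = 32               # 5-bit DAC code D_IN<5:1>
LSB_PS = QUADRANT_PS / STEPS_PER_QUADRANT


def ideal_delay_ps(code):
    """Ideal delay for a 7-bit code (assumed ordering of the quadrants)."""
    if not 0 <= code < 128:
        raise ValueError("code must be a 7-bit value")
    quadrant = code >> 5              # upper two bits: clock-phase selection
    fine = code & 0x1F                # lower five bits: DAC reference level
    return quadrant * QUADRANT_PS + fine * LSB_PS


print(LSB_PS)                         # ideal step size in ps
print(ideal_delay_ps(0), ideal_delay_ps(32), ideal_delay_ps(127))
```

Under these assumptions the ideal step is 1ns/128, which is consistent with the roughly 8psec resolution quoted in the abstract.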

The PWM output of the comparator does not have a fixed pulse width. To obtain a constant 50% on-time, as shown in Fig.2(a), the comparator PWM output is applied to a divide-by-2 D-flip-flop (DFF), which generates a 50% duty cycle irrespective of the input on-time. The DFF output and its delayed version ( $T_d = 500\text{ps}$ ) are applied to an XOR gate to generate a 50% duty-cycle clock with a 1ns period and a delay corresponding to each code. Fig.4(a) shows the timing diagram of the proposed PI for one-quadrant operation using CLKI ( $DIN<7:6>=00$ ), illustrating the ramp generator output (RAMP), DAC output, PWM comparator output, and the final PI output.
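The behavior of the 50% on-time generator can be checked with a small behavioral model. This is a sketch on an idealized 1ps time grid, not the circuit implementation: the DFF is modeled as a toggle on each comparator rising edge, the delay element as an ideal 500ps shift, and the function name `on_time_generator` and the example edge position of 137ps are hypothetical.

```python
def on_time_generator(rise_edges_ps, t_end_ps, t_d_ps=500):
    """XOR of a divide-by-2 toggle output and its delayed copy (1 ps grid)."""
    edges = set(rise_edges_ps)
    q = 0
    dff = []
    for t in range(t_end_ps):
        if t in edges:
            q ^= 1                    # DFF toggles on each comparator rising edge
        dff.append(q)
    delayed = [0] * t_d_ps + dff[:-t_d_ps]   # ideal delay of t_d_ps
    return [a ^ b for a, b in zip(dff, delayed)]


# Comparator rising edges once per 1 ns period, offset by an example delay.
# Only the rising edges enter the model, so the PWM pulse width is irrelevant.
edges = [137 + 1000 * n for n in range(8)]
out = on_time_generator(edges, 8000)

window = out[2000:6000]               # four full output periods
duty = sum(window) / len(window)
rises = [t for t in range(1, len(out)) if out[t] and not out[t - 1]]
print(duty, rises[:3])
```

In this model the output rising edges follow the comparator rising edges, and the output is high for 500ps in every 1ns period, regardless of the width of the comparator pulse.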

## III. MEASUREMENT RESULTS

Measurement results are reported for two chips: (a) standalone PI test chip for performance characterization, and (b) TTD-SSP chip for PI application in discrete-time beamformer.

#### A. Standalone PI measurement

As shown in Fig.5, three PIs have been fabricated on the standalone chip in TSMC 65nm, with each PI occupying an active area of  $0.02\text{mm}^2$ . The bottom two PIs are used to check the output mismatch and to test internal DC voltages inside the PI. The output of the top PI is XORed with the reference clock output ( $CLK_{REF}$ ), which varies for each quadrant, as shown in Fig.6. As a result, the XOR gate output repeats after each



Fig. 5. Chip photo of the standalone 3-PIs ( $0.02\text{mm}^2$  active area/PI).



Fig. 6. PI linearity plot (standalone PI measurement setup is shown).



Fig. 7. PI DNL and INL versus code.

quadrant. The XOR gate output passes through an RC low-pass filter. The XOR gate acts as a phase detector, and the low-pass-filtered DC output gives the phase delay for each code. The PI linearity plot is shown for one quadrant in Fig.6, where the average DC output varies linearly with the PI code.
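The linear relation between the delay and the filtered XOR output can be sketched with the textbook model of an XOR phase detector driven by two 50% duty-cycle clocks of equal frequency. The 1V logic swing and the function name `xor_pd_average` are assumptions for illustration; the sketch ignores the per-quadrant change of  $CLK_{REF}$  and any filter ripple.

```python
def xor_pd_average(delay_ps, period_ps=1000.0, vdd=1.0):
    """Average XOR output for two 50% duty-cycle clocks offset by delay_ps."""
    phase = (delay_ps % period_ps) / period_ps    # delay as a fraction of a period
    # The XOR output is high while the two clocks differ: the high fraction
    # rises linearly up to half a period and then falls back symmetrically.
    duty = 2.0 * phase if phase <= 0.5 else 2.0 * (1.0 - phase)
    return vdd * duty


for d in (0.0, 125.0, 250.0, 375.0):
    print(d, xor_pd_average(d))
```

Within half a period the averaged output is proportional to the delay, which is why a linear PI produces a straight line in a plot of DC output versus code.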

Based on the phase detector output, we measured the linearity performance shown in Fig.7, with a maximum DNL of 0.2 LSB and a maximum INL of 0.3 LSB at a PI frequency of 1GHz.
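For reference, DNL and INL can be extracted from one phase-detector reading per code as sketched below. The text does not state whether an endpoint or a best-fit line was used, so the endpoint-fit definition here is an assumption, and the input values are synthetic numbers chosen only to exercise the function, not measured data.

```python
def dnl_inl(measured):
    """Endpoint-fit DNL and INL in LSB from one measured value per code."""
    n = len(measured)
    lsb = (measured[-1] - measured[0]) / (n - 1)   # average step size
    dnl = [(measured[i + 1] - measured[i]) / lsb - 1.0 for i in range(n - 1)]
    inl = [(measured[i] - measured[0]) / lsb - i for i in range(n)]
    return dnl, inl


# Synthetic example data (arbitrary units), NOT measurements from this work.
example = [0.0, 1.1, 2.0, 2.9, 4.0]
dnl, inl = dnl_inl(example)
print(max(abs(x) for x in dnl), max(abs(x) for x in inl))
```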

The PI consumes  $345\mu\text{W}$ , which is divided among the comparator ( $100\mu\text{W}$ ), DAC ( $40\mu\text{W}$ ), ramp generator ( $130\mu\text{W}$ ), 50% on-time generator ( $50\mu\text{W}$ ), and biasing ( $25\mu\text{W}$ ). Provided that the comparator input-referred noise does not limit the resolution, the PI delay resolution can be enhanced further by increasing the number of DAC bits without compromising the power consumption, unlike in inverter-based or CML-based PIs.
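The power budget can be cross-checked with a few lines of arithmetic. The sketch uses only the per-block figures quoted above and assumes that the energy-efficiency metric is the total power divided by the operating frequency.

```python
# Per-block power in microwatts, as quoted in the text.
power_uw = {
    "comparator": 100,
    "DAC": 40,
    "ramp generator": 130,
    "50% on-time generator": 50,
    "biasing": 25,
}

total_uw = sum(power_uw.values())
freq_ghz = 1.0                                   # PI operating frequency
efficiency_mw_per_ghz = total_uw / 1000.0 / freq_ghz

for block, p in power_uw.items():
    print(f"{block}: {p} uW ({100.0 * p / total_uw:.1f}% of total)")
print(total_uw, efficiency_mw_per_ghz)
```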

#### B. PI measurement in 2-channel TTD SSP IC

PIs play a fundamental role in the delay compensation circuits of switched-capacitor array (SCA) based beamformers. The PI delay compensates the inter-element delay of the incident wave. In this work, a two-channel SCA-based TTD SSP is implemented, which requires three identical PIs for delay compensation in the SCA, as shown in Fig.8. We note that the inter-element delay of adjacent



Fig. 8. Two-channel switched-cap array (SCA) baseband (BB) TTD SSP diagram with 3 copies of the proposed PI.



Fig. 9. Measured beamforming gain using the 2-channel TTD SSP varying the PI codes ((inset 1) chip micrograph occupying 1.09mm<sup>2</sup>).

antennas at RF is mathematically equivalent to the delay after down-conversion accompanied by a phase shift [3]. The antenna spacing and the phase shift at the LOs are emulated using the Xilinx ZCU111 RFSoC and applied to the 2-channel TTD SSP adapted from [6]. The proposed PIs are then used to precisely compensate the inter-element delay after down-conversion. For PI functionality characterization, a single-tone input at 400MHz is applied to the TTD SSP using the ZCU111 in multi-tier synchronization mode. The PI control words are applied using a control word generator through the on-chip serial-to-parallel interface. The beamforming gain is measured by varying the control word of only the second PI over the first quadrant while the other PI control words are fixed to zero. As shown in Fig.9, a beamforming gain of 2.5dB (ideally 3dB) is observed. The beamforming gain deviates from the expected value by around 0.5dB due to on-chip mismatch and parasitic capacitance. Bypassing the PIs and applying the same external delays results in a measured deviation of 0.2dB. Though imperfections exist, the measured results confirm the beamforming operation using the proposed PI. Additionally, since the PIs in the BB SSP do not require fast switching as long as the delay-setting behavior is deterministic and can be captured by a look-up table, the imperfections can be remedied in practice using a one-time calibration at start-up.
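The dependence of the beamforming gain on delay alignment can be illustrated with an idealized two-channel model. The sketch assumes two equal-amplitude tones whose only impairment is a residual delay error, and it normalizes the combined power by the number of channels so that perfect alignment gives the ideal 3dB figure; this normalization and the function name `two_channel_gain_db` are assumptions. The model does not include the mismatch and parasitic effects to which the measured deviation is attributed.

```python
import cmath
import math


def two_channel_gain_db(residual_delay_ps, tone_mhz=400.0):
    """Normalized gain of two summed equal-amplitude tones with a delay error."""
    theta = 2.0 * math.pi * tone_mhz * 1e6 * residual_delay_ps * 1e-12
    combined = 1.0 + cmath.exp(-1j * theta)      # sum of the two channel phasors
    return 10.0 * math.log10(abs(combined) ** 2 / 2.0)


for err_ps in (0.0, 50.0, 100.0, 200.0):
    print(err_ps, round(two_channel_gain_db(err_ps), 2))
```

In this idealized model the gain peaks when the PI code cancels the inter-element delay and falls off as the residual delay grows, which is the behavior swept in the measurement by varying the control word of one PI.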

Table 1. Comparison with the state of the art.

| Metrics                 | This Work    | [1]   | [7]     | [8]   | [9]    | [5]   |
|-------------------------|--------------|-------|---------|-------|--------|-------|
| Arch.                   | CSS          | CML   | Inv.    | EIP   | Charge | CML   |
| Tech. (nm)              | <b>65</b>    | 65    | 65      | 28    | 40     | 28    |
| Res. (bits)             | <b>7</b>     | 4     | 8       | 11    | 12     | 7     |
| Freq. (GHz)             | <b>1</b>     | 0.5   | 0.1-1.5 | 2     | 2.5    | 2-11  |
| DNL (LSB)               | <b>0.2</b>   | 0.23  | 0.52    | 1.25  | 0.87   | 0.5   |
| INL (LSB)               | <b>0.3</b>   | 0.6   | 1.33    | 4.9   | 3.83   | 1.1   |
| P (mW)                  | <b>0.345</b> | 1.508 | 4.3     | 19.8  | 7.1    | 18.6  |
| PE (mW/GHz)             | <b>0.345</b> | 3.16  | 2.866   | 9.9   | 2.84   | 1.69  |
| Supply (V)              | <b>1</b>     | 1     | 1.2     | 1.1   | 1.1    | 1     |
| Area (mm <sup>2</sup> ) | <b>0.021</b> | N/A   | 0.06    | 0.009 | N/A    | 0.022 |

Table 1 compares the PI performance with the prior art [1], [5], [7], [8], [9]. The proposed PI exhibits a state-of-the-art energy efficiency of 0.345 mW/GHz and the lowest DNL (0.2 LSB) and INL (0.3 LSB) among the compared works.

## IV. CONCLUSIONS

Massive antenna arrays demand calibrated time references at ultra-low power. This paper presents a low-power, process- and temperature-independent constant slope-and-swing ramp-based PI in 65nm CMOS. The proposed PI includes a ramp-rate tracking circuit that ensures a constant-slope, constant-swing ramp across PVT, alleviating linearity concerns. The PI performance and functionality have been measured in a standalone IC as well as in a 2-channel TTD SSP, successfully demonstrating the delay shift required to compensate for the inter-element delay.

## ACKNOWLEDGMENTS

This work was supported in part by the U.S. NSF under Grants CNS-1955306, EECS-2030159, NSF CAREER 194688, JCATI, and the CDADIC.

## REFERENCES

- [1] R. K. Nandwana *et al.*, “A calibration-free fractional-n ring pll using hybrid phase/current-mode phase interpolation method,” *IEEE Journal of Solid-State Circuits*, vol. 50, no. 4, pp. 882–895, 2015.
- [2] M. Mansuri *et al.*, “A scalable 0.128–1 tb/s, 0.8–2.6 pj/bit, 64-lane parallel i/o in 32-nm cmos,” *IEEE Journal of Solid-State Circuits*, vol. 48, no. 12, pp. 3229–3242, 2013.
- [3] E. Ghaderi *et al.*, “Four-element wide modulated bandwidth mimo receiver with >35-db interference cancellation,” *IEEE Transactions on Microwave Theory and Techniques*, vol. 68, no. 9, pp. 3930–3941, 2020.
- [4] G. Wu *et al.*, “A 1–16 gbs all-digital clock and data recovery with a wideband high-linearity phase interpolator,” *IEEE Transactions on Very Large Scale Integr. (VLSI) Syst.*, vol. 24, no. 7, pp. 2511–2520, 2016.
- [5] E. Monaco *et al.*, “A 2–11 ghz 7-bit high-linearity phase rotator based on wideband injection-locking multi-phase generation for high-speed serial links in 28-nm cmos fdsoi,” *IEEE Journal of Solid-State Circuits*, vol. 52, no. 7, pp. 1739–1752, 2017.
- [6] C.-C. Lin *et al.*, “A 4-element 800mhz-bw 29mw true-time-delay spatial signal processor enabling fast beam-training with data communications,” in *ESSCIRC 2021 - IEEE 47th European Solid State Circuits Conference (ESSCIRC)*, 2021, pp. 287–290.
- [7] M.-S. Chen, A. A. Hafez, and C.-K. K. Yang, “A 0.1–1.5 ghz 8-bit inverter-based digital-to-phase converter using harmonic rejection,” *IEEE Journal of Solid-State Circuits*, vol. 48, no. 11, pp. 2681–2692, 2013.
- [8] S. Sievert *et al.*, “A 2 ghz 244 fs-resolution 1.2 ps-peak-inl edge interpolator-based digital-to-time converter in 28 nm cmos,” *IEEE Journal of Solid-State Circuits*, vol. 51, no. 12, pp. 2992–3004, 2016.
- [9] H. Jiang *et al.*, “A 12-bit 0.5–2.4-ghz 0.65°-peak-inl parasitic-insensitive digital-to-phase converter,” *IEEE Microwave and Wireless Components Letters*, vol. 31, no. 1, pp. 68–71, 2021.