

# A Single-Clock-Phase Sense Amplifier Architecture with 9x Smaller Clock-to-Q Delay Compared to the StrongARM & 6.3dB Lower Noise Compared to Double-Tail

Xiaohui Lin, Mohamed Megahed, and Tejasvi Anand

Oregon State University, Corvallis, OR 97331, USA, email: linxiao@oregonstate.edu

## Abstract

A single-clock-phase sense amplifier architecture with strong regeneration is proposed. Designed in 22nm FinFET, the proposed architecture has a 9x smaller $t_{CQ}$ delay than the conventional StrongARM latch and 6.3dB lower input-referred noise than the Double-Tail architecture for similar input transistor size and power consumption.

**Keywords:** Sense amplifier, slicer, clocked comparator.

## Introduction

A fast decision-making sense amplifier (SA) is a critical component in high-speed wireline links and high-speed analog-to-digital converters. Decision speed is measured by the delay from the triggering clock edge to a valid output, namely the clock-to-Q delay, $t_{CQ}$. The conventional StrongARM latch (Fig. 1) suffers from a large $t_{CQ}$ because (a) it requires two phases of common-mode (CM) discharge before entering the regeneration phase [1], and (b) the input pair degenerates the cross-coupled NMOS pair [2], which prolongs the regeneration time. Improved two-stage sense amplifier architectures such as the Double-Tail comparator [3] and the Elzakker latch [4] can reduce $t_{CQ}$. However, this reduction comes at the cost of using two clock phases instead of one, which imposes stricter timing requirements on the clock phases. Moreover, as discussed in [5], the Double-Tail latch suffers from higher input-referred noise. In view of these limitations, we propose a single-stage, single-clock-phase sense amplifier architecture (Fig. 1) whose average $t_{CQ}$, measured across 5 chips, is 9x smaller than that of the StrongARM latch and similar to or smaller than that of the double-clock-phase architectures [3][4]. The measured input-referred noise of the proposed architecture is comparable to that of the StrongARM and 6.3dB lower than that of the Double-Tail architecture.

## Comparison with the Conventional StrongARM Latch

Fig. 2 compares the operation of the proposed sense amplifier with that of the StrongARM. The input voltages to the sense amplifiers are VIP and VIN, with VIP > VIN. During the precharge phase (CLK=0), the nodes $A_S$, $B_S$, $VOP_P$ and $VON_P$ in the StrongARM are reset to $V_{DD}$. Once CLK goes high at time $t=0s$, the output node voltages $VOP_P$ and $VON_P$ discharge immediately towards $V_{DD}/2$, so the proposed latch enters the regeneration phase earlier than the StrongARM latch. The second reason for the smaller $t_{CQ}$ of the proposed latch is that more cross-coupled pairs are active during regeneration, with no degeneration of the NMOS cross-coupled transistors ($M4/M4'$) and fewer stacked transistors between $V_{DD}$ and ground than in the StrongARM. Since $VOP_P$ and $VON_P$ reach $V_{DD}/2$ faster due to charge sharing, the proposed latch enters the regeneration region with all three cross-coupled pairs strongly turned on, which gives stronger positive feedback.
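The link between stronger positive feedback and a shorter regeneration time can be illustrated with the textbook first-order latch model, in which the differential output grows as $\Delta V(t) = \Delta V_0 e^{t/\tau}$ with $\tau = C/g_m$. This is a generic approximation, not the authors' model. The capacitance, transconductance, and target voltage below are hypothetical, and the assumption that each active cross-coupled pair contributes equal transconductance is made only to show the trend.

```python
import math


def regeneration_time(c_load, gm_total, dv_start, dv_end):
    """Time for an ideal cross-coupled latch to amplify dv_start to dv_end.

    First-order model: dV(t) = dv_start * exp(t / tau), tau = c_load / gm_total.
    """
    tau = c_load / gm_total
    return tau * math.log(dv_end / dv_start)


# Hypothetical values (not taken from the paper).
C_LOAD = 2e-15          # output-node capacitance in farads
GM_ONE_PAIR = 200e-6    # transconductance of one cross-coupled pair in siemens
DV_START = 10e-3        # initial differential voltage in volts
DV_END = 0.475          # differential voltage treated as a valid decision

t_one = regeneration_time(C_LOAD, GM_ONE_PAIR, DV_START, DV_END)
t_three = regeneration_time(C_LOAD, 3 * GM_ONE_PAIR, DV_START, DV_END)
print(f"one active pair:    {t_one * 1e12:.1f} ps")
print(f"three active pairs: {t_three * 1e12:.1f} ps")
```

In this idealized model the regeneration time scales inversely with the total transconductance in the feedback loop, which is consistent with the qualitative argument above.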

## The Operation of the Proposed Latch

Fig. 3 shows the detailed operation of the proposed architecture in three phases. Phase I is the precharge phase (CLK=0), in which nodes X and Y are discharged to ground and $VOP_P$ and $VON_P$ are precharged to $V_{DD}$. Once CLK goes high, the latch enters Phase II, the charge-share-dominant regeneration phase. The voltages on nodes $VOP_P$ and $VON_P$ start to drop from $V_{DD}$ due to (a) charge sharing between the parasitic capacitors $C_{ON}$, $C_{OP}$, $C_X$, and $C_Y$, (b) the strong discharge path through transistors $M4$ and $M4'$, and (c) a further discharge path through the input pair $M1$ and $M1'$, which is modeled by their common-mode current ($I_{CM}$). When $VOP_P$ and $VON_P$ reach $V_{DD}-V_{TH}$, $M5$ and $M5'$ are activated and the latch enters the strong regeneration phase, which can be divided into two time regions, $t_2$ and $t_3$. During $t_2$, the output capacitors $C_{ON}$ and $C_{OP}$ are discharged by currents $ID_{N_L}$ and $ID_{N_R}$ through $M2$, $M4$ and $M2'$, $M4'$, respectively. As a result, the voltages on nodes $VOP_P$ and $VON_P$ keep falling until they reach the trip point $V_{DD}/2$. During $t_3$, the strong regeneration pushes $VOP_P$ and $VON_P$ in opposite directions until they reach $V_{DD}$ and GND, respectively.
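As a rough illustration of the charge-sharing contribution in Phase II, the sketch below applies charge conservation to one precharged output capacitor and one discharged internal-node capacitor. It ignores the discharge currents through $M4$/$M4'$ and the input pair, so it isolates only mechanism (a). The capacitance values, and the choice of pairing $C_{OP}$ with $C_X$, are assumptions made for illustration; the actual pairing follows the schematic in Fig. 3.

```python
def charge_share_voltage(v_out0, c_out, v_x0, c_x):
    """Common voltage after two capacitors are connected (charge conservation)."""
    return (c_out * v_out0 + c_x * v_x0) / (c_out + c_x)


VDD = 0.95        # supply voltage used in the measurements, in volts
C_OP = 2.0e-15    # hypothetical output-node capacitance, in farads
C_X = 1.0e-15     # hypothetical internal-node capacitance, in farads

# Output node starts at VDD (precharged); node X starts at ground.
v_after = charge_share_voltage(VDD, C_OP, 0.0, C_X)
print(f"Output node after ideal charge sharing: {v_after:.3f} V")
```

A larger internal-node capacitance relative to the output capacitance pulls the output further below $V_{DD}$ in this idealized picture, moving it closer to the trip point before the discharge currents are considered.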

## Measurement Results

Four sense amplifier architectures were designed in 22nm FinFET for an apples-to-apples comparison and were sized to consume similar power with the same input transistor size. A delay line consisting of M inverters and N sense amplifiers was designed to measure $t_{CQ}$ (Fig. 4). The output of the delay line ($\phi_{OUT}$) is a narrow pulse, shown in the measured output (Fig. 4), whose width equals $N \times t_{CQ} + M \times t_{INV}$, where $t_{INV}$ is the delay of one inverter. Two such delay lines with different numbers of inverters (M) and sense amplifiers (N) were used to measure two different pulse widths. Solving the resulting two linear equations in the two unknowns $t_{CQ}$ and $t_{INV}$ yields an estimate of $t_{CQ}$. Operating at 0.95V and measured across 5 chips, with an input difference $\Delta V_{IN}$ of 10mV and an energy efficiency of 14.2 fJ/decision, the proposed sense amplifier has an average $t_{CQ}$ of 99.3ps at an input common mode of $V_{CM} = 0.35V$, which is 9x smaller than that of the StrongARM latch and 3.3x smaller than that of the Elzakker SA [4] (Fig. 5). Measurements of $t_{CQ}$ versus $V_{CM}$ at $\Delta V_{IN}$ = 50mV, and of the sensitivity of $t_{CQ}$ to $\Delta V_{IN}$ at various $V_{CM}$, show that the $t_{CQ}$ of the proposed architecture changes by only 11.6ps over a 100mV change in $\Delta V_{IN}$ (10mV-110mV) at $V_{CM}=0.35V$, the smallest sensitivity among the architectures implemented on the same chip. The noise was measured using 20,000 samples per measurement point at $F_{CLK}=40MHz$ (Fig. 6). The proposed architecture achieves 6.3dB lower input-referred noise than the Double-Tail architecture and noise similar to that of the StrongARM at $V_{CM}=0.35V$. The die micrograph is shown in Fig. 4. The proposed architecture achieves an energy-delay product of 1241.1 fJ·ps, the smallest among the architectures compared in Table I.
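The two-delay-line extraction of $t_{CQ}$ can be sketched as a 2x2 linear solve. The function below is a minimal illustration, not the authors' measurement script. The values of N and M and the 10ps inverter delay are hypothetical, since the paper does not state them, and the pulse widths are synthesized from the reported 99.3ps $t_{CQ}$ only to show that the solve recovers it.

```python
def solve_tcq(n1, m1, w1, n2, m2, w2):
    """Solve N*t_cq + M*t_inv = W for two delay lines using Cramer's rule.

    n*, m*: number of sense amplifiers and inverters in each delay line.
    w*:     measured output pulse width of each delay line.
    """
    det = n1 * m2 - n2 * m1
    if det == 0:
        raise ValueError("the two delay lines do not give independent equations")
    t_cq = (w1 * m2 - w2 * m1) / det
    t_inv = (n1 * w2 - n2 * w1) / det
    return t_cq, t_inv


# Hypothetical delay-line configurations and inverter delay (not from the paper).
N1, M1 = 4, 20
N2, M2 = 8, 10
T_CQ_TRUE, T_INV_TRUE = 99.3, 10.0   # picoseconds

# Synthesize the two pulse widths that such lines would produce.
width1 = N1 * T_CQ_TRUE + M1 * T_INV_TRUE
width2 = N2 * T_CQ_TRUE + M2 * T_INV_TRUE

t_cq, t_inv = solve_tcq(N1, M1, width1, N2, M2, width2)
print(f"t_CQ = {t_cq:.1f} ps, t_INV = {t_inv:.1f} ps")
```

The determinant check reflects a practical constraint: the two lines must differ in their ratio of N to M, otherwise the two pulse widths carry the same information and $t_{CQ}$ cannot be separated from $t_{INV}$.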

**Acknowledgements** This work was supported by NSF grant number 2006571. We thank Intel for 22nm FinFET tape-out support.

## References

- [1] A. Abidi, CICC, 2014.
- [2] B. Razavi, SSC Magazine, Spring 2015.
- [3] D. Schinkel, ISSCC, 2007.
- [4] M. Van Elzakker, ISSCC, 2008.
- [5] H. Xu, TCAS-I, Aug. 2019.



Fig. 1: Conventional sense amplifier architectures and the proposed sense amplifier architecture with single clock phase and one stage.



Fig. 3: The operation and the modeling of the proposed sense amplifier architecture.



Fig. 4: Block diagram of the proposed delay line structure when VIP>VIN for t<sub>CQ</sub> measurement and its measured output waveform. Die micrograph.



Fig. 2: Comparison between the StrongARM and the proposed architecture when VIP>VIN and CLK goes high with the associated timing diagram.



Fig. 5: Measured t<sub>CQ</sub> delay of 5 chips vs. input amplitude ΔV<sub>IN</sub> of four architectures and its zoom-in view (top). Measured t<sub>CQ</sub> delay vs. input V<sub>CM</sub> (bottom left), and t<sub>CQ</sub> variation per 10mV of ΔV<sub>IN</sub> vs. input V<sub>CM</sub> of four architectures (bottom right).

Table I: Measured performance & comparison with the prior-art.

| Technology | This Work | | | | Prior Art | | |
|---|---|---|---|---|---|---|---|
| | Proposed Architecture | StrongARM [2] | Double-Tail Latch [3] | Elzakker Latch [4] | Schinkel, ISSCC'07 [3] | Goll, ISSCC'09 | Bindra, JSSC'18 |
| Supply Voltage [V] | 0.95 | 0.95 | 0.95 | 0.95 | 1.2 | 1.2 | 1.2 |
| Circuit Topology | Charge-Sharing Strong Regeneration | Cross Coupled Pairs | Double-Tail | Modified-Latch | Double-Tail | Modified-Latch | Dynamic-Bias |
| Number of Stages | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| CLK Phases Required | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
| V<sub>CM</sub> [V] | 0.35 / 0.45 | 0.35 / 0.45 | 0.35+ / 0.45+ | 0.35 / 0.45 | 0.6 | 0.6 | 0.6 |
| ΔV<sub>IN</sub> [mV] | 10 | 10 | 30 | 10 | 10 | 18.6 | 10 |
| t<sub>CQ</sub> Delay [ps] | 99.3 / 87.4 | 917.5 / 687.5 | 144.9+ / - | 333.1 / No Output | 35° (relative delay) | 64 | 92° (relative delay) |
| t<sub>CQ</sub> variation vs. ΔV<sub>IN</sub> variation [ps/10mV] | 1.00 / 1.97 | 37.60 / 7.66 | 6.64 / - | 18.80 / 2.86 | 8.10* | 6.67* | 38.38* |
| # of chips measured | | | | | | | |
| Clock Frequency (F<sub>CLK</sub>) | 1.2 GHz | 1.2 GHz | 1.2 GHz | 1.2 GHz | 1 GHz | 7 GHz | 25 MHz |
| Energy per Decision [fJ] | 14.2 | 15.7 | 14.2 | 17.7 | 113.0 | 185.7 | 34.0 |
| Input Referred RMS Noise [mV] | 1.096 | 1.223 | 2.260 | 1.880 | 1.5 | - | 0.4 |
| Input Pair Size (W/L) | | | | | - | - | - |
| Energy Delay Product [fJ·ps] | 1410.1 / 1241.1 | 14404.8 / 10793.8 | 20576.7+ / - | 5895.9 / - | 3955 | 11884.8 | 31280 |
| Area [μm<sup>2</sup>] | 10.8 | 8.6 | 11.1 | 12.4 | 82.5 | 319.48 | 125 |

\*: Values read from the graphs

+: Not functioning @ ΔV<sub>IN</sub>=10mV, measurement result taken @ ΔV<sub>IN</sub>=30mV
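The energy-delay product and the speed ratios quoted in the text can be re-derived from the Table I entries measured at ΔV<sub>IN</sub> = 10mV and V<sub>CM</sub> = 0.35V. The sketch below simply multiplies energy per decision by t<sub>CQ</sub>; the dictionary names are illustrative, and the Double-Tail entry is left out because it was measured at ΔV<sub>IN</sub> = 30mV.

```python
# Table I values at V_CM = 0.35 V and dV_IN = 10 mV.
ENERGY_FJ = {"Proposed": 14.2, "StrongARM": 15.7, "Elzakker": 17.7}
TCQ_PS = {"Proposed": 99.3, "StrongARM": 917.5, "Elzakker": 333.1}

# Energy-delay product in fJ*ps: energy per decision times clock-to-Q delay.
edp = {name: ENERGY_FJ[name] * TCQ_PS[name] for name in ENERGY_FJ}

# t_CQ of each architecture relative to the proposed one.
speedup = {name: TCQ_PS[name] / TCQ_PS["Proposed"] for name in TCQ_PS}

for name in ENERGY_FJ:
    print(f"{name:10s} EDP = {edp[name]:9.1f} fJ*ps, "
          f"t_CQ ratio = {speedup[name]:.2f}x")
```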



Fig. 6: Measured input referred cumulative noise distribution (marker) and fitting to Gaussian distribution of four architectures (line) at V<sub>CM</sub> = 0.35V. Measured input referred noise vs. input V<sub>CM</sub> of four architectures.
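One common way to extract an input-referred RMS noise from a measured cumulative decision distribution is to map the fraction of "1" decisions through the inverse Gaussian CDF and fit a straight line, whose slope is $1/\sigma$. The paper states only that the data are fitted to a Gaussian distribution, so this probit-plus-least-squares procedure is an assumed method, and the sweep points and 0.5mV offset below are synthetic. The final lines convert the Table I RMS noise values of the Double-Tail and proposed architectures into a dB ratio.

```python
import math
from statistics import NormalDist


def fit_input_noise(v_in_mv, p_one):
    """Estimate (offset, sigma) from input voltages and fractions of 1 decisions.

    Uses z = inv_cdf(p) = (v - offset) / sigma and an ordinary least-squares line.
    Points with p equal to 0 or 1 are skipped because inv_cdf is undefined there.
    """
    std_normal = NormalDist()
    pts = [(v, std_normal.inv_cdf(p)) for v, p in zip(v_in_mv, p_one) if 0.0 < p < 1.0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_z = sum(z for _, z in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxz = sum((x - mean_x) * (z - mean_z) for x, z in pts)
    slope = sxz / sxx
    sigma = 1.0 / slope
    offset = mean_x - mean_z / slope
    return offset, sigma


# Synthetic sweep (assumption): ideal Gaussian with 0.5 mV offset, 1.096 mV sigma.
truth = NormalDist(mu=0.5, sigma=1.096)
sweep_mv = [-2.0 + 0.5 * k for k in range(11)]
fractions = [truth.cdf(v) for v in sweep_mv]

offset_mv, sigma_mv = fit_input_noise(sweep_mv, fractions)
print(f"offset = {offset_mv:.3f} mV, sigma = {sigma_mv:.3f} mV")

# Noise ratio in dB between two RMS voltages from Table I.
noise_ratio_db = 20.0 * math.log10(2.260 / 1.096)
print(f"Double-Tail vs. proposed: {noise_ratio_db:.1f} dB")
```

With real data, each fraction would come from counting decisions over the 20,000 samples taken at that input voltage, so points near 0 or 1 carry more statistical uncertainty than those near 0.5.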