# A 4-Channel 60Gb/s Aggregate & 0.167pJ/bit/dB Transceiver Achieving 33% Higher Pin-Efficiency Over Differential Using 4b6w Balanced Coding

Xiaohui Lin<sup>1</sup>, Mohamed Megahed<sup>1,2</sup>, Bella Bose<sup>1</sup>, Tejasvi Anand<sup>1</sup>

<sup>1</sup>Oregon State University, Corvallis, USA, <sup>2</sup>MediaTek, Irvine, USA

Email: linxiao@oregonstate.edu

Abstract—This paper presents a transceiver that uses balanced coding to achieve higher pin efficiency without compromising the signal-to-noise ratio (SNR). The proposed transceiver architecture utilizes a 4b6w (4 bits over 6 wires) balanced encoding technique to achieve a pin efficiency of 0.66, which is 33% higher than fully differential signaling. The Rx front end is designed to differentially sense the six lanes to achieve 3dB higher SNR than single-ended signaling. The proposed transceiver achieves a 60Gb/s aggregate data rate (15Gb/s/ch) and compensates for 15dB channel loss at 7.5GHz with an FoM of 0.167pJ/bit/dB in 16nm FinFET.

Keywords—4b6w balanced coding, wireline, I/O pin efficiency

## I. INTRODUCTION

Evolving artificial intelligence and high-performance computing applications demand higher aggregate data rates between SoC-to-SoC and SoC-to-module. However, achieving higher aggregate data throughput is often constrained by the limited number of I/O pins available in the package. Therefore, there is a need to increase the per-pin data rate in wireline links. There are three primary ways to achieve this goal: (1) increase the baud rate by clocking the data at high frequency, (2) use higher order modulation such as PAM-4/6/8, and (3) increase the per-pin data rate using spatial encoding schemes. This paper investigates the third option to increase the data throughput by leveraging spatial encoding to increase the pin efficiency of the transceiver.

Pin efficiency is defined as the number of bits transmitted divided by the total number of wires used. In single-ended (SE) signaling, each wire carries one bit at a time, giving a high pin efficiency of 1, whereas fully differential signaling (DS) uses two wires per bit and has a low pin efficiency of 0.5. While single-ended signaling has a high pin efficiency, it suffers from high reference noise when the data is sampled at the receiver, because the noise on the data is uncorrelated with the noise on the reference voltage used to sample it. Another drawback of single-ended signaling is high simultaneous switching noise: when the single-ended output drivers switch, they cause ripple on the transmitter supply voltage, and this noise appears on the transmitted signal. Finally, single-ended signaling has a 3dB lower SNR than fully differential signaling.



Fig. 1. Comparison of pin efficiency vs SNR of various signaling techniques.

These drawbacks of single-ended signaling limit the maximum achievable data rates and result in high BER.
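The pin-efficiency definition above can be made concrete with a short calculation. The following is a minimal Python sketch; the function name `pin_efficiency` is illustrative and not taken from the paper.

```python
from fractions import Fraction

def pin_efficiency(bits: int, wires: int) -> Fraction:
    """Bits transmitted per symbol time divided by the number of wires used."""
    return Fraction(bits, wires)

single_ended = pin_efficiency(1, 1)   # 1 bit on 1 wire
differential = pin_efficiency(1, 2)   # 1 bit on 2 wires
balanced_4b6w = pin_efficiency(4, 6)  # 4 bits on 6 wires

# Relative gain of 4b6w over fully differential signaling.
gain_over_ds = balanced_4b6w / differential - 1

print(float(single_ended), float(differential), round(float(balanced_4b6w), 3))
print(f"4b6w gain over differential: {float(gain_over_ds):.1%}")
```

The exact ratio for 4b6w is 2/3, which the paper quotes as 0.66, and the gain over differential signaling is exactly one third.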

Balanced coding [1]–[3], in which each codeword contains an equal number of ones and zeros in a binary code setting, has been employed by researchers to encode data in the spatial domain and offer high pin efficiency (>0.5). Researchers have also used other forms of spatial encoding to increase pin efficiency and data throughput, such as CNRZ-5 [4], BASES [5], and 4B4W4L [6]. However, despite these pioneering innovations, the prior implementations have lower SNR due to (a) the use of multiple voltage levels and (b) limitations of the receiver architecture.

In view of these limitations, this paper presents a 4-channel wireline transceiver using 4b6w balanced coding. The proposed transceiver architecture achieves a better trade-off between pin efficiency and SNR than the state of the art, as shown in Fig. 1. While this work only demonstrates balanced coding with two-voltage-level signaling, the concept can be extended to a multi-level balanced coded transceiver to further increase the pin efficiency. There are four main highlights in this work: (a) 33% higher pin efficiency compared to fully differential signaling; (b) 3dB higher SNR compared to single-ended signaling; (c) less sensitivity to transmitter supply noise due to referenceless sampling; (d) a separate tunable delay line on each lane for de-skew among the 6 lanes. The rest of the paper is organized as follows: Section II discusses the 4b6w balanced coding and the proposed transceiver architecture, Section III presents the measurement results, and Section IV concludes this work.

## II. 4B6W BALANCED CODING & PROPOSED ARCHITECTURE

### A. 4b6w Balanced Coding

Fig. 2 shows the 4b6w encoding table. The four-bit information X is mapped to a six-bit codeword Y, consisting of the four-bit encoded information D and a two-bit check symbol C. The mapping is performed such that the output Y has an equal number of 1s and 0s, and therefore the output is balanced. The encoding logic can be understood as follows. The check symbol C is decided based on the input information bits X. When the number of 1s in X is 1, 2, or 3, D stays the same as X and the check symbol C is set to 11, 01, or 00, respectively. When the number of 1s in X is 0 or 4, the first two bits of X are complemented in D and the check symbol C is set to 10. Therefore, the encoded six-bit output always has three 1s and three 0s. The encoded 6 bits are transmitted simultaneously on 6 parallel wires, as shown in Fig. 3. It can be observed from the figure that at every sampling instant there is an equal number of 1s and 0s across the six lanes, which results in balanced data transmission.
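The encoding rule described above can be sketched in a few lines of Python. This is an illustrative software model only, assuming bits are represented as character strings; the function name `encode_4b6w` is hypothetical and does not describe the hardware encoder.

```python
def encode_4b6w(x: str) -> str:
    """Map a 4-bit string X to a balanced 6-bit codeword Y = D + C."""
    if len(x) != 4 or set(x) - {"0", "1"}:
        raise ValueError("X must be a 4-character string of 0s and 1s")
    ones = x.count("1")
    if ones in (0, 4):
        # Complement the first two bits of X; the check symbol is 10.
        d = "".join("1" if b == "0" else "0" for b in x[:2]) + x[2:]
        c = "10"
    else:
        # D is X unchanged; C depends on the number of 1s in X.
        d = x
        c = {1: "11", 2: "01", 3: "00"}[ones]
    return d + c

codebook = {format(i, "04b"): encode_4b6w(format(i, "04b")) for i in range(16)}

# Every codeword is balanced (three 1s) and all 16 codewords are distinct.
assert all(y.count("1") == 3 for y in codebook.values())
assert len(set(codebook.values())) == 16

print(codebook["0000"], codebook["0110"], codebook["1111"])
```

The three printed codewords correspond to symbols 1, 7, and 16 of the encoding table in Fig. 2.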

### B. Proposed Transceiver Architecture

Fig. 4 shows the proposed 6-lane balanced coded 4b6w transceiver architecture. At the transmitter, the balanced encoder transforms 4 streams of 32-bit PRBS data into 6 streams of 32-bit balanced coded data. This balanced encoder is followed by 32:1 serialization, a 3-tap FFE with 3-bit resolution, and lastly an SST-based output driver. In the proposed transceiver, all 6 lanes have to be sampled simultaneously at the receiver to correctly decode the balanced encoded data. Therefore, skew between the lanes due to path mismatch on the package and PCB traces must be removed. To remove the skew, the transmitter has programmable delay lines on the clock path to each of the 6 transmitter slices, so that each of the 6 lanes can be de-skewed individually.

| Symbol | Input X = $(x_1, x_2, x_3, x_4)$ | Output Y = (D, C), Encoded Info Bits D = $(d_1, d_2, d_3, d_4)$ | Output Y = (D, C), Check Symbol C = $(c_1, c_2)$ |
|--------|----------------------------------|------------------------------------------------------------------|---------------------------------------------------|
| 1      | 0000                             | 1100                                                             | 10                                                |
| 2      | 0001                             | 0001                                                             | 11                                                |
| 3      | 0010                             | 0010                                                             | 11                                                |
| 4      | 0011                             | 0011                                                             | 01                                                |
| 5      | 0100                             | 0100                                                             | 11                                                |
| 6      | 0101                             | 0101                                                             | 01                                                |
| 7      | 0110                             | 0110                                                             | 01                                                |
| 8      | 0111                             | 0111                                                             | 00                                                |
| 9      | 1000                             | 1000                                                             | 11                                                |
| 10     | 1001                             | 1001                                                             | 01                                                |
| 11     | 1010                             | 1010                                                             | 01                                                |
| 12     | 1011                             | 1011                                                             | 00                                                |
| 13     | 1100                             | 1100                                                             | 01                                                |
| 14     | 1101                             | 1101                                                             | 00                                                |
| 15     | 1110                             | 1110                                                             | 00                                                |
| 16     | 1111                             | 0011                                                             | 10                                                |

Fig. 2. The 4b6w balanced encoding table.

Fig. 3. The 4b6w encoded data on a 6-lane bus.

Fig. 4. Proposed 6-lane transceiver architecture.

At the receiver, the $50\Omega$ terminations on all 6 lanes are shorted together due to the pseudo-differential nature of balanced encoded data, which helps to reduce the transmitter power consumption. The receiver has a half-rate architecture; the even and odd paths each consist of 15 samplers followed by 1:4 demultiplexers and a balanced coding decoder. The 15 samplers at the receiver are all reference-free two-input samplers. For a 6-lane communication system, comparing every pair of lanes requires ${}^6C_2$ = 15 samplers at the receiver for balanced decoding. In other words, all possible differential comparisons among the 6 lanes are performed using the 15 samplers.
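The sampler count can be checked by enumerating all lane pairs. In the Python sketch below, the comparator labels C1 to C15 are assigned in lexicographic pair order; this labeling is an assumption, chosen because it is consistent with the paper's statement that C5 compares wire 1 and wire 6.

```python
from itertools import combinations
from math import comb

NUM_WIRES = 6

# Assumed labeling: C1 = (wire 1, wire 2), C2 = (wire 1, wire 3), ..., C15 = (wire 5, wire 6).
comparators = {
    f"C{k}": pair
    for k, pair in enumerate(combinations(range(1, NUM_WIRES + 1), 2), start=1)
}

assert len(comparators) == comb(NUM_WIRES, 2) == 15
print(comparators["C5"], comparators["C15"])
```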

### C. 4b6w Decoding with Differential Sensing

Fig. 5 shows the proposed decoding logic and an example. In the decoding table, each of the 15 columns represents one sampler, and the value in each row corresponds to the differential signal seen at the input of that sampler for each of the 16 possible transmitted symbols. To simplify the explanation of the decoding logic, it is assumed that transmitted logic 1 and 0 correspond to +1V and 0V on the wire. Since the data at the receiver is sampled differentially, the differential input to a sampler can be 0V, -1V, or 1V. The proposed decoding looks only at the samplers that have non-equal inputs, as highlighted in green in the decoder table. The decoding is explained with the help of two examples:

Example 1: Assume that the received data is sampled on the 15 comparators such that comparator C5, which takes the difference of wire 1 and wire 6, sees a 1V input difference, and comparators C7, C9, C11, C13, and C15 see 1V, 1V, -1V, -1V, and 1V input differences, respectively. Accordingly, the output of comparator C5 is logic 1, and the outputs of comparators C7, C9, C11, C13, and C15 are logic 1, 1, 0, 0, and 1, respectively. With this, the decoder gives output 0000, corresponding to the first row of the decoding table. Because a comparator whose two inputs are equal can become metastable and give an invalid output, the proposed decoding logic assigns a don't-care condition (the greyed box X in Fig. 5) to those equal-input cases, so that only non-equal inputs are used by the decoding logic to make the decision.

Fig. 5. Proposed 4b6w decoding table.

Example 2: Assume that comparators C4, C7, and C8 see a -1V input difference, and C12, C14, and C15 see a 1V input difference. Based on the last row of the decoding table in Fig. 5, the output is decoded as 1111.
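The comparator inputs quoted in the two examples can be reproduced from the codewords for 0000 and 1111 in Fig. 2. The Python sketch below assumes that wires 1 to 6 carry $d_1, d_2, d_3, d_4, c_1, c_2$ in that order and that comparators are labeled in lexicographic pair order; both are assumptions that happen to match the values given in the examples, not details confirmed by the paper.

```python
from itertools import combinations

def comparator_inputs(codeword: str, swing: float = 1.0) -> dict:
    """Differential input (in volts) seen by each of the 15 two-input samplers."""
    volts = [swing * int(b) for b in codeword]  # logic 1 -> +1 V, logic 0 -> 0 V
    pairs = combinations(range(6), 2)           # assumed order: C1=(w1,w2) ... C15=(w5,w6)
    return {f"C{k}": volts[i] - volts[j] for k, (i, j) in enumerate(pairs, start=1)}

ex1 = comparator_inputs("110010")  # codeword for X = 0000 (first row of Fig. 2)
ex2 = comparator_inputs("001110")  # codeword for X = 1111 (last row of Fig. 2)

print([ex1[c] for c in ("C5", "C7", "C9", "C11", "C13", "C15")])
print([ex2[c] for c in ("C4", "C7", "C8", "C12", "C14", "C15")])
```

Under these assumptions, any balanced codeword gives nine comparators a non-zero input (three 1s against three 0s) and six comparators a 0V input, which is why the don't-care entries are needed.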

Care was taken during the design to add sufficient resynchronization with positive feedback flip-flops so that the metastability of any comparator does not propagate to the decoding logic. Using the above-mentioned decoding process, each 4-bit symbol can be uniquely decoded when all the chosen comparators (colored in green) see either a 1V or a -1V input signal swing, as listed in the table in Fig. 5. The proposed transceiver can be reconfigured to single-ended operation for comparison purposes.
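To illustrate why equal-input comparators need not corrupt the result, the following Python sketch decodes with a simple win-counting rule. This is not the decoding table of Fig. 5, whose entries are not reproduced here; it is a substitute rule chosen for illustration. Modeling a metastable comparator as a random bit, the wire order, and the comparator order are all assumptions.

```python
import random
from itertools import combinations

PAIRS = list(combinations(range(6), 2))  # assumed comparator order C1..C15

def sample(codeword: str, rng: random.Random) -> list:
    """Model 15 comparator decisions; equal inputs resolve to a random bit."""
    bits = [int(b) for b in codeword]
    return [rng.randint(0, 1) if bits[i] == bits[j] else int(bits[i] > bits[j])
            for i, j in PAIRS]

def decode_by_wins(decisions: list) -> str:
    """Recover X by counting how many comparisons each wire wins."""
    wins = [0] * 6
    for (i, j), out in zip(PAIRS, decisions):
        wins[i if out else j] += 1
    # A wire at logic 1 always beats the three wires at logic 0, so it wins
    # at least 3 times; a wire at logic 0 can win at most 2 (ties only).
    y = "".join("1" if w >= 3 else "0" for w in wins)
    d, c = y[:4], y[4:]
    if c == "10":  # X was 0000 or 1111: undo the complement of the first two bits
        d = "".join("1" if b == "0" else "0" for b in d[:2]) + d[2:]
    return d

rng = random.Random(0)
for x, y in {"0000": "110010", "0110": "011001", "1111": "001110"}.items():
    assert all(decode_by_wins(sample(y, rng)) == x for _ in range(100))
```

In this model the decoded symbol does not depend on how the six equal-input comparators resolve, which mirrors the role of the don't-care entries in the proposed decoder.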

## III. MEASUREMENT RESULTS

The proposed transceiver architecture was designed and fabricated in a 16nm FinFET process. Fig. 6 shows the die micrograph and its zoom-in layout, the measured channel frequency response, and the transceiver measurement setup. The signal generator (Anritsu MG3697C) provides the external clock to the Tx board and Rx board through a power divider. An oscilloscope (Tektronix DSA8300) and a BER tester (Tektronix BSA286CL) were used to measure the near-end eye diagram and the BER at the receiver output, respectively. The Tx board and Rx board are connected through a communication channel consisting of a BGA package, PCB traces, coaxial cables, and connectors. The communication channel has a measured 15dB channel loss at the Nyquist frequency of 7.5GHz.

Fig. 6. (a) Die micrograph; (b) channel loss profile; (c) measurement setup.

Fig. 7 shows the measured near-end transmitted eye of the PRBS-7 4b6w balanced coded data on all 6 lanes at 15Gb/s/lane. The vertical eye opening is 190mV and the horizontal eye opening is 53.2ps. Fig. 8 shows the de-skew range of the delay line, which was obtained by transmitting a repetitive 1100 data pattern and measuring the relative delay in the Tx output between the maximum and minimum delay settings of the programmable delay line. The de-skew range is measured to be 15ps, which was sufficient to match all 6 lanes in our measurement setup.
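For context, the measured timing numbers can be expressed in unit intervals (UI). The Python sketch below is simple arithmetic on the values reported above; the variable names are illustrative.

```python
DATA_RATE_GBPS = 15.0               # per-lane data rate from the measurements
ui_ps = 1e3 / DATA_RATE_GBPS        # unit interval in picoseconds
nyquist_ghz = DATA_RATE_GBPS / 2    # NRZ Nyquist frequency

eye_width_ui = 53.2 / ui_ps         # measured horizontal eye opening
deskew_range_ui = 15.0 / ui_ps      # measured delay-line de-skew range

print(f"UI = {ui_ps:.2f} ps, Nyquist = {nyquist_ghz} GHz")
print(f"eye width = {eye_width_ui:.3f} UI, de-skew range = {deskew_range_ui:.3f} UI")
```

At 15Gb/s the unit interval is about 66.7ps, so the 53.2ps near-end eye opening corresponds to roughly 0.8 UI and the 15ps de-skew range to roughly 0.225 UI.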

Fig. 9 shows the measured bathtub plot for all 4 bits (4 channels) obtained after decoding the 4b6w encoded data at the receiver. The horizontal opening for all 4 channels at a BER below $10^{-12}$ is 0.48 UI. The resilience of 4b6w to transmitter supply noise is analyzed by injecting random noise (max $\sigma$=12.3mV) on the supply of the Tx SST output driver and measuring the lowest Rx BER for 4b6w, as shown in Fig. 10. For comparison, the transceiver was configured in single-ended (SE) signaling mode and the effect of Tx SST output driver supply noise was measured. As the injected supply noise increases, the proposed transceiver with 4b6w encoding demonstrates good noise immunity, with its lowest BER staying below $10^{-12}$, while the lowest BER of single-ended signaling degrades.

Fig. 7. Measured six-lane near-end Tx eye of 4b6w encoded data.

Fig. 8. Measured deskew range of the delay line.

Fig. 9. Measured TRx BER bathtub for all four channels with PRBS7 at 15Gbps/pin.

Fig. 10. Lowest BER for proposed 4b6w TRx and single-ended (SE) with injected noise in the SST driver supply at 15Gbps/pin.

Fig. 11. Measured in-situ eye for (a) proposed transceiver and (b) single-ended (SE).

Fig. 11 shows the measured in-situ eye for 4b6w and single-ended signaling at 15Gbps at the receiver. The 4b6w eye height is 125mV, approximately twice that of single-ended (65mV) for the same channel loss (15dB), which demonstrates the 3dB higher SNR of 4b6w over single-ended signaling. Table I shows the performance summary of the proposed transceiver and its comparison with state-of-the-art transceivers. The proposed transceiver achieves a pin efficiency of 0.66, which is 33% more than fully differential, and an SNR of 3dB over single-ended. It achieves an FoM of 0.167pJ/bit/dB, which is one of the lowest among prior spatial encoding-based wireline transceivers. Fig. 12 shows the power breakdown from the measurements.

TABLE I. COMPARISON WITH STATE-OF-THE-ART TRANSCEIVERS

|                            | This Work         | Kim TCAS-II'23     | Lee JSSC'16       | Tajalli JSSC'20          | McCollough ISSCC'21 | Seong        | Erett         |
|----------------------------|-------------------|--------------------|-------------------|--------------------------|---------------------|--------------|---------------|
| Link Type                  | Encoded, Balanced | Encoded, Balanced  | Encoded, Balanced | Encoded, Chord Signaling | Encoded, Unbalanced | Conventional | Conventional  |
| Technology (nm)            | 16                | 28                 | 65                | 16                       | 7                   | 4            | 16            |
| Encoding & Signaling       | 4b6w              | 4b4w SE BC (BASES) | 4b4w 4-level BC   | 5b6w CNRZ-5              | 6b7w SE NRZ         | SE NRZ       | DS NRZ        |
| Pin Efficiency             | 0.66              | 1                  | 1                 | 0.83                     | 0.85                | 1            | 0.5           |
| SNR over SE [dB]           | 3                 | -6.02              | -12.55            | 0.51                     | 0                   | 0            | 3             |
| Lanes                      | 6                 | 4                  | 4                 | 6                        | 12                  | -            | 16            |
| Data Rate [Gb/s/lane]      | 15                | 16                 | 8                 | 20.83                    | 40                  | 32           | 56            |
| Aggregate Data Rate [Gb/s] | 60                | 64                 | 32                | 1000                     | 480                 | -            | 448           |
| Loss [dB]                  | 15                | 5.42               | 9                 | 6                        | 8                   | 3.9          | 8             |
| Tx Power [mW]              | 50.16             | 101.4              | 58.88             | 448.8\*                  | -                   | -            | 304.6         |
| Rx Power [mW]              | 100.57            | 100                | 17.6              | 438.6\*                  | -                   | -            | 479.4         |
| Supply [V]                 | 0.75, 0.9, 0.95   | 1.0, 1.2           | 1.3 2             |                          | -                   | -            | 0.9, 1.2, 1.8 |
| Tx+Rx Efficiency [pJ/bit]  | <b>2.51</b>       | 3.15               | 2.39              | 1.02                     | 1.7                 | 0.44         | 2.25          |
| FOM\*\* [pJ/bit/dB]        | <b>0.167</b>      | 0.581              | 0.265             | 0.17                     | 0.212               | 0.113        | 0.281         |

\*: calculated by subtracting PLL power; \*\*: energy per bit divided by channel loss.

Fig. 12. Measured TRx power breakdown.
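The energy efficiency and FoM reported for this work follow directly from the measured power, aggregate data rate, and channel loss. The short Python sketch below reproduces the arithmetic; the variable names are illustrative.

```python
TX_POWER_MW = 50.16
RX_POWER_MW = 100.57
AGGREGATE_GBPS = 60.0    # 4 channels x 15 Gb/s
CHANNEL_LOSS_DB = 15.0   # at 7.5 GHz

# mW divided by Gb/s gives pJ/bit.
energy_pj_per_bit = (TX_POWER_MW + RX_POWER_MW) / AGGREGATE_GBPS
# FoM as defined in the table footnote: energy per bit divided by channel loss.
fom_pj_per_bit_db = energy_pj_per_bit / CHANNEL_LOSS_DB

print(f"{energy_pj_per_bit:.2f} pJ/bit, {fom_pj_per_bit_db:.4f} pJ/bit/dB")
```

The result is about 2.51pJ/bit and 0.1675pJ/bit/dB, in line with the 2.51pJ/bit and 0.167pJ/bit/dB quoted in the table.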

## IV. CONCLUSIONS

This paper presented a 4b6w balanced coding transceiver with 3dB SNR gain over single-ended signaling. The measured transceiver achieves a 60Gb/s aggregate data rate with an FoM of 0.167pJ/bit/dB at 15dB channel loss in 16nm FinFET.

## ACKNOWLEDGMENT

This project is funded by NSF grant number 2006571 and CDADIC. We thank Tektronix for instrument support and Intel for 16nm FinFET tape-out support.

## REFERENCES

- [1] L. G. Tallini and B. Bose, "Design of balanced and constant weight codes for VLSI systems," *IEEE Transactions on Computers*, vol. 47, no. 5, pp. 556-572, May 1998.
- [2] D. Oh et al., "Pseudo-differential signaling scheme based on 4b/6b multiwire code," *IEEE-EPEP Electrical Performance of Electronic Packaging*, pp. 29-32, 2008.
- [3] A. Abbasfar, "Generalized differential vector signaling," *International Conference on Communications*, pp. 1-5, 2009.
- [4] A. Tajalli et al., "A 1.02-pJ/b 20.83-Gb/s/Wire USR transceiver using CNRZ-5 in 16-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 4, pp. 1108-1123, April 2020.
- [5] H. Kim, H. Seo, H. Kim, C. Yoo, J. Han, "A 16-Gb/s/wire 4-wire short-haul transceiver with balanced single-ended signaling (BASES) in 28-nm CMOS," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 70, no. 8, pp. 2799-2803, Aug. 2023.
- [6] S-M. Lee et al., "A single-ended parallel transceiver with four-bit four-wire four-level balanced coding for the point-to-point DRAM interface," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 8, pp. 1890-1901, Aug. 2016.