

# An Offset-Canceling Approximate-DFT Beamforming Architecture for Wireless Transceivers

Haixiang Zhao & Soumyajit Mandal  
 Department of Electrical Engineering  
 and Computer Science  
 Case Western Reserve University  
 Cleveland, OH 44106  
 Email: {hxz355,sxm833}@case.edu

Viduneth Ariyarathna  
 Arjuna Madanayake  
 Department of Electrical  
 and Computer Engineering  
 University of Akron  
 Akron, OH 44325  
 Email: bpv1@zips.uakron.edu  
 and arjuna@uakron.edu

Renato J. Cintra  
 Dept. de Estatística, UFPE, Brazil  
 University of Calgary, AB, Canada  
 Email: rjdsc@stat.ufpe.org

**Abstract**—We describe a current-mode multi-beam beamforming approach for 5G wireless applications based on a low-complexity approximate DFT (a-DFT). Dynamic current mirrors are used to cancel errors in the current copying and scaling operations required to realize a-DFT matrices, resulting in an accurate and scalable architecture. The circuit design for an 8-point a-DFT has been validated with transistor-level simulations in the UMC 0.18 $\mu\text{m}$ CMOS process.

## I. INTRODUCTION

Millimeter-wave (mmW) frequency bands are expected to be an important component of emerging 5G wireless communication networks. Radio propagation is much more directional at these frequencies than in today's sub-6 GHz bands, and path losses are also much higher [1]. Such losses can be compensated by increasing antenna gain; in particular, by using antenna arrays to form multiple sharp steerable beams under algorithmic control. Thus, wideband multi-beam beamforming is necessary in order for mmW-based wireless networks to deliver the significant improvements in channel capacities and data rates required for 5G applications [1]–[3].

In earlier work [4], we proposed analog current-mode circuits that generate multiple baseband beams using approximate discrete Fourier transforms (a-DFTs). Such a-DFT matrices are designed to closely approximate the exact DFT while using only small integer coefficients (e.g., 0, $\pm 1$, and $\pm 2$), which allows them to be realized efficiently with current mirrors. However, recent unpublished simulation and experimental results have shown that threshold-voltage mismatches between the transistors in each mirror cause significant errors in the resulting beam patterns. In particular, the peak magnitudes and sidelobe levels of the beams were found to be sensitive to these mismatches, while the beam directions were relatively unaffected.

In this paper, we use an offset-canceling technique to significantly improve the accuracy of each mirror, and thus of the resulting beam patterns. In particular, we use dynamic current mirrors (DCMs) to improve the accuracy of both the current copying and splitting operations.



Fig. 1. Overall architecture of the current-mode multi-beam beamformer.

## II. MULTI-BEAM BEAMFORMING ARCHITECTURE

The overall structure of the proposed multi-beam beamforming system is shown in Fig. 1, which depicts the signal flow in receive mode. The first stage is a uniformly spaced linear antenna array with $N$ elements that receives incoming waves at an incident angle $\theta$ and outputs the signal set $\mathbf{a} = [a_j]_{N \times 1}$. The second stage is a signal pre-processing array in which each element contains an RF chain with the structure shown on the right of Fig. 1. In each chain, a quadrature hybrid (QH) block splits the incoming signal into I- and Q-channel components, giving $\mathbf{a}_I = [a_{I,j}]_{N \times 1}$ and $\mathbf{a}_Q = [a_{Q,j}]_{N \times 1}$. A low-noise amplifier (LNA) and a voltage-to-current converter (V/I) then convert these signals first into the voltages $\mathbf{v}_I = [v_{I,j}]_{N \times 1}$, $\mathbf{v}_Q = [v_{Q,j}]_{N \times 1}$ and then into the currents $\mathbf{i}_I = [i_{I,j}]_{N \times 1}$, $\mathbf{i}_Q = [i_{Q,j}]_{N \times 1}$, where $i$ and $v$ denote current and voltage, respectively.

The third stage, which is the focus of this paper, is a current-mode $N$-point a-DFT array. Since the a-DFT of a real signal is complex, this stage computes the real outputs $\mathbf{I}_{QRe} = [\mathrm{Re}\{I_{Q,j}\}]_{N \times 1}$, $\mathbf{I}_{IRe} = [\mathrm{Re}\{I_{I,j}\}]_{N \times 1}$ and the imaginary outputs $\mathbf{I}_{QIm} = [\mathrm{Im}\{I_{Q,j}\}]_{N \times 1}$, $\mathbf{I}_{IIm} = [\mathrm{Im}\{I_{I,j}\}]_{N \times 1}$ for the I and Q channels separately. These four sets of outputs give $\text{a-DFT}\{\mathbf{i}_Q\} = \mathbf{I}_{QRe} + j \cdot \mathbf{I}_{QIm}$ and $\text{a-DFT}\{\mathbf{i}_I\} = \mathbf{I}_{IRe} + j \cdot \mathbf{I}_{IIm}$. Later circuits, which are not discussed here, use these two sets to process the beamformed signal.

### A. The Spatial a-DFT Operation

The a-DFT operation is the most critical part of the multi-beam former. An  $N$ -point DFT can be described by an  $N \times N$  linear transform (LT). Performing the DFT operation across the signals obtained from a uniformly-spaced linear antenna array produces an orthogonal set of  $N$  beams with each corresponding to a particular direction, thus resulting in a multi-beam beamformer. Earlier analog realizations of the DFT [5]–[8] have suffered from errors in realizing the irrational coefficient values (e.g.,  $1/\sqrt{2}$ ), which degrade performance as  $N$  increases. The proposed a-DFT approach solves this issue by using LT approximation methods to realize DFT-like transforms with small integer coefficients [9]–[11]. In this paper, we will focus on the  $N = 8$  case, although our results are easily generalized to larger  $N$ . A suitable 8-point a-DFT matrix, which satisfies  $\mathbf{I}_{8 \times 1} = \hat{\mathbf{F}}_8 \cdot \mathbf{i}_{8 \times 1}$ , is given by [4]

$$\hat{\mathbf{F}}_8 = \begin{bmatrix} 2 & 2 & 2 & 2 & 2 & 2 & 2 & 2 \\ 2 & 1-j & -2j & -1-j & -2 & -1+j & 2j & 1+j \\ 2 & -2j & -2 & 2j & 2 & -2j & -2 & 2j \\ 2 & -1-j & 2j & 1-j & -2 & 1+j & -2j & -1+j \\ 2 & -2 & 2 & -2 & 2 & -2 & 2 & -2 \\ 2 & -1+j & -2j & 1+j & -2 & 1-j & 2j & -1-j \\ 2 & 2j & -2 & -2j & 2 & 2j & -2 & -2j \\ 2 & 1+j & 2j & -1+j & -2 & -1-j & -2j & 1-j \end{bmatrix}. \quad (1)$$

Several properties of $\hat{\mathbf{F}}_8$ simplify its circuit realization. First, both its real and imaginary parts have coefficients limited to $\{0, \pm 1, \pm 2\}$, which greatly simplifies circuit design. Second, in every column the absolute values of the real and imaginary coefficients sum to $2 \times 8 = 16$. Thus, every input signal (e.g., $i_{I,j}$ or $i_{Q,j}$ in Fig. 1) must be copied 16 times, with a coefficient of magnitude 2 consuming two of these copies. In the next subsection, we discuss how to generate these copies accurately in order to perform the a-DFT operation in circuit form.
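These properties can be checked numerically. The Python sketch below (the names `ENTRY`, `adft_matrix`, and `unit_copies_per_column` are illustrative, not part of the original design flow) builds $\hat{\mathbf{F}}_8$ from the observation that each entry of (1) depends only on $m = kn \bmod 8$: it equals $2w^m$ with $w = e^{-j2\pi/8}$ for even $m$, and $2w^m$ with the factor $\sqrt{2}$ replaced by 1 for odd $m$.

```python
import cmath
import math

# Entry of Eq. (1) as a function of m = (k * n) mod 8, read off the matrix.
# Even m: exactly 2 * w**m with w = exp(-j*2*pi/8).
# Odd m: 2 * w**m = sqrt(2) * (+-1 +- j), with the factor sqrt(2) replaced by 1.
ENTRY = [2, 1 - 1j, -2j, -1 - 1j, -2, -1 + 1j, 2j, 1 + 1j]


def adft_matrix():
    """Return the 8x8 a-DFT matrix of Eq. (1) as a list of rows (row k, column n)."""
    return [[ENTRY[(k * n) % 8] for n in range(8)] for k in range(8)]


def unit_copies_per_column(f):
    """Sum of |Re| + |Im| down each column: unit current copies needed per input."""
    size = len(f)
    return [sum(abs(f[k][c].real) + abs(f[k][c].imag) for k in range(size))
            for c in range(size)]


F = adft_matrix()
W = cmath.exp(-2j * math.pi / 8)

# Every real and imaginary coefficient lies in {0, +-1, +-2}.
coeffs_ok = all(e.real in (-2, -1, 0, 1, 2) and e.imag in (-2, -1, 0, 1, 2)
                for row in F for e in row)
copies = unit_copies_per_column(F)
# Largest entry-wise deviation from the exactly scaled DFT 2 * w**(k*n).
max_err = max(abs(2 * W ** ((k * n) % 8) - F[k][n])
              for k in range(8) for n in range(8))

print("coefficients in {0, +-1, +-2}:", coeffs_ok)
print("unit copies per input:", copies)
print("max |2*DFT - a-DFT| = %.4f" % max_err)
```

In this sketch the largest deviation from $2\mathbf{F}_8$ is $2 - \sqrt{2} \approx 0.59$, occurring at the odd-$m$ entries, and every column needs exactly 16 unit copies.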

### B. Current-Mode Circuit Realization

As shown in Fig. 2, we realize the a-DFT with a current-mode circuit that contains two main structures. The first is a current copier array that generates 16 copies of every input signal $i_j$. The second is an operation circuit that performs addition and subtraction using a single current mirror fed with copies of the $i_j$. Fig. 2 shows how the real part of the second entry of the output vector is calculated. Since $I_{2+,Re} = 2 \cdot i_1 + i_2 + i_8$ and $I_{2-,Re} = i_4 + 2 \cdot i_5 + i_6$, applying Kirchhoff's current law (KCL) at the output node gives $I_{2,Re} = I_{2+,Re} - I_{2-,Re}$, which is part of the a-DFT. Multiple similar circuits share the same current copier array to realize the entire a-DFT operation.
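As an idealized numerical illustration of this KCL decomposition, the sketch below (a hypothetical behavioral model: bias currents are ignored and the inputs are treated as ideal real signal currents; `node_currents` is an illustrative name) splits every output of (1) into the non-negative sums delivered to the positive and negative nodes, then checks that their differences reproduce the complex a-DFT. With inputs $i_1, \dots, i_8 = 1, \dots, 8$, the second output gives $I_{2+,Re} = 2 i_1 + i_2 + i_8 = 12$ and $I_{2-,Re} = i_4 + 2 i_5 + i_6 = 20$.

```python
ENTRY = [2, 1 - 1j, -2j, -1 - 1j, -2, -1 + 1j, 2j, 1 + 1j]  # Eq. (1), indexed by (k*n) % 8
F = [[ENTRY[(k * n) % 8] for n in range(8)] for k in range(8)]


def node_currents(row, x):
    """Split one a-DFT output into the four non-negative current sums
    (I+_Re, I-_Re, I+_Im, I-_Im) fed to the positive and negative
    summing nodes by the copier array."""
    p_re = sum(max(e.real, 0) * xi for e, xi in zip(row, x))
    m_re = sum(max(-e.real, 0) * xi for e, xi in zip(row, x))
    p_im = sum(max(e.imag, 0) * xi for e, xi in zip(row, x))
    m_im = sum(max(-e.imag, 0) * xi for e, xi in zip(row, x))
    return p_re, m_re, p_im, m_im


# Hypothetical real input currents i_1 ... i_8 (arbitrary units).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

outputs = []
for row in F:
    p_re, m_re, p_im, m_im = node_currents(row, x)
    # KCL at the output nodes: the negative sum is mirrored and subtracted.
    outputs.append((p_re - m_re) + 1j * (p_im - m_im))

p_re, m_re, _, _ = node_currents(F[1], x)
print("I_2+,Re =", p_re, " I_2-,Re =", m_re, " I_2,Re =", p_re - m_re)
```

Because each output only needs non-negative sums plus one subtraction per real or imaginary part, the same copier array can feed all 16 summing circuits.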



Fig. 2. Block diagram of the proposed a-DFT-based beamforming circuit.

## III. REALIZATION USING DYNAMIC CURRENT MIRRORS

The accuracy and bandwidth of the a-DFT circuit are mainly limited by the current mirrors. The accuracy of a current mirror is degraded by both systematic and random mismatches between its input and output transistors. The most important systematic mismatch is due to finite output impedance (i.e., the Early effect), which can be greatly reduced by cascoding. However, a threshold-voltage mismatch of $\sigma_{Vth}$ results in a random fractional current error of $\sigma_I/\bar{I} = \sigma_{Vth}/V_L$, where $\bar{I}$ is the average current and $V_L = \bar{I}/G_m$ is the linear range of the transistors. Reducing such errors by increasing $V_L$ (e.g., by increasing the gate overdrive) is of limited benefit because of the rapid increase in i) the bias current $\bar{I}$ (which increases power consumption), and ii) the saturation voltage $V_{DSAT}$ (which makes low-voltage operation more difficult). Alternatively, $\sigma_{Vth}$ can be reduced by i) careful layout (e.g., a common-centroid configuration), and ii) increasing the gate area $WL$ (since $\sigma_{Vth} \propto 1/\sqrt{WL}$). However, the latter strategy increases the gate-source capacitance $C_{gs} \propto WL$ and thus reduces the bandwidth, which is $\propto G_m/C_{gs}$. Alternative ways to increase mirror accuracy are therefore highly desirable.
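To make this trade-off concrete, the following sketch evaluates $\sigma_I/\bar{I} = \sigma_{Vth}/V_L$ together with the $1/\sqrt{WL}$ mismatch scaling and the $WL$ capacitance scaling. The transconductance, mismatch, and capacitance values are hypothetical (only the 40 µA bias matches Section IV), and the bandwidth is a rough $G_m/(2\pi C_{gs})$ estimate rather than a circuit result.

```python
import math


def fractional_error(sigma_vth, i_mean, g_m):
    """Random fractional current error sigma_I / I = sigma_Vth / V_L, with V_L = I / G_m."""
    v_l = i_mean / g_m
    return sigma_vth / v_l


def scale_gate_area(sigma_vth, c_gs, area_factor):
    """Scale WL by area_factor: sigma_Vth ~ 1/sqrt(WL), C_gs ~ WL."""
    return sigma_vth / math.sqrt(area_factor), c_gs * area_factor


# Hypothetical numbers, chosen only for illustration.
I_MEAN = 40e-6      # 40 uA bias current
G_M = 400e-6        # 400 uS transconductance, so V_L = 0.1 V
SIGMA_VTH = 5e-3    # 5 mV threshold mismatch
C_GS = 50e-15       # 50 fF gate-source capacitance

for area in (1, 4, 16):
    sigma, c_gs = scale_gate_area(SIGMA_VTH, C_GS, area)
    err = fractional_error(sigma, I_MEAN, G_M)
    bw = G_M / (2 * math.pi * c_gs)  # rough bandwidth estimate ~ G_m / C_gs
    print(f"WL x{area:2d}: sigma_I/I = {100 * err:.2f} %, BW ~ {bw / 1e9:.2f} GHz")
```

With these assumed values, quadrupling $WL$ halves the fractional error (from 5% to 2.5%) but reduces the estimated bandwidth by a factor of four, which is the trade-off that motivates the DCM.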

### A. Dynamic Current Mirror

The DCM is a well-known discrete-time technique for realizing accurate current copiers [12], [13]. As shown in Fig. 3, the simplest DCM consists of a single transistor M1, a sampling capacitor $C_H$, and a switch S1 (shown within the dashed line), plus two additional switches S2 and S3. The circuit operates in two phases. i) In phase 1, switches S1 and S2 are closed while S3 is open, and the capacitor $C_H$ (which can be a discrete capacitor or simply the gate capacitance of M1) samples the gate voltage of M1. Once the circuit settles, the voltage across $C_H$ reaches the value that forces M1 to conduct the input current $I_{in}$. ii) In phase 2, switch S3 is closed while S1 and S2 are open; $C_H$ then holds the voltage sampled during phase 1, which keeps forcing M1 to conduct the same current as in phase 1 and deliver it to the output terminal.

Fig. 3. The basic structure of a dynamic current mirror (DCM).

This operating principle shows that the mirror transistor generates its current in a given phase using its own gate voltage, sampled during the previous phase. By contrast, in a conventional current mirror the gate voltage is copied from another transistor, which causes the threshold-mismatch problem described earlier. Thus, to first order, $\sigma_{V_{th}}$ has no influence on the DCM output; the switches effectively act as a chopper that transfers most of the mismatch energy from DC to harmonics of the clock frequency. The DCM is therefore an attractive circuit for applications that require highly accurate current mirrors.
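The following idealized square-law model contrasts the two cases. In the static mirror the output device is driven by another transistor's gate voltage, while in the DCM the same transistor that set the gate voltage later reproduces the current. The device parameters are hypothetical, and the model ignores the Early effect, charge injection, clock feedthrough, and hold-capacitor droop; the function names are illustrative.

```python
import math
import random

BETA = 2e-3        # hypothetical mu*Cox*W/L of each transistor, A/V^2
VTH_NOM = 0.45     # hypothetical nominal threshold voltage, V
SIGMA_VTH = 5e-3   # hypothetical threshold mismatch, V
I_IN = 40e-6       # input current, A


def drain_current(v_gs, v_th):
    """Square-law saturation current (no Early effect); zero below threshold."""
    v_ov = v_gs - v_th
    return 0.5 * BETA * v_ov * v_ov if v_ov > 0 else 0.0


def gate_voltage_for(i_d, v_th):
    """Gate voltage at which a transistor with threshold v_th conducts i_d."""
    return v_th + math.sqrt(2.0 * i_d / BETA)


def static_mirror(i_in, vth_in, vth_out):
    # The output transistor is driven by the gate voltage of a different device.
    return drain_current(gate_voltage_for(i_in, vth_in), vth_out)


def dynamic_mirror(i_in, vth_m1):
    # Phase 1: M1 settles so that it sinks i_in; C_H stores its gate voltage.
    v_hold = gate_voltage_for(i_in, vth_m1)
    # Phase 2: the same transistor is driven by the held voltage.
    return drain_current(v_hold, vth_m1)


def rms(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


random.seed(0)
static_err, dynamic_err = [], []
for _ in range(1000):
    vth_a = random.gauss(VTH_NOM, SIGMA_VTH)
    vth_b = random.gauss(VTH_NOM, SIGMA_VTH)
    static_err.append(static_mirror(I_IN, vth_a, vth_b) / I_IN - 1)
    dynamic_err.append(dynamic_mirror(I_IN, vth_a) / I_IN - 1)

print(f"static mirror  rms error: {100 * rms(static_err):.2f} %")
print(f"dynamic mirror rms error: {100 * rms(dynamic_err):.2e} %")
```

With these assumed values the static mirror shows an rms error of several percent, whereas the DCM error is at the level of floating-point rounding; in a real DCM, effects that this model omits (e.g., charge injection) set the residual error.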

### B. Extensions of the Dynamic Current Mirror

In order to make the DCM more practical for circuit design, and especially for realizing the a-DFT operation, a few modifications are needed. In particular, we would like the DCM to i) deliver output current continuously rather than during only one phase; ii) have high output impedance; iii) produce an output current that is proportional to the input (i.e., act as a current scaler and not just a 1:1 copier); and iv) copy the input current to any required number of outputs (i.e., act as a multi-output current copier). Fig. 4 shows the circuit used to meet these requirements. To obtain a large output impedance, a cascode structure is introduced in the DCM: in every branch, transistors $T_{mk}$ and $T_{ck}$ act as the common-source (CS) and common-gate (CG) stages, respectively. Using at least two branches keeps the output current constant, because the branches alternate between the two phases. In addition, because only one branch of the DCM needs to be in phase 1 at any given time, adding more branches enables ratio operations. Routing 1 in Fig. 4 results in the ratio-mode structure of the DCM. In this mode, only one branch is in phase 1 while the other $k$ branches are in phase 2 and deliver output current; connecting these $k$ outputs together gives a total output current $I_{out} = k \cdot I_{in}$. Routing 2 in Fig. 4 gives the copy-mode structure of the DCM. This mode is similar to the ratio mode; the only difference is that every branch has its own output terminal. Thus, we obtain $k$ output currents $I_{out,i} = I_{in}$, $i = 1, 2, \dots, k$.

Fig. 4. Improved DCM circuit used to realize the proposed beamformer.
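A minimal behavioral sketch of the ratio mode is given below. It assumes $k + 1$ branches that take turns being in phase 1 in round-robin order (our assumption for illustration), with ideal switches and hold capacitors; `ratio_mode_dcm` is a hypothetical name, not a circuit from the figure.

```python
def ratio_mode_dcm(x, k):
    """Idealized ratio-mode DCM with k + 1 branches (hypothetical behavioral model).

    In clock phase n, branch n % (k + 1) is in phase 1 and samples x[n]; the
    other k branches are in phase 2 and deliver the currents they sampled in
    earlier phases to a common output node."""
    n_branches = k + 1
    held = [None] * n_branches          # None: branch has not sampled yet
    out = []
    for n, sample in enumerate(x):
        active = n % n_branches
        # Branches in phase 2 source their held currents into the output node.
        out.append(sum(h for b, h in enumerate(held)
                       if b != active and h is not None))
        # The branch in phase 1 samples the present input current.
        held[active] = sample
    return out


k = 4
dc_out = ratio_mode_dcm([1.0] * 12, k)          # DC input of 1 unit
print("DC output after start-up:", dc_out[k:])  # equals k * I_in

x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
y = ratio_mode_dcm(x, k)
# For n >= k the output is sum_{m=1..k} x[n-m], i.e. k times a moving average.
maf = [sum(x[n - m] for m in range(1, k + 1)) / k for n in range(k, len(x))]
print([v / k for v in y[k:]] == maf)
```

For a DC input the output settles to $k \cdot I_{in}$; for a time-varying input, each held sample is one clock phase older than the next, so the summed output is $k$ times the moving average analyzed in the next subsection.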

### C. Moving Average Filter

To realize the ratio and copy modes of the DCM, we need a $1:N$ scaler and copier. We use the circuit structure shown in Fig. 4, which is equivalent to a moving average filter (MAF): the branches in phase 2 hold input samples taken during the preceding clock phases, so their combined output is proportional to a moving average of the recent input. The MAF used in the $1:N$ DCM can be expressed as:

$$y_k[n] = \frac{1}{k} \sum_{m=1}^k x[n-m] \quad (2)$$

Using the discrete-time Fourier transform (DTFT), we obtain the frequency and impulse responses of this system:

$$H_k(\omega) = \frac{1}{k} \sum_{n=1}^k e^{-j\omega n} = \frac{e^{-j\omega}}{k} \cdot \frac{1 - e^{-jk\omega}}{1 - e^{-j\omega}},$$

$$h_k[n] = \frac{1}{k} \sum_{m=1}^k \delta[n-m]. \quad (3)$$
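The closed form of $H_k(\omega)$ and its magnitude, $|H_k(\omega)| = |\sin(k\omega/2)|/(k\,|\sin(\omega/2)|)$, which places the first null at $\omega = 2\pi/k$, can be checked numerically with the short sketch below (function names are illustrative). Note that the $1/k$ factor must be carried through to the closed form so that $H_k(0) = 1$.

```python
import cmath
import math


def h_direct(k, w):
    """H_k(w) evaluated directly from the defining sum."""
    return sum(cmath.exp(-1j * w * n) for n in range(1, k + 1)) / k


def h_closed(k, w):
    """Closed form e^{-jw}/k * (1 - e^{-jkw}) / (1 - e^{-jw}), valid for w != 0 mod 2*pi."""
    return cmath.exp(-1j * w) / k * (1 - cmath.exp(-1j * k * w)) / (1 - cmath.exp(-1j * w))


def h_mag(k, w):
    """|H_k(w)| = |sin(k w / 2)| / (k |sin(w / 2)|)."""
    return abs(math.sin(k * w / 2) / (k * math.sin(w / 2)))


for k in (2, 4, 16):
    w = 0.3
    print(k, abs(h_direct(k, w) - h_closed(k, w)), h_mag(k, w))
```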

Here we focus on the frequency response of the system. As shown in Fig. 5, the bandwidth of the filter is determined mainly by its main lobe. As $k$ increases from 2 to 16, the width of the main lobe (from $\omega = 0$ to the first null at $\omega = 2\pi/k$) decreases in proportion to $1/k$. The calculated 3-dB bandwidths of the filter are $f_{H_2} = 0.333 \cdot f_{sample}$, $f_{H_4} = 0.154 \cdot f_{sample}$, and $f_{H_{16}} = 0.038 \cdot f_{sample}$, where $f_{sample}$ is the sampling frequency.
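The main-lobe bandwidth can be found numerically by bisection on $|H_k|$. In the sketch below (illustrative names; $f$ is normalized to $f_{sample}$), the values quoted above appear to be reproduced when the cutoff is taken where $|H_k| = 1/2$, i.e., 3 dB below unity on a $10\log_{10}|H_k|$ scale; the conventional criterion $|H_k| = 1/\sqrt{2}$ gives lower values (e.g., $f_{sample}/4$ for $k = 2$).

```python
import math


def mag_h(k, f):
    """|H_k| at normalized frequency f = f_signal / f_sample, with 0 < f <= 1/k."""
    w = 2 * math.pi * f
    return abs(math.sin(k * w / 2) / (k * math.sin(w / 2)))


def cutoff(k, level, stages=1):
    """Normalized frequency where |H_k|**stages falls to `level`.

    Bisection over the main lobe (0, 1/k], on which |H_k| decreases
    monotonically from 1 to 0."""
    lo, hi = 1e-9, 1.0 / k
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mag_h(k, mid) ** stages > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for k in (2, 4, 16):
    half = cutoff(k, 0.5)                  # |H_k| = 1/2
    conv = cutoff(k, 1 / math.sqrt(2))     # |H_k| = 1/sqrt(2)
    print(f"k = {k:2d}: |H|=1/2 at {half:.3f} f_s, |H|=1/sqrt(2) at {conv:.3f} f_s")
```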

### D. 3-dB Bandwidth of the $1:N$ DCM Copier

Taking the 8-point a-DFT as an example, the algorithm requires a set of eight 1:16 signal copiers. If only a single type of MAF is used, there are three possibilities: 1) a single 1:16 DCM copier (MAF with $k = 16$); 2) five ($4^0 + 4^1$) 1:4 copiers (MAF with $k = 4$) connected in cascade, resulting in a 2-stage system; and 3) 15 ($2^0 + 2^1 + 2^2 + 2^3$) 1:2 copiers (MAF with $k = 2$) connected in cascade, resulting in a 4-stage system.

Fig. 5. Expected magnitude of the frequency response $|H_k(\omega)|$ for different values of $k$.

The frequency response of the resulting 1:$N$ copier can then be written as:

$$S_{N,k}(\omega) = \prod_{s=1}^{\log_k N} H_k(\omega) = \left[H_k(\omega)\right]^{\log_k N}, \quad (4)$$

where $N$ is the number of output copies and $k$ is the MAF parameter. Using (4), the 3-dB bandwidths of the three 1:16 DCM copier options are $f_{S_{16,16}} = 0.038 \cdot f_{sample}$, $f_{S_{16,4}} = 0.114 \cdot f_{sample}$, and $f_{S_{16,2}} = 0.184 \cdot f_{sample}$. Thus, moving from a single-stage splitter to a distributed 4-stage splitter improves the bandwidth by about $4.8\times$.
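The three copier options and (4) can be evaluated with the sketch below, which counts the 1:$k$ copiers in a $\log_k N$-stage tree and bisects on $|H_k|^{\log_k N}$. It uses the $|S| = 1/2$ cutoff that reproduces the single-stage values of Section III-C (an interpretation on our part), and the names `copier_tree`, `mag_h`, and `cutoff` are illustrative. It gives about $0.038$, $0.114$, and $0.182\, f_{sample}$, close to the values quoted above (the last differs slightly from $0.184$).

```python
import math


def mag_h(k, f):
    """|H_k| at normalized frequency f = f_signal / f_sample, with 0 < f <= 1/k."""
    w = 2 * math.pi * f
    return abs(math.sin(k * w / 2) / (k * math.sin(w / 2)))


def cutoff(k, level, stages=1):
    """Normalized frequency where |H_k|**stages falls to `level` (bisection on the main lobe)."""
    lo, hi = 1e-9, 1.0 / k
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mag_h(k, mid) ** stages > level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def copier_tree(n_out, k):
    """Number of stages and of 1:k copiers in a tree that yields n_out copies."""
    stages = round(math.log(n_out) / math.log(k))
    if k ** stages != n_out:
        raise ValueError("n_out must be a power of k")
    return stages, sum(k ** s for s in range(stages))


for k in (16, 4, 2):
    stages, count = copier_tree(16, k)
    bw = cutoff(k, 0.5, stages)  # |S_{16,k}| = 1/2
    print(f"k = {k:2d}: {stages} stage(s), {count:2d} copiers, BW ~ {bw:.3f} f_s")
```

The tree trades more copiers (15 instead of 1) for a wider main lobe per stage, which is why the 4-stage 1:2 cascade has the largest bandwidth despite having the most stages.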

The analysis above shows that the bandwidth of a 1:$N$ DCM copier built from MAFs is limited by $N$, $k$, and $f_{sample}$, which restricts its use in large-scale circuits (large $N$) and high-speed circuits (signal bandwidths approaching $f_{sample}$).

## IV. SIMULATION RESULTS

Transistor-level simulations were carried out in the UMC 180 nm standard CMOS process with a supply voltage of $V_{DD} = 1.8$ V and a sampling frequency of $f_{sample} = 1$ GHz for all blocks.

Fig. 6 shows simulation results for a single 1:16 current copier with a DC bias current of $40 \mu\text{A}$ and a sinusoidal input current with an amplitude of $5 \mu\text{A}$. The simulated bandwidths of the copier using a single 1:16 DCM ($BW \approx 40$ MHz) and a four-stage cascade of 1:2 DCMs ($BW \approx 170$ MHz) are in good agreement with the values calculated in Section III-D ($BW = 38$ MHz and $BW = 184$ MHz).

Fig. 7 shows the simulated beam patterns of the whole system for the two 1:16 copier structures. Both are in excellent agreement with the theoretical a-DFT beam shapes, which in turn are similar to those of the exact DFT.
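The relation between a-DFT and exact-DFT beams can be illustrated with an idealized array model. The sketch below assumes a narrowband uniform linear array with half-wavelength spacing (an assumption; the element spacing is not specified here), steers a plane wave to the center direction of each exact-DFT beam, and reports which output of (1) and of $2\mathbf{F}_8$ responds most strongly. The names `array_signals` and `beam_outputs` are illustrative.

```python
import cmath
import math

N = 8
ENTRY = [2, 1 - 1j, -2j, -1 - 1j, -2, -1 + 1j, 2j, 1 + 1j]  # Eq. (1) entries by (k*n) % 8
ADFT = [[ENTRY[(k * n) % N] for n in range(N)] for k in range(N)]
DFT2 = [[2 * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)] for k in range(N)]


def array_signals(theta_deg, spacing=0.5):
    """Narrowband ULA signals a_n = exp(j*2*pi*d*n*sin(theta)), d in wavelengths.
    Half-wavelength spacing is an assumption, not a value from the text."""
    psi = 2 * math.pi * spacing * math.sin(math.radians(theta_deg))
    return [cmath.exp(1j * psi * n) for n in range(N)]


def beam_outputs(matrix, a):
    """Magnitude of every beam output |sum_n F[k][n] a_n|."""
    return [abs(sum(f * an for f, an in zip(row, a))) for row in matrix]


peaks = []
for k0 in range(N):
    # Center of exact-DFT beam k0: sin(theta) = 2*k0/N, wrapped into [-1, 1).
    s = 2 * k0 / N if k0 < N // 2 else 2 * k0 / N - 2
    a = array_signals(math.degrees(math.asin(s)))
    ad, ex = beam_outputs(ADFT, a), beam_outputs(DFT2, a)
    peaks.append((ad.index(max(ad)), ex.index(max(ex))))
    print(f"input aimed at beam {k0}: a-DFT peak in beam {peaks[-1][0]}, "
          f"exact-DFT peak in beam {peaks[-1][1]}")

# Odd beams: the sqrt(2) -> 1 rounding leaks part of the input into beam 5*k0 mod N.
b1 = beam_outputs(ADFT, array_signals(math.degrees(math.asin(2 / N))))
alias_ratio = sorted(b1)[-2] / max(b1)
alias_beam = b1.index(sorted(b1)[-2])
print(f"beam 1 input: alias in beam {alias_beam}, relative level {alias_ratio:.3f}")
```

In this model every a-DFT beam peaks in the same direction as the corresponding DFT beam. For an input aimed at an odd beam $k_0$, the approximation also produces a weaker response in beam $5k_0 \bmod 8$, about $(2-\sqrt{2})/(2+\sqrt{2}) \approx 0.17$ of the main peak.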

Preliminary 5G usage plans for the 28 GHz licensed band call for five channels, each 160 MHz wide, covering the entire available bandwidth of 0.80 GHz. We therefore target an operating bandwidth of $> 160$ MHz for the beamformer. The simulation results above show that this target is achieved (with the four-stage 1:2 copier, $BW \approx 170$ MHz) for a relatively small transform size, namely the $N = 8$ a-DFT considered here.
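As a rough check of this target, the sketch below takes the normalized bandwidths of Section III-D, which come from the idealized MAF model and ignore settling and parasitics, and estimates the bandwidth of each copier option at $f_{sample} = 1$ GHz together with the minimum clock frequency needed for 160 MHz. The dictionary keys are illustrative labels.

```python
# Values taken from the text; the model behind them is the idealized MAF of Sec. III.
F_SAMPLE = 1e9        # clock frequency used in Sec. IV, Hz
CHANNEL_BW = 160e6    # per-channel bandwidth target, Hz
N_CHANNELS = 5        # channels in the 28 GHz plan

# Normalized 3-dB bandwidths (units of f_sample) of the 1:16 copier options, Sec. III-D.
NORMALIZED_BW = {"single 1:16": 0.038, "2-stage 1:4": 0.114, "4-stage 1:2": 0.184}

print(f"total band: {N_CHANNELS * CHANNEL_BW / 1e6:.0f} MHz")
min_clock = {}
for name, bw in NORMALIZED_BW.items():
    min_clock[name] = CHANNEL_BW / bw      # clock needed for a 160 MHz bandwidth
    bw_hz = bw * F_SAMPLE                  # bandwidth at the simulated 1 GHz clock
    print(f"{name:12s}: {bw_hz / 1e6:4.0f} MHz at 1 GHz, "
          f"meets target: {bw_hz >= CHANNEL_BW}, "
          f"min clock ~ {min_clock[name] / 1e9:.2f} GHz")
```

Under this idealized model only the four-stage 1:2 cascade exceeds 160 MHz at a 1 GHz clock, consistent with the simulated 170 MHz; a single 1:16 DCM would need a clock of roughly 4.2 GHz.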



Fig. 6. Simulated frequency response of the 1:1 DCM and the 1:16 current copier with different structures at a clock frequency  $f_{sample} = 1$  GHz.



Fig. 7. Simulated output beam patterns for the 8-point a-DFT-based beamformer circuit. **Top:** beam patterns using a single 1:16 DCM as the current copier at an input frequency of 10 MHz. **Bottom:** beam patterns using a four-stage cascade of 1:2 DCMs as the current copier at an input frequency of 130 MHz.

## V. CONCLUSION

We have described a scalable architecture for realizing low-complexity multi-beam beamformers for baseband processing within mm-wave wireless transceivers. The structure uses dynamic current mirrors to eliminate static offsets between the mirrors, resulting in accurate and robust beam patterns. An 8-beam version of this architecture has been designed and simulated at the transistor level, and its layout is in progress. In future work, we will design larger versions of the network (e.g., ones that generate 16 or 32 beams).

## REFERENCES

- [1] S. Sun, T. S. Rappaport, R. W. Heath, A. Nix, and S. Rangan, “MIMO for millimeter-wave wireless communications: beamforming, spatial multiplexing, or both?” *IEEE Communications Magazine*, vol. 52, no. 12, pp. 110–121, December 2014.
- [2] R. W. Heath, N. Gonzalez-Prelcic, S. Rangan, W. Roh, and A. M. Sayeed, “An overview of signal processing techniques for millimeter wave MIMO systems,” *IEEE Journal of Selected Topics in Signal Processing*, vol. 10, no. 3, pp. 436–453, April 2016.

- [3] R. Méndez-Rial, C. Rusu, N. González-Prelcic, A. Alkhateeb, and R. W. Heath, "Hybrid MIMO architectures for millimeter wave communications: Phase shifters or switches?" *IEEE Access*, vol. 4, pp. 247–267, 2016.
- [4] V. Ariyarathna, S. Kulasekera, A. Madanayake, K.-S. Lee, D. Suarez, R. J. Cintra, F. M. Bayer, and L. Belostotski, "Multi-beam 4 GHz microwave apertures using current-mode DFT approximation on 65 nm CMOS," in *IEEE International Microwave Symposium (IMS)*, 2015.
- [5] M. Lehne and S. Raman, "An analog/mixed-signal FFT processor for wideband OFDM systems," in *Sarnoff Symposium, 2006 IEEE*, March 2006, pp. 1–4.
- [6] ——, "A prototype analog/mixed-signal fast Fourier transform processor IC for OFDM receivers," in *Radio and Wireless Symposium, 2008 IEEE*, Jan 2008, pp. 803–806.
- [7] E. Afshari, H. Bhat, and A. Hajimiri, "Ultrafast analog Fourier transform using 2-D LC lattice," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 55, no. 8, pp. 2332–2343, Sept 2008.
- [8] A. Farahmand and M. R. Zahabi, "An energy efficient, high speed analog FFT processor for MB-OFDM UWB receivers," in *International Congress on Technology, Communication and Knowledge (ICTCK)*, Nov 2014, pp. 1–6.
- [9] D. Suarez, R. J. Cintra, F. M. Bayer, A. Sengupta, S. Kulasekera, and A. Madanayake, "Multi-beam RF aperture using multiplierless FFT approximation," *Electronics Letters*, vol. 50, no. 24, pp. 1788–1790, 2014.
- [10] U. S. Potluri, A. Madanayake, R. J. Cintra, F. M. Bayer, S. Kulasekera, and A. Edirisuriya, "Improved 8-point approximate DCT for image and video compression requiring only 14 additions," *IEEE Transactions on Circuits and Systems I*, vol. 61, no. 6, pp. 1727–1740, 2014.
- [11] C. J. Tablada, F. M. Bayer, and R. J. Cintra, "A class of DCT approximations based on the Feig-Winograd algorithm," *Signal Processing*, 2015.
- [12] G. Wegmann and E. A. Vittoz, "Analysis and improvements of accurate dynamic current mirrors," *IEEE Journal of Solid-State Circuits*, vol. 25, no. 3, pp. 699–706, Jun 1990.
- [13] S. J. Daubert and D. Vallancourt, "Operation and analysis of current copier circuits," *IEE Proceedings G - Circuits, Devices and Systems*, vol. 137, no. 2, pp. 109–115, Apr 1990.