

# Reconfigurable Matched Filtering Using Wideband Margin-Computing Correlators: Applications in Radar and Communication Signal Processing

Kareem Rashed<sup>1</sup>, Aswin Chowdary Undavalli<sup>2</sup>, Shantanu Chakrabarty<sup>2</sup>, Arun Natarajan<sup>1</sup> and Aravind Nagulu<sup>2</sup>

<sup>1</sup>*School of Electrical Engineering and Computer Science, Oregon State University, Corvallis, OR, USA*

<sup>2</sup>*Department of Electrical and Systems Engineering, Washington University in St. Louis, MO, USA*

{rashedk, nataraja}@oregonstate.edu, {a.undavalli, shantanu, nagulu}@wustl.edu

**Abstract**—Correlators are fundamental building blocks in radar and communication signal-processing applications. In this paper, a wideband RF correlator based on the margin-computing (MC) paradigm and its applications to radar and communication signal processing are presented. The proposed analog correlator replaces traditional multiply-and-accumulate correlators with analog addition-and-thresholding to enable energy-efficient correlation, making it hardware-friendly, scalable, and power-efficient. The prototype IC in 65nm CMOS supports 5GS/s inputs, a large correlation length of 1024, and 8-bit computing accuracy with a high energy efficiency of 152TOPs/J. Measured results of the baseband and direct-RF wavelet processing of impulse radar signatures across varying bandwidths and carrier frequencies are presented. Measured results of code-modulated communication systems with correlator-based signal processing for code-synchronization and code-despread are also reported.

**Index Terms**—Matched-filters, correlation, inner-product, multiplier-free, code-domain, radars, coherent demodulation.

## I. INTRODUCTION

Correlators are building blocks that lie at the heart of radar/communication signal processing. In the case of radar, the reflected signal is typically processed by a matched filter that is optimized for transmit signal characteristics. This matched filtering operation can be equivalent to a correlation with a template signal. For example, in a spread spectrum radar, the distance to a target can be measured by correlating the received signal with the transmitted template. Similarly, correlation is used extensively in wireless communication for signal detection, synchronization, and channel estimation. For instance, direct-sequence code division multiple access (DS-CDMA) systems use matched filters for selectively receiving a code-modulated signal by correlating the received signal with the target code. In all these systems, enabling correlation operation at higher frequency and larger bandwidth, while maintaining high dynamic range is a fundamental challenge. While active or passive matched filters can be utilized, correlation with programmable templates enables reconfigurable matched filtering which is desirable in several applications.

Typically, correlation, which is equivalent to an inner (dot) product, is performed using digital multiply-and-accumulate (MAC) operations, with power consumption scaling with frequency, compute, and ADC power. On the other hand, analog correlators eliminate ADCs by operating directly on analog inputs, but conventional analog multiplier-based correlators do not scale well to advanced process nodes, have short correlation lengths, are sensitive to process, voltage, and temperature variations, and are not easily reconfigurable [1]–[3].

Fig. 1: Correlation using analog approximation with margin-computing techniques.

Fig. 2: Conceptual diagram of Margin-Computing approach where the correlation between sequence $X$ and sequence $Y$ is computed by finding MC outputs for additive/subtractive combination of $X$ and $Y$ inputs.

This work was supported in part by the National Science Foundation under Award 2128535 and the DARPA Massive Cross-Correlation (DARPA MAX) program, Award Number: FA8650-23-2-7309.

Significantly, correlating two signals can be viewed as a computation resembling pattern matching, where the redundancy within the ensemble imparts resilience to approximation errors. For example, a digital-domain approximation based on L1 distance for the dot-product is employed in [4], achieving $\sim$100 TeraOps/s/W (TOPS/W). A similar approach is followed in [5] by adopting approximate arithmetic hardware based on digital OR/AND operations. However, these approximations still operate on digitized inputs. Notably, for correlators to be practically effective as reconfigurable matched filters in the analog domain, they must (i) accommodate long template sequences ($>1000$), (ii) demonstrate enhanced accuracy with longer sequences (similar to MAC), and (iii) attain a high level of computational energy efficiency comparable to digital MAC, all while eliminating input ADCs. In this paper, we present a novel analog approach for direct-RF wideband correlation that relies on the margin-computing (MC) paradigm (Fig. 1). The results from the hardware demonstration of the wideband analog correlator are presented in [6]. In this article, the hardware implementation is reviewed and several new application-oriented measurements are presented for radar and communication signal processing.

Fig. 3: Margin-Computing approach where $V_{out}^+$ and $V_{out}^-$ compute margin based on ReLU function operating on inputs $(x+y, -x-y)$ and $(x-y, -x+y)$.

## II. MARGIN-COMPUTING-BASED CORRELATOR

The true cross-correlation ($R_\infty$) between two random input sequences, $X$ and $Y$, is only measurable over long sequence lengths. Rather than using the conventional MAC operation, correlation can be estimated using MC functions that operate on additive and subtractive operands such as $\pm X \pm Y$ (Fig. 2). While various options exist for MC functions, a thresholding-based ReLU function can lead to low power, with the correlation, $R_{MC}$, given by

$$\begin{aligned} R_{MC} &= G^{-1}\left(V_{out}^+ - V_{out}^-\right), \text{ where } V_{out}^\pm \text{ is such that} \\ &\sum_{i=1}^N \text{ReLU}\left(|x_i \pm y_i| - V_{out}^\pm\right) = \gamma, \end{aligned} \quad (1)$$

where  $x_i$  and  $y_i$  are elements of  $X$  and  $Y$ ,  $\gamma$  is a hyperparameter, and  $G$  is a monotonic one-to-one mapping function that can be learned through training/simulations, approximated by a 5<sup>th</sup>-order polynomial in this work (Fig. 2).
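As a rough numerical illustration of Eq. (1), the following Python sketch solves the thresholding constraint for $V_{out}^\pm$ by bisection and reports the raw differential margin $V_{out}^+ - V_{out}^-$. The choice $\gamma = 0.25N$, the Gaussian test sequences, and all function names are assumptions made for illustration. The learned mapping $G$ is not reproduced, so only the sign and ordering of the outputs are meaningful here.

```python
import math
import random


def margin(operands, gamma, iters=60):
    """Find V such that sum_i ReLU(|a_i| - V) = gamma, using bisection."""
    mags = [abs(a) for a in operands]
    hi = max(mags)      # the constraint sum is 0 at this threshold
    lo = hi - gamma     # the constraint sum is >= gamma at this threshold
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(m - mid for m in mags if m > mid)
        if total > gamma:
            lo = mid    # threshold too low: excess above it is too large
        else:
            hi = mid
    return 0.5 * (lo + hi)


def differential_margin(x, y, gamma):
    """Raw MC output V+ - V- (before the mapping G of Eq. (1))."""
    v_plus = margin([a + b for a, b in zip(x, y)], gamma)
    v_minus = margin([a - b for a, b in zip(x, y)], gamma)
    return v_plus - v_minus


def correlated_pair(n, rho, rng):
    """Two unit-variance Gaussian sequences with correlation rho (illustrative)."""
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in x]
    return x, y


N = 1024
GAMMA = 0.25 * N  # assumed hyperparameter value, not taken from the paper
rng = random.Random(0)
results = {}
for rho in (-0.8, 0.0, 0.8):
    x, y = correlated_pair(N, rho, rng)
    results[rho] = differential_margin(x, y, GAMMA)
    print(f"rho = {rho:+.1f} -> V+ - V- = {results[rho]:+.3f}")
```

The differential margin is positive for positively correlated inputs, negative for negatively correlated inputs, and close to zero for uncorrelated inputs, which is the monotonic behavior that the mapping $G$ relies on.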

**Margin-Computing (MC) Analogy:** The MC approximation is analogous to a reverse water-filling problem of finding an output voltage for a given set of inputs when the total charge across all compute capacitors ($C_c$), $Q_{total}$, is constrained, as shown in Fig. 3. Prior current-domain MC-compute circuits require bias currents that lead to high power for long correlations [7]. The charge-domain scheme to compute MC-based correlation requires the computation of total charge-constrained $V_{out}^+$ and $V_{out}^-$, given input sequences $X$ and $Y$ (Fig. 3).

Fig. 4: Low-power diode-capacitor circuit to estimate correlation by performing the MC computation outlined in Fig. 3.

Fig. 5: Simulations showing proposed MC correlator computes correlation with improving accuracy as sequence length increases, with performance comparable to MAC correlators.

As shown in Fig. 4, the thresholding-MC function can be realized using charge-coupled diode-capacitor circuits, with the hyperparameter $\gamma = Q_{total}/C_c$. The four-quadrant operands ($\pm x_i \pm y_i$) are applied to the diodes across the left and the right arms. At reset, the output voltages are $V_{out}^+ = V_{out}^- = V_{DD}$ and the total charge is $Q_{total} = 0$. During the compute phase, the total charge is $Q_{total} = I_0 t$, the diode outputs are $V_i^\pm = \max(|x_i \pm y_i|, V_{out}^\pm)$, and the thresholding-MC condition $\sum_{i=1}^N C_c \times \text{ReLU}(|x_i \pm y_i| - V_{out}^\pm) = Q_{total}$ is satisfied. Thus, the differential voltage $V_{out,d} = V_{out}^+ - V_{out}^-$ estimates correlation.

Fig. 6: Schematic of MC Correlator implemented in 65nm CMOS. The IC includes a 5GS/s sampler that samples the two input sequences, and switched capacitor circuits to generate the $\pm x \pm y$ operands, along with MC computation circuit.

Fig. 7: Die photo of 65nm CMOS IC occupying $2.2 \text{ mm}^2$. Further area reductions are expected in upcoming prototypes.

For random sequences, the correlation computed using MC approximation ($R_{MC}$) follows a similar error distribution to MAC schemes ($R_{MAC}$), with both $R_{MC}$ and $R_{MAC}$ converging to $R_\infty$ as the sequence length $N \rightarrow \infty$. This can be seen from Fig. 5, where the error in the predicted correlation, $\epsilon_{MC}$, behaves similarly to that of a MAC correlator, with $\epsilon_{MC} \rightarrow 0$ as $N$ increases. In this MC-compute scheme, an energy of $E_{core} = 2Q_{total}V_{DD}$ is drawn per $N$-length correlation with a precision of ENOB, and $E_{sampler}$ is consumed for driving the MC-core, resulting in a high compute-efficiency of $N(ENOB^2 + ENOB)/(E_{core} + E_{sampler}) \sim 150 \text{ TOPS/W}$.
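The compute-efficiency expression can be checked with a few lines of arithmetic. The sketch below evaluates $N(ENOB^2 + ENOB)/(E_{core} + E_{sampler})$ using the 1.2mW MC-core power and 5GS/s rate quoted in Section IV. It assumes that one $N$-length correlation completes every $N$ input samples, and, because the sampler energy is not stated in this text, it reports a core-only figure and the total energy implied by the 152 TOPS/W result. Function and variable names are illustrative.

```python
def compute_efficiency_tops_per_w(n, enob, e_core_j, e_sampler_j):
    """N*(ENOB^2 + ENOB) operations per joule, expressed in tera-ops/J (= TOPS/W)."""
    ops = n * (enob ** 2 + enob)
    return ops / (e_core_j + e_sampler_j) / 1e12


N = 1024              # correlation length
ENOB = 8              # nominal 8-bit precision
CORE_POWER_W = 1.2e-3 # MC-core power quoted in Section IV
SAMPLE_RATE = 5e9     # input sampling rate in samples per second

# Assumption: one N-length correlation is completed every N input samples.
e_core = CORE_POWER_W * N / SAMPLE_RATE

# Core-only figure (sampler energy set to zero because it is not given here).
core_only = compute_efficiency_tops_per_w(N, ENOB, e_core, 0.0)

# Total energy per correlation implied by the reported 152 TOPS/W.
implied_total_energy = N * (ENOB ** 2 + ENOB) / 152e12

print(f"core energy per correlation: {e_core * 1e12:.2f} pJ")
print(f"core-only efficiency: {core_only:.1f} TOPS/W")
print(f"total energy implied by 152 TOPS/W: {implied_total_energy * 1e12:.1f} pJ")
```

Under these assumptions the core-only figure is an upper bound. The gap to the reported efficiency would correspond to the sampler energy and any other overhead included in the measurement.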

## III. IMPLEMENTATION IN 65NM CMOS

Fig. 6 shows the implemented low-power 5GS/s, 1024-sample correlator using the thresholding-MC operation. The RF correlator can be divided into three sub-blocks: a two-layer sampler that sequentially stores input samples on 1024 capacitors, a capacitor-stacking-based operand generator that generates the operands required for the MC-correlation estimation, and an MC correlation compute engine. The sampler supports up to 5GS/s with two-layer sampling to reduce the parasitic capacitance at the RF input node. Following sampling, in the operand generation phase, the sampling capacitors, $C_{si}$, are stacked to generate the four-quadrant operands required by the MC-correlation estimator. The sampling capacitors, $C_{si}$ (50fF), are conservatively sized to ensure that charge leakage during sampling and operand generation does not impact the overall compute error. Finally, the operands are applied to the MC-core in the compute phase to calculate the cross-correlation between the input sequences. The computation speed is determined by the charging rate of the compute capacitor, $C_c$ (25fF), which is conservatively sized at 10x the lower limit for 8-bit ENOB based on process mismatch and noise simulations. A 50$\mu$A cascode current source imposes the MC constraint, balancing speed and power trade-offs.

## IV. MEASUREMENT RESULTS

The IC is implemented in 65nm CMOS with an area of 0.97mm<sup>2</sup>, and the MC-core consumes 1.2mW at 5GS/s. The IC die photo is shown in Fig. 7.

### A. MC Correlator Performance

Measured MC correlator performance is shown in Fig. 8. For periodic inputs, errors in correlation computation arise from the hardware MC implementation, $\epsilon_{HW}$, which includes the MC-approximation and noise/mismatch (hardware dynamic range, HDR $= 20\log_{10}(1/\epsilon_{HW})$). Accordingly, an HDR of 50.3dB (i.e., ENOB = 8.06 bits) is measured when correlating two phase-shifted sinusoidal inputs. For random inputs, computation errors are dominated by errors due to the finite length of the input sequences, $\epsilon_{len} \sim 1/\sqrt{N}$, with $\epsilon_{len} \rightarrow 0$ for larger sequence lengths. Measurements using random 5GS/s input sequences with known correlations show the measured correlation tracking $R_\infty$ with a computation error of -30.7dB. Correlator scalability measurements are also shown in Fig. 8. The measured higher accuracy with increasing sequence length demonstrates the feasibility of the proposed approach for longer correlations. Measurements across $\pm 20\%$ supply voltage demonstrate robustness to voltage variations.
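The two figures of merit above can be cross-checked numerically. The sketch below uses the standard converter relation ENOB = (HDR - 1.76)/6.02, which is consistent with the quoted 50.3dB and 8.06 bits, and evaluates the $1/\sqrt{N}$ finite-length error in dB for $N = 1024$. Treating the proportionality $\epsilon_{len} \sim 1/\sqrt{N}$ as an equality is an assumption made only for this estimate, and the function names are illustrative.

```python
import math


def enob_from_hdr_db(hdr_db):
    """Effective number of bits from a dynamic range in dB (standard ADC relation)."""
    return (hdr_db - 1.76) / 6.02


def finite_length_error_db(n):
    """20*log10(1/sqrt(N)): finite-length error, taking eps_len = 1/sqrt(N) exactly."""
    return 20.0 * math.log10(1.0 / math.sqrt(n))


enob = enob_from_hdr_db(50.3)
err_db = finite_length_error_db(1024)
print(f"ENOB for HDR = 50.3 dB: {enob:.2f} bits")
print(f"1/sqrt(N) error for N = 1024: {err_db:.1f} dB")
```

The estimate of roughly -30.1dB for $N = 1024$ lies close to the measured -30.7dB computation error for random sequences.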

### B. MC Correlator in Radar Systems

Fig. 8: With sinusoid inputs, measured MC correlation ($R_{MC}$) demonstrates 8-bit performance. Measurements across random sequences with different sequence lengths show measured performance matching theory/simulations. Charge-domain MC compute is robust to supply voltage variations.

Fig. 9: Radar system measurements. Input radar pulse of (a) 7.3 MHz, (b) 1.25 GHz, (c) 2.5 GHz bandwidth, and (d) modulated 7.3 MHz BW pulse are correlated with pulse template.

The feasibility of using the MC correlator as a reconfigurable matched filter is demonstrated through system-level measurements of radar signal detection, code-domain processing, and QAM signal demodulation. In the radar system measurements, shown in Fig. 9, a radar pulse is sampled at 5 GS/s and correlated against time-shifted pulse templates. A raised-cosine radar pulse with a roll-off factor of 0.5 is adopted. Fig. 9(a) shows the measured correlation for a base-band radar pulse of 7.3 MHz BW buried under noise (SNR=0 dB). In Fig. 9(b) and (c), wider base-band pulses of 1.25 GHz BW and 2.5 GHz BW, respectively, at 0 dB SNR are used. In Fig. 9(d), a 7.3 MHz BW pulse modulated by a 1 GHz carrier is correlated with the template pulse. The measured correlation demonstrates that the proposed MC correlator can use correlation with a template to realize reconfigurable matched filtering in the analog/RF domain.
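The radar measurement procedure can be mimicked in software. The sketch below correlates a noisy raised-cosine pulse (roll-off 0.5) against time-shifted copies of a template and picks the lag with the largest correlation. It uses an ideal inner product as a stand-in for the MC correlator, and the pulse width in samples, the delay, the noise seed, and the definition of SNR as average signal power over the window divided by noise power are all assumptions for illustration.

```python
import math
import random


def raised_cosine(t, period, beta):
    """Raised-cosine pulse with unit peak, symbol period `period`, roll-off `beta`."""
    if t == 0.0:
        return 1.0
    x = t / period
    denom = 1.0 - (2.0 * beta * x) ** 2
    if abs(denom) < 1e-12:
        # Limit at t = +/- period / (2 * beta)
        u = 1.0 / (2.0 * beta)
        return (math.pi / 4.0) * math.sin(math.pi * u) / (math.pi * u)
    sinc = math.sin(math.pi * x) / (math.pi * x)
    return sinc * math.cos(math.pi * beta * x) / denom


def make_pulse(n_samples, center, period, beta):
    return [raised_cosine(float(n - center), period, beta) for n in range(n_samples)]


def correlate_at_lag(rx, template, lag):
    """Inner product of rx with the template shifted right by `lag` samples."""
    total = 0.0
    for n, r in enumerate(rx):
        k = n - lag
        if 0 <= k < len(template):
            total += r * template[k]
    return total


def estimate_delay(rx, template, lags):
    """Return the lag with the largest correlation, plus all correlation values."""
    scores = [correlate_at_lag(rx, template, lag) for lag in lags]
    best = max(range(len(scores)), key=scores.__getitem__)
    return lags[best], scores


N = 1024            # correlation length
PERIOD = 16.0       # assumed pulse period in samples
BETA = 0.5          # roll-off factor used in the measurements
TRUE_DELAY = 200    # assumed target delay in samples
LAGS = list(range(0, 400))

template = make_pulse(N, 256, PERIOD, BETA)
clean = make_pulse(N, 256 + TRUE_DELAY, PERIOD, BETA)

# Additive Gaussian noise at 0 dB SNR (window-averaged signal power, assumed).
rng = random.Random(0)
sigma = math.sqrt(sum(v * v for v in clean) / N)
noisy = [v + rng.gauss(0.0, sigma) for v in clean]

delay_clean, _ = estimate_delay(clean, template, LAGS)
delay_noisy, _ = estimate_delay(noisy, template, LAGS)
print("estimated delay without noise:", delay_clean)
print("estimated delay at 0 dB SNR:", delay_noisy)
```

Changing `PERIOD` or multiplying the pulse and template by a carrier changes the matched filter without touching the correlator itself, which is the sense in which the template makes the filter reconfigurable.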

### C. MC Correlator in Code-Modulated Communication Systems

Fig. 10: Code-domain measurements. (a) Measurement for 2.5 Gb/s PN code correlated with time-shifted versions. (b) Code-domain selectivity measurements.

Fig. 11: 16-QAM demodulation measurements with (a) 1 GHz constellation, and (b) 2 GHz constellation.

The applications of the MC correlator in code-modulated communication systems are demonstrated in two scenarios: (i) code-synchronization, and (ii) code despreading and data reception. Fig. 10 depicts code-domain signal processing using the proposed MC correlator for code-synchronization. In Fig. 10(a), an input PN code at 2.5 GHz chip rate with SNR=0 dB is sampled at 5 GS/s and correlated with delayed versions of the PN code. Measurements show the expected zero-lag impulse response. Code-domain selectivity with the MC correlator is shown in Fig. 10(b), where an RX signal with two PN codes at a 2.5 GHz chip rate is sampled at 5 GS/s and correlated with one of the codes. Measurements across target code delays show target code selection with rejection of the unwanted code.
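The code-synchronization and selectivity experiments can be sketched as follows. The example generates a 511-chip maximal-length PN sequence with a 9-bit LFSR, holds each chip for two samples (a 2.5 GHz chip rate sampled at 5 GS/s), adds a second, unwanted code, and correlates the sum with circularly shifted copies of the target code. The specific LFSR, the random second code, the delay, and the use of an ideal inner product in place of the MC correlator are assumptions for illustration, and no noise is added.

```python
import random


def lfsr_msequence(n_bits=9, taps=(9, 5), seed=1):
    """One period of a maximal-length sequence as +/-1 chips (Fibonacci LFSR)."""
    state = [(seed >> i) & 1 for i in range(n_bits)]
    chips = []
    for _ in range((1 << n_bits) - 1):
        chips.append(1 - 2 * state[-1])   # map bit 0 -> +1, bit 1 -> -1
        feedback = 0
        for tap in taps:
            feedback ^= state[tap - 1]
        state = [feedback] + state[:-1]
    return chips


def upsample(chips, samples_per_chip):
    """Hold each chip for `samples_per_chip` samples."""
    return [c for c in chips for _ in range(samples_per_chip)]


def circular_correlation(rx, code, lag):
    """Inner product of the code with rx circularly advanced by `lag` samples."""
    n = len(code)
    return sum(rx[(i + lag) % n] * code[i] for i in range(n))


SAMPLES_PER_CHIP = 2   # 2.5 GHz chip rate sampled at 5 GS/s
TRUE_DELAY = 300       # assumed delay of the target code, in samples

chips_a = lfsr_msequence()
rng = random.Random(1)
chips_b = [rng.choice((-1, 1)) for _ in chips_a]   # unwanted code (random, assumed)

code_a = upsample(chips_a, SAMPLES_PER_CHIP)
code_b = upsample(chips_b, SAMPLES_PER_CHIP)
n = len(code_a)

# Received signal: delayed target code plus the unwanted code.
rx = [code_a[(j - TRUE_DELAY) % n] + code_b[j] for j in range(n)]

scores = [circular_correlation(rx, code_a, lag) for lag in range(n)]
detected = max(range(n), key=scores.__getitem__)
print("samples per code period:", n)
print("detected delay:", detected)
```

The correlation peaks only when the template lag matches the delay of the target code, while the unwanted code contributes a small residual at every lag.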

TABLE I: Performance comparison with the state-of-the-art analog correlators.

|                                          | A.R. Javed<br>BCTM 16 [3]     | Q. Wu<br>TMTT23 [2]     | V. Mangal<br>JSSC20 [1]    | This Work           |
|------------------------------------------|-------------------------------|-------------------------|----------------------------|---------------------|
| Tech.                                    | 130nm NPN                     | 1um GaAs HBT            | 65nm CMOS                  | 65nm CMOS           |
| Correlation Mechanism                    | Analog Multiply and Integrate | Pumped T-line           | Analog Time Domain         | ReLU-MC             |
| Input Sampling Rate                      | 33 GS/s                       | 0.6-1.1GS/s             | 100 S/s                    | 5 GS/s              |
| Input 1, Input 2                         | Analog, <sup>†</sup> 1 bit    | Analog, <sup>†</sup> LO | Analog, <sup>†</sup> 1 bit | Analog, Analog (8b) |
| Code Length                              | 1                             | 8                       | 11                         | 1024                |
| Power                                    | 123 mW                        | 1288 mW                 | 40 nW *                    | 1.2 mW **           |
| Correlator FoM (pJ/Corr/Len)             | 3.7                           | 293                     | 400                        | 0.24                |
| Area (mm<sup>2</sup>)                    | 0.15 (est)                    | 78.61                   | 0.09                       | 0.97                |
| Compute Area/Corr. Len (mm<sup>2</sup>)  | 0.15                          | 39.304                  | 0.0082                     | 0.00095             |

<sup>†</sup> Dynamic range not reported

\* For baseband correlator

\*\* For compute core

In Fig. 11, a code despreading application is demonstrated by employing the MC correlator in coherent 16-QAM signal demodulation. In these measurements, a noisy 16-QAM signal (SNR=0dB) is demodulated by correlating with the in-phase (I) and quadrature (Q) components of the carrier. The reconstructed 1 GHz and 2 GHz 16-QAM constellations, shown in Fig. 11(a) and (b), show an EVM of 5.4% and 9.6%, respectively. These measurements show how the MC correlator can be applied across signal processing applications, while providing efficient reconfigurable filtering/detection in the analog domain. The MC correlator performance is summarized and compared to the state-of-the-art analog correlators in Table I.
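Coherent demodulation by correlation can be illustrated with a short sketch. One 16-QAM symbol is placed on a carrier, and the I and Q amplitudes are recovered by correlating the received samples with cosine and sine templates. A 1 GHz carrier sampled at 5 GS/s over 1000 samples is assumed here so that an integer number of carrier cycles fits the window (the prototype uses a 1024-sample correlation), and an ideal inner product stands in for the MC correlator. The symbol levels, noise seed, and function names are illustrative.

```python
import math
import random

FS = 5e9    # sampling rate
FC = 1e9    # assumed carrier frequency
N = 1000    # assumed window: exactly 200 carrier cycles at 5 samples per cycle


def iq_templates(n, fc, fs):
    """In-phase (cos) and quadrature (-sin) carrier templates."""
    w = 2.0 * math.pi * fc / fs
    in_phase = [math.cos(w * k) for k in range(n)]
    quadrature = [-math.sin(w * k) for k in range(n)]
    return in_phase, quadrature


def modulate(i_val, q_val, n, fc, fs):
    """Passband samples s[k] = I*cos(wk) - Q*sin(wk) for one symbol."""
    ci, cq = iq_templates(n, fc, fs)
    return [i_val * a + q_val * b for a, b in zip(ci, cq)]


def demodulate(rx, fc, fs):
    """Recover (I, Q) by correlating with the carrier templates."""
    n = len(rx)
    ci, cq = iq_templates(n, fc, fs)
    i_hat = 2.0 / n * sum(r * a for r, a in zip(rx, ci))
    q_hat = 2.0 / n * sum(r * b for r, b in zip(rx, cq))
    return i_hat, q_hat


def nearest_16qam(i_hat, q_hat):
    """Snap an estimate to the nearest point of the {-3,-1,1,3}^2 grid."""
    levels = (-3, -1, 1, 3)
    return (min(levels, key=lambda v: abs(i_hat - v)),
            min(levels, key=lambda v: abs(q_hat - v)))


# Noise-free check for one symbol.
tx_symbol = (3, -1)
clean = modulate(tx_symbol[0], tx_symbol[1], N, FC, FS)
i_clean, q_clean = demodulate(clean, FC, FS)

# Same symbol with Gaussian noise at 0 dB SNR relative to this symbol's power.
rng = random.Random(0)
sigma = math.sqrt(sum(v * v for v in clean) / N)
noisy = [v + rng.gauss(0.0, sigma) for v in clean]
i_noisy, q_noisy = demodulate(noisy, FC, FS)

print(f"noise-free estimate: I = {i_clean:.6f}, Q = {q_clean:.6f}")
print(f"0 dB SNR estimate:   I = {i_noisy:.3f}, Q = {q_noisy:.3f}")
```

Averaging over the correlation window suppresses the noise, which is why a constellation can be reconstructed even when the per-sample SNR is 0 dB.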

## V. CONCLUSIONS

This paper presented an instantaneously wideband charge-domain correlator based on the margin-computing paradigm. System-level measurements employing the MC-based correlator in radar signal processing and in code-synchronization and code-despreading for code-modulated communication systems demonstrate reconfigurable matched filtering operation in the analog/RF domain.

## ACKNOWLEDGMENT

The authors would like to thank Dr. James Wilson and his team from DARPA for valuable feedback.

## REFERENCES

- [1] V. Mangal and P. R. Kinget, "Clockless, continuous-time analog correlator using time-encoded signal processing demonstrating asynchronous CDMA for wake-up receivers," *IEEE JSSC*, 2020.
- [2] Q. Wu, X. Zou, and Y. E. Wang, "Monolithically integrated directly pumped time-varying transmission lines," *IEEE T-MTT*, 2022.
- [3] A. R. Javed, J. Scheytt, and U. vd Ahe, "Linear ultra-broadband NPN-only analog correlator at 33 Gbps in 130 nm SiGe BiCMOS technology," in *IEEE BCTM*, 2016, pp. 78-81.
- [4] Y. He *et al.*, "7.3 A 28nm 38-to-102-TOPS/W 8b multiply-less approximate digital SRAM compute-in-memory macro for neural-network inference," in *IEEE ISSCC*, 2023.
- [5] D. Wang *et al.*, "DIMC: 2219TOPS/W 2569F<sup>2</sup>/b digital in-memory computing macro in 28nm based on approximate arithmetic hardware," in *IEEE ISSCC*, 2022.
- [6] K. Rashed, A. Undavalli, S. Chakrabarty, A. Nagulu, and A. Natarajan, "A Scalable and Instantaneously Wideband 5GS/s RF Correlator Based on Charge Thresholding Achieving 8-bit ENOB and 152 TOPS/W Compute Efficiency," in *IEEE ISSCC*, 2024.
- [7] M. Gu and S. Chakrabarty, "Bias-scalable inner-product approximation circuit using analog margin propagation," in *IEEE MWSCAS*, 2013.