This content will become publicly available on February 16, 2026

Title: A 4GS/s Fully Analog 256×256 MP-Based Cross-Correlator with 1000TOPS/W Compute Efficiency and 1.3TOPS/mm² Compute Density in 22nm SOI CMOS
Multi-lag cross-correlations (X-Corr) are essential building blocks in radar and communication systems for range/velocity detection and synchronization. Performing X-Corrs requires efficient delay and correlation blocks. Traditionally, high-bandwidth X-Corr is performed using high-speed ADCs followed by digital multiply-and-accumulates (MACs). However, X-Corr efficiencies of 5–20 TOPS/W lead to 0.1–1 W per cross-correlator, limiting deployability in power-constrained applications. Alternatively, realizing X-Corr with prior single-lag analog correlators would require integrating wideband analog delays (>10 ns delays with 4 GHz BW) on chip to enable multiple lags. Furthermore, replicating N analog correlators leads to an impractical chip area. Therefore, practical analog X-Corr requires: (i) high input bandwidth, (ii) a long correlation length N for high signal processing gain (SPG = 10 log10(N)), (iii) high compute efficiency (>100 TOPS/W) with compute accuracy comparable to digital MACs (>7-bit), and (iv) single-shot readout across all N X-Corr lags in a compact area. In this work, we leverage a sampling-based approach to create large analog delays and an area/power-efficient four-transistor analog compute cell to present a margin-propagation (MP) based fully analog X-Corr compute engine in 22nm SOI CMOS achieving: (i) 1–4 GS/s input, (ii) single-shot 256-length X-Corrs across all 256 lags, resulting in a 256×256 X-correlator with 8.2–8.5 bit compute accuracy or a hardware dynamic range (HDR) of 51–53 dB, (iii) high compute efficiency of 996–1060 TOPS/W (6.6× better than the state of the art, SoA), and (iv) high compute density of 1.3 TOPS/mm² (7× better than SoA). We also demonstrate an X-band code-domain radar with a range resolution of 15 cm across 256 range bins, supporting up to 1024 chirp averages with a 115 Hz refresh rate.
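As a rough numerical illustration of the quantities in the abstract, the sketch below computes the signal processing gain SPG = 10 log10(N), converts a dynamic range in dB to an equivalent bit count, and runs a multi-lag cross-correlation in plain software. It is not the paper's analog margin-propagation circuit. The function names, the random ±1 code, the circular-shift echo model, and the ideal-quantizer relation HDR = 6.02·bits + 1.76 dB are all assumptions made for illustration only.

```python
import math
import random

def spg_db(n):
    """Signal processing gain in dB for correlation length n: 10*log10(n)."""
    return 10 * math.log10(n)

def bits_from_hdr(hdr_db):
    """Equivalent bits for a dynamic range in dB.

    Assumes the ideal-quantizer relation HDR = 6.02*bits + 1.76 dB;
    the abstract does not state which conversion the authors use.
    """
    return (hdr_db - 1.76) / 6.02

def xcorr_all_lags(x, ref):
    """Circular cross-correlation of x against ref at every lag 0..N-1."""
    n = len(ref)
    return [sum(x[(i + lag) % n] * ref[i] for i in range(n))
            for lag in range(n)]

random.seed(0)
N = 256                                              # correlation length and lag count
code = [random.choice((-1, 1)) for _ in range(N)]    # hypothetical +/-1 code
true_lag = 37                                        # hypothetical echo delay in samples
echo = [code[(i - true_lag) % N] for i in range(N)]  # code delayed by true_lag

corr = xcorr_all_lags(echo, code)                    # one value per lag (256 lags)
peak_lag = max(range(N), key=lambda k: corr[k])      # lag with the largest correlation
```

Under these assumptions, `spg_db(256)` is about 24.1 dB, and `bits_from_hdr` maps 51 dB and 53 dB to roughly 8.2 and 8.5 bits, which is consistent with the accuracy range quoted in the abstract. The correlation at the true lag sums all 256 products coherently, which is the processing gain that a long correlation length buys.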
Award ID(s):
2425444
PAR ID:
10618164
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
IEEE
Date Published:
ISSN:
2376-8606
ISBN:
979-8-3315-4101-9
Page Range / eLocation ID:
448 to 450
Subject(s) / Keyword(s):
Analog computing, approximate computing, correlation, cross-correlators, inner-product, multiplier-free, RF sensing, code-domain radars
Format(s):
Medium: X
Location:
San Francisco, CA, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. This work presents the first resistive random access memory (RRAM)-based compute-in-memory (CIM) macro design tailored for genome processing. We analyze and demonstrate two key types of genome processing applications using our developed CIM chip prototype: the state-of-the-art (SOTA) Burrows–Wheeler transform (BWT)-based DNA short-read alignment and alignment-free mRNA quantification. Our CIM macro is designed and optimized to support the major functions essential to these algorithms, e.g., parallel XNOR operations, count, addition, and parallel bit-wise AND operations. The proposed CIM macro prototype is fabricated with monolithic integration of HfO2 RRAM and 65-nm CMOS, achieving 2.07 TOPS/W (tera-operations per second per watt) and 2.12 G suffixes/J (suffixes per joule) at 1.0 V, which is the most energy-efficient solution to date for genome processing.
  2. This paper proposes a real-size, single-shot, high-speed, and energy-efficient tensorized optical multimodal fusion network (TOMFuN) on an electro-photonic large-scale III–V-on-Si in-memory compute engine. The TOMFuN architecture leverages a memory-efficient and low-complexity self-attention for the embedding network for the text information and tensor-train and CANDECOMP/PARAFAC decompositions for compressing the model parameters in the large-scale fully connected layers. Compared to full-size counterparts, our proposed network maintains a compatible inference accuracy in multimodal sentiment analysis tasks while requiring 92.8× fewer model parameters and 51.3× fewer hardware resources. Furthermore, the impact of photonic device imperfections on the TOMFuN architecture is investigated. The simulation results show that noise-aware on-chip training exhibits superior robustness. Finally, chip performance analysis shows that our TOMFuN inference accelerator has 230.73 PetaOps computational speed, 6.51 TOPS/W power efficiency, and 2.7 µs latency with the input dimensions of 1024. 
  3. We present reverberation mapping measurements for the prominent ultraviolet broad emission lines of the active galactic nucleus Mrk 817 using 165 spectra obtained with the Cosmic Origins Spectrograph on the Hubble Space Telescope. Our ultraviolet observations are accompanied by X-ray, optical, and near-infrared observations as part of the AGN Space Telescope and Optical Reverberation Mapping Program 2 (AGN STORM 2). Using the cross-correlation lag analysis method, we find significant correlated variations in the continuum and emission-line light curves. We measure rest-frame delayed responses between the far-ultraviolet continuum at 1180 Å and Lyα λ1215 Å ($$10.4^{+1.6}_{-1.4}$$ days), N V λ1240 Å ($$15.5^{+1.0}_{-4.8}$$ days), Si IV + O IV] λ1397 Å ($$8.2^{+1.4}_{-1.4}$$ days), C IV λ1549 Å ($$11.8^{+3.0}_{-2.8}$$ days), and He II λ1640 Å ($$9.0^{+4.5}_{-1.9}$$ days) using segments of the emission-line profile that are unaffected by absorption and blending, which results in sampling different velocity ranges for each line. However, we find that the emission-line responses to continuum variations are more complex than a simple smoothed, shifted, and scaled version of the continuum light curve. We also measure velocity-resolved lags for the Lyα and C IV emission lines. The lag profile in the blue wing of Lyα is consistent with virial motion, with longer lags dominating at lower velocities, and shorter lags at higher velocities. The C IV lag profile shows the signature of a thick rotating disk, with the shortest lags in the wings, local peaks at ±1500 km s$$^{-1}$$, and a local minimum at the line center. The other emission lines are dominated by broad absorption lines and blending with adjacent emission lines. These require detailed models, and will be presented in future work.
  4. Future generations of radio interferometers targeting the 21 cm signal at cosmological distances with N ≫ 1000 antennas could face a significant computational challenge in building correlators with the traditional architecture, whose computational resource requirement scales as $$\mathcal{O}(N^2)$$ with array size. The fundamental output of such correlators is the cross-correlation products of all antenna pairs in the array. The FFT-correlator architecture reduces the computational resources scaling to $$\mathcal{O}(N\log{N})$$ by computing cross-correlation products through a spatial Fourier transform. However, the output of the FFT-correlator is meaningful only when the input antenna voltages are gain- and phase-calibrated. Traditionally, interferometric calibration has used the $$\mathcal{O}(N^2)$$ cross-correlations produced by a standard correlator. This paper proposes two real-time calibration schemes that could work in parallel with an FFT-correlator as a self-contained $$\mathcal{O}(N\log{N})$$ correlator system that can be scaled to large-N redundant arrays. We compare the performance and scalability of these two calibration schemes and find that they result in antenna gains whose variance decreases as 1/log N with increase in the size of the array.
  5. UV and optical continuum reverberation mapping is a powerful tool for probing the accretion disk and inner broad-line region. However, recent reverberation mapping campaigns in the X-ray, UV, and optical have found lags consistently longer than those expected from the standard disk reprocessing picture. The largest discrepancy to date was recently reported in Mrk 335, where UV/optical lags are up to 12 times longer than expected. Here, we perform a frequency-resolved time lag analysis of Mrk 335, using Gaussian processes to account for irregular sampling. For the first time, we compare the Fourier frequency-resolved lags directly to those computed using the popular interpolated cross-correlation function method applied to both the original and detrended light curves. We show that the anticipated disk reverberation lags are recovered by the Fourier lags when zeroing in on the short-timescale variability. This suggests that a separate variability component is present on long timescales. If this separate component is modeled as reverberation from another region beyond the accretion disk, we constrain a size scale of roughly 15 lt-days from the central black hole. This is consistent with the size of the broad-line region inferred from Hβ reverberation lags. We also find tentative evidence for a soft X-ray lag, which we propose may be due to light travel time delays between the hard X-ray corona and distant photoionized gas that dominates the soft X-ray spectrum below 2 keV.