Title: Multi-Channel FFT Architectures Designed via Folding and Interleaving
Computing the FFT of a single channel is well understood in the literature. However, computing the FFT of multiple channels in a systematic manner has not been fully addressed. This paper presents a framework to design a family of multi-channel FFT architectures using folding and interleaving. Three distinct multi-channel FFT architectures are presented in this paper. These architectures differ in the input and output preprocessing steps and are based on different folding sets, i.e., different orders of execution.
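As a rough software illustration of the interleaving idea mentioned in the abstract, the sketch below runs one radix-2 decimation-in-frequency FFT over several channels stored sample-interleaved in a single buffer. It is not one of the three architectures from the paper, whose folding sets and preprocessing steps are not given here. The names `interleaved_fft`, `dft`, and `bit_reverse` are hypothetical, and the per-channel FFT size is assumed to be a power of two.

```python
import cmath


def dft(x):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def interleaved_fft(stream, m):
    """Radix-2 DIF FFT of m channels held sample-interleaved in one buffer.

    stream[t * m + c] is sample t of channel c. Every butterfly span is
    scaled by m, loosely mirroring how interleaving a hardware datapath
    replaces each delay element with m delay elements.
    Assumption: len(stream) // m is a power of two.
    """
    buf = list(stream)
    n = len(buf) // m              # FFT size per channel
    bits = n.bit_length() - 1
    half = n // 2
    while half >= 1:
        for start in range(0, n, 2 * half):
            for j in range(half):
                # Twiddle factor shared by all m channels at this position
                w = cmath.exp(-2j * cmath.pi * j / (2 * half))
                for c in range(m):
                    a = (start + j) * m + c
                    b = (start + j + half) * m + c
                    u, v = buf[a], buf[b]
                    buf[a] = u + v
                    buf[b] = (u - v) * w
        half //= 2
    # DIF leaves each channel in bit-reversed order; restore natural order
    out = [[0j] * n for _ in range(m)]
    for c in range(m):
        for k in range(n):
            out[c][bit_reverse(k, bits)] = buf[k * m + c]
    return out


# Example: two 4-point channels interleaved as ch0[0], ch1[0], ch0[1], ...
channels = [[1, 2, 3, 4], [0, 1, 0, -1]]
stream = [ch[t] for t in range(4) for ch in channels]
spectra = interleaved_fft(stream, m=2)
```

Each twiddle factor is computed once and reused across all channels, which is the kind of resource sharing that interleaved hardware designs aim for. The software loop order here is only one possible order of execution.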
Award ID(s):
1954749
PAR ID:
10379848
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2022 IEEE International Symposium on Circuits and Systems (ISCAS)
Page Range / eLocation ID:
142 to 146
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper addresses the design of a partly-parallel cascaded FFT-IFFT architecture that does not require any intermediate buffer. Folding can be used to design partly-parallel architectures for FFT and IFFT. While many cascaded FFT-IFFT architectures can be designed using various folding sets for the FFT and the IFFT, for a specified folded FFT architecture, there exists a unique folding set to design the IFFT architecture that does not require an intermediate buffer. Such a folding set is designed by processing the output of the FFT as soon as possible (ASAP) in the folded IFFT. Elimination of the intermediate buffer reduces latency and saves area. The proposed approach is also extended to interleaved processing of multi-channel time-series. The proposed FFT-IFFT cascade architecture saves about N/2 memory elements and N/4 clock cycles of latency compared to a design with identical folding sets. For the 2-interleaved FFT-IFFT cascade, the memory and latency savings are, respectively, N/2 units and N/2 clock cycles, compared to a design with identical folding sets. 
  2. Power consumption is one of the significant challenges in millimeter wave (mmWave) systems due to the need to support wide bandwidths and large numbers of antennas. This paper explores energy-efficient implementations of the baseband transceiver components for a multi-carrier 3GPP New Radio (NR) system. The analysis considers key components including channel selection filters, digital beamforming, and FFT engines for the OFDM processing. A methodology is presented for optimizing bit widths in various components, which is critical in low-power designs. Fully digital and analog beamforming architectures are also compared. Preliminary power estimates are provided using a TSMC 28 nm process for a 400 MHz system at 28 GHz, similar to 5G systems today, and a hypothetical 1.6 GHz system at 140 GHz for potential 6G deployment.
  3. We demonstrate a self-folding paper robot driven by capillary-force fluid flow. When water is sprayed on fluidic channels patterned on paper, the 2-D sheet of paper can be controllably self-folded into various 3-D structures: half-oval, circle, round-edge square, triangle, half-circle, and table. The self-folding paper sheet can be readily fabricated via a double-sided wax printing method, forming a bilayer structure of the fluidic channel and the hydrophobic wax, in which the two layers have different swelling/shrinking properties. The patterned paper folds when wetted and unfolds as the water evaporates, without being mechanically manipulated by external forces or moments. Finally, we create a paper gripper based on this self-folding actuation and use it to convey a low-weight object. This report demonstrates the possibility of paper microfluidics for self-folding actuation and soft robotics.
  4. High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles to process the polynomials. Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes three novel contributions. First, cascaded parallel NTT and iNTT architectures are proposed such that any buffer requirement for permuting the product of the NTTs before it is input to the iNTT is eliminated. This is achieved by using different folding sets for the NTTs and iNTT. Second, a novel approach to expand the set of feasible special moduli is presented where the moduli can be expressed in terms of a few signed power-of-two terms. Third, novel architectures for pre-processing for computing residual polynomials using the CRT and post-processing for combining the residual polynomials are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multiplications are ideal for applications that require low latency and high sample rate such as in the cloud, as these feed-forward architectures can be pipelined at arbitrary levels. Pipelining and latency tradeoffs are also investigated. Compared to a prior design, the proposed architecture reduces latency by a factor of 49.2, and the area-time products (ATP) for the lookup table and DSP, ATP(LUT) and ATP(DSP), respectively, by 89.2% and 92.5%. 
     Specifically, we show that for n = 4096 and a 180-bit coefficient, the proposed 2-parallel architecture requires 6.3 Watts of power while operating at 240 MHz, with 6 moduli, each of length 30 bits, using a Xilinx Virtex UltraScale+ FPGA.
  5. Massive multi-user (MU) multiple-input multiple-output (MIMO) provides high spectral efficiency by means of spatial multiplexing and fine-grained beamforming. However, conventional base-station (BS) architectures for systems with hundreds of antennas that rely on centralized baseband processing inevitably suffer from (i) excessive interconnect data rates between radio-frequency circuitry and processing fabrics, and (ii) prohibitive complexity at the centralized baseband processor. Recently, decentralized baseband processing (DBP) architectures and algorithms have been proposed, which mitigate the interconnect bandwidth and complexity bottlenecks. This paper systematically explores the design trade-offs between error-rate performance, computational complexity, and data transfer latency of DBP architectures under different system configurations and channel conditions. Considering architecture, algorithm, and numerical precision aspects, we provide practical guidelines to select the DBP architecture and algorithm that are able to realize the full benefits of massive MU-MIMO in the uplink and downlink.