Title: HF-LDPC: HLS-friendly QC-LDPC FPGA Decoder with High Throughput and Flexibility
LDPC (Low-Density Parity-Check) codes have become a cornerstone for transforming a noisy physical channel into a reliable and high-performance data channel in communication and storage systems. FPGA (Field-Programmable Gate Array) based LDPC hardware, especially for high-complexity decoding, is essential to realizing high-bandwidth channel prototypes. HLS (High-Level Synthesis) has been introduced to speed up the FPGA development of LDPC hardware by automatically compiling high-level abstract behavioral descriptions into RTL-level implementations, but the results are often sub-optimal because effective low-level descriptions are lacking. To overcome this problem, this paper proposes an HLS-friendly QC-LDPC FPGA decoder architecture, HF-LDPC, that employs HLS not only to precisely characterize high-level behaviors but also to effectively optimize the low-level RTL implementation, thus achieving both high throughput and flexibility. First, HF-LDPC designs a multi-unit framework with a balanced I/O-computing dataflow to adaptively match code parameters with FPGA configurations. Second, HF-LDPC presents a novel fine-grained task-level pipeline with interleaved updating to eliminate stalls due to data interdependence within each updating task. HF-LDPC also presents several HLS-enhanced approaches. We implement and evaluate HF-LDPC on a Xilinx U50, which demonstrates that HF-LDPC outperforms existing implementations by 4× to 84× with the same parameters and scales linearly up to 116 Gbps actual decoding throughput with high hardware efficiency.
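As a rough illustration of the quasi-cyclic structure that QC-LDPC decoders exploit, the sketch below expands a small base matrix of circulant shifts into a full binary parity-check matrix and computes a syndrome. The base matrix, the lifting size `z`, and the function names `expand_qc_ldpc` and `syndrome` are hypothetical choices for illustration; none of them is taken from the HF-LDPC paper, and the code says nothing about how HF-LDPC itself is implemented.

```python
def expand_qc_ldpc(base, z):
    """Expand a base matrix of circulant shifts into a binary parity-check matrix.

    base[i][j] == -1 denotes an all-zero z-by-z block (an assumed convention);
    a shift s >= 0 denotes the z-by-z identity cyclically shifted right by s columns.
    """
    rows = len(base) * z
    cols = len(base[0]) * z
    h = [[0] * cols for _ in range(rows)]
    for bi, block_row in enumerate(base):
        for bj, shift in enumerate(block_row):
            if shift < 0:
                continue  # all-zero block
            for r in range(z):
                # Row r of the shifted identity has its single 1 in column (r + shift) mod z.
                h[bi * z + r][bj * z + (r + shift) % z] = 1
    return h


def syndrome(h, word):
    """Return H * word over GF(2); an all-zero result means every parity check is satisfied."""
    return [sum(hij & w for hij, w in zip(row, word)) % 2 for row in h]


# Hypothetical 2x3 base matrix with lifting size 3 (not a code from any standard).
BASE = [[0, 1, -1],
        [-1, 0, 2]]
H = expand_qc_ldpc(BASE, 3)
```

Because every nonzero block is a shifted identity, a decoder only needs to store the shift values and can process `z` rows in parallel, which is the property that makes quasi-cyclic codes attractive for hardware.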
Award ID(s):
2008835
PAR ID:
10567295
Publisher / Repository:
IEEE
ISBN:
979-8-3503-4291-8
Page Range / eLocation ID:
566 to 573
Format(s):
Medium: X
Location:
Washington, DC, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. We present Quantum Belief Propagation (QBP), a Quantum Annealing (QA) based decoder design for Low Density Parity Check (LDPC) error control codes, which have found many useful applications in Wi-Fi, satellite communications, mobile cellular systems, and data storage systems. QBP reduces the LDPC decoding to a discrete optimization problem, then embeds that reduced design onto quantum annealing hardware. QBP's embedding design can support LDPC codes of block length up to 420 bits on real state-of-the-art QA hardware with 2,048 qubits. We evaluate performance on real quantum annealer hardware, performing sensitivity analyses on a variety of parameter settings. Our design achieves a bit error rate of 10⁻⁸ in 20 μs and a 1,500 byte frame error rate of 10⁻⁶ in 50 μs at SNR 9 dB over a Gaussian noise wireless channel. Further experiments measure performance over real-world wireless channels, requiring 30 μs to achieve a 1,500 byte 99.99% frame delivery rate at SNR 15-20 dB. QBP achieves a performance improvement over an FPGA based soft belief propagation LDPC decoder, by reaching a bit error rate of 10⁻⁸ and a frame error rate of 10⁻⁶ at an SNR 2.5 to 3.5 dB lower. In terms of limitations, QBP currently cannot realize practical protocol-sized (e.g., Wi-Fi, WiMax) LDPC codes on current QA processors. Our further studies in this work present future cost, throughput, and QA hardware trend considerations.
  2. We present Calyx, a new intermediate language (IL) for compiling high-level programs into hardware designs. Calyx combines a hardware-like structural language with a software-like control flow representation with loops and conditionals. This split representation enables a new class of hardware-focused optimizations that require both structural and control flow information which are crucial for high-level programming models for hardware design. The Calyx compiler lowers control flow constructs using finite-state machines and generates synthesizable hardware descriptions. We have implemented Calyx in an optimizing compiler that translates high-level programs to hardware. We demonstrate Calyx using two DSL-to-RTL compilers, a systolic array generator and one for a recent imperative accelerator language, and compare them to equivalent designs generated using high-level synthesis (HLS). The systolic arrays are 4.6× faster and 1.11× larger on average than HLS implementations, and the HLS-like imperative language compiler is within a few factors of a highly optimized commercial HLS toolchain. We also describe three optimizations implemented in the Calyx compiler.
  3. The design of heterogeneous systems that include domain specific accelerators is a challenging and time-consuming process. While taking into account area constraints, designers must decide which parts of an application to accelerate in hardware and which to leave in software. Moreover, applications in domains such as Extended Reality (XR) offer opportunities for various forms of parallel execution, including loop level, task level and pipeline parallelism. To assist the design process and expose every possible level of parallelism, we present Trireme, a fully automated tool-chain that explores multiple levels of parallelism and produces domain specific accelerator designs and configurations that maximize performance, given an area budget. FPGA SoCs were used as target platforms and Catapult HLS [7] was used to synthesize RTL using a commercial 12 nm FinFET technology. Experiments on demanding benchmarks from the XR domain revealed a speedup of up to 20×, as well as a speedup of up to 37× for smaller applications, compared to software-only implementations.
  4. Non-uniform message quantization techniques such as reconstruction-computation-quantization (RCQ) improve error-correction performance and decrease hardware complexity of low-density parity-check (LDPC) decoders that use a flooding schedule. Layered MinSum RCQ (L-msRCQ) enables message quantization to be utilized for layered decoders and irregular LDPC codes. We investigate field-programmable gate array (FPGA) implementations of L-msRCQ decoders. Three design methods for message quantization are presented, which we name the Lookup, Broadcast, and Dribble methods. The decoding performance and hardware complexity of these schemes are compared to a layered offset MinSum (OMS) decoder. Simulation results on a (16384, 8192) protograph-based raptor-like (PBRL) LDPC code show that a 4-bit L-msRCQ decoder using the Broadcast method can achieve a 0.03 dB improvement in error-correction performance while using 12% fewer registers than the OMS decoder. A Broadcast-based 3-bit L-msRCQ decoder uses 15% fewer lookup tables, 18% fewer registers, and 13% fewer routed nets than the OMS decoder, but results in a 0.09 dB loss in performance. 
  5. This paper presents a ternary low-density parity-check (LDPC) error correction system for wireless electrocardiogram sensors to improve the accuracy of arrhythmia classification. The classification system is based on ternary Delta-modulated bitstreams and rotation linear kernel support vector machines, which identifies the supraventricular ectopic beat (SVEB) and the ventricular ectopic beat (VEB) over the normal heartbeats. We model errors using a ternary symmetric channel with probability parameter p and construct a variety of ternary LDPC codes with different coding rates by concatenating two-component sub-matrices to form a parity-check matrix with a quasi-cyclic structure that facilitates the hardware design. In particular, a hardware-friendly LDPC encoder circuit is proposed that leverages the highly structured parity-check matrix to perform serial generation of the parity symbols using an accumulator and a look-up table. The encoder circuits are implemented on FPGA and synthesized on ASIC using a 32 nm CMOS process. Simulation results show that the ternary LDPC codes can significantly improve classification accuracy in the presence of errors. For example, with an error probability of up to 21% in the sensor output bitstreams, the classification accuracy remains above 99% with the proposed error correction system.
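Several of the entries above refer to MinSum-style LDPC decoding, and item 4 uses a layered offset MinSum (OMS) decoder as its baseline. As a minimal sketch of the textbook offset MinSum check-node update (not the implementation from any of the listed papers), the function below computes each outgoing message from the sign product and minimum magnitude of the other incoming messages. The function name `oms_check_node` and the offset value 0.5 are assumptions chosen for illustration.

```python
def oms_check_node(msgs, offset=0.5):
    """Offset MinSum check-node update for one check node of degree >= 2.

    msgs: incoming variable-to-check log-likelihood ratios.
    offset: correction subtracted from the magnitude (0.5 is an assumed value).
    Returns the outgoing check-to-variable messages, one per edge.
    """
    signs = [-1 if m < 0 else 1 for m in msgs]
    total_sign = 1
    for s in signs:
        total_sign *= s

    mags = [abs(m) for m in msgs]
    # Only the two smallest magnitudes are needed: the edge holding the
    # minimum receives the second minimum, every other edge receives the minimum.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)

    out = []
    for i in range(len(msgs)):
        mag = min2 if i == min1_idx else min1
        mag = max(mag - offset, 0.0)  # apply the offset, clamped at zero
        # Multiplying by signs[i] removes this edge's own sign from the product.
        out.append(total_sign * signs[i] * mag)
    return out


example = oms_check_node([2.0, -1.0, 3.0])
```

Tracking only the two smallest magnitudes and a single sign product is what keeps this update cheap in hardware, since the per-edge results follow from those three values.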