skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: FPGA Implementation of a Low Latency and High SFDR Direct Digital Synthesizer for Resource-Efficient Quantum-Enhanced Communication*
A Direct Digital Synthesizer (DDS) generates a sinusoidal signal, which is a significant component of many communication systems using modulation schemes. A CORDIC algorithm offers minimum memory requirements compared to look-up-based methods and low latency. The latency depends on the number of iterations, which is determined by the number of angles in the rotation set. However, it is necessary to maintain high spectral purity to optimize the overall system performance. To optimize the opportunity of quantum measurement, low latency and a high spectral purity sine wave generator is essential. The implementation of this design generates output with 64% latency reduction compared to that of the conventional CORDIC design and 72.2 dB SFDR value.  more » « less
Award ID(s):
1927674
PAR ID:
10287294
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
East-West Design & Test Symposium (EWDTS)
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. CRYSTAL-Kyber (Kyber) is one of the post-quantum cryptography (PQC) key-encapsulation mechanism (KEM) schemes selected during the standardization process. This paper addresses optimization for Kyber architecture with respect to latency and throughput constraints. Specifically, matrix-vector multiplication and number theoretic transform (NTT)-based polynomial multiplication are critical operations and bottle-necks that require optimization. To address this challenge, we propose an algorithm and hardware co-design approach to systematically optimize matrix-vector multiplication and NTT-based polynomial multiplication by employing a novel sub-structure sharing technique in order to reduce computational complexity, i.e., the number of modular multiplications and modular additions/subtractions consumed. The sub-structure sharing approach is inspired by prior fast parallel approaches based on polyphase decomposition. The proposed efficient feed-forward architecture achieves high speed, low latency, and full utilization of all hardware components, which can significantly enhance the overall efficiency of the Kyber scheme. The FPGA implementation results show that our proposed design, using the fast two-parallel structure, leads to an approximate reduction of 90% in execution time (μs) , along with a 66× improvement in throughput performance. 
    more » « less
  2. Interactive web services increasingly drive critical business workloads such as search, advertising, games, shopping, and finance. Whereas optimizing parallel programs and distributed server systems have historically focused on average latency and throughput, the primary metric for interactive applications is instead consistent responsiveness, i.e., minimizing the number of requests that miss a target latency. This paper is the first to show how to generalize work-stealing, which is traditionally used to minimize the makespan of a single parallel job, to optimize for a target latency in interactive services with multiple parallel requests. We design a new adaptive work stealing policy, called tail-control, that reduces the number of requests that miss a target latency. It uses instantaneous request progress, system load, and a target latency to choose when to parallelize requests with stealing, when to admit new requests, and when to limit parallelism of large requests. We implement this approach in the Intel Thread Building Block (TBB) library and evaluate it on real-world workloads and synthetic workloads. The tail-control policy substantially reduces the number of requests exceeding the desired target latency and delivers up to 58% relative improvement over various baseline policies. This generalization of work stealing for multiple requests effectively optimizes the number of requests that complete within a target latency, a key metric for interactive services. 
    more » « less
  3. Interactive mobile applications like web browsing and gaming are known to benefit significantly from low latency networking, as applications communicate with cloud servers and other users’ devices. Emerging mobile channel standards have not met these needs: general-purpose channels are greatly improving bandwidth but empirically offer little improvement for common latency-sensitive applications, and ultra-low-latency channels are targeted at only specific applications with very low bandwidth requirements. We explore a different direction for wireless channel design: utilizing two channels – one high bandwidth, one low latency – simultaneously for general-purpose applications. With a focus on web browsing, we design fine-grained traffic steering heuristics that can be implemented in a shim layer of the host network stack, effectively exploiting the high bandwidth and low latency properties of both channels. In the special case of 5G’s channels, our experiments show that even though URLLC offers just 0.2% of the bandwidth of eMBB, the use of both channels in parallel can reduce page load time by 26% to 59% compared to delivering traffic exclusively on eMBB. We believe this approach may benefit applications in addition to web browsing, may offer service providers incentives to deploy low latency channels, and suggests a direction for the design of future wireless channels. 
    more » « less
  4. Wireless communication over long distances has become the bottleneck for battery-powered, large-scale deployments. Low-power protocols like Zigbee and Bluetooth Low Energy have limited communication range, whereas long-range communication strategies like cellular and satellite networks are power-hungry. Technologies that use narrow-band communication like LoRa, SigFox, and NB-IoT have low spectral efficiency, leading to scalability issues. The goal of this work is to develop a communication framework that is energy efficient, long-range, and scalable. We propose, design, and prototype WiChronos, a communication paradigm that encodes information in the time interval between two narrowband symbols to drastically reduce the energy consumption in a wide area network with large number of senders. We leverage the low data-rate and relaxed latency requirements of such applications to achieve the desired features identified above. We design and implement chirp spread spectrum transmitter and receiver using off-the-shelf components to send the narrowband symbols. Based on our prototype, WiChronos achieves an impressive 60% improvement in battery life compared to state-of-the-art LPWAN technologies in transmission of payloads less than 10 bytes at experimentally verified distances of over 4 km. We also show that more than 1,000 WiChronos senders can co-exist with less than 5% collision probability under low traffic conditions. 
    more » « less
  5. In-memory processing offers a promising solution for enhancing the performance of data-intensive applications. While analog in-memory computing demonstrates remarkable efficiency, its limited precision is suitable only for approximate computing tasks. In contrast, digital in-memory computing delivers the deterministic precision necessary to accelerate high-assurance applications. Current digital in-memory computing methods typically involve manually breaking down arithmetic operations into in-memory compute kernels. In contrast, traditional digital circuits are synthesized through intricate and automated design workflows. In this article, we introduce a logic synthesis framework called LOGIC, which facilitates the translation of high-level applications into digital in-memory compute kernels that can be executed using non-volatile memory. We propose techniques for decomposing element-wise arithmetic operations into in-memory kernels while minimizing the number of in-memory operations. Additionally, we optimize the sequence of in-memory operations to reduce non-volatile memory utilization. To address the NP-hard execution sequencing optimization problem, we have developed twolook-aheadalgorithms that offer practical solutions. Additionally, we leverage data layout reorganization to efficiently accelerate applications that heavily rely on sparse matrix-vector multiplication operations. Our experimental evaluations demonstrate that our proposed synthesis approach improves the area and latency of fixed-point multiplication by 84% and 20% compared to the state-of-the-art, respectively. Moreover, when applied to scientific computing applications sourced from the SuiteSparse Matrix Collection, our design achieves remarkable improvements in area, latency, and energy efficiency by factors of 4.8×, 2.6×, and 11×, respectively. 
    more » « less