Title: A Single-Clock-Phase Sense Amplifier Architecture with 9x Smaller Clock-to-Q Delay Compared to the StrongARM & 6.3dB Lower Noise Compared to Double-Tail
A single-clock-phase sense amplifier architecture with strong regeneration is proposed. Designed in a 22nm FinFET process, the proposed architecture has a 9x smaller clock-to-Q (tCQ) delay than the conventional StrongARM latch and 6.3dB lower input-referred noise than the Double-Tail architecture for similar input transistor size and power consumption.
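For context, a rough back-of-the-envelope reading of the quoted noise figure, assuming the 6.3dB comparison refers to input-referred RMS noise voltage (a 20·log10 ratio):

$$10^{6.3/20} \approx 2.07$$

i.e., roughly a 2x reduction in input-referred noise voltage relative to the Double-Tail architecture (about 4.3x if the figure were instead a power ratio).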
Award ID(s): 2006571
NSF-PAR ID: 10355811
Author(s) / Creator(s): ; ;
Date Published:
Journal Name: International Symposium on VLSI Technology Systems and Application
ISSN: 2469-3863
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. High-speed long polynomial multiplication is important for applications in homomorphic encryption (HE) and lattice-based cryptosystems. This paper addresses low-latency hardware architectures for long polynomial modular multiplication using the number-theoretic transform (NTT) and inverse NTT (iNTT). Parallel NTT and iNTT architectures are proposed to reduce the number of clock cycles needed to process the polynomials. The Chinese remainder theorem (CRT) is used to decompose the modulus into multiple smaller moduli. Our proposed architecture, namely PaReNTT, makes three novel contributions. First, cascaded parallel NTT and iNTT architectures are proposed such that any buffer requirement for permuting the product of the NTTs before it is input to the iNTT is eliminated. This is achieved by using different folding sets for the NTTs and the iNTT. Second, a novel approach to expand the set of feasible special moduli is presented, where the moduli can be expressed in terms of a few signed power-of-two terms. Third, novel architectures for pre-processing (computing the residual polynomials using the CRT) and post-processing (combining the residual polynomials) are proposed. These architectures significantly reduce the area consumption of the pre-processing and post-processing steps. The proposed long modular polynomial multipliers are ideal for applications that require low latency and high sample rate, such as in the cloud, as these feed-forward architectures can be pipelined at arbitrary levels. Pipelining and latency tradeoffs are also investigated. Compared to a prior design, the proposed architecture reduces latency by a factor of 49.2 and reduces the area-time products (ATP) for lookup tables and DSPs, ATP(LUT) and ATP(DSP), by 89.2% and 92.5%, respectively. Specifically, we show that for n = 4096 and a 180-bit coefficient, the proposed 2-parallel architecture requires 6.3 W of power while operating at 240 MHz with 6 moduli, each of length 30 bits, on a Xilinx Virtex UltraScale+ FPGA.
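As a purely illustrative, software-only sketch of the arithmetic underlying the record above (not the PaReNTT hardware architecture), the Python below multiplies two polynomials mod (x^n - 1) by pointwise multiplication in the NTT domain and then uses the CRT to combine residues computed modulo two small primes; the parameters n, q1, q2 and the roots of unity are toy assumptions, far smaller than the paper's.

```python
# Toy NTT-based cyclic polynomial multiplication with CRT recombination.

def ntt(a, q, w):
    """Naive O(n^2) number-theoretic transform of a (mod q) using n-th root w."""
    n = len(a)
    return [sum(a[j] * pow(w, i * j, q) for j in range(n)) % q for i in range(n)]

def intt(A, q, w):
    """Inverse NTT: forward transform with w^-1, then scale by n^-1 (mod q)."""
    n = len(A)
    a = ntt(A, q, pow(w, q - 2, q))          # w^-1 via Fermat's little theorem
    n_inv = pow(n, q - 2, q)
    return [(x * n_inv) % q for x in a]

def poly_mul_mod(a, b, q, w):
    """Product of a and b mod (x^n - 1, q): pointwise multiply in the NTT domain."""
    A, B = ntt(a, q, w), ntt(b, q, w)
    return intt([(x * y) % q for x, y in zip(A, B)], q, w)

def crt_combine(r1, r2, q1, q2):
    """Lift residues mod q1 and mod q2 to residues mod q1*q2 (Garner's form)."""
    q1_inv = pow(q1, q2 - 2, q2)             # q1^-1 mod q2 (q2 prime)
    return [(x1 + q1 * (((x2 - x1) * q1_inv) % q2)) % (q1 * q2)
            for x1, x2 in zip(r1, r2)]

if __name__ == "__main__":
    n = 8
    q1, w1 = 17, 2                           # 2 is a primitive 8th root of unity mod 17
    q2, w2 = 97, 64                          # 64 is a primitive 8th root of unity mod 97
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 0, 0, 0, 0, 0]
    c1 = poly_mul_mod([x % q1 for x in a], [x % q1 for x in b], q1, w1)
    c2 = poly_mul_mod([x % q2 for x in a], [x % q2 for x in b], q2, w2)
    print(crt_combine(c1, c2, q1, q2))       # [5, 16, 34, 52, 45, 28, 0, 0]
```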
  2. Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real-world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip by graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms its traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns of a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near-Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory-intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption. We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework, which integrates 3D memory (like Micron's HMC) with a massive number of GPU cores, achieves 29.5% performance improvement and 30.03% less energy consumption on average compared to a conventional planar mesh-based design with external DRAM.
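As a rough, software-only illustration of the small-world idea the abstract relies on (not the paper's SWNoC design), the sketch below starts from a regular mesh and adds long-range links whose selection probability decays as a power law of hop distance; the grid size, exponent, and link budget are assumptions.

```python
# Power-law (small-world) long-range link insertion on top of a regular mesh.
import random

def manhattan(u, v):
    return abs(u[0] - v[0]) + abs(u[1] - v[1])

def small_world_links(width, height, extra_links, alpha=1.8, seed=0):
    rng = random.Random(seed)
    nodes = [(x, y) for x in range(width) for y in range(height)]
    # Start from a regular mesh: links between 4-neighbors.
    links = {(u, v) for u in nodes for v in nodes
             if u < v and manhattan(u, v) == 1}
    # Add long-range shortcuts with probability proportional to d^(-alpha).
    candidates = [(u, v) for u in nodes for v in nodes
                  if u < v and manhattan(u, v) > 1]
    weights = [manhattan(u, v) ** (-alpha) for u, v in candidates]
    for u, v in rng.choices(candidates, weights=weights, k=extra_links):
        links.add((u, v))
    return links

if __name__ == "__main__":
    links = small_world_links(4, 4, extra_links=8)
    print(len(links), "links;", sum(manhattan(u, v) > 1 for u, v in links), "long-range")
```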
  3. Building energy consumption is highly influenced by weather conditions, so having appropriate weather data is important for improving the accuracy of building energy models. Typically, local weather station data from the nearest airport or military base is used as the weather input. However, this is generally known to differ from the actual weather conditions experienced by an urban building, particularly considering that most weather stations are located far from urban areas. The use of the Weather Research and Forecasting model (WRF) coupled with an Urban Canopy Model (UCM) provides a means to predict more localized variations in weather conditions. However, one of the main challenges in assessing this model is the lack of ground-based weather station data with which to compare its results. This has generally limited the ability to assess the level of agreement between WRF-UCM weather predictions and measured weather data in urban locations. In this study, data from a network of 40 ground-based weather stations located in Austin, TX are compared to WRF-UCM-predicted weather data to assess similarities and differences between model-predicted results and actual measurements. Given that the WRF-UCM method also relies on many input parameters and assumptions, including the urban fraction, which can be measured at different scales, this work also considers the relative impact of the granularity of the urban fraction data on WRF-UCM-predicted weather. As a case study, a building energy model of a typical residential building is then developed and used to assess the differences in predicted building energy use and demands between the WRF-UCM weather and measured weather conditions during an extreme heatwave event in Austin, TX.
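As a hedged illustration of the model-versus-measurement comparison described above, the short Python sketch below computes bias and RMSE between WRF-UCM-predicted and station-measured temperature series; the numbers are made-up placeholders, not project data.

```python
# Simple bias/RMSE comparison between modeled and measured weather series.
import numpy as np

def compare_series(modeled, measured):
    err = np.asarray(modeled) - np.asarray(measured)
    return {"bias": err.mean(), "rmse": np.sqrt((err ** 2).mean())}

wrf_ucm_temp_c = [34.1, 35.0, 36.2, 37.8, 38.5]   # placeholder hourly values
station_temp_c = [33.5, 34.8, 36.6, 38.1, 39.0]
print(compare_series(wrf_ucm_temp_c, station_temp_c))
```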
  4. Summary

    Inductive power transfer has become an emerging technology due to its significant benefits in many applications, including mobile phones, laptops, electric vehicles, implanted bio-sensors, and internet of things (IoT) devices. In modern applications, a direct current–direct current (DC–DC) converter is one of the essential components for regulating the output supply voltage to achieve the desired characteristics, that is, a steady voltage with lower peak ripples. This paper presents a switched-capacitor (SC) DC–DC converter using a complementary architecture to provide a regulated DC voltage with an improved dynamic response. The proposed topology enhances the converter efficiency by halving the equivalent output resistance through connecting two symmetric SC single-ladder converters. The proposed converter is designed using a standard 130-nm BiCMOS process. The results show that the proposed architecture produces a 327-mV DC output with a rise time of 60.1 ns and consumes 3.449 nW of power from a 1.0-V DC supply. The output settling time is 43.6% lower than that of a single-stage SC DC–DC converter with an input frequency of 200 MHz. The comparison results show that the proposed converter has a higher power conversion efficiency of 93.87% and a lower power density of 0.57 mW/mm² compared to existing works.
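The efficiency claim above rests on a simple property of switched-capacitor converters: two identical stages driven on complementary phases act like parallel sources, halving the equivalent output resistance and therefore the load-induced droop. A minimal numerical sketch, with all component values assumed for illustration (they are not the paper's 130-nm design values):

```python
# Ideal SC converter droop model: V_out = V_no_load - I_load * R_eq, R_eq = R_single / n.
def sc_output_droop(v_no_load, i_load, r_out_single, n_parallel=1):
    r_eq = r_out_single / n_parallel
    return v_no_load - i_load * r_eq, r_eq

v_nl, i_load, r_single = 0.35, 1e-6, 20e3   # 0.35 V target, 1 uA load, 20 kOhm (assumed)
print(sc_output_droop(v_nl, i_load, r_single, n_parallel=1))  # single SC ladder
print(sc_output_droop(v_nl, i_load, r_single, n_parallel=2))  # complementary pair: half R_eq, half droop
```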

     
  5. In this article, we present a low-energy inference method for convolutional neural networks in image classification applications. The lower energy consumption is achieved by using a highly pruned (lower-energy) network whenever the resulting network can provide a correct output. More specifically, the proposed inference method makes use of two pruned neural networks (NNs), namely mildly and aggressively pruned networks, which are both designed offline. In the system, a third NN uses the input data for the online selection of the appropriate pruned network. The third network, for its feature extraction, employs the same convolutional layers as those of the aggressively pruned NN, thereby reducing the overhead of the online management. The proposed method induces some accuracy loss, but for a given level of accuracy, its energy gain is considerably larger than that of employing any single pruning level. The proposed method is independent of both the pruning method and the network architecture. The efficacy of the proposed inference method is assessed on the Eyeriss hardware accelerator platform for some state-of-the-art NN architectures. Our studies show that this method may provide, on average, 70% energy reduction compared to the original NN at the cost of about 3% accuracy loss on the CIFAR-10 dataset.
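A hedged, toy sketch of the inference flow described above: a cheap path reuses the aggressively pruned network's features, and a selector routes hard inputs to the mildly pruned network. Here the selector is stood in for by a simple max-softmax confidence rule over random toy weights, whereas the paper uses a third, trained NN; all names and thresholds are placeholders.

```python
# Dual-pruned-network inference with an online routing decision (toy version).
import numpy as np

rng = np.random.default_rng(0)
W_feat = rng.standard_normal((32, 16))    # conv-feature stand-in, shared by the selector
W_aggr = rng.standard_normal((16, 10))    # aggressively pruned classifier head
W_mild1 = rng.standard_normal((32, 64))   # mildly pruned (larger) network, layer 1
W_mild2 = rng.standard_normal((64, 10))   # mildly pruned network, layer 2

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def infer(x, confidence_threshold=0.6):
    feats = np.tanh(x @ W_feat)                    # features reused by the selector
    probs = softmax(feats @ W_aggr)                # cheap, low-energy prediction
    if probs.max() >= confidence_threshold:        # selector: "easy" input, keep cheap path
        return probs.argmax(), "aggressive"
    hidden = np.tanh(x @ W_mild1)                  # fall back to the larger pruned network
    return softmax(hidden @ W_mild2).argmax(), "mild"

if __name__ == "__main__":
    x = rng.standard_normal(32)
    print(infer(x))
```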