skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: MPAM: Reliable, Low-Latency, Near-Threshold-Voltage Multi-Voltage/Frequency-Domain Network-on-Chip with Metastability Risk Prediction and Mitigation
Emerging applications like a drone and an autonomous vehicle require system-on-a-chips (SoCs) with high reliability, e.g., the mean-time-between-failure (MTBF) needs to be over tens of thousands of hours [1]. Meanwhile, as these applications require increasingly higher performance and energy efficiency, a multi-core architecture is often desirable. Here, each core operates in an independent voltage/frequency (V/F) domain, ideally from the near-threshold voltage (NTV) to super-threshold, while communicating with one another via a network-on-chip (NoC) [2]. However, this makes it challenging to ensure robustness in clock domain crossing against metastability. Metastability becomes even more critical to NTV circuits since metastability resolution time constant T grows super-linearly with voltage scaling [3]. Conventionally, an NoC uses multi-stage (4 stages in [4]) synchronizers to improve MTBF, but they increase latency and cannot completely eliminate metastability. Recently, [5] proposed a novel NTV flip-flop, which has a lower probability of having metastability. Another recent work [6] proposed to detect the necessary condition of metastability and mitigate it by modulating the RX clock and also requesting retransmission to guarantee data correctness. However, as it detects a necessary condition, not actual metastability, it tends to overly request retransmission, hurting latency, throughput, and energy efficiency.  more » « less
Award ID(s):
1919147
PAR ID:
10342207
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
2022 IEEE Custom Integrated Circuits Conference (CICC)
Page Range / eLocation ID:
1 to 2
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Emerging embedded systems, such as autonomous robots/vehicles, demand a new system-on-a-chip (SoC) that is ultra-low power (mW or even sub-mW level) but highly robust. Such an SoC typically integrates heterogeneous building blocks for supporting a range of features, each ideally operating in an independent voltage and frequency (V/F) domain [1]. In such an architecture, a network-on-chip (NoC) has played a key role to enable high-speed and energy-efficient networking. However, it is increasingly challenging to meet a robustness target since each V/F domain uses a significantly different voltage, e.g., from nominal 1V to near-threshold voltage (NTV), and clock frequency, e.g., from hundreds of MHz to sub-MHz. Furthermore, any two clocks may have uncertain and time-varying phase and frequency relationships. These properties significantly worsen robustness, particularly metastability, in an NoC. 
    more » « less
  2. Network-on-Chips (NoCs) have emerged as the standard on-chip communication fabrics for multi/many core systems and system on chips. However, as the number of cores on chip increases, so does power consumption. Recent studies have shown that NoC power consumption can reach up to 40% of the overall chip power. Considerable research efforts have been deployed to significantly reduce NoC power consumption. In this paper, we build on approximate computing techniques and propose an approximate communication methodology called DEC-NoC for reducing NoC power consumption. The proposed DEC-NoC leverages applications' error tolerance and dynamically reduces the amount of error checking and correction in packet transmission, which results in a significant reduction in the number of retransmitted packets. The reduction in packet retransmission results in reduced power consumption. Our cycle accurate simulation using PARSEC benchmark suites shows that DEC-NoC achieves up to 56% latency reduction and up to 58% dynamic power reduction compared to NoC architectures with conventional error control techniques. 
    more » « less
  3. Energy-efficient microprocessors are essential for a wide range of applications. While near-threshold computing is a promising technique to improve energy efficiency, optimal supply demands from logic core and on-chip memory are conflicting. In this paper, we perform static reliability analysis of 6T SRAM and discover the variance among different sizing configuration and asymmetric minimum voltage requirements between read and write operations. We leverage this asymmetric property i n near-threshold processors equipped with voltage boosting capability by proposing an opportunistic dual-supply switching scheme with a write aggregation buffer. Our results show that proposed technique improves energy efficiency by more than 21.45% with approximate 10.19% performance speed-up. 
    more » « less
  4. Relaxed retention (or volatile) spin-transfer torque RAM (STT-RAM) has been widely studied as a way to reduce STT-RAM's write energy and latency overheads. Given a relaxed retention time STT-RAM level one (L1) cache, we analyze the impacts of dynamic voltage and frequency scaling (DVFS)---a common optimization in modern processors---on STT-RAM L1 cache design. Our analysis reveals that, apart from the fact that different applications may require different retention times, the clock frequency, which is typically ignored in most STT-RAM studies, may also significantly impact applications' retention time needs. Based on our findings, we propose an asymmetric-retention core (ARC) design for multicore architectures. ARC features retention time heterogeneity to specialize STT-RAM retention times to applications' needs. We also propose a runtime prediction model to determine the best core on which to run an application, based on the applications' characteristics, their retention time requirements, and available DVFS settings. Results reveal that the proposed approach can reduce the average cache energy by 20.19% and overall processor energy by 7.66%, compared to a homogeneous STT-RAM cache design. 
    more » « less
  5. Organizations managing high-performance computing systems face a multitude of challenges, including overarching concerns such as overall energy consumption, microprocessor clock frequency limitations, and the escalating costs associated with chip production. Evidently, processor speeds have plateaued over the last decade, persisting within the range of 2 GHz to 5 GHz. Scholars assert that brain-inspired computing holds substantial promise for mitigating these challenges. The spiking neural network (SNN) particularly stands out for its commendable power efficiency when juxtaposed with conventional design paradigms. Nevertheless, our scrutiny has brought to light several pivotal challenges impeding the seamless implementation of large-scale neural networks (NNs) on silicon. These challenges encompass the absence of automated tools, the need for multifaceted domain expertise, and the inadequacy of existing algorithms to efficiently partition and place extensive SNN computations onto hardware infrastructure. In this paper, we posit the development of an automated tool flow capable of transmuting any NN into an SNN. This undertaking involves the creation of a novel graph-partitioning algorithm designed to strategically place SNNs on a network-on-chip (NoC), thereby paving the way for future energy-efficient and high-performance computing paradigms. The presented methodology showcases its effectiveness by successfully transforming ANN architectures into SNNs with a marginal average error penalty of merely 2.65%. The proposed graph-partitioning algorithm enables a 14.22% decrease in inter-synaptic communication and an 87.58% reduction in intra-synaptic communication, on average, underscoring the effectiveness of the proposed algorithm in optimizing NN communication pathways. Compared to a baseline graph-partitioning algorithm, the proposed approach exhibits an average decrease of 79.74% in latency and a 14.67% reduction in energy consumption. Using existing NoC tools, the energy-latency product of SNN architectures is, on average, 82.71% lower than that of the baseline architectures. 
    more » « less