Connecting two surface-code patches can introduce significantly higher noise at the interface. We show, via circuit-level simulations under a depolarizing noise model with idle errors, that surface codes remain fault tolerant despite substantially elevated interface error rates. Specifically, we compare three strategies (direct noisy links, gate teleportation, and a CAT-state gadget) for both rotated and unrotated surface codes, and demonstrate that careful design can mitigate hook errors in each case so that the full code distance is preserved for both X and Z. Although these methods differ in space and time overhead and performance, each offers a viable route to modular surface-code architectures. Our results, obtained with stim and pymatching, confirm that high-noise interfaces can be integrated fault-tolerantly without compromising the code's essential properties, indicating that fault-tolerant scaling of error-corrected modular devices is within reach with current technology.
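As a concrete illustration of the kind of circuit-level memory experiment such a study builds on, the sketch below simulates a single rotated surface-code patch under uniform depolarizing noise with stim and decodes it with pymatching. The distance, round count, and noise rates are placeholders, and the high-noise interface, gate teleportation, and CAT-state gadget described in the abstract are not modeled here.

```python
# Minimal rotated-surface-code memory experiment with stim + pymatching.
# Uniform circuit-level depolarizing noise; all parameters are illustrative only.
import numpy as np
import stim
import pymatching

def logical_error_rate(distance: int, rounds: int, p: float, shots: int = 100_000) -> float:
    # Generate a standard rotated-memory-Z circuit with depolarizing noise.
    circuit = stim.Circuit.generated(
        "surface_code:rotated_memory_z",
        distance=distance,
        rounds=rounds,
        after_clifford_depolarization=p,
        before_round_data_depolarization=p,
        before_measure_flip_probability=p,
        after_reset_flip_probability=p,
    )
    # Build a matching decoder from the circuit's detector error model.
    dem = circuit.detector_error_model(decompose_errors=True)
    matcher = pymatching.Matching.from_detector_error_model(dem)
    # Sample detection events and the true logical observable flips.
    sampler = circuit.compile_detector_sampler()
    detections, observables = sampler.sample(shots, separate_observables=True)
    # Decode and count shots where the predicted observable is wrong.
    predictions = matcher.decode_batch(detections)
    errors = np.any(predictions != observables, axis=1).sum()
    return errors / shots

for d in (3, 5, 7):
    print(d, logical_error_rate(distance=d, rounds=d, p=1e-3))
```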
Performance and power modeling and prediction using MuMMI and 10 machine learning methods
Summary: Energy-efficient scientific applications require insight into how high-performance computing system features impact the applications' power and performance. This insight can result from the development of performance and power models. In this article, we use the modeling and prediction tool MuMMI (Multiple Metrics Modeling Infrastructure) and 10 machine learning methods to model and predict performance and power consumption and compare their prediction error rates. We use an algorithm-based fault-tolerant linear algebra code and a multilevel checkpointing fault-tolerant heat distribution code to conduct our modeling and prediction study on the Cray XC40 Theta and IBM BG/Q Mira at Argonne National Laboratory and the Intel Haswell cluster Shepard at Sandia National Laboratories. Our experimental results show that the prediction error rates in performance and power using MuMMI are less than 10% for most cases. By utilizing the models for runtime, node power, CPU power, and memory power, we identify the most significant performance counters for potential application optimizations, and we predict theoretical outcomes of the optimizations. Based on two collected datasets, we analyze and compare the prediction accuracy in performance and power consumption using MuMMI and 10 machine learning methods.
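The comparison methodology can be pictured with the toy sketch below, which fits a handful of common regressors to hypothetical performance-counter features and reports each model's prediction error as a mean absolute percentage error. The counters, data, and model set are illustrative placeholders, not MuMMI's actual inputs or the 10 methods evaluated in the article.

```python
# Sketch: comparing regression methods for power/runtime prediction from
# performance counters. Feature names and data are hypothetical placeholders,
# not MuMMI's actual counters or interface.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 500
# Hypothetical counter matrix: instructions, cache misses, branch misses, DRAM accesses.
X = rng.uniform(size=(n, 4))
# Hypothetical node-power target with a weak nonlinearity plus measurement noise.
y = 80 + 40 * X[:, 0] + 15 * X[:, 1] ** 2 + 5 * X[:, 3] + rng.normal(0, 2, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "svr": SVR(C=10.0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    mape = mean_absolute_percentage_error(y_test, model.predict(X_test))
    print(f"{name}: {100 * mape:.2f}% prediction error")
```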
- PAR ID: 10443320
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Concurrency and Computation: Practice and Experience
- Volume: 35
- Issue: 15
- ISSN: 1532-0626
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Power modeling is an essential building block for computer systems in support of energy optimization, energy profiling, and energy-aware application development. We introduce VESTA, a novel approach to modeling the power consumption of applications with one key insight: language runtime events are often correlated with a sustained level of power consumption. When compared with the established approach of power modeling based on hardware performance counters (HPCs), VESTA has the benefit of solely requiring application-scoped information and enabling a higher level of explainability, while achieving comparable or even higher precision. Through experiments performed on 37 real-world applications on the Java Virtual Machine (JVM), we find the power model built by VESTA is capable of predicting energy consumption with a mean absolute percentage error of 1.56%, while the monitoring of language runtime events incurs small performance and energy overhead.
- Predicting coarse-grain variations in workload behavior during execution is essential for dynamic resource optimization of processor systems. Researchers have proposed various methods to first classify workloads into phases and then learn their long-term phase behavior to predict and anticipate phase changes. Early studies on phase prediction proposed table-based phase predictors. More recently, simple learning-based techniques such as decision trees have been explored. However, more recent advances in machine learning have not been applied to phase prediction so far. Furthermore, existing phase predictors have been studied only in connection with specific phase classifiers even though there is a wide range of classification methods. Early work in phase classification proposed various clustering methods that required access to source code. Some later studies used performance monitoring counters, but they only evaluated classifiers for specific contexts such as thermal modeling. In this work, we perform a comprehensive study of source-oblivious phase classification and prediction methods using hardware counters. We adapt classification techniques that were used with different inputs in the past and compare them to state-of-the-art hardware-counter-based classifiers. We further evaluate the accuracy of various phase predictors when coupled with different phase classifiers and evaluate a range of advanced machine learning techniques, including SVMs and LSTMs, for workload phase prediction. We apply classification and prediction approaches to SPEC workloads running on an Intel Core-i9 platform. Results show that a two-level k-means clustering combined with SVM-based phase change prediction provides the best tradeoff between accuracy and long-term stability. Additionally, the SVM predictor reduces the average prediction error by 80% when compared to a table-based predictor. (A toy sketch of this classification-plus-prediction pipeline appears after this list.)
- Datacenter capacity is growing exponentially to satisfy the increasing demand for many emerging computationally-intensive applications, such as deep learning. This trend has led to concerns over datacenters' increasing energy consumption and carbon footprint. The most basic prerequisite for optimizing a datacenter's energy- and carbon-efficiency is accurately monitoring and attributing energy consumption to specific users and applications. Since datacenter servers tend to be multi-tenant, i.e., they host many applications, server- and rack-level power monitoring alone does not provide insight into the energy usage and carbon emissions of their resident applications. At the same time, current application-level energy monitoring and attribution techniques are intrusive: they require privileged access to servers and necessitate coordinated support in hardware and software, neither of which is always possible in cloud environments. To address the problem, we design WattScope, a system for non-intrusively estimating the power consumption of individual applications using external measurements of a server's aggregate power usage and without requiring direct access to the server's operating system or applications. Our key insight is that, based on an analysis of production traces, the power characteristics of datacenter workloads, e.g., low variability, low magnitude, and high periodicity, are highly amenable to disaggregation of a server's total power consumption into application-specific values. WattScope adapts and extends a machine learning-based technique for disaggregating building power and applies it to server- and rack-level power meter measurements that are already available in datacenters. We evaluate WattScope's accuracy on a production workload and show that it yields high accuracy, e.g., often under ~10% normalized mean absolute error, and is thus a potentially useful tool for datacenters in externally monitoring application-level power usage. (A simplified, illustrative disaggregation sketch appears after this list.)
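For the phase classification and prediction item above, the toy sketch below shows the general shape of such a pipeline: hardware-counter samples are clustered into phases with k-means (single-level here, for brevity) and an SVM predicts the next phase from a short history of phase labels. The counter data, window length, and cluster count are synthetic placeholders rather than the paper's actual setup.

```python
# Toy sketch of counter-based phase classification (k-means) followed by
# SVM-based next-phase prediction. All data and parameters are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Hypothetical per-interval hardware-counter samples (IPC, cache misses, ...).
samples = rng.uniform(size=(2000, 6))

# Phase classification: cluster counter vectors into k phases.
k = 4
phases = KMeans(n_clusters=k, n_init=10, random_state=1).fit_predict(samples)

# Phase prediction: predict the next phase label from a window of recent labels.
window = 8
X = np.array([phases[i : i + window] for i in range(len(phases) - window)])
y = phases[window:]
split = int(0.7 * len(X))
predictor = SVC(kernel="rbf", C=1.0).fit(X[:split], y[:split])
accuracy = (predictor.predict(X[split:]) == y[split:]).mean()
print(f"next-phase prediction accuracy: {accuracy:.2%}")
```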
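For the WattScope item, the sketch below illustrates the underlying disaggregation idea in its simplest form: an aggregate server power trace is split into per-application shares by non-negative least squares over per-application activity signals. This is a simplified stand-in on synthetic data, not WattScope's actual machine-learning model.

```python
# Toy power disaggregation: attribute an aggregate server power trace to
# applications via non-negative least squares over per-app activity signals.
# Synthetic data; not WattScope's actual model.
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
t = 600  # one sample per second over 10 minutes
# Hypothetical per-application activity signals (e.g., CPU utilization).
activity = rng.uniform(size=(t, 3))
idle = np.ones((t, 1))  # constant column captures idle/baseline power
A = np.hstack([idle, activity])

# Synthetic "measured" aggregate power: baseline + per-app dynamic power + noise.
true_watts = np.array([120.0, 45.0, 30.0, 15.0])
aggregate = A @ true_watts + rng.normal(0, 1.5, t)

# Fit non-negative per-app power coefficients, then attribute power over time.
coeffs, _ = nnls(A, aggregate)
per_app_power = A * coeffs  # columns: idle, app0, app1, app2 (watts over time)
print("estimated coefficients (watts):", np.round(coeffs, 1))
```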