Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
                                            Some full text articles may not yet be available without a charge during the embargo (administrative interval).
                                        
                                        
                                        
                                            
                                                
                                             What is a DOI Number?
                                        
                                    
                                
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
- 
            Computer systems research heavily relies on simulation tools like gem5 to effectively prototype and validate new ideas. However, publicly available simulators struggle to accurately model systems as architectures evolve rapidly. This is a major issue because incorrect simulator models may lead researchers to draw misleading or even incorrect conclusions about their research prototypes from these simulators. Although this challenge pertains to many open source simulators, we focus on the widely used, open source gem5 simulator. In GAP we showed that gem5’s GPGPU models have significant correlation issues versus real hardware. GAP also improved the fidelity of gem5’s AMDGPU model, particularly for cache access latencies and bandwidths. However, one critical issue remains: our microbenchmarks reveal 88% error in memory bandwidth between gem5’s current model and corresponding real AMD GPUs. To narrow this gap, we examined recent patents and gem5’s memory system bottlenecks, then made several improvements including: utilizing a redesigned HBM memory controller, enhancing TLB request coalescing, adding support for multiple page sizes, adding a page walk cache, and improving network bandwidth modeling. Collectively, these optimizations significantly improve gem5’s GPU memory bandwidth by 3.8x: from 153 GB/s to 583 GB/s. Moreover, our address translation enhancements can be ported to other ISAs where similar support is also needed, improving gem5’s MMU support.more » « lessFree, publicly-accessible full text available June 22, 2026
- 
            In recent years deep neural networks (DNNs) have emerged as an important application domain driving the requirements for future systems. As DNNs get more sophisticated, their compute requirements and the datasets they are trained on continue to grow at a fast rate. For example, Gholami showed that compute in Transformer networks grew 750X over 2 years, while other work projects DNN compute and memory requirements to grow by 1.5X per year. Given their growing requirements and importance, heterogeneous systems often add machine learning (ML) specific features (e.g., TensorCores) to improve their efficiency. However, given ML’s voracious rate of growth and size, there is a growing challenge in performing early-system exploration based on sound simulation methodology. In this work we discuss our efforts to enhance gem5’s support to make these workloads practical to run while retaining accuracy.more » « less
- 
            The breakdown in Moore’s Law and Dennard Scaling is leading to drastic changes in the makeup and constitution of computing systems. For example, a single chip integrates 10-100s of cores and has a heterogeneous mix of general-purpose compute engines and highly specialized accelerators. Traditionally, computer architects have relied on tools like architectural simulators (e.g., Accel-Sim, gem5, gem5-SALAM, GPGPU-Sim, MGPUSim, Sniper-Sim, and ZSim) to accurately perform early stage prototyping and optimizations for the proposed research. However, as systems become increasingly complex and heterogeneous, architectural tools are straining to keep up. In particular, publicly available architectural simulators are often not very representative of the industry parts they intend to represent. This leads to a mismatch in expectations; when prototyping new optimizations in gem5 users may draw the wrong conclusions about the efficacy of proposed optimizations if the tool’s models do not provide high fidelity. In this work, we focus on the gem5 simulator, the most popular platform for computer system simulation. In recent years gem5 has been used by ∼20% of simulation-based papers published in top-tier computer architecture conferences per year. Moreover, gem5 can run entire systems, including CPUs, GPUs, and accelerators as well as the operating system, runtime, network and other related components (including multiple ISAs). Thus, gem5 has the potential to allow users to study the behavior of the entire heterogeneous systems. Unfortunately, some of gem5’s models do not always provide high accuracy relative to their ”real” counterparts. In particular, although gem5’s GPU model provides high accuracy internally at AMD [9], the publicly available gem5 GPU model is often inaccurate, especially for the memory subsystem. To understand this, we designed a series of microbenchmarks designed to expose the latencies, bandwidths, and sizes of a variety of GPU components on real AMD GPUs. Our results showed that while gem5’s GPU microarchitecture was relatively accurate (within 5-10% in most cases), gem5’s memory subsytem was off by an average of 272% (645% max) for latency and 70% (693% max) for bandwidth. Accordingly, to help bridge this divide, we propose to design and use a new tool, GPU Accuracy Profiler (GAP), to compare and improve the behavior of gem5’s simulated GPUs relative to real GPUs. By iteratively applying fixes and improvements to gem’s GPU model via GAP, we will significantly improved its fidelity relative to real AMD GPUs. Although this work is still ongoing, our preliminary results show significant promise: on average 25% error for latency and 16% error for bandwidth, respectively. Overall, by completing this work we hope to enable more widespread adoption of gem5 as an accurate platform for heterogeneous architecture research.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                     Full Text Available
                                                Full Text Available