Search for: All records

Award ID contains: 1925485

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Implementing Support for Extensible Power Modeling in gem5

Smith, Alex; Sinclair, Matthew D (June 2025, 6th gem5 Users Workshop)

Power consumption has increasingly become a first-class design constraint to satisfy requirements for scientific workloads and other widely used workloads, such as machine learning. To meet performance and power requirements, system designers often use architectural simulators, such as gem5, to model component and system-level behavior. However, performance and power modeling tools are often isolated and do not make it accessible to integrate with one another for rapid performance and power system co-design. Although studies have previously explored power modeling with gem5 and validation on real hardware, there are several flaws with this approach. First, power models are sometimes not open source, making it difficult to apply them to different simulated systems. The current interface for implementing power models in gem5 also relies on hard-coded strings provided by the user to model dynamic and static power. This makes defining power models for components cumbersome and restrictive, as gem5’s MathExpr string formula parser has support for limited mathematical operations. Third, previous works only implement one form of power model for one component. This unnecessarily limits users from combining other power models, which may model certain system components with higher accuracy. Instead, we posit that decoupling how power models are integrated with simulators from the design of power models themselves will enable better power modeling in simulators. Accordingly, we extend our prior work on designing and implementing an extensible, generalizable power modeling interface by integrating support for McPAT into it and validating it emits correct power values.
more » « less
Free, publicly-accessible full text available June 22, 2026
Closing the Gap: Improving the Accuracy of gem5’s GPU Models

Vishnu Ramadas; Daniel Kouchekinia; Ndubuisi Osuji; Matthew D. Sinclair (June 2023, 5th gem5 Users' Workshop)

In recent years, we have been enhancing and updating gem5’s GPU support, including enhanced gem5’s GPU support to enable running ML workloads. Moreover, we created, validated, and released a Docker image with the proper software and libraries needed to run AMD’s GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing for a variety of GPU workloads. Current gem5 support focuses on Carrizo- and Vega-class GPUs. Unfortunately, these models do not always provide high accuracy relative to the equivalent ”real” GPUs. This leads to a mismatch in expectations: when prototyping new optimizations in gem5 users may draw the wrong conclusions about the efficacy of proposed optimizations if gem5’s GPU models do not provide high fidelity. Accordingly, to help bridge this divide, we design a series of micro-benchmarks designed expose the latencies, bandwidths, and sizes of a variety of GPU components on real GPUs. By iteratively applying fixes and improvements to gem’s GPU model, we significantly improve its fidelity relative to real AMD GPUs.
more » « less
Full Text Available
Improving the Speed of gem5’s GPU Regression Tests

James Braun; Matthew D. Sinclair (June 2023, 5th gem5 Users' Workshop)

n recent years, we have been enhancing and updating gem5’s GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the proper software and libraries needed to run GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, in an effort to provide sufficient coverage, the cur- rent testing support for GPU tests requires significant runtime both for the nightly and weekly regression tests. Currently most of these regression tests test the GPU SE mode support, since GPU FS mode support is still nascent. Unfortunately, much of this time is spent parsing input files to create arrays and other data structures that the GPU subsequently computes on. Although SE mode does not simulate the system calls needed to read these input files, nevertheless this still represents a significant overhead that increases runtime and prevents other tests (potentially providing additional coverage) from being run in that same timeframe. In an effort to address this, in the work we have been working on utilizing SE mode’s avoiding modeling system calls to speed up the runtime of the GPU regression tests. Specifically, we redesign the input reading phase of these GPU tests to create and use mmap’d files for their input arrays (which SE mode completes all at once) instead of reading in the files entry by entry. In doing so, we see significant reductions in runtime of at least 29%
more » « less
Analyzing the Benefits of More Complex Cache Replacement Policies in Moderns GPU LLCs

Jarvis Jia; Matthew D. Sinclair (June 2023, 5th gem5 Users' Workshop)

The gem5 simulator offers Classic and Ruby as two separate memory models for simulating on-chip caches. The Classic model, which originated from M5, is a quick and simple option that allows for easy configuration, but only supports a basic MOESI coherence protocol. On the other hand, the Ruby model, which was developed by GEMS [2], is a more advanced and flexible option that can accurately simulate a wider range of cache coherence protocols and features. However, choosing between the two memory system models in gem5 is challenging for researchers as each has advantages and limitations which can be inconvenient. In particular, this has led to a bifurcation of effort where prior work has added replacement policies to Classic and Ruby in parallel – duplicating effort unnecessarily and preventing users from using a desired replacement policy if it is not implemented in the desired memory model (e.g., users could only use RRIP in Classic). Accordingly, we merged the cache replacement policies from Classic to Ruby, enabling users to use any of the replacement policies in either memory model. Gem5 currently has the capability to support 13 replacement policies, which can be used exchangeable within the Classic and Ruby cache models, including commonly used options like LRU, FIFO, PseudoLRU, and different types of RRIPs. After combining the replacement policies for the Classic and Ruby cache models, we designed and integrated (into gem5’s nightly regressions) multiple corner case tests to verify and ensure the continued correct functionality of these policies. Through these tests, we identified and fixed several bugs to ensure that the replacement policies operate correctly. Finally, with the newly enabled and verified functionality, since there is limited information about how different replacement policies affects GPU performance, we decided to use gem5 to study these policies in a GPU context. Specifically, we study GPU L2 caches, since GPU L1 caches are often used to stream data through and thus are unlikely to be significantly impacted by replacement policy.
more » « less
Full Text Available
Improving gem5’s GPUFS Support

Vishnu Ramadas; Matthew Poremba; Bradford M. Beckmann; Matthew D. Sinclair (June 2023, 5th gem5 Users' Workshop)

With the waning of Moore’s Law and the end of Dennard’s Scaling, systems are turning towards heterogeneity, mixing conventional cores and specialized accelerators to continue scaling performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to data-centers, hyper-scalars, and supercomputers are increasingly using large numbers of accelerators to provide better efficiency than CPU-based solutions. However, heterogeneous systems face key challenges: changes to the underlying technology which threaten continued scaling, as well as the voracious scaling from applications, which require additional research to address. Traditionally, simulators could be used to perform early exploration for this research. However, existing simulators lack important support for these key challenges. Detailed simulation of modern systems can take extremely long times in existing tools and infrastructure. Furthermore, prototyping optimizations at scale can also be challenging, especially for newly proposed accelerators. Although other simulators such as Accel-Sim, SCALE-Sim, and Gemmini enable some early experiments, they are limited in their ability to target a wide variety of accelerators. In comparison, gem5 has support for various CPUs, GPUs, DSPs, and many other important accelerators. However, efficiently simulating large-scale workloads on gem5’s cycle-level models requires prohibitively long times. We aim to enhance gem5’s support to make running these workloads practical while retaining accuracy.
more » « less
Improving the Speed of gem5’s GPU Regression Tests

James Braun; Matthew D. Sinclair (June 2023, 5th gem5 Users' Workshop)

In recent years, we have been enhancing and updating gem5’s GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the proper software and libraries needed to run GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, in an effort to provide sufficient coverage, the cur- rent testing support for GPU tests requires significant runtime both for the nightly and weekly regression tests. Currently most of these regression tests test the GPU SE mode support, since GPU FS mode support is still nascent. Unfortunately, much of this time is spent parsing input files to create arrays and other data structures that the GPU subsequently computes on. Although SE mode does not simulate the system calls needed to read these input files, nevertheless this still represents a significant overhead that increases runtime and prevents other tests (potentially providing additional coverage) from being run in that same timeframe. In an effort to address this, in the work we have been working on utilizing SE mode’s avoiding modeling system calls to speed up the runtime of the GPU regression tests. Specifically, we redesign the input reading phase of these GPU tests to create and use mmap’d files for their input arrays (which SE mode completes all at once) instead of reading in the files entry by entry. In doing so, we see significant reductions in runtime of at least 29%
more » « less
Full Text Available
Analyzing the Benefits of More Complex Cache Replacement Policies in Moderns GPU LLCs

Jarvis Jia; Matthew D. Sinclair (June 2023, 5th gem5 Users' Workshop)

The gem5 simulator offers Classic and Ruby as two separate memory models for simulating on-chip caches. The Classic model, which originated from M5, is a quick and simple option that allows for easy configuration, but only supports a basic MOESI coherence protocol. On the other hand, the Ruby model, which was developed by GEMS [2], is a more advanced and flexible option that can accurately simulate a wider range of cache coherence protocols and features. However, choosing between the two memory system models in gem5 is challenging for researchers as each has advantages and limitations which can be inconvenient. In particular, this has led to a bifurcation of effort where prior work has added replacement policies to Classic and Ruby in parallel – duplicating effort unnecessarily and preventing users from using a desired replacement policy if it is not implemented in the desired memory model (e.g., users could only use RRIP in Classic). Accordingly, we merged the cache replacement policies from Classic to Ruby, enabling users to use any of the replacement policies in either memory model. Gem5 currently has the capability to support 13 replacement policies, which can be used exchangeable within the Classic and Ruby cache models, including commonly used options like LRU, FIFO, PseudoLRU, and different types of RRIPs. After combining the replacement policies for the Classic and Ruby cache models, we designed and integrated (into gem5’s nightly regressions) multiple corner case tests to verify and ensure the continued correct functionality of these policies. Through these tests, we identified and fixed several bugs to ensure that the replacement policies operate correctly. Finally, with the newly enabled and verified functionality, since there is limited information about how different replacement policies affects GPU performance, we decided to use gem5 to study these policies in a GPU context. Specifically, we study GPU L2 caches, since GPU L1 caches are often used to stream data through and thus are unlikely to be significantly impacted by replacement policy.
more » « less
Full Text Available
gem5 GPU Accuracy Profiler (GAP)

Charles Jamieson, Anushka Chandrashekar (June 2022, 4th gem5 Users Workshop)

In recent years, we have been enhancing and updating gem5's GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the proper software and libraries needed to run GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, we currently do not have a way to model validated gem5 configurations for the most recent AMD GPUs. Current support focuses on Carrizo- and Vega-class GPUs. Unfortunately, these models do not always provide high accuracy relative to real GPU runs. This leads to a mismatch between how each instruction is supposedly being executed according to the ISA and how a given GPU model executes a given instruction. These discrepancies are of interest to those developing the gem5 GPU models as they can lead to less accurate simulations. Accordingly, to help bridge this divide, we have created a new tool, GAP (gem5 GPU Accuracy Profiler), to identify discrepancies between real GPU and simulated gem5 GPU behavior. GAP identifies and verifies how accurate these configurations relative to real GPUs by comparing the simulator’s performance counters to those from real GPUs.
more » « less
Full Text Available
Enabling Reproducible and Agile Full-System Simulation

Bobby R. Bruce, Ayaz Akram (March 2021, 2021 IEEE International Symposium on Performance Analysis of Systems and Software)
null (Ed.)
Running experiments in modern computer architecture simulators can be a difficult and error-prone endeavor. Users must track many configurations, components and outputs between simulation runs. The gem5 simulator is no exception to this, requiring researchers to gather, organize, and create a significant number of components for a single simulation. In this paper, we present the GEM5ART framework, a tool to aid gem5 users in better structuring and running architecture simulations, and GEM5 RESOURCES, a suite of resources with known compatibility with the simulator. These new additions to the gem5 project make full system simulation easier, allowing researchers to concentrate more so on their architectural innovations over setting up the simulation framework. The GEM5ART framework carefully logs the resources used in a gem5 simulation and places the results obtained within a database, thus enabling simple reproduction of experiments. The pre-built resources allow researchers to jump straight into running simulations rather than having to spend valuable time creating them. GEM5ART has been released with a permissive, open source license allowing the broader computer architecture community to contribute as workloads and workflows evolve. An archive of the data, an related materials, presented in this paper can be found at https://doi.org/10.6084/m9.figshare.14176802
more » « less
Full Text Available
The gem5 Simulator: Version 20.0+: A new era for the open-source computer architecture simulator

Lowe-Power, Jason; Ahmad, Abdul Mutaal; Akram, Ayaz; Alian, Mohammad; Amslinger, Rico; Andreozzi, Matteo; Armejach, Adrià; Asmussen, Nils; Bharadwaj, Srikant; Black, Gabe; et al (July 2020, ArXivorg)

The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give and overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.
more » « less
Full Text Available

« Prev Next »