skip to main content


Search for: All records

Award ID contains: 1925485

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The gem5 simulator offers Classic and Ruby as two separate memory models for simulating on-chip caches. The Classic model, which originated from M5, is a quick and simple option that allows for easy configuration, but only supports a basic MOESI coherence protocol. On the other hand, the Ruby model, which was developed by GEMS [2], is a more advanced and flexible option that can accurately simulate a wider range of cache coherence protocols and features. However, choosing between the two memory system models in gem5 is challenging for researchers as each has advantages and limitations which can be inconvenient. In particular, this has led to a bifurcation of effort where prior work has added replacement policies to Classic and Ruby in parallel – duplicating effort unnecessarily and preventing users from using a desired replacement policy if it is not implemented in the desired memory model (e.g., users could only use RRIP in Classic). Accordingly, we merged the cache replacement policies from Classic to Ruby, enabling users to use any of the replacement policies in either memory model. Gem5 currently has the capability to support 13 replacement policies, which can be used exchangeable within the Classic and Ruby cache models, including commonly used options like LRU, FIFO, PseudoLRU, and different types of RRIPs. After combining the replacement policies for the Classic and Ruby cache models, we designed and integrated (into gem5’s nightly regressions) multiple corner case tests to verify and ensure the continued correct functionality of these policies. Through these tests, we identified and fixed several bugs to ensure that the replacement policies operate correctly. Finally, with the newly enabled and verified functionality, since there is limited information about how different replacement policies affects GPU performance, we decided to use gem5 to study these policies in a GPU context. Specifically, we study GPU L2 caches, since GPU L1 caches are often used to stream data through and thus are unlikely to be significantly impacted by replacement policy. 
    more » « less
    Free, publicly-accessible full text available June 3, 2024
  2. With the waning of Moore’s Law and the end of Dennard’s Scaling, systems are turning towards heterogeneity, mixing conventional cores and specialized accelerators to continue scaling performance and energy efficiency. Specialized accelerators are frequently used to improve the efficiency of computations that run inefficiently on conventional, general-purpose processors. As a result, systems ranging from smartphones to data-centers, hyper-scalars, and supercomputers are increasingly using large numbers of accelerators to provide better efficiency than CPU-based solutions. However, heterogeneous systems face key challenges: changes to the underlying technology which threaten continued scaling, as well as the voracious scaling from applications, which require additional research to address. Traditionally, simulators could be used to perform early exploration for this research. However, existing simulators lack important support for these key challenges. Detailed simulation of modern systems can take extremely long times in existing tools and infrastructure. Furthermore, prototyping optimizations at scale can also be challenging, especially for newly proposed accelerators. Although other simulators such as Accel-Sim, SCALE-Sim, and Gemmini enable some early experiments, they are limited in their ability to target a wide variety of accelerators. In comparison, gem5 has support for various CPUs, GPUs, DSPs, and many other important accelerators. However, efficiently simulating large-scale workloads on gem5’s cycle-level models requires prohibitively long times. We aim to enhance gem5’s support to make running these workloads practical while retaining accuracy. 
    more » « less
  3. n recent years, we have been enhancing and updating gem5’s GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the proper software and libraries needed to run GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, in an effort to provide sufficient coverage, the cur- rent testing support for GPU tests requires significant runtime both for the nightly and weekly regression tests. Currently most of these regression tests test the GPU SE mode support, since GPU FS mode support is still nascent. Unfortunately, much of this time is spent parsing input files to create arrays and other data structures that the GPU subsequently computes on. Although SE mode does not simulate the system calls needed to read these input files, nevertheless this still represents a significant overhead that increases runtime and prevents other tests (potentially providing additional coverage) from being run in that same timeframe. In an effort to address this, in the work we have been working on utilizing SE mode’s avoiding modeling system calls to speed up the runtime of the GPU regression tests. Specifically, we redesign the input reading phase of these GPU tests to create and use mmap’d files for their input arrays (which SE mode completes all at once) instead of reading in the files entry by entry. In doing so, we see significant reductions in runtime of at least 29% 
    more » « less
  4. In recent years, we have been enhancing and updating gem5’s GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the proper software and libraries needed to run GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, in an effort to provide sufficient coverage, the cur- rent testing support for GPU tests requires significant runtime both for the nightly and weekly regression tests. Currently most of these regression tests test the GPU SE mode support, since GPU FS mode support is still nascent. Unfortunately, much of this time is spent parsing input files to create arrays and other data structures that the GPU subsequently computes on. Although SE mode does not simulate the system calls needed to read these input files, nevertheless this still represents a significant overhead that increases runtime and prevents other tests (potentially providing additional coverage) from being run in that same timeframe. In an effort to address this, in the work we have been working on utilizing SE mode’s avoiding modeling system calls to speed up the runtime of the GPU regression tests. Specifically, we redesign the input reading phase of these GPU tests to create and use mmap’d files for their input arrays (which SE mode completes all at once) instead of reading in the files entry by entry. In doing so, we see significant reductions in runtime of at least 29% 
    more » « less
    Free, publicly-accessible full text available June 3, 2024
  5. The gem5 simulator offers Classic and Ruby as two separate memory models for simulating on-chip caches. The Classic model, which originated from M5, is a quick and simple option that allows for easy configuration, but only supports a basic MOESI coherence protocol. On the other hand, the Ruby model, which was developed by GEMS [2], is a more advanced and flexible option that can accurately simulate a wider range of cache coherence protocols and features. However, choosing between the two memory system models in gem5 is challenging for researchers as each has advantages and limitations which can be inconvenient. In particular, this has led to a bifurcation of effort where prior work has added replacement policies to Classic and Ruby in parallel – duplicating effort unnecessarily and preventing users from using a desired replacement policy if it is not implemented in the desired memory model (e.g., users could only use RRIP in Classic). Accordingly, we merged the cache replacement policies from Classic to Ruby, enabling users to use any of the replacement policies in either memory model. Gem5 currently has the capability to support 13 replacement policies, which can be used exchangeable within the Classic and Ruby cache models, including commonly used options like LRU, FIFO, PseudoLRU, and different types of RRIPs. After combining the replacement policies for the Classic and Ruby cache models, we designed and integrated (into gem5’s nightly regressions) multiple corner case tests to verify and ensure the continued correct functionality of these policies. Through these tests, we identified and fixed several bugs to ensure that the replacement policies operate correctly. Finally, with the newly enabled and verified functionality, since there is limited information about how different replacement policies affects GPU performance, we decided to use gem5 to study these policies in a GPU context. Specifically, we study GPU L2 caches, since GPU L1 caches are often used to stream data through and thus are unlikely to be significantly impacted by replacement policy. 
    more » « less
    Free, publicly-accessible full text available June 3, 2024
  6. In recent years, we have been enhancing and updating gem5’s GPU support, including enhanced gem5’s GPU support to enable running ML workloads. Moreover, we created, validated, and released a Docker image with the proper software and libraries needed to run AMD’s GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing for a variety of GPU workloads. Current gem5 support focuses on Carrizo- and Vega-class GPUs. Unfortunately, these models do not always provide high accuracy relative to the equivalent ”real” GPUs. This leads to a mismatch in expectations: when prototyping new optimizations in gem5 users may draw the wrong conclusions about the efficacy of proposed optimizations if gem5’s GPU models do not provide high fidelity. Accordingly, to help bridge this divide, we design a series of micro-benchmarks designed expose the latencies, bandwidths, and sizes of a variety of GPU components on real GPUs. By iteratively applying fixes and improvements to gem’s GPU model, we significantly improve its fidelity relative to real AMD GPUs. 
    more » « less
    Free, publicly-accessible full text available June 3, 2024
  7. In recent years, we have been enhancing and updating gem5's GPU support. First, we have enhanced gem5’s GPU support for ML workloads such that gem5 can now run. Moreover, as part of this support, we created, validated, and released a Docker image that contains the proper software and libraries needed to run GCN3 and Vega GPU models in gem5. With this container, users can run the gem5 GPU model, as well as build the ROCm applications that they want to run in the GPU model, out of the box without needing to properly install the appropriate ROCm software and libraries. Additionally, we have updated gem5 to make it easier to reproduce results, including releasing support for a number of GPU workloads in gem5-resources and enabling continuous integration testing on future GPU commits. However, we currently do not have a way to model validated gem5 configurations for the most recent AMD GPUs. Current support focuses on Carrizo- and Vega-class GPUs. Unfortunately, these models do not always provide high accuracy relative to real GPU runs. This leads to a mismatch between how each instruction is supposedly being executed according to the ISA and how a given GPU model executes a given instruction. These discrepancies are of interest to those developing the gem5 GPU models as they can lead to less accurate simulations. Accordingly, to help bridge this divide, we have created a new tool, GAP (gem5 GPU Accuracy Profiler), to identify discrepancies between real GPU and simulated gem5 GPU behavior. GAP identifies and verifies how accurate these configurations relative to real GPUs by comparing the simulator’s performance counters to those from real GPUs. 
    more » « less
  8. null (Ed.)
    Running experiments in modern computer architecture simulators can be a difficult and error-prone endeavor. Users must track many configurations, components and outputs between simulation runs. The gem5 simulator is no exception to this, requiring researchers to gather, organize, and create a significant number of components for a single simulation. In this paper, we present the GEM5ART framework, a tool to aid gem5 users in better structuring and running architecture simulations, and GEM5 RESOURCES, a suite of resources with known compatibility with the simulator. These new additions to the gem5 project make full system simulation easier, allowing researchers to concentrate more so on their architectural innovations over setting up the simulation framework. The GEM5ART framework carefully logs the resources used in a gem5 simulation and places the results obtained within a database, thus enabling simple reproduction of experiments. The pre-built resources allow researchers to jump straight into running simulations rather than having to spend valuable time creating them. GEM5ART has been released with a permissive, open source license allowing the broader computer architecture community to contribute as workloads and workflows evolve. An archive of the data, an related materials, presented in this paper can be found at https://doi.org/10.6084/m9.figshare.14176802 
    more » « less
  9. The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give and overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research. 
    more » « less
  10. In the past decade, GPUs have become an important resource for compute-intensive, general-purpose GPU applications such as machine learning, big data analysis, and large-scale simulations. In the future, with the explosion of machine learning and big data, application demands will keep increasing, resulting in more data and computation being pushed to GPUs. However, due to the slowing of Moore’s Law and rising manufacturing costs, it is becoming more and more challenging to add compute resources into a single GPU device to improve its throughput. As a result, spreading work across multiple GPUs is popular in data-centric and scientific applications. For example, Facebook uses 8 GPUs per server in their recent machine learning platform. However, research infrastructure has not kept pace with this trend: most GPU hardware simulators, including gem5, only support a single GPU. Thus, it is hard to study interference between GPUs, communication between GPUs, or work scheduling across GPUs. Our research group has been working to address this shortcoming by adding multi-GPU support to gem5. Here, we discuss the changes that were needed, which included updating the emulated driver, GPU components, and coherence protocol. 
    more » « less