Reproducibility in the sciences is critical to reliable inquiry, but is often easier said than done. In the computer architecture community, research may require modifying systems from low-level circuits to operating systems and high-level applications. All of these moving parts make reproducible experiments on full-stack systems challenging to design. Furthermore, the computing ecosystem evolves quickly, leading to rapidly obsolete artifacts. This is especially true in the realm of software where applications are often updated on a monthly, or even daily, cadence. In this paper we introduce FireMarshal, a software workload management tool for RISC-V based full-stack hardware development and research. FireMarshal automates workload generation (constructing boot binaries and filesystem images), development (with functional simulation), and evaluation (with cycle-exact RTL simulation). It also ensures, to the extent possible, that the exact same software runs deterministically across all phases of development, providing confidence in correctness and accuracy while minimizing time spent on slow and expensive RTL-level simulation. To ease workload specification, FireMarshal provides sane defaults for common components like firmware and operating systems, freeing users to focus only on project-specific components. Beyond reproducibility, FireMarshal enables continued development of workloads through the use of inheritance, where new workloads can be derived from …
The design of computing systems has changed dramatically over the past decade, but most courses in advanced computer architecture remain unchanged. Computer architecture education lies at the intersection between computer science and electrical engineering, with practical exercises in classes based on appropriate levels of abstraction in the computing system design stack. Hardware-centric lab exercises often require broad infrastructure resources and tend to navigate around tedious practical implementation concepts, while software-centric exercises leave a gap between modeling and system implementation implications that students later need to overcome in professional settings. Vertical integration trends in domain-specific compute systems, as well as software-hardware co-design, are often covered in classroom lectures, but are not reflected in laboratory exercises due to complex tooling and simulation infrastructure. We describe our experiences with a joint hardware-software approach to exploring computer architecture concepts in class exercises, by using open-source processor hardware implementations, generator-based hardware design methodologies, and cloud-hosted FPGAs. This approach further enables scaling course enrollment, remote learning, and a cross-class collaborative lab ecosystem, creating a connecting thread between computer science and electrical engineering experience-based curricula.
We present COBRA, a framework which enables a realistic hardware-guided methodology for evaluating compositions of hardware branch predictors. COBRA provides a common interface for developing RTL implementations of predictor subcomponents, as well as a predictor composer that automatically generates hardware predictor pipelines from subcomponents based on a high-level topological model of a desired algorithm. We demonstrate how COBRA aids in the design and evaluation of diverse predictor architectures and how our hardware-centric approach captures concerns in predictor characterization that are not exposed in software-based algorithm development. Using COBRA, we generate three superscalar pipelined branch predictors with diverse architectures, synthesize them to run at 1 GHz on a commercial FinFET process, integrate them with the open-source BOOM out-of-order core, and evaluate their end-to-end performance on workloads over trillions of cycles. The COBRA generator system has been open-sourced as part of the SonicBOOM out-of-order core.
We present FireSim, an open-source simulation platform that enables cycle-exact microarchitectural simulation of large scale-out clusters by combining FPGA-accelerated simulation of silicon-proven RTL designs with a scalable, distributed network simulation. Unlike prior FPGA-accelerated simulation tools, FireSim runs on Amazon EC2 F1, a public cloud FPGA platform, which greatly improves usability, provides elasticity, and lowers the cost of large-scale FPGA-based experiments. We describe the design and implementation of FireSim and show how it can provide sufficient performance to run modern applications at scale, to enable true hardware-software co-design. As an example, we demonstrate automatically generating and deploying a target cluster of 1,024 3.2 GHz quad-core server nodes, each with 16 GB of DRAM, interconnected by a 200 Gbit/s network with 2 microsecond latency, which simulates at a 3.4 MHz processor clock rate (less than 1,000x slowdown over real-time). In aggregate, this FireSim instantiation simulates 4,096 cores and 16 TB of memory, runs ~14 billion instructions per second, and harnesses 12.8 million dollars worth of FPGAs, at a total cost of only ~$100 per simulation hour to the user. We present several examples to show how FireSim can be used to explore various research directions in warehouse-scale machine design, including modeling …
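The aggregate figures quoted in the FireSim abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check of the published numbers, not part of FireSim; the variable names are illustrative, and it assumes the 16 TB figure uses binary units (1 TB = 1,024 GB).

```python
# Sanity check of the example cluster described in the FireSim abstract.
# All constants come from the abstract; names are illustrative only.
NODES = 1024                # simulated server nodes
CORES_PER_NODE = 4          # quad-core nodes
DRAM_GB_PER_NODE = 16       # GB of DRAM per node
TARGET_CLOCK_HZ = 3.2e9     # modeled processor clock (3.2 GHz)
SIM_CLOCK_HZ = 3.4e6        # achieved simulation rate (3.4 MHz)

total_cores = NODES * CORES_PER_NODE
# Assumption: binary units, 1 TB = 1024 GB.
total_dram_tb = NODES * DRAM_GB_PER_NODE / 1024
# Slowdown relative to real time is the ratio of the two clock rates.
slowdown = TARGET_CLOCK_HZ / SIM_CLOCK_HZ
# Core-cycles simulated per wall-clock second across the whole cluster.
aggregate_cycles_per_s = total_cores * SIM_CLOCK_HZ

print(f"cores: {total_cores}")
print(f"DRAM: {total_dram_tb:.0f} TB")
print(f"slowdown: {slowdown:.0f}x")
print(f"aggregate core-cycles/s: {aggregate_cycles_per_s / 1e9:.1f} billion")
```

The slowdown works out to roughly 941x, consistent with "less than 1,000x". The aggregate cycle rate is about 13.9 billion core-cycles per second, which matches the quoted ~14 billion instructions per second only if each core retires about one instruction per cycle; that reading is an inference, not something the abstract states.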