skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: FireMarshal: Making HW/SW Co-Design Reproducible and Reliable
Reproducibility in the sciences is critical to reliable inquiry, but is often easier said than done. In the computer architecture community, research may require modifying systems from low-level circuits to operating systems and highlevel applications. All of these moving parts make reproducible experiments on full-stack systems challenging to design. Furthermore, the computing ecosystem evolves quickly, leading to rapidly obsolete artifacts. This is especially true in the realm of software where applications are often updated on a monthly, or even daily, cadence. In this paper we introduce FireMarshal, a software workload management tool for RISC-V based full-stack hardware development and research. FireMarshal automates workload generation (constructing boot binaries and filesystem images), development (with functional simulation), and evaluation (with cycle-exact RTL simulation). It also ensures, to the extent possible, that the exact same software runs deterministically across all phases of development, providing confidence in correctness and accuracy while minimizing time spent on slow and expensive RTL-level simulation. To ease workload specification, FireMarshal provides sane defaults for common components like firmware and operating systems, freeing users to focus only on project-specific components. Beyond reproducibility, Fire- Marshal enables continued development of workloads through the use of inheritance, where new workloads can be derived from established and continually updated base workloads. Users communicate their designs through the use of simple JSON configuration files that can be easily version controlled, reused, and shared. In this paper, we describe the design of FireMarshal along with the associated software management methodology for architectural research and development.  more » « less
Award ID(s):
2016662
PAR ID:
10290343
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2021 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Page Range / eLocation ID:
299 to 309
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The next generation of supercomputing resources is expected to greatly expand the scope of HPC environments, both in terms of more diverse workloads and user bases, as well as the integration of edge computing infrastructures. This will likely require new mechanisms and approaches at the Operating System level to support these broader classes of workloads along with their different security requirements. We claim that a key mechanism needed for these workloads is the ability to securely compartmentalize the system software executing on a given node. In this paper, we present initial efforts in exploring the integration of secure and trusted computing capabilities into an HPC system software stack. As part of this work we have ported the Kitten Lightweight Kernel (LWK) to the ARM64 architecture and integrated it with the Hafnium hypervisor, a reference implementation of a secure partition manager (SPM) that provides security isolation for virtual machines. By integrating Kitten with Hafnium, we are able to replace the commodity oriented Linux based resource management infrastructure and reduce the overheads introduced by using a full weight kernel (FWK) as the node-level resource scheduler. While our results are very preliminary, we are able to demonstrate measurable performance improvements on small scale ARM based SOC platforms. 
    more » « less
  2. With the rapid growth of the machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a great research vehicle to understand the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In this paper, we present Union, a workload manager that provides an automatic framework to facilitate hybrid workload simulation in CODES. Furthermore, we use Union, along with CODES, to investigate various hybrid workloads composed of traditional simulation applications and emerging learning applications on two dragonfly systems. The experiment results show that both message latency and communication time are important performance metrics to evaluate network interference. Network interference on HPC applications is more reflected by the message latency variation, whereas ML application performance depends more on the communication time. 
    more » « less
  3. Full-system simulation of computer systems is critical for capturing the complex interplay between various hard-ware and software components in future systems. Modeling the network subsystem is indispensable for the fidelity of full-system simulations due to the increasing importance of scale-out systems. Over the last decade, the network software stack has undergone major changes, with userspace networking stacks and data-plane networks rapidly replacing the conventional kernel network stack. Nevertheless, the current state-of-the-art architectural simulator, gem5, still employs kernel networking, which precludes realistic network application scenarios. In this work, we first demonstrate the limitations of gem5's current network stack in achieving high network bandwidth. Then, we enable a userspace networking stack on gem5. We extend gem5's NIC hardware model and device driver to sup-port userspace device drivers running the DPDK framework. Additionally, we implement a network load generator hardware model in gem5 to generate various traffic patterns and per-form per-packet timestamp and latency measurements without introducing packet loss. We develop a suite of six network-intensive benchmarks for stress testing the host network stack. These applications, based on DPDK, can run on both gem5 and real systems. Our experimental results show that enabling userspace networking improves gem5's network bandwidth by 6.3× compared with the current Linux kernel software stack. We characterize the performance of DPDK benchmarks running on both a real system and gem5, and evaluate the sensitivity of the applications to various system and microarchitecture parameters. This work marks the first step in refactoring the networking subsystem in gem5. 
    more » « less
  4. null (Ed.)
    Hybrid Transactional and Analytical Processing (HTAP) systems suffer from workload interference at the software and hardware level. We examine workload interference for HTAP systems and highlight investigation directions to mitigate the interference. We use the popular two-copy HTAP architecture. The OLTP and OLAP sides are independent components with their own private copies of the data. The OLTP side is a row-store, whereas the OLAP side is a column-store. The OLTP and OLAP sides are connected by means of an intermediate data structure, delta, that keeps track of the fresh tuples that are generated by the OLTP side, but not yet transferred to the OLAP side. OLTP transactions register their modifications to delta before committing. OLAP queries first prop- agate fresh tuples from the OLTP side to the OLAP side and then perform query execution over the data at the OLAP side. HTAP systems suffer from interference at both the software and hardware level. Software-level interference depends on the OLTP and fresh tuple propagation throughput. In order to minimize interference, HTAP systems should ensure that fresh tuple propagation throughput is greater than the throughput of the OLTP transactions that generate the fresh tuples. Hardware-level interference depends on the demand for shared resources such as LLC and memory bandwidth by the OLTP and OLAP workloads. HTAP systems should isolate the OLTP and OLAP workloads in the shared resources and use micro-architectural re- source allocation policies that assign the optimal amount of re- sources to OLTP and OLAP workloads to minimize hardware-level interference. 
    more » « less
  5. Scientific workflows are used routinely in numerous scientific domains, and Workflow Management Systems (WMSs) have been developed to orchestrate and optimize workflow executions on distributed platforms. WMSs are complex software systems that interact with complex software infrastructures. Most WMS research and development activities rely on empirical experiments conducted with full-fledged software stacks on actual hardware platforms. Such experiments, however, are limited to hardware and software infrastructures at hand and can be labor- and/or time-intensive. As a result, relying solely on real-world experiments impedes WMS research and development. An alternative is to conduct experiments in simulation. In this work we present WRENCH, a WMS simulation framework, whose objectives are (i) accurate and scalable simulations; and (ii) easy simulation software development. WRENCH achieves its first objective by building on the SimGrid framework. While SimGrid is recognized for the accuracy and scalability of its simulation models, it only provides low-level simulation abstractions and thus large software development efforts are required when implementing simulators of complex systems. WRENCH thus achieves its second objective by providing high-level and directly re-usable simulation abstractions on top of SimGrid. After describing and giving rationales for WRENCH’s software architecture and APIs, we present a case study in which we apply WRENCH to simulate the Pegasus production WMS. We report on ease of implementation, simulation accuracy, and simulation scalability so as to determine to which extent WRENCH achieves its two above objectives. We also draw both qualitative and quantitative comparisons with a previously proposed workflow simulator. 
    more » « less