Computer systems research heavily relies on simulation tools like gem5 to effectively prototype and validate new ideas. However, publicly available simulators struggle to accurately model systems as architectures evolve rapidly. This is a major issue because incorrect simulator models may lead researchers to draw misleading or even incorrect conclusions about their research prototypes from these simulators. Although this challenge pertains to many open source simulators, we focus on the widely used, open source gem5 simulator. In GAP we showed that gem5’s GPGPU models have significant correlation issues versus real hardware. GAP also improved the fidelity of gem5’s AMDGPU model, particularly for cache access latencies and bandwidths. However, one critical issue remains: our microbenchmarks reveal 88% error in memory bandwidth between gem5’s current model and corresponding real AMD GPUs. To narrow this gap, we examined recent patents and gem5’s memory system bottlenecks, then made several improvements including: utilizing a redesigned HBM memory controller, enhancing TLB request coalescing, adding support for multiple page sizes, adding a page walk cache, and improving network bandwidth modeling. Collectively, these optimizations significantly improve gem5’s GPU memory bandwidth by 3.8x: from 153 GB/s to 583 GB/s. Moreover, our address translation enhancements can be ported to other ISAs where similar support is also needed, improving gem5’s MMU support. 
                    
                            
                            Facilitating the Bootstrapping of a New ISA
                        
                    
    
            Implementing a new instruction set architecture (ISA) is a non-trivial task that involves significant modifications to the system software, such as the compiler, the assembler, and the linker. The task also includes modifying and verifying functional and cycle-accurate simulators so that programs under the new ISA can be simulated correctly and their performance evaluated. Isolating errors in these software components is extremely challenging and demands automated and semi-automated mechanisms, because neither the compilation infrastructure nor the simulation infrastructure can be trusted when both have been heavily modified. Bootstrapping a new ISA is very common in embedded systems, where the variety of ISAs is greater because backward compatibility of executables is often not required. In this paper, we present the tools and the verification mechanisms we have implemented to support the development of a number of related but distinct ISAs. These ISAs are similar in complexity to the RISC-V ISA and range from simple pipelined and superscalar processor ISAs to a complete VLIW ISA. Our work in developing the system software and simulators for these ISAs demonstrates that a step-by-step, semi-automated approach that relies on simple invariants can facilitate effective bootstrapping of the complete system software and the simulator infrastructure.
        
    
- Award ID(s):
- 2103103 1565215 1822737 1823417 1900788 2030070 2146354 2211354 2103105 1901005 2211353
- PAR ID:
- 10432098
- Date Published:
- Journal Name:
- Languages, Compilers, and Tools for Embedded Systems
- Page Range / eLocation ID:
- 2 to 12
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            The increased pervasiveness of technological advancements in automation makes it urgent to address the question of how work is changing in response. Focusing on applications of machine learning (ML) to automate information tasks, we draw on a simple framework for identifying the impacts of an automated system on a task that suggests three patterns for the use of ML: decision support, blended decision making, and complete automation. In this paper, we extend this framework by considering how automation of one task might have implications for interdependent tasks and how automation applies to coordination mechanisms.
- 
            Heterogeneous architectures have become increasingly common, from co-packaged small and large cores, to GPUs alongside CPUs, to general-purpose heterogeneous-ISA architectures with cores implementing different ISAs. As the diversity of execution cores grows, predictive models become of paramount importance for scheduling and resource allocation. In this paper, we investigate the capabilities of performance predictors in a heterogeneous-ISA setting, as well as the predictors’ effects on scheduler quality. We follow an unbiased feature selection methodology to identify the optimal set of features for this task, instead of pre-selecting features before training. Finally, we incorporate our findings in ML-based schedulers and evaluate their sensitivity to the underlying system’s level of heterogeneity. We show that our schedulers perform within 2-11% of an oracular scheduler across a variety of underlying heterogeneous-ISA multicore systems without modification.
- 
            Aldrich, Jonathan; Salvaneschi, Guido (Ed.) Tensor processing infrastructures such as deep learning frameworks and specialized hardware accelerators have revolutionized how computationally intensive code from domains such as deep learning and image processing is executed and optimized. These infrastructures provide powerful and expressive abstractions while ensuring high performance. However, to utilize them, code must be written specifically using the APIs / ISAs of such software frameworks or hardware accelerators. Importantly, given the fast pace of innovation in these domains, code written today quickly becomes legacy as new frameworks and accelerators are developed, and migrating such legacy code manually is a considerable effort. To enable developers to leverage such DSLs while preserving their current programming paradigm, we present Tenspiler, a verified-lifting-based compiler that uses program synthesis to translate sequential programs written in general-purpose programming languages (e.g., C++ or Python code that does not leverage any specialized framework or accelerator) into tensor operations. Central to Tenspiler is our carefully crafted yet simple intermediate language, named TensIR, that expresses tensor operations. TensIR enables efficient lifting, verification, and code generation. Unlike classical pattern-matching-based compilers, Tenspiler uses program synthesis to translate input code into TensIR, which is then compiled to the target API / ISA. Currently, Tenspiler supports six DSLs, spanning a broad spectrum of software and hardware environments. Furthermore, we show that new backends can be easily supported by Tenspiler by adding simple pattern-matching rules for TensIR.
Using 10 real-world code benchmark suites, our experimental evaluation shows that by translating code to be executed on 6 different software frameworks and hardware devices, Tenspiler offers on average 105× kernel and 9.65× end-to-end execution time improvement over the fully optimized sequential implementation of the same benchmarks.
- 
            Robot Dynamic Simulators offer convenient implementation and testing of physical robots, thus accelerating research and development. While existing simulators support most real-world robots with serially linked kinematic and dynamic chains, they offer limited or conditional support for complex closed-loop robots. On the other hand, many of the underlying physics computation libraries that these simulators employ support closed-loop kinematic chains and redundant mechanisms. Such mechanisms are often utilized in surgical robots to achieve constrained motions (e.g., the remote center of motion (RCM)). To deal with such robots, we propose a new simulation framework based on a front-end description format and a robust real-time dynamic simulator. Although this study focuses on surgical robots, the proposed format and simulator are applicable to any type of robot. In this manuscript, we describe the philosophy and implementation of the front-end description format and demonstrate its performance and the simulator's capabilities using simulated models of real-world surgical robots.
 An official website of the United States government