skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Chauffeur: Benchmark Suite for Design and End-to-End Analysis of Self-Driving Vehicles on Embedded Systems
Self-driving systems execute an ensemble of different self-driving workloads on embedded systems in an end-to-end manner, subject to functional and performance requirements. To enable exploration, optimization, and end-to-end evaluation on different embedded platforms, system designers critically need a benchmark suite that enables flexible and seamless configuration of self-driving scenarios, which realistically reflects real-world self-driving workloads’ unique characteristics. Existing CPU and GPU embedded benchmark suites typically (1) consider isolated applications, (2) are not sensor-driven, and (3) are unable to support emerging self-driving applications that simultaneously utilize CPUs and GPUs with stringent timing requirements. On the other hand, full-system self-driving simulators (e.g., AUTOWARE, APOLLO) focus on functional simulation, but lack the ability to evaluate the self-driving software stack on various embedded platforms. To address design needs, we present Chauffeur, the first open-source end-to-end benchmark suite for self-driving vehicles with configurable representative workloads. Chauffeur is easy to configure and run, enabling researchers to evaluate different platform configurations and explore alternative instantiations of the self-driving software pipeline. Chauffeur runs on diverse emerging platforms and exploits heterogeneous onboard resources. Our initial characterization of Chauffeur on different embedded platforms – NVIDIA Jetson TX2 and Drive PX2 – enables comparative evaluation of these GPU platforms in executing an end-to-end self-driving computational pipeline to assess the end-to-end response times on these emerging embedded platforms while also creating opportunities to create application gangs for better response times. Chauffeur enables researchers to benchmark representative self-driving workloads and flexibly compose them for different self-driving scenarios to explore end-to-end tradeoffs between design constraints, power budget, real-time performance requirements, and accuracy of applications.  more » « less
Award ID(s):
1704859
PAR ID:
10297482
Author(s) / Creator(s):
; ; ; ; ; ; ;
Date Published:
Journal Name:
ACM Transactions on Embedded Computing Systems
Volume:
20
Issue:
5s
ISSN:
1539-9087
Page Range / eLocation ID:
1 to 22
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The increasing size of input graphs for graph neural networks (GNNs) highlights the demand for using multi-GPU platforms. However, existing multi-GPU GNN systems optimize the computation and communication individually based on the conventional practice of scaling dense DNNs. For irregularly sparse and fine-grained GNN workloads, such solutions miss the opportunity to jointly schedule/optimize the computation and communication operations for high-performance delivery. To this end, we propose MGG , a novel system design to accelerate full-graph GNNs on multi-GPU platforms. The core of MGG is its novel dynamic software pipeline to facilitate fine-grained computation-communication overlapping within a GPU kernel. Specifically, MGG introduces GNN-tailored pipeline construction and GPU-aware pipeline mapping to facilitate workload balancing and operation overlapping. MGG also incorporates an intelligent runtime design with analytical modeling and optimization heuristics to dynamically improve the execution performance. Extensive evaluation reveals that MGG outperforms state-of-the-art full-graph GNN systems across various settings: on average 4.41×, 4.81×, and 10.83× faster than DGL, MGG-UVM, and ROC, respectively. 
    more » « less
  2. The emergence of “robotics in the wild” has triggered a wave of recent research in hardware and software to boost robots’ compute capabilities. Nevertheless, research in this area is hindered by the lack of a comprehensive benchmark suite. In this paper, we present RTRBench, a benchmark suite for robotic kernels. RTRBench includes 16 kernels, spanning the entire software pipeline of a wide swath of robots, all implemented in C++ for fast execution. Together with the suite, we conduct an evaluation of the workloads at the architecture level. We pinpoint the sources of inefficiencies in a modern robotic processor when executing the robotic kernels, along with the opportunities for improvements. The source code of the benchmark suite is available in https://cmu-roboarch.github.io/rtrbench/. 
    more » « less
  3. The recent introduction of Unified Virtual Memory (UVM) in GPUs offers a new programming model that allows GPUs and CPUs to share the same virtual memory space, which shifts the complex memory management from programmers to GPU driver/ hardware and enables kernel execution even when memory is oversubscribed. Meanwhile, UVM may also incur considerable performance overhead due to tracking and data migration along with special handling of page faults and page table walk. As UVM is attracting significant attention from the research community to develop innovative solutions to these problems, in this paper, we propose a comprehensive UVM benchmark suite named UVMBench to facilitate future research on this important topic. The proposed UVMBench consists of 32 representative benchmarks from a wide range of application domains. The suite also features unified programming implementation and diverse memory access patterns across benchmarks, thus allowing thorough evaluation and comparison with current state-of-the-art. A set of experiments have been conducted on real GPUs to verify and analyze the benchmark suite behaviors under various scenarios. 
    more » « less
  4. Integrated CPU-GPU architecture provides excellent acceleration capabilities for data parallel applications on embedded platforms while meeting the size, weight and power (SWaP) requirements. However, sharing of main memory between CPU applications and GPU kernels can severely affect the execution of GPU kernels and diminish the performance gain provided by GPU. For example, in the NVIDIA Jetson TX2 platform, an integrated CPU-GPU architecture, we observed that, in the worst case, the GPU kernels can suffer as much as 3X slowdown in the presence of co-running memory intensive CPU applications. In this paper, we propose a software mechanism, which we call BWLOCK++, to protect the performance of GPU kernels from co-scheduled memory intensive CPU applications. 
    more » « less
  5. Genomic analysis is the study of genes which includes the identification, measurement, or comparison of genomic features. Genomics research is of great importance to our society because it can be used to detect diseases, create vaccines, and develop drugs and treatments. As a type of general-purpose accelerators with massive parallel processing capability, GPUs have been recently used for genomics analysis. Developing GPU-based hardware and software frameworks for genome analysis is becoming a promising research area. To support this type of research, benchmarks are needed that can feature representative, concurrent, and diverse applications running on GPUs. In this work, we created a benchmark suite called Genomics-GPU, which contains 10 widely-used genomic analysis applications. It covers genome comparison, matching, and clustering for DNAs and RNAs. We also adapted these applications to exploit the CUDA Dynamic Parallelism (CDP), a recent advanced feature supporting dynamic GPU programming, to further improve the performance. Our benchmark suite can serve as a basis for algorithm optimization and also facilitate GPU architecture development for genomics analysis. 
    more » « less