Title: Towards GPU-Accelerated PRM for Autonomous Navigation
Sampling-based planning is an important step in long-range navigation for an autonomous vehicle. This work proposes a GPU-accelerated sampling-based path planning algorithm that can serve as a global planner in autonomous navigation tasks. A modified version of the generation portion of the Probabilistic Road Map (PRM) algorithm is presented, which reorders some steps of the algorithm to expose parallelism and thus benefits greatly from execution on a GPU. The GPU and CPU algorithms were compared in a simulated navigation environment using graph-generation tasks of several different sizes. The GPU-accelerated version of the PRM algorithm achieved significant speedup over the CPU version (up to 78×). This result provides promising motivation towards implementing a real-time autonomous navigation system in the future.
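The reordering the abstract describes can be sketched as a sequence of batched, data-parallel stages. The sketch below is illustrative rather than the paper's implementation: it assumes a 2-D unit workspace with circular obstacles given as (cx, cy, r) rows, uses NumPy arrays as a stand-in for GPU buffers (a drop-in array library such as CuPy would move the same operations onto a GPU), and the name batch_prm_graph is invented here.

```python
import numpy as np

def batch_prm_graph(n_samples, k, obstacles, seed=None):
    """Batch-style PRM construction: every stage is a data-parallel array
    operation, so each maps naturally onto a GPU kernel launch."""
    rng = np.random.default_rng(seed)
    # Stage 1: draw all samples at once instead of one at a time.
    pts = rng.uniform(0.0, 1.0, size=(n_samples, 2))
    # Stage 2: collision-check every sample in parallel against circular
    # obstacles given as (cx, cy, r) rows -- a hypothetical map format.
    d = np.linalg.norm(pts[:, None, :] - obstacles[None, :, :2], axis=2)
    pts = pts[(d > obstacles[None, :, 2]).all(axis=1)]
    # Stage 3: all-pairs distances, then k nearest neighbours per node.
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    nbrs = np.argsort(dist, axis=1)[:, :k]
    # Stage 4: candidate edges; per-edge collision checks (omitted here)
    # are likewise independent and can run in one parallel pass.
    edges = [(i, int(j)) for i in range(len(pts)) for j in nbrs[i]]
    return pts, edges

pts, edges = batch_prm_graph(500, k=5, obstacles=np.array([[0.5, 0.5, 0.1]]))
```

Because every stage operates on the whole sample set at once rather than in a long sequential loop, the structure is exactly the kind a GPU exploits.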
Award ID(s):
1719027
PAR ID:
10188830
Author(s) / Creator(s):
Date Published:
Journal Name:
International Conference on Information Technology–New Generations
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Heterogeneous computing environments combining CPU and GPU resources provide a great boost to large-scale scientific computing applications. Code generation utilities that partition the work into CPU and GPU tasks while considering data movement costs allow researchers to develop high-performance solutions more quickly and easily, and make these resources accessible to a larger user base. We present developments for a domain-specific language (DSL) and code generation framework for solving partial differential equations (PDEs). These enhancements facilitate GPU-accelerated solution of the Boltzmann transport equation (BTE) for phonons, which is the governing equation for simulating thermal transport in semiconductor materials at sub-micron scales. The solution of the BTE involves thousands of coupled PDEs, complicated boundary conditions, and a nonlinear equation coupling all of the degrees of freedom that must be solved at each time step. These developments enable the DSL to generate configurable hybrid GPU/CPU code that couples accelerated kernels with user-defined code. We observed performance improvements of around 18× compared to a CPU-only version produced by the same DSL, with minimal additional programming effort.
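As a much smaller-scale illustration of what such a code-generation framework does, the hedged sketch below "generates" a 1-D explicit-Euler heat-equation kernel from a template string. The template, the name heat_step, and the toy equation are assumptions made here for illustration; the paper's DSL lowers thousands of coupled PDEs and emits hybrid GPU/CPU code rather than Python.

```python
import numpy as np

# Toy template standing in for a real DSL's lowering stage; a production
# framework would emit CUDA for the GPU path and C/C++ for the CPU path.
STENCIL_TEMPLATE = """
def {name}(u, alpha, dt, dx):
    out = u.copy()
    out[1:-1] = u[1:-1] + alpha * dt / dx**2 * (u[2:] - 2.0*u[1:-1] + u[:-2])
    return out
"""

def generate_kernel(name="heat_step"):
    # "Code generation": instantiate the template and compile it into
    # a callable, the way a DSL turns a symbolic PDE into a kernel.
    namespace = {}
    exec(STENCIL_TEMPLATE.format(name=name), {"np": np}, namespace)
    return namespace[name]

step = generate_kernel()          # the generated explicit-Euler update
u = np.zeros(64)
u[32] = 1.0                       # initial heat spike
for _ in range(100):
    u = step(u, alpha=0.25, dt=0.1, dx=1.0)
```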
  2. We study the performance of a cloud-based GPU-accelerated inference server used to speed up event reconstruction in neutrino data batch jobs. Using detector data from the ProtoDUNE experiment and the standard DUNE grid job submission tools, we attempt to reprocess the data by running several thousand concurrent grid jobs, a rate we expect to be typical of current and future neutrino physics experiments. We process most of the dataset with the GPU version of our processing algorithm and the remainder with the CPU version for timing comparisons. We find that a 100-GPU cloud-based server easily meets the processing demand, and that the GPU version of the event processing algorithm is two times faster than the CPU version when compared against the newest CPUs in our sample. However, the amount of data transferred to the inference server during the GPU runs can overwhelm even the highest-bandwidth network switches unless care is taken to observe network facility limits or to distribute the jobs across multiple sites. We discuss the lessons learned from this processing campaign and several avenues for future improvement.
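The client side of such a campaign amounts to many concurrent jobs streaming batched inference requests to a remote GPU service. The sketch below is purely illustrative: infer_remote stands in for a real HTTP/gRPC call to an inference server (no actual API is assumed), and the batching loop shows why transfer volume, rather than GPU throughput, can become the bottleneck the abstract describes.

```python
import concurrent.futures as cf
import time

def infer_remote(batch):
    # Stand-in for a network call to a GPU inference server; a real job
    # would serialize detector data and wait on the server's response.
    time.sleep(0.001 * len(batch))   # models transfer + inference latency
    return [0.0 for _ in batch]

def process_job(events, batch_size=64):
    # Batching amortizes per-request network overhead; at thousands of
    # concurrent jobs the network, not the GPUs, saturates first.
    results = []
    for i in range(0, len(events), batch_size):
        results.extend(infer_remote(events[i:i + batch_size]))
    return results

with cf.ThreadPoolExecutor(max_workers=8) as pool:
    # Eight concurrent "grid jobs", each streaming batches to the server.
    jobs = [pool.submit(process_job, list(range(256))) for _ in range(8)]
    results = [job.result() for job in jobs]
```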
  3. In a CPU-GPU heterogeneous computing system, the different types of processors suffer from load-balancing problems during computation; moreover, matching multiple tasks to the appropriate processor cores remains an urgent problem. In this paper, we propose task scheduling strategies for a high-performance CPU-GPU heterogeneous computing platform to solve these problems. For the single-task model, we propose a load-aware task scheduling strategy: it probes the computing power of the CPU and the GPU on the specified tasks and allocates work to each in proportion to the measured ratio, storing tasks in a bidirectional queue to reduce scheduling overhead. For the multi-task model, we propose a task scheduling strategy based on a genetic algorithm, which aims to improve the overall operating efficiency of the system by accurately binding each type of task to the appropriate heterogeneous processing core. Our experimental results show that these scheduling strategies improve the efficiency of parallel computing as well as overall system performance.
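A minimal sketch of the load-aware, single-task strategy, as understood from the abstract: probe the throughput of each processor on a small pilot batch, then let the CPU and GPU consume opposite ends of one bidirectional queue in proportion to the measured rates. The function names and the probing scheme are assumptions made here; cpu_worker and gpu_worker are arbitrary callables standing in for the two execution paths.

```python
from collections import deque
import time

def probe(worker, pilot_tasks):
    # Time a worker on a small pilot batch to estimate its throughput.
    t0 = time.perf_counter()
    for task in pilot_tasks:
        worker(task)
    return len(pilot_tasks) / (time.perf_counter() - t0)

def load_aware_run(tasks, cpu_worker, gpu_worker, pilot_n=8):
    q = deque(tasks)
    cpu_rate = probe(cpu_worker, [q.popleft() for _ in range(pilot_n)])
    gpu_rate = probe(gpu_worker, [q.pop() for _ in range(pilot_n)])
    # Split the remaining work in proportion to measured throughput; the
    # two consumers drain opposite ends of one bidirectional queue.
    n_cpu = round(len(q) * cpu_rate / (cpu_rate + gpu_rate))
    while q:
        if n_cpu > 0:
            cpu_worker(q.popleft())
            n_cpu -= 1
        else:
            gpu_worker(q.pop())
```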
  4. Graphics Processing Units (GPUs) are increasingly deployed on cyber-physical systems (CPSs), where they frequently perform real-time safety-critical functions such as object detection on autonomous vehicles. As a result, availability is important for GPU tasks on CPS platforms. However, existing Trusted Execution Environment (TEE) solutions with availability guarantees focus only on CPU computing. To bridge this gap, we propose AvaGPU, a TEE that guarantees real-time availability for CPU tasks involving GPU execution under a compromised OS. There are three technical challenges. First, to prevent malicious resource contention due to separate scheduling of CPU and GPU tasks, we propose a CPU-GPU co-scheduling framework that couples the priorities of CPU and GPU tasks. Second, we propose software-based secure preemption of GPU tasks to bound the degree of priority inversion on the GPU. Third, we propose a new split design of the GPU driver with a minimized Trusted Computing Base (TCB) to achieve secure and efficient GPU management for CPS. We implement a prototype of AvaGPU on the Jetson AGX Orin platform and evaluate it on benchmarks, synthetic tasks, and real-world applications, observing 15.87% runtime overhead on average.
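The priority-coupling idea behind the first challenge can be illustrated with a toy scheduler in which every GPU kernel inherits the priority of the CPU task that submitted it. This is only a sketch of the concept, with names invented here; AvaGPU enforces the coupling from inside a TEE with secure preemption, which plain user-space code like this cannot do.

```python
import heapq

class CoScheduler:
    """Toy priority coupling: a GPU kernel is queued with the priority of
    the submitting CPU task, so a low-priority task cannot starve a
    high-priority one by flooding the GPU submission queue."""
    def __init__(self):
        self._gpu_queue = []        # min-heap of (priority, seq, kernel)
        self._seq = 0               # tie-breaker preserves FIFO order

    def submit(self, cpu_priority, kernel):
        # The GPU work item inherits the CPU task's priority unchanged.
        heapq.heappush(self._gpu_queue, (cpu_priority, self._seq, kernel))
        self._seq += 1

    def run_next(self):
        # Always dispatch the highest-priority (lowest number) kernel.
        if self._gpu_queue:
            _, _, kernel = heapq.heappop(self._gpu_queue)
            kernel()

sched = CoScheduler()
sched.submit(2, lambda: print("background kernel"))
sched.submit(0, lambda: print("safety-critical kernel"))
sched.run_next()   # runs the safety-critical kernel first
```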
  5. Knowledge graphs (KGs) capture knowledge in the form of head–relation–tail triples and are a crucial component in many AI systems. There are two important reasoning tasks on KGs: (1) single-hop knowledge graph completion, which involves predicting individual links in the KG; and (2) multi-hop reasoning, where the goal is to predict which KG entities satisfy a given logical query. Embedding-based methods solve both tasks by first computing an embedding for each entity and relation, then using the embeddings to form predictions. However, existing scalable KG embedding frameworks support only single-hop knowledge graph completion and cannot be applied to the more challenging multi-hop reasoning task. Here we present Scalable Multi-hOp REasoning (SMORE), the first general framework for both single-hop and multi-hop reasoning in KGs. Using a single machine, SMORE can perform multi-hop reasoning over the Freebase KG (86M entities, 338M edges), which is 1,500× larger than previously considered KGs. The key to SMORE's runtime performance is a novel bidirectional rejection sampling scheme that achieves a square-root reduction in the complexity of online training-data generation. Furthermore, SMORE exploits asynchronous scheduling, overlapping CPU-based data sampling with GPU-based embedding computation and frequent CPU–GPU I/O. SMORE increases throughput (i.e., training speed) over prior multi-hop KG frameworks by 2.2× with minimal GPU memory requirements (2 GB for training 400-dimensional embeddings on the 86M-node Freebase KG) and achieves near-linear speed-up with the number of GPUs. Moreover, on the simpler single-hop knowledge graph completion task, SMORE achieves runtime performance comparable to or better than state-of-the-art frameworks in both single-GPU and multi-GPU settings.
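SMORE's asynchronous pipeline can be caricatured as a producer-consumer loop in which CPU-side sampling runs ahead of GPU-side embedding updates. The sketch below uses a background thread and a bounded queue to show the overlap only; make_batch and gpu_step are placeholders assumed here, and the real system additionally overlaps CPU–GPU I/O and performs bidirectional rejection sampling inside the sampler.

```python
import queue
import threading

def sampler(batches_out, make_batch, n_batches):
    # CPU producer: keep generating training batches while the GPU works.
    for _ in range(n_batches):
        batches_out.put(make_batch())
    batches_out.put(None)   # sentinel: no more batches

def train(make_batch, gpu_step, n_batches, prefetch=4):
    """Overlap CPU-side data sampling with GPU-side embedding updates.
    The bounded queue keeps a few batches prefetched so neither side
    idles; a larger `prefetch` trades memory for pipeline slack."""
    q = queue.Queue(maxsize=prefetch)
    t = threading.Thread(target=sampler, args=(q, make_batch, n_batches))
    t.start()
    while (batch := q.get()) is not None:
        gpu_step(batch)     # consumer: the CPU thread samples meanwhile
    t.join()

# Usage with trivial placeholders standing in for sampling and training:
train(make_batch=lambda: [1, 2, 3],
      gpu_step=lambda batch: sum(batch),
      n_batches=10)
```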