

Title: Accelerating 2-D Image Convolution Using a Graphics Processing Unit
Image processing is an important technique used in many fields, such as self-driving vehicles and facial recognition. One method, image convolution, applies many calculations to the pixels of an image to produce a new image with a desired effect. This is computationally intensive and requires a significant amount of time when run on a traditional central processing unit (CPU). Because image processing is used in real-time applications such as those mentioned above, it is essential that convolution algorithms run as quickly as possible. A common way to speed up image convolution is to take advantage of the highly parallel structure of graphics processing units (GPUs) to perform calculations concurrently. One problem with GPU applications is that they are often limited by the latency of transferring data between the CPU and the GPU. Previous work has looked into different ways to address this issue and optimize GPU programs. This research explores different memory implementations and compares them to determine which best optimizes data transfers.
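As a rough illustration of the operation the abstract describes, the sketch below computes a 2-D convolution on the CPU in plain Python. It is not the authors' implementation: the function name `convolve2d`, the list-of-lists image layout, and the choice of "valid" output size (no padding) are assumptions made for this example. Each output pixel depends only on the input image and the kernel, which is why the work maps naturally onto one GPU thread per output pixel.

```python
def convolve2d(image, kernel):
    """Naive 'valid' 2-D convolution (hypothetical helper, CPU reference only).

    image and kernel are lists of equal-length rows of numbers.
    The kernel is flipped in both axes, as in true convolution.
    """
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = ih - kh + 1, iw - kw + 1  # output shrinks because no padding is used
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            # Each (y, x) is independent of every other output pixel,
            # so on a GPU this loop body could be one thread's work.
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
            out[y][x] = acc
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
print(convolve2d(image, identity))  # the identity kernel returns the centre pixel
```

In a GPU version, the image and kernel would first have to be copied to device memory and the result copied back, which is the transfer cost the abstract identifies as the bottleneck.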
Award ID(s):
1659650
PAR ID:
10314076
Author(s) / Creator(s):
;
Date Published:
Journal Name:
2021 IEEE Western New York Image and Signal Processing Workshop (WNYISPW)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Whenever the concept of high-performance cloth simulation is brought up, GPU acceleration is almost always the first thing that comes to mind. Leveraging immense parallelization, GPU algorithms have demonstrated significant success recently, whereas CPU methods are somewhat overlooked. Indeed, the need for an efficient CPU simulator is evident and pressing. In many scenarios, high-end GPUs may be unavailable or are already allocated to other tasks, such as rendering and shading. A high-performance CPU alternative can greatly boost the overall system capability and user experience. Inspired by this demand, this paper proposes a CPU algorithm for high-resolution cloth simulation. By partitioning the garment model into multiple (but not massive) sub-meshes or domains, we assign per-domain computations to individual CPU processors. Borrowing the idea of projective dynamics that breaks the computation into global and local steps, our key contribution is a new parallelization paradigm at domains for both global and local steps so that domain-level calculations are sequential and lightweight. The CPU has far fewer processing units than a GPU. Our algorithm mitigates this disadvantage by wisely balancing the scale of the parallelization and convergence. We validate our method in a wide range of simulation problems involving high-resolution garment models. Performance-wise, our method is at least one order of magnitude faster than existing CPU methods, and it delivers a similar performance compared with the state-of-the-art GPU algorithms in many examples, but without using a GPU.
  2. Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the data volumes of such experiments are rapidly increasing. The demand to process billions of neutrino events with many machine learning algorithm inferences creates a computing challenge. We explore a computing model in which heterogeneous computing with GPU coprocessors is made available as a web service. The coprocessors can be efficiently and elastically deployed to provide the right amount of computing for a given processing task. With our approach, Services for Optimized Network Inference on Coprocessors (SONIC), we integrate GPU acceleration specifically for the ProtoDUNE-SP reconstruction chain without disrupting the native computing workflow. With our integrated framework, we accelerate the most time-consuming task, track and particle shower hit identification, by a factor of 17. This results in a factor of 2.7 reduction in the total processing time when compared with CPU-only production. For this particular task, only 1 GPU is required for every 68 CPU threads, providing a cost-effective solution.
  3. Atmospheric processes involve both space and time. Thus, humans looking at atmospheric imagery can often spot important signals in an animated loop of an image sequence not apparent in an individual (static) image. Utilizing such signals with automated algorithms requires the ability to identify complex spatiotemporal patterns in image sequences. That is a very challenging task due to the endless possibilities of patterns in both space and time. Here, we review different concepts and techniques that are useful to extract spatiotemporal signals from meteorological image sequences to expand the effectiveness of AI algorithms for classification and prediction tasks. We first present two applications that motivate the need for these approaches in meteorology, namely the detection of convection from satellite imagery and solar forecasting. Then we provide an overview of concepts and techniques that are helpful for the interpretation of meteorological image sequences, such as (a) feature engineering methods using (i) meteorological knowledge, (ii) classic image processing, (iii) harmonic analysis, and (iv) topological data analysis; (b) ways to use convolutional neural networks for this purpose with emphasis on discussing different convolution filters (2D/3D/LSTM-convolution); and (c) a brief survey of several other concepts, including the concept of “attention” in neural networks and its utility for the interpretation of image sequences and strategies from self-supervised and transfer learning to reduce the need for large labeled datasets. We hope that presenting an overview of these tools, many of which are not new but underutilized in this context, will accelerate progress in this area.
  4. Graphics Processing Units (GPUs) are increasingly deployed on Cyber-physical Systems (CPSs), frequently used to perform real-time safety-critical functions, such as object detection on autonomous vehicles. As a result, availability is important for GPU tasks in CPS platforms. However, existing Trusted Execution Environment (TEE) solutions with availability guarantees focus only on CPU computing. To bridge this gap, we propose AvaGPU, a TEE that guarantees real-time availability for CPU tasks involving GPU execution under a compromised OS. There are three technical challenges. First, to prevent malicious resource contention due to separate scheduling of CPU and GPU tasks, we propose a CPU-GPU co-scheduling framework that couples the priority of CPU and GPU tasks. Second, we propose software-based secure preemption on GPU tasks to bound the degree of priority inversion on GPU. Third, we propose a new split design of GPU driver with minimized Trusted Computing Base (TCB) to achieve secure and efficient GPU management for CPS. We implement a prototype of AvaGPU on the Jetson AGX Orin platform. The system is evaluated on benchmark, synthetic tasks, and real-world applications with 15.87% runtime overhead on average.
  5. Multi-pattern matching is widely used in modern software for applications requiring high throughput such as protein search, network traffic inspection, virus or spam detection. Graphics Processor Units (GPUs) excel at executing massively parallel workloads. Regular expression (regex) matching is typically performed by simulating the execution of deterministic finite automata (DFAs) or nondeterministic finite automata (NFAs). The natural implementations of these automata simulation algorithms on GPUs are highly inefficient because they give rise to irregular memory access patterns. This paper presents HybridSA, a heterogeneous CPU-GPU parallel engine for multi-pattern matching. HybridSA uses bit parallelism to efficiently simulate NFAs on GPUs, thus reducing the number of memory accesses and increasing the throughput. Our bit-parallel algorithms extend the classical shift-and algorithm for string matching to a large class of regular expressions and reduce automata simulation to a small number of bitwise operations. We have developed a compiler to translate regular expressions into bit masks, perform optimizations, and choose the best algorithms to run on the GPU. The majority of the regular expressions are accelerated on the GPU, while the patterns that exhibit random memory accesses are executed on the CPU in parallel. We evaluate HybridSA against state-of-the-art CPU and GPU engines, as well as a hybrid combination of the two. HybridSA achieves between 4 and 60 times higher throughput than the state-of-the-art CPU engine and between 4 and 233 times better than the state-of-the-art GPU engine across a collection of real-world benchmarks. 
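For readers unfamiliar with the classical shift-and algorithm that item 5 says HybridSA extends, the following is a minimal single-pattern sketch on the CPU. It is not HybridSA's code: the function name `shift_and_search` is hypothetical, and the sketch covers only exact string matching, not regular expressions. It shows how an NFA's active states can be packed into the bits of one integer so that each input character costs a shift, an OR, and an AND.

```python
def shift_and_search(text, pattern):
    """Return start indices of every occurrence of pattern in text.

    Bit i of `state` is set when pattern[0..i] matches the text ending
    at the current position (hypothetical helper for illustration).
    """
    m = len(pattern)
    if m == 0:
        return []
    # masks[ch] has bit i set wherever pattern[i] == ch.
    masks = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    accept = 1 << (m - 1)  # bit that signals a full match
    state = 0
    matches = []
    for pos, ch in enumerate(text):
        # Advance every partial match by one character, start a new one,
        # then keep only those consistent with the current character.
        state = ((state << 1) | 1) & masks.get(ch, 0)
        if state & accept:
            matches.append(pos - m + 1)
    return matches


print(shift_and_search("abracadabra", "abra"))
```

The per-character update touches only one mask and one state word, which is the property that avoids the irregular memory accesses of table-driven automata simulation.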