

Search for: All records

Award ID contains: 2344578


  1. The emergence of new, off-path smart network cards (SmartNICs), known generally as Data Processing Units (DPUs), has opened a wide range of research opportunities. Of particular interest is the use of these and related devices in tandem with their host's CPU, creating a heterogeneous computing system with new properties and strengths to be explored, capable of accelerating a wide variety of workloads. This survey begins by providing background on this new field: its origins, its motivations and challenges, a few of the current market offerings for DPUs, and brief information about the major programming languages and frameworks for using them. We then review and categorize a number of recent works in the field, covering a wide variety of studies, benchmarks, and application areas such as data center infrastructure, commercial uses, and AI and ML acceleration.
    Free, publicly-accessible full text available March 3, 2026
  2. In this poster, we show how to leverage NVIDIA's BlueField Data Processing Unit (DPU) in geospatial systems. Existing work in the literature has explored DPUs in the context of machine learning, compression, and MPI acceleration. We present our designs for integrating DPUs into existing high-performance geospatial systems such as MPI-GIS. The workflow of a typical spatial computing workload consists of two phases: filter and refine. First, we used the DPU as a target to offload spatial computations from the host CPU, and we show the performance improvements due to this offload. Next, we used the DPU for network I/O processing. In the network I/O case, the query data first arrives at the DPU for filtering, and the query then goes to the CPU for refinement. A DPU-based filter-and-refine system can be useful in other domains, such as physics, where an FPGA is used to perform the filter step to handle Big Data. We used BlueField-2 and BlueField-3 in our experiments. For the scalability study, we used up to 16 DPUs.
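The two-phase filter-and-refine workflow mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the poster's implementation and uses no DPU or MPI-GIS API: line segments stand in for real geometries, and the names `Segment`, `filter_phase`, and `refine_phase` are hypothetical. The assumption is that the cheap bounding-box filter is the part one would place on the DPU, while the exact geometric test is the part left to the host CPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A line segment standing in for a real geometry (illustrative only)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def mbr(self):
        """Minimum bounding rectangle as (xmin, ymin, xmax, ymax)."""
        return (min(self.x1, self.x2), min(self.y1, self.y2),
                max(self.x1, self.x2), max(self.y1, self.y2))


def mbr_overlap(a, b):
    """True if two rectangles (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def _orient(ax, ay, bx, by, cx, cy):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (bx - ax) * (cy - ay) - (by - ay) * (cx - ax)
    return (v > 0) - (v < 0)


def _within_box(ax, ay, bx, by, cx, cy):
    """For c collinear with a-b: True if c lies between a and b."""
    return min(ax, bx) <= cx <= max(ax, bx) and min(ay, by) <= cy <= max(ay, by)


def segments_intersect(s, t):
    """Exact segment intersection test, including collinear and touching cases."""
    o1 = _orient(s.x1, s.y1, s.x2, s.y2, t.x1, t.y1)
    o2 = _orient(s.x1, s.y1, s.x2, s.y2, t.x2, t.y2)
    o3 = _orient(t.x1, t.y1, t.x2, t.y2, s.x1, s.y1)
    o4 = _orient(t.x1, t.y1, t.x2, t.y2, s.x2, s.y2)
    if o1 != o2 and o3 != o4:
        return True
    if o1 == 0 and _within_box(s.x1, s.y1, s.x2, s.y2, t.x1, t.y1):
        return True
    if o2 == 0 and _within_box(s.x1, s.y1, s.x2, s.y2, t.x2, t.y2):
        return True
    if o3 == 0 and _within_box(t.x1, t.y1, t.x2, t.y2, s.x1, s.y1):
        return True
    if o4 == 0 and _within_box(t.x1, t.y1, t.x2, t.y2, s.x2, s.y2):
        return True
    return False


def filter_phase(query, data):
    """Cheap step (assumed DPU side): keep candidates whose MBR overlaps the query's."""
    q = query.mbr()
    return [g for g in data if mbr_overlap(q, g.mbr())]


def refine_phase(query, candidates):
    """Expensive step (assumed host CPU side): exact test on surviving candidates."""
    return [g for g in candidates if segments_intersect(query, g)]


query = Segment(0, 0, 4, 4)
data = [
    Segment(0, 4, 4, 0),      # crosses the query
    Segment(3, 0, 4, 1),      # MBR overlaps, but parallel to the query: a false positive
    Segment(10, 10, 12, 12),  # MBR disjoint: rejected by the filter
]
candidates = filter_phase(query, data)
results = refine_phase(query, candidates)
print(len(candidates), len(results))
```

In this toy input the filter discards one of three geometries and the refine step discards one more false positive, so the exact test runs only on the candidates that survive the cheap check.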
  3. Spatial join is an important operation for combining spatial data. Parallelization is essential for improving spatial join performance. However, load imbalance due to data skew limits the scalability of parallel spatial join. There are many work sharing techniques to address this problem in a parallel environment. One such technique is to use data and space partitioning and then schedule the partitions among threads/processes with the goal of minimizing workload differences across threads/processes. However, load imbalance still exists because join costs differ across the pairs of input geometries in the partitions. To address the load imbalance problem, we have designed a work stealing spatial join system (WSSJ-DM) for a distributed memory environment. Work stealing is an approach to dynamic load balancing in which an idle processor steals computational tasks from other processors. This is the first work that uses the work stealing concept (instead of work sharing) to parallelize spatial join computation on a large compute cluster. We have evaluated the scalability of the system on shared and distributed memory. Our experimental evaluation shows that work stealing is an effective strategy. We compared WSSJ-DM with work sharing implementations of spatial join in a high performance computing environment using partitioned and un-partitioned datasets. Static and dynamic load balancing approaches were used for comparison. We also study the effect of memory affinity on the work stealing operations involved in spatial join on a multi-core processor. WSSJ-DM performed spatial join using ST_Intersection on Lakes (8.4M polygons) and Parks (10M polygons) in 30 seconds using 35 compute nodes on a cluster (1260 CPU cores). In contrast, a work sharing Master-Worker implementation took 160 seconds.
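The work stealing idea described in the abstract above (an idle processor takes tasks from another processor's queue) can be sketched as a small, deterministic, single-process simulation in Python. This is not WSSJ-DM and involves no distributed memory or spatial join; the function `simulate`, the integer task costs, and the "steal from the longest queue" victim policy are all assumptions made for illustration.

```python
from collections import deque


def simulate(queues, steal=True):
    """Return the makespan (in time units) for workers with per-worker task queues.

    queues: one list of positive integer task costs per worker (hypothetical input).
    steal:  if True, an idle worker with an empty queue takes a task from the
            head of the currently longest queue (an assumed victim policy).
    """
    deques = [deque(q) for q in queues]
    remaining = [0] * len(deques)  # time left on the task each worker is running
    t = 0
    while True:
        for w, dq in enumerate(deques):
            if remaining[w] != 0:
                continue  # worker is still busy
            if dq:
                remaining[w] = dq.pop()  # owner takes from the tail of its own queue
            elif steal:
                victim = max(range(len(deques)), key=lambda v: len(deques[v]))
                if deques[victim]:
                    # thief takes from the head of the victim's queue
                    remaining[w] = deques[victim].popleft()
        if not any(remaining):
            return t  # no worker could obtain a task: all work is done
        t += 1
        remaining = [max(r - 1, 0) for r in remaining]


# Skewed initial partitioning: worker 0 holds almost all of the work.
skewed = [[5, 5, 5, 5], [1]]
static_makespan = simulate(skewed, steal=False)
stealing_makespan = simulate(skewed, steal=True)
print(static_makespan, stealing_makespan)
```

With this skewed input, the static assignment finishes in 20 time units while the stealing variant finishes in 11, because the second worker picks up tasks from the first worker's queue as soon as it runs out of its own. The numbers only illustrate the mechanism and say nothing about the performance reported in the abstract.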