skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 8:00 PM ET on Friday, March 21 until 8:00 AM ET on Saturday, March 22 due to maintenance. We apologize for the inconvenience.


Title: High-Performance Domain-Specific Library for Hydrologic Data Processing
Hydrologists must process many gigabytes of data for hydrologic simulations, which takes time and resources degrading performance. The performance issues are caused mainly by domain scientists’ preference for using Python, which trades performance for productivity. In my thesis, I demonstrate that using the static compilation technique to compile Python to generate C code along with several optimizations reduces time and resources for hydrologic data processing. I developed a Domain Specific Library (DSL) which is a subset of Python and compiles to Sparse Polyhedral Framework - Intermediate Representation (SPF-IR), which allows opportunities for optimizations like read reduction fusion which are not available in Python. We fused the file I/O to perform computation on small chunks of data (stream computation) in order to reduce the memory footprint. The C code we generated from SPF-IR shows an average speed-up of 2.58x over the existing hand-optimized implementations and can totally eliminate the tempo- rary storage required. DSL users can still enjoy the ease of use of Python but get performance better than the C code.  more » « less
Award ID(s):
2107135
PAR ID:
10462373
Author(s) / Creator(s):
Editor(s):
Cathie Olschanowsky
Date Published:
Journal Name:
ProQuest dissertations theses global
ISSN:
2771-5140
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cathie Olschanowsky (Ed.)
    Sparse computations are important in scientific computing. Many scientific applications compute on sparse data. Data is said to be sparse if it has a relatively small number of non-zeros. Sparse formats use auxiliary arrays to store non-zeros, as a result, the contents of auxiliary arrays are not known until run-time. The Inspector/Executor (I/E) paradigm uses run-time information for compiler optimizations. An inspector computes information at run-time to drive transformations. The executor—a compile-time transformation of the original code— uses information computed by the inspector. The sparse polyhedral framework (SPF) encompasses a series of tools to support I/E run-time transformations. This work introduces a unified framework that wraps SPF tools while providing a holistic view of computation as an intermediate representation (IR). This work also introduces a method to automatically synthesize inspectors to transform between sparse formats and improvements to SPF to explore the performance of irregular applications. 
    more » « less
  2. Cathie Olschanowsky (Ed.)
    The Sparse Polyhedral Framework (SPF) provides vital support to scientific applications, but is limited in portability. SPF extends the Polyhedral Model to non-affine codes. Scientific applications need the optimizations SPF enables, but current SPF tools don’t support GPUs or other heterogeneous hardware targets. As clock speeds continue to stagnate, scientific applications need the performance enhancements enabled by both SPF and newer heterogeneous hardware. The MLIR (Multi-Level Intermediate Representation) ecosystem offers a large, extensible, and cooperating set of intermediate representations (called dialects). A typical compiler has one main intermediate representation, whereas an MLIR based compiler will have many. Because of this flexibility, the MLIR ecosystem has many dialects designed with heterogeneous hardware platforms in mind. This work creates an MLIR SPF dialect. The dialect enables SPF optimizations and is capable of generating GPU code as well as CPU code from SPF representations. Previous C based SPF front ends are not capable of generating GPU code. The SPF dialect representations of common sparse scientific kernels generate CPU code competitive with the existing C based front end, and GPU code competitive with standard benchmarks. 
    more » « less
  3. Many important applications including machine learning, molecular dynamics, and computational fluid dynamics, use sparse data. Processing sparse data leads to non-affine loop bounds and frustrates the use of the polyhedral model for code transformation. The Sparse Polyhedral Framework (SPF) addresses limitations of the Polyhedral model by supporting non-affine constraints in sets and relations using uninterpreted functions. This work contributes an object-oriented API that wraps the SPF intermediate representation (IR) and integrates the Inspector/Executor Generation Library and Omega+ for precise set and relation manipulation and code generation. The result is a well-specified definition of a full computation using the SPF IR. The API provides a single entry point for tools to interact with the SPF, generate and manipulate polyhedral data flow graphs, and transform sparse applications. 
    more » « less
  4. Recent trends towards large machine learning models require both training and inference tasks to be distributed. Considering the huge cost of training these models, it is imperative to unlock optimizations in computation and communication to obtain best performance. However, the current logical separation between computation and communication kernels in machine learning frameworks misses optimization opportunities across this barrier. Breaking this abstraction can provide many optimizations to improve the performance of distributed workloads. However, manually applying these optimizations requires modifying the underlying computation and communication libraries for each scenario, which is both time consuming and error-prone. Therefore, we present CoCoNet, which contains (i) a domain specific language to express a distributed machine learning program in the form of computation and communication operations, (ii) a set of semantics preserving transformations to optimize the program, and (iii) a compiler to generate jointly optimized communication and computation GPU kernels. Providing both computation and communication as first class constructs allows users to work on a high-level abstraction and apply powerful optimizations, such as fusion or overlapping of communication and computation. CoCoNet enabled us to optimize data-, model- and pipeline-parallel workloads in large language models with only a few lines of code. Our experiments show that CoCoNet significantly outperforms state-of-the-art distributed machine learning implementations. 
    more » « less
  5. Heterogeneous computing environments combining CPU and GPU resources provide a great boost to large-scale scientific computing applications. Code generation utilities that partition the work into CPU and GPU tasks while considering data movement costs allow researchers to develop high-performance solutions more quickly and easily, and make these resources accessible to a larger user base.We present developments for a domain-specific language (DSL) and code generation framework for solving partial differential equations (PDEs). These enhancements facilitate GPU-accelerated solution of the Boltzmann transport equation (BTE) for phonons, which is the governing equation for simulating thermal transport in semiconductor materials at sub-micron scales. The solution of the BTE involves thousands of coupled PDEs as well as complicated boundary conditions and solving a nonlinear equation that couples all of the degrees of freedom at each time step. These developments enable the DSL to generate configurable hybrid GPU/CPU code that couples accelerated kernels with user-defined code. We observed performance improvements of around 18X compared to a CPU-only version produced by this same DSL with minimal additional programming effort. 
    more » « less