

Title: MultiGrid on FPGA Using Data Parallel C++
Centered on modern C++ and the SYCL standard for heterogeneous programming, Data Parallel C++ (DPC++) and Intel's oneAPI software ecosystem aim to lower the barrier to entry for using accelerators such as FPGAs in diverse applications. In this work, we consider the use of FPGAs for scientific computing, in particular with the multigrid solver MueLu. We report on early experiences implementing kernels of the solver in DPC++ for execution on Stratix 10 FPGAs, and we evaluate several algorithmic design and implementation choices. These choices not only affect performance but also shed light on the capabilities and limitations of DPC++ and oneAPI.
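The abstract does not say which solver kernels were ported. As a hedged illustration only, the plain C++ sketch below shows one weighted-Jacobi smoothing sweep over a sparse matrix in compressed sparse row (CSR) format, a common building block of multigrid smoothers. The names `CsrMatrix` and `jacobi_sweep` are hypothetical, and this is a serial reference, not the authors' DPC++ code. Because every row is updated independently from the previous iterate, the outer loop is the part a DPC++ `parallel_for` or an FPGA pipeline would be expected to exploit.

```cpp
#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage for a square sparse matrix.
struct CsrMatrix {
    std::size_t n;                     // number of rows (and columns)
    std::vector<std::size_t> row_ptr;  // size n + 1; row i spans [row_ptr[i], row_ptr[i+1])
    std::vector<std::size_t> col_idx;  // column index of each stored nonzero
    std::vector<double> values;        // value of each stored nonzero
};

// One weighted-Jacobi sweep: x_new = x + omega * D^{-1} (b - A x).
// Assumes every row stores a nonzero diagonal entry.
// Each row reads only the old iterate x, so rows are independent.
std::vector<double> jacobi_sweep(const CsrMatrix& A,
                                 const std::vector<double>& b,
                                 const std::vector<double>& x,
                                 double omega) {
    std::vector<double> x_new(A.n);
    for (std::size_t i = 0; i < A.n; ++i) {
        double ax = 0.0;    // (A x)_i
        double diag = 0.0;  // a_ii
        for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k) {
            ax += A.values[k] * x[A.col_idx[k]];
            if (A.col_idx[k] == i) diag = A.values[k];
        }
        x_new[i] = x[i] + omega * (b[i] - ax) / diag;
    }
    return x_new;
}
```

For example, on the 3x3 matrix tridiag(-1, 2, -1) with right-hand side (1, 0, 1), repeated sweeps with omega = 2/3 converge to the exact solution (1, 1, 1). In a multigrid cycle only a few such sweeps are applied per level, to damp high-frequency error before restriction to a coarser grid.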
Award ID(s):
2016701
NSF-PAR ID:
10367855
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IPDPSW
Page Range / eLocation ID:
907 to 910
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Candida albicans is an opportunistic yeast that can cause life‐threatening systemic infection in immunocompromised individuals. During infections, C. albicans has to cope with genotoxic stresses generated by the host immune system. DNA–protein crosslinks (DPCs), covalent linkages of proteins with DNA, are one type of DNA damage that can be caused by the host immune response. DPCs are bulky lesions that interfere with the progression of replication and transcription machineries and hence threaten genomic integrity. Accordingly, either a DPC tolerance mechanism or a DPC repair pathway is essential for C. albicans to maintain genomic stability and survive in the host. Here, we identified Wss1 (weak suppressor of Smt3) in C. albicans (CaWss1) using bioinformatics, genetic complementation, and biochemical studies. We showed that CaWss1 promotes cell survival under genotoxic stress conditions that generate DPCs and that the catalytic metalloprotease domain of CaWss1 is essential for its cellular function. Interactions of CaWss1 with Cdc48 and small ubiquitin‐like modifier, although not strictly required, contribute to the function of CaWss1 in suppressing the growth defects under DPC‐inducing conditions. This report is the first investigation of the role of CaWss1 in DPC tolerance in C. albicans.

  2. The continued growth in the processing power of FPGAs coupled with high-bandwidth memories (HBM) makes systems like the Xilinx U280 credible platforms for linear solvers, which often dominate the run time of scientific and engineering applications. In this paper, we present Callipepla, an accelerator for a preconditioned conjugate gradient (CG) linear solver. FPGA acceleration of CG faces three challenges: (1) how to support an arbitrary problem and terminate acceleration processing on the fly, (2) how to coordinate long-vector data flow among processing modules, and (3) how to save off-chip memory bandwidth while maintaining double (FP64) precision accuracy. To tackle these challenges, we present (1) a stream-centric instruction set for efficient streaming processing and control, (2) vector streaming reuse (VSR) and decentralized vector flow scheduling to coordinate vector data flow among modules, further reducing off-chip memory access latency with a double memory channel design, and (3) a mixed-precision scheme that saves bandwidth yet still achieves effective double-precision-quality solutions. To the best of our knowledge, this is the first work to introduce the concept of VSR for data reuse between on-chip modules, which reduces unnecessary off-chip accesses and enables modules to work in parallel in FPGA accelerators. We prototype the accelerator on a Xilinx U280 HBM FPGA. Our evaluation shows that, compared to the Xilinx HPC product XcgSolver, Callipepla achieves a speedup of 3.94×, 3.36× higher throughput, and 2.94× better energy efficiency. Compared to an NVIDIA A100 GPU, which has 4× the memory bandwidth of Callipepla, we still achieve 77% of its throughput with 3.34× higher energy efficiency. The code is available at https://github.com/UCLA-VAST/Callipepla.
  3. Mackelprang, Rachel (Ed.)
    ABSTRACT Increasing data volumes on high-throughput sequencing instruments such as the NovaSeq 6000 lead to long computational bottlenecks for common metagenomics data preprocessing tasks such as adaptor and primer trimming and host removal. Here, we test whether faster, recently developed computational tools (Fastp and Minimap2) can replace widely used choices (Atropos and Bowtie2), obtaining dramatic accelerations with additional sensitivity and minimal loss of specificity for these tasks. Furthermore, the taxonomic tables resulting from downstream processing provide biologically comparable results. However, we demonstrate that for taxonomic assignment, Bowtie2’s specificity is still required. We suggest that periodic reevaluation of pipeline components, together with improvements to standardized APIs to chain them together, will greatly enhance the efficiency of common bioinformatics tasks while also facilitating incorporation of further optimized steps running on GPUs, FPGAs, or other architectures. We also note that a detailed exploration of available algorithms and pipeline components is an important step that should be taken before optimizing less efficient algorithms on advanced or nonstandard hardware.

    IMPORTANCE In shotgun metagenomics studies that seek to relate changes in microbial DNA across samples, processing the data on a computer often takes longer than obtaining the data from the sequencing instrument. Recently developed software packages that perform individual steps of the data-processing pipeline offer speed advantages in principle, but in practice they may contain pitfalls that prevent their use; for example, they may make approximations that introduce unacceptable errors in the data. Here, we show that different choices of these components can speed up overall data processing by 5-fold or more on the same hardware while maintaining a high degree of correctness, greatly reducing the time taken to interpret results. This is an important step toward using the data in clinical settings, where the time taken to obtain results may be critical for guiding treatment.
  4. We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES) unit, in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off switching of chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. However, designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm based on least-squares policy iteration. We describe the design choices for the RL controller that we found to be effective, including the choice of state space and basis functions. The proposed MPC controller does not need a mixed-integer solver for implementation, only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
  5. 3D phase imaging recovers an object’s volumetric refractive index from intensity and/or holographic measurements. Partially coherent methods, such as illumination-based differential phase contrast (DPC), are particularly simple to implement in a commercial brightfield microscope. 3D DPC acquires images at multiple focus positions and with different illumination source patterns in order to reconstruct the 3D refractive index. Here, we present a practical extension of the 3D DPC method that does not require a precise motion stage for scanning the focus and uses optimized illumination patterns for improved performance. The user scans the focus by hand, using the microscope’s focus knob, and the algorithm self-calibrates the axial position to solve for the 3D refractive index of the sample through a computational inverse problem. We further show that the illumination patterns can be optimized by an end-to-end learning procedure. Combining these two, we demonstrate improved 3D DPC with a commercial microscope whose only hardware modification is LED array illumination.

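Entry 2 in the list above describes an FPGA accelerator for a preconditioned conjugate gradient (CG) solver that must terminate on the fly. For readers unfamiliar with the algorithm, the following is a minimal dense-matrix C++ sketch of CG with a Jacobi (diagonal) preconditioner and a relative-residual stopping test. The Jacobi preconditioner, the tolerance parameter, and the names `pcg_jacobi` and `PcgResult` are assumptions for illustration; this sketch reproduces neither Callipepla's stream-centric instruction set nor its mixed-precision scheme.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major; assumed symmetric positive definite

struct PcgResult {
    Vec x;           // approximate solution
    int iterations;  // iterations performed
    bool converged;  // true if ||r|| <= tol * ||b||
};

static double dot(const Vec& a, const Vec& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

static Vec matvec(const Mat& A, const Vec& v) {
    Vec y(v.size(), 0.0);
    for (std::size_t i = 0; i < A.size(); ++i)
        for (std::size_t j = 0; j < v.size(); ++j) y[i] += A[i][j] * v[j];
    return y;
}

// Jacobi-preconditioned CG starting from x = 0 (hypothetical helper).
PcgResult pcg_jacobi(const Mat& A, const Vec& b, double tol, int max_iter) {
    const std::size_t n = b.size();
    Vec x(n, 0.0), r = b, z(n), p(n);
    for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];  // z = M^{-1} r
    p = z;
    double rz = dot(r, z);
    const double b_norm = std::sqrt(dot(b, b));
    for (int k = 0; k < max_iter; ++k) {
        // Convergence test on the residual norm: the solver stops as soon as it is met.
        if (std::sqrt(dot(r, r)) <= tol * b_norm) return {x, k, true};
        Vec Ap = matvec(A, p);
        const double alpha = rz / dot(p, Ap);
        for (std::size_t i = 0; i < n; ++i) {
            x[i] += alpha * p[i];
            r[i] -= alpha * Ap[i];
        }
        for (std::size_t i = 0; i < n; ++i) z[i] = r[i] / A[i][i];
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;
        rz = rz_new;
        for (std::size_t i = 0; i < n; ++i) p[i] = z[i] + beta * p[i];
    }
    return {x, max_iter, std::sqrt(dot(r, r)) <= tol * b_norm};
}
```

Each iteration performs one matrix-vector product plus a few dot products and vector updates over length-n vectors. Passing these long vectors between the product, reduction, and update steps is the kind of inter-module vector data flow that the entry's vector streaming reuse is described as coordinating.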