Abstract: The Boltzmann Transport Equation (BTE) was solved numerically in cylindrical coordinates and in the time domain to simulate a Frequency Domain Thermo-Reflectance (FDTR) experiment. First, a parallel phonon BTE solver that accounts for all phonon modes, frequencies, and polarizations was developed and tested. The solver employs the finite-volume method (FVM) for discretization of physical space and the finite-angle method (FAM) for discretization of angular space. The solution was advanced in time using explicit time marching, and band-based parallelization of the BTE solver was implemented. The phase lag between the temperature averaged over the probed region of the transducer and the modulated laser pump signal was extracted for pump laser modulation frequencies ranging from 20 to 200 MHz. It was found that, with the relaxation time scales used in the present study, the computed phase lag is underpredicted when compared to experimental data, especially at smaller modulation frequencies. The challenges in solving the BTE for such applications are highlighted.
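As a rough illustration of the phase-lag extraction step mentioned in the abstract, the sketch below projects a sampled temperature signal onto in-phase and quadrature references at the modulation frequency. It is not the authors' post-processing code: the function name `phase_lag`, the cosine reference convention, the synthetic signal, and the sampling parameters are all assumptions made for this example.

```python
import math

def phase_lag(samples, dt, freq_hz):
    """Phase lag (radians) of `samples` relative to a cos(2*pi*f*t) reference.

    Assumes uniform sampling with step `dt` over a whole number of
    modulation periods, so that the mean (DC) level cancels in the sums.
    """
    omega = 2.0 * math.pi * freq_hz
    in_phase = sum(s * math.cos(omega * k * dt) for k, s in enumerate(samples))
    quadrature = sum(s * math.sin(omega * k * dt) for k, s in enumerate(samples))
    return math.atan2(quadrature, in_phase)

# Synthetic test signal (hypothetical values): a 300 K mean temperature with a
# 2 K oscillation at 50 MHz that lags the pump by 0.6 rad.
freq = 50e6
n_per_period = 200
dt = 1.0 / (freq * n_per_period)
true_lag = 0.6
signal = [
    300.0 + 2.0 * math.cos(2.0 * math.pi * freq * k * dt - true_lag)
    for k in range(10 * n_per_period)  # exactly 10 periods
]
estimated_lag = phase_lag(signal, dt, freq)
```

In a time-domain simulation the initial transient would normally be discarded before this projection, so that only the periodic steady state contributes to the estimate.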
Scalable parallelization for the solution of phonon Boltzmann Transport Equation
The Boltzmann Transport Equation (BTE) for phonons is often used to predict thermal transport at submicron scales in semiconductors. The BTE is a seven-dimensional nonlinear integro-differential equation, which makes it difficult to solve even after linearization under the single relaxation time approximation. Furthermore, parallelization and load balancing are challenging, given the high dimensionality and the variability of the linear systems' conditioning. This work presents a 'synthetic' scalable parallelization method for solving the BTE on large-scale systems. The method includes cell-based parallelization, combined band+cell-based parallelization, and a batching technique. The essential computational ingredient of cell-based parallelization is a sparse matrix-vector product (SpMV) that can be integrated with an existing linear algebra library like PETSc. The combined approach enhances the cell-based method by further parallelizing the band dimension to take advantage of low inter-band communication costs. For the batched approach, we developed a batched SpMV that enables multiple linear systems to be solved simultaneously, merging many MPI messages to reduce communication costs and thus maintaining scalability when the grain size becomes very small. We present numerical experiments to demonstrate our method's excellent speedups and scalability up to 16384 cores for a problem with 12.6 billion unknowns.
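To make the idea of a batched SpMV concrete, here is a minimal serial sketch in pure Python. It assumes that all systems in a batch share one CSR sparsity pattern while carrying their own matrix values; the abstract does not state the storage layout, so this layout, the function name `batched_spmv`, and the small example matrices are hypothetical. The MPI message merging described above is not shown.

```python
def batched_spmv(indptr, indices, data, vectors):
    """Compute y_b = A_b @ x_b for every system b in one sweep over the pattern.

    indptr, indices : CSR row pointers and column indices (shared by all systems)
    data[p][b]      : value of nonzero p in system b (assumed layout)
    vectors[j][b]   : entry j of the input vector for system b
    Returns result[i][b], entry i of the output vector for system b.
    """
    n_rows = len(indptr) - 1
    batch = len(vectors[0])
    result = [[0.0] * batch for _ in range(n_rows)]
    for i in range(n_rows):
        row_out = result[i]
        for p in range(indptr[i], indptr[i + 1]):
            values = data[p]
            x = vectors[indices[p]]
            # The innermost loop runs over the batch, so each index lookup
            # is reused by every system in the batch.
            for b in range(batch):
                row_out[b] += values[b] * x[b]
    return result

# Two 2x2 systems with the same pattern (row 0: columns 0 and 1; row 1: column 1).
indptr = [0, 2, 3]
indices = [0, 1, 1]
data = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]   # A_0 = [[1, 2], [0, 3]], A_1 = [[2, 0], [0, 1]]
vectors = [[1.0, 1.0], [1.0, 2.0]]            # x_0 = [1, 1], x_1 = [1, 2]
y = batched_spmv(indptr, indices, data, vectors)
```

Storing the batch index innermost means that, in a distributed setting, the halo values of all systems for a given cell sit contiguously and could be packed into a single message per neighbor, which is the kind of merging the abstract refers to.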
- Award ID(s): 2004236
- PAR ID: 10502521
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400700569
- Page Range / eLocation ID: 215 to 226
- Format(s): Medium: X
- Location: Orlando FL USA
- Sponsoring Org: National Science Foundation
More Like this
-
We present two algorithms to compute system-specific polarizabilities and dispersion coefficients such that required memory and computational time scale linearly with increasing number of atoms in the unit cell for large systems. The first algorithm computes the atom-in-material (AIM) static polarizability tensors, force-field polarizabilities, and C₆, C₈, C₉, C₁₀ dispersion coefficients using the MCLF method. The second algorithm computes the AIM polarizability tensors and C₆ coefficients using the TS-SCS method. Linear-scaling computational cost is achieved using a dipole interaction cutoff length function combined with iterative methods that avoid large dense matrix multiplies and large matrix inversions. For MCLF, Richardson extrapolation of the screening increments is used. For TS-SCS, a failproof conjugate residual (FCR) algorithm is introduced that solves any linear equation system having a Hermitian coefficient matrix. These algorithms have mathematically provable stable convergence that resists round-off errors. We parallelized these methods to provide rapid computation on multi-core computers. Excellent parallelization efficiencies were obtained, and adding parallel processors does not significantly increase memory requirements. This enables system-specific polarizabilities and dispersion coefficients to be readily computed for materials containing millions of atoms in the unit cell. The largest example studied herein is an ice crystal containing >2 million atoms in the unit cell. For this material, the FCR algorithm solved a linear equation system containing >6 million rows, 7.57 billion interacting atom pairs, 45.4 billion stored non-negligible matrix components used in each large matrix-vector multiplication, and ∼19 million unknowns per frequency point (>300 million total unknowns).
-
Heterogeneous computing environments combining CPU and GPU resources provide a great boost to large-scale scientific computing applications. Code generation utilities that partition the work into CPU and GPU tasks while considering data movement costs allow researchers to develop high-performance solutions more quickly and easily, and make these resources accessible to a larger user base. We present developments for a domain-specific language (DSL) and code generation framework for solving partial differential equations (PDEs). These enhancements facilitate GPU-accelerated solution of the Boltzmann transport equation (BTE) for phonons, which is the governing equation for simulating thermal transport in semiconductor materials at sub-micron scales. The solution of the BTE involves thousands of coupled PDEs as well as complicated boundary conditions and solving a nonlinear equation that couples all of the degrees of freedom at each time step. These developments enable the DSL to generate configurable hybrid GPU/CPU code that couples accelerated kernels with user-defined code. We observed performance improvements of around 18X compared to a CPU-only version produced by this same DSL with minimal additional programming effort.
-
ppohBEM is an open-source software package implementing the boundary element method. One of its main software tasks is the solution of the dense linear system of equations, for which ppohBEM relies on another software package called HACApK. To reduce the cost of solving the linear system, HACApK hierarchically compresses the coefficient matrix using adaptive cross approximation. This hierarchical compression greatly reduces the storage and time complexities of the solver and enables the solution of large-scale boundary value problems. To extend the capability of ppohBEM, in this paper, we carefully port HACApK's linear solver onto GPU clusters. Though the potential of GPUs has been widely accepted in high-performance computing, it is still a challenge to utilize GPUs for a solver, like HACApK's, that requires fine-grained computation and global communication. First, to utilize the GPUs, we integrate the batched GPU kernel that was recently released in the MAGMA software package. We discuss several techniques to improve the performance of the batched kernel. We then study various techniques to address the inter-GPU communication and study their effects on state-of-the-art GPU clusters. We believe that the techniques studied in this paper are of interest to a wide range of software packages running on GPUs, especially with the increasingly complex node architectures and the growing costs of communication. We also hope that our efforts to integrate the GPU kernel or to set up the inter-GPU communication will influence the design of future-generation batched kernels or the communication layer within a software stack.
-
The need for secure and efficient communication between connected devices continues to grow in healthcare systems within smart cities. Secure communication of healthcare data in Internet of Things (IoT) systems is critical to ensure patient privacy and data integrity. Problems with healthcare communication, like data breaches, integrity issues, scalability issues, and cyber threats, make it harder for people to trust doctors, cause costs to rise, stop people from using new technology, and put private data at risk. So, this paper presents a blockchain-based hybrid method for sending secure healthcare data that combines IoT systems with blockchain technology and high-tech encryption techniques like elliptic curve cryptography (ECC). The proposed method uses the public key of a smart contract to encrypt private data to protect its privacy. It also uses cryptographic hashing and digital signatures to make sure that the data is correct and real. The framework stores metadata (e.g., hashes and signatures) on-chain, and large data uses off-chain storage like IPFS to reduce costs and improve scalability. It also incorporates a mechanism to authenticate IoT devices and enable secure communication across heterogeneous networks. Moreover, this work bridges gaps in existing solutions by providing an end-to-end secure communication system for healthcare applications. It provides strong data security and efficient storage for a reliable and scalable way to handle healthcare data safely in IoT ecosystems.

