Performance analysis is critical for GPU programs with data-dependent behavior, but models such as Roofline are of limited use for such programs, and interpreting raw performance counters is tedious. In this work, we present an analytical model for shared-memory atomics (fetch-and-op and compare-and-swap instructions on NVIDIA Volta and Ampere GPUs) that allows users to immediately determine whether shared-memory atomic operations are a bottleneck for a program’s execution. Our model treats the architecture as a single-server queue whose inputs are performance counters. It captures load-dependent behavior such as pipelining, parallelism, and different access patterns. We embody this model in a tool that uses CUDA hardware counters as parameters to predict the utilization of the shared-memory atomic unit. To the best of our knowledge, no existing profiling tool or model provides this capability for shared-memory atomic operations. We used the model to compare two histogram kernels that use shared-memory atomics. Although the kernels are nearly identical, their performance can differ by up to 30%. Our tool correctly identifies a bottleneck shift from the shared-memory atomic unit as the cause of this discrepancy.
Developing interoperable, accessible software via the atomic, molecular, and optical sciences gateway: A case study of the B-spline atomic R-matrix code graphical user interface
The Atomic, Molecular, and Optical Science (AMOS) Gateway is a comprehensive cyberinfrastructure for research and educational activities in computational AMO science. The B-Spline atomic R-Matrix (BSR) suite of programs is one of several computer programs currently available on the gateway. It is an excellent example of the gateway’s potential to increase the scientific productivity of AMOS users. While the suite can be used in batch mode, its complexity makes it poorly suited to the approach taken in the gateway’s default setup. The complexity originates from the need to execute many different computations and to construct generally complex workflows, requiring numerous input files that must be used in a specific sequence. The BSR graphical user interface described in this paper was developed to considerably simplify the use of the BSR codes on the gateway, making BSR available to a large group of researchers and students interested in AMO science.
- Award ID(s): 2311928
- PAR ID: 10632530
- Publisher / Repository: Journal of Chemical Physics
- Date Published:
- Journal Name: The Journal of Chemical Physics
- Volume: 161
- Issue: 13
- ISSN: 0021-9606
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Since its initial development in the 1970s by Phil Burke and his collaborators, the R-matrix theory and associated computer codes have become the method of choice for the calculation of accurate data for general electron–atom/ion/molecule collision and photoionization processes. The use of a non-orthogonal set of orbitals based on B-splines, now called the B-spline R-matrix (BSR) approach, was pioneered by Zatsarinny. It has considerably extended the flexibility of the approach and improved particularly the treatment of complex many-electron atomic and ionic targets, for which accurate data are needed in many modelling applications for processes involving low-temperature plasmas. Both the original R-matrix approach and the BSR method have been extended to the interaction of short, intense electromagnetic (EM) radiation with atoms and molecules. Here, we provide an overview of the theoretical tools that were required to facilitate the extension of the theory to the time domain. As an example of a practical application, we show results for two-photon ionization of argon by intense short-pulse extreme ultraviolet radiation.
-
Liwendowski, H. (Ed.) The electrons and atoms inside molecules can rearrange rapidly during photoexcitation or collisions, moving angstroms in a few femtoseconds or less. This non-classical many-body quantum evolution is far too small and too fast to be resolved in any imaging microscope, but if we could film it, what should we expect to see? New tools based on ultrafast lasers, electron accelerators, and x-ray free-electron lasers have now begun to record this motion with increasing detail, and for a growing array of atomic and molecular systems. Here I will attempt to answer the question, "So what?" What have we learned, and how are molecular movies guiding us toward future discoveries in AMO physics? *Much of this work is supported by the U.S. Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences (BES), Chemical Sciences, Geosciences, and Biosciences Division (CSGB). Other work described here has been supported by the National Science Foundation.
-
Although scanning transmission electron microscopy (STEM) images of individual heavy atoms were reported 50 years ago, the applications of atomic-resolution STEM imaging became widespread only after the practical realization of aberration correctors on field-emission STEM/TEM instruments to form sub-Ångstrom electron probes. The innovative designs and advances of electron optical systems, the fundamental understanding of electron–specimen interaction processes, and the advances in detector technology all played a major role in achieving the goal of atomic-resolution STEM imaging of practical materials. It is clear that tremendous advances in computer technology and electronics, image acquisition and processing algorithms, image simulations, and precision machining synergistically made atomic-resolution STEM imaging routinely accessible. It is anticipated that further hardware/software development is needed to achieve three-dimensional atomic-resolution STEM imaging with single-atom chemical sensitivity, even for electron-beam-sensitive materials. Artificial intelligence, machine learning, and big-data science are expected to significantly enhance the impact of STEM and associated techniques on many research fields such as materials science and engineering, quantum and nanoscale science, physics and chemistry, and biology and medicine. This review focuses on advances of STEM imaging from the invention of the field-emission electron gun to the realization of aberration-corrected and monochromated atomic-resolution STEM and its broad applications.
-
Recent concurrent shifts of the East Asian polar-front jet (EAPJ) and the East Asian subtropical jet (EASJ) in the boreal winter have raised concerns, since they could result in severe weather events over East Asia. However, the possible mechanisms are not fully understood. In this study, the roles of the interdecadal Pacific oscillation (IPO) and the Atlantic multidecadal oscillation (AMO) are investigated by analyzing reanalysis data and model simulations. Results show that combinations of opposite phases of the IPO and AMO can result in significant shifts of the two jets during 1920–2014. This relationship is particularly evident during 1999–2014 and 1979–98 in the reanalysis data. A combination of a negative phase of the IPO (−IPO) and a positive phase of the AMO (+AMO) since the late 1990s has enhanced the meridional temperature gradient and the Eady growth rate and thus westerlies over the region between the two jets, but weakened them to the south and north of the region, thereby contributing to the equatorward and poleward shifts of the EAPJ and EASJ, respectively. Atmospheric model simulations are further used to investigate the relative contribution of −IPO and +AMO to the jet shifts. The model simulations show that the combination of −IPO and +AMO favors the recent jet changes more than the individual −IPO or +AMO. Under a concurrent −IPO and +AMO, the meridional eddy transport of zonal momentum and sensible heat strengthens, and more mean available potential energy converts to the eddy available potential energy over the region between the two jets, which enhances westerly winds there.

