NSF PAR Search | NSF Public Access Repository

Integrating process, control-flow, and data resiliency layers using a hybrid Fenix/Kokkos approach

https://doi.org/10.1109/CLUSTER51413.2022.00052

Whitlock, M.; Morales, N.; Bosilca, G.; Bouteiller, A.; Nicolae, B.; Teranishi, K.; Giem, E.; Sarkar, V. (September 2022, 2022 IEEE International Conference on Cluster Computing (CLUSTER))

Integrating recent advancements in resilient algorithms and techniques into existing codes is a singular challenge in fault tolerance - in part due to the underlying complexity of implementing resilience in the first place, but also due to the difficulty introduced when integrating the functionality of a standalone new strategy with the preexisting resilience layers of an application. We propose that the answer is not to build integrated solutions for users, but runtimes designed to integrate into a larger comprehensive resilience system and thereby enable the necessary jump to multi-layered recovery. Our work designs, implements, and verifies one such comprehensive system of runtimes. Utilizing Fenix, a process resilience tool with integration into preexisting resilience systems as a design priority, we update Kokkos Resilience and the use pattern of VeloC to support application-level integration of resilience runtimes. Our work shows that designing integrable systems rather than integrated systems allows for user-designed optimization and upgrading of resilience techniques while maintaining the simplicity and performance of all-in-one resilience solutions. More application-specific choice in resilience strategies allows for better long-term flexibility, performance, and - importantly - simplicity.
Full Text Available
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs

https://doi.org/10.1109/CLUSTER51413.2022.00029

Cayrols, S.; Li, J.; Bosilca, G.; Tomov, S.; Ayala, A.; Dongarra, J. (September 2022, 2022 IEEE International Conference on Cluster Computing (CLUSTER))

In the context of parallel applications, communication is a critical part of the infrastructure and a potential bottleneck. The traditional approach to tackle communication challenges consists of redesigning algorithms so that the complexity or the communication volume is reduced. However, there are algorithms like the Fast Fourier Transform (FFT) where reducing the volume of communication is very challenging yet can reap large benefits in terms of time-to-completion. In this paper, we revisit the implementation of the MPI all-to-all routine at the core of 3D FFTs by using advanced MPI features, such as One-Sided Communication, and integrate data compression during communication to reduce the volume of data exchanged. Since some compression techniques are ‘lossy’ in the sense that they involve a loss of accuracy, we study the impact of lossy compression in heFFTe, the state-of-the-art FFT library for large-scale 3D FFTs on hybrid architectures with GPUs. Consequently, we design an approximate FFT algorithm that trades off user-controlled accuracy for speed. We show that we speed up the 3D FFTs proportionally to the compression rate. In terms of accuracy, comparing our approach with a reduced-precision execution, where both the data and the computation are in reduced precision, we show that when the volume of communication is compressed to the size of the reduced-precision data, the approximate FFT algorithm is as fast as the one in reduced precision while the accuracy is one order of magnitude better.
Full Text Available
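
The accuracy-for-volume trade-off described in the abstract above can be illustrated with a toy sketch. This is not heFFTe and not the authors' algorithm: storing each value as a 32-bit float stands in for a real lossy compressor, a naive DFT stands in for a distributed 3D FFT, and the names `lossy_roundtrip`, `dft`, and `relative_error` are hypothetical helpers introduced only for this illustration.

```python
import cmath
import math
import struct


def lossy_roundtrip(values):
    """Assumed stand-in for lossy compression: store each double as a 4-byte float."""
    payload = struct.pack(f"{len(values)}f", *values)  # bytes that would be exchanged
    restored = list(struct.unpack(f"{len(values)}f", payload))
    return payload, restored


def dft(values):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]


def relative_error(reference, approx):
    """Relative L2 error between two complex sequences."""
    num = math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(reference, approx)))
    den = math.sqrt(sum(abs(a) ** 2 for a in reference))
    return num / den


n = 64
signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.7 * t) for t in range(n)]

# Uncompressed exchange: 8 bytes per value.
exact_payload = struct.pack(f"{n}d", *signal)
# Lossy exchange: 4 bytes per value, at the cost of rounding error.
lossy_payload, restored = lossy_roundtrip(signal)

compression_ratio = len(exact_payload) / len(lossy_payload)
error = relative_error(dft(signal), dft(restored))
print(f"ratio: {compression_ratio:.1f}, relative error: {error:.2e}")
```

In this sketch only the exchanged data is reduced in precision while the transform itself still runs in double precision, which loosely mirrors the abstract's comparison between compressed communication and a fully reduced-precision execution.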
Using long vector extensions for MPI reductions

https://doi.org/10.1016/j.parco.2021.102871

Zhong, Dong; Cao, Qinglei; Bosilca, George; Dongarra, Jack (March 2022, Parallel Computing)

Full Text Available
An international survey on MPI users

https://doi.org/10.1016/j.parco.2021.102853

Hori, Atsushi; Jeannot, Emmanuel; Bosilca, George; Ogura, Takahiro; Gerofi, Balazs; Yin, Jie; Ishikawa, Yutaka (December 2021, Parallel Computing)

Full Text Available
Quo Vadis MPI RMA? Towards a More Efficient Use of MPI One-Sided Communication

Schuchart, J.; Niethammer, C.; Gracia, J.; Bosilca, G. (January 2021, Proceedings of EuroMPI 2021 (EuroMPI’21))

The MPI standard has long included one-sided communication abstractions through the MPI Remote Memory Access (RMA) interface. Unfortunately, the MPI RMA chapter in the 4.0 version of the MPI standard still contains both well-known and lesser-known shortcomings for both implementations and users, which lead to potentially non-optimal usage patterns. In this paper, we identify a set of issues and propose ways for applications to better express anticipated usage of RMA routines, allowing the MPI implementation to better adapt to the application's needs. In order to increase the flexibility of the RMA interface, we add the capability to duplicate windows, allowing access to the same resources encapsulated by a window using different configurations. In the same vein, we introduce the concept of MPI memory handles, meant to provide lifetime guarantees on memory attached to dynamic windows, removing the overhead currently present in using dynamically exposed memory. We will show that our extensions provide improved accumulate latencies, reduced overheads for multi-threaded flushes, and allow for zero-overhead dynamic memory window usage.
Full Text Available
Callback-based completion notification using MPI Continuations

https://doi.org/10.1016/j.parco.2021.102793

Schuchart, Joseph; Samfass, Philipp; Niethammer, Christoph; Gracia, José; Bosilca, George (January 2021, Parallel Computing)

Full Text Available
Task Bench: A Parameterized Benchmark for Evaluating Parallel Runtime Performance

https://doi.org/10.1109/SC41405.2020.00066

Slaughter, Elliott; Wu, Wei; Fu, Yuankun; Brandenburg, Legend; Garcia, Nicolai; Kautz, Wilhem; Marx, Emily; Morris, Kaleb S.; Cao, Qinglei; Bosilca, George; et al (November 2020, SC20: International Conference for High Performance Computing, Networking, Storage and Analysis)

Full Text Available
Using Advanced Vector Extensions AVX-512 for MPI Reductions

https://doi.org/10.1145/3416315.3416316

Zhong, Dong; Cao, Qinglei; Bosilca, George; Dongarra, Jack (September 2020, EuroMPI/USA '20: 27th European MPI Users' Group Meeting)

Full Text Available
Predicting MPI Collective Communication Performance Using Machine Learning

https://doi.org/10.1109/CLUSTER49012.2020.00036

Hunold, Sascha; Bhatele, Abhinav; Bosilca, George; Knees, Peter (September 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER))

Full Text Available
HAN: a Hierarchical AutotuNed Collective Communication Framework

https://doi.org/10.1109/CLUSTER49012.2020.00013

Luo, Xi; Wu, Wei; Bosilca, George; Pei, Yu; Cao, Qinglei; Patinyasakdikul, Thananon; Zhong, Dong; Dongarra, Jack (September 2020, 2020 IEEE International Conference on Cluster Computing (CLUSTER))

Full Text Available
