- Adams, Joel C.; Back, Godmar; Bala, Piotr; Bane, Michael K.; Cameron, Kirk; Casanova, Henri; Ellis, Margaret; Ferreira da Silva, Rafael; Jethwani, Gautam; Koch, William; et al. (2020 IEEE/ACM Workshop on Education for High-Performance Computing (EduHPC))
- Dichev, Kiril; Cameron, Kirk; Nikolopoulos, Dimitrios S. (EuroMPI'18: Proceedings of the 25th European MPI Users' Group Meeting)
  Exascale systems will suffer failures hourly. HPC programmers rely mostly on application-level checkpointing and global rollback to recover. In recent years, techniques that reduce the number of processes rolling back have been implemented via message logging. However, log-based approaches have weaknesses: they depend on complex modifications within an MPI implementation, and a full restart may be required in the general case. To address the limitations of all log-based mechanisms, we return to checkpoint-only mechanisms, but advocate data flow rollback (DFR), a fundamentally different approach that relies on analysis of the data flow of iterative codes and the well-known concept of data flow graphs. We demonstrate the benefits of DFR for an MPI stencil code by localising rollback, and then reduce energy consumption by 10-12% on idling nodes via frequency scaling. We also provide large-scale estimates for the energy savings of DFR compared to global rollback, which for stencil codes increase as n^2 for a process count n.
