Search for: All records

Award ID contains: 1763743

CARAT KOP: Towards Protecting the Core HPC Kernel from Linux Kernel Modules

https://doi.org/10.1145/3624062.3624237

Filipiuk, Thomas ; Wanninger, Nick ; Dhiantravan, Nadharm ; Surmeier, Carson ; Bernat, Alex ; Dinda, Peter ( November 2023 , Proceedings of the 13th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS 2023))

WARDen: Specializing Cache Coherence for High-Level Parallel Languages

https://doi.org/10.1145/3579990.3580013

Wilkins, Michael ; Westrick, Sam ; Kandiah, Vijay ; Bernat, Alex ; Suchy, Brian ; Deiana, Enrico Armenio ; Campanoni, Simone ; Acar, Umut A. ; Dinda, Peter ; Hardavellas, Nikos ( February 2023 , Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization)

High-level parallel languages (HLPLs) make it easier to write correct parallel programs. Disciplined memory usage in these languages enables new optimizations for hardware bottlenecks, such as cache coherence. In this work, we show how to reduce the costs of cache coherence by integrating the hardware coherence protocol directly with the programming language; no programmer effort or static analysis is required. We identify a new low-level memory property, WARD (WAW Apathy and RAW Dependence-freedom), by construction in HLPL programs. We design a new coherence protocol, WARDen, to selectively disable coherence using WARD. We evaluate WARDen with a widely-used HLPL benchmark suite on both current and future x64 machine structures. WARDen both accelerates the benchmarks (by an average of 1.46x) and reduces energy (by 23%) by eliminating unnecessary data movement and coherency messages.
Full Text Available

FPVM: Towards a Floating Point Virtual Machine

https://doi.org/10.1145/3502181.3531469

Dinda, Peter ; Wanninger, Nick ; Ma, Jiacheng ; Bernat, Alex ; Bernat, Charles ; Ghosh, Souradip ; Kraemer, Christopher ; Elmasry, Yehya ( June 2022 , Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2022))

Full Text Available

NOELLE Offers Empowering LLVM Extensions

https://doi.org/10.1109/CGO53902.2022.9741276

Matni, Angelo ; Deiana, Enrico Armenio ; Su, Yian ; Gross, Lukas ; Ghosh, Souradip ; Apostolakis, Sotiris ; Xu, Ziyang ; Tan, Zujun ; Chaturvedi, Ishita ; Homerding, Brian ; et al ( April 2022 , 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO))

Modern and emerging architectures demand increasingly complex compiler analyses and transformations. As the emphasis on compiler infrastructure moves beyond support for peephole optimizations and the extraction of instruction-level parallelism, compilers should support custom tools designed to meet these demands with higher-level analysis-powered abstractions and functionalities of wider program scope. This paper introduces NOELLE, a robust open-source domain-independent compilation layer built upon LLVM providing this support. NOELLE extends abstractions and functionalities provided by LLVM enabling advanced, program-wide code analyses and transformations. This paper shows the power of NOELLE by presenting a diverse set of 11 custom tools built upon it.
Full Text Available

Isolating functions at the hardware limit with virtines

https://doi.org/10.1145/3492321.3519553

Wanninger, Nicholas C. ; Bowden, Joshua J. ; Shetty, Kirtankumar ; Garg, Ayush ; Hale, Kyle C. ( March 2022 , Proceedings of the 17th European Conference on Computer Systems (EuroSys 2022))

Full Text Available

ST2GPU: An Energy-Efficient GPU Design with Spatio-Temporal Shared-Thread Speculative Adders

https://doi.org/10.1109/DAC18074.2021.9586093

Kandiah, Vijay ; Gok, Ali Murat ; Tziantzioulis, Georgios ; Hardavellas, Nikos ( December 2021 , Proceedings of the 58th ACM/IEEE Design Automation Conference (DAC 2021))

Full Text Available

The Case for an Interwoven Parallel Hardware/Software Stack

https://doi.org/10.1109/SCWS55283.2021.00017

Hale, Kyle C. ; Campanoni, Simone ; Hardavellas, Nikos ; Dinda, Peter A. ( November 2021 , Proceedings of the 11th Workshop on Runtime and Operating Systems for Supercomputers)

Full Text Available

Quantifying the Semantic Gap Between Serial and Parallel Programming

https://doi.org/10.1109/IISWC53511.2021.00024

Zhang, Xiaochun ; Jones, Timothy M. ; Campanoni, Simone ( November 2021 , 2021 IEEE International Symposium on Workload Characterization (IISWC))

Automatic parallelizing compilers are often constrained in their transformations because they must conservatively respect data dependences within the program. Developers, on the other hand, often take advantage of domain-specific knowledge to apply transformations that modify data dependences but respect the application's semantics. This creates a semantic gap between the parallelism extracted automatically by compilers and manually by developers. Although prior work has proposed programming language extensions to close this semantic gap, their relative contribution is unclear and it is uncertain whether compilers can actually achieve the same performance as manually parallelized code when using them. We quantify this semantic gap in a set of sequential and parallel programs and leverage these existing programming-language extensions to empirically measure the impact of closing it for an automatic parallelizing compiler. This lets us achieve an average speedup of 12.6× on an Intel-based 28-core machine, matching the speedup obtained by the manually parallelized code. Further, we apply these extensions to widely used sequential system tools, obtaining 7.1× speedup on the same system.
Full Text Available

Memory Mapping and Parallelizing Random Forests for Speed and Cache Efficiency

https://doi.org/10.1145/3458744.3474052

Romero-Gainza, Eduardo ; Stewart, Christopher ; Li, Angela ; Hale, Kyle ; Morris, Nathaniel ( August 2021 , International Workshop on Parallel and Distributed Algorithms for Decision Sciences (PDADS 2021))

Full Text Available

Task parallel assembly language for uncompromising parallelism

https://doi.org/10.1145/3453483.3460969

Rainey, Mike ; Newton, Ryan R. ; Hale, Kyle ; Hardavellas, Nikos ; Campanoni, Simone ; Dinda, Peter ; Acar, Umut A. ( June 2021 , PLDI 2021: Proceedings of the 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation)
Full Text Available