NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Revisiting Computation for Research: Practices and Trends

https://doi.org/10.1109/SC41406.2024.00076

Giordani, Jeremiah; Xu, Ziyang; Colby, Ella; Ning, August; Godala, Bhargav Reddy; Chaturvedi, Ishita; Zhu, Shaowei; Chon, Yebin; Chan, Greg; Tan, Zujun; et al (November 2024, IEEE)

Full Text Available
GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

https://doi.org/10.1109/ISCA59077.2024.00011

Chaturvedi, Ishita; Godala, Bhargav Reddy; Wu, Yucan; Xu, Ziyang; Iliakis, Konstantinos; Eleftherakis, Panagiotis-Eleftherios; Xydis, Sotirios; Soudris, Dimitrios; Sorensen, Tyler; Campanoni, Simone; et al (June 2024, IEEE)

Full Text Available
PROMPT: A Fast and Extensible Memory Profiling Framework

https://doi.org/10.1145/3649827

Xu, Ziyang; Chon, Yebin; Su, Yian; Tan, Zujun; Apostolakis, Sotiris; Campanoni, Simone; August, David I (April 2024, Proceedings of the ACM on Programming Languages)

Memory profiling captures programs’ dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT where all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1× respectively. To further underscore PROMPT’s impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
more » « less
Full Text Available
Compiling Loop-Based Nested Parallelism for Irregular Workloads

https://doi.org/10.1145/3620665.3640405

Su, Yian; Rainey, Mike; Wanninger, Nick; Dhiantravan, Nadharm; Liang, Jasper; Acar, Umut A; Dinda, Peter; Campanoni, Simone (April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Modern programming languages offer special syntax and semantics for logical fork-join parallelism in the form of parallel loops, allowing them to be nested, e.g., a parallel loop within another parallel loop. This expressiveness comes at a price, however: on modern multicore systems, realizing logical parallelism results in overheads due to the creation and management of parallel tasks, which can wipe out the benefits of parallelism. Today, we expect application programmers to cope with it by manually tuning and optimizing their code. Such tuning requires programmers to reason about architectural factors hidden behind layers of software abstractions, such as task scheduling and load balancing. Managing these factors is particularly challenging when workloads are irregular because their performance is input-sensitive. This paper presents HBC, the first compiler that translates C/C++ programs with high-level, fork-join constructs (e.g., OpenMP) to binaries capable of automatically controlling the cost of parallelism and dealing with irregular, input-sensitive workloads. The basis of our approach is Heartbeat Scheduling, a recent proposal for automatic granularity control, which is backed by formal guarantees on performance. HBC binaries outperform OpenMP binaries for workloads for which even entirely manual solutions struggle to find the right balance between parallelism and its costs.
more » « less
Full Text Available
Representing Data Collections in an SSA Form

https://doi.org/10.1109/CGO57630.2024.10444817

McMichen, Tommy; Greiner, Nathan; Zhong, Peter; Sossai, Federico; Patel, Atmn; Campanoni, Simone (March 2024, IEEE)

Full Text Available
EMISSARY: Enhanced Miss Awareness Replacement Policy for L2 Instruction Caching

https://doi.org/10.1145/3579371.3589097

Nagendra, Nayana Prasad; Godala, Bhargav Reddy; Chaturvedi, Ishita; Patel, Atmn; Kanev, Svilen; Moseley, Tipp; Stark, Jared; Pokam, Gilles A.; Campanoni, Simone; August, David I. (June 2023, Proceedings of the 50th International Symposium on Computer Architecture (ISCA))

Full Text Available
SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development

https://doi.org/10.1145/3582016.3582058

Tan, Zujun; Chon, Yebin; Kruse, Michael; Doerfert, Johannes; Xu, Ziyang; Homerding, Brian; Campanoni, Simone; August, David I. (March 2023, International Conference on Architectural Support for Programming Languages and Operating Systems)

Manually writing parallel programs is difficult and error-prone. Automatic parallelization could address this issue, but profitability can be limited by not having facts known only to the programmer. A parallelizing compiler that collaborates with the programmer can increase the coverage and performance of parallelization while reducing the errors and overhead associated with manual parallelization. Unlike collaboration involving analysis tools that report program properties or make parallelization suggestions to the programmer, decompiler-based collaboration could leverage the strength of existing parallelizing compilers to provide programmers with a natural compiler-parallelized starting point for further parallelization or refinement. Despite this potential, existing decompilers fail to do this because they do not generate portable parallel source code compatible with any compiler of the source language. This paper presents SPLENDID, an LLVM-IR to C/OpenMP decompiler that enables collaborative parallelization by producing standard parallel OpenMP code. Using published manual parallelization of the PolyBench benchmark suite as a reference, SPLENDID's collaborative approach produces programs twice as fast as either Polly-based automatic parallelization or manual parallelization alone. SPLENDID's portable parallel code is also more natural than that from existing decompilers, obtaining a 39x higher average BLEU score.
more » « less
Full Text Available
Program State Element Characterization

https://doi.org/10.1145/3579990.3580011

Deiana, Enrico Armenio; Suchy, Brian; Wilkins, Michael; Homerding, Brian; McMichen, Tommy; Dunajewski, Katarzyna; Dinda, Peter; Hardavellas, Nikos; Campanoni, Simone (February 2023, International Symposium on Code Generation and Optimization)

Modern programming languages offer abstractions that simplify software development and allow hardware to reach its full potential. These abstractions range from the well-established OpenMP language extensions to newer C++ features like smart pointers. To properly use these abstractions in an existing codebase, programmers must determine how a given source code region interacts with Program State Elements (PSEs) (i.e., the program's variables and memory locations). We call this process Program State Element Characterization (PSEC). Without tool support for PSEC, a programmer's only option is to manually study the entire codebase. We propose a profile-based approach that automates PSEC and provides abstraction recommendations to programmers. Because a profile-based approach incurs an impractical overhead, we introduce the Compiler and Runtime Memory Observation Tool (CARMOT), a PSEC-specific compiler co-designed with a parallel runtime. CARMOT reduces the overhead of PSEC by two orders of magnitude, making PSEC practical. We show that CARMOT's recommendations achieve the same speedup as hand-tuned OpenMP directives and avoid memory leaks with C++ smart pointers. From this, we argue that PSEC tools, such as CARMOT, can provide support for the rich ecosystem of modern programming language abstractions.
more » « less
Full Text Available
WARDen: Specializing Cache Coherence for High-Level Parallel Languages

https://doi.org/10.1145/3579990.3580013

Wilkins, Michael; Westrick, Sam; Kandiah, Vijay; Bernat, Alex; Suchy, Brian; Deiana, Enrico Armenio; Campanoni, Simone; Acar, Umut A.; Dinda, Peter; Hardavellas, Nikos (February 2023, Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization)

High-level parallel languages (HLPLs) make it easier to write correct parallel programs. Disciplined memory usage in these languages enables new optimizations for hardware bottlenecks, such as cache coherence. In this work, we show how to reduce the costs of cache coherence by integrating the hardware coherence protocol directly with the programming language; no programmer effort or static analysis is required. We identify a new low-level memory property, WARD (WAW Apathy and RAW Dependence-freedom), by construction in HLPL programs. We design a new coherence protocol, WARDen, to selectively disable coherence using WARD. We evaluate WARDen with a widely-used HLPL benchmark suite on both current and future x64 machine structures. WARDen both accelerates the benchmarks (by an average of 1.46x) and reduces energy (by 23%) by eliminating unnecessary data movement and coherency messages.
more » « less
Full Text Available
NOELLE Offers Empowering LLVM Extensions

https://doi.org/10.1109/CGO53902.2022.9741276

Matni, Angelo; Deiana, Enrico Armenio; Su, Yian; Gross, Lukas; Ghosh, Souradip; Apostolakis, Sotiris; Xu, Ziyang; Tan, Zujun; Chaturvedi, Ishita; Homerding, Brian; et al (April 2022, 2022 IEEE/ACM International Symposium on Code Generation and Optimization (CGO))

Modern and emerging architectures demand increasingly complex compiler analyses and transformations. As the emphasis on compiler infrastructure moves beyond support for peephole optimizations and the extraction of instruction-level parallelism, compilers should support custom tools designed to meet these demands with higher-level analysis-powered abstractions and functionalities of wider program scope. This paper introduces NOELLE, a robust open-source domain-independent compilation layer built upon LLVM providing this support. NOELLE extends abstractions and functionalities provided by LLVM enabling advanced, program-wide code analyses and transformations. This paper shows the power of NOELLE by presenting a diverse set of 11 custom tools built upon it.
more » « less
Full Text Available

« Prev Next »

Search for: All records