NSF PAR Search | NSF Public Access Repository

CAMP: Compiler and Allocator-based Heap Memory Protection

Lin, Zhenpeng; Yu, Zheng; Guo, Ziyi; Campanoni, Simone; Dinda, Peter; Xing, Xinyu (August 2024, USENIX Security 2024)

Full Text Available
Getting a Handle on Unmanaged Memory

https://doi.org/10.1145/3620666.3651326

Wanninger, Nick; McMichen, Tommy; Campanoni, Simone; Dinda, Peter (April 2024, ACM)

Full Text Available
TrackFM: Far-out Compiler Support for a Far Memory World

https://doi.org/10.1145/3617232.3624856

Tauro, Brian R; Suchy, Brian; Campanoni, Simone; Dinda, Peter; Hale, Kyle C (April 2024, ACM)

Full Text Available
PROMPT: A Fast and Extensible Memory Profiling Framework

https://doi.org/10.1145/3649827

Xu, Ziyang; Chon, Yebin; Su, Yian; Tan, Zujun; Apostolakis, Sotiris; Campanoni, Simone; August, David I (April 2024, Proceedings of the ACM on Programming Languages)

Memory profiling captures programs’ dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique program trace summary, various memory profiler types have been developed. Yet, designing practical memory profilers often requires extensive compiler expertise, adeptness in program optimization, and significant implementation effort. This often results in a void where aspirations for fast and robust profilers remain unfulfilled. To bridge this gap, this paper presents PROMPT, a framework for streamlined development of fast memory profilers. With PROMPT, developers need only specify profiling events and define the core profiling logic, bypassing the complexities of custom instrumentation and intricate memory profiling components and optimizations. Two state-of-the-art memory profilers were ported with PROMPT with all features preserved. By focusing on the core profiling logic, the code was reduced by more than 65% and the profiling overhead was improved by 5.3× and 7.1×, respectively. To further underscore PROMPT’s impact, a tailored memory profiling workflow was constructed for a sophisticated compiler optimization client. In 570 lines of code, this redesigned workflow satisfies the client’s memory profiling needs while achieving more than 90% reduction in profiling overhead and improved robustness compared to the original profilers.
Full Text Available
Representing Data Collections in an SSA Form

https://doi.org/10.1109/CGO57630.2024.10444817

McMichen, Tommy; Greiner, Nathan; Zhong, Peter; Sossai, Federico; Patel, Atmn; Campanoni, Simone (March 2024, IEEE)

Full Text Available
Compiling Loop-Based Nested Parallelism for Irregular Workloads

https://doi.org/10.1145/3620665.3640405

Su, Yian; Rainey, Mike; Wanninger, Nick; Dhiantravan, Nadharm; Liang, Jasper; Acar, Umut A; Dinda, Peter; Campanoni, Simone (April 2024, ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Modern programming languages offer special syntax and semantics for logical fork-join parallelism in the form of parallel loops, allowing them to be nested, e.g., a parallel loop within another parallel loop. This expressiveness comes at a price, however: on modern multicore systems, realizing logical parallelism results in overheads due to the creation and management of parallel tasks, which can wipe out the benefits of parallelism. Today, we expect application programmers to cope with these overheads by manually tuning and optimizing their code. Such tuning requires programmers to reason about architectural factors hidden behind layers of software abstractions, such as task scheduling and load balancing. Managing these factors is particularly challenging when workloads are irregular because their performance is input-sensitive. This paper presents HBC, the first compiler that translates C/C++ programs with high-level, fork-join constructs (e.g., OpenMP) to binaries capable of automatically controlling the cost of parallelism and dealing with irregular, input-sensitive workloads. The basis of our approach is Heartbeat Scheduling, a recent proposal for automatic granularity control, which is backed by formal guarantees on performance. HBC binaries outperform OpenMP binaries for workloads for which even entirely manual solutions struggle to find the right balance between parallelism and its costs.
Full Text Available
GhOST: a GPU Out-of-Order Scheduling Technique for Stall Reduction

https://doi.org/10.1109/ISCA59077.2024.00011

Chaturvedi, Ishita; Godala, Bhargav Reddy; Wu, Yucan; Xu, Ziyang; Iliakis, Konstantinos; Eleftherakis, Panagiotis-Eleftherios; Xydis, Sotirios; Soudris, Dimitrios; Sorensen, Tyler; Campanoni, Simone; et al. (June 2024, IEEE)

Full Text Available
Revisiting Computation for Research: Practices and Trends

https://doi.org/10.1109/SC41406.2024.00076

Giordani, Jeremiah; Xu, Ziyang; Colby, Ella; Ning, August; Godala, Bhargav Reddy; Chaturvedi, Ishita; Zhu, Shaowei; Chon, Yebin; Chan, Greg; Tan, Zujun; et al. (November 2024, IEEE)

Full Text Available
SPLENDID: Supporting Parallel LLVM-IR Enhanced Natural Decompilation for Interactive Development

https://doi.org/10.1145/3582016.3582058

Tan, Zujun; Chon, Yebin; Kruse, Michael; Doerfert, Johannes; Xu, Ziyang; Homerding, Brian; Campanoni, Simone; August, David I. (March 2023, International Conference on Architectural Support for Programming Languages and Operating Systems)

Manually writing parallel programs is difficult and error-prone. Automatic parallelization could address this issue, but profitability can be limited by not having facts known only to the programmer. A parallelizing compiler that collaborates with the programmer can increase the coverage and performance of parallelization while reducing the errors and overhead associated with manual parallelization. Unlike collaboration involving analysis tools that report program properties or make parallelization suggestions to the programmer, decompiler-based collaboration could leverage the strength of existing parallelizing compilers to provide programmers with a natural compiler-parallelized starting point for further parallelization or refinement. Despite this potential, existing decompilers fail to do this because they do not generate portable parallel source code compatible with any compiler of the source language. This paper presents SPLENDID, an LLVM-IR to C/OpenMP decompiler that enables collaborative parallelization by producing standard parallel OpenMP code. Using published manual parallelization of the PolyBench benchmark suite as a reference, SPLENDID's collaborative approach produces programs twice as fast as either Polly-based automatic parallelization or manual parallelization alone. SPLENDID's portable parallel code is also more natural than that from existing decompilers, obtaining a 39x higher average BLEU score.
Full Text Available
