NSF PAR Search | NSF Public Access Repository

Processor-Oblivious Record and Replay

https://doi.org/10.1145/3365659

Utterback, Robert; Agrawal, Kunal; Lee, I-Ting Angelina; Kulkarni, Milind (December 2019, ACM Transactions on Parallel Computing)
Efficient race detection with futures

https://doi.org/10.1145/3293883.3295732

Utterback, Robert; Agrawal, Kunal; Fineman, Jeremy; Lee, I-Ting Angelina (January 2019, Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming)

Race detection and reachability in nearly series-parallel DAGs

https://doi.org/10.1137/1.9781611975031.11

Agrawal, Kunal; Devietti, Joseph; Fineman, Jeremy T.; Lee, I-Ting Angelina; Utterback, Robert; Xu, Changming (January 2018, Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms)

Processor-Oblivious Record and Replay

https://doi.org/10.1145/3018743.3018764

Utterback, Robert; Agrawal, Kunal; Lee, I-Ting Angelina; Kulkarni, Milind (February 2017, Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming)

Record-and-replay systems are useful tools for debugging non-deterministic parallel programs: they first record an execution and then replay it to reproduce the same access pattern. Existing record-and-replay systems generally target thread-based execution models and record the behaviors and interleavings of individual threads. Dynamic multithreaded languages and libraries, such as the Cilk family, OpenMP, and TBB, do not have a notion of threads. Instead, these languages provide a processor-oblivious model of programming, where programs expose task parallelism using high-level constructs such as spawn/sync without regard to the number of threads or cores available to run the program. Thread-based record-and-replay would violate the processor-oblivious nature of these programs, because it incorporates the number of threads into the recorded information and thereby constrains the replayed execution to the same number of threads. In this paper, we present a processor-oblivious record-and-replay scheme for such languages, in which record and replay can use different numbers of processors and both are scheduled using work stealing. We provide theoretical guarantees for our record-and-replay scheme: record is optimal for programs with one lock, and replay is near-optimal for all cases. In addition, we implemented this scheme in the Cilk Plus runtime system, and our evaluation indicates that processor obliviousness does not cause substantial overheads.
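The abstract does not spell out the recording mechanism. As a rough illustration only, the Python sketch below records the order in which tasks acquire a single lock, keyed by a logical task ID rather than a thread ID, and then enforces that order on replay. The `ReplayLock` class, the `run_tasks` helper, and the spawn-tree-path task IDs such as `"0.1"` are hypothetical; this is not the paper's scheme. The sketch gives each task its own OS thread, because a blocking wait like this one can deadlock a fixed-size worker pool when the task whose turn it is has not started yet; the paper instead schedules both record and replay with work stealing.

```python
import threading


class ReplayLock:
    """Hypothetical lock that records acquisition order by logical task ID.

    Record mode (schedule=None): acquire freely and log which task got the lock.
    Replay mode: each task waits until schedule[turn] names it.
    """

    def __init__(self, schedule=None):
        self._mutex = threading.Lock()       # the actual mutual exclusion
        self._cond = threading.Condition()   # used only in replay mode
        self._schedule = schedule
        self._turn = 0                       # index into schedule during replay
        self.log = []                        # observed acquisition order

    def acquire(self, task_id):
        if self._schedule is not None:
            with self._cond:
                # Block until it is this logical task's turn, regardless of
                # which OS thread happens to be running it.
                self._cond.wait_for(
                    lambda: self._turn < len(self._schedule)
                    and self._schedule[self._turn] == task_id
                )
        self._mutex.acquire()
        self.log.append(task_id)

    def release(self):
        if self._schedule is not None:
            with self._cond:
                self._turn += 1              # hand the lock to the next recorded task
                self._cond.notify_all()
        self._mutex.release()


def run_tasks(task_ids, lock):
    """Run one thread per task; each task enters the critical section once."""
    def body(tid):
        lock.acquire(tid)
        try:
            pass                             # critical section would go here
        finally:
            lock.release()

    threads = [threading.Thread(target=body, args=(t,)) for t in task_ids]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return list(lock.log)


tasks = ["0", "0.0", "0.1", "0.1.0"]
recorded = run_tasks(tasks, ReplayLock())                    # any interleaving
replayed = run_tasks(tasks, ReplayLock(schedule=recorded))   # forced to match
assert replayed == recorded
```

Because the log stores task IDs instead of thread IDs, nothing in the recording depends on how many threads ran the program, which is the property the abstract calls processor obliviousness.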
Provably Good and Practically Efficient Parallel Race Detection for Fork-Join Programs

https://doi.org/10.1145/2935764.2935801

Utterback, Robert; Agrawal, Kunal; Fineman, Jeremy T.; Lee, I-Ting Angelina (July 2016, Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures)

If a parallel program has one or more determinacy races, different schedules can result in memory accesses that observe different values; various race-detection tools have been designed to find such bugs. A key component of race detectors is an algorithm for series-parallel (SP) maintenance, which identifies whether two accesses are logically parallel. This paper describes an asymptotically optimal algorithm, called WSP-Order, for performing SP maintenance in programs with fork-join (or nested) parallelism. Given a fork-join program with T1 work and T∞ span, WSP-Order executes it while also maintaining SP relationships in O(T1/P + T∞) time on P processors, which is asymptotically optimal. At the heart of WSP-Order is a work-stealing scheduler designed specifically for SP maintenance. We also implemented C-RACER, a race detector based on WSP-Order, within the Cilk Plus runtime system and evaluated its performance on five benchmarks. Empirical results demonstrate that when run sequentially, C-RACER performs almost as well as the best previous sequential race detectors. More importantly, when run in parallel, it achieves almost as much speedup as the original program without race detection.
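To make "logically parallel" concrete, here is a sequential Python sketch of the English-Hebrew labeling idea that earlier SP-maintenance work (for example, the SP-order algorithm) builds on: in a series-parallel parse tree, two strands are logically parallel exactly when a left-to-right (English) traversal and a traversal that reverses the children of every parallel node (Hebrew) order them differently. It is not WSP-Order, which the abstract describes as built around a work-stealing scheduler. The `Node`, `S`, `P`, and `leaf` helpers are hypothetical, and `work_span` computes T1 and T∞ for the same tree under an assumed cost of one unit per strand.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                   # "S" (series), "P" (parallel), or "leaf"
    children: list = field(default_factory=list)
    name: str = ""                              # strand name, used only for leaves


def leaf(name):
    return Node("leaf", name=name)


def S(*children):
    return Node("S", list(children))


def P(*children):
    return Node("P", list(children))


def sp_orders(root):
    """Return (english, hebrew): strand name -> position in each traversal."""
    english, hebrew = {}, {}

    def walk(node, out, reverse_parallel):
        if node.kind == "leaf":
            out[node.name] = len(out)
            return
        kids = node.children
        if reverse_parallel and node.kind == "P":
            kids = kids[::-1]                   # Hebrew order flips parallel branches
        for child in kids:
            walk(child, out, reverse_parallel)

    walk(root, english, reverse_parallel=False)
    walk(root, hebrew, reverse_parallel=True)
    return english, hebrew


def logically_parallel(u, v, english, hebrew):
    # Series-ordered iff both traversals agree; parallel iff they disagree.
    return (english[u] < english[v]) != (hebrew[u] < hebrew[v])


def work_span(node):
    """Return (T1, Tinf), assuming each strand costs one unit of work."""
    if node.kind == "leaf":
        return 1, 1
    parts = [work_span(c) for c in node.children]
    work = sum(w for w, _ in parts)
    if node.kind == "S":
        span = sum(s for _, s in parts)         # series: spans add
    else:
        span = max(s for _, s in parts)         # parallel: longest branch dominates
    return work, span


# a; spawn b; c; sync; d
prog = S(leaf("a"), P(leaf("b"), leaf("c")), leaf("d"))
eng, heb = sp_orders(prog)
print(logically_parallel("b", "c", eng, heb))   # True
print(logically_parallel("a", "d", eng, heb))   # False
print(work_span(prog))                          # (4, 3)
```

A race detector built on such a query would flag two accesses to the same location when they are logically parallel and at least one is a write.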