NSF PAR Search | NSF Public Access Repository

Efficient Differencing of System-level Provenance Graphs

https://doi.org/10.1145/3583780.3615171

Nakamura, Yuta; Kanj, Iyad; Malik, Tanu (October 2023, ACM)

Querying Container Provenance

https://doi.org/10.1145/3543873.3587568

Modi, Aniket; Reyad, Moaz; Malik, Tanu; Gehani, Ashish (April 2023, WWW '23 Companion: Companion Proceedings of the ACM Web Conference 2023)

Reproducible Notebook Containers using Application Virtualization

https://doi.org/10.1109/eScience55777.2022.00015

Ahmad, Raza; Manne, Naga Nithin; Malik, Tanu (October 2022, IEEE 18th International Conference on e-Science (e-Science))

CHEX: Multiversion Replay with Ordered Checkpoints

https://doi.org/10.14778/3514061.3514075

Manne, Naga Nithin; Satpati, Shilvi (January 2022, Proceedings of the VLDB Endowment)
Freire, J.; Lin, Xuemin (Eds.)
In scientific computing and data science disciplines, it is often necessary to share application workflows and repeat results. Current tools containerize application workflows and share the resulting container so that results can be repeated. Containerization does improve the sharing of results; however, it does not improve the efficiency of replay. In this paper, we present the multiversion replay problem, which arises when multiple versions of an application are containerized and each version must be replayed to repeat results. To avoid executing each version separately, we develop CHEX, which checkpoints program state and determines when it is permissible to reuse program state across versions. It does so using system-call-based execution lineage. Our ability to identify common computations across versions lets us optimize replay with an in-memory cache built on a checkpoint-restore-switch system. We show that the multiversion replay problem is NP-hard and propose efficient heuristics for it. CHEX reduces overall replay time by sharing common computations while avoiding storing a large number of checkpoints. We demonstrate that CHEX maintains lightweight package sharing and improves the total time of multiversion replay by 50% on average.
Expanding the Scope of Artifact Evaluation at HPC Conferences: Experience of SC21

Malik, Tanu; Vahldiek-Oberwagner, Anjo (January 2022, Proceedings of Practical Reproducible Evaluation in Computer Systems)

A scientific paper consists of a constellation of artifacts that extend beyond the document itself: software, hardware, evaluation data and documentation, raw survey results, mechanized proofs, models, test suites, benchmarks, and so on. In some cases, the quality of these artifacts is as important as that of the document itself. Based on the success of the Artifact Evaluation efforts at other systems conferences, the 2021 International Conference for High Performance Computing, Networking, Storage, and Analysis (SC21) organized a comprehensive Artifact Description/Artifact Evaluation (AD/AE) review and competition as part of the SC21 Reproducibility Initiative. This paper summarizes the key findings of the AD/AE effort.
Provenance-based Workflow Diagnostics Using Program Specification

Nakamura, Y.; Malik, T. (January 2022, 29th IEEE International Conference on High Performance Computing, Data, and Analytics)

Reproducibility Practice in High-Performance Computing: Community Survey Results

https://doi.org/10.1109/MCSE.2021.3096678

Plale, Beth A.; Malik, Tanu; Pouchard, Line C. (September 2021, Computing in Science & Engineering)

Documenting Computing Environments for Reproducible Experiments

https://doi.org/10.3233/APC200106

Chuah, J.; Deeds, M. (March 2020, Parallel Computing: Technology Trends)

Efficient Provenance Alignment in Reproduced Executions

Nakamura, Y.; Malik, T. (January 2020, USENIX Theory and Practice of Provenance)

SciInc: A Container Runtime for Incremental Recomputation

https://doi.org/10.1109/eScience.2019.00040

Youngdahl, A.; Ton, D.H. (January 2019, IEEE 15th International Conference on eScience)
