NSF PAR Search | NSF Public Access Repository

Enhancing Program Analysis with Deterministic Distinguishable Calling Context

https://doi.org/10.1145/3708493.3712679

Kim, Sungkeun; Nguyen, Khanh; Tsai, Chia-Che; Lee, Jaewoo; Muzahid, Abdullah; Kim, Eun Jung (February 2025, ACM)

Calling context is crucial for improving the precision of program analyses in various use cases (clients), such as profiling, debugging, optimization, and security checking. The calling context is often encoded as a numerical value. We have observed that many clients benefit from a value that is not only deterministic but also globally distinguishable across runs, which simplifies bookkeeping and guarantees uniqueness. However, existing work guarantees only determinism, not global distinguishability, so clients must develop auxiliary helpers to distinguish encoded values among all calling contexts, which incurs considerable overhead. In this paper, we propose Deterministic Distinguishable Calling Context Encoding, which provides both properties natively. The key idea is to leverage the static call graph and encode each calling context as the running call path count. A mapping is thereby established statically and can be readily used by the clients. Our experiments with two client tools show that the proposed encoding has overhead comparable to two state-of-the-art encoding schemes, PCCE and PCC, and further avoids the expensive overheads of collision detection, up to 2.1× and 50% for Splash-3 and SPEC CPU 2017, respectively.
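To make the idea of numbering call paths on a static call graph concrete, here is a minimal Python sketch. It is not the scheme from the paper: it uses a generic path-counting numbering on an acyclic call graph, and the per-function base offset that makes ids distinct across functions is an illustrative assumption. The names `build_encoding` and `encode_path` are hypothetical, recursion is not handled, and each caller is assumed to have at most one call site per callee.

```python
from collections import defaultdict

def build_encoding(call_graph, root):
    """Number the call paths of an acyclic call graph (hypothetical helper).

    call_graph maps each caller to a list of callees. Returns (edge_inc, base):
    edge_inc gives the increment added when a call edge is taken, and base
    gives an assumed per-function offset that keeps ids distinct across functions.
    """
    # Depth-first search to obtain a topological order (callers before callees).
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for callee in call_graph.get(node, []):
            visit(callee)
        order.append(node)

    visit(root)
    order.reverse()

    num_contexts = defaultdict(int)  # number of distinct call paths reaching a function
    num_contexts[root] = 1
    edge_inc = {}
    for caller in order:
        for callee in call_graph.get(caller, []):
            # Paths arriving over this edge are numbered after those already counted.
            edge_inc[(caller, callee)] = num_contexts[callee]
            num_contexts[callee] += num_contexts[caller]

    # Assumed global offset: running total of path counts in topological order.
    base, total = {}, 0
    for func in order:
        base[func] = total
        total += num_contexts[func]
    return edge_inc, base

def encode_path(path, edge_inc, base):
    """Return the id of a call path given as a list of function names from the root."""
    local_id = sum(edge_inc[(a, b)] for a, b in zip(path, path[1:]))
    return base[path[-1]] + local_id

# Toy call graph: main calls A and B, and both call C.
graph = {"main": ["A", "B"], "A": ["C"], "B": ["C"]}
edge_inc, base = build_encoding(graph, "main")
paths = [["main"], ["main", "A"], ["main", "B"],
         ["main", "A", "C"], ["main", "B", "C"]]
ids = [encode_path(p, edge_inc, base) for p in paths]
print(dict(zip(("/".join(p) for p in paths), ids)))
```

In this toy graph the five call paths receive five distinct ids, and the mapping depends only on the static graph, so it is the same on every run.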
Free, publicly-accessible full text available February 25, 2026
Enhancing Collective Communication in MCM Accelerators for Deep Learning Training

Laskar, Sabuj; Majhi, Pranati; Kim, Sungkeun; Mahmud, Farabi; Muzahid, Abdullah; Kim, Eun J. (March 2024, IEEE)
Attack of the Knights: Non Uniform Cache Side Channel Attack

https://doi.org/10.1145/3627106.3627199

Mahmud, Farabi; Kim, Sungkeun; Chawla, Harpreet Singh; Kim, Eun Jung; Tsai, Chia-Che; Muzahid, Abdullah (December 2023, ACM)
WHISTLE: CPU Abstractions for Hardware and Software Memory Safety Invariants

https://doi.org/10.1109/TC.2022.3180990

Kim, Sungkeun; Mahmud, Farabi; Huang, Jiayi; Majumder, Pritam; Tsai, Chia-Che; Muzahid, Abdullah; Kim, Eun Jung (June 2022, IEEE Transactions on Computers)

Memory safety invariants extracted from a program can help defend against and detect both software and hardware memory violations. For instance, by allowing only specific instructions to access certain memory locations, a system can detect out-of-bound or illegal pointer dereferences that lead to correctness and security issues. In this paper, we propose CPU abstractions, called WHISTLE, to specify and check program invariants, providing a defense mechanism against both software and hardware memory violations at runtime. WHISTLE ensures that the invariants are satisfied at every memory access. We present a fast invariant address translation and retrieval scheme using a specialized cache. It stores and checks invariants related to global, stack, and heap objects. The invariant checks can be performed synchronously or asynchronously. WHISTLE uses synchronous checking for highly security-critical programs, while others are protected by asynchronous checking. A fast exception is proposed to report violations as soon as possible in order to close the gap for transient attacks. Our evaluation shows that WHISTLE can detect both software and hardware, spatial and temporal memory violations. WHISTLE incurs 53% overhead when checking synchronously, or 15% overhead when checking asynchronously.
Full Text Available
Communication Algorithm-Architecture Co-Design for Distributed Deep Learning

https://doi.org/10.1109/ISCA52012.2021.00023

Huang, Jiayi; Majumder, Pritam; Kim, Sungkeun; Muzahid, Abdullah; Yum, Ki Hwan; Kim, Eun Jung (June 2021, 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA))

Large-scale distributed deep learning training has enabled the development of more complex deep neural network models that learn from larger datasets for sophisticated tasks. In particular, distributed stochastic gradient descent intensively invokes all-reduce operations for gradient updates, which dominate communication time during iterative training epochs. In this work, we identify the inefficiency in widely used all-reduce algorithms and the opportunity for algorithm-architecture co-design. We propose the MultiTree all-reduce algorithm, which is aware of topology and resource utilization, for efficient and scalable all-reduce operations; it is applicable to different interconnect topologies. Moreover, we co-design the network interface to schedule and coordinate the all-reduce messages for contention-free communication, working in synergy with the algorithm. The flow control is also simplified to exploit the bulk data transfer of large gradient exchanges. We evaluate the co-design using different all-reduce data sizes in a synthetic study, demonstrating its effectiveness on various interconnection network topologies, and using state-of-the-art deep neural networks in real workload experiments. The results show that MultiTree achieves 2.3× and 1.56× communication speedup, as well as up to 81% and 30% training time reduction, compared to ring all-reduce and state-of-the-art approaches, respectively.
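For readers unfamiliar with the ring all-reduce baseline mentioned above, the following Python sketch simulates it in a single process. It is not the MultiTree algorithm and does not model the network interface or topology. The function name `ring_all_reduce` is hypothetical, and the sketch assumes each of the n nodes holds exactly n scalar chunks.

```python
def ring_all_reduce(data):
    """Simulate ring all-reduce (sum) over n nodes, each holding n scalar chunks.

    data[i][c] is node i's local value for chunk c. Returns the per-node
    buffers after the reduce-scatter and all-gather phases.
    """
    n = len(data)
    assert all(len(row) == n for row in data), "sketch assumes n chunks per node"
    buf = [list(row) for row in data]

    # Reduce-scatter: in step s, node i sends chunk (i - s) mod n to its right
    # neighbour, which accumulates it. Messages are collected first so that all
    # sends in a step use the values from before the step.
    for s in range(n - 1):
        msgs = [(i, (i - s) % n, buf[i][(i - s) % n]) for i in range(n)]
        for i, c, value in msgs:
            buf[(i + 1) % n][c] += value

    # All-gather: node i now owns the fully reduced chunk (i + 1) mod n and
    # forwards completed chunks around the ring, overwriting stale values.
    for s in range(n - 1):
        msgs = [(i, (i + 1 - s) % n, buf[i][(i + 1 - s) % n]) for i in range(n)]
        for i, c, value in msgs:
            buf[(i + 1) % n][c] = value
    return buf

# Three nodes, three chunks each.
result = ring_all_reduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
print(result)
```

Each node sends 2(n − 1) messages in total, one per step, which is why the step count of the ring grows linearly with the number of nodes.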
Full Text Available
ReViCe: Reusing Victim Cache to Prevent Speculative Cache Leakage

https://doi.org/10.1109/SecDev45635.2020.00029

Kim, Sungkeun; Mahmud, Farabi; Huang, Jiayi; Majumder, Pritam; Christou, Neophytos; Muzahid, Abdullah; Tsai, Chia-Che; Kim, Eun Jung (September 2020, IEEE Secure Development (SecDev))

Full Text Available
