skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Creators/Authors contains: "Kim, Sungkeun"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Memory safety invariants extracted from a program can help defend and detect against both software and hardware memory violations. For instance, by allowing only specific instructions to access certain memory locations, system can detect out-of-bound or illegal pointer dereferences that lead to correctness and security issues. In this paper, we propose CPU abstractions, called, to specify and check program invariants to provide defense mechanism against both software and hardware memory violations at runtime. ensures that the invariants must be satisfied at every memory accesses. We present a fast invariant address translation and retrieval scheme using a specialized cache. It stores and checks invariants related to global, stack and heap objects. The invariant checks can be performed synchronously or asynchronously. uses synchronous checking for high security-critical programs, while others are protected by asynchronous checking. A fast exception is proposed to alert any violations as soon as possible in order to close the gap for transient attacks. Our evaluation shows that can detect both software and hardware, spatial and temporal memory violations. incurs 53% overhead when checking synchronously, or 15% overhead when checking asynchronously. 
    more » « less
  2. Large-scale distributed deep learning training has enabled developments of more complex deep neural network models to learn from larger datasets for sophisticated tasks. In particular, distributed stochastic gradient descent intensively invokes all-reduce operations for gradient update, which dominates communication time during iterative training epochs. In this work, we identify the inefficiency in widely used all-reduce algorithms, and the opportunity of algorithm-architecture co-design. We propose MultiTree all-reduce algorithm with topology and resource utilization awareness for efficient and scalable all-reduce operations, which is applicable to different interconnect topologies. Moreover, we co-design the network interface to schedule and coordinate the all-reduce messages for contention-free communications, working in synergy with the algorithm. The flow control is also simplified to exploit the bulk data transfer of big gradient exchange. We evaluate the co-design using different all-reduce data sizes for synthetic study, demonstrating its effectiveness on various interconnection network topologies, in addition to state-of-the-art deep neural networks for real workload experiments. The results show that MultiTree achieves 2.3× and 1.56× communication speedup, as well as up to 81% and 30% training time reduction compared to ring all-reduce and state-of-the-art approaches, respectively. 
    more » « less