skip to main content


Title: Algorithms and Frameworks for Accelerating Security Applications on HPC Platforms
Typical cybersecurity solutions emphasize on achieving defense functionalities. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices’ capacities. On the other hand, the sophisticated implementations are quite difficult: they require both in-depth understandings of cybersecurity domain-specific characteristics and HPC architecture and system model. In our work, we investigate three sub-areas in cybersecurity, including mobile software security, network security, and system security. They have the following performance issues, respectively: 1) The flow- and context-sensitive static analysis for the large and complex Android APKs are incredibly time-consuming. Existing CPU-only frameworks/tools have to set a timeout threshold to cease the program analysis to trade the precision for performance. 2) Network intrusion detection systems (NIDS) use automata processing as its searching core and requires line-speed processing. However, achieving high-speed automata processing is exceptionally difficult in both algorithm and implementation aspects. 3) It is unclear how the cache configurations impact time-driven cache side-channel attacks’ performance. This question remains open because it is difficult to conduct comparative measurement to study the impacts. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC for faster and more scalable cybersecurity executions. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups against the plain GPU implementation. For network intrusion detection systems (IDS), we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces up to 400X of the memory overhead. We also present tools for improving the usability of Micron’s Automata Processor. To study the cache configurations’ impact on time-driven cache side-channel attacks’ performance, we design an approach to conducting comparative measurement. We propose a quantifiable success rate metric to measure the performance of time-driven cache attacks and utilize the GEM5 platform to emulate the configurable cache.  more » « less
Award ID(s):
1565314 1838271
NSF-PAR ID:
10111271
Author(s) / Creator(s):
Date Published:
Journal Name:
Virginia Tech Theses
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Time-driven and access-driven attacks are two dominant types of the timing-based cache side-channel attacks. Despite access-driven attacks are popular in recent years, investigating the time-driven attacks is still worth the effort. It is because, in contrast to the access-driven attacks, time-driven attacks are independent of the attackers’ cache access privilege. Although cache configurations can impact the time-driven attacks’ performance, it is unclear how different cache parameters influence the attacks’ success rates. This question remains open because it is extremely difficult to conduct comparative measurements. The difficulty comes from the unavailability of the configurable caches in existing CPU products. In this paper, we utilize the GEM5 platform to measure the impacts of different cache parameters, including Private Cache Size and Associativity, Shared Cache Size and Associativity, Cache-line Size, Replacement Policy, and Clusivity. In order to make the time-driven attacks comparable, we define the equivalent key length (EKL) to describe the attacks’ success rates. Key findings from the measurement results include (i) private cache has a key effect on the attacks’ success rates; (ii) changing shared cache has a trivial effect on the success rates, but adding neighbor processes can make the effect significant; (iii) the Random replacement policy leads to the highest success rates while the LRU/LFU are the other way around; (iv) the exclusive policy makes the attacks harder to succeed compared to the inclusive policy. We finally leverage these findings to provide suggestions to the attackers and defenders as well as the future system designers. 
    more » « less
  2. Graphics processing units (GPUs) are becoming default accelerators in many domains such as high-performance computing (HPC), deep learning, and virtual/augmented reality. Recently, GPUs have also shown significant speedups for a variety of security-sensitive applications such as encryptions. These speedups have largely benefited from the high memory bandwidth and compute throughput of GPUs. One of the key features to optimize the memory bandwidth consumption in GPUs is intra-warp memory access coalescing, which merges memory requests originating from different threads of a single warp into as few cache lines as possible. However, this coalescing feature is also shown to make the GPUs prone to the correlation timing attacks as it exposes the relationship between the execution time and the number of coalesced accesses. Consequently, an attacker is able to correctly reveal an AES private key via repeatedly gathering encrypted data and execution time on a GPU. In this work, we propose a series of defense mechanisms to alleviate such timing attacks by carefully trading off performance for improved security. Specifically, we propose to randomize the coalescing logic such that the attacker finds it hard to guess the correct number of coalesced accesses generated. To this end, we propose to randomize: a) the granularity (called as subwarp) at which warp threads are grouped together for coalescing, and b) the threads selected by each subwarp for coalescing. Such randomization techniques result in three mechanisms: fixed-sized subwarp (FSS), random-sized subwarp (RSS), and random-threaded subwarp (RTS). We find that the combination of these security mechanisms offers 24- to 961-times improvement in the security against the correlation timing attacks with 5 to 28% performance degradation. Online copy: http://adwaitjog.github.io/docs/pdf/rcoal-hpca18.pdf 
    more » « less
  3. Smart mobile devices have become an integral part of people's life and users often input sensitive information on these devices. However, various side channel attacks against mobile devices pose a plethora of serious threats against user security and privacy. To mitigate these attacks, we present a novel secure Back-of-Device (BoD) input system, SecTap, for mobile devices. To use SecTap, a user tilts her mobile device to move a cursor on the keyboard and tap the back of the device to secretly input data. We design a tap detection method by processing the stream of accelerometer readings to identify the user's taps in real time. The orientation sensor of the mobile device is used to control the direction and the speed of cursor movement. We also propose an obfuscation technique to randomly and effectively accelerate the cursor movement. This technique not only preserves the input performance but also keeps the adversary from inferring the tapped keys. Extensive empirical experiments were conducted on different smart phones to demonstrate the usability and security on both Android and iOS platforms. 
    more » « less
  4. Integration of complex and high-speed electronic components in the state of art electric power system enhances the need for improved security infrastructure and resilience against invasive and non-invasive attacks on the smart grid. A modern smart grid system integrates a variety of instruments and standards to achieve cost-effective and time-effective energy measurement and management. As the fundamental component in the smart grid, the smart meter supports real-time monitoring, automatic control, and high-speed communication along with power consumption recording. However, the wide use of smart meters also increases privacy and security concerns. In this paper, we demonstrate the vulnerability of side-channel attacks on secure communication in smart grids for software-based and hardware-based implementations. 
    more » « less
  5. This paper describes a new benchmark tool, Spatter, for assessing memory system architectures in the context of a specific category of indexed accesses known as gather and scatter. These types of operations are increasingly used to express sparse and irregular data access patterns, and they have widespread utility in many modern HPC applications including scientific simulations, data mining and analysis computations, and graph processing. However, many traditional benchmarking tools like STREAM, STRIDE, and GUPS focus on characterizing only uniform stride or fully random accesses despite evidence that modern applications use varied sets of more complex access patterns. Spatter is an open-source benchmark that provides a tunable and configurable framework to benchmark a variety of indexed access patterns, including variations of gather / scatter that are seen in HPC mini-apps evaluated in this work. The design of Spatter includes backends for OpenMP and CUDA, and experiments show how it can be used to evaluate 1) uniform access patterns for CPU and GPU, 2) prefetching regimes for gather / scatter, 3) compiler implementations of vectorization for gather / scatter, and 4) trace-driven "proxy patterns" that reflect the patterns found in multiple applications. The results from Spatter experiments show, for instance, that GPUs typically outperform CPUs for these operations in absolute bandwidth but not fraction of peak bandwidth, and that Spatter can better represent the performance of some cache-dependent mini-apps than traditional STREAM bandwidth measurements. 
    more » « less