Typical cybersecurity solutions emphasize on achieving defense functionalities. However, execution
efficiency and scalability are equally important, especially for real-world deployment.
Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize
the HPC devices’ capacities. On the other hand, the sophisticated implementations are
quite difficult: they require both in-depth understandings of cybersecurity domain-specific characteristics
and HPC architecture and system model.
In our work, we investigate three sub-areas in cybersecurity, including mobile software security,
network security, and system security. They have the following performance issues, respectively:
1) The flow- and context-sensitive static analysis for the large and complex Android APKs are
incredibly time-consuming. Existing CPU-only frameworks/tools have to set a timeout threshold to
cease the program analysis to trade the precision for performance. 2) Network intrusion detection
systems (NIDS) use automata processing as its searching core and requires line-speed processing.
However, achieving high-speed automata processing is exceptionally difficult in both algorithm
and implementation aspects. 3) It is unclear how the cache configurations impact time-driven
cache side-channel attacks’ performance. This question remains open because it is difficult to
conduct comparative measurement to study the impacts.
In this dissertation, we demonstrate how application-specific characteristics can be leveraged to
optimize implementations on various types of HPC for faster and more scalable cybersecurity executions.
For example, we present a new GPU-assisted framework and a collection of optimization
strategies for fast Android static data-flow analysis that achieve up to 128X speedups against the
plain GPU implementation. For network intrusion detection systems (IDS), we design and implement
an algorithm capable of eliminating the state explosion in out-of-order packet situations,
which reduces up to 400X of the memory overhead. We also present tools for improving the usability
of Micron’s Automata Processor. To study the cache configurations’ impact on time-driven
cache side-channel attacks’ performance, we design an approach to conducting comparative measurement.
We propose a quantifiable success rate metric to measure the performance of time-driven
cache attacks and utilize the GEM5 platform to emulate the configurable cache.
more »
« less
GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting
Many popular vetting tools for Android applications use static code analysis techniques. In particular, Inter-procedural Data-Flow Graph (IDFG) construction is the computation at the core of Android static data-flow analysis and consumes most of the analysis time. Many analysis tools use a worklist algorithm, an iterative fixed-point approach, to construct the IDFG. In this paper, we observe that a straightforward GPU parallelization of the worklist algorithm leads to significant underutilization of the GPU resources. We identify four performance bottlenecks, namely, frequent dynamic memory allocations, high branch divergence, workload imbalance, and irregular memory access patterns. Accordingly, we propose GDroid, a GPU-based worklist algorithm implementation with multiple fine-grained optimizations tailored to common characteristics of Android applications. The optimizations considered are: matrix-based data structure, memory access-based node grouping, and worklist merging. Our experimental evaluation, performed on 1000 Android applications, shows that the proposed optimizations are beneficial to performance, and GDroid can achieve up to 128X speedups against a plain GPU implementation.
more »
« less
- NSF-PAR ID:
- 10178646
- Date Published:
- Journal Name:
- 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
- Volume:
- 1
- Page Range / eLocation ID:
- 274 to 284
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Today’s large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.more » « less
-
Decision Forests are popular machine learning techniques that assist scientists to extract knowledge from massive data sets. This class of tool remains popular because of their interpretability and ease of use, unlike other modern machine learning methods, such as kernel machines and deep learning. Decision forests also scale well for use with large data because training and run time operations are trivially parallelizable allowing for high inference throughputs. A negative aspect of these forests, and an untenable property for many real time applications, is their high inference latency caused by the combination of large model sizes with random memory access patterns. We present memory packing techniques and a novel tree traversal method to overcome this deficiency. The result of our system is a grouping of trees into a hierarchical structure. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in contiguous cache lines. We extend this layout with a re-ordering of the tree traversal algorithm to take advantage of the increased memory throughput provided by out-of-order execution and cache-line prefetching. Together, these optimizations increase the performance and parallel scalability of classification in ensembles by a factor of ten over an optimized C++ implementation and a popular R-language implementation.more » « less
-
This Innovative Practice Work in Progress presents a plugin tool named DroidPatrol. It can be integrated with the Android Studio to perform tainted data flow analysis of mobile applications. Most vulnerabilities should be addressed and fixed during the development phase. Computer users, managers, and developers agree that we need software and systems that are “more secure”. Such efforts require support from both the educational institutions and learning communities to improve software assurance, particularly in writing secure code. Many open source static analysis tools help developers to maintain and clean up the code. However, they are not able to find potential security bugs. Our work is aimed to checking of security issues within Android applications during implementation. We provide an example hands-on lab based on DroidPatrol prototype and share the initial evaluation feedback from a classroom. The initial results show that the plugin based hands-on lab generates interests among learners and has the promise of acting as an intervention tool for secure software development.more » « less
-
This Innovative Practice Work in Progress presents a plugin tool named DroidPatrol. It can be integrated with the Android Studio to perform tainted data flow analysis of mobile applications. Most vulnerabilities should be addressed and fixed during the development phase. Computer users, managers, and developers agree that we need software and systems that are “more secure”. Such efforts require support from both the educational institutions and learning communities to improve software assurance, particularly in writing secure code. Many open source static analysis tools help developers to maintain and clean up the code. However, they are not able to find potential security bugs. Our work is aimed to checking of security issues within Android applications during implementation. We provide an example hands-on lab based on DroidPatrol prototype and share the initial evaluation feedback from a classroom. The initial results show that the plugin based hands-on lab generates interests among learners and has the promise of acting as an intervention tool for secure software development.more » « less