skip to main content


Title: GPU-Based Static Data-Flow Analysis for Fast and Scalable Android App Vetting
Many popular vetting tools for Android applications use static code analysis techniques. In particular, Inter-procedural Data-Flow Graph (IDFG) construction is the computation at the core of Android static data-flow analysis and consumes most of the analysis time. Many analysis tools use a worklist algorithm, an iterative fixed-point approach, to construct the IDFG. In this paper, we observe that a straightforward GPU parallelization of the worklist algorithm leads to significant underutilization of the GPU resources. We identify four performance bottlenecks, namely, frequent dynamic memory allocations, high branch divergence, workload imbalance, and irregular memory access patterns. Accordingly, we propose GDroid, a GPU-based worklist algorithm implementation with multiple fine-grained optimizations tailored to common characteristics of Android applications. The optimizations considered are: matrix-based data structure, memory access-based node grouping, and worklist merging. Our experimental evaluation, performed on 1000 Android applications, shows that the proposed optimizations are beneficial to performance, and GDroid can achieve up to 128X speedups against a plain GPU implementation.  more » « less
Award ID(s):
1717862 1838271
NSF-PAR ID:
10178646
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
Volume:
1
Page Range / eLocation ID:
274 to 284
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Typical cybersecurity solutions emphasize on achieving defense functionalities. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices’ capacities. On the other hand, the sophisticated implementations are quite difficult: they require both in-depth understandings of cybersecurity domain-specific characteristics and HPC architecture and system model. In our work, we investigate three sub-areas in cybersecurity, including mobile software security, network security, and system security. They have the following performance issues, respectively: 1) The flow- and context-sensitive static analysis for the large and complex Android APKs are incredibly time-consuming. Existing CPU-only frameworks/tools have to set a timeout threshold to cease the program analysis to trade the precision for performance. 2) Network intrusion detection systems (NIDS) use automata processing as its searching core and requires line-speed processing. However, achieving high-speed automata processing is exceptionally difficult in both algorithm and implementation aspects. 3) It is unclear how the cache configurations impact time-driven cache side-channel attacks’ performance. This question remains open because it is difficult to conduct comparative measurement to study the impacts. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC for faster and more scalable cybersecurity executions. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups against the plain GPU implementation. For network intrusion detection systems (IDS), we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces up to 400X of the memory overhead. We also present tools for improving the usability of Micron’s Automata Processor. To study the cache configurations’ impact on time-driven cache side-channel attacks’ performance, we design an approach to conducting comparative measurement. We propose a quantifiable success rate metric to measure the performance of time-driven cache attacks and utilize the GEM5 platform to emulate the configurable cache. 
    more » « less
  2. Today’s large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion. 
    more » « less
  3. Decision Forests are popular machine learning techniques that assist scientists to extract knowledge from massive data sets. This class of tool remains popular because of their interpretability and ease of use, unlike other modern machine learning methods, such as kernel machines and deep learning. Decision forests also scale well for use with large data because training and run time operations are trivially parallelizable allowing for high inference throughputs. A negative aspect of these forests, and an untenable property for many real time applications, is their high inference latency caused by the combination of large model sizes with random memory access patterns. We present memory packing techniques and a novel tree traversal method to overcome this deficiency. The result of our system is a grouping of trees into a hierarchical structure. At low levels, we pack the nodes of multiple trees into contiguous memory blocks so that each memory access fetches data for multiple trees. At higher levels, we use leaf cardinality to identify the most popular paths through a tree and collocate those paths in contiguous cache lines. We extend this layout with a re-ordering of the tree traversal algorithm to take advantage of the increased memory throughput provided by out-of-order execution and cache-line prefetching. Together, these optimizations increase the performance and parallel scalability of classification in ensembles by a factor of ten over an optimized C++ implementation and a popular R-language implementation. 
    more » « less
  4. This Innovative Practice Work in Progress presents a plugin tool named DroidPatrol. It can be integrated with the Android Studio to perform tainted data flow analysis of mobile applications. Most vulnerabilities should be addressed and fixed during the development phase. Computer users, managers, and developers agree that we need software and systems that are “more secure”. Such efforts require support from both the educational institutions and learning communities to improve software assurance, particularly in writing secure code. Many open source static analysis tools help developers to maintain and clean up the code. However, they are not able to find potential security bugs. Our work is aimed to checking of security issues within Android applications during implementation. We provide an example hands-on lab based on DroidPatrol prototype and share the initial evaluation feedback from a classroom. The initial results show that the plugin based hands-on lab generates interests among learners and has the promise of acting as an intervention tool for secure software development. 
    more » « less
  5. This Innovative Practice Work in Progress presents a plugin tool named DroidPatrol. It can be integrated with the Android Studio to perform tainted data flow analysis of mobile applications. Most vulnerabilities should be addressed and fixed during the development phase. Computer users, managers, and developers agree that we need software and systems that are “more secure”. Such efforts require support from both the educational institutions and learning communities to improve software assurance, particularly in writing secure code. Many open source static analysis tools help developers to maintain and clean up the code. However, they are not able to find potential security bugs. Our work is aimed to checking of security issues within Android applications during implementation. We provide an example hands-on lab based on DroidPatrol prototype and share the initial evaluation feedback from a classroom. The initial results show that the plugin based hands-on lab generates interests among learners and has the promise of acting as an intervention tool for secure software development. 
    more » « less