Millions of software projects leverage automated workflows, like GitHub Actions, for performing common build and deploy tasks. While GitHub Actions have greatly improved the software build process for developers, they pose significant risks to the software supply chain by adding more dependencies and code complexity that may introduce security bugs. This paper presents ARGUS, the first static taint analysis system for identifying code injection vulnerabilities in GitHub Actions. We used ARGUS to perform a large-scale
evaluation on 2,778,483 Workflows referencing 31,725 Actions and discovered critical code injection vulnerabilities in 4,307 Workflows and 80 Actions. We also directly compared ARGUS to two existing pattern-based GitHub Actions vulnerability scanners, demonstrating that our system exhibits a marked improvement in terms of vulnerability detection, with a discovery rate more than seven times (7x) higher than the
state-of-the-art approaches. These results demonstrate that command injection vulnerabilities in the GitHub Actions ecosystem are pervasive and require taint analysis to be detected.
more »
« less
VulGen: Realistic Vulnerable Sample Generation via Pattern Mining and Deep Learning
Building new, powerful data-driven defenses against prevalent software vulnerabilities needs sizable, quality vulnerability datasets, so does large-scale benchmarking of existing defense solutions. Automatic data generation would promisingly meet the need, yet there is little work aimed to generate much-needed quality vulnerable samples. Meanwhile, existing similar and adaptable techniques suffer critical limitations for that purpose. In this paper, we present VULGEN, the first injection-based vulnerability-generation technique that is not limited to a particular class of vulnerabilities. VULGEN combines the strengths of deterministic (pattern-based) and probabilistic (deep-learning/DL-based) program transformation approaches while mutually overcoming respective weaknesses. This is achieved through close collaborations between pattern mining/application and DL-based injection localization, which separates the concerns with how and where to inject. By leveraging large, pretrained programming language modeling and only learning locations, VULGEN mitigates its own needs for quality vulnerability data (for training the localization model). Extensive evaluations show that VULGEN significantly outperforms a state-of-the-art (SOTA) pattern-based peer technique as well as both Transformer- and GNN-based approaches in terms of the percentages of generated samples that are vulnerable and those also exactly matching the ground truth (by 38.0--430.1% and 16.3--158.2%, respectively). The VULGEN-generated samples led to substantial performance improvements for two SOTA DL-based vulnerability detectors (by up to 31.8% higher in F1), close to those brought by the ground-truth real-world samples and much higher than those by the same numbers of existing synthetic samples.
more »
« less
- Award ID(s):
- 1954376
- PAR ID:
- 10505753
- Publisher / Repository:
- Proceedings of IEEE/ACM International Conference on Software Engineering (ICSE 2023)
- Date Published:
- Page Range / eLocation ID:
- 2527–2539
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Millions of software projects leverage automated workflows, like GitHub Actions, for performing common build and deploy tasks. While GitHub Actions have greatly improved the software build process for developers, they pose significant risks to the software supply chain by adding more dependencies and code complexity that may introduce security bugs. This paper presents ARGUS, the first static taint analysis system for identifying code injection vulnerabilities in GitHub Actions. We used ARGUS to perform a large-scale evaluation on 2,778,483 Workflows referencing 31,725 Actions and discovered critical code injection vulnerabilities in 4,307 Workflows and 80 Actions. We also directly compared ARGUS to two existing pattern-based GitHub Actions vulnerability scanners, demonstrating that our system exhibits a marked improvement in terms of vulnerability detection, with a discovery rate more than seven times (7x) higher than the state-of-the-art approaches. These results demonstrate that command injection vulnerabilities in the GitHub Actions ecosystem are not only pervasive but also require taint analysis to be detected.more » « less
-
Mainstream software applications and tools are the configurable platforms with an enormous number of parameters along with their values. Certain settings and possible interactions between these parameters may harden (or soften) the security and robustness of these applications against some known vulnerabilities. However, the large number of vulnerabilities reported and associated with these tools make the exhaustive testing of these tools infeasible against these vulnerabilities infeasible. As an instance of general software testing problem, the research question to address is whether the system under test is robust and secure against these vulnerabilities. This paper introduces the idea of "vulnerability coverage," a concept to adequately test a given application for a certain classes of vulnerabilities, as reported by the National Vulnerability Database (NVD). The deriving idea is to utilize the Common Vulnerability Scoring System (CVSS) as a means to measure the fitness of test inputs generated by evolutionary algorithms and then through pattern matching identify vulnerabilities that match the generated vulnerability vectors and then test the system under test for those identified vulnerabilities. We report the performance of two evolutionary algorithms (i.e., Genetic Algorithms and Particle Swarm Optimization) in generating the vulnerability pattern vectors.more » « less
-
System-on-Chip (SoC) security is vital in designing trustworthy systems. Detecting and fixing a vulnerability in the early stages is easier and cost-effective. Assertion-based verification is widely used for functional validation of Register-Transfer Level (RTL) designs. Assertions can improve the controllability and observability that can lead to faster error detection and localization. Although assertions are widely used for functional validation of RTL models, there is limited effort in applying assertions to detect SoC security vulnerabilities. Specifically, a fundamental challenge in SoC security and trust validation is how to develop high-quality security assertions. In this article, we perform automated vulnerability analysis of RTL models to generate security assertions for six classes of vulnerabilities. Experimental results show that the generated security assertions can detect a wide variety of vulnerabilities. Our automated framework can drastically reduce the overall security validation effort compared to the manual development of security assertions. Automated generation of security assertions will enable assertion-based verification to be one of the most promising pre-silicon security sign-off solutions.more » « less
-
Software depends on upstream projects that regularly fix vulnerabilities, but the documentation of those vulnerabilities is often unreliable or unavailable. Automating the collection of existing vulnerability fixes is essential for downstream projects to reliably update their dependencies due to the sheer number of dependencies in modern software. Prior efforts rely solely on incomplete databases or imprecise or inaccurate statistical analysis of upstream repositories. In this paper, we introduce Differential Alert Analysis (DAA) to discover vulnerability fixes in software projects. In contrast to statistical analysis, DAA leverages static analysis security testing (SAST) tools, which reason over code context and semantics. We provide a language-independent implementation of DAA and show that for Python and Java based projects, DAA has high precision for a ground-truth dataset of vulnerability fixes — even with noisy and low-precision SAST tools. We then use DAA in two large-scale empirical studies covering several prominent ecosystems, finding hundreds of resolved alerts, including many never publicly disclosed. DAA thus provides a powerful, accurate primitive for software projects, code analysis tools, vulnerability databases, and researchers to characterize and enhance the security of software supply chains.more » « less