Kernel use-after-free (UAF) bugs are severe threats to system security due to their complex root causes and high exploitability. We find that 36.1% of recent kernel UAF bugs are caused by improper uses of reference counters, dubbed refcount-related UAF bugs. Current kernel fuzzing tools based on code coverage can detect common memory errors, but none of them is aware of this root cause. As a consequence, they trigger refcount-related UAF bugs only passively and coincidentally, and may miss many deeply hidden vulnerabilities. To actively trigger refcount-related UAF bugs, in this paper we propose CountDown, a novel refcount-guided kernel fuzzer. CountDown collects diverse refcount operations from kernel executions and reshapes syscall relations based on commonly accessed refcounts. When generating user-space programs, CountDown prefers to combine syscalls that have accessed the same refcounts, aiming to trigger complex refcount behaviors. It also injects refcount-decreasing and refcount-accessing syscalls to intentionally free the refcounted object and trigger invalid accesses through dangling pointers. We test CountDown on mainstream Linux kernels and compare it with popular fuzzers. On average, our tool detects 66.1% more UAF bugs and 32.9% more KASAN reports than state-of-the-art tools. CountDown has found nine new kernel memory bugs, of which two have been fixed and one has been confirmed.
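The bug pattern this abstract targets can be illustrated with a minimal, self-contained C sketch. This is user-space code with hypothetical `obj_get`/`obj_put` helpers (not kernel code and not CountDown's actual targets): the last reference drop frees the object while another path still holds a raw pointer, which is exactly the decrement-then-access sequence a refcount-guided fuzzer tries to line up.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical refcounted object, mirroring the kernel pattern described
 * above: the last obj_put() frees the object, so any raw pointer kept
 * past that point becomes dangling. */
struct obj {
    int refcount;
    int data;
};

static struct obj *obj_alloc(void) {
    struct obj *o = malloc(sizeof(*o));
    o->refcount = 1;          /* creator holds the first reference */
    o->data = 42;
    return o;
}

static void obj_get(struct obj *o) { o->refcount++; }

static void obj_put(struct obj *o) {
    if (--o->refcount == 0)
        free(o);              /* refcount hits zero: object is released */
}

int main(void) {
    struct obj *o = obj_alloc();
    struct obj *alias = o;    /* second pointer taken without obj_get(): the bug */

    obj_put(o);               /* refcount 1 -> 0, object freed */

    /* Use-after-free: 'alias' is now dangling.  A refcount-guided fuzzer
     * intentionally schedules this decrement before the access. */
    printf("%d\n", alias->data);
    return 0;
}
```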
Arbitrar: User-Guided API Misuse Detection
Software APIs exhibit rich diversity and complexity, which not only makes them a common source of programming errors but also hinders program analysis tools from checking them. Such tools either expect a precise API specification, which requires program analysis expertise, or presume that correct API usages follow simple idioms that can be automatically mined from code, which suffers from poor accuracy. We propose a new approach that allows regular programmers to find API misuses. Our approach interacts with the user to classify valid and invalid usages of each target API method. It minimizes user burden by employing an active learning algorithm that ranks API usages by their likelihood of being invalid. We implemented our approach in a tool called ARBITRAR for C/C++ programs and applied it to check the uses of 18 API methods in 21 large real-world programs, including OpenSSL and the Linux kernel. Within just 3 rounds of user interaction on average per API method, ARBITRAR found 40 new bugs, with patches accepted for 18 of them. Moreover, ARBITRAR finds all known bugs reported by the state-of-the-art tool APISAN in a benchmark suite comprising 92 bugs, with a false positive rate of only 51.5% compared to APISAN's 87.9%.
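As a rough illustration of the kind of valid/invalid usage pair such a user-guided checker classifies, here is a generic C standard-library example (invented for illustration; it is not one of the 18 API methods studied in the paper):

```c
#include <stdio.h>

/* Invalid usage: the return value of fopen() is used without a NULL
 * check, so a missing file leads to undefined behavior.  This is the
 * shape of misuse an API-misuse detector would flag. */
void count_lines_buggy(const char *path) {
    FILE *f = fopen(path, "r");    /* may return NULL */
    int c, lines = 0;
    while ((c = fgetc(f)) != EOF)  /* misuse: f may be NULL here */
        if (c == '\n')
            lines++;
    fclose(f);
    printf("%d lines\n", lines);
}

/* Valid usage: check the result before any other call on the handle. */
void count_lines_fixed(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return;
    int c, lines = 0;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    printf("%d lines\n", lines);
}
```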
- Award ID(s): 1836936
- PAR ID: 10312079
- Date Published:
- Journal Name: IEEE Symposium on Security and Privacy
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Data races are notorious concurrency bugs which can cause severe problems, including random crashes and corrupted execution results. However, existing data race detection tools are still challenging for users to adopt: it takes a significant amount of effort to install, configure, and properly use a tool, and a single tool often cannot find all the bugs in a program. Requiring users to run multiple tools is often impractical and unproductive because of the differences in tool interfaces and report formats. In this paper, we present a cloud-based, service-oriented design and implementation of a race detection service (RDS) to detect data races in parallel programs. RDS integrates multiple data race detection tools into a single cloud-based service via a REST API. It defines a standard JSON format to represent data race detection results, which facilitates producing user-friendly reports, aggregating the output of multiple tools, and processing the results with other tools. RDS also defines a set of policies for aggregating outputs from multiple tools. RDS significantly simplifies the workflow of using data race detection tools and improves report quality and the productivity of performing race detection for parallel programs. Our evaluation shows that RDS delivers more accurate results with much less effort from users than the traditional way of using any individual tool. Using four selected tools and DataRaceBench, RDS improves the adjusted F-1 scores by 8.8% and 12.6% over the best and the average scores, respectively. For the NAS Parallel Benchmarks, RDS improves the adjusted accuracy by 35% compared to the average of the tools. Our work studies a new approach to composing software tools for parallel computing via a service-oriented architecture. The same approach and framework can be used to create metaservices for compilers, performance tools, auto-tuning tools, and so on.
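For context, below is a minimal C/OpenMP kernel of the kind such detectors report; it is similar in spirit to DataRaceBench loops but not taken from the suite:

```c
#include <stdio.h>

/* All iterations update 'sum' without synchronization, so the parallel
 * loop contains a classic write-write data race on a shared variable. */
int main(void) {
    int sum = 0;
    int a[1000];
    for (int i = 0; i < 1000; i++)
        a[i] = i;

    #pragma omp parallel for        /* racy: 'sum' is shared and unprotected */
    for (int i = 0; i < 1000; i++)
        sum += a[i];

    /* A fix would be: #pragma omp parallel for reduction(+:sum) */
    printf("%d\n", sum);
    return 0;
}
```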
Recent trends in software-defined networking have extended network programmability to the data plane. Unfortunately, the chance of introducing bugs increases significantly. Verification can help prevent bugs by assuring that the program does not violate its requirements. Although research on the verification of P4 programs is very active, we still need tools that make it easier for programmers to express properties and to rapidly verify complex invariants. In this paper, we leverage assertions and symbolic execution to propose a more general P4 verification approach. Developers annotate P4 programs with assertions expressing general network correctness properties; the result is transformed into C models and all possible paths are symbolically executed. We implement a prototype and use it to show the feasibility of the verification approach. Because symbolic execution does not scale well, we investigate a set of techniques to speed up the process for the specific case of P4 programs. We use the implemented prototype to show the gains provided by three speed-up techniques (use of constraints, program slicing, parallelization), and experiment with different compiler optimization choices. We show that our tool can uncover a broad range of bugs, and can do so in less than a minute for various P4 applications.
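As a hypothetical sketch (not the tool's actual output) of what an assertion-annotated pipeline fragment might look like once translated into a C model, consider a simple validity invariant checked by symbolic execution on every path:

```c
#include <assert.h>
#include <stdint.h>

/* C model of a small P4 pipeline fragment.  The header validity bit and
 * the ingress function are modeling conventions invented here for
 * illustration; the annotated property becomes a plain C assert. */
struct ipv4_h {
    uint8_t isValid;   /* header validity bit, modeled as a field */
    uint8_t ttl;
};

void ingress(struct ipv4_h *ipv4, int drop_flag) {
    /* Property from a hypothetical annotation: the TTL is only
     * decremented when the IPv4 header is valid and the packet is not
     * already marked for drop. */
    if (!drop_flag) {
        assert(ipv4->isValid);      /* checked on all symbolic paths */
        ipv4->ttl = ipv4->ttl - 1;
    }
}
```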
To repair a program does not mean to make it (absolutely) correct; it only means to make it more-correct than it was originally. This is not a mundane academic distinction: given that programs typically have about a dozen faults per KLOC, it is important for program repair methods and tools to be designed in such a way that they map an incorrect program into a more-correct, albeit still potentially incorrect, program. Yet in the absence of a concept of relative correctness, many program repair methods and tools resort to approximations of absolute correctness; since these methods and tools are often validated against programs with a single fault, making them absolutely correct is indistinguishable from making them more-correct, which has contributed to concealing the absence of (and the need for) relative correctness. In this paper we propose a theory of program repair based on a concept of relative correctness. We aspire to encourage researchers in program repair to explicitly specify what concept of relative correctness their method or tool is based upon, and to validate their method or tool by proving that it does enhance relative correctness, as defined.
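A small invented example may make the distinction concrete; here the specification is simply "return the maximum of three ints", and the "repaired" version satisfies it on strictly more inputs without being absolutely correct:

```c
/* Original: two faults (the second and third return statements both
 * return the wrong variable).  Correct only when 'a' is the maximum. */
int max3_original(int a, int b, int c) {
    if (a > b && a > c) return a;
    if (b > a && b > c) return a;   /* fault 1: should return b */
    return a;                        /* fault 2: should return c */
}

/* "Repaired": fault 1 fixed, fault 2 still present.  This version is
 * more-correct than the original (it meets the specification on more
 * inputs) yet still not absolutely correct. */
int max3_repaired(int a, int b, int c) {
    if (a > b && a > c) return a;
    if (b > a && b > c) return b;
    return a;                        /* still wrong when c is the maximum */
}
```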
Variability in C software is a useful tool, but critical bugs that only exist in certain configurations are easily missed by conventional debugging techniques. Even with a small number of features, the configuration space of configurable software is too large to analyze exhaustively. Variability-aware static analysis for bug detection is being developed, but it remains at too early a stage to be fully usable on real-world C programs. In this work, we present a methodology for finding variability bugs by combining variability-oblivious bug detectors, static analysis of build processes, and dynamic feature interaction inference. We further present an empirical study in which we test our methodology on two highly configurable C programs. We found our methodology to be effective, finding 88 true bugs across the two programs, of which 64 were variability bugs.
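A tiny hypothetical C program illustrates the kind of configuration-dependent bug such a methodology targets (the FEATURE_CACHE macro and all names are invented for illustration):

```c
#include <stdio.h>

/* The null dereference below is only compiled in when FEATURE_CACHE is
 * defined (e.g. gcc -DFEATURE_CACHE), so a variability-oblivious tool
 * run on the default configuration never sees it. */
#ifdef FEATURE_CACHE
struct cache { int *slots; };

static struct cache *cache_init(void) {
    return NULL;   /* cache lookup table never allocated in this sketch */
}
#endif

int lookup(int key) {
#ifdef FEATURE_CACHE
    struct cache *c = cache_init();
    return c->slots[key];   /* bug: NULL dereference, only in this configuration */
#else
    (void)key;
    return -1;              /* default configuration: no cache, no bug */
#endif
}

int main(void) {
    printf("%d\n", lookup(3));
    return 0;
}
```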