Scrubbing sensitive data before releasing memory is a widely accepted but often ignored programming practice for developing secure software. Consequently, confidential data such as cryptographic keys, passwords, and personal data, can remain in memory indefinitely, thereby increasing the risk of exposure to hackers who can retrieve the data using memory dumps or exploit vulnerabilities such as Heartbleed and Etherleak. We propose an approach for detecting a specific memory safety bug called Improper Clearing of Heap Memory Before Release, also known as Common Weakness Enumeration 244, in C programs. The CWE-244 bug in a program allows the leakage of confidential information when a variable is not wiped before heap memory is freed. Our approach combines taint analysis and model checking to detect this weakness. We have three main phases: (1) perform a coarse flow-insensitive inter-procedural static analysis on the program to construct a set of pointer variables that could point to sensitive data; (2) instrument the program with required dynamic variable tracking, and assertion logic for memory wiping before deallocation; and (3) invoke a model checker, the C-Bounded Model Checker (CBMC) in our case, to detect assertion violation in the instrumented program. We develop a tool, \toolname, implementing our instrumentation based algorithm, and we provide experimental validation on the Juliet Test Suite --- the tool is able to detect all the CWE-244 instances present in the test suite. To the best of our knowledge, this is the first work which presents a solution to the problem of detecting unscrubbed secure memory deallocation violations in programs.
more »
« less
OpenSHMEM Checker - A Clang Based Static Checker for OpenSHMEM
Compilers are generally not aware of the semantics of library-based parallel programming models such as MPI and OpenSHMEM, and hence are unable to detect programming errors related to their use. To alleviate this issue, we developed a custom static checker for OpenSHMEM programs based on LLVM’s Clang Static Analyzer framework (CSA). We leverage the Symbolic Execution engine of the core Static Analyzer framework and its path-sensitive analysis to check for bugs on all OpenSHMEM program paths. We have identified common programming mistakes in OpenSHMEM programs that are detectable at compile-time and provided checks for them in the analyzer. They cover: utilization of the right type of mem- ory (private vs. symmetric memory); safe/synchronized access to program data in the presence of asynchronous, one-sided communication; and double-free of memories allocated using OpenSHMEM memory allocation routines. Our experimental analysis showed that the static checker successfully detects bugs in OpenSHMEM code.
more »
« less
- Award ID(s):
- 1725499
- PAR ID:
- 10285754
- Date Published:
- Journal Name:
- 20th International Symposium on Parallel and Distributed Computing (ISPDC’2020)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Static analysis is a proven technique for catching bugs during software development. However, analysis tooling must approximate, both theoretically and in the interest of practicality. False positives are a pervading manifestation of such approximations—tool configuration and customization is therefore crucial for usability and directing analysis behavior. To suppress false positives, developers readily disable bug checks or insert comments that suppress spurious bug reports. Existing work shows that these mechanisms fall short of developer needs and present a significant pain point for using or adopting analyses. We draw on the insight that an analysis user always has one notable ability to influence analysis behavior regardless of analyzer options and implementation: modifying their program. We present a new technique for automated, generic, and temporary code changes that tailor to suppress spurious analysis errors. We adopt a rule-based approach where simple, declarative templates describe general syntactic changes for code patterns that are known to be problematic for the analyzer. Our technique promotes program transformation as a general primitive for improving the fidelity of analysis reports (we treat any given analyzer as a black box). We evaluate using five different static analyzers supporting three different languages (C, Java, and PHP) on large, real world programs (up to 800KLOC). We show that our approach is effective in sidestepping long-standing and complex issues in analysis implementations.more » « less
-
null (Ed.)Extended Berkeley Packet Filter (BPF) has emerged as a powerful method to extend packet-processing functionality in the Linux operating system. BPF allows users to write code in high-level languages (like C or Rust) and execute them at specific hooks in the kernel, such as the network device driver. To ensure safe execution of a user-developed BPF program in kernel context, Linux uses an in-kernel static checker. The checker allows a program to execute only if it can prove that the program is crash-free, always accesses memory within safe bounds, and avoids leaking kernel data. BPF programming is not easy. One, even modest-sized BPF programs are deemed too large to analyze and rejected by the kernel checker. Two, the kernel checker may incorrectly determine that a BPF program exhibits unsafe behaviors. Three, even small performance optimizations to BPF code (e.g., 5% gains) must be meticulously hand-crafted by expert developers. Traditional optimizing compilers for BPF are often inadequate since the kernel checker's safety constraints are incompatible with rule-based optimizations. We present K2, a program-synthesis-based compiler that automatically optimizes BPF bytecode with formal correctness and safety guarantees. K2 produces code with 6--26% reduced size, 1.36%--55.03% lower average packet-processing latency, and 0--4.75% higher throughput (packets per second per core) relative to the best clang-compiled program, across benchmarks drawn from Cilium, Facebook, and the Linux kernel. K2 incorporates several domain-specific techniques to make synthesis practical by accelerating equivalence-checking of BPF programs by 6 orders of magnitude.more » « less
-
In a programmer's pursuit of using or creating new programming languages, finding errors in the syntax of code can present many issues. Languages with little to no documentation and incomprehensible exception handling and reports are frustrating to work with and can create confusion when trying to locate where in the code the program has faulted. In this paper we present {\em CodeBlock}, a parser generator and syntax checker for arbitrary programming languages. CodeBlock is a block based grammar builder for any programming language that constructs a parsing expression grammar for the language based on user built expressions. This grammar can then be used within the CodeBlock website or in the CodeBlock Node.JS application to test the syntax of either written code, or files containing code in the language, reporting comprehensible error messages if errors in syntax are found. Our eventual goal is to incorporate CodeBlock into a compiler design tutoring system, called {\em CompiTS}, in which it will play a central role in teaching students how to design new programming languages and test the effectiveness of the new language using rapid prototyping and a translational approach to implementation. This is an emerging research, and in this paper, we only focus on the syntax checking component of the CompiTS system.more » « less
-
Probabilistic programming languages aid developers performing Bayesian inference. These languages provide programming constructs and tools for probabilistic modeling and automated inference. Prior work introduced a probabilistic programming language, ProbZelus, to extend probabilistic programming functionality to unbounded streams of data. This work demonstrated that the delayed sampling inference algorithm could be extended to work in a streaming context. ProbZelus showed that while delayed sampling could be effectively deployed on some programs, depending on the probabilistic model under consideration, delayed sampling is not guaranteed to use a bounded amount of memory over the course of the execution of the program. In this paper, we the present conditions on a probabilistic program’s execution under which delayed sampling will execute in bounded memory. The two conditions are dataflow properties of the core operations of delayed sampling: the m -consumed property and the unseparated paths property . A program executes in bounded memory under delayed sampling if, and only if, it satisfies the m -consumed and unseparated paths properties. We propose a static analysis that abstracts over these properties to soundly ensure that any program that passes the analysis satisfies these properties, and thus executes in bounded memory under delayed sampling.more » « less
An official website of the United States government

