Abstract Gradual typing allows programs to enjoy the benefits of both static typing and dynamic typing. While it is often desirable to migrate a program from more dynamically typed to more statically typed or vice versa, gradual typing itself does not provide a way to facilitate this migration. This places the burden on programmers who have to manually add or remove type annotations. Besides the general challenge of adding type annotations to dynamically typed code, there are subtle interactions between these annotations in gradually typed code that exacerbate the situation. For example, to migrate a program to be as static as possible, in general, all possible combinations of adding or removing type annotations from parameters must be tried out and compared. In this paper, we address this problem by developing migrational typing , which efficiently types all possible ways of replacing dynamic types with fully static types for a gradually typed program. The typing result supports automatically migrating a program to be as static as possible or introducing the least number of dynamic types necessary to remove a type error. The approach can be extended to support user-defined criteria about which annotations to modify. We have implemented migrational typing and evaluated it on large programs. The results show that migrational typing scales linearly with the size of the program and takes only 2–4 times longer than plain gradual typing.
more »
« less
BinTrimmer: Towards Static Binary Debloating Through Abstract Interpretation
The increasing complexity of modern programs motivates software engineers to often rely on the support of third-party libraries. Although this practice allows application developers to achieve a compelling time-to-market, it often makes the final product bloated with conspicuous chunks of unused code. Other than making a program unnecessarily large, this dormant code could be leveraged by willful attackers to harm users. As a consequence, several techniques have been recently proposed to perform program debloating and remove (or secure) dead code from applications. However, state-of-the-art approaches are either based on unsound strategies, thus producing unreliable results, or pose too strict assumptions on the program itself. In this work, we propose a novel abstract domain, called Signedness-Agnostic Strided Interval, which we use as the cornerstone to design a novel and sound static technique, based on abstract interpretation, to reliably perform program debloating. Throughout the paper, we detail the specifics of our approach and show its effectiveness and usefulness by implementing it in a tool, called BinTrimmer, to perform static program debloating on binaries. Our evaluation shows that BinTrimmer can remove up to 65.6% of a library’s code and that our domain is, on average, 98% more precise than the related work.
more »
« less
- Award ID(s):
- 1704253
- PAR ID:
- 10155199
- Date Published:
- Journal Name:
- Proceedings of Detection of Intrusions and Malware, and Vulnerability Assessment. DIMVA 2019.
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Extended Berkeley Packet Filter (BPF) is a language and run-time system that allows non-superusers to extend the Linux and Windows operating systems by downloading user code into the kernel. To ensure that user code is safe to run in kernel context, BPF relies on a static analyzer that proves properties about the code, such as bounded memory access and the absence of operations that crash. The BPF static analyzer checks safety using abstract interpretation with several abstract domains. Among these, the domain of tnums (tristate numbers) is a key domain used to reason about the bitwise uncertainty in program values. This paper formally specifies the tnum abstract domain and its arithmetic operators. We provide the first proofs of soundness and optimality of the abstract arithmetic operators for tnum addition and subtraction used in the BPF analyzer. Further, we describe a novel sound algorithm for multiplication of tnums that is more precise and efficient (runs 33% faster on average) than the Linux kernel's algorithm. Our tnum multiplication is now merged in the Linux kernel.more » « less
-
Feature-rich software programs typically provide many configuration options for users to enable and disable features, or tune feature behaviors. Given the values of configuration options, certain code blocks in a program will become redundant code and never be used. However, the redundant code is still present in the program and thus unnecessarily increases a program's attack surface by allowing attackers to use it as return-oriented programming (ROP) gadgets. Existing code debloating techniques have several limitations: not targeting this type of redundant code, requiring access to program source code or user-provided test inputs. In this paper, we propose a practical code debloating approach, called BinDebloat, to address these limitations. BinDebloat identifies and removes redundant code caused by configuration option values. It does not require user-provided test inputs, or support from program developers, and is designed to work on closed-source programs. It uses static program analysis to identify code blocks that are control-dependent on configuration option values. Given a set of configuration option values, it automatically determines which of such code blocks become redundant and uses static binary rewriting to neutralize these code blocks so that they are removed from the attack surface. We evaluated BinDebloat on closed-source Windows programs and the results show that BinDebloat can effectively reduce a program's attack surface.more » « less
-
Large language models (LLMs) are widely used in software development. However, the code generated by LLMs often contains vulnerabilities. Several secure code generation methods have been proposed to address this issue, but their current evaluation schemes leave several concerns unaddressed. Specifically, most existing studies evaluate security and functional correctness separately, using different datasets. That is, they assess vulnerabilities using securityrelated code datasets while validating functionality with general code datasets. In addition, prior research primarily relies on a single static analyzer, CodeQL, to detect vulnerabilities in generated code, which limits the scope of security evaluation. In this work, we conduct a comprehensive study to systematically assess the improvements introduced by four state-of-the-art secure code generation techniques. Specifically, we apply both security inspection and functionality validation to the same generated code and evaluate these two aspects together. We also employ three popular static analyzers and two LLMs to identify potential vulnerabilities in the generated code. Our study reveals that existing techniques often compromise the functionality of generated code to enhance security. Their overall performance remains limited when evaluating security and functionality together. In fact, many techniques even degrade the performance of the base LLM by more than 50%. Our further inspection reveals that these techniques often either remove vulnerable lines of code entirely or generate “garbage code” that is unrelated to the intended task. Moreover, the commonly used static analyzer CodeQL fails to detect several vulnerabilities, further obscuring the actual security improvements achieved by existing techniques. Our study serves as a guideline for a more rigorous and comprehensive evaluation of secure code generation performance in future work.more » « less
-
Commutativity of program code (the equivalence of two code fragments composed in alternate orders) is of ongoing interest in many settings such as program verification, scalable concurrency, and security analysis. While some recent works have explored static analysis for code commutativity, few have specifically catered to heap-manipulating programs. We introduce an abstract domain in which commutativity synthesis or verification techniques can safely be performed on abstract mathematical models and, from those results, one can directly obtain commutativity conditions for concrete heap programs. This approach offloads challenges of concrete heap reasoning into the simpler abstract space. We show this reasoning supports framing and composition, and conclude with commutativity analysis of programs operating on example heap data structures. Our work has been mechanized in Coq and is available in the supplement.more » « less
An official website of the United States government

