Rust’s growing popularity in high-integrity systems requires automated vulnerability detection in order to maintain its strong safety guarantees. Although Rust’s ownership model and compile-time checks prevent many errors, sometimes unexpected bugs may occasionally pass analysis, underlining the necessity for automated safe and unsafe code detection. This paper presents Rust-IR-BERT, a machine learning approach to detect security vulnerabilities in Rust code by analyzing its compiled LLVM intermediate representation (IR) instead of the raw source code. This approach offers novelty by employing LLVM IR’s language-neutral, semantically rich representation of the program, facilitating robust detection by capturing core data and control-flow semantics and reducing language-specific syntactic noise. Our method leverages a graph-based transformer model, GraphCodeBERT, which is a transformer architecture pretrained model to encode structural code semantics via data-flow information, followed by a gradient boosting classifier, CatBoost, that is capable of handling complex feature interactions—to classify code as vulnerable or safe. The model was evaluated using a carefully curated dataset of over 2300 real-world Rust code samples (vulnerable and non-vulnerable Rust code snippets) from RustSec and OSV advisory databases, compiled to LLVM IR and labeled with corresponding Common Vulnerabilities and Exposures (CVEs) identifiers to ensure comprehensive and realistic coverage. Rust-IR-BERT achieved an overall accuracy of 98.11%, with a recall of 99.31% for safe code and 93.67% for vulnerable code. Despite these promising results, this study acknowledges potential limitations such as focusing primarily on known CVEs. Built on a representative dataset spanning over 2300 real-world Rust samples from diverse crates, Rust-IR-BERT delivers consistently strong performance. Looking ahead, practical deployment could take the form of a Cargo plugin or pre-commit hook that automatically generates and scans LLVM IR artifacts during the development cycle, enabling developers to catch vulnerabilities at an early stage in the development cycle.
more »
« less
A Grounded Conceptual Model for Ownership Types in Rust
Programmers learning Rust struggle to understand ownership types, Rust’s core mechanism for ensuring memory safety without garbage collection. This paper describes our attempt to systematically design a pedagogy for ownership types. First, we studied Rust developers’ misconceptions of ownership to create the Ownership Inventory, a new instrument for measuring a person’s knowledge of ownership. We found that Rust learners could not connect Rust’s static and dynamic semantics, such as determining why an ill-typed program would (or would not) exhibit undefined behavior. Second, we created a conceptual model of Rust’s semantics that explains borrow checking in terms of flow-sensitive permissions on paths into memory. Third, we implemented a Rust compiler plugin that visualizes programs under the model. Fourth, we integrated the permissions model and visualizations into a broader pedagogy of ownership by writing a new ownership chapter forThe Rust Programming Language, a popular Rust textbook. Fifth, we evaluated an initial deployment of our pedagogy against the original version, using reader responses to the Ownership Inventory as a point of comparison. Thus far, the new pedagogy has improved learner scores on the Ownership Inventory by an average of 9
more »
« less
- Award ID(s):
- 2319014
- PAR ID:
- 10539095
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 7
- Issue:
- OOPSLA2
- ISSN:
- 2475-1421
- Page Range / eLocation ID:
- 1224 to 1252
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The Rust Programming Language, developed by Mozilla, has gained popularity for combining performance and memory safety. Its ownership and borrowing mechanisms ensure memory safety and reduce common bugs, making Rust safe. In contrast, Unsafe Rust introduces potential vulnerabilities that can undermine these guarantees and cause difficult security flaws. Unsafe Rust allows low-level operations, interaction with other languages, and optimizations, but bypasses most security checks, so its use should be limited, reviewed, and validated. Static and dynamic code analysis tools help validate that unsafe regions do not contain vulnerabilities. Due to Rust’s rapid growth, documentation of such tools is sparse and difficult to navigate. This research reviews, analyzes, and tests multiple static code analysis tools, aiming to provide developers with a comprehensive resource tailored to Unsafe Rust. The results include insights into each tool’s functionality, strengths, weaknesses, and use cases, equipping developers to write safer, more secure, and reliable Rust code.more » « less
-
This paper documents a year-long experiment to “profile” the process of learning a programming language: gathering data to understand what makes a language hard to learn, and using that data to improve the learning process. We added interactive quizzes to The Rust Programming Language, the official textbook for learning Rust. Over 13 months, 62,526 readers answered questions 1,140,202 times. First, we analyze the trajectories of readers. We find that many readers drop-out of the book early when faced with difficult language concepts like Rust’s ownership types. Second, we use classical test theory and item response theory to analyze the characteristics of quiz questions. We find that better questions are more conceptual in nature, such as asking why a program does not compile vs. whether a program compiles. Third, we performed 12 interventions into the book to help readers with difficult questions. We find that on average, interventions improved quiz scores on the targeted questions by +20more » « less
-
Memory safety is an important property for security-critical systems, but it cannot be easily extended to cryptography, which is a common source of memory safety vulnerabilities. Cryptography libraries use assembly for direct control of timing and performance, but assembly introduces unsafety when it is called from a high-level memory-safe language like Rust. To enable quick and safe integration of assembly into Rust, specifically for building memory-safe cryptography, we present CLAMS. CLAMS verifies cryptographic assembly against safety constraints derived directly from Rust’s type system. Verification is done through symbolic execution at compile time to minimize run-time overheads, and supports verifying loops over potentially unbounded input buffers. CLAMS’s procedural macro interface forces developers to map safe Rust types to registers and define preconditions on input and output parameters. CLAMS’s techniques can verify the memory safety of assembly from a popular open-source cryptography library. We evaluated CLAMS and found that verification is quick and imposes compile-time overheads under 100ms and negligible run-time overheads.more » « less
-
The Rust programming language is a prominent candidate for a C and C++ replacement in the memory-safe era. However, Rust’s safety guarantees do not in general extend to arbitrary third-party code. The main purpose of this short paper is to point out that this is true even entirely within safe Rust – which we illustrate through a series of counterexamples. To complement our examples, we present initial experimental results to investigate: do existing program analysis and program veri!cation tools detect or mitigate these risks? Are these attack patterns realizable via input to publicly exposed functions in real-world Rust libraries? And to what extent do existing supply chain attacks in Rust leverage similar attacks? All of our examples and associated data are available as an open source repository on GitHub. We hope this paper will inspire future work on rethinking safety in Rust – especially, to go beyond the safe/unsafe distinction and harden Rust against a stronger threat model of attacks that can be used in the wild.more » « less
An official website of the United States government

