Title: Cryptocurrency Fraud and Code Sharing Data Set and Analysis Code
This release covers the state of the data and associated analysis code for determining code sharing between cryptocurrency codebases, funded through the end of the original NSF CRII award. This material is based on work supported by the National Science Foundation under Grant CNS-1849729.
Award ID(s):
1849729
PAR ID:
10377942
Author(s) / Creator(s):
Publisher / Repository:
Zenodo
Date Published:
Edition / Version:
v1.0
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Previous work has shown that students can understand more complicated pieces of code with common software development tools (code execution, debuggers) than they can without them.
     Objectives: Given that tools can enable novice programmers to understand more complex code, we believe that students should be explicitly taught to use them, to facilitate their plan acquisition and development as independent programmers. To that end, this paper seeks to understand: (1) the relative utility of these tools, (2) the thought process students use to choose a tool, and (3) the degree to which students can choose an appropriate tool to understand a given piece of code.
     Method: We used a mixed-methods approach. To explore the relative effectiveness of the tools, we used a randomized controlled trial (𝑁 = 421) to observe student performance with each tool in understanding a range of different code snippets. To explore tool selection, we conducted a series of think-aloud interviews (𝑁 = 18) in which students were presented with a range of code snippets to understand and were allowed to choose which tool they wanted to use.
     Findings: Overall, novices comprehended code more often when given access to code execution, perhaps because it made testing a larger set of inputs easier than the debugger did. As code complexity increased (as indicated by cyclomatic complexity), students became more successful with the debugger. We found that novices preferred code execution for simpler or familiar code, to quickly verify their understanding, and used the debugger on more complex or unfamiliar code, or when they were confused about a small subset of the code. High-performing novices were adept at switching between tools, alternating between a detail-oriented and a broader perspective of the code when necessary. Novices who were unsuccessful tended to be overconfident in their incorrect understanding or unwilling to double-check their answers using a debugger.
     Implications: We can likely teach novices to independently understand unfamiliar code by using code execution and debuggers. Instructors should teach students to recognize when code is complex (e.g., many nested loops) and to carefully step through those loops with a debugger. We should additionally teach students to double-check their understanding of the code and to self-assess whether they are familiar with it. Students can also be encouraged to strategically switch between execution and debuggers to manage cognitive load, thus maximizing their problem-solving capabilities.
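     A minimal sketch of the two comprehension strategies on a toy snippet (the function and inputs are hypothetical, not taken from the study's materials); in Python:

         def count_even_pairs(xs):
             # How many pairs (i, j) with i < j where xs[i] + xs[j] is even?
             count = 0
             for i in range(len(xs)):
                 for j in range(i + 1, len(xs)):
                     if (xs[i] + xs[j]) % 2 == 0:
                         count += 1
             return count

         # "Code execution" strategy: quickly run several inputs to test a
         # hypothesis about what the function computes.
         for xs in ([], [1], [2, 4], [1, 2, 3, 4]):
             print(xs, "->", count_even_pairs(xs))

         # "Debugger" strategy, suited to the nested loops above: insert
         # breakpoint() in the inner loop, then step with `n` and inspect
         # i, j, and count at each iteration.

     Execution answers "what does it do?" cheaply across many inputs; the debugger pays off once nested iteration makes the control flow hard to simulate mentally.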
  2. With the rapidly increasing capabilities and adoption of code agents for AI-assisted coding and software development, safety and security concerns, such as generating or executing malicious code, have become significant barriers to the real-world deployment of these agents. To provide comprehensive and practical evaluations on the safety of code agents, we propose RedCode, an evaluation platform with benchmarks grounded in four key principles: real interaction with systems, holistic evaluation of unsafe code generation and execution, diverse input formats, and high-quality safety scenarios and tests. RedCode consists of two parts to evaluate agents’ safety in unsafe code execution and generation: (1) RedCode-Exec provides challenging code prompts in Python as inputs, aiming to evaluate code agents’ ability to recognize and handle unsafe code. We then map the Python code to other programming languages (e.g., Bash) and natural text summaries or descriptions for evaluation, leading to a total of over 4,000 testing instances. We provide 25 types of critical vulnerabilities spanning various domains, such as websites, file systems, and operating systems. We provide a Docker sandbox environment to evaluate the execution capabilities of code agents and design corresponding evaluation metrics to assess their execution results. (2) RedCode-Gen provides 160 prompts with function signatures and docstrings as input to assess whether code agents will follow instructions to generate harmful code or software. Our empirical findings, derived from evaluating three agent frameworks based on 19 LLMs, provide insights into code agents’ vulnerabilities. For instance, evaluations on RedCode-Exec show that agents are more likely to reject executing unsafe operations on the operating system, but are less likely to reject executing technically buggy code, indicating high risks. Unsafe operations described in natural text lead to a lower rejection rate than those in code format. Additionally, evaluations on RedCode-Gen reveal that more capable base models and agents with stronger overall coding abilities, such as GPT-4, tend to produce more sophisticated and effective harmful software. Our findings highlight the need for stringent safety evaluations for diverse code agents. Our dataset and code are publicly available at https://github.com/AI-secure/RedCode.
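     A minimal sketch of the sandboxed-execution idea (assuming a local Docker install; the image, resource limits, and helper name are illustrative choices, not the actual RedCode harness); in Python:

         import subprocess

         def run_in_sandbox(code, image="python:3.11-slim", timeout=10):
             # Execute untrusted, agent-generated Python inside a throwaway
             # container with no network and capped resources.
             return subprocess.run(
                 ["docker", "run", "--rm",
                  "--network", "none",   # no network access
                  "--memory", "256m",    # cap memory
                  "--pids-limit", "64",  # cap process count
                  image, "python", "-c", code],
                 capture_output=True, text=True, timeout=timeout,
             )

         result = run_in_sandbox("print('hello from the sandbox')")
         print(result.returncode, result.stdout, result.stderr)

     A harness along these lines can then score the agent's behavior, e.g., whether a dangerous snippet was refused, attempted, or executed successfully, mirroring the kind of execution-based metrics the benchmark describes.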
  3. Students, especially those outside the field of cybersecurity, are increasingly turning to Large Language Model (LLM)-based generative AI tools for coding assistance. These AI code generators provide valuable support to developers by generating code from the provided input and instructions. However, the quality and accuracy of the generated code vary with factors such as task complexity, the clarity of the instructions, and the model’s familiarity with the programming language. Additionally, the generated code may inadvertently use vulnerable built-in functions, potentially introducing source-code vulnerabilities and exploits. This research undertakes an in-depth analysis and comparison of the code generation, code completion, and security suggestions offered by prominent AI models, including OpenAI Codex, CodeBERT, and ChatGPT, evaluating both the effectiveness and the security of these capabilities. This analysis serves as a valuable resource for developers, enabling them to proactively avoid introducing security vulnerabilities into their projects and thereby reduce the need for extensive revisions and resource allocation, in both the short and the long term.
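     An illustrative instance of the vulnerable-built-in problem the abstract names (generic examples, not drawn from the paper's corpus); in Python:

         import ast
         import subprocess

         user_input = "[1, 2, 3]"

         # Risky pattern sometimes seen in generated code: eval() runs
         # arbitrary expressions, so hostile input becomes code execution.
         # data = eval(user_input)

         # Safer: ast.literal_eval accepts only Python literals.
         data = ast.literal_eval(user_input)

         filename = "report.txt"
         # Risky: building a shell command string invites command injection.
         # os.system("cat " + filename)

         # Safer: pass an argument vector; no shell is involved.
         subprocess.run(["cat", filename], check=False)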
  4. Code reviews are a ubiquitous and essential part of the software development process. They also offer a unique, at-scale opportunity to teach developers in the context of their day-to-day development activities, rather than in a more removed, formal setting like a class. Yet there is little research on effective teaching through code reviews: focusing on learning for the author, not just changes to the code. We address this gap through a case study at Google: interviews with 14 developers revealed 12 patterns and 15 anti-patterns in code reviews that impact learning. For instance, explanatory rationale, sample solutions backed by standards, and a constructive tone facilitate learning, whereas harsh comments, excessive shallow critiques, and non-pragmatic reviewing that ignores authors' constraints hinder learning. We validated our qualitative findings through member checking, interviews with reviewers, a literature review, and a survey of 324 developers. This comprehensive study provides empirical evidence of how social dynamics in code reviews impact learning. Based on our findings, we provide practical recommendations on how to frame constructive reviews that create a supportive learning environment.
  5. We introduce a notion of code sparsification that generalizes the notion of cut sparsification in graphs. For a (linear) code C ⊆ 𝔽_q^n of dimension k, a (1 ± ɛ)-sparsification of size s is given by a weighted set S ⊆ [n] with |S| ≤ s such that for every codeword c ∈ C, the projection c|_S of c to the coordinates in S has (weighted) Hamming weight that is a (1 ± ɛ) approximation of the Hamming weight of c. We show that for every code there exists a (1 ± ɛ)-sparsification of size s = Õ(k log(q)/ɛ^2). This immediately implies known results on graph and hypergraph cut sparsification up to polylogarithmic factors (with a simple unified proof): the former follows from the well-known fact that cuts in a graph form a linear code over 𝔽_2, while the latter is obtained by a simple encoding of hypergraph cuts. Further, by connecting the eigenvalues of the Laplacians of Cayley graphs over 𝔽_2^k to the weights of codewords, we also give the first proof of the existence of spectral sparsifiers of Cayley graphs over 𝔽_2^k by Cayley graphs, i.e., where we sparsify the set of generators to nearly optimal size. Additionally, this work can be viewed as a continuation of a line of works on building sparsifiers for constraint satisfaction problems (CSPs); this result shows that there exist near-linear-size sparsifiers for CSPs over 𝔽_p-valued variables whose unsatisfying assignments can be expressed as the zeros of a linear equation modulo a prime p. As an application, we give a full characterization of ternary Boolean CSPs (CSPs whose underlying predicate acts on three Boolean variables) that allow near-linear-size sparsification. This makes progress on a question posed by Kogan and Krauthgamer (ITCS 2015) asking which CSPs allow near-linear-size sparsifiers (in the number of variables). At the heart of our result is a codeword-counting bound that we believe is of independent interest. Indeed, extending Karger's cut-counting bound (SODA 1993), we show a novel decomposition theorem for linear codes: every linear code has a (relatively) small subset of coordinates such that, after deleting those coordinates, the code on the remaining coordinates has a smooth upper bound on the number of codewords of small weight. Using the deleted coordinates in addition to a (weighted) random sample of the remaining coordinates then allows us to sparsify the whole code. The proof of this decomposition theorem extends Karger's proof (and the contraction method) in a clean way, while enabling the extensions listed above without any additional complexity in the proofs.
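     The reduction from graph cuts is concrete enough to check directly. A small self-contained verification that the cuts of a graph form a linear code over 𝔽_2 whose codeword weights are exactly the cut sizes (the 4-vertex graph is an arbitrary example, not from the paper); in Python:

         import itertools
         import numpy as np

         # Cut code of a graph: one generator-matrix row per vertex, one
         # column per edge, entry 1 iff the vertex is an endpoint of the edge.
         edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
         n = 4
         G = np.zeros((n, len(edges)), dtype=int)
         for col, (u, v) in enumerate(edges):
             G[u, col] = G[v, col] = 1

         # The codeword for a vertex subset S is the mod-2 sum of its rows;
         # its support is exactly the set of edges crossing the cut (S, V\S),
         # so its Hamming weight equals the cut size.
         for size in range(n + 1):
             for S in itertools.combinations(range(n), size):
                 codeword = G[list(S)].sum(axis=0) % 2
                 cut = sum(1 for (u, v) in edges if (u in S) != (v in S))
                 assert codeword.sum() == cut

         print("verified: codeword weights equal cut sizes")

     Sparsifying this code in the paper's sense, reweighting a small subset of edge coordinates while preserving every codeword's weight up to (1 ± ɛ), is then exactly cut sparsification of the underlying graph.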