Title: On the use of static analysis to engage students with software quality improvement: An experience with PMD
Static analysis tools are frequently used to scan source code and detect deviations from a project's coding guidelines. Given their importance, linters are often introduced in classrooms to educate students on how to detect and potentially avoid these code anti-patterns. However, little is known about their effectiveness in raising students' awareness, given that these linters tend to generate a large number of false positives. To increase awareness of potential coding issues that violate coding standards, in this paper we reflect on our experience teaching the use of static analysis and evaluate its effectiveness in helping students improve software quality. The paper discusses the results of a classroom experiment, conducted over three academic semesters, involving 65 submissions that carried out code review activity over 690 rules using PMD. The results of the quantitative and qualitative analysis show that the presence of a set of PMD quality issues influences whether issues are accepted or rejected, that design- and best-practices-related categories take longer to resolve, and that students acknowledge the potential of using static analysis tools during code review. Through this experiment, code review can become a vital part of the computing education curriculum. We envision our findings enabling educators to support students with code review strategies in order to raise students' awareness of static analysis tools and scaffold their coding skills.
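To make the kind of issue reviewed here concrete, the following minimal Java sketch (illustrative, not taken from the study's materials) violates two rules that PMD ships, UnusedLocalVariable (best practices) and EmptyCatchBlock (error prone), and then shows a compliant revision:

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Illustrative class: the first method triggers two common PMD findings,
// the second resolves them.
public class ReportLoader {

    // Flags UnusedLocalVariable (dead variable) and EmptyCatchBlock
    // (silently swallowed exception).
    public List<String> loadNoisy(Path file) {
        int lineCount = 0; // declared and assigned but never read
        try {
            return Files.readAllLines(file);
        } catch (Exception e) {
            // empty catch block: the failure disappears
        }
        return List.of();
    }

    // Compliant revision: no dead variable, and the failure is surfaced.
    public List<String> loadClean(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (java.io.IOException e) {
            throw new IllegalStateException("Could not read " + file, e);
        }
    }
}

With PMD 7's command-line interface, an invocation along the lines of "pmd check -d src -R rulesets/java/quickstart.xml" would report both findings, though exact flags vary across PMD versions.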
Award ID(s):
2213765
PAR ID:
10545719
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE/ACM 45th International Conference on Software Engineering: Software Engineering Education and Training (ICSE-SEET)
Date Published:
ISBN:
979-8-3503-2259-0
Page Range / eLocation ID:
179 to 191
Subject(s) / Keyword(s):
software engineering; refactoring; education
Format(s):
Medium: X
Location:
Melbourne, Australia
Sponsoring Org:
National Science Foundation
More Like this
  1. Patil, Vishwas T; Krishnan, Ram; Shyamasundar, Rudrapatna K (Ed.)
    Open-source software (OSS) is important and useful, and we want to ensure that it is of high quality and free of security issues. Static analysis tools provide easy-to-use, application-independent mechanisms to assess various aspects of a given code base. Many effective open-source static analysis tools exist. In this paper, we perform the first comprehensive analysis using 24 open-source static analysis tools (through Omega Analyzer) on 4,947 repositories. Our study identified several interesting findings; for example, the distribution of errors in relation to the criticality score of repositories shows that repositories with a high criticality score have the highest percentage of errors. We envision that our findings provide insights into the effectiveness of static analysis tools on OSS and suggest future research directions for securing OSS repositories.
  2. Research efforts have tried to expose students to security topics early in the undergraduate CS curriculum. However, such efforts are rarely adopted in practice and remain less effective when it comes to writing secure code. In our prior work, we identified key issues with how students code and grouped them into six themes: (a) Knowledge of C, (b) Understanding compiler and OS messages, (c) Utilization of resources, (d) Knowledge of memory, (e) Awareness of unsafe functions, and (f) Understanding of security topics. In this work, we aim to understand students' knowledge of each theme and how that knowledge affects their secure coding practices. Thus, we propose a modified SOLO taxonomy for the latter five themes. We apply the taxonomy to coding interview data from 21 students at two US R1 universities. Our results suggest that most students have limited knowledge of each theme. We also show that scoring low in these themes correlates with why students fail to write secure code and to identify possible vulnerabilities.
  3. Existing malicious code detection techniques demand the integration of multiple tools to detect different malware patterns, often suffering from high misclassification rates. Therefore, malicious code detection techniques could be enhanced by adopting advanced, more automated approaches to achieve high accuracy and a low misclassification rate. The goal of this study is to aid security analysts in detecting malicious packages by empirically studying the effectiveness of Large Language Models (LLMs) in detecting malicious code. We present SocketAI, a malicious code review workflow to detect malicious code. To evaluate the effectiveness of SocketAI, we leverage a benchmark dataset of 5,115 npm packages, of which 2,180 packages contain malicious code. We conducted a baseline comparison of GPT-3 and GPT-4 models with the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious JavaScript code. We also compare the effectiveness of static analysis as a pre-screener within the SocketAI workflow, measuring the number of files that need to be analyzed and the associated costs (a sketch of this pre-screening idea appears after this list). Additionally, we performed a qualitative study to understand the types of malicious packages detected or missed by our workflow. Our baseline comparison demonstrates a 16% and 9% improvement over static analysis in precision and F1 scores, respectively. GPT-4 achieves higher accuracy, with 99% precision and a 97% F1 score, while GPT-3 offers a more cost-effective balance, at 91% precision and a 94% F1 score. Pre-screening files with a static analyzer reduces the number of files requiring LLM analysis by 77.9% and decreases costs by 60.9% for GPT-3 and 76.1% for GPT-4. Our qualitative analysis identified data theft, execution of arbitrary code, and suspicious domains as the top categories of detected malicious packages.
  4. Code quality is of universal concern among educators. Refactoring code, i.e., revising the structure of a program without changing its behavior, is one approach to improving code quality. Numerous software tools have been created to help students refactor the code they write. Only a few software tutors reported in the literature help students proactively learn code quality by solving refactoring problems, and they suffer from false-positive and false-negative grading issues because they allow freehand coding. We investigated whether refactoring tutors that do not allow freehand coding could be used to help students learn about non-trivial anti-patterns. We developed and deployed two software tutors for refactoring problems that are based on the principle of "refactoring without rewriting code" and cover the subset of refactoring problems that can be solved using only deletion, duplication, reordering, and token-wise editing of lines of code (an illustrative problem of this kind appears after this list). We investigated whether students needed to learn the anti-patterns covered by the tutors and whether they benefited from using the tutors. In this experience report, we start by describing the tutors: the list of refactoring concepts covered, the user interface, grading, feedback, and usage. We report our experience using the tutors over three semesters, which confirmed that both introductory and advanced students needed and benefited from using the tutors despite the limitations of the tutors' coverage. We reflect on what worked and what did not. The tutors currently cover C++, Java, and C#. They are available free for educational use on the web at auglets.org.
  5. Students, especially those outside the field of cybersecurity, are increasingly turning to Large Language Model (LLM)-based generative AI tools for coding assistance. These AI code generators provide valuable support to developers by generating code based on provided input and instructions. However, the quality and accuracy of the generated code can vary, depending on factors such as task complexity, the clarity of instructions, and the model's familiarity with the programming language. Additionally, generated code may inadvertently use vulnerable built-in functions, potentially leading to source code vulnerabilities and exploits (an example of such a vulnerability pattern appears after this list). This research undertakes an in-depth analysis and comparison of the code generation, code completion, and security suggestions offered by prominent AI models, including OpenAI Codex, CodeBERT, and ChatGPT. The research aims to evaluate the effectiveness and security aspects of these tools in terms of their code generation and code completion capabilities and their ability to enhance security. This analysis serves as a valuable resource for developers, enabling them to proactively avoid introducing security vulnerabilities into their projects. By doing so, developers can significantly reduce the need for extensive revisions and resource allocation, whether in the short or long term.
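For item 3 above, the cost-saving idea of pre-screening with static analysis so that only suspicious files reach the LLM can be sketched as follows; every name here (looksSuspicious, classifyWithLlm) is a hypothetical stand-in, not SocketAI's actual API:

import java.util.ArrayList;
import java.util.List;

// Hypothetical two-stage triage pipeline: a cheap static pre-screener
// filters files so only flagged ones incur the per-file LLM cost.
public class TriagePipeline {

    // Stand-in for a static analysis pass (e.g., rule-based pattern checks).
    static boolean looksSuspicious(String source) {
        return source.contains("child_process") || source.contains("eval(");
    }

    // Stand-in for an expensive LLM classification call.
    static boolean classifyWithLlm(String source) {
        return source.contains("eval(");
    }

    public static List<String> triage(List<String> names, List<String> sources) {
        List<String> flagged = new ArrayList<>();
        for (int i = 0; i < sources.size(); i++) {
            // Stage 1: skip files the pre-screener deems benign, avoiding
            // the LLM cost for the large majority of files.
            if (!looksSuspicious(sources.get(i))) continue;
            // Stage 2: LLM review only for pre-screened files.
            if (classifyWithLlm(sources.get(i))) {
                flagged.add(names.get(i));
            }
        }
        return flagged;
    }
}

The reported 60.9% to 76.1% cost reductions come precisely from stage 1 discarding most files before stage 2 runs.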
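For item 4 above, a problem in the "refactoring without rewriting code" style might look like the following Java sketch (made up for illustration, not one of the tutors' actual exercises): the duplicated lines in both branches can be unified purely by deleting and reordering existing lines, with no freehand coding:

public class Discount {

    // Before: both branches duplicate the logging and the return.
    static double priceBefore(double base, boolean member) {
        double total;
        if (member) {
            total = base * 0.9;
            System.out.println("total=" + total);
            return total;
        } else {
            total = base;
            System.out.println("total=" + total);
            return total;
        }
    }

    // After: the duplicated lines are moved below the conditional and the
    // copies deleted; only reordering and deletion of existing lines occurred.
    static double priceAfter(double base, boolean member) {
        double total;
        if (member) {
            total = base * 0.9;
        } else {
            total = base;
        }
        System.out.println("total=" + total);
        return total;
    }
}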
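For item 5 above, a classic example of the vulnerability class that generated code can introduce is SQL built by string concatenation; the snippet below (hypothetical schema, standard JDBC API) contrasts it with a parameterized query:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UserLookup {

    // Vulnerable: input like  x' OR '1'='1  rewrites the query logic.
    static ResultSet findVulnerable(Connection conn, String name) throws SQLException {
        Statement stmt = conn.createStatement();
        return stmt.executeQuery("SELECT * FROM users WHERE name = '" + name + "'");
    }

    // Safer: the driver binds the parameter, so input cannot alter the SQL.
    static ResultSet findSafe(Connection conn, String name) throws SQLException {
        PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
        ps.setString(1, name);
        return ps.executeQuery();
    }
}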