Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Software Composition Analysis (SCA) is an important part in the software security lifecycle. Establishing the individual software components and versions that make up an application allows for identifying and remediating vulnerabilities. However, SCA tools have not kept up with the ever growing number of new vulnerabilities each year. Developers are flooded with vulnerability alerts and often struggle to quickly remediate critical issues with external components. We conducted 20 interviews with developers to investigate their processes and challenges around using SCA in their software projects. Interviews covered how SCA tools are integrated into workflows, how reports are interpreted and acted upon, and what challenges were encountered. We find that SCA tools are most often integrated into build pipelines and that users report that information in SCA alerts is too generic and lack context, specifically context on infrastructure, network configurations, reachability, and exploitability. Based on our findings we conclude that context matters throughout the SCA process, including for evaluating impact, when to trigger SCA scan runners, and how to integrate and communicate tool findings.more » « lessFree, publicly-accessible full text available August 13, 2026
-
Reusable software libraries, frameworks, and components, such as those provided by open source ecosystems and third-party suppliers, accelerate digital innovation. However, recent years have shown almost exponential growth in attackers leveraging these software artifacts to launch software supply chain attacks. Past well-known software supply chain attacks include the SolarWinds, log4j, and xz utils incidents. Supply chain attacks are considered to have three major attack vectors: through vulnerabilities and malware accidentally or intentionally injected into open source and third-partydependencies/components/containers; by infiltrating thebuild infrastructureduring the build and deployment processes; and through targeted techniques aimed at thehumansinvolved in software development, such as through social engineering. Plummeting trust in the software supply chain could decelerate digital innovation if the software industry reduces its use of open source and third-party artifacts to reduce risks. This article contains perspectives and knowledge obtained from intentional outreach with practitioners to understand their practical challenges and from extensive research efforts. We then provide an overview of current research efforts to secure the software supply chain. Finally, we propose a future research agenda to close software supply chain attack vectors and support the software industry.more » « lessFree, publicly-accessible full text available June 30, 2026
-
Existing malicious code detection techniques demand the integration of multiple tools to detect different malware patterns, often suffering from high misclassification rates. Therefore, malicious code detection techniques could be enhanced by adopting advanced, more automated approaches to achieve high accuracy and a low misclassification rate. The goal of this study is to aid security analysts in detecting malicious packages by empirically studying the effectiveness of Large Language Models (LLMs) in detecting malicious code. We present SocketAI, a malicious code review workflow to detect malicious code. To evaluate the effectiveness SocketAI, we leverage a benchmark dataset of 5,115 npm packages, of which 2,180 packages have malicious code. We conducted a baseline comparison of GPT-3 and GPT-4 models with the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious Javascript code. We also compare the effectiveness of static analysis as a pre-screener with SocketAI workflow, measuring the number of files that need to be analyzed and the associated costs. Additionally, we performed a qualitative study to understand the types of malicious packages detected or missed by our workflow. Our baseline comparison demonstrates a 16% and 9% improvement over static analysis in precision and F1 scores, respectively. GPT-4 achieves higher accuracy with 99% precision and 97% F1 scores, while GPT-3 offers a more cost-effective balance at 91% precision and 94% F1 scores. Prescreening files with a static analyzer reduces the number of files requiring LLM analysis by 77.9% and decreases costs by 60.9% for GPT-3 and 76.1% for GPT-4. Our qualitative analysis identified data theft, execution of arbitrary code, and suspicious domain categories as the top detected malicious packages.more » « lessFree, publicly-accessible full text available April 28, 2026
-
The integrity of software builds is fundamental to the security of the software supply chain. While Thompson first raised the potential for attacks on build infrastructure in 1984, limited attention has been given to build integrity in the past 40 years, enabling recent attacks on SolarWinds, event-stream, and xz. The best-known defense against build system attacks is creating reproducible builds; however, achieving them can be complex for both technical and social reasons and thus is often viewed as impractical to obtain. In this paper, we analyze reproducibility of builds in a novel context: reusable components distributed as packages in six popular software ecosystems (npm, Maven, PyPI, Go, RubyGems, and Cargo). Our quantitative study on a representative sample of 4000 packages in each ecosystem raises concerns: Rates of reproducible builds vary widely between ecosystems, with some ecosystems having all packages reproducible whereas others have reproducibility issues in nearly every package. However, upon deeper investigation, we identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.more » « lessFree, publicly-accessible full text available April 26, 2026
-
Critical open-source projects form the basis of many large software systems. They provide trusted and extensible implementations of important functionality for cryptography, compatibility, and security. Verifying commit authorship authenticity in open-source projects is essential and challenging. Git users can freely configure author details such as names and email addresses. Platforms like GitHub use such information to generate profile links to user accounts. We demonstrate three attack scenarios malicious actors can use to manipulate projects and profiles on GitHub to appear trustworthy. We designed a mixed-research study to assess the effect on critical open-source software projects and evaluated countermeasures. First, we conducted a large-scale measurement among 50,328 critical open-source projects on GitHub and demonstrated that contribution workflows can be abused in 85.9% of the projects. We identified 573,043 email addresses that a malicious actor can claim to hijack historic contributions and improve the trustworthiness of their accounts. When looking at commit signing as a countermeasure, we found that the majority of users (95.4%) never signed a commit, and for the majority of projects (72.1%), no commit was ever signed. In contrast, only 2.0% of the users signed all their commits, and for 0.2% of the projects all commits were signed. Commit signing is not associated with projects’ programming languages, topics, or other security measures. Second, we analyzed online security advice to explore the awareness of contributor spoofing and identify recommended countermeasures. Most documents exhibit awareness of the simple spoofing technique via Git commits but no awareness of problems with GitHub’s handling of email addresses.more » « lessFree, publicly-accessible full text available January 1, 2026
-
Software Supply Chain Security (SSC) involves numerous stakeholders, processes and tools that work together to deliver a software product. A vulnerability in one element can cascade through the entire system and potentially affect thousands of dependents and millions of end users. Despite the SSC’s importance and the increasing awareness around its security, existing research mainly focuses on either technical aspects, exploring various attack vectors and their mitigation, or it empirically studies developers’ challenges, but mainly within the open source context. To better develop supportive tooling and education, we need to understand how developers consider and mitigate supply chain security challenges. We conducted 18 semi-structured interviews with experienced developers actively working in industry to gather in-depth insights into their experiences, encountered challenges, and the effectiveness of various strategies to secure the SSC. We find that the developers are generally interested in securing the supply chain, but encounter many obstacles in implementing effective security measures, both specific to SSC security and for general security. Developers also mention a wide set of approaches and methods to secure their projects, but mostly report general secure software engineering methodologies and seem to be mostly unaware of SSC specific threats and mitigations.more » « lessFree, publicly-accessible full text available November 19, 2025
-
Security & Privacy (S&P) software is created to have positive impacts on people: to protect them from surveillance and attacks, enhance their privacy, and keep them safe. Despite these positive intentions, S&P software can have unintended consequences, such as enabling and protecting criminals, misleading people into using the software with a false sense of security, and being inaccessible to users without strong technical backgrounds or with specific accessibility needs. In this study, through 14 semi-structured expert interviews with S&P software creators, we explore whether and how S&P software creators foresee and mitigate unintended consequences. We find that unintended consequences are often overlooked and ignored. When addressed, they are done in unstructured ways—often ad hoc and just based on user feedback—thereby shifting the burden to users. To reduce this burden on users and more effectively create positive change, we recommend S&P software creators to proactively consider and mitigate unintended consequences through increasing awareness and education, promoting accountability at the organizational level to mitigate issues, and using systematic toolkits for anticipating impacts.more » « less
-
As the reliance on open-source software dependencies increases, managing the security vulnerabilities in these dependencies becomes complex. State-of-the-art industry tools use reachability analysis of code to alert developers when security vulnerabilities in dependencies are likely to impact their projects. These tools heavily rely on precisely identifying the location of the vulnerability within the dependency, specifically vulnerable functions. However, the process of identifying vulnerable functions is currently either manual or uses a naive automated approach that falsely assumes all changed functions in a security patch link are vulnerable. In this paper, we explore using open-source large language models (LLMs) to improve pairing security advisories with vulnerable functions. We explore various prompting strategies, learning paradigms (i.e., zero-shot vs. few-shot), and show our approach generalizes to other open-source LLMs. Compared to the naive automated approach, we show a 173% increase in precision while only having an 18% decrease in recall. The significant increase in precision to enhance vulnerable function identification lays the groundwork for downstream techniques that depend on this critical information for security analysis and threat mitigation.more » « less
-
Security advisories are the primary channel of communication for discovered vulnerabilities in open-source software, but they often lack crucial information. Specifically, 63% of vulnerability database reports are missing their patch links, also referred to as vulnerability fixing commits (VFCs). This paper introduces VFCFinder, a tool that generates the top-five ranked set of VFCs for a given security advisory using Natural Language Programming Language (NL-PL) models. VFCFinder yields a 96.6% recall for finding the correct VFC within the Top-5 commits, and an 80.0% recall for the Top-1 ranked commit. VFCFinder generalizes to nine different programming languages and outperforms state-of-the-art approaches by 36 percentage points in terms of Top-1 recall. As a practical contribution, we used VFCFinder to backfill over 300 missing VFCs in the GitHub Security Advisory (GHSA) database. All of the VFCs were accepted and merged into the GHSA database. In addition to demonstrating a practical pairing of security advisories to VFCs, our general open-source implementation will allow vulnerability database maintainers to drastically improve data quality, supporting efforts to secure the software supply chain.more » « less
-
Existing research has primarily delved into the realm of computer science outreach aimed at K-12 students, with a focus on both informal and non-formal approaches. However, a noticeable research gap exists when it comes to cybersecurity outreach tailored specifically for underserved secondary school students. This article addresses this void by presenting an iterative pilot of a cybersecurity curriculum. This innovative curriculum integrates a one-week summer camp and a series of 1.5-hour workshops designed to provide students with a comprehensive understanding of cybersecurity. The overarching goal of this approach is to foster wider participation in the field of computing, particularly in the realm of cybersecurity. This research aims to spark interest among students who may currently face limited access to computing resources. The cybersecurity lessons featured in this curriculum adhere to the standards set by Cyber.org, an organization supported by the Cybersecurity and Infrastructure Agency (CISA). Key topics covered include networking, the confidentiality, integrity, and availability (CIA) triad, and operating system security. This paper not only outlines the process of creating and implementing these cybersecurity lessons but also emphasizes the iterative refinement process they underwent. The discussion primarily revolves around the valuable insights gained from implementing this curriculum at two prominent public universities in the eastern United States. By bridging the research gap and focusing on practical applications, this initiative contributes significantly to the broader discourse on cybersecurity education for underserved secondary school students.more » « less
An official website of the United States government

Full Text Available