Existing malicious code detection techniques demand the integration of multiple tools to detect different malware patterns and often suffer from high misclassification rates. Malicious code detection could therefore be enhanced by adopting advanced, more automated approaches that achieve high accuracy and a low misclassification rate. The goal of this study is to aid security analysts in detecting malicious packages by empirically studying the effectiveness of Large Language Models (LLMs) in detecting malicious code. We present SocketAI, an automated code review workflow for detecting malicious code. To evaluate the effectiveness of SocketAI, we leverage a benchmark dataset of 5,115 npm packages, of which 2,180 contain malicious code. We conducted a baseline comparison of GPT-3 and GPT-4 models with the state-of-the-art CodeQL static analysis tool, using 39 custom CodeQL rules developed in prior research to detect malicious JavaScript code. We also compare the effectiveness of static analysis as a pre-screener for the SocketAI workflow, measuring the number of files that need to be analyzed and the associated costs. Additionally, we performed a qualitative study to understand the types of malicious packages detected or missed by our workflow. Our baseline comparison demonstrates a 16% and 9% improvement over static analysis in precision and F1 scores, respectively. GPT-4 achieves higher accuracy, with a 99% precision and 97% F1 score, while GPT-3 offers a more cost-effective balance, at 91% precision and a 94% F1 score. Pre-screening files with a static analyzer reduces the number of files requiring LLM analysis by 77.9% and decreases costs by 60.9% for GPT-3 and 76.1% for GPT-4. Our qualitative analysis identified data theft, arbitrary code execution, and suspicious domains as the top categories of detected malicious packages.

Free, publicly-accessible full text available April 28, 2026.
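Since the reported cost savings hinge on gating the expensive LLM step behind a cheap static pre-screen, the shape of such a pipeline is worth sketching. Below is a minimal, hedged sketch of a two-stage triage loop; the suspicious-token list, the file glob, and the `ask_llm` stub are illustrative assumptions, not SocketAI's actual implementation.

```python
# Hedged sketch of a static-prescreen + LLM triage loop (illustrative only).
from pathlib import Path

SUSPICIOUS_TOKENS = ("child_process", "eval(", "Function(", "process.env")

def static_prescreen(source: str) -> bool:
    """Stand-in for a CodeQL-style pre-screener: flag files that touch
    process spawning, dynamic evaluation, or environment credentials."""
    return any(token in source for token in SUSPICIOUS_TOKENS)

def ask_llm(source: str) -> bool:
    """Stub for the LLM verdict; in a real pipeline this would prompt
    GPT-3/GPT-4 to classify the snippet as malicious or benign."""
    return "curl" in source and "| sh" in source  # toy placeholder heuristic

def triage(package_dir: str) -> list[Path]:
    """Only files passing the cheap pre-screen reach the costly LLM call."""
    flagged = []
    for path in Path(package_dir).rglob("*.js"):
        source = path.read_text(errors="ignore")
        if static_prescreen(source) and ask_llm(source):
            flagged.append(path)
    return flagged
```

The design point is economic rather than algorithmic: every file the pre-screen rejects is an LLM call (and its token cost) avoided, which is how cost reductions like the 60.9% to 76.1% figures above arise.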
-
GitGuardian monitored secrets exposure in public GitHub repositories and reported that developers leaked over 12 million secrets (database and other credentials) in 2023, a 113% surge from 2021. Despite the availability of secret detection tools, developers ignore the tools' reported warnings because of false positives (25%-99%). However, each secret protects assets of different values, accessible through asset identifiers (e.g., a DNS name or a public or private IP address). The asset information for a secret can aid developers in filtering false positives and prioritizing secret removal from the source code. Existing secret detection tools, however, do not provide this asset information, leaving developers to filter secrets by looking only at the secret value or to find the assets manually for each reported secret. The goal of our study is to aid software practitioners in prioritizing secret removal by providing the asset information protected by each secret through our novel static analysis tool. We present AssetHarvester, a static analysis tool for detecting secret-asset pairs in a repository. Since the location of an asset can be distant from where the secret is defined, we investigated secret-asset co-location patterns and found four such patterns. To identify secret-asset pairs of the four patterns, we utilized three approaches: pattern matching, data flow analysis, and fast-approximation heuristics. We curated a benchmark of 1,791 secret-asset pairs of four database types, extracted from 188 public GitHub repositories, to evaluate the performance of AssetHarvester. AssetHarvester demonstrates 97% precision, 90% recall, and a 94% F1-score in detecting secret-asset pairs. Our findings indicate that the data flow analysis employed in AssetHarvester detects secret-asset pairs with 0% false positives and aids in improving the recall of secret detection tools. Additionally, AssetHarvester shows a 43% increase in precision for database secret detection over existing detection tools through the detection of assets, thus reducing developers' alert fatigue.

Free, publicly-accessible full text available April 26, 2026.
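Of the three approaches named above, pattern matching is the simplest to illustrate: in the most favorable co-location pattern, the secret and its asset appear together inside a single database connection URL. The sketch below is a hedged illustration of that case; the regular expression and output fields are assumptions for exposition, not AssetHarvester's actual rules.

```python
# Hedged sketch: extract (secret, asset) pairs from connection-string
# literals, the simplest secret-asset co-location pattern (illustrative).
import re

CONN_URL = re.compile(
    r"(?P<scheme>postgres|mysql|mongodb)://"
    r"(?P<user>[^:/@\s]+):(?P<secret>[^@\s]+)@"
    r"(?P<host>[^:/\s]+)(?::(?P<port>\d+))?"
)

def extract_secret_asset_pairs(source: str) -> list[dict]:
    """Each match yields the credential and the asset identifier it protects."""
    return [
        {"db": m["scheme"], "secret": m["secret"],
         "asset": f"{m['host']}:{m['port'] or 'default'}"}
        for m in CONN_URL.finditer(source)
    ]

print(extract_secret_asset_pairs(
    'DATABASE_URL = "postgres://app:hunter2@db.internal.example:5432/prod"'
))
# [{'db': 'postgres', 'secret': 'hunter2', 'asset': 'db.internal.example:5432'}]
```

The harder patterns, where the secret and host are defined far apart, are what motivate the data flow analysis the abstract credits with 0% false positives.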
-
Reusable software libraries, frameworks, and components, such as those provided by open source ecosystems and third-party suppliers, accelerate digital innovation. However, recent years have shown almost exponential growth in attackers leveraging these software artifacts to launch software supply chain attacks. Well-known past software supply chain attacks include the SolarWinds, Log4j, and XZ Utils incidents. Supply chain attacks are considered to have three major attack vectors: through vulnerabilities and malware accidentally or intentionally injected into open source and third-party dependencies/components/containers; by infiltrating the build infrastructure during the build and deployment processes; and through targeted techniques aimed at the humans involved in software development, such as social engineering. Plummeting trust in the software supply chain could decelerate digital innovation if the software industry reduces its use of open source and third-party artifacts to reduce risks. This article contains perspectives and knowledge obtained from intentional outreach with practitioners to understand their practical challenges and from extensive research efforts. We then provide an overview of current research efforts to secure the software supply chain. Finally, we propose a future research agenda to close software supply chain attack vectors and support the software industry.

Free, publicly-accessible full text available June 30, 2026.
-
The prevalent use of third-party components in modern software development, coupled with rapid modernization and digitization, has significantly amplified the risk of software supply chain attacks. Large, popular registries like npm and PyPI are heavily targeted malware distribution channels because of the rapid growth of, and dependence on, third-party components. Industry and academia are working toward building tools to detect malware in the software supply chain. However, the lack of benchmark datasets containing both malicious and neutral packages hampers the evaluation of these malware detection tools. The goal of our study is to aid researchers and tool developers in evaluating and improving malware detection tools by contributing a benchmark dataset built by systematically collecting malicious and neutral packages from the npm and PyPI ecosystems. We present MalwareBench, a labeled dataset of 20,534 packages (of which 6,475 are malicious) from the npm and PyPI ecosystems. We constructed the benchmark dataset by combining pre-existing malware datasets with Socket's internal benchmark data and including popular and newly released npm and PyPI packages. The ground-truth labels of these packages were determined using the Socket AI Scanner and manual inspection.
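As a rough illustration of the merge-and-label step such a benchmark implies, the sketch below combines package records from several sources, deduplicates on (ecosystem, name, version), and attaches a ground-truth label. The field names and the labeling callback are assumptions for exposition, not MalwareBench's actual pipeline.

```python
# Hedged sketch of benchmark construction: merge sources, deduplicate,
# and attach a ground-truth label (illustrative schema).
def build_benchmark(sources: list[list[dict]], label) -> list[dict]:
    """sources: lists of {'ecosystem', 'name', 'version', ...} records.
    label: callable returning True for malicious packages (e.g., a
    scanner verdict confirmed by manual inspection)."""
    merged: dict[tuple, dict] = {}
    for records in sources:
        for pkg in records:
            key = (pkg["ecosystem"], pkg["name"], pkg["version"])
            merged.setdefault(key, pkg)  # first occurrence wins on duplicates
    return [{**pkg, "is_malicious": label(pkg)} for pkg in merged.values()]
```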
-
Supply chain security has become a very important vector to consider when defending against adversary attacks. Because of this, more and more developers are keen to improve their supply chains and make them more robust against future threats. On March 7th, 2024, researchers from the Secure Software Supply Chain Center (S3C2) gathered 14 industry leaders, developers, and consumers of the open source ecosystem to discuss the state of supply chain security. The goal of the summit was to share insights between companies and developers alike to foster new collaborations and ideas moving forward. During the meeting, participants were asked questions about best practices and their thoughts on how to improve things for the future. In this paper, we summarize the responses and discussions of the summit.
-
According to GitGuardian’s monitoring of public GitHub repositories, secrets sprawl continued to accelerate in 2022, growing 67% over 2021 and exposing over 10 million secrets (API keys and other credentials). Though many open-source and proprietary secret detection tools are available, these tools output many false positives, making it difficult for developers to act on the warnings and for teams to choose one tool out of many. To our knowledge, these secret detection tools have not yet been systematically compared and evaluated. Aims: The goal of our study is to aid developers in choosing a secret detection tool to reduce the exposure of secrets through an empirical investigation of existing secret detection tools. Method: We present an evaluation of five open-source and four proprietary tools against a benchmark dataset. Results: The top three tools by precision are GitHub Secret Scanner (75%), Gitleaks (46%), and Commercial X (25%); by recall, they are Gitleaks (88%), SpectralOps (67%), and TruffleHog (52%). Our manual analysis of reported secrets reveals that false positives are due to generic regular expressions and ineffective entropy calculation, whereas false negatives are due to faulty regular expressions, skipped file types, and insufficient rulesets. Conclusions: We recommend that developers choose tools based on the secret types present in their projects to avoid missed secrets. In addition, we recommend that tool vendors update detection rules periodically and correctly employ secret verification mechanisms, collaborating with API vendors to improve accuracy.
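The diagnosis that false positives stem from generic regular expressions and ineffective entropy calculation is easy to reproduce. The sketch below uses a naive per-character Shannon-entropy check (the 3.5-bit cutoff is an illustrative value, not any tool's actual setting) and shows it firing on an ordinary identifier just as readily as on a key-shaped token.

```python
# Why naive entropy checks misfire: an ordinary snake_case identifier can
# carry as much per-character entropy as a real key (illustrative sketch).
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Average bits of entropy per character of s."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

def naive_secret_check(token: str, threshold: float = 3.5) -> bool:
    return len(token) >= 20 and shannon_entropy(token) >= threshold

print(naive_secret_check("AKIAIOSFODNN7EXAMPLE"))      # True (~3.68 bits): key-shaped
print(naive_secret_check("configuration_manager_v2"))  # True (~3.77 bits): false positive
```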
-
Due to the ever-increasing number of security breaches, practitioners are motivated to produce more secure software. In the United States, the White House released a memorandum on Executive Order (EO) 14028 that mandates organizations provide self-attestation of the use of secure software development practices. The OpenSSF Scorecard project allows practitioners to measure the use of software security practices automatically. However, little research has been done to determine whether the use of these practices improves package security, and particularly which practices have the biggest impact on security outcomes. The goal of this study is to assist practitioners and researchers in making informed decisions on which security practices to adopt by developing models that relate software security practice scores to security vulnerability counts. To that end, we developed five supervised machine learning models for npm and PyPI packages, using the OpenSSF Scorecard security practice scores and aggregate security scores as predictors and the number of externally reported vulnerabilities as the target variable. Our models found that four security practices (Maintained, Code Review, Branch Protection, and Security Policy) were the most important practices influencing vulnerability count. However, we observed low R² values (ranging from 9% to 12%) when we tested the models' ability to predict vulnerability counts. Additionally, we observed that the number of reported vulnerabilities increased, rather than decreased, as the aggregate security score of a package increased. Both findings indicate that additional factors may influence package vulnerability counts. Factors such as the scarcity of vulnerability data, the lag between implementing security practices and detecting vulnerabilities, and the need for more adequate scripts to detect security practices may prevent data-driven studies from showing that these practices reduce externally reported vulnerabilities. We suggest that vulnerability count and security score data be refined so that these measures can provide actionable guidance on security practices.
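For readers wanting the modeling setup in concrete terms, the sketch below is a hedged rendering of one such supervised model: Scorecard practice scores in, vulnerability counts out. The CSV name, column names, and the choice of a random forest are assumptions for exposition; the study built five models.

```python
# Hedged sketch of the study's modeling setup (illustrative data layout).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

PRACTICES = ["Maintained", "Code-Review", "Branch-Protection", "Security-Policy"]

# Hypothetical export: one row per package, practice scores plus vuln_count.
df = pd.read_csv("scorecard_scores.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df[PRACTICES], df["vuln_count"], test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("R^2:", r2_score(y_test, model.predict(X_test)))     # study reports 9%-12%
print(dict(zip(PRACTICES, model.feature_importances_)))    # per-practice importance
```

A low R² here would echo the study's central caveat: practice scores alone explain little of the variance in externally reported vulnerability counts.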
-
According to GitGuardian’s monitoring of public GitHub repositories, the exposure of secrets (API keys and other credentials) increased two-fold in 2021 compared to 2020, totaling more than six million secrets. However, no benchmark dataset is publicly available for researchers and tool developers to evaluate secret detection tools, which produce many false-positive warnings. The goal of our paper is to aid researchers and tool developers in evaluating and improving secret detection tools by curating a benchmark dataset of secrets through a systematic collection of secrets from open-source repositories. We present a labeled dataset of source code files containing 97,479 candidate secrets (of which 15,084 are true secrets) of various secret types, extracted from 818 public GitHub repositories. The dataset covers 49 programming languages and 311 file types.
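A benchmark like this is useful precisely because tool findings can be scored against labeled ground truth. The sketch below shows one hedged way to do that; the (file, line) keying and record fields are assumptions for exposition, not the dataset's actual schema.

```python
# Hedged sketch: score a secret detection tool against labeled ground truth.
def score(reported: set[tuple], truth: dict[tuple, bool]) -> dict[str, float]:
    """reported: (file, line) findings; truth: (file, line) -> is_true_secret."""
    tp = sum(1 for finding in reported if truth.get(finding, False))
    fp = len(reported) - tp
    fn = sum(1 for finding, is_true in truth.items()
             if is_true and finding not in reported)
    precision = tp / (tp + fp) if reported else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

truth = {("app.py", 10): True, ("app.py", 42): False, ("db.py", 7): True}
print(score({("app.py", 10), ("app.py", 42)}, truth))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```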