NSF PAR Search | NSF Public Access Repository

“I see models being a whole other thing”: an empirical study of pre-trained model naming conventions and a tool for enhancing naming consistency

https://doi.org/10.1007/s10664-025-10711-4

Jiang, Wenxin; Kim, Mingyu; Cheung, Chingwo; Kim, Heesoo; Thiruvathukal, George K; Davis, James C (August 2025, Empirical Software Engineering)

Abstract: As innovation in deep learning continues, many engineers are incorporating Pre-Trained Models (PTMs) as components in computer systems. Some PTMs are foundation models, and others are fine-tuned variations adapted to different needs. When these PTMs are named well, it facilitates model discovery and reuse. However, prior research has shown that model names are not always well chosen and can sometimes be inaccurate and misleading. The naming practices for PTM packages have not been systematically studied, which hampers engineers’ ability to efficiently search for and reliably reuse these models. In this paper, we conduct the first empirical investigation of PTM naming practices in the Hugging Face PTM registry. We begin by reporting on a survey of 108 Hugging Face users, highlighting differences from traditional software package naming and presenting findings on PTM naming practices. The survey results indicate a mismatch between engineers’ preferences and current practices in PTM naming. We then introduce DARA, the first automated DNN ARchitecture Assessment technique designed to detect PTM naming inconsistencies. Our results demonstrate that architectural information alone is sufficient to detect these inconsistencies, achieving an accuracy of 94% in identifying model types and promising performance (over 70%) in other architectural metadata as well. We also highlight potential use cases for automated naming tools, such as model validation, PTM metadata generation and verification, and plagiarism detection. Our study provides a foundation for automating naming inconsistency detection. Finally, we envision future work focusing on automated tools for standardizing package naming, improving model selection and reuse, and strengthening the security of the PTM supply chain.

“The main idea is to treat a program as a piece of literature, addressed to human beings rather than to a computer” (D. Knuth)
ConfuGuard: Using Metadata to Detect Active and Stealthy Package Confusion Attacks Accurately and at Scale

Jiang, Wenxin; Çakar, Berk; Lysenko, Mikola; Davis, James C (August 2025, International Conference on Software Engineering (ICSE) 2026)

Package confusion attacks such as typosquatting threaten software supply chains. Attackers make packages with names that syntactically or semantically resemble legitimate ones, tricking engineers into installing malware. While prior work has developed defenses against package confusions in some software package registries, notably NPM, PyPI, and RubyGems, gaps remain: high false-positive rates, generalization to more software package ecosystems, and insights from real-world deployment. In this work, we introduce ConfuGuard, a state-of-the-art detector for package confusion threats. We begin by presenting the first empirical analysis of benign signals derived from prior package confusion data, uncovering their threat patterns, engineering practices, and measurable attributes. Advancing existing detectors, we leverage package metadata to distinguish benign packages, and extend support from three up to seven software package registries. Our approach significantly reduces false positive rates (from 80% to 28%), at the cost of an additional 14s average latency to filter out benign packages by analyzing the package metadata. ConfuGuard is used in production at our industry partner, whose analysts have already confirmed 630 real attacks detected by ConfuGuard.
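
To make "names that syntactically resemble legitimate ones" concrete, here is a minimal sketch of a name-similarity check based on edit distance. It is not ConfuGuard's method: it ignores semantic similarity and the package-metadata filtering the abstract describes. The function names, the list of popular packages, and the distance threshold of 2 are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def flag_confusable(candidate: str, popular: list[str], max_distance: int = 2) -> list[str]:
    """Return popular names within max_distance edits of candidate.

    Exact (case-insensitive) matches are skipped: that is the package itself.
    The threshold is a hypothetical default, not a value from the paper.
    """
    name = candidate.lower()
    return [p for p in popular
            if p.lower() != name and levenshtein(name, p.lower()) <= max_distance]


if __name__ == "__main__":
    # Hypothetical list of well-known package names.
    print(flag_confusable("reqeusts", ["requests", "numpy"]))
```

A check like this flags many benign packages on its own, which is the false-positive problem the abstract says metadata analysis is meant to reduce.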
Free, publicly-accessible full text available August 1, 2026
$ZTD_{\text{JAVA}}$: Mitigating Software Supply Chain Vulnerabilities via Zero-Trust Dependencies

https://doi.org/10.1109/ICSE55347.2025.00148

Amusuo, Paschal C; Robinson, Kyle A; Singla, Tanmay; Peng, Huiyun; Machiry, Aravind; Torres-Arias, Santiago; Simon, Laurent; Davis, James C (April 2025, IEEE)

Free, publicly-accessible full text available April 26, 2026
Signing in Four Public Software Package Registries: Quantity, Quality, and Influencing Factors

https://doi.org/10.1109/SP54263.2024.00215

Schorlemmer, Taylor R; Kalu, Kelechi G; Chigges, Luke; Ko, Kyung Myung; Ishgair, Eman Abu; Bagchi, Saurabh; Torres-Arias, Santiago; Davis, James C (May 2024, Proceedings of the IEEE Symposium on Security and Privacy)

Full Text Available
Reusing Deep Learning Models: Challenges and Directions in Software Engineering

https://doi.org/10.1109/JVA60410.2023.00015

Davis, James C; Jajal, Purvish; Jiang, Wenxin; Schorlemmer, Taylor R; Synovic, Nicholas; Thiruvathukal, George K (July 2023, IEEE)

Full Text Available
An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry

https://doi.org/10.1109/ICSE48619.2023.00206

Jiang, Wenxin; Synovic, Nicholas; Hyatt, Matt; Schorlemmer, Taylor R; Sethi, Rohan; Lu, Yung-Hsiang; Thiruvathukal, George K; Davis, James C (May 2023, IEEE)
PTMTorrent: A Dataset for Mining Open-source Pre-trained Model Packages

https://doi.org/10.1109/MSR59073.2023.00021

Jiang, Wenxin; Synovic, Nicholas; Jajal, Purvish; Schorlemmer, Taylor R; Tewari, Arav; Pareek, Bhavesh; Thiruvathukal, George K; Davis, James C (May 2023, IEEE)
Speranza: Usable, privacy-friendly software signing

Merrill, Kelsey; Newman, Zachary (November 2022, ACM)

Software repositories, used for wide-scale open software distribution, are a significant vector for security attacks. Software signing provides authenticity, mitigating many such attacks. Developer-managed signing keys pose usability challenges, but certificate-based systems introduce privacy problems. This work, Speranza, uses certificates to verify software authenticity but still provides anonymity to signers using zero-knowledge identity co-commitments. In Speranza, a signer uses an automated certificate authority (CA) to create a private identity-bound signature and proof of authorization. Verifiers check that a signer was authorized to publish a package without learning the signer’s identity. The package repository privately records each package’s authorized signers, but publishes only commitments to identities in a public map. Then, when issuing certificates, the CA issues the certificate to a distinct commitment to the same identity. The signer then creates a zero-knowledge proof that these are identity co-commitments. We implemented a proof-of-concept for Speranza. We find that costs to maintainers (signing) and end users (verifying) are small (sub-millisecond), even for a repository with millions of packages. Techniques inspired by recent key transparency systems reduce the bandwidth for serving authorization policies to 2 KiB. Server costs in this system are negligible. Our evaluation finds that Speranza is practical on the scale of the largest software repositories. We also emphasize practicality and deployability in this project. By building on existing technology and employing relatively simple and well-established cryptographic techniques, Speranza can be deployed for wide-scale use with only a few hundred lines of code and minimal changes to existing infrastructure. Speranza is a practical way to bring privacy and authenticity together for more trustworthy open-source software.
An Empirical Study of Artifacts and Security Risks in the Pre-trained Model Supply Chain

https://doi.org/10.1145/3560835.3564547

Jiang, Wenxin; Synovic, Nicholas; Sethi, Rohan; Indarapu, Aryan; Hyatt, Matt; Schorlemmer, Taylor R.; Thiruvathukal, George K.; Davis, James C. (November 2022, Proceedings of the 1st ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED))

Deep neural networks achieve state-of-the-art performance on many tasks, but require increasingly complex architectures and costly training procedures. Engineers can reduce costs by reusing a pre-trained model (PTM) and fine-tuning it for their own tasks. To facilitate software reuse, engineers collaborate around model hubs, collections of PTMs and datasets organized by problem domain. Although model hubs are now comparable in popularity and size to other software ecosystems, the associated PTM supply chain has not yet been examined from a software engineering perspective. We present an empirical study of artifacts and security features in 8 model hubs. We indicate the potential threat models and show that the existing defenses are insufficient for ensuring the security of PTMs. We compare PTM and traditional supply chains, and propose directions for further measurements and tools to increase the reliability of the PTM supply chain.
Full Text Available
SoK: Analysis of Software Supply Chain Security by Establishing Secure Design Properties

https://doi.org/10.1145/3560835.3564556

Okafor, Chinenye; Schorlemmer, Taylor R.; Torres-Arias, Santiago; Davis, James C. (November 2022, Proceedings of the 1st ACM Workshop on Software Supply Chain Offensive Research and Ecosystem Defenses (SCORED))

This paper systematizes knowledge about secure software supply chain patterns. It identifies four stages of a software supply chain attack and proposes three security properties crucial for a secured supply chain: transparency, validity, and separation. The paper describes current security approaches and maps them to the proposed security properties, including research ideas and case studies of supply chains in practice. It discusses the strengths and weaknesses of current approaches relative to known attacks and details the various security frameworks put out to ensure the security of the software supply chain. Finally, the paper highlights potential gaps in actor and operation-centered supply chain security techniques.
Full Text Available
