skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Sigstore: Software Signing for Everybody
Software supply chain compromises are on the rise. From the effects of XCodeGhost to SolarWinds, hackers have identified that targeting weak points in the supply chain allows them to compromise high-value targets such as U.S. government agencies and corporate targets such as Google and Microsoft. Software signing, a promising mitigation for many of these attacks, has seen limited adoption in open-source and enterprise ecosystems. In this paper, we propose Sigstore, a system to provide widespread software signing capabilities. To do so, we designed the system to provide baseline artifact signing capabilities that minimize the adoption barrier for developers. To this end, Sigstore leverages three distinct mechanisms: First, it uses a protocol similar to ACME to authenticate developers through OIDC, tying signatures to existing and widely-used identities. Second, it enables developers to use ephemeral keys to sign their artifacts, reducing the inconvenience and risk of key management. Finally, Sigstore enables user authentication by means of artifact and identity logs, bringing transparency to software signatures. Sigstore is quickly becoming a critical piece of Internet infrastructure with more than 2.2M signatures over critical software such as Kubernetes and Distroless.  more » « less
Award ID(s):
2229703
PAR ID:
10470793
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9781450394505
Page Range / eLocation ID:
2353 to 2367
Format(s):
Medium: X
Location:
Los Angeles CA USA
Sponsoring Org:
National Science Foundation
More Like this
  1. FLOSS ecosystem as a whole is a critical component of world’s computing infrastructure, yet not well understood. In order to understand it well, we need to measure it first. We, therefore, aim to provide a framework for measuring key aspects of the entire FLOSS ecosystem. We first consider the FLOSS ecosystem through lens of a supply chain. The concept of supply chain is the existence of series of interconnected parties/affiliates each contributing unique elements and expertise so as to ensure a final solution is accessible to all interested parties. This perspective has been extremely successful in helping allowing companies to cope with multifaceted risks caused by the distributed decision-making in their supply chains, especially as they have become more global. Software ecosystems, similarly, represent distributed decisions in supply chains of code and author contributions, suggesting that relationships among projects, developers, and source code have to be measured. We then describe a massive measurement infrastructure involving discovery, extraction, cleaning, correction, and augmentation of publicly available open-source data from version control systems and other sources. We then illustrate how the key relationships among the nodes representing developers, projects, changes, and files can be accurately measured, how to handle absence of measures for user base in version control data, and, finally, illustrate how such measurement infrastructure can be used to increase knowledge resilience in FLOSS. 
    more » « less
  2. Software Supply Chain Security (SSC) involves numerous stakeholders, processes and tools that work together to deliver a software product. A vulnerability in one element can cascade through the entire system and potentially affect thousands of dependents and millions of end users. Despite the SSC’s importance and the increasing awareness around its security, existing research mainly focuses on either technical aspects, exploring various attack vectors and their mitigation, or it empirically studies developers’ challenges, but mainly within the open source context. To better develop supportive tooling and education, we need to understand how developers consider and mitigate supply chain security challenges. We conducted 18 semi-structured interviews with experienced developers actively working in industry to gather in-depth insights into their experiences, encountered challenges, and the effectiveness of various strategies to secure the SSC. We find that the developers are generally interested in securing the supply chain, but encounter many obstacles in implementing effective security measures, both specific to SSC security and for general security. Developers also mention a wide set of approaches and methods to secure their projects, but mostly report general secure software engineering methodologies and seem to be mostly unaware of SSC specific threats and mitigations. 
    more » « less
  3. The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse. This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset’s comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model’s training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain. We outline mining opportunities on PTMs, their downstream usage, and cross-cutting questions. Our artifact is available at https://github.com/PurdueDualityLab/PeaTMOSS-Artifact. Our dataset is available at https://transfer.rcac.purdue.edu/file-manager?origin_id=ff978999-16c2-4b50-ac7a-947ffdc3eb1d&origin_path=%2F. 
    more » « less
  4. The development and training of deep learning models have become increasingly costly and complex. Consequently, software engineers are adopting pre-trained models (PTMs) for their downstream applications. The dynamics of the PTM supply chain remain largely unexplored, signaling a clear need for structured datasets that document not only the metadata but also the subsequent applications of these models. Without such data, the MSR community cannot comprehensively understand the impact of PTM adoption and reuse.This paper presents the PeaTMOSS dataset, which comprises metadata for 281,638 PTMs and detailed snapshots for all PTMs with over 50 monthly downloads (14,296 PTMs), along with 28,575 open-source software repositories from GitHub that utilize these models. Additionally, the dataset includes 44,337 mappings from 15,129 downstream GitHub repositories to the 2,530 PTMs they use. To enhance the dataset’s comprehensiveness, we developed prompts for a large language model to automatically extract model metadata, including the model’s training datasets, parameters, and evaluation metrics. Our analysis of this dataset provides the first summary statistics for the PTM supply chain, showing the trend of PTM development and common shortcomings of PTM package documentation. Our example application reveals inconsistencies in software licenses across PTMs and their dependent projects. PeaTMOSS lays the foundation for future research, offering rich opportunities to investigate the PTM supply chain. We outline mining opportunities on PTMs, their downstream usage, and cross-cutting questions.Our artifact is available at https://github.com/PurdueDualityLab/PeaTMOSS-Artifact. Our dataset is available at https://transfer.rcac.purdue.edu/file-manager?origin_id=ff978999-16c2-4b50-ac7a-947ffdc3eb1d&origin_path=%2F. 
    more » « less
  5. Motivation: The question of what combination of attributes drives the adoption of a particular software technology is critical to developers. It determines both those technologies that receive wide support from the community and those which may be abandoned, thus rendering developers' investments worthless. Aim and Context: We model software technology adoption by developers and provide insights on specific technology attributes that are associated with better visibility among alternative technologies. Thus, our findings have practical value for developers seeking to increase the adoption rate of their products. Approach: We leverage social contagion theory and statistical modeling to identify, define, and test empirically measures that are likely to affect software adoption. More specifically, we leverage a large collection of open source repositories to construct a software dependency chain for a specific set of R language source-code files. We formulate logistic regression models, where developers' software library choices are modeled, to investigate the combination of technological attributes that drive adoption among competing data frame (a core concept for a data science languages) implementations in the R language: tidy and data.table. To describe each technology, we quantify key project attributes that might affect adoption (e.g., response times to raised issues, overall deployments, number of open defects, knowledge base) and also characteristics of developers making the selection (performance needs, scale, and their social network). Results: We find that a quick response to raised issues, a larger number of overall deployments, and a larger number of high-score StackExchange questions are associated with higher adoption. Decision makers tend to adopt the technology that is closer to them in the technical dependency network and in author collaborations networks while meeting their performance needs. To gauge the generalizability of the proposed methodology, we investigate the spread of two popular web JavaScript frameworks Angular and React, and discuss the results. Future work: We hope that our methodology encompassing social contagion that captures both rational and irrational preferences and the elucidation of key measures from large collections of version control data provides a general path toward increasing visibility, driving better informed decisions, and producing more sustainable and widely adopted software. 
    more » « less