skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 10:00 PM to 12:00 PM ET on Tuesday, March 25 due to maintenance. We apologize for the inconvenience.


This content will become publicly available on December 15, 2025

Title: Insights from Running 24 Static Analysis Tools on Open Source Software Repositories
OSS is important and useful. We want to ensure that it is of high quality and has no security issues. Static analysis tools provide easy-to-use and application-independent mechanisms to assess various aspects of a given code. Many effective open-source static analysis tools exist. In this paper, we perform the first comprehensive analysis using 24 open-source static analysis tools (through Omega Analyzer) on 4,947 repositories. Our study identified several interesting findings, such as the distribution of errors in relation to the criticality score of repositories shows that repositories with a criticality score have the highest percentage of errors. We envision that our findings provide insights into the effectiveness of static analysis tools on OSS and future research directions in securing OSS repositories.  more » « less
Award ID(s):
2340548
PAR ID:
10566241
Author(s) / Creator(s):
; ; ;
Editor(s):
Patil, Vishwas T; Krishnan, Ram; Shyamasundar, Rudrapatna K
Publisher / Repository:
Information Systems Security
Date Published:
ISBN:
978-3-031-80019-1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Open source software (OSS) is ubiquitous, serving as specialized applications nurtured by devoted user communities, and as digital infrastructure underlying platforms used by millions of people. OSS is developed, maintained, and extended through the contribution of independent developers as well as people from businesses, universities, government research institutions, and nonprofits. Despite its prevalence, the scope and impact of OSS are not currently well-measured. Recent policies of the U.S. Federal Government promote sharing of software code developed by or for the Federal Government. While the policy to promote reusing and sharing of software created with public funding is relatively new, public funding plays an important and not fully accounted role in the creation of OSS. This paper aims to measure the scope and value of OSS development in the U.S. Federal Government. We collect data from Code.gov, the government’s platform for sharing OSS projects, and study contributions of agencies. The dataset contains 17K repositories from 21 agencies, with the majority of contributions originating from the DOE, NASA and GSA. In addition, we collect data on development activity (e.g., lines of code, contributors) of the repositories on GitHub, the largest hosting facility worldwide. Adopting a cost estimation model from software engineering, we generate estimates of investment in OSS that are consistent with the U.S. national accounting methods used for measuring software investment. Finally, we generate and analyze collaboration network resulting from cross-agency contributions to repositories and explore the centrality of agencies in the network. 
    more » « less
  2. In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage the studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS. To accomplish that, we develop approaches to detect copybased reuse by developing an efficient algorithm that exploits World of Code infrastructure: a curated and cross referenced collection of nearly all open source repositories. We expect this data will enable future research and tool development that support such reuse and minimize associated risks. 
    more » « less
  3. Open source communities hosted in large foundations operate in a complex socio-technical ecosystem, which includes a heterogeneous mix of projects and stakeholders. Previous work has thus far investigated the challenges faced in OSS communities from the point of view of specific stakeholders, primarily at the level of individual projects. None have yet studied the challenges faced within a large, federated open source organization. In this paper, we aim to bridge this gap to identify ongoing challenges contributors face in a mature OSS organization. To do so, we surveyed 624 contributors at the Apache Software Foundation (ASF) and ran 11 semi-structured follow up interviews. We validated our findings through member checking with the interviewees as well as the ASF Diversity and Inclusion (D&I) committee. The contributions of this paper include: (1) an empirically-evidenced conceptual model of the 88 challenges that contributors face in a mature OSS foundation and (2) a set of 48 community-recommended strategies for alleviating these challenges. Our results show that even well-established and mature organizations still face a variety of individual and project-specific challenges and that it is difficult to design a comprehensive set of processes and guidelines to match the needs and expectations of a diverse and large federated community. Our conceptual challenges model and associated strategies to mitigate them can provide guidance to other OSS foundations and projects helping them in building better support processes and tools to create a successful, thriving community of contributors. 
    more » « less
  4. Defects in infrastructure as code (IaC) scripts can have serious consequences, for example, creating large-scale system outages. A taxonomy of IaC defects can be useful for understanding the nature of defects, and identifying activities needed to fix and prevent defects in IaC scripts. The goal of this paper is to help practitioners improve the quality of infrastructure as code (IaC) scripts by developing a defect taxonomy for IaC scripts through qualitative analysis. We develop a taxonomy of IaC defects by applying qualitative analysis on 1,448 defect-related commits collected from open source software (OSS) repositories of the Openstack organization. We conduct a survey with 66 practitioners to assess if they agree with the identified defect categories included in our taxonomy. We quantify the frequency of identified defect categories by analyzing 80,425 commits collected from 291 OSS repositories spanning across 2005 to 2019. Our defect taxonomy for IaC consists of eight categories, including a category specific to IaC called idempotency (i.e., defects that lead to incorrect system provisioning when the same IaC script is executed multiple times). We observe the surveyed 66 practitioners to agree most with idempotency. The most frequent defect category is configuration data i.e., providing erroneous configuration data in IaC scripts. Our taxonomy and the quantified frequency of the defect categories may help in advancing the science of IaC script quality. 
    more » « less
  5. null (Ed.)
    Open Source Software (OSS) projects start with an initial vocabulary, often determined by the first generation of developers. This vocabulary, embedded in code identifier names and internal code comments, goes through multiple rounds of change, influenced by the interrelated patterns of human (e.g., developers joining and departing) and system (e.g., maintenance activities) interactions. Capturing the dynamics of this change is crucial for understanding and synthesizing code changes over time. However, existing code evolution analysis tools, available in modern version control systems such as GitHub and SourceForge, often overlook the linguistic aspects of code evolution. To bridge this gap, in this paper, we propose to study code evolution in OSS projects through the lens of developers' language, also known as code lexicon. Our analysis is conducted using 32 OSS projects sampled from a broad range of application domains. Our results show that different maintenance activities impact code lexicon differently. These insights lay out a preliminary foundation for modeling the linguistic history of OSS projects. In the long run, this foundation will be utilized to provide support for basic program comprehension tasks and help researchers gain new insights into the complex interplay between linguistic change and various system and human aspects of OSS development. 
    more » « less