The NPM package repository contains over two million packages and serves tens of billions of downloads per-week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a “semantic versioning” (‘semver’) scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-travelling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times. We segment our analysis to allow for a direct analysis of security-relevant updates (those that introduce or patch vulnerabilities) in comparison to the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always occur, due to developers’ imperfect use of both semver version constraints and semver version number increments. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open source license.
more »
« less
Flexible and Optimal Dependency Management via Max-SMT
and often fails to installs the newest versions of dependencies; 2) NPM’s algorithm leads to duplicated dependencies and bloated code, which is particularly bad for web applications that need to minimize code size; 3) NPM’s vulnerability fixing algorithm is also greedy, and can even introduce new vulnerabilities; and 4) NPM’s ability to duplicate dependencies can break stateful frameworks and requires a lot of care to workaround. Although existing tools try to address these problems they are either brittle, rely on post hoc changes to the dependency tree, do not guarantee optimality, and are not composable. We present PacSolve, a unifying framework and implementation for dependency solving which allows for customizable constraints and optimization goals. We use PacSolve to build MaxNPM, a complete, drop-in replacement for NPM, which empowers developers to combine multiple objectives when installing dependencies. We evaluate MaxNPM with a large sample of packages from the NPM ecosystem and show that it can: 1) reduce more vulnerabilities in dependencies than NPM’s auditing tool in 33% cases; 2) chooses newer dependencies than NPM in 14% cases; and 3) chooses fewer dependencies than NPM in 21% cases. All our code and data is open and available.
more »
« less
- Award ID(s):
- 2102288
- PAR ID:
- 10416449
- Date Published:
- Journal Name:
- Proceedings of the International Conference on Software Engineering
- ISSN:
- 1819-3781
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Recent high-profile incidents in open-source software have greatly raised practitioner attention on software supply chain attacks. To guard against potential malicious package updates, security practitioners advocatepinningdependency to specific versions rather thanfloatingin version ranges. However, it remains controversial whether pinning carries a meaningful security benefit that outweighs the cost of maintaining outdated and possibly vulnerable dependencies. In this paper, we quantify, through counterfactual analysis and simulations, the security and maintenance impact of version constraints in the npm ecosystem. By simulating dependency resolutions over historical time points, we find that pinning direct dependencies not only (as expected) increases the cost of maintaining vulnerable and outdated dependencies, but also (surprisingly) even increases the risk of exposure to malicious package updates in larger dependency graphs due to the specifics of npm’s dependency resolution mechanism. Finally, we explore collective pinning strategies to secure the ecosystem against supply chain attacks, suggesting specific changes to npm to enable such interventions. Our study provides guidance for practitioners and tool designers to manage their supply chains more securely.more » « less
-
This is the artifact for: A Large Scale Analysis of Semantic Versioning in NPM. The artifact contains: A full scrape of all metadata from NPM (package / version information, dependencies, etc.) as of October 31, 2022.A copy of our code, which includes the software for scraping metadata and package tarball (code) data, as well as all analysis scripts that are needed to replicate the figures from the paper.more » « less
-
null (Ed.)Dependencies among software entities are the basis for many software analytic research and architecture analysis tools. Dynamically typed languages, such as Python, JavaScript and Ruby, tolerate the lack of explicit type references, making certain syntactic dependencies indiscernible in source code. We call these possible dependencies, in contrast with the explicit dependencies that are directly referenced in source code. Type inference techniques have been widely studied and applied, but existing architecture analytic research and tools have not taken possible dependencies into consideration. The fundamental question is, to what extent will these missing possible dependencies impact the architecture analysis? To answer this question , we conducted an empirical study with 105 Python projects, using type inference techniques to manifest possible dependencies. Our study revealed that the architectural impact of possible dependencies is substantial-higher than that of explicit dependencies: (1) file-level possible dependencies account for at least 27.93% of all file-level dependencies, and create different dependency structures than that of explicit dependencies only, with an average difference of 30.71%; (2) adding possible dependencies significantly improves the precision (0.52%∼14.18%), recall(31.73%∼39.12%), and F1 scores (22.13%∼32.09%) of capturing co-change relations; (3) on average, a file involved in possible dependencies influences 28% more files and 42% more dependencies within architectural sub-spaces than a file involved in just explicit dependencies; (4) on average, a file involved in possible dependencies consumes 32% more maintenance effort. Consequently, maintainability scores reported by existing tools make a system written in these dynamic languages appear to be better modularized than it actually is. This evidence strongly * with the Ministry suggests that possible dependencies have a more significant impact than explicit dependencies on architecture quality, that architecture analysis and tools should assess and even emphasize the architectural impact of possible dependencies due to dynamic typing.more » « less
-
As the reliance on open-source software dependencies increases, managing the security vulnerabilities in these dependencies becomes complex. State-of-the-art industry tools use reachability analysis of code to alert developers when security vulnerabilities in dependencies are likely to impact their projects. These tools heavily rely on precisely identifying the location of the vulnerability within the dependency, specifically vulnerable functions. However, the process of identifying vulnerable functions is currently either manual or uses a naive automated approach that falsely assumes all changed functions in a security patch link are vulnerable. In this paper, we explore using open-source large language models (LLMs) to improve pairing security advisories with vulnerable functions. We explore various prompting strategies, learning paradigms (i.e., zero-shot vs. few-shot), and show our approach generalizes to other open-source LLMs. Compared to the naive automated approach, we show a 173% increase in precision while only having an 18% decrease in recall. The significant increase in precision to enhance vulnerable function identification lays the groundwork for downstream techniques that depend on this critical information for security analysis and threat mitigation.more » « less
An official website of the United States government

