<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>A Large Scale Analysis of Semantic Versioning in NPM</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>05/14/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10416452</idno>
					<idno type="doi"></idno>
					<title level='j'>IEEE International Working Conference on Mining Software Repositories</title>
<idno>2160-1860</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Donald Pinckney</author><author>Federico Cassano</author><author>Arjun Guha</author><author>Jonathan Bell</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The NPM package repository contains over two million packages and serves tens of billions of downloads per week. Nearly every single JavaScript application uses the NPM package manager to install packages from the NPM repository. NPM relies on a “semantic versioning” (‘semver’) scheme to maintain a healthy ecosystem, where bug-fixes are reliably delivered to downstream packages as quickly as possible, while breaking changes require manual intervention by downstream package maintainers. In order to understand how developers use semver, we build a dataset containing every version of every package on NPM and analyze the flow of updates throughout the ecosystem. We build a time-travelling dependency resolver for NPM, which allows us to determine precisely which versions of each dependency would have been resolved at different times. We segment our analysis to allow for a direct analysis of security-relevant updates (those that introduce or patch vulnerabilities) in comparison to the rest of the ecosystem. We find that when developers use semver correctly, critical updates such as security patches can flow quite rapidly to downstream dependencies in the majority of cases (90.09%), but this does not always occur, due to developers’ imperfect use of both semver version constraints and semver version number increments. Our findings have implications for developers and researchers alike. We make our infrastructure and dataset publicly available under an open source license.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Modern software development relies inextricably on open source package repositories on a massive scale. For example, the NPM repository contains over two million packages and serves tens of billions of downloads weekly, and practically every JavaScript application uses the NPM package manager to install packages from the NPM repository. As open source package repositories grow in scale, the maintenance, updating, and distribution of packages represents a growing attack surface for malicious actors to target, and understanding the properties of the software supply chain is vital.</p><p>One particular concern in open source ecosystems is the technical lag <ref type="bibr">[1]</ref>- <ref type="bibr">[5]</ref> that packages experience between when a new update is available for a dependency and when that update is applied. NPM and other similarly-designed ecosystems (PyPI, etc.) offer a potential solution in the form of semantic versioning ("semver") and flexible version constraints. In semver, versions are numbered in the form major.minor.bug, where major denotes breaking API changes, minor denotes a non-breaking change adding new functionality, and bug denotes a backwards-compatible bug fix<ref type="foot">foot_0</ref>  <ref type="bibr">[6]</ref>. Flexible version constraints allow developers of downstream (i.e., dependent) packages to specify which types of updates they are willing to automatically accept. Ideally, semver helps developers to express constraints and version numbers so that non-breaking important updates (such as security patches) flow rapidly to downstream packages, while breaking changes are delayed until developers choose to accept them. For example, a developer may specify that they depend on the package react, with constraint ^18.1.1, which means that automatic updates are allowed until (excluding) version 19.0.0. 
In essence, this constraint says "receive all updates to React that are unlikely to be breaking changes".</p><p>However, there are three significant complications with semver in practice that can lead to technical lag <ref type="bibr">[1]</ref>- <ref type="bibr">[5]</ref>. First, the positive properties of semver are predicated on both upstream developers labeling their updates with the correct semver increment type, and on downstream developers using constraints that are neither too flexible nor too strict. Second, dependencies in the middle of a transitive dependency chain affect the final received versions of dependencies. The downstream developer may list a constraint that allows the most up-to-date version of a package, but if a transitive dependency has a more restrictive constraint, the downstream developer may not receive the up-to-date version. Third, allowing for automatic (bug) updates to dependencies can be dangerous, as it introduces an attack vector for malware.</p><p>In this work, we aim to understand how developers make use of dependencies, semantic versioning, and flexible version constraints at the ecosystem-scale, and how all these factors intersect to affect developer experience and supply chain security. Prior work on mining data from the NPM ecosystem has primarily focused on answering questions about NPM at a snapshot in time <ref type="bibr">[7]</ref>- <ref type="bibr">[10]</ref>. In this work, we first understand how developers make use of semantic versioning by analyzing flexible constraint type frequency and semver increment type frequency over the entire history of NPM. Then, to understand how updates flow in practice at the ecosystem scale, we run large-scale experiments that resolve packages' dependencies at different snapshots in time, observing how long it takes for updates to be received by downstream packages. 
To enable these experiments, we built a tool that allows for accurate time-travel dependency solving throughout the history of NPM. This methodology allows for more precision in resolving dependencies throughout time, as prior work <ref type="bibr">[2]</ref>, <ref type="bibr">[4]</ref>, <ref type="bibr">[11]</ref>- <ref type="bibr">[13]</ref> approximated NPM's behavioral semantics, which are not well-specified <ref type="bibr">[14]</ref>.</p><p>In total, we have built the first dataset of NPM that includes (as of October 31, 2022):</p><p>1) every package on NPM (2,663,681 packages); 2) every version of every package (28,941,927 versions); 3) metadata (&#8776; 40 GB compressed) and packaged code (&#8776; 19 TB compressed) for every version of every package; and 4) full data of security advisories issued for NPM packages, downloaded from the GitHub Security Advisory database.</p><p>This dataset is indexed to allow for easy querying and large-scale distributed computations. To gather this data, we designed and implemented a distributed system for downloading, archiving and retrieving packages from NPM. We release our scraper and dataset under the BSD 3-Clause license <ref type="foot">2</ref> . We use our dataset to answer several questions about the NPM ecosystem, in particular how developers use semantic versioning, and how this affects supply chain security:</p><p>&#8226; RQ1: Do developers specify dependency version constraints to allow for automated updates? &#8226; RQ2: Do developers use semantic versioning in their package releases to allow for automated updates to downstream packages? &#8226; RQ3: Do packages frequently contain out-of-date dependencies? And when updates are published, how long until those updates are received by downstream packages? &#8226; RQ4: Among the types of semver updates, what types of high-level changes do developers tend to make? 
How often do developers only update dependencies?</p><p>These results are impactful for both developers and researchers. We show that, generally, the NPM ecosystem is effective in terms of efficient distribution of non-breaking updates, but most packages end up with out-of-date dependencies anyway due to the sheer volume of dependencies and updates to deal with. In addition, we found evidence that some developers use semver non-optimally when releasing security patches, and that minor and major semver updates appear to have a higher risk of introducing security vulnerabilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. RELATED WORK</head><p>Our research questions and methodology build on a large body of related work examining semantic versioning and technical lag.</p><p>1) Semantic Versioning: While semantic versioning does have a precise syntactic specification <ref type="bibr">[6]</ref>, the semantics of what counts as backwards-compatible are not formally defined. Tooling, including NPM, generally does not enforce how developers make use of semantic versioning in practice. Choices of semantic versioning usage impact speed of distribution of packages, technical lag, stability, developer frustration, and more. Developer interviews in 2015 conducted by Bogart et al. <ref type="bibr">[16]</ref> in the NPM and CRAN ecosystems found that developers try to use semantic versioning, but are not always aware of its implications and generally find dependency management exhausting. More concretely, Raemaekers et al. <ref type="bibr">[17]</ref> found that in 2006-2011, Maven developers often introduced binary incompatible changes within supposedly non-breaking semver updates. Wittern et al. <ref type="bibr">[18]</ref> studied dependencies between packages in NPM, and found that the number of dependencies between packages is increasing over time, and observed the frequencies of version constraint types in 2016. Dietrich et al. <ref type="bibr">[19]</ref> then observed how version constraint type frequencies have changed over time, at the project level. Examining version constraint evolution at the full-ecosystem level allows for an evaluation based on "wisdom of the crowds." Decan et al. <ref type="bibr">[20]</ref> perform an analysis of dependency constraints at the ecosystem level for Cargo, NPM, Packagist and Rubygems. Focusing only on a single ecosystem (NPM), we validate Decan et al.'s findings, and perform a much deeper analysis of the dataset. 
Our study also examines the frequencies of released update types, which enables us to draw important implications about the diffusion of security updates.</p><p>2) Technical Lag: Many pieces of prior work attempt to analyze the propagation of updates to downstream packages, and how out-of-date the dependencies of a project typically are. Gonzalez-Barahona et al. <ref type="bibr">[1]</ref> define the measure of "technical lag", which analyzes how far out-of-date a package's dependencies are relative to more recently released versions, which has since been further studied in the context of NPM <ref type="bibr">[2]</ref>, <ref type="bibr">[4]</ref>, <ref type="bibr">[5]</ref>. In addition, the concept of technical lag is specialized to the analysis of the propagation of security patches or vulnerabilities in further work <ref type="bibr">[3]</ref>, <ref type="bibr">[11]</ref>, <ref type="bibr">[12]</ref>.</p><p>Calculating technical lag is difficult, and prior works have attempted to simulate the dependencies that would have been resolved at different points in time. Some of these works do not consider transitive dependencies <ref type="bibr">[4]</ref>, <ref type="bibr">[11]</ref>, which is concerning as transitive dependencies typically represent the majority of a package's dependencies in NPM. Others have followed up by considering transitive dependencies <ref type="bibr">[2]</ref>, <ref type="bibr">[12]</ref>. Liu et al. <ref type="bibr">[13]</ref> introduce DTResolver, a custom dependency solving algorithm that more closely matches the behavior of NPM. However, the authors' evaluation of DTResolver found that it only matched NPM's behavior when building dependency trees for 90.58% of 15,673 libraries <ref type="bibr">[13]</ref>. Our recent evaluation of NPM's dependency resolution semantics showed a variety of corner cases in which NPM's algorithm will select unexpected versions for dependencies in order to unify versions <ref type="bibr">[14]</ref>. 
Particularly when resolving transitive dependencies, the error introduced by an incorrect approximation of NPM's resolution semantics compounds. Compared to all prior work that we are aware of in studying technical lag in the NPM ecosystem, ours is the only study to use NPM itself to resolve historical dependencies. We make our tools and dataset available to allow others to employ this methodology <ref type="bibr">[15]</ref>.</p><p>3) Studies of NPM: Finally, other studies have looked at more specific questions or applications of data analysis from NPM, such as studying when developers downgrade packages <ref type="bibr">[21]</ref>, analyzing the phenomenon of popular "micro" packages in NPM <ref type="bibr">[8]</ref>- <ref type="bibr">[10]</ref>, and developing methods to understand and prevent vulnerabilities or malware in NPM <ref type="bibr">[7]</ref>, <ref type="bibr">[22]</ref>- <ref type="bibr">[24]</ref>. We will return to discuss how our findings may guide future research applications in Section VI-C.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. METHODOLOGY</head><p>At a high level, we answer our four core research questions using different aspects of our dataset and analysis systems. RQ1 and RQ2 are answered purely via analysis of our scraped metadata. Answering RQ3 is more challenging as it requires reasoning about how dependencies are resolved across time, which we answer by using our time-traveling dependency resolver in large-scale experiments. Finally, to answer RQ4 we compute diffs between tarballs of package versions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. RQ1: Version Constraint Usage</head><p>Within NPM's rich language for specifying version constraints on dependencies <ref type="bibr">[14]</ref>, <ref type="bibr">[25]</ref>, it is unclear which of the many constraint types developers frequently make use of and how loose or restrictive those constraints are.</p><p>We classify version constraints in the following mutually exclusive categories:</p><p>1) Exact constraints ("=1.2.3") accept no versions other than the specifically listed one; 2) Bug-flexible constraints ("~1.2.3") accept any updates to the bug semver component, so 1.2.4, etc.; 3) Minor-flexible constraints ("^1.2.3") accept any updates to the minor semver component, so 1.3.0, etc.; 4) Geq constraints ("&gt;=1.2.3") accept any versions greater than or equal to the specified version; 5) Any constraints ("*") accept any versions; and 6) Other constraints, such as disjunction, conjunction, GitHub URLs, etc.</p><p>We then examine frequencies of these constraint categories across NPM, segmented by year so we can observe how constraint usage has evolved historically. In addition, one challenge with analyzing data from NPM is that some packages publish a massive number of versions (React has over 1,000 versions), so aggregating across all versions may produce results that are biased towards packages with more versions. In RQ1 we select only the most recent version of every package that was uploaded within each year. This enables us to segment by time while avoiding this bias.</p></div>
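<div xmlns="http://www.tei-c.org/ns/1.0"><p>The six categories above can be illustrated with a small classifier. This is a minimal sketch rather than the parser used in the study: the regular expressions, the function names classify_constraint and constraint_frequencies, and the decisions to treat a bare version such as "1.2.3" as Exact and an empty string as Any are all assumptions made for illustration.</p>
<p><![CDATA[
```python
import re
from collections import Counter

# Hypothetical pattern for a full major.minor.bug version, with optional
# prerelease and build suffixes. Partial versions ("1.2", "1.x") are not
# handled here and fall through to "other".
_VERSION = r"v?\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?"

# Checked in order; the first matching pattern decides the category.
_PATTERNS = [
    ("exact", re.compile(rf"^=?\s*{_VERSION}$")),   # "=1.2.3" (assumed: also "1.2.3")
    ("bug", re.compile(rf"^~\s*{_VERSION}$")),      # "~1.2.3"
    ("minor", re.compile(rf"^\^\s*{_VERSION}$")),   # "^1.2.3"
    ("geq", re.compile(rf"^>=\s*{_VERSION}$")),     # ">=1.2.3"
    ("any", re.compile(r"^\*?$")),                  # "*" (assumed: also "")
]


def classify_constraint(constraint: str) -> str:
    """Map one dependency constraint string to a category name."""
    text = constraint.strip()
    for name, pattern in _PATTERNS:
        if pattern.match(text):
            return name
    # Disjunctions, conjunctions, URLs, tags, and anything unrecognized.
    return "other"


def constraint_frequencies(dependencies: dict) -> Counter:
    """Count categories over a package.json-style {name: constraint} map."""
    return Counter(classify_constraint(c) for c in dependencies.values())
```
]]></p>
<p>For example, classify_constraint("^18.1.1") yields "minor", while a disjunction such as "^1.0.0 || ^2.0.0" yields "other". Keeping the patterns in an ordered list makes the categories mutually exclusive by construction.</p></div>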
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. RQ2: Semantic Versioning in Updates</head><p>We now turn to examine how developers increment their semantic version numbers when publishing updates. We first find all of the package updates that have occurred in NPM's history, and classify each as a bug (e.g. 5.4.8 &#8594; 5.4.9), minor (e.g. 5.4.8 &#8594; 5.5.0), or major (e.g. 5.4.8 &#8594; 6.0.0) update.</p><p>One would expect that updates can trivially be identified as consecutive versions of the same package. NPM however allows versions to be published non-chronologically. This feature allows for maintenance of parallel version branches. For example, consider the following chronological order of versions: 1.0.0, then 2.0.0, then 1.0.1, and then 2.0.1. In this example, the mined updates should consist of: 1.0.0 &#8594; 2.0.0, 1.0.0 &#8594; 1.0.1, and 2.0.0 &#8594; 2.0.1, as these reflect updates that are most closely based on the source version while being chronologically and numerically consistent. We would not include the update 1.0.1 &#8594; 2.0.0 because it is not chronologically consistent, and thus 2.0.0 is unlikely to be a derivative of 1.0.1.</p><p>To determine the set of updates, we group versions by the equivalence relation of same major component and assert that groups are ordered within themselves chronologically. We then have updates between versions within each group, and between different groups. Continuing the above example, we have two groups: {1.0.0, 1.0.1} and {2.0.0, 2.0.1}. From intra-group ordering we obtain 1.0.0 &#8594; 1.0.1 and 2.0.0 &#8594; 2.0.1, and from the inter-group ordering we obtain 1.0.0 &#8594; 2.0.0. We believe this algorithm reflects well how developers publish updates, and we discuss alternatives in Section VII. When computing these updates, we first filter out all prerelease versions (e.g. 1.2.3-beta5), yielding 1,453,789 packages with at least one update (of 2,869,085 packages). 
We then filter out 52,279 packages that do not have consistent intra-group chronological orders.</p><p>With all updates and version increment types identified, we examine the distribution of the three update types across the whole population, and then compare to the subgroups of updates that introduce and patch vulnerabilities. Updates that patch vulnerabilities are identified directly in the scraped advisory database, while we identify versions that introduce vulnerabilities as the minimal version containing that vulnerability. To avoid the bias introduced by some packages having a large number of updates, our top-level aggregation is among packages rather than updates. For each package, we identify the proportion of its updates of each type (segmenting by security effect), and then visualize this percentage across all the packages. This enables us to make conclusions about how packages and package developers generally handle incrementing semver numbers during updates. In addition, note that when segmenting by updates that introduce vulnerabilities, we are not attempting to study malware, rather updates that (probably inadvertently) introduce a vulnerability.</p></div>
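<div xmlns="http://www.tei-c.org/ns/1.0"><p>The update-mining procedure described above can be sketched as follows. The grouping by major component and the chronological consistency check follow the text; the inter-group rule used here (link the first version of a group to the latest version of the preceding group that was published before it) is an assumption that reproduces the worked example, and the names Release, increment_type, and mine_updates are hypothetical.</p>
<p><![CDATA[
```python
from collections import defaultdict
from typing import NamedTuple, Tuple


class Release(NamedTuple):
    version: Tuple[int, int, int]  # (major, minor, bug), prereleases removed
    published: int                 # any sortable timestamp


def increment_type(old, new):
    """Classify an update by the most significant component that changed."""
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "bug"


def mine_updates(releases):
    """Return (old_version, new_version) pairs for one package."""
    groups = defaultdict(list)
    for release in releases:
        groups[release.version[0]].append(release)

    updates = []
    previous_group = None
    for major in sorted(groups):
        group = sorted(groups[major], key=lambda r: r.version)
        # Numeric order and publication order must agree within a group;
        # packages violating this are filtered out in the study.
        times = [r.published for r in group]
        if times != sorted(times):
            raise ValueError("inconsistent intra-group chronological order")
        # Intra-group updates: consecutive versions of the same major.
        for old, new in zip(group, group[1:]):
            updates.append((old.version, new.version))
        # Inter-group update (assumed rule, see the paragraph above).
        if previous_group is not None:
            first = group[0]
            earlier = [r for r in previous_group if r.published < first.published]
            if earlier:
                updates.append((earlier[-1].version, first.version))
        previous_group = group
    return updates
```
]]></p>
<p>On the example from the text (1.0.0, then 2.0.0, then 1.0.1, then 2.0.1), this sketch produces the three updates listed there and omits the update from 1.0.1 to 2.0.0.</p></div>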
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. RQ3: Out-of-Date Dependencies and Update Flows</head><p>The properties examined thus far have been local properties of each package, in that each package has been analyzed individually. We now wish to answer how out-of-date NPM packages typically are, and how long it takes updates to flow to downstream packages. Both of these properties rely on all the packages in the transitive dependency closure of a downstream package. However, reasoning precisely about how dependencies are solved is challenging both because NPM's dependency solving algorithm is complex (Section II-2), and because we wish to parameterize this over time.</p><p>In order to compute solutions accurately and at different points in time, we use vanilla NPM's solver combined with a proxy that emulates the world state at any given point in history (described in Section IV-C). With this key tool, we then perform two experiments: first we solve the dependencies of the most recent version of every package in NPM and observe how many packages have out-of-date dependencies; we then explore how updates flow to downstream packages by solving the dependencies of the downstream package at different points in time until it receives the update.</p><p>Fig. <ref type="figure">1</ref>: Overview of our system architecture.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. RQ4: Analyzing Code Changes in Updates</head><p>After having examined how developers use constraints and version numbers in isolation, we next align that with a high-level characterization of what updates actually change. For every identified update, we decompress the packaged code from both versions, and look for file changes. We then classify changes as modifying dependencies, code (.js, .ts, .jsx, .tsx), both, or neither. We then examine the distribution of these types of changes segmented by semver increment type, again normalizing per-package to avoid biasing towards packages with more updates.</p><p>Analyzing at a deeper level is possible with our dataset, but is beyond the scope of this paper. Note that many packages upload compiled or minified JavaScript code, which makes it difficult to even look at simple line-by-line diffs. In addition, we could have chosen to count other file types as code (.sh, etc.), but we chose to focus on JavaScript code.</p></div>
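<div xmlns="http://www.tei-c.org/ns/1.0"><p>A minimal sketch of the four-way classification is shown below. It assumes that each version's tarball has already been decompressed into a mapping from file path to file contents, that the manifest lives at package/package.json, and that a dependency change means a change to the "dependencies" field of that manifest. These choices, and the names classify_update, MANIFEST_PATH, and DEPENDENCY_FIELDS, are illustrative assumptions; the text does not specify them.</p>
<p><![CDATA[
```python
import json
import posixpath

CODE_EXTENSIONS = {".js", ".ts", ".jsx", ".tsx"}
MANIFEST_PATH = "package/package.json"   # assumed tarball layout
DEPENDENCY_FIELDS = ("dependencies",)    # assumed; could include devDependencies


def _dependencies(files):
    """Extract the dependency fields from a version's manifest, if parseable."""
    try:
        manifest = json.loads(files.get(MANIFEST_PATH, b"{}"))
    except ValueError:
        return {}
    if not isinstance(manifest, dict):
        return {}
    return {field: manifest.get(field) or {} for field in DEPENDENCY_FIELDS}


def _code_changed(old, new):
    """True if any code file was added, removed, or modified."""
    paths = {
        path
        for path in set(old) | set(new)
        if posixpath.splitext(path)[1].lower() in CODE_EXTENSIONS
    }
    return any(old.get(path) != new.get(path) for path in paths)


def classify_update(old, new):
    """Classify an update given {path: bytes} maps for both versions."""
    deps = _dependencies(old) != _dependencies(new)
    code = _code_changed(old, new)
    if deps and code:
        return "both"
    if deps:
        return "dependencies"
    if code:
        return "code"
    return "neither"
```
]]></p>
<p>Comparing raw file contents keeps the sketch robust to minified code, where line-by-line diffs are not meaningful.</p></div>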
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. SYSTEM ARCHITECTURE</head><p>To carry out our methodology, we needed a system that could scrape and store all metadata and tarball data, and allow us to perform analyses and experiments on both the metadata and tarball data. This system needs to be able to run on our academic Slurm-backed <ref type="bibr">[26]</ref>, <ref type="bibr">[27]</ref> HPC cluster. To solve this problem, we designed our own system, organized into 3 primary components (Figure <ref type="figure">1</ref>):</p><p>1) The Metadata Manager, which continually scrapes data from NPM and periodically from the GitHub Security Advisory Database; 2) the Job Manager, which receives job requests (typically tarball download or compute jobs) from the Metadata Manager and then coordinates job execution and distributed file system locks; and 3) the Compute Cluster, in which we can spawn worker nodes and access a networked file system. We now explain how we accomplish the primary tasks required by our methodology.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Metadata Acquisition</head><p>NPM stores metadata in a CouchDB database. CouchDB is a document-oriented JSON database and is a good fit for NPM because it is schemaless and allows for arbitrary nesting of JSON objects, such as the package.json file. For performing data analysis we find it to be a poor fit due to the extremely loose structure. There is almost no validation of the package.json files in the CouchDB, making it difficult to use for analyses without first cleaning the data.</p><p>The Metadata Manager (top left of Figure <ref type="figure">1</ref>) continually receives metadata changes from NPM via their changes API <ref type="bibr">[28]</ref>, validates those changes, and inserts the data into PostgreSQL <ref type="bibr">[29]</ref>. Additionally, the Metadata Manager periodically scrapes the GitHub Security Advisory Database and imports the security metadata into PostgreSQL as well. RQ1 and RQ2 can be answered entirely via issuing PostgreSQL queries to the Metadata Manager.</p><p>When metadata changes are received that contain URLs to new package tarballs, the Metadata Manager enqueues a tarball download job to then be handled by the Job Manager.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Tarball Data Acquisition and Compute Cluster</head><p>For scraping and storing package tarballs, we need to be able to store tens of millions of tarballs, while allowing for both concurrent writes to the storage since new tarballs are downloaded continually, as well as concurrent reads from the storage when performing analyses.</p><p>The worker nodes within the Compute Cluster are connected via a networked file system. One interesting approach would be to use a technology such as Hadoop <ref type="bibr">[30]</ref> on top of the networked file system to accomplish this. However, we did not explore this approach out of concern about Hadoop's scalability with regard to storing many small files <ref type="bibr">[31]</ref> (our use case). In addition, we are unsure if Hadoop can run correctly and efficiently on top of a networked file system.</p><p>Instead, we store tarball data in a custom-built blob storage system stored on the networked file system (bottom right of Figure <ref type="figure">1</ref>). The Job Manager (top right of Figure <ref type="figure">1</ref>) controls access to the blob storage, keeping track of byte offsets and coordinating locks for writing, while individual worker nodes in the Compute Cluster perform the networked disk I/O.</p><p>Tarballs are downloaded when the Job Manager receives a download job request from the Metadata Manager, at which point it assigns the download job to a worker node. Similarly, the Metadata Manager may also send compute job requests, which the Job Manager handles by distributing to many worker nodes and optionally allowing each to perform lockless read-only operations from the blob storage.</p><p>This system allows us to continually scrape and store tens of millions of tarballs, and to efficiently retrieve them for computation when answering RQ4. Additionally, while RQ3 does not read from the blob storage, it follows the same compute workflow.</p></div>
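<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of an offset-indexed blob store with locked writes and lockless reads can be illustrated in a single process. This is only a sketch under strong simplifying assumptions: one append-only file, an in-memory index, and a thread lock standing in for the distributed locks coordinated by the Job Manager. The class BlobStore and its methods are hypothetical and are not the interface of the actual system.</p>
<p><![CDATA[
```python
import os
import threading


class BlobStore:
    """Append-only blob file with an index of (offset, length) per key."""

    def __init__(self, path):
        self._path = path
        self._index = {}                     # key -> (offset, length)
        self._write_lock = threading.Lock()  # stand-in for a distributed lock
        open(path, "ab").close()             # create the file if it is missing

    def put(self, key, data):
        """Append a blob under the write lock and record where it landed."""
        with self._write_lock:
            with open(self._path, "ab") as handle:
                offset = handle.seek(0, os.SEEK_END)
                handle.write(data)
            self._index[key] = (offset, len(data))
            return self._index[key]

    def get(self, key):
        """Read a blob by offset; existing bytes never move, so no lock is taken."""
        offset, length = self._index[key]
        with open(self._path, "rb") as handle:
            handle.seek(offset)
            return handle.read(length)
```
]]></p>
<p>Packing many tarballs into one file and addressing them by byte offset avoids creating tens of millions of small files, which is the concern raised above about file-per-object designs.</p></div>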
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Time-Traveling Dependency Resolver</head><p>In order to carry out our experiments outlined in Section III for RQ3, we needed to be able to observe how a package's dependencies would have been solved at arbitrary points in NPM's history. We built a proxy server that can be used with vanilla NPM to enable time-travel dependency resolving.</p><p>NPM's command line tool enables the user to specify a custom package registry to use in place of npmjs.com. To use our time-traveling resolver, we specify a registry base URL pointing to our proxy server that includes in the URL the timestamp to time-travel to. The proxy server then receives the timestamp and can then rewrite responses from npmjs.com to remove versions of packages after the timestamp. Since this does not rely on the rest of our system, it is extremely easy to set up and use. However, in order to scale the computation across the dataset, we use the compute capabilities discussed above in Section IV-B.</p></div>
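<div xmlns="http://www.tei-c.org/ns/1.0"><p>The response rewriting performed by the proxy can be sketched as a pure function over a registry metadata document. The sketch assumes the document has "versions", "time", and "dist-tags" fields, with "time" mapping each version to an ISO 8601 publication timestamp. The function name filter_packument and the fallback rule for the "latest" tag are assumptions for illustration, not the behavior of the actual proxy.</p>
<p><![CDATA[
```python
from datetime import datetime, timezone


def _parse_time(value):
    # Registry timestamps are assumed to look like "2015-03-24T00:12:24.039Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_packument(packument, cutoff):
    """Return a copy of a metadata document with versions after `cutoff` removed."""
    times = packument.get("time", {})
    versions = packument.get("versions", {})
    kept = [v for v in versions if v in times and _parse_time(times[v]) <= cutoff]
    kept.sort(key=lambda v: _parse_time(times[v]))  # oldest first

    result = dict(packument)
    result["versions"] = {v: versions[v] for v in kept}
    result["time"] = {v: times[v] for v in kept}
    if "created" in times:
        result["time"]["created"] = times["created"]
    if kept:
        newest = kept[-1]
        result["time"]["modified"] = times[newest]

    # Drop tags that point at removed versions. As a simplification, "latest"
    # falls back to the most recently published surviving version.
    tags = {t: v for t, v in packument.get("dist-tags", {}).items() if v in kept}
    if kept and "latest" not in tags:
        tags["latest"] = kept[-1]
    result["dist-tags"] = tags
    return result
```
]]></p>
<p>The cutoff is expected to be a timezone-aware datetime, for example datetime(2019, 1, 1, tzinfo=timezone.utc). Because the function only removes information, an unmodified NPM client pointed at such a proxy resolves dependencies as if no later versions had been published.</p></div>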
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. RESULTS</head><p>At a high level, we would consider a package ecosystem to be healthy with regard to update distribution when updates that are positive (performance improvements, bug fixes, security patches, etc.) can be quickly and easily adopted by downstream dependencies, while disruptive changes (security vulnerabilities, malware, etc.) flow more slowly. In NPM, the flow of updates is determined by two factors: how do downstream developers tend to specify version constraints for dependencies (RQ1), and how do upstream developers tend to increment their version numbers when releasing updates (RQ2). We start by explaining the overall structure and general properties of the dataset. Then we move on to discuss RQ1 and RQ2 separately, and finally we consider how RQ1 and RQ2 intersect in practice in the ecosystem (RQ3), and how they are related to the actual contents of the updates (RQ4).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Dataset Structure and General Properties</head><p>As discussed in Section I, our collected data is split into two parts: 1) Ecosystem Metadata: This includes the full list of packages (2,663,681 packages), versions of every package (28,941,927 versions), and metadata for every version including version upload times, version numbers, dependencies, descriptions, links to repositories, and more. We also have a full scrape of all security advisories for NPM packages, including data on which versions are vulnerable and which version(s) patch the vulnerability. 2) Tarballs of published packages: The full source tarball of every version of every package <ref type="foot">3</ref> has been downloaded by our system.</p><p>Before diving into the core research questions, we first discuss general properties of the dataset. Figure <ref type="figure">2</ref> displays three distributions regarding our main objects of interest: updates and dependencies.</p><p>Figure <ref type="figure">2a</ref> displays an ECDF (empirical cumulative distribution function) of the distribution of the time between updates of packages, computed across 1,401,510 packages and 16,547,653 mined updates (Section III-B). A surprising finding is how quickly updates are pushed out in many cases, with 25% of updates spanning only 39.87 minutes or less, and 50% of updates spanning 22.71 hours or less. However, a long tail of updates exists, with the top 25% of updates spanning 7.78 days or longer, and 10% spanning 40.12 days or longer. On average, updates span 21.03 days. 
A manual inspection of the data suggests that update behavior is quite bursty, with developers releasing multiple updates in rapid succession, and then going silent for long periods of time; however, this hypothesis should be investigated more thoroughly.</p><p>Figures <ref type="figure">2b</ref> and<ref type="figure">2c</ref> display ECDFs of the distributions of the numbers of (transitive) dependencies and downstream packages (i.e. transitive reverse dependencies), respectively. We selected the most recent non-prerelease version of every package with at least one update (to filter out abandoned packages), yielding 1,401,510 packages. We then used our time-traveling variant of NPM to resolve their dependencies and collect transitive dependency relations between packages, disregarding versions. Solving dependencies failed on some packages, due to both true solving failures with NPM (e.g. missing dependencies) and transient system failures (discussed more in Section VII) in the compute cluster. In total, our experiments include successful executions of NPM's dependency solver on 696,419 packages. The data shows that on average packages have 167.87 dependencies, and 95% of packages have solution sizes of 636 or fewer dependencies, with the largest solutions reaching up to 1,641 dependencies.</p><p>When turning to downstream packages however (Figure <ref type="figure">2c</ref>), the situation is quite asymmetrical, as there is a vastly longer tail of packages with massive amounts of downstream packages. The top 3 depended-upon packages that we observed were: 1) supports-color (does a terminal support color?, 624,883 downstream packages), 2) debug (logging library, 571,547 downstream packages), and 3) ms (time conversion library, 515,684 downstream packages). 
On the other hand, a large number of packages are unused except by a handful of downstream packages, with 50% of packages having 2 or</p><p>(a) An ECDF of the time in days between the publication of two versions of a package. Note that this plot specifically excludes updates for non-prerelease versions of packages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. RQ1: Version Constraint Usage</head><p>As described in Section III-A, developers can specify version constraints in different ways, which controls the installation of newer versions of those dependencies. Figure <ref type="figure">3</ref> shows the frequency of each main type of version constraint published in each year since 2010, the year that NPM launched. For each year, we include only packages that had at least one release published, and if a package released multiple versions in that year, we include only the most recently published non-prerelease version that year. In 2022, there were a total of 429,265 packages with at least one release, and across all the years 1,678,681 distinct packages.</p><p>There are several interesting trends in constraint usage over time. First, about 78.36% of all initial dependencies were specified as accepting any versions greater than some particular version (Geq, purple bars), such as "react" : "&gt;= 1.2.3". Developers then abandoned using Geq constraints within the first 3-4 years of NPM, likely because they became unmaintainable as libraries began to introduce breaking changes that would be automatically applied by Geq constraints. Second, even though constraints that are flexible in the minor component (Minor, green bars) currently represent a majority of dependencies, the phenomenon of using minor flexible constraints only started in 2014, and then rapidly expanded after. The expansion of minor flexible constraints coincides with the decreased usage of bug component flexible constraints (Bug, blue bars). Third, developers have recently gravitated towards using only two types of constraints almost exclusively: exact version constraints (Exact, red bars) and minor component flexible constraints. Together, those types represent over 94.85% of constraints in 2022. 
Finally, the percentage of dependencies that are potentially able to automatically receive updates (everything below the red bars) has stayed relatively stable throughout the entire life of NPM, and is currently about 87.32% of all dependencies.</p></div>
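<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the constraint categories concrete, the following is a minimal sketch of how constraint strings could be bucketed into the Exact, Bug, Minor, and Geq groups and how per-group percentages could be computed. It is an illustration only, not the classifier used in this study: the function names are hypothetical, only plain three-component versions are recognized, and NPM-specific cases (x-ranges, compound ranges, and the special handling of caret constraints on 0.x versions) are deliberately lumped into Other or ignored.</p>
<p><![CDATA[
```python
import re
from collections import Counter

# Assumption: only plain "MAJOR.MINOR.BUG" versions are recognized.
VERSION = r"\d+\.\d+\.\d+"


def classify_constraint(constraint: str) -> str:
    """Map one constraint string to Exact, Bug, Minor, Geq, or Other."""
    c = constraint.strip()
    if re.fullmatch(r"=?\s*" + VERSION, c):
        return "Exact"   # e.g. "1.2.3": never updates automatically
    if re.fullmatch(r"~\s*" + VERSION, c):
        return "Bug"     # e.g. "~1.2.3": flexible in the bug component
    if re.fullmatch(r"\^\s*" + VERSION, c):
        return "Minor"   # e.g. "^1.2.3": flexible in the minor component
    if re.fullmatch(r">=\s*" + VERSION, c):
        return "Geq"     # e.g. ">= 1.2.3": accepts any newer version
    return "Other"       # everything this sketch does not model


def constraint_shares(constraints):
    """Return the percentage of constraints that fall in each category."""
    counts = Counter(classify_constraint(c) for c in constraints)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.items()}
```
]]></p>
<p>For example, calling constraint_shares on the four constraints "1.0.0", "^1.2.3", "^2.0.0", and "~0.4.1" yields 25% Exact, 50% Minor, and 25% Bug under these simplifying assumptions.</p></div>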
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. RQ2: Semantic Versioning in Updates</head><p>While RQ1 examined the usage of semantic versioning when specifying dependencies, RQ2 examines the usage of semantic versioning in deploying releases of those dependencies. Figure <ref type="figure">4</ref> displays boxplots where each observation represents what percentage of a package's updates are one of the three semver increment types, normalized within each security effect. This analysis includes 1,401,510 packages and 16,547,653 updates, as described in Section III-B.</p><p>Fig. <ref type="figure">4</ref>: A boxplot visualizing the distribution of percentages of packages' updates by semver increment type, segmented across security effects. Within each security effect the percentages across semver increment types are normalized.</p><p>We find that in the no security effect category (the vast majority of updates), the most common updates by far are bug semver increments, with 75% of packages releasing 66% or more of their updates as bug increments (lower quartile of left-most red box). Next most popular are minor semver increments, and least popular are major semver increments.</p><p>However, when we consider updates that introduce vulnerabilities, we see a different story. Most packages introduce vulnerabilities via major semver increments, indicating that vulnerabilities are often introduced when package developers release major new versions, possibly consisting of many new features and significant structural changes to the code base. We did, however, find 29 outlier packages that introduced a vulnerability in at least one bug update. A particularly interesting example is an update to the ssri package (a cryptographic subresource integrity checking library, 23M weekly downloads) from version 5.2.1 to 5.2.2. 
The update attempted to patch a regular expression denial of service vulnerability, but inadvertently increased the severity of the vulnerability by changing the worst-case behavior from quadratic to exponential complexity <ref type="bibr">[32]</ref>. This highlights the challenge package developers face in needing to quickly release patches to vulnerabilities, while needing to be extremely careful when working on security-relevant code and releasing it through bug updates that will be easily distributed to downstream packages.</p><p>Finally, in the case of vulnerabilities being patched, almost all patches are released as bug semver increments, which means that the 87.32% of non-exact constraints shown in Figure <ref type="figure">3</ref> would potentially be able to receive them automatically. However, a handful of outlier packages have released vulnerability patches as non-bug updates (we found 358 such updates across 298 packages). From manual inspection, it appears that many of these updates include the fix for the security vulnerability mixed in with many other changes, rather than the vulnerability fix being released independently. For example, update 1.6.0 to 1.7.0 of the xmlhttprequest package (1.2M weekly downloads) fixed a high-severity code injection vulnerability <ref type="bibr">[33]</ref>. The security-relevant part of the update is only 1 line, but 892 lines were modified in the update. Without further investigation we do not know why some developers have chosen to include security patches as part of larger updates rather than as standalone updates.</p></div>
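<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three semver increment types discussed above can be illustrated with a short sketch that labels the step between two version strings as a bug, minor, or major increment and computes per-package percentages. This is a simplified illustration under stated assumptions, not the update-mining algorithm of Section III-B: the function names are hypothetical, prerelease and build suffixes are simply dropped, and the version list is assumed to already be in release order.</p>
<p><![CDATA[
```python
from collections import Counter


def parse_version(version: str):
    """Parse "MAJOR.MINOR.BUG", dropping any prerelease or build suffix."""
    core = version.split("-", 1)[0].split("+", 1)[0]
    major, minor, bug = (int(part) for part in core.split("."))
    return major, minor, bug


def increment_type(old: str, new: str) -> str:
    """Name the most significant component that differs between two versions."""
    o, n = parse_version(old), parse_version(new)
    if n[0] != o[0]:
        return "major"
    if n[1] != o[1]:
        return "minor"
    if n[2] != o[2]:
        return "bug"
    return "none"  # identical core version (e.g. only a suffix changed)


def increment_percentages(versions):
    """Percentage of consecutive updates of each increment type for one package.

    Assumption: `versions` is already sorted in release order.
    """
    kinds = [increment_type(a, b) for a, b in zip(versions, versions[1:])]
    if not kinds:
        return {}
    counts = Counter(kinds)
    return {k: 100.0 * counts[k] / len(kinds) for k in ("bug", "minor", "major")}
```
]]></p>
<p>Under this sketch, the ssri update from 5.2.1 to 5.2.2 is labeled a bug increment, and the xmlhttprequest update from 1.6.0 to 1.7.0 is labeled a minor increment.</p></div>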
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. RQ3: Out-of-Date Dependencies and Update Flows</head><p>1) How out-of-date are packages' dependencies?: Version constraints and semver update types work in tandem to control the flow of updates to downstream packages, across many chains of transitive dependencies. Whether a downstream package receives up-to-date dependencies depends not only on the constraints at the downstream package and the type of semver increment at the reverse dependency, but also on packages in the middle of a transitive dependency chain.</p><p>In this experiment, we select the latest version of every package with at least one update (1,401,510 packages). We then use our time-traveling variant of NPM to solve the package's dependencies at the time the latest version was uploaded (T_P). We then observe which of its installed dependencies are out-of-date, where a dependency with version V_D and upload time T_D is out-of-date if another version V&#8242;_D of the dependency has an upload time T&#8242;_D with T_D &lt; T&#8242;_D &lt; T_P. We then define the out-of-date time as T&#8242;_D - T_D for the largest such T&#8242;_D. After accounting for transient system failures, 696,419 packages were solved successfully.</p><p>Figure <ref type="figure">5a</ref> displays an ECDF of the distribution of the percentage of each package's dependencies that are out-of-date. There is a group of packages, about 17.08%, that have fully up-to-date dependencies. However, almost all of these have very few dependencies, only 3.17 dependencies on average compared to 167.87 dependencies for the whole sample. In other words, these fully up-to-date packages are packages that live primarily on the far left side of the ECDF in Figure <ref type="figure">2b</ref>.</p><p>Moving beyond the spike of up-to-date packages, most packages have at least some out-of-date dependencies, with 62.94% of packages having 25% or more of their dependencies out-of-date. 
Not only are packages often out-of-date, but they are often out-of-date for quite a while. Among packages with at least one out-of-date dependency, Figure <ref type="figure">5b</ref> displays an ECDF of how out-of-date each package's dependencies are on average. Half of all packages with out-of-date dependencies have dependencies that are on average 173.87 days old or older, with a long tail of 5% of packages with dependencies that are on average 527.38 days old or older. In contrast, updates are released within 21.03 days on average, and 50% are released within only 22.71 hours (Figure <ref type="figure">2a</ref>).</p><p>There can be a variety of reasons why packages have out-of-date dependencies, some of which are intentional, such as developers choosing to stay on older versions of libraries rather than rewrite code to handle breaking changes.</p><p>2) How rapidly do updates flow downstream?: We now wish to understand how updates flow to downstream packages, and how developers respond when manual intervention is required. For the most recent update prior to 2021 of every package, we randomly selected 50 downstream packages that</p><p>Fig. <ref type="figure">7</ref>: An ECDF plot of how long it takes for an update flow that is blocked to be resolved (x-axis: days for downstream to receive update; y-axis: cumulative percentage of flows).</p><p>downstream package specified a flexible enough constraint to allow for the intervention of the package in the middle to be adopted. Since this type of flow happens very rarely, this indicates that downstream packages typically have constraints that are equally or more restrictive than those of their (transitive) dependencies. 
This makes sense from a software engineering perspective, as the deeper packages (those closer to libraries rather than applications) have more incentive to use flexible constraints as they are likely to be reused in contexts with otherwise conflicting constraints.</p><p>The final type of flow is when the out-of-date dependency is eventually deleted rather than updated (&#8226; &#8226; &#8226; &#8594; intervention &#8594; deleted dependency). This occurs in only 0.29% of all analyzed update flows, indicating that developers do not generally delete dependencies. More investigation could be done to understand why developers choose to delete dependencies in a small number of cases.</p><p>Among the update flows that are blocked due to restrictive constraints, almost all update flows are unblocked via manual intervention quite rapidly. Figure <ref type="figure">7</ref> shows an ECDF of the distribution of how many days it takes for each update flow to be unblocked. The majority of blocked update flows (91.74%) are unblocked within 1 day, with a tail trailing off to 25 days or more. The surprising speed of update flows being unblocked is due largely to the fact that many packages that depend on each other are developed by the same contributors, and they will often bump version numbers and update dependencies of their packages nearly simultaneously.</p><p>Our results suggest that most updates effectively flow to downstream packages, while Figure <ref type="figure">5</ref> suggests that most downstream packages have at least some out-of-date dependencies. More investigation should be carried out on this phenomenon, but we suspect this is due in part to the number of dependencies per package (Figure <ref type="figure">2b</ref>) and the rate of updates (Figure <ref type="figure">2a</ref>). With packages having an average of 167 dependencies, and updates being released on average every 21 days, we would expect that for an average package multiple dependencies release updates every day and potentially go out-of-date. Even with many updates being adopted instantly or quickly, some dependencies will become stale. This phenomenon might also be explained by our methodology for this experiment, as we selected only packages that were already up-to-date at the time of our analysis.</p><p>Fig. <ref type="figure">8</ref>: A boxplot displaying the distribution of the percentage of packages' updates grouped by semver increment type that change only code (.js, .ts, .jsx, .tsx), only dependencies, both, or neither.</p></div>
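<div xmlns="http://www.tei-c.org/ns/1.0"><p>The out-of-date time defined in RQ3.1 can be sketched in a few lines. The sketch below assumes that a dependency counts as out-of-date when some other version was uploaded strictly after the installed version and strictly before the downstream package's own upload time; the strictness of both bounds, the function name, and the parameter names are assumptions made for illustration, and this is not the time-traveling resolver itself.</p>
<p><![CDATA[
```python
from datetime import datetime, timedelta
from typing import Iterable, Optional


def out_of_date_time(
    installed_upload: datetime,
    all_upload_times: Iterable[datetime],
    package_upload: datetime,
) -> Optional[timedelta]:
    """Return how out-of-date an installed dependency is, or None if up-to-date.

    installed_upload: upload time of the installed dependency version (T_D)
    all_upload_times: upload times of every version of that dependency
    package_upload:   upload time of the downstream package version (T_P)
    """
    # Versions published after the installed one but before the downstream
    # package was uploaded (assumed window: T_D < T'_D < T_P).
    newer = [t for t in all_upload_times if installed_upload < t < package_upload]
    if not newer:
        return None
    # The out-of-date time uses the latest such upload time.
    return max(newer) - installed_upload
```
]]></p>
<p>Averaging the returned durations over a package's out-of-date dependencies gives one observation of the kind plotted in an ECDF such as Figure 5b.</p></div>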
<div xmlns="http://www.tei-c.org/ns/1.0"><head>E. RQ4: Analyzing Code Changes in Updates</head><p>We now turn to inspecting the contents of package updates rather than metadata analysis. Semantic versioning can only be useful if package developers release updates that are in accordance with what downstream packages expect from bug, minor, or major semver increments. In this paper, we focus on providing a high-level characterization of what updates generally consist of in the NPM ecosystem, across the different update types. While more fine-grained analyses and related applications would be interesting and useful, they are beyond the scope of this paper, and we defer discussion of ongoing and future work to Section VI. However, we believe that our dataset may be a useful building block for evaluation within the active research area of update analysis systems.</p><p>Figure <ref type="figure">8</ref> displays a boxplot where each observation is the percentage of a package's updates within each semver increment type that change only code (.js, .ts, .jsx, .tsx), only dependencies, both, or neither. Note that updates categorized as neither may include other changes such as modifications to other file types (README, CSS, etc.) or other metadata changes besides dependencies. This uses the same set of packages and updates as in Figure <ref type="figure">4</ref>, intersected with those for which we were able to successfully download tarballs, giving in total 1,339,684 packages and 14,903,021 updates.</p><p>First, we see that bug updates often contain no changes to code files or to dependencies. 50% of packages change neither code nor dependencies in about 20% or more of their bug updates, while 25% of packages change neither in a majority (64%) of their bug updates. 
A manual inspection of the data suggests that some of these updates consist of changes to metadata (listed contributors, descriptions, READMEs) or to configuration files (.json, .yaml, etc.), while other updates truly change nothing. However, more investigation on our data could be done to quantify this more precisely. Second, while it is not common to do so, 25% of packages do occasionally release bug updates which only modify dependencies (11% or more of bug updates). Looking at minor and major updates, the frequency of packages modifying neither or only one or the other decreases, and when looking at major updates, most packages modify both code and dependencies simultaneously.</p></div>
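<div xmlns="http://www.tei-c.org/ns/1.0"><p>The four update categories used in RQ4 can be expressed as a small decision rule. The sketch below is an illustration under stated assumptions rather than the analysis pipeline used here: it assumes the list of changed file paths and the old and new package.json contents (as dictionaries) are already available, it compares only the "dependencies" field, and the function name is hypothetical.</p>
<p><![CDATA[
```python
# File extensions counted as code in RQ4.
CODE_EXTENSIONS = (".js", ".ts", ".jsx", ".tsx")


def categorize_update(changed_files, old_manifest, new_manifest) -> str:
    """Label an update as changing code only, dependencies only, both, or neither.

    changed_files: paths of files that differ between the two versions
    old_manifest / new_manifest: parsed package.json contents (dicts)
    """
    code_changed = any(path.endswith(CODE_EXTENSIONS) for path in changed_files)
    # Assumption: only the "dependencies" field is compared in this sketch.
    deps_changed = old_manifest.get("dependencies", {}) != new_manifest.get(
        "dependencies", {}
    )
    if code_changed and deps_changed:
        return "both"
    if code_changed:
        return "code only"
    if deps_changed:
        return "dependencies only"
    return "neither"
```
]]></p>
<p>An update that touches only README.md and the version field of package.json would be labeled neither by this rule, which matches the kind of metadata-only bug update described above.</p></div>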
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. DISCUSSION</head><p>Considering the results that we presented in Section V, we find a number of implications for software developers, ecosystem maintainers and researchers. Developers consuming dependencies face persistent trade-offs between security, reliability, and technical lag. We identify opportunities for ecosystem maintainers to reduce some of this friction and point towards longer-term research directions to address some of the underlying challenges in package ecosystems.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. For Developers</head><p>Our findings for RQ1 indicate that NPM has largely consolidated around using either exact or minor-flexible (&#710;) constraints, with the greatest proportion of dependencies specified as minor-flexible. In practice this means that minor updates will flow to downstream packages nearly as easily as bug updates, which we confirmed experimentally in RQ3.2, with 95.42% of sampled bug updates and 86.55% of sampled minor updates flowing automatically to downstream packages. This finding is important for library maintainers, who might expect that downstream packages will manually inspect minor updates for compatibility.</p><p>Overall, there is a misalignment between the way that versions are released and the way that they are depended on, as versions that are released as minor vs. bug updates commonly have distinct characteristics (Figures <ref type="figure">4</ref> and <ref type="figure">8</ref>), while dependencies in downstream packages rarely distinguish between minor and bug updates (Figure <ref type="figure">3</ref>). Specifically, we find that 81.19% of updates are released as bug updates, but 84.01% of dependency constraints accept both bug and minor updates. While both bug and minor updates are supposed to maintain backwards compatibility, minor updates may be more likely to include (inadvertent) breaking changes, so developers may gain stability by using bug-flexible (&#8764;) constraints rather than minor-flexible constraints, which would still receive 81.19% of updates. This motivation may be even stronger for security-cautious developers, as our results suggest that minor updates introduce vulnerabilities more often than bug updates; however, they must remain careful, as even bug updates occasionally introduce vulnerabilities.</p></div>
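<div xmlns="http://www.tei-c.org/ns/1.0"><p>The practical difference between bug-flexible (tilde) and minor-flexible (caret) constraints can be shown with a simplified acceptance check. This sketch assumes plain three-component versions with a major component of at least 1 and handles only exact, tilde, and caret constraints; it is not NPM's resolver, and the function names are hypothetical.</p>
<p><![CDATA[
```python
def parse(version: str):
    """Parse a plain "MAJOR.MINOR.BUG" string into a tuple of integers."""
    major, minor, bug = (int(part) for part in version.split("."))
    return (major, minor, bug)


def accepts(constraint: str, candidate: str) -> bool:
    """Check whether a candidate version satisfies an exact, ~, or ^ constraint.

    Assumption: major versions are 1 or greater, so the special treatment
    that NPM gives to caret constraints on 0.x versions is not modeled.
    """
    cand = parse(candidate)
    if constraint[0] in "~^":
        base = parse(constraint[1:])
        if cand < base:
            return False                    # never accept older versions
        if constraint[0] == "~":
            return cand[:2] == base[:2]     # same major and minor: bug updates only
        return cand[0] == base[0]           # same major: bug and minor updates
    return cand == parse(constraint)        # exact constraint
```
]]></p>
<p>With a base version of 1.2.3, the caret form accepts both 1.2.9 and 1.3.0, the tilde form accepts 1.2.9 but not 1.3.0, and neither accepts 2.0.0. This is the trade-off discussed above: the tilde form still receives bug updates while declining minor updates.</p></div>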
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. For Ecosystem Maintainers</head><p>Our findings in RQ2 indicated that some developers release security patches with minor and sometimes even major version increments. This finding is concerning as it makes it more difficult for downstream packages to receive the security fixes. This suggests that ecosystems may benefit from ecosystem maintainers communicating more closely with package developers around security patches, and helping to ensure that security patches are released in a timely manner, with minimal changes, and as semver bug updates.</p><p>Our findings in RQ3.2 show a small fraction of update flows that are blocked by dependencies in the middle. This is perhaps the most frustrating case for developers, as it is difficult to remedy the situation. One option is to use NPM's overrides feature <ref type="bibr">[34]</ref>, which allows the downstream package to forcefully override versions of transitive dependencies, even if this breaks version constraints. While this can be effective in the short term, one challenge is that the developer now has the maintenance burden of removing the override when it is no longer necessary, or else faces broken builds in the future. To improve the developer experience, ecosystem maintainers could 1) reduce the frequency of update propagation blockage by combining our analysis with centrality analysis to find critical packages that often block update flows, and work with them to address the situation; and 2) improve ecosystem tooling around overrides to help developers automate the removal of overrides when no longer necessary.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. For Researchers</head><p>Our findings in RQ2 indicate that while NPM developers generally try to follow semver conventions, they do not always do so consistently, and thus developers of downstream packages cannot be entirely confident about what exactly they will receive when updating dependencies (particularly if malicious developers release malware!). This suggests a useful and broad design space of static or dynamic program analysis tooling that could help give insight on what actually changes in an update. Such tools could aim to check for semver compliance <ref type="bibr">[35]</ref>, <ref type="bibr">[36]</ref>, check that an update actually patches the claimed vulnerability correctly, check for likely buggy changes in behavior <ref type="bibr">[37]</ref>, or detect malware. It may be particularly interesting to examine trends in semver compliance over time, as our analysis shows clear trends in the changing popularity of dependency constraints between 2010 and 2022.</p><p>There is already promising ongoing work in some of these directions, particularly malware detection <ref type="bibr">[7]</ref>, <ref type="bibr">[22]</ref>, <ref type="bibr">[38]</ref> via metadata and lightweight syntactic features. In RQ4 we found that a significant portion of packages publish bug updates that change neither .js, .ts, .jsx, or .tsx files nor dependencies, which suggests that a sizeable portion of updates may be changing other types of files, and such changes may be an effective place for bad actors to hide malicious changes. Whether the aim of this work is malware detection, bug detection, or other analyses, our results suggest that such tooling should aim to handle multiple file types, such as code, config files, embedded binaries, shell scripts, etc.</p><p>The analysis in RQ3.1 finds that there is a substantial amount of technical lag in NPM packages, so tooling to help developers reduce technical lag could be quite impactful. 
In our prior work we built a tool, MAXNPM <ref type="bibr">[14]</ref>, which allows developers to solve dependencies in a way that minimizes technical lag (or other objectives) while still satisfying current version constraints, but does not help when constraints themselves are out-of-date. Complementary future research, such as what Jayasuriya suggests <ref type="bibr">[39]</ref>, could assist developers in performing these manual updates by helping with code migration in response to breaking changes.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VII. THREATS TO VALIDITY</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. External Validity</head><p>We were unable to reliably scrape packages that have been deleted (for malware, copyright violations, etc.) or unpublished (voluntarily by the developer) from NPM, and thus we excluded these in our analyses. For this reason our results might not generalize to malware or other types of packages that are often deleted.</p><p>Other than deleted packages, we consider the entire ecosystem, including so called "trivial" packages <ref type="bibr">[8]</ref>, <ref type="bibr">[9]</ref> and packages that seem unimportant (e.g. few reverse dependencies). We believe that it is difficult to tell if a package truly is irrelevant, as even a package with very few reverse dependencies may in fact be an application that has been published on NPM. Furthermore, "low-impact" developers are nevertheless important as their experience with NPM matters for the future of the ecosystem.</p><p>We only obtain packages from NPM, and do not consider GitHub or other sources. As such, this study may not generalize to JavaScript applications (rather than libraries), as only some developers choose to publish their applications on NPM. In addition, some developers may include dependencies by directly copying source files into their packages, which we do not detect. Finally, it is important to be careful when generalizing our results about security vulnerabilities, as we are only able to obtain information about known vulnerabilities, which is likely a small subset of all vulnerabilities.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Internal Validity</head><p>Our system described in Section IV has a lot of moving parts, and it is possible that there are bugs in our system that could affect the results of our experiments. For example, we may have missed some packages in our scraping process, or we may have incorrectly downloaded some packages. We believe that this is unlikely, as we have written unit tests for our system and have tested it on a small subset of packages, and have not found any bugs.</p><p>While running millions of package installations for RQ3 (Section V-D) we caused intermittent failures on our compute cluster by overflowing /tmp. Since these failures were a function of system state and not of packages, we do not believe this biased our results. To check this, we computed the mean and median of the number of direct dependencies of successful and failed packages, and found that successful packages have a mean of 9.91 direct dependencies and a median of 5, while the failed packages have a mean of 10.93 direct dependencies and a median of 6. This suggests that failed packages were a bit larger, but not enough to make our successful packages unrepresentative. Outliers with a large number of dependencies existed among both failed and successful packages.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Construct Validity</head><p>Throughout RQ2-RQ4 we use our algorithm for computing updates as described in Section III-B. Since there is no ground truth for correctly mined updates, one may wish to consider refinements to this algorithm. In particular, one may wish to have fine-grained equivalence classes by considering minor components as well. However, this would not change the results where our algorithm already succeeds, and since the rejection rate is already quite low (1.8%) we did not believe a more complex algorithm justified the risk of analysis bugs.</p><p>In RQ4 we defined code changes to mean files with extensions .js, .ts, .jsx, or .tsx. This is because we wanted to focus on JavaScript and TypeScript code, but this may have caused us to miss some JavaScript or TypeScript code with other extensions. Depending on the purpose, future work might want to consider a broader definition of what counts as code, such as shell scripts.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VIII. CONCLUSION</head><p>We present a large-scale analysis of semantic versioning in NPM, and a full, reusable dataset of complete package metadata and tarball data from NPM. We find that there is a higher risk of security vulnerabilities being introduced through minor rather than bug (i.e. patch) semver updates, suggesting a motivation for developers to use bug-flexible constraints (&#8764;), even while the NPM ecosystem has largely abandoned them in favor of minor-flexible constraints (&#710;). While we find that most security patches are introduced in bug updates, we find a disturbing set of outliers that are released as minor or even major updates, potentially causing slower adoption of security patches. Future work examining the NPM ecosystem might build on our dataset and tooling, examining the contents of updates for bugs and/or vulnerabilities, along with mechanisms to mitigate technical lag.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IX. DATA AVAILABILITY</head><p>Our artifact, permanently archived on Zenodo <ref type="bibr">[15]</ref>, contains our tools and the metadata from our dataset. At <ref type="url">https://dependencies.science</ref> we post: 1) continually updating snapshots of our data (including the contents of all packages), and 2) the full implementations of both our scraping systems and our data analysis scripts. We intend the site to be a useful resource for other researchers looking at NPM.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0"><p>In this paper we use "bug" rather than the standard "patch" semver terminology, so as to disambiguate from the notion of security patches.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1"><p>Please see https://dependencies.science for access to up-to-date metadata, tarball data, and source code. The original artifact excluding tarball data is available on Zenodo <ref type="bibr">[15]</ref>.</p></note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2"><p>excluding deleted content, which we describe in Section VII</p></note>
		</body>
		</text>
</TEI>
