<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Decentralization in Bitcoin and Ethereum Networks.</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>05/01/2018</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10094537</idno>
					<idno type="doi"></idno>
					<title level='j'>Proc. of the Financial Cryptography and Data Security Conference</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Adem Efe Gencer</author><author>Soumya Basu</author><author>Ittay Eyal</author><author>Robbert van Renesse</author><author>Emin Gün Sirer</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Blockchain-based cryptocurrencies have demonstrated how to securely implement traditionally centralized systems, such as currencies, in a decentralized fashion. However, there have been few measurement studies on the level of decentralization they achieve in practice. We present a measurement study on various decentralization metrics of two of the leading cryptocurrencies with the largest market capitalization and user base, Bitcoin and Ethereum. We investigate the extent of decentralization by measuring the network resources of nodes and the interconnection among them, the protocol requirements affecting the operation of nodes, and the robustness of the two systems against attacks. In particular, we adapted existing internet measurement techniques and used the Falcon Relay Network as a novel measurement tool to obtain our data. We discovered that neither Bitcoin nor Ethereum has strictly better properties than the other. We also provide concrete suggestions for improving both systems.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1 Introduction</head><p>Cryptocurrencies are emerging as a new asset class, with a market capitalization of about $150B as of September 2017 <ref type="bibr">[15]</ref>, a growing ecosystem, and a diverse community. The most prominent platforms, which account for over 70% of this market, are Bitcoin <ref type="bibr">[57]</ref> and Ethereum <ref type="bibr">[28,</ref><ref type="bibr">70]</ref>. The underlying technology, the blockchain, achieves consensus in a decentralized, open system and enables innovation in industries that conventionally relied upon trusted authorities. Some examples of such services include land record management <ref type="bibr">[3]</ref>, domain name registration <ref type="bibr">[51]</ref>, and voting <ref type="bibr">[55]</ref>. The key feature that empowers such services and makes these platforms interesting is decentralization. Without it, such services are technologically easy to construct but require trust in a centralized administrator.</p><p>Decentralization is a property regarding the fragmentation of control over the protocol. In the Bitcoin and Ethereum protocols, users submit transactions for miners to sequence into blocks. Better decentralization of miners means higher resistance against censorship of individual transactions. For communication, Bitcoin and Ethereum also have a peer-to-peer network for disseminating block and transaction information. Both Bitcoin and Ethereum also contain full nodes, which serve two critical roles: (1) to relay blocks and transactions to miners and (2) to answer queries for end users about the state of the blockchain. Understanding the network properties of full nodes is crucial for protocol design and for analysis of each network's resilience to attacks. Ongoing research explores ways to make the Bitcoin and Ethereum networks more decentralized, but without measurements of the underlying network. 
Hence, debates and decisions about the underlying networks are often based on assumptions rather than measurement.</p><p>In this paper, we present a comprehensive measurement study on decentralization metrics in these operational systems and shed light on whether or not existing assumptions are satisfied in practice. We adapt prior Internet measurement techniques for Bitcoin and Ethereum and use novel approaches to obtain application layer data. Our main data sources are (1) direct measurements of these networks from multiple vantage points, (2) a Bitcoin relay network called Falcon that we deployed and operated for a year, and (3) blockchain histories of Bitcoin and Ethereum. Our study presents findings regarding the network properties, impact of protocol requirements, security, and client interactions.</p><p>This paper makes three contributions. First, it provides new tools and techniques for measuring blockchain-based cryptocurrency networks. The key tool introduced here is the Falcon relay network that we built to serve as a backbone for ferrying blocks. This network was deployed for Bitcoin across five continents, providing a unique vantage point on pruned blocks. Second, we perform a comparative study of decentralization metrics in Bitcoin and Ethereum. Our key findings are: (1) the Bitcoin network can increase the bandwidth requirements for nodes by a factor of 1.7 and keep the same level of decentralization as 2016, (2) the Bitcoin network is geographically more clustered than Ethereum, with many nodes likely residing in datacenters, (3) Ethereum has lower mining power utilization than Bitcoin and would benefit from a relay network, and (4) small miners experience more volatility in block rewards in Bitcoin than Ethereum.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2 Bitcoin and Ethereum</head><p>Bitcoin and Ethereum use Nakamoto consensus <ref type="bibr">[5-7, 57, 38]</ref> to regulate transaction serialization in their blockchains. While architecturally very similar, these systems differ significantly in terms of their API, abstractions, and wire protocol.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">The Bitcoin Protocol</head><p>Bitcoin is a protocol that sequences transactions into groups called blocks. The protocol targets a block production interval of 10 minutes with a maximum size of 1 MB. At the time of our measurements, the last 100 blocks had a 0.99 MB median block size and a 9.8 minute mean interval. The wire protocol implements a peer-to-peer network based on flooding block and transaction announcements.</p><p>The peer-to-peer network is formed through point-to-point links. To form a link, clients establish a TCP connection and perform a protocol-level three-way handshake. The protocol-level handshake exchanges the state of each client, such as the height of the blockchain and a version string associated with the software</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>3 Measurement Infrastructure</head><p>Blockchain-based cryptocurrencies operate on global peer-to-peer networks that span multiple administrative domains. Measurement of such networks concerns the exploration of the relationship between peers, the capabilities of individual peers, and the properties of the system as a whole, e.g., its security and fairness. To characterize Bitcoin and Ethereum, we deployed the Blockchain Measurement System (BMS), a measurement system that ran experiments of varying duration, from a few days up to 12 months.</p><p>Network Properties. BMS uses multiple vantage points in order to gain a comprehensive view of the cryptocurrency networks. To capture the evolution of these networks, BMS has been continuously collecting data regarding the provisioned bandwidth of peers and peer-to-peer latency. BMS first connects to a peer, collects measurements, and then disconnects before proceeding to the next peer. These measurements target (1) Bitcoin nodes connected over IPv4, IPv6, and Tor <ref type="bibr">[23]</ref> and (2) Ethereum nodes connected over IPv4. 
As of May 2017, Ethereum does not have any Tor nodes, mainly because Tor is exclusively TCP, whereas Ethereum node discovery is UDP-based. Moreover, this study excludes Ethereum's IPv6 network because BMS was unable to discover enough nodes to reach generalized conclusions. Table <ref type="table">1</ref> shows the timeline of the data collection for each network and the number of nodes measured in each measurement.</p><p>To estimate the peer-to-peer latency, BMS uses multiple vantage points geographically distributed across the world. Figure <ref type="figure">1</ref> shows the geographic distribution of the measurement infrastructure. Of the 18 nodes, 15 reside in PlanetLab's global research network <ref type="bibr">[14]</ref> and the remaining three are part of Cornell's academic network, located in Ithaca, NY.</p><p>To measure the provisioned bandwidth of nodes in Bitcoin and Ethereum, BMS used nodes with extensive resources. In particular, measuring the maximum bandwidth that Bitcoin and Ethereum nodes have access to requires nodes with (1) high download capacities to ensure that the bottlenecks are not in the measurement apparatus, and (2) sufficient disk capacities to store detailed results. Since these machines need access to orders of magnitude higher bandwidth capacity than what is achievable on shared infrastructure, such as PlanetLab nodes, some BMS data was collected using dedicated, well-provisioned beacon nodes located at Cornell University.</p><p>Finally, BMS needs to pick a sample of nodes from the Bitcoin and Ethereum networks. As a sample, BMS uses a list containing nodes from Bitcoin and Ethereum node crawling sites <ref type="bibr">[1,</ref><ref type="bibr">31]</ref>, and a locally deployed Ethereum supernode configured with a high peer limit. Interpretations in this paper assume that inferences made from the reachable public nodes are representative of their entire networks. 
In reality, these networks contain nodes that are not visible to the public, e.g., because they are behind a NAT or a firewall. One such class of nodes is part of mining. While much of the mining infrastructure is private, prior measurement work shows that mining operations often have gateway nodes to communicate with the peer-to-peer network <ref type="bibr">[56]</ref>. The properties of internal mining pool nodes are orthogonal to the focus of this paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Blockchain Information.</head><p>A naive approach to obtaining information about the blockchain would be to simply run a Bitcoin and an Ethereum node. However, this approach misses information that cannot be obtained through the respective wire protocols. Many important decentralization metrics center around the analysis of blocks that are not part of the main blockchain. In Ethereum, many of these blocks become uncles, which can simply be requested through the wire protocol.</p><p>In Bitcoin, however, a block that is not part of the main blockchain simply becomes pruned. Pruned blocks in Bitcoin have no effect on the state of the system; they are deleted by clients without impacting correctness. Thus, it is crucial to connect directly to miners to capture pruned blocks.</p><p>A critical component of BMS to observe pruned blocks is the Falcon Relay Network, which relays blocks between Bitcoin miners. The Falcon Relay Network uses cut-through routing to quickly disseminate blocks worldwide, which incentivizes miners to connect to Falcon. Indeed, Falcon is directly connected to at least 36.4% of the entire hashpower in Bitcoin. Since there is just one other operational relay network for Bitcoin <ref type="bibr">[18,</ref><ref type="bibr">16]</ref>, Falcon has observed blocks that have not been seen on other well-connected nodes <ref type="bibr">[8]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>4 Measurements</head><p>In this section, we present the measurements taken by BMS. In each measurement, we describe the methodology, followed by the results of our analysis. As with any measurement study of a large-scale, uninstrumentable artifact, measurements are not perfect; we conclude each section by addressing some potential sources of error and their mitigation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Provisioned Bandwidth</head><p>Provisioned bandwidth is an estimate of a node's transmission capacity, characterizing how much bandwidth the node has to communicate with the rest of the cryptocurrency network. Greater provisioned bandwidth helps miners propagate blocks to, and collect blocks from, the network faster. Thus, it becomes more difficult for a malicious miner to situate themselves in the network to achieve the rushing property <ref type="bibr">[35]</ref> and attack the blockchain. Knowledge of provisioned bandwidth also aids in setting protocol parameters, such as the block size and frequency. Methodology. BMS measures the provisioned bandwidth of each peer by requesting a large amount of data from the peer and observing how fast the peer can stream the data to BMS's measurement nodes. BMS does this by asking for blocks that were first seen over a year ago, similar to how a stale node asks for blocks to sync state. Each request asks for the same set of blocks in Bitcoin and blocks or the corresponding bodies in Ethereum. Next, BMS divides the time into epochs and records the number of bytes received during each epoch. This process continues until either BMS receives all data or a predefined timeout of 30 seconds is reached. A long timeout helps BMS eliminate effects from TCP slow start and other initialization noise, as well as identify and eliminate spurious spikes in throughput caused by kernel buffering. Finally, BMS of 56 Mbit/s, as of February 2017. In other words, the provisioned bandwidth of a typical full node is now 1.7&#215; what it was in 2016.</p><p>Critical system parameters, such as the maximum block size and block frequency, can be increased when the provisioned bandwidth increases. The increase in provisioned bandwidth suggests that the block size can be increased by a factor of 1.7 without increasing centralization beyond its de facto level in 2016. Caveats. 
As with every measurement technique in the real world, our results above are subject to experimental limitations and expected errors. The accuracy of the measurements may drop under certain circumstances, including the cases where: (1) the network bottleneck lies on the side of the measurement beacon rather than the remote peer, (2) network traffic on the side of BMS interferes with the collected results, (3) the remote peer intentionally shapes the traffic to selectively limit the bandwidth available to BMS, for instance via bandwidth throttling, and (4) the steady-state bandwidth consumption differs between Bitcoin and Ethereum, skewing the numbers for one system over another. The setup of our bandwidth infrastructure helps minimize potential inaccuracies due to the first two issues. Moreover, analysis of popular Bitcoin <ref type="bibr">[5]</ref> and Ethereum client implementations <ref type="bibr">[42,</ref><ref type="bibr">61,</ref><ref type="bibr">19,</ref><ref type="bibr">60]</ref> shows that the third case is not supported by this software and would require additional, potentially non-trivial, work to set up. To verify the impact of the last issue, we ran an Ethereum and a Bitcoin client and saw that their bandwidth consumption differed by 0.2 Mbps, which introduces about a 1% error on our measurements above.</p><p>In addition to our analysis above, we also expect to see certain artifacts in our data. As noted above, we see clusters of nodes around 10 Mbps and 100 Mbps, which are typical bandwidth capacities of home and EC2 users, respectively.</p></div>
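<div xmlns="http://www.tei-c.org/ns/1.0"><p>The epoch-based procedure described in the methodology of Section 4.1 can be illustrated with a minimal sketch. The text does not state how BMS aggregates the per-epoch byte counts, so the aggregation below (discard a few warm-up epochs to skip TCP slow start, then take the median to suppress buffering spikes) is an assumption made for illustration only. The function name, the one-second epoch length, and the sample values are likewise hypothetical.</p>
<p><![CDATA[
```python
from statistics import median


def provisioned_bandwidth_mbps(bytes_per_epoch, epoch_seconds=1.0, warmup_epochs=2):
    """Estimate a peer's throughput in Mbit/s from per-epoch byte counts.

    Assumption (not from the paper): the first `warmup_epochs` epochs are
    dropped to skip TCP slow start, and the median of the remaining epochs
    is reported so that isolated spikes do not dominate the estimate.
    """
    if epoch_seconds <= 0:
        raise ValueError("epoch_seconds must be positive")
    steady = bytes_per_epoch[warmup_epochs:]
    if not steady:
        raise ValueError("not enough epochs after warm-up")
    # Convert bytes per epoch to megabits per second.
    rates = [8 * count / epoch_seconds / 1e6 for count in steady]
    return median(rates)


# Hypothetical byte counts: two slow-start epochs, a steady 10 Mbit/s
# stream, and one spurious 40 Mbit/s spike.
samples = [200_000, 900_000, 1_250_000, 1_250_000, 5_000_000, 1_250_000, 1_250_000]
estimate = provisioned_bandwidth_mbps(samples)
print(estimate)
```
]]></p>
<p>A median is used here because a single buffered burst would inflate a mean; other robust estimators would serve equally well under the same assumptions.</p></div>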
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Network Structure</head><p>The structure of the peer-to-peer network impacts the security and performance of cryptocurrencies. A geographically clustered network can quickly propagate a new block to many other nodes. This makes it more difficult for a malicious miner to propagate conflicting blocks or transactions faster than honest nodes. However, a less clustered network may mean that full nodes are being run by a wider variety of users, which is also good for decentralization. Methodology. Since it is not possible to obtain direct measurements between peers we do not control, we use state-of-the-art estimation techniques to establish bounds and gain insights into network structure.</p><p>Single Beacon Latency. We first collect direct ICMP ping measurements from BMS nodes to all peers in the network. We report the minimum observed ping latency, as it provides a physical bound on the distance to the BMS beacon.</p><p>Peer-to-Peer Latency. Measuring the peer-to-peer latency requires access to the end points. In both Bitcoin and Ethereum, peers do not reveal their neighbors. Hiding the network structure boosts privacy and security <ref type="bibr">[45,</ref><ref type="bibr">56]</ref>, but also makes it harder to infer properties about the network. BMS provides latency estimates for a superset of the actual links between known peers. We do not normalize for the slightly different network sizes, 3390 for Bitcoin and 4302 for Ethereum, as our samples from both networks were very similar. Since measuring peer-to-peer latencies directly is not feasible, we establish bounds from observed latencies from multiple beacons, using techniques from prior literature <ref type="bibr">[37]</ref>. BMS starts with the measurements taken from a single beacon. Then, it uses the triangle inequality to estimate the upper and lower bounds for the latency between peers. 
Repeating this process from other vantage points yields a set of bounds for each pair of peers. Finally, BMS determines a range for the latency estimate between each pair of peers by picking the maximum lower bound and the minimum upper bound. The paper also presents the average of the lower bound and upper bound latency between peers. In this study, BMS includes nodes that do not support the DAO fork <ref type="bibr">[10]</ref> in its measurements for Ethereum.</p><p>Geographical Distance. BMS takes the minimum of repeated latency measurements to eliminate transient network effects and capture the geographic distance between two nodes <ref type="bibr">[43,</ref><ref type="bibr">13,</ref><ref type="bibr">69]</ref>. BMS also uses IP geolocation data to calculate distances between peer nodes as an additional validation of our results. To calculate distances, BMS applies the Haversine formula <ref type="bibr">[63]</ref> using the coordinate values gathered from an IP-based geolocation service <ref type="bibr">[46]</ref>.  <ref type="table">2</ref> and PDF graphed in Figure <ref type="figure">3</ref>. We find that Bitcoin has many more nodes that are closer geographically than Ethereum. Figure <ref type="figure">3</ref> shows that Ethereum's most likely latencies are centered around 120ms, while Bitcoin nodes tend to be clustered around 50ms. Only 13% of Ethereum latencies are under 100ms, while Bitcoin has a surprisingly high 46%. Additionally, the estimated peer-to-peer latency between Ethereum nodes is 26.7% higher on average than that between Bitcoin nodes. This geographic proximity between nodes, along with the observation that Bitcoin has many nodes with 100 Mbps of provisioned bandwidth (see Section 4.1), seems to indicate that many Bitcoin nodes are run in datacenters. 
56% of Bitcoin's nodes and 28% of Ethereum's nodes belong to an autonomous system that provides dedicated hosting services, a difference that is significant at the 1% level.</p><p>Indeed, Ethereum nodes are not accumulated in a single geographical region, but are more evenly distributed around the world. Figure <ref type="figure">3c</ref> shows the CDF of distances between peer-to-peer nodes based on IP geolocation information. The results corroborate our findings based on network latency measurements and show that Ethereum nodes are geographically further apart than Bitcoin nodes.</p><p>there is no guarantee that they necessarily will do so. We suspect that explicit storage of uncles in Ethereum captures a larger proportion of pruned blocks. Results. Figure <ref type="figure">7</ref> shows the distribution of fairness of the 20 miners with the highest mining power. The results indicate that, in both networks, the top four miners generally are more successful at appending blocks to the main chain. We run the Kolmogorov-Smirnov goodness-of-fit test at a significance level of 0.01 to compare the fairness distributions of Bitcoin and Ethereum. Perhaps surprisingly, we see that the fairness of Ethereum and Bitcoin differ significantly from each other over the same time period. The reason for this difference is a much larger standard deviation in Bitcoin's miner fairness compared to Ethereum (1.72 versus 0.25). The means of both fairness distributions, however, are very similar, with Ethereum at 1.08 and Bitcoin at 1.22.</p><p>A high variance results in centralization pressure since smaller miners will have a more difficult time affording the loss of revenue due to a transiently high fairness score. 
This high variance is a direct result of a significantly smaller number of blocks being generated in Bitcoin. Since Ethereum has a higher block frequency than Bitcoin, smaller miners have a more predictable payoff than larger miners. This makes Ethereum more predictable to mine for smaller miners due to the lower variance in block rewards. Thus, it is important for blockchain protocols to take into account the variance of the block rewards in addition to the mean.</p><p>Simply increasing the block frequency may not be the solution to decrease the variance of block rewards, since the mining power distribution may be affected as well. The increased block frequency in Ethereum may be part of the cause of the slightly more centralized mining power distribution (see Section 4.3).</p><p>Sanity Checks. Similar to Section 4.4, our results here also assume that miners voluntarily identify themselves in uncles and pruned blocks. As before, if the miners are lying, they are likely to present a fairer system than reality. Another caveat here lies in gathering pruned blocks. While we incentivize miners to relay blocks through Falcon, there is no guarantee that they will. We suspect that explicit storage of uncles in Ethereum allows for more accurate analysis.</p><p>Finally, Bitcoin has a significantly lower block generation frequency than Ethereum. On top of that, Bitcoin also has a lower pruned block rate than Ethereum does, which means it has significantly fewer pruned blocks. Thus, this fairness metric is much noisier in Bitcoin compared to Ethereum.</p></div>
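<div xmlns="http://www.tei-c.org/ns/1.0"><p>The link between block frequency and reward variance discussed above can be made concrete with a small sketch. It assumes, for illustration only, that each block is won independently by a miner with probability equal to its share of the mining power, so the number of blocks won in a window follows a binomial distribution. The 10-minute Bitcoin interval comes from Section 2.1; the 15-second Ethereum interval and the 1% hash share are hypothetical values chosen for the example, not measurements from this study.</p>
<p><![CDATA[
```python
import math


def reward_cv(hash_share, blocks_in_window):
    """Coefficient of variation of the number of blocks a miner wins.

    Assumes a binomial model: each of `blocks_in_window` blocks is won
    independently with probability `hash_share`. The result is the
    standard deviation divided by the mean, sqrt((1 - p) / (n * p)).
    """
    if not 0 < hash_share <= 1:
        raise ValueError("hash_share must be in (0, 1]")
    if blocks_in_window <= 0:
        raise ValueError("blocks_in_window must be positive")
    return math.sqrt((1 - hash_share) / (blocks_in_window * hash_share))


# Blocks per day: 10-minute interval for Bitcoin (Section 2.1) and an
# assumed 15-second interval for Ethereum (hypothetical, for illustration).
BITCOIN_BLOCKS_PER_DAY = 24 * 60 // 10
ETHEREUM_BLOCKS_PER_DAY = 24 * 60 * 60 // 15

share = 0.01  # hypothetical small miner holding 1% of the mining power
btc_cv = reward_cv(share, BITCOIN_BLOCKS_PER_DAY)
eth_cv = reward_cv(share, ETHEREUM_BLOCKS_PER_DAY)
print(f"Bitcoin daily reward CV:  {btc_cv:.3f}")
print(f"Ethereum daily reward CV: {eth_cv:.3f}")
```
]]></p>
<p>Under this model the relative spread of a miner's daily reward shrinks with the square root of the number of blocks per day, which is consistent with the observation that a higher block frequency gives small miners a more predictable payoff. The model ignores pruned blocks, uncles, and changes in mining power.</p></div>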
<div xmlns="http://www.tei-c.org/ns/1.0"><head>5 Related Work</head><p>Network measurements in blockchain-based systems have mainly focused on Bitcoin. One such study <ref type="bibr">[22]</ref> demonstrated that latency is the dominating factor in the propagation of blocks smaller than 20 KB. Follow-up work <ref type="bibr">[20]</ref> has shown that (1) this limit has increased to 80 KB and (2) nodes are provisioned with substantially higher bandwidth capacity than what the protocol demands. Feld et al. <ref type="bibr">[36]</ref> pointed out a strong AS-level centralization that may impact the Bitcoin network's connectivity, i.e., 10 ASes contain over 30% of peers. Recent work <ref type="bibr">[2]</ref> presented the level of vulnerability, showing that 13 ASes cover the same fraction of peers, but only 39 IP prefixes host half of the overall mining power. Ours is the first work that does a similar type of study on Ethereum as well.</p><p>Other work studied various aspects of the Bitcoin overlay network. Miller et al. <ref type="bibr">[56]</ref> found that a small fraction of the network, containing around 100 nodes, represents more than 75% of the mining power. The study conjectured that these nodes are well-connected to major mining pools and hence provide higher efficiency in broadcasting blocks. Biryukov et al. <ref type="bibr">[4]</ref> examined how peer neighbors discover IP addresses that correspond to pseudonymous identities. Another study <ref type="bibr">[49]</ref> deanonymized peers by observing anomalous relaying behavior in the network. Pappalardo et al. <ref type="bibr">[59]</ref> observed that low value transactions may experience waiting times of over a month. Other work measured churn and geolocated peers <ref type="bibr">[24]</ref>. Gervais et al. <ref type="bibr">[40]</ref> discussed centralization concerns regarding the client development process, distribution of mining power, and spendable coins. 
Most of these works focus on attacks and the structure of the overlay network, while this work focuses on the resource capabilities of the nodes used in the overlay network.</p><p>Recent work presented ways to reduce resource requirements to participate in blockchain systems. Such solutions enhance decentralization by increasing the diversity of participants. Aspen <ref type="bibr">[39]</ref> achieves this through sharding the blockchain. In this system, users store, process, and propagate only the data that is relevant to them, and hence need fewer resources to join the network. Another approach <ref type="bibr">[62]</ref> relies on authenticated data structures to reduce load on nodes. Relay networks increase network efficiency through faster block propagation. The first such system <ref type="bibr">[16]</ref> achieved this by avoiding full block verification and retransmitting known transactions. Falcon, the source of pruned block data in the Bitcoin network in this paper, relies on cut-through routing for faster block propagation. Finally, FIBRE incorporates cut-through routing with compact blocks <ref type="bibr">[17]</ref> and forward error correction over UDP. The novelty of our work lies in using Falcon data to gain insights into transient application layer information.</p><p>Blockchain explorers <ref type="bibr">[65,</ref><ref type="bibr">8,</ref><ref type="bibr">32,</ref><ref type="bibr">33]</ref> provide a variety of data on cryptocurrency networks, including online blockchain history; statistics on blockchain components, transaction fees, and market value; and node information. While these sources of information are useful to the community, this work scientifically tests whether the intuitions provided by these sources of information indeed hold.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>6 Conclusion</head><p>Decentralization in blockchain-based platforms is a component of the value proposition these systems offer. This work presents a comparative assessment of decentralization in the two most popular cryptocurrencies, Bitcoin and Ethereum. To do so, it relies on novel measurement techniques to obtain application layer information using the Falcon Network and on the application of well-established internet measurement techniques.</p><p>Our observations show that Bitcoin has a higher capacity network than Ethereum, but with more clustered nodes, likely in datacenters. We also observe that Bitcoin and Ethereum have fairly centralized mining processes and that further research is needed to decentralize permissionless consensus protocols further. In Ethereum, the block rewards have less variance than Bitcoin's. Finally, Ethereum has a lower mining power utilization than Bitcoin, likely due to the high block frequency.</p><p>Further, we see that Bitcoin has undergone tremendous growth and can increase the block size by a factor of 1.7 without any decrease in decentralization compared to 2016. Additionally, our study uncovers that the volatility of mining rewards is an important, but often ignored, metric. Finally, we see that Ethereum would likely benefit from a relay network to increase its mining power utilization.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>7 Acknowledgements</head><p>The authors thank Vitalik Buterin and the anonymous reviewers for their feedback on earlier drafts of this manuscript.</p></div></body>
		</text>
</TEI>
