<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Toward a Green Blockchain: Engineering Merkle Tree and Proof of Work for Energy Optimization</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>12/01/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10398707</idno>
					<idno type="doi">10.1109/TNSM.2022.3219494</idno>
					<title level='j'>IEEE Transactions on Network and Service Management</title>
<idno>2373-7379</idno>
<biblScope unit="volume">19</biblScope>
<biblScope unit="issue">4</biblScope>					

					<author>Cesar Castellon Escobar</author><author>Swapnoneel Roy</author><author>O. Patrick Kreidl</author><author>Ayan Dutta</author><author>Ladislau Boloni</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[Blockchain-powered smart systems deployed in different industrial applications promise operational efficiencies and improved yields, while significantly mitigating cybersecurity risks. Tradeoffs between availability and security arise at implementation, however, triggered by the additional resources (e.g., memory and computation) required by blockchain-enabled hosts. This paper applies an energy-reducing algorithmic engineering technique for Merkle Tree (MT) root calculations and the Proof of Work (PoW) algorithm, two principal elements of blockchain computations, as a means to preserve the promised security benefits but with less compromise to system availability. Using pyRAPL, a python library to measure the energy consumption of a computation, we experiment with both the standard and energy-reduced implementations of both algorithms for different input sizes. Our results show that up to 98% reduction in energy consumption is possible within the blockchain's MT construction module, with the benefits typically increasing with larger input sizes. For the PoW algorithm, our results show up to 20% reduction in energy consumption, with the benefits being lower for higher difficulty levels. The proposed energy-reducing technique is also applicable to other key elements of blockchain computations, potentially affording even "greener" blockchainpowered systems than implied by only the results obtained thus far on the MT and PoW algorithms.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Blockchain technology, popularized by crypto-currency systems, is seeing extensive use in several fields. Advocates for such uses cite the blockchain's inherent properties of a decentralized structure alongside enhanced security with mechanisms for privacy and non-repudiation <ref type="bibr">[1]</ref>, <ref type="bibr">[2]</ref>, <ref type="bibr">[3]</ref>, <ref type="bibr">[4]</ref>. One particularly promising use-case is the Internet of Things (IoT) <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref>, which embodies the vision of computing devices communicating with each other to map a physically connected world onto its digital mirror. The IoT vision also motivates prospects of smart systems <ref type="bibr">[8]</ref> e.g., smart cities, smart homes, smart grid, smart health, smart agriculture. Unfortunately, smart systems also raise critical security and privacy challenges, motivating the vision of blockchain-powered smart systems.</p><p>Smart and secure systems implemented upon IoT technology require device inter-connectivity for extended time frames, delivering continuous data. Such operations demand constant power supply <ref type="bibr">[8]</ref> -within a world that demands more environment-friendly ("green") solutions, IoT realizations also face the challenge of energy efficiency i.e., minimizing their energy footprint. Thakore et al. <ref type="bibr">[9]</ref> acknowledge the additional energy optimization requirements that blockchains require when implemented together with IoT. Depending on the specific type of blockchain-IoT combination, precise analysis of performance and energy requirements becomes critical <ref type="bibr">[10]</ref>.</p><p>As an example of these challenging tradeoffs, consider a particular blockchain-IoT implementation with a fixed power budget. To be viable for an application that values autonomy for greater lengths of time, the system must be configured to make more efficient use of energy. Disabling the blockchain will certainly save energy, but also weaken security: it is in such contexts that the exploration of ways to reduce the energy consumption of blockchain functionality can be of tremendous practical significance.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Related Work</head><p>Energy efficiency in computation is a widely studied topic, with numerous points-of-view: hardware-specific platforms, operating systems, hypervisors and containers <ref type="bibr">[11]</ref>; software development and security <ref type="bibr">[12]</ref>; and algorithms <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>. Energy measurements are sometimes obtained by specifically instrumented equipment <ref type="bibr">[15]</ref>, while other times can leverage hardware providers' Application Programmer Interfaces (APIs) in which firmware counters are queried to provide near real-time information e.g., Running Average Power Limit (RAPL) technology <ref type="bibr">[16]</ref>. Blockchain implementations are actively under study as providing a decentralized ledger (i.e. record of transactions) by which to optimize energy management in a variety of scenarios (e.g., generation &amp; distribution <ref type="bibr">[17]</ref>, <ref type="bibr">[18]</ref>, micro-grid networks <ref type="bibr">[19]</ref>, <ref type="bibr">[20]</ref>, <ref type="bibr">[21]</ref> and smart contracts <ref type="bibr">[22]</ref>). In contrast to our motivation, however, these studies define the optimized management objectives such that the energy footprint of the blockchain itself is out of scope.</p><p>Some past studies do recognize that the blockchain itself will draw energy away from any symbiotic system it is integrated with. Examples include Sankaran et al. <ref type="bibr">[10]</ref> and Sanju et al. <ref type="bibr">[23]</ref>, who perform power measurements and evaluate real experiments on the energy consumption of two different blockchain implementations, namely Ethereum and Hyperledger. A similar analysis of energy consumption is presented in <ref type="bibr">[15]</ref> for XRP validation, which is a key element of decentralized consensus processes within many Internet services. A particularly novel theoretical approach is reported by Fu et al. <ref type="bibr">[24]</ref>, first modeling a blockchain-IoT caching infrastructure and posing its energy optimization within a geometric programming formulation whose solutions allocate resources accordingly. A recent performance evaluation survey, also by Fu et al. <ref type="bibr">[1]</ref>, illustrates how diverse and sophisticated current implementations of blockchain ledgers are.</p><p>Despite this diversity, however, all existing implementations at their core remain faithful to Nakamoto's original blockchain concept <ref type="bibr">[25]</ref>, within which the Merkle Tree construction module is essential. Also essential to Nakimoto's blockchain concept are consensus protocols, for which Proof of Work (PoW) is arguably the most popular as the backbone of popular crypto-currencies such as Bitcoin (&#8764;$700B industry) and Ethereum 1 <ref type="bibr">[26]</ref>. However, PoW is notorious for its resourceintensive nature <ref type="bibr">[27]</ref>, <ref type="bibr">[28]</ref> and, therefore, is a concern for resource-limited (mobile) sensors e.g., robots <ref type="bibr">[29]</ref>. A quantification metric to measure the effect on the carbon cost of such crypto-currencies is studied in <ref type="bibr">[30]</ref>. In recent literature, roboticists have used Blockchain-based consensus protocols, such as PoW, for preventing malicious data integrity attacks on multi-robot systems <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, <ref type="bibr">[33]</ref>. As standard robots in today's market have limited on-board battery resources, using another resource-intensive application onboard might be prohibitive. Therefore, novel engineering techniques to reduce such energy consumption in Blockchains is needed, especially for the most popular underlying protocols-this paper takes a significant step in that direction.</p><p>Also worth mention is the recent flurry of research to reduce energy consumption in network services and management, in general. Examples include techniques to prevent Distributed Denial-of-Service (DDoS) attacks <ref type="bibr">[34]</ref>, improve connectivity of autonomous vehicles <ref type="bibr">[35]</ref>, or implement delay tolerant networks (DTN), all addresssing different root causes of network disconnection problems <ref type="bibr">[36]</ref>. We believe this paper, though focused on algorithmic techniques to reduce energy consumption of blockchain-enabled applications, similarly contributes to the broad field of network services and management.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Our Scope and Contributions</head><p>We study the extent to which two principal elements of blockchain computations, Merkle Tree (MT) construction and the Proof of Work (PoW) algorithm, can be made more energy efficient. Our approach employs an energy-reducing algorithmic engineering technique, based upon an Energy Complexity Model (ECM) proposed by Roy et al. <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>, on the SHA256 encryption algorithm, which is central to both MT and PoW algorithms. Using pyRAPL, a python library that measures an executable's Runtime Average Power Limit (RAPL), we experiment with both the standard and energy-reduced implementations of both algorithms, varying for MT the input size and for PoW the difficulty level, in both algorithms choosing configurations commonly seen 1 Ethereum will soon adopt the Proof of Stake (PoS) consensus protocol.</p><p>within contemporary blockchain implementations. The main contributions of this work are:</p><p>&#8226; To the best of our knowledge, the first to address the energy optimization of blockchains by re-engineering the implementation of two of its primary component algorithms, Merkle Tree (MT) and Proof-of-Work (PoW). &#8226; Our results show significant reductions in energy consumption: in MT, up to 98% reduction and on-average 50% across the tested input sizes, while in PoW up to 20% reduction and on-average 10% across the tested difficulty levels. Moreover, the proposed energy-reducing technique is similarly applicable to other key elements of blockchain computations, potentially affording even "greener" blockchain-IoT systems than implied by only the MT and PoW results reported here. As a final remark, a preliminary version of this paper appeared in the 2021 IEEE TRUSTCOM conference <ref type="bibr">[37]</ref>, treating only the MT construction algorithm. This paper generalizes the experimental approach to examine energy optimization prospects within both MT and PoW and, in turn, presents the PoW results for the first time.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. OVERVIEW OF MERKLE TREE AND PROOF OF WORK</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Merkle Tree (MT) Based Block Generation</head><p>A graphic representation of a simple MT based block generation in a blockchain is shown in Fig. <ref type="figure">1</ref>. The bottom layer shows the stored transactions (e.g., T 001) for the block, which later are converted to their SHA256 Hash signatures (e.g., H001) and represent the Merkle Tree leaves. Merkle Tree root calculations involve the recursive hash computation starting from these leaves until a final hash determines the Merkle Tree root (labeled TX ROOT in Fig. <ref type="figure">1</ref>). Conceptually, the process of Merkle Tree calculation through hashing can be viewed as a state transition in which an investment of computational resources is required e.g.,</p><p>That is, the block generation is represented by the state transition in <ref type="bibr">(1)</ref>, which depends upon a function f (T ) with parameter T denoting the cost as represented by <ref type="bibr">(2)</ref>. This cost has two main components: one is the energy consumed by the hardware devices to compute the hash of the input vectors, while the other is the execution time of those computations. This paper strives to reduce the overall transition cost T by reducing the energy consumption of the hardware devices, employing a technique based upon the ECM described in Section III-A.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Proof of Work (PoW)</head><p>In crypto-economic blockchains such as Bitcoin, or Ethereum, node miners run the Proof of Work (PoW) algorithm, competing to create new blocks filled with processed transactions. The PoW algorithm sets the operation rules and mining difficulty setting the pace for adding valid blocks to the chain.</p><p>To create a block in PoW, a miner will repeatedly put an actual downloaded dataset through a hashing function, looking for a result below a target nonce as dictated by the block difficulty parameter, where a lower target nonce permits a smaller set of valid hashes. Once a successful result is generated, other miners and clients can verify the validity of the block with a single hash operation. If one transaction on the chain were to change, the hash would be completely different, thus signaling possible fraud. The standard blockchains will accept that an agreement of more than 50% of the total computing capacity suffices to reach a consensus. Thus, although a possible disagreement may appear, the majority vote will solve opposite opinions. That is, another key element of a blockchain's consensus mechanism is the chain selection rule by which the correct chain is decided in the event that multiple paths evolve in parallel. Bitcoin, for instance, uses the Nakamoto rule that selects the "longest chain" as the correct one. Finally, PoW is not only applicable to consensus protocols in blockchains: it also works as block author selector <ref type="bibr">[38]</ref> and a Sybilresistance mechanism <ref type="bibr">[39]</ref>. Sybil attacks occur when a group of colluding nodes pretend to be many participant nodes. Resistance to this attack is crucial for decentralized blockchain ledgers. PoW makes users expend a lot of energy or crypto value as protection, as an economic deterrents to Sybil attacks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>1) Implementation of PoW:</head><p>The PoW algorithm involves the following essential steps local to each member of the network sharing a common blockchain:</p><p>1) Update the local copy of the chain to the latest version.</p><p>2) Solve a mathematical puzzle and create the mined block header hash. 3) Advertise the solution as soon as the math puzzle is solved. 4) Append the block to the chain if authorized. Arguably, the most widely adopted algorithm for PoW is the Hashcash scheme <ref type="bibr">[40]</ref>, <ref type="bibr">[41]</ref>, <ref type="bibr">[42]</ref>, <ref type="bibr">[43]</ref>, <ref type="bibr">[44]</ref>. To expedite our experimentation, yet still reliably emulate blockchain hashing operations, we implemented a simplified version of this algorithm (Algorithm 1). In our implementation, a difficulty is defined first -how many trailing or preceding 0's are there in the hash value of a block. The more 0's are there, the more difficult it is to find the hash value. To find this hash value, an integer number, called nonce, is used, which is initialized to 0 and increased by 1 within a for loop. If the number of 0's (i.e., the difficulty level) match the number of 0's in the hash value of the nonce variable, the loop breaks since the appropriate nonce is found. The hash value is calculated using the SHA256 algorithm. A maximum loop number is maintained -if the proper nonce is not found by this maximum time, the nonce is reset to 0 and the program returns an exception. This process of finding the proper nonce whose hash matches the the intended difficulty level is called mining. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. ENGINEERING MT AND POW FOR ENERGY CONSERVATION</head><p>In this section, we describe a technique for reducing the energy consumption of SHA256 by an engineering technique using the Energy Complexity Model (ECM) designed in <ref type="bibr">[13]</ref>. We then apply the energy-optimized SHA256 to both MT and PoW to measure the reductions of energy consumption. Before illustrating the engineering techniques, we briefly describe ECM in the next section.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. The Energy Complexity Model (ECM)</head><p>The ECM developed in <ref type="bibr">[13]</ref> is built upon an abstraction of the Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM) architecture <ref type="bibr">[45]</ref> illustrated in Fig. <ref type="figure">2</ref>. The main memory in DDR is divided into banks, each of which contains a certain number of chunks 2 . Data is allocated over chunks in each bank, and each bank also contains a special chunk called the sense amplifier. When data needs to be accessed, the chunk containing the data is fetched into the sense amplifier of the corresponding bank. The sense amplifier can only hold one chunk at a time, so the current chunk has to be put back to its bank before the next one can be fetched for access. While only one chunk of a particular bank can be accessed at a time, chunks of different banks (each with their own sense amplifier) can be accessed in parallel. Therefore, if the DDR memory is organized into P banks (where P = 4 in Fig. <ref type="figure">2</ref>), then P chunks can be accessed at a given time. In the popular DDR3 architecture, the DDR1 notion of the per-bank sense amplifier is referred to as the per-bank cache, albeit still only capable of accessing one chunk at a given time. The ECM denotes the P banks of a given DDR3 SDRAM resource by M 1 , M 2 , . . . , M P , each such bank M i comprised of multiple chunks of size B bytes and its own cache C i . The illustrative example of Fig. <ref type="figure">3</ref> assumes P = 4 banks, as was the case in Fig. <ref type="figure">2</ref>, with just four chunks per bank, assigning numerical labels 1, 2, . . . , 16 to the memory's collection of data chunks. Heeding the DDR constraint that each cache C i may access exactly one chunk at a time, the access patterns (1, 2, 3, 4) or (5, 6, 7, 8) imply a completely serial execution, while the access patterns <ref type="bibr">(1,</ref><ref type="bibr">5,</ref><ref type="bibr">9,</ref><ref type="bibr">13)</ref> or <ref type="bibr">(3,</ref><ref type="bibr">8,</ref><ref type="bibr">10,</ref><ref type="bibr">13)</ref> are each completely parallel. The authors of <ref type="bibr">[13]</ref> discovered two key properties of DDR memory: firstly, the difference in power consumption between the same number of chunks accessed sequentially or in parallel is not significant; however, the execution time of an algorithm when chunks are accessed in parallel is significantly lower than when chunks are accessed sequentially. Because the associated energy consumption depends upon both power and time, parallelizing chunk accesses offers the potential for energy reduction for any algorithm! More formally, the energy consumption (in Joules) of an algorithm A with computational complexity (execution time) W (A), assuming a P -bank DDR3 architecture with B bytes per chunk, is given by</p><p>where I is the parallelization index, essentially the number of parallel block accesses across memory banks per P block accesses made by A on the whole. That is, under the ECM, an algorithm's potential for energy reduction is inversely proportional to the degree to which it can be re-engineered for parallelization of its memory accesses. In other words, for algorithms with high computational complexity (e.g. exponential execution time), the energy complexity becomes equivalent to its computational complexity.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Re-engineering Hash Calculations Using ECM</head><p>In this work, we engineer the hash algorithm of Line 4 of Algorithm 1 based on ECM to reduce energy consumption. First, we briefly describe how any algorithm A can be parallelized based on ECM. We then illustrate how, specifically the SHA hash algorithm, is re-engineered for parallelization.</p><p>1) Parallelizing any algorithm: Given an algorithm A, the input to A is considered to identify the most common access sequence in A. The required level of parallelism for the vector formed by the desired access sequence is then engineered using a logical mapping over chunks of memory that store data accessed by A. As mentioned above, the order of chunk accesses is different for different levels of parallelization. But the physical location (chunk) of the input in the memory is static, and is handled by the memory controller of DDR. Therefore, to implement parallelization of access over physical chunks, a different page table vector V is generated for each level of parallelization, which defines the ordering among the chunks to be accessed (see Fig. <ref type="figure">4</ref>). For 1-way access, the page table vector V has the pattern (1, 2, 3, 4, . . .) while for 4-way access it has the pattern (1, 5, 9, 13, . . .). A function is then created to map the pattern of the page table vector V to the original physical locations of the input. Algorithm 2 shows the function to create an ordering among the chunks. The ordering is based on the way we want to access the chunks (P -way would mean full parallel access). The page table is populated by picking chunks with jumps. For P -way access, jumps of P are selected that ensure the consecutive chunk accesses lie in P different banks. Going by the above example, for P = 1, jumps of 1 ensure that 4 consecutive chunk accesses lie in the same bank (bank 1 of Fig. <ref type="figure">3</ref>). On the other hand, for P = 4, jumps of 4 ensure that 4 consecutive chunk accesses lie in 4 different banks (banks 1 through 4 of Fig. <ref type="figure">3</ref>).</p><p>2) Parallelizing SHA for MT and PoW: As described earlier, both MT and PoW (Algorithm 1) perform their respective hash  </p><p>calculations via the repeated use of the SHA256 algorithm. As shown in Fig. <ref type="figure">5</ref>, the input to SHA256 is partitioned into fixed size message blocks, presented in sequence to separate compression functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Figure 5: Illustration of the SHA256 algorithm</head><p>This block sequence is identified in correspondence with the access pattern of the SHA256 algorithm, which we subject to re-engineering based on the ECM. The input vector is preprocessed into another vector by applying Algorithm 2. The mapping is then stored in a page table to be used in subsequent hash calculations. An example of this operation for 16 blocks and a parallelization index (jump) of 4 is shown in Fig. <ref type="figure">6</ref>. Fig. ?? shows the outcome of re-engineering the SHA256 Figure <ref type="figure">6</ref>: Mapping of SHA input blocks based on ECM algorithm based on ECM. In our experimentation, an 8-bank DDR3 SDRAM is used and the parallelization index is set to I = 8. This essentially means that for any set of eight consecutive block access in SHA256, we created a virtual mapping using techniques described in <ref type="bibr">[14]</ref> to ensure that each size-8 access occurs across all eight banks.</p><p>Theorem III.1. The re-engineered SHA256 algorithm has the same computational complexity as the original SHA256 algorithm.</p><p>Proof. SHA256 has a computational complexity of &#920;(N ), where N is the number of blocks in Fig. <ref type="figure">5</ref>  <ref type="bibr">[46]</ref>. Algorithm 2 has a computational complexity of &#920;(N ) since the for loop of line 2 executes exactly N B times. Therefore applying Algorithm 2 to SHA256 as illustrated in Fig. <ref type="figure">6</ref> does not change the overall computational complexity of SHA256.</p><p>Theorem III.1 implies that applying the re-engineered SHA256 in place of the original SHA256 in any algorithm will not modify the computational complexity of the algorithm.</p><p>3) Applying the re-engineered SHA256 to MT: To recap, a graphic representation of a simple MT-based block generation in a blockchain is shown in Fig. <ref type="figure">1</ref>. The bottom layer shows the stored transactions (e.g., T 001) for the block, which later are converted to their SHA256 Hash signatures (e.g., H001) and represent the Merkle Tree leaves. Merkle Tree root calculations involve the recursive hash computation starting from these leaves until a final hash determines the Merkle Tree root (labeled TX ROOT in Fig. <ref type="figure">1</ref>). Fig. <ref type="figure">7</ref> shows the input to the SHA256 hash for the MT algorithm. Section IV describes the results of a series of experiments conducted to calculate the energy savings obtained by incorporating the re-engineered SHA256 algorithm into MT.   </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>. For hash size h, Algorithm 1 has a computational complexity of O(2 h ).</head><p>Proof. Algorithm 1 attempts to find a hash value satisfying the difficulty level by brute-forcing over every possible hash values. The computational complexity stated in Theorem III.2 therefore follows.</p><p>Theorem III.2 implies, the energy complexity of Algorithm 1 (or PoW) will converge with its computational complexity based on the size of the hash function used at some point. In this work, we therefore attempted to optimize the hash function (SHA256) used in Algorithm 1 to investigate whether that reduces the overall energy consumption of the PoW algorithm. Next, we present the results of our experiments with these proposed energy-optimized MT root calculation and PoW versions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EXPERIMENTS</head><p>In the following we describe a series of experiments designed to quantify the energy savings of the methodology described in the previous section. By virtue of the ECM's formulation, the enhanced implementation requires computer hardware using a DDR RAM architecture. Maximum energy reduction is promised by a parallelization index taken to equal the number of memory banks, which depends upon the DDR version: 4 for DDR2, 8 for DDR3 and 16 for DDR4 and higher. The machine used for our experiments features a 64-bit dualcore processor (Intel i5-2410M @ 2900MHz with cache size L2 256KB and L3 3072KB), running Linux Mint version 19.3 with a 8GB DD3 RAM and 500GB SSD storage. We use pyRAPL, a software toolkit to measure a host machine's energy footprint along the execution of a piece of Python code, to compare the energy consumption between the standard and ECM-enhanced implementations. pyRAPL is built upon Intel's Running Average Power Limit (RAPL) technology that estimates a CPU's power consumption; depending on the hardware and operating system configurations, pyRAPL can measure energy consumption of the following CPU domains: CPU socket, GPU, and DRAM <ref type="bibr">[16]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Implementation Details and Setup</head><p>Our experimental objectives could not be met by using the SHA256 function in the Hash Python library because memory management in Python involves a private heap, containing all objects and data structures. The control of this private heap is ensured internally by the Python memory manager, with different components dealing with sharing, segmentation, preallocation or caching. Our ECM-enhanced implementation of SHA256 requires greater control over memory allocation than Python's memory manager permits. Such low-level control on memory management is possible in the standard C programming language. We thus implement the standard and ECMenhanced versions of the SHA256 algorithm within separate C programs, which are called from a Python script (upon importing the ctypes module) as an external routine. This permits the use of pyRAPL for the needed energy measurements without denying low-level memory control to implement the ECM-enhanced SHA256 functionality in both MT and PoW algorithms.</p><p>1) MT Experimental Details: Our experiments simulated the Merkle Tree calculation with Python code that runs 103 consecutive two-leaves-input hashes with pyRAPL invoked. Each execution of the code yields an energy measurement, but because the instrumentation is subject to noise we invoke 5000 repetitions and report the mean and standard deviation results. Our experiments also vary the input size (i.e., the compounded-leaf size) to the Merkle Tree calculations, choosing 1, 64, 96, 128, 512, 1024, 16384 and 262144 bytes motivated as follows:</p><p>1) the 1B input is the bare minimum that the ECM permits for any algorithm <ref type="bibr">[13]</ref>; 2) the 64B, 96B and 128 inputs are common in blockchain applications <ref type="bibr">[6]</ref>; 3) the 512B and 1024B inputs are common in file hashing applications <ref type="bibr">[47]</ref>; while 4) the 16384B and 262144B inputs are common in the Interplanetary File System (IPFS) <ref type="bibr">[48]</ref>, <ref type="bibr">[49]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>2) PoW Experimental Details:</head><p>The standard PoW implementation is labeled by "O" that uses the original SHA256 hashing algorithm, while the implementation using the proposed re-engineered SHA256 algorithm is labeled by "E". The input size is fixed to 256 bytes including all the block header parameters and byte padding, while we experimented with six difficulty levels ranged from 1 through 6 (encoded by H0, H00, H000, H0000, H00000 and H000000). Per implementation and difficulty level, our experimental Python program leverages the pyRAPL toolkit to measure the average energy (mean and standard deviation over repeated trials) of the emulated PoW calculations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Results and Discussion</head><p>We discuss experimental results from both sets of experiments (MT and PoW) separately in this section.</p><p>1) MT Results: Recall that our experimental setup features two implementations of Merkle Tree (MT) calculations, the standard one (which we label by "O" as it uses the original SHA256) and the re-engineered one using ECM (which we label by "E" as it uses the enhanced SHA256), as well as eight different input sizes. Per implementation and per input size, our experimental Python script leverages the pyRAPL toolkit to measure the average energy (mean and deviation over 5000 trials) of simulated Merkle Tree calculations. Fig. <ref type="figure">9</ref> summarizes the sixteen average energy measurements in two bar charts, per input size comparing the Standard MT (O) and the Enhanced MT (E) average energy (in &#956;Joules). Fig. <ref type="figure">9</ref>(a) renders the comparison over the six smallest input sizes (using a linearly-scaled vertical axis), while Fig. <ref type="figure">9(b</ref>) is over the two largest input sizes (using a log-scaled vertical axis). It is seen that the ECM-enhanced implementation consistently requires less energy than the standard implementation, the difference being increasingly significant with the larger input sizes that befit file hashing applications (i.e., 512B and above) while still remaining meaningful for input sizes of 64B, 96B and 128B that befit blockchain applications. This observed dependence on input size may be a consequence of CPU memory caching. DRAM memory often allows the memory controller to optimise accesses by L1/L2/L3 caching of data. With smaller inputs, such caching enables parallelization of bank accesses even in the standard implementation. The comparison for the 1B input size corroborates this point, where we observe the enhanced implementation consume more energy than the standard implementation.  enhanced implementation over the standard implementation versus all eight input sizes. The energy savings for the blockchain-motivated input sizes range between 19% and 34%, while the energy savings for the file-system-motivated input sizes range between 69% and 98% -the case of 16384B exhibiting that maximum 98.47% savings. As noted in Fig. <ref type="figure">9</ref>, the 1B input renders a savings of -4.27%, meaning the standard implementation is more energy-efficient by virtue of the parallelism invoked within the CPU's L1/L2/L3 cache in this case.</p><p>2) PoW Results: Fig. <ref type="figure">11</ref> summarizes the average energy measurements (using a log-scaled vertical axis) per difficulty level size, at each level comparing the Standard PoW (O) and the Enhanced PoW (E) average energy (in &#956;Joules). It is observed from Fig. <ref type="figure">11</ref> that there exists energy savings when comparing both implementations, although the savings percentage is inversely proportional to the difficulty level. The results in Fig. <ref type="figure">11</ref> tally with what we had derived in Section III-B4 in light of Theorem III.2. With higher difficulty levels the energy complexity of PoW based on the ECM converges with its computational complexity based on the Turing Model. Our experimental results underscore what we had theoretically analyzed about PoW in terms of its energy consumption. To further illustrate this, in Fig. <ref type="figure">12</ref>, the average Figure <ref type="figure">12</ref>: Energy savings in PoW per difficulty energy comparison in relative terms versus difficulty level, namely as a percent reduction achieved by the enhanced implementation over the standard implementation, is presented. The energy savings in PoW range between 4% and 20%, the case of Difficulty 1 exhibiting a maximum saving of 20%. We observe steady reduction in energy savings as the difficulty level increases (Fig. <ref type="figure">12</ref>). This again tallies with our theoretical energy complexity analysis of PoW in Section III-B4. As further illustration, Fig. <ref type="figure">13</ref> displays the exponential rise in the number of iterations within the PoW operations with increasing difficulty levels, signifying the exponential runtime dependency of PoW (Algorithm 1) based on the hash size and difficulty level as a higher difficulty level forces Algorithm 1 to iterate through the for loop (line 3) exponentially more times. This complements Theorem III.2 about the energy complexity of Algorithm 1 converging with its computational complexity based on size of the hash function used at some point.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. Approximate Cost-Savings Analysis</head><p>We present an approximate cost-reduction analysis on implementing the energy optimization techniques on MT and PoW. Evaluating possible power optimization needs total time estimation for every operation. Figs. 14 and 15 respectively illustrate the percentage of savings for time (&#956;sec) and power(&#956;Watt) for MT and PoW.  The following formulas have been used to estimate energy costs in the analysis. Energy consumption calculation. The energy E in kilowatthours (kWh) per day is equal to the power P in watts (W) times number of usage hours per day t divided by 1000 watts per kilowatt.</p><p>Electricity cost calculation. The electricity cost per day in dollars is equal to the energy consumption E in kWh per day times the energy cost of 1 kW h in &#162;/kW h divided by 100 (&#162;/$).</p><p>Cost ($/day) = E (kW h/day) &#215; Cost (&#162;/kW h) /100 <ref type="bibr">(5)</ref> To estimate a yearly energy savings by our optimized PoW we make the following assumptions: The system generates 500 blocks per hour, a total of 16 hours is consumed in daily operation. Additionally, residential electricity rates in Florida average 11.42&#162;/kW h. Fig. <ref type="figure">16</ref> depicts the calculated yearly energy savings based on the above assumptions.</p><p>Figure <ref type="figure">16</ref>: Example of potential yearly savings for a ECM BCH-powered application</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>D. Discussion on Network Throughput Change</head><p>Blockchain nodes are authorized stakeholders to keep track of ledger, and may also serve as communication proxies (e.g hierarchical distributed cryptocurrency system) for different network tasks, when the underlying topology is complex, or a highly populated ecosystem is used. In simpler topologies (e.g. a fully meshed interconnected robot fleet), all nodes are considered to have access to the ledger, and no proxy functions are implemented on them.</p><p>This work has been motivated by the latter topology type, where the ledger life depends only on duration of the mission, therefore parameters like node size or network throughput have not been considered for evaluation while conducting experiments (since this is a limited-size, low-volume, smallscale transactions scenario. Nevertheless, based on Theorem III.1 it can be inferred that since the ECM modification lies in the core programming of the nodes (it is a modification of an algorithm), it does not add any additional data to the transactions. Since the resulting hashes have the same size as the original ones, there will not be any increase in size for any given node. Therefore we conjecture that our energy optimization techniques will not have influence on overall network throughput.</p><p>However, it will be interesting to observe any possible changes in Read Latency, Read Throughput, Transaction Latency, Transaction Throughput as Key Performance Indicators depending on the type of Blockchain application to implement while applying our techniques for energy optimization. This is mostly used in private blockchains <ref type="bibr">[53]</ref>.</p><p>The notable part is to the best of our knowledge, ours is the first work that attempts to re-engineer PoW based on the ECM to reduce energy consumption in it. Our technique is generic so that it can be also applied to the lighter algorithms as mentioned above to further reduce their energy consumption.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>VI. CONCLUSION AND FUTURE WORK</head><p>This paper described a technique to reduce the energy consumption of the Merkle Tree (MT) and Proof of Work (PoW) computations within blockchanns. At the technique's core is a re-engineered SHA256 hashing algorithm based upon the Energy Complexity Model (ECM) <ref type="bibr">[13]</ref>. The ECMenhanced implementations were compared to the standard implementations via experimental energy measurements, varying input sizes within the MT experiments and varying difficulty level within PoW experiments, both including configurations of practical significance. The results for MT show that up to 34% energy savings is possible for input sizes typically used by blockchains, while up to 98% is possible for input sizes used by other applications (e.g., file systems). The results for PoW show up to 20% energy savings is possible, yielding diminished savings with increasing difficulty levels as the theoretical energy complexity analysis of PoW predicts. Due to limitations of our current experimental infrastructure, we can only conjecture that the reduced energy consumption in these algorithm modules (MT and PoW) extrapolates to a comparable reduction for blockchains on the whole. Should such a conjecture hold, however, numerous applications could render "greener" blockchained-enabled networked systems e.g., autonomous vehicles <ref type="bibr">[54]</ref>, crypto-currencies <ref type="bibr">[55]</ref>, cloudbased software defined networks <ref type="bibr">[56]</ref>, intrusion detection systems <ref type="bibr">[57]</ref> as well as Internet-of-Things (IoT) technology <ref type="bibr">[5]</ref>, <ref type="bibr">[6]</ref>, <ref type="bibr">[7]</ref> and smart systems <ref type="bibr">[8]</ref>.</p><p>Natural next steps of this work are to assess the energysaving opportunities in other applications of Merkle Trees e.g., authentication schemes <ref type="bibr">[49]</ref>, healthcare systems <ref type="bibr">[58]</ref>, embedded systems <ref type="bibr">[59]</ref>, network protocols <ref type="bibr">[60]</ref>, <ref type="bibr">[61]</ref>. Also of interest is the exploration to reduce energy consumption of the different variations and implementations of PoW algorithm (e.g. <ref type="bibr">[31]</ref>, <ref type="bibr">[32]</ref>, <ref type="bibr">[62]</ref>, <ref type="bibr">[63]</ref>, <ref type="bibr">[64]</ref>), and the alternative Proof of Stake (PoS) algorithm <ref type="bibr">[31]</ref>, <ref type="bibr">[65]</ref>. Future work could also examine directions by which the emulation testbench developed herein can be merged with an actual blockchain implementation, offering all the functionalities that a peer node uses while connected to other peers in true distributed fashion. Real-world usage may also feature different sequencing and/or intermittent reliance on hashing primitives, so achieved energy savings may be only probabilistically related to the results demonstrated under deterministic usage patterns here. Another avenue for future work is to examine the sensitivity of energy savings to different hardware platforms. The energy measurement tool employed here, namely RAPL, is developed for only Intel processors; meanwhile, the Energy Complexity Model (ECM) by which the hash function was re-engineered is developed currently only for DDR memory architectures. However, current "smart" devices technologies are anticipated to use other hardware configurations, such as ARM platforms, for which the ECM is not yet developed to analogously exploit prospects of memory/CPU parallelization.</p></div><note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0"><p>The term "block" is used in DDR specifications, but we use the term "chunk" to avoid confusion within our blockchain context.</p></note>
		</body>
		</text>
</TEI>
