<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>HEALM: Hardware-Efficient Approximate Logarithmic Multiplier with Reduced Error</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>01/17/2022</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10324765</idno>
					<idno type="doi">10.1109/ASP-DAC52403.2022.9712543</idno>
					<title level='j'>Proc. Asia South Pacific Design Automation Conference (ASP-DAC’22)</title>
<idno></idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Shuyuan Yu</author><author>Maliha Tasnim</author><author>Sheldon X.-D. Tan</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[In this work, we propose a new approximate logarithmic multiplier (ALM) based on a novel error compensation scheme. The proposed hardware-efficient ALM, named HEALM, first determines the truncation width for mantissa summation in ALM. Then the error compensation or reduction is performed via a lookup table, which stores reduction factors for different regions of input operands. This is in contrast to an existing approach, in which error reduction is performed independently of the width truncation of mantissa summation. As a result, the new design leads to more accurate results with both reduced area and power. Furthermore, unlike existing approaches, which either introduce resource overheads when improving error or lose accuracy when saving area and power, HEALM can improve accuracy and resource consumption at the same time. Our study shows that 8-bit HEALM can achieve up to 2.92%, 9.30%, 16.08%, and 17.61% improvement in mean error, peak error, area, and power consumption, respectively, over REALM, which is the state-of-the-art work, with the same number of bits truncated. We also propose a single error coefficient mode named HEALM-TA-S, which improves the ALM design with a truncation adder (TA) for mantissa summation. Furthermore, we evaluate the proposed HEALM design in a discrete cosine transformation (DCT) application. The results show that with different values of k, HEALM-TA can improve the image quality upon the ALM baseline by 7.8 to 17.2 dB on average and HEALM-SOA can improve it by 2.9 to 15.8 dB on average. Besides, HEALM-TA and HEALM-SOA outperform all the state-of-the-art works with k = 2, 3, 4 on the image quality. The single coefficient mode, HEALM-TA-S, can improve the image quality upon the baseline by up to 4.1 dB on average with extremely low resource consumption.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>I. INTRODUCTION</head><p>Approximate computing enables efficient trade-offs among accuracy, area, latency, and power for error-tolerant applications such as machine learning and multimedia workloads <ref type="bibr">[1]</ref>. Those workloads are heavily dominated by multiplication operations, and hence the design of hardware-efficient multipliers has been intensively investigated recently. The primary goal of approximate multiplier design is to reduce power and area with the least accuracy loss.</p><p>A number of approximate multiplier designs have been proposed recently <ref type="bibr">[2]</ref>- <ref type="bibr">[11]</ref>. Those approximate multipliers employ ad hoc truncation or reduction methods, or mathematically formulated approximation schemes. Most of the existing methods, however, lack systematic configurability for the accuracy vs. area/power/latency trade-off. On the other hand, the class of mathematically formulated approximate multipliers includes logarithmic multipliers, which convert multiplication into only shift and addition operations. Due to the inherently approximate nature of the logarithmic operation and the ease of manipulating the accuracy of the resulting addition, area, latency, and power can be traded off at the cost of accuracy. The logarithmic multiplier was originally proposed by Mitchell <ref type="bibr">[12]</ref>. Since then, many approximate logarithmic multipliers (ALM) have been proposed to improve Mitchell's work <ref type="bibr">[9]</ref>, <ref type="bibr">[10]</ref>, <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref>. Most of those methods focused on how to reduce and compensate for the errors introduced by the piecewise approximation of the log function, which tends to cause negative errors.</p><p>Recently Ansari et al. 
<ref type="bibr">[14]</ref> developed an approximation scheme to make the error distribution more balanced (double-sided errors) for the ALM method. Saadat et al. <ref type="bibr">[10]</ref> further introduced a general error compensation technique, called REALM, using an analytically generated error reduction factor lookup table for different regions of input operands. The benefit of this method is that it generates more balanced errors and provides a configurable design trade-off between area and precision. However, this method uses one lookup table for all truncation configurations in the approximate addition, which may lead to large errors, especially for low-precision cases, as we will show in this work.</p><p>Based on this observation, we propose a new hardware-efficient approximate logarithmic multiplier, named HEALM, with a novel error reduction scheme for low-precision (8-bit to 16-bit) multiplication. The key contributions of this work are listed as follows:</p><p>1. HEALM first determines the truncation width for mantissa summation in ALM based on the resource requirement or design constraints. Then the error compensation or reduction is performed via a lookup table, which stores error compensation coefficients for different regions of input operands. This is different from existing approaches like REALM <ref type="bibr">[10]</ref>, in which error reduction is performed independently of the width truncation of mantissa summation.</p><p>This paper is organized as follows: Section II reviews several recently proposed approximate multiplication designs. Section III presents the proposed HEALM design, including the inexact adders and the error reduction techniques. Section IV shows the experimental results for the error metrics, area, and power, and the comparison with state-of-the-art methods. Finally, Section V concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>II. REVIEW OF RELATED WORK</head><p>Recently, various designs of approximate unsigned integer multipliers have been proposed. Earlier designs often involve ad hoc approximations, such as recursive multipliers <ref type="bibr">[2]</ref>, which consist of 2 &#215; 2 multiplication blocks, simplification of the Wallace tree <ref type="bibr">[3]</ref>, and simplified partial product generation/summation <ref type="bibr">[4]</ref>- <ref type="bibr">[6]</ref>. Others use a smaller multiplier by extracting an m-bit fragment from the N-bit precision inputs, such as <ref type="bibr">[7]</ref>, <ref type="bibr">[8]</ref>.</p><p>Among these methods, many recent approximate multipliers are developed based on the classic approximate logarithmic multiplier proposed by Mitchell, also called ALM, as it shows good overall performance and offers flexible trade-offs among area, power, and accuracy <ref type="bibr">[12]</ref>. Specifically, in the ALM design, the two inputs A and B are first represented in the following format: 2^{k_a} &#8226; (1 + x) and 2^{k_b} &#8226; (1 + y), respectively. Then the multiplication result can be approximated as <ref type="bibr">(1)</ref>.</p><p>Here C_ALM is the approximate multiplication result. The ALM design requires four steps to finish the multiplication process. First, it utilizes leading-one detectors (LOD) to find the leading '1' bit, whose position gives the integer part; second, barrel shifters are used to re-align the rest of the bits as the fraction part; then it sums the integer and fraction parts as k_a + k_b + x + y; and finally it shifts the result back by the corresponding number of bits. Although ALM suffers from a high MRED (mean relative error distance) and peak relative error of 3.76% and 11.11%, respectively, it offers a good trade-off among accuracy, area, and power.</p><p>To further improve the accuracy of the ALM method, several derivative works have been proposed by means of different error compensation mechanisms. 
For instance, the MBM design adds a fixed single error-correction term to the final result <ref type="bibr">[9]</ref>. This was further improved by the LeAp multiplier, which adds different error coefficients to the fraction parts based on the value ranges of the results <ref type="bibr">[15]</ref>. The REALM multiplier design further improved the compensation scheme by using a lookup table to store M &#215; M coefficients (factors) for M &#215; M partitions of the input ranges, with some hardware resource overhead <ref type="bibr">[10]</ref>. These works indeed improved the error metrics of approximate logarithmic multiplication without incurring too much resource overhead.</p><p>One important observation is that the ALM design becomes less effective in reducing area and power as the precision of the inputs decreases. Ebrahimi et al. <ref type="bibr">[15]</ref> recently showed that a 32-bit ALM achieves more area and power reduction than a 16-bit ALM. However, low-precision operation is important, as emerging machine learning workloads can be performed (at least for inference) using low-precision operations. For instance, 16-bit fixed point has been demonstrated to be sufficient for training neural networks with no loss in classification accuracy <ref type="bibr">[16]</ref>, and 8-bit precision is sufficient for inference with minimal accuracy loss <ref type="bibr">[17]</ref>. Some previous works <ref type="bibr">[13]</ref>, <ref type="bibr">[14]</ref> tried to reduce area further by replacing the exact adder with an inexact one. Since the exact adder unit is the bottleneck of the ALM critical path and occupies a large area, this idea does help in area saving. But the inexact adder also introduces extra error, and the error can become quite significant, especially in the 8-bit case (shown later in Sec. IV). 
The REALM design <ref type="bibr">[10]</ref> performed error compensation for ALM and achieved extremely low error bias with very low peak error, even when the lower part of the mantissa summation is truncated. However, those results were obtained under 16-bit precision only. We will show in Sec. IV that REALM under 8-bit precision does not perform as well as in the 16-bit case.</p><p>In this work, we focus on 8-bit and 16-bit precision hardware-efficient approximate logarithmic multiplier design and demonstrate the superior performance of the proposed design against the ALM <ref type="bibr">[12]</ref> baseline and other state-of-the-art works such as LeAp <ref type="bibr">[15]</ref>, REALM <ref type="bibr">[10]</ref>, ALM-SOA <ref type="bibr">[13]</ref>, ILM-EA <ref type="bibr">[14]</ref>, and ILM-AA <ref type="bibr">[14]</ref>.</p></div>
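<div xmlns="http://www.tei-c.org/ns/1.0"><p>As an illustration of the four ALM steps reviewed above (leading-one detection, re-alignment of the fraction, mantissa summation, and shifting back), a minimal behavioral sketch in Python is given below. It is not the Verilog or MATLAB model used in this work. It assumes the standard form of Mitchell's approximation, in which the product is 2^{k_a + k_b} (1 + x + y) when x + y is below 1 and 2^{k_a + k_b + 1} (x + y) otherwise. The names leading_one, mitchell_multiply, and frac_bits are hypothetical and chosen for illustration only.</p>
<eg><![CDATA[
```python
def leading_one(v: int) -> int:
    """Index of the most significant '1' bit (the LOD step); v must be > 0."""
    return v.bit_length() - 1


def mitchell_multiply(a: int, b: int, frac_bits: int = 16) -> int:
    """Mitchell-style approximate product of two unsigned integers.

    frac_bits (an assumed parameter) is the fixed-point width used for the
    fractions x and y; it must be at least the operand bit width minus one.
    """
    if a == 0 or b == 0:
        return 0
    ka, kb = leading_one(a), leading_one(b)
    # Barrel-shift step: bits below the leading one become fractions in [0, 1).
    x = (a - (1 << ka)) << (frac_bits - ka)
    y = (b - (1 << kb)) << (frac_bits - kb)
    one = 1 << frac_bits
    s = x + y  # mantissa summation
    if s < one:
        mant, exp = one + s, ka + kb      # 2^(ka+kb) * (1 + x + y)
    else:
        mant, exp = s, ka + kb + 1        # 2^(ka+kb+1) * (x + y)
    # Shift back from the fixed-point format to an integer result.
    shift = exp - frac_bits
    return mant << shift if shift >= 0 else mant >> -shift


if __name__ == "__main__":
    for a, b in [(3, 3), (5, 6), (4, 8)]:
        print(a, b, mitchell_multiply(a, b), a * b)
```
]]></eg>
<p>For 3 &#215; 3 the sketch returns 8 instead of 9, a relative error of 1/9 (about 11.11%), which agrees with the peak relative error quoted above for ALM.</p></div>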
<div xmlns="http://www.tei-c.org/ns/1.0"><head>III. PROPOSED HARDWARE-EFFICIENT APPROXIMATE MULTIPLIER</head><p>In this section, we show the details of the proposed hardware-efficient ALM design, which considers the bit (width) truncation in the mantissa summation part and the error compensation at the same time.</p><p>Consider an inexact adder with N-bit inputs A and B; the sum S then has N+1 bits. Let k be the number of bits in the lower part of the sum that are approximated. The binary representation of A is A_{N-1} A_{N-2} ... A_k A_{k-1} ... A_0, and B has a similar form. The upper parts A_{N-1} A_{N-2} ... A_k and B_{N-1} B_{N-2} ... B_k, denoted as A_H and B_H, respectively, are summed exactly to obtain the higher part of S, S_N S_{N-1} ... S_k, denoted as S_H. The lower parts A_{k-1} ... A_0 and B_{k-1} ... B_0, denoted as A_L and B_L, are summed approximately to obtain the lower part of S: C_{k-1} S_{k-1} ... S_0. Note that C_{k-1} is the carry bit into the exact summation of the upper parts.</p><p>We implement two representative approximate adders (or inexact adders) in the ALM with error improvement: one is the truncation adder (TA), and the other is the set one adder (SOA) <ref type="bibr">[13]</ref>. These adders have very low complexity, which makes them suitable for implementing our HEALM designs, named HEALM-TA and HEALM-SOA. To carry out the error compensation, we first analyze the error profile of ALM when the exact adder used for the mantissa summation is replaced with an approximate adder. Then, we perform specific error compensation for HEALM with each approximate adder under different k values, where k represents the number of bits in the inexact part of the mantissa summation, to achieve the best trade-off between the error metrics and the hardware resources. The structure of the HEALM design is shown in Fig. <ref type="figure">1</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. HEALM with truncation adder: HEALM-TA</head><p>1) The error behavior of ALM-TA: Before we discuss our proposed HEALM design with truncation adder, or HEALM-TA, we need to show the error behavior of ALM with a simple TA, denoted as ALM-TA. First, we briefly introduce the concept of TA. The TA simply truncates the lower part of the inputs A and B, which makes the inexact part of the mantissa summation, S_L, equal to zero. Thus the mantissa summation with TA only calculates the upper part S_H; an N-bit adder is effectively truncated to an (N-k)-bit adder. We use a 7-bit case with k = 4 to illustrate the concept of TA, as shown in Fig. <ref type="figure">3(a)</ref>.</p><p>The error behavior of ALM-TA is similar to that of the ALM method. We have shown the ALM method in (1) in Sec. II, and the exact multiplication written in log-based form can be expressed as <ref type="bibr">(2)</ref>.</p><p>Based on (1) and (<ref type="formula">2</ref>), we can calculate the error of ALM as (3). The error behavior of ALM, which replicates proportionally in each power-of-two interval, is demonstrated in Fig. <ref type="figure">2</ref>. Hence, we can perform error compensation on the fractional part before the barrel shifting operation to save resources, which is also demonstrated in <ref type="bibr">[15]</ref>.</p><p>After replacing the exact mantissa summation in ALM ("x + y") with an approximate summation using TA, additional error accumulates. We use Error_{ALM-TA} to represent it for the time being. The error behavior of ALM-TA can be calculated as (<ref type="formula">4</ref>) and also replicates proportionally in each power-of-two interval. Fig. <ref type="figure">4(a)</ref> demonstrates the error behavior of ALM-TA with k = 4 in one interval. Notice that the error is distributed nearly symmetrically, similar to the error profile of ALM in a single interval. It further shows proportional replication in each 1/8 interval. 
Also, for TA summation, no matter what the lower parts of the inputs are (here represented as x_L and y_L, since the inputs of the mantissa summation are the fractional parts x and y of inputs A and B, respectively), the approximate summation is determined only by the exact summation part (x_H + y_H). Thus, we partition the fractional xy-space into 64 blocks (8&#215;8) along the red dashed lines, which correspond to the 3 most significant bits (3 MSBs) of x and y, as shown in Fig. <ref type="figure">4</ref>, and recalculate the average error in each block, as shown in Fig. <ref type="figure">4(b)</ref>. Also, for ALM-TA with more than 3 bits in the exact summation part (k &lt; 4), we still use the 8&#215;8 block partition to calculate the average error in order to save resources. The experimental results in Sec. IV will show that the 8&#215;8 partition is sufficient to achieve acceptable accuracy improvement and good resource saving.</p><p>2) The proposed HEALM-TA error compensation: Based on the aforementioned observation on the error behavior of ALM-TA, we can perform specific error compensation, which is the core of the HEALM idea. We first generate a lookup table, of the same size as the 8&#215;8 block partition, as an error compensation pattern. An example of the pattern is shown in Fig. <ref type="figure">4(c)</ref>. The error coefficient, Err_coeff, which is added to the approximate mantissa summation, is obtained by looking up the table with the 3 MSBs of x and y. Note that when the error compensation pattern is simple (usually when the value of k is large), such as the example in Fig. <ref type="figure">4</ref>(c), the lookup table can be simplified to several large rectangular areas. In the case shown in Fig. 
<ref type="figure">4</ref>(c), the blue area is equivalent to the union of 3 rectangular regions, which can be described much more simply than an 8&#215;8 lookup table, thus reducing resource consumption.</p><p>The HEALM-TA method can be expressed as <ref type="bibr">(5)</ref>, where C_{HEALM-TA} is the product of the HEALM-TA method. The values of the error coefficients are determined by the average error of each block. We notice that if x + y &#8805; 1, the error coefficient Err_coeff is effectively added twice. Thus, the equivalent error in the blocks where x + y &#8805; 1 should be half of its initial value, so we halve Err_coeff for these blocks. For those blocks where x + y could be either smaller or larger than 1, we further adjust the error compensation to achieve the smallest possible peak relative error. Note that the mantissa summation x + y is now replaced with the approximate summation, so we need to quantize the error coefficients to ensure that their precision is no larger than the precision of the exact summation part. In the case of k = 4, as shown in Fig. <ref type="figure">4</ref>(c), the exact summation has only 3 bits, so Err_coeff also needs to be a 3-bit parameter. In this case, the error coefficient is either 1/8 or 2/8, as shown in Fig. <ref type="figure">4</ref>(c).</p><p>Our proposed HEALM-TA design performs well, especially when the value of k is large. Moreover, 8-bit HEALM-TA with k = 3 improves on the traditional ALM design in both error metrics and area, which has not been achieved by previous works with 8-bit precision. We will show this in Sec. IV.</p><p>3) Single coefficient mode: HEALM-TA-S: Furthermore, we propose a single coefficient mode, named HEALM-TA-S, to perform error compensation on ALM-TA with almost no resource overhead. An N-bit simple TA with k bits truncated consists of 1 HA (half adder) and N-k-1 FAs (full adders), as shown in Fig. 
<ref type="figure">6</ref>(a) (in the example of k = 4, the exact summation part includes 2 FAs and 1 HA). To perform the simplest error compensation, the error coefficient for the whole fractional space is set to the same value, 2^{-(N-k)} (1/8 in this case), and the HA at the LSB (least significant bit) location is replaced with an FA to keep the resource overhead minimal. The structure of the mantissa summation part of the HEALM-TA-S design is shown in Fig. <ref type="figure">6(b)</ref>. Note that the input carry bit (C_in) of the FA at the LSB is always set to '1', which implements the error coefficient. We will show in Sec. IV that HEALM-TA-S achieves a good trade-off between the error metrics and the hardware performance, especially when k is large.</p></div>
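<div xmlns="http://www.tei-c.org/ns/1.0"><p>The truncated mantissa summation, the single coefficient mode, and the 8&#215;8 block averaging described in this subsection can be sketched behaviorally as follows. This is a simplified Python model under stated assumptions, not the hardware design: the function names and the parameters coeff and n_frac are hypothetical, the coefficient is applied as a single constant (the lookup-table values of Fig. 4(c) are not reproduced here), and the block averages are computed empirically over one power-of-two interval of 8-bit operands rather than analytically.</p>
<eg><![CDATA[
```python
def alm_ta_multiply(a: int, b: int, k: int, coeff: int = 0, n_frac: int = 7) -> int:
    """ALM whose mantissa adder keeps only the upper (n_frac - k) bits.

    coeff is an error coefficient in units of 2^-(n_frac - k); coeff=1
    models the always-one carry-in of the single coefficient mode.
    """
    if a == 0 or b == 0:
        return 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    x = (a - (1 << ka)) << (n_frac - ka)   # n_frac-bit fraction of a
    y = (b - (1 << kb)) << (n_frac - kb)   # n_frac-bit fraction of b
    one = 1 << (n_frac - k)                # 1.0 in the truncated format
    s = (x >> k) + (y >> k) + coeff        # truncation adder: x_H + y_H (+ coeff)
    if s < one:
        mant, exp = one + s, ka + kb
    else:
        mant, exp = s, ka + kb + 1
    return (mant << exp) >> (n_frac - k)


def mean_error(k: int, coeff: int) -> float:
    """Mean absolute relative error (%) over all nonzero 8-bit operand pairs."""
    total = 0.0
    for a in range(1, 256):
        for b in range(1, 256):
            total += abs(alm_ta_multiply(a, b, k, coeff) - a * b) / (a * b)
    return 100.0 * total / (255 * 255)


def block_average_error(k: int, coeff: int = 0):
    """Signed average relative error per 8x8 block for operands in [128, 255]."""
    sums = [[0.0] * 8 for _ in range(8)]
    for a in range(128, 256):
        for b in range(128, 256):
            i, j = (a - 128) >> 4, (b - 128) >> 4   # 3 MSBs of x and y
            sums[i][j] += (alm_ta_multiply(a, b, k, coeff) - a * b) / (a * b)
    return [[v / 256 for v in row] for row in sums]  # 16 x 16 pairs per block


if __name__ == "__main__":
    print("ALM-TA     k=4 mean error (%):", mean_error(4, 0))
    print("single-coeff k=4 mean error (%):", mean_error(4, 1))
```
]]></eg>
<p>In this model, every block average of the uncompensated ALM-TA is negative, which is why a positive coefficient added to the mantissa summation reduces the error. With k = 4, setting coeff = 1 corresponds to the coefficient 1/8 of the single coefficient mode.</p></div>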
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. HEALM with set one adder: HEALM-SOA</head><p>Besides HEALM-TA, we also propose another HEALM design with the set one adder (SOA), called HEALM-SOA. A simple ALM design with the mantissa summation replaced by an SOA (ALM-SOA) has been proposed before <ref type="bibr">[13]</ref>. Based on ALM-SOA, we further perform error compensation similar to Sec. III-A.</p><p>In an SOA, unlike in TA, all the bits in the S_L part are set to logic '1' to produce a balanced error in the ALM-SOA method. Suppose the SOA is an N-bit summation with k bits in the approximate summation part S_L. For S_H, the exact summation part, S_H = A_H + B_H + C_in, where the carry bit C_in is obtained by an AND operation on the MSBs of A_L and B_L (A[k-1] and B[k-1], respectively), as expressed in <ref type="bibr">(6)</ref>.</p><p><formula notation="tex">S_H = S[N:k] = A_H + B_H + (A[k-1] \wedge B[k-1]) \qquad (6)</formula></p><p>Then, based on (<ref type="formula">1</ref>) and (<ref type="formula">6</ref>), we can calculate the error of ALM-SOA in a power-of-2 interval as <ref type="bibr">(7)</ref>.</p><p>Similar to ALM-TA, we show the error behavior of ALM-SOA (k = 4) with an example in Fig. <ref type="figure">5(a)</ref>.</p><p>The HEALM-SOA idea is similar to HEALM-TA. The error compensation pattern of HEALM-SOA is shown in Fig. <ref type="figure">5(c)</ref>. We partition the fractional space into 8&#215;8 blocks and calculate the average error in the same way as for HEALM-TA, as shown in Fig. <ref type="figure">5(b)</ref>. Then, based on the error distribution of ALM-SOA, we generate a specific error compensation pattern in lookup table form. The error coefficient added to the mantissa summation part is determined by the 3 MSBs of x and y, as in HEALM-TA. Similar to HEALM-TA, HEALM-SOA also selects the error compensation patterns to achieve the smallest possible peak relative error and can improve upon the traditional ALM design in terms of both error metrics and resource consumption. We will show this in Sec. IV.</p><p>Note that, unlike in HEALM-TA, the LSB summation of the exact summation part in HEALM-SOA must take the carry bit C_in from S_L (the approximate summation part) as input. The LSB summation therefore already requires an FA instead of an HA in the exact summation part of HEALM-SOA, and we cannot directly add a bit '1' as error compensation at the LSB location. Hence HEALM-SOA does not have a single error coefficient mode like "HEALM-SOA-S".</p></div>
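<div xmlns="http://www.tei-c.org/ns/1.0"><p>To make the difference between the two inexact adders concrete, the following Python sketch models TA and SOA at the bit level, following the description above: TA drops the lower k bits of both inputs, while SOA forces the lower k sum bits to '1' and feeds A[k-1] AND B[k-1] into the exact upper summation. The function names ta_add and soa_add are hypothetical, and the sketch is a behavioral illustration rather than the gate-level structure.</p>
<eg><![CDATA[
```python
def ta_add(a: int, b: int, k: int) -> int:
    """Truncation adder: the lower k bits of both inputs are dropped (S_L = 0)."""
    return ((a >> k) + (b >> k)) << k


def soa_add(a: int, b: int, k: int) -> int:
    """Set one adder: S_L is all ones, carry-in is A[k-1] AND B[k-1]."""
    if k == 0:
        return a + b                      # no approximate part
    c_in = (a >> (k - 1)) & (b >> (k - 1)) & 1
    s_h = (a >> k) + (b >> k) + c_in      # exact upper summation
    return (s_h << k) | ((1 << k) - 1)    # lower k bits forced to '1'


if __name__ == "__main__":
    a, b, k = 0b1011010, 0b0101101, 4     # 7-bit example with k = 4
    print("exact:", a + b, "TA:", ta_add(a, b, k), "SOA:", soa_add(a, b, k))
```
]]></eg>
<p>For the 7-bit inputs 1011010 and 0101101 with k = 4, the exact sum is 135, TA gives 112, and SOA gives 143. TA therefore always underestimates, whereas the SOA error can take either sign, which is the balanced behavior noted above.</p></div>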
<div xmlns="http://www.tei-c.org/ns/1.0"><head>IV. EXPERIMENTAL RESULTS AND DISCUSSIONS</head><p>In this section, we evaluate the performance of the proposed hardware-efficient approximate logarithmic multiplier with reduced error, named HEALM, under 8-bit precision. We also compare HEALM against the ALM (approximate logarithmic multiplier) baseline <ref type="bibr">[12]</ref> and other state-of-the-art works with the same precision. Furthermore, we present 16-bit HEALM results compared with the baseline and state-of-the-art works as a complement.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>A. Experimental setup</head><p>To evaluate the performance of the proposed HEALM design, we first compare the error metrics and the hardware performance of HEALM with those of its original version, the classical ALM proposed by Mitchell, which is selected as the baseline. We also compare HEALM with other state-of-the-art improved ALMs: LeAp <ref type="bibr">[15]</ref>, REALM <ref type="bibr">[10]</ref>, ALM-SOA <ref type="bibr">[13]</ref>, ILM-EA <ref type="bibr">[14]</ref>, and ILM-AA <ref type="bibr">[14]</ref>. For the REALM design, we compare against REALM8, which uses the same partition of the fractional space (in a power-of-2 interval) as HEALM, for a fair comparison.</p><p>All the above-mentioned 8-bit multipliers are implemented in Verilog HDL and synthesized as single-cycle designs with Synopsys Design Compiler using the EDK 32nm standard cell library <ref type="bibr">[18]</ref>, under the same timing constraint of 2.5ns (400 MHz working frequency) for area and power consumption comparison. The 16-bit multipliers are implemented with the same library but with a timing constraint of 5ns (200 MHz).</p><p>For the error metrics evaluation, we developed behavioral simulation models in MATLAB for all the multipliers listed in Table <ref type="table">I</ref> and measured the accuracy using 1 million random inputs uniformly distributed over the set {0, 1, ..., 2^8 - 1}. The errors are reported with respect to the exact results. The error metrics used to report the error behavior are the mean error (mean of the absolute relative error, also referred to as MRED in some previous works <ref type="bibr">[14]</ref>) and the peak error (maximum value of the absolute relative error). All the error metrics are in percentages.</p></div>
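<div xmlns="http://www.tei-c.org/ns/1.0"><p>The two error metrics defined above can be computed with a short Monte Carlo loop. The sketch below is written in Python for illustration and is not the MATLAB model used in this work. The function names, the seed, and the default sample count are assumptions, and operand pairs whose exact product is zero are skipped because the relative error is undefined there, which is a choice of this sketch and not stated in the setup. A compact Mitchell-style multiplier is included only to make the example self-contained.</p>
<eg><![CDATA[
```python
import random


def mitchell_multiply(a: int, b: int) -> int:
    """Compact Mitchell-style ALM model for unsigned integers (illustrative)."""
    if a == 0 or b == 0:
        return 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    s = ((a - (1 << ka)) << kb) + ((b - (1 << kb)) << ka)  # (x + y) * 2^(ka+kb)
    one = 1 << (ka + kb)
    return one + s if s < one else 2 * s


def error_metrics(multiply, n_bits: int = 8, samples: int = 100_000, seed: int = 0):
    """Return (mean error %, peak error %) over uniformly random operand pairs.

    Pairs whose exact product is zero are skipped, since the relative error
    is undefined there (an assumption of this sketch).
    """
    rng = random.Random(seed)
    total, peak, count = 0.0, 0.0, 0
    for _ in range(samples):
        a = rng.randrange(1 << n_bits)
        b = rng.randrange(1 << n_bits)
        exact = a * b
        if exact == 0:
            continue
        rel = abs(multiply(a, b) - exact) / exact * 100.0
        total += rel
        peak = max(peak, rel)
        count += 1
    mean = total / count if count else 0.0
    return mean, peak


if __name__ == "__main__":
    mean, peak = error_metrics(mitchell_multiply)
    print(f"mean error = {mean:.2f}%, peak error = {peak:.2f}%")
```
]]></eg>
<p>Any behavioral multiplier model with the signature multiply(a, b) can be passed to error_metrics, so the same loop can be reused to compare different designs under identical random inputs.</p></div>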
<div xmlns="http://www.tei-c.org/ns/1.0"><head>B. Performance evaluation</head><p>The error metrics and the hardware performance of the implemented multipliers are shown in Table <ref type="table">I</ref>. Since the proposed HEALM designs use an inexact adder in the mantissa summation, we use a parameter k in Table <ref type="table">I</ref> to represent the number of bits in the inexact summation part. For example, in the cases shown in Fig. <ref type="figure">3</ref>(a) and Fig. <ref type="figure">3</ref>(b), k equals 4. For the ALM, LeAp, and ILM-EA designs, which do not have an inexact summation unit, k equals 0. For the REALM design, the value of the error configuration parameter in <ref type="bibr">[10]</ref> is equivalent to the number of bits in the inexact summation part, which is represented by k in our work. To avoid ambiguity, we use the same notation as for the HEALM design for easy comparison.</p><p>Table <ref type="table">I</ref> demonstrates that, with the same value of k, the proposed HEALM-TA design improves the error metrics upon the ALM-TA design, reducing the mean and peak error by up to 6.63% and 9.76%, respectively; HEALM-SOA improves the error metrics upon the ALM-SOA design, reducing the mean and peak error by up to 2.37% and 6.40%, respectively. When compared with the ALM baseline, HEALM-TA and HEALM-SOA can improve the mean / peak error by 2.64% / 6.25% and 2.63% / 6.40%, respectively. When compared with REALM, which is the state-of-the-art work, the HEALM designs can achieve up to 2.92%, 9.30%, 16.08%, and 17.61% improvement in mean error, peak error, area, and power consumption, respectively, with the same number of bits truncated.</p><p>Considering the trade-off between error metrics improvement and resource consumption, the previous works can improve either the error metrics or the resource consumption (area, power), but can hardly improve both, especially when the precision is low (like 8-bit precision), as shown in Table <ref type="table">I</ref>. 
To better illustrate this, we show the relationship between the mean error / peak error and area / power for all the listed multipliers in Fig. <ref type="figure">7</ref>. The rectangular region with the red dashed border in each of the four subfigures marks designs that outperform the classical ALM design in both error metrics and resource consumption. Notice that in Fig. <ref type="figure">7(a)</ref>, only the proposed HEALM-TA and HEALM-SOA with k = 3 improve both the peak error and the area, decreasing the peak error by 1.36% and 3.46%, respectively. In Fig. <ref type="figure">7(c)</ref>, though some previous works like ALM-SOA outperform ALM in both mean error and area, only a small improvement in mean error (at most 0.69%) is obtained. In contrast, the proposed HEALM-TA and HEALM-SOA with k = 3 reduce the mean error by 1.59% and 1.98%, respectively, when compared to ALM, and provide 9.38% and 1.42% area reduction at the same time. Besides, the HEALM-TA and HEALM-SOA designs with k = 4 reduce the mean error by 0.10% and 0.64% when compared to ALM and reduce power by 29.41% and 18.23% at the same time.</p><p>The results of the HEALM-TA-S (single error coefficient mode) design in Table <ref type="table">I</ref> show that HEALM-TA-S achieves better trade-offs between accuracy and resource consumption, especially when the value of k is large. In the case of k = 4, which is the largest value of k, HEALM-TA-S decreases the mean / peak error by up to 6.03% / 6.11%, respectively, when compared to ALM-TA. Note that HEALM-TA-S achieves this improvement with almost no resource overhead. It also saves 34.84% / 41.89% of area / power with 8-bit inputs and 30.59% / 40.09% with 16-bit inputs when compared to the ALM baseline.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>C. An image processing application evaluation</head><p>Now, we show how the proposed HEALM designs compare to state-of-the-art methods in a multimedia application. 
Discrete cosine transformation (DCT) is commonly used in lossy image compression. The quality of the compressed images is usually evaluated using metrics such as PSNR (peak signal-to-noise ratio), where a higher PSNR value represents better image quality. We implement the proposed HEALM design with 8-bit precision in DCT-iDCT (inverse DCT) workloads and compare it with other logarithmic multipliers on five example images. To be fair, the mantissa summation parts of all the compared logarithmic multipliers are inexact units, except for the ALM, which is chosen as the baseline. We show the results of image compression in Table <ref type="table">III</ref>. The results show that, with different values of k, HEALM-TA improves the image quality upon the ALM baseline by 7.8&#8764;17.2dB on average and HEALM-SOA improves it by 2.9&#8764;15.8dB on average. Besides, the HEALM-TA and HEALM-SOA designs outperform all the other state-of-the-art works when k = 2, 3, 4 by at least 6.3dB, 6.3dB, and 8.8dB, respectively. Note that the single coefficient mode design HEALM-TA-S performs best when k = 4, improving upon the ALM baseline by 4.1dB on average with extremely low resource consumption, as mentioned before. This is due to the error behavior of ALM, whose outputs are always smaller than the exact product. HEALM-TA-S with k = 4 has a more balanced error than the cases of k = 1, 2, 3.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>V. CONCLUSION</head><p>In this work, we have proposed a novel hardware-efficient approximate logarithmic multiplier, called HEALM. The proposed design first determines the truncation width for mantissa summation in ALM. Then the error reduction is performed via a lookup table for multiple partitioned input ranges. Numerical results showed that HEALM and its enhanced designs lead to more accurate results with reduced area and power at the same time compared with the existing ALM baseline design. 
It also outperformed the state-of-the-art design, REALM, with up to 2.92%, 9.30%, 16.08%, and 17.61% improvement in mean error, peak error, area, and power consumption, respectively, for 8-bit precision. For the discrete cosine transformation (DCT) application, with different values of k, HEALM-TA improved the image quality upon the ALM baseline by 7.8&#8764;17.2dB on average and HEALM-SOA improved it by 2.9&#8764;15.8dB on average. Besides, HEALM-TA and HEALM-SOA outperformed all the state-of-the-art works with k = 2, 3, 4 on the image quality.</p></div>
		</body>
		</text>
</TEI>
