Images are often corrupted with noise. As a result, noise reduction is an important task in image processing. Common noise reduction techniques, such as mean or median filtering, lead to blurring of the edges in the image, while fuzzy filters are able to preserve the edge information. In this work, we implement an efficient hardware design for a well-known fuzzy noise reduction filter based on stochastic computing. The filter consists of two main stages: edge detection and fuzzy smoothing. The fuzzy difference, which is encoded as bit-streams, is used to detect edges. Then, fuzzy smoothing is done to average the pixel value based on eight directions. Our experimental results show a significant reduction in the hardware area and power consumption compared to the conventional binary implementation while preserving the quality of the results.
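As a rough illustration of how a difference between two pixel values can be computed on bit-streams, the sketch below uses a standard stochastic computing identity: when two unipolar streams are generated from the same random sequence, a bitwise XOR yields a stream whose mean is the absolute difference of the encoded values. This is a software model only, and the function names (`encode_correlated`, `decode`, `sc_abs_diff`) are hypothetical; the abstract does not say that the filter computes its fuzzy difference this way.

```python
import random


def encode_correlated(values, length, seed=0):
    """Encode values in [0, 1] as unipolar bit-streams sharing one random sequence."""
    rng = random.Random(seed)
    thresholds = [rng.random() for _ in range(length)]
    # Bit t of a stream is 1 when the shared random number falls below the value.
    return [[1 if r < v else 0 for r in thresholds] for v in values]


def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)


def sc_abs_diff(stream_a, stream_b):
    """XOR of two maximally correlated streams encodes |a - b|."""
    return [a ^ b for a, b in zip(stream_a, stream_b)]


if __name__ == "__main__":
    center, neighbor = encode_correlated([0.8, 0.3], length=4096)
    print(decode(sc_abs_diff(center, neighbor)))  # close to 0.5
```

The result is only as precise as the stream length allows, which is the usual accuracy versus latency trade-off in stochastic designs.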
HEALM: Hardware-Efficient Approximate Logarithmic Multiplier with Reduced Error
In this work, we propose a new approximate logarithmic multiplier (ALM) based on a novel error compensation scheme. The proposed hardware-efficient ALM, named HEALM, first determines the truncation width for the mantissa summation in the ALM. Error compensation is then performed via a lookup table that stores reduction factors for different regions of the input operands. This is in contrast to an existing approach, in which error reduction is performed independently of the truncation width of the mantissa summation. As a result, the new design produces more accurate results with reduced area and power. Furthermore, unlike existing approaches, which either introduce resource overheads when reducing error or lose accuracy when saving area and power, HEALM improves accuracy and resource consumption at the same time. Our study shows that 8-bit HEALM achieves up to 2.92%, 9.30%, 16.08%, and 17.61% improvement in mean error, peak error, area, and power consumption, respectively, over REALM, the state-of-the-art design, with the same number of bits truncated. We also propose a single error coefficient mode named HEALM-TA-S, which improves the ALM design with a truncation adder (TA) for mantissa summation. Finally, we evaluate the proposed HEALM design in a discrete cosine transform (DCT) application. The results show that, for different values of k, HEALM-TA improves the image quality over the ALM baseline by 7.8 to 17.2 dB on average, and HEALM-SOA improves it by 2.9 to 15.8 dB on average. In addition, HEALM-TA and HEALM-SOA outperform all state-of-the-art designs in image quality for k = 2, 3, 4. The single coefficient mode, HEALM-TA-S, improves the image quality over the baseline by up to 4.1 dB on average with extremely low resource consumption.
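For context, the following is a minimal sketch of the classic Mitchell logarithmic multiplication that ALM designs generally build on, written as exact integer arithmetic. It is not the HEALM design: it contains no mantissa truncation and no lookup-table compensation, and the function name `mitchell_multiply` is hypothetical.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for non-negative integers using Mitchell's method.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, and
    log2(1 + x) is approximated by x, so the product reduces to an
    addition of characteristics (k) and mantissas (x).
    """
    if a == 0 or b == 0:
        return 0
    ka = a.bit_length() - 1  # position of the leading one in a
    kb = b.bit_length() - 1  # position of the leading one in b
    # Mantissas scaled by the common factor 2**(ka + kb) to stay in integers.
    xa = (a - (1 << ka)) << kb
    xb = (b - (1 << kb)) << ka
    mantissa_sum = xa + xb
    one = 1 << (ka + kb)
    if mantissa_sum < one:
        # No carry out of the mantissa sum: 2**(ka+kb) * (1 + xa + xb)
        return one + mantissa_sum
    # Carry into the characteristic: 2**(ka+kb+1) * (xa + xb)
    return 2 * mantissa_sum


if __name__ == "__main__":
    for a, b in [(4, 8), (3, 3), (5, 6)]:
        print(a, b, a * b, mitchell_multiply(a, b))
```

The approximation never overestimates, and its relative error peaks at about 11.1% when both mantissas are 0.5 (for example 3 × 3), which is the kind of error that compensation schemes aim to reduce.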
- Award ID(s):
- 1854276
- NSF-PAR ID:
- 10324765
- Date Published:
- Journal Name:
- Proc. Asia South Pacific Design Automation Conference (ASP-DAC’22)
- Page Range / eLocation ID:
- 37 to 42
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Approximate computing (AC) leverages the inherent error resilience of applications and is used in many big-data domains, such as multimedia, computer vision, signal processing, and machine learning, to improve system performance and power consumption. Like many other approximate circuits and algorithms, the memory subsystem can also be used to enhance performance and save power significantly. This paper proposes an efficient and effective systematic methodology to construct an approximate non-volatile magneto-resistive RAM (MRAM) framework using consumer-off-the-shelf (COTS) MRAM chips. In the proposed scheme, an extensive experimental characterization of memory errors is performed by manipulating the write latency of MRAM chips, which exploits the inherent (intrinsic/extrinsic process variation) stochastic switching behavior of magnetic tunnel junctions (MTJs). The experimental results, involving error-resilient image compression and machine learning applications, reveal that the proposed AC framework provides a significant performance improvement and demonstrates a reduction in MRAM write energy of ~47.5% on average with negligible or no loss in output quality.
-
Networks-on-chip (NoCs) have emerged as the standard on-chip communication fabric for multi/many-core systems and systems-on-chip. However, as the number of cores on a chip increases, so does power consumption. Recent studies have shown that NoC power consumption can reach up to 40% of the overall chip power, and considerable research effort has been devoted to reducing it. In this paper, we build on approximate computing techniques and propose an approximate communication methodology called DEC-NoC for reducing NoC power consumption. The proposed DEC-NoC leverages applications' error tolerance and dynamically reduces the amount of error checking and correction in packet transmission, which results in a significant reduction in the number of retransmitted packets. The reduction in packet retransmission results in reduced power consumption. Our cycle-accurate simulation using the PARSEC benchmark suite shows that DEC-NoC achieves up to 56% latency reduction and up to 58% dynamic power reduction compared to NoC architectures with conventional error control techniques.
-
Stochastic computing (SC) is a low-cost computational paradigm that has promising applications in digital filter design, image processing, and neural networks. Fundamental to these applications is the weighted addition operation, which is most often implemented by a multiplexer (mux) tree. Mux-based adders have very low area but typically require long bitstreams to reach practical accuracy thresholds when the number of summands is large. In this work, we first identify the main contributors to mux adder error. We then demonstrate with analysis and experiment that two new techniques, precise sampling and full correlation, can target and mitigate these error sources. Implementing these techniques in hardware leads to the design of CeMux (Correlation-enhanced Multiplexer), a stochastic mux adder that is significantly more accurate and uses much less area than traditional weighted adders. We compare CeMux to other SC and hybrid designs for an electrocardiogram filtering case study that employs a large digital filter. One major result is that CeMux is shown to be accurate even for large input sizes. CeMux's higher accuracy leads to a latency reduction of 4× to 16× over other designs. Furthermore, CeMux uses about 35% less area than existing designs, and we demonstrate that a small amount of accuracy can be traded for a further 50% reduction in area. Finally, we compare CeMux to a conventional binary design and we show that CeMux can achieve a 50% to 73% area reduction for similar power and latency as the conventional design but at a slightly higher level of error.
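The conventional mux-based weighted addition that this abstract takes as its starting point can be modeled in a few lines. The sketch below is a software model of a plain mux adder with random select signals, not of CeMux: it does not implement precise sampling or full correlation, and the names `encode` and `mux_weighted_add` are hypothetical.

```python
import random


def encode(value, length, rng):
    """Encode a value in [0, 1] as a unipolar stochastic bit-stream."""
    return [1 if rng.random() < value else 0 for _ in range(length)]


def mux_weighted_add(streams, weights, rng):
    """Model a mux adder: each cycle, pass through one input bit.

    Input i is selected with probability weights[i] / sum(weights), so the
    output stream encodes the normalized weighted sum of the input values.
    """
    length = len(streams[0])
    selects = rng.choices(range(len(streams)), weights=weights, k=length)
    return [streams[i][t] for t, i in enumerate(selects)]


if __name__ == "__main__":
    rng = random.Random(1)
    values = [0.2, 0.6, 0.9]
    weights = [1, 1, 2]
    streams = [encode(v, 8192, rng) for v in values]
    out = mux_weighted_add(streams, weights, rng)
    exact = sum(w * v for w, v in zip(weights, values)) / sum(weights)
    print(sum(out) / len(out), exact)
```

Because each cycle samples only one input, the output estimate is noisy for short streams, which is consistent with the abstract's remark that mux adders need long bitstreams when there are many summands.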
-
Abstract. Heterogeneous chemistry of oxidized carbons in the aerosol phase is known to significantly contribute to secondary organic aerosol (SOA) burdens. The UNIfied Partitioning Aerosol phase Reaction (UNIPAR) model was developed to process the multiphase chemistry of various oxygenated organics into SOA mass predictions in the presence of a salted aqueous phase. In this study, the UNIPAR model simulated the SOA formation from gasoline fuel, which is a major contributor to the observed concentration of SOA in urban areas. The oxygenated products, predicted by the explicit mechanism, were lumped according to their volatility and reactivity and linked to stoichiometric coefficients which were dynamically constructed by predetermined mathematical equations at different NOx levels and degrees of gas aging. To improve the model feasibility at regional scales, the UNIPAR model was coupled with the Carbon Bond 6 (CB6r3) mechanism. CB6r3 estimated the hydrocarbon consumption and the concentration of radicals (i.e., RO2 and HO2) to process atmospheric aging of gas products. The organic species concentrations, estimated by the stoichiometric coefficient array and the consumption of hydrocarbons, were applied to form gasoline SOA via multiphase partitioning and aerosol-phase reactions. To improve the gasoline SOA potential in ambient air, model parameters were also corrected for gas–wall partitioning (GWP). The simulated gasoline SOA mass was evaluated against observed data obtained in the University of Florida Atmospheric PHotochemical Outdoor Reactor (UF-APHOR) chamber under varying sunlight, NOx levels, aerosol acidity, humidity, temperature, and concentrations of aqueous salts and gasoline vapor. Overall, gasoline SOA was dominantly produced via aerosol-phase reaction, regardless of the seed conditions, owing to heterogeneous reactions of reactive multifunctional organic products. Both the measured and simulated gasoline SOA were sensitive to seed conditions, showing a significant increase in SOA mass with increasing aerosol acidity and water content. A considerable difference in SOA mass appeared between two inorganic aerosol states (dry aerosol vs. wet aerosol), suggesting a large difference in SOA formation potential between arid (western United States) and humid regions (eastern United States). Additionally, aqueous reactions of organic products increased the sensitivity of gasoline SOA formation to NOx levels as well as temperature. The impact of the chamber wall on SOA formation was generally significant, and it appeared to be higher in the absence of wet salts. Based on the evaluation of UNIPAR against chamber data from 10 aromatic hydrocarbons and gasoline fuel, we conclude that the UNIPAR model with both heterogeneous reactions and the model parameters corrected for GWP can improve the ability to accurately estimate SOA mass at regional scales.