Stochastic computing (SC) division circuits have gained importance in recent years relative to other arithmetic circuits because they trade accuracy for low complexity. Designing a division circuit is already complex in conventional binary-based hardware systems, and developing an accurate and efficient SC division circuit remains an open research problem. Prior work proposed different SC division circuits using multiplexers and JK-flip-flop units, which may require correlated or uncorrelated input bit-streams. This study explores a cost-effective and efficient bit-stream generator designed specifically for SC division circuits. To that end, we assess the performance of multiple bit-stream generators and analyze the impact of correlation on SC division. We compare different designs in terms of accuracy and hardware cost. Moreover, we discuss a low-cost and energy-efficient bit-stream generator based on powers-of-2 Van der Corput (VDC) sequences. Among the tested sequence generators, VDC sequences achieved the best results. Our evaluation shows that the novel VDC-based design yields a 15.5% reduction in area-delay product and an 18.05% saving in energy consumption at the same accuracy level compared to conventional bit-stream generators. Notably, the proposed generator also improves precision compared to the state-of-the-art. We validate the proposed architecture with an image processing case study, achieving high PSNR and structural similarity values.
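As a rough illustration of how a VDC sequence can drive a bit-stream generator, the following Python sketch compares a target value against successive sequence elements, emitting a 1 whenever the value is larger. It is a software model under stated assumptions, not the hardware generator evaluated in the study; the function names `vdc`, `bitstream`, and `decode` are hypothetical.

```python
def vdc(i, base=2):
    """Return the i-th Van der Corput element in [0, 1) for the given base."""
    value, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)   # peel off the least significant digit
        denom *= base
        value += digit / denom       # mirror the digit about the radix point
    return value


def bitstream(x, length, base=2):
    """Comparator-style generator (assumed model): emit 1 when x exceeds the sequence element."""
    return [1 if vdc(i, base) < x else 0 for i in range(length)]


def decode(stream):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(stream) / len(stream)


if __name__ == "__main__":
    s = bitstream(0.25, 8)
    print(s, decode(s))
```

With a base-2 sequence and a stream length that is a power of 2, any value that is a multiple of 1/length is encoded without error in this model, which is one reason low-discrepancy sequences are attractive as SC number sources.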
Scalable Low-Cost Sorting Network with Weighted Bit-Streams
Sorting is a fundamental function in many applications, from data processing to database systems. For high performance, hardware-based sorting designs are implemented using conventional binary or emerging stochastic computing (SC) approaches. Binary designs are fast and energy-efficient but costly to implement. SC-based designs, on the other hand, are area- and power-efficient but slow and energy-hungry. As a result, previous hardware-based sorting designs have faced scalability issues. In this work, we propose a novel scalable low-cost design for implementing sorting networks. We borrow the concept of SC for its area and power efficiency but use weighted stochastic bit-streams to address the high latency and energy consumption of SC designs. A new lock and swap (LAS) unit is proposed to sort weighted bit-streams. The LAS-based sorting network can determine the result of comparing different input values early and then map the inputs to the corresponding outputs based on shorter weighted bit-streams. Experimental results show that the proposed design approach achieves much better hardware scalability than prior work. In particular, as the number of inputs increases, the proposed scheme can reduce energy consumption by about 3.8% to 93% compared to prior binary and SC-based designs.
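The abstract does not describe the internals of the LAS unit, so the Python sketch below is only a behavioral guess at the idea of deciding a comparison early. It assumes each bit in a weighted bit-stream carries a binary weight and arrives most significant bit first; the first position where the two inputs differ "locks" the decision, and later bits are routed accordingly. All names (`to_weighted_bits`, `from_bits`, `lock_and_swap`) are hypothetical.

```python
def to_weighted_bits(value, n_bits):
    """Encode an integer as a weighted bit-stream, MSB first (assumed binary weights)."""
    return [(value >> k) & 1 for k in range(n_bits - 1, -1, -1)]


def from_bits(bits):
    """Decode an MSB-first weighted bit-stream back to an integer."""
    result = 0
    for bit in bits:
        result = (result << 1) | bit
    return result


def lock_and_swap(a_bits, b_bits):
    """Route two streams to (min, max) outputs, locking at the first differing bit.

    Returns the two output streams and the index at which the decision was
    locked (None if the inputs are equal).
    """
    locked, swap, decided_at = False, False, None
    lo, hi = [], []
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if not locked and a != b:
            locked = True
            swap = a > b          # a is the larger input, so it goes to the max output
            decided_at = i
        if swap:
            lo.append(b)
            hi.append(a)
        else:
            lo.append(a)          # before the lock a == b, so routing is irrelevant
            hi.append(b)
    return lo, hi, decided_at


if __name__ == "__main__":
    lo, hi, idx = lock_and_swap(to_weighted_bits(13, 4), to_weighted_bits(11, 4))
    print(from_bits(lo), from_bits(hi), idx)
```

In this model the comparison outcome is known as soon as the lock is set, which may be well before the end of the stream; that is the sense in which a decision can be made "early".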
- Award ID(s):
- 2019511
- PAR ID:
- 10431795
- Date Published:
- Journal Name:
- 24th International Symposium on Quality Electronic Design (ISQED '23)
- Volume:
- 1
- Page Range / eLocation ID:
- 1 to 6
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Multiply-accumulate (MAC) operations are common in data processing and machine learning but costly in terms of hardware usage. Stochastic Computing (SC) is a promising approach for low-cost hardware design of complex arithmetic operations such as multiplication. Computing with deterministic unary bit-streams (defined as bit-streams with all 1s grouped together at the beginning or end of a bit-stream) has been recently suggested to improve the accuracy of SC. Conventionally, SC designs use multiplexer (MUX) units or OR gates to accumulate data in the stochastic domain. MUX-based addition suffers from scaling of data and OR-based addition from inaccuracy. This work proposes a novel technique for MAC operation on unary bit-streams that allows exact, non-scaled addition of multiplication results. By introducing a relative delay between the products, we control correlation between bit-streams and eliminate OR-based addition error. We evaluate the accuracy of the proposed technique compared to the state-of-the-art MAC designs. After quantization, the proposed technique demonstrates at least 37% and up to 100% decrease of the mean absolute error for uniformly distributed random input values, compared to traditional OR-based MAC designs. Further, we demonstrate that the proposed technique is practical and evaluate area, power and energy of three possible implementations.
-
Stochastic computing (SC) is a re-emerging computing paradigm providing low-cost and noise-tolerant designs for a wide range of arithmetic operations. SC circuits operate on uniform bit-streams whose value is determined by the probability of observing 1s in the bit-stream. The accuracy of SC operations depends strongly on the correlation between input bit-streams. While some operations, such as the minimum and maximum value functions, require highly correlated inputs, others, such as multiplication, need uncorrelated or independent inputs for accurate computation. Developing low-cost and accurate correlation manipulation circuits is an important research problem in SC, as these circuits can manage correlation between bit-streams without expensive bit-stream regeneration. This work proposes a novel in-stream correlator and decorrelator circuit that manages 1) correlation between stochastic bit-streams, and 2) distribution of 1s in the output bit-streams. Compared to state-of-the-art solutions, our designs achieve lower hardware cost and higher accuracy. The output bit-streams enjoy a low-discrepancy distribution of bits, which leads to higher quality of results. The effectiveness of the proposed circuits is shown with two case studies: SC design of sorting and median filtering.
-
Stochastic computing (SC) can lead to area-efficient implementation of logic designs. Existing SC multiplication, however, suffers from a long-standing problem: large multiplication error with small inputs, due to the intrinsic nature of bit-stream-based computing. In this article, we propose a new scaled counting-based SC multiplication approach, called Scaled-CBSC, to mitigate this issue by introducing scaling bits to ensure that the density of '1' bits in the stochastic number is sufficiently large. The idea is to convert the "small" inputs to "large" inputs, thus improving the accuracy of SC multiplication. Unlike an existing bit-stream-based approach, the new method uses the binary format and does not require stochastic addition, as the SC multiplication always starts with binary numbers. Furthermore, Scaled-CBSC only requires all the numbers to be larger than 0.5 instead of an arbitrarily defined threshold, which leads to integer numbers only for the scaling term. The experimental results show that the 8-bit Scaled-CBSC multiplication with 3 scaling bits can achieve up to 46.6% and 30.4% improvements in mean error and standard deviation, respectively; reduce the peak relative error from 100% to 1.8%; and improve delay, area, area-delay product, and energy consumption by 12.6%, 51.5%, 57.6%, and 58.4%, respectively, over the state-of-the-art work.
-
Low-cost and hardware-efficient design of trigonometric functions is challenging. Stochastic computing (SC), an emerging computing model processing random bit-streams, offers promising solutions for this problem. The existing implementations, however, often overlook the importance of the data converters necessary to generate the needed bit-streams. While recent advancements in SC bit-stream generators focus on basic arithmetic operations such as multiplication and addition, energy-efficient SC design of non-linear functions demands attention to both the computation circuit and the bit-stream generator. This work introduces TriSC, a novel approach for SC-based design of trigonometric functions enjoying state-of-the-art (SOTA) quasi-random bit-streams. Unlike SOTA SC designs of trigonometric functions that heavily rely on delay elements to decorrelate bit-streams, our approach avoids delay elements while improving the accuracy of the results. TriSC yields significant energy savings of up to 92% compared to SOTA. As two novel use cases studied for the first time in SC literature, we employ the proposed design for 2D image transformation and forward kinematics of a robotic arm, two computation-intensive applications demanding low-cost trigonometric designs.
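Several of the abstracts above turn on the role of correlation between input bit-streams: an AND gate yields the minimum of its inputs when they are maximally correlated, but their product when they are independent. The Python sketch below illustrates this with deterministic unary streams. The decorrelation step (repeating one stream while holding each bit of the other) is a well-known deterministic construction chosen here for illustration, not the in-stream correlator or decorrelator circuit proposed in any of the listed works; all function names are hypothetical.

```python
def unary(x, n):
    """Unary stream of length n: all 1s grouped at the front (x assumed to be a multiple of 1/n)."""
    k = round(x * n)
    return [1] * k + [0] * (n - k)


def decode(stream):
    """Value of a stream: the fraction of 1s."""
    return sum(stream) / len(stream)


def and_gate(a, b):
    return [p & q for p, q in zip(a, b)]


def or_gate(a, b):
    return [p | q for p, q in zip(a, b)]


def decorrelated_pair(a, b):
    """Pair every bit of a with every bit of b (streams assumed equal length n; output length n*n)."""
    n = len(a)
    a_repeated = a * n                                # whole stream repeated n times
    b_held = [bit for bit in b for _ in range(n)]     # each bit held for n cycles
    return a_repeated, b_held


if __name__ == "__main__":
    a, b = unary(0.5, 4), unary(0.75, 4)
    print(decode(and_gate(a, b)))                        # correlated inputs: minimum
    print(decode(or_gate(a, b)))                         # correlated inputs: maximum
    print(decode(and_gate(*decorrelated_pair(a, b))))    # independent inputs: product
```

The cost of the decorrelated pairing in this sketch is a stream that is n times longer, which hints at why cheaper in-stream correlation management is of interest.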

