

Title: Accelerated fixed-point iterative reconstruction for fiber borescope imaging

Computational imaging systems with embedded processing have potential advantages in power consumption, computing speed, and cost. However, common processors in embedded vision systems have limited computing capacity and a low degree of parallelism. The widely used iterative algorithms for image reconstruction rely on floating-point processors to ensure calculation precision, and these require more computing resources than fixed-point processors. Here we present a regularized Landweber fixed-point iterative solver for image reconstruction, implemented on a field-programmable gate array (FPGA). Compared with floating-point embedded uniprocessors, the iterative solver implemented on the fixed-point FPGA gains one to two orders of magnitude of acceleration while achieving the same reconstruction accuracy in a comparable number of effective iterations. Specifically, we demonstrate the proposed fixed-point iterative solver on fiber borescope image reconstruction, successfully correcting the artifacts introduced by the lenses and the fiber bundle.
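For readers unfamiliar with the method, a regularized Landweber solver for a linear forward model A x = b typically iterates x_{k+1} = x_k + τ (Aᵀ(b − A x_k) − λ x_k). The Python/NumPy sketch below emulates such a solver in fixed-point arithmetic by re-quantizing every intermediate result; the forward-model size, Tikhonov-style regularization, step size, and 16-fractional-bit format are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def to_fixed(x, frac_bits=16):
    """Quantize to a signed fixed-point grid with `frac_bits` fractional bits.
    Overflow/saturation is ignored; the word length is an illustrative assumption."""
    scale = 2.0 ** frac_bits
    return np.round(x * scale) / scale

def landweber_fixed_point(A, b, lam=1e-3, tau=None, n_iter=500, frac_bits=16):
    """Tikhonov-regularized Landweber iteration,
        x_{k+1} = x_k + tau * (A^T (b - A x_k) - lam * x_k),
    with every intermediate re-quantized to emulate fixed-point arithmetic."""
    if tau is None:
        tau = 1.0 / (np.linalg.norm(A, 2) ** 2 + lam)  # step size small enough to converge
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        residual = to_fixed(b - A @ x, frac_bits)
        update = to_fixed(A.T @ residual - lam * x, frac_bits)
        x = to_fixed(x + tau * update, frac_bits)
    return x

# Toy usage: a small random matrix standing in for the borescope forward model.
rng = np.random.default_rng(0)
A = rng.normal(size=(64, 32))
x_true = rng.normal(size=32)
b = A @ x_true
x_rec = landweber_fixed_point(A, b)
print("relative error:", np.linalg.norm(x_rec - x_true) / np.linalg.norm(x_true))
```

On an FPGA the same update maps onto integer multiply-accumulate pipelines; the quantization above only models the reduced precision, not the hardware parallelism that produces the reported speedup.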

 
NSF-PAR ID: 10471736
Author(s) / Creator(s): ; ; ;
Publisher / Repository: Optical Society of America
Date Published:
Journal Name: Optics Express
Volume: 31
Issue: 23
ISSN: 1094-4087 (CODEN: OPEXFF)
Format(s): Medium: X; Size: Article No. 38355
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Solving linear systems, often accomplished by iterative algorithms, is a ubiquitous task in science and engineering. To accommodate the dynamic range and precision requirements, these iterative solvers are carried out on floating-point processing units, which are not efficient at handling large-scale matrix multiplications and inversions. Low-precision, fixed-point digital or analog processors consume only a fraction of the energy per operation compared with their floating-point counterparts, yet their current usage excludes iterative solvers because of the cumulative computational errors that arise from fixed-point arithmetic. In this work, we show that for a simple iterative algorithm, such as Richardson iteration, a fixed-point processor can provide the same convergence rate and achieve solutions beyond its native precision when combined with residual iteration (a minimal sketch of this mixed-precision scheme appears after this list). These results indicate that power-efficient computing platforms consisting of analog computing devices can be used to solve a broad range of problems without compromising speed or precision.
  2. Brain-inspired Hyperdimensional (HD) computing models cognition by exploiting the properties of high-dimensional statistics, operating on high-dimensional vectors instead of the numeric values used in contemporary processors. A fundamental weakness of existing HD computing algorithms is that they require floating-point models to provide acceptable accuracy on realistic classification problems. However, working with floating-point values significantly increases the HD computation cost. To address this issue, we propose QuantHD, a novel framework for quantizing the HD computing model during training. QuantHD enables HD computing to work with a low-cost quantized model (binary or ternary) while providing accuracy similar to the floating-point model (a toy sketch of this style of model quantization appears after this list). We accordingly propose an FPGA implementation that accelerates HD computing in both the training and inference phases. We evaluate the accuracy and efficiency of QuantHD on various real-world applications and observe that it achieves on average a 17.2% accuracy improvement over existing binarized HD computing algorithms with a similar computation cost. In terms of efficiency, the QuantHD FPGA implementation achieves on average 42.3× and 4.7× (34.1× and 4.1×) improvements in energy efficiency and speedup during inference (training) compared with state-of-the-art HD computing algorithms.
  3. Cloud providers such as Amazon and Microsoft have begun to support on-demand FPGA acceleration in the cloud, and hardware vendors will support FPGAs in future processors. At the same time, technology advancements such as 3D stacking, through-silicon vias (TSVs), and FinFETs have greatly increased FPGA density. The massive parallelism of current FPGAs can support not only extremely large applications, but multiple applications simultaneously as well. System support for FPGAs, however, is in its infancy. Unlike software, where resource configurations are limited to simple dimensions of compute, memory, and I/O, FPGAs provide a multi-dimensional sea of resources known as the FPGA fabric: logic cells, floating point units, memories, and I/O can all be wired together, leading to spatial constraints on FPGA resources. Current stacks either support only a single application or statically partition the FPGA fabric into fixed-size slots. These designs cannot efficiently support diverse workloads: the size of the largest slot places an artificial limit on application size, and oversized slots result in wasted FPGA resources and reduced concurrency. This paper presents AMORPHOS, which encapsulates user FPGA logic in morphable tasks, or Morphlets. Morphlets provide isolation and protection across mutually distrustful protection domains, extending the guarantees of software processes. Morphlets can morph, dynamically altering their deployed form based on resource requirements and availability. To build Morphlets, developers provide a parameterized hardware design that interfaces with AMORPHOS, along with a mesh, which specifies external resource requirements. AMORPHOS explores the parameter space, generating deployable Morphlets of varying size and resource requirements. AMORPHOS multiplexes Morphlets on the FPGA in both space and time to maximize FPGA utilization. We implement AMORPHOS on Amazon F1 [1] and Microsoft Catapult [92]. We show that protected sharing and dynamic scalability support on workloads such as DNN inference and blockchain mining improves aggregate throughput up to 4× and 23× on Catapult and F1 respectively. 
  4. Sitting is the most common position of modern human beings, and some sitting postures can cause health issues. To prevent harm from bad sitting postures, a local sitting-posture recognition system with low power consumption and low computing overhead is desired. The system should also provide a good user experience in terms of accuracy and privacy. This paper reports a novel posture recognition system on an office chair that can categorize seven different health-related sitting postures. The system uses six flex sensors, an Analog-to-Digital Converter (ADC) board, and a machine-learning model, a two-layer Artificial Neural Network (ANN), implemented on a Spartan-6 Field Programmable Gate Array (FPGA). The system achieves 97.78% accuracy with a floating-point evaluation and 97.43% accuracy with the 9-bit fixed-point implementation (a minimal sketch of this kind of fixed-point inference appears after this list). The ADC control logic and the ANN are constructed with a maximum propagation delay of 8.714 ns. The dynamic power consumption is 7.35 mW at a sampling rate of 5 samples per second with a clock frequency of 5 MHz.
  5. In this paper, we explore the prospect of accelerating tree-based genetic programming (TGP) by way of modern field-programmable gate array (FPGA) devices, which is motivated by the fact that FPGAs can sometimes leverage larger amounts of data/function parallelism, as well as better energy efficiency, when compared to general-purpose CPU/GPU systems. In our preliminary study, we introduce a fixed-depth, tree-based architecture capable of evaluating type-consistent primitives that can be fully unrolled and pipelined. The current primitive constraints preclude arbitrary control structures, but they allow for entire programs to be evaluated every clock cycle. Using a variety of floating-point primitives and random programs, we compare to the recent TensorGP tool executing on a modern 8 nm GPU, and we show that our accelerator implemented on a 14 nm FPGA achieves an average speedup of 43×. When compared to the popular baseline tool DEAP executing across all cores of a 2-socket, 28-core (56-thread), 14 nm CPU server, our accelerator achieves an average speedup of 4,902×. Finally, when compared to the recent state-of-the-art tool Operon executing on the same 2-processor CPU system, our accelerator executes about 2.4× slower on average. Despite not achieving an average speedup over every tool tested, our single-FPGA accelerator is the fastest in several instances, and we describe five future extensions that could allow for a 32–144× speedup over our current design as well as allow for larger program depths/sizes. Overall, we estimate that a future version of our accelerator will constitute a state-of-the-art GP system for many applications. 
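The mixed-precision idea in related item 1 can be sketched as follows: run Richardson iteration entirely in emulated low-precision fixed point, and wrap it in a residual (iterative-refinement) loop whose residuals are formed in full precision and rescaled before being handed to the low-precision solver. The 8-fractional-bit format, scaling strategy, and iteration counts below are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def quantize(x, frac_bits=8):
    """Emulate a low-precision fixed-point processor (illustrative word length)."""
    s = 2.0 ** frac_bits
    return np.round(x * s) / s

def richardson_low_precision(A, r, omega, n_iter=50, frac_bits=8):
    """Richardson iteration y_{k+1} = y_k + omega * (r - A y_k),
    with all arithmetic re-quantized to the low-precision grid."""
    y = np.zeros_like(r)
    for _ in range(n_iter):
        y = quantize(y + omega * quantize(r - A @ y, frac_bits), frac_bits)
    return y

def solve_with_residual_iteration(A, b, omega, n_refine=10):
    """Outer residual (iterative-refinement) loop: residuals are formed in full
    precision, scaled to unit magnitude, and corrected by the low-precision solver."""
    x = np.zeros_like(b)
    for _ in range(n_refine):
        r = b - A @ x                      # high-precision residual
        s = np.max(np.abs(r))
        if s == 0.0:
            break
        d = richardson_low_precision(A, r / s, omega)  # low-precision correction
        x = x + s * d                      # accumulate in high precision
    return x

# Toy usage on a well-conditioned symmetric positive-definite system.
rng = np.random.default_rng(1)
M = rng.normal(size=(32, 32))
A = M @ M.T / 32 + np.eye(32)
x_true = rng.normal(size=32)
b = A @ x_true
omega = 1.0 / np.linalg.norm(A, 2)
x = solve_with_residual_iteration(A, b, omega)
print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```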
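Related item 2 quantizes a hyperdimensional (HD) model to binary or ternary form. The sketch below conveys only the general flavor: encode samples with a random projection, bundle them into class hypervectors, quantize the class vectors with a sign/threshold rule, and classify by similarity. The encoder, dimensionality, and the omission of QuantHD's retraining loop are simplifying assumptions; none of the function names come from the QuantHD paper.

```python
import numpy as np

D = 4096  # hypervector dimensionality (illustrative)
rng = np.random.default_rng(2)

def encode(x, proj):
    """Random-projection encoder: feature vector -> bipolar hypervector."""
    return np.sign(proj @ x)

def train_class_hvs(X, y, proj, n_classes):
    """Bundle (sum) the encoded training samples of each class."""
    C = np.zeros((n_classes, proj.shape[0]))
    for xi, yi in zip(X, y):
        C[yi] += encode(xi, proj)
    return C

def quantize_class_hvs(C, thresh=0.0):
    """thresh=0 gives an (effectively) binary model; thresh>0 zeroes weak
    components and gives a ternary model."""
    return np.sign(np.where(np.abs(C) > thresh, C, 0.0))

def predict(x, proj, C_q):
    """Classify by dot-product similarity against the quantized class vectors."""
    return int(np.argmax(C_q @ encode(x, proj)))

# Toy usage: two Gaussian blobs in a 16-dimensional feature space.
mu0, mu1 = rng.normal(size=16) * 2, rng.normal(size=16) * 2
X = np.vstack([mu0 + rng.normal(size=(100, 16)), mu1 + rng.normal(size=(100, 16))])
y = np.array([0] * 100 + [1] * 100)
proj = rng.normal(size=(D, 16))
C_q = quantize_class_hvs(train_class_hvs(X, y, proj, 2))
acc = np.mean([predict(x, proj, C_q) == t for x, t in zip(X, y)])
print("training accuracy:", acc)
```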
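Related item 4 runs a two-layer ANN over six flex-sensor readings with a 9-bit fixed-point datapath. The sketch below emulates that style of inference by re-quantizing every operand to a 9-bit grid; the hidden-layer size, ReLU activation, bit split (1 sign, 2 integer, 6 fractional bits), and random weights are illustrative assumptions rather than the paper's design.

```python
import numpy as np

def q9(x, frac_bits=6):
    """Emulate 9-bit signed fixed point (1 sign + 2 integer + 6 fractional bits).
    The exact format in the paper is not given; this split is an assumption."""
    step = 2.0 ** -frac_bits
    lo, hi = -(2 ** 8) * step, (2 ** 8 - 1) * step
    return np.clip(np.round(x / step) * step, lo, hi)

def two_layer_ann_fixed(x, W1, b1, W2, b2):
    """Two-layer MLP inference with every operand and result re-quantized to
    9 bits, roughly mimicking a fixed-point FPGA datapath."""
    x = q9(x)
    h = q9(np.maximum(0.0, q9(W1 @ x + b1)))  # hidden layer with ReLU (assumed)
    return q9(W2 @ h + b2)                    # scores for the 7 posture classes

# Toy usage: 6 flex-sensor readings -> 7 posture classes, random illustrative weights.
rng = np.random.default_rng(3)
W1, b1 = q9(rng.normal(0, 0.5, (16, 6))), q9(rng.normal(0, 0.1, 16))
W2, b2 = q9(rng.normal(0, 0.5, (7, 16))), q9(rng.normal(0, 0.1, 7))
sensors = rng.uniform(0.0, 1.0, 6)            # normalized ADC readings
scores = two_layer_ann_fixed(sensors, W1, b1, W2, b2)
print("predicted posture class:", int(np.argmax(scores)))
```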