-
We present a method to automatically synthesize efficient, high-quality demosaicking algorithms, across a range of computational budgets, given a loss function and training data. It performs a multi-objective, discrete-continuous optimization which simultaneously solves for the program structure and parameters that best trade off computational cost and image quality. We design the method to exploit domain-specific structure for search efficiency. We apply it to several tasks, including demosaicking both Bayer and Fuji X-Trans color filter patterns, as well as joint demosaicking and super-resolution. In a few days on 8 GPUs, it produces a family of algorithms that significantly improves image quality relative to the prior state-of-the-art across a range of computational budgets from 10s to 1000s of operations per pixel (1–3 dB higher quality at the same cost, or 8.5–200× higher throughput at the same or better quality). The resulting programs combine features of both classical and deep learning-based demosaicking algorithms into more efficient hybrid combinations, which are bandwidth-efficient and vectorizable by construction. Finally, our method automatically schedules and compiles all generated programs into optimized SIMD code for modern processors.
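For intuition only, the following is a minimal, hypothetical sketch of a multi-objective, discrete-continuous search of this general flavor: it mutates program structure (a discrete choice over operations) together with placeholder parameters and keeps a Pareto frontier under a toy cost proxy and a toy quality proxy. The operation set, cost model, and quality model are invented for illustration and are not the paper's actual search space, compiler cost, or training loss:

import random
from dataclasses import dataclass, field

@dataclass
class Candidate:
    structure: tuple                             # discrete choice: a sequence of ops
    params: list = field(default_factory=list)   # continuous parameters (placeholder)

OPS = ("bilinear", "gradient", "conv3x3", "conv5x5", "residual")

def mutate(parent):
    """Discrete move: replace or insert one operation, then re-draw parameters."""
    ops = list(parent.structure)
    if ops and random.random() < 0.5:
        ops[random.randrange(len(ops))] = random.choice(OPS)
    else:
        ops.insert(random.randrange(len(ops) + 1), random.choice(OPS))
    return Candidate(tuple(ops), [random.gauss(0, 1) for _ in ops])

def cost(c):
    """Stand-in for operations per pixel of the compiled program."""
    return sum({"conv5x5": 25, "conv3x3": 9}.get(op, 2) for op in c.structure)

def quality(c):
    """Stand-in for image quality (e.g. PSNR) after training the candidate."""
    return 30.0 + 2.0 * len(set(c.structure)) - 0.01 * cost(c)

def dominates(a, b):
    return (cost(a) <= cost(b) and quality(a) >= quality(b)
            and (cost(a) < cost(b) or quality(a) > quality(b)))

def pareto_search(steps=200):
    """Keep only non-dominated candidates: the cost/quality Pareto frontier."""
    frontier = [Candidate(("bilinear",), [0.0])]
    for _ in range(steps):
        child = mutate(random.choice(frontier))
        if not any(dominates(p, child) for p in frontier):
            frontier = [p for p in frontier if not dominates(child, p)] + [child]
    return sorted(frontier, key=cost)

for c in pareto_search():
    print(f"{cost(c):5d} ops/px  ~{quality(c):.1f} dB  {c.structure}")

In the method summarized above, quality would instead be measured by training each candidate's parameters against the given loss and data, and cost by the compiled program's operations per pixel.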
-
High-performance kernel libraries are critical to exploiting accelerators and specialized instructions in many applications. Because compilers are difficult to extend to support diverse and rapidly evolving hardware targets, and automatic optimization is often insufficient to guarantee state-of-the-art performance, these libraries are commonly still coded and optimized by hand, at great expense, in low-level C and assembly. To better support development of high-performance libraries for specialized hardware, we propose a new programming language, Exo, based on the principle of exocompilation: externalizing target-specific code generation support and optimization policies to user-level code. Exo allows custom hardware instructions, specialized memories, and accelerator configuration state to be defined in user libraries. It builds on the idea of user scheduling to externalize hardware mapping and optimization decisions. Schedules are defined as composable rewrites within the language, and we develop a set of effect analyses which guarantee program equivalence and memory safety through these transformations. We show that Exo enables rapid development of state-of-the-art matrix-matrix multiply and convolutional neural network kernels, for both an embedded neural accelerator and x86 with AVX-512 extensions, in a few dozen lines of code each.
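To make the notion of schedules as composable rewrites concrete, here is a small conceptual model (deliberately not Exo's actual API): a loop nest represented as data, two rewrites (split and reorder) expressed as pure functions on that representation, and an equivalence check by execution. In Exo itself an effect analysis guarantees that such rewrites preserve program equivalence and memory safety; the toy below simply runs the program before and after:

from dataclasses import dataclass

@dataclass(frozen=True)
class Loop:
    var: str
    extent: int
    body: object                 # a nested Loop, or a callable statement

def run(node, env):
    """Interpret a loop nest; leaf statements are callables taking the index env."""
    if isinstance(node, Loop):
        for i in range(node.extent):
            run(node.body, {**env, node.var: i})
    else:
        node(env)

def split(loop, factor, outer, inner):
    """Rewrite: turn a loop of extent n*factor into an outer x inner nest."""
    assert loop.extent % factor == 0
    def reindexed(env):
        run(loop.body, {**env, loop.var: env[outer] * factor + env[inner]})
    return Loop(outer, loop.extent // factor, Loop(inner, factor, reindexed))

def reorder(loop):
    """Rewrite: interchange two perfectly nested loops (an effect analysis would
    have to check that the body carries no cross-iteration dependences)."""
    inner = loop.body
    assert isinstance(inner, Loop)
    return Loop(inner.var, inner.extent, Loop(loop.var, loop.extent, inner.body))

# Usage: "schedule" a loop that increments y[i] for i in 0..7, then check that
# the transformed program is observationally equivalent to the original.
y = [0] * 8
def stmt(env):
    y[env["i"]] += 1

original = Loop("i", 8, stmt)
run(original, {})
expected = list(y)

y[:] = [0] * 8
scheduled = reorder(split(original, 4, "io", "ii"))
run(scheduled, {})
assert y == expected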
-
Utilizing memory and register bandwidth in modern architectures may require swizzles: non-trivial mappings of data and computations onto hardware resources, such as shuffles. We develop Swizzle Inventor to help programmers implement swizzle programs by writing program sketches that omit swizzles and delegating their creation to an automatic synthesizer. Our synthesis algorithm scales to real-world programs, allowing us to invent new GPU kernels for stencil computations, matrix transposition, and a finite field multiplication algorithm (used in cryptographic applications). The synthesized 2D convolution and finite field multiplication kernels are on average 1.5–3.2× and 1.1–1.7× faster, respectively, than expert-optimized CUDA kernels.
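As a toy illustration of the sketch-and-synthesize idea (not Swizzle Inventor's actual algorithm or specification language), the snippet below leaves the swizzle as a hole and enumerates a tiny hand-written grammar of index mappings until one meets a bank-conflict-freedom specification; the grammar, the spec, and all names are invented for this example:

N = 8  # tile dimension and number of memory banks (illustrative)

# Hole grammar: a few candidate index mappings from (row, col) to a bank.
CANDIDATES = {
    "identity":       lambda r, c: c,
    "rotate-by-row":  lambda r, c: (c + r) % N,
    "xor-with-row":   lambda r, c: c ^ r,
}

def conflict_free(swizzle):
    """Specification: every column of an N x N tile must hit N distinct banks,
    so a column-wise access incurs no bank conflicts."""
    return all(len({swizzle(r, c) for r in range(N)}) == N for c in range(N))

def synthesize():
    """Fill the hole: return the first candidate that satisfies the spec."""
    for name, f in CANDIDATES.items():
        if conflict_free(f):
            return name, f
    raise RuntimeError("no candidate in the grammar satisfies the spec")

name, swizzle = synthesize()
print("synthesized swizzle:", name)
for r in range(N):
    print([swizzle(r, c) for c in range(N)])

A real synthesizer would search a much richer space of mappings and verify them against the full program semantics rather than a single access pattern, but the structure of the problem (a sketch with a hole, a specification, and a search over candidate swizzles) is the same.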