NSF PAR Search | NSF Public Access Repository


Analyzing and Leveraging Remote-Core Bandwidth for Enhanced Performance in GPUs

https://doi.org/10.1109/PACT.2019.00028

Ibrahim, Mohamed Assem; Liu, Hongyuan; Kayiran, Onur; Jog, Adwait (September 2019, 28th International Conference on Parallel Architectures and Compilation Techniques (PACT))

Opportunistic computing in GPU architectures

https://doi.org/10.1145/3307650.3322212

Pattnaik, Ashutosh; Tang, Xulong; Kayiran, Onur; Jog, Adwait; Mishra, Asit; Kandemir, Mahmut T.; Sivasubramaniam, Anand; Das, Chita R. (June 2019, ISCA '19: Proceedings of the 46th International Symposium on Computer Architecture)

Address-stride assisted approximate load value prediction in GPUs

https://doi.org/10.1145/3330345.3330362

Wang, Haonan; Ibrahim, Mohamed; Mittal, Sparsh; Jog, Adwait (June 2019, ICS '19: Proceedings of the ACM International Conference on Supercomputing)

Architectural Support for Efficient Large-Scale Automata Processing

https://doi.org/10.1109/MICRO.2018.00078

Liu, Hongyuan; Ibrahim, Mohamed; Kayiran, Onur; Pai, Sreepathi; Jog, Adwait (October 2018, 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO))

Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management

https://doi.org/10.1109/HPCA.2018.00030

Wang, Haonan; Luo, Fan; Ibrahim, Mohamed; Kayiran, Onur; Jog, Adwait (February 2018, IEEE International Symposium on High Performance Computer Architecture (HPCA))

Architecting SOT-RAM Based GPU Register File

https://doi.org/10.1109/ISVLSI.2017.17

Mittal, Sparsh; Bishnoi, Rajendra; Oboril, Fabian; Wang, Haonan; Tahoori, Mehdi; Jog, Adwait; Vetter, Jeffrey S. (July 2017, IEEE Computer Society Annual Symposium on VLSI (ISVLSI))

As GPU register file (RF) size has grown, so has its power consumption. Since the RF sits at the highest level of the cache hierarchy, building it from memories with high write latency/energy (e.g., spin-transfer torque RAM, STT-RAM) can lead to large energy losses. In this paper, we present a spin-orbit torque RAM (SOT-RAM) based RF design that provides higher energy efficiency than SRAM and STT-RAM RFs while matching the performance of an SRAM RF. To further improve the energy efficiency of the SOT-RAM based RF, we propose avoiding redundant bit-writes to the RF. Compared to an SRAM RF, the SOT-RAM RF saves 18.6% energy, and our technique for avoiding redundant writes increases the saving to 44.3% without harming performance.
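The idea of avoiding redundant bit-writes can be illustrated with a small model: when a new value overwrites a register, only the bit cells whose contents actually change need to be written. The sketch below is a hypothetical software model assuming a 32-bit register width and a read-compare-write policy; it is not the circuit-level mechanism from the paper, and the function names are illustrative.

```python
def bits_to_write(old: int, new: int, width: int = 32) -> int:
    """Number of bit cells that must actually be written when `new` overwrites `old`."""
    mask = (1 << width) - 1
    # XOR marks exactly the bit positions whose stored value would change.
    return bin((old ^ new) & mask).count("1")


def write_stats(writes, width: int = 32):
    """Compare naive full-width writes with redundant-bit-skipping writes.

    `writes` is a sequence of (register_index, new_value) pairs.
    Assumption (hypothetical): every register starts out holding 0.
    Returns (naive_bits_written, differential_bits_written).
    """
    mask = (1 << width) - 1
    regs = {}
    naive = 0
    differential = 0
    for reg, value in writes:
        old = regs.get(reg, 0)
        naive += width  # a conventional write drives every bit cell
        differential += bits_to_write(old, value, width)  # only changed bits
        regs[reg] = value & mask
    return naive, differential


if __name__ == "__main__":
    trace = [(0, 5), (0, 7), (1, 0), (0, 7)]
    print(write_stats(trace))
```

For this toy trace, four full-width writes touch 128 bit cells, but only 3 bits actually change, since writing 0 over 0 or 7 over 7 is entirely redundant. Energy savings in real hardware depend on per-bit write energy and the cost of the comparison, which this model ignores.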
Controlled Kernel Launch for Dynamic Parallelism in GPUs

https://doi.org/10.1109/HPCA.2017.14

Tang, Xulong; Pattnaik, Ashutosh; Jiang, Huaipan; Kayiran, Onur; Jog, Adwait; Pai, Sreepathi; Ibrahim, Mohamed; Kandemir, Mahmut T.; Das, Chita R. (February 2017, IEEE International Symposium on High Performance Computer Architecture (HPCA))

Dynamic parallelism (DP) is a promising GPU feature that allows kernels to be spawned on demand on the GPU without any CPU intervention. However, this feature has two major drawbacks. First, launching GPU kernels can incur significant performance penalties. Second, dynamically generated kernels cannot always utilize the GPU cores efficiently because of hardware limits. To address both concerns cohesively, we propose SPAWN, a runtime framework that controls the dynamically generated kernels, directly reducing the associated launch overheads and queuing latency. It also gives the scheduler a better mix of dynamically generated and original (parent) kernels, so that it can hide the remaining overheads and improve GPU resource utilization. Our results show that, across 13 benchmarks, SPAWN achieves 69% and 57% speedups over the flat (non-DP) implementation and baseline DP, respectively.
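To make "controlling the dynamically generated kernels" concrete, the sketch below models a launch throttle that decides, per unit of child work, whether to launch a child kernel or process the work inline in the parent. This is a hypothetical policy, not SPAWN's actual decision logic: the two thresholds `max_pending` and `min_work` are illustrative assumptions chosen to reflect the two drawbacks named above (launch overhead and limited hardware capacity for queued kernels).

```python
from dataclasses import dataclass


@dataclass
class LaunchController:
    """Hypothetical throttle for device-side (child) kernel launches.

    max_pending: assumed cap on child kernels waiting in the launch queue.
    min_work:    assumed minimum work items that justify paying launch overhead.
    """
    max_pending: int = 8
    min_work: int = 64
    pending: int = 0

    def decide(self, work_items: int) -> str:
        # Small chunks: launch overhead would dominate, so keep the work in the parent.
        if work_items < self.min_work:
            return "inline"
        # Queue already saturated: a new child would mostly add queuing latency.
        if self.pending >= self.max_pending:
            return "inline"
        self.pending += 1
        return "launch"

    def child_finished(self) -> None:
        # A completed child frees a slot in the (modeled) launch queue.
        self.pending = max(0, self.pending - 1)


if __name__ == "__main__":
    ctrl = LaunchController(max_pending=2, min_work=64)
    print([ctrl.decide(n) for n in [10, 100, 200, 300]])
    ctrl.child_finished()
    print(ctrl.decide(300))
```

The design choice here is simply that work which is too small, or which would only wait behind an already full queue, stays in the parent kernel. A real runtime would base these decisions on measured launch latency and resource occupancy rather than fixed constants.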
Design and Analysis of Soft-Error Resilience Mechanisms for GPU Register File

https://doi.org/10.1109/VLSID.2017.14

Mittal, Sparsh; Wang, Haonan; Jog, Adwait; Vetter, Jeffrey S. (January 2017, International Conference on VLSI Design and International Conference on Embedded Systems (VLSID))

Modern graphics processing units (GPUs) use increasingly large register files (RFs), which occupy a large fraction of the GPU core area and are accessed very frequently. This makes the RF vulnerable to soft errors (SEs). In this paper, we present two techniques for improving the SE resilience of the GPU RF. First, we propose compressing RF values to reduce the number of vulnerable bits. We leverage value similarity and the presence of narrow-width values to perform compression at the warp level and thread level, respectively. Second, we propose selective hardening, which designs a portion of each register entry with SE-immune circuits. Used together, these techniques provide higher resilience at lower overhead. Without hardening, our warp-level and thread-level compression techniques reduce SE vulnerability by 47.0% and 40.8%, respectively.
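The two compression ideas can be illustrated by counting how many stored (and therefore vulnerable) bits a register needs. The sketch below assumes a base-plus-delta encoding for warp-level value similarity and a fixed 16-bit narrow width for thread-level compression; these encodings, widths, and names are illustrative assumptions rather than the paper's exact scheme, and metadata bits (e.g., a "compressed" flag) are ignored.

```python
def fits_signed(value: int, bits: int) -> bool:
    """True if `value` is representable as a `bits`-wide two's-complement integer."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return lo <= value <= hi


def thread_level_bits(value: int, width: int = 32, narrow: int = 16) -> int:
    """Stored bits for one thread's register under an assumed narrow-width scheme."""
    # A narrow value only needs its low `narrow` bits; the rest is sign extension.
    return narrow if fits_signed(value, narrow) else width


def warp_level_bits(values, width: int = 32, delta_bits: int = 8) -> int:
    """Stored bits for a warp register under an assumed base + delta encoding.

    If every lane differs from lane 0 by an amount that fits in `delta_bits`,
    store one full-width base plus one small delta per remaining lane.
    """
    base = values[0]
    if all(fits_signed(v - base, delta_bits) for v in values[1:]):
        return width + delta_bits * (len(values) - 1)
    return width * len(values)  # values too dissimilar: store uncompressed


if __name__ == "__main__":
    warp = [1000 + lane for lane in range(32)]  # e.g., consecutive indices
    print(warp_level_bits(warp), "vs", 32 * len(warp))
    print(sum(thread_level_bits(v) for v in warp), "vs", 32 * len(warp))
```

For this toy warp of 32 consecutive values, the base-plus-delta form stores 280 bits and the narrow-width form stores 512 bits, compared with 1024 uncompressed. Fewer stored bits means fewer cells that a particle strike can corrupt. These numbers only illustrate the mechanism and are not the reductions reported above.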