NSF PAR Search | NSF Public Access Repository


Twig: Profile-Guided BTB Prefetching for Data Center Applications

https://doi.org/10.1145/3466752.3480124

Khan, Tanvir Ahmed; Brown, Nathan; Sriraman, Akshitha; Soundararajan, Niranjan K; Kumar, Rakesh; Devietti, Joseph; Subramoney, Sreenivas; Pokam, Gilles A; Litz, Heiner; Kasikci, Baris (October 2021, 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’21))
Full Text Available
PDede: Partitioned, Deduplicated, Delta Branch Target Buffer

https://doi.org/10.1145/3466752.3480046

Soundararajan, Niranjan K; Braun, Peter; Khan, Tanvir Ahmed; Kasikci, Baris; Litz, Heiner; Subramoney, Sreenivas (October 2021, 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’21))
Full Text Available
Ripple: Profile-Guided Instruction Cache Replacement for Data Center Applications

https://doi.org/10.1109/ISCA52012.2021.00063

Khan, Tanvir Ahmed; Zhang, Dexin; Sriraman, Akshitha; Devietti, Joseph; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (June 2021, 48th Annual International Symposium on Computer Architecture (ISCA))
Full Text Available
I-SPY: Context-Driven Conditional Instruction Prefetching with Coalescing

https://doi.org/10.1109/MICRO50266.2020.00024

Khan, Tanvir Ahmed; Sriraman, Akshitha; Devietti, Joseph; Pokam, Gilles; Litz, Heiner; Kasikci, Baris (October 2020, Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO))
Modern data center applications have rapidly expanding instruction footprints that lead to frequent instruction cache misses, increasing cost and degrading data center performance and energy efficiency. Mitigating instruction cache misses is challenging since existing techniques (1) require significant hardware modifications, (2) expect impractical on-chip storage, or (3) prefetch instructions based on inaccurate understanding of program miss behavior. To overcome these limitations, we first investigate the challenges of effective instruction prefetching. We then use insights derived from our investigation to develop I-SPY, a novel profile-driven prefetching technique. I-SPY uses dynamic miss profiles to drive an offline analysis of I-cache miss behavior, which it uses to inform prefetching decisions. Two key techniques underlie I-SPY's design: (1) conditional prefetching, which only prefetches instructions if the program context is known to lead to misses, and (2) prefetch coalescing, which merges multiple prefetches of non-contiguous cache lines into a single prefetch instruction. I-SPY exposes these techniques via a family of light-weight hardware code prefetch instructions. We study I-SPY in the context of nine data center applications and show that it provides an average of 15.5% (up to 45.9%) speedup and 95.9% (and up to 98.4%) reduction in instruction cache misses, outperforming the state-of-the-art prefetching technique by 22.5%. We show that I-SPY achieves performance improvements that are on average 90.5% of the performance of an ideal cache with no misses.
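The prefetch coalescing idea described in the abstract above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a coalesced prefetch is encoded as a base cache-line address plus a bitmask over the next `window` lines; the function names, the window size of 8, and the encoding are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def coalesce_prefetches(lines: List[int], window: int = 8) -> List[Tuple[int, int]]:
    """Greedily merge cache-line addresses into (base, bitmask) pairs.

    Assumed encoding (hypothetical): bit i of the mask is set when
    line base + 1 + i should be prefetched together with base.
    """
    ops: List[Tuple[int, int]] = []
    pending = sorted(set(lines))
    i = 0
    while i < len(pending):
        base = pending[i]
        mask = 0
        i += 1
        # Fold every later line that falls inside the window into this op.
        while i < len(pending) and pending[i] - base <= window:
            mask |= 1 << (pending[i] - base - 1)
            i += 1
        ops.append((base, mask))
    return ops


def expand(base: int, mask: int, window: int = 8) -> List[int]:
    """Recover the cache lines covered by one coalesced prefetch."""
    return [base] + [base + 1 + b for b in range(window) if (mask >> b) & 1]


if __name__ == "__main__":
    for base, mask in coalesce_prefetches([100, 102, 105, 300]):
        print(base, bin(mask), expand(base, mask))
```

In this sketch, three non-contiguous lines near address 100 collapse into a single operation, while the distant line 300 needs its own.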
Full Text Available
Learning I/O Access Patterns to Improve Prefetching in SSDs

Chakraborttii, Chandranil; Litz, Heiner (July 2020, ECML-PKDD)

Flash-based solid state drives (SSDs) have established themselves as a higher-performance alternative to hard disk drives in cloud and mobile environments. Nevertheless, SSDs remain a performance bottleneck of computer systems due to their high I/O access latency. A common approach for improving the access latency is prefetching. Prefetching predicts future block accesses and preloads them into main memory ahead of time. In this paper, we discuss the challenges of prefetching in SSDs, explain why prior approaches fail to achieve high accuracy, and present a neural network based prefetching approach that significantly outperforms the state of the art. To achieve high performance, we address the challenges of prefetching in very large sparse address spaces, as well as prefetching in a timely manner by predicting ahead of time. We collect I/O trace files from several real-world applications running on cloud servers and show that our proposed approach consistently outperforms the existing stride prefetchers by up to 800 and prior prefetching approaches based on Markov chains by up to 8. Furthermore, we propose an address mapping learning technique to demonstrate the applicability of our approach to previously unseen SSD workloads and perform a hyperparameter sensitivity study.
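For context on the stride-prefetcher baseline mentioned in the abstract above, the following is a minimal sketch of how a simple stride predictor might work. It is a generic illustration, not the baseline evaluated in the paper; the function name and the `confirmations` parameter are hypothetical.

```python
from typing import Optional, Sequence


def predict_next_block(history: Sequence[int], confirmations: int = 2) -> Optional[int]:
    """Predict the next block address if the recent accesses share one stride.

    Returns None when there is too little history, when the last
    `confirmations` strides disagree, or when the stride is zero.
    """
    if len(history) < confirmations + 1:
        return None
    recent = history[-(confirmations + 1):]
    strides = {b - a for a, b in zip(recent, recent[1:])}
    if len(strides) == 1:
        stride = strides.pop()
        if stride != 0:
            return history[-1] + stride
    return None


if __name__ == "__main__":
    print(predict_next_block([10, 14, 18]))  # regular stride of 4
    print(predict_next_block([10, 14, 19]))  # irregular pattern
```

A predictor of this kind only helps on regular access patterns, which is consistent with the abstract's motivation for a learned approach on large, sparse address spaces.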
Full Text Available
AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers

https://doi.org/10.1109/MM.2020.2986212

Nagendra, Nayana Prasad; Ayers, Grant; August, David I.; Cho, Hyoun Kyu; Kanev, Svilen; Kozyrakis, Christos; Krishnamurthy, Trivikram; Litz, Heiner; Moseley, Tipp; Ranganathan, Parthasarathy (May 2020, IEEE Micro)

Full Text Available
Classifying Memory Access Patterns for Prefetching

https://doi.org/10.1145/3373376.3378498

Ayers, Grant; Litz, Heiner; Kozyrakis, Christos; Ranganathan, Parthasarathy (March 2020, ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems)

Prefetching is a well-studied technique for addressing the memory access stall time of contemporary microprocessors. However, despite a large body of related work, the memory access behavior of applications is not well understood, and it remains difficult to predict whether a particular application will benefit from a given prefetching technique. In this work we propose a novel methodology to classify the memory access patterns of applications, enabling well-informed reasoning about the applicability of a certain prefetcher. Our approach leverages instruction dataflow information to uncover a wide range of access patterns, including arbitrary combinations of offsets and indirection. These combinations, or prefetch kernels, represent reuse, strides, reference locality, and complex address generation. By determining the complexity and frequency of these access patterns, we enable reasoning about prefetcher timeliness and criticality, exposing the limitations of existing prefetchers today. Moreover, using these kernels, we are able to compute the next address for the majority of top-missing instructions, and we propose a software prefetch injection methodology that is able to outperform state-of-the-art hardware prefetchers.
Full Text Available
Understanding Memory Access Patterns for Prefetching

Braun, Peter; Litz, Heiner (April 2019, International Workshop on AI-assisted Design for Architecture (AIDArc), held in conjunction with ISCA)

The von Neumann bottleneck is a persistent problem in computer architecture, causing stalls and wasted CPU cycles. The von Neumann bottleneck is particularly relevant for memory-intensive workloads whose working set does not fit into the microprocessor’s cache and hence memory accesses suffer the high access latency of DRAM. One technique to address this bottleneck is to prefetch data from memory into on-chip caches. While prefetching has proven successful for simple access patterns such as strides, existing prefetchers are incapable of providing benefit for applications with complex, irregular access patterns. A neural network-based prefetcher shows promise for these challenging workloads. We provide a better understanding of what type of memory access patterns an LSTM neural network can learn by training individual models on microbenchmarks with well-characterized memory access patterns. We explore a range of model parameters and provide a better understanding of what model is ideal to use. We achieve over 95% accuracy on the microbenchmarks and find a strong relationship between lookback (history window) size and the ability of the model to learn the pattern. We also find an upper limit on the number of concurrent distinct memory access streams that can be learned by a model of a given size.
Full Text Available
SSP: Eliminating Redundant Writes in Failure-Atomic NVRAMs via Shadow Sub-Paging

https://doi.org/10.1145/3352460.3358326

Ni, Yuanjiang; Zhao, Jishen; Litz, Heiner; Bittman, Daniel; Miller, Ethan L. (January 2019, IEEE/ACM International Symposium on Microarchitecture)

Full Text Available
