In parallel with big data processing and analysis dominating the usage of distributed and Cloud infrastructures, the demand for distributed metadata access and transfer has increased. The volume of data generated by many application domains exceeds petabytes, while the corresponding metadata amounts to terabytes or even more. This paper proposes a novel solution for efficient and scalable metadata access for distributed applications across wide-area networks, dubbed SMURF. Our solution combines novel pipelining and concurrent transfer mechanisms with reliability, provides distributed continuum caching and semantic locality-aware prefetching strategies to sidestep fetching latency, and achieves scalable and high-performance metadata fetch/prefetch services in the Cloud. We incorporate the phenomenon of semantic locality awareness for increased prefetch prediction rate using real-life application I/O traces from Yahoo! Hadoop audit logs and propose a novel prefetch predictor. By effectively caching and prefetching metadata based on the access patterns, our continuum caching and prefetching mechanism significantly improves the local cache hit rate and reduces the average fetching latency. We replay approximately 20 Million metadata access operations from real audit traces, where SMURF achieves 90% accuracy during prefetch prediction and reduced the average fetch latency by 50% compared to the state-of-the-art mechanisms.
more »
« less
SMURF: Efficient and Scalable Metadata Access for Distributed Applications from Edge to the Cloud
In parallel with big data processing and analysis dominating the usage of distributed and cloud infrastructures, the demand for distributed metadata access and transfer has increased. In many application domains, the volume of data generated exceeds petabytes, while the corresponding metadata amounts to terabytes or even more. In this paper, we propose a novel solution for efficient and scalable metadata access for distributed applications across wide-area networks, dubbed SMURF. Our solution combines novel pipelining and concurrent transfer mechanisms with reliability, provides distributed continuum caching and prefetching strategies to sidestep fetching latency, and achieves scalable and high-performance metadata fetch/prefetch services in the cloud. We also study the phenomenon of semantic locality in real trace logs which is not well utilized in metadata access prediction. We implement our predictor based on this observation and compare it with three existing state-of-the-art prefetch schemes on Yahoo! Hadoop audit traces. By effectively caching and prefetching metadata based on the access patterns, our continuum caching and prefetching mechanism greatly improves local cache hit rate and reduces the average fetching latency. We replayed approximately 20 Million metadata access operations from real audit traces, in which our system achieved 80% accuracy during prefetch prediction and reduced the average fetch latency 50% compared to the state-of-the-art mechanisms.
more »
« less
- Award ID(s):
- 1724898
- PAR ID:
- 10113314
- Date Published:
- Journal Name:
- 2019 IEEE International Conference on Edge Computing (EDGE)
- Page Range / eLocation ID:
- 102 to 106
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
High load latency that results from deep cache hierarchies and relatively slow main memory is an important limiter of single-thread performance. Data prefetch helps reduce this latency by fetching data up the hierarchy before it is requested by load instructions. However, data prefetching has shown to be imperfect in many situations. We propose cache-level prediction to complement prefetchers. Our method predicts which memory hierarchy level a load will access allowing the memory loads to start earlier, and thereby saves many cycles. The predictor provides high prediction accuracy at the cost of just one cycle added latency to L1 misses. Level prediction reduces the memory access latency by 20% on average, and provides speedup of 10.3% over a conventional baseline, and 6.1% over a boosted baseline on generic, graph, and HPC applications.more » « less
-
Abstract. Flash based solid state drives (SSDs) have established them- selves as a higher-performance alternative to hard disk drives in cloud and mobile environments. Nevertheless, SSDs remain a performance bot- tleneck of computer systems due to their high I/O access latency. A com- mon approach for improving the access latency is prefetching. Prefetch- ing predicts future block accesses and preloads them into main memory ahead of time. In this paper, we discuss the challenges of prefetching in SSDs, explain why prior approaches fail to achieve high accuracy, and present a neural network based prefetching approach that signi cantly outperforms the state-of the-art. To achieve high performance, we ad- dress the challenges of prefetching in very large sparse address spaces, as well as prefetching in a timely manner by predicting ahead of time. We collect I/O trace les from several real-world applications running on cloud servers and show that our proposed approach consistently outper- forms the existing stride prefetchers by up to 800 and prior prefetching approaches based on Markov chains by up to 8. Furthermore, we pro- pose an address mapping learning technique to demonstrate the applica- bility of our approach to previously unseen SSD workloads and perform a hyperparameter sensitivity study.more » « less
-
Temporal prefetching offers great potential, but this potential is difficult to achieve because of the need to store large amounts of prefetcher metadata off chip. To reduce the latency and traffic of off-chip metadata accesses, recent advances in temporal prefetching have proposed increasingly complex mechanisms that cache and prefetch this off-chip metadata. This paper suggests a return to simplicity: We present a temporal prefetcher whose metadata resides entirely on chip. The key insights are (1) only a small portion of prefetcher metadata is important, and (2) for most workloads with irregular accesses, the benefits of an effective prefetcher outweigh the marginal benefits of a larger data cache. Thus, our solution, the Triage prefetcher, identifies important metadata and uses a portion of the LLC to store this metadata, and it dynamically partitions the LLC between data and metadata. Our empirical results show that when compared against spatial prefetchers that use only on-chip metadata, Triage performs well, achieving speedups on irregular subset of SPEC2006 of 23.5% compared to 5.8% for the previous state-of-the-art. When compared against state-of-the-art temporal prefetchers that use off-chip metadata, Triage sacrifices performance on single-core systems (23.5% speedup vs. 34.7% speedup), but its 62% lower traffic overhead translates to better performance in bandwidth-constrained 16-core systems (6.2% speedup vs. 4.3% speedup).more » « less
-
null (Ed.)Prefetching techniques have been studied for decades. However, there are few studies on how concurrent memory accesses may affect prefetching effectiveness. When there are multiple concurrent memory requests, we can classify them into sub-classes by analyzing the overlapping relationship. In this work, we first propose pure prefetch coverage (PPC), a novel prefetching metric that can identify an accurate prefetch coverage under the concurrent memory access model. Then we propose APAC, an adaptive prefetch framework with PPC metric that can capture the dynamics of applications and adjust the prefetching aggressiveness. Our experimental results show that the PPC metric has a higher IPC correlation compared to the conventional prefetch coverage (PC) metric. For memory-intensive single-thread benchmarks, APAC provides an average performance improvement by 17.3% and 5.9% compared to the state-of-the-art adaptive prefetch framework FDP and NST. In a multi-core system, APAC outperforms FDP and NST by 8.5% and 5.0% IPC on average, respectively.more » « less
An official website of the United States government

