Title: Desperately Seeking ... Optimal Multi-Tier Cache Configurations
Modern cache hierarchies are tangled webs of complexity. Multiple tiers of heterogeneous physical and virtual devices, with many configurable parameters, all contend to optimally serve swarms of requests between local and remote applications. The challenge of effectively designing these systems is exacerbated by continuous advances in hardware and firmware, innovation in cache eviction algorithms, and evolving workloads and access patterns. This rapidly expanding configuration space has made it costly and time-consuming to physically experiment with numerous cache configurations for even a single stable workload. Current cache evaluation techniques (e.g., Miss Ratio Curves) are short-sighted: they analyze only a single tier of cache, focus primarily on performance, and fail to examine the critical relationships between metrics like throughput and monetary cost. Publicly available I/O cache simulators are also lacking: they can only simulate a fixed or limited number of cache tiers, are missing key features, or offer limited analyses. It is our position that best practices in cache analysis should include the evaluation of multi-tier configurations, coupled with more comprehensive metrics that reveal critical design trade-offs, especially monetary costs. We are developing an n-level I/O cache simulator that is general enough to model any cache hierarchy, captures many metrics, provides a robust set of analysis features, and is easily extendable to facilitate experimental research or production-level provisioning. To demonstrate the value of our proposed metrics and simulator, we extended an existing cache simulator (PyMimircache). We present several interesting and counter-intuitive results in this paper.
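To make the multi-tier idea concrete, here is a minimal sketch of an n-level cache simulator in Python. It is not the authors' simulator and does not use PyMimircache's API; the names `LRUTier` and `simulate`, the per-tier LRU eviction, the inclusive fill-on-miss behavior, and the latency and per-block price figures are all illustrative assumptions. It replays a block trace and reports per-tier hit ratios alongside mean latency and a capacity-based monetary cost, i.e., the kind of paired performance/cost metrics that a single-tier Miss Ratio Curve does not expose.

```python
from collections import OrderedDict


class LRUTier:
    """One cache tier with LRU eviction (all parameters are hypothetical)."""

    def __init__(self, name, capacity, latency_us, cost_per_block):
        self.name = name
        self.capacity = capacity              # capacity in blocks
        self.latency_us = latency_us          # latency of a hit served by this tier
        self.cost_per_block = cost_per_block  # assumed monetary price per block
        self.blocks = OrderedDict()           # key order = recency order
        self.hits = 0

    def lookup(self, key):
        if key in self.blocks:
            self.blocks.move_to_end(key)      # mark as most recently used
            self.hits += 1
            return True
        return False

    def insert(self, key):
        self.blocks[key] = True
        self.blocks.move_to_end(key)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)   # evict least recently used block


def simulate(tiers, trace, backend_latency_us):
    """Replay a block trace through an n-level hierarchy (tiers[0] is fastest).

    Assumption: a block found in tier i (or fetched from the backend on a
    full miss) is copied into every faster tier above it.
    """
    total_latency = 0.0
    backend_reads = 0
    for key in trace:
        hit_level = None
        for level, tier in enumerate(tiers):
            if tier.lookup(key):
                hit_level = level
                break
        if hit_level is None:
            backend_reads += 1
            total_latency += backend_latency_us
            fill_upto = len(tiers)
        else:
            total_latency += tiers[hit_level].latency_us
            fill_upto = hit_level
        for tier in tiers[:fill_upto]:        # promote into faster tiers
            tier.insert(key)
    n = len(trace)
    return {
        "hit_ratio": {t.name: t.hits / n for t in tiers},
        "miss_ratio": backend_reads / n,
        "mean_latency_us": total_latency / n,
        "cost": sum(t.capacity * t.cost_per_block for t in tiers),
    }


if __name__ == "__main__":
    tiers = [LRUTier("DRAM", 2, 0.1, 5.0), LRUTier("SSD", 4, 100.0, 0.5)]
    stats = simulate(tiers, [1, 2, 1, 3, 1, 2, 4, 1], backend_latency_us=5000.0)
    print(stats["hit_ratio"])  # {'DRAM': 0.25, 'SSD': 0.25}
    print(stats["miss_ratio"], stats["mean_latency_us"], stats["cost"])
```

Because each tier's capacity and unit price are plain parameters, sweeping them over a grid and plotting mean latency against cost would yield a latency/cost trade-off curve across whole hierarchies rather than a miss-ratio curve for one tier.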
Award ID(s):
1755958
NSF-PAR ID:
10190169
Date Published:
Journal Name:
HotStorage 2020
Sponsoring Org:
National Science Foundation
More Like this
  2. Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad-hoc process that is performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries, as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics).
    The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases, and hence models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the faster storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
  3. Traditional caching models emphasize hit rate as the principal measure of performance for cache replacement algorithms. However, hit rate alone can be misleading in the presence of a phenomenon known as a delayed hit. Delayed hits occur in high-throughput systems when multiple requests for an object accumulate before the object can be fetched from the backing store. Prior work by Atre et al. has explored the impact of delayed hits in simple caching scenarios, namely single-tier caches with uniform object sizes. In this work we seek to extend that investigation to consider multi-level caches, such as those that might be found in a modern CDN. Furthermore, we extend MAD, the delayed-hits-aware policy proposed by Atre et al., so that it can be deployed in a multi-tier caching system. We evaluate the performance of MAD using a multi-tier cache simulator and an empirical cache configuration based on modern CDNs. Our initial results lead us to believe that delayed hits can still be a prominent factor in the performance of multi-level caches, although their effect may be reduced in comparison to simpler cache configurations.
  4. Many high-performance systems now include different types of memory devices within the same compute platform to meet strict performance and cost constraints. Such heterogeneous memory systems often include an upper-level tier with better performance but limited capacity, and lower-level tiers with higher capacity but less bandwidth and longer latencies for reads and writes. To utilize the different memory layers efficiently, current systems rely on hardware-directed, memory-side caching, or they provide facilities in the operating system (OS) that allow applications to make their own data-tier assignments. Since these data management options each come with their own set of trade-offs, many systems also include mixed data management configurations that allow applications to employ hardware- and software-directed management simultaneously, but for different portions of their address space. Despite the opportunity to address limitations of stand-alone data management options, such mixed management modes are under-utilized in practice, and have not been evaluated in prior studies of complex memory hardware. In this work, we develop custom program profiling, configurations, and policies to study the potential of mixed data management modes to outperform hardware- or software-based management schemes alone. Our experiments, conducted on an Intel® Knights Landing platform with high-bandwidth memory, demonstrate that the mixed data management mode achieves the same or better performance than the best stand-alone option for five memory-intensive benchmark applications (run separately and in isolation), resulting in an average speedup of over 10% compared to the best stand-alone policy.
  5. Periderm is a well‐known structural feature with vital roles in protection of inner plant tissues and wound healing. Despite its importance to plant survival, knowledge of periderm occurrences outside the seed plants is limited and the evolutionary origins of periderm remain poorly explored. Here, we review the current knowledge of the taxonomic distribution of periderm in its two main forms, canonical periderm (periderm formed as a typical ontogenetic stage) and wound periderm (periderm produced as a self‐repair mechanism), with a focus on major plant lineages, living and extinct. We supplement the published occurrences with data based on our own observations and experiments. This updated body of data reveals that the distribution of wound periderm is more widespread taxonomically than previously recognized and that some living and extinct groups are capable of producing wound periderm, despite canonical periderm being absent from their normal developmental program. A critical review of canonical and wound periderms in extant and fossil lineages indicates that not all periderms are created equal. Their organisation is widely variable and the differences can be characterised in terms of variations in three structural features: (i) the consistency in orientation of periclinal walls within individual files of periderm cells; (ii) the lateral coordination of periclinal walls between adjacent cell files; and (iii) whether a cambial layer and conspicuous layering of inward and outward derivatives can be distinguished. Using a new system of scoring periderm structure based on these criteria, we characterise the level of organisation of canonical and wound periderms in different lineages.
    Looking at periderms through the lens provided by their level of organisation reveals that the traditional image of periderm as a single generalised feature is best viewed as a continuum of structural configurations that are all predicated on the same basic process (periclinal divisions), but can fall anywhere from very loosely organized (diffuse periclinal growth) to very tightly coordinated (organized periclinal growth). Overall, wound periderms in both seed plants and seed‐free plants have lower degrees of organisation than canonical periderms, which may be due to their initiation in response to inherently disruptive traumatic events. Wound and canonical periderms of seed plants have higher degrees of organisation than those of seed‐free plants, possibly due to co‐option of the programs responsible for organizing their vascular cambial growth. Given the importance of wound periderm to plant survival, its widespread taxonomic distribution, and its early occurrence in the fossil record, we hypothesise that wound periderm may have had a single origin in euphyllophytes and canonical periderm may have originated separately in different lineages by co‐option of the basic regulatory toolkit of wound periderm formation. In one evolutionary scenario, wound periderm regulators activated initially by tissue tearing due to tensional stresses elicited by woody growth underwent heterochronic change that switched their activation trigger from tissue tearing to the tensional stresses that precede it, with corresponding changes in the signalling that triggered the regulatory cascade of periderm development from tearing‐induced signals to signalling induced by tension in cells.