CacheIt: Application-Agnostic Dynamic Caching for Big Data Analytics

Nguyen, Dat; Rafid, Muhammad; Santoso, Nathanael; Nguyen, Khanh

doi:10.1109/BigData62323.2024.10825231

Citation Details

CacheIt: Application-Agnostic Dynamic Caching for Big Data Analytics

Apache Spark arguably is the most prominent Big Data processing framework tackling the scalability challenge of a wide variety of modern workloads. A key to its success is caching critical data in memory, thereby eliminating wasteful computations of regenerating intermediate results. While critical to performance, caching is not automated. Instead, developers have to manually handle such a data management task via APIs, a process that is error-prone and labor-intensive, yet may still yield sub-optimal performance due to execution complexities. Existing optimizations rely on expensive profiling steps and/or application-specific cost models to enable a postmortem analysis and a manual modification to existing applications. This paper presents CACHEIT, built to take the guesswork off the users while running applications as-is. CACHEIT analyzes the program’s workflow, extracting important features such as dependencies and access patterns, using them as an oracle to detect high-value data candidates and guide the caching decisions at run time. CACHEIT liberates users from low-level memory management requirements, allowing them to focus on the business logic instead. CACHEIT is application-agnostic and requires no profiling or a cost model. A thorough evaluation with a broad range of Spark applications on real-world datasets shows that CACHEIT is effective in maintaining satisfactory performance, incurring only marginal slowdown compared to the manually well-tuned counterparts more »

Award ID(s):: 2107010

PAR ID:: 10620971

Author(s) / Creator(s):: Nguyen, Dat; Rafid, Muhammad; Santoso, Nathanael; Nguyen, Khanh

Publisher / Repository:: IEEE

Date Published:: 2024-12-15

ISBN:: 979-8-3503-6248-0

Page Range / eLocation ID:: 262 to 271

Subject(s) / Keyword(s):: caching memory management dynamic analysis

Format(s):: Medium: X

Location:: Washington, DC, USA

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1109/BigData62323.2024.10825231

More Like this