Ignem: Upward Migration of Cold Data in Big Data File Systems

Simbarashe Dzinamarira, Florin Dinu

This paper investigates whether migrating cold data can yield significant speedup for big data jobs that run on modern big data file systems. Our work is motivated by two observations. First, improving the input stage of a job can provide significant speedup because many jobs spend a large part of their execution reading inputs. Second, the inputs for many jobs are cold, so common techniques that aim to keep hot data in memory do not benefit these jobs. We analyze trace data from a Google production cluster and find that the key ingredients for effectively migrating cold data do exist in such production environments. Encouraged by these findings, we design and implement Ignem, a framework for migrating cold data in big data file systems. We evaluate Ignem in a series of experiments and show that it provides significant speedup for both small and large jobs. Specifically, Hive queries are accelerated by up to 34%; in a trace-driven workload, the mean job duration is reduced by 12% and the task duration by nearly 40%; and standalone jobs such as sort and wordcount improve similarly, by up to 30%.

Award ID(s): 1718980

PAR ID: 10058538

Author(s) / Creator(s): Simbarashe Dzinamarira, Florin Dinu

Date Published: 2018-07-01

Journal Name: 38th IEEE International Conference on Distributed Computing Systems (ICDCS 2018)

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0 (Conference Paper). The DOI is not currently available.