Job-Aware Optimization of File Placement in Hadoop

Nakagami, Makoto; Fortes, Jose A.; Yamaguchi, Saneyasu

doi:10.1109/COMPSAC.2019.10284

Citation Details

Job-Aware Optimization of File Placement in Hadoop

DOI 10.1109/COMPSAC.2019.10284 Abstract—Hadoop is a popular data-analytics platform based on the MapReduce model. When analyzing extremely big data, hard disk drives are commonly used and Hadoop performance can be optimized by improving I/O performance. Hard disk drives have different performance depending on whether data are placed in the outer or inner disk zones. In this paper, we propose a method that uses knowledge of job characteristics to place data in hard disk drives so that Hadoop performance is improved. Files of a job that intensively and sequentially accesses the storage device are placed in outer disk tracks which have higher sequential access speed than inner tracks. Temporary and permanent files are placed in the outer and inner zones, respectively. This enables repeated usage of the faster zones by avoiding the use of the faster zones by permanent files. Our evaluation demonstrates that the proposed method improves the performance of Hadoop jobs by 15.0% over the normal case when file placement is not used. The proposed method also outperforms a previously proposed placement approach by 9.9%. more »

Award ID(s):: 1550126

PAR ID:: 10193021

Author(s) / Creator(s):: Nakagami, Makoto; Fortes, Jose A.; Yamaguchi, Saneyasu

Date Published:: 2019-01-01

Journal Name:: 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 15-19 Jul 2019,

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1109/COMPSAC.2019.10284

More Like this