Title: Job-Aware Optimization of File Placement in Hadoop
DOI: 10.1109/COMPSAC.2019.10284
Abstract: Hadoop is a popular data-analytics platform based on the MapReduce model. When analyzing extremely large data, hard disk drives are commonly used, and Hadoop performance can be optimized by improving I/O performance. Hard disk drives perform differently depending on whether data are placed in the outer or inner disk zones. In this paper, we propose a method that uses knowledge of job characteristics to place data on hard disk drives so that Hadoop performance is improved. Files of a job that accesses the storage device intensively and sequentially are placed on outer disk tracks, which have higher sequential access speed than inner tracks. Temporary and permanent files are placed in the outer and inner zones, respectively, so that permanent files do not occupy the faster zones and those zones can be reused repeatedly. Our evaluation demonstrates that the proposed method improves the performance of Hadoop jobs by 15.0% over the normal case in which file placement is not used. The proposed method also outperforms a previously proposed placement approach by 9.9%.
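As a rough illustration of the placement policy described in the abstract, the sketch below assigns files to disk zones based on whether they are temporary (e.g., intermediate shuffle output) or permanent (e.g., HDFS blocks), and on how sequentially the owning job accesses storage. All names here (Zone, FileRequest, choose_zone, the threshold value) are hypothetical illustrations, not the authors' implementation and not a Hadoop API.

```python
# Hypothetical sketch of job-aware, zone-aware file placement (not the paper's
# code and not a Hadoop API). Outer HDD zones sustain higher sequential
# throughput, so temporary files of sequentially intensive jobs are steered
# there, while long-lived files stay in inner zones so the fast zones can be
# reused.

from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    OUTER = "outer"   # faster sequential transfer rate
    INNER = "inner"   # slower, reserved for long-lived data


@dataclass
class FileRequest:
    path: str
    is_temporary: bool           # e.g., intermediate map output vs. HDFS block
    job_sequential_ratio: float  # fraction of the job's I/O that is sequential (0..1)


def choose_zone(req: FileRequest, seq_threshold: float = 0.7) -> Zone:
    """Place temporary files of sequentially intensive jobs in outer zones;
    everything permanent stays in inner zones so outer zones can be recycled."""
    if req.is_temporary and req.job_sequential_ratio >= seq_threshold:
        return Zone.OUTER
    return Zone.INNER


if __name__ == "__main__":
    shuffle_file = FileRequest("/tmp/job42/spill_0.out", True, 0.9)
    hdfs_block = FileRequest("/data/blk_1073741825", False, 0.9)
    print(choose_zone(shuffle_file))  # Zone.OUTER
    print(choose_zone(hdfs_block))    # Zone.INNER
```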
Award ID(s):
1550126
PAR ID:
10193021
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), 15-19 Jul 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Configuration space complexity makes big-data software systems hard to configure well. Consider Hadoop: with over nine hundred parameters, developers often just use the default configurations provided with Hadoop distributions, and the opportunity costs in lost performance are significant. Popular learning-based approaches to auto-tuning software do not scale well for big-data systems because of the high cost of collecting training data. We present a new method based on a combination of Evolutionary Markov Chain Monte Carlo (EMCMC) sampling and cost reduction techniques to find better-performing configurations for big data systems. For cost reduction, we developed and experimentally validated two approaches: using scaled-up big data jobs as proxies for the objective function for larger jobs, and using a dynamic job similarity measure to infer that results obtained for one kind of big data problem will work well for similar problems. Our experimental results suggest that our approach can improve the performance of big data systems significantly and frugally, and that it outperforms competing approaches based on random sampling, basic genetic algorithms (GA), and predictive model learning.
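A minimal sketch of the kind of search this abstract describes: an evolutionary Markov Chain Monte Carlo loop over configurations, where mutated candidates are accepted with a Metropolis-style rule based on a measured (or proxy) job runtime. The parameter names, the measure_runtime stub, and the temperature value are assumptions for illustration; this is not the authors' implementation.

```python
# Hypothetical EMCMC-style configuration search (illustrative only).
# A population of configurations is mutated; each candidate is accepted with a
# Metropolis rule on the (proxy) runtime, so the population drifts toward
# faster configurations while still exploring the space.

import math
import random

# Hypothetical Hadoop-like parameter space (name -> candidate values).
PARAM_SPACE = {
    "mapreduce.task.io.sort.mb": [50, 100, 200, 400],
    "mapreduce.reduce.shuffle.parallelcopies": [5, 10, 20, 40],
    "dfs.blocksize": [64 * 2**20, 128 * 2**20, 256 * 2**20],
}


def measure_runtime(config: dict) -> float:
    """Stub for running a scaled proxy job and timing it.
    Replace with an actual benchmark run in a real setting."""
    return random.uniform(100.0, 300.0)


def mutate(config: dict) -> dict:
    child = dict(config)
    key = random.choice(list(PARAM_SPACE))
    child[key] = random.choice(PARAM_SPACE[key])
    return child


def emcmc_search(pop_size=4, steps=20, temperature=20.0):
    population = [{k: random.choice(v) for k, v in PARAM_SPACE.items()}
                  for _ in range(pop_size)]
    costs = [measure_runtime(c) for c in population]
    best = min(zip(costs, population), key=lambda t: t[0])
    for _ in range(steps):
        for i in range(pop_size):
            cand = mutate(population[i])
            cand_cost = measure_runtime(cand)
            # Metropolis acceptance: always take improvements, sometimes take worse.
            if cand_cost < costs[i] or random.random() < math.exp((costs[i] - cand_cost) / temperature):
                population[i], costs[i] = cand, cand_cost
                if cand_cost < best[0]:
                    best = (cand_cost, cand)
    return best


if __name__ == "__main__":
    cost, config = emcmc_search()
    print(f"best proxy runtime {cost:.1f}s with config {config}")
```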
  2. Solid State Drives (SSD) compete with Hard Disk Drives (HDD) in the data storage market. Recent advances in SSD capacity/cost have come from arranging the flash memory cells not just on the 2D surface but also from stacking many cells vertically through the third dimension. The same option has not been seen as a practical approach for HDD technology, which is based on magnetic recording. Data can only be written to and read from just above the surface of the medium, and any data on additional layers deeper in the medium is profoundly affected by the additional spacing and loss of resolution. Nevertheless, modest gains may still be possible. Earlier work suggested gains around 17% for two stacked layers; that work only examined a single isolated track on each of two layers and just one reader. In this new work, we examine a minimal 3D configuration again comprising two layers, where two adjacent tracks on the upper layer straddle a double-width track on the lower layer. We take the writing process as a given, for instance utilizing Microwave Assisted Magnetic Recording. For readback, we variously assume 1, 2, or 3 readers arrayed above the data tracks.
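To make the multi-reader readback idea concrete, here is a toy linear model (an assumption for illustration, not the paper's channel model): each reader sees a weighted mix of the two upper-layer tracks and the deeper, attenuated lower-layer track, and with enough readers the per-track signals can be estimated by least squares.

```python
# Toy two-layer readback model (illustrative; not the paper's channel model).
# Each reader output is a linear mix of three track signals: two adjacent
# upper-layer tracks and one double-width lower-layer track, with the lower
# layer attenuated by the extra spacing. With three readers the mixing matrix
# can be solved in a least-squares sense to recover the tracks.

import numpy as np

rng = np.random.default_rng(0)
n = 1000                                         # samples along the track
tracks = rng.choice([-1.0, 1.0], size=(3, n))    # [upper-left, upper-right, lower]

# Hypothetical mixing matrix: rows = readers, columns = tracks.
# The lower-layer column is attenuated (deeper in the medium).
A = np.array([
    [0.9, 0.2, 0.3],   # reader above the left upper track
    [0.5, 0.5, 0.4],   # reader centered over the lower track
    [0.2, 0.9, 0.3],   # reader above the right upper track
])

readback = A @ tracks + 0.05 * rng.standard_normal((3, n))  # add readback noise

# Least-squares estimate of the per-track signals, then threshold to bits.
est = np.linalg.lstsq(A, readback, rcond=None)[0]
bits = np.sign(est)

error_rate = np.mean(bits != tracks)
print(f"bit error rate of the toy separation: {error_rate:.3f}")
```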
  3. Several lines of evidence suggest that the Milky Way underwent a major merger at z ∼ 2 with the Gaia-Sausage-Enceladus (GSE) galaxy. Here we use H3 Survey data to argue that GSE entered the Galaxy on a retrograde orbit, based on a population of highly retrograde stars with chemistry similar to the largely radial GSE debris. We present the first tailored N-body simulations of the merger. From a grid of ≈500 simulations we find that a GSE with M⋆ = 5 × 10⁸ M⊙ and M_DM = 2 × 10¹¹ M⊙ best matches the H3 data. This simulation shows that the retrograde stars are stripped from GSE's outer disk early in the merger. Despite being selected purely on angular momenta and radial distributions, this simulation reproduces and explains the following phenomena: (i) the triaxial shape of the inner halo, whose major axis is at ≈35° to the plane and connects GSE's apocenters; (ii) the Hercules-Aquila Cloud and the Virgo Overdensity, which arise due to apocenter pileup; and (iii) the 2 Gyr lag between the quenching of GSE and the truncation of the age distribution of the in situ halo, which tracks the lag between the first and final GSE pericenters. We make the following predictions: (i) the inner halo has a "double-break" density profile with breaks at both ≈15–18 kpc and ≈30 kpc, coincident with the GSE apocenters; and (ii) the outer halo has retrograde streams awaiting discovery at >30 kpc that contain ≈10% of GSE's stars. The retrograde (radial) GSE debris originates from its outer (inner) disk; exploiting this trend, we reconstruct the stellar metallicity gradient of GSE (−0.04 ± 0.01 dex r₅₀⁻¹). These simulations imply that GSE delivered ≈20% of the Milky Way's present-day dark matter and ≈50% of its stellar halo.
  4. Scientific data analysis pipelines face scalability bottlenecks when processing massive datasets that consist of millions of small files. Such datasets commonly arise in domains as diverse as detecting supernovae and post-processing computational fluid dynamics simulations. Furthermore, applications often use inference frameworks such as TensorFlow and PyTorch whose naive I/O methods exacerbate I/O bottlenecks. One solution is to use scientific file formats, such as HDF5 and FITS, to organize small arrays in one big file. However, storing everything in one file does not fully leverage the heterogeneous data storage capabilities of modern clusters. This paper presents Henosis, a system that intercepts data accesses inside the HDF5 library and transparently redirects I/O to the in-memory Redis object store or the disk-based TileDB array store. During this process, Henosis consolidates small arrays into bigger chunks and intelligently places them in data stores. A critical research aspect of Henosis is that it formulates object consolidation and data placement as a single optimization problem. Henosis carefully constructs a graph to capture the I/O activity of a workload and produces an initial solution to the optimization problem using graph partitioning. Henosis then refines the solution using a hill-climbing algorithm which migrates arrays between data stores to minimize I/O cost. The evaluation on two real scientific data analysis pipelines shows that consolidation with Henosis makes I/O 300× faster than directly reading small arrays from TileDB and 3.5× faster than workload-oblivious consolidation methods. Moreover, jointly optimizing consolidation and placement in Henosis makes I/O 1.7× faster than strategies that perform consolidation and placement independently. 
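A schematic of the refinement step described above, under assumed names: a hill-climbing loop that migrates consolidated chunks between a fast in-memory store and a slower disk-based store to reduce a modeled I/O cost subject to a memory budget. The cost model, store parameters, and chunk names are made up for illustration; this is not the Henosis code.

```python
# Hypothetical hill-climbing placement refinement (illustrative; not Henosis code).
# Each consolidated chunk has a size and an access count. Chunks placed in the
# fast in-memory store cost less per access but must fit a memory budget; the
# loop greedily migrates the chunk whose move reduces modeled I/O cost the most.

from dataclasses import dataclass

FAST_COST_PER_MB = 0.01    # e.g., a Redis-like in-memory store
SLOW_COST_PER_MB = 1.0     # e.g., a TileDB-like disk store
MEMORY_BUDGET_MB = 512


@dataclass
class Chunk:
    name: str
    size_mb: float
    accesses: int
    in_memory: bool = False


def io_cost(chunks):
    return sum(
        c.accesses * c.size_mb * (FAST_COST_PER_MB if c.in_memory else SLOW_COST_PER_MB)
        for c in chunks
    )


def memory_used(chunks):
    return sum(c.size_mb for c in chunks if c.in_memory)


def hill_climb(chunks):
    improved = True
    while improved:
        improved = False
        best_move, best_cost = None, io_cost(chunks)
        for c in chunks:
            c.in_memory = not c.in_memory            # try flipping placement
            if memory_used(chunks) <= MEMORY_BUDGET_MB:
                cost = io_cost(chunks)
                if cost < best_cost:
                    best_move, best_cost = c, cost
            c.in_memory = not c.in_memory            # undo trial flip
        if best_move is not None:
            best_move.in_memory = not best_move.in_memory
            improved = True
    return chunks


if __name__ == "__main__":
    workload = [Chunk("sn_patch_0", 300, 900), Chunk("sn_patch_1", 300, 50),
                Chunk("cfd_frame_0", 200, 400), Chunk("cfd_frame_1", 100, 10)]
    hill_climb(workload)
    for c in workload:
        print(c.name, "-> memory" if c.in_memory else "-> disk")
```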
  5. Scientific workflows drive most modern large-scale science breakthroughs by allowing scientists to define their computations as a set of jobs executed in a given order based on their data dependencies. Workflow management systems (WMSs) have become key to automating scientific workflows: executing computational jobs and orchestrating data transfers between those jobs running on complex high-performance computing (HPC) platforms. Traditionally, WMSs use files to communicate between jobs: a job writes out files that are read by other jobs. However, HPC machines face a growing gap between their storage and compute capabilities. To address that concern, the scientific community has adopted a new approach called in situ, which bypasses costly parallel filesystem I/O operations with faster in-memory or in-network communications. When using in situ approaches, communication and computation can be interleaved. In this work, we leverage the Decaf in situ dataflow framework to accelerate task-based scientific workflows managed by the Pegasus WMS, by replacing file communications with faster MPI messaging. We propose a new execution engine that uses Decaf to manage communications within a sub-workflow (i.e., a set of jobs) to optimize inter-job communications. We consider two workflows in this study: (i) a synthetic workflow that benchmarks and compares file- and MPI-based communication; and (ii) a realistic bioinformatics workflow that computes mutational overlaps in the human genome. Experiments show that in situ communication can improve the bioinformatics workflow execution time by 22% to 30% compared with file communication. Our results motivate further opportunities and challenges for bridging traditional WMSs with in situ frameworks.
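To illustrate the difference between the two communication modes compared in this abstract, here is a toy producer/consumer pair written with mpi4py (an assumption for illustration; the actual work uses the Decaf dataflow framework inside Pegasus, not this script). Job A hands an intermediate result to job B either by writing a file or by sending an MPI message.

```python
# Toy contrast between file-based and MPI-based job communication
# (illustrative only; the paper uses Decaf within Pegasus, not this script).
# Run with: mpirun -n 2 python handoff.py
#   rank 0 acts as the producer job, rank 1 as the consumer job.

import json
import pathlib

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

USE_MPI = True                      # flip to False for the file-based handoff
STAGING_FILE = pathlib.Path("intermediate.json")


def produce():
    # Stand-in for a real computational job's output.
    return {"overlaps": [3, 7, 11], "samples": 2}


if rank == 0:
    result = produce()
    if USE_MPI:
        comm.send(result, dest=1, tag=0)             # in-memory/in-network handoff
    else:
        STAGING_FILE.write_text(json.dumps(result))  # parallel-filesystem handoff
        comm.send("file_ready", dest=1, tag=0)       # signal only; data goes via disk
elif rank == 1:
    msg = comm.recv(source=0, tag=0)
    data = msg if USE_MPI else json.loads(STAGING_FILE.read_text())
    print(f"consumer received {data} via {'MPI' if USE_MPI else 'file'}")
```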