skip to main content


Search for: All records

Award ID contains: 1836881

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. null (Ed.)
  2. null (Ed.)
  3. null (Ed.)
  4. Due to its speed and ease of use, Spark has become a popular tool amongst data scientists to analyze data in various sizes. Counter-intuitively, data processing workloads in industrial companies such as Google, Facebook, and Yahoo are dominated by short-running applications, which is due to the majority of applications being mostly consisted of simple SQL-like queries (Dean, 2004, Zaharia et al, 2008). Unfortunately, the current version of Spark is not optimized for such kinds of workloads. In this paper, we propose a novel framework, called Meteor, which can dramatically improve the performance for short-running applications. We extend Spark with three additional operating modes: one-thread, one-container, and distributed. The one-thread mode executes all tasks on just one thread; the one-container mode runs these tasks in one container by multi-threading; the distributed mode allocates all tasks over the whole cluster. A new framework for submitting applications is also designed, which utilizes a fine-grained Spark performance model to decide which of the three modes is the most efficient to invoke upon a new application submission. From our extensive experiments on Amazon EC2, one-thread mode is the optimal choice when the input size is small, otherwise the distributed mode is better. Overall, Meteor is up to 2 times faster than the original Spark for short applications. 
    more » « less
  5. null (Ed.)