

Title: Ship Compute or Ship Data? Why Not Both?
Award ID(s):
1845853
NSF-PAR ID:
10232567
Author(s) / Creator(s):
Date Published:
Journal Name:
18th USENIX Symposium on Networked Systems Design and Implementation (NSDI'21)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sampling is often used to reduce query latency for interactive big data analytics. The established parallel data processing paradigm relies on function shipping, where a coordinator dispatches queries to worker nodes and then collects the results. The commoditization of high-performance networking makes data shipping possible, where the coordinator directly reads data in the workers’ memory using RDMA while workers process other queries. In this work, we explore when to use function shipping or data shipping for interactive query processing with sampling. Whether function shipping or data shipping should be preferred depends on the amount of data transferred, the current CPU utilization, and the sampling method. The results show that data shipping is up to 6.5× faster when performing clustered sampling with heavily utilized workers.
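The trade-off described above can be sketched as a simple cost comparison. Everything below is an illustrative assumption, not the paper's actual model: the bandwidth figures, the cost formulas, and the function name `choose_strategy` are invented for this sketch.

```python
# Hypothetical sketch of the ship-compute-vs-ship-data decision.
# Bandwidths and cost formulas are illustrative assumptions only.

def choose_strategy(sample_bytes, result_bytes, worker_cpu_util,
                    rdma_bw=10e9, scan_bw=2e9):
    """Return 'data' (coordinator pulls the sample over RDMA and
    processes it itself) or 'function' (the worker executes the query
    and ships back only the result), whichever looks cheaper."""
    # Time to scan/sample the data, wherever the scan runs.
    compute = sample_bytes / scan_bw
    # Data shipping: move the whole sample over RDMA, then the
    # (assumed idle) coordinator does the compute. Clustered sampling
    # reads a contiguous region, which suits large sequential RDMA reads.
    data_cost = sample_bytes / rdma_bw + compute
    # Function shipping: the busy worker's scan is slowed in proportion
    # to its existing CPU load, but only the small result crosses the wire.
    func_cost = compute / max(1e-3, 1.0 - worker_cpu_util) + result_bytes / rdma_bw
    return "data" if data_cost < func_cost else "function"
```

With a heavily loaded worker the slowdown factor dominates and data shipping wins; with an idle worker, function shipping wins because it transfers far fewer bytes. This mirrors the paper's observation that transfer size and worker CPU utilization jointly decide the winner.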
  2.
    How cloud applications should interact with their data remains an active area of research. Over the last decade, many have suggested relying on a key-value (KV) interface to interact with data stored in remote storage servers, while others have vouched for the benefits of using remote procedure call (RPC). Instead of choosing one over the other, in this paper, we observe that an ideal solution must adaptively combine both of them in order to maximize throughput while meeting application latency requirements. To this end, we propose a new system called Kayak that proactively adjusts the rate of requests and the fraction of requests to be executed using RPC or KV, all in a fully decentralized and self-regulated manner. We theoretically prove that Kayak can quickly converge to the optimal parameters. We implement a system prototype of Kayak. Our evaluations show that Kayak achieves sub-second convergence and improves overall throughput by 32.5%-63.4% for compute-intensive workloads and up to 12.2% for non-compute-intensive and transactional workloads over the state-of-the-art. 
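The self-regulating adjustment Kayak performs can be illustrated with a toy hill-climber: a client nudges the fraction of requests executed via RPC and keeps whichever direction improves observed throughput. The throughput curve, step size, and function names below are invented stand-ins, not Kayak's actual controller, which additionally regulates the request rate and provably converges.

```python
# Toy sketch of tuning the RPC/KV split by hill-climbing on measured
# throughput. The concave throughput curve is a made-up stand-in.

def simulated_throughput(rpc_fraction):
    # Stand-in curve: all-RPC saturates server CPUs, all-KV wastes them;
    # some intermediate mix (peak at 0.6 here) maximizes throughput.
    return 100.0 - 80.0 * (rpc_fraction - 0.6) ** 2

def tune_rpc_fraction(steps=200, step=0.02):
    """Hill-climb the RPC fraction: keep moving while throughput
    improves, reverse direction when it drops."""
    f = 0.5
    best = simulated_throughput(f)
    direction = 1.0
    for _ in range(steps):
        candidate = min(1.0, max(0.0, f + direction * step))
        t = simulated_throughput(candidate)
        if t >= best:
            f, best = candidate, t   # still improving: accept the move
        else:
            direction = -direction   # throughput fell: reverse course
    return f
```

Because each client adjusts only from its own measurements, no central coordinator is needed, which is the decentralized, self-regulated flavor the abstract describes.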
  3. Abstract

    Despite technological advances over the last several decades, ship-based hydrography remains the only method for obtaining high-quality, high spatial and vertical resolution measurements of physical, chemical, and biological parameters over the full water column essential for physical, chemical, and biological oceanography and climate science. The Global Ocean Ship-based Hydrographic Investigations Program (GO-SHIP) coordinates a network of globally sustained hydrographic sections. These data provide a unique data set that spans four decades, comprising more than 40 cross-ocean transects. The section data are, however, difficult to use owing to their inhomogeneous formats. The purpose of this new temperature, salinity, and dissolved oxygen data product is to combine, reformat, and grid these data measured by Conductivity-Temperature-Depth-Oxygen (CTDO) profilers in order to facilitate their use by a wider audience. The product is machine readable and readily accessible by many existing visualisation and analysis software packages. The data processing can be repeated with modifications to suit various applications such as analysis of the deep ocean, validation of numerical simulations, and calibration of autonomous platforms.
