skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Skyway: Connecting Managed Heaps in Distributed Big Data Systems
Managed languages such as Java and Scala are prevalently used in development of large-scale distributed systems. Under the managed runtime, when performing data transfer across machines, a task frequently conducted in a Big Data system, the system needs to serialize a sea of objects into a byte sequence before sending them over the network. The remote node receiving the bytes then deserializes them back into objects. This process is both performance-inefficient and labor-intensive: (1) object serialization/deserialization makes heavy use of reflection, an expensive runtime operation and/or (2) serialization/deserialization functions need to be hand-written and are error-prone. This paper presents Skyway, a JVM-based technique that can directly connect managed heaps of different (local or remote) JVM processes. Under Skyway, objects in the source heap can be directly written into a remote heap without changing their formats. Skyway provides performance benefits to any JVM-based system by completely eliminating the need (1) of invoking serialization/deserialization functions, thus saving CPU time, and (2) of requiring developers to hand-write serialization functions.  more » « less
Award ID(s):
1703598
PAR ID:
10079573
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
Page Range / eLocation ID:
56 to 69
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Big Data systems are typically implemented in object-oriented languages such as Java and Scala due to the quick development cycle they provide. These systems are executed on top of a managed runtime such as the Java Virtual Machine (JVM), which requires each data item to be represented as an object before it can be processed. This representation is the direct cause of many kinds of severe inefficiencies. We developed Gerenuk, a compiler and runtime that aims to enable a JVM-based data-parallel system to achieve near-native efficiency by transforming a set of statements in the system for direct execution over inlined native bytes. The key insight leading to Gerenuk's success is two-fold: (1) analytics workloads often use immutable and confined data types. If we speculatively optimize the system and user code with this assumption, the transformation can be made tractable. (2) The flow of data starts at a deserialization point where objects are created from a sequence of native bytes and ends at a serialization point where they are turned back into a byte sequence to be sent to the disk or network. This flow naturally defines a speculative execution region (SER) to be transformed. Gerenuk compiles a SER speculatively into a version that can operate directly over native bytes that come from the disk or network. The Gerenuk runtime aborts the SER execution upon violations of the immutability and confinement assumption and switches to the slow path by deserializing the bytes and re-executing the original SER. Our evaluation on Spark and Hadoop demonstrates promising results. 
    more » « less
  2. null (Ed.)
    Resource-disaggregated architectures have risen in popularity for large datacenters. However, prior disaggregation systems are designed for native applications; in addition, all of them require applications to possess excellent locality to be efficiently executed. In contrast, programs written in managed languages are subject to periodic garbage collection (GC), which is a typical graph workload with poor locality. Although most datacenter applications are written in managed languages, current systems are far from delivering acceptable performance for these applications. This paper presents Semeru, a distributed JVM that can dramatically improve the performance of managed cloud applications in a memory-disaggregated environment. Its design possesses three major innovations: (1) a universal Java heap, which provides a unified abstraction of virtual memory across CPU and memory servers and allows any legacy program to run without modifications; (2) a distributed GC, which offloads object tracing to memory servers so that tracing is performed closer to data; and (3) a swap system in the OS kernel that works with the runtime to swap page data efficiently. An evaluation of Semeru on a set of widely-deployed systems shows very promising results. 
    more » « less
  3. Thanks to recent advances in high-bandwidth, low-latency interconnects, running a data-intensive application with a remote memory pool is now a feasibility. When developing a data-intensive application, a managed language such as Java is often the developer’s choice due to convenience of the runtime such as automatic memory management. However, the memory management cost increases significantly in far memory due to remote memory accesses. Our insight is that data hotness (i.e., access frequency of objects) is the key to reducing the memory management cost and improving efficiency in far memory. In this paper, we present an ongoing work designing Polar, an enhanced runtime system that is hotness-aware, and optimized for far memory. In Polar, the garbage collector is augmented to identify cold (infrequently accessed) objects and relocate them to remote memory pools. By placing objects at memory locations based on their access frequency, Polar minimizes the number of remote accesses, ensures low access latency for the application, and thus improves overall performance. 
    more » « less
  4. One-sided communication is a useful paradigm for irregular paral- lel applications, but most one-sided programming environments, including MPI’s one-sided interface and PGAS programming lan- guages, lack application-level libraries to support these applica- tions. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular ap- plications, including queues, hash tables, Bloom filters and more. BCL is written in C++ using an internal DSL called the BCL Core that provides one-sided communication primitives such as remote get and remote put operations. The BCL Core has backends for MPI, OpenSHMEM, GASNet-EX, and UPC++, allowing BCL data structures to be used natively in programs written using any of these programming environments. Along with our internal DSL, we present the BCL ObjectContainer abstraction, which allows BCL data structures to transparently serialize complex data types while maintaining efficiency for primitive types. We also introduce the set of BCL data structures and evaluate their performance across a number of high-performance computing systems, demonstrating that BCL programs are competitive with hand-optimized code, even while hiding many of the underlying details of message aggregation, serialization, and synchronization. 
    more » « less
  5. Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern data centers, supporting applications’ large memory footprint and improving machines’ resource utilization. Unfortunately, most far-memory techniques focus on OS-level optimizations and are agnostic to managed runtimes and garbage collections (GC) underneath applications written in high-level languages. With different object-access patterns from applications, GC can severely interfere with existing far-memory techniques, breaking remote memory prefetching algorithms and causing severe local-memory misses. We developed MemLiner, a runtime technique that improves the performance of far-memory systems by aligning memory accesses from application and GC threads so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. We implemented MemLiner in two widely used GCs in OpenJDK: G1 and Shenandoah. Our evaluation with a range of widely deployed cloud systems shows that MemLiner improves applications’ end-to-end performance by up to3.3×and reduces applications’ tail latency by up to220.0×. 
    more » « less