Managed languages such as Java and Scala are widely used in the development of large-scale distributed systems. Under a managed runtime, transferring data across machines, a task performed frequently in a Big Data system, requires the system to serialize a sea of objects into a byte sequence before sending them over the network. The remote node receiving the bytes then deserializes them back into objects. This process is both inefficient and labor-intensive: (1) object serialization/deserialization makes heavy use of reflection, an expensive runtime operation, and/or (2) serialization/deserialization functions need to be hand-written and are error-prone. This paper presents Skyway, a JVM-based technique that can directly connect the managed heaps of different (local or remote) JVM processes. Under Skyway, objects in the source heap can be written directly into a remote heap without changing their format. Skyway provides performance benefits to any JVM-based system by completely eliminating the need (1) to invoke serialization/deserialization functions, thus saving CPU time, and (2) for developers to hand-write serialization functions.
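For readers unfamiliar with the conventional path that the abstract describes, the following is a minimal sketch of a serialize/deserialize round trip using the standard `java.io` object streams. It is not Skyway and is not taken from the paper; the class name `SerializationRoundTrip` and the `Point` type are hypothetical, chosen only for illustration, and an in-memory byte array stands in for the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationRoundTrip {

    // Hypothetical application type; any Serializable class would do.
    static final class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Sender side: turn an object graph into a byte sequence.
    static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Receiver side: rebuild the objects from the received bytes.
    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // The byte array plays the role of the bytes sent over the network.
        byte[] wire = serialize(new Point(3, 4));
        Point copy = (Point) deserialize(wire);
        System.out.println("bytes on the wire: " + wire.length);
        System.out.println("x=" + copy.x + ", y=" + copy.y);
    }
}
```

Every object crossing a process boundary takes both steps in this model, which is the cost the abstract says Skyway avoids by writing objects into the remote heap in their existing format.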
Efficient Heap Data Management on Software Managed Manycore Architectures
Software Managed Manycore (SMM) architectures have been proposed as a solution for scaling the memory architecture. In a typical SMM architecture, Scratch Pad Memories (SPMs) are used instead of caches, and data must be explicitly managed in software. While all code and data need to be managed, heap management on SMMs is especially challenging due to the highly dynamic nature of heap data access. Existing techniques spend over 90% of execution time on heap data management, which largely compromises the power efficiency of SMM architectures. This paper presents efficient compiler-based techniques that reduce heap management overhead. Experimental results on benchmarks from MiBench executing on an SMM processor modeled in Gem5 demonstrate that our approach, implemented in LLVM 3.8, can improve execution time by an average of 80% compared to the state of the art.
- PAR ID: 10108944
- Date Published:
- Journal Name: 2019 32nd International Conference on VLSI Design and 2019 18th International Conference on Embedded Systems (VLSID)
- Page Range / eLocation ID: 269 to 274
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Resource-disaggregated architectures have risen in popularity for large datacenters. However, prior disaggregation systems are designed for native applications; in addition, all of them require applications to possess excellent locality to be efficiently executed. In contrast, programs written in managed languages are subject to periodic garbage collection (GC), which is a typical graph workload with poor locality. Although most datacenter applications are written in managed languages, current systems are far from delivering acceptable performance for these applications. This paper presents Semeru, a distributed JVM that can dramatically improve the performance of managed cloud applications in a memory-disaggregated environment. Its design possesses three major innovations: (1) a universal Java heap, which provides a unified abstraction of virtual memory across CPU and memory servers and allows any legacy program to run without modifications; (2) a distributed GC, which offloads object tracing to memory servers so that tracing is performed closer to data; and (3) a swap system in the OS kernel that works with the runtime to swap page data efficiently. An evaluation of Semeru on a set of widely deployed systems shows very promising results.
- Karim Ali and Jan Vitek (Ed.) Using a stack for managing the local state of procedures, as popularized by Algol, is a simple but effective way to achieve a primitive form of automatic memory management. Hence, the call stack remains the backbone of most programming language runtimes to the present day. However, the appealing simplicity of the call stack model comes at the price of strictly enforced limitations: since every function return pops the stack, it is difficult to return stack-allocated data from a callee upwards to its caller, especially variable-size data such as closures. This paper proposes a solution by introducing a small tweak to the usual stack semantics. We design a type system that tracks the underlying storage mode of values, and when a function returns a stack-allocated value, we just don’t pop the stack! Instead, the stack frame is de-allocated together with a parent the next time a heap-allocated value or primitive is returned. We identify a range of use cases where this delayed-popping strategy is beneficial, ranging from closures to trait objects to other types of variable-size data. Our evaluation shows that this execution model reduces heap and GC pressure and recovers the spatial locality of programs, improving execution time by between 10% and 25% with respect to standard execution.
- Memory safety is a fundamental correctness property of software. For programs that manipulate linked, heap-allocated data structures, ensuring memory safety requires analyzing their possible shapes. Despite significant advances in shape analysis, existing techniques rely on hand-crafted domains tailored to specific data structures, making them difficult to generalize and extend. This paper presents a novel approach that reduces memory-safety proofs to the verification of heap-less imperative programs, enabling the use of off-the-shelf software verification tools. We achieve this reduction through two complementary program instrumentation techniques: space invariants, which enable symbolic reasoning about unbounded heaps, and flow abstraction, which encodes global heap properties as local flow equations. The approach effectively verifies memory safety across a broad range of programs, including concurrent lists and trees that lie beyond the reach of existing shape analysis tools.
- This paper presents a new privacy negotiation mechanism for an IoT environment that is both efficient and practical enough to meet the IoT-specific need for seamlessness. This mechanism allows IoT users to express and enforce their personal privacy preferences in a seamless manner while interacting with IoT deployments. In addition, the proposed mechanism satisfies the privacy requirements of the IoT deployment owner. Finally, the proposed privacy mechanism is agnostic to the actual IoT architecture and can be used over a user-managed, edge-managed, or cloud-managed IoT architecture. Prototypes of the proposed mechanism have been implemented for each of these three architectures, and the results show that the protocol can negotiate privacy while adding insignificant time overhead.

