Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern datacenters, supporting applications’ large memory footprints and improving machines’ resource utilization. Unfortunately, most far-memory techniques focus on OS-level optimizations and are agnostic to the managed runtimes and garbage collection (GC) underneath applications written in high-level languages. Because GC’s object-access patterns differ from the application’s, GC can severely interfere with existing far-memory techniques, breaking prefetching algorithms and causing severe local-memory misses. We developed MemLiner, a runtime technique that improves the performance of far-memory systems by “lining up” memory accesses from the application and the GC so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. We implemented MemLiner in two widely used GCs in OpenJDK: G1 and Shenandoah. Our evaluation with a range of widely deployed cloud systems shows that MemLiner improves applications’ end-to-end performance by up to 2.5×.
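MemLiner’s actual changes live inside OpenJDK’s G1 and Shenandoah collectors, but the “lining up” intuition can be sketched at the Java level. The sketch below is only an illustration under assumed names (`LinedUpTracer`, `ToyObject`, and a hypothetical `recentlyTouched` oracle); it is not the paper’s implementation. It traces references the application touched recently before everything else, so the collector’s accesses roughly follow the application’s own access path and the swap prefetcher sees one coherent pattern.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Toy illustration of the "lining up" idea (not MemLiner's OpenJDK changes):
// visit references the application touched recently before everything else,
// so the collector's page accesses roughly follow the application's.
final class LinedUpTracer {
    // Hypothetical oracle: IDs of objects the application accessed in the current epoch.
    private final Set<Long> recentlyTouched;

    LinedUpTracer(Set<Long> recentlyTouched) {
        this.recentlyTouched = recentlyTouched;
    }

    /** Marks every object reachable from the roots, visiting "hot" references first. */
    void trace(Iterable<ToyObject> roots, Set<Long> marked) {
        Deque<ToyObject> hot = new ArrayDeque<>();   // scan now: the app is using these pages
        Deque<ToyObject> cold = new ArrayDeque<>();  // defer: likely already evicted to far memory
        roots.forEach(r -> (recentlyTouched.contains(r.id()) ? hot : cold).push(r));

        while (!hot.isEmpty() || !cold.isEmpty()) {
            ToyObject obj = !hot.isEmpty() ? hot.pop() : cold.pop();
            if (!marked.add(obj.id())) continue;      // already visited
            for (ToyObject child : obj.references()) {
                (recentlyTouched.contains(child.id()) ? hot : cold).push(child);
            }
        }
    }

    /** Minimal object-graph node used only for this sketch. */
    record ToyObject(long id, List<ToyObject> references) {}
}
```

Deferring the cold queue is the point of the illustration: a collector that eagerly chases references the application is not using drags evicted pages back into local memory and pollutes the prefetcher’s view of the access stream.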
Can far memory improve job throughput?
As memory requirements grow and advances in memory technology slow, the availability of sufficient main memory is increasingly the bottleneck in large compute clusters. One solution is memory disaggregation, where jobs can remotely access memory on other servers, known as far memory. This paper first presents faster swapping mechanisms and a far-memory-aware cluster scheduler that make it possible to support far memory at rack scale. It then examines the conditions under which this use of far memory can increase job throughput. We find that while far memory is not a panacea, for memory-intensive workloads it can provide performance improvements on the order of 10% or more, even without changing the total amount of memory available.
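To make the scheduling side of this concrete, here is a minimal sketch of a far-memory-aware admission decision: place as much of a job as possible in local memory and push the spillover to the rack’s far-memory pool, bounded by a per-job cap. The class name, the 25% cap, and the placement policy are assumptions for illustration, not the paper’s actual scheduler.

```java
// Illustrative sketch (not the paper's scheduler): admit a job onto a server
// if the portion that spills out of local memory fits within both the job's
// allowed far-memory share and the rack's remaining far-memory pool.
final class FarMemoryAwareScheduler {
    private long freeLocalBytes;          // unused local DRAM on this server
    private long freeFarBytes;            // unused capacity in the rack's far-memory pool
    private final double maxFarFraction;  // assumed cap on how much of a job may live in far memory

    FarMemoryAwareScheduler(long freeLocalBytes, long freeFarBytes, double maxFarFraction) {
        this.freeLocalBytes = freeLocalBytes;
        this.freeFarBytes = freeFarBytes;
        this.maxFarFraction = maxFarFraction;
    }

    /** Tries to place a job; returns the bytes pushed to far memory, or -1 if the job is rejected. */
    long tryAdmit(long jobDemandBytes) {
        long local = Math.min(jobDemandBytes, freeLocalBytes);  // fill local memory first
        long far = jobDemandBytes - local;                      // remainder would be remote
        long farCap = (long) (jobDemandBytes * maxFarFraction);
        if (far > farCap || far > freeFarBytes) return -1;      // too much of the job would be remote
        freeLocalBytes -= local;
        freeFarBytes -= far;
        return far;
    }

    public static void main(String[] args) {
        // Hypothetical server: 8 GiB local free, 64 GiB far-memory pool, at most 25% of a job remote.
        FarMemoryAwareScheduler s = new FarMemoryAwareScheduler(8L << 30, 64L << 30, 0.25);
        System.out.println(s.tryAdmit(10L << 30)); // prints 2147483648: 2 GiB (within the 2.5 GiB cap) goes remote
    }
}
```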
- Award ID(s):
- 1704941
- PAR ID:
- 10188384
- Date Published:
- Journal Name:
- EuroSys ’20
- Page Range / eLocation ID:
- 1 to 16
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Thanks to recent advances in high-bandwidth, low-latency interconnects, running a data-intensive application with a remote memory pool is now feasible. When developing a data-intensive application, a managed language such as Java is often the developer’s choice because of runtime conveniences such as automatic memory management. However, the memory management cost increases significantly in far memory due to remote memory accesses. Our insight is that data hotness (i.e., the access frequency of objects) is the key to reducing the memory management cost and improving efficiency in far memory. In this paper, we present ongoing work on Polar, an enhanced runtime system that is hotness-aware and optimized for far memory. In Polar, the garbage collector is augmented to identify cold (infrequently accessed) objects and relocate them to remote memory pools. By placing objects at memory locations based on their access frequency, Polar minimizes the number of remote accesses, ensures low access latency for the application, and thus improves overall performance. (A minimal illustrative sketch of this idea appears after this list.)
- Far-memory techniques that enable applications to use remote memory are increasingly appealing in modern data centers, supporting applications’ large memory footprints and improving machines’ resource utilization. Unfortunately, most far-memory techniques focus on OS-level optimizations and are agnostic to the managed runtimes and garbage collection (GC) underneath applications written in high-level languages. Because GC’s object-access patterns differ from the application’s, GC can severely interfere with existing far-memory techniques, breaking remote-memory prefetching algorithms and causing severe local-memory misses. We developed MemLiner, a runtime technique that improves the performance of far-memory systems by aligning memory accesses from application and GC threads so that they follow similar memory access paths, thereby (1) reducing the local-memory working set and (2) improving remote-memory prefetching through simplified memory access patterns. We implemented MemLiner in two widely used GCs in OpenJDK: G1 and Shenandoah. Our evaluation with a range of widely deployed cloud systems shows that MemLiner improves applications’ end-to-end performance by up to 3.3× and reduces applications’ tail latency by up to 220.0×.
- Balzarotti, Davide; Xu, Wenyuan (Eds.) Kernel privilege-escalation exploits typically leverage memory-corruption vulnerabilities to overwrite particular target locations. These memory corruption targets play a critical role in the exploits, as they determine which privileged resources (e.g., files, memory, and operations) the adversary may access and what privileges (e.g., read, write, and unrestricted) they may gain. While prior research has made important advances in discovering vulnerabilities and achieving privilege escalation, in practice exploits rely on the few memory corruption targets that have been discovered manually so far. We propose SCAVY, a framework that automatically discovers memory corruption targets for privilege escalation in the Linux kernel. SCAVY’s key insight lies in broadening the search scope beyond the kernel data structures explored in prior work, which focused on function pointers or pointers to structures that include them, to encompass the remaining 90% of Linux kernel structures. Additionally, the search is bug-type agnostic, as it considers any memory corruption capability. To this end, we develop novel and scalable techniques that combine fuzzing and differential analysis to automatically explore and detect privilege escalation by comparing the accessibility of resources between executions with and without corruption. This allows SCAVY to determine that corrupting a certain field puts the system in an exploitable state, independently of the vulnerability exploited. SCAVY found 955 PoCs, from which we identify 17 new fields in 12 structures that can enable privilege escalation. We use these targets to develop 6 exploits for 5 CVE vulnerabilities. Our findings show that new memory corruption targets can change the security implications of vulnerabilities, urging researchers to proactively discover memory corruption targets. (A simplified sketch of the differential comparison appears after this list.)
- Alistarh, Dan (Ed.) Broadcast is a ubiquitous distributed computing problem that underpins many other system tasks. In static, connected networks, it was recently shown that broadcast is solvable without any node memory and only constant-size messages in worst-case asymptotically optimal time (Hussak and Trehan, PODC'19/STACS'20/DC'23). In the dynamic setting of adversarial topology changes, however, existing algorithms rely on identifiers, port labels, or polynomial memory to solve broadcast and compute functions over node inputs. We investigate space-efficient, terminating broadcast algorithms for anonymous, synchronous, 1-interval connected dynamic networks and introduce the first memory lower bounds in this setting. Specifically, we prove that broadcast with termination detection is impossible for idle-start algorithms (where only the broadcaster can initially send messages) and otherwise requires Ω(log n) memory per node, where n is the number of nodes in the network. Even if the termination condition is relaxed to stabilizing termination (eventually no additional messages are sent), we show that any idle-start algorithm must use ω(1) memory per node, separating the static and dynamic settings for anonymous broadcast. This lower bound is not far from optimal, as we present an algorithm that solves broadcast with stabilizing termination using O(log n) memory per node in worst-case asymptotically optimal time. In sum, these results reveal the necessity of non-constant memory for nontrivial terminating computation in anonymous dynamic networks.
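Returning to the Polar entry above, its hotness-aware placement can be illustrated with a small sketch. Everything here (the `HotnessAwarePlacer` name, the access-count threshold, the `MemoryPool` interface) is hypothetical; the actual system performs this work inside the JVM’s garbage collector rather than in application code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch of hotness-aware placement (not Polar's JVM-internal
// implementation): count accesses per object during an epoch and, at GC time,
// relocate objects whose count falls below a threshold to the remote pool.
final class HotnessAwarePlacer {
    private final Map<Long, Integer> accessCounts = new HashMap<>(); // objectId -> accesses this epoch
    private final int coldThreshold; // assumed cutoff separating hot objects from cold ones

    HotnessAwarePlacer(int coldThreshold) {
        this.coldThreshold = coldThreshold;
    }

    /** Called on each object access by the (hypothetical) instrumented runtime. */
    void recordAccess(long objectId) {
        accessCounts.merge(objectId, 1, Integer::sum);
    }

    /** Invoked at collection time: keep hot objects local, push cold ones to the far-memory pool. */
    void rebalance(MemoryPool local, MemoryPool far) {
        List<Long> resident = new ArrayList<>();
        local.objectIds().forEach(resident::add);            // snapshot so eviction doesn't disturb iteration
        for (long objectId : resident) {
            if (accessCounts.getOrDefault(objectId, 0) < coldThreshold) {
                far.accept(objectId, local.evict(objectId)); // infrequently accessed: move to remote memory
            }
        }
        accessCounts.clear();                                 // start a fresh hotness epoch
    }

    /** Minimal pool abstraction used only for this sketch. */
    interface MemoryPool {
        Iterable<Long> objectIds();
        byte[] evict(long objectId);
        void accept(long objectId, byte[] payload);
    }
}
```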
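The differential-analysis step described in the SCAVY entry can likewise be shown in miniature: compare which operations on which resources succeed in a baseline run versus a run with one field corrupted, and flag anything newly permitted. The class name and the file-based example below are assumptions for illustration; SCAVY itself performs this comparison against Linux kernel structures, not in Java.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified illustration of the differential-analysis idea (not SCAVY's
// kernel-level implementation): record which operations an unprivileged
// workload can perform on each resource before and after a corruption,
// then report resources whose accessibility increased.
final class AccessibilityDiff {
    /** resource name -> operations the unprivileged workload succeeded at. */
    static Map<String, Set<String>> diffEscalations(Map<String, Set<String>> baseline,
                                                    Map<String, Set<String>> corrupted) {
        Map<String, Set<String>> escalations = new HashMap<>();
        corrupted.forEach((resource, ops) -> {
            Set<String> gained = new HashSet<>(ops);
            gained.removeAll(baseline.getOrDefault(resource, Set.of())); // newly possible after corruption
            if (!gained.isEmpty()) escalations.put(resource, gained);
        });
        return escalations;
    }

    public static void main(String[] args) {
        // Hypothetical observation: corrupting some field made /etc/shadow readable and writable.
        Map<String, Set<String>> baseline = Map.of("/etc/shadow", Set.of());
        Map<String, Set<String>> corrupted = Map.of("/etc/shadow", Set.of("read", "write"));
        System.out.println(diffEscalations(baseline, corrupted)); // e.g. {/etc/shadow=[read, write]}
    }
}
```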