Title: Mesh: compacting memory management for C/C++ applications
Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual memory operations to provably reduce fragmentation, breaking the classical Robson bounds with high probability. Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory consumption of Firefox by 16% and Redis by 39%.
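The core idea behind combining randomized algorithms with virtual memory operations can be sketched as follows. Assume each span (a page of same-sized objects) is summarized by an occupancy bitmap. In the Mesh approach, two spans whose bitmaps do not overlap can be merged: live objects are copied into one physical page at their original offsets, and the other virtual page is remapped onto it, so no pointer has to change. The helper names `meshable`, `mesh`, and `random_pairing` are hypothetical. The pairing loop is a simplified random-probing stand-in, not the paper's exact algorithm.

```python
import random


def meshable(a: int, b: int) -> bool:
    """Two spans can share one physical page if no object slot is live in both."""
    return a & b == 0


def mesh(a: int, b: int) -> int:
    """Occupancy of the merged span: every live object keeps its offset."""
    if not meshable(a, b):
        raise ValueError("spans overlap and cannot be meshed")
    return a | b


def random_pairing(bitmaps, probes=64, seed=0):
    """Pair spans by random probing and return a list of index pairs.

    Hypothetical simplification: each probe picks two unpaired spans at random
    and pairs them if they are meshable. Each pair frees one physical page.
    """
    rng = random.Random(seed)
    unpaired = list(range(len(bitmaps)))
    pairs = []
    for _ in range(probes):
        if len(unpaired) < 2:
            break
        i, j = rng.sample(unpaired, 2)
        if meshable(bitmaps[i], bitmaps[j]):
            pairs.append((i, j))
            unpaired.remove(i)
            unpaired.remove(j)
    return pairs
```

For example, spans with occupancy `0b00010010` and `0b01000100` have no slot in common, so they can be meshed into a single page with occupancy `0b01010110`. Randomized object placement matters here because it makes such disjoint pairs likely even when many spans are partly full.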
Luo, Weiyu; Demsky, Brian
(Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)
Writing correct concurrent code that uses atomics under the C/C++ memory model is extremely difficult. We present C11Tester, a race detector for the C/C++ memory model that can explore executions in a larger fragment of the C/C++ memory model than previous race detector tools. Relative to previous work, C11Tester's larger fragment includes behaviors that are exhibited by ARM processors. C11Tester uses a new constraint-based algorithm to implement modification order that is optimized to allow C11Tester to make decisions in terms of application-visible behaviors. We evaluate C11Tester on several benchmark applications and compare its performance to both tsan11rec, the state-of-the-art tool that controls scheduling for C/C++, and tsan11, the state-of-the-art tool that does not control scheduling.
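The abstract does not spell out C11Tester's constraint-based algorithm. The following sketch illustrates only the underlying idea. Modification order for a single atomic location can be kept as a graph of "must precede" constraints between stores. A new constraint is admissible only if it does not close a cycle, and the tool can then commit to whichever choices remain consistent. `ModOrderGraph` and its methods are hypothetical names, and the graph search is a plain depth-first search rather than anything optimized.

```python
from collections import defaultdict


class ModOrderGraph:
    """Constraint graph over the stores to one atomic location (hypothetical sketch).

    An edge a -> b records the constraint "a precedes b in modification order".
    The constraints are consistent only while the graph stays acyclic.
    """

    def __init__(self):
        self.succ = defaultdict(set)

    def reaches(self, src, dst):
        """Return True if there is a path src ->* dst (iterative DFS)."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.succ[node])
        return False

    def add_constraint(self, a, b):
        """Record a -> b unless that would create a cycle; report success."""
        if a == b or self.reaches(b, a):
            return False
        self.succ[a].add(b)
        return True
```

For instance, after recording `w1 -> w2` and `w2 -> w3`, the constraint `w3 -> w1` is rejected because it would contradict the order already implied.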
Choudhury, Dwaipayan; Rajam, Aravind Sukumaran; Kalyanaraman, Ananth; Pande, Partha Pratim
(ACM Journal on Emerging Technologies in Computing Systems)
Recent advances in GPU-based manycore accelerators provide the opportunity to efficiently process large-scale graphs on chip. However, real-world graphs have a diverse range of topology and connectivity patterns (e.g., degree distributions) that make the design of input-agnostic hardware architectures a challenge. Network-on-Chip (NoC)-based architectures provide a way to overcome this challenge, as the architectural topology can be used to approximately model the expected traffic patterns that emerge from graph application workloads. In this paper, we first study the mix of long- and short-range traffic patterns generated on-chip by graph workloads, and subsequently use the findings to adapt the design of an optimal NoC-based architecture. In particular, by leveraging emerging three-dimensional (3D) integration technology, we propose the design of a small-world NoC (SWNoC)-enabled manycore GPU architecture, where the placement of the links connecting the streaming multiprocessors (SMs) and the memory controllers (MCs) follows a power-law distribution. The proposed 3D manycore GPU architecture outperforms its traditional planar (2D) counterparts in both performance and energy consumption. Moreover, by adopting a joint performance-thermal optimization strategy, we address the thermal concerns in a 3D design without noticeably compromising the achievable performance. The 3D integration technology is also leveraged to incorporate Near Data Processing (NDP) to complement the performance benefits introduced by the SWNoC architecture. As graph applications are inherently memory intensive, off-chip data movement gives rise to latency and energy overheads in the presence of external DRAM. In conventional GPU architectures, as the main memory layer is not integrated with the logic, off-chip data movement negatively impacts overall performance and energy consumption.
We demonstrate that NDP significantly reduces the overheads associated with such frequent and irregular memory accesses in graph-based applications. The proposed SWNoC-enabled NDP framework, which integrates 3D memory (such as Micron's Hybrid Memory Cube, HMC) with a massive number of GPU cores, achieves a 29.5% performance improvement and 30.03% lower energy consumption on average compared to a conventional planar mesh-based design with external DRAM.
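The phrase "placement of the links follows a power-law distribution" can be made concrete with the following sketch. It samples links between tiles on a 2D grid, choosing each pair with probability proportional to distance raised to the power -alpha. Most links therefore come out short, with a few long-range shortcuts, which is the small-world property. The grid model, Manhattan distance, alpha = 1.8, the link budget, and the function names are all assumptions for illustration. The paper's placement also distinguishes SMs from MCs, works in 3D, and is tuned jointly for performance and temperature, none of which this models.

```python
import random


def manhattan(a, b):
    """Hop distance between two (x, y) tiles on a planar grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def small_world_links(width, height, num_links, alpha=1.8, seed=0):
    """Sample num_links distinct undirected links on a width x height tile grid.

    Each candidate pair is weighted by distance ** -alpha (a power law), so
    short links dominate while occasional long-range links remain possible.
    alpha and the grid model are hypothetical parameters.
    """
    nodes = [(x, y) for x in range(width) for y in range(height)]
    pairs = [(a, b) for i, a in enumerate(nodes) for b in nodes[i + 1:]]
    if num_links > len(pairs):
        raise ValueError("more links requested than distinct tile pairs")
    # Distinct tiles are at distance >= 1, so every weight is positive and finite.
    weights = [manhattan(a, b) ** -alpha for a, b in pairs]
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < num_links:
        chosen.add(rng.choices(pairs, weights=weights, k=1)[0])
    return chosen
```

Raising alpha concentrates links among neighboring tiles (approaching a mesh), while lowering it spreads them out. A design-space search would tune this trade-off against the traffic mix measured from graph workloads.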
In this paper, we generalize Thurston's original idea for the so-called Mather-Thurston theorem for foliated bundles to prove new variants of this theorem for PL homeomorphisms and contactomorphisms. These versions answer questions posed by Gelfand-Fuks ([GF73, Section 5]) and Greenberg ([Gre92]) on PL foliations and by Rybicki ([Ryb10, Section 11]) on contactomorphisms. The interesting point about Thurston's original technique, compared to the better-known Segal-McDuff proof of the Mather-Thurston theorem, is that it gives a compactly supported c-principle theorem without knowing the relevant local statement on open balls. In the appendix, we show that Thurston's fragmentation implies the non-abelian Poincaré duality theorem and its generalization using blob complexes ([MW12, Theorem 7.3.1]). To the memory of John Mather.
Wu, Chenyuan; Amiri, Mohammad Javad; Asch, Jared; Nagda, Heena; Zhang, Qizhen; Loo, Boon Thau
(Proceedings of the VLDB Endowment)
While permissioned blockchains enable a family of data center applications, existing systems suffer from imbalanced loads across compute and memory, exacerbating the underutilization of cloud resources. This paper presents FlexChain, a novel permissioned blockchain system that addresses this challenge by physically disaggregating CPUs, DRAM, and storage devices to process different blockchain workloads efficiently. Disaggregation allows blockchain service providers to upgrade and expand hardware resources independently to support a wide range of smart contracts with diverse CPU and memory demands. Moreover, it ensures efficient resource utilization and hence prevents resource fragmentation in a data center. We have explored the design of execute-order-validate (XOV) blockchain systems in a disaggregated fashion and developed a tiered key-value store that can elastically scale its memory and storage. Our design significantly speeds up the execution stage. We have also leveraged several techniques to parallelize the validation stage in FlexChain to further improve overall blockchain performance. Our evaluation results show that FlexChain can provide independent compute and memory scalability while incurring at most 12.8% disaggregation overhead. FlexChain achieves nearly the same throughput as state-of-the-art distributed approaches with significantly lower memory and CPU consumption for compute-intensive and memory-intensive workloads, respectively.
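In an execute-order-validate (XOV) pipeline, transactions are first executed speculatively against the current state, then ordered into blocks, and finally validated before commit. The sketch below covers only the validation stage. It assumes Hyperledger Fabric-style version checks on each transaction's read set, and the abstract does not say that FlexChain uses exactly these rules. Validation here runs sequentially, so FlexChain's parallel validation and its tiered key-value store are not modeled. `Tx` and `Ledger` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Tx:
    """A transaction after the execute stage (hypothetical model)."""
    read_set: dict   # key -> version observed during speculative execution
    write_set: dict  # key -> new value to apply if the transaction is valid


@dataclass
class Ledger:
    values: dict = field(default_factory=dict)
    versions: dict = field(default_factory=dict)  # missing key means version 0

    def validate_and_commit(self, block):
        """Validate ordered transactions in sequence; return a validity flag per tx.

        A transaction is valid only if every key it read still has the version
        it observed, i.e. no earlier transaction in the order overwrote it.
        """
        results = []
        for tx in block:
            ok = all(self.versions.get(k, 0) == v for k, v in tx.read_set.items())
            if ok:
                for k, val in tx.write_set.items():
                    self.values[k] = val
                    self.versions[k] = self.versions.get(k, 0) + 1
            results.append(ok)
        return results
```

For example, if two transactions both read key `a` at version 0 and both write it, only the first one in block order commits. The second is marked invalid because its read became stale.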
Zhou, Jie; Criswell, John; Hicks, Michael
(Proceedings of the ACM on Programming Languages)
Temporal memory safety bugs, especially use-after-free and double free bugs, pose a major security threat to C programs. Real-world exploits utilizing these bugs enable attackers to read and write arbitrary memory locations, causing disastrous violations of confidentiality, integrity, and availability. Many previous solutions retrofit temporal memory safety to C, but they all incur high performance overhead, miss certain types of temporal memory safety bugs, or both. In this paper, we propose a temporal memory safety solution that is both efficient and comprehensive. Specifically, we extend Checked C, a spatially-safe extension to C, with temporally-safe pointers. These are implemented by combining two techniques: fat pointers and dynamic key-lock checks. We show that the fat-pointer solution significantly improves running time and memory overhead compared to the disjoint-metadata approach that provides the same level of protection. With empirical program data and hands-on experience porting real-world applications, we also show that our solution is practical in terms of backward compatibility, one of the major complaints about fat pointers.
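Combining fat pointers with dynamic key-lock checks can be illustrated with a toy model of the usual key-lock design. Each allocation gets a lock slot holding a fresh key, and a fat pointer carries the object address, that key, and the lock's location. Every dereference compares the two, and `free` resets the lock so that every outstanding copy of the pointer fails the check. This is a generic Python sketch, not Checked C's implementation or syntax. `Heap`, `FatPtr`, and `TemporalSafetyError` are hypothetical names, and lock slots are indexed by address only for simplicity.

```python
from dataclasses import dataclass
from itertools import count


class TemporalSafetyError(Exception):
    """Raised when a fat pointer's key no longer matches its lock."""


@dataclass(frozen=True)
class FatPtr:
    addr: int  # where the object lives
    key: int   # key captured at allocation time
    lock: int  # location of the lock slot guarding the allocation


class Heap:
    """Toy heap with key-lock temporal checks (hypothetical model)."""

    def __init__(self):
        self.memory = {}
        self.locks = {}
        self._keys = count(1)   # fresh, never-reused keys; 0 marks "freed"
        self._addrs = count(0)

    def malloc(self, value):
        addr, key = next(self._addrs), next(self._keys)
        self.memory[addr] = value
        self.locks[addr] = key  # one lock slot per allocation, indexed by addr here
        return FatPtr(addr, key, addr)

    def check(self, p):
        if self.locks.get(p.lock) != p.key:
            raise TemporalSafetyError(f"stale pointer to address {p.addr}")

    def deref(self, p):
        self.check(p)  # catches use-after-free
        return self.memory[p.addr]

    def free(self, p):
        self.check(p)           # also catches double free
        self.locks[p.lock] = 0  # invalidates every copy of p at once
        del self.memory[p.addr]
```

Because the key lives inside the pointer, copies of a pointer need no separate bookkeeping. Freeing through one alias is enough for all others to be caught on their next use.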
@article{osti_10112132,
place = {Country unknown/Code not available},
title = {Mesh: compacting memory management for C/C++ applications},
url = {https://par.nsf.gov/biblio/10112132},
DOI = {10.1145/3314221.3314582},
abstractNote = {Programs written in C/C++ can suffer from serious memory fragmentation, leading to low utilization of memory, degraded performance, and application failure due to memory exhaustion. This paper introduces Mesh, a plug-in replacement for malloc that, for the first time, eliminates fragmentation in unmodified C/C++ applications. Mesh combines novel randomized algorithms with widely-supported virtual memory operations to provably reduce fragmentation, breaking the classical Robson bounds with high probability. Mesh generally matches the runtime performance of state-of-the-art memory allocators while reducing memory consumption; in particular, it reduces the memory consumption of Firefox by 16% and Redis by 39%.},
journal = {PLDI 2019},
author = {Powers, Bobby and Tench, David and Berger, Emery D. and McGregor, Andrew},
}