Title: Threaded Dynamic Memory Management in Many-Core Processors
Current trends in desktop processor design have been toward many-core solutions with increased parallelism. As the number of supported threads grows in these processors, it may prove difficult to exploit them on the commodity desktop. This paper presents a study that explores the spawning of the dynamic memory management activities into a separately executing thread that runs concurrently with the main program thread. Our approach works without requiring modifications to the original source program by redefining the dynamic link path to capture malloc and free calls in a threading dynamic memory management library. The routines of this library are setup so that the initial call to malloc triggers the creation of a thread for dynamic memory management; successive calls to malloc and free will trigger coordination with this thread for dynamic memory management activities. Our preliminary studies show that we can transparently redefine the dynamic memory management activities and we have successfully done so for numerous test programs including most of the SPEC CPU2006 benchmarks, Firefox, and other unix utilities. The results of our experiments show that it is possible to achieve 2-3% performance gains in the three most memory-intensive SPEC CPU2006 benchmarks without requiring recompilation of the benchmark source code. We were also able to achieve a 3-4% speedup when using our library with the gcc and llvm compilers. more »« less
Appel, Andrew W.; Naumann, David A.
(, International Symposium on Memory Management)
null
(Ed.)
We verify the functional correctness of an array-of-bins (segregated free-lists) single-thread malloc/free system with respect to a correctness specification written in separation logic. The memory allocator is written in standard C code compatible with the standard API; the specification is in the Verifiable C program logic, and the proof is done in the Verified Software Toolchain within the Coq proof assistant. Our "resource-aware" specification can guarantee when malloc will successfully return a block, unlike the standard Posix specification that allows malloc to return NULL whenever it wants to. We also prove subsumption (refinement): the resource-aware specification implies a resource-oblivious spec.
Arora, Jatin; Westrick, Sam; Acar, Umut A.
(, Proceedings of the ACM on Programming Languages)
null
(Ed.)
Because of its many desirable properties, such as its ability to control effects and thus potentially disastrous race conditions, functional programming offers a viable approach to programming modern multicore computers. Over the past decade several parallel functional languages, typically based on dialects of ML and Haskell, have been developed. These languages, however, have traditionally underperformed procedural languages (such as C and Java). The primary reason for this is their hunger for memory, which only grows with parallelism, causing traditional memory management techniques to buckle under increased demand for memory. Recent work opened a new angle of attack on this problem by identifying a memory property of determinacy-race-free parallel programs, called disentanglement, which limits the knowledge of concurrent computations about each other’s memory allocations. The work has showed some promise in delivering good time scalability. In this paper, we present provably space-efficient automatic memory management techniques for determinacy-race-free functional parallel programs, allowing both pure and imperative programs where memory may be destructively updated. We prove that for a program with sequential live memory of R * , any P -processor garbage-collected parallel run requires at most O ( R * · P ) memory. We also prove a work bound of O ( W + R * P ) for P -processor executions, accounting also for the cost of garbage collection. To achieve these results, we integrate thread scheduling with memory management. The idea is to coordinate memory allocation and garbage collection with thread scheduling decisions so that each processor can allocate memory without synchronization and independently collect a portion of memory by consulting a collection policy, which we formulate. The collection policy is fully distributed and does not require communicating with other processors. We show that the approach is practical by implementing it as an extension to the MPL compiler for Parallel ML. Our experimental results confirm our theoretical bounds and show that the techniques perform and scale well.
Herrmann, Edward C.; Janga, Prudhvi; Wilsey, Philip A.
(, International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2011))
In recent years, the number of hardware supported threads in desktop processors has increased dramatically. All but the very lowest cost netbooks and embedded processors now have at least dual cores and soon systems supporting upwards of 8 to 16 hardware threads are likely to be commonplace. Unfortunately, it will be difficult to take full advantage of the parallelism emerging processors will be able to provide. To help address this issue, we are investigating mechanisms to pre-compute function results in separate threads running concurrently with the main program thread. The concurrent threads are forked automatically and without program modification. A critical component for the success of this idea is an ability to build a background thread that can pre-compute usable results in some effective manner. For some support functions (dynamic memory) exact arguments predictions for the function pre-computation are not necessary, for others (trigonometric functions) they are. In work with dynamic memory, we are able to pre-compute memory blocks and show modest speedup: saving approximately 25% of the dynamic memory costs. In studies with predicting argument values to trigonometric functions, we show that learning algorithms are able to successfully predict the next argument values approximately 44% of the time.
C program components verified for functional correctness in Coq using VST (Verified Software Toolchain) can now rely on a set of standard library components (math functions, malloc/free, atomic load/store, locks, threads) that have formal specifications.
C program components verified for functional correctness in Coq using VST (Verified Software Toolchain) can now rely on a set of standard library components (math functions, malloc/free, atomic load/store, locks, threads) that have formal specifications.
Herrmann, Edward C, and Wilsey, Philip A. Threaded Dynamic Memory Management in Many-Core Processors. Retrieved from https://par.nsf.gov/biblio/10351016. International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010) . Web. doi:10.1109/CISIS.2010.34.
Herrmann, Edward C, & Wilsey, Philip A. Threaded Dynamic Memory Management in Many-Core Processors. International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010), (). Retrieved from https://par.nsf.gov/biblio/10351016. https://doi.org/10.1109/CISIS.2010.34
Herrmann, Edward C, and Wilsey, Philip A.
"Threaded Dynamic Memory Management in Many-Core Processors". International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010) (). Country unknown/Code not available. https://doi.org/10.1109/CISIS.2010.34.https://par.nsf.gov/biblio/10351016.
@article{osti_10351016,
place = {Country unknown/Code not available},
title = {Threaded Dynamic Memory Management in Many-Core Processors},
url = {https://par.nsf.gov/biblio/10351016},
DOI = {10.1109/CISIS.2010.34},
abstractNote = {Current trends in desktop processor design have been toward many-core solutions with increased parallelism. As the number of supported threads grows in these processors, it may prove difficult to exploit them on the commodity desktop. This paper presents a study that explores the spawning of the dynamic memory management activities into a separately executing thread that runs concurrently with the main program thread. Our approach works without requiring modifications to the original source program by redefining the dynamic link path to capture malloc and free calls in a threading dynamic memory management library. The routines of this library are setup so that the initial call to malloc triggers the creation of a thread for dynamic memory management; successive calls to malloc and free will trigger coordination with this thread for dynamic memory management activities. Our preliminary studies show that we can transparently redefine the dynamic memory management activities and we have successfully done so for numerous test programs including most of the SPEC CPU2006 benchmarks, Firefox, and other unix utilities. The results of our experiments show that it is possible to achieve 2-3% performance gains in the three most memory-intensive SPEC CPU2006 benchmarks without requiring recompilation of the benchmark source code. We were also able to achieve a 3-4% speedup when using our library with the gcc and llvm compilers.},
journal = {International Conference on Complex, Intelligent and Software Intensive Systems (CISIS 2010)},
author = {Herrmann, Edward C and Wilsey, Philip A},
}
Warning: Leaving National Science Foundation Website
You are now leaving the National Science Foundation website to go to a non-government website.
Website:
NSF takes no responsibility for and exercises no control over the views expressed or the accuracy of
the information contained on this site. Also be aware that NSF's privacy policy does not apply to this site.