NSF PAR Search | NSF Public Access Repository


Beating OPT with Statistical Clairvoyance and Variable Size Caching

https://doi.org/10.1145/3297858.3304067

Li, Pengcheng; Pronovost, Colin; Wilson, William; Tait, Benjamin; Zhou, Jie; Ding, Chen; Criswell, John (January 2019, Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2019)

Caching techniques are widely used in today’s computing infrastructure from virtual memory management to server cache and memory cache. This paper builds on two observations. First, the space utilization in cache can be improved by varying the cache size based on dynamic application demand. Second, it is easier to predict application behavior statistically than precisely. This paper presents a new variable-size cache that uses statistical knowledge of program behavior to maximize the cache performance. We measure performance using data access traces from real-world workloads, including Memcached traces from Facebook and storage traces from Microsoft Research. In an offline setting, the new cache is demonstrated to outperform even OPT, the optimal fixed-size cache which makes use of precise knowledge of program behavior.
Full Text Available
Timescale functions for parallel memory allocation

https://doi.org/10.1145/3315573.3329987

Li, Pengcheng; Luo, Hao; Ding, Chen (January 2019, Proceedings of the 2019 ACM SIGPLAN International Symposium on Memory Management, ISMM 2019)

Memory allocation is increasingly important to parallel performance, yet it is challenging because a program has data of many sizes, and the demand differs from thread to thread. Modern allocators use highly tuned heuristics but do not provide uniformly good performance when the level of concurrency increases from a few threads to hundreds of threads. This paper presents a new timescale theory to model the memory demand in real time. Using the new theory, an allocator can adjust its synchronization frequency using a single parameter called allocations per fetch (apf). The paper presents the timescale theory, the design and implementation of APF tuning in an existing allocator, and evaluation of the effect on program speed and memory efficiency. APF tuning improves the throughput of MongoDB by 55%, reduces the tail latency of a Web server by over 60%, and increases the speed of a selection of synthetic benchmarks by up to 24× while using the same amount of memory.
Full Text Available
Codestitcher: inter-procedural basic block layout optimization

https://doi.org/10.1145/3302516.3307358

Lavaee, Rahman; Criswell, John; Ding, Chen (January 2019, Proceedings of the 28th International Conference on Compiler Construction, CC 2019)

Modern software executes a large amount of code. Previous techniques of code layout optimization were developed one or two decades ago and have become inadequate to cope with the scale and complexity of new types of applications such as compilers, browsers, interpreters, language VMs and shared libraries. This paper presents Codestitcher, an inter-procedural basic block code layout optimizer which reorders basic blocks in an executable to benefit from better cache and TLB performance. Codestitcher provides a hierarchical framework which can be used to improve locality in various layers of the memory hierarchy. Our evaluation shows that Codestitcher improves the performance of the original program by 3% to 25% (on average, by 10%) on 5 widely used applications with large code sizes: MySQL, Clang, Firefox, Apache, and Python. It gives an additional improvement of 4% over LLVM’s PGO and 3% over PGO combined with the best function reordering technique.
Full Text Available
Fast Miss Ratio Curve Modeling for Storage Cache

https://doi.org/10.1145/3185751

Hu, Xiameng; Wang, Xiaolin; Zhou, Lan; Luo, Yingwei; Wang, Zhenlin; Ding, Chen; Ye, Chencheng (May 2018, ACM Transactions on Storage)

Full Text Available
Prediction and bounds on shared cache demand from memory access interleaving

https://doi.org/10.1145/3210563.3210565

Brock, Jacob; Ding, Chen; Lavaee, Rahman; Liu, Fangzhou; Yuan, Liang (January 2018, ISMM 2018: Proceedings of the 2018 ACM SIGPLAN International Symposium on Memory Management)

Full Text Available
Locality analysis through static parallel sampling

https://doi.org/10.1145/3192366.3192402

Chen, Dong; Liu, Fangzhou; Ding, Chen; Pai, Sreepathi (January 2018, PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation)

Full Text Available
PAYJIT: space-optimal JIT compilation and its practical implementation

https://doi.org/10.1145/3178372.3179523

Brock, Jacob; Ding, Chen; Xu, Xiaoran; Zhang, Yan (January 2018, Proceedings of the 27th International Conference on Compiler Construction, CC 2018)

Full Text Available
Cache Exclusivity and Sharing: Theory and Optimization

https://doi.org/10.1145/3134437

Ye, Chencheng; Ding, Chen; Luo, Hao; Brock, Jacob; Chen, Dong; Jin, Hai (December 2017, ACM Transactions on Architecture and Code Optimization)

Full Text Available
Optimizing Locality-Aware Memory Management of Key-Value Caches

https://doi.org/10.1109/TC.2016.2618920

Hu, Xiameng; Wang, Xiaolin; Zhou, Lan; Luo, Yingwei; Ding, Chen; Jiang, Song; Wang, Zhenlin (May 2017, IEEE Transactions on Computers)

Full Text Available
Adaptive Software Caching for Efficient NVRAM Data Persistence

https://doi.org/10.1109/IPDPS.2017.83

Li, Pengcheng; Chakrabarti, Dhruva R.; Ding, Chen; Yuan, Liang (May 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
