NSF PAR Search | NSF Public Access Repository


Automatic HBM Management: Models and Algorithms

https://doi.org/10.1145/3490148.3538570

DeLayo, Daniel; Zhang, Kenny; Agrawal, Kunal; Bender, Michael A.; Berry, Jonathan W.; Das, Rathish; Moseley, Benjamin; Phillips, Cynthia A. (July 2022, Proc. 34th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA))

Full Text Available
Timely Reporting of Heavy Hitters Using External Memory

https://doi.org/10.1145/3472392

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A. (December 2021, ACM Transactions on Database Systems)

Given an input stream S of size N, a ɸ-heavy hitter is an item that occurs at least ɸN times in S. The problem of finding heavy hitters is extensively studied in the database literature. We study a real-time heavy-hitters variant in which an element must be reported shortly after we see its T = ɸN-th occurrence (and hence it becomes a heavy hitter). We call this the Timely Event Detection (TED) Problem. The TED problem models the needs of many real-world monitoring systems, which demand accurate (i.e., no false negatives) and timely reporting of all events from large, high-speed streams with a low reporting threshold (high sensitivity). Like the classic heavy-hitters problem, solving the TED problem without false positives requires large space (Ω(N) words). Thus in-RAM heavy-hitters algorithms typically sacrifice accuracy (i.e., allow false positives), sensitivity, or timeliness (i.e., use multiple passes). We show how to adapt heavy-hitters algorithms to external memory to solve the TED problem on large high-speed streams while guaranteeing accuracy, sensitivity, and timeliness. Our data structures are limited only by I/O bandwidth (not latency) and support a tunable tradeoff between reporting delay and I/O overhead. With a small bounded reporting delay, our algorithms incur only a logarithmic I/O overhead. We implement and validate our data structures empirically using the Firehose streaming benchmark. Multi-threaded versions of our structures can scale to process 11M observations per second before becoming CPU bound. In comparison, a naive adaptation of the standard heavy-hitters algorithm to external memory would be limited by the storage device’s random I/O throughput, i.e., ≈100K observations per second.
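As an illustration of the problem statement only, the following minimal Python sketch reports each item at the stream position of its T-th occurrence, with T taken as the ceiling of ɸN. It is the naive exact in-RAM approach (one counter per distinct item, hence the large space the abstract mentions) and is not the external-memory data structure from the paper. The function name `timely_heavy_hitters` and the sample stream are hypothetical, chosen for this example.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Iterator, Tuple


def timely_heavy_hitters(
    stream: Iterable[Hashable], n: int, phi: float
) -> Iterator[Tuple[int, Hashable]]:
    """Yield (position, item) the moment an item reaches its T-th occurrence.

    Assumption for this sketch: T = ceil(phi * n), and the stream length n
    is known in advance. Every distinct item keeps an exact counter, so
    there are no false positives and no false negatives, at the cost of
    memory proportional to the number of distinct items.
    """
    threshold = max(1, math.ceil(phi * n))
    counts: Counter = Counter()
    for position, item in enumerate(stream, start=1):
        counts[item] += 1
        # Report exactly once, at the occurrence that crosses the threshold.
        if counts[item] == threshold:
            yield position, item


# Hypothetical sample stream: N = 8, phi = 0.25, so T = 2.
stream = ["a", "b", "a", "c", "b", "a", "d", "c"]
reports = list(timely_heavy_hitters(stream, n=len(stream), phi=0.25))
print(reports)  # [(3, 'a'), (5, 'b'), (8, 'c')]
```

Here "a" is reported at position 3 and not again at its third occurrence, while "d" occurs once and is never reported.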
Full Text Available
How to Manage High-Bandwidth Memory Automatically

https://doi.org/10.1145/3350755.3400233

Das, Rathish; Agrawal, Kunal; Bender, Michael A.; Berry, Jonathan; Moseley, Benjamin; Phillips, Cynthia A. (July 2020, Symposium on Parallelism in Algorithms and Architectures)

Full Text Available
Timely Reporting of Heavy Hitters using External Memory

Singh, Shikha; Pandey, Prashant; Bender, Michael A.; Berry, Jonathan W.; Farach-Colton, Martín; Johnson, Rob; Kroeger, Thomas M.; Phillips, Cynthia A. (January 2021, ACM Transactions on Database Systems)
Full Text Available
Optimizing for KNL Usage Modes When Data Doesn’t Fit in MCDRAM

https://doi.org/10.1145/3225058.3225116

Butcher, Neil; Olivier, Stephen L.; Berry, Jonathan; Hammond, Simon D.; Kogge, Peter M. (August 2018, International Conference on Parallel Processing)

Technologies such as Multi-Channel DRAM (MCDRAM) or High Bandwidth Memory (HBM) provide significantly more bandwidth than conventional memory. This trend has raised questions about how applications should manage data transfers between levels. This paper focuses on evaluating different usage modes of the MCDRAM in Intel Knights Landing (KNL) manycore processors. We evaluate these usage modes with a sorting kernel and a sorting-based streaming benchmark. We develop a performance model for the benchmark and use experimental evidence to demonstrate the correctness of the model. The model projects near-optimal numbers of copy threads for memory-bandwidth-bound computations. We demonstrate on KNL up to a 1.9X speedup for sort when the problem does not fit in MCDRAM over an OpenMP GNU sort that does not use MCDRAM.
Full Text Available
