NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

XRP: In-Kernel Storage Functions with eBPF

Zhong, Yuhong; Li, Haoyu; Wu, Yu Jian; Zarkadas, Ioannis; Tao, Jeffrey; Mesterhazy, Evan; Makris, Michael; Yang, Junfeng; Tai, Amy; Stutsman, Ryan; et al (July 2022, Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation)

With the emergence of microsecond-scale NVMe storage devices, the Linux kernel storage stack overhead has become significant, almost doubling access times. We present XRP, a framework that allows applications to execute user-defined storage functions, such as index lookups or aggregations, from an eBPF hook in the NVMe driver, safely bypassing most of the kernel’s storage stack. To preserve file system semantics, XRP propagates a small amount of kernel state to its NVMe driver hook where the user-registered eBPF functions are called. We show how two key-value stores, BPF-KV, a simple B+-tree key-value store, and WiredTiger, a popular log-structured merge tree storage engine, can leverage XRP to significantly improve throughput and latency.
more » « less
Full Text Available
Cache-coherent accelerators for persistent memory crash consistency

https://doi.org/10.1145/3538643.3539752

Bhardwaj, Ankit; Thornley, Todd; Pawar, Vinita; Achermann, Reto; Zellweger, Gerd; Stutsman, Ryan (June 2022, Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems)

Building persistent memory (PM) data structures is difficult because crashes interrupt operations, leaving data structures in an inconsistent state. Solving this requires augmenting code that modifies PM state to ensure that interrupted operations can be completed or undone. Today, this is done using careful, hand-crafted code, a compiler pass, or page faults. We propose a new, easy way to transform volatile data structure code to work with PM that uses a cache-coherent accelerator to do this augmentation, and we show that it may outperform existing approaches for building PM structures.
more » « less
Full Text Available
NrOS: Effective Replication and Sharing in an Operating System

Bhardwaj, Ankit; Kulkarni, Chinmay; Achermann, Reto; Calciu, Irina; Kashyap, Sanidhya; Stutsman, Ryan; Tai, Amy; Zellweger, Gerd (July 2021, Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation)

Full Text Available
BPF for Storage: an Exokernel-Inspired Approach

https://doi.org/10.1145/3458336.3465290

Zhong, Yuhong; Wang, Hongyi; Wu, Yu Jian; Cidon, Asaf; Stutsman, Ryan; Tai, Amy; Yang, Junfeng (June 2021, Proceedings of the Workshop on Hot Topics in Operating Systems)

Full Text Available
Achieving high throughput and elasticity in a larger-than-memory store

https://doi.org/10.14778/3457390.3457406

Kulkarni, Chinmay; Chandramouli, Badrish; Stutsman, Ryan (April 2021, Proceedings of the VLDB Endowment)

Millions of sensors, mobile applications and machines now generate billions of events. Specialized many-core key-value stores (KVSs) can ingest and index these events at high rates (over 100 Mops/s on one machine) if events are generated on the same machine; however, to be practical and cost-effective they must ingest events over the network and scale across cloud resources elastically. We present Shadowfax, a new distributed KVS based on FASTER, that transparently spans DRAM, SSDs, and cloud blob storage while serving 130 Mops/s/VM over commodity Azure VMs using conventional Linux TCP. Beyond high single-VM performance, Shadowfax uses a unique approach to distributed reconfiguration that avoids any server-side key ownership checks or cross-core coordination both during normal operation and migration. Hence, Shadowfax can shift load in 17 s to improve system throughput by 10 Mops/s with little disruption. Compared to the state-of-the-art, it has 8x better throughput (than Seastar+memcached) and avoids costly I/O to move cold data during migration. On 12 machines, Shadowfax retains its high throughput to perform 930 Mops/s, which, to the best of our knowledge, is the highest reported throughput for a distributed KVS used for large-scale data ingestion and indexing.
more » « less
Full Text Available
Adaptive Placement for In-memory Storage Functions

Bhardwaj, Ankit; Kulkarni, Chinmay; Stutsman, Ryan (July 2020, 2020 USENIX Annual Technical Conference)
null (Ed.)
Fast networks and the desire for high resource utilization in data centers and the cloud have driven disaggregation. Application compute is separated from storage, but this leads to high overheads when data must move over the network for simple operations on it. Alternatively, systems could allow applications to run application logic within storage via user-defined functions. Unfortunately, this ties provisioning and utilization of storage and compute resources together again. We present a new approach to executing storage-level functions in an in-memory key-value store that avoids this problem by dynamically deciding where to execute functions over data. Users write storage functions that are logically decoupled from storage, but storage servers choose where to run invocations of these functions physically. By using a server-internal cost model and observing function execution, servers choose to directly run inexpensive functions, while preferring to execute functions with high CPU-cost at client machines. We show that with this approach storage servers can reduce network request processing costs, avoid server compute bottlenecks, and improve aggregate storage system throughput. We realize our approach on an in-memory key-value store that executes 3.2 million strict serializable user-defined storage functions per second with 100 us response times. When running a mix of logic from different applications, it provides throughput better than running that logic purely at storage servers (85% more) or purely at clients (10% more). For our workloads, it also reduces latency (up to 2x) and transactional aborts (up to 33%) over pure client-side execution.
more » « less
Full Text Available
On the Impact of Isolation Costs on Locality-aware Cloud Scheduling

Bhardwaj, Ankit; Gupta, Meghana; Stutsman, Ryan (July 2020, 12th USENIX Workshop on Hot Topics in Cloud Computing)
null (Ed.)
Serverless applications create an opportunity for more granular scheduling across machines in cloud platforms that can improve efficiency, especially if functions can be run within storage services to eliminate data movement. However, embedding code within storage services creates code isolation overheads that offset some of those savings. We argue for a new approach to serverless function scheduling that can look within serverless applications' functions, profile their data movement and networking costs, and model the impact of different code placement and isolation schemes for those costs. Beyond improvements in efficiency, such an approach would fuel innovation in cloud isolation schemes and programming abstractions, since a scheduler with a modular cost modeling approach could incorporate new schemes and automatically use them to improve efficiency for pre-existing applications.
more » « less
Full Text Available
Narrowing the Gap Between Serverless and its State with Storage Functions

https://doi.org/10.1145/3357223.3362723

Zhang, Tian; Xie, Dong; Li, Feifei; Stutsman, Ryan (November 2019, Proceedings of the ACM Symposium on Cloud Computing)

Serverless computing has gained attention due to its fine-grained provisioning, large-scale multi-tenancy, and on-demand scaling. However, it also forces applications to externalize state in remote storage, adding substantial overheads. To fix this "data shipping problem" we built Shredder, a low-latency multi-tenant cloud store that allows small units of computation to be performed directly within storage nodes. Storage tenants provide Shredder with JavaScript functions (or WebAssembly programs), which can interact directly with data without moving them over the network. The key challenge in Shredder is safely isolating thousands of tenant storage functions while minimizing data interaction costs. Shredder uses a unique approach where its data store and networking paths are implemented in native code to ensure performance, while isolated tenant functions interact with data using a V8-specific intermediate representation that avoids expensive cross-protection-domain calls and data copying. As a result, Shredder can execute 4 million remotely-invoked tenant functions per second spread over thousands of tenants with median and 99th-percentile response latencies of less than 50 μs and 500 μs, respectively. Our evaluation shows that Shredder achieves a 14% to 78% speedup against conventional remote storage when fetching items with just one to three data dependencies between them. We also demonstrate Shredder's effectiveness in accelerating data-intensive applications, including a k-hop query on social graphs that shows orders of magnitude gain.
more » « less
Full Text Available
SolarDB: Toward a Shared-Everything Database on Distributed Log-Structured Storage

https://doi.org/10.1145/3318158

Zhu, Tao; Zhao, Zhuoyue; Li, Feifei; Qian, Weining; Zhou, Aoying; Xie, Dong; Stutsman, Ryan; Li, Haining; Hu, Huiqi (June 2019, ACM Transactions on Storage)

Full Text Available
Splinter: Bare-Metal Extensions for Multi-Tenant Low-Latency Storage

Kulkarni, C; Moore, S; Naqvi, M; Zhang, T; Ricci, R; Stutsman, R (October 2018, USENIX Symposium on Operating Systems Design and Implementation)

In-memory key-value stores that use kernel-bypass networking serve millions of operations per second per machine with microseconds of latency. They are fast in part because they are simple, but their simple interfaces force applications to move data across the network. This is inefficient for operations that aggregate over large amounts of data, and it causes delays when traversing complex data structures. Ideally, applications could push small functions to storage to avoid round trips and data movement; however, pushing code to these fast systems is challenging. Any extra complexity for interpreting or isolating code cuts into their latency and throughput benefits. We present Splinter, a low-latency key-value store that clients extend by pushing code to it. Splinter is designed for modern multi-tenant data centers; it allows mutually distrusting tenants to write their own fine-grained extensions and push them to the store at runtime. The core of Splinter’s design relies on type- and memory-safe extension code to avoid conventional hardware isolation costs. This still allows for bare-metal execution, avoids data copying across trust boundaries, and makes granular storage functions that perform less than a microsecond of compute practical. Our measurements show that Splinter can process 3.5 million remote extension invocations per second with a median round-trip latency of less than 9 μs at densities of more than 1,000 tenants per server. We provide an implementation of Facebook’s TAO as an 800 line extension that, when pushed to a Splinter server, improves performance by 400 Kop/s to perform 3.2 Mop/s over online graph data with 30 μs remote access times.
more » « less
Full Text Available

« Prev Next »

Search for: All records