NSF PAR Search | NSF Public Access Repository

Recent work in networking, storage and multi-threading has demonstrated improved performance and scalability by replacing kernel-mode interrupts with high-rate user-space polling. Typically, such polling is performed by a dedicated core. Compiler Interrupts (CIs) instead enable efficient, automatic high-rate polling on a shared thread, which performs other work between polls. CIs are instrumentation-based and light-weight, allowing frequent interrupts with little performance impact. For example, when targeting a 5,000 cycle interval, the median overhead of our fastest CI design is 4% vs. 800% for hardware interrupts, across programs in the SPLASH-2, Phoenix and Parsec benchmark suites running with 32 threads. We evaluate CIs on three systems-level applications: (a) kernel bypass networking with mTCP, (b) joint kernel bypass networking and CPU scheduling with Shenango, and (c) delegation, a message-passing alternative to locking, with FFWD. For each application, we find that CIs offer compelling qualitative and quantitative improvements over the current state of the art. For example, CI-based mTCP achieves ≈2× stock mTCP throughput on a sample HTTP application.

Lazy Determinism for Faster Deterministic Multithreading

https://doi.org/10.1145/3297858.3304047

Merrifield, Timothy; Roghanchi, Sepideh; Devietti, Joseph; Eriksson, Jakob (January 2019, Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems)

Deterministic multithreading (DMT) fundamentally requires total, deterministic ordering of synchronization operations on each synchronization variable, i.e. a partial ordering over all synchronization operations. In practice, prior DMT systems totally order all synchronization operations, regardless of synchronization variable; the result is severe performance degradation for highly concurrent applications using fine-grained synchronization. Motivated by this class of programs, we propose lazy determinism as a way to go beyond this total order bottleneck. Lazy determinism executes synchronization operations speculatively, and enforces determinism by subsequently validating the resulting order of operations. If an ordering violation is detected, part of the computation is restarted. By enforcing only the partial ordering required to guarantee determinism, lazy determinism increases the available parallelism during deterministic execution. We implement LazyDet via a pure-software runtime system accelerated by custom Linux kernel support. Our experiments with hash table benchmarks from Synchrobench show roughly an order of magnitude improvement in the performance of lock-based data structures compared to the state of the art in eager determinism. For benchmarks from PARSEC-2, SPLASH-2, and Phoenix, we demonstrate runtime improvements of up to 2× on the programs that challenge deterministic execution environments the most.

Full Text Available

Search for: All records