skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Opportunities for optimism in contended main-memory multicore transactions
Main-memory multicore transactional systems have achieved excellent performance using single-version optimistic concurrency control (OCC), especially on uncontended workloads. Nevertheless, systems based on other concurrency control protocols, such as hybrid OCC/ locking and variations on multiversion concurrency control (MVCC), are reported to outperform the best OCC systems, especially with increasing contention. This paper shows that implementation choices unrelated to concurrency control can explain some of these performance differences. Our evaluation shows the strengths and weaknesses of OCC, MVCC, and TicToc concurrency control under varying workloads and contention levels, and the importance of several implementation choices called basis factors. Given sensible basis factor choices, OCC performance does not collapse on high-contention TPC-C. We also present two optimization techniques, deferred updates and timestamp splitting, that can dramatically improve the high-contention performance of both OCC and MVCC. These techniques are known, but we apply them in a new context and highlight their potency: when combined, they lead to performance gains of 4.74× for MVCC and 5.01× for OCC in a TPC-C workload.  more » « less
Award ID(s):
1704376
PAR ID:
10323465
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
The VLDB Journal
ISSN:
1066-8888
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents the design and implementation of Obladi, the first system to provide ACID transactions while also hiding access patterns. Obladi uses as its building block oblivious RAM, but turns the demands of supporting transac- tions into a performance opportunity. By executing transac- tions within epochs and delaying commit decisions until an epoch ends, Obladi reduces the amortized bandwidth costs of oblivious storage and increases overall system through- put. These performance gains, combined with new oblivious mechanisms for concurrency control and recovery, allow Obladi to execute OLTP workloads with reasonable through- put: it comes within 5× to 12× of a non-oblivious baseline on the TPC-C, SmallBank, and FreeHealth applications. Latency overheads, however, are higher (70× on TPC-C). 
    more » « less
  2. Research on transaction processing has made significant progress towards improving performance of main memory multicore OLTP systems under low contention. However, these systems struggle on workloads with lots of conflicts. Partitioned databases (and variants) perform well on high contention workloads that are statically partitionable, but time-varying workloads often make them impractical. To- wards addressing this, we propose Strife—a novel transac- tion processing scheme that clusters transactions together dynamically and executes most of them without any con- currency control. Strife executes transactions in batches, where each batch is partitioned into disjoint clusters with- out any cross-cluster conflicts and a small set of residuals. The clusters are then executed in parallel with no concur- rency control, followed by residuals separately executed with concurrency control. Strife uses a fast dynamic clustering al- gorithm that exploits a combination of random sampling and concurrent union-find data structure to partition the batch online, before executing it. Strife outperforms lock-based and optimistic protocols by up to 2× on high contention workloads. While Strife incurs about 50% overhead relative to partitioned systems in the statically partitionable case, it performs 2× better when such static partitioning is not possible and adapts to dynamically varying workloads. 
    more » « less
  3. Multi-version concurrency control (MVCC) is a widely used, sophisticated approach for handling concurrent transactions. vMVCC is the first MVCC-based transaction library that comes with a machine-checked proof of correctness, providing clients with a guarantee that it will correctly handle all transactions despite a complicated design and implementation that might otherwise be error-prone. vMVCC is implemented in Go, stores data in memory, and uses several optimizations, such as RDTSC-based timestamps, to achieve high performance (25–96% the throughput of Silo, a state-of-the-art in-memory database, for YCSB and TPC-C workloads). Formally specifying and verifying vMVCC required adopting advanced proof techniques, such as logical atomicity and prophecy variables, owing to the fact that MVCC transactions can linearize at timestamp generation prior to transaction execution. 
    more » « less
  4. Persistent memory (PM) can be accessed directly from userspace without kernel involvement, but most PM filesystems still perform metadata operations in the kernel for secuity and rely on the kernel for cross-process synchronization. We present per-file virtualization, where a virtualization layer implements a complete set of file functionalities, including metadata management, crash consistency, and concurrency control, in userspace. We observe that not all file metadata need to be maintained by the kernel and propose embedding insensitive metadata into the file for userspace management. For crash consistency, copy-on-write (CoW) benefits from the embedding of the block mapping since the mapping can be efficiently updated without kernel involvement. For cross-process synchronization, we introduce lockfree optimistic concurrency control (OCC) at user level, which tolerates process crashes and provides better scalability. Based on per-file virtualization, we implement MadFS, a library PM filesystem that maintains the embedded metadata as a compact log. Experimental results show that on concurrent workloads, MadFS achieves up to 3.6× the throughput of ext4-DAX. For real-world applications, MadFS provides up to 48% speedup for YCSB on LevelDB and 85% for TPC-C on SQLite compared to NOVA. 
    more » « less
  5. ScaleDB is a serializable in-memory transactional database that achieves excellent scalability on multi-core machines by asynchronously updating range indexes. We find that asynchronous range index updates can significantly improve database scalability by applying updates in batches, reducing contention on critical sections. To avoid stale reads, ScaleDB uses small hash indexlets to hold delayed updates. We use in- dexlets to design ACC, an asynchronous concurrency control protocol providing serializability. With ACC, it is possible to delay range index updates without adverse performance effects on transaction execution in the common case. ACC delivers scalable serializable isolation for transactions, with high throughput and low abort rate. Evaluation on a dual- socket server with 36 cores shows that ScaleDB achieves 9.5× better query throughput than Peloton on the YCSB bench- mark and 1.8× better transaction throughput than Cicada on the TPC-C benchmark. 
    more » « less