We present a randomized approach for wait-free locks with strong bounds on time and fairness in a context in which any process can be arbitrarily delayed. Our approach supports a tryLock operation that is given a set of locks and code to run when all the locks are acquired. A tryLock operation, or attempt, may fail if there is contention on the locks, in which case the code is not run. Given an upper bound κ, known to the algorithm, on the point contention of any lock, and an upper bound L on the number of locks in a tryLock's set, a tryLock will succeed in acquiring its locks and running the code with probability at least 1/(κL). It is thus fair. Furthermore, if the maximum step complexity for the code in any lock is T, the attempt will take O(κ^2 L^2 T) steps, regardless of whether it succeeds or fails. The attempts are independent, so if the tryLock is repeatedly retried on failure, it will succeed in O(κ^3 L^3 T) expected steps, and with high probability in not much more.
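As a hedged illustration of the tryLock interface described above (its semantics only, not the paper's randomized wait-free algorithm), the following toy C++ stand-in uses std::mutex::try_lock; the names try_lock_all and lock_and_run are ours, not the paper's.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <vector>

// Toy stand-in for the tryLock interface: acquire a set of locks, run the
// given code only if every lock was acquired, otherwise run nothing and
// report failure.  This blocking sketch is NOT the paper's wait-free,
// randomized algorithm; it only illustrates the interface's semantics.
bool try_lock_all(const std::vector<std::mutex*>& locks,
                  const std::function<void()>& critical_section) {
  std::size_t acquired = 0;
  for (; acquired < locks.size(); ++acquired)
    if (!locks[acquired]->try_lock()) break;      // contention: give up
  if (acquired < locks.size()) {                  // failed attempt
    for (std::size_t i = 0; i < acquired; ++i) locks[i]->unlock();
    return false;                                 // the code is not run
  }
  critical_section();                             // all locks held
  for (auto* l : locks) l->unlock();
  return true;
}

// Retrying on failure: with the paper's bounds, independent attempts give
// success in O(kappa^3 L^3 T) expected steps.
void lock_and_run(const std::vector<std::mutex*>& locks,
                  const std::function<void()>& critical_section) {
  while (!try_lock_all(locks, critical_section)) { /* retry */ }
}
```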
Lock-free locks revisited
This paper presents a new and practical approach to lock-free locks based on helping, which allows the user to write code using fine-grained locks, but run it in a lock-free manner. Although lock-free locks have been suggested in the past, they are widely viewed as impractical, have some key limitations, and, as far as we know, have never been implemented. The paper presents some key techniques that make lock-free locks practical and more general. The most important technique is an approach to idempotence, i.e., making code that runs multiple times appear as if it ran once. The idea is based on using a shared log among processes running the same protected code. Importantly, the approach can be library-based, requiring very little if any change to standard code: code just needs to use the idempotent versions of memory operations (load, store, LL/SC, allocation, free). We have implemented a C++ library called Flock based on these ideas. Flock allows lock-based data structures to run in either lock-free or blocking (traditional locks) mode. We implemented a variety of tree and list-based data structures with Flock and compare the performance of the lock-free and blocking modes under a variety of workloads. The lock-free mode is almost as fast as blocking mode under almost all workloads, and significantly faster when threads are oversubscribed (more threads than processors). We also compare with several existing lock-based and lock-free alternatives.
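To make the library-based idea concrete, here is a rough sketch, under assumed names, of what user code might look like: fine-grained locks are kept, but shared memory is accessed through idempotent wrapper types. The names idem, lf_lock, and with_lock are illustrative and do not reflect Flock's actual API, and the bodies are trivial sequential stand-ins so the sketch compiles.

```cpp
#include <functional>

// Hypothetical sketch, not Flock's actual API: user code keeps its
// fine-grained locks but accesses shared memory through wrapper types
// whose real implementations are idempotent (backed by a shared log), so
// a critical section re-run by helpers appears to run exactly once.
// The bodies below are trivial sequential stand-ins so the sketch compiles.
template <typename T>
struct idem {
  T val{};
  T load() { return val; }          // real version: idempotent load
  void store(T v) { val = v; }      // real version: idempotent store
};

struct lf_lock {
  // Real version: acquires the lock lock-freely; if the holder is
  // delayed, other processes help finish f, which is why f must be
  // idempotent.
  void with_lock(const std::function<void()>& f) { f(); }
};

// Ordinary lock-based list code, unchanged except for the wrapper types.
struct node {
  idem<int>   value;
  idem<node*> next;
  lf_lock     lock;
};

void remove_next(node* n) {
  n->lock.with_lock([&] {
    node* succ = n->next.load();
    if (succ != nullptr) n->next.store(succ->next.load());
  });
}
```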
- PAR ID: 10416516
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
- ISBN: 9781450392044
- Page Range / eLocation ID: 278 to 293
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Recent work has shown how to augment any CAS-based concurrent data structure to support taking a snapshot of the current memory state. Taking the snapshot, as well as loads and CAS (Compare and Swap) operations, takes constant time. Importantly, such snapshotting can be used to easily implement linearizable queries, such as range queries, over any part of a data structure. In this paper, we make two significant improvements over this approach. The first removes a subtle and hard-to-reason-about restriction that was needed to avoid a level of indirection on pointers. We introduce an approach, which we refer to as indirection-on-need, that removes the restriction yet almost always avoids indirection. The second improvement is to efficiently support snapshotting with lock-free locks, which requires supporting an idempotent CAS. We show a particularly simple solution to the problem that leverages the data structures used for snapshotting. Based on these ideas we implemented an easy-to-use C++ library, verlib, centered around a versioned pointer type. The library works with lock-based (standard or lock-free) and CAS-based algorithms, or any combination. Converting existing concurrent data structures to use the library takes minimal effort. We present results for experiments that use verlib to convert state-of-the-art data structures for ordered maps (a B-tree), radix-ordered maps (an ART-tree), and unordered maps (an optimized hash table) to be snapshottable. The snapshottable versions perform almost as well as the originals and far outperform any previous implementations that support atomic range queries. (A hedged sketch of the versioned-pointer idea appears after this list.)
- Bertot, Yves; Tassi, Enrico (Eds.) C program components verified for functional correctness in Coq using VST (Verified Software Toolchain) can now rely on a set of standard library components (math functions, malloc/free, atomic load/store, locks, threads) that have formal specifications.
- We introduce the scheduler subversion problem, where lock usage patterns determine which thread runs, thereby subverting CPU scheduling goals. To mitigate this problem, we introduce Scheduler-Cooperative Locks (SCLs), a new family of locking primitives that controls lock usage and thus aligns with system-wide scheduling goals; our initial work focuses on proportional-share schedulers. Unlike existing locks, SCLs provide an equal (or proportional) time window, called lock opportunity, within which each thread can acquire the lock. We design and implement three different scheduler-cooperative locks that work well with proportional-share schedulers: a user-level mutex lock (u-SCL), a reader-writer lock (RWSCL), and a simplified kernel implementation (k-SCL). We demonstrate the effectiveness of SCLs in two user-space applications (UpScaleDB and KyotoCabinet) and the Linux kernel. In all three cases, regardless of lock usage patterns, SCLs ensure that each thread receives proportional lock allocations that match those of the CPU scheduler. Using microbenchmarks, we show that SCLs are efficient and achieve high performance with minimal overhead under extreme workloads. (A toy sketch of the lock-usage-accounting idea appears after this list.)
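For the verlib item above, here is a hedged sketch of the versioned-pointer idea. The names versioned_ptr, load_snapshot, and take_snapshot are hypothetical, not verlib's actual interface, and the bodies are trivial stand-ins so the sketch compiles; a real implementation keeps versions and a global timestamp so that reads at a snapshot are consistent.

```cpp
#include <cstdint>

// Hedged sketch of the versioned-pointer idea; names and bodies are
// illustrative stand-ins, not verlib's actual API.  In a real
// implementation, load_snapshot(ts) returns the pointer's value as of
// snapshot ts, and updates go through an (idempotent) CAS that records
// a new version.
template <typename T>
struct versioned_ptr {
  T* cur = nullptr;
  T* load() { return cur; }                                // current value
  T* load_snapshot(std::uint64_t /*ts*/) { return cur; }   // value as of ts
};

inline std::uint64_t take_snapshot() { return 0; }  // real: constant-time timestamp

struct list_node {
  long key;
  versioned_ptr<list_node> next;
};

// A linearizable range count over a sorted list: take a snapshot once,
// then traverse reading every pointer as of that snapshot, so concurrent
// updates cannot be half-observed.
long range_count(versioned_ptr<list_node>& head, long lo, long hi) {
  std::uint64_t ts = take_snapshot();
  long count = 0;
  for (list_node* p = head.load_snapshot(ts); p != nullptr && p->key <= hi;
       p = p->next.load_snapshot(ts))
    if (p->key >= lo) ++count;
  return count;
}
```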
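For the Scheduler-Cooperative Locks item, here is a toy sketch of the underlying idea of accounting for per-thread lock usage; the class toy_scl and its policy are hypothetical, not u-SCL's implementation. A thread that has held the lock for more than its equal share pauses before contending again.

```cpp
#include <chrono>
#include <mutex>
#include <thread>
#include <unordered_map>

// Toy illustration of lock-usage accounting (not the paper's u-SCL):
// each thread's accumulated lock-hold time is tracked, and a thread that
// is above its equal share waits before contending again, so lock time is
// divided among threads much like CPU time.
class toy_scl {
  std::mutex lock_;   // the lock being handed out
  std::mutex acct_;   // protects the usage table
  std::unordered_map<std::thread::id, std::chrono::nanoseconds> usage_;

  // How far this thread's usage is above the per-thread average.
  std::chrono::nanoseconds over_share() {
    std::lock_guard<std::mutex> g(acct_);
    auto& mine = usage_[std::this_thread::get_id()];   // zero if new
    std::chrono::nanoseconds total{0};
    for (const auto& kv : usage_) total += kv.second;
    std::chrono::nanoseconds share =
        total / static_cast<long long>(usage_.size());
    return mine - share;
  }

  void charge(std::chrono::nanoseconds held) {
    std::lock_guard<std::mutex> g(acct_);
    usage_[std::this_thread::get_id()] += held;
  }

 public:
  template <typename F>
  void run_locked(F f) {
    // Give other threads their lock opportunity if we are over budget.
    auto over = over_share();
    if (over > std::chrono::nanoseconds::zero())
      std::this_thread::sleep_for(over);
    auto start = std::chrono::steady_clock::now();
    {
      std::lock_guard<std::mutex> g(lock_);
      f();                                  // critical section
    }
    charge(std::chrono::duration_cast<std::chrono::nanoseconds>(
        std::chrono::steady_clock::now() - start));
  }
};
```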