This content will become publicly available on July 11, 2023

Title: Automatic Reliability Testing for Cluster Management Controllers
Modern cluster managers like Borg, Omega and Kubernetes rely on the state-reconciliation principle to be highly resilient and extensible. In these systems, all cluster-management logic is embedded in a loosely coupled collection of microservices called controllers. Each controller independently observes the current cluster state and issues corrective actions to converge the cluster to a desired state. However, the complex distributed nature of the overall system makes it hard to build reliable and correct controllers – we find that controllers face myriad reliability issues that lead to severe consequences like data loss, security vulnerabilities, and resource leaks. We present Sieve, the first automatic reliability-testing tool for cluster-management controllers. Sieve drives controllers to their potentially buggy corners by systematically and extensively perturbing the controller’s view of the current cluster state in ways it is expected to tolerate. It then compares the cluster state’s evolution with and without perturbations to detect safety and liveness issues. Sieve’s design is powered by a fundamental opportunity in state-reconciliation systems – these systems are based on state-centric interfaces between the controllers and the cluster state; such interfaces are highly transparent and thereby enable fully-automated reliability testing. To date, Sieve has efficiently found 46 serious safety and liveness bugs (35 confirmed and 22 fixed) in ten popular controllers with a low false-positive rate of 3.5%.
Award ID(s):
1816615 2130560 2145295
Publication Date:
Journal Name:
Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22)
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern distributed data management systems face a new challenge: how can autonomous, mutually distrusting parties cooperate safely and effectively? Addressing this challenge brings up familiar questions from classical distributed systems: how to combine multiple steps into a single atomic action, how to recover from failures, and how to synchronize concurrent access to data. Nevertheless, each of these issues requires rethinking when participants are autonomous and potentially adversarial. We propose the notion of a cross-chain deal, a new way to structure complex distributed computations that manage assets in an adversarial setting. Deals are inspired by classical atomic transactions, but are necessarily different, in important ways, to accommodate the decentralized and untrusting nature of the exchange. We describe novel safety and liveness properties, along with two alternative protocols for implementing cross-chain deals in a system of independent blockchain ledgers. One protocol, based on synchronous communication, is fully decentralized, while the other, based on semi-synchronous communication, requires a globally shared ledger. We also prove that some degree of centralization is required in the semi-synchronous communication model.

  2. Programmable Logic Controllers are an established platform used throughout industrial automation, but rather poorly understood among researchers in the control systems community. This paper gives an overview of the state of the practice in industrial control systems while presenting a critical analysis of the dominant programming styles used in today's automation systems. We describe the patterns standardized loosely in IEC 61131-3 and, where there are ambiguities in the standard, realized in concrete vendor implementations. Ultimately, we suggest directions for further research towards enabling increasingly complex industrial control applications subject to the novel requirements of Industry 4.0 settings without compromising the safety and reliability guaranteed by the current industrial automation stack.
  3. Self-driving cars and trucks, autonomous vehicles (AVs), should not be accepted by regulatory bodies and the public until they have much higher confidence in their safety and reliability --- which can most practically and convincingly be achieved by testing. But existing testing methods are inadequate for checking the end-to-end behaviors of AV controllers against complex, real-world corner cases involving interactions with multiple independent agents such as pedestrians and human-driven vehicles. While test-driving AVs on streets and highways fails to capture many rare events, existing simulation-based testing methods mainly focus on simple scenarios and do not scale well for complex driving situations that require sophisticated awareness of the surroundings. To address these limitations, we propose a new fuzz testing technique, called AutoFuzz, which can leverage widely-used AV simulators' API grammars to generate semantically and temporally valid complex driving scenarios (sequences of scenes). To efficiently search for traffic violations-inducing scenarios in a large search space, we propose a constrained neural network (NN) evolutionary search method to optimize AutoFuzz. Evaluation of our prototype on one state-of-the-art learning-based controller, two rule-based controllers, and one industrial-grade controller in five scenarios shows that AutoFuzz efficiently finds hundreds of traffic violations in high-fidelity simulation environments. For each scenario, AutoFuzz can find on average 10-39% more unique traffic violations than the best-performing baseline method. Further, fine-tuning the learning-based controller with the traffic violations found by AutoFuzz successfully reduced the traffic violations found in the new version of the AV controller software.
  4. Solid-state-batteries (SSBs) present a promising technology for next-generation batteries due to their superior properties including increased energy density, wider electrochemical window and safer electrolyte design. Commercialization of SSBs, however, will depend on the resolution of a number of critical chemical and mechanical stability issues. The resolution of these issues will in turn depend heavily on our ability to accurately model these systems such that appropriate material selection, microstructure design, and operational parameters may be determined. In this article we review the current state-of-the-art modeling tools with a focus on chemo-mechanics. Some of the key chemo-mechanical problems in SSBs involve dendrite growth through the solid-state electrolyte (SSE), interphase formation at the anode/SSE interface, and damage/decohesion of the various phases in the solid-state composite cathode. These mechanical processes in turn lead to capacity fade, impedance increase, and short-circuit of the battery, ultimately compromising safety and reliability. The article is divided into the three natural components of an all-solid-state architecture. First, modeling efforts pertaining to Li-metal anodes and dendrite initiation and growth mechanisms are reviewed, making the transition from traditional liquid electrolyte anodes to next generation all-solid-state anodes. Second, chemo-mechanics modeling of the SSE is reviewed with a particular focus on the formation of a thermodynamically unstable interphase layer at the anode/SSE interface. Finally, we conclude with a review of chemo-mechanics modeling efforts for solid-state composite cathodes. For each of these critical areas in a SSB we conclude by highlighting the key open areas for future research as it pertains to modeling the chemo-mechanical behavior of these systems.
  5. New technologies for future electronics such as personal healthcare devices and foldable smartphones require emerging developments in flexible energy storage devices as power sources. Besides the energy and power densities of energy devices, more attention should be paid to safety, reliability, and compatibility within highly integrated systems because they are almost in 24-hour real-time operation close to the human body. Thereupon, all-solid-state energy devices become the most promising candidates to meet these requirements. In this mini-review, the most recent research progress in all-solid-state flexible supercapacitors and batteries will be covered. The main focus of this mini-review is to summarize new materials development for all-solid-state flexible energy devices. The potential issues and perspectives regarding all-solid-state flexible energy device technologies will be highlighted.