skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Trading Off Consistency and Availability in Tiered Heterogeneous Distributed Systems
Tiered distributed computing systems, where components run in Internet-of-things devices, in edge computers, and in the cloud, introduce unique difficulties in maintaining consistency of shared data while ensuring availability. A major source of difficulty is the highly variable network latencies that applications must deal with. It is well known in distributed computing that when network latencies rise sufficiently, one or both of consistency and availability must be sacrificed. This paper quantifies consistency and availability and gives an algebraic relationship between these quantities and network latencies. The algebraic relation is linear in a max-plus algebra and supports heterogeneous networks, where the communication latency between 2 components may differ from the latency between another 2 components. We show how to make use of this algebraic relation to guide design, enabling software designers to specify consistency and availability requirements, and to derive from those the requirements on network latencies. We show how to design systems to fail in predictable ways when the network latency requirements are violated, by choosing to sacrifice either consistency or availability.  more » « less
Award ID(s):
2233769
PAR ID:
10473656
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Intelligent Computing
Date Published:
Journal Name:
Intelligent Computing
Volume:
2
ISSN:
2771-5892
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In distributed applications, Brewer’s CAP theorem tells us that when networks become partitioned (P), one must give up either consistency (C) or availability (A). Consistency is agreement on the values of shared variables; availability is the ability to respond to reads and writes accessing those shared variables. Availability is a real-time property whereas consistency is a logical property. We extend consistency and availability to refer to cyber-physical properties such as the state of the physical system and delays in actuation. We have further extended the CAP theorem to relate quantitative measures of these two properties to quantitative measures of communication and computation latency (L), obtaining a relation called the CAL theorem that is linear in a max-plus algebra. This paper shows how to use the CAL theorem in various ways to help design cyber-physical systems. We develop a methodology for systematically trading off availability and consistency in application-specific ways and to guide the system designer when putting functionality in end devices, in edge computers, or in the cloud. We build on theLingua Francacoordination language to provide system designers with concrete analysis and design tools to make the required tradeoffs in deployable embedded software. 
    more » « less
  2. Edge computing has emerged as a popular paradigm for running latency-sensitive applications due to its ability to offer lower network latencies to end-users. In this paper, we argue that despite its lower network latency, the resource-constrained nature of the edge can result in higher end-to-end latency, especially at higher utilizations, when compared to cloud data centers. We study this edge performance inversion problem through an analytic comparison of edge and cloud latencies and analyze conditions under which the edge can yield worse performance than the cloud. To verify our analytic results, we conduct a detailed experimental comparison of the edge and the cloud latencies using a realistic application and real cloud workloads. Both our analytical and experimental results show that even at moderate utilizations, the edge queuing delays can offset the benefits of lower network latencies, and even result in performance inversion where running in the cloud would provide superior latencies. We finally discuss practical implications of our results and provide insights into how application designers and service providers should design edge applications and systems to avoid these pitfalls. 
    more » « less
  3. Many Cyber-Physical Systems (CPS) have timing constraints that must be met by the cyber components (software and the network) to ensure safety. It is a tedious job to check if a CPS meets its timing requirement especially when they are distributed and the software and/or the underlying computing platforms are complex. Furthermore, the system design is brittle since a timing failure can still happen e.g., network failure, soft error bit flip, etc. In this paper, we propose a new design methodology called Plan B where timing constraints of the CPS are monitored at the runtime, and a proper backup routine is executed when a timing failure happens to ensure safety. We provide a model on how to express the desired timing behavior using a set of timing constructs in a C/C++ code and how to efficiently monitor them at the runtime. We showcase the effectiveness of our approach by conducting experiments on three case studies: 1) the full software stack for autonomous driving (Apollo), 2) a multi-agent system with 1/10th scale model robots, and 3) a quadrotor for search and rescue application. We show that the system remains safe and stable even when intentional faults are injected to cause a timing failure. We also demonstrate that the system can achieve graceful degradation when a less extreme timing failure happens. 
    more » « less
  4. Caches are at the heart of latency-sensitive systems. In this paper, we identify a growing challenge for the design of latency-minimizing caches called delayed hits. Delayed hits occur at high throughput, when multiple requests to the same object queue up before an outstanding cache miss is resolved. This effect increases latencies beyond the predictions of traditional caching models and simulations; in fact, caching algorithms are designed as if delayed hits simply didn't exist. We show that traditional caching strategies -- even so called 'optimal' algorithms -- can fail to minimize latency in the presence of delayed hits. We design a new, latency-optimal offline caching algorithm called belatedly which reduces average latencies by up to 45% compared to the traditional, hit-rate optimal Belady's algorithm. Using belatedly as our guide, we show that incorporating an object's 'aggregate delay' into online caching heuristics can improve latencies for practical caching systems by up to 40%. We implement a prototype, Minimum-AggregateDelay (mad), within a CDN caching node. Using a CDN production trace and backends deployed in different geographic locations, we show that mad can reduce latencies by 12-18% depending on the backend RTTs. 
    more » « less
  5. null (Ed.)
    In today's era of Internet of Things (IoT), where massive amounts of data are produced by IoT and other devices, edge computing has emerged as a prominent paradigm for low-latency data processing. However, applications may have diverse latency requirements: certain latency-sensitive processing operations may need to be performed at the edge, while delay-tolerant operations can be performed on the cloud, without occupying the potentially limited edge computing resources. To achieve that, we envision an environment where computing resources are distributed across edge and cloud offerings. In this paper, we present the design of CLEDGE (CLoud + EDGE), an information-centric hybrid cloud-edge framework, aiming to maximize the on-time completion of computational tasks offloaded by applications with diverse latency requirements. The design of CLEDGE is motivated by the networking challenges that mixed reality researchers face. Our evaluation demonstrates that CLEDGE can complete on-time more than 90% of offloaded tasks with modest overheads. 
    more » « less