Title: Log-Based CRDT for Edge Applications
In this paper, we investigate extensions for Conflict-Free Replicated Data Types (CRDTs) that permit their use in failure-prone, heterogeneous, resource-constrained, distributed, multi-tier (cloud/edge/device) cloud deployments such as the Internet-of-Things (IoT), while addressing multiple CRDT limitations. Specifically, we employ distributed logging to implement robust, strong eventual consistency of replicas. Our approach also enables uniform reversal of operations and precludes the requirement of exactly-once delivery and idempotence imposed by operation-based CRDTs. Moreover, it exposes CRDT versions for use in debugging and history-based programming. We evaluate our approach for commonly used CRDTs and show that it enables higher operation throughput (up to 1.8x) versus conventional CRDTs for the workloads we consider.
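The abstract describes deriving replica state from a shared, ordered operation log. As a rough illustration of that idea (a minimal sketch, not the paper's implementation, with all names hypothetical), the following Python fragment shows a counter replica whose state is a fold over log entries: unique operation IDs make duplicate delivery harmless, any log prefix acts as a version, and reversal appends a compensating entry rather than mutating history.

```python
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Op:
    op_id: str    # unique id: duplicate deliveries are detected and ignored
    kind: str     # "inc" or "dec" for a counter CRDT
    amount: int

@dataclass
class LogCounterReplica:
    log: list = field(default_factory=list)  # locally known prefix of the shared log
    seen: set = field(default_factory=set)   # op ids already applied

    def apply(self, op):
        if op.op_id in self.seen:   # no exactly-once delivery needed
            return
        self.seen.add(op.op_id)
        self.log.append(op)

    def submit(self, kind, amount=1):
        op = Op(str(uuid.uuid4()), kind, amount)
        self.apply(op)              # would also be appended to the shared log
        return op

    def value(self, upto=None):
        # Any log prefix is a "version", usable for debugging and
        # history-based programming.
        ops = self.log if upto is None else self.log[:upto]
        return sum(o.amount if o.kind == "inc" else -o.amount for o in ops)

    def undo(self, op_id):
        # Uniform reversal: append a compensating entry instead of
        # rewriting history.
        for o in self.log:
            if o.op_id == op_id:
                inverse = "dec" if o.kind == "inc" else "inc"
                self.apply(Op(str(uuid.uuid4()), inverse, o.amount))
                return

r = LogCounterReplica()
op = r.submit("inc", 5)
r.apply(op)                 # duplicate delivery: ignored
r.undo(op.op_id)
assert r.value() == 0
```

Because the log is the source of truth, merging two replicas amounts to exchanging missing log entries; for a counter, the order of exchange does not affect the derived value.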
Award ID(s):
2107101 1703560
NSF-PAR ID:
10345308
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Conference on Cloud Engineering
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    One of the activities of the Pacific Rim Applications and Grid Middleware Assembly (PRAGMA) is fostering Virtual Biodiversity Expeditions by bringing domain scientists and cyberinfrastructure specialists together as a team. Over the past few years, PRAGMA members have been collaborating on virtualizing the Lifemapper software. Virtualization and cloud computing have introduced great flexibility and efficiency into IT projects. Virtualization refers to technologies that provide a layer of abstraction between server hardware and the software that runs on it. This abstraction enables a logical view of computing resources and allows multiple servers to run on the same hardware. With this project, we are virtualizing Lifemapper by enabling its installation and configuration on a virtual cluster. Virtualization provides application scalability, maximizes resource utilization, and creates a more efficient, agile, and automated infrastructure. However, there are downsides to the complexity inherent in these environments, including the need for special techniques to deploy cluster hosts, dependence on virtual environments, and challenging application installation, management, and configuration. In this study, we report on the progress of the Lifemapper virtualization framework, focused on a reproducible and highly configurable infrastructure capable of fast deployment.

    Lifemapper is a distributed software application developed by the Biodiversity Institute at The University of Kansas. Lifemapper creates and maintains a publicly accessible archive of species distribution maps calculated from public specimen data. Lifemapper software also provides a suite of tools for biodiversity researchers that calculate single and multispecies distribution predictions and macroecological analyses through application programming interfaces. Our goal is to create a viable solution that can be easily adopted and reused by scientists from multiple institutions or projects. This solution (1) allows fast deployment of ready‐made cluster images, (2) reproduces the complete Lifemapper processing pipeline on demand at multiple sites and in different hosting environments, and (3) enables scientists to perform Lifemapper‐facilitated data processing on restricted‐use data, very large datasets, or other unique data.

    A key contribution of this work is describing the practical experience of taking a complex, clustered, domain-specific data analysis and simulation system and enabling its operation on a variety of system configurations. Uses of this portability range from whole-cluster replication to teaching and experimentation on a single laptop. System virtualization is used to practically define and make portable the full application stack, including its complex set of supporting software, and allows Lifemapper to be deployed in a variety of environments.

     
  2. Conflict-free replicated data types (CRDTs) are a promising tool for designing scalable, coordination-free distributed systems. However, constructing correct CRDTs is difficult, posing a challenge for even seasoned developers. As a result, CRDT development is still largely the domain of academics, with new designs often awaiting peer review and a manual proof of correctness. In this paper, we present Katara, a program synthesis-based system that takes sequential data type implementations and automatically synthesizes verified CRDT designs from them. Key to this process is a new formal definition of CRDT correctness that combines a reference sequential type with a lightweight ordering constraint that resolves conflicts between non-commutative operations. Our process follows the tradition of work in verified lifting, including an encoding of correctness into SMT logic using synthesized inductive invariants and hand-crafted grammars for the CRDT state and runtime. Katara is able to automatically synthesize CRDTs for a wide variety of scenarios, from reproducing classic CRDTs to synthesizing novel designs based on specifications in existing literature. Crucially, our synthesized CRDTs are fully, automatically verified, eliminating entire classes of common errors and reducing the process of producing a new CRDT from a painstaking paper proof of correctness to a lightweight specification. 
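    The correctness condition Katara checks (convergence under any delivery order, plus agreement with a sequential reference type) can be illustrated concretely. The sketch below is a brute-force, concrete-execution analogue of the SMT-based check the abstract describes, using a grow-only set; it is illustrative only and not the Katara system itself.

```python
from itertools import permutations

# Sequential reference type: a plain set with an "add" operation.
def seq_apply(state, ops):
    s = set(state)
    for _, elem in ops:
        s.add(elem)
    return s

# Candidate CRDT operation for a grow-only set.
def crdt_apply(state, op):
    _, elem = op
    return frozenset(state | {elem})

def check_correctness(ops):
    # Simplified analogue of the correctness condition: every delivery order
    # must converge to a single state, and that state must agree with the
    # sequential reference applied to the same operations.
    results = set()
    for order in permutations(ops):
        state = frozenset()
        for op in order:
            state = crdt_apply(state, op)
        results.add(state)
    return len(results) == 1 and results == {frozenset(seq_apply(set(), ops))}

assert check_correctness([("add", 1), ("add", 2), ("add", 1)])
```

Katara replaces this exhaustive enumeration with an SMT encoding and synthesized inductive invariants, which is what makes verification feasible for non-trivial state and non-commutative operations.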
  3. The extreme bandwidth and performance of 5G mobile networks change the way we develop and utilize digital services. Within a few years, 5G will not only touch technology and applications but dramatically change the economy, our society, and individual lives. One of the emerging technologies that enables the evolution to 5G by bringing cloud capabilities near to end users is Edge Computing, also known as Multi-Access Edge Computing (MEC), which will become pertinent to the evolution of 5G. This evolution also entails growth in the threat landscape and an increase in privacy concerns across different application areas; hence security and privacy play a central role in the evolution towards 5G. Since MEC applications are instantiated in virtualized infrastructure, in this paper we present a distributed application that constantly introspects multiple virtual machines (VMs) in order to detect malicious activities based on their anomalous behavior. Once suspicious processes are detected, our IDS notifies the system administrator about the potential threat in real time. The developed software is able to detect keyloggers, rootkits, trojans, process hiding, and other intrusion artifacts via agent-less operation, by operating remotely or directly from the host machine. Remote memory introspection means no software to install and no notice given to malware to evacuate or destroy data. Experimental results of remote VMI on more than 50 different malicious code samples demonstrate an average anomaly detection rate close to 97%. We have established a wide testbed environment connecting the networks of two universities, Kyushu Institute of Technology and The City College of New York, through a secure GRE tunnel. Experiments conducted on this testbed demonstrate the fast response time of the proposed system.
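    The detection logic pairs baseline comparison with cross-view validation. The sketch below illustrates only that anomaly-detection step; it assumes the process lists have already been extracted by a VMI tool (e.g., LibVMI, which the paper's approach resembles but does not necessarily use), and the process names are hypothetical.

```python
# Baseline whitelist and process lists are illustrative; in the paper's
# setting they would come from remote memory introspection of the VM.
BASELINE = {"init", "systemd", "sshd", "cron", "bash"}

def score_snapshot(processes, baseline=BASELINE):
    """Flag processes absent from the known-good baseline."""
    return [p for p in processes if p not in baseline]

def detect_hiding(vmi_task_list, guest_reported_list):
    """Cross-view validation: a process hidden from in-guest tools still
    appears in the kernel task list recovered via memory introspection."""
    return set(vmi_task_list) - set(guest_reported_list)

if __name__ == "__main__":
    print(score_snapshot(["sshd", "keylogd"]))           # ['keylogd']
    print(detect_hiding(["sshd", "rk_proc"], ["sshd"]))  # {'rk_proc'}
```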
  4. Millions of sensors, mobile applications and machines now generate billions of events. Specialized many-core key-value stores (KVSs) can ingest and index these events at high rates (over 100 Mops/s on one machine) if events are generated on the same machine; however, to be practical and cost-effective they must ingest events over the network and scale across cloud resources elastically. We present Shadowfax, a new distributed KVS based on FASTER, that transparently spans DRAM, SSDs, and cloud blob storage while serving 130 Mops/s/VM over commodity Azure VMs using conventional Linux TCP. Beyond high single-VM performance, Shadowfax uses a unique approach to distributed reconfiguration that avoids any server-side key ownership checks or cross-core coordination both during normal operation and migration. Hence, Shadowfax can shift load in 17 s to improve system throughput by 10 Mops/s with little disruption. Compared to the state-of-the-art, it has 8x better throughput (than Seastar+memcached) and avoids costly I/O to move cold data during migration. On 12 machines, Shadowfax retains its high throughput to perform 930 Mops/s, which, to the best of our knowledge, is the highest reported throughput for a distributed KVS used for large-scale data ingestion and indexing. 
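    Shadowfax's reconfiguration protocol is beyond the scope of an abstract, but the basic mechanics it builds on, mapping keys to VMs and shifting a hash range to move load, can be sketched with generic consistent hashing. This is a generic illustration under assumed names, not Shadowfax's ownership-check-free protocol.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring: shows how a distributed KVS maps keys
    to VMs, and how reassigning a hash range migrates load between them."""

    def __init__(self, nodes, vnodes=64):
        # Virtual nodes smooth the key distribution across physical nodes.
        self.ring = sorted(
            (self._h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _h(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def owner(self, key):
        # First ring position at or after the key's hash owns the key.
        i = bisect.bisect(self.keys, self._h(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["vm-0", "vm-1", "vm-2"])
print(ring.owner("sensor/42/temperature"))
```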
  5. Computational science today depends on complex, data-intensive applications operating on datasets from a variety of scientific instruments. A major challenge is the integration of data into the scientist's workflow. Recent advances in dynamic, networked cloud resources provide the building blocks to construct reconfigurable, end-to-end infrastructure that can increase scientific productivity. However, applications have not adequately taken advantage of these advanced capabilities. In this work, we have developed a novel network-centric platform that enables high-performance, adaptive data flows and coordinated access to distributed cloud resources and data repositories for atmospheric scientists. We demonstrate the effectiveness of our approach by evaluating time-critical, adaptive weather sensing workflows, which utilize advanced networked infrastructure to ingest live weather data from radars and compute data products used for timely response to weather events. The workflows are orchestrated by the Pegasus workflow management system and were chosen because of their diverse resource requirements. We show that our approach results in timely processing of Nowcast workflows under different infrastructure configurations and network conditions. We also show how workflow task clustering choices affect the throughput of an ensemble of Nowcast workflows and improve turnaround times. Additionally, we find that using our network-centric platform powered by advanced layer-2 networking techniques results in faster, more reliable data throughput, makes cloud resources easier to provision, and makes the workflows easier to configure for operational use and automation.
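    The clustering trade-off the abstract mentions, where fewer, larger jobs amortize per-job scheduling overhead but reduce available parallelism, can be shown with a toy model. The cluster sizes, costs, and makespan formula below are illustrative assumptions, not measurements or mechanisms from the paper.

```python
# Toy model of horizontal task clustering in a Pegasus-style workflow level.
def cluster_tasks(tasks, cluster_size):
    """Group independent tasks of one workflow level into batches."""
    return [tasks[i:i + cluster_size] for i in range(0, len(tasks), cluster_size)]

def estimate_makespan(clusters, task_cost=1.0, sched_overhead=0.5, slots=4):
    """Each cluster pays one scheduling overhead and runs its tasks
    sequentially; clusters execute 'slots' at a time."""
    per_cluster = sched_overhead + task_cost * max(len(c) for c in clusters)
    waves = -(-len(clusters) // slots)   # ceiling division
    return waves * per_cluster

tasks = [f"nowcast_{i}" for i in range(32)]
for size in (1, 4, 8):
    clusters = cluster_tasks(tasks, size)
    print(size, round(estimate_makespan(clusters), 2))
```

Under these assumed numbers, larger clusters shorten the makespan by paying the scheduling overhead fewer times, until cluster size exceeds what the available slots can keep busy.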