

Award ID contains: 1703560



  1. In this paper, we investigate extensions to Conflict-Free Replicated Data Types (CRDTs) that permit their use in failure-prone, heterogeneous, resource-constrained, distributed, multi-tier (cloud/edge/device) deployments such as the Internet of Things (IoT), while addressing multiple CRDT limitations. Specifically, we employ distributed logging to implement robust, strong eventual consistency of replicas. Our approach also enables uniform reversal of operations and removes the exactly-once delivery and idempotence requirements that operation-based CRDTs impose. Moreover, it exposes CRDT versions for use in debugging and history-based programming. We evaluate our approach for commonly used CRDTs and show that it achieves higher operation throughput (up to 1.8x) than conventional CRDTs for the workloads we consider. (A log-based sketch of this idea follows this list.)
    Free, publicly-accessible full text available September 1, 2023
  2. Data replication facilitates availability and recovery in a distributed environment. However, concurrent updates to multiple replicas result in divergence of data. Conflict-Free Replicated Data Types (CRDTs) are abstract data types that provide a principled approach to asynchronously reconcile this divergence. We propose a different perspective on the divergence of data, whereby we treat data divergences as versions of the data. That is, instead of treating divergence only as a problem that needs to be solved, we consider it also to be a feature that provides a way to track the versioning and evolution of data. Versioning information is helpful in multiple scenarios, such as provenance tracking and system debugging. Doing so allows us to leverage concepts such as the version tree found in the literature on persistent (versioned) data structures. We show that many techniques used in CRDTs to order elements can be derived from version trees, which predate CRDTs by more than 20 years. Using version trees for maintaining order and append-only logs for storage, we propose a method to ensure convergence of arbitrary data types while maintaining information about the evolution of data. (A version-tree sketch follows this list.)
    Free, publicly-accessible full text available April 5, 2023
  3. Data privacy has garnered significant attention recently due to diverse applications that store sensitive data in untrusted infrastructure. From a data management point of view, the focus has been on the privacy of stored data and the privacy of querying data at a large scale. However, databases are not solely query engines on static data; they must also support updates on dynamically evolving datasets. In this paper, we lay out a vision for privacy-preserving dynamic data. In particular, we focus on dynamic data that might be stored remotely on untrusted providers. Updates arrive at a provider and are verified and incorporated into the database based on predefined constraints. Depending on the application, the content of the stored data, the content of the updates, and the constraints may be private or public. We then propose PReVer, a universal framework for managing regulated dynamic data in a privacy-preserving manner. We explore a set of research challenges that PReVer needs to address in order to guarantee the privacy of data, updates, and/or constraints, and to ensure consistent and verifiable execution of updates. This expands privacy-preserving data management from the narrow perspective of private queries on static datasets to the larger space of private management of dynamic data.
  4. We introduce Canal, a programmable, topic-based publish/subscribe system that is designed for multi-tier cloud deployments (e.g., edge-cloud, multi-cloud, and IoT-cloud). Canal implements a triggered, computational (i.e., “serverless”) programming model and provides developers with a uniform and portable programming interface. To achieve scalability and reliability, Canal combines a distributed hash table (DHT) with a replica consensus protocol to distribute and replicate functions, state, and data. Canal also decouples replica placement from the DHT topology to allow developers to optimize function placement for different objectives. We evaluate Canal in a real-world multi-tier IoT deployment, comparing placement strategies, end-to-end performance, and failure recovery using both benchmarks and a real-world IoT-edge application. Our results show that Canal is able to achieve both low latency and reliability in this setting. (A placement sketch follows this list.)
  5. We present CAPLets, an authorization mechanism that extends capability-based security to support fine-grained access control for multi-scale (sensors, edge, cloud) IoT deployments. To enable this, CAPLets uses a strong cryptographic construction to provide integrity while preserving computational efficiency for resource-constrained systems. Moreover, CAPLets augments capabilities with dynamic, user-defined constraints to describe arbitrary access control policies. We introduce an application-specific, Turing-complete virtual machine, CapVM, alongside eBPF and Wasm, to describe constraints. We show that CAPLets is able to express permissions and requirements at a fine grain, facilitating construction of non-trivial access control policies. We empirically evaluate the efficiency and flexibility of CAPLets abstractions using resource-constrained devices and end-to-end IoT deployments, and compare them against related mechanisms in wide use today. Our empirical results show that CAPLets is an order of magnitude faster and more energy efficient than current IoT authorization systems. (A token-and-constraint sketch follows this list.)
  6. Co-location of processing infrastructure and IoT devices at the edge is used to reduce response latency and long-haul network use for IoT applications. As a result, edge clouds for many applications (e.g., agriculture, ecology, and smart city deployments) must operate in remote, unattended, and environmentally harsh settings, introducing new challenges. One key challenge is heat exposure, which can degrade the performance, reliability, and longevity of electronics. For edge clouds, these problems are exacerbated because they increasingly perform complex workloads, such as machine learning, to effect data-driven actuation and control of devices and systems in the environment. The goal of our work is to protect edge clouds from overheating. To enable this, we develop a heat-budget-based scheduling system, called Sparta, which leverages dynamic voltage and frequency scaling (DVFS) to adaptively control CPU temperature. Sparta takes machine learning applications, datasets, and a temperature threshold as input. It sets the initial frequency of the CPU based on historical data and then dynamically updates it, according to the applications’ execution profile and ambient temperature, to safeguard edge devices. We find that for a suite of machine learning applications and deployment temperatures, Sparta is able to maintain CPU temperature below the threshold 94% of the time while improving execution time by 1.04x to 1.32x over competitive approaches. (A control-loop sketch follows this list.)
  7. Given a private string q and a remote server that holds a set of public documents D, how can one of the K most relevant documents to q in D be selected and viewed without anyone (not even the server) learning anything about q or the document? This is the oblivious document ranking and retrieval problem. In this paper, we describe Coeus, a system that solves this problem. At a high level, Coeus composes two cryptographic primitives: secure matrix-vector product for scoring document relevance using the widely used term frequency-inverse document frequency (tf-idf) method, and private information retrieval (PIR) for obliviously retrieving documents. However, Coeus reduces the time to run these protocols, thereby improving the user-perceived latency, which is a key performance metric. Coeus first reduces the PIR overhead by separating private metadata retrieval from document retrieval, and it then scales secure matrix-vector product to tf-idf matrices with several hundred billion elements through a series of novel cryptographic refinements. For an English Wikipedia corpus of 5 million documents and a keyword dictionary with 64K keywords, running on a cluster of 143 machines on AWS, Coeus enables a user to obliviously rank and retrieve a document in 3.9 seconds, a 24x improvement over a baseline system. (A tf-idf scoring sketch follows this list.)
  8. Ever since commercial cloud offerings started appearing in 2006, the landscape of cloud computing has been undergoing remarkable changes, with the emergence of many different types of service offerings, developer productivity enhancement tools, and new application classes, as well as the manifestation of cloud functionality closer to the user at the edge. The notion of utility computing, however, has remained constant throughout this evolution: cloud users always seek to minimize the cost of leasing cloud resources while maximizing their use, and cloud providers try to maximize their profits while assuring the service-level objectives of cloud-hosted applications and keeping operational costs low. All of these outcomes require systematic and sound cloud engineering principles. The aim of this paper is to highlight the importance of cloud engineering, survey the landscape of best practices in cloud engineering and its evolution, discuss many of the existing cloud engineering advances, and identify both the inherent technical challenges and the research opportunities for the future of cloud computing in general and cloud engineering in particular.
  9. Due to the proliferation of IoT and the popularity of smart contracts mediated by blockchain, smart home systems have become capable of providing privacy and security to their occupants. In blockchain-based home automation systems, business logic is handled securely by smart contracts. However, a blockchain-based solution is inherently resource-intensive, making it unsuitable for resource-constrained IoT devices. Moreover, time-sensitive actions are difficult to perform in a blockchain-based solution due to the time required to mine a block. In this work, we propose a blockchain-independent smart contract infrastructure suitable for resource-constrained IoT devices. Our proposed method is also capable of executing time-sensitive business logic. As an example of an end-to-end application, we describe a smart camera system using our proposed method, compare this system with an existing blockchain-based solution, and present an empirical evaluation of their performance.
  10. In this paper, we investigate how to automatically persist versioned data structures in distributed settings (e.g., cloud + edge) using append-only storage. By doing so, we facilitate resiliency by enabling program state to survive across program activations and terminations, and by allowing program-level data structures and their version information to be accessed programmatically by multiple clients (for replay, provenance tracking, debugging, coordination avoidance, and more). These features are useful in distributed, failure-prone contexts such as those for heterogeneous and pervasive Internet of Things (IoT) deployments. We prototype our approach within an open-source, distributed operating system for IoT. Our results show that it is possible to achieve algorithmic complexities similar to those of in-memory versioning but in a distributed setting. (An append-only persistence sketch follows this list.)
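
For item 1, the sketch below illustrates the general idea of backing a CRDT with an append-only operation log: replicas merge by unioning logs keyed on unique operation ids, so duplicated or re-delivered messages are harmless (no exactly-once delivery or idempotence requirement on the transport), and reversal is uniform because an undo is just another logged operation. This is a minimal Python illustration under those assumptions, not the paper's implementation; all names are hypothetical.

```python
# Hypothetical log-backed counter replica (not the paper's code).
import uuid
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Op:
    op_id: str   # globally unique id; duplicates are ignored on merge
    delta: int   # +n / -n for a counter; an inverse op reverses an earlier one

@dataclass
class LoggedCounter:
    replica: str
    log: dict = field(default_factory=dict)   # op_id -> Op (append-only, set-like)

    def apply(self, delta):
        op = Op(op_id=f"{self.replica}:{uuid.uuid4()}", delta=delta)
        self.log[op.op_id] = op
        return op

    def reverse(self, op):
        # Uniform reversal: append the inverse operation instead of mutating state.
        return self.apply(-op.delta)

    def merge(self, other):
        # Idempotent, commutative, associative merge: union of logs keyed by op_id,
        # so duplicate deliveries cannot change the result.
        self.log.update(other.log)

    @property
    def value(self):
        return sum(op.delta for op in self.log.values())

if __name__ == "__main__":
    a, b = LoggedCounter("A"), LoggedCounter("B")
    op = a.apply(+5)
    b.apply(+3)
    a.reverse(op)            # undo the +5 by logging a -5
    a.merge(b); b.merge(a)   # exchange logs in either direction
    a.merge(b)               # duplicate delivery is harmless
    assert a.value == b.value == 3
```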
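For item 2, the following sketch shows one way (assumed here, not taken from the paper) that a version tree plus an append-only log can yield convergence: concurrent updates become sibling branches, log records are immutable, and a deterministic traversal of the merged tree gives every replica the same order of versions.

```python
# Illustrative version tree persisted as an append-only log of
# (version_id, parent_id, value) records.
from collections import defaultdict

class VersionTree:
    def __init__(self):
        self.log = []                      # append-only storage
        self.children = defaultdict(list)  # parent -> [version ids], derived from log

    def append(self, vid, parent, value):
        self.log.append((vid, parent, value))
        self.children[parent].append(vid)

    def merge(self, other):
        # Union of append-only logs; records are immutable, so replay is safe.
        seen = {vid for vid, _, _ in self.log}
        for vid, parent, value in other.log:
            if vid not in seen:
                self.append(vid, parent, value)
                seen.add(vid)

    def linearize(self, root=None):
        # Deterministic (lexicographic) depth-first walk: replicas holding the same
        # set of versions derive the same total order, hence convergence.
        order = []
        def walk(v):
            for child in sorted(self.children[v]):
                order.append(child)
                walk(child)
        walk(root)
        return order

t1, t2 = VersionTree(), VersionTree()
t1.append("a1", None, "x=1"); t1.append("a2", "a1", "x=2")
t2.append("a1", None, "x=1"); t2.append("b2", "a1", "x=3")   # concurrent branch
t1.merge(t2); t2.merge(t1)
assert t1.linearize() == t2.linearize()
```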
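For item 4, this sketch separates the two mechanisms the abstract names: a DHT-style consistent-hashing lookup that maps a topic to its home node, and an independent, pluggable placement policy that chooses replicas (here, the lowest-latency nodes). The node names, latencies, and policy are invented for illustration; this is not Canal's code.

```python
# Consistent-hashing lookup decoupled from a pluggable replica-placement policy.
import bisect, hashlib

def h(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((h(n), n) for n in nodes)

    def home(self, topic):
        # DHT lookup: the node whose hash is the first at or after hash(topic).
        keys = [k for k, _ in self.ring]
        return self.ring[bisect.bisect_left(keys, h(topic)) % len(self.ring)][1]

def place_replicas(topic, nodes, k, policy):
    # Placement is independent of the ring: e.g. prefer low-latency edge nodes.
    return sorted(nodes, key=lambda n: policy(topic, n))[:k]

nodes = {"edge-a": 5, "edge-b": 8, "cloud-1": 40, "cloud-2": 45}  # name -> latency (ms)
ring = Ring(nodes)
print("DHT home for 'camera/frames':", ring.home("camera/frames"))
print("replicas (lowest latency):",
      place_replicas("camera/frames", nodes, k=2, policy=lambda t, n: nodes[n]))
```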
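For item 5, the sketch below shows the general shape of a capability check with cryptographic integrity and a dynamic constraint: an HMAC protects the token, a rights list gives fine-grained permissions, and a simple time-window predicate stands in for a CapVM constraint program. The key, token fields, and constraint are assumptions for illustration, not the CAPLets wire format.

```python
# Minimal capability-token sketch: HMAC integrity + rights + dynamic constraint.
import hmac, hashlib, json, time

KEY = b"device-shared-secret"   # hypothetical shared key

def issue(resource, rights, not_after):
    body = json.dumps({"res": resource, "rights": rights, "exp": not_after}).encode()
    tag = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return body, tag

def check(body, tag, want_right, now=None):
    # 1. Integrity: reject tokens whose MAC does not verify.
    if not hmac.compare_digest(tag, hmac.new(KEY, body, hashlib.sha256).hexdigest()):
        return False
    cap = json.loads(body)
    # 2. Fine-grained rights check.
    if want_right not in cap["rights"]:
        return False
    # 3. Dynamic constraint (a stand-in for a CapVM program).
    return (now or time.time()) <= cap["exp"]

body, tag = issue("camera/0", ["read"], time.time() + 60)
assert check(body, tag, "read")
assert not check(body, tag, "write")
```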
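For item 6, this is a toy control loop in the spirit of heat-budget scheduling: pick an initial DVFS frequency from historical peak temperatures, then step the frequency down as the measured temperature nears the threshold and back up when there is headroom. The frequency table, thresholds, and temperature readings are invented; Sparta's actual policy is more involved.

```python
# Toy heat-budget DVFS loop (not Sparta's controller).
FREQS_MHZ = [600, 900, 1200, 1500]        # assumed available DVFS steps

def initial_freq(history, threshold):
    # Highest frequency whose historically observed peak stayed under threshold.
    ok = [f for f, peak in history.items() if peak < threshold]
    return max(ok) if ok else min(FREQS_MHZ)

def step(freq, temp_c, ambient_c, threshold_c, margin_c=3.0):
    i = FREQS_MHZ.index(freq)
    if temp_c >= threshold_c - margin_c and i > 0:
        return FREQS_MHZ[i - 1]           # too hot: slow down
    if temp_c < threshold_c - 3 * margin_c and i < len(FREQS_MHZ) - 1 \
            and ambient_c < threshold_c - 3 * margin_c:
        return FREQS_MHZ[i + 1]           # headroom: speed back up
    return freq

history = {600: 55.0, 900: 62.0, 1200: 71.0, 1500: 84.0}   # freq -> peak temp seen
freq = initial_freq(history, threshold=80.0)               # -> 1200 MHz
for temp in [65.0, 74.0, 79.0, 72.0, 60.0]:                # simulated readings
    freq = step(freq, temp, ambient_c=30.0, threshold_c=80.0)
    print(temp, "C ->", freq, "MHz")
```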
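For item 7, the sketch computes, in plaintext, the quantity Coeus evaluates under encryption: document relevance as a matrix-vector product between a tf-idf matrix and a query indicator vector, with the best-scoring document then retrieved (in Coeus, via PIR, which is omitted here). The corpus and dictionary are tiny stand-ins.

```python
# Plaintext tf-idf ranking as a matrix-vector product (cryptography omitted).
import math

docs = ["the cat sat on the mat", "dogs chase the cat", "rockets reach orbit"]
vocab = sorted({w for d in docs for w in d.split()})

def tfidf_matrix(docs, vocab):
    n = len(docs)
    df = {w: sum(w in d.split() for d in docs) for w in vocab}   # document frequency
    rows = []
    for d in docs:
        words = d.split()
        rows.append([(words.count(w) / len(words)) * math.log(n / df[w])
                     for w in vocab])
    return rows

def rank(query, matrix, vocab):
    q = [1.0 if w in query.split() else 0.0 for w in vocab]          # query vector
    scores = [sum(a * b for a, b in zip(row, q)) for row in matrix]  # M . q
    return max(range(len(scores)), key=scores.__getitem__)

best = rank("cat mat", tfidf_matrix(docs, vocab), vocab)
print(docs[best])   # -> "the cat sat on the mat"
```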
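For item 10, the following sketch shows the general pattern of persisting a versioned data structure to append-only storage: each update appends one record, and a new activation (or another client) reconstructs both the current state and every prior version by replaying the log. The file format and class are hypothetical, not the prototype's.

```python
# Hypothetical versioned map persisted to an append-only log file.
import json, os, tempfile

class PersistentVersionedMap:
    def __init__(self, path):
        self.path = path
        self.versions = [{}]                       # version 0 = empty map
        if os.path.exists(path):                   # replay log on (re)activation
            with open(path) as f:
                for line in f:
                    k, v = json.loads(line)
                    self._apply(k, v)

    def _apply(self, key, value):
        nxt = dict(self.versions[-1])
        nxt[key] = value
        self.versions.append(nxt)

    def put(self, key, value):
        with open(self.path, "a") as f:            # append-only write
            f.write(json.dumps([key, value]) + "\n")
        self._apply(key, value)

    def at(self, version):
        return self.versions[version]              # read any historical version

path = os.path.join(tempfile.mkdtemp(), "log.jsonl")
m = PersistentVersionedMap(path)
m.put("x", 1); m.put("x", 2)
m2 = PersistentVersionedMap(path)                  # a "new activation" replays the log
assert m2.at(1) == {"x": 1} and m2.at(2) == {"x": 2}
```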