Title: Fides: Managing Data on Untrusted Infrastructure
Significant amounts of data are currently being stored and managed on third-party servers. It is impractical for many small-scale enterprises to own private datacenters, so renting third-party servers is a viable solution for such businesses. But the increasing number of malicious attacks, both internal and external, as well as buggy software on third-party servers, is causing clients to lose their trust in these external infrastructures. While small enterprises cannot avoid using external infrastructures, they need the right set of protocols to manage their data on untrusted infrastructures. In this paper, we propose TFCommit, a novel atomic commitment protocol that executes transactions on data stored across multiple untrusted servers. To our knowledge, TFCommit is the first atomic commitment protocol to execute transactions in an untrusted environment without using expensive Byzantine replication. Using TFCommit, we propose an auditable data management system, Fides, residing completely on untrustworthy infrastructure. As an auditable system, Fides guarantees the detection of potentially malicious failures occurring on untrusted servers using tamper-resistant logs with the support of cryptographic techniques. The experimental evaluation demonstrates the scalability of our approach and the relatively low overhead of executing transactions on untrusted infrastructure.
Award ID(s):
1703560 1815733 1815212
Journal Name:
IEEE International Conference on Distributed Computing Systems
Page Range / eLocation ID:
344 to 354
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Darmont, J ; Novikov, B. ; Wrembel, R. (Ed.)
    Bitcoin [12] is a successful and interesting example of a global-scale peer-to-peer cryptocurrency that integrates many techniques and protocols from cryptography, distributed systems, and databases. The main underlying data structure is blockchain, a scalable fully replicated structure that is shared among all participants and guarantees a consistent view of all user transactions by all participants in the system. In a blockchain, nodes agree on their shared states across a large network of untrusted participants. Although originally devised for cryptocurrencies, recent systems exploit its many unique features such as transparency, provenance, fault tolerance, and authenticity to support a wide range of distributed applications. Bitcoin and other cryptocurrencies use permissionless blockchains. In a permissionless blockchain, the network is public, and anyone can participate without a specific identity. Many other distributed applications, such as supply chain management and healthcare, are deployed on permissioned blockchains consisting of a set of known, identified nodes that still might not fully trust each other. This paper illustrates some of the main challenges and opportunities from a database perspective in the many novel and interesting application domains of blockchains. These opportunities are illustrated using various examples from recent research in both permissionless and permissioned blockchains. Two main themes unite the various examples: (1) the important role of distribution and consensus in managing large-scale systems and (2) the need to tolerate malicious failures. The advent of cloud computing and large data centers shifted large-scale data management infrastructures from centralized databases to distributed systems. One of the main challenges in designing distributed systems is the need for fault tolerance. Cloud-based systems typically assume trusted infrastructures, since data centers are owned by the enterprises managing the data, and hence the design typically only assumes and tolerates crash failures. The advent of blockchain and the underlying premise that copies of the blockchain are distributed among untrusted entities has shifted the focus of fault tolerance from tolerating crash failures to tolerating malicious failures. These interesting and challenging settings pose great opportunities for database researchers.
  2. Federated learning (FL) is an increasingly popular approach for machine learning (ML) in cases where the training dataset is highly distributed. Clients perform local training on their datasets and the updates are then aggregated into the global model. Existing protocols for aggregation are either inefficient or do not consider the case of malicious actors in the system. This is a major barrier in making FL an ideal solution for privacy-sensitive ML applications. We present ELSA, a secure aggregation protocol for FL that breaks this barrier: it is efficient and addresses the existence of malicious actors at the core of its design. Similar to prior work on Prio and Prio+, ELSA provides a novel secure aggregation protocol built out of distributed trust across two servers that keeps individual client updates private as long as one server is honest, defends against malicious clients, and is efficient end-to-end. Compared to prior works, the distinguishing theme in ELSA is that instead of the servers generating cryptographic correlations interactively, the clients act as untrusted dealers of these correlations without compromising the protocol’s security. This leads to a much faster protocol while also achieving stronger security at that efficiency compared to prior work. We introduce new techniques that retain privacy even when a server is malicious at a small added cost of 7-25% in runtime, with a negligible increase in communication over the case of a semi-honest server. Our work improves end-to-end runtime over prior work with similar security guarantees by large margins: single-aggregator RoFL by up to 305x (for the models we consider), and distributed-trust Prio by up to 8x.
  3. Increasing System-on-Chip (SoC) design complexity, coupled with time-to-market constraints, has motivated manufacturers to integrate several third-party Intellectual Property (IP) cores in their SoC designs. IPs acquired from potentially untrusted vendors can be a serious threat to the trusted IPs when they are connected using the same Network-on-Chip (NoC). For example, the malicious IPs can tamper with packets as well as degrade SoC performance by launching DoS attacks. While existing authentication schemes can check the data integrity of packets, they can introduce unacceptable overhead on resource-constrained SoCs. In this paper, we propose a lightweight and trust-aware routing mechanism to bypass malicious IPs during packet transfers. This reduces the number of re-transmissions due to tampered data, minimizes DoS attack risk, and as a result, improves SoC performance even in the presence of malicious IPs. Experimental results demonstrate significant improvement in both performance and energy efficiency with minor impact on area overhead.
  4. The CS Education community has developed many educational tools in recent years, such as interactive exercises. Often the developer makes them freely available for use, hosted on their own server, and usually they are directly accessible within the instructor's LMS through the LTI protocol. As convenient as this can be, instructors using these third-party tools for their courses can experience issues related to data access and privacy concerns. The tools typically collect clickstream data on student use. But they might not make it easy for the instructor to access these data, and the institution might be concerned about privacy violations. While the developers might allow and even support local installation of the tool, this can be a difficult process unless the tool is carefully designed for third-party installation. Integration of small tools within larger frameworks (like a type of interactive exercise within an eTextbook framework) is also difficult without proper design. This paper describes an ongoing containerization effort for the OpenDSA eTextbook project. Our goal is both to serve our needs by creating an easier-to-manage decomposition of the many tools and sub-servers required by this complex system, and also to provide an easily installable production environment that instructors can run locally. This new system provides better access to developer-level data analysis tools and potentially removes many FERPA-related privacy concerns. We also describe our efforts to integrate Caliper Analytics into OpenDSA to expand the data collection and analysis services. We hope that our containerization architecture can help provide a roadmap for similar projects to follow.
  5. Wang, H. (Ed.)
    Once upon a time databases were structured, one size fitted all, and they resided on machines that were trustworthy; even when they failed, they simply crashed. This era has come and gone, as eloquently stated by Stonebraker and Cetintemel [16]. We now have key-value stores, graph databases, text databases, and a myriad of unstructured data repositories. The database community has wholeheartedly accepted the fact that the same information might come in different formats, modes and representations. We also accept that data might not be "clean" and that data might need to be "cleaned" due to the diverse sources of information. However, we, as a database community, still cling to our 20th century belief that databases always reside on trustworthy, honest servers. Although the database community has always considered fault-tolerance as an integral building block of data management (remember, the "D" in ACID is for Durability), we still have trouble accepting the fact that not all failures are simply crash failures and might in fact involve malicious and non-trustworthy infrastructure. This notion has been challenged and abandoned by many other Computer Science communities, most notably the security and the distributed systems communities. The rise of the cloud computing paradigm as well as the rapid popularity of blockchains demand a rethinking of our naïve, comfortable beliefs in an ideal benign infrastructure. In the cloud, clients store their sensitive data in remote servers owned and operated by cloud providers. The Security and Crypto Communities have made significant inroads to protect both data and access privacy from malicious untrusted storage providers using encryption and oblivious data stores. The Distributed Systems and the Systems Communities have developed consensus protocols to ensure the fault-tolerant maintenance of data residing on untrusted, malicious infrastructure. However, these solutions face significant scalability and performance challenges when incorporated in large-scale data repositories. Novel database designs need to directly address the natural tension between performance, fault-tolerance and trustworthiness. This is a perfect setting for the database community to lead and guide.