
Title: FalconDB: Blockchain-based Collaborative Database
Nowadays, an emerging class of applications is based on collaboration over a shared database among different entities. However, existing solutions for shared databases may require trust in other parties, demand hardware that is unaffordable for individual users, or deliver relatively low performance. In other words, there is a trilemma among security, compatibility and efficiency. In this paper, we present FalconDB, which enables different parties with limited hardware resources to efficiently and securely collaborate on a database. FalconDB adopts database servers with verification interfaces accessible to clients and stores the digests for query/update authentication on a blockchain. Using the blockchain as a consensus platform and a distributed ledger, FalconDB is able to work without any mutual trust among the parties. Meanwhile, FalconDB requires only minimal storage cost on each client, and provides anywhere-available, real-time and concurrent access to the database. As a result, FalconDB overcomes the disadvantages of previous solutions, and enables individual users to participate in the collaboration with high efficiency, low storage cost and blockchain-level security guarantees.
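To make the verification interface described above concrete, here is a minimal Python sketch of the client-side check the abstract implies: the server answers a query with a record plus a Merkle proof, and the client recomputes the root and compares it against the digest it reads from the blockchain. The function names, the proof format, and the two-record example are illustrative assumptions only, not FalconDB's actual API.

# Hypothetical sketch of the verification flow: result + Merkle proof from the
# server, digest (root) fetched from the blockchain by the client.
import hashlib

def h(data: bytes) -> bytes:
    # SHA-256 for both leaves and internal nodes (illustrative choice)
    return hashlib.sha256(data).digest()

def leaf_hash(record: bytes) -> bytes:
    # Domain-separate leaf hashes from internal-node hashes
    return h(b"leaf:" + record)

def verify_record(record: bytes, proof, root_on_chain: bytes) -> bool:
    # Recompute the Merkle root from the record and its sibling path, then
    # compare it with the digest the client fetched from the blockchain.
    node = leaf_hash(record)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root_on_chain

# Tiny usage example with a two-record table.
records = [b"alice,100", b"bob,200"]
leaves = [leaf_hash(r) for r in records]
root = h(leaves[0] + leaves[1])        # digest the writers publish on the blockchain
proof_for_bob = [(leaves[0], True)]    # sibling path returned by the server
assert verify_record(b"bob,200", proof_for_bob, root)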
Award ID(s):
1718834
NSF-PAR ID:
10189591
Journal Name:
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
Page Range or eLocation-ID:
637 to 652
Sponsoring Org:
National Science Foundation
More Like This
  1. Obeid, Iyad; Selesnick, Ivan; Picone, Joseph (Ed.)
    The goal of this work was to design a low-cost computing facility that can support the development of an open source digital pathology corpus containing 1M images [1]. A single image from a clinical-grade digital pathology scanner can range in size from hundreds of megabytes to five gigabytes. A 1M image database requires over a petabyte (PB) of disk space. To do meaningful work in this problem space requires a significant allocation of computing resources. The improvements and expansions to our HPC (high-performance computing) cluster, known as Neuronix [2], required to support working with digital pathology fall into two broad categories: computation and storage. To handle the increased computational burden and increase job throughput, we are using Slurm [3] as our scheduler and resource manager. For storage, we have designed and implemented a multi-layer filesystem architecture to distribute a filesystem across multiple machines. These enhancements, which are entirely based on open source software, have extended the capabilities of our cluster and increased its cost-effectiveness. Slurm has numerous features that allow it to generalize to a number of different scenarios. Among the most notable is its support for GPU (graphics processing unit) scheduling. GPUs can offer a tremendous performance increase in machine learning applications [4], and Slurm's built-in mechanisms for handling them were a key factor in making this choice. Slurm has a general resource (GRES) mechanism that can be used to configure and enable support for resources beyond the ones provided by the traditional HPC scheduler (e.g. memory, wall-clock time), and GPUs are among the GRES types that can be supported by Slurm [5]. In addition to being able to track resources, Slurm strictly enforces resource allocation. This becomes very important as the computational demands of the jobs increase, so that jobs have all the resources they need and do not take resources from other jobs. It is a common practice among GPU-enabled frameworks to query the CUDA runtime library/drivers and iterate over the list of GPUs, attempting to establish a context on all of them. Slurm is able to affect the hardware discovery process of these jobs, which enables a number of these jobs to run alongside each other, even if the GPUs are in exclusive-process mode. To store large quantities of digital pathology slides, we developed a robust, extensible distributed storage solution. We utilized a number of open source tools to create a single filesystem, which can be mounted by any machine on the network. At the lowest layer of abstraction are the hard drives, which were split into four 60-disk chassis using 8TB drives. To support these disks, we have two server units, each equipped with Intel Xeon CPUs and 128GB of RAM. At the filesystem level, we have implemented a multi-layer solution that: (1) connects the disks together into a single filesystem/mountpoint using ZFS (the Zettabyte File System) [6], and (2) connects filesystems on multiple machines together to form a single mountpoint using Gluster [7]. ZFS, initially developed by Sun Microsystems, provides disk-level awareness and a filesystem which takes advantage of that awareness to provide fault tolerance. At the filesystem level, ZFS protects against data corruption and the infamous RAID write-hole bug by implementing a journaling scheme (the ZFS intent log, or ZIL) and copy-on-write functionality.
Each machine (1 controller + 2 disk chassis) has its own separate ZFS filesystem. Gluster, essentially a meta-filesystem, takes each of these and provides the means to connect them together over the network, using distributed (similar to RAID 0, but without striping individual files) and mirrored (similar to RAID 1) configurations [8]. By implementing these improvements, it has been possible to expand the storage and computational power of the Neuronix cluster arbitrarily by scaling horizontally, supporting the most computationally intensive endeavors. We have greatly improved the scalability of the cluster while maintaining its excellent price/performance ratio [1].
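As a quick sanity check on the storage numbers quoted in this abstract (four 60-disk chassis of 8TB drives, and a corpus requiring over a petabyte), the short Python calculation below works out the raw capacity and how a distributed versus a 2-way mirrored Gluster layout would affect it. The mirroring factor and the omission of ZFS/filesystem overhead are simplifying assumptions, not details taken from the paper.

# Back-of-the-envelope capacity check for the cluster described above.
TB_PER_DISK = 8
DISKS_PER_CHASSIS = 60
NUM_CHASSIS = 4

raw_tb = TB_PER_DISK * DISKS_PER_CHASSIS * NUM_CHASSIS   # 1920 TB raw
distributed_tb = raw_tb        # RAID 0-like distribution: capacities simply add
mirrored_tb = raw_tb // 2      # assumed 2-way mirror (RAID 1-like): capacity is halved

print(f"raw capacity:        {raw_tb} TB (~{raw_tb / 1000:.2f} PB)")
print(f"distributed layout:  {distributed_tb} TB usable before filesystem overhead")
print(f"2-way mirror layout: {mirrored_tb} TB usable before filesystem overhead")
# A >1 PB corpus fits within the ~1.92 PB raw envelope, but a full 2-way mirror
# of the whole corpus would not, which illustrates why the layout choice matters.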
  2. To facilitate the adoption of cloud by organizations, Cryptographic Access Control (CAC) is the obvious solution to control data sharing among users while preventing partially trusted Cloud Service Providers (CSPs) from accessing sensitive data. Indeed, several CAC schemes have been proposed in the literature. Despite their differences, available solutions are based on a common set of entities (e.g., a data storage service or a proxy mediating the access of users to encrypted data) that operate in different security domains (e.g., on-premise or the CSP). However, the majority of these CAC schemes assumes a fixed assignment of entities to domains; this has security and usability implications that are not made explicit and can make the use of a CAC scheme inappropriate in certain scenarios with specific trust assumptions and requirements. For instance, assuming that the proxy runs at the premises of the organization avoids the vendor lock-in effect but may give rise to other security concerns (e.g., malicious insiders). To the best of our knowledge, no previous work considers how to select the best possible architecture (i.e., the assignment of entities to domains) to deploy a CAC scheme for the trust assumptions and requirements of a given scenario. In this article, we propose a methodology to assist administrators in exploring different architectures for the enforcement of CAC schemes in a given scenario. We do this by identifying the possible architectures underlying the CAC schemes available in the literature and formalizing them in simple set theory. This allows us to reduce the problem of selecting the most suitable architectures satisfying a heterogeneous set of trust assumptions and requirements arising from the considered scenario to a decidable Multi-objective Combinatorial Optimization Problem (MOCOP) for which state-of-the-art solvers can be invoked. Finally, we show how we use the capability of solving the MOCOP to build a prototype tool that assists administrators in preliminarily performing a “What-if” analysis to explore the trade-offs among the various architectures, and then in using available standards and tools (such as TOSCA and Cloudify) for automated deployment in multiple CSPs.
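The core reduction described above (assign entities to domains, discard assignments that violate trust requirements, then compare the survivors on several objectives) can be illustrated with a toy enumeration. In the Python sketch below, the entities, domains, the single requirement, and the two objectives are all made up for illustration; the paper's actual set-theoretic formalization and MOCOP encoding are richer than this.

# Toy enumeration of entity-to-domain assignments, filtered by a trust
# requirement and scored on two objectives, in the spirit of the MOCOP above.
from itertools import product

ENTITIES = ["proxy", "storage"]
DOMAINS = ["on-premise", "CSP"]

def violates(assignment, requirements):
    # Each requirement says an entity must (allowed=True) or must not
    # (allowed=False) run in a given domain.
    for entity, domain, allowed in requirements:
        if allowed and assignment[entity] != domain:
            return True
        if not allowed and assignment[entity] == domain:
            return True
    return False

def objectives(assignment):
    # Two toy objectives to minimize: on-premise hardware footprint and
    # the number of entities exposed to the CSP.
    on_prem_cost = sum(1 for d in assignment.values() if d == "on-premise")
    csp_exposure = sum(1 for d in assignment.values() if d == "CSP")
    return (on_prem_cost, csp_exposure)

# Example requirement: the proxy must not run at the CSP.
requirements = [("proxy", "CSP", False)]

candidates = []
for combo in product(DOMAINS, repeat=len(ENTITIES)):
    assignment = dict(zip(ENTITIES, combo))
    if not violates(assignment, requirements):
        candidates.append((objectives(assignment), assignment))

# Print the feasible architectures, best objective scores first.
for score, assignment in sorted(candidates, key=lambda c: c[0]):
    print(score, assignment)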
  3. 1. Description of the objectives and motivation for the contribution to ECE education: The demand for wireless data transmission capacity is increasing rapidly, and this growth is expected to continue due to the ongoing prevalence of cellular phones and new and emerging bandwidth-intensive applications that encompass high-definition video, unmanned aerial systems (UAS), intelligent transportation systems (ITS) including autonomous vehicles, and others. Meanwhile, vital military and public safety applications also depend on access to the radio frequency spectrum. To meet these demands, the US federal government is beginning to move from the proven but inefficient model of exclusive frequency assignments to a more efficient, shared-spectrum approach in some bands of the radio frequency spectrum. A STEM workforce that understands the radio frequency spectrum and applications that use the spectrum is needed to further increase spectrum efficiency and the cost-effectiveness of wireless systems over the next several decades, to meet anticipated and unanticipated increases in wireless data capacity. 2. Relevant background, including literature search examples if appropriate: CISCO Systems' annual survey indicates continued strong growth in demand for wireless data capacity. Meanwhile, undergraduate electrical and computer engineering courses in communication systems, electromagnetics, and networks tend to emphasize mathematical and theoretical fundamentals and higher-layer protocols, with less focus on fundamental concepts that are more specific to radio frequency wireless systems, including the physical and media access control layers of wireless communication systems and networks. An efficient way is needed to introduce basic RF system and spectrum concepts to undergraduate engineering students in courses such as those mentioned above who are unable to take, or had not planned to take, a full course in radio frequency / microwave engineering or wireless systems and networks. We have developed a series of interactive online modules that introduce concepts fundamental to wireless communications, the radio frequency spectrum, and spectrum sharing, and seek to present these concepts in context. The modules include interactive, JavaScript-based simulation exercises intended to reinforce the concepts that are presented in the modules through narrated slide presentations, text, and external links. Additional modules in development will introduce advanced undergraduate and graduate students and STEM professionals to configuration and programming of adaptive frequency-agile radios and spectrum management systems that can operate efficiently in congested radio frequency environments. Simulation exercises developed for the advanced modules allow both manual and automatic control of simulated radio links in timed, game-like simulations, and some exercises will enable students to select from among multiple pre-coded controller strategies and optionally edit the code before running the timed simulation. Additionally, we have developed infrastructure for running remote laboratory experiments that can also be embedded within the online modules, including a web-based user interface, an experiment management framework, and software defined radio (SDR) application software that runs in a wireless testbed initially developed for research.
Although these experiments rely on limited hardware resources and introduce additional logistical considerations, they provide additional realism that may further challenge and motivate students. 3. Description of any assessment methods used to evaluate the effectiveness of the contribution: Each set of modules is preceded and followed by a survey. Each individual module is preceded by a quiz and followed by another quiz, with pre- and post-quiz questions drawn from the same pool. The pre-surveys allow students to opt in or out of having their survey and quiz results used anonymously in research. 4. Statement of results: The initial modules have been and are being used by three groups of students: (1) students in an undergraduate Introduction to Communication Systems course; (2) an interdisciplinary group of engineering students, including computer science students, who are participating in a related undergraduate research project; and (3) students in a graduate-level communications course that includes both electrical and computer engineers. Analysis of results from the first group of students showed statistically significant increases from pre-quiz to post-quiz for each of four modules on fundamental wireless communication concepts. Results for the other students have not yet been analyzed, but also appear to show substantial pre-quiz to post-quiz increases in mean scores.
  4. Data outsourcing is a promising technical paradigm to facilitate cost-effective real-time data storage, processing, and dissemination. In such a system, a data owner proactively pushes a stream of data records to a third-party cloud server for storage, which in turn processes various types of queries from end users on the data owner’s behalf. This paper considers outsourced multi-version key-value stores that have gained increasing popularity in recent years, where a critical security challenge is to ensure that the cloud server returns both authentic and fresh data in response to end users’ queries. Despite several recent attempts at authenticating data freshness in outsourced key-value stores, they either incur excessively high communication cost or can only offer very limited real-time guarantees. To fill this gap, this paper introduces KV-Fresh, a novel freshness authentication scheme for outsourced key-value stores that offers a strong real-time guarantee. KV-Fresh is designed based on a novel data structure, the Linked Key Span Merkle Hash Tree, which enables highly efficient freshness proofs by embedding chaining relationships among records generated at different times. Detailed simulation studies using a synthetic dataset generated from real data confirm the efficacy and efficiency of KV-Fresh.
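As a greatly simplified stand-in for the chaining idea behind KV-Fresh, the Python sketch below commits each epoch's updates to a digest that also covers the previous epoch's digest, so an answer built only from an old epoch cannot be made to match the latest digest the data owner publishes. The real Linked Key Span Merkle Hash Tree organizes keys and time far more efficiently than this; all names and the encoding below are illustrative assumptions.

# Simplified hash-chained epoch digests illustrating the freshness idea.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def epoch_digest(prev_digest: bytes, updates: dict) -> bytes:
    # Commit to this epoch's key-value updates and to the previous epoch's digest.
    payload = b"".join(k.encode() + b"=" + v.encode() for k, v in sorted(updates.items()))
    return h(prev_digest + payload)

# Data owner side: two epochs of updates; the latest digest is what end users trust.
d0 = epoch_digest(b"genesis", {"alice": "100"})
d1 = epoch_digest(d0, {"alice": "150"})   # latest published digest

# End-user side: a server replaying only epoch 0 ("alice=100 is current")
# cannot produce a chain that ends in the latest digest d1.
stale = epoch_digest(b"genesis", {"alice": "100"})
assert stale == d0 and stale != d1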