Title: Hydra: A Scalable Decentralized P2P Storage Federation for Large Scientific Datasets
The increasingly collaborative and distributed nature of scientific collaborations, along with the exploding volume and variety of datasets, points to an urgent need for data publication frameworks that allow researchers to publish data rapidly and reliably. However, current scientific data publication solutions support only one of these requirements at a time. Currently, the most common data publication models are either centralized or ad-hoc. While the centralized model (e.g., publishing via a repository controlled by a central organization) can provide reliability through replication, publication tends to be slower due to the inevitable curation and processing delays. Further, such centralized models may place restrictions on what data can be published through them. In contrast, ad-hoc models raise concerns such as the lack of replication and of a robust security model. We present Hydra, a peer-to-peer, decentralized storage system that enables decentralized and reliable data publication. Hydra enables collaborating organizations to create a loosely interconnected and federated storage overlay atop community-provided storage servers. The Hydra overlay is entirely decentralized. Hydra enables secure publication of and access to data from anywhere and ensures automatic replication of published data, enhancing availability and reliability. Hydra also makes replication decisions without a central controller while accommodating local policies. Hydra embodies a significant stride toward next-generation scientific data management, fostering a decentralized, reliable, and accessible system that fits the changing landscape of scientific collaborations.
Award ID(s):
2430341 2126148 2019012
PAR ID:
10647726
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
810 to 816
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reliability enhancement of microgrids is challenged by environmental and operational failures. Centrally controlled microgrids are highly susceptible to failure because of a single point of failure, e.g., the central controller. True decentralization of microgrid architecture entails elimination of the central controller, attaining a parallel configuration for the system. This paper proposes a decentralized microgrid control architecture as a solution for reliability degradation over time and analyzes the reliability aspects of centralized and decentralized control architectures for microgrids. The degree of importance of a single controller in centralized and decentralized architectures is determined and validated by Markov Chain Models (MCM). Results confirm that higher reliability is achieved when true decentralization of the control architecture is adopted. Challenges of implementing a truly decentralized control architecture are discussed. Hardware-In-the-Loop simulation results for microgrid controller failure scenarios for both architectures are presented and discussed.
  2. Biophysics research is exciting because physical approaches to biology can provide novel insights, and it is challenging because it requires knowledge and skills from multiple disciplines. We have developed an undergraduate biophysics laboratory module that teaches fundamental skills such as time-lapse microscopy, image analysis, programming, critical reading of scientific literature, and basics of scientific writing and peer review. The module is accessible to students who are familiar with introductory statistics, cell biology, and differential calculus. We used published research on the biomechanics of Hydra mouth opening as a framework because it describes a stunning biological phenomenon: Hydra, a freshwater polyp, generates a multicell-wide mouth opening in an otherwise closed epithelium through extreme cell deformations within seconds. This publication was co-first authored by an undergraduate and was featured in the public press, thus providing multiple anchors that make the research accessible and motivating to undergraduates. Students start with a critical reading and discussion of the publication and then execute some of the experiments and analysis from the publication, thereby learning fluorescence time-lapse microscopy and image analysis by using ImageJ and/or MATLAB. Students quantify the kinematics of the tissue deformations during mouth opening and compare their data to the literature. The module culminates in the students writing a short paper about their results following the microPublication journal style, a blinded peer review, and final paper submission. Here, we describe one possible implementation of the module with the necessary resources to reproduce it and summarize student feedback from a pilot run. We also provide suggestions for more advanced exercises and for using Python for data analysis.
Several students expressed that repeating a published study completed by an undergraduate inspired and motivated them, thus creating buy-in and assurance that they can do it, which we expect to help with confidence and retention. 
  3. As society accelerates its transition to the Internet of Things, billions of IoT devices are now linked to the network. While these gadgets provide enormous convenience, they generate a volume of data that already exceeds the network’s capacity. To make matters worse, the data acquired by sensors on such IoT devices also include sensitive user data that must be treated appropriately. At the moment, the answer is to provide hub services for data storage in data centers. However, when data is housed in a centralized data center, data owners lose control of the data, since data centers are centralized solutions that rely on data owners’ faith in the service provider. In addition, although edge computing enables edge devices to collect, analyze, and act closer to the data source, data privacy near the edge remains a tough nut to crack. Numerous leaks of user information at both IoT hubs and the edge have left such systems untrusted. Accordingly, building a decentralized IoT system near the edge and bringing real trust to the edge is indispensable. To eliminate the need for a centralized data hub, we present a prototype of a unique, secure, and decentralized IoT framework called Reja, which is built on a permissioned Blockchain and an intrusion-tolerant messaging system, ChiosEdge, whose critical components are reliable broadcast and BFT consensus. We evaluated the latency and throughput of Reja and its sub-module ChiosEdge.
  4. The InterPlanetary File System (IPFS) is a pioneering effort for Web 3.0, well-known for its decentralized infrastructure. However, some recent studies have shown that IPFS exhibits a high degree of centralization and has integrated centralized components for improved performance. While this change contradicts the core decentralized ethos of IPFS and risks lowering the data replication level and thus availability, it also opens opportunities for better data management and cost savings through deduplication. To explore these challenges and opportunities, we start by collecting an extensive dataset of IPFS internal traffic spanning the last three years with 20+ billion messages. By analyzing this long-term trace, we obtain a more complete and accurate view of how the status of centralization evolves over an extended period. In particular, our study reveals that (1) IPFS shows a low replication level, with only 2.71% of data files replicated more than 5 times, and while increasing replication enhances lookup performance and data availability, it adversely affects download throughput due to the overhead involved in managing peer connections; (2) there is a clear growing trend in centralization within IPFS over the last 3 years, with just 5% of peers now hosting over 80% of the content, down significantly from 21.38% 3 years ago, which is largely driven by the increase in cloud nodes; and (3) the default deduplication strategy of IPFS using Fixed-Size Chunking (FSC) is largely inefficient, especially with the default 256KB chunk size, detecting near-zero duplication. Although Content-Defined Chunking (CDC) with smaller chunks could save ∼1.8 petabytes (PB) of storage space, it could impact user performance negatively. We thus design and evaluate a new metadata format that optimizes deduplication without compromising performance.
  5. This paper presents a novel framework for creating a recoverable rare disease patient identity system using blockchain and smart contracts, decentralized identifiers (DIDs), and the InterPlanetary File System (IPFS). Smart contracts are executable code that can be written into decentralized storage such as blockchains in order to enable tamper-proof transactions of data. DIDs provide a secure, decentralized, and extensible way to create, store, and manage digital identities, while IPFS provides a distributed, immutable, and secure storage system for patient identities. Utilizing these technologies with smart contracts, we created a framework to store persistent medical records of patients. Smart contracts additionally allow account recovery without the use of any centralized authority. The framework enables healthcare providers to securely access a patient's data while maintaining the patient's ownership of their data. The paper explores the advantages of using a decentralized identity system and highlights the potential of this approach to improve the security and universality of medical records for patients with rare diseases. 