Title: HDMF: Hierarchical Data Modeling Framework for Modern Science Data Standards
A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these various components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also supports optimized data storage via chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice to design NWB 2.0, a modern data standard for collaborative science across the neurophysiology community.
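The layering described in the abstract is visible in the hdmf Python package itself. The following is a minimal sketch, assuming the hdmf package and its DynamicTable, VectorData, H5DataIO, and HDF5IO classes (the names come from the library, but exact signatures vary by version, and the file name, column name, and values are purely illustrative): a container is built through the data API, chunking and compression options are passed to the HDF5 backend through a wrapper object, and the object-mapping machinery handles translation to storage on write and lazy loading on read.

    from hdmf.common import DynamicTable, VectorData, get_manager   # data API layer
    from hdmf.backends.hdf5 import HDF5IO, H5DataIO                 # storage layer

    # Build a container through the data API. H5DataIO wraps the raw values with
    # backend-specific storage options (chunking, compression) without changing
    # how the data API sees the column.
    response_time = VectorData(
        name='response_time',
        description='per-trial response time in seconds',
        data=H5DataIO([0.1 * i for i in range(1000)],
                      compression='gzip', chunks=(250,)),
    )
    trials = DynamicTable(name='trials', description='per-trial metadata',
                          columns=[response_time])

    # The type manager's object mapping translates the container to the storage
    # schema on write; the backend is swappable without touching the data API.
    with HDF5IO('trials.h5', manager=get_manager(), mode='w') as io:
        io.write(trials)

    # Lazy load: datasets come back as handles and are only read when sliced.
    with HDF5IO('trials.h5', manager=get_manager(), mode='r') as io:
        table = io.read()
        first_ten = table['response_time'][:10]

Because the storage options live in the H5DataIO wrapper and the mapping is handled by the type manager, the same container-building code could target a different backend without modification, which is the insulation between components that the abstract describes.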
Award ID(s):
1816577
PAR ID:
10177970
Author(s) / Creator(s):
Date Published:
Journal Name:
2019 IEEE International Conference on Big Data (Big Data)
Page Range / eLocation ID:
165 to 179
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In the era of data-intensive computing, large-scale applications in both the scientific and BigData communities demonstrate unique I/O requirements, leading to a proliferation of different storage devices and software stacks, many of which have conflicting requirements. Further, new hardware technologies and system designs create a hierarchical composition that may be ideal for computational storage operations. In this article, we investigate how to support a wide variety of conflicting I/O workloads under a single storage system. We introduce the idea of a Label, a new data representation, and present LABIOS: a new, distributed, Label-based I/O system. LABIOS boosts I/O performance by up to 17× via asynchronous I/O, supports heterogeneous storage resources, offers storage elasticity, and promotes in situ analytics and software-defined storage support via data provisioning. LABIOS demonstrates the effectiveness of storage bridging to support the convergence of HPC and BigData workloads on a single platform.
  2. In collaboration with the Center for Microbiome Analysis through Island Knowledge and Investigations (C-MĀIKI), the Hawaii EPSCoR Ike Wai project and the Hawaii Data Science Institute, a new science gateway, the C-MĀIKI gateway, was developed to support modern, interoperable and scalable microbiome data analysis. This gateway provides a web-based interface for accessing high-performance computing resources and storage to enable and support reproducible microbiome data analysis. The C-MĀIKI gateway is accelerating the analysis of microbiome data for Hawaii through ease of use and centralized infrastructure. 
  3. To bridge the giant semantic gap between applications and modern storage systems, passing a small piece of useful information, called an I/O access hint, from upper layers to the storage layer may greatly improve application performance and ease data management in storage systems. This is especially true for heterogeneous storage systems that consist of multiple types of storage devices. Since ingesting external access hints will likely involve laborious modifications of legacy I/O stacks, it is very hard to evaluate the effects of access hints and take advantage of them. In this article, we design a generic and flexible framework, called HintStor, to quickly play with a set of I/O access hints and evaluate their impacts on heterogeneous storage systems. HintStor provides a new application/user-level interface and a file system plugin, and performs data management with a generic block storage data manager. We demonstrate the flexibility of HintStor by evaluating four types of access hints: file system data classification, stream ID, cloud prefetch, and I/O task scheduling on a Linux platform. The results show that HintStor can execute and evaluate various I/O access hints under different scenarios with minor modifications to the kernel and applications. (A generic, POSIX-level illustration of an access hint appears after this list.)
  4. O-RAN establishes an advanced radio access network (RAN) architecture that supports interoperable, multi-vendor, and artificial intelligence (AI)-controlled wireless access networks. The unique components, interfaces, and technologies of O-RAN differentiate it from the 3GPP RAN. Because O-RAN supports 3GPP protocols, currently 4G and 5G, while offering additional network interfaces and controllers, it has a larger attack surface. The O-RAN security requirements, vulnerabilities, threats, and countermeasures must be carefully assessed for it to become a platform for 5G Advanced and future 6G wireless. This article presents the ongoing standardization activities of the O-RAN Alliance for modeling the potential threats to the network and, in particular, to the open fronthaul interface. We identify end-to-end security threats and discuss those on the open fronthaul in more detail. We then provide recommendations for countermeasures to tackle the identified security risks and encourage industry to establish standards and best practices for safe and secure implementations of the open fronthaul interface.
  5. In today’s Big Data era, data scientists require modern workflows to quickly analyze large-scale datasets using complex codes to maintain the rate of scientific progress. These scientists often rely on available campus resources or off-the-shelf computational systems for their applications. Unified infrastructure or over-provisioned servers can quickly become bottlenecks for specific tasks, wasting time and resources. Composable infrastructure helps solve these problems by providing users with new ways to increase resource utilization. Composable infrastructure disaggregates a computer’s components – CPU, GPU (accelerators), storage and networking – into fluid pools of resources, but typically relies upon infrastructure engineers to architect individual machines. Infrastructure is managed with specialized command-line utilities, user interfaces, or specification files. These management models are cumbersome and difficult to incorporate into data-science workflows. We developed a high-level software API, Composastructure, which, when integrated into modern workflows, can be used by infrastructure engineers as well as data scientists to reorganize composable resources on demand. Composastructure enables infrastructures to be programmable, secure, persistent and reproducible. Our API composes machines, frees resources, supports multi-rack operations, and includes a Python module for Jupyter Notebooks.
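Following up on item 3: the abstract does not spell out HintStor's own application/user-level interface, so the sketch below is only a generic, POSIX-level illustration of what an I/O access hint is, using Python's os.posix_fadvise (available on Unix/Linux); the file name is hypothetical and this is not HintStor's API.

    import os

    fd = os.open('dataset.bin', os.O_RDONLY)   # hypothetical data file
    try:
        # Hint: the file will be read sequentially, so aggressive readahead helps.
        # A length of 0 means "from offset to end of file".
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        chunk = os.read(fd, 1 << 20)            # read the first 1 MiB
        # Hint: these pages will not be reused; drop them from the page cache.
        os.posix_fadvise(fd, 0, len(chunk), os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

HintStor's contribution is making richer hints than these (file system data classification, stream IDs, cloud prefetch, I/O task scheduling) cheap to prototype and evaluate across a heterogeneous storage stack rather than hard-wiring them into legacy I/O layers.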