The National Science Foundation’s Arctic Data Center is the primary data repository for NSF-funded research conducted in the Arctic. Discovering and interpreting resources in a repository containing data as heterogeneous and interdisciplinary as those in the Arctic Data Center poses major challenges. This paper reports on advances in cyberinfrastructure at the Arctic Data Center that help address these issues by leveraging semantic technologies, which enhance the repository’s adherence to the FAIR data principles and improve the Findability, Accessibility, Interoperability, and Reusability of its digital resources. We describe the Arctic Data Center’s improvements, which use semantic annotation to bind metadata about Arctic data sets to concepts in web-accessible ontologies. The semantic annotation mechanism is accompanied by an extended search interface that increases the findability of data by allowing users to search for specific, broader, and narrower meanings of measurement descriptions, as well as for potential synonyms. Drawing on research carried out by the DataONE project, we evaluated the potential impact of this approach on the accessibility, interoperability, and reusability of measurement data. Arctic research often benefits from additional data, typically from multiple, heterogeneous sources, that complement and extend the spatial, temporal, or thematic bases for understanding Arctic phenomena. These relevant data resources must be 'found' and 'harmonized' prior to integration and analysis. The findings of a case study indicated that semantic annotation of measurement data enhances researchers' ability to accomplish these tasks.
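The broadened-search mechanism the abstract describes can be sketched in a few lines. This is a minimal illustration only: the concept graph, labels, and synonyms below are invented stand-ins, not the actual ontologies or vocabulary used by the Arctic Data Center.

```python
# Toy SKOS-style concept graph: each concept lists its broader concepts
# and alternate labels (synonyms). Illustrative only -- not the actual
# ontologies used by the Arctic Data Center.
BROADER = {
    "snow depth": ["snow measurement"],
    "snow water equivalent": ["snow measurement"],
    "snow measurement": ["cryosphere measurement"],
}
SYNONYMS = {
    "snow water equivalent": ["SWE"],
}

def narrower(concept):
    """Concepts that list the given concept among their broader concepts."""
    return [c for c, parents in BROADER.items() if concept in parents]

def expand_query(term):
    """Expand a search term to broader, narrower, and synonym labels."""
    terms = {term}
    terms.update(BROADER.get(term, []))   # broader meanings
    terms.update(narrower(term))          # narrower meanings
    terms.update(SYNONYMS.get(term, []))  # potential synonyms
    return sorted(terms)

print(expand_query("snow measurement"))
```

A real implementation would resolve terms against web-accessible ontologies (e.g., via SPARQL over `skos:broader`/`skos:narrower` relations) rather than an in-memory dictionary, but the query-expansion step is the same shape.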
Rethinking Data Management Systems for Disaggregated Data Centers
One recent trend of cloud data center design is resource disaggregation. Instead of having server units with “converged” compute, memory, and storage resources, a disaggregated data center (DDC) has pools of resources of each type connected via a network. While the systems community has been investigating the research challenges of DDC by designing new OS and network stacks, the implications of DDC for next-generation database systems remain unclear. In this paper, we take a first step towards understanding how DDCs might affect the design of relational databases, discuss the potential advantages and drawbacks in the context of data processing, and outline research challenges in addressing them.
- Award ID(s): 1845749
- PAR ID: 10157860
- Date Published:
- Journal Name: Conference on Innovative Data Systems Research
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
As the field of Artificial Life advances and grows, we find ourselves in the midst of an increasingly complex ecosystem of software systems. Each system is developed to address particular research objectives, all unified under the common goal of understanding life. Such an ambitious endeavor begets a variety of algorithmic challenges. Many projects have solved some of these problems for individual systems, but these solutions are rarely portable and often must be re-engineered across systems. Here, we propose a community-driven process of developing standards for representing commonly used types of data across our field. These standards will improve software re-use across research groups and allow for easier comparisons of results generated with different artificial life systems. We began the process of developing data standards with two discussion-driven workshops (one at the 2018 Conference for Artificial Life and the other at the 2018 Congress for the BEACON Center for the Study of Evolution in Action). At each of these workshops, we discussed the vision for Artificial Life data standards, proposed and refined a standard for phylogeny (ancestry tree) data, and solicited feedback from attendees. In addition to proposing a general vision and framework for Artificial Life data standards, we release and discuss version 1.0.0 of the standards. This release includes the phylogeny data standard developed at these workshops and several software resources under development to support our proposed phylogeny standards framework.
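A standardized phylogeny format like the one proposed lends itself to simple, portable tooling. The sketch below assumes a tabular layout with `id` and `ancestor_list` columns; this general shape is an assumption for illustration and should be checked against the v1.0.0 specification rather than taken as the standard itself.

```python
import csv
import io

# Minimal sketch of parsing standardized phylogeny (ancestry tree) data.
# The column names ('id', 'ancestor_list') and the '[NONE]' root marker
# are assumptions for illustration, not the authoritative v1.0.0 spec.
RAW = """id,ancestor_list
0,[NONE]
1,[0]
2,[0]
3,[1]
"""

def load_phylogeny(text):
    """Return a dict mapping each taxon id to its list of ancestor ids."""
    tree = {}
    for row in csv.DictReader(io.StringIO(text)):
        raw = row["ancestor_list"].strip("[]")
        ancestors = [] if raw == "NONE" else [int(a) for a in raw.split()]
        tree[int(row["id"])] = ancestors
    return tree

def lineage(tree, taxon):
    """Walk from a taxon back to the root, following first ancestors."""
    path = [taxon]
    while tree[path[-1]]:
        path.append(tree[path[-1]][0])
    return path

tree = load_phylogeny(RAW)
print(lineage(tree, 3))  # ancestry chain from taxon 3 back to the root
```

The point of a shared format is precisely that a loader like this, written once, works across artificial life systems instead of being re-engineered per project.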
-
In recent years, Network-on-Chip (NoC) has emerged as a promising solution for addressing a critical performance bottleneck encountered in designing large-scale multi-core systems, i.e., data communication. With advancements in chip manufacturing technologies and the increasing complexity of system designs, the task of designing the communication subsystems has become increasingly challenging. The emergence of hardware accelerators, such as GPUs, FPGAs and ASICs, together with heterogeneous system integration of the CPUs and the accelerators creates new challenges in NoC design. Conventional NoC architectures developed for CPU-based multi-core systems are not able to satisfy the traffic demands of heterogeneous systems. In recent years, numerous research efforts have been dedicated to exploring the various aspects of NoC design in hardware accelerators and heterogeneous systems. However, there is a need for a comprehensive understanding of the current state-of-the-art research in this emerging research area. This paper aims to provide a summary of research work conducted in heterogeneous NoC design. Through this survey, we aim to present a comprehensive overview of the current related research, highlighting key findings, challenges, and future directions in this field.
-
This check sheet offers guidance on writing a data management plan based on National Science Foundation requirements. This guidance can help researchers who are applying for external funding and who intend to publish their data, data collection protocols, or instruments via the DesignSafe Cyberinfrastructure. About the CONVERGE Extreme Events Research Check Sheets Series: The National Science Foundation-supported CONVERGE facility at the Natural Hazards Center at the University of Colorado Boulder has developed a series of short, graphical check sheets that are meant to be used as researchers design their studies, prepare to enter the field, conduct field research, and exit the field. The series offers best practices for extreme events research and includes check sheets that are free to the research community. More information is available at: https://converge.colorado.edu/resources/check-sheets.
-
The emergence of big data has created new challenges for researchers transmitting big data sets across campus networks to local (HPC) cloud resources, or over wide area networks to public cloud services. Unlike conventional HPC systems where the network is carefully architected (e.g., a high speed local interconnect, or a wide area connection between Data Transfer Nodes), today's big data communication often occurs over shared network infrastructures with many external and uncontrolled factors influencing performance. This paper describes our efforts to understand and characterize the performance of various big data transfer tools such as rclone, cyberduck, and other provider-specific CLI tools when moving data to/from public and private cloud resources. We analyze the various parameter settings available on each of these tools and their impact on performance. Our experimental results give insights into the performance of cloud providers and transfer tools, and provide guidance for parameter settings when using cloud transfer tools. We also explore performance when coming from HPC DTN nodes as well as researcher machines located deep in the campus network, and show that emerging SDN approaches such as the VIP Lanes system can deliver excellent performance even from researchers' machines.
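The kind of parameter-settings analysis the abstract mentions amounts to a sweep over a tool's tuning flags. The sketch below builds (but does not run) rclone command lines over a small parameter grid; the `--transfers` and `--checkers` flags exist in rclone, but the remote name `cloud:` and the value ranges are illustrative assumptions, not the paper's actual experimental setup.

```python
import itertools
import shlex

# Sketch of a parameter sweep over transfer-tool settings, in the spirit
# of the paper's analysis. The flags --transfers and --checkers are real
# rclone flags; the remote 'cloud:' and the grid values are illustrative.
SRC, DST = "/data/bigset", "cloud:bucket/bigset"

def rclone_cmd(transfers, checkers):
    """Build an rclone copy command for one parameter combination."""
    return ["rclone", "copy", SRC, DST,
            f"--transfers={transfers}", f"--checkers={checkers}"]

grid = list(itertools.product([4, 8, 16], [8, 16]))
cmds = [rclone_cmd(t, c) for t, c in grid]
for cmd in cmds[:2]:
    print(shlex.join(cmd))
# In an actual study, each command would be executed (subprocess) and
# timed (time.perf_counter) to compare throughput across settings.
```

Sweeping a grid like this over each tool is one straightforward way to reproduce the "impact of parameter settings on performance" measurements the paper describes.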