Abstract The Deep Ocean Observing Strategy (DOOS) is an international, community-driven initiative that facilitates collaboration across disciplines and fields, elevates a diverse cohort of early career researchers into future leaders, and connects scientific advancements to societal needs. DOOS represents a global network of deep-ocean observing, mapping, and modeling experts, focusing community efforts in support of strong science, policy, and planning for sustainable oceans. Its initiatives work to propose deep-sea Essential Ocean Variables; assess technology development; develop shared best practices, standards, and cross-calibration procedures; and transfer knowledge to policy makers and deep-ocean stakeholders. Several of these efforts align with the vision of the UN Ocean Decade to generate the science we need to create the deep ocean we want. DOOS works toward (1) a healthy and resilient deep ocean by informing science-based conservation actions, including optimizing data delivery, creating habitat and ecological maps of critical areas, and developing regional demonstration projects; (2) a predicted deep ocean by strengthening collaborations within the modeling community and determining needs for interdisciplinary modeling and observing system assessment in the deep ocean; (3) an accessible deep ocean by enhancing open access to innovative low-cost sensors and open-source plans, making deep-ocean data Findable, Accessible, Interoperable, and Reusable, and focusing on capacity development in developing countries; and finally (4) an inspiring and engaging deep ocean by translating science to stakeholders/end users and informing policy and management decisions, including in international waters.
This content will become publicly available on October 23, 2025
Towards an open-source model for data and metadata standards
Progress in machine learning and artificial intelligence promises to advance research and understanding across a wide range of fields and activities. In tandem, increased awareness of the importance of open data for reproducibility and scientific transparency is making inroads in fields that have not traditionally produced large publicly available datasets. Data sharing requirements from publishers and funders, as well as from other stakeholders, have also created pressure to make datasets with research and/or public interest value available through digital repositories. However, to make the best use of existing data and facilitate the creation of useful future datasets, robust, interoperable, and usable standards need to evolve and adapt over time. The open-source development model offers significant potential benefits to the process of standard creation and adaptation. In particular, data and metadata standards can use long-standing technical and socio-technical processes that have been key to managing the development of software, and which allow broad community input to be incorporated into the formulation of these standards. On the other hand, open-source models carry unique risks that need to be considered. This report surveys existing open-source standards development, addressing these benefits and risks. It outlines recommendations for standards developers, funders, and other stakeholders on the path to robust, interoperable, and usable open-source data and metadata standards.
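The idea of maintaining a standard like a software artifact can be made concrete with a small sketch: a metadata standard kept as a versioned, machine-readable definition plus a conformance check, so that changes to it can be proposed, reviewed, and released like code. The field names, version scheme, and validator below are illustrative assumptions, not any existing standard.

```python
"""Minimal sketch of a versioned, machine-readable metadata standard.

All field names and the version scheme are invented for illustration;
the point is that a standard kept as a plain-text artifact can be
versioned, reviewed, and released like open-source software.
"""

# A hypothetical standard: required fields and their expected types.
STANDARD_VERSION = "1.2.0"  # semantic versioning, as in software releases
REQUIRED_FIELDS = {
    "title": str,
    "creators": list,    # e.g. ORCID iDs or names
    "identifier": str,   # e.g. a DOI
    "license": str,
}

def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"wrong type for {name}: expected {expected.__name__}")
    return problems

record = {
    "title": "Example dataset",
    "creators": ["0000-0002-1825-0097"],
    "identifier": "10.5061/dryad.example",
    "license": "CC0-1.0",
}
print(validate(record))  # → []
```

Because the definition is plain text, the usual open-source machinery (diffs, pull requests, release tags) applies directly, which is the mechanism the report examines.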
- Award ID(s):
- 2334483
- PAR ID:
- 10550347
- Publisher / Repository:
- Open Science Framework
- Date Published:
- Format(s):
- Medium: X
- Institution:
- University of Washington
- Sponsoring Org:
- National Science Foundation
More Like this
-
Abstract Persistent identifiers for research objects, researchers, organizations, and funders are the key to creating unambiguous and persistent connections across the global research infrastructure (GRI). Many repositories are implementing mechanisms to collect and integrate these identifiers into their submission and record curation processes. This bodes well for a well-connected future, but metadata for existing resources submitted in the past are missing these identifiers, and thus the connections required for inclusion in the connected infrastructure. Re-curation of these metadata is required to make these connections. This paper introduces the GRI and demonstrates how repositories, and their user communities, can contribute to and benefit from connections to it. The Dryad Data Repository has existed since 2008 and has successfully re-curated its metadata several times, adding identifiers for research organizations, funders, and researchers. Understanding and quantifying these successes depends on measuring repository and identifier connectivity; metrics are described and applied to the entire repository here. Identifiers (Digital Object Identifiers, DOIs) for papers connected to datasets in Dryad have long been a critical part of Dryad's metadata creation and curation processes. Since 2019, the proportion of datasets with connected papers has decreased from 100% to less than 40%. This decrease has significant ramifications for the re-curation efforts described above, as connected papers have been an important source of metadata. In addition, missing connections to papers make understanding and reusing datasets more difficult. Connections between datasets and papers can be difficult to make because of time lags between submission and publication, the lack of clear mechanisms for citing datasets and other research objects from papers, the changing focus of researchers, and other obstacles.
The Dryad community of members (users, research institutions, publishers, and funders) has vested interests in identifying these connections and critical roles to play in the curation and re-curation efforts. Their engagement will be critical in building on the successes Dryad has already achieved and in ensuring sustainable connectivity in the future.
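The connectivity metric the abstract describes (the share of dataset records that link to at least one paper DOI) can be sketched in a few lines. The record layout and DOIs below are invented examples, not Dryad's actual metadata schema.

```python
"""Sketch of the dataset-to-paper connectivity metric described above.
The record field names and DOIs are invented, not Dryad's schema."""

def connected_fraction(records: list) -> float:
    """Fraction of dataset records linking to at least one paper DOI."""
    if not records:
        return 0.0
    connected = sum(1 for r in records if r.get("paper_dois"))
    return connected / len(records)

datasets = [
    {"doi": "10.5061/dryad.aaa", "paper_dois": ["10.1000/paper1"]},
    {"doi": "10.5061/dryad.bbb", "paper_dois": []},
    {"doi": "10.5061/dryad.ccc", "paper_dois": ["10.1000/paper2"]},
]
print(connected_fraction(datasets))  # two of the three records are connected
```

Tracking this fraction over time is what reveals trends like the decline from 100% to under 40% reported in the abstract.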
-
Abstract Open science and open data within scholarly research programs are growing both in popularity and by requirement from grant funding agencies and journal publishers. A central component of open data management, especially on collaborative, multidisciplinary, and multi-institutional science projects, is documentation of complete and accurate metadata, workflow, and source code, in addition to access to raw data and data products, to uphold FAIR (Findable, Accessible, Interoperable, Reusable) principles. Although best practice in data/metadata management is to use established, internationally accepted metadata schemata, many of these standards are discipline-specific, making it difficult to catalog multidisciplinary data and data products in a way that is easily findable and accessible. Consequently, scattered and incompatible metadata records create a barrier to scientific innovation, as researchers are burdened to find and link multidisciplinary datasets. One possible solution to increase data findability, accessibility, interoperability, reproducibility, and integrity within multi-institutional and interdisciplinary projects is a centralized and integrated data management platform. Overall, this type of interoperable framework supports reproducible open science and its dissemination to various stakeholders and the public in a FAIR manner by providing direct access to raw data and by linking protocols, metadata, and supporting workflow materials.
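A rough sketch of the centralized-catalog idea: each entry links a persistent identifier, raw data, the metadata standard used, and the workflow code, so a single cross-discipline query works even when each discipline follows its own schema. Every field name, identifier, and URL below is an invented placeholder, not a particular platform's design.

```python
"""Illustrative sketch of a centralized catalog entry linking raw data,
metadata standard, and workflow code. All names, IDs, and URLs are
invented placeholders, not any real platform's schema."""

from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    identifier: str          # persistent ID, e.g. a DOI
    discipline: str
    raw_data_url: str        # direct access to the raw data
    metadata_schema: str     # which community standard the metadata follow
    workflow_url: str        # source code / protocol that produced the data
    keywords: list = field(default_factory=list)

def find(catalog: list, keyword: str) -> list:
    """One query across all entries, regardless of each entry's schema."""
    return [e.identifier for e in catalog if keyword in e.keywords]

catalog = [
    CatalogEntry("10.1234/ocean.1", "oceanography", "https://example.org/d1",
                 "ISO 19115", "https://example.org/wf1",
                 ["temperature", "deep-sea"]),
    CatalogEntry("10.1234/bio.2", "biology", "https://example.org/d2",
                 "Darwin Core", "https://example.org/wf2",
                 ["deep-sea", "species"]),
]
print(find(catalog, "deep-sea"))  # → ['10.1234/ocean.1', '10.1234/bio.2']
```

Recording the schema name alongside the record, rather than forcing one schema on all disciplines, is what keeps the catalog interoperable without discarding discipline-specific detail.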
-
Abstract. There is a continuously increasing need for reliable feature detection and tracking tools based on objective analysis principles for use with meteorological data. Many tools have been developed over the past two decades that attempt to address this need, but most have limitations on the types of data they can be used with, incur computational and/or memory expenses that make them unwieldy with larger datasets, or require some form of data reduction prior to use that limits the tool's utility. The Tracking and Object-Based Analysis of Clouds (tobac) Python package is a modular, open-source tool that improves on the overall generality and utility of past tools. A number of scientific improvements (three spatial dimensions, splits and mergers of features, an internal spectral filtering tool) and procedural enhancements (increased computational efficiency, internal regridding of data, and treatments for periodic boundary conditions) have been included in tobac as part of the tobac v1.5 update. These improvements have made tobac one of the most robust, powerful, and flexible identification and tracking tools in our field to date and expand its potential use in other fields. Future plans for tobac v2 are also discussed.
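The core operation such tools perform, identifying contiguous features that exceed a threshold in a gridded field, can be illustrated with a from-scratch toy. This is not tobac's actual API (tobac operates on labelled multidimensional arrays with many more options); it is a minimal flood-fill labelling of 4-connected regions on a small 2-D grid.

```python
"""Toy illustration of threshold-based feature identification, the kind of
step that tracking tools like tobac perform on gridded meteorological
fields. A from-scratch sketch, not tobac's actual API."""

def label_features(grid, threshold):
    """Label 4-connected regions of cells whose value exceeds `threshold`."""
    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    n_features = 0
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] > threshold and labels[i][j] == 0:
                n_features += 1
                stack = [(i, j)]  # flood-fill one feature
                while stack:
                    r, c = stack.pop()
                    if (0 <= r < rows and 0 <= c < cols
                            and grid[r][c] > threshold and labels[r][c] == 0):
                        labels[r][c] = n_features
                        stack += [(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)]
    return labels, n_features

reflectivity = [
    [0, 5, 5, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 7],
]
labels, n = label_features(reflectivity, threshold=1)
print(n)  # → 2 distinct features
```

Real tools add the hard parts the abstract lists: a third spatial dimension, linking labels across time steps, splits and mergers, and periodic boundaries.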
-
The systemic challenges of the COVID-19 pandemic require cross-disciplinary collaboration in a global and timely fashion. Such collaboration needs open research practices and the sharing of research outputs, such as data and code, thereby facilitating reproducibility and timely collaboration across borders. The Research Data Alliance COVID-19 Working Group recently published a set of recommendations and guidelines on data sharing and related best practices for COVID-19 research. These guidelines include recommendations for researchers, policymakers, funders, publishers, and infrastructure providers from the perspective of different domains (Clinical Medicine, Omics, Epidemiology, Social Sciences, Community Participation, Indigenous Peoples, Research Software, Legal and Ethical Considerations). Several overarching themes have emerged from this document, such as the need to balance the creation of data adherent to FAIR principles (findable, accessible, interoperable and reusable) with the need for quick data release; the use of trustworthy research data repositories; the use of well-annotated data with meaningful metadata; and practices of documenting methods and software. The resulting document marks an unprecedented cross-disciplinary, cross-sectoral, and cross-jurisdictional effort authored by over 160 experts from around the globe. This letter summarises key points of the Recommendations and Guidelines, highlights the relevant findings, shines a spotlight on the process, and suggests how these developments can be leveraged by the wider scientific community.