Abstract This paper summarizes the open community conventions developed by the Ecological Forecasting Initiative (EFI) for the common formatting and archiving of ecological forecasts and the metadata associated with these forecasts. Such open standards are intended to promote interoperability and facilitate forecast communication, distribution, validation, and synthesis. For output files, we first describe the convention conceptually in terms of global attributes, forecast dimensions, forecasted variables, and ancillary indicator variables. We then illustrate the application of this convention to the two file formats that are currently preferred by the EFI, netCDF (network common data form) and comma‐separated values (CSV), but note that the convention is extensible to future formats. For metadata, EFI's convention identifies a subset of conventional metadata variables that are required (e.g., temporal resolution and output variables) but focuses on developing a framework for storing information about forecast uncertainty propagation, data assimilation, and model complexity, which aims to facilitate cross‐forecast synthesis. The initial application of this convention expands upon the Ecological Metadata Language (EML), a commonly used metadata standard in ecology. To facilitate community adoption, we also provide a GitHub repository containing a metadata validator tool and several vignettes in R and Python on how to both write and read in the EFI standard. Lastly, we provide guidance on forecast archiving, making an important distinction between short‐term dissemination and long‐term forecast archiving, while also touching on the archiving of code and workflows. Overall, the EFI convention is a living document that can continue to evolve over time through an open community process.
Change in Pictures: Creating best practices in archiving ecological imagery for reuse
The research data repository of the Environmental Data Initiative (EDI) builds on over 30 years of data curation research and experience in the National Science Foundation-funded US Long-Term Ecological Research (LTER) Network. It provides mature functionality and well-established workflows, and now publishes all ‘long-tail’ environmental data. High-quality scientific metadata are enforced through automatic checks against community-developed rules and the Ecological Metadata Language (EML) standard. Although the EDI repository is far along in making its data findable, accessible, interoperable, and reusable (FAIR), representatives from EDI and the LTER are developing best practices for the edge cases in environmental data publishing. One of these is the vast amount of imagery taken in the context of ecological research, ranging from wildlife camera traps to plankton imaging systems to aerial photography. Many images are used in biodiversity research for community analyses (e.g., individual counts, species cover, biovolume, productivity), while others are taken to study animal behavior and landscape-level change. Some examples from the LTER Network include: using photos of a heron colony to measure provisioning rates for chicks (Clarkson and Erwin 2018) or identifying changes in plant cover and functional type through time (Peters et al. 2020). Multi-spectral images are employed to identify prairie species. Underwater photo quads are used to monitor changes in benthic biodiversity (Edmunds 2015). Sosik et al. (2020) used a continuous Imaging FlowCytobot to identify and measure phyto- and microzooplankton. Cameras at McMurdo Dry Valleys assess snow and ice cover on Antarctic lakes, allowing estimation of primary production (Myers 2019). It has been standard practice to publish numerical data extracted from images in EDI; however, the supporting imagery generally has not been made publicly available.
Our goal in developing best practices for documenting and archiving these images is for them to be discovered and re-used. Our examples demonstrate several issues. The research questions, and hence the image subjects, are variable. Images frequently come in logical sets of time series. The size of such sets can be large and only some images may be contributed to a dedicated specialized repository. Finally, these images are taken in a larger monitoring context where many other environmental data are collected at the same time and location. Currently, a typical approach to publishing image data in EDI is a package containing compressed (ZIP or tar) files with the images, a directory manifest with additional image-specific metadata, and a package-level EML metadata file. Images in the compressed archive may be organized within directories with filenames corresponding to treatments, locations, time periods, individuals, or other grouping attributes. Additionally, the directory manifest table has columns for each attribute. Package-level metadata include standard coverage elements (e.g., date, time, location) and sampling methods. This approach of archiving logical ‘sets’ of images reduces the effort of providing metadata for each image when most information would be repeated, but at the expense of not making every image individually searchable. The latter may be overcome if the provided manifest contains standard metadata that would allow searching and automatic integration with other images.
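The manifest idea described above can be illustrated with a minimal Python sketch. It is not EDI's tooling: the directory layout (`<location>/<treatment>/<image file>`), the column names, the list of image suffixes, and the function names `build_manifest` and `write_manifest` are all assumptions chosen for illustration. The sketch walks an image directory and derives one manifest row per image from its path, so that grouping attributes encoded in directory names become searchable table columns.

```python
import csv
from pathlib import Path

# Assumed set of image file suffixes; adjust for the imagery at hand.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def build_manifest(root, levels=("location", "treatment")):
    """Return one manifest row (a dict) per image found under `root`.

    Assumes a hypothetical layout root/<location>/<treatment>/<file>,
    i.e. one directory level per name in `levels`.
    """
    root = Path(root)
    rows = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        parts = path.relative_to(root).parts
        if len(parts) != len(levels) + 1:
            continue  # skip files that do not match the assumed depth
        row = {"file_path": "/".join(parts)}
        row.update(zip(levels, parts[:-1]))  # directory names become columns
        row["size_bytes"] = path.stat().st_size
        rows.append(row)
    return rows


def write_manifest(rows, out_path, levels=("location", "treatment")):
    """Write the manifest rows to a CSV file with one column per attribute."""
    fieldnames = ["file_path", *levels, "size_bytes"]
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `write_manifest(build_manifest("images"), "manifest.csv")` would produce a table that can travel inside the data package alongside the compressed image archive and the package-level EML file. Richer per-image metadata (capture time, camera, coordinates) would need additional columns filled from other sources.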
- Award ID(s): 1637396
- PAR ID: 10330316
- Date Published:
- Journal Name: Biodiversity Information Science and Standards
- Volume: 4
- ISSN: 2535-0897
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Two programs that provide high-quality long-term ecological data, the Environmental Data Initiative (EDI) and the National Ecological Observatory Network (NEON), have recently teamed up with data users interested in synthesizing biodiversity data, such as ecological synthesis working groups supported by the US Long Term Ecological Research (LTER) Network Office, to make their data more Findable, Accessible, Interoperable, and Reusable (FAIR). To this end, we have developed a flexible intermediate data design pattern for ecological community data (L1 formatted data in Fig. 1, see Fig. 2 for design details) called "ecocomDP" (O'Brien et al. 2021), and we provide tools to work with data packages in which this design pattern has been implemented. The ecocomDP format provides a data pattern commonly used for reporting community level data, such as repeated observations of species-level measures of biomass, abundance, percent cover, or density across multiple locations. The ecocomDP library for R includes tools to search for data packages, download or import data packages into an R session in a standard format, and visualization tools for data exploration steps that are recommended for data users prior to any cross-study synthesis work. To date, EDI has created 70 ecocomDP data packages derived from their holdings, which include data from the US Long Term Ecological Research (US LTER) program, Long Term Research in Environmental Biology (LTREB) program, and other projects, which are now discoverable and accessible using the ecocomDP library. Similarly, NEON data products for 12 taxonomic groups are discoverable using the ecocomDP search tool.
Input from data users provided guidance for the ecocomDP developers in mapping the NEON data products to the ecocomDP format to facilitate interoperability with the ecocomDP data packages available from the EDI repository. The standardized data design pattern allows common data visualizations across data packages, and has the potential to facilitate the development of new tools and workflows for biodiversity synthesis. The broader impacts of this collaboration are intended to lower the barriers for researchers in ecology and the environmental sciences to access and work with long-term biodiversity data and provide a hub around which data providers and data users can develop best practices that will build a diverse and inclusive community of practice.
-
The goal of the Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) project is to establish a collection of nonhuman ovary histology images for multiple species as a resource for researchers and educators. An important component of sharing scientific data is the inclusion of the contextual metadata that describes the data. MOTHER extends the Ecological Metadata Language (EML) for documenting research data, leveraging its data provenance and usage license with the inclusion of metadata for ovary histology images. The design of the MOTHER metadata includes information on the donor animal, including reproductive cycle status, the slide and its preparation. MOTHER also extends the ezEML tool, called ezEML+MOTHER, for the specification of the metadata. The design of the MOTHER database (MOTHER-DB) captures the metadata about the histology images, providing a searchable resource for discovering relevant images. MOTHER also defines a curation process for the ingestion of a collection of images and its metadata, verifying the validity of the metadata before its inclusion in the MOTHER collection. A Web search provides the ability to identify relevant images based on various characteristics in the metadata itself, such as genus and species, using filters.
-
Abstract The Multispecies Ovary Tissue Histology Electronic Repository (MOTHER) is a publicly accessible repository of ovary histology images. MOTHER includes hundreds of images from nonhuman primates, as well as ovary histology images from an expanding range of other species. Along with an image, MOTHER provides metadata about the image, and for selected species, follicle identification annotations. Ongoing work includes assisting scientists with contributing their histology images, creation of manual and automated (via machine learning) processing pipelines to identify and count ovarian follicles in different stages of development, and the incorporation of that data into the MOTHER database (MOTHER-DB). MOTHER will be a critical data repository storing and disseminating high-value histology images that are essential for research into ovarian function, fertility, and intra-species variability.
-
In order to calculate net community production (NCP) rates on Northeast U.S. Shelf Long-Term Ecological Research (NES-LTER) transect cruises, gas tracer data were collected with a continuous at-sea mass spectrometer. The ratio of O2/Ar, measured continuously from underway water, yields 8,000-15,000 rates of NCP per cruise. Discrete water samples (50 to 150 per cruise) were collected for triple oxygen isotope (TOI) analysis to estimate gross primary production (GPP) rates and ratios of NCP/GPP. Along-shelf (upstream-downstream) transects were conducted in addition to the main across-shelf transect. This data package provides two types of data tables for NES-LTER transect cruises beginning in 2018: a high-frequency continuous Equilibration Inlet Mass Spectrometer (EIMS) table, provided by year, and a low-frequency discrete triple oxygen isotope (TOI) table with all years combined. Rates calculated from these measurements are provided as separate packages, per year, in the EDI repository.

