
This content will become publicly available on November 23, 2024

Title: A community convention for ecological forecasting: Output files and metadata version 1.0

This paper summarizes the open community conventions developed by the Ecological Forecasting Initiative (EFI) for the common formatting and archiving of ecological forecasts and their associated metadata. Such open standards are intended to promote interoperability and facilitate forecast communication, distribution, validation, and synthesis. For output files, we first describe the convention conceptually in terms of global attributes, forecast dimensions, forecasted variables, and ancillary indicator variables. We then illustrate the application of this convention to the two file formats currently preferred by EFI, netCDF (network common data form) and comma-separated values (CSV), but note that the convention is extensible to future formats. For metadata, EFI's convention identifies a subset of conventional metadata variables that are required (e.g., temporal resolution and output variables) but focuses on developing a framework for storing information about forecast uncertainty propagation, data assimilation, and model complexity, which aims to facilitate cross-forecast synthesis. The initial application of this convention expands upon the Ecological Metadata Language (EML), a commonly used metadata standard in ecology. To facilitate community adoption, we also provide a GitHub repository containing a metadata validator tool and several vignettes in R and Python on how to both write and read the EFI standard. Lastly, we provide guidance on forecast archiving, making an important distinction between short-term dissemination and long-term forecast archiving, while also touching on the archiving of code and workflows. Overall, the EFI convention is a living document that can continue to evolve over time through an open community process.
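To give a concrete sense of the file-format side of the convention, here is a minimal sketch of writing an ensemble forecast as a "long" CSV, one row per (time, ensemble member, variable) combination. The column names and values are illustrative assumptions for this sketch, not the official EFI standard; consult the paper and repository vignettes for the exact specification.

```python
import csv
import io

# Illustrative only: an ensemble forecast in "long" form, one row per
# (time, ensemble member, variable). Column names are assumptions for
# this sketch, not the official EFI convention.
rows = [
    {"datetime": "2024-01-01", "ensemble": e, "variable": "temperature",
     "prediction": 20.0 + 0.1 * e}
    for e in range(1, 4)
]

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=["datetime", "ensemble", "variable", "prediction"]
)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Representing ensemble members as an explicit dimension (rather than summary statistics alone) is what lets downstream tools propagate and validate forecast uncertainty.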

Award ID(s): 1926388, 1942280, 1638577
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Near‐term iterative forecasting is a powerful tool for ecological decision support and has the potential to transform our understanding of ecological predictability. However, to this point, there has been no cross‐ecosystem analysis of near‐term ecological forecasts, making it difficult to synthesize diverse research efforts and prioritize future developments for this emerging field. In this study, we analyzed 178 near‐term (≤10‐yr forecast horizon) ecological forecasting papers to understand the development and current state of near‐term ecological forecasting literature and to compare forecast accuracy across scales and variables. Our results indicated that near‐term ecological forecasting is widespread and growing: forecasts have been produced for sites on all seven continents and the rate of forecast publication is increasing over time. As forecast production has accelerated, some best practices have been proposed and application of these best practices is increasing. In particular, data publication, forecast archiving, and workflow automation have all increased significantly over time. However, adoption of proposed best practices remains low overall: for example, despite the fact that uncertainty is often cited as an essential component of an ecological forecast, only 45% of papers included uncertainty in their forecast outputs. As the use of these proposed best practices increases, near‐term ecological forecasting has the potential to make significant contributions to our understanding of forecastability across scales and variables. In this study, we found that forecastability (defined here as realized forecast accuracy) decreased in predictable patterns over 1–7 d forecast horizons. Variables that were closely related (i.e., chlorophyll and phytoplankton) displayed very similar trends in forecastability, while more distantly related variables (i.e., pollen and evapotranspiration) exhibited significantly different patterns. Increasing use of proposed best practices in ecological forecasting will allow us to examine the forecastability of additional variables and timescales in the future, providing a robust analysis of the fundamental predictability of ecological variables.
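The cross-forecast comparison above quantifies "realized forecast accuracy" per lead time. A common way to do this is to group (forecast, observation) pairs by horizon and compute an error score such as RMSE for each group. A minimal sketch with synthetic numbers (not data from the study):

```python
import math

# Synthetic (forecast, observation) pairs keyed by lead time in days.
# Values are illustrative, not taken from the reviewed papers.
pairs = {
    1: [(10.0, 10.2), (11.0, 10.9)],
    3: [(10.0, 10.8), (11.0, 11.6)],
    7: [(10.0, 11.5), (11.0, 9.6)],
}

def rmse_by_horizon(pairs):
    """Root-mean-square error for each forecast horizon."""
    out = {}
    for horizon, fo in pairs.items():
        out[horizon] = math.sqrt(sum((f - o) ** 2 for f, o in fo) / len(fo))
    return out

scores = rmse_by_horizon(pairs)
for h in sorted(scores):
    print(f"{h} d: RMSE = {scores[h]:.2f}")
```

With error growing as horizon lengthens, the declining-forecastability pattern the study describes shows up directly in such a score-by-horizon table.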

  2. Abstract

    Ecological forecasting provides a powerful set of methods for predicting short‐ and long‐term change in living systems. Forecasts are now widely produced, enabling proactive management for many applied ecological problems. However, despite numerous calls for an increased emphasis on prediction in ecology, the potential for forecasting to accelerate ecological theory development remains underrealized.

    Here, we provide a conceptual framework describing how ecological forecasts can energize and advance ecological theory. We emphasize the many opportunities for future progress in this area through increased forecast development, comparison and synthesis.

    Our framework describes how a forecasting approach can shed new light on existing ecological theories while also allowing researchers to address novel questions. Through rigorous and repeated testing of hypotheses, forecasting can help to refine theories and understand their generality across systems. Meanwhile, synthesizing across forecasts allows for the development of novel theory about the relative predictability of ecological variables across forecast horizons and scales.

    We envision a future where forecasting is integrated as part of the toolset used in fundamental ecology. By outlining the relevance of forecasting methods to ecological theory, we aim to decrease barriers to entry and broaden the community of researchers using forecasting for fundamental ecological insight.

  3. This data publication includes code and results from a systematic literature review on the current state of near-term forecasting of freshwater quality. The review aimed to address the following questions:

     (1) Freshwater variables, scales, models, and skill: Which freshwater variables and temporal scales are most commonly targeted for near-term forecasts, and what modeling methods are most commonly employed to develop these forecasts? How is the accuracy of freshwater quality forecasts assessed, and how accurate are they? How is uncertainty typically incorporated into water quality forecast output?

     (2) Forecast infrastructure and workflows: Are iterative, automated workflows commonly employed in near-term freshwater quality forecasting? How are forecasts validated and archived?

     (3) Human dimensions: What is the stated motivation for development of most near-term freshwater quality forecasts, and who are the most common end users (if any)? How are end users engaged in forecast development?

     An initial search was conducted for published papers presenting freshwater quality forecasts from 1 January 2017 to 17 February 2022 in the Web of Science Core Collection. Results were subsequently analyzed in three stages. First, paper titles were screened for relevance. Second, an initial screen was conducted to assess whether each paper presented a near-term freshwater quality forecast. Third, papers that passed the initial screen were analyzed using a standardized matrix to assess the state of near-term freshwater quality forecasting and identify areas of recent progress and ongoing challenges. Additional details regarding the systematic literature search and review are presented in the Methods section of the metadata.
  4. PmagPy Online: Jupyter Notebooks, the PmagPy Software Package, and the Magnetics Information Consortium (MagIC) Database

     Lisa Tauxe¹, Rupert Minnett², Nick Jarboe¹, Catherine Constable¹, Anthony Koppers², Lori Jonestrask¹, Nick Swanson-Hysell³ — ¹Scripps Institution of Oceanography, United States of America; ²Oregon State University; ³University of California, Berkeley

     The Magnetics Information Consortium (MagIC) is a database that serves as a Findable, Accessible, Interoperable, Reusable (FAIR) archive for paleomagnetic and rock magnetic data. It has a flexible, comprehensive data model that can accommodate most kinds of paleomagnetic data. The PmagPy software package is a cross-platform, open-source set of tools written in Python for the analysis of paleomagnetic data that serves as one interface to MagIC, accommodating various levels of user expertise. Because PmagPy requires installation of Python, several non-standard Python modules, and the PmagPy package itself, there is a speed bump for many practitioners beginning to use the software. To make the software and MagIC more accessible to the broad spectrum of scientists interested in paleo- and rock magnetism, we have prepared a set of Jupyter notebooks that serve several purposes: (1) a complete course in Python for Earth scientists; (2) notebooks that introduce PmagPy (pulling the software package from the GitHub repository) and illustrate how it can be used to create data products and figures for typical papers; and (3) notebooks that show how to prepare data from the laboratory for upload into the MagIC database. The latter satisfies NSF expectations for data archiving and, for example, the AGU publication data archiving requirements.

     Getting started: To use the PmagPy notebooks online, create an EarthRef account using your ORCID and log on. (This allows you to keep files in a private work space.) Open the PmagPy Online - Setup notebook and execute the two cells. Then click on File > Open and click on the PmagPy_Online folder. Open the PmagPy_online notebook and work through the examples. There are other notebooks that are useful for the working paleomagnetist. Alternatively, you can install Python and the PmagPy software package on your computer; follow the instructions for "Full PmagPy install and update" through section 1.4 (Quickstart with PmagPy notebooks). This notebook is in the collection of PmagPy notebooks.

     Overview of MagIC: Each contribution is associated with a publication via its DOI. The data model defines nine data tables:
     - contribution: metadata of the associated publication
     - locations: metadata for locations, which are groups of sites (e.g., a stratigraphic section or region)
     - sites: metadata and derived data at the site level (units with a common expectation)
     - samples: metadata and derived data at the sample level
     - specimens: metadata and derived data at the specimen level
     - criteria: criteria by which data are deemed acceptable
     - ages: ages and metadata for sites/samples/specimens
     - images: associated images and plots

     Overview of PmagPy: The functionality of PmagPy is demonstrated within notebooks in the PmagPy repository:
     - PmagPy_online.ipynb: serves as an introduction to PmagPy and MagIC (this conference); it highlights the link between PmagPy and the Findable, Accessible, Interoperable, Reusable (FAIR) database maintained by MagIC
     - PmagPy_calculations.ipynb: demonstrates many of the PmagPy calculation functions, such as those that rotate directions, return statistical parameters, and simulate data from specified distributions
     - PmagPy_plots_analysis.ipynb: demonstrates PmagPy functions that can be used to visualize data, as well as those that conduct statistical tests with associated visualizations
     - PmagPy_MagIC.ipynb: demonstrates how PmagPy can be used to read and write data to and from the MagIC database format, including conversion from many individual lab measurement file formats

     See also our YouTube channel with more presentations from the 2020 MagIC workshop.
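To illustrate the kind of table-oriented data the MagIC format carries, here is a small standalone sketch that parses a MagIC-style tab-delimited table (a leading line naming the table, then a tab-separated header row and data rows). The fragment and its column values are made up for illustration, and the layout is an assumption based on the description above; for real work, use PmagPy's own MagIC readers and consult the MagIC data model.

```python
import csv
import io

# A tiny MagIC-style table fragment (tab-delimited; the first line names
# the table). Values are invented for illustration only.
magic_text = """tab\tsites
site\tlocation\tlat\tlon
S1\tLocA\t32.7\t-117.2
S2\tLocA\t33.1\t-116.9
"""

def read_magic_table(text):
    """Parse a single MagIC-style table into (table_name, list of row dicts)."""
    lines = text.strip().splitlines()
    table_name = lines[0].split("\t")[1]  # e.g. "sites"
    reader = csv.DictReader(io.StringIO("\n".join(lines[1:])), delimiter="\t")
    return table_name, list(reader)

name, rows = read_magic_table(magic_text)
print(name, len(rows), rows[0]["lat"])
```

Keeping each contribution as a set of named, self-describing tables is what makes the archive interoperable across labs and tools.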
  5. Abstract

    Arthropods play a dominant role in natural and human-modified terrestrial ecosystem dynamics. Spatially explicit arthropod population time-series data are crucial for statistical or mathematical models of these dynamics and assessment of their veterinary, medical, agricultural, and ecological impacts. Such data have been collected worldwide for over a century, but remain scattered and largely inaccessible. In particular, with the ever-present and growing threat of arthropod pests and vectors of infectious diseases, there are numerous historical and ongoing surveillance efforts, but the data are not reported in consistent formats and typically lack sufficient metadata to make reuse and re-analysis possible. Here, we present the first minimum information standard for arthropod abundance, Minimum Information for Reusable Arthropod Abundance Data (MIReAD). Developed with broad stakeholder collaboration, it balances sufficiency for reuse with the practicality of preparing the data for submission. It is designed to optimize data (re)usability following the "FAIR" (Findable, Accessible, Interoperable, and Reusable) principles of public data archiving (PDA). This standard will facilitate data unification across research initiatives and communities dedicated to surveillance for detection and control of vector-borne diseases and pests.
