Title: Implementing community best practice data and metadata capture into laboratory workflows using Sparrow
An implementation of the Sparrow data system (https://sparrow-data.org) is currently being developed to support laboratory workflows for sample preparation, geochemical analysis, and SEM imaging in support of tephra research. Tephra, consisting of fragmental material ejected from volcanoes, has a multidisciplinary array of applications from volcanology to geochronology, archaeology, environmental change, and more. The international tephra research community has developed a comprehensive set of recommendations for data and metadata collection and reporting (https://doi.org/10.5281/zenodo.3866266) as part of a broader effort to adopt FAIR practices. Implementations of these recommendations now exist for field data via StraboSpot (https://strabospot.org/files/StraboSpotTephraHelp.pdf) and for samples, analytical methods, and geochemistry via SESAR and EarthChem (https://earthchem.org/communities/tephra/). Implementing these recommended practices in Sparrow helps to (1) cover laboratory workflows between field sample collection and project data archiving and (2) address a key researcher pain point. As re-emphasized by participants in the Tephra Fusion 2022 workshop earlier this year (Wallace et al., this meeting), the huge workload currently needed to capture and organize data and metadata in preparation for archiving in community data repositories is a major obstacle to achieving FAIR practices. By capturing this information on the fly during laboratory workflows and integrating it in a single data system, this challenge may be overcome. We are implementing the tephra community recommendations as extensions to Sparrow’s core database schema. Data import pipelines and user interfaces to streamline metadata capture are also being developed. In the longer term, we aim to achieve interoperability with an ecosystem of tools and repositories like StraboSpot, SESAR, EarthChem, and Throughput. 
The results of these developments will be applicable not just to tephra but also to other research areas that use similar laboratory and analytical methods, e.g. sedimentology, mineralogy, and petrology.
Award ID(s):
1928341
NSF-PAR ID:
10359380
Journal Name:
EarthCube Annual Meeting 2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tephra is a unique volcanic product with an unparalleled role in understanding past eruptions, the long-term behavior of volcanoes, and the effects of volcanism on climate and the environment. Tephra deposits also provide spatially widespread, extremely high-resolution time-stratigraphic markers across a range of sedimentary settings and are used by many disciplines (e.g. volcanology, seismotectonics, climate science, archaeology, ecology, public health, ash impact assessment). The interdisciplinary shift in tephra studies over the last two decades is challenged by the lack of standardization that often prevents comparison amongst various regions and across disciplines. To address this challenge, the global tephra community has united through a series of workshops to establish best practice recommendations for tephra studies, including sample collection, analysis and data reporting (https://doi.org/10.5281/zenodo.3866266). This new standardized framework is being incorporated into digital tools and data repositories and supports FAIR (findable, accessible, interoperable and reusable) data principles. Widespread adoption will facilitate consistent tephra documentation and parametrization, foster interdisciplinary communication and improve the effectiveness of data sharing among diverse communities of researchers. Here we report on recent implementations of the best-practice recommendations including: 1) a set of templates for samples, methods documentation, and data reporting, 2) a tephra module in the StraboSpot field app (https://strabospot.org), 3) implementations at SESAR and EarthChem, including a tephra community portal (https://earthchem.org/communities/tephra/), 4) implementation in the Sparrow laboratory data system (https://sparrow-data.org/), and 5) a new manuscript supporting the framework. 
Data linking is facilitated by extensive use of unique identifiers, including ORCIDs for people; IGSNs for field sites and samples; DOIs for publications, data, and methods; and Smithsonian IDs for volcanoes and eruptions. These developments allow users to follow simple workflows to archive data and facilitate faster access to key research by secondary users. 
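As an illustration of how identifier-based linking can be checked at capture time, the sketch below validates the check character of an ORCID iD, which uses the ISO 7064 MOD 11-2 scheme. It is a minimal example, not part of any of the tools named above; the function names `orcid_check_character` and `is_valid_orcid` are hypothetical, and comparable checks for IGSNs, DOIs, or Smithsonian IDs are not shown.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 ORCID digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if a hyphenated or compact ORCID iD has a correct check character."""
    compact = orcid.replace("-", "")
    # An ORCID iD has 15 digits followed by one check character (0-9 or X).
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()
```

A check like this only confirms that the identifier is well formed; it does not confirm that the iD is registered or belongs to the intended person.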
  2. Tephra is a unique volcanic product that plays an unparalleled role in understanding past eruptions, the long-term behavior of volcanoes, and the effects of volcanism on climate and the environment. Tephra deposits also provide spatially widespread, extremely high-resolution time-stratigraphic markers across a range of sedimentary settings and are used by many disciplines (e.g. volcanology, seismotectonics, climate science, archaeology, ecology, public health and ash impact assessment). In the last two decades, tephra studies have become more interdisciplinary in nature but are challenged by a lack of standardization that often prevents comparison amongst various regions and across disciplines. To address this challenge, the global tephra community has come together through a series of workshops to establish best practice recommendations for tephra studies from sample collection through analysis and data reporting. This new standardized framework will facilitate consistent tephra documentation and parametrization, foster interdisciplinary communication, and improve the effectiveness of data sharing among diverse communities of researchers. One specific goal is to use the best practice guidelines to inform digital tool and data repository development. Here we report on 1) a new set of templates for tephra sample documentation, geochemical method documentation and data reporting using recommended best-practice data and metadata fields, 2) a new tephra module added to StraboSpot, an open source geologic mapping and data-recording multi-platform software application, and 3) new implementations and cross-mapping of metadata requirements at SESAR (System for Earth Sample Registration) and EarthChem. Addition of tephra-specific fields to StraboSpot enables users to consistently collect and report essential tephra data in the field, which is then automatically saved to an online data repository. 
A new tephra portal on the EarthChem website will allow users to follow simple workflows to register tephra samples at SESAR and submit microanalytical data to EarthChem. 
  3. A series of international workshops held in 2014, 2017, 2019, and 2022 focused on improving tephra studies from field collection through publication and encouraging FAIR (findable, accessible, interoperable, reusable) data practices for tephra data and metadata. Two consensus needs for tephra studies emerged from the 2014 and 2017 workshops: (a) standardization of tephra field data collection, geochemical analysis, correlation, and data reporting, and (b) development of next generation computer tools and databases to facilitate information access across multidisciplinary communities. To achieve (a), we developed a series of recommendations for best practices in tephra studies, from sample collection through analysis and data reporting (https://zenodo.org/record/3866266). A 4-part virtual workshop series (https://tephrochronology.org/cot/Tephra2022/) was held in February and March 2022 to update the tephra community on these developments, to get community feedback, to learn of unmet needs, and to plan a future roadmap for open and FAIR tephra data. More than 230 people from 25 nations registered for the workshop series. The community strongly emphasized the need for better computer systems, including physical infrastructure (repositories and servers), digital infrastructure (software and tools) and human infrastructure (people, training, and professional assistance), to store, manage and serve global tephra datasets. Some desired attributes of improved computer systems include: 1) user friendliness; 2) ability to easily ingest multiparameter tephra data (using best practice recommended data fields); 3) interoperability with existing data repositories; 4) development of tool add-ons (plotting and statistics); 5) improved searchability; 6) development of a tephra portal with access to distributed data systems; and 7) commitments to long-term support from funding agencies, publishers and the cyberinfrastructure community. 
  4. The research data repository of the Environmental Data Initiative (EDI) is building on over 30 years of data curation research and experience in the National Science Foundation-funded US Long-Term Ecological Research (LTER) Network. It provides mature functionalities, well-established workflows, and now publishes all ‘long-tail’ environmental data. High-quality scientific metadata are enforced through automatic checks against community-developed rules and the Ecological Metadata Language (EML) standard. Although the EDI repository is far along in making its data findable, accessible, interoperable, and reusable (FAIR), representatives from EDI and the LTER are developing best practices for the edge cases in environmental data publishing. One of these is the vast amount of imagery taken in the context of ecological research, ranging from wildlife camera traps to plankton imaging systems to aerial photography. Many images are used in biodiversity research for community analyses (e.g., individual counts, species cover, biovolume, productivity), while others are taken to study animal behavior and landscape-level change. Some examples from the LTER Network include: using photos of a heron colony to measure provisioning rates for chicks (Clarkson and Erwin 2018) or identifying changes in plant cover and functional type through time (Peters et al. 2020). Multi-spectral images are employed to identify prairie species. Underwater photo quads are used to monitor changes in benthic biodiversity (Edmunds 2015). Sosik et al. (2020) used a continuous Imaging FlowCytobot to identify and measure phyto- and microzooplankton. Cameras at McMurdo Dry Valleys assess snow and ice cover on Antarctic lakes, allowing estimation of primary production (Myers 2019). It has been standard practice to publish numerical data extracted from images in EDI; however, the supporting imagery generally has not been made publicly available. 
Our goal in developing best practices for documenting and archiving these images is for them to be discovered and re-used. Our examples demonstrate several issues. The research questions, and hence the image subjects, are variable. Images frequently come in logical sets of time series. The size of such sets can be large and only some images may be contributed to a dedicated specialized repository. Finally, these images are taken in a larger monitoring context where many other environmental data are collected at the same time and location. Currently, a typical approach to publishing image data in EDI is a package containing compressed (ZIP or tar) files with the images, a directory manifest with additional image-specific metadata, and a package-level EML metadata file. Images in the compressed archive may be organized within directories with filenames corresponding to treatments, locations, time periods, individuals, or other grouping attributes. Additionally, the directory manifest table has columns for each attribute. Package-level metadata include standard coverage elements (e.g., date, time, location) and sampling methods. This approach of archiving logical ‘sets’ of images reduces the effort of providing metadata for each image when most information would be repeated, but at the expense of not making every image individually searchable. The latter may be overcome if the provided manifest contains standard metadata that would allow searching and automatic integration with other images. 
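The directory-manifest approach described in this abstract can be sketched as follows. This is a minimal illustration and not EDI's tooling: it assumes that each directory level in an image path encodes one grouping attribute, and the function names, column names, and example paths are all hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def build_manifest(paths, attribute_names):
    """Turn image paths into manifest rows, one column per directory-level attribute."""
    rows = []
    for path in paths:
        *dirs, filename = PurePosixPath(path).parts
        if len(dirs) != len(attribute_names):
            raise ValueError(f"{path!r} does not have {len(attribute_names)} directory levels")
        row = {"path": path, "filename": filename}
        row.update(zip(attribute_names, dirs))
        rows.append(row)
    return rows


def manifest_to_csv(rows, attribute_names):
    """Serialize manifest rows as CSV text with a fixed column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["path", "filename", *attribute_names],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical layout: <location>/<year>/<image file>
example_rows = build_manifest(
    ["siteA/2019/img_001.jpg", "siteB/2020/img_002.jpg"],
    ["location", "year"],
)
example_csv = manifest_to_csv(example_rows, ["location", "year"])
```

Because every row carries the grouping attributes explicitly, individual images in such a manifest can be filtered or joined with other tables without unpacking the archive.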
  5. The StraboSpot data system provides field-based geologists the ability to digitally collect, archive, query, and share data. Recent efforts have expanded this data system with the vocabulary, standards, and workflow utilized by the sedimentary geology community. A standardized vocabulary that honors typical workflows for collecting sedimentologic and stratigraphic field and laboratory data was developed through a series of focused workshops and vetted/refined through subsequent workshops and field trips. This new vocabulary was designed to fit within the underlying structure of StraboSpot and resulted in the expansion of the existing data structure. Although the map-based approach of StraboSpot did not fully conform to the workflow for sedimentary geologists, new functions were developed for the sedimentary community to facilitate descriptions, interpretations, and the plotting of measured sections to document stratigraphic position and relationships between data types. Consequently, a new modality, Strat Mode, was added to StraboSpot; it now accommodates sedimentary workflows that enable users to document stratigraphic positions and relationships and automates construction of measured stratigraphic sections. Strat Mode facilitates data collection and co-location of multiple data types (e.g., descriptive observations, images, samples, and measurements) in geographic and stratigraphic coordinates across multiple scales, thus preserving spatial and stratigraphic relationships in the data structure. Incorporating these digital technologies will lead to better research communication in sedimentology through a common vocabulary, shared standards, and open data archiving and sharing. 