An implementation of the Sparrow data system (https://sparrow-data.org) is currently being developed to support laboratory workflows for sample preparation, geochemical analysis, and SEM imaging in support of tephra research. Tephra, consisting of fragmental material ejected from volcanoes, has a multidisciplinary array of applications from volcanology to geochronology, archaeology, environmental change, and more. The international tephra research community has developed a comprehensive set of recommendations for data and metadata collection and reporting (https://doi.org/10.5281/zenodo.3866266) as part of a broader effort to adopt FAIR practices. Implementations of these recommendations now exist for field data via StraboSpot (https://strabospot.org/files/StraboSpotTephraHelp.pdf) and for samples, analytical methods, and geochemistry via SESAR and EarthChem (https://earthchem.org/communities/tephra/). Implementing these recommended practices in Sparrow helps to (1) cover laboratory workflows between field sample collection and project data archiving and (2) address a key researcher pain point. As re-emphasized by participants in the Tephra Fusion 2022 workshop earlier this year (Wallace et al., this meeting), the huge workload currently needed to capture and organize data and metadata in preparation for archiving in community data repositories is a major obstacle to achieving FAIR practices. By capturing this information on the fly during laboratory workflows and integrating it together in a single data system, this challenge may be overcome. We are implementing the tephra community recommendations as extensions to Sparrow’s core database schema. Data import pipelines and user interfaces to streamline metadata capture are also being developed. In the longer term, we aim to achieve interoperability with an ecosystem of tools and repositories like StraboSpot, SESAR, EarthChem, and Throughput. The results of these developments will be applicable not just to tephra but also to other research areas which utilize similar laboratory and analytical methods - e.g. sedimentology, mineralogy, and petrology. 
                        more » 
                        « less   
                    
                            
                            medna-metadata: an open-source data management system for tracking environmental DNA samples and metadata
                        
                    
    
            Abstract MotivationEnvironmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered sequences, and taxonomic assignments to sequences. High-quality community shareable eDNA resources rely heavily on comprehensive metadata documentation that captures the complex workflows covering field sampling, molecular biology lab work, and bioinformatic analyses. There are limited sources that provide documentation of database development on comprehensive metadata for eDNA and these workflows and no open-source software. ResultsWe present medna-metadata, an open-source, modular system that aligns with Findable, Accessible, Interoperable, and Reusable guiding principles that support scholarly data reuse and the database and application development of a standardized metadata collection structure that encapsulates critical aspects of field data collection, wet lab processing, and bioinformatic analysis. Medna-metadata is showcased with metabarcoding data from the Gulf of Maine (Polinski et al., 2019). Availability and implementationThe source code of the medna-metadata web application is hosted on GitHub (https://github.com/Maine-eDNA/medna-metadata). Medna-metadata is a docker-compose installable package. Documentation can be found at https://medna-metadata.readthedocs.io/en/latest/?badge=latest. The application is implemented in Python, PostgreSQL and PostGIS, RabbitMQ, and NGINX, with all major browsers supported. A demo can be found at https://demo.metadata.maine-edna.org/. Supplementary informationSupplementary data are available at Bioinformatics online. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 1849227
- PAR ID:
- 10372296
- Publisher / Repository:
- Oxford University Press
- Date Published:
- Journal Name:
- Bioinformatics
- Volume:
- 38
- Issue:
- 19
- ISSN:
- 1367-4803
- Format(s):
- Medium: X Size: p. 4589-4597
- Size(s):
- p. 4589-4597
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Cheng, Jianlin (Ed.)Abstract MotivationA Multiple Sequence Alignment (MSA) contains fundamental evolutionary information that is useful in the prediction of structure and function of proteins and nucleic acids. The “Number of Effective Sequences” (NEFF) quantifies the diversity of sequences of an MSA. While several tools embed NEFF calculation with various options, none are standalone tools for this purpose, and they do not offer all the available options. ResultsWe developed NEFFy, the first software package to integrate all these options and calculate NEFF across diverse MSA formats for proteins, RNAs, and DNAs. It surpasses existing tools in functionality without compromising computational efficiency and scalability. NEFFy also offers per-residue NEFF calculation and supports NEFF computation for MSAs of multimeric proteins, with the capability to be extended to DNAs and RNAs. Availability and ImplementationNEFFy is released as open-source software under the GNU Public License v3.0. The source code in C ++ and a Python wrapper are available at https://github.com/Maryam-Haghani/NEFFy. To ensure users can fully leverage these capabilities, comprehensive documentation and examples are provided at https://Maryam-Haghani.github.io/NEFFy. Supplementary InformationSupplementary data are available at Bioinformatics online.more » « less
- 
            PremiseThe digitization of natural history collections includes transcribing specimen label data into standardized formats. Born‐digital specimen data initially gathered in digital formats do not need to be transcribed, enabling their efficient integration into digitized collections. Modernizing field collection methods for born‐digital workflows requires the development of new tools and processes. Methods and ResultscollNotes, a mobile application, was developed for Android andiOSto supplement traditional field journals. Designed for efficiency in the field, collNotes avoids redundant data entries and does not require cellular service. collBook, a companion desktop application, refines field notes into database‐ready formats and produces specimen labels. ConclusionscollNotes and collBook can be used in combination as a field‐to‐database solution for gathering born‐digital voucher specimen data for plants and fungi. Both programs are open source and use common file types simplifying either program's integration into existing workflows.more » « less
- 
            Abstract SummarySequencing data resources have increased exponentially in recent years, as has interest in large-scale meta-analyses of integrated next-generation sequencing datasets. However, curation of integrated datasets that match a user’s particular research priorities is currently a time-intensive and imprecise task. MetaSeek is a sequencing data discovery tool that enables users to flexibly search and filter on any metadata field to quickly find the sequencing datasets that meet their needs. MetaSeek automatically scrapes metadata from all publicly available datasets in the Sequence Read Archive, cleans and parses messy, user-provided metadata into a structured, standard-compliant database and predicts missing fields where possible. MetaSeek provides a web-based graphical user interface and interactive visualization dashboard, as well as a programmatic API to rapidly search, filter, visualize, save, share and download matching sequencing metadata. Availability and implementationThe MetaSeek online interface is available at https://www.metaseek.cloud/. The MetaSeek database can also be accessed via API to programmatically search, filter and download all metadata. MetaSeek source code, metadata scrapers and documents are available at https://github.com/MetaSeek-Sequencing-Data-Discovery/metaseek/.more » « less
- 
            Abstract MotivationAcross biology, we are seeing rapid developments in scale of data production without a corresponding increase in data analysis capabilities. ResultsHere, we present Aether (http://aether.kosticlab.org), an intuitive, easy-to-use, cost-effective and scalable framework that uses linear programming to optimally bid on and deploy combinations of underutilized cloud computing resources. Our approach simultaneously minimizes the cost of data analysis and provides an easy transition from users’ existing HPC pipelines. Availability and implementationData utilized are available at https://pubs.broadinstitute.org/diabimmune and with EBI SRA accession ERP005989. Source code is available at (https://github.com/kosticlab/aether). Examples, documentation and a tutorial are available at http://aether.kosticlab.org. Supplementary informationSupplementary data are available at Bioinformatics online.more » « less
 An official website of the United States government
An official website of the United States government 
				
			
