Title: iMicrobe: Tools and data-driven discovery platform for the microbiome sciences
Abstract: Background: Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. Findings: The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation–supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). Conclusions: iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
Award ID(s):
1639588 1640775 1639614
PAR ID:
10555360
Publisher / Repository:
Oxford University Press
Journal Name:
GigaScience
Volume:
8
Issue:
7
ISSN:
2047-217X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Summary: PlasCAT (Plasmid Cloud Assembly Tool) is an easy-to-use cloud-based bioinformatics tool that enables de novo plasmid sequence assembly from raw sequencing data. Nontechnical users can now assemble sequences from long reads and short reads without ever touching a line of code. PlasCAT uses high-performance computing servers to reduce run times on assemblies and deliver results faster. Availability and implementation: PlasCAT is freely available on the web at https://sequencing.genofab.com. The assembly pipeline source code and server code are available for download at https://bitbucket.org/genofabinc/workspace/projects/PLASCAT. Click the Cancel button to access the source code without authenticating. Web servers implemented in React.js and Python, with all major browsers supported.
  2. Abstract: Background: At the molecular level, nonlinear networks of heterogeneous molecules control many biological processes, so that systems biology provides a valuable approach in this field, building on the integration of experimental biology with mathematical modeling. One of the biggest challenges to making this integration a reality is that many life scientists do not possess the mathematical expertise needed to build and manipulate mathematical models well enough to use them as tools for hypothesis generation. Available modeling software packages often assume some modeling expertise. There is a need for software tools that are easy to use and intuitive for experimentalists. Results: This paper introduces PlantSimLab, a web-based application developed to allow plant biologists to construct dynamic mathematical models of molecular networks, interrogate them in a manner similar to what is done in the laboratory, and use them as a tool for biological hypothesis generation. It is designed to be used by experimentalists, without direct assistance from mathematical modelers. Conclusions: Mathematical modeling techniques are a useful tool for analyzing complex biological systems, and there is a need for accessible, efficient analysis tools within the biological community. PlantSimLab enables users to build, validate, and use intuitive qualitative dynamic computer models, with a graphical user interface that does not require mathematical modeling expertise. It makes analysis of complex models accessible to a larger community, as it is platform-independent and does not require extensive mathematical expertise.
  3. Many research codes assume a user’s proficiency with high-performance computing tools, which often hinders their adoption by a community of users. Our goal is to create a user-friendly gateway to allow such users to leverage new capabilities brought forward to the fracture mechanics community by the phase-field approach to fracture, implemented in the open source code vDef. We leveraged popular existing tools for building such frameworks (Agave, Django, and Docker) to build a Science Gateway that allows a user to submit a large number of jobs at once. We use the Agave framework to run jobs and handle all communications with the high-performance computers, as well as data sharing and tracking of provenance. Django was used to create a web application. Docker provided an easily deployable image of the system, simplifying setup by the user. The result is a system that masks all interactions with the high-performance computing environment and provides a graphical interface that makes sense for scientists. In the common situation of parameter sweeps, our gateway also helps scientists compare the outputs of various computations using a matrix view that links to individual computations.
  4. Abstract: Microbes drive myriad ecosystem processes, but under strong influence from viruses. Because studying viruses in complex systems requires different tools than those for microbes, they remain underexplored. To combat this, we previously aggregated double-stranded DNA (dsDNA) virus analysis capabilities and resources into ‘iVirus’ on the CyVerse collaborative cyberinfrastructure. Here we substantially expand iVirus’s functionality and accessibility, to iVirus 2.0, as follows. First, core iVirus apps were integrated into the Department of Energy’s Systems Biology KnowledgeBase (KBase) to provide an additional analytical platform. Second, at CyVerse, 20 software tools (apps) were upgraded or added as new tools and capabilities. Third, nearly 20-fold more sequence reads were aggregated to capture new data and environments. Finally, documentation, as “live” protocols, was updated to maximize user interaction with and contribution to infrastructure development. Together, iVirus 2.0 serves as a uniquely central and accessible analytical platform for studying how viruses, particularly dsDNA viruses, impact diverse microbial ecosystems.
  5. Abstract: Premise: Among the slowest steps in the digitization of natural history collections is converting imaged labels into digital text. We present here a working solution to overcome this long‐recognized efficiency bottleneck that leverages synergies between community science efforts and machine learning approaches. Methods: We present two new semi‐automated services. The first detects and classifies typewritten, handwritten, or mixed labels from herbarium sheets. The second uses a workflow tuned for specimen labels to label text using optical character recognition (OCR). The label finder and classifier was built via humans‐in‐the‐loop processes that utilize the community science Notes from Nature platform to develop training and validation data sets to feed into a machine learning pipeline. Results: Our results showcase a >93% success rate for finding and classifying main labels. The OCR pipeline optimizes pre‐processing, multiple OCR engines, and post‐processing steps, including an alignment approach borrowed from molecular systematics. This pipeline yields >4‐fold reductions in errors compared to off‐the‐shelf open‐source solutions. The OCR workflow also allows human validation using a custom Notes from Nature tool. Discussion: Our work showcases a usable set of tools for herbarium digitization including a custom‐built web application that is freely accessible. Further work to better integrate these services into existing toolkits can support broad community use.