Title: CORE: Comparable, Open, Reliable, Extensible Software: An Experience Report of Four Flux Balance Analysis Tools (Poster)
Computational tools have become increasingly important to the advancement of biological research. Despite the existence of data-sharing principles such as FAIR, little attention has been paid to ensuring that computational tools provide a platform for comparing experimental results and data across different studies. Tools are often lightly documented, non-extensible, and vary in the accuracy of their representations. This lack of standardization in biological research reduces the potential for new insight and discovery and makes it hard for biologists to experiment, compare, and trust results across studies. In this poster we present our experience using four tools that perform flux balance analysis on a set of different metabolic models. We frame our work around a proposed principle, akin to FAIR, aimed at bioinformatics tools. We call this CORE (Comparable, Open, Reliable, and Extensible), and find that the biggest challenges lie in comparability between tools. We also find that while the tools are all open source, without a deep understanding of the code base they offer insufficient openness: we needed to reach out to developers to resolve many of our questions, and we were still left with unexpected behaviors. We present lessons learned as a path to future improvement using the CORE principles.
Award ID(s): 1909688
PAR ID: 10464794
Author(s) / Creator(s):
Date Published:
Journal Name: ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB)
Format(s): Medium: X
Sponsoring Org: National Science Foundation
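For readers unfamiliar with the technique, flux balance analysis (which all four tools in the poster implement) reduces to a linear program: maximize a target flux v subject to the steady-state mass balance Sv = 0 and per-reaction flux bounds. The sketch below solves a toy three-reaction network with SciPy; the network, bounds, and objective are invented for illustration and are not taken from the poster's benchmark models.

```python
import numpy as np
from scipy.optimize import linprog

# Toy stoichiometric matrix S: rows = metabolites (A, B),
# columns = reactions (R1: -> A uptake, R2: A -> B, R3: B -> export).
S = np.array([
    [1, -1,  0],   # metabolite A: produced by R1, consumed by R2
    [0,  1, -1],   # metabolite B: produced by R2, consumed by R3
])

# Flux bounds: uptake R1 is capped at 10; internal fluxes effectively unbounded.
bounds = [(0, 10), (0, 1000), (0, 1000)]

# Maximize the export flux v3 (linprog minimizes, so negate its coefficient).
c = [0, 0, -1]
res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")

print(res.x)     # optimal flux distribution, here [10, 10, 10]
print(-res.fun)  # objective value: 10.0, limited by the uptake bound
```

Dedicated FBA tools such as COBRApy wrap this same optimization behind model-reading and analysis APIs; differences in defaults, solver behavior, and numerical handling across such tools are exactly the comparability issues the poster reports.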
More Like this
  1. Abstract Science Gateways provide an easily accessible and powerful computing environment for researchers. They are built around a set of software tools that are frequently and heavily used by a large number of researchers in specific domains. Science Gateways have been catering to a growing need among researchers for easy-to-use computational tools; however, their usage model is typically single-user-centric. As scientific research becomes ever more team oriented, user-driven demand to support integrated collaborative capabilities in Science Gateways is a natural progression. The ability to share data and results with others in an integrated manner is an important and frequently requested capability. In this article we describe and discuss our work to provide a rich environment for data organization and data sharing by integrating the SeedMeLab (formerly SeedMe2) platform with two Science Gateways: CIPRES and GenApp. With this integration we also demonstrate SeedMeLab's extensible features and show how Science Gateways may incorporate and realize FAIR data principles in practice and transform into community data hubs.
  2. The ability to repeat research is vital to confirming the validity of scientific discovery and is relevant to ubiquitous sensor research. Investigations of novel sensors and sensing mechanisms span several Federal and non-Federal agencies. Despite numerous studies on sensors at different stages of development, the scarcity of new field-ready or commercial sensors appears to be limited by reproducibility. Current research practices in sensors need sustainable transformations, and the scientific community seeks ways to incorporate reproducibility and repeatability to validate published results. A case study on the reproducibility of low-cost air quality sensors is presented. In this context, the article discusses (a) open source data management frameworks aligned with findability, accessibility, interoperability, and reuse (FAIR) principles to facilitate sensor reproducibility; (b) suggestions for journals focused on sensors to incorporate a reproducibility editorial board and incentives for data sharing; (c) the practice of reproducibility through targeted focus issues; and (d) education of the current and next generation of a diverse student and faculty community on FAIR principles. The existence of different types of sensors, such as physical, chemical, biological, and magnetic, and the fact that the sensing field spans multiple disciplines (electrical engineering, mechanical engineering, physics, chemistry, and electrochemistry) call for a generic model for reproducibility. Considering the available metrics, the authors propose eight FAIR metric standards that transcend disciplines: citation standards, design and analysis transparency, data transparency, analytical methods transparency, research materials transparency, hardware transparency, preregistration of studies, and replication.
  3. Abstract As the scale of biological data generation has increased, the bottleneck of research has shifted from data generation to analysis. Researchers commonly need to build computational workflows that include multiple analytic tools and require incremental development as experimental insights demand tool and parameter modifications. These workflows can produce hundreds to thousands of intermediate files and results that must be integrated for biological insight. Data-centric workflow systems that internally manage computational resources, software, and conditional execution of analysis steps are reshaping the landscape of biological data analysis and empowering researchers to conduct reproducible analyses at scale. Adoption of these tools can facilitate and expedite robust data analysis, but knowledge of these techniques is still lacking. Here, we provide a series of strategies for leveraging workflow systems with structured project, data, and resource management to streamline large-scale biological analysis. We present these practices in the context of high-throughput sequencing data analysis, but the principles are broadly applicable to biologists working beyond this field. 
  4. The FAIR Hackathon Workshop for Mathematics and the Physical Sciences (MPS), held February 27-28, 2019 in Alexandria, Virginia, brought together forty-four stakeholders in the physical sciences community to share skills, tools, and techniques to FAIRify research data. As one of the first efforts of its kind in the US, the workshop offered participants a way to engage with FAIR (Findable, Accessible, Interoperable and Reusable) principles, data, and metrics in the context of a hackathon. The workshop was designed to address issues of public access to data and to give researchers relevant hands-on experience with FAIR tools. Existing FAIR tools and infrastructure were introduced, and hands-on hackathon breakout time was devoted to testing FAIR metrics and tools against physical sciences data. The hackathon invited MPS research data management stakeholders to react to the FAIR principles and to jointly consider gaps in the MPS data-sharing ecosystem in the context of researchers' actual projects. FAIR gap analysis was introduced as a way to identify community-specific tools or infrastructure that could dramatically enhance the ability of domain scientists to make their data more FAIR.
  5. Abstract Image‐based machine learning tools are an ascendant 'big data' research avenue. Citizen science platforms, like iNaturalist, and museum‐led initiatives provide researchers with an abundance of data and knowledge to extract, including metadata, species identifications, and phenomic data. Ecological and evolutionary biologists are increasingly applying complex, multi‐step processes to these data. Such processes frequently include machine learning techniques, often built by others, that are difficult for other members of a collaboration to reuse. We present a conceptual workflow model for machine learning applications that use image data to extract biological knowledge in the emerging field of imageomics. We derive an implementation of this conceptual workflow for a specific imageomics application that adheres to FAIR principles as a formal workflow definition, allows fully automated and reproducible execution, and consists of reusable workflow components. We outline technologies and best practices for creating an automated, reusable, and modular workflow, and we show how they promote the reuse of machine learning models and their adaptation to new research questions. This conceptual workflow can be adapted: it can be semi‐automated, contain different components than those presented here, or have parallel components for comparative studies. We encourage researchers, both computer scientists and biologists, to build upon this conceptual workflow that combines machine learning tools on image data to answer novel scientific questions in their respective fields.