Title: Enabling data‐driven collaborative and reproducible environmental synthesis science
Abstract
This manuscript shares the lessons learned from providing scientific computing support to over 600 researchers and discipline experts, helping them develop reproducible and scalable analytical workflows to process large amounts of heterogeneous data.
When providing scientific computing support, focus is first placed on how to foster the collaborative aspects of multidisciplinary projects on the technological side by providing virtual spaces to communicate and share documents. Then insights on data management planning and how to implement a centralized data management workflow for data‐driven projects are provided.
Developing reproducible workflows requires the development of code. We describe tools and practices that have been successful in fostering collaborative coding and scaling on remote servers, enabling teams to iterate more efficiently. We have found short training sessions combined with on‐demand specialized support to be the most impactful combination in helping scientists develop their technical skills.
Here we share our experiences in enabling researchers to do science more collaboratively and more reproducibly beyond any specific project, with long‐lasting effects on the way researchers conduct science. We hope that other groups supporting team‐ and data‐driven science (in environmental science and beyond) will benefit from the lessons we have learned over the years through trial and error.
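As a concrete illustration of the kind of collaborative, scalable workflow described above, the sketch below shows a generic Python pattern for fanning a per‐file analysis out across the cores of a shared server. It is purely illustrative: the directory, file format, and processing step are invented for demonstration and are not taken from the paper.

    # Illustrative only: a generic pattern for scaling a per-file analysis
    # across cores on a shared server. File names and the processing step
    # are hypothetical; the paper itself does not prescribe this code.
    from concurrent.futures import ProcessPoolExecutor
    from pathlib import Path
    import csv

    def process_file(path: Path) -> dict:
        """Compute a simple per-file summary (row count) as a stand-in
        for a real analytical step."""
        with path.open() as f:
            n_rows = sum(1 for _ in f)
        return {"file": path.name, "rows": n_rows}

    def run_pipeline(data_dir: str, out_csv: str) -> None:
        files = sorted(Path(data_dir).glob("*.csv"))  # centralized data location
        with ProcessPoolExecutor() as pool:           # scale across available cores
            results = list(pool.map(process_file, files))
        with open(out_csv, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=["file", "rows"])
            writer.writeheader()
            writer.writerows(results)

    if __name__ == "__main__":
        run_pipeline("shared_data", "summary.csv")

The same pattern moves unchanged from a laptop to a remote server, since only the data directory and core count differ.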
Award ID(s):
2419138 1929393
PAR ID:
10593947
Author(s) / Creator(s):
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Methods in Ecology and Evolution
Volume:
16
Issue:
6
ISSN:
2041-210X
Format(s):
Medium: X
Size(s):
p. 1061-1074
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract AQME, automated quantum mechanical environments, is a free and open‐source Python package for the rapid deployment of automated workflows using cheminformatics and quantum chemistry. AQME workflows integrate tasks performed across multiple computational chemistry packages and data formats, preserving all computational protocols, data, and metadata for machine and human users to access and reuse. AQME has a modular structure of independent modules that can be implemented in any sequence, allowing users to use all or only the desired parts of the program. The code has been developed for researchers with basic familiarity with the Python programming language. The CSEARCH module interfaces to molecular mechanics and semi‐empirical QM (SQM) conformer generation tools (e.g., RDKit and Conformer–Rotamer Ensemble Sampling Tool, CREST) starting from various initial structure formats. The CMIN module enables geometry refinement with SQM and neural network potentials, such as ANI. The QPREP module interfaces with multiple QM programs, such as Gaussian, ORCA, and PySCF. The QCORR module processes QM results, storing structural, energetic, and property data while also enabling automated error handling (i.e., convergence errors, wrong number of imaginary frequencies, isomerization, etc.) and job resubmission. The QDESCP module provides easy access to QM ensemble‐averaged molecular descriptors and computed properties, such as NMR spectra. Overall, AQME provides automated, transparent, and reproducible workflows to produce, analyze and archive computational chemistry results. SMILES inputs can be used, and many aspects of tedious human manipulation can be avoided. Installation and execution on Windows, macOS, and Linux platforms have been tested, and the code has been developed to support access through Jupyter Notebooks, the command line, and job submission (e.g., Slurm) scripts. Examples of pre‐configured workflows are available in various formats, and hands‐on video tutorials illustrate their use. This article is categorized under: Data Science > Chemoinformatics; Data Science > Computer Algorithms and Programming; Software > Quantum Chemistry
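As a rough illustration of the conformer‐generation step that CSEARCH automates, the sketch below calls RDKit (one of the backends the abstract names) directly. This is not AQME's own API; the molecule and settings are arbitrary examples.

    # A minimal sketch of SMILES-to-conformer generation with RDKit,
    # the kind of step CSEARCH automates. Molecule and settings are
    # illustrative, not taken from AQME itself.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))       # ethanol, from a SMILES input
    params = AllChem.ETKDGv3()                        # standard distance-geometry settings
    conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs=10, params=params)
    energies = AllChem.MMFFOptimizeMoleculeConfs(mol) # (converged_flag, energy) per conformer
    for cid, (_, e) in zip(conf_ids, energies):
        print(f"conformer {cid}: MMFF energy = {e:.2f} kcal/mol")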
  2. Abstract Many have argued that datasets resulting from scientific research should be part of the scholarly record as first‐class research products. Data sharing mandates from funding agencies and scientific journal publishers along with calls from the scientific community to better support transparency and reproducibility of scientific research have increased demand for tools and support for publishing datasets. Hydrology domain‐specific data publication services have been developed alongside more general purpose and even commercial data repositories. Prominent among these are the Hydrologic Information System (HIS) and HydroShare repositories developed by the Consortium of Universities for the Advancement of Hydrologic Science, Inc. (CUAHSI). More broadly, however, multiple organizations have been involved in the practice of data publication in the hydrology domain, each having different roles that have shaped data publication and reuse. Bibliographic and archival approaches to data publication have been advanced, but both have limitations with respect to hydrologic data. Specific recommendations for improving data publication infrastructure, support, and practices to move beyond existing limitations and enable more effective data publication in support of scientific research in the hydrology domain include: improving support for journal article‐based data access and data citation, considering the workflow for data publication, enhancing support for reproducible science, encouraging publication of curated reference data collections, advancing interoperability standards for sharing data and metadata among repositories, developing partnerships with university libraries offering data services, and developing more specific data management plans. While presented in the context of CUAHSI's data repositories and experience, these recommendations are broadly applicable to other domains. This article is categorized under: Science of Water > Methods
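As one illustration of the programmatic data‐publication workflow discussed above, the sketch below deposits a dataset into HydroShare using the hs_restclient Python package. Credentials, file names, and metadata are placeholders, and the call pattern reflects the client's documented usage rather than anything prescribed by the article.

    # A hedged sketch of programmatic data publication to HydroShare
    # via the hs_restclient package. Credentials, file names, and
    # metadata below are placeholders.
    from hs_restclient import HydroShare, HydroShareAuthBasic

    auth = HydroShareAuthBasic(username="user", password="password")  # placeholder credentials
    hs = HydroShare(auth=auth)

    resource_id = hs.createResource(
        "CompositeResource",                       # generic HydroShare resource type
        "Streamflow observations, Example Creek",  # illustrative title
        resource_file="streamflow.csv",            # local data file to upload
        keywords=["streamflow", "hydrology"],
        abstract="Daily streamflow observations published for reuse.",
    )
    print(f"Published resource: https://www.hydroshare.org/resource/{resource_id}/")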
  3. Abstract With the rise of data volume and computing power, seismological research requires more advanced skills in data processing, numerical methods, and parallel computing. We present the experience of conducting training workshops in various forms of delivery to support the adoption of large-scale high-performance computing (HPC) and cloud computing, advancing seismological research. The seismological foci were on earthquake source parameter estimation in catalogs, forward and adjoint wavefield simulations in 2D and 3D at local, regional, and global scales, earthquake dynamics, ambient noise seismology, and machine learning. This contribution describes the series of workshops delivered as part of research projects, the learning outcomes for participants, and lessons learned by the instructors. Our curriculum was grounded in open and reproducible science, large-scale scientific computing and data mining, and computing infrastructure (access and usage) for HPC and the cloud. We also describe the types of teaching materials that have proven beneficial to the instruction and the sustainability of the program. We propose guidelines to deliver future workshops on these topics.
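As a flavor of the hands‐on material such workshops typically start from, the sketch below retrieves and preprocesses a waveform with ObsPy's FDSN client. The network, station, and time window are arbitrary examples, not taken from the curriculum described here.

    # Illustrative only: a basic waveform-retrieval and preprocessing
    # step using ObsPy's FDSN client; station and window are examples.
    from obspy import UTCDateTime
    from obspy.clients.fdsn import Client

    client = Client("IRIS")                      # public FDSN data center
    t0 = UTCDateTime("2020-01-01T00:00:00")
    st = client.get_waveforms(network="IU", station="ANMO", location="00",
                              channel="BHZ", starttime=t0, endtime=t0 + 3600)
    st.detrend("linear")                         # basic preprocessing
    st.filter("bandpass", freqmin=0.1, freqmax=1.0)
    print(st)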
  4. Abstract Open science and open data within scholarly research programs are growing both in popularity and by requirement from grant funding agencies and journal publishers. A central component of open data management, especially on collaborative, multidisciplinary, and multi-institutional science projects, is documentation of complete and accurate metadata, workflow, and source code in addition to access to raw data and data products to uphold FAIR (Findable, Accessible, Interoperable, Reusable) principles. Although best practice in data/metadata management is to use established internationally accepted metadata schemata, many of these standards are discipline-specific, making it difficult to catalog multidisciplinary data and data products in a way that is easily findable and accessible. Consequently, scattered and incompatible metadata records create a barrier to scientific innovation, as researchers are burdened with finding and linking multidisciplinary datasets. One possible solution to increase data findability, accessibility, interoperability, reproducibility, and integrity within multi-institutional and interdisciplinary projects is a centralized and integrated data management platform. Overall, this type of interoperable framework supports reproducible open science and its dissemination to various stakeholders and the public in a FAIR manner by providing direct access to raw data and linking protocols, metadata and supporting workflow materials.
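To make the idea of a centralized, FAIR‐oriented metadata record concrete, the sketch below validates a minimal cross‐disciplinary record loosely modeled on DataCite‐style fields. The schema, field names, and example values are hypothetical, not a standard the article prescribes.

    # A minimal sketch of a cross-disciplinary metadata record for a
    # centralized catalog. Schema and values are hypothetical.
    import json

    REQUIRED_FIELDS = {"title", "creators", "publication_year", "identifier", "license"}

    def validate_record(record: dict) -> None:
        """Reject records missing the fields needed for findability and reuse."""
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"metadata record missing required fields: {sorted(missing)}")

    record = {
        "title": "Soil moisture time series, Example Watershed",
        "creators": ["Doe, Jane"],
        "publication_year": 2024,
        "identifier": "doi:10.0000/example",                # placeholder DOI
        "license": "CC-BY-4.0",
        "linked_workflow": "https://example.org/workflow",  # ties data to its processing code
    }

    validate_record(record)
    print(json.dumps(record, indent=2))  # record ready for deposit in a shared catalog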
  5. Abstract Much attention in constructionism has focused on designing tools and activities that support learners in designing fully finished and functional applications and artefacts to be shared with others. But helping students learn to debug their applications often takes on a surprisingly more instructionist stance by giving them checklists, teaching them strategies or providing them with test programmes. The idea of designing bugs for learning, or debugging by design, makes learners agents of their own learning and, more importantly, of making and solving mistakes. In this paper, we report on our implementation of 'Debugging by Design' activities in a high school classroom over a period of 8 hours as part of an electronic textiles unit. Students were tasked to craft electronic textile artefacts with problems or bugs for their peers to solve. Drawing on observations and interviews, we answer the following research questions: (1) How did students participate in making bugs for others? (2) What did students gain from designing and solving bugs for others? In the discussion, we address the opportunities and challenges that designing personally and socially meaningful failure artefacts provides for becoming objects‐to‐think‐with and objects‐to‐share‐with in student learning and promoting new directions in constructionism.
Practitioner notes
What is already known about this topic:
- There is substantial evidence for the benefits of learning programming and debugging in the context of constructing personally relevant and complex artefacts, including electronic textiles.
- Related work on productive failure has demonstrated that providing learners with strategically difficult problems (in which they 'fail') equips them to better handle subsequent challenges.
What this paper adds:
- We argue that designing bugs or 'failure artefacts' is as much a constructionist approach to learning as is designing fully functional artefacts.
- We consider how 'failure artefacts' can be both objects‐to‐learn‐with and objects‐to‐share‐with.
- We introduce the concept of 'Debugging by Design' (DbD) as a means to expand application of constructionism to the context of developing 'failure artefacts'.
Implications for practice and/or policy:
- We conceptualise a new way to enable and empower students in debugging: by designing creative, multimodal buggy projects for others to solve.
- The DbD approach may support students in near‐transfer of debugging and the beginning of a more systematic approach to debugging in later projects, and should be explored in other domains beyond e‐textiles.
- New studies should explore learning, design and teaching that empower students to design bugs in projects in mischievous and creative ways.