Abstract. Background: Scientists have amassed a wealth of microbiome datasets, making it possible to study microbes in biotic and abiotic systems on a population or planetary scale; however, this potential has not been fully realized given that the tools, datasets, and computation are available in diverse repositories and locations. To address this challenge, we developed iMicrobe.us, a community-driven microbiome data marketplace and tool exchange for users to integrate their own data and tools with those from the broader community. Findings: The iMicrobe platform brings together analysis tools and microbiome datasets by leveraging National Science Foundation–supported cyberinfrastructure and computing resources from CyVerse, Agave, and XSEDE. The primary purpose of iMicrobe is to provide users with a freely available, web-based platform to (1) maintain and share project data, metadata, and analysis products, (2) search for related public datasets, and (3) use and publish bioinformatics tools that run on highly scalable computing resources. Analysis tools are implemented in containers that encapsulate complex software dependencies and run on freely available XSEDE resources via the Agave API, which can retrieve datasets from the CyVerse Data Store or any web-accessible location (e.g., FTP, HTTP). Conclusions: iMicrobe promotes data integration, sharing, and community-driven tool development by making open source data and tools accessible to the research community in a web-based platform.
Cyberinfrastructure deployments on public research clouds enable accessible Environmental Data Science education
Modern science depends on computers, but not all scientists have access to the scale of computation they need. A digital divide separates scientists who accelerate their science using large cyberinfrastructure from those who do not, or who do not have access to the compute resources or learning opportunities to develop the skills needed. The exclusionary nature of the digital divide threatens equity and the future of innovation by leaving people out of the scientific process while over-amplifying the voices of a small group who have resources. However, there are potential solutions: recent advancements in public research cyberinfrastructure and resources developed during the open science revolution are providing tools that can help bridge this divide. These tools can enable access to fast and powerful computation with modest internet connections and personal computers. Here we contribute another resource for narrowing the digital divide: scalable virtual machines running on public cloud infrastructure. We describe the tools, infrastructure, and methods that enabled successful deployment of a reproducible and scalable cyberinfrastructure architecture for a collaborative data synthesis working group in February 2023. This platform enabled 45 scientists with varying data and compute skills to leverage 40,000 hours of compute time over a 4-day workshop. Our approach provides an open framework that can be replicated for educational and collaborative data synthesis experiences in any data- and compute-intensive discipline.
- Award ID(s):
- 2017889
- PAR ID:
- 10481402
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9781450399852
- Page Range / eLocation ID:
- 367 to 373
- Format(s):
- Medium: X
- Location:
- Portland OR USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
With the growing availability and accessibility of big data in ecology, we face an urgent need to train the next generation of scientists in data science practices and tools. One of the biggest barriers to implementing a data-driven curriculum in undergraduate classrooms is the lack of training and support for educators to develop their own skills, and of time to incorporate these principles into existing courses or develop new ones. Alongside the research goals of the National Ecological Observatory Network (NEON), education and training are key components for building a community of scientists and users equipped to utilize large-scale ecological and environmental data. To address this need, the NEON Data Education Fellows program formed as a collaborative Faculty Mentoring Network (FMN) between scientists from NEON and university faculty interested in using NEON data and resources in their ecology classrooms. Like other FMNs, this group has two main goals: (1) to provide tools, resources, and support for faculty interested in developing data-driven curricula, and (2) to make teaching materials that have been implemented and tested in the classroom available as open educational resources for other educators. We hosted this program using an open education and collaboration platform from the Quantitative Undergraduate Biology Education and Synthesis (QUBES) project. Here, we share lessons learned from facilitating five FMN cohorts and emphasize the successes, pitfalls, and opportunities for developing open education resources through community-driven collaborations.
-
With the increase in data-driven analytics, the demand for high-performance computing resources has risen. Many high-performance computing centers provide cyberinfrastructure (CI) for academic research. However, access barriers keep these resources from reaching a broad range of users. Users who are new to the data analytics field are not yet equipped to take advantage of the tools offered by CI. In this paper, we propose a framework to lower the access barriers that keep high-performance computing resources from users who do not have the training to utilize the capabilities of CI. The framework uses a divide-and-conquer (DC) paradigm for data-intensive computing tasks. It consists of three major components: a user interface (UI), a parallel scripts generator (PSG), and the underlying cyberinfrastructure (CI). The goal of the framework is to provide a user-friendly method for parallelizing data-intensive computing tasks with minimal user intervention. Key design goals include usability, scalability, and reproducibility. Users can focus on their problem and leave the parallelization details to the framework.
-
Scientists are increasingly motivated to engage the public, particularly those who do not or cannot access traditional science education opportunities. Communication researchers have identified shortcomings of the deficit model approach, which assumes that skepticism toward science is based on a lack of information or scientific literacy, and encourage scientists to facilitate open-minded exchange with the public. We describe an ambassador approach to developing a scientist’s impact identity, which integrates his or her research, personal interests, and experiences to achieve societal impacts. The scientist identifies a community or focal group to engage on the basis of his or her impact identity, learns about that group, and promotes inclusion of all group members by engaging in venues in which that group naturally gathers, rather than in traditional education settings. Focal group members stated that scientists communicated effectively and were responsive to participant questions and ideas. Scientists reported professional and personal benefits from this approach.
-
Geospatial research and education have become increasingly dependent on cyberGIS to tackle computation and data challenges. However, the use of advanced cyberinfrastructure resources for geospatial research and education is extremely challenging because of the steep learning curve for users and the high software development and integration costs for developers, a consequence of the limited availability of middleware tools that make such resources easily accessible. This tutorial describes CyberGIS-Compute as a middleware framework that addresses these challenges and provides access to high-performance resources through simple, easy-to-use interfaces. The CyberGIS-Compute framework provides an easy-to-use application interface and a Python SDK for accessing CyberGIS capabilities, allowing geospatial applications to easily scale and employ advanced cyberinfrastructure resources. In this tutorial, we will first start with the basics of CyberGIS-Jupyter and CyberGIS-Compute, then introduce the Python SDK for CyberGIS-Compute with a simple Hello World example. Then, we will walk through multiple real-world geospatial application use cases, such as spatial accessibility and wildfire evacuation simulation using agent-based modeling. We will also provide pointers on how to contribute applications to the CyberGIS-Compute framework.