Modern science depends on computers, but not all scientists have access to the scale of computation they need. A digital divide separates scientists who accelerate their science using large cyberinfrastructure from those who do not, or who do not have access to the compute resources or learning opportunities to develop the skills needed. The exclusionary nature of the digital divide threatens equity and the future of innovation by leaving people out of the scientific process while over-amplifying the voices of a small group who have resources. However, there are potential solutions: recent advancements in public research cyberinfrastructure and resources developed during the open science revolution are providing tools that can help bridge this divide. These tools can enable access to fast and powerful computation with modest internet connections and personal computers. Here we contribute another resource for narrowing the digital divide: scalable virtual machines running on public cloud infrastructure. We describe the tools, infrastructure, and methods that enabled successful deployment of a reproducible and scalable cyberinfrastructure architecture for a collaborative data synthesis working group in February 2023. This platform enabled 45 scientists with varying data and compute skills to leverage 40,000 hours of compute time over a 4-day workshop. Our approach provides an open framework that can be replicated for educational and collaborative data synthesis experiences in any data- and compute-intensive discipline.
Biodiversity at the global scale: the synthesis continues
Traditionally, the generation and use of biodiversity data and their associated specimen objects have been primarily the purview of individuals and small research groups. While deposition of data and specimens in herbaria and other repositories has long been the norm, throughout most of their history, these resources have been accessible only to a small community of specialists. Through recent concerted efforts, primarily at the level of national and international governmental agencies over the last two decades, the pace of biodiversity data accumulation has accelerated, and a wider array of biodiversity scientists has gained access to this massive accumulation of resources, applying them to an ever‐widening compass of research pursuits. We review how these new resources and increasing access to them are affecting the landscape of biodiversity research in plants today, focusing on new applications across evolution, ecology, and other fields that have been enabled specifically by the availability of these data and the global scope that was previously beyond the reach of individual investigators. We give an overview of recent advances organized along three lines: broad‐scale analyses of distributional data and spatial information, phylogenetic research circumscribing large clades with comprehensive taxon sampling, and data sets derived from improved accessibility of biodiversity literature. We also review synergies between large data resources and more traditional data collection paradigms, describe shortfalls and how to overcome them, and reflect on the future of plant biodiversity analyses in light of increasing linkages between data types and scientists in our field.
- Award ID(s): 1916632
- PAR ID: 10389002
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: American Journal of Botany
- Volume: 108
- Issue: 6
- ISSN: 0002-9122
- Format(s): Medium: X
- Size(s): p. 912-924
- Sponsoring Org: National Science Foundation
More Like this
- Online community and citizen science (CCS) projects have broadened access to scientific research and enabled different forms of participation in biodiversity research; however, little is known about whether and how such opportunities are taken up by young people (aged 5–19). Furthermore, when they do participate, there is little research on whether their online activity makes a tangible contribution to scientific research. We addressed these knowledge gaps using quantitative analytical approaches and visualisations to investigate 249 youths’ contributions to CCS on the iNaturalist platform, and the potential for the scientific use of their contributions. We found that nearly all the young volunteers’ observations were ‘verifiable’ (included a photo, location, and date/time) and therefore potentially useful to biodiversity research. Furthermore, more than half were designated as ‘Research Grade’, with a community agreed-upon identification, making them more valuable and accessible to biodiversity science researchers. Our findings show that young volunteers with lasting participation on the platform and those aged 16–19 years are more likely to have a higher proportion of Research Grade observations than younger or more ephemeral participants. This study enhances our understanding of young volunteers’ contributions to biodiversity research, as well as the important role professional scientists and data users can play in helping verify youths’ contributions to make them more accessible for biodiversity research.
- Public genomic datasets like the 1000 Genomes project (1KGP), Human Genome Diversity Project (HGDP), and the Adolescent Brain Cognitive Development (ABCD) study are valuable public resources that facilitate scientific advancements in biology and enhance the scientific and economic impact of federally funded research projects. Regrettably, these datasets have often been developed and studied in ways that propagate outdated racialized and typological thinking, leading to fallacious reasoning among some readers that social and health disparities among the so-called races are due in part to innate biological differences between them. We highlight how this framing has set the stage for the racist exploitation of these datasets in two ways: First, we discuss the use of public biomedical datasets in studies that claim support for innate genetic differences in intelligence and other social outcomes between the groups identified as races. We further highlight recent instances of this which involve unauthorized access, use, and dissemination of public datasets. Second, we discuss the memification, the use of simple figures meant for quick dissemination among lay audiences, of population genetic data to argue for a biological basis for purported human racial groups. We close with recommendations for scientists, to preempt the exploitation and misuse of their data, and for funding agencies, to better enforce violations of data use agreements.
- To fulfill their conservation potential and provide safeguards for biodiversity, marine protected areas (MPAs) need coordinated research and monitoring for informed management through effective evaluation of ecosystem dynamics. However, coordination is challenging, often due to knowledge gaps caused by inadequate access to data and resources, compounded by insufficient communication between scientists and managers. We propose to use the world's largest MPA in the Ross Sea, Antarctica, as a model system to create a comprehensive framework for an interdisciplinary network supporting research and monitoring that could be implemented in other remote large‐scale international MPAs. Our proposed framework has three key components: (i) policy engagement, including delineation of policy needs and ecosystem metrics to assess MPA effectiveness; (ii) community partner engagement to elevate diverse voices, build trust, and share resources; and (iii) integrated science comprising three themes. These themes are: advancement of data science and cyberinfrastructure to facilitate data synthesis and sharing; biophysical modeling towards understanding ecosystem changes and uncertainties; and execution of observational and process studies to address uncertainties and evaluate ecosystem metrics. This proposed framework can improve MPA implementation by generating policy‐relevant science through this coordinated network, which can in turn improve MPA effectiveness in the Ross Sea and beyond.
- The wide array of currently available genomes displays a wonderful diversity in size, composition, and structure and is quickly expanding thanks to several global biodiversity genomics initiatives. However, sequencing of genomes, even with the latest technologies, can still be challenging for both technical (e.g., small physical size, contaminated samples, or access to appropriate sequencing platforms) and biological reasons (e.g., germline-restricted DNA, variable ploidy levels, sex chromosomes, or very large genomes). In recent years, k-mer-based techniques have become popular to overcome some of these challenges. They are based on the simple process of dividing the analyzed sequences (e.g., raw reads or genomes) into a set of subsequences of length k, called k-mers, and then analyzing the frequency or sequences of those k-mers. Analyses based on k-mers allow for a rapid and intuitive assessment of complex sequencing data sets. Here, we provide a comprehensive review of the theoretical properties and practical applications of k-mers in biodiversity genomics with a special focus on genome modeling.
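The k-mer procedure summarized in the abstract above (split a sequence into overlapping subsequences of length k, then look at how often each one occurs) can be illustrated with a minimal Python sketch. This is not code from the reviewed work: the function names `count_kmers` and `frequency_spectrum` and the toy sequence are hypothetical, and real analyses typically rely on dedicated k-mer counting tools and treat reverse complements as equivalent, which this sketch does not do.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length n holds n - k + 1 overlapping k-mers (none if n < k).
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def frequency_spectrum(counts: Counter) -> dict[int, int]:
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return dict(sorted(Counter(counts.values()).items()))


if __name__ == "__main__":
    toy_read = "ATGATGATC"  # hypothetical toy sequence, not real data
    counts = count_kmers(toy_read, 3)
    print(counts)
    print(frequency_spectrum(counts))
```

The multiplicity histogram returned by `frequency_spectrum` is the kind of summary that the "frequency" analyses mentioned in the abstract start from; on the toy sequence it only shows the mechanics.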
