
This content will become publicly available on December 1, 2022

Title: Representation and participation across 20 years of plant genome sequencing
Abstract The field of plant genome sequencing has grown rapidly in the past 20 years, leading to increases in the quantity and quality of publicly available genomic resources. The growing wealth of genomic data from an increasingly diverse set of taxa provides unprecedented potential to better understand the genome biology and evolution of land plants. Here we provide a contemporary view of land plant genomics, including analyses on assembly quality, taxonomic distribution of sequenced species and national participation. We show that assembly quality has increased dramatically in recent years, that substantial taxonomic gaps exist and that the field has been dominated by affluent nations in the Global North and China, despite a wide geographic distribution of study species. We identify numerous disconnects between the native range of focal species and the national affiliation of the researchers studying them, which we argue are rooted in colonialism—both past and present. Luckily, falling sequencing costs, widening availability of analytical tools and an increasingly connected scientific community provide key opportunities to improve existing assemblies, fill sampling gaps and empower a more global plant genomics community.
Authors:
; ; ;
Award ID(s):
1906094 1906015
Publication Date:
NSF-PAR ID:
10332278
Journal Name:
Nature Plants
Volume:
7
Issue:
12
Page Range or eLocation-ID:
1571 to 1578
ISSN:
2055-0278
Sponsoring Org:
National Science Foundation
More Like this
  1. In less than 25 y, the field of animal genome science has transformed from a discipline seeking its first glimpses into genome sequences across the Tree of Life to a global enterprise with ambitions to sequence genomes for all of Earth’s eukaryotic diversity [H. A. Lewin et al., Proc. Natl. Acad. Sci. U.S.A. 115, 4325–4333 (2018)]. As the field rapidly moves forward, it is important to take stock of the progress that has been made to best inform the discipline’s future. In this Perspective, we provide a contemporary, quantitative overview of animal genome sequencing. We identified the best available genome assemblies in GenBank, the world’s most extensive genetic database, for 3,278 unique animal species across 24 phyla. We assessed taxonomic representation, assembly quality, and annotation status for major clades. We show that while tremendous taxonomic progress has occurred, stark disparities in genomic representation exist, highlighted by a systemic overrepresentation of vertebrates and underrepresentation of arthropods. In terms of assembly quality, long-read sequencing has dramatically improved contiguity, whereas gene annotations are available for just 34.3% of taxa. Furthermore, we show that animal genome science has diversified in recent years with an ever-expanding pool of researchers participating. However, the field still appears to be dominated by institutions in the Global North, which have been listed as the submitting institution for 77% of all assemblies. We conclude by offering recommendations for improving genomic resource availability and research value while also broadening global representation.
  2. Hoffmann, Federico (Ed.)
    Abstract The first insect genome assembly (Drosophila melanogaster) was published two decades ago. Today, nuclear genome assemblies are available for a staggering 601 insect species representing 20 orders. In this study, we analyzed the most-contiguous assembly for each species and provide a “state-of-the-field” perspective, emphasizing taxonomic representation, assembly quality, gene completeness, and sequencing technologies. Relative to species richness, genomic efforts have been biased toward four orders (Diptera, Hymenoptera, Collembola, and Phasmatodea), Coleoptera are underrepresented, and 11 orders still lack a publicly available genome assembly. The average insect genome assembly is 439.2 Mb in length with 87.5% of single-copy benchmarking genes intact. Most notable has been the impact of long-read sequencing; assemblies that incorporate long reads are ∼48× more contiguous than those that do not. We offer four recommendations as we collectively continue building insect genome resources: 1) seek better integration between independent research groups and consortia, 2) balance future sampling between filling taxonomic gaps and generating data for targeted questions, 3) take advantage of long-read sequencing technologies, and 4) expand and improve gene annotations.
  3. Oliveira, Pedro L. (Ed.)
    Scientific collections such as the U.S. National Museum (USNM) are critical to filling knowledge gaps in molecular systematics studies. The global taxonomic impediment has resulted in a reduction of expert taxonomists generating new collections of rare or understudied taxa and these large historic collections may be the only reliable source of material for some taxa. Integrated systematics studies using both morphological examinations and DNA sequencing are often required for resolving many taxonomic issues but as DNA methods often require partial or complete destruction of a sample, there are many factors to consider before implementing destructive sampling of specimens within scientific collections. We present a methodology for the use of archive specimens that includes two crucial phases: 1) thoroughly documenting specimens destined for destructive sampling, a process called electronic vouchering, and 2) the pipeline used for whole genome sequencing of archived specimens, from extraction of genomic DNA to assembly of putative genomes with basic annotation. The process is presented for eleven specimens from two different insect subfamilies of medical importance to humans: Anophelinae (Diptera: Culicidae; mosquitoes) and Triatominae (Hemiptera: Reduviidae; kissing bugs). Assembly of whole mitochondrial genome sequences of all 11 specimens along with the results of an ortholog search and BLAST against the NCBI nucleotide database are also presented.
  4. Background High-quality genomic resources facilitate investigations into behavioral ecology, morphological and physiological adaptations, and the evolution of genomic architecture. Lizards in the genus Sceloporus have a long history as important ecological, evolutionary, and physiological models, making them a valuable target for the development of genomic resources. Findings We present a high-quality chromosome-level reference genome assembly, SceUnd1.0 (using 10X Genomics Chromium, HiC, and Pacific Biosciences data), and tissue/developmental stage transcriptomes for the eastern fence lizard, Sceloporus undulatus. We performed synteny analysis with other snake and lizard assemblies to identify broad patterns of chromosome evolution including the fusion of micro- and macrochromosomes. We also used this new assembly to provide improved reference-based genome assemblies for 34 additional Sceloporus species. Finally, we used RNAseq and whole-genome resequencing data to compare 3 assemblies, each representing an increased level of cost and effort: Supernova Assembly with data from 10X Genomics Chromium, HiRise Assembly that added data from HiC, and PBJelly Assembly that added data from Pacific Biosciences sequencing. We found that the Supernova Assembly contained the full genome and was a suitable reference for RNAseq and single-nucleotide polymorphism calling, but the chromosome-level scaffolds provided by the addition of HiC data allowed synteny and whole-genome association mapping analyses. The subsequent addition of PacBio data doubled the contig N50 but provided negligible gains in scaffold length. Conclusions These new genomic resources provide valuable tools for advanced molecular analysis of an organism that has become a model in physiology and evolutionary ecology.
  5. ABSTRACT Diversification can generate genomic and phenotypic strain-level diversity within microbial species. This microdiversity is widely recognized in populations, but the community-level consequences of microbial strain-level diversity are poorly characterized. Using the cheese rind model system, we tested whether strain diversity across microbiomes from distinct geographic regions impacts assembly dynamics and functional outputs. We first isolated the same three bacterial species (Staphylococcus equorum, Brevibacterium auranticum, and Brachybacterium alimentarium) from nine cheeses produced in different regions of the United States and Europe to construct nine synthetic microbial communities consisting of distinct strains of the same three bacterial species. Comparative genomics identified distinct phylogenetic clusters and significant variation in genome content across the nine synthetic communities. When we assembled each synthetic community with initially identical compositions, community structure diverged over time, resulting in communities with different dominant taxa. The taxonomically identical communities showed differing responses to abiotic (high salt) and biotic (the fungus Penicillium) perturbations, with some communities showing no response and others substantially shifting in composition. Functional differences were also observed across the nine communities, with significant variation in pigment production (light yellow to orange) and in composition of volatile organic compound profiles emitted from the rinds (nutty to sulfury). IMPORTANCE Our work demonstrated that the specific microbial strains used to construct a microbiome could impact the species composition, perturbation responses, and functional outputs of that system. These findings suggest that 16S rRNA gene taxonomic profiles alone may have limited potential to predict the dynamics of microbial communities because they usually do not capture strain-level diversity. Observations from our synthetic communities also suggest that strain-level diversity has the potential to drive variability in the aesthetics and quality of surface-ripened cheeses.