

Title: Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing
Abstract

Advancing crop genomics requires efficient genetic systems enabled by high-quality personalized genome assemblies. Here, we introduce RagTag, a toolset for automating assembly scaffolding and patching, and we establish chromosome-scale reference genomes for the widely used tomato genotype M82 along with Sweet-100, a new rapid-cycling genotype that we developed to accelerate functional genomics and genome editing in tomato. This work outlines strategies to rapidly expand genetic systems and genomic resources in other plant species.

 
Award ID(s):
1627442 2216612 1920103 1732253
NSF-PAR ID:
10385598
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Genome Biology
Volume:
23
Issue:
1
ISSN:
1474-760X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Low‐coverage whole genome sequencing (lcWGS) has emerged as a powerful and cost‐effective approach for population genomic studies in both model and nonmodel species. However, with read depths too low to confidently call individual genotypes, lcWGS requires specialized analysis tools that explicitly account for genotype uncertainty. A growing number of such tools have become available, but it can be difficult to get an overview of what types of analyses can be performed reliably with lcWGS data, and how the distribution of sequencing effort between the number of samples analysed and per‐sample sequencing depths affects inference accuracy. In this introductory guide to lcWGS, we first illustrate how the per‐sample cost for lcWGS is now comparable to RAD‐seq and Pool‐seq in many systems. We then provide an overview of software packages that explicitly account for genotype uncertainty in different types of population genomic inference. Next, we use both simulated and empirical data to assess the accuracy of allele frequency, genetic diversity, and linkage disequilibrium estimation, detection of population structure, and selection scans under different sequencing strategies. Our results show that spreading a given amount of sequencing effort across more samples with lower depth per sample consistently improves the accuracy of most types of inference, with a few notable exceptions. Finally, we assess the potential for using imputation to bolster inference from lcWGS data in nonmodel species, and discuss current limitations and future perspectives for lcWGS‐based population genomics research. With this overview, we hope to make lcWGS more approachable and stimulate its broader adoption.

     
  2. Summary

    Maize (Zea mays L.), a model species for genetic studies, is one of the two most important crop species worldwide. The genome sequence of the reference genotype, B73, representative of the stiff stalk heterotic group was recently updated (AGPv4) using long‐read sequencing and optical mapping technology. To facilitate the use of AGPv4 and to enable functional genomic studies and association of genotype with phenotype, we determined expression abundances for replicated mRNA‐sequencing datasets from 79 tissues and five abiotic/biotic stress treatments revealing 36 207 expressed genes. Characterization of the B73 transcriptome across six organs revealed 4154 organ‐specific and 7704 differentially expressed (DE) genes following stress treatment. Gene co‐expression network analyses revealed 12 modules associated with distinct biological processes containing 13 590 genes providing a resource for further association of gene function based on co‐expression patterns. Presence−absence variants (PAVs) previously identified using whole genome resequencing data from 61 additional inbred lines were enriched in organ‐specific and stress‐induced DE genes suggesting that PAVs may function in phenological variation and adaptation to environment. Relative to core genes conserved across the 62 profiled inbreds, PAVs have lower expression abundances which are correlated with their frequency of dispersion across inbreds and on average have significantly fewer co‐expression network connections suggesting that a subset of PAVs may be on an evolutionary path to pseudogenization. To facilitate use by the community, we developed the Maize Genomics Resource website (maize.plantbiology.msu.edu) for viewing and data‐mining these resources and deployed two new views on the maize electronic Fluorescent Pictograph Browser (bar.utoronto.ca/efp_maize).

     
  3. Abstract

    Infections by maternally inherited bacterial endosymbionts, especially Wolbachia, are common in insects and other invertebrates but infection dynamics across species ranges are largely understudied. Specifically, we lack a broad understanding of the origin of Wolbachia infections in novel hosts, and the historical and geographical dynamics of infections that are critical for identifying the factors governing their spread. We used Genotype-by-Sequencing data from previous population genomics studies for range-wide surveys of Wolbachia presence and genetic diversity in North American butterflies of the genus Lycaeides. As few as one sequence read identified by assembly to a Wolbachia reference genome provided high accuracy in detecting infections in host butterflies as determined by confirmatory PCR tests, and maximum accuracy was achieved with a threshold of only 5 sequence reads per host individual. Using this threshold, we detected Wolbachia in all but 2 of the 107 sampling localities spanning the continent, with infection frequencies within populations ranging from 0% to 100% of individuals, but with most localities having high infection frequencies (mean = 91% infection rate). Three major lineages of Wolbachia were identified as separate strains that appear to represent 3 separate invasions of Lycaeides butterflies by Wolbachia. Overall, we found extensive evidence for acquisition of Wolbachia through interspecific transfer between host lineages. Strain wLycC was confined to a single butterfly taxon, hybrid lineages derived from it, and closely adjacent populations in other taxa. While the other 2 strains were detected throughout the rest of the continent, strain wLycB almost always co-occurred with wLycA. Our demographic modeling suggests wLycB is a recent invasion. Within strain wLycA, the 2 most frequent haplotypes are confined almost exclusively to separate butterfly taxa with haplotype A1 observed largely in Lycaeides melissa and haplotype A2 observed most often in Lycaeides idas localities, consistent with either cladogenic mode of infection acquisition from a common ancestor or by hybridization and accompanying mutation. More than 1 major Wolbachia strain was observed in 15 localities. These results demonstrate the utility of using resequencing data from hosts to quantify Wolbachia genetic variation and infection frequency and provide evidence of multiple colonizations of novel hosts through hybridization between butterfly lineages and complex dynamics between Wolbachia strains.

     
  4. Abstract

    Heritable, facultative symbionts are common in arthropods, often functioning in host defence. Despite moderately reduced genomes, facultative symbionts retain evolutionary potential through mobile genetic elements (MGEs). MGEs form the primary basis of strain‐level variation in genome content and architecture, and often correlate with variability in symbiont‐mediated phenotypes. In pea aphids (Acyrthosiphon pisum), strain‐level variation in the type of toxin‐encoding bacteriophages (APSEs) carried by the bacterium Hamiltonella defensa correlates with strength of defence against parasitoids. However, co‐inheritance creates difficulties for partitioning their relative contributions to aphid defence. Here we identified isolates of H. defensa that were nearly identical except for APSE type. When holding H. defensa genotype constant, protection levels corresponded to APSE virulence module type. Results further indicated that APSEs move repeatedly within some H. defensa clades providing a mechanism for rapid evolution in anti‐parasitoid defences. Strain variation in H. defensa also correlates with the presence of a second symbiont Fukatsuia symbiotica. Predictions that nutritional interactions structured this coinfection were not supported by comparative genomics, but bacteriocin‐containing plasmids unique to co‐infecting strains may contribute to their common pairing. In conclusion, strain diversity, and joint capacities for horizontal transfer of MGEs and symbionts, are emergent players in the rapid evolution of arthropods.

     
  5. Abstract

    An increasing number of phylogenomic studies have documented a clear “footprint” of postspeciation introgression among closely related species. Nonetheless, systematic genome-wide studies of factors that determine the likelihood of introgression remain rare. Here, we propose an a priori hypothesis-testing framework that uses introgression statistics (including a new metric of estimated introgression, Dp) to evaluate general patterns of introgression prevalence and direction across multiple closely related species. We demonstrate this approach using whole genome sequences from 32 lineages in 11 wild tomato species to assess the effect of three factors on introgression (genetic relatedness, geographical proximity, and mating system differences) based on multiple trios within the “ABBA–BABA” test. Our analyses suggest each factor affects the prevalence of introgression, although our power to detect these is limited by the number of comparisons currently available. We find that of 14 species pairs with geographically “proximate” versus “distant” population comparisons, 13 showed evidence of introgression; in 10 of these cases, this was more prevalent between geographically closer populations. We also find modest evidence that introgression declines with increasing genetic divergence between lineages, is more prevalent between lineages that share the same mating system, and, when it does occur between mating systems, tends to involve gene flow from more inbreeding to more outbreeding lineages. Although our analysis indicates that recent postspeciation introgression is frequent in this group (detected in 15 of 17 tested trios), estimated levels of genetic exchange are modest (0.2–2.5% of the genome), so the relative importance of hybridization in shaping the evolutionary trajectories of these species could be limited. Regardless, similar clade-wide analyses of genomic introgression would be valuable for disentangling the major ecological, reproductive, and historical determinants of postspeciation gene flow, and for assessing the relative contribution of introgression as a source of genetic variation.

     