Title: A pangenomic approach reveals the sources of genetic variation fueling the rapid radiation of Capuchino Seedeaters
Abstract The search for the genetic basis of phenotypes has primarily focused on single nucleotide polymorphisms, often overlooking structural variants (SVs). SVs can significantly affect gene function, but detecting and characterizing them is challenging, even with long-read sequencing. Moreover, traditional single-reference methods can fail to capture many genetic variants. Using long reads, we generated a Capuchino Seedeater (Sporophila) pangenome, including 16 individuals from 7 species, to investigate how SVs contribute to species and coloration differences. Leveraging this pangenome, we mapped short-read data from 127 individuals, genotyped variants identified in the pangenome graph, and subsequently performed FST scans and genome-wide association studies. Species divergence primarily arises from SNPs and indels (< 50 bp) in non-coding regions of melanin-related genes, as larger SVs rarely overlap with divergence peaks. One exception was a 55 bp deletion near the OCA2 and HERC2 genes, associated with feather pheomelanin content. These findings support the hypothesis that the reshuffling of small regulatory alleles, rather than larger species-specific mutations, accelerated plumage evolution leading to prezygotic isolation in Capuchinos.
Award ID(s):
2232929
PAR ID:
10646297
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Editor(s):
Mérot, Claire; Connallon, Tim
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Evolution
ISSN:
0014-3820
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Despite receiving significant recent attention, the relevance of structural variation (SV) in driving phenotypic diversity remains understudied, although recent advances in long‐read sequencing, bioinformatics and pangenomic approaches have enhanced SV detection. We review the role of SVs in shaping phenotypes in avian model systems, and identify some general patterns in SV type, length and their associated traits. We found that most of the avian SVs so far identified are short indels in chickens, which are frequently associated with changes in body weight and plumage colouration. Overall, we found that relatively short SVs are more frequently detected, likely due to a combination of their prevalence compared to large SVs, and a detection bias, stemming primarily from the widespread use of short‐read sequencing and associated analytical methods. SVs most commonly involve non‐coding regions, especially introns, and when patterns of inheritance were reported, SVs associated primarily with dominant discrete traits. We summarise several examples of phenotypic convergence across different species, mediated by different SVs in the same or different genes and different types of changes in the same gene that can lead to various phenotypes. Complex rearrangements and supergenes, which can simultaneously affect and link several genes, tend to have pleiotropic phenotypic effects. Additionally, SVs commonly co‐occur with single‐nucleotide polymorphisms, highlighting the need to consider all types of genetic changes to understand the basis of phenotypic traits. We end by summarising expectations for when long‐read technologies become commonly implemented in non‐model birds, likely leading to an increase in SV discovery and characterisation. The growing interest in this subject suggests an increase in our understanding of the phenotypic effects of SVs in upcoming years. 
  2. Purugganan, Michael (Ed.)
    Abstract Structural variants (SVs) are a largely unstudied feature of plant genome evolution, despite the fact that SVs contribute substantially to phenotypes. In this study, we discovered SVs across a population sample of 347 high-coverage, resequenced genomes of Asian rice (Oryza sativa) and its wild ancestor (O. rufipogon). In addition to this short-read data set, we also inferred SVs from whole-genome assemblies and long-read data. Comparisons among data sets revealed different features of genome variability. For example, genome alignment identified a large (∼4.3 Mb) inversion in indica rice varieties relative to japonica varieties, and long-read analyses suggest that ∼9% of genes from the outgroup (O. longistaminata) are hemizygous. We focused, however, on the resequencing sample to investigate the population genomics of SVs. Clustering analyses with SVs recapitulated the rice cultivar groups that were also inferred from SNPs. However, the site-frequency spectrum of each SV type—which included inversions, duplications, deletions, translocations, and mobile element insertions—was skewed toward lower frequency variants than synonymous SNPs, suggesting that SVs may be predominantly deleterious. Among transposable elements, SINE and mariner insertions were found at especially low frequency. We also used SVs to study domestication by contrasting between rice and O. rufipogon. Cultivated genomes contained ∼25% more derived SVs and mobile element insertions than O. rufipogon, indicating that SVs contribute to the cost of domestication in rice. Peaks of SV divergence were enriched for known domestication genes, but we also detected hundreds of genes gained and lost during domestication, some of which were enriched for traits of agronomic interest. 
  3. Abstract Motivation: Variation graph representations are projected to either replace or supplement conventional single genome references due to their ability to capture population genetic diversity and reduce reference bias. Vast catalogues of genetic variants for many species now exist, and it is natural to ask which among these are crucial to circumvent reference bias during read mapping. Results: In this work, we propose a novel mathematical framework for variant selection, by casting it in terms of minimizing variation graph size subject to preserving paths of length α with at most δ differences. This framework leads to a rich set of problems based on the types of variants [e.g. single nucleotide polymorphisms (SNPs), indels or structural variants (SVs)], and whether the goal is to minimize the number of positions at which variants are listed or to minimize the total number of variants listed. We classify the computational complexity of these problems and provide efficient algorithms along with their software implementation when feasible. We empirically evaluate the magnitude of graph reduction achieved in human chromosome variation graphs using multiple α and δ parameter values corresponding to short and long-read resequencing characteristics. When our algorithm is run with parameter settings amenable to long-read mapping (α = 10 kbp, δ = 1000), 99.99% SNPs and 73% SVs can be safely excluded from human chromosome 1 variation graph. The graph size reduction can benefit downstream pan-genome analysis. Availability and implementation: https://github.com/AT-CG/VF. Supplementary information: Supplementary data are available at Bioinformatics online.
  4. ABSTRACT Comprehensively identifying the loci shaping trait variation has been challenging, in part because standard approaches often miss many types of genetic variants. Structural variants (SVs), especially transposable elements (TEs), are likely to affect phenotypic variation but we lack methods that can detect polymorphic SVs and TEs using short‐read sequencing data. Here, we used a whole genome alignment between two maize genotypes to identify polymorphic SVs and then genotyped a large maize diversity panel for these variants using short‐read sequencing data. After characterising SV variation in the panel, we identified SV polymorphisms that are associated with life history traits and genotype‐by‐environment (GxE) interactions. While most of the SVs associated with traits contained TEs, only two of the SVs had boundaries that clearly matched TE breakpoints indicative of a TE insertion, while the other polymorphisms were likely caused by deletions. One of the SVs that appeared to be caused by a TE insertion had the most associations with gene expression compared to other trait‐associated SVs. All of the SVs associated with traits were in linkage disequilibrium with nearby single nucleotide polymorphisms (SNPs), suggesting that the approach used here did not identify unique associations that would have been missed in a SNP association study. Overall, we have (1) created a technique to genotype SV polymorphisms across a large diversity panel using support from genomic short‐read sequencing alignments and (2) connected this presence/absence SV variation to diverse traits and GxE interactions. 
  5. Abstract Multi‐locus sequence data are widely used in fungal systematic and taxonomic studies to delimit species and infer evolutionary relationships. We developed and assessed the efficacy of a multi‐locus pooled sequencing method using PacBio long‐read high‐throughput sequencing. Samples included fresh and dried voucher specimens, cultures and archival DNA extracts of Agaricomycetes with an emphasis on the order Cantharellales. Of the 283 specimens sequenced, 93.6% successfully amplified at one or more loci with a mean of 3.3 loci amplified. Our method recovered multiple sequence variants representing alleles of rDNA loci and single copy protein‐coding genes rpb1, rpb2 and tef1. Within‐sample genetic variation differed by locus and taxonomic group, with the greatest genetic divergence observed among sequence variants of rpb2 and tef1 from corticioid Cantharellales. Our method is a cost‐effective approach for generating accurate multi‐locus sequence data coupled with recovery of alleles from polymorphic samples and multi‐organism specimens. These results have important implications for understanding intra‐individual genomic variation among genetic loci commonly used in species delimitation of fungi.