Search results: all records with Award ID 1934384


  1. Abstract Comprehensively identifying the loci shaping trait variation has been challenging, in part because standard approaches often miss many types of genetic variants. Structural variants (SVs), especially transposable elements (TEs), are likely to affect phenotypic variation, but we lack methods that can detect polymorphic SVs and TEs using short‐read sequencing data. Here, we used a whole-genome alignment between two maize genotypes to identify polymorphic SVs and then genotyped a large maize diversity panel for these variants using short‐read sequencing data. After characterising SV variation in the panel, we identified SV polymorphisms that are associated with life history traits and genotype‐by‐environment (GxE) interactions. Although most of the SVs associated with traits contained TEs, only two of them had boundaries that clearly matched TE breakpoints indicative of a TE insertion; the other polymorphisms were likely caused by deletions. One of the SVs that appeared to be caused by a TE insertion had more associations with gene expression than any other trait‐associated SV. All of the SVs associated with traits were in linkage disequilibrium with nearby single nucleotide polymorphisms (SNPs), suggesting that the approach used here did not identify unique associations that would have been missed in a SNP association study. Overall, we have (1) created a technique to genotype SV polymorphisms across a large diversity panel using support from genomic short‐read sequencing alignments and (2) connected this presence/absence SV variation to diverse traits and GxE interactions.
  2. Abstract Comprehensive maps of functional variation at transcription factor (TF) binding sites (cis-elements) are crucial for elucidating how genotype shapes phenotype. Here, we report the construction of a pan-cistrome of the maize leaf under well-watered and drought conditions. We quantified haplotype-specific TF footprints across a pan-genome of 25 maize hybrids and mapped over 200,000 variants, genetic, epigenetic, or both (termed binding quantitative trait loci, bQTL), linked to cis-element occupancy. Three lines of evidence support the functional significance of bQTL: (1) coincidence with causative loci that regulate traits, including vgt1, ZmTRE1 and the MITE transposon near ZmNAC111 under drought; (2) bQTL allelic bias is shared between inbred parents and matches chromatin immunoprecipitation sequencing results; and (3) partitioning genetic variation across genomic regions demonstrates that bQTL capture the majority of heritable trait variation across ~72% of 143 phenotypes. Our study provides an auspicious approach to make functional cis-variation accessible at scale for genetic studies and targeted engineering of complex traits.
  3. Abstract A key prediction of neutral theory is that the level of genetic diversity in a population should scale with population size. However, as was noted by Richard Lewontin in 1974 and reaffirmed by later studies, the slope of the population size-diversity relationship in nature is much weaker than expected under neutral theory. We hypothesize that one contributor to this paradox is that current methods relying on single nucleotide polymorphisms (SNPs) called from aligning short reads to a reference genome underestimate levels of genetic diversity in many species. As a first step to testing this idea, we calculated nucleotide diversity (π) and k-mer-based metrics of genetic diversity across 112 plant species, amounting to over 205 terabases of DNA sequencing data from 27,488 individuals. After excluding 14 species with low coverage or no variant sites called, we compared how different diversity metrics correlated with proxies of population size that account for both range size and population density variation across species. We found that our population size proxies scaled anywhere from about 3 to over 20 times faster with k-mer diversity than nucleotide diversity after adjusting for evolutionary history, mating system, life cycle habit, cultivation status, and invasiveness. The relationship between k-mer diversity and population size proxies also remains significant after correcting for genome size, whereas the analogous relationship for nucleotide diversity does not. These results are consistent with the possibility that variation not captured by common SNP-based analyses explains part of Lewontin’s paradox in plants, but larger scale pangenomic studies are needed to definitively address this question. 
  4. Harris, Kelley (Ed.)
    Abstract Many commonly studied species now have more than one chromosome-scale genome assembly, revealing a large amount of genetic diversity previously missed by approaches that map short reads to a single reference. However, many species still lack multiple reference genomes, and correctly aligning references to build pangenomes can be challenging, limiting our ability to study this missing genomic variation in population genetics. Here, we argue that k-mers are a very useful but underutilized tool for bridging the reference-focused paradigms of population genetics with the reference-free paradigms of pangenomics. We review current literature on the uses of k-mers for performing three core components of most population genetics analyses: identifying, measuring, and explaining patterns of genetic variation. We also demonstrate how different k-mer-based measures of genetic variation behave in population genetic simulations according to the choice of k, depth of sequencing coverage, and degree of data compression. Overall, we find that k-mer-based measures of genetic diversity scale consistently with pairwise nucleotide diversity (π) up to values of about π = 0.025 (R² = 0.97) for neutrally evolving populations. For populations with even more variation, using shorter k-mers will maintain the scalability up to at least π = 0.1. Furthermore, in our simulated populations, k-mer dissimilarity values can be reliably approximated from counting Bloom filters, highlighting a potential avenue for decreasing the memory burden of k-mer-based genomic dissimilarity analyses. For future studies, there is a great opportunity to further develop methods for identifying selected loci using k-mers.
  5. Birchler, J (Ed.)
    Abstract The highly active family of Mutator (Mu) DNA transposons has been widely used for forward and reverse genetics in maize. There are examples of Mu-suppressible alleles that result in conditional phenotypic effects based on the activity of Mu. Phenotypes from these Mu-suppressible mutations are observed in Mu-active genetic backgrounds, but absent when Mu activity is lost. For some Mu-suppressible alleles, phenotypic suppression likely results from an outward-reading promoter within Mu that is only active when the autonomous Mu element is silenced or lost. We isolated 35 Mu alleles from the UniformMu population that represent insertions in 24 different genes. Most of these mutant alleles are due to insertions within gene coding sequences, but several 5′ UTR and intron insertions were included. RNA-seq and de novo transcript assembly were utilized to document the transcripts produced from 33 of these Mu insertion alleles. For 20 of the 33 alleles, there was evidence of transcripts initiating within the Mu sequence reading through the gene. This outward-reading promoter activity was detected in multiple types of Mu elements and does not depend on the orientation of Mu. Expression analyses of Mu-initiated transcripts revealed the Mu promoter often provides gene expression levels and patterns that are similar to the wild-type gene. These results suggest the Mu promoter may represent a minimal promoter that can respond to gene cis-regulatory elements. Findings from this study have implications for maize researchers using the UniformMu population, and more broadly highlight a strategy for transposons to co-exist with their host. 
  6. Much of the profound interspecific variation in genome content has been attributed to transposable elements (TEs). To explore the extent of TE variation within species, we developed an optimized open-source algorithm, panEDTA, to de novo annotate TEs in a pangenome context. We then generated a unified TE annotation for a maize pangenome derived from 26 reference-quality genomes, which reveals an excess of 35.1 Mb of TE sequences per genome in tropical maize relative to temperate maize. A small number (n = 216) of TE families, mainly LTR retrotransposons, drive these differences. Evidence from the methylome, transcriptome, LTR age distribution, and LTR insertional polymorphisms reveals that 64.7% of the variability is contributed by LTR families that are young, less methylated, and more expressed in tropical maize, whereas 18.5% is driven by LTR families that were removed or lost in temperate maize. Additionally, we find enrichment for young LTR families adjacent to nucleotide-binding and leucine-rich repeat (NLR) clusters of varying copy number across lines, suggesting TE activity may be associated with disease resistance in maize.
  7. VITTE, Clémentine (Ed.)
    Structural differences between genomes are a major source of genetic variation that contributes to phenotypic differences. Transposable elements, mobile genetic sequences capable of increasing their copy number and propagating themselves within genomes, can generate structural variation. However, their repetitive nature makes it difficult to characterize fine-scale differences in their presence at specific positions, limiting our understanding of their impact on genome variation. Domesticated maize is a particularly good system for exploring the impact of transposable element proliferation, as over 70% of the genome is annotated as transposable elements. High-quality transposable element annotations were recently generated for de novo genome assemblies of 26 diverse inbred maize lines. We generated base-pair resolved pairwise alignments between the B73 maize reference genome and the remaining 25 inbred maize line assemblies. From these data, we classified transposable elements as either shared or polymorphic in a given pairwise comparison. Our analysis uncovered substantial structural variation between lines, representing both simple and complex connections between TEs and structural variants. Putative insertions in SNP-depleted regions, which represent recently diverged identity-by-state blocks, suggest some TE families may still be active. However, our analysis reveals that within these recently diverged genomic regions, deletions of transposable elements likely account for more structural variation events and base pairs than insertions. These deletions are often large structural variants containing multiple transposable elements. Combined, our results highlight how transposable elements contribute to structural variation and demonstrate that deletion events are a major contributor to genomic differences.
  8. Suh, Alexander; Chapman, Tracey (Ed.)
    Abstract It is unclear how mobile DNA sequences (transposable elements, hereafter TEs) invade eukaryotic genomes and reach stable copy numbers, as transposition can decrease host fitness. This challenge is particularly stark early in the invasion of a TE family at which point hosts may lack the specialized machinery to repress the spread of these TEs. One possibility (in addition to the evolution of host regulation of TEs) is that TE families may evolve to preferentially insert into chromosomal regions that are less likely to impact host fitness. This may allow the mean TE copy number to grow while minimizing the risk for host population extinction. To test this, we constructed simulations to explore how the transposition probability and insertion preference of a TE family influence the evolution of mean TE copy number and host population size, allowing for extinction. We find that the effect of a TE family’s insertion preference depends on a host’s ability to regulate this TE family. Without host repression, a neutral insertion preference increases the frequency of and decreases the time to population extinction. With host repression, a preference for neutral insertions minimizes the cumulative deleterious load, increases population fitness, and, ultimately, avoids triggering an extinction vortex. 
  9. Abstract Protein translation is tightly and precisely controlled by multiple mechanisms including upstream open reading frames (uORFs), but the origins of uORFs and their role in maize are largely unexplored. In this study, an active transposition event was identified during the propagation of maize inbred line B73. The transposon, which was named BTA for ‘B73 active transposable element hAT’, creates a novel dosage-dependent hypomorphic allele of the hexose transporter gene ZmSWEET4c through insertion within the coding sequence in the first exon, and results in reduced kernel size. The BTA insertion does not affect transcript abundance but reduces protein abundance of ZmSWEET4c, probably through the introduction of a uORF. Furthermore, the introduction of BTA sequence in the exon of other genes can regulate translation efficiency without affecting their mRNA levels. A transposon capture assay revealed 79 novel insertions for BTA and BTA-like elements. These insertion sites have typical euchromatin features, including low levels of DNA methylation and high levels of H3K27ac. A putative autonomous element that mobilizes BTA and BTA-like elements was identified. Together, our results suggest a transposon-based origin of uORFs and document a new role for transposable elements to influence protein abundance and phenotypic diversity by affecting the translation rate. 
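The first abstract genotypes presence/absence SVs across a diversity panel "using support from genomic short-read sequencing alignments" but does not state the decision rule. The sketch below is a hypothetical coverage-ratio caller, assuming per-base read depths are already available for the SV interval and its flanks. `genotype_sv`, `min_ratio` and `min_flank_depth` are illustrative names and thresholds, not the authors' method.

```python
from statistics import mean


def genotype_sv(sv_depths: list[float], flank_depths: list[float],
                min_ratio: float = 0.5, min_flank_depth: float = 5.0) -> str:
    """Hypothetical presence/absence call for one SV in one individual.

    sv_depths: per-base short-read depth inside the SV interval.
    flank_depths: per-base depth in flanking sequence, used as a local
    baseline for how deeply this individual was sequenced.
    """
    baseline = mean(flank_depths)
    if baseline < min_flank_depth:
        # Too little nearby coverage to tell true absence from dropout.
        return "missing"
    ratio = mean(sv_depths) / baseline
    # Reads aligning inside the SV support its presence; an empty
    # interval with well-covered flanks suggests the sequence is absent.
    return "present" if ratio >= min_ratio else "absent"


print(genotype_sv([10.0] * 50, [12.0] * 20))  # present
print(genotype_sv([0.0] * 50, [12.0] * 20))   # absent
print(genotype_sv([3.0] * 50, [1.0] * 20))    # missing
```

Normalising to flank depth keeps calls comparable across individuals sequenced at different depths. A practical caller would also have to deal with multi-mapping reads, which TE-rich SVs attract.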
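The second abstract relies on "bQTL allelic bias" measured from haplotype-specific footprints in hybrids. One common way to ask whether reads at a heterozygous site favour one parental allele is an exact two-sided binomial test against a 50:50 expectation. The sketch assumes per-site allele read counts as input; it is not necessarily the statistic the study used, and `allelic_bias` is a hypothetical helper.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value: the total probability of every
    outcome that is no more likely than the observed count k out of n."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so outcomes tied with k in exact arithmetic
    # are not dropped because of floating-point rounding.
    cutoff = pmf[k] * (1 + 1e-7)
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def allelic_bias(ref_reads: int, alt_reads: int) -> tuple[float, float]:
    """Return (reference-allele fraction, two-sided p-value against 0.5)."""
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads cover this site")
    return ref_reads / n, binomial_two_sided_p(ref_reads, n)


print(allelic_bias(10, 0))  # (1.0, 0.001953125)
print(allelic_bias(5, 5))   # (0.5, 1.0)
```

With many sites tested genome-wide, the resulting p-values would still need multiple-testing correction before any site is called biased.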
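Abstracts 3 and 4 compare nucleotide diversity (π) with k-mer-based diversity. π is the average number of pairwise differences per aligned site, π = (Σ_{i<j} d_ij) / (C(n, 2) · L) for n sequences of aligned length L. k-mer metrics instead compare the sets of length-k substrings each sample contains, so they need no alignment. The sketch computes π for equal-length aligned sequences and, as one illustrative k-mer metric, the mean pairwise Jaccard dissimilarity of k-mer sets. The abstracts do not say which k-mer metric they use, so that choice is an assumption.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Mean pairwise proportion of differing sites among aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return diffs / (len(pairs) * length)


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct length-k substrings of seq (seq must be at least k long)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mean_kmer_dissimilarity(seqs: list[str], k: int) -> float:
    """Mean pairwise Jaccard dissimilarity, 1 - |A & B| / |A | B|, of k-mer sets.
    Unlike pi, this needs no alignment, so sequences may differ in length."""
    sets = [kmer_set(s, k) for s in seqs]
    pairs = list(combinations(sets, 2))
    return sum(1 - len(a & b) / len(a | b) for a, b in pairs) / len(pairs)


seqs = ["ACGTACGT", "ACGTTCGT", "ACGAACGT"]
print(round(nucleotide_diversity(seqs), 4))        # 0.1667
print(round(mean_kmer_dissimilarity(seqs, 3), 4))  # 0.7262
```

Because one substitution can change up to k overlapping k-mers, k-mer dissimilarity plausibly saturates as divergence grows. That is consistent with the review's observation that shorter k-mers keep the relationship with π usable over a wider range.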
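Abstract 4 reports that k-mer dissimilarity can be approximated from counting Bloom filters to save memory. A counting Bloom filter stores a fixed array of counters, and each item increments several hashed positions. The sketch below assumes Bray-Curtis dissimilarity computed directly on the counter arrays as the comparison; the abstract does not name its dissimilarity measure, and `kmer_filter` and `bray_curtis` are hypothetical helpers.

```python
import hashlib


class CountingBloomFilter:
    """Fixed array of counters; each added item increments num_hashes of them."""

    def __init__(self, num_counters: int = 1 << 16, num_hashes: int = 3) -> None:
        self.m = num_counters
        self.h = num_hashes
        self.counters = [0] * num_counters

    def _indexes(self, item: str) -> list[int]:
        # Double hashing: derive h positions from two 64-bit halves of one digest.
        digest = hashlib.blake2b(item.encode(), digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little") | 1  # odd step size
        return [(h1 + i * h2) % self.m for i in range(self.h)]

    def add(self, item: str) -> None:
        for idx in self._indexes(item):
            self.counters[idx] += 1

    def count(self, item: str) -> int:
        # Collisions can only inflate counters, so this never underestimates.
        return min(self.counters[idx] for idx in self._indexes(item))


def kmer_filter(seq: str, k: int, **kwargs) -> CountingBloomFilter:
    """Insert every k-mer occurrence of seq into a new counting Bloom filter."""
    cbf = CountingBloomFilter(**kwargs)
    for i in range(len(seq) - k + 1):
        cbf.add(seq[i:i + k])
    return cbf


def bray_curtis(a: CountingBloomFilter, b: CountingBloomFilter) -> float:
    """Bray-Curtis dissimilarity on the counter arrays (same size and hashing).

    Without collisions this equals Bray-Curtis on the exact k-mer counts;
    collisions can only lower it, because merged counters partly cancel.
    """
    num = sum(abs(x - y) for x, y in zip(a.counters, b.counters))
    den = sum(x + y for x, y in zip(a.counters, b.counters))
    return num / den if den else 0.0


a = kmer_filter("ACGTACGT", 3)
b = kmer_filter("ACGTACGT", 3)
print(a.count("ACG") >= 2)  # True
print(bray_curtis(a, b))    # 0.0
```

Memory is fixed by `num_counters` rather than by the number of distinct k-mers, which is the trade-off the abstract points to. A production version would use a compact fixed-width array instead of a Python list.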
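Abstract 7 classifies each TE as shared or polymorphic within a pairwise alignment between B73 and another line. A minimal sketch, assuming TE annotations and aligned blocks are given as half-open reference coordinates on one chromosome, and that a TE counts as shared when at least 95% of its bases fall inside aligned blocks. The threshold and all function names are hypothetical, not taken from the study.

```python
def merge_intervals(intervals: list[tuple[int, int]]) -> list[list[int]]:
    """Merge overlapping or abutting half-open [start, end) intervals."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def aligned_fraction(te: tuple[int, int], blocks: list[list[int]]) -> float:
    """Fraction of the TE's bases covered by (already merged) aligned blocks."""
    start, end = te
    covered = sum(max(0, min(end, b_end) - max(start, b_start))
                  for b_start, b_end in blocks)
    return covered / (end - start)


def classify_tes(tes: dict[str, tuple[int, int]],
                 aligned_blocks: list[tuple[int, int]],
                 min_fraction: float = 0.95) -> dict[str, str]:
    # Merging first prevents overlapping blocks from being counted twice.
    blocks = merge_intervals(aligned_blocks)
    return {name: "shared" if aligned_fraction(iv, blocks) >= min_fraction
            else "polymorphic"
            for name, iv in tes.items()}


tes = {"te1": (100, 200), "te2": (300, 400)}
blocks = [(0, 150), (140, 250), (380, 500)]
print(classify_tes(tes, blocks))  # {'te1': 'shared', 'te2': 'polymorphic'}
```

Labelling a TE as polymorphic says nothing about direction: telling an insertion in one line from a deletion in the other needs further evidence, such as whether the SV boundaries match the TE's own breakpoints, which is the distinction the abstract draws.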