

Title: Extensive genome heterogeneity leads to preferential allele expression and copy number‐dependent expression in cultivated potato
Summary

Relative to homozygous diploids, the presence of multiple homologs or homeologs in polyploids affords greater tolerance to mutations that can impact genome evolution. In this study, we describe sequence and structural variation in the genomes of six accessions of cultivated potato (Solanum tuberosum L.), a vegetatively propagated autotetraploid, and their impact on the transcriptome. Sequence diversity was high, with a mean single nucleotide polymorphism (SNP) rate of approximately 1 per 50 bases, suggestive of high levels of allelic diversity. Additive gene expression was observed in leaves (3605 genes) and tubers (6156 genes), in contrast to preferential allele expression of between 2180 and 3502 genes in the leaf transcriptome and between 3367 and 5270 genes in the tuber transcriptome. Preferential allele expression was significantly associated with evolutionarily conserved genes, suggesting selection of specific alleles of genes responsible for biological processes common to angiosperms during the breeding selection process. Copy number variation was rampant, with between 16 098 and 18 921 genes in each cultivar exhibiting duplication or deletion. Copy number variable genes tended to be evolutionarily recent, lowly expressed, and enriched in genes that show increased expression in response to biotic and abiotic stress treatments, suggestive of a role in adaptation. Gene copy number impacts on gene expression were detected, with 528 genes having correlations between copy number and gene expression. Collectively, these data suggest that in addition to allelic variation of coding sequence, the heterogeneous nature of the tetraploid potato genome contributes to a highly dynamic transcriptome impacted by allele preferential and copy number‐dependent expression effects.

 
NSF-PAR ID:
10044525
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
The Plant Journal
Volume:
92
Issue:
4
ISSN:
0960-7412
Page Range / eLocation ID:
p. 624-637
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Plants respond to abiotic stress through a variety of physiological, biochemical, and transcriptional mechanisms. Many genes exhibit altered levels of expression in response to abiotic stress, which requires concerted action of both cis‐ and trans‐regulatory features. In order to study the variability in transcriptome response to abiotic stress, RNA sequencing was performed using 14‐day‐old maize seedlings of inbreds B73, Mo17, Oh43, PH207 and B37 under control, cold and heat conditions. Large numbers of genes that responded differentially to stress between parental inbred lines were identified. RNA sequencing was also performed on similar tissues of the F1 hybrids produced by crossing B73 and each of the three other inbred lines. By evaluating allele‐specific transcript abundance in the F1 hybrids, we were able to measure the abundance of cis‐ and trans‐regulatory variation between genotypes for both steady‐state and stress‐responsive expression differences. Although examples of trans‐regulatory variation were observed, cis‐regulatory variation was more common for both steady‐state and stress‐responsive expression differences. The genes with cis‐allelic variation for response to cold or heat stress provided an opportunity to study the basis for regulatory diversity.

     
  2. Abstract

    Local adaptation and phenotypic plasticity are main mechanisms of organisms’ resilience in changing environments. Both are affected by gene flow and are expected to be weak in zooplankton populations inhabiting large continuous water bodies and strongly affected by currents. Lake Baikal, the deepest and one of the coldest lakes on Earth, experienced epilimnion temperature increase during the last 100 years, exposing Baikal's zooplankton to novel selective pressures. We obtained a partial transcriptome of Epischura baikalensis (Copepoda: Calanoida), the dominant component of Baikal's zooplankton, and estimated SNP allele frequencies and transcript abundances in samples from regions of Baikal that differ in multiyear average surface temperatures. The strongest signal in both SNP and transcript abundance differentiation is the SW‐NE gradient along the 600+ km long axis of the lake, suggesting isolation by distance. SNP differentiation is stronger for nonsynonymous than synonymous SNPs and is paralleled by differential survival during a laboratory exposure to increased temperature, indicating directional selection operating on the temperature gradient. Transcript abundance, generally collinear with the SNP differentiation, shows samples from the warmest, less deep location clustering together with the southernmost samples. Differential expression is more frequent among transcripts orthologous to candidate thermal response genes previously identified in model arthropods, including genes encoding cytoskeleton proteins, heat‐shock proteins, proteases, enzymes of central energy metabolism, lipid and antioxidant pathways. We conclude that the pivotal endemic zooplankton species in Lake Baikal exists under temperature‐mediated selection and possesses both genetic variation and plasticity to respond to novel temperature‐related environmental pressures.

     
  3. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
  4. Lasky, Jesse R. (Ed.)

    Gene expression can be influenced by genetic variants that are closely linked to the expressed gene (cis eQTLs) and variants in other parts of the genome (trans eQTLs). We created a multiparental mapping population by sampling genotypes from a single natural population of Mimulus guttatus and scored gene expression in the leaves of 1,588 plants. We find that nearly every measured gene exhibits cis regulatory variation (91% have FDR < 0.05). cis eQTLs are usually allelic series with three or more functionally distinct alleles. The cis locus explains about two thirds of the standing genetic variance (on average) but varies among genes and tends to be greatest when there is high indel variation in the upstream regulatory region and high nucleotide diversity in the coding sequence. Despite mapping over 10,000 trans eQTL / affected gene pairs, most of the genetic variance generated by trans acting loci remains unexplained. This implies a large reservoir of trans acting genes with subtle or diffuse effects. Mapped trans eQTLs show lower allelic diversity but much higher genetic dominance than cis eQTLs. Several analyses also indicate that trans eQTLs make a substantial contribution to the genetic correlations in expression among different genes. They may thus be essential determinants of “gene expression modules,” which has important implications for the evolution of gene expression and how it is studied by geneticists.

     
  5. Abstract

    Copy number variation (CNV) is a major part of the genetic diversity segregating within populations, but remains poorly understood relative to single nucleotide variation. Here, we report on a tRNA ligase gene (Migut.N02091; RLG1a) exhibiting unprecedented, and fitness‐relevant, CNV within an annual population of the yellow monkeyflower Mimulus guttatus. RLG1a variation was associated with multiple traits in pooled population sequencing (PoolSeq) scans of phenotypic and phenological cohorts. Resequencing of inbred lines revealed intermediate‐frequency three‐copy variants of RLG1a (trip+; 5/35 = 14%), and trip+ lines exhibited elevated RLG1a expression under multiple conditions. trip+ carriers, in addition to being over‐represented in late‐flowering and large‐flowered PoolSeq populations, flowered later under stressful conditions in a greenhouse experiment (p < 0.05). In wild population samples, we discovered an additional rare RLG1a variant (high+) that carries 250–300 copies of RLG1a totalling ~5.7 Mb (20–40% of a chromosome). In the progeny of a high+ carrier, Mendelian segregation of diagnostic alleles and qPCR‐based copy counts indicate that high+ is a single tandem array unlinked to the single‐copy RLG1a locus. In the wild, high+ carriers had highest fitness in two particularly dry and/or hot years (2015 and 2017; both p < 0.01), while single‐copy individuals were twice as fecund as either CNV type in a lush year (2016: p < 0.005). Our results demonstrate fluctuating selection on CNVs affecting phenological traits in a wild population, suggest that plant tRNA ligases mediate stress‐responsive life‐history traits, and introduce a novel system for investigating the molecular mechanisms of gene amplification.

     