Title: Evolution of conserved noncoding sequences in Arabidopsis thaliana
Abstract Recent pangenome studies have revealed that a large fraction of the gene content within a species exhibits presence-absence variation (PAV). However, coding regions alone provide an incomplete assessment of functional genomic sequence variation at the species level. Little to no attention has been paid to noncoding regulatory regions in pangenome studies, though these sequences directly modulate gene expression and phenotype. To uncover regulatory genetic variation, we generated chromosome-scale genome assemblies for thirty Arabidopsis thaliana accessions from multiple distinct habitats and characterized species-level variation in Conserved Noncoding Sequences (CNS). Our analyses uncovered not only PAV and positional variation (PosV) but also that diversity in CNS is non-random, with variants shared across different accessions. Using evolutionary analyses and chromatin accessibility data, we provide further evidence supporting roles for conserved and variable CNS in gene regulation. Additionally, our data suggest that transposable elements contribute to CNS variation. Characterizing species-level diversity in all functional genomic sequences may later uncover previously unknown mechanistic links between genotype and phenotype.
Award ID(s):
1856627 1737898
PAR ID:
10220350
Author(s) / Creator(s):
; ; ; ;
Editor(s):
Wittkopp, Patricia
Date Published:
Journal Name:
Molecular Biology and Evolution
ISSN:
0737-4038
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Gossmann, Toni (Ed.)
    Abstract Understanding and predicting the relationships between genotype and phenotype is often challenging, largely due to the complex nature of eukaryotic gene regulation. A step towards this goal is to map how phenotypic diversity evolves through genomic changes that modify gene regulatory interactions. Using the Prairie Rattlesnake (Crotalus viridis) and related species, we integrate mRNA-seq, proteomic, ATAC-seq and whole genome resequencing data to understand how specific evolutionary modifications to gene regulatory network components produce differences in venom gene expression. Through comparisons within and between species, we find a remarkably high degree of gene expression and regulatory network variation across even a shallow level of evolutionary divergence. We use these data to test hypotheses about the roles of specific trans-factors and cis-regulatory elements, how these roles may vary across venom genes and gene families, and how variation in regulatory systems drives diversity in venom phenotypes. Our results illustrate that differences in chromatin and genotype at regulatory elements play major roles in modulating expression. However, we also find that enhancer deletions, differences in transcription-factor expression, and variation in activity of the insulator protein CTCF likely impact venom phenotypes. Our findings provide insight into the diversity and gene-specificity of gene regulatory features and highlight the value of comparative studies to link gene regulatory network variation to phenotypic variation.
  2. Hake, Sarah (Ed.)
    A striking paradox is that genes with conserved protein sequence, function and expression pattern over deep time often exhibit extremely divergent cis-regulatory sequences. It remains unclear how such drastic cis-regulatory evolution across species allows preservation of gene function, and to what extent these differences influence how cis-regulatory variation arising within species impacts phenotypic change. Here, we investigated these questions using a plant stem cell regulator conserved in expression pattern and function over ~125 million years. Using in-vivo genome editing in two distantly related models, Arabidopsis thaliana (Arabidopsis) and Solanum lycopersicum (tomato), we generated over 70 deletion alleles in the upstream and downstream regions of the stem cell repressor gene CLAVATA3 (CLV3) and compared their individual and combined effects on a shared phenotype, the number of carpels that make fruits. We found that sequences upstream of tomato CLV3 are highly sensitive to even small perturbations compared to its downstream region. In contrast, Arabidopsis CLV3 function is tolerant to severe disruptions both upstream and downstream of the coding sequence. Combining upstream and downstream deletions also revealed a different regulatory outcome. Whereas phenotypic enhancement from adding downstream mutations was predominantly weak and additive in tomato, mutating both regions of Arabidopsis CLV3 caused substantial and synergistic effects, demonstrating distinct distribution and redundancy of functional cis-regulatory sequences. Our results demonstrate remarkable malleability in cis-regulatory structural organization of a deeply conserved plant stem cell regulator and suggest that major reconfiguration of cis-regulatory sequence space is a common yet cryptic evolutionary force altering genotype-to-phenotype relationships from regulatory variation in conserved genes. Finally, our findings underscore the need for lineage-specific dissection of the spatial architecture of cis-regulation to effectively engineer trait variation from conserved productivity genes in crops.
  3. Hoffmann, Federico (Ed.)
    Abstract Y chromosomal ampliconic genes (YAGs) are important for male fertility, as they encode proteins functioning in spermatogenesis. The variation in copy number and expression levels of these multicopy gene families has been studied in great apes; however, the diversity of splicing variants remains unexplored. Here, we deciphered the sequences of polyadenylated transcripts of all nine YAG families (BPY2, CDY, DAZ, HSFY, PRY, RBMY, TSPY, VCY, and XKRY) from testis samples of six great ape species (human, chimpanzee, bonobo, gorilla, Bornean orangutan, and Sumatran orangutan). To achieve this, we enriched YAG transcripts with capture probe hybridization and sequenced them with long (Pacific Biosciences) reads. Our analysis of this data set resulted in several findings. First, we observed evolutionarily conserved alternative splicing patterns for most YAG families except for BPY2 and PRY. Second, our results suggest that BPY2 transcripts and proteins originate from separate genomic regions in bonobo versus human, which is possibly facilitated by acquiring new promoters. Third, our analysis indicates that the PRY gene family, having the highest representation of noncoding transcripts, has been undergoing pseudogenization. Fourth, we have not detected signatures of selection in the five YAG families shared among great apes, even though we identified many species-specific protein-coding transcripts. Fifth, we predicted consensus disorder regions across most gene families and species, which could be used for future investigations of male infertility. Overall, our work illuminates the YAG isoform landscape and provides a genomic resource for future functional studies focusing on infertility phenotypes in humans and critically endangered great apes. 
  4. Haplotype-level allelic characterization facilitates research on the functional, evolutionary and breeding-related features of extremely large and complex plant genomes. We report a 21.7-Gb chromosome-level haplotype-resolved assembly in Pinus densiflora. We found genome rearrangements involving translocations and inversions between chromosomes 1 and 3 of Pinus species and a proliferation of specific long terminal repeat (LTR) retrotransposons (LTR-RTs) in P. densiflora. Evolutionary analyses illustrated that tandem and LTR-RT-mediated duplications led to an increment of transcription factor (TF) genes in P. densiflora. The haplotype sequence comparison showed allelic imbalances, including presence–absence variations of genes (PAV genes) and their functional contributions to flowering and abiotic stress-related traits in P. densiflora. Allele-aware resequencing analysis revealed PAV gene diversity across P. densiflora accessions. Our study provides insights into key mechanisms underlying the evolution of genome structure, LTR-RTs and TFs within the Pinus lineage as well as allelic imbalances and diversity across P. densiflora. 
  5. Rokas, A (Ed.)
    Abstract Subtelomeres are dynamic genomic regions shaped by elevated rates of recombination, mutation, and gene birth/death. These processes contribute to formation of lineage-specific gene family expansions that commonly occupy subtelomeres across eukaryotes. Investigating the evolution of subtelomeric gene families is complicated by the presence of repetitive DNA and high sequence similarity among gene family members that prevents accurate assembly from whole genome sequences. Here, we investigated the evolution of the telomere-associated (TLO) gene family in Candida albicans using 189 complete coding sequences retrieved from 23 genetically diverse strains across the species. Tlo genes conformed to the 3 major architectural groups (α/β/γ) previously defined in the genome reference strain but significantly differed in the degree of within-group diversity. One group, Tloβ, was always found at the same chromosome arm with strong sequence similarity among all strains. In contrast, diverse Tloα sequences have proliferated among chromosome arms. Tloγ genes formed 7 primary clades that included each of the previously identified Tloγ genes from the genome reference strain with 3 Tloγ genes always found on the same chromosome arm among strains. Architectural groups displayed regions of high conservation that resolved newly identified functional motifs, providing insight into potential regulatory mechanisms that distinguish groups. Thus, by resolving intraspecies subtelomeric gene variation, it is possible to identify previously unknown gene family complexity that may underpin adaptive functional variation. 