Introduction: Gene expression is often controlled via cis-regulatory elements (CREs) that modulate the production of transcripts. For multi-gene genetic engineering and synthetic biology, precise control of transcription is crucial, both to insulate the transgenes from unwanted native regulation and to prevent readthrough or cross-regulation of transgenes within a multi-gene cassette. To prevent this activity, insulator-like elements, more properly referred to as transcriptional blockers, could be inserted to separate the transgenes so that they are independently regulated. However, only a few validated insulator-like elements are available for plants, and they tend to be larger than ideal.
Methods: To identify additional potential insulator-like sequences, we conducted a genome-wide analysis of Utricularia gibba (humped bladderwort), one of the smallest known plant genomes, with genes that are naturally close together. The 10 best insulator-like candidates were evaluated in vivo for insulator-like activity.
Results: We identified a total of 4,656 intergenic regions with expression profiles suggesting insulator-like activity. Comparisons of these regions across 45 other plant species (representing Monocots, Asterids, and Rosids) show low levels of syntenic conservation of these regions. Genome-wide analysis of unmethylated regions (UMRs) indicates ~87% of the targeted regions are unmethylated; however, interpretation of this is complicated because U. gibba has remarkably low levels of methylation across the genome, so that large UMRs frequently extend over multiple genes and intergenic spaces. We also could not identify any conserved motifs among our selected intergenic regions or shared with existing insulator-like elements for plants. Despite this lack of conservation, however, testing of 10 selected intergenic regions for insulator-like activity found two elements on par with a previously published element (EXOB) while being significantly smaller.
Discussion: Given the small number of insulator-like elements currently available for plants, our results make a significant addition to available tools. The high hit rate (2 out of 10) also implies that more useful sequences are likely present in our selected intergenic regions; additional validation work will be required to identify which will be most useful for plant genetic engineering.
Evolutionary characteristics of intergenic transcribed regions indicate rare novel genes and widespread noisy transcription in the Poaceae
Abstract Extensive transcriptional activity occurring in intergenic regions of genomes has raised the question whether intergenic transcription represents the activity of novel genes or noisy expression. To address this, we evaluated cross-species and post-duplication sequence and expression conservation of intergenic transcribed regions (ITRs) in four Poaceae species. Among 43,301 ITRs across the four species, 34,460 (80%) are species-specific. ITRs found across species tend to be more divergent in expression and have more recent duplicates compared to annotated genes. To assess if ITRs are functional (under selection), machine learning models were established in Oryza sativa (rice) that could accurately distinguish between phenotype genes and pseudogenes (area under curve-receiver operating characteristic = 0.94). Based on the models, 584 (8%) and 4391 (61%) rice ITRs are classified as likely functional and nonfunctional with high confidence, respectively. ITRs with conserved expression and ancient retained duplicates, features that were not part of the model, are frequently classified as likely-functional, suggesting these characteristics could serve as pragmatic rules of thumb for identifying candidate sequences likely to be under selection. This study also provides a framework to identify novel genes using comparative transcriptomic data to improve genome annotation that is fundamental for connecting genotype to phenotype in crop and model systems.
- Award ID(s): 1655386
- PAR ID: 10153511
- Publisher / Repository: Nature Publishing Group
- Date Published:
- Journal Name: Scientific Reports
- Volume: 9
- Issue: 1
- ISSN: 2045-2322
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Wittkopp, Patricia (Ed.) Abstract Whole-genome duplications (WGDs) have occurred in many eukaryotic lineages. However, the underlying evolutionary forces and molecular mechanisms responsible for the long-term retention of gene duplicates created by WGDs are not well understood. We employ a population-genomic approach to understand the selective forces acting on paralogs and investigate ongoing duplicate-gene loss in multiple species of Paramecium that share an ancient WGD. We show that mutations that abolish protein function are more likely to be segregating in retained WGD paralogs than in single-copy genes, most likely because of ongoing nonfunctionalization post-WGD. This relaxation of purifying selection occurs in only one WGD paralog, accompanied by the gradual fixation of nonsynonymous mutations and reduction in levels of expression, and occurs over a long period of evolutionary time, “marking” one locus for future loss. Concordantly, the fitness effects of new nonsynonymous mutations and frameshift-causing indels are significantly more deleterious in the highly expressed copy compared with their paralogs with lower expression. Our results provide a novel mechanistic model of gene duplicate loss following WGDs, wherein selection acts on the sum of functional activity of both duplicate genes, allowing the two to wander in expression and functional space, until one duplicate locus eventually degenerates enough in functional efficiency or expression that its contribution to total activity is too insignificant to be retained by purifying selection. Retention of duplicates by such mechanisms predicts long times to duplicate-gene loss, which should not be falsely attributed to retention due to gain/change in function.
-
INTRODUCTION During the independent process of cereal evolution, many trait shifts appear to have been under convergent selection to meet the specific needs of humans. Identification of convergently selected genes across cereals could help to clarify the evolution of crop species and to accelerate breeding programs. In the past several decades, researchers have debated whether convergent phenotypic selection in distinct lineages is driven by conserved molecular changes or by diverse molecular pathways. Two of the most economically important crops, maize and rice, display some conserved phenotypic shifts (including loss of seed dispersal, decreased seed dormancy, and increased grain number during evolution) even though they experienced independent selection. Hence, maize and rice can serve as an excellent system for understanding the extent of convergent selection among cereals. RATIONALE Despite the identification of a few convergently selected genes, our understanding of the extent of molecular convergence on a genome-wide scale between maize and rice is very limited. To learn how often selection acts on orthologous genes, we investigated the functions and molecular evolution of the grain yield quantitative trait locus KRN2 in maize and its rice ortholog OsKRN2. We also identified convergently selected genes on a genome-wide scale in maize and rice, using two large datasets. RESULTS We identified a selected gene, KRN2 (kernel row number2), that differs between domesticated maize and its wild ancestor, teosinte. This gene underlies a major quantitative trait locus for kernel row number in maize. Selection in the noncoding upstream regions resulted in a reduction of KRN2 expression and an increased grain number through an increase in kernel rows. The rice ortholog, OsKRN2, also underwent selection and negatively regulates grain number via control of secondary panicle branches. 
These orthologs encode WD40 proteins and function synergistically with a gene of unknown function, DUF1644, which suggests that a conserved protein interaction controls grain number in maize and rice. Field tests show that knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by ~10% and ~8%, respectively, with no apparent trade-off in other agronomic traits. This suggests potential applications of KRN2 and its orthologs for crop improvement. On a genome-wide scale, we identified a set of 490 orthologous genes that underwent convergent selection during maize and rice evolution, including KRN2/OsKRN2. We found that the convergently selected orthologous genes appear to be significantly enriched in two specific pathways in both maize and rice: starch and sucrose metabolism, and biosynthesis of cofactors. A deep analysis of convergently selected genes in the starch metabolic pathway indicates that the degree of genetic convergence via convergent selection is related to the conservation and complexity of the gene network for a given selection. CONCLUSION Our findings show that common phenotypic shifts during maize and rice evolution acting on conserved genes are driven at least in part by convergent selection, which in maize and rice likely occurred both during and after domestication. We provide evolutionary and functional evidence on the convergent selection of KRN2/OsKRN2 for grain number between maize and rice. We further found that a complete loss-of-function allele of KRN2/OsKRN2 increased grain yield without an apparent negative impact on other agronomic traits. Exploring the role of KRN2/OsKRN2 and other convergently selected genes across the cereals could provide new opportunities to enhance the production of other global crops. Shared selected orthologous genes in maize and rice for convergent phenotypic shifts during domestication and improvement. 
By comparing 3163 selected genes in maize and 18,755 selected genes in rice, we identified 490 orthologous gene pairs, including KRN2 and its rice ortholog OsKRN2, as having been convergently selected. Knockout of KRN2 in maize or OsKRN2 in rice increased grain yield by increasing kernel rows and secondary panicle branches, respectively.
-
Abstract Rice, an important food resource, is highly sensitive to salt stress, which is directly related to food security. Although many studies have identified physiological mechanisms that confer tolerance to the osmotic effects of salinity, the link between rice genotype and salt tolerance is not yet clear. Associating gene co‐expression networks with rice phenotypic data under stress has the potential to identify stress‐responsive genes, but there is no standard method to associate stress phenotype with a gene co‐expression network. A novel method for integration of gene co‐expression network and stress phenotype data was developed to conduct a system analysis to link genotype to phenotype. We applied a LASSO‐based method to the gene co‐expression network of rice with salt stress to discover key genes and their interactions for salt tolerance‐related phenotypes. Submodules in gene modules identified from the co‐expression network were selected by the LASSO regression, which establishes a linear relationship between gene expression profiles and physiological responses, that is, sodium/potassium condenses under salt stress. Genes in these submodules have functions related to ion transport, osmotic adjustment, and oxidative tolerance. We argued that these genes in submodules are biologically meaningful and useful for studies on rice salt tolerance. This method can be applied to other studies to efficiently and reliably integrate co‐expression network and phenotypic data.
-
Abstract Background: Many plant species exhibit genetic variation for coping with environmental stress. However, there are still limited approaches to effectively uncover the genomic region that regulates distinct responsive patterns of the gene across multiple varieties within the same species under abiotic stress. Results: By analyzing the transcriptomes of more than 100 maize inbreds, we reveal many cis- and trans-acting eQTLs that influence the expression response to heat stress. The cis-acting eQTLs in response to heat stress are identified in genes with differential responses to heat stress between genotypes as well as genes that are only expressed under heat stress. The cis-acting variants for heat stress-responsive expression likely result from distinct promoter activities, and the differential heat responses of the alleles are confirmed for selected genes using transient expression assays. Global footprinting of transcription factor binding is performed in control and heat stress conditions to document regions with heat-enriched transcription factor binding occupancies. Conclusions: Footprints enriched near proximal regions of characterized heat-responsive genes in a large association panel can be utilized for prioritizing functional genomic regions that regulate genotype-specific responses under heat stress.
