
Title: Comprehensive definition of genome features in Spirodela polyrhiza by high‐depth physical mapping and short‐read DNA sequencing strategies

Spirodela polyrhiza is a fast‐growing aquatic monocot with highly reduced morphology, genome size and number of protein‐coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158‐Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome‐wide physical maps combined with high‐coverage short‐read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of the rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela‐specific microRNA, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of 20.5% long terminal repeats (LTRs) that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation levels (9%) of any plant species tested. Taken together, our results reveal a genome that has undergone reduction, likely through eliminating non‐essential protein coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large‐scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family.

Journal Name:
The Plant Journal
Page Range / eLocation ID:
p. 617-635
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    In plants, 24 nucleotide long heterochromatic siRNAs (het‐siRNAs) transcriptionally regulate gene expression by RNA‐directed DNA methylation (RdDM). The biogenesis of most het‐siRNAs depends on the plant‐specific RNA polymerase IV (Pol IV), and ARGONAUTE4 (AGO4) is a major het‐siRNA effector protein. Through genome‐wide analysis of sRNA‐seq data sets, we found that AGO4 is required for the accumulation of a small subset of het‐siRNAs. The accumulation of AGO4‐dependent het‐siRNAs also requires several factors known to participate in the effector portion of the RdDM pathway, including RNA POLYMERASE V (POLV), DOMAINS REARRANGED METHYLTRANSFERASE2 (DRM2) and SAWADEE HOMEODOMAIN HOMOLOGUE1 (SHH1). Like many AGO proteins, AGO4 is an endonuclease that can ‘slice’ RNAs. We found that a slicing‐defective AGO4 was unable to fully recover AGO4‐dependent het‐siRNA accumulation from ago4 mutant plants. Collectively, our data suggest that AGO4‐dependent siRNAs are secondary siRNAs dependent on the prior activity of the RdDM pathway at certain loci.

  2. Prasinophytes form a paraphyletic assemblage of early diverging green algae, which have the potential to reveal the traits of the last common ancestor of the main two green lineages: (i) chlorophyte algae and (ii) streptophyte algae. Understanding the genetic composition of prasinophyte algae is fundamental to understanding the diversification and evolutionary processes that may have occurred in both green lineages. In this study, we sequenced the chloroplast genome of Pyramimonas parkeae NIES254 and compared it with that of P. parkeae CCMP726, the only other fully sequenced P. parkeae chloroplast genome. The results revealed that P. parkeae chloroplast genomes are surprisingly variable. The chloroplast genome of NIES254 was larger than that of CCMP726 by 3,204 bp, the NIES254 large single copy was 288 bp longer, the small single copy was 5,088 bp longer, and the IR was 1,086 bp shorter than that of CCMP726. Similarity values of the two strains were almost zero in four large hot spot regions. Finally, the strains differed in copy number for three protein‐coding genes: ycf20, psaC, and ndhE. Phylogenetic analyses using 16S and 18S rDNA and rbcL sequences resolved a clade consisting of these two P. parkeae strains and a clade consisting of these plus other Pyramimonas isolates. These results are consistent with past studies indicating that prasinophyte chloroplast genomes display a higher level of variation than is commonly found among land plants. Consequently, prasinophyte chloroplast genomes may be less useful for inferring the early history of Viridiplantae than has been the case for land plant diversification.

  3. Summary

    The flowering plant Arabidopsis thaliana is a dicot model organism for research in many aspects of plant biology. A comprehensive annotation of its genome paves the way for understanding the functions and activities of all types of transcripts, including mRNA, the various classes of non‐coding RNA, and small RNA. The TAIR10 annotation update had a profound impact on Arabidopsis research but was released more than 5 years ago. Maintaining the accuracy of the annotation continues to be a prerequisite for future progress. Using an integrative annotation pipeline, we assembled tissue‐specific RNA‐Seq libraries from 113 datasets and constructed 48 359 transcript models of protein‐coding genes in eleven tissues. In addition, we annotated various classes of non‐coding RNA including microRNA, long intergenic RNA, small nucleolar RNA, natural antisense transcript, small nuclear RNA, and small RNA using published datasets and in‐house analytic results. Altogether, we identified 635 novel protein‐coding genes, 508 novel transcribed regions, 5178 non‐coding RNAs, and 35 846 small RNA loci that were formerly unannotated. Analysis of the splicing events and RNA‐Seq based expression profiles revealed the landscapes of gene structures, untranslated regions, and splicing activities to be more intricate than previously appreciated. Furthermore, we present 692 uniformly expressed housekeeping genes, 43% of whose human orthologs are also housekeeping genes. This updated Arabidopsis genome annotation with a substantially increased resolution of gene models will not only further our understanding of the biological processes of this plant model but also of other species.

  4. Summary

    From a single transgenic line harboring five Tnt1 transposon insertions, we generated a near‐saturated insertion population in Medicago truncatula. Using thermal asymmetric interlaced‐polymerase chain reaction followed by sequencing, we recovered 388 888 flanking sequence tags (FSTs) from 21 741 insertion lines in this population. FST recovery from 14 Tnt1 lines using the whole‐genome sequencing (WGS) and/or Tnt1‐capture sequencing approaches suggests an average of 80 insertions per line, which is more than the previous estimation of 25 insertions. Analysis of the distribution pattern and preference of Tnt1 insertions showed that Tnt1 is overall randomly distributed throughout the M. truncatula genome. At the chromosomal level, Tnt1 insertions occurred on both arms of all chromosomes, with insertion frequency negatively correlated with the GC content. Based on 174 546 filtered FSTs that show exact insertion locations in the M. truncatula genome version 4.0 (Mt4.0), 0.44 Tnt1 insertions occurred per kb, and 19 583 genes contained Tnt1 with an average of 3.43 insertions per gene. Pathway and gene ontology analyses revealed that Tnt1‐inserted genes are significantly enriched in processes associated with ‘stress’, ‘transport’, ‘signaling’ and ‘stimulus response’. Surprisingly, gene groups with higher methylation frequency were more frequently targeted for insertion. Analysis of 19 583 Tnt1‐inserted genes revealed that 59% (1265) of 2144 transcription factors, 63% (765) of 1216 receptor kinases and 56% (343) of 616 nucleotide‐binding site‐leucine‐rich repeat genes harbored at least one Tnt1 insertion, compared with the overall 38% of Tnt1‐inserted genes out of 50 894 annotated genes in the genome.

  5. Abstract

    ARGONAUTES are the central effector proteins of RNA silencing which bind target transcripts in a small RNA‐guided manner. Arabidopsis thaliana has 10 ARGONAUTE (AGO) genes, with specialized roles in RNA‐directed DNA methylation, post‐transcriptional gene silencing, and antiviral defense. To better understand specialization among AGO genes at the level of transcriptional regulation we tested a library of 1497 transcription factors for binding to the promoters of AGO1, AGO10, and AGO7 using yeast 1‐hybrid assays. A ranked list of candidate DNA‐binding TFs revealed binding of the AGO7 promoter by a number of proteins in two families: the miR156‐regulated SPL family and the miR319‐regulated TCP family, both of which have roles in developmental timing and leaf morphology. Possible functions for SPL and TCP binding are unclear: we showed that these binding sites are not required for the polar expression pattern of AGO7, nor for the function of AGO7 in leaf shape. Normal AGO7 transcription levels and function appear to depend instead on an adjacent 124‐bp region. Progress in understanding the structure of this promoter may aid efforts to understand how the conserved AGO7‐triggered TAS3 pathway functions in timing and polarity.
