Title: Crossroads of assembling a moss genome: navigating contaminants and horizontal gene transfer in the moss Physcomitrellopsis africana
Abstract The first chromosome-scale reference genome of the rare narrow-endemic African moss Physcomitrellopsis africana (P. africana) is presented here. Assembled from 73 × Oxford Nanopore Technologies (ONT) long reads and 163 × Beijing Genomics Institute (BGI)-seq short reads, the 414 Mb reference comprises 26 chromosomes and 22,925 protein-coding genes [Benchmarking Universal Single-Copy Ortholog (BUSCO) scores: C:94.8% (D:13.9%)]. This genome holds 2 genes that withstood rigorous filtration of microbial contaminants, have no homolog in other land plants, and are thus interpreted as resulting from 2 unique horizontal gene transfers (HGTs) from microbes. Further, P. africana shares 176 of the 273 published HGT candidates identified in Physcomitrium patens (P. patens), but lacks 98 of these, highlighting that perhaps as many as 91 genes were acquired in P. patens in the last 40 million years following its divergence from its common ancestor with P. africana. These observations suggest rather continuous gene gains via HGT followed by potential losses during the diversification of the Funariaceae. Our findings showcase both dynamic flux in plant HGTs over evolutionarily “short” timescales, alongside enduring impacts of successful integrations, like those still functionally maintained in extant P. africana. Furthermore, this study describes the informatic processes employed to distinguish contaminants from candidate HGT events.
Award ID(s):
1943371
PAR ID:
10521866
Publisher / Repository:
Oxford University Press
Journal Name:
G3: Genes, Genomes, Genetics
Volume:
14
Issue:
7
ISSN:
2160-1836
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Prokaryotic genomes are often considered to be mosaics of genes that do not necessarily share the same evolutionary history due to widespread horizontal gene transfers (HGTs). Consequently, representing evolutionary relationships of prokaryotes as bifurcating trees has long been controversial. However, studies reporting conflicts among gene trees derived from phylogenomic data sets have shown that these conflicts can be the result of artifacts or evolutionary processes other than HGT, such as incomplete lineage sorting, low phylogenetic signal, and systematic errors due to substitution model misspecification. Here, we present the results of an extensive exploration of phylogenetic conflicts in the cyanobacterial order Nostocales, for which previous studies have inferred strongly supported conflicting relationships when using different concatenated phylogenomic data sets. We found that most of these conflicts are concentrated in deep clusters of short internodes of the Nostocales phylogeny, where the great majority of individual genes have low resolving power. We then inferred phylogenetic networks to detect HGT events while also accounting for incomplete lineage sorting. Our results indicate that most conflicts among gene trees are likely due to incomplete lineage sorting linked to an ancient rapid radiation, rather than to HGTs. Moreover, the short internodes of this radiation fit the expectations of the anomaly zone, i.e., a region of the tree parameter space where a species tree is discordant with its most likely gene tree. We demonstrated that concatenation of different sets of loci can recover up to 17 distinct and well-supported relationships within the putative anomaly zone of Nostocales, corresponding to the observed conflicts among well-supported trees based on concatenated data sets from previous studies. 
Our findings highlight the important role of rapid radiations as a potential cause of strongly conflicting phylogenetic relationships when using phylogenomic data sets of bacteria. We propose that polytomies may be the most appropriate phylogenetic representation of these rapid radiations that are part of anomaly zones, especially when all possible genomic markers have been considered to infer these phylogenies. [Anomaly zone; bacteria; horizontal gene transfer; incomplete lineage sorting; Nostocales; phylogenomic conflict; rapid radiation; Rhizonema.] 
  2. Abstract Until recently, precise genome editing has been limited to a few organisms. The ability of Cas9 to generate double stranded DNA breaks at specific genomic sites has greatly expanded molecular toolkits in many organisms and cell types. Before CRISPR‐Cas9 mediated genome editing, P. patens was unique among plants in its ability to integrate DNA via homologous recombination. However, selection for homologous recombination events was required to obtain edited plants, limiting the types of editing that were possible. Now with CRISPR‐Cas9, molecular manipulations in P. patens have greatly expanded. This protocol describes a method to generate a variety of different genome edits. The protocol describes a streamlined method to generate the Cas9/sgRNA expression constructs, design homology templates, transform, and quickly genotype plants. © 2023 Wiley Periodicals LLC. Basic Protocol 1: Constructing the Cas9/sgRNA transient expression vector Alternate Protocol 1: Shortcut to generating single and pooled Cas9/sgRNA expression vectors Basic Protocol 2: Designing the oligonucleotide‐based homology‐directed repair (HDR) template Alternate Protocol 2: Designing the plasmid‐based HDR template Basic Protocol 3: Inducing genome editing by transforming CRISPR vector into P. patens protoplasts Basic Protocol 4: Identifying edited plants. 
  3. Summary With global climate change, water scarcity threatens whole agro/ecosystems. The desert moss Syntrichia caninervis, an extremophile, offers novel insights into surviving desiccation and heat. The sequenced S. caninervis genome consists of 13 chromosomes containing 16 545 protein‐coding genes and 2666 unplaced scaffolds. Syntenic relationships within the S. caninervis and Physcomitrella patens genomes indicate the S. caninervis genome has undergone a single whole genome duplication event (compared to two for P. patens) and evidence suggests chromosomal or segmental losses in the evolutionary history of S. caninervis. The genome contains a large sex chromosome composed primarily of repetitive sequences with a large number of Copia and Gypsy elements. Orthogroup analyses revealed an expansion of ELIP genes encoding proteins important in photoprotection. The transcriptomic response to desiccation identified four structural clusters of novel genes. The genomic resources established for this extremophile offer new perspectives for understanding the evolution of desiccation tolerance in plants. 
  4. Abstract The aye-aye (Daubentonia madagascariensis) is the only extant member of the Daubentoniidae primate family. Although several reference genomes exist for this endangered strepsirrhine primate, the predominant usage of short-read sequencing has resulted in limited assembly contiguity and completeness, and no protein-coding gene annotations have yet been released. Here, we present a novel, fully annotated, chromosome-level hybrid de novo assembly for the species based on a combination of Oxford Nanopore Technologies long reads and Illumina short reads and scaffolded using genome-wide chromatin interaction data—a community resource that will improve future conservation efforts as well as primate comparative analyses. 
  5. Ingvarsson, P (Ed.)
    Abstract Eucalyptus grandis is a hardwood tree used worldwide as pure species or hybrid partner to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials, and biorefinery products. The current v2.0 genome reference for the species served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The 2 haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp), respectively. These haplotype assemblies are 70–90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472 contain long introns (>10 kbp) in gene models previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness levels from 93.8 to 99.4% in the v4.0 assembly. We found that 6,493 and 6,346 genes are within tandem duplicate arrays (HAP1 and HAP2, respectively, 18.4 and 17.8% of the total) and >43.8% of the haplotype assemblies consists of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pangenome variation in the species. 