Abstract BackgroundDe novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased, haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species. FindingsUsing Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements. ConclusionsKnowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.
more »
« less
A haplotype-resolved reference genome for Eucalyptus grandis
Abstract Eucalyptus grandis is a hardwood tree used worldwide as pure species or hybrid partner to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials, and biorefinery products. The current v2.0 genome reference for the species served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The 2 haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp), respectively. These haplotype assemblies are 70–90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472 contain long introns (>10 kbp) in gene models previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness levels from 93.8 to 99.4% in the v4.0 assembly. We found that 6,493 and 6,346 genes are within tandem duplicate arrays (HAP1 and HAP2, respectively, 18.4 and 17.8% of the total) and >43.8% of the haplotype assemblies consists of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pangenome variation in the species.
more »
« less
- Award ID(s):
- 1943371
- PAR ID:
- 10608347
- Editor(s):
- Ingvarsson, P
- Publisher / Repository:
- Genetics Society of America
- Date Published:
- Journal Name:
- G3: Genes, Genomes, Genetics
- ISSN:
- 2160-1836
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Castric, Vincent (Ed.)Abstract White clover (Trifolium repens L.; Fabaceae) is an important forage and cover crop in agricultural pastures around the world and is increasingly used in evolutionary ecology and genetics to understand the genetic basis of adaptation. Historically, improvements in white clover breeding practices and assessments of genetic variation in nature have been hampered by a lack of high-quality genomic resources for this species, owing in part to its high heterozygosity and allotetraploid hybrid origin. Here, we use PacBio HiFi and chromosome conformation capture (Omni-C) technologies to generate a chromosome-level, haplotype-resolved genome assembly for white clover totaling 998 Mbp (scaffold N50 = 59.3 Mbp) and 1 Gbp (scaffold N50 = 58.6 Mbp) for haplotypes 1 and 2, respectively, with each haplotype arranged into 16 chromosomes (8 per subgenome). We additionally provide a functionally annotated haploid mapping assembly (968 Mbp, scaffold N50 = 59.9 Mbp), which drastically improves on the existing reference assembly in both contiguity and assembly accuracy. We annotated 78,174 protein-coding genes, resulting in protein BUSCO completeness scores of 99.6% and 99.3% against the embryophyta_odb10 and fabales_odb10 lineage datasets, respectively.more » « less
-
Abstract Carya glabra(2n= 4x= 64), also known as pignut hickory, is a widely distributed species in the walnut family (Juglandaceae). Native to the central and eastern United States and southeastern Canada,C. glabraplays an important ecological role as a common upland forest species; it is closely related to several economically valuable nut trees, includingC. illinoinensis(pecan). A deeper understanding of the genetics ofC. glabrais essential for studying its evolutionary history and biology, with potential implications for agricultural improvement of pecan. Here, we present the first nuclear genome assembly and annotation ofC. glabra. The assembly is chromosome-level and phased, representing the first assembled polyploid genome in the genusCarya. A total of 64 pseudochromosomes were assembled and phased into four haplotypes. The haplotype A assembly spans 600.4 Mb, comprises 55.0% repetitive sequences, and contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Functional annotation assigned 94.3% of haplotype A genes to gene families, and 79.7% and 86.3% of genes were annotated with Gene Ontology terms and protein domains, respectively; 635 putative plant disease resistance genes were found in haplotype A. The other three haplotypes exhibited similarly high-quality annotation metrics. Our genomic analyses also suggest thatC. glabrais an autotetraploid. Comparative genomic analyses revealed high collinearity among the four haplotypes ofC. glabraand the published genomes of three otherCaryaspecies, although structural variation among the genomes of these species was identified. In addition, we provide an improved chloroplast genome assembly and the first mitochondrial genome forC. glabra. Importantly, most members of the research team are undergraduate students; the sequenced individual is located in McCarty Woods, a Conservation Area on the University of Florida campus. This work highlights the value of genome assembly efforts as powerful tools for teaching genomics and supporting conservation initiatives. This first high-quality reference genome forC. glabraprovides a valuable resource for studyingCarya, a genus of significant ecological and economic importance. Article summaryCarya glabra(pignut hickory) is a common upland forest species in North America. This species is a member of the walnut family (Juglandaceae), which includes many economically important nut trees. Here, we present the first nuclear genome assembly and annotation ofC. glabra. The assembly is chromosome-level and phased. The haplotype A assembly contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Our genomic analyses suggest thatC. glabrais an autopolyploid. We also provide chloroplast and mitochondrial genome assemblies. This nuclear genome provides a valuable resource for studyingCarya, a genus of significant ecological and economic importance.more » « less
-
Meyer, Rachel (Ed.)Abstract The plant genus Bidens (Asteraceae or Compositae; Coreopsidae) is a species-rich and circumglobally distributed taxon. The 19 hexaploid species endemic to the Hawaiian Islands are considered an iconic example of adaptive radiation, of which many are imperiled and of high conservation concern. Until now, no genomic resources were available for this genus, which may serve as a model system for understanding the evolutionary genomics of explosive plant diversification. Here, we present a high-quality reference genome for the Hawaiʻi Island endemic species B. hawaiensis A. Gray reconstructed from long-read, high-fidelity sequences generated on a Pacific Biosciences Sequel II System. The haplotype-aware, draft genome assembly consisted of ~6.67 Giga bases (Gb), close to the holoploid genome size estimate of 7.56 Gb (±0.44 SD) determined by flow cytometry. After removal of alternate haplotigs and contaminant filtering, the consensus haploid reference genome was comprised of 15 904 contigs containing ~3.48 Gb, with a contig N50 value of 422 594. The high interspersed repeat content of the genome, approximately 74%, along with hexaploid status, contributed to assembly fragmentation. Both the haplotype-aware and consensus haploid assemblies recovered >96% of Benchmarking Universal Single-Copy Orthologs. Yet, the removal of alternate haplotigs did not substantially reduce the proportion of duplicated benchmarking genes (~79% vs. ~68%). This reference genome will support future work on the speciation process during adaptive radiation, including resolving evolutionary relationships, determining the genomic basis of trait evolution, and supporting ongoing conservation efforts.more » « less
-
Shapiro, Beth (Ed.)Abstract In addition to including one of the most popular companion animals, species from the cat family Felidae serve as a powerful system for genetic analysis of inherited and infectious disease, as well as for the study of phenotypic evolution and speciation. Previous diploid-based genome assemblies for the domestic cat have served as the primary reference for genomic studies within the cat family. However, these versions suffered from poor resolution of complex and highly repetitive regions, with substantial amounts of unplaced sequence that is polymorphic or copy number variable. We sequenced the genome of a female F1 Bengal hybrid cat, the offspring of a domestic cat (Felis catus) x Asian leopard cat (Prionailurus bengalensis) cross, with PacBio long sequence reads and used Illumina sequence reads from the parents to phase >99.9% of the reads into the 2 species’ haplotypes. De novo assembly of the phased reads produced highly continuous haploid genome assemblies for the domestic cat and Asian leopard cat, with contig N50 statistics exceeding 83 Mb for both genomes. Whole-genome alignments reveal the Felis and Prionailurus genomes are colinear, and the cytogenetic differences between the homologous F1 and E4 chromosomes represent a case of centromere repositioning in the absence of a chromosomal inversion. Both assemblies offer significant improvements over the previous domestic cat reference genome, with a 100% increase in contiguity and the capture of the vast majority of chromosome arms in 1 or 2 large contigs. We further demonstrated that comparably accurate F1 haplotype phasing can be achieved with members of the same species when one or both parents of the trio are not available. These novel genome resources will empower studies of feline precision medicine, adaptation, and speciation.more » « less
An official website of the United States government

