skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Haplogenome assembly reveals structural variation in Eucalyptus interspecific hybrids
Abstract BackgroundDe novo phased (haplo)genome assembly using long-read DNA sequencing data has improved the detection and characterization of structural variants (SVs) in plant and animal genomes. Able to span across haplotypes, long reads allow phased, haplogenome assembly in highly outbred organisms such as forest trees. Eucalyptus tree species and interspecific hybrids are the most widely planted hardwood trees with F1 hybrids of Eucalyptus grandis and E. urophylla forming the bulk of fast-growing pulpwood plantations in subtropical regions. The extent of structural variation and its effect on interspecific hybridization is unknown in these trees. As a first step towards elucidating the extent of structural variation between the genomes of E. grandis and E. urophylla, we sequenced and assembled the haplogenomes contained in an F1 hybrid of the two species. FindingsUsing Nanopore sequencing and a trio-binning approach, we assembled the separate haplogenomes (566.7 Mb and 544.5 Mb) to 98.0% BUSCO completion. High-density SNP genetic linkage maps of both parents allowed scaffolding of 88.0% of the haplogenome contigs into 11 pseudo-chromosomes (scaffold N50 of 43.8 Mb and 42.5 Mb for the E. grandis and E. urophylla haplogenomes, respectively). We identify 48,729 SVs between the two haplogenomes providing the first detailed insight into genome structural rearrangement in these species. The two haplogenomes have similar gene content, 35,572 and 33,915 functionally annotated genes, of which 34.7% are contained in genome rearrangements. ConclusionsKnowledge of SV and haplotype diversity in the two species will form the basis for understanding the genetic basis of hybrid superiority in these trees.  more » « less
Award ID(s):
1943371
PAR ID:
10491504
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
GSA
Date Published:
Journal Name:
GigaScience
Volume:
12
ISSN:
2047-217X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ingvarsson, P (Ed.)
    Abstract Eucalyptus grandis is a hardwood tree used worldwide as pure species or hybrid partner to breed fast-growing plantation forestry crops that serve as feedstocks of timber and lignocellulosic biomass for pulp, paper, biomaterials, and biorefinery products. The current v2.0 genome reference for the species served as the first reference for the genus and has helped drive the development of molecular breeding tools for eucalypts. Using PacBio HiFi long reads and Omni-C proximity ligation sequencing, we produced an improved, haplotype-phased assembly (v4.0) for TAG0014, an early-generation selection of E. grandis. The 2 haplotypes are 571 Mbp (HAP1) and 552 Mbp (HAP2) in size and consist of 37 and 46 contigs scaffolded onto 11 chromosomes (contig N50 of 28.9 and 16.7 Mbp), respectively. These haplotype assemblies are 70–90 Mbp smaller than the diploid v2.0 assembly but capture all except one of the 22 telomeres, suggesting that substantial redundant sequence was included in the previous assembly. A total of 35,929 (HAP1) and 35,583 (HAP2) gene models were annotated, of which 438 and 472 contain long introns (>10 kbp) in gene models previously (v2.0) identified as multiple smaller genes. These and other improvements have increased gene annotation completeness levels from 93.8 to 99.4% in the v4.0 assembly. We found that 6,493 and 6,346 genes are within tandem duplicate arrays (HAP1 and HAP2, respectively, 18.4 and 17.8% of the total) and >43.8% of the haplotype assemblies consists of repeat elements. Analysis of synteny between the haplotypes and the E. grandis v2.0 reference genome revealed extensive regions of collinearity, but also some major rearrangements, and provided a preview of population and pangenome variation in the species. 
    more » « less
  2. Abstract Carya glabra(2n= 4x= 64), also known as pignut hickory, is a widely distributed species in the walnut family (Juglandaceae). Native to the central and eastern United States and southeastern Canada,C. glabraplays an important ecological role as a common upland forest species; it is closely related to several economically valuable nut trees, includingC. illinoinensis(pecan). A deeper understanding of the genetics ofC. glabrais essential for studying its evolutionary history and biology, with potential implications for agricultural improvement of pecan. Here, we present the first nuclear genome assembly and annotation ofC. glabra. The assembly is chromosome-level and phased, representing the first assembled polyploid genome in the genusCarya. A total of 64 pseudochromosomes were assembled and phased into four haplotypes. The haplotype A assembly spans 600.4 Mb, comprises 55.0% repetitive sequences, and contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Functional annotation assigned 94.3% of haplotype A genes to gene families, and 79.7% and 86.3% of genes were annotated with Gene Ontology terms and protein domains, respectively; 635 putative plant disease resistance genes were found in haplotype A. The other three haplotypes exhibited similarly high-quality annotation metrics. Our genomic analyses also suggest thatC. glabrais an autotetraploid. Comparative genomic analyses revealed high collinearity among the four haplotypes ofC. glabraand the published genomes of three otherCaryaspecies, although structural variation among the genomes of these species was identified. In addition, we provide an improved chloroplast genome assembly and the first mitochondrial genome forC. glabra. Importantly, most members of the research team are undergraduate students; the sequenced individual is located in McCarty Woods, a Conservation Area on the University of Florida campus. This work highlights the value of genome assembly efforts as powerful tools for teaching genomics and supporting conservation initiatives. This first high-quality reference genome forC. glabraprovides a valuable resource for studyingCarya, a genus of significant ecological and economic importance. Article summaryCarya glabra(pignut hickory) is a common upland forest species in North America. This species is a member of the walnut family (Juglandaceae), which includes many economically important nut trees. Here, we present the first nuclear genome assembly and annotation ofC. glabra. The assembly is chromosome-level and phased. The haplotype A assembly contains 30,947 protein-coding genes, with a BUSCO completeness score of 97.7%. Our genomic analyses suggest thatC. glabrais an autopolyploid. We also provide chloroplast and mitochondrial genome assemblies. This nuclear genome provides a valuable resource for studyingCarya, a genus of significant ecological and economic importance. 
    more » « less
  3. Abstract BackgroundCapturing the genetic diversity of wild relatives is crucial for improving crops because wild species are valuable sources of agronomic traits that are essential to enhance the sustainability and adaptability of domesticated cultivars. Genetic diversity across a genus can be captured in super-pangenomes, which provide a framework for interpreting genomic variations. ResultsHere we report the sequencing, assembly, and annotation of nine wild North American grape genomes, which are phased and scaffolded at chromosome scale. We generate a reference-unbiased super-pangenome using pairwise whole-genome alignment methods, revealing the extent of the genomic diversity among wild grape species from sequence to gene level. The pangenome graph captures genomic variation between haplotypes within a species and across the different species, and it accurately assesses the similarity of hybrids to their parents. The species selected to build the pangenome are a great representation of the genus, as illustrated by capturing known allelic variants in the sex-determining region and for Pierce’s disease resistance loci. Using pangenome-wide association analysis, we demonstrate the utility of the super-pangenome by effectively mapping short reads from genus-wide samples and identifying loci associated with salt tolerance in natural populations of grapes. ConclusionsThis study highlights how a reference-unbiased super-pangenome can reveal the genetic basis of adaptive traits from wild relatives and accelerate crop breeding research. 
    more » « less
  4. Abstract The high sequencing error rate has impeded the application of long noisy reads for diploid genome assembly. Most existing assemblers failed to generate high-quality phased assemblies using long noisy reads. Here, we present PECAT, aPhasedErrorCorrection andAssemblyTool, for reconstructing diploid genomes from long noisy reads. We design a haplotype-aware error correction method that can retain heterozygote alleles while correcting sequencing errors. We combine a corrected read SNP caller and a raw read SNP caller to further improve the identification of inconsistent overlaps in the string graph. We use a grouping method to assign reads to different haplotype groups. PECAT efficiently assembles diploid genomes using Nanopore R9, PacBio CLR or Nanopore R10 reads only. PECAT generates more contiguous haplotype-specific contigs compared to other assemblers. Especially, PECAT achieves nearly haplotype-resolved assembly onB. taurus(Bison×Simmental) using Nanopore R9 reads and phase block NG50 with 59.4/58.0 Mb for HG002 using Nanopore R10 reads. 
    more » « less
  5. McIntyre, L (Ed.)
    Abstract The adelgids (Adelgidae) are a small family of sap-feeding insects, which, together with true aphids (Aphididae) and phylloxerans (Phylloxeridae), make up the infraorder Aphidomorpha. Some adelgid species are highly destructive to forest ecosystems such as Adelges tsugae, Adelges piceae, Adelges laricis, Pineus pini, and Pineus boerneri. Despite this, there are no high-quality genomic resources for adelgids, hindering advanced genomic analyses within Adelgidae and among Aphidomorpha. Here, we used PacBio continuous long-read and Illumina RNA-sequencing to construct a high-quality draft genome assembly for the Cooley spruce gall adelgid, Adelges cooleyi (Gillette), a gall-forming species endemic to North America. The assembled genome is 270.2 Mb in total size and has scaffold and contig N50 statistics of 14.87 and 7.18 Mb, respectively. There are 24,967 predicted coding sequences, and the assembly completeness is estimated at 98.1 and 99.6% with core BUSCO gene sets of Arthropoda and Hemiptera, respectively. Phylogenomic analysis using the A. cooleyi genome, 3 publicly available adelgid transcriptomes, 4 phylloxera transcriptomes, the Daktulosphaira vitifoliae (grape phylloxera) genome, 4 aphid genomes, and 2 outgroup coccoid genomes fully resolves adelgids and phylloxerans as sister taxa. The mitochondrial genome is 24 kb, among the largest in insects sampled to date, with 39.4% composed of noncoding regions. This genome assembly is currently the only genome-scale, annotated assembly for adelgids and will be a valuable resource for understanding the ecology and evolution of Aphidomorpha. 
    more » « less