Title: A Reference Genome Sequence for Giant Sequoia
Abstract: The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. Genomic data are limited in giant sequoia, and producing a reference genome sequence has been an important goal to allow marker development for restoration and management. Using deep-coverage Illumina and Oxford Nanopore sequencing, combined with Dovetail chromosome conformation capture libraries, the genome was assembled into eleven chromosome-scale scaffolds containing 8.125 Gbp of sequence. Iso-Seq transcripts, assembled from three distinct tissues, were used as evidence to annotate a total of 41,632 protein-coding genes. The genome was found to contain over 900 complete or partial predicted NLR genes, distributed unevenly across all 11 chromosomes and in 63 orthogroups, of which 375 are supported by annotation derived from protein evidence and gene modeling. This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management.
Award ID(s):
1744309
PAR ID:
10308625
Journal Name:
G3 Genes|Genomes|Genetics
Volume:
10
Issue:
11
ISSN:
2160-1836
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Macrocystis pyrifera (giant kelp) is a brown macroalga of great ecological importance as a primary producer and structure-forming foundational species that provides habitat for hundreds of species. It has many commercial uses (e.g. source of alginate, fertilizer, cosmetics, feedstock). One of the limitations to exploiting giant kelp's economic potential and assisting in giant kelp conservation efforts is a lack of genomic tools such as a high-quality, contiguous reference genome with accurate gene annotations. Reference genomes attempt to capture the complete genomic sequence of an individual or species and, importantly, provide a universal structure for comparison across a multitude of genetic experiments, both within and between species. We assembled the giant kelp genome of a haploid female gametophyte de novo using PacBio reads, then ordered contigs into chromosome-level scaffolds using Hi-C. We found the giant kelp genome to be 537 Mb, with a total of 35 scaffolds and 188 contigs. The assembly N50 is 13,669,674 bp, with a GC content of 50.37%. We assessed the genome completeness using BUSCO and found that giant kelp contained 94% of the BUSCO genes from the stramenopile clade. Annotation of the giant kelp genome revealed 25,919 genes. Additionally, we present genetic variation data based on 48 diploid giant kelp sporophytes from three different Southern California populations that confirm the population structure found in other studies of these populations. This work resulted in a high-quality giant kelp genome that greatly increases the genetic knowledge of this ecologically and economically vital species.
  2. SUMMARY Drought is a major limitation for survival and growth in plants. With more frequent and severe drought episodes occurring due to climate change, it is imperative to understand the genomic and physiological basis of drought tolerance to be able to predict how species will respond in the future. In this study, univariate and multitrait multivariate genome‐wide association study methods were used to identify candidate genes in two iconic and ecosystem‐dominating species of the western USA, coast redwood and giant sequoia, using 10 drought‐related physiological and anatomical traits and genome‐wide sequence‐capture single nucleotide polymorphisms. Population‐level phenotypic variation was found in carbon isotope discrimination, osmotic pressure at full turgor, xylem hydraulic diameter, and total area of transporting fibers in both species. Our study identified 78 new marker × trait associations in coast redwood and six in giant sequoia, with genes involved in a range of metabolic, stress, and signaling pathways, among other functions. This study contributes to a better understanding of the genomic basis of drought tolerance in long‐generation conifers and helps guide current and future conservation efforts in the species.
  3. Abstract Background Dalbergia odorifera is an economically and culturally important species in the Fabaceae because of the high-quality lumber and traditional Chinese medicines made from this plant; however, overexploitation has increased the scarcity of D. odorifera. Given the rarity and the multiple uses of this species, it is important to expand the genomic resources for use in applications such as tracking illegal logging, determining effective population size of wild stands, delineating pedigrees in marker-assisted breeding programs, and resolving gene networks in functional genomics studies. Even though the nuclear and chloroplast genomes have been published for D. odorifera, the complete mitochondrial genome has not been assembled or assessed for sequence transfer to other genomic compartments until now. Such work is essential in understanding structural and functional genome evolution in a lineage (Fabaceae) with frequent intergenomic sequence transfers. Results We integrated Illumina short reads and PacBio CLR long reads to assemble and annotate the complete mitochondrial genome of D. odorifera. The mitochondrial genome was organized as a single circular structure of 435 Kb in length containing 33 protein-coding genes, 4 rRNA and 17 tRNA genes. Nearly 4.0% (17,386 bp) of the genome was annotated as repetitive DNA. From the sequence transfer analysis, it was found that 114 Kb of DNA originating from the mitochondrial genome has been transferred to the nuclear genome, with most of the transfer events having taken place relatively recently. The high frequency of sequence transfers from the mitochondria to the nuclear genome was similar to that of sequence transfer from the chloroplast to the nuclear genome. Conclusion For the first time, the complete mitochondrial genome of D. odorifera was assembled in this study, which will provide a baseline resource for understanding genomic evolution in the highly speciose Fabaceae. In particular, the assessment of intergenomic sequence transfer suggests that transfers have been common and recent, indicating a possible role in environmental adaptation, as has been found in other lineages. The high turnover rate of genomic collinearity and the large differences in mitochondrial genome size found in the comparative analyses herein provide evidence for the rapid evolution of mitochondrial genome structure compared to chloroplasts in Faboideae. In contrast, phylogenetic analyses using functional genes indicate that mitochondrial genes are evolving very slowly compared to chloroplast genes.
  4. Abstract Hares (genus Lepus) provide clear examples of repeated and often massive introgressive hybridization and striking local adaptations. Genomic studies on this group have so far relied on comparisons to the European rabbit (Oryctolagus cuniculus) reference genome. Here, we report the first de novo draft reference genome for a hare species, the mountain hare (Lepus timidus), and evaluate the efficacy of whole-genome re-sequencing analyses using the new reference versus using the rabbit reference genome. The genome was assembled using the ALLPATHS-LG protocol with a combination of overlapping pair and mate-pair Illumina sequencing (77x coverage). The assembly contained 32,294 scaffolds with a total length of 2.7 Gb and a scaffold N50 of 3.4 Mb. Re-scaffolding based on the rabbit reference reduced the total number of scaffolds to 4,205 with a scaffold N50 of 194 Mb. A correspondence was found between 22 of these hare scaffolds and the rabbit chromosomes, based on gene content and direct alignment. We annotated 24,578 protein coding genes by combining ab-initio predictions, homology search, and transcriptome data, of which 683 were solely derived from hare-specific transcriptome data. The hare reference genome is therefore a new resource to discover and investigate hare-specific variation. Similar estimates of heterozygosity and inferred demographic history profiles were obtained when mapping hare whole-genome re-sequencing data to the new hare draft genome or to alternative references based on the rabbit genome. Our results validate previous reference-based strategies and suggest that the chromosome-scale hare draft genome should enable chromosome-wide analyses and genome scans on hares. 
  5. Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of non-gap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2,000 genes that were previously unplaced. We also discovered more than 5,700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the Ppd-B1 photoperiod response locus.