Title: The genome of the American groundhog, Marmota monax
We sequenced the genome of the North American groundhog, Marmota monax, also known as the woodchuck. Our sequencing strategy included a combination of short, high-quality Illumina reads plus long reads generated by both Pacific Biosciences and Oxford Nanopore instruments. Assembly of the combined data produced a genome of 2.74 Gbp in total length, with an N50 contig size of 1,094,236 bp. To annotate the genome, we mapped the genes from another M. monax genome and from the closely related Alpine marmot, Marmota marmota, onto our assembly, resulting in 20,559 annotated protein-coding genes and 28,135 transcripts. The genome assembly and annotation are available in GenBank under BioProject PRJNA587092.
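The N50 contig size quoted in the abstract is a standard contiguity statistic: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. As a minimal sketch of that conventional definition (the function name `n50` and the toy contig lengths are illustrative assumptions, not code or data from the study), it could be computed as:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Conventional definition: sort contigs from longest to shortest and
    accumulate their lengths; N50 is the length of the contig at which
    the running sum first reaches half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (NOT the groundhog contigs): total = 290 bp.
toy_lengths = [80, 70, 50, 40, 30, 20]
print(n50(toy_lengths))  # 80 + 70 = 150 >= 145, so this prints 70
```

In practice the lengths would be read from the assembly FASTA file; the toy list only keeps the arithmetic easy to check by hand.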
Award ID(s):
1744309
PAR ID:
10308626
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
F1000Research
Volume:
9
ISSN:
2046-1402
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Gulf pipefish Syngnathus scovelli has emerged as an important species for studying sexual selection, development, and physiology. Comparative evolutionary genomics research involving fishes from Syngnathidae depends on having a high-quality genome assembly and annotation. However, the first S. scovelli genome assembled using short-read sequences and a smaller RNA-sequence dataset has limited contiguity and a relatively poor annotation. Here, using PacBio long-read high-fidelity sequences and a proximity ligation library, we generate an improved assembly to obtain 22 chromosome-level scaffolds. Compared to the first assembly, the gaps in the improved assembly are smaller, the N75 is larger, and our genome is ~95% BUSCO complete. Using a large body of RNA-Seq reads from different tissue types and NCBI's Eukaryotic Annotation Pipeline, we discovered 28,162 genes, of which 8,061 are non-coding genes. Our new genome assembly and annotation are tagged as a RefSeq genome by NCBI and provide enhanced resources for research work involving S. scovelli. 
  2. Suncus etruscus is one of the world’s smallest mammals, with an average body mass of about 2 grams. The Etruscan shrew’s small body is accompanied by a very high energy demand and numerous metabolic adaptations. Here we report a chromosome-level genome assembly using PacBio long read sequencing, 10X Genomics linked short reads, optical mapping, and Hi-C linked reads. The assembly is partially phased, with the 2.472 Gbp primary pseudohaplotype and 1.515 Gbp alternate. We manually curated the primary assembly and identified 22 chromosomes, including X and Y sex chromosomes. The NCBI genome annotation pipeline identified 39,091 genes, 19,819 of them protein-coding. We also identified segmental duplications, inferred GO term annotations, and computed orthologs of human and mouse genes. This reference-quality genome will be an important resource for research on mammalian development, metabolism, and body size control.
  3. Jewel wasps in the genus Nasonia are parasitoids with haplodiploid sex determination and rapid development, and they are easy to culture in the laboratory. They are excellent models for insect genetics, genomics, epigenetics, development, and evolution. Nasonia vitripennis (Nv) and N. giraulti (Ng) are closely related species that can be intercrossed, particularly after removal of the intracellular bacterium Wolbachia, which makes them a powerful tool for mapping and positionally cloning morphological, behavioral, expression, and methylation phenotypes. The Nv reference genome was assembled using Sanger, PacBio, and Nanopore approaches and annotated with extensive RNA-seq data. In contrast, the Ng genome is available only through low-coverage resequencing, so a de novo Ng assembly is urgently needed to advance this system. In this study, we report a high-quality Ng assembly using 10X Genomics linked reads with 670X sequencing depth. The current assembly has a genome size of 259,040,977 bp in 3,160 scaffolds, with 38.05% G-C content and a 98.6% BUSCO completeness score. 97% of the RNA reads align perfectly to the genome, indicating high quality in contiguity and completeness. A total of 14,777 genes are annotated in the Ng genome, and 72% of the annotated genes have a one-to-one ortholog in the Nv genome. We report 5 million Ng-Nv SNPs, which will facilitate mapping and population genomic studies in Nasonia. In addition, 42 Ng-specific genes were identified by comparison with the Nv genome and annotation. This is the first de novo assembly for this important species in the Nasonia model system, providing a useful new genomic toolkit.
  4. Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome scale, it was derived from short reads and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of non-gap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2,000 genes that were previously unplaced. We also discovered more than 5,700 additional gene copies, facilitating the accurate annotation of functional gene duplications, including at the Ppd-B1 photoperiod response locus.
  5. The hagfishes (Myxiniformes) arose from agnathan (jawless vertebrate) lineages and, together with lampreys (Petromyzontiformes), are one of only two extant cyclostome taxa. Even though whole genome sequencing has been achieved for diverse vertebrate taxa, genome-wide sequence information has been highly limited for cyclostomes. Here we sequenced the genome of the inshore hagfish Eptatretus burgeri using DNA extracted from the testis, with a short-read sequencing platform, aiming to reconstruct a high-coverage protein-coding gene catalogue. The obtained genome assembly, scaffolded with mate-pair reads and paired RNA-seq reads, exhibited an N50 scaffold length of 293 Kbp, which allowed the genome-wide prediction of coding genes. The completeness of the resulting gene models, assessed against evolutionarily conserved single-copy orthologs, was estimated at more than 83% complete coverage and more than 93% partial coverage. The high contiguity of the assembly and the completeness of the gene models promise high utility in various comparative analyses, including phylogenomics and phylome exploration.