skip to main content

Title: A Reference Genome Assembly of Simmental Cattle, Bos taurus taurus
Abstract Genomics research has relied principally on the establishment and curation of a reference genome for the species. However, it is increasingly recognized that a single reference genome cannot fully describe the extent of genetic variation within many widely distributed species. Pangenome representations are based on high-quality genome assemblies of multiple individuals and intended to represent the broadest possible diversity within a species. A Bovine Pangenome Consortium (BPC) has recently been established to begin assembling genomes from more than 600 recognized breeds of cattle, together with other related species to provide information on ancestral alleles and haplotypes. Previously reported de novo genome assemblies for Angus, Brahman, Hereford, and Highland breeds of cattle are part of the initial BPC effort. The present report describes a complete single haplotype assembly at chromosome-scale for a fullblood Simmental cow from an F1 bison–cattle hybrid fetus by trio binning. Simmental cattle, also known as Fleckvieh due to their red and white spots, originated in central Europe in the 1830s as a triple-purpose breed selected for draught, meat, and dairy production. There are over 50 million Simmental cattle in the world, known today for their fast growth and beef yields. This assembly (ARS_Simm1.0) is similar in length to the other bovine assemblies at 2.86 Gb, with a scaffold N50 of 102 Mb (max scaffold 156.8 Mb) and meets or exceeds the continuity of the best Bos taurus reference assemblies to date.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
Koepfli, Klaus-Peter
Date Published:
Journal Name:
Journal of Heredity
Page Range / eLocation ID:
184 to 191
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Despite being quite specious (~10,000 extant species), birds have a fairly uniform genome size and karyotype (including the common occurrence of microchromosomes) relative to other vertebrate lineages. Storks (Family Ciconiidae) are a charismatic and distinct group of large wading birds with nearly worldwide distribution but few genomic resources. Here we present an annotated chromosome-level reference genome and chromosome orthology analysis for the wood stork (Mycteria americana), a species that has been federally protected under the Endangered Species Act since 1984. The annotated chromosome-level reference assembly was produced using the blood of a wild female wood stork chick, has a length of 1.35 Gb, a contig N50 of 37 Mb, a scaffold N50 of 80 Mb, and a BUSCO score of 98.8%. We identified 31 autosomal pairs and two sex chromosomes in the wood stork genome, but failed to identify four additional autosomal microchromosomes previously found via karyotyping. Orthology analyses confirmed reported synapomorphies unique to storks and identified the chromosomes participating in these fusions. This study highlights the difficulty and potential problems associated with delineating microchromosomes in reference genome assemblies. It also provides a foundation for studying karyotype evolution in the core water bird clade that includes penguins, albatrosses, storks, cormorants, herons, and ibises. Finally, our reference genome will allow for numerous genomic studies, such as genome-wide association studies of local adaptation, that will aid in wood stork conservation.

    more » « less
  2. Abstract

    The number of reference genomes of snakes lags behind several other vertebrate groups (e.g. birds and mammals). However, in the last two years, a concerted effort by researchers from around the world has produced new genomes of snakes representing members from several new families. Here, we present a high-quality, annotated genome of the central ratsnake (Pantherophis alleghaniensis), a member of the most diverse snake lineage, Colubroidea. Pantherophis alleghaniensis is found in the central part of the Nearctic, east of the Mississippi River. This genome was sequenced using 10X Chromium synthetic long reads and polished using Illumina short reads. The final genome assembly had an N50 of 21.82 Mb and an L50 of 22 scaffolds with a maximum scaffold length of 82.078 Mb. The genome is composed of 49.24% repeat elements dominated by long interspersed elements. We annotated this genome using transcriptome assemblies from 14 tissue types and recovered 28,368 predicted proteins. Finally, we estimated admixture proportions between two species of ratsnakes and discovered that this specimen is an admixed individual containing genomes from the western (Pantherophis obsoletus) and central ratsnakes (P. alleghaniensis). We discuss the importance of considering interspecific admixture in downstream approaches for inferring demography and phylogeny.

    more » « less
  3. null (Ed.)
    The blue crab, Callinectes sapidus (Rathbun, 1896) is an economically, culturally, and ecologically important species found across the temperate and tropical North and South American Atlantic coast. A reference genome will enable research for this high-value species. Initial assembly combined 200× coverage Illumina paired-end reads, a 60× 8 kb mate-paired library, and 50× PacBio data using the MaSuRCA assembler resulting in a 985 Mb assembly with a scaffold N50 of 153 kb. Dovetail Chicago and HiC sequencing with the 3d DNA assembler and Juicebox assembly tools were then used for chromosome scaffolding. The 50 largest scaffolds span 810 Mb are 1.5–37 Mb long and have a repeat content of 36%. The 190 Mb unplaced sequence is in 3921 sequences over 10 kb with a repeat content of 68%. The final assembly N50 is 18.9 Mb for scaffolds and 9317 bases for contigs. Of arthropod BUSCO, ∼88% (888/1013) were complete and single copies. Using 309 million RNAseq read pairs from 12 different tissues and developmental stages, 25,249 protein-coding genes were predicted. Between C. sapidus and Portunus trituberculatus genomes, 41 of 50 large scaffolds had high nucleotide identity and protein-coding synteny, but 9 scaffolds in both assemblies were not clear matches. The protein-coding genes included 9423 one-to-one putative orthologs, of which 7165 were syntenic between the two crab species. Overall, the two crab genome assemblies show strong similarities at the nucleotide, protein, and chromosome level and verify the blue crab genome as an excellent reference for this important seafood species. 
    more » « less
  4. Abstract Background

    The increasing number of chromosome-level genome assemblies has advanced our knowledge and understanding of macroevolutionary processes. Here, we introduce the genome of the desert horned lizard, Phrynosoma platyrhinos, an iguanid lizard occupying extreme desert conditions of the American southwest. We conduct analysis of the chromosomal structure and composition of this species and compare these features across genomes of 12 other reptiles (5 species of lizards, 3 snakes, 3 turtles, and 1 bird).


    The desert horned lizard genome was sequenced using Illumina paired-end reads and assembled and scaffolded using Dovetail Genomics Hi-C and Chicago long-range contact data. The resulting genome assembly has a total length of 1,901.85 Mb, scaffold N50 length of 273.213 Mb, and includes 5,294 scaffolds. The chromosome-level assembly is composed of 6 macrochromosomes and 11 microchromosomes. A total of 20,764 genes were annotated in the assembly. GC content and gene density are higher for microchromosomes than macrochromosomes, while repeat element distributions show the opposite trend. Pathway analyses provide preliminary evidence that microchromosome and macrochromosome gene content are functionally distinct. Synteny analysis indicates that large microchromosome blocks are conserved among closely related species, whereas macrochromosomes show evidence of frequent fusion and fission events among reptiles, even between closely related species.


    Our results demonstrate dynamic karyotypic evolution across Reptilia, with frequent inferred splits, fusions, and rearrangements that have resulted in shuffling of chromosomal blocks between macrochromosomes and microchromosomes. Our analyses also provide new evidence for distinct gene content and chromosomal structure between microchromosomes and macrochromosomes within reptiles.

    more » « less
  5. Abstract Background The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes. Unlike other methods of assembly for diploid genomes, this approach is enhanced, rather than hindered, by the heterozygosity of the individual sequenced. To maximize heterozygosity and simultaneously assemble reference genomes for 2 species, we applied trio binning to an interspecies F1 hybrid of yak (Bos grunniens) and cattle (Bos taurus), 2 species that diverged nearly 5 million years ago. The genomes of both of these species are composed of acrocentric autosomes. Results We produced the most continuous haplotype-resolved assemblies for a diploid animal yet reported. Both the maternal (yak) and paternal (cattle) assemblies have the largest 2 chromosomes in single haplotigs, and more than one-third of the autosomes similarly lack gaps. The maximum length haplotig produced was 153 Mb without any scaffolding or gap-filling steps and represents the longest haplotig reported for any species. The assemblies are also more complete and accurate than those reported for most other vertebrates, with 97% of mammalian universal single-copy orthologs present. Conclusions The high heterozygosity inherent to interspecies crosses maximizes the effectiveness of the trio binning method. The interspecies trio binning approach we describe is likely to provide the highest-quality assemblies for any pair of species that can interbreed to produce hybrid offspring that develop to sufficient cell numbers for DNA extraction. 
    more » « less