Title: Continuous chromosome-scale haplotypes assembled from a single interspecies F1 hybrid of yak and cattle
Abstract Background The development of trio binning as an approach for assembling diploid genomes has enabled the creation of fully haplotype-resolved reference genomes. Unlike other methods of assembly for diploid genomes, this approach is enhanced, rather than hindered, by the heterozygosity of the individual sequenced. To maximize heterozygosity and simultaneously assemble reference genomes for 2 species, we applied trio binning to an interspecies F1 hybrid of yak (Bos grunniens) and cattle (Bos taurus), 2 species that diverged nearly 5 million years ago. The genomes of both of these species are composed of acrocentric autosomes. Results We produced the most continuous haplotype-resolved assemblies for a diploid animal yet reported. Both the maternal (yak) and paternal (cattle) assemblies have the largest 2 chromosomes in single haplotigs, and more than one-third of the autosomes similarly lack gaps. The maximum length haplotig produced was 153 Mb without any scaffolding or gap-filling steps and represents the longest haplotig reported for any species. The assemblies are also more complete and accurate than those reported for most other vertebrates, with 97% of mammalian universal single-copy orthologs present. Conclusions The high heterozygosity inherent to interspecies crosses maximizes the effectiveness of the trio binning method. The interspecies trio binning approach we describe is likely to provide the highest-quality assemblies for any pair of species that can interbreed to produce hybrid offspring that develop to sufficient cell numbers for DNA extraction.
Award ID(s):
1754451
Publication Date:
NSF-PAR ID:
10147793
Journal Name:
GigaScience
Volume:
9
Issue:
4
ISSN:
2047-217X
Sponsoring Org:
National Science Foundation
More Like this
  1. Koepfli, Klaus-Peter (Ed.)
    Abstract Genomics research has relied principally on the establishment and curation of a reference genome for the species. However, it is increasingly recognized that a single reference genome cannot fully describe the extent of genetic variation within many widely distributed species. Pangenome representations are based on high-quality genome assemblies of multiple individuals and intended to represent the broadest possible diversity within a species. A Bovine Pangenome Consortium (BPC) has recently been established to begin assembling genomes from more than 600 recognized breeds of cattle, together with other related species to provide information on ancestral alleles and haplotypes. Previously reported de novo genome assemblies for Angus, Brahman, Hereford, and Highland breeds of cattle are part of the initial BPC effort. The present report describes a complete single haplotype assembly at chromosome-scale for a fullblood Simmental cow from an F1 bison–cattle hybrid fetus by trio binning. Simmental cattle, also known as Fleckvieh due to their red and white spots, originated in central Europe in the 1830s as a triple-purpose breed selected for draught, meat, and dairy production. There are over 50 million Simmental cattle in the world, known today for their fast growth and beef yields. This assembly (ARS_Simm1.0) is similar in length to the other bovine assemblies at 2.86 Gb, with a scaffold N50 of 102 Mb (max scaffold 156.8 Mb) and meets or exceeds the continuity of the best Bos taurus reference assemblies to date.
  2. Shapiro, Beth (Ed.)
    Abstract In addition to including one of the most popular companion animals, species from the cat family Felidae serve as a powerful system for genetic analysis of inherited and infectious disease, as well as for the study of phenotypic evolution and speciation. Previous diploid-based genome assemblies for the domestic cat have served as the primary reference for genomic studies within the cat family. However, these versions suffered from poor resolution of complex and highly repetitive regions, with substantial amounts of unplaced sequence that is polymorphic or copy number variable. We sequenced the genome of a female F1 Bengal hybrid cat, the offspring of a domestic cat (Felis catus) x Asian leopard cat (Prionailurus bengalensis) cross, with PacBio long sequence reads and used Illumina sequence reads from the parents to phase >99.9% of the reads into the 2 species’ haplotypes. De novo assembly of the phased reads produced highly continuous haploid genome assemblies for the domestic cat and Asian leopard cat, with contig N50 statistics exceeding 83 Mb for both genomes. Whole-genome alignments reveal the Felis and Prionailurus genomes are colinear, and the cytogenetic differences between the homologous F1 and E4 chromosomes represent a case of centromere repositioning in the absence of a chromosomal inversion. Both assemblies offer significant improvements over the previous domestic cat reference genome, with a 100% increase in contiguity and the capture of the vast majority of chromosome arms in 1 or 2 large contigs. We further demonstrated that comparably accurate F1 haplotype phasing can be achieved with members of the same species when one or both parents of the trio are not available. These novel genome resources will empower studies of feline precision medicine, adaptation, and speciation.
  3. Koepfli, Klaus-Peter (Ed.)
    Abstract Bison are an icon of the American West and an ecologically, commercially, and culturally important species. Despite numbering in the hundreds of thousands today, conservation concerns remain for the species, including the impact on genetic diversity of a severe bottleneck around the turn of the 20th century and genetic introgression from domestic cattle. Genetic diversity and admixture are best evaluated at genome-wide scale, for which a high-quality reference is necessary. Here, we use trio binning of long reads from a bison–Simmental cattle (Bos taurus taurus) male F1 hybrid to sequence and assemble the genome of the American plains bison (Bison bison bison). The male haplotype genome is chromosome-scale, with a total length of 2.65 Gb across 775 scaffolds (839 contigs) and a scaffold N50 of 87.8 Mb. Our bison genome is ~13× more contiguous overall and ~3400× more contiguous at the contig level than the current bison reference genome. The bison genome sequence presented here (ARS-UCSC_bison1.0) will enable new research into the evolutionary history of this iconic megafauna species and provide a new tool for the management of bison populations in federal and commercial herds.
  4. Gralnick, Jeffrey A. (Ed.)
    ABSTRACT Reconstructing microbial genomes from metagenomic short-read data can be challenging due to the unknown and uneven complexity of microbial communities. This complexity encompasses highly diverse populations, which often include strain variants. Reconstructing high-quality genomes is a crucial part of the metagenomic workflow, as subsequent ecological and metabolic inferences depend on their accuracy, quality, and completeness. In contrast to microbial communities in other ecosystems, there has been no systematic assessment of genome-centric metagenomic workflows for drinking water microbiomes. In this study, we assessed the performance of a combination of assembly and binning strategies for time series drinking water metagenomes that were collected over 6 months. The goal of this study was to identify the combination of assembly and binning approaches that results in high-quality and -quantity metagenome-assembled genomes (MAGs), representing most of the sequenced metagenome. Our findings suggest that the metaSPAdes coassembly strategies had the best performance, as they resulted in larger and less fragmented assemblies, with at least 85% of the sequence data mapping to contigs greater than 1 kbp. Furthermore, a combination of metaSPAdes coassembly strategies and MetaBAT2 produced the highest number of medium-quality MAGs while capturing at least 70% of the metagenomes based on read recruitment. Utilizing different assembly/binning approaches also assists in the reconstruction of unique MAGs from closely related species that would have otherwise collapsed into a single MAG using a single workflow. Overall, our study suggests that leveraging multiple binning approaches with different metaSPAdes coassembly strategies may be required to maximize the recovery of good-quality MAGs.
    IMPORTANCE Drinking water contains phylogenetically diverse groups of bacteria, archaea, and eukarya that affect the esthetic quality of water, water infrastructure, and public health. Taxonomic, metabolic, and ecological inferences of the drinking water microbiome depend on the accuracy, quality, and completeness of genomes that are reconstructed through the application of genome-resolved metagenomics. Using time series metagenomic data, we present reproducible genome-centric metagenomic workflows that result in high-quality and -quantity genomes, which more accurately represent the sequenced drinking water microbiome. These genome-centric metagenomic workflows will allow for improved taxonomic and functional potential analysis that offers enhanced insights into the stability and dynamics of drinking water microbial communities.
  5. Abstract Comparisons of high-quality reference butterfly and moth genomes have been instrumental to advancing our understanding of how hybridization and natural selection drive genomic change during the origin of new species and novel traits. Here, we present a genome assembly of the Southern Dogface butterfly, Zerene cesonia (Pieridae), whose brilliant wing colorations have been implicated in developmental plasticity, hybridization, sexual selection, and speciation. We assembled 266,407,278 bp of the Z. cesonia genome, which accounts for 98.3% of the estimated 271 Mb genome size. Using a hybrid approach involving Chicago libraries with Hi-Rise assembly and a diploid Meraculous assembly, the final haploid genome was assembled. In the final assembly, nearly all autosomes and the Z chromosome were assembled into single scaffolds. The largest 29 scaffolds accounted for 91.4% of the genome assembly, with the remaining ∼8% distributed among another 247 scaffolds and overall N50 of 9.2 Mb. Tissue-specific RNA-seq informed annotations identified 16,442 protein-coding genes, which included 93.2% of the arthropod Benchmarking Universal Single-Copy Orthologs (BUSCO). The Z. cesonia genome assembly had ∼9% identified as repetitive elements, with a transposable element landscape rich in helitrons. Similar to other Lepidoptera genomes, Z. cesonia showed a high conservation of chromosomal synteny. The Z. cesonia assembly provides a high-quality reference for studies of chromosomal arrangements in the Pierid family, as well as for population, phylo, and functional genomic studies of adaptation and speciation.