Title: A Reference Genome Assembly of American Bison, Bison bison bison
Abstract Bison are an icon of the American West and an ecologically, commercially, and culturally important species. Despite numbering in the hundreds of thousands today, conservation concerns remain for the species, including the impact on genetic diversity of a severe bottleneck around the turn of the 20th century and genetic introgression from domestic cattle. Genetic diversity and admixture are best evaluated at genome-wide scale, for which a high-quality reference is necessary. Here, we use trio binning of long reads from a bison–Simmental cattle (Bos taurus taurus) male F1 hybrid to sequence and assemble the genome of the American plains bison (Bison bison bison). The male haplotype genome is chromosome-scale, with a total length of 2.65 Gb across 775 scaffolds (839 contigs) and a scaffold N50 of 87.8 Mb. Our bison genome is ~13× more contiguous overall and ~3400× more contiguous at the contig level than the current bison reference genome. The bison genome sequence presented here (ARS-UCSC_bison1.0) will enable new research into the evolutionary history of this iconic megafauna species and provide a new tool for the management of bison populations in federal and commercial herds.
Koepfli, Klaus-Peter
Journal Name:
Journal of Heredity
Page Range or eLocation-ID:
174 to 183
Sponsoring Org:
National Science Foundation
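The abstract reports a scaffold N50 of 87.8 Mb. As a reminder of what that statistic measures, here is a minimal sketch of an N50 calculation. The function name `n50` and the toy lengths are illustrative assumptions; they are not the bison assembly's data or the authors' tooling.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(lengths) / 2
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths in Mb (hypothetical, for illustration only).
toy_scaffolds = [80, 70, 50, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)
print(toy_n50)
```

A higher N50 means that fewer, longer pieces account for half of the assembly, which is why it serves as a contiguity measure for both scaffolds and contigs.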
More Like this
  1. Koepfli, Klaus-Peter (Ed.)
    Abstract Genomics research has relied principally on the establishment and curation of a reference genome for the species. However, it is increasingly recognized that a single reference genome cannot fully describe the extent of genetic variation within many widely distributed species. Pangenome representations are based on high-quality genome assemblies of multiple individuals and intended to represent the broadest possible diversity within a species. A Bovine Pangenome Consortium (BPC) has recently been established to begin assembling genomes from more than 600 recognized breeds of cattle, together with other related species to provide information on ancestral alleles and haplotypes. Previously reported de novo genome assemblies for Angus, Brahman, Hereford, and Highland breeds of cattle are part of the initial BPC effort. The present report describes a complete single haplotype assembly at chromosome-scale for a fullblood Simmental cow from an F1 bison–cattle hybrid fetus by trio binning. Simmental cattle, also known as Fleckvieh due to their red and white spots, originated in central Europe in the 1830s as a triple-purpose breed selected for draught, meat, and dairy production. There are over 50 million Simmental cattle in the world, known today for their fast growth and beef yields. This assembly (ARS_Simm1.0) is similar in length to the other bovine assemblies at 2.86 Gb, with a scaffold N50 of 102 Mb (max scaffold 156.8 Mb) and meets or exceeds the continuity of the best Bos taurus reference assemblies to date.
  2. Abstract. American bison (Bison bison L.) have recovered from the brink of extinction over the past century. Bison reintroduction creates multiple environmental benefits, but impacts on greenhouse gas emissions are poorly understood. Bison are thought to have produced some 2 Tg yr−1 of the estimated 9–15 Tg yr−1 of pre-industrial enteric methane emissions, but few measurements have been made due to their mobile grazing habits and safety issues associated with measuring non-domesticated animals. Here, we measure methane and carbon dioxide fluxes from a bison herd on an enclosed pasture during daytime periods in winter using eddy covariance. Methane emissions from the study area were negligible in the absence of bison (mean ± standard deviation = −0.0009 ± 0.008 µmol m−2 s−1) and were significantly greater than zero, 0.048 ± 0.082 µmol m−2 s−1, with a positively skewed distribution, when bison were present. We coupled bison location estimates from automated camera images with two independent flux footprint models to calculate a mean per-animal methane efflux of 58.5 µmol s−1 per bison, similar to eddy covariance measurements of methane efflux from a cattle feedlot during winter. When we sum the observations over time with conservative uncertainty estimates we arrive at 81 g CH4 per bison d−1 with 95 % confidence intervals between 54 and 109 g CH4 per bison d−1. Uncertainty was dominated by bison location estimates (46 % of the total uncertainty), then the flux footprint model (33 %) and the eddy covariance measurements (21 %), suggesting that making higher-resolution animal location estimates is a logical starting point for decreasing total uncertainty. Annual measurements are ultimately necessary to determine the full greenhouse gas burden of bison grazing systems. Our observations highlight the need to compare greenhouse gas emissions from different ruminant grazing systems and demonstrate the potential for using eddy covariance to measure methane efflux from non-domesticated animals.
  3. Dutra, Walderez O. (Ed.)
    More than 100 years since the first description of Chagas Disease and with over 29,000 new cases annually due to vector transmission (in 2010), American Trypanosomiasis remains a Neglected Tropical Disease (NTD). This study presents the most comprehensive Trypanosoma cruzi sampling in terms of geographic locations and triatomine species analyzed to date and includes both nuclear and mitochondrial genomes. This addresses the gap of information from North and Central America. We incorporate new and previously published DNA sequence data from two mitochondrial genes, Cytochrome oxidase II (COII) and NADH dehydrogenase subunit 1 (ND1). These T. cruzi samples were collected over a broad geographic range including 111 parasite DNA samples extracted from triatomines newly collected across North and Central America, all of which were infected with T. cruzi in their natural environment. In addition, we present parasite reduced representation (Restriction site Associated DNA markers, RAD-tag) genomic nuclear data combined with the mitochondrial gene sequences for a subset of the triatomines (27 specimens) collected from Guatemala and El Salvador. Our mitochondrial phylogenetic reconstruction revealed two of the major mitochondrial lineages circulating across North and Central America, as well as the first ever mitochondrial data for TcBat from a triatomine collected in Central America. Our data also show that within mtTcIII, North and Central America represent an independent, distinct clade from South America, named here as mtTcIII NA-CA, geographically restricted to North and Central America. Lastly, the most frequent lineage detected across North and Central America, mtTcI, was also an independent, distinct clade from South America, noted as mtTcI NA-CA. Furthermore, nuclear genome data based on Single Nucleotide Polymorphism (SNP) showed genetic structure of lineage TcI from specimens collected in Guatemala and El Salvador supporting the hypothesis that genetic diversity at a local scale has a geographical component. Our multiscale analysis contributes to the understanding of the independent and distinct evolution of T. cruzi lineages in North and Central America regions.
  4. Abstract Background The release of the first reference genome of walnut (Juglans regia L.) enabled many achievements in the characterization of walnut genetic and functional variation. However, it is highly fragmented, preventing the integration of genetic, transcriptomic, and proteomic information to fully elucidate walnut biological processes. Findings Here, we report the new chromosome-scale assembly of the walnut reference genome (Chandler v2.0) obtained by combining Oxford Nanopore long-read sequencing with chromosome conformation capture (Hi-C) technology. Relative to the previous reference genome, the new assembly features an 84.4-fold increase in N50 size, with the 16 chromosomal pseudomolecules assembled and representing 95% of its total length. Using full-length transcripts from single-molecule real-time sequencing, we predicted 37,554 gene models, with a mean gene length higher than the previous gene annotations. Most of the new protein-coding genes (90%) present both start and stop codons, which represents a significant improvement compared with Chandler v1.0 (only 48%). We then tested the potential impact of the new chromosome-level genome on different areas of walnut research. By studying the proteome changes occurring during male flower development, we observed that the virtual proteome obtained from Chandler v2.0 presents fewer artifacts than the previous reference genome, enabling the identification of a new potential pollen allergen in walnut. Also, the new chromosome-scale genome facilitates in-depth studies of intraspecies genetic diversity by revealing previously undetected autozygous regions in Chandler, likely resulting from inbreeding, and 195 genomic regions highly differentiated between Western and Eastern walnut cultivars. Conclusion Overall, Chandler v2.0 will serve as a valuable resource to better understand and explore walnut biology.
  5. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations.
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx.
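As a unit check on the bison methane abstract (item 2 in the list above), the sketch below converts the reported mean efflux of 58.5 µmol s−1 per bison into grams of CH4 per day. It assumes a constant efflux over a full 24 hours and a CH4 molar mass of 16.04 g/mol, neither of which is stated in the abstract. The authors describe summing daytime winter observations over time, so this is an arithmetic cross-check rather than their method.

```python
CH4_MOLAR_MASS_G_PER_MOL = 16.04  # standard value; an assumption, not from the abstract
SECONDS_PER_DAY = 86_400


def umol_per_s_to_g_per_day(flux_umol_per_s):
    """Convert a CH4 flux from micromoles per second to grams per day."""
    mol_per_s = flux_umol_per_s * 1e-6
    return mol_per_s * CH4_MOLAR_MASS_G_PER_MOL * SECONDS_PER_DAY


# Mean per-animal efflux reported in the abstract.
daily_g_per_bison = umol_per_s_to_g_per_day(58.5)
print(f"{daily_g_per_bison:.1f} g CH4 per bison per day")

# Reported uncertainty shares (percent): location, footprint model, eddy covariance.
uncertainty_shares = {"location": 46, "footprint_model": 33, "eddy_covariance": 21}
print(sum(uncertainty_shares.values()))
```

Under these assumptions the conversion gives roughly 81 g CH4 per bison per day, in line with the figure quoted in the abstract, and the three reported uncertainty shares add up to 100 %.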