skip to main content


This content will become publicly available on June 2, 2024

Title: A global catalog of whole-genome diversity from 233 primate species
The rich diversity of morphology and behavior displayed across primate species provides an informative context in which to study the impact of genomic diversity on fundamental biological processes. Analysis of that diversity provides insight into long-standing questions in evolutionary and conservation biology and is urgent given severe threats these species are facing. Here, we present high-coverage whole-genome data from 233 primate species representing 86% of genera and all 16 families. This dataset was used, together with fossil calibration, to create a nuclear DNA phylogeny and to reassess evolutionary divergence times among primate clades. We found within-species genetic diversity across families and geographic regions to be associated with climate and sociality, but not with extinction risk. Furthermore, mutation rates differ across species, potentially influenced by effective population sizes. Lastly, we identified extensive recurrence of missense mutations previously thought to be human specific. This study will open a wide range of research avenues for future primate genomic research.  more » « less
Award ID(s):
0621020 1232349
NSF-PAR ID:
10432190
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; « less
Date Published:
Journal Name:
Science
Volume:
380
Issue:
6648
ISSN:
0036-8075
Page Range / eLocation ID:
906 to 913
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Understanding the genetics of biological diversification across micro‐ and macro‐evolutionary time scales is a vibrant field of research for molecular ecologists as rapid advances in sequencing technologies promise to overcome former limitations. In palms, an emblematic, economically and ecologically important plant family with high diversity in the tropics, studies of diversification at the population and species levels are still hampered by a lack of genomic markers suitable for the genotyping of large numbers of recently diverged taxa. To fill this gap, we used a whole genome sequencing approach to develop target sequencing for molecular markers in 4,184 genome regions, including 4,051 genes and 133 non‐genic putatively neutral regions. These markers were chosen to cover a wide range of evolutionary rates allowing future studies at the family, genus, species and population levels. Special emphasis was given to the avoidance of copy number variation during marker selection. In addition, a set of 149 well‐known sequence regions previously used as phylogenetic markers by the palm biological research community were included in the target regions, to open the possibility to combine and jointly analyse already available data sets with genomic data to be produced with this new toolkit. The bait set was effective for species belonging to all three palm sub‐families tested (Arecoideae, Ceroxyloideae and Coryphoideae), with high mapping rates, specificity and efficiency. The number of high‐quality single nucleotide polymorphisms (SNPs) detected at both the sub‐family and population levels facilitates efficient analyses of genomic diversity across micro‐ and macro‐evolutionary time scales.

     
    more » « less
  2. null (Ed.)
    Primate evolution has led to a remarkable diversity of behavioral specializations and pronounced brain size variation among species. Gene expression provides a promising opportunity for studying the molecular basis of brain evolution, but it has been explored in very few primate species to date. To understand the landscape of gene expression evolution across the primate lineage, we generated and analyzed RNA-Seq data from four brain regions in an unprecedented eighteen species. Here we show a remarkable level of variation in gene expression among hominid species, including humans and chimpanzees, despite their relatively recent divergence time from other primates. We found that individual genes display a wide range of expression dynamics across evolutionary time reflective of the diverse selection pressures acting on genes within primate brain tissue. Using our sample that represents an unprecedented 190-fold difference in primate brain size, we identified genes with variation in expression most correlated with brain size and found several with signals of positive selection in their regulatory regions. Our study extensively broadens the context of what is known about the molecular evolution of the brain across primates and identifies novel candidate genes for study of genetic regulation of brain development and evolution. 
    more » « less
  3. null (Ed.)
    The Opiliones superfamily Triaenonychoidea currently includes two families, the monogeneric New Zealand–endemic Synthetonychiidae Forster, 1954 and Triaenonychidae Sørensen, 1886, a diverse family distributed mostly throughout the temperate Gondwanan terranes, with ~110 genera and ~500 species and subspecies currently described. Traditionally, Triaenonychidae has been divided into subfamilies diagnosed by very few morphological characters largely derived from the troublesome ‘Roewerian system’ of morphology, and classifications based on this system led to many complications. Recent research within Triaenonychoidea using morphology and traditional multilocus data has shown multiple deeply divergent lineages, non-monophyly of Triaenonychidae, and non-monophyly of subfamilies, necessitating a revision based on phylogenomic data. We used sequence capture of ultraconserved elements across 164 samples to create a 50% taxon occupancy matrix with 704 loci. Using phylogenomic and morphological examinations, we explored family-level relationships within Triaenonychoidea, including describing two new families: (1) Lomanellidae Mendes & Derkarabetian, fam. nov., consisting of Lomanella Pocock, 1903, and a newly described genus Abaddon Derkarabetian & Baker, gen. nov. with one species, A. despoliator Derkarabetian, sp. nov.; and (2) the elevation to family of Buemarinoidae Karaman, 2019, consisting of Buemarinoa Roewer, 1956, Fumontana Shear, 1977, Flavonuncia Lawrence, 1959, and a newly described genus Turonychus Derkarabetian, Prieto & Giribet, gen. nov., with one species, T. fadriquei Derkarabetian, Prieto & Giribet, sp. nov. With our dataset we also explored phylogenomic relationships within Triaenonychidae with an extensive taxon set including samples representing ~80% of the genus-level diversity. Based on our results we (1) discuss systematics of this family including the historical use of subfamilies, (2) reassess morphology in the context of our phylogeny, (3) hypothesise placement for all unsampled genera, (4) highlight lineages most in need of taxonomic revision, and (5) provide an updated species-level checklist. Aside from describing new taxa, our study provides the phylogenomic context necessary for future evolutionary and systematic research across this diverse lineage.ZooBank Registration: urn:lsid:zoobank.org:pub:81683834-98AB-43AA-B25A-C28C6A404F41 
    more » « less
  4. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat– Alu ; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site; pA, xxxxxxxxxxxxxxxx. 
    more » « less
  5. Green plants (Viridiplantae) include around 450,000–500,000 species of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life. 
    more » « less