

Title: Placing human gene families into their evolutionary context
Abstract

Following the draft sequence of the first human genome over 20 years ago, we have achieved unprecedented insights into the rules governing its evolution, often with direct translational relevance to specific diseases. However, staggering sequence complexity has also challenged the development of a more comprehensive understanding of human genome biology. In this context, interspecific genomic studies between humans and other animals have played a critical role in our efforts to decode human gene families. In this review, we focus on how the rapid surge of genome sequencing of both model and non-model organisms now provides a broader comparative framework poised to empower novel discoveries. We begin with a general overview of how comparative approaches are essential for understanding gene family evolution in the human genome, followed by a discussion of analyses of gene expression. We show how homology can provide insights into the genes and gene families associated with immune response, cancer biology, vision, chemosensation, and metabolism, by revealing similarity in processes among distant species. We then explain methodological tools that provide critical advances and show the limitations of common approaches. We conclude with a discussion of how these investigations position us to gain fundamental insights into the evolution of gene families among living organisms in general. We hope that our review catalyzes additional excitement and research on the emerging field of comparative genomics, while aiding the placement of the human genome into its existentially evolutionary context.

 
Award ID(s):
1755242 2032073
NSF-PAR ID:
10379850
Publisher / Repository:
Springer Science + Business Media
Journal Name:
Human Genomics
Volume:
16
Issue:
1
ISSN:
1479-7364
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Transposable elements (TEs), repeat expansions, and repeat-mediated structural rearrangements play key roles in chromosome structure and species evolution, contribute to human genetic variation, and substantially influence human health through copy number variants, structural variants, insertions, deletions, and alterations to gene transcription and splicing. Despite their formative role in genome stability, repetitive regions have been relegated to gaps and collapsed regions in human genome reference GRCh38 owing to the technological limitations during its development. The lack of linear sequence in these regions, particularly in centromeres, resulted in the inability to fully explore the repeat content of the human genome in the context of both local and regional chromosomal environments. RATIONALE Long-read sequencing supported the complete, telomere-to-telomere (T2T) assembly of the pseudo-haploid human cell line CHM13. This resource affords a genome-scale assessment of all human repetitive sequences, including TEs and previously unknown repeats and satellites, both within and outside of gaps and collapsed regions. Additionally, a complete genome enables the opportunity to explore the epigenetic and transcriptional profiles of these elements that are fundamental to our understanding of chromosome structure, function, and evolution. Comparative analyses reveal modes of repeat divergence, evolution, and expansion or contraction with locus-level resolution. RESULTS We implemented a comprehensive repeat annotation workflow using previously known human repeats and de novo repeat modeling followed by manual curation, including assessing overlaps with gene annotations, segmental duplications, tandem repeats, and annotated repeats. Using this method, we developed an updated catalog of human repetitive sequences and refined previous repeat annotations. 
We discovered 43 previously unknown repeats and repeat variants and characterized 19 complex, composite repetitive structures, which often carry genes, across T2T-CHM13. Using precision nuclear run-on sequencing (PRO-seq) and CpG methylated sites generated from Oxford Nanopore Technologies long-read sequencing data, we assessed RNA polymerase engagement across retroelements genome-wide, revealing correlations between nascent transcription, sequence divergence, CpG density, and methylation. These analyses were extended to evaluate RNA polymerase occupancy for all repeats, including high-density satellite repeats that reside in previously inaccessible centromeric regions of all human chromosomes. Moreover, using both mapping-dependent and mapping-independent approaches across early developmental stages and a complete cell cycle time series, we found that engaged RNA polymerase across satellites is low; in contrast, TE transcription is abundant and serves as a boundary for changes in CpG methylation and centromere substructure. Together, these data reveal the dynamic relationship between transcriptionally active retroelement subclasses and DNA methylation, as well as potential mechanisms for the derivation and evolution of new repeat families and composite elements. Focusing on the emerging T2T-level assembly of the HG002 X chromosome, we reveal that a high level of repeat variation likely exists across the human population, including composite element copy numbers that affect gene copy number. Additionally, we highlight the impact of repeats on the structural diversity of the genome, revealing repeat expansions with extreme copy number differences between humans and primates while also providing high-confidence annotations of retroelement transduction events. 
CONCLUSION The comprehensive repeat annotations and updated repeat models described herein serve as a resource for expanding the compendium of human genome sequences and reveal the impact of specific repeats on the human genome. In developing this resource, we provide a methodological framework for assessing repeat variation within and between human genomes. The exhaustive assessment of the transcriptional landscape of repeats, at both the genome scale and locally, such as within centromeres, sets the stage for functional studies to disentangle the role transcription plays in the mechanisms essential for genome stability and chromosome segregation. Finally, our work demonstrates the need to increase efforts toward achieving T2T-level assemblies for nonhuman primates and other species to fully understand the complexity and impact of repeat-derived genomic innovations that define primate lineages, including humans. Telomere-to-telomere assembly of CHM13 supports repeat annotations and discoveries. The human reference T2T-CHM13 filled gaps and corrected collapsed regions (triangles) in GRCh38. Combining long read–based methylation calls, PRO-seq, and multilevel computational methods, we provide a compendium of human repeats, define retroelement expression and methylation profiles, and delineate locus-specific sites of nascent transcription genome-wide, including previously inaccessible centromeres. SINE, short interspersed element; SVA, SINE–variable number tandem repeat–Alu; LINE, long interspersed element; LTR, long terminal repeat; TSS, transcription start site.
  2. Synopsis

    Fifty years ago, animal models studied in the lab were highly diverse, and biological insights were derived from experiments in many species. However, the pursuit of mechanistic explanations in organismal biology led to a shift in the species most commonly studied. The advent of genetic manipulations and economies of scale promoted the consolidation of research into fewer species (eg, Drosophila melanogaster, Mus musculus, Danio rerio, Caenorhabditis elegans). As a result, the tremendous variety of evolutionary adaptations across species provided insights into ultimate causes of evolution, but their proximate mechanisms have been understudied in recent decades. Within the last decade, developments in genome modifications have enabled functional genetic studies in a wide variety of species. This special issue combines papers derived from a symposium organized at the Society for Integrative and Comparative Biology conference in January 2023 in Austin, Texas. The symposium entitled “Neuroethology in the age of gene editing: New tools and novel insights into the molecular and neural basis of behavior” was convened to catalyze the transfer of knowledge and skills from researchers who have applied genome modification technologies in new model organisms to scientists who would like to bring these approaches to their own research programs. We highlight this work here, and suggest how the future of biological knowledge will be informed by these powerful experiments.

     
  3. Abstract

    Microbial communities are essential components of aquatic ecosystems through their contribution to food web dynamics and biogeochemical processes. Aquatic microbial diversity is immense and a general challenge is to understand how metabolism and interactions of single organisms shape microbial community dynamics and ecosystem‐scale biogeochemical transformations. Metagenomic approaches have developed rapidly, and proven to be powerful in linking microbial community dynamics to biogeochemical processes. In this review, we provide an overview of metagenomic approaches, followed by a discussion on some recent insights they have provided, including those in this special issue. These include the discovery of new taxa and metabolisms in aquatic microbiomes, insights into community assembly and functional ecology as well as evolutionary processes shaping microbial genomes and microbiomes, and the influence of human activities on aquatic microbiomes. Given that metagenomics can now be considered a mature technology where data generation and descriptive analyses are relatively routine and informative, we then discuss metagenomic‐enabled research avenues to further link microbial dynamics to biogeochemical processes. These include the integration of metagenomics into well‐designed ecological experiments, the use of metagenomics to inform and validate metabolic and biogeochemical models, and the pressing need for ecologically relevant model organisms and simple microbial systems to better interpret the taxonomic and functional information integrated in metagenomes. These research avenues will contribute to a more mechanistic and predictive understanding of links between microbial dynamics and biogeochemical cycles. Owing to rapid climate change and human impacts on aquatic ecosystems, the urgency of such an understanding has never been greater.

     
  4.
    Abstract Background Comparative genomics studies are growing in number partly because of their unique ability to provide insight into shared and divergent biology between species. Of particular interest is the use of phylogenetic methods to infer the evolutionary history of cis-regulatory sequence features, which contribute strongly to phenotypic divergence and are frequently gained and lost in eutherian genomes. Understanding the mechanisms by which cis-regulatory element turnover generates emergent phenotypes is crucial to our understanding of adaptive evolution. Ancestral reconstruction methods can place species-specific cis-regulatory features in their evolutionary context, thus increasing our understanding of the process of regulatory sequence turnover. However, applying these methods to gain and loss of cis-regulatory features historically required complex workflows, preventing widespread adoption by the broad scientific community. Results MapGL simplifies phylogenetic inference of the evolutionary history of short genomic sequence features by combining the necessary steps into a single piece of software with a simple set of inputs and outputs. We show that MapGL can reliably disambiguate the mechanisms underlying differential regulatory sequence content across a broad range of phylogenetic topologies and evolutionary distances. Thus, MapGL provides the necessary context to evaluate how genomic sequence gain and loss contribute to species-specific divergence. Conclusions MapGL makes phylogenetic inference of species-specific sequence gain and loss easy for both expert and non-expert users, making it a powerful tool for gaining novel insights into genome evolution.
  5. Synopsis Marine mammals exhibit some of the most dramatic physiological adaptations in their clade and offer unparalleled insights into the mechanisms driving convergent evolution on relatively short time scales. Some of these adaptations, such as extreme tolerance to hypoxia and prolonged food deprivation, are uncommon among most terrestrial mammals and challenge established metabolic principles of supply and demand balance. Non-targeted omics studies are starting to uncover the genetic foundations of such adaptations, but tools for testing functional significance in these animals are currently lacking. Cellular modeling with primary cells represents a powerful approach for elucidating the molecular etiology of physiological adaptation, a critical step in accelerating genome-to-phenome studies in organisms in which transgenesis is impossible (e.g., large-bodied, long-lived, fully aquatic, federally protected species). Gene perturbation studies in primary cells can directly evaluate whether specific mutations, gene loss, or duplication confer functional advantages such as hypoxia or stress tolerance in marine mammals. Here, we summarize how genetic and pharmacological manipulation approaches in primary cells have advanced mechanistic investigations in other non-traditional mammalian species, and highlight the need for such investigations in marine mammals. We also provide key considerations for isolating, culturing, and conducting experiments with marine mammal cells under conditions that mimic in vivo states. We propose that primary cell culture is a critical tool for conducting functional mechanistic studies (e.g., gene knockdown, over-expression, or editing) that can provide the missing link between genome- and organismal-level understanding of physiological adaptations in marine mammals. 