Title: Shades of complexity: New perspectives on the evolution and genetic architecture of human skin
Abstract

As with many highly variable human traits, more than a dozen genes are known to contribute to the full range of skin color. However, the historical bias in favor of genetic studies in European and European‐derived populations has blinded us to the magnitude of pigmentation's complexity. As deliberate efforts are being made to better characterize diverse global populations, and as new sequencing technologies, better measurement tools, functional assessments, predictive modeling, and ancient DNA analyses become more widely accessible, we are beginning to appreciate how limited our understanding of the genetic bases of human skin color has been. Novel variants in genes not previously linked to pigmentation have been identified, and evidence is mounting that there are hundreds more variants yet to be found. Even for genes such as MC1R, OCA2, and SLC24A5 that have been exhaustively characterized in European populations, research in previously understudied groups is leading to a new appreciation of the degree to which genetic diversity, epistatic interactions, pleiotropy, admixture, global and local adaptation, and cultural practices operate in population‐specific ways to shape the genetic architecture of skin color. Furthermore, we are coming to terms with how factors like tanning response and barrier function may also have influenced selection on skin throughout human history. By examining how our knowledge of pigmentation genetics has shifted in the last decade, we can better appreciate how far we have come in understanding human diversity and the still long road ahead for understanding many complex human traits.

 
Award ID(s):
1714867
NSF-PAR ID:
10462502
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
American Journal of Physical Anthropology
Volume:
168
Issue:
S67
ISSN:
0002-9483
Page Range / eLocation ID:
p. 4-26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Phenotypic variation within a species is often structured geographically in clines. In Drosophila americana, a longitudinal cline for body colour exists within North America that appears to be due to local adaptation. The tan and ebony genes have been hypothesized to contribute to this cline, with alleles of both genes that lighten body colour found in D. americana. These alleles are similar in sequence and function to the allele fixed in D. americana's more lightly pigmented sister species, Drosophila novamexicana. Here, we examine the frequency and geographic distribution of these D. novamexicana‐like alleles in D. americana. Among alleles from over 100 strains of D. americana isolated from 21 geographic locations, we failed to identify additional alleles of tan or ebony with as much sequence similarity to D. novamexicana as the D. novamexicana‐like alleles previously described. However, using genetic analysis of 51 D. americana strains derived from 20 geographic locations, we identified one new allele of ebony and one new allele of tan segregating in D. americana that are functionally equivalent to the D. novamexicana allele. An additional 5 alleles of tan also showed marginal evidence of functional similarity. Given the rarity of these alleles, however, we conclude that they are unlikely to be driving the pigmentation cline. Indeed, phenotypic distributions of the 51 backcross populations analysed indicate a more complex genetic architecture, with diversity in the number and effects of loci altering pigmentation observed both within and among populations of D. americana. This genetic heterogeneity poses a challenge to association studies and genomic scans for clinal variation, but might be common in natural populations.

     
  2. INTRODUCTION

    One of the central applications of the human reference genome has been to serve as a baseline for comparison in nearly all human genomic studies. Unfortunately, many difficult regions of the reference genome have remained unresolved for decades and are affected by collapsed duplications, missing sequences, and other issues. Relative to the current human reference genome, GRCh38, the Telomere-to-Telomere CHM13 (T2T-CHM13) genome closes all remaining gaps, adds nearly 200 million base pairs (Mbp) of sequence, corrects thousands of structural errors, and unlocks the most complex regions of the human genome for scientific inquiry.

    RATIONALE

    We demonstrate how the T2T-CHM13 reference genome universally improves read mapping and variant identification in a globally diverse cohort. This cohort includes all 3202 samples from the expanded 1000 Genomes Project (1KGP), sequenced with short reads, as well as 17 globally diverse samples sequenced with long reads. By applying state-of-the-art methods for calling single-nucleotide variants (SNVs) and structural variants (SVs), we document the strengths and limitations of T2T-CHM13 relative to its predecessors and highlight its promise for revealing new biological insights within technically challenging regions of the genome.

    RESULTS

    Across the 1KGP samples, we found more than 1 million additional high-quality variants genome-wide using T2T-CHM13 than with GRCh38. Within previously unresolved regions of the genome, we identified hundreds of thousands of variants per sample—a promising opportunity for evolutionary and biomedical discovery. T2T-CHM13 improves the Mendelian concordance rate among trios and eliminates tens of thousands of spurious SNVs per sample, including a reduction of false positives in 269 challenging, medically relevant genes by up to a factor of 12. These corrections are in large part due to improvements to 70 protein-coding genes in >9 Mbp of inaccurate sequence caused by falsely collapsed or duplicated regions in GRCh38. Using the T2T-CHM13 genome also yields a more comprehensive view of SVs genome-wide, with a greatly improved balance of insertions and deletions. Finally, by providing numerous resources for T2T-CHM13 (including 1KGP genotypes, accessibility masks, and prominent annotation databases), our work will facilitate the transition to T2T-CHM13 from the current reference genome.

    CONCLUSION

    The vast improvements in variant discovery across samples of diverse ancestries position T2T-CHM13 to succeed as the next prevailing reference for human genetics. T2T-CHM13 thus offers a model for the construction and study of high-quality reference genomes from globally diverse individuals, such as is now being pursued through collaboration with the Human Pangenome Reference Consortium. As a foundation, our work underscores the benefits of an accurate and complete reference genome for revealing diversity across human populations.

    Figure caption: Genomic features and resources available for T2T-CHM13. Comparisons to GRCh38 reveal broad improvements in SNVs, indels, and SVs discovered across diverse human populations by means of short-read (1KGP) and long-read sequencing (LRS). These improvements are due to resolution of complex genomic loci (nonsyntenic and previously unresolved), duplication errors, and discordant haplotypes, including those in medically relevant genes.
  3. Abstract

    Background

    The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretation and clinical translation. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, the genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress toward fully understanding the regulatory causes of human complex traits.

    Results

    Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes.

    Conclusions

    Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.

     
  4. Abstract

    Identifying a common set of genes that mediate host–microbial interactions across populations and species of mammals has broad relevance for human health and animal biology. However, the genetic basis of the gut microbial composition in natural populations remains largely unknown outside of humans. Here, we used wild house mouse populations as a model system to ask three major questions: (a) Does host genetic relatedness explain interindividual variation in gut microbial composition? (b) Do population differences in the microbiota persist in a common environment? (c) What are the host genes associated with microbial richness and the relative abundance of bacterial genera? We found that host genetic distance is a strong predictor of the gut microbial composition as characterized by 16S amplicon sequencing. Using a common garden approach, we then identified differences in microbial composition between populations that persisted in a shared laboratory environment. Finally, we used exome sequencing to associate host genetic variants with microbial diversity and relative abundance of microbial taxa in wild mice. We identified 20 genes that were associated with microbial diversity or abundance including a macrophage‐derived cytokine (IL12a) that contained three nonsynonymous mutations. Surprisingly, we found a significant overrepresentation of candidate genes that were previously associated with microbial measurements in humans. The homologous genes that overlapped between wild mice and humans included genes that have been associated with traits related to host immunity and obesity in humans. Gene–bacteria associations identified in both humans and wild mice suggest some commonality to the host genetic determinants of gut microbial composition across mammals.

     
  5. Understanding the factors influencing the current distribution of genetic diversity across a species range is one of the main questions of evolutionary biology, especially given the increasing threat to biodiversity posed by climate change. Historical demographic processes such as population expansion or bottlenecks and decline are known to exert a predominant influence on past and current levels of genetic diversity, and revealing this demo‐genetic history can have immediate conservation implications. We used a whole‐exome capture sequencing approach to analyze polymorphism across the gene space of red spruce (Picea rubens Sarg.), an endemic and emblematic tree species of eastern North American high‐elevation forests that are facing the combined threat of global warming and increasing human activities. We sampled a total of 340 individuals, including populations from the current core of the range in northeastern USA and southeastern Canada and from the southern portions of its range along the Appalachian Mountains, where populations occur as highly fragmented mountaintop “sky islands.” Exome capture baits were designed from the transcriptome of the closely related white spruce (P. glauca Voss), and sequencing successfully captured most regions on or near our target genes, resulting in the generation of a new and expansive genomic resource for studying standing genetic variation in red spruce applicable to its conservation. Our results, based on over 2 million exome‐derived variants, indicate that red spruce is structured into three distinct ancestry groups that occupy different geographic regions of its highly fragmented range. Moreover, these groups show small effective population size (Ne), with a temporal history of sustained population decline that has been ongoing for thousands (or even hundreds of thousands) of years. These results demonstrate the broad potential of genomic studies for revealing details of the demographic history that can inform management and conservation efforts of nonmodel species with active restoration programs, such as red spruce.