Title: On the number of genealogical ancestors tracing to the source groups of an admixed population

Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual’s genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75–85% value for African ancestry on average and 15–25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960–1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240–376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32–69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy.

Award ID(s):
2116322 2109515
Publisher / Repository:
Oxford University Press
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Epistasis and gene‐environment interactions are known to contribute significantly to variation of complex phenotypes in model organisms. However, their identification in human association studies remains challenging for myriad reasons. In the case of epistatic interactions, the large number of potential interacting sets of genes presents computational, multiple hypothesis correction, and other statistical power issues. In the case of gene‐environment interactions, the lack of consistently measured environmental covariates in most disease studies precludes searching for interactions and creates difficulties for replicating studies.


    In this work, we develop a new statistical approach to address these issues that leverages genetic ancestry, defined as the proportion of ancestry derived from each ancestral population (e.g., the fraction of European/African ancestry in African Americans), in admixed populations. We applied our method to gene expression and methylation data from African American and Latino admixed individuals, respectively, identifying nine significant interactions. We show that two of the interactions in methylation data replicate, and the remaining six are significantly enriched for low P‐values.


    We show that genetic ancestry can be a useful proxy for unknown and unmeasured covariates in the search for interaction effects. These results have important implications for our understanding of the genetic architecture of complex traits.

  2. Abstract Objective

    Socially constructed ethnic identities are frequently rooted in beliefs about common descent that form when people with disparate cultures, languages, and biology come into contact. This study explores connections between beliefs about common descent, as represented by ethnic nomenclatures, and histories of migration and isolation ascertained from genomic data in New Mexicans of Spanish‐speaking descent (NMS).

    Materials and Methods

    We interviewed 507 NMS who further identified using one of seven ethnic terms that they associated with beliefs about connections to past ancestors. For groups of individuals who identified using each term, we estimated biogeographic ancestry, fit admixture models to ancestry distributions, and partitioned genetic distance into admixture and drift components.


    Regardless of which ethnic term they used, all NMS had appreciable Native American (avg. 27%) and European ancestry (avg. 71%). However, individuals who identified using terms associated with beliefs connecting them to colonial‐period Spanish ancestors had significantly higher European ancestry than individuals who identified using terms associated with ancestral connections to post‐colonial‐period migrants from Mexico. Model‐fitting analyses show that this ancestry difference reflects post‐colonial gene flow with non‐NMS European Americans, not colonial‐period gene flow with Spaniards. Drift, not admixture, accounted for most of the genetic distance between NMS who expressed connections to Mexican versus Spanish ancestors, reflecting relative isolation of New Mexico and Mexico through the 19th century.


    Patterns of genomic diversity in NMS are consistent with beliefs about common descent in showing that New Mexico was isolated for generations following initial colonization. They are inconsistent with these beliefs in showing that all NMS have substantial European and Native American ancestry, and in showing that a proportion of European ancestry derives from post‐colonial‐period admixture with non‐NMS European Americans. Our findings provide insights into the construction of ethnic identity in contexts of migration and isolation in New Mexico and, potentially, throughout human prehistory.

  3. Abstract

    The Africanized honey bee (AHB) is a New World amalgamation of several subspecies of the western honey bee (Apis mellifera), a diverse taxon historically grouped into four major biogeographic lineages: A (African), M (Western European), C (Eastern European), and O (Middle Eastern). In 1956, accidental release of experimentally bred “Africanized” hybrids from a research apiary in Sao Paulo, Brazil initiated a hybrid species expansion that now extends from northern Argentina to northern California (U.S.A.). Here, we assess nuclear admixture and mitochondrial ancestry in 60 bees from four countries (Panamá, Costa Rica, Mexico, and the U.S.A.) across this expansive range to characterize the ancestry of AHB several decades after the initial introduction and to test the prediction that African ancestry decreases with increasing latitude. We find that AHB nuclear genomes from Central America and Mexico are predominantly African (76%–89%), with smaller contributions from Western and Eastern European lineages. Similarly, nearly all honey bees from Central America and Mexico possess mitochondrial ancestry from the African lineage, with few individuals having European mitochondria. In contrast, AHB from San Diego (CA) show markedly lower African ancestry (38%), with substantial genomic contributions from all four major honey bee lineages and mitochondrial ancestry from all four clades as well. Genetic diversity measures from all New World populations equal or exceed those of ancestral populations. Interestingly, the feral honey bee population of San Diego emerges as a reservoir of diverse admixture and high genetic diversity, making it a potentially rich source of genetic material for honey bee breeding.

  4. Abstract Rationale: Genetic variation has a substantial contribution to chronic obstructive pulmonary disease (COPD) and lung function measurements. Heritability estimates using genome-wide genotyping data can be biased if analyses do not appropriately account for the nonuniform distribution of genetic effects across the allele frequency and linkage disequilibrium (LD) spectrum. In addition, the contribution of rare variants has been unclear. Objectives: We sought to assess the heritability of COPD and lung function using whole-genome sequence data from the Trans-Omics for Precision Medicine program. Methods: Using the genome-based restricted maximum likelihood method, we partitioned the genome into bins based on minor allele frequency and LD scores and estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio in 11 051 European ancestry and 5853 African-American participants. Measurements and Main Results: In European ancestry participants, the estimated heritability of COPD, FEV1% predicted and FEV1/FVC ratio were 35.5%, 55.6% and 32.5%, of which 18.8%, 19.7%, 17.8% were from common variants, and 16.6%, 35.8%, and 14.6% were from rare variants. These estimates had wide confidence intervals, with common variants and some sets of rare variants showing a statistically significant contribution (P-value < 0.05). In African-Americans, common variant heritability was similar to European ancestry participants, but lower sample size precluded calculation of rare variant heritability. Conclusions: Our study provides updated and unbiased estimates of heritability for COPD and lung function, and suggests an important contribution of rare variants. Larger studies of more diverse ancestry will improve accuracy of these estimates. 
  5. Abstract Background Large medical centers in urban areas, like Los Angeles, care for a diverse patient population and offer the potential to study the interplay between genetic ancestry and social determinants of health. Here, we explore the implications of genetic ancestry within the University of California, Los Angeles (UCLA) ATLAS Community Health Initiative—an ancestrally diverse biobank of genomic data linked with de-identified electronic health records (EHRs) of UCLA Health patients (N = 36,736). Methods We quantify the extensive continental and subcontinental genetic diversity within the ATLAS data through principal component analysis, identity-by-descent, and genetic admixture. We assess the relationship between genetically inferred ancestry (GIA) and >1500 EHR-derived phenotypes (phecodes). Finally, we demonstrate the utility of genetic data linked with EHR to perform ancestry-specific and multi-ancestry genome and phenome-wide scans across a broad set of disease phenotypes. Results We identify 5 continental-scale GIA clusters including European American (EA), African American (AA), Hispanic Latino American (HL), South Asian American (SAA) and East Asian American (EAA) individuals and 7 subcontinental GIA clusters within the EAA GIA corresponding to Chinese American, Vietnamese American, and Japanese American individuals. Although we broadly find that self-identified race/ethnicity (SIRE) is highly correlated with GIA, we still observe marked differences between the two, emphasizing that the populations defined by these two criteria are not analogous. We find a total of 259 significant associations between continental GIA and phecodes even after accounting for individuals’ SIRE, demonstrating that for some phenotypes, GIA provides information not already captured by SIRE. GWAS identifies significant associations for liver disease in the 22q13.31 locus across the HL and EAA GIA groups (HL p-value = 2.32×10⁻¹⁶, EAA p-value = 6.73×10⁻¹¹). A subsequent PheWAS at the top SNP reveals significant associations with neurologic and neoplastic phenotypes specifically within the HL GIA group. Conclusions Overall, our results explore the interplay between SIRE and GIA within a disease context and underscore the utility of studying the genomes of diverse individuals through biobank-scale genotyping linked with EHR-based phenotyping.