skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 2116322

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract In a genetically admixed population, admixed individuals possess genealogical and genetic ancestry from multiple source groups. Under a mechanistic model of admixture, we study the number of distinct ancestors from the source populations that the admixture represents. Combining a mechanistic admixture model with a recombination model that describes the probability that a genealogical ancestor is a genetic ancestor, for a member of a genetically admixed population, we count genetic ancestors from the source populations—those genealogical ancestors from the source populations who contribute to the genome of the modern admixed individual. We compare patterns in the numbers of genealogical and genetic ancestors across the generations. To illustrate the enumeration of genetic ancestors from source populations in an admixed group, we apply the model to the African-American population, extending recent results on the numbers of African and European genealogical ancestors that contribute to the pedigree of an African-American chosen at random, so that we also evaluate the numbers of African and European genetic ancestors who contribute to random African-American genomes. The model suggests that the autosomal genome of a random African-American born in the interval 1960–1965 contains genetic contributions from a mean of 162 African (standard deviation 47, interquartile range 127–192) and 32 European ancestors (standard deviation 14, interquartile range 21–43). The enumeration of genetic ancestors can potentially be performed in other diploid species in which admixture and recombination models can be specified. 
    more » « less
  2. Abstract Rooted binarygalledtrees generalize rooted binary trees to allow a restricted class of cycles, known asgalls. We build upon the Wedderburn-Etherington enumeration of rooted binary unlabeled trees withnleaves to enumerate rooted binary unlabeled galled trees withnleaves, also enumerating rooted binary unlabeled galled trees withnleaves andggalls,$$0 \leqslant g \leqslant \lfloor \frac{n-1}{2} \rfloor $$ 0 g n - 1 2 . The enumerations rely on a recursive decomposition that considers subtrees descended from the nodes of a gall, adopting a restriction on galls that amounts to considering only the rooted binarynormalunlabeled galled trees in our enumeration. We write an implicit expression for the generating function encoding the numbers of trees for alln. We show that the number of rooted binary unlabeled galled trees grows with$$0.0779(4.8230^n)n^{-\frac{3}{2}}$$ 0.0779 ( 4 . 8230 n ) n - 3 2 , exceeding the growth$$0.3188(2.4833^n)n^{-\frac{3}{2}}$$ 0.3188 ( 2 . 4833 n ) n - 3 2 of the number of rooted binary unlabeled trees without galls. However, the growth of the number of galled trees with only one gall has the same exponential order 2.4833 as the number with no galls, exceeding it only in the subexponential term,$$0.3910n^{\frac{1}{2}}$$ 0.3910 n 1 2 compared to$$0.3188n^{-\frac{3}{2}}$$ 0.3188 n - 3 2 . For a fixed number of leavesn, the number of gallsgthat produces the largest number of rooted binary unlabeled galled trees lies intermediate between the minimum of$$g=0$$ g = 0 and the maximum of$$g=\lfloor \frac{n-1}{2} \rfloor $$ g = n - 1 2 . We discuss implications in mathematical phylogenetics. 
    more » « less
  3. Abstract Members of genetically admixed populations possess ancestry from multiple source groups, and studies of human genetic admixture frequently estimate ancestry components corresponding to fractions of individual genomes that trace to specific ancestral populations. However, the same numerical ancestry fraction can represent a wide array of admixture scenarios within an individual’s genealogy. Using a mechanistic model of admixture, we consider admixture genealogically: how many ancestors from the source populations does the admixture represent? We consider African-Americans, for whom continent-level estimates produce a 75–85% value for African ancestry on average and 15–25% for European ancestry. Genetic studies together with key features of African-American demographic history suggest ranges for parameters of a simple three-epoch model. Considering parameter sets compatible with estimates of current ancestry levels, we infer that if all genealogical lines of a random African-American born during 1960–1965 are traced back until they reach members of source populations, the mean over parameter sets of the expected number of genealogical lines terminating with African individuals is 314 (interquartile range 240–376), and the mean of the expected number terminating in Europeans is 51 (interquartile range 32–69). Across discrete generations, the peak number of African genealogical ancestors occurs in birth cohorts from the early 1700s, and the probability exceeds 50% that at least one European ancestor was born more recently than 1835. Our genealogical perspective can contribute to further understanding the admixture processes that underlie admixed populations. For African-Americans, the results provide insight both on how many of the ancestors of a typical African-American might have been forcibly displaced in the Transatlantic Slave Trade and on how many separate European admixture events might exist in a typical African-American genealogy. 
    more » « less
  4. Abstract In studying allele-frequency variation across populations, it is often convenient to classify an allelic type as “rare,” with nonzero frequency less than or equal to a specified threshold, “common,” with a frequency above the threshold, or entirely unobserved in a population. When sample sizes differ across populations, however, especially if the threshold separating “rare” and “common” corresponds to a small number of observed copies of an allelic type, discreteness effects can lead a sample from one population to possess substantially more rare allelic types than a sample from another population, even if the two populations have extremely similar underlying allele-frequency distributions across loci. We introduce a rarefaction-based sample-size correction for use in comparing rare and common variation across multiple populations whose sample sizes potentially differ. We use our approach to examine rare and common variation in worldwide human populations, finding that the sample-size correction introduces subtle differences relative to analyses that use the full available sample sizes. We introduce several ways in which the rarefaction approach can be applied: we explore the dependence of allele classifications on subsample sizes, we permit more than two classes of allelic types of nonzero frequency, and we analyze rare and common variation in sliding windows along the genome. The results can assist in clarifying similarities and differences in allele-frequency patterns across populations. 
    more » « less
  5. Abstract Allele-sharing statistics for a genetic locus measure the dissimilarity between two populations as a mean of the dissimilarity between random pairs of individuals, one from each population. Owing to within-population variation in genotype, allele-sharing dissimilarities can have the property that they have a nonzero value when computed between a population and itself. We consider the mathematical properties of allele-sharing dissimilarities in a pair of populations, treating the allele frequencies in the two populations parametrically. Examining two formulations of allele-sharing dissimilarity, we obtain the distributions of within-population and between-population dissimilarities for pairs of individuals. We then mathematically explore the scenarios in which, for certain allele-frequency distributions, the within-population dissimilarity – the mean dissimilarity between randomly chosen members of a population – can exceed the dissimilarity between two populations. Such scenarios assist in explaining observations in population-genetic data that members of a population can be empirically more genetically dissimilar from each other on average than they are from members of another population. For a population pair, however, the mathematical analysis finds that at least one of the two populations always possesses smaller within-population dissimilarity than the value of the between-population dissimilarity. We illustrate the mathematical results with an application to human population-genetic data. 
    more » « less
  6. Abstract Properties of gene genealogies such as tree height (H), total branch length (L), total lengths of external (E) and internal (I) branches, mean length of basal branches (B), and the underlying coalescence times (T) can be used to study population-genetic processes and to develop statistical tests of population-genetic models. Uses of tree features in statistical tests often rely on predictions that depend on pairwise relationships among such features. For genealogies under the coalescent, we provide exact expressions for Taylor approximations to expected values and variances of ratios Xn/Yn, for all 15 pairs among the variables {Hn,Ln,En,In,Bn,Tk}, considering n leaves and 2≤k≤n. For expected values of the ratios, the approximations match closely with empirical simulation-based values. The approximations to the variances are not as accurate, but they generally match simulations in their trends as n increases. Although En has expectation 2 and Hn has expectation 2 in the limit as n→∞, the approximation to the limiting expectation for En/Hn is not 1, instead equaling π2/3−2≈1.28987. The new approximations augment fundamental results in coalescent theory on the shapes of genealogical trees. 
    more » « less
  7. Free, publicly-accessible full text available February 13, 2026
  8. Colijn and Plazzotta (2018) [1] described a bijective scheme for associating the unlabeled bifurcating rooted trees with the positive integers. In mathematical and biological applications of unlabeled rooted trees, however, nodes of rooted trees are sometimes multifurcating rather than bifurcating. Building on the bijection between the unlabeled bifurcating rooted trees and the positive integers, we describe bijective schemes for associating the unlabeled multifurcating rooted trees with the positive integers. We devise bijections with the positive integers for a set of trees in which each non-leaf node has exactly k child nodes, and for a set of trees in which each non-leaf node has at most k child nodes. The calculations make use of Macaulay's binomial expansion formula. The generalization to multifurcating trees can assist with the use of unlabeled trees for applications in evolutionary biology, such as the measurement of phylogenetic patterns of genetic lineages in pathogens. 
    more » « less
  9. Mailler, Cécile; Wild, Sebastian (Ed.)
    Galled trees appear in problems concerning admixture, horizontal gene transfer, hybridization, and recombination. Building on a recursive enumerative construction, we study the asymptotic behavior of the number of rooted binary unlabeled (normal) galled trees as the number of leaves n increases, maintaining a fixed number of galls g. We find that the exponential growth with n of the number of rooted binary unlabeled normal galled trees with g galls has the same value irrespective of the value of g ≥ 0. The subexponential growth, however, depends on g; it follows c_g n^{2g-3/2}, where c_g is a constant dependent on g. Although for each g, the exponential growth is approximately 2.4833ⁿ, summing across all g, the exponential growth is instead approximated by the much larger 4.8230ⁿ. 
    more » « less
  10. Abstract Objective In mathematical phylogenetics, a labeled rooted binary tree topology can possess any of a number of labeled histories, each of which represents a possible temporal ordering of its coalescences. Labeled histories appear frequently in calculations that describe the combinatorics of phylogenetic trees. Here, we generalize the concept of labeled histories from rooted phylogenetic trees to rooted phylogenetic networks, specifically for the class of rooted phylogenetic networks known as rooted galled trees . Results Extending a recursive algorithm for enumerating the labeled histories of a labeled tree topology, we present a method to enumerate the labeled histories associated with a labeled rooted galled tree. The method relies on a recursive decomposition by which each gall in a galled tree possesses three or more descendant subtrees. We exhaustively provide the numbers of labeled histories for all small galled trees, finding that each gall reduces the number of labeled histories relative to a specified galled tree that does not contain it. Conclusion The results expand the set of structures for which labeled histories can be enumerated, extending a well-known calculation for phylogenetic trees to a class of phylogenetic networks. 
    more » « less