Fine-scale evolutionary dynamics can be challenging to tease out when the focus is on the broad brush strokes of whole populations over long time spans. We propose a suite of diagnostic analysis techniques that operate on lineages and phylogenies in digital evolution experiments, with the aim of improving our capacity to quantitatively explore the nuances of evolutionary histories. We present three types of lineage measurements: lineage length, mutation accumulation, and phenotypic volatility. Additionally, we suggest adopting four phylogeny measurements from biology: phylogenetic richness, phylogenetic divergence, phylogenetic regularity, and depth of the most-recent common ancestor. In addition to quantitative metrics, we discuss several existing data visualizations that are useful for understanding lineages and phylogenies: state sequence visualizations, fitness landscape overlays, phylogenetic trees, and Muller plots. We examine the behavior of these metrics (with the aid of data visualizations) in two well-studied computational contexts: (1) a set of two-dimensional, real-valued optimization problems under a range of mutation rates and selection strengths, and (2) a set of qualitatively different environments in the Avida digital evolution platform. These results confirm our intuitions about how these metrics respond to various evolutionary conditions and indicate their broad value.
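The three lineage measurements can be made concrete with a short sketch. It rests on several assumptions:

- A lineage is stored as a list of ancestors, running from the founder to the focal organism.
- Genomes are equal-length strings.
- Mutation accumulation is counted as the parent-to-child Hamming distance.

The `Ancestor` type and the function names are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Hashable, Sequence


@dataclass(frozen=True)
class Ancestor:
    """One organism on a lineage (hypothetical record type)."""
    genome: str          # equal-length genomes assumed, e.g. "ACGT..."
    phenotype: Hashable  # any comparable phenotype label


def lineage_length(lineage: Sequence[Ancestor]) -> int:
    """Number of organisms from the founder to the focal organism."""
    return len(lineage)


def mutation_accumulation(lineage: Sequence[Ancestor]) -> int:
    """Total point mutations along the lineage, counted as the Hamming
    distance between each parent genome and its child's genome."""
    return sum(
        sum(a != b for a, b in zip(parent.genome, child.genome))
        for parent, child in zip(lineage, lineage[1:])
    )


def phenotypic_volatility(lineage: Sequence[Ancestor]) -> int:
    """Number of parent-to-child steps at which the phenotype changes."""
    return sum(
        parent.phenotype != child.phenotype
        for parent, child in zip(lineage, lineage[1:])
    )


lineage = [
    Ancestor("AAAA", "x"),
    Ancestor("AAAB", "x"),  # one mutation, same phenotype
    Ancestor("ABAB", "y"),  # one mutation, phenotype changes
    Ancestor("ABAB", "x"),  # no mutation, phenotype reverts
]
print(lineage_length(lineage), mutation_accumulation(lineage),
      phenotypic_volatility(lineage))  # 4 2 2
```

The sketch counts organisms for lineage length. Counting ancestor-to-descendant steps instead, which gives one fewer, is an equally valid convention.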
Hereditary Stratigraphy: Genome Annotations to Enable Phylogenetic Inference over Distributed Populations
Phylogenies provide direct accounts of the evolutionary trajectories behind evolved artifacts in genetic algorithm and artificial life systems. Phylogenetic analyses can also provide insight into evolutionary and ecological dynamics such as selection pressure and frequency-dependent selection. Traditionally, digital evolution systems have recorded data for phylogenetic analyses through perfect tracking, in which each birth event is recorded in a centralized data structure. This approach, however, does not easily scale to distributed computing environments where evolutionary individuals may migrate between a large number of disjoint processing elements. To support phylogenetic analyses in these environments, we propose an approach that lets phylogenies be inferred from heritable genetic annotations rather than tracked directly. We introduce a “hereditary stratigraphy” algorithm that enables efficient, accurate phylogenetic reconstruction with tunable, explicit trade-offs between annotation memory footprint and reconstruction accuracy. In particular, we demonstrate an approach that estimates the generation of the most recent common ancestor (MRCA) of two individuals with fixed relative accuracy irrespective of lineage depth, while requiring annotation space only logarithmic in lineage depth. For example, this approach can estimate the MRCA generation of two genomes within 10% relative error with 95% confidence up to a depth of a trillion generations, using genome annotations smaller than a kilobyte. We also simulate inference over known lineages, recovering up to 85.70% of the information contained in the original tree using 64-bit annotations.
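The core idea of hereditary stratigraphy can be sketched in Python. Each generation appends a random "differentia" tagged with its generation (its rank), and offspring inherit their parent's strata. Comparing two columns rank by rank then brackets the MRCA generation at the first shared rank where their differentia disagree.

This is a minimal sketch under stated assumptions, not the paper's implementation or any published library's API. The `HereditaryColumn` class, the `mrca_bounds` function, and the power-of-two thinning rule (which keeps O(resolution * log n) strata) are hypothetical choices made for illustration. The sketch also returns hard bounds rather than the confidence-based estimates described in the abstract.

```python
import random
from typing import Dict, Optional, Tuple


class HereditaryColumn:
    """Minimal hereditary stratigraphic column (hypothetical API).

    Every generation deposits one stratum: a random `bits`-wide
    differentia keyed by its rank (generation of deposit). Old strata are
    thinned so that, at distance d from the newest stratum, retained ranks
    are spaced at most about d / resolution apart.
    """

    def __init__(self, resolution: int = 4, bits: int = 64,
                 rng: Optional[random.Random] = None) -> None:
        self.resolution = resolution
        self.bits = bits
        self.rng = rng if rng is not None else random.Random()
        self.strata: Dict[int, int] = {}  # rank -> differentia
        self.newest = -1                  # rank of the newest stratum
        self._deposit()                   # the founder deposits rank 0

    def _retained(self, rank: int) -> bool:
        # Keep ranks divisible by 2**k, where k = floor(log2(d // resolution))
        # once distance d >= resolution, else 0. k never shrinks as the column
        # grows, so a pruned rank is never needed again. Rank 0 and the newest
        # stratum (d == 0) are always kept.
        d = self.newest - rank
        k = max(0, (d // self.resolution).bit_length() - 1)
        return rank % (1 << k) == 0

    def _deposit(self) -> None:
        self.newest += 1
        self.strata[self.newest] = self.rng.getrandbits(self.bits)
        self.strata = {r: v for r, v in self.strata.items()
                       if self._retained(r)}

    def clone(self) -> "HereditaryColumn":
        """Return an offspring column: inherited strata plus one new stratum."""
        child = object.__new__(HereditaryColumn)
        child.resolution, child.bits, child.rng = (
            self.resolution, self.bits, self.rng)
        child.strata, child.newest = dict(self.strata), self.newest
        child._deposit()
        return child


def mrca_bounds(a: HereditaryColumn,
                b: HereditaryColumn) -> Optional[Tuple[int, int]]:
    """Inclusive (lower, upper) bounds on the MRCA's generation, or None if
    the columns share no common ancestry. A chance differentia collision
    (probability 2**-bits per comparison) can only make bounds too recent."""
    last_match = None
    for rank in sorted(a.strata.keys() & b.strata.keys()):
        if a.strata[rank] != b.strata[rank]:
            # This rank was deposited separately, so the split came earlier.
            return None if last_match is None else (last_match, rank - 1)
        last_match = rank
    if last_match is None:
        return None
    return last_match, min(a.newest, b.newest)


rng = random.Random(1)
ancestor = HereditaryColumn(resolution=4, bits=64, rng=rng)
for _ in range(99):  # the shared ancestor ends at generation 99
    ancestor = ancestor.clone()
x, y = ancestor, ancestor
for _ in range(50):
    x = x.clone()
for _ in range(30):
    y = y.clone()
print(mrca_bounds(x, y))  # (96, 103): brackets the true MRCA, generation 99
```

The thinning rule depends only on a stratum's rank and the column's newest rank, and it never re-admits a rank once dropped. Any rank retained by both columns therefore holds inherited, directly comparable differentia. Raising `resolution` narrows the bounds at the cost of more retained strata, while raising `bits` lowers the chance of a spurious match.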
- Award ID(s):
- 1655715
- PAR ID:
- 10394105
- Date Published:
- Journal Name:
- The 2022 Conference on Artificial Life (ALIFE 2022)
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Fine-scale evolutionary dynamics can be challenging to tease out when the focus is on the broad brush strokes of whole populations over long time spans. We propose a suite of diagnostic metrics that operate on lineages and phylogenies in digital evolution experiments, with the aim of improving our capacity to quantitatively explore the nuances of evolutionary histories. We present three types of lineage measurements: lineage length, mutation accumulation, and phenotypic volatility. Additionally, we suggest adopting four phylogeny measurements from biology: depth of the most-recent common ancestor, phylogenetic richness, phylogenetic divergence, and phylogenetic regularity. We demonstrate the use of each metric on a set of two-dimensional, real-valued optimization problems under a range of mutation rates and selection strengths, confirming our intuitions about what the metrics can tell us about evolutionary dynamics.
-
Erroneous data can creep into sequence datasets for reasons ranging from contamination to annotation and alignment mistakes, reducing the accuracy of downstream analyses. As datasets keep getting larger, it has become difficult to check multiple sequence alignments visually for errors, so automatic error detection methods are needed more than ever. Alignment masking methods, which are widely used, remove entire aligned sites and may reduce signal as much as or more than they reduce noise. The alternative we propose here is a surprisingly under‐explored approach: looking for errors in small species‐specific stretches of the multiple sequence alignments. We introduce a method called TAPER that uses a novel two‐dimensional outlier detection algorithm. Importantly, TAPER adjusts its null expectations per site and species, and in doing so, it attempts to distinguish real heterogeneity (signal) from errors (noise). Our results show that TAPER removes very little data yet finds much of the error. The effectiveness of TAPER depends on several properties of the alignment (e.g. evolutionary divergence levels) and of the errors (e.g. their length). By enabling data cleanup with minimal loss of signal, TAPER can improve downstream analyses such as phylogenetic reconstruction and selection detection. Data errors, small or large, can reduce confidence in downstream results, so eliminating them can be beneficial even when downstream analyses are not affected.
-
Identifying the evolutionary and ecological mechanisms that drive lineage diversification in the species-rich tropics is of broad interest to evolutionary biologists. Here, we use phylogeographic and demographic analyses of genomic-scale RADseq data to assess the impact of a large geographic feature, the Amazon River, on lineage formation in a venomous pitviper, Bothrops atrox. We compared genetic differentiation in samples from four sites near Santarem, Brazil, that spanned the Amazon and represented major habitat types. A species delimitation analysis identified each population as a distinct evolutionary lineage, while a species tree analysis with populations as taxa revealed a phylogenetic tree consistent with dispersal across the Amazon from north to south. Phylogenetic analyses of mtDNA variation confirmed this pattern and suggest that all lineages originated during the mid- to late Pleistocene. Historical demographic analyses support a population model of lineage formation through isolation between lineages, with low ongoing migration between large populations, and reject a model of differentiation through isolation by distance alone. Our results provide a rare example of a phylogeographic pattern demonstrating dispersal over evolutionary time scales across a large tropical river. They also suggest a role for the Amazon River as a driver of in situ divergence, both by impeding (but not preventing) gene flow and through parapatric differentiation along an ecological gradient.
-
Davalos, Liliana (Ed.)
African cichlids (subfamily: Pseudocrenilabrinae) are among the most diverse vertebrates, and their propensity for repeated rapid radiation has made them a celebrated model system in evolutionary research. Nonetheless, despite numerous studies, phylogenetic uncertainty persists, and riverine lineages remain comparatively underrepresented in higher-level phylogenetic studies. Heterogeneous gene histories resulting from incomplete lineage sorting (ILS) and hybridization are likely sources of uncertainty, especially during episodes of rapid speciation. We investigate the relationships of Pseudocrenilabrinae and its close relatives while accounting for multiple sources of genetic discordance, using species tree and hybrid network analyses with hundreds of single-copy exons. With a hybrid reference-guided/de novo assembly approach, we improve sequence recovery for distant relatives, thereby extending the taxonomic reach of our probes. Our analyses provide robust hypotheses for most higher-level relationships and reveal widespread gene heterogeneity, including in riverine taxa. ILS and past hybridization are identified as the sources of genetic discordance in different lineages. Sampling of various Blenniiformes (formerly Ovalentaria) adds strong phylogenomic support for convict blennies (Pholidichthyidae) as sister to Cichlidae and points to other potentially useful protein-coding markers across the order. A reliable phylogeny with representatives from diverse environments will support ongoing taxonomic and comparative evolutionary research in the cichlid model system. [African cichlids; Blenniiformes; Gene tree heterogeneity; Hybrid assembly; Phylogenetic network; Pseudocrenilabrinae; Species tree.]
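The four phylogeny measurements named in the related records above can likewise be sketched over a small tree. The sketch makes the following assumptions:

- The tree is stored as a hypothetical node-to-parent dictionary with unit-length edges (one edge per generation).
- Phylogenetic richness is Faith-style phylogenetic diversity: the total edges connecting the extant taxa to the root.
- Divergence is the mean pairwise distance between taxa.
- Regularity is the variance of those pairwise distances.

These are common formulations in the ecology literature, assumed here rather than quoted from the papers. The function names are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance
from typing import Dict, Hashable, List, Optional, Sequence

# Hypothetical representation: node -> parent (None for the root);
# every edge counts as one unit of length.
Tree = Dict[Hashable, Optional[Hashable]]


def path_to_root(tree: Tree, node: Hashable) -> List[Hashable]:
    """Nodes from `node` up to and including the root."""
    path = [node]
    while tree[path[-1]] is not None:
        path.append(tree[path[-1]])
    return path


def pairwise_distance(tree: Tree, a: Hashable, b: Hashable) -> int:
    """Edges on the path between a and b, passing through their MRCA."""
    steps_from_a = {n: i for i, n in enumerate(path_to_root(tree, a))}
    for j, n in enumerate(path_to_root(tree, b)):
        if n in steps_from_a:  # first shared node is the MRCA
            return steps_from_a[n] + j
    raise ValueError("nodes are in different trees")


def mrca_depth(tree: Tree, taxa: Sequence[Hashable]) -> int:
    """Depth (edges below the root) of the taxa's most recent common ancestor."""
    shared = set(path_to_root(tree, taxa[0]))
    for t in taxa[1:]:
        shared &= set(path_to_root(tree, t))
    return max(len(path_to_root(tree, n)) - 1 for n in shared)


def phylogenetic_richness(tree: Tree, taxa: Sequence[Hashable]) -> int:
    """Faith-style phylogenetic diversity: edges in the subtree joining taxa to the root."""
    nodes = set()
    for t in taxa:
        nodes.update(path_to_root(tree, t))
    return len(nodes) - 1  # a connected tree has one fewer edge than nodes


def phylogenetic_divergence(tree: Tree, taxa: Sequence[Hashable]) -> float:
    """Mean pairwise distance between taxa (needs at least two taxa)."""
    return mean(pairwise_distance(tree, a, b) for a, b in combinations(taxa, 2))


def phylogenetic_regularity(tree: Tree, taxa: Sequence[Hashable]) -> float:
    """Variance of pairwise distances; lower values mean more even spacing."""
    return pvariance([pairwise_distance(tree, a, b)
                      for a, b in combinations(taxa, 2)])


tree = {"root": None, "a": "root", "b": "root", "c": "a", "d": "a", "e": "b"}
extant = ["c", "d", "e"]
print(mrca_depth(tree, extant), phylogenetic_richness(tree, extant))  # 0 5
print(round(phylogenetic_divergence(tree, extant), 3),
      round(phylogenetic_regularity(tree, extant), 3))  # 3.333 0.889
```

With unit edges, these quantities are measured in generations. Weighting edges by elapsed time or by mutation count would only change how `pairwise_distance` and `phylogenetic_richness` sum path lengths.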

