Title: Handling Logical Character Dependency in Phylogenetic Inference: Extensive Performance Testing of Assumptions and Solutions Using Simulated and Empirical Data
Abstract: Logical character dependency is a major conceptual and methodological problem in phylogenetic inference of morphological data sets, as it violates the assumption of character independence that is common to all phylogenetic methods. It is more frequently observed in higher-level phylogenies or in data sets characterizing major evolutionary transitions, as these represent parts of the tree of life where (primary) anatomical characters either originate or disappear entirely. As a result, secondary traits related to these primary characters become “inapplicable” across all sampled taxa in which that character is absent. Various solutions have been explored over the last three decades to handle character dependency, such as alternative character coding schemes and, more recently, new algorithmic implementations. However, the accuracy of the proposed solutions, or the impact of character dependency across distinct optimality criteria, has never been directly tested using standard performance measures. Here, we utilize simple and complex simulated morphological data sets analyzed under different maximum parsimony optimization procedures and Bayesian inference to test the accuracy of various coding and algorithmic solutions to character dependency. This is complemented by empirical analyses using a recoded data set on palaeognathid birds. We find that in small, simulated data sets, absent coding performs better than other popular coding strategies available (contingent and multistate), whereas in more complex simulations (larger data sets controlled for different tree structure and character distribution models) contingent coding is favored more frequently. Under contingent coding, a recently proposed weighting algorithm produces the most accurate results for maximum parsimony. However, Bayesian inference outperforms all parsimony-based solutions for handling character dependency due to fundamental differences in their optimization procedures, a simple alternative that has long been overlooked. Yet, we show that the more primary characters bearing secondary (dependent) traits there are in a data set, the harder it is to estimate the true phylogenetic tree, regardless of the optimality criterion, owing to a considerable expansion of the tree parameter space. [Bayesian inference, character dependency, character coding, distance metrics, morphological phylogenetics, maximum parsimony, performance, phylogenetic accuracy.]
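The three coding strategies named in the abstract (contingent, absent, and multistate) can be illustrated with a toy case. The sketch below is not taken from the paper: the tail/colour example, the function names, and the state numbering are all hypothetical, and the definitions follow the way these terms are commonly used in the inapplicable-data literature.

```python
# Hypothetical toy example (not from the paper): a primary character
# (tail present/absent) and a secondary character (tail colour).
INAPPLICABLE = "-"


def contingent(tail_present, colour):
    """Two characters; colour is scored inapplicable when the tail is absent."""
    presence = "1" if tail_present else "0"
    col = {"red": "0", "blue": "1"}[colour] if tail_present else INAPPLICABLE
    return [presence, col]


def absent(tail_present, colour):
    """Two characters; 'absent' is an extra state (0) of the colour character."""
    presence = "1" if tail_present else "0"
    col = {"red": "1", "blue": "2"}[colour] if tail_present else "0"
    return [presence, col]


def multistate(tail_present, colour):
    """One composite character: 0 = no tail, 1 = red tail, 2 = blue tail."""
    return [{"red": "1", "blue": "2"}[colour] if tail_present else "0"]


# Three made-up taxa: no tail, red tail, blue tail.
taxa = {"A": (False, None), "B": (True, "red"), "C": (True, "blue")}
for name, (present, colour) in taxa.items():
    print(name, contingent(present, colour), absent(present, colour),
          multistate(present, colour))
```

Under this toy scoring, only contingent coding produces an inapplicable cell; the other two schemes replace it with an explicit state, which is the difference the abstract's comparison turns on.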
Award ID(s):
2045842
NSF-PAR ID:
10428439
Editor(s):
Davalos, Liliana
Date Published:
Journal Name:
Systematic Biology
Volume:
72
Issue:
3
ISSN:
1063-5157
Page Range / eLocation ID:
662 to 680
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  2. Wright, April (Ed.)
    Abstract Popular optimality criteria for phylogenetic trees focus on sequences of characters that are applicable to all the taxa. As studies grow in breadth, some characters may be applicable to a portion of the taxa and inapplicable to others. Past work has explored the limitations of treating inapplicable characters as missing data, noting that this strategy may favor trees where internal nodes are assigned impossible states, where the arrangement of taxa within subclades is unduly influenced by variation in distant parts of the tree, and/or where taxa that otherwise share most primary characters are grouped distantly. Approaches that avoid the first two problems have recently been proposed. Here, we propose an alternative approach which avoids all three problems. We focus on data matrices that use reductive coding of traits, that is, explicitly incorporate the innate hierarchy induced by inapplicability, and as such our approach extends to hierarchical characters in general. In the spirit of maximum parsimony, the proposed criterion seeks the phylogenetic tree with the minimal changes across any tree branch, but where changes are defined in terms of dissimilarity metrics that weigh the effects of inapplicable characters. The approach can accommodate binary, multistate, ordered, unordered, and polymorphic characters. We give a polynomial-time algorithm, inspired by Fitch’s algorithm, to score trees under a family of dissimilarity metrics, and prove its correctness. We show that the resulting optimality criterion is computationally hard, by reduction to the NP-hardness of the maximum parsimony optimality criterion. We demonstrate our approach using synthetic and empirical data sets and compare the results with other recently proposed methods for choosing optimal phylogenetic trees when the data include hierarchical characters. [Character optimization, dissimilarity metrics, hierarchical characters, inapplicable data, phylogenetic tree search.]
  3. It is now well established that the end-Cretaceous mass extinction had enormous repercussions for mammalian evolution. Following the extinction, during the Paleocene, mammals started to radiate, occupying new and diverse ecological niches. However, the phylogenetic relationships between the so-called “archaic” mammals of this time, and their position within Placentalia, remain contentious. The Periptychidae are a clade of distinctive “archaic” ungulates, composed of ~17 genera of small to large bodied, highly bunodont, terrestrial herbivores that were among the first placental mammals to appear after the end-Cretaceous mass extinction. Although the Periptychidae have historically been considered a distinctive “condylarth” subgroup, their higher-level relationships have rarely been tested. Here, we present an inclusive cladistic analysis to determine and test the phylogenetic affinities of Periptychidae and other key Paleocene groups within Placentalia under different cladistic optimality criteria. We scored 140 taxa for 503 dental, cranial and postcranial characters, incorporating new morphological and taxonomic data. The data were then subjected to parsimony and Bayesian tree searches, the latter under a model of morphological evolution, running 5,000,000 generations with samples every 200 generations and discarding 25% of the samples as burn-in. Stationarity was achieved and a 50 percent majority rule consensus tree was obtained from the sampled trees. The parsimony analysis recovered 48 most parsimonious trees. The two consensus trees derived from the different analyses are largely congruent and recover a monophyletic Periptychidae, although the parsimony consensus tree is better resolved. These results are consistent with simulation studies showing that parsimony tends to be more precise (more nodes reconstructed) than Bayesian analyses, although less accurate. The main topological differences between the results relate to the position of poorly known Puercan (earliest Paleocene) species. Our results affirm the monophyly of Periptychidae and its nesting within a group of “condylarths” positioned at the base of Laurasiatheria and closely related to Artiodactyla. Within Periptychidae we found support for the three major subfamilial divisions in both analyses. These results highlight the importance of using different optimality criteria when resolving a phylogeny and provide new insight into how placental mammals were evolving after the end-Cretaceous extinction.
    Grant Information: CONICYT PFCHA/DOCTORADO BECAS CHILE/2018, European Research Council Starting Grant (ERC StG 2017, 756226, PalM), National Science Foundation (NSF EAR 1654952, DEB 1654949)
  4. Abstract

    The phylogeny of Cyclops (~30 spp.), a predominantly Palearctic cold‐adapted genus, was reconstructed based on morphological and molecular characters. The morphological analysis used extensive taxon sampling from the entire Holarctic range of the genus and included 53 morphological characters. Polymorphic traits were coded by the “unordered,” “unscaled” and “scaled” methods; the maximum parsimony criterion was applied in tree building. Molecular phylogenetic reconstructions utilized partial nuclear 18S and 28S ribosomal genes, mitochondrial cytochrome oxidase I and complete internal transcribed spacer regions I and II, albeit with limited taxon sampling. Bayesian inference and maximum likelihood were used in these tree reconstructions. The molecular characters were used both in combination with morphology and as an independent test of the basal relationships inferred from morphology. Monophyly of the genus received strong support in both the morphological and molecular phylogenies; the basal relationships remain unresolved. The morphology‐based phylogenies, along with the geographic distribution patterns and ecological traits, supported monophyly of the ankyrae‐ladakanus clade, scutifer‐clade (C. scutifer, C. jashnovi, C. columbianus), kolensis‐clade (C. kolensis, C. kikuchii, C. vicinus, C. furcifer, C. insignis, C. alaskaensis), abyssorum‐clade (C. abyssorum s. str., C. abyssorum larianus, C. ricae, C. sevani) and divergens‐clade (South Carpathian “Cyclops sp. Y,” C. mauritaniae, C. divergens, C. bohater, C. lacustris). Relationships among European and North American populations of C. scutifer and C. columbianus based on partial sequences of the 12S mitochondrial gene show C. scutifer to be paraphyletic, suggesting two independent invasions into North America via the Bering Land Bridge from Siberia to Alaska.
  5. South American Ungulates (SANUs) exhibit astonishing morphological and ecological diversity due to their almost complete isolation during their early evolution. This unique diversity, coupled with the limited fossil record of their earliest evolution, makes it difficult to establish their phylogenetic position within the placental mammal tree. Litopterna is the second most diverse order of SANUs after Notoungulata, with species ranging from the middle Paleocene (~63 Ma) to the late Pleistocene. Among SANUs, litopterns are characterized by having cursorial limbs similar to those of Holarctic groups like Perissodactyla. Currently there are 67 genera of litopterns grouped into nine families, and the affinities of the Paleogene families remain unclear. Furthermore, it is unclear how litopterns are related to older groups of “archaic” Paleogene ungulates of South America (Kollpaninae and Didolodontidae) and North America (e.g., Mioclaenidae), and to other SANUs. To test the phylogenetic relationships of Litopterna, we assembled a new morphological matrix with ~1000 craniodental and postcranial characters for 79 taxa. The data were subjected to Bayesian and maximum parsimony analyses. We conducted tip-dated and undated Bayesian analyses using a Mk + G model of morphological evolution. Fifty percent majority rule consensus trees were obtained from the sampled trees from each analysis. The parsimony analysis resulted in ten most parsimonious trees, and a strict consensus was computed. The consensus trees derived from the different analyses were largely congruent. A traditional monophyletic Litopterna was not recovered, because Protolipternidae was closely related to Didolodontidae. Litopterna was found to be more closely related to Kollpaninae than to North American Mioclaenidae, and Kollpaninae did not form a monophyletic group with the latter. Adianthidae and Indaleciidae were found in a relatively basal position within Litopterna. Macraucheniidae was found as the sister group to Proterotheriidae, whereas Anisolambdidae was the sister group of Sparnotheriodontidae, these four families forming a monophyletic group. By utilizing a more comprehensive approach, these results alter previous conceptions of the intrafamilial affinities within Litopterna and their position among other Paleogene ungulates, shedding new light on how litopterns evolved and diversified during the Paleogene of South America.
    Funding Sources: ANID-PFCHA-Doctorado en el extranjero Becas Chile-2018-72190003, ERC starting grant PalM 756226, NSF DEB 1654949 and 1654952
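
Several of the abstracts listed above rely on maximum parsimony, and one describes a scoring algorithm inspired by Fitch's algorithm. For orientation, here is a minimal sketch of the classic Fitch small-parsimony count for a single unordered character on a rooted binary tree. It is the textbook procedure, not the method proposed in any of the listed papers, and the tree encoding (nested 2-tuples of leaf names) and the function name are assumptions made for this illustration.

```python
def fitch_score(tree, states):
    """Minimum number of state changes for one unordered character.

    tree   -- a leaf name (str) or a 2-tuple of subtrees (hypothetical encoding)
    states -- dict mapping each leaf name to its observed state
    """
    def visit(node):
        # A leaf contributes its observed state and no changes.
        if isinstance(node, str):
            return {states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        common = left_set & right_set
        if common:
            # The children agree on at least one state: no extra change.
            return common, left_cost + right_cost
        # The children disagree: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


# Made-up four-taxon example.
tree = (("A", "B"), ("C", "D"))
print(fitch_score(tree, {"A": 0, "B": 0, "C": 1, "D": 1}))
print(fitch_score(tree, {"A": 0, "B": 1, "C": 0, "D": 1}))
```

The sketch treats every leaf state as applicable; handling inapplicable or hierarchical characters, which is the subject of the abstracts above, requires changes to this basic count that are not shown here.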