
This content will become publicly available on August 25, 2023

Title: Trees, forests, chickens, and eggs: when and why to prune trees in a random forest
NSF-PAR ID:
10370278
Journal Name:
Statistical Analysis and Data Mining: The ASA Data Science Journal
ISSN:
1932-1864
Publisher:
Wiley Blackwell (John Wiley & Sons)
Sponsoring Org:
National Science Foundation
More Like this
  1. Though large multilocus genomic data sets have led to overall improvements in phylogenetic inference, they have also posed the new challenge of addressing conflicting signals across the genome. In particular, ancestral population structure, which has been uncovered in a number of diverse species, can skew gene tree frequencies, thereby hindering the performance of species tree estimators. Here we develop a novel maximum likelihood method, termed TASTI (Taxa with Ancestral structure Species Tree Inference), that can infer phylogenies under such scenarios. We find that its accuracy increases with the number of input gene trees, in contrast to the relatively poor performance of methods not tailored for ancestral structure. Moreover, we propose a supertree approach that allows TASTI to scale computationally as the number of input taxa increases. We use genetic simulations to assess TASTI’s performance in the three- and four-taxon settings and demonstrate the application of TASTI on a six-species Afrotropical mosquito data set. Finally, we have implemented TASTI in an open-source software package for ease of use by the scientific community.