Title: Uneven missing data skew phylogenomic relationships within the lories and lorikeets.
The resolution of the Tree of Life has accelerated with advances in DNA sequencing technology. To achieve dense taxon sampling, it is often necessary to obtain DNA from historical museum specimens to supplement modern genetic samples. However, DNA from historical material is generally degraded, which presents various challenges. In this study, we evaluated how the coverage at variant sites and missing data among historical and modern samples impact phylogenomic inference. We explored these patterns in the brush-tongued parrots (lories and lorikeets) of Australasia by sampling ultraconserved elements in 105 taxa. Trees estimated with low-coverage characters had several clades where relationships appeared to be influenced by whether the sample came from historical or modern specimens, which were not observed when more stringent filtering was applied. To assess if the topologies were affected by missing data, we performed an outlier analysis of sites and loci, and a data reduction approach where we excluded sites based on data completeness. Depending on the outlier test, 0.15% of total sites or 38% of loci were driving the topological differences among trees, and at these sites, historical samples had 10.9× more missing data than modern ones. In contrast, 70% data completeness was necessary to avoid spurious relationships. Predictive modeling found that outlier analysis scores were correlated with parsimony informative sites in the clades whose topologies changed the most by filtering. After accounting for biased loci and understanding the stability of relationships, we inferred a more robust phylogenetic hypothesis for lories and lorikeets.
Award ID(s):
1655736
PAR ID:
10290188
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Genome biology and evolution
Volume:
12
Issue:
7
ISSN:
1759-6653
Page Range / eLocation ID:
1131-1147
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Premise: Phylogenetic relationships within major angiosperm clades are increasingly well resolved, but largely informed by plastid data. Areas of poor resolution persist within the Dipsacales, including placement of Heptacodium and Zabelia, and relationships within the Caprifolieae and Linnaeeae, hindering our interpretation of morphological evolution. Here, we sampled a significant number of nuclear loci using a Hyb‐Seq approach and used these data to infer the Dipsacales phylogeny and estimate divergence times. Methods: Sampling all major clades within the Dipsacales, we applied the Angiosperms353 probe set to 96 species. Data were filtered based on locus completeness and taxon recovery per locus, and trees were inferred using RAxML and ASTRAL. Plastid loci were assembled from off‐target reads, and 10 fossils were used to calibrate dated trees. Results: Varying numbers of targeted loci and off‐target plastomes were recovered from most taxa. Nuclear and plastid data confidently place Heptacodium with Caprifolieae, implying homoplasy in calyx morphology, ovary development, and fruit type. Placement of Zabelia, and relationships within the Caprifolieae and Linnaeeae, remain uncertain. Dipsacales diversification began earlier than suggested by previous angiosperm‐wide dating analyses, but many major splitting events date to the Eocene. Conclusions: The Angiosperms353 probe set facilitated the assembly of a large, single‐copy nuclear dataset for the Dipsacales. Nevertheless, many relationships remain unresolved, and resolution was poor for woody clades with low rates of molecular evolution. We favor expanding the Angiosperms353 probe set to include more variable loci and loci of special interest, such as developmental genes, within particular clades.
  2. Ruane, Sara (Ed.)
    Abstract Some phylogenetic problems remain unresolved even when large amounts of sequence data are analyzed and methods that accommodate processes such as incomplete lineage sorting are employed. In addition to investigating biological sources of phylogenetic incongruence, it is also important to reduce noise in the phylogenomic dataset by using an appropriate filtering approach that addresses gene tree estimation errors. We present the results of a case study in manakins, focusing on the very difficult clade comprising the genera Antilophia and Chiroxiphia. Previous studies suggest that Antilophia is nested within Chiroxiphia, though relationships among Antilophia+Chiroxiphia species have been highly unstable. We extracted more than 11,000 loci (ultra-conserved elements and introns) from whole genomes and conducted analyses using concatenation and multispecies coalescent methods. Topologies resulting from analyses using all loci differed depending on the data type and analytical method, with 2 clades (Antilophia+Chiroxiphia and Manacus+Pipra+Machaeropterus) in the manakin tree showing incongruent results. We hypothesized that gene trees that conflicted with a long coalescent branch (e.g., the branch uniting Antilophia+Chiroxiphia) might be enriched for cases of gene tree estimation error, so we conducted analyses that either constrained those gene trees to include monophyly of Antilophia+Chiroxiphia or excluded these loci. While constraining trees reduced some incongruence, excluding the trees led to completely congruent species trees, regardless of the data type or model of sequence evolution used. We found that a suite of gene metrics (most importantly the number of informative sites and likelihood of intralocus recombination) collectively explained the loci that resulted in non-monophyly of Antilophia+Chiroxiphia. We also found evidence for introgression that may have contributed to the discordant topologies we observe in Antilophia+Chiroxiphia and led to deviations from expectations given the multispecies coalescent model. Our study highlights the importance of identifying factors that can obscure phylogenetic signal when dealing with recalcitrant phylogenetic problems, such as gene tree estimation error, incomplete lineage sorting, and reticulation events. [Birds; c-gene; data type; gene estimation error; model fit; multispecies coalescent; phylogenomics; reticulation]
  3. The availability of genetic data from wild populations limits our understanding of primate evolution and conservation, particularly for small nocturnal species such as lorisiforms (galagos, lorises, angwantibos, and pottos). Emerging methods for recovering genomic DNA from historical museum specimens have been rarely used in primate studies. We aimed to optimize extraction and bioinformatics protocols to maximize the recovery of historical DNA to fill important geographic and taxonomic gaps, improve phylogenetic resolution, and inform conservation of lorisiform primates. First, we compared the performance of two DNA extraction methods by using 238 specimens up to a hundred years old. We then selected 96 samples with the highest DNA yields for shotgun sequencing. To evaluate the impact of phylogenetic divergence in bioinformatic read mapping, we compared coverage depths when using human and three lorisiform reference mitogenomes. Based on whole genomic data, we performed metagenomics and microbial diversity analyses to assess the composition of potentially exogenous content. Lastly, based on the most geographically and taxonomically comprehensive sampling for the West African lorisiforms to date (19/32 currently recognized species), we performed phylogenetic inference using Maximum Likelihood. The results showed that older samples yield lower DNA concentration, with an optimized phenol-chloroform protocol outperforming a commercial kit. However, both extraction methods generated DNA in sufficient amount and quality for phylogenetic inference. Our reference bias comparisons showed that higher phylogenetic proximity between focal species and reference mitogenome increases coverage depth. The metagenomic analysis found human contamination in only one of 96 samples (1%), whereas ten of 96 (11%) samples showed nonnegligible levels of other exogenous contents, among which are certain blood parasites. We inferred low support for the monophyly of Asian and African Lorisids but confirmed the monophyly and previously suggested relationships among Galagid genera. Lastly, we found evidence of cryptic species diversity within the western dwarf galagos (genus Galagoides). Taken together, these results attest to the enormous potential of museomics to advance our understanding of galago evolution, ecology, and conservation, an approach that can be extended to other primate clades.
  4. Abstract Alignment is a crucial issue in molecular phylogenetics because different alignment methods can potentially yield very different topologies for individual genes. But it is unclear if the choice of alignment methods remains important in phylogenomic analyses, which incorporate data from dozens, hundreds, or thousands of genes. For example, problematic biases in alignment might be multiplied across many loci, whereas alignment errors in individual genes might become irrelevant. The issue of alignment trimming (i.e. removing poorly aligned regions or missing data from individual genes) is also poorly explored. Here, we test the impact of 12 different combinations of alignment and trimming methods on phylogenomic analyses. We compare these methods using published phylogenomic data from ultraconserved elements (UCEs) from squamate reptiles (lizards and snakes), birds, and tetrapods. We compare the properties of alignments generated by different alignment and trimming methods (e.g., length, informative sites, missing data). We also test whether these datasets can recover well-established clades when analyzed with concatenated (RAxML) and species-tree methods (ASTRAL-III), using the full data (∼5,000 loci) and subsampled datasets (10% and 1% of loci). We show that different alignment and trimming methods can significantly impact various aspects of phylogenomic datasets (e.g. length, informative sites). However, these different methods generally had little impact on the recovery and support values for well-established clades, even across very different numbers of loci. Nevertheless, our results suggest several “best practices” for alignment and trimming. Intriguingly, the choice of phylogenetic methods impacted the results most strongly, with concatenated analyses recovering significantly more well-established clades (with stronger support) than the species-tree analyses. 
  5. Abstract Numerous genomic methods developed over the past two decades have enabled the discovery and extraction of orthologous loci to help resolve phylogenetic relationships across various taxa and scales. Genome skimming (or low‐coverage genome sequencing) is a promising method to not only extract high‐copy loci but also 100s to 1000s of phylogenetically informative nuclear loci (e.g., ultraconserved elements [UCEs] and exons) from contemporary and museum samples. The subphylum Anthozoa, including important ecosystem engineers (e.g., stony corals, black corals, anemones, and octocorals) in the marine environment, is in critical need of phylogenetic resolution and thus might benefit from a genome‐skimming approach. We conducted genome skimming on 242 anthozoan corals collected from 1886 to 2022. Using existing target‐capture baitsets, we bioinformatically obtained UCEs and exons from the genome‐skimming data and incorporated them with data from previously published target‐capture studies. The mean number of UCE and exon loci extracted from the genome skimming data was 1837 ± 662 SD for octocorals and 1379 ± 476 SD loci for hexacorals. Phylogenetic relationships were well resolved within each class. A mean of 1422 ± 720 loci was obtained from the historical specimens, with 1253 loci recovered from the oldest specimen collected in 1886. We also obtained partial to whole mitogenomes and nuclear rRNA genes from >95% of samples. Bioinformatically pulling UCEs, exons, mitochondrial genomes, and nuclear rRNA genes from genome skimming data is a viable and low‐cost option for phylogenetic studies. This approach can be used to review and support taxonomic revisions and reconstruct evolutionary histories, including historical museum and type specimens. 