NSF PAR Search | NSF Public Access Repository


More is needed—Thousands of loci are required to elucidate the relationships of the ‘flowers of the sea’ (Sabellida, Annelida)

https://doi.org/10.1016/j.ympev.2020.106892

Tilic, Ekin; Sayyari, Erfan; Stiller, Josefin; Mirarab, Siavash; Rouse, Greg W. (October 2020, Molecular Phylogenetics and Evolution)
Full Text Available
TADA: phylogenetic augmentation of microbiome samples enhances phenotype classification

https://doi.org/10.1093/bioinformatics/btz394

Sayyari, Erfan; Kawas, Ban; Mirarab, Siavash (July 2019, Bioinformatics)

Abstract. Motivation: Learning associations of traits with the microbial composition of a set of samples is a fundamental goal in microbiome studies. Recently, machine learning methods have been explored for this goal, with some promise. However, in comparison to other fields, microbiome data are high-dimensional and not abundant; leading to a high-dimensional low-sample-size under-determined system. Moreover, microbiome data are often unbalanced and biased. Given such training data, machine learning methods often fail to perform a classification task with sufficient accuracy. Lack of signal is especially problematic when classes are represented in an unbalanced way in the training data; with some classes under-represented. The presence of inter-correlations among subsets of observations further compounds these issues. As a result, machine learning methods have had only limited success in predicting many traits from microbiome. Data augmentation consists of building synthetic samples and adding them to the training data and is a technique that has proved helpful for many machine learning tasks. Results: In this paper, we propose a new data augmentation technique for classifying phenotypes based on the microbiome. Our algorithm, called TADA, uses available data and a statistical generative model to create new samples augmenting existing ones, addressing issues of low-sample-size. In generating new samples, TADA takes into account phylogenetic relationships between microbial species. On two real datasets, we show that adding these synthetic samples to the training set improves the accuracy of downstream classification, especially when the training data have an unbalanced representation of classes. Availability and implementation: TADA is available at https://github.com/tada-alg/TADA. Supplementary information: Supplementary data are available at Bioinformatics online.
Anchoring quartet-based phylogenetic distances and applications to species tree reconstruction

https://doi.org/10.1186/s12864-016-3098-z

Sayyari, Erfan; Mirarab, Siavash (November 2016, BMC Genomics)

Full Text Available
Fast Coalescent-Based Computation of Local Branch Support from Quartet Frequencies

https://doi.org/10.1093/molbev/msw079

Sayyari, Erfan; Mirarab, Siavash (June 2016, Molecular Biology and Evolution)

Full Text Available
Phylogenomics of 10,575 genomes reveals evolutionary proximity between domains Bacteria and Archaea

https://doi.org/10.1038/s41467-019-13443-4

Zhu, Qiyun; Mai, Uyen; Pfeiffer, Wayne; Janssen, Stefan; Asnicar, Francesco; Sanders, Jon G; Belda-Ferre, Pedro; Al-Ghalith, Gabriel A; Kopylova, Evguenia; McDonald, Daniel; et al (December 2019, Nature Communications)

Abstract. Rapid growth of genome data provides opportunities for updating microbial evolutionary relationships, but this is challenged by the discordant evolution of individual genes. Here we build a reference phylogeny of 10,575 evenly-sampled bacterial and archaeal genomes, based on a comprehensive set of 381 markers, using multiple strategies. Our trees indicate remarkably closer evolutionary proximity between Archaea and Bacteria than previous estimates that were limited to fewer “core” genes, such as the ribosomal proteins. The robustness of the results was tested with respect to several variables, including taxon and site sampling, amino acid substitution heterogeneity and saturation, non-vertical evolution, and the impact of exclusion of candidate phyla radiation (CPR) taxa. Our results provide an updated view of domain-level relationships.
One thousand plant transcriptomes and the phylogenomics of green plants

https://doi.org/10.1038/s41586-019-1693-2

Leebens-Mack, James H; Barker, Michael S; Carpenter, Eric J; Deyholos, Michael K; Gitzendanner, Matthew A; Graham, Sean W; Grosse, Ivo; Li, Zheng; Melkonian, Michael; Mirarab, Siavash; et al (October 2019, Nature)

Green plants (Viridiplantae) include around 450,000–500,000 species of great diversity and have important roles in terrestrial and aquatic ecosystems. Here, as part of the One Thousand Plant Transcriptomes Initiative, we sequenced the vegetative transcriptomes of 1,124 species that span the diversity of plants in a broad sense (Archaeplastida), including green plants (Viridiplantae), glaucophytes (Glaucophyta) and red algae (Rhodophyta). Our analysis provides a robust phylogenomic framework for examining the evolution of green plants. Most inferred species relationships are well supported across multiple species tree and supermatrix analyses, but discordance among plastid and nuclear gene trees at a few important nodes highlights the complexity of plant genome evolution, including polyploidy, periods of rapid speciation, and extinction. Incomplete sorting of ancestral variation, polyploidization and massive expansions of gene families punctuate the evolutionary history of green plants. Notably, we find that large expansions of gene families preceded the origins of green plants, land plants and vascular plants, whereas whole-genome duplications are inferred to have occurred repeatedly throughout the evolution of flowering plants and ferns. The increasing availability of high-quality plant genome sequences and advances in functional genomics are enabling research on genome evolution across the green tree of life.
Full Text Available
