Title: MEGA11: Molecular Evolutionary Genetics Analysis Version 11
Abstract The Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution. Here, we describe new additions that make MEGA a more comprehensive tool for building timetrees of species, pathogens, and gene families using rapid relaxed-clock methods. Methods for estimating divergence times and confidence intervals are implemented to use probability densities for calibration constraints for node-dating and sequence sampling dates for tip-dating analyses. They are supported by new options for tagging sequences with spatiotemporal sampling information, an expanded interactive Node Calibrations Editor, and an extended Tree Explorer to display timetrees. Also added is a Bayesian method for estimating neutral evolutionary probabilities of alleles in a species using multispecies sequence alignments and a machine learning method to test for the autocorrelation of evolutionary rates in phylogenies. The computer memory requirements for the maximum likelihood analysis are reduced significantly through reprogramming, and the graphical user interface has been made more responsive and interactive for very big data sets. These enhancements will improve the user experience, quality of results, and the pace of biological discovery. Natively compiled graphical user interface and command-line versions of MEGA11 are available for Microsoft Windows, Linux, and macOS from www.megasoftware.net.
Award ID(s):
2034228 1661218
NSF-PAR ID:
10293686
Author(s) / Creator(s):
; ;
Editor(s):
Battistuzzi, Fabia Ursula
Date Published:
Journal Name:
Molecular Biology and Evolution
Volume:
38
Issue:
7
ISSN:
1537-1719
Page Range / eLocation ID:
3022 to 3027
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The Molecular Evolutionary Genetics Analysis (MEGA) software enables comparative analysis of molecular sequences in phylogenetics and evolutionary medicine. Here, we introduce the macOS version of the MEGA software. This new version eliminates the need for virtualization and emulation programs previously required to use MEGA on Apple computers. MEGA for macOS utilizes memory and computing resources efficiently for conducting evolutionary analyses on macOS. It has a native Cocoa graphical user interface that is programmed to provide a consistent user experience across macOS, Windows, and Linux. MEGA for macOS is available from www.megasoftware.net free of charge. 
  2. Abstract Motivation

    Timetrees depict evolutionary relationships between species and the geological times of their divergence. Hundreds of research articles containing timetrees are published in scientific journals every year. The TimeTree (TT) project has been manually locating, curating and synthesizing timetrees from these articles for almost two decades into a TimeTree of Life, delivered through a unique, user-friendly web interface (timetree.org). The manual process of finding articles containing timetrees is becoming increasingly expensive and time-consuming. So, we have explored the effectiveness of text-mining approaches and developed optimizations to find research articles containing timetrees automatically.

    Results

    We have developed an optimized machine learning system to determine if a research article contains an evolutionary timetree appropriate for inclusion in the TT resource. We found that BERT classification fine-tuned on whole-text articles achieved an F1 score of 0.67, which we increased to 0.88 by text-mining article excerpts surrounding mentions of figures. The new method is implemented in the TimeTreeFinder (TTF) tool, which automatically processes millions of articles to discover timetree-containing articles. We estimate that the TTF tool would produce twice as many timetree-containing articles as those discovered manually, whose inclusion in the TT database would potentially double the knowledge accessible to a wider community. Manual inspection showed that the precision on out-of-distribution, recently published articles is 87%. This automation will speed up the collection and curation of timetrees with much lower human and time costs.

    Availability and implementation

    https://github.com/marija-stanojevic/time-tree-classification.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    In recent years, multiple technological and methodological advances have increased our ability to estimate phylogenies, leading to more accurate dating of the primate tree of life. Here we provide an overview of the limitations and potentials of some of these advancements and discuss how dated phylogenies provide the crucial temporal scale required to understand primate evolution. First, we review new methods, such as the total‐evidence dating approach, that promise a better integration between the fossil record and molecular data. We then explore how the ever‐increasing availability of genomic‐level data for more primate species can impact our ability to accurately estimate timetrees. Finally, we discuss more recent applications of mutation rates to date divergence times. We highlight example studies that have applied these approaches to estimate divergence dates within primates. Our goal is to provide a critical overview of these new developments and explore the promises and challenges of their application in evolutionary anthropology.

     
  4. Abstract

    Chronograms—phylogenies with branch lengths proportional to time—represent key data on timing of evolutionary events, allowing us to study natural processes in many areas of biological research. Chronograms also provide valuable information that can be used for education, science communication, and conservation policy decisions. Yet, achieving a high-quality reconstruction of a chronogram is a difficult and resource-consuming task. Here we present DateLife, a phylogenetic software implemented as an R package and an R Shiny web application available at www.datelife.org, that provides services for efficient and easy discovery, summary, reuse, and reanalysis of node age data mined from a curated database of expert, peer-reviewed, and openly available chronograms. The main DateLife workflow starts with one or more scientific taxon names provided by a user. Names are processed and standardized to a unified taxonomy, allowing DateLife to run a name match across its local chronogram database that is curated from Open Tree of Life’s phylogenetic repository, and extract all chronograms that contain at least two queried taxon names, along with their metadata. Finally, node ages from matching chronograms are mapped using the congruification algorithm to corresponding nodes on a tree topology, either extracted from Open Tree of Life’s synthetic phylogeny or one provided by the user. Congruified node ages are used as secondary calibrations to date the chosen topology, with or without initial branch lengths, using different phylogenetic dating methods such as BLADJ, treePL, PATHd8, and MrBayes. 
    We performed a cross-validation test to compare node ages resulting from a DateLife analysis (i.e., phylogenetic dating using secondary calibrations) to those from the original chronograms (i.e., obtained with primary calibrations), and found that DateLife’s node age estimates are consistent with the age estimates from the original chronograms, with the largest variation in ages occurring around topologically deeper nodes. Because the results from any software for scientific analysis can only be as good as the data used as input, we highlight the importance of considering the results of a DateLife analysis in the context of the input chronograms. DateLife can help to increase awareness of the existing disparities among alternative hypotheses of dates for the same diversification events, and to support exploration of the effect of alternative chronogram hypotheses on downstream analyses, providing a framework for a more informed interpretation of evolutionary results.

     
  5. Resolving the phylogenetic relationships among Paleocene mammals has been a longstanding goal in paleontology. Constructing an accurate and comprehensive phylogeny for Paleocene mammals is a worthwhile objective in itself, but it also provides a framework on which we can better understand the origin of placental mammals and the evolutionary processes underlying the diversification of mammals before, during, and after the end-Cretaceous mass extinction. More recently, a robust Paleocene mammal phylogeny has become a much-coveted tool for reconciling discrepancies between morphological and molecular evidence for the phylogeny and diversification of Placentalia. Here, we present a novel phylogenetic dataset to test hypotheses regarding Paleocene mammal phylogeny and the origin and diversification of Placentalia. To date, our matrix combines phenomic data for 36 extant mammal species and 107 fossil species scored for 2540 morphological characters alongside 26 genes sequenced for 47 species. We utilized a reductive morphological scoring strategy in order to minimize assumptions and test hypotheses on homology. Multiple sequence alignments were performed in MEGA-X for each gene. We then analyzed the data using Bayesian methods and explored the effects of different approaches. Relaxed clock analyses using a molecular constraint and an FBD prior are congruent with the diversification of many extant orders prior to the K-Pg boundary. Relaxed clock total-evidence analyses (morphology and molecules) using an FBD prior resulted in older ages of diversification than those estimated by our relaxed clock molecular constraint model and previous molecular studies. Within Placentalia, our phylogenies provide support for the divergence of Atlantogenata (Afrotheria and Xenarthra) from Boreoeutheria (Euarchontoglires and Laurasiatheria). 
Among the Paleocene taxa, ‘condylarths’ are distributed along the base of Laurasiatheria with members of ‘Arctocyonidae’ recovered as sister taxa to Artiodactyla; enigmatic groups such as Pantodonta and Taeniodonta are recovered as crown placentals whereas Leptictida is not. Our Paleocene mammal phylogeny is a critical step toward better understanding placental mammal evolution. Ultimately, this work will facilitate the investigation of fundamental questions previously encumbered by the lack of a well-resolved phylogeny. 