Title: Create, Analyze, and Visualize Phylogenomic Datasets Using PhyloFisher
Abstract: PhyloFisher is a software package written primarily in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of protein sequences from eukaryotic organisms. Unlike many existing phylogenomic pipelines, PhyloFisher comes with a manually curated database of 240 protein‐coding genes, a subset of a previous phylogenetic dataset sampled from 304 eukaryotic taxa. The software package can also utilize a user‐created database of eukaryotic proteins, which may be more appropriate for shallow evolutionary questions. PhyloFisher is also equipped with a set of utilities to aid in running routine analyses, such as the prediction of alternative genetic codes, removal of genes and/or taxa based on occupancy/completeness of the dataset, testing for amino acid compositional heterogeneity among sequences, removal of heterotachious and/or fast‐evolving sites, removal of fast‐evolving taxa, supermatrix creation from randomly resampled genes, and supermatrix creation from nucleotide sequences. © 2024 Wiley Periodicals LLC.
Basic Protocol 1: Constructing a phylogenomic dataset Basic Protocol 2: Performing phylogenomic analyses Support Protocol 1: Installing PhyloFisher Support Protocol 2: Creating a custom phylogenomic database
Award ID(s):
2100888
PAR ID:
10501379
Author(s) / Creator(s):
Publisher / Repository:
Wiley
Date Published:
Journal Name:
Current Protocols
Volume:
4
Issue:
1
ISSN:
2691-1299
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. PhyloFisher is a software package, written in Python3, that contains a protocol designed for phylogenomic dataset assembly and data exploration. This software package aids in the construction and curation of protein sequence-based phylogenomic datasets, conducts post-assembly analyses, and allows visualization of the results. In addition, PhyloFisher currently includes a manually curated starting dataset of 240 proteins from 304 eukaryotic taxa representing the full breadth of known diversity in the eukaryotic tree of life. Importantly, this dataset also includes identified paralogs of each of the 240 proteins from all investigated taxa which is crucial for the identification of probable orthologs. Although PhyloFisher includes this pan-eukaryotic dataset, the tool is flexible and can work with any dataset consisting of protein sequences derived from eukaryotes. The combination of all of the foregoing features makes PhyloFisher a broadly-useful, user-friendly software tool for sophisticated phylogenomic analyses of eukaryotes.
    PROJECT WEBSITE: http://amoeba.msstate.edu/phylofisher/
    PROJECT GITHUB: http://github.com/TheBrownLab/PhyloFisher
    This dataset contains files for endusers to retrieve for installation of PhyloFisher as well as accompanying data from the PhyloFisher manuscript.
    Tice_etal.PhyloFisher.archives.tar.gz | Installation requirements for PIP installation
    Tice_etal.PhyloFisher1.FINAL_DATASET_RENAMED.tar.gz | File dataset associated with the manuscript including matrices and phylogenetic analyses
    Tice_etal.PhyloFisher_v1.0_input_proteomes_LongNames.tar.gz | Input proteome data from taxa that was used to construct PhyloFisher v1.0
    Tice_etal.PhyloFisherDatabase_v1.0_Jan.28.2021.tar.gz | PhyloFisher v1.0 starting database
    Tice_etal.PhyloFisher_FOR_CUSTOM_DATASET_Jan.28.2021.tar.gz | Necessary files and directory structure to be used in custom database construction.
    Tice_etal.PhyloFisher.DATA.tgz | All data associated with the figures (Fig 3, 4, A-Y) along with all phylogenomic trees and analyses.
  2. Abstract Phylogenies built from multiple genes have become a common component of evolutionary biology studies. Molecular phylogenomic matrices used to build multi-gene phylogenies can be built from either nucleotide or protein matrices. Nucleotide-based analyses are often more appropriate for addressing phylogenetic questions in evolutionarily shallow timescales (i.e., less than 100 million years) while protein-based analyses are often more appropriate for addressing deep phylogenetic questions. PhyloFisher is a phylogenomic software package written in Python3. The manually curated PhyloFisher database contains 240 protein-coding genes from 304 eukaryotic taxa. Here we present nucl_matrix_constructor.py, an expansion of the PhyloFisher starting database, and an update to PhyloFisher that maintains DNA sequences. This combination will allow users the ability to easily build nucleotide phylogenomic matrices while retaining the benefits of protein-based pre-processing used to identify contaminants and paralogy.
  3. Abstract Multiple sequence alignments and phylogenetic trees are rich in biological information and are fundamental to research in biology. PhyKIT is a tool for processing and analyzing the information content of multiple sequence alignments and phylogenetic trees. Here, we describe how to use PhyKIT for diverse analyses, including (i) constructing a phylogenomic supermatrix, (ii) detecting errors in orthology inference, (iii) quantifying biases in phylogenomic data sets, (iv) identifying radiation events or lack of resolution using gene support frequencies, and (v) conducting evolution‐based screens to facilitate gene function prediction. Several PhyKIT functions that streamline multiple sequence alignment and phylogenetic processing—such as renaming FASTA entries or tree tips—are also discussed. These protocols demonstrate how simple command‐line operations in the unified framework of PhyKIT facilitate diverse phylogenomic data analysis and processing, from supermatrix construction and diagnosis to gaining clues about gene function. © 2024 The Author(s). Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Installing PhyKIT and syntax for usage Basic Protocol 2: Constructing a phylogenomic supermatrix Basic Protocol 3: Detecting anomalies in orthology relationships Basic Protocol 4: Quantifying biases in phylogenomic data matrices and related measures Basic Protocol 5: Identifying polytomies Basic Protocol 6: Assessing gene‐gene coevolution as a genetic screen 
  4. Hejnol, Andreas (Ed.)
    Phylogenomic analyses of hundreds of protein-coding genes aimed at resolving phylogenetic relationships is now a common practice. However, no software currently exists that includes tools for dataset construction and subsequent analysis with diverse validation strategies to assess robustness. Furthermore, there are no publicly available high-quality curated databases designed to assess deep (>100 million years) relationships in the tree of eukaryotes. To address these issues, we developed an easy-to-use software package, PhyloFisher ( https://github.com/TheBrownLab/PhyloFisher ), written in Python 3. PhyloFisher includes a manually curated database of 240 protein-coding genes from 304 eukaryotic taxa covering known eukaryotic diversity, a novel tool for ortholog selection, and utilities that will perform diverse analyses required by state-of-the-art phylogenomic investigations. Through phylogenetic reconstructions of the tree of eukaryotes and of the Saccharomycetaceae clade of budding yeasts, we demonstrate the utility of the PhyloFisher workflow and the provided starting database to address phylogenetic questions across a large range of evolutionary time points for diverse groups of organisms. We also demonstrate that undetected paralogy can remain in phylogenomic “single-copy orthogroup” datasets constructed using widely accepted methods such as all vs. all BLAST searches followed by Markov Cluster Algorithm (MCL) clustering and application of automated tree pruning algorithms. Finally, we show how the PhyloFisher workflow helps detect inadvertent paralog inclusions, allowing the user to make more informed decisions regarding orthology assignments, leading to a more accurate final dataset. 
  5. Abstract Fluorescence fluctuation spectroscopy (FFS) encompasses a bevy of techniques that involve analyzing fluorescence intensity fluctuations occurring due to fluorescently labeled molecules diffusing in and out of a microscope's focal region. Statistical analysis of these fluctuations may reveal the oligomerization (i.e., association) state of said molecules. We have recently developed a new FFS‐based method, termed Two‐Dimensional Fluorescence Intensity Fluctuation (2D FIF) spectrometry, which provides quantitative information on the size and stability of protein oligomers as a function of receptor concentration. This article describes protocols for employing FIF spectrometry to quantify the oligomerization of a membrane protein of interest, with specific instructions regarding cell preparation, image acquisition, and analysis of images given in detail. Application of the FIF Spectrometry Suite, a software package designed for applying FIF analysis on fluorescence images, is emphasized in the protocol. Also discussed in detail is the identification, removal, and/or analysis of inhomogeneous regions of the membrane that appear as bright spots. The 2D FIF approach is particularly suited to assess the effects of agonists and antagonists on the oligomeric size of membrane receptors of interest. © 2022 Wiley Periodicals LLC. 
Basic Protocol 1: Preparation of live cells expressing protein constructs Basic Protocol 2: Image acquisition and noise correction Basic Protocol 3: Drawing and segmenting regions of interest Basic Protocol 4: Calculating the molecular brightness and concentration of individual image segments Basic Protocol 5: Combining data subsets using a manual procedure (Optional) Alternate Protocol 1: Combining data subsets using the advanced FIF spectrometry suite (Optional; alternative to Basic Protocol 5) Basic Protocol 6: Performing meta‐analysis of brightness spectrograms Alternate Protocol 2: Performing meta‐analysis of brightness spectrograms (alternative to Basic Protocol 6) Basic Protocol 7: Spot extraction and analysis using a manual procedure or by writing a program (Optional) Alternate Protocol 3: Automated spot extraction and analysis (Optional; alternative to Protocol 7) Support Protocol: Monomeric brightness determination 