CoreCruncher: Fast and Robust Construction of Core Genomes in Large Prokaryotic Data Sets

Harris, Connor D; Torrance, Ellis L; Raymann, Kasie; Bobay, Louis-Marie

doi:10.1093/molbev/msaa224

Abstract: The core genome represents the set of genes shared by all, or nearly all, strains of a given population or species of prokaryotes. Inferring the core genome is integral to many genomic analyses; however, most methods rely on comparing all pairs of genomes, a step that is becoming increasingly difficult given the massive accumulation of genomic data. Here, we present CoreCruncher, a program that robustly and rapidly constructs core genomes across hundreds or thousands of genomes. CoreCruncher does not compute all pairwise genome comparisons and uses a heuristic based on the distributions of identity scores to classify sequences as orthologs or paralogs/xenologs. Although it is much faster than current methods, our results indicate that our approach is more conservative than other tools and less sensitive to the presence of paralogs and xenologs. CoreCruncher is freely available from: https://github.com/lbobay/CoreCruncher. CoreCruncher is written in Python 3.7 and can also run on Python 2.7 without modification. It requires the Python library NumPy and either USEARCH or BLAST. Certain options require the programs MUSCLE or MAFFT.
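
The definition at the start of the abstract (genes shared by all, or nearly all, strains) can be illustrated with a minimal sketch. This is not CoreCruncher's algorithm: it assumes orthologous gene families have already been assigned to each genome, which is the hard step the program addresses. The function name `core_genome`, the parameter `min_fraction`, and the strain and gene labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def core_genome(gene_families_by_genome, min_fraction=1.0):
    """Return the gene families present in at least min_fraction of genomes.

    gene_families_by_genome maps a genome name to the collection of gene
    family identifiers found in it (assumed to be precomputed orthologs).
    """
    n_genomes = len(gene_families_by_genome)
    if n_genomes == 0:
        return set()
    counts = Counter()
    for families in gene_families_by_genome.values():
        counts.update(set(families))  # count each family once per genome
    return {fam for fam, c in counts.items()
            if float(c) / n_genomes >= min_fraction}


# Hypothetical toy data: three strains with partly overlapping gene content.
genomes = {
    "strain_A": {"dnaA", "gyrB", "recA", "tetM"},
    "strain_B": {"dnaA", "gyrB", "recA"},
    "strain_C": {"dnaA", "gyrB", "blaZ"},
}

strict_core = core_genome(genomes)         # families shared by all strains
relaxed_core = core_genome(genomes, 0.6)   # families in at least 60% of strains
```

With the strict setting only the families found in every strain are kept; lowering `min_fraction` corresponds to the "nearly all" case in the definition.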

Award ID(s): 1930776; 2344788

PAR ID: 10250798

Author(s) / Creator(s): Harris, Connor D; Torrance, Ellis L; Raymann, Kasie; Bobay, Louis-Marie

Editor(s): Ouangraoua, Aida

Date Published: 2020-09-04

Journal Name: Molecular Biology and Evolution

Volume: 38

Issue: 2

ISSN: 1537-1719

Page Range / eLocation ID: 727 to 734

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article: https://doi.org/10.1093/molbev/msaa224
