Large-Scale Multiple Sequence Alignment and the Maximum Weight Trace Alignment Merging Problem

Zaharias, Paul; Smirnov, Vladimir; Warnow, Tandy

doi:10.1109/TCBB.2022.3191848

MAGUS is a recent multiple sequence alignment method that provides excellent accuracy on large, challenging datasets. MAGUS uses divide-and-conquer: it divides the sequences into disjoint sets, computes alignments on the disjoint sets, and then merges the alignments using a technique it calls the Graph Clustering Method (GCM). To understand why MAGUS is so accurate, we show that GCM is a good heuristic for the NP-hard MWT-AM problem (Maximum Weight Trace, adapted to the Alignment Merging problem). Our study, using both biological and simulated data, establishes that MWT-AM scores correlate very well with alignment accuracy and presents improvements to GCM that are even better heuristics for MWT-AM. This study suggests a new direction for large-scale MSA estimation based on improved divide-and-conquer strategies, with the merging step based on optimizing MWT-AM. MAGUS and its enhanced versions are available at https://github.com/vlasmirnov/MAGUS.

Award ID(s): 2006069

PAR ID: 10415277

Author(s) / Creator(s): Zaharias, Paul; Smirnov, Vladimir; Warnow, Tandy

Date Published: 2022-07-18

Journal Name: IEEE/ACM Transactions on Computational Biology and Bioinformatics

ISSN: 1545-5963

Page Range / eLocation ID: 1 to 13

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Journal Article:
https://doi.org/10.1109/TCBB.2022.3191848
