NSF PAR Search | NSF Public Access Repository


Maximum likelihood reconstruction of ancestral networks by integer linear programming

https://doi.org/10.1093/bioinformatics/btaa931

Rajan, Vaibhav; Zhang, Ziqi; Kingsford, Carl; Zhang, Xiuwei (December 2020, Bioinformatics)
Yann, Ponty (Ed.)
Abstract
Motivation: The study of the evolutionary history of biological networks enables deep functional understanding of various bio-molecular processes. Network growth models, such as the Duplication–Mutation with Complementarity (DMC) model, provide a principled approach to characterizing the evolution of protein–protein interactions (PPIs) based on duplication and divergence. Current methods for model-based ancestral network reconstruction primarily use greedy heuristics and yield sub-optimal solutions.
Results: We present a new Integer Linear Programming (ILP) solution for maximum likelihood reconstruction of ancestral PPI networks using the DMC model. We prove the correctness of our solution, which is designed to find the optimal solution. It can also use efficient heuristics from general-purpose ILP solvers to obtain multiple optimal and near-optimal solutions that may be useful in many applications. Experiments on synthetic data show that our ILP obtains solutions with higher likelihood than those from previous methods, and is robust to noise and model mismatch. We evaluate our algorithm on two real PPI networks, with proteins from the families of bZIP transcription factors and the Commander complex. On both networks, solutions from our ILP have higher likelihood and are in better agreement with independent biological evidence from other studies.
Availability and implementation: A Python implementation is available at https://bitbucket.org/cdal/network-reconstruction.
Supplementary information: Supplementary data are available at Bioinformatics online.
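For readers unfamiliar with the DMC growth model named in this abstract, the following is a minimal forward-simulation sketch in Python. It follows the commonly described form of the model (duplicate a random node, randomly prune one copy of each shared edge, then optionally link the two copies). The function names, the parameter names `q_mod` and `q_con`, and the two-node seed graph are assumptions made for illustration; this is not the authors' code, and it does not perform the ILP-based reconstruction described above.

```python
import random


def dmc_step(adj, q_mod, q_con, rng):
    """Grow the network by one node; adj maps node -> set of neighbours."""
    u = rng.choice(sorted(adj))       # anchor node to duplicate
    v = max(adj) + 1                  # label for the new duplicate
    neighbours = set(adj[u])
    # Duplication: v inherits every interaction of u.
    adj[v] = set(neighbours)
    for w in neighbours:
        adj[w].add(v)
    # Mutation with complementarity: for each shared neighbour, with
    # probability q_mod drop the edge from exactly one of u or v.
    for w in neighbours:
        if rng.random() < q_mod:
            loser = u if rng.random() < 0.5 else v
            adj[loser].discard(w)
            adj[w].discard(loser)
    # Connect the two copies with probability q_con.
    if rng.random() < q_con:
        adj[u].add(v)
        adj[v].add(u)
    return u, v


def simulate_dmc(n_nodes, q_mod, q_con, seed=0):
    """Simulate DMC growth from an assumed seed graph of two linked nodes."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    while len(adj) < n_nodes:
        dmc_step(adj, q_mod, q_con, rng)
    return adj
```

Ancestral reconstruction runs this process in reverse: given the final network, it infers which pairs of nodes were duplicates at each step. The sketch only illustrates the forward process whose likelihood is being maximized.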
CHESPA/CHESCA-SPARKY: automated NMR data analysis plugins for SPARKY to map protein allostery

https://doi.org/10.1093/bioinformatics/btaa781

Shao, Hongzhao; Boulton, Stephen; Olivieri, Cristina; Mohamed, Hebatallah; Akimoto, Madoka; Subrahmanian, Manu Veliparambil; Veglia, Gianluigi; Markley, John L; Melacini, Giuseppe; Lee, Woonghee (September 2020, Bioinformatics)
Yann, Ponty (Ed.)
Abstract
Motivation: Correlated Nuclear Magnetic Resonance (NMR) chemical shift changes identified through the CHEmical Shift Projection Analysis (CHESPA) and CHEmical Shift Covariance Analysis (CHESCA) reveal pathways of allosteric transitions in biological macromolecules. To address the need for an automated platform that implements CHESPA and CHESCA and integrates them with other NMR analysis software packages, we introduce here integrated plugins for NMRFAM-SPARKY that implement the seamless detection and visualization of allosteric networks.
Availability and implementation: CHESCA-SPARKY and CHESPA-SPARKY are available in the latest version of NMRFAM-SPARKY from the National Magnetic Resonance Facility at Madison (http://pine.nmrfam.wisc.edu/download_packages.html), the NMRbox Project (https://nmrbox.org) and to subscribers to the SBGrid (https://sbgrid.org). The assigned spectra involved in this study and tutorial videos using this dataset are available at https://sites.google.com/view/chescachespa-sparky.
Supplementary information: Supplementary data are available at Bioinformatics Online.
Overlap detection on long, error-prone sequencing reads via smooth q-gram

https://doi.org/10.1093/bioinformatics/btaa252

Song, Yan; Tang, Haixu; Zhang, Haoyu; Zhang, Qin (April 2020, Bioinformatics)
Yann, Ponty (Ed.)
Abstract
Motivation: Third generation sequencing techniques, such as the Single Molecule Real Time technique from PacBio and the MinION technique from Oxford Nanopore, can generate long, error-prone sequencing reads, which pose new challenges for fragment assembly algorithms. In this paper, we study the overlap detection problem for error-prone reads, which is the first and most critical step in de novo fragment assembly. We observe that none of the state-of-the-art methods achieves ideal accuracy for overlap detection (both precision and recall are relatively low) due to the high sequencing error rates, especially when the overlap lengths between reads are relatively short (e.g. <2000 bases). This limitation appears inherent to these algorithms due to their use of q-gram-based seeds under the seed-extension framework.
Results: We propose smooth q-gram, a variant of q-gram that captures q-gram pairs within small edit distances, and design a novel algorithm for detecting overlapping reads using smooth q-gram-based seeds. We implemented the algorithm and tested it on both PacBio and Nanopore sequencing datasets. Our benchmarking results demonstrated that our algorithm outperforms the existing q-gram-based overlap detection algorithms, especially for reads with relatively short overlapping lengths.
Availability and implementation: The source code of our implementation in C++ is available at https://github.com/FIGOGO/smoothq.
Supplementary information: Supplementary data are available at Bioinformatics online.
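To make the limitation described in this abstract concrete, here is a minimal Python sketch of ordinary exact q-gram seeding, the baseline the abstract contrasts against. It is not the smooth q-gram method and is not taken from the linked C++ repository. The function names and the diagonal-voting step are assumptions chosen for illustration.

```python
from collections import defaultdict


def qgrams(seq, q):
    """Return every length-q substring of seq with its start position."""
    return [(seq[i:i + q], i) for i in range(len(seq) - q + 1)]


def shared_seeds(read_a, read_b, q):
    """List (i, j) pairs where read_a[i:i+q] exactly equals read_b[j:j+q]."""
    index = defaultdict(list)
    for gram, i in qgrams(read_a, q):
        index[gram].append(i)
    return [(i, j)
            for gram, j in qgrams(read_b, q)
            for i in index.get(gram, [])]


def best_offset(read_a, read_b, q):
    """Vote on the diagonal i - j; return (offset, votes), or None if no seed."""
    votes = defaultdict(int)
    for i, j in shared_seeds(read_a, read_b, q):
        votes[i - j] += 1
    if not votes:
        return None
    offset = max(votes, key=votes.get)
    return offset, votes[offset]
```

Because seeds must match exactly, a single sequencing error inside a q-gram removes that seed. With high error rates and short overlaps, few or no seeds survive, which is the gap that seeds tolerant of small edit distances are meant to close.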