

Title: pystablemotifs: Python library for attractor identification and control in Boolean networks
Abstract Summary

pystablemotifs is a Python 3 library for analyzing Boolean networks. Its non-heuristic and exhaustive attractor identification algorithm was previously presented in Rozum et al. (2021). Here, we illustrate its performance improvements over similar methods and discuss how it uses outputs of the attractor identification process to drive a system to one of its attractors from any initial state. We implement six attractor control algorithms, five of which are new in this work. By design, these algorithms can return different control strategies, allowing for synergistic use. We also give a brief overview of the other tools implemented in pystablemotifs.
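To make "exhaustive attractor identification" concrete, here is a minimal brute-force sketch in plain Python. It is not the pystablemotifs algorithm (that method is described in Rozum et al., 2021) and it does not use the library's API. The three-node network, the rule table `RULES`, and the functions `step` and `find_attractors` are all hypothetical names chosen for illustration, and the sketch assumes synchronous updating, where every node is updated at once.

```python
from itertools import product

# Hypothetical three-node Boolean network (an assumption, not from the paper).
RULES = {
    "A": lambda s: s["B"],
    "B": lambda s: s["A"],
    "C": lambda s: s["A"] and not s["B"],
}
NODES = sorted(RULES)


def step(state):
    """Apply every update rule at once (synchronous update)."""
    named = dict(zip(NODES, state))
    return tuple(bool(RULES[node](named)) for node in NODES)


def find_attractors():
    """Follow each of the 2^n states until a state repeats; collect the cycles."""
    attractors = set()
    for start in product([False, True], repeat=len(NODES)):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state)
        # The trajectory re-entered `seen` at `state`; everything from there on is the cycle.
        cycle = seen[seen.index(state):]
        # Rotate the cycle so it starts at its smallest state, so each cycle is stored once.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


if __name__ == "__main__":
    for attractor in sorted(find_attractors(), key=len):
        print(len(attractor), attractor)
```

For this toy network the search finds two fixed points and one two-state cycle. Enumerating all 2^n states is only feasible for very small networks, which is why non-brute-force methods matter for networks of realistic size.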

Availability and implementation

The source code is on GitHub at https://github.com/jcrozum/pystablemotifs/.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
Award ID(s):
1715826
NSF-PAR ID:
10362648
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
38
Issue:
5
ISSN:
1367-4803
Format(s):
Medium: X Size: p. 1465-1466
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Due to their high genomic variability, RNA viruses and retroviruses present a unique opportunity for detailed study of molecular evolution. Lentiviruses, with HIV being a notable example, are one of the best studied viral groups: hundreds of thousands of sequences are available together with experimentally resolved three-dimensional structures for most viral proteins. In this work, we use these data to study specific patterns of evolution of the viral proteins, and their relationship to protein interactions and immunogenicity.

    Results

    We propose a method for identifying two types of surface residue clusters with abnormal conservation: extremely conserved and extremely variable clusters. We identify them on the surface of proteins from HIV and other animal immunodeficiency viruses. Both types of clusters are overrepresented on the interaction interfaces of viral proteins with other proteins, nucleic acids or low molecular-weight ligands, both in the viral particle and between the virus and its host. In the immunodeficiency viruses, the interaction interfaces are not, on average, more conserved than the corresponding proteins, and we show that extremely conserved clusters coincide with protein–protein interaction hotspots, predicted as the residues with the largest energetic contribution to the interaction. Extremely variable clusters have been identified here for the first time. In the HIV-1 envelope protein gp120, they overlap with known antigenic sites. These antigenic sites also contain many residues from extremely conserved clusters, hence representing a unique interacting interface enriched both in extremely conserved and in extremely variable clusters of residues. This observation may have important implications for antiretroviral vaccine development.

    Availability and Implementation

    A Python package is available at https://bioinf.mpi-inf.mpg.de/publications/viral-ppi-pred/

    Contact

    voitenko@mpi-inf.mpg.de or kalinina@mpi-inf.mpg.de

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract Motivation

    Multistate protein design addresses real-world challenges, such as multi-specificity design and backbone flexibility, by considering both positive and negative protein states with an ensemble of substates for each. It also presents an enormous challenge to exact algorithms that guarantee the optimal solutions and enable a direct test of mechanistic hypotheses behind models. However, efficient exact algorithms are lacking for multistate protein design.

    Results

    We have developed an efficient exact algorithm called interconnected cost function networks (iCFN) for multistate protein design. Its generic formulation allows for a wide array of applications such as stability, affinity and specificity designs while addressing concerns such as global flexibility of protein backbones. iCFN treats each substate design as a weighted constraint satisfaction problem (WCSP) modeled through a CFN; and it solves the coupled WCSPs using novel bounds and a depth-first branch-and-bound search over a tree structure of sequences, substates, and conformations. When iCFN is applied to specificity design of a T-cell receptor, a problem of unprecedented size to exact methods, it drastically reduces search space and running time to make the problem tractable. Moreover, iCFN generates experimentally-agreeing receptor designs with improved accuracy compared with state-of-the-art methods, highlights the importance of modeling backbone flexibility in protein design, and reveals molecular mechanisms underlying binding specificity.

    Availability and implementation

    https://shen-lab.github.io/software/iCFN

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract Motivation

    Here, we performed a benchmarking analysis of five tools for microbe sequence detection using transcriptomics data (Kraken2, MetaPhlAn2, PathSeq, DRAC and Pandora). We built a synthetic database mimicking real-world structure with tuned conditions accounting for microbe species prevalence, base calling quality and sequence length. Sensitivity and positive predictive value (PPV) parameters, as well as computational requirements, were used for tool ranking.

    Results

    GATK PathSeq showed the highest sensitivity on average and across all scenarios considered. However, the main drawback of this tool was its slowness. Kraken2 was the fastest tool and displayed the second-best sensitivity, though with large variance depending on the species to be classified. There was no significant difference in sensitivity among the other three algorithms. The sensitivity of MetaPhlAn2 and Pandora was affected by sequence number, and that of DRAC by sequence quality and length. Results from this study support the use of Kraken2 for routine microbiome profiling based on its competitive sensitivity and runtime performance. Nonetheless, we strongly recommend complementing it with MetaPhlAn2 for thorough taxonomic analyses.

    Availability and implementation

    https://github.com/fjuradorueda/MIME/ and https://github.com/lola4/DRAC/.

    Supplementary information

    Supplementary data are available at Bioinformatics Advances online.

     
  4. Abstract Motivation

    Whole metagenome shotgun sequencing is a powerful approach for assaying the functional potential of microbial communities. We currently lack tools that efficiently and accurately align DNA reads against protein references, the technique necessary for constructing a functional profile. Here, we present PALADIN—a novel modification of the Burrows-Wheeler Aligner that provides accurate alignment, robust reporting capabilities and orders-of-magnitude improved efficiency by directly mapping in protein space.

    Results

    We compared the accuracy and efficiency of PALADIN against existing tools that employ nucleotide or protein alignment algorithms. Using simulated reads, PALADIN consistently outperformed the popular DNA read mappers BWA and NovoAlign in detected proteins, percentage of reads mapped and ontological similarity. We also compared PALADIN against four existing protein alignment tools (BLASTX, RAPSearch2, DIAMOND and Lambda) using empirically obtained reads. PALADIN yielded results seven times faster than the best-performing alternative, DIAMOND, and nearly 8000 times faster than BLASTX. PALADIN's accuracy was comparable to that of all tested solutions.

    Availability and Implementation

    PALADIN was implemented in C, and its source code and documentation are available at https://github.com/twestbrookunh/paladin

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Robinson, Peter (Ed.)
    Abstract Motivation

    Identifying cis-acting genetic variants associated with gene expression levels, an analysis commonly referred to as expression quantitative trait loci (eQTL) mapping, is an important first step toward understanding the genetic determinants of gene expression variation. Successful eQTL mapping requires effective control of confounding factors. A common method for controlling confounding effects in eQTL mapping studies is the probabilistic estimation of expression residual (PEER) analysis. PEER analysis extracts PEER factors to serve as surrogates for confounding factors, which are then included in the subsequent eQTL mapping analysis. However, it is computationally challenging to determine the optimal number of PEER factors to use for eQTL mapping. In particular, the standard approach examines one candidate number at a time and chooses the number that optimizes eQTL discovery. Unfortunately, this standard approach involves multiple repetitive eQTL mapping procedures that are computationally expensive, restricting its use in the large-scale eQTL mapping studies that are being collected today.

    Results

    Here, we present a simple and computationally scalable alternative, Effect size Correlation for COnfounding determination (ECCO), to determine the optimal number of PEER factors used for eQTL mapping studies. Instead of performing repetitive eQTL mapping, ECCO jointly applies differential expression analysis and Mendelian randomization analysis, leading to substantial computational savings. In simulations and real data applications, we show that ECCO identifies a similar number of PEER factors required for eQTL mapping analysis as the standard approach but is two orders of magnitude faster. The computational scalability of ECCO allows for optimized eQTL discovery across 48 GTEx tissues for the first time, yielding an overall 5.89% power gain on the number of eQTL harboring genes (eGenes) discovered as compared to the previous GTEx recommendation that does not attempt to determine tissue-specific optimal number of PEER factors.

    Availability and implementation

    Our method is implemented in the ECCO software, which, along with its GTEx mapping results, is freely available at www.xzlab.org/software.html. All R scripts used in this study are also available at this site.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     