
Title: Unsupervised logic-based mechanism inference for network-driven biological processes
Modern analytical techniques enable researchers to collect data about cellular states, before and after perturbations. These states can be characterized using analytical techniques, but the inference of regulatory interactions that explain and predict changes in these states remains a challenge. Here we present a generalizable, unsupervised approach to generate parameter-free, logic-based models of cellular processes, described by multiple discrete states. Our algorithm employs a Hamming-distance based approach to formulate, test, and identify optimized logic rules that link two states. Our approach comprises two steps. First, a model with no prior knowledge except for the mapping between initial and attractor states is built. We then employ biological constraints to improve model fidelity. Our algorithm automatically recovers the relevant dynamics for the explored models and recapitulates key aspects of the biochemical species concentration dynamics in the original model. We present the advantages and limitations of our work and discuss how our approach could be used to infer logic-based mechanisms of signaling, gene-regulatory, or other input-output processes describable by the Boolean formalism.
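As a rough illustration of the Hamming-distance idea mentioned in the abstract, the sketch below scores candidate Boolean rule sets by how far the state they settle into lies from a target attractor state. It is not the authors' algorithm: the two-node network, the synchronous update scheme, and the names `hamming`, `run_to_attractor`, `attractor_distance`, `candidate_a`, and `candidate_b` are all assumptions made for this example.

```python
def hamming(a, b):
    """Count the positions at which two equal-length Boolean states differ."""
    if len(a) != len(b):
        raise ValueError("states must have the same length")
    return sum(x != y for x, y in zip(a, b))


def run_to_attractor(rules, state, max_steps=100):
    """Apply all rules synchronously until a state repeats (or max_steps is hit).

    For a fixed point this returns the fixed point; for a cycle it returns
    one state on the cycle.
    """
    seen = set()
    for _ in range(max_steps):
        if state in seen:
            break
        seen.add(state)
        state = tuple(int(rule(state)) for rule in rules)
    return state


def attractor_distance(rules, initial, target):
    """Hamming distance between the reached state and the target attractor."""
    return hamming(run_to_attractor(rules, initial), target)


# Hypothetical two-node example: one observed initial state and its attractor.
initial = (1, 0)
target = (0, 1)

# Candidate A (assumed): every node keeps its current value.
candidate_a = [lambda s: s[0], lambda s: s[1]]
# Candidate B (assumed): x0' = x0 AND x1, x1' = x0 OR x1.
candidate_b = [lambda s: s[0] and s[1], lambda s: s[0] or s[1]]

candidates = {"identity": candidate_a, "and_or": candidate_b}
scores = {name: attractor_distance(rules, initial, target)
          for name, rules in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

In this toy setting the rule set with the smallest distance is kept. A score of zero means the candidate rules map the initial state onto the target attractor exactly.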
Editors:
Saucerman, Jeffrey J.
Award ID(s):
1942255
NSF-PAR ID:
10302781
Journal Name:
PLOS Computational Biology
Volume:
17
Issue:
6
ISSN:
1553-7358
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Advances in experimental and imaging techniques have allowed for unprecedented insights into the dynamical processes within individual cells. However, many facets of intracellular dynamics remain hidden, or can be measured only indirectly. This makes it challenging to reconstruct the regulatory networks that govern the biochemical processes underlying various cell functions. Current estimation techniques for inferring reaction rates frequently rely on marginalization over unobserved processes and states. Even in simple systems this approach can be computationally challenging, and can lead to large uncertainties and lack of robustness in parameter estimates. Therefore we will require alternative approaches to efficiently uncover the interactions in complex biochemical networks.

    Results

    We propose a Bayesian inference framework based on replacing uninteresting or unobserved reactions with time delays. Although the resulting models are non-Markovian, recent results on stochastic systems with random delays allow us to rigorously obtain expressions for the likelihoods of model parameters. In turn, this allows us to extend MCMC methods to efficiently estimate reaction rates, and delay distribution parameters, from single-cell assays. We illustrate the advantages, and potential pitfalls, of the approach using a birth–death model with both synthetic and experimental data, and show that we can robustly infer model parameters using a relatively small number of measurements. We demonstrate how to do so even when only the relative molecule count within the cell is measured, as in the case of fluorescence microscopy.

    Availability and implementation

    Accompanying code in R is available at https://github.com/cbskust/DDE_BD.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  2. Epithelial-to-mesenchymal transition (EMT) plays an important role in many biological processes during development and cancer. The advent of single-cell transcriptome sequencing techniques allows the dissection of dynamical details underlying EMT with unprecedented resolution. Despite several single-cell data analyses of EMT, how cells communicate and regulate dynamics along the EMT trajectory remains elusive. Using single-cell transcriptomic datasets, here we infer the cell–cell communications and the multilayer gene–gene regulation networks to analyze and visualize the complex cellular crosstalk and the underlying gene regulatory dynamics along EMT. Combined with trajectory analysis, our approach reveals the existence of multiple intermediate cell states (ICSs) with hybrid epithelial and mesenchymal features. Analyses of the time-series datasets from cancer cell lines with different inducing factors show that the induced EMTs are context-specific: the EMT induced by transforming growth factor B1 (TGFB1) is synchronous, whereas the EMTs induced by epidermal growth factor and tumor necrosis factor are asynchronous, and the responses of the TGF-β pathway in terms of gene expression regulation are heterogeneous under different treatments or among various cell states. Meanwhile, network topology analysis suggests that the ICSs during EMT serve as the signaling in cellular communication under different conditions. Interestingly, our analysis of a mouse skin squamous cell carcinoma dataset also suggests that, regardless of the significant discrepancy in concrete genes between in vitro and in vivo EMT systems, the ICSs play a dominant role in the TGF-β signaling crosstalk. Overall, our approach reveals the multiscale mechanisms coupling cell–cell communications and gene–gene regulations responsible for complex cell-state transitions.
  3. Abstract Motivation

    Reversible protein phosphorylation is an essential post-translational modification regulating protein functions and signaling pathways in many cellular processes. Aberrant activation of signaling pathways often contributes to cancer development and progression. The mass spectrometry-based phosphoproteomics technique is a powerful tool to investigate the site-level phosphorylation of the proteome in a global fashion, paving the way for understanding the regulatory mechanisms underlying cancers. However, this approach is time-consuming and requires expensive instruments, specialized expertise and a large amount of starting material. An alternative in silico approach is predicting the phosphoproteomic profiles of cancer patients from the available proteomic, transcriptomic and genomic data.

    Results

    Here, we present a winning algorithm in the 2017 NCI-CPTAC DREAM Proteogenomics Challenge for predicting phosphorylation levels of the proteome across cancer patients. We integrate four components into our algorithm, including (i) baseline correlations between protein and phosphoprotein abundances, (ii) universal protein–protein interactions, (iii) shareable regulatory information across cancer tissues and (iv) associations among multi-phosphorylation sites of the same protein. When tested on a large held-out testing dataset of 108 breast and 62 ovarian cancer samples, our method ranked first in both cancer tissues, demonstrating its robustness and generalization ability.

    Availability and implementation

    Our code and reproducible results are freely available on GitHub: https://github.com/GuanLab/phosphoproteome_prediction.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  4. Abstract Background Second messengers, c-di-GMP and (p)ppGpp, are vital regulatory molecules in bacteria, influencing cellular processes such as biofilm formation, transcription, virulence, quorum sensing, and proliferation. While c-di-GMP and (p)ppGpp are both synthesized from GTP molecules, they play antagonistic roles in regulating the cell cycle. In C. crescentus, c-di-GMP works as a major regulator of pole morphogenesis and cell development. It inhibits cell motility and promotes S-phase entry by inhibiting the activity of the master regulator, CtrA. Intracellular (p)ppGpp accumulates under starvation, which helps bacteria to survive under stressful conditions through regulating nucleotide levels and halting proliferation. (p)ppGpp responds to nitrogen levels through RelA-SpoT homolog enzymes, detecting glutamine concentration using a nitrogen phosphotransferase system (PTS Ntr). This work relates the guanine nucleotide-based second messenger regulatory network with the bacterial PTS Ntr system and investigates how bacteria respond to nutrient availability. Results We propose a mathematical model for the dynamics of c-di-GMP and (p)ppGpp in C. crescentus and analyze how the guanine nucleotide-based second messenger system responds to certain environmental changes communicated through the PTS Ntr system. Our mathematical model consists of seven ODEs describing the dynamics of nucleotides and PTS Ntr enzymes. Our simulations are consistent with experimental observations and suggest, among other predictions, that SpoT can effectively decrease c-di-GMP levels in response to nitrogen starvation just as well as it increases (p)ppGpp levels. Thus, the activity of SpoT (or its homologues in other bacterial species) can likely influence the cell cycle by influencing both c-di-GMP and (p)ppGpp. Conclusions In this work, we integrate current knowledge and experimental observations from the literature to formulate a novel mathematical model. We analyze the model and demonstrate how the PTS Ntr system influences (p)ppGpp, c-di-GMP, GMP and GTP concentrations. While this model does not consider all aspects of PTS Ntr signaling, such as cross-talk with the carbon PTS system, here we present our first effort to develop a model of nutrient signaling in C. crescentus.
  5. Abstract Rapid growth of single-cell transcriptomic data provides unprecedented opportunities for close scrutiny of dynamical cellular processes. Through investigating epithelial-to-mesenchymal transition (EMT), we develop an integrative tool that combines unsupervised learning of single-cell transcriptomic data and multiscale mathematical modeling to analyze transitions during cell fate decision. Our approach allows identification of individual cells making transitions between all cell states, and inference of genes that drive transitions. Multiscale extractions of single-cell scale outputs naturally reveal intermediate cell states (ICS) and ICS-regulated transition trajectories, producing emergent population-scale models to be explored for design principles. Testing on the newly designed single-cell gene regulatory network model and applying to twelve published single-cell EMT datasets in cancer and embryogenesis, we uncover the roles of ICS in adaptation, noise attenuation, and transition efficiency in EMT, and reveal their trade-off relations. Overall, our unsupervised learning method is applicable to general single-cell transcriptomic datasets, and our integrative approach at single-cell resolution may be adopted for other cell fate transition systems beyond EMT.