Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The functional cancer genome and proteome provide rich sources of information to identify patient-specific variations in signaling pathways and activities within and across tumors; however, current analytic methods lack the ability to exploit the diverse and multi-layered architecture of these complex biological networks. We assessed pan-cancer pathway activities for >7700 patients across 32 tumor types from The Cancer Proteome Atlas by developing a personalized cancer-specific integrated network estimation (PRECISE) model. PRECISE is a general Bayesian framework for integrating existing interaction databases, data-driven
- Journal Name: Scientific Reports
- Publisher: Nature Publishing Group
- Sponsoring Org: National Science Foundation
More Like this
Abstract. Motivation: Somatic mutations result from processes related to DNA replication or environmental/lifestyle exposures. Knowing the activity of mutational processes in a tumor can inform personalized therapies, early detection, and understanding of tumorigenesis. Computational methods have revealed 30 validated signatures of mutational processes active in human cancers, where each signature is a pattern of single base substitutions. However, half of these signatures have no known etiology, and some similar signatures have distinct etiologies, making patterns of mutation signature activity hard to interpret. Existing mutation signature detection methods do not consider tumor-level clinical/demographic (e.g. smoking history) or molecular features (e.g. inactivations to DNA damage repair genes). Results: To begin to address these challenges, we present the Tumor Covariate Signature Model (TCSM), the first method to directly model the effect of observed tumor-level covariates on mutation signatures. To this end, our model uses methods from Bayesian topic modeling to change the prior distribution on signature exposure conditioned on a tumor’s observed covariates. We also introduce methods for imputing covariates in held-out data and for evaluating the statistical significance of signature-covariate associations. On simulated and real data, we find that TCSM outperforms both non-negative matrix factorization and topic modeling-based approaches, particularly in recovering …
METABOLIC: high-throughput profiling of microbial genomes for functional traits, metabolism, biogeochemistry, and community-scale functional networks
Advances in microbiome science are driven in large part by our ability to study and infer microbial ecology from genomes reconstructed from mixed microbial communities using metagenomics and single-cell genomics. Such omics-based techniques allow us to read genomic blueprints of microorganisms, decipher their functional capacities and activities, and reconstruct their roles in biogeochemical processes. Currently available tools for analyses of genomic data can annotate and depict metabolic functions to some extent; however, no standardized approaches are currently available for the comprehensive characterization of metabolic predictions, metabolite exchanges, microbial interactions, and microbial contributions to biogeochemical cycling.
We present METABOLIC (METabolic And BiogeOchemistry anaLyses In miCrobes), scalable software for advancing microbial ecology and biogeochemistry studies using genomes at the resolution of individual organisms and/or microbial communities. The genome-scale workflow includes annotation of microbial genomes, motif validation of biochemically validated conserved protein residues, metabolic pathway analyses, and calculation of contributions to individual biogeochemical transformations and cycles. The community-scale workflow supplements genome-scale analyses with determination of genome abundance in the microbiome, potential microbial metabolic handoffs and metabolite exchange, reconstruction of functional networks, and determination of microbial contributions to biogeochemical cycles. METABOLIC can take input genomes from isolates, metagenome-assembled genomes, or …
METABOLIC enables the consistent and reproducible study of microbial community ecology and biogeochemistry using a foundation of genome-informed microbial metabolism, and will advance the integration of uncultivated organisms into metabolic and biogeochemical models. METABOLIC is written in Perl and R and is freely available under GPLv3 at
Identifying splice site regions is an important step in the genomic DNA sequencing pipelines of biomedical and pharmaceutical research. Within this research purview, efficient and accurate splice site detection is highly desirable, and a variety of computational models have been developed toward this end. Neural network architectures have recently been shown to outperform classical machine learning approaches for the task of splice site prediction. Despite these advances, there is still considerable potential for improvement, especially regarding model prediction accuracy and error rate.
Given these deficits, we propose EnsembleSplice, an ensemble learning architecture that combines four (4) distinct convolutional neural network (CNN) model architectures and outperforms existing splice site detection methods on the experimental evaluation metrics considered, including accuracy and error rate. We trained and tested a variety of ensembles made up of CNNs and DNNs using five-fold cross-validation to identify the model that performed best across the evaluation and diversity metrics. As a result, we developed our diverse and highly effective splice site (SS) detection model, which we evaluated using two (2) genomic Homo sapiens datasets and the Arabidopsis thaliana dataset. The results showed that, for the Homo sapiens data, EnsembleSplice achieved accuracies of 94.16% for one of the …
Conclusions
Our five-fold cross-validation ensured that the prediction accuracy of our models is consistent. For reproducibility, all the datasets used, models generated, and results in our work are publicly available in our GitHub repository here:
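The two ideas this abstract relies on, splitting the data into five folds and combining several models' predictions into one ensemble prediction, can be illustrated with a minimal sketch. It is not the EnsembleSplice implementation: the function names (`k_fold_indices`, `majority_vote`, `accuracy`), the use of simple majority voting, and the toy labels are all assumptions made for illustration, and model training is omitted entirely.

```python
from collections import Counter


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # first `extra` folds get one more
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    `predictions` holds one list of labels per model. Ties go to the
    label that appears first among that sample's votes.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label lists agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical toy labels (1 = splice site, 0 = not); not real data.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
    # Placeholder outputs of four hypothetical models on the same samples.
    model_preds = [
        [1, 0, 1, 0, 0, 0, 1, 0, 1, 1],
        [1, 1, 1, 1, 0, 0, 0, 0, 1, 0],
        [0, 0, 1, 1, 0, 1, 1, 0, 1, 0],
        [1, 0, 0, 1, 0, 0, 1, 1, 1, 0],
    ]
    ensemble = majority_vote(model_preds)
    # Report ensemble accuracy separately on each held-out fold.
    for fold_id, fold in enumerate(k_fold_indices(len(y_true), k=5)):
        fold_true = [y_true[i] for i in fold]
        fold_pred = [ensemble[i] for i in fold]
        print(fold_id, accuracy(fold_true, fold_pred))
```

In a full cross-validation run, each model would be retrained on the four remaining folds before predicting the held-out fold; the sketch skips that step and only shows how folds are formed and how votes are combined.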
Diverse processes in cancer are mediated by enzymes, which most proximally exert their function through their activity. High-fidelity methods to profile enzyme activity are therefore critical to understanding and targeting the pathological roles of enzymes in cancer. Here, we present an integrated set of methods for measuring specific protease activities across scales, and deploy these methods to study treatment response in an autochthonous model of Alk-mutant lung cancer. We leverage multiplexed nanosensors and machine learning to analyze in vivo protease activity dynamics in lung cancer, identifying significant dysregulation that includes enhanced cleavage of a peptide, S1, which rapidly returns to healthy levels with targeted therapy. Through direct on-tissue localization of protease activity, we pinpoint S1 cleavage to the tumor vasculature. To link protease activity to cellular function, we design a high-throughput method to isolate and characterize proteolytically active cells, uncovering a pro-angiogenic phenotype in S1-cleaving cells. These methods provide a framework for functional, multiscale characterization of protease dysregulation in cancer.
There are currently no effective biomarkers for prognosis and optimal treatment selection to improve non-small cell lung cancer (NSCLC) survival outcomes. This study further validated a seven-gene panel for diagnosis and prognosis of NSCLC using RNA sequencing and proteomic profiles of patient tumors. Within the seven-gene panel, ZNF71 expression combined with dendritic cell activities defined NSCLC patient subgroups (n = 966) with distinct survival outcomes (p = 0.04, Kaplan–Meier analysis). ZNF71 expression was significantly associated with the activities of natural killer cells (p = 0.014) and natural killer T cells (p = 0.003) in NSCLC patient tumors (n = 1016) using Chi-squared tests. Overexpression of ZNF71 resulted in decreased expression of multiple components of the intracellular intrinsic and innate immune systems, including dsRNA and dsDNA sensors. Multi-omics networks of ZNF71 and the intracellular intrinsic and innate immune systems were computed as relevant to NSCLC tumorigenesis, proliferation, and survival using patient clinical information and in-vitro CRISPR-Cas9/RNAi screening data. From these networks, pan-sensitive and pan-resistant genes to 21 NCCN-recommended drugs for treating NSCLC were selected. Based on the gene associations with patient survival and in-vitro CRISPR-Cas9, RNAi, and drug screening data, MEK1/2 inhibitors PD-198306 and U-0126, VEGFR inhibitor ZM-306416, and IGF-1R inhibitor …