Finding network biomarkers of cancers and analyzing the cancer-driving genes involved in these biomarkers are essential for understanding the dynamics of cancer. Clusters of genes in co-expression networks are commonly regarded as functional units. This work is based on the hypothesis that dense clusters, or communities, in the gene co-expression networks of cancer patients may represent functional units in cancer initiation and progression. In this study, RNA-seq gene expression data for three cancers from The Cancer Genome Atlas (TCGA), namely Breast Invasive Carcinoma (BRCA), Colorectal Adenocarcinoma (COAD), and Glioblastoma Multiforme (GBM), are used to construct gene co-expression networks using Pearson correlation. Six well-known community detection algorithms are applied to these networks to identify communities with five or more genes. A permutation test is then performed to identify the communities that are conserved in the other cancers; these are called conserved communities. Survival analysis is then performed on the clinical data of the three cancers using the conserved community genes as prognostic covariates. The communities that separate cancer patients into high- and low-risk groups are considered cancer biomarkers. In the present study, 16 such network biomarkers are discovered.
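The network-construction step described above can be sketched in a few lines. This is an illustrative sketch only: the correlation threshold of 0.8, the use of absolute correlation, the toy expression values, and the function names are all assumptions, and connected components are used as a simple stand-in for the six community detection algorithms, which the abstract does not name.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def coexpression_edges(expr, threshold=0.8):
    """Link two genes when |r| meets the (assumed) threshold."""
    return [
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if abs(pearson(expr[g1], expr[g2])) >= threshold
    ]


def connected_components(genes, edges):
    """Group genes into connected components (placeholder for community detection)."""
    adjacency = {g: set() for g in genes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for gene in sorted(genes):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy data: four hypothetical genes measured in five samples.
expr = {
    "G1": [1, 2, 3, 4, 5],
    "G2": [2, 4, 6, 8, 10],
    "G3": [5, 4, 3, 2, 1],
    "G4": [3, 1, 4, 1, 5],
}
edges = coexpression_edges(expr)
communities = connected_components(expr, edges)
```

In a real analysis, the resulting groups would then be filtered by size (five or more genes in the study) before the permutation test and survival analysis.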
GenReP: An Ensemble Model for Predicting TP53 in Response to Pharmaceutical Compounds
TP53 is a tumor-suppressor gene involved in regulating apoptosis, DNA repair, and genomic stability. Mutations in TP53 are implicated in approximately half of all detected cancers, including breast, lung, colorectal, and ovarian cancers, making it a significant target for therapeutic interventions. Many pharmaceutical drugs aim to restore TP53 function, and there is a need for predictive tools to assess how compounds may affect TP53 expression. In this study, we propose a new ensemble machine-learning model to predict the direction of TP53 relative gene expression in response to pharmaceutical compounds. Our model uses molecular fingerprints, descriptors, and scaffold-based features extracted from SMILES representations of compounds and concatenated into a single feature vector. Trained on our newly generated benchmark dataset based on the Connectivity Map (CMap) database, with class imbalance addressed by the Synthetic Minority Over-sampling Technique (SMOTE), our model achieves an accuracy of 62.9%, a sensitivity of 93.9%, a specificity of 40.3%, and a Matthews Correlation Coefficient (MCC) of 0.39. As the first TP53 gene regulation predictor of its kind, our study serves as a convincing proof of concept that paves the way for future investigation. GenReP as a stand-alone predictor, its source code, and our newly generated benchmark dataset are publicly available.
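The four reported metrics are standard functions of the confusion-matrix counts. The sketch below shows how they relate to one another; the counts are hypothetical and chosen only for illustration, since the model's actual confusion matrix is not given here, and the function name is an assumption.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts (not taken from the paper).
metrics = binary_metrics(tp=90, fn=10, tn=40, fp=60)
```

A classifier with high sensitivity and low specificity, as in this toy example, can still have a modest MCC because MCC accounts for all four cells of the confusion matrix.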
- Award ID(s): 2152059
- PAR ID: 10682446
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: Molecules
- Volume: 31
- Issue: 4
- ISSN: 1420-3049
- Page Range / eLocation ID: 739
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Background: Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions. Results: We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses. Conclusion: We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
Abstract From single-cell RNA-sequencing (scRNA-seq) and spatial transcriptomics (ST), one can extract high-dimensional gene expression patterns that can be described by intercellular communication networks or decoupled gene modules. These two descriptions of information flow are often assumed to occur independently. However, intercellular communication drives directed flows of information that are mediated by intracellular gene modules, in turn triggering outflows of other signals. Methodologies to describe such intercellular flows are lacking. We present FlowSig, a method that infers communication-driven intercellular flows from scRNA-seq or ST data using graphical causal modeling and conditional independence. We benchmark FlowSig using newly generated experimental cortical organoid data and synthetic data generated from mathematical modeling. We demonstrate FlowSig’s utility by applying it to various studies, showing that FlowSig can capture stimulation-induced changes to paracrine signaling in pancreatic islets, demonstrate shifts in intercellular flows due to increasing COVID-19 severity and reconstruct morphogen-driven activator–inhibitor patterns in mouse embryogenesis.
Abstract Melanoma and nonmelanoma skin cancers are among the most prevalent and most lethal forms of skin cancer. To identify new lead compounds with potential anticancer properties for further optimization, in vitro assays combined with in‐silico target fishing and docking have been used to identify and further map out the antiproliferative activity and potential mode of action of molecules from a small library of compounds previously prepared in our laboratory. From screening these compounds in vitro against A375, SK‐MEL‐28, A431, and SCC‐12 skin cancer cell lines, 35 displayed antiproliferative activities at the micromolar level, with the majority being primarily potent against the A431 and SCC‐12 squamous carcinoma cell lines. The most active compounds, 11 (A431: IC50 = 5.0 μM, SCC‐12: IC50 = 2.9 μM, SK‐MEL‐28: IC50 = 4.9 μM, A375: IC50 = 6.7 μM) and 13 (A431: IC50 = 5.0 μM, SCC‐12: IC50 = 3.3 μM, SK‐MEL‐28: IC50 = 13.8 μM, A375: IC50 = 17.1 μM), significantly and dose‐dependently induced apoptosis of SCC‐12 and SK‐MEL‐28 cells, as evidenced by the suppression of Bcl‐2 and upregulation of Bax, cleaved caspase‐3, caspase‐9, and PARP protein expression levels. Both agents significantly reduced scratch wound healing, colony formation, and expression levels of deregulated cancer molecular targets including RSK/Akt/ERK1/2 and S6K1. In silico target prediction and docking studies using the SwissTargetPrediction web‐based tool suggested that CDK8, CLK4, nuclear receptor ROR, tyrosine protein‐kinase Fyn/LCK, ROCK1/2, and PARP, all of which are dysregulated in skin cancers, might be prospective targets for the two most active compounds. Further validation of these targets by western blot analyses revealed that ROCK/Fyn and its associated Hedgehog (Hh) pathways were downregulated or modulated by the two lead compounds. In aggregate, these results provide a strong framework for further validation of the observed activities and the development of a more comprehensive structure–activity relationship through the preparation and biological evaluation of analogs.
Single-cell and single-nucleus RNA sequencing are used to reveal heterogeneity in cells, showing a growing potential for precision and personalized medicine. Nevertheless, sustainable drug discovery must be based on a population-level understanding of molecular mechanisms, which calls for a population-scale analysis of this data. This work introduces a sequential target-drug selection model for drug repurposing against Alzheimer’s Disease (AD) targets inferred from snRNA-seq data of AD progression, involving hundreds of thousands of nuclei from multipatient and multiregional studies. We utilize Persistent Sheaf Laplacians (PSL) to facilitate a Protein−Protein Interaction (PPI) analysis inferred from disease-related differential gene expression (DEG). We then use an ensemble of machine learning models to predict repurposable compounds. We screen the efficacy of different small compounds and further examine their central nervous system relevant ADMET properties, resulting in a list of potential molecular targets as well as pharmaceutical lead candidates for AD treatment.