Title: NetCellMatch: Multiscale Network‐Based Matching of Cancer Cell Lines to Patients Using Graphical Wavelets

Cancer cell lines serve as model in vitro systems for investigating therapeutic interventions. Recent advances in high‐throughput genomic profiling have enabled the systematic comparison between cell lines and patient tumor samples. The highly interconnected nature of biological data, however, presents a challenge when mapping patient tumors to cell lines. Standard clustering methods can be particularly susceptible to the high level of noise present in these datasets and only output clusters at one unknown scale of the data. In light of these challenges, we present NetCellMatch, a robust framework for network‐based matching of cell lines to patient tumors. NetCellMatch first constructs a global network across all cell line‐patient samples using their genomic similarity. Then, a multi‐scale community detection algorithm integrates information across topologically meaningful (clustering) scales to obtain Network‐Based Matching Scores (NBMS). NBMS are measures of cluster robustness that map patient tumors to cell lines. We use NBMS to determine representative “avatar” cell lines for subgroups of patients. We apply NetCellMatch to reverse‐phase protein array data obtained from The Cancer Genome Atlas for patients and the MD Anderson Cell Line Project for cell lines. Along with avatar cell line identification, we evaluate connectivity patterns for breast, lung, and colon cancer and explore the proteomic profiles of avatars and their corresponding top matching patients. Our results demonstrate our framework's ability to identify both patient‐cell line matches and potential proteomic drivers of similarity. Our methods are general and can be easily adapted to other 'omic datasets.
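The pipeline described in the abstract (similarity network, clustering at several scales, a robustness score per patient and cell line pair) can be illustrated with a small sketch. This is not the NetCellMatch implementation: it swaps the graph-wavelet community detection for a much simpler stand-in, namely connected components of the similarity network at several correlation thresholds, and scores each pair by the fraction of thresholds at which the two samples share a component. The function names, the thresholds, and the toy profiles are all hypothetical.

```python
import numpy as np

def connected_components(adj):
    """Label the connected components of a boolean adjacency matrix."""
    n = adj.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            node = stack.pop()
            for nb in np.flatnonzero(adj[node]):
                if labels[nb] == -1:
                    labels[nb] = current
                    stack.append(nb)
        current += 1
    return labels

def matching_scores(patients, cell_lines, thresholds):
    """Fraction of scales at which a patient and a cell line co-cluster.

    Assumption: each "scale" is a correlation threshold on the joint
    patient + cell line similarity network (a stand-in for the
    wavelet-based scales named in the abstract).
    """
    data = np.vstack([patients, cell_lines])
    sim = np.corrcoef(data)  # rows are samples, columns are features
    n_p = patients.shape[0]
    scores = np.zeros((n_p, cell_lines.shape[0]))
    for t in thresholds:
        labels = connected_components(sim >= t)
        # True where patient i and cell line j share a component.
        scores += labels[:n_p, None] == labels[None, n_p:]
    return scores / len(thresholds)

# Toy profiles (hypothetical): two opposite expression patterns.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
b = a[::-1].copy()
patients = np.array([
    a + np.array([0.1, 0.0, -0.1, 0.0, 0.1, 0.0]),
    a + np.array([0.0, 0.2, 0.0, -0.2, 0.0, 0.0]),
    b + np.array([0.1, 0.0, 0.0, 0.0, -0.1, 0.0]),
])
cell_lines = np.array([a, b])

scores = matching_scores(patients, cell_lines, thresholds=[0.5, 0.7, 0.9])
best_cell_line = scores.argmax(axis=1)  # best-scoring cell line per patient
print(scores)
print(best_cell_line)
```

In this toy setting, a pair that stays together across every threshold gets a score of 1, which is the sense in which the score reflects robustness across scales rather than membership at a single, arbitrarily chosen resolution.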

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Chemistry & Biodiversity
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The functional cancer genome and proteome provide rich sources of information to identify patient-specific variations in signaling pathways and activities within and across tumors; however, current analytic methods lack the ability to exploit the diverse and multi-layered architecture of these complex biological networks. We assessed pan-cancer pathway activities for >7700 patients across 32 tumor types from The Cancer Proteome Atlas by developing a personalized cancer-specific integrated network estimation (PRECISE) model. PRECISE is a general Bayesian framework for integrating existing interaction databases, data-driven de novo causal structures, and upstream molecular profiling data to estimate cancer-specific integrated networks, infer patient-specific networks, and elicit interpretable pathway-level signatures. PRECISE-based pathway signatures can delineate pan-cancer commonalities and differences in proteomic network biology within and across tumors, and they demonstrate robust tumor stratification that is both biologically and clinically informative, with superior prognostic power compared to existing approaches. Towards establishing the translational relevance of the functional proteome in research and clinical settings, we provide an online, publicly available, comprehensive database and visualization repository of our findings (

  2. Abstract Background

    Alternative RNA splicing is widely dysregulated in cancers including lung adenocarcinoma, where aberrant splicing events are frequently caused by somatic splice site mutations or somatic mutations of splicing factor genes. However, the majority of mis-splicing in cancers is unexplained by these known mechanisms. We hypothesize that the aberrant Ras signaling characteristic of lung cancers plays a role in promoting the alternative splicing observed in tumors.


    We recently performed transcriptome and proteome profiling of human lung epithelial cells ectopically expressing oncogenic KRAS and another cancer-associated Ras GTPase, RIT1. Unbiased analysis of phosphoproteome data identified altered splicing factor phosphorylation in KRAS-mutant cells, so we performed differential alternative splicing analysis using rMATS to identify significantly altered isoforms in lung epithelial cells. To determine whether these isoforms were uniquely regulated by KRAS, we performed a large-scale splicing screen in which we generated over 300 unique RNA sequencing profiles of isogenic A549 lung adenocarcinoma cells ectopically expressing 75 different wild-type or variant alleles across 28 genes implicated in lung cancer.


    Mass spectrometry data showed widespread downregulation of splicing factor phosphorylation in lung epithelial cells expressing mutant KRAS compared to cells expressing wild-type KRAS. We observed alternative splicing in the same cells, with 2196 and 2416 skipped exon events in KRASG12V and KRASQ61H cells, respectively, 997 of which were shared (p < 0.001 by hypergeometric test). In the high-throughput splicing screen, mutant KRAS induced the greatest number of differential alternative splicing events, second only to the RNA binding protein RBM45 and its variant RBM45M126I. We identified ten high-confidence cassette exon events across multiple KRAS variants and cell lines. These included differential splicing of the Myc Associated Zinc Finger (MAZ). As MAZ regulates expression of KRAS, this splice variant may be a mechanism for the cell to modulate wild-type KRAS levels in the presence of oncogenic KRAS.


    Proteomic and transcriptomic profiling of lung epithelial cells uncovered splicing factor phosphorylation and mRNA splicing events regulated by oncogenic KRAS. These data suggest that in addition to widespread transcriptional changes, the Ras signaling pathway can promote post-transcriptional splicing changes that may contribute to oncogenic processes.

  3. Stabler, Cherie L. (Ed.)

    The unavailability of reliable models for studying breast cancer bone metastasis is the major challenge associated with poor prognosis in advanced-stage breast cancer patients. Breast cancer cells tend to preferentially disseminate to bone and colonize within the remodeling bone to cause bone metastasis. To improve the outcome of patients with breast cancer bone metastasis, we have previously developed a 3D in vitro breast cancer bone metastasis model using human mesenchymal stem cells (hMSCs) and primary breast cancer cell lines (MCF-7 and MDAMB231), recapitulating the late stage of breast cancer metastasis to bone. In the present study, we have tested our model using hMSCs and patient-derived breast cancer cell lines (NT013 and NT023) exhibiting different characteristics. We investigated the effect of breast cancer metastasis on bone growth using this 3D in vitro model and compared our results with previous studies. The results showed that NT013 and NT023 cells, exhibiting hormone-positive and triple-negative characteristics, underwent mesenchymal to epithelial transition (MET) and formed tumors in the presence of the bone microenvironment, in line with our previous results with MCF-7 and MDAMB231 cell lines. In addition, the results showed upregulation of Wnt-related genes in hMSCs cultured in the presence of excessive ET-1 cytokine released by NT013 cells, and downregulation of Wnt-related genes in the presence of excessive DKK-1 released by NT023 cells, leading to stimulation and abrogation of the osteogenic pathway, respectively, and ultimately mimicking different types of bone lesions in breast cancer patients.

  4. In precision medicine, the ultimate goal is to recommend the most effective treatment to an individual patient based on patient‐specific molecular and clinical profiles, possibly high‐dimensional. To advance cancer treatment, large‐scale screenings of cancer cell lines against chemical compounds have been performed to help better understand the relationship between genomic features and drug response; existing machine learning approaches use exclusively supervised learning, including penalized regression and recommender systems. However, it would be more efficient to apply reinforcement learning to sequentially learn as data accrue, including selecting the most promising therapy for a patient given individual molecular and clinical features and then collecting and learning from the corresponding data. In this article, we propose a novel personalized ranking system called Proximal Policy Optimization Ranking (PPORank), which ranks the drugs based on their predicted effects per cell line (or patient) in the framework of deep reinforcement learning (DRL). Modeled as a Markov decision process, the proposed method learns to recommend the most suitable drugs sequentially and continuously over time. As a proof‐of‐concept, we conduct experiments on two large‐scale cancer cell line data sets in addition to simulated data. The results demonstrate that the proposed DRL‐based PPORank outperforms the state‐of‐the‐art competitors based on supervised learning. Taken together, we conclude that novel methods in the framework of DRL have great potential for precision medicine and should be further studied.

  5. Cancer is an umbrella term that includes a range of disorders, from those that are fast-growing and lethal to indolent lesions with low or delayed potential for progression to death. The treatment options, as well as treatment success, are highly dependent on the correct subtyping of individual patients. With the advancement of high-throughput platforms, we have the opportunity to differentiate among cancer subtypes from a holistic perspective that takes into consideration phenomena at different molecular levels (mRNA, methylation, etc.). This demands powerful integrative methods to leverage large multi-omics datasets for better subtyping. Here we introduce Subtyping Multi-omics using a Randomized Transformation (SMRT), a new method for multi-omics integration and cancer subtyping. SMRT offers the following advantages over existing approaches: (i) a scalable analysis pipeline that allows researchers to integrate multi-omics data and analyze hundreds of thousands of samples in minutes, (ii) the ability to integrate data types with different numbers of patients, (iii) the ability to analyze un-matched data of different types, and (iv) a convenient data analysis pipeline offered to users through a web application. We also improve the efficiency of our ensemble-based perturbation clustering to support analysis on machines with memory constraints. In an extensive analysis, we compare SMRT with eight state-of-the-art subtyping methods using 37 TCGA and two METABRIC datasets comprising a total of almost 12,000 patient samples from 28 different types of cancer. We also performed a number of simulation studies. We demonstrate that SMRT outperforms other methods in identifying subtypes with significantly different survival profiles. The web application is available at . The R package will be deposited to CRAN as part of our PINSPlus software suite.