Title: Lift the Veil of Breast Cancers Using 4 or Fewer Critical Genes
Genes reported in the breast cancer literature have not been confirmed as vital to breast cancer formation because the reported accuracy is not convincing, although present biological knowledge suggests they may be directly related to the disease. The aim is to identify vital genes with the highest possible accuracy (ideally 100%) and with convincing causal patterns beyond what is currently known about breast cancer. One hope is that finding gene-gene interaction signatures and functional effects may solve the puzzle. This research applies a recently developed competing linear factor analysis method for differentially expressed gene detection to advance the study of breast cancer formation. Surprisingly, in 1 study of triple-negative breast cancer (TNBC; 54 675 genes and 265 samples), 3 genes are detected as differentially expressed between TNBC and non-TNBC (Her2, Luminal A, Luminal B) samples with 100% sensitivity and 100% specificity. These 3 genes show a clear signature pattern of how TNBC patients can be grouped. In another TNBC study (54 673 genes and 66 samples), 4 genes achieve the same accuracy of 100% sensitivity and 100% specificity. Four genes also achieve 100% sensitivity and 100% specificity in a third breast cancer study (54 675 genes and 121 samples), and the same 4 genes achieve 100% sensitivity and 96.5% specificity in a fourth breast cancer study (60 483 genes and 1217 samples). These results show that the 4-gene-based classifiers are robust and accurate. The detected genes naturally classify patients into subtypes, for example, 7 subtypes. These findings demonstrate the clearest gene-gene interaction patterns and functional effects with the smallest numbers of genes and the highest accuracy compared with findings reported in the literature. The 4 genes are considered to be essential for breast cancer studies and practice. 
They can support focused, targeted research and precision medicine for each subtype of breast cancer. New breast cancer disease types may be detected using the classified subtypes, and hence new effective therapies can be developed.
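As a hedged illustration of how the sensitivity and specificity figures quoted in the abstract are defined, the sketch below computes both from true and predicted binary labels. It does not implement the competing linear factor analysis method; the function name `sensitivity_specificity` and the toy labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical, not taken from any study): 5 positives, all detected,
# and 20 negatives, one of which is misclassified as positive.
y_true = [1] * 5 + [0] * 20
y_pred = [1] * 5 + [1] + [0] * 19
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With these toy labels, sensitivity is 5/5 and specificity is 19/20, which shows how a single misclassified negative sample lowers specificity while sensitivity stays at 100%.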
Award ID(s):
2012298
NSF-PAR ID:
10340960
Journal Name:
Cancer Informatics
Volume:
21
ISSN:
1176-9351
Page Range / eLocation ID:
117693512210763
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Finding genes that are biologically related, directly or indirectly, to lung cancer has drawn much attention, and many genes directly related to lung cancer have been reported. However, it has not been confirmed whether those published 'key' genes are truly critical to lung cancer formation; they may carry very limited useful information. As a result, finding essential genes remains a challenging lung cancer research problem. Using a recently developed competing linear factor analysis method for differentially expressed gene detection, we advance the detection of critical lung cancer genes to a uniformly informative level. A common set of four genes and their functional effects are detected to be differentially expressed in tumor and non-tumor samples with 100% sensitivity and 100% specificity in one study of lung adenocarcinoma (LUAD) and one study of squamous cell lung cancers (LUSC) (two North American cohorts with 20429 genes, and 576 and 552 samples, respectively). Two additional analyses achieve 97.8% sensitivity and 100% specificity in one study of non-small cell lung carcinomas (NSCLC, a European cohort with 20356 genes and 156 samples), and 100% sensitivity and 95% specificity (1 out of 20 non-tumor samples misclassified) in one study of ALK-positive and EGFR/KRAS/ALK-negative lung adenocarcinomas (LUAD, a Japanese cohort with 20356 genes and 224 samples). The sets of four genes share some common genes, but have different functional effects, between the two North American cohorts and the European cohort and between the North American cohorts and the Japanese cohort. These results show that the four-gene-based classifiers are accurate and robust across different types of lung cancer and cohorts of different races. The functional effects of the four genes disclose significantly different mechanisms between LUAD and LUSC. These sets of four genes and their functional effects are considered to be essential for lung cancer studies and practice. These genes' functional effects naturally classify patients into different groups (more than seven subtypes). Subtype information is useful for personalized therapies. The new findings can motivate new lung cancer research in more focused and targeted directions to save lives, protect people, and reduce enormous economic costs in research and lung cancer treatments.
  2. There is currently no gene expression assay that can assess whether premalignant lesions will develop into invasive breast cancer. This study sought to identify biomarkers for selecting patients with a high potential for developing invasive carcinoma in the breast with normal histology, benign lesions, or premalignant lesions. A 26-gene mRNA expression profile was used to identify invasive ductal carcinomas from histologically normal tissue and benign lesions and to select those with a higher potential for future cancer development (ADHC) in the breast associated with atypical ductal hyperplasia (ADH). The expression-defined model achieved an overall accuracy of 94.05% (AUC = 0.96) in classifying invasive ductal carcinomas from histologically normal tissue and benign lesions (n = 185). This gene signature classified cancer development in ADH tissues with an overall accuracy of 100% (n = 8). The mRNA expression patterns of these 26 genes were validated using RT-PCR analyses of independent tissue samples (n = 77) and blood samples (n = 48). The protein expression of PBX2 and RAD52, assessed with immunohistochemistry, was prognostic of breast cancer survival outcomes. This signature provided significant prognostic stratification in The Cancer Genome Atlas breast cancer patients (n = 1100), as well as in basal-like and luminal A subtypes, and was associated with distinct immune infiltration and activities. The mRNA and protein expression of the 26 genes was associated with sensitivity or resistance to 18 NCCN-recommended drugs for treating breast cancer. Eleven genes had significant proliferative potential in CRISPR-Cas9/RNAi screening. Based on this gene expression signature, the VEGFR inhibitor ZM-306416 was discovered as a new drug for treating breast cancer.
  3. PURPOSE Lehmann et al have identified four molecular subtypes of triple-negative breast cancer (TNBC): basal-like (BL) 1, BL2, mesenchymal (M), and luminal androgen receptor, together with an immunomodulatory (IM) gene expression signature modifier. Our group previously showed that the response of TNBC to neoadjuvant systemic chemotherapy (NST) differs by molecular subtype, but whether NST affects the subtype was unknown. Here, we tested the hypothesis that in patients without pathologic complete response, TNBC subtypes can change after NST. Moreover, in cases with the changed subtype, we determined whether epithelial-to-mesenchymal transition (EMT) had occurred. MATERIALS AND METHODS From the Pan-Pacific TNBC Consortium data set containing TNBC patient samples from four countries, we examined 64 formalin-fixed, paraffin-embedded pairs of matched pre- and post-NST tumor samples. The TNBC subtype was determined using the TNBCtype-IM assay. We analyzed a partial EMT gene expression scoring metric using mRNA data. RESULTS Of the 64 matched pairs, 36 (56%) showed a change in the TNBC subtype after NST. The most frequent change was from BL1 to M subtypes (38%). No tumors changed from M to BL1. The IM signature was positive in 14 (22%) patients before NST and eight (12.5%) patients after NST. The EMT score increased after NST in 28 (78%) of the 36 patients with the changed subtype (v 39% of the 28 patients without change; P = .002254). CONCLUSION We report, to our knowledge, for the first time that the TNBC molecular subtype and IM signature frequently change after NST. Our results also suggest that EMT is promoted by NST. Our findings may lead to innovative adjuvant therapy strategies in TNBC cases with residual tumor after NST.
  4. e20551 Background: Enzyme activity is at the center of all biological processes. When these activities are misregulated by changes in sequence, expression, or activity, pathologies emerge. Misregulation of protease enzymes such as Matrix Metalloproteinases and Cathepsins plays a key role in the pathophysiology of cancer. We describe here a novel class of graphene-based, cost-effective biosensors that can detect altered protease activation in a blood sample from early stage lung cancer patients. Methods: The Gene Expression Omnibus (GEO) tool was used to identify proteases differentially expressed in lung cancer and matched normal tissue. Biosensors were assembled on a graphene backbone annotated with one of a panel of fluorescently tagged peptides. The graphene quenches fluorescence until the peptide is either cleaved by active proteases or altered by post-translational modification. 19 protease biosensors were evaluated on 431 commercially collected serum samples from non-lung cancer controls (69%) and pathologically confirmed lung cancer cases (31%) tested over two independent cohorts. Serum was incubated with each of the 19 biosensors and enzyme activity was measured indirectly as a continuous variable by a fluorescence plate reader. Analysis was performed using Emerge, a proprietary predictive and classification modeling system based on massively parallel evolving “Turing machine” algorithms. Each analysis stratified allocation into training and testing sets, and reserved an out-of-sample validation set for reporting. Results: 256 clinical samples were initially evaluated, including 35% cancer cases evenly distributed across stages I (29%), II (26%), III (24%) and IV (21%). The case controls included common comorbidities in the at-risk population such as COPD, chronic bronchitis, and benign nodules (19%). Using the Emerge classification analysis, biosensor biomarkers alone (no clinical factors) demonstrated Sensitivity (Se.) = 92% (CI 82%-99%) and Specificity (Sp.) = 82% (CI 69%-91%) in the out-of-sample set. An independent cohort of 175 clinical cases (age 67±8, 52% male) focused on early detection (26% cancer, 70% Stage I, 30% Stage II/III) was similarly evaluated. Classification showed Se. = 100% (CI 79%-100%) and Sp. = 93% (CI 80%-99%) in the out-of-sample set. For the entire dataset of 175 samples, Se. = 100% (CI 92%-100%) and Sp. = 97% (CI 92%-99%) were observed. Conclusions: Lung cancer can be treated if it is diagnosed when still localized. Despite clear data showing that screening for lung cancer by Low Dose Computed Tomography (LDCT) is effective, screening compliance remains very low. Protease biosensors provide a cost-effective additional specialized tool with high sensitivity and specificity in detection of early stage lung cancer. A large prospective trial of at-risk smokers with follow-up is being conducted to evaluate a commercial version of this assay.
  5. Abstract Motivation Detecting cancer gene expression and transcriptome changes with mRNA-sequencing (RNA-Seq) or array-based data is important for understanding the molecular mechanisms underlying carcinogenesis and cellular events during cancer progression. In previous studies, differentially expressed genes were detected across patients in one cancer type. These studies ignored the role of mRNA expression changes in driving tumorigenic mechanisms that are either universal or specific in different tumor types. To address the problem, we introduce two network-based multi-task learning frameworks, NetML and NetSML, to discover common differentially expressed genes shared across different cancer types as well as differentially expressed genes specific to each cancer type. The proposed frameworks consider the common latent gene co-expression modules and gene-sample biclusters underlying the multiple cancer datasets to learn knowledge that crosses different tumor types. Results Large-scale experiments on simulations and real cancer high-throughput datasets validate that the proposed network-based multi-task learning frameworks perform better sample classification than models without knowledge sharing across different cancer types. The common and cancer-specific molecular signatures detected by the multi-task learning frameworks on TCGA ovarian cancer, breast cancer, and prostate cancer datasets are correlated with known marker genes and enriched in cancer-relevant KEGG pathways and Gene Ontology terms. Availability and Implementation Source code is available at: https://github.com/compbiolabucf/NetML Supplementary information Supplementary data are available at Bioinformatics