Title: TAaCGH Suite for Detecting Cancer-Specific Copy Number Changes Using Topological Signatures
Copy number changes play an important role in the development of cancer and are commonly associated with changes in gene expression. Persistence curves, such as Betti curves, have been used to detect copy number changes; however, these curves are known to be unstable with respect to small perturbations in the data. We address the stability of lifespan and Betti curves by providing bounds on the distance between persistence curves of Vietoris–Rips filtrations built on data and on slightly perturbed data in terms of the bottleneck distance. Next, we perform simulations to compare the ability of Betti curves, lifespan curves (conditionally stable) and stable persistence landscapes to detect copy number aberrations. We use these methods to identify significant chromosome regions associated with the four major molecular subtypes of breast cancer: Luminal A, Luminal B, Basal and HER2 positive. Identified segments are then used as predictor variables to build machine learning models that classify patients as one of the four subtypes. We find that no single persistence curve outperforms the others and instead suggest a complementary approach using a suite of persistence curves. In this study, we identified new cytobands associated with three of the subtypes: 1q21.1-q25.2, 2p23.2-p16.3 and 23q26.2-q28 with the Basal subtype, 8p22-p11.1 with Luminal B, and 2q12.1-q21.1 and 5p14.3-p12 with Luminal A. These segments are validated by the TCGA BRCA cohort dataset except for those found for Luminal A.
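As a rough illustration of the two persistence curves named in the abstract, the sketch below evaluates a Betti curve and a lifespan curve on a small, made-up persistence diagram. It assumes the common definitions in which the Betti curve at t counts the intervals [b, d) that contain t, and the lifespan curve at t sums the lengths d − b of those same intervals. The function names, the toy diagram and the evaluation grid are hypothetical and are not taken from the TAaCGH software.

```python
from typing import List, Tuple

# A persistence diagram as a list of (birth, death) pairs.
Interval = Tuple[float, float]


def betti_curve(diagram: List[Interval], grid: List[float]) -> List[int]:
    """Number of intervals [b, d) alive at each grid value t."""
    return [sum(1 for b, d in diagram if b <= t < d) for t in grid]


def lifespan_curve(diagram: List[Interval], grid: List[float]) -> List[float]:
    """Sum of lifespans (d - b) over the intervals alive at each grid value t."""
    return [sum(d - b for b, d in diagram if b <= t < d) for t in grid]


# Toy diagram (illustrative values only): two long-lived features and one short one.
diagram = [(0.0, 1.0), (0.5, 2.0), (1.5, 1.75)]
grid = [0.0, 0.5, 1.0, 1.5, 2.0]

betti = betti_curve(diagram, grid)
life = lifespan_curve(diagram, grid)
print(betti)  # [1, 2, 1, 2, 0]
print(life)   # [1.0, 2.5, 1.5, 1.75, 0]
```

At t = 1.5 the short interval (1.5, 1.75) raises the Betti curve by a full unit but adds only 0.25 to the lifespan curve. This gives some intuition for why weighting by lifespan can make a curve less sensitive to short-lived features that small perturbations create.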
Award ID(s):
1854770 1934568
NSF-PAR ID:
10350856
Journal Name:
Entropy
Volume:
24
Issue:
7
ISSN:
1099-4300
Page Range / eLocation ID:
896
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Tumor subtype and menopausal status are strong predictors of breast cancer (BC) prognosis. We aimed to find and validate subtype- or menopausal-status-specific changes in tumor DNA methylation (DNAm) associated with all-cause mortality or BC progression. Associations between site-specific tumor DNAm and BC prognosis were estimated among The Cancer Genome Atlas participants (n = 692) with Illumina Infinium HumanMethylation450 BeadChip array data. All-cause mortality and BC progression were modeled using Cox proportional hazards models stratified by tumor subtypes, adjusting for age, race, stage, menopausal status, tumor purity, and cell type proportion. Effect measure modification by subtype and menopausal status was evaluated by incorporating a product term with DNAm. Site-specific inference was used to identify subtype- or menopausal-status-specific differentially methylated regions (DMRs) and functional pathways. The results were validated on an independent dataset (GSE72308; n = 180). We identified a total of fifteen unique CpG probes that were significantly associated (P ≤ 1 × 10⁻⁷) with survival outcomes in a subtype- or menopausal-status-specific manner. Seven probes were associated with overall survival (OS) or progression-free interval (PFI) for women with luminal A subtype, and four probes were associated with PFI for women with luminal B subtype. Five probes were associated with PFI for post-menopausal women. A majority of significant probes showed a lower risk of OS or BC progression with higher DNAm. We identified subtype- or menopausal-status-specific DMRs and functional pathways, of which the top associated pathways differed across subtypes or menopausal status. None of the significant probes from the site-specific analyses met the genome-wide significance level in the validation analyses, although the directions and magnitudes of the coefficients showed a consistent pattern.
We have identified subtype- or menopausal-status-specific DNAm biomarkers, DMRs and functional pathways associated with all-cause mortality or BC progression, albeit with limited validation. Future studies with a larger independent cohort of non-post-menopausal women with non-luminal A subtypes are warranted for identifying subtype- and menopausal-status-specific DNAm biomarkers for BC prognosis.
  2. PURPOSE Lehmann et al have identified four molecular subtypes of triple-negative breast cancer (TNBC): basal-like (BL) 1, BL2, mesenchymal (M), and luminal androgen receptor, together with an immunomodulatory (IM) gene expression signature modifier. Our group previously showed that the response of TNBC to neoadjuvant systemic chemotherapy (NST) differs by molecular subtype, but whether NST affects the subtype was unknown. Here, we tested the hypothesis that in patients without pathologic complete response, TNBC subtypes can change after NST. Moreover, in cases with the changed subtype, we determined whether epithelial-to-mesenchymal transition (EMT) had occurred. MATERIALS AND METHODS From the Pan-Pacific TNBC Consortium data set containing TNBC patient samples from four countries, we examined 64 formalin-fixed, paraffin-embedded pairs of matched pre- and post-NST tumor samples. The TNBC subtype was determined using the TNBCtype-IM assay. We analyzed a partial EMT gene expression scoring metric using mRNA data. RESULTS Of the 64 matched pairs, 36 (56%) showed a change in the TNBC subtype after NST. The most frequent change was from BL1 to M subtypes (38%). No tumors changed from M to BL1. The IM signature was positive in 14 (22%) patients before NST and eight (12.5%) patients after NST. The EMT score increased after NST in 28 (78%) of the 36 patients with the changed subtype (v 39% of the 28 patients without change; P = .002254). CONCLUSION We report, to our knowledge, for the first time that the TNBC molecular subtype and IM signature frequently change after NST. Our results also suggest that EMT is promoted by NST. Our findings may lead to innovative adjuvant therapy strategies in TNBC cases with residual tumor after NST.
  3. Whether the genes reported in the breast cancer literature are vital to breast cancer formation could not be confirmed, owing to a lack of convincing accuracy, although they may be directly related to breast cancer according to present biological knowledge. It is hoped that vital genes can be identified with the highest possible accuracy (for example, 100% accuracy) and with convincing causal patterns beyond what has been known in breast cancer. One hope is that finding gene-gene interaction signatures and functional effects may solve the puzzle. This research uses a recently developed competing linear factor analysis method for differentially expressed gene detection to advance the study of breast cancer formation. Surprisingly, 3 genes are detected to be differentially expressed in TNBC and non-TNBC (Her2, Luminal A, Luminal B) samples with 100% sensitivity and 100% specificity in 1 study of triple-negative breast cancers (TNBC, with 54 675 genes and 265 samples). These 3 genes show a clear signature pattern of how TNBC patients can be grouped. For another TNBC study (with 54 673 genes and 66 samples), 4 genes give the same accuracy of 100% sensitivity and 100% specificity. Four genes are found to have the same accuracy of 100% sensitivity and 100% specificity in 1 breast cancer study (with 54 675 genes and 121 samples), and the same 4 genes give an accuracy of 100% sensitivity and 96.5% specificity in the fourth breast cancer study (with 60 483 genes and 1217 samples). These results show the 4-gene-based classifiers are robust and accurate. The detected genes naturally classify patients into subtypes, for example, 7 subtypes. These findings demonstrate the clearest gene-gene interaction patterns and functional effects with the smallest numbers of genes and the highest accuracy compared with findings reported in the literature. The 4 genes are considered to be essential for breast cancer studies and practice.
They can enable focused, targeted research and precision medicine for each subtype of breast cancer. New breast cancer disease types may be detected using the classified subtypes, and hence new effective therapies can be developed.
  4. Abstract

    One-dimensional persistent homology is arguably the most important and heavily used computational tool in topological data analysis. Additional information can be extracted from datasets by studying multi-dimensional persistence modules and by utilizing cohomological ideas, e.g. the cohomological cup product. In this work, given a single parameter filtration, we investigate a certain 2-dimensional persistence module structure associated with persistent cohomology, where one parameter is the cup-length $\ell \ge 0$ and the other is the filtration parameter. This new persistence structure, called the persistent cup module, is induced by the cohomological cup product and adapted to the persistence setting. Furthermore, we show that this persistence structure is stable. By fixing the cup-length parameter $\ell$, we obtain a 1-dimensional persistence module, called the persistent $\ell$-cup module, and again show it is stable in the interleaving distance sense, and study their associated generalized persistence diagrams. In addition, we consider a generalized notion of a persistent invariant, which extends the rank invariant (also referred to as persistent Betti number), Puuska’s rank invariant induced by epi-mono-preserving invariants of abelian categories, and the recently-defined persistent cup-length invariant, and we establish their stability. This generalized notion of persistent invariant also enables us to lift the Lyusternik-Schnirelmann (LS) category of topological spaces to a novel stable persistent invariant of filtrations, called the persistent LS-category invariant.

     
  5. There is currently no gene expression assay that can assess if premalignant lesions will develop into invasive breast cancer. This study sought to identify biomarkers for selecting patients with a high potential for developing invasive carcinoma in the breast with normal histology, benign lesions, or premalignant lesions. A set of 26-gene mRNA expression profiles were used to identify invasive ductal carcinomas from histologically normal tissue and benign lesions and to select those with a higher potential for future cancer development (ADHC) in the breast associated with atypical ductal hyperplasia (ADH). The expression-defined model achieved an overall accuracy of 94.05% (AUC = 0.96) in classifying invasive ductal carcinomas from histologically normal tissue and benign lesions (n = 185). This gene signature classified cancer development in ADH tissues with an overall accuracy of 100% (n = 8). The mRNA expression patterns of these 26 genes were validated using RT-PCR analyses of independent tissue samples (n = 77) and blood samples (n = 48). The protein expression of PBX2 and RAD52 assessed with immunohistochemistry were prognostic of breast cancer survival outcomes. This signature provided significant prognostic stratification in The Cancer Genome Atlas breast cancer patients (n = 1100), as well as basal-like and luminal A subtypes, and was associated with distinct immune infiltration and activities. The mRNA and protein expression of the 26 genes was associated with sensitivity or resistance to 18 NCCN-recommended drugs for treating breast cancer. Eleven genes had significant proliferative potential in CRISPR-Cas9/RNAi screening. Based on this gene expression signature, the VEGFR inhibitor ZM-306416 was discovered as a new drug for treating breast cancer.

     