Title: Predicting early breast cancer recurrence from histopathological images in the Carolina Breast Cancer Study
Abstract

Approaches for rapidly identifying patients at high risk of early breast cancer recurrence are needed. Image-based methods for prescreening hematoxylin and eosin (H&E) stained tumor slides could save both time and cost. We evaluated a data set of 704 1-mm tumor core H&E images (2–4 cores per case), corresponding to 202 participants (101 who recurred; 101 non-recurrent matched on age and follow-up time) from breast cancers diagnosed between 2008 and 2012 in the Carolina Breast Cancer Study. We used deep learning to extract image information and trained a model to identify recurrence. Cross-validation accuracy for predicting recurrence was 62.4% [95% CI: 55.7, 69.1], similar to grade (65.8% [95% CI: 59.3, 72.3]) and ER status (66.3% [95% CI: 59.8, 72.8]). Notably, 70% (19/27) of early-recurrent low-intermediate grade tumors were identified by our image model. Relative to existing markers, image-based analyses provide complementary information for predicting early recurrence.
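
As a rough check on the accuracies quoted in the abstract, the following sketch computes a normal-approximation (Wald) 95% interval for a proportion. The abstract does not say how its intervals were obtained, so the Wald method, the use of n = 202 participants as the denominator, and the helper name `wald_interval` are all assumptions made for illustration.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    Assumption: the abstract does not state which interval method was used;
    this is only one plausible choice.
    """
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Assumption: accuracy is computed over the 202 participants.
N_PARTICIPANTS = 202

# Point estimates quoted in the abstract.
reported = {"image model": 0.624, "grade": 0.658, "ER status": 0.663}

for name, acc in reported.items():
    lower, upper = wald_interval(acc, N_PARTICIPANTS)
    print(f"{name}: {100 * acc:.1f}% [{100 * lower:.1f}, {100 * upper:.1f}]")
```

Under these assumptions the printed intervals agree with the ones quoted above to one decimal place. All three intervals overlap substantially, which is consistent with the abstract describing the image model's accuracy as similar to that of grade and ER status.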

 
NSF-PAR ID:
10473620
Publisher / Repository:
Nature Publishing Group
Journal Name:
npj Breast Cancer
Volume:
9
Issue:
1
ISSN:
2374-4677
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Use of genomic assays to determine distant recurrence risk in patients with early stage breast cancer has expanded and is now included in the American Joint Committee on Cancer staging manual. Algorithmic alternatives using standard clinical and pathology information may provide equivalent benefit in settings where genomic tests, such as OncotypeDx, are unavailable. We developed an artificial neural network (ANN) model to nonlinearly estimate risk of distant cancer recurrence. In addition to clinical and pathological variables, we enhanced our model using intraoperatively determined global mammographic breast density (MBD) and local breast density (LBD). LBD was measured with optical spectral imaging capable of sensing regional concentrations of tissue constituents. A cohort of 56 ER+ patients with an OncotypeDx score was evaluated. We demonstrated that combining MBD/LBD measurements with clinical and pathological variables improves distant recurrence risk prediction accuracy, with high correlation (r = 0.98) to the OncotypeDx recurrence score.

     
  2. Abstract Motivation

    Accurate prediction of pathological complete response (pCR) to neoadjuvant chemotherapy (NAC) in triple-negative breast cancer (TNBC) patients is urgently needed for clinical decision making. pCR is also regarded as a strong predictor of overall survival. In this work, we propose a deep learning system to predict pCR to NAC based on serial pathology images stained with hematoxylin and eosin and two immunohistochemical biomarkers (Ki67 and PHH3). To support guidance based on prior human domain knowledge and to enhance the interpretability of the deep learning system, we introduce a human knowledge-derived spatial attention mechanism that informs deep learning models of informative tissue areas of interest. For each patient, three serial breast tumor tissue sections were cut from biopsy blocks, stained with three different stains, and integrated. The resulting comprehensive attention information from the image triplets is used to guide our prediction system toward prognostic tissue regions.

    Results

    The experimental dataset consists of 26 419 pathology image patches of 1000×1000 pixels from 73 TNBC patients treated with NAC. Image patches from 43 randomly selected patients are used as a training dataset, and image patches from the remaining 30 are used as a testing dataset. With maximum voting over patch-level results, our proposed model achieves a 93% patient-level accuracy, outperforming baselines and other state-of-the-art systems, suggesting its high potential for clinical decision making.

    Availability and implementation

    The code, documentation, and example data are openly available at: https://github.com/jkonglab/PCR_Prediction_Serial_WSIs_biomarkers

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3. Abstract

    In the tumor microenvironment, immune cells have emerged as key regulators of cancer progression. While much work has focused on characterizing tumor‐related immune cells through gene expression profiling, microRNAs (miRNAs) have also been reported to regulate immune cells in the tumor microenvironment. Using regression‐based computational methods, we have constructed, for the first time, immune cell signatures based on miRNA expression from The Cancer Genome Atlas breast and ovarian cancer datasets. Combined with existing mRNA immune cell signatures, the integrated mRNA‐miRNA leukocyte signatures are better able to delineate prognostic immune cell subsets within both cancers than the mRNA or miRNA signatures alone. Moreover, using the miRNA signatures, the anti‐inflammatory M2 macrophages emerged as the most significantly prognostic cell type in the breast cancer data (HR [hazard ratio]: 12.9; CI [confidence interval]: 3.09‐52.9; P = 4.22E−4), whereas the pro‐inflammatory M1 macrophages emerged as the most prognostic immune cell type in the ovarian cancer data (HR: 0.2; CI: 0.04‐0.56; P = 5.02E−3). These results suggest that our integrated miRNA and mRNA leukocyte signatures could be used to better delineate prognostic leukocyte subsets within cancers, whereas continued investigation may further support the regulatory relationships predicted between the miRNAs and immune cells found within our signature matrices.

     
  4. Abstract Motivation

    Breast cancer is a type of cancer that develops in breast tissues, and, after skin cancer, it is the most commonly diagnosed cancer in women in the United States. Given that an early diagnosis is imperative to prevent breast cancer progression, many machine learning models have been developed in recent years to automate the histopathological classification of the different types of carcinomas. However, many of them are not scalable to large-scale datasets.

    Results

    In this study, we propose the novel Primal-Dual Multi-Instance Support Vector Machine to determine which tissue segments in an image exhibit an indication of an abnormality. We derive an efficient optimization algorithm for the proposed objective by bypassing the quadratic programming and least-squares problems that are commonly employed to optimize Support Vector Machine models. The proposed method is computationally efficient and therefore scalable to large-scale datasets. We applied our method to the public BreaKHis dataset and achieved promising prediction performance and scalability for histopathological classification.

    Availability and implementation

    Software is publicly available at: https://1drv.ms/u/s!AiFpD21bgf2wgRLbQq08ixD0SgRD?e=OpqEmY.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  5. Abstract

    While tumor infiltration by CD8+ T cells is now widely accepted to predict outcomes, the clinical significance of intratumoral B cells is less clear. We hypothesized that spatial distribution rather than density of B cells within tumors may provide prognostic significance. We developed statistical techniques (fractal dimension differences and a box-counting method ‘occupancy’) to analyze the spatial distribution of tumor-infiltrating lymphocytes (TILs) in human triple-negative breast cancer (TNBC). Our results indicate that B cells in good outcome tumors (no recurrence within 5 years) are spatially dispersed, while B cells in poor outcome tumors (recurrence within 3 years) are more confined. While most TILs are located within the stroma, increased numbers of spatially dispersed lymphocytes within cancer cell islands are associated with a good prognosis. B cells and T cells often form lymphocyte clusters (LCs) identified via density-based clustering. LCs consist either of T cells only or heterotypic mixtures of B and T cells. Pure B cell LCs were negligible in number. Compared to tertiary lymphoid structures (TLS), LCs have fewer lymphocytes at lower densities. Both types of LCs are more abundant and more spatially dispersed in good outcome tumors than in poor outcome tumors. Heterotypic LCs in good outcome tumors are smaller and more numerous than those in poor outcome tumors. Heterotypic LCs are also closer to cancer cell islands in good outcome tumors, with LC size decreasing as they get closer to cancer cell islands. These results illuminate the significance of the spatial distribution of B cells and LCs within tumors.
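
The box-counting idea mentioned in item 5 can be illustrated with a minimal, generic sketch. The abstract does not define its "occupancy" measure or its fractal dimension differences, so the code below is the textbook box-counting estimate only, not the authors' method. The function names, the grid sizes, and the assumption that cell coordinates are scaled to the unit square are hypothetical.

```python
import math


def occupied_boxes(points, k):
    """Count cells of a k-by-k grid over the unit square holding at least one point."""
    cells = {
        (min(int(x * k), k - 1), min(int(y * k), k - 1))
        for x, y in points
    }
    return len(cells)


def box_counting_dimension(points, grid_sizes=(2, 4, 8, 16)):
    """Least-squares slope of log(occupied boxes) against log(grid resolution)."""
    xs = [math.log(k) for k in grid_sizes]
    ys = [math.log(occupied_boxes(points, k)) for k in grid_sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


# Synthetic stand-ins for cell positions (not data from the study):
# points confined to a diagonal line versus points spread over the whole square.
confined = [((i + 0.5) / 1024, (i + 0.5) / 1024) for i in range(1024)]
dispersed = [((i + 0.5) / 64, (j + 0.5) / 64) for i in range(64) for j in range(64)]

print(box_counting_dimension(confined))
print(box_counting_dimension(dispersed))
```

In this toy setting the confined pattern yields a slope near 1 and the dispersed pattern a slope near 2, which shows how a box-counting estimate separates spatially confined from spatially dispersed point patterns.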

     