skip to main content

Title: Bayesian finite mixture of regression analysis for cancer based on histopathological imaging–environment interactions

Cancer is a heterogeneous disease. Finite mixture of regression (FMR)—as an important heterogeneity analysis technique when an outcome variable is present—has been extensively employed in cancer research, revealing important differences in the associations between a cancer outcome/phenotype and covariates. Cancer FMR analysis has been based on clinical, demographic, and omics variables. A relatively recent and alternative source of data comes from histopathological images. Histopathological images have been long used for cancer diagnosis and staging. Recently, it has been shown that high-dimensional histopathological image features, which are extracted using automated digital image processing pipelines, are effective for modeling cancer outcomes/phenotypes. Histopathological imaging–environment interaction analysis has been further developed to expand the scope of cancer modeling and histopathological imaging-based analysis. Motivated by the significance of cancer FMR analysis and a still strong demand for more effective methods, in this article, we take the natural next step and conduct cancer FMR analysis based on models that incorporate low-dimensional clinical/demographic/environmental variables, high-dimensional imaging features, as well as their interactions. Complementary to many of the existing studies, we develop a Bayesian approach for accommodating high dimensionality, screening out noises, identifying signals, and respecting the “main effects, interactions” variable selection hierarchy. An effective computational algorithm is developed, and simulation shows advantageous performance of the proposed approach. The analysis of The Cancer Genome Atlas data on lung squamous cell cancer leads to interesting findings different from the alternative approaches.

more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Medium: X Size: p. 425-442
p. 425-442
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Heterogeneity is a hallmark of cancer. For various cancer outcomes/phenotypes, supervised heterogeneity analysis has been conducted, leading to a deeper understanding of disease biology and customized clinical decisions. In the literature, such analysis has been oftentimes based on demographic, clinical, and omics measurements. Recent studies have shown that high-dimensional histopathological imaging features contain valuable information on cancer outcomes. However, comparatively, heterogeneity analysis based on imaging features has been very limited. In this article, we conduct supervised cancer heterogeneity analysis using histopathological imaging features. The penalized fusion technique, which has notable advantages-such as greater flexibility-over the finite mixture modeling and other techniques, is adopted. A sparse penalization is further imposed to accommodate high dimensionality and select relevant imaging features. To improve computational feasibility and generate more reliable estimation, we employ model averaging. Computational and statistical properties of the proposed approach are carefully investigated. Simulation demonstrates its favorable performance. The analysis of The Cancer Genome Atlas (TCGA) data may provide a new way of defining/examining breast cancer heterogeneity. 
    more » « less
  2. Abstract

    In cancer research, supervised heterogeneity analysis has important implications. Such analysis has been traditionally based on clinical/demographic/molecular variables. Recently, histopathological imaging features, which are generated as a byproduct of biopsy, have been shown as effective for modeling cancer outcomes, and a handful of supervised heterogeneity analysis has been conducted based on such features. There are two types of histopathological imaging features, which are extracted based on specific biological knowledge and using automated imaging processing software, respectively. Usingbothtypes of histopathological imaging features, our goal is to conduct the first supervised cancer heterogeneity analysisthat satisfies a hierarchical structure. That is, the first type of imaging features defines a rough structure, and the second type defines a nested and more refined structure. A penalization approach is developed, which has been motivated by but differs significantly from penalized fusion and sparse group penalization. It has satisfactory statistical and numerical properties. In the analysis of lung adenocarcinoma data, it identifies a heterogeneity structure significantly different from the alternatives and has satisfactory prediction and stability performance.

    more » « less
  3. Automatic histopathological Whole Slide Image (WSI) analysis for cancer classification has been highlighted along with the advancements in microscopic imaging techniques, since manual examination and diagnosis with WSIs are time- and cost-consuming. Recently, deep convolutional neural networks have succeeded in histopathological image analysis. However, despite the success of the development, there are still opportunities for further enhancements. In this paper, we propose a novel cancer texture-based deep neural network (CAT-Net) that learns scalable morphological features from histopathological WSIs. The innovation of CAT-Net is twofold: (1) capturing invariant spatial patterns by dilated convolutional layers and (2) improving predictive performance while reducing model complexity. Moreover, CAT-Net can provide discriminative morphological (texture) patterns formed on cancerous regions of histopathological images comparing to normal regions. We elucidated how our proposed method, CAT-Net, captures morphological patterns of interest in hierarchical levels in the model. The proposed method out-performed the current state-of-the-art benchmark methods on accuracy, precision, recall, and F1 score. 
    more » « less
  4. As the most lethal major cancer, pancreatic cancer is a global healthcare challenge. Personalized medicine utilizing cutting-edge multi-omics data holds potential for major breakthroughs in tackling this critical problem. Radiomics and deep learning, two trendy quantitative imaging methods that take advantage of data science and modern medical imaging, have shown increasing promise in advancing the precision management of pancreatic cancer via diagnosing of precursor diseases, early detection, accurate diagnosis, and treatment personalization and optimization. Radiomics employs manually-crafted features, while deep learning applies computer-generated automatic features. These two methods aim to mine hidden information in medical images that is missed by conventional radiology and gain insights by systematically comparing the quantitative image information across different patients in order to characterize unique imaging phenotypes. Both methods have been studied and applied in various pancreatic cancer clinical applications. In this review, we begin with an introduction to the clinical problems and the technology. After providing technical overviews of the two methods, this review focuses on the current progress of clinical applications in precancerous lesion diagnosis, pancreatic cancer detection and diagnosis, prognosis prediction, treatment stratification, and radiogenomics. The limitations of current studies and methods are discussed, along with future directions. With better standardization and optimization of the workflow from image acquisition to analysis and with larger and especially prospective high-quality datasets, radiomics and deep learning methods could show real hope in the battle against pancreatic cancer through big data-based high-precision personalization. 
    more » « less
  5. Abstract

    Heterogeneity is a hallmark of cancer, diabetes, cardiovascular diseases, and many other complex diseases. This study has been partly motivated by the unsupervised heterogeneity analysis for complex diseases based on molecular and imaging data, for which, network‐based analysis, by accommodating the interconnections among variables, can be more informative than that limited to mean, variance, and other simple distributional properties. In the literature, there has been very limited research on network‐based heterogeneity analysis, and a common limitation shared by the existing techniques is that the number of subgroups needs to be specified a priori or in an ad hoc manner. In this article, we develop a penalized fusion approach for heterogeneity analysis based on the Gaussian graphical model. It applies penalization to the mean and precision matrix parameters to generate regularized and interpretable estimates. More importantly, a fusion penalty is imposed to “automatedly” determine the number of subgroups and generate more concise, reliable, and interpretable estimation. Consistency properties are rigorously established, and an effective computational algorithm is developed. The heterogeneity analysis of non‐small‐cell lung cancer based on single‐cell gene expression data of the Wnt pathway and that of lung adenocarcinoma based on histopathological imaging data not only demonstrate the practical applicability of the proposed approach but also lead to interesting new findings.

    more » « less