
Title: Machine Learning of Discriminative Gate Locations for Clinical Diagnosis

High‐throughput single‐cell cytometry technologies have significantly improved our understanding of cellular phenotypes, supporting translational research and the clinical diagnosis of hematological and immunological diseases. However, subjective, ad hoc manual gating analysis is inadequate for optimal diagnosis given the increasing volume and heterogeneity of cytometry data. Prior work has shown that machine learning can classify cytometry samples effectively. However, many machine learning classifiers are either difficult to interpret, because they do not base the classification on characteristics of cell populations, or suboptimal, because the cell population characteristics they use are derived from inaccurate gating boundaries. To date, little has been done to optimize the gating boundaries and the diagnostic accuracy simultaneously. In this work, we describe a fully discriminative machine learning approach that simultaneously learns feature representations (e.g., combinations of coordinates of gating boundaries) and classifier parameters to optimize clinical diagnosis from cytometry measurements. The approach starts from an initial gating position and then refines the gating boundaries by gradient descent until a set of globally optimized gates across different samples is achieved. The learning procedure is constrained by regularization terms that encode domain knowledge and encourage the algorithm to seek interpretable results. We evaluate the proposed approach on both simulated and real data, producing classification results on par with those generated via human expertise, in terms of both the positions of the gating boundaries and the diagnostic accuracy. © 2019 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.
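The joint refinement of gates and classifier described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes a single rectangular gate on two markers, sigmoid-smoothed gate edges so that the fraction of cells inside the gate is differentiable, a logistic classifier on that fraction, and a quadratic penalty pulling the gate toward its starting position as a stand-in for the domain-knowledge regularizers. All function names, the simulated data, and the parameters `tau`, `lam`, and `lr` are hypothetical, and gradients are taken by finite differences only to keep the example free of dependencies beyond NumPy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_gate_fractions(samples, gate, tau=0.2):
    """Differentiable fraction of cells inside a rectangular 2-D gate, per sample.

    samples: shape (n_samples, n_cells, 2); gate = (lo_x, hi_x, lo_y, hi_y).
    A hard indicator has zero gradient almost everywhere, so each edge is
    replaced by a sigmoid whose sharpness is set by tau (hypothetical choice)."""
    lo_x, hi_x, lo_y, hi_y = gate
    x, y = samples[..., 0], samples[..., 1]
    inside = (sigmoid((x - lo_x) / tau) * sigmoid((hi_x - x) / tau)
              * sigmoid((y - lo_y) / tau) * sigmoid((hi_y - y) / tau))
    return inside.mean(axis=1)

def objective(params, samples, labels, gate0, lam=0.1):
    """Logistic loss on the gated fraction plus a pull toward the initial gate."""
    gate, w, b = params[:4], params[4], params[5]
    p = sigmoid(w * soft_gate_fractions(samples, gate) + b)
    eps = 1e-9
    bce = -np.mean(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps))
    # Hypothetical domain-knowledge prior: keep the gate near the expert's start.
    return bce + lam * np.sum((gate - gate0) ** 2)

def numerical_grad(f, params, h=1e-5):
    """Central finite differences; an autodiff library would normally do this."""
    grad = np.zeros_like(params)
    for i in range(params.size):
        step = np.zeros_like(params)
        step[i] = h
        grad[i] = (f(params + step) - f(params - step)) / (2 * h)
    return grad

def fit(samples, labels, gate0, lr=1.0, steps=300, lam=0.1):
    """Jointly refine gate edges and classifier weights by gradient descent."""
    params = np.concatenate([gate0, [0.0, 0.0]])  # gate edges, then w, b
    f = lambda p: objective(p, samples, labels, gate0, lam)
    for _ in range(steps):
        params = params - lr * numerical_grad(f, params)
    return params

def predict(params, samples):
    gate, w, b = params[:4], params[4], params[5]
    return (sigmoid(w * soft_gate_fractions(samples, gate) + b) > 0.5).astype(int)

# Simulated cohort: "diseased" samples carry an extra 30% cell cluster near (2.5, 2.5).
rng = np.random.default_rng(0)

def make_sample(diseased, n_cells=300):
    cells = rng.normal(0.0, 1.0, size=(n_cells, 2))
    if diseased:
        k = int(0.3 * n_cells)
        cells[:k] = rng.normal(2.5, 0.3, size=(k, 2))
    return cells

labels = np.array([0, 1] * 15)
samples = np.stack([make_sample(bool(l)) for l in labels])
gate0 = np.array([1.5, 3.5, 1.5, 3.5])  # hypothetical expert starting gate
params = fit(samples, labels, gate0)
accuracy = np.mean(predict(params, samples) == labels)
```

In practice an automatic-differentiation framework would replace `numerical_grad`, and a smaller `tau` brings the learned gate closer to a hard boundary at the cost of sharper, harder-to-optimize gradients.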

Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Cytometry Part A
Page Range / eLocation ID:
p. 296-307
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Knowing when a machine learning system is not confident about its prediction is crucial in safety-critical medical domains. Ideally, a machine learning algorithm should make a prediction only when it is highly certain about its competency and otherwise refer the case to a physician. In this paper, we investigate how Bayesian deep learning can improve the performance of the machine–physician team in the skin lesion classification task. We used the publicly available HAM10000 dataset, which includes samples from seven common skin lesion categories: Melanoma (MEL), Melanocytic Nevi (NV), Basal Cell Carcinoma (BCC), Actinic Keratoses and Intraepithelial Carcinoma (AKIEC), Benign Keratosis (BKL), Dermatofibroma (DF), and Vascular (VASC) lesions. Our experimental results show that Bayesian deep networks can boost the diagnostic performance of the standard DenseNet-169 model from 81.35% to 83.59% without incurring additional parameters or heavy computation. More importantly, a hybrid physician–machine workflow reaches a classification accuracy of 90% while referring only 35% of the cases to physicians. The findings are expected to generalize to other medical diagnosis applications. We believe that the availability of risk-aware machine learning methods will enable wider adoption of machine learning technology in clinical settings.
  2. Acute lymphoblastic leukemia (ALL) is the most common childhood cancer. While there are a number of well‐recognized prognostic biomarkers at diagnosis, the most powerful independent prognostic factor is the response of the leukemia to induction chemotherapy (Campana and Pui: Blood 129 (2017) 1913–1918). Given the potential for machine learning to improve precision medicine, we tested its capacity to monitor disease in children undergoing ALL treatment. Diagnostic and on‐treatment bone marrow samples were labeled with an ALL‐discriminating antibody combination and analyzed by imaging flow cytometry. Ignoring the fluorescent markers and using only features extracted from bright‐field and dark‐field cell images, a deep learning model identified ALL cells with an accuracy of >88%. This antibody‐free, single‐cell method is cheap and quick, and it could be adapted to a simple, laser‐free cytometer to allow automated, point‐of‐care testing to detect slow early responders. Adaptation to other types of leukemia is feasible, which would revolutionize residual disease monitoring. © 2020 The Authors. Cytometry Part A published by Wiley Periodicals, Inc. on behalf of International Society for Advancement of Cytometry.

  3. Many recent efforts in the diagnostic field address the accessibility of cancer diagnosis. Typical histological staining methods identify cancer cells visually by a larger nucleus with more condensed chromatin. Machine learning (ML) has been incorporated into image analysis to improve this process. Recently, impedance spectrometers have been shown to enable all-inclusive lab-on-a-chip platforms for detecting nuclear abnormalities. In this paper, we present a wideband electrical sensor and data analysis paradigm that can identify nuclear changes, realized as a single-cell microfluidic device that detects nuclei of altered size. To model cells with altered nuclei, Jurkat cells were treated to enlarge or shrink their nuclei and then sensed over a broad frequency band to obtain the S-parameters of single cells. We demonstrate the ability to deduce the important frequencies associated with nucleus size and use them to improve classification models in both binary and multiclass scenarios, despite a heterogeneous and overlapping cell population. The important frequency features match those predicted by a double-shell circuit model published in prior work, demonstrating a coherent new analytical technique for electrical data analysis. By classifying cells with high accuracy with the help of ML, the electrical sensing platform points toward a label-free and flexible approach to cancer diagnosis.
  4. With the advances in machine learning for the diagnosis of Alzheimer’s disease (AD), most studies have focused either on identifying the subject’s status through classification algorithms or on predicting their cognitive scores through regression methods, neglecting the potential association between these two tasks. Motivated by the need to enhance the prospects for early diagnosis along with the ability to predict future disease states, this study proposes a deep neural network based on modality fusion, kernelization, and tensorization that performs multiclass classification and longitudinal regression simultaneously within a unified multitask framework. Coupling multiclass classification with longitudinal regression in this way is found to improve the final model's performance on both tasks. Different multimodality scenarios are investigated, and complementary aspects of the multimodal features are exploited to simultaneously delineate the subject’s label and predict related cognitive scores at future timepoints using baseline data. The main aim of this multitask framework is to achieve the highest possible performance in terms of precision, sensitivity, F1 score, and area under the curve (AUC) in the multiclass classification task while maximizing agreement with the MMSE score, as measured by the correlation coefficient and the RMSE at all time points, in the prediction task, with both tasks run simultaneously under the same set of hyperparameters. The overall accuracy for multiclass classification of the proposed KTMnet method is 66.85 ± 3.77. The prediction results show an average RMSE of 2.32 ± 0.52 and a correlation of 0.71 ± 5.98 for predicting MMSE throughout the time points. These results are compared with state-of-the-art techniques reported in the literature.
    A discovery from the multitasking of this consolidated machine learning framework is that the set of hyperparameters that optimizes the prediction results may not be the same as the set that optimizes the multiclass classification. In other words, there is a breakpoint beyond which further improving the results of one task could degrade the accuracy of the other.
  5. Membrane antigens control cell function by regulating biochemical interactions and hence are routinely used as diagnostic and prognostic targets in biomedicine. Fluorescent labeling and subsequent optical interrogation of cell membrane antigens, while highly effective, limit expression profiling to centralized facilities that can afford and operate complex instrumentation. Here, we introduce a cytometry technique that computes the surface expression of immunomagnetically labeled cells by electrically tracking their trajectory under a magnetic field gradient on a microfluidic chip, with a throughput of >500 cells per min. In addition to enabling a frugal cytometry platform, this measurement approach based on immunomagnetic cell manipulation allows direct expression profiling of target subpopulations from non-purified samples. We applied our technology to measure epithelial cell adhesion molecule expression on human breast cancer cells. Once calibrated, the surface expression and size measurements agree remarkably well with fluorescence-based measurements from a commercial flow cytometer. Quantitative measurements of biochemical and biophysical cell characteristics with a disposable cytometer have the potential to impact point-of-care testing of clinical samples, particularly in resource-limited settings.
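The referral workflow in item 1 (predict only when confident, otherwise refer to a physician) can be sketched as uncertainty-ranked referral. This is an assumption about the mechanism, not the cited paper's code: it uses the predictive entropy of softmax outputs averaged over several stochastic forward passes (as in Monte Carlo dropout), and refers a fixed fraction of the most uncertain cases. The function names, the toy probabilities, and the `referral_rate` parameter are hypothetical.

```python
import numpy as np

def predictive_entropy(mc_probs):
    """Entropy of the mean softmax over T stochastic passes.

    mc_probs has shape (T, N, K): T passes (e.g. dropout kept on), N cases, K classes."""
    mean = mc_probs.mean(axis=0)
    entropy = -(mean * np.log(mean + 1e-12)).sum(axis=1)
    return entropy, mean

def refer_most_uncertain(mc_probs, referral_rate):
    """Flag the referral_rate fraction of cases with the highest entropy for a physician."""
    entropy, mean = predictive_entropy(mc_probs)
    n_refer = int(round(referral_rate * len(entropy)))
    most_uncertain_first = np.argsort(entropy)[::-1]
    referred = np.zeros(len(entropy), dtype=bool)
    referred[most_uncertain_first[:n_refer]] = True
    return mean.argmax(axis=1), referred

# Toy example: 2 stochastic passes, 3 cases, 2 classes.
mc_probs = np.array([
    [[0.95, 0.05], [0.50, 0.50], [0.10, 0.90]],
    [[0.97, 0.03], [0.60, 0.40], [0.05, 0.95]],
])
labels = np.array([0, 1, 1])
preds, referred = refer_most_uncertain(mc_probs, referral_rate=1 / 3)
retained_accuracy = np.mean(preds[~referred] == labels[~referred])
```

Here the ambiguous second case is referred, and the model is evaluated only on the cases it keeps. Estimating the accuracy of the full hybrid team would additionally require physician decisions on the referred cases, which this sketch does not model.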
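Item 4 trains classification and longitudinal regression under one set of hyperparameters. A minimal way to express such coupling is a single objective that sums a classification loss and a regression loss. The sketch below assumes softmax cross-entropy plus an `alpha`-weighted mean squared error over all future timepoints; it does not reproduce KTMnet's fusion, kernelization, or tensorization, and the function names and `alpha` are hypothetical. A weight like `alpha` is exactly the kind of hyperparameter whose setting can favor one task over the other, as the abstract observes.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels (numerically stable log-softmax)."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def multitask_loss(logits, labels, score_pred, score_true, alpha=1.0):
    """Classification loss plus alpha-weighted MSE over all future timepoints.

    logits: (n_subjects, n_classes); labels: (n_subjects,) integer classes;
    score_pred, score_true: (n_subjects, n_timepoints) cognitive scores."""
    ce = softmax_cross_entropy(logits, labels)
    mse = np.mean((score_pred - score_true) ** 2)
    return ce + alpha * mse
```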
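Item 5 infers surface expression from how immunomagnetically labeled cells move in a field gradient. One simple physical model, assumed here rather than taken from the paper, is a low-Reynolds-number force balance: the total magnetic force on N bound beads equals the Stokes drag on the cell, N·F_bead = 6πηrv, so N = 6πηrv / F_bead, and expression is taken to scale with N. It ignores wall effects, field non-uniformity, and bead heterogeneity, which is one reason a real device needs calibration. The function name and the example numbers are hypothetical.

```python
import math

def bead_count_from_velocity(lateral_velocity, cell_radius, force_per_bead, viscosity=1.0e-3):
    """Estimate magnetic beads bound to a cell from a Stokes drag force balance.

    N * F_bead = 6 * pi * eta * r * v  =>  N = 6 * pi * eta * r * v / F_bead
    SI units throughout; viscosity defaults to roughly that of water (Pa*s)."""
    drag = 6.0 * math.pi * viscosity * cell_radius * lateral_velocity
    return drag / force_per_bead

# Hypothetical numbers: 6 um radius cell drifting at 100 um/s, 0.1 pN per bead.
n_beads = bead_count_from_velocity(100e-6, 6e-6, 0.1e-12)
```

With these values the drag is about 11.3 pN, giving roughly 113 beads (exactly 36π in this idealized model).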