

Title: Cell morphology-based machine learning models for human cell state classification
Herein, we implement and assess machine learning architectures to identify models that differentiate healthy from apoptotic cells using only forward scatter (FSC) and side scatter (SSC) flow cytometry information. To generate training data, colorectal cancer HCT116 cells were treated with miR-34a and then classified using a conventional Annexin V/propidium iodide (PI) staining assay. Apoptotic cells were defined as Annexin V-positive cells, a group that includes early and late apoptotic cells, necrotic cells, and other dying or dead cells. In addition to the fluorescence signals, we collected cell size and granularity information from the FSC and SSC parameters. Each parameter is recorded as area, height, and width, giving six numerical features that were used to train our models. Logistic regression, random forest, k-nearest neighbor, multilayer perceptron, and support vector machine models were trained and tested for their performance in predicting cell state from these six features alone. Out of 1046 candidate models, we selected a multilayer perceptron that achieved 0.91 live precision, 0.93 live recall, a 0.92 live F1 score, and a 0.97 live area under the ROC curve on standardized data. We discuss differences in classifier performance and compare the results with the standard practice of forward and side scatter gating, which is typically performed to select cells based on size and/or complexity. We demonstrate that our model, a ready-to-use module for any flow cytometry-based analysis, can provide automated, reliable, and stain-free classification of healthy and apoptotic cells using only size and granularity information.
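The quantities named in the abstract can be made concrete with a short Python sketch. It is not the authors' pipeline: the Annexin V threshold, the rectangular FSC-A/SSC-A gate used as a stand-in for manual scatter gating, and all function names are hypothetical. Live (Annexin V-negative) cells are treated as the positive class, so precision, recall, F1, and ROC AUC correspond to the "live" metrics reported above. As a consistency check, a precision of 0.91 and a recall of 0.93 give F1 = 2(0.91)(0.93)/(0.91 + 0.93) ≈ 0.92, matching the reported value.

```python
"""Hypothetical sketch: labels, standardization, a scatter-gate baseline,
and the live-class metrics named in the abstract (not the authors' code)."""
from statistics import mean, pstdev

# Assumed feature order, using common cytometer channel names:
# FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W


def label_live(annexin_v, threshold):
    """1 = live (Annexin V-negative), 0 = apoptotic/dead (Annexin V-positive).
    The threshold is a hypothetical stand-in for the Annexin V/PI gate."""
    return [0 if a > threshold else 1 for a in annexin_v]


def standardize(rows):
    """Z-score each feature column; constant columns are only centered."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(x - m) / s for x, (m, s) in zip(row, stats)] for row in rows]


def rect_gate(fsc_a, ssc_a, fsc_min, fsc_max, ssc_min, ssc_max):
    """Baseline: call a cell live if it falls inside a rectangular FSC-A/SSC-A
    gate (a simplification; hand-drawn gates are often polygons)."""
    return [1 if fsc_min <= f <= fsc_max and ssc_min <= s <= ssc_max else 0
            for f, s in zip(fsc_a, ssc_a)]


def live_metrics(y_true, y_pred):
    """Precision, recall and F1 score with live (label 1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random live cell scores higher than a
    random apoptotic cell (ties count 1/2); O(n_live * n_apoptotic)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one live and one apoptotic cell")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # The reported precision and recall imply the reported F1 score.
    print(round(2 * 0.91 * 0.93 / (0.91 + 0.93), 2))  # 0.92
```

In a real train/test split, the standardization means and standard deviations would be computed on the training cells only and then reused on the test cells; the sketch standardizes a single list for brevity.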
Award ID(s): 2029121
NSF-PAR ID: 10233562
Journal Name: npj Systems Biology and Applications
Volume: 7
Issue: 23
ISSN: 2056-7189
Sponsoring Org: National Science Foundation
More Like this
  2. Abstract
    Detection and quantification of bacterial endotoxins are important in a range of health-related contexts, including the pharmaceutical manufacturing of therapeutic proteins and vaccines. Here we combine experimental measurements based on nematic liquid crystalline droplets with machine learning methods to show that it is possible to classify the bacterial source (Escherichia coli, Pseudomonas aeruginosa, Salmonella minnesota) and quantify the concentration of endotoxin derived from all three bacterial species in aqueous solution. The approach uses flow cytometry to quantify, in a high-throughput manner, changes in the internal ordering of micrometer-sized droplets of nematic 4-cyano-4′-pentylbiphenyl triggered by the endotoxins. These changes in internal ordering alter the intensities of light side-scattered (SSC, large-angle) and forward-scattered (FSC, small-angle) by the liquid crystal droplets. A convolutional neural network (EndoNet) is trained on the large data sets generated by flow cytometry and shown to predict endotoxin source and concentration directly from the FSC/SSC scatter plots. Using saliency maps, we reveal how EndoNet captures subtle differences in scatter fields to classify the bacterial source and quantify endotoxin concentration over a range spanning eight orders of magnitude (0.01 pg mL⁻¹ to 1 μg mL⁻¹). We attribute the changes in scatter fields associated with the bacterial origin of the endotoxin, as detected by EndoNet, to the distinct molecular structures of the lipid A domains of endotoxins from the three bacteria. Overall, we conclude that the combination of liquid crystal droplets and EndoNet provides the basis for a promising analytical approach to endotoxin detection that does not require complex biologically derived reagents (e.g., Limulus amoebocyte lysate).
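EndoNet is described as predicting endotoxin source and concentration directly from FSC/SSC scatter plots. One plausible way to turn per-droplet events into a fixed-size image that a convolutional network can consume is a normalized two-dimensional histogram on log axes. The Python sketch below assumes that representation; its bin count, axis range, and function name are hypothetical and are not taken from the paper.

```python
import math


def scatter_image(fsc, ssc, bins=64, lo=1.0, hi=1e6):
    """Bin (FSC, SSC) events into a bins x bins grid on log10 axes spanning
    [lo, hi), normalized so the counts sum to 1. Row = SSC bin, column = FSC bin.
    Bin count and axis range are hypothetical, not EndoNet's preprocessing."""
    log_lo = math.log10(lo)
    width = (math.log10(hi) - log_lo) / bins
    grid = [[0.0] * bins for _ in range(bins)]
    n = 0
    for f, s in zip(fsc, ssc):
        if f <= 0 or s <= 0:
            continue  # non-positive values have no log10; drop them
        row = math.floor((math.log10(s) - log_lo) / width)
        col = math.floor((math.log10(f) - log_lo) / width)
        if 0 <= row < bins and 0 <= col < bins:  # drop events outside [lo, hi)
            grid[row][col] += 1.0
            n += 1
    if n:
        grid = [[c / n for c in r] for r in grid]
    return grid
```

Normalizing by the event count makes images from samples with different numbers of droplets comparable; `math.floor` rather than `int` ensures that values just below `lo` fall outside the grid instead of being truncated into the first bin.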
  3. Abstract

    Most cancer patients die from metastatic disease, which results from circulating tumor cells (CTCs) spreading from a primary tumor through the blood circulation to distant organs. Many studies have demonstrated the tremendous potential of CTC counts as prognostic markers of metastatic development and therapeutic efficacy. However, only viable CTCs capable of surviving in the blood circulation can create distant metastases. To date, little progress has been made in understanding what proportion of CTCs is viable and what proportion is apoptotic. Here, we introduce a novel approach to in situ characterization of CTC apoptosis status using a multicolor in vivo flow cytometry platform with fluorescence detection for real-time identification and enumeration of such cells directly in blood flow. The proof of concept was demonstrated with two-color fluorescence flow cytometry (FFC) using MDA-MB-231 breast cancer cells expressing green fluorescent protein (GFP), staurosporine as an activator of apoptosis, an Annexin V apoptosis kit with an orange dye, and a mouse model. Future applications of this platform to real-time monitoring of antitumor drug efficiency are discussed. © 2019 International Society for Advancement of Cytometry
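The abstract describes enumerating GFP-labelled CTCs in blood flow and assessing their apoptosis status with an Annexin V stain. One way to do this on two synchronized fluorescence traces is coincidence detection: count threshold crossings in the GFP channel and check whether an Annexin V crossing occurs within a small time window. The Python sketch below assumes that approach; the thresholds, window, and function names are hypothetical and do not describe the platform's actual signal processing.

```python
def threshold_crossings(signal, threshold):
    """Sample indices where the signal rises above threshold (one per crossing)."""
    onsets, above = [], False
    for i, v in enumerate(signal):
        if v > threshold and not above:
            onsets.append(i)
        above = v > threshold
    return onsets


def apoptotic_fraction(gfp, annexin, gfp_thr, ann_thr, window=2):
    """Count GFP events (labelled CTCs) and the fraction of them that coincide,
    within +/- window samples, with an Annexin V event."""
    gfp_events = threshold_crossings(gfp, gfp_thr)
    ann_events = threshold_crossings(annexin, ann_thr)
    if not gfp_events:
        return 0.0, 0
    apoptotic = sum(1 for g in gfp_events
                    if any(abs(g - a) <= window for a in ann_events))
    return apoptotic / len(gfp_events), len(gfp_events)
```

For a GFP trace `[0, 5, 5, 0, 0, 6, 0, 0, 7, 0]` and an Annexin V trace with a single pulse at sample 2, both thresholded at 1, the sketch finds three CTC events, one of which is Annexin V-positive.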

     
  4. Porcine reproductive and respiratory syndrome is an infectious disease of pigs caused by the PRRS virus (PRRSV). A modified live-attenuated vaccine has been widely used to control the spread of PRRSV, and classifying field strains is key to successful control and prevention. Restriction fragment length polymorphism (RFLP) typing of the open reading frame 5 (ORF5) gene is widely used to classify PRRSV strains but has shown unstable accuracy. Phylogenetic analysis classifies PRRSV with consistent accuracy, but it demands large computational power as the number of sequences increases. Our study aimed to apply four machine learning (ML) algorithms (random forest, k-nearest neighbor, support vector machine, and multilayer perceptron) to classify field PRRSV strains into four clades using amino acid scores based on the ORF5 gene sequence. We used ORF5 amino acid sequences from 1931 field PRRSV strains collected in the US from 2012 to 2020. Phylogenetic analysis was used to label each field strain as belonging to one of four clades: Lineage 5 or one of three clades in Lineage 1. We measured the accuracy and time consumption of classification for the four ML approaches using gene sequences of different sizes. All four ML algorithms classified a large number of field strains in a very short time (<2.5 s) with very high accuracy (area under the receiver operating characteristic curve >0.99). Furthermore, the random forest approach identified 4 key amino acid positions for classifying field PRRSV strains into the four clades. Our findings provide insight for developing rapid and accurate classification models from genetic information, which also enable handling large genome datasets in real time or semi-real time for data-driven decision-making and more timely surveillance.
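The abstract does not define its "amino acid scores". Assuming the simplest encoding, one indicator per aligned position and residue (one-hot), the squared Euclidean distance between two strains is twice the number of positions at which they differ, so ranking neighbors by Hamming distance on aligned ORF5 protein sequences is equivalent to Euclidean k-nearest-neighbor on the one-hot vectors. The Python sketch below uses that assumption; the sequences and clade labels are invented toy data, not PRRSV strains.

```python
from collections import Counter


def hamming(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_clade(query, train, k=3):
    """Majority clade among the k training sequences nearest to `query`.
    `train` is a list of (aligned_sequence, clade_label) pairs."""
    nearest = sorted(train, key=lambda item: hamming(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy aligned sequences with hypothetical clade labels.
    toy = [("MKLV", "L1A"), ("MKLI", "L1A"), ("AKQV", "L5"), ("AKQI", "L5")]
    print(knn_clade("MKLA", toy))  # L1A
```

Because `sorted` is stable, ties in distance keep the training order, and `Counter.most_common` breaks vote ties by first appearance; a production classifier would want an explicit tie-breaking rule.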
  5. Abstract

    Circulating tumor cells (CTCs) are known to have cancer stem cell (CSC) properties and to survive physiological fluid shear stress (FSS). However, current chemotherapy screening techniques do not adequately recapitulate this FSS environment and do not predict drug response. In this study, MCF7 and MDA-MB-231 cells under FSS are used as an in vitro model of CTCs. The effects of doxorubicin (DOX) and paclitaxel on sheared cells are tested using a WST-8 assay, and their effects on stemness (CD44+/CD24−) and apoptosis (Annexin V+/7-AAD+) are tested using flow cytometry. Quantitative polymerase chain reaction (qPCR) is used to measure gene expression. Suspension-cultured and FSS-treated MCF7 cells show increased drug resistance, especially to DOX. Co-treating MCF7 cells with FSS and drugs produces a synergistic increase in the CD44+/CD24− CSC-like population and in the expression of drug resistance-related genes. A correlated increase in STAT3 and NANOG expression is also observed under FSS. To the best of the authors' knowledge, this is the first report to suggest that the increase in CSC-like cells caused by FSS contributes to drug resistance via the STAT3/NANOG pathway. This increase in CTC drug resistance also highlights the importance of incorporating FSS, which is absent from current drug screening techniques.
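The abstract does not state how the qPCR or WST-8 readouts were normalized. Two common conventions, assumed here rather than taken from the paper, are the 2^−ΔΔCt method for relative expression against a reference gene and blank-corrected absorbance relative to untreated controls for WST-8 viability. The Python sketch below implements both; function and parameter names are hypothetical.

```python
def fold_change_ddct(ct_target_treated, ct_ref_treated,
                     ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% PCR efficiency
    for both target and reference genes)."""
    d_treated = ct_target_treated - ct_ref_treated   # normalize to reference gene
    d_control = ct_target_control - ct_ref_control
    return 2.0 ** -(d_treated - d_control)


def percent_viability(a_treated, a_control, a_blank):
    """WST-8 viability: blank-corrected absorbance relative to untreated control."""
    denom = a_control - a_blank
    if denom <= 0:
        raise ValueError("control absorbance must exceed the blank")
    return 100.0 * (a_treated - a_blank) / denom
```

For example, a target gene at Ct 22 (reference gene at Ct 18) after treatment versus Ct 24 (reference gene at Ct 18) in controls gives ΔΔCt = (22 − 18) − (24 − 18) = −2, i.e. a 4-fold increase in expression.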

     