Title: Deep multiple instance learning classifies subtissue locations in mass spectrometry images from tissue-level annotations
Motivation: Mass spectrometry imaging (MSI) characterizes the molecular composition of tissues with spatial resolution and has strong potential for distinguishing tissue types or disease states. This can be achieved by supervised classification, which takes MSI spectra as input and assigns class labels to subtissue locations. Unfortunately, developing such classifiers is hindered by the limited availability of training sets with subtissue labels as ground truth. Subtissue labeling is prohibitively expensive, and typically only rough annotations of entire tissues are available. Classifiers trained on data with such approximate labels perform sub-optimally.
Results: To address this challenge, we contribute mi-CNN, a semi-supervised approach that implements multiple instance learning with a convolutional neural network (CNN). The multiple instance aspect enables weak supervision from tissue-level annotations when classifying subtissue locations. The convolutional architecture of the CNN captures contextual dependencies between spectral features. Evaluations on simulated and experimental datasets demonstrated that mi-CNN improved subtissue classification compared with traditional classifiers. We propose mi-CNN as an important step towards accurate subtissue classification in MSI, enabling rapid distinction between tissue types and disease states.
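The abstract does not spell out mi-CNN's architecture or training procedure. As an illustration only, the NumPy sketch below shows the general multiple-instance idea it describes: each tissue is treated as a bag of spectra carrying a single tissue-level label, a small learnable 1D convolution over the spectral (m/z) axis scores each spectrum, and the bag prediction is the maximum instance score. Training therefore needs only bag labels, while the trained scorer still outputs one score per subtissue location. The max-pooling aggregation, the single-filter network, the toy data, and all function names (`init_params`, `instance_logits`, `bag_loss_and_grads`, `train`, `make_toy_bags`) are assumptions made for this sketch, not the authors' implementation.

```python
import numpy as np

# Hypothetical multiple-instance sketch (NOT the mi-CNN implementation).
# A "bag" is one tissue: an array of spectra with shape (n_locations, n_mz_bins),
# and only a tissue-level label (0 or 1) is known.


def init_params(kernel_size, rng):
    """One learnable 1D convolution filter plus bias, followed by an affine output."""
    return {"w": rng.normal(0.0, 0.5, kernel_size), "b": 0.0, "a": 1.0, "c": 0.0}


def instance_logits(params, spectra):
    """Score every spectrum (instance) in a bag; returns (logits, cache)."""
    # Windows of length kernel_size along the m/z axis: (n, n_positions, kernel_size).
    windows = np.lib.stride_tricks.sliding_window_view(spectra, params["w"].size, axis=1)
    z = windows @ params["w"] + params["b"]  # 1D convolution (cross-correlation)
    h = np.maximum(z, 0.0)                   # ReLU
    pooled = h.mean(axis=1)                  # average over m/z positions, shape (n,)
    logits = params["a"] * pooled + params["c"]
    return logits, (windows, z, pooled)


def bag_loss_and_grads(params, spectra, bag_label):
    """Bag logit = max instance logit; loss = binary cross-entropy on the bag label."""
    logits, (windows, z, pooled) = instance_logits(params, spectra)
    j = int(np.argmax(logits))                   # the instance that decides the bag
    l = logits[j]
    loss = np.logaddexp(0.0, l) - bag_label * l  # numerically stable BCE(sigmoid(l), y)
    g = 1.0 / (1.0 + np.exp(-l)) - bag_label     # dloss/dl
    # Backpropagate only through instance j: affine -> mean pooling -> ReLU -> conv.
    dz = g * params["a"] * (z[j] > 0) / z.shape[1]
    grads = {"w": dz @ windows[j], "b": float(dz.sum()), "a": g * pooled[j], "c": g}
    return float(loss), grads


def train(bags, labels, kernel_size=5, lr=0.05, epochs=50, seed=0):
    """Plain per-bag gradient descent using only bag (tissue-level) labels."""
    rng = np.random.default_rng(seed)
    params = init_params(kernel_size, rng)
    for _ in range(epochs):
        for spectra, y in zip(bags, labels):
            _, grads = bag_loss_and_grads(params, spectra, y)
            for key in params:
                params[key] = params[key] - lr * grads[key]
    return params


def make_toy_bags(n_bags=20, n_locations=8, n_mz=40, seed=1):
    """Simulated data: positive tissues carry a peak at only some locations."""
    rng = np.random.default_rng(seed)
    bags, labels = [], []
    for i in range(n_bags):
        spectra = rng.normal(0.0, 0.1, (n_locations, n_mz))
        positive = i % 2 == 1
        if positive:
            # Hypothetical marker peak around m/z bin 20, in half of the locations.
            spectra[: n_locations // 2, 18:23] += [0.5, 1.0, 2.0, 1.0, 0.5]
        bags.append(spectra)
        labels.append(1.0 if positive else 0.0)
    return bags, labels


bags, labels = make_toy_bags()
params = train(bags, labels)
logits, _ = instance_logits(params, bags[1])
subtissue_probs = 1.0 / (1.0 + np.exp(-logits))  # one probability per location
```

Max pooling encodes the standard multiple-instance assumption that a bag is positive if at least one of its instances is positive; other aggregations, such as mean or attention-based pooling, relax that assumption at the cost of a less direct link between bag and instance predictions.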
Award ID(s): 1759736
PAR ID: 10159354
Journal Name: Proceedings of Intelligent Systems for Molecular Biology (ISMB)
Sponsoring Org: National Science Foundation
More like this
  1. Mass spectrometry imaging (MSI) is widely used for label-free molecular mapping of biological samples. Identifying co-localized molecules in MSI data is crucial to understanding biochemical pathways. One of the key challenges in molecular colocalization is that MSI datasets are too large for manual annotation but too small for training deep neural networks. Herein, we introduce a self-supervised clustering approach based on contrastive learning, which shows excellent performance in clustering MSI data. We train a deep convolutional neural network (CNN) on MSI data from a single experiment, without manual annotations, to learn high-level spatial features from ion images and classify them based on molecular colocalization. We demonstrate that contrastive learning generates ion image representations that form well-resolved clusters. Subsequent self-labeling is used to fine-tune both the CNN encoder and the linear classifier on confidently classified ion images. This approach enables autonomous, high-throughput identification of co-localized species in MSI data, which will dramatically expand the application of spatial lipidomics, metabolomics, and proteomics in biological research.
  2. With machine learning now prevalent in many high-stakes decision-making processes (e.g., hiring and admission), it is important to take fairness into account when designing and deploying machine learning models, especially in scenarios with imperfectly labeled data. Multiple-Instance Learning (MIL) is a weakly supervised approach in which instances are grouped into labeled bags, each bag containing several instances and carrying a single label. However, current fairness-centric methods in machine learning often fall short when applied to MIL because they rely on instance-level labels. In this work, we introduce a Fair Multiple-Instance Learning (FMIL) framework to ensure fairness in weakly supervised learning. In particular, our method bridges the gap between bag-level and instance-level labeling by leveraging the bag labels to infer high-confidence instance labels, improving both accuracy and fairness in MIL classifiers. Comprehensive experiments show that our FMIL framework substantially reduces bias in MIL without compromising accuracy.
  3. While convective storm mode is explicitly depicted in convection-allowing model (CAM) output, subjectively diagnosing mode in large volumes of CAM forecasts can be burdensome. In this work, four machine learning (ML) models were trained to probabilistically classify CAM storms into one of three modes: supercells, quasi-linear convective systems, and disorganized convection. The four ML models were a dense neural network (DNN), logistic regression (LR), a convolutional neural network (CNN), and a semi-supervised CNN-Gaussian mixture model (GMM). The DNN, CNN, and LR were trained with a set of hand-labeled CAM storms, while the semi-supervised GMM used updraft helicity and storm size to generate clusters that were then hand-labeled. When evaluated on storms withheld from training, the four classifiers had a similar ability to discriminate between modes, but the GMM had worse calibration. The DNN and LR had objective performance similar to that of the CNN, suggesting that CNN-based methods may not be needed for mode classification tasks. The mode classifications from all four classifiers successfully approximated the known climatology of modes in the U.S., including a maximum in supercell occurrence in the U.S. Central Plains. Further, the modes also occurred in environments recognized to support the three different storm morphologies. Finally, storm mode provided useful information about hazard type (e.g., storm reports were most likely with supercells), further supporting the efficacy of the classifiers. Future applications, including the use of objective CAM mode classifications as a novel predictor in ML systems, could potentially lead to improved forecasts of convective hazards.
  4. In the era of big data, data-driven classification has become an essential method in smart manufacturing for guiding production and optimizing inspection. Industrial data obtained in practice are usually time series collected by soft sensors, which are highly nonlinear, nonstationary, imbalanced, and noisy. Most existing soft-sensing machine learning models focus on capturing either intra-series temporal dependencies or predefined inter-series correlations, while ignoring the correlations between labels, even though each instance is associated with multiple labels simultaneously. In this paper, we propose a novel graph-based soft-sensing neural network (GraSSNet) for multivariate time-series classification of noisy and highly imbalanced soft-sensing data. The proposed GraSSNet is able to 1) capture inter-series and intra-series dependencies jointly in the spectral domain; 2) exploit label correlations by superimposing a label graph built from statistical co-occurrence information; 3) learn features with an attention mechanism from both the textual and numerical domains; and 4) leverage unlabeled data and mitigate data imbalance through semi-supervised learning. Comparative studies with other commonly used classifiers are carried out on Seagate soft-sensing data, and the experimental results validate the competitive performance of the proposed method.
  5. This paper presents a unified framework that learns to quantify perceptual attributes (e.g., safety, attractiveness) of physical urban environments from crowd-sourced street-view photos without human annotations. This work makes two contributions. First, we collect a large-scale urban image dataset from several major cities in the U.S.A., with multiple street-view photos for every place. Instead of using subjective annotations as in previous works, which are neither accurate nor consistent, we collect for every place the safety score from government crime event records as an objective safety indicator. Second, we observe that the place-centric perception task is by nature a multi-instance regression problem, since labels are available only for places (bags) rather than for images or image regions (instances). We therefore introduce a deep convolutional neural network (CNN) to parameterize the instance-level scoring function, and develop an EM algorithm that alternately estimates the primary instances (images or image regions) affecting the safety scores and trains the proposed network. Our method is capable of localizing interesting images and image regions for each place. We evaluate the proposed method on a newly created dataset and a public dataset. Comparative results show that our method clearly outperforms alternative perception methods and, more importantly, can generate region-level safety scores that facilitate interpretation of the perception process.