Title: Consensus Subspace Clustering
One significant challenge in supervised deep learning is the lack of large-scale labeled datasets for many problems. In this paper, we propose Consensus Spectral Clustering (CSC), which leverages the strengths of convolutional autoencoders and spectral clustering to provide pseudo-labels for image data. These pseudo-labels can serve as weakly labeled data for training and evaluating classifiers that require supervision. The primary weakness of previous works lies in their inability to isolate the object of interest in an image and cluster similar images together. We address these issues by denoising input images to remove pixels that do not contain data pertinent to the target. Additionally, we introduce a voting method for label selection that improves the clustering results. Extensive experiments on several benchmark datasets demonstrate that the proposed CSC method achieves performance competitive with state-of-the-art methods.
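As an illustration of the consensus idea, the voting step can be sketched as follows: several spectral clusterings (here differing only in random seed) are aligned with Hungarian matching and combined by per-sample majority vote to yield pseudo-labels. This is a minimal sketch, not the authors' implementation; the helper names, the use of scikit-learn's SpectralClustering, and the seed-based ensemble are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import SpectralClustering

def align_labels(reference, labels, n_clusters):
    """Relabel `labels` so its cluster ids best overlap `reference`
    (Hungarian matching on the contingency counts)."""
    cost = np.zeros((n_clusters, n_clusters))
    for i in range(n_clusters):
        for j in range(n_clusters):
            cost[i, j] = -np.sum((reference == i) & (labels == j))
    row, col = linear_sum_assignment(cost)          # maximizes total overlap
    mapping = {c: r for r, c in zip(row, col)}
    return np.array([mapping[l] for l in labels])

def consensus_pseudo_labels(embeddings, n_clusters=10, n_runs=5):
    """Run spectral clustering several times, align the labelings,
    and take a per-sample majority vote as the pseudo-label."""
    runs = [SpectralClustering(n_clusters=n_clusters,
                               affinity="nearest_neighbors",
                               random_state=seed).fit_predict(embeddings)
            for seed in range(n_runs)]
    aligned = [runs[0]] + [align_labels(runs[0], r, n_clusters) for r in runs[1:]]
    votes = np.stack(aligned)                       # shape: (n_runs, n_samples)
    return np.apply_along_axis(
        lambda v: np.bincount(v, minlength=n_clusters).argmax(), 0, votes)
```

In practice the `embeddings` would come from the convolutional autoencoder; here any feature matrix works.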
Award ID(s):
1909707
NSF-PAR ID:
10342433
Journal Name:
IEEE International Conference on Tools with Artificial Intelligence
Volume:
33
Page Range / eLocation ID:
391 to 395
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Collecting large annotated datasets in Remote Sensing is often expensive and thus can become a major obstacle for training advanced machine learning models. Common techniques for addressing this issue, based on the underlying idea of pre-training Deep Neural Networks (DNN) on freely available large datasets, cannot be used for Remote Sensing due to the unavailability of such large-scale labeled datasets and the heterogeneity of data sources caused by the varying spatial and spectral resolution of different sensors. Self-supervised learning is an alternative approach that learns feature representations from unlabeled images without using any human annotations. In this paper, we introduce a new method for land cover mapping that uses a clustering-based pretext task for self-supervised learning. We demonstrate the effectiveness of the method on two societally relevant applications in terms of segmentation performance, discriminative feature representation learning, and the underlying cluster structure. We also show the effectiveness of active sampling, using the clusters obtained from our method, in improving mapping accuracy given a limited annotation budget.
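    A clustering-based pretext task can be illustrated in miniature: cluster unlabeled features, then train a predictor on the resulting pseudo-labels. The sketch below uses K-means and a logistic-regression head as stand-ins for the paper's clustering step and DNN; the function name and parameters are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def pretext_round(features, n_clusters=8, seed=0):
    """One round of a clustering pretext task: cluster the (unlabeled)
    features, then fit a classifier on the resulting pseudo-labels.
    The logistic-regression head stands in for a DNN."""
    pseudo = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(features)
    head = LogisticRegression(max_iter=1000).fit(features, pseudo)
    return pseudo, head
```

In the full method this loop would alternate with representation updates; the sketch shows a single round.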
  2. As the cost of labeling data continues to rise, we aim to make full use of the large amount of unlabeled data and improve image classification by adding unlabeled samples to training. In addition, we aim to unify two tasks: clustering the unlabeled data and recognizing the query image. We achieve this by designing a novel sparse model based on the manifold assumption, which has been proven to work well in many tasks. Based on the assumptions that images of the same class lie on a sub-manifold and that an image can be approximately represented as a linear combination of its neighboring data, due to the local linearity of the manifold, we propose a sparse representation model on the manifold. Specifically, there are two regularizations: a variant of the trace lasso norm and a manifold Laplacian regularization. The first term encourages representation coefficients that are sparse between groups and dense within a group. The second term is the manifold Laplacian regularization, by which labels can be accurately propagated from labeled data to unlabeled data. An Augmented Lagrange Multiplier (ALM) scheme and a Gauss-Seidel Alternating Direction Method of Multipliers (GS-ADMM) are given to solve the problem numerically. We conduct experiments on three human face databases and compare the proposed method with several state-of-the-art methods. For each subject, some labeled face images are randomly chosen for training the supervised methods, and a small number of unlabeled images are added to form the training set of the proposed approach. All experiments show that our method achieves better classification results due to the addition of unlabeled samples.
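    The role of the manifold Laplacian term, propagating labels from labeled to unlabeled data over a neighborhood graph, can be sketched with a simplified closed-form propagation on a kNN graph. This is an illustrative stand-in, not the paper's trace-lasso model or its GS-ADMM solver; all names and parameters below are assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def propagate_labels(X, y, n_labeled, n_classes, k=5, alpha=0.9):
    """Propagate the labels of the first `n_labeled` points to the rest
    over a symmetrized kNN graph (closed-form Laplacian smoothing)."""
    W = kneighbors_graph(X, k, mode="connectivity").toarray()
    W = np.maximum(W, W.T)                        # symmetrize the graph
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))               # D^{-1/2} W D^{-1/2}
    Y = np.zeros((len(X), n_classes))
    Y[np.arange(n_labeled), y[:n_labeled]] = 1.0  # seed the labeled points
    F = np.linalg.solve(np.eye(len(X)) - alpha * S, Y)
    return F.argmax(axis=1)
```

The closed form solves (I - alpha*S)F = Y, so label mass diffuses along the graph while staying anchored at the labeled points.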
  3. Abstract Motivation

    Multispectral biological fluorescence microscopy has enabled the identification of multiple targets in complex samples. The accuracy of the unmixing result degrades (i) as the number of fluorophores used in an experiment increases and (ii) as the signal-to-noise ratio in the recorded images decreases. Further, the availability of prior knowledge regarding the expected spatial distributions of fluorophores in images of labeled cells provides an opportunity to improve the accuracy of fluorophore identification and abundance estimation.

    Results

    We propose a regularized sparse and low-rank Poisson regression unmixing approach (SL-PRU) to deconvolve spectral images labeled with highly overlapping fluorophores that are recorded in low signal-to-noise regimes. First, SL-PRU employs multiple penalty terms to pursue sparsity and spatial correlation of the resulting abundances in small neighborhoods simultaneously. Second, SL-PRU uses Poisson regression for unmixing instead of least squares regression to better estimate photon abundance. Third, we propose a method to tune the SL-PRU parameters involved in the unmixing procedure in the absence of ground-truth abundance information for a recorded image. By validating on simulated and real-world images, we show that our proposed method leads to improved accuracy in unmixing fluorophores with highly overlapping spectra.
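    The Poisson-regression core of the unmixing step can be sketched with multiplicative (Richardson-Lucy style) updates, which increase the Poisson likelihood under nonnegativity constraints. The sketch omits SL-PRU's sparsity and low-rank penalties and its parameter tuning, and is an assumption-laden illustration rather than the released MATLAB implementation.

```python
import numpy as np

def poisson_unmix(y, S, n_iter=500):
    """Estimate nonnegative abundances a for y ~ Poisson(S @ a) using
    multiplicative Richardson-Lucy updates. y: per-channel photon counts,
    S: (channels x fluorophores) spectral signature matrix."""
    a = np.full(S.shape[1], y.sum() / S.sum())      # positive initialization
    for _ in range(n_iter):
        # each update multiplies a by S^T(y / Sa) / S^T 1, keeping a >= 0
        a = a * (S.T @ (y / (S @ a + 1e-12))) / (S.sum(axis=0) + 1e-12)
    return a
```

With noise-free, identifiable data these updates converge toward the exact abundances; with real counts they converge to the Poisson maximum-likelihood estimate.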

    Availability and implementation

    The source code used for this article was written in MATLAB and is available with the test data at https://github.com/WANGRUOGU/SL-PRU.

  4. Abstract In this article, we describe a modified implementation of Mask Region-based Convolutional Neural Networks (Mask-RCNN) for cosmic ray muon clustering in a liquid argon TPC, applied to MicroBooNE neutrino data. Our implementation of this network, called sMask-RCNN, uses sparse submanifold convolutions to increase processing speed on sparse datasets and is compared to the original dense version on several metrics. The networks are trained to take wire-readout images from the MicroBooNE liquid argon time projection chamber as input and produce individually labeled particle interactions within the image. These outputs are identified as either cosmic ray muon or electron neutrino interactions. We find that sMask-RCNN has an average pixel clustering efficiency of 85.9%, compared to the dense network's 89.1%. We demonstrate the ability of sMask-RCNN, used in conjunction with MicroBooNE's state-of-the-art Wire-Cell cosmic tagger, to veto events containing only cosmic ray muons. The addition of sMask-RCNN to the Wire-Cell cosmic tagger removes 70% of the remaining cosmic ray muon background events at the same electron neutrino signal efficiency. This event veto can provide 99.7% rejection of cosmic-ray-only background events while maintaining an electron neutrino event-level signal efficiency of 80.1%. Beyond cosmic ray muon identification, sMask-RCNN could be used to extract features and identify different particle interaction types in other 3D-tracking detectors.
  5.
    High-throughput phenotyping enables the efficient collection of plant trait data at scale. One example involves using imaging systems over key phases of a crop growing season. Although the resulting images provide rich data for statistical analyses of plant phenotypes, image processing for trait extraction is a required prerequisite. Current methods for trait extraction are mainly based on supervised learning with human-labeled data or semi-supervised learning with a mixture of human-labeled and unlabeled data. Unfortunately, preparing a sufficiently large training dataset is both time- and labor-intensive. We describe a self-supervised pipeline (KAT4IA) that uses K-means clustering on greenhouse images to construct training data for extracting and analyzing plant traits from an image-based field phenotyping system. The KAT4IA pipeline includes these main steps: self-supervised training set construction, plant segmentation from images of field-grown plants, automatic separation of target plants, calculation of plant traits, and functional curve fitting of the extracted traits. To deal with the challenge of separating target plants from noisy backgrounds in field images, we describe a novel approach using row-cuts and column-cuts on images segmented by transform domain neural network learning, which utilizes plant pixels identified from greenhouse images to train a segmentation model for field images. This approach is efficient and does not require human intervention. Our results show that KAT4IA is able to accurately extract plant pixels and estimate plant heights.
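    The self-supervised training-set construction can be illustrated with a toy version of its K-means step: cluster pixel colors and keep the cluster that looks most plant-like by an excess-green index. The function name, the ExG heuristic, and the parameters are illustrative assumptions, not the KAT4IA code.

```python
import numpy as np
from sklearn.cluster import KMeans

def segment_plant_pixels(image, n_clusters=3, seed=0):
    """K-means over RGB pixels; the cluster with the highest mean
    excess-green index (2G - R - B) is taken as the plant mask."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(pixels)
    exg = 2 * pixels[:, 1] - pixels[:, 0] - pixels[:, 2]
    plant = max(range(n_clusters), key=lambda c: exg[labels == c].mean())
    return (labels == plant).reshape(h, w)
```

Masks produced this way from controlled greenhouse images could then serve as pseudo-labels for training a field-image segmentation model, mirroring the pipeline's first step.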