Title: Learning from counting: Leveraging temporal classification for weakly supervised object localization and detection
This paper presents a new approach that leverages temporal classification to support weakly supervised object detection (WSOD). Specifically, we introduce raster scan-order techniques to serialize 2D images into 1D sequence data, and then leverage a combined LSTM (Long Short-Term Memory) and CTC (Connectionist Temporal Classification) network to achieve object localization based on a total count of objects of interest. We term our proposed network LSTM-CCTC (Count-based CTC). This "learning from counting" strategy differs from existing WSOD methods in that our approach automatically identifies critical points on or near a target object, which significantly reduces the need to generate a large number of candidate proposals for object localization. Experiments show that our method yields state-of-the-art performance in an evaluation on the PASCAL VOC datasets.
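As a rough illustration of the serialization step described above (a minimal sketch, not the authors' implementation; the map shape and values are hypothetical), a 2D feature map can be flattened in raster scan order, row by row, so that each spatial location becomes one timestep of the 1D sequence fed to the LSTM:

```python
import numpy as np

# Hypothetical 4x4 feature map with 3 channels; values are arbitrary.
h, w, c = 4, 4, 3
feature_map = np.arange(h * w * c).reshape(h, w, c)

# Raster scan order: traverse rows left-to-right, top-to-bottom.
# Each spatial location becomes one length-c timestep for the LSTM,
# giving a sequence of shape (h*w, c).
sequence = feature_map.reshape(h * w, c)

print(sequence.shape)        # (16, 3)
print(sequence[5].tolist())  # features at pixel (row 1, col 1)
```

Because numpy arrays are row-major (C order) by default, a plain `reshape` already yields raster scan order with no explicit loop.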
Award ID(s):
1853864
PAR ID:
10276655
Author(s) / Creator(s):
Date Published:
Journal Name:
2020 British Machine Vision Conference (BMVC2020)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The need to efficiently find the video content a user wants is growing because of the explosion of user-generated videos on the Web. Existing keyword-based or content-based video retrieval methods usually determine what occurs in a video, but not when and where. In this paper, we answer the question of when and where by formulating a new task, namely spatio-temporal video re-localization. Specifically, given a query video and a reference video, spatio-temporal video re-localization aims to localize tubelets in the reference video such that the tubelets semantically correspond to the query. To accurately localize the desired tubelets in the reference video, we propose a novel warp LSTM network, which propagates spatio-temporal information over a long period and thereby captures the corresponding long-term dependencies. Another issue for spatio-temporal video re-localization is the lack of properly labeled video datasets. Therefore, we reorganize the videos in the AVA dataset to form a new dataset for spatio-temporal video re-localization research. Extensive experimental results show that the proposed model achieves superior performance over the designed baselines on the spatio-temporal video re-localization task.
  2. ABSTRACT Microfluidic devices (MDs) present a novel method for detecting circulating tumor cells (CTCs), enhancing the process through targeted techniques and visual inspection. However, current approaches often yield heterogeneous CTC populations, necessitating additional processing for comprehensive analysis and phenotype identification. These procedures are often expensive and time-consuming, and must be performed by skilled technicians. In this study, we investigate the potential of a cost-effective and efficient hyperuniform micropost MD approach for CTC classification. Our approach combines mathematical modeling of fluid–structure interactions in a simulated microfluidic channel with machine learning techniques. Specifically, we developed a cell-based modeling framework to assess CTC dynamics in erythrocyte-laden plasma flow, generating a large dataset of CTC trajectories that account for two distinct CTC phenotypes. A convolutional neural network (CNN) and a recurrent neural network (RNN) were then employed to analyze the dataset and classify these phenotypes. The results demonstrate the potential effectiveness of the hyperuniform micropost MD design and analysis approach in distinguishing between different CTC phenotypes based on cell trajectory, offering a promising avenue for early cancer detection.
  3. There is considerable interest in AI systems that can assist a cardiologist in diagnosing echocardiograms, and that can also be used to train residents in classifying echocardiograms. Prior work has focused on the analysis of a single frame. Classifying echocardiograms at the video level is challenging due to intra-frame and inter-frame noise. We propose a two-stream deep network that learns from the spatial context and optical flow for the classification of echocardiography videos. Each stream contains two parts: a Convolutional Neural Network (CNN) for spatial features and a bi-directional Long Short-Term Memory (LSTM) network with Attention for temporal features. The features from these two streams are fused for classification. We verify our experimental results on a dataset of 170 videos (80 normal and 90 abnormal) that have been manually labeled by trained cardiologists. Our method provides an overall accuracy of 91.18%, with a sensitivity of 94.11% and a specificity of 88.24%.
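For reference, the sensitivity and specificity figures reported above follow directly from a binary confusion matrix; a minimal sketch with hypothetical counts (illustrative numbers, not the paper's actual confusion matrix):

```python
# Hypothetical confusion-matrix counts for a binary (abnormal vs. normal)
# video classifier; these numbers are illustrative, not from the paper.
tp, fn = 45, 5   # abnormal videos correctly / incorrectly classified
tn, fp = 40, 10  # normal videos correctly / incorrectly classified

sensitivity = tp / (tp + fn)                 # true positive rate
specificity = tn / (tn + fp)                 # true negative rate
accuracy = (tp + tn) / (tp + tn + fp + fn)   # overall fraction correct

print(f"sensitivity={sensitivity:.2%}, "
      f"specificity={specificity:.2%}, accuracy={accuracy:.2%}")
```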
  4. In this work, we investigate the problem of level curve tracking in unknown scalar fields using a limited number of mobile robots. We design and implement a long short-term memory (LSTM) enabled control strategy for a mobile sensor network to detect and track desired level curves. Building on existing work on the cooperative Kalman filter, we design an LSTM-enhanced Kalman filter that uses the sensor measurements and a sequence of past fields and gradients to estimate the current field value and gradient. We also design an LSTM model to estimate the Hessian of the field. The LSTM-enabled strategy offers several benefits: it can be trained offline on a collection of level curves in known fields prior to deployment, so that the trained model enables the mobile sensor network to track level curves in unknown fields for various applications. Another benefit is that we can train with larger computational resources to obtain more accurate models, while using only limited resources when the mobile sensor network is deployed in production. Simulation results show that this LSTM-enabled control strategy successfully tracks the level curve using a mobile multi-robot sensor network.
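For context on the base filter being enhanced, the scalar Kalman measurement update fuses a prior field estimate with a noisy sensor reading; a minimal textbook sketch (generic form with illustrative numbers, not the paper's cooperative or LSTM-enhanced formulation):

```python
# Minimal scalar Kalman measurement update: fuse a prior field estimate
# with a noisy sensor reading. Generic textbook form, not the paper's
# cooperative/LSTM-enhanced filter; all numbers are illustrative.
def kalman_update(x_prior, p_prior, z, r):
    """x_prior: prior estimate; p_prior: prior variance;
    z: measurement; r: measurement noise variance."""
    k = p_prior / (p_prior + r)           # Kalman gain in [0, 1]
    x_post = x_prior + k * (z - x_prior)  # estimate moves toward z
    p_post = (1.0 - k) * p_prior          # uncertainty shrinks
    return x_post, p_post

x, p = kalman_update(x_prior=1.0, p_prior=4.0, z=2.0, r=1.0)
print(x, p)  # 1.8 0.8: estimate pulled toward the measurement
```

With a confident prior (small `p_prior`) the gain is small and the measurement is largely ignored; with an uncertain prior the estimate tracks the measurement closely.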
  5. Messinger, David W.; Velez-Reyes, Miguel (Ed.)
    Recently, multispectral and hyperspectral data fusion models based on deep learning have been proposed to generate images with high spatial and spectral resolution. The general objective is to obtain images that improve spatial resolution while preserving high spectral content. In this work, two deep learning data fusion techniques are characterized in terms of classification accuracy. These methods fuse a high spatial resolution multispectral image with a lower spatial resolution hyperspectral image to generate a high spatial-spectral hyperspectral image. The first model is based on a multi-scale long short-term memory (LSTM) network. The LSTM approach performs the fusion using a multi-step process that transitions from low to high spatial resolution via an intermediate step capable of reducing spatial information loss while preserving spectral content. The second fusion model is based on a convolutional neural network (CNN) data fusion approach. We present fused images using four multi-source datasets with different spatial and spectral resolutions. Both models provide fused images with spatial resolution increased from 8 m to 1 m. The fused images obtained with the two models are evaluated in terms of classification accuracy with several classifiers: Minimum Distance, Support Vector Machines, Class-Dependent Sparse Representation, and CNN classification. The classification results show better performance in both overall and average accuracy for the images generated with the multi-scale LSTM fusion over the CNN fusion.
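Of the classifiers listed, the Minimum Distance classifier is the simplest: each pixel spectrum is assigned to the class whose mean spectrum is nearest in Euclidean distance. A minimal sketch with illustrative class means and pixel spectra (not the datasets used in the paper):

```python
import numpy as np

# Minimum-distance classifier: assign each pixel spectrum to the class
# whose mean spectrum is nearest in Euclidean distance.
# Class means and pixels below are illustrative, not from the paper.
class_means = np.array([[0.1, 0.2, 0.3],   # class 0 mean spectrum
                        [0.8, 0.7, 0.6]])  # class 1 mean spectrum

pixels = np.array([[0.15, 0.25, 0.35],
                   [0.75, 0.65, 0.55]])

# Broadcast to a (n_pixels, n_classes) distance matrix; argmin picks
# the nearest class mean per pixel.
dists = np.linalg.norm(pixels[:, None, :] - class_means[None, :, :], axis=2)
labels = dists.argmin(axis=1)
print(labels.tolist())  # [0, 1]
```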