skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Multi-scale Cell Instance Segmentation with Keypoint Graph based Bounding Boxes
Most existing methods handle cell instance segmentation problems directly without relying on additional detection boxes. These methods generally fails to separate touching cells due to the lack of global understanding of the objects. In contrast, box-based instance segmentation solves this problem by combining object detection with segmentation. However, existing methods typically utilize anchor box-based detectors, which would lead to inferior instance segmentation performance due to the class imbalance issue. In this paper, we propose a new box-based cell instance segmentation method. In particular, we first detect the five pre-defined points of a cell via keypoints detection. Then we group these points according to a keypoint graph and subsequently extract the bounding box for each cell. Finally, cell segmentation is performed on feature maps within the bounding boxes. We validate our method on two cell datasets with distinct object shapes, and empirically demonstrate the superiority of our method compared to other instance segmentation techniques.  more » « less
Award ID(s):
1747778
PAR ID:
10170440
Author(s) / Creator(s):
Date Published:
Journal Name:
MICCAI 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single model methods by a large margin and ranks first among all Lidar-only submissions. 
    more » « less
  2. Detecting small objects (e.g., manhole covers, license plates, and roadside milestones) in urban images is a long-standing challenge mainly due to the scale of small object and background clutter. Although convolution neural network (CNN)-based methods have made significant progress and achieved impressive results in generic object detection, the problem of small object detection remains unsolved. To address this challenge, in this study we developed an end-to-end network architecture that has three significant characteristics compared to previous works. First, we designed a backbone network module, namely Reduced Downsampling Network (RD-Net), to extract informative feature representations with high spatial resolutions and preserve local information for small objects. Second, we introduced an Adjustable Sample Selection (ADSS) module which frees the Intersection-over-Union (IoU) threshold hyperparameters and defines positive and negative training samples based on statistical characteristics between generated anchors and ground reference bounding boxes. Third, we incorporated the generalized Intersection-over-Union (GIoU) loss for bounding box regression, which efficiently bridges the gap between distance-based optimization loss and area-based evaluation metrics. We demonstrated the effectiveness of our method by performing extensive experiments on the public Urban Element Detection (UED) dataset acquired by Mobile Mapping Systems (MMS). The Average Precision (AP) of the proposed method was 81.71%, representing an improvement of 1.2% compared with the popular detection framework Faster R-CNN. 
    more » « less
  3. Integral imaging has proven useful for three-dimensional (3D) object visualization in adverse environmental conditions such as partial occlusion and low light. This paper considers the problem of 3D object tracking. Two-dimensional (2D) object tracking within a scene is an active research area. Several recent algorithms use object detection methods to obtain 2D bounding boxes around objects of interest in each frame. Then, one bounding box can be selected out of many for each object of interest using motion prediction algorithms. Many of these algorithms rely on images obtained using traditional 2D imaging systems. A growing literature demonstrates the advantage of using 3D integral imaging instead of traditional 2D imaging for object detection and visualization in adverse environmental conditions. Integral imaging’s depth sectioning ability has also proven beneficial for object detection and visualization. Integral imaging captures an object’s depth in addition to its 2D spatial position in each frame. A recent study uses integral imaging for the 3D reconstruction of the scene for object classification and utilizes the mutual information between the object’s bounding box in this 3D reconstructed scene and the 2D central perspective to achieve passive depth estimation. We build over this method by using Bayesian optimization to track the object’s depth in as few 3D reconstructions as possible. We study the performance of our approach on laboratory scenes with occluded objects moving in 3D and show that the proposed approach outperforms 2D object tracking. In our experimental setup, mutual information-based depth estimation with Bayesian optimization achieves depth tracking with as few as two 3D reconstructions per frame which corresponds to the theoretical minimum number of 3D reconstructions required for depth estimation. To the best of our knowledge, this is the first report on 3D object tracking using the proposed approach. 
    more » « less
  4. Leonardis, A; Ricci, E; Roth, S; Russakovsky, O; Sattler, T; Varol, G (Ed.)
    Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised instance detection and segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results in over-/under- segmentation and irrelevant objects. Inspired by human visual system and practical applications, we posit that the key missing cue for un- supervised detection is motion: objects of interest are typically mobile objects that frequently move and their motions can specify separate in- stances. In this paper, we propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only. We begin with instance pseudo- labels derived from motion segmentation, but introduce a novel training paradigm to progressively discover small objects and static-but-mobile objects that are missed by motion segmentation. As a result, though only learned from unlabeled videos, MOD-UV can detect and segment mo- bile objects from a single static image. Empirically, we achieve state-of- the-art performance in unsupervised mobile object detection on Waymo Open, nuScenes, and KITTI Datasets without using any external data or supervised models. Code is available at github.com/YihongSun/MOD-UV. 
    more » « less
  5. Embodied agents must detect and localize objects of interest, e.g. traffic participants for self-driving cars. Supervision in the form of bounding boxes for this task is extremely expensive. As such, prior work has looked at unsupervised instance detection and segmentation, but in the absence of annotated boxes, it is unclear how pixels must be grouped into objects and which objects are of interest. This results in over-/under- segmentation and irrelevant objects. Inspired by human visual system and practical applications, we posit that the key missing cue for un- supervised detection is motion: objects of interest are typically mobile objects that frequently move and their motions can specify separate in- stances. In this paper, we propose MOD-UV, a Mobile Object Detector learned from Unlabeled Videos only. We begin with instance pseudo- labels derived from motion segmentation, but introduce a novel training paradigm to progressively discover small objects and static-but-mobile objects that are missed by motion segmentation. As a result, though only learned from unlabeled videos, MOD-UV can detect and segment mo- bile objects from a single static image. Empirically, we achieve state-of- the-art performance in unsupervised mobile object detection on Waymo Open, nuScenes, and KITTI Datasets without using any external data or supervised models. Code is available at github.com/YihongSun/MOD-UV. 
    more » « less