Title: Unsupervised foreground extraction via deep region competition.
We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink foreground extraction by reconciling energy-based priors with generative image modeling in the form of a Mixture of Experts (MoE), where we further introduce learned pixel re-assignment as an essential inductive bias for capturing the regularities of background regions. With this modeling, the foreground/background partition can be naturally found through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach to generic image segmentation. Experiments demonstrate that DRC achieves more competitive performance on complex real-world data and challenging multi-object scenes than prior methods. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects, even from categories unseen during training.
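The abstract casts foreground extraction as fitting a pixel-level mixture and alternating E and M steps. As a hedged illustration of that EM structure only, here is a minimal sketch in which each "expert" is reduced to a diagonal-Gaussian color model; the actual DRC uses learned energy-based priors and generator networks, so the Gaussian experts and closed-form updates below are simplifying assumptions, not the paper's method.

```python
# Illustrative sketch only: per-pixel EM for a two-component
# (foreground/background) pixel mixture with diagonal-Gaussian experts.
import numpy as np

def em_partition(image, n_iters=20, eps=1e-8):
    """image: (H, W, 3) float array in [0, 1]."""
    H, W, C = image.shape
    x = image.reshape(-1, C)                    # pixels as rows
    resp = np.random.rand(x.shape[0], 2)        # random soft assignments
    resp /= resp.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # M-step: refit each expert's mean/variance from its pixels.
        w = resp.sum(axis=0) + eps              # effective counts
        mu = (resp.T @ x) / w[:, None]
        var = (resp.T @ (x ** 2)) / w[:, None] - mu ** 2 + eps
        log_pi = np.log(w / w.sum())
        # E-step: posterior responsibility of each expert per pixel
        # (diagonal-Gaussian log-likelihood + log mixing weight).
        ll = -0.5 * (((x[:, None, :] - mu) ** 2) / var
                     + np.log(2 * np.pi * var)).sum(-1)
        logits = ll + log_pi
        logits -= logits.max(axis=1, keepdims=True)
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)
    # Which component is "foreground" is arbitrary without a learned prior.
    return resp[:, 0].reshape(H, W)
```

The normalized posterior in the E-step is where the components compete for pixels, which is the mixture-level analogue of the region competition the abstract refers to.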
Award ID(s):
2015577
NSF-PAR ID:
10351314
Author(s) / Creator(s):
Date Published:
Journal Name:
Neural Information Processing Systems (NeurIPS 2021).
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. A self-driving car must be able to reliably handle adverse weather conditions (e.g., snowy) to operate safely. In this paper, we investigate the idea of turning sensor inputs (i.e., images) captured in an adverse condition into a benign one (i.e., sunny), upon which the downstream tasks (e.g., semantic segmentation) can attain high accuracy. Prior work primarily formulates this as an unpaired image-to-image translation problem due to the lack of paired images captured under the exact same camera poses and semantic layouts. While perfectly-aligned images are not available, one can easily obtain coarsely-paired images. For instance, many people drive the same routes daily in both good and adverse weather; thus, images captured at close-by GPS locations can form a pair. Though data from repeated traversals are unlikely to capture the same foreground objects, we posit that they provide rich contextual information to supervise the image translation model. To this end, we propose a novel training objective leveraging coarsely-aligned image pairs. We show that our coarsely-aligned training scheme leads to better image translation quality and improved downstream tasks, such as semantic segmentation, monocular depth estimation, and visual localization.
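The training objective itself is not spelled out in the abstract, so the following is only a speculative sketch of one way coarse pairs can supervise translation: a nearest-neighbor feature-matching loss that tolerates the pose and layout differences between traversals. `feat_net` is a hypothetical frozen feature extractor (e.g., a VGG backbone); none of these names come from the paper.

```python
# Speculative sketch: supervise a translated image with its coarsely
# aligned counterpart by matching features to nearest neighbors rather
# than pixelwise, so misaligned content is not penalized outright.
import torch
import torch.nn.functional as F

def coarse_pair_loss(feat_net, translated, coarse_target):
    """Both inputs: (B, 3, H, W) tensors of roughly the same scene."""
    fa = feat_net(translated)                   # (B, C, h, w)
    fb = feat_net(coarse_target)
    fa = F.normalize(fa.flatten(2), dim=1)      # (B, C, h*w), unit features
    fb = F.normalize(fb.flatten(2), dim=1)
    sim = torch.einsum('bcm,bcn->bmn', fa, fb)  # cosine similarity map
    # Each translated feature is scored against its best match in the
    # target, so shifted but semantically shared content still supervises.
    return (1.0 - sim.max(dim=2).values).mean()
```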
  2. In this paper, we explore the possibility of increasing training examples without laborious data collection and annotation for long-tailed instance segmentation. We find that an abundance of instance segments can potentially be obtained freely from object-centric images, according to two insights: (i) an object-centric image usually contains one salient object in a simple background; (ii) objects from the same class often share similar appearances or similar contrasts to the background. Motivated by these insights, we propose a simple and scalable framework, FREESEG, for extracting and leveraging these “free” object segments to facilitate model training. Concretely, we investigate the similarity among object-centric images of the same class to propose candidate segments of foreground instances, followed by a novel ranking of segment quality. The resulting high-quality object segments can then be used to augment existing long-tailed datasets, e.g., by copying and pasting the segments onto the original training images. Extensive experiments show that FREESEG yields substantial improvements on top of strong baselines and achieves state-of-the-art accuracy for segmenting rare object categories.
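The copy-paste augmentation step mentioned at the end is simple enough to spell out. A minimal sketch, assuming the FREESEG mining and ranking stages (the paper's actual contribution, not shown here) have already produced an RGBA segment crop:

```python
# Minimal sketch of copy-paste augmentation: a mined object segment
# is alpha-blended onto a training image to enrich a rare class.
import numpy as np

def paste_segment(image, segment, top, left):
    """image: (H, W, 3) uint8; segment: (h, w, 4) uint8 RGBA crop."""
    h, w = segment.shape[:2]
    alpha = segment[..., 3:4].astype(np.float32) / 255.0  # soft mask
    region = image[top:top + h, left:left + w].astype(np.float32)
    blended = alpha * segment[..., :3] + (1.0 - alpha) * region
    out = image.copy()
    out[top:top + h, left:left + w] = blended.astype(np.uint8)
    return out
```

The corresponding instance annotation (the segment's mask, offset by `top`/`left`) would be appended to the image's labels alongside the pixels.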
  3. Existing image inpainting methods typically fill holes by borrowing information from surrounding pixels. They often produce unsatisfactory results when the holes overlap with or touch foreground objects, due to a lack of information about the actual extent of the foreground and background regions within the holes. These scenarios, however, are very important in practice, especially for applications such as the removal of distracting objects. To address the problem, we propose a foreground-aware image inpainting system that explicitly disentangles structure inference from content completion. Specifically, our model learns to predict the foreground contour first, and then inpaints the missing region using the predicted contour as guidance. We show that through such disentanglement, the contour completion model predicts reasonable object contours and substantially improves the performance of image inpainting. Experiments show that our method significantly outperforms existing methods and achieves superior inpainting results on challenging cases with complex compositions.
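A hedged sketch of the disentangled inference flow described above: contour prediction first, then contour-guided completion. `contour_net` and `inpaint_net` stand in for the paper's two trained models, and their input/output conventions here are assumptions for illustration only.

```python
# Sketch of two-stage, foreground-aware inpainting inference.
import torch

@torch.no_grad()
def foreground_aware_inpaint(contour_net, inpaint_net, image, hole_mask):
    """image: (B, 3, H, W); hole_mask: (B, 1, H, W), 1 inside holes."""
    masked = image * (1.0 - hole_mask)          # remove hole content
    # Stage 1: complete the foreground contour across the hole.
    contour = contour_net(torch.cat([masked, hole_mask], dim=1))
    # Stage 2: the completed contour tells the inpainter where the
    # foreground ends, so texture does not bleed across object boundaries.
    filled = inpaint_net(torch.cat([masked, hole_mask, contour], dim=1))
    return image * (1.0 - hole_mask) + filled * hole_mask
```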
  4. Context. Recent developments in time-domain astronomy, such as the Zwicky Transient Facility (ZTF), have made it possible to conduct daily scans of the entire visible sky, leading to the discovery of hundreds of new transients every night. Among these detections, 10 to 15 objects per night are supernovae (SNe), which have to be classified prior to cosmological use. The Spectral Energy Distribution Machine (SEDM) is a low-resolution (ℛ ~ 100) integral field spectrograph designed, built, and operated with the aim of spectroscopically observing and classifying targets detected by the ZTF main camera. Aims. As the current pysedm pipeline can only handle isolated point sources, it is limited by contamination when the transient is too close to its host galaxy core. This can lead to incorrect typing and ultimately bias the cosmological analyses, affecting the homogeneity of the SN sample in terms of local environment properties. We present a new scene modeler to extract the transient spectrum from its structured background, with the aim of improving the typing efficiency of the SEDM. Methods. HyperGal is a fully chromatic scene modeler that uses archival pre-transient photometric images of the SN environment to generate a hyperspectral model of the host galaxy; it is based on the cigale SED fitter, used as a physically motivated spectral interpolator. The galaxy model, complemented by a point source for the transient and a diffuse background component, is projected onto the SEDM spectro-spatial observation space and adjusted to the observations, and the SN spectrum is ultimately extracted from this multi-component model. The full procedure, from scene modeling to transient spectrum extraction and typing, is validated on 5000 simulated cubes built from actual SEDM observations of isolated host galaxies, covering a broad range of observing conditions and scene parameters. Results. We introduce the contrast, c, as the transient-to-total flux ratio at the SN location, integrated over the ZTF r-band. From the estimated contrast distribution of real SEDM observations, we show that HyperGal correctly classifies ~95% of SNe Ia, and up to 99% for contrast c ≳ 0.2, representing more than 90% of the observations. Compared to the standard point-source extraction method (without the hyperspectral galaxy modeling step), HyperGal correctly classifies 20% more SNe Ia between 0.1 < c < 0.6 (50% of the observing conditions), with less than 5% SN Ia misidentifications. The false-positive rate is less than 2% for c > 0.1 (> 99% of the observations), half that of the standard extraction method. Assuming a similar contrast distribution for core-collapse SNe, HyperGal classifies 14% additional SNe II and 11% additional SNe Ibc. Conclusions. HyperGal has proven extremely effective at extracting and classifying SNe in the presence of strong contamination by the host galaxy, providing a significant improvement over single point-source extraction.
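The contrast statistic has a simple closed form given the fitted scene components. A minimal sketch, assuming the three r-band flux terms (transient point source, host galaxy model, diffuse background) at the SN location are available from the multi-component fit:

```python
# Sketch of the contrast statistic: transient-to-total flux ratio at
# the SN position, integrated over the ZTF r-band. The flux terms are
# assumed to come from the fitted scene model described above.
def contrast(f_sn_r, f_host_r, f_bkg_r):
    """All inputs: r-band fluxes evaluated at the SN location."""
    return f_sn_r / (f_sn_r + f_host_r + f_bkg_r)
```

For example, `contrast(1.0, 3.0, 1.0)` returns 0.2, the level above which the abstract reports up to 99% of SNe Ia are correctly classified.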
  5. In recent years, deep neural networks have achieved state-of-the-art performance in a variety of recognition and segmentation tasks in medical imaging, including brain tumor segmentation. We observe that brain tumor segmentation faces an imbalanced-data problem: the number of pixels belonging to the background class (non-tumor pixels) is much larger than the number of pixels belonging to the foreground class (tumor pixels). To address this problem, we propose a multitask network formed as a cascaded structure. Our model pursues two targets: (i) effectively differentiate brain tumor regions and (ii) estimate the brain tumor mask. The first objective is performed by our proposed contextual brain tumor detection network, which plays the role of an attention gate and focuses only on the region around the brain tumor while ignoring the distant background, which is less correlated with the tumor. Unlike existing object detection networks that process every pixel, our contextual brain tumor detection network processes only contextual regions around ground-truth instances; this strategy aims at producing meaningful region proposals. The second objective is built upon a 3D atrous residual network within an encoder-decoder architecture, in order to effectively segment both large and small objects (brain tumors). Our 3D atrous residual network is designed with a skip connection to enable the gradient from deep layers to be directly propagated to shallow layers; thus, features of different depths are preserved and used to refine each other. To incorporate larger contextual information from volumetric MRI data, our network utilizes 3D atrous convolutions with various kernel sizes, which enlarge the receptive field of the filters. Our proposed network has been evaluated on various datasets, including the BRATS2015, BRATS2017, and BRATS2018 datasets, with both validation and testing sets. Our performance has been benchmarked with both region-based and surface-based metrics. We have also conducted comparisons against state-of-the-art approaches.
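A hedged sketch of the 3D atrous residual block described above, in PyTorch: dilated 3D convolutions enlarge the receptive field over volumetric MRI, and the identity skip connection lets gradients reach shallow layers directly. The channel count, kernel size, and dilation below are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of a 3D atrous (dilated) residual block.
import torch
import torch.nn as nn

class AtrousResBlock3D(nn.Module):
    def __init__(self, channels, dilation=2):
        super().__init__()
        pad = dilation  # keeps spatial size for 3x3x3 dilated kernels
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=pad, dilation=dilation),
            nn.BatchNorm3d(channels),
            nn.ReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=pad, dilation=dilation),
            nn.BatchNorm3d(channels),
        )
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Identity skip: gradient flows around the dilated conv stack,
        # so shallow features are preserved and refined by deep ones.
        return self.act(self.body(x) + x)
```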