We propose a boundary-aware multi-task deep-learning-based framework for fast 3D building modeling from a single overhead image. Unlike most existing techniques, which rely on multiple images for 3D scene modeling, we seek to model the buildings in the scene from a single overhead image by jointly learning a modified signed distance function (SDF) from the building boundaries, a dense heightmap of the scene, and scene semantics. To jointly train for these tasks, we leverage pixel-wise semantic segmentation and normalized digital surface maps (nDSM) as supervision, in addition to labeled building outlines. At test time, buildings in the scene are automatically modeled in 3D using only an input overhead image. We demonstrate an increase in building modeling performance using a multi-feature network architecture that improves building outline detection by considering network features learned for the other jointly learned tasks. We also introduce a novel mechanism for robustly refining instance-specific building outlines using the learned modified SDF. We verify the effectiveness of our method on multiple large-scale satellite and aerial imagery datasets, where we obtain state-of-the-art performance in the 3D building reconstruction task.
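The joint training described above amounts to a shared encoder feeding three task-specific heads, one per supervision signal. A minimal PyTorch sketch of that multi-task structure follows; the layer sizes, loss weights, and class count are illustrative assumptions, not the paper's actual architecture:

```python
import torch
import torch.nn as nn

class MultiTaskBuildingNet(nn.Module):
    """Shared encoder with three heads: modified SDF, heightmap, semantics.
    A sketch of the multi-task idea; layer sizes are illustrative only."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        # One lightweight head per task; all share the encoder features,
        # which is what lets outline detection benefit from the other tasks.
        self.sdf_head = nn.Conv2d(128, 1, 1)             # modified signed distance
        self.height_head = nn.Conv2d(128, 1, 1)          # dense heightmap (nDSM)
        self.sem_head = nn.Conv2d(128, num_classes, 1)   # scene semantics

    def forward(self, img):
        feats = self.encoder(img)
        return self.sdf_head(feats), self.height_head(feats), self.sem_head(feats)

def joint_loss(pred, sdf_gt, height_gt, sem_gt, w=(1.0, 1.0, 1.0)):
    """Weighted sum of per-task losses; the weights w are assumed, not from the paper."""
    sdf, height, sem = pred
    return (w[0] * nn.functional.l1_loss(sdf, sdf_gt)
            + w[1] * nn.functional.l1_loss(height, height_gt)
            + w[2] * nn.functional.cross_entropy(sem, sem_gt))
```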
Generative Building Feature Estimation From Satellite Images
Urban and environmental researchers seek to obtain building features (e.g., building shapes, counts, and areas) at large scales. However, blurriness, occlusions, and noise in prevailing satellite images severely hinder the performance of image segmentation, super-resolution, and deep-learning-based translation networks. In this article, we combine globally available satellite images with spatial geometric feature datasets to create a generative modeling framework that yields significantly improved accuracy in per-building feature estimation and generates visually plausible building footprints. Our approach compensates for the degradation present in satellite images with a novel deep network design that combines segmentation, generative modeling, and adversarial learning for instance-level building features. Our method has proven its robustness through large-scale prototype experiments covering heterogeneous scenarios from dense urban to sparse rural areas. Results show better quality than advanced segmentation networks for urban and environmental planning, and show promise for future continental-scale urban applications.
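As a rough illustration of how segmentation, generative modeling, and adversarial learning can be combined for footprint generation, the sketch below runs one adversarial training step: a generator maps a degraded satellite patch to a footprint mask, and a discriminator judges (patch, mask) pairs. The networks and loss weighting here are placeholders chosen for brevity, not the article's actual design:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the article's networks; both are illustrative.
generator = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 1, 3, padding=1), nn.Sigmoid(),
)
discriminator = nn.Sequential(
    nn.Conv2d(4, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 3, stride=2, padding=1),
)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(patch, footprint_gt):
    """One adversarial step: real footprints vs. generated ones."""
    fake = generator(patch)
    # Discriminator: distinguish real (patch, gt) from fake (patch, generated).
    d_real = discriminator(torch.cat([patch, footprint_gt], dim=1))
    d_fake = discriminator(torch.cat([patch, fake.detach()], dim=1))
    loss_d = bce(d_real, torch.ones_like(d_real)) + bce(d_fake, torch.zeros_like(d_fake))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: fool the discriminator while matching the ground-truth mask.
    d_fake = discriminator(torch.cat([patch, fake], dim=1))
    loss_g = bce(d_fake, torch.ones_like(d_fake)) + nn.functional.l1_loss(fake, footprint_gt)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```

The L1 term keeps generated footprints tied to the supervision while the adversarial term pushes them toward visually plausible shapes.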
- Award ID(s): 2107096
- PAR ID: 10468113
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Geoscience and Remote Sensing
- Volume: 61
- ISSN: 0196-2892
- Page Range / eLocation ID: 1 to 13
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Image segmentation is a fundamental task that has benefited from recent advances in machine learning. One type of segmentation, of particular interest to computer vision, is urban segmentation. Although recent solutions have leveraged deep neural networks, approaches usually do not consider the regularities appearing in facade structures (e.g., windows often form groups with similar alignment, size, or spacing patterns), nor additional urban structures such as building footprints and roofs. Moreover, both satellite and street-view images are often noisy and occluded, so recovering the complete structure segmentation from a partial observation is difficult. Our key observations are that facades and other urban structures exhibit regular structures and that additional views are often available. In this paper, we present a novel framework (RFCNet) that consists of three modules to achieve multiple goals. Specifically, we propose Regularization to improve the regularities given an initial segmentation, Fusion to fuse multiple views of the segmentation, and Completion to infer the complete structure when necessary. Experimental results show that our method outperforms previous state-of-the-art methods quantitatively and qualitatively on multiple facade datasets. Furthermore, by applying our framework to other urban structures (e.g., building footprints and roofs), we demonstrate that our approach generalizes to various pattern types.
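One of RFCNet's three modules, Fusion, combines registered segmentations from multiple views. A much-simplified stand-in for that idea is a confidence-weighted average of aligned per-view probability maps; the actual module is learned, and the confidence weights here are an assumption for illustration:

```python
import torch

def fuse_views(seg_probs, confidences):
    """Confidence-weighted fusion of aligned per-view segmentation probabilities.
    seg_probs:   (V, C, H, W) softmax maps from V registered views.
    confidences: (V,) per-view weights (e.g., from occlusion estimates).
    A simplified sketch of the Fusion idea, not RFCNet's actual design."""
    w = confidences / confidences.sum()
    return (w.view(-1, 1, 1, 1) * seg_probs).sum(dim=0)  # fused (C, H, W) map
```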
Building an efficient and accurate pixel-level labeling framework for large-scale, high-resolution satellite imagery is an important machine learning application in remote sensing. Due to the very limited amount of ground-truth data, we employ a well-performing superpixel tessellation approach to segment the image into homogeneous regions and then use these irregularly shaped regions as the foundation for the dense labeling work. A deep model based on generative adversarial networks is trained to learn discriminating features from the image data without requiring any additional labeled information. In the subsequent classification step, we adopt the discriminator of this unsupervised model as a feature extractor and train a fast and robust support vector machine to assign the pixel-level labels. In the experiments, we evaluate our framework in terms of pixel-level classification accuracy on satellite imagery of different geographical types. The results show that our dense-labeling framework is very competitive with state-of-the-art methods that rely heavily on prior knowledge or other large-scale annotated datasets.
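The discriminator-as-feature-extractor pipeline can be sketched as follows: pooled activations from a trained discriminator become region features, and a linear SVM assigns labels. The discriminator layers and the choice of feature layer are assumptions made for illustration:

```python
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

# Hypothetical discriminator from an already-trained GAN; its pooled
# intermediate activations serve as unsupervised features.
disc_features = nn.Sequential(
    nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)

def features_for_regions(patches):
    """patches: (N, 3, H, W) superpixel-centered crops -> (N, 128) features."""
    with torch.no_grad():
        return disc_features(patches).numpy()

def train_and_label(labeled_patches, labels, all_patches):
    """Fit a fast linear SVM on the few labeled superpixels, then label every
    region (and hence every pixel each region covers)."""
    svm = LinearSVC()
    svm.fit(features_for_regions(labeled_patches), labels)
    return svm.predict(features_for_regions(all_patches))
```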
Recent advances in big spatial data acquisition and deep learning allow novel algorithms that were not possible several years ago. We introduce a novel inverse procedural modeling algorithm for urban areas that addresses the problem of spatial data quality and uncertainty. Our method is fully automatic and produces a 3D approximation of an urban area given satellite imagery and global-scale data, including road network, population, and elevation data. By analyzing the values and distribution of urban data, e.g., parcels, buildings, population, and elevation, we construct a procedural approximation of a city at large scale. Our approach has three main components: (1) procedural model generation to create parcel and building geometries, (2) parcel area estimation that trains neural networks to provide initial parcel sizes for a segmented satellite image of a city block, and (3) an optional optimization that can use partial knowledge of the overall average building footprint area and building counts to improve results. We demonstrate and evaluate our approach on cities around the globe with widely different structures, automatically yielding procedural models with up to 91,000 buildings and spanning up to 150 km². We obtain both a spatial arrangement of parcels and buildings and a distribution of building sizes similar to ground truth, hence yielding a statistically similar synthetic urban space. We produce procedural models at multiple scales, with less than 1% error in parcel and building areas in the best case compared to ground truth and 5.8% error on average for the tested cities.
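The optional optimization in component (3) can be pictured as calibrating generated footprint areas against known citywide statistics. A hypothetical simplification of that step (the paper's optimization is more involved than a single rescaling):

```python
def calibrate_footprints(areas, target_mean_area, target_count=None):
    """Rescale generated building areas so their mean matches a known
    citywide average; optionally keep only the target building count.
    A simplified, assumed stand-in for the paper's optional optimization."""
    scale = target_mean_area / (sum(areas) / len(areas))
    calibrated = [a * scale for a in areas]
    if target_count is not None:
        # Keep the largest buildings up to the known count (assumed policy).
        calibrated = sorted(calibrated, reverse=True)[:target_count]
    return calibrated
```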
Collecting large-scale medical datasets with fully annotated samples for training deep networks is prohibitively expensive, especially for 3D volume data. Recent breakthroughs in self-supervised learning (SSL) offer the ability to overcome the lack of labeled training samples by learning feature representations from unlabeled data. However, most current SSL techniques in the medical field have been designed for either 2D images or 3D volumes. In practice, this restricts the capability to fully leverage unlabeled data from numerous sources, which may include both 2D and 3D data, and it constrains the use of these pre-trained networks to downstream tasks with compatible data dimensions. In this paper, we propose a novel framework for unsupervised joint learning on 2D and 3D data modalities. Given a set of 2D images or 2D slices extracted from 3D volumes, we construct an SSL task based on a 2D contrastive clustering problem for distinct classes. The 3D volumes are exploited by computing a vectored embedding at each slice and then assembling a holistic feature through deformable self-attention mechanisms in a Transformer, allowing the incorporation of long-range dependencies between slices inside 3D volumes. These holistic features are further used to define a novel 3D clustering-agreement-based SSL task and a masked embedding prediction task inspired by pre-trained language models. Experiments on downstream tasks, such as 3D brain segmentation, lung nodule detection, 3D heart structure segmentation, and abnormal chest X-ray detection, demonstrate the effectiveness of our joint 2D and 3D SSL approach. We improve plain 2D DeepClusterV2 and SwAV by a significant margin and also surpass various modern 2D and 3D SSL approaches.
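The slice-embedding-plus-attention aggregation can be sketched as below. Note the hedge: the paper uses deformable self-attention in a Transformer, which the standard multi-head attention here only approximates, and all layer sizes are assumed:

```python
import torch
import torch.nn as nn

class SliceToVolume(nn.Module):
    """Embed each 2D slice, then aggregate slices with self-attention into a
    holistic 3D-volume feature. A sketch of the idea only."""
    def __init__(self, dim=128):
        super().__init__()
        self.slice_encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, dim),
        )
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, volume):
        # volume: (S, 1, H, W) stack of slices -> (dim,) holistic feature
        emb = self.slice_encoder(volume).unsqueeze(0)  # (1, S, dim)
        fused, _ = self.attn(emb, emb, emb)            # long-range slice mixing
        return fused.mean(dim=1).squeeze(0)
```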