This content will become publicly available on October 31, 2024

Title: Large-scale urban building function mapping by integrating multi-source web-based geospatial data
Morphological (e.g. shape, size, and height) and functional (e.g. working, living, and shopping) information on buildings is highly needed for urban planning and management, as well as for other applications such as city-scale building energy use modeling. Because socio-economic geospatial data are of limited availability, mapping building functions is more challenging than mapping building morphology, especially over large areas. In this study, we proposed an integrated framework to map building functions in 50 U.S. cities by integrating multi-source web-based geospatial data. First, a web crawler was developed to extract Points of Interest (POIs) from Tripadvisor.com, and a map crawler was developed to extract POIs and land use parcels from Google Maps. Second, an unsupervised machine learning algorithm named OneClassSVM was used to identify residential buildings based on landscape features derived from Microsoft building footprints. Third, the type ratio of POIs and the area ratio of land use parcels were used to identify six non-residential functions (i.e. hospital, hotel, school, shop, restaurant, and office). The accuracy assessment indicates that the proposed framework performed well, with an average overall accuracy of 94% and a kappa coefficient of 0.63. With the worldwide coverage of Google Maps and Tripadvisor.com, the proposed framework is transferable to other cities around the world. The data products generated from this study are of great use for quantitative city-scale urban studies, such as building energy use modeling at the single-building level over large areas.
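The third step of the abstract can be illustrated with a minimal sketch. The abstract does not define the two indicators, so the definitions here are assumptions: "type ratio" is taken to be the share of each function among the POIs that fall inside one building, and "area ratio" the share of the building footprint covered by parcels of each land use type. The function names, the 0.5 threshold, and the sample data are hypothetical and are not taken from the study.

```python
from collections import Counter

# The six non-residential functions named in the abstract.
FUNCTIONS = ("hospital", "hotel", "school", "shop", "restaurant", "office")


def type_ratio(poi_labels):
    """Assumed definition: share of each function among the POIs in one building."""
    counts = Counter(label for label in poi_labels if label in FUNCTIONS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


def area_ratio(parcel_overlaps, footprint_area):
    """Assumed definition: share of the footprint covered by each land use type.

    parcel_overlaps maps a land use type to its overlapping area, in the same
    units as footprint_area.
    """
    if footprint_area <= 0:
        raise ValueError("footprint_area must be positive")
    return {use: area / footprint_area for use, area in parcel_overlaps.items()}


def assign_function(ratios, threshold=0.5):
    """Return the dominant function if its ratio reaches the (hypothetical) threshold."""
    if not ratios:
        return None
    label, value = max(ratios.items(), key=lambda item: item[1])
    return label if value >= threshold else None


# Hypothetical building containing four POIs.
pois = ["restaurant", "restaurant", "shop", "restaurant"]
print(assign_function(type_ratio(pois)))  # restaurant (3 of 4 POIs)
```

A building with no POIs yields an empty ratio dictionary and is left unassigned, which is where a parcel-based area ratio could serve as a fallback under the same thresholding rule.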
Award ID(s):
1854502
NSF-PAR ID:
10478447
Publisher / Repository:
Taylor & Francis Online
Journal Name:
Geo-spatial Information Science
ISSN:
1009-5020
Page Range / eLocation ID:
1 to 15
Subject(s) / Keyword(s):
Building functions, geospatial data, TripAdvisor, Google Static Maps
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Information on building types is highly needed for urban planning and management, especially in high-resolution building modeling in which buildings are the basic spatial unit. However, in many parts of the world, this information is still missing. In this paper, we proposed a framework to derive building type information using geospatial data, including point-of-interest (POI) data, building footprints, land use polygons, and roads, from Gaode and Baidu Maps. First, we used natural language processing (NLP)-based approaches (i.e., text similarity measurement and topic modeling) to automatically reclassify POI categories into categories that can be used to directly infer building types. Second, based on the relationship between building footprints and POIs, we identified building types using two indicators, type ratio and area ratio. The proposed framework was tested using over 440,000 building footprints in Beijing, China. Our NLP-based approaches and building type identification methods show overall accuracies of 89.0% and 78.2% and kappa coefficients of 0.83 and 0.71, respectively. The proposed framework is transferable to other Chinese cities for deriving building type information from web mapping platforms. The data products generated from this study are of great use for quantitative urban studies at the building level.
  2. High frequency and spatially explicit irrigated land maps are important for understanding the patterns and impacts of consumptive water use by agriculture. We built annual, 30 m resolution irrigation maps using Google Earth Engine for the years 1986–2018 for 11 western states within the conterminous U.S. Our map classifies lands into four classes: irrigated agriculture, dryland agriculture, uncultivated land, and wetlands. We built an extensive geospatial database of land cover from each class, including over 50,000 human-verified irrigated fields, 38,000 dryland fields, and over 500,000 km² of uncultivated lands. We used 60,000 point samples from 28 years to extract Landsat satellite imagery, as well as climate, meteorology, and terrain data to train a Random Forest classifier. Using a spatially independent validation dataset of 40,000 points, we found our classifier has an overall binary classification (irrigated vs. unirrigated) accuracy of 97.8%, and a four-class overall accuracy of 90.8%. We compared our results to Census of Agriculture irrigation estimates over the seven years of available data and found good overall agreement between the 2832 county-level estimates (r² = 0.90), and high agreement when estimates are aggregated to the state level (r² = 0.94). We analyzed trends over the 33-year study period, finding an increase of 15% (15,000 km²) in irrigated area in our study region. We found notable decreases in irrigated area in developing urban areas and in the southern Central Valley of California and increases in the plains of eastern Colorado, the Columbia River Basin, the Snake River Plain, and northern California.
  3. High-quality temperature data at fine spatio-temporal scales are critical for analyzing the risk of heat exposure and hazards in urban environments. The variability of urban landscapes makes cities a challenging environment for quantifying heat exposure. Most existing heat hazard studies have inherent limitations on two fronts: first, their spatio-temporal granularity is too coarse, and second, they track land surface temperature (LST) rather than ambient air temperature (AAT). Overcoming these limitations requires developing models for mapping the variability in heat exposure in urban environments. We investigated an integrated approach for mapping urban heat hazards by harnessing a diverse set of high-resolution measurements, including both ground-based and satellite-based temperature data. We mounted vehicle-borne mobile sensors on city buses to collect high-frequency temperature data throughout 2018 and 2019. Our research also incorporated key biophysical parameters and Landsat 8 LST data into Random Forest regression modeling to map the hyperlocal variability of heat hazard over areas not covered by the buses. The vehicle-borne temperature sensor data showed large temperature differences within the city, with the largest variations of up to 10 °C and morning-afternoon diurnal changes at a magnitude around 20 °C. Random Forest modeling on noontime (11:30 am – 12:30 pm) data to predict AAT produced accurate results with a mean absolute error of 0.29 °C and successfully showcased the enhanced granularity in urban heat hazard mapping. These maps revealed well-defined hyperlocal variabilities in AAT, which were not evident with other research approaches. Urban core and dense residential areas showed AAT differences of more than 5 °C from their nearby green spaces. The sensing framework developed in this study can be easily implemented in other urban areas, and findings from this study will be beneficial in understanding the heat vulnerabilities of individual communities. It can be used by local governments to devise targeted hazard mitigation efforts such as increasing green space, developing better heat safety policies, and issuing exposure warnings for workers.
  4. Recent advances in big spatial data acquisition and deep learning allow novel algorithms that were not possible several years ago. We introduce a novel inverse procedural modeling algorithm for urban areas that addresses the problem of spatial data quality and uncertainty. Our method is fully automatic and produces a 3D approximation of an urban area given satellite imagery and global-scale data, including road network, population, and elevation data. By analyzing the values and the distribution of urban data, e.g., parcels, buildings, population, and elevation, we construct a procedural approximation of a city at a large scale. Our approach has three main components: (1) procedural model generation to create parcel and building geometries, (2) parcel area estimation that trains neural networks to provide initial parcel sizes for a segmented satellite image of a city block, and (3) an optional optimization that can use partial knowledge of overall average building footprint area and building counts to improve results. We demonstrate and evaluate our approach on cities around the globe with widely different structures and automatically yield procedural models with up to 91,000 buildings and spanning up to 150 km². We obtain both a spatial arrangement of parcels and buildings similar to ground truth and a distribution of building sizes similar to ground truth, hence yielding a statistically similar synthetic urban space. We produce procedural models at multiple scales, with less than 1% error in parcel and building areas in the best case as compared to ground truth and 5.8% error on average for tested cities.
  5. Abstract: One-meter soil cores were taken to evaluate soil texture, bulk density, carbon and nitrogen pools, microbial biomass carbon and nitrogen content, microbial respiration, potential net nitrogen mineralization, potential net nitrification and inorganic nitrogen pools in 32 residential home lawns that differed by previous land use and age, but had similar soil types. These were compared to soils from 8 forested reference sites. Purpose: Soil cores were obtained from residential and forest sites in the Baltimore, MD, USA metropolitan area. The residential sites were mostly within the Gwynns Falls Watershed (-76.012008W, -77.314183E, 39.724847N, 38.708367S and approximately 17 km²). Lawns on residential sites were dominated by a variety of cool season turfgrasses. Forest soil cores were taken from permanent forest plots of the Baltimore Ecosystem Study (BES) LTER (Groffman et al. 2006). These remnant forests are over 100 years old with soils that were comparable in type and texture to those underlying the residential study sites. Soils from all sites were from the Manor series (coarse-loamy, micaceous, mesic Typic Dystrudepts), which are well-drained upland soils with loamy textures and bedrock at 5 to 10 feet below the soil surface. To aid the site selection process we used neighborhoods in the Baltimore City metropolitan area that have been mapped using HERCULES, a high resolution land cover classification system designed to assist in the study of human-ecological systems (Cadenasso et al. 2007). Using HERCULES and additional data sources, we identified residential sites that were similar except for single factors that we hypothesized to be important predictors of ecosystem dynamics. These factors included land use history (agriculture and forest, n = 10 and n = 22), housing density (low and medium/high, n = 9 and n = 23), and housing age (4 to 58 yrs old, n = 32). Housing age was acquired from the Maryland Property View database. Prior land use was determined based on land use change maps developed by integrating aerial photos from 1938, 1957, 1971, and 1999 into a geographic information system. Once a list of residential parcels meeting the predefined criteria was identified, we sent mailings to property owners chosen at random from each of the factor groups with the goal of recruiting 40 property owners for a 3 year study (of which this work is a part). We had recruited 32 property owners at the time that soil cores were obtained. Data have been published in Raciti et al. (2011a, 2011b).
    References:
    Cadenasso, M. L., S. T. A. Pickett, and K. Schwarz. 2007. Spatial heterogeneity in urban ecosystems: reconceptualizing land cover and a framework for classification. Frontiers in Ecology and the Environment 5:80-88.
    Groffman, P. M., R. V. Pouyat, M. L. Cadenasso, W. C. Zipperer, K. Szlavecz, I. D. Yesilonis, L. E. Band, and G. S. Brush. 2006. Land use context and natural soil controls on plant community composition and soil nitrogen and carbon dynamics in urban and rural forests. Forest Ecology and Management 236:177-192.
    Raciti, S. R., P. M. Groffman, J. C. Jenkins, R. V. Pouyat, and T. J. Fahey. 2011a. Controls on nitrate production and availability in residential soils. Ecological Applications: in press.
    Raciti, S. R., P. M. Groffman, J. C. Jenkins, R. V. Pouyat, T. J. Fahey, M. L. Cadenasso, and S. T. A. Pickett. 2011b. Accumulation of carbon and nitrogen in residential soils with different land use histories. Ecosystems 14:287-297.
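Several of the abstracts above report an overall accuracy together with a kappa coefficient. The sketch below computes both from a confusion matrix using the standard definition of Cohen's kappa. The two-class matrix is hypothetical and is not data from any of the studies. It only illustrates that, when one class dominates, a high overall accuracy can coexist with a noticeably lower kappa, which is one possible reading of such pairs of figures.

```python
def overall_accuracy_and_kappa(matrix):
    """Compute overall accuracy and Cohen's kappa from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i predicted as class j.
    """
    n = sum(sum(row) for row in matrix)
    if n == 0:
        raise ValueError("confusion matrix is empty")
    k = len(matrix)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(matrix[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical imbalanced example: 920 samples of class A, 80 of class B.
matrix = [[900, 20],
          [40, 40]]
oa, kappa = overall_accuracy_and_kappa(matrix)
print(f"{oa:.2f} {kappa:.2f}")  # 0.94 0.54
```

Here 94% of samples are classified correctly, but chance agreement alone is already about 87% because of the class imbalance, so kappa credits the classifier with only the improvement beyond that baseline.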