- Award ID(s): 2019609
- NSF-PAR ID: 10231457
- Date Published:
- Journal Name: Geoscientific Model Development
- Volume: 13
- Issue: 12
- ISSN: 1991-9603
- Page Range / eLocation ID: 6149 to 6164
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Many spatial analysis methods suffer from the scaling issue identified as part of the Modifiable Areal Unit Problem (MAUP). This article introduces the Pyramid Model (PM), a hierarchical data framework integrating space and spatial scale in a 3D environment to support multi-scale analysis. The utility of the PM is tested in examining quadrat density and kernel density, which are commonly used measures of point patterns. The two metrics computed from a simulated point set with varying scaling parameters (i.e. quadrats and bandwidths) are represented in the PM. The PM permits examination of the variation of the density metrics computed at all different scales. 3D visualization techniques (e.g. volume display, isosurfaces, and slicing) allow users to observe nested relations between spatial patterns at different scales and understand the scaling issue and MAUP in spatial analysis. A tool with interactive controls is developed to support visual exploration of the internal patterns in the PM. In addition to the point pattern measures, the PM has potential in analyzing other spatial indices, such as spatial autocorrelation indicators, coefficients of regression analysis and accuracy measures of spatial models. The implementation of the PM further advances the development of a multi-scale framework for spatio-temporal analysis.
- The joint analysis of spatial and temporal processes poses computational challenges due to the data's high dimensionality. Furthermore, such data are commonly non-Gaussian. In this paper, we introduce a copula-based spatiotemporal model for analyzing spatiotemporal data and propose a semiparametric estimator. The proposed algorithm is computationally simple, since it models the marginal distribution and the spatiotemporal dependence separately. Instead of assuming a parametric distribution, the proposed method models the marginal distributions nonparametrically and thus offers more flexibility. The method also provides a convenient way to construct both point and interval predictions at new times and locations, based on the estimated conditional quantiles. Through a simulation study and an analysis of wind speeds observed along the border between Oregon and Washington, we show that our method produces more accurate point and interval predictions for skewed data than those based on normality assumptions.
- The rapid urbanisation of China has drawn growing attention to its urban residential environments. In this article, we model the spatial heterogeneity of housing prices and explore the spatial discrepancy of landscape effects on property values in Shenzhen, a large Chinese city. In contrast to previous studies, this paper integrates the official housing transaction records and housing attributes from open data along with field surveys. Then, the results using the hedonic price model (HPM), geographically weighted regression (GWR) without landscape metrics and GWR with landscape metrics are compared. The results show that GWR with landscape metrics outperforms the other two models. In summary, this research provides new insights into landscape metrics in real estate studies and can help decision-makers plan and design cities while also providing guidance to regulate and control urban property values based on local conditions.
- Kernel survival analysis models estimate individual survival distributions with the help of a kernel function, which measures the similarity between any two data points. Such a kernel function can be learned using deep kernel survival models. In this paper, we present a new deep kernel survival model called a survival kernet, which scales to large datasets in a manner that is amenable to model interpretation and also theoretical analysis. Specifically, the training data are partitioned into clusters based on a recently developed training set compression scheme for classification and regression called kernel netting that we extend to the survival analysis setting. At test time, each data point is represented as a weighted combination of these clusters, and each such cluster can be visualized. For a special case of survival kernets, we establish a finite-sample error bound on predicted survival distributions that is, up to a log factor, optimal. Whereas scalability at test time is achieved using the aforementioned kernel netting compression strategy, scalability during training is achieved by a warm-start procedure based on tree ensembles such as XGBoost and a heuristic approach to accelerating neural architecture search. On four standard survival analysis datasets of varying sizes (up to roughly 3 million data points), we show that survival kernets are highly competitive compared to various baselines tested in terms of time-dependent concordance index. Our code is available at: https://github.com/georgehc/survival-kernets
- Geographically Weighted Regression (GWR) is a widely used tool for exploring spatial heterogeneity of processes over geographic space. GWR computes location-specific parameter estimates, which makes its calibration process computationally intensive. Current open-source GWR software can handle at most approximately 15,000 observations on a standard desktop. In the era of big data, this places a severe limitation on the use of GWR. To overcome this limitation, we propose a highly scalable, open-source FastGWR implementation based on Python and the Message Passing Interface (MPI) that scales to the order of millions of observations. FastGWR optimizes memory usage along with parallelization to boost performance significantly. To illustrate the performance of FastGWR, a hedonic house price model is calibrated on approximately 1.3 million single-family residential properties from a Zillow dataset for the city of Los Angeles, the first effort to apply GWR to a dataset of this size. The results show that FastGWR scales linearly as the number of cores within the High-Performance Computing (HPC) environment increases. It also outperforms currently available open-source GWR software packages on a standard desktop, with drastic reductions in run time (up to thousands of times faster).
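
To illustrate why the location-specific estimates mentioned in the GWR abstract above are costly, the following is a minimal NumPy sketch of a basic GWR calibration. It is not the FastGWR implementation; the function name `gwr_fit`, the Gaussian kernel, the fixed bandwidth, and the synthetic data are all assumptions made for illustration. Each location solves its own weighted least-squares problem over all observations, so the work grows roughly quadratically with the number of points.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Hypothetical helper: one weighted least-squares fit per location.

    Returns an (n, k + 1) array of local coefficients (intercept first).
    Assumes a Gaussian distance kernel with a fixed bandwidth.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        # Distance from location i to every observation.
        d = np.linalg.norm(coords - coords[i], axis=1)
        # Gaussian kernel weights: nearby observations count more.
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xd.T * w                         # X^T W without forming W
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic data (assumed): the slope drifts from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
X = rng.normal(size=(200, 1))
true_slope = 1.0 + 0.2 * coords[:, 0]
y = 2.0 + true_slope * X[:, 0] + rng.normal(scale=0.1, size=200)

betas = gwr_fit(coords, X, y, bandwidth=1.5)
print(betas.shape)
```

The loop performs one solve per location, each touching all observations, which is the pattern that parallel implementations distribute across cores.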