Abstract Multiscale geographically weighted regression (MGWR) extends geographically weighted regression (GWR) by allowing process heterogeneity to be modeled at different spatial scales. While MGWR improves parameter estimates compared to GWR, the relationship between spatial scale and correlations within and among covariates—specifically spatial autocorrelation and collinearity—has not been systematically explored. This study investigates these relationships through controlled simulation experiments. Results indicate that spatial autocorrelation and collinearity affect specific model components rather than the entire model. Their impacts are cumulative but remain minimal unless they become very strong. MGWR effectively mitigates local multicollinearity issues by applying varying bandwidths across parameter surfaces. However, high levels of spatial autocorrelation and collinearity can lead to bandwidth underestimation for global processes, potentially producing false local effects. Additionally, strong collinearity may cause bandwidths to be overestimated for some processes, which helps mitigate collinearity but may obscure local effects. These findings suggest that while MGWR offers greater robustness against multicollinearity compared to GWR, bandwidth estimates should be interpreted with caution, as they can be influenced by strong spatial autocorrelation and collinearity. These results have important implications for empirical applications of MGWR.
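MGWR builds on the basic GWR step: at each location, a regression is fitted with observations weighted by their distance from that location, and the bandwidth controls how local the fit is. The sketch below shows that building block with NumPy. It assumes a Gaussian kernel, a fixed distance bandwidth, and simulated data, and the function name `gwr_fit` is hypothetical. It is plain single-bandwidth GWR. It is not MGWR's backfitting or bandwidth selection.

```python
import numpy as np

def gwr_fit(coords, X, y, bandwidth):
    """Local coefficients at every observation via Gaussian-kernel weighted least squares."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # add an intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)         # Gaussian kernel weights
        XtW = Xd.T * w                                  # same as Xd.T @ diag(w)
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)   # weighted normal equations
    return betas

rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
b1 = 1.0 + 0.3 * coords[:, 0]  # local process: slope increases from west to east
b2 = np.full(n, 2.0)           # global process: constant slope everywhere
y = 0.5 + b1 * x1 + b2 * x2 + rng.normal(scale=0.5, size=n)

betas = gwr_fit(coords, np.column_stack([x1, x2]), y, bandwidth=2.0)
print(np.corrcoef(betas[:, 1], b1)[0, 1])  # how well the local slope tracks its true surface
print(betas[:, 2].mean(), betas[:, 2].std())  # the global slope should stay near 2.0
```

Here one bandwidth is shared by all coefficients. MGWR instead selects a separate bandwidth for each coefficient, so it could smooth the global `b2` surface almost completely while keeping `b1` local.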
Estimation and interpretation problems and solutions when using proportion covariates in linear regression models
Proportion variables, also known as compositional data, are very common in ecology. Unfortunately, few scientists are aware of how compositional data, when used as covariates, can adversely affect statistical analysis. We describe here how proportion covariates, which sum to one across categories, cause multicollinearity and parameter identifiability problems. Using simulated data on bird species richness as a function of land use, we show how these problems manifest when fitting a wide range of models in R, in both frequentist and Bayesian frameworks. In particular, we show that similar models can often generate substantially different parameter estimates, leading to very different conclusions. Dropping a covariate or the intercept from the model can solve the multicollinearity and parameter identifiability problems. Unfortunately, these solutions do not fix the inherent challenges of interpreting parameter estimates. To this end, we propose focusing interpretation on differences between slope parameters, which avoids the inherent unidentifiability of individual parameters. We also propose two-axis conditional plots and marginal plots as visualization techniques that can help users better interpret their modeling results. We illustrate these problems and the proposed solutions using empirical data from the North American Breeding Bird Survey. The practical and straightforward approaches suggested in this article will help users fit linear models and interpret their results when some of the covariates are proportions.
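The identifiability problem and the proposed fixes can be reproduced in a few lines. The sketch below uses NumPy rather than R. It simulates three hypothetical land-use proportions that sum to one, with made-up effects and noise. It then shows three things. First, intercept plus all proportions gives a rank-deficient design. Second, dropping a covariate or the intercept restores full rank. Third, individual slopes change with the choice of reference class, while the difference between two slopes does not.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
P = rng.dirichlet([2.0, 2.0, 2.0], size=n)  # 3 land-use proportions per site; rows sum to 1
beta_true = np.array([5.0, 1.0, 3.0])        # hypothetical per-class effects
y = 2.0 + P @ beta_true + rng.normal(scale=0.2, size=n)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Intercept plus all three proportions: the columns are linearly dependent
X_full = np.column_stack([np.ones(n), P])
print(np.linalg.matrix_rank(X_full))  # 3 rather than 4, so the 4 parameters are not identifiable

# Fix 1: drop class 3, which becomes the reference category
b_drop3 = ols(np.column_stack([np.ones(n), P[:, 0], P[:, 1]]), y)
# Fix 1 again, but dropping class 1 instead
b_drop1 = ols(np.column_stack([np.ones(n), P[:, 1], P[:, 2]]), y)
# Fix 2: drop the intercept; each coefficient then absorbs it (about 2 + beta_j)
b_noint = ols(P, y)

# Individual slopes depend on which class was dropped ...
print(b_drop3[1:], b_drop1[1:])
# ... but the class-1 minus class-2 difference is the same in every parameterization
diff_a = b_drop3[1] - b_drop3[2]  # class 3 as reference
diff_b = 0.0 - b_drop1[1]         # class 1 as reference (its slope is implicitly 0)
diff_c = b_noint[0] - b_noint[1]  # no-intercept model
print(diff_a, diff_b, diff_c)     # all close to 5 - 1 = 4
```

The two reference-category fits span the same column space, because p3 = 1 - p1 - p2. Their fitted values are therefore identical, and the slope differences agree to floating-point precision.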
- Award ID(s):
- 2040819
- PAR ID:
- 10581477
- Publisher / Repository:
- Wiley
- Date Published:
- Journal Name:
- Ecology
- Volume:
- 105
- Issue:
- 4
- ISSN:
- 0012-9658
- Sponsoring Org:
- National Science Foundation
More like this
Uncertainty in parameter estimates from fitting within-host models to empirical data limits the models' ability to uncover mechanisms of infection and disease progression and to guide pharmaceutical interventions. Understanding the effect of model structure and data availability on model predictions is important for informing model development and experimental design. To address sources of uncertainty in parameter estimation, we used four mathematical models of influenza A infection with increasing degrees of biological realism. We tested the ability of each model to reveal its parameters in the presence of unlimited data by performing structural identifiability analyses. We then refined these results by predicting the practical identifiability of parameters given daily influenza A virus titers alone or together with daily adaptive immune cell data. Using these approaches, we gained insight into the sources of uncertainty in parameter estimation and provided guidelines on the model assumptions, optimal experimental design, and biological information needed for improved predictions.
Abstract Determining accurate estimates of the characteristics of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in the upper and lower respiratory tracts by fitting mathematical models to data is made difficult by the lack of measurements early in the infection. To determine the sensitivity of the parameter estimates to noise in the data, we developed a novel two-patch within-host mathematical model that considers infection of both respiratory tracts and assumes that the viral load in the lower respiratory tract decays in a density-dependent manner. We then investigated the model's ability to match population-level data. We proposed several approaches that can improve the practical identifiability of parameters, including an optimal experimental approach. We found that the availability of viral data early in the infection is essential for improving the accuracy of the estimates. Our findings can be useful for designing interventions.
Abstract Hierarchical probability models are being used more often than non-hierarchical deterministic process models in environmental prediction and forecasting, and Bayesian approaches to fitting such models are becoming increasingly popular. In particular, models describing ecosystem dynamics with multiple states that are autoregressive at each time step can be treated as statistical state space models (SSMs). In this paper, we examine this subset of ecosystem models, embed a process-based ecosystem model into an SSM, and give closed-form Gibbs sampling updates for latent states and process precision parameters when process and observation errors are normally distributed. Here, we use simulated data from an example model (DALECev) to study the effects of changing three factors. The first is the temporal resolution of observations on the states (observation data gaps). The second is the temporal resolution of the state process (model time step). The third is the level of aggregation of observations on fluxes (measurements of transfer rates on the state process). We show that parameter estimates become unreliable as temporal gaps between observed state data increase. To improve parameter estimates, we introduce a method of tuning the time resolution of the latent states while still using higher-frequency driver information, and we show that this helps improve the estimates. Further, we show that data cloning is a suitable method for assessing parameter identifiability in this class of models. Overall, our study helps inform the application of state space models to ecological forecasting where (1) data are not available for all states and transfers at the operational time step of the ecosystem model and (2) process uncertainty estimation is desired.
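To show what closed-form Gibbs updates for latent states and process precision look like under normal errors, the sketch below uses a deliberately simple stand-in. The stand-in is a scalar linear AR(1) process x_t = φ x_{t-1} + w_t, observed with noise only every fifth step to mimic observation data gaps. It is not DALECev and not the authors' updates. The function `gibbs_ar1_ssm`, the Gamma(a, b) prior, the known φ and observation precision, and all simulated values are hypothetical. Each state's full conditional is normal, and the Gamma prior is conjugate to the normal process errors.

```python
import numpy as np

def gibbs_ar1_ssm(y, observed, phi, tau_obs, a=1.0, b=1.0, n_iter=1500, burn=500, seed=0):
    """Gibbs sampler for x_t = phi*x_{t-1} + w_t, w_t ~ N(0, 1/tau_p), y_t = x_t + v_t,
    v_t ~ N(0, 1/tau_obs), with y_t available only where observed[t] is True.
    Prior: tau_p ~ Gamma(shape=a, rate=b); flat prior on x_0."""
    rng = np.random.default_rng(seed)
    T = len(y)
    idx = np.flatnonzero(observed)
    x = np.interp(np.arange(T), idx, y[idx])  # initialize states by interpolating observations
    tau_p = 1.0
    x_sum, tau_draws = np.zeros(T), []
    for it in range(n_iter):
        # Latent-state updates: x_t given its neighbours (and y_t, if observed) is normal
        for t in range(T):
            prec, num = 0.0, 0.0
            if t > 0:            # term from x_t | x_{t-1}
                prec += tau_p
                num += tau_p * phi * x[t - 1]
            if t < T - 1:        # term from x_{t+1} | x_t
                prec += tau_p * phi**2
                num += tau_p * phi * x[t + 1]
            if observed[t]:      # term from y_t | x_t
                prec += tau_obs
                num += tau_obs * y[t]
            x[t] = rng.normal(num / prec, 1.0 / np.sqrt(prec))
        # Process-precision update: Gamma(a + n/2, b + SSR/2)
        resid = x[1:] - phi * x[:-1]
        tau_p = rng.gamma(a + 0.5 * resid.size, 1.0 / (b + 0.5 * np.sum(resid**2)))  # scale = 1/rate
        if it >= burn:
            x_sum += x
            tau_draws.append(tau_p)
    return x_sum / (n_iter - burn), np.array(tau_draws)

rng = np.random.default_rng(1)
T, phi, tau_proc_true, tau_obs = 100, 0.9, 4.0, 25.0
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = phi * x_true[t - 1] + rng.normal(0.0, 1.0 / np.sqrt(tau_proc_true))
observed = np.arange(T) % 5 == 0  # state observed only every 5th step (data gaps)
y = np.where(observed, x_true + rng.normal(0.0, 1.0 / np.sqrt(tau_obs), T), np.nan)

x_mean, tau_draws = gibbs_ar1_ssm(y, observed, phi, tau_obs)
print(tau_draws.mean(), np.percentile(tau_draws, [2.5, 97.5]), tau_proc_true)
```

Changing the gap in `% 5` is one way to probe how sparse state observations affect the spread of the posterior for the process precision. Single-site updates like these mix slowly. Forward-filtering backward-sampling is the usual faster alternative for linear Gaussian state space models.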
The Theis equation is an important mathematical model used to analyze drawdown data from pumping tests and to estimate aquifer parameters. Because the Theis model is a nonlinear equation, a complex graphical procedure is used to fit it to pumping test data. This graphical method was originally proposed by Theis in the 1930s, and groundwater textbooks have included it ever since. Over the past 90 years, every groundwater hydrologist has been trained to use this tedious procedure to estimate aquifer transmissivity (T) and storage coefficient (S). Unfortunately, this mechanical procedure offers little intuition about the inherent limitations of manual fitting, and it does not provide an estimate of the parameter error. In this study, we use the open-source programming language Python to develop a script, PyTheis, that simultaneously estimates T and S together with the error associated with each parameter. We solve nine test problems, including several published case studies that use real field data, to demonstrate the robustness of the script. Our tests show that the proposed Python script can efficiently solve a variety of pumping test problems. The code can also be easily adapted to other hydrological problems that require nonlinear curve fitting.
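The nonlinear fit described above can be sketched with SciPy. The Theis drawdown is s = Q/(4πT) W(u) with u = r²S/(4Tt), where the well function W(u) is the exponential integral E1(u), available as `scipy.special.exp1`. This is not the PyTheis script. The pumping rate, radius, "true" T and S, and noise are hypothetical. The parameters are fitted on a log10 scale, which keeps them positive and on comparable scales. The covariance returned by `curve_fit` supplies approximate standard errors, which the graphical method lacks.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import exp1  # E1(u), the Theis well function W(u)

Q = 500.0  # pumping rate, m^3/day (hypothetical)
r = 30.0   # distance from pumping well to observation well, m (hypothetical)

def theis_drawdown(t, T, S):
    """Theis drawdown (m) at time t (days) for transmissivity T (m^2/day) and storativity S."""
    u = r**2 * S / (4.0 * T * t)
    return Q / (4.0 * np.pi * T) * exp1(u)

def model_log(t, logT, logS):
    return theis_drawdown(t, 10.0**logT, 10.0**logS)

# Synthetic pumping-test record (hypothetical "true" values plus measurement noise)
rng = np.random.default_rng(3)
t = np.logspace(-3, 0, 25)  # days
T_true, S_true = 150.0, 2e-4
s_obs = theis_drawdown(t, T_true, S_true) + rng.normal(0.0, 0.005, t.size)

popt, pcov = curve_fit(model_log, t, s_obs, p0=[2.0, -3.5])
perr = np.sqrt(np.diag(pcov))  # standard errors of log10(T) and log10(S)
T_hat, S_hat = 10.0**popt
print(f"T = {T_hat:.1f} m^2/day (log10 s.e. {perr[0]:.4f}), S = {S_hat:.2e} (log10 s.e. {perr[1]:.4f})")
```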