Current biogeochemical models produce carbon–climate feedback projections with large uncertainties, often attributed to their structural differences when simulating soil organic carbon (SOC) dynamics worldwide. However, choices of model parameter values that quantify the strength and represent properties of different soil carbon cycle processes could also contribute to model simulation uncertainties. Here, we demonstrate the critical role of using common observational data in reducing model uncertainty in estimates of global SOC storage. Two structurally different models featuring distinctive carbon pools, decomposition kinetics, and carbon transfer pathways simulate opposite global SOC distributions with their customary parameter values yet converge to similar results after being informed by the same global SOC database using a data assimilation approach. The converged spatial SOC simulations result from similar simulations in key model components such as carbon transfer efficiency, baseline decomposition rate, and environmental effects on carbon fluxes by these two models after data assimilation. Moreover, data assimilation results suggest equally effective simulations of SOC using models following either first‐order or Michaelis–Menten kinetics at the global scale. Nevertheless, a wider range of data with high‐quality control and assurance are needed to further constrain SOC dynamics simulations and reduce unconstrained parameters. New sets of data, such as microbial genomics‐function relationships, may also suggest novel structures to account for in future model development. Overall, our results highlight the importance of observational data in informing model development and constraining model predictions. 
                            A Geographical Perspective on Simpson's Paradox
                        
                    
    
            The concept of scale is inherent to, and consequential for, the modeling of geographical processes. However, scale also creates serious problems because the results of many types of spatial analysis depend on the scale of the units for which data are reported (the measurement scale). Consequently, when the same spatial models are calibrated at different scales of aggregation, the results are often vastly different (the well-known Modifiable Areal Unit Problem, or MAUP). With the advent of local models, which differ fundamentally from global models in their scale of application, this issue is further exacerbated in unexpected ways. For example, a global model and a local model calibrated using data measured at the same aggregation scale can also yield different and sometimes contradictory inferences (the classic Simpson's Paradox). Here we provide a geographical perspective on why and how contrasting inferences might result from the calibration of a local and a global model using the same data. Further, we examine the viability of such an occurrence using a synthetic experiment and two empirical examples. Finally, we discuss how such a perspective might inform the analyst’s conundrum: when the respective inferences run counter to one another, do we believe the local or the global model results?
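
The contradiction described in the abstract can be illustrated with a minimal sketch. This is not the paper's synthetic experiment: the three regions, their centers, and the helper `ols_slope` are hypothetical and chosen only so that a slope fitted within each region (a stand-in for a local model) is negative while a single slope fitted to the pooled data (a stand-in for a global model) is positive.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical data: within every region y falls as x rises (slope -1),
# but the region centers themselves rise together in x and y.
offsets = (-1.0, -0.5, 0.0, 0.5, 1.0)
centers = [(0.0, 0.0), (5.0, 10.0), (10.0, 20.0)]

regions = {}
for k, (cx, cy) in enumerate(centers):
    xs = [cx + d for d in offsets]
    ys = [cy - d for d in offsets]
    regions[f"region_{k}"] = (xs, ys)

# "Local" calibration: one slope per region.
local_slopes = {name: ols_slope(xs, ys) for name, (xs, ys) in regions.items()}

# "Global" calibration: one slope for all observations pooled together.
all_x = [x for xs, _ in regions.values() for x in xs]
all_y = [y for _, ys in regions.values() for y in ys]
global_slope = ols_slope(all_x, all_y)

print(local_slopes)
print(global_slope)
```

Both calibrations use the same observations at the same measurement scale, yet the sign of the estimated relationship flips because the pooled fit is dominated by the differences between regions rather than the relationship within them.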
- Award ID(s): 2117455
- PAR ID: 10540214
- Publisher / Repository: Journal of Spatial Information Science
- Date Published:
- Journal Name: Journal of Spatial Information Science
- Issue: 26
- ISSN: 1948-660X
- Page Range / eLocation ID: 1 to 25
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Achieving sustainable development requires understanding how human behavior and the environment interact across spatial scales. In particular, knowing how to manage tradeoffs between the environment and the economy, or between one spatial scale and another, necessitates a modeling approach that allows these different components to interact. Existing integrated local and global analyses provide key insights, but often fail to capture ‘meso-scale’ phenomena that operate at scales between the local and the global, leading to erroneous predictions and a constrained scope of analysis. Meso-scale phenomena are difficult to model because of their complexity and computational challenges, where adding scales can increase model run-time exponentially. These additions, however, are necessary to make models that include sufficient detail for policy-makers to assess tradeoffs. Here, we synthesize research that explicitly includes meso-scale phenomena and assess where further efforts might be fruitful in improving our predictions and expanding the scope of questions that sustainability science can answer. We emphasize five categories of models relevant to sustainability science: biophysical models, integrated assessment models, land-use change models, earth-economy models, and spatial downscaling models. We outline the technical and methodological challenges present in these areas of research and discuss seven directions for future research that will improve coverage of meso-scale effects. Additionally, we provide a specific worked example that shows the challenges present, and possible solutions, for modeling meso-scale phenomena in integrated earth-economy models.
- 
            Although Federated Learning (FL) enables global model training across clients without compromising their raw data, existing Federated Averaging (FedAvg)-based methods suffer from low inference performance because data are unevenly distributed among clients. Specifically, different data distributions among clients lead to different optimization directions for local models. Aggregating local models usually results in a poorly generalized global model, which performs worse on most of the clients. To address this issue, inspired by the observation from a geometric perspective that a well-generalized solution is located in a flat area rather than a sharp area, we propose a novel and heuristic FL paradigm named FedMR (Federated Model Recombination). The goal of FedMR is to guide the recombined models to be trained towards a flat area. Unlike conventional FedAvg-based methods, in FedMR the cloud server recombines the collected local models by shuffling each of their layers, generating multiple recombined models for local training on clients rather than a single aggregated global model. Since flat areas are larger than sharp areas, when local models are located in different areas, recombined models have a higher probability of being located in a flat area. When all recombined models are located in the same flat area, they are optimized in the same direction. We theoretically analyze the convergence of model recombination. Experimental results show that, compared with state-of-the-art FL methods, FedMR can significantly improve inference accuracy without exposing the privacy of each client.
- 
            Early stopping based on hold-out data is a popular regularization technique designed to mitigate overfitting and increase the predictive accuracy of neural networks. Models trained with early stopping often provide relatively accurate predictions, but they generally still lack precise statistical guarantees unless they are further calibrated using independent hold-out data. This paper addresses the above limitation with conformalized early stopping: a novel method that combines early stopping with conformal calibration while efficiently recycling the same hold-out data. This leads to models that are both accurate and able to provide exact predictive inferences without multiple data splits or overly conservative adjustments. Practical implementations are developed for different learning tasks (outlier detection, multi-class classification, regression), and their competitive performance is demonstrated on real data.
- 
            Both hydrological and geophysical data can be used to calibrate hillslope hydrologic models. However, these data often reflect hydrological dynamics occurring at disparate spatial scales. Their use as sole objectives in model calibrations may thus result in different optimum hydraulic parameters and hydrologic model behavior. This is especially true for mountain hillslopes where the subsurface is often heterogeneous and the representative elementary volume can be on the scale of several m³. This study explores differences in hydraulic parameters and hillslope‐scale storage and flux dynamics of models calibrated with different hydrological and geophysical data. Soil water content, groundwater level, and two time‐lapse electrical resistivity tomography (ERT) data sets (transfer resistance and inverted resistivity) from two mountain hillslopes in Wyoming, USA, are used to calibrate physics‐based surface–subsurface hydrologic models of the hillslopes. Calibrations are performed using each data set independently and all data together, resulting in five calibrated parameter sets at each site. Model-predicted hillslope runoff and internal hydrological dynamics vary significantly depending on the calibration data set. Results indicate that water content calibration data yield models that overestimate near‐surface water storage in mountain hillslopes. Groundwater level calibration data yield models that more reasonably represent hillslope‐scale storage and flux dynamics. Additionally, ERT calibration data yield models with reasonable hillslope runoff predictions but relatively poor predictions of internal hillslope dynamics. These observations highlight the importance of carefully selecting data for hydrologic model calibration in mountain environments. Poor selection of calibration data may yield models with limited predictive capability depending on modeling goals and model complexity.
 An official website of the United States government