Soybean prediction using computationally efficient Bayesian spatial regression models and satellite imagery

Fischer, Richard_J; Rekabdarkolaee, Hossein_Moradi  (ORCID:0000000334494953); Joshi, Deepak_R  (ORCID:000000025142284X); Clay, David_E  (ORCID:0000000250319744); Clay, Sharon_A  (ORCID:0000000341666995)

doi:10.1002/agj2.21670

Abstract Preharvest yield estimates can be used for harvest planning, marketing, and prescribing in‐season fertilizer and pesticide applications. One approach that is being widely tested is the use of machine learning (ML) or artificial intelligence (AI) algorithms to estimate yields. However, one barrier to the adoption of this approach is that ML/AI algorithms behave as a black block. An alternative approach is to create an algorithm using Bayesian statistics. In Bayesian statistics, prior information is used to help create the algorithm. However, algorithms based on Bayesian statistics are not often computationally efficient. The objective of the current study was to compare the accuracy and computational efficiency of four Bayesian models that used different assumptions to reduce the execution time. In this paper, the Bayesian multiple linear regression (BLR), Bayesian spatial, Bayesian skewed spatial regression, and the Bayesian nearest neighbor Gaussian process (NNGP) models were compared with ML non‐Bayesian random forest model. In this analysis, soybean (Glycine max) yields were the response variable (y), and spaced‐based blue, green, red, and near‐infrared reflectance that was measured with the PlanetScope satellite were the predictor (x). Among the models tested, the Bayesian (NNGP;R²‐testing = 0.485) model, which captures the short‐range correlation, outperformed the (BLR;R²‐testing = 0.02), Bayesian spatial regression (SRM;R²‐testing = 0.087), and Bayesian skewed spatial regression (sSRM;R²‐testing = 0.236) models. However, associated with improved accuracy was an increase in run time from 534 s for the BLR model to 2047 s for the NNGP model. These data show that relatively accurate within‐field yield estimates can be obtained without sacrificing computational efficiency and that the coefficients have biological meaning. However, all Bayesian models had lowerR²values and higher execution times than the random forest model.

More Like this