Thinning occurrence points does not improve species distribution model performance

Ten Caten, Cleber (ORCID: 0000-0003-3788-3508); Dallas, Tad (ORCID: 0000-0003-3328-9958)

doi:10.1002/ecs2.4703

Abstract: Spatial biases are an intrinsic feature of the occurrence data used in species distribution models (SDMs). Thinning species occurrences, in which records that lie close together in geographic or environmental space are removed before modeling, is often used to address these biases. However, thinning can also degrade SDM performance, because the benefit of reducing spatial bias may be outweighed by the cost of the data loss it causes. We used real and virtual species to evaluate how spatial and environmental thinning affected different performance metrics of four SDM methods. The occurrence data of virtual species were sampled randomly, evenly spaced, and clustered in geographic space to simulate different types of spatial bias, and several spatial and environmental thinning distances were used to thin the occurrence data. For each thinning distance, we also generated null datasets by randomly removing the same number of occurrences that thinning at that distance removed, and we compared the results of the thinned and null datasets. We found that spatially or environmentally thinning occurrence data is no better than randomly removing occurrences, given that thinned datasets performed similarly to null datasets. Specifically, spatial and environmental thinning led to a general decrease in model performance across all SDM methods. These decreases were observed for real and virtual species, were positively associated with thinning distance, and were consistent across the different types of spatial bias. Our results suggest that thinning occurrence data usually fails to improve SDM performance and that thinning approaches should be considered carefully when modeling species distributions.
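
The comparison between thinned and null datasets can be illustrated with a minimal Python sketch. The abstract does not specify the thinning algorithm or software used in the study, so everything here is an assumption made for illustration: the greedy distance rule, the planar Euclidean coordinates, the toy clustered data, and the hypothetical function names `spatial_thin` and `null_thin`.

```python
import math
import random


def spatial_thin(points, min_dist):
    """Greedy thinning (illustrative assumption, not the study's method):
    keep a point only if it is at least min_dist from every point kept so far."""
    kept = []
    for p in points:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept


def null_thin(points, n_keep, seed=0):
    """Null dataset: keep n_keep points chosen at random, ignoring distance,
    so that it loses exactly as many records as the thinned dataset."""
    rng = random.Random(seed)
    return rng.sample(points, n_keep)


# Toy clustered occurrences in arbitrary planar units (hypothetical data).
rng = random.Random(42)
occurrences = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]

thinned = spatial_thin(occurrences, min_dist=0.5)
null = null_thin(occurrences, n_keep=len(thinned), seed=1)

# Both datasets have the same size; only the thinned one enforces spacing.
print(len(occurrences), len(thinned), len(null))
```

Fitting the same SDM to `thinned` and to `null` and comparing performance would then separate the effect of the spacing rule from the effect of data loss alone, which is the logic of the null comparison described in the abstract.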
