Title: Statistical binning leads to profound model violation due to gene tree error incurred by trying to avoid gene tree error
Award ID(s):
Publication Date:
Journal Name:
Molecular Phylogenetics and Evolution
Page Range or eLocation-ID:
164 to 171
Sponsoring Org:
National Science Foundation
More Like this
  1. When forest conditions are mapped from empirical models, uncertainty in remotely sensed predictor variables can cause the systematic overestimation of low values, underestimation of high values, and suppression of variability. This regression dilution or attenuation bias is a well-recognized problem in remote sensing applications, with few practical solutions. Attenuation is of particular concern for applications that are responsive to prediction patterns at the high end of observed data ranges, where systematic error is typically greatest. We addressed attenuation bias in models of tree species relative abundance (percent of total aboveground live biomass) based on multitemporal Landsat and topoclimatic predictor data. We developed a multi-objective support vector regression (MOSVR) algorithm that simultaneously minimizes total prediction error and systematic error caused by attenuation bias. Applied to 13 tree species in the Acadian Forest Region of the northeastern U.S., MOSVR performed well compared to other prediction methods including single-objective SVR (SOSVR) minimizing total error, Random Forest (RF), gradient nearest neighbor (GNN), and Random Forest nearest neighbor (RFNN) algorithms. SOSVR and RF yielded the lowest total prediction error but produced the greatest systematic error, consistent with strong attenuation bias. Underestimation at high relative abundance caused strong deviations between predicted patterns of species dominance/codominance and those observed at field plots. In contrast, GNN and RFNN produced dominance/codominance patterns that deviated little from observed patterns, but predicted species relative abundance with lower accuracy and substantial systematic error. MOSVR produced the least systematic error for all species with total error often comparable to SOSVR or RF. Predicted patterns of dominance/codominance matched observations well, though not quite as well as GNN or RFNN. Overall, MOSVR provides an effective machine learning approach to the reduction of systematic prediction error and should be fully generalizable to other remote sensing applications and prediction problems.