Evaluating the accuracy of binary classifiers for geomorphic applications

Rossi, Matthew William

doi:10.5194/esurf-12-765-2024

Increased access to high-resolution topography has revolutionized our ability to map out fine-scale topographic features at watershed to landscape scales. As our “vision” of the land surface has improved, so has the need for more robust quantification of the accuracy of the geomorphic maps we derive from these data. One broad class of mapping challenges is that of binary classification whereby remote sensing data are used to identify the presence or absence of a given feature. Fortunately, there is a large suite of metrics developed in the data sciences well suited to quantifying the pixel-level accuracy of binary classifiers. This analysis focuses on how these metrics perform when there is a need to quantify how the number and extent of landforms are expected to vary as a function of the environmental forcing (e.g., due to climate, ecology, material property, erosion rate). Results from a suite of synthetic surfaces show how the most widely used pixel-level accuracy metric, the F1 score, is particularly poorly suited to quantifying accuracy for this kind of application. Well-known biases to imbalanced data are exacerbated by methodological strategies that calibrate and validate classifiers across settings where feature abundances vary. The Matthews correlation coefficient largely removes this bias over a wide range of feature abundances such that the sensitivity of accuracy scores to geomorphic setting instead embeds information about the size and shape of features and the type of error. If error is random, the Matthews correlation coefficient is insensitive to feature size and shape, though preferential modification of the dominant class can limit the domain over which scores can be compared. If the error is systematic (e.g., due to co-registration error between remote sensing datasets), this metric shows strong sensitivity to feature size and shape such that smaller features with more complex boundaries induce more classification error. Future studies should build on this analysis by interrogating how pixel-level accuracy metrics respond to different kinds of feature distributions indicative of different types of surface processes.

More Like this