- Award ID(s):
- 2205705
- PAR ID:
- 10476462
- Publisher / Repository:
- Elsevier
- Date Published:
- Journal Name:
- International Journal of Applied Earth Observation and Geoinformation
- Volume:
- 125
- ISSN:
- 1569-8432
- Page Range / eLocation ID:
- 103561
- Subject(s) / Keyword(s):
- Change detection; Forest harvest; Temporal segmentation; LandTrendr; Forest Inventory and Analysis; Ensemble methods
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Random forests use ensembles of decision trees to boost accuracy for machine learning tasks. However, large ensembles slow down inference on platforms that process each tree in an ensemble individually. We present Bolt, a platform that restructures whole random forests, not just individual trees, to speed up inference. Conceptually, Bolt maps every path in each tree to a lookup table which, if cache were large enough, would allow inference with just one memory access. When the size of the lookup table exceeds cache capacity, Bolt employs a novel combination of lossless compression, parameter selection, and bloom filters to shrink the table while preserving fast inference. We compared inference speed in Bolt to three state-of-the-art platforms: Python Scikit-Learn, Ranger, and Forest Packing. We evaluated these platforms using datasets with vision, natural language processing and categorical applications. We observed that on ensembles of shallow decision trees Bolt can run 2-14X faster than competing platforms and that Bolt's speedups persist as the number of decision trees in an ensemble increases.
-
Multi-study learning uses multiple training studies, separately trains classifiers on individual studies, and then forms ensembles with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we perform a comparison of either weighting each forest to form the ensemble, or extracting the individual trees trained by each Random Forest and weighting them directly. We consider weighting approaches that reward cross-study replicability within the training set. We find that incorporating multiple layers of ensembling in the training process increases the robustness of the resulting predictor. Furthermore, we explore the mechanisms by which the ensembling weights correspond to the internal structure of trees to shed light on the important features in determining the relationship between the Random Forests algorithm and the true outcome model. Finally, we apply our approach to genomic datasets and show that our method improves upon the basic multi-study learning paradigm.
-
Alaska has witnessed a significant increase in wildfire events in recent decades that have been linked to drier and warmer summers. Forest fuel maps play a vital role in wildfire management and risk assessment. Freely available multispectral datasets are widely used for land use and land cover mapping, but they have limited utility for fuel mapping due to their coarse spectral resolution. Hyperspectral datasets have a high spectral resolution, ideal for detailed fuel mapping, but they are limited and expensive to acquire. This study simulates hyperspectral data from Sentinel-2 multispectral data using the spectral response function of the Airborne Visible/Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) sensor, and normalized ground spectra of gravel, birch, and spruce. We used the Uniform Pattern Decomposition Method (UPDM) for spectral unmixing, which is a sensor-independent method, where each pixel is expressed as the linear sum of standard reference spectra. The simulated hyperspectral data have spectral characteristics of AVIRIS-NG and the reflectance properties of Sentinel-2 data. We validated the simulated spectra by visually and statistically comparing them with real AVIRIS-NG data. We observed a high correlation between the spectra of tree classes collected from AVIRIS-NG and simulated hyperspectral data. Upon performing species level classification, we achieved a classification accuracy of 89% for the simulated hyperspectral data, which is better than the accuracy of Sentinel-2 data (77.8%). We generated a fuel map from the simulated hyperspectral image using the Random Forest classifier. Our study demonstrated that low-cost and high-quality hyperspectral data can be generated from Sentinel-2 data using UPDM for improved land cover and vegetation mapping in the boreal forest.
-
Ensembles of decision trees perform well on many problems, but are not interpretable. In contrast to existing approaches in interpretability that focus on explaining relationships between features and predictions, we propose an alternative approach to interpret tree ensemble classifiers by surfacing representative points for each class -- prototypes. We introduce a new distance for Gradient Boosted Tree models, and propose new, adaptive prototype selection methods with theoretical guarantees, with the flexibility to choose a different number of prototypes in each class. We demonstrate our methods on random forests and gradient boosted trees, showing that the prototypes can perform as well as or even better than the original tree ensemble when used as a nearest-prototype classifier. In a user study, humans were better at predicting the output of a tree ensemble classifier when using prototypes than when using Shapley values, a popular feature attribution method. Hence, prototypes present a viable alternative to feature-based explanations for tree ensembles.
-
Climate change is driving substantial changes in North American boreal forests, including changes in productivity, mortality, recruitment, and biomass. Despite the importance for carbon budgets and informing management decisions, there is a lack of near‐term (5–30 year) forecasts of expected changes in aboveground biomass (AGB). In this study, we forecast AGB changes across the North American boreal forest using machine learning, repeat measurements from 25,000 forest inventory sites, and gridded geospatial datasets. We find that AGB change can be predicted up to 30 years into the future, and that training on sites across the entire domain allows accurate predictions even in regions with only a small amount of existing field data. While predicting AGB loss is less skillful than gains, using a multi‐model ensemble can improve the accuracy in detecting change direction to >90% for observed increases, and up to 70% for observed losses. Higher stem density, winter temperatures, and the presence of temperate tree species in forest plots were positively associated with AGB change, whereas greater initial biomass, continentality (difference between mean summer and winter temperatures), prevalence of black spruce (Picea mariana), summer precipitation, and early warning metrics from long‐term remote sensing time series were negatively associated with AGB change. Across the domain, we predict nondisturbance‐induced declines in AGB at 23% of sites by 2030. The approach developed here can be used to estimate near‐future forest biomass in boreal North America and inform relevant management decisions. Our study also highlights the power of machine learning multi‐model ensembles when trained on a large volume of forest inventory plots, which could be applied to other regions with adequate plot density and spatial coverage.