

This content will become publicly available on December 1, 2024

Title: Not-so-random forests: Comparing voting and decision tree ensembles for characterizing partial harvest events
Ensemble-based change detection can improve map accuracy by combining information from multiple datasets. A growing literature investigates ensemble inputs and applications for forest disturbance detection and mapping. However, few studies have evaluated ensemble methods other than Random Forest classifiers, which rely on uninterpretable “black box” algorithms with hundreds of parameters. Additionally, most ensemble-based disturbance maps do not use independently and systematically collected field-based forest inventory measurements. Here, we compared three approaches for combining change detection results generated from multi-spectral Landsat time series with forest inventory measurements to map forest harvest events at an annual time step. We found that seven-parameter degenerate decision tree ensembles performed at least as well as 500-tree Random Forest ensembles trained and tested on the same LandTrendr segmentation results, and that both supervised decision tree methods consistently outperformed the top-performing voting approach (majority voting). Comparisons with an existing national forest disturbance dataset showed notable improvements in accuracy, demonstrating the value of developing locally calibrated, process-specific disturbance datasets such as the harvest event maps developed in this study. Furthermore, by using multi-date forest inventory measurements, we were able to establish a lower bound of 30% basal area removal for detectable harvests, providing biophysical context for our harvest event maps. Our results suggest that simple, interpretable decision trees applied to multi-spectral temporal segmentation outputs can be as effective as more complex machine learning approaches for characterizing forest harvest events ranging from partial clearing to clear cuts, with important implications for locally accurate mapping of forest harvests and other types of disturbance.
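Two quantitative ideas in this abstract, majority voting across change-detection outputs and basal area removal measured between inventory visits, can be illustrated with a short Python sketch. The helper names (`majority_vote`, `basal_area_removal`), the five-flag input, and the basal area values are hypothetical, and treating ties as "no harvest" is an assumption; the study's actual LandTrendr inputs, voting rules, and decision tree ensembles are not reproduced here.

```python
def majority_vote(flags):
    """Return 1 (harvest) if a strict majority of binary change flags are 1, else 0.

    Assumption: ties resolve to 0 (no harvest).
    """
    return 1 if sum(flags) > len(flags) / 2 else 0


def basal_area_removal(ba_before, ba_after):
    """Fraction of plot basal area removed between two inventory measurements."""
    if ba_before <= 0:
        raise ValueError("ba_before must be positive")
    return (ba_before - ba_after) / ba_before


# Hypothetical: one binary harvest call per spectral index for a single pixel-year.
flags = [1, 0, 1, 1, 0]
print(majority_vote(flags))  # 1: three of five flags indicate a harvest

# Hypothetical plot basal area (m^2/ha) at two inventory visits.
removal = basal_area_removal(30.0, 19.5)
print(f"{removal:.0%} removed")  # 35% removed
print(removal >= 0.30)  # True: at or above the reported 30% lower bound
```

A supervised decision tree differs from this voting rule in that it learns its split thresholds from labeled reference data rather than simply counting agreeing inputs.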
Award ID(s):
2205705
NSF-PAR ID:
10476462
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
International Journal of Applied Earth Observation and Geoinformation
Volume:
125
ISSN:
1569-8432
Page Range / eLocation ID:
103561
Subject(s) / Keyword(s):
Change detection; Forest harvest; Temporal segmentation; LandTrendr; Forest Inventory and Analysis; Ensemble methods
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Alaska has witnessed a significant increase in wildfire events in recent decades that have been linked to drier and warmer summers. Forest fuel maps play a vital role in wildfire management and risk assessment. Freely available multispectral datasets are widely used for land use and land cover mapping, but they have limited utility for fuel mapping due to their coarse spectral resolution. Hyperspectral datasets have high spectral resolution, ideal for detailed fuel mapping, but they are limited in availability and expensive to acquire. This study simulates hyperspectral data from Sentinel-2 multispectral data using the spectral response function of the Airborne Visible/Infrared Imaging Spectrometer-Next Generation (AVIRIS-NG) sensor and normalized ground spectra of gravel, birch, and spruce. We used the Uniform Pattern Decomposition Method (UPDM) for spectral unmixing; UPDM is a sensor-independent method in which each pixel is expressed as the linear sum of standard reference spectra. The simulated hyperspectral data have the spectral characteristics of AVIRIS-NG and the reflectance properties of Sentinel-2 data. We validated the simulated spectra by comparing them visually and statistically with real AVIRIS-NG data. We observed a high correlation between the spectra of tree classes collected from AVIRIS-NG and the simulated hyperspectral data. In species-level classification, we achieved an accuracy of 89% with the simulated hyperspectral data, higher than the 77.8% obtained with Sentinel-2 data. We generated a fuel map from the simulated hyperspectral image using a Random Forest classifier. Our study demonstrates that low-cost, high-quality hyperspectral data can be generated from Sentinel-2 data using UPDM for improved land cover and vegetation mapping in the boreal forest.
  2. Urban flooding is a major natural disaster that poses a serious threat to the urban environment. Flood extent needs to be mapped in near real time to support disaster rescue and relief missions, reconstruction efforts, and financial loss evaluation. Many efforts have been made to identify flooded zones with remote sensing data and image processing techniques. Unfortunately, near-real-time production of accurate flood maps over impacted urban areas has not been well investigated, owing to three major issues. (1) High-spatial-resolution satellite imagery over urban areas usually has a nonhomogeneous background due to different types of objects such as buildings, moving vehicles, and road networks. As a result, classical machine learning approaches can hardly model the spatial relationship between sample pixels in the flooded area. (2) Conventional flood mapping models usually require handcrafted features as input, which may not fully exploit the underlying patterns in large volumes of available data. (3) High-resolution optical imagery often has varied pixel digital numbers (DNs) for the same ground objects because of highly inconsistent illumination conditions during a flood. Accordingly, traditional flood mapping methods have major limitations in generalizing to testing data. To address these issues in urban flood mapping, we developed a patch similarity convolutional neural network (PSNet) using satellite multispectral surface reflectance imagery acquired before and after flooding at a spatial resolution of 3 meters. We used spectral reflectance instead of raw pixel DNs so that the influence of inconsistent illumination caused by varied weather conditions at the time of data collection is greatly reduced. Such consistent spectral reflectance data also enhance the generalization capability of the proposed model.
    Experiments on high-resolution imagery acquired before and after urban flooding events (Hurricane Harvey in 2017 and Hurricane Florence in 2018) showed that PSNet produced urban flood maps with consistently high precision, recall, F1 score, and overall accuracy compared with baseline classification models including support vector machine, decision tree, random forest, and AdaBoost, which were often poor in either precision or recall. The study paves the way for fusing bi-temporal remote sensing images for near-real-time precision damage mapping associated with other types of natural hazards (e.g., wildfires and earthquakes).
  3. Random forests use ensembles of decision trees to boost accuracy for machine learning tasks. However, large ensembles slow down inference on platforms that process each tree in an ensemble individually. We present Bolt, a platform that restructures whole random forests, not just individual trees, to speed up inference. Conceptually, Bolt maps every path in each tree to a lookup table which, if the cache were large enough, would allow inference with just one memory access. When the size of the lookup table exceeds cache capacity, Bolt employs a novel combination of lossless compression, parameter selection, and Bloom filters to shrink the table while preserving fast inference. We compared inference speed in Bolt to three state-of-the-art platforms: Python Scikit-Learn, Ranger, and Forest Packing. We evaluated these platforms using datasets from vision, natural language processing, and categorical applications. We observed that on ensembles of shallow decision trees Bolt can run 2 to 14 times faster than competing platforms, and that Bolt's speedups persist as the number of decision trees in an ensemble increases.
  4. Multi-study learning uses multiple training studies, separately trains classifiers on individual studies, and then forms ensembles with weights rewarding members with better cross-study prediction ability. This article considers novel weighting approaches for constructing tree-based ensemble learners in this setting. Using Random Forests as a single-study learner, we perform a comparison of either weighting each forest to form the ensemble, or extracting the individual trees trained by each Random Forest and weighting them directly. We consider weighting approaches that reward cross-study replicability within the training set. We find that incorporating multiple layers of ensembling in the training process increases the robustness of the resulting predictor. Furthermore, we explore the mechanisms by which the ensembling weights correspond to the internal structure of trees to shed light on the important features in determining the relationship between the Random Forests algorithm and the true outcome model. Finally, we apply our approach to genomic datasets and show that our method improves upon the basic multi-study learning paradigm. 
  5. Climate change is driving substantial changes in North American boreal forests, including changes in productivity, mortality, recruitment, and biomass. Despite their importance for carbon budgets and for informing management decisions, near‐term (5–30 year) forecasts of expected changes in aboveground biomass (AGB) are lacking. In this study, we forecast AGB changes across the North American boreal forest using machine learning, repeat measurements from 25,000 forest inventory sites, and gridded geospatial datasets. We find that AGB change can be predicted up to 30 years into the future, and that training on sites across the entire domain allows accurate predictions even in regions with only a small amount of existing field data. While AGB loss is predicted less skillfully than gains, a multi‐model ensemble can improve the accuracy of detecting change direction to >90% for observed increases and up to 70% for observed losses. Higher stem density, winter temperatures, and the presence of temperate tree species in forest plots were positively associated with AGB change, whereas greater initial biomass, continentality (difference between mean summer and winter temperatures), prevalence of black spruce (Picea mariana), summer precipitation, and early warning metrics from long‐term remote sensing time series were negatively associated with AGB change. Across the domain, we predict nondisturbance‐induced declines in AGB at 23% of sites by 2030. The approach developed here can be used to estimate near‐future forest biomass in boreal North America and inform relevant management decisions. Our study also highlights the power of machine learning multi‐model ensembles when trained on a large volume of forest inventory plots, which could be applied to other regions with adequate plot density and spatial coverage.

     
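Item 1 above relies on spectral unmixing in which each pixel is expressed as a linear sum of standard reference spectra. The Python sketch below shows plain unconstrained least-squares linear unmixing, solving the normal equations with a small Gaussian elimination routine. It is a simplified stand-in and not an implementation of UPDM; the function names and the four-band reflectance values for gravel, birch, and spruce are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: endmember spectra are not independent")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def unmix(pixel, endmembers):
    """Unconstrained least-squares abundances a with pixel ~= sum_k a[k] * endmembers[k]."""
    k, bands = len(endmembers), len(pixel)
    # Normal equations (E^T E) a = E^T r, with endmember spectra as the columns of E.
    ete = [[sum(endmembers[i][b] * endmembers[j][b] for b in range(bands))
            for j in range(k)] for i in range(k)]
    etr = [sum(endmembers[i][b] * pixel[b] for b in range(bands)) for i in range(k)]
    return solve_linear(ete, etr)


# Hypothetical 4-band reflectance spectra (illustrative values, not measured data).
gravel = [0.20, 0.25, 0.30, 0.35]
birch = [0.05, 0.08, 0.45, 0.30]
spruce = [0.03, 0.05, 0.25, 0.12]
endmembers = [gravel, birch, spruce]

# Build a synthetic mixed pixel from known abundances, then recover them.
true_abundances = [0.2, 0.5, 0.3]
pixel = [sum(a * e[b] for a, e in zip(true_abundances, endmembers)) for b in range(4)]
estimate = unmix(pixel, endmembers)
print([round(a, 3) for a in estimate])  # [0.2, 0.5, 0.3]
```

Because this version is unconstrained, real (noisy) pixels can yield negative abundances or abundances that do not sum to one; operational unmixing methods typically add constraints or normalization for that reason.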