Title: Generalizing systematic adaptive cluster sampling for forest ecosystem inventory
Reliable statistical inference is central to forest ecology and management, much of which seeks to estimate population parameters for forest attributes and ecological indicators for biodiversity, functions and services in forest ecosystems. Many populations in nature, such as plants or animals, are characterized by aggregation tendencies, which pose a major challenge to sampling. Regardless, a biased or imprecise inference would mislead analysis and hence conclusions and policymaking. Systematic adaptive cluster sampling (SACS) is design-unbiased and particularly efficient for inventorying spatially clustered populations. However, (1) oversampling is common for nonrare variables, making SACS a difficult choice for inventorying common forest attributes or ecological indicators; (2) a SACS sample is not completely specified until the field campaign is completed, making advance budgeting and logistics difficult; (3) even for rare variables, uncertainty regarding the final sample persists; and (4) a SACS sample may be variable-specific, as its formation can be adapted to a particular attribute or indicator, thus risking imbalance or non-representativeness for other jointly observed variables. To address these challenges, we aim to develop a generalized SACS (GSACS) with respect to the design and estimators, and to illustrate its connections with systematic sampling (SS), which is widely employed by national forest inventories and ecological observation networks around the world. In addition to theoretical derivations, empirical sampling distributions were validated and compared for GSACS and SS using sampling simulations that incorporated a comprehensive set of forest populations exhibiting different spatial patterns.
Five conclusions are relevant: (1) in contrast to SACS, GSACS explicitly supports inventorying forest attributes and ecological indicators that are nonrare, and solves the SACS problems of oversampling, uncertain sample form, and sample imbalance for alternative attributes or indicators; (2) we demonstrated that SS is a special case of GSACS; (3) even with fewer sample plots, GSACS gives estimates identical to SS; (4) GSACS outperforms SS with respect to inventorying clustered populations and for making domain-specific estimates; and (5) the precision in design-based inference is negatively correlated with the prevalence of a spatial pattern, the range of spatial autocorrelation, and the sample plot size, in descending order.
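To make the contrast between SS and adaptive cluster sampling concrete, the following is a minimal sketch on a toy grid. It is not the GSACS design or estimator from the paper. It assumes square grid cells as plots, a 4-neighbor adaptation rule, a condition of the form "value >= threshold", and a Hansen-Hurwitz-type estimator that replaces each initial plot's value with the mean of its network. The names `GRID`, `network_of`, `systematic_sample`, `ss_estimate`, and `sacs_estimate` are hypothetical.

```python
from collections import deque

# Toy population (hypothetical): counts of a clustered attribute per grid cell.
GRID = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 7, 0, 0, 0],
    [0, 3, 0, 0, 0, 0],
    [0, 0, 0, 0, 2, 0],
    [0, 0, 0, 0, 4, 1],
    [0, 0, 0, 0, 0, 0],
]


def network_of(grid, cell, threshold):
    """Return the network containing `cell`: the 4-connected set of cells
    meeting the condition, or the cell alone if it fails the condition."""
    rows, cols = len(grid), len(grid[0])
    r, c = cell
    if grid[r][c] < threshold:
        return {cell}
    seen, queue = {cell}, deque([cell])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and grid[nr][nc] >= threshold:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen


def systematic_sample(rows, cols, step, start):
    """Cells on a regular lattice with spacing `step`, offset by `start`."""
    sr, sc = start
    return [(r, c) for r in range(sr, rows, step) for c in range(sc, cols, step)]


def ss_estimate(grid, step, start):
    """Plain systematic-sampling estimate of the population mean."""
    cells = systematic_sample(len(grid), len(grid[0]), step, start)
    return sum(grid[r][c] for r, c in cells) / len(cells)


def sacs_estimate(grid, step, start, threshold):
    """Adaptive estimate: average, over initial plots, of each plot's network mean."""
    cells = systematic_sample(len(grid), len(grid[0]), step, start)
    total = 0.0
    for cell in cells:
        net = network_of(grid, cell, threshold)
        total += sum(grid[r][c] for r, c in net) / len(net)
    return total / len(cells)


if __name__ == "__main__":
    for start in [(r, c) for r in range(3) for c in range(3)]:
        print(start, ss_estimate(GRID, 3, start), sacs_estimate(GRID, 3, start, 1))
```

Because the grid dimensions are multiples of the step, the possible starts partition the grid into equal-size samples, so averaging either estimator over all starts recovers the true mean in this toy setting. The number of plots actually visited by the adaptive version depends on which networks the initial lattice hits, which illustrates why such a sample is not fully specified in advance.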
Award ID(s):
1922758
PAR ID:
10291041
Author(s) / Creator(s):
Date Published:
Journal Name:
Forest ecology and management
Volume:
489
Issue:
2021
ISSN:
0378-1127
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Transect-based monitoring has long been a valuable tool in ecosystem monitoring. These transects are often used to measure multiple ecosystem attributes. The line-point intercept (LPI), vegetation height, and canopy gap intercept methods comprise a set of core methods, which provide indicators of ecosystem condition. However, users struggle to design a sampling strategy that optimizes the ability to detect ecological change using transect-based methods. We assessed the sensitivity of these core methods on a one-hectare plot to transect length, number, and sampling interval to determine: 1) minimum sampling required to describe ecosystem characteristics and detect change for each method and 2) optimal transect length and number for all three methods to make recommendations for future analyses and monitoring efforts. We used data from 13 National Wind Erosion Research Network locations spanning the western US, which included 151 measurements over time across five biomes. We found that longer and more numerous transects were more important for reducing sampling error than increased sample intensity along transects. For all methods and indicators across plots, three 100-m transects reduced sampling error so that indicator estimates fell within a 95% confidence interval of +/- 5% for canopy gap intercept and LPI-total foliar cover, +/- 5 cm for height and +/- two species for LPI-species counts. For the same criteria at 80% confidence intervals, two 100-m transects are needed. Site-scale inference was strongly affected by sample design; consequently, our understanding of ecological dynamics may be influenced by sampling decisions.
  2. Abstract

    A critical decision in landscape genetic studies is whether to use individuals or populations as the sampling unit. This decision affects the time and cost of sampling and may affect ecological inference. We analyzed 334 Columbia spotted frogs at 8 microsatellite loci across 40 sites in northern Idaho to determine how inferences from landscape genetic analyses would vary with sampling design. At all sites, we compared a proportion available sampling scheme (PASS), in which all samples were used, to resampled datasets of 2–11 individuals. Additionally, we compared a population sampling scheme (PSS) to an individual sampling scheme (ISS) at 18 sites with sufficient sample size. We applied an information theoretic approach with both restricted maximum likelihood and maximum likelihood estimation to evaluate competing landscape resistance hypotheses. We found that PSS supported low‐density forest when restricted maximum likelihood was used, but a combination model of most variables when maximum likelihood was used. We also saw variations when AIC was used compared to BIC. ISS supported this model as well as additional models when testing hypotheses of land cover types that create the greatest resistance to gene flow for Columbia spotted frogs. Increased sampling density and study extent, seen by comparing PSS to PASS, showed a change in model support. As the number of individuals increased, model support converged at 7–9 individuals for ISS to PSS. ISS may be useful to increase study extent and sampling density, but may lack power to provide strong support for the correct model with microsatellite datasets. Our results highlight the importance of additional research on sampling design effects on landscape genetics inference.

  3. Abstract

    Network analysis of infectious disease in wildlife can reveal traits or individuals critical to pathogen transmission and help inform disease management strategies. However, estimates of contact between animals are notoriously difficult to acquire. Researchers commonly use telemetry technologies to identify animal associations, but such data may have different sampling intervals and often capture only a small subset of the population. The objectives of this study were to outline best practices for telemetry sampling in network studies of infectious disease by determining (a) the consequences of telemetry sampling on our ability to estimate network structure, (b) whether contact networks can be approximated using purely spatial contact definitions and (c) how wildlife spatial configurations may influence telemetry sampling requirements.

    We simulated individual movement trajectories for wildlife populations using a home range‐like movement model, creating full location datasets and corresponding ‘complete’ networks. To mimic telemetry data, we created ‘sample’ networks by subsampling the population (10%–100% of individuals) with a range of sampling intervals (every minute to every 3 days). We varied the definition of contact for sample networks, using either spatiotemporal or spatial overlap, and varied the spatial configuration of populations (random, lattice or clustered). To compare complete and sample networks, we calculated seven network metrics important for disease transmission and assessed mean ranked correlation coefficients and percent error between complete and sample network metrics.

    Telemetry sampling severely reduced our ability to calculate global node‐level network metrics, but had less impact on local and network‐level metrics. Even so, in populations with infrequent associations, high intensity telemetry sampling may still be necessary. Defining contact in terms of spatial overlap generally resulted in overly connected networks, but in some instances, could compensate for otherwise coarse telemetry data.

    By synthesizing movement and disease ecology with computational approaches, we characterized trade‐offs important for using wildlife telemetry data beyond ecological studies of individual movement, and found that careful use of telemetry data has the potential to inform network models. Thus, with informed application of telemetry data, we can make significant advances in leveraging its use for a better understanding and management of wildlife infectious disease.

  4. Statistical confidence in estimates of timber volume, carbon storage, and other forest attributes depends, in part, on the uncertainty in field measurements. Surprisingly, measurement uncertainty is rarely reported, even though national forest inventories routinely repeat field measurements for quality assurance. We compared measurements made by field crews and quality assurance crews in the Forest Inventory and Analysis program of the U.S. Forest Service, using data from 2790 plots and 51 740 trees and saplings across the 24 states of the Northern Region. We characterized uncertainty in 12 national core tree-level variables; seven tree crown variables used in forest health monitoring; three variables describing seedlings; and 11 variables describing the site, such as elevation, slope, and distance from a road. Discrepancies in measurement were generally small but were higher for some variables requiring judgment, such as tree class, decay class, and cause of mortality. When scaled up to states, forest types, or the region, uncertainties in basal area, timber volume, and aboveground biomass were negligible. Understanding all sources of uncertainty is important to designing forest monitoring systems, managing the conduct of the inventory, and assessing the uncertainty of forest attributes required for making regional and national forest policy decisions. 
  5. Rutkowski L. ; Scherer R. ; Korytkowski M. ; Pedrycz W. ; Tadeusiewicz R. ; Zurada J. (Ed.)
    In this work, we investigate the impact of class imbalance on the accuracy and diversity of synthetic samples generated by conditional generative adversarial network (CGAN) models. Though many studies utilizing GANs have seen extraordinary success in producing realistic image samples, these studies generally assume the use of well-processed and balanced benchmark image datasets, including MNIST and CIFAR-10. However, well-balanced data is uncommon in real-world applications such as detecting fraud, diagnosing diabetes, and predicting solar flares. It is well known that when class labels are not distributed uniformly, the predictive ability of classification algorithms suffers significantly, a phenomenon known as the "class-imbalance problem." We show that the imbalance in the training set can also impact sample generation of CGAN models. We utilize the well-known MNIST dataset, controlling the imbalance ratio of certain classes within the data through sampling. We are able to show that both the quality and diversity of generated samples suffer in the presence of class imbalances and propose a novel framework named Two-stage CGAN to produce high-quality synthetic samples in such cases. Our results indicate that the proposed framework provides a significant improvement over typical oversampling and undersampling techniques utilized for class imbalance remediation.