Title: The accuracy of phenology estimators for use with sparsely sampled presence‐only observations

Phenology is one of the most immediate responses to global climate change, but data limitations have made examining phenology patterns across greater taxonomic, spatial and temporal scales challenging. One significant opportunity is leveraging rapidly increasing data resources from digitized museum specimens and community science platforms, but this assumes reliable statistical methods are available to estimate phenology using presence‐only data. Estimating the onset or offset of key events is especially difficult with incidental data, as lower data densities occur towards the tails of an abundance distribution.

The Weibull distribution has been recognized as an appropriate distribution to estimate phenology based on presence‐only data, but Weibull‐informed estimators are only available for onset and offset. We describe the mathematical framework for a new Weibull‐parameterized estimator of phenology appropriate for any percentile of a distribution and make it available in an R package, phenesse. We use simulations and empirical data on open flower timing and first arrival of monarch butterflies to quantify the accuracy of our estimator and other commonly used phenological estimators for 10 phenological metrics: onset, mean and offset dates, as well as the 1st, 5th, 10th, 50th, 90th, 95th and 99th percentile dates. Root mean squared errors and mean bias of the phenological estimators were calculated for different patterns of abundance and observation processes.
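The evaluation described above, scoring an estimator by root mean squared error and mean bias across repeated sparse samples, can be sketched in a few lines. This is an illustrative Python sketch, not the phenesse implementation: it uses a naive empirical percentile rather than the Weibull‐parameterized estimator, and the normal "true" phenology (mean day 150, SD 20), the sample size of 20 and the function names are all assumptions made for the example.

```python
import math
import random
import statistics


def percentile(values, q):
    """Empirical q-th percentile (0-100) using linear interpolation."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def rmse_and_bias(estimates, truth):
    """Root mean squared error and mean bias of estimates around a known truth."""
    errors = [e - truth for e in estimates]
    rmse = math.sqrt(statistics.fmean([err ** 2 for err in errors]))
    bias = statistics.fmean(errors)
    return rmse, bias


def simulate(q, n_obs=20, n_reps=500, seed=1):
    """Score the naive percentile estimator on sparse samples (hypothetical setup)."""
    rng = random.Random(seed)
    # Assumed "true" phenology: day of year ~ Normal(150, 20).
    population = [rng.gauss(150, 20) for _ in range(20000)]
    truth = percentile(population, q)
    estimates = [percentile(rng.sample(population, n_obs), q) for _ in range(n_reps)]
    return rmse_and_bias(estimates, truth)


rmse_median, bias_median = simulate(50)  # 50th percentile
rmse_onset, bias_onset = simulate(1)     # 1st percentile, near onset
print(f"median: RMSE={rmse_median:.1f}, bias={bias_median:.1f}")
print(f"onset:  RMSE={rmse_onset:.1f}, bias={bias_onset:.1f}")
```

With few observations, the naive estimator of an early percentile cannot fall below the earliest observed date, so it tends to estimate onset too late. That limitation is the kind of tail behaviour a dedicated onset estimator is meant to address.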

Results show a general pattern of decay in performance of estimates when moving from mean estimates towards the tails of the seasonal abundance curve, suggesting that onset and offset continue to be the most difficult phenometrics to estimate. However, with simple phenologies and enough observations, our newly developed estimator can provide useful onset and offset estimates. This is especially true for the start of the season, when incidental observations may be more common.

Our simulation demonstrates the potential of generating accurate phenological estimates from presence‐only data and guides the best use of estimators. The estimator that we developed, phenesse, is the least biased and has the lowest estimation error for onset estimates under most simulated and empirical conditions examined, improving the robustness of these estimates for phenological research.

Award ID(s):
2033263 1703048 1702664
Journal Name:
Methods in Ecology and Evolution
Page Range / eLocation ID:
p. 1273-1285
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Species with different life histories and communities that vary in their seasonal constraints tend to shift their phenology (seasonal timing) differentially in response to climate warming.

    We investigate how these variable phenological shifts aggregate to influence phenological overlap within communities. Phenological advancements of later season species and extended durations of early season species may increase phenological overlap, with implications for species' interactions such as resource competition.

    We leverage extensive historic (1958–1960) and recent (2006–2015) weekly survey data for communities of grasshoppers along a montane elevation gradient to assess the impact of climate on shifts in the phenology and abundance distributions of species. We then examine how these responses are influenced by the seasonal timing of species and elevation, and how in aggregate they influence degrees of phenological overlap within communities.

    In warmer years, abundance distributions shift earlier in the season and become broader. Total abundance responds variably among species and we do not detect a significant response across species. Shifts in abundance distributions are not strongly shaped by species' seasonal timing or sites of variable elevations. The area of phenological overlap increases in warmer years due to shifts in the relative seasonal timing of compared species. Species that overwinter as nymphs increasingly overlap with later season species that advance their phenology. The days of phenological overlap also increase in warm years but the response varies across sites of variable elevation. Our phenological overlap metric based on comparing single events—the dates of peak abundance—does not shift significantly with warming.
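The abstract does not specify how the area of phenological overlap is computed. One common definition, assumed here, is the area shared by two abundance curves after each is normalized to sum to one. The Python sketch below uses hypothetical normal‐shaped seasonal curves and made‐up peak dates and widths purely for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used as a stand-in abundance curve."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def overlap_area(mean_a, sd_a, mean_b, sd_b, days=range(1, 366)):
    """Shared area of two normalized daily abundance curves (0 = none, 1 = identical)."""
    dens_a = [normal_pdf(d, mean_a, sd_a) for d in days]
    dens_b = [normal_pdf(d, mean_b, sd_b) for d in days]
    total_a, total_b = sum(dens_a), sum(dens_b)
    return sum(min(a / total_a, b / total_b) for a, b in zip(dens_a, dens_b))


# Hypothetical cool year: narrow curves with peaks 40 days apart.
cool = overlap_area(180, 10, 220, 10)
# Hypothetical warm year: broader curves, later species advancing more.
warm = overlap_area(175, 14, 205, 14)
print(f"cool-year overlap: {cool:.2f}, warm-year overlap: {warm:.2f}")
```

Under these assumed values, broader curves and closer peaks both raise the shared area, which matches the direction of the pattern reported for warm years.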

    Phenological shifts are more complex than shifts in single dates such as first occurrence. As abundance distributions shift earlier and become broader in warm years, phenological overlap increases. Our analysis suggests that overall grasshopper abundance is relatively robust to climate and associated phenological shifts but we find that increased overlap can decrease abundance, potentially by strengthening species interactions such as resource competition.

  2. Abstract

    Understanding patterns of diversity is central to ecology and conservation, yet estimates of diversity are often biased by imperfect detection. In recent years, multi‐species occupancy models (MSOM) have been developed as a statistical tool to account for species‐specific heterogeneity in detection while estimating true measures of diversity. Although the power of these models has been tested in various ways, their ability to estimate gamma diversity (true community size, N) is a largely unrecognized feature that needs rigorous evaluation.

    We use both simulations and an empirical dataset to evaluate the bias, precision, accuracy and coverage of estimates of N from MSOM compared to the widely applied iChao2 non‐parametric estimator. We simulated 5,600 datasets across seven scenarios of varying average occupancy and detectability covariates, as well as varying numbers of sites, replicates and true community size. Additionally, we use a real dataset of surveys over 9 years (where species accumulation reached an asymptote, indicating true N), to estimate N from each annual survey.
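For readers unfamiliar with incidence‐based richness estimators, the Python sketch below implements the classic Chao2 estimator, which iChao2 refines. It is not iChao2 itself and not the authors' code; the function name and the example detection counts are hypothetical.

```python
def chao2(sites_detected_per_species, n_sites):
    """Classic Chao2 lower-bound estimate of community size from incidence data.

    sites_detected_per_species: for each observed species, the number of
    sampling units (sites) in which it was detected.
    n_sites: total number of sampling units surveyed.
    """
    counts = [c for c in sites_detected_per_species if c > 0]
    s_obs = len(counts)                     # observed species richness
    q1 = sum(1 for c in counts if c == 1)   # species found at exactly one site
    q2 = sum(1 for c in counts if c == 2)   # species found at exactly two sites
    k = (n_sites - 1) / n_sites             # small-sample correction factor
    if q2 > 0:
        return s_obs + k * q1 * q1 / (2 * q2)
    # Variant used when no species occurs at exactly two sites.
    return s_obs + k * q1 * (q1 - 1) / 2


# Hypothetical survey: 8 species observed across 10 sites.
detections = [5, 3, 1, 1, 1, 2, 2, 4]
print(chao2(detections, 10))
```

The estimate exceeds the raw species count whenever some species are detected at only one site. This is why such estimators are preferred over raw counts when detection is imperfect.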

    Simulations showed that both MSOM and iChao2 estimators are generally accurate (i.e. unbiased and precise) except under non‐ideal scenarios where mean species occupancy is low. In such scenarios, MSOM frequently overestimated N. Across all scenarios, MSOM estimates were less certain than iChao2, but this led to over‐confident iChao2 estimates that showed poor coverage. Results from the real dataset largely confirmed the simulation findings, with MSOM estimates showing greater accuracy and coverage than iChao2.

    Community ecologists have a wide choice of analytical methods, and both iChao2 and MSOM estimates of N are substantially preferable to raw species counts. The simplicity of non‐parametric estimators has obvious advantages, but our results show that in many cases, MSOM may provide superior estimates that also account more accurately for uncertainty. Both methods can show strong bias when average occupancy is very low, and practitioners should show caution when using estimates derived from either method under such conditions.

  3. Abstract

    Seed bank, seed dispersal and historical disturbance are critical factors affecting plant population persistence. However, because of difficulties collecting data on these factors they are often ignored.

    We evaluated the roles of seed bank, seed dispersal and historical disturbance on metapopulation persistence of Hypericum cumulicola, a Florida endemic. We took advantage of long‐term demographic data of multiple populations (22 years; ~11 K individuals; 15 populations) and a wealth of information on burn history (1962–present), and habitat attributes (patch specific location, elevation, area and aggregation) of a system of 92 patches of Florida rosemary scrub. We used previously developed integral projection models to assess the relative ability of simulations with different levels of seed dormancy for recently produced and older seeds and different dispersal kernels (including no dispersal) to predict regional observed occupancy and plant abundance in patches in 2016–2018. We compared a simulation with this model using historical burn history to 500 model simulations with the same average fire regime (using a Weibull distribution to determine the probability of ignition) but with random ignition years.

    The most likely model had limited dispersal (mean = 0.5 m) and the highest dormancy (field estimates × 1.2 %) and its predictions were associated with observed occurrences (67% correct) and densities (20% of variance explained). Historical burn synchrony among neighbouring patches (skewness in the number of patches burned by year = 1.79) probably explains the higher densities predicted by the simulation with the historical fire regime compared with predicted abundances after simulations using random ignition years (skewness = 0.20 + SE = 0.01).

    Synthesis. Our findings demonstrate the pivotal role of seed dormancy, dispersal and fire history on population dynamics, distribution and abundance. Because of the prevalence of metapopulation dynamics, we should be aware of the significance of changes in the availability and configuration of suitable habitat associated with human or non‐human landscape changes. Decisions on prescribed fires (or other disturbances) will benefit from our knowledge of consequences of fire frequency, but also of location of ignition and the probability of fire spread.

  4. Abstract

    Estimating phenotypic distributions of populations and communities is central to many questions in ecology and evolution. These distributions can be characterized by their moments (mean, variance, skewness and kurtosis) or diversity metrics (e.g. functional richness). Typically, such moments and metrics are calculated using community‐weighted approaches (e.g. abundance‐weighted mean). We propose an alternative bootstrapping approach that allows flexibility in trait sampling and explicit incorporation of intraspecific variation, and show that this approach significantly improves estimation while allowing us to quantify uncertainty.

    We assess the performance of different approaches for estimating the moments of trait distributions across various sampling scenarios, taxa and datasets by comparing estimates derived from simulated samples with the true values calculated from full datasets. Simulations differ in sampling intensity (individuals per species), sampling biases (abundance, size), trait data source (local vs. global) and estimation method (two types of community‐weighting, two types of bootstrapping).

    We introduce the traitstrap R package, which contains a modular and extensible set of bootstrapping and weighted‐averaging functions that use community composition and trait data to estimate the moments of community trait distributions with their uncertainty. Importantly, the first function in the workflow, trait_fill, allows the user to specify hierarchical structures (e.g. plot within site, experiment vs. control, species within genus) to assign trait values to each taxon in each community sample.

    Across all taxa, simulations and metrics, bootstrapping approaches were more accurate and less biased than community‐weighted approaches. With bootstrapping, a sample size of 9 or more measurements per species per trait generally included the true mean within the 95% CI. It reduced average percent errors by 26%–74% relative to community‐weighting. Random sampling across all species outperformed both size‐ and abundance‐biased sampling.
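The idea of a nonparametric bootstrap with a 95% interval can be illustrated generically. The Python sketch below resamples a single hypothetical set of nine trait measurements with replacement and reports a percentile interval for the mean. It does not reproduce the traitstrap workflow or its function signatures, and the trait values are invented for the example.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot))
    alpha = (1 - level) / 2
    lower = means[int(alpha * n_boot)]
    upper = means[int((1 - alpha) * n_boot) - 1]
    return statistics.fmean(values), lower, upper


# Hypothetical leaf-area measurements for one species (nine individuals).
leaf_area = [12.1, 9.8, 11.4, 10.9, 13.0, 10.2, 11.7, 12.5, 9.9]
mean, lower, upper = bootstrap_mean_ci(leaf_area)
print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval width comes directly from the spread of the resampled means, so the uncertainty statement needs no distributional assumption about the trait.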

    Our results suggest randomly sampling ~9 individuals per sampling unit and species, covering all species in the community and analysing the data using nonparametric bootstrapping generally enable reliable inference on trait distributions, including the central moments, of communities. By providing better estimates of community trait distributions, bootstrapping approaches can improve our ability to link traits to both the processes that generate them and their effects on ecosystems.

  5. Abstract

    Home range estimation is routine practice in ecological research. While advances in animal tracking technology have increased our capacity to collect data to support home range analysis, these same advances have also resulted in increasingly autocorrelated data. Consequently, the question of which home range estimator to use on modern, highly autocorrelated tracking data remains open. This question is particularly relevant given that most estimators assume independently sampled data. Here, we provide a comprehensive evaluation of the effects of autocorrelation on home range estimation. We base our study on an extensive data set of GPS locations from 369 individuals representing 27 species distributed across five continents. We first assemble a broad array of home range estimators, including Kernel Density Estimation (KDE) with four bandwidth optimizers (Gaussian reference function, autocorrelated‐Gaussian reference function [AKDE], Silverman's rule of thumb, and least squares cross‐validation), Minimum Convex Polygon, and Local Convex Hull methods. Notably, all of these estimators except AKDE assume independent and identically distributed (IID) data. We then employ half‐sample cross‐validation to objectively quantify estimator performance, and the recently introduced effective sample size for home range area estimation to quantify the information content of each data set. We found that AKDE 95% area estimates were larger than conventional IID‐based estimates by a mean factor of 2. The median percentage of cross‐validated locations included in the hold‐out sets by AKDE 95% (or 50%) estimates was 95.3% (or 50.1%), confirming the larger AKDE ranges were appropriately selective at the specified quantile. Conversely, conventional estimates exhibited negative bias that increased with decreasing effective sample size.
To contextualize our empirical results, we performed a detailed simulation study to tease apart how sampling frequency, sampling duration, and the focal animal's movement conspire to affect range estimates. Paralleling our empirical results, the simulation study demonstrated that AKDE was generally more accurate than conventional methods, particularly for small effective sample sizes. While 72% of the 369 empirical data sets had >1,000 total observations, only 4% had an effective sample size >1,000, and 30% had an effective sample size <30. In this frequently encountered scenario of small effective sample size, AKDE was the only estimator capable of producing an accurate home range estimate on autocorrelated data.
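Half‐sample cross‐validation can be sketched with the simplest estimator in the list, the Minimum Convex Polygon. The Python example below fits a 100% convex hull to the first half of a track and reports the fraction of second‐half locations that fall inside it. It is a simplified illustration under stated assumptions, not the study's procedure: it does not implement AKDE or quantile‐specific ranges, and the simulated autocorrelated track, its parameters and all function names are hypothetical.

```python
import random


def cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def inside(hull, p):
    """True if p lies inside or on the boundary of a counter-clockwise convex hull."""
    n = len(hull)
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


def holdout_coverage(track):
    """Fit the hull on the first half; return the share of held-out points it contains."""
    half = len(track) // 2
    hull = convex_hull(track[:half])
    held_out = track[half:]
    return sum(inside(hull, p) for p in held_out) / len(held_out)


def simulate_track(n, persistence, seed=0):
    """Hypothetical range-resident movement: each step stays close to the previous one."""
    rng = random.Random(seed)
    x = y = 0.0
    track = []
    for _ in range(n):
        x = persistence * x + rng.gauss(0, 1)
        y = persistence * y + rng.gauss(0, 1)
        track.append((x, y))
    return track


coverage = holdout_coverage(simulate_track(400, persistence=0.98))
print(f"share of held-out locations inside the first-half hull: {coverage:.2f}")
```

A coverage well below the nominal level indicates that the range fitted to the first half underestimates the area the animal goes on to use, which is the negative bias the abstract attributes to IID‐based estimators on autocorrelated data.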
