

Search for: All records

Award ID contains: 1933497


  1. Abstract Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable “student” model to mimic the predictions made by the black box “teacher” model. However, when the student model is sensitive to the variability of the data sets used for training, even when the teacher is kept fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough sample of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed separately for each specific class of student model. In this paper, we develop a generic approach for stable model distillation based on a central limit theorem for the estimated fidelity of the student to the teacher. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a sample size such that a consistent student model would be selected under different pseudo samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure through a theoretical analysis with a Markov process. The code is publicly available at https://github.com/yunzhe-zhou/GenericDistillation.
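The central-limit-theorem idea in this abstract can be illustrated with a minimal sketch. It is not the authors' procedure (which tests many candidates jointly); it compares only two hypothetical students against a hypothetical teacher and uses a normal approximation for the difference in their disagreement rates. The functions `teacher`, `student_a`, `student_b`, and the uniform pseudo-data generator are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist, mean, stdev

# Hypothetical black-box teacher and two hypothetical candidate students.
def teacher(x):
    return int(x > 0.5)

def student_a(x):
    return int(x > 0.55)   # close to the teacher

def student_b(x):
    return int(x > 0.8)    # further from the teacher

def fidelity_gap(n, seed=0):
    """Estimate how much more often student_b disagrees with the teacher
    than student_a does, on n uniform pseudo-samples."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    # Per-point difference in 0/1 disagreement losses (b minus a).
    d = [int(student_b(x) != teacher(x)) - int(student_a(x) != teacher(x))
         for x in xs]
    gap = mean(d)
    sd = stdev(d)
    z = gap / (sd / math.sqrt(n))  # CLT-based z statistic for gap > 0
    return gap, sd, z

def required_sample_size(gap, sd, alpha=0.05):
    """Smallest n for which, under the normal approximation, the estimated
    gap has the wrong sign with probability at most alpha (needs gap > 0)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return math.ceil((z_alpha * sd / gap) ** 2)

gap, sd, z = fidelity_gap(2000)
n_needed = required_sample_size(gap, sd)
```

With more than two candidates, the error level `alpha` would have to be adjusted for multiple comparisons, which is the part the abstract's multiple testing framework addresses.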
  2. Abstract Community assembly is often treated as deterministic, converging on one or at most a few possible stable endpoints. However, in nature, we typically observe continuous change in community composition, which is often ascribed to environmental change. But continuous changes in community composition can also arise in deterministic, time‐invariant community models, especially food web models. Our goal was to determine why some models produce continuous assembly and others do not. We investigated a simple two‐trophic‐level community model to show that continuous assembly is driven by the relative niche width of the trophic levels. If predators have a larger niche width than prey, community assembly converges to a stable equilibrium. Conversely, if predators have a smaller niche width than prey, then community composition never stabilizes. Evidence that food webs need not reach a stable equilibrium has important implications, as many ecological theories of community ecology based on equilibria may be difficult to apply to such food webs. 
  3. Abstract Varying coefficient models are a flexible extension of generic parametric models whose coefficients are functions of a set of effect-modifying covariates instead of fitted constants. They are capable of achieving higher model complexity while preserving the structure of the underlying parametric models, hence generating interpretable predictions. In this paper we study the use of gradient boosted decision trees as those coefficient-deciding functions in varying coefficient models with linearly structured outputs. In contrast to the traditional choices of splines or kernel smoothers, boosted trees are more flexible since they require no structural assumptions in the effect modifier space. We introduce our proposed method from the perspective of a localized version of gradient descent, prove its theoretical consistency under mild assumptions commonly adopted by decision tree research, and empirically demonstrate that the proposed tree boosted varying coefficient models achieve high performance qualified by their training speed, prediction accuracy and intelligibility as compared to several benchmark algorithms.
  4. Abstract As a general rule, plants defend against herbivores with multiple traits. The defense synergy hypothesis posits that some traits are more effective when co‐expressed with others compared to their independent efficacy. However, this hypothesis has rarely been tested outside of phytochemical mixtures, and seldom under field conditions. We tested for synergies between multiple defense traits of common milkweed (Asclepias syriaca) by assaying the performance of two specialist chewing herbivores on plants in natural populations. We employed regression and a novel application of random forests to identify synergies and antagonisms between defense traits. We found the first direct empirical evidence for two previously hypothesized defense synergies in milkweed (latex by secondary metabolites, latex by trichomes) and identified numerous other potential synergies and antagonisms. Our strongest evidence for a defense synergy was between leaf mass per area and low nitrogen content; given that these “leaf economic” traits typically covary in milkweed, a defense synergy could reinforce their co‐expression. We report that each of the plant defense traits showed context‐dependent effects on herbivores, and increased trait expression could well be beneficial to herbivores for some ranges of observed expression. The novel methods and findings presented here complement more mechanistic approaches to the study of plant defense diversity and provide some of the best evidence to date that multiple classes of plant defense synergize in their impact on insects. Plant defense synergies against highly specialized herbivores, as shown here, are consistent with ongoing reciprocal evolution between these antagonists. 
  5. Abstract Matrix population models are frequently built and used by ecologists to analyse demography and elucidate the processes driving population growth or decline. Life Table Response Experiments (LTREs) are comparative analyses that decompose the realized difference or variance in population growth rate (λ) into contributions from the differences or variances in the vital rates (i.e. the matrix elements). Since their introduction, LTREs have been based on approximations and have not included biologically relevant interaction terms. We used the functional analysis of variance framework to derive an exact LTRE method, which calculates the exact response of λ to the difference or variance in a given vital rate, for all interactions among vital rates, including higher‐order interactions neglected by the classical methods. We used the publicly available COMADRE and COMPADRE databases to perform a meta‐analysis comparing the results of exact and classical LTRE methods. We analysed 186 and 1487 LTREs for animal and plant matrix population models, respectively. We found that the classical methods often had small errors, but that very high errors were possible. Overall error was related to the difference or variance in the matrices being analysed, consistent with the Taylor series basis of the classical method. Neglected interaction terms accounted for most of the errors in fixed design LTRE, highlighting the importance of two‐way interaction terms. For random design LTRE, errors in the contribution terms present in both classical and exact methods were comparable to errors due to neglected interaction terms. In most examples we analysed, evaluating exact contributions up to three‐way interaction terms was sufficient for interpreting 90% or more of the difference or variance in λ. Relative error, previously used to evaluate the accuracy of classical LTREs, is not a reliable metric of how closely the classical and exact methods agree. 
Error compensation between estimated contribution terms and neglected contribution terms can lead to low relative error despite faulty biological interpretation. Trade‐offs or negative covariances among matrix elements can lead to high relative error despite accurate biological interpretation. Exact LTRE provides reliable and accurate biological interpretation, and the R package exactLTRE makes the exact method accessible to ecologists.
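A rough sketch of how a fixed-design decomposition with interaction terms might be computed is given below. It is one reading of the abstract, not the exactLTRE package's API: each term is the change in λ when a subset of matrix elements is switched from the reference to the treatment value, minus all lower-order terms contained in that subset. The function names and the two example matrices are hypothetical.

```python
import itertools
import numpy as np

def growth_rate(A):
    """Dominant eigenvalue (lambda) of a non-negative projection matrix."""
    return float(np.max(np.real(np.linalg.eigvals(A))))

def exact_contributions(A_ref, A_trt, max_order=2):
    """Main effects and interactions (up to max_order) of the matrix
    elements that differ between the reference and treatment matrices."""
    differing = [tuple(ij) for ij in np.argwhere(A_ref != A_trt)]
    lam_ref = growth_rate(A_ref)

    def lam_with(subset):
        # Reference matrix with only the chosen elements set to treatment values.
        A = A_ref.copy()
        for i, j in subset:
            A[i, j] = A_trt[i, j]
        return growth_rate(A)

    terms = {}
    for order in range(1, max_order + 1):
        for subset in itertools.combinations(differing, order):
            total = lam_with(subset) - lam_ref
            # Subtract every lower-order term nested inside this subset.
            lower = sum(v for s, v in terms.items() if set(s) < set(subset))
            terms[subset] = total - lower
    return terms

# Hypothetical two-stage matrices differing in fecundity and maturation.
A_ref = np.array([[0.0, 1.5], [0.5, 0.8]])
A_trt = np.array([[0.0, 2.0], [0.3, 0.8]])
terms = exact_contributions(A_ref, A_trt, max_order=2)
```

Because only two elements differ here, the two main effects plus the single two-way interaction add up to the full difference in λ between the matrices; truncating `max_order` below the number of differing elements leaves a remainder.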
  6. Abstract Selecting among competing statistical models is a core challenge in science. However, the many possible approaches and techniques for model selection, and the conflicting recommendations for their use, can be confusing. We contend that much confusion surrounding statistical model selection results from failing to first clearly specify the purpose of the analysis. We argue that there are three distinct goals for statistical modeling in ecology: data exploration, inference, and prediction. Once the modeling goal is clearly articulated, an appropriate model selection procedure is easier to identify. We review model selection approaches and highlight their strengths and weaknesses relative to each of the three modeling goals. We then present examples of modeling for exploration, inference, and prediction using a time series of butterfly population counts. These show how a model selection approach flows naturally from the modeling goal, leading to different models selected for different purposes, even with exactly the same data set. This review illustrates best practices for ecologists and should serve as a reminder that statistical recipes cannot substitute for critical thinking or for the use of independent data to test hypotheses and validate predictions. 
  7. Rangel, Thiago F (Ed.)
    Species distribution models (SDMs) are frequently data-limited. In aquatic habitats, emerging environmental DNA (eDNA) sampling methods can be quicker and more cost-efficient than traditional count and capture surveys, but their utility for fitting SDMs is complicated by dilution, transport, and loss processes that modulate DNA concentrations and mix eDNA from different locations. Past models for estimating organism densities from measured species-specific eDNA concentrations have accounted for how these processes affect expected concentrations. We built on this previous work to construct a linear hierarchical model that also accounts for how they give rise to spatially correlated concentration errors. We applied our model to 60 simulated stream networks and three types of species niches in order to answer two questions: 1) what is the D-optimal sampling design, i.e. where should eDNA samples be positioned to most precisely estimate species–environment relationships? and 2) how does parameter estimation accuracy depend on the stream network’s topological and hydrologic properties? We found that correcting for eDNA dynamics was necessary to obtain consistent parameter estimates, and that relative to a heuristic benchmark design, optimizing sampling locations improved design efficiency by an average of 41.5%. Samples in the D-optimal design tended to be positioned near downstream ends of stream reaches high in the watershed, where eDNA concentration was high and mostly from homogeneous source areas, and they collectively spanned the full ranges of covariates. When measurement error was large, it was often optimal to collect replicate samples from high-information reaches. eDNA-based estimates of species–environment regression parameters were most precise in stream networks that had many reaches, large geographic size, slow flows, and/or high eDNA loss rates. 
Our study demonstrates the importance and viability of accounting for eDNA dilution, transport, and loss in order to optimize sampling designs and improve the accuracy of eDNA-based species distribution models. 
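As a generic illustration of D-optimality (not the study's model), the sketch below greedily picks sampling sites that maximize the log-determinant of the information matrix for an ordinary linear regression. It assumes independent, equal-variance errors, whereas the abstract's model includes spatially correlated errors and eDNA transport; the candidate sites, the `ridge` term, and the function name are hypothetical, and a greedy search is not guaranteed to find the global optimum.

```python
import numpy as np

def greedy_d_optimal(candidates, n_samples, ridge=1e-6):
    """Greedily choose rows of `candidates` (one design row per site) to
    maximize log det of the information matrix X^T X. Sites may be picked
    more than once, which corresponds to taking replicate samples."""
    p = candidates.shape[1]
    info = ridge * np.eye(p)  # small ridge keeps early determinants non-zero
    chosen = []
    for _ in range(n_samples):
        # Log-determinant of the information matrix after adding each site.
        gains = [np.linalg.slogdet(info + np.outer(x, x))[1] for x in candidates]
        best = int(np.argmax(gains))
        chosen.append(best)
        info = info + np.outer(candidates[best], candidates[best])
    return chosen, info

# Hypothetical sites: an intercept plus one covariate spread over [-1, 1].
covariate = np.linspace(-1.0, 1.0, 11)
sites = np.column_stack([np.ones_like(covariate), covariate])
chosen, info = greedy_d_optimal(sites, n_samples=4)
```

For this straight-line example the selected sites sit at the two extremes of the covariate, with replicates at each, which echoes the abstract's observation that good designs span the full covariate range.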
    Free, publicly-accessible full text available February 28, 2026
  8. Taylor, Caz M (Ed.)
    Abstract: One strand of modern coexistence theory (MCT) partitions invader growth rates (IGR) to quantify how different mechanisms contribute to species coexistence, highlighting fluctuation‐dependent mechanisms. A general conclusion from the classical analytic MCT theory is that coexistence mechanisms relying on temporal variation (such as the temporal storage effect) are generally less effective at promoting coexistence than mechanisms relying on spatial or spatiotemporal variation (primarily growth‐density covariance). However, the analytic theory assumes continuous population density, and IGRs are calculated for infinitesimally rare invaders that have infinite time to find their preferred habitat and regrow, without ever experiencing intraspecific competition. Here we ask if the disparity between spatial and temporal mechanisms persists when individuals are, instead, discrete and occupy finite amounts of space. We present a simulation‐based approach to quantifying IGRs in this situation, building on our previous approach for spatially non‐varying habitats. As expected, we found that spatial mechanisms are weakened; unexpectedly, the contribution to IGR from growth‐density covariance could even become negative, opposing coexistence. We also found shifts in which demographic parameters had the largest effect on the strength of spatial coexistence mechanisms. Our substantive conclusions are statements about one model, across parameter ranges that we subjectively considered realistic. Using the methods developed here, effects of individual discreteness should be explored theoretically across a broader range of conditions, and in models parameterized from empirical data on real communities. 
    Free, publicly-accessible full text available November 1, 2025
  9. Kisdi, Éva; Akçay, Erol (Ed.)
    In many species, a few individuals produce most of the next generation. How much of this reproductive skew is driven by variation among individuals in fixed traits, how much by external factors, and how much by random chance? And what does it take to have truly exceptional lifetime reproductive output (LRO)? In the past, we and others have partitioned the variance of LRO as a proxy for reproductive skew. Here we explain how to partition LRO skewness itself into contributions from fixed trait variation, four forms of “demographic luck” (birth state, fecundity luck, survival trajectory luck, and growth trajectory luck), and two kinds of “environmental luck” (birth environment and environment trajectory). Each of these is further partitioned into contributions at different ages. We also determine what we can infer about individuals with exceptional LRO. We find that reproductive skew is largely driven by random variation in lifespan, and exceptional LRO generally results from exceptional lifespan. Other kinds of luck frequently bring skewness down rather than increasing it. In populations where fecundity varies greatly with environmental conditions, getting a good year at the right time can be an alternate route to exceptional LRO, so that LRO is less predictive of lifespan.
  10. Heino, Mikko (Ed.)
    Abstract: Chance pervades life. In turn, life histories are described by probabilities (e.g. survival and breeding) and averages across individuals (e.g. mean growth rate and age at maturity). In this study, we explored patterns of luck in lifetime outcomes by analysing structured population models for a wide array of plant and animal species. We calculated four response variables: variance and skewness in both lifespan and lifetime reproductive output (LRO), and partitioned them into contributions from different forms of luck. We examined relationships among response variables and a variety of life history traits. We found that variance in lifespan and variance in LRO were positively correlated across taxa, but that variance and skewness were negatively correlated for both lifespan and LRO. The most important life history trait was longevity, which shaped variance and skew in LRO through its effects on variance in lifespan. We found that luck in survival, growth, and fecundity all contributed to variance in LRO, but skew in LRO was overwhelmingly due to survival luck. Rapidly growing populations have larger variances in LRO and lifespan than shrinking populations. Our results indicate that luck‐induced genetic drift may be most severe in recovering populations of species with long mature lifespan and high iteroparity. 