Search: All records, Creators/Authors contains: "Hooker, Giles"


  1. Taylor, Caz M (Ed.)
    Abstract: One strand of modern coexistence theory (MCT) partitions invader growth rates (IGR) to quantify how different mechanisms contribute to species coexistence, highlighting fluctuation‐dependent mechanisms. A general conclusion from the classical analytic MCT theory is that coexistence mechanisms relying on temporal variation (such as the temporal storage effect) are generally less effective at promoting coexistence than mechanisms relying on spatial or spatiotemporal variation (primarily growth‐density covariance). However, the analytic theory assumes continuous population density, and IGRs are calculated for infinitesimally rare invaders that have infinite time to find their preferred habitat and regrow, without ever experiencing intraspecific competition. Here we ask if the disparity between spatial and temporal mechanisms persists when individuals are, instead, discrete and occupy finite amounts of space. We present a simulation‐based approach to quantifying IGRs in this situation, building on our previous approach for spatially non‐varying habitats. As expected, we found that spatial mechanisms are weakened; unexpectedly, the contribution to IGR from growth‐density covariance could even become negative, opposing coexistence. We also found shifts in which demographic parameters had the largest effect on the strength of spatial coexistence mechanisms. Our substantive conclusions are statements about one model, across parameter ranges that we subjectively considered realistic. Using the methods developed here, effects of individual discreteness should be explored theoretically across a broader range of conditions, and in models parameterized from empirical data on real communities.
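
    The simulation-based estimate described above lends itself to a generic Monte Carlo recipe: run the resident community to stationarity, introduce a small number of discrete invader individuals, and average their realized log growth over many replicates. Below is a minimal sketch of that recipe in Python; `step_community` and `init_resident_state` are hypothetical hooks for a model-specific simulator, not functions from the paper.

```python
import numpy as np

def estimate_igr(step_community, init_resident_state, n_reps=500,
                 burn_in=200, horizon=50, n_invaders=5, seed=None):
    """Monte Carlo sketch of an invader growth rate (IGR) with discrete
    individuals: introduce a few invaders into a stationary resident
    community and average their realized per-step log growth.

    `step_community(state, rng)` advances the community one time step;
    `init_resident_state()` builds a resident-only community. Both are
    hypothetical stand-ins for a model-specific simulator, and `state`
    is assumed to expose per-species counts via `state.counts`.
    """
    rng = np.random.default_rng(seed)
    log_growth = []
    for _ in range(n_reps):
        state = init_resident_state()
        for _ in range(burn_in):              # let the resident reach stationarity
            state = step_community(state, rng)
        state.counts["invader"] = n_invaders  # introduce discrete invaders
        for _ in range(horizon):
            state = step_community(state, rng)
        n_final = state.counts["invader"]
        if n_final > 0:  # extinct runs are dropped here, which biases the estimate upward
            log_growth.append(np.log(n_final / n_invaders) / horizon)
    return float(np.mean(log_growth))
```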

     
    Free, publicly-accessible full text available November 1, 2025
  2. Abstract

    Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable “student” model to mimic the predictions made by the black box “teacher” model. However, when the student model is sensitive to the variability of the data sets used for training, even with the teacher fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough sample of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed separately for each specific class of student model. In this paper, we develop a generic approach for stable model distillation based on a central limit theorem for the estimated fidelity of the student to the teacher. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a sample size such that a consistent student model would be selected across different pseudo-samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on the Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure through a theoretical analysis with a Markov process. The code is publicly available at https://github.com/yunzhe-zhou/GenericDistillation.
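
    A minimal sketch of the sample-size test is shown below, specialized to classification fidelity (agreement between student and teacher labels). The per-point agreement indicators are i.i.d., so their mean is approximately normal, and a Bonferroni-corrected one-sided test asks whether the apparent best candidate's fidelity is separated from every rival's. `sample_pseudo` is a hypothetical pseudo-data generator; the paper's framework is more general than this illustration.

```python
import numpy as np
from scipy import stats

def select_student(teacher, students, sample_pseudo, n, alpha=0.05):
    """One round of the CLT-based check: estimate each candidate
    student's fidelity to the teacher on n pseudo-points and test
    whether the apparent best candidate beats every rival."""
    X = sample_pseudo(n)
    y_teacher = teacher.predict(X)
    # Per-point agreement indicators, one column per candidate student.
    agree = np.column_stack(
        [s.predict(X) == y_teacher for s in students]).astype(float)
    best = int(np.argmax(agree.mean(axis=0)))
    for j in range(len(students)):
        if j == best:
            continue
        diff = agree[:, best] - agree[:, j]           # paired fidelity differences
        t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n) + 1e-12)
        p = stats.norm.sf(t)                          # CLT: approx. normal for large n
        if p > alpha / (len(students) - 1):           # Bonferroni correction
            return None                               # inconclusive at this sample size
    return best                                       # a stably selected student
```

    In use, one would start from a moderate n and grow it until `select_student` returns a winner, mirroring the idea of choosing a pseudo-sample size large enough that the same student is selected across replications.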

     
  3. Heino, Mikko (Ed.)
    Abstract: Chance pervades life. In turn, life histories are described by probabilities (e.g. survival and breeding) and averages across individuals (e.g. mean growth rate and age at maturity). In this study, we explored patterns of luck in lifetime outcomes by analysing structured population models for a wide array of plant and animal species. We calculated four response variables: variance and skewness in both lifespan and lifetime reproductive output (LRO), and partitioned them into contributions from different forms of luck. We examined relationships among response variables and a variety of life history traits. We found that variance in lifespan and variance in LRO were positively correlated across taxa, but that variance and skewness were negatively correlated for both lifespan and LRO. The most important life history trait was longevity, which shaped variance and skew in LRO through its effects on variance in lifespan. We found that luck in survival, growth, and fecundity all contributed to variance in LRO, but skew in LRO was overwhelmingly due to survival luck. Rapidly growing populations have larger variances in LRO and lifespan than shrinking populations. Our results indicate that luck‐induced genetic drift may be most severe in recovering populations of species with long mature lifespan and high iteroparity.
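
    The luck decomposition rests on individual-level stochasticity in survival, growth, and fecundity. As a rough, self-contained illustration of where variance and skew in lifespan and LRO come from, the sketch below simulates independent individuals through a generic stage-structured model; the matrix P and fecundity vector F are placeholders, not parameters from the study.

```python
import numpy as np

def simulate_life_histories(P, F, start_stage=0, n_ind=100_000, seed=None):
    """Individual-based sketch: follow independent individuals through a
    stage-structured model. Column j of P holds the probabilities of
    moving from stage j to each stage (column sums <= 1; the remainder
    is death), and F[j] is the mean per-step fecundity in stage j."""
    rng = np.random.default_rng(seed)
    lifespan = np.zeros(n_ind, dtype=int)
    lro = np.zeros(n_ind)
    for i in range(n_ind):
        stage = start_stage
        while stage is not None:
            lifespan[i] += 1
            lro[i] += rng.poisson(F[stage])      # fecundity luck
            cum = np.cumsum(P[:, stage])
            u = rng.random()                     # survival/growth luck
            stage = int(np.searchsorted(cum, u)) if u < cum[-1] else None
    return lifespan, lro

# Variance and skew in lifetime reproductive output across individuals:
#   lifespan, lro = simulate_life_histories(P, F)
#   print(lro.var(), scipy.stats.skew(lro))
```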

     
  4. Abstract

    Matrix population models are frequently built and used by ecologists to analyse demography and elucidate the processes driving population growth or decline. Life Table Response Experiments (LTREs) are comparative analyses that decompose the realized difference or variance in population growth rate (λ) into contributions from the differences or variances in the vital rates (i.e. the matrix elements). Since their introduction, LTREs have been based on approximations and have not included biologically relevant interaction terms.

    We used the functional analysis of variance framework to derive an exact LTRE method, which calculates the exact response of to the difference or variance in a given vital rate, for all interactions among vital rates—including higher‐order interactions neglected by the classical methods. We used the publicly available COMADRE and COMPADRE databases to perform a meta‐analysis comparing the results of exact and classical LTRE methods. We analysed 186 and 1487 LTREs for animal and plant matrix population models, respectively.

    We found that the classical methods often had small errors, but that very high errors were possible. Overall error was related to the difference or variance in the matrices being analysed, consistent with the Taylor series basis of the classical method. Neglected interaction terms accounted for most of the errors in fixed design LTRE, highlighting the importance of two‐way interaction terms. For random design LTRE, errors in the contribution terms present in both classical and exact methods were comparable to errors due to neglected interaction terms. In most examples we analysed, evaluating exact contributions up to three‐way interaction terms was sufficient for interpreting 90% or more of the difference or variance in λ.

    Relative error, previously used to evaluate the accuracy of classical LTREs, is not a reliable metric of how closely the classical and exact methods agree. Error compensation between estimated contribution terms and neglected contribution terms can lead to low relative error despite faulty biological interpretation. Trade‐offs or negative covariances among matrix elements can lead to high relative error despite accurate biological interpretation. Exact LTRE provides reliable and accurate biological interpretation, and the R package exactLTRE makes the exact method accessible to ecologists.
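
    For the fixed design, the exact decomposition can be written directly: the contribution of a set S of differing matrix entries is an inclusion-exclusion sum of λ evaluated on "hybrid" matrices that take the entries in subsets of S from the second matrix and everything else from the first. A minimal sketch is given below; it illustrates the functional-ANOVA idea rather than reproducing the exactLTRE implementation.

```python
import numpy as np
from itertools import combinations

def lam(A):
    """Population growth rate: dominant eigenvalue of the matrix model."""
    return np.max(np.real(np.linalg.eigvals(A)))

def exact_ltre_fixed(A1, A2, max_order=3):
    """Fixed-design exact LTRE sketch: decompose lam(A2) - lam(A1) into
    contributions from subsets of differing entries via functional ANOVA
    (inclusion-exclusion over hybrid matrices). With max_order equal to
    the number of differing entries, the contributions sum exactly to
    lam(A2) - lam(A1); truncating keeps only low-order interactions."""
    diff_idx = [tuple(ij) for ij in np.argwhere(A1 != A2)]
    contrib = {}
    for order in range(1, min(max_order, len(diff_idx)) + 1):
        for S in combinations(diff_idx, order):
            c = 0.0
            for k in range(order + 1):
                for T in combinations(S, k):          # inclusion-exclusion
                    A = A1.copy()
                    for (i, j) in T:
                        A[i, j] = A2[i, j]            # hybrid matrix
                    c += (-1) ** (order - k) * lam(A)
            contrib[S] = c
    return contrib
```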

     
  5. Abstract

    Varying coefficient models are a flexible extension of generic parametric models whose coefficients are functions of a set of effect-modifying covariates instead of fitted constants. They are capable of achieving higher model complexity while preserving the structure of the underlying parametric models, hence generating interpretable predictions. In this paper we study the use of gradient boosted decision trees as those coefficient-deciding functions in varying coefficient models with linearly structured outputs. In contrast to the traditional choices of splines or kernel smoothers, boosted trees are more flexible since they require no structural assumptions in the effect modifier space. We introduce our proposed method from the perspective of a localized version of gradient descent, prove its theoretical consistency under mild assumptions commonly adopted in decision tree research, and empirically demonstrate that the proposed tree boosted varying coefficient models achieve high performance in terms of training speed, prediction accuracy and intelligibility, as compared to several benchmark algorithms.
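
    The localized gradient descent view admits a compact sketch for squared loss: with outputs y ≈ β(z)ᵀx, the negative gradient of the loss with respect to β at observation i is rᵢxᵢ (residual times predictor), so each boosting round fits one regression tree per coefficient to those pseudo-residuals over the effect-modifier space. The following is a minimal illustration under those assumptions, not the authors' implementation; an intercept can be included as a constant column of X.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class TreeBoostedVCM:
    """Sketch of a tree-boosted varying coefficient model, squared loss:
    y ~ beta(z) . x, with beta(.) built by gradient boosting over the
    effect modifiers z."""
    def __init__(self, n_rounds=200, lr=0.1, max_depth=3):
        self.n_rounds, self.lr, self.max_depth = n_rounds, lr, max_depth
        self.trees = []                  # n_rounds lists of p trees

    def fit(self, X, Z, y):
        n, p = X.shape
        beta = np.zeros((n, p))
        for _ in range(self.n_rounds):
            resid = y - np.sum(beta * X, axis=1)   # current residuals
            round_trees = []
            for j in range(p):
                # Localized gradient step: -dL/dbeta_j at point i is resid_i * x_ij.
                t = DecisionTreeRegressor(max_depth=self.max_depth)
                t.fit(Z, resid * X[:, j])
                beta[:, j] += self.lr * t.predict(Z)
                round_trees.append(t)
            self.trees.append(round_trees)
        return self

    def coefficients(self, Z):
        beta = np.zeros((Z.shape[0], len(self.trees[0])))
        for round_trees in self.trees:
            for j, t in enumerate(round_trees):
                beta[:, j] += self.lr * t.predict(Z)
        return beta                      # interpretable per-observation coefficients

    def predict(self, X, Z):
        return np.sum(self.coefficients(Z) * X, axis=1)
```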

     
  6. Abstract

    This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of our work here is to (i) review this growing body of literature, (ii) provide further demonstrations of these drawbacks along with a detailed explanation as to why they occur, and (iii) advocate for alternative measures that involve additional modeling. In particular, we describe how breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects across various model setups and find support for previous claims in the literature that PaP metrics can vastly over-emphasize correlated features in both variable importance measures and partial dependence plots. As an alternative, we discuss and recommend more direct approaches that involve measuring the change in model performance after muting the effects of the features under investigation.
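
    The contrast between PaP diagnostics and the "muting" alternatives can be made concrete with two standard importance measures: permute-and-predict, which shuffles a feature in hold-out data and so forces extrapolation when features are correlated, and a leave-one-covariate-out style refit, one instance of the additional-modeling alternatives discussed above. A brief sketch:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error

def pap_importance(model, X, y, j, seed=None):
    """Permute-and-predict importance: loss increase when feature j is
    permuted in hold-out data. With dependent features, the permuted
    points fall in sparse regions where the model must extrapolate."""
    rng = np.random.default_rng(seed)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return (mean_squared_error(y, model.predict(Xp))
            - mean_squared_error(y, model.predict(X)))

def loco_importance(model, X_tr, y_tr, X_te, y_te, j):
    """A 'muting' alternative (leave-one-covariate-out): refit without
    feature j and measure the change in hold-out loss, so importance is
    assessed without leaving the observed feature distribution."""
    keep = [k for k in range(X_tr.shape[1]) if k != j]
    reduced = clone(model).fit(X_tr[:, keep], y_tr)
    return (mean_squared_error(y_te, reduced.predict(X_te[:, keep]))
            - mean_squared_error(y_te, model.predict(X_te)))
```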

     
  7. An increasing number of machine learning models have been deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many of these models are black boxes that are hard to explain. There are growing efforts by researchers to develop methods for interpreting these black-box models. Post hoc explanations based on perturbations, such as LIME [39], are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real-world data sets are provided to demonstrate the effectiveness of our method.
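
    The stopping rule can be sketched as follows for the first feature the local surrogate selects: each perturbation point contributes an i.i.d. product term to the feature-response covariance, so the gap between the top two features is asymptotically normal, and sampling continues until that gap is significant. This is a simplified illustration (the method as proposed tests each step of the LARS path); `sample_perturb` and `black_box` are hypothetical stand-ins.

```python
import numpy as np
from scipy import stats

def stable_first_feature(black_box, sample_perturb, n0=1000, n_max=64000,
                         alpha=0.05):
    """Simplified S-LIME-style stopping rule: double the number of
    perturbation points until the top feature (largest |covariance|
    with the black-box response) is statistically separated from the
    runner-up via a CLT-based paired test."""
    n = n0
    while True:
        X = sample_perturb(n)                  # perturbations around the instance
        y = black_box(X)                       # black-box predictions
        prod = (X - X.mean(axis=0)) * (y - y.mean())[:, None]
        score = np.abs(prod.mean(axis=0))      # |cov| of each feature with y
        top, second = np.argsort(-score)[:2]
        # Sign-align the per-point series so their means equal the scores,
        # then test whether the paired difference is positive.
        d = (np.sign(prod[:, top].mean()) * prod[:, top]
             - np.sign(prod[:, second].mean()) * prod[:, second])
        z = d.mean() / (d.std(ddof=1) / np.sqrt(n))
        if stats.norm.sf(z) < alpha or n >= n_max:
            return top, n                      # stable selection (or budget hit)
        n *= 2                                 # more perturbations needed
```
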
  8. In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures. Twenty years later, this division feels far more ephemeral, both in terms of assigning individuals to camps and in terms of intellectual boundaries. We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.