Search: All records, Creators/Authors contains: "Hooker, Giles"


  1. Taylor, Caz M (Ed.)
    Abstract: One strand of modern coexistence theory (MCT) partitions invader growth rates (IGR) to quantify how different mechanisms contribute to species coexistence, highlighting fluctuation‐dependent mechanisms. A general conclusion from the classical analytic MCT theory is that coexistence mechanisms relying on temporal variation (such as the temporal storage effect) are generally less effective at promoting coexistence than mechanisms relying on spatial or spatiotemporal variation (primarily growth‐density covariance). However, the analytic theory assumes continuous population density, and IGRs are calculated for infinitesimally rare invaders that have infinite time to find their preferred habitat and regrow, without ever experiencing intraspecific competition. Here we ask if the disparity between spatial and temporal mechanisms persists when individuals are, instead, discrete and occupy finite amounts of space. We present a simulation‐based approach to quantifying IGRs in this situation, building on our previous approach for spatially non‐varying habitats. As expected, we found that spatial mechanisms are weakened; unexpectedly, the contribution to IGR from growth‐density covariance could even become negative, opposing coexistence. We also found shifts in which demographic parameters had the largest effect on the strength of spatial coexistence mechanisms. Our substantive conclusions are statements about one model, across parameter ranges that we subjectively considered realistic. Using the methods developed here, effects of individual discreteness should be explored theoretically across a broader range of conditions, and in models parameterized from empirical data on real communities.
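
    The simulation-based estimate described above lends itself to a generic Monte Carlo recipe: run the resident community to stationarity, introduce a small number of discrete invader individuals, and average their realized log growth over many replicates. Below is a minimal sketch of that recipe in Python; `step_community` and `init_resident_state` are hypothetical hooks for a model-specific simulator, not functions from the paper.

```python
import numpy as np

def estimate_igr(step_community, init_resident_state, n_reps=500,
                 burn_in=200, horizon=50, n_invaders=5, seed=None):
    """Monte Carlo sketch of an invader growth rate (IGR) with discrete
    individuals: introduce a few invaders into a stationary resident
    community and average their realized per-step log growth.

    `step_community(state, rng)` advances the community one time step;
    `init_resident_state()` builds a resident-only community. Both are
    hypothetical stand-ins for a model-specific simulator, and `state`
    is assumed to expose per-species counts via `state.counts`.
    """
    rng = np.random.default_rng(seed)
    log_growth = []
    for _ in range(n_reps):
        state = init_resident_state()
        for _ in range(burn_in):              # let the resident reach stationarity
            state = step_community(state, rng)
        state.counts["invader"] = n_invaders  # introduce discrete invaders
        for _ in range(horizon):
            state = step_community(state, rng)
        n_final = state.counts["invader"]
        if n_final > 0:  # extinct runs are dropped here, which biases the estimate upward
            log_growth.append(np.log(n_final / n_invaders) / horizon)
    return float(np.mean(log_growth))
```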

     
    Free, publicly-accessible full text available November 1, 2025
  2. Abstract

    Model distillation has been a popular method for producing interpretable machine learning. It uses an interpretable “student” model to mimic the predictions made by the black box “teacher” model. However, when the student model is sensitive to the variability of the data sets used for training, even with the teacher fixed, the corresponding interpretation is not reliable. Existing strategies stabilize model distillation by checking whether a large enough sample of pseudo-data is generated to reliably reproduce student models, but methods to do so have so far been developed separately for each specific class of student model. In this paper, we develop a generic approach for stable model distillation based on a central limit theorem for the estimated fidelity of the student to the teacher. We start with a collection of candidate student models and search for candidates that reasonably agree with the teacher. Then we construct a multiple testing framework to select a sample size such that a consistent student model would be selected across different pseudo-samples. We demonstrate the application of our proposed approach on three commonly used intelligible models: decision trees, falling rule lists and symbolic regression. Finally, we conduct simulation experiments on the Mammographic Mass and Breast Cancer datasets and illustrate the testing procedure through a theoretical analysis with a Markov process. The code is publicly available at https://github.com/yunzhe-zhou/GenericDistillation.
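
    A minimal sketch of the sample-size test is shown below, specialized to classification fidelity (agreement between student and teacher labels). The per-point agreement indicators are i.i.d., so their mean is approximately normal, and a Bonferroni-corrected one-sided test asks whether the apparent best candidate's fidelity is separated from every rival's. `sample_pseudo` is a hypothetical pseudo-data generator; the paper's framework is more general than this illustration.

```python
import numpy as np
from scipy import stats

def select_student(teacher, students, sample_pseudo, n, alpha=0.05):
    """One round of the CLT-based check: estimate each candidate
    student's fidelity to the teacher on n pseudo-points and test
    whether the apparent best candidate beats every rival."""
    X = sample_pseudo(n)
    y_teacher = teacher.predict(X)
    # Per-point agreement indicators, one column per candidate student.
    agree = np.column_stack(
        [s.predict(X) == y_teacher for s in students]).astype(float)
    best = int(np.argmax(agree.mean(axis=0)))
    for j in range(len(students)):
        if j == best:
            continue
        diff = agree[:, best] - agree[:, j]           # paired fidelity differences
        t = diff.mean() / (diff.std(ddof=1) / np.sqrt(n) + 1e-12)
        p = stats.norm.sf(t)                          # CLT: approx. normal for large n
        if p > alpha / (len(students) - 1):           # Bonferroni correction
            return None                               # inconclusive at this sample size
    return best                                       # a stably selected student
```

    In use, one would start from a moderate n and grow it until `select_student` returns a winner, mirroring the idea of choosing a pseudo-sample size large enough that the same student is selected across replications.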

     
  3. Heino, Mikko (Ed.)
    Abstract: Chance pervades life. In turn, life histories are described by probabilities (e.g. survival and breeding) and averages across individuals (e.g. mean growth rate and age at maturity). In this study, we explored patterns of luck in lifetime outcomes by analysing structured population models for a wide array of plant and animal species. We calculated four response variables: variance and skewness in both lifespan and lifetime reproductive output (LRO), and partitioned them into contributions from different forms of luck. We examined relationships among response variables and a variety of life history traits. We found that variance in lifespan and variance in LRO were positively correlated across taxa, but that variance and skewness were negatively correlated for both lifespan and LRO. The most important life history trait was longevity, which shaped variance and skew in LRO through its effects on variance in lifespan. We found that luck in survival, growth, and fecundity all contributed to variance in LRO, but skew in LRO was overwhelmingly due to survival luck. Rapidly growing populations have larger variances in LRO and lifespan than shrinking populations. Our results indicate that luck‐induced genetic drift may be most severe in recovering populations of species with long mature lifespan and high iteroparity.
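
    The luck decomposition rests on individual-level stochasticity in survival, growth, and fecundity. As a rough, self-contained illustration of where variance and skew in lifespan and LRO come from, the sketch below simulates independent individuals through a generic stage-structured model; the matrix P and fecundity vector F are placeholders, not parameters from the study.

```python
import numpy as np

def simulate_life_histories(P, F, start_stage=0, n_ind=100_000, seed=None):
    """Individual-based sketch: follow independent individuals through a
    stage-structured model. Column j of P holds the probabilities of
    moving from stage j to each stage (column sums <= 1; the remainder
    is death), and F[j] is the mean per-step fecundity in stage j."""
    rng = np.random.default_rng(seed)
    lifespan = np.zeros(n_ind, dtype=int)
    lro = np.zeros(n_ind)
    for i in range(n_ind):
        stage = start_stage
        while stage is not None:
            lifespan[i] += 1
            lro[i] += rng.poisson(F[stage])      # fecundity luck
            cum = np.cumsum(P[:, stage])
            u = rng.random()                     # survival/growth luck
            stage = int(np.searchsorted(cum, u)) if u < cum[-1] else None
    return lifespan, lro

# Variance and skew in lifetime reproductive output across individuals:
#   lifespan, lro = simulate_life_histories(P, F)
#   print(lro.var(), scipy.stats.skew(lro))
```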

     
  4. Abstract

    Matrix population models are frequently built and used by ecologists to analyse demography and elucidate the processes driving population growth or decline. Life Table Response Experiments (LTREs) are comparative analyses that decompose the realized difference or variance in population growth rate (λ) into contributions from the differences or variances in the vital rates (i.e. the matrix elements). Since their introduction, LTREs have been based on approximations and have not included biologically relevant interaction terms.

    We used the functional analysis of variance framework to derive an exact LTRE method, which calculates the exact response of to the difference or variance in a given vital rate, for all interactions among vital rates—including higher‐order interactions neglected by the classical methods. We used the publicly available COMADRE and COMPADRE databases to perform a meta‐analysis comparing the results of exact and classical LTRE methods. We analysed 186 and 1487 LTREs for animal and plant matrix population models, respectively.

    We found that the classical methods often had small errors, but that very high errors were possible. Overall error was related to the difference or variance in the matrices being analysed, consistent with the Taylor series basis of the classical method. Neglected interaction terms accounted for most of the errors in fixed design LTRE, highlighting the importance of two‐way interaction terms. For random design LTRE, errors in the contribution terms present in both classical and exact methods were comparable to errors due to neglected interaction terms. In most examples we analysed, evaluating exact contributions up to three‐way interaction terms was sufficient for interpreting 90% or more of the difference or variance in λ.

    Relative error, previously used to evaluate the accuracy of classical LTREs, is not a reliable metric of how closely the classical and exact methods agree. Error compensation between estimated contribution terms and neglected contribution terms can lead to low relative error despite faulty biological interpretation. Trade‐offs or negative covariances among matrix elements can lead to high relative error despite accurate biological interpretation. Exact LTRE provides reliable and accurate biological interpretation, and the R package exactLTRE makes the exact method accessible to ecologists.
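
    For the fixed design, the exact decomposition can be written directly: the contribution of a set S of differing matrix entries is an inclusion-exclusion sum of λ evaluated on "hybrid" matrices that take the entries in subsets of S from the second matrix and everything else from the first. A minimal sketch is given below; it illustrates the functional-ANOVA idea rather than reproducing the exactLTRE implementation.

```python
import numpy as np
from itertools import combinations

def lam(A):
    """Population growth rate: dominant eigenvalue of the matrix model."""
    return np.max(np.real(np.linalg.eigvals(A)))

def exact_ltre_fixed(A1, A2, max_order=3):
    """Fixed-design exact LTRE sketch: decompose lam(A2) - lam(A1) into
    contributions from subsets of differing entries via functional ANOVA
    (inclusion-exclusion over hybrid matrices). With max_order equal to
    the number of differing entries, the contributions sum exactly to
    lam(A2) - lam(A1); truncating keeps only low-order interactions."""
    diff_idx = [tuple(ij) for ij in np.argwhere(A1 != A2)]
    contrib = {}
    for order in range(1, min(max_order, len(diff_idx)) + 1):
        for S in combinations(diff_idx, order):
            c = 0.0
            for k in range(order + 1):
                for T in combinations(S, k):          # inclusion-exclusion
                    A = A1.copy()
                    for (i, j) in T:
                        A[i, j] = A2[i, j]            # hybrid matrix
                    c += (-1) ** (order - k) * lam(A)
            contrib[S] = c
    return contrib
```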

     
  5. Abstract

    Varying coefficient models are a flexible extension of generic parametric models whose coefficients are functions of a set of effect-modifying covariates instead of fitted constants. They are capable of achieving higher model complexity while preserving the structure of the underlying parametric models, hence generating interpretable predictions. In this paper we study the use of gradient boosted decision trees as those coefficient-deciding functions in varying coefficient models with linearly structured outputs. In contrast to the traditional choices of splines or kernel smoothers, boosted trees are more flexible since they require no structural assumptions in the effect modifier space. We introduce our proposed method from the perspective of a localized version of gradient descent, prove its theoretical consistency under mild assumptions commonly adopted in decision tree research, and empirically demonstrate that the proposed tree boosted varying coefficient models achieve high performance in terms of training speed, prediction accuracy and intelligibility, as compared to several benchmark algorithms.
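
    The localized gradient descent view admits a compact sketch for squared loss: with outputs y ≈ β(z)ᵀx, the negative gradient of the loss with respect to β at observation i is rᵢxᵢ (residual times predictor), so each boosting round fits one regression tree per coefficient to those pseudo-residuals over the effect-modifier space. The following is a minimal illustration under those assumptions, not the authors' implementation; an intercept can be included as a constant column of X.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class TreeBoostedVCM:
    """Sketch of a tree-boosted varying coefficient model, squared loss:
    y ~ beta(z) . x, with beta(.) built by gradient boosting over the
    effect modifiers z."""
    def __init__(self, n_rounds=200, lr=0.1, max_depth=3):
        self.n_rounds, self.lr, self.max_depth = n_rounds, lr, max_depth
        self.trees = []                  # n_rounds lists of p trees

    def fit(self, X, Z, y):
        n, p = X.shape
        beta = np.zeros((n, p))
        for _ in range(self.n_rounds):
            resid = y - np.sum(beta * X, axis=1)   # current residuals
            round_trees = []
            for j in range(p):
                # Localized gradient step: -dL/dbeta_j at point i is resid_i * x_ij.
                t = DecisionTreeRegressor(max_depth=self.max_depth)
                t.fit(Z, resid * X[:, j])
                beta[:, j] += self.lr * t.predict(Z)
                round_trees.append(t)
            self.trees.append(round_trees)
        return self

    def coefficients(self, Z):
        beta = np.zeros((Z.shape[0], len(self.trees[0])))
        for round_trees in self.trees:
            for j, t in enumerate(round_trees):
                beta[:, j] += self.lr * t.predict(Z)
        return beta                      # interpretable per-observation coefficients

    def predict(self, X, Z):
        return np.sum(self.coefficients(Z) * X, axis=1)
```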

     
  6. Abstract

    This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions. Methods such as the variable importance measures proposed for random forests, partial dependence plots, and individual conditional expectation plots remain popular because they are both model-agnostic and depend only on the pre-trained model output, making them computationally efficient and widely available in software. However, numerous studies have found that these tools can produce diagnostics that are highly misleading, particularly when there is strong dependence among features. The purpose of our work here is to (i) review this growing body of literature, (ii) provide further demonstrations of these drawbacks along with a detailed explanation as to why they occur, and (iii) advocate for alternative measures that involve additional modeling. In particular, we describe how breaking dependencies between features in hold-out data places undue emphasis on sparse regions of the feature space by forcing the original model to extrapolate to regions where there is little to no data. We explore these effects across various model setups and find support for previous claims in the literature that PaP metrics can vastly over-emphasize correlated features in both variable importance measures and partial dependence plots. As an alternative, we discuss and recommend more direct approaches that involve measuring the change in model performance after muting the effects of the features under investigation.
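
    The contrast between PaP diagnostics and the "muting" alternatives can be made concrete with two standard importance measures: permute-and-predict, which shuffles a feature in hold-out data and so forces extrapolation when features are correlated, and a leave-one-covariate-out style refit, one instance of the additional-modeling alternatives discussed above. A brief sketch:

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import mean_squared_error

def pap_importance(model, X, y, j, seed=None):
    """Permute-and-predict importance: loss increase when feature j is
    permuted in hold-out data. With dependent features, the permuted
    points fall in sparse regions where the model must extrapolate."""
    rng = np.random.default_rng(seed)
    Xp = X.copy()
    Xp[:, j] = rng.permutation(Xp[:, j])
    return (mean_squared_error(y, model.predict(Xp))
            - mean_squared_error(y, model.predict(X)))

def loco_importance(model, X_tr, y_tr, X_te, y_te, j):
    """A 'muting' alternative (leave-one-covariate-out): refit without
    feature j and measure the change in hold-out loss, so importance is
    assessed without leaving the observed feature distribution."""
    keep = [k for k in range(X_tr.shape[1]) if k != j]
    reduced = clone(model).fit(X_tr[:, keep], y_tr)
    return (mean_squared_error(y_te, reduced.predict(X_te[:, keep]))
            - mean_squared_error(y_te, model.predict(X_te)))
```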

     
  7. An increasing number of machine learning models have been deployed in high-stakes domains such as finance and healthcare. Despite their superior performance, many of these models are black boxes that are hard to explain. There are growing efforts by researchers to develop methods for interpreting these black-box models. Post hoc explanations based on perturbations, such as LIME [39], are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on the central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real-world data sets are provided to demonstrate the effectiveness of our method.
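
    The stopping rule can be sketched as follows for the first feature the local surrogate selects: each perturbation point contributes an i.i.d. product term to the feature-response covariance, so the gap between the top two features is asymptotically normal, and sampling continues until that gap is significant. This is a simplified illustration (the method as proposed tests each step of the LARS path); `sample_perturb` and `black_box` are hypothetical stand-ins.

```python
import numpy as np
from scipy import stats

def stable_first_feature(black_box, sample_perturb, n0=1000, n_max=64000,
                         alpha=0.05):
    """Simplified S-LIME-style stopping rule: double the number of
    perturbation points until the top feature (largest |covariance|
    with the black-box response) is statistically separated from the
    runner-up via a CLT-based paired test."""
    n = n0
    while True:
        X = sample_perturb(n)                  # perturbations around the instance
        y = black_box(X)                       # black-box predictions
        prod = (X - X.mean(axis=0)) * (y - y.mean())[:, None]
        score = np.abs(prod.mean(axis=0))      # |cov| of each feature with y
        top, second = np.argsort(-score)[:2]
        # Sign-align the per-point series so their means equal the scores,
        # then test whether the paired difference is positive.
        d = (np.sign(prod[:, top].mean()) * prod[:, top]
             - np.sign(prod[:, second].mean()) * prod[:, second])
        z = d.mean() / (d.std(ddof=1) / np.sqrt(n))
        if stats.norm.sf(z) < alpha or n >= n_max:
            return top, n                      # stable selection (or budget hit)
        n *= 2                                 # more perturbations needed
```
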
  8. In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures. Twenty years later, this division feels far more ephemeral, both in terms of assigning individuals to camps and in terms of intellectual boundaries. We argue that this is largely due to the "data modelers" incorporating algorithmic methods into their toolbox, particularly driven by recent developments in the statistical understanding of Breiman's own Random Forest methods. While this can be simplistically described as "Breiman won", these same developments also expose the limitations of the prediction-first philosophy that he espoused, making careful statistical analysis all the more important. This paper outlines these exciting recent developments in the random forest literature which, in our view, occurred as a result of a necessary blending of the two ways of thinking Breiman originally described. We also ask what areas statistics and statisticians might currently overlook.