Abstract: Fusion learning methods, developed to analyze datasets from many different sources, have become a popular research topic in recent years. Individualized inference approaches through fusion learning extend these methods to individualized inference problems over a heterogeneous population, where similar individuals are fused together to enhance inference about the target individual. Both classical fusion learning and individualized inference through fusion learning are based on weighted aggregation of individual information, but the weights used in the latter are localized to the target individual. This article reviews two individualized inference methods through fusion learning, iFusion and iGroup, that are developed under different asymptotic settings. Both procedures guarantee optimal asymptotic theoretical performance and computational scalability. This article is categorized under: Statistical Learning and Exploratory Methods of the Data Sciences > Manifold Learning; Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods; Statistical and Graphical Methods of Data Analysis > Nonparametric Methods; Data: Types and Structure > Massive Data
A practical guide to understanding and validating complex models using data simulations
Biologists routinely fit novel and complex statistical models to push the limits of our understanding. Examples include, but are not limited to, flexible Bayesian approaches (e.g. BUGS, Stan), frequentist and likelihood‐based approaches (e.g. the lme4 package) and machine learning methods. These software packages and programs afford the user greater control and flexibility in tailoring complex hierarchical models. However, this level of control and flexibility places a higher degree of responsibility on the user to evaluate the robustness of their statistical inference. To determine how often biologists run model diagnostics on hierarchical models, we reviewed 50 papers published in 2021 in the journal Nature Ecology & Evolution, and we found that the majority did not report any validation of their hierarchical models, making it difficult for the reader to assess the robustness of their inference. This lack of reporting likely stems from a lack of standardized guidance for best practices and standard methods. Here, we provide a guide to understanding and validating complex models using data simulations. To determine how often biologists use data simulation techniques, we also reviewed 50 papers published in 2021 in the journal Methods in Ecology and Evolution. We found that 78% of the papers that proposed a new estimation technique, package or model used simulations or generated data in some capacity (18 of 23 papers), but very few of those papers (5 of 23) included either a demonstration that the code could recover realistic estimates for a dataset with known parameters or a demonstration of the statistical properties of the approach. To distil the variety of simulation techniques and their uses, we provide a taxonomy of simulation studies based on the intended inference.
We also encourage authors to include a basic validation study whenever novel statistical models are used, which, in general, is easy to implement. Simulating data helps a researcher gain a deeper understanding of the models and their assumptions and establish the reliability of their estimation approaches. Wider adoption of data simulations by biologists can improve statistical inference, reliability and open science practices.
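A basic validation study of the kind recommended here can be sketched in a few lines. The example below is only an illustration under stated assumptions, not the authors' code: it uses a simple linear regression in place of a hierarchical model, and the function names (`simulate_and_fit`, `recovery_study`) and all parameter values are hypothetical. It simulates many datasets from known parameters, refits each one, and summarizes bias and approximate 95% interval coverage.

```python
import numpy as np


def simulate_and_fit(rng, n, intercept, slope, sigma):
    """Simulate one dataset from a known linear model and refit it by least squares."""
    x = rng.uniform(0.0, 10.0, size=n)
    y = intercept + slope * x + rng.normal(0.0, sigma, size=n)
    X = np.column_stack([np.ones(n), x])          # design matrix: intercept + slope
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_hat
    s2 = resid @ resid / (n - 2)                  # residual variance estimate
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta_hat, se


def recovery_study(n_sims=500, n=100, intercept=1.0, slope=0.5, sigma=2.0, seed=1):
    """Repeat simulate-and-fit and report bias and interval coverage per parameter."""
    rng = np.random.default_rng(seed)
    truth = np.array([intercept, slope])
    estimates = np.empty((n_sims, 2))
    covered = np.empty((n_sims, 2), dtype=bool)
    for i in range(n_sims):
        beta_hat, se = simulate_and_fit(rng, n, intercept, slope, sigma)
        estimates[i] = beta_hat
        # Normal-approximation 95% interval: does it contain the true value?
        covered[i] = np.abs(beta_hat - truth) <= 1.96 * se
    bias = estimates.mean(axis=0) - truth
    coverage = covered.mean(axis=0)
    return bias, coverage


bias, coverage = recovery_study()
print("bias (intercept, slope):", bias)
print("approx. 95% interval coverage:", coverage)
```

Bias close to zero and coverage close to the nominal level suggest that the fitting code recovers the known parameters. For a hierarchical model, the same loop would presumably wrap the actual model-fitting call in place of the least-squares step.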
- Award ID(s): 2015273
- PAR ID: 10473163
- Publisher / Repository: BES Journals
- Date Published:
- Journal Name: Methods in Ecology and Evolution
- Volume: 14
- Issue: 1
- ISSN: 2041-210X
- Page Range / eLocation ID: 203 to 217
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: Bayesian hierarchical models allow ecologists to account for uncertainty and make inference at multiple scales. However, hierarchical models are often computationally intensive to fit, especially with large datasets, and researchers face trade‐offs between capturing ecological complexity in statistical models and implementing these models. We present a recursive Bayesian computing (RB) method that can be used to fit Bayesian models efficiently in sequential MCMC stages to ease computation and streamline hierarchical inference. We also introduce transformation‐assisted RB (TARB) to create unsupervised MCMC algorithms and improve interpretability of parameters. We demonstrate TARB by fitting a hierarchical animal movement model to obtain inference about individual‐ and population‐level migratory characteristics. Our recursive procedure reduced computation time for fitting our hierarchical movement model by half compared to fitting the model with a single MCMC algorithm. We obtained the same inference fitting our model using TARB as we obtained fitting the model with a single algorithm. For complex ecological statistical models, like those for animal movement, multi‐species systems, or large spatial and temporal scales, the computational demands of fitting models with conventional computing techniques can limit model specification, thus hindering scientific discovery. Transformation‐assisted RB is one of the most accessible methods for reducing these limitations, enabling us to implement new statistical models and advance our understanding of complex ecological phenomena.
- Abstract: Resource selection functions (RSFs) are among the most commonly used statistical tools in both basic and applied animal ecology. They are typically parameterized using animal tracking data, and advances in animal tracking technology have led to increasing levels of autocorrelation between locations in such data sets. Because RSFs assume that data are independent and identically distributed, such autocorrelation can cause misleadingly narrow confidence intervals and biased parameter estimates. Data thinning, generalized estimating equations and step selection functions (SSFs) have been suggested as techniques for mitigating the statistical problems posed by autocorrelation, but these approaches have notable limitations that include statistical inefficiency, unclear or arbitrary targets for adequate levels of statistical independence, constraints in input data and (in the case of SSFs) scale‐dependent inference. To remedy these problems, we introduce a method for likelihood weighting of animal locations to mitigate the negative consequences of autocorrelation on RSFs. In this study, we demonstrate that this method weights each observed location in an animal's movement track according to its level of non‐independence, expanding confidence intervals and reducing bias that can arise when there are missing data in the movement track. Ecologists and conservation biologists can use this method to improve the quality of inferences derived from RSFs. We also provide a complete, annotated analytical workflow to help new users apply our method to their own animal tracking data using the ctmm R package.
- Abstract: Model calibration is crucial for optimizing the performance of complex computer models across various disciplines. In the era of Industry 4.0, symbolizing rapid technological advancement through the integration of advanced digital technologies into industrial processes, model calibration plays a key role in advancing digital twin technology, ensuring alignment between digital representations and real‐world systems. This comprehensive review focuses on the Kennedy and O'Hagan (KOH) framework (Kennedy and O'Hagan, Journal of the Royal Statistical Society: Series B 2001; 63(3):425–464). In particular, we explore recent advancements addressing the challenges of the unidentifiability issue while accommodating model inadequacy within the KOH framework. In addition, we explore recent advancements in adapting the KOH framework to complex scenarios, including those involving multivariate outputs and functional calibration parameters. We also delve into experimental design strategies tailored to the unique demands of model calibration. By offering a comprehensive analysis of the KOH approach and its diverse applications, this review serves as a valuable resource for researchers and practitioners aiming to enhance the accuracy and reliability of their computer models. This article is categorized under: Statistical Models > Semiparametric Models; Statistical Models > Simulation Models; Statistical Models > Bayesian Models
- Abstract: Evolutionary biologists characterize macroevolutionary trends of phenotypic change across the tree of life using phylogenetic comparative methods. However, within‐species variation can complicate such investigations. For this reason, procedures for incorporating nonstructured (random) intraspecific variation have been developed. Likewise, evolutionary biologists seek to understand microevolutionary patterns of phenotypic variation within species, such as sex‐specific differences or allometric trends. Additionally, there is a desire to compare such within‐species patterns across taxa, but current analytical approaches cannot be used to interrogate within‐species patterns while simultaneously accounting for phylogenetic non‐independence. Consequently, deciphering how intraspecific trends evolve remains a challenge. Here we introduce an extended phylogenetic generalized least squares (E‐PGLS) procedure which facilitates comparisons of within‐species patterns across species while simultaneously accounting for phylogenetic non‐independence. Our method uses an expanded phylogenetic covariance matrix, a hierarchical linear model, and permutation methods to obtain empirical sampling distributions and effect sizes for model effects that can evaluate differences in intraspecific trends across species for both univariate and multivariate data, while conditioning them on the phylogeny. The method has appropriate statistical properties for both balanced and imbalanced data. Additionally, the procedure obtains evolutionary covariance estimates that reflect those from existing approaches for nonstructured intraspecific variation. Importantly, E‐PGLS can detect differences in structured (i.e. microevolutionary) intraspecific patterns across species when such trends are present. Thus, E‐PGLS extends the reach of phylogenetic comparative methods into the intraspecific comparative realm, by providing the ability to compare within‐species trends across species while simultaneously accounting for shared evolutionary history.

