Title: A practical guide to selecting models for exploration, inference, and prediction in ecology
Abstract

Selecting among competing statistical models is a core challenge in science. However, the many possible approaches and techniques for model selection, and the conflicting recommendations for their use, can be confusing. We contend that much confusion surrounding statistical model selection results from failing to first clearly specify the purpose of the analysis. We argue that there are three distinct goals for statistical modeling in ecology: data exploration, inference, and prediction. Once the modeling goal is clearly articulated, an appropriate model selection procedure is easier to identify. We review model selection approaches and highlight their strengths and weaknesses relative to each of the three modeling goals. We then present examples of modeling for exploration, inference, and prediction using a time series of butterfly population counts. These show how a model selection approach flows naturally from the modeling goal, leading to different models selected for different purposes, even with exactly the same data set. This review illustrates best practices for ecologists and should serve as a reminder that statistical recipes cannot substitute for critical thinking or for the use of independent data to test hypotheses and validate predictions.
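
The abstract's point that one data set can yield different selected models under different goals can be illustrated with a minimal sketch. This example is not taken from the article: the data are simulated (a made-up log-count series, not the butterfly counts), the candidate models are simple polynomial trends, and the helper names `gaussian_ic` and `holdout_mse` are hypothetical. It scores each candidate with AIC and BIC, which are often used for inference-oriented comparison, and with error on held-out final years, a common check when the goal is prediction.

```python
import numpy as np


def gaussian_ic(y, X):
    """Fit by least squares and return (AIC, BIC) assuming Gaussian errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1  # regression coefficients plus the error variance
    # Maximized Gaussian log-likelihood with variance estimate rss / n
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik


def holdout_mse(y, X, n_train):
    """Fit on the first n_train rows, return mean squared error on the rest."""
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    return float(np.mean((y[n_train:] - X[n_train:] @ beta) ** 2))


# Simulated series (an assumption for illustration): linear decline plus noise
rng = np.random.default_rng(0)
years = np.arange(40)
t = (years - years.mean()) / years.std()
log_count = 3.0 - 0.4 * t + rng.normal(scale=0.3, size=t.size)

# Candidate models: polynomial trends of degree 0 through 3
designs = {f"degree {d}": np.vander(t, d + 1, increasing=True) for d in range(4)}

# Each entry holds (AIC, BIC, held-out MSE on the last 10 years)
scores = {
    name: (*gaussian_ic(log_count, X), holdout_mse(log_count, X, n_train=30))
    for name, X in designs.items()
}

for name, (aic, bic, mse) in scores.items():
    print(f"{name}: AIC={aic:.1f}  BIC={bic:.1f}  holdout MSE={mse:.3f}")

for label, idx in (("AIC", 0), ("BIC", 1), ("holdout MSE", 2)):
    best = min(scores, key=lambda m: scores[m][idx])
    print(f"Lowest {label}: {best}")
```

The three criteria need not agree. BIC penalizes each extra parameter more heavily than AIC once the sample exceeds about eight observations, and the temporal holdout penalizes models that extrapolate poorly, so the criterion should be chosen to match the stated modeling goal.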

 
Award ID(s):
1933561 1933497
PAR ID:
10450796
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Ecology
Volume:
102
Issue:
6
ISSN:
0012-9658
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We discuss inference after data exploration, with a particular focus on inference after model or variable selection. We review three popular approaches to this problem: sample splitting, simultaneous inference, and conditional selective inference. For each approach, we explain how it works, and highlight its advantages and disadvantages. We also provide an illustration of these post-selection inference approaches. 
  2. We discuss inference after data exploration, with a particular focus on inference after model or variable selection. We review three popular approaches to this problem: sample splitting, simultaneous inference, and conditional selective inference. We explain how each approach works and highlight its advantages and disadvantages. We also provide an illustration of these post-selection inference approaches. 
  3. Significance

    Although practically attractive with high prediction and classification power, complicated learning methods often lack interpretability and reproducibility, limiting their scientific usage. A useful remedy is to select truly important variables contributing to the response of interest. We develop a method for deep learning inference using knockoffs, DeepLINK, to achieve the goal of variable selection with controlled error rate in deep learning models. We show that DeepLINK can also have high power in variable selection with a broad class of model designs. We then apply DeepLINK to three real datasets and produce statistical inference results with both reproducibility and biological meanings, demonstrating its promising usage to a broad range of scientific applications.

     
  4. Abstract

    Representing hydrologic connectivity of non‐floodplain wetlands (NFWs) to downstream waters in process‐based models is an emerging challenge relevant to many research, regulatory, and management activities. We review four case studies that utilize process‐based models developed to simulate NFW hydrology. Models range from a simple, lumped parameter model to a highly complex, fully distributed model. Across case studies, we highlight appropriate application of each model, emphasizing spatial scale, computational demands, process representation, and model limitations. We end with a synthesis of recommended “best modeling practices” to guide model application. These recommendations include: (1) clearly articulate modeling objectives, and revisit and adjust those objectives regularly; (2) develop a conceptualization of NFW connectivity using qualitative observations, empirical data, and process‐based modeling; (3) select a model to represent NFW connectivity by balancing both modeling objectives and available resources; (4) use innovative techniques and data sources to validate and calibrate NFW connectivity simulations; and (5) clearly articulate the limits of the resulting NFW connectivity representation. Our review and synthesis of these case studies highlights modeling approaches that incorporate NFW connectivity, demonstrates tradeoffs in model selection, and ultimately provides actionable guidance for future model application and development.

     
  5. Abstract

    The rapid development of modeling techniques has brought many opportunities for data‐driven discovery and prediction. However, this also leads to the challenge of selecting the most appropriate model for any particular data task. Information criteria, such as the Akaike information criterion (AIC) and Bayesian information criterion (BIC), have been developed as a general class of model selection methods with profound connections with foundational thoughts in statistics and information theory. Many perspectives and theoretical justifications have been developed to understand when and how to use information criteria, which often depend on particular data circumstances. This review article will revisit information criteria by summarizing their key concepts, evaluation metrics, fundamental properties, interconnections, recent advancements, and common misconceptions to enrich the understanding of model selection in general.

    This article is categorized under:
    Data: Types and Structure > Traditional Statistical Data
    Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods
    Statistical and Graphical Methods of Data Analysis > Information Theoretic Methods
    Statistical Models > Model Selection

     