In ecology and evolutionary biology (EEB), the study of developmental plasticity seeks to understand the ontogenetic processes underlying the phenotypes on which natural selection acts. A central challenge in this inquiry is establishing that an exposure causally affects a later-life phenotype, given the time that elapses between the two events. The exposure is a potential cause of the outcome, i.e. an environmental stimulus or experience; the later phenotype might be a behaviour, physiological condition, morphology or life-history trait. The latency period between exposure and outcome complicates causal inference because additional events inevitably occur in the interim and may affect the relationship of interest. Here, we describe six distinct but non-mutually exclusive conceptual models from the field of lifecourse epidemiology and discuss their applications to EEB research. The models are Critical Period with No Later Modifiers, Critical Period with Later Modifiers, Accumulation of Risk with Independent Risk Exposures, Accumulation of Risk with Risk Clustering, Accumulation of Risk with Chains of Risk and Accumulation of Risk with Trigger Effect. These models, which have been widely used to test causal hypotheses regarding the early origins of adult-onset disease in humans, are directly relevant to research on developmental plasticity in EEB.
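To make the contrast between these conceptual models concrete, here is a minimal Python sketch, not taken from the paper, that encodes three of the six models (Critical Period with No Later Modifiers, Critical Period with Later Modifiers, and Accumulation of Risk with Independent Risk Exposures) as functions mapping a binary exposure history to an expected outcome score. The function names, the three-stage history and the effect sizes are hypothetical choices for illustration only.

```python
from typing import Sequence

# One entry per life stage: 1 = exposed, 0 = unexposed (hypothetical encoding).
History = Sequence[int]


def critical_period(history: History, stage: int = 0, effect: float = 1.0) -> float:
    """Critical Period with No Later Modifiers: only the critical stage matters."""
    return effect * history[stage]


def critical_period_with_modifier(
    history: History,
    stage: int = 0,
    effect: float = 1.0,
    modifier_stage: int = -1,
    modification: float = -0.5,
) -> float:
    """Critical Period with Later Modifiers: a later exposure rescales the
    critical-period effect (a negative `modification` attenuates it)."""
    return effect * history[stage] * (1.0 + modification * history[modifier_stage])


def accumulation(history: History, effect: float = 1.0) -> float:
    """Accumulation of Risk with Independent Risk Exposures: every exposed
    stage adds the same increment, regardless of timing."""
    return effect * sum(history)


if __name__ == "__main__":
    for h in [(1, 0, 0), (0, 1, 1), (1, 1, 1)]:
        print(h, critical_period(h), critical_period_with_modifier(h), accumulation(h))
```

Under these assumed parameters, the history (0, 1, 1) scores 0 under both critical-period variants but 2 under accumulation; divergences of this kind are what allow the models to be compared against observed exposure histories.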
A biologist's guide to model selection and causal inference
A goal of many research programmes in biology is to extract meaningful insights from large, complex datasets. Researchers in ecology, evolution and behavior (EEB) often grapple with long-term, observational datasets from which they construct models to test causal hypotheses about biological processes. Similarly, epidemiologists analyse large, complex observational datasets to understand the distribution and determinants of human health. A key difference in the analytical workflows for these two distinct areas of biology is the delineation of data analysis tasks and explicit use of causal directed acyclic graphs (DAGs), widely adopted by epidemiologists. Here, we review the most recent causal inference literature and describe an analytical workflow that has direct applications for EEB. We start this commentary by defining four distinct analytical tasks (description, prediction, association, causal inference). The remainder of the text is dedicated to causal inference, specifically focusing on the use of DAGs to inform the modelling strategy. Given the increasing interest in causal inference and misperceptions regarding this task, we seek to facilitate an exchange of ideas between disciplinary silos and provide an analytical framework that is particularly relevant for making causal inference from observational data.
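As an illustration of how a DAG can inform the modelling strategy, here is a minimal Python sketch (assuming Python 3.9+ for the standard-library `graphlib` module). It encodes a hypothetical developmental-plasticity DAG, checks that it is acyclic, and returns the parents of the exposure as an adjustment set; adjusting for all direct causes of the exposure is one sufficient adjustment set when those parents are measured. The variable names are invented for the example and do not come from the paper.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical DAG: each node maps to the set of its direct causes (parents).
dag = {
    "maternal_condition": set(),
    "early_diet": {"maternal_condition"},  # exposure
    "juvenile_growth": {"early_diet"},     # mediator
    "adult_body_mass": {"maternal_condition", "early_diet", "juvenile_growth"},  # outcome
}


def is_acyclic(graph: dict[str, set[str]]) -> bool:
    """Return True if the parent map describes a DAG (no directed cycles)."""
    try:
        # static_order() raises CycleError when a directed cycle is detected.
        tuple(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False


def parent_adjustment_set(graph: dict[str, set[str]], exposure: str, outcome: str) -> set[str]:
    """Return the direct causes of `exposure` as a candidate adjustment set."""
    if not is_acyclic(graph):
        raise ValueError("graph contains a directed cycle, so it is not a DAG")
    parents = set(graph.get(exposure, set()))
    if outcome in parents:
        raise ValueError("outcome is a direct cause of the exposure")
    return parents


print(parent_adjustment_set(dag, "early_diet", "adult_body_mass"))
```

The mediator `juvenile_growth` is deliberately left out of the adjustment set, because conditioning on it would remove part of the effect being estimated. The parents of the exposure are only one valid choice; smaller sets that satisfy the backdoor criterion may also exist.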
- Award ID(s): 1856266
- PAR ID: 10348514
- Date Published:
- Journal Name: Proceedings of the Royal Society B: Biological Sciences
- Volume: 288
- Issue: 1943
- ISSN: 0962-8452
- Page Range / eLocation ID: 20202815
- Sponsoring Org: National Science Foundation
More Like this
- Extracting an individual's scientific knowledge is essential for improving educational assessment and for understanding cognitive tasks in engineering activities such as reasoning and decision making. However, knowledge extraction is an almost impossible endeavor if the domain of knowledge and the available observational data are unrestricted. The objective of this paper is to quantify individuals' theory-based causal knowledge from their responses to given questions. Our approach uses directed acyclic graphs (DAGs) to represent causal knowledge for a given theory and a graph-based logistic model that maps individuals' question-specific subgraphs to question responses. We follow a hierarchical Bayesian approach to estimate individuals' DAGs from observations. The method is illustrated using 205 engineering students' responses to questions on fatigue analysis in mechanical parts. In our results, we demonstrate how the developed methodology provides estimates of the population-level DAG and of DAGs for individual students. This dual representation is essential for remediation because it allows us to identify the parts of a theory that a population or individual struggles with and the parts they have already mastered. An additional benefit of the method is that it enables predictions about individuals' responses to new questions based on the inferred individual-specific DAGs. The latter has implications for the descriptive modeling of human problem-solving, a critical ingredient in sociotechnical systems modeling.
- Causal inference has been a critical research topic for decades across many domains, such as statistics, computer science, education, public policy, and economics. Estimating causal effects from observational data has become an appealing research direction, owing to the large amount of available data and its low cost compared with randomized controlled trials. With the rapid development of machine learning, various causal effect estimation methods for observational data have sprung up. In this survey, we provide a comprehensive review of causal inference methods under the potential outcome framework, one of the well-known causal inference frameworks. The methods are divided into two categories depending on whether or not they require all three assumptions of the potential outcome framework. For each category, both traditional statistical methods and recent machine-learning-enhanced methods are discussed and compared. Plausible applications of these methods are also presented, including applications in advertising, recommendation, medicine, and so on. Moreover, commonly used benchmark datasets and open-source code are summarized to help researchers and practitioners explore, evaluate, and apply causal inference methods.
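The potential outcome framework can be illustrated with inverse probability weighting (IPW), one of the traditional statistical estimators that surveys of this kind cover. The sketch below uses entirely simulated, hypothetical data in which a binary confounder `z` raises both the chance of treatment and the outcome. The usual identifying assumptions (no unmeasured confounding given `z`, positivity, and stable unit treatment values) hold by construction. The naive treated-minus-control difference is biased, whereas IPW with propensity scores estimated within levels of `z` should recover the assumed effect of 2.0.

```python
import random
from statistics import fmean


def simulate(n: int, effect: float = 2.0, seed: int = 1) -> list[tuple[int, int, float]]:
    """Simulate (z, t, y) rows with a binary confounder z (hypothetical values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = 1 if rng.random() < 0.5 else 0
        t = 1 if rng.random() < (0.8 if z else 0.2) else 0  # z raises P(treatment)
        y = effect * t + 3.0 * z + rng.gauss(0.0, 1.0)      # z also raises the outcome
        rows.append((z, t, y))
    return rows


def naive_difference(rows):
    """Mean outcome of treated minus untreated units, ignoring z."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return fmean(treated) - fmean(control)


def ipw_ate(rows):
    """Inverse-probability-weighted estimate of the average treatment effect."""
    # Propensity score e(z) = P(t = 1 | z), estimated as the treated share in each stratum.
    # Positivity requires 0 < e(z) < 1 in every stratum.
    propensity = {level: fmean([t for z, t, _ in rows if z == level]) for level in (0, 1)}
    terms = [
        t * y / propensity[z] - (1 - t) * y / (1 - propensity[z])
        for z, t, y in rows
    ]
    return fmean(terms)


rows = simulate(20_000)
print(round(naive_difference(rows), 2), round(ipw_ate(rows), 2))
```

With these simulated parameters the naive difference is expected to land near 3.8 (2.0 + 3.0 × (0.8 - 0.2)), while the IPW estimate should fall close to 2.0.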
- At the biosphere–atmosphere interface, nonlinear interdependencies among components of an ecohydrological complex system can be inferred using multivariate high-frequency time series observations. Information flow among these interacting variables allows us to represent the causal dependencies in the form of a directed acyclic graph (DAG). We use high-frequency multivariate data at 10 Hz from an eddy covariance instrument located 25 m above agricultural land in the Midwestern US to quantify the evolutionary dynamics of this complex system using a sequence of DAGs, examining the structural dependency of information flow and the associated functional response. We investigate whether functional differences correspond to structural differences or whether there are no functional variations despite the structural differences. We base our analysis on the hypothesis that causal dependencies are instigated through information flow and that the resulting interactions sustain the dynamics and their functionality. To test our hypothesis, we build upon the causal structure analysis in the companion paper to characterize the information flow in similarly clustered DAGs from 3-min non-overlapping contiguous windows in the observational data. We characterize functionality as the nature of interactions as discerned through redundant, unique, and synergistic components of information flow. Through this analysis, we find that in turbulence at the biosphere–atmosphere interface, the variables that control the dynamic character of the atmosphere as well as the thermodynamics are driven by non-local conditions, while the scalar transport associated with CO₂ and H₂O is mainly driven by short-term local conditions.
- Experiments have long been the gold standard for causal inference in Ecology. As Ecology tackles progressively larger problems, however, we are moving beyond the scales at which randomised controlled experiments are feasible. To answer causal questions at scale, we also need to use observational data, something Ecologists tend to view with great scepticism. The major challenge in using observational data for causal inference is confounding variables: variables affecting both a causal variable and the response of interest. Unmeasured confounders, known or unknown, lead to statistical bias, creating spurious correlations and masking true causal relationships. To combat this omitted variable bias, other disciplines have developed rigorous approaches for causal inference from observational data that flexibly control for broad suites of confounding variables. We show how ecologists can harness some of these methods (causal diagrams to identify confounders, coupled with nested sampling and statistical designs) to reduce the risk of omitted variable bias. Using an example of estimating warming effects on snails, we show how current methods in Ecology (e.g., mixed models) produce incorrect inferences due to omitted variable bias and how alternative methods can eliminate it, improving causal inferences with weaker assumptions. Our goal is to expand the tools for causal inference using observational and imperfect experimental data in Ecology.
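The omitted-variable problem can be seen in a small simulation. The sketch below is not the authors' analysis. It assumes a hypothetical unmeasured site-level variable `u` that raises both temperature and a snail response, with an assumed true warming effect of -0.5 per degree. A pooled regression that ignores site is biased by `u`, whereas a within-site (fixed-effects, site-demeaned) estimator removes any confounder that is constant within a site. This estimator is swapped in here as one example of the design-based controls the abstract alludes to; all names and parameter values are illustrative assumptions.

```python
import random
from statistics import fmean


def simulate_sites(n_sites=200, n_years=10, beta=-0.5, seed=7):
    """Return (site, temperature, response) rows; u is a hypothetical unmeasured
    site-level confounder that raises both temperature and the response."""
    rng = random.Random(seed)
    rows = []
    for site in range(n_sites):
        u = rng.gauss(0.0, 1.0)
        for _ in range(n_years):
            temp = u + rng.gauss(0.0, 1.0)
            y = beta * temp + 2.0 * u + rng.gauss(0.0, 1.0)
            rows.append((site, temp, y))
    return rows


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs (with intercept)."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def within_site_slope(rows):
    """Fixed-effects slope: demean x and y within each site, then regress.
    Anything constant within a site (including u) drops out."""
    by_site = {}
    for site, x, y in rows:
        by_site.setdefault(site, []).append((x, y))
    xs, ys = [], []
    for obs in by_site.values():
        mx = fmean([x for x, _ in obs])
        my = fmean([y for _, y in obs])
        xs.extend(x - mx for x, _ in obs)
        ys.extend(y - my for _, y in obs)
    return ols_slope(xs, ys)


rows = simulate_sites()
pooled = ols_slope([t for _, t, _ in rows], [y for _, _, y in rows])
print(round(pooled, 2), round(within_site_slope(rows), 2))
```

With these assumed values the pooled slope is expected to be about +0.5 (the confounding path adds 2 × cov(temp, u) / var(temp) = 2 × 1 / 2 = 1 to the true -0.5), so the sign of the effect flips, while the within-site slope should be close to -0.5. The within-site estimator only handles confounders that do not vary within sites; time-varying confounders still need to be measured or designed out.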