Title: Beyond linearity, stability, and equilibrium: The edm package for empirical dynamic modeling and convergent cross-mapping in Stata
How can social and health researchers study complex dynamic systems that function in nonlinear and even chaotic ways? Common methods, such as experiments and equation-based models, may be ill-suited to this task. To address the limitations of existing methods and offer nonparametric tools for characterizing and testing causality in nonlinear dynamic systems, we introduce the edm command in Stata. This command implements three key empirical dynamic modeling (EDM) methods for time series and panel data: 1) simplex projection, which characterizes the dimensionality of a system and the degree to which it appears to function deterministically; 2) S-maps, which quantify the degree of nonlinearity in a system; and 3) convergent cross-mapping, which offers a nonparametric approach to modeling causal effects. We illustrate these methods using simulated data on daily Chicago temperature and crime, showing an effect of temperature on crime but not the reverse. We conclude by discussing how EDM allows checking the assumptions of traditional model-based methods, such as residual autocorrelation tests, and we advocate for EDM because it does not assume linearity, stability, or equilibrium.
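As a rough illustration of the first method named in the abstract, the sketch below shows the general idea of simplex projection in plain Python. It is not the edm command's implementation and does not reproduce its options or defaults; the function names (`embed`, `simplex_forecast`, `pearson`, `logistic_series`), the leave-one-out scheme, and the logistic-map test series are all assumptions chosen for the example.

```python
import math


def logistic_series(n, r=3.8, x0=0.4):
    """Hypothetical test data: n points of the chaotic logistic map."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def embed(series, E, tau=1):
    """Delay embedding: pairs (t, [x_t, x_{t-tau}, ..., x_{t-(E-1)tau}])."""
    start = (E - 1) * tau
    return [(t, [series[t - j * tau] for j in range(E)])
            for t in range(start, len(series))]


def simplex_forecast(series, E, tau=1, tp=1):
    """Leave-one-out simplex projection, tp steps ahead.

    Each point is predicted from its E + 1 nearest neighbours in the
    delay-embedded space, weighted by exp(-d / d_min).
    Returns (observed, predicted) lists.
    """
    # Keep only points whose tp-step-ahead value exists in the series.
    points = [(t, v) for t, v in embed(series, E, tau) if t + tp < len(series)]
    observed, predicted = [], []
    for t, v in points:
        # Distances to every other point; the point itself is excluded.
        nearest = sorted((math.dist(v, u), s) for s, u in points if s != t)[:E + 1]
        d_min = nearest[0][0]
        if d_min == 0:
            # Exact matches get all the weight to avoid dividing by zero.
            weights = [1.0 if d == 0 else 0.0 for d, _ in nearest]
        else:
            weights = [math.exp(-d / d_min) for d, _ in nearest]
        total = sum(weights)
        forecast = sum(w * series[s + tp] for w, (_, s) in zip(weights, nearest)) / total
        predicted.append(forecast)
        observed.append(series[t + tp])
    return observed, predicted


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    data = logistic_series(200)
    for E in (1, 2, 3, 4):
        obs, pred = simplex_forecast(data, E)
        print(E, round(pearson(obs, pred), 3))
```

Comparing the forecast skill (the correlation between observed and predicted values) across embedding dimensions E is the usual way this kind of procedure is used to gauge a system's dimensionality; a high skill at a low E suggests low-dimensional deterministic structure.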
Award ID(s):
1655203 1660584
NSF-PAR ID:
10298676
Journal Name:
The Stata Journal: Promoting communications on statistics and Stata
Volume:
21
Issue:
1
ISSN:
1536-867X
Page Range / eLocation ID:
220 to 258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Methane dynamics within salt marshes are complex because vegetation types, temperature, oscillating water levels, and changes in salinity and redox conditions influence CH₄ production, consumption, oxidation, and emissions. These non-linear and complex interactions among variables affect the traditionally expected functional relationships and present challenges for interpreting and developing process-based models. We employed empirical dynamic modeling (EDM) and convergent cross mapping (CCM) as a novel approach for characterizing seasonal/multiday and diurnal CH₄ dynamics by inferring causal variables, lags, and interconnections among multiple biophysical variables within a temperate salt marsh using 5 years of eddy covariance data. EDM/CCM is a nonparametric approach capable of quantifying the coupling between variables while determining time scales where variable interactions are the most relevant. We found that gross primary productivity, tidal creek dissolved oxygen, and temperature were important for seasonal/multiday dynamics (rho = 0.73–0.80), while water level was most important for diurnal dynamics during both the growing and dormancy phenoperiods (rho = 0.72 and 0.56, respectively). Lags for the top-ranked variables (i.e., gross primary productivity, dissolved oxygen, temperature, water level) occurred between 1 and 5 weeks at the seasonal scale and 1–24 hr at the diurnal scale. The EDM had high prediction capabilities for intra-/inter-seasonal patterns and annual CH₄ sums but had limitations in representing large, infrequent fluxes. Results highlight the importance of non-linearity, drivers, lag times, and interconnections among multiple biophysical variables that regulate CH₄ fluxes in tidal wetlands. This research introduces a novel approach to examining CH₄ fluxes, which will aid in evaluating current paradigms in wetlands and other ecosystems.
  2. Proteins and nucleic acids participate in essentially every biochemical process in living organisms, and the elucidation of their structure and motions is essential for our understanding of how these molecular machines perform their function. Nuclear Magnetic Resonance (NMR) spectroscopy is a powerful, versatile technique that provides critical information on molecular structure and dynamics. Spin-relaxation data are used to determine the overall rotational diffusion and local motions of biological macromolecules, while residual dipolar couplings (RDCs) reveal the local and long-range structural architecture of these molecules and their complexes. This information allows researchers to refine structures of proteins and nucleic acids and provides restraints for molecular docking. Several software packages have been developed by NMR researchers to tackle the complicated experimental data analysis and structure modeling. However, many of them are offline packages or command-line applications that require users to set up the run-time environment and to possess certain programming skills, which inevitably limits the accessibility of this software to a broad scientific community. Here we present new science gateways designed for the NMR/structural biology community that address these current limitations in NMR data analysis. Using the GenApp technology for scientific gateways (https://genapp.rocks), we successfully transformed ROTDIF and ALTENS, two offline packages for bio-NMR data analysis, into science gateways that provide advanced computational functionalities, cloud-based data management, and interactive 2D and 3D plotting and visualizations. Furthermore, these gateways are integrated with molecular structure visualization tools (Jmol) and with gateways/engines (SASSIE-web) capable of generating huge computer-simulated structural ensembles of proteins and nucleic acids. This enables researchers to seamlessly incorporate conformational ensembles into the analysis in order to adequately take into account the structural heterogeneity and dynamic nature of biological macromolecules. ROTDIF-web offers a versatile set of integrated modules/tools for determining and predicting molecular rotational diffusion tensors, for model-free characterization of bond dynamics in biomacromolecules, and for docking of molecular complexes driven by the information extracted from NMR relaxation data. ALTENS allows characterization of the molecular alignment under anisotropic conditions, which enables researchers to obtain accurate local and long-range bond-vector restraints for refining 3-D structures of macromolecules and their complexes. We will describe our experience bringing our programs into GenApp and illustrate the use of these gateways for specific examples of protein systems of high biological significance. We expect these gateways to be useful to structural biologists and biophysicists as well as the NMR community and to stimulate other researchers to share their scientific software in a similar way.
  3. Abstract Particle filters avoid parametric estimates for Bayesian posterior densities, which alleviates Gaussian assumptions in nonlinear regimes. These methods, however, are more sensitive to sampling errors than Gaussian-based techniques such as ensemble Kalman filters. A recent study by the authors introduced an iterative strategy for particle filters that match posterior moments, where iterations improve the filter’s ability to draw samples from non-Gaussian posterior densities. The iterations follow from a factorization of particle weights, providing a natural framework for combining particle filters with alternative filters to mitigate the impact of sampling errors. The current study introduces a novel approach to forming an adaptive hybrid data assimilation methodology, exploiting the theoretical strengths of nonparametric and parametric filters. At each data assimilation cycle, the iterative particle filter performs a sequence of updates while the prior sample distribution is non-Gaussian, then an ensemble Kalman filter provides the final adjustment when Gaussian distributions for marginal quantities are detected. To determine when to make the transition between filter algorithms, the method employs the Shapiro–Wilk test, which has outstanding power for detecting departures from normality. Experiments using low-dimensional models demonstrate that the approach has significant value, especially for nonhomogeneous observation networks and unknown model process errors. Moreover, hybrid factors are extended to consider marginals of more than one collocated variable using a test for multivariate normality. Findings from this study motivate the use of the proposed method for geophysical problems characterized by diverse observation networks and various dynamic instabilities, such as numerical weather prediction models.
Significance Statement Data assimilation statistically processes observation errors and model forecast errors to provide optimal initial conditions for the forecast, playing a critical role in numerical weather forecasting. The ensemble Kalman filter, which has been widely adopted and developed in many operational centers, assumes Gaussianity of the prior distribution and solves a linear system of equations, leading to bias in strong nonlinear regimes. On the other hand, particle filters avoid many of those assumptions but are sensitive to sampling errors and are computationally expensive. We propose an adaptive hybrid strategy that combines their advantages and minimizes the disadvantages of the two methods. The hybrid particle filter–ensemble Kalman filter is achieved with the Shapiro–Wilk test to detect the Gaussianity of the ensemble members and determine the timing of the transition between these filter updates. Demonstrations in this study show that the proposed method is advantageous when observations are heterogeneous and when the model has an unknown bias. Furthermore, by extending the statistical hypothesis test to the test for multivariate normality, we consider marginals of more than one collocated variable. These results encourage further testing for real geophysical problems characterized by various dynamic instabilities, such as real numerical weather prediction models. 
  4. Economic models often depend on quantities that are unobservable, either for privacy reasons or because they are difficult to measure. Examples of such variables include human capital (or ability), personal income, unobserved heterogeneity (such as consumer “types”), et cetera. This situation has historically been handled either by simply using observable imperfect proxies for each of the unobservables, or by assuming that such unobservables satisfy convenient conditional mean or independence assumptions that enable their elimination from the estimation problem. However, thanks to tremendous increases in both the amount of data available and computing power, it has become possible to take full advantage of recent formal methods to infer the statistical properties of unobservable variables from multiple imperfect measurements of them. The general framework used is the concept of measurement systems in which a vector of observed variables is expressed as a (possibly nonlinear or nonparametric) function of a vector of all unobserved variables (including unobserved error terms or “disturbances” that may have nonadditively separable effects). The framework emphasizes important connections with related fields, such as nonlinear panel data, limited dependent variables, game theoretic models, dynamic models, and set identification. This review reports the progress made toward the central question of whether there exist plausible assumptions under which one can identify the joint distribution of the unobservables from the knowledge of the joint distribution of the observables. It also overviews empirical efforts aimed at exploiting such identification results to deliver novel findings that formally account for the unavoidable presence of unobservables. (JEL C30, C55, C57, D12, E21, E23, J24)
  5. Abstract

    Estimating and predicting the state of the atmosphere is a probabilistic problem for which an ensemble modeling approach often is taken to represent uncertainty in the system. Common methods for examining uncertainty and assessing performance for ensembles emphasize pointwise statistics or marginal distributions. However, these methods lose specific information about individual ensemble members. This paper explores contour band depth (cBD), a method of analyzing uncertainty in terms of contours of scalar fields. cBD is fully nonparametric and induces an ordering on ensemble members that leads to box-and-whisker-plot-type visualizations of uncertainty for two-dimensional data. By applying cBD to synthetic ensembles, we demonstrate that it provides enhanced information about the spatial structure of ensemble uncertainty. We also find that the usefulness of the cBD analysis depends on the presence of multiple modes and multiple scales in the ensemble of contours. Finally, we apply cBD to compare various convection-permitting forecasts from different ensemble prediction systems and find that the value it provides in real-world applications compared to standard analysis methods exhibits clear limitations. In some cases, contour boxplots can provide deeper insight into differences in spatial characteristics between the different ensemble forecasts. Nevertheless, identification of outliers using cBD is not always intuitive, and the method can be especially challenging to implement for flow that exhibits multiple spatial scales (e.g., discrete convective cells embedded within a mesoscale weather system).

    Significance Statement

    Predictions of Earth’s atmosphere inherently come with some degree of uncertainty owing to incomplete observations and the chaotic nature of the system. Understanding that uncertainty is critical when drawing scientific conclusions or making policy decisions from model predictions. In this study, we explore a method for describing model uncertainty when the quantities of interest are well represented by contours. The method yields a quantitative visualization of uncertainty in both the location and the shape of contours to an extent that is not possible with standard uncertainty quantification methods and may eventually prove useful for the development of more robust techniques for evaluating and validating numerical weather models.

     