All branches of ecology study relationships among and between environmental and biological variables. However, standard approaches to studying such relationships, based on correlation and regression, provide only some of the complex information contained in the relationships. Other statistical approaches exist that provide a complete description of relationships between variables, based on the concept of the *copula*; they are applied in finance, neuroscience and elsewhere, but rarely in ecology. We explore the concepts that underpin copulas and the potential for those concepts to improve our understanding of ecology. We find that informative copula structure in dependencies between variables is common across all the environmental, species-trait, phenological, population, community, and ecosystem functioning datasets we considered. Many datasets exhibited asymmetric tail associations, whereby two variables were more strongly related in their left compared to right tails, or *vice versa*. We describe mechanisms by which observed copula structure and tail associations can arise in ecological data, including a Moran-like effect whereby dependence structures are inherited from environmental variables; and asymmetric or nonlinear influences of environments on ecological variables, such as under Liebig's law of the minimum. We also describe consequences of copula structure for ecological phenomena, including impacts on extinction risk, Taylor's law, and the temporal stability of ecosystem services. By documenting the importance of a complete description of dependence between variables, advancing conceptual frameworks, and demonstrating a powerful approach, we encourage widespread use of copulas in ecology, which we believe can benefit the discipline.
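To make asymmetric tail association concrete, the sketch below rank-transforms two samples onto the unit square (their empirical copula). It then compares how often both variables are jointly extreme in the lower tail versus the upper tail. It is a minimal illustration, not the authors' method. The threshold `q`, the generic finite-threshold estimators `P(U <= q, V <= q) / q` and `P(U > 1 - q, V > 1 - q) / q`, and the function names are assumptions chosen for simplicity.

```python
import random


def pseudo_observations(values):
    """Rank-transform a sample onto (0, 1) as u_i = rank_i / (n + 1).

    Ties are broken by position, which is adequate for continuous data.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def empirical_tail_dependence(x, y, q=0.1):
    """Finite-threshold estimates of lower- and upper-tail dependence.

    lower ~ P(U <= q, V <= q) / q and upper ~ P(U > 1 - q, V > 1 - q) / q,
    computed on the empirical copula (pseudo-observations) of (x, y).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    if not 0.0 < q < 0.5:
        raise ValueError("q must lie in (0, 0.5)")
    u, v = pseudo_observations(x), pseudo_observations(y)
    n = len(u)
    lower = sum(1 for a, b in zip(u, v) if a <= q and b <= q) / (n * q)
    upper = sum(1 for a, b in zip(u, v) if a > 1 - q and b > 1 - q) / (n * q)
    return lower, upper


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.random() for _ in range(2000)]
    # Hypothetical data: y tracks x only when x is low, so the pair is left-tail associated.
    y = [xi if xi < 0.5 else rng.random() for xi in x]
    print(empirical_tail_dependence(x, y, q=0.1))
```

In the synthetic example, `y` copies `x` only when `x` is low. The lower-tail estimate should therefore come out clearly larger than the upper-tail estimate, even though the two variables are positively associated overall.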
Uniform Partitioning of Data Grid for Association Detection
Extracting meaningful information from large datasets has become increasingly important; in particular, identifying relationships among variables in such datasets has far-reaching impacts. In this paper, we introduce the uniform information coefficient (UIC), which measures the amount of dependence between two multidimensional variables and can detect both linear and nonlinear associations. The UIC is inspired by the maximal information coefficient (MIC) \cite{MIC:2011}; however, the MIC was originally designed to measure dependence between two one-dimensional variables. We show that the UIC is less computationally expensive than the MIC and, unlike the MIC calculation, which depends on the type of association between the two variables, more robust to the type of association. The UIC achieves this by replacing the dynamic programming step in the MIC calculation with a simpler technique based on uniform partitioning of the data grid. This computational efficiency comes at the cost of not maximizing the information coefficient, as the MIC algorithm does. We present theoretical guarantees for the performance of the UIC and a variety of experiments demonstrating its quality in detecting associations.
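As a rough illustration of information measured on a uniformly partitioned grid, the sketch below bins each of two one-dimensional samples into `k` equal-width intervals. It then computes the mutual information of the resulting `k`-by-`k` contingency table, normalized by `log(k)`; for a square grid this matches the MIC's normalization by the log of the smaller grid dimension. This is an assumption-laden sketch, not the UIC algorithm. Equal-width bins, a single fixed resolution `k`, one-dimensional inputs, and the function names are all choices made here for illustration.

```python
import math
from collections import Counter


def uniform_bins(values, k):
    """Assign each value to one of k equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def uniform_grid_information(x, y, k):
    """Mutual information of (x, y) on a k-by-k equal-width grid, divided by log(k).

    The result lies in [0, 1]: the mutual information of the discretised pair
    cannot exceed the entropy of a k-bin marginal, which is at most log(k).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(x)
    bx, by = uniform_bins(x, k), uniform_bins(y, k)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[i] / n) * (py[j] / n)))
    return mi / math.log(k)


if __name__ == "__main__":
    xs = [i / 199 for i in range(200)]
    # A nonlinear, non-monotone association on a 5-by-5 uniform grid.
    print(uniform_grid_information(xs, [math.sin(6 * v) for v in xs], k=5))
```

No search over partitions is performed, so for a fixed `k` the cost is roughly linear in the number of points. This reflects, informally, the kind of saving the abstract attributes to dropping the MIC's dynamic programming step.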
- PAR ID: 10205736
- Date Published:
- Journal Name: IEEE Transactions on Pattern Analysis and Machine Intelligence
- ISSN: 0162-8828
- Page Range / eLocation ID: 1 to 1
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Extreme climatic events (ECEs) are becoming more frequent and more intense due to climate change. Furthermore, there is reason to believe ECEs may modify tail associations between distinct population vital rates, or between values of an environmental variable measured in different locations. Tail associations between two variables are associations that occur between values in the left or right tails of the distributions of the variables. Two positively associated variables can be principally left‐tail associated (i.e., more correlated when they take low values than when they take high values) or right‐tail associated (more correlated when they take high than low values), even with the same overall correlation coefficient in both cases. We tested, in the context of non‐spatial stage‐structured matrix models, whether tail associations between stage‐specific vital rates may influence extinction risk. We also tested whether the nature of spatial tail associations of environmental variables can influence metapopulation extinction risk. For instance, if low values of an environmental variable reduce the growth rates of local populations, one may expect that left‐tail associations increase metapopulation extinction risks because then environmental catastrophes are spatially synchronized, presumably reducing the potential for rescue effects. For the non‐spatial, stage‐structured models we considered, left‐tail associations between vital rates did accentuate extinction risk compared to right‐tail associations, but the effect was small. In contrast, we showed that density dependence interacts with tail associations to influence metapopulation extinction risk substantially: For population models showing undercompensatory density dependence, left‐tail associations in environmental variables often strongly accentuated and right‐tail associations mitigated extinction risk, whereas the reverse was usually true for models showing overcompensatory density dependence.
Tail associations and their asymmetries are taken into account in assessing risks in finance and other fields, but to our knowledge, our study is one of the first to consider how tail associations influence population extinction risk. Our modeling results provide an initial demonstration of a new mechanism influencing extinction risks and, in our view, should help motivate more comprehensive study of the mechanism and its importance for real populations in future work.
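Left-tail and right-tail association with the same overall correlation can be illustrated with a Clayton copula, which has lower-tail dependence. Rotating it by 180 degrees gives upper-tail dependence while leaving Spearman's rank correlation unchanged. The Clayton family, the parameter value `theta = 2`, and the function names are assumptions made for this sketch. The study itself may generate tail-associated vital rates or environmental variables differently.

```python
import random


def sample_clayton(n, theta, rng):
    """Draw n pairs (u, v) from a Clayton copula with theta > 0 by conditional inversion.

    The Clayton copula is lower-tail (left-tail) dependent and has Kendall's tau
    equal to theta / (theta + 2).
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # in (0, 1], so u ** -theta is finite
        w = 1.0 - rng.random()
        v = (u ** -theta * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def rotate(pairs):
    """180-degree rotation (survival copula): left-tail becomes right-tail association."""
    return [(1.0 - u, 1.0 - v) for u, v in pairs]


def spearman(pairs):
    """Spearman's rank correlation, assuming no ties (continuous data)."""
    n = len(pairs)

    def ranks(values):
        order = sorted(range(n), key=lambda i: values[i])
        r = [0] * n
        for rank, i in enumerate(order):
            r[i] = rank
        return r

    ru = ranks([u for u, _ in pairs])
    rv = ranks([v for _, v in pairs])
    mean = (n - 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(ru, rv))
    var = sum((a - mean) ** 2 for a in ru)  # equals the spread of rv when there are no ties
    return cov / var


def joint_extremes(pairs, q=0.1):
    """Fractions of pairs that are jointly below q and jointly above 1 - q."""
    n = len(pairs)
    low = sum(1 for u, v in pairs if u < q and v < q) / n
    high = sum(1 for u, v in pairs if u > 1 - q and v > 1 - q) / n
    return low, high


if __name__ == "__main__":
    left = sample_clayton(5000, 2.0, random.Random(0))
    right = rotate(left)
    print(spearman(left), spearman(right))
    print(joint_extremes(left), joint_extremes(right))
```

Rotation reverses the ranks of both variables, so Spearman's correlation is identical for the two samples. Yet the shares of jointly low and jointly high values swap places. Feeding such pairs into the vital rates of a stage-structured model is one way to explore the extinction-risk effects described above.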
Relational arrays represent measures of association between pairs of actors, often in varied contexts or over time. Trade flows between countries, financial transactions between individuals, contact frequencies between school children in classrooms and dynamic protein-protein interactions are all examples of relational arrays. Elements of a relational array are often modelled as a linear function of observable covariates. Uncertainty estimates for regression coefficient estimators, and ideally the coefficient estimators themselves, must account for dependence between elements of the array, e.g., relations involving the same actor. Existing estimators of standard errors that recognize such relational dependence rely on estimating extremely complex, heterogeneous structure across actors. This paper develops a new class of parsimonious coefficient and standard error estimators for regressions of relational arrays. We leverage an exchangeability assumption to derive standard error estimators that pool information across actors, and are substantially more accurate than existing estimators in a variety of settings. This exchangeability assumption is pervasive in network and array models in the statistics literature, but not previously considered when adjusting for dependence in a regression setting with relational data. We demonstrate improvements in inference theoretically, via a simulation study, and by analysis of a dataset involving international trade.
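The sketch below illustrates the general idea of pooling residual information across actors, for a single covariate and no intercept. Every pair of directed relations is assigned to an overlap category: same relation, reciprocal relation, shared sender, shared receiver, or shared actor in opposite roles. One covariance is estimated per category by averaging residual cross-products, and these pooled values fill the middle of the usual sandwich variance formula. The category scheme, the assumption that relations with no actor in common are uncorrelated, and all names are hypothetical simplifications. This is not the estimator developed in the paper.

```python
import itertools
import random


def dyad_overlap(a, b):
    """Classify how directed dyads a = (i, j) and b = (k, l) share actors.

    Hypothetical grouping for illustration; pairs with no shared actor
    return None and are treated as uncorrelated.
    """
    (i, j), (k, l) = a, b
    if a == b:
        return "same"
    if i == l and j == k:
        return "reciprocal"
    if i == k:
        return "shared_sender"
    if j == l:
        return "shared_receiver"
    if i == l or j == k:
        return "opposite_roles"
    return None


def ols_with_pooled_dyadic_variance(dyads, x, y):
    """Slope of y on x (no intercept) and a sandwich variance whose middle
    term uses one pooled residual covariance per overlap category."""
    sxx = sum(v * v for v in x)
    beta = sum(a * b for a, b in zip(x, y)) / sxx
    resid = [b - beta * a for a, b in zip(x, y)]
    resid_sum, count, x_sum = {}, {}, {}
    for p, q in itertools.product(range(len(dyads)), repeat=2):
        cat = dyad_overlap(dyads[p], dyads[q])
        if cat is None:
            continue
        resid_sum[cat] = resid_sum.get(cat, 0.0) + resid[p] * resid[q]
        count[cat] = count.get(cat, 0) + 1
        x_sum[cat] = x_sum.get(cat, 0.0) + x[p] * x[q]
    # sum_{p,q} x_p x_q Omega_pq, with Omega_pq replaced by its category average.
    meat = sum(x_sum[c] * resid_sum[c] / count[c] for c in count)
    # Note: nothing here forces the pooled variance to be positive in small samples.
    return beta, meat / sxx ** 2


if __name__ == "__main__":
    rng = random.Random(3)
    dyads = list(itertools.permutations(range(6), 2))
    # Hypothetical data: actor-level sender effects induce dependence among relations.
    sender = [rng.gauss(0.0, 1.0) for _ in range(6)]
    x = [float(i - j) for i, j in dyads]
    y = [2.0 * xv + sender[i] + rng.gauss(0.0, 1.0) for xv, (i, j) in zip(x, dyads)]
    print(ols_with_pooled_dyadic_variance(dyads, x, y))
```

Each category contributes one averaged covariance rather than a separate term per actor. The number of dependence parameters therefore stays fixed as the number of actors grows, which is the sense in which such estimators are parsimonious.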
Recently, many regression-based conditional independence (CI) test methods have been proposed to solve the problem of causal discovery. These methods offer an alternative way to test CI: they first remove the information of the controlling set from the two target variables and then test the independence between the corresponding residuals Res1 and Res2. When the residuals are linearly uncorrelated, testing their independence is nontrivial. Because they can compute inner products in a high-dimensional space, kernel-based methods are usually used for this purpose, but they still consume considerable time. In this paper, we investigate the independence between two linear combinations under the linear non-Gaussian structural equation model. We show that the dependence between the two residuals can be captured by the difference between the similarity of (Res1, Res2) and that of (Res1, Res3) in a high-dimensional space, where Res3 is generated by random permutation. With this result, we design a new CI test method called SCIT, in which a permutation test is performed to control the Type I error rate. The proposed method is simpler than existing ones, yet more efficient and effective. When applied to causal discovery, it outperforms its counterparts in terms of both speed and Type II error rate, especially for small sample sizes, as validated by our extensive experiments on various datasets.
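A generic residual-based permutation CI test can be sketched in three steps. First, regress each target variable on a single conditioning variable `z` by ordinary least squares. Second, measure the dependence between the two residual series with a kernel statistic. Third, compare it with the same statistic computed after randomly permuting the second residual series, which breaks any dependence while keeping its marginal distribution. The dependence statistic here is a biased Hilbert-Schmidt Independence Criterion (HSIC) estimate with a fixed-bandwidth Gaussian kernel on standardized residuals. That is a swapped-in choice, not the SCIT statistic; the single conditioning variable, the bandwidth, and the function names are also assumptions.

```python
import math
import random


def residuals(target, z):
    """Residuals of an ordinary least-squares fit of target on z (with intercept)."""
    n = len(z)
    mz, mt = sum(z) / n, sum(target) / n
    szz = sum((a - mz) ** 2 for a in z)
    slope = sum((a - mz) * (b - mt) for a, b in zip(z, target)) / szz
    return [b - mt - slope * (a - mz) for a, b in zip(z, target)]


def gaussian_gram(v):
    """Gaussian-kernel Gram matrix of standardized values (bandwidth fixed at 1)."""
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((a - m) ** 2 for a in v) / n) or 1.0
    w = [(a - m) / s for a in v]
    return [[math.exp(-0.5 * (a - b) ** 2) for b in w] for a in w]


def center(k):
    """Double-center a symmetric Gram matrix: H K H with H = I - 11^T / n."""
    n = len(k)
    row = [sum(r) / n for r in k]
    grand = sum(row) / n
    return [[k[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def hsic(kc, l, perm):
    """Biased HSIC estimate trace(Kc L_perm) / n^2, with L reindexed by perm."""
    n = len(kc)
    return sum(kc[i][j] * l[perm[i]][perm[j]] for i in range(n) for j in range(n)) / n ** 2


def residual_permutation_ci_test(x, y, z, n_perm=199, seed=0):
    """Permutation p-value for 'x independent of y given z' via OLS residuals."""
    kc = center(gaussian_gram(residuals(x, z)))
    l = gaussian_gram(residuals(y, z))
    identity = list(range(len(z)))
    observed = hsic(kc, l, identity)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # a randomly permuted copy of the second residual series
        if hsic(kc, l, perm) >= observed:
            exceed += 1
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(7)
    z = [rng.uniform(-1, 1) for _ in range(60)]
    e = [rng.uniform(-1, 1) for _ in range(60)]
    x = [zi + ei for zi, ei in zip(z, e)]
    # y's residual depends on e only through e**2: uncorrelated with e, yet dependent.
    y = [zi + ei ** 2 for zi, ei in zip(z, e)]
    print(residual_permutation_ci_test(x, y, z, n_perm=99))
```

Each permutation costs O(n^2) operations here, which is the kind of kernel-computation burden the abstract says SCIT reduces. The sketch therefore shows the testing logic, not the speed-up.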
We develop a new method to fit the multivariate response linear regression model that exploits a parametric link between the regression coefficient matrix and the error covariance matrix. Specifically, we assume that the correlations between entries in the multivariate error random vector are proportional to the cosines of the angles between their corresponding regression coefficient matrix columns, so that as the angle between two regression coefficient matrix columns decreases, the correlation between the corresponding errors increases. We highlight two models under which this parameterization arises: a latent variable reduced-rank regression model and the errors-in-variables regression model. We propose a novel non-convex weighted residual sum of squares criterion that exploits this parameterization and admits a new class of penalized estimators. The optimization is solved with an accelerated proximal gradient descent algorithm. We use our method to study the association between microRNA expression and cancer drug activity measured on the NCI-60 cell lines. An R package implementing our method, MCMVR, is available online.
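The cosine link between coefficient columns and error correlations can be written as a small computation. Assume a single proportionality constant, called `rho` here (a hypothetical name). The implied error correlation matrix then has unit diagonal and off-diagonal entries `rho * cos(angle_jk)`, i.e. `R = rho * C + (1 - rho) * I`, where `C` holds the column cosines. Restricting `rho` to `[0, 1]` keeps `R` positive semidefinite, because it is then a convex combination of two correlation matrices. This sketches the stated parameterization only; the paper's criterion, estimator, and exact parameterization of the error covariance are not reproduced.

```python
import math


def column_cosines(b):
    """Cosine of the angle between every pair of columns of a p-by-q matrix b,
    given as a list of p rows."""
    q = len(b[0])
    cols = [[row[j] for row in b] for j in range(q)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cols]
    if any(nrm == 0.0 for nrm in norms):
        raise ValueError("every column of b must be non-zero")
    return [[sum(u * v for u, v in zip(cols[j], cols[k])) / (norms[j] * norms[k])
             for k in range(q)] for j in range(q)]


def implied_error_correlation(b, rho):
    """Correlation matrix with off-diagonal entries rho * cos(angle_jk) and unit
    diagonal, i.e. R = rho * C + (1 - rho) * I (positive semidefinite for rho in [0, 1])."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1] in this sketch")
    c = column_cosines(b)
    q = len(c)
    return [[1.0 if j == k else rho * c[j][k] for k in range(q)] for j in range(q)]


if __name__ == "__main__":
    # Hypothetical 2-by-3 coefficient matrix (2 predictors, 3 responses).
    b = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 1.0]]
    for row in implied_error_correlation(b, 0.5):
        print([round(v, 4) for v in row])
```

In the example matrix, the first two columns are orthogonal, so their errors are uncorrelated. The third column lies at 45 degrees to each of them, giving correlation `rho / sqrt(2)`.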