Title: Use and abuse of correlation analyses in microbial ecology
Abstract

Correlation analyses are often included in bioinformatic pipelines as methods for inferring taxon–taxon interactions. In this perspective, we highlight the pitfalls of inferring interactions from covariance and suggest methods, study design considerations, and additional data types for improving high-throughput interaction inferences. We conclude that correlation, even when augmented by other data types, almost never provides reliable information on direct biotic interactions in real-world ecosystems. These bioinformatically inferred associations are useful for reducing the number of potential hypotheses that we might test, but will never preclude the necessity for experimental validation.

 
NSF-PAR ID:
10485642
Publisher / Repository:
Oxford University Press
Journal Name:
The ISME Journal
Volume:
13
Issue:
11
ISSN:
1751-7362
Format(s):
Medium: X Size: p. 2647-2655
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    Reconstructing regulatory networks from expression and interaction data is a major goal of systems biology. While much work has focused on experimentally and computationally determining the set of transcription factors (TFs) and microRNAs (miRNAs) that regulate genes in these networks, relatively little work has focused on inferring the regulation of miRNAs by TFs. Such regulation can play an important role in several biological processes, including development and disease. The main challenge for predicting such interactions is the very small positive training set currently available. Another challenge is that a large fraction of miRNAs are encoded within genes, making it hard to determine the specific way in which they are regulated.

    Results

    To enable genome-wide predictions of TF–miRNA interactions, we extended semi-supervised machine-learning approaches to integrate a large set of different types of data including sequence, expression, ChIP-seq and epigenetic data. As we show, the methods we develop achieve good performance both on a labeled test set and when analyzing general co-expression networks. We next analyze mRNA and miRNA cancer expression data, demonstrating the advantage of using the predicted set of interactions for identifying more coherent and relevant modules, genes, and miRNAs. The complete set of predictions is available on the supporting website and can be used by any method that combines miRNAs, genes, and TFs.

    Availability and Implementation

    Code and full set of predictions are available from the supporting website: http://cs.cmu.edu/~mruffalo/tf-mirna/.

    Contact

    zivbj@cs.cmu.edu

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Summary

    Next‐generation sequencing technologies have generated, and continue to produce, an increasingly large corpus of biological data. The data generated are inherently compositional as they convey only relative information dependent upon the capacity of the instrument, experimental design and technical bias. There is considerable information to be gained through network analysis by studying the interactions between components within a system. Network theory methods using compositional data are powerful approaches for quantifying relationships between biological components and their relevance to phenotype, environmental conditions or other external variables. However, many of the statistical assumptions used for network analysis are not designed for compositional data and can bias downstream results. In this mini‐review, we illustrate the utility of network theory in biological systems and investigate modern techniques while introducing researchers to frameworks for implementation. We overview (1) compositional data analysis, (2) data transformations and (3) network theory along with insight on a battery of network types including static‐, temporal‐, sample‐specific‐ and differential‐networks. The intention of this mini‐review is not to provide a comprehensive overview of network methods, rather to introduce microbiology researchers to (semi)‐unsupervised data‐driven approaches for inferring latent structures that may give insight into biological phenomena or abstract mechanics of complex systems.

     
  3. Abstract Aim

    Species with wide distributions spanning the African Guinean and Congolian rain forests are often composed of genetically distinct populations or cryptic species with geographic distributions that mirror the locations of the remaining forest habitats. We used phylogeographic inference and demographic model testing to evaluate diversification models in a widespread rain forest species, the African foam‐nest treefrog Chiromantis rufescens.

    Location

    Guinean and Congolian rain forests, West and Central Africa.

    Taxon

    Chiromantis rufescens.

    Methods

    We collected mitochondrial DNA (mtDNA) and single‐nucleotide polymorphism (SNP) data for 130 samples of C. rufescens. After estimating population structure and inferring species trees using coalescent methods, we tested demographic models to evaluate alternative population divergence histories that varied with respect to gene flow, population size change and periods of isolation and secondary contact. Species distribution models were used to identify the regions of climatic stability that could have served as forest refugia since the last interglacial.

    Results

    Population structure within C. rufescens resembles the major biogeographic regions of the Guinean and Congolian forests. Coalescent‐based phylogenetic analyses provide strong support for an early divergence between the western Upper Guinean forest and the remaining populations. Demographic inferences support diversification models with gene flow and population size changes even in cases where contemporary populations are currently allopatric, which provides support for forest refugia and barrier models. Species distribution models suggest that forest refugia were available for each of the populations throughout the Pleistocene.

    Main conclusions

    Considering historical demography is essential for understanding population diversification, especially in complex landscapes such as those found in the Guineo–Congolian forest. Population demographic inferences help connect the patterns of genetic variation to diversification model predictions. The diversification history of C. rufescens was shaped by a variety of processes, including vicariance from river barriers, forest fragmentation and adaptive evolution along environmental gradients.

     
  4. Abstract Aim

    Adult survival is central to theories explaining latitudinal gradients in life history strategies. Life history theory predicts higher adult survival in tropical than north temperate regions given lower fecundity and parental effort. Early studies were consistent with this prediction, but standard‐effort netting studies in recent decades suggested that apparent survival rates in temperate and tropical regions strongly overlap. Such results do not fit with life history theory. Targeted marking and resighting of breeding adults yielded higher survival estimates in the tropics, but this approach is thought to overestimate survival because it does not sample social and age classes with lower survival. We compared the effect of field methods on tropical survival estimates and their relationships with life history traits.

    Location

    Sabah, Malaysian Borneo.

    Time period

    2008–2016.

    Major taxon

    Passeriformes.

    Methods

    We used standard‐effort netting and resighted individuals of all social and age classes of 18 tropical songbird species over 8 years. We compared apparent survival estimates between these two field methods with differing analytical approaches.

    Results

    Estimated detection and apparent survival probabilities from standard‐effort netting were similar to those from other tropical studies that used standard‐effort netting. Resighting data verified that a high proportion of individuals that were never recaptured in standard‐effort netting remained in the study area, and many were observed breeding. Across all analytical approaches, addition of resighting yielded substantially higher survival estimates than did standard‐effort netting alone. These apparent survival estimates were higher than for temperate zone species, consistent with latitudinal differences in life histories. Moreover, apparent survival estimates from addition of resighting, but not from standard‐effort netting alone, were correlated with parental effort as measured by egg temperature across species.

    Main conclusions

    Inclusion of resighting showed that standard‐effort netting alone can negatively bias apparent survival estimates and obscure life history relationships across latitudes and among tropical species.

     
  5. Abstract Aim

    Among the main biogeographical hypotheses explaining the remarkable diversity of fishes in the Neotropics are the “palaeogeographical hypothesis”, focusing on vicariance, and the “hydrogeological hypothesis”, focusing on geodispersal. Although they reflect different processes, the two may result in similar biogeographical patterns. We employed a model‐based Bayesian approach to test these alternative hypotheses and determine which shaped the phylogeographical patterns observed in a group of Neotropical fishes.

    Location

    South America.

    Taxon

    Salminus.

    Methods

    We used mitochondrial and nuclear markers to infer phylogenetic relationships and estimate divergence times among Salminus species, associating the results with known geological events. We then employed approximate Bayesian computation (ABC) to explore changes in population size over time, asking whether vicariance or geodispersal events best explain the phylogeographical signature observed in the data. Because geodispersal captures a few individuals from a parental population, which can then expand and lead to a new lineage, we expect to find genetic signatures of a founder event following population expansion under this scenario, but not under vicariance.

    Results

    The analyses suggest that the diversification process in Salminus began in the Upper Miocene, and ABC indicates that it involved both vicariance and geodispersal events: while a vicariance event better explains the phylogeographical structure within S. brasiliensis and the genetic patterns of differentiation between S. sp. Amazon and S. sp. Araguaia, geodispersal appears to have been the most important event structuring lineages of Salminus hilarii.

    Main Conclusions

    Both vicariance and geodispersal signatures were detected in our biological model, implying a complex yet realistic demographic history of Salminus lineages. The correspondence between the ABC results and traditional phylogeographical interpretations provides further confidence in the models drawn and tested. This study reinforces the value of applying an ABC framework in phylogeographical studies, particularly for those interested in testing alternative and biologically plausible processes underlying similar biogeographical patterns.

     