
Title: NetGAM: Using generalized additive models to improve the predictive power of ecological network analyses constructed using time-series data
Abstract Ecological network analyses are used to identify potential biotic interactions between microorganisms from species abundance data. These analyses are often carried out using time-series data; however, time-series networks have unique statistical challenges. Time-dependent species abundance data can lead to species co-occurrence patterns that are not a result of direct, biotic associations and may therefore result in inaccurate network predictions. Here, we describe a generalized additive model (GAM)-based data transformation that removes time-series signals from species abundance data prior to running network analyses. The transformation was validated by generating mock time-series datasets with an underlying covariance structure, running network analyses on these datasets with and without our GAM transformation, and comparing the network outputs to the known covariance structure of the simulated data. The results revealed that seasonal abundance patterns substantially decreased the accuracy of the inferred networks. In addition, the GAM transformation increased the predictive power (F1 score) of inferred ecological networks on average and improved the ability of network inference methods to capture important features of network structure. This study underscores the importance of considering temporal features when carrying out network analyses and describes a simple, effective tool that can be used to improve results.
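The validation logic in the abstract (remove the temporal signal, infer a network, score it against a known structure) can be illustrated with a minimal sketch. This is not the NetGAM implementation: the GAM smooth is swapped for a much simpler per-phase mean subtraction (for example, subtracting each calendar month's mean), and the function names `remove_periodic_signal` and `edge_f1` are hypothetical names chosen for illustration.

```python
from statistics import mean


def remove_periodic_signal(abundance, period):
    """Subtract the mean of each phase of a cycle (e.g. each calendar month).

    Simple stand-in for the seasonal smooth of a GAM, NOT the paper's method.
    Assumes len(abundance) >= period so that every phase has at least one value.
    """
    phase_means = [mean(abundance[p::period]) for p in range(period)]
    return [x - phase_means[i % period] for i, x in enumerate(abundance)]


def edge_f1(true_edges, inferred_edges):
    """F1 score of inferred undirected edges against a known edge set."""
    # frozenset makes ("a", "b") and ("b", "a") the same undirected edge.
    true_set = {frozenset(e) for e in true_edges}
    inferred_set = {frozenset(e) for e in inferred_edges}
    true_positives = len(true_set & inferred_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(inferred_set)
    recall = true_positives / len(true_set)
    return 2 * precision * recall / (precision + recall)


# A purely periodic series leaves no residual signal after the transformation.
residuals = remove_periodic_signal([1, 5, 1, 5, 1, 5], period=2)

# One of two inferred edges matches one of two true edges.
score = edge_f1([("a", "b"), ("b", "c")], [("b", "a"), ("a", "c")])
```

In this toy case the residuals are all zero and precision and recall are both 0.5, so the F1 score is 0.5. A real analysis would fit a smooth per taxon and pass the residuals to a network inference method.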
Journal Name: ISME Communications
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Transcriptome studies that provide temporal information about transcript abundance facilitate identification of gene regulatory networks (GRNs). Inferring GRNs from time series data using computational modeling remains a central challenge in systems biology. Commonly employed clustering algorithms identify modules of like-responding genes but do not provide information on how these modules are interconnected. These methods also require users to specify parameters such as cluster number and size, adding complexity to the analysis. To address these challenges, we used a recently developed algorithm, partitioned local depth (PaLD), to generate cohesive networks for 4 time series transcriptome datasets (3 hormone and 1 abiotic stress dataset) from the model plant Arabidopsis thaliana. PaLD provided a cohesive network representation of the data, revealing networks with distinct structures and varying numbers of connections between transcripts. We utilized the networks to make predictions about GRNs by examining local neighborhoods of transcripts with highly similar temporal responses. We also partitioned the networks into groups of like-responding transcripts and identified enriched functional and regulatory features in them. Comparison of groups to clusters generated by commonly used approaches indicated that these methods identified modules of transcripts that have similar temporal and biological features, but also identified unique groups, suggesting that a PaLD-based approach (supplemented with a community detection algorithm) can complement existing methods. These results revealed that PaLD could sort like-responding transcripts into biologically meaningful neighborhoods and groups while requiring minimal user input and producing cohesive network structure, offering an additional tool to the systems biology community to predict GRNs.

  2. Dietary DNA metabarcoding enables researchers to identify and characterize trophic interactions with a high degree of taxonomic precision. It is also sensitive to sources of bias and contamination in the field and lab. One of the earliest and most common strategies for dealing with such sensitivities has been to filter resulting sequence data to remove low-abundance sequences before conducting ecological analyses based on the presence or absence of food taxa. Although this step is now often perceived to be both necessary and sufficient for cleaning up datasets, evidence to support this perception is lacking and more attention needs to be paid to the related risk of introducing other undesirable errors. Using computer simulations, we demonstrate that common strategies to remove low-abundance sequences can erroneously eliminate true dietary sequences in ways that impact downstream dietary inferences. Using real data from well-studied wildlife populations in Yellowstone National Park, we further show how these strategies can markedly alter the composition of individual dietary profiles in ways that scale up to obscure ecological interpretations about dietary generalism, specialism, and niche partitioning. Although the practice of removing low-abundance sequences may continue to be a useful strategy to address a subset of research questions that focus on a subset of relatively abundant food resources, its continued widespread use risks generating misleading perceptions about the structure of trophic networks. Researchers working with dietary DNA metabarcoding data (or similar data such as environmental DNA, microbiomes, or pathobiomes) should be aware of potential drawbacks and consider alternative bioinformatic, experimental, and statistical solutions. We used fecal DNA metabarcoding to characterize the diets of bison and bighorn sheep in winter and summer.
Our analyses are based on 35 samples (median per species per season = 10) analyzed using the P6 loop of the chloroplast trnL(UAA) intron together with publicly available plant reference data (Illumina sequence read data are available at NCBI (BioProject: PRJNA780500)). Obicut was used to trim reads with a minimum quality threshold of 30, and primers were removed from forward and reverse reads using cutadapt. All further sequence identifications were performed using obitools; forward and reverse sequences were aligned using the illuminapairedend command using a minimum alignment score of 40, and only joined sequences retained. We used the obiuniq command to group identical sequences and tally them within samples, enabling us to quantify the relative read abundance (RRA) of each sequence. Sequences that occurred ≤2 times overall or that were ≤8 bp were discarded. Sequences were considered to be likely PCR artifacts if they were highly similar to another sequence (1 bp difference) and had a much lower abundance (0.05%) in the majority of samples in which they occurred; we discarded these sequences using the obiclean command. Overall, we characterized 357 plant sequences and a subset of 355 sequences were retained in the dataset after rarefying samples to equal sequencing depth. We then applied relative read abundance thresholds from 0% to 5% to the fecal samples. We compared differences in the inferred dietary richness within and between species based on individual samples, based on average richness across samples, and based on the total richness of each population after accounting for differences in sample size. The readme file contains an explanation of each of the variables in the dataset. Information on the methodology can be found in the associated manuscript referenced above.  
  3. Abstract

    Microbial planktonic communities are the basis of food webs in aquatic ecosystems since they contribute substantially to primary production and nutrient recycling. Network analyses of DNA metabarcoding data sets have emerged as a powerful tool to untangle the complex ecological relationships among the key players in food webs. In this study, we evaluated co‐occurrence networks constructed from time‐series metabarcoding data sets (12 months, biweekly sampling) of protistan plankton communities in surface layers (epilimnion) and bottom waters (hypolimnion) of two temperate deep lakes, Lake Mondsee (Austria) and Lake Zurich (Switzerland). Lake Zurich plankton communities were less tightly connected, more fragmented and had a higher susceptibility to a species extinction scenario compared to Lake Mondsee communities. We interpret these results as a lower robustness of Lake Zurich protistan plankton to environmental stressors, especially stressors resulting from climate change. In all networks, the phylum Ciliophora contributed the highest number of nodes, among them several in key positions of the networks. Associations in ciliate‐specific subnetworks resembled autecological species‐specific traits that indicate adaptations to specific environmental conditions. We demonstrate the strength of co‐occurrence network analyses to deepen our understanding of plankton community dynamics in lakes and indicate biotic relationships, which resulted in new hypotheses that may guide future research in climate‐stressed ecosystems.

  4. Abstract

    Riverscape genetics, which applies concepts in landscape genetics to riverine ecosystems, lacks appropriate quantitative methods that address the spatial autocorrelation structure of linear stream networks and account for bidirectional geneflow. To address these challenges, we present a general framework for the design and analysis of riverscape genetic studies. Our framework starts with the estimation of pairwise genetic distance at sample sites and the development of a spatially structured ecological network (SSEN) on which riverscape covariates are measured. We then introduce the novel bidirectional geneflow in riverscapes (BGR) model that uses principles of isolation‐by‐resistance to quantify the effects of environmental covariates on genetic connectivity, with spatial covariance defined using simultaneous autoregressive models on the SSEN and the generalized Wishart distribution to model pairwise distance matrices arising through a random walk model of geneflow. We highlight the utility of this framework in an analysis of riverscape genetics for brook trout (Salvelinus fontinalis) in north central Pennsylvania, USA. Using the fixation index (FST) as the measure of genetic distance, we estimated the effects of 12 riverscape covariates on geneflow by evaluating the relative support of eight competing BGR models. We then compared the performance of the top‐ranked BGR model to results obtained from comparable analyses using multiple regression on distance matrices (MRM) and the program STRUCTURE. We found that the BGR model had more power to detect covariate effects, particularly for variables that were only partial barriers to geneflow and/or uncommon in the riverscape, making it more informative for assessing patterns of population connectivity and identifying threats to species conservation.
This case study highlights the utility of our modeling framework over other quantitative methods in riverscape genetics, particularly the ability to rigorously test hypotheses about factors that influence geneflow and probabilistically estimate the effect of riverscape covariates, including stream flow direction. This framework is flexible across taxa and riverine networks, is easily executable, and provides intuitive results that can be used to investigate the likely outcomes of current and future management scenarios.

  5. Abstract

    Aims

    Both ecological drift and environmental heterogeneity can produce high beta diversity among communities, but only the effect of drift is expected to be enhanced in communities of small size. Few studies have explicitly tested the influence of community size on patterns of beta diversity. Here we applied a series of analyses aimed at testing the influence of drift versus environmental heterogeneity on beta diversity among tree communities on islands of variable size.


    Location

    Thousand Island Lake, Zhejiang Province, China.


    Methods

    We used data on mapped tree communities and environmental conditions for 20 small islands (<1 ha) and nine large islands (>1 ha) created via the construction of a hydroelectric dam in 1959. Beta diversity was calculated using abundance‐based multiple‐site dissimilarity based on the Bray–Curtis index. On the basis of the hypothesis of ecological drift among small islands, we tested for higher beta diversity among small than among large islands: (a) using raw data, (b) controlling for the number of individuals sampled on a given island, and (c) controlling for the contiguous sampling area and thus for intra‐island environmental heterogeneity. We also tested the prediction that the relationship between species composition and environmental variables should be weaker on small islands using canonical correspondence analyses.


    Results

    Using raw data and controlling for the number of individuals, community dissimilarity was significantly greater among small islands than among large islands. However, when controlling for contiguous sampling area, this difference disappeared. Contrary to the prediction based on ecological drift, the strength of overall composition–environment relationships was not significantly weaker for small islands in any of the analyses, and environmental heterogeneity increased faster with area among small islands than among large islands.

    Main Conclusions

    Despite a result using raw data that was consistent with the hypothesis of ecological drift, our full set of results clearly indicated that the high beta diversity among small islands was more likely due to environmental heterogeneity than to ecological drift. This result points to a clear need to control for sampling area among habitats of different size when testing for statistical signatures of drift.
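The Bray–Curtis index named in the last abstract can be illustrated with a short sketch. It computes the ordinary pairwise dissimilarity between two communities, not the multiple-site version the study reports, and the function name `bray_curtis` and the species counts are hypothetical values chosen for illustration.

```python
def bray_curtis(counts_a, counts_b):
    """Pairwise Bray-Curtis dissimilarity between two abundance dictionaries.

    Returns 0.0 for identical communities and 1.0 when no species is shared.
    """
    species = set(counts_a) | set(counts_b)
    # Abundance shared by both communities, summed over all species.
    shared = sum(min(counts_a.get(s, 0), counts_b.get(s, 0)) for s in species)
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        raise ValueError("both communities are empty")
    return 1 - 2 * shared / total


# Hypothetical tree counts on two islands.
island_1 = {"oak": 6, "pine": 4}
island_2 = {"oak": 2, "pine": 8}
dissimilarity = bray_curtis(island_1, island_2)
```

Here the shared abundance is 2 + 4 = 6 out of 20 individuals in total, giving a dissimilarity of 0.4.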
