skip to main content

Title: A multi-species repository of social networks

Social network analysis is an invaluable tool to understand the patterns, evolution, and consequences of sociality. Comparative studies over a range of social systems across multiple taxonomic groups are particularly valuable. Such studies however require quantitative social association or interaction data across multiple species which is not easily available. We introduce the Animal Social Network Repository (ASNR) as the first multi-taxonomic repository that collates 790 social networks from more than 45 species, including those of mammals, reptiles, fish, birds, and insects. The repository was created by consolidating social network datasets from the literature on wild and captive animals into a consistent and easy-to-use network data format. The repository is archived at ASNR has tremendous research potential, including testing hypotheses in the fields of animal ecology, social behavior, epidemiology and evolutionary biology.

; ;
Publication Date:
Journal Name:
Scientific Data
Nature Publishing Group
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The functional cancer genome and proteome provide rich sources of information to identify patient-specific variations in signaling pathways and activities within and across tumors; however, current analytic methods lack the ability to exploit the diverse and multi-layered architecture of these complex biological networks. We assessed pan-cancer pathway activities for >7700 patients across 32 tumor types from The Cancer Proteome Atlas by developing a personalized cancer-specific integrated network estimation (PRECISE) model. PRECISE is a general Bayesian framework for integrating existing interaction databases, data-drivende novocausal structures, and upstream molecular profiling data to estimate cancer-specific integrated networks, infer patient-specific networks and elicit interpretable pathway-level signatures. PRECISE-based pathway signatures, can delineate pan-cancer commonalities and differences in proteomic network biology within and across tumors, demonstrates robust tumor stratification that is both biologically and clinically informative and superior prognostic power compared to existing approaches. Towards establishing the translational relevance of the functional proteome in research and clinical settings, we provide an online, publicly available, comprehensive database and visualization repository of our findings (

  2. Abstract Background

    Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations.


    Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.


    Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibratedmore »Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (

    « less
  3. Abstract Background

    Identifying splice site regions is an important step in the genomic DNA sequencing pipelines of biomedical and pharmaceutical research. Within this research purview, efficient and accurate splice site detection is highly desirable, and a variety of computational models have been developed toward this end. Neural network architectures have recently been shown to outperform classical machine learning approaches for the task of splice site prediction. Despite these advances, there is still considerable potential for improvement, especially regarding model prediction accuracy, and error rate.


    Given these deficits, we propose EnsembleSplice, an ensemble learning architecture made up of four (4) distinct convolutional neural networks (CNN) model architecture combination that outperform existing splice site detection methods in the experimental evaluation metrics considered including the accuracies and error rates. We trained and tested a variety of ensembles made up of CNNs and DNNs using the five-fold cross-validation method to identify the model that performed the best across the evaluation and diversity metrics. As a result, we developed our diverse and highly effective splice site (SS) detection model, which we evaluated using two (2) genomicHomo sapiensdatasets and theArabidopsis thalianadataset. The results showed that for of theHomo sapiensEnsembleSplice achieved accuracies of 94.16% for one of themore »acceptor splice sites and 95.97% for donor splice sites, with an error rate for the sameHomo sapiensdataset, 4.03% for the donor splice sites and 5.84% for theacceptor splice sites datasets.


    Our five-fold cross validation ensured the prediction accuracy of our models are consistent. For reproducibility, all the datasets used, models generated, and results in our work are publicly available in our GitHub repository here:

    « less
  4. Abstract Background

    CRISPR-Cas (clustered regularly interspaced short palindromic repeats—CRISPR-associated proteins) systems are adaptive immune systems commonly found in prokaryotes that provide sequence-specific defense against invading mobile genetic elements (MGEs). The memory of these immunological encounters are stored in CRISPR arrays, where spacer sequences record the identity and history of past invaders. Analyzing such CRISPR arrays provide insights into the dynamics of CRISPR-Cas systems and the adaptation of their host bacteria to rapidly changing environments such as the human gut.


    In this study, we utilized 601 publicly availableBacteroides fragilisgenome isolates from 12 healthy individuals, 6 of which include longitudinal observations, and 222 availableB. fragilisreference genomes to update the understanding ofB. fragilisCRISPR-Cas dynamics and their differential activities. Analysis of longitudinal genomic data showed that some CRISPR array structures remained relatively stable over time whereas others involved radical spacer acquisition during some periods, and diverse CRISPR arrays (associated with multiple isolates) co-existed in the same individuals with some persisted over time. Furthermore, features of CRISPR adaptation, evolution, and microdynamics were highlighted through an analysis of host-MGE network, such as modules of multiple MGEs and hosts, reflecting complex interactions betweenB. fragilisand its invaders mediated through the CRISPR-Cas systems.


    We made available of all annotated CRISPR-Casmore »systems and their target MGEs, and their interaction network as a web resource at We anticipate it will become an important resource for studying ofB. fragilis, its CRISPR-Cas systems, and its interaction with mobile genetic elements providing insights into evolutionary dynamics that may shape the species virulence and lead to its pathogenicity.

    « less
  5. Abstract

    This paper describes a set of Near-Real-Time (NRT) Vegetation Index (VI) data products for the Conterminous United States (CONUS) based on Moderate Resolution Imaging Spectroradiometer (MODIS) data from Land, Atmosphere Near-real-time Capability for EOS (LANCE), an openly accessible NASA NRT Earth observation data repository. The data set offers a variety of commonly used VIs, including Normalized Difference Vegetation Index (NDVI), Vegetation Condition Index (VCI), Mean-referenced Vegetation Condition Index (MVCI), Ratio to Median Vegetation Condition Index (RMVCI), and Ratio to previous-year Vegetation Condition Index (RVCI). LANCE enables the NRT monitoring of U.S. cropland vegetation conditions within 24 hours of observation. With more than 20 years of observations, this continuous data set enables geospatial time series analysis and change detection in many research fields such as agricultural monitoring, natural resource conservation, environmental modeling, and Earth system science. The complete set of VI data products described in the paper is openly distributed via Web Map Service (WMS) and Web Coverage Service (WCS) as well as the VegScape web application (