Title: A multi-species repository of social networks
Abstract

Social network analysis is an invaluable tool to understand the patterns, evolution, and consequences of sociality. Comparative studies over a range of social systems across multiple taxonomic groups are particularly valuable. Such studies, however, require quantitative social association or interaction data across multiple species, which are not easily available. We introduce the Animal Social Network Repository (ASNR) as the first multi-taxonomic repository that collates 790 social networks from more than 45 species, including those of mammals, reptiles, fish, birds, and insects. The repository was created by consolidating social network datasets from the literature on wild and captive animals into a consistent and easy-to-use network data format. The repository is archived at https://bansallab.github.io/asnr/. ASNR has tremendous research potential, including testing hypotheses in the fields of animal ecology, social behavior, epidemiology and evolutionary biology.

 
NSF-PAR ID:
10153723
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Data
Volume:
6
Issue:
1
ISSN:
2052-4463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Personalized (patient-specific) approaches have recently emerged with a precision medicine paradigm that acknowledges the fact that molecular pathway structures and activity might be considerably different within and across tumors. The functional cancer genome and proteome provide rich sources of information to identify patient-specific variations in signaling pathways and activities within and across tumors; however, current analytic methods lack the ability to exploit the diverse and multi-layered architecture of these complex biological networks. We assessed pan-cancer pathway activities for >7700 patients across 32 tumor types from The Cancer Proteome Atlas by developing a personalized cancer-specific integrated network estimation (PRECISE) model. PRECISE is a general Bayesian framework for integrating existing interaction databases, data-driven de novo causal structures, and upstream molecular profiling data to estimate cancer-specific integrated networks, infer patient-specific networks and elicit interpretable pathway-level signatures. PRECISE-based pathway signatures can delineate pan-cancer commonalities and differences in proteomic network biology within and across tumors, demonstrate robust tumor stratification that is both biologically and clinically informative, and offer superior prognostic power compared to existing approaches. Towards establishing the translational relevance of the functional proteome in research and clinical settings, we provide an online, publicly available, comprehensive database and visualization repository of our findings (https://mjha.shinyapps.io/PRECISE/).

     
  2. Abstract

    Environmental DNA (eDNA) metabarcoding is a promising method to monitor species and community diversity that is rapid, affordable and non‐invasive. The longstanding needs of the eDNA community are modular informatics tools, comprehensive and customizable reference databases, flexibility across high‐throughput sequencing platforms, fast multilocus metabarcode processing and accurate taxonomic assignment. Improvements in bioinformatics tools make addressing each of these demands within a single toolkit a reality.

    The new modular metabarcode sequence toolkit Anacapa (https://github.com/limey-bean/Anacapa/) addresses the above needs, allowing users to build comprehensive reference databases and assign taxonomy to raw multilocus metabarcode sequence data. A novel aspect of Anacapa is its database building module, “Creating Reference libraries Using eXisting tools” (CRUX), which generates comprehensive reference databases for specific user‐defined metabarcoding loci. The Quality Control and ASV Parsing module sorts and processes multiple metabarcoding loci and processes merged, unmerged and unpaired reads, maximizing recovered diversity. DADA2 then detects amplicon sequence variants (ASVs) and the Anacapa Classifier module aligns these ASVs to CRUX‐generated reference databases using Bowtie2. Lastly, taxonomy is assigned to ASVs with confidence scores using a Bayesian Lowest Common Ancestor (BLCA) method. The Anacapa Toolkit also includes an R package, ranacapa, for automated results exploration through standard biodiversity statistical analysis.

    Benchmarking tests verify that the Anacapa Toolkit effectively and efficiently generates comprehensive reference databases that capture taxonomic diversity, and can assign taxonomy to both MiSeq and HiSeq‐length sequence data. We demonstrate the value of the Anacapa Toolkit in assigning taxonomy to seawater eDNA samples collected in southern California.

    The Anacapa Toolkit improves the functionality of eDNA and streamlines biodiversity assessment and management by generating metabarcode specific databases, processing multilocus data, retaining a larger proportion of sequencing reads and expanding non‐traditional eDNA targets. All the components of the Anacapa Toolkit are open and available in a virtual container to ease installation.

     
  3. Abstract Background

    Identifying splice site regions is an important step in the genomic DNA sequencing pipelines of biomedical and pharmaceutical research. Within this research purview, efficient and accurate splice site detection is highly desirable, and a variety of computational models have been developed toward this end. Neural network architectures have recently been shown to outperform classical machine learning approaches for the task of splice site prediction. Despite these advances, there is still considerable potential for improvement, especially regarding model prediction accuracy and error rate.

    Results

    Given these deficits, we propose EnsembleSplice, an ensemble learning architecture that combines four (4) distinct convolutional neural network (CNN) model architectures and outperforms existing splice site detection methods on the experimental evaluation metrics considered, including accuracy and error rate. We trained and tested a variety of ensembles made up of CNNs and DNNs using the five-fold cross-validation method to identify the model that performed the best across the evaluation and diversity metrics. As a result, we developed our diverse and highly effective splice site (SS) detection model, which we evaluated using two (2) genomic Homo sapiens datasets and the Arabidopsis thaliana dataset. The results showed that, for one of the Homo sapiens datasets, EnsembleSplice achieved accuracies of 94.16% for acceptor splice sites and 95.97% for donor splice sites, with corresponding error rates on the same Homo sapiens dataset of 5.84% for acceptor splice sites and 4.03% for donor splice sites.

    Conclusions

    Our five-fold cross-validation ensured that the prediction accuracy of our models is consistent. For reproducibility, all the datasets used, models generated, and results in our work are publicly available in our GitHub repository here: https://github.com/OluwadareLab/EnsembleSplice

     
  4. Abstract

    There is a cross‐sectoral push among conservationists to simultaneously mitigate biodiversity loss and climate change, especially as the latter increasingly threatens the former. Growing evidence demonstrates that animals can have substantial impacts on carbon cycling. As such, there are increasing calls to use animal conservation and rewilding to dually overcome biodiversity loss and mitigate climate change.

    Specifically, trophic rewilding—which involves restoring intact animal communities, functional roles and trophic structure within food webs, and natural ecosystem processes—utilizes a rewilding framework to simultaneously support biodiversity conservation and carbon capture and storage. Trophic rewilding is a complex conservation approach to mitigating climate change, involving accurate estimations of baseline conditions and continuous monitoring of carbon cycling and species impacts within a system. It is also predicated on garnering social support for both the reintroduction and monitoring of a species, and obtaining the animals themselves.

    We are excited by the growing interest in this potential, but emphasize that a species' net impact on ecosystem carbon dynamics is context‐dependent. Caution is required whenever biodiversity conservation (including rewilding), climate change mitigation, and human welfare do not readily align. Hence—similar to other nature‐based solutions—these burgeoning efforts must avoid sweeping generalizations.

    To bolster successful trophic rewilding, we highlight a range of social and ecological context dependencies that can vary outcomes in a rewilded carbon cycle and provide ethical considerations for successful implementation.

    We conclude with an overview of the available technology to predict and monitor progress toward both biodiversity and climate mitigation goals.

    Read the free Plain Language Summary for this article on the Journal blog.

     
  5. Abstract Background

    Crop improvement through cross-population genomic prediction and genome editing requires identification of causal variants at high resolution, within fewer than hundreds of base pairs. Most genetic mapping studies have generally lacked such resolution. In contrast, evolutionary approaches can detect genetic effects at high resolution, but they are limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Here we use genomic annotations to accurately predict nucleotide conservation across angiosperms, as a proxy for fitness effect of mutations.

    Results

    Using only sequence analysis, we annotate nonsynonymous mutations in 25,824 maize gene models, with information from bioinformatics and deep learning. Our predictions are validated by experimental information: within-species conservation, chromatin accessibility, and gene expression. According to gene ontology and pathway enrichment analyses, predicted nucleotide conservation points to genes in central carbon metabolism. Importantly, it improves genomic prediction for fitness-related traits such as grain yield, in elite maize panels, by stringent prioritization of fewer than 1% of single-site variants.

    Conclusions

    Our results suggest that predicting nucleotide conservation across angiosperms may effectively prioritize sites most likely to impact fitness-related traits in crops, without being limited by shifting selection, missing data, and low depth of multiple-sequence alignments. Our approach—Prediction of mutation Impact by Calibrated Nucleotide Conservation (PICNC)—could be useful to select polymorphisms for accurate genomic prediction, and candidate mutations for efficient base editing. The trained PICNC models and predicted nucleotide conservation at protein-coding SNPs in maize are publicly available in CyVerse (https://doi.org/10.25739/hybz-2957).

     