Title: EPIC-CoGe: managing and analyzing genomic data
Abstract
Summary
The EPIC-CoGe browser is a web-based genome visualization utility that integrates the GMOD JBrowse genome browser with the extensive CoGe genome database (currently containing over 30 000 genomes). It adds many features to basic JBrowse, including enhanced search capability and on-the-fly analyses for comparing all types of functional and diversity genomics data. No installation is required, and data (genome, annotation, functional genomic and diversity data) can be loaded by following a simple point-and-click wizard or by using a REST API, making the browser widely accessible and easy to use for researchers of all computational skill levels. In addition, EPIC-CoGe and its data tracks are easily embedded in other websites and JBrowse instances.
Availability and implementation
The EPIC-CoGe browser is freely available for use online through CoGe (https://genomevolution.org). Source code (open source, MIT license) is available at https://github.com/LyonsLab/coge.
Supplementary information
Supplementary data are available at Bioinformatics online.
Motivation
Genome browsers are an essential tool in genome analysis. Modern genome browsers enable complex and interactive visualization of a wide variety of genomic data modalities. While such browsers are very powerful, they can be challenging to configure and program for bioinformaticians lacking expertise in web development.
Results
We have developed an R package that provides an interface to the JBrowse 2 genome browser. The package can be used to configure and customize the browser entirely with R code. The browser can be deployed from the R console, or embedded in Shiny applications or R Markdown documents.
Availability and implementation
JBrowseR is available for download from CRAN, and the source code is openly available from the GitHub repository at https://github.com/GMOD/JBrowseR/.
Motivation
Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, and thus under-uses the information in the network.
Results
Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+.
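To make the limitation concrete, the following is a minimal sketch of the walk bias in classic node2vec (not node2vec+), using the return parameter p and in-out parameter q of the original method. The function name, the adjacency-dictionary representation and the toy graph are assumptions made for illustration only. The bias for a candidate next node depends only on whether an edge to the previous node exists, so a very weak edge counts the same as a strong one. With p = q = 1 the walk is unbiased and the probabilities are simply proportional to edge weight.

```python
from typing import Dict, Hashable

# Adjacency representation (an assumption for this sketch): node -> {neighbor: weight}
Graph = Dict[Hashable, Dict[Hashable, float]]


def node2vec_transition_probs(graph: Graph, prev: Hashable, cur: Hashable,
                              p: float = 1.0, q: float = 1.0) -> Dict[Hashable, float]:
    """Normalized second-order transition probabilities of classic node2vec.

    The walk has just moved prev -> cur. The bias for each candidate next node
    depends only on whether it is prev, a neighbor of prev, or neither.
    """
    unnormalized = {}
    for nxt, weight in graph[cur].items():
        if nxt == prev:              # distance 0 from prev: return step
            bias = 1.0 / p
        elif nxt in graph[prev]:     # distance 1 from prev: edge exists, weight ignored
            bias = 1.0
        else:                        # distance 2 from prev: outward step
            bias = 1.0 / q
        unnormalized[nxt] = bias * weight
    total = sum(unnormalized.values())
    return {node: value / total for node, value in unnormalized.items()}


# Toy weighted graph (hypothetical data): the a-c edge is very weak (0.01).
toy = {
    "a": {"b": 1.0, "c": 0.01},
    "b": {"a": 1.0, "c": 1.0, "d": 1.0},
    "c": {"a": 0.01, "b": 1.0},
    "d": {"b": 1.0},
}

# Walk arrived at "b" from "a"; q > 1 discourages outward steps.
biased = node2vec_transition_probs(toy, prev="a", cur="b", p=1.0, q=4.0)
# p = q = 1 gives the unbiased, weight-proportional walk.
unbiased = node2vec_transition_probs(toy, prev="a", cur="b", p=1.0, q=1.0)
```

In the biased case, node "c" is treated as fully inside the neighborhood of "a" even though the a-c edge weight is only 0.01, while "d" is penalized as an outward step. This is the kind of behavior that an edge-weight-aware bias is meant to address.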
Availability and implementation
The data and code are available on GitHub at https://github.com/krishnanlab/node2vecplus_benchmarks. All additional data underlying this article are available on Zenodo at https://doi.org/10.5281/zenodo.7007164.
Supplementary information
Supplementary data are available at Bioinformatics online.
Femenias, Martin M.; Santos, Juan C.; Sites, Jr, Jack W.; Avila, Luciano J.; Morando, Mariana; Kendziorski, Christina, ed. (Bioinformatics)
Abstract
Motivation
Transposable elements (TEs) are ubiquitous in genomes and many remain active. TEs comprise an important fraction of the transcriptomes with potential effects on the host genome, either by generating deleterious mutations or promoting evolutionary novelties. However, their functional study is limited by the difficulty in their identification and quantification, particularly in non-model organisms.
Results
We developed a new pipeline, ExplorATE (explore active transposable elements), implemented in R and bash, that quantifies active TEs in both model and non-model organisms. ExplorATE creates TE-specific indexes and uses Selective Alignment (SA) to filter out transposons co-transcribed within genes based on alignment scores. Moreover, our software incorporates Wicker-like criteria to refine a set of target TEs and avoid spurious mapping. Based on simulated and real data, we show that the SA strategy adopted by ExplorATE achieved better estimates of non-co-transcribed elements than other available alignment-based or mapping-based software. ExplorATE results showed high congruence with alignment-based tools with and without a reference genome, yet ExplorATE required less execution time. ExplorATE also expands and complements most previous TE analyses by incorporating co-transcription and multi-mapping effects during quantification, and it integrates seamlessly with other downstream tools within the R environment.
Availability and implementation
Source code is available at https://github.com/FemeniasM/ExplorATEproject and https://github.com/FemeniasM/ExplorATE_shell_script. Data available on request.
Supplementary information
Supplementary data are available at Bioinformatics online.
Motivation
Accurately predicting drug–target interactions (DTIs) in silico can guide the drug discovery process and thus facilitate drug development. Computational approaches for DTI prediction that adopt the systems biology perspective generally exploit the rationale that the properties of drugs and targets can be characterized by their functional roles in biological networks.
Results
Inspired by recent advances in information passing and aggregation techniques that generalize convolutional neural networks to mine large-scale graph data and greatly improve performance on many network-related prediction tasks, we developed a new nonlinear end-to-end learning model, called NeoDTI, that integrates diverse information from heterogeneous network data and automatically learns topology-preserving representations of drugs and targets to facilitate DTI prediction. A substantial improvement in prediction performance over other state-of-the-art DTI prediction methods, together with several novel predicted DTIs supported by evidence from previous studies, demonstrates the superior predictive power of NeoDTI. In addition, NeoDTI is robust to a wide range of hyperparameter choices and can readily integrate more drug- and target-related information (e.g. compound–protein binding affinity data). All these results suggest that NeoDTI can offer a powerful and robust tool for drug development and drug repositioning.
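The general idea of passing and aggregating information over a heterogeneous network can be sketched as below. This is a generic, simplified illustration and not the NeoDTI model: it has no learned parameters, and the function name, node names, edge types and the "mean per edge type, add to self, then ReLU" update rule are all assumptions chosen for clarity.

```python
from typing import Dict, List, Tuple

Embedding = List[float]
# Typed, directed edges: (edge_type, source, target). All names are hypothetical.
Edge = Tuple[str, str, str]


def aggregate_step(embeddings: Dict[str, Embedding],
                   edges: List[Edge]) -> Dict[str, Embedding]:
    """One aggregation round: per-edge-type neighbor mean, added to self, then ReLU."""
    dim = len(next(iter(embeddings.values())))
    # Group incoming neighbor embeddings by (target node, edge type).
    buckets: Dict[Tuple[str, str], List[Embedding]] = {}
    for edge_type, src, dst in edges:
        buckets.setdefault((dst, edge_type), []).append(embeddings[src])

    updated = {}
    for node, own in embeddings.items():
        total = list(own)  # start from the node's own embedding
        for (dst, _edge_type), neighbors in buckets.items():
            if dst != node:
                continue
            for j in range(dim):
                # Mean over neighbors reached through this edge type.
                total[j] += sum(vec[j] for vec in neighbors) / len(neighbors)
        updated[node] = [max(0.0, x) for x in total]  # ReLU nonlinearity
    return updated


# Toy network with two drugs and one target (hypothetical data).
embeddings = {"drugA": [1.0, 0.0], "drugB": [0.0, 1.0], "targetX": [0.5, -1.0]}
edges = [
    ("drug-target", "drugA", "targetX"),
    ("drug-target", "drugB", "targetX"),
    ("target-drug", "targetX", "drugA"),
]
new_embeddings = aggregate_step(embeddings, edges)
```

After one round, each node's representation mixes in information from its typed neighbors. A learned model would additionally apply trainable projections per edge type and optimize them end to end.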
Availability and implementation
The source code and data used in NeoDTI are available at: https://github.com/FangpingWan/NeoDTI.
Supplementary information
Supplementary data are available at Bioinformatics online.
Motivation
Human immunodeficiency virus type 1 (HIV-1) genome integration is closely related to clinical latency and viral rebound. In addition to human DNA sequences that directly interact with the integration machinery, the selection of HIV integration sites has also been shown to depend on the heterogeneous genomic context around a large region, which greatly hinders the prediction and mechanistic studies of HIV integration.
Results
We have developed an attention-based deep learning framework, named DeepHINT, to simultaneously provide accurate prediction of HIV integration sites and mechanistic explanations of the detected sites. Extensive tests on a high-density HIV integration site dataset showed that DeepHINT can outperform conventional modeling strategies by automatically learning the genomic context of HIV integration from primary DNA sequence alone or together with epigenetic information. Systematic analyses on diverse known factors of HIV integration further validated the biological relevance of the prediction results. More importantly, in-depth analyses of the attention values output by DeepHINT revealed intriguing mechanistic implications in the selection of HIV integration sites, including potential roles of several DNA-binding proteins. These results established DeepHINT as an effective and explainable deep learning framework for the prediction and mechanistic study of HIV integration.
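As a rough illustration of why attention values lend themselves to interpretation, the sketch below implements generic softmax attention pooling over sequence positions. It is not the DeepHINT architecture: the function name, the feature vectors and the raw scores are hypothetical stand-ins for what an upstream network would produce.

```python
import math
from typing import List, Sequence, Tuple


def attention_pool(features: Sequence[Sequence[float]],
                   scores: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Softmax-normalize per-position scores; return (weights, weighted-sum vector)."""
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(features[0])
    pooled = [sum(w * row[j] for w, row in zip(weights, features))
              for j in range(dim)]
    return weights, pooled


# Hypothetical per-position features and raw attention scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]

weights, pooled = attention_pool(features, scores)
# The position with the largest weight contributes most to the pooled vector.
top_position = max(range(len(weights)), key=weights.__getitem__)
```

Because the weights are non-negative and sum to one, they can be read as the relative contribution of each position to the prediction, which is what makes downstream inspection of high-attention positions possible.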
Availability and implementation
DeepHINT is available as open-source software and can be downloaded from https://github.com/nonnerdling/DeepHINT.
Supplementary information
Supplementary data are available at Bioinformatics online.
Nelson, Andrew D. L., Haug-Baltzell, Asher K., Davey, Sean, Gregory, Brian D., Lyons, Eric, and Hancock, ed., John. EPIC-CoGe: managing and analyzing genomic data. Bioinformatics 34.15 Web. doi:10.1093/bioinformatics/bty106.
@article{osti_10393367,
title = {EPIC-CoGe: managing and analyzing genomic data},
url = {https://par.nsf.gov/biblio/10393367},
DOI = {10.1093/bioinformatics/bty106},
journal = {Bioinformatics},
volume = {34},
number = {15},
publisher = {Oxford University Press},
author = {Nelson, Andrew D. L. and Haug-Baltzell, Asher K. and Davey, Sean and Gregory, Brian D. and Lyons, Eric and Hancock, ed., John},
}