NSF PAR Search | NSF Public Access Repository

GLIDER: function prediction from GLIDE-based neighborhoods

https://doi.org/10.1093/bioinformatics/btac322

Devkota, Kapil; Schmidt, Henri; Werenski, Matt; Murphy, James M.; Erden, Mert; Arsenescu, Victor; Cowen, Lenore J. (May 2022, Bioinformatics)
Valencia, Alfonso (Ed.)

Abstract Motivation Protein function prediction, based on the patterns of connection in a protein–protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein–protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein–protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. Results GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein–protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein–protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein–protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson’s Disease GWAS genes, rediscover many genes which have known involvement in Parkinson’s disease pathways, plus suggest some new genes to study. Availability and implementation All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER.
Supplementary information Supplementary data are available at Bioinformatics online.
Neighborhood embedding and re-ranking of disease genes with ADAGIO

https://doi.org/10.1145/3535508.3545542

Erden, Mert; Gelement, Megan; Hakimjee, Sarrah; Levin, Kyla; Sidhom, Mary-Joy; Devkota, Kapil; Cowen, Lenore J. (August 2022, BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

Full Text Available
Enforcing exact physics in scientific machine learning: A data-driven exterior calculus on graphs

https://doi.org/10.1016/j.jcp.2022.110969

Trask, Nathaniel; Huang, Andy; Hu, Xiaozhe (May 2022, Journal of Computational Physics)

Full Text Available
Random-Walk Based Approximate k-Nearest Neighbors Algorithm for Diffusion State Distance

https://doi.org/10.1007/978-3-030-97549-4_1

Cowen, L.; Hu, X.; Lin, J.; Shen, Y.; Wu, K. (March 2022, Large-Scale Scientific Computing. LSSC 2021, Springer Lecture Notes in Computer Science)

Diffusion State Distance (DSD) is a data-dependent metric that compares data points using a data-driven diffusion process and provides a powerful tool for learning the underlying structure of high-dimensional data. While finding the exact nearest neighbors in the DSD metric is computationally expensive, in this paper, we propose a new random-walk based algorithm that empirically finds approximate k-nearest neighbors accurately in an efficient manner. Numerical results for real-world protein-protein interaction networks are presented to illustrate the efficiency and robustness of the proposed algorithm. The set of approximate k-nearest neighbors performs well when used to predict proteins’ functional labels.
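For readers unfamiliar with the metric, the following is a minimal sketch of the truncated (k-step) form of DSD, in which each node is described by the expected number of visits a short random walk makes to every node, and two nodes are compared by the L1 distance between those profiles. It shows the exact, brute-force neighbor search that the abstract describes as computationally expensive, not the approximate random-walk algorithm the paper proposes. The toy network `toy_ppi` and all function names are hypothetical, and the graph is assumed to be undirected with no isolated nodes.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]

# Hypothetical toy network (undirected adjacency lists, no isolated nodes).
toy_ppi: Graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}


def hitting_profile(graph: Graph, start: str, steps: int) -> Dict[str, float]:
    """Expected visits to each node by a `steps`-step random walk from `start`.

    The starting position (step 0) counts as one visit.
    """
    dist = {node: 0.0 for node in graph}
    dist[start] = 1.0
    visits = dict(dist)
    for _ in range(steps):
        nxt = {node: 0.0 for node in graph}
        for node, prob in dist.items():
            if prob == 0.0:
                continue
            share = prob / len(graph[node])  # uniform step to a neighbor
            for nbr in graph[node]:
                nxt[nbr] += share
        dist = nxt
        for node in graph:
            visits[node] += dist[node]
    return visits


def dsd(graph: Graph, u: str, v: str, steps: int = 5) -> float:
    """L1 distance between the visit profiles of u and v."""
    hu = hitting_profile(graph, u, steps)
    hv = hitting_profile(graph, v, steps)
    return sum(abs(hu[node] - hv[node]) for node in graph)


def nearest_neighbors(graph: Graph, u: str, k: int, steps: int = 5) -> List[str]:
    """Brute-force k nearest neighbors of u under the truncated DSD."""
    others = [node for node in graph if node != u]
    return sorted(others, key=lambda n: (dsd(graph, u, n, steps), n))[:k]


print(nearest_neighbors(toy_ppi, "a", k=2))
```

The brute-force search recomputes a full visit profile for every candidate pair, which is what makes exact neighbor search costly on large networks and motivates an approximate method.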
Full Text Available
MUNDO: protein function prediction embedded in a multispecies world

https://doi.org/10.1093/bioadv/vbab025

Arsenescu, Victor; Devkota, Kapil; Erden, Mert; Shpilker, Polina; Werenski, Matthew; Cowen, Lenore J (January 2022, Bioinformatics Advances)
Mulder, Nicola (Ed.)
Abstract Motivation Leveraging cross-species information in protein function prediction can add significant power to network-based protein function prediction methods, because so much functional information is conserved across at least close scales of evolution. We introduce MUNDO, a new cross-species co-embedding method that combines a single-network embedding method with a co-embedding method to predict functional annotations in a target species, leveraging also functional annotations in a model species network. Results Across a wide range of parameter choices, MUNDO performs best at predicting annotations in the mouse network, when trained on mouse and human protein–protein interaction (PPI) networks, in the human network, when trained on human and mouse PPIs, and in Baker’s yeast, when trained on Fission and Baker’s yeast, as compared to competitor methods. MUNDO also outperforms all the cross-species methods when predicting in Fission yeast when trained on Fission and Baker’s yeast; however, in this single case, discarding the information from the other species and using annotations from the Fission yeast network alone usually performs best. Availability and implementation All code is available and can be accessed here: github.com/v0rtex20k/MUNDO. Supplementary information Supplementary data are available at Bioinformatics Advances online. Additional experimental results are on our github site.
Full Text Available
Majority Vote Cascading: A Semi-Supervised Framework for Improving Protein Function Prediction

https://doi.org/10.1109/TCBB.2021.3059812

Lazarsfeld, John; Rodriguez, Jonathan; Erden, Mert; Liu, Yuelin; Cowen, Lenore J. (February 2021, IEEE/ACM Transactions on Computational Biology and Bioinformatics)

A method to improve protein function prediction for sparsely annotated PPI networks is introduced. The method extends the DSD majority vote algorithm introduced by Cao et al. to give confidence scores on predicted labels and to use predictions of high confidence to predict the labels of other nodes in subsequent rounds. We call this a majority vote cascade. Several cascade variants are tested in a stringent cross-validation experiment on PPI networks from S. cerevisiae and D. melanogaster, and we show that for many different settings with several alternative confidence functions, cascading improves the accuracy of the predictions. A list of the most confident new label predictions in the two networks is also reported. Code and networks for the cross-validation experiments appear at http://bcb.cs.tufts.edu/cascade.
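The cascade idea in this abstract can be illustrated with a short sketch. It is not the authors' implementation: the confidence function used here (the share of a node's neighbors that vote for the winning label), the threshold, the neighbor lists, and the label name are all assumptions chosen for illustration. The paper itself evaluates several alternative confidence functions and uses DSD-based neighborhoods.

```python
from collections import Counter
from typing import Dict, List


def majority_vote_cascade(
    neighbors: Dict[str, List[str]],
    labels: Dict[str, str],
    threshold: float = 0.6,
    max_rounds: int = 10,
) -> Dict[str, str]:
    """Propagate labels in rounds, accepting only confident predictions.

    In each round, every unlabeled node takes a majority vote over its
    currently labeled neighbors. A prediction is accepted when the winning
    label's share of all neighbors reaches `threshold`. Accepted labels
    become voters in the next round.
    """
    known = dict(labels)
    for _ in range(max_rounds):
        accepted: Dict[str, str] = {}
        for node, nbrs in neighbors.items():
            if node in known:
                continue
            votes = Counter(known[n] for n in nbrs if n in known)
            if not votes:
                continue
            label, count = votes.most_common(1)[0]
            confidence = count / len(nbrs)  # hypothetical confidence function
            if confidence >= threshold:
                accepted[node] = label
        if not accepted:
            break  # nothing new was confident enough; stop cascading
        known.update(accepted)
    return known


# Hypothetical neighbor lists and seed annotations.
toy_neighbors = {
    "p1": ["p2", "p3"],
    "p2": ["p1", "p3"],
    "p3": ["p1", "p2", "p4"],
    "p4": ["p2", "p3", "p5"],
    "p5": ["p4"],
}
seed_labels = {"p1": "kinase", "p2": "kinase"}

one_round = majority_vote_cascade(toy_neighbors, seed_labels, max_rounds=1)
full = majority_vote_cascade(toy_neighbors, seed_labels)
print(sorted(one_round), sorted(full))
```

In this toy example a single round labels only `p3`, while the full cascade reaches `p4` and then `p5`, because each newly accepted label adds a voter for the next round.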
Full Text Available
A Posteriori Error Estimates for Multilevel Methods for Graph Laplacians

https://doi.org/10.1137/20M1349618

Hu, Xiaozhe; Wu, Kaiyi; Zikatanov, Ludmil T. (January 2021, SIAM Journal on Scientific Computing)

Full Text Available
Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

https://doi.org/10.1137/20M1324089

Cowen, Lenore; Devkota, Kapil; Hu, Xiaozhe; Murphy, James M.; Wu, Kaiyi (January 2021, SIAM Journal on Mathematics of Data Science)
Full Text Available
Well-Posedness and Discretization for a Class of Models for Mixed-Dimensional Problems with High-Dimensional Gap

https://doi.org/10.1137/20M1362541

Hodneland, Erlend; Hu, Xiaozhe; Nordbotten, Jan M. (January 2021, SIAM Journal on Applied Mathematics)

Full Text Available
GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks

https://doi.org/10.1093/bioinformatics/btaa459

Devkota, Kapil; Murphy, James M; Cowen, Lenore J (July 2020, Bioinformatics)

Abstract Motivation One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interaction networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross-validation experiments. Results We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature. Availability and implementation GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide. Supplementary information Supplementary data are available at Bioinformatics online.
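The general idea of scoring candidate links with both a local and a global measure can be sketched as follows. This is not GLIDE's scoring formula, which the abstract does not give: the local measure (common-neighbor count), the global measure (inverse Euclidean distance between embedding vectors), the additive combination, the `weight` parameter, and the toy coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple


def common_neighbors(adj: Dict[str, Set[str]], u: str, v: str) -> int:
    """Local score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def global_similarity(emb: Dict[str, Tuple[float, ...]], u: str, v: str) -> float:
    """Global score: closer embedding vectors give a value nearer to 1."""
    return 1.0 / (1.0 + math.dist(emb[u], emb[v]))


def rank_missing_links(
    adj: Dict[str, Set[str]],
    emb: Dict[str, Tuple[float, ...]],
    weight: float = 1.0,
) -> List[Tuple[str, str, float]]:
    """Rank all non-adjacent pairs by local score plus weighted global score."""
    scored = []
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # already an observed link
        score = common_neighbors(adj, u, v) + weight * global_similarity(emb, u, v)
        scored.append((u, v, score))
    return sorted(scored, key=lambda t: (-t[2], t[0], t[1]))


# Hypothetical toy network and made-up 2-D embedding coordinates.
toy_adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}
toy_emb = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.5, 1.0), "d": (2.0, 0.0)}

ranked = rank_missing_links(toy_adj, toy_emb)
print(ranked)
```

Here both candidate pairs share exactly one neighbor, so the local score ties and the embedding distance decides the ranking. That mirrors, in a very simplified way, the abstract's point that a global measure adds value where local structure alone is not decisive.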
Full Text Available
