Search for: All records

Creators/Authors contains: "Devkota, Kapil"


Total Resources: 7

Resource Type
Conference Paper: 1
Conference Proceeding: 0
Dataset: 0
Journal Article: 6
Workshop Report: 0

Availability
Full Text / Resource Available: 7
Citation Only: 0



TT3D: Leveraging precomputed protein 3D sequence models to predict protein–protein interactions

https://doi.org/10.1093/bioinformatics/btad663

Sledzieski, Samuel ; Devkota, Kapil ; Singh, Rohit ; Cowen, Lenore ; Berger, Bonnie ( October 2023 , Bioinformatics)
Elofsson, Arne (Ed.)

Abstract Motivation
High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di).
Results
We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein–protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein–protein interaction predictions across all protein pairs can be made genome-wide.
Availability and Implementation
TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.
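
The Motivation paragraph above notes that the 3Di string has the same length as the protein string, so the two can be aligned position by position. A minimal sketch of that alignment follows. It is not the TT3D implementation: the function names are hypothetical, and `STRUCT_ALPHABET` is a placeholder 21-symbol alphabet standing in for 3Di rather than Foldseek's actual definition.

```python
AA_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
# Placeholder 21-symbol alphabet standing in for 3Di (an assumption for
# illustration; take the real symbols from Foldseek).
STRUCT_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def one_hot(symbol, alphabet):
    """Return a one-hot list for `symbol`; raises ValueError if unknown."""
    vec = [0] * len(alphabet)
    vec[alphabet.index(symbol)] = 1
    return vec


def combine_features(aa_seq, struct_seq):
    """Concatenate per-residue one-hot encodings of both strings.

    Both strings must have the same length, since each structural token
    describes the residue at the same position.
    """
    if len(aa_seq) != len(struct_seq):
        raise ValueError("sequence and structural string lengths differ")
    return [
        one_hot(a, AA_ALPHABET) + one_hot(s, STRUCT_ALPHABET)
        for a, s in zip(aa_seq, struct_seq)
    ]


if __name__ == "__main__":
    features = combine_features("MKV", "DXA")
    print(len(features), len(features[0]))
```

Each residue ends up with one vector of length 20 + 21 that carries both its amino acid identity and its structural token. A real model would more likely use learned embeddings than one-hot vectors.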

Neighborhood embedding and re-ranking of disease genes with ADAGIO

https://doi.org/10.1145/3535508.3545542

Erden, Mert ; Gelement, Megan ; Hakimjee, Sarrah ; Levin, Kyla ; Sidhom, Mary-Joy ; Devkota, Kapil ; Cowen, Lenore J. ( August 2022 , BCB '22: Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics)

Full Text Available
Topsy-Turvy: integrating a global view into sequence-based PPI prediction

https://doi.org/10.1093/bioinformatics/btac258

Singh, Rohit ; Devkota, Kapil ; Sledzieski, Samuel ; Berger, Bonnie ; Cowen, Lenore ( June 2022 , Bioinformatics)

Abstract Summary
Computational methods to predict protein–protein interaction (PPI) typically segregate into sequence-based ‘bottom-up’ methods that infer properties from the characteristics of the individual protein sequences, or global ‘top-down’ methods that infer properties from the pattern of already known PPIs in the species of interest. However, a way to incorporate top-down insights into sequence-based bottom-up PPI prediction methods has been elusive. We thus introduce Topsy-Turvy, a method that newly synthesizes both views in a sequence-based, multi-scale, deep-learning model for PPI prediction. While Topsy-Turvy makes predictions using only sequence data, during the training phase it takes a transfer-learning approach by incorporating patterns from both global and molecular-level views of protein interaction. In a cross-species context, we show it achieves state-of-the-art performance, offering the ability to perform genome-scale, interpretable PPI prediction for non-model organisms with no existing experimental PPI data. In species with available experimental PPI data, we further present a Topsy-Turvy hybrid (TT-Hybrid) model which integrates Topsy-Turvy with a purely network-based model for link prediction that provides information about species-specific network rewiring. TT-Hybrid makes accurate predictions for both well- and sparsely-characterized proteins, outperforming both its constituent components as well as other state-of-the-art PPI prediction methods. Furthermore, running Topsy-Turvy and TT-Hybrid screens is feasible for whole genomes, and thus these methods scale to settings where other methods (e.g. AlphaFold-Multimer) might be infeasible. The generalizability, accuracy and genome-level scalability of Topsy-Turvy and TT-Hybrid unlocks a more comprehensive map of protein interaction and organization in both model and non-model organisms.
Availability and implementation
https://topsyturvy.csail.mit.edu.
Supplementary information
Supplementary data are available at Bioinformatics online.

MUNDO: protein function prediction embedded in a multispecies world

https://doi.org/10.1093/bioadv/vbab025

Arsenescu, Victor ; Devkota, Kapil ; Erden, Mert ; Shpilker, Polina ; Werenski, Matthew ; Cowen, Lenore J ( January 2022 , Bioinformatics Advances)
Mulder, Nicola (Ed.)
Abstract Motivation
Leveraging cross-species information in protein function prediction can add significant power to network-based protein function prediction methods, because so much functional information is conserved across at least close scales of evolution. We introduce MUNDO, a new cross-species co-embedding method that combines a single-network embedding method with a co-embedding method to predict functional annotations in a target species, leveraging also functional annotations in a model species network.
Results
Across a wide range of parameter choices, MUNDO performs best at predicting annotations in the mouse network, when trained on mouse and human protein–protein interaction (PPI) networks, in the human network, when trained on human and mouse PPIs, and in Baker’s yeast, when trained on Fission and Baker’s yeast, as compared to competitor methods. MUNDO also outperforms all the cross-species methods when predicting in Fission yeast when trained on Fission and Baker’s yeast; however, in this single case, discarding the information from the other species and using annotations from the Fission yeast network alone usually performs best.
Availability and implementation
All code is available and can be accessed here: github.com/v0rtex20k/MUNDO.
Supplementary information
Supplementary data are available at Bioinformatics Advances online. Additional experimental results are on our github site.
Full Text Available
GLIDER: function prediction from GLIDE-based neighborhoods

https://doi.org/10.1093/bioinformatics/btac322

Devkota, Kapil ; Schmidt, Henri ; Werenski, Matt ; Murphy, James M. ; Erden, Mert ; Arsenescu, Victor ; Cowen, Lenore J. ( May 2022 , Bioinformatics)
Valencia, Alfonso (Ed.)

Abstract Motivation
Protein function prediction, based on the patterns of connection in a protein–protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches use random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein–protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein–protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties.
Results
GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein–protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein–protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein–protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson’s Disease GWAS genes, rediscover many genes which have known involvement in Parkinson’s disease pathways, plus suggest some new genes to study.
Availability and implementation
All code is publicly available and can be accessed here: https://github.com/kap-devkota/GLIDER.
Supplementary information
Supplementary data are available at Bioinformatics online.

Diffusion State Distances: Multitemporal Analysis, Fast Algorithms, and Applications to Biological Networks

https://doi.org/10.1137/20M1324089

Cowen, Lenore ; Devkota, Kapil ; Hu, Xiaozhe ; Murphy, James M. ; Wu, Kaiyi ( January 2021 , SIAM Journal on Mathematics of Data Science)
Full Text Available
GLIDE: combining local methods and diffusion state embeddings to predict missing interactions in biological networks

https://doi.org/10.1093/bioinformatics/btaa459

Devkota, Kapil ; Murphy, James M ; Cowen, Lenore J ( July 2020 , Bioinformatics)

Abstract Motivation
One of the core problems in the analysis of biological networks is the link prediction problem. In particular, existing interactions networks are noisy and incomplete snapshots of the true network, with many true links missing because those interactions have not yet been experimentally observed. Methods to predict missing links have been more extensively studied for social than for biological networks; it was recently argued that there is some special structure in protein–protein interaction (PPI) network data that might mean that alternate methods may outperform the best methods for social networks. Based on a generalization of the diffusion state distance, we design a new embedding-based link prediction method called global and local integrated diffusion embedding (GLIDE). GLIDE is designed to effectively capture global network structure, combined with alternative network type-specific customized measures that capture local network structure. We test GLIDE on a collection of three recently curated human biological networks derived from the 2016 DREAM disease module identification challenge as well as a classical version of the yeast PPI network in rigorous cross validation experiments.
Results
We indeed find that different local network structure is dominant in different types of biological networks. We find that the simple local network measures are dominant in the highly connected network core between hub genes, but that GLIDE’s global embedding measure adds value in the rest of the network. For example, we make GLIDE-based link predictions from genes known to be involved in Crohn’s disease, to genes that are not known to have an association, and make some new predictions, finding support in other network data and the literature.
Availability and implementation
GLIDE can be downloaded at https://bitbucket.org/kap_devkota/glide.
Supplementary information
Supplementary data are available at Bioinformatics online.
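
The abstract above describes scoring candidate links with a local network measure combined with a global, embedding-based measure. The sketch below shows one generic way to combine the two. It is not GLIDE's actual scoring formula: the common-neighbors local measure, the `1 / (1 + distance)` global term, and all function names are assumptions chosen for illustration.

```python
import math


def common_neighbors(graph, u, v):
    """Local measure: number of neighbors shared by nodes u and v."""
    return len(graph[u] & graph[v])


def global_similarity(embedding, u, v):
    """Global measure: similarity in (0, 1] from Euclidean embedding distance."""
    return 1.0 / (1.0 + math.dist(embedding[u], embedding[v]))


def combined_score(graph, embedding, u, v):
    """Illustrative combination (an assumption, not the published formula).

    The global term never exceeds 1, so it mostly breaks ties between
    pairs with the same number of common neighbors and ranks pairs that
    share no neighbors at all.
    """
    return common_neighbors(graph, u, v) + global_similarity(embedding, u, v)


if __name__ == "__main__":
    # Toy undirected graph as adjacency sets, with made-up 2D embeddings.
    graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
    embedding = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0), "d": (3.0, 4.0)}
    print(combined_score(graph, embedding, "a", "d"))
```

In this toy combination, pairs in a dense region are ranked mainly by their shared neighbors, while pairs with no shared neighbors are ranked only by embedding proximity.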
Full Text Available