Search for: All records

Creators/Authors contains: "Krishnan, Arjun"

  1. We study first-passage percolation through related optimization problems over paths of restricted length. The path length variable is in duality with a shift of the weights. This puts into a convex duality framework old observations about the convergence of the normalized Euclidean length of geodesics due to Hammersley and Welsh, Smythe and Wierman, and Kesten, and leads to new results about geodesic length and the regularity of the shape function as a function of the weight shift. For points far enough away from the origin, the ratio of the geodesic length and the $\ell^1$ distance to the endpoint is uniformly bounded away from one. The shape function is a strictly concave function of the weight shift. Atoms of the weight distribution generate singularities, that is, points of nondifferentiability, in this function. We generalize to all distributions, directions and dimensions an old singularity result of Steele and Zhang for the planar Bernoulli case. When the weight distribution has two or more atoms, a dense set of shifts produces singularities. The results come from a combination of the convex duality, the shape theorems of the different first-passage optimization problems, and modification arguments.

    Free, publicly-accessible full text available May 18, 2024
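
To make the objects in the first abstract concrete, here is a minimal numerical sketch, not the authors' construction. It assumes a finite n x n box of the square lattice with i.i.d. uniform edge weights, and the helper names `sample_weights` and `fpp_geodesic` are hypothetical. Dijkstra's algorithm returns the passage time to a target together with the number of edges on a shortest geodesic, after adding a constant shift to every edge weight. In this finite setting the passage time is a minimum of functions that are affine in the shift, so it is concave in the shift.

```python
import heapq
import random


def sample_weights(n, seed):
    """Draw one fixed environment: a weight in [0, 1) for every edge of the n x n box."""
    rng = random.Random(seed)
    weights = {}
    for x in range(n):
        for y in range(n):
            if x + 1 < n:
                weights[((x, y), (x + 1, y))] = rng.random()
            if y + 1 < n:
                weights[((x, y), (x, y + 1))] = rng.random()
    return weights


def fpp_geodesic(weights, n, target, shift=0.0):
    """Return (passage_time, edge_count) from (0, 0) to target with shifted weights.

    Costs are compared lexicographically, so ties in passage time are broken
    in favour of the path with fewer edges.
    """
    if any(w + shift < 0 for w in weights.values()):
        raise ValueError("Dijkstra needs nonnegative shifted weights")
    source = (0, 0)
    best = {source: (0.0, 0)}
    heap = [(0.0, 0, source)]
    while heap:
        t, k, v = heapq.heappop(heap)
        if best.get(v) != (t, k):
            continue  # stale heap entry
        if v == target:
            return t, k
        x, y = v
        for u in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= u[0] < n and 0 <= u[1] < n):
                continue
            edge = (min(u, v), max(u, v))
            cand = (t + weights[edge] + shift, k + 1)
            if u not in best or cand < best[u]:
                best[u] = cand
                heapq.heappush(heap, (cand[0], cand[1], u))
    raise ValueError("target is outside the box")
```

Calling `fpp_geodesic` on the same environment with several shifts gives a piecewise linear, concave curve of passage times; the edge count at each shift is the slope of the affine piece that attains the minimum there.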
  2. Abstract

    Motivation

    Accurately representing biological networks in a low-dimensional space, also known as network embedding, is a critical step in network-based machine learning and is carried out widely using node2vec, an unsupervised method based on biased random walks. However, while many networks, including functional gene interaction networks, are dense, weighted graphs, node2vec is fundamentally limited in its ability to use edge weights during the biased random walk generation process, thus under-using all the information in the network.


    Results

    Here, we present node2vec+, a natural extension of node2vec that accounts for edge weights when calculating walk biases and reduces to node2vec in the cases of unweighted graphs or unbiased walks. Using two synthetic datasets, we empirically show that node2vec+ is more robust to additive noise than node2vec in weighted graphs. Then, using genome-scale functional gene networks to solve a wide range of gene function and disease prediction tasks, we demonstrate the superior performance of node2vec+ over node2vec in the case of weighted graphs. Notably, due to the limited amount of training data in the gene classification tasks, graph neural networks such as GCN and GraphSAGE are outperformed by both node2vec and node2vec+.

    Availability and implementation

    The data and code are available on GitHub at All additional data underlying this article are available on Zenodo at

    Supplementary information

    Supplementary data are available at Bioinformatics online.
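
As a rough illustration of the biased random walk mentioned in this abstract, the sketch below computes one-step transition probabilities for standard node2vec (not node2vec+), using the usual return parameter p and in-out parameter q. The function name `node2vec_step_probs` and the dict-of-dicts graph layout are assumptions made for this example. It shows one way the limitation can surface: in a fully connected weighted graph, every candidate other than the previous node is also a neighbor of the previous node, so q never enters the calculation. How node2vec+ adjusts the bias is not reproduced here.

```python
def node2vec_step_probs(graph, prev, cur, p=1.0, q=1.0):
    """Second-order transition probabilities of a standard node2vec walk.

    graph: dict mapping node -> dict of neighbor -> edge weight (undirected).
    prev:  node visited just before cur.
    The bias only asks whether an edge to prev exists, never how heavy it is.
    """
    unnormalized = {}
    for nxt, weight in graph[cur].items():
        if nxt == prev:
            bias = 1.0 / p  # step back to the previous node
        elif nxt in graph[prev]:
            bias = 1.0      # nxt is also adjacent to prev
        else:
            bias = 1.0 / q  # nxt moves away from prev
        unnormalized[nxt] = bias * weight
    total = sum(unnormalized.values())
    return {node: value / total for node, value in unnormalized.items()}
```

On a path graph a-b-c with unit weights, a walk that arrived at b from a with p = 1 and q = 2 returns to a with probability 2/3 and continues to c with probability 1/3. On a complete graph the result is the same for every value of q.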

  3. Abstract

    There are currently >1.3 million human -omics samples that are publicly available. This valuable resource remains acutely underused because discovering particular samples from this ever-growing data collection remains a significant challenge. The major impediment is that sample attributes are routinely described using varied terminologies written in unstructured natural language. We propose a natural-language-processing-based machine learning approach (NLP-ML) to infer tissue and cell-type annotations for genomics samples based only on their free-text metadata. NLP-ML works by creating numerical representations of sample descriptions and using these representations as features in a supervised learning classifier that predicts tissue/cell-type terms. Our approach significantly outperforms an advanced graph-based reasoning annotation method (MetaSRA) and a baseline exact string matching method (TAGGER). Model similarities between related tissues demonstrate that NLP-ML models capture biologically meaningful signals in text. Additionally, these models correctly classify tissue-associated biological processes and diseases based on their text descriptions alone. NLP-ML models are nearly as accurate as models based on gene-expression profiles in predicting sample tissue annotations but have the distinct capability to classify samples irrespective of the genomics experiment type based on their text metadata. Python NLP-ML prediction code and trained tissue models are available at
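
The general pattern described in this abstract (turn free-text sample descriptions into numerical features, then train a supervised classifier on them) can be sketched as follows. This is a deliberately simple stand-in, not the NLP-ML code: it assumes bag-of-words counts as the representation and a nearest-centroid rule with cosine similarity as the classifier, and every function name and toy description below is hypothetical.

```python
from collections import Counter
import math
import re


def featurize(text):
    """Bag-of-words counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Sum the feature vectors of each label into one centroid per label."""
    centroids = {}
    for text, label in samples:
        centroids.setdefault(label, Counter()).update(featurize(text))
    return centroids


def predict(centroids, text):
    """Return the label whose centroid is most similar to the description."""
    features = featurize(text)
    return max(centroids, key=lambda label: cosine(features, centroids[label]))


# Toy, made-up descriptions used only to exercise the functions.
toy_samples = [
    ("hepatocytes isolated from adult liver", "liver"),
    ("liver biopsy, healthy donor", "liver"),
    ("whole blood, peripheral leukocytes", "blood"),
    ("peripheral blood mononuclear cells", "blood"),
]
toy_model = train(toy_samples)
```

A call such as `predict(toy_model, "liver tissue biopsy")` then returns the label of the closest centroid. A realistic pipeline would replace both the features and the classifier with stronger components.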
