skip to main content


This content will become publicly available on June 4, 2024

Title: Higher-Order Link Prediction Via Learnable Maximum Mean Discrepancy
Higher-order link prediction (HOLP) seeks missing links capturing dependencies among three or more network nodes. Predicting high-order links (HOLs) can for instance reveal hyperlinks in the structure of drug substance and metabolic networks. Existing methods either make restrictive assumptions regarding the emergence of HOLs, or, they rely on reduced dimensionality models of limited expressiveness. To overcome these limitations, the HOLP approach developed here leverages distribution similarities across embeddings as captured by a learnable probability metric. The intuition underpinning the novel approach is that sets of nodes whose embeddings are less similar in distribution, are less likely to be connected by a HOL. Specifically, nonlinear dimensionality reduction is effected through a Gaussian process latent variable model that yields nodal embeddings, and also learns a data-driven similarity function (kernel). This kernel forms the core of a maximum mean discrepancy probability metric. Tests on benchmark datasets illustrate the potential of the proposed approach.  more » « less
Award ID(s):
2220292 2212318 2126052 1901134
NSF-PAR ID:
10424903
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Conference on Acoustics, Speech, and Signal Processing
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this thesis we propose novel estimation techniques for localization and planning problems, which are key challenges in long-term autonomy. We take inspiration in our methods from non-parametric estimation and use tools such as kernel density estimation, non-linear least-squares optimization, binary masking, and random sampling. We show that these methods, by avoiding explicit parametric models, outperform existing methods that use them. Despite the seeming differences between localization and planning, we demonstrate in this thesis that the problems share core structural similarities. When real or simulation-sampled measurements are expensive, noisy, or high variance, non-parametric estimation techniques give higher-quality results in less time. We first address two localization problems. In order to permit localization with a set of ad hoc-placed radios, we propose an ultra-wideband (UWB) graph realization system to localize the radios. Our system achieves high accuracy and robustness by using kernel density estimation for measurement probability densities, by explicitly modeling antenna delays, and by optimizing this combination with a non-linear least squares formulation. Next, in order to then support robotic navigation, we present a flexible system for simultaneous localization and mapping (SLAM) that combines elements from both traditional dense metric SLAM and topological SLAM, using a binary "masking function" to focus attention. This masking function controls which lidar scans are available for loop closures. We provide several masking functions based on approximate topological class detectors. We then examine planning problems in the final chapter and in the appendix. In order to plan with uncertainty around multiple dynamic agents, we describe Monte-Carlo Policy-Tree Decision Making (MCPTDM), a framework for efficiently computing policies in partially-observable, stochastic, continuous problems. MCPTDM composes a sequence of simpler closed-loop policies and uses marginal action costs and particle repetition to improve cost estimates and sample efficiency by reducing variance. Finally, in the appendix we explore Learned Similarity Monte-Carlo Planning (LSMCP), where we seek to enhance the sample efficiency of partially observable Monte Carlo tree search-based planning by taking advantage of similarities in the final outcomes of similar states and actions. We train a multilayer perceptron to learn a similarity function which we then use to enhance value estimates in the planning. Collectively, we show in this thesis that non-parametric methods promote long-term autonomy by reducing error and increasing robustness across multiple domains. 
    more » « less
  2. Fu, Feng (Ed.)
    Network science has increasingly become central to the field of epidemiology and our ability to respond to infectious disease threats. However, many networks derived from modern datasets are not just large, but dense, with a high ratio of edges to nodes. This includes human mobility networks where most locations have a large number of links to many other locations. Simulating large-scale epidemics requires substantial computational resources and in many cases is practically infeasible. One way to reduce the computational cost of simulating epidemics on these networks is sparsification , where a representative subset of edges is selected based on some measure of their importance. We test several sparsification strategies, ranging from naive thresholding to random sampling of edges, on mobility data from the U.S. Following recent work in computer science, we find that the most accurate approach uses the effective resistances of edges, which prioritizes edges that are the only efficient way to travel between their endpoints. The resulting sparse network preserves many aspects of the behavior of an SIR model, including both global quantities, like the epidemic size, and local details of stochastic events, including the probability each node becomes infected and its distribution of arrival times. This holds even when the sparse network preserves fewer than 10% of the edges of the original network. In addition to its practical utility, this method helps illuminate which links of a weighted, undirected network are most important to disease spread. 
    more » « less
  3. Knowledge graph embeddings (KGE) have been extensively studied to embed large-scale relational data for many real-world applications. Existing methods have long ignored the fact many KGs contain two fundamentally different views: high-level ontology-view concepts and fine-grained instance-view entities. They usually embed all nodes as vectors in one latent space. However, a single geometric representation fails to capture the structural differences between two views and lacks probabilistic semantics towards concepts’ granularity. We propose Concept2Box, a novel approach that jointly embeds the two views of a KG using dual geometric representations. We model concepts with box embeddings, which learn the hierarchy structure and complex relations such as overlap and disjoint among them. Box volumes can be interpreted as concepts’ granularity. Different from concepts, we model entities as vectors. To bridge the gap between concept box embeddings and entity vector embeddings, we propose a novel vector-to-box distance metric and learn both embeddings jointly. Experiments on both the public DBpedia KG and a newly-created industrial KG showed the effectiveness of Concept2Box. 
    more » « less
  4. This paper addresses the problem of characterizing statistical distributions of cellular shape populations using shape samples from microscopy image data. This problem is challenging because of the nonlinearity and high-dimensionality of shape manifolds. The paper develops an efficient, nonparametric approach using ideas from k-modal mixtures and kernel estimators. It uses elastic shape analysis of cell boundaries to estimate statistical modes and clusters given shapes around those modes. (Notably, it uses a combination of modal distributions and ANOVA to determine k automatically.) A population is then characterized as k-modal mixture relative to this estimated clustering and a chosen kernel (e.g., a Gaussian or a flat kernel). One can compare and analyze populations using the Fisher-Rao metric between their estimated distributions. We demonstrate this approach for classifying shapes associated with migrations of entamoeba histolytica under different experimental conditions. This framework remarkably captures salient shape patterns and separates shape data for different experimental settings, even when it is difficult to discern class differences visually. 
    more » « less
  5. Semiflexible slender filaments are ubiquitous in nature and cell biology, including in the cytoskeleton, where reorganization of actin filaments allows the cell to move and divide. Most methods for simulating semiflexible inextensible fibers/polymers are based on discrete (bead-link or blob-link) models, which become prohibitively expensive in the slender limit when hydrodynamics is accounted for. In this paper, we develop a novel coarse-grained approach for simulating fluctuating slender filaments with hydrodynamic interactions. Our approach is tailored to relatively stiff fibers whose persistence length is comparable to or larger than their length and is based on three major contributions. First, we discretize the filament centerline using a coarse non-uniform Chebyshev grid, on which we formulate a discrete constrained Gibbs–Boltzmann (GB) equilibrium distribution and overdamped Langevin equation for the evolution of unit-length tangent vectors. Second, we define the hydrodynamic mobility at each point on the filament as an integral of the Rotne–Prager–Yamakawa kernel along the centerline and apply a spectrally accurate “slender-body” quadrature to accurately resolve the hydrodynamics. Third, we propose a novel midpoint temporal integrator, which can correctly capture the Ito drift terms that arise in the overdamped Langevin equation. For two separate examples, we verify that the equilibrium distribution for the Chebyshev grid is a good approximation of the blob-link one and that our temporal integrator for overdamped Langevin dynamics samples the equilibrium GB distribution for sufficiently small time step sizes. We also study the dynamics of relaxation of an initially straight filament and find that as few as 12 Chebyshev nodes provide a good approximation to the dynamics while allowing a time step size two orders of magnitude larger than a resolved blob-link simulation. We conclude by applying our approach to a suspension of cross-linked semiflexible fibers (neglecting hydrodynamic interactions between fibers), where we study how semiflexible fluctuations affect bundling dynamics. We find that semiflexible filaments bundle faster than rigid filaments even when the persistence length is large, but show that semiflexible bending fluctuations only further accelerate agglomeration when the persistence length and fiber length are of the same order. 
    more » « less