Title: The Gromov–Wasserstein distance between networks and stable network invariants
Abstract: We define a metric, the network Gromov–Wasserstein distance, on weighted, directed networks that is sensitive to the presence of outliers. In addition to proving its theoretical properties, we supply network invariants based on optimal transport that approximate this distance by means of lower bounds. We test these methods on a range of simulated network datasets and on a dataset of real-world global bilateral migration. For our simulations, we define a network generative model based on the stochastic block model. This may be of independent interest for benchmarking purposes.
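The abstract does not state the formula. As a rough orientation only, and assuming finite networks written as $(X, \omega_X, \mu_X)$ and $(Y, \omega_Y, \mu_Y)$ (a node set, an edge-weight function, and a probability measure on the nodes; this notation is chosen here for illustration), a Gromov–Wasserstein-type distance is commonly written as a minimization over couplings $\mu$ of $\mu_X$ and $\mu_Y$:

```latex
\[
  d_{\mathrm{GW},p}(X, Y)
  = \frac{1}{2} \min_{\mu \in \mathcal{C}(\mu_X, \mu_Y)}
    \left(
      \sum_{x, x' \in X} \sum_{y, y' \in Y}
      \bigl| \omega_X(x, x') - \omega_Y(y, y') \bigr|^{p}
      \, \mu(x, y) \, \mu(x', y')
    \right)^{1/p},
  \qquad 1 \le p < \infty .
\]
```

Here $\mathcal{C}(\mu_X, \mu_Y)$ denotes the set of joint distributions on $X \times Y$ with marginals $\mu_X$ and $\mu_Y$. The normalizing constant and other details vary between references, so the published article remains the authority for the exact definition and for the lower-bound invariants.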
Award ID(s):
1723003 1740761
PAR ID:
10125348
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
8
Issue:
4
ISSN:
2049-8764
Page Range / eLocation ID:
p. 757-787
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract: Non‐perennial streams are receiving increased attention from researchers; however, suitable methods for measuring their hydrologic connectivity remain scarce. To address this deficiency, we developed Bayesian statistical approaches for measuring both average active stream length and a new metric called average communication distance. Average communication distance is a theoretical increased effective distance that stream‐borne materials must travel, given non‐continuous streamflow. Because it is the product of the inverse probability of surface water presence and stream length, the average communication distance of a non‐perennial stream segment will be greater than its actual physical length. As an application, we considered Murphy Creek, a simple non‐perennial stream network in southwestern Idaho, USA. We used surface water presence/absence data obtained in 2019, and priors for the probability of surface water, based on predictions from an existing regional United States Geological Survey model. Average communication distance posterior distributions revealed locations where effective stream lengths increased dramatically due to flow rarity. We also found strong seasonal (spring, summer, fall) differences in network‐level posterior distributions of both average stream length and average communication distance. Our work demonstrates the unique perspectives concerning network drying provided by communication distance, and shows the general usefulness of Bayesian approaches in the analysis of non‐perennial streams.
  2. Abstract: Threading a query protein sequence onto a library of weakly homologous structural templates remains challenging, even when sequence‐based predicted contact or distance information is used. Contact‐assisted or distance‐assisted threading methods utilize only the spatial proximity of the interacting residue pairs for template selection and alignment, ignoring their orientation. Moreover, existing threading methods fail to consider the neighborhood effect induced by the query–template alignment. We present a new distance‐ and orientation‐based covariational threading method called DisCovER by effectively integrating information from inter‐residue distance and orientation along with the topological network neighborhood of a query–template alignment. Our method first selects a subset of templates using standard profile‐based threading coupled with topological network similarity terms to account for the neighborhood effect and subsequently performs distance‐ and orientation‐based query–template alignment using an iterative double dynamic programming framework. Multiple large‐scale benchmarking results on query proteins classified as weakly homologous from the continuous automated model evaluation experiment and from the current literature show that our method outperforms several existing state‐of‐the‐art threading approaches, and that the integration of the neighborhood effect with the inter‐residue distance and orientation information synergistically contributes to the improved performance of DisCovER. DisCovER is freely available at https://github.com/Bhattacharya-Lab/DisCovER.
  3. Ouangraoua, Aida (Ed.)
    Abstract: Scientists worldwide are making massive efforts to understand how the biodiversity that we see on Earth evolved from single-cell organisms at the origin of life; this diversification process is represented through the Tree of Life. Low sampling rates and high heterogeneity in the rate of evolution across sites and lineages produce a phenomenon denoted “long branch attraction” (LBA), in which long non-sister lineages are estimated to be sisters regardless of their true evolutionary relationship. LBA has been a pervasive problem in phylogenetic inference, affecting different types of methodologies from distance-based to likelihood-based. Here, we present a novel neural network model that outperforms standard phylogenetic methods and other neural network implementations under LBA settings. Furthermore, unlike existing neural network models in phylogenetics, our model naturally accounts for tree isomorphisms via permutation-invariant functions, which ultimately results in lower memory use and allows seamless extension to larger trees.
  4. Species distribution and ecological niche models (hereafter SDMs) are popular tools with broad applications in ecology, biodiversity conservation, and environmental science. Many SDM applications require projecting models in environmental conditions non‐analog to those used for model training (extrapolation), giving predictions that may be statistically unsupported and biologically meaningless. We introduce a novel method, Shape, a model‐agnostic approach that calculates the extrapolation degree for a given projection data point by its multivariate distance to the nearest training data point. Such distances are relativized by a factor that reflects the dispersion of the training data in environmental space. Distinct from other approaches, Shape incorporates an adjustable threshold to control the binary discrimination between acceptable and unacceptable extrapolation degrees. We compared Shape's performance to five extrapolation metrics based on their ability to detect analog environmental conditions in environmental space and improve SDM suitability predictions. To do so, we used 760 virtual species to define different modeling conditions determined by species niche tolerance, distribution equilibrium condition, sample size, and algorithm. All algorithms had trouble predicting species niches. However, we found a substantial improvement in model predictions when model projections were truncated independently of extrapolation metrics. Shape's performance was dependent on the extrapolation threshold used to truncate models. Because of this versatility, our approach showed similar or better performance than the previous approaches and could better deal with all modeling conditions and algorithms. Our extrapolation metric is simple to interpret, captures the complex shapes of the data in environmental space, and can use any extrapolation threshold to define whether model predictions are retained based on the extrapolation degrees. These properties make this approach more broadly applicable than existing methods for creating and applying SDMs. We hope this method and accompanying tools support modelers to explore, detect, and reduce extrapolation errors to achieve more reliable models.
     Keywords: environmental novelty, extrapolation, Mahalanobis distance, model prediction, non‐analog environmental data, transferability
  5. Abstract: Despite promising advancements, closed-loop neurostimulation for drug-resistant epilepsy (DRE) still relies on manual tuning and produces variable outcomes, while automated predictable algorithms remain an aspiration. As a fundamental step towards addressing this gap, here we study predictive dynamical models of human intracranial EEG (iEEG) response under parametrically rich neurostimulation. Using data from n = 13 DRE patients, we find that stimulation-triggered switched-linear models with ~300 ms of causal historical dependence best explain evoked iEEG dynamics. These models are highly consistent across different stimulation amplitudes and frequencies, allowing for learning a generalizable model from abundant STIM OFF and limited STIM ON data. Further, evoked iEEG in nearly all subjects exhibited a distance-dependent pattern, whereby stimulation directly impacts the actuation site and nearby regions (≲ 20 mm), affects medium-distance regions (20 to 100 mm) through network interactions, and hardly reaches more distal areas (≳ 100 mm). Peak network interaction occurs at 60 to 80 mm from the stimulation site. Due to their predictive accuracy and mechanistic interpretability, these models hold significant potential for model-based seizure forecasting and closed-loop neurostimulation design.
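The first related abstract above describes average communication distance as the product of the inverse probability of surface water presence and stream length. A minimal worked form of that statement follows; the symbols $L$ (physical segment length), $p$ (probability of surface water presence), and $d_c$ (communication distance) are chosen here for illustration, and the numbers are hypothetical, not data from the study:

```latex
\[
  d_c = \frac{1}{p} \, L , \qquad 0 < p \le 1
  \quad \Longrightarrow \quad d_c \ge L ,
\]
\[
  \text{e.g. } L = 100~\text{m},\; p = 0.25
  \;\Longrightarrow\; d_c = \frac{100~\text{m}}{0.25} = 400~\text{m} .
\]
```

Equality holds only when $p = 1$, that is, for a segment that always carries surface water, which is why a non-perennial segment has a communication distance greater than its physical length.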