skip to main content


Title: The Gromov–Wasserstein distance between networks and stable network invariants
Abstract

We define a metric—the network Gromov–Wasserstein distance—on weighted, directed networks that is sensitive to the presence of outliers. In addition to proving its theoretical properties, we supply network invariants based on optimal transport that approximate this distance by means of lower bounds. We test these methods on a range of simulated network datasets and on a dataset of real-world global bilateral migration. For our simulations, we define a network generative model based on the stochastic block model. This may be of independent interest for benchmarking purposes.

 
more » « less
Award ID(s):
1723003 1740761
NSF-PAR ID:
10125348
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Information and Inference: A Journal of the IMA
Volume:
8
Issue:
4
ISSN:
2049-8764
Page Range / eLocation ID:
p. 757-787
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The maritime shipping network is the backbone of global trade. Data about the movement of cargo through this network comes in various forms, from ship-level Automatic Identification System (AIS) data, to aggregated bilateral trade volume statistics. Multiple network representations of the shipping system can be derived from any one data source, each of which has advantages and disadvantages. In this work, we examine data in the form of liner shipping service routes, a list of walks through the port-to-port network aggregated from individual shipping companies by a large shipping logistics database. This data is inherently sequential, in that each route represents a sequence of ports called upon by a cargo ship. Previous work has analyzed this data without taking full advantage of the sequential information. Our contribution is to develop a path-based methodology for analyzing liner shipping service route data, computing navigational trajectories through the network that both respect the directional information in the shipping routes and minimize the number of cargo transfers between routes, a desirable property in industry practice. We compare these paths with those computed using other network representations of the same data, finding that our approach results in paths that are longer in terms of both network and nautical distance. We further use these trajectories to re-analyze the role of a previously-identified structural core through the network, as well as to define and analyze a measure of betweenness centrality for nodes and edges.

     
    more » « less
  2. Largely due to superior properties compared to traditional materials, the use of polymer matrix composites (PMC) has been expanding in several industries such as aerospace, transportation, defense, and marine. However, the anisotropy and nonhomogeneity of these structures contribute to the difficulty in evaluating structural integrity; damage sites can occur at multiple locations and length scales and are hard to track over time. This can lead to unpredictable and expensive failure of a safety-critical structure, thus creating a need for non-destructive evaluation (NDE) techniques which can detect and quantify small-scale damage sites and track their progression. Our research group has improved upon classical microwave techniques to address these needs; utilizing a custom device to move a sample within a resonant cavity and create a spatial map of relative permittivity. We capitalize on the inevitable presence of moisture within the polymer network to detect damage. The differing migration inclinations of absorbed water molecules in a pristine versus a damaged composite alters the respective concentrations of the two chemical states of moisture. The greater concentration of free water molecules residing in the damage sites exhibit highly different relative permittivity when compared to the higher ratio of polymer-bound water molecules in the undamaged areas. Currently, the technique has shown the ability to detect impact damage across a range of damage levels and gravimetric moisture contents but is not able to specifically quantify damage extent with regards to impact energy level. The applicability of machine learning (ML) to composite materials is substantial, with uses in areas like manufacturing and design, prediction of structural properties, and damage detection. Using traditional NDE techniques in conjunction with supervised or unsupervised ML has been shown to improve the accuracy, reliability, or efficiency of the existing methods. In this work, we explore the use of a combined unsupervised/supervised ML approach to determine a damage boundary and quantification of single-impact specimens. Dry composite specimens were damaged via drop tower to induce one central impact site of 0, 2, or 3 Joules. After moisture exposure, Entrepreneur Dr, Raleigh, North Carolina 27695, U.S.A. 553 each specimen underwent dielectric mapping, and spatial permittivity maps were created at a variety of gravimetric moisture contents. An unsupervised K-means clustering algorithm was applied to the dielectric data to segment the levels of damage and define a damage boundary. Subsequently, supervised learning was used to quantify damage using features including but not limited to thickness, moisture content, permittivity values of each cluster, and average distance between points in each cluster. A regression model was trained on several samples with impact energy as the predicted variable. Evaluation was then performed based on prediction accuracy for samples in which the impact energies are not known to the model.

     
    more » « less
  3. Species distribution and ecological niche models (hereafter SDMs) are popular tools with broad applications in ecology, biodiversity conservation, and environmental science. Many SDM applications require projecting models in environmental conditions non‐analog to those used for model training (extrapolation), giving predictions that may be statistically unsupported and biologically meaningless. We introduce a novel method, Shape, a model‐agnostic approach that calculates the extrapolation degree for a given projection data point by its multivariate distance to the nearest training data point. Such distances are relativized by a factor that reflects the dispersion of the training data in environmental space. Distinct from other approaches, Shape incorporates an adjustable threshold to control the binary discrimination between acceptable and unacceptable extrapolation degrees. We compared Shape's performance to five extrapolation metrics based on their ability to detect analog environmental conditions in environmental space and improve SDMs suitability predictions. To do so, we used 760 virtual species to define different modeling conditions determined by species niche tolerance, distribution equilibrium condition, sample size, and algorithm. All algorithms had trouble predicting species niches. However, we found a substantial improvement in model predictions when model projections were truncated independently of extrapolation metrics. Shape's performance was dependent on extrapolation threshold used to truncate models. Because of this versatility, our approach showed similar or better performance than the previous approaches and could better deal with all modeling conditions and algorithms. Our extrapolation metric is simple to interpret, captures the complex shapes of the data in environmental space, and can use any extrapolation threshold to define whether model predictions are retained based on the extrapolation degrees. These properties make this approach more broadly applicable than existing methods for creating and applying SDMs. We hope this method and accompanying tools support modelers to explore, detect, and reduce extrapolation errors to achieve more reliable models.

    Keywords: environmental novelty, extrapolation, Mahalanobis distance, model prediction, non‐analog environmental data, transferability

     
    more » « less
  4. During recent years there have been several efforts from city and transportation planners, as well as, port authorities, to design multimodal transport systems, covering the needs of the population to be served. However, before designing such a system, the first step is to understand the current gaps. Does the current system meet the transit demand of the geographic area covered? If not, where are the gaps between supply and demand? To answer this question, the notion of transit desert has been introduced. A transit desert is an area where the supply of transit service does not meet the demand for it. While there is little ambiguity on what constitutes transit demand, things are more vague when it comes to transit supply. Existing efforts often define transit supply using volume metrics (e.g., number of bus stops within a pre-defined distance). However, this does not necessarily capture the quality of the transit service. In this study, we introduce a network-based transit desert index (which we call TDI) that captures not only the quantity of transit supply in an area, but also the connectivity that the transit system provides for an area within the region of interest. In particular, we define a network between areas based on the transit travel time, distance, and overall quantity of connections. We use these measures to examine two notions of transit quality: connectivity and availability. To quantify the connectivity of an area i we utilize the change observed in the second smallest eigenvalue of the Laplacian when we remove node i from the network. To quantify availability of an area i, we examine the number of routes which pass through this area as given by an underlying transit network. We further apply and showcase our approach with data from Allegheny County, Pennsylvania, USA. Finally, we discuss current limitations of TDI and how we can tackle them as part of our future research. 
    more » « less
  5. Abstract

    The impact that each individual non‐pharmaceutical intervention (NPI) had on the spread rate of COVID‐19 is difficult to estimate, since several NPIs were implemented in rapid succession in most countries. In this article, we analyze the detectability of sudden changes in a parameter of nonlinear dynamical systems, which could be used to represent NPIs or mutations of the virus, in the presence of measurement noise. Specifically, by taking an agnostic approach, we provide necessary conditions for when the best possible unbiased estimator is able to isolate the effect of a sudden change in a model parameter, by using the Hammersley–Chapman–Robbins (HCR) lower bound. Several simplifications to the calculation of the HCR lower bound are given, which depend on the amplitude of the sudden change and the dynamics of the system. We further define the concept of the most informative sample based on the largest distance between two output trajectories, which is a good indicator of when the HCR lower bound converges. These results are thereafter used to analyze the susceptible‐infected‐removed model. For instance, we show that performing analysis using the number of recovered/deceased, as opposed to the cumulative number of infected, may be an inferior signal to use since sudden changes are fundamentally more difficult to estimate and seem to require more samples. Finally, these results are verified by simulations and applied to real data from the spread of COVID‐19 in France.

     
    more » « less