The study of complex networks is a significant development in modern science, and has enriched the social sciences, biology, physics, and computer science. Models and algorithms for such networks are pervasive in our society, and impact human behavior via social networks, search engines, and recommender systems, to name a few. A widely used algorithmic technique for modeling such complex networks is to construct a lowdimensional Euclidean embedding of the vertices of the network, where proximity of vertices is interpreted as the likelihood of an edge. Contrary to the common view, we argue that such graph embeddings do not capture salient properties of complex networks. The two properties we focus on are low degree and large clustering coefficients, which have been widely established to be empirically true for realworld networks. We mathematically prove that any embedding (that uses dot products to measure similarity) that can successfully create these two properties must have a rank that is nearly linear in the number of vertices. Among other implications, this establishes that popular embedding techniques such as singular value decomposition and node2vec fail to capture significant structural aspects of realworld complex networks. Furthermore, we empirically study a number of different embedding techniques based onmore »
This content will become publicly available on December 12, 2022
On the Power of Edge Independent Graph Models
Why do many modern neuralnetworkbased graph generative models fail to reproduce typical realworld network characteristics, such as high triangle density? In this work we study the limitations of edge independent random graph models, in which each edge is added to the graph independently with some probability. Such models include both the classic ErdösRényi and stochastic block models, as well as modern generative models such as NetGAN, variational graph autoencoders, and CELL. We prove that subject to a bounded overlap condition, which ensures that the model does not simply memorize a single graph, edge independent models are inherently limited in their ability to generate graphs with high triangle and other subgraph densities. Notably, such high densities are known to appear in realworld social networks and other graphs. We complement our negative results with a simple generative model that balances overlap and accuracy, performing comparably to more complex models in reconstructing many graph statistics.
 Award ID(s):
 1763618
 Publication Date:
 NSFPAR ID:
 10355852
 Journal Name:
 Conference on Neural Information Processing Systems (NeurIPS)
 Sponsoring Org:
 National Science Foundation
More Like this


Lowdimensional embeddings, from classical spectral embeddings to modern neuralnetinspired methods, are a cornerstone in the modeling and analysis of complex networks. Recent work by Seshadhri et al. (PNAS 2020) suggests that such embeddings cannot capture local structure arising in complex networks. In particular, they show that any network generated from a natural lowdimensional model cannot be both sparse and have high triangle density (high clustering coefficient), two hallmark properties of many realworld networks. In this work we show that the results of Seshadhri et al. are intimately connected to the model they use rather than the lowdimensional structure of complex networks. Specifically, we prove that a minor relaxation of their model can generate sparse graphs with high triangle density. Surprisingly, we show that this same model leads to exact lowdimensional factorizations of many realworld networks. We give a simple algorithm based on logistic principal component analysis (LPCA) that succeeds in finding such exact embeddings. Finally, we perform a large number of experiments that verify the ability of very lowdimensional embeddings to capture local structure in realworld networks.

We are now over four decades into digitally managing the names of Earth's species. As the number of federating (i.e., software that brings together previously disparate projects under a common infrastructure, for example TaxonWorks) and aggregating (e.g., International Plant Name Index, Catalog of Life (CoL)) efforts increase, there remains an unmet need for both the migration forward of old data, and for the production of new, precise and comprehensive nomenclatural catalogs. Given this context, we provide an overview of how TaxonWorks seeks to contribute to this effort, and where it might evolve in the future. In TaxonWorks, when we talk about governed names and relationships, we mean it in the sense of existing international codes of nomenclature (e.g., the International Code of Zoological Nomenclature (ICZN)). More technically, nomenclature is defined as a set of objective assertions that describe the relationships between the names given to biological taxa and the rules that determine how those names are governed. It is critical to note that this is not the same thing as the relationship between a name and a biological entity, but rather nomenclature in TaxonWorks represents the details of the (governed) relationships between names. Rather than thinking of nomenclature as changingmore »

The transitive closure of a graph is a new graph where every vertex is directly connected to all vertices to which it had a path in the original graph. Transitive closures are useful for reachability and relationship querying. Finding the transitive closure can be computationally expensive and requires a large memory footprint as the output is typically larger than the input. Some of the original research on transitive closures assumed that graphs were dense and used dense adjacency matrices. We have since learned that many realworld networks are extremely sparse, and the existing methods do not scale. In this work, we introduce a new algorithm called Antisection Transitive Closure (ATC) for finding the transitive closure of a graph. We present a new parallel edges operation – antisections – for finding new edges to reachable vertices. ATC scales to massively multithreaded systems such as NVIDIA’s GPU with tens of thousands of threads. We show that the antisection operation shares some traits with the triangle counting intersection operation in graph analysis. Lastly, we view the transitive closure problem as a dynamic graph problem requiring edge insertions. By doing this, our memory footprint is smaller. We also show a method for creating themore »

Triangle counting is a fundamental technique in network analysis, that has received much attention in various input models. The vast majority of triangle counting algorithms are targeted to static graphs. Yet, many realworld graphs are directed and temporal, where edges come with timestamps. Temporal triangles yield much more information, since they account for both the graph topology and the timestamps. Temporal triangle counting has seen a few recent results, but there are varying definitions of temporal triangles. In all cases, temporal triangle patterns enforce constraints on the time interval between edges (in the triangle). We define a general notion (δ1,3,δ1,2,δ2,3)temporal triangles that allows for separate time constraints for all pairs of edges. Our main result is a new algorithm, DOTTT (Degeneracy Oriented Temporal Triangle Totaler), that exactly counts all directed variants of (δ1,3,δ1,2,δ2,3)temporal triangles. Using the classic idea of degeneracy ordering with careful combinatorial arguments, we can prove that DOTTT runs in O(mκlogm) time, where m is the number of (temporal) edges and κ is the graph degeneracy (max core number). Up to log factors, this matches the running time of the best static triangle counters. Moreover, this running time is better than existing. DOTTT has excellent practical behavior andmore »