Title: Learning attribute and homophily measures through random walks
Abstract: We investigate the statistical learning of nodal attribute functionals in homophily networks using random walks. Attributes can be discrete or continuous. A generalization of various existing canonical models based on preferential attachment is studied (model class $\mathscr{P}$), where new nodes form connections dependent on both their attribute values and popularity as measured by degree. An associated model class $\mathscr{U}$ is described, which is amenable to theoretical analysis and gives access to asymptotics of a host of functionals of interest. Settings where asymptotics for model class $\mathscr{U}$ transfer over to model class $\mathscr{P}$ through the phenomenon of resolvability are analyzed. For the statistical learning, we consider several canonical attribute-agnostic sampling schemes, such as the Metropolis-Hastings random walk and versions of node2vec (Grover and Leskovec, 2016) that incorporate both classical random walk and non-backtracking propensities, and propose new variants which use attribute information in addition to topological information to explore the network. Estimators for learning the attribute distribution, the degree distribution for an attribute type, and homophily measures are proposed. The performance of this statistical learning framework is studied on both synthetic networks (model class $\mathscr{P}$) and real-world systems, and its dependence on the network topology, the degree of homophily or absence thereof, and (un)balanced attributes is assessed.
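To make the sampling idea in the abstract concrete, here is a minimal sketch of a textbook Metropolis-Hastings random walk used to estimate an attribute distribution. It is not the paper's code or its estimators. The toy graph, the function names (`build_toy_graph`, `mh_random_walk`, `estimate_attribute_distribution`) and the two-valued attribute are all assumptions made for illustration. The walk proposes a uniformly chosen neighbor and accepts it with probability min(1, deg(current)/deg(proposal)), which makes the uniform distribution over nodes stationary, so plain visit frequencies estimate attribute proportions.

```python
import random
from collections import Counter


def build_toy_graph():
    """Hypothetical 10-node graph: hub 0 joined to a ring on nodes 1..9."""
    adj = {0: list(range(1, 10))}
    for i in range(1, 10):
        nxt = i % 9 + 1          # ring successor (9 wraps to 1)
        prev = (i - 2) % 9 + 1   # ring predecessor (1 wraps to 9)
        adj[i] = [0, prev, nxt]
    return adj


def mh_random_walk(adj, start, n_steps, rng):
    """Metropolis-Hastings random walk targeting the uniform node distribution."""
    current = start
    visited = []
    for _ in range(n_steps):
        proposal = rng.choice(adj[current])
        # Accept with probability min(1, deg(current) / deg(proposal));
        # on rejection the walk stays put and the current node is counted again.
        if rng.random() < len(adj[current]) / len(adj[proposal]):
            current = proposal
        visited.append(current)
    return visited


def estimate_attribute_distribution(visited, attr):
    """Empirical attribute frequencies along the walk."""
    counts = Counter(attr[v] for v in visited)
    total = len(visited)
    return {a: c / total for a, c in counts.items()}


if __name__ == "__main__":
    adj = build_toy_graph()
    # Assumed attribute labels: nodes 0, 1, 2 are type "a", the rest type "b".
    attr = {v: ("a" if v < 3 else "b") for v in adj}
    walk = mh_random_walk(adj, start=0, n_steps=200_000, rng=random.Random(0))
    print(estimate_attribute_distribution(walk, attr))
```

In this toy graph the true share of type "a" is 0.3. A simple random walk would over-visit the high-degree hub and put that share near 0.42 without reweighting, whereas the Metropolis-Hastings correction should land close to 0.3.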
Award ID(s):
2113662
PAR ID:
10433604
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Applied Network Science
Volume:
8
Issue:
1
ISSN:
2364-8228
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cherifi, H.; Mantegna, R.N.; Rocha, L.M.; Cherifi, C.; Micciche, S. (Ed.)
    We investigate the statistical learning of nodal attribute distributions in homophily networks using random walks. Attributes can be discrete or continuous. A generalization of various existing canonical models based on preferential attachment is studied, where new nodes form connections dependent on both their attribute values and popularity as measured by degree. We consider several canonical attribute-agnostic sampling schemes, such as the Metropolis-Hastings random walk and versions of node2vec (Grover and Leskovec 2016) that incorporate both classical random walk and non-backtracking propensities, and propose new variants which use attribute information in addition to topological information to explore the network. The performance of such algorithms is studied on both synthetic networks and real-world systems, and its dependence on the degree of homophily, or absence thereof, is assessed.
  2. Graph neural networks (GNNs) have emerged as a powerful tool for modeling graph data due to their ability to learn a concise representation of the data by integrating the node attributes and link information in a principled fashion. However, despite their promise, there are several practical challenges that must be overcome to effectively use them for node classification problems. In particular, current approaches are vulnerable to different kinds of biases inherent in the graph data. First, if the class distribution is imbalanced, then the GNNs' loss function is biased towards classifying the majority class correctly rather than the minority class, which hurts the performance of the latter class. Second, due to the homophily effect, the learned representation and subsequent downstream tasks may favor certain demographic groups over others when applied to social network data. To mitigate such biases, we propose a novel framework called Fairness-Aware Cost Sensitive Graph Convolutional Network (FACS-GCN) for classifying nodes in networks with skewed class distributions. Our approach combines a cost-sensitive exponential loss with an adversarial learning component to alleviate the ill-effects of both biases. The framework employs a stagewise additive modeling approach to ensure there is no significant loss in accuracy when imparting fairness into the GNN. Experimental results on 6 benchmark graph datasets demonstrate the effectiveness of FACS-GCN against comparable baseline methods in terms of promoting fairness while maintaining a high model accuracy on the majority of the datasets.
  3. Preferential attachment, homophily, and their consequences such as scale-free (i.e. power-law) degree distributions, the glass ceiling effect (the unseen, yet unbreakable barrier that keeps minorities and women from rising to the upper rungs of the corporate ladder, regardless of their qualifications or achievements) and perception bias are well-studied in undirected networks. However, such consequences and the factors that lead to their emergence in directed networks (e.g. author–citation graphs, Twitter) are yet to be coherently explained in an intuitive, theoretically tractable manner using a single dynamical model. To this end, we present a theoretical and numerical analysis of the novel Directed Mixed Preferential Attachment (DMPA) model in order to explain the emergence of scale-free degree distributions and the glass ceiling effect in directed networks with two groups (minority and majority). Specifically, we first derive closed-form expressions for the power-law exponents of the in-degree and out-degree distributions of each of the two groups and then compare the derived exponents with each other to obtain useful insights. These insights include answers to questions such as: When does the minority group have an out-degree (or in-degree) distribution with a heavier tail compared to the majority group? What factors cause the tail of the out-degree distribution of a group to be heavier than the tail of its own in-degree distribution? What effect does frequent addition of edges between existing nodes have on the in-degree and out-degree distributions of the majority and minority groups? Answers to these questions shed light on the interplay between structure (i.e. the in-degree and out-degree distributions of the two groups) and dynamics (characterized collectively by the homophily, preferential attachment, group sizes and growth dynamics) of various real-world directed networks.
    We also provide a novel definition of the glass ceiling faced by a group via the number of individuals with large out-degree (i.e. those with many followers) normalized by the number of individuals with large in-degree (i.e. those who follow many others) and then use it to characterize the conditions that cause the glass ceiling effect to emerge in a directed network. Our analytical results are supported by detailed numerical experiments. The DMPA model and its theoretical and numerical analysis provided in this article are useful for analysing various phenomena on directed networks in fields such as network science and computational social science.
  4. Understanding actor collaboration networks and their evolution is essential to promoting collective action in resilience planning and management of interdependent infrastructure systems. Local interactions and choice homophily are two important network evolution mechanisms. Network motifs encode the information of network formation, configuration, and the local structure. Homophily effects, on the other hand, capture whether the network configurations have significant correlations with node properties. The objective of this paper is to explore the extent to which local interactions and homophily effects influence actor collaboration in resilience planning and management of interdependent infrastructure systems. We mapped a bipartite actor collaboration network based on a post-Hurricane Harvey stakeholder survey that revealed actor collaborations for hazard mitigation. We examined seven bipartite network motifs for the mapped collaboration network and compared the mapped network to simulated random models with the same degree distributions. Then we examined whether the network configurations had significant statistics for node properties using exponential random graph models. The results provide insights about the two mechanisms, local interactions and homophily effects, influencing the formation of actor collaboration in resilience planning and management of interdependent urban systems. The findings have implications for improving network cohesion and actor collaborations from diverse urban sectors.
  5. Social networks are very important carriers of information. For instance, the political leaning of our friends can serve as a proxy to identify our own political preferences. This explanatory power is leveraged in many scenarios ranging from business decision-making to scientific research to infer missing attributes using machine learning. However, factors affecting the performance and the direction of bias of these algorithms are not well understood. To this end, we systematically study how structural properties of the network and the training sample influence the results of collective classification. Our main findings show that (i) mean classification performance can empirically and analytically be predicted by structural properties such as homophily, class balance, edge density and sample size, (ii) small training samples are enough for heterophilic networks to achieve high and unbiased classification performance, even with imperfect model estimates, (iii) homophilic networks are more prone to bias issues and low performance when group size differences increase, (iv) when sampling budgets are small, partial crawls achieve the most accurate model estimates, and degree sampling achieves the highest overall performance. Our findings help practitioners to better understand and evaluate their results when sampling budgets are small or when no ground-truth is available.