This content will become publicly available on December 1, 2024

Title: Learning attribute and homophily measures through random walks
Abstract: We investigate the statistical learning of nodal attribute functionals in homophily networks using random walks. Attributes can be discrete or continuous. We study a generalization of various existing canonical models based on preferential attachment (model class $\mathscr{P}$), in which new nodes form connections that depend on both their attribute values and their popularity as measured by degree. An associated model class $\mathscr{U}$ is described, which is amenable to theoretical analysis and gives access to the asymptotics of a host of functionals of interest. We analyze settings where asymptotics for model class $\mathscr{U}$ transfer to model class $\mathscr{P}$ through the phenomenon of resolvability. For the statistical learning, we consider several canonical attribute-agnostic sampling schemes, such as the Metropolis-Hastings random walk and versions of node2vec (Grover and Leskovec, 2016) that incorporate both classical random walk and non-backtracking propensities, and we propose new variants that use attribute information in addition to topological information to explore the network. Estimators are proposed for learning the attribute distribution, the degree distribution for an attribute type, and homophily measures. The performance of this statistical learning framework is studied on both synthetic networks (model class $\mathscr{P}$) and real-world systems, and its dependence on the network topology, the degree of homophily or absence thereof, and (un)balanced attributes is assessed.
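As a rough illustration of the sampling idea in the abstract, the sketch below grows a small attributed network and then estimates the attribute distribution with a Metropolis-Hastings random walk. It is a minimal sketch under stated assumptions, not the paper's method: the function names `grow_network` and `mh_walk_attribute_estimate`, the two attribute labels, and all parameter values are hypothetical, and the attachment rule (probability proportional to degree times a same/different-attribute affinity) is a simplified stand-in for model class $\mathscr{P}$ rather than its exact specification.

```python
import random
from collections import Counter


def grow_network(n, m=3, p_red=0.3, homophily=0.8, seed=0):
    """Grow an attributed preferential-attachment graph (illustrative only).

    Each new node draws an attribute, then picks up to m distinct existing
    nodes with probability proportional to degree * affinity, where affinity
    is `homophily` for a matching attribute and 1 - homophily otherwise.
    """
    assert 0.0 < homophily < 1.0, "keeps every attachment weight positive"
    rng = random.Random(seed)
    attr = {0: "red", 1: "blue"}  # hypothetical two-valued attribute
    adj = {0: {1}, 1: {0}}        # start from a single edge
    for v in range(2, n):
        a = "red" if rng.random() < p_red else "blue"
        nodes = list(adj)
        weights = [
            len(adj[u]) * (homophily if attr[u] == a else 1.0 - homophily)
            for u in nodes
        ]
        # Sampling with replacement; repeated targets collapse in the set.
        targets = set(rng.choices(nodes, weights=weights, k=m))
        attr[v] = a
        adj[v] = set(targets)
        for u in targets:
            adj[u].add(v)
    # Sorted neighbour lists make the walk reproducible for a fixed seed.
    return {v: sorted(nb) for v, nb in adj.items()}, attr


def mh_walk_attribute_estimate(adj, attr, steps=100_000, burn_in=1_000, seed=1):
    """Estimate the attribute distribution with a Metropolis-Hastings walk.

    Proposing a uniform neighbour u of v and accepting with probability
    min(1, deg(v) / deg(u)) makes the uniform distribution over nodes
    stationary, so plain visit frequencies estimate attribute proportions.
    """
    rng = random.Random(seed)
    v = rng.choice(list(adj))
    counts = Counter()
    for t in range(steps + burn_in):
        u = rng.choice(adj[v])
        if rng.random() < min(1.0, len(adj[v]) / len(adj[u])):
            v = u  # accept the move; otherwise the walk stays at v
        if t >= burn_in:
            counts[attr[v]] += 1
    return {a: c / steps for a, c in counts.items()}


if __name__ == "__main__":
    adj, attr = grow_network(300)
    true_red = sum(1 for a in attr.values() if a == "red") / len(attr)
    est = mh_walk_attribute_estimate(adj, attr)
    print("true fraction red:", round(true_red, 3))
    print("walk estimate:", {a: round(p, 3) for a, p in est.items()})
```

The degree-ratio acceptance step is what removes the bias of a plain random walk toward high-degree nodes. A simple random walk would instead need its visits reweighted by inverse degree to target the same quantity.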
Award ID(s):
2113662
NSF-PAR ID:
10433604
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Applied Network Science
Volume:
8
Issue:
1
ISSN:
2364-8228
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cherifi, H. ; Mantegna, R.N. ; Rocha, L.M. ; Cherifi, C. ; Micciche, S. (Ed.)
    We investigate the statistical learning of nodal attribute distributions in homophily networks using random walks. Attributes can be discrete or continuous. We study a generalization of various existing canonical models based on preferential attachment, in which new nodes form connections that depend on both their attribute values and their popularity as measured by degree. We consider several canonical attribute-agnostic sampling schemes, such as the Metropolis-Hastings random walk and versions of node2vec (Grover and Leskovec 2016) that incorporate both classical random walk and non-backtracking propensities, and we propose new variants that use attribute information in addition to topological information to explore the network. The performance of these algorithms is studied on both synthetic networks and real-world systems, and its dependence on the degree of homophily, or absence thereof, is assessed.
  2. In cognitive diagnostic assessment, multiple fine-grained attributes are measured simultaneously. Attribute hierarchies are considered important structural features of cognitive diagnostic models (CDMs) that provide useful information about the nature of attributes. Templin and Bradshaw first introduced a hierarchical diagnostic classification model (HDCM) that directly takes attribute hierarchies into account; hence, the HDCM is nested within more general CDMs. They also formulated an empirically driven hypothesis test that statistically tests one hypothesized link (between two attributes) at a time. However, their likelihood ratio test statistic does not have a known reference distribution, so it is cumbersome to perform hypothesis testing at scale. In this article, we studied two exploratory approaches that can learn the attribute hierarchies directly from data, namely the latent variable selection (LVS) approach and the regularized latent class modeling (RLCM) approach. An identification constraint was proposed for the LVS approach. Simulation results revealed that both approaches could successfully identify different types of attribute hierarchies when the underlying CDM is either the deterministic input, noisy "and" gate (DINA) model or the saturated log-linear CDM. The LVS approach outperformed the RLCM approach, especially as the total number of attributes increased.
  3. Graph neural networks (GNNs) have emerged as a powerful tool for modeling graph data due to their ability to learn a concise representation of the data by integrating the node attributes and link information in a principled fashion. However, despite their promise, there are several practical challenges that must be overcome to use them effectively for node classification problems. In particular, current approaches are vulnerable to different kinds of biases inherent in the graph data. First, if the class distribution is imbalanced, then the GNN's loss function is biased towards classifying the majority class correctly rather than the minority class, which hurts performance on the minority class. Second, due to the homophily effect, the learned representation and subsequent downstream tasks may favor certain demographic groups over others when applied to social network data. To mitigate such biases, we propose a novel framework called Fairness-Aware Cost Sensitive Graph Convolutional Network (FACS-GCN) for classifying nodes in networks with skewed class distributions. Our approach combines a cost-sensitive exponential loss with an adversarial learning component to alleviate the ill effects of both biases. The framework employs a stagewise additive modeling approach to ensure there is no significant loss in accuracy when imparting fairness into the GNN. Experimental results on six benchmark graph datasets demonstrate the effectiveness of FACS-GCN against comparable baseline methods in promoting fairness while maintaining high model accuracy on the majority of the datasets.
  4. Abstract

    Preferential attachment, homophily, and their consequences such as scale-free (i.e. power-law) degree distributions, the glass ceiling effect (the unseen, yet unbreakable barrier that keeps minorities and women from rising to the upper rungs of the corporate ladder, regardless of their qualifications or achievements) and perception bias are well-studied in undirected networks. However, such consequences and the factors that lead to their emergence in directed networks (e.g. author–citation graphs, Twitter) are yet to be coherently explained in an intuitive, theoretically tractable manner using a single dynamical model. To this end, we present a theoretical and numerical analysis of the novel Directed Mixed Preferential Attachment model in order to explain the emergence of scale-free degree distributions and the glass ceiling effect in directed networks with two groups (minority and majority). Specifically, we first derive closed-form expressions for the power-law exponents of the in-degree and out-degree distributions of each of the two groups and then compare the derived exponents with each other to obtain useful insights. These insights include answers to questions such as: when does the minority group have an out-degree (or in-degree) distribution with a heavier tail compared to the majority group? what factors cause the tail of the out-degree distribution of a group to be heavier than the tail of its own in-degree distribution? what effect does frequent addition of edges between existing nodes have on the in-degree and out-degree distributions of the majority and minority groups? Answers to these questions shed light on the interplay between structure (i.e. the in-degree and out-degree distributions of the two groups) and dynamics (characterized collectively by the homophily, preferential attachment, group sizes and growth dynamics) of various real-world directed networks. 
    We also provide a novel definition of the glass ceiling faced by a group via the number of individuals with large out-degree (i.e. those with many followers) normalized by the number of individuals with large in-degree (i.e. those who follow many others) and then use it to characterize the conditions that cause the glass ceiling effect to emerge in a directed network. Our analytical results are supported by detailed numerical experiments. The DMPA model and its theoretical and numerical analysis provided in this article are useful for analysing various phenomena on directed networks in fields such as network science and computational social science.
  5. Abstract: Understanding actor collaboration networks and their evolution is essential to promoting collective action in resilience planning and management of interdependent infrastructure systems. Local interactions and choice homophily are two important network evolution mechanisms. Network motifs encode information about network formation, configuration, and local structure. Homophily effects, on the other hand, capture whether the network configurations have significant correlations with node properties. The objective of this paper is to explore the extent to which local interactions and homophily effects influence actor collaboration in resilience planning and management of interdependent infrastructure systems. We mapped a bipartite actor collaboration network based on a post-Hurricane Harvey stakeholder survey that revealed actor collaborations for hazard mitigation. We examined seven bipartite network motifs in the mapped collaboration network and compared the mapped network to simulated random models with the same degree distributions. We then used exponential random graph models to examine whether the network configurations had statistically significant associations with node properties. The results provide insights into the two mechanisms, local interactions and the homophily effect, that influence the formation of actor collaboration in resilience planning and management of interdependent urban systems. The findings have implications for improving network cohesion and actor collaboration across diverse urban sectors.