Title: Fairness-Aware Graph Sampling for Network Analysis
Network sampling is the task of selecting a subset of nodes and links from a network in a way that preserves its topological properties and satisfies other user requirements. This paper investigates the problem of generating an unbiased network sample that contains a balanced proportion of nodes from different groups. Creating such a representative sample requires handling the trade-off between structural preservability and group representativity of the selected nodes. We present a novel max-min subgraph fairness measure that can be used as a unifying framework to combine both criteria. A greedy algorithm is then proposed to generate a fair and representative sample from an initial set of target nodes. A theoretical approximation guarantee for the output of the proposed greedy algorithm, based on submodularity and curvature ratios, is also presented. Experimental results on real-world datasets show that the proposed method generates fairer and more representative samples than existing network sampling methods.
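The abstract does not spell out the fairness measure or the greedy procedure, so the following is only a toy sketch of the general idea of max-min greedy sampling, not the paper's algorithm. It assumes a hypothetical per-group utility (the fraction of a group's nodes that are in the sample or adjacent to it) and grows the sample from the target nodes by repeatedly adding the frontier node that most improves the worst-off group. The names `group_coverage` and `greedy_fair_sample` are illustrative.

```python
from collections import defaultdict


def group_coverage(adj, group_of, sample):
    """Assumed utility: per group, the fraction of its nodes that are
    in the sample or adjacent to a sampled node."""
    covered = set(sample)
    for u in sample:
        covered.update(adj[u])
    totals = defaultdict(int)
    hits = defaultdict(int)
    for node, group in group_of.items():
        totals[group] += 1
        if node in covered:
            hits[group] += 1
    return {g: hits[g] / totals[g] for g in totals}


def greedy_fair_sample(adj, group_of, targets, budget):
    """Grow a sample from `targets` up to `budget` nodes, each step adding
    the frontier node that maximizes the minimum group coverage
    (ties broken by total coverage, then by smallest node id)."""
    sample = set(targets)
    while len(sample) < budget:
        frontier = {v for u in sample for v in adj[u]} - sample
        if not frontier:
            break  # nothing reachable is left to add

        def score(v):
            cov = group_coverage(adj, group_of, sample | {v})
            return (min(cov.values()), sum(cov.values()))

        # max() returns the first maximal element, so sorting the frontier
        # makes the tie-break deterministic.
        sample.add(max(sorted(frontier), key=score))
    return sample


# Tiny example: a triangle of group "a" nodes attached to a star of group "b" nodes.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3], 5: [3]}
group_of = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(greedy_fair_sample(adj, group_of, targets={0}, budget=3))
```

In this example the greedy step prefers the bridge node 2 over node 1, because adding node 1 leaves group "b" with zero coverage. Recomputing the coverage for every candidate is quadratic in the worst case; the sketch favors clarity over speed.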
Award ID(s):
1939368
PAR ID:
10384975
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the IEEE International Conference on Data Mining
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Ranzato, M.; Beygelzimer, A.; Liang, P.S.; Vaughan, J.W.; Dauphin, Y. (Ed.)
    Fairness and robustness are critical elements of Trustworthy AI that need to be addressed together. Fairness is about learning an unbiased model while robustness is about learning from corrupted data, and it is known that addressing only one of them may have an adverse effect on the other. In this work, we propose a sample selection-based algorithm for fair and robust training. To this end, we formulate a combinatorial optimization problem for the unbiased selection of samples in the presence of data corruption. Observing that solving this optimization problem is strongly NP-hard, we propose a greedy algorithm that is efficient and effective in practice. Experiments show that our method obtains fairness and robustness that are better than or comparable to the state-of-the-art technique, both on synthetic and benchmark real datasets. Moreover, unlike other fair and robust training baselines, our algorithm can be used by only modifying the sampling step in batch selection without changing the training algorithm or leveraging additional clean data.
  2. We consider the problem of finding the maximally influential node in random networks where each node influences every other node with constant yet unknown probability. We develop an online algorithm that learns the relative influences of the nodes. It relaxes the assumption in the existing literature that a central observer can monitor the influence spread globally. The proposed algorithm delegates the online updates to the nodes on the network; hence it requires only local observations at the nodes. We show that, using an explore-then-commit learning strategy, the cumulative regret accumulated by the algorithm over horizon T approaches O(T^{2/3}) for a network with a large number of nodes. Additionally, we show that, for fixed T, the worst-case regret grows linearly with the number n of nodes in the graph. Numerical experiments illustrate this linear dependence for Chung-Lu models. The experiments also demonstrate that ε-greedy learning strategies can achieve similar performance to the explore-then-commit strategy on Chung-Lu models.
  3. Can we infer all the failed components of an infrastructure network, given a sample of reachable nodes from supply nodes? One of the most critical post-disruption processes after a natural disaster is to quickly determine the damage or failure states of critical infrastructure components. However, this is nontrivial, considering that often only a fraction of components may be accessible or observable after a disruptive event. Past work has looked into inferring failed components given point probes, i.e., with a direct sample of failed components. In contrast, we study the harder problem of inferring failed components given partial information about some ‘serviceable’ reachable nodes and a small sample of point probes, the former often being more practical to obtain. We formulate this novel problem using the Minimum Description Length (MDL) principle, and then present a greedy algorithm that minimizes MDL cost effectively. We evaluate our algorithm on domain-expert simulations of real networks in the aftermath of an earthquake. Our algorithm successfully identified failed components, especially the critical ones affecting the overall system performance.
  4. In-context learning (ICL) empowers large language models (LLMs) to tackle new tasks by using a series of training instances as prompts. Since generating the prompts requires sampling from a vast pool of instances and annotating them (e.g., adding labels in a classification task), existing methods have proposed to select a subset of unlabeled examples for annotation, thus enhancing the quality of prompts and concurrently mitigating annotation costs. However, these methods often require a long time to select instances due to their complexity, hindering their practical viability. To address this limitation, we propose a graph-based selection method, FastGAS, designed to efficiently identify high-quality instances while minimizing computational overhead. Initially, we construct a data similarity graph based on instance similarities. Subsequently, employing a graph partitioning algorithm, we partition the graph into pieces. Within each piece (i.e., subgraph), we adopt a greedy approach to pick the most representative nodes. By aggregating nodes from diverse pieces and annotating the corresponding instances, we identify a set of diverse and representative instances for ICL. Compared to prior approaches, our method not only exhibits superior performance on different tasks but also significantly reduces selection time. In addition, we demonstrate the efficacy of our approach in LLMs of larger sizes.
  5. We study the problem of sampling a bandlimited graph signal in the presence of noise, where the objective is to select a node subset of prescribed cardinality that minimizes the signal reconstruction mean squared error (MSE). To that end, we formulate the task at hand as the minimization of MSE subject to binary constraints, and approximate the resulting NP-hard problem via semidefinite programming (SDP) relaxation. Moreover, we provide an alternative formulation based on maximizing a monotone weak submodular function and propose a randomized-greedy algorithm to find a sub-optimal subset. We then derive a worst-case performance guarantee on the MSE returned by the randomized greedy algorithm for general non-stationary graph signals. The efficacy of the proposed methods is illustrated through numerical simulations on synthetic and real-world graphs. Notably, the randomized greedy algorithm yields an order-of-magnitude speedup over state-of-the-art greedy sampling schemes, while incurring only a marginal MSE performance loss.