Link prediction has been widely applied in social network analysis. Despite its importance, link prediction algorithms can be biased by disfavoring the links between individuals in particular demographic groups. In this paper, we study one particular type of bias, namely, the bias in predicting inter-group links (i.e., links across different demographic groups). First, we formalize the definition of bias in link prediction by providing quantitative measurements of accuracy disparity, which measures the difference in prediction accuracy of inter-group and intra-group links. Second, we unveil the existence of bias in six existing state-of-the-art link prediction algorithms through extensive empirical studies over real world datasets. Third, we identify the imbalanced density across intra-group and inter-group links in training graphs as one of the underlying causes of bias in link prediction. Based on the identified cause, fourth, we design a pre-processing bias mitigation method named FairLP to modify the training graph, aiming to balance the distribution of intra-group and inter-group links while preserving the network characteristics of the graph. FairLP is model-agnostic and thus is compatible with any existing link prediction algorithm. Our experimental results on real-world social network graphs demonstrate that FairLP achieves better trade-off between fairness and prediction accuracy than the existing fairness-enhancing link prediction methods.
more »
« less
Graph Constrained Group Testing
In network tomography, one goal is to identify a small set of failed links in a network using as little information as possible. One way of setting up this problem is called graph-constrained group testing. Graph-constrained group testing is a variant of the classical combinatorial group testing problem, where the tests that one is allowed are additionally constrained by a graph. In this case, the graph is given by the underlying network topology. The main contribution of this work is to show that for most graphs, the constraints imposed by the graph are no constraint at all. That is, the number of tests required to identify the failed links in graph-constrained group testing is near-optimal even for the corresponding group testing problem with no graph constraints. Our approach is based on a simple randomized construction of tests. To analyze our construction, we prove new results about the size of giant components in randomly sparsified graphs. Finally, we provide empirical results which suggest that our connected-subgraph tests perform better not just in theory but also in practice, and in particular perform better on a real-world network topology.
more »
« less
- Award ID(s):
- 1844628
- PAR ID:
- 10132787
- Date Published:
- Journal Name:
- Leibniz international proceedings in informatics
- ISSN:
- 1868-8969
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Abstract In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing, communication protocols and many more. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subjected to one of the following two constraints: items are finitely divisible and thus may participate in at most $$\gamma $$ tests; or tests are size-constrained to pool no more than $$\rho $$ items per test; and (ii) a noisy version of the problem, where each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality not only in terms of the number of tests, but also in terms of the decoding time. While the most basic formulations of our algorithms require $$\varOmega (n)$$ storage for each algorithm, we also provide low-storage variants based on hashing, with similar recovery guarantees.more » « less
-
Karlapalem, Kamal; Cheng, Hong; Ramakrishnan, Naren; null; null; Reddy, P. Krishna; Srivastava, Jaideep; Chakraborty, Tanmoy (Ed.)Constrained learning, a weakly supervised learning task, aims to incorporate domain constraints to learn models without requiring labels for each instance. Because weak supervision knowledge is useful and easy to obtain, constrained learning outperforms unsupervised learning in performance and is preferable than supervised learning in terms of labeling costs. To date, constrained learning, especially constrained clustering, has been extensively studied, but was primarily focused on data in the Euclidean space. In this paper, we propose a weak supervision network embedding (WSNE) for constrained learning of graphs. Because no label is available for individual nodes, we propose a new loss function to quantify the constraint-based loss, and integrate this loss in a graph convolutional neural network (GCN) and variational graph auto-encoder (VGAE) combined framework to jointly model graph structures and node attributes. The joint optimization allows WSNE to learn embedding not only preserving network topology and content, but also satisfying the constraints. Experiments show that WSNE outperforms baselines for constrained graph learning tasks, including constrained graph clustering and constrained graph classification.more » « less
-
We address the problem of online topology inference from streaming nodal observations of graph signals generated by linear diffusion dynamics on the sought graph. To that end, we leverage the stationarity of the signals and use the so-called graph-shift operator (GSO) as a matrix representation of the graph. Under this model, estimated covariance eigenvectors obtained from streaming independent graph signals diffused on the sought network are a valid estimator of the GSO's spectral templates. We develop an ADMM algorithm to find a sparse and structurally admissible GSO given the eigenvectors estimate. Then, we propose an online scheme that upon sensing new diffused observations, efficiently updates eigenvectors (thus makes more accurate on expectation) and performs only one or a few iterations of the mentioned ADMM until the new data is observed. Numerical tests illustrate the effectiveness of the proposed topology inference approach in recovering large scale graphs, adapting to streaming information, and accommodating changes in the sought network.more » « less
-
We address the problem of inferring a directed network from nodal observations of graph signals generated by linear diffusion dynamics on the sought graph. Observations are modeled as the outputs of a linear graph filter (i.e., a polynomial on a local diffusion graph-shift operator encoding the unknown graph topology), excited with an ensemble of independent graph signals with arbitrarily-correlated nodal components. In this context, we first rely on observations of the output signals along with prior statistical information on the inputs to identify the diffusion filter. Such problem entails solving a system of quadratic matrix equations, which we recast as a smooth quadratic minimization subject to Stiefel manifold constraints. Subsequent identification of the network topology given the graph filter estimate boils down to finding a sparse and structurally admissible shift that commutes with the given filter, thus forcing the latter to be a polynomial in the sought graph-shift operator. Preliminary numerical tests corroborating the effectiveness of the proposed algorithms in recovering synthetic and real-world digraphs are provided.more » « less
An official website of the United States government

