Content spread inequity is a potential unfairness issue in online social networks, disparately impacting minority groups. In this paper, we view friendship suggestion, a common feature in social network platforms, as an opportunity to achieve an equitable spread of content. In particular, we propose to suggest a subset of potential edges (currently not existing in the network but likely to be accepted) that maximizes content spread while achieving fairness. Instead of re-engineering the existing systems, our proposal builds a fairness wrapper on top of the existing friendship suggestion components. We prove the problem is NP-hard and inapproximable in polynomial time unless P=NP. Therefore, allowing relaxation of the fairness constraint, we propose an algorithm based on LP-relaxation and randomized rounding with fixed approximation ratios on fairness and content spread. We provide multiple optimizations, further improving the performance of our algorithm in practice. Besides, we propose a scalable algorithm that dynamically adds subsets of nodes, chosen via iterative sampling, and solves smaller problems corresponding to these nodes. Besides theoretical analysis, we conduct comprehensive experiments on real and synthetic data sets. Across different settings, our algorithms found solutions with near-zero unfairness while significantly increasing the content spread. Our scalable algorithm could process a graph with half a million nodes on a single machine, reducing the unfairness to around 0.0004 while lifting content spread by 43%.
more »
« less
Scalable and Memory-Efficient Algorithms for Controlling Networked Epidemic Processes Using Multiplicative Weights Update Method
We study the problem of designing scalable algorithms to find effective intervention strategies for controlling stochastic epidemic processes on networks. This is a common problem arising in agent based models for epidemic spread.Previous approaches to this problem focus on either heuristics with no guarantees or approximation algorithms that scale only to networks corresponding to county-sized populations, typically, with less than a million nodes. In particular, the mathematical-programming based approaches need to solve the Linear Program (LP) relaxation of the problem using an LP solver, which restricts the scalability of this approach. In this work, we overcome this restriction by designing an algorithm that adapts the multiplicative weights update (MWU) framework, along with the sample average approximation (SAA) technique, to approximately solve the linear program (LP) relaxation for the problem. To scale this approach further, we provide a memory-efficient algorithm that enables scaling to large networks, corresponding to country-size populations, with over 300 million nodes and 30 billion edges. Furthermore, we show that this approach provides near-optimal solutions to the LP in practice.
more »
« less
- PAR ID:
- 10386720
- Date Published:
- Journal Name:
- Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
- Page Range / eLocation ID:
- 5164 to 5170
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
ABSTRACT Efficient contact tracing and isolation is an effective strategy to control epidemics, as seen in the Ebola epidemic and COVID-19 pandemic. An important consideration in contact tracing is the budget on the number of individuals asked to quarantine—the budget is limited for socioeconomic reasons (e.g., having a limited number of contact tracers). Here, we present a Markov Decision Process (MDP) framework to formulate the problem of using contact tracing to reduce the size of an outbreak while limiting the number of people quarantined. We formulate each step of the MDP as a combinatorial problem, MinExposed, which we demonstrate is NP-Hard. Next, we develop two approximation algorithms, one based on rounding the solutions of a linear program and another (greedy algorithm) based on choosing nodes with a high (weighted) degree. A key feature of the greedy algorithm is that it does not need complete information of the underlying social contact network, making it implementable in practice. Using simulations over realistic networks, we show how the algorithms can help in bending the epidemic curve with a limited number of isolated individuals.more » « less
-
Efficient contact tracing and isolation is an effective strategy to control epidemics, as seen in the Ebola epidemic and COVID-19 pandemic. An important consideration in contact tracing is the budget on the number of individuals asked to quarantine—the budget is limited for socioeconomic reasons (e.g., having a limited number of contact tracers). Here, we present a Markov Decision Process (MDP) framework to formulate the problem of using contact tracing to reduce the size of an outbreak while limiting the number of people quarantined. We formulate each step of the MDP as a combinatorial problem, MinExposed, which we demonstrate is NP-Hard. Next, we develop two approximation algorithms, one based on rounding the solutions of a linear program and another (greedy algorithm) based on choosing nodes with a high (weighted) degree. A key feature of the greedy algorithm is that it does not need complete information of the underlying social contact network, making it implementable in practice. Using simulations over realistic networks, we show how the algorithms can help in bending the epidemic curve with a limited number of isolated individuals.more » « less
-
null (Ed.)While social networks are widely used as a media for information diffusion, attackers can also strategically employ analytical tools, such as influence maximization, to maximize the spread of adversarial content through the networks. We investigate the problem of limiting the diffusion of negative information by blocking nodes and edges in the network. We formulate the interaction between the defender and the attacker as a Stackelberg game where the defender first chooses a set of nodes to block and then the attacker selects a set of seeds to spread negative information from. This yields an extremely complex bi- level optimization problem, particularly since even the standard influence measures are difficult to compute. Our approach is to approximate the attacker’s problem as the maximum node domination problem. To solve this problem, we first develop a method based on integer programming combined with constraint generation. Next, to improve scalability, we develop an approximate solution method that represents the attacker’s problem as an integer program, and then combines relaxation with duality to yield an upper bound on the defender’s objective that can be computed using mixed integer linear programming. Finally, we propose an even more scalable heuristic method that prunes nodes from the consideration set based on their degree. Extensive experiments demonstrate the efficacy of our approaches.more » « less
-
The task of extracting a diverse subset from a dataset, often referred to as maximum diversification, plays a pivotal role in various real-world applications that have far-reaching consequences. In this work, we delve into the realm of fairness-aware data subset selection, specifically focusing on the problem of selecting a diverse set of size k from a large collection of n data points (FairDiv). The FairDiv problem is well-studied in the data management and theory community. In this work, we develop the first constant approximation algorithm for FairDiv that runs in near-linear time using only linear space. In contrast, all previously known constant approximation algorithms run in super-linear time (with respect to n or k) and use super-linear space. Our approach achieves this efficiency by employing a novel combination of the Multiplicative Weight Update method and advanced geometric data structures to implicitly and approximately solve a linear program. Furthermore, we improve the efficiency of our techniques by constructing a coreset. Using our coreset, we also propose the first efficient streaming algorithm for the FairDiv problem whose efficiency does not depend on the distribution of data points. Empirical evaluation on million-sized datasets demonstrates that our algorithm achieves the best diversity within a minute. All prior techniques are either highly inefficient or do not generate a good solution.more » « less
An official website of the United States government

