Title: Decentralized Online Influence Maximization
We consider the problem of finding the maximally influential node in random networks where each node influences every other node with a constant yet unknown probability. We develop an online algorithm that learns the relative influences of the nodes. It relaxes the assumption in the existing literature that a central observer can monitor the influence spread globally: the proposed algorithm delegates the online updates to the nodes on the network and hence requires only local observations at the nodes. We show that, using an explore-then-commit learning strategy, the cumulative regret accumulated by the algorithm over a horizon T approaches O(T^{2/3}) for a network with a large number of nodes. Additionally, we show that, for fixed T, the worst-case regret grows linearly with the number n of nodes in the graph. Numerical experiments illustrate this linear dependence for Chung-Lu models. The experiments also demonstrate that ε-greedy learning strategies can achieve performance similar to the explore-then-commit strategy on Chung-Lu models.
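The abstract above only names the explore-then-commit strategy. As a rough illustration of that idea (not the paper's decentralized algorithm, which delegates updates to the nodes), here is a minimal centralized Python sketch; the instance, the spread model, and the exploration budget m are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance (assumed, not from the paper): node i influences each other
# node independently with an unknown probability p[i]; the reward of seeding
# node i for one round is the number of nodes it influences.
n, T = 20, 30_000
p = rng.uniform(0.05, 0.5, size=n)            # unknown to the learner

def spread(i):
    """One observed cascade from seeding node i."""
    return rng.binomial(n - 1, p[i])

# Explore-then-commit: play each node m times, then commit to the empirical
# best.  m ~ (T / n)^(2/3) is the usual schedule behind O(T^{2/3}) regret.
m = max(1, int((T / n) ** (2 / 3)))
estimates = np.zeros(n)
reward, t = 0.0, 0

for i in range(n):                             # exploration phase
    obs = [spread(i) for _ in range(m)]
    estimates[i] = np.mean(obs)
    reward += sum(obs)
    t += m

best = int(np.argmax(estimates))               # commit phase
for _ in range(T - t):
    reward += spread(best)

regret = T * (n - 1) * p.max() - reward        # vs. always seeding the best node
print(f"committed to node {best}; empirical regret ≈ {regret:.0f}")
```

In the paper's decentralized setting the per-node estimates would be maintained locally by the nodes themselves rather than by a single learner as in this toy version.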
Award ID(s):
1646522 1652113
PAR ID:
10383194
Author(s) / Creator(s):
;
Date Published:
Journal Name:
58th Annual Allerton Conference on Communication, Control, and Computing (Allerton)
Page Range / eLocation ID:
1 to 8
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Cussens, James; Zhang, Kun (Ed.)
    We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of the selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^{1/3} k^{4/3} T^{2/3} \log(T)^{1/2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods. (A toy sketch of this explore-then-commit-greedy pattern appears after this list.)
  2.
    Regret minimization has proved to be a versatile tool for tree-form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player’s decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs—even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restrictions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used—for example, those in which only black-box access to the environment is available. We give the first, to our knowledge, regret-minimization algorithm that guarantees sublinear regret with high probability even when requirement (i)—and thus also (ii)—is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and gets revealed incrementally whenever the agent encounters new decision points. We give an efficient algorithm that achieves O(T^{3/4}) regret with high probability for that setting, even when the agent faces an adversarial environment. Our experiments show it significantly outperforms the prior algorithms for the problem, which do not have such guarantees. It can be used in any application for which regret minimization is useful: approximating Nash equilibrium or quantal response equilibrium, approximating coarse correlated equilibrium in multi-player games, learning a best response, learning safe opponent exploitation, and online play against an unknown opponent/environment.
  3. We investigate the problem of combinatorial multi-armed bandits with stochastic submodular (in expectation) rewards and full-bandit feedback, where no extra information other than the reward of the selected action at each time step $t$ is observed. We propose a simple algorithm, Explore-Then-Commit Greedy (ETCG), and prove that it achieves a $(1-1/e)$-regret upper bound of $\mathcal{O}(n^{1/3} k^{4/3} T^{2/3} \log(T)^{1/2})$ for a horizon $T$, number of base elements $n$, and cardinality constraint $k$. We also show in experiments with synthetic and real-world data that ETCG empirically outperforms other full-bandit methods.
  4. We consider the online linear optimization problem, where at every step the algorithm plays a point x_t in the unit ball, and suffers loss ⟨c_t, x_t⟩ for some cost vector c_t that is then revealed to the algorithm. Recent work showed that if an algorithm receives a "hint" h_t that has non-trivial correlation with c_t before it plays x_t, then it can achieve a logarithmic regret guarantee, improving on the classical sqrt(T) bound. In this work, we study the question of whether an algorithm really requires a hint at every time step. Somewhat surprisingly, we show that an algorithm can obtain logarithmic regret with just O(sqrt(T)) hints under a natural query model. We give two applications of our result, to the well-studied setting of optimistic regret bounds and to the problem of online learning with abstention.
  5. We propose a model for online graph problems where algorithms are given access to an oracle that predicts (e.g., based on modeling assumptions or on past data) the degrees of nodes in the graph. Within this model, we study the classic problem of online bipartite matching, and a natural greedy matching algorithm called MinPredictedDegree, which uses predictions of the degrees of offline nodes. For the bipartite version of a stochastic graph model due to Chung, Lu, and Vu where the expected values of the offline degrees are known and used as predictions, we show that MinPredictedDegree stochastically dominates any other online algorithm, i.e., it is optimal for graphs drawn from this model. Since the “symmetric” version of the model, where all online nodes are identical, is a special case of the well-studied “known i.i.d. model”, it follows that the competitive ratio of MinPredictedDegree on such inputs is at least 0.7299. For the special case of graphs with power law degree distributions, we show that MinPredictedDegree frequently produces matchings almost as large as the true maximum matching on such graphs. We complement these results with an extensive empirical evaluation showing that MinPredictedDegree compares favorably to state-of-the-art online algorithms for online matching. (A toy sketch of the MinPredictedDegree rule appears after this list.)
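Items 1 and 3 above describe the Explore-Then-Commit Greedy (ETCG) procedure only in words. The Python sketch below is a rough, assumed rendering of that pattern, not the authors' implementation: the coverage-style reward, the exploration budget m, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy instance (assumed): pick a set S of k arms out of n base elements; the
# observed reward for S is a noisy probabilistic-coverage value, which is
# monotone submodular in expectation.
n, k, T = 10, 3, 20_000
weights = rng.uniform(0.1, 0.9, size=(n, 25))   # element -> coverage probabilities

def play(S):
    """Full-bandit feedback: a single noisy reward for the whole set S."""
    cover = 1.0 - np.prod(1.0 - weights[list(S)], axis=0)
    return cover.sum() + rng.normal(0.0, 0.5)

# Explore-then-commit greedy: in each of the k phases, try every remaining
# element on top of the current set S for m rounds, add the empirically best
# one, and finally commit to S.  The budget m is an assumed schedule.
m = max(1, int(T ** (2 / 3) / (n * k)))
S, t = [], 0
for _ in range(k):
    means = {}
    for a in range(n):
        if a in S:
            continue
        means[a] = np.mean([play(S + [a]) for _ in range(m)])
        t += m
    S.append(max(means, key=means.get))

while t < T:                                     # commit phase
    play(S)
    t += 1

print("committed set:", sorted(S))
```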
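Item 5's MinPredictedDegree rule is simple to state: match each arriving online node to its unmatched neighbor with the smallest predicted degree. The toy simulation below illustrates that rule under an assumed Chung-Lu-style instance; it is not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy Chung-Lu-style bipartite instance (an assumption, not the paper's exact
# model): offline node u has expected degree d[u], and that same value is
# used as its degree prediction.
n_off, n_on = 50, 50
d = rng.pareto(2.0, size=n_off) + 1.0            # heavy-tailed expected degrees

matching = {}                                     # online node -> offline node
matched = set()
for v in range(n_on):                             # online nodes arrive one at a time
    # Edges to v are revealed on arrival; edge (u, v) exists w.p. ~ d[u] / n_off.
    nbrs = [u for u in range(n_off) if rng.random() < min(1.0, d[u] / n_off)]
    free = [u for u in nbrs if u not in matched]
    if free:
        u = min(free, key=lambda x: d[x])         # MinPredictedDegree rule
        matched.add(u)
        matching[v] = u

print("matching size:", len(matching))
```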