We study the \emph{offline reinforcement learning} (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown \emph{Markov Decision Process} (MDP) using the data coming from a policy $$\mu$$. In particular, we consider the sample complexity problems of offline RL for the finite horizon MDPs. Prior works derive the information-theoretical lower bounds based on different data-coverage assumptions and their upper bounds are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the \emph{Adaptive Pessimistic Value Iteration} (APVI) algorithm and derive the suboptimality upper bound that nearly matches $$ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). $$ We also prove an information-theoretical lower bound to show this quantity is required under the weak assumption that $$d^\mu_h(s_h,a_h)>0$$ if $$d^{\pi^\star}_h(s_h,a_h)>0$$. Here $$\pi^\star$$ is a optimal policy, $$\mu$$ is the behavior policy and $$d(s_h,a_h)$$ is the marginal state-action probability. We call this adaptive bound the \emph{intrinsic offline reinforcement learning bound} since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the \emph{assumption-free} regime (where we make no assumption on $$ \mu$$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
more »
« less
Theoretical bounds on the network community profile from low-rank semi-definite programming
We study a new connection between a technical measure called $$\mu$$-conductance that arises in the study of Markov chains for sampling convex bodies and the network community profile that characterizes size-resolved properties of clusters and communities in social and information networks. The idea of $$\mu$$-conductance is similar to the traditional graph conductance, but disregards sets with small volume. We derive a sequence of optimization problems including a low-rank semi-definite program from which we can derive a lower bound on the optimal $$\mu$$-conductance value. These ideas give the first theoretically sound bound on the behavior of the network community profile for a wide range of cluster sizes. The algorithm scales up to graphs with hundreds of thousands of nodes and we demonstrate how our framework validates the predicted structures of real-world graphs.
more »
« less
- Award ID(s):
- 1839317
- PAR ID:
- 10488923
- Publisher / Repository:
- ICML
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We introduce a notion called entropic independence that is an entropic analog of spectral notions of high-dimensional expansion. Informally, entropic independence of a background distribution $$\mu$$ on $$k$$-sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $$S$$, the relative entropy of a single element of $$S$$ drawn uniformly at random carries at most $O(1/k)$ fraction of the relative entropy of $$S$$. Entropic independence is the analog of the notion of spectral independence, if one replaces variance by entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponential-sized state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified log-Sobolev inequalities, for down-up random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence. We furthermore extend our theory to the case where spectral independence does not hold under arbitrary external fields. To do this, we introduce a framework for obtaining tight mixing time bounds for Markov chains based on what we call restricted modified log-Sobolev inequalities, which guarantee entropy contraction not for all distributions, but for those in a sufficiently large neighborhood of the stationary distribution. To derive our results, we relate entropic independence to properties of polynomials: $$\mu$$ is entropically independent exactly when a transformed version of the generating polynomial of $$\mu$$ is upper bounded by its linear tangent; this property is implied by concavity of the said transformation, which was shown by prior work to be locally equivalent to spectral independence. We apply our results to obtain (1) tight modified log-Sobolev inequalities and mixing times for multi-step down-up walks on fractionally log-concave distributions, (2) the tight mixing time of $$O(n\log n)$$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $$1$$, improving upon the prior quadratic dependence on $$n$$, and (3) nearly-linear time $$\widetilde O_{\delta}(n)$$ samplers for the hardcore and Ising models on $$n$$-node graphs that have $$\delta$$-relative gap to the tree-uniqueness threshold. In the last application, our bound on the running time does not depend on the maximum degree $$\Delta$$ of the graph, and is therefore optimal even for high-degree graphs, and in fact, is sublinear in the size of the graph for high-degree graphs.more » « less
-
Cheeger's inequality shows that any undirected graph G with minimum normalized Laplacian eigenvalue lambda_G has a cut with conductance at most O(sqrt{lambda_G}). Qualitatively, Cheeger's inequality says that if the mixing time of a graph is high, there is a cut that certifies this. However, this relationship is not tight, as some graphs (like cycles) do not have cuts with conductance o(sqrt{lambda_G}). To better approximate the mixing time of a graph, we consider a more general object. Specifically, instead of bounding the mixing time with cuts, we bound it with cuts in graphs obtained by Schur complementing out vertices from the graph G. Combinatorially, these Schur complements describe random walks in G restricted to a subset of its vertices. As a result, all Schur complement cuts have conductance at least Omega(lambda_G). We show that unlike with cuts, this inequality is tight up to a constant factor. Specifically, there is a Schur complement cut with conductance at most O(lambda_G).more » « less
-
Consider public health officials aiming to spread awareness about a new vaccine in a community interconnected by a social network. How can they distribute information with minimal resources, so as to avoid polarization and ensure community-wide convergence of opinion? To tackle such challenges, we initiate the study of sample complexity of opinion formation in networks. Our framework is built on the recognized opinion formation game, where we regard each agent’s opinion as a data-derived model, unlike previous works that treat opinions as data-independent scalars. The opinion model for every agent is initially learned from its local samples and evolves game-theoretically as all agents communicate with neighbors and revise their models towards an equilibrium. Our focus is on the sample complexity needed to ensure that the opinions converge to an equilibrium such that every agent’s final model has low generalization error. Our paper has two main technical results. First, we present a novel polynomial time optimization framework to quantify the total sample complexity for arbitrary networks, when the underlying learning problem is (generalized) linear regression. Second, we leverage this optimization to study the network gain which measures the improvement of sample complexity when learning over a network compared to that in isolation. Towards this end, we derive network gain bounds for various network classes including cliques, star graphs, and random regular graphs. Additionally, our framework provides a method to study sample distribution within the network, suggesting that it is sufficient to allocate samples inversely to the degree. Empirical results on both synthetic and real-world networks strongly support our theoretical findings.more » « less
-
Nowadays, large-scale graph data is being generated in a variety of real-world applications, from social networks to co-authorship networks, from protein-protein interaction networks to road traffic networks. Many existing works on graph mining focus on the vertices and edges, with the first-order Markov chain as the underlying model. They fail to explore the high-order network structures, which are of key importance in many high impact domains. For example, in bank customer personally identifiable information (PII) networks, the star structures often correspond to a set of synthetic identities; in financial transaction networks, the loop structures may indicate the existence of money laundering. In this paper, we focus on mining user-specified high-order network structures and aim to find a structure-rich subgraph which does not break many such structures by separating the subgraph from the rest. A key challenge associated with finding a structure-rich subgraph is the prohibitive computational cost. To address this problem, inspired by the family of local graph clustering algorithms for efficiently identifying a low-conductance cut without exploring the entire graph, we propose to generalize the key idea to model high-order network structures. In particular, we start with a generic definition of high-order conductance, and define the high-order diffusion core, which is based on a high-order random walk induced by user-specified high-order network structure. Then we propose a novel High-Order Structure-Preserving LOcal Cut (HOSPLOC) algorithm, which runs in polylogarithmic time with respect to the number of edges in the graph. It starts with a seed vertex and iteratively explores its neighborhood until a subgraph with a small high-order conductance is found. Furthermore, we analyze its performance in terms of both effectiveness and efficiency. The experimental results on both synthetic graphs and real graphs demonstrate the effectiveness and efficiency of our proposed HOSPLOC algorithm.more » « less
An official website of the United States government

