We study the \emph{offline reinforcement learning} (offline RL) problem, where the goal is to learn a reward-maximizing policy in an unknown \emph{Markov Decision Process} (MDP) using the data coming from a policy $$\mu$$. In particular, we consider the sample complexity problems of offline RL for the finite horizon MDPs. Prior works derive the information-theoretical lower bounds based on different data-coverage assumptions and their upper bounds are expressed by the covering coefficients which lack the explicit characterization of system quantities. In this work, we analyze the \emph{Adaptive Pessimistic Value Iteration} (APVI) algorithm and derive the suboptimality upper bound that nearly matches $$ O\left(\sum_{h=1}^H\sum_{s_h,a_h}d^{\pi^\star}_h(s_h,a_h)\sqrt{\frac{\mathrm{Var}_{P_{s_h,a_h}}{(V^\star_{h+1}+r_h)}}{d^\mu_h(s_h,a_h)}}\sqrt{\frac{1}{n}}\right). $$ We also prove an information-theoretical lower bound to show this quantity is required under the weak assumption that $$d^\mu_h(s_h,a_h)>0$$ if $$d^{\pi^\star}_h(s_h,a_h)>0$$. Here $$\pi^\star$$ is a optimal policy, $$\mu$$ is the behavior policy and $$d(s_h,a_h)$$ is the marginal state-action probability. We call this adaptive bound the \emph{intrinsic offline reinforcement learning bound} since it directly implies all the existing optimal results: minimax rate under uniform data-coverage assumption, horizon-free setting, single policy concentrability, and the tight problem-dependent results. Later, we extend the result to the \emph{assumption-free} regime (where we make no assumption on $$ \mu$$) and obtain the assumption-free intrinsic bound. Due to its generic form, we believe the intrinsic bound could help illuminate what makes a specific problem hard and reveal the fundamental challenges in offline RL.
more »
« less
Theoretical bounds on the network community profile from low-rank semi-definite programming
We study a new connection between a technical measure called $$\mu$$-conductance that arises in the study of Markov chains for sampling convex bodies and the network community profile that characterizes size-resolved properties of clusters and communities in social and information networks. The idea of $$\mu$$-conductance is similar to the traditional graph conductance, but disregards sets with small volume. We derive a sequence of optimization problems including a low-rank semi-definite program from which we can derive a lower bound on the optimal $$\mu$$-conductance value. These ideas give the first theoretically sound bound on the behavior of the network community profile for a wide range of cluster sizes. The algorithm scales up to graphs with hundreds of thousands of nodes and we demonstrate how our framework validates the predicted structures of real-world graphs.
more »
« less
- Award ID(s):
- 1839317
- PAR ID:
- 10488923
- Publisher / Repository:
- ICML
- Date Published:
- Journal Name:
- International Conference on Machine Learning
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We introduce a notion called entropic independence that is an entropic analog of spectral notions of high-dimensional expansion. Informally, entropic independence of a background distribution $$\mu$$ on $$k$$-sized subsets of a ground set of elements says that for any (possibly randomly chosen) set $$S$$, the relative entropy of a single element of $$S$$ drawn uniformly at random carries at most $O(1/k)$ fraction of the relative entropy of $$S$$. Entropic independence is the analog of the notion of spectral independence, if one replaces variance by entropy. We use entropic independence to derive tight mixing time bounds, overcoming the lossy nature of spectral analysis of Markov chains on exponential-sized state spaces. In our main technical result, we show a general way of deriving entropy contraction, a.k.a. modified log-Sobolev inequalities, for down-up random walks from spectral notions. We show that spectral independence of a distribution under arbitrary external fields automatically implies entropic independence. We furthermore extend our theory to the case where spectral independence does not hold under arbitrary external fields. To do this, we introduce a framework for obtaining tight mixing time bounds for Markov chains based on what we call restricted modified log-Sobolev inequalities, which guarantee entropy contraction not for all distributions, but for those in a sufficiently large neighborhood of the stationary distribution. To derive our results, we relate entropic independence to properties of polynomials: $$\mu$$ is entropically independent exactly when a transformed version of the generating polynomial of $$\mu$$ is upper bounded by its linear tangent; this property is implied by concavity of the said transformation, which was shown by prior work to be locally equivalent to spectral independence. We apply our results to obtain (1) tight modified log-Sobolev inequalities and mixing times for multi-step down-up walks on fractionally log-concave distributions, (2) the tight mixing time of $$O(n\log n)$$ for Glauber dynamics on Ising models whose interaction matrix has eigenspectrum lying within an interval of length smaller than $$1$$, improving upon the prior quadratic dependence on $$n$$, and (3) nearly-linear time $$\widetilde O_{\delta}(n)$$ samplers for the hardcore and Ising models on $$n$$-node graphs that have $$\delta$$-relative gap to the tree-uniqueness threshold. In the last application, our bound on the running time does not depend on the maximum degree $$\Delta$$ of the graph, and is therefore optimal even for high-degree graphs, and in fact, is sublinear in the size of the graph for high-degree graphs.more » « less
-
Cheeger's inequality shows that any undirected graph G with minimum normalized Laplacian eigenvalue lambda_G has a cut with conductance at most O(sqrt{lambda_G}). Qualitatively, Cheeger's inequality says that if the mixing time of a graph is high, there is a cut that certifies this. However, this relationship is not tight, as some graphs (like cycles) do not have cuts with conductance o(sqrt{lambda_G}). To better approximate the mixing time of a graph, we consider a more general object. Specifically, instead of bounding the mixing time with cuts, we bound it with cuts in graphs obtained by Schur complementing out vertices from the graph G. Combinatorially, these Schur complements describe random walks in G restricted to a subset of its vertices. As a result, all Schur complement cuts have conductance at least Omega(lambda_G). We show that unlike with cuts, this inequality is tight up to a constant factor. Specifically, there is a Schur complement cut with conductance at most O(lambda_G).more » « less
-
null (Ed.)We study games with nonlinear best response functions played on a network consisting of disjoint communities. Prior works on network games have identified conditions to guarantee the uniqueness and stability of Nash equilibria in a network without any community structure. In this paper we are interested in accomplishing the same for networks with a community structure; it turns out that these conditions are much easier to verify with certain community structures. Specifically, we consider multipartite graphs and show that the uniqueness and stability of Nash equilibria are related to matrices which are potentially much lower in dimension, on the order of the number of communities, compared to same-size networks without a multipartite structure, in which case such matrices have a dimension the size of the network. We further introduce a new notion of degree centrality to measure the importance and influence of a community in such a network. We show that this notion enables us to find new conditions for uniqueness and stability of Nash equilibria.more » « less
-
We study the inference of network archaeology in growing random geometric graphs. We consider the root finding problem for a random nearest neighbor tree in dimension d∈N, generated by sequentially embedding vertices uniformly at random in the d-dimensional torus and connecting each new vertex to the nearest existing vertex. More precisely, given an error parameter ε>0 and the unlabeled tree, we want to efficiently find a small set of candidate vertices, such that the root is included in this set with probability at least 1−ε. We call such a candidate set a confidence set. We define several variations of the root finding problem in geometric settings -- embedded, metric, and graph root finding -- which differ based on the nature of the type of metric information provided in addition to the graph structure (torus embedding, edge lengths, or no additional information, respectively). We show that there exist efficient root finding algorithms for embedded and metric root finding. For embedded root finding, we derive upper and lower bounds (uniformly bounded in n) on the size of the confidence set: the upper bound is subpolynomial in 1/ε and stems from an explicit efficient algorithm, and the information-theoretic lower bound is polylogarithmic in 1/ε. In particular, in d=1, we obtain matching upper and lower bounds for a confidence set of size Θ(log(1/ε)loglog(1/ε)).more » « less