Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to nonfederal websites. Their policies may differ from this site.

In this work, we consider the popular treebased search strategy within the framework of reinforcement learning, the Monte Carlo tree search (MCTS), in the context of the infinitehorizon discounted cost Markov decision process (MDP). Although MCTS is believed to provide an approximate value function for a given state with enough simulations, the claimed proof of this property is incomplete. This is because the variant of MCTS, the upper confidence bound for trees (UCT), analyzed in prior works, uses “logarithmic” bonus term for balancing exploration and exploitation within the treebased search, following the insights from stochastic multiarm bandit (MAB) literature. In effect, such an approach assumes that the regret of the underlying recursively dependent nonstationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold, even for stationary MABs. As the key contribution of this work, we establish polynomial concentration property of regret for a class of nonstationary MABs. This in turn establishes that the MCTS with appropriate polynomial rather than logarithmic bonus term in UCB has a claimed property. Interestingly enough, empirically successful approaches use a similar polynomial form of MCTS as suggested by our result. Using this as a building block, we arguemore »Free, publiclyaccessible full text available May 1, 2023

We consider sparse matrix estimation where the goal is to estimate an nbyn matrix from noisy observations of a small subset of its entries. We analyze the estimation error of the popularly used collaborative filtering algorithm for the sparse regime. Specifically, we propose a novel iterative variant of the algorithm, adapted to handle the setting of sparse observations. We establish that as long as the number of entries observed at random scale logarithmically larger than linear in n, the estimation error with respect to the entrywise max norm decays to zero as n goes to infinity, assuming the underlying matrix of interest has constant rank r. Our result is robust to model misspecification in that if the underlying matrix is approximately rank r, then the estimation error decays to the approximation error with respect to the [Formula: see text]norm. In the process, we establish the algorithm’s ability to handle arbitrary bounded noise in the observations.Free, publiclyaccessible full text available December 6, 2022

Variational methods, such as meanfield (MF) and treereweighted (TRW), provide computationally efficient approximations of the logpartition function for generic graphical models but their approximation ratio is generally not quantified. As the primary contribution of this work, we provide an approach to quantify their approximation ratio for any discrete pairwise graphical model with nonnegative potentials through a property of the underlying graph structure G. Specifically, we argue that (a variant of) TRW produces an estimate within factor K(G) which captures how far G is from tree structure. As a consequence, the approximation ratio is 1 for trees. The quantity K(G) is the solution of a minmax problem associated with the spanning tree polytope of G that can be evaluated in polynomial time for any graph. We provide a near lineartime variant that achieves an approximation ratio depending on the minimal (across edges) effective resistance of the graph. We connect our results to the graph partition approximation method and thus provide a unified perspective.

Probability distributions over rankings are crucial for the modeling and design of a wide range of practical systems. In this work, we pursue a nonparametric approach that seeks to learn a distribution over rankings (aka the ranking model) that is consistent with the observed data and has the sparsest possible support (i.e., the smallest number of rankings with nonzero probability mass). We focus on firstorder marginal data, which comprise information on the probability that item i is ranked at position j, for all possible item and position pairs. The observed data may be noisy. Finding the sparsest approximation requires brute force search in the worst case. To address this issue, we restrict our search to, what we dub, the signature family, and show that the sparsest model within the signature family can be found computationally efficiently compared with the brute force approach. We then establish that the signature family provides good approximations to popular ranking model classes, such as the multinomial logit and the exponential family classes, with support size that is small relative to the dimension of the observed data. We test our methods on two data sets: the ranked election data set from the American Psychological Association andmore »

We discuss the question of learning distribution over permutations of a given set of choices, options or items based on partial observations. This is central to capturing the so called ``choice'' in a variety of contexts: understanding preferences of consumers over a collection of products based on purchasing and browsing data in the setting of retail and ecommerce, learning public opinion amongst a collection of socioeconomic issues based on sparse polling data, deciding a ranking of teams or players based on outcomes of games, electing leaders based on votes, and more generally collaborative decision making based on collective judgement such as accepting paper(s) in a competitive academic conference. The question of learning distribution over permutations arises beyond capturing ``choice'' as well. For example, tracking a collection of objects using noisy cameras, or aggregating ranking of webpages using outcomes of multiple search engines. It is only natural that such a topic has been extensively studied in Economics, Political Science and Psychology for more than a century, and more so recently in Computer Science, Electrical Engineering, Statistics and Operations Research. Here we shall focus on the task of learning distribution over permutations from its marginal distributions of two types: firstorder marginals andmore »

The property of (quasi)reversibility of Markov chains have led to elegant characterization of steadystate distribution for complex queueing networks, e.g. celebrated Jackson networks, BCMP (Baskett, Chandi, Muntz, Palacois) and Kelly theorem. In a nutshell, despite the complicated interaction, in the steadystate, the queues in such networks exhibit independence and subsequently lead to explicit calculations of distributional properties of the queuing network that may seem impossible at the outset. The model of stochastic processing network (cf. Harrison 2000) captures variety of dynamic resource allocation problems including the flowlevel networks used for modeling bandwidth sharing in the Internet, switched networks (cf. Shah, Wischik 2006) for modeling packet scheduling in the Internet router and wireless medium access, and hybrid flowpacket networks for modeling jobandpacket level scheduling in data centers. Unlike before, an appropriate resource allocation or scheduling policy is required in such networks to achieve good performance. Given the complexity, asymptotic analytic approaches such as fluid model or LyapunovFoster criteria to establish positiverecurrence and heavy traffic or diffusion approximation to characterize the scaled steadystate distribution became method of choice. A remarkable progress has been made along these lines over the past few decades, but there is a need for much more to matchmore »

We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and denoise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) lowrank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with nonoverlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »

We consider modelfree reinforcement learning for infinitehorizon discounted Markov Decision Processes (MDPs) with a continuous state space and unknown transition kernel, when only a single sample path under an arbitrary policy of the system is available. We consider the Nearest Neighbor QLearning (NNQL) algorithm to learn the optimal Q function using nearest neighbor regression method. As the main contribution, we provide tight finite sample analysis of the convergence rate. In particular, for MDPs with a ddimensional state space and the discounted factor in (0, 1), given an arbitrary sample path with “covering time” L, we establish that the algorithm is guaranteed to output an "accurate estimate of the optimal Qfunction nearly optimal sample complexity.

We propose an algorithm to impute and forecast a time series by transforming the observed time series into a matrix, utilizing matrix estimation to recover missing values and denoise observed entries, and performing linear regression to make predictions. At the core of our analysis is a representation result, which states that for a large class of models, the transformed time series matrix is (approximately) lowrank. In effect, this generalizes the widely used Singular Spectrum Analysis (SSA) in the time series literature, and allows us to establish a rigorous link between time series analysis and matrix estimation. The key to establishing this link is constructing a Page matrix with nonoverlapping entries rather than a Hankel matrix as is commonly done in the literature (e.g., SSA). This particular matrix structure allows us to provide finite sample analysis for imputation and prediction, and prove the asymptotic consistency of our method. Another salient feature of our algorithm is that it is model agnostic with respect to both the underlying time dynamics and the noise distribution in the observations. The noise agnostic property of our approach allows us to recover the latent states when only given access to noisy and partial observations a la amore »