

Title: Hoeffding's inequality for Markov chains and its applications to statistical learning
This paper establishes Hoeffding's lemma and inequality for bounded functions of general-state-space and not necessarily reversible Markov chains. The sharpness of these results is characterized by the optimality of the ratio between variance proxies in the Markov-dependent and independent settings. The boundedness of the functions is shown to be necessary for such results to hold in general. To showcase the usefulness of the new results, we apply them to non-asymptotic analyses of MCMC estimation, respondent-driven sampling and high-dimensional covariance matrix estimation on time series data with a Markovian nature. In addition to statistical problems, we also apply them to study the time-discounted rewards in econometric models and the multi-armed bandit problem with Markovian rewards arising from the field of machine learning.
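A schematic comparison may help fix ideas. The form below pairs the classical i.i.d. Hoeffding inequality with the Markov analogue stated only up to the variance-proxy inflation; the exact constants are those of the paper, and $\lambda$ here denotes an assumed spectral parameter with absolute spectral gap $1-\lambda$. For independent $X_1,\dots,X_n$ with $f_i(X_i) \in [a_i, b_i]$:

```latex
% Classical Hoeffding inequality (independent case):
\mathbb{P}\Big(\Big|\sum_{i=1}^n \big(f_i(X_i) - \mathbb{E} f_i(X_i)\big)\Big| \ge t\Big)
  \;\le\; 2\exp\Big(-\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big).
% Markov-dependent analogue (schematic): for a stationary chain with
% absolute spectral gap 1 - \lambda, the variance proxy is inflated by
% a factor of order (1+\lambda)/(1-\lambda):
\mathbb{P}\Big(\Big|\sum_{i=1}^n \big(f_i(X_i) - \pi(f_i)\big)\Big| \ge t\Big)
  \;\le\; 2\exp\Big(-\frac{1-\lambda}{1+\lambda}\cdot\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}\Big).
```

The optimality of the ratio between the two variance proxies is what the abstract refers to as sharpness.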
Award ID(s):
1662139 1712591
NSF-PAR ID:
10326333
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of Machine Learning Research
Volume:
22
ISSN:
1533-7928
Page Range / eLocation ID:
1-35
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov Decision Process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose the internal structure of reward functions. RMs have the capacity to describe high-level knowledge and encode non-Markovian reward functions. To tackle the computational complexity, we propose a decentralized learning algorithm, called decentralized graph-based reinforcement learning using reward machines (DGRM), that equips each agent with a localized policy, allowing agents to make decisions independently based on the information available to them. DGRM uses the actor-critic structure, and we introduce a tabular Q-function for discrete-state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and policy function to solve large-scale or continuous-state problems. The effectiveness of the proposed DGRM algorithm is evaluated in three case studies: two wireless communication case studies with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. DGRM improves the global accumulated reward by 119% compared to the baseline in the case of COVID-19 pandemic mitigation.
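The reward-machine idea above can be made concrete with a minimal sketch: a finite automaton whose transitions are triggered by high-level propositions and emit rewards, so that a non-Markovian task ("do A, then B") becomes Markovian once the machine state is tracked. The class and task below are illustrative assumptions, not the DGRM implementation.

```python
# Minimal reward-machine sketch (illustrative; not the paper's DGRM code).
class RewardMachine:
    def __init__(self, transitions, initial_state):
        # transitions: {(state, proposition): (next_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, proposition):
        """Advance on an observed proposition; return the emitted reward."""
        if (self.state, proposition) in self.transitions:
            self.state, reward = self.transitions[(self.state, proposition)]
            return reward
        return 0.0  # unlisted propositions leave the machine unchanged

# Hypothetical task "observe A, then B": reward 1 only when B follows A.
rm = RewardMachine({("u0", "A"): ("u1", 0.0),
                    ("u1", "B"): ("u2", 1.0)}, "u0")
rewards = [rm.step(p) for p in ["B", "A", "B"]]  # an early B earns nothing
```

Because the machine state summarizes the relevant history, an agent conditioning its policy on (environment state, machine state) can learn a temporally extended task with standard Markovian methods.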
  2. Abstract Motivation

    Genome annotations are a common way to represent genomic features such as genes, regulatory elements or epigenetic modifications. The amount of overlap between two annotations is often used to ascertain if there is an underlying biological connection between them. In order to distinguish between true biological association and overlap by pure chance, a robust measure of significance is required. One common way to do this is to determine if the number of intervals in the reference annotation that intersect the query annotation is statistically significant. However, currently employed statistical frameworks are often either inefficient or inaccurate when computing P-values on the scale of the whole human genome.

    Results

    We show that finding the P-values under the typically used ‘gold’ null hypothesis is NP-hard. This motivates us to reformulate the null hypothesis using Markov chains. To be able to measure the fidelity of our Markovian null hypothesis, we develop a fast direct sampling algorithm to estimate the P-value under the gold null hypothesis. We then present an open-source software tool MCDP that computes the P-values under the Markovian null hypothesis in O(m²+n) time and O(m) memory, where m and n are the numbers of intervals in the reference and query annotations, respectively. Notably, MCDP runtime and memory usage are independent of the genome length, allowing it to outperform previous approaches in runtime and memory usage by orders of magnitude on human genome annotations, while maintaining the same level of accuracy.
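To illustrate what a Markov-chain null for annotation overlap looks like, here is a naive Monte Carlo sketch: the query annotation is resampled as the "inside an interval" state of a two-state Markov chain along the genome, and the observed number of intersected reference intervals is compared against this null. This is only an assumed simplification for intuition; MCDP itself computes the P-values exactly, not by simulation.

```python
# Naive Monte Carlo sketch of a Markov-chain null for interval overlap
# (illustrative only; not the MCDP algorithm, which is exact and O(m^2 + n)).
import random

def sample_annotation(genome_len, p_enter, p_exit, rng):
    """Sample a binary 'inside an interval' track from a two-state Markov chain."""
    track, inside = [], False
    for _ in range(genome_len):
        inside = (rng.random() < p_enter) if not inside else (rng.random() >= p_exit)
        track.append(inside)
    return track

def hits(reference, track):
    """Count reference intervals [s, e) that intersect the sampled annotation."""
    return sum(any(track[s:e]) for s, e in reference)

def mc_pvalue(reference, observed_hits, genome_len, p_enter, p_exit,
              trials=2000, seed=0):
    rng = random.Random(seed)
    extreme = sum(
        hits(reference, sample_annotation(genome_len, p_enter, p_exit, rng))
        >= observed_hits
        for _ in range(trials)
    )
    return (extreme + 1) / (trials + 1)  # add-one estimator, avoids P = 0
```

The chain's enter/exit probabilities would be fitted so that the null preserves the query's interval count and length statistics; the exact dynamic-programming computation replaces the sampling loop in MCDP.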

    Availability and implementation

    The software is available at https://github.com/fmfi-compbio/mc-overlaps. All data for reproducibility are available at https://github.com/fmfi-compbio/mc-overlaps-reproducibility.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  3.
    We study ergodic properties of Markovian multiclass many-server queues that are uniform over scheduling policies and the size of the system. The system is heavily loaded in the Halfin–Whitt regime, and the scheduling policies are work conserving and preemptive. We provide a unified approach via a Lyapunov function method that establishes Foster–Lyapunov equations for both the limiting diffusion and the prelimit diffusion-scaled queuing processes simultaneously. We first study the limiting controlled diffusion and show that if the spare capacity (safety staffing) parameter is positive, the diffusion is exponentially ergodic uniformly over all stationary Markov controls, and the invariant probability measures have uniform exponential tails. This result is sharp because when there is no abandonment and the spare capacity parameter is negative, the controlled diffusion is transient under any Markov control. In addition, we show that if all the abandonment rates are positive, the invariant probability measures have sub-Gaussian tails regardless of whether the spare capacity parameter is positive or negative. Using these results, we proceed to establish the corresponding ergodic properties for the diffusion-scaled queuing processes. In addition to providing a simpler proof of previous results in Gamarnik and Stolyar [Gamarnik D, Stolyar AL (2012) Multiclass multiserver queueing system in the Halfin–Whitt heavy traffic regime: asymptotics of the stationary distribution. Queueing Systems 71(1–2):25–51], we extend these results to multiclass models with renewal arrival processes, albeit under the assumption that the mean residual life functions are bounded. For the Markovian model with Poisson arrivals, we obtain stronger results and show that the convergence to the stationary distribution is at an exponential rate uniformly over all work-conserving stationary Markov scheduling policies.
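The Foster–Lyapunov equations mentioned above take, schematically, the following drift form (the notation here is the standard one, assumed rather than taken from the paper): for the generator $\mathcal{L}_u$ of the limiting diffusion under a stationary Markov control $u$, one seeks a function $V \ge 1$, constants $c > 0$ and $b < \infty$, and a compact set $K$ such that, uniformly in $u$,

```latex
% Foster--Lyapunov drift condition, uniform over stationary Markov controls u:
\mathcal{L}_u V(x) \;\le\; -c\, V(x) \;+\; b\, \mathbf{1}_K(x)
  \qquad \text{for all } x.
% Such a drift condition yields exponential ergodicity with
% control-independent constants:
\big\| P_t^{u}(x, \cdot) - \pi_u \big\|_{\mathrm{TV}}
  \;\le\; C\, V(x)\, e^{-\gamma t}.
```

The uniformity of $c$, $b$, $C$ and $\gamma$ over all controls is what makes the ergodicity "uniform over scheduling policies" in the sense asserted above.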
  4. Abstract

    Organisms are constantly making tradeoffs. These tradeoffs may be behavioural (e.g. whether to focus on foraging or predator avoidance) or physiological (e.g. whether to allocate energy to reproduction or growth). Similarly, wildlife and fishery managers must make tradeoffs while striving for conservation or economic goals (e.g. costs vs. rewards). Stochastic dynamic programming (SDP) provides a powerful and flexible framework within which to explore these tradeoffs. A rich body of mathematical results on SDP exists but has received little attention in ecology and evolution.

    Using directed graphs – an intuitive visual model representation – we reformulated SDP models into matrix form. We synthesized relevant existing theoretical results which we then applied to two canonical SDP models in ecology and evolution. We applied these matrix methods to a simple illustrative patch choice example and an existing SDP model of parasitoid wasp behaviour.

    The proposed analytical matrix methods provide the same results as standard numerical methods as well as additional insights into the nature and quantity of other, nearly optimal, strategies, which we may also expect to observe in nature. The mathematical results highlighted in this work also explain qualitative aspects of model convergence. An added benefit of the proposed matrix notation is the resulting ease of implementation of Markov chain analysis (an exact solution for the realized states of an individual) rather than Monte Carlo simulations (the standard, approximate method). It also provides an independent validation method for other numerical methods, even in applications focused on short‐term, non‐stationary dynamics.

    These methods are useful for obtaining, interpreting, and further analysing model convergence to the optimal time‐independent (i.e. stationary) decisions predicted by an SDP model. SDP is a powerful tool both for theoretical and applied ecology, and an understanding of the mathematical structure underlying SDP models can increase our ability to apply and interpret these models.
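The matrix reformulation of SDP described above can be sketched as backward induction with stacked transition matrices: the value recursion is V_t(s) = max_a [R(s, a) + (P_a V_{t+1})(s)]. The toy two-state, two-action model below is an assumed illustration, not the patch-choice or parasitoid-wasp model from the paper.

```python
# Hedged sketch of SDP backward induction in matrix form (illustrative toy
# model, not the paper's examples).
import numpy as np

def backward_induction(P, R, horizon, terminal):
    """P: (A, S, S) transition matrices; R: (S, A) rewards; terminal: (S,)."""
    V = terminal.astype(float)
    policy = []
    for _ in range(horizon):
        # Q[s, a] = R[s, a] + sum_{s'} P[a][s, s'] * V[s']
        Q = R + np.stack([P[a] @ V for a in range(P.shape[0])], axis=1)
        policy.append(Q.argmax(axis=1))
        V = Q.max(axis=1)
    return V, policy[::-1]  # decisions ordered from t = 0

# Two states, two actions: action 1 pays a small cost to jump to the
# rewarding state 1; action 0 stays put.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],   # action 0: stay
              [[0.0, 1.0], [0.0, 1.0]]])  # action 1: move to state 1
R = np.array([[0.0, -0.1],
              [1.0,  1.0]])
V, policy = backward_induction(P, R, horizon=3, terminal=np.zeros(2))
```

Because each step is a deterministic matrix operation, the optimal policy's induced Markov chain (select the row of P_a matching each state's chosen action) can then be analysed exactly, which is the Markov chain analysis the paragraph above contrasts with Monte Carlo simulation.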

     
  5. Francisco Ruiz, Jennifer Dy (Ed.)
    Graphical models such as Markov random fields (MRFs), which are associated with undirected graphs, and Bayesian networks (BNs), which are associated with directed acyclic graphs, have proven to be a very popular approach for reasoning under uncertainty, prediction problems and causal inference. Parametric MRF likelihoods are well-studied for Gaussian and categorical data. However, in more complicated parametric and semi-parametric settings, likelihoods specified via clique potential functions are generally not known to be congenial (jointly well-specified) or non-redundant. Congenial and non-redundant DAG likelihoods are far simpler to specify in both parametric and semi-parametric settings by modeling Markov factors in the DAG factorization. However, DAG likelihoods specified in this way are not guaranteed to coincide in distinct DAGs within the same Markov equivalence class. This complicates likelihood-based model selection procedures for DAGs by “sneaking in” potentially unwarranted assumptions about edge orientations. In this paper we link a density function decomposition due to Chen with the clique factorization of MRFs described by Lauritzen to provide a general likelihood for MRF models. The proposed likelihood is composed of variationally independent and non-redundant closed-form functionals of the observed data distribution, and is sufficiently general to apply to arbitrary parametric and semi-parametric models. We use an extension of our developments to give a general likelihood for DAG models that is guaranteed to coincide for all members of a Markov equivalence class. Our results have direct applications for model selection and semi-parametric inference.
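The two factorizations being linked can be written side by side in standard notation (generic, not specific to the paper). An MRF over an undirected graph with clique set $\mathcal{C}$ factorizes through clique potentials, while a DAG model factorizes through Markov factors:

```latex
% MRF clique factorization (Lauritzen):
p(x) \;=\; \frac{1}{Z} \prod_{C \in \mathcal{C}} \phi_C(x_C),
  \qquad Z \;=\; \sum_{x} \prod_{C \in \mathcal{C}} \phi_C(x_C).
% DAG factorization into Markov factors:
p(x) \;=\; \prod_{i} p\big(x_i \mid x_{\mathrm{pa}(i)}\big).
```

Two DAGs in the same Markov equivalence class induce the same set of joint distributions, yet their Markov factors $p(x_i \mid x_{\mathrm{pa}(i)})$ change with edge orientation, which is the source of the "sneaking in" problem that the abstract describes.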