 NSFPAR ID:
 10380131
 Date Published:
 Journal Name:
 Advances in neural information processing systems
 ISSN:
 10495258
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Abstract We consider a collection of Markov chains that model the evolution of multitype biological populations. The state space of the chains is the positive orthant, and the boundary of the orthant is the absorbing state for the Markov chain and represents the extinction states of different population types. We are interested in the longterm behavior of the Markov chain away from extinction, under a small noise scaling. Under this scaling, the trajectory of the Markov process over any compact interval converges in distribution to the solution of an ordinary differential equation (ODE) evolving in the positive orthant. We study the asymptotic behavior of the quasistationary distributions (QSD) in this scaling regime. Our main result shows that, under conditions, the limit points of the QSD are supported on the union of interior attractors of the flow determined by the ODE. We also give lower bounds on expected extinction times which scale exponentially with the system size. Results of this type when the deterministic dynamical system obtained under the scaling limit is given by a discretetime evolution equation and the dynamics are essentially in a compact space (namely, the onestep map is a bounded function) have been studied by Faure and Schreiber (2014). Our results extend these to a setting of an unbounded state space and continuoustime dynamics. The proofs rely on uniform large deviation results for small noise stochastic dynamical systems and methods from the theory of continuoustime dynamical systems. In general, QSD for Markov chains with absorbing states and unbounded state spaces may not exist. We study one basic family of binomialPoisson models in the positive orthant where one can use Lyapunov function methods to establish existence of QSD and also to argue the tightness of the QSD of the scaled sequence of Markov chains. The results from the first part are then used to characterize the support of limit points of this sequence of QSD.more » « less

Abstract Continuoustime Markov chains are frequently used as stochastic models for chemical reaction networks, especially in the growing field of systems biology. A fundamental problem for these Stochastic Chemical Reaction Networks (SCRNs) is to understand the dependence of the stochastic behavior of these systems on the chemical reaction rate parameters. Towards solving this problem, in this paper we develop theoretical tools called comparison theorems that provide stochastic ordering results for SCRNs. These theorems give sufficient conditions for monotonic dependence on parameters in these network models, which allow us to obtain, under suitable conditions, information about transient and steadystate behavior. These theorems exploit structural properties of SCRNs, beyond those of general continuoustime Markov chains. Furthermore, we derive two theorems to compare stationary distributions and mean first passage times for SCRNs with different parameter values, or with the same parameters and different initial conditions. These tools are developed for SCRNs taking values in a generic (finite or countably infinite) state space and can also be applied for nonmassaction kinetics models. When propensity functions are bounded, our method of proof gives an explicit method for coupling two comparable SCRNs, which can be used to simultaneously simulate their sample paths in a comparable manner. We illustrate our results with applications to models of enzymatic kinetics and epigenetic regulation by chromatin modifications.

We consider the imitation learning problem of learning a policy in a Markov Decision Process (MDP) setting where the reward function is not given, but demonstrations from experts are available. Although the goal of imitation learning is to learn a policy that produces behaviors nearly as good as the experts’ for a desired task, assumptions of consistent optimality for demonstrated behaviors are often violated in practice. Finding a policy that is distributionally robust against noisy demonstrations based on an adversarial construction potentially solves this problem by avoiding optimistic generalizations of the demonstrated data. This paper studies Distributionally Robust Imitation Learning (DRoIL) and establishes a close connection between DRoIL and Maximum Entropy Inverse Reinforcement Learning. We show that DRoIL can be seen as a framework that maximizes a generalized concept of entropy. We develop a novel approach to transform the objective function into a convex optimization problem over a polynomial number of variables for a class of loss functions that are additive over state and action spaces. Our approach lets us optimize both stationary and nonstationary policies and, unlike prevalent previous methods, it does not require repeatedly solving an inner reinforcement learning problem. We experimentally show the significant benefits of DRoIL’s new optimization method on synthetic data and a highway driving environment.more » « less

Berry, Jonathan ; Shmoys, David ; Cowen, Lenore ; Naumann, Uwe (Ed.)In the United States, regions (such as states or counties) are frequently divided into districts for the purpose of electing representatives. How the districts are drawn can have a profound effect on who's elected, and drawing the districts to give an advantage to a certain group is known as gerrymandering. It can be surprisingly difficult to detect when gerrymandering is occurring, but one algorithmic method is to compare a current districting plan to a large number of randomly sampled plans to see whether it is an outlier. Recombination Markov chains are often used to do this random sampling: randomly choose two districts, consider their union, and split this union up in a new way. This approach works well in practice and has been widely used, including in litigation, but the theory behind it remains underdeveloped. For example, it's not known if recombination Markov chains are irreducible, that is, if recombination moves suffice to move from any districting plan to any other. Irreducibility of recombination Markov chains can be formulated as a graph problem: for a planar graph G, is the space of all partitions of G into κ connected subgraphs (κ districts) connected by recombination moves? While the answer is yes when districts can be as small as one vertex, this is not realistic in realworld settings where districts must have approximately balanced populations. Here we fix district sizes to be κ1 ± 1 vertices, κ2 ± 1 vertices,… for fixed κ1, κ2,…, a more realistic setting. We prove for arbitrarily large triangular regions in the triangular lattice, when there are three simply connected districts, recombination Markov chains are irreducible. This is the first proof of irreducibility under tight district size constraints for recombination Markov chains beyond small or trivial examples. The triangular lattice is the most natural setting in which to first consider such a question, as graphs representing states/regions are frequently triangulated. The proof uses a sweepline argument, and there is hope it will generalize to more districts, triangulations satisfying mild additional conditions, and other redistricting Markov chains.more » « less

Many transit agencies operating paratransit and microtransit services have to respond to trip requests that arrive in realtime, which entails solving hard combinatorial and sequential decisionmaking problems under uncertainty. To avoid decisions that lead to significant inefficiency in the long term, vehicles should be allocated to requests by optimizing a nonmyopic utility function or by batching requests together and optimizing a myopic utility function. While the former approach is typically offline, the latter can be performed online. We point out two major issues with such approaches when applied to paratransit services in practice. First, it is difficult to batch paratransit requests together as they are temporally sparse. Second, the environment in which transit agencies operate changes dynamically (e.g., traffic conditions can change over time), causing the estimates that are learned offline to become stale. To address these challenges, we propose a fully online approach to solve the dynamic vehicle routing problem (DVRP) with time windows and stochastic trip requests that is robust to changing environmental dynamics by construction. We focus on scenarios where requests are relatively sparseour problem is motivated by applications to paratransit services. We formulate DVRP as a Markov decision process and use Monte Carlo tree search to evaluate actions for any given state. Accounting for stochastic requests while optimizing a nonmyopic utility function is computationally challenging; indeed, the action space for such a problem is intractably large in practice. To tackle the large action space, we leverage the structure of the problem to design heuristics that can sample promising actions for the tree search. Our experiments using realworld data from our partner agency show that the proposed approach outperforms existing stateoftheart approaches both in terms of performance and robustness.more » « less