skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Infinity Learning: Learning Markov Chains from Aggregate Steady-State Observations
We consider the task of learning a parametric Continuous Time Markov Chain (CTMC) sequence model without examples of sequences, where the training data consists entirely of aggregate steady-state statistics. Making the problem harder, we assume that the states we wish to predict are unobserved in the training data. Specifically, given a parametric model over the transition rates of a CTMC and some known transition rates, we wish to extrapolate its steady state distribution to states that are unobserved. A technical roadblock to learn a CTMC from its steady state has been that the chain rule to compute gradients will not work over the arbitrarily long sequences necessary to reach steady state —from where the aggregate statistics are sampled. To overcome this optimization challenge, we propose ∞-SGD, a principled stochastic gradient descent method that uses randomly-stopped estimators to avoid infinite sums required by the steady state computation, while learning even when only a subset of the CTMC states can be observed. We apply ∞-SGD to a real-world testbed and synthetic experiments showcasing its accuracy, ability to extrapolate the steady state distribution to unobserved states under unobserved conditions (heavy loads, when training under light loads), and succeeding in difficult scenarios where even a tailor-made extension of existing methods fails.  more » « less
Award ID(s):
1717493 1738981 1943364
PAR ID:
10182363
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
34
Issue:
04
ISSN:
2159-5399
Page Range / eLocation ID:
3922 to 3929
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Many-body dynamical models in which Boltzmann statistics can be derived directly from the underlying dynamical laws without invoking the fundamental postulates of statistical mechanics are scarce. Interestingly, one such model is found in econophysics and in chemistry classrooms: the money game, in which players exchange money randomly in a process that resembles elastic intermolecular collisions in a gas, giving rise to the Boltzmann distribution of money owned by each player. Although this model offers a pedagogical example that demonstrates the origins of Boltzmann statistics, such demonstrations usually rely on computer simulations. In fact, a proof of the exponential steady-state distribution in this model has only become available in recent years. Here, we study this random money/energy exchange model and its extensions using a simple mean-field-type approach that examines the properties of the one-dimensional random walk performed by one of its participants. We give a simple derivation of the Boltzmann steady-state distribution in this model. Breaking the time-reversal symmetry of the game by modifying its rules results in non-Boltzmann steady-state statistics. In particular, introducing ‘unfair’ exchange rules in which a poorer player is more likely to give money to a richer player than to receive money from that richer player, results in an analytically provable Pareto-type power-law distribution of the money in the limit where the number of players is infinite, with a finite fraction of players in the ‘ground state’ (i.e. with zero money). For a finite number of players, however, the game may give rise to a bimodal distribution of money and to bistable dynamics, in which a participant’s wealth jumps between poor and rich states. The latter corresponds to a scenario where the player accumulates nearly all the available money in the game. The time evolution of a player’s wealth in this case can be thought of as a ‘chemical reaction’, where a transition between ‘reactants’ (rich state) and ‘products’ (poor state) involves crossing a large free energy barrier. We thus analyze the trajectories generated from the game using ideas from the theory of transition paths and highlight non-Markovian effects in the barrier crossing dynamics. 
    more » « less
  2. Training example order in SGD has long been known to affect convergence rate. Recent results show that accelerated rates are possible in a variety of cases for permutation-based sample orders, in which each example from the training set is used once before any example is reused. In this paper, we develop a broad condition on the sequence of examples used by SGD that is sufficient to prove tight convergence rates in both strongly convex and non-convex settings. We show that our approach suffices to recover, and in some cases improve upon, previous state-of-the-art analyses for four known example-selection schemes: (1) shuffle once, (2) random reshuffling, (3) random reshuffling with data echoing, and (4) Markov Chain Gradient Descent. Motivated by our theory, we propose two new example-selection approaches. First, using quasi-Monte-Carlo methods, we achieve unprecedented accelerated convergence rates for learning with data augmentation. Second, we greedily choose a fixed scan-order to minimize the metric used in our condition and show that we can obtain more accurate solutions from the same number of epochs of SGD. We conclude by empirically demonstrating the utility of our approach for both convex linear-model and deep learning tasks. Our code is available at: https://github.com/EugeneLYC/qmc-ordering. 
    more » « less
  3. Finkbeiner, B.; Wies, T. (Ed.)
    Stochastic model checking (SMC) is a formal verification technique for the analysis of systems with probabilistic behavior. Scalability has been a major limiting factor for SMC tools to analyze real-world systems with large or infinite state spaces. The infinite-state Continuous-time Markov Chain (CTMC) model checker, STAMINA, tackles this problem by selectively exploring only a portion of a model’s state space, where a majority of the probability mass resides, to efficiently give an accurate probability bound to properties under verification. In this paper, we present two major improvements to STAMINA, namely, a method of calculating and distributing estimated state reachability probabilities that improves state space truncation efficiency and combination of the previous two CTMC analyses into one for generating the probability bound. Demonstration of the improvements on several benchmark examples, including hazard analysis of infinite-state combinational genetic circuits, yield significant savings in both run-time and state space size (and hence memory), compared to both the previous version of STAMINA and the infinite-state CTMC model checker INFAMY. The improved STAMINA demonstrates significant scalability to allow for the verification of complex real-world infinite-state systems. 
    more » « less
  4. Communication is a key bottleneck in federated learning where a large number of edge devices collaboratively learn a model under the orchestration of a central server without sharing their own training data. While local SGD has been proposed to reduce the number of FL rounds and become the algorithm of choice for FL, its total communication cost is still prohibitive when each device needs to communicate with the remote server repeatedly for many times over bandwidth-limited networks. In light of both device-to-device (D2D) and device-to-server (D2S) cooperation opportunities in modern communication networks, this paper proposes a new federated optimization algorithm dubbed hybrid local SGD (HL-SGD) in FL settings where devices are grouped into a set of disjoint clusters with high D2D communication bandwidth. HL-SGD subsumes previous proposed algorithms such as local SGD and gossip SGD and enables us to strike the best balance between model accuracy and runtime. We analyze the convergence of HL-SGD in the presence of heterogeneous data for general nonconvex settings. We also perform extensive experiments and show that the use of hybrid model aggregation via D2D and D2S communications in HL-SGD can largely speed up the training time of federated learning. 
    more » « less
  5. null (Ed.)
    Despite ---or maybe because of--- their astonishing capacity to fit data, neural networks are believed to have difficulties extrapolating beyond training data distribution. This work shows that, for extrapolations based on finite transformation groups, a model's inability to extrapolate is unrelated to its capacity. Rather, the shortcoming is inherited from a learning hypothesis: Examples not explicitly observed with infinitely many training examples have underspecified outcomes in the learner’s model. In order to endow neural networks with the ability to extrapolate over group transformations, we introduce a learning framework counterfactually-guided by the learning hypothesis that any group invariance to (known) transformation groups is mandatory even without evidence, unless the learner deems it inconsistent with the training data. 
    more » « less