Many-body dynamical models in which Boltzmann statistics can be derived directly from the underlying dynamical laws without invoking the fundamental postulates of statistical mechanics are scarce. Interestingly, one such model is found in econophysics and in chemistry classrooms: the money game, in which players exchange money randomly in a process that resembles elastic intermolecular collisions in a gas, giving rise to the Boltzmann distribution of money owned by each player. Although this model offers a pedagogical example that demonstrates the origins of Boltzmann statistics, such demonstrations usually rely on computer simulations. In fact, a proof of the exponential steady-state distribution in this model has only become available in recent years. Here, we study this random money/energy exchange model and its extensions using a simple mean-field-type approach that examines the properties of the one-dimensional random walk performed by one of its participants. We give a simple derivation of the Boltzmann steady-state distribution in this model. Breaking the time-reversal symmetry of the game by modifying its rules results in non-Boltzmann steady-state statistics. In particular, introducing ‘unfair’ exchange rules, in which a poorer player is more likely to give money to a richer player than to receive money from that richer player, results in an analytically provable Pareto-type power-law distribution of the money in the limit where the number of players is infinite, with a finite fraction of players in the ‘ground state’ (i.e. with zero money). For a finite number of players, however, the game may give rise to a bimodal distribution of money and to bistable dynamics, in which a participant’s wealth jumps between poor and rich states. The latter corresponds to a scenario where the player accumulates nearly all the available money in the game. The time evolution of a player’s wealth in this case can be thought of as a ‘chemical reaction’, where a transition between ‘reactants’ (rich state) and ‘products’ (poor state) involves crossing a large free energy barrier. We thus analyze the trajectories generated from the game using ideas from the theory of transition paths and highlight non-Markovian effects in the barrier crossing dynamics.
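The fair version of the exchange game is straightforward to simulate. The sketch below (function and parameter names are ours, not from the paper) implements the basic exchange rule and can be used to verify the exponential steady-state distribution numerically:

```python
import random

def play_money_game(n_players=500, money_per_player=10, n_rounds=200000, seed=0):
    """Simulate the fair random-exchange game: repeatedly pick a giver and
    a receiver at random and move one unit of money between them, provided
    the giver is not already broke.  Total money is conserved, and the
    steady-state wealth distribution approaches the Boltzmann (geometric)
    form P(m) ~ exp(-m/T), with 'temperature' T equal to the mean wealth."""
    rng = random.Random(seed)
    wealth = [money_per_player] * n_players
    for _ in range(n_rounds):
        giver = rng.randrange(n_players)
        taker = rng.randrange(n_players)
        if giver != taker and wealth[giver] > 0:
            wealth[giver] -= 1   # one unit changes hands
            wealth[taker] += 1
    return wealth
```

Histogramming the returned wealths produces a monotonically decaying, near-geometric distribution in which a finite fraction of players is broke at any instant, even though everyone starts with the same amount.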
NSF-PAR ID: 10182363
Date Published:
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 34
Issue: 04
ISSN: 2159-5399
Page Range / eLocation ID: 3922 to 3929
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this

Abstract 
Batch Normalization (BN) is essential to effectively train state-of-the-art deep Convolutional Neural Networks (CNNs). It normalizes the layer outputs during training using the statistics of each mini-batch. BN accelerates the training procedure by allowing the safe use of large learning rates and alleviates the need for careful initialization of the parameters. In this work, we study BN from the viewpoint of Fisher kernels that arise from generative probability models. We show that, assuming the samples within a mini-batch are drawn from the same probability density function, BN is identical to the Fisher vector of a Gaussian distribution. This means that the batch normalizing transform can be explained in terms of kernels that naturally emerge from the probability density function that models the generative process of the underlying data distribution. Consequently, it promises higher discrimination power for the batch-normalized mini-batch. However, given the rectifying nonlinearities employed in CNN architectures, the distribution of the layer outputs shows an asymmetric characteristic. Therefore, in order for BN to fully benefit from the aforementioned properties, we propose approximating the underlying data distribution not with a single Gaussian density but with a mixture of Gaussian densities. Deriving the Fisher vector for a Gaussian Mixture Model (GMM) reveals that batch normalization can be improved by independently normalizing with respect to the statistics of disentangled sub-populations. We refer to our proposed soft piecewise version of batch normalization as Mixture Normalization (MN). Through an extensive set of experiments on CIFAR-10 and CIFAR-100, using both a 5-layer deep CNN and the modern Inception-V3 architecture, we show that mixture normalization reduces the number of gradient updates required to reach the maximum test accuracy of the batch-normalized model by ∼31%–47% across a variety of training scenarios.
Replacing even a few BN modules with MN in the 48-layer deep Inception-V3 architecture is sufficient to obtain not only considerable training acceleration but also better final test accuracy. We show that similar observations hold for 40- and 100-layer deep DenseNet architectures as well. We complement our study by evaluating the application of mixture normalization to Generative Adversarial Networks (GANs), where "mode collapse" hinders the training process. We solely replace a few batch normalization layers in the generator with our proposed mixture normalization. Our experiments using a Deep Convolutional GAN (DCGAN) on CIFAR-10 show that the mixture-normalized DCGAN not only provides an acceleration of ∼58% but also reaches a lower (better) "Fréchet Inception Distance" (FID) of 33.35, compared to 37.56 for its batch-normalized counterpart.
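The core idea can be illustrated in one dimension. The toy sketch below (ours, not the authors' CNN implementation) normalizes each activation against every Gaussian component's statistics and blends the results by the posterior responsibility of each component, which is the "soft piecewise" normalization the abstract describes:

```python
import numpy as np

def mixture_normalize(x, means, stds, weights, eps=1e-5):
    """Toy 1-D sketch of mixture normalization: normalize x against each
    Gaussian component (mean, std) and blend the K normalized versions by
    the posterior responsibilities r_k(x) = p(component k | x)."""
    x = np.asarray(x, dtype=float)
    # Weighted component densities w_k * N(x; mu_k, sigma_k), shape (K, N).
    dens = np.stack([
        w * np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
        for m, s, w in zip(means, stds, weights)
    ])
    resp = dens / (dens.sum(axis=0) + eps)  # posterior responsibilities
    # Per-component standardization, then responsibility-weighted blend.
    normed = np.stack([(x - m) / np.sqrt(s ** 2 + eps)
                       for m, s in zip(means, stds)])
    return (resp * normed).sum(axis=0)
```

For a clearly bimodal batch, each sample is effectively standardized by the statistics of its own sub-population, so the output is roughly zero-mean and unit-variance even though plain BN statistics would straddle both modes. In the real method, the GMM parameters are estimated from the mini-batch rather than given.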

Abstract
Traits that have arisen multiple times yet still remain rare present a curious paradox. A number of these rare traits show a distinct tippy pattern, where they appear widely dispersed across a phylogeny, are associated with short branches and differ between recently diverged sister species. This phylogenetic pattern has classically been attributed to the trait being an evolutionary dead end, where the trait arises due to some short-term evolutionary advantage but ultimately leads species to extinction. While the higher extinction rate associated with a dead-end trait could produce such a tippy pattern, a similar pattern could appear if lineages with the trait speciated more slowly than other lineages, or if the trait was lost more often than it was gained. In this study, we quantify the degree of tippiness of red flowers in the tomato family, Solanaceae, and investigate the macroevolutionary processes that could explain the sparse phylogenetic distribution of this trait. Using a suite of metrics, we confirm that red-flowered lineages are significantly overdispersed across the tree and form smaller clades than expected under a null model. Next, we fit 22 alternative models using HiSSE (Hidden State Speciation and Extinction), which accommodates asymmetries in speciation, extinction and transition rates that depend on observed and unobserved (hidden) character states. Results of the model fitting indicated significant variation in diversification rates across the family, which is best explained by the inclusion of hidden states. Our best-fitting model differs between the maximum clade credibility tree and when incorporating phylogenetic uncertainty, suggesting that the extreme tippiness and rarity of red Solanaceae flowers makes it difficult to distinguish among different underlying processes. However, both of the best models strongly support a bias towards the loss of red flowers. The best-fitting HiSSE model when incorporating phylogenetic uncertainty lends some support to the hypothesis that lineages with red flowers exhibit reduced diversification rates due to elevated extinction rates. Future studies employing simulations or targeting population-level processes may allow us to determine whether red flowers in Solanaceae or other angiosperm clades are rare and tippy due to a combination of processes, or to asymmetrical transitions alone.
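The effect of asymmetric transition rates on trait rarity can be illustrated with a minimal two-state Markov chain (a toy sketch of ours, far simpler than the HiSSE models fitted above, with hypothetical rate values): when the loss rate exceeds the gain rate, a lineage spends only a small fraction of time carrying the trait, so the trait stays rare without any difference in speciation or extinction.

```python
import random

def simulate_trait(q_gain, q_loss, n_steps=200000, dt=0.01, seed=1):
    """Toy two-state continuous-time Markov chain for a binary floral
    trait (0 = non-red, 1 = red), discretized with a small time step dt.
    Returns the fraction of time spent in the red state; for small dt it
    approaches the stationary frequency q_gain / (q_gain + q_loss)."""
    rng = random.Random(seed)
    state, time_red = 0, 0
    for _ in range(n_steps):
        if state == 0 and rng.random() < q_gain * dt:
            state = 1        # gain of red flowers
        elif state == 1 and rng.random() < q_loss * dt:
            state = 0        # loss of red flowers
        time_red += state
    return time_red / n_steps
```

With a loss rate nine times the gain rate, the lineage carries the trait only about 10% of the time, mirroring how a loss bias alone can keep red flowers rare across a tree.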
Despite, or maybe because of, their astonishing capacity to fit data, neural networks are believed to have difficulties extrapolating beyond the training data distribution. This work shows that, for extrapolations based on finite transformation groups, a model’s inability to extrapolate is unrelated to its capacity. Rather, the shortcoming is inherited from a learning hypothesis: examples not explicitly observed, even given infinitely many training examples, have underspecified outcomes in the learner’s model. In order to endow neural networks with the ability to extrapolate over group transformations, we introduce a learning framework counterfactually guided by the learning hypothesis that any group invariance to (known) transformation groups is mandatory even without evidence, unless the learner deems it inconsistent with the training data. Unlike existing invariance-driven methods for (counterfactual) extrapolations, this framework allows extrapolations from a single environment. Finally, we introduce sequence and image extrapolation tasks that validate our framework and showcase the shortcomings of traditional approaches.
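For context, the simplest baseline way to impose invariance to a known finite transformation group (not the paper's counterfactual framework, which enforces invariance only where the data do not contradict it) is to average a predictor's outputs over all group elements:

```python
import numpy as np

def group_averaged_predict(predict, x, group):
    """Baseline sketch: make any predictor exactly invariant to a finite
    transformation group by averaging its outputs over every group element
    applied to the input.  `group` is a list of callables forming a group
    under composition (e.g. identity and negation)."""
    return np.mean([predict(g(x)) for g in group], axis=0)
```

Because the group is finite and closed, applying any group element to `x` merely permutes the terms of the average, so the output is identical for all transformed versions of the input, even if the underlying predictor is not invariant at all.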
