Generating random variates is a fundamental operation in diverse areas of computer science and is supported in almost all modern programming languages. Traditional software libraries for random variate generation are grounded in the idealized Real-RAM model of computation, where algorithms are assumed to be able to access uniformly distributed real numbers from the unit interval and compute with infinite-precision real arithmetic. These assumptions are unrealistic, as any software implementation of a Real-RAM algorithm on a physical computer can instead access a stream of individual random bits and computes with finite-precision arithmetic. As a result, existing libraries have few theoretical guarantees in practice. For example, the actual distribution of a random variate generator is generally unknown, intractable to quantify, and arbitrarily different from the desired distribution; causing runtime errors, unexpected behavior, and inconsistent APIs. This article introduces a new approach to principled and practical random variate generation with formal guarantees. The key idea is to first specify the desired probability distribution in terms of a finite-precision numerical program that defines its cumulative distribution function (CDF), and then generate exact random variates according to this CDF. We present a universal and fully automated method to synthesize exact random variate generators given any numerical CDF implemented in any binary number format, such as floating-point, fixed-point, and posits. The method is guaranteed to operate with the same precision used to specify the CDF, does not overflow, avoids expensive arbitrary-precision arithmetic, and exposes a consistent API. The method rests on a novel space-time optimal implementation for the class of generators that attain the information-theoretically optimal Knuth and Yao entropy rate, consuming the least possible number of input random bits per output variate. We develop a random variate generation library using our method in C and evaluate it on a diverse set of continuous and discrete distributions, showing competitive runtime with the state-of-the-art GNU Scientific Library while delivering higher accuracy, entropy efficiency, and automation.
more »
« less
Parsing randomness
Random data generators can be thought of as parsers of streams of randomness. This perspective on generators for random data structures is established folklore in the programming languages community, but it has never been formalized, nor have its consequences been deeply explored. We build on the idea of freer monads to develop free generators, which unify parsing and generation using a common structure that makes the relationship between the two concepts precise. Free generators lead naturally to a proof that a monadic generator can be factored into a parser plus a distribution over choice sequences. Free generators also support a notion of derivative, analogous to the familiar Brzozowski derivatives of formal languages, allowing analysis tools to preview the effect of a particular generator choice. This gives rise to a novel algorithm for generating data structures satisfying user-specified preconditions.
more »
« less
- Award ID(s):
- 1955565
- PAR ID:
- 10603365
- Publisher / Repository:
- Association for Computing Machinery (ACM)
- Date Published:
- Journal Name:
- Proceedings of the ACM on Programming Languages
- Volume:
- 6
- Issue:
- OOPSLA2
- ISSN:
- 2475-1421
- Format(s):
- Medium: X Size: p. 89-113
- Size(s):
- p. 89-113
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
A Generative Adversarial Network (GAN) is an unsupervised generative framework to generate a sample distribution that is identical to the data distribution. Recently, mix strategy multi-generator/discriminator GANs have been shown to outperform single pair GANs. However, the mixed model suffers from the problem of linearly growing training time. Also, imbalanced training among generators makes it difficult to parallelize. In this paper, we propose a balanced mix-generator GAN that works in parallel by mixing multiple disjoint generators to approximate the real distribution. The weights of the discriminator and the classifier are controlled by a balance strategy. We also present an efficient loss function, to force each generator to embrace few modes with a high probability. Our model is naturally adaptive to large parallel computation frameworks. Each generator can be trained on multiple GPUs asynchronously. We have performed extensive experiments on synthetic datasets, MNIST1000, CIFAR-10, and ImageNet. The results establish that our model can achieve the state-of-the-art performance (in terms of the modes coverage and the inception score), with significantly reduced training time. We also show that the missing mode problem can be relieved with a growing number of generators.more » « less
-
null (Ed.)We consider a general framework for constructing non-linear generators by adding a (32-bit or larger) pseudo-random number generator (PRNG) as a baseline generator to the basic RC4 design, in which an index-selection scheme similar to RC4 is used. We refer to the proposed design as the eRC (enhanced/ extended RC4) design. We discuss several advantages of adding a good baseline generator to the RC4 design, including new updating schemes for the auxiliary table. We consider some popular PRNGs with the nice properties of high-dimensional equi-distribution, efficiency, long period, and portability as the baseline generator. We demonstrate that eRC generators are very efficient via extensive empirical testing on some eRC generators. We also show that eRC is flexible enough to choose minimal design parameters for eRC generators and yet the resulting eRC generators still pass stringent empirical tests, which makes them suitable for both software and hardware implementations.more » « less
-
A novel approach to computationally enhance the sampling of molecular crystal structures is proposed and tested. This method is based on the use of extended variables coupled to a Monte Carlo based crystal polymorph generator. Inspired by the established technique of quasi-random sampling of polymorphs using the rigid molecule constraint, this approach represents molecular clusters as extended variables within a thermal reservoir. Polymorph unit-cell variables are generated using pseudo-random sampling. Within this framework, a harmonic coupling between the extended variables and polymorph configurations is established. The extended variables remain fixed during the inner loop dedicated to polymorph sampling, enforcing a stepwise propagation of the extended variables to maintain system exploration. The final processing step results in a polymorph energy landscape, where the raw structures sampled to create the extended variable trajectory are re-optimized without the thermal coupling term. The foundational principles of this approach are described and its effectiveness using both a Metropolis Monte Carlo type algorithm and modifications that incorporate replica exchange is demonstrated. A comparison is provided with pseudo-random sampling of polymorphs for the molecule coumarin. The choice to test a design of this algorithm as relevant for enhanced sampling of crystal structures was due to the obvious relation between molecular structure variables and corresponding crystal polymorphs as representative of the inherent vapor to crystal transitions that exist in nature. Additionally, it is shown that the trajectories of extended variables can be harnessed to extract fluctuation properties that can lead to valuable insights. A novel thermodynamic variable is introduced: the free energy difference between ensembles ofZ′ = 1 andZ′ = 2 crystal polymorphs.more » « less
-
In this paper, a novel hierarchical signal processing methodology is proposed for generator condition monitoring and fault diagnosis based on raw electrical waveform data in power networks, which can often be measured by strategically located waveform sensors. The impact of generator short circuit faults on strategically located electrical waveform sensors in power networks are firstly investigated and validated in Matlab Simulink. Based on the large set of electrical waveform data produced by Matlab Simulink, a hierarchical algorithm is then designed to locate fault site location and monitor the condition of generators in power networks. Finally, the proposed methodology is validated in 14-bus IEEE standard power network under different scenarios (e.g, one generator fault, two-generator-fault, various aging levels, etc). Our results show that we can locate fault site location and monitor the aging condition of generators in power networks. Compared to traditional condition monitoring and fault diagnosis based on generator sensors, our proposed methodology can monitor a large number of generators based on a limited number of waveform sensors, which promises to reduce the cost of the maintenance and improve the reliability of the power grid.more » « less
An official website of the United States government
