
Title: Efficient sampling of constrained high-dimensional theoretical spaces with machine learning

Models of physics beyond the Standard Model often contain a large number of parameters. These form a high-dimensional space that is computationally intractable to fully explore. Experimental results project onto a subspace of parameters that are consistent with those observations, but mapping these constraints to the underlying parameters is also typically intractable. Instead, physicists often resort to scanning small subsets of the full parameter space and testing for experimental consistency. We propose an alternative approach that uses generative models to significantly improve the computational efficiency of sampling high-dimensional parameter spaces. To demonstrate this, we sample the constrained and phenomenological Minimal Supersymmetric Standard Models subject to the requirement that the sampled points are consistent with the measured Higgs boson mass. Our method achieves orders of magnitude improvements in sampling efficiency compared to a brute force search.
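The core idea (learn a generative model from points that already satisfy the constraint, then propose from it instead of sampling blindly) can be sketched in a toy setting. Everything below is illustrative: a multivariate Gaussian stands in for the paper's learned generative model, and a simple quadratic observable with a narrow acceptance window stands in for the Higgs-mass calculation.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 10

# Hypothetical stand-in for an expensive observable (e.g. a computed
# Higgs mass): a point is "consistent" when it lies in a narrow window.
def observable(theta):
    return np.sum(theta**2, axis=-1)

def consistent(theta):
    obs = observable(theta)
    return (obs > 4.5) & (obs < 5.5)

# Brute force: uniform proposals over the hypercube [-2, 2]^DIM.
uniform = rng.uniform(-2, 2, size=(100_000, DIM))
brute_eff = consistent(uniform).mean()

# "Generative model": here just a Gaussian fitted to an initial batch of
# accepted points, standing in for the paper's learned sampler.
seed_pts = uniform[consistent(uniform)]
mean, cov = seed_pts.mean(axis=0), np.cov(seed_pts.T) + 1e-6 * np.eye(DIM)
proposals = rng.multivariate_normal(mean, cov, size=100_000)
model_eff = consistent(proposals).mean()

print(f"brute-force efficiency:      {brute_eff:.4f}")
print(f"generative-model efficiency: {model_eff:.4f}")
```

Even this crude stand-in concentrates proposals near the consistent region, so its acceptance rate is far higher than the uniform scan's; the paper's learned models push the same trade much further.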

Journal Name: The European Physical Journal C
Publisher: Springer Science + Business Media
Sponsoring Org: National Science Foundation
More Like this
  1. Nonlinear state-space models are ubiquitous in modeling real-world dynamical systems. Sequential Monte Carlo (SMC) techniques, also known as particle methods, are a well-known class of parameter estimation methods for this general class of state-space models. Existing SMC-based techniques rely on excessive sampling of the parameter space, which makes their computation intractable for large systems or tall data sets. Bayesian optimization techniques have been used for fast inference in state-space models with intractable likelihoods. These techniques aim to find the maximum of the likelihood function by sequential sampling of the parameter space through a single SMC approximator. Various SMC approximators with different fidelities and computational costs are often available for sample-based likelihood approximation. In this paper, we propose a multi-fidelity Bayesian optimization algorithm for the inference of general nonlinear state-space models (MFBO-SSM), which enables simultaneous sequential selection of parameters and approximators. The accuracy and speed of the algorithm are demonstrated by numerical experiments using synthetic gene expression data from a gene regulatory network model and real data from the VIX stock price index.
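The multi-fidelity ingredient (cheap, noisy likelihood approximators for screening; expensive, accurate ones for confirmation) can be caricatured without the paper's Gaussian-process machinery. The toy below is not MFBO-SSM itself: the likelihood surface, noise levels, costs, and two-stage selection rule are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy log-likelihood surface over a 1-D parameter; the true optimum is at 0.7.
def loglik(theta):
    return -(theta - 0.7) ** 2

# Two SMC-style approximators of different fidelity: more particles give a
# less noisy estimate but cost more (noise levels and costs are illustrative).
APPROXIMATORS = {"low": (0.30, 1.0), "high": (0.05, 10.0)}  # (noise, cost)

def evaluate(theta, fidelity):
    noise, cost = APPROXIMATORS[fidelity]
    return loglik(theta) + noise * rng.standard_normal(), cost

# Sequential selection under a budget: screen many candidates cheaply, then
# confirm the best at high fidelity -- a caricature of choosing both the
# parameter and the approximator at each step.
budget, spent, best = 60.0, 0.0, (None, -np.inf)
candidates = rng.uniform(0, 1, size=40)
scores = []
for th in candidates:
    val, cost = evaluate(th, "low")
    spent += cost
    scores.append((val, th))
scores.sort(reverse=True)
for val, th in scores:
    if spent + APPROXIMATORS["high"][1] > budget:
        break
    val_hi, cost = evaluate(th, "high")
    spent += cost
    if val_hi > best[1]:
        best = (th, val_hi)

print(f"estimated optimum: {best[0]:.3f} (true 0.7), budget spent: {spent}")
```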
  2. Abstract

    Particle accelerators are invaluable discovery engines in the chemical, biological and physical sciences. Characterization of the accelerated beam response to accelerator input parameters is often the first step when conducting accelerator-based experiments. Currently used techniques for characterization, such as grid-like parameter sampling scans, become impractical when extended to higher dimensional input spaces, when complicated measurement constraints are present, or when prior information about the beam response is scarce. In this work, we describe an adaptation of the popular Bayesian optimization algorithm, which enables a turn-key exploration of input parameter spaces. Our algorithm replaces the need for parameter scans while minimizing the prior information needed about the measurement’s behavior and associated measurement constraints. We experimentally demonstrate that our algorithm autonomously conducts an adaptive, multi-parameter exploration of input parameter space, potentially orders of magnitude faster than conventional grid-like parameter scans, while making highly constrained, single-shot beam phase-space measurements and accounting for costs associated with changing input parameters. In addition to applications in accelerator-based scientific experiments, this algorithm addresses challenges shared by many scientific disciplines, and is thus applicable to autonomously conducting experiments over a broad range of research topics.
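A minimal sketch of the underlying Bayesian-optimization loop: a Gaussian-process posterior over a 1-D input with an upper-confidence-bound acquisition, starting from two measurements instead of a grid scan. The response function, kernel length scale, and acquisition constant are illustrative choices, not the paper's algorithm or its constraint handling.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unknown beam response over one input parameter (illustrative stand-in).
def response(x):
    return np.sin(3 * x) * np.exp(-x)

# Squared-exponential kernel for the Gaussian-process surrogate.
def rbf(a, b, length=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

# Start from a couple of measurements instead of a full grid scan.
X = np.array([0.2, 1.5])
y = response(X)

for _ in range(10):
    grid = np.linspace(0.0, 2.0, 201)
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Ks = rbf(grid, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y                                 # posterior mean
    var = 1.0 - np.sum(Ks @ Kinv * Ks, axis=1)         # posterior variance
    ucb = mu + 2.0 * np.sqrt(np.clip(var, 0.0, None))  # acquisition
    x_next = grid[np.argmax(ucb)]                      # measure where UCB peaks
    X = np.append(X, x_next)
    y = np.append(y, response(x_next))

print(f"best input found: {X[np.argmax(y)]:.3f}")
```

The acquisition automatically balances probing unexplored inputs (high variance) against refining promising ones (high mean), which is what replaces the exhaustive scan.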

  3. Abstract

    The marginal likelihood of a model is a key quantity for assessing the evidence provided by the data in support of a model. The marginal likelihood is the normalizing constant for the posterior density, obtained by integrating the product of the likelihood and the prior with respect to model parameters. Thus, the computational burden of computing the marginal likelihood scales with the dimension of the parameter space. In phylogenetics, where we work with tree topologies that are high-dimensional models, standard approaches to computing marginal likelihoods are very slow. Here, we study methods to quickly compute the marginal likelihood of a single fixed tree topology. We benchmark the speed and accuracy of 19 different methods to compute the marginal likelihood of phylogenetic topologies on a suite of real data sets under the JC69 model. These methods include several new ones that we develop explicitly to solve this problem, as well as existing algorithms that we apply to phylogenetic models for the first time. Altogether, our results show that the accuracy of these methods varies widely, and that accuracy does not necessarily correlate with computational burden. Our newly developed methods are orders of magnitude faster than standard approaches, and in some cases, their accuracy rivals the best established estimators.
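To make the trade-offs concrete, here is the simplest estimator in this family (the arithmetic mean of the likelihood over prior draws) checked against a closed-form answer in a conjugate normal toy model. The phylogenetic setting replaces this one-dimensional integral with a very high-dimensional one, which is where such naive estimators break down; the data values below are invented.

```python
import numpy as np

rng = np.random.default_rng(3)

# Conjugate toy model: x_i ~ N(theta, 1), theta ~ N(0, 1), so the marginal
# likelihood is available in closed form and the estimator can be checked.
data = np.array([0.5, 1.2, -0.3, 0.8])
n, S = len(data), data.sum()

# Closed-form log marginal likelihood for the normal-normal model.
analytic = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(n + 1)
            - 0.5 * (np.sum(data**2) - S**2 / (n + 1)))

# Naive estimator: arithmetic mean of the likelihood over prior draws,
# computed stably in log space.
theta = rng.standard_normal(200_000)
loglik = (-0.5 * n * np.log(2 * np.pi)
          - 0.5 * ((data[None, :] - theta[:, None]) ** 2).sum(axis=1))
estimate = np.log(np.mean(np.exp(loglik - loglik.max()))) + loglik.max()

print(f"analytic:  {analytic:.4f}")
print(f"estimated: {estimate:.4f}")
```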

  4. Abstract

    Hierarchical probability models are being used more often than non-hierarchical deterministic process models in environmental prediction and forecasting, and Bayesian approaches to fitting such models are becoming increasingly popular. In particular, models describing ecosystem dynamics with multiple states that are autoregressive at each step in time can be treated as statistical state space models (SSMs). In this paper, we examine this subset of ecosystem models, embed a process-based ecosystem model into an SSM, and give closed form Gibbs sampling updates for latent states and process precision parameters when process and observation errors are normally distributed. Here, we use simulated data from an example model (DALECev) and study the effects of changing the temporal resolution of observations on the states (observation data gaps), the temporal resolution of the state process (model time step), and the level of aggregation of observations on fluxes (measurements of transfer rates on the state process). We show that parameter estimates become unreliable as temporal gaps between observed state data increase. To improve parameter estimates, we introduce a method of tuning the time resolution of the latent states while still using higher-frequency driver information and show that this helps to improve estimates. Further, we show that data cloning is a suitable method for assessing parameter identifiability in this class of models. Overall, our study helps inform the application of state space models to ecological forecasting applications where (1) data are not available for all states and transfers at the operational time step for the ecosystem model and (2) process uncertainty estimation is desired.
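Closed-form Gibbs updates of the kind described have this flavor in the simplest linear-Gaussian case: each latent state given its neighbours and its observation is normal, and the process precision given the states is Gamma (conjugate). The sketch below is a generic scalar AR(1) state-space model with invented parameters and a Gamma(1, 1) precision prior, not the DALECev setup.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy linear-Gaussian SSM: z_t = a*z_{t-1} + N(0, 1/tau), y_t = z_t + N(0, r).
a, tau_true, r, T = 0.9, 4.0, 0.25, 200
z = np.zeros(T)
for t in range(1, T):
    z[t] = a * z[t - 1] + rng.normal(0, 1 / np.sqrt(tau_true))
y = z + rng.normal(0, np.sqrt(r), size=T)

# Single-site Gibbs sweep: z_t | neighbours, y_t, tau is normal;
# tau | states is Gamma under a Gamma(1, 1) prior.
zs, tau = y.copy(), 1.0
tau_draws = []
for it in range(500):
    for t in range(T):
        prec, mean_num = 1.0 / r, y[t] / r          # observation term
        if t > 0:                                    # link from z_{t-1}
            prec += tau
            mean_num += tau * a * zs[t - 1]
        if t < T - 1:                                # link to z_{t+1}
            prec += tau * a**2
            mean_num += tau * a * zs[t + 1]
        zs[t] = rng.normal(mean_num / prec, 1 / np.sqrt(prec))
    resid = zs[1:] - a * zs[:-1]
    tau = rng.gamma(1.0 + (T - 1) / 2, 1.0 / (1.0 + 0.5 * resid @ resid))
    if it >= 250:                                    # discard burn-in
        tau_draws.append(tau)

print(f"posterior mean of tau: {np.mean(tau_draws):.2f} (true {tau_true})")
```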

  5. Learning a dynamical system from input/output data is a fundamental task in the control design pipeline. In the partially observed setting there are two components to identification: parameter estimation to learn the Markov parameters, and system realization to obtain a state space model. In both sub-problems it is implicitly assumed that standard numerical algorithms such as the singular value decomposition (SVD) can be easily and reliably computed. When trying to fit a high-dimensional model to data, even computing an SVD may be intractable. In this work we show that an approximate matrix factorization obtained using randomized methods can replace the standard SVD in the realization algorithm while maintaining the finite-sample performance and robustness guarantees of classical methods.
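The randomized replacement for the SVD can be sketched with the standard range-finder recipe (random sketch, orthonormalize, small dense SVD), here applied to a synthetic low-rank matrix standing in for a Hankel matrix of Markov parameters. The sizes, rank, and oversampling below are arbitrary, and this is the generic randomized SVD, not the paper's full realization algorithm.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic low-rank matrix standing in for a Hankel matrix of Markov params.
m, n, true_rank = 400, 300, 8
A = rng.standard_normal((m, true_rank)) @ rng.standard_normal((true_rank, n))

# Randomized range finder: sketch the column space with a random test
# matrix, orthonormalize, then take a small dense SVD of the projection.
k, oversample = true_rank, 5
Y = A @ rng.standard_normal((n, k + oversample))
Q, _ = np.linalg.qr(Y)
U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
U = Q @ U_small

# Rank-k reconstruction from the approximate factorization.
approx = (U[:, :k] * s[:k]) @ Vt[:k]
err = np.linalg.norm(A - approx) / np.linalg.norm(A)
print(f"relative reconstruction error: {err:.2e}")
```

Because the sketch only multiplies A by a thin random matrix, the dense SVD is taken of a small matrix, which is what makes the approach viable when a full SVD of the Hankel matrix is intractable.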