Title: BOKEI: Bayesian optimization using knowledge of correlated torsions and expected improvement for conformer generation
A key challenge in conformer sampling is finding low-energy conformations with a small number of energy evaluations. We recently demonstrated that the Bayesian optimization algorithm (BOA) is an effective method for finding the lowest-energy conformation of a small molecule. Our approach balances exploitation and exploration, and is more efficient than exhaustive or random search methods. Here, we extend strategies used on proteins and oligopeptides (e.g. Ramachandran plots of secondary structure) and study correlated torsions in small molecules. We use bivariate von Mises distributions to capture correlations, and use them to constrain the search space. We validate the performance of our new method, Bayesian Optimization with Knowledge-based Expected Improvement (BOKEI), on a dataset consisting of 533 diverse small molecules, using (i) a force field (MMFF94) and (ii) a semi-empirical method (GFN2) as the objective function. We compare the search performance of BOKEI, BOA with expected improvement (BOA-EI), and a genetic algorithm (GA), using a fixed number of energy evaluations. In more than 60% of the cases examined, BOKEI finds lower-energy conformations than global optimization with BOA-EI or GA. More importantly, we find correlated torsions in up to 15% of small molecules in larger data sets, up to 8 times more often than previously reported. These torsional patterns not only describe steric clashes, but also reflect favorable intramolecular interactions such as hydrogen bonds and π–π stacking. Increasing our understanding of the conformational preferences of molecules will help improve our ability to find low-energy conformers efficiently, which will have an impact on a wide range of computational modeling applications.
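To make the idea concrete, here is a minimal sketch (an illustration under assumed parameter values, not the authors' implementation) of how an unnormalized bivariate von Mises "cosine model" prior over a pair of correlated torsions might weight the standard expected-improvement acquisition; the multiplicative combination and all names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def bivariate_von_mises(phi, psi, mu, nu, k1, k2, k3):
    """Unnormalized cosine-model density for two correlated torsion angles (radians)."""
    return np.exp(k1 * np.cos(phi - mu)
                  + k2 * np.cos(psi - nu)
                  - k3 * np.cos(phi - mu - (psi - nu)))

def expected_improvement(mu_pred, sigma_pred, f_best, xi=0.01):
    """Standard EI for minimization, given a GP posterior mean and std."""
    sigma_pred = np.maximum(sigma_pred, 1e-12)
    z = (f_best - mu_pred - xi) / sigma_pred
    return (f_best - mu_pred - xi) * norm.cdf(z) + sigma_pred * norm.pdf(z)

def knowledge_based_ei(torsions, mu_pred, sigma_pred, f_best,
                       prior_params=(0.0, 0.0, 2.0, 2.0, 1.0)):
    """EI weighted by a torsion prior -- one plausible way to bias the search
    toward regions favored by correlated-torsion statistics (assumed form)."""
    mu, nu, k1, k2, k3 = prior_params
    prior = bivariate_von_mises(torsions[:, 0], torsions[:, 1], mu, nu, k1, k2, k3)
    return expected_improvement(mu_pred, sigma_pred, f_best) * prior / prior.max()
```

In this sketch the prior simply rescales EI, so candidate torsion pairs that fall in statistically favored regions are preferred when the surrogate model is otherwise indifferent.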
Award ID(s): 1800435
Publication Date:
NSF-PAR ID: 10177038
Journal Name: Physical Chemistry Chemical Physics
Volume: 22
Issue: 9
Page Range or eLocation-ID: 5211 to 5219
ISSN: 1463-9076
Sponsoring Org: National Science Foundation
More Like this
  1. Starting in the early 2000s, sophisticated technologies have been developed for the rational construction of synthetic genetic networks that implement specified logical functionalities. Despite impressive progress, however, the scaling necessary to achieve greater computational power has been hampered by many constraints, including repressor toxicity and the lack of large sets of mutually orthogonal repressors. As a consequence, a typical circuit contains no more than roughly seven repressor-based gates per cell. A possible way around this scalability problem is to distribute the computation among multiple cell types, each of which implements a small subcircuit, and which communicate among themselves using diffusible small molecules (DSMs), such as those employed by quorum-sensing systems in bacteria. This paper focuses on systematic ways to implement this distributed approach, in the context of the evaluation of arbitrary Boolean functions. The unique characteristics of genetic circuits and the properties of DSMs require the development of new Boolean synthesis methods, distinct from those classically used in electronic circuit design. In this work, we propose a fast algorithm to synthesize distributed realizations of any Boolean function, under constraints on the number of gates per cell and the number of orthogonal DSMs. The method is based on an exact synthesis algorithm that finds the minimal circuit per cell, which in turn allows us to build an extensive database of Boolean functions up to a given number of inputs. For concreteness, we focus on circuits of up to 4 inputs, which might represent, for example, two chemical inducers and two light inputs at different frequencies. Our method shows that, with a constraint of no more than seven gates per cell, the use of a single DSM increases the total number of realizable circuits by at least 7.58-fold compared to centralized computation. Moreover, when allowing two DSMs, one can realize 99.995% of all possible 4-input Boolean functions, still with at most 7 gates per cell. The methodology introduced here can be readily adapted to complement recent genetic circuit design automation software. A toolbox that uses the proposed algorithm was created and made available at https://github.com/sontaglab/DBC/.
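As a toy illustration of the distributed-evaluation idea only (hypothetical code, not the paper's synthesis algorithm), the sketch below splits a 4-input Boolean function across two "cells" that exchange a single Boolean DSM signal, each cell using NOR-style repressor logic:

```python
from itertools import product

def cell_A(a, b):
    # Subcircuit in cell A: emit the diffusible signal when NOR(a, b) is true.
    return not (a or b)               # one NOR gate

def cell_B(c, d, dsm):
    # Subcircuit in cell B: combine local inputs with the received DSM signal.
    return not (dsm or (c and d))     # NOR of the DSM and a local AND

def distributed_function(a, b, c, d):
    # The overall 4-input function realized by the two communicating cells.
    return cell_B(c, d, cell_A(a, b))

# Enumerate the truth table of the distributed realization.
for a, b, c, d in product([False, True], repeat=4):
    print(int(a), int(b), int(c), int(d), "->", int(distributed_function(a, b, c, d)))
```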
  2. We introduce Ordalia, a novel approach for speeding up deep learning hyperparameter optimization search through early pruning of less promising configurations. Our method leverages empirical and theoretical results characterizing the shape of the generalization error curve for increasing training data size and number of epochs. We show that with relatively small computational resources one can estimate the dominant parameters of neural networks' learning curves and thereby obtain consistently good evaluations of their learning process, reliably eliminating non-promising configurations early. By iterating this process with increasing training resources, Ordalia rapidly converges to a small candidate set that includes many of the most promising configurations. We compare the performance of Ordalia with Hyperband, the state-of-the-art model-free hyperparameter optimization algorithm, and show that Ordalia consistently outperforms it on a variety of deep learning tasks. Ordalia's conservative use of computational resources and its ability to evaluate neural networks' learning progress lead to a much better exploration and coverage of the search space, which ultimately produces superior neural network configurations.
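A minimal sketch of learning-curve-based early pruning (an assumed power-law extrapolation, not Ordalia's actual estimator): fit err(t) = a·t^(-b) + c to each configuration's first few epochs, then keep only configurations whose extrapolated error is competitive.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_law(t, a, b, c):
    # Assumed learning-curve shape: error decays as a power law toward a floor c.
    return a * np.power(t, -b) + c

def extrapolated_error(epochs, errors, horizon):
    # Fit the partial learning curve and extrapolate to the training horizon.
    (a, b, c), _ = curve_fit(power_law, np.asarray(epochs, dtype=float),
                             errors, p0=(1.0, 0.5, 0.1), maxfev=10000)
    return power_law(horizon, a, b, c)

def prune(candidates, horizon=100, keep_margin=0.01):
    """candidates: dict mapping config name -> (epochs, validation errors).
    Keep configs whose extrapolated error is within keep_margin of the best."""
    scores = {name: extrapolated_error(e, v, horizon)
              for name, (e, v) in candidates.items()}
    best = min(scores.values())
    return [name for name, s in scores.items() if s <= best + keep_margin]
```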
  3. Methods based on Gaussian stochastic process (GSP) models and expected improvement (EI) functions have been promising for box-constrained expensive optimization problems, including robust design problems with environmental variables having set-type constraints. However, the methods that combine GSP and EI sub-optimizations suffer from a problem that limits their computational performance: efficient global optimization (EGO) methods often repeat the same or nearly the same experimental points. We present a novel EGO-type constraint-handling method that maintains a so-called tabu list to avoid past points. Our method includes two types of penalties for the key "infill" optimization, which selects the next test runs. We benchmark our tabu EGO algorithm against five alternative approaches, including DIRECT methods, using nine test problems and two engineering examples. The engineering examples are based on additive manufacturing process parameter optimization informed by point-based thermal simulations and robust-type quality constraints. Our test problems span unconstrained, simply constrained, and robust constrained problems. The comparative results imply that tabu EGO offers very promising computational performance for all types of black-box optimization in terms of convergence speed and the quality of the final solution.
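One plausible (assumed) form for such a tabu penalty on the infill sub-optimization is sketched below: expected improvement is zeroed for candidates that fall within a small radius of any point on the tabu list.

```python
import numpy as np

def tabu_penalized_ei(ei_values, candidates, tabu_list, radius=0.05):
    """ei_values: EI at each candidate point; candidates: (n, d) array;
    tabu_list: (m, d) array of previously evaluated points (assumed layout)."""
    penalized = ei_values.copy()
    for x_old in tabu_list:
        dist = np.linalg.norm(candidates - x_old, axis=1)
        penalized[dist < radius] = 0.0   # hard penalty: forbid near-repeats
    return penalized
```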
  4. The dynamic stall phenomenon produces adverse aerodynamic loading, which negatively affects the structural strength and life of aerodynamic systems. Aerodynamic shape optimization (ASO) provides a practical approach for delaying and mitigating dynamic stall characteristics without the addition of an auxiliary system. A typical ASO investigation requires multiple evaluations of accurate but time-consuming computational fluid dynamics (CFD) simulations. In the case of dynamic stall, unsteady CFD simulations are required for airfoil shape evaluation; combined with the high dimensionality of airfoil shape parameterization, this renders the ASO investigation computationally costly. In this study, metamodel-based optimization (MBO) is proposed using the multifidelity modeling (MFM) technique to efficiently conduct ASO investigations for computationally expensive dynamic stall cases. MFM methods combine data from accurate high-fidelity (HF) simulations and fast low-fidelity (LF) simulations to provide accurate and fast predictions. In particular, Cokriging regression is used for approximating the objective and constraint functions. The airfoil shape is parameterized using six PARSEC parameters. The objective and constraint functions are evaluated for a sinusoidally oscillating airfoil with the unsteady Reynolds-averaged Navier-Stokes equations at a Reynolds number of 135,000, a Mach number of 0.1, and a reduced frequency of 0.05. The initial metamodel is generated using 220 LF and 20 HF samples. The metamodel is then sequentially refined using the expected improvement infill criterion and validated with the normalized root mean square error. The refined metamodel is utilized to find the optimal design. The optimal airfoil shape shows higher thickness, a larger leading-edge radius, and an aft camber compared to the baseline (NACA 0012). The optimal shape delays the dynamic stall occurrence by 3 degrees and reduces the peak aerodynamic coefficients. The performance of the MFM method is also compared with a single-fidelity metamodeling method using HF samples. Both approaches produced similar optimal shapes; however, the optimal shape from MFM achieved a lower objective function value while more closely satisfying the constraint, at a computational cost saving of around 41%.
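A simplified stand-in for the Cokriging surrogate, written as an assumed illustration rather than the study's implementation: an additive-correction multifidelity model in which a Gaussian process learns the discrepancy between the HF samples and a scaled LF model.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

class MultiFidelitySurrogate:
    """HF(x) ~ rho * LF(x) + delta(x), with a GP on the discrepancy delta."""

    def __init__(self, lf_model, rho=1.0):
        self.lf_model = lf_model      # cheap low-fidelity predictor, callable on (n, d) arrays
        self.rho = rho                # assumed LF-to-HF scaling factor
        self.gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0),
                                           normalize_y=True)

    def fit(self, x_hf, y_hf):
        # Train the GP on the HF residuals after removing the scaled LF trend.
        self.gp.fit(x_hf, y_hf - self.rho * self.lf_model(x_hf))

    def predict(self, x):
        # Posterior mean plus the LF trend; std comes from the discrepancy GP.
        delta_mu, delta_std = self.gp.predict(x, return_std=True)
        return self.rho * self.lf_model(x) + delta_mu, delta_std
```

The predicted mean and standard deviation can then feed the same expected-improvement infill criterion described above.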
  5. Ultra-high-energy (UHE) photons are an important tool for studying the high-energy Universe. A plausible source of photons with exa-eV (EeV) energy is provided by UHE cosmic rays (UHECRs) undergoing the Greisen–Zatsepin–Kuzmin process (Greisen 1966; Zatsepin & Kuzmin 1966) or pair production process (Blumenthal 1970) on a cosmic background radiation. In this context, the EeV photons can be a probe of both UHECR mass composition and the distribution of their sources (Gelmini, Kalashev & Semikoz 2008; Hooper, Taylor & Sarkar 2011). At the same time, the possible flux of photons produced by UHE protons in the vicinity of their sources by pion photoproduction or inelastic nuclear collisions would be noticeable only for relatively near sources, as the attenuation length of UHE photons is smaller than that of UHE protons; see, for example, Bhattacharjee & Sigl (2000) for a review. There also exists a class of so-called top-down models of UHECR generation that efficiently produce UHE photons, for instance by the decay of heavy dark-matter particles (Berezinsky, Kachelriess & Vilenkin 1997; Kuzmin & Rubakov 1998) or by radiation from cosmic strings (Berezinsky, Blasi & Vilenkin 1998). The search for UHE photons has been shown to be the most sensitive method of indirect detection of heavy dark matter (Kalashev & Kuznetsov 2016, 2017; Kuznetsov 2017; Kachelriess, Kalashev & Kuznetsov 2018; Alcantara, Anchordoqui & Soriano 2019). Another fundamental physics scenario that could be tested with UHE photons (Fairbairn, Rashba & Troitsky 2011) is photon mixing with axion-like particles (Raffelt & Stodolsky 1988), which could be responsible for the correlation of UHECR events with BL Lac type objects observed by the High Resolution Fly's Eye (HiRes) experiment (Gorbunov et al. 2004; Abbasi et al. 2006). In most of these scenarios, a clustering of photon arrival directions, rather than a diffuse distribution, is expected, so point-source searches are a suitable test for photon/axion-like-particle mixing models. Finally, UHE photons could also be used as a probe of models of Lorentz-invariance violation (Coleman & Glashow 1999; Galaverni & Sigl 2008; Maccione, Liberati & Sigl 2010; Rubtsov, Satunin & Sibiryakov 2012, 2014).

The Telescope Array (TA; Tokuno et al. 2012; Abu-Zayyad et al. 2013c) is the largest cosmic ray experiment in the Northern Hemisphere. It is located at 39.3° N, 112.9° W in Utah, USA. The observatory includes a surface detector array (SD) and 38 fluorescence telescopes grouped into three stations. The SD consists of 507 stations that contain plastic scintillators, each with an area of 3 m^2 (SD stations). The stations are placed on a square grid with 1.2 km spacing and cover an area of ∼700 km^2. The TA SD is capable of detecting extensive air showers (EASs) in the atmosphere caused by cosmic particles of EeV and higher energies. The TA SD has been operating since 2008 May. A hadron-induced EAS differs significantly from an EAS induced by a photon: the depth of the shower maximum X_max for a photon shower is larger, and a photon shower contains fewer muons and has a more curved front (see Risse & Homola 2007 for a review). The TA SD stations are sensitive to both the muon and electromagnetic components of a shower and can therefore be triggered by both hadron-induced and photon-induced EAS events. In the present study, we use 9 yr of TA SD data for a blind search for point sources of UHE photons.
We utilize the statistics of the SD data, which benefit from a high duty cycle. Full Monte Carlo (MC) simulation of proton-induced and photon-induced EAS events allows us to perform the photon search up to the highest accessible energies, E ≳ 10^20 eV. As the main tool for the present photon search, we use a multivariate analysis based on a number of SD parameters that make it possible to distinguish between photon and hadron primaries. While searches for diffuse UHE photons have been performed by several EAS experiments, including Haverah Park (Ave et al. 2000), AGASA (Shinozaki et al. 2002; Risse et al. 2005), Yakutsk (Rubtsov et al. 2006; Glushkov et al. 2007, 2010), Pierre Auger (Abraham et al. 2007, 2008a; Bleve 2016; Aab et al. 2017c), and TA (Abu-Zayyad et al. 2013b; Abbasi et al. 2019a), the search for point sources of UHE photons has been done only by the Pierre Auger Observatory (Aab et al. 2014, 2017a). The latter searches were based on hybrid data and were limited to the 10^17.3 < E < 10^18.5 eV energy range. In the present paper, we use the TA SD data alone. We perform the searches in five energy ranges: E > 10^18, E > 10^18.5, E > 10^19, E > 10^19.5, and E > 10^20 eV. We find no significant evidence of photon point sources in any of these energy ranges, and we set point-source flux upper limits for each direction in the TA field of view (FOV). A search for unspecified neutral particles was also previously performed by the TA (Abbasi et al. 2015); the limit on the point-source flux of neutral particles obtained in that work is close to the present photon point-source flux limits.
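As a generic illustration of how a directional flux upper limit can be computed from a counting experiment (a textbook sketch, not the TA collaboration's analysis chain; the exposure number is a placeholder):

```python
from scipy.stats import poisson
from scipy.optimize import brentq

def poisson_upper_limit(n_obs, background, cl=0.95):
    """Classic Neyman upper limit on the signal mean: the largest mu with
    P(N <= n_obs | mu + background) >= 1 - cl. Assumes the observed count
    is not far below the expected background."""
    f = lambda mu: poisson.cdf(n_obs, mu + background) - (1.0 - cl)
    return brentq(f, 0.0, 100.0 + 10.0 * n_obs)

def flux_upper_limit(n_obs, background, exposure, cl=0.95):
    """exposure: directional exposure in km^2 yr (placeholder units)."""
    return poisson_upper_limit(n_obs, background, cl) / exposure

# Example: 2 events observed over an expected background of 1.3,
# with a made-up directional exposure of 700 km^2 x 9 yr.
print(flux_upper_limit(n_obs=2, background=1.3, exposure=700.0 * 9.0))
```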