<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>BOKEI: Bayesian optimization using knowledge of correlated torsions and expected improvement for conformer generation</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>03/04/2020</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10177038</idno>
					<idno type="doi">10.1039/C9CP06688H</idno>
					<title level='j'>Physical Chemistry Chemical Physics</title>
<idno>1463-9076</idno>
<biblScope unit="volume">22</biblScope>
<biblScope unit="issue">9</biblScope>					

					<author>Lucian Chan</author><author>Geoffrey R. Hutchison</author><author>Garrett M. Morris</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[A key challenge in conformer sampling is finding low-energy conformations with a small number of energy evaluations. We recently demonstrated the Bayesian Optimization Algorithm (BOA) is an effective method for finding the lowest energy conformation of a small molecule. Our approach balances between exploitation and exploration, and is more efficient than exhaustive or random search methods. Here, we extend strategies used on proteins and oligopeptides (              e.g.              Ramachandran plots of secondary structure) and study correlated torsions in small molecules. We use bivariate von Mises distributions to capture correlations, and use them to constrain the search space. We validate the performance of our new method, Bayesian Optimization with Knowledge-based Expected Improvement (BOKEI), on a dataset consisting of 533 diverse small molecules, using (i) a force field (MMFF94); and (ii) a semi-empirical method (GFN2), as the objective function. We compare the search performance of BOKEI, BOA with Expected Improvement (BOA-EI), and a genetic algorithm (GA), using a fixed number of energy evaluations. In more than 60% of the cases examined, BOKEI finds lower energy conformations than global optimization with BOA-EI or GA. More importantly, we find correlated torsions in up to 15% of small molecules in larger data sets, up to 8 times more often than previously reported. The BOKEI patterns not only describe steric clashes, but also reflect favorable intramolecular interactions such as hydrogen bonds and π–π stacking. Increasing our understanding of the conformational preferences of molecules will help improve our ability to find low energy conformers efficiently, which will have impact in a wide range of computational modeling applications.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>objective function. We compare the search performance of BOKEI, the BOA with Expected Improvement (BOA-EI), and a genetic algorithm (GA), using a fixed number of energy evaluations. In more than 60% of the cases examined, BOKEI finds lower energy conformations than global optimization with BOA-EI or GA. More importantly, we find correlated torsions in up to 15% of small molecules in larger data sets, up to 8 times more often than previously reported. The BOKEI patterns not only describe steric clashes, but also reflect favorable intramolecular interactions such as hydrogen bonds and &#960;-&#960; stacking. Increasing our understanding of the conformational preferences of molecules will help improve our ability to find low energy conformers efficiently, which will have impact in a wide range of computational modeling applications.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Many molecules can adopt multiple geometrically-distinct conformers. Considerable effort has been devoted to understanding the influence of structure on function, notably in the fields of protein folding, and protein-ligand binding. <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref> Finding the energetically-lowest conformation of a small molecule is a common task in computational chemistry. <ref type="bibr">4,</ref><ref type="bibr">5</ref> Here, we introduce a new search method that extends our previously proposed search strategy, Bayesian Optimization Algorithm, BOA, <ref type="bibr">6</ref> by incorporating prior knowledge of correlated adjacent pairs of torsional angles.</p><p>We recently showed BOA tends to find the lowest energy conformation of small to medium organic molecules more efficiently than both an exhaustive systematic search, Confab, <ref type="bibr">7</ref> and a uniform random search, when evaluated using a "fixed-rotor approximation" and the MMFF94 <ref type="bibr">6</ref> force field. BOA required an order of magnitude fewer energy evaluations than these methods to find the lowest energy conformation. Drawing from statistical mechanics, BOA begins with an initial estimate of the probability of likely dihedral angles (e.g. 60 &#8226; , 120 &#8226; , 180 &#8226; , etc.) and updates these probabilities by evaluating a new point on the potential energy surface (PES). In this way, BOA "learns" the most likely dihedral angles of a molecule from all previous observations, and determines the next query point based on the model's uncertainty. This approach balances exploration and exploitation, and prevents the search from being trapped in local minima. However, BOA suffers from two limitations: one is that it tends to unnecessarily sample high energy regions of the potential energy surface. A second is that the Gaussian Process (GP) regression that is used as a surrogate model, which is well-known to have high computational complexity, although recent studies <ref type="bibr">8</ref> have shown the asymptotic complexity of the exact GP inference can be reduced to O(N 2 ), where N is the total number of energy evaluations. Here, we introduce a new knowledge-based acquisition function to address the first issue. Although GP has high computational complexity, we still use it as the surrogate model in this work. Its well-calibrated model of uncertainty is heavily used to guide the sampling on the PES in this work.</p><p>Various knowledge-based methods <ref type="bibr">[9]</ref><ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref> have been proposed for (diverse) conformer generation.</p><p>These methods utilise the torsional preferences in guiding conformer sampling and the torsion rules are typically derived from databases of experimental X-ray crystal structures, such as the Protein Data Bank <ref type="bibr">13</ref> and the Cambridge Structural Database, <ref type="bibr">14</ref> although they can also be derived from computed structures. Context and nearest neighbour effects are generally ignored in the torsion rules: each dihedral is treated as an independent free rotor. <ref type="bibr">1</ref> Structural information about adjacent and proximal rotatable bonds can, however, be crucial for conformer generation, as these torsion angles are naturally constrained. This can help to reduce steric clashes, retain &#960;-conjugation, align intramolecular hydrogen bonds, or preserve other similar non-covalent interactions. For instance, Figure <ref type="figure">1b</ref> shows the computed MMFF94 potential energy surface for 5-phenylthioquinazoline-2,4-diamine, with light blue indicating conformations with low energies. Due to steric clashes, the neighboring dihedral angles are clearly correlated and thus the conformational search can be greatly focused on the most favorable regions of the state space by incorporating this information. Even in a simpler molecule such as ortho-1,1 :2 ,1 -terphenyl, there are correlations between non-neighboring dihedral angles due to steric clashes (Figure <ref type="figure">1d</ref>).</p><p>In this work, we examine the distributions of correlated torsions in (i) X-ray crystal structures; (ii) the lowest-energy conformations from MMFF94; and (iii) an approximate quantum method, GFN2. We analyse the distributions of the correlated torsions in lowest-energy conformations computed first by MMFF94, and then by GFN2, and use these distributions to constrain the search space in Bayesian optimization using a modified acquisition function. We show that this significantly improves the efficiency of the search for low-energy conformations. We also show that correlated dihedral angles are common in organic small molecules.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Material and Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">Knowledge-based Method</head><p>Guba et al. <ref type="bibr">15</ref> derived a set of rotatable bond SMARTS patterns that are used in RDKit's <ref type="bibr">16</ref> Experimental-Torsion Distance Geometry with basic Knowledge (ETKDG) algorithm, <ref type="bibr">17</ref> and also in BOA, <ref type="bibr">6</ref> the predecessor of our new method, BOKEI. This library only considered substructures consisting of a single rotatable bond. We expanded this set of patterns to consider correlated torsions in substructures consisting of two adjacent rotatable bonds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.1">Correlated Torsion Rules</head><p>We enumerated all possible pairs of rotatable bond SMARTS patterns as defined in the library of Guba et al., <ref type="bibr">15</ref> and counted the frequencies of the corresponding pairs of torsion angles observed in ligands having five or fewer rotatable bonds in the Crystallography Open Database (COD). <ref type="bibr">18,</ref><ref type="bibr">19</ref> We excluded pair patterns with fewer than 100 observations, resulting in 19 correlated torsion patterns, including one SMARTS pattern suggested by Cole et al. <ref type="bibr">9</ref> (see Appendix 1 Table <ref type="table">S1</ref>). The remaining substructures defined by Cole et al. were discarded due to insufficient observations in our dataset. Future work will expand on this initial set as discussed below.</p><p>For molecules that contain any matches of the derived SMARTS patterns, we calculated the lowest energy conformation and recorded the corresponding torsion angles. We performed the calculations with two energy functions: (i) a force field, MMFF94, 20 and (ii) a semi-empirical method, GFN2. <ref type="bibr">21,</ref><ref type="bibr">22</ref> We then modeled the observed torsion distribution with bivariate von Mises mixture models for all three sources of structure (X-ray crystal structure, MMFF94, and GFN2) separately. Details of the calculations and models can be found in Appendix 2. The bivariate von Mises distribution has been used to model torsion angles in protein structures. <ref type="bibr">[23]</ref><ref type="bibr">[24]</ref><ref type="bibr">[25]</ref> There are two advantages of using this distribution for correlated torsions:</p><p>(i) it can model correlated torsions that cannot be described by a simple clash term; and</p><p>(ii) it can be easily integrated with existing conformer sampling schemes, such as distance geometry. <ref type="bibr">26</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Bivariate von Mises Distribution and Mixture Models</head><p>The bivariate von Mises distribution is a probability distribution that can be used to jointly model two angular variables (&#952; 1 and &#952; 2 ). It can be thought of as an analogue of the bivariate normal distribution on a torus. In particular, we used the cosine variant of the von Mises distribution, which is as follows:</p><p>where the parameters (&#181;, &#957;) and (&#954; 1 , &#954; 2 ) in the model represent the mean, and concentrations respectively. (&#954; 3 ) is a parameter controlling the correlation.</p><p>The bivariate von Mises distribution can be unimodal or bimodal under different conditions (see Appendix 2). A single bivarate von Mises distribution is not sufficient to capture the multimodality of the torsion preferences. We used mixture models (Eq. 2) to describe the torsion preference entirely; the probability, P (&#952; 1 , &#952; 2 ) of observing the pair of torsion angles, &#952; 1 and &#952; 2 , is given by:</p><p>where K is the number of components in the model, and w i is the weight of each component.</p><p>The estimation procedure of the parameters of the mixture models are explained in Appendix 2.</p><p>We should note that the bivariate von Mises mixture model requires sufficient data to accurately describe torsion preferences, so cases with small numbers of observations were excluded in this work. This limits the current performance of our algorithm, and we discuss some potential solutions in Section 3.7.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Bayesian Optimization</head><p>The general idea of Bayesian optimization is to construct a surrogate model that approximates a black box objective function, f (x), and exploit this model to decide which points to evaluate next. The sampling strategy is determined by the choice of acquisition function, such as Expected Improvement (EI) and Lower Confidence Bound (LCB). Gaussian Process is commonly used as the surrogate model. For more detail about the Gaussian Process and a review of Bayesian optimization, see Rasmussen and Williams, <ref type="bibr">27</ref> Brochu et al. <ref type="bibr">28</ref> and Shahriari et al. 29</p><p>Here,</p><p>, where &#952; best , &#181;(&#952;), and &#963; 2 (&#952;) are the best current value, predictive mean, and predictive variance respectively; while &#934;(&#8226;) and &#966;(&#8226;) are the cumulative distribution function and probability density function, respectively; and &#947; is a parameter to balance exploration against exploitation. We should note that the acquisition functions (Eq. 3 and 4) takes the model uncertainty into account when selecting next query points, but it is still possible to select points in regions with steric clashes. In this work, we therefore define a new acquisition function that makes use of our domain knowledge, namely Knowledge-based</p><p>Expected Improvement (KEI), to address this problem.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1">Knowledge-based Expected Improvement</head><p>Knowledge-based Expected Improvement (KEI) can be considered as a modified expected improvement (EI) acquisition function that offers improvement only when a set of torsion constraints are satisfied:</p><p>where M is the total number of correlated torsions found in the molecule, P m (&#952; m,1 , &#952; m,2 ) is the mixture model of the torsion angle pairs in pattern, m. We assume independence between each pair of correlated torsions. The idea of KEI is similar to the method of expected improvement with Boolean constraints suggested by Griffiths and Hern&#225;ndez-Lobato, and Gelbart et al., with a user-specified minimum confidence of the constraints. <ref type="bibr">30,</ref><ref type="bibr">31</ref> Instead of Boolean constraints, we derived separate distributions of the correlated torsions from the lowest energy conformations found by MMFF94, and GFN2. These were encapsulated by bivariate von Mises mixture models, which are used to constrain the search.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.2">Covariance Function</head><p>Since potential energy is known to be periodic with respect to dihedral angle, a locally periodic kernel, k LP , which is a product of a periodic kernel and a squared exponential kernel, was used:</p><p>where l, p, and &#963; 2 are the length-scale, periodicity, and variance, respectively. The periodicity is determined by torsional potentials corresponding to the rotatable bond SMARTS patterns.</p><p>Note that for missing patterns did not match a specific type of rotatable bond, i.e. did not match a SMARTS pattern, we assigned general values for the periodicity based on the atomic hybridization of the two atoms in the central rotatable bond, i.e. sp 2 -sp 2 , sp 2 -sp 3 , or sp 3 -sp 3 . Boundary constraints were added to the length-scale in the kernel for numerical stability.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">Data</head><p>We used two datasets, the Platinum dataset, <ref type="bibr">3</ref> and a dataset assembled by Ebejer et al., <ref type="bibr">32</ref> to benchmark the performance of the search algorithms. Duplicated molecules in the two datasets were removed based on their InChI Key. Molecules with 2 to 18 rotatable bonds, and containing two adjacent rotatable bonds matching the set of rotatable bond-SMARTS patterns, were selected for the study, giving a set of 533 unique molecules.</p><p>We extracted small molecules with matching rotatable bonds from the Crystallography Open Database (COD), and removed duplicate molecules from the COD set that were present in both the validation set and COD based on their InChI Key. We recorded the torsional preferences in these crystal structures. In addition, we calculated the lowest energy conformations of these molecules using MMFF94 and GFN2, and recorded the resultant calculated torsional for each. We then derived bivariate von Mises mixture models from the torsion preferences found in X-ray crystal structures, and calculated by the MMFF94 and GFN2 methods.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5">Evaluation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.1">Energy function</head><p>Previously, <ref type="bibr">6</ref> we computed a single-point MMFF94 force field energy while keeping the small molecule's bond lengths and bond angles fixed. Here, we relaxed the framework and evaluate the energy of the fully-optimized molecule, i.e. the energy value at a given set of dihedral angles was a result of a short (50 steps) MMFF94 energy minimization, with a concomittant change in bond lengths and bond angles, while the torsion angles remain fixed in geometry optimization. We used the MMFF94 implementation in Open Babel 2.4.1. <ref type="bibr">33</ref> The configurations in the benchmark datasets were used as the initial structures.</p><p>In addition, we performed single-point energy calculations with a more accurate semiempirical method, namely GFN2. <ref type="bibr">21,</ref><ref type="bibr">22</ref> Similarly, the configurations in the benchmark datasets were used as initial structures. The bond length, bond angles and other parameters were inherited from the input structures.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.5.2">Comparison</head><p>Three global optimization conformational search methods were compared: (i) a Genetic Algorithm (GA), and our Bayesian Optimization Algorithm, BOA, with two different acquisition functions: (ii) our previous expected improvement (EI) method,, 6 and (iii) the new knowledge-based expected improvement (KEI) strategy. The implementations are described below.</p><p>(i) Bayesian Optimization with EI (BOA-EI). <ref type="bibr">6</ref> GPyOpt 34 was used for the Bayesian optimization and Pybel <ref type="bibr">35</ref> was used to drive the torsion angles of the molecules and energy minimization. Note that torsion constraints were used in the geometry optimization step (i.e., minimization of all other degrees of freedom, bonds, angles, etc. with fixed dihedral angles). Expected improvement was used as the acquisition function. Five initial random samples were generated to begin a Gaussian Process regression. We also added boundary constraints to the length-scale parameter for the sake of numerical stability. The lowest energy conformation calculated from all iterations returned as the final output structure.</p><p>(ii) Bayesian Optimization with KEI (BOKEI). The implementation was the same as the standard Bayesian optimization in (i), except a knowledge-based acquisition function KEI was used.</p><p>(iii) Genetic Algorithm (GA). The implementation in Open Babel <ref type="bibr">33</ref> for GA was used.</p><p>Termination criterion, i.e. convergence was reached when three identical generations were observed, was applied in GA; all other GA parameters were left as their default values. Note that no torsion constraints were added in the energy minimization step as surrogate model was not required.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6">Search Space</head><p>The search space for each molecule is determined by the set of freely-rotatable bonds in each.</p><p>The search space for the Bayesian optimization and its variant was defined by hypercube</p><p>, where d is the number of rotatable bonds. A discrete grid space was used for GA.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.6.1">Search Budget</head><p>The number of energy evaluations was determined by the number of rotatable bonds in the molecule (see Table <ref type="table">1</ref>). Since five initial samples were used to fit a Gaussian process in Bayesian optimization, only K -5 conformations were sampled after initial sampling. For accurate statistical comparisons of these stochastic methods, five runs were performed for each algorithm. and that from BOA-EI was calculated. The average energy difference was used to evaluate the performance of the search methods. Note that the energy. The average energy difference is calculated as follows:</p><p>where N is the number of runs. E EI,i and E KEI,i are the lowest energy found by EI and KEI in i-th run respectively. E GA is the lowest energy conformation found in all runs (do not depend on i-th run). The lowest energy conformation found by BOA-EI was used as reference in both cases. Since same initialization was used in both BOKEI and BOA-EI, i.e. five initial samples were the same, we could directly compare the performance between them in each run. In GA, We compared its best performance to each run in BOA-EI. Two different energy functions were used in the context and we denote &#8710;E M M F F 94 and &#8710;E GF N 2 to be the average energy difference in MMFF94 and GFN2 respectively.</p><p>Wilcoxon signed-rank test was used to test whether the proposed method, BOKEI, finds lower average energy conformations than BOA-EI and GA, i.e. one-sided test. We tested it across number of rotatable bonds with both MMFF94 and GFN2 energy functions.</p><p>Furthermore, we calculated the frequency of BOKEI in finding lower energy conformations than BOA-EI and GA. We also computed the pattern frequency of our derived torsions pattern across three datasets, namely Platinum dataset, <ref type="bibr">3</ref> COD, and ChEMBL 25, <ref type="bibr">36</ref> and compared to that of the correlated torsion patterns defined in CSD Conformer Generator.</p><p>Lastly, we performed run time analysis on BOA-EI and BOKEI, using a desktop running Fedora 30 with an Intel Core i7-6700 operating at 3.40 GHz, and 32 GB of RAM. Single core was used to read molecule, drive torsions and write conformers to disks. All cores were used in the GPyOpt operations. The time included reading input molecules and writing the conformers to disk. Fifteen molecules with two to six rotatable bonds, three for each, were sampled. We repeated the search with four times each molecule.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Results and Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Example</head><p>Two molecules were sampled to illustrate the strength and the weakness of the BOKEI algorithm, using the geometry-optimized MMFF94 energy (see Appendix 3 for more examples with GFN2). Ten runs were performed for each molecule. From Figure <ref type="figure">2</ref>, we observed the BOKEI was able to find lower energy conformation consistently with same number of iterations. The energy gap between BOKEI and BOA-EI decreased as the number of iterations increased, since both methods should converge to the same global minimum. We should note that the performance of BOKEI was worse than BOA-EI in this example 2b, which was a result of under-estimation of the correlated torsion distribution. Insufficient sampling or biased selection of molecules gave rise to the incomplete prior information, and led to the inferior performance. Using additional data to improve the correlated distributions, even in this case, the performances became similar, as discussed in Section 3.7.</p><p>We revisited the example, 5-Phenylthioquinazoline-2,4-diamine, and studied the effect of the new acquisition function on the posterior. Figure <ref type="figure">S5</ref> in Appendix 2 showed that the posterior of BOA-EI and BOKEI after twenty iterations (25 observations in total, with five initial samples). GFN2 energy function was used in this example. The posterior mean showed a coarse boundary between high and low energy regions. The observations in BOKEI were more closely packed in the low energy region, comparing to that in BOA-EI. Only a small number of observations were sampled in medium or high energy region, the posterior standard deviations thus remained high.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Dataset Summary</head><p>For a broader comparison, 533 molecules with sizes from 2-18 rotatable bonds were used to validate the performance. In GFN2, we benchmarked the performance with molecules up to 13 rotatable bonds only. The number of matches for each pattern in the validation set is summarized in Appendix 3 Figure <ref type="figure">S7</ref>. Five correlated SMARTS patterns are frequently found, with frequency greater than 100. Note that there were nine molecules (five in MMFF94 and four in GFN2) excluded from analysis due to early stopping in one of the five runs (see Appendix 4 Table <ref type="table">S8</ref> and<ref type="table">S9</ref>). This was manifested by a non-positive definite kernel error.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">MMFF94</head><p>Figure <ref type="figure">3a</ref> showed that BOKEI consistently found lower energy conformation than BOA-EI and GA. A Wilcoxon signed rank test showed that energy difference between BOKEI and BOA-EI was statistically significant (p &lt;&lt; 0.01, see Appendix 3 Table <ref type="table">S4</ref>) across all rotatable bonds. On the other hand, we observed that the GA outperformed BOKEI and BOA-EI for molecules with more than twelve rotatable bonds. This was because the small number of samples (200 energy evaluations) may not be sufficient for the BOA-EI or BOKEI models to learn the most likely dihedral angles in high dimensional problems. Figure <ref type="figure">S6a</ref> in Appendix 3 showed that BOKEI frequently (&gt; 63% and &gt; 70%) found lower energy conformations than BOA-EI and GA for molecules with fewer than eleven rotatable bonds respectively Furthermore, Figure <ref type="figure">2</ref> highlighted that BOKEI showed greater benefit earlier on in the evaluations. Comparing the mean energy difference between BOKEI and BOA-EI at different stages (40%, 60% and 100% of the maximum number of energy evaluations), the energy gap was indeed greater, and in favor of BOKEI, in the early stages (see Appendix 3, Table <ref type="table">S6</ref>).</p><p>The gap diminished when more evaluations were used, since both methods converged to the same global optimum. These results suggest that the information about correlated torsions greatly helped the search in the earlier stages, pointing the search towards favorable regions of the potential energy landscape.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4">GFN2</head><p>In GFN2, we used a single-point energy calculation, and excluded GA in the analysis. Figure <ref type="figure">3b</ref> showed that BOKEI consistently found lower energy conformations than BOA-EI.</p><p>Similarly, Wilcoxon signed rank test showed the average energy difference between BOKEI and BOA-EI was statistically significant (p &lt;&lt; 0.01, see Appendix 3 Table <ref type="table">S5</ref>). The energy gap was greater in the early stage and the gap diminished as more energy evaluations were used (see Appendix 3 Table <ref type="table">S7</ref>). In addition, Figure <ref type="figure">S6b</ref> in Appendix 3 showed that BOKEI frequently (&gt; 60%) found lower energy conformations than BOA-EI across all rotatable bonds.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.5">Correlated Torsion</head><p>When adjacent rotatable bonds have correlated dihedral angles, this is typically caused by unfavorable steric interactions -but not always. Four of the nineteen patterns in our library arise because of favorable intramolecular interactions, such as hydrogen bonds and &#960;-&#960; stacking. For example, in pattern 15, the lowest energy conformations found by GFN2 often form intramolecular hydrogen bonds between the N-H or O-H groups and the adjacent carbonyl oxygen atoms in the esters (see Figure <ref type="figure">4a</ref>).</p><p>The thioamide functional group is a key part of patterns 2 and 16 (see Figure <ref type="figure">4b</ref>). The delocalization of the nitrogen lone pairs in this group contributes to its overall planarity, but it could exist in either the cis or trans form. The orientation of the aromatic ring in pattern 2 is thus highly constrained. The cis/trans preference is easily revealed by examining higherorder correlated torsions, i.e. between three adjacent rotatable bonds. In particular, we considered a specific thiourea derivative that is bonded to a carbonyl group (see Appendix 1 Table <ref type="table">S2</ref> ). This is always observed to adopt the following conformation: (i) the C=S and C=O are oriented in 'opposite' directions, while (ii) the thiourea adopts the syn-anti <ref type="bibr">37</ref> conformation. This results in the formation of a pseudo six-membered ring that is stabilized by a C=O --H-N intramolecular hydrogen bond (see for example, Figure <ref type="figure">4b</ref>). Figure <ref type="figure">S3</ref> in Appendix 3 shows these three torsion angles are highly concentrated around 0 &#8226; in COD.</p><p>For pattern 17, &#960;-&#960; stacking is evident: when aromatic rings are attached to both ends of the pattern, both rings prefer to interact with one another (see Fig <ref type="figure">4c</ref>).</p><p>It should be noted that the CSD Conformer Generator 9 also considers 11 correlated torsions, but a simple clash term are used for all other interactions. Here, we use a more flexible approach that employs bivariate von Mises mixture models to fully describe the correlated torsions. Both favorable intramolecular interactions and unfavorable steric clashes can be described. It would also possible to expand this to a multivariate case, 38 in order to capture higher-order correlations as mentioned earlier. Additionally, the framework is easy to inte-grate with other conformer sampling schemes, such as distance geometry. <ref type="bibr">17,</ref><ref type="bibr">26</ref> We intend to integrate with the ETKDG in RDKit in the future as the current implementation does not consider the correlated torsion.</p><p>In addition, we observed the torsion preference of MMFF94 differed from the one in GFN2 and crystal structures in pattern 1, 3 and 12. It suggested some of the molecular interactions could not be represented in the classical force field. In pattern 2, we notice a discrepancy between the crystal conformation and the lowest energy conformation in GFN2 -the second torsion angles (i.e. the torsion angles measured from the thioamide group) are highly concentrated around 180 &#8226; , i.e. the trans-form. This could be explained by the separate hydrogen bond interaction of the N-H group and the C=S in the crystal.</p><p>Furthermore, we compiled the number of the molecules with the presence of correlated torsions in three different datasets: (i) Platinum, (ii) COD and (iii) ChEMBL 25 (see Table <ref type="table">2</ref>). We showed that our SMARTS patterns library surprisingly matched 10 -15% organic molecules, which was noticeably higher than the patterns defined in CSD Conformer Generator (1 -4%). These results suggest that broader investigation of correlated torsions is warranted, despite the conventional assumption of each rotatable bond as an independent free rotor. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.6">Computational Time</head><p>Figure <ref type="figure">5</ref> showed the average run time of BOA-EI and BOKEI with GFN2 energy function, varying number of iterations (50, 100) and number of rotatable bonds (two to six rotatable bonds). Both computational cost increased as the number of rotatable bonds increased. The computational time also increase when BOKEI was used, but was primarily dominated by the number of conformers generated. Note that the current implementation can be further optimized by providing the gradient information of the acquisition function.</p><p>In theory, extra multiplication in the BOKEI acquisition function increases the computational complexity to O(mn), where m is the number of correlated torsions found in a molecule, and n is the number of samples used to evaluate the acquisition function. We should note that the relative contribution to the computational time of the new acquisition function will be small, when a more accurate and computational expensive method, such as density functional theory (DFT), is used for energy evaluation. Our new algorithm will be more cost-effective than the old version in such settings.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.7">Limitation and Future work</head><p>We only calculated the lowest energy conformation of the molecules with up to five rotatable bonds in the COD set, and the low energy region of the correlated dihedral could be incomplete. This limited the performance of the algorithm. This issue can be easily solved by increase sampling of the corresponding substructures in a larger database, for instance ChEMBL <ref type="bibr">36</ref> and PubChem, <ref type="bibr">39</ref> and re-estimate the distribution. Figure <ref type="figure">6a</ref> showed the original prior used in Example 2b. We observed that the cluster centroids shifted when observations from ChEMBL were used (see Figure <ref type="figure">6b</ref>), and Figure <ref type="figure">6c</ref> showed a great improvement in convergence rate comparing to the original case. In addition, hundreds of observations were typically required to fit the mixture models. Cole et al. derived a set of SMARTS patterns, and we did not use all of them due to insufficient observations. In order to overcome this obstacle, we could apply a meta-learning approach proposed by Ton et al., <ref type="bibr">40</ref> which attempt to learn the conditional distribution of the correlated torsion. They showed that the approach could generalise the density of the correlated torsions with few observations. This meta-learning setting could greatly reduce the computational cost in learning torsion rules, and potentially help discover more unexpected torsion patterns for the sampling scheme.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusions</head><p>By reformulating the search of the lowest energy conformation of a given molecule as a constrained Bayesian optimization problem, we have shown concrete improvements. Prior knowledge of correlated torsions were used to confine the exploration to regions of low energy.</p><p>We compared the Bayesian optimization with two different acquisition functions: standard expected improvement (EI) and knowledge-based expected improvement (KEI), and genetic algorithm (GA), using two energy functions: MMFF94 and GFN2. We showed that with the same number of energy evaluations, the Bayesian optimization with KEI (BOKEI) frequently (&gt; 60%) found lower energy conformations (median energy difference 1.95 kcal/mol in MMFF94 and 1.54 kcal/mol in GFN2) than the Bayesian optimization with EI (BOA-EI) in both cases, across all rotatable bonds.</p><p>Importantly, using bivarivate von Mises mixture models to describe the correlated dihedral allowed us to capture correlation that could not be explained by simple clash terms, and it could be integrated into other conformer sampling frameworks easily. Furthermore, we showed that the correlated torsion not only reflect steric clashes, but also favorable intramolecular interactions such as hydrogen bonds and &#960;-&#960; stacking.</p><p>Future work should focus on expanding data sources, to ensure sufficient sampling across a wide range of correlated dihedrals including other types of neighbors, non-nearest neighbors.</p><p>Moreover, ring torsions were not investigated, which are well-known to involve correlations torsional motion (e.g. Cremer-Pople angles and ring pucker). <ref type="bibr">41,</ref><ref type="bibr">42</ref> Such efforts will improve the efficiency in sampling low-energy conformers for applications in property-driven drug design, materials screening, and crystal structure prediction.     Figure <ref type="figure">5</ref>: Average computational time for BOA-EI and BOKEI with GFN2 energy function, using different number of energy evaluations (50, 100). The computational time increased as the number of rotatable bonds increase, but was dominated by the number of conformers generated.</p></div></body>
		</text>
</TEI>
