<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Performance and Analysis of the Alchemical Transfer Method  for Binding Free Energy Predictions of Diverse Ligands</title></titleStmt>
			<publicationStmt>
				<publisher>arXiv</publisher>
				<date>08/16/2023</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10470161</idno>
					<idno type="doi"></idno>
					<title level='j'>arXivorg</title>
<idno>2331-8422</idno>
<biblScope unit="volume"></biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Chen Lieyang</author><author>Yujie Wu</author><author>Chuanjie Wu</author><author>Ana Silveira</author><author>Woody Sherman</author><author>Huafeng Xu</author><author>Emilio Gallicchio</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[The Alchemical Transfer Method (ATM) is herein validated against the relative binding free energies of a diverse set of protein-ligand complexes. We employed a streamlined setup workflow, a bespoke force field, and the AToM-OpenMM software to compute the relative binding free energies (RBFE) of the benchmark set prepared by Schindler and collaborators at Merck KGaA. This benchmark set includes examples of standard small R-group ligand modifications as well as more challenging scenarios, such as large R-group changes, scaffold hopping, formal charge changes, and charge-shifting transformations. The novel coordinate perturbation scheme and a dual-topology approach of ATM address some of the challenges of single-topology alchemical relative binding free energy methods. Specifically, ATM eliminates the need for splitting electrostatic and Lennard-Jones interactions, atom mapping, defining ligand regions, and post-corrections for charge-changing perturbations. Thus, ATM is simpler and more broadly applicable than conventional alchemical methods, especially for scaffold-hopping and charge-changing transformations. Here, we performed well over 500 relative binding free energy calculations for eight protein targets and found that ATM achieves accuracy comparable to existing state-of-the-art methods, albeit with larger statistical fluctuations. We discuss insights into specific strengths and weaknesses of the ATM method that will inform future deployments. This study confirms that ATM is applicable as a production tool for relative binding free energy (RBFE) predictions across a wide range of perturbation types within a unified, open-source framework.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Introduction</head><p>Alchemical binding free energy prediction tools are emerging as a best-in-class standard for in silico prediction of binding free energies (i.e., protein-ligand binding affinity) in structurebased drug design. <ref type="bibr">[1]</ref><ref type="bibr">[2]</ref><ref type="bibr">[3]</ref><ref type="bibr">[4]</ref><ref type="bibr">[5]</ref><ref type="bibr">[6]</ref> While many challenges remain, <ref type="bibr">7,</ref><ref type="bibr">8</ref> the increased utilization of RBFE calculations has been fueled in part by the promising results of large-scale validation campaigns against benchmark sets representative of actual drug discovery projects. <ref type="bibr">[8]</ref><ref type="bibr">[9]</ref><ref type="bibr">[10]</ref><ref type="bibr">[11]</ref><ref type="bibr">[12]</ref><ref type="bibr">[13]</ref><ref type="bibr">[14]</ref><ref type="bibr">[15]</ref><ref type="bibr">[16]</ref><ref type="bibr">[17]</ref><ref type="bibr">[18]</ref><ref type="bibr">[19]</ref> Despite decades of progress, reliable prediction of binding affinities of protein-ligand complexes remains a challenging problem with many unresolved issues, especially related to large chemical modifications, core transformations, and formal charge changes. In principle, the dissociation constant of a complex can be measured via brute force molecular dynamics (MD) simulations by sampling many binding/unbinding events. <ref type="bibr">20</ref> However, this approach is generally not practical because the typical residence time of the ligand in the binding site (from milliseconds to hours) is too computationally expensive for practical applications in drug discovery. The alternative physical pathway methods <ref type="bibr">[21]</ref><ref type="bibr">[22]</ref><ref type="bibr">[23]</ref><ref type="bibr">[24]</ref><ref type="bibr">[25]</ref><ref type="bibr">[26]</ref><ref type="bibr">[27]</ref><ref type="bibr">[28]</ref> consist of measuring the reversible work of dragging the ligand from the solution to the binding site (or vice versa) along a chosen route. While physically appealing and computationally more efficient than brute force MD, <ref type="bibr">26,</ref><ref type="bibr">[29]</ref><ref type="bibr">[30]</ref><ref type="bibr">[31]</ref> physical pathway methods are rarely used in small molecule structurebased drug design because they require overcoming transition states and they do not readily apply to the common enclosed binding sites that lack a clear entryway. <ref type="bibr">27</ref> Furthermore, physical pathway methods do not apply to the direct estimation of relative binding free energies useful in drug discovery applications.</p><p>In drug discovery applications, knowledge of the change in binding affinity resulting from modifying a ligand into another is usually more relevant than that of their absolute binding affinities due to the nature of the iterative design-make-test (DMT) cycle that is almost always employed to advance a hit to a development candidate. <ref type="bibr">2,</ref><ref type="bibr">10,</ref><ref type="bibr">12</ref> Furthermore, while sampling of the full binding/unbinding pathway can yield information about binding kinetics, it is unnecessary for the computation of binding thermodynamics. Accordingly, Relative Binding Free Energy (RBFE) alchemical protocols can estimate the difference in the binding free energies of a pair of related ligands more directly than computing each of their absolute binding free energies. <ref type="bibr">19,</ref><ref type="bibr">[32]</ref><ref type="bibr">[33]</ref><ref type="bibr">[34]</ref> That being said, absolute binding free energies still have great potential value in the context of virtual screening of diverse molecules. <ref type="bibr">35</ref> This work focuses on relative binding free energies in the context of optimizing screening hits to drugs.</p><p>In most RBFE software implementations, the transformation of one molecule to another is accomplished by parameter interpolation approaches <ref type="bibr">33,</ref><ref type="bibr">36</ref> that scale parameters of the potential energy function to convert one ligand into another through an alchemical transforma-tion. However, parameter interpolation schemes require complex and often non-transferable customization of the energy subroutines of molecular dynamics engines and custom soft-core pair-potentials used to correct singularities of the alchemical potential energy function. <ref type="bibr">37,</ref><ref type="bibr">38</ref> Moreover, parameter interpolation implementations typically do not directly connect the two ligands in their bound states. Rather, they rely on a thermodynamic cycle and alchemical calculations in solution and receptor environments separately, and often each step is further split into the decoupling of electrostatic and non-electrostatic interactions to avoid numerical instabilities. <ref type="bibr">8,</ref><ref type="bibr">33</ref> Furthermore, transformations involving changes in the net ligand charge require the additional calculation of correction factors. <ref type="bibr">[39]</ref><ref type="bibr">[40]</ref><ref type="bibr">[41]</ref><ref type="bibr">[42]</ref><ref type="bibr">[43]</ref> Atom mapping procedures to find corresponding atom pairs for interpolation and the creation or annihilation of atoms to dummy types add complexities to current single-topology alchemical RBFE protocols. <ref type="bibr">11,</ref><ref type="bibr">[44]</ref><ref type="bibr">[45]</ref><ref type="bibr">[46]</ref> Single-topology transformations often encode non-standard molecular topology formats that require custom system setup workflows that are incompatible with standard molecular visualization and trajectory analysis tools. In many implementations, RBFE alchemical schemes are limited to R-group transformations involving ligand pairs sharing a common scaffold. <ref type="bibr">8,</ref><ref type="bibr">47</ref> Except for a few commercial products, <ref type="bibr">[48]</ref><ref type="bibr">[49]</ref><ref type="bibr">[50]</ref> scaffold-hopping RBFE calculations involving cyclization, ring expansion, linking, or any other transformation that necessitate the breaking or the formation of chemical bonds, <ref type="bibr">51</ref> are not generally supported.</p><p>We have recently developed the Alchemical Transfer Method (ATM) that resolves many of the aforementioned complexities of RBFE calculations. The key innovations of the method are 1) the mapping of the potential energy functions of the unbound and bound states by a coordinate transformation rather than a variation of parameters and 2) expressing the alchemical potential energy function in a dual-topology formulation as the combination of the energy functions of the physical end states rather than the interpolation of their parameters. <ref type="bibr">28,</ref><ref type="bibr">34,</ref><ref type="bibr">52,</ref><ref type="bibr">53</ref> ATM is not as affected by the complexities of traditional alchemical methods. It supports absolute and relative binding free energy calculations in a unified way, computes free energies directly employing a single simulation box with standard chemical topologies, and natively supports diverse perturbations (standard R-groups, charge-changing, and scaffold-hopping transformations) without correction factors and ancillary calculations. Furthermore, since ATM employs a dual-topology formalism and does not use parameter interpolation or custom soft-core pair potentials, it is easier to implement and more straightforward to transfer across MD engines because it uses the unmodified energy routines of the underlying molecular dynamics engine. For the same reason, it applies to any molecular energy function, including the next generation of polarizable, <ref type="bibr">[54]</ref><ref type="bibr">[55]</ref><ref type="bibr">[56]</ref><ref type="bibr">[57]</ref> quantum-mechanical, <ref type="bibr">[58]</ref><ref type="bibr">[59]</ref><ref type="bibr">[60]</ref><ref type="bibr">[61]</ref> and machine-learning potentials <ref type="bibr">62,</ref><ref type="bibr">63</ref> that are starting to be employed in macromolecular simulations. The current open-source software release of ATM employs the OpenMM molecular dynamics engine and has been successfully tested on a series of drug discovery targets with the AMBER molecular mechanics force field in academic and industrial settings. <ref type="bibr">18,</ref><ref type="bibr">34</ref> The simplifications and greater range of applicability afforded by the alchemical transfer approach can be particularly useful in drug-discovery deployments to screen large and diverse ligand libraries in a more streamlined fashion. In this work, we validate ATM against the community benchmark prepared by Schindler et al., <ref type="bibr">12</ref> which contains examples of standard peripheral group transformations as well as more challenging scaffold-hopping and chargechanging transformations representative of real-world drug-discovery applications.</p><p>Given the large number of calculations involved, aspects of the ATM workflow have been automated, which was facilitated by the nature of the ATM approach that avoids custom alchemical topologies and atom mapping typical of conventional RBFE workflows. <ref type="bibr">4,</ref><ref type="bibr">8,</ref><ref type="bibr">36,</ref><ref type="bibr">64,</ref><ref type="bibr">65</ref> Knowing that the quality of the potential energy functions can have a substantial effect on the free energy prediction accuracy, <ref type="bibr">66,</ref><ref type="bibr">67</ref> we applied a bespoke force field parameter generation protocol for each of the ligands. While the force field generation engine (FFEngine) is not publicly available, the parameters for each of the ligands in this work have been included in the Supporting Information. Thus, the work presented here is fully reproducible using the open-source version of ATM and the published force field parameters.</p><p>The paper is organized as follows: We first introduce the theory of the Alchemical Transfer Method and present some of the key implementation details. We then describe the benchmarks sets, system setup, and alchemical simulation details. The results are presented and analyzed next. The paper concludes with a discussion of examples illustrating the strengths and weaknesses of the method and identifying areas of improvement for the application of ATM to drug discovery projects.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Theory and Methods</head><p>The Alchemical Transfer Method (ATM)</p><p>The alchemical Transfer Method (ATM) models the free energy difference between two chemical states related by a coordinate transformation. One such example is the molecular binding processes investigated here, represented as the translation of the ligand from the solvent to the receptor binding site. The method details are fully described in previously published works., <ref type="bibr">34,</ref><ref type="bibr">53</ref> so an abridged account is provided here. Briefly, consider the potential energy, U 0 (x), of the unbound state (R + A) of the complex between a receptor R and a ligand A when the ligand is in solution and where x = (x R , x A , x S ) represents the coordinates of the receptor, ligand, and solvent, respectively. ATM expresses the potential energy function, U 1 (x), that describes the state RA when the ligand is bound to the receptor as</p><p>where h is a fixed displacement vector that brings the ligand from its position in the solvent bulk to the receptor binding site (Figure <ref type="figure">1</ref>). The free energy difference between the bound and unbound states is then calculated by Free Energy Perturbation (FEP), similar to standard binding free energy methods. To this end, we define the perturbation energy function as</p><p>and introduce a &#955;-dependent alchemical potential energy function</p><p>where W &#955; (u) is an alchemical perturbation function with the properties W 0 (u) = 0 and W 1 (u) = u, ensuring that Eq. ( <ref type="formula">3</ref>) interpolates from the initial unbound state U 0 (x) at &#955; = 0 and the final bound state U 1 (x) at &#955; = 1. The linear alchemical perturbation function</p><p>In this work, we adopt a non-linear expression described in Computational Details that yields faster convergence than the default linear version. <ref type="bibr">68</ref> As elaborated elsewhere, <ref type="bibr">34,</ref><ref type="bibr">53</ref> the alchemical path between the unbound and bound endpoints is divided into two legs: one starting from the unbound state at &#955; = 0 using the alchemical potential in Eq. ( <ref type="formula">3</ref>), and a second leg starting from the bound state U 1 (x) morphing in the other direction towards the unbound state using the alchemical potential</p><p>Both legs terminate at &#955; = 1/2 at the ATM alchemical intermediate with the potential energy function U 1/2 (x) = [U 0 (x) + U 1 (x)]/2 that is an equally weighted average of the unbound and bound states.</p><p>The ATM formulation above is for an Absolute Binding Free Energy (ABFE) calculation.</p><p>Here, we are concerned with Relative Binding Free Energy (RBFE) prediction, in which the binding of a ligand B occurs simultaneously as another ligand, A, leaves the receptor binding site. The free energy change of this process is the difference between the binding free energies of the two ligands. More specifically, an RBFE ATM calculation computes the free energy change from the state RA + B with ligand A bound to the receptor to the state RB + A where ligand B is bound to the receptor. In ATM, this process is described by a coordinate transformation that translates ligand B from a position in the solvent to the binding site and simultaneously translates ligand A from the receptor binding site to the solvent. Using converge the unbound state of the receptor, which can be a slow process given the possibility of high-barrier conformational rearrangements, differences in binding site solvent structure, changes to the protonation/tautomerization states of binding site residues, and other differences that may exist between the bound and unbound states. ATM RBFE calculations always have a ligand in the binding site, thus minimizing these effects. The actual free energy of the unbound receptor is inconsequential in the context of RBFE calculations because it is a constant for each ligand.</p><p>The alchemical intermediates in RBFE calculations are unphysical states in which the ligands are present simultaneously in the receptor binding site and solution, each at a strength proportional to the coupling parameter &#955;. The more dissimilar the ligands, the more strained the conformations of the system required to accommodate the alchemical intermediate states.</p><p>This characteristic is reflected in the high free energy barrier encountered at the alchemical midpoint for some ligand pairs. The height of the free energy barrier is used below as one of the proxies to judge the quality of RBFE calculations and the confidence level of their estimates. The specific settings of the RBFE protocol used in this work are described in Computational Details.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>FFEngine Ligand Force Field Assignment</head><p>FFEgine is a force field toolkit built with ParmEd, <ref type="bibr">69</ref> RDKit, 70 GAFF2, 71 GFN2-xTB, <ref type="bibr">72</ref> GPU-powered QM package Terachem 73,74 that provides high-quality parameters for druglike molecules based on a bespoke workflow where quantum mechanical calculations are performed on each molecule to obtain the potential surface for force field fitting. The atom types in FFEngine are defined following the hierarchical structure described by Jin Z. et al. <ref type="bibr">75</ref> In total, FFEngine utilizes approximately 200 atom types, with the objective of covering the chemical space of drug-like molecules.</p><p>FFEngine generates ligand parameters for the AMBER force field functional form, <ref type="bibr">76</ref> for use in AMBER, GROMACS, OpenMM, or other software packages. The charge as-signment uses GFN2-xTB/BCC, which is similar to the AM1/BCC model from AMBER.</p><p>Atomic partial charges are assigned based on atom types and the bond charge correction (BCC) parameters fitted to the HF/6-31G* electrostatic potential (ESP) from 50,000 druglike compounds and their conformations. In GFN2-xTB/BCC, the pre-charges are assigned with the semiempirical method GFN2-XTB. The molecular structures are relaxed with the machine learning force field GFN-FF to prepare the structure before partial charges assignment. <ref type="bibr">77</ref> The vdW, bond, angle, and torsion parameters are assigned based on a more refined set of atom types. GAFF2 was used as the fallback parameters for bond, angle, and torsional degrees of freedom.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>DiffNet Analysis</head><p>ATM yields estimates of the differences between the binding free energies of pairs of ligands. We employed DiffNet <ref type="bibr">64,</ref><ref type="bibr">78</ref> to estimate the binding free energies of the ligands in the set. Diffnet finds the set of binding free energies most consistent with the network of their differences (referred to as edges) and their uncertainties based on the maximum likelihood principle. The DiffNet solution is known up to an arbitrary constant that we set to match the average of the experimental binding free energies.</p><p>The uncertainties associated with the edges of the network of free energy transformations are an important element of the DiffNet protocol. Edges with small uncertainties weigh on the final solution more than edges with larger uncertainties. To define confidence levels in the ATM RBFE predictions, we used two measures of the quality of the alchemical calculations:</p><p>1) gaps in the perturbation energy distributions and 2) height of the free energy barrier along the alchemical path.</p><p>To assess the presence of gaps in the perturbation energy distributions, the binding energy samples of each leg were histogrammed with bins of size &#8710;u = k B T /&#8710;&#955;, where &#8710;&#955; = 1/N states is the size of the subdivision of the alchemical path among N states alchemical states (approximately 13 kcal/mol in this application). A case with a sequence of two or more bins with zero counts was flagged as unlikely to be converged, and its uncertainty was increased by &#8710;u. A case with a sequence of two or more bins with less than 10% of the expected number of samples based on a uniform distribution was flagged as possibly unconverged, and its uncertainty was increased by a tenth of &#8710;u. Similarly, to capture the lower confidence of predictions of large transformations with high free energy intermediates, we increased the uncertainty of predictions where the free energy of the alchemical intermediate, &#8710;&#8710;G(1/2), exceeds 30 kcal/mol relative to either end state. Specifically, in these cases, we increased the uncertainty linearly by a(&#8710;&#8710;G(1/2) -g) where a = 0.1 kcal/mol and g = 30 kcal/mol.</p><p>While this assessment successfully flagged possibly problematic calculations, the specific parameters used here to assign confidence levels were set empirically and would benefit from validation in future studies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Simulation Setup Workflow</head><p>To conduct this work, we wrote a Python-based ATM RBFE setup and analysis workflow.</p><p>The workflow performs water placement, force field generation, reference atom selection, displacement vector searching, AToM-OpenMM simulation, relative binding free energy calculation, and DiffNet analysis. As input, the workflow requires fully prepared proteins and docked ligand poses.</p><p>In order to get better solvation structures for the receptors, 3D-RISM 79 from Amber-Tools 80 was first applied to estimate the implicit water distribution on the receptor-primary ligand complex surface, followed by Placevent 81 to place the explicit water molecules on the complex surface. Any water molecules that clashed with other ligands in the set were removed. The workflow then assigned ligand force field parameters using FFEngine as described above.</p><p>An ATM RBFE calculation requires the choice of three corresponding reference atoms for the alignment of the ligand pair. <ref type="bibr">34</ref> The alignment reference atoms cannot be colinear and are best chosen among rigid core atoms of the two ligands. This task involves manual selection the reference atoms of only one representative ligand placed into the receptor binding site.</p><p>All other ligands are then automatically aligned to the first by a minimum distance search based on their initial poses.</p><p>In the simulation box for an ATM calculation, one ligand is placed in the binding pocket, and the other one is translated into the solvent phase by a displacement vector h. While the choice of the displacement vector could be arbitrary, a reasonable choice should ensure that any atom of the second ligand is at least 10 &#197; away from any of the receptor atoms.</p><p>On the other hand, a large displacement would unnecessarily increase the box size, affecting performance. We used TLeap 80 to generate a preliminary rectangular solvation box for the receptor with a 10 &#197; water buffer to select a good displacement vector automatically. The ATM displacement vector h was then obtained by displacing the ligand from the binding pocket towards the center of the box face with the smallest surface area until all atoms of the ligand were found outside the box.</p><p>The Amber input chemical topologies with the assigned force field parameters generated by our automated workflow are listed in the simulation input files available on the GitHub repository at <ref type="url">https://github.com/EricChen521/ATM</ref> MerckSet.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Benchmark Systems</head><p>The benchmark sets prepared by Schindler et al. <ref type="bibr">12</ref> include eight receptor targets with 24 to 44 ligands for each target (Table <ref type="table">1</ref>). The benchmark also specifies the ligand pairs forming the edges of the graph of RBFE calculations. <ref type="bibr">82</ref> In total, the benchmark set includes 264 protein-ligand complexes and 550 RBFE edges. Schindler et al. reported free energy estimates for 525 of the 550 transformations using Schr&#246;dinger's FEP+ package. <ref type="bibr">10</ref> In this work, we considered all of the protein-ligand complexes and the corresponding edges except the two involving compound 28 of the CDK8 set, which we suspect to have been misidentified.</p><p>This compound lacks a measured binding free energy and is classified as a non-binder, even though close analogs are strong binders. <ref type="bibr">83</ref> The benchmark sets provided by Schindler et al. <ref type="bibr">12</ref> are considered challenging because, in addition to standard small R-group modifications, they include a significant number of more challenging transformations. The set includes 43 large R-group transformations (more than 10 added/removed atoms), 66 charge transformations (either changing the formal charge or moving the location of the formal charge), and 60 scaffold-hopping transformations. Large R-group transformations are considered challenging because they often induce changes in the conformation and hydration pattern of the complex to accommodate the new groups of atoms. Charge-changing RBFEs, which involve ligands with different net charges, have traditionally required specialized strategies or correction terms <ref type="bibr">20,</ref><ref type="bibr">43,</ref><ref type="bibr">84</ref> unnecessary in our alchemical transfer approach. <ref type="bibr">34</ref> Nevertheless, charge-changing transformations and the related charge-shifting transformations remain challenging because of the receptor and solvent reorganization that they often induce. Finally, scaffold-hopping transformations that include cyclization, ring-breaking, and ring expansion/reduction transformations that involve the breaking or forming of chemical bonds, which traditionally require specialized strategies, <ref type="bibr">48,</ref><ref type="bibr">50</ref> are generally straightforward and are treated here with ATM as any other transformation. <ref type="bibr">34</ref> Computational Details</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Molecular Systems Setup</head><p>We employed the structures of the proteins and bound ligands as provided by Schindler et al. <ref type="bibr">12</ref> The simulation systems were prepared for the RBFE calculations using the automated workflow described above. The AMBER FF14SB force field <ref type="bibr">85</ref> was used for the protein and the TIP3P model 86 for water. K + and Cl -ions were added to the system if needed for neutralization. The force constants of the ligand alignment restraints were set to k r = 2.5 kcal/(mol &#197;2 ) for the position restraint, and k &#952; = k &#968; = 50 kcal/mol for the orientational restraints. <ref type="bibr">34</ref> The C&#945; atoms' positions of the receptors were restrained to their initial values using flat-bottom harmonic restraints with a tolerance of 1.5 &#197; and a force constant of 25 kcal/(mol &#197;2 ).</p><p>The solvated systems were energy minimized, thermalized, and relaxed for 400 ps at 298 K and 1 bar constant pressure and annealed to the &#955; = 1/2 alchemical intermediate in 500 ps while restraining the receptor and the ligands' atoms. The system was then equilibrated in the NVT ensemble at 298 K for 300 ps at the alchemical intermediate after releasing the restraints on the ligands and the receptor (except for the C&#945; atoms positional restraints described above). The resulting structures were used as starting configurations for the alchemical replica exchange simulations described next.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Alchemical Transfer Relative Binding Free Energy Protocol</head><p>ATM RBFE calculations were conducted using the AToM-OpenMM package version 3.2.3, 87</p><p>the ATM MetaForce OpenMM plugin version 0.3.1, <ref type="bibr">88</ref> and the OpenMM MD engine version 7.7. <ref type="bibr">89</ref> The ATM alchemical schedule is comprised of two legs. <ref type="bibr">34,</ref><ref type="bibr">53</ref> Using the notation in-  <ref type="formula">3</ref>) while that for the second leg is</p><p>where 0 &#8804; &#955; &#8804; 1/2, the perturbation energy is given in Eq. ( <ref type="formula">2</ref>), and, for both legs, the alchemical perturbation function is the softplus function 68</p><p>The parameters &#955; 2 , &#955; 1 , &#945;, and u 0 are functions of &#955; (see below). <ref type="bibr">52</ref> The function</p><p>with</p><p>and</p><p>with, in this work, u max = 200 kcal/mol, u c = 100 kcal/mol, and a = 1/16, is the softcore perturbation energy function designed to avoid singularities near the initial state of the alchemical transformation. <ref type="bibr">52</ref> A schedule of 11 equispaced &#955;-states from &#955; = 0 to &#955; = 1/2 was employed for each of the two legs. The corresponding schedules of softplus alchemical parameters in Eq. ( <ref type="formula">6</ref>)</p><p>were: &#955; 1 = 0, 0, 0, 0, 0, 0, 0.1, 0.2, 0.3, 0.4, 0.5, &#955; 2 = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, and &#945; = 0.1 (kcal/mol) -1 and u 0 = 110 kcal/mol for all &#955;-states. We employed this alchemical schedule for all the ligand pairs of the benchmark sets in this work without further optimization.</p><p>For each run, asynchronous Hamiltonian replica exchange 90 molecular dynamics conformational sampling was performed with a 2 fs timestep and a replica running time of 40 ps on 4 GPUs using the AToM-OpenMM software. <ref type="bibr">87</ref> Exchanges between the two equivalent alchemical intermediate states at &#955; = 1/2 allow replicas to transition from one alchemical leg to the other. Perturbation energy samples were collected every 40 ps. Relative binding free energies were computed from replica trajectories at least 5 ns long (approximately 110 ns of MD in aggregate per ligand pair), discarding the first third of the samples for equilibration.</p><p>UWHAM multi-state analysis <ref type="bibr">91</ref> was used for free energy and statistical error estimation.</p><p>The molecular system files, AToM-OpenMM input files, and UWHAM analysis code are available on GitHub at <ref type="url">https://github.com/EricChen521/ATM</ref> MerckSet.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Results</head><p>The relative binding free energy performance of the automated ATM protocol compared with the experimental measurements of the benchmark sets is summarized in Figure <ref type="figure">2</ref>, where each point represents a ligand pair. For each target set, we report the Average Unsigned Error (AUE), the Pearson's correlation coefficient r, and the fraction of concordant predictions f concd. relative to the experiments. A concordant prediction is a case in which the direction of the change in binding affinity of more than 0.5 kcal/mol is correctly predicted (see Methods).</p><p>According to these statistics, the c-Met, PFKFB3, CDK8, and Hif-2&#945; predictions are better than the other targets, with correlation coefficients of 0.6 or higher with over 80% of concordant pairs. The performance for the c-Met set with r = 0.84 and f concd. = 86% is particularly encouraging. However, due to a relatively small number of outliers, the AUEs for these sets are quite high (from 0.98 to 1.5 kcal/mol). The RBFE prediction performance for the remaining four sets (Eg5, SHP2, Syk, and TNKS2) is not as good. The Syk and TNKS2</p><p>set have fewer major outliers and display relatively small AUEs. However, the calculated RBFEs are poorly correlated with the experiments with correlation coefficients of 0.52 and 0.40, respectively. The SHP2 and, particularly, the Eg5 sets have both a significant number of outliers and poor correlation with the experiments.</p><p>The confidence levels of the RBFE predictions represented by the error bars in Figure The accuracy of the absolute binding free energies (ABFE) predictions calculated from the RBFEs using the Diffnet protocol (see Methods) is summarized in Table <ref type="table">1</ref> and illustrated in Interestingly, the triangulation of the data afforded by DiffNet results in ABFEs significantly tighter agreement with the experiments than the RBFEs estimates. This is particularly noticeable for the c-Met set, which, despite the major outliers and large uncertainties of the RBFEs (Figure <ref type="figure">2</ref>), yields high-confidence predictions with low AUEs (Figure <ref type="figure">3</ref>).</p><p>The ATM ABFE predictions for three of the eight systems (c-Met, PFKBF3, and CDK8) have high correlation coefficients and generally low AUEs relative to the experiments. FEP+, based on a different alchemical approach, performed similarly well for these systems, suggesting that they are tractable for diverse free energy methods. Although affected by a few major outliers, the performance on the Hif-2&#945; set is also generally good. Conversely, ATM performed noticeably worse for the SHP2 and Syk sets, where it resulted in an unusually poor AUE relative to the experiments. ATM and FEP+ performed equally poorly for the Syk and Eg5 systems indicating a common set of challenges. The TNKS2 set is less informative due to the small range of experimental affinities (Figure <ref type="figure">3</ref>). </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Discussion</head><p>The results obtained in this validation study confirm the applicability of the ATM approach to large-scale binding free energy estimation campaigns for challenging and diverse ligand libraries. In this section, we present some examples illustrating the strengths and weaknesses of the method that we observed during this work. We plan to incorporate these insights in deploying this approach in future work.</p><p>Small Charge-Preserving R-group RBFE Estimates are Generally</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Well-Converged</head><p>Over 70% of the ligand pairs of the benchmark (384 out of 548) are classified as conventional R-group alchemical transformations involving one or more small peripheral substitutes of the same net charge with 10 or fewer heavy atoms. The calculations of these pairs are expected to converge rapidly, and the corresponding relative binding free energy estimates to be more reliable than the other more challenging cases. As illustrated by the example in Figure <ref type="figure">4</ref>, this expectation is largely confirmed by the consistently good quality measures in terms of is added to a neutral carbamate substituent, is an example of this class. While the free energy of the alchemical intermediate is high (above 40 kcal/mol), the sequence of alchemical states is relatively well connected. The Eg5 case in Figure <ref type="figure">7</ref> includes a shift by three bonds of a charged ammonium group. This alchemical transformation is assigned a large uncertainty because it is characterized by both a disconnected alchemical path and a high free energy intermediate.</p><p>Overall, the 66 RBFE predictions classified as either charge-changing or charge-shifting transformations have larger uncertainties and poorer agreement with the experiments than average. This is especially so for the transformations in this class for the SHP2 set that are found to have an RMSE relative to the experiments in excess of 3 kcal/mol. Nearly all edges with large uncertainties for the c-Met and Eg5 sets (Figure <ref type="figure">2</ref>) correspond to charge-changing or charge-shifting transformations. Achieving better convergence for these transformations would significantly improve ATM's promising overall prediction accuracy for these sets.</p><p>Because the system remains neutral, the poorer outcomes observed here for alchemical transfer transformations involving charge variations are not obviously due to systematic boundary conditions and finite system biases present in some double-decoupling protocols. <ref type="bibr">43,</ref><ref type="bibr">84</ref> The convergence issues observed with charged groups are more likely related to the conformational reorganization of the complex that occurs when interactions are varied by placing or relocating a charged group of the ligand. This is illustrated by the case in Figure <ref type="figure">7</ref> where an amide hydrogen bonding acceptor group is replaced by a strong ammonium hydrogen bonding donor originally placed three bonds away. In this case, we observe that the two ligands shift position and form salt bridges with two different glutamate residues of the receptor (Figure <ref type="figure">8</ref>). The equilibration between these two conformational states of the complex hinders convergence because it occurs slowly relative to the timescales of the alchemical simulations. In addition, because the receptor needs to accommodate both charged groups of the ligand simultaneously, conformational frustration at the alchemical intermediate leads to high free energies and gaps and lack of state overlaps along the alchemical pathway (Figure <ref type="figure">7</ref>).</p><p>This and the previous examples underscore the various ways in which differences between ligand pairs prevent successful RBFE predictions. Two ligands can have very different sizes and shapes, as in the example of Figure <ref type="figure">5</ref>, or, as in the present example of Figure <ref type="figure">7</ref>, they can be structurally similar but differ radically in the way that corresponding groups interact with their environments. Cases with very dissimilar ligands such as these can probably be addressed by breaking the alchemical transformation into smaller steps by inserting suitable chemical intermediates. For example, the transformation in Figure <ref type="figure">7</ref> that attempts to replace one charged group with another simultaneously causing a large conformational reorganization (Figure <ref type="figure">8</ref>) could be implemented by first removing or neutralizing one ammonium group and then inserting the other in a second step. The case in Figure <ref type="figure">6</ref> and similar others we observed indicate that individual insertions of charged groups are more manageable than replacing one charged group with another placed elsewhere.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Scaffold-hopping Transformations are as Straightforward as R-group Transformations</head><p>The Alchemical Transfer Method is based on a dual-topology representation, making it easier to set up scaffold-hopping transformations between ligands that do not share the same core topology. Indeed, the ATM setup procedure described here for scaffold-hopping transformations is the same as any other transformation. Sixty of the 548 RBFE calculations conducted in this work were classified as scaffold hopping transformations.</p><p>As illustrated by the examples in Figures <ref type="figure">9</ref> and<ref type="figure">10</ref>, the results of this validation campaign have also shown that scaffold-hopping RBFE calculations converge just as efficiently as small R-group transformations of the same kind. In the HIF-2&#945; case in Figure <ref type="figure">9</ref>, for example, a 5membered ring is formed by the cyclization of two substituents of the central aromatic group.</p><p>In the case of Figure <ref type="figure">10</ref>, a 5-membered ring is expanded to a 6-membered ring. In both of these cases and many similar ones we observed, there are good overlaps between perturbation</p></div></body>
		</text>
</TEI>
