<?xml-model href='http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng' schematypens='http://relaxng.org/ns/structure/1.0'?><TEI xmlns="http://www.tei-c.org/ns/1.0">
	<teiHeader>
		<fileDesc>
			<titleStmt><title level='a'>Free Energy-Based Computational Methods for the Study of Protein-Peptide Binding Equilibria</title></titleStmt>
			<publicationStmt>
				<publisher></publisher>
				<date>05/01/2021</date>
			</publicationStmt>
			<sourceDesc>
				<bibl> 
					<idno type="par_id">10275774</idno>
					<idno type="doi">10.1007/978-1-0716-1855-4_15</idno>
					<title level='j'>Methods in molecular biology</title>
<idno>1064-3745</idno>
<biblScope unit="volume">Computational Peptide Science</biblScope>
<biblScope unit="issue"></biblScope>					

					<author>Emilio Gallicchio</author><author>Thomas Simonson</author>
				</bibl>
			</sourceDesc>
		</fileDesc>
		<profileDesc>
			<abstract><ab><![CDATA[This chapter discusses the theory and application of physics-based free energy methods to estimate protein-peptide binding free energies. It presents a statistical mechanics formulation of molecular binding, which is then specialized in three methodologies: (i) alchemical absolute binding free energy estimation with implicit solvation, (ii) alchemical relative binding free energy estimation with explicit solvation, and (iii) potential of mean force binding free energy estimation. Case studies of protein-peptide binding application taken from the recent literature are discussed for each method.]]></ab></abstract>
		</profileDesc>
	</teiHeader>
	<text><body xmlns="http://www.tei-c.org/ns/1.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Peptide and peptide-derived molecules are widely used to target protein-protein interactions for medicinal purposes and basic biological research. In-silico models play an increasingly significant role in the study of protein-peptide interactions. As excellently reviewed elsewhere, <ref type="bibr">[1,</ref><ref type="bibr">2,</ref><ref type="bibr">3]</ref> computational methods for studying proteinpeptide interactions have evolved on somewhat separate tracks from those used for small molecule-protein interactions. These differences are partly due to the greater flexibility and size of peptides and their tendency to interact with proteins through many relatively weak interactions. Nevertheless, because the same fundamental physical forces regulate all molecular recognition phenomena, it is helpful to relate computational models under a standard set of principles. This chapter is devoted to a class of physics-based free energy methods considered the most accurate and detailed for modeling the thermodynamics of molecular binding equilibria. These methods model the interactions between molecules as well as their motion at the atomic level. We derive each method discussed from a well-established statistical mechanics theory of non-covalent molecular association.</p><p>The chapter attempts to demystify the theory and the seemingly arcane formulas and computational procedures used in the field and point out the specific features of the methods that make them more or less suitable for studying protein-peptide interactions.</p><p>There is an implicit acknowledgment here that an understanding of these methodologies and how to select and apply them appropriately cannot be accomplished fully without referring to the underlying theory. The treatment employed here requires only a basic familiarity with concepts of statistics (probability distributions, averages, marginalization) and of classical statistical thermodynamics (classical partition functions and their manipulations, and their relationship with the free energy).</p><p>After presenting the theory and methods, we then illustrate their applications by discussing three case studies. We hope that this format will help convey the characteristics and relationships between the various methodologies and the fundamental principles on which they are based.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Statistical Mechanics Formulation</head><p>In this section, we derive and discuss a statistical mechanics theory of molecular binding. The concepts and the formulas expressed here will be used later to rationalize the specific computational methods and practices used in the case studies reviewed in Section 3.</p><p>We attempt to use unambiguous notation throughout, but sometimes we adopt a simplified notation to unclutter the equations. For example, in intermediate formulas, we often omit limits of integration and Jacobian factors for curvilinear coordinates when they do not affect the form and interpretation of the final result. In some places, we use function arguments to distinguish two functions. For example, we might denote the ligand and receptor's potential energy functions with the same symbol , as ( ) and ( ), even though they are different mathematical functions.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">The Standard Free Energy of Binding</head><p>We will consider here the reversible non-covalent binding equilibrium between receptor molecules R and ligand molecules L to form a complex RL in an ideal solution: R + L &#8651; RL, <ref type="bibr">(1)</ref> with the dimensionless equilibrium constant</p><p>where [. . .] are concentrations, &#8226; is the standard state concentration (conventionally set as 1 M or 1 molecule/1668 &#197; 3 ), and the "eq" subscript states that all concentrations are evaluated at equilibrium. The Gibb's molar standard binding free energy, which is the main objective of the computational models of binding discussed here, is defined as</p><p>where B is Boltzmann's constant and is the temperature in the Kelvin scale (in the following we will assume constant temperature pressure conditions).</p><p>Implicit in this quasi-chemical description of the binding equilibrium is the idea that the separated species in solution R and L, as well as the complex RL, are defined in some way. In an experimental setting, the apparatus used to measure equilibrium concentrations provides a working definition of the species. The nature of the experimental reporter used to monitor the formation of the complex is of particular relevance. <ref type="bibr">[4]</ref> The change of a spectroscopic signal, as in NMR and UV/VIS fluorescence assays, <ref type="bibr">[5]</ref> likely probes a set of conformations of the complex in which specific groups of the receptor and the ligand are in contact. Hence, different spectroscopic reporters would, in general, yield different estimates of the standard free energy of binding. <ref type="bibr">[6]</ref> Spectroscopic reporters stand in contrast to experimental reporters, such as those in calorimetric, surface plasmon resonance (SPR), amplified luminescent proximity (AlphaScreen), and equilibrium dialysis binding assays, that probe unspecific molecular association. <ref type="bibr">[4,</ref><ref type="bibr">7,</ref><ref type="bibr">8,</ref><ref type="bibr">9,</ref><ref type="bibr">6]</ref> Here, we focus mainly on computational models that define the complex using structural means-typically specific distances and angles between groups of atoms <ref type="bibr">[10]</ref>-and are therefore more suitable to describe measurements of binding constants with specific spectroscopic experimental reporters.</p><p>In practice, the association between a peptide ligand and a protein receptor is also often monitored by indirect biochemical means, such as enzymatic inhibition <ref type="bibr">[11]</ref> or pull-down assays, <ref type="bibr">[12]</ref> that are only indirectly related to the equilibrium binding constant of the ligand-receptor complex. The computational models' ability to reproduce or explain this type of data is expected to be semi-quantitative at best, as it would be a correlation between experimental binding constants and activity data.</p><p>While ambiguities in relating molecular computer simulations to experimental biophysical data of molecular binding exist for any molecular complex, the issue is explicitly discussed here because it is expected to be particularly widespread for the study of the interactions involving peptides, which are generally more flexible than most small-molecule drug compounds and engage protein receptors over a large binding surface in a variety of binding modes. It is useful to keep these issues in mind when designing a computational model and the answers that one can reasonably extract from it. Computational modeling can be a valuable tool when used judiciously by exploiting its strengths while managing its unavoidable limitations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Statistical Mechanics Theory of Non-Covalent Molecular Binding</head><p>Under the assumptions above, Gilson et al. <ref type="bibr">[13]</ref> derived a statistical mechanics expression for the binding constant [Eq. ( <ref type="formula">2</ref>)] which, with a few reasonable approximations (discussed below), can be written as: <ref type="bibr">[14]</ref> </p><p>where is the intramolecular configurational partition function of one molecules of species in solution.</p><p>A full derivation of Eq. ( <ref type="formula">4</ref>) is beyond the scope of this chapter. However, it is briefly outlined here to introduce the notation. Eq. ( <ref type="formula">4</ref>) is derived by writing the molar standard binding free energy as the difference of the standard chemical potentials of the complex and those of the receptor and ligand</p><p>and employing the McMillan-Mayer expression for the standard chemical potential of a solute in an ideal solution <ref type="bibr">[15]</ref> &#8226; = -ln</p><p>where the internal canonical molecular partition function of solute in solution and &#923; is the thermal De Broglie wavelength of the center of mass of the solute. The internal molecular partition function includes only the internal degrees of freedom of the solute obtained after separating the translational degrees of freedom of the molecular center of mass. Furthermore, the solute's internal canonical molecular partition function in solution is understood in the context of the concept of the solvent potential of mean force, <ref type="bibr">[16]</ref> in which the solvent degrees are averaged out.</p><p>While a quantum-mechanical treatment is required in general, adopting a classical expression for the molecular partition function is appropriate for the present discussion limited to non-covalent molecular association equilibria, which do not involve the formation or breaking of chemical bonds. The internal canonical molecular partition function is written as</p><p>where the denominator comes from the integration over the momenta, the factor of 8 2 comes from the integration over the orientational degrees of freedom of the solute, and is the vibrational molecular configurational partition function</p><p>where = 1/( ) is the inverse temperature, denotes the collection of the vibrational degrees of freedom of solute and</p><p>is the effective potential energy of a specific configuration of the solute in solution.</p><p>The effective potential energy is given by the sum of the intramolecular potential energy ( ) and the solvent of potential of mean force ( ), defined as the free energy of solvation of the solute kept rigid in configuration . In the present notation,</p><p>where denotes the collection of degrees of freedom of solvent molecules,</p><p>( , ) is the potential energy of the mixture of solvent molecules and one solute molecule , and is the configurational partition function of the pure solvent, expressed as the integral in Eq. ( <ref type="formula">10</ref>) but without the solute.</p><p>The details are omitted since the the contributions from momenta cancel out in this classical treatment. We assume that the orientational degrees of freedom can be separated from the vibrational degrees of freedom without significant loss of accuracy. This is generally an excellent approximation at moderate temperatures.</p><p>It should be noted that the solvent potential of mean force formalism does not introduce new assumptions or approximations than the ones already adopted. In this context, it is only a convenient notation aid. We will discuss later implicit solvation models which approximate the solvent potential of mean force.</p><p>The notation can be easily extended to solvent mixtures including ions and co-solvents.</p><p>Eq. ( <ref type="formula">4</ref>) is obtained by inserting Eq. ( <ref type="formula">6</ref>) for each species, using Eqs. (7-10), into Eq. ( <ref type="formula">5</ref>) noticing that the kinetic energy factors cancel out, and finally inverting Eq.</p><p>(3).</p><p>The definition of the vibrational configurational partition function of the complex, , receives special consideration in this theory. <ref type="bibr">[13]</ref> In the complex, the translational and orientational degrees of freedom of the ligand are represented by the internal degrees of freedom of the complex that specify the position and orientation of the ligand with respect to a coordinate system attached to the receptor. <ref type="bibr">[10]</ref> Furthermore, the integration along these coordinates is limited to some specified range of configurational space that encodes our structural definition of what constitutes a valid configuration of a ligand "bound" to the receptor (see the discussion in Section 2.1). The structural definition of the bound complex is a necessary and somewhat arbitrary input of the theory. <ref type="bibr">[13,</ref><ref type="bibr">4,</ref><ref type="bibr">14,</ref><ref type="bibr">7]</ref> Without it, the free energy of the bound complex relative to the unbound state is undefined and, consequently, the standard binding free energy and the binding constant would also be undefined in this theory.</p><p>It is customary to represent the bound region of the complex by an indicator function ( ), where represents the collection of the six coordinates that specify the position and orientation of the ligand relative to the receptor. <ref type="bibr">[10]</ref> The indicator function is set to 1 if the position and orientation of the ligand is such that receptor and ligand are considered bound and zero otherwise so that can be written as</p><p>Three translations and three orientations for a non-linear ligand.</p><p>The specific choice of the coordinates is arbitrary as long as they do not couple directly or indirectly the intramolecular coordinates of the receptor or the ligand.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">The Binding Free Energy Formula</head><p>Since the direct evaluation of partition functions is not generally feasible, Eq. ( <ref type="formula">4</ref>) is not amenable to direct computation. One strategy is to transform it into an average over the conformational ensemble in which receptor and ligand are uncoupled. To do so, we reorganize the integration variables in the numerator so that they match exactly those in the denominator. First, define</p><p>which measures the spatial ( site ) and angular (&#937; site ) extent of the bound state of the complex when receptor and ligand are uncoupled. Then, multiply and divide Eq.</p><p>(4) by Eq. ( <ref type="formula">12</ref>) by keeping the integral form in the denominator and the integrated form in the numerator. The result is</p><p>where</p><p>is the ensemble average of the Boltzmann weight of the effective binding energy of defined as the difference in effective potential energies of the complex in the specified configuration and of that of the separated receptor and ligand without changing their internal configurations</p><p>Eq. ( <ref type="formula">12</ref>)is is colloquially referred to as the volume of the receptor binding site. The notation used here suggests that translational and orientational components are not coupled in the definition of ( ). The present treatment is still valid if this is not the case, except that in this case the value of the integral of the indicator function is not written as the product of spatial and orientational components. Finally, &#937; site = 8 2 if the definition of the bound complex does not involve orientational coordinates, that is when only the position of the ligand is used to judge whether it is bound to the receptor.</p><p>with the normalized probability density function</p><p>which corresponds to an unphysical state of the complex in which the ligand is bound to the receptor (the density is zero unless ( ) = 1) but it does not interact with it (the potential function lacks receptor-ligand coupling terms). We will hereafter refer to this state as the decoupled state of the complex. Conversely, the coupled state of the complex is the physical state in which the bound ligand and the receptor interact through the &#936; ( , , ) potential function.</p><p>Inserting Eq. ( <ref type="formula">13</ref>) into Eq. (3) yields the following expression for the standard free energy of binding</p><p>where</p><p>is the ideal component of the standard free energy of binding, corresponding to the reversible work for transferring a ligand from an ideal solution at concentration &#8226; to the binding site region in the absence of ligand-receptor interactions, and</p><p>is the excess component of the standard free energy of binding, corresponding to the reversible work for turning on the receptor-ligand interactions while the ligand is sequestered within the binding site region of the receptor. The goal of the computational models discussed in this chapter is the estimation of the excess free energy of binding. The ideal component is generally computed analytically by integration of the expression that defines the indicator function of the bound complex.</p><p>Eq. ( <ref type="formula">19</ref>) provides, in principle, a computational route to evaluate the binding free energy. The process is often called alchemical because it is unrealizable in Nature.</p><p>Never the less it produces estimates that can be compared to experimental measurements. It instructs to (i) obtain a sample of Boltzmann's-distributed conformations of the complex in the uncoupled state (by molecular dynamics, typically), (ii) evaluate the binding energy function [Eq. ( <ref type="formula">15</ref>)] for each sample by turning on without conformational rearrangements the coupling between ligand and receptor, and finally (iii) find the average of the Boltzmann weight exp(-). While straightforward, this process is numerically ill-conditioned, and it fails for all but the simplest systems.</p><p>This problem arises because atoms of the ligand and the receptor are very likely to clash when uncoupled. Consequently, the binding energy is large and positive, and exp(-) is negligibly small for the vast majority of samples. Effectively, the sampling process generates mostly zeros, and the average is dominated by the very rare cases when, by chance, ligand and receptor do not clash and are primed to form favorable interactions even in the absence of such interactions.</p><p>To appreciate more quantitatively the severity of this numerical problem, let's rewrite the ensemble average in Eq. ( <ref type="formula">19</ref>) as a statistical average</p><p>where 0 ( ) is the probability density distribution of the binding energy in the uncoupled state. As shown for example in Fig. <ref type="bibr">(1)</ref> for the complex between 3iodotoluene and the L99A mutant of T4-lysozyme, <ref type="bibr">[17]</ref> 0 ( ) (in green) is greatest for large and positive values of the binding energy. For this system, the probability of finding a conformation for which the integrand of Eq. ( <ref type="formula">20</ref>) is significant (the red curve) is six or more orders of magnitude smaller than the probability of occurrence</p><p>In practice, various strategies ranging from stratification (break up the binding process by introducing appropriate intermediate states) to importance sampling (preferential sampling of bound states) have been devised to overcome the numerical problems in alchemical free energy averages. Some of these strategies will be discussed in the case studies later in this chapter. While often very useful, applying these advanced strategies to protein-peptide complexes remains very challenging, as reflected in the paucity of successful alchemical absolute binding free energy calculations for protein-peptide complexes reported in the literature.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.1">The Double-Decoupling Method</head><p>Eq. ( <ref type="formula">19</ref>) is not directly applicable to the calculation of binding free energies unless the solvent potential of mean force, ( ), or a suitable implicit solvent approximation for it, is available for the ligand, the receptor, and their complex. The solvent potential of mean force is required for conformational sampling and the evaluation of effective binding energies for each sample using Eqs. ( <ref type="formula">15</ref>) and ( <ref type="formula">9</ref>).</p><p>The alternative is to employ an explicit representation of the solvent. The relevant partition functions include integrating the solutes' internal degrees of freedom and the degrees of freedom of the solvent molecules. The result is a binding free energy formulation known as double-decoupling <ref type="bibr">[13]</ref> involving two exponential averages of the same form as Eq. ( <ref type="formula">19</ref>), one for coupling the ligand from vacuum to the solvated receptor and another for coupling the ligand to the pure solvent. These two processes, the second of which is related to the solvation of the ligand, are part of a thermodynamic cycle that brings the ligand from the solvent bulk to the solvated receptor through an intermediate state in which the ligand is in vacuum (Fig. <ref type="figure">2</ref>).</p><p>The double-decoupling method is regarded as the leading computational model for calculating protein-small molecule binding free energies. However, due to their sizes, Fig. <ref type="figure">2</ref> Schematic illustration of the thermodynamic cycle of the double decoupling method for the calculation of the binding free energy between a molecular receptor (orange doughnut) and a ligand (black circle). The dashed circle within the receptor represents the binding site region. The blue boxes represent the solvent. The bound and unbound end states are transformed to a common intermediate state in which the ligand is in vacuum (white). The excess binding free energy is the difference of the free energy changes of the two legs,</p><p>it is not generally applicable to peptides. It is presented here because it forms the basis for the relative binding free energy method employed in the case study of Section 3.2. To see why double-decoupling is not readily applicable to peptides, consider, for example, the first leg in Fig. <ref type="bibr">(2)</ref>, which is the inverse of the coupling of the peptide to the solvated receptor. For the same reasons outlined above concerning Eq. ( <ref type="formula">17</ref>), it would be very challenging to compute the free energy of this process because, in addition to the many atomic clashes with the receptor atoms, the uncoupled peptide will also clash with solvent molecules that would be present in the binding site.</p><p>Similar challenges would exist for the hydration leg.</p><p>The double-decoupling formula is derived from the statistical mechanics theory outlined in Section 2.2 by first inserting the definition of the solvent potential of mean force [Eq. <ref type="bibr">(10)</ref>] in each of the configurational partition functions in Eq. ( <ref type="formula">4</ref>) and then multiplying and dividing by the configurational partition function of the ligand in vacuum 0, to obtain:</p><p>where , is the configurational partition function of a system with solvent molecules with one molecule of species whose position and orientation, like in Section 2.2, is fixed. So for example</p><p>where ( , , , ) is the potential energy function of a system with solvent molecules containing the receptor-ligand complex in the configuration specified by the internal degrees of freedom , , and . 0, represents the configurational partition function of the ligand in vacuum.</p><p>The reciprocal of the last term in Eq. ( <ref type="formula">21</ref>) can be written as (23), this term is related to the solvation free energy of the ligand or the opposite process of leg 2 in Figure <ref type="figure">2</ref>.</p><p>The ratio of partition functions corresponding to the complex in Eq. ( <ref type="formula">21</ref>) is converted to an average by multiplying and dividing by site &#937; site as done earlier to derive Eq. ( <ref type="formula">13</ref>)</p><p>Specifically, the free energy of a solute in a fixed position and orientation in vacuum to a fixed position and orientation in solution; a quantity also known as the solvation free energy in the Ben-Naim standard state. <ref type="bibr">[19,</ref><ref type="bibr">20]</ref> where = ( , , , ) -( , ) -( ) is the instantaneous change in potential energy for bringing the ligand from vacuum to a position and orientation relative to receptor in a solution containing the receptor, and . . . , + , similarly to Eq. ( <ref type="formula">19</ref>), indicates the ensemble average over the uncoupled ensemble in which the ligand is bound to the receptor ( ( ) = 1) but it does not interact with either the receptor nor the solvent. As indicated in Eq. ( <ref type="formula">24</ref>) this ensemble average gives the free energy of the inverse of leg 1 in Figure <ref type="figure">2</ref>. Combining Eqs. ( <ref type="formula">23</ref>), ( <ref type="formula">24</ref>), ( <ref type="formula">21</ref>), ( <ref type="formula">17</ref>), <ref type="bibr">(18)</ref> and ( <ref type="formula">3</ref>) we finally arrive at the double-decoupling expression for the excess binding free energy:</p><p>as illustrated in Figure <ref type="figure">2</ref>.</p><p>Note that the free energy formula for each leg is in the same form of an exponential average [Eq. ( <ref type="formula">24</ref>)] of the alchemical potential energy change as the direct binding free energy formula we derived in Section 2.3. Thus, similar considerations apply for each leg of double-decoupling. In each case, the formula instructs to obtain samples of configurations of either the systems with the ligand in solution or the ligand in the solvated receptor in their decoupled ensembles. It then instructs to average over the set of samples the Boltmann's weight of the potential energy change for turning on the coupling between the ligand and the environment without conformational rearrangements. Here too, each leg's averaging process is expected to be numerically ill-conditioned (see, for example, Figure <ref type="figure">1</ref>) and not generally applicable directly in molecular simulations. Some numerical approaches to this problem are illustrated in the Case Studies section of this chapter.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4">The Potential of Mean Force Method</head><p>In this section we derive a non-alchemical formulation of the statistical mechanics expression ( <ref type="formula">4</ref>) which leads to the potential of mean force formula for the of binding constant.</p><p>Using the definition of the internal configurational partition function of the complex in Eq. ( <ref type="formula">11</ref>) and the analogous ones for the receptor and ligand, Eq. ( <ref type="formula">4</ref>) is written as</p><p>where we have written the product of the separated receptor and ligand as the partition function of a single system in which the ligand is placed in an arbitrary position * sufficiently removed from the receptor so that it does not interact with it.</p><p>Eq. ( <ref type="formula">26</ref>) is then written as</p><p>where the integration is within the binding site region where ( ) &#8800; 0, and the potential of mean force (PMF) function is defined as</p><p>where &#916; ( ) is the value of the PMF at relative to the value far away from the receptor. With this definition the PMF is zero at any point far away from the receptor.</p><p>The PMF as defined corresponds to the probability density of ( ) of finding the ligand in the orientation and position relative to the receptor:</p><p>so that</p><p>The potential of mean force expression <ref type="bibr">(27)</ref> formally instructs to map out the probability density <ref type="bibr">(29)</ref> to observe the ligand around the receptor in orientation and position , including far away from the receptor and within the binding site region, and to then integrate it within the binding site region to obtain the binding constant using Eq. ( <ref type="formula">27</ref>).</p><p>Some comments are in order. First, the PMF function can be obtained in the solvent of potential of mean force formulation as suggested by Eq. ( <ref type="formula">28</ref>) or using an explicit representation of the solvent by inserting the definitions of the effective potential energy &#936; and of the solvent of potential of mean force <ref type="bibr">(10)</ref> into Eq. ( <ref type="formula">28</ref>)</p><p>It is evident therefore that the PMF is obtained by monitoring the probability of occurrence of the ligand at whether an implicit or explicit description of the solvent is used.</p><p>Secondly, the potential of mean force formula for the binding constant <ref type="bibr">(27)</ref> does not require knowledge of the probability density ( ) everywhere around the receptor. It requires it only within the binding site region and at one arbitrary point * far away from the receptor in the solvent bulk to compute &#916; ( ) from Eq. <ref type="bibr">(30)</ref>. The latter is a fundamental point. It is not sufficient to study the distribution of placements of the ligand in the binding site o compute the binding free energy. We also require the probability of finding the ligand in the binding site relative to finding it somewhere in the solvent bulk. In practice, the PMF is obtained in a volume that includes both the binding site and positions far away from the receptor to connect the two regions in a statistical sense. <ref type="bibr">[21,</ref><ref type="bibr">22,</ref><ref type="bibr">23]</ref> Finally, the PMF is rarely obtained over all six degrees of freedom of (three positions and three orientations). In practice, the PMF is collected only along some of the dimensions by averaging over the others. The averaging procedure is formally described by marginalization of ( ). For example, to obtain the probability of the position of the ligand regardless of its orientation we integrate ( ) = ( , 1 , 1 , 2 ) over the three Euler angles 1 , 1 , and 2</p><p>In the bulk, the ligand distribution does not depend on the orientation and we get</p><p>Next, integrate Eq. ( <ref type="formula">27</ref>) over 1 , 1 , and 2 , assuming that the binding site definition does not depend on orientations, and expressing -&#916; ( ) as ( )/ ( * ), to obtain</p><p>where</p><p>and we have used Eqs. <ref type="bibr">(32)</ref> and <ref type="bibr">(33)</ref>. The implementation of Eq. ( <ref type="formula">34</ref>) requires the PMF with respect to the position of the ligand regardless of its orientation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Case Studies of Applications of Free Energy Methods to</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein-Peptide Binding Free Energy Estimation</head><p>In this section, we review some applications of the free energy methods derived from the statistical mechanics theory of non-covalent molecular binding introduced in Section 2.2 to the study of protein-peptide binding phenomena. We will focus in particular on theoretical and methodological aspects that will be introduced and discussed as needed. The following case studies are far from an exhaustive representation of the literature in the field. They have been selected primarily to illustrate the application of the theory and methods presented in Section 2. We also do not attempt to review each work exhaustively.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Binding of Cyclic Peptides to HIV Integrase with the</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Single-Decoupling Method and Implicit Solvation</head><p>As part of the infection cycle, HIV inserts its genome into a human chromosome.</p><p>The HIV integrase (IN) enzyme responsible for this process is recruited to the nuclear chromatin by the human lens epithelium-derived growth factor (LEDGF) transcriptional coactivator. <ref type="bibr">[24]</ref> There have been significant attempts <ref type="bibr">[8,</ref><ref type="bibr">25,</ref><ref type="bibr">26,</ref><ref type="bibr">27]</ref> to develop therapies against HIV based on disrupting the interaction of LEDGF with HIV IN, which occurs at the so-called LEDGF binding domain of integrase (Fig. <ref type="figure">3</ref>).</p><p>The study of the interaction of LEDGF and LEDGF-derived synthetic peptides with HIV-IN has provided useful insights for competitive inhibitors' design. <ref type="bibr">[28,</ref><ref type="bibr">29]</ref> As an example, Figure <ref type="figure">3</ref> illustrates the crystal structure of the LEDGF binding domain of the HIV IN dimer complexed with a cyclic peptide. <ref type="bibr">[29]</ref> Building upon an earlier successful application of alchemical binding free energy calculations of small-molecule inhibitors targeting the LEDGF/HIV IN as a plugin of the OpenMM molecular dynamics library <ref type="bibr">[34]</ref> has been named the Single-Decoupling Method (SDM), <ref type="bibr">[17]</ref> a name chosen to better place it in the same theoretical context as the Double-Decoupling Method (DDM) <ref type="bibr">[13]</ref> discussed in Section 2.3.1. In the following, we will use the latter name to refer to both implementations. SDM has been used in two studies involving protein-peptide binding to date. <ref type="bibr">[35,</ref><ref type="bibr">31]</ref> The implementation of Eq. ( <ref type="formula">17</ref>) requires the averaging of the Boltzmann weight of the effective binding energy in Eq. ( <ref type="formula">15</ref>), which in turn requires the specification of the intramolecular potential energy ( ) and the solvent potential of mean force ( ) for each configuration of the molecular species involved. The former is available from a molecular mechanics force field (OPLS-AA <ref type="bibr">[36]</ref> in the applications discussed here) while the solvent potential of mean force is approximated by an implicit solvent model. <ref type="bibr">[16]</ref> SDM employs the Analytical Generalized Born plus Non-Polar (AGBNP) implicit solvent model <ref type="bibr">[37,</ref><ref type="bibr">38]</ref> which is now maintained as an OpenMM plugin.</p><p>[39]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.1">Alchemical Pathways and Stratification</head><p>We use this case study to illustrate the very general concept of an alchemical pathway and the idea of performing conformational sampling along the pathway to improve the convergence characteristics of the basic binding free energy formula [Eq. <ref type="bibr">(19)</ref>].</p><p>This technique, commonly known in the field as stratification is used in many free energy estimation problems. <ref type="bibr">[40]</ref> As discussed in Section 2.3, Eq. ( <ref type="formula">19</ref>) is not directly applicable in numerical simulations because, fundamentally, the coupled and uncoupled ensembles preferentially visit distinct regions of conformational space (see Figure <ref type="figure">1</ref> for example). The free github.com/rajatkrpal/openmm_sdm_plugin github.com/egallicc/openmm_agbnp_plugin energy, however, is a thermodynamic state function, and it should be possible to compute it as the sum of free energy changes over a series of intermediate states, each sufficiently similar to its neighbors so that free energy estimation formulas such as Eq. ( <ref type="formula">19</ref>) among these are numerically well-behaved. <ref type="bibr">[41,</ref><ref type="bibr">42]</ref> The intermediate so-called alchemical states are generally implemented by means of an alchemical progress parameter that tunes the system's potential energy function such that = 0 corresponds to the initial state and = 1 corresponds to the final state. A simple-but not necessarily the most efficient <ref type="bibr">[17,</ref><ref type="bibr">43]</ref> choice-is a linear interpolating function of the form</p><p>where 0 ( ) is the potential energy function that describes the initial state and ( ) = 1 ( ) -0 ( ), where 1 ( ) is the potential function of the final state, is the perturbation potential. The progress parameter and the specific parameterization of the alchemical potential is said to define an alchemical path that connects, in a thermodynamic sense, the initial and final states.</p><p>The specific alchemical potential energy function adopted by Kilburg &amp; Gallicchio <ref type="bibr">[31]</ref> to study peptide binding is, in the notation of Section 2.3,</p><p>where the first term on the r.h.s. is the potential energy function of the decoupled ensemble (corresponding to 0 ( ) in Eq. ( <ref type="formula">36</ref>)) and the binding energy function is defined by Eq. ( <ref type="formula">15</ref>). It is straightforward to see that &#936; at = 1 is the potential energy function of the coupled state. An alchemical binding free energy</p><p>This concept has since evolved into rigorous statistical interpretations and numerical algorithms, some of which are discussed later in this section.</p><p>To improve convergence, Kilburg &amp; Gallicchio actually used a soft-core form of the binding energy function. <ref type="bibr">[44,</ref><ref type="bibr">17]</ref> Soft-core functions are critical aspects of alchemical binding free energy calculations.</p><p>profile, &#916; ( ), along the thermodynamic path is defined, which corresponds to the free energy of the intermediate alchemical state at relative to the uncoupled state</p><p>which is Eq. ( <ref type="formula">19</ref>) with replaced with , the perturbation energy at the alchemical state at . By definition, the excess free energy of binding <ref type="bibr">(19)</ref> is the difference between the end points of the alchemical binding free energy profile</p><p>In Kilburg &amp; Gallicchio's study, the alchemical path was subdivided into 26 intermediate states mostly linearly spaced between 0 and 1, except the region near = 0, which required more closely spaced points. Conformational sampling was conducted at each -state by molecular dynamics (MD) using the alchemical potential energy function <ref type="bibr">(37)</ref>. The binding energy function <ref type="bibr">(15)</ref> and its gradients were evaluated at each MD time step by first evaluating the potential energy of the complex &#936; ( , , ) and then displacing the peptide in the implicit solvent medium at a large distance away from the protein receptor to evaluate the potential energy &#936; ( ) + &#936; ( ) without protein-peptide interactions. Samples of the decoupled energy &#936; 0 = &#936; ( ) + &#936; ( ) and of the binding energy were saved at each alchemical state at regular intervals. As discussed in Section 3.1.3, these are the inputs for the estimation of the binding free energy profile and of the excess binding free energy through Eq. ( <ref type="formula">39</ref>).</p><p>Specifically by replica-exchange molecular dynamics in temperature and space as described in Section 3.1.2)</p><p>The ligand displacement approach to compute the alchemical potential energy was made necessary by the many-body nature of the implicit solvation model. As briefly discussed in Section 3.2, with pairwise decomposable potentials it is more common that is is integrated into the calculation of individual interatomic interaction energies.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.2">Replica Exchange Conformational Sampling</head><p>Stratification implies that an alchemical binding free energy calculation is commonly carried out as a collection of molecular simulations, each with a different alchemical potential energy function [Eq. <ref type="bibr">(36)</ref>] at a series of values of the alchemical progress parameter . The accuracy of alchemical free energy calculations depends heavily on the conformational sampling's quality at each -state. In this context, the conformational sampling's challenge is to generate a diverse set of configurations distributed according to Boltzmann's distribution for the given temperature and potential energy function. It is not sufficient, like in molecular docking, to propose a set of low-energy configurations. The configurations should also appear according to their probability of occurrence. Conformational sampling in alchemical simulations is carried out by Monte Carlo and, more often, Molecular Dynamics (MD). MD conformational sampling is limited by the slow time-scales of biomolecules' motion, and a host of advanced conformational sampling algorithms have been devised to overcome it. <ref type="bibr">[45]</ref> Kilburg &amp; Gallicchio employed two-dimensional replica-exchange conformational sampling in temperature and alchemical spaces. <ref type="bibr">[46,</ref><ref type="bibr">31]</ref> It is useful to consider separately the problem of sampling intermolecular degrees of freedom (the position and orientation of the ligand relative to the receptor, denoted by above) from the sampling of intramolecular degrees of freedom (the individual conformations of the peptide and the receptor, denoted by and ). The first problem is related to the simulation algorithm's ability to explore all relevant binding modes of the protein-receptor complex for fixed receptor and peptide conformations.</p><p>Missing the most stable binding mode would, of course, underestimate the binding affinity. The sampling of intermolecular degrees of freedom is straightforward near the decoupled state ( &#8771; 0) where protein-peptide interactions are weak, and the peptide can nearly freely translate and rotate within the binding site volume. In contrast, because of receptor-peptide interactions, rotations and translations are severely hindered near the coupled state ( &#8771; 1) where the peptide visits alternative binding modes only very rarely. Therefore, one solution to this problem is to make it so that the MD thread evolves the system in conformational space as well as space. In this way, new binding modes are formed when is small and, if they are sufficiently stable, they will be retained when the MD thread visits more strongly coupled states at &#8771; 1. Conversely, an MD thread in a metastable binding mode at &#8771; 1 would have an opportunity to acquire a smaller and convert to another binding mode.</p><p>Of course, the excursions in space have to be so that a canonical ensemble of conformations is generated at each alchemical state.</p><p>The replica exchange algorithm achieves this by evolving as many MD threads as there are alchemical states. At any one point in time, each MD thread is assigned the value of a unique alchemical state . The collection of threads, called replicas, forms an ensemble of independent canonical systems with the joint canonical statistical weight function</p><p>where &#936; ( ) is the alchemical potential energy function <ref type="bibr">(37)</ref>, denotes the configuration of replica , and is the value of assigned to it. The joint distribution is sampled by alternating updates of coordinates at a fixed assignment of values, which is accomplished independently for each replica by conventional constant temperature MD, with updates of the assignments. The latter is performed at fixed by proposing permutations of assignments { 1 , . . . , } &#8594; { &#8242; 1 , . . . , &#8242; } at fixed configurations and accepting and rejecting the move using the Metropolis Monte Carlo algorithm based on the ratio of the values of the proposed and original weight functions</p><p>There are many variations of replica-exchange differing in the nature of the replicas, the scheme of permutations of state assignments, and the computational implementation. <ref type="bibr">[47]</ref> Schemes, such as the one illustrated above, that modify the parameters of the potential energy function are known in the field as Hamiltonian replica exchange algorithms. <ref type="bibr">[48]</ref> Kilburg &amp; Gallicchio used the Gibbs Independent Sampling Algorithm <ref type="bibr">[17]</ref> for Hamiltonian reassignments and an asynchronous implementation <ref type="bibr">[46]</ref> of replica-exchange for that allows running the collection of replica simulations on heterogeneous and potentially unreliable computational resources such as on computational grids. <ref type="bibr">[49]</ref> Hamiltonian replica exchange addresses the sampling of intermolecular degrees of freedom. However, because couples receptor-peptide interactions, it has only an indirect influence on the rate at which intramolecular degrees of freedom are sampled. Peptides are very flexible and often change conformation upon binding.</p><p>They often interact with the protein over an extended surface and induce substantial induced-fit reorganization of the receptor. Conformational rearrangements of peptides occur very slowly at room temperature, especially of the cyclic peptides investigated in this study. The temperature replica-exchange algorithm, one of the first versions of replica-exchange proposed, <ref type="bibr">[50]</ref> is very useful for accelerating the sampling of the conformational space of peptides and proteins, <ref type="bibr">[51,</ref><ref type="bibr">52]</ref> and is applicable to free energy calculations. <ref type="bibr">[53]</ref> Kilburg &amp; Gallicchio adopted a two-dimensional replica-exchange scheme in which both the and temperature assignments undergo permutations. The joint canonical weight is generalized as</p><p>where and are the inverse temperature and assigned to replica , and ( , )</p><p>is one of the pair combinations of a set of inverse temperatures and alchemical states. Kilburg &amp; Gallicchio employed 8 temperatures between 300 to 379 K and 26 alchemical states for a total of 208 replicas for each protein-peptide complex.</p><p>The multi-dimensional replica-exchange algorithm employed allowed to explore simultaneously multiple conformations of the peptide and multiplied binding modes of each conformation.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.3">Multi-State Free Energy Estimation</head><p>While Eq. ( <ref type="formula">19</ref>) is formally correct, it is not an optimal free energy estimator. Optimal here refers to a free energy estimator's ability to return a free energy estimate with the smallest bias relative to the true free energy (accuracy) and smallest variance (precision) with a given finite set of samples. Kilburg &amp; Gallicchio employed the Unbinned Weighted Histogram Analysis Method (UWHAM) estimator <ref type="bibr">[44]</ref> which is considered an optimal free energy estimator when no information of the system is known other than the samples from the molecular simulations. The statistical and mathematical origins of the method <ref type="bibr">[54,</ref><ref type="bibr">44]</ref> are beyond the scope of this chapter.</p><p>The main idea is to arrive at an estimate of the free energy &#916; ( ) [Eq. <ref type="bibr">(38)</ref>] at by using the data collected at all -states. UWHAM can be interpreted as an extension of the familiar Weighted Histogram Analysis Method (WHAM), <ref type="bibr">[55]</ref> applied to Eq.</p><p>(20) for the maximum likelihood estimation of the distribution of binding energies in the uncoupled ensemble 0 ( ) from the corresponding distributions along the alchemical path ( ).</p><p>In this case, Kilburg &amp; Gallicchio collected data as a function of temperature as well as on a grid of 208 states. UWHAM provides, in this case, optimal estimates of the dimensionless free energy factor for each state defined as, up to a additive constant, Note that, because is not dimensionless, the ambiguity of the additive constant is also related to the arbitrariness of the units chosen to evaluate the logarithm.</p><p>= ln ( , )</p><p>where and are the values of the inverse temperature and of the alchemical progress parameter of state and ( , ) is defined by Eq. <ref type="bibr">(11)</ref>. Given the free energy factors, the free energy profile as function of temperature and is given by Eq. ( <ref type="formula">38</ref>), or</p><p>The dimensionless free energy factors minimize the convex objective function <ref type="bibr">[44]</ref> 1</p><p>where is the total number samples collected at any of the states, is the number of samples collected at state , and</p><p>is the dimensionless energy of sample in state , where &#936; 0, and are, respectively, the values of the decoupled potential energy and of the binding energy of the sample collected during the replica-exchange alchemical simulations. The UWHAM optimizer implemented in the statistical program R was used to obtain the dimensionless free energy factors (cran.r-project.org/web/packages/UWHAM).</p><p>Note that setting to zero the gradient of the UWHAM objective function leads to the self-consistent equations</p><p>Because the free energy estimates are known up to a temperature-dependent additive factor, differences between free energies at different temperatures are generally meaningless. However differences along at different temperature can be compared. For example, the binding free energy at one temperature &#916; ( ) = &#916; ( , = 1) -&#916; ( , = 0) can be compared to the binding free energy estimate at a different temperature to, for example, estimate the binding entropy.</p><p>The convexity property guarantees that there is a unique minimum. Ding, Vilseck, and Brooks <ref type="bibr">[56]</ref> developed a GPU implementation of UWHAM called FastMBAR (github.com/xqding/FastMBAR).[56]</p><p>where = -. Eq. ( <ref type="formula">47</ref>) is the basis of the equivalent Multistate Bennet Acceptance Ratio (MBAR) method to obtain the free energy factors. <ref type="bibr">[57]</ref> The UWHAM formulation of multi-state reweighting has been found to be more generalizable than MBAR's. <ref type="bibr">[56]</ref> For example it has been recently employed to impose global restraints on the free energy solutions.</p><p>[58]</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Effect of Mutations on the Binding Affinity of Peptides to PDZ</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Protein Domains</head><p>PDZ protein domains are widespread protein-protein interaction modules. They specifically recognize the 4 to 8 aminoacids at the C-terminus sequence of proteins.</p><p>Peptides and peptide derivatives that mimic these binding motifs are investigated as potential therapeutics for many diseases. <ref type="bibr">[59]</ref> Panel et al. <ref type="bibr">[60]</ref> studied the binding free energies between the TIAM1 PDZ domain and a series of peptides derived from its syndecan-1 and caspr4 protein targets (Figure <ref type="figure">4</ref>) using an alchemical relative binding free energy computational method generally known in the field as Free Energy Perturbation (FEP). <ref type="bibr">[61,</ref><ref type="bibr">62]</ref> The study's goal was to validate the methodology for protein-peptide binding and obtain physical and structural insights into the recognition mechanisms that allow PDZ domain to target specific sequences.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1">Theory of Relative Binding Free Energy Calculations</head><p>The dataset considered by Panel et al. <ref type="bibr">[60]</ref> included the TIAM1 PDZ domain bound to the wild-type peptides and a series of single and double mutants. As discussed in Section 2.3.1, peptides are generally too large and complex to be studied by</p><p>where the ratio of partition functions involving the receptor correspond to the free energy difference &#916; bound between the complex with ligand 2 in the solvent and the same system but with ligand 2 replaced by 1 . Similarly, the ratio of partition functions of the ligands in solution correspond to the free energy difference &#916; solv .</p><p>Finally, using Eq. ( <ref type="formula">2</ref>), we obtain</p><p>which is the key formula of the relative binding FEP method.</p><p>Fig. <ref type="figure">5</ref> The thermodynamic cycle used in the relative free energy perturbation method. The vertical transformations correspond to the association equilibrium between the receptor R and one of two ligands L 1 and L 2 . The horizontal legs correspond to the alchemical transformation of one ligand into the other alone in solution (top) or in the complex (bottom).</p><p>Comparing the free energies of systems with different atomic composition and number of degrees of freedom is arguably physically meaningless at this level of theory. However, note that the overall ratio of partition functions in Eq. ( <ref type="formula">48</ref>) if physically well defined. It represents the free energy difference between two systems, the first composed of two solutions one containing the complex with 2 and the other containing 1 , and the second in which 2 and 1 have swapped places. Evidently, the free energy difference &#916; bound -&#916; solv , which is the target of the theory, is physically well defined even though the individual components may not be.</p><p>Let's now turn to the evaluation of &#916; bound and &#916; solv by alchemical computer simulations. As usual, the strategy is to compute ratios of partition functions as ensemble averages. However, for example, the expression</p><p>cannot be directly turned into the form of an ensemble average because, in general, the number and kind of the internal degrees of freedom of the two ligands differ.</p><p>Panel et al. <ref type="bibr">[60]</ref> adopted the so-called dual topology strategy to address this issue, in which the simulation is conducted with a hybrid peptide in which the wild-type, say, and mutated aminoacid side chains are both represented at the same time (Figure <ref type="figure">6</ref>). The alchemical potential energy function is constructed so that the environment (the water solution or the solvated receptor) interacts with the atoms of both forms of the sidechain with a strength that depends on the alchemical charging parameter .</p><p>Similarly, the intramolecular potential energy function is designed so that the atoms of the protein backbone interact by bond stretching, bond angle, torsional, and 1,4 non-bonded interactions with both forms of the sidechain. The atoms of the two forms of the sidechain being mutated never interact directly with each other.</p><p>Formally, the dual-topology approach is derived from Eq. ( <ref type="formula">48</ref>) by multiplying and dividing each term by an appropriate partition function that introduces the additional degrees of freedom to turn each peptide into the hybrid peptide with both forms of the sidechain. For example, if , 1 term represents the peptide with the phenylalinine (PHE) sidechain in solution (Figure <ref type="figure">6</ref>, red), multiplying and combining it with</p><p>There is an analogous single-topology strategy <ref type="bibr">[64]</ref> which we do not discuss here.</p><p>does not otherwise interact with the environment. The same procedure applied to the partition function of the complex of the original peptide bound to the receptor , 1 in the denominator of Eq. ( <ref type="formula">48</ref>) yields the partition function , 1(2) of the hybrid peptide in the PHE state bound to the receptor. Similarly, multiplying and dividing by the term PHE analogous to Eq. ( <ref type="formula">51</ref>) to install a PHE sidechain onto the peptide with the ILE sidechain, yields the partition functions ,</p><p>,</p><p>for the hybrid peptides in solution and bound to the receptor in their ILE states.</p><p>With these preparations, finally Eq. ( <ref type="formula">48</ref>) is rewritten as</p><p>, 1</p><p>,</p><p>where</p><p>where 2 is the change in potential energy of the system for a given configuration of the solvated complex with the hybrid peptide due to, in this example, turning off PHE sidechain and turning on the ILE sidechain, and . . . 1 represents the average over the ensemble in which the PHE sidechain is on and the ILE sidechain is off. An analogous ensemble average gives &#916; solv for the transformation of PHE into ILE in solution.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2">Alchemical Transformations for Relative Binding Free Energies</head><p>As discussed in Sections 2. </p><p>where is the collection of all of the degrees of freedom of the dual-topology peptide system, 12 ( ) contains the potential energy terms that do not depend on 12 ( ) = 0 ( )</p><p>where 0 ( ) is the unperturbed component of the potential energy (including the intramolecular potential energy terms of the dual-topology sidechains not affected by the transformation, but excluding interactions between the two sidechains), and the terms SD ( ) represent the auxiliary restraints used in the dual-topology scheme to anchor each sidechain to the backbone [see Eq. ( <ref type="formula">51</ref>)], and</p><p>where NB denotes non-bonded interactions between the sidechain atoms and the environment, SS denotes the bonded (1-2, 1-3, and 1-4 interactions) among backbone atoms with sidechain , and SD is the corresponding term for bonded interactions between the backbone atoms and the sidechain. As illustrated by Eq. ( <ref type="formula">56</ref>), the nonbonded component has an explicit dependence due to the use of separation-shifted soft-core pair potentials <ref type="bibr">[67,</ref><ref type="bibr">65]</ref> to describe the non-bonded interactions between the dual-topology sidechains and the rest of the system.</p><p>It is straightforward to see that Eq. ( <ref type="formula">54</ref>) evaluated at = 0 describes the 1(2)</p><p>state of the dual-topology peptide with sidechain 2 turned off and, conversely, = 1 describes the (1)2 state. Panel et al. <ref type="bibr">[60]</ref> simulated 11 alchemical states from = 0 to = 1. The change in free energy from to +1 was evaluated using the Bennet Acceptance Ratio (BAR) method, which is MBAR [Eq. <ref type="bibr">(47)</ref>] for two states and</p><p>The S symbol stands for the single-topology region (the backbone in this case), and D stands for dual-topology region (the two sidechains). <ref type="bibr">[67,</ref><ref type="bibr">65]</ref> where, in this case,</p><p>is the alchemical potential energy at of the conformational sample collected at either or +1 .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3">Potential of Mean Force Study of the Binding of the MEEVD peptide to the TPR2A Receptor</head><p>The heat shock organizing protein (Hop) binds specifically to the heat shock protein Hsp90 through its tetraticopeptide repeat (TPR) domain TPR2A. TPR modules are widespread protein domains responsible for the specific recognition patterns of many proteins. Due to their molecular recognition characteristics, engineered TPR domains are seen as potential alternatives to antibody-derived biological medicines.</p><p>Lapelosa <ref type="bibr">[22]</ref> studied the binding of the MEEVD peptide from Hsp90 to the TPR2A domain of Hop (Figure <ref type="figure">7</ref>) using the potential of mean force methodology outlined in Section 2.4. The work yielded an estimate of the standard free energy of binding between TPR2A and MEEVD in good agreement with experimental measurements.</p><p>It provided structural insights into the entry and exit mechanism of the peptide from the receptor binding site.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.1">Calculation of the Standard Binding Free Energy</head><p>Lapelosa <ref type="bibr">[22]</ref> computed a 1-dimensional radial potential of mean force (PMF), &#916; ( ), along the center of mass separation between the receptor and the peptide (Figure <ref type="figure">7</ref>) using the Adaptive Biasing Force (ABF) method described in the</p><p>The numerator and the denominator of Eq. ( <ref type="formula">47</ref>) are often combined to cast the formula in terms of energy differences &#8242; -.</p><p>ABF introduces a fictitious biasing force ( ) along the radial direction such that the observed distribution of distances with the addition of the biasing force, obs ( ), is flat within the sampling region (in this case the region within the cone illustrated in Figure <ref type="figure">7</ref> with 0 = 60 &#8226; angle of aperture and up to &lt; * = 30 &#197;).</p><p>A derivation of ABF is beyond the scope of this chapter, however to motivate it, first note that differentiation of Eq. ( <ref type="formula">31</ref>) leads to the conclusion that the gradient of the PMF with respect of is the average gradient of the system potential energy function</p><p>where is the potential energy function of the solvated system and . . . represents an ensemble average at fixed . In other words, the negative of the gradient of the PMF is the system force averaged over the degrees of freedom of the system other than those along which the PMF is defined, thereby justifying the name potential of mean force for &#916; ( ). The same conclusion applies to forms of the PMF averaged over some coordinates such as ligand orientations [Eq. ( <ref type="formula">30</ref>)], including the 1-dimensional radial PMF, &#916; ( ), considered in the work of Lapelosa.</p><p>Also, note that the PMF along a coordinate is proportional to the logarithm of the probability distribution for that coordinate [Eq. <ref type="bibr">(30)</ref>]. Thus, a flat distribution indicates that the overall force, the mean force, plus the biasing force along the coordinate is zero or, equivalently, that the added biasing force is equal and opposite to the mean force. This implies that the potential of mean force can be obtained by integrating the biasing force that flattens the radial distribution. The additional benefit of having a flat distribution is that the dynamics along the chosen coordinate are more likely to be diffusive and not impeded by free energy barriers. Indeed,</p><p>In this case the radial force is interpreted in terms of the force of a central potential, and Eq. ( <ref type="formula">62</ref>) has additional terms due to the Jacobian of the radial coordinate. <ref type="bibr">[69]</ref> several independent binding/unbinding events have been reported in the study by Lapelosa. <ref type="bibr">[22]</ref> </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Conclusion</head><p>This chapter has shown how a statistical mechanics formulation of the non-covalent molecular association from first principles gives rise to different computational methods to estimate the binding free energies of protein-peptide complexes. The three case studies illustrate the application of each method to particular molecular complexes and how they are tailored to achieve specific goals. It is much more challenging to apply rigorous binding free energy estimation methods to proteinpeptide complexes relative to small-molecule binding. We hope that this chapter illustrates how a good appreciation of the underlying theories and their computational implementations helps understand the practices connected with each approach and its strengths and limitations.</p></div></body>
		</text>
</TEI>
