Welcome to the lecture on alchemical free energy calculations. My name is Bert de Groot, and together with Vytautas Gapsys I'll be giving two lectures as well as a practical session on free energy calculations. I will start off with a more general introduction to the concepts, and afterwards Vytautas will give a presentation more focused on applications and examples. Then we'll be equipped with the tools to carry out such calculations ourselves in the following practical session. Alright, so why are we interested in free energy calculations at all, and in particular, of course, alchemical free energy calculations? Let's first look at free energies in general. You may know that many properties of molecules that we are interested in are in fact free energies. Affinities, for example, are free energies: if we are interested in binding affinities, of a ligand binding to a receptor or of protein-protein binding, the affinity that we express in nanomolar or so is actually a free energy of binding. The same holds for stability: if we say a system is very stable, for example a protein is very stable against unfolding, this stability of folding is a free energy of the folded state as compared to the unfolded state. Another example is conformational transition rates: the barrier that needs to be overcome is also a free energy barrier, determining the rate of, for example, a reaction we might be interested in, or a permeation rate for a channel. This may come from something like a potential of mean force, or PMF, which can be computed with different methods; it is a free energy profile from which we can, for example, learn the permeation characteristics of a channel. So this is why we are interested in free energies, and there is a practical advantage as well.
It has to do with the way we compute these properties: free energies happen to be more tractable to compute, for example from simulations, than some other thermodynamic quantities. So not only is free energy important because it drives many of the processes we are interested in, it is also relatively tractable to calculate. So which flavors of free energy calculations are there? Well, you may have guessed there are many, because of the importance of being able to estimate free energies, and so we have these different possibilities. We have first-principles-based methods, first principles meaning based on the first principles of statistical mechanics. Then we have more statistics-based methods, which can be either ligand-based or protein-structure-based. And then we have hybrid methods that somewhat combine aspects of both. Those again can be purely statistical, so you may have heard of Rosetta, for example, which uses statistical force fields to estimate free energies, and there are also more heuristic methods using regression models; MM/PBSA is an example of that. Today we'll be focusing on first-principles-based methods, and I'll explain a little bit what kinds of methods there are. We can either use spontaneous, unbiased sampling together with the Boltzmann formula, or we can use different forms of biased sampling: enhanced sampling like metadynamics, which you have already heard about earlier this week, umbrella sampling, which I'll explain in a minute, and also alchemical approaches, which will be the focus of today. These can be run either in an equilibrium setting, the traditional alchemical free energy calculation methods, or in a non-equilibrium setting, and I'll explain how that works; we'll also apply it later today. So, a little bit on the statistical mechanical, or thermodynamic, background of free energies.
As I already said, free energy is a thermodynamic property. We can write it, as done here, as G = H − TS, meaning the Gibbs free energy is defined as the enthalpy of the system, or internal energy, minus the temperature times the entropy. Another way of writing the Gibbs free energy is G = −kT ln Z, minus kT times the logarithm of the partition function. A third definition, which is actually very handy, is the Boltzmann relation, which relates the probability p_i of finding a state directly to the free energy by an exponential formula: the probability of finding a state is proportional to e to the minus the free energy of that state divided by the thermal energy kT. So these are different ways to compute a free energy, and as we can already see from this Boltzmann formula relating probabilities to free energies, if we are lucky enough that all the states are sampled spontaneously, for example in an MD simulation, we can just see how probable each state is, and by simply counting we know the free energy from applying this Boltzmann formula. If we are not so lucky and the states are not sampled spontaneously, we can of course bias the system: we can introduce a perturbation in an alchemical way, or we can apply a perturbation in the form of an umbrella potential, or a Gaussian potential in the case of metadynamics, and we can use non-equilibrium methods such as those of Jarzynski and Crooks; I'll explain how that works afterwards. So let's first go to the most straightforward case, where equilibrium sampling already visits all the states that we're interested in. Let's consider this case: here we see, as a function of time, the root mean square deviation (RMSD) of a small peptide from a folded helical structure. At room temperature, in the upper plot, we see that it mostly samples low RMSD values, meaning it is close to this helical state, and then occasionally it shows a higher RMSD value,
meaning it is unfolding to a different state. If we now raise the temperature to 340 Kelvin, in the lower plot, we see that it is folding and unfolding all the time, spending almost equal amounts of time in the folded and the unfolded state. From the Boltzmann formula we can already see that the folded state must be the most probable, and therefore the lowest free energy state, at room temperature, whereas at 340 Kelvin the situation has changed: the folded and unfolded states have almost equal probability and therefore equal free energy, which also means we must be close to the folding temperature there, because we have the folded and unfolded states in equilibrium at almost equal probability. Now that we can apply this formula, we can also look in somewhat more detail. Here we look at the principal component projection of the same trajectory, and we see dense spots, for which we also see snapshots. The densest region, in red, is the folded helical structure; then we have a second state which is folded from one side but not yet from the other; then we have a third spot, more like a hairpin structure, which is disconnected from all the rest; and then we have a diffuse cloud of other structures, in fact the collection of unfolded structures. The color code gives us the local phase space density: the brightly red colored states are at high density, meaning low free energy, which is obviously true for the folded helical state and for these other states, whereas in blue we see a collection of unfolded states, meaning each of those individual states has a higher free energy, and only together do they show an appreciable probability. We can also express this probability of being in a state on a different axis, which is what we do here on the vertical axis, and again we see very clearly that there are these three states with lower free energy, here in blue, and states with higher free energy in green.
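This counting-based use of the Boltzmann formula can be sketched in a few lines. The synthetic RMSD trace, the 0.3 nm folded/unfolded cutoff, and the temperature below are hypothetical choices, purely for illustration:

```python
import numpy as np

KT = 2.494  # thermal energy k_B*T in kJ/mol at ~300 K

def dg_unfolding(rmsd, cutoff=0.3):
    """Free energy of the unfolded relative to the folded state,
    estimated by simply counting frames on either side of an RMSD
    cutoff (in nm) in an equilibrium trajectory."""
    folded = np.count_nonzero(rmsd < cutoff)
    unfolded = rmsd.size - folded
    # Boltzmann inversion: dG = -kT * ln(p_unfolded / p_folded)
    return -KT * np.log(unfolded / folded)

# synthetic trajectory: ~90% folded frames, occasional excursions
rng = np.random.default_rng(1)
frames = np.where(rng.random(10_000) < 0.9,
                  rng.normal(0.15, 0.05, 10_000),
                  rng.normal(0.60, 0.10, 10_000))
print(dg_unfolding(frames))  # ~ +5.5 kJ/mol: unfolding costs free energy
```

With a 9:1 population ratio the estimate is simply kT ln 9 ≈ 5.5 kJ/mol; the same counting works per histogram bin to give a whole free energy profile, as in the projections discussed here.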
And in red, the probability of finding those states is very low. On the right-hand side we also see the effect of temperature: at room temperature, as we saw before, the system is mostly folded, meaning it mostly sits in these low free energy basins, whereas at 340 Kelvin, where we are closer to the folding temperature, we spend almost equal amounts of time in the unfolded state, meaning the folded minima are less populated. Okay, now what if we are not so lucky, and we do not spontaneously visit all the states we are interested in, just by running the simulation along? Well, we can help the system somewhat, but of course then we have to worry about how we are biasing it. In the case of umbrella sampling, we are interested in extracting an underlying free energy profile such as the red profile here, and the barrier that we see, the delta G, might be too high to overcome in a microsecond-long simulation. So what we can do is bias the system to overcome this barrier, and this is shown on the right-hand side: we are looking at this blue molecule, it could be an ammonia molecule sitting in water, that we want to push through a channel. The ammonia might have a high permeation barrier, so it might not go through spontaneously, and we can apply an additional harmonic potential to the system to force it basically anywhere we want the molecule to be; we can even force it into very high free energy regions. Now that solves half of the problem: we have the system where we want it to be, but of course we have artificially changed the probability of it being there, so we cannot just apply the Boltzmann formula like we did before. We somehow have to take into account the bias that we applied to get the molecule there in the first place. Now, the advantage is that we know exactly what we did to that molecule, we know the harmonic potential that we
applied, we know how we were biasing the system, so we can unbias the probability densities for exactly that additional potential and reconstruct the unbiased free energy profile that way. One method of doing that is the weighted histogram analysis method, or WHAM, which is frequently used in conjunction with umbrella sampling to extract free energy profiles from MD simulations. Now let's switch to alchemical methods, which traditionally are known as free energy perturbation or thermodynamic integration methods. There the situation may be somewhat similar: we're looking at a barrier, for example the barrier of binding a molecule to a receptor, and again that barrier may be too high to overcome on short time scales, so we want to do something about it. We could of course again apply something like umbrella sampling to overcome this barrier, go from the unbound state to the bound state, and compute the associated free energy profile, and from the difference between bound and unbound we would then know the affinity of binding of this molecule to the receptor. What we can also do is say: well, we're actually not so interested in the whole profile, the barrier height, and the whole binding and unbinding process; we just want to know the affinity, the difference in free energy between the bound and the unbound state, and we are not necessarily interested in all the in-between states along the unbinding pathway. What can we do in such a case? Here we can think of it in terms of a perturbation of the system: we can introduce a coupling parameter lambda and call it zero in the bound state and one in the unbound state. We can also use it to define the difference between two ligands, call them A and B: then lambda equals zero would be associated with ligand A, and lambda equals one with ligand B.
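As a minimal sketch of the unbiasing step just described: for a single umbrella window with a known harmonic restraint, the biased histogram can be converted back into the unbiased profile by subtracting the bias; full WHAM additionally determines the relative offsets between many overlapping windows. The constants and the harmonic form are illustrative assumptions:

```python
import numpy as np

KT = 2.494  # k_B*T in kJ/mol at ~300 K

def unbias_window(samples, k, x0, bins=40):
    """Unbiased free-energy profile from one umbrella window.
    samples were generated under H + w(x), with the known restraint
    w(x) = 0.5*k*(x - x0)**2; then, up to an additive constant,
    F(x) = -kT * ln p_biased(x) - w(x)."""
    hist, edges = np.histogram(samples, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = hist > 0                      # skip empty bins
    bias = 0.5 * k * (centers[mask] - x0) ** 2
    f = -KT * np.log(hist[mask]) - bias
    return centers[mask], f - f.min()    # shift the minimum to zero
```

In practice one runs many windows at different restraint centers x0 and stitches the per-window profiles together, which is exactly what WHAM does.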
And the nice thing about computational physics is that lambda is now a coordinate, switching between two states, for which we can define a free energy difference: we integrate, for lambda going from zero to one, the derivative of the Hamiltonian with respect to lambda, and if we sample that change of lambda we obtain the associated free energy change. Now, we can do this in one go, which would be called perturbation, or we can do it in intermediate steps and then carry out the integration, which would be called thermodynamic integration. Either way, this is an alchemical change, and some of you should become very skeptical now, because we know that experimentally alchemy has not been very successful: it was tried very hard over many centuries and never worked experimentally. So there should be a healthy concern about whether it's possible to do this at all. The good news is that computationally this is not a problem at all: we can easily create a nitrogen atom out of nothing, or change lead into gold, or carbon into oxygen. This is all fine, because physically it is well defined how to do that and what the associated energy and free energy change is; it's just not so readily accessible experimentally. But there is no reason why we should not apply this computationally, and it is actually an advantage of doing simulations. Now there's one question here. If we look at the formula, and especially if we remember the earlier formula at the bottom left, G = H − TS, the free energy is an enthalpy term minus the temperature times an entropy term. If we now go back, what we have here is that the free energy is an integral over the internal energy; the H is now a Hamiltonian rather than an enthalpy, but basically a very similar function. So we just look at the derivative of the internal energy with respect to lambda, and that should give us the free energy. Where, then, is the −TS term? That's the question. I will claim that this formula is correct:
we are not neglecting the −TS term. I'll leave it to the discussion afterwards: if somebody has any ideas where this −TS term comes in, bring them along, and if not you can also ask me, since in principle we are only looking at the integral over the internal energies here. Now, how does this work in practice? Let's first look at perturbation. What we do here is apply the so-called Zwanzig formula, which means we look at the difference in the internal energy. We would simulate, for example, state A and compute the energies in state A, but we would also compute the energies the system would have in state B. Say state A is molecule 1 and state B is molecule 2: we would compute our whole simulation ensemble for molecule 1, but we could also ask, what would the energy have been if, in this particular configuration, we had actually had molecule 2? And this is how we then find the free energy: we compute the difference between the two energies, take the exponential average over that, and that gives the free energy. Of course, by symmetry we could also do it the other way around: simulate in state B and compute the energies for state A, and both should give the same free energy change. This works very successfully in cases like the lower left here, where we have sufficient overlap: if the two molecules are rather similar and sample similar states, we can apply this formula without restrictions and get an accurate estimate of the free energy. Of course, things become more problematic if the molecules are very different; then they also want to sample different states, meaning that the states we sample for one molecule are no longer representative for the other, and we would be making severe errors. But there is an easy way out: in such a case we just define a number of intermediate states and apply the same formula in series. We can still apply the perturbation
formalism every time between neighboring states that overlap sufficiently, and then just propagate through all the differences to get our final free energy difference. We can also do it in a continuous way, as with the integral shown before: we no longer use the difference between the energies, but look at small changes of lambda, compute the derivative of H with respect to lambda, and integrate over our coordinate range from 0 to 1 to get the free energy change; then we are at thermodynamic integration. So we have perturbation on the left and thermodynamic integration on the right, and there are pros and cons to both. Perturbation tends to be slightly more sensitive to the choice of lambda steps; we have an equilibration phase prior to the production phase for each lambda window we are interested in, and the exponential averaging often introduces a systematic error if we are not careful. Thermodynamic integration is likely more robust, because instead of a discrete sum we have more fine-grained sampling; but because of that we must also be careful how fast we change lambda during the simulation, to make sure the system has the possibility to adapt. Otherwise we get a Hamiltonian lag: the system lags behind what we are doing with lambda. We will come back to that with non-equilibrium methods in a minute, because this is what we want to avoid here; a Hamiltonian lag means we are introducing non-equilibrium effects into the system, which would break the formalism of, for example, thermodynamic integration. In non-equilibrium methods, in contrast, we actually make use of these dissipation effects. But to complete this list, the final advantage, if you wish, of thermodynamic integration over perturbation is that we do not use this exponential averaging but have a direct calculation of the derivatives, and are therefore perhaps slightly less sensitive to numerical issues. So let's look at a practical example of where we can make use of this formalism.
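Both estimators fit in a few lines. Here is a sketch with synthetic Gaussian energy differences standing in for real simulation output; for a Gaussian dU of mean mu and width sigma, the Zwanzig formula gives mu − sigma²/2kT analytically, which makes a convenient self-check. All numbers are illustrative assumptions:

```python
import numpy as np

KT = 2.494  # k_B*T in kJ/mol at ~300 K

def zwanzig(dU):
    """Free energy perturbation (Zwanzig): dG = -kT ln<exp(-dU/kT)>,
    where dU = U_B - U_A is evaluated over an ensemble of state A."""
    return -KT * np.log(np.mean(np.exp(-np.asarray(dU) / KT)))

def thermodynamic_integration(lambdas, dhdl):
    """Trapezoidal integration of <dH/dlambda> over the lambda windows."""
    l, d = np.asarray(lambdas), np.asarray(dhdl)
    return float(np.sum(0.5 * (d[1:] + d[:-1]) * np.diff(l)))

# toy data: Gaussian dU; analytic result is mu - sigma**2 / (2*KT)
rng = np.random.default_rng(0)
mu, sigma = 4.0, 1.0
dU = rng.normal(mu, sigma, 500_000)
print(zwanzig(dU))                    # close to 4.0 - 1/(2*2.494) ≈ 3.80

# toy TI: if <dH/dlambda> rises linearly from 0 to 10, dG = 5
lam = np.linspace(0.0, 1.0, 11)
print(thermodynamic_integration(lam, 10.0 * lam))  # ≈ 5.0 (exact here)
```

Note how the exponential in the Zwanzig estimator is dominated by rare low-dU configurations, which is exactly the overlap problem that motivates splitting the transformation into intermediate lambda windows.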
So again, let's think about binding, the binding of two molecules: one is this green, more triangular molecule, which might bind to a receptor and undergo a conformational change in doing so; we have a different, blue molecule which can bind to the same receptor, maybe in a different pose, we don't know yet. If we wanted to measure this experimentally, we would be looking at the horizontal legs of this cycle, where we compare the bound state to the unbound state and look at the affinity, or free energy, of binding, the delta G of binding here. We would do that for the green molecule as well as for the blue molecule, and the difference between the two would then be the difference in affinity between these two molecules. In simulation we could do this as well, but we would have to use, for example, umbrella sampling or metadynamics in order to simulate the whole binding and unbinding process, and this can be cumbersome. If we're only interested in the difference between bound and unbound, we can actually also approach this alchemically, where we then look at the vertical legs of this cycle. We would morph the green molecule into the blue molecule, once in the unbound state, where we can actually forget about the protein and just morph the blue into the green molecule, or vice versa, in solvent, and once in the bound state, where we again transform the green molecule into the blue molecule. Now the nice thing is that because this is thermodynamics, it doesn't matter how we complete the cycle; the cycle should always close. Meaning, if we want to go, for example, from the upper left to the lower right, it doesn't matter which path we take, we should always come up with the same answer: we can go to the upper right first and then to the lower right, or to the lower left first and then to the lower right, and it should always add up. And that in turn also means that the
difference between the two horizontal legs, which we might be interested in as the difference between binding affinities, must be the same as the difference between the two vertical legs, which we can approach alchemically. So the difference in binding affinity can be obtained from the difference between the alchemical transformations, one in the bound state and one in the unbound state, without ever having to compute the actual binding and unbinding process. And that's the beauty of this method: we circumvent cumbersome pathways, in this case binding and unbinding, by an alchemical transformation that gives us access to exactly the same information; exactly these vertical legs are accessed by perturbation or thermodynamic integration, that is, alchemically. Another example of such a cycle is if we are interested in protein or peptide folding, where the question would now be: if we introduce a mutation, how does it affect the folded versus the unfolded state, so how does it affect the folding free energy? This is maybe an even more cumbersome pathway to approach directly, because it would involve sampling the protein folding and unfolding pathway. At the very beginning of this lecture we looked at one example where this happens spontaneously, but that is rather the exception; frequently this is a tough process to simulate. And therefore what we would have to do here would be to look at the protein folding and unfolding for the wild type, then repeat the exact same thing for the mutant we are interested in, and look at the difference in folding free energy. Now, just as with the binding example before, we can look at this as a thermodynamic cycle and also approach it alchemically; here the alchemical transformations are now the horizontal ones, so instead of sampling the cumbersome vertical legs of folding and unfolding, we look at the horizontal alchemical transformations of mutating the system
from the wild type to the mutant. We do that once in the folded state and once in the unfolded state, and the difference between the two gives us exactly the difference in folding free energy of the mutant as compared to the wild type, which is exactly what we were after: in this case we got information on protein folding based on alchemical transformations. Now, like I said, up to now we have focused on equilibrium methods, free energy perturbation and thermodynamic integration, assuming that the system is in equilibrium at all times, be it in state A or state B, wild type or mutant, bound or unbound, and that the changes we make are always so slow that we can assume the system is in equilibrium throughout. Now what if that's not the case? For example, if we morph one ligand into another very quickly, or mutate an amino acid in a protein very quickly, say from an alanine to a phenylalanine? In terms of thermodynamic integration, what we would be computing is, repeated here, the same formula: the integral over dH/d lambda as we change lambda. If we do this slowly enough, so that we stay in equilibrium, this gives us the delta G. If we do it much too fast, we also have non-equilibrium effects, like friction effects from introducing atoms where there is no room; we are dissipating extra energy, and therefore we get a contribution in addition to the free energy. That's why we no longer write delta G here but W, for the work that we need to do, which contains a contribution from the free energy but also a contribution from dissipated non-equilibrium work. Now, this complicates things, but there is good news: we can use either the so-called Jarzynski framework or the Crooks fluctuation theorem, and that's what we are doing here. The Crooks fluctuation theorem tells us that even if we go way too fast and create non-equilibrium work, if we repeat that
process a number of times, we get a so-called work distribution. Here we get a certain distribution of work values for the forward transition, in blue: we do the mutation forward, starting from the wild type state, and perform a number of fast transitions to the mutant state. In addition, we can also simulate the mutant and do the exact same thing in the reverse direction, introducing the mutation back to the wild type, and also collect a number of work values for that. Now the Crooks fluctuation theorem tells us that the probability of finding certain work values again looks a little bit like this Boltzmann formula: it is related to the delta G, and in fact at the value of the work that equals the delta G, where the difference between W and delta G becomes zero, the two probabilities are equal. So if we look at these two histograms, at the intersection point where they overlap, the two probabilities are equal, and this is our estimate for the delta G. That means we actually have a means of extracting an equilibrium thermodynamic free energy from a set of non-equilibrium, fast transitions, and this is very good news, because that is something we have easy access to in MD simulations. Here is one actual example, with a number of snapshots derived from an MD simulation: we did a number of forward transitions, in green, and a number of backward transitions, in blue. We see that the two distributions are separated from each other, so there is dissipated work, but there is also overlap between the two distributions; we can see the intersection point, and this is the estimate of the free energy, in this case rather close to zero. Alright, so how can we apply that in practice? I'm going to show two examples, one on the design of inhibitors and another on protein mutations. The first one is on thrombin inhibitors, and this is just a quick benchmark against known experimental affinities, to see whether they can be reproduced by this alchemical non-equilibrium approach.
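The intersection construction just described can be sketched as follows. Fitting Gaussians to the two work distributions and solving for their crossing is one common estimator (others, such as Bennett's acceptance ratio, are generally preferred in production work), and the toy work values below are fabricated purely for illustration:

```python
import numpy as np

KT = 2.494  # k_B*T in kJ/mol at ~300 K

def crooks_intersection(w_forward, w_reverse):
    """dG estimate from the crossing of Gaussian fits to the forward
    work distribution P_F(W) and the mirrored reverse one P_R(-W); by
    the Crooks fluctuation theorem the true densities cross at W = dG."""
    mf, sf = np.mean(w_forward), np.std(w_forward)
    mr, sr = np.mean(-np.asarray(w_reverse)), np.std(w_reverse)
    if np.isclose(sf, sr):
        return 0.5 * (mf + mr)           # equal widths: midpoint of means
    # unequal widths: equate the two Gaussian log-densities -> quadratic
    a = 1.0 / sf**2 - 1.0 / sr**2
    b = -2.0 * (mf / sf**2 - mr / sr**2)
    c = mf**2 / sf**2 - mr**2 / sr**2 + 2.0 * np.log(sf / sr)
    roots = np.roots([a, b, c]).real
    # keep the physically relevant crossing, near the midpoint of the means
    return float(roots[np.argmin(np.abs(roots - 0.5 * (mf + mr)))])

# toy example consistent with Crooks: dissipation sigma**2/(2*KT) each way
rng = np.random.default_rng(42)
dg_true, sigma = 2.0, 2.0
diss = sigma**2 / (2 * KT)
w_f = rng.normal(dg_true + diss, sigma, 20_000)   # forward work values
w_r = rng.normal(-dg_true + diss, sigma, 20_000)  # reverse work values
print(crooks_intersection(w_f, w_r))              # ≈ 2.0
```

The faster the switching, the larger the dissipation and the smaller the overlap of the two histograms, which is why the estimate degrades for very fast transitions.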
The thermodynamic cycle that we are looking at is very similar to what we saw before: we compare a bound to an unbound state and mutate one ligand into the other, and this is how we approach it in simulations. We can in fact also have both the bound and the unbound state in the same simulation box. This also helps if we have, for example, a charge change on the ligand: with the bound and the unbound state both in the same box, we can make this change without a net charge change in the simulation; for the experts, this helps with particle mesh Ewald, since we can keep the system at constant, zero net charge at all times. And then we compute the difference in binding affinity, delta delta G, between these two ligands. This is the result that we got: in the upper left we have the results as a bar graph, in the lower left as a scatterplot. Let's first look at the upper left, where we have the measured values in orange and the calculated simulation values in cyan and green. We see that overall there is quite a nice correspondence: whenever the simulation says that a certain compound is a good binder, then in almost all cases the experiment confirms that this is the case. All of the ligands here are variations of a reference ligand shown in the inset, so here we have looked at these different ligand modifications. In the scatterplot in the lower left we also see the overall correlation; it's not perfect, we have a correlation of around 0.8, but that's overall quite decent. So qualitatively, these kinds of simulations tell us whether a molecule is a better or worse binder than the reference compound, and this is all based on these non-equilibrium alchemical estimates. The second example I want to briefly share with you is on protein mutations and molecular recognition and design, and we're
looking here at the protein ubiquitin. Just a few words of background on how ubiquitin works: ubiquitin is a protein tag that associates with quite a large number of different proteins in the cell, and therefore, not surprisingly, you will find examples of ubiquitin in complexes with different other proteins in the Protein Data Bank. A few examples are shown on the right, where you see ubiquitin in complex with different other proteins, and ubiquitin is always in a slightly different conformation, sort of specific for that particular complex. At the same time, on the left-hand side, we see a solution ensemble of ubiquitin reflecting the kinds of motions that ubiquitin undergoes in solution. That could be obtained either from simulation or, as in this case, from NMR experiments, but the agreement between NMR and MD was astonishingly high, so you would get the same solution ensemble whether you looked at it from simulation or from experiment. But of course the question is: is the dynamics that ubiquitin undergoes in solution the same type of dynamics that ubiquitin needs if it wants to engage in these different complexes, some of which you see on the right? Again we can look at this with principal component analysis, which is what we see in this part. We have two ensembles projected here: one is the solution ensemble, in red, and the other is the ensemble of ubiquitin structures extracted from different complexes in the PDB, in black. Already the correspondence in this projection shows that the two ensembles are in fact very similar. The main mode of motion, the main mode of how ubiquitin adapts to engage in different complexes, we see in the animation: it's a pincer-like opening and closing motion, in order to adapt to the different surfaces that it engages with. And of course the overlap between the bound and the unbound states suggests that what might be going on here is in fact
conformational selection, where a complex selects the ubiquitin structure that it needs for complex formation. So is this in fact what is going on? Well, we can not only simulate ubiquitin free in solution, but also in different complexes, and that's what we did here. We see the same PCA projection, now for simulated ubiquitin: free in solution in red, so the red panel is repeated here many times, and the blue panels are ubiquitin in different complexes; you see the PDB codes in each of the panels. There we see that in a complex, ubiquitin is sometimes restricted to a more open configuration and sometimes to a more closed configuration; in some complexes it can open and close just as in the free, unbound state, so there it's not restricted at all, but in quite a few cases it is actually restricted to either the open or the closed state. And that gave us the following idea for a protein design study: apparently the native ubiquitin can constantly undergo this conformational change between open and closed, and can then form a complex with proteins that require either the open or the closed state. Now, if we were able, by computational design, to come up with a mutated ubiquitin that strongly favors, say, the open state, then the hypothesis was that maybe we could create a scenario in which ubiquitin is still equally happy to engage in a complex with a protein that requires the open state, but would no longer form complexes with proteins that require the closed state. So what we did is we designed something like 112 mutants, all in the core of ubiquitin, so we didn't touch the actual interface surface, and looked at whether they would prefer the open or the closed state, or would visit either of them like the wild type does. In fact, many of the mutants that we tested did not show any effect, so no preference for open or closed, but we had a handful that
had a substantial preference for the open state, and another few that preferred the closed state; those would be interesting to follow up in terms of actual binding. So the first thing we did was to follow up with a set of control simulations, here by umbrella sampling, where we computed the complete opening and closing free energy landscape. In blue we see, repeated, the wild type protein: it has one minimum for open and one for closed, both almost equally deep. Then we have different panels for the different mutants that we had predicted, from the alchemical transformation screen, to be either open-stabilizing or closed-stabilizing. They are not all confirmed in this umbrella sampling scheme, but there are a number of cases, like this L69T, that strongly prefer the closed state and have a strongly destabilized open state, and there are other examples, like this I13F, where it is exactly the other way around: as predicted from the alchemical screen, and now also confirmed by umbrella sampling, this is a mutant that strongly stabilizes the open state over the closed state. Now it becomes interesting to see whether they have the expected effect in this scheme of complex formation, so let's have a look. We have three scenarios. On the left we have those mutants and complexes that are immediately compatible with each other; this could be a mutant that stabilizes the closed state with a complex partner that requires the closed state, and the same for open. Here we would expect no change in affinity upon mutation, and this is exactly what we see: we have a number of mutants and a number of complex binding partners, and the delta delta G, the affinity difference due to the mutation, is close to zero. Then we have a second category, here in the middle, where either the mutation doesn't strongly affect the open-closed equilibrium, or the complex binding partner is tolerant to both the open and the closed states; again here we would expect no
difference in affinity, and this is what we see in the majority of the cases. And then we have this interesting category on the right, which is the one of incompatible ubiquitin and binding partner: for example, ubiquitin locked in the open state where the binding partner requires the closed state of ubiquitin, or the other way around. Here we indeed see, on average, a much larger, positive delta delta G, meaning a large destabilization of the complex, which is exactly what we had hypothesized should happen based on our design scheme; destabilization meaning no complex formation.

Now, this is all from simulation, so of course at this point it is also interesting to follow up experimentally. We took some of these more destabilizing mutants, as well as a control mutant that did not have a large effect, and checked experimentally what would happen. So we looked at ubiquitin complex formation with the Dsk2 protein. We had two mutations, I13F and I36A, that we predicted to be neutral, so they should not be expected to have any effect: a small predicted delta delta G, and also experimentally a small delta delta G. Here we have two predictions: one based purely on the conformational shift of the open-closed equilibrium, and the other on the calculated delta delta G in the actual complex, but they give basically a very similar answer. On the other hand, for the L69S and L69T mutations on the right, we predicted a strong destabilization of the complex, and this is also what we found experimentally. So this is a nice experimental validation: what we actually get here is a change in function, a change in binding affinity upon protein-protein complex formation, based on a designed conformational shift.

I think that's about all I wanted to share at this point, but of course not without thanking all the people in the team who did all the work and contributed heavily to the development of the PMX software that is behind all of this, as well as the working
out of the concepts and so on. It all started with the work of Daniel Seeliger, who is now working at Boehringer Ingelheim; it was followed up by Vytautas Gapsys, and you'll hear the next lecture from him, in which he'll tell you what he has done in the meantime. Matteo Aldeghi and Martin Reinhardt have also been contributing a lot to these types of free energy calculations and are continuing to do so. Alright, thank you very much for your attention, and I'll be happy to answer any questions in the upcoming chat.
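As a technical aside, the kind of PCA projection used earlier to compare the free and bound ubiquitin ensembles can be sketched as follows. This is a minimal illustration with random toy coordinates standing in for least-squares-fitted trajectory frames (not the actual simulation data); the essential point is that every bound ensemble is projected onto the eigenvectors of the free ensemble, so all panels share the same open/close axes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for fitted trajectory frames: n_frames conformations, each
# flattened to 3N Cartesian coordinates (C-alpha atoms of ubiquitin's 76 residues).
n_frames, n_coords = 500, 3 * 76
free_traj = rng.normal(size=(n_frames, n_coords))      # 'free in solution' ensemble

# PCA on the free ensemble: diagonalize the coordinate covariance matrix.
mean = free_traj.mean(axis=0)
centered = free_traj - mean
cov = centered.T @ centered / n_frames
eigvals, eigvecs = np.linalg.eigh(cov)                 # eigenvalues in ascending order
pc1, pc2 = eigvecs[:, -1], eigvecs[:, -2]              # two largest-variance modes

# Project another ensemble (e.g. ubiquitin bound in one of the complexes) onto
# the SAME axes, so restriction to 'open' or 'closed' regions of the projection
# becomes directly comparable across panels.
bound_traj = rng.normal(size=(100, n_coords))
proj = np.stack([(bound_traj - mean) @ pc1,
                 (bound_traj - mean) @ pc2], axis=1)   # one (PC1, PC2) point per frame
```

In practice the fitting, covariance analysis, and projection would be done with trajectory tools (e.g. GROMACS `gmx covar`/`gmx anaeig`); the NumPy version above only shows the linear algebra involved.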
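The umbrella sampling controls described above can also be illustrated with a toy one-dimensional model. The sketch below is entirely hypothetical: a double-well potential stands in for ubiquitin's open/close coordinate, simple Metropolis Monte Carlo stands in for MD, and a basic WHAM iteration recombines the harmonically restrained windows into an unbiased free energy profile with two minima.

```python
import numpy as np

kT = 2.494  # kJ/mol at ~300 K
rng = np.random.default_rng(1)

def U(x):
    # Toy double well: minima at x = -1 and x = +1, ~10 kJ/mol barrier at x = 0.
    return 10.0 * (x**2 - 1.0) ** 2

# Umbrella windows: harmonic restraints w_i(x) = 0.5*k*(x - x0_i)^2 along x.
k_umb = 50.0
centers = np.linspace(-1.5, 1.5, 13)

def sample(x0, n=4000):
    # Metropolis MC in the biased potential U(x) + w_i(x) (stand-in for MD).
    xs, x = [], x0
    for _ in range(n):
        xn = x + rng.normal(scale=0.1)
        dE = (U(xn) + 0.5*k_umb*(xn - x0)**2) - (U(x) + 0.5*k_umb*(x - x0)**2)
        if dE <= 0 or rng.random() < np.exp(-dE / kT):
            x = xn
        xs.append(x)
    return np.array(xs)

samples = [sample(c) for c in centers]

# WHAM: self-consistent iteration of window offsets f_i and unbiased p(bin).
edges = np.linspace(-2.0, 2.0, 81)
mids = 0.5 * (edges[:-1] + edges[1:])
hists = np.array([np.histogram(s, bins=edges)[0] for s in samples])
N = hists.sum(axis=1)
bias = np.exp(-0.5 * k_umb * (mids[None, :] - centers[:, None])**2 / kT)

f = np.zeros(len(centers))
for _ in range(500):
    p = hists.sum(axis=0) / (N[:, None] * np.exp(f[:, None] / kT) * bias).sum(axis=0)
    f_new = -kT * np.log((bias * p[None, :]).sum(axis=1))
    f_new -= f_new[0]
    if np.max(np.abs(f_new - f)) < 1e-6:
        f = f_new
        break
    f = f_new

pmf = -kT * np.log(np.where(p > 0, p, np.nan))
pmf -= np.nanmin(pmf)   # free energy profile; minima near x = -1 and x = +1
```

Real calculations would of course use MD windows and an established analysis tool (e.g. `gmx wham`); the point here is only the structure of the unbiasing step.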
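Finally, the delta delta G values in the mutation screen rest on a standard thermodynamic cycle that is worth making explicit: rather than computing two binding free energies directly, one alchemically mutates the residue once in the complex and once in the unbound protein, and because the cycle closes, the difference of these two mutation legs equals the change in binding affinity. A minimal sketch with made-up numbers:

```python
# Thermodynamic cycle for the effect of a mutation on binding affinity:
#
#   wt  + partner  --dG_bind(wt)-->   wt:partner
#    |                                    |
#    | dG_mut(apo)                        | dG_mut(complex)
#    v                                    v
#   mut + partner  --dG_bind(mut)-->  mut:partner
#
# Because the cycle closes:
#   ddG = dG_bind(mut) - dG_bind(wt) = dG_mut(complex) - dG_mut(apo),
# so two tractable alchemical mutation legs replace two hard binding legs.

def ddG_bind(dG_mut_complex: float, dG_mut_apo: float) -> float:
    """Change in binding free energy upon mutation, in kJ/mol.
    Positive values destabilize the complex."""
    return dG_mut_complex - dG_mut_apo

# Illustrative (made-up) numbers for an 'incompatible' case: the mutation costs
# much more in the complex than in the free protein, so the complex is destabilized.
print(ddG_bind(18.0, 5.0))   # 13.0 kJ/mol > 0: destabilization
```

This is exactly why the screen only needs mutation free energies in the apo protein and in each complex, never an absolute binding free energy.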