Hello. Welcome to today's session on alchemical free energy calculations. My name is Bert de Groot, and together with Vytas Gapsys I will be in charge of the session. We'll start with a talk from my side on the general concepts, and then Vytas will take over and give you a number of in-depth examples of what one can do with these kinds of calculations. After that we'll do some hands-on calculations, computing alchemical free energies ourselves.

So why are we so interested in free energy calculations? Well, if we think about affinities, for example of a drug binding to a protein, or of protein-protein affinity, these are free energies of binding. If we think of stabilities, so how does the stability of a protein change, for example upon mutation, this is also about free energy, in this case the free energy of folding. And if we are interested in rates, for example of chemical reactions or conformational transitions, these are also determined by free energies, in that case the free energy barriers. The list goes on and on, so free energies are really a central thermodynamic quantity if we're interested in different aspects of biomolecular function. And there's a small bonus: it turns out that free energies are frequently more tractable in simulations than other thermodynamic quantities, and that's very useful.

So how does this all fit in? Due to their central importance there are many different flavors of free energy calculations, and this is just a broad but not complete overview of the different approaches. We have purely statistical methods; those can be ligand-based, which is the realm of cheminformatics, or structure-based. We have hybrid methods; perhaps you've heard of methods like Rosetta, for example, which uses a statistical force field.
There are also methods like MM/PBSA, which are somewhat based on simulations but then also build a sort of regression model to take some shortcuts, basically. Or one can go completely first-principles, that is, based on the first principles of statistical mechanics, and again there are different sub-flavors of those. If one is lucky, you can just do a plain brute-force MD simulation and then count the probabilities of the states that you encounter, because the free energy is nothing but a probability; we'll get to that in a second. But frequently we're not so lucky, and then we need to bias our sampling, and you've already heard about methods like metadynamics. Today, however, the focus will be on the so-called alchemical methods, and I'll tell you all about how that works.

Now, why don't we just take a textbook and look up the definition of the free energy? There we will find, for example, that one definition of the Gibbs free energy is simply G = H − TS, which is probably very familiar to you: there is an enthalpic contribution H and a −TS term, the temperature times the entropy. This would indeed be a valid way of computing a free energy. Unfortunately, it is also very difficult, because obviously one would need access to the enthalpy. Well, you could say we have energies from our simulation, so that should be easy; it turns out the enthalpy term is difficult to converge, and the entropy term is frequently even more difficult to compute, because entropy is a beast. So what can we do about that?
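To anticipate the counting route the talk turns to next: when a long unbiased run samples both end states, the free energy difference follows directly from the observed populations via the Boltzmann relation. A minimal Python sketch; the per-frame state labels and the 80/20 split are synthetic, purely for illustration:

```python
import numpy as np

# Synthetic trajectory of per-frame state labels (1 = "bound"/"folded",
# 0 = "unbound"/"unfolded"); in practice these come from assigning each
# MD snapshot to a state. The 80/20 split here is made up.
rng = np.random.default_rng(0)
states = rng.choice([0, 1], size=100_000, p=[0.2, 0.8])

kT = 2.494  # k_B * T in kJ/mol at ~300 K

p1 = np.mean(states == 1)  # observed probability of state 1
p0 = np.mean(states == 0)  # observed probability of state 0

# Boltzmann relation: p_i proportional to exp(-G_i / kT), hence
# dG = G_1 - G_0 = -kT * ln(p_1 / p_0)
dG = -kT * np.log(p1 / p0)
print(f"dG = {dG:.2f} kJ/mol")  # negative: state 1 is favored
```

The same relation, read in reverse, is what turns a local sampling density in a conformational map into a free energy surface.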
Well, like I said, if we are lucky we can just run a very long, free, unbiased simulation where everything we're interested in already occurs spontaneously; say, a ligand is binding and unbinding from the protein of interest. We simply count how frequently it's bound and how frequently it's unbound, and the Boltzmann relation tells us that the probability of finding a state, say the bound or the unbound state, is directly related to its free energy: p_i ∝ exp(−G_i/kT), so ΔG = −kT ln(p_bound/p_unbound). That's very simple and very useful, but it depends on us seeing everything spontaneously, and as you know we frequently face a sampling challenge in molecular dynamics simulations, which makes this Boltzmann formalism not always applicable. That's why we have this whole range of alternative methods as well.

Just to make even clearer how free energies and probabilities are related, or actually one and the same thing: here we look at a peptide that, in the course of a simulation, is constantly unfolding and refolding, over and over again, just by chance. What we can see is that at room temperature it's mostly in the folded state, so it has a low deviation from the experimental folded structure, but at higher temperatures, at more or less the melting temperature, it is more or less 50-50 populated between the unfolded and folded states. From that we can immediately say that if the population is 50-50, then the free energies of the folded and unfolded states should be very similar, because equal or similar probabilities mean similar free energies. And of course this is the case, so we can make good use of that and define the free energy directly from the probability. Then you get a sort of heat map like this, where the color codes for the local density, so for the free energy of a particular state, and we see different states emerging. If we look at each individual structure, then of course we don't have a distinction like here, either folded or unfolded; it could be anything else as well. Here, for example, we have the folded structure as indeed the densest, and therefore the lowest free energy, cluster; we also have this very similar cluster which is folded in the N-terminal part of the peptide but not in the C-terminal part; then we have this very different structure that also has a relatively low free energy but a completely different fold; and then we have all the unfolded structures together, which also have a low free energy. But we can already appreciate that this is not one state: these are very different structures, all of which count together as the unfolded state.

This is just another way of representing the same data. Again we have the free energy computed from the local probability, but now we haven't only color-coded it, from blue as low free energy to red as high free energy; we also use it as the vertical axis, whereas on the horizontal axis we use one of the conformational coordinates that we had used before. This representation also nicely shows the effect of temperature: at low temperature the low free energy basins are more occupied, whereas at higher temperature entropy kicks in, that's the −TS term we were talking about before, which makes the unfolded state contribute more, so the system doesn't feel these minima as so attractive anymore. But like I said, this is all very nice, yet it depends on us sampling all the relevant states spontaneously, and unfortunately that's not always the case. That's why it frequently makes sense to think about alternatives, and alchemical free energies are often an interesting choice; I'll try to convince you of that.

So let's consider the following process. Say we're interested in the binding of a particular ligand to a particular protein. We could say, well, we have everything we need to look at the binding free energy: the bound state will have a certain probability, and if the system undergoes spontaneous binding and unbinding we can just look at the probability of being in the unbound state, and the difference between the two will be the binding free energy. From the probability of being in the intermediate state, or from the on and off rates, we also get an impression of the free energy barrier separating the bound from the unbound state. And if it doesn't happen spontaneously, well, then we can always apply something like umbrella sampling or metadynamics to make the process happen. So one could say we have everything we need; job done. This is true, but one may also realize that we spend a lot of time in this whole process of binding and unbinding, and particularly if the barrier is high, this can become quite problematic and costly. And then we may realize that if we're only interested in the difference between the bound and the unbound states, and not in the actual binding process or the free energy barrier separating them, then we're spending most of our computational effort, in say umbrella sampling or metadynamics, on this in-between process, whereas we're really only interested in the difference between the end states, the bound and the unbound. So can't we do something about that?
Well, yes we can, because we can consider this binding and unbinding process like we did before, for one particular ligand, and say we're not just interested in the affinity of that ligand, but more interested in seeing whether this other, somewhat related ligand is a better or a worse binder; so whether the binding affinity for the blue ligand is different from that of the green ligand. One way to go about that would be to compute the umbrella sampling or metadynamics free energy for the green ligand, giving the ΔG of binding for ligand A; you do the same for the blue ligand and get the ΔG of binding for ligand B; and subtracting the two gives the difference in binding free energy. Now the alchemical idea, or approach, is that we could also do the following: we just transform ligand A into ligand B, once in the unbound form and once in the bound form, and compute a free energy for each of those transformations. And the nice thing about thermodynamics is that this cycle has to close: it does not matter which way around the cycle we go, we should get the same answer, because free energy is a state variable, so the answer is path-independent. That means the difference between the two binding free energies that we were interested in must be the same as the difference between the two alchemical free energies we can compute by transforming ligand A into ligand B. So if this alchemical transformation is a simpler process than the unbinding transformation, we have gained something, and it turns out this is frequently the case: the vertical legs of the cycle are much more accessible than the horizontal legs. That means we can just compute the two alchemical transformations, once in the unbound and once in the bound state, subtract the two, and we get exactly the ΔΔG that we're interested in.

So, as I said, we can use an alchemical approach like free energy perturbation or thermodynamic integration to sample those alchemical legs. One approach that is commonly used, and one of the first to have been employed, is thermodynamic integration. We define a coordinate, usually called lambda, which we define to be zero for our state A, the blue ligand, and one for the B state, and then one can write down the free energy as an integral over lambda of the ensemble-averaged Hamiltonian derivative with respect to lambda: ΔG = ∫₀¹ ⟨∂H/∂λ⟩_λ dλ. For the physicists among you, this derivative is like a force acting on the lambda coordinate. So we take the derivative of our Hamiltonian, our energy function, with respect to lambda, compute it at every step, and do a simulation in which we start from lambda equals zero and slowly transform it during the simulation to lambda equals one; at each step we compute this derivative and integrate over it, and that is our ΔG. We can do this gradually during the simulation, which is called slow-growth thermodynamic integration, or we can stop at different fixed points and there collect and average the derivative, which is called discrete TI.
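The discrete-TI recipe can be sketched in a few lines of Python, assuming the per-window averages ⟨∂H/∂λ⟩ have already been collected; the numbers below are a made-up linear profile, not the output of any MD code:

```python
import numpy as np

# Discrete thermodynamic integration (sketch): at a set of fixed lambda
# windows we average dH/dlambda over an equilibrated ensemble, then
# integrate those averages from lambda = 0 to lambda = 1.
lambdas = np.linspace(0.0, 1.0, 11)  # 11 equally spaced windows

def mean_dHdl(lam):
    # Stand-in for the ensemble average <dH/dlambda>_lambda that the MD
    # code would report per window; this linear profile is invented.
    return 10.0 - 8.0 * lam

dHdl = np.array([mean_dHdl(lam) for lam in lambdas])

# dG = integral_0^1 <dH/dlambda>_lambda dlambda  (trapezoidal rule)
dG = float(np.sum(0.5 * (dHdl[1:] + dHdl[:-1]) * np.diff(lambdas)))
print(f"dG = {dG:.2f} kJ/mol")  # exactly 6.00 for this linear toy profile
```

In practice the choice and spacing of the lambda windows, and the quadrature rule, are where the numerical error of discrete TI lives.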
It all comes down to the same thing, so this is a valid and popular approach for computing alchemical changes. Some things to think about in the meantime, and I'm happy to discuss them afterwards or during a coffee break: we take the derivative here of our energy function and call the result ΔG, whereas previously, let me just briefly go back, you may remember the formula where we said the free energy has an enthalpic component and an entropic component; now we just have this energy-related term, integrate over it, and call it ΔG. So is that justified? Where did the −TS term go? Is it neglected, is it included? Happy to discuss that.

Another approach to achieve the exact same thing is free energy perturbation. It's based on the same principle, a lambda coordinate that is 0 for A and 1 for B, but we use a different formula: we only simulate A, take snapshots from that, and recompute the energy of those snapshots for the B state. That is what the equation means; these brackets, which I should have mentioned before, are ensemble averages, averaging over a whole ensemble, and if it says ensemble A it means we have simulated the A state, at lambda equals 0. But we don't only compute the energies of state A; for each snapshot we also compute what the energy would have been if we had had ligand B there. The so-called Zwanzig formula tells us that evaluating ΔG = −kT ln ⟨exp(−(H_B − H_A)/kT)⟩_A also gives us the free energy difference. We could of course also have simulated state B and done the same thing backwards. Now, this works very well if our A and B states are very similar, meaning there is sufficient overlap in phase space, which is usually the case if there is only a small perturbation going from A to B. Say you want to go from a phenylalanine to a tyrosine, where you just add a hydroxyl group to the amino acid; then this may work well. But of course, if the simulated A ensemble is not representative of the states that the B state would visit, then this is not going to work very well, and that is exactly what you would see as deviations, or bias, in your ΔG. What you can do then is define some intermediate states, at different intermediate lambdas, redo the whole formalism between neighboring states, and get your answer from the sum. Be aware, of course, that because you sum all these free energies you also suffer from error propagation: if you need many intermediate states, you typically get larger error bars.

So there are some pros and cons for each of these flavors. FEP may be very sensitive to the choice of lambda steps, because the exponential terms can be quite numerically sensitive; there is some simulation time lost to equilibration at each lambda; and I already mentioned the exponential averaging. For that reason TI could be considered more robust, but what you get, if you do it in the slow-growth approach, is that at every simulation step you change lambda and the system may need some time to adapt, so you can suffer from some Hamiltonian lag; on the other hand, you don't suffer from the exponential averaging, and that's an advantage.

Right, and the same trick that I explained for ligand binding and its alchemical transformation can be applied in different contexts. Here is an example of a thermodynamic cycle applied to protein stability. If we were wondering how a mutation changes the stability of a protein, the question is how the folding free energy changes. Experimentally you would look at the folding-unfolding equilibrium: for the wild type you would see a certain unfolding temperature, you would do the same for the mutant, and then you would see whether the mutant is more or less stable than the wild type. Here we can apply the exact same trick: we do the mutation alchemically, transforming one amino acid into another, once in the unfolded state and once in the folded state, and we then subtract the two, and this is exactly the same as the difference we're interested in. So again, with an alchemical transformation, without ever sampling a folding-unfolding transition, which may be cumbersome, we don't need those transitions anymore; we can just look at the alchemical free energy change due to the enforced mutation.

Okay, now this is how the status of the field was until maybe something like 10 years ago. Everything I told you so far assumes that our simulations are carried out in equilibrium: at each lambda state we first need to reach an equilibrium and then sample states from it; that's the classical statistical mechanics way of defining free energies. With the advent of non-equilibrium methods by people like Jarzynski and Crooks, we can actually also make use of non-equilibrium simulations, meaning fast transitions going from A to B, and nevertheless extract free energies from them; how that works I can explain in a second. What we do here is the same formalism that we have seen before with thermodynamic integration, but with one important difference. We have a long simulation for state A, just a plain, normal MD simulation, and we do the same for state B; in this example it's called wild type versus mutant, but it could also be ligand A versus ligand B, it's the same principle. And what we do then is fast transitions, fast meaning just a picoseconds timescale, in which we apply the trick of changing lambda, in this direction from 0 to 1: we take snapshots from the equilibrium ensemble where lambda is defined as 0, and starting from each snapshot we switch lambda to 1 in just a very short time.
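What one such fast switch produces can be sketched with a toy one-dimensional Hamiltonian H(λ, x) = (1 − λ)x² + λ(x − 1)², interpolating between two identical harmonic wells. Everything here is invented for illustration; a real switch integrates the ∂H/∂λ values reported by the MD engine along the switching trajectory:

```python
import numpy as np

# Toy Hamiltonian H(lam, x) = (1 - lam) * x**2 + lam * (x - 1)**2
def dHdl(lam, x):
    # Analytic derivative dH/dlambda = (x - 1)**2 - x**2
    # (lam is unused for this toy H; kept for generality)
    return (x - 1.0) ** 2 - x ** 2

def switch_work(x_traj, lambdas):
    """Work of one fast switch: W = integral_0^1 dH/dlambda dlambda,
    evaluated along the configurations visited during the switch."""
    vals = np.array([dHdl(lam, x) for lam, x in zip(lambdas, x_traj)])
    return float(np.sum(0.5 * (vals[1:] + vals[:-1]) * np.diff(lambdas)))

lambdas = np.linspace(0.0, 1.0, 101)  # fast switching schedule, 0 -> 1
x_traj = np.zeros_like(lambdas)       # extreme case: system never adapts
W = switch_work(x_traj, lambdas)
print(f"W = {W:.2f}")  # W = 1.00, while the true dG for this toy is 0
```

Because both end-state wells have the same curvature, the true ΔG here is exactly zero, yet the frozen trajectory yields W = 1: the excess is dissipated work, which is exactly what the fluctuation analysis described next removes.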
And vice versa, we do the same from the B state: we collect snapshots and switch lambda to 0 starting from those. Then we apply the same formula that we've already seen for thermodynamic integration: we take the derivative of our Hamiltonian with respect to lambda and compute the integral from lambda equals 0 to 1. Note that here I don't call the result a ΔG, because it isn't one; it's a work value. It's not a free energy in this case because we move so fast that, in addition to the free energy change, there may also be some dissipated work. You know that if you move your hand through water very slowly, you notice almost no resistance, but if you try to do it very fast, you notice the friction of the water and have to apply more and more force, and therefore more and more energy, to move your hand from A to B. The same applies here: in addition to the free energy, dissipated work due to friction also contributes to our work value. But now the very nice thing is that we can get rid of this dissipated work again, for example via the Crooks fluctuation theorem. It says that the distributions of work values in the forward direction and in the backward direction have a very nice relation connecting them to the ΔG we're after: the probability of finding a forward work value, that's the blue distribution, is related to the probability of finding the same work value, negated, in the reverse direction, by P_F(W)/P_R(−W) = exp((W − ΔG)/kT). And there is one special case where these two probabilities are equal: the place where the two distributions cross. That is the estimate of our ΔG, because if the two probabilities are equal, the ratio is one, and if e to the power of something is one, then that something must be zero, meaning that W must equal ΔG. So at the crossing point we have found our ΔG. This is an actual way to estimate free energies, and it's a very powerful one, because we can spend almost all of our simulation time in the states we're interested in, the actual physical states of the system, and just do a couple of short non-equilibrium transitions to connect the two, and we know the free energy difference between them.

Here is one particular example where we can see this at work, from actual simulation snapshots. We took snapshots from two equilibrium simulations that were 100 nanoseconds long, one snapshot every nanosecond I think, and each of them led to a work value, from which we built these distributions; and we can estimate our ΔG as exactly where the two distributions are expected to cross.

Right, so let's have a look at some applications; where can we put this into practice? Obviously one area of application is drug design, and this was one of our very first applications, where Vytas looked into thrombin inhibitors. This is thrombin, a protein, and we see one particular ligand bound here. There has been an extensive experimental study where lots of different modifications were made to this ligand at the locations marked in green, and for which affinity changes were measured experimentally: how the modifications at these locations change the affinity of those ligands. Those were exactly the same modifications we applied in our simulations, for which we computed, or estimated, the free energy changes. Here the thermodynamic cycle we used was actually slightly different, but it's based on the same principle we've looked at before. We are interested in the bound state; we modify the ligand, so here we add an additional purple group to the blue ligand, and we make that alchemical transformation once in the unbound state, where we don't need to simulate the protein in that leg because the ligand isn't bound anyway; in the bound state we simulate
the complex and introduce the same modification there, applying the same trick that you by now know. We can actually do it all in a single simulation box: we can have the two legs of the cycle, the bound state and the unbound state, in the same box, which is sometimes useful; for example, if we are looking at charge changes, then we avoid any change of the total charge if we put both systems in the same box and move them in opposite directions. But you get the overall idea. And these are the results we got. Here we have a bar plot of the set we were looking at: in cyan the calculated value, in orange the value measured by ITC. This is our reference ligand, and you see the modifications here: some aromatic groups were changed at different locations, some other modifications, methyl groups added, and so on. Overall we see quite favorable agreement: frequently, if the simulation says that a ligand is the worst binder, the experiment says so too. Sometimes there are differences as well, sometimes even qualitative ones, but this is about the accuracy one can expect. Here we have a scatter plot of one versus the other, computed versus experimental, and we see overall quite favorable agreement, with a correlation of something like 0.8, so that's nice to see. You also see these dashed lines, which mark a deviation of 1 kcal/mol; that is sort of the gold standard in the field, and we typically reach it with these kinds of calculations. A few deviate more, but on average we are within that ballpark, meaning that's about the current accuracy of these types of simulations.

Another example I want to briefly show you, and this is the last one before we hand over to Vytas, is an application to protein mutations, where we try to design a functional change into a protein. For that we looked at the protein ubiquitin, which we see here depicted as a solution ensemble: this is ubiquitin free in solution, where it adopts different conformations. When we started looking at this, we realized that you also find many instances of ubiquitin in the Protein Data Bank, and in many of these instances ubiquitin is bound to something else; it occurs in many different protein complexes, and in each of these complexes you typically find ubiquitin in a slightly different conformation. So the first thing we asked is: do the conformations that we see free in solution have anything in common with the specific conformations that ubiquitin needs for specific complexes? Well, the answer is yes. This is a principal component projection of the solution ensemble, here in red, and of the set of X-ray conformers, in black. If you don't know how PCA works, it's important to note that each dot here is a single structure, a single ensemble member; each black dot, for example, is one X-ray structure. PCA looks at the largest fluctuations, and the single largest collective motion is depicted here: it's what we call a pincer movement, ubiquitin opening and closing. We see that in some crystal structures, the black ones here on the right side, the system is in a more closed state, where this pincer is more closed, whereas other complexes apparently require ubiquitin to be in the open state. In the solution ensemble we see both open and closed states, meaning that all of the states ubiquitin requires to engage in complex formation are already sampled free in solution, which is one prerequisite fulfilled for conformational selection. So that was the first thing we found. But then we thought: apparently this native, wild-type ubiquitin can undergo this opening and closing motion all the time, so could we think of a mutant of ubiquitin that would be locked in either a more closed or a more open state? And if so, would it change its binding behavior? Then it might not be compatible
anymore with structures that require the other state. Well, the first thing to know is of course whether the fluctuations we saw in the free state are different in the bound state. Yes, they are, although in some cases it doesn't matter. This is the same projection as before, but we always compare the free state, in red, which is always the same, to different bound states, different complexes, in blue; every blue cloud is a different complex. In some complexes it doesn't matter: ubiquitin is still free to open and close. But in some it's actually restricted to either a closed or an open state, and those are the interesting ones, because if we now had a mutant ubiquitin that we could lock either open or closed, then we would expect to change something in the preferences of those complexes, because they would possibly not be compatible anymore. So this is basically the idea: native ubiquitin undergoes this constant opening and closing and can form complexes with partners that require ubiquitin to be in either a closed or an open state. But what if we can mutate ubiquitin to be only, or mostly, available in the open state? Then of course the complexes compatible with the open state would still readily form, and we would not expect any major difference there, but the situation would be different for the others, where we would expect a large destabilization.

Well, we tested this with an alchemical screening: we did many, many mutations, over a hundred, and to make a very long story short, we found a handful that were predicted to be open-stabilizing; most mutations don't shift the preference at all; and we also found a handful that were predicted to be closed-stabilizing. So now it becomes interesting to see what these mutants do. Here are the ones we would predict to be compatible, basically along these lines, where we would not expect any change; and then we also have the category where we do expect a change. Where we don't expect a change, we see almost no effect on the affinity, meaning a difference in affinity close to zero. This is another control group where, except maybe for one outlier, we again see values close to zero. But these are the ones where we expect incompatibilities, and there we do in fact see large shifts in free energy, except for one again; mostly we see the expected behavior. So here we have a prediction of a destabilization of the complex, because the complex partner requires ubiquitin in a particular conformation while the mutant prefers the other. Then of course it's interesting to test whether this also holds experimentally, and we did that for a couple of these mutants. Green and purple are two different simulation results, and orange is the experiment. For the mutations where we expected a small or no effect, we basically saw a small or no effect, but for those where we predicted a destabilization, we also saw one. So, in fact, we have here designed a change in function by inducing a conformational shift through selected mutations.

And with that I would like to thank all the people who have contributed. This type of free energy calculation all started with Daniel Seeliger, who is now working at Boehringer Ingelheim; it is currently mostly done by Vytas Gapsys, with contributions from Vance and Matteo over time, and more recently also by Yuriy Khalak in the group. Thank you very much.