 Hello, my name is Vitas and today I will be giving the second lecture on the alchemical free energy calculations and I will concentrate on the applications of these calculations on proteins, nucleic acids and ligands. So we will be dealing with mutating amino acids in proteins, nucleotides in DNA and we will apply arbitrary, we will apply arbitrary modifications to organic molecules, so drug-like compounds for example. And for all of these applications we will be, I will be referring to the package which we termed PMX, it's an in-house developed package in our group and it allows performing all of these alchemical modifications for you. So let's start our presentation with the amino acid mutations. We will deal with the 20 canonical amino acids here and we will deal with alchemical, with the alchemical perturbations to complete thermodynamic cycles like this one that I'm showing here. This cycle shows how to calculate the free energy of protein folding, so but not in the absolute, but the relative free energy of protein folding upon an amino acid mutation. So how much this folding free energy would be different if the blue amino acid for example here tryptophan would be replaced with the phenylalanine in this case with the red amino acid in this cycle. So we could, a naively thing we could calculate this by going in the vertical line, so performing the actual folding simulation, so one folding simulation and another one for the mutant and then taking the difference of these two potentially well-converged free energy estimates, but we can do it alchemically by going in the horizontal directions and obtaining the same, exactly the same delta-delta G. Of course such a simulation is not a standard one anymore if we need to actually mutate an amino acid on the fly, right? It requires a special treatment of our structures and topologies that we provide to the molecular dynamic simulation engine. We are using chromax in this case and how to tell chromax that we need a modified, an alchemically modified amino acid. Well that's exactly what PMX allows us to do, so how it works under the hood it actually creates a hybrid amino acid. Here I'm using an example, valine to phenylalanine, let's say we want to mutate valine to phenylalanine, we need to represent this amino acid, it's both states and it's and also it's hybrid state, right? So in state A it has to be valine, in state B it has to be phenylalanine and in between it has to be some linear interpolation between the two of them. So this all is achieved by having PMX make its magic under the hood and provide you with a hybrid structure and topology which you can plug in into the simulation engine. So how PMX works just very briefly, it is a set of Python scripts and modules which takes a structure and firstly it mutates, mutate.py, for example it mutates the amino acid and creates a hybrid structure, then PDB to GMX, that's a chromax tool which creates a topology and then we add the B state so we just adjusted the topology to get a hybrid topology. All of this workflow seems quite complex but it also can be achieved via PMX web server so you can see that exactly same workflow then is just hidden in the just in a few clicks, mouse clicks in this web server. And one more thing just before going into the applications I would like to tell a little bit about the non-acquilibrium free energy calculation protocol which we'll be using in all of these applications that I will show later. So it is a powerful approach, not so frequently used as the equilibrium approach however it has its own advantages and how it proceeds it's slightly different from the classical stratification approach which FEP approach for example uses. In this case we are simulating one state of the protein so wild type state in equilibrium state A, we also do the same, we simulate in equilibrium just run a molecular dynamic simulation in state B, we acquire the two equilibrium trajectories, extract the snapshots from them and start very rapid non-equilibrium transitions. Those transitions are on the order of hundreds of picoseconds so very quickly we introduce a mutation so from wild type to mutant and also going backwards from mutant to wild type. During each of these transitions we record the work required to perform this transition and plot them in a histogram way as it's visualized there in the small histograms on the slide. Once we do this we can also use crook's flutation theorem and extract the free energy difference between those two states. Right so we now have all the technical background that is needed to go into the applications and let's have a look at the protein thermostability. So we wanted to know what is the free energy difference upon amino acid mutation in protein folding. We looked at the Barney's and it's a very good test system because it has been studied experimentally very very thoroughly, here we collected 190 mutations at 55 positions you see that almost whole protein here is covered so we can mutate almost every residue in it and we can compare against the experimental measurement and this gives us a very interesting thermodynamic comparison of the force fields. We can check which force fields perform better and which worse in terms of agreement to the experiment. Here I'm showing averaged unsigned error AUE is averaged unsigned error from the experiment when averaging all the results for all of these hundred nineteen mutations and we can see that all the force fields perform quite well. The differences between the force fields is about one to two kilojoules per mole and the newer force fields they are charm 36 and amber with the modifications they perform significantly better than let's say an older force field in this case we used an older version of the OPLS force field. So we can start really looking at the thermodynamic comparison of the force fields and we can also look into an interesting avenue how to combine the results in the best way so here I will not go into much detail but we can start building now consensus results of the force fields either we can use some machine learning technique like partial squares technique to build one single best answer let's say by combining the results from the two force fields or from five of them and actually we noticed that even combining the results from only two force fields so our best amber version and charm version it once combined we would get a better answer than a better agreement with experiment than when using any of the single force fields. A similar result comes out also for a different protein we also probed it's not only working with that Barney's but also here Staphylococcal nucleus it also gives a very accurate free energy differences below 1 kilo calorie per mole this is like a golden limit which usually people strive to reach in accuracy of these calculations so we are well below that limit and also again the consensus approach works better than single force field taken separately we also can use these amino acid mutations to probe not only protein thermostability but for example also look at the drug resistance so in a number of applications we have looked now at how much the drug resistance would be chain drug resistance would be induced by a protein mutation here is an example of this HIV protease so protease itself is a protein which is necessary for HIV maturation and it is one of the main targets for all of the anti HIV drugs the virus however once it evolves it becomes resistant to the anti HIV drugs by means of mutations here are several mutations that I'm showing in blue so they are known from from sequencing of the of the genomes of viruses that have of the patients who have been treated with anti HIV drugs that those mutations that those positions are usually mutated and then again the virus escapes the drugs and the effects of the drugs so how could we probe for the effect of a mutation on a drug by means of the free energy different free energy calculations so here we would construct a slightly different thermodynamic cycle where we would have a mutation in the HIV in its apo form with respect to a mutation in the HIV in its hollow form where the drug is bound but all the other machinery can be is identical to the one that I described previously all the simulation technique and the non-acleogram free energy calculations now what we looked at was many of those mutations so here I marked here are marked a few of them and we looked at a number of known HIV protease inhibitors they are actual drugs that are on the market and we were able to recover a very good agreement with the experiment just by just by performing those exactly the same calculations that I outlined before and we are able to recover a very good correlation as you can see for six ligands so six known HIV drugs although there is a slight offset but in this case we did not employ our consensus portion we know that that would likely improve the results a little bit but we are already quite happy with such a good agreement with experiment in the next step then we looked with the even deeper we looked at the even more mutations however in this case we looked at those mutations that are further from the actual binding site of the ligand so this would be a particularly challenging because these mutations when they occur and make have an effect on the ligand binding they actually do not directly interact with the ligand so they make some indirect changes in the protein structure or maybe it's a dynamic or a steric network so first of all now we for for these mutations we did not have data to compare directly against Delta Delta G's with the experimentally measured Delta Delta G's it's not available usually what clinicians report usually are the restriction factors that are some that are only related to the Delta Delta G's but we were able to have to establish a connection between the Delta Delta G's and the restriction factors and we built a very reliable classifier of those that are susceptible so in the upper right quadrant those would be correctly identified resistant mutations resistance inducing mutations while those in the lower left quadrant would be correctly identified susceptible mutations so those that actually even increase the binding affinity or in this case restriction factor would be lowered for those mutations to a given drug so we have a very good classifier just which we extracted from the Delta Delta G values and these were previously known known mutations with the in affecting known livens but we also looked at the new mutations so in particular else 76 V that I have marked here this mutation is also known to either induce resistance or susceptibility and our collaborators have acquired a number of new genotypes that this mutation acts on or rather appears in and we were also almost in almost all cases just except of two outliers who were able to correctly classify those effects and once we were able to classify those effects we could also identify the underlying cause why this resistance or susceptibility occurs upon this mutation and a particularly interesting thing is that once mutated away this leucine 76 allows the allows a lysine and aspartate amino acids just to form a hydrogen bond which changes the overall asteric network and which changes then the interactions of a particular ligand binding to the active side so this mutation indirectly indirectly without acting directly on on the ligand it indirectly changes the binding affinity and in turn the restriction factors and induces the resistance or susceptibility to certain to other ligands not only can we do this for the HIV protease of course we my colleague Mateo Aldeghi he looked into a large number of of proteins that are binding various ligands that you can see in this figure so in total 17 systems with 27 different ligands bound and if you looked at more than a hundred mutations in those in all of those proteins to is to check how accurately can these drug resistant mutations be predicted and in this case again we we see that the consensus approach between amber 14 force field and charm 22 star gives an agreement with the experiment which is lower it's which reaches about 1 1.2 kilo calories per mole of difference from the experimental experimentally measured free energy differences so this would be it for the free energies for for the amino acid mutations and I also I have one more application that I I would like to talk about it does I mean acid mutations in the protein protein binding process however for this particular topic I will I would like to go into a bit more detail in the short talk that is scheduled for the afternoon and currently I would like to already switch to the DNA nucleic acid mutations so to in this case we will be mutating DNA and looking at the thermodynamic cycles like as visualized here I will come back to this in a few minutes now the underlying mechanism how we will be doing this is very similar so we will again be using PMX just in this case as previously we used amino acids to create hybrid structures and topology here we'll be using nuclear nucleic acids nucleotides to make a hybrid nucleotide structure and topology in this case so all the technical details are identical to the amino acids except that the library is also smaller and it contains different elements there right also we have a web server which can do the whole process automatically let's have a look at the application of protein DNA offer of these alchemical equations and protein DNA binding so the thermodynamic cycle is as follows we are interested in mutating a nuclear in nucleotide in a DNA which is in its unbound form so in and in comparison we as a reference we will use it so I'm sorry this will be our reference free energy Delta G1 and Delta G4 will be our DNA our our DNA mutation of a molecule bound to the protein and in total this will give us a relative free energy difference of DNA binding to the protein upon a nucleotide mutation to probe how accurately we can calculate these free energy differences we looked at a very large number so almost 400 mutations in 16 different protein DNA complexes these complexes comprise transcription factors and the nucleases you can you can see in more detail here if you're interested but right so they are very very diverse systems right and we come if we pull all the all of the data together so from all of these systems we can see plot them experimental versus calculate the free energy differences we can see that we get the accuracies that are between that at most accurate accurate predictions would reach in total average would reach five and a half kilojoules per mole error from the experiment this is slightly worse than the desired one kilo calories or four kilojoules per mole error that we saw for the amino acid mutations however of course in this case there are several things to note here for example that the experimental values if we look at the plot on the left they seem as if they are have a very strict capping region so this indicates that it's likely that experimental measurements themselves they have a certain range where they could measure the changes in the affinity however of course calculation does not does not have any limitation in that respect so maybe it is also both inaccuracies in the experiment or limitation of the measurement range from the experimental side and also inaccuracies that we have from our calculations that contribute to the slightly larger error and again we improved amber charm and the consensus force field and we see that if we have a consensus approach we are outperforming any individual force field as taken separately now where where could we apply such a calculation of delta delta g for protein DNA binding well we could construct something what is called a DNA profile DNA binding profile so this is a profile that is acquired from an experimental measurement and it shows what frequency of a nucleotide you would find at a given DNA position so in what what the nucleotide would be preferred there we could also recalculate such profiles from our calculated delta delta g values and once we do that we obtain well as similar profiles and now it's interesting to note that the top profile the one on the top panel is obtained from our calculations however the four lower profiles are obtained from different experimental techniques and we can see that it is and we can even quantify I'm not going to the details here but our computational profile is a very much indistinguishable from the from the different computational techniques so it could be considered to fall well within the experimental error of identification of these profiles and all right with this we can proceed with to our third part to the ligand modifications ligand free-engine calculations and here here we will be looking at a slightly yeah different different approach because the same strategy will not work anymore so pre-building a library for canonical amino acids or nucleotide mutations will not work because ligand modifications can be very a set of them can be very diverse and every ligand pair that we would consider would well would be unique since the chemical space is nearly infinite so what we do we apply a different strategy here we build three different modules that would apply that would allow dealing with any arbitrary modification for that we would firstly need to identify atoms that would need to be morphed between the two considered ligands to do that we can either align the two molecules if they're similar then the mapping will be quite good however if they are not similar in in their own 3d structure we rely on the maximum common substructure so on a graph based comparison of the two molecules this is all encoded in in a rather complex workflow but the user doesn't need to worry about this simply the algorithm will be probing one path will be probing another part and will be choosing a more suitable solution for example if we were to align these two molecules we could identify either on the left maximum common substructure which would be one a solution alignment would yield a different solution and then the algorithm would be intelligent enough to choose the better one according to its measures subsequently we have an automated technique to build hybrid structures and topologies and we also have a suggestions for ligand pairs in a small network of compounds for example if we have a network of compounds if we have a compound library that looks like this here visualized we would like to know which ligand would we like to mutate into which so that we cover all this chemical library so we build a pairwise matrix so based on the distances that our algorithm calculates intrinsically and it devices a distance matrix subsequently the distance matrix can be can be can be built into a can be transformed into a minimum spanning tree or several minimum several several trees so that we would have redundancies in the in the library now let's look how we could apply this in this approach so we have scanned a very large set of library a very large set of ligand modifications for ligands binding to a protein and so in in a sense we were acquiring delta delta G's in the following thermodynamic cycles where we looked at the ligand modification in water and also in a ligand in the same ligand modification when the ligand is bound to the protein thus giving us the free energy difference of ligand modification of ligand binding to the protein upon that modification now we scan 11 systems and probed almost 500 ligand modifications and here is the overall summary of the results there is a comparison of different force fields gas CGNFF and our consensus and on the and on the left here we see also the FEP plus which is a different approach from a commercial software company Schrodinger company which is frequently used in the pharmaceutical in the pharmaceutical industry if we look now at the accuracies in terms for example let's look at the average and signed error the top panel there we see that a consensus approach performs either as well or even outperforms the best results of the FEP plus and in the person correlation we are also well within the error bars of the commercial best commercial software and again consensus approach performs better than any single force field taken considered separately now the performance of calculations strongly depends on the system that is being that is being explored here I have now broken down all these results by system on the x-axis there is a in each of these plots it's the experimental value and on the y-axis is the calculated and delta-delta g-value we can see that the more red dots there are the the worst the prediction the worst the calculation accuracy is and we can see that some systems really perform really well very very the clouds look very blue there so we have a high accuracy but but it strongly depends strongly depends on the system now we for the last three minutes of my talk I would like to also tell about the absolute free energy calculations because not only can we do the relative free energy calculations we can also look at the at the disappearing or appearing the ligand full ligand in this case we would not get a delta-delta g but we would really aim to compute the absolute delta g value this would however require slightly different protocol we would require restraints so the ligand when it's decoupled there on the right side it would need to be restrained somehow to the protein and this those restraints would need to be taken into account in the free energy estimation in the perturbations are of course larger in this case and hence the convergence will be slower the protocol here is a can be visualized also as this as this thermodynamic cycle we want to disappear the ligand so decouple it from solvent on the left side apply the restraints and recouple it in the when it's bound to the protein and also remove the restraints so the cycle is slightly more complex than those cycles that I was showing before now as a test how how well our approaches work we can use a test system which has been recorded which has been collected by al-Deghi and the colleagues so in 2017 so several years ago material collected a set of 22 bromo-domain proteins those are proteins that interact strongly with the bromo-sporin which is a ligand inhibiting those bromo-domains and there is also experimental data available on the absolute binding freeanges of bromo-sporin to all of these bromo-domains and what Matteo did then in 2017 he calculated he used the Hamiltonian replica exchange enhanced free energy perturbation technique so equilibrium technique and obtained the result that are visualized in the upper left panel now we wanted to see what would happen if we were to apply either not apply this enhanced replica exchange protocol just simple FEP we see that this in the middle panel in the upper row middle panel we see that the accuracy drops so from 1.5 in the average unsigned error we drop to 2.3 average unsigned error that's quite a large drop in kilo calories per mole and what if we apply the non-equilibrium free energy protocol well we gain inaccuracy and accuracy becomes almost identical to that which is achieved by Hamiltonian replica exchange FEP approach in terms of Pearson correlation we actually with the non-equilibrium approach can reach the highest accuracy of these of these three methods almost reaching 0.6 in Pearson correlation the we can also look at the breakdown by system but of each of these energy estimations but maybe it's more interesting to look at the convergence so we here on the in the panel c i'm showing the convergence or rather average unsigned error of all of these calculations in with the depending on the sampling time used per obtaining a single delta g value and we see that the non-equilibrium free energy protocol converges very quickly to its final accuracy while the non-enhanced free energy perturbation protocol takes takes quite some time and even even after the 650 nanoseconds invested per single delta g calculation is still has not converged to the enhanced sampling technique result all right so with this i think i'm finished and just to summarize what we looked at today was a alchemical free energy calculations of amino acid mutations nucleotide mutations and ligand modifications as well as a little glimpse at the absolute ligand free energy calculations yeah thank you my acknowledgement slides and thank you