Okay, welcome everybody to the BioExcel webinar. This is webinar number 63, and today's webinar is about GROMACS and PMX for large-scale alchemical protein-ligand binding affinity screening. The presenter is Vytautas Gapsys from the Max Planck Institute for Multidisciplinary Sciences. I am Alessandra Villa, and I host this webinar together with Arno Proeme from the University of Edinburgh. Today's presenter, Vytautas Gapsys, is a project leader at the Max Planck Institute for Multidisciplinary Sciences in the Computational Biomolecular Dynamics group. His project focuses mainly on alchemical free energy calculations, and that includes method development and implementation in the PMX package, but also applications to protein and nucleic acid mutations as well as ligand modifications. I'm very happy that Vytautas accepted our invitation, and I'm looking forward to listening to him.

Thank you for the invitation and the introduction. Today I will tell you a little bit about alchemical free energy calculations, and I will mainly focus on our latest developments for protein-ligand binding, with an emphasis on our efforts to make it really high throughput, which is very much needed for guiding drug design projects, and also on absolute binding affinity predictions. But before that I would like to spend a few minutes on an introduction, to get everyone on the same page. If you already know what alchemical approaches are, you may just get a cup of coffee in the meantime; it will be very quick, just a few minutes, but I would like to give a very brief, basic introduction. So, computational alchemy: what is it? It is a method to calculate free energy differences between physical states (I will talk about how we define those and what we mean in this context), and it relies on molecular dynamics simulations as its sampling engine. Molecular dynamics simulations: many of you have heard of them. In this context we are interested in simulating biomolecular systems, which I'm showing here: a fully atomistic representation with explicit solvation. Here I'm showing a protein example, but it could also be nucleic acids and small molecules, and the explicit addition of ions allows us to reach a well-defined salt concentration. We sample the phase space while controlling the temperature and pressure, we integrate the equations of motion over time, and we get a trajectory that looks something like this, with all the atoms wiggling. As an outcome of such a sampling engine, we obtain well-defined statistical ensembles at atomistic resolution. So this is a very brief intro into our sampling engine. With that, as I already mentioned, we want to use the sampling engine to calculate free energy differences; this is the second ingredient of the method that we need to understand here. To illustrate what we actually want to calculate, I'll use an example, because I feel that is the most intuitive way to understand the alchemical approach. So let's formulate a question: let's ask about the thermostability of a protein upon a mutation. How would we intuitively approach this question? Let's take a protein and use our molecular dynamics sampling engine.
Let's allow it to fold and unfold many times. Here I'm showing a protein that is simply going back and forth between the two states; if we simulate long enough, we visit both states many times. This way we count the frequency of visiting these states, approximate the probabilities with these frequencies, and then convert those probabilities into a free energy difference, and now we have the free energy difference between the folded and unfolded states of this protein. Computationally this is very expensive and tractable only for very small proteins, so let's say it is a very difficult problem to solve computationally. So how do we go about it? Well, let's first make the problem twice as complicated. Let's introduce another branch: we introduce a mutation and do exactly the same, so we again have to run very long simulations to sample the folded and unfolded states. Now you may tell me: how does that help? We had an intractable problem before, and now we have it twice. But here we use the notion that free energy is a state function, it does not depend on the path, and we circumvent these computationally very expensive parts by drawing the horizontal arrows instead. Rather than sampling folding and unfolding, we now sample the mutation, from the blue amino acid in this case into the red one, once in the folded state and once in the unfolded state, and the ΔΔG that we recover this way is the same ΔΔG that we would get via the vertical arrows (see the relations sketched below). Of course, this transformation is not physical, because we literally morph the blue amino acid into the red one; that is why the method is called alchemical. It is not physically realizable, but possible only via our computational methodology. So, with the combination of molecular dynamics simulations and this alchemical approach, we can start answering interesting questions. Not only can we answer questions about protein thermostability; here I'm drawing several different cycles. We can also ask, for example, how much the binding affinity of DNA to a protein changes upon a nucleotide mutation, or how much the binding affinity of a ligand changes upon a ligand modification, and similar questions can be answered by performing such free energy calculations. These calculations require MD sampling that is no longer just a vanilla standard simulation; it requires a special setup, and that is exactly what the PMX software package does: it prepares the hybrid input structures and topologies for simulations in GROMACS to perform alchemical free energy calculations. And today, this was all of my introduction, I would like to concentrate on the particular cycle for relative binding free energies of protein-ligand complexes. So we will be interested in checking how strongly one ligand binds to the protein with respect to another ligand. This question has, of course, been on the minds of many scientists for a long time, and very accurate predictions were already demonstrated 30 years ago or so, or even more, by showing for one or several cases that it is really possible to use such a computational technique.
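For reference, the two relations being described here can be written compactly; this is a standard textbook formulation, with notation assumed rather than taken from the slides:

```latex
% Sampled populations of two states converted into a free energy difference
% (k_B: Boltzmann constant, T: temperature)
\Delta G_{\mathrm{fold}} = -k_B T \,\ln \frac{p_{\mathrm{folded}}}{p_{\mathrm{unfolded}}}

% Free energy is a state function, so the thermodynamic cycle closes:
% the difference of the two vertical (folding) legs equals the difference
% of the two horizontal (alchemical mutation) legs
\Delta \Delta G
  = \Delta G_{\mathrm{fold}}^{\mathrm{mut}} - \Delta G_{\mathrm{fold}}^{\mathrm{wt}}
  = \Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}^{\mathrm{folded}} - \Delta G_{\mathrm{wt}\rightarrow\mathrm{mut}}^{\mathrm{unfolded}}
```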
But of course it took a long time until this all made it into real drug design pipelines; almost 30 years passed until scientists, in this case from Schrödinger Inc., showed that one can really obtain accurate and reliable free energy predictions for small molecules binding to protein targets. With GROMACS and PMX we also have our own implementation of this approach, and a few years ago we demonstrated it by collecting a large set of protein-ligand complexes. Here I'm showing 11 diverse systems, so it works on diverse targets, using very many modifications, almost 500 ligand modifications in this case. If we compare the outcome of our implementation to that of Schrödinger's implementation, we see very comparable accuracies. We usually look at plots of experimental ΔΔG versus calculated ΔΔG, and we see that the accuracies are virtually indistinguishable. You can quickly notice some differences, most notably in the uncertainty estimates; this is mainly due to the methodology we use to estimate our uncertainties, but overall the accuracy is very much comparable to that of the commercial software. If we look case by case, here I split exactly the same figure by protein-ligand system, we see that it is case dependent. This could be due to the experimental measurements, because those of course come from diverse public data sources, or it could be due to difficulties in capturing some particular chemistry of the small molecules. Overall we have high accuracy in these predictions, but in some cases, like MCL1 here, we can also have larger inaccuracies. This is just to give a general notion of the accuracy. That was our slightly earlier work; in the last years, being satisfied with the accuracy we can achieve, we tried to bring this to really high-throughput calculations, because in drug design projects it is always interesting to carry out these predictions in, let's say, an overnight manner, so that suggestions can be made to the medicinal chemists as quickly as possible, tested in the wet lab, and the design cycle can continue with a quick turnaround. For that we took another benchmark data set, now publicly available, assembled by Merck. Christina Schindler and colleagues collected a data set which is not particularly tailored towards the convenience of computational chemists, but rather tailored to represent real data sets as they are encountered in a pharmaceutical industry environment, so it should reflect realistic cases in pharma. This is quite a large data set: eight systems, more than 500 ΔΔG values, and we set out to probe different force fields and run independent replicas, so it is quite a large computational challenge.
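As a side note, the accuracy numbers quoted throughout the talk (average unsigned error, correlation between calculated and experimental ΔΔG) boil down to a few lines of analysis. A minimal sketch in Python, with made-up numbers purely for illustration:

```python
import numpy as np

# Hypothetical experimental and calculated ddG values (kcal/mol), illustration only
ddg_exp = np.array([-1.2, 0.4, 2.1, -0.3, 1.5])
ddg_calc = np.array([-0.8, 0.9, 1.6, 0.2, 1.1])

# Average unsigned error (mean absolute error) between calculation and experiment
aue = np.mean(np.abs(ddg_calc - ddg_exp))

# Pearson correlation coefficient between the two data sets
pearson_r = np.corrcoef(ddg_exp, ddg_calc)[0, 1]

print(f"AUE = {aue:.2f} kcal/mol, Pearson r = {pearson_r:.2f}")
```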
Okay. Now we asked a question: if we had access to a high-throughput machine, let's say the Max Planck supercomputer Raven were made available to us, with about 500 nodes, and it could offer us three days of computation, would our workflow for calculating the free energy differences work out? Would we be able to efficiently calculate this whole data set and come up with accurate predictions? For that we built a very simple workflow of just five steps, where in black I'm showing the steps that we can perform on a local workstation without any overhead: preparing the proteins and ligands themselves, that is, the structures and the topologies. Then we have a small step where we equilibrate a little bit on our in-house cluster, just to make sure that everything is fine; in principle this step can also be pushed onto the supercomputer. And then we ran everything on the large HPC facility, and indeed, in three days we could accomplish the task of running about 20,000 independent jobs without any problem. As for the accuracies we get, there is no need to go into particular details; I would just emphasize that we get accuracy very comparable to what I was showing before in the earlier plots, and similarly we again have a spread between the different systems, so very similar trends to what we observed before. Okay, so this is what we can do if we have access to a very large HPC machine. We then also probed the second direction that is emerging nowadays, which is cloud infrastructure. For that we teamed up with an AWS team and built a slightly modified version of the workflow I was just showing. In this case we take the same 20,000 jobs and use HyperBatch; this is AWS terminology, it maybe doesn't matter so much, but essentially we use a job dispatcher which dispatches all of those jobs into different regions, all over the world, onto Amazon compute clusters. The workflow I'm showing here then interacts with a container registry to pull the correct containers for the diverse compute architectures, and interacts with S3 buckets to pull and push the data, and basically accomplishes the same thing, hopefully again within the three days, this magical time limit that we set for ourselves. And let's see whether it worked out: yes, it is very much possible. Here I'm showing the time trace of what we achieved. On the x axis is time: we started on Monday in the evening, and by Wednesday in the evening all of the jobs had finished. The coloring is simply by region, that is, in which compute cluster in the world they were running. Interesting are the y-axis scales: for example, the largest peak we could scale up to was about 140,000 virtual CPUs. We submitted in two waves, just for our convenience; in the first wave we used GPUs and in the other we didn't, and for the GPUs we could secure 3,000 GPUs simultaneously, so that is really great throughput.
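The job-dispatch pattern described above can be pictured with a small boto3 sketch. This is only an illustration of the general idea using the standard AWS Batch API; the queue, job-definition, bucket names, and the helper script are invented here, and this is not necessarily how HyperBatch or the actual study workflow is implemented:

```python
import boto3

batch = boto3.client("batch", region_name="us-east-1")

# Hypothetical names; the real study used its own queues, containers, and buckets
JOB_QUEUE = "fep-gpu-queue"
JOB_DEFINITION = "gromacs-pmx-container"   # points at an image in a container registry
BUCKET = "s3://my-fep-input-data"          # equilibrated structures / hybrid topologies

def submit_transition_job(system, ligand_pair, replica):
    """Submit one alchemical-transition job to AWS Batch (illustrative only)."""
    name = f"{system}-{ligand_pair}-rep{replica}"
    return batch.submit_job(
        jobName=name,
        jobQueue=JOB_QUEUE,
        jobDefinition=JOB_DEFINITION,
        containerOverrides={
            # the container is assumed to pull its inputs from S3, run gmx mdrun,
            # and push the resulting work values back to S3
            "command": ["run_transition.sh", BUCKET, name],
        },
    )

# e.g. dispatching all ~20,000 jobs would then be a loop over a job list:
# for system, pair, rep in job_list:
#     submit_transition_job(system, pair, rep)
```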
Here I'm coloring the same two waves by architecture: the green ones are GPU-based nodes, the blue ones are the most abundant Intel nodes, and we also included several other architectures. So in principle, on the cloud, one can diversify the jobs over different node types and gain throughput that way. And now, having calculated all of these free energy differences, we can also estimate how much it costs: is it also price-performance competitive in comparison to HPC? Let's have a look. On the x axis I'm showing the time to solution for a free energy, that is, how many hours we need to wait to get one free energy value, and on the y axis the total cost of one free energy difference calculation. Each of these lines represents three different systems, protein-ligand complexes: the smallest system is on the left of the line, the medium one is the point in the middle, and the largest one is on the right. We see several regions emerging. One region is where the calculations take long and are expensive: these are small CPU-based nodes; the nodes themselves do not cost so much, but one needs to run the simulations for a very long time, and this simply accumulates the cost. In the top left quadrant we see calculations that are fast but very expensive; those are large CPU-based nodes. What really gives a boost is the green quadrant, where we have the GPU nodes: those are really cost efficient, dropping below $15 per free energy difference, and they become competitive even with a customized cluster based on consumer GPUs. Okay, so that was our relative binding free energy calculation part. For the second part today I would like to concentrate on absolute protein-ligand binding free energies. This is another kind of beast, where we really want to decouple the full ligand and also couple it to the protein, and ask: what is the absolute binding free energy for that molecule? Here I would like to divide this part of the talk into several pieces. First, we wanted to explore the applicability of different free energy calculation methodologies to these absolute binding free energy calculations. What do I mean by the methodology comparison? I need to spend a minute explaining these methodologies. The standard one, equilibrium FEP, is the method where the alchemical pathway, if you still remember it from the introduction, is divided into discrete steps. In each discrete step we run an equilibrium simulation, between each of these small discretized steps we estimate a ΔG value, and from the sum of those we finally get the overall ΔG. The other approach we wanted to probe, and which we often use in our everyday projects, is the non-equilibrium approach, which works slightly differently. We run two equilibrium simulations at the physical end states: in one the ligand is coupled, in this case to the solvent, and in the other it is decoupled.
From those trajectories we extract snapshots, from which we start alchemical transitions that are very rapid, on the order of tens to 200 picoseconds, which is the typical range of time scales, and we drag the system very quickly from one state into the other, and we also do it in the opposite direction. From the work that is required to perform these transitions we extract the free energy difference, based on the Crooks fluctuation theorem. All right, so we wanted to ask: is it possible to use these methods for absolute binding free energies, what are the benefits, and how quickly do they converge? To do that, we needed to construct the thermodynamic cycle, and for absolute binding free energies the cycle also looks slightly different: we decouple the full ligand in water, that is, in solvent, restrain it in the binding site of the protein, perform the coupling to the protein, and also account for and remove the restraint contribution. All right, let's have a look at how it worked. We picked several different systems so that they would be diverse enough. Case A: 11 different ligands, and you can see they are very different in their chemistry, binding to the same protein. Case B: a single ligand, bromosporine, probed for binding selectivity against 22 bromodomain proteins. Case C is a typical example in this kind of calculations, T4 lysozyme, where we probed only five ligands, but the point here is that the apo and holo states differ substantially for this protein, so it will be quite interesting to see. Let's have a look at how it performs. First, case A: 11 ligands against the same protein. Yes, the results look quite busy, but let me guide you through them; we don't need to see all the details depicted here, just the main points. In the first scatter plot we are looking at the FEP approach enhanced with Hamiltonian replica exchange, so it should converge better than FEP without Hamiltonian replica exchange, but the final answers, as we see, are very similar when compared to the experimental free energies. Then we have our non-equilibrium free energy calculations, one with slightly faster perturbations and one with slower, nanosecond transitions for several of those ligands. We probed several technical details here, but maybe the most interesting thing is to look at panel C, where we are showing the average unsigned error over all of these data points with respect to experiment, plotted against the simulation time required on the x axis. The gray line, let it be our reference, is the Hamiltonian replica exchange enhanced FEP, and the blue and orange lines are, respectively, the non-equilibrium approach and the plain FEP approach, and, fortunately, they both quite quickly approach this gray reference line. If we zoom in a little, we see that the non-equilibrium approach converges a little faster than FEP. There are two other curves that look like outliers; no need to spend much time on them, because they are one-directional transitions based on the non-equilibrium approach, and they are just there for the sake of completeness of this figure.
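The work-based estimate mentioned here (the Crooks fluctuation theorem) can be sketched quite concisely. The following is a minimal, self-contained Gaussian-intersection estimator in the spirit of what the speaker describes; it is not PMX's own code, and PMX also offers other estimators (e.g. BAR), which are omitted here:

```python
import numpy as np

def crooks_gaussian_intersection(w_forward, w_reverse):
    """Estimate dG from non-equilibrium work values via the Crooks relation.

    w_forward: work values (e.g. kJ/mol) for the 0 -> 1 transitions
    w_reverse: work values for the 1 -> 0 transitions
    By Crooks, the distributions P_f(W) and P_r(-W) intersect at W = dG; here
    both are approximated as Gaussians and the intersection is solved analytically.
    """
    m1, s1 = np.mean(w_forward), np.std(w_forward, ddof=1)
    m2, s2 = np.mean(-np.asarray(w_reverse)), np.std(w_reverse, ddof=1)

    if np.isclose(s1, s2):
        return 0.5 * (m1 + m2)              # equal widths: midpoint of the means

    # Equating the two Gaussian densities gives a quadratic a*x^2 + b*x + c = 0
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) - np.log(s1 / s2)
    roots = np.roots([a, b, c])
    roots = roots[np.isreal(roots)].real
    # pick the root closest to the region between the two means
    return roots[np.argmin(np.abs(roots - 0.5 * (m1 + m2)))]

# Usage on synthetic data with a "true" dG of about 10 kJ/mol:
rng = np.random.default_rng(1)
w_f = rng.normal(15.0, 4.0, 200)   # forward work, dissipated above dG
w_r = rng.normal(-5.0, 4.0, 200)   # reverse work, above -dG due to dissipation
print(crooks_gaussian_intersection(w_f, w_r))   # ~10 kJ/mol
```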
The next example we looked at is the bromosporine case, where we have one ligand binding to many proteins. Here the picture is very similar; the overall accuracy is slightly lower for this data set, but we are interested in comparing the methods. Again we see that HREX-FEP, FEP, and the non-equilibrium approach all give very similar final answers. If we look at the convergence, we now start to see some differences: the blue curve, our non-equilibrium approach, converges very quickly to the gray line, and the non-enhanced FEP, without Hamiltonian replica exchange, converges much more slowly. So now we start to see some differences between the methods. The third example is T4 lysozyme. It is an interesting system because it has already been studied by means of relative binding free energies for these ligands, and it was noticed that one needs very long simulations in order to converge towards the state that is more similar to the apo state when the ligand is smaller, because the rearrangements in the protein are quite substantial, and this simply requires the FEP windows to be quite long to reach convergence. All right, so what could we do to make it converge faster with the non-equilibrium approach? We can do the following trick. For the non-equilibrium approach, if you remember the scheme I showed, we have two physical end states; in this case those two end states would be the holo state and the apo state, and we can seed them by starting the simulations from the crystal structures, the holo crystal and the apo crystal, if they are available, and in this case they are. Already this should help us converge much faster. Well, maybe it is true, maybe not; let's see if this really is the case. Here are the time traces, very similar to the other time traces I was showing, and now we have different variants. The orange and red variants are the FEP approaches that start from the apo structures; the green lines are the FEP cases that start from the holo structures; and with the non-equilibrium approach we can combine both apo and holo in a single calculation. Here I'm mostly interested in how quickly these methods converge, so instead of looking at the full trace I simply subtract the final value from the earlier data points and plot that here. We see that this deviation, that is, how quickly we reach the final value, is always smallest for our non-equilibrium approach, and that this comes from combining both states into the same free energy estimate. So in this case we can conclude that we can use our non-equilibrium approach to estimate binding affinities in an absolute free energy protocol; we can converge faster than plain FEP and at a similar rate to Hamiltonian replica exchange FEP, but with the additional benefit of taking both apo and holo conformations into account in the same calculation.
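The convergence comparison just described (the deviation of a running estimate from its final value) is simple to reproduce for any estimator. A small, hypothetical sketch; the estimator passed in could be the Gaussian-intersection function from the earlier snippet or any other ΔG estimator, and the stand-in used here is only a crude equal-width approximation:

```python
import numpy as np

def convergence_trace(w_f, w_r, estimator, n_points=10):
    """Deviation of the running dG estimate from the final estimate.

    w_f, w_r: forward/reverse work values in the order they were generated
    estimator: callable (w_f, w_r) -> dG
    Returns a list of (fraction_of_data, dG(fraction) - dG(final)).
    """
    dg_final = estimator(w_f, w_r)
    trace = []
    for frac in np.linspace(0.1, 1.0, n_points):
        nf, nr = int(frac * len(w_f)), int(frac * len(w_r))
        trace.append((frac, estimator(w_f[:nf], w_r[:nr]) - dg_final))
    return trace

# crude equal-width stand-in estimator: midpoint between <W_f> and <-W_r>
midpoint = lambda wf, wr: 0.5 * (np.mean(wf) - np.mean(wr))

rng = np.random.default_rng(2)
print(convergence_trace(rng.normal(15, 4, 100), rng.normal(-5, 4, 100), midpoint))
```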
With this in mind, let's go to the second part, where we apply the same approach to large-scale protein-ligand binding free energies. Here we added an even niftier trick to improve the thermodynamic cycle that I showed you before, because we noticed that with the non-equilibrium free energy calculations we can avoid simulating the apo state with the decoupled ligand at all: we can simply simulate the apo protein once and superimpose the ligands into its binding site. There are some technical details, which I omit here, but we can reconstruct these ensembles faithfully and rigorously, so that they can further participate in the branch of the thermodynamic cycle where the ligand is coupled to the receptor. Okay, that is just a methodological detail; we can proceed with the application. Again we relied on the same data set that I showed you for the relative binding affinity study, only in this case we selected a subset of those systems. Let's go straight to the results. In the largest panel we see all the systems plotted together, and we see that the overall accuracy is slightly lower than we would usually get from the relative binding affinities. And again, a very similar situation: if we look case by case, sometimes we get very good agreement with experiment, and sometimes we get what I would describe as an offset; the correlation may be there, but there is an offset between experiment and calculation in some cases. This emergence of the offset is quite interesting; let's have a closer look at it, because this is something that is conceptually different between relative and absolute binding affinities. Let's again look at it by means of an example. Let's think about how we could construct an apo state from a holo state. The approach that I would call a poor man's apo state works as follows: we take a holo structure from crystallography and simply remove the ligand. We do not allow the system sufficient time to equilibrate; we simply have a holo structure without the ligand. We can call it apo, use it to calculate binding affinities, and we come up with the plot that I'm showing here in the top left. There are some cases where the emergent offsets are really large; it is the purple case. Now we can compare: in all of the cases that I'm showing here we also had a true apo state, resolved by X-ray crystallography, so we can take the true apo state, not the poor man's one, and calculate the binding affinities again with those structures. And if we calculate them again, the offsets all of a sudden disappear. Right, so let's go into a bit of detail about what happens with these offsets. Let's take one of these systems, the purple one, where the difference between the poor man's and true apo states is statistically significant, and let's have a look at those structures. If we look at them, the apo state in orange and the holo state in blue, we see several differences. The first difference that is really stark and catches the eye is a major loop motion, and one might suspect that this is the problem, that this is the difference between apo and holo that we don't capture. But actually, that is not the case.
In fact, in the simulations we capture that loop motion very well, and it doesn't influence our results at all. But there is a smaller circle in this figure, and it marks the gatekeeper residue, which actually changes its rotameric state between the apo and holo forms. Now, if we calculate our binding affinities starting from the holo structure, and here I'm overlaying many, many snapshots from the MD simulation of this gatekeeper threonine, the rotamers are all pointing in the same direction, and we get an offset in the calculation. Let's take the apo state and look at it: we already know that the binding affinities calculated from it do not have an offset, and in those simulations our threonine rotamer is also always pointing in one direction, but a different one than in the holo state. So just from a single rotamer state we get about an eight kilojoule per mole offset between these two data clouds. And how could we prove to ourselves that this single rotamer is really responsible for all of this? Well, we can take the blue structure and change nothing in it except this one rotamer: we take the blue structure, take the threonine rotamer from the orange structure, and simulate with that. That gives us the green data set, which also does not have an offset. This means that it is really sufficient to replace this one rotamer, and we recover our eight kilojoules per mole difference. In fact, if we calculate the PMF between these two rotamer states in the apo protein, we see that the difference between the trans and gauche-minus states is exactly eight kilojoules per mole, which exactly accounts for the free energy difference that we saw in the absolute binding affinities. All right, with this I come to the conclusion of the second part, on the absolute binding affinities. To get accurate absolute binding affinities, it is essential to take into account the difference between apo and holo protein conformers. In principle, absolute binding affinities can yield accuracy comparable to relative binding free energies; however, this comes at a much higher computational cost, because it simply takes longer to converge such large perturbations. As I'm showing in these smaller panels, here I'm comparing the ΔΔG values, which I can always back out from the absolute binding affinities, and on the right exactly the same ΔΔG values calculated from the relative binding affinities, and the difference between these accuracies becomes statistically insignificant. We also have to take into account that we have slightly larger uncertainties in the case of the absolute binding affinities, mainly because of the difficulty of converging such large perturbations. And I think this is my final slide. I would like to thank everyone, BioExcel and the organizers, and of course all the people involved in this project. Thank you.
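To spell out the arithmetic behind the rotamer analysis above: a rotamer locked in the wrong state contributes roughly its own free energy penalty to the computed binding free energy, and 8 kJ/mol is a large population bias. In assumed notation (not taken from the slides):

```latex
% The apparent offset in the computed binding free energy roughly equals the rotamer penalty
\left|\Delta G_{\mathrm{bind}}^{\mathrm{holo\text{-}started}} - \Delta G_{\mathrm{bind}}^{\mathrm{apo\text{-}started}}\right|
  \approx \Delta G_{\mathrm{rot}}(trans \rightarrow gauche^{-})
  \approx 8\ \mathrm{kJ\,mol^{-1}} \approx 1.9\ \mathrm{kcal\,mol^{-1}}

% At T = 298 K (RT \approx 2.48 kJ/mol) this corresponds to a rotamer population bias of
\exp\!\left(\frac{8\ \mathrm{kJ\,mol^{-1}}}{RT}\right) \approx \exp(3.2) \approx 25
```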
Okay, thank you very much, Vytautas, that was very interesting, a nice summary. We have a fair few questions; please feel free to add more, and we'll see if we can get through them in time. So the first question is from Shaheen, who asks: is repeating the non-equilibrium transitions that transform the wild type into the mutant enough to be considered a new replica, or should everything be run from the beginning?

The idea is that the information about the ensembles, which is actually the key here, because it is the free energy difference between the equilibrium ensembles that we are after, is encoded only in the sampled populations of the equilibrium ensembles. So if we don't rerun those, we don't gain any new information. We can repeat the non-equilibrium transitions, which are of course required to connect the equilibrium ensembles and converge the free energy difference, but the information about the free energy difference is really in the equilibrium part.

Yeah, that's clear, thank you. The next question is from Matthew, who asks how the dual topologies for the ligands were set up, because he doesn't see the PMX web server having this functionality.

Yes, correct. The PMX web server supports single-topology approaches, and it supports them for amino acids and nucleic acids; for the ligands we are considering adding that as well. Currently this is supported as a command-line tool, and it is also incorporated into some workflows, but the easiest way would be to run PMX from the command line.

Okay. The second question from Shaheen: does PMX accept exotic ligands? Should we provide the parameters for the ligands, and if so, how?

PMX doesn't parametrize ligands; it just prepares the hybrid structures and topologies for the further calculations. It is up to the user which parameterization for the ligands to choose. For example, we frequently try out several force fields in the lab, so it is up to everyone to experiment with which ones give the highest accuracy. I usually use a GAFF-based parameterization or CGenFF, sometimes I try out OPLS, and lately we work more and more with the Open Force Field. Those are independent tools on their own to parametrize the ligands. We have a few wrappers, if anyone is interested, but I think the standard way is simply to rely on the tools provided by the developers of those force fields; they are probably the most reliable source.

Yeah, that makes sense, and you have probably already partly answered the next question, from Carl, which is: what kind of parameters do you use for the various ligands? Is there anything you want to add to what you've just said?

Maybe I could expand on that, since the question came up again; let me quickly scroll back a few slides. The idea is that we always test several force fields, and a simple empirical observation is that when we combine the results of several force fields, we actually benefit from that. This is not necessarily obvious just from formulating it as an exercise, since the errors from two force fields might propagate and you could get an increase in the overall error, but the standard way I do it is to simulate with GAFF and to simulate with CGenFF, and from the outcome of those I know that the error on average will be reduced if I simply combine the results of the two force fields, because they tend to point in opposite directions from one another: one overestimates the free energy, the other is likely to underestimate it. It's not a rule, just an empirical observation for these force fields, but we have stumbled upon it again and again: simply averaging the results helps increase the accuracy.
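A minimal numerical illustration of this combination idea, with invented numbers; the actual consensus protocols in the papers also propagate the per-force-field uncertainties, which is skipped here:

```python
import numpy as np

# Hypothetical ddG predictions (kcal/mol) for the same ligand pairs from two force fields
ddg_gaff   = np.array([1.9, -0.1, 2.7, -1.4])
ddg_cgenff = np.array([0.7, -0.9, 1.6, -2.2])
ddg_exp    = np.array([1.2, -0.6, 2.0, -1.9])

# simple average of the two predictions ("consensus")
ddg_consensus = 0.5 * (ddg_gaff + ddg_cgenff)

for name, pred in [("GAFF", ddg_gaff), ("CGenFF", ddg_cgenff), ("consensus", ddg_consensus)]:
    print(name, "AUE =", np.mean(np.abs(pred - ddg_exp)))
```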
That makes sense; it's kind of like crowd-sourcing the force fields. Okay, great. The next question is from Michael, who says: thanks for the interesting talk; what is the gain in accuracy of estimating binding affinities this way compared to molecular docking?

I think they should not be directly compared one to one, as that would probably not reflect well on docking in the absolute sense, but docking is very good at generating suggestions for binding poses, and we always rely on a good starting structure. Docking is generally not able to get the free energy difference accurate to within one kilocalorie per mole. That is our gold standard, which I haven't mentioned yet: we aim for an average unsigned error within one kilocalorie per mole over a large set, averaged over ligands; there will be larger outliers, but on average we should not be more than one kilocalorie per mole away from experiment. Docking would not achieve that particular accuracy, but it can still be very useful for, let's say, ranking ligands or generating the poses and suggestions for the next simulations.

Okay, thanks. Then Matty asks: how do you usually set up the restraints for the decoupled ligand, positional, rotational, something more complex?

We use the quite standard Boresch-Karplus restraint types, and for the method we have an automated selector, just a tool which tries to identify the more rigid regions in the protein and also in the ligand, so that the rigid-rotor approximation is not violated. For the method that I was outlining in the last part of the talk, where we actually rely on superimposing the ligand into the binding site of the protein, it gets a little more involved, because we need to identify restraints which would generate a superimposed ensemble that we would not be able to distinguish from a simulated ensemble; the superimposed ensemble should not be distinguishable from a simulated ensemble, given the restraints that we identify. So it becomes a bit involved, but we have this automated as well, in principle.

Thank you. Lisa asks: if we don't have an apo crystal structure available, how can we proceed with the non-equilibrium calculation of absolute binding affinities?

In principle one can always use the holo state and construct the so-called poor man's apo state by removing the ligand. One may consider equilibrating longer, or maybe using some enhanced sampling technique to explore the relevant free energy minima of the apo protein and sample them. It doesn't guarantee, though, that this will necessarily improve the binding affinity accuracy. But, for example, let me just show you this slide again.
Where is it... here. So, you see, the barriers from one state to another are quite large, and it will all depend on the barrier height between the apo and holo states, which is unknown. If you are facing a high barrier, the relaxation of this poor man's structure will not necessarily let you jump into the other state that you are actually interested in. On the other hand, if you are not necessarily after the exact absolute binding affinity, that is, if you don't mind having an offset and simply want to rank the ligands according to their binding affinity, then maybe you can live with that offset. You simply will not trust that this is your true binding affinity; you will just compare the ligands in terms of one being more potent than another, and that is still a valid answer. So you could still use the holo state, well, without the ligand.

I'm going to combine three related questions into one, from Julia, Luis, and Ignacio, which basically ask whether this approach could be useful for calculating protein-peptide binding free energies, and in particular how PMX performs with ligands like small peptides of eight to ten amino acid residues.

I think eight to ten amino acids is a very large perturbation to really fully decouple. I think it is not feasible to converge; it's fine to attempt it, but I don't think it would converge. But one can of course try to formulate the question differently and approach it from the mutation side: ask how the binding affinity changes upon a mutation, try, let's say, an alanine scan of that peptide, and learn from that which amino acids are crucial and which are good to replace and mutate into others to gain or reduce affinity. The full peptide I would not try to decouple.

There are lots more questions and there is a limit to how many we can ask you to answer, but we'll have at least one or two more. Have you considered or tried using apo structures modeled with AlphaFold?

Actually, for this exact system it is of course very interesting what we would get. If we put p38 into AlphaFold, at least for me, in all cases it predicts the apo state of the protein. This is good news and bad news at the same time. If we have a holo state, fine, we all of a sudden get an apo state for free from AlphaFold. On the other hand, imagine we are interested in a relative binding affinity study where we would like to dock a ligand into a structure for which we only have the sequence: AlphaFold would always predict the apo state, so it would not help us so much. But this is an anecdotal case, because I attempted it only for this particular scenario.

Okay, time maybe for just one quick question: how significant is the choice of barostat and thermostat algorithms for such alchemical methods, stochastic or deterministic, with additional degrees of freedom or not, and so on?
For the barostat, without going into particular details, I have not noticed any particular problems with switching between them, as long as one uses a standard one that generates a correct ensemble. For the thermostat, yes, indeed, if one is not careful, especially when running absolute binding affinities and decoupling the full ligand, which then does not interact with the system apart from the restraints that one has defined, then with a global thermostat one can accumulate kinetic energy in that temperature group and start seeing very fast motions, which might even cause instabilities in the simulations. So for that we now always try to use the stochastic dynamics integrator, which avoids such effects.

Okay, thank you very much for answering all those questions and for the interesting talk. There were a couple of questions that we could not answer because of time, but I think people really appreciated it, so thanks again, Vytautas, for a very nice talk. And now we'll hand over to Alessandra.

Yeah, thank you. Thank you very much. I just want to announce the upcoming webinars. Next week we will have a webinar on a workflow that has been developed within BioExcel for predictions, in particular with an application in pharmacology. The week after, we will have a student webinar: BioExcel always organizes a summer school and a winter school, and in the digital version, as a poster prize, we give the participants the opportunity to give a presentation, so the three poster prize winners will present their research; that will be in two weeks. I thank everybody for the active participation, and I thank you very much, Vytautas, for giving a webinar here. Thank you. See you next time.