Our next presenter is Aris Tark Surinak. He holds a Master of Science from Uppsala University in pharmacological sciences, with drug discovery and development as the main field of study, and he is currently a PhD student at Nostrum Biodiscovery in Spain, where he has also trained as a project manager in the Drug Discovery Division. He is going to talk about "Could free energy calculations with GROMACS be faster?" Please go ahead.

So my name is Aris Tark Surinak and, as you said, I'm a PhD student at Nostrum Biodiscovery. I will be giving this presentation, titled "Could free energy calculations with GROMACS be faster?", and I will explain our attempt to achieve this goal.

First I would like to introduce you to Nostrum Biodiscovery. We are a company located in Barcelona, at the science park, which is the one in the photo. The company was founded in 2015 and the two co-founders are Victor Guallar and Modesto Orozco. They are both group leaders with more than 20 years of experience in drug design, molecular modeling and enzyme engineering. The value of the company rests on three points: our team, the software and the hardware. We are a team of modellers with IT and AI experts. We have some proprietary software, like pyDock and PELE, and we can also use third-party tools like Schrödinger and GROMACS, among others. Finally, we have our own hardware: a cluster that we use to tackle the different projects our clients need.

So now I'm going to start with the presentation. The main question we asked ourselves was whether we could calculate free energies with GROMACS faster, and this is how we tackled the issue. Here is the outline of the presentation: I will start with a bit of an introduction and then I will focus on the case study on protein-ligand binding. To start, I want to talk to you about the drug discovery and development process.
Here you can see the different stages and some of the computational techniques that are applied at each of them. As you know, the drug discovery and development process can take up to 10 years and costs hundreds of millions of dollars. As you move from stage to stage, the money you have to spend on experiments and tests increases, so it is very important to be confident about your previous results before you go on to the following steps. Otherwise you may end up losing a lot of money. And even with all this investment, the share of drug candidates that do not reach market approval is higher than 70%.

So one could argue that it is crucial to spend more time on the early stages in order to avoid failing in the later ones. A way of doing that is to implement computational techniques like the ones you can see here, which, apart from adding a lot of information and reducing the number of experiments, also help to predict possible outcomes in later stages. In the end, applying computational techniques saves a lot of time and a lot of money.

In the case I will explain today, I will focus on the lead optimization stage and on molecular dynamics simulations. At this stage, one of the main objectives is to optimize the lead to get the molecule with the highest affinity for your target of interest, and molecular dynamics can help with that. Binding free energy calculations can tell us which ligand has the highest affinity for our target. The free energy, as you probably know, is the enthalpy minus the product of the temperature and the entropy of the system.
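For reference, the relation mentioned here is the standard Gibbs free energy expression. Written for a binding process at constant temperature, it takes the form below; the symbols are the usual textbook ones and are an assumption, since the slide notation is not part of the transcript.

```latex
\Delta G_{\text{bind}} = \Delta H - T\,\Delta S
```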
When this ΔG is negative, the process is spontaneous, so the binding is favorable, and the lower the free energy, the higher the affinity of the ligand for the protein. But calculating this absolute free energy for one ligand, this ΔG here, where the ligand goes from the unbound to the bound state, is still very difficult because it requires long molecular dynamics simulations. A way of tackling this is to calculate the relative binding free energy.

So, as I just said, the absolute binding free energy, which would be this ΔG for ligand A, is very difficult because it requires long simulations. But if we have two ligands, ligand A and ligand B, it is possible to calculate the difference in binding free energy between them. It was difficult with just one and now we have two, but the key point is that we can now close the thermodynamic cycle and calculate the relative binding free energy, this ΔΔG, as ΔG2 minus ΔG1. This answers the question of which ligand has the higher affinity: a negative ΔΔG means that ligand B has a higher affinity than ligand A, and a positive one means that ligand A has the higher affinity.

As I said before, we are focusing on the lead optimization stage and we want to find the ligand with the highest affinity. At this stage we will probably have a set of hundreds or even thousands of quite similar ligands, and we want to choose the best ones, the ones with the highest affinity for our target. If we wisely connect all these ligands, which share a similar scaffold, we can calculate the relative binding affinities for the whole ligand set, and if we know one experimental ΔG for one of the ligands of the set, we can estimate the ΔG of the other ligands.

So now the question is how we calculate ΔG2 and ΔG1. To calculate ΔG2 and ΔG1, we are using a method called thermodynamic
integration, and the first thing we have to do is run the four molecular dynamics simulations you see here in the thermodynamic cycle: ligand A in water, ligand B in water, ligand A bound to the protein and ligand B bound to the protein. Once we have those, we have to extract some snapshots. If we look at this image, for example, with the two systems, ligand A with the protein and ligand B with the protein, we extract snapshots from the ligand A simulation and then run fast molecular dynamics simulations changing from ligand A to ligand B. With all these snapshots and fast simulations going from A to B, we can calculate the work distribution you can see here. Then we do the same with ligand B in the protein: we take the snapshots and run the fast molecular dynamics simulations changing from ligand B to ligand A, and we get this other work distribution. These are the forward and the reverse.

As we are working with similar compounds, we expect these two distributions to overlap, and the intersection of the overlap is the ΔG. In this case it would be the ΔG in the protein, so this ΔG2. Then we do the same with the water systems and we get ΔG1, which is the ΔG in water. To calculate the relative binding affinity, we take ΔG protein minus ΔG water.

So, to sum up a bit, the objective of this project was to create a protocol to calculate free energies quickly, to use this workflow in the lead optimization stage, and to support AI-based generative models. Our strategy to tackle this was the following: first a literature search, then a benchmark study, then we designed and created the workflow, and finally we applied it to an industrial case.

Checking the literature, we found this paper, and here they calculated relative protein
ligand binding affinities for 13 different systems using pmx and GROMACS. If we look at the results for all of the systems, we can see that they have a good correlation and a good error estimation, similar to some commercial software, so it seems to be a good method.

The workflow they use is this one. They work with two different force fields, the General Amber Force Field and the CHARMM General Force Field, to parameterize the ligands. Then they select the edges; if you are not familiar with that, selecting the edges means pairing and connecting all the ligands of the set. Then they build a hybrid topology for each pair of ligands. Then they run the molecular dynamics simulations with these steps: energy minimization, then 10 picoseconds in the NVT ensemble, 6 nanoseconds for the equilibrium simulation, and then they take 80 snapshots and run 50-picosecond non-equilibrium simulations. They do three replicates for each force field, and finally they analyze the results. So the total time they need for a pair of ligands is 40 nanoseconds, and as they do three replicates for each force field, six replicates in total, the total simulation time is 240 nanoseconds.

Here you can see the results for one of the systems of this paper, JNK1, which has 21 ligands. Here you see the average with the General Amber Force Field, the average with the CHARMM General Force Field, and the average of the six replicates. As you can see, this last one is the best: it has the best correlation and also the best error estimation. These are in kilojoules per mole.

So the next step for us was to try to replicate these results, but with a faster workflow. Now I will start talking about our approach. If we look at the workflow, there are different stages, and in each stage there are different things we can modify in order to optimize the workflow and calculate the binding free energy, and now I will
follow this workflow and show some of the results we obtained by changing these parameters.

For the setup, we decided to work with just one force field instead of two, and to study whether we could get similar results with shorter simulations. We chose the General Amber Force Field.

For the protein preparation, it is important to have a reliable structure and to build a protein model that is as accurate as possible. It is also important to consider which ligand to use for the protein equilibration. You can use a central ligand, the one most similar to all the other ligands, or you can use the largest ligand of the set in order to avoid space issues in the binding site when you place the other ligands of the set.

To select the edges, that is, to pair the ligands as I said before, we tried two strategies. In the first one, we worked only with a similarity score, pairing the ligands based on the number of common atoms between the ligands of the set, and then we built the minimum spanning tree based on that similarity score. As the results weren't very good, we also tried building a second minimum spanning tree to have more transitions and more pairs and to study the system further.

Then we also tried LOMAP, an algorithm from the Mobley lab that, apart from considering the similarity between ligands, also considers the rings of each system and the net charge, and has some other rules. It can also connect the ligands using different strategies, such as radial, with a hub, or tree-like, and it closes the molecules in thermodynamic cycles. Connecting the molecules in cycles has an advantage that I will show you later. Here is an example of mapping the ligands with LOMAP. You can see that we are using the radial strategy, which means that all the ligands are connected to a central ligand
and also that all the ligands are connected to at least two other ligands.

Now we have the results with the two strategies for the benchmark system we started from. If we look at the results, we can see that when working with the similarity score the results don't converge and the different replicates show different trends, but when working with LOMAP we have a good trend and the results converge. So the way you pair the ligands is crucial to getting good results in the end.

The following step is the hybrid topology. The two common ways to build hybrid topologies for alchemical calculations are single and dual topology. Here we have an example that converts benzene to benzyl alcohol. On the left we have the single topology, which maps one atom type onto another, and dummy atoms are used where there is no match between the two structures. On the right side we have the dual topology, which does not map one species onto another but only switches between dummy atoms and interacting species. What we used to build the hybrid topologies was pmx, which worked very well. I know that you had a lecture with Vytautas, and I'm sure he explained this in more detail and much better.

Following the workflow, we come to the MD simulations. Here are the steps we use: first an energy minimization, then 10 picoseconds in the NVT ensemble, then the equilibrium molecular dynamics simulations, for which we tried different lengths: one, two, three and six nanoseconds. Then we select the snapshots, and finally we have the non-equilibrium MD simulations of 15 picoseconds. We also tried different numbers of non-equilibrium runs: 16, 30, 40 and 80. The snapshots were extracted equidistantly from the trajectory, and the first part of the equilibrium trajectory from which you extract the snapshots is
discarded, to be sure that you are not taking snapshots from before the system is equilibrated, because that would introduce error into the calculation.

If we look at the results, here we have some examples changing the equilibrium simulation length and the number of non-equilibrium runs. If we look at this one, six nanoseconds and 80 transitions, we can see that we have a good overlap between the forward and the reverse distributions, and we also have a good uncertainty of 0.72 kilojoules per mole, which is the best of the four. If we look at these two results, with three nanoseconds and two nanoseconds, we see that we still have some overlap and a similar uncertainty of around 1.25 kilojoules per mole, which is good enough. And if we look at the one with one nanosecond and 16 non-equilibrium runs, we see that we don't have an overlap and the uncertainty is very high. So we decided to keep working with two nanoseconds and 30 non-equilibrium transitions, because it is the shortest setup whose results are good enough.

Then what we have to do is analyze the results. To analyze the results we use BAR and Crooks, and we selected the transitions and the paths with some scores we have created.

When I was talking about the edge selection, I said that it was important to connect the ligands in cycles. That is because, when you want to calculate the ΔG or ΔΔG for one ligand, if the ligands are connected in cycles you can choose the connection that gives you less error and better results. For example, here, if you want to calculate the ΔG for ligand five, we have these different paths, and based on these scores we choose the best one.

We also created two different strategies to deal with the different replicates. The first one
is what I call replicates average: we calculate the average of the three replicates and then use the score to select the best path to connect the ligands. The other strategy is the individual best: here we use the score to select the best replicate out of the three, and then we use the score again to connect the ligands and select the best path.

Now we can look at the results for the JNK1 system. This one is with the replicates average and that one is the individual best. As you can see, the individual best is much better than the replicates average. This can be explained because, when you take the average, you may be introducing a lot of error. For example, if we have one transition that is very good but we average it with two others that perform poorly, we end up with worse results. On the other hand, if we have a score to select, out of the three replicates, the transitions with the best performance, then, as you can see here, it improves the results quite a lot. You can also see that the results are quite similar to the ones from the literature.

So, to start summing up, the final workflow looks like this: we do the ligand parameterization with the General Amber Force Field, we select the edges with LOMAP, we create the hybrid topologies using pmx, we run the simulations with GROMACS, and then we analyze the results. This is the workflow from the paper and that is ours. We managed to reduce the simulation time from 240 nanoseconds to 14 nanoseconds for each ligand pair, and we are able to calculate the binding free energy with GROMACS and pmx much faster than with the original workflow we started from.

If we look at the results, here we have JNK1, the one I have been talking about, then here we have another system from
the same benchmark, and here we have an industrial set. As you can see, for all of them we have good correlations and good error estimation, so it seems that with this shorter workflow we can obtain good correlations and good results.

Finally, I would like to thank the organizers, and thank you also to all of you for your attention. If you have any questions, I will be happy to answer. Thank you.

Sorry, I couldn't unmute my mic. Thank you for your presentation. Just let me check if there are any questions. No questions from the audience.

Yeah, no questions, unfortunately.

Okay. Well, thank you again for an interesting presentation. I think that with this we conclude the second morning session on success stories and showcasing of using HPC for biomolecular and applied research in Central and Eastern Europe. Now we have a lunch break, and we will meet again in the next session at two o'clock Central European Summer Time. Thanks again.

Thank you.
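As a companion to the talk, the cycle arithmetic it describes (ΔΔG as ΔG in the protein minus ΔG in water, then estimating each ligand's ΔG from one experimental reference along the best path through the ligand network) can be sketched as below. This is only an illustration under stated assumptions: the talk does not define its path score, so the sketch substitutes the summed squared edge uncertainty, and the ligand names and all numbers are made up.

```python
import heapq
import math


def ddg(dg_protein, dg_water):
    """Relative binding free energy for one edge A -> B (kJ/mol)."""
    return dg_protein - dg_water


def estimate_dg(edges, ref, ref_dg):
    """Propagate a known dG from `ref` along the lowest-variance paths.

    edges: {(a, b): (ddg_ab, err_ab)}, meaning dG_b - dG_a = ddg_ab.
    Path score (an assumption, not the speaker's score): the sum of
    squared edge uncertainties, minimised with Dijkstra's algorithm.
    Returns {ligand: (dG estimate, accumulated uncertainty)}.
    """
    graph = {}
    for (a, b), (d, e) in edges.items():
        graph.setdefault(a, []).append((b, d, e))
        graph.setdefault(b, []).append((a, -d, e))  # reverse edge flips sign
    best = {}
    heap = [(0.0, ref, ref_dg)]
    while heap:
        var, node, dg = heapq.heappop(heap)
        if node in best:
            continue  # already reached through a lower-variance path
        best[node] = (dg, math.sqrt(var))
        for nxt, d, e in graph.get(node, []):
            if nxt not in best:
                heapq.heappush(heap, (var + e * e, nxt, dg + d))
    return best


# Hypothetical three-ligand cycle; every value is illustrative only.
edges = {
    ("L1", "L2"): (ddg(-12.0, -10.0), 0.5),
    ("L2", "L3"): (1.5, 2.0),
    ("L1", "L3"): (-1.0, 0.7),
}
result = estimate_dg(edges, ref="L1", ref_dg=-40.0)
```

In this toy cycle, L3 can be reached directly from L1 or through L2. The direct edge has the smaller accumulated uncertainty, so it is the path that gets used, which mirrors the reason the speaker gives for connecting ligands in cycles.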