So, yeah, thank you, and welcome to my talk, where I'm going to present new benchmarking results on protein-ligand binding free energies with the Parsley parameters. I'm starting with a quite busy slide showing the increasing interest in binding free energy calculations for drug design purposes at different companies; you all know the names Merck, Roche, Bayer and Boehringer Ingelheim. There is also an increasing number of publications over the years, and that's why we are looking into calculating binding free energies. I'm in a perfect position for this, as a postdoc employed by Janssen but working for the Open Force Field Initiative, and my goals are to evaluate the new Open Force Field parameters in retrospective drug design projects and to generate an open protein-ligand benchmark data set. Benchmarking is part of the OpenFF infrastructure, so what I do here is create a data set which provides input structures for benchmarking, run the calculations with the Open Force Field parameters, and analyze the results; these results are then compared to the free energies deposited in the benchmark set. Finally, I want to validate the parameters and at some point give feedback about the shortcomings and successes of those Open Force Field parameters. All my work is public: the benchmark set, the PMX workflow, where I'm working with people from the MPI in Göttingen, and the free energy analysis framework, which should allow a consistent analysis of free energy calculations that is comparable across different methods and parameters; this last part is together with Hannah Bruce Macdonald from the Chodera lab.
When testing the parameters, we have to be aware that testing only the parameters is quite difficult, because there are challenges in every part of the workflow: next to the parameters, there can be errors or quality flaws in the input, then we need to set up the simulations correctly and sample long enough, and finally the presentation and interpretation of the results need to be consistent and correct. For the benchmark set we have specific requirements. It should be a benchmark set that is used by the whole community, and it should really enable us to compare and measure improvements in methods and parameters. Therefore we need a publicly accessible data set. It should be sustainable, so that it is still available in ten years, for example, and not hidden away on a private repository or at risk of being deleted. We want consistent, high-quality data which is clean, interesting to use and, of course, documented, and good demonstrations and examples should also be available. That's where I am right now: we have a publicly accessible benchmark data set, which should be sustainable because it's part of the OpenFF infrastructure, which should last longer than my postdoc. I'm trying to include only consistent, high-quality data, for example only primary data, with conversions done on the fly. I'm working on an easy-to-use interface to retrieve and analyze the data, and finally there are examples and demonstrations in Jupyter notebooks. Currently we are at stage one, so we are including published calculations, starting with the Schrödinger JACS data set and extending it with a couple of targets from Janssen. The input data is altered as little as possible, to make it possible to compare the results with the previously published calculations.
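To make the "primary data with conversions on the fly" idea concrete, here is a minimal sketch. The temperature, the use of Ki values, and the function name are my assumptions for illustration, not details from the talk: the idea is that the set stores the measured affinity and derives the binding free energy only when requested.

```python
import math

# Sketch of deriving dG from stored primary data (a Ki in molar) on the fly,
# instead of depositing precomputed free energies.
# Assumptions for illustration: Ki measurements and T = 298.15 K.
R = 1.987204e-3   # gas constant in kcal/(mol K)
T = 298.15        # assumed temperature in K

def dg_from_ki(ki_molar: float) -> float:
    """Binding free energy dG = RT ln(Ki), in kcal/mol."""
    return R * T * math.log(ki_molar)

print(round(dg_from_ki(1e-9), 2))  # a 1 nM binder -> about -12.28 kcal/mol
```

Keeping only the measured Ki and converting when needed avoids baking a temperature or unit choice into the deposited data.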
At a later stage we want to refine and categorize the data, meaning to improve the input structures where necessary and to categorize the data sets, for example by quality or by their challenging aspects. As a last stage, in the long term, we want to include meaningful new data sets, for example new targets or target classes, new chemical space, or data sets built around specific challenges. As an example of the easy-to-use part, I have here a Jupyter notebook which shows how easily you can retrieve the perturbations for one target, in this case thrombin; you get a data frame which you can visualize and filter within the notebook. Now, I have used this data set to rerun free energy calculations with the new Open Force Field parameters, Parsley 1.0.0. I have run calculations on 14 targets, with about 300 ligands in total and 500 perturbations, or transformations; this is approximately 1000 days of calculations on a single GPU, for which I used the Janssen AWS servers. The first results revealed that we agree with experiment quite well in many cases, with an RMSE lower than 1 kcal/mol, sometimes a bit higher, and also quite nice correlations. For the gray targets here, some edges have not been run yet, so those data are not yet complete. We can also look at other metrics to assess the performance. Especially in view of drug design, we are usually not interested in the actual absolute delta G values but in the order of the values, that is, the prioritization of the best-binding compounds for synthesis. I made this small experiment: if we take the five best ligands based on the calculation, how many of them are within the five best of the experimental results? You see that for many targets at least three of the calculated hits, so to say, are also among the best experimental results.
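This top-five prioritization check is easy to sketch in a few lines of Python. The numbers below are invented, and the `topN_overlap` helper is mine, not part of the actual benchmark interface; it just illustrates the kind of comparison described above:

```python
import pandas as pd

# Hypothetical absolute binding free energies (kcal/mol) for one target.
# The real benchmark set ships its own retrieval interface; this only
# illustrates the top-5 prioritization check with made-up numbers.
data = pd.DataFrame({
    "ligand":  [f"lig{i}" for i in range(1, 9)],
    "exp_dG":  [-9.1, -8.7, -8.5, -8.0, -7.6, -7.2, -6.9, -6.5],
    "calc_dG": [-8.8, -7.9, -8.9, -7.1, -8.2, -7.5, -6.4, -6.8],
})

def topN_overlap(df: pd.DataFrame, n: int = 5) -> int:
    """Count how many of the n best-ranked calculated ligands are also
    among the n best experimental ligands (lower dG = stronger binder)."""
    top_exp = set(df.nsmallest(n, "exp_dG")["ligand"])
    top_calc = set(df.nsmallest(n, "calc_dG")["ligand"])
    return len(top_exp & top_calc)

print(topN_overlap(data))  # 4 of the 5 predicted best are truly in the top 5
```

A score of n out of n would mean the calculation reproduces the experimental ranking of the best binders, which is usually what matters for compound prioritization, even if the absolute delta G values are off.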
Now let's look a bit more into detail. Here I'm showing the distribution of deviations from the experimental data for the perturbations, so not the binding free energies anymore but the actual perturbations, or simulations, that were run. Here, too, we have quite encouraging results: the majority of perturbations have a deviation from experiment of less than 1 kcal/mol, and the standard deviation is around 1.4 kcal/mol. It's also interesting to see that most of the outliers come from a few targets, c-Met, MCL1 and PTP1B, which could hint at an unsuitable setup or just difficult molecules which are not well parameterized. One problem, as I said before, is that it's not only the accuracy of the parameters; we sometimes also have sampling issues. So I wanted to see whether we can filter out the non-converged simulations, and I'm using a convergence metric here to divide the data set into converged and non-converged simulations. You see that you get a slight improvement in the standard deviation for the converged simulations and a much wider distribution for the non-converged ones. But you also see that among the non-converged simulations there are perturbations which agree quite well with experiment; the method can probably still give reliable results even without complete sampling of the phase space. I also tried other metrics to evaluate the convergence, and they gave similar results. So let's look at the outliers. Of the top 10 outliers, 8 are from the c-Met set, again a hint at unsuitable initial structures. The top outlier is this perturbation, which at first view doesn't look too complicated: it's just a perturbation from a sulfur to a nitrogen, with the creation of a hydrogen here, yet we get a deviation of about 8 kcal/mol from experiment. It's interesting to note that we also see a deviation in the calculations done with the GAFF parameters. I looked at the input
structure, and what I saw is that the two methyl groups of the two compounds point in different directions: in one ligand it points to the right, in the other to the left. This is of course a problem; it means the atom mapping cannot be done efficiently and the simulations cannot converge. I have now rerun the simulations with different starting structures, but there is also another issue here: we can have a different tautomer of the heterocycle with the two nitrogens. (I just see that this should be an N, not an NH, here.) We have to be careful when picking input structures, because we can not only have different rotamers but also different tautomers, and in the case of the green starting structure it would probably be better to use the other tautomer. Other outliers are these edges: one from the MCL1 data set, where we could again have a rotamer problem, since it's not clear which of the two methyl groups should be deleted during the run; and this oxygen-to-hydrogen perturbation, which could be difficult if it sits inside a narrow binding pocket. The same holds here in Jnk1, where two bromine atoms are created, which are quite bulky; this could again be problematic in narrow pockets. Now we also want to have a look at the agreement with the previously published calculations. Here I compare the results with GAFF 2.1 calculations, and in general we have much better agreement with the other calculated data than with experiment: nice correlations, and a deviation distribution with a standard deviation of 1.1 kcal/mol. If we now compare the different sets of calculated results, OPLS3e run with FEP+, and GAFF 2.1 and Parsley both run with the PMX non-equilibrium method, we see that the outliers are quite specific to the different force fields; there's only
one third of all outliers, those with a deviation of more than 2 kcal/mol from experiment, that is shared between the different force fields, so the outliers are quite force-field specific. On the other hand, if you look at the successes, with a deviation of less than 1 kcal/mol, a lot of perturbations are shared between the different calculation sets. About these 24 perturbations down here we can be very happy, because they have the lowest deviation only with the Parsley parameters; and in future work we should of course worry about these 30 perturbations and try to see whether something is wrong in the parameters or whether we can improve the parameters in the future.

To summarize: based on the specific requirements, a first version of the benchmark data set was created; the first results are encouraging; and some outliers can be identified as being either not converged or unsuitably prepared. The ongoing work is to further disentangle method and preparation differences from force field differences, and in the long term we want to continuously add new targets to the data set, run more and more calculations, also with new versions of the Open Force Field force fields, and improve the input structures based on best practices.

With this I want to thank all the collaborators: the whole Open Force Field community, especially my supervisors David and John, our software scientist Jeff, and Hannah from the Chodera lab; my supervisor at Janssen, Gary, my colleague Laura, our team leader Herman, and the whole group at Janssen. Finally, I had great discussions with Chris and Gaetano from OpenEye, and I'm working on the PMX workflow together with Vytautas Gapsys. With this I want to thank you for your attention, and I'm open to any questions.

Thanks, David, I thought that was a really wonderful talk. I have one burning question, which is that in
the distribution of the delta delta Gs, they all seem to be skewed positive; do you have any idea why that is?

I hadn't noticed this skew; let me go back. You mean here?

Yeah, so if I look at the fitted symmetric curve, it looks like there's more unoccupied space under it on the left than on the right.

I have no explanation for this; I haven't thought about it yet.

It's interesting. It almost makes me think that maybe it's a sampling problem, that these transitions from a low-energy state to a high-energy state maybe haven't converged, though I don't know. We can talk about this offline; I just thought that was a really interesting trend.

Yeah, thanks for this observation; I will also look into this.

Any more questions? Are there places where you would especially appreciate help from the community?

Yeah, I would say in developing the benchmark data sets, and especially the programmatic interface; there I could use some help to push this forward. Otherwise, there are also a lot of methodological challenges, for example enhancing the sampling in the equilibrium runs to generate a representative ensemble of starting structures for the non-equilibrium runs. There are a lot of interesting science questions which I'm happy to work on with other people. Thanks.