A bit of background for anyone who doesn't know me: I'm a postdoc in Danny Cole's group at Newcastle, and before that I was a postdoc with the Open Force Field Initiative, where I worked on bespoke fitting. Today, though, I'm not talking about bespoke fitting; I'm talking about the functional form exploration we've been doing with the Open Force Field technology stack. I've highlighted everyone involved here, because it wouldn't be possible without them. So, a bit of motivation: why bother with functional form exploration when we've got the Lennard-Jones 12-6 potential? It's really good; we've seen this week how well it's doing, and it's very fast. I'm highlighting here the TYK2 set from the protein-ligand benchmark series, where Open Force Field Parsley 1.3 actually performed at the state of the art, with an RMSE of 0.7 kcal/mol in our free energy benchmarks. We like the Lennard-Jones functional form because it's very fast to compute: the dispersion term is just one over r to the sixth, and squaring that gives the repulsive one over r to the twelfth. However, it's not actually physical. We know there are better, more realistic ways to model this; the 12-6 form comes from a time when we were computationally limited, and any functional form becomes cheaper with advances in hardware, so we should probably investigate the alternatives. Traditionally, fitting a new force field has been very, very difficult, but I want to highlight how easy it is now using the Open Force Field infrastructure. Fitting a force field doesn't take tens of PhD students and postdocs any more; it can take just one person, and at Open Force Field it often is one person setting up these calculations and pushing the button. The infrastructure does a really good job of automating all of this.
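As a concrete reference point, the 12-6 form just described is only a few lines; the epsilon and sigma values below are purely illustrative, not taken from any force field:

```python
def lj(r, epsilon, sigma):
    """12-6 Lennard-Jones: the 1/r^12 repulsion is just the square of the
    1/r^6 dispersion term, which is what makes it so cheap to evaluate."""
    x6 = (sigma / r) ** 6          # dispersion term, (sigma/r)^6
    return 4.0 * epsilon * (x6 * x6 - x6)

# The well sits at r = 2^(1/6) * sigma with depth -epsilon.
eps, sig = 0.5, 3.2                # illustrative values
r_star = 2.0 ** (1.0 / 6.0) * sig
print(round(lj(r_star, eps, sig), 9))  # -0.5
```

Note the potential diverges as r goes to zero, which is the hard wall that soft-core schemes in free energy calculations have to work around.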
This functional form exploration wouldn't be possible without all the hard work they've done automating this. So here's a very brief overview of the Open Force Field infrastructure as it currently stands. We pull data sets from ThermoML; that's where we get our physical properties. We've done a lot of work creating QM data sets for valence fitting, which are all stored on QCArchive for anyone to access, and from those we create curated data sets that people can use to fit their own force fields; we publish, alongside our force fields like Sage and Parsley, exactly how we fit them. This all goes through ForceBalance, which does the actual fitting of the force fields, and we've created tools like the OpenFF Evaluator and the OpenFF Toolkit, which handle translating our force field into systems we can run in OpenMM and then refining the parameters. The idea is to make systematic, incremental improvements to our force fields using this infrastructure, which is very easy to do: you come up with a hypothesis about how to improve the force field, and then we fit it and produce better force fields. I'm also highlighting some of the improvements in the fitting data we use. We can now use mixture properties, which I think was highlighted for the Sage fitting, where we can fit to data that includes water, so the force field is aware of which water model it was fit against, which is really good. Using this infrastructure, what we wanted to work out was: what are the missing pieces needed to fit a new functional form? What's missing from this stack? What we found is that we just need a way to express our functional form in code and translate it into an OpenMM system, which we can then fit to physical or valence properties. What we came up with was SMIRNOFF plugins.
This is just an extension to the OpenFF Toolkit, if anyone's used that before. It reads parameters that the toolkit doesn't otherwise know how to interpret, extending the toolkit's knowledge of different functional forms. We define our functional form in code, and SMIRNOFF plugins automatically handles a lot of the hard work we don't really want to think about when exploring new functional forms: it automatically adds 1-4 interactions with the right scaling, it supports virtual sites, and it has a long-range correction, all built on top of OpenMM. We can do free energy calculations with it, it's easy to install because it's all on conda-forge, and it's integrated with the entire Open Force Field stack because the toolkit has a really nice plugin architecture. It's really easy to add new functional forms: we just supply the expression for the functional form and SMIRNOFF plugins does everything else. And we use the familiar toolkit API, which I'm showing on the right-hand side, to define the force field form, add any parameters, and set the force field up for fitting. Now that we had the infrastructure to fit new functional forms, we had to decide which ones we were interested in. Looking through the literature, there are a lot of new functional forms out there, and people tend to develop them for water; there's a lot of experimental data there, so there's plenty of motivation to produce new functional forms and test them. What we've implemented so far are two new functional forms in SMIRNOFF plugins that people can use right away. The first is the damped Buckingham 6-8 potential, which is physically motivated, with an exponential repulsion term, a higher-order expansion in the dispersion term, and a damping function.
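A minimal sketch of such a damped 6-8 form, assuming a standard Tang-Toennies-style damping function and made-up constants; the published model's exact parameterization may differ:

```python
import math

def damping(n, x):
    """Tang-Toennies-style damping: f_n(x) = 1 - exp(-x) * sum_{k=0}^{n} x^k/k!.
    It goes to 0 as x -> 0, taming the 1/r^n singularity, and to 1 at large x."""
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k)
                                    for k in range(n + 1))

def buckingham_68(r, a, b, c6, c8):
    """Damped Buckingham 6-8: exponential repulsion plus damped r^-6 and r^-8
    dispersion. The extra exponential and the factorial sums in the damping
    make this costlier per evaluation than 12-6 Lennard-Jones."""
    x = b * r
    return (a * math.exp(-x)
            - damping(6, x) * c6 / r ** 6
            - damping(8, x) * c8 / r ** 8)
```

With illustrative constants such as a = 5.0e4, b = 3.6, c6 = 600, c8 = 1200, this gives a repulsive wall at short range and a shallow attractive well near r of about 3.5.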
We also found this quite interesting and quite different double-exponential potential, where two exponentials give the repulsion and attraction terms, and we decided to implement both in SMIRNOFF plugins. We liked both of these forms. The double exponential, which I'll start with, has a natural soft core: it stays finite as the two particles approach each other, as a function of the distance here. It also has links to the Lennard-Jones functional form: epsilon has the same meaning as in Lennard-Jones, the well depth, and Rmin has the same meaning too, the distance at which the potential reaches its minimum as the particles approach each other. The alpha and beta terms describe the steepness of the repulsion and the decay of the attraction; these are global parameters that we can fit to help it mimic other potentials, and it's actually cheaper to compute, which is quite nice. And as I said, the natural soft core makes it attractive for free energy calculations as well. The damped Buckingham 6-8 potential is, as I was saying, physically motivated, but it needs a damping function, which is quite expensive to compute when we're running simulations. So what we looked at was: can we make the double-exponential potential match this more physically motivated potential? And we found that we could. On the right-hand side, we're showing interaction energies as a function of the oxygen-oxygen distance in different water models. The blue and orange lines at the top are TIP3P and TIP4P, and they overlap pretty well; I'm only showing the van der Waals component of the energy here, not the electrostatics. In green, we've got TIP4P-FB, which is probably the most accurate fixed-charge four-point water model that we have.
In the red and purple lines, I'm showing the B68 model published by the Riley group, and our double-exponential curve fit to that model. All we've done here is plot the interaction energies as a function of distance and then curve-fit the double exponential to that model, and we see that we can reproduce it very well. So we were interested in how this translates to physical properties once we get the potential right, and what we found is that after a little bit of refitting we can produce quite a competitive water model. Here I'm showing six different water properties over a range of temperatures. In blue we've got the experimental data; in orange, our double-exponential model fit to this data; in green, the TIP4P-FB model; and in red, a double-exponential TIP3P model that we pulled from the literature. After a few iterations of ForceBalance fitting to this experimental data, our water model is actually very competitive with state-of-the-art water models, which was really promising: the double-exponential potential seems like a good way to go. So then we thought: can we create a general, transferable double-exponential model? The plan was to transition Sage to the double-exponential functional form. We decided to keep the Sage SMIRKS types; we didn't want to introduce any new parameters. We kept epsilon exactly the same, converted sigma to Rmin, and kept alpha and beta from our optimized water model as global parameters applied to all of the interactions. We then fit to the Sage physical property training data, which is about 1,000 physical properties. We also wanted to let the water relax, so we did a co-optimization, which I think has been hinted at previously.
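To make the curve-fitting idea concrete, here is a self-contained sketch: the double-exponential form fit to a synthetic damped 6-8 target by a coarse grid search over the global alpha and beta. All constants are made up for illustration, and the actual work would use a proper least-squares optimizer against the published B68 parameters:

```python
import math

def tt(n, x):
    """Tang-Toennies-style damping: 1 - exp(-x) * sum_{k<=n} x^k / k!."""
    return 1.0 - math.exp(-x) * sum(x ** k / math.factorial(k)
                                    for k in range(n + 1))

def b68_target(r):
    """Synthetic damped Buckingham 6-8 target with made-up constants."""
    a, b, c6, c8 = 5.0e4, 3.6, 600.0, 1200.0
    return (a * math.exp(-b * r)
            - tt(6, b * r) * c6 / r ** 6
            - tt(8, b * r) * c8 / r ** 8)

def dexp(r, eps, rm, alpha, beta):
    """Double-exponential potential: well depth eps at rm, finite at r = 0."""
    t = 1.0 - r / rm
    return eps / (alpha - beta) * (beta * math.exp(alpha * t)
                                   - alpha * math.exp(beta * t))

# Fix eps and rm from the target's minimum, then grid-search alpha/beta.
rs = [2.5 + 0.01 * i for i in range(300)]
rm = min(rs, key=b68_target)
eps = -b68_target(rm)
fit_pts = [(r, b68_target(r)) for r in rs if r >= 0.95 * rm]  # skip the wall

def sse(alpha, beta):
    return sum((dexp(r, eps, rm, alpha, beta) - u) ** 2 for r, u in fit_pts)

alpha_best, beta_best = min(
    ((i / 10.0, j / 10.0)
     for i in range(100, 251, 5)      # alpha: 10.0 .. 25.0
     for j in range(30, 81, 5)),      # beta:   3.0 ..  8.0
    key=lambda ab: sse(*ab))
```

By construction the fit shares the target's well depth and minimum position, and the grid search only tunes the repulsion steepness and attraction decay, mirroring how alpha and beta act as global parameters.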
We included some pure water densities over a range of temperatures, just to regularize the water model so it didn't stray too far from the really good model we'd produced. What we found is that the double exponential is really competitive with Sage on the training set; I must highlight that this is all on the training data. What I've done here is break the training data down by pure density, binary density, and enthalpy of mixing, and what we see in the RMSE is that the double exponential is very competitive with Sage and shows some slight improvements, especially in enthalpy of mixing, which is really promising. We basically started from Sage and optimized from there. On the right-hand side, I'm showing the movement of the water potential: again, we're plotting the interaction energy as a function of the oxygen-oxygen distance. In blue is the published B68 model; in orange is where we started from, which was our water model previously optimized to pure water properties alone; and in green is how much it changes when fit to the mixture data with a regularization on pure density. When we dig further into the training data, we see that a lot of the improvement is due to these water mixture properties: allowing the water to co-optimize with our force field has helped it quite a lot. We correct an issue with the enthalpy of mixing and get much better agreement on the water properties, and that seems to be where most of the improvement is coming from, which is really promising; hopefully that's something that future force field fits can pick up as well. We then did a valence fit of the force field to complete it, following the normal Open Force Field pipeline for producing a general transferable force field.
We looked at fragments of the protein-ligand benchmark set, exactly the same set we previously looked at with bespoke fits, and we measured similar things. Here we're measuring the maximum RMSD: for each point on the QM torsion profile we do an MM relaxation, and we measure the RMSD between the QM and MM structures. We also measure the RMSE in the energy profiles. What we found is very similar performance to Sage: cases where it does maybe a little bit worse, and cases where it actually improves things. In the bottom panel, I'm highlighting a sterically congested molecule where we thought the double exponential might help, given its more realistic repulsion term, and we do see some improvements there. Next we wanted to benchmark the force field. Traditionally people would do protein-ligand binding free energy calculations, but we don't have a protein force field for the double exponential, and we don't have a way of fitting one yet either. So we thought: how about hydration free energy calculations? But we didn't have a framework to do those, so Simon very quickly put together absolv, a framework anyone can use to do hydration or solvation free energy calculations with any potential: a Lennard-Jones soft core, or a new functional form that you come up with. Our next question was how to scale it. What we'd like is a nice linear lambda schedule for scaling our double-exponential potential, but what we found is that when you plot the interaction as a function of lambda, the potential just disappears near zero: even between lambda 0 and 0.05 there's quite a big difference in the potential energy surface, and this would probably lead to very poor phase space overlap when we're doing our free energy calculations.
So we came up with a lambda schedule where we do a linear scaling of the function but also scale alpha and beta towards each other, which has the effect of flattening out the potential energy surface quite nicely and gives good phase space overlap, which I'm showing here for a calculation on ethanol. We found that this schedule gives very good phase space overlap and free energies comparable to the soft-core potential, so we seem to be doing things correctly; it's maybe not optimal, but it does work. Then we tried to come up with a test set, and we thought transfer free energies were a good way to go, with some inspiration from previous papers that Open Force Field has looked at. With a transfer free energy we can measure moving a molecule from a high-dielectric medium like water to a low-dielectric one, which is a kind of surrogate for our protein-ligand benchmarks, since we don't have a protein force field yet. So that's what we decided to go with: we measure the hydration free energy and the non-aqueous solvation free energy, and combine them, the transfer free energy being the difference between the two. We had 72 solutes from FreeSolv that also had data points in the Minnesota Solvation Database, giving 284 transfer free energies. I'll start with the hydration free energies. The accuracy is quite low compared to Sage, which at first was surprising: we did really well on the water mixture properties, so it was a bit surprising to be quite a lot worse than Sage for hydration free energies, and there seems to be a systematic error in our predictions. If we look at the non-aqueous solvation free energies, again we're quite a lot worse than Sage, and again there's a systematic offset in our predictions.
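The lambda schedule described above can be sketched as follows, assuming one reasonable interpolation: scale the prefactor linearly while pulling alpha and beta towards their mean. The exact scheme used in the talk may differ:

```python
import math

def dexp(r, eps, rm, alpha, beta):
    """Double-exponential potential; eps is the well depth, rm the minimum."""
    t = 1.0 - r / rm
    return eps / (alpha - beta) * (beta * math.exp(alpha * t)
                                   - alpha * math.exp(beta * t))

def dexp_scaled(r, lam, eps, rm, alpha, beta):
    """Alchemical scaling sketch: linear prefactor in lambda, with alpha and
    beta interpolated towards their mean so the potential flattens smoothly
    instead of vanishing abruptly near lambda = 0."""
    if lam == 0.0:
        return 0.0  # fully decoupled
    mid = 0.5 * (alpha + beta)
    a_l = mid + lam * (alpha - mid)   # repulsion steepness softens
    b_l = mid + lam * (beta - mid)    # attraction decay stiffens
    return lam * dexp(r, eps, rm, a_l, b_l)
```

At lambda = 1 this recovers the full potential, the well depth at rm scales as lambda times eps, and the steep repulsive wall softens with lambda because the effective alpha shrinks, which is the flattening that improves phase space overlap between neighbouring windows.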
But when we get to the transfer free energies, things start to change, and we're actually significantly better than Sage. The systematic offsets in our predictions seem to cancel out, so we get quite a good prediction of the transfer free energy, which is hopefully very promising for things like protein-ligand binding free energies. So hopefully I've convinced you that we can fit new functional forms using the Open Force Field software and that it's very easy to do now. Anyone can install this through conda, make their own functional form, and train it; the software stack makes this very, very easy. I'm excited to see what other functional forms we're going to investigate, what we can fit, and how far we can take this. I should say as well that this is all built on OpenMM, and the GPU support for custom forces actually minimizes the loss in simulation speed. For the hydration free energy calculation of ethanol, we get 185 nanoseconds per day on a single GPU, compared to the 255 nanoseconds per day we were getting with the Lennard-Jones soft core. For free energy calculations you see a slowdown with Lennard-Jones too, because it has to go through the same custom-force machinery when you use the soft core, so the further slowdown with the double exponential is fairly modest. And for anyone interested in using the double exponential, you can conda-install it and use it right now through a package called de-forcefields; install that and you can load it through the toolkit and run it. For future work, we want to look at QM-derived starting points. One thought we had is that we're stuck in the minimum of the Lennard-Jones potential; we haven't actually escaped it, because the double exponential is good at mimicking other potentials, so maybe it's just reproducing the Lennard-Jones potential.
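For reference, installation looks something like this; these are my best guesses at the conda-forge package names for the projects mentioned, so check the project READMEs before relying on them (config fragment, not verified here):

```shell
# Assumed conda-forge package names for the projects mentioned in the talk
conda install -c conda-forge smirnoff-plugins   # functional-form plugin framework
conda install -c conda-forge de-forcefields     # double-exponential force fields
conda install -c conda-forge absolv             # solvation free energy framework
```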
We also want to find more physical property data to fit to, and we're obviously really interested in the progress of the protein force field, because we'd like to follow that approach and produce a double-exponential version of it as well. Then we can do protein-ligand binding free energies and confirm that our force field is worth investigating further. So I hope we've convinced you that we can fit new functional forms and that it's very easy to do using the Open Force Field software. I just want to thank everyone, especially the force field team; as I say, this is all built on their work, and it wouldn't be possible without everything they've done. And my funders as well. Thanks very much, everyone.