I'm Danny Cole, from Newcastle University. I just wanted to emphasise that the funding in my group doesn't come directly from Open Force Field or any of the industry partners. I'm funded by UK Research and Innovation, but as I hope you'll see during this talk, the interactions that we've had with Open Force Field over the last three years or so have been really valuable for us, in terms of scaling up what we can do with our methods to larger data sets and improving our own software skills and mindsets. And hopefully it's been a bit useful for Open Force Field as well, in terms of bringing some of the ideas that we have in our group into your own fitting efforts. So we've already had a talk from Josh Horton this morning, and pretty much all of the work that I'll show in this talk is work done by him, in collaboration with many of you and other members of my group, but he's let me share it with you today. So just a brief introduction to the sort of work that we're interested in doing in my group. As you'll all be aware, molecular interactions and dynamics would ideally be described by quantum mechanics, but of course this is too expensive for routine use. So most of what we do can be broken down into these goals. Firstly, developing better approximations to quantum mechanical modelling, usually in the form of molecular mechanics force fields. Secondly, as we're increasingly trying to do with advice from a lot of you, producing software to automate this process and getting it out so that people can use it. And it was really gratifying to see talks from Cresset and Exscientia in the last few days, who have started to use the software that I'll talk about. And as well as that, there's an emphasis on collecting and analysing data so that we can all build on each other's work. And the sort of application that we have in mind, like many of you, is to try to deliver more accurate predictions for computer-aided drug design.
So with that in mind, I'll start by talking about a particular software project that's been mentioned a few times in the workshop so far, and that's the Open Force Field BespokeFit package developed by Josh and others. As we all know, the accurate determination of molecular conformation is absolutely crucial, particularly in structure-based drug design. And in particular, the conformation is largely determined by torsional rotation about flexible bonds, such as the one that's rotating in the little movie there. And unlike bond and angle parameters, transferability is really difficult with torsion parameters, particularly because they tend to be very sensitive to their surrounding environment. So what tends to happen with force field design, as we've seen for the OPLS3 versions here, is that the torsion parameter library can sort of explode as you try to describe more and more of chemical space. Whereas, of course, we all know that due to clever chemical perception, the Open Force Field library is much smaller. But that can still lead to deficiencies in the description of torsional scans, and that's exemplified by the graph in the bottom right here. So we've got a QM plot of a torsion scan for this molecule in blue. And if we do that same scan with either the Open Force Field in red or GAFF in green, we get both the wrong minimum position and the wrong relative energies of neighbouring minima. So the Open Force Field BespokeFit package aims to fix this problem by fitting bespoke torsion parameters for the molecule under study. And we've seen in the previous talks how that is done. We want to provide this robust, molecule-specific parameterisation workflow. So the molecule comes in on the left-hand side of this flow chart. The flexible torsions are automatically selected, and we use Wiberg bond orders to fragment the molecule into chemically sensible fragments.
These are as small as possible whilst retaining the chemical environment of the torsion. We have these SMIRKS patterns, which uniquely describe the chemical fragment under study. And once we have the fragment, we can collect the reference data, which is usually a quantum mechanical torsion scan of the torsion under study, and use ForceBalance to optimise the MM parameters to match the QM. And so on the graph on the bottom right, you'll see what happens. The BespokeFit curve is now matching pretty much exactly the quantum mechanical curve, as you might expect. And again, fairly uniquely, as we haven't seen it before in other packages, this is very well interfaced with data storage archives like QCArchive. But for those of you in the pharmaceutical industry, that can also be a private repo. And we can use data repositories like this to generate parameters at scale, and we've done that in the JCIM paper that's linked at the bottom there. So I showed you one example of improving the description of the quantum potential energy surface, but just to show you some more statistics on that. If we look at the Open Force Field Parsley force field, for example, at the top here, what I show are two measures of how well we're replicating the quantum mechanical potential energy surface. Don't worry too much about what these mean, but the RMSD is basically how close the MM structures are to the QM structures over that scan, and the RMSE gives a measure of the match to the potential energy surface. So this is quite typical for transferable force fields on top. And at the bottom, the BespokeFit curves are improving both the geometry and the energetics of the scans. And in addition to that, through the interface with this very nice QCEngine software, we have access to a whole range of other reference data generation options. So not only can we change the level of QM theory, we can use completely different reference data.
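As a rough illustration of the fitting step described above, the sketch below fits a cosine-series torsion term to a reference energy profile by linear least squares and reports the energy RMSE. It is not what ForceBalance or BespokeFit do internally; the function names, the four-term series with phases fixed at zero, and the synthetic "reference" profile standing in for a QM scan are all assumptions made for the example.

```python
import numpy as np


def torsion_basis(phi, n_terms=4):
    """Design matrix with columns 1, cos(phi), cos(2*phi), ... (phases fixed at 0)."""
    cols = [np.ones_like(phi)] + [np.cos(n * phi) for n in range(1, n_terms + 1)]
    return np.stack(cols, axis=1)


def fit_torsion(phi, e_ref, n_terms=4):
    """Least-squares coefficients of the cosine series that best match e_ref."""
    coeffs, *_ = np.linalg.lstsq(torsion_basis(phi, n_terms), e_ref, rcond=None)
    return coeffs


def rmse(a, b):
    """Root-mean-square error between two energy profiles."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Hypothetical reference profile (kcal/mol) on a 15-degree grid; a real workflow
# would use energies from a QM torsion scan instead of this synthetic curve.
phi = np.deg2rad(np.arange(-180, 180, 15))
e_ref = 1.5 * (1 + np.cos(2 * phi)) + 0.4 * (1 + np.cos(3 * phi))

coeffs = fit_torsion(phi, e_ref)
e_fit = torsion_basis(phi) @ coeffs
print("fitted coefficients:", np.round(coeffs, 3))
print("energy RMSE:", rmse(e_fit, e_ref))
```

Because the synthetic profile lies inside the span of the basis, the fit recovers it essentially exactly; with real QM data the residual RMSE is the quantity of interest.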
So I've shown here a few examples where we are, for example, using the xTB semi-empirical methods to perform the scan and then quantum mechanical single points on that potential energy surface. We can also do xTB scans with ANI single points, or just pure xTB scans. But what's quite nice is we get a sort of ladder of accuracy, and of course that's also a ladder of computational expense. So if you don't mind the accuracy so much, you could use xTB to speed things up a bit. And as I mentioned at the beginning, we're not just interested in reproducing quantum mechanics. We want the force field to be more accurate and useful when we go on to use it in, for example, protein-ligand binding free energy calculations. And that's exactly what we see in this TYK2 example. So on the left-hand side, the Parsley Open Force Field does quite well already for this system, with good correlation with experiment and an RMSE of about 0.7 kcal per mole. After refitting all of the torsion parameters for this full congeneric series, the RMSE comes down to 0.5 kcal per mole, and we can see very good correlation with experimental data here. So just looking to the future, and again I'm happy to see that others are already doing this, I would very much like this to be implemented as a routine tool for free energy calculations. So any time we set off a hundred free energy calculations overnight to make predictions for our drug discovery efforts, can we have a loop in here where we're automatically parameterising these torsion parameters in a bespoke manner? And I'm also pleased to see that bias has been trying this out as well in crystallography simulations, which we haven't spoken about yet at this workshop, but BespokeFit again is showing encouraging improvements compared to the base force field in these cases. And I'm sure there are many more applications out there as well. I'll just give my voice a break for a second. Your volume has been really good. It's been good so far.
Oh good, I might take it down a bit. Right, so that's bespoke parameterisation for torsion parameters. But a lot of the work in my lab is also trying to bespoke the whole force field, if you like. So can we bring in a molecule that we want to derive force field parameters for, and derive all of the force field parameters in a bespoke manner, again from quantum mechanics? The coming slides are a bit of a mix. Some of it is old work, so we started this in 2016 when I was in Bill Jorgensen's lab, and some of it is future work, a bit speculative, so excuse the mix of data. But this slide shows the overall approach. We're very keen on atoms-in-molecule electron density partitioning. So this separates us out a little bit from the work that's done in Open Force Field, where you tend to use AM1-BCC or ESP charge fitting methods. Atoms-in-molecule electron density partitioning has a few advantages, in my opinion. For example, it doesn't suffer from the buried atoms issue for large molecules. So just to talk you through the approach here. On the left-hand side is a sort of cartoon of a molecular electron density. You can use a whole range of atoms-in-molecule partitioning methods to take that total electron density and split it up amongst the constituent atoms in the molecule. MBIS and DDEC are two approaches that we like, for example. And it's quite simple. You try to assign as spherical an electron density as possible to each atom. So these electrons at the top there are assigned to this hydrogen atom, and the electrons in green here are assigned to the carbon. We can then calculate atomic properties of those atoms. So the atomic charge, for example, is just obtained by integrating up the atomic electron density and subtracting it from the nuclear charge. We can also calculate something that's similar to an atomic volume by integrating up the r-cubed moment of the electron density.
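The two atomic observables just described can be written as q_A = Z_A - ∫ρ_A(r) dr and V_A = ∫ r³ ρ_A(r) dr. The sketch below evaluates both for a spherically symmetric density on a radial grid. Real atoms-in-molecule densities from MBIS or DDEC are not spherical and live on 3D grids; the analytic hydrogen 1s density used here is only a stand-in, and the function names are hypothetical.

```python
import numpy as np


def radial_integral(r, f):
    """Trapezoid-rule integral of 4*pi*r^2*f(r) dr for a spherically symmetric f."""
    y = 4.0 * np.pi * r**2 * f
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(r)))


def atomic_charge(r, rho, z_nuclear):
    """Nuclear charge minus the integrated atomic electron density."""
    return z_nuclear - radial_integral(r, rho)


def r_cubed_moment(r, rho):
    """Integral of r^3 * rho over all space, the volume-like quantity."""
    return radial_integral(r, rho * r**3)


# Stand-in density: isolated hydrogen 1s orbital in atomic units,
# rho(r) = exp(-2r)/pi, which integrates to exactly one electron.
r = np.linspace(0.0, 40.0, 40001)  # bohr
rho = np.exp(-2.0 * r) / np.pi

charge = atomic_charge(r, rho, z_nuclear=1.0)
volume = r_cubed_moment(r, rho)
print(f"charge = {charge:.6f} e, <r^3> = {volume:.4f} bohr^3")
```

For this test density the analytic answers are a charge of zero and an r-cubed moment of 7.5 bohr³, which gives a quick check on the quadrature.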
What we then want to do is to discover QM-to-MM mapping protocols to map these atomic observables onto force field parameters. So atomic charges are easiest. We've already calculated them. It's these things, so those are the atomic charges. I'll show you in a second how to get off-site charges using this approach. And we're playing around a lot still with converting these atomic volumes into something that looks like Lennard-Jones type parameters. And just as a side note, all of these electron densities are computed in an implicit solvent, so as to effectively account for induction effects in the condensed phase. So just a slide on off-site charges. I've been speaking about these with Open Force Field for a couple of years now. So what do we do to get off-site charges? Take this molecule shown here, for example. The oxygen in this, as many of you will be aware, doesn't have an isotropic electron density around it. It will have areas of strong negative charge in the regions of the lone pairs, as shown there by the QM as a colour scheme. Of course, if you try to assign an MM charge at the centre of the atom, you will get an isotropic electrostatic potential, and it doesn't matter what charge you have. So now that we have an atomic electron density, we can calculate from that an atomic electrostatic potential. And what we do is we move around the positions and the charges of the virtual sites, the off-site charges, so as to match the QM electrostatic potential as closely as possible with the MM charges. And we see in this case that, with the addition of just two virtual sites onto that atom, we get a very good match between the QM and the MM. And just a note that, to limit the search space, we maintain the symmetry of the atom's bonding environment when searching for these positions. So the difficult part is always the Lennard-Jones parameters. There's no clear quantum mechanical target when we're trying to assign Lennard-Jones parameters to an atom, and that's because they've got to account for all of these things, like the short-range exchange repulsion as atoms get too close together, as well as longer-range dispersion interactions. So the best data source that we've found is quite a small data source, but it's a really nice paper by these people on the bottom here. I recommend reading it if you get a chance. What they do is build a force field for a set of small molecules using a very physically motivated model, which is based on the overlap of atomic electron densities. So this is the functional form of it. It's very complex, more complex than we want to use in Open Force Field type efforts. At least it gives us something to do. And they do derive a full force field for their set of molecules, and they combine it with an accurate multipolar charge model, and they get very good results in the condensed phase as well. So that's all great, but it is very expensive to run. It's also very expensive to parameterise. They use high-quality DFT-SAPT energies, as well as atoms-in-molecule analysis to break them down into atomic contributions, which is again useful for our purposes. But what that does allow us to do is compare Sage-type potential energy surfaces with these Slater-type models. So on the right-hand side, compare just the thick line with basically any of the thin lines. The thick line shows that complex Slater model. On the top is an oxygen atom in acetone, where we're comparing the Lennard-Jones part of Sage with the non-electrostatic part of the Slater model. So in the top case, we get a very good match with the accurate SAPT(DFT)-type data. Whereas on the bottom, for one of the carbon atoms in the same molecule, acetone, we get a very poor match between the force field type models and the Slater model.
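For reference when reading curves like these, the 12-6 Lennard-Jones form is U(r) = 4ε[(σ/r)¹² - (σ/r)⁶], with its minimum at r_min = 2^(1/6) σ and a well depth of ε. The sketch below evaluates that form and reads an effective σ back off the minimum of a sampled curve. The helper names and the σ and ε values are illustrative assumptions, not parameters from Sage or from the Slater-type model.

```python
import numpy as np


def lennard_jones(r, sigma, epsilon):
    """12-6 Lennard-Jones pair energy; r and sigma share the same length unit."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6**2 - sr6)


def sigma_from_minimum(r, energy):
    """Effective sigma inferred from where a sampled curve has its minimum.

    Uses r_min = 2**(1/6) * sigma, which holds exactly only for the 12-6 form,
    so for any other curve shape this is an approximation.
    """
    r_min = r[np.argmin(energy)]
    return r_min / 2.0 ** (1.0 / 6.0)


# Hypothetical parameters (angstrom, kcal/mol), chosen only for illustration.
r = np.linspace(2.5, 8.0, 5501)  # 0.001 angstrom spacing
u = lennard_jones(r, sigma=3.4, epsilon=0.2)

sigma_eff = sigma_from_minimum(r, u)
print(f"effective sigma = {sigma_eff:.3f} A, well depth = {-u.min():.4f} kcal/mol")
```

The same read-off can be applied to any sampled non-electrostatic pair potential, which is one simple way to put a more complex model and a Lennard-Jones model on a common footing.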
So that might be, for example, an area where we could improve the force field. But we can also use this when we're thinking about mapping from QM observables onto MM parameters. So in the bottom left-hand plot here, I've taken a load of plots like the ones on the right and I've extracted the effective sigma parameter. So I've just looked to see where the minimum in the potential energy surface is, and that's what's plotted on the y axis. This is the sort of optimal sigma parameter, the best sigma parameter given what we know from quantum mechanics. And on the x axis here, we've tried to fit a QM-to-MM mapping model. So we map the atoms-in-molecule volumes, which I showed you a few slides ago, onto these sigma parameters. There are a few variable parameters in here which we can tune to get as straight a line as possible. With those tuning parameters, I hope that you can see we can get a decent correlation between the best sigma parameters and those that we've modelled from quite a cheap quantum mechanical model. So these are expensive, these are cheap, but we can map these ones onto these. So that part I just told you about is part of what we're working on right now. And the toolkit we use for working on these sorts of things is a software package which Josh, Chris Ringrose and others have put together, called QUBEKit. Again, this has been around for a few years. I don't really see this as an alternative or competitor to Open Force Field. I see this now as our playground for playing around with force field hypotheses and feeding those into Open Force Field type initiatives. So the way this works is that the molecule comes in at the top. We perform a few optimisations and things, and then we try to use our bespoke parameterisation workflows to fill out the force field. So we've got the torsion drives, which Josh has now been working on since the start of his PhD.
We've got Hessian calculations, which allow us to get bond and angle parameters from the modified Seminario method, which many of you will be familiar with. And we can do these density calculations, from which we can calculate non-bonded parameters as I've been showing. So out of this comes a force field file, and we can interface with ForceBalance and OpenFF Evaluator such that any tunable parameters that are left in this model can be rapidly tuned against experimental data. So we had a publication a year ago where we did a lot of this based on heats of vaporisation. We now have a new interface with Simon's OpenFF Evaluator code, where we'd like to expand the training and test sets and train against more properties. I haven't done a good picture for this yet, but basically we've put together a new training set of 265, and a test set of around a similar size. So this can be trained in a couple of days, so we can very rapidly make force field hypotheses, train them up, test them in a couple of days, and throw them away or bring them through to you guys to extend. This is the point HCN though only, and let's get back. So here are some initial tests, and actually not all of this is just around atoms-in-molecule, we can do some Sage testing as well. In the case of Sage, we use that training and test set to train a new set of transferable sigmas and epsilons for Sage-style force fields. So I'll go through those first, those are the first two columns. We've trained a Sage model with the TIP3P water model, and we've trained a Sage model with the TIP4P-FB water model. So this sort of investigation comes into something I think Chapin will talk about when fitting protein force fields, where everyone's interested in new water models and so forth. But actually in our hands, the training and test data here for these two different water models are quite similar. I should note that we didn't co-optimise the water model here.
If we did, I think that the water model would probably come down lower. We can now also compare our QUBE-type atoms-in-molecule analysis directly with Open Force Field models in an apples-to-apples kind of way, which we haven't done much of before. So we can use exactly the same charge model and just refit the Lennard-Jones type models that I told you about. And in that case, we get similar accuracy, a bit worse than the Sage-type models, with fewer fitting parameters. And we have some new mapping ideas that we can put in here, and we haven't added the virtual sites in here yet, so there's more improvement to come. But let's say that QUBE becomes a little bit more accurate than Sage here. The question is, so what? You can parameterise these in a second. For these ones, you have to do a full QM calculation. So this will never be competitive with these. But the answer comes in John's previous talk, where we can use the types of graph neural networks that he spoke about to rapidly assign these quantum mechanical observables for new molecules. So as we've seen here, we can use these graph neural networks to provide continuous atom embeddings to describe non-bonded parameters. John talked about using that for charge fitting and parameterisation. But in the current espaloma model, for example, there's still atom typing in the form of the Lennard-Jones parameters. So what we're proposing is that, if we can extract these continuous Lennard-Jones parameters from quantum mechanical observables, then we can also train up an espaloma-type model to predict those atomic volumes, and hence the Lennard-Jones parameters, in a continuous way. So we can get rid of the atom typing altogether. So I'll show you a few results where we've used Simon's data set of 50,000 molecules computed at the Hartree-Fock/6-31G* level.
As well as the ESPs on these molecules, Simon has also stored the QM atoms-in-molecule charges and volumes, which allow us to train up these espaloma-type models. So in our hands, for our atoms-in-molecule charges, we find an RMSE of around 0.02 electrons on the entire data set. And I'll just show you an example of a good match, where we get extremely good agreement between the predicted atoms-in-molecule charges and the reference QM data, and a slightly worse one, which we need to look out for if we're going to use these for parameterising a whole set of molecules. And as I say, this is something we put together a year or so ago. We haven't put it out there yet, as we're still working on improving the model. But now, as I say, we can use our QM-to-MM mapping force fields to derive consistent non-bonded force field models that are completely described by these graph neural network outputs, so there are no atom types here at all. And we can compute these liquid properties. So in this case, we've looked at density and heat of vaporisation, with similar accuracy compared to Sage-type force fields with these models. And we're now looking at extending those benchmarks to mixture properties and free energy benchmarks, as well as improving these models. OK, I'll do this very quickly. If there are any free energy type people in the audience, feel free to talk to us about our FEgrow software, which is our little contribution to free energy calculations. It's very useful for setting up a congeneric series of ligands in a binding pocket, using all your latest force fields and even machine learning potentials for doing the optimisation. Following on from Dennis's talk this morning as well, we also work a lot with Gábor Csányi's group on deriving fast and accurate linear atomic cluster expansion force fields from quantum data. We published something last year that was very competitive with ANI in terms of accuracy.
And do watch this space. In the next few months we will have a transferable version of this force field, which could be used in exactly the same way as ANI. It'll be smooth, differentiable and so forth. OK, I hadn't actually finished that slide when I sent it to Jeff, so I'll skip over that one and take questions. So, just to thank Josh in particular again, and the rest of my group, and everyone at Open Force Field, particularly Simon and Ruth, and Jeff and team. Really integrating Josh in particular into the whole software infrastructure has been invaluable. Thank you very much.