So my name is Jeffry Setiadi, I'm a postdoc in Mike Gilson's lab, and today I'll be talking about tuning potential functions to host-guest binding data.
So just the outline of my talk. I'll give the background and motivation for this project. Then I'll show the results of optimizing to host-guest binding, followed by benchmarks on protein-ligand binding and hydration free energies. And last, I'll show what happens when we try to optimize to both hydration free energies and host-guest binding.
So just a little background. What we want is a force field that can describe protein-ligand binding accurately. This is the holy grail for applications including, but not limited to, computer-aided drug design. Most force fields are fitted to QM data, especially for the valence parameters, but there have been recent efforts to optimize them to physical data directly, for example small-molecule crystals and condensed-phase properties. Sage 2.0.0, for example, had its Lennard-Jones parameters refitted to condensed-phase mixture data. So for this project the question is, why don't we target binding data directly? The aim of this project is to target host-guest binding instead of protein-ligand binding, so we use host-guest systems as a surrogate for protein-ligand systems. And as a case study, we'll optimize an implicit solvent model.
So just some background on host-guest systems: why are they useful? The obvious reason is that they're smaller than a protein-ligand system. There are similar chemistries involved in host-guest binding. The binding affinities are comparable to protein-ligand, and some are even tighter binders. And they are less ambiguous in terms of things like protonation states, so we know the protonation states of these hosts.
And they've been used in the SAMPL blind challenges as a test of force fields and a test of methods.
Okay, so just to give you a little background on the infrastructure. This is my rendering, and I hope I do justice to it. This is courtesy of Simon Boothroyd, who did a lot of the heavy lifting here. Our main program is the OpenFF Evaluator. Given a force field, Evaluator can estimate certain properties. The ones used for Sage are densities and enthalpies of mixing, which take in data from the NIST ThermoML database. But Evaluator can also estimate two other properties that depend on external software. The first is hydration free energies, which depends on YANK and takes in data from the FreeSolv database. If I'm not mistaken, there are about 650 small molecules in FreeSolv. The other is host-guest binding, which depends on pAPRika and takes in data from the taproom database. I'll explain that in the next few slides. Once we have the estimated properties and the reference data, we feed them into ForceBalance, and ForceBalance gives us the optimized parameters.
Okay, so just a little background on host-guest binding. We estimate the absolute binding free energy using the attach-pull-release method. This method was developed in the Gilson lab. As the name suggests, we attach a bunch of restraints to the host and the guest, we physically pull the guest out of the host molecule, and we estimate the free energy cost of applying these restraints. That's done with a Python program called pAPRika, also developed in the Gilson lab. This method was used to benchmark SMIRNOFF99Frosst, which is a precursor to Parsley and Sage, and we showed that it's a robust method for estimating binding free energies of host-guest systems.
So a bit about the taproom database. I've curated about 126 host-guest complexes, comprising cyclodextrins, cucurbiturils and octa-acids.
Most of these were curated from the SAMPL blind challenges. The guest molecules range from simple cyclic alcohols and carboxylic acids to drug-like molecules. I've hand-selected 36 of these complexes as the training set. And we've made these available online, on GitHub.
So to test this, we use OpenFF Sage 2.0.0 as the force field. For the partial charges, we use AM1-BCC ELF10 from the OpenEye toolkit. And for the generalized Born implicit solvent, we select the OBC2 model. Without getting too much into the details, the generalized Born implicit solvent treats each atom as a van der Waals sphere embedded in a continuum medium. So the larger the radius, the less solvated the atom becomes, and the smaller the radius, the more solvated the atom becomes. For the initial values of the radii, we use the mbondi2 radii set. Basically, this is a set of five atom types: four of them are the elements hydrogen, carbon, nitrogen and oxygen, with an extra type for a hydrogen atom bound to a nitrogen atom. And we want to optimize all five of these radii.
Okay, so with that said, just as a reference, this is a comparison of running the simulation in OpenMM on an RTX 3090. With implicit solvent, we get about two microseconds a day. With explicit solvent, with about 3,000 water molecules, we get about 690 nanoseconds a day. So implicit solvent is about a factor of three faster.
So these are the estimated binding free energies with Sage and the original radii. What we see here is that the calculation overestimates the binding free energies compared to experiment, especially for the cucurbiturils. For the cucurbiturils, we're getting about minus 50 kcal per mole. I tested this with explicit solvent, same protocol, and we also get overbinding, but with explicit solvent we're getting about minus 20 to minus 30. So we're still overbinding with explicit water as well.
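As a rough illustration of the radius and solvation relationship described above, here is a minimal sketch using the single-sphere Born equation rather than the pairwise OBC2 model used in the talk. The function name, the water dielectric of 78.5, and the example radii are assumptions made for illustration only.

```python
# Coulomb constant in kcal*angstrom/(mol*e^2), the usual MD-unit value.
COULOMB_KCAL_ANGSTROM = 332.0637


def born_solvation_energy(charge, radius, solvent_dielectric=78.5):
    """Born solvation free energy (kcal/mol) of a single charged sphere.

    charge is in units of e, radius in angstrom. This is a toy
    one-atom model, not the OBC2 generalized Born model itself.
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    return (-0.5 * COULOMB_KCAL_ANGSTROM * charge ** 2 / radius
            * (1.0 - 1.0 / solvent_dielectric))


if __name__ == "__main__":
    # Example radii are arbitrary; a smaller radius gives a more
    # negative (more favorable) solvation energy.
    for radius in (1.0, 1.5, 2.0):
        energy = born_solvation_energy(1.0, radius)
        print(f"radius {radius:.1f} A -> {energy:.1f} kcal/mol")
```

In this simplified picture the solvation energy scales as one over the radius, which is why shrinking a radius makes that atom look much more strongly solvated.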
Now we feed this into ForceBalance, and ForceBalance converged after 21 iterations. We get a big improvement in the binding free energies: the RMSE dropped from about 21 kcal per mole to about 3. But we do see that the R-squared deteriorated a bit, from 0.7 to about 0.5. These are the results for the training set.
For the test set, we see the same thing with the original radii, an overestimation of the binding free energy for the cucurbiturils. Now when we use the host-guest optimized radii from the previous slide, we get a much better result. The RMSE dropped to about 1.8 kcal per mole, and the R-squared increased to about 0.85. So the optimized radii perform better on the test set than on the training set.
Now let's look at what changed in terms of the parameters. I'm showing here the GB radii before and after optimization. Three of them show only minor changes, and the biggest changes are for hydrogen and nitrogen. Nitrogen dropped from 1.55 ångström to about 0.53 ångström. This is very unphysical, but if we look at the results again, most of the host-guest complexes that contain nitrogen are the cucurbiturils. And since those are the most overbound results, these systems are driving the optimization.
Okay, so even though it's very unphysical, let's see how well it works in a benchmark on protein-ligand systems. For the test case, I chose the benchmark set from Alibay et al., which was published last year. He kindly made these available online. There are 59 protein-ligand systems in total, and the delta Gs range from about minus 2.7 to about minus 2.6 kcal per mole. And as a reference, we're running these in OpenMM. With implicit solvent we get about 900 nanoseconds a day, and with explicit about half of that.
So just a bit about the actual binding free energy calculations. We use the double decoupling method here. We ran it with OpenMM and openmmtools.
For the electrostatic part, we scale the partial charges and the GBSA term simultaneously, split up into 11 windows, both at the binding site and in the bulk. And for the Lennard-Jones part, we decouple with a softcore potential over 21 windows.
Okay. So as a reference, I re-plotted the results from Alibay et al. Oh, I forgot to say: the results I'm going to show you are with the AMBER ff99SB-ILDN force field. We've also done the calculations with ff14SB, but we get similar results. So these are the explicit solvent results from Alibay et al. with GAFF2 and TIP3P. We see that there is a nice correlation, even though it's offset by about 3 kcal per mole. Now, for comparison, I also did the calculation with GAFF2 but with the OBC2 implicit solvent. What we see here is that it does get slightly worse going from TIP3P to OBC2, but you still see some linear correlation, even though there's a larger scatter. We see the same thing with Sage, where the RMSE is, I think, about double that of the explicit solvent, with an R-squared of about a half. When we use the radii optimized to the 36 host-guest complexes, we see an improvement in the RMSE, but the R-squared actually gets worse. We're still not entirely sure why this is, so we need to investigate it further. I suspect it has to do with protein stability when we reduce the GB radius of nitrogen to half an ångström.
As a final benchmark, we want to see how well the optimized radii transfer to hydration free energies. So I selected 100 molecules from the FreeSolv database, only molecules that contain just hydrogen, carbon, nitrogen and oxygen. I ran these through the OpenFF Evaluator, with similar procedures for the electrostatics and Lennard-Jones as in the protein-ligand calculations. Each window was run for a 2 nanosecond production run.
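To make the two-stage decoupling protocol concrete, here is a minimal sketch of the lambda schedules it implies: 11 windows for the electrostatics and 21 for the Lennard-Jones term. The talk gives only the window counts, so the even spacing and the helper name `linear_schedule` are assumptions, not the schedule actually used.

```python
def linear_schedule(n_windows):
    """Return n_windows evenly spaced lambda values from 1.0 down to 0.0."""
    if n_windows < 2:
        raise ValueError("need at least two windows")
    return [1.0 - i / (n_windows - 1) for i in range(n_windows)]


# Stage 1: scale the partial charges (and the GBSA term) to zero
# while the Lennard-Jones interactions stay fully on.
electrostatics_lambdas = linear_schedule(11)

# Stage 2: decouple the Lennard-Jones interactions (softcore)
# with the charges already switched off.
sterics_lambdas = linear_schedule(21)

if __name__ == "__main__":
    print(len(electrostatics_lambdas), "electrostatic windows")
    print(len(sterics_lambdas), "Lennard-Jones windows")
```

The same pair of schedules would be run once for the ligand in the binding site and once for the ligand in the bulk.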
So as a reference, I've plotted the results from David Mobley's lab for GAFF 1.7, I believe, for the 100 molecules that I selected. With GAFF and TIP3P, we get an RMSE of about 1.7 and an R-squared of 0.86. For comparison, I did the calculation with Sage as well, Sage with TIP3P, and for the same 100 molecules Sage performs slightly better than GAFF 1.7. Now, when we switch to implicit solvent, Sage performs about as well as in explicit solvent. The RMSE just increased a bit, from 1.5 kcal per mole to about 2, but it retains the correlation with experiment. But when we use the host-guest optimized radii, this is what we get: the host-guest optimized radii degrade the hydration free energies. And looking at the outliers, these are the molecules containing nitrogen atoms. Remember, the nitrogen radius dropped from 1.55 to about 0.5.
So I wanted to look into this further, and I did a separate optimization. I selected two systems: for the hydration free energy, I selected methyl isominoid, and for host-guest binding, I selected beta-cyclodextrin and isominoid acid. And I optimize just one radius instead of all five; I chose oxygen in this case. Initially, the hydration free energy is overestimated, and to get it close to experiment, what ForceBalance did was increase the radius from 1.5 to about 1.6. For host-guest binding, it is also overestimated initially, and to get to the experimental line ForceBalance needed to decrease the radius from 1.5 to about 1.1. When I include both hydration free energy and host-guest binding in the optimization, we don't see any convergence. It doesn't get any closer to the experimental line, and on the top right here you see the objective function, which is not going down at all. So what we see here is that we can get an implicit solvent parameter that describes hydration free energies correctly but not host-guest binding, or vice versa.
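The conflict described above can be illustrated with a toy least-squares objective in which two targets prefer different values of the same radius. This is only a sketch: the linear error model, the slopes and the weights are hypothetical, and the real ForceBalance objective is more involved. Only the two preferred radii (about 1.6 and 1.1) are taken from the talk.

```python
def error(radius, best_radius, slope):
    """Toy linear response: signed error (kcal/mol) of one target."""
    return slope * (radius - best_radius)


def objective(radius, targets):
    """Weighted sum of squared errors over all targets."""
    return sum(w * error(radius, r0, k) ** 2 for (r0, k, w) in targets)


def best_compromise(targets):
    """Closed-form minimizer of the quadratic objective above."""
    numerator = sum(w * k ** 2 * r0 for (r0, k, w) in targets)
    denominator = sum(w * k ** 2 for (r0, k, w) in targets)
    return numerator / denominator


# (preferred radius in angstrom, hypothetical slope, hypothetical weight)
hydration = (1.6, 10.0, 1.0)
binding = (1.1, 10.0, 1.0)

if __name__ == "__main__":
    both = [hydration, binding]
    radius = best_compromise(both)
    print("compromise radius:", radius)
    print("objective at compromise:", objective(radius, both))
    print("hydration alone at 1.6:", objective(1.6, [hydration]))
```

Each target on its own can be driven to zero error, but with both included the best single radius leaves a nonzero residual for each, which is the same qualitative picture as the joint fit failing to reach the experimental line.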
So just to summarize. For the host-guest systems, we have this infrastructure in place and we can actually optimize force field parameters, in this case the GB radii, and we showed that it gives good results for the test set, better than for the training set. When we apply it to protein-ligand systems, it did improve the RMSE but made the correlation a bit worse. For hydration free energies, it totally went the other way; it's just no good. And when I try to optimize to both of them, ForceBalance just can't find a parameter that fits both.
So what's next? I think I need to look at other types of host molecules to diversify the training set, and we also want to see how well it transfers to other host molecule types. We're looking at optimizing other parameters besides the GB radii. We're also looking at splitting atom types, just to see if we can get away from this unphysically small radius for nitrogen atoms. We're currently working on modifying the OBC2 model with the SMIRNOFF plugin, but if that doesn't work, we'll try a different implicit solvent model. Finally, we also want to test optimizing the Lennard-Jones parameters with explicit solvent. And hopefully we can integrate host-guest binding data into a future OpenFF release.
So with that, I'd like to acknowledge the people in the Gilson lab and Open Force Field, without whom this project wouldn't be possible, especially Simon Boothroyd, and the NIH for funding and the San Diego Supercomputer Center for resources. And with that, thank you for listening to my talk.