Good morning, everyone. I'm Lorenzo D'Amore. Could you be louder? Louder. OK. I'm not used to being that loud. So I work at OpenFF thanks to the support of Janssen. And today, together with Bill Swope from Genentech, I will be talking about assessing charge models by means of atom-by-atom and dipole comparisons.
So why do we want to assess charge models? Essentially because atomic charges are key for biomolecular simulation, as they dictate the electrostatic contribution to intermolecular interactions. Electrostatic interactions between small, drug-like molecules and the corresponding receptor not only affect the binding free energy, but are also essential for molecular recognition. Therefore, assessing the electrostatic match in protein-ligand complexes may help us understand why ligands bind and how we can improve binding, which is the ultimate goal of drug discovery. What we have here is a picture of electronically diverse ligands that bind a receptor to different extents because they establish different electrostatic interactions. Assigning partial charges correctly in this system essentially means being able to model the electrostatic interactions properly, which lets us predict accurately which ligand will bind the receptor.
So traditionally, expensive ab initio methods were used to generate the electrostatic potential of molecules, to which the restrained ESP (RESP) charge model was then fit. But obviously, this approach is rather limited by molecular size and by the number of molecules. Therefore, semi-empirical methods like AM1-BCC are now used to assign partial charges. These reproduce the ESP computed at the Hartree-Fock level with the 6-31G* basis set, a QM method that deliberately overestimates the polarity of gas-phase molecules so that the resulting charges are suitable for solvated systems.
Nevertheless, speed is still a bottleneck, especially if we want to do virtual screening of large libraries or deal with big systems like biopolymers. For this reason, graph neural network charge models are now being developed for almost instant partial charge assignment.
So we performed this benchmark using the OpenFF public data set, which consists of over 9,000 non-proprietary compounds collected from six different pharma partners. What we did, essentially, was first to pull down QM data from QCArchive, namely the QM dipoles and the QM-optimized 3D coordinates. Then we used the SMILES to compute the charges with different charge models: OPLS, AM1, AM1-BCC, AM1-BCC ELF10, and NAGL, developed by Lily. We took the charges generated with the first three models and applied them on top of the QM-optimized 3D coordinates to compute the dipoles, so that we could compare these dipoles with the QM dipoles in terms of angle, difference in length, and magnitude. The last two charge models, instead, we used to perform an atom-by-atom comparison, and I have to thank Trevor for having done the calculations with AM1-BCC ELF10. I will be talking about this latter comparison, and then I will leave the floor to Bill to talk about the dipole comparison. Yeah. OK.
So here we have the benchmark results for NAGL. The first thing to notice is the remarkable capability of NAGL to produce charges that reproduce the AM1-BCC ELF10 reference charges with an RMSE of only about 0.01 electrons, which is great. And I have to acknowledge Lily for the great work done with that model. Secondly, we found that only 1.3% of the molecules in the benchmark contain at least one atomic partial charge with an absolute error of at least 0.1 electrons compared to our reference model.
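As a rough illustration of the two numbers just quoted, the global RMSE and the fraction of molecules with at least one badly charged atom could be computed along these lines. This is only a sketch under assumptions: the function names, the data layout (one pair of charge arrays per molecule) and the toy values are hypothetical and are not taken from the benchmark itself; only the 0.1 electron threshold comes from the talk.

```python
import numpy as np

def charge_rmse(molecules):
    """Global RMSE over all atoms of all molecules (charges in electrons).

    molecules: list of (predicted, reference) per-atom charge sequences.
    """
    errors = np.concatenate(
        [np.asarray(p, dtype=float) - np.asarray(r, dtype=float) for p, r in molecules]
    )
    return float(np.sqrt(np.mean(errors ** 2)))

def outlier_fraction(molecules, threshold=0.1):
    """Fraction of molecules with at least one atom whose absolute
    charge error reaches the threshold."""
    flagged = sum(
        bool(np.max(np.abs(np.asarray(p, dtype=float) - np.asarray(r, dtype=float))) >= threshold)
        for p, r in molecules
    )
    return flagged / len(molecules)

# Toy data (made up): the first molecule agrees well, the second does not.
toy = [
    ([0.10, -0.10], [0.11, -0.11]),
    ([0.30, -0.30], [0.15, -0.15]),
]
print(charge_rmse(toy), outlier_fraction(toy))
```

The RMSE here is pooled over atoms rather than averaged per molecule, which is one possible reading of the global RMSE mentioned in the talk.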
Now, among these outliers, we found that in about 60% of the molecules the wrong partial charges were on carbon atoms involved in a double bond, in about 50% on aliphatic nitrogens, and in about 20% on sulfur. I am showing here one example for each of these three cases: a depiction of the molecule, with the atoms highlighted by a color map that reflects the charge difference between NAGL and AM1-BCC ELF10, and below it the corresponding correlation plot between the two charge models. Also, since these three cases constituted almost all the outliers that we found, we computed the RMSE for these specific cases, meaning the RMSE only for double-bonded carbons, only for aliphatic nitrogens, and only for sulfur atoms. What we found is that the RMSE for double-bonded carbons is slightly higher than the global RMSE for all the charges, meaning that these pathological cases contribute to the RMSE, but not that much. For the aliphatic nitrogens, instead, the RMSE is substantially higher than the global RMSE, and the same holds for the sulfur atoms, meaning that these two pathologies may affect the global RMSE more.
We then also applied the NAGL charge model with the BCC correction applied after the training. We noticed that, for instance, in this case, this new model is able to correct the pathology on the double-bonded carbons that were affected, and the correlation improves. However, the RMSE for the double-bonded carbons is still the same as before, meaning that this new model may correct some of the pathologies but overall does not improve the error. For the aliphatic nitrogens, we also saw that this new model was able to correct some of the pathological nitrogens. But in this case, the new RMSE for the aliphatic nitrogens increases with respect to the one obtained by NAGL without the BCC applied afterwards.
And the same is happening, essentially, for the sulfur pathologies: the new RMSE with NAGL BCC is higher than before. And you will see in the next slide that this happens even though NAGL BCC is able to correct some of these pathologies.
So yeah, NAGL is trained on AM1-BCC ELF10. NAGL plus BCC is trained on AM1 ELF10, and the BCC correction is applied after the training, if I'm correct. Can you say that again? You mean that NAGL BCC is better than...?
Well, anyway, as I was mentioning, NAGL plus BCC essentially gets a higher error because it generates more pathological cases with aliphatic nitrogens, and here I reported just a few examples, and also more cases with pathological sulfur, mainly on tides. So the RMSE of NAGL plus BCC is slightly higher than the RMSE of NAGL, and overall the number of pathological cases is the same as before, so 1.3%. So overall, we can say that NAGL plus BCC is performing slightly worse, at least on this data set. But the take-home message would be that, in general, NAGL is performing really well, because it is able to reproduce AM1-BCC ELF10 charges with a very low RMSE. And yeah, I will now leave the floor to Bill to discuss the dipoles.
Exactly, yeah. So it's correcting these pathological cases, and it's also correcting other pathological double-bonded carbons and other aliphatic nitrogens. But it's creating new ones. And for aliphatic nitrogens and sulfur, it creates more pathological cases than NAGL. So essentially, yeah, here with NAGL we have 50% aliphatic nitrogens and 20% sulfur constituting these outliers, but with NAGL plus BCC this increases. Yeah, exactly.
Yeah, so the question in this case was whether the charges were computed with AM1-BCC while the geometries were the 3D coordinates from the QM-optimized structures. And the answer is yes. So yeah, well, I mean, this is a conformationally independent charge model.
And we applied it to the lowest-energy conformers. But in principle, we would get the same charges for all the other conformers. So yeah, you are saying that for Espaloma, you were surprised that it was also conformationally independent, right? Dependent. No, well, in this case, I didn't see conformational dependence of the model. No, not for NAGL, not for AM1-BCC. Well, I can leave the floor to Bill. And then if there are other questions, we can take them.
Yeah, thank you, Lorenzo. Oh, the figure doesn't work. You have to use that. Oh. OK, so which, left and right, or down? OK, good. All right.
So I'm continuing a little bit from what I talked about yesterday, with maybe a little bit of a twist to it and hopefully a way to make lemonade from lemons. What we looked at, and what I discussed yesterday, was the computation of dipole moment data using DFT and comparing it with what the OpenFF and also the OPLS charge models were producing. This is a little bit of a reminder that these dipole moments are vectors. They range in value quite a bit. In the bottom right, you can see an amide group. These have dipole moments of about 4 to 5 debye, which is pretty healthy. And those are actually important for stabilizing helices in protein folding: they line up and reinforce each other, and a helix has an enormous dipole moment in a protein. There are a lot of things that contribute to this, of course, electronegativity and so on, but also solvent effects. I need to be louder. OK, I talked about this yesterday, so I'm not going to go into it much.
We analyzed this industry benchmark data set, which is about 10,000 compounds. It's a nice, healthy amount of data. And some of the molecules are ridiculously large or complicated. There are ions in there and all kinds of things. We filtered out the ions and just looked at neutral species. We pulled the dipoles from quantum calculations on DFT-optimized structures. We pulled charges from the force fields.
We used the DFT structures and the charges to get force field dipoles and compared them. So I talked about this yesterday. Nothing new here.
In comparing these, we wanted to look at how well each force field does against DFT and what might be causing the differences, and to identify situations that are depolarized, that is, where the force field dipole moment is actually less than the gas-phase dipole moment, which is a little bit pathological. We looked at other things besides OpenFF, including Espaloma and NAGL. We looked at, of course, OPLS3 and OPLS4.
So this is a results slide that I had up yesterday, too. I don't need to talk about this very much. In the middle, we're comparing charge models against DFT where the charges are independent of conformation. And on the right, using OPLS4, the charges are dependent on conformation, so you get a different charge model for each conformation. So that was kind of interesting. For one thing, you see that if the charge model is dependent on conformation, you can match the DFT dipoles pretty well.
So what we wanted to do is look at not the cases where the force fields do well, but the cases where the force fields do poorly. Imagine you have compounds where the charges don't move around much as you change conformations. Those are situations where a fixed-charge force field should do pretty well, and in that case you would expect the force fields to give dipole moments similar to DFT for all conformers. But there's a bunch of compounds where that's not the case, where by changing the orientation or the conformation of the molecule, charge moves around. And in that case, a fixed-charge model is not going to do well. Of course, there are a number of causes for this that I have listed there. What we wanted to do is collect a set of these challenge compounds and use them for testing other ideas.
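To make the dipole comparison described in this part of the talk concrete, here is a minimal sketch of how a force field dipole can be built from point charges on the DFT geometry and compared with the DFT dipole. It assumes charges in electrons, coordinates in angstroms, a neutral molecule (so the dipole does not depend on the origin) and non-zero dipoles. The function names and the toy numbers are hypothetical; the "depolarized" flag simply follows the definition given above, a force field dipole smaller than the gas-phase one.

```python
import numpy as np

E_ANGSTROM_TO_DEBYE = 4.80320  # 1 e*angstrom expressed in debye

def point_charge_dipole(charges, coords):
    """Dipole vector (debye) of point charges: mu = sum_i q_i * r_i."""
    q = np.asarray(charges, dtype=float)
    r = np.asarray(coords, dtype=float)
    return E_ANGSTROM_TO_DEBYE * (q[:, None] * r).sum(axis=0)

def compare_dipoles(mu_ff, mu_qm):
    """Angle and magnitude comparison; assumes both dipoles are non-zero."""
    mu_ff = np.asarray(mu_ff, dtype=float)
    mu_qm = np.asarray(mu_qm, dtype=float)
    n_ff, n_qm = np.linalg.norm(mu_ff), np.linalg.norm(mu_qm)
    cos_angle = np.clip(np.dot(mu_ff, mu_qm) / (n_ff * n_qm), -1.0, 1.0)
    return {
        "angle_deg": float(np.degrees(np.arccos(cos_angle))),
        "magnitude_diff": float(n_ff - n_qm),
        "vector_diff_norm": float(np.linalg.norm(mu_ff - mu_qm)),
        "depolarized": bool(n_ff < n_qm),
    }

# Toy two-site example (made-up numbers, not benchmark data).
mu_ff = point_charge_dipole([0.5, -0.5], [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
report = compare_dipoles(mu_ff, [0.0, -3.0, 0.0])
print(mu_ff, report)
```

Reporting both the difference in magnitudes and the norm of the vector difference is a choice made here for illustration; the talk does not say which of the two was plotted.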
The other thing was looking at the OPLS4 charges, which are conformationally dependent. We look across the conformers of a given compound and find those cases where there's a large charge difference among conformers. That's another indication that you've got conformer-dependent charges and that a fixed-charge model won't do well.
So here's a sample of some of these. We have dozens of them. These tend to be a little bit large, largish for most studies. But we thought this was a good collection to establish, because it seems like these would be good test molecules for looking at any charge model averaging schemes, or, if you had fluctuating charges or polarizability, for testing the efficacy of those different approaches. And also, we thought if we had enough of these, somebody other than me might be able to tune or establish some sort of trained model that would identify these, so a naive user wouldn't accidentally step into studying a problem that the force field just isn't up to. And that's my last slide. So if there are any questions, he will answer.
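The conformer-based screen mentioned above, looking for a large charge difference among the conformers of one compound, might be sketched as follows. Everything specific here is an assumption made for illustration: the function names, the use of the per-atom max-minus-min spread as the measure of "charge difference", the 0.1 electron cutoff and the toy charges are not taken from the talk.

```python
import numpy as np

def max_charge_spread(conformer_charges):
    """Largest per-atom charge range across conformers.

    conformer_charges: array-like of shape (n_conformers, n_atoms), in electrons.
    Returns (spread, atom_index) for the atom whose charge varies the most.
    """
    q = np.asarray(conformer_charges, dtype=float)
    spread = q.max(axis=0) - q.min(axis=0)
    return float(spread.max()), int(spread.argmax())

def flag_challenge_compounds(charge_sets, threshold=0.1):
    """Names of compounds whose charges move by at least `threshold`
    (a hypothetical cutoff) between conformers."""
    return [
        name
        for name, charges in charge_sets.items()
        if max_charge_spread(charges)[0] >= threshold
    ]

# Toy data (made up): two conformers each, two atoms each.
toy_sets = {
    "rigid": [[0.10, -0.10], [0.11, -0.11]],
    "flexible": [[0.40, -0.40], [0.10, -0.10]],
}
print(flag_challenge_compounds(toy_sets))
```

Compounds flagged this way would be candidates for the challenge set, since a single fixed set of charges cannot follow the charge redistribution between their conformers.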