Okay, hi everyone. If you don't know me, my name is Victoria Lim. I'm a PhD student in the Mobley lab at UC Irvine, and today I will be sharing my benchmark assessment of molecular geometries and energies from various small molecule force fields.

Given that this is a force field meeting, I don't really need to emphasize the importance of force fields, and specifically good force fields, but just to focus our attention a little bit this morning: let's say you have a small molecule in a protein binding site, and you're trying to gain some insight out of this system. The insights that you gain from looking at a ligand in a binding site will really be determined by how well that ligand is represented by the force field. Specifically, we might ask questions such as: what geometries can a certain molecule adopt? And secondly, what is the energy difference between two conformers, that is, which conformer is the more energetically stable one? Being able to represent these small molecules accurately in plain vacuum, without a complicated protein environment, is important in order to put them back into more complicated environments such as solution, protein, or membrane, and to really draw insights and actually do chemistry and do science with these tools.

So, just to present a high-level view of my workflow so you can get an idea of what's going on: we extract the molecule set from QCArchive and prepare it. This has over 26,000 molecular structures, and we group conformers together.
So everything that has the same molecular connectivity (we're just looking at a toy example of methane here) will be considered different conformers of one parent molecule; here, three of them. So let's say we have this cute little icon representing the QM conformers. Next, we perform energy minimizations with various force fields. Each force field starts from the same initial structure, which is the QM optimized geometry. So let's say these teal, purple, and yellow colors are three different force fields, and they all have the same starting geometry as the QM. And then finally we compare the force field results against the QM reference. So if we want to analyze conformer number two, we can compare it with conformer number two for each force field, and we can look at energetic differences and see how their geometries match up. So let's say we have some structure here; this is the QM structure.

In this work, we're looking at the general AMBER force fields, GAFF and GAFF2. We're looking at the Merck molecular force fields, MMFF94 and MMFF94s, which differ by slight changes in how nitrogens are represented. And then finally we're looking at some force fields that came out of the Open Force Field Initiative, which are Parsley and SMIRNOFF99Frosst.

So just as an overview of my results, I'll be talking first about energies, then I'll be sharing about geometries, and then we'll look at them together and identify specific chemical moieties that can be used for future force field refinement.

Okay, so first we compute relative energies for the force fields and for the QM results. The way we do that is, for some conformer i, we take its energy and subtract from it the energy of a reference conformer. The reference conformer is defined to be the same conformer for all the different force fields and for the QM, and we choose the reference conformer by identifying which conformer is lowest in energy. So, for example, in this blue rectangle
here, let's say the lowest energy is this first one. So the first conformer will be the lowest energy; that will be reference zero, and we apply that reference for all force fields. With this, we have histogrammed the change in relative conformer energies in this plot here, and we see that there are a few different tiers around ddE equals zero, ddE being the change in conformer energy. Ideally, a perfect force field that completely reproduced the quantum conformer energy differences would have just a really high peak at zero and very low histogram tails otherwise. Here we see that generally GAFF and GAFF2 are the highest around zero, and Parsley is up in here too; it's sort of hidden among the blue. Then we have the Merck molecular force fields, MMFF94 and MMFF94s, and then we can see SMIRNOFF99Frosst. So you can see that Parsley truly does improve upon SMIRNOFF99Frosst. Something else that's pretty obvious in this graph is that there is asymmetry in terms of whether the force field overestimates, which would be on this side, or underestimates the conformer energy, especially with SMIRNOFF99Frosst, which has the largest difference between the left and right sides.

Okay, so we decided to extend our energetic analysis further. With a given force field, structures that all start from the QM geometries will not always minimize to the same structures, so you might have some redundant geometries in your conformer set. So we want to say: okay, given that we have matching conformers, what are the relative energies between different conformers?
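As a rough illustration of the relative-energy comparison described above, the sketch below computes ddE for the conformers of one molecule. It assumes the energies are plain lists in the same conformer order for QM and for the force field, and it assumes the shared reference is the lowest-energy QM conformer. The function names and the example numbers are made up for illustration; they are not taken from the actual analysis scripts.

```python
from typing import List


def relative_energies(energies: List[float], ref_index: int) -> List[float]:
    """Energy of each conformer minus the energy of the reference conformer."""
    ref = energies[ref_index]
    return [e - ref for e in energies]


def dde(ff_energies: List[float], qm_energies: List[float]) -> List[float]:
    """Force field relative energy minus QM relative energy, per conformer.

    Assumption: the reference conformer is the one with the lowest QM energy,
    and that same conformer index is used for the force field.
    """
    if len(ff_energies) != len(qm_energies) or not qm_energies:
        raise ValueError("need matching, non-empty energy lists")
    ref_index = min(range(len(qm_energies)), key=lambda i: qm_energies[i])
    qm_rel = relative_energies(qm_energies, ref_index)
    ff_rel = relative_energies(ff_energies, ref_index)
    return [f - q for f, q in zip(ff_rel, qm_rel)]


# Hypothetical energies (kcal/mol) for three conformers of one molecule.
qm = [2.0, 0.0, 3.5]
ff = [10.0, 7.5, 12.0]
print(dde(ff, qm))  # [0.5, 0.0, 1.0]
```

A positive value means the force field overestimates the conformer energy relative to QM, and a negative value means it underestimates it; a perfect force field would give zero for every conformer.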
So we match all conformers with respect to a QM reference within one angstrom RMSD, and then we compute the mean signed deviation of energy, just taking the force field energy minus the QM energy. This is very analogous to the histogram that I showed on the last slide, but this is, one, with matched conformers, and two, now looking at the mean signed deviation, so you can see that a little bit more clearly. This is on a log scale. Showing the different force fields here, you can see that they're all very similar; it's hard to distinguish any differences. But we do see that the median of all these points is below zero, also reflecting that asymmetry that we saw in the force field results. Just to zoom in on this a little bit more: if we overlay all the different force fields, as in here, and then zoom in, we see the blue, orange, and purple curves being on the outside. These are normalized by area, so the area is the same for all the force fields. We can see that these are slightly better, because ideally we would like to see much more of the distribution at zero and very low amounts of data in the tail regions at high and low conformer energy.

Okay, so we're going to change focus a little bit here to talk about geometries, and after that we'll take a look at how they relate to each other.

In this work, we're looking at two different metrics to evaluate geometry. One is RMSD, which is generally more well known and more widely used. The other is called TFD, which is torsion fingerprint deviation. The reason we want to look at TFD is that RMSD has been known to correlate with molecular size, and when we want to identify outlier structures, we don't want artifacts like: okay, this is a large molecule.
That's why it has a high RMSD. We want to identify the molecules that are truly less consistent with the QM structure. Torsion fingerprint deviation takes a Gaussian-weighted difference of the torsion angles between two conformations, and it weights it such that the internal torsions have a stronger contribution to the score than the external ones. It's all normalized from zero to one. A high TFD represents less agreement, just as with RMSD, except RMSD is unbounded, and a low RMSD is like a low TFD in that it signifies more agreement. So we hope that this is more independent of molecular size for comparing two geometries.

The results for the RMSD and TFD analysis are histogrammed on the left side here, and we see a couple of different things. Once again, Parsley does improve compared to SMIRNOFF99Frosst: we can see a reduction in this tail region here as well as an increase in the peak, and we see this trend in both RMSD and TFD. The RMSD and TFD plots are very similar, but once we control a little bit better for molecular size, MMFF94 and MMFF94s are a little bit more consistent with each other, and they do a little bit better than the rest of the force fields in this work.

We can also take a look at specific parameters that are in these outlier regions to focus our efforts on improving force fields moving forward. The way we do that is as follows. Take all the structures that are in the tail region of the RMSD or TFD plot; here, say, we have three different conformers. We can ask ourselves: what is the fraction of this parameter in just that tail region? And then, what is the fraction of this parameter in the whole set?
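One way to picture the two fractions just mentioned is the small sketch below. It is only an illustration under stated assumptions: each molecule is represented by the set of parameter labels it uses (so a parameter counts once per molecule no matter how many conformers or repeats there are), and the fractions are taken over molecules. The function name and the labels other than b83 are hypothetical.

```python
from typing import Dict, Set


def tail_enrichment(
    params_by_molecule: Dict[str, Set[str]],
    tail_molecules: Set[str],
) -> Dict[str, float]:
    """Ratio of the fraction of tail molecules using a parameter to the
    fraction of all molecules using it. A ratio above 1 means the parameter
    is over-represented among the outliers."""
    if not tail_molecules:
        raise ValueError("tail set is empty")
    unknown = tail_molecules - set(params_by_molecule)
    if unknown:
        raise ValueError(f"tail molecules missing from the full set: {sorted(unknown)}")

    n_all = len(params_by_molecule)
    n_tail = len(tail_molecules)
    all_params = set().union(*params_by_molecule.values())

    ratios: Dict[str, float] = {}
    for param in sorted(all_params):
        # Using sets means each molecule contributes at most once per parameter.
        in_all = sum(param in used for used in params_by_molecule.values())
        in_tail = sum(param in params_by_molecule[m] for m in tail_molecules)
        ratios[param] = (in_tail / n_tail) / (in_all / n_all)
    return ratios


# Hypothetical data: four molecules, two of which fall in the RMSD/TFD tail.
molecules = {
    "mol_A": {"b83", "a14"},
    "mol_B": {"b83"},
    "mol_C": {"a14", "n3"},
    "mol_D": {"a14"},
}
ratios = tail_enrichment(molecules, tail_molecules={"mol_A", "mol_B"})
print(ratios["b83"])  # 2.0
```

In this made-up example b83 appears in half of all molecules but in every tail molecule, so its ratio is 2.0, which is the kind of parameter one would flag for a closer look.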
In the RMSD and TFD plots, each conformer was considered separately, so we're plotting each individual structure. But when we are actually doing this analysis, we say: okay, if we see this conformer three times, the molecule is still only here once, and if this molecule has a certain number of repetitions of that particular parameter, it just counts once. So here, for these three conformers, parameter b83 would count one time. From this we get a series of bar plots that really show us: okay, if a parameter is above one, then it is overrepresented in the tails of the RMSD or TFD, and these might be the parameters that we want to focus on. For example, these ones are van der Waals parameters, or these angles.

So now we're going to turn our focus to looking at relative energies versus geometries. We've looked at the geometries and energies separately; now we're just looking at them together. Is there a relationship between the energies and geometries? The short answer is: not directly. If there were a direct relationship, you might see a sort of linear result where, if we have worse geometries, which would be more on the right-hand side, we would have a higher spread of energies. But that is actually not the case, and we're seeing that all the force fields have generally similar distributions. When we're looking at this ddE versus TFD plot, a force field that has really good agreement with the quantum mechanical data will have a high density of points at (0, 0). Here I'm using colors to represent the density of points, since we have over 20,000 points in this plot and it's hard to see individual ones. So if we look at all the force fields together on a linear scale, here is what we can see at first glance.
There's not a ton of difference. We can see that there's a concentration in MMFF94, which has a darker blue region around (0, 0). And we can also clearly see that SMIRNOFF99Frosst is much more spread out in terms of its energies, which is along the y-axis, as well as along the x-axis, which is the geometries. That's consistent with what we saw in the previous 1D plots; now we're just looking at it as individual points. But to get a better sense of what's going on in that central region that we care about, we can look at this result on a log scale as well. So if we compare GAFF and GAFF2, we see that the high-density region is strongly negative, which is that asymmetry we've been seeing, and then it shifts to be a little bit more symmetric on the ddE axis. A similar observation is noted for MMFF94, where there's a little bit more asymmetry, which gets a little bit reduced in the MMFF94s force field. Parsley seems to be more symmetric than the others; we can see it in this blue region here. And then with SMIRNOFF99Frosst, things are just way more spread out, so there's not really a very strong concentration of points at the origin.

Okay, and then if we take a closer look at this TFD plot: this is basically the same plot, but now instead of looking at the density of points, we're looking at specific chemical moieties. Here I identified a few different structure types that contribute to these outliers. If we look at these salmon-colored points here, these are mostly represented by this octahydrotetracene compound, so things with this four-ring structure and lots of oxygen atoms substituted onto the scaffold. We also have this azetidine backbone.
So basically this is looking at the four-membered ring with the nitrogen here, and we see that these account for a lot of the outliers in the TFD: for example, these purple points here, as well as some of the high energies up here and down here. And then finally we also identify N-N compounds, as they're pretty systematic in some of the energies down here and a few up here as well. There are lots of other points that are not highlighted here, but these are less consistent. They're more like individual structures that have maybe a very unique composition of atoms, or specific geometries like a seven-membered ring, that are not consistently represented poorly by the force field. So we do not really focus on these individual points.

Okay, so that brings me to the conclusion, and I think this is about 15 minutes. So at this point, I'll take any questions if you have any.

I'm not really sure whether it's a question, but these plots of differences in energy as a function of deviation in structure don't seem intuitive. I mean, I don't quite grasp them. As you say, the expectation is that one should get a linear kind of shift, so as the energy deviation increases, so should the structural deviation.

Right. So what we're seeing here, and this is a little bit surprising, is that... well, let's break this down into sort of quadrants, right?
So an upper quadrant, so a high TFD and high relative energy, that's what we would expect; and it would actually be either of these two quadrants, because with inconsistent geometries we would expect more inconsistent energies. Now let's turn our focus to this upper left quadrant. That is saying we have pretty good agreement in geometries, but the energies are still off. And then the lower left quadrant would be saying that we have pretty good agreement in geometries, and for the energy... well, I guess we would need to do three: basically the high region, the middle region, and the lower region. So we're seeing that it makes sense that if you have a simple change in a torsion, the force field can still represent the conformer energy well, because there's not much changing other than maybe that torsion angle. But in fact, when we have more changes than just that, it's not always reproducing changes in the conformer energy. And that's something that's consistent among all the force fields that we're looking at.

It seems to indicate that there's something fundamentally wrong with what we have there. A force field should have a penalty every time it deviates from the structure, and here we are getting large deviations in structure and yet the energy differences are the same. So there's something... do you see what I mean?
So this is not a question of the data that you have or the analysis you've done; it's something much broader than that, that something's not being represented. Really, an ideal force field, I mean a mechanical force field, should be something where the penalty is there to cause alignment of the structures, and that penalty doesn't seem to be there to bring them together. So we are getting structural deviation and yet the energies are the same.

Yeah, I think that's a great point, and I'm not sure if it's related to this, but one thing that Lee-Ping has brought up in the past... well, for one, I think that mostly people have taken QM optimized geometries and fit to those, so I don't think looking at relative conformer energies has been something that people do very much. So the fitting tries to get the geometries right to some extent, and then the energetics of variation around a single optimized geometry. But the second thing, really, is the point that what you'd probably like to do is take QM optimized geometries, as we're doing here, and make sure you get the MM energies of those right. But you'd also like to take MM optimized geometries, see what the QM energies of those are, and make sure you get those right. Because otherwise you're doing something that's a little bit asymmetric: I want to make sure my MM energies for these minima are right, but I'm not actually checking the other direction. That is, I find the QM minima and I see if I can capture those, but I don't check to make sure I don't have lower MM minima somewhere else.

I understand.
Yeah. Mark, I think you're muted; I see your lips moving.

Apologies. I'm just asking: is there an issue in trying to compare the QM and the MM energies, especially for the larger molecules? Your QM energies are entirely in vacuo, but your MM energies are only partially in vacuo, because your charge model is assuming a polarized medium. The charge model is AM1-BCC, which is a mimic for HF/6-31G*, which overpolarizes the molecule, but everybody's happy with that because that's actually what happens in the condensed phase. So it's a vacuum situation versus a partially vacuum but partially condensed-phase situation. And especially with larger molecules that could fold up and have intramolecular electrostatic interactions, that's going to introduce an error. I have no idea how you'd correct that.

That's a good point, and right now I don't correct for that, and that could explain some of the inconsistent energies that we see. That's certainly a challenging problem because, as you said, it's so hard to know what the right answer is to that.

I can't see any easy way of fixing it, but it's always going to introduce an error, so there's going to be a limit to how good you can be.

Another thought: one could, of course, carry out QM calculations in the same sort of dielectric that we expect to be in the molecular mechanics force field. That might be an approach.
Yeah. Some of these issues are issues where, at some level, it's easy to argue that the only real way to resolve them is to have some kind of a polarizable force field, because otherwise you just get kind of stuck with not knowing what the right way to compare anything is. But that'll remain challenging, I think, when we're dealing with fixed-charge force fields. I wish I knew what the best thing to do is. Part of the answer, I think, is that these issues should be bigger for larger and floppier molecules, so we may want to separately look at the smaller size range. Maybe we'll break some of these benchmarks out by size eventually. Especially as molecules can do things like form intramolecular hydrogen bonds and so on, they're more likely to end up being big outliers on these sorts of plots.

Could I possibly ask one more, if you have time? Victoria, you mentioned that the RMSD that comes out is unbounded. Is it not normal to normalize that with respect to a number of consequences?

Yeah, I meant to say that if we're looking at a diverse set of molecules, the RMSD will depend a little bit more on molecular size. So a high RMSD might signify that it's a very large and flexible molecule, or it might say that it's a smaller molecule but has a high deviation in geometries. So it's unbounded in the sense that it's not normalized between zero and one.

Okay, right. Yes, it is limited by the size of the molecule.
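To make the bounded versus unbounded distinction from this last exchange concrete, here is a minimal sketch contrasting RMSD with a simplified torsion-based score. It is not the published TFD definition or any library's implementation: the Gaussian weighting function, its width, and all names are assumptions chosen for illustration, the RMSD assumes the two structures are already superimposed, and torsion symmetry is ignored.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """RMSD over atoms for two already-superimposed structures (no alignment here)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need matching, non-empty coordinate lists")
    total = sum(
        (xa - xb) ** 2
        for a, b in zip(coords_a, coords_b)
        for xa, xb in zip(a, b)
    )
    return math.sqrt(total / len(coords_a))


def torsion_deviation(angle_a: float, angle_b: float) -> float:
    """Periodic difference between two torsions in degrees, scaled to [0, 1]."""
    diff = abs(angle_a - angle_b) % 360.0
    return min(diff, 360.0 - diff) / 180.0


def gaussian_weight(bonds_from_center: float, width: float = 2.0) -> float:
    """Hypothetical weight: torsions nearer the molecule's center count more."""
    return math.exp(-((bonds_from_center / width) ** 2))


def simplified_tfd(
    torsions_a: Sequence[float],
    torsions_b: Sequence[float],
    bonds_from_center: Sequence[float],
    width: float = 2.0,
) -> float:
    """Weighted mean of scaled torsion deviations; always between 0 and 1."""
    if not torsions_a or not (len(torsions_a) == len(torsions_b) == len(bonds_from_center)):
        raise ValueError("need matching, non-empty torsion lists")
    weights = [gaussian_weight(d, width) for d in bonds_from_center]
    deviations = [torsion_deviation(a, b) for a, b in zip(torsions_a, torsions_b)]
    return sum(w * d for w, d in zip(weights, deviations)) / sum(weights)


# RMSD has no upper limit: it grows with how far the atoms are displaced.
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]))  # 3.0
# The torsion score cannot exceed 1, even for a fully flipped torsion.
print(simplified_tfd([0.0], [180.0], [0.0]))  # 1.0
```

Because every torsion deviation is scaled to the range zero to one before averaging, the score stays in that range however many torsions the molecule has, whereas the RMSD value depends on how far atoms can physically move, which tends to grow with molecular size and flexibility.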