So this is a tag-team talk between myself, Lee-Ping Wang, and Victoria Lim in my group, who is going to be talking some about work we've been doing in this area as well.

Part of this is quite simple. Valence parameters are the other part of the bonded terms aside from torsions, and a lot of it is just going to rely on ForceBalance. Here's the big picture again: what we're talking about is basically part of the fitting process inside the parameter optimizer, specifically the part for valence terms, and it relates to chemical perception as well.

Right now, where we stand in SMIRNOFF99Frosst, which is our starting-point small molecule force field: we have 87 bond stretch parameters, which cover basically all the chemistry we cover, and just 38 angle bend parameters, which is quite small and probably a little too small in some cases. For rings, in many cases we've relied on the geometric constraints on the ring to set its shape, so we have perhaps fewer angle parameters than we might end up needing. But it also deals with a lot of the duplicate parameters that force fields otherwise have, just by using the better form of chemical perception, so we don't have thousands and thousands of duplicate lines. We think this means it's a really good starting point for refitting these parameters and, in some cases, perhaps adding a few more. We aren't going to see these hundreds and hundreds of lines of exactly the same thing in slightly different combinations of atom types anymore, so we expect that a pretty simple approach is going to give us significant improvement.

What that is is behind the graphic, but we're going to start off by basically refitting all of the valence parameters using the current chemical perception. So without changing the typing, we're just going to be refitting these using force
balance. Also, if we identify cases where we have major problems, like we need an additional parameter for a ring system or whatever, then we can modify the chemical perception by hand. That could be the case for some ring internal angles where we need to add some other ones. So we'll be running lots of quantum mechanics calculations, geometry optimizations and so on, for a good-size molecule set to refit these, and that's sort of the first phase that will come before going on to adjusting chemical perception.

So basically there are three generations of valence parameters that we imagine. The first one will be getting the infrastructure working, so we'll just be refitting selected valence parameters, those that seem especially problematic or high priority, without changing the chemical perception. Vicki in a minute is going to talk some about how we're identifying cases that are either particularly problematic or particularly informative for refitting, to prioritize for that first generation. And as you get your hands on this and start trying things out, if you identify cases you think are particularly problematic or particularly informative, we'd like your input.

Then in the second phase, we'll have all that infrastructure working for fitting these, and we'll be refitting all of the valence parameters with fixed chemical perception, though possibly with some high-priority fixes to the chemical perception that would come in by hand. And the third generation would be refitting all the valence parameters with ChemPer-based chemical perception, so automatically determining which types we should have.

We think probably something like a hundred or less bond and angle parameters will take us through the first two generations of this. Generation one is mostly what I'm going to be talking about right now, because that's most of the infrastructure work; generations two and three are basically just more of the same. So a lot of the infrastructure,
the construction picture, has to do with generation one. So we're going to go over a bit of the detailed plans here, which roughly fall into this.

Generation one is refitting selected valence parameters. So we're going to select the molecule library with particularly informative molecules, and then we have to do conformer generation. There's some work to do on QCArchive to get it to interface with this, and on geomeTRIC and ForceBalance, and Lee-Ping will talk about that. We also want to project vibrational frequencies onto internal coordinates to help with some of this, and Lee-Ping is going to talk about that too. Then we'll refit the parameters, and in the meantime we're also building infrastructure and concepts for fitting impropers. We're going to make broader use of improper torsions than most force fields we've been using have in the past, and we'll talk about that in a minute.

Then in generation two, instead of doing selected valence parameters, we're going to do all of the valence parameters, and also start using impropers a lot more. And then in generation three we'll refit them with chemical perception that is data-driven.

Vicki, do you want to jump in here? Or the podium mic, I think, works better.

As part of our G1 goals, we want to develop an initial molecule set in order to fit these force fields. You can imagine that if you put in a force field with only carbons, hydrogens, nitrogens, and oxygens, you can't expect to use it to simulate something with sulfur or phosphorus. So this project is not focused on the black box, the force field parameterization scheme, but on what molecules we want to put into it to develop this new force field. We want to focus on this orange circle of molecules, and we want to identify particularly informative chemistry. Here we're focused on geometries, structures that are not consistently represented by different force fields. This is a project that is mostly done by an undergrad
in the lab named Jordan Ehrman, co-advised by Caitlin Bannan. Basically, if you imagine all force fields in the limit of perfection, then they will minimize in gas phase to the same structure. So in one case we have this dark green force field, and then in force field two we have this light green, and if you overlay them they should be indistinguishable. But in reality force fields are not consistent for all of chemical space, so you might have these differences in specific regions and rings. These are the chemistries we want to find, because force fields are not applicable for everything you want to simulate. So the goal is to identify sets of molecules that are minimized to different geometries by different force fields. We're not taking any specific force field as the right answer, but we are comparing different pairs of force fields to see how they compare. By doing this project we can develop better force fields by expanding the space of chemistry that is covered.

So this is our brief workflow of what we're doing. We're looking at two molecule sets, DrugBank and eMolecules. We are limiting ourselves to molecules with fewer than 200 heavy atoms, and we are also not considering metals. We are using the same charge set for all of them, with AM1-BCC partial charges. And then, with the same input structure, we have gas-phase minimizations for GAFF, MMFF94 and MMFF94s, OPLS2005, and SMIRNOFF. Thanks.

So, two questions. One is: how are you making sure you're getting the same minima? Because this is only true if you're in the global minimum for the molecules concerned, since there are going to be multiple different conformations available that you could minimize to. And then the second is: is looking at the predicted optimized structure the best option, or, say, might it be better to compare the Hessian matrix?
So, to look at a predicted IR spectrum or something between different force fields for the same molecule.

Okay, so your first question was: how do you know you're getting to the correct local minimum? We're just starting with initial conformations and going to the local minimum, so we're basically checking, from the same starting structure, whether they will get to the same point.

I don't think that's guaranteed. In some cases they end up being different, and we pay less attention to those, basically. We're looking for ones where they basically got to something like the same minimum, but it's different.

Well, could you minimize with one force field and then take that as the input structure, and then minimize with the other force fields to see if they move away from the minimum that it's in?

Yeah. And then the second question was around: is it not better to look at second derivatives rather than actual structures? Yes, we could do that, probably in a later stage of this project. Also, part of this is about, you know, the force fields aren't going to give us the right answer. This is partly about figuring out which molecules, or which chemistry, we're most interested in doing more work on in the immediate term. So it's a quick way to check.

I think this is a nice idea.
And I wouldn't think you'd need to worry so much about local minima, at least with the sorts of structures you're showing here. Also, if you don't get to the same local minimum starting from the same structure, that's informative too. I might suggest that you look at Allinger's work and include the MMx force fields, because in his work, first of all, he's got a wide coverage of interesting compounds, so you could get some information from that. And also, with each of his force fields, MM1, MM2, and MM3, he's documented problematic systems. So you might add his problematic systems to your database, and it would be very impressive if you could alleviate some of the problems that he's been...

Thank you for that suggestion. Okay, so the way we are scoring...

Oh, sorry. What also just came to mind is the database of minimized QM structures. I think it's GDB-9 or something like that. That might also be useful as a comparison.

Okay, so the way we are scoring these molecules is with two metrics. One is TanimotoCombo, which uses a combination of shape and color overlays for the molecule, and the other is something called torsion fingerprint deviation, which measures the torsion deviation, topologically weighted so that central torsions have higher weight than terminal torsions. We use a normalized version of TanimotoCombo called Tani diff, where we want to look for a low Tani diff and a high TFD score.

So I'm just showing a few example molecules that we saw from eMolecules, where these had a higher number of differences among pairwise comparisons of force fields. Showing the aromatic ring on the bottom first, we see that, aligning by the six-membered ring, there is variation in the planarity of the five-membered ring.
So here we have GAFF in yellow, GAFF2 in pink, MMFF94, which is the same structure as MMFF94s, in gray, and then SMIRNOFF in cyan, so we see this range that is shown on the right here. With this double-ring case, we have this cyclobutyl connected to this cyclopropyl, and you can see that if we align by the four-membered ring, these three force fields, GAFF, MMFF94, and SMIRNOFF, place the three-membered ring at a different orientation compared to the four-membered one. And then finally, with this compound here in the middle image, I'm turning off the hydrogens so you can better see that we again have a differently represented planarity of this ring, compared to the two that are aligned.

These are just preliminary results, and we don't really have much to conclude from them yet. But the next step in this project is to group these molecules by chemical similarity. In other words, can we say there are particular functional groups that are more poorly represented by these force fields, and that are less consistent among these different force fields? From these molecules we will build this set, which will lead to the next stage of generating perturbed conformations of these molecules, to then interface in a third step with geomeTRIC and ForceBalance to fit valence parameters. So I think this is the end, if anybody has any more questions.

The end of that part, at least. We can take more questions at the end too. Okay, so this is Lee-Ping, right?

Yeah. Part of that is, you know, if we're going to parameterize on a pretty limited molecule set to begin with, we'd also like to add some value at the same time. So if we can identify regions that are not well described by current force fields and prioritize those in the first round, we're going to know we're doing something interesting and worthwhile in that first round.

All right. Oh, John, just a quick question.
Are you saying vibrational frequencies? Is that loose there? Sorry. Sorry, I'll wait till you're...

Okay, sure. Yeah, there'll be a slide or two on that.

Okay, so in my portion of the talk I will describe a procedure for generating a quantum chemical data set to fit generation-one valence parameters, and at least my initial expectation is that this plan should carry us into generation two and generation three; we'll just generate more data.

We think valence parameters should be optimized to reproduce three types of properties from quantum chemistry calculations, or at least these three: gas-phase minimum-energy geometries, which Victoria just talked about; vibrational frequencies from a vibrational analysis; and potential energy scans along certain degrees of freedom where you expect the deformations to be larger. I'm just going to refer to these as soft degrees of freedom.

While these calculations are ones that we can run on our machines or clusters at an individual scale, creating these data sets and fitting parameters on the scale needed to optimize the small molecule force field is going to require a tighter integration of the software components. So here are the software pieces that we're going to use, which were initially introduced yesterday, and some additional coding and infrastructure is needed to help the components really talk to each other and work together. And in the next few slides
I'm just going to highlight what still needs to be done, using underlines.

So first let's take a look at gas-phase minimum-energy geometries. ForceBalance already implements a target to fit parameters to minimize the RMSD of a minimized structure to a quantum mechanical reference. The existing procedure is that you prepare the target first, which includes the minimized structure from quantum chemistry, and then you go through the standard ForceBalance procedure: you write the force field file with the current values of your parameters, run the molecular mechanics minimization with a very tight energy threshold, which is usually not costly because you're just minimizing a small molecule, and then you compute the RMSD to your QM structure and add it to the objective function. Then you can compute derivatives of the RMSD with respect to your force field parameters simply with finite differences.

There is kind of a problem here, in that RMSD does not penalize geometry errors evenly across parameter types, because RMSD is a global metric of geometry difference. If you have one degree of freedom that leads to the overall shape of the molecule changing, for example a torsion in the middle of a large molecule, that could lead to a very large RMSD, whereas if you're wrong in a bond length, that will lead to a very small RMSD. This is illustrated by the two figures down here: on the left side you have a large RMSD, and on the right side, borrowing Victoria's figure, you have a small one. So we want an objective function that is a little more representative of differences in individual valence-type degrees of freedom.

This target is not implemented yet, but I have a plan for it that is described here.
The way this target is going to be implemented is the following. The top-level organization is that each target can have one or more properties, and here we're going to have three groups of terms: for bond lengths, for angles, and, if I group improper torsions into the valence terms, impropers are going to be a third group. Because deviations in bond lengths, angles, and dihedrals all have different physical units, we normalize each term, with a reasonable normalization for deviations in bond length being 0.1 angstroms, for example. Here I'm just showing you the first term in the sum, but the second and third terms are strictly analogous to the first, so there's really no need to show them.

The first sum is going to be over groups of bonds, where the grouping is determined by the SMIRNOFF parameter assignment. So we're going to use the force field to define the target, in the sense that when we are measuring errors in the bond lengths with respect to QM, we're going to put them into different bins as determined by the SMIRNOFF parameter assignment. Then the second sum, which is inside the first one, is over the bonds that are within a group.

The reason why we structure the sum in this way is that certain bond parameters are going to be assigned much more often than others. For example, an aliphatic C-H bond is going to appear many, many more times than, I don't know, a sulfur-nitrogen bond. You might not want your objective function to be dominated by a particular type of bond that appears many times in your molecule set, and the nesting structure of the sum here is really just intended to ensure that every bond type contributes by roughly the same amount. And that's pretty much all there is to it.
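A minimal sketch of the nested sum described above, for the bond term only. The talk does not give the exact formula, so the per-group averaging is one assumed way to make every parameter contribute roughly equally. The parameter labels `b_CH` and `b_SN` and the function name `grouped_term` are hypothetical, and this is not ForceBalance code.

```python
from collections import defaultdict


def grouped_term(deviations, scale):
    """Sum over parameter groups of the mean squared, normalized deviation.

    deviations: iterable of (parameter_label, mm_value - qm_value) pairs,
                where the label is the assigned force field parameter.
    scale:      normalization for this kind of coordinate
                (0.1 angstrom for bonds is the example given in the talk).
    """
    groups = defaultdict(list)
    for label, delta in deviations:
        groups[label].append(delta)

    total = 0.0
    for deltas in groups.values():
        # Assumption: averaging inside each group stops a frequently
        # assigned parameter from dominating the objective function.
        total += sum((d / scale) ** 2 for d in deltas) / len(deltas)
    return total


# Toy data: one hundred aliphatic C-H bonds, each off by 0.01 angstrom,
# and a single S-N bond off by 0.05 angstrom.
bond_devs = [("b_CH", 0.01)] * 100 + [("b_SN", 0.05)]
bond_term = grouped_term(bond_devs, scale=0.1)
```

With this weighting the single S-N bond contributes 0.25 and the hundred C-H bonds together contribute about 0.01. The angle and improper terms would reuse the same function with their own normalizations, which the talk does not specify.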
Okay, I expect this implementation should be pretty straightforward.

Next, moving on to vibrational frequencies. ForceBalance does have an existing vibration target. The way this works is that you start with your target preparation, which means you minimize the energy, followed by a vibrational analysis. ForceBalance has a script to convert different types of vibrational output from quantum chemistry programs into a common format that it likes to read. Then, after you parameterize the molecular mechanics system, you compute the molecular mechanics vibrational modes and frequencies. Even though our ultimate goal is to fit the frequencies, you have the problem that the vibrational modes are not exactly the same. So an assignment step is performed first, to assign each molecular mechanics vibrational mode to the closest corresponding quantum vibrational mode, before you compute the objective function, which is a least-squares quantity in the frequency difference.

In practice, using conventional functional forms, by which I mean the functional forms that we're using, we're able to reproduce frequencies to within 100 to 200 wavenumbers. Even though this isn't a very exact reproduction, I think the vibrational frequencies are mainly important to make sure that your force constants don't go too far away from physically motivated values.

For the additional code that needs to be added to make this part work: OpenMM does not natively support vibrational mode calculations. I think vibrational modes tend to be a slightly more esoteric feature in molecular mechanics codes; not all of them have it.
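The vibrational target described above can be sketched roughly as follows, assuming NumPy is available. The function names are hypothetical, the greedy overlap-based assignment is only one simple way to pair modes and is not necessarily what ForceBalance does, and unit conversion to wavenumbers and removal of translations and rotations are left out.

```python
import numpy as np


def normal_modes(hessian, masses):
    """Diagonalize the mass-weighted Cartesian Hessian.

    hessian: (3N, 3N) second-derivative matrix; masses: length-N sequence.
    Returns eigenvalues in ascending order (proportional to frequency
    squared) and the corresponding modes as columns.
    """
    m = np.repeat(np.asarray(masses, dtype=float), 3)
    mw_hessian = np.asarray(hessian, dtype=float) / np.sqrt(np.outer(m, m))
    return np.linalg.eigh(mw_hessian)


def assign_modes(qm_modes, mm_modes):
    """Greedy assignment: each QM mode takes the unused MM mode with the
    largest absolute overlap. Returns one MM mode index per QM mode."""
    overlap = np.abs(qm_modes.T @ mm_modes)
    used, assignment = set(), []
    for row in overlap:
        for j in np.argsort(-row):
            if int(j) not in used:
                used.add(int(j))
                assignment.append(int(j))
                break
    return assignment


def frequency_objective(qm_freqs, mm_freqs, assignment):
    """Least-squares difference between QM and assigned MM frequencies."""
    mm = np.asarray(mm_freqs, dtype=float)[assignment]
    return float(np.sum((mm - np.asarray(qm_freqs, dtype=float)) ** 2))


# Toy check: two unit masses joined by a spring (k = 3) along x.
k = 3.0
hess = np.zeros((6, 6))
hess[0, 0] = hess[3, 3] = k
hess[0, 3] = hess[3, 0] = -k
eigvals, modes = normal_modes(hess, [1.0, 1.0])
```

In the toy system the only nonzero eigenvalue is the stretch, equal to 2k for unit masses; the other five correspond to translations and to motions this toy Hessian does not restrain.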
Amber does have the nmode program, but I think it's pretty old. Some folks in the audience will have a better idea of that than me. So basically we need to implement a vibrational analysis for OpenMM, but that should be straightforward. ForceBalance has wrapper code around OpenMM in order to execute the OpenMM simulation, so we'll just implement it in there. Yeah, question?

If you're going to fit vibrational frequencies, analogous to the previous question, you're obviously calculating the Hessian, so presumably you could fit the Hessian, as was suggested. And if your quantum program gives you analytical Hessians, it may be far more efficient than leaving them out; you get a lot of free information. I mean, it's taking ForceBalance one step further.

Excellent point, and we could certainly implement a target to fit the Hessian. The choice to fit vibrational frequencies instead of Hessians was... I can kind of hand-wave that justification, but the point is that if you fit vibrational frequencies, you're only fitting at the minimum. What he's suggesting, and I think what the previous question referred to, is that you could fit Hessians all along the scan.

Mm-hmm. And also, just a philosophical point, but the vibrational frequencies and Hessians are the curvature of the surface, so they are as relevant to the force field as first derivatives and energies.

I think Chris has a coupled comment or question.

Different comment, different question.

Oh, okay. So do we want further discussion on this?

I have a comment on that. My quick comment is that we should include Hessians, and there's no reason why
we would need to choose either Hessians or vibrations. We could certainly do both, and there's much added value to including a Hessian. So we should do it.

I guess what I was going to suggest is that it seems to me that by fitting to the Hessian, let's say at the local energy minimum, one does avoid the mode correspondence, having to decide which mode goes with which, and that could get messy in some cases, I imagine, or ambiguous.

It can get messy, and I agree that if you can perfectly reproduce the Hessian, you've also perfectly reproduced the modes. The reason why we do the mode decomposition initially is that it gives you some extra emphasis on the low-frequency modes, because the low-frequency modes come from small numbers in the Hessian, whereas the values in the Hessian are probably dominated by the stiffer degrees of freedom such as the bonds. So we thought that by doing the eigendecomposition you actually get to explicitly see what the low-frequency modes look like and target them directly, and that is what I think is the added value of doing the vibrational analysis.

So it's interesting. I never thought about this before, but if you diagonalize it, then you could still in principle, I suppose, try to match the diagonalized components as opposed to doing mode overlays. I don't know. If you diagonalize the QM Hessian, and then you diagonalize the MM Hessian, and then you fit the diagonalized components, and also the eigenvectors, and we just put them in order, then is that the same thing as mode matching?

Yeah, I think that if we order the eigenvalues, then we are basically doing mode matching in terms of increasing frequency.

I think Alberto had something, and then Chris, and then we should come back to Arnie.
Yeah, I'm not sure I caught your remark, but if you're fitting Hessians you don't need to diagonalize, and you have a direct one-to-one correspondence between...

Absolutely, absolutely. Let's go to Alberto and then Chris.

Just a comment. Are you looking at forces as well? Because I see only the second derivatives here.

Yeah, there's a third target that I'm going to get to soon.

So what was really nice with your previous slide is that you can look at the equilibrium geometries and partition that out amongst the internal degrees of freedom, where you're fitting the equilibrium values directly, so you get a direct comparison to a force field parameter. With the vibrational frequencies, or even the diagonalized Hessian, you can know that your vibrational frequency is off, but it's much harder to attribute that to a specific parameter. So this will give kind of a global readout on the quality of your force field, but it's going to be hard to focus it on a specific parameter. Do you have any plans there? Can you project that in somehow?

Yeah, so this is something I didn't directly think about before, but there'll be a related slide coming up. I think you could translate your Hessian into internal coordinates, and we have most of the code in place for doing that. We just don't have the second derivatives of internal coordinates with respect to Cartesian coordinates, but we can certainly do that, and it might be a better idea.

I think there's also a slightly larger point, which is that the vibrational frequencies will be coupled to the Lennard-Jones, for example. So you don't necessarily want to be fitting your valence parameters to get the vibrational frequencies right if the difference in vibrational frequencies is coming from a steric clash, right?
So that's something that we have to just be aware of and keep coming back to, especially this coupling.

And you're absolutely right: if you transform your Hessian matrix into internals, you can look at it and get the force constants, that is, the force constant matrix, except that in some sense it's the force constant in a way that is coupled intimately to the nonbonded parameters as well, the Lennard-Jones or whatever.

Actually, you're right, and if you look at Krimm's work, he has a very nice trick where, given the nonbonded terms, he subtracts the second-derivative matrix of the nonbonded terms out of the quantum Hessian, and then you're left, exactly as you're saying, with just the internals. So he reads off the force constants by subtracting off...

I need somebody on Slack to write down the name of this person in our valence channel. Who is it?

Sam Krimm.

Sam Krimm. Okay. The SDFF force field. Okay, so we should look at that. That'd be great. All right, let's plunge ahead then.

Okay. The third... let's see, I guess these animated GIFs might not be playing just because... okay, all right, well, it's playing now. So the third type of target is these energy scans along the softer degrees of freedom. The vibrational analysis is mainly intended to cover the majority of displacements that you expect to be relatively stiff, where you don't get larger displacements. But you might have anharmonic effects and coupling along those degrees of freedom where larger displacements are expected.
Okay, so this is just an example of a calculation where Jessica in David Mobley's lab provided this molecule, and we used geomeTRIC to scan along an angle and an improper dihedral degree of freedom, producing this two-dimensional potential energy map. This is the kind of thing that you might be interested in fitting parameters to, and in seeing how important the anharmonic contributions might be. So the initial plan is to use geomeTRIC to produce these one-dimensional or two-dimensional scans, working with QCArchive to deploy the calculations at scale and retrieve the data.

But if you have a large set of molecules, say a few thousand, it might not be straightforward to identify which degrees of freedom are soft and are the ones that need to be scanned. This is where the idea of the vibrational analysis might come into play again. So this is again an animated GIF, and there it is. Here the basic idea is that, using the Hessian, you can automatically detect the soft degrees of freedom, either by projecting your vibrational modes onto internal coordinates or by translating the entire Hessian into internal coordinates. In geomeTRIC we already have an internal coordinate system implementation that lets you do the molecular geometry optimization, and putting in second derivatives is something that we wanted to do anyway, because we want to optimize transition states as well.
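One way to picture the second option, translating the whole Hessian into internal coordinates, is the sketch below. It assumes NumPy, a bent triatomic with two bonds and one angle, and a geometry at a stationary point, where the gradient term of the transformation vanishes. The B matrix is built by finite differences purely for illustration, the force constants are made-up numbers, and none of this uses geomeTRIC's own internal coordinate code.

```python
import numpy as np


def internals(x):
    """Bond lengths 0-1 and 0-2 and the angle 1-0-2 of a triatomic.

    x is a flat array of 9 Cartesian coordinates."""
    p = x.reshape(3, 3)
    u, v = p[1] - p[0], p[2] - p[0]
    ru, rv = np.linalg.norm(u), np.linalg.norm(v)
    cos_theta = np.clip(np.dot(u, v) / (ru * rv), -1.0, 1.0)
    return np.array([ru, rv, np.arccos(cos_theta)])


def b_matrix(x, h=1e-5):
    """Finite-difference Wilson B matrix dq/dx, shape (n_internal, 3N)."""
    columns = []
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = h
        columns.append((internals(x + dx) - internals(x - dx)) / (2 * h))
    return np.array(columns).T


def internal_hessian(hess_cart, B):
    """H_q = (B^+)^T H_x B^+, valid at a stationary point."""
    B_pinv = np.linalg.pinv(B)
    return B_pinv.T @ hess_cart @ B_pinv


# Water-like geometry; hypothetical force constants for two stiff bonds
# and one softer angle.
x0 = np.array([0.0, 0.0, 0.0, 0.96, 0.0, 0.0, -0.24, 0.93, 0.0])
H_q_true = np.diag([8.0, 8.0, 0.7])
B = b_matrix(x0)
H_x = B.T @ H_q_true @ B            # Cartesian Hessian implied by H_q_true
H_q = internal_hessian(H_x, B)      # transform back to internals
softest = int(np.argmin(np.diag(H_q)))
```

Here the angle (index 2) comes out as the softest coordinate. Bond and angle force constants carry different units, so in practice some normalization would be needed before ranking them against each other.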
So this will have some benefits in multiple directions. And this is just an example of a molecule where a vibrational analysis appears to reveal a softer vibrational mode that you might want to scan along when you're automatically generating your QM data set.

Okay, and then how are we going to work with QCArchive? From my interactions with Daniel Smith, I've developed a pretty superficial understanding of how a user interacts with QCArchive. Basically, suppose QCArchive has defined a workflow called "minimize then frequencies." If this workflow exists, then we can ask QCArchive to execute it by providing a molecule and a quantum chemistry specification, and then QCArchive will do the computation or retrieve an existing one, and we will receive the minimized geometry and the Hessian and vibrational data.

The next step is that we can locally run our internal coordinate code to detect the soft valence degrees of freedom. We expect this will probably be easier than the molecular fragmentation that Chaya talked about, which seemed to be a much more challenging problem.

Then, in the third step, after we have identified which degrees of freedom we want to scan, we can ask QCArchive to do that step as well, as long as QCArchive has that workflow precisely defined, and then we will receive the minimized structures and energies. Gradients are also really important; we will get those on a grid.
Okay, so, for fitting our parameters as well.

And here are the needed software pieces to make that work. First, the QCSchema for storing vibrational data, which is basically the standardized file format, still hasn't been fully designed. So we just need to finish the design of that file format and decide what it needs to contain.

We also haven't really decided how to specify a potential energy scan. In a sense this is a lot easier than a torsion drive, which goes all the way around because the torsional degrees of freedom are much more free. But the procedure is a little bit different now, because we no longer have the wavefront propagation and we are mainly going to be propagating around a minimum. So there are some questions there.

Then, once we have precisely defined the workflows we want, they can be implemented into QCArchive. The way I understand it, the definition of the workflow takes a long time, because it takes a lot of meetings and planning, but once it's been designed, the implementation seems to happen quickly. And then there needs to be some code that automatically downloads and converts this data into the ForceBalance native formats that can be used for parameter optimization.

I think that is the end of my part. So back to you, David.

Thanks, Lee-Ping. Yeah, so it's quite a bit to do, but once it's all in place, actually refitting, we think, is going to be straightforward. Here's one example of a sample angle refit that I think Lee-Ping did in, I don't know, overnight or a few minutes or something, the other day over break.
So basically we'll be pipelining whatever molecules we end up with: they'll go through conformer generation, then the workflow for scanning, and then ForceBalance for refitting. We're going to be focusing on parameters that we haven't tested very much and/or that are identified as particularly problematic or informative, both things we draw from Vicki's, Katelyn's, and Jordan Airman's work and things that you highlight in your own testing. Because the largest part of this first phase is just getting the infrastructure working, we want to make sure that while we're getting it working, we actually provide some value at the same time. And we can already do this for selected parameters. This is an example in dimethylamine, where in the original SMIRNOFF99Frosst force field the angles around the nitrogen are too stiff. So you see that the MM energies are high, and then after a quick refit of those we end up with the orange. So that should be straightforward. At the same time, we're also going to be working on concepts for fitting of impropers. This seems to have largely been neglected, at least in the force fields we're familiar with. These are GAFF's impropers; you might notice there are really only two values, 10.5 or 1.1. They're only used to ensure planarity, and we think they should perhaps be used more broadly, for example for partial planarity. So we're starting to work through that. One thing we're particularly interested in is nitrogen. So here's a small set we've put together, and this is Jessica Mott. Can you wave your hand, Jessica? She's working on this aspect.
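To make the "too stiff" angle example concrete, here is a toy one-parameter refit, assuming a harmonic angle term E = 0.5 k (theta - theta0)^2 with theta0 held fixed. This is not what ForceBalance does internally (it optimizes many parameters against many targets with regularization); the function name and all numbers are made up for illustration.

```python
def fit_harmonic_k(angles_deg, energies, theta0_deg):
    """Least-squares force constant for E = 0.5 * k * (theta - theta0)**2.

    Energies are relative to the minimum and theta0 is held fixed.
    Minimizing sum (E_i - 0.5*k*d_i**2)**2 over k gives
    k = 2 * sum(E_i * d_i**2) / sum(d_i**4), with d_i = theta_i - theta0.
    """
    d2 = [(a - theta0_deg) ** 2 for a in angles_deg]
    numerator = sum(e * x for e, x in zip(energies, d2))
    denominator = sum(x * x for x in d2)
    return 2.0 * numerator / denominator


# Synthetic "QM" scan generated from a softer constant than the starting one.
angles = [100.0, 105.0, 110.0, 115.0, 120.0]
k_reference = 0.02   # made-up value, kcal/mol/deg^2
k_initial = 0.05     # made-up, too-stiff starting parameter
qm = [0.5 * k_reference * (a - 110.0) ** 2 for a in angles]

k_fit = fit_harmonic_k(angles, qm, 110.0)
print(f"initial k = {k_initial}, refit k = {k_fit:.4f}")
```

With a too-stiff starting constant the MM curve sits above the QM points away from the minimum, which is the behavior described for dimethylamine; the refit lowers k until the curves agree.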
So she's working on this set, and this is work Vicki actually did just before Jessica started. This is looking at the sum of angles around the nitrogen center for this set of molecules, and the nitrogen we're talking about is highlighted. You have pyramidal near the bottom and planar near the top, and then you have this intermediate gray region where things are neither pyramidal nor planar. So depending on what chemistry you're looking at, you can go relatively smoothly between pyramidal and planar, and we would like to do a better job treating that in the force field. Currently, at least in the force fields we're most familiar with, all nitrogens are either pyramidal or planar, never anywhere in between. They may disagree about which is which, but you never have any in between, and that obviously should not be the case. We have some longer-term plans that could actually allow us to modulate more smoothly between pyramidal and planar, which we could talk about. But yeah, so there's a range of geometries, and right now either... That was exciting. Force fields treat them as either-or, but really there should be a range that's possible in between. And we think with suitable impropers we can basically get the right planarity for almost everything and go smoothly in between. So we're going to be using geomeTRIC to work on that, and this is something where we're getting these concepts in place while working on the more traditional bond and angle terms. What we want to do is take a diverse set of small molecules with a range of pyramidalization, perturb the improper and the corresponding angles, compute geometries, and then fit new force field parameters with ForceBalance, to figure out what we're going to need to be doing on these more broadly. So to give you just one quick sample, here's one case.
It's one we've looked at already. We're looking on the left at the energy as a function of improper angle. We have the quantum energy there in blue, and what we get with SMIRNOFF99Frosst in red, which is rather similar to what you would get with GAFF or similar. So this is a case where GAFF basically makes it planar, and SMIRNOFF99Frosst is similar. Lee-Ping just did a bit of a scan with geomeTRIC, and then Jessica refit this with ForceBalance. After refitting we have the purple, which basically overlaps well with the QM. So this is an intermediate planarity case: it has the two minima and is indeed of intermediate planarity after refitting. This is just one molecule, done in isolation, so these aren't general parameters; we aren't going to be using these parameters across a whole set of molecules. But that's what comes next, as we head towards seeing how generally we can do this. We think this is important because we certainly care a lot about nitrogens. So generation one is developing those concepts and fitting selected bonds and angles. In generation two we'll be fitting all the valence parameters we have while maintaining the existing chemical perception, so basically the same thing as generation one but more broadly. We're going to need to ensure we have a broad molecule set that has adequate coverage of every parameter, which is an easy thing to check: it's a substructure query language. So suppose we have parameters that aren't being exercised.
We just query databases and pull more molecules that exercise that parameter, and you can contribute your fragments of interest. I think we could probably also use a lot of the torsion drive fragments as molecules that we make sure we cover; that's a logical place for synergy. So we'd be generating and storing quantum data for all the molecules and fragments and using ForceBalance to refit. At this point, in generation two, we begin fitting a significant set of impropers, because we should have the infrastructure built out for this. And then generation three is a full valence refit with data-driven types. Katelyn talked yesterday about automating chemical perception. So we're starting off with just the types we currently have and then expanding those as needed; generation three would be starting over basically from scratch and learning which types we should have in a data-driven manner. The types will then be driven by data rather than by either legacy types or expert-generated types. So to summarize: we plan to use minimized geometries, vibrational frequencies, and energy profiles, potentially also Hessians, in fitting these. We can already refit selected parameters without any headache just using ForceBalance directly and doing things locally, but we're building out a lot of infrastructure to make this really large-scale, automated, reusable, and reproducible. And there's quite a bit of interesting science to do relating to impropers, I think. Acknowledgements: I don't have Jessica up here, but she's right there, and she did some of this. Jordan is also not up there, but he's an undergrad volunteer in the lab who's done this millions-of-molecules minimization project. And on Slack, this is the valence channel if you want to talk about it. So I'm happy to take more questions there.
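The planarity measure used in the nitrogen plot earlier in the talk, the sum of the three angles around the nitrogen, can be computed from Cartesian coordinates as in the sketch below. The function names and the toy coordinates are illustrative assumptions, not code from the project.

```python
import math


def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with b at the vertex."""
    u = [a[i] - b[i] for i in range(3)]
    v = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against round-off
    return math.degrees(math.acos(cosine))


def angle_sum(center, n1, n2, n3):
    """Sum of the three neighbor-center-neighbor angles, in degrees."""
    return (angle_deg(n1, center, n2)
            + angle_deg(n2, center, n3)
            + angle_deg(n1, center, n3))


# Toy geometries: a flat trigonal center and a strongly pyramidal one.
h = math.sqrt(3.0) / 2.0
planar = angle_sum((0, 0, 0), (1, 0, 0), (-0.5, h, 0), (-0.5, -h, 0))
pyramidal = angle_sum((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1))
print(round(planar, 3), round(pyramidal, 3))
```

A perfectly planar center gives 360 degrees; an ideal tetrahedral center would give about 328.4 degrees (three times 109.47), and the intermediate cases discussed in the talk fall in between.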
We have about nine or so minutes before break. Is it break that's next? Yeah.
I'm told it's on. Oh, there you go. Harping back on the frequencies versus the Hessian: you're probably going to have, what, hundreds of thousands of frequencies, and so there will probably be thousands and thousands that are improperly matched, by accident or whatever. So if the goal is to get to something closer to a turnkey production of a force field from a given set of quantum data, then putting in the effort to fit the Hessians instead of the frequencies sounds like it's going to be pretty important, as a pragmatic matter if not as a scientific matter. The other issue: Lee-Ping mentioned that you really want to fit the lower modes more carefully than the others, but you can probably achieve what you want there by increasing the weight on the 1-4 Hessian elements.
Lee-Ping, you want to address that? Hold your microphone.
Yeah, so I understood the first part; I might ask you to repeat the second part. So I guess I'll start with the first part.
My experience using the vibration target is that the vibrational modes are never perfectly matched, and there are two ways to match them. The first way is you solve the assignment problem, and then every MM mode gets matched one-to-one to a QM mode in the way that's the combinatorial optimum. But you get certain modes that really overlap very, very poorly, and your objective function turns out to be very rough and hard to optimize. The other way is that you simply match each MM mode to the QM mode that it overlaps with most. In that case you get between 10 and 20 percent of the MM modes mapped to the same QM mode as another MM mode. Again, that's certainly imperfect, but in practice we've found it to work when we don't rely on the vibrational data alone; we are doing this in conjunction with a lot of other data, such as the potential energy scans. And it does tend to be along the soft modes where the vibrational assignment performs worst.
Right, which is where you're most interested, right?
Right.
So the other issue: you're calculating the frequencies on the force field version at the QM structure, is that right?
No, we minimize with the force field.
It is minimized. So then you're comparing modes of two slightly different structures as well, which maybe is an additional compounding factor. Is that right?
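The two matching strategies described in this answer can be compared on a toy overlap matrix, as in the sketch below. It assumes a square matrix whose entry [i][j] is the overlap between MM mode i and QM mode j; the function names and numbers are illustrative and this is not the ForceBalance implementation. The one-to-one assignment is done by brute force over permutations, which is only practical for a handful of modes; a production code would use a polynomial-time assignment solver instead.

```python
from itertools import permutations


def match_by_max_overlap(overlap):
    """For each MM mode (row), pick the QM mode (column) it overlaps most."""
    return [max(range(len(row)), key=row.__getitem__) for row in overlap]


def match_by_assignment(overlap):
    """One-to-one matching that maximizes the total overlap (brute force)."""
    n = len(overlap)
    best = max(permutations(range(n)),
               key=lambda perm: sum(overlap[i][perm[i]] for i in range(n)))
    return list(best)


def fraction_sharing(matches):
    """Fraction of MM modes whose QM partner is also claimed by another MM mode."""
    return sum(matches.count(m) > 1 for m in matches) / len(matches)


# Made-up overlaps: MM modes 0 and 1 both resemble QM mode 0.
overlap = [
    [0.9, 0.1, 0.0],
    [0.8, 0.5, 0.1],
    [0.1, 0.2, 0.9],
]
greedy = match_by_max_overlap(overlap)
optimal = match_by_assignment(overlap)
print(greedy, fraction_sharing(greedy))
print(optimal, fraction_sharing(optimal))
```

In this toy case the max-overlap rule maps two MM modes onto the same QM mode, while the assignment forces MM mode 1 onto a QM mode it overlaps with only moderately, which is the trade-off the answer describes.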
Yeah, I think a lot of this working is predicated on the MM and the QM giving you optimized geometries that are very close together. If you have very significant RMSDs between your QM and your MM geometries, then this certainly won't work. Or if you have, like, two different protein structures, it's definitely not going to work.
Right, but again, if you were fitting the Hessian, you would fit at the same structure, and then there's no issue of whether your structure shifted from one to the other. There aren't going to be artifacts put in by that structure shift, especially if there's a soft potential.
Yeah, that's right. It makes sense that the Hessian should be fitted using the same structure and not different ones.
I guess you touched on this and I probably missed it at the beginning of the talk, but what are your thoughts on integrating this with the Lennard-Jones terms? Because everything's going to change, I imagine, when those change.
Yeah, and so maybe this is almost a question for John, but the big vision is that everything gets fit all together in some sense. So we have these infrastructure-building phases, and then there are parameterization sprints where everything's fit in a coupled manner and then tested.
Okay, so it sounds like you take a shot with the existing Lennard-Jones parameters for the first generation.
We aren't doing Lennard-Jones refitting quite yet, so it's just going to be changing some of the existing valence parameters.
So we could probably do a lot of tuning of these methods independently, right?
But then put everything together; the whole point of the whole project is to be able to integrate everything together and optimize all of the different components concurrently. One thing we do have to talk about, which we haven't brought up yet, is what about the 1-4s? What are we going to do? Are we going to do the same thing that everybody's done in the past? Because that's one of the primary means of coupling between those, and the more we can decouple them, the easier it'll be to go back and forth between different components.
Yep. Any thoughts, or people passionate?
I'm passionate about the 1-4s, but my question is not related to that. With regard to the impropers, it seems like there's often a lot of overlap between impropers and other dihedrals that are in that ring already.
Yeah, right.
And so when I was fitting dihedrals, I never really bothered with impropers, thinking that they are kind of like constants, like you've pointed out. But there are dihedrals on top of them that are doing pretty much the same thing. So do you have any plans, since the rest of SMIRNOFF is all about culling excess parameters, to reduce that dimensional space?
So, I mean, I think with the automated chemical perception we should be able to ask how many impropers we really need. And in fact, there are three right now in total in the whole force field. And, you know, they're mostly in aromatic rings.
Although notice none of these are cases where it's in a ring. Chris, and then Arnie.
So going back to the 1-4s, I think there's been very good discussion homing in on this issue of the coupling with the 1-4s. And I just think we need to be aware that this problem is even worse than it initially appears, because a lot of the density calculations and bulk liquid calculations focus on the distant part of the 1-4, dominated by the 1/r^6 term, whereas the internal energies, particularly for sterically congested structures, which I think will be instrumental in helping us tune the 1-4s, are in the repulsive part, because they will be going up this barrier in the repulsive part. This is also going to be the dominant factor in the 1-4 van der Waals as well. So I think it is wonderful that we're aware of the issue and that we know we've got to try and solve it. And we might have to think carefully, especially since we're bringing such diverse bodies of data to try and co-optimize these potential functions.
So if we had unlimited time, we would clearly have a breakout session on 1-4s, and you and Dave would take several years to solve that problem. Arnie, you had something?
By overlapping the frequencies, I assume you mean taking the product of the vectors. One of the ways you might address the correlation is by using some of the more rigorous forms of out-of-plane coordinate, such as the Wilson, Decius, and Cross one, or the one we proposed, which is the distance out of the plane: if you take the distance out of the plane of the other three atoms, you're almost...
Yeah, I think I'd like the distance out of plane better than what we have right now. Mathematically, it's easier too.
Just as far as test molecules, obviously you're going to have a whole set of these as you look at these. So I guess it's like in the urea case.
You might want the aryl ureas plus the mixed aliphatic-aromatic ones, and then with a degree of electron withdrawing in your ring. You can probably come up with an unlimited set for those, but still be able to cover it and make sure that it works well across the set as well.
Yeah.
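The distance-out-of-plane coordinate raised in the discussion above can be sketched as the signed distance of the central atom from the plane through its three neighbors. This is one plausible reading of that suggestion, not the Wilson, Decius, and Cross definition and not an existing force field term; the function name and coordinates are illustrative.

```python
import math


def out_of_plane_distance(center, a, b, c):
    """Signed distance of `center` from the plane through atoms a, b, c.

    The sign depends on the order of a, b, c (it flips if two are swapped).
    """
    ab = [b[i] - a[i] for i in range(3)]
    ac = [c[i] - a[i] for i in range(3)]
    # Plane normal from the cross product ab x ac.
    normal = [ab[1] * ac[2] - ab[2] * ac[1],
              ab[2] * ac[0] - ab[0] * ac[2],
              ab[0] * ac[1] - ab[1] * ac[0]]
    length = math.sqrt(sum(x * x for x in normal))
    offset = [center[i] - a[i] for i in range(3)]
    return sum(o * n for o, n in zip(offset, normal)) / length


# Toy geometries: a pyramidal center and a flat trigonal one.
h = math.sqrt(3.0) / 2.0
print(out_of_plane_distance((0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)))
print(out_of_plane_distance((0, 0, 0), (1, 0, 0), (-0.5, h, 0), (-0.5, -h, 0)))
```

The value is zero for a planar center and grows in magnitude as the center pyramidalizes, so a two-minimum improper profile like the one shown in the talk would appear as a symmetric double well in this coordinate.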