 Good morning and good afternoon everybody and welcome to the by Excel webinar. This webinar will be number 58. We are very happy to to introduce Justin Lempel that will be the presenter of today from Virginia Tech. He will speak about the charm force field development history feature and implementation in Gromax. We will host this webinar together with Stephen Fair from University of Edinburgh. The presenter of today is Justin Lempel, he's a system professor of Virginia Tech. After his PhD in biochemistry, he received the Kirshen Fellowship from the National Institute of Health for a postdoc work by Alexander McAurels, University of Maryland, Baltimore. There he contributes to the development of the drought force field, polarizable force field, sorry, for nucleic acid. And since 2017 is he has his own group working on pyrolysis on simulation pyrolysis force field on the lodging protein DNA and RNA G quadruplex at the Virginia Tech. All right, thank you very much for the invitation the opportunity to be here and talk about charm force field and how we have integrated it into the Gromax simulation software. Today I want to give you a couple of pieces of information both some some sort of historical and some more practical in terms of the development and use of the force field. Again with a couple of brief notes on the charm functional form we won't go into every bit of detail here but just a few things that are really particularly important to know about charm and how to use it effectively in simulations. So this is the overall charm potential energy function. This is probably familiar to you if you've looked at potential energy equations for for class one additive force fields. But I'll highlight a few things here that are a bit different about charm that maybe you haven't seen if you haven't used the force field quite a lot yet. Here of course are bonded or intramolecular terms the internal terms of the force field are typical harmonic interactions for bonds and angles, periodic interactions for dihedral and are out of plane harmonic energy term for improper the aspects of this that I'm going to go into a little bit more detail on will touch on the dihedral but we'll also talk about this Yuri Bradley term, which is an angle bending term, as well as the the use of CMAP these two dimensional surfaces for the force field. And the remainder of the potential energy form is shared amongst pretty much all typical biomolecular force fields we have our 12 six potential for Leonard Jones or Van de Waals interactions, and then we have a Coulomb expression for our electrostatics based on point charges and these comprised terms in the force field. So I want to spend a little bit of time talking about a couple of these force field terms and I'm going to give you on each of these slides. These are the bits of the force field files from Gromax. That's what sort of the topic of today is. And I want to start by discussing very briefly this Yuri angle, Yuri Bradley angle bending potential. So we have our typical angle bending harmonic form which is you know that the normal valence angle one three interaction here, which is again shared by most force fields, but there's this additional sort of cross term that's used that connects the one and the other ones or Adams I and K if you will I J K in this example here, which is an additional harmonic term. And this is used to account for things like asymmetry in stretching. And it gives better reproduction of things like infrared spectroscopy, the the spectra that come out of our experiments. And so this is something that allows us to very nicely capture those kinds of asymmetries. So this is an additional force field terms associated with this. You have your typical sort of equilibrium angle and reference for theta zero, and the stiffness of force constant of that which goes into the first part of this term the traditional angle bending term, but then some of our angles not all of them but some of them have this additional reference distance, which is given here is this S sub not here. And then you have a force constant associated with that as well or stiffness parameter. And so not all of our angles as I said before, are represented in this way. They are all a URI Bradley term but some of them have zero constants associated with that, that secondary term, because they are suitably represented without them. So this is the end all ring on the side chain of tryptophan. One of these angles here is represented by having this this cross term. The other is not so this would be just sort of a standard harmonic interaction so you'll see a mixture of these in the force field. So this is an angle function five and grow max to indicate that we might have this auxiliary term. One of the other pieces about charm it's perhaps a little bit different than other force fields is that it allows for the summation over several multiplicities for each torsion can have multiple dihedral terms on each torsional angle. So for instance I've shown here this is an asparagine side chain highlighting may not be the greatest highlighted a little bit in green here. So the force field terms associated with this chi to dihedral in the side chain. The atom types are listed here. This is function type nine which is just a grow max way of saying it's a periodic dihedral, however, you can allow for multiple terms for each one otherwise the default behavior is to overwrite previous parameter with something or simply a type one, for instance, and each of these torsion terms has a value of five, which is this this phase angle here has an amplitude or force constant K, which is shown in the equation here, and then a multiplicity or the number of minima that occur in a rotation of 360 degrees so this is a standard cosine functional form for for dihedral that most force fields use. But the difference here with charm is that we can allow for several multiplicities which can each have their own five this could be zero or 180 doesn't really matter. And then their own force constants associated without overwriting them we're actually just summing over all of these terms to give the complete representation of that dihedral. And one thing that's very important here is these these third neighbor interactions or one for pairs and in charm these are not scaled different force fields might apply different scaling constants like like a half or something like that. And these are intimately linked with dihedral, you can't have one without the other and this is one of the reasons why things like dihedral are not really transferable across force fields right I can't just take. You know, I've got a set of parameters in charm for a given torsion and I want to use it now an amber that wouldn't work, because each of those force fields have different scaling factors that are applied to these one for interactions. And really a dihedral is simply a correction factor for inaccuracies in our ability to model non bonded interactions at such short range in those one for interactions. There's really no fundamental force or anything that comes out of quantum mechanics or any other theory for a dihedral rotation except for gosh interactions among the third neighbors or the one for us. And so, so these are intimately linked now we do have some occasions where there's a specific pair type for a one for interaction a different Leonard Jones sometimes maybe an alkanes or something in the charm force field. So we go into that here today, but suffice to say that, given this convention of using a essentially unscaled or a scaling factor of one that necessitates the consistent use of dihedral parameters. And so along with dihedral is we have these two dimensional we call see maps or correction maps to energy surfaces. We've implemented it as these huge matrices of constants in the in the charm force field files implemented in grow max directly a similar format in the charm force field itself within charm. What these are, our energy offsets along five size space which is shown here's a snippet from the 2004 paper from from Alex Michael five and Charlie Brooks. These are the, the intrinsic energy surfaces and the corrections that are applied to them in the molecular mechanics calculations so that you get this final energy surface, you know in Ramashandran space here five side for different size now this one is alanine dipeptide, but they did this for for glycine and proline as well shown here. The genesis of this is that it's very difficult to treat torsions that are coupled with simple dihedral terms there's a lot of factors here that are difficult to represent using simple classical mechanics and so if we use these energy offsets which are have certain characteristics they're smooth and continuous and therefore differentiable so you can get forces via interpolation. It gives us a better energetic representation of the poly peptide backbone. Now this could be applied to any two dimensional, you know any coupled torsions but you'll see this most most prominently used in the charm force field in the protein backbone so you will see these different atom types linked around this central alpha carbon that give us a correction factor for for fine saw and so this gives us a better balance in terms of secondary structure preference. And this was I don't have a separate slide on this but this was part of the refinement in the charm 36 m force field. There are some problems out here with left handed helical balance in like poly glycine and things like that, and other intrinsically disorder proteins that necessitated a small correction out here but we left the rest of the CMAP surface alone. So that's that's part of things, you know, the charm force field developers are still tinkering with as needed. And of course there were a couple of other small corrections in charm 36 m but one of the big ones was an adjustment out here in this this left handed alpha helical space. I want to make a brief note here about the non bonded forces there's not a whole lot to say like I mentioned before you know using a standard Leonard Jones potential cool of equation for our non bonded forces but one of the things that is intrinsic in the charm force field is the use of a force switching function. You have to dig back a little bit into the literature to find the actual functional form of this but I'm putting it here simply for for sort of educational purposes. This is what it looks like in terms of switching the Vanderwals forces or the Leonard Jones equation I should say we have full strength. So just the scaling factor is one within some cut off, and then there's this switching function that's applied between that cut off and an ultimate sort of truncation distance. Okay, and so what that looks like in practice. This is one of the things that that's really important for getting this right. I've got the charm commands here on the left so if you're a charm user this might look a little familiar to you, and then the equivalent Gromax settings here on the right so the Gromax folks might have seen this. The Gromax website is part of the manual because it was sort of a frequently asked question for a long time. And so I put it here and on the website as well. So that the folks are getting this right. Now, we'll mention in a little bit here the use of Leonard Jones PME for for the Leonard Jones interactions which sort of removes this cut off necessity but for sort of historical purposes and for general use in the charm force field these are the settings that people use and we recommend when using this this normal truncation scheme. I'll also mention here which is a point that's often confusing for folks dispersion correction is turned off when using the charm force field except in the case of using the charm 36 force field with lipid mono layers to get the surface area right for the dispersion correction, but part of the parameterization is actually to account for what we're losing in terms of dispersion interactions we repeat calculations with longer cut offs to correct for how much energy we're losing so there should be no additional correction applied during your simulations except in the case where you've got a lipid mono layer that's that's it so if you've got a protein and solution nucleic acids whatever. If you've ever followed me on the grow max mailing list or now the user forum this is something I harp on getting your non bonded settings right. So I wanted to put that here for everyone. And so I wanted to give a couple of notes here about the parameterization strategy of charm this is also another really sort of frequently asked question and I've got this ligand I have you know some skeleton topology how do I refine it how do I improve it. So these are the high points of the really critical things that you need to do to parameterize something to be consistent with the charm force field. So this is the overview of our workflow, and it all starts with with an optimized quantum mechanical geometry here. We always start from this as as step one. If we have parameters for a similar compound we will assign them by analogy if not we have to come up with them to novo. If possible we try to use things that already exist in the force field. That initial topology will then go into our optimization loop, if you will, we do a bonded optimization that looks at the, the geometry of the molecule how to our bond lengths and angles and things compare against that QM optimized geometry. We look at vibrational frequencies for getting our force constants right. All those K values I showed you in a few slides ago. And if we've got the internals reasonably well fit, we can move on to an electrostatic optimizations this is your charge assignment. And we look at quantum mechanical dipole moments. I'll have some some details on that a little bit, and then water interactions individual interactions with water molecules. And then we'll move on to dihedral parameterization because again dihedral as you recall from a few slides ago. These are intrinsically linked with the one for non bonded interactions. So we have to make sure once we've got these charges set that you know things are still looking good or if there are refinements that are needed we need to be tuning our dihedral. And you'll note to that at every step here I've got this arrow back it's all iterative right a force field is a self consistent entity. You can't have one piece be really good one piece be really bad and cross your fingers and hope that it works. While there is some error cancellation and some of this we try to get each of these things as accurate as possible on their own. And while recognizing that if I make one change here at this last step, I need to be looking back at all these other things if I change a dihedral and that changes a torsion angle does that impact my bond and geometry doesn't make it better doesn't make it worse. How might that be linked with the valence angles because it's all connected. Okay, so it's an iterative process, until we get to our final model that we can empirically validate. And that would be in small molecule systems biomolecular systems and that kind of thing whatever we have actual experimental data for. So as an example here and I thought it was rather fortuitous this this came up on the grow max form the other day someone was looking at a similar compound to this and it brought me back to my first days in Alex McCrell's lab, working on CGNFF as a training exercise. So I've got my my molecule here this is methyl benzoate. This is the first thing I ever parameterized. So a little walk down memory lane here. This is a comparison here of the QM optimized geometry here with blue carbons, and then the initial molecular mechanics geometry with green carbons so you can see it's not bad, but you know the geometries don't quite online which means we've got some tuning to do in terms of different force field parameters that I'll show you in a minute. The optimizations are done with with a model chemistry shown here so we're mp2 631 g star model chemistry for all of our optimizations energy calculations dipole moments that sort of thing. And we want to check this geometry these bond lengths all the angles every geometric aspect of this to see if we're correct, as far as our initial parameters that we get from the force field that we're assigning by analogy that we're creating. So here's some tuning of course, and then we can look at things like the vibrational frequency analysis, which you can do as you can do that in the quantum mechanical software you can do that in charm quite easily. And that will help us to decide what the force constants are and tune there. And I'll note here to that the quantum mechanical geometry is what we use for all the initial steps of parameterization. And that means we're fixing our molecule in the QM geometry after we've done this initial check reason being if you have error in the geometry it's going to throw off all your other calculations you're not going to be able to calculate other, you know, dipole moments and that that are actually relevant to the parameterization so we fix everything in the QM geometry first do the next set of calculations that I'll tell you about, then we go back after everything is refined and repeat everything, allowing the system to relax so that we're in the MM geometry and making sure that we have things consistent and and the force field reproduces good properties. So for the electrostatic fitting this is this is charge assignment, and we can use as a sort of guiding point or a starting point if we if we need to. Merse Coleman charges which can be printed out by a lot of QM software. This gives us a sort of initial guess at the charges, or we take them by analogy and this point the comprehensive nature of the charm force field allows us to do a lot of things by analogy we have most of chemical space covered. So we can sort of piece molecules together like little Lego building blocks, and then refine slightly based on the exact chemical nature of the compound. And so, to check this or to refine this we're initially going to target this, this mp2 631 g star dipole moment, we want to overestimate it a little bit now there's no exact number here that it's going to be overestimated but because we're looking at a gas phase quantity, but we're using it principally in a condensed phase application if we're talking about the biomolecular type simulations, you want to overestimate this dipole because that reflects the condensed phase polarization response right put a molecule into water. It polarizes as a function of being that aqueous medium. And so that's what we're looking at here now there's a range of values but ballpark of about 20% or so overestimated is is where most things are in the force field. So we've got the dipole moment maybe we adjust the charges a little bit there, but you can solve a dipole moment with a large number of solutions probably infinitely many solutions in terms of charge assignment now we don't just sort of assign whatever charges we want but you need to check the individual interactions with the functional groups. And that's where these water interactions come in. So if we're going to scan or a single degree of freedom optimization along the hydrogen bonding distance for each one of these interactions we don't do them all simultaneously it's one by one. So this is just all the possible water interactions that we consider here for this methylbenzoate system. You know so you see on this this oxygen atom here we've got this linear hydrogen bond we've got one that's hydrogen bond off of where a lone pair might actually be that we don't have lone pairs here in in this topology but you know there there would be in real chemical system so we're checking all these things, checking the out of plane this one water up here is out of plane from this oxygen. So we do all of these because we want to look and see just how how well we agree with the quantum mechanical geometries and interaction energies with our force field model for each specific functional group. We do this at you know it's a fixed geometry of the compound it's fixed water geometry of tip 3p which is what we're assuming is being used with the charm force field. So you're literally only optimizing the distance here so it's an internal coordinate optimization rather than a full relaxation of the system. And once we have that optimized geometry we do a single point energy interact interaction energy analysis using the Hartree-Fock method now this is quick and cheap and easy to use. But because you know this is now ignoring things like electron correlation and dispersion interactions, you have to adjust this a little bit and we use you know reference data from like the water dimer to decide you know how much work about change these values. So the interaction energy minimum sort of the position of the of the interaction energy minimum gets shifted inward a little bit point to angstrom. And then in the case of neutral compounds neutral compounds only we scale this interaction energy by 16% or multiplied by 1.16 to make it just a little bit stronger. And that's going to be accounting for those missing interactions are the missing sort of physical properties in this model chemistry. We don't do that for charge compounds we find that this this type of calculation is sufficient for things that carry charge you don't want to over scale those and make them too strong. Okay, so these are the adjustments that we make to account for some of the missing factors in this QM model chemistry. And in this last step we have the dihedral optimization again same model chemistry that we use for all of our optimizations. Our QM optimized geometry back from step one. And then we typically scan one degree of freedom one torsion, either zero to 180 if it's fully symmetric save on calculation time, or the full range year zero to 345 with of course 360 and zero being equivalent you don't need to do both of those calculations in increments of 15 degrees. So you're going to just take and scan along one dihedral and intervals of 15 degrees and allow other degrees of freedom to relax. So the only thing you're truly constraining is the target dihedral. And then we offset that to zero and we find the global minimum and that scan, and we offset it to zero because the absolute energies are not really important to us, you know the QM software will give you some crazy high value in heart tree or something, but that includes all the nuclear electron interactions electron electron interactions. Those are things we're not modeling in classical mechanics we're looking at the relative configurational energy of that system, and that's what matters for us in fitting. We usually ignore values that are larger than 12 kcal per mole. So you can see this scan here for for this torsion in methylbenzoate, the initial you know so the QM values here in black, the initial guest parameter or the parameter by analogy here in the red. So, not awful but not not not exactly optimized. And then after a fitting protocol we can recover this blue line and the parameters for this are down here. And again this is the first dihedral I ever fit which was kind of a fun experience, and you can get this this kind of agreement and this was the best we could do you couldn't exactly align it but this was as close as we could get. Okay, and so then, after doing all this fitting you know we're going back and we're checking the dipole moments of the optimized geometry look good to the water interactions look good now with a fully. And then relax geometry that do all of those checks to make things make sure things are consistent. And then validation, you know we move into small molecules we don't just dump these things into, you know, proteins and hope it works we look at small molecules we're looking at liquid properties densities die electrics that sort of thing heats of vaporization, which is a measure then of the non bonded strength between all the atoms, all molecules. And then validation for a molecule if we have it. Sometimes we have that value sometimes we don't. We can look at other condensed phase systems solids crystals, there's huge, huge crystal database of things and this is where we can look at the internal geometries and packing of these molecules are we getting the lattice parameters right this is also an area in which if we have to sort of adjust the bond lengths or something if the QM is a little bit different than the empirical data we might make a compromise here in terms of bond length surveillance angles or something. There's always a little bit of interplay between the observed values and experiments and QM. It's not solely QM driven. There's a little bit of adjustments that may be made here for not reproducing good geometries and something that people are using a whole lot right now for small molecules is the charm general force field or CGNFF. These are the three sort of really foundational papers about CGNFF so if you're not familiar with the details of all of this. I don't have time to tell you everything about every one of these papers I wish I did but that would be a day long workshop rather than an hour long webinar. So please refer here for all the details, but essentially what's happening in this algorithm so there's there's the CGNFF force field itself, the parameters the topologies and all that. There's also a program that was written in Alex McCraw's lab that will assign those topologies to to your molecules there's a CGNFF program that goes through this algorithm shown here that will assign the initial topology for you. So you input a mole to file we'll talk about that just a little bit here to identify the chemical elements how things are bonded identifying rings and hybridization state that helps with with parameter assignment. And then it looks for then it looks for things like aromaticity higher order bonds you're double and triple bonds that helps an atom typing here in the next step. And once we have the atom types, then we can start populating bonded parameters. So taking things, you know, do we have that parameter already in the force field if so good keep going. It's not what's the closest match, and then assigning things by by analogy to existing pieces of force field. And then once we have all the internals we assign charges. Okay, because at that point, you know the atom types they give you your Leonard Jones. These are your bonded parameters to give you all the internal terms, and then we can assign charges again this is done by analogy. So there's some really, really intricate information given this last paper about how those those partial atomic charges are assigned by the program. Yeah, you can access the server here at this URL below. This is where it's currently located. And it's free for everyone to use. And you can upload a mole to file and return a charm topology. And this is what it's doing behind the scenes. We're delivering on the title of the talk here so the charm force field in grow max the implementation how we use it, and I'll conclude with a practical example that that often comes up and some notes for folks to have to navigate some of the things you need to do here. So this is also distributed by by Alex McCrell on his website. The URL is given here. So clicking on his charm force field page there's a heading for grow max. The current release is from July 2021. It was, it was just released about last month but we've we've versioned it July 2021 that's when the actual charm force field itself was was publicly released. And we have two versions of that that are available and these are noted on the website we have our sort of standard, if you will charm force field. And then we have the secondary one that's got this lj PME so the Leonard Jones PME. This is now supported grow max had this algorithm implemented for for several years now, but there was a refinement of the lipid parameters and a few topologies here in this this very very recent paper of Jeff Claude's group. So there's some changes to charge assignment and some of the lipids some some internal parameters and things like that some Adam type some Leonard Jones differences. And these are now in in this distribution. Okay, and I should note too that if you're going to be using these lipids. It's fully compatible with the rest of the force that you can use Leonard Jones PME with the other aspects of the force field and they work just fine. There are a couple of changes in this version. We have a new automated build system that Andras did that's beautiful. It's way better than the stuff we used to have. I appreciate his contributions here and getting this working. And what this does is it organizes the force field way better it prevents having to do some manual edits and things that I used to have to go through and check it. So you now have more of a conventional grow max feel to this where each biomolecule type, you know proteins nucleic acid lipids. They have their own RTP and TDB files and that sort of thing for more intuitive look for the force field files and for use with with PDB to GMX rather than these sort of monolithic merged files we used to have because converting from the charm force field is is not a sec not exactly easy. And so you know rather than trying to code in what belongs where we didn't. And so we had a merged file that he is very smartly split into more user friendly things. This force field distribution, it only includes the charm 36 M for proteins are older distributions allowed you to switch back to the old charm 36 so so this one doesn't. This has become sort of the standard for the charm force for proteins if you want to use the old protein force field. We maintain an archive at this link on Alex's website of the old distributions. But one of the things you can do in the current July 2021 version is use this modified tip three P that they used in the nature methods paper where they parameterize this force field. It's gotten modified epsilon on the hydrogen atoms of water, which make the interactions between the water and a solute, a little bit stronger, and that helps unfold these like intrinsically disordered proteins and things. So that gives you an expanded race and generation better properties that sort of thing. And these are nb fixed back for water water interactions so that all the water water interactions are normal. And so that that allows you to just switch that on and off if that's something you want to test and try in your system so that's that functionality has been added. So validation testing is really, really important of course we have to make sure these things are equivalent between charm and grow max. I have a battery of a test suite here. I build polypeptides that have all the canonical amino acids, oligonucleotides sugars lipids, a couple of other ad hoc systems like water and salt solutions and stuff to make sure all of our nb fixes are working. And then we just compare the energies now of course I have gone in and compare the forces as well on all the atoms for a subset of systems they always come out fine. So it's quite easy to look here at single numbers. So the agreement here is quite good we usually are down into the third decimal place in the potential energy. Some of this comes from using slightly different koolomic constants in the different software. Actually every biomolecular software package uses a different value for this so they're all slightly different, which is a little annoying but if you know that's the source of your error that's that's what's happening. So the breakdown of all the different energy terms and they all match up it's usually just the electrostatics that are a slight bit off, but they agree. And so the force field implementation is robust, and we get agreement between the different packages. So how do you use it how do we use charm 36 in grow max there's a couple of routes you can take. One is the online charm gooey, which is a really really highly functional web server I'm not just saying that because I contributed to this particular paper but it really is great. It allows you to basically do all the things that charm does and then output files that are compatible with all kinds of different simulation software whatever your favorite software might be, whether it is grow max or something else. So what it's doing is it's running charm under the hood, and it's making use of all the things that charm can do which include an internal coordinate builder for building on modifications and terminal patches and things like that, which is the other thing that's really handy and charm is this on the fly patching modifying residues making cyclic peptides cross linking whatever it is. You have all of these things that charm does because charm is a script based program you can just step by step say okay here's how I want to change my system do this do this to this, and then it will output for you. And then the coordinate file, the course going to apology, and then whatever subset of the force field you need to simulate that system, and I highlight that here because, you know, an issue that people run into, especially with, with using things in grow max is they say oh I built this really complicated system, and now I want to add a ligand to it and then all of a sudden there's a bunch of missing atom types or parameters or something. It's a full charm force field it just gives you what you need for simplicity sake. And so just realize if you're modifying things from charm gooey, you're going to have to modify or augment the force field parameter files in most cases. So how do we match that natively within grow max there's all these things that that charm can do that grow max really can't. And so that is combining things into pre built RTP entries, we don't have, for instance, there's no phospho serine defined in charm, they're serine and then a patch that will adjust that topology to become phospho serine. So we had to combine that into an RTP entry by marrying together the residue definition and the patch to make a residue. And so those are the things we've got in place here. So if the hydrogen building database, the HDB files, that's sort of the extent of PDB GMX is internal coordinate builder is it can build hydrogens on and then termini database the TDB files that are the sort of terminal patching. And all this runs through PDB GMX like anything else our charm 36 port that has all these features will give you your coordinates and topologies in in grow max compliant format. So I want to spend a few minutes going through a practical example now this is a tutorial I wrote a few years ago it's been online for a while stop linked down here if you want to go through it. Maybe some of the audience have maybe haven't. But an example here is something people encounter quite frequently is I've got a protein ligand complex and I want to write a simulation of it maybe it's a drug maybe it's an endogenous molecule or whatever happens to be. How do I deal with that. So PDB GMX of course can can handle the protein just fine. The ligand itself has to be treated differently, because in all likelihood we're not going to have an RTP entry for that ligand in the force field alright. We've got a bunch of model compounds from CGNFF but those are not usually full on drugs or ligands or co factors. So we have to run that through the CGNFF program so so visit that URL I gave you before the CGNFF server. And when we have that topology we can combine that into our system topology and then continue on preparing things like we would any other grow back system solvated add ions whatever you want to do to it to give you your input system for the MD simulation. This is the part that trips people up sometimes and we have you know some stumbling blocks here that hopefully I'll help people get through today as I talk about the protocol here. So just a bit more detail on generating a CGNFF topology, you need a syntactically correct mole to file. This can be a bit of a challenge there's lots of programs that will write out more two files the quality of some varies they're usually pretty good but occasionally you'll find some problems. So so multi files going to have all the Adam names. It will assign these these civil malt atom types to each of the atoms. We use this in CGNFF for some of the initial Adam typing that's done. There are bonded connectivity is here what atoms are bonded to what, and then what type of bond is a single bond is an aromatic bond that kind of thing. There are some non unique Adam names which is often the case when when programs export these things. CGNFF is going to reassign those so anything that's a duplicate it's going to assign its own name to so that doesn't really matter. It's a little inconvenient to have this in the first place but for the purposes of the output. It's not a big deal at all. The Adam typing assignment that that's really critical. Sometimes there's there's some weird things that happen here, or if a user is creating one of these on their own. So occasionally the server will return an error, in which case you need to revisit you know the assignment of the Adam types or the connectivity and it'll, it'll freak out if you have like five bonds to a carbon or something like that in which case you need to sort of do your bookkeeping a little bit better down here. And then what does that give you the CGNFF program will give you a charm compatible topology or stream file that's what the str suffix is on that file. It has a due definition in charm format so it's got your Adam names, the types that it's assigned to them, the charges, and then the great thing about CGNFF and I think this is one of the most useful features is it gives you a penalty for each of these these charges and each of the parameters that it assigns which we'll look at in a moment. It tells you where you might need to have some caution in using the topology, maybe there's some refinement necessary. And so what's going to print out up here is the largest penalty for the parameters so your bonds angles dihedral, and then the largest penalty for the charge. The header of the topology gives you some guidance on this you know if it's less than 10 you're probably fine 10 or almost certainly fine 10 to 50 you should probably double check it and anything over 50 is basically like, we don't think this is a very good it's really a confidence score how confident is the program that the analogy is is reasonable for your purposes. Okay, so these parameter these charge penalties are all super small, so I'm not worried about this I think this is a good topology and I'm not going to play with it. You could do some basic validation but being a rather hydrophobic molecule pretty much the only thing you're going to do is a water interaction around this hydroxyl, which has extremely low penalties anyway so there's not a whole lot I'm probably going to do this anyway. Coming down here these are the two dihedral that that are identified as not previously existing in the force field not you know not an exact match based on the atom typing. You can see though the atom types it assigns and where it drew the, the parent parameter from. So again these are super close in terms of the chemical nature of all of these, and the penalties are below one. You're not going to do any better than that so these you're not going to need to touch. In any case, you know the dihedral is that I'm talking about here are generally you know this first one is not even really rotatable because the central bond isn't an aromatic ring. So it's not going to deviate very much anyway. And then the other one is out here to the alkyl group but again, very small penalty. It looks just like any other alkanes, and it's it's going to be a good match so in this case I'm not going to do any kind of refinement to this. If you have massive penalties here. Well if you have large penalties you know you should double check things you have massive penalties then you need to do some parameterization work, which was the second topic today, and how you would go about doing that. And one thing that the trips folks up is is our conversion script itself this is again a program available from Alex's website. We have exactly named CGNFF charm a GMX it's got some suffixes here with which version of Python and which version of the network X package you need. Please pay attention to that it's like the number one thing that people have problems with is getting the right combination of these things. And the versioning is is always a bit of a headache for us because things are constantly changing in these versions. We tell you what we test with. If you have issues please let me know, we'll see if we can work something out and it's it's sort of something that we're interfering with what you need to provide to this this conversion script is the residue name, the original multi file, the topology from CGNFF the stream file, and then the location of the charm 36 force field distribution that you're using. The reason this is the first three files go together into a sanity check. Does this residue name exists. Sometimes they get changed. So be aware of that. It's correctly matching between the mold to in the stream file, the mold to in the stream file are checked for consistency in terms of the order of the atoms, and that gives you a PDV file, right that you can then using gromax. Now the stream file and the force field are then sent through a version check to make sure we have consistency between the version of CGNFF the force field version, and what is included in the charm force field distribution that we've given you and I'll get to that on the slide. But if all of that passes you get a topology assistant apology a ligand topology and then an auxiliary parameter file for that ligand. And so this is just a listing of what those are this initial coordinate file, you can go ahead and use this directly in your simulation of the protein ligand complex or just in solution standalone topology this is useful for running, you know if you want to do free energy implementation or something in water. This could be what's used than the itp file like most other gromax itp files contains the molecule type definition of the ligand. And this is one that people often wonder what it is PRM this is sort of just charm nomenclature for a parameter file PRM. It's just a gromax itp file, and that has in it any of the new parameters that were introduced for your things had to be generated by analogy, they're in here because they're not going to be in the parent force field. And that is is placed very specifically in your topology before the tutorial I have walks you through this it's got to appear immediately after the force field because all of force field terms have to be declared and known before you can introduce any molecule type. So it's got to be sort of right up here at the top of its apology right after the force field inclusion statement. So if we do this check, why are we making sure this is consistent with the parent force field right well, I've said it before I'll say it again, I'll say it to the day I die. The force field is a self consistent entity you can't mix and match them. Okay. Say for instance, a couple years ago or whatever you generated a ligand stream file, and that was with CGNFF version 4.4 or 4.3 or whatever it happens to be. We want to use it in conjunction with our latest force field because of you know whatever reason because you've got modified amino acids that are in this version or whatever. We're now distributing the force field with CGNFF version 4.6. And it's going to be you know having these specific versions of non-bonded and bonded parameters. There could be a clash here, perhaps in the development of CGNFF. We have now refined and implemented some parameter that was previously guessed in your topology. And this includes statement in this order. If it exists in the parent force field, it will be overwritten by the one in your topology, or your parameter file from that topology, and you're using then a sub optimal or unoptimized parameter, potentially. Okay. Maybe you've already refined it in the agreement is pretty good and you're not really doing a change but we do this to help people from, you know, making mistakes we don't want you to be using an outdated parameter when we have actually parameterized this and made it a lot better. Okay. So that's the version check that's something that trips people up sometimes. Sometimes the differences in the versions are not not, you know, super large but you've got to have pretty intimate knowledge of the development history to make that call so we just say look you need to be using the same version of CGNFF from the server as with the force field distribution that we're giving you. All right, so with that I want to conclude and, you know, thank the folks that gave me all of my charm and CGNFF knowledge from my time at, at Maryland of course my advisor Alex McCrell, probably Ramon who worked very closely with me in developing the original charm to grow max port. Of course, Ken Ovan Oneslaga who's the, the main developer and driver of CGNFF he was a huge help in teaching me all the ins and outs of the force field. He did a beautiful job of writing this new conversion program based on the one we had been using which was really clunky. Of course thanks to my group who's always testing and using these force fields and things, the folks that fund the work. And again thank you for the invitation here today and I'll be happy to take any questions at this point. Thanks for the talk, Justin. It was, it was really good. We're going to do questions now so the way it works is I'm just going to read out the questions that you guys have put in the Q&A. I apologize in advance when I pronounce your name wrong. So right we'll begin with a question from Athol Suresh which is how are the energy corrections in CMAP determine for each residue. Is it based on comparing with experimental data or quantum mechanical calculations. And quantum mechanical. So we have, you know, each of the dipeptides that we, you know, they do it for alanine dipeptide and glycine and proline dipeptides. And that forms the basis of all of the backbone terms. So it's QM energetics in the scans of Phi and Psi. And then we repeat the calculations with the molecular mechanics energies and it's simply the difference between the two. And that's that's what the surface is as we see CMAP is is the what is the energy difference for each point on that surface. So it's purely from quantum mechanics there's there's no experimental data involved in that. Thanks. So next one from C young Kim. Can I use LJ, the Lena Jones cutoff free version of charm 36 am in Gromax and which van der Waals modifier should I use. Yes, you can. So, so see 36 am so the the Leonard Jones PME specifically focused on refining the lipids but again, the protein and nucleic acid and all the other parts of the force which should be compatible with that. I'll admit I've not used that that feature. You know the Vander Waals modifier it's it's the LJ PME whatever the keyword is in Gromax. And so I don't know all of the settings for that I haven't tested that yet it's a fairly new thing for the Gromax implementation and I have not really been doing membranes. So I would encourage you to check out that reference that the 2021 JCTC paper for the methods there in terms of you know they're splitting parameters and all that stuff for the LJ PME. And see if you can reproduce the the properties they have there but that's not something I've done yet so sorry I don't really have a great answer for for that just yet. Yeah, so next one from Roman patcher, which experimental data is mostly used for parameter validation of the force field IR or ramen and how much of experimental data is used in comparison with q amp calculations for parameter input. This is on how much experimental data we have. This is an area in which, you know, we really love databases of properties of molecules and some of the stuff that's, you know, really not glorious to produce but really really useful. The IR spectra are one I mean we look principally at vibrational frequency analysis directly from the QM and then directly computed in the charm program the molded program within charm. You know you can look at IR spectra if you have them. And that's one thing like I said with the Yuri Bradley you can get asymmetric stretching and that reproduces IR pretty nicely. And then parameter stuff that's actually fairly straightforward and you can take that almost directly from QM and use that it doesn't require a whole lot else. But we would look at things like geometries you know like from the crystal simulations or something if a bond length or a valence angle differs a little bit in a real observable physical system, then we're going to tune things a little bit there and make a make a compromise with those parameters to get that right. We're really looking when it comes to empirical data, you know, it's it's thermodynamic properties it's transport properties and diffusion die electrics, heats of vaporization free energy to salvation that sort of thing. Thanks. So, from QNU in the paper charm 36 m and improve force field for folded and intrinsically disorder proteins on nature methods 2016. The authors use many different values of Vanderals and Coulomb cutoff. For example, for the RS peptide they use value of both for 9.5 angstroms but for folded proteins they use the values you showed. Can you comment on this and other values you showed applied for both folded proteins and IDPs. Yeah, so this this is something that might my sort of hedging answer is, unless you're absolutely certain that what you are using is going to give you a right answer don't change it. There's a little bit of tinkering with the cutoffs in that paper to ensure robustness of the parameters that they were not extremely sensitive to the cutoff. And there is a, you know, to be fair there's a little bit of wiggle room and what you can do with the non bonded cutoffs in I don't advocate for that because, as a force field developer I know weird things can show up unpredictably. So, please don't take that as an invitation to just completely change things. It is not dependent necessarily on whether a protein is folded or intrinsically disorder, it is a function of the force field the cutoffs are always a function of the force field. It's just that they were looking to see if there might be any, you know, cutoff dependence in what they were doing so, you know, use the, the, the prescribed force field parameters and settings in your MDP files please. If you do a systematic evaluation and find that the properties are identical with a shorter cutoff or something I mean that's that's okay too. Again, there's a lot of unpredictability when you start hacking at the physical model without knowing exactly why you're doing that so lots of caution there but it wasn't because they were just saying certain cutoffs or certain systems certain cutoffs or others. It was a demonstration of parameter robustness really. So next on from Mohammed Sirush, with the LJPME, how do you handle force discontinuity at cutoff for the LB combination rule and how do you avoid end point catastrophe and alchemical free energy calculations using LJPME. Having never done the simulations I can't tell you, I'm afraid I can't. I've never done free energy calculations with them. And so, probably a good conversation maybe for the user form if you want to start talking about use cases or if you're having a problem with that we can dig into it. But I have not used those in simulations yet so sorry about that. Okay. So another one from CEO. When you calculate potential energy with Gromax and validates and validate against charm do you use double precision Gromax. No. The differences are, you know, again, observable to the third decimal you're never going to get them right to the sixth decimal so if, and most people are using the mixed precision anyway. I think these days is certain calculations that you need the full double precision but most people are running standard MV simulations in the mixed precision or what we call single precision and Gromax so if they don't work there there's no point in doing double precision but yeah the errors show up in a much larger decimal place than, you know, down to full double precision. So another question. What is penalties I think this is this was for the, the penalty calculation I guess well how's that defined. Yeah the CGNFF. The short answer is it is essentially a confidence score how confident is the program is the CGNFF program that the analogy being assigned is useful and robust. The exact calculation that goes into that required many pages of explanation in the relevant article there so please have a look at that if you're interested in how that's actually calculated. It's walking through the assigned atom types and how chemically similar or dissimilar they are. And so how far away are you moving in parameter space to make that analogy, but the details are all in the JCIM paper from a few years ago. Okay, so why do you need Network X for the CGNFF charm to GMX program. So that is what builds our graph network for the bonded connectivity. So the, the charm topology gives you a list of bonds, but then for us to be able to write out angles and dihedral and assign pairs and all of that stuff. That becomes a graph network. And that's what Network X is being used for. That is something that is under continual development. And this is why versioning becomes important. Some of the new cutting edge versions of Network X do not work with our script. And some of the really old versions also do not work with our script. So we attempt to validate against a couple of sort of robust versions in each development lineage the 1.x and the 2.x series. So if you run into issues, if you can, you know, install the version that we print out to the screen of we have validated it with this and use that version they're all available you can install them through PIP with Python and whatever you know there's other means to do that but that's why we sort of advertised which versions we've done it with because Network X has been particularly finicky as things change. And we've had some very kind user contributions to get us up to speed with some newer versions really appreciate that. So that's what we're trying to get back from people helping us debug that but but that's what it's used for. So a question from Ludovico. Is there any guideline on how to properly fit dihedral when adjusting from QM to mm. Properly fit those words are doing a lot of work. How good, how good can we get the fit I mean there are automated programs there's some some we swear fitting parameters on Alex's site. And that came up with, and that are actually published you can read about those, you know, sort of minimizing the differences at each point. You know, if you're doing it sort of empirically you're toying with it what are you looking for, you really want the positions of the minima, and, and the relative maximum to be pretty spot on because that's what's going to dictate the range, you know, and the sub populations that you sample in the simulation. But, you know, it's we're trying to get as closely matched as we possibly can and most of the time using, you know, some money Carlos simulated annealing fitting or just the square fitting in parameter space you can get them pretty accurately. You know that's, you want to minimize the difference is really the answer and with always with the sort of chemical intuition of I want to make sure my relative minima and maxima are as close to accurate as possible if there's one little deviation somewhere. You know it's not the biggest deal if you've got the rest of that that energy surface looking good. Yeah, so from you do. Is intramolecular H bond considered in MP to 631 G could charm parameters deal with intramolecular interaction well. Does charm effort force field use bigger basis set. Now, or future. I don't, I don't know if I fully understand the first question. I mean, every interaction is considered I mean intramolecular hydrogen bonds if they're relevant in the optimized geometry, yes. You know that's that's part of the calculation and getting that right now of course intramolecular hydrogen bonds might be different when you put something in solution and that becomes a challenging part of the fitting in terms of what is the relevant sub state when you have you know biomolecular something else in in water because it's going to be competing for the hydrogen bonds. But you know that's we go with what the data tell us as far as what we're fitting against and then we have to make sure that you know again we don't make anything overly strong. And then the basis set. We've done some larger basis that calculations for things like the nucleic acids obviously with phosphate groups and things that becomes a bit of more of a challenge with a greater extent of electronic localization. So yes, some of that uses a bigger basis set. I'm not really going to change in that regard though because if you start changing that you're changing the entire force field, the polarizable force that we have the Druid model which I didn't talk about at all today. That uses much more, you know, larger basis sets more more sort of elegant model chemistries but but the charm out of their force that is going to stick with what it is. So question from Google. Is there a Python script that translates the charm GUI files to GMX is that available. Well charm GUI on the back end is using a Python script to convert all of that so I mean if you want that you'll have to ask the charm GUI developers I don't know whether that's licensed for external use or not but yeah it exists I know it exists but I don't have one or I can't point you to one. Yeah, so we have hit the hour mark but if are you happy to continue with a few more questions just in sure I can stick around for a little bit that's fine. Yeah, we can use a few more we can go through then. So from. So can you comment on patching residues so terminal and non terminal was separate biomolecule for so eg ubiquitins and p lipids. There's no ability to do that in grow max right now. That's one of the areas where it's super easy to do that and charm it's a one line command. And you can patch anything they have lipid you know lipid linkages and things like that for maristellation or whatever you know isolation you want to do. That's in there, but not it's not implemented in grow max at this point I'm not sure as far as charm GUI goes if they might be able to do that I think they can do lipid anchoring type stuff so look into that. Because then it could output grow max compatible files if that's what you're after. So what were the main reasons of removing or not adding new URI Bradley terms in modern charm force field. I'm not aware of any being removed. We use them when we need them. You know obviously you want to have as mean it's not going to incur any real overhead in the calculation obviously but if we need an asymmetry in the in the angle bending. We put it in there if it's recovered well enough by a symmetric stretch, then then we don't. But I'm not aware of any being removed, unless it was shown that you know for some reason that was a sub optimal parameter ultimately. But it's a case by case basis when we're looking at the vibrational frequency analysis. So, thanks for a nice and clear presentation. Do you have comments or insights on a semi polarizable charm force field. semi polarizable charm force field. I don't know what that is. I have the drood model which is fully polarizable. Everything in the system is polarizable in that case, you know, water ions biomolecules whatever. I don't know. I'm not aware of any efforts to make anything you know some things polarizable and some things not I think that would sort of certainly based on what we've seen I don't think that would be physically very realistic or useful. I'm not aware of any efforts to, you know, make charm polarizable or something that's what the drood lineage sort of, well, I got to be careful. Drood is not simply polarizable charm. It's derived, it's, it's origins are in charm but it is a separate force field. So you would not want to combine, you know, like let's say oh I want polarizable solvent with a non polarizable solute. That's not something you should be doing with you know mixing drood and charm or something like that. It's not going to be balanced against one another properly so I don't think that's really anything that anyone's looking into at this point. So question from Giuseppe, when it comes to unnatural beta amino acids are CMAPs as important as they are for standard residues. Can a good optimization of the dihedral terms reduce the error of not having a CMAPs computed. I've never worked with beta amino acid so I can't tell you specifically what that would be. I would imagine that anything involves a polypeptide is going to have to, you know, there's going to be some sensitivity to CMAPs and they're going to be somewhat important. You can optimize as well as you want, you know, in terms of the phi and the psi and of course you have connected you know chi one dihedral and things they all impact one another right everything's everything's linked. The reason for CMAP is, you know, those are, those are soft torsions, they're fairly easy to rotate except when you get into really strained configurations and there's, there's a lot of sort of coupling between those that's rather quantum mechanical and nature. And, and that's just not something that we have found we've been able to get really good agreement on now you're seeing this of course like amber 9919 SB residue specific CMAP saying that each residue is somehow different and not just the polypeptide back but there's a real sensitivity to that. So I really think that that's that's a critical thing to get right in in simulations I don't know if you can just sort of hammer that out, just with dihedral, you can get really close. But I think there's always going to be a little bit of correction that's going to matter there. So question from Sondos, I'm doing binding free energy calculations for carbohydrate protein systems and grow max using charm 36 force field, the van der Waals contribution of the energy always has a negative sign. Is that normal. It's hard for me to prejudge the result of a system. I don't know, you know, depends on what it's binding to is what I would say, you know, maybe it's maybe it's unfavorable. And that that could very well be just depends on the nature of the interactions that are being transformed there so without a lot more context I can't really say, but that, you know, could be a good question for a discussion on the user form if you're interested in digging into that a little bit more. And then a question from Ernest kind of polarizable force field to be used with grow max. Yes, with some limitations, and they're not officially so our drood model is supported we published an implementation a few years ago. It is not officially in grow max yet, sorry, it is in a developmental branch, Alex's website has details on how to do that. It will work on GPU and it will work with open MP parallelization. I fought the dragon of domain decomposition for many years, tracking down some bugs, we had to do a complete rewrite of it post publication. Everything in the publication still works, and it's valid, and what we have implemented is valid but for robustness in the code I had to do a lot of rewriting. And of course I took my position here and working on that has become a little bit of a back burner thing. So you can check out a developmental branch of grow max that will make that will work. There are some things that are not implemented at the moment. There's no barris that right now in polarizable simulations just because that's, it's easy to do but I figured I wouldn't implement that until domain decomposition work but you can run on GPU you can run with open MP. It's not implemented because we do not have the through space tolay screening yet that really gets into the bowels of the non bonded neighbor list calculations which if you do that in a clunky way your simulations are going to absolutely grind to a halt. So we have to be very careful with that so with a couple of caveats. Yes, it works. It works quite well and they're quite fast, but they're not officially supported yet. If you're really interested in that you know reach out to me and I can answer some questions on that but it's been a been a long, long haul getting that getting that working and with some of the new features in the latest grow max versions I'm probably going to have to do a bit more rewriting of some things which ultimately is a good thing it'll be easier to implement, but it's going to require a little bit more work before it's an official feature. I have a couple of questions now so thanks for that. I'm back to Alexandra to just finish us off. Thank you. Thank you everybody thank you all that indeed for the question. Thank you Justin was a great talk. I just want to announce the next occasion. So, it will be the nine of December and we will have a webinar on x3 DNA. So please, if you're interested in a visualization and nucleation structure, join the webinar. Thank you. And I declare close this webinar. Thank you everybody.