And he was incredibly important in driving the parameterization of the newest generation of the Amber biopolymer force fields, ff19SB, and we're super excited to have him out here visiting. If you have any questions, go ahead and just ask during the talk; we end up being very interactive here, both locally and virtually. So without further ado, I'll just hand it over to Chuan.

Thank you. Good afternoon, everyone. First, thanks to Open Force Field for inviting me, thanks to my host, thanks to Carmen and Hannah for organizing, and thanks to the group. My name is Chuan; I'm from Carlos Simmerling's lab at Stony Brook University. This is my first time here, and I'm very happy to present our new protein force field, ff19SB. It's been almost five years since we last published ff14SB, and those few years were almost my entire PhD as well, so we're very excited to present ff19SB now. The main contribution of the ff19SB development to the whole field of force field development is that we systematically train amino-acid-specific phi/psi backbone dihedral parameters, and we use quantum mechanics energies in solution instead of in the gas phase. I'll talk about this force field in the next 38 minutes. This is the outline of my talk: I'll give you a little background on MD and the Amber force fields, because I'm sure most of you are familiar with this, but I want to talk more about the force field limitations, which are also the motivation behind the ff19SB development. Then I'll go through the strategies we used to parameterize the force field, show how well ff19SB agrees with different types of experimental data, and then make the conclusion. I guess most of you will agree that computation and experiment are complementary in biochemical studies. There are a lot of things that cannot be measured precisely in experiment, and that is where computational models become very powerful.
They give us a lot of information on biology, such as protein folding, DNA-protein interactions in the nucleosome, or, in drug discovery, how a ligand passes through a pathway and binds to its target. Those videos were generated from MD simulations, which use a classical mechanical force field to do all the calculations. Of course you can use other levels of theory, like quantum mechanics or a QM/MM mixture; which method you use really depends on what questions you want to answer and what level of accuracy you want to achieve. For many years we have used classical mechanical force fields to do most of these biological studies. Since a force field is still a classical approximation to reality, there is a lot of physics missing from the model. That's why force fields keep changing over the years, and they are actually improving. This is the recent development of the Amber force fields over the past 20 years: ff99SB was developed in 2006 and ff03 in 2003. There were a few variants after that, like ff99SB-ILDN from D. E. Shaw Research and the ILDN-Q variant from Robert Best. Then we had ff14SB, then ff15ipq, FB15 from Lee-Ping Wang's group, and ff99SB-disp from D. E. Shaw Research. And now we have ff19SB. That's the protein force field development, and in the meantime the water models have kept improving as well. Those are not all from Amber, but some are from Amber developers. We had TIP3P back in the 1980s, and after that a few four-point models like TIP4P-Ew and TIP4P/2005. Then we had TIP3P-FB and TIP4P-FB, water models also developed in Lee-Ping Wang's group, then OPC from Alexey Onufriev's group, and TIP4P-D from D. E. Shaw Research. So there is a sort of interaction between the protein force fields and the water models, because some of the force fields have to be parameterized together with a solvent model in order to better reproduce experimental data.
Ideally that's not a perfect strategy, but considering the overall limitations of the classical model, it is sometimes practically beneficial. As I said, both the protein force fields and the water models have been improving over the years, and we have seen a lot of great MD studies using these force fields. There are many other examples, and here I'm just showing a few that use ff14SB. For instance, in the top plot, in our lab we are trying to fold proteins with system sizes up to 100 residues from fully extended structures using ff14SB with the GB-Neck2 implicit solvent. In the bottom-left plot, we can reproduce local dynamical properties like NMR order parameters using ff14SB/TIP3P; you probably can't see it, but the bottom plot shows the overlap between ff14SB and the NMR data, and it's pretty good. Also, in a drug discovery application, we entered the D3R Grand Challenge 4 to do binding free energy prediction. We used ff14SB with GB-Neck2 and TIP3P and achieved very good performance, roughly the top performance in the session we entered, for the 39 molecules. We can also do other things like CD melting curves or NMR J-couplings. But with all these great achievements and successes with the force field, I always remind myself that this is still a classical model. All models are wrong, and some are useful; another way to put it is that all models are wrong, it's just a matter of time. So we still need to think about the limitations of the force field and try to improve it, while the accuracy of the force field will still be constrained by the available computer power. One thing the current force field doesn't do really well, for instance, is that the correlation on helical propensity is imperfect. Helical propensity is a measurement from NMR that shows how likely an amino acid is to form a helix.
Different amino acids have different helical propensities; alanine is the most likely to form a helix, and proline and glycine are the least. What I'm showing here is the helical propensity from the MD prediction on the y-axis and the helical propensity from the NMR experiment on the x-axis, for 20 different amino acids. Each point represents an amino acid, with a helical propensity value from both the MD simulation and the experiment. We ran really long simulations for each of these peptides, and you can see the correlation is pretty bad, with an R² of only 0.38. ff14SB/TIP3P seems to be a very reasonable force field combination; we can reproduce a lot of experimental data really well with it, but in this test it doesn't work well. This shows a fundamental issue with ff14SB/TIP3P in mutation studies. For instance, suppose you have a mutation in a drug-binding pocket that causes drug resistance and you want to study it. If you mutate a site, say from aspartate to alanine, the helical propensity increases experimentally, but according to the simulation it decreases. That's just one example; given the poor correlation, we might make similar mistakes in other systems. So we saw this problem and tried to fix it. But instead of empirically fixing the symptoms, like the helical propensity correlation, we asked a more general question: what is fundamentally limiting the sequence dependence? What is causing it? We see the helical propensity is wrong, but that might not be the cause; it might just be a symptom of the force field. So we generated a lot of additional data trying to find the reason, and one thing we found is that the inaccuracy is in the amino-acid-specific backbone preferences.
Here I'm showing the phi/psi distributions for alanine and valine; those data are from the PDB Coil Library. If you compare these two distributions, they are clearly different. Alanine prefers PPII much more than the extended beta region, while valine has a flatter distribution, with the beta and PPII peaks about evenly high. The alpha basin in valine is more ridged, while the alpha basin in alanine is more diagonal. So we see the difference between alanine and valine in the PDB. We wanted to see what the distribution looks like in ff14SB, for instance, but we cannot simulate all the PDB structures, so we just use a dipeptide. The comparison might be limited, but since we only include the Coil Library structures, which shouldn't have any crystal packing and should be more exposed to solution, we use a dipeptide to mimic that environment. If we run a dipeptide with ff14SB in explicit solvent, we can see the two distributions look almost the same: both prefer PPII much more than beta, and both have very spherical, very symmetric basins. As I said, this comparison is limited; we can't exactly match a dipeptide simulation to the PDB, and it doesn't really make sense to match a dipeptide to the PDB. But let's look at another piece of data. We calculated quantum mechanics energies on a grid for the dipeptides: we take a dipeptide, build a grid of structures, calculate the QM energies, and plot them. We can see the QM surfaces don't agree with ff14SB; they agree much better with the PDB. Especially for alanine, if you look at the alpha basin, it's more diagonal, and in valine it's more ridged, with this V-shaped ridge between the beta and PPII regions, right here. This is very similar to what we see in the PDB.
This is still a qualitative comparison, but we did this for all 20 amino acids and saw a similar trend, and that made us think that maybe we should treat different amino acids differently, using different quantum data. That might be a good strategy. Before moving forward, I want to show some work from other groups. Robert Best's group, who also do force field development, saw the problem in the helical propensity: the correlation is pretty bad. This is not all the same force field; on the left is ff14SB and on the right is ff99SB, but the point is that the correlation is really bad in both cases. So they looked at the data and tried to refit some of the outliers, like D, E, I, and L, to try to improve the correlation, and it did get a little better: an R² of 0.5 and a slope of 0.6. But it's still not very good, and also, since they only refit the outliers and left the other amino acids unchanged, the fitting is not very consistent across amino acids; you treat them differently just to improve this correlation. So that has limitations, though that force field was developed about 10 years ago. Both of those force fields were combined with the TIP3P water model, and I haven't really talked about the solvent model yet. What if the poor correlation we see with ff14SB is really because of the solvent model? What if TIP3P is not a good one? And apparently TIP3P is not a good one. This is data from Onufriev and co-workers showing the errors in bulk properties for different water models. On the left, TIP3P has really large errors for most bulk properties, like the self-diffusion constant and the isothermal compressibility and others. SPC/E and TIP5P are also not very good, and TIP4P-Ew seems a little better but still not great. OPC seems very promising.
Seeing the performance of the different water models, we asked: what if we substitute TIP3P with a better water model? Will ff14SB give a better correlation? Here I want to ask you a question. How many of you think this will definitely improve the correlation? Okay, almost nobody raised their hands. And how many of you think this will definitely not improve the correlation? Okay, so the rest are the maybes, right? Well, if OPC could fix the correlation, there would be no ff19SB, right? So the answer is definitely no. With OPC, the correlation on helical propensity does not improve. You can see all the points, the helical propensities, are reduced by a roughly similar amount, but the correlation is still bad: an R² of 0.27. I don't think that's surprising, because the water model is not a direct function of any degree of freedom of the solute, of the peptide rotation; it only enters through the nonbonded interactions. If you think OPC can improve this, you are probably placing too much responsibility on the solvent model itself. So the problem might still be in the force field. We also tried removing outliers like D, E, G, and P, and the correlation didn't get any better. So go back to this plot. We think we need to treat different amino acids with different quantum mechanics energies; that might be a good strategy. And after looking at other people's work, we are more certain that neither a simple force field refit, like what Robert Best has done, nor a water model update solves the problem. We really need more systematic training of the force field. That drove us to revisit the assumptions made in force field training that might limit the accuracy. There are a lot of assumptions in a force field, like the 1-4 scaling factors, which are empirical, and the pairwise-additive fixed-charge model.
But a few things that we think are really important are these. First, we use uncoupled 1D cosine functions for the phi and psi dihedrals, and they are overly symmetric. Here I'm showing two plots on Ramachandran axes. The plot on the left is the current correction in ff14SB: I scan phi and psi and calculate the energy by applying the cosine functions of phi and psi. The result is very symmetric, because these are uncoupled 1D terms; they don't depend on each other. On the right is the "perfect" correction, generated by calculating QM minus ff14SB without any phi/psi dihedral terms. That should be our target; that is what we should be correcting for. They clearly don't look like each other, and in the perfect correction map we see diagonal features, which shows that this is a 2D problem instead of a 1D problem. Another assumption we have made is that the backbone dihedral's dependence on the side chain is treated insufficiently, because we use the same atom types for all the backbone dihedrals: we train against alanine and apply it to everything. Alanine might not be a very good model; if you look at helical propensity, alanine is the highest, so we should probably pick something in the middle to make this better. And lastly, which I think is a little more complicated, there is an inconsistent treatment of partial charges and dihedral fitting. We have adopted the RESP charges that were developed in the 1990s. They were calculated at Hartree-Fock/6-31G*, which deliberately overestimates the polarization to better mimic in-solution dipoles, because the charges won't change during the MD simulation.
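The point about uncoupled 1D cosine terms being overly symmetric can be sketched in a few lines. This is a minimal illustration with made-up torsion parameters (not Amber's actual ff14SB values): because the total is E(phi) + E(psi), the resulting 2D map is separable by construction and can never contain the diagonal phi/psi coupling features seen in the QM target.

```python
import math

def dihedral_energy(angle_deg, terms):
    """Amber-style 1D cosine series: sum_n V_n/2 * (1 + cos(n*angle - gamma_n))."""
    a = math.radians(angle_deg)
    return sum(v / 2.0 * (1.0 + math.cos(n * a - math.radians(g)))
               for v, n, g in terms)

# Hypothetical torsion parameters: (barrier V_n in kcal/mol, periodicity n, phase gamma)
PHI_TERMS = [(0.45, 1, 0.0), (1.25, 2, 180.0)]
PSI_TERMS = [(0.80, 1, 0.0), (0.55, 2, 180.0)]

def backbone_correction(phi, psi):
    # Uncoupled: the total is E(phi) + E(psi), so it is separable by construction.
    return dihedral_energy(phi, PHI_TERMS) + dihedral_energy(psi, PSI_TERMS)

# Separability means no phi/psi cross-term can ever appear:
# E(p1,s1) + E(p2,s2) == E(p1,s2) + E(p2,s1) for any four angles.
lhs = backbone_correction(-60, -45) + backbone_correction(-150, 150)
rhs = backbone_correction(-60, 150) + backbone_correction(-150, -45)
assert abs(lhs - rhs) < 1e-12
```

A 2D correction like CMAP breaks exactly this separability, which is why it can reproduce the diagonal features of the QM map.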
In the meantime, a lot of the dihedral fitting work uses gas-phase energies, and what's even worse, people keep making the gas-phase energy calculations more and more accurate, using MP2/cc-pVTZ or even complete-basis-set extrapolation. The reason that makes it worse is that you use the charges when you do the dihedral fitting, and fitting against gas-phase energies counteracts the intended polarization in the charges; you are cancelling a lot of the solvent polarization effects. Those are the three main things that we think are very important. So that brings us to our strategy for force field improvement. There are basically three steps in force field parameterization: step one, pick a model system; then create some reference data; then come up with an objective function to get the parameters. For the model system, we use a set of different dipeptides, scan the phi and psi dihedrals in two-dimensional space, save each structure, and calculate energies with QM in solution and MM in solution. That gives us energy surfaces for the different amino acids, with some grouping. For example, we use leucine as a model and apply it to tryptophan, tyrosine, and phenylalanine, because we really don't want ring effects incorporated into the backbone parameters. But all the rest of the residues have their own QM and MM calculations. For the objective function, we use CMAPs. A CMAP is not really an objective-function fit; it can actually do a perfect fit. When we calculate the CMAP, we just take QM and subtract MM without any phi/psi dihedral terms, so the fitting error is zero; there is no fitting error in this process. The error lies in transferability: when the structure changes, that causes transferability error. But in training, we get zero fitting error.
Eventually we have 16 CMAPs applied to the 20 different amino acids, with some grouping, and that requires a lot of QM calculations.

Can you explain again what a CMAP is?

So this is the plot of the CMAP on the right. We calculate QM energies on the grid, 24 by 24 structures, and we calculate MM energies on the same grid, and we do a simple subtraction to get the CMAP. It's a spline fit, a 2D spline applied to two coupled torsions, that you add as a correction on top of the individual single Fourier terms for each torsion. So it's a coupling term between two different torsions. Right. In the MD simulation, we interpolate with the bicubic spline function and take the derivative to drive the MD. In generating the CMAP, we just do single-point energy calculations; that's why I said there is zero fitting error.

But you are counting on the bicubic spline fit having zero error, right? If the bicubic spline has an error, that is also an error of the CMAP.

Yes, and we use 15-degree spacing, which is probably too big; we could probably use 10 degrees or something else. That is a limitation of the CMAP too. So I want to show a couple of things I think are really worth mentioning in the ff19SB development. The first is that we fit the CMAP to solvated QM, and we do this for multiple amino acids. Here on the left are the QM energies in the gas phase, and on the right is our training reference data, QM in solution. You can see they are very, very different. We try to avoid using QM in the gas phase because, if you look at that energy surface, it doesn't look like a protein at all: it doesn't have a minimum in the alpha region or in the beta region, and the gamma turn in the transition region is really stable. Yet that has been the training data used for proteins. Another big thing I think is very important is that we fit the CMAP to partially relaxed PDB rotamers. CMAP generation is very straightforward for alanine and glycine.
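The grid-subtraction step just described can be sketched as follows. This is a minimal illustration: the energy arrays here are random stand-ins for the solvated QM and MM single-point energies, and the lookup is a nearest-grid-point version of what Amber does (Amber actually interpolates the CMAP with a bicubic spline so the forces are smooth).

```python
import numpy as np

# 24x24 grid of phi/psi points at 15-degree spacing, as in the talk.
angles = np.arange(-180, 180, 15)              # 24 values per dimension

# Stand-in energy surfaces (kcal/mol); in the real workflow these are
# solvated QM and MM single-point energies on the same conformations,
# with the phi/psi dihedral terms already removed from the MM side.
rng = np.random.default_rng(0)
E_qm = rng.normal(size=(24, 24))
E_mm = rng.normal(size=(24, 24))

# The CMAP correction is a direct grid subtraction: QM minus MM.
cmap = E_qm - E_mm

# By construction, the "fitting error" on the training grid is zero:
assert np.allclose(E_mm + cmap, E_qm)

def cmap_lookup(phi, psi, grid=cmap):
    """Nearest-grid-point lookup with periodic wrapping; only shows
    the bookkeeping, not the bicubic spline used in the MD engine."""
    i = int(round((phi + 180) / 15)) % 24
    j = int(round((psi + 180) / 15)) % 24
    return grid[i, j]
```

The transferability question the speaker raises is exactly about evaluating this table on conformations (other rotamers) that were not in the training grid.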
We scan phi and psi and do the calculation. But what if the residue has a rotamer? Which rotamer should we use? It's not a 2D problem anymore; it's a multi-dimensional problem, and that's a complex question. Different people might have different strategies, but in our method, we initialize all the rotamers to 175 degrees, the most popular rotamer in the PDB Coil Library, and then relax the side chain with a strong restraint, making sure the structure stays on the grid while the rotamer can relax around that chi-1 value. Then I made this plot, also on Ramachandran axes, which is a heat map of the chi-1 value of valine. If you look at the color scale, it goes from 150 to 210 degrees; that's the chi-1 dihedral value. We initialize at 175 degrees, and after relaxation the values start to drift, without rotamer transitions. That arguably means we are not exactly using the most popular PDB rotamer, the trans rotamer. But if you look at the PDB data, they report chi-1 as 175 degrees with a range, because it cannot be that precise from the PDB; the range is 145 to 205 degrees. We are still within that range, so I think that should be fine. In practice, the benefit of doing this is that we avoid incorporating non-bonded errors into the CMAP. For instance, if there is a hydrogen-bond error and we bake it into our CMAP, the CMAP will apply it everywhere, whether a hydrogen bond is formed or not. We don't want that. Another point: if we used the global minimum at each phi/psi point, then as you walk across the CMAP, changing phi or psi by 15 degrees, your rotamer could jump by something like 80 degrees. That's a big change, and what if the side-chain dihedral has errors? We don't want that either. So in this process we make sure all the side chains stay similar, though slightly different.
Had you thought about correcting the side-chain rotamers, doing a more extensive refit of the side chains as well, or was there a reason you decided not to do that?

The side-chain rotamers are represented using the ff14SB parameters, and I think we already did a very systematic job on that fitting, so we don't think we can do much more with the side-chain dihedrals.

When you went through this process, did you leave those fixed and then refit on a separate new data set, or did you fit everything consistently to all the data you had generated previously?

So, did we refit phi and psi? No. Oh, sorry, the chi. No, I didn't refit chi; those were just left as they were.

Was that data included in your fit too, or just the new data? Just the new quantum data?

Just the new quantum data. And they actually use different quantum data; ff14SB was trained against the gas phase.

Ah, interesting.

But I have a plot showing that if you look at the profiles for, say, chi-1 or chi-2, the gas-phase QM data actually look very similar to the in-solution QM data.

For the same residue, yeah. Probably because it's valine, right? It's not very polar.

Yes. And we use the minimizer implemented in Amber, which is also what is used when you run an actual simulation, to do this relaxation. As I said, the CMAP can do a perfect fit for the training model, but we need to test it. I think the first thing to test is whether we have good transferability across rotamers. We used trans rotamers; what if the rotamer is different in the MD simulation? Does the CMAP still work? Here I'm showing the energy surfaces from ff14SB, QM, and ff19SB for the trans rotamer. We initialized it to trans and relaxed, and the structures stayed in the trans rotamer.
Do you mean the trans peptide bond?

No, no, the trans rotamer for chi-1.

Oh, I see.

Yeah, 175 degrees. It will be a little different because of the relaxation, but if you look at the energy surfaces, even after relaxation, the QM still looks almost the same as ff19SB, because that's how we trained it. And ff14SB doesn't look like the QM; for instance, the PPII basin is much deeper than the beta region. The question is, what if we use another rotamer? So we pick another rotamer, the gauche-minus rotamer: we initialize at gauche-minus and relax, and the rotamer stays within that range, still gauche-minus. We calculate QM, ff14SB, and ff19SB. If you compare the trans QM with the gauche-minus QM, they are different: in the alpha basin there is a diagonal shape here, while in trans it is more ridged, and in the positive-phi region they are also very different. ff14SB doesn't agree with the QM, while ff19SB at least reproduces this region really well, and also the positive-phi region. But that's a qualitative comparison, just visualizing the differences, so we did some quantification: we calculated the RMSD of the QM-MM difference. For trans, the ff14SB-QM difference is 1.7 kcal/mol, and ff19SB-QM is nearly zero, because that's the CMAP. For gauche-minus, the ff14SB-QM difference is 1.4 kcal/mol and ff19SB-QM is 0.9 kcal/mol. Next I want to show some results on how ff19SB improves MD simulations in terms of agreement with different types of experiment. First, the phi/psi distributions. In ff14SB, alanine and valine don't look like the PDB, and they are too similar to each other. When we run the simulations with ff19SB, at least the difference between them is reproduced. Look at valine in ff19SB, the bottom-right plot: the alpha basin is distinctly ridged.
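The RMSD quantification just quoted can be sketched like this. One assumption for illustration: each surface is shifted to its own mean before comparing, since energy surfaces are only defined up to an additive constant (the actual alignment convention used in the ff19SB work may differ).

```python
import numpy as np

def surface_rmsd(E_a, E_b):
    """RMSD (same units as the input, e.g. kcal/mol) between two energy
    surfaces evaluated on the same phi/psi grid, after removing each
    surface's mean so a constant offset does not count as error."""
    a = np.asarray(E_a, dtype=float)
    b = np.asarray(E_b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    return float(np.sqrt(np.mean((a - b) ** 2)))

# Toy check: surfaces differing only by a constant have zero RMSD,
# mirroring why CMAP-corrected MM matches QM exactly on the training grid.
grid = np.linspace(0.0, 3.0, 16).reshape(4, 4)
assert surface_rmsd(grid, grid + 2.5) < 1e-12
```

With this kind of metric, the quoted numbers (1.7 kcal/mol for ff14SB vs. near zero for ff19SB on the trans rotamer) become directly comparable across rotamers.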
There are peaks on beta and PPII of almost the same height, while in alanine the alpha basin is more diagonal, which also agrees with the PDB, and alanine prefers PPII much more than beta. But again, this comparison is limited by itself; we are just comparing dipeptide simulations to the PDB Coil Library. So we did some other tests. We used dipeptide NMR data, the J-couplings, and compared to the dipeptide simulations I showed earlier. Looking at the distributions, we see a really big difference from ff14SB to ff19SB, but we want to quantify it. Here I'm showing the error against the NMR J-coupling data for different force fields. The red bars are ff14SB with different solvent models; we ran simulations with the different force field and solvent model combinations and calculated the J-couplings from the simulations using the Karplus equation. The blue bars are ff19SB with different solvent models. You can't really tell which one is better, right? But think about the equation for the error: we use a chi-squared that equals the squared difference between the simulated J and the experimental J, divided by the systematic error. The systematic error is the uncertainty of the Karplus equation and its associated parameters, that is, the uncertainty in correlating a dihedral value with the NMR J-coupling. If we divide by that error, then a chi-squared below one means the disagreement between simulation and experiment is smaller than the systematic error of the calculation itself. In that sense, ff14SB and ff19SB are both reasonable in this test. So this is not a very sensitive test; why do I show it? Because during development, I had some parameter sets that do not do well in this plot, and you can trust me on that: I have a lot of parameter sets with big error bars here. So we have to make sure this test passes.
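The chi-squared metric just described can be sketched as follows. Hedged assumptions: the Karplus coefficients are one common literature parameterization for 3J(HN-Ha), not necessarily the ones used in this work, and the example J values and sigma are made up for illustration.

```python
import math

def karplus_j(phi_deg, A=6.51, B=-1.76, C=1.60):
    """Karplus relation for 3J(HN-Ha): J = A*cos^2(theta) + B*cos(theta) + C,
    with theta = phi - 60 degrees. Coefficients here are one common
    parameterization; values vary between studies."""
    theta = math.radians(phi_deg - 60.0)
    c = math.cos(theta)
    return A * c * c + B * c + C

def chi_square(j_sim, j_exp, sigma):
    """Average of (J_sim - J_exp)^2 / sigma^2, where sigma is the systematic
    uncertainty of the Karplus parameterization itself. A value below 1
    means the simulation-experiment disagreement is smaller than that
    systematic error."""
    n = len(j_sim)
    return sum((s - e) ** 2 for s, e in zip(j_sim, j_exp)) / (sigma ** 2 * n)

# Illustrative only: simulated couplings from three backbone phi angles
# compared against made-up "experimental" values.
j_sim = [karplus_j(p) for p in (-70.0, -95.0, -120.0)]
j_exp = [5.9, 7.1, 8.3]
chi2 = chi_square(j_sim, j_exp, sigma=0.73)
```

The key design point is the normalization: dividing by the Karplus uncertainty is what makes "chi-squared below one" a meaningful pass/fail threshold, and also why the test saturates and cannot distinguish ff14SB from ff19SB once both pass.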
And ff19SB, with its really systematic training, passes this test. For instance, if we use the alanine CMAP, apply it to valine, and run this test, the error bars can be really big; the transferability is not good. That's why we use a separate valine CMAP instead of the alanine one. Another, more sensitive test is the helical propensity test. I showed this plot a while ago: with ff14SB/OPC, the correlation is pretty bad, with an R² of 0.27. When we update the force field from ff14SB to ff19SB, R² goes to 0.75. And ff19SB was not trained against this data; it was trained against quantum mechanics. We combine it with OPC, which is also fit to QM data, and get a really good correlation. To produce this plot we needed to run 1.5 milliseconds of MD in total, straight MD, not replica exchange. We tested ff19SB with other water models as well; here I'm showing a few of them: ff19SB with TIP3P, with TIP4P-Ew, and with OPC. Looking at the correlation, ff19SB/TIP3P gives an R² of 0.5, ff19SB/TIP4P-Ew gives 0.6 with a slope of 1.3, and ff19SB/OPC is the best. If you still remember the earlier plot of water model errors in reproducing bulk properties: from TIP3P to TIP4P-Ew to OPC, the water models get better and better at reproducing bulk properties. A consequence is that ff19SB can even serve as a probe of water model quality, and we think ff19SB can work well with any water model that reproduces the bulk properties well. It doesn't have to be OPC; I think TIP4P-D might also be a promising water model.

What's the reasoning behind that, though? Because there are Lennard-Jones parameters, right? There's this trade-off between how you do the electrostatics and how you set the Lennard-Jones, the epsilon and sigma.

Right. From TIP3P to TIP4P-Ew to OPC, the dispersion on the oxygen gets bigger and bigger, so we have stronger dispersion interactions involving the solvent as we go from TIP3P to OPC. That's one thing.
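The R² and slope quoted for each force-field/water-model combination come from an ordinary least-squares fit of predicted versus experimental propensities, which can be sketched as below (the data here are toy values, not the actual propensity scale):

```python
def fit_line(xs, ys):
    """Ordinary least squares y = slope*x + intercept, plus R^2: the two
    numbers quoted for each force-field/water-model combination."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2

# Toy check: perfectly correlated data gives slope 1 and R^2 of 1, which is
# the ideal target for the MD-vs-NMR propensity plot.
slope, intercept, r2 = fit_line([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4])
assert abs(slope - 1.0) < 1e-12 and abs(r2 - 1.0) < 1e-12
```

Note that both numbers matter: a high R² with a slope far from 1 (like the 1.3 quoted for TIP4P-Ew) still means the force field exaggerates propensity differences between amino acids.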
And in OPC, they also refit the electrostatics, the partial charges, by fitting to the QM electrostatic potential.

And TIP4P-Ew and OPC both reproduce a lot of the bulk properties very well, across the whole range of conditions.

Yeah.

But they're clearly very different from each other when mixed with the protein force field, right? So is there some special property that OPC has, other than just fitting bulk properties, that you think makes it play well with ff19SB? Or do you see this good behavior for other good three-site or even good four-site water models?

Looking at the parameters of TIP4P-Ew and OPC, the only trend I found is that as the dispersion gets larger and larger, the ff19SB performance gets better and better.

Interesting.

Yeah, we definitely see other differences between the water models, but I really cannot see a trend in those. So right now I think a four-point model is reasonable, and we need to make sure the dispersion is strong enough. We tested other things, like NMR order parameters on globular proteins: GB3, ubiquitin, lysozyme. On the right I'm showing the order parameters; the colored curves are the force fields and black is the NMR. We have very good agreement no matter which force field we use; even ff14SB/OPC, which doesn't seem like a good combination, is very reasonable in this test. That's also why ff14SB/TIP3P has produced a lot of great studies: it is accurate in some cases. If you look at the RMSD histograms on the left, the blue curve is ff19SB/OPC. Very interestingly, ff19SB is more stable than ff14SB. Even though in the helical propensity plot ff19SB slightly overestimates helicity, it is even more stable than ff14SB, which kind of surprised me.
And this is just 200 nanoseconds of MD, because for these order parameters we really don't need to run that long, just longer than the tumbling time. We also tested whether ff19SB can accurately fold a helical structure and a beta hairpin. Here on the left is the K19 helical peptide; the peptide is 19 residues long, with a few glycines, then lysines and alanines. This is a test system we used for the ff14SB development. In ff14SB, we empirically tweaked the parameters together with TIP3P to better reproduce this type of data, the K19 data and also alanine peptides. That's why K19 with ff14SB/TIP3P agrees pretty well with the black dots, the NMR data. But look at the yellow curve: when we switch to a better water model, ff14SB immediately fails. And ff19SB/OPC, where both are quantum-based, also agrees pretty well with the curve. I think that's very exciting for us. Also, since the helicity is shifted a little toward the upper left, slightly overestimating helicity, and K19 is also overstabilized by about 0.1 kcal/mol, we wanted to see whether that overestimate of helicity would compromise the beta hairpin. So we tested CLN025, running from extended and native structures for about 56 microseconds collectively, using ff14SB/TIP3P, ff14SB/OPC, and ff19SB/OPC. The red one is ff14SB/TIP3P; that's the most stable, more stable than ff19SB. ff19SB is the blue one, but it's still reasonable. The yellow one is ff14SB/OPC, and we see a broad peak of unfolded structures, the high-RMSD region here. So, to conclude: in ff19SB we introduced coupled phi/psi dihedral parameters for each of the amino acids separately.
And we have significantly improved the backbone profiles for the 20 amino acids, which can be seen from the PDB comparison; 19SB has improved the amino-acid-specific properties like the J-couplings and helical propensities, and it reasonably reproduces the secondary-structure content — helical content and beta-hairpin stability — and also the order parameters of the proteins. In terms of availability of the force field: we have a manuscript on ChemRxiv, it's in the review process at JCTC, and the code, parameters, and test cases are on GitHub. Everything is implemented in Amber already; even if you just have AmberTools, you can update and everything will be in there. And Amber 20 will be out next year. We also provided some test cases for use outside Amber, but those only cover the force-field implementation: we provide some structures and their 19SB energies, and users can check whether the energies are exactly reproduced by their implementation, to verify they are implementing 19SB correctly. Do you know if it made it into an update of the AmberTools 19 conda package, or has that not been built yet? The conda package is where people can grab all of AmberTools, which has LEaP and all the force fields in it as well — Dave Case's group handles that. Yeah, I did that patch, but I applied it to the Amber 19 with-patches branch. I think that's a different branch. How long ago was the patch, out of curiosity? Like last week. Last week, okay. All right, so that's the availability. And with that, I want to first thank my advisor, Carlos Simmerling. We have a collaborator from Brookhaven, Wenting Wu. And this force-field project was funded three years ago by NSF. And also thanks to OpenFF and MSKCC — thanks, John, for inviting me, and thanks to John's group. We had a nice lunch and good discussion. Yeah.
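Coming back to the outside-Amber test cases mentioned above: the check the speaker describes — reproducing reference single-point energies — amounts to a simple regression comparison. A minimal sketch, where the structure names and tolerance are hypothetical placeholders, not values from the actual test suite:

```python
def check_implementation(reference, computed, tol=1e-4):
    """Compare per-structure energies (kcal/mol) from a new FF19SB
    implementation against published reference values; return the
    list of structures whose energies disagree beyond `tol`."""
    mismatches = []
    for name, e_ref in reference.items():
        e_new = computed[name]
        if abs(e_new - e_ref) > tol:
            mismatches.append((name, e_ref, e_new))
    return mismatches
```

An empty return list means the port reproduces the reference energies within tolerance; any entries pinpoint which structures (and by how much) disagree.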
And with that, I'm ready to take any questions. Let's start with questions from Zoom so we don't forget anybody. Yeah, this is David Mobley at Irvine. I was curious whether you see any evidence of a need to refit the Lennard-Jones parameters, and how easy it would be to redo this with refit Lennard-Jones. In our experience, it's not very straightforward to refit the Lennard-Jones to achieve better amino-acid-specific properties like the helical propensity. What I'm asking, I guess, is: do you see any evidence that some of the Lennard-Jones might be wrong? And if somebody handed you Lennard-Jones parameters that had been refit, how hard would it be to redo this? We did some empirical corrections to the Lennard-Jones. For instance, when we did the valine CMAP, we saw a disagreement between the QM and MM, and since valine is beta-branched, the error might be in the Lennard-Jones. So we empirically shifted the Lennard-Jones curve to the left or to the right in a grid-scan manner, and sometimes the agreement between QM and MM gets a lot better, but it's not guaranteed: of the 24 × 24 structures in the phi/psi space, some structures get better and some don't. So from my experience, it's really hard to see whether we can make a systematic correction to the Lennard-Jones that guarantees better agreement between QM and MM. But I think that's a very good question, and something we definitely think about, because the dihedral correction is actually compensating for errors in the other terms — the bonds and angles, the Lennard-Jones, and also the charge model.
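The grid scan described here shifts the position of the Lennard-Jones minimum and re-evaluates the QM–MM agreement at each shift. A minimal sketch of the energy function in AMBER's (epsilon, r_min) convention, with a hypothetical scan helper; the actual valine test involved full CMAP re-evaluation at each shift:

```python
def lj_energy(r, epsilon, r_min):
    """12-6 Lennard-Jones energy in AMBER's (epsilon, r_min) convention:
    U(r) = epsilon * ((r_min/r)**12 - 2*(r_min/r)**6), so the minimum
    sits at r = r_min with well depth -epsilon."""
    x = r_min / r
    return epsilon * (x ** 12 - 2.0 * x ** 6)

def scan_shifted_minima(r, epsilon, r_min, deltas):
    """Grid scan over left/right shifts of the LJ minimum (deltas in
    Angstrom), as in the empirical valine test described above."""
    return [lj_energy(r, epsilon, r_min + d) for d in deltas]
```

Evaluating the QM–MM residual at each shifted `r_min` is what reveals the mixed outcome the speaker mentions: some conformations improve while others degrade.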
So if we cannot make a really systematic correction to the nonbonded terms or the other bonded terms, then dihedral fitting is probably the easiest and most straightforward way to do it. More questions from Zoom? I have a question — this is Esteban from Chile. I want to ask if you have a feeling for what the role of explicit water would be. If you compared the implicit calculations and redid them with explicit solvent, would it change your MM reference data or the QM data? You mean in the training? Yeah. So basically you could express what the water model can explain. Right. Would that make a difference in the dihedral parameters? Yeah. So in the training part, we use GB-Neck2 plus a SASA term, and in the QM we use the SMD solvent model, so the solvation parts should cancel. To do explicit solvent on the MM side, we would probably need a TI calculation for each grid point of the CMAP — for each of the 576 structures, we would have to do TI to calculate the solvation energy, add it to the MM, compare with the QM, and get the CMAP. That's probably too expensive. And the reason we use GB/SA in the training is that, theoretically, it's very similar to the SMD and PCM solvent models: they are all based on the Born/Onsager picture, they all define a dielectric boundary, and they have effective radii. So we think that in theory they should be similar, and we really can see they cancel almost perfectly in the solvation part, right? Did you have to do any refitting of the GB-Neck2 parameters to the quantum level of theory or the implicit solvent model you were using? You mean if we change the GB model? Did you have to refit the GB model parameters — there are still a bunch of parameters in the GB model? Yeah, we used GB-Neck2. So you didn't change those parameters, even though they were fit elsewhere? Well, we did test both GB-OBC and GB-Neck2.
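The training target described above — QM-in-solution energies minus MM-in-implicit-solvent energies on a 24 × 24 phi/psi grid — can be sketched as a simple grid residual. This is only the core idea; the actual FF19SB fitting involved considerably more machinery:

```python
import numpy as np

def cmap_correction(e_qm, e_mm):
    """Residual between QM energies in implicit solvent (e.g. SMD) and
    MM energies in implicit solvent (e.g. GB-Neck2 + SASA) on a
    24 x 24 phi/psi grid, each referenced to its own minimum so that
    only relative conformational energies enter the correction."""
    qm = np.asarray(e_qm, float)
    mm = np.asarray(e_mm, float)
    assert qm.shape == mm.shape == (24, 24)
    return (qm - qm.min()) - (mm - mm.min())
```

Because each surface is referenced to its own minimum, any constant offset between the QM and MM energy scales (including a uniformly cancelling solvation term) drops out of the correction.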
Since it's a dipeptide, it's fully exposed to the solvent, so we can imagine the difference is almost zero, right? They are almost the same. Okay. Yeah — we even did some TI calculations with explicit OPC and TIP3P, and they are very similar to the GB energies, because the dipeptide is so small: it's fully exposed to solvent and there's no burial. So we think it's probably fine, and in theory implicit should even be better for consistency, because we use implicit solvent in the QM as well. I hope that answers your question. Yeah, thank you very much. Well, maybe a second question: could you remind me what the difference in the charge model between FF14SB and FF19SB is? They are the same charge model. So it's the same charge model? Yes. Okay. Thanks. Other questions from Zoom or from New York? I do have a couple of questions. What criteria do you consider to decide that a fit is good enough? We usually test it. Oh, yeah — so the reason we test the way we do: after we generate a CMAP, we basically do three tests. We run MD on a peptide with that CMAP and compare to the PDB distributions, we quantify the error with NMR J-couplings, and we calculate the helical propensity and compare, right? The reason is that the J-coupling depends only on the phi dihedral — it's the coupling between the hydrogen on the amide and the hydrogen on the C-alpha, which correlates with phi. With that test we can at least be sure whether the barrier between the beta and PPII regions is right or wrong, but it tells us nothing about psi. That's why we do the helical propensity test. We follow this logic to cover as much of the parameter space as possible — we definitely cannot test the whole parameter space, because it's so large — but I think those three tests are enough.
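For context on why the dipeptide solvation terms cancel so cleanly, here is a minimal generalized Born energy in the Still et al. functional form — an illustrative sketch, not the GB-Neck2 implementation (which uses its own effective-radius calculation). For a single fully exposed charge it reduces exactly to the Born equation; units are simplified so Coulomb's constant is 1:

```python
import numpy as np

def gb_energy(charges, coords, radii, eps_solvent=78.5):
    """Generalized Born polar solvation energy, Still et al. form:
    dG = -1/2 * (1 - 1/eps) * sum_ij q_i q_j / f_GB, where
    f_GB = sqrt(r_ij^2 + R_i R_j * exp(-r_ij^2 / (4 R_i R_j))).
    The i == j self terms are included deliberately: for a single
    exposed charge the sum reduces to the Born equation q^2 / R."""
    q = np.asarray(charges, float)
    x = np.asarray(coords, float)
    R = np.asarray(radii, float)
    total = 0.0
    for i in range(len(q)):
        for j in range(len(q)):
            r2 = float(np.sum((x[i] - x[j]) ** 2))
            f = np.sqrt(r2 + R[i] * R[j] * np.exp(-r2 / (4.0 * R[i] * R[j])))
            total += q[i] * q[j] / f
    return -0.5 * (1.0 - 1.0 / eps_solvent) * total
```

The effective radii `R` are what encode burial; for a fully solvent-exposed dipeptide they stay close to the intrinsic atomic radii, which is why GB and explicit-solvent TI agree so well in that limit.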
And then we combine those CMAPs and test on bigger proteins to see if they work well. I think, importantly, your surrogates require only a few hundred nanoseconds on big proteins or a few microseconds on small peptides, so they're tractable within a reasonable amount of computational time. There is the worry, though, that the J-coupling constants — maybe less so the S² order parameters, but certainly the fractional helicity — are all interpreted through a model. The Karplus equation is just a cosine expansion, right? It's probably approximate, and the coefficients have been fit separately for each coupling. The Karplus parameters I used for the J-coupling calculation are, I think, the original parameter set — but there are many parameter sets. There are many, yes. And the fractional helicity, of course, is a complete interpretation of a CD experiment, presumably a temperature-dependent CD experiment. Yeah, that's true. Even for K19, the helical content was assigned from the chemical shifts. The chemical shifts, I see. Yeah, they just assume a scale from zero to one, and any chemical shift within that range is assigned a percentage, right? That's how we got the roughly 30% helicity; that's how we got this data. Have there been other experimental, biophysical, or thermophysical benchmarks you've been thinking about looking at in the future? For example, Dave Case has been simulating crystals — you actually put the protein in the crystal lattice, cool it down, and run a simulation. If you had the opportunity, what else would you benchmark against? Right. I think we'll probably do some real application studies with the force field, like drug binding or other things. But in terms of the parameters, we think this is probably enough; maybe we can test the αL region, because we haven't touched that area.
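Two of the experimental interpretations discussed here — the Karplus relation and chemical-shift-derived helicity — are short formulas worth writing out. A sketch, with the caveat the discussion itself raises: the Karplus coefficients below are one published set among many, and the coil/helix reference shifts in the helicity function are placeholders the user must supply:

```python
import numpy as np

def karplus_j_hn_ha(phi_deg, A=7.09, B=-1.42, C=1.55):
    """3J(HN-Halpha) scalar coupling from the backbone phi dihedral:
    J = A*cos^2(theta) + B*cos(theta) + C with theta = phi - 60 deg.
    The default A, B, C are one published coefficient set; as noted
    in the discussion, several alternative sets exist."""
    c = np.cos(np.radians(phi_deg - 60.0))
    return A * c ** 2 + B * c + C

def fractional_helicity(delta_obs, delta_coil, delta_helix):
    """Fractional helicity from a chemical shift, interpolated linearly
    between (placeholder) coil and fully helical reference shifts,
    clipped to the physical range [0, 1]."""
    f = (delta_obs - delta_coil) / (delta_helix - delta_coil)
    return float(np.clip(f, 0.0, 1.0))
```

Both functions illustrate the point being made: the "experimental" J-couplings and helicities that force fields are judged against are themselves model-mediated quantities, sensitive to the chosen coefficients and reference shifts.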
But yeah, since we really cannot validate everything, we have to trust the quantum, I guess — that's our target data — though obviously there is some uncertainty in the quantum calculations as well. But yeah, I think maybe the αL region might be something we can test. Any more questions? Anything else from Zoom? If not, let's thank him again. Thanks to everybody who joined remotely.