So today I'm going to give you a big-picture update on progress and status, and hopefully get you excited about what's to come. I'm not going to drill down into detail on any particular issue today; you'll hear a lot more about many of the specific issues in what's coming up, although there will probably also be things that we don't drill into much except in discussion. To remind you where we are: there's a great group of industry partners, shown here, supporting this initiative. The way it currently works is that "the initiative" is the term we use for the whole broad effort, including all of the science, and "the consortium" is the formal, industry-funded component. So these folks, and we really appreciate you, sign an agreement and give money to MolSSI, which serves as a coordinating intermediary and then gives fellowships to folks managed by the academic PIs, mostly in these groups. We also have a couple of great external consultants, and we really appreciate everyone's contributions and support, either money or, in some cases, in-kind support in terms of people. The initiative is partly about getting us as a field to where we would like to be: force fields that are really accurate and broadly useful. Some of the things that I think a lot of us would like to see in a few years are these: you're working on a synthetic project that covers some chemistry not covered by a current force field, and you can almost immediately have a custom force field that works very well for your particular chemistry, with a minimal amount of computational expertise and time spent; or you're dealing with new covalent probes in a chemical biology context, and you have force fields that can accurately model those.
Being able to do force field science rapidly, where you can compare, let's say, a polarizable force field and a fixed-charge force field that are fit to the same data, see what the speed-versus-accuracy tradeoffs are going to be, and pick what's going to work best for your particular target, your particular project. We'd also like to end up where we're not the only ones doing lots of force field fitting; many people are doing it, and force field science is thus able to progress a lot more rapidly. Yeah, I think that might be a good topic for discussion. I think the question is, how do we avoid going the way of DFT and just ending up with a billion force fields that are highly variable? I think one aspect of that will be that the force fields that tend to be broadly transferable and broadly useful will win out. So if it connects to the benchmarking infrastructure, if we clearly benchmark everything, we'll see the utility and transferability of different things. Yeah, so let's come back to this in discussion. So the initiative isn't grown up yet, but to give you a recap of the history: after lots of discussions over the years about force field complaints, we finally decided to start doing something about it. Chris Bayly arrived in Irvine on sabbatical on June 12, and I helped him get checked in at the place he was going to be staying, and then immediately my wife went into labor. So this was the start of a new initiative: Chris arrives and we have a new baby on the same day. And then progress accelerated as the consortium launched and there was formal funding in 2018. So now I think we're roughly in the same place my daughter is: we're working on our letters, she's got a book there, and at the same time we're doing our first full force field refits. So we're maybe not all grown up yet, but we're not newborns anymore either.
So today I'm going to briefly recap what the Open Force Field Initiative involves and tell you about our progress on automated fitting so far. I'll give a brief update on release one. The slide actually says "in the next few days," but that was written prior to last night, when we think we actually got it ready. Then I'll cover plans for benchmarking and getting going with it, and some thoughts on where we're going. Let me tell you what the big picture is in terms of our key parts. We start with some kind of initial force field, and we have a bunch of infrastructure that's focused on fitting. We have our Open Force Field Toolkit, which is the thing that actually applies a force field and implements the force field specification. We have a couple of key components that go inside this infrastructure box and take in data. One of them is a parameter optimizer, for which we currently use ForceBalance, and it's doing least-squares optimization. Then we have something I'll come back to in a minute, called a physical property estimator, that can compute condensed-phase physical properties like densities, dielectric constants, and so on, so we can use it to fit to experimental properties. And then we have the data that we're going to be using: quantum data that's in QCArchive, and experimental physical property data. The idea is that this piece can work essentially independently of a person and produce an optimized force field that we would then assess, and the assessment might feed back into more fitting or into a release. So those are the key components, and you're going to hear a lot more about each of them in the meeting.
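As a rough illustration of the least-squares idea mentioned above, here is a toy sketch that fits the force constant and equilibrium length of a single harmonic bond to reference energies. This is not ForceBalance's objective function or API; the function names, the synthetic data, and the units are all assumptions made for illustration, and a real fit would combine many weighted targets rather than one bond.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_harmonic_bond(lengths, energies):
    """Least-squares fit of E(r) = 0.5*k*(r - r0)**2 + e0 to reference energies.

    Expanding the square gives E = a*x**2 + b*x + c with x = r - mean(r),
    which is linear in (a, b, c), so the normal equations solve it directly.
    """
    center = sum(lengths) / len(lengths)  # centering keeps the system well conditioned
    basis = [[(r - center) ** 2, r - center, 1.0] for r in lengths]
    btb = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    bte = [sum(row[i] * e for row, e in zip(basis, energies)) for i in range(3)]
    a, b, _ = solve_linear(btb, bte)
    k = 2.0 * a                   # a = k / 2
    r0 = center - b / (2.0 * a)   # minimum of the fitted parabola
    return k, r0


# Synthetic "reference" energies generated from a known bond (hypothetical values).
demo_lengths = [1.33 + 0.02 * i for i in range(11)]
demo_energies = [0.5 * 600.0 * (r - 1.43) ** 2 for r in demo_lengths]
demo_k, demo_r0 = fit_harmonic_bond(demo_lengths, demo_energies)
print(f"fitted k = {demo_k:.3f}, fitted r0 = {demo_r0:.4f}")
```

The point of the sketch is only that, once the targets and parameters are defined, the fit itself is mechanical, which is what lets the pipeline run without a person in the loop.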
Our key infrastructure at this point again involves the Open Force Field Toolkit; QCArchive, which is where our quantum data is being stored and is available to the public; and ForceBalance, which is our current tool for fitting, and which can use data from QCArchive and also connect to the condensed-phase data. The property estimator is another key component. The idea is that it computes whatever condensed-phase properties we're interested in, potentially even, ultimately, binding free energies and other things. The property estimator can be used both for just computing those properties and also, with ForceBalance, for fitting to those properties. There's a variety of other tools we're working on, but those are the core of that automated fitting box. So how are we doing on automated fitting? I'm going to begin with a recap of where we started, because I know some people are just joining us or won't remember that clearly. We started with a new force field format called the SMIRNOFF format that doesn't do traditional atom typing. Instead it uses substructure searches to assign parameters. One of the key differences about using a substructure search, in this case with SMARTS or SMIRKS, is that instead of a process where we first take a molecule and assign types and then use the types to look up the parameters, we just take a molecule and do substructure searches on it to assign parameters. So it's a direct parameterization process rather than an indirect one, which ends up resulting in some important differences. That's been published and it's been around for a while. What Chris Bayly did on his sabbatical, and some work in my group continued, was porting the Merck-Frosst force field, parm@Frosst, which is an extension of parm99 and a general small-molecule force field, into this format.
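To make the "direct parameterization" idea concrete, here is a minimal toy sketch. Real SMIRNOFF force fields use SMIRKS patterns matched by a cheminformatics toolkit; in this sketch plain Python predicates stand in for the patterns, and the parameter labels and values are invented for illustration. The rule that a later, more specific match overrides an earlier, more general one follows the SMIRNOFF hierarchy convention.

```python
from typing import Callable, List, NamedTuple


class Bond(NamedTuple):
    elem_a: str
    elem_b: str
    order: int


class BondParameter(NamedTuple):
    label: str
    matches: Callable[[Bond], bool]  # stand-in for a SMIRKS substructure query
    length: float                    # illustrative equilibrium length
    k: float                         # illustrative force constant


def elements(bond: Bond) -> set:
    return {bond.elem_a, bond.elem_b}


# Ordered from general to specific; hypothetical entries, not real parameters.
PARAMETERS: List[BondParameter] = [
    BondParameter("generic single bond", lambda b: b.order == 1, 1.50, 500.0),
    BondParameter("C-O single bond",
                  lambda b: b.order == 1 and elements(b) == {"C", "O"}, 1.43, 600.0),
    BondParameter("C=O double bond",
                  lambda b: b.order == 2 and elements(b) == {"C", "O"}, 1.22, 1100.0),
]


def assign_bond_parameter(bond: Bond) -> BondParameter:
    """Query the bond against every pattern directly; the last match wins.

    There is no intermediate atom-typing step: the chemistry of the bond
    itself is what selects the parameter.
    """
    chosen = None
    for parameter in PARAMETERS:
        if parameter.matches(bond):
            chosen = parameter
    if chosen is None:
        raise LookupError(f"no parameter covers {bond}")
    return chosen


print(assign_bond_parameter(Bond("C", "O", 1)).label)
print(assign_bond_parameter(Bond("C", "C", 1)).label)
```

Because specific patterns only need to be added where the generic one is not good enough, a hierarchy like this can stay much smaller than a table keyed on pairs of atom types.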
So you could think of parm@Frosst as a sibling of GAFF; it's an AMBER-family small-molecule force field. When ported into this format and simplified, it has far fewer parameters than parm@Frosst, which has something like 3600 lines of parameters, whereas here we have about 330. So it's less than one tenth the size. It removes a lot of redundancy, but at the same time it actually covers more chemical space. We've applied it to a couple of different test sets, this is a ZINC subset and this is DrugBank, and we cover a significant fraction of the molecules; we actually cover more of the set. I don't want to belabor this because it's been around for a while, but we benchmarked it on a number of things, including hydration free energies. This is the SMIRNOFF-versus-GAFF plot. Because it's basically a sibling of GAFF in a different format, we were expecting performance to be roughly comparable to GAFF on a wide variety of properties, and that's what we've seen so far. So our initial pass was essentially just to make sure we hadn't messed anything up too badly, and it seemed that we hadn't, but we have far fewer parameters. That meant it's a great place to start refitting. This year we've been busy getting ready to do that and actually doing it. October was the formal launch of the consortium, and we spent a lot of time on infrastructure after that. Our first publication came out here, and then we spent a couple of months on infrastructure for the basic toolkit, including getting RDKit support in so that people could use our force fields even without an OpenEye license, or as an alternative. There was also a lot of development on QCArchive, the automated torsion drive pipeline, and a bunch of other things.
So there was the toolkit and force field infrastructure, and that overlapped with some dataset curation: figuring out which molecules we were going to run through our QM torsion drive pipeline and do QM geometry optimizations on, and figuring out which physical property datasets we were going to use initially. We said we'd have this first optimization fit sprint, which is when we actually produce the first refit force field, rather than a force field that's just a port into our new format. What we wanted to do there was refit all the bonds, angles, and torsions, refit Lennard-Jones on a selected set of neat liquid properties, and curate some host-guest data for use in benchmarking. We're here, where basically we said we'd be releasing a beta of our force field at the beginning of September, and we're doing that now. Then we're heading into a period where we're going to do some assessment to see whether our refits have improved things or not, and get a sense of what that's doing to accuracy for different types of properties. What we had said we were going to do in year one was release and validate the format and the initial set of parameters, and get out the toolkit with OpenEye and RDKit support. We've done those things. The QCArchive platform that MolSSI has worked on, and that we've helped support, is out, and a lot of our data is in it. We have the property estimator working and it's playing a role in our refits, and we have refit bond, angle, and torsion parameters along with selected Lennard-Jones parameters. So the last thing on our list is assessment of our first refit force field, and it looks like we should be on track to do that.
I want to point out that not very much of our time has actually gone into fitting force fields, which was the whole point of the infrastructure work. There's this big period from October to the end of May where we were just building infrastructure, and then some data generation and curation. Then Yudong, in Lee-Ping's group, and you'll get to hear more about this, has been doing fitting for some of June, July, and August, and that's more than a dozen refits of the force field already. So it doesn't take a really significant investment of human time anymore for us to refit a force field, which is exactly what we want, because then we can make rapid progress. It becomes a data-driven thing rather than a human-driven thing. So it's becoming routine. A lot of folks have worked on making this possible, but I was looking at Yudong's release notes, and he did a first test refit on May 31 that used a very limited set of torsions only, just to make sure the infrastructure was starting to work. The next one was July 11, using torsions and optimized geometries, and there have been about a dozen more refits since then. Some of those included adding new science, like excluding torsions that had strong internal hydrogen bonds, fixing some issues with chemical perception, fixing some hierarchy problems, freezing angles that were linear, and adding new targets like vibrational frequencies, and then most recently bringing in selected Lennard-Jones refitting. All of that has happened since July 11. And it's reached the point where this is automated enough that we can do a full refit of the force field to check scientific questions like, would we be better off changing the chemical perception in this way? So let's try that, refit the whole force field, and see what that does to accuracy on the training set; benchmarking comes later. That's really, really helpful to us.
Instead of having to guess whether something is going to improve things and just doing it, we get to try it. Force fields are going to have normal names, or boring names, like openff-1.0, and they're also going to have code names after herbs, so our first one is going to be called Parsley. The names are going to be openff-X.Y.Z, where X is basically the functional form, Y is the phase we're at in the fitting, and Z is the fix. So the first one is Parsley. It has been available as of last night, though over the course of the meeting it should get a little more accessible. Right now it's a bit hidden in a tarball somewhere that you can get at if you want it now, but by the end of the weekend it should be really easy to get. We've used this infrastructure we built to refit essentially all the bonds, angles, and torsions, except for some very, very unusual chemistry that's hard to cover with our data. And we've refit a selected set of Lennard-Jones parameters to condensed-phase properties, density and heat of vaporization at this point. Part of the goal of the Lennard-Jones refit at this point was just to make sure our infrastructure is working properly for that, so we've done it in a pretty limited way, for pure solvents only. We're ready to extend to mixtures and out to other properties, but we're not doing that yet. Oh yeah, I can repeat questions. The question was, are we doing pairwise Lennard-Jones or just standard Lennard-Jones, and the answer is just standard. Okay, and Mike's grabbing the mic. So we'll see what happens in terms of accuracy from this. We think the whole force field will be an improvement, because it certainly seems to be a significant improvement on the data we're fitting to. But most importantly for us, it's a proof of principle and prep for subsequent releases, as we start being able to fold in more and more data.
And especially on the Lennard-Jones side, we'll be able to refit to additional properties. So we hope that people will see improvements in accuracy already; we'll find that out soon. In terms of what QM data we're using: Daniel Smith and others, including Yudong, have put a great amount of effort into this on the QC side, and MolSSI's quantum chemistry resources are what we're relying on. There's QCPortal, which allows us to interact with QCFractal, which is basically the workflow engine, and we can run a lot of compute on our clusters from that. Then the results get deposited in QCArchive, where we can pull them down and use them for fitting, and we don't really have to worry about where the data lives; it's just easily retrievable. In our datasets in QCArchive we now have about 2400 torsion drives, which is 170,000 geometry optimizations, 50,000 Hessians, and something like 3.4 million gradient evaluations. We're not using all of that in fitting yet. Some of this goes significantly beyond what we're fitting to at this point, and we'll likely be drawing on some of it in testing. It's a significant amount of data. It's worth noting that all the torsion drives so far are, whoops, that bullet point is incorrect. All the torsion drives so far are on molecules that are fragment-sized, so whole molecules; we haven't done fragmentation of larger molecules yet, but automated fragmentation of larger molecules is essentially ready, so we're going to be doing that very soon. So we have a good set of data. I'll give you a peek into some of it. The very first set we ran through was what we call the Roche set, a set of fragments or molecules submitted by Roche, and here are some of them. It's 468 molecules, and these are all small enough that we're doing 1D torsion profiles on them without fragmentation. We ended up with 798 1D torsion profiles, 936 optimized geometries, and 660 vibrational frequencies.
So that's the first set we used, and there's some code here at the bottom, which you can't quite see, that shows you how to retrieve that data from the archive. We also have a second set of molecules I'll show you. "If there's a couple of torsions, do you just run one independently and then run the other one and then say that's it?" That's what we're doing right now. We have the ability to do two-dimensional torsional scans, but we haven't done many of them so far because of the cost; we're going to be looking at that, or n-dimensional torsion scans more generally. And I know Lee-Ping has done some of that for some of the force fields he's fit in the past. So we're not doing that yet, but we can. Right, okay. And then the coverage set: an undergrad in my group, Bryon Tjanaka, did essentially a greedy set cover on molecules to try to find the smallest number of molecules below a certain size that would use all of our parameters. We just wanted to make sure we had at least some molecules that would cover essentially all the parameters in the force field, and we've run those through. It's a rather small set of molecules, I think a little under 100. We have 417 1D torsion profiles, 831 optimized geometries, and 235 vibrational frequencies. And here are some of those molecules. Some of them have rather unusual chemistry, because we're trying to make sure that even our very oddball parameters get exercised. So that is available. And there's also condensed-phase data. The goal here was mainly to test the infrastructure; we weren't necessarily expecting to see major changes in the Lennard-Jones parameters, but we wanted to make sure it would work. We have a dataset of 58 data points with 30 molecules, where we're fitting to pure solvent properties, just density and heat of vaporization at this point.
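The coverage-set idea mentioned above can be sketched as a greedy set cover. This is a generic textbook version under assumed inputs, not the script that was actually used: the molecule names and parameter IDs below are hypothetical, and the molecule-size filter is left out. Greedy selection is an approximation, so it is not guaranteed to find the true minimum number of molecules.

```python
from typing import Dict, List, Set


def greedy_parameter_cover(molecule_params: Dict[str, Set[str]]) -> List[str]:
    """Pick molecules until every parameter used by any molecule is covered.

    At each step, choose the molecule that exercises the most parameters
    that are still uncovered (ties broken alphabetically for determinism).
    """
    uncovered: Set[str] = set().union(*molecule_params.values())
    chosen: List[str] = []
    while uncovered:
        best = max(sorted(molecule_params),
                   key=lambda name: len(molecule_params[name] & uncovered))
        chosen.append(best)
        uncovered -= molecule_params[best]
    return chosen


# Hypothetical mapping from molecule to the force field parameters it uses.
example = {
    "mol_A": {"b1", "b2", "a1"},
    "mol_B": {"b2", "t1"},
    "mol_C": {"a1", "t1", "t2"},
    "mol_D": {"b1"},
}
print(greedy_parameter_cover(example))
```

In practice the mapping would come from applying the force field to each candidate molecule and recording which parameters get assigned.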
These are the parameters, you can see one at the bottom, and this is how much data we have for each of our parameters in this set. We don't cover all the parameters; we cover a subset of them, and these are the ones that are covered. So here's how many data points we have for each. This is largely the work of Simon Boothroyd. And here are some of the compounds involved. There are some core cases that you would expect. We're drawing from ThermoML, and ThermoML is relatively new, so it turns out it has some gaps with respect to compounds that you would naively expect it to have. For example, it doesn't have densities and heats of vaporization for benzene and certain other compounds like that, so we're going to have to curate some of that data from elsewhere. So this is what was in our first set of compounds; we do have some interesting things there, but some gaps. Currently we're using AM1-BCC charges for everything, which is supported in both the OpenEye and AmberTools toolkits, and likely there will be support for some new charge model, and new BCCs are coming soon; you'll hear about that later on. We're not fitting to dielectric constants yet. That's something we're very interested in, but we expect it's going to be more important and more helpful once we have the ability to also adjust BCCs. I think there are science questions that need answering about what will result in the best accuracy and predictive power. So we're fitting, great, but does it actually work? We'll start off with a quick sanity check. Chris had pointed out in June that he thought he had a mistake in the bond length for ethers. This is what he said: it has an "egregiously short" length of 1.370; could we please change it to 1.430? So we said, no, let's see what happens when we refit it. Here's the refit value for that. And so Chris is the wizard.
And actually the first refit was 1.430; subsequent refitting changed it, so now he's off by 0.004. However, someone in my group also noticed the same thing. So originally we were looking at 1.37, which is for esters, but if you look at this, this is some torsion drive data for a variety of different compounds, and if you histogram it, here's what the bond lengths look like. And so yeah, we're seeing 1.442 or 1.43 being more consistent with this. Yep, there's a question. Yeah. I'll just throw this to Yudong. The question is, in wall-clock time, how long does a refit take? It depends a lot on which data we're fitting to. If you refit including the Lennard-Jones, it's going to take days. But if you're fitting just the bonds, angles, and torsions, my number is hours. Do you have a more exact number? "20 hours." Right. And Simon, what's the condensed phase looking like right now? "Condensed-phase fitting is currently taking a day or a day and a half, depending on resources." Okay. So we are seeing improvement in torsion profiles. I'm going to show a couple of these. They have some info at the bottom about which parameter it is and what the molecule is. And then the SMIRKS count at the bottom, which you can barely see, is 38, so that's how many times we use this parameter in the set. This is a plot for one molecule, but there are 38 times this parameter is used. Orange is where we were originally; green is where we are after optimization, not just to this molecule but across all 38 molecules; and blue is the quantum. So we're seeing an improved profile here. Here's another example, where we're looking at sulfur-carbon bond rotation in a couple of different molecules that I've shown there. Again, orange is original, blue is quantum, green is new. Julia has a question. "Of course, you're fitting to quantum chemistry gas-phase data." Right. "So you shouldn't expect to get exactly on top of them." Okay.
Probably not. At this point everything is pretty consistent with how these have historically been fit. So there are a lot of science questions one might want to answer about what happens if you don't do that. And, oh yeah, there's a link there at the bottom; you can get all of these plots yourself and browse through them if you like. There are also clear improvements in other parameters. These are plots of the quantum value for a particular bond length versus the MM value for that bond length. Orange is before refitting, blue is after refitting. There are a lot of changes and improvements. You'd like them to correlate along that dashed line, or, if they're not going to correlate along the dashed line, at least to have the dashed line pass through the data. If you want, you can browse through the exact changes to the parameters. Here are equilibrium angle values and the changes in values for certain angles. You're not supposed to make much of that data other than that there are some changes. The parameters are sorted in order of which has changed the most, so some of them change a significant amount and some don't. Here are force constants for bonds, for example. All this data is in there. The Lennard-Jones refitting also results in some parameter changes, the largest being for chlorine at this point. It's not a refit of all of our parameters, just some of them, but we're seeing relatively modest changes in these, which we think is probably good. Yeah, the left-hand legend or the bottom legend? What's that? Oh, the colors. Yeah, so red is a decrease. Blue is the original, so if you have a green bar, that's taking you up to the new value, and red is down.
But our data also shows the need for improved chemical perception, as we call it, or essentially typing. This is some data Trevor in the group generated, or happened to pull from QCArchive. This is just looking at torsion drives. The plot is torsional angle on the horizontal axis versus bond length, for many molecules that happen to use the same bond parameter. The series along the horizontal axis are what you get as you drive a torsion. What you can see if you histogram this data is that there are basically three peaks in the bond length distribution. Right now this is the current parameter value, but we're basically seeing three kinds of chemistry grouped together in the same bond parameter. So presumably one can get better values for this if you split out these three different chemistries. There's a lot of that type of thing going on, and we'll need to dig into it eventually. Part of what we eventually want to do is automatically split out different chemistries from this type of data, but that's not ready yet. Yeah, that needs to be worked out. The question was, how do you avoid overfitting? Partly, I think Bayesian techniques can help with that, and that's something we'll come back to later in this meeting, but there's a lot of science to be done. You can see things like this and it's clear that you need to do it, but how do you fully automate that? There's a lot of work to be done. Chris says, "That's so good, I've waited years to see this showing up in a general way." We're going to hand you the mic so you can say that again, for the recording and for the people who are not here. "I'm just saying this is so good, just to see this graph, that you don't have to be somebody in a back room trying to guess that there are three different kinds of chemistry wrapped up here. We won't even have to guess what the chemistry is, because ChemPer will just sort it out."
"It'll just be taken care of in a timely fashion. And I've been waiting years to be able to see this science just put out so plainly like that. I guess I should just thank everybody. This has a bit of just about everybody involved in it, to get this graph." Yeah, so we weren't looking for this at the time, but it happened to come out of the data. So next comes benchmarking. We've done a lot of work on infrastructure for fitting force fields. We haven't done much work yet on infrastructure for benchmarking force fields, other than that the property estimator works just as well to calculate things as to fit to them, so we can use that. The short-term benchmarks that we want to do, and need to work on automating, are energetics relative to QM on data that we're not fitting to, and physical property estimation for pure solvents. We can do those estimates, but we have to pick the data: density, heat of vaporization, probably dielectric constants, and those are already available in the property estimator. We can do physical property estimation for mixtures, like enthalpy of mixing and excess density. And we can do host-guest binding. David Hahn, who's here, is right there, waving his hand. Thank you. He's a new Open Force Field postdoc at Janssen, and he's going to be working on benchmarking on protein-ligand binding. And we just talked with Vytas Gapsys from Bert de Groot's group, who has done a bunch of benchmarking of protein-ligand binding with GAFF and CGenFF and the other force field that shall not be named, and has a pretty automated workflow that can run through the whole JACS set plus a few other sets they've curated. These are pre-prepared protein-ligand systems, so it's not benchmarking preparation, just benchmarking binding free energies, using relative free energies and also non-equilibrium techniques.
They can run all that and get the numbers out, and he's interested in running his whole set of benchmarks with the SMIRNOFF small-molecule force field as well. We think that will work great as a first pass, because we can just run ours through it and be able to compare with those other options already. But David will be, and has been, in touch with a number of folks already about longer-term benchmarking plans. There are also other things we could benchmark on, and you can give us some input on that. We'd also love to have you try it out on your favorite tests and see how it does and what issues there are. Another thing we can potentially benchmark on is strain energies of CSD structures, which could be really interesting. I know they have expressed interest in the possibility of releasing some of their data. On host-guest binding, Dave Slochower, who's here, has done some nice work looking at SMIRNOFF99Frosst performance on binding free energies and binding enthalpies for host-guest systems, and comparing that to GAFF. I think that's basically working in the property estimator, and we should be able to use it on our new releases. Potentially that could also eventually be used for fitting to this type of data; we don't know yet if that's going to be a good thing or not. The three colors are different chemical series binding to the same host. And I think we'll get to questions pretty soon, so people can ask a lot more about it. He uses the technique for binding free energies from the Gilson lab called attach-pull-release, and so that's in the property estimator.
So ideally, what will we benchmark? We'd benchmark GAFF and/or GAFF2; we'd benchmark SMIRNOFF99Frosst, which is our starting point; and then our beta or release candidate, with the Lennard-Jones refit and without the Lennard-Jones refit, because this is a pretty limited Lennard-Jones refit done primarily to ensure that our machinery is working, so we don't necessarily expect it will make things better. And then I mentioned the protein-ligand binding, which should give us a comparison with some other force fields that we are not ourselves running. We hope that you'll begin using this in your own favorite tests in the coming weeks and get back to us on what you're seeing. You can use our force fields now, and the release candidate is going to be conda-installable soon, so we should be getting out announcements as soon as that's ready. Preparing a system is super easy. If you have a molecule you'd like to parameterize, it's really just this: you import the force field, you import Molecule, you create a molecule out of, let's say, a mol2 file, you assign the force field using this line, then you make a topology out of it and you create an OpenMM system. From that OpenMM system you can easily export to AMBER or GROMACS format, or you can go on to use it in OpenMM. You'd use the positions when you're doing that. I forgot to put it on the slide, but I would point out that if you're using the RDKit backend rather than the OpenEye one, you should not be starting from a mol2 file, because RDKit's mol2 support is pretty bad, so you'd probably want an SDF file. Russ? Here's a mic. The question is whether it would be possible to have "create AMBER system" and "create GROMACS system." Yes, it is.
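The slide's code is not captured in the transcript, so the following is only a sketch of the steps just described, written against the toolkit's Python API as best understood. The function name `build_openmm_system`, the input file name, and the default force field file name are assumptions for illustration; the package was named `openforcefield` around the time of this talk and `openff.toolkit` in later releases, so the sketch tries both.

```python
def build_openmm_system(molecule_file, offxml="openff-1.0.0.offxml"):
    """Parameterize one small molecule and return (OpenMM system, topology).

    molecule_file: path to a single-molecule file. An SDF file is the safer
    choice with the RDKit backend, as noted in the talk.
    offxml: name of an installed SMIRNOFF force field file (assumed default).
    """
    try:
        # Newer package layout.
        from openff.toolkit.topology import Molecule, Topology
        from openff.toolkit.typing.engines.smirnoff import ForceField
    except ImportError:
        # Package name in use around the time of the talk.
        from openforcefield.topology import Molecule, Topology
        from openforcefield.typing.engines.smirnoff import ForceField

    # Assumes the file holds exactly one molecule.
    molecule = Molecule.from_file(molecule_file)
    topology = Topology.from_molecules([molecule])
    forcefield = ForceField(offxml)
    system = forcefield.create_openmm_system(topology)
    return system, topology
```

A call such as `build_openmm_system("ligand.sdf")` (hypothetical file) needs the toolkit, a cheminformatics backend, and the force field files installed. Exporting to AMBER or GROMACS formats goes through a separate tool, as discussed in the answer that follows.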
We haven't done that yet because we felt like that kind of makes us responsible for the thing that creates them, which is ParmEd, and that's not something we're currently funded to work on. But it's two or three lines of Python to do it, and there are examples of it. So we could do that; we just worry that people will blame us for any problems. "If you go back one slide. Sorry. What's the process for doing the LJ refit? Do you do all the Lennard-Jones from the solvent simulations and then refit all the torsions subsequent to that?" Right. So currently what we do is refit all the bonds, angles, and torsions once, because we did that first; then we refit the Lennard-Jones to the condensed-phase properties; then we go back and refit the bonds, angles, and torsions again. Of course they don't change very much at that point, but the torsions change more than anything else. One thing we'll explore eventually is what happens if we refit simultaneously, because we can do that, but it's unclear whether that will be a good thing or not. Wait for the mic. "The follow-up is, what happens to the charges after the LJ refit?" Okay, so right now we're not doing anything with the charges. We just have AM1-BCC charges, and they stay AM1-BCC charges. Eventually those will be able to change as part of the refit, but not yet. "So how will your infrastructure deal with users who may want to fit the force field to molecules they cannot release publicly?" Yeah, that's a great question. Actually there's a slide on that, or part of a slide. Okay, and there are examples there if you need them. So then I'm going to really quickly talk about some other new directions that are exciting future science directions. One of them is that if we're doing something like an AM1-BCC calculation to assign charges, we get the Wiberg bond orders, some kind of bond order, essentially for free from that.
And there are a couple of big benefits of that. One is that it can simplify handling of resonance structures and remove dependence on the aromaticity model, because you can determine from the bond order what's aromatic or not. But then also, as Chaya Stern has shown, if you have a series of, let's say, biphenyls like these, depending on the distal chemistry you can have a really significantly different bond order for that nominally rotatable bond between them. And it turns out the barrier height for rotating that bond basically seems to depend linearly on the bond order, which we think raises really exciting possibilities for simplifying torsions considerably based on parameter interpolation. So instead of having to have, let's say, a separate torsional profile as you change the distal chemistry here, you could perhaps get by with just one or two torsional profiles with a barrier height that scales.
Yeah, so the AM1 is only getting us the bond order here. Whether we need a better bond order than an AM1 bond order is something we'd have to investigate. But the torsional profile is not coming from AM1.
OK. Yeah.
And so we're not using it for the torsional profile, just for the rough bond order.
But that comes also from the optimized geometry, right, to some extent, and that's being done in the process that you're using. You're optimizing the geometry before you get the charges. So these things are all interconnected.
So yeah, they've been looking at some of that, but before we would throw this type of thing into production, we have to carefully look at those issues and at whether it is going to be good enough.
OK. Wait for the mic.
What I think is so exciting about this, just in terms of the direction of force field development, shows up when you think about what's happening with that barrier.
How would you make a torsion in any general force field that we know of that would accommodate that? It's got to somehow know about the distal chemistry. Whether you do it with atom types or fragments, that is a horrible, horrible land war, and it is bypassed by this beautiful method that Chaya has been working on. We can get a property and use it within the context of a general force field to capture that degree of variability in the torsion, and that will be, I think, a first.
Then also, I'm going to go through this quickly, but we've been looking at nitrogen pyramidalization or planarity. Here's a range of molecules, some of which are flat, some of which are tetrahedral, and some of which are in this gray band in between, which is intermediate. Victoria Lim in my group has done this, and it turns out that you can actually make pretty good headway towards telling how flat or tetrahedral a particular chemical series is going to be, as you change the distal chemistry, by looking at the bond order. So here are graphs of improper angle as a function of bond order for this series of molecules here, and this is for these molecules here. There's a lot more that needs to be done to work all this out, but we at least know there's a really strong correlation, which could potentially allow us to predict how planar or tetrahedral something is going to be without needing a separate atom type for each.
Longer term, there's also the avenue of improved electrostatics, specifically virtual sites in a number of places. Some years ago, with Dave Cerutti and Bill and Julia and others, we looked at needing off-site charges for some carbon-chlorine bonds and how to improve the electrostatic potential. That's not something we support in our current force fields, but it's an interesting direction for future science that you can help us prioritize. And also you'll hear, I think, some about potential work on graph convolutional networks for fast, scalable charge assignment that is purely graph based.
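The bond-order interpolation idea discussed above (a torsion barrier that scales roughly linearly with the bond order of the central bond) can be sketched in a few lines. This is only an illustration of linear interpolation between two reference points; the function name, the anchor bond orders, and the barrier heights are made-up assumptions and do not come from the talk or from any fitted force field.

```python
def interpolate_barrier(bond_order, anchors):
    """Linearly interpolate a torsion barrier height from a bond order.

    anchors: two (bond_order, barrier_height) reference points. Values
    outside the anchor range are linearly extrapolated.
    """
    (bo_low, k_low), (bo_high, k_high) = sorted(anchors)
    if bo_high == bo_low:
        raise ValueError("anchor bond orders must differ")
    slope = (k_high - k_low) / (bo_high - bo_low)
    return k_low + slope * (bond_order - bo_low)


# Hypothetical anchors: (bond order, barrier height in kcal/mol).
SINGLE_LIKE = (1.0, 1.0)
DOUBLE_LIKE = (2.0, 11.0)

# One torsion parameter now serves a whole series of substituted molecules:
# only the bond order of the central bond changes from molecule to molecule.
for bond_order in (1.00, 1.05, 1.20):
    barrier = interpolate_barrier(bond_order, [SINGLE_LIKE, DOUBLE_LIKE])
    print(f"bond order {bond_order:.2f} -> barrier {barrier:.2f} kcal/mol")
```

The same pattern would apply to the improper-angle correlation mentioned for nitrogens, with the interpolated quantity being an improper parameter instead of a torsion barrier.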
Michael Schauperl has been working on high-quality bespoke charges from a next-generation RESP approach. I won't get to walk through the details of his data here, but the upshot is that it can improve predictions of things like hydration free energies and dielectric constants, and if you're after really high-quality charges for a particular molecule of interest, rather than something like AM1-BCC, it can be really interesting for that. Also, Jessica in my group is working on improved improper parameters generally, and especially for nitrogens, to capture how planar, tetrahedral, or intermediate they are, and that should get folded into a force field relatively soon.
There are a lot of other things to do: expanding how many types we have, expanding coverage of chemical space, and bespoke parameterization of torsions. So, to get to Julien Michel's question: what if somebody has a proprietary molecule, they want to make sure the parameters are really high quality, and they can't show it to us? We're working on getting somebody to build a bespoke torsion parameterization workflow so they can run that internally and make sure that their parameters are going to be high quality without telling us. Then there's work to be done generating quantum chemical data, making sure we have the right data and are using it in the right way, and on experimental data sets for fitting. And then eventually things like parameter optimization based on Bayesian inference, and that's something we'll come back to.
I want to remind everybody, people outside and especially people inside, that the website is key for news and status and reports on what's going on. We have the science updates on our website so that everybody, including our team, can stay informed about what the other aspects of the initiative are doing, because it's important that we see how our parts are going to fit together with the other pieces. To wrap up:
We hope you like our Parsley release. We're excited not so much about the release itself as about the fact that we can now do rapid refits, so we can explore lots of new science and hopefully rapidly improve the quality of force fields. This likely creates opportunities for spin-off science: maybe internships in your companies, rotation projects in our own groups, different things like that, where somebody can experiment with what happens if we try this or try that and let it contribute to advancing the science. The things that work well can get folded back into releases; the things that don't, don't have to. And we're seeing clear improvements in our parameters already, so we hope that as we go into benchmarking, that results in accuracy gains for properties we care about. We're going to find out soon.
I tried to show pictures of the people who were doing this as I went along. I don't think I can even remember all of the people to thank, but some of them were there, and you can see a list of our members on the website. We're also in great debt to a lot of people over the years: all of you in the AMBER community, GAFF, and the NIH- and NSF-funded earlier work that helped pave the way for this. So thanks again for your time. We had some questions on the way, but I think there's some time.
So, David, just wondering: how will you avoid, in the future, having an explosion of force fields where you have to benchmark all of them? Every time you make a new one, are you going to go back and benchmark against a previous one to know it's actually improving?
So that's not something we've totally worked through yet. As far as us having an explosion of force fields, what I would guess we'll be doing is benchmarking the newest one and the last one. As long as we're making headway, that's all we're going to need, because if every step is better than the one before, we don't need to keep going back and testing the older ones. But in terms of an explosion of force fields, my hope actually is that there will be an explosion of force fields from other people who are using some of the same infrastructure to try new things. They'll use our same benchmarking infrastructure to benchmark theirs, and then we'll all get to find out what's best.
Yeah, I just wanted to add to that: because we're building the automated benchmarking infrastructure, and because you folks are supposed to guide us on what we should be comparing against, we could have the force fields against force fields challenge essentially online, benchmarking all the things that you care about. That's why it's so important for us to get the things that people do care about into those benchmarks.
You mentioned that for release one, basically most angles, torsions, and bonds are fitted except for some unusual chemistry. Can you give an example of the unusual chemistry?
I'd actually have to dig really deep. It's basically things that we can't find in molecules at all, and I don't remember any of them off the top of my head because they seem very strange.
Yeah, chromium hexafluoride, sure. Yeah, I know, but also, obviously, we're not refitting monovalent ions.
Do we have a sense of how much more QM calculation we need to cover a broader space? For example, if we take ChEMBL or medicinal chemistry.
Yeah. The easy part is the quantum calculations, and I don't think we need that many more, because doing something like this greedy set cover, you just need a few molecules that utilize each parameter, at least to be able to fit it. The harder part is things that require an expansion of the typing, because we don't have fully automated machinery for doing that yet. So somebody needs to say, oh, well, let's introduce a new type here.
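The greedy set cover mentioned in this answer can be illustrated with a small, generic sketch: repeatedly pick the molecule that exercises the most parameters not yet covered. This is the textbook greedy heuristic, not the initiative's actual selection code; the molecule names and parameter labels below are invented for the example, and the sketch covers each parameter only once, whereas the answer suggests wanting a few molecules per parameter.

```python
def greedy_set_cover(parameters_by_molecule):
    """Pick molecules until every parameter is exercised by at least one.

    parameters_by_molecule maps a molecule name to the set of parameter
    labels that molecule uses. Returns the chosen molecule names in order.
    """
    uncovered = set()
    for parameters in parameters_by_molecule.values():
        uncovered |= set(parameters)

    chosen = []
    while uncovered:
        # Greedy choice: the molecule covering the most still-uncovered
        # parameters. Sorting the names makes tie-breaking deterministic.
        best = max(
            sorted(parameters_by_molecule),
            key=lambda name: len(uncovered & set(parameters_by_molecule[name])),
        )
        chosen.append(best)
        uncovered -= set(parameters_by_molecule[best])
    return chosen


# Hypothetical candidate molecules and the (made-up) parameter labels they use.
candidates = {
    "ethanol": {"b1", "a1", "t1"},
    "propane": {"b1", "a1"},
    "biphenyl": {"b2", "a2", "t2"},
    "toluene": {"b2", "a1"},
}

selected = greedy_set_cover(candidates)
print(selected)
```

Because each pick removes at least one uncovered parameter, the loop always terminates, and the number of quantum calculations grows with the number of parameters, not with the size of the candidate pool.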
Now that's something we want to do, but it's not there yet.
So when I mentioned coverage, I don't necessarily mean whether it's available, like displayed earlier; it's more about whether they are all high-quality parameters. You know, compare with OPLS3, even, going from 40,000 torsions to 150,000 torsions.
Yeah, so we don't know that much about assessment of parameter quality yet. If some of the work you guys have done on that in the context of your own torsion initiative is portable, that could be something we could perhaps draw on. And part of the role that benchmarking needs to play is to help us start learning which ones are quality and which ones aren't.
I had a question concerning the progress in the fitting. When the parameters change, I assume that the bars represent what happened after 14 iterations. So are those converged changes, or how do they change in between each of the iterations?
Can somebody pass the mic to you? Throw it to you.
So from my understanding, the question is how the change happens during the fitting over the 14 iterations or so, and whether they are fully converged. Yes. ForceBalance does this optimization with Hessian guesses and trial steps. What ended up happening is that in the first two iterations most of the improvement is done, and the rest of the iterations are just small steps that try to get small improvements. It finally converged after we met two thresholds: one of them is on the step size in the parameters, and the other is that the change in the objective function we are fitting is below a threshold. Then it converges.
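The two convergence thresholds described in this answer (parameter step size and change in the objective function) can be illustrated with a toy optimizer. This is a minimal sketch on a made-up quadratic objective using damped Newton steps; it is not ForceBalance's algorithm, and every name, tolerance, and number in it is an assumption chosen for illustration.

```python
import math


def objective(params, targets, weights):
    """Toy weighted least-squares objective standing in for a fitting target."""
    return sum(w * (p - t) ** 2 for p, t, w in zip(params, targets, weights))


def optimize(params, targets, weights, damping=0.5,
             step_tol=1e-4, objective_tol=1e-6, max_iterations=100):
    """Iterate until BOTH the step size and the objective change are small."""
    params = list(params)
    value = objective(params, targets, weights)
    history = [value]
    for _ in range(max_iterations):
        # Gradient is 2*w*(p - t); the (diagonal) Hessian is 2*w, assuming w > 0.
        # A damped Newton step is then -damping * gradient / hessian.
        step = [-damping * (2 * w * (p - t)) / (2 * w)
                for p, t, w in zip(params, targets, weights)]
        params = [p + s for p, s in zip(params, step)]
        new_value = objective(params, targets, weights)
        history.append(new_value)

        step_size = math.sqrt(sum(s * s for s in step))
        objective_change = abs(value - new_value)
        value = new_value
        if step_size < step_tol and objective_change < objective_tol:
            return params, history, True
    return params, history, False


fitted, history, converged = optimize(
    params=[0.0, 0.0], targets=[1.0, -2.0], weights=[1.0, 0.5]
)
print(f"converged={converged} after {len(history) - 1} iterations")
print(f"objective after 0, 1, 2 iterations: {history[0]}, {history[1]}, {history[2]}")
```

In this toy problem, as in the fit described in the answer, most of the drop in the objective happens in the first couple of iterations, and the remaining iterations are small steps taken until both thresholds are met.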