So yeah, thanks everybody for joining. If this were a physical, in-person meeting, this would be the first talk. Today I'm trying to give you a big-picture overview of some, but not all, of what's going on — there's too much to tell you all of it — and to point you to some of the individual talks so you can catch more details there, and to highlight some of the things we're especially excited about. You could call it "where we are now": current status and our near-term roadmap. But it'll be a good variety of things. To remind you, we are in part the Open Force Field Consortium. The Consortium is part of our funding model: a big group of industry partners sends funds to MolSSI, which then supports the Consortium, which is led by five academic PIs, and we have a variety of other people involved. We also have Chris Muzny at NIST as a consultant, and we really appreciate the support from all of our partners. Beyond the Consortium, the broader group can be called the Open Force Field Initiative; the Consortium is the industry group that funds a portion of our science. We have a ton of people involved now, many of whom are listed on the members page of our website, where you can find names and photos. The Initiative continues to broaden, both by getting more people involved and — something we're really appreciative of — through NIH support, which broadens the Initiative beyond the Consortium's focus on improved small-molecule force fields, carrying on with our conventional fixed-charge force fields. The NIH funding allows us to broaden in the direction of building consistent biopolymer force fields, and in the direction of machine learning potentials and interfaces with those, among other things. You'll hear more about some of those aspects in some of our other talks.
As you have heard or will hear from Jeff Wagner, we have several new people on board. He mentioned David Dotson and Matt Thompson, two new software scientists. As part of our NIH funding, as we head in the direction of biopolymer force fields, we now have David Cerutti on board. He's at Rutgers, he's a senior scientist with us, and he's playing a key role in our push toward biopolymer force fields. He's been very connected with the Amber force field community and has played a role in some of their biopolymer force fields — he's been behind the IPolQ series of force fields they have, and he works with Dave Case at Rutgers. We also have Ben Pritchard, a software scientist at MolSSI, who will be replacing Daniel Smith, who's transitioning to another position. So to recap — people have seen this before — part of what we're about is making force field science routine. We have a number of long-term goals, some examples of which are highlighted here. We really want to make it easy for people to build new force fields, maybe even custom force fields for their specific problem. We want to make it easier for people to do covalent modifications of things like biomolecules. And, looking down here at fixed-charge versus polarizable, we really want to allow people to answer scientific questions about force fields — like, if you fit a polarizable force field and a fixed-charge force field to the same data, what does that mean in terms of accuracy and transferability versus computational cost? And really, we don't want to be the only ones building force fields.
We want to democratize it — make it easier for people to do a lot more force field science, so that we'll see faster progress in this area than we've had. As we've headed in that direction, we've invested a lot early on in infrastructure to make it easier to fit force fields. The picture is that the infrastructure we've built allows us to take an initial force field, represented with the Open Force Field Toolkit, and use it to assign parameters. Then we can use our Evaluator to compute properties; this mainly focuses on physical properties. We can also feed in quantum mechanical data, as well as experimental property data. So we compute properties, compare to experimental data, and use a parameter optimizer — currently ForceBalance is playing that role — to refit parameters given the data. This happens in an automated way: we really just feed in a force field and the data, and it can optimize parameters. Then we test and benchmark our optimized force field, and maybe release it, or maybe feed it back in for further optimization. The nice thing about all of this infrastructure is that we can now answer scientific questions about force fields: would this type of change, or this fitting approach, result in improved property predictions or not? So we don't always have to proceed to release, and you've probably seen already that we've tried some things we haven't released. When we last met back in August, we had roughly at that time released OpenFF 1.0, the first in our Parsley series of force fields. Our plan for naming force fields is that they're going to be named openff-X.Y.Z, where X is the generation — the major version number. If we ever change a functional form, that will change the major version number, but we plan to change it for other reasons as well.
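The fit-evaluate-refit loop just described can be sketched in miniature. This is an illustrative stand-in only — the real machinery is the OpenFF Toolkit, Evaluator, and ForceBalance — and every name here (`objective`, `refit`, the `predict` callable, the reference data) is hypothetical:

```python
# Minimal sketch of the automated fitting loop: predict properties with the
# current parameters, score against reference data, and adjust parameters to
# reduce the objective. A crude coordinate-descent stands in for ForceBalance.

def objective(params, reference_data, predict):
    """Weighted sum of squared deviations from reference property values.

    reference_data: list of (property_key, weight, reference_value) tuples.
    predict: callable(property_key, params) -> predicted value (hypothetical).
    """
    total = 0.0
    for prop, weight, ref_value in reference_data:
        total += weight * (predict(prop, params) - ref_value) ** 2
    return total

def refit(params, reference_data, predict, step=0.01, n_iter=200):
    """Crude stand-in for the parameter optimizer: nudge each parameter
    up or down by `step` whenever that lowers the objective."""
    params = dict(params)
    best = objective(params, reference_data, predict)
    for _ in range(n_iter):
        improved = False
        for key in params:
            for delta in (step, -step):
                trial = dict(params, **{key: params[key] + delta})
                score = objective(trial, reference_data, predict)
                if score < best:
                    params, best, improved = trial, score, True
        if not improved:
            break  # converged: no single-step change helps
    return params, best
```

In the real pipeline the "predict" step is a physical-property or QM-comparison calculation and the optimizer uses gradients and regularization, but the shape of the loop — predict, compare, refit, repeat — is the same.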
Y changes when we refit essentially all of the parameters or make major changes, and Z marks a small change like a bug fix. We also have herb-based code names that go along with those: Parsley is the first one, our 1.Y.Z series; Sage is 2.Y.Z; then Rosemary and Thyme. Parsley is the only one we have so far, but that's the plan, and we're now heading towards Sage, though not there yet. To remind you, in Parsley we made significant improvements over our starting point force field, SMIRNOFF99Frosst. We fit to several different sets of molecules: a Roche set, which was some fragments that had been provided by Roche, and a coverage set, which was introduced to cover parameters that weren't necessarily exercised by the Roche set. You've read about that in the blog post; we're working on the paper now. We provided a bunch of data on how this impacted errors by specific parameter type. I just have a graph here for bonds, where green means an improvement in how well we treat that parameter and red means it got worse. In general we improved things, and you can look at similar data for bonds, angles, and torsions. We've done some more benchmarking since then, and Victoria Lim from my group will be talking about that — looking at other aspects, not on our fitting data but relative to quantum mechanical data on molecules we didn't fit to. Just as a preview, this shows smoothed curves of RMSD populations for a variety of different force fields. Brown here is our starting point force field, SMIRNOFF99Frosst. You can see the peak is in a similar place to these other force fields, but it has this broad shoulder where it does worse in terms of geometries. Our first Parsley release improved a lot — now the peak is way up here — but there's still a shoulder over here which is a bit wider, and we're working on improving that and bringing it in.
With this type of data you can easily pick off which parameters tend to cause these types of errors, and Vicki will tell us a bit about that. We've had, or will have had, two new Parsley releases since then. 1.1 fixed some specific issues with N–N bonds and a geometry issue with tetrazoles: what we were getting with Parsley originally is shown in red, and the quantum result was over here; with our 1.1 release we fixed that. It also fixed some torsions. For 1.2 — which will hopefully be out at the meeting or very shortly after; fitting is going on now — we're redesigning the entire fitting data set to improve coverage and thereby accuracy. We're not dramatically changing which parameters are in the force field, but we're changing what we fit them to, and I'll tell you a bit more about that. So in 1.1 we fixed some specific issues with tetrazoles and nitrogen torsions. What we're looking at here is the change in weighted RMSE on a benchmark — so not on our fitting data. Things below zero are better, above zero are worse. You're seeing a lot of blue below the red line, so a lot of improvement, but it's not across the board; it's a bit mixed. Here you're looking at change in RMSE, and again there are some cases where it's worse but many where it's better. That's up on the web if you want more details. We continue to fit to a good set of QM data, and the data we have on hand continues to expand a lot. We can't fit to all of it — there's simply too much — so something I'll come back to is how we pick which data to fit to. A great team of people has been involved, and Jessica Maat and Hyesu Jang have been doing a great job redesigning our data sets. We now have something like 1.5 million core-hours of QM data in QCArchive that's available to everyone.
Here are some stats on what's in that — it's really a large amount of data. Part of what we've been trying to do for 1.2 is redesign the QM data set so that we have better coverage and more diverse chemistry. For 1.0 we were in a bit of a rush to get our first release out, so we just used the data we had. Now we have so much more data that we can be more careful in designing the set. If we look at coverage — how many times each parameter was used in our 1.0 training data — this is a log scale, and you can see that some parameters are used thousands of times and some only once. That's also true for torsions, where we're particularly concerned: there are a lot of torsion parameters with only a single use. This is a very heterogeneous coverage of parameters. To some extent that's inevitable, because some of these are very generic parameters covering a lot of chemistry, but we'd really like to avoid having parameters that occur in only one molecule. So we've been trying to fix that by more carefully selecting which data we fit to. Hyesu and Jessica have come up with a procedure where we take a broader set of starting data sets, including something Bayer has provided from their patent collection. We look at which parameters are used across all of these molecules, and then for each parameter we take specific molecules it occurs in and include those in the data set, to make sure we get good coverage of all parameters. We do that using a clustering procedure, which I'll talk more about, to ensure we're picking molecules from diverse clusters for our training data. So we're making sure the data is diverse, which we think is a good procedure and something we haven't seen explored before for picking fitting data.
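The coverage-driven selection idea can be sketched as a small greedy procedure. This is only an illustration of the concept, not the actual procedure Hyesu and Jessica developed; the input format (pre-labeled molecules with cluster assignments) and all names here are hypothetical:

```python
from collections import Counter

def select_training_molecules(molecules, min_coverage=2):
    """Greedy sketch of coverage-driven training-set selection.

    `molecules` is a list of (name, cluster_id, parameter_ids) tuples — a
    hypothetical pre-computed labeling of which force field parameters each
    molecule exercises, plus a chemical-diversity cluster assignment.
    Molecules are picked until every parameter appears at least
    `min_coverage` times, preferring molecules from as-yet-unused clusters.
    """
    coverage = Counter()      # parameter_id -> times covered so far
    used_clusters = set()
    selected = []
    remaining = list(molecules)
    while remaining:
        def gain(mol):
            _, cluster, params = mol
            # How many under-covered parameters would this molecule add,
            # with a tie-break favoring clusters not yet represented?
            new = sum(1 for p in params if coverage[p] < min_coverage)
            return (new, cluster not in used_clusters)
        best = max(remaining, key=gain)
        if gain(best)[0] == 0:
            break  # no remaining molecule improves coverage
        name, cluster, params = best
        selected.append(name)
        used_clusters.add(cluster)
        coverage.update(params)
        remaining.remove(best)
    return selected, coverage
```

The real workflow labels molecules with the toolkit's parameter-assignment machinery and clusters on chemical similarity; the key shared idea is that selection is driven by parameter coverage rather than by whatever data happens to be on hand.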
Then you pick molecules from those clusters and use them for fitting, and I'll talk more about that. So here's our torsion coverage now. There are still some torsions with only one occurrence — those tend to be very rare torsions, and we're working to fix them — but there are far fewer parameters that are used only once, and you'll see that a lot more of these counts are higher. Still room for improvement, but we're doing a lot better. And you can see — this is just a very small subset of the molecules we're using — we've got a lot of diversity in what we're fitting to. Our very preliminary benchmark results show a change in weighted RMSE that is really good in a large number of cases. These are on our benchmark, or test, sets; we have two different ones, a primary set and a full set. These are objective function values, so lower is better: here's our initial force field, here's Parsley 1.0, 1.1, and our preliminary 1.2, which is still undergoing fitting. You can see the objective function values get a lot lower. You can also look at prediction of relative conformer energies — broader means more error, and one thing to look at is the height of the peak near zero. Our initial force field is black, 1.1 is in orange, and 1.2 is in blue, so the distribution is getting narrower and more peaked at zero error in conformer energy. You'll get to see more of this data. But remember, the only change we're making here is redesigning our training set — not which parameters we have, just what data we're fitting to. So we think it's exciting that we can get better results just from doing a better job picking what to fit to.
Now, David Hahn continues his work on benchmarking binding free energies, and you'll hear more from him on what's coming out of that. We're looking at calculated binding free energy versus experiment — here for CDK2, but he's looked at a whole range of different targets now, and you'll see more from him on that. What we're seeing, looking at the distribution of errors across targets, is that about 50% of cases are within 1 kcal/mol, 75% within 2 kcal/mol, and so on. You probably won't hear about it here, but John Chodera is also doing a lot of work using Parsley on Folding@home, applied to actual COVID-19 discovery projects, so it'll be interesting to see how that goes. We're also excited about infrastructure changes — and this shifts to the near-term roadmap part of my talk. I've just been talking about some of the changes coming immediately in 1.2, but infrastructure changes are also exciting. Our 0.7 toolkit release, which is going to be out very soon — perhaps as soon as the meeting; Jeff Wagner's been spearheading this — implements Wiberg bond order interpolation for torsions, building off some of Chaya Stern's work, which you can hear about. It also includes charge increment model support — adaptable BCC-style charge models. And for people who want to bring in their own charges from a file: previously that was only supported with an OpenEye backend from mol2, but now you can do it with SDF files, which also works with the RDKit backend. We're also headed towards virtual site support in the toolkit; here's one of the four different types of virtual sites that will be available. This is infrastructure change to enable science to happen. We're also making headway on automated benchmarking: we want it to be the case that when we produce a force field, it's automatically benchmarked, the results are made available, and people can drill into them.
Jaime, on the left here, has been working on a front end for this, and you can see a little more of it in Jeff's talk — this is some test data, just as an example. Matt Thompson is working on the backend to get the automated benchmarking running, which will then populate this front-end interface. Here's one more view of the type of thing you'll be able to do: basically, you should be able to drill in, pick the metric and the data set you want, and see how it's performing — maybe even eventually drilling down to which molecules are seeing which types of errors. We really want to make it so people can look at the data they want, in the way they want it, and extract it for use in talks or wherever else. This will ultimately be available to everybody via the web, including the ability to make your own plots that way. Josh Horton continues his work on what we call bespoke parameterization. This has been a frequently requested feature, and we think it's going to be in a state people can use relatively soon. The bespoke approach is for when you want a force field specific to an individual molecule. It uses some of the same infrastructure, but you might run it on proprietary molecules — in that case you'd be running a local QCArchive instance, so you're not making your data public. You can then use some of the same tools to fit a force field for your individual molecule, and you can hear more from him on that. As we head towards Sage, a couple of things are headed your way. We think it should be our first force field release with refit Lennard-Jones parameters. We've been trying to do a careful job of picking which data to fit to — we don't want to just push something out that doesn't really improve things. So Simon Boothroyd has been looking at refitting to different types of data.
You can hear more from him on this, but this is looking at root-mean-squared error on enthalpy of mixing, pure densities, and mixture densities, with several different fitting approaches. Red is our starting point, OpenFF 1.0; then there are fits to heat of mixing plus density, and the classic approach of heat of vaporization plus pure density. Our starting point force field has fairly high error. The classic approach — heat of vaporization and pure density — does improve things relative to that, but not as much as some of these other approaches, so those look like the way we're headed. You can also see that with some of these other approaches you're able to remove systematic errors for particular classes of compounds. It looks promising, and you can hear more from Simon on that. We also have the machinery ready — or it will be with our 0.7 release — for BCC refits. That's something that can be coupled with this same refitting to further improve things, but we need an opportunity to work through that science before it makes it into a force field. We also expect our Sage release to begin including interpolation based on Wiberg bond orders for selected torsional barriers, again building on Chaya Stern's work. In her talk you can see what's behind this: across several different chemical series, the torsion barrier height clearly correlates with Wiberg bond order within a particular series. We should be able to capture that within our force fields in a way that actually makes the force field simpler, rather than more complicated, by adding parameter interpolation. We're really excited about that.
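The interpolation idea itself is simple: given the correlation between barrier height and Wiberg bond order, a torsion's force constant can be set by interpolating between values defined at two reference bond orders. Here's a sketch of that linear interpolation — not the exact SMIRNOFF implementation, and all numeric values in the example are made up:

```python
def interpolate_torsion_k(wbo, wbo_lo, k_lo, wbo_hi, k_hi):
    """Linearly interpolate a torsion force constant between values defined
    at two reference Wiberg bond orders.

    wbo:    the fractional bond order computed for this molecule's bond
    wbo_lo, k_lo: reference bond order and force constant at the low end
    wbo_hi, k_hi: reference bond order and force constant at the high end
    """
    frac = (wbo - wbo_lo) / (wbo_hi - wbo_lo)
    return k_lo + frac * (k_hi - k_lo)

# Hypothetical example: a bond with WBO 1.3, between single-bond-like
# (WBO 1.0, k = 1.0) and double-bond-like (WBO 2.0, k = 5.0) references.
k = interpolate_torsion_k(1.3, 1.0, 1.0, 2.0, 5.0)
```

This is why interpolation can make the force field simpler: instead of enumerating many closely related torsion types for different degrees of conjugation, one interpolated parameter covers the whole series.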
So to wrap up: we expect Parsley is going to keep getting better as we make some of these changes — specifically, in 1.2, redesigning the data we're fitting to — and then we think we can roll out a series of improvements based on changes to chemical perception after that. And we're excited about the new science that's going to come in the Sage release. I also showed you how redesigning the data we're fitting to, selecting it more carefully, can improve results without even changing the number of parameters we have. For Sage, we expect you're going to start seeing Wiberg bond order parameter interpolation in the force field, which will allow us to capture some effects of conjugation and other things in a way we're really excited about. And continued changes to the toolkit will enable new science that, as it becomes clear we need it, will roll into force fields you may use — such as virtual sites. There are really too many people involved to list everybody, so please see our website for a list of people involved. We really appreciate the Consortium and the NIH for support. A ton of people have built the foundations for this, including especially the Amber force field community and the GAFF and GAFF2 folks. Many of us have benefited from NIH and NSF funding that helped pave the way, and we appreciate the Consortium and the NIH for current funding. There's where you can get more details — thanks very much for your time. I probably have a couple of minutes for questions if anybody wants. Okay, I'll pop up and ask the first question then. About the plans to use the Wiberg bond orders as part of the parameterization process: does that end up removing the option of using the force field in a high-throughput capacity? Because in this instance we're going to be doing an AM1 calculation for every molecule. Yeah, so I guess that's a great question.
We've been expecting that we're going to be doing an AM1 calculation for every molecule. If you wanted to do that on something really large like Enamine REAL, you could imagine caching parameters for molecules, or caching the results of your AM1 calculation. From our perspective it's not really a significant change from where we are currently, in the sense that, via most routes to apply parameters, you would already be doing an AM1 calculation to get your charges. That said, if you were bringing in user-supplied charges and trying to use just the rest of our parameters without AM1-BCC charges, then switching to parameter interpolation might change your workflow. And actually, if I could hop on this — this is Jeff Wagner. I think this is a really insightful question. The answer is sort of that if you're looking to use the force fields in a high-throughput way, and this is already in a workflow, then that workflow likely was not actually calculating AM1 charges; it was rather just identifying that, if it were to parameterize the molecule, it would use AM1-BCC. I think we could probably do the same in the labeling functionality — not the parameterization functionality, but the label-molecules functions — so it would not actually calculate the partial bond order, but rather identify the parameter ID that would be assigned to the bond and say: if I were parameterizing this molecule, I would do this interpolation, but for now here's the parameter ID, or here's the information about the parameter that would be assigned.
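The caching idea mentioned above can be sketched very simply: since AM1 results depend only on the molecule, keying them by a canonical identifier (such as a canonical SMILES string) means each molecule in a large library pays the cost only once. The `run_am1` callable here is a hypothetical stand-in for the real semi-empirical calculation:

```python
# Sketch of caching an expensive per-molecule calculation (AM1 charges and
# Wiberg bond orders) keyed by a canonical molecule identifier. `run_am1`
# is a hypothetical stand-in for the real AM1 backend.

_am1_cache = {}

def am1_with_cache(canonical_smiles, run_am1):
    """Return AM1 results for a molecule, computing them at most once.

    canonical_smiles: a canonical identifier for the molecule, so that
    different orderings of the same molecule hit the same cache entry.
    """
    if canonical_smiles not in _am1_cache:
        _am1_cache[canonical_smiles] = run_am1(canonical_smiles)
    return _am1_cache[canonical_smiles]
```

In a real high-throughput setting the cache would be persistent (on disk or in a database) rather than an in-memory dict, but the principle — deduplicate by canonical identity — is the same.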
But I think — maybe Karmen wants to make a note of it — we should invest a bit of effort in thinking about this, and maybe get input from the partners on how many people want to do something really high-throughput where they wouldn't want to do an AM1 calculation, because we may want to consider maintaining a version of the force fields that doesn't use parameter interpolation. The other thought, medium term, is that the obvious high-throughput alternative to AM1 at the moment is the various AI-based charge models people have; presumably there's no reason somebody couldn't build an AI-based bond order prediction scheme as well, if you wanted to take that approach. Yeah — my understanding from John Chodera is that the model Yuanqing Wang in his group is working on is attempting to learn both AM1 charges and Wiberg bond orders — an ML approach to predict AM1-BCC charges and Wiberg bond orders at the same time. I think they're training on something like Enamine REAL, so it should be potentially quite useful if that works. Cool, thanks.