Okay, here we go. Okay, so I think we can go ahead and get going. I agree. Yeah, so this is, depending on how you count, our fourth Open Force Field workshop, or the fourth one since we began the Open Force Field Consortium, and it's the second of those to be held virtually. Jeff, can you flip to the next slide? So I'm just going to be kicking things off briefly here today, and then we're mainly going to hear from Jeff Wagner first and then Simon Boothroyd. Then I'll wrap things up with a summary at the end, and we'll have ample time for discussion. The plan is that the first part of this session, the formal talk, will be recorded, and then we'll turn off recording for the discussion so that everybody can be candid. Let's go on to the next slide. So the idea is that we have an hour of talk and an hour of discussion and Q&A, and we'll titrate between how much of that is discussion and how much is Q&A. Then you can put in your requests for what you want to hear more about, and we'll schedule some times for talks on those in the coming weeks. That way we don't have to block out a huge amount of time today for things that we think you want to hear about when maybe we're wrong. We'll actually ask you what you most want to hear about, so we'll poll for that during the Q&A time. Next slide. So just to jump right into it: we started officially in October 2018, so we'll call 2019 our first year, and a key part of that first year was building out the infrastructure here. There's the Open Force Field Toolkit, the Evaluator, which you'll hear more about and which handles condensed-phase properties, the interfacing with ForceBalance, the QCArchive data sets, and so on. I won't walk through the whole workflow right now, but we spent a lot of time on infrastructure.
And really, at the end of that first project year, we ended up fitting about three or four force fields within a couple of months to release Open Force Field 1.0. So automation has been a key part of what we've done and, as you'll see today, it's really starting to pay off. Go on to the next part. Since that Open Force Field 1.0 release, which is code-named Parsley, we've had a series of fixes and improvements between that and our Sage release, which is coming shortly. You'll hear about that today, and there's a release candidate out. In 1.1 we had some more valence parameter refits and some fixes. 1.2 was a full refit, which had a lot of data set redesign go into it to improve accuracy. Then we had a series of specific fixes in response to challenges and problems that came up. In that time we've also been doing a lot of new science. We've done tens to hundreds of fitting experiments to test out a wide range of ideas, like the effect of vibrational frequency fitting versus Hessian fitting versus not fitting to either, and all kinds of other things, and you'll hear about some of those today. Next slide. We're also working a lot on the infrastructure side. Jeff will be talking quite a bit about different aspects of this, and just to call out one of those, we're bringing out our new Interchange object. We've been relying previously on an OpenMM System to represent a parameterized object, and we're switching to having our own object, which will probably be called OpenFF Interchange. That will replace the OpenMM System, replace ParmEd in terms of converting to other file formats, and also allow interactions with machine learning and other frameworks. You'll hear more about that. Next slide. Then also, we now have a Sage release candidate ready, and you're going to hear about some of what's gone into that today. It does better on mixture and condensed-phase properties because it includes a refit of the Lennard-Jones parameters.
And in tests, that appears to result in improvements for solvation free energies, both aqueous and non-aqueous, and for transfer free energies, even though it wasn't fitted to those. This will allow us in the future to train small molecule force fields and biopolymer force fields self-consistently. So we're really excited about that and you'll hear more about it. Next slide. A big focus has been on automated benchmarking, working together with industry. We're really excited about how that team has come together, and about being able to do things both on public molecules and on internal, proprietary industry molecules to assess performance. It looks like our first force fields are doing pretty well, and our Sage release candidate actually looks like an improvement, even though it's our first force field with refit Lennard-Jones parameters. You'll hear more about that. Next slide. Another big thing that's changing is how we think about the force field development process. We're switching away from a process where we come up with an idea, implement it, fit it, test it, and then release that force field to you, so that a single idea is always going to end up in a force field. Instead, we're thinking of it as a process where we try a bunch of ideas roughly in parallel, test those ideas out as quickly as possible, see which of them look the best, and then go through the fitting and more detailed testing process and release the best of those. We're finding that it's hard to know in advance what's going to give us the best bang for our buck, so it's better to try a bunch of things in parallel and let the science drive what actually makes it into the force fields. For example, we spent a bunch of time working on Wiberg bond order interpolated torsions, which are still a good idea and show some promise, but they're not quite ready for a mainstream release yet.
So we deferred that from the Sage release. On the other hand, some of our bespoke fitting tools, and another idea you'll hear about that's based on deriving parameters from quantum mechanics via a modified Seminario approach, kind of came from an unexpected place and look like they're going to impact what we're doing in the short term, even though they weren't really on our roadmap. So these are things that are turning into key parts of our force field, even though they were developed in parallel. So, with that... Oh, yeah, I also wanted to mention that we're starting to collaborate a lot more broadly. You'll hear a little bit more about the many different people outside the Open Force Field Initiative who are using the software, and I think that'll play a key role in long-term sustainability. Sometimes that leads to people formally partnering with us, like Danny Cole. Jeff, over to you, and on to the next slide.
Great. Thanks, David. So I'm Jeff Wagner. I'm the tech lead for Open Force Field, and I'll be giving the infrastructure part of the talk for the next 10 or 15 minutes. I'm structuring this part of the talk around our major priorities. Since I started here we've had, I think, more work to do than we've had hands, and I've had to identify what Open Force Field's value stream is. I think it's not sufficient to just make a new force field and then expect people to use it. I think the three pillars are that the force field needs to be good, that people need to believe that it's good, and that people need to be able to use it for the problems they care about. So I'm structuring my talk around these three priorities. I'll be discussing how our infrastructure makes it easy to use force fields, makes it easy to compare force fields, and makes it easy to create force fields. So let's start with making it easy to use force fields. What the normal user wants to do is take a molecule and a force field and just run a simulation.
Our big bottlenecks here are getting people to start using or integrating our software, and getting the software to work well when they do try it. We help people start using our software by having great examples and documentation. We recently brought on Josh Mitchell, he's over here on the middle right, to help update our documentation and expand and refresh our examples. Over here is an example that Josh worked on getting into our public GitHub repository. It shows how to run a protein-ligand simulation with an Amber protein and an Open Force Field ligand in a box of TIP3P water with some ions. For developers who want to use our package as a library inside of theirs, and who don't want to pull in all the optional dependencies, the graphics stuff, and the user-facing things, we've made a bare-bones base conda package that they can use in their own package recipes. For people who have been following our release notes and channel announcements, you've probably seen this, but over the past year we've made a lot of bug fixes and minor improvements, basically on a rolling basis, to the Open Force Field Toolkit and our other packages. I'm also learning more about being a good citizen in the open source software ecosystem, and I want to clarify that this number of API breaks being one instead of zero is not a major accomplishment. We don't want to be breaking the API on a regular basis, but I do want to acknowledge that we did, and in the future I'm hoping that we can keep this number as close to zero as possible. The reasoning for breaking the API this year was that we wanted to make more room in our Python namespace for the other Open Force Field software that's being developed, so we changed our import path from "from openforcefield import" to "from openff.toolkit import". This makes room in our namespace for the other Open Force Field software that will be coming out, like openff.evaluator and openff.qcsubmit.
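For code that has to run against either the old or the new import path during a transition like this, one option is to probe which top-level module is importable. The following is a minimal sketch using only the Python standard library; the helper name first_importable and the candidate list are illustrative assumptions and are not part of any Open Force Field package.

```python
import importlib


def first_importable(candidates):
    """Return the first module name in `candidates` that imports, else None.

    Hypothetical helper for illustration only.
    """
    for name in candidates:
        try:
            importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError as well; try the next candidate.
            continue
        return name
    return None


# Newer namespaced path first, then the older top-level package name.
TOOLKIT_CANDIDATES = ["openff.toolkit", "openforcefield"]

if __name__ == "__main__":
    found = first_importable(TOOLKIT_CANDIDATES)
    print(found or "no toolkit installation found")
```

A caller could then branch on the returned name to choose which import statement to use, which keeps the migration in one place rather than scattered through a code base.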
In the long run, we're optimistic that this can reduce user confusion when we have many major pieces of software, by associating them under this top-level namespace but then giving each of them its own module path underneath that. Finally, we're in the process of developing functionality to load and parameterize proteins using a SMIRNOFF-format force field. This is going to be the foundation of our upcoming biopolymer fitting efforts. Some of the code is there, but it's still in the prototyping phase. We're using it internally, and soon the science team will be able to start parameterizing proteins with it. But for users, I recommend that you wait another few months until this makes it into a stable release, unless you really like pip installs and extremely obtuse error messages. Now, reliable distribution and installation are fundamental to making our software truly usable. You might remember that a few months ago Anaconda, the company, was getting a little bit dodgy with licensing. We realized that if we were able to move our packages and dependencies fully onto conda-forge, we could resolve this ambiguity around licensing and avoid making all of the companies that want to use our software go through separate contract and legal reviews with Anaconda, the company. Further, we realized that moving to conda-forge has a bunch of technical benefits, basically a huge number of small improvements that in sum will really improve the user experience and expand our platform compatibility. So in late 2020 we started investing a lot of time and energy into working with the OpenMM team, as well as actually contracting a conda-forge maintainer, to help move our packages over. This effort was largely spearheaded by Jaime, over here on the right. He did such a good job that he actually got hired by the company that we contracted with, and now he's becoming a major conda-forge maintainer himself.
I think the biggest payoff from this, though, is that we no longer need to spend engineering effort inside of Open Force Field to maintain the Omnia channel. That would break unpredictably, and it was hard to schedule other things around it because of how complex it sometimes was to fix.
Hey Jeff, can I ask a quick question?
We might keep questions for the end, if that's okay.
Okay, no problem. Thanks.
So we don't want to make Open Force Field software do everything in a research workflow, because other people have made really good specialized tools, and it would sort of be a waste of time to try to replace those. But we do want to interface smoothly with the adjacent tools that people would want to use in a workflow. Over here is the standard workflow for using Open Force Field to parameterize a molecule. The intake for our software, the way that you get a molecular topology or a molecule object into Open Force Field format, is through RDKit or OpenEye. Those tools are already pretty well suited to fit into a workflow. We have some special additional needs for Open Force Field molecules compared to general molecules, but for the most part the intake is operational right now. The output is a bit more of a question. Our current native output format is the OpenMM System, and when people want to integrate the outputs from Open Force Field into their workflows, there's some ambiguity. It's not clear whether they should move their whole workflow to use OpenMM or whether they should take some number of passes through ParmEd to get to the format that they want. I think this ambiguity was probably a very large barrier to people who were trying to slot Open Force Field into their workflows. To resolve this, we're working on replacing our use of the OpenMM System object with a new object called the OpenFF Interchange, which is being developed by Matt Thompson. So the plan is that we replace the OpenMM System with the OpenFF Interchange.
This is going to have native writers to all of the major biophysics formats. What this will let us do is control the flow of information without needing to squeeze through ParmEd, which is already kind of stretched past its original design scope and requires more developer resources than are available to it. Eventually, these lines will be bidirectional, so we won't just export to these formats, we'll also be able to import them. However, the importing part is going to be more difficult, so we're focusing on exporting first. The Interchange object will keep track of the provenance of each assigned parameter, which will let us do post-assignment parameter modification. In the current OpenMM System, if we want to change the length of a particular bond and we try to do that, it only changes the length of that one instance of the bond in the resulting System object. In the Interchange object, we'll be able to change the length of the underlying bond parameter and have that change propagate through every use of that bond parameter in the whole object. This might not seem very cool, but it's actually really exciting for what it will let us do with our force field fitting. Also, the Interchange object will have native writers out to a number of machine learning formats. This is exciting because it brings us one step closer to fully differentiable force field optimization. That will let us take all the engineering effort that's gone into modern machine learning infrastructure and use it as a hammer with which to beat our current force field fitting problems. On the previous slide I mentioned that we're designing our software to become part of a workflow, and how well we've succeeded at this is hard to measure. For the first few years, in fact, we were just sort of throwing software releases and force fields into a void and hoping that someone was using them.
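The difference between editing one bond instance and editing the underlying parameter can be illustrated with a toy data model. This is a sketch under stated assumptions, not the OpenFF Interchange API: the class names BondParameter, FlatSystem, and ProvenanceSystem, the SMIRKS string, and the bond lengths are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BondParameter:
    """A force field bond parameter (hypothetical, for illustration)."""
    smirks: str
    length: float  # angstrom


class FlatSystem:
    """Copies the length onto each bond; no link back to the parameter."""

    def __init__(self, bonds, parameter):
        self.lengths = {bond: parameter.length for bond in bonds}


class ProvenanceSystem:
    """Stores, for each bond, a reference to the parameter assigned to it."""

    def __init__(self, bonds, parameter):
        self.assigned = {bond: parameter for bond in bonds}

    def length(self, bond):
        return self.assigned[bond].length


cc = BondParameter(smirks="[#6:1]-[#6:2]", length=1.52)
bonds = [(0, 1), (1, 2), (2, 3)]

flat = FlatSystem(bonds, cc)
tracked = ProvenanceSystem(bonds, cc)

# Editing the flat system touches a single bond instance only.
flat.lengths[(0, 1)] = 1.50

# Editing the parameter itself is seen by every bond that references it.
cc.length = 1.50
```

After these two edits, the flat system still reports 1.52 for the bonds that were not touched directly, while the provenance-tracking system reports 1.50 for all three. That propagation is the property that makes post-assignment parameter modification useful during fitting.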
We've engaged in a few direct collaborations since, for example with the MoSDeF efforts at Vanderbilt and other institutions. Those are going well, but what we'd really like to reach is spontaneous adoption of our tools and force fields, and what's been neat in the past year is to see some of these results start spreading. There are companies like Cresset and OpenEye, and if you look at their recent release notes, they're starting to support our force fields and use some of the Open Force Field tools. We have formal collaborators including MoSDeF and the rally and Cole labs, as well as several others. There are some unaffiliated groups around the world. For example, this is the PELE free energy approximation program made in Spain, and it was a joy to see one day that they had started using the Open Force Field infrastructure. We've been in occasional contact with them to help them get integrated. What's coolest for me on the open source side is that if you do a text search on GitHub for "from openforcefield import" or "from openff.toolkit", you start finding bits and pieces of our examples and our API in repos that we've never heard of, from people that we've never talked to. I think that reflects really well on our efforts to make our software and force fields easy for people to use. So now I'd like to talk about how we're making it easy to compare force fields. Back in 2020, the Open Force Field team had been talking about how to get people to try using our stuff. Basically it's a chicken-and-egg problem, where the people we care about have finite time and need to spend some of that time using our stuff to be convinced that it works, but they won't spend that time unless they already think that it works. So earlier this year we started an effort to automate the Victoria Lim and David Hahn force field study.
Our goal with this was to let people test our force fields on a data set of their choosing in as automated a fashion as possible. David Dotson made this great figure to explain how our automation works. The user comes up with one or more molecules defined in 3D, with at least one conformer each. Then we use RDKit to generate up to 10 conformers of each molecule. We do some sanitization and validation to make sure that these molecules are suitable to be handled by MM force fields. Then we run a QM minimization of these molecules using Psi4, and for each QM energy minimum that comes out, we run MM minimizations using a variety of force fields. With the results of those MM minimizations, we see whether they stayed close to the local minimum identified by the QM minimization, and also whether those MM force fields correctly rank the relative energies of the different conformers. Finally, a summary report is generated. We reached out to a number of companies that are working with us on this study. Each participating company submitted some non-proprietary 3D compounds directly to us, and we're running the workflow on those in public. Each company is also running a separate data set of their choosing internally, and we're collecting only the summary statistics from the results of the automation running on that, to produce a big pooled study. So I'm optimistic that this has people seeing the performance of our force fields and tools on molecules that they care about and on infrastructure that they have access to. What was surprising about this was how smoothly the distribution of all this software went. We packaged the Open Force Field Toolkit, OpenMM, the Psi4 quantum chemistry software, QCEngine, QCSubmit, and a little lightweight CLI front end on top of all of it.
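The two comparisons described in this workflow, how far an MM-minimized geometry drifts from its QM minimum and whether MM ranks the conformers in the same order as QM, can be sketched with plain Python. This is an illustration under simplifying assumptions rather than the benchmarking code itself: the function names are invented, and the RMSD here assumes the two structures share atom ordering and are already superimposed, whereas a real comparison would align them first.

```python
import math


def relative_energies(energies):
    """Shift energies so the lowest conformer sits at zero."""
    lowest = min(energies)
    return [e - lowest for e in energies]


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered sets of
    (x, y, z) points. No alignment is performed (simplifying assumption)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def same_ranking(qm_energies, mm_energies):
    """True if both methods order the conformers identically by energy."""
    def order(energies):
        return sorted(range(len(energies)), key=energies.__getitem__)
    return order(qm_energies) == order(mm_energies)


# Toy numbers, invented for illustration (arbitrary energy units).
qm = [-100.0, -98.8, -97.5]
mm = [12.3, 13.0, 15.1]
qm_rel = relative_energies(qm)
mm_rel = relative_energies(mm)
agrees = same_ranking(qm, mm)
```

Absolute QM and MM energies are on different scales, which is why the comparison is made on relative energies and on rank order rather than on raw values.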
People at different companies were actually able to install this, feed in their molecules, and have thousands of quantum chemistry optimizations executed locally. It was super cool. I think it worked really well because we had a really good team working on it, and because we've been developing our software with such internal consistency that these things kind of become Lego bricks that you can snap together and ship out. What I think more users care about than conformer geometries is intermolecular and simulation-based properties. Tackling these is more than a processing problem, because there's also a data collection aspect to it. Whereas drug-like molecule geometries are kind of plentiful in different crystallographic databases, things like bulk mixture densities and protein-ligand binding free energies require a lot of curation. For setting up and running simple property simulations, Simon has created basically this Death Star of a solution called OpenFF Evaluator, which automates a whole bunch of physical property calculations. The current extent of this is shown in this table, but it's changing a lot, and more and more of these checkmarks are coming in every week. For the protein-ligand calculations, David Hahn has been curating a database of protein-ligand complexes with known binding energies, as well as the machinery to programmatically access and iterate over them, so you can run your protein-ligand free energy calculations in a loop. This is the foundation of a lot of our benchmarking efforts using different free energy frameworks. There's been some work that we've been doing to get results on the protein-ligand side, but that side is a lot more effort- and compute-intensive, so the automation still has a little way to go before this becomes totally routine.
While the protein-ligand stuff is still being built, Simon's property evaluator is fully automated and ready for use, both in benchmarking and, if you hook it up to a ForceBalance optimization loop, in force field fitting. We'll be hearing about that during Simon's portion of the talk. Now, finally, I'd like to talk about making it easy to create force fields. Our force field training and experiments require a lot of QM data, and to generate the volume of quantum data that we want, we've started really heavily automating the submission of data sets to QCArchive. There's some extra work that you need to do when you're submitting graph molecules for quantum chemistry jobs for force field work, because you need to make sure that you're getting the same graph molecule out at the end as you put in at the beginning. With these data sets there's also a lot of information entropy, and you don't know ahead of time whether the data set you're submitting now will just be a dead-end experiment, or whether in several months it'll become an important artifact that's part of a force field release. In the first year or two we didn't really have standards for how to record the reasoning and settings for data sets, so a lot of our initial work on QCArchive no longer has the notes or context that we need to understand it today. Thankfully, Trevor Gokey drafted some standards for our data set submissions moving forward. This was based on his lessons learned from sort of being an archaeologist among those early data sets and having to reverse engineer the reasoning behind them. Simultaneously, Josh Horton has been working on the QCSubmit package, which makes it a lot easier to submit and retrieve graph molecules from Open Force Field's data sets on QCArchive. And David Dotson has started automating huge swaths of our job and compute management infrastructure using QCArchive.
So here's an example. This is a GitHub repository that David Dotson maintains called qca-dataset-submission, and this is a project board where each one of these items is a pull request containing a bunch of molecules that were submitted to QCArchive. They're all in different categories. Some of them are currently being computed, and because we use a lot of preemptible compute, sometimes the errors that we get are just because a job was killed when a higher-priority job took its spot. So we have error cycling. Sometimes we reach the end of a data set, where the molecules have all just finished, and we send it back to the original scientist to make sure that the questions they asked made sense. Then we keep track of the end of life and final review for these data sets. What's neat is that this automation runs every day, and for every data set that's currently running, you get a report of how far it's gotten. This is a report from a few days ago, showing that 872 torsion drives of a submission had been completed, whereas 15 were in the error state. These torsion drives each break out into a number of optimizations, so there was a total of 55,000 optimizations that had been completed, totally automated, which I think is super cool. Now we're starting to hit a really good pace for quantum data set creation and processing. And finally, in the Open Force Field Toolkit, we spent a lot of the first two years laying the groundwork for different types of parameters and functional forms. We've added support for virtual sites, for partial charge methods other than AM1, and for BCCs other than the AM1-BCC ones, which we can now define using a SMARTS-based grammar. We've got Wiberg bond order interpolated torsions and bonds supported, as well as more technical support for molecule subclasses, for folks who want to make types of molecules that have additional functionality beyond our base molecule.
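The daily status report and the error cycling for preempted jobs described above can be pictured with a small bookkeeping sketch. It is purely illustrative and does not use the QCArchive or qca-dataset-submission APIs: the status labels, record identifiers, and function names are assumptions, and only the 872 and 15 counts come from the report mentioned in the talk.

```python
from collections import Counter


def summarize(statuses):
    """Count how many records are in each status."""
    return Counter(statuses.values())


def cycle_errors(statuses, restartable=("ERROR",)):
    """Return a new mapping in which errored records are reset so that they
    are picked up and run again (for example after a preempted job)."""
    return {
        record_id: ("INCOMPLETE" if status in restartable else status)
        for record_id, status in statuses.items()
    }


# Toy data set mirroring the counts in the report: 872 complete, 15 errored.
statuses = {f"torsiondrive-{i}": "COMPLETE" for i in range(872)}
statuses.update({f"torsiondrive-{i}": "ERROR" for i in range(872, 887)})

report = summarize(statuses)
after_cycling = summarize(cycle_errors(statuses))
```

Blindly restarting every error is reasonable only when most failures are transient, as with preemptible compute. Records that keep failing after several cycles would still need a person to look at them.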
We've also started seeing some examples of research code maturing into production. This last year we saw this for the packages cmiles and fragmenter. These were codes that were used in Chaya Stern's research, but after she graduated they had maintenance needs and feature requests that were hard to coordinate with everything else that was happening in the software. At that point we basically realized that, with her paper out, the behavior and interfaces for these tools were fleshed out in such detail, and they were in such a stable state, that it was straightforward to refactor their functionality using the Open Force Field Toolkit as a back end. So now these are available again. In the case of fragmenter, it's no longer OpenEye dependent, and we can use AmberTools as a back end. Our force field package also made the jump to conda-forge, and now it's under the name openff-forcefields. With this change, we've automated a lot more of our force field release process. One exciting thing is that, just like how you make a regression test when you get a bug report from a user in your code base, to ensure that you don't add the bug back later, we now have simple regression tests for things that pop up in simulations, to catch recurring problems before they go out in a release. For example, OpenEye had reported a problem with propyne substituents in HMR simulations, and now we've added a test to make sure that we can't break that again. Over here on the right is another example from Josh Mitchell. He's showing how the toolkit can be used for force field modification, where we learned that Pinocchio was a real molecule all along. And so now that we're starting to talk about changing parameters, I'm going to turn the presentation over to Simon, and he'll talk about the fruits of all this labor.
And are you stopping sharing?
Sure. There we go.
Cool. Thanks so much, Jeff. Just getting the slides up now. Okay, awesome. Thanks.
I really want to start my part of the talk, which lays out the amazing science that's been done by Open Force Field, by reiterating the core mantra of Open Force Field: open software, open data, and open science. It really is these three core tenets that are rapidly facilitating the force field science that Open Force Field is doing, as well as the incremental improvements that are being made to its force field. On the open software side, as Jeff has mentioned, we now have a new, fully automated pipeline for training, testing, and analyzing new force fields and force field hypotheses. What may previously have taken whole teams of researchers and engineers weeks is now becoming possible for a single researcher to explore in days using these fully automated pipelines. On the open data side, Open Force Field's commitment to open data, and especially to building tools to curate high-quality training and test sets from open data sources like QCArchive for quantum chemical data and the NIST ThermoML Archive for experimental physical property data, is really allowing Open Force Field researchers to focus more and more on the science rather than the tedium of data curation. This ability to cherry-pick from high-quality, highly curated, pre-existing data sets, or to rapidly create new ones where they're missing and contribute them back to the community, has been fundamental to a lot of the work that I'll hopefully show through these slides. And then, of course, the science. This combined software and data infrastructure has made the process of exploring new force field science almost routine. We can now readily start from some new hypothesis. You know, should we be including virtual sites? Do we need a polarizable charge model? We then have the software team integrate the infrastructure needed to answer the question, if it's not already supported.
We then have the science team perform whole matrices, tens and tens, of test force field fits without much human intervention, and assess those force field fits against well-curated, standardized test sets, especially those that are being provided by the industrial partners, as Jeff previously mentioned. And finally, we make data-driven decisions about if, how, and when these new force field ideas should be incorporated into the mainline Open Force Field force fields. I really want to show over the course of my slides that this process is working, and that it's really enabling incremental improvements to the Open Force Field force field, as well as enabling a wealth of new force field science. In particular, I think a fantastic example of this process working, even though the science perhaps isn't quite there yet, is these Wiberg bond order interpolated torsion parameters. This really started from the fantastic ideas and observations made by Chaya Stern and Christopher Bayly that the quantum chemical torsion barrier height has a strong correlation with the Wiberg bond order. So the idea is: why can't we take our classical force field torsion force constants and interpolate them based on this Wiberg bond order, to capture non-local electronic effects and conjugation effects directly in our torsion parameters, without needing to introduce hundreds of new parameters for all the many different edge cases? Based on this idea, the software team was able to add support to the Open Force Field Toolkit for computing Wiberg bond orders, and to add support for interpolated torsion parameters.
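The interpolation idea can be written down in a few lines. This sketch assumes a simple linear scheme in which a torsion force constant is defined at bond orders 1 and 2 and values in between (or beyond) follow the straight line through those two points. The function name and the example numbers are invented for illustration and are not fitted parameters.

```python
def interpolate_force_constant(bond_order, k_at_order_1, k_at_order_2):
    """Linearly interpolate (or extrapolate) a torsion force constant from
    values defined at bond orders 1 and 2.

    Hypothetical helper: a sketch of the idea, not toolkit code.
    """
    slope = k_at_order_2 - k_at_order_1  # change in k per unit bond order
    return k_at_order_1 + slope * (bond_order - 1.0)


# Invented example values (kcal/mol) for a central bond whose fractional
# bond order rises with increasing conjugation.
k_single, k_double = 1.0, 5.0
for wbo in (1.0, 1.25, 1.5, 2.0):
    k = interpolate_force_constant(wbo, k_single, k_double)
    print(f"bond order {wbo:.2f} -> k = {k:.2f}")
```

The appeal is that one pair of numbers per torsion type can cover a whole chemical series, since the computed bond order carries the information about substituent and conjugation effects that would otherwise need separate parameters.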
The science team, Jessica Maat and Pavan Behara especially, were then able to use this infrastructure to perform tens and tens of new force field test fits, exploring everything from hyperparameters to fitting targets to which chemical series may best benefit from these interpolated parameters, especially those which may exhibit conjugation effects. Equally, for each one of these test fits, they could routinely assess how the interpolated parameters were changing performance against standardized quantum chemical benchmarks, based on the work of Victoria Lim and David Hahn. Ultimately, what was found from these assessments was that limitations in other areas of the force field, especially the electrostatic and non-bonded interactions, were really starting to introduce noise into these wonderful correlations that were observed, and at this point that hindered fitting the torsion parameters to these beautiful signals. So, based on the data, it was decided not to include this aspect of force field science in the mainline force field yet, and not to include it in Sage yet. But I just want to stress that even though the science hasn't necessarily yielded the expected performance improvements yet, given that we have this high degree of automation, we can continue to revisit the scientific ideas as other areas of the force field improve, again applying this same process. When the data ultimately says that this science is ready to make it into the force fields, then we can incorporate it, because ultimately Open Force Field is about making these data-driven, incremental improvements to its force fields.
And I think Parsley has been a really positive example of this, because it's had a number of incremental improvements since it was first released in 2019. As David has spoken about, the valence parameters were retrained against a fully redesigned quantum chemical data set to enable better coverage of chemical space, and particularly better performance on those regions of chemical space which are of most interest, especially those of pharmaceutical relevance. There have also been incremental releases based on user feedback. It was identified that the geometries of the rear containing molecules were perhaps poorly reproduced, so a new release of Parsley was made to address that. It was identified that the force field may have yielded pathological behavior for triple bonds when simulating with a high degree of mass repartitioning, so a release was made to resolve that. And it was identified that the geometry of sulfonamides was perhaps not well captured by the force field, so an incremental release was made to address that. Time and time again we've been able, based on either new science or user input, to improve the Parsley force field, and this now leads us to the next stage, which will be the Sage force field, the next incremental generation of Open Force Field.
Sage is really going to focus on a retrained set of van der Waals parameters, trained against physical property data, alongside continued improvements to the valence parameters. Ultimately Sage is an important stepping stone, especially in determining the methodology for how these van der Waals parameters are going to be retrained, towards the ultimate goal of a self-consistent biopolymer and small molecule force field, and even beyond that just a self-consistent force field: being able to take ligands and amino acids and sugars and lipids, throw all of that at one force field, and have it handle them well because everything has been trained in a self-consistent manner. But for now Sage is still focusing on the methodology and on building up that small molecule force field, even though it's building towards the self-consistent biopolymer and small molecule force field. As I mentioned, Sage is going to be the first Open Force Field force field which contains a select set of retrained van der Waals parameters, and in the release candidate these van der Waals parameters have been trained against mixture properties, specifically enthalpies of mixing and binary mass densities, which was inspired by a previous piece of Open Force Field science. That previous work showed that when you train your van der Waals parameters only against pure properties, you can introduce systematic errors into complementary interactions. In this figure I'm showing some previous benchmarks that were done against enthalpies of mixing. It was found that our previous force field, whose van der Waals parameters ultimately came from these pure property fits, had systematic errors in capturing alcohol-ester and alcohol-ketone interactions, these strong complementary interactions given their hydrogen bond donor and acceptor character.
This work showed that as soon as you start training against enthalpies of mixing and binary mass densities, you can begin to capture the complementary interactions present in the system and start to remove these systematic errors. Staying on the theme of capturing complementary interactions, here is why we think mixture properties will be so important for Open Force Field force fields moving forward: they allow you to capture a whole wealth of interactions in a single training set. One can trivially incorporate into the training set mixtures of solvents with ligands, ligands with amino acids, amino acids with sugars and solvents, and in this way your training set can capture a potentially very diverse range of interactions. So although Sage is still focused on small molecule training sets and interactions, as a methodology and a pathway forward we think mixture properties will allow us to start incorporating these kinds of self-consistent interactions, which will be critical to yielding an accurate biopolymer and small molecule force field. So that's new science, based on OpenFF science, going into Sage. But one thing I also want to stress is that this is actually quite a technical achievement, and again Open Force Field's dedication to open software is facilitating this kind of work, because the Sage training set contains about 1000 of these binary mixture properties, so about 1000 enthalpy of mixing and density data points. If one thinks about how one would naively compute those, it would be about two simulations for the pure phases and one for the mixed phase, so about 3000 molecular simulations might be required to evaluate this data.
Multiply that by the number of training iterations, which for Sage was about 15, and that's about 45,000 molecular simulations that would essentially have to be set up, run, analyzed, and then incorporated back into a force field fit. By having the OpenFF Evaluator framework, which can routinely curate data sets from the NIST ThermoML archive and then takes care of all of the implementation details of actually estimating the properties, we can evaluate these large training sets, and even go larger, with minimal human interaction. So the Sage training set is about 1000 data points, but it was really facilitated by this great software that Open Force Field has built out. And in this vein of using mixture data to fit small molecules together with other components, Sage also begins to include some aqueous mixtures, where the fits include the fixed TIP3P water model. So our small molecule force field is beginning, in some way, to see the water model as part of the fits, even though we're not retraining it yet, although retraining a water model is definitely on the roadmap. By doing these refits against mixture properties, we are beginning to see some really nice improvements in the Sage release candidate. We're benchmarking against both solvation free energies and non-aqueous to aqueous transfer free energies, which we hope would be at least somewhat representative of something like a ligand unbinding from a protein and then being transferred into a solvent. This is based on subsets of the FreeSolv and Minnesota Solvation Database datasets. On these plots I'm showing the non-aqueous solvation free energies, the aqueous solvation free energies, and the non-aqueous to aqueous transfer free energies, and the RMSEs of those for GAFF 2.1 plus the AM1-BCC charge model, for Parsley 1.3.0, and for the Sage van der Waals parameters which went into the release candidate.
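The simulation count quoted above can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch: the enthalpy of mixing is written in its standard form (the mixture enthalpy minus the mole-fraction-weighted pure component enthalpies), the counts are the approximate figures from the talk, and the function name is illustrative rather than part of the OpenFF Evaluator.

```python
def enthalpy_of_mixing(h_mixture, h_pure, mole_fractions):
    """Return H_mix - sum_i x_i * H_i (all enthalpies per mole, same units)."""
    if len(h_pure) != len(mole_fractions):
        raise ValueError("need one mole fraction per pure component")
    if abs(sum(mole_fractions) - 1.0) > 1e-8:
        raise ValueError("mole fractions must sum to 1")
    ideal = sum(x * h for x, h in zip(mole_fractions, h_pure))
    return h_mixture - ideal


# Approximate figures quoted in the talk.
n_data_points = 1000          # binary mixture data points in the training set
simulations_per_point = 3     # naive estimate: two pure phases + one mixture
n_training_iterations = 15

simulations_per_iteration = n_data_points * simulations_per_point
total_simulations = simulations_per_iteration * n_training_iterations

print(simulations_per_iteration)  # 3000
print(total_simulations)          # 45000
```

The naive count treats every data point as needing all three simulations; in practice pure component simulations can be shared between data points, which is the kind of bookkeeping an automated framework can take care of.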
So even though one maybe doesn't see statistical significance overall, if one digs a bit deeper and looks at individual chemical moieties, one can find statistically significant improvements for certain chemistries. In general, there does seem to be a positive trend, a positive change, after retraining the van der Waals parameters. Of course, some caution is required here, because there are always going to be some missing terms if you're just using a fixed-charge force field when evaluating these properties. And when we begin to self-consistently retrain the van der Waals parameters together with an electrostatic model, which I'll mention a bit later in the talk, we will hopefully begin to see even larger improvements here. So we do think these mixture properties are not only helping to improve the van der Waals parameters, they should also give us a sustainable pathway for beginning to build up a self-consistent biopolymer and small molecule force field, and then beyond. It's not just the van der Waals parameters that we've been working on for Sage, though. We've also tried to go back and reevaluate how we can improve the valence parameters incrementally at the same time, and there have been a couple of approaches that we've taken here. One of them is looking at whether we can expand the torsion drive training data included in the valence fits, and this is work that's really been pushed forward by Hyesu Jang. It is based on an observation made during previous fits and during the Wiberg bond order work: if you include torsion drives in your training set which are heavily influenced by strong steric or electrostatic interactions, and you have deficiencies in your standard models, which we know we do have, you can easily introduce artifacts into your torsion parameters.
So Hyesu has been looking at whether we can design a small yet chemically diverse torsion drive data set which doesn't include these strong steric and electrostatic interactions, mainly by combinatorially joining small, chemically diverse fragments with a single bond and then doing a torsion drive about that bond. Unfortunately, while great strides have been made in this area, and we've done a couple of test fits based on these data sets and assessed them against quantum chemical data, the data is showing that this new approach to designing the torsion drive set isn't quite ready to be included in Sage. It is on the horizon and it does look like a promising approach, but for now there's still work to be done. But again, it was explored using this process: we had the idea, we were able to do test fits based on the new data sets that Hyesu put together and was able to compute using the distributed infrastructure, and we made the data-driven decision. What we actually have been looking at for Sage, though, is this: if we just take the data that went into Parsley, so our second generation data sets, can we get more mileage out of it? Previously we've been training against things like optimized geometries, torsion profiles, and vibrational frequencies, all somewhat equally weighted. But should we be training against all of those targets? What happens if we don't include vibrational frequencies? What if we swap them out and train directly on Hessian data? What if we derive our bond and angle force constants directly through something like the modified Seminario method? What if we change the relative weighting of these contributions?
And then for the data itself, what happens if we start to filter the torsion drives by removing those which have strong steric interactions? For the optimized geometries, what if we don't just take all of the conformers that we were given, and instead only retain the distinct conformers? Just by looking at the Parsley data and reevaluating it, doing tens and tens of test fits, I think over 20 have been done here, and assessing every single one of those against quantum chemical data as well as some protein-ligand data, we've been able to see some significant improvements in the Sage release candidate, even though the data is very similar to what went into the 1.2 and 1.3 releases. On this slide I'm showing some of these significant improvements to optimized geometry derived targets. If you've been part of the industrial partner benchmarking project, these kinds of plots will look very familiar to you, and they use the molecules which were contributed by our industrial pharma partners. Here I'm showing histograms of a number of metrics. First, the RMSD between the minimized QM and MM structures for a range of different force fields, including the GAFF 2.1 force field, the Open Force Field 1.2 and 1.3 force fields, and the Sage release candidate. We're definitely seeing some significant improvements in the Sage release candidate, again just from having explored how we use the data that we're training against. In terms of the delta energy differences between the QM and MM conformers, we again see relatively good performance from the Sage release candidate. Actually, in going from Parsley 1.2.0 to 1.3.0 it seems like there was perhaps a slight regression in performance, but in going from 1.3.0 to the release candidate, it seems like we've resolved that regression and improved things, whilst also improving conformer based properties.
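To make the conformer energy metric a little more concrete, here is a small Python sketch. It assumes one common definition, in which each conformer's relative MM energy is compared with its relative QM energy, with both measured from the conformer that is lowest in energy at the QM level. That definition, the function name, and the example energies are assumptions for illustration and may differ in detail from what the benchmarking infrastructure does.

```python
def relative_energy_errors(qm_energies, mm_energies):
    """Per-conformer error in relative energy, MM minus QM.

    Both lists hold one energy per conformer of the same molecule, in the
    same order and units. The QM minimum-energy conformer is the reference.
    """
    if len(qm_energies) != len(mm_energies) or not qm_energies:
        raise ValueError("need matching, non-empty energy lists")
    ref = min(range(len(qm_energies)), key=lambda i: qm_energies[i])
    return [
        (mm - mm_energies[ref]) - (qm - qm_energies[ref])
        for qm, mm in zip(qm_energies, mm_energies)
    ]


# Hypothetical conformer energies in kcal/mol.
qm = [0.0, 1.0, 2.5]
mm = [0.2, 1.0, 3.2]

for index, error in enumerate(relative_energy_errors(qm, mm)):
    print(f"conformer {index}: {error:+.2f} kcal/mol")
```

Histograms of such per-conformer errors over a large molecule set are one way to compare force fields like those named in the talk.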
And the same with TFD, the torsion fingerprint deviation, which is kind of like a weighted internal coordinate RMSD: significant improvements from the Sage release candidate here. Again, this was all born out of this process of exploring, using our automated infrastructure to perform tens and tens of test fits, validate on quantum chemical data, and ultimately benchmark on this larger set, and we are seeing the nice incremental improvements in Sage that we were hoping to observe. While Sage, and Open Force Field force fields in general, have mostly been benchmarked, at least on the QM side, against optimized geometry quantum chemical data, we're actively looking at what other quantum chemical metrics we could employ as part of force field assessment, to see whether we're pushing our force fields in a positive direction. One of the things we want to look into is how well our force fields reproduce things like torsion drives. So for one of the metrics, we've started from the JACS ligand set: take the JACS ligands, do a bunch of QM torsion drives, do a bunch of MM torsion drives, and then for each molecule in the set compute the RMSD between the MM and QM profiles and average over the set. This is the metric shown here for 1.2, 1.3, and the Sage release candidate. Hopefully this kind of metric gives some idea of how well the relative magnitudes of two torsion profiles match up. And then we looked at: what if we take the QM and MM torsion profiles, first scale and normalize them by their maximum barrier height, and then calculate the RMSD between them and average over the set? This is a metric which we found correlates relatively well with how well the shapes of two torsion profiles match, even if they have significantly different magnitudes.
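The two torsion profile metrics described above can be sketched in plain Python as follows. This assumes each profile is a list of energies on the same grid of dihedral angles and that profiles are shifted so their minimum is zero before comparison; the function names and the example profiles are hypothetical.

```python
import math


def shift_to_minimum(profile):
    """Express a torsion profile relative to its lowest energy point."""
    lowest = min(profile)
    return [energy - lowest for energy in profile]


def normalize_by_barrier(profile):
    """Scale a profile so that its maximum barrier height equals 1."""
    shifted = shift_to_minimum(profile)
    barrier = max(shifted)
    if barrier == 0.0:
        raise ValueError("flat profile has no barrier to normalize by")
    return [energy / barrier for energy in shifted]


def profile_rmsd(profile_a, profile_b):
    """RMSD between two profiles sampled on the same dihedral grid."""
    if len(profile_a) != len(profile_b) or not profile_a:
        raise ValueError("profiles must be non-empty and the same length")
    squared = sum((a - b) ** 2 for a, b in zip(profile_a, profile_b))
    return math.sqrt(squared / len(profile_a))


# Hypothetical profiles (kcal/mol): same shape, MM barrier half the QM one.
qm_profile = [0.0, 2.0, 4.0, 2.0]
mm_profile = [0.0, 1.0, 2.0, 1.0]

magnitude_rmsd = profile_rmsd(shift_to_minimum(qm_profile),
                              shift_to_minimum(mm_profile))
shape_rmsd = profile_rmsd(normalize_by_barrier(qm_profile),
                          normalize_by_barrier(mm_profile))

print(f"magnitude metric: {magnitude_rmsd:.3f}")
print(f"shape metric:     {shape_rmsd:.3f}")
```

In this toy case the first metric is non-zero because the barrier heights differ, while the normalized metric is zero because the shapes agree, which is the distinction the talk draws between the two. Averaging either value over every torsion drive in a ligand set gives a single number per force field.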
And again, the force fields look like they're doing reasonably well here, but the idea is more to explore what good metrics are for assessing performance against torsion profiles. In general, I think we as a consortium would love more feedback on this, and we'll hear a bit more in the discussion later: what kind of metrics do people want to see the force fields assessed against? Anything we can do to improve performance where users need it most is something that we would love to include incrementally in our force fields. Another idea for a metric that we want to assess is a finer grained metric based on geometries, in some ways optimized geometries but not necessarily optimized. We would really love to be able to compare how the geometries of specific chemistries, whether minimized or simulated, compare between QM and MM, because hopefully this will allow us to detect earlier some significant problems that we've seen in previous force fields. I want to give a case study of one of these now, as well as showcase how Open Force Field was able to use its automated process to overcome it. This was first identified by pharma partners, and in particular a couple of weeks ago OpenEye came to us and said that they had been doing some binding free energy calculations, and they were noticing that in calculations where the ligands contained a sulfonamide, the sulfonamide valence angle was decreasing to about 75 degrees when simulating with 1.3.0, and this wasn't present when they simulated with 1.2.0. So they hypothesized that maybe this geometry issue was leading to problematic binding free energy calculations.
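A geometry check of the kind described above could look something like the following Python sketch, which measures a valence angle from three atomic positions and flags force fields whose angle strays far from a reference value. The reference angle, the tolerance, the second angle value, and the force field labels are made up for illustration; only the 75 degree figure comes from the talk.

```python
import math


def valence_angle(atom_a, atom_b, atom_c):
    """Angle a-b-c in degrees, where atom_b is the central atom."""
    v1 = [p - q for p, q in zip(atom_a, atom_b)]
    v2 = [p - q for p, q in zip(atom_c, atom_b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1 = math.sqrt(sum(x * x for x in v1))
    norm2 = math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against round-off pushing the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.degrees(math.acos(cosine))


def flag_distorted(angles_by_force_field, reference, tolerance):
    """Return the force fields whose angle deviates beyond the tolerance."""
    return {
        name: angle
        for name, angle in angles_by_force_field.items()
        if abs(angle - reference) > tolerance
    }


# A right angle, as a quick sanity check of the geometry code.
print(valence_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)))

# Hypothetical averaged angles (degrees) from simulations with two force fields.
observed = {"force_field_a": 75.0, "force_field_b": 104.0}
print(flag_distorted(observed, reference=105.0, tolerance=15.0))
```

Running a check like this per functional group, rather than only looking at whole-molecule RMSD, is one way a localized distortion could be caught before it affects downstream free energy calculations.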
After that was identified, Open Force Field was able to diagnose that between 1.2 and 1.3 two angle parameters in particular seemed to have decreased, maybe unphysically, and was able to triage the problem by rolling back those parameters. I think the day after I had a conversation with OpenEye, we had an alpha release of a new force field out. We provided it to OpenEye, and they were able to go and redo their binding free energy calculations. While the geometries seemed to have been resolved by this triage fix, unfortunately the ligand binding free energies maybe weren't quite so resolved. But I just want to show that, again, based on pharma partner input, within a turnaround of about a day we were able to provide a triage solution. And when we went back to revisit this problem with Sage, it seems like the release candidate also does not have this issue, due to the redesigned training data that it was trained on. So not only do we have a triage release, we have a longer term solution in Sage, just from varying the data that was trained against. This process, where we had the idea, we managed to do a test, we fixed the problem, we were able to test it, and then release a fix, I think is fundamental. If people are identifying issues with the force field, we would love to hear about them so we can fix them and include the fixes in our releases. The other metric which I think we'd like to assess, and I'm sure many people would like to see assessed: it would be fantastic, a game changer really, if we could routinely incorporate protein-ligand binding free energy calculations into the assessment of force fields. Especially as we're doing these tens and tens of fits, which are becoming routine for us, if we could assess them against many protein-ligand binding affinities, especially when we start to retrain the biopolymer force field, this would be fundamental to hopefully improving performance there.
So as part of the Sage release, more as a way of starting to scope out whether this would work well in our workflows, we have been benchmarking each one of these tens and tens of test fits against the TYK2 JACS ligand set using perses, mainly as a sanity check to make sure that things are not deviating too much. But we don't want to be focused on just one target. Ultimately we'd love to do the full JACS set, and beyond that David Hahn's protein-ligand benchmark. So a lot of conversations have been had, and we're trying to invest heavily in adding support for Folding@home, so that we could get access to their wealth of compute power and ultimately allow benchmarks against many test force fields, and make it a routine part of the force field validation and assessment cycle. Hopefully this shift will be coming in the future, and if it does pay off, it could again be a game changer in terms of allowing improvements to things like protein-ligand affinities as a part of the force field fitting process. So, Open Force Field Sage: the next incremental generation of Open Force Field force field, which contains retrained van der Waals parameters, which seem to be yielding improvements to things like solvation and transfer free energies; retrained valence parameters, which seem to be yielding improvements to optimized geometries and perhaps also to reproducing torsion profiles; continuing the cycle of incremental improvements; and in general a stepping stone to the ultimate goal of the self-consistent small molecule, biopolymer, and everything else force field. This release candidate is now available on GitHub, and in the coming days we will also be working to make it available as part of the Open Force Field force fields package, to make it easy to use and test.
So that is all of the work that's gone into Sage, but now I want to take a bit of time to walk through some of the amazing science that's been done in parallel, because there's a lot of really nice science being facilitated, again, by Open Force Field's data sets and infrastructure. One of these really nice and kind of unexpected cases came from a collaboration with the Cole group, and Josh Horton in particular. It's the idea of taking a methodology which gets used routinely for bespoke force fields and applying it to a general force field. In particular, the modified Seminario method, which is used by the Cole group, allows you to derive bespoke bond and angle force constants directly from quantum chemical Hessian data. So the idea is: can we just do a bunch of these calculations for a large set of molecules, average over them, and use the averages as the general bond and angle force constants? Josh was able to pull down all of the Hessian data made available by Open Force Field and, using QUBEKit, the package that he created with the Cole group, compute a bunch of bespoke bond and angle force constants and average over them. The values that came out seemed reasonably sensible, and especially for triple bonds the force constants seemed to be much more commensurate with what you'd need to reproduce things like vibrational frequencies. Once Josh had reached out to me with this idea, within about a day we'd been able to apply the machinery of Open Force Field and refit, keeping the averaged bond and angle force constants reasonably well restrained. We were able to refit all the other valence parameters in an entirely new force field, and within about a further day we were able to test that force field against the large quantum chemical sets that the industrial partners had provided.
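The averaging step described above, going from many bespoke force constants to one general value per parameter, can be sketched in Python as below. The modified Seminario calculation itself (deriving each bespoke force constant from a Hessian) is not shown; the parameter labels and values are hypothetical, and this is not QUBEKit code.

```python
from collections import defaultdict
from statistics import mean


def average_by_parameter(bespoke_terms):
    """Average bespoke force constants for each general parameter.

    bespoke_terms: iterable of (parameter_id, force_constant) pairs, one per
    bond or angle in the data set, where parameter_id identifies the general
    force field parameter that would be assigned to that bond or angle.
    """
    grouped = defaultdict(list)
    for parameter_id, force_constant in bespoke_terms:
        grouped[parameter_id].append(force_constant)
    return {pid: mean(values) for pid, values in grouped.items()}


# Hypothetical bespoke bond force constants, labelled by the general
# parameter ("b1", "b2") each bond would be assigned.
bespoke = [("b1", 500.0), ("b1", 540.0), ("b2", 900.0)]

general = average_by_parameter(bespoke)
print(general)
```

The averaged values would then serve as starting points or restraints in a conventional refit of the remaining valence parameters, as described in the talk.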
And ultimately what we saw is that, while one seems to get better force constants, which match better with vibrational frequencies, one also sees reasonably similar performance to what we got from just allowing the force field to be fit to things like torsion drive and optimized geometry data. So going from hypothesis to an entirely tested force field in a couple of days, and we're now iterating on this approach with the Cole group, really, I think, shows the power of open processes, open infrastructure, and open data availability. We look forward to exploring this science more and seeing if we can begin to incorporate it into the mainline force fields. As well as retraining the van der Waals parameters, a big interest in Open Force Field is retraining the electrostatics model. AM1-BCC has been the workhorse of a lot of calculations over the years, and especially for the Open Force Field force fields. The idea is that we'd love to take the bond charge correction parameters and start to retrain them against new quantum chemical data, and even explore co-optimizing them against quantum chemical and experimental data together with the van der Waals parameters, all at the same time, to yield a self-consistent charge and non-bonded model. The process for this is currently ongoing. All of Christopher Bayly's wonderful AM1-BCC bond charge correction parameters have been ported into a SMIRKS-based language and are made available in the OpenFF Recharge package. The ability to train these BCC parameters against both quantum chemical data and experimental data has been integrated into the fitting infrastructure by adding support to OpenFF Evaluator, ForceBalance, and OpenFF Recharge.
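For readers less familiar with bond charge corrections, the following Python sketch shows the basic bookkeeping: each correction moves a small amount of charge across one bond, so the total molecular charge is unchanged. The sign convention, the correction labels, and all numbers are assumptions for illustration, and this is not the OpenFF Recharge API.

```python
def apply_bond_charge_corrections(base_charges, bonds, corrections):
    """Add bond charge corrections to a set of base partial charges.

    base_charges : one partial charge per atom (e.g. from a semi-empirical
                   calculation)
    bonds        : list of (atom_i, atom_j, correction_id) tuples
    corrections  : mapping from correction_id to a charge value; by the
                   convention assumed here the value is added to atom_i and
                   subtracted from atom_j
    """
    charges = list(base_charges)
    for atom_i, atom_j, correction_id in bonds:
        value = corrections[correction_id]
        charges[atom_i] += value
        charges[atom_j] -= value
    return charges


# Hypothetical three-atom chain with two bonds and two correction types.
base = [-0.10, 0.10, 0.00]
bonds = [(0, 1, "c1"), (1, 2, "c2")]
corrections = {"c1": 0.05, "c2": -0.02}

corrected = apply_bond_charge_corrections(base, bonds, corrections)
print(corrected)
print(sum(corrected))  # total charge is conserved up to round-off
```

Because the corrections are just a small table of numbers keyed by chemical environment, they can be treated as trainable parameters and optimized alongside the van der Waals terms, which is the co-optimization idea described in the talk.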
The test fits are currently being performed, building on previous Open Force Field science, especially the work of Michael Schauperl and his RESP2 methodology, exploring training these to a mixture of vacuum and implicit solvent quantum chemical data, as well as to the mixture enthalpies and densities which are being used for the van der Waals parameters. We're currently in the cycle of training and testing, at the moment against solvation and transfer free energies, but maybe expanding this to a broader set of test data as well, and we hope to share how that's going in the next couple of months. In the same vein of trying to improve the electrostatics model, Trevor Gokey has been leading the effort to explore this basic idea: how much improvement can one get by including off-site charges in a force field, and does their inclusion justify the introduction of new particles, which may yield slower simulations? Based on this hypothesis, and following the process by which Open Force Field does its science, Trevor has added support to the Open Force Field Toolkit for virtual sites, and he's currently working on adding support for training those virtual sites against both quantum chemical data and, hopefully, physical property data. This fitting infrastructure should be available in the following weeks. The next step will then be to use this fitting infrastructure to perform many test fits against quantum chemical data, using similar approaches to those being used with the BCC parameters, to see if they're yielding improvements to the force field, and ultimately to make a data-driven decision: should virtual sites be included in Open Force Field force fields, or is the complexity not worth the expense? So hopefully we'll have some more information on this in the coming months. Another big area for Open Force Field is the ability to train bespoke parameters for the molecules which people are most interested in.
This has really been pushed forward, again, by Josh Horton, who built Open Force Field's bespoke fitting package, and hopefully this package should be available within the next one to two months. Currently, it's been shown that it can successfully retrain a bespoke set of torsion parameters for the JACS ligand set, and in particular it can take advantage of the wonderful work that Chaya did on fragmenting a molecule in such a way as to retain the chemical environment around the torsion, and fit the torsions to fragments of the original molecule, which is improving the performance of the bespoke fitting. It does look, from the work Josh has done, like these bespoke torsion refits, introducing bespoke torsion parameters for certain molecules, are yielding an improvement over the general force field, so it does look like things are working. While we want to make available the fitting package, which currently supports bespoke fitting of torsion parameters, work is also being done on whether we can expand this to other valence terms as well: can we fit things like force constants with the modified Seminario method? And a big one, which we would love to explore more and which is currently being explored: instead of generating bespoke quantum chemical data for each of the fragments that the torsions are being trained to, can we just use a machine-learned potential like ANI-2x to generate the bespoke data being fit to, and in that way significantly reduce the cost of bespoke fitting while hopefully still yielding the relative accuracy benefits? So hopefully look forward to this in the next couple of months. And there is more, there is so much more. The work I've mainly been speaking about is just the stuff that's coming in the near-term future, but Trevor Gokey in particular has been working on automatic chemical perception, trying to automate the process of deciding when and where we need new parameters.
Owen Madin has been looking at training surrogate models to predict physical properties as a function of force field parameters, to take the cost of evaluating physical properties from days down to seconds, which would be a real game changer in terms of speeding up the force field science that we can do with things like van der Waals refitting. Hyesu Jang has been doing beautiful work on figuring out different ways to include Hessian data in our fits that avoid the artifacts that we've sometimes found vibrational frequency data to introduce, especially by projecting the Hessians onto internal coordinates before doing the fits. Jeff Setiadi has worked not only on adding host-guest binding affinity support to the OpenFF Evaluator, but also on being able to compute the gradients of host-guest binding affinities, and he is currently using the support he added to the OpenFF Evaluator and ForceBalance to train fits against host-guest binding affinities. Jessica Maat is building on her previous work with the Wiberg bond order interpolated proper torsion parameters, and is looking at whether these can be generalized to improper torsions as well, where improper torsions shouldn't suffer as much from things like steric and electrostatic interactions. She's done some really nice work showing that the degree of planarity around amines is strongly correlated with the Wiberg bond order, so she is looking at whether we can capture that directly within a force field, to capture these planarity-dependent Wiberg bond order and chemical environment effects.
Pavan, as well as working with Jessica on the WBO work, has also been revisiting the level of theory that we're training and testing the force fields against, and is starting to look into things like whether we can use a machine-learned potential, something like ANI, to generate our training data, so that we could rapidly expand the size of our training sets without necessarily rapidly expanding the cost of generating such data. So, like I said at the start of the talk, open software, open data, and open science are rapidly facilitating the force field science which Open Force Field is doing and, as I've hopefully shown with Sage, are really facilitating the incremental improvements that are being made to the force field. In addition to incrementally improving the force fields themselves, this is generating a wealth of parallel force field science, which we hope will eventually feed back into the force fields and yield significant improvements there. And so with that, I'll hand back over to David Mobley for concluding remarks.
Yeah, so just to kind of sum up: we're really excited that the Sage release candidate is available and looks promising on diverse data. We have a little bit more testing to do before we release, but it's really exciting. We've been really excited and pleased with the work everyone has done, including all the partners, on this automated benchmarking, and with really working together as a group with all the industry partners. We're also excited about the level of community uptake we see, both with and without our help, because this is an effort where the initiative benefits the more people are using it, since that will feed back into more improvements that come from the community.
We're also excited about new technology that's coming online, whether it's stuff we deliberately built to improve force fields or stuff that comes in unexpectedly, like this modified Seminario approach that appears to result in improvements. And we're excited about the bespoke torsions and a variety of other things that we'll be able to let you start trying out very soon, which look like they're going to provide accuracy gains. Simon, can you go on to the next one? Okay, good. And really, we should acknowledge almost the whole community for the work that's gone into this. There are too many folks in our initiative even to list, but we're also indebted to the AMBER force field community and the GAFF developers, and to everyone who supported this financially: the NIH and NSF funding that paid some of the way before this formally started, the Open Force Field Consortium and NIH for funding the work that we talked about today, and MolSSI and others who funded graduate student and postdoc fellowships in our groups for work that has interfaced with this as well. Next slide. If someone wants to drop the link to that form in the chat, that would be great, Jeff or Karmen. So there are potential follow-up workshops that we could have on this, or follow-up talks rather, that we would schedule at times that work for you. You can use this Google form to vote on the different things that you are interested in hearing more about: bespoke fitting and what that would look like; the details of how we've actually been working through optimizing, troubleshooting, and trying to improve force fields, and where that might be going; reviewing the benchmarking infrastructure, how we can improve it, what could go on next, and what we should be looking at there; and also a demo session on interfacing with QCArchive using QCSubmit, and so on. Feel free to vote there. And next slide.
Okay, so now is when we switch to questions. Feel free to launch into questions on the talk at this point. Type those in the chat or speak up. Oh yeah, and now we stop the recording.