Okay, so in the interest of time, I think we can go ahead and get going. We're really glad to welcome everybody here for our fifth Open Force Field workshop, which is amazing and exciting. We have a bunch of really great updates to give you today, and we're looking forward to the dialogue that starts and the great things to come. We'll be doing a bit of jumping around between speakers to cover the different areas of updates within Open Force Field. I'm going to lead off with an overview and some highlights of what's coming in the rest of the talk, and then you're going to hear from Jeff about infrastructure, toolkits, and the great software we're working on, then from Lily on science, and from Diego on the project management side.
A couple of points of order. We have this set up as a webinar so that we can handle larger numbers of people, so it works a little bit differently, but feel free to use the chat and the Q&A features in Zoom to ask questions during the talk by typing them in. The speakers will try to answer questions in the Q&A after we're done with our presentations, and there can be some dialogue in the Q&A or the chat as things go on, because other people also know the answers aside from just the speakers. After the conclusion of the talks there'll be discussion and opportunities for more Q&A. We do have to enable your mic, so you'll have to raise your hand and we'll call on you when it's your turn.
So, without further ado, let's get going. Really, we're here because of you, and this is a slide you've seen many times before, I think. It's great working with such an exciting group of people and having this feeling that we share a common set of problems that we really can make headway on. So it's a great group of people, great industry folks supporting us, and a great team, and we won't walk through all of that right now because we want to spend our time updating you on what's going on.
I'll remind you that OMSF, the Open Molecular Software Foundation, is our host entity right now, and so it plays a coordinating role. Anyway, let's get going on the science. Again, you've seen this before, but a key point is that openness is central to everything we're doing. This is partly because we share so many of the same problems, and they're too big for any one person or any small group of people to tackle in isolation. So we need to share, and we need to be open about the problems, about the infrastructure, and about the tools we can use to solve them. That drives a lot of what we do, and it's part of why we're so excited to have as many people as possible participating, because there are more than enough problems to go around.
Now, originally we came into this initiative with more of an idea that force field development and improvement would be a linear and organized process: we would take an idea, implement the idea, fit force fields with that idea, test the force fields, and it would be great and we would release them. As time has gone on, we've realized that our ideas about what's going to work best don't always work out the way we expect. So it works better to take a bunch of ideas, try them with as little effort as possible, or with a minimum investment of human time, and then see which of those ideas look the best and improve accuracy the most, and then get those into the toolkits and the force fields and release them. That's changing how we're doing things, and I think it's a theme you're going to see coming up in the other presentations as well. So we're going to tell you about the science and about the infrastructure. One of the first things you're going to hear about today is infrastructure, from Jeff Wagner.
I'm highlighting an overview here, but he's going to be going through a lot more of the details, ranging from new functionality like BespokeFit for fitting custom torsions, to things that support the science we're going to do later, to key interoperability and other factors that our users really need. One of the things you'll hear about is virtual site support, meaning off-site charges, where these can be placed, and how they're looking as we start trying them out in force fields. We're excited that we are now trying them out in force fields, and this really hinges a lot on the work of Simon and others in the infrastructure space.
Also now in the wild is our BespokeFit toolkit for making custom, more accurate torsion parameters. It can take a molecule or a chemical series, fragment it, and easily make custom torsions that greatly improve accuracy. Here's one example on the top right. This is where we started, and then the QM and the BespokeFit results are much more similar. One of the cool things about this is that you can choose your favorite QM method, including GFN2-xTB, which is really fast. It's not a true QM method, and you can go to something slower and higher accuracy if you want, but it's really fast and it improves accuracy. We have some preliminary results from binding free energy benchmarking, and there'll be more of that to come. One of the reasons this is cool is shown on the bottom left: we have relatively few distinct torsional parameters in the force field, which actually results in a quite general and pretty accurate force field. Some other force fields have far more torsional parameters. So potentially we could get really high accuracy torsions by just fitting some custom torsions for your chemical series of interest, without having to make our force fields dramatically more complex. And of course you're also going to hear about biopolymer support for our coming Rosemary release.
As I can attest from some of my postdoctoral work, handling covalent modifications of proteins can be a real pain in the neck with current force fields, but once we have a protein force field, we can easily and consistently treat the protein and the small molecule. You're going to see an example of that today; here's a snapshot from a simulation. We have an Amber protein force field on the backbone and the Sage small molecule force field for the covalent modification, and it works.
Another thing on the infrastructure front that we're continuing to work on is benchmarking infrastructure. As you've heard in the past, a key thing driving people's interest in Open Force Field has been using force fields for binding free energies. We've done some work in the past on benchmarking those, but we really need to turn this into something that we can routinely run in a high-throughput manner to rapidly assess how our force fields are doing. So we're working on getting that type of infrastructure going on Folding@home, so we can benchmark rapidly on a large scale and assess progress.
In tandem with all the infrastructure work, there's a whole lot of really great science going on, from checking that we're using an appropriate level of QM theory, to better initial guesses for force fields, and a key one is protein parameters. You're also going to hear about work on virtual sites and graph charge models and a lot of other things that are going on. Since our last Open Force Field meeting we did release our Sage force field, and one of the results from that I wanted to highlight is that we refit Lennard-Jones to do a better job handling mixture data. In particular, instead of fitting to pure densities and pure heats of vaporization, we looked at fitting to heats of mixing, mixture densities, and other properties. We found that training to mixture data actually improves performance relative to training to pure data only.
If you look at prediction of test data rather than training data, here solvation free energies, you can see that in the orange and the purple. If we train in the old way, we increase the error, whereas if we train in this new way, we decrease the error and get closer. The distribution is more even, and you see the same thing in this visualization as well. I think it's known in the force field development area that pure data can cause some systematic problems, and it looks like that's potentially especially true of the heat of vaporization.
We've also made a bunch of improvements to the force field fitting process itself, and we expect these to get rolled into our next release. One of them is that previously most of the data we were using for torsion fitting was coming from torsion drives only, but it turns out we also have data on what torsions should be doing from optimized geometries. So we started looking at including information from those optimized geometries, looking at how wrong torsions are in MM-optimized geometries versus QM-optimized ones, and we can include that in fitting. When we do that, it looks like it improves the accuracy of the geometries coming out. You'll see these plots quite a bit: this is torsion fingerprint deviation on the top, and we're looking for this peak close to zero to get higher. Another thing we've tried is taking different initial guesses for parameters from the modified Seminario method, based on QM, and that also seems to improve the accuracy of the fits we get. We're also, for the first time, starting to fit our improper torsions, and that also improves RMSD, so geometric measures of performance improve while we maintain good energies. On the bottom you can see a bunch of these different improvements being tested at once. What we're seeing in the orange is the best performance, and that combines several of these different improvements.
You can see this on industry benchmarking as well. This is looking at energy error. The purple and red methods combine more of these different improvements, and what's plotted is the fraction of the data in the lowest energy error bin, so we see that significantly improved.
We're also fixing a problem with torsion multiplicities, where we originally had some torsion parameters that covered central atoms with different valences. In this case, a central atom with two connections or with three connections results in different numbers of torsions passing through the same central bond, and it turns out that's a mistake. So we are addressing that. We were able to refit already based on optimization data, but we're doing more torsion drives and a full refit based on those. Again we see improvement from that in both TFD and RMSD: as we start with Sage and then begin fixing this multiplicity problem, we see performance improve.
We also have, and you'll hear more about this, preliminary tests of work with virtual sites. We expect that this would make a significant difference in condensed phase properties, but we're also seeing that it improves the accuracy of geometries and energetics relative to quantum mechanics. Again, these are the same type of plots: the top is energy error, the bottom is geometry error. We're seeing that virtual sites actually improve performance on these significantly, in addition to impacting the condensed phase property calculations.
Okay, just getting my slide to advance here. As many of you know, we've also been working a lot on automated benchmarking with industry, and that's showing progress as well. As we go through our different force fields, 1.2, 1.3, 2.0, we see improving performance on RMSD and torsion fingerprint deviation. That's encouraging, and benchmarking on proprietary data sets shows the same thing. These are plots from a forthcoming industry benchmarking paper.
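To make the torsion multiplicity point above concrete, here is a minimal sketch that counts how many proper torsions pass through a central bond, given the number of connections on each central atom. It assumes an acyclic environment (no three-membered rings), and the function name is hypothetical, not part of any Open Force Field package. Because each torsion contributes its own energy term, the same parameter applied across a different number of torsions gives a different total profile about the bond.

```python
def count_proper_torsions(connections_j: int, connections_k: int) -> int:
    """Count proper torsions i-j-k-l that share the central bond j-k.

    Each central atom offers one terminal atom per connection other than
    its partner in the central bond, so the count is a simple product.
    Assumes no three-membered ring (atoms i and l are always distinct).
    """
    if connections_j < 1 or connections_k < 1:
        raise ValueError("each central atom is bonded at least to the other")
    return (connections_j - 1) * (connections_k - 1)


# Central atom with three connections next to a four-connected atom.
three_connected = count_proper_torsions(3, 4)
# Central atom with two connections next to the same four-connected atom.
two_connected = count_proper_torsions(2, 4)

print(three_connected, two_connected)
```

With these inputs, the three-connection case has twice as many torsions through the bond as the two-connection case, which is why a single shared parameter cannot describe both equally well.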
The top two include OPLS data and compare Open Force Field 2.0 to OPLS and GAFF. The bottom shows progress across different Open Force Field versions, and so we see that as we get to 2.0 we've improved performance a lot. We feel like we're still just getting started and there's a lot left to do, so we're excited to tell you about where that's going today. Another way of looking at the same thing, and this is on public and proprietary data, is to look at what fraction of molecules in a data set have energies and geometries within a specified threshold, and you can vary that threshold. On the top we're looking at Open Force Field and GAFF versus OPLS on public data sets from industry. On the bottom, this is proprietary data from internal partner testing, where we just compare GAFF and Open Force Field. You can see in the three different Open Force Field colors that we've made significant progress, and this is really looking quite good. We can also use that same data to look for trouble spots, where there are systematic errors, and then begin trying to fix those. There's a lot more on this in our forthcoming paper. You'll also hear a quick update on the status of the protein force field and how that's coming along, and we're excited to be able to make a consistent protein and small molecule force field very soon. So with that, I hope I've gotten you interested in what's coming, and I'm going to hand it over to Jeff for updates on infrastructure.
Great. Thank you, David. Yeah, so I'm Jeff Wagner. I'm the technical lead for Open Force Field, and today I'm going to give an update on what we've been working on since our last annual meeting and what we're going to be doing in the coming year. This year I'm organizing my talk in terms of our organization's values. I'm doing this because as we've grown, we've needed to figure out how to coordinate progress on these interlocking research and development efforts.
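The "fraction of molecules within a threshold" view described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical and the RMSD values are made up, not taken from the benchmarking paper.

```python
def fraction_within(errors, threshold):
    """Return the fraction of entries whose absolute error is <= threshold."""
    errors = list(errors)
    if not errors:
        raise ValueError("need at least one error value")
    return sum(1 for e in errors if abs(e) <= threshold) / len(errors)


def coverage_curve(errors, thresholds):
    """Evaluate the fraction-within-threshold at each threshold in turn."""
    return [(t, fraction_within(errors, t)) for t in thresholds]


# Made-up RMSD values (in angstrom) for five molecules, for illustration only.
rmsd_values = [0.1, 0.25, 0.4, 0.8, 1.5]
curve = coverage_curve(rmsd_values, [0.3, 0.5, 1.0])

for threshold, fraction in curve:
    print(f"threshold {threshold}: {fraction:.0%} of molecules")
```

Plotting one such curve per force field gives the comparison described in the talk: a better force field reaches a higher fraction at every threshold.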
We have really creative people on the team and we've got a ton of project ideas, but we've got more creativity and ideas than we have time, so it's important that we keep all of our efforts aligned. This is Diego, not me. Diego is our new project manager, and he has final say on which projects we pursue and how Lily and I ultimately spend our time. Shortly after joining, Diego pointed out that if we don't know what it is we value, we're going to lose a ton of time being confused about what's our main objective as opposed to a side quest. He's going to talk more about this later. But once Diego had time to study our organization, he pointed out that we have a sort of hierarchy of values. Our highest value is product leadership, and for us this means accurate force fields, accessible infrastructure, and broad interoperability offerings for users. Our second highest objective is what's called proximity with users. This is things like custom solutions or features that apply to a subset of force field users but not all, and engaging in deliberate joint development with other simulation ecosystems to make shared features. Third, we want to achieve operational excellence, and for us that means improved transparency about our plans and improved predictability about the timelines and scope of the projects that we're going to undertake.
Today I'm actually going to go through these priorities backwards, from lowest to highest, because after me you'll be hearing from our science lead, who's entirely consumed with force field accuracy. She's got a lot of cool things to share, and at the end we'll switch over to her to hear about our force field developments. In everything that we do there's this constant pressure to add one more feature, or to take a passing suggestion and redirect the work that we're doing to do a bit more. But this comes at the cost of uncertain time estimates and miscommunications over goals.
After working through a bunch of projects, we found that when we have scope creep, or ambiguity about who gets to make decisions and when they happen, the cost ends up becoming very high. Here I'm showing some project plan pages from our public Confluence, where we have clearly defined roles that ensure that the relevant stakeholders stay informed, but that decisions can be made by the people who are most involved. For a few of our major development projects, we're using these sorts of pages to ensure transparency and predictability. This ensures that everybody understands the goals and that there's a concrete framework for changing them when new information emerges.
Even more so than our code, it's incredibly important that we keep our standards and specifications public and current. Here I'm showing the latest published version of the SMIRNOFF specification, which now has a dedicated home in the Open Force Field standards repository. This lets the specification be discussed and versioned independently of our software. So far in our efforts, our greatest successes have come from this pattern of specifying first and implementing second. Going specification-first means we wind up with fairly thoughtful feature design, and it really lowers the barriers for other people when they want to interoperate with Open Force Field software. But this is just the readable view; it isn't publicly writable. That's why we have a public source repository for the same standards and specifications. By maintaining a public standards repo, we ensure that we can communicate plans and potential changes to other maintainers adjacent to our ecosystem, and that we have a platform for their feedback. Early on, we added a few pressing SMIRNOFF enhancements that were brought up in the past year. Those would be things like clarifying ambiguities in the specification and some minor adjustments to facilitate interoperability.
On the left here I'm showing the issue tracker from the standards repository, with the enhancement proposals that have gone into the SMIRNOFF specification. I would really love it if other developers in the ecosystem could follow this repository and weigh in when we're making important decisions that might affect them.
Since we're still talking about operational excellence, here's some data from Ben Pritchard showing our cumulative total number of core hours used and cumulative total number of jobs completed on QCArchive. We labeled the Parsley and Sage releases on this graph so you can get a sense for how the QCArchive growth lines up with our project history. These numbers are going to be a bit noisy because some jobs are trickier than others, but this tells a really cool story about the scale that we're working at. Over on the right, I really want to recognize the QC submission team, because they're doing an amazing job of marshalling a ton of really heterogeneous compute resources. For many of those resources, a condition of use is that our jobs can be preempted by paying customers or higher priority users, so the team has been dealing with a ton of unique challenges, and they've overcome them. This team's really brilliant, and we're getting some really awesome data sets from this for fitting.
As a final point for operational excellence, I wanted to show what I'm going to call a subway diagram of where we are in terms of the infrastructure team's goals. It's really important to me that we more clearly communicate our plans, and I found it really interesting to make this because it helped us discuss longer term options and tradeoffs. The way to read this is that solid lines show tasks that are done, dashed lines are tasks that are in progress, and dotted lines aren't yet started. The colors indicate who's working on what, and when things are in gray, that means that nobody's been assigned to do them yet.
And this is a black line that shows where we are right now. From top to bottom, we have BespokeFit, which was released a couple of weeks ago for those of you who are watching. This is a black line because Simon and Josh Horton mostly worked on this. In green, we have Josh Mitchell's work, and this is the documentation track. The themes across all of our repositories have been standardized in our documentation, and we've released user guides for all of our major packages. Next on Josh's plate is centralizing documentation examples, preparing a series of example videos, and running hackathons.
Next is the toolkit track. In black, we have the completion of a refactor of our virtual site implementation, which I'll be talking about more later, and it's black because Simon did this before he moved on from Open Force Field. In blue, we have what's going to be our next major release, which is the toolkit biopolymer refactor, and you should expect to see this in the coming weeks. In red, this is David Dotson's work on our QC compute marshalling. Like I showed before, we have heavily automated QC computations. In the next few weeks QCArchive and MolSSI are going to be releasing a new version of QCFractal that's going to require some updates to our infrastructure, which David's working on. Like David Mobley showed at the beginning, we ran a season of benchmarking using a couple of custom-made tools. That's complete, but we do want to continue on benchmarking, especially making a more modular benchmarking framework that the science team can use, where we can add in new sorts of analysis to guide our force field development. In red, we have David Dotson working on interoperability with Open Free Energy. After that, he'll be working to complete our interface with Folding@home, and I'll talk more about that later. And finally, last but not least, in orange.
We have Matt Thompson doing an incredible amount of work to prepare Interchange. Interchange has already been integrated into the toolkit, and it will be a major part of our next toolkit biopolymer release. In the future we're going to work on getting our Amber and GROMACS exporters production ready and able to handle proteins and virtual sites. After that, we're going to work on importers for things like prmtops and serialized OpenMM systems. So that's what I have to say about our operational excellence.
Our next highest priority is proximity with users. To me, this means new feature development tailored to a subset of our force field users, or direct commitment of resources to engage with another simulation community. For the lucky people in this audience who haven't yet been woken up by my midnight Slack notifications, you'll be happy to know that BespokeFit has had its production release. This is a first in an operational model that we'd like to reuse in the future, where we have the science team writing some great code and functionality, and then we transition the maintenance of that to the infrastructure team so that the science team is unburdened and can continue forward with making more ambitious developments. The release of BespokeFit is thanks to a massive amount of effort from Josh Horton and Simon Boothroyd and a lot of polishing of the documentation by Josh Mitchell. BespokeFit is a way of running quick QM torsion scans of your molecule's degrees of freedom to generate extra accurate torsion parameters that replace the generic torsion parameters from a given force field. BespokeFit first fragments your input molecule into representative substructures, both to reduce computation time in the torsion scans and to avoid adding complexity from distant atoms or other degrees of freedom.
There are a number of options to reduce the computational cost of the subsequent QM calculations, but the most likely use case for BespokeFit would be to improve the parameter quality of a molecule before an even more expensive operation, like a free energy calculation. So for a lot of users, the cost of the QM here is going to be kind of negligible compared to the next steps. Simon and Josh Horton put a ton of thought into the design of BespokeFit. Instead of feeding in one molecule at a time and possibly doing a lot of redundant work if they fragment into identical fragments, you can feed in a ligand series from the beginning, and BespokeFit will identify the common core torsions and only run the minimum number of QM jobs needed to parameterize them.
So BespokeFit is really powerful. It's also a bit more complex than many of our other published tools, so I strongly recommend that people who want to get started with BespokeFit look at the quick start guide, which you can find by searching on GitHub, or you can grab the link from the slide once we send the slide deck out. This walks through a pretty neat custom use case where, to save time, instead of doing full DFT, BespokeFit can be set to use GFN2-xTB, which is a fast, open source, semi-empirical QC method, and you might find that to have a better cost-benefit ratio for your use cases than full-on QM. I think this is a really cool example of documentation from Open Force Field, and I'd recommend that you jump into these docs and sort of choose your own adventure. What we'll be doing is polling for follow-up workshops after this meeting. We'll have a poll open for about a week, and then we'll run some follow-up workshops in the coming months on topics of interest. BespokeFit can be a topic that we go into in more depth or help users customize. And since we're still talking about the user proximity goal.
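The idea of feeding in a whole ligand series so that shared fragments are only computed once can be sketched as simple deduplication. This is not BespokeFit's API or its actual fragmentation logic: the function name, the ligand names, and the fragment keys below are hypothetical placeholders, and real fragments would be identified by a canonical chemical representation.

```python
def plan_qm_jobs(ligand_fragments):
    """Map each unique fragment key to the ligands that need it.

    ligand_fragments: dict of ligand name -> list of fragment keys.
    One QM torsion scan is needed per unique key, however many ligands share it.
    """
    jobs = {}
    for ligand, fragments in ligand_fragments.items():
        for fragment in fragments:
            users = jobs.setdefault(fragment, [])
            if ligand not in users:
                users.append(ligand)
    return jobs


# Hypothetical ligand series sharing a common core (placeholder keys).
series = {
    "ligand_1": ["core_torsion_a", "core_torsion_b"],
    "ligand_2": ["core_torsion_a", "core_torsion_b", "tail_torsion_c"],
}

jobs = plan_qm_jobs(series)
naive_count = sum(len(fragments) for fragments in series.values())
print(f"{len(jobs)} QM jobs instead of {naive_count}")
```

Processing the ligands one at a time would repeat the two core scans; grouping by fragment key first means each is run once and its result reused.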
I did want to follow up on last year's benchmarking project. To review, this is a project where people could feed in a bunch of molecules and have RDKit generate conformers of them, then use Psi4 to run a quantum chemical optimization of the conformers that RDKit generated. Then we bring in a bunch of different MM force fields, and for each of these minimized QM conformers we see whether the force field keeps the molecule in that QM minimum, by looking at the geometry, and to what extent these MM force fields accurately predict the energy differences determined by QM between these conformers. This manuscript is nearly done, and a lot of you have probably been getting harassed to review the early state and approve it. We will of course be waking you up at midnight with a channel blast in the general channel when we've uploaded it to a preprint server.
What's cool about this coordinated benchmarking project is that it wasn't just about interfacing with users; it also gave us a ton of information about the weakest parts of our force fields. On the left, Lorenzo, who ran the study, identified some simple molecules where we're getting conformer energies wrong. Lorenzo used our toolkits to identify which specific parameters were involved in the degrees of freedom that ended up having large errors. So we're seeing parameters like t64, and the torsion parameter t74 is appearing a few times, and t75 keeps causing trouble. On the right, Lorenzo takes a more statistical approach. He looks at how often we're getting torsion angles wrong, so that's this measure of violation, and which parameters are assigned to the torsion when that happens. So again, our toolkit offers a bunch of custom ways to dig into what's going on with these force fields: on the left we see a case-by-case inspection, whereas on the right we have a larger, big-data-set approach. So this is great in the scope of our overall strategy.
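The statistical approach described above, tallying which parameters are assigned whenever a torsion angle deviates too far from QM, can be sketched as follows. This is an assumption-laden illustration rather than the analysis used in the study: the function names, the 30 degree threshold, and all angle values are made up, and only the parameter labels t74 and t75 come from the talk.

```python
from collections import Counter


def periodic_difference(angle_a, angle_b):
    """Smallest absolute difference between two angles, in degrees."""
    difference = abs(angle_a - angle_b) % 360.0
    return min(difference, 360.0 - difference)


def flag_parameters(records, threshold_degrees):
    """Count, per parameter ID, how often MM deviates from QM beyond a threshold.

    records: iterable of (parameter_id, qm_angle, mm_angle), angles in degrees.
    Returns (parameter_id, count) pairs, most frequent first.
    """
    counts = Counter()
    for parameter_id, qm_angle, mm_angle in records:
        if periodic_difference(qm_angle, mm_angle) > threshold_degrees:
            counts[parameter_id] += 1
    return counts.most_common()


# Made-up torsion records for illustration only.
records = [
    ("t74", 60.0, 95.0),     # 35 degrees off: flagged
    ("t74", 180.0, -170.0),  # only 10 degrees apart across the periodic boundary
    ("t75", -60.0, 20.0),    # 80 degrees off: flagged
    ("t74", 175.0, -150.0),  # 35 degrees off across the boundary: flagged
    ("t1", 60.0, 62.0),      # 2 degrees off: fine
]

flagged = flag_parameters(records, threshold_degrees=30.0)
print(flagged)
```

Handling the periodic wrap matters here: without it, a torsion at 180 degrees in QM and -170 degrees in MM would look like a 350 degree error rather than a 10 degree one.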
This is a nice situation where a single project serves both as a way to interface with our users and help them build familiarity with our tools, and as a way to provide quantitative data that can help the force field team. Here we could take these torsions that Lorenzo has identified and suggest targeted improvements; maybe we could split these parameters up based on their surrounding chemical context.
So now we're getting on to the main course. Let's talk about how the infrastructure team is making our force fields and toolkits more accurate, accessible, and interoperable. In this talk, I wanted to start each section by saying what we did since our last annual meeting. This year we've done a lot of small things: we've worked on the documentation, and we've tried to improve the user experience on a lot of fronts, so I can't really enumerate everything. So, when I should be talking about how we've improved the accessibility of our software, I think that the best thing I can do is report the success by proxy. The y axis on this chart is the number of stars on GitHub that a project has, and these are kind of like Facebook likes for developers. The x axis is the date, and again I've labeled where we've had our major force field releases, Parsley and Sage. Each different colored line is a different Open Force Field project repository, and I'm showing them for a bunch of our user-facing packages. I want to say that this tells us that we're doing right by our users and our downstream developers: we're investing in better documentation, we're keeping the API stable, and overall we're working on having a good user experience. The main force field toolkit is our oldest package, and you can see that it's been gaining adoption since 2018. I didn't even join the project, actually, until late 2018, so this was already well on its way before I came in.
But we have two major new projects that I expect to be rocketing up in the coming months and years. Those are BespokeFit in purple, which, as I said, just launched a few weeks ago and is already gaining a lot of traction, and Interchange in brown, which has had slow and steady growth, I think, but we're hoping that this is going to become the core of a lot of user workflows in the future.
Stars are easy to measure, but they aren't the be-all and end-all for us. One thing that actually really cheers me up when I'm having a bad day is to see how many other people are using our tools. This is a GitHub code search for the string "from openff", which is a string that would pop up in your Python script if you're using our toolkits. When I do a search of public code on GitHub, I get over 1200 results. A lot of these are going to be our own files, because we import our own stuff, and a lot of these are going to be bots and duplications and things like that. But I think this is still really cool. I don't know who these people are, and I don't know what their projects are doing, and that's a great sign that we have a project that scales out to a lot of users. This is also useful for me as a developer, because sometimes we're in a situation where we want to make a little change to the API or remove something, and we're pretty sure that nobody uses these small features. But we can do a search on GitHub for the feature that we want to change or remove, and sometimes we'll find that somebody is using it, and they're using it in a way that we didn't expect. That would be enough to convince us that we should just keep the feature there intact, or maybe that we should start issuing a deprecation warning that tells the user what commands to replace it with in the future.
I think a really big component of our user growth has been the result of some great focused work by Josh Mitchell. Josh is our specialist technical writer and scientific communicator.
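The deprecation-warning approach mentioned above can be sketched with Python's standard `warnings` module. This is a generic pattern under stated assumptions, not code from the Open Force Field toolkit: the decorator and the two helper functions are hypothetical names invented for the example.

```python
import functools
import warnings


def deprecated(replacement):
    """Decorator that warns callers and names the function to use instead."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and may be removed in a "
                f"future release; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's line
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def new_helper(value):
    """Hypothetical replacement function."""
    return value * 2


@deprecated("new_helper")
def old_helper(value):
    """Hypothetical old entry point, kept working while users migrate."""
    return new_helper(value)


# Existing scripts keep working, but now see a message telling them what to change.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_helper(21)

print(result, caught[0].message)
```

The old call keeps returning the same result, so downstream scripts found through a code search do not break, while the warning text tells their authors which call to switch to.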
He's a computational chemist who's absolutely obsessed with prose and page layouts, and he's been doing some great work all over our repositories and documentation. What's really helpful is that he also brings a fresh pair of eyes and can see the things that we've gotten accustomed to overlooking. For example, he realized that when users go look at the documentation for one project, there are very rarely references to the existence of other Open Force Field projects. So these users will come in for help with a certain task, but they don't end up discovering that we have other tools that can help them complete their whole workflow. Josh saw this problem that we've had for a long time, and one day he just made this central documentation page; it's docs.openforcefield.org. This page is really simple. It just lists all of our public-facing, production-ready software, links to their documentation, and provides short descriptions for all of them. You know, I think when you're standing so close to a problem you start forgetting that it's there, and Josh can come in and just fix stuff like this, which I think is really great.
Another example of a complete oversight by us is that one day Josh opened this pull request called "molecular gastronomy", and gastronomy is this obscure word that basically means cooking. Josh had added a "cookbook" to the Open Force Field toolkit documentation, and it just lists every single way that you could make a molecule. I realized that when I talk to new users, or when I try to convince somebody to use the Open Force Field toolkit, I always have to start by explaining what an SDF is and what all the ways are that they could make their molecule. And it's like, oh yeah, wait, we could have just made a page and listed all the ways to make a molecule. That's, you know, the big barrier that probably deflects a lot of our early users. So Josh does all sorts of great stuff like this.
He did a really good job of standardizing the themes of our documentation. So now when you go to all of our different repositories, things are very consistent. We've got that cool logo. I love this logo, and we're using the right shade of blue so it matches up with the rest of our branding. I'd like to walk through a lot of specific stuff that Josh did, but it's just that everything got easier. On the left is the old form of our documentation, and on the right is the new. When I interface with these pages, I just find them more pleasant to look at and faster to navigate. So this is supposed to be the end of me talking about Josh's work, but you're actually going to see a lot more of it, because I've stolen a ton of his figures for the rest of this talk. Okay, so that was accessibility. Let's talk about interoperability. First, I need to apologize to everyone: we are going to be switching our units package in the next Open Force Field Toolkit release. The units package formerly known as simtk.unit, which is now known as openmm.unit, isn't really the best option for us moving forward. So instead, we're switching over to a solution based on Pint. Pint is a really broadly supported units package that's used all over the place, including in MolSSI software like QCArchive. Our big advantages here are that we're going to have tighter control over the serialization of our data, and that we'll be able to swap in different CODATA references as needed, among other advantages. We do know that a lot of you have scripts that need to talk to an interface that expects OpenMM units or that emits OpenMM units. So we've made sure that we have these very straightforward converters: we have to_openmm, and we have from_openmm. So it should be really easy to update your existing scripts to work with the new toolkit.
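The two converters mentioned above can be sketched roughly as follows. This is a minimal illustration, assuming the `openff-units` package and OpenMM are installed. The helper name `length_roundtrip` is hypothetical, and the imports sit inside the function so the snippet can be loaded even where those packages are missing.

```python
def length_roundtrip(value_nm=1.5):
    """Convert a Pint-backed OpenFF quantity to OpenMM units and back.

    Hypothetical helper for illustration only; requires openff-units and OpenMM.
    """
    # Imported lazily so that merely defining this function needs no packages.
    from openff.units import unit
    from openff.units.openmm import from_openmm, to_openmm

    off_length = value_nm * unit.nanometer  # Pint-backed quantity
    omm_length = to_openmm(off_length)      # OpenMM quantity, for OpenMM-facing code
    back_again = from_openmm(omm_length)    # Pint-backed quantity again
    return off_length, omm_length, back_again
```

A script that currently hands OpenMM quantities to the toolkit would, under these assumptions, wrap them with `from_openmm` on the way in and use `to_openmm` on anything passed back to OpenMM.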
And if it would help to see these in action, we're going to be updating all of our examples for the next toolkit release, so you can see exactly how these converters can be used. The next thing we're doing for interoperability is that we're about to have Interchange fully integrated into the Open Force Field Toolkit. This has been the result of a really huge amount of work by Matt Thompson, who did the initial development and the validation to show that we're getting the correct outputs. In implementing this and validating it, he looked so thoroughly that we found places where the original toolkit code was doing the wrong things. It was like fishing with dynamite, and he found a bunch of bugs out in the ocean. So, yeah, we found existing toolkit bugs just by doing large enough searches, and he found areas of the SMIRNOFF specification that, when you looked at them, were actually incomplete. So he spearheaded a lot of both development and specification work to shore these up. Here is a workflow that's sort of the main use of Interchange for most users. In this case, we're loading a prototype of a protein force field. This is ff14SB, ported from Amber into SMIRNOFF format. And we've got a topology that already exists. Now, instead of running ForceField.create_openmm_system, we're running ForceField.create_interchange. The object that comes out is an Interchange: it knows what kinds of potentials it contains and how many atoms. With Interchange you can go in and query parameters directly, so if you wanted to change some bond parameters or see what's assigned where, all of the information is in there, and it's accessible through the Interchange API. But what's great is that most users don't care. You can still go right to OpenMM using the Interchange.to_openmm function, and you'll get out the same OpenMM System that you would have by running ForceField.create_openmm_system.
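The two routes described above can be sketched side by side. This is a hedged illustration rather than the slide's code: it assumes a toolkit release that includes `ForceField.create_interchange` (upcoming at the time of the talk), and it swaps in an ethanol molecule with the Sage file `openff-2.0.0.offxml` in place of the protein prototype shown in the talk. The function name `systems_both_ways` is hypothetical.

```python
def systems_both_ways(smiles="CCO", offxml="openff-2.0.0.offxml"):
    """Build an OpenMM System directly and via Interchange, for comparison.

    Hypothetical helper; requires the OpenFF Toolkit, OpenMM, and a
    partial-charge backend (for example AmberTools) to be installed.
    """
    # Imported lazily so that merely defining this function needs no packages.
    from openff.toolkit.topology import Molecule
    from openff.toolkit.typing.engines.smirnoff import ForceField

    topology = Molecule.from_smiles(smiles).to_topology()
    force_field = ForceField(offxml)

    # Existing route: force field + topology -> OpenMM System in one call.
    direct_system = force_field.create_openmm_system(topology)

    # Interchange route: keep the intermediate object, then export it.
    interchange = force_field.create_interchange(topology)
    exported_system = interchange.to_openmm()

    return direct_system, exported_system
```

Keeping the intermediate `interchange` object around is what allows parameters to be inspected or modified before export, while users who only want an OpenMM System can ignore it.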
So to drive this point home: for just about all of our users, your existing scripts are going to have a call to ForceField.create_openmm_system where you pass in a chemical topology. Now we have the option of running ForceField.create_interchange with that same topology, and then running Interchange.to_openmm. So this is a meme from The Office: corporate needs you to find the difference between this picture and that picture. They're the same picture. And I mean that in more than a "you'll get the same results" sort of way. I mean that now, in both cases, the same code will be run. So you can trust Interchange to be a core part of your workflow once we get it into the next release. Okay, so a bigger picture. What is Interchange? Interchange is a lot of things to a lot of people. In a workflow, Interchange is the thing that you have after you've applied a force field to a molecule, but before you've exported that to a specific simulation format. You can do all sorts of manipulations at this stage. For example, you can combine Interchanges to add in components that came out of different workflows. For force field developers, Interchange is a way to reach into a molecule that's already had parameters assigned and modify the physics, and that's going to be really helpful for our fitting team. For other people, Interchange is going to be our replacement for ParmEd's conversion functionality, kind of like what you see on the slide. Some workflows that users want to use are only available starting from AMBER and GROMACS format files, and workflow makers will need a way to get their parameterized components into those formats. That's what we hope to offer them with Interchange. At this moment, we're only fully endorsing the OpenMM export route. Like I said, we've done a lot of validation on this, and we've proven that it's equivalent to the existing toolkit behavior.
But we also have partially featured AMBER and GROMACS exporters in place, and we'll be updating you as we develop and validate these. One thing that's not actually shown in this picture, but is really important nonetheless, is for machine learning people who want an entire simulation system defined as numerical arrays. There are API points in Interchange that export to vectorized representations. But there isn't really a single standard destination format for how you'd represent a simulation system in a machine learning library, so you may need to access the lower-level API to make your custom exporter, though we do have some existing exporters that may work for you right out of the box. A question that we get a lot about Interchange is, well, when can I fully replace ParmEd in my workflow? The answer is that it depends on exactly what you're using ParmEd for. So this is a work breakdown structure that Diego helped us make. We've been making these for a few projects in Open Force Field, but they're super technical, so we're not going to show any other ones today. This is the WBS, the work breakdown structure, for the upcoming Interchange releases. On the top right is our color guide for the priority of each item: blue is top priority, and then green, and then yellow, and then purple. You can read these colors as sort of corresponding to major upcoming Interchange releases. In our work breakdown, we've got these branches for the major simulation engines: we've got AMBER, GROMACS, and OpenMM. Under each one, we have a split between export functionality and import functionality. What's important to see here is that the exporters are all in the top three tiers of priority: they're blue, green, and yellow. But the importers are all in the bottom purple tier. This is because during our early design work, it became clear what imports involve, like making an Interchange from a prmtop or from a serialized OpenMM System.
That's going to be way harder than exporting. And so, although both directions are valuable, we're going to get fully featured exporters working before we move on to making importers. Now, here's what that means for you if you're using ParmEd right now to convert from OpenMM Systems made by the Open Force Field Toolkit to an on-disk format. If you're only doing that for small molecules, we call that vanilla, and you can go ahead and start trying to plug in Interchange right now. We're fairly confident that you're going to get the right numbers and that there are only a few more bugs to shake out. But we would call proteins non-vanilla, because we need to do some more validation of how we're handling hierarchy information like residue names and residue numbers. In order to support these formats, we need to resolve issues where residue definitions can be tied to the physical parameters. That's going to be complex for us, because SMIRNOFF parameter application rules don't know about residues. So we're going to try to get residue handling in as soon as we can, so that you can export the systems made by our next generation of force fields. We're in a similar situation with virtual sites. We're going to get these in as soon as possible, and we're going to prioritize these higher if it becomes clear that our Rosemary force field will have virtual sites in it. Right now that's a maybe and not a must, so if that becomes certain, the priority of the virtual site exporters will go up. But the big thing that I want to get across is this message at the bottom: if you're using ParmEd in a workflow to load components from files, so if you're loading a prmtop or a serialized OpenMM System and subsequently combining them, then it's going to be a while before you can use Interchange to replace that. That's going to require importers, and I would say that those almost certainly will not be supported in 2022.
Okay, so that's an update on Interchange. Now I want to get into force field accuracy topics. We have the possibility of using virtual sites in Rosemary or a subsequent force field release. We worked on an initial implementation of virtual sites in the toolkit a few years ago. But what we found is that the further we got into development, the more complexity we had to add to handle some really tricky edge cases. These edge cases were technically allowed in the spec, but they pretty much indicated that the user was doing something wrong. We tried hard to accommodate all of these edge cases, and the complexity of this initial implementation ended up being really severe, and it made it susceptible to a lot of bugs. So when Simon, our previous science lead, went to work on virtual sites, he kind of took a machete to this complexity. He proposed an update to the SMIRNOFF specification, and he added a new implementation in the toolkit that just ruled that the weird edge cases are unsupported. Simon did a great job with this refactor. It adopts simpler behavior and it doesn't allow ambiguous uses. Even better, this refactor was accompanied by some really stringent tests for parameter assignment and geometry, and that will help us keep the complexity in check in the future. Here we're showing a couple of different kinds of virtual sites that can be applied. Lily will talk more about the kinds of virtual sites that are likely to go into our initial force field, but this is sort of a proof that we're doing it right: the geometry looks correct, and these are correctly interacting off-site interaction centers. Now, as we're preparing to make a protein force field, there are a few new pieces of infrastructure that we'll need to add to the fitting pipeline. In particular, I'm pretty sure that a lot of our users are here because they want to know if our force fields are going to perform well in predicting binding free energies.
So, John Chodera was gracious enough to make us an introduction to the Folding@home team, and by a stroke of luck, the Open Free Energy consortium got started right at the same time. So we started working together on making the infrastructure to regularly run protein-ligand binding free energy calculations on Folding@home. Here is an early working draft of an object diagram involving a mix of existing Open Force Field components and planned Open Free Energy components. A lot of these either have a proof-of-concept implementation or are fully featured. What's happening is that we're working with Open Free Energy to define the core object models for their infrastructure. We're preparing and formatting some protein-ligand data sets that we can use as a standard benchmark for force fields. And we're ensuring that our infrastructure will be highly interoperable with the Open Free Energy toolkits and workflows. This is very early stage, and that's all I'm going to say about it today. We'll have more to talk about next year or in the coming months. But I do want to really encourage you to tune into the Open Free Energy team to hear more about what they have coming. Okay. Now, the final thing I want to talk about today is adding protein support to the Open Force Field Toolkit. By "supporting proteins," I mean a few things. For one, I mean efficiently assigning parameters to very large molecules using chemical perception, which is possible, but it's required a bit of refactoring in the toolkit. The next thing that it means is adding data fields and logic for handling hierarchy information, like residue names and chains, to our core object models. But most of all, it required doing the one thing that we keep telling you that you cannot do: making an Open Force Field molecule from a PDB.
What we've done is we've made this new API point called Molecule.from_polymer_pdb. Assuming that your input PDB has explicit hydrogens, is only made of canonical amino acids in reasonable protonation states, and has correct atom names, we can load that into an Open Force Field Toolkit molecule. Okay, well, wait, why can't I just load molecules from PDB? RDKit can do it, right? Here's arginine with main chain caps. It looks like things are going well: we've identified a formal charge on the side chain, and we've got some double bonds here. What's wrong? You'd think that everybody wins. But very often with PDB files in the wild, you run into trouble. Here we try to load a capped histidine protonated at the delta nitrogen, and it wasn't able to recognize any of the bond orders. Something goes wrong with this input, and this is a pretty realistic input. I believe this came out of AmberTools. So we have to be a little bit strict in how we handle PDB loading, because we don't want to have fail-deadly cases like this, where we automatically try to add some information, but we do it wrong and we have no way of warning the user that that happened. The fundamental issue is that a PDB file doesn't contain all of the connection table information that we need. Instead, it's defined that certain atoms with certain names in certain residues should match up with an authoritative template for that residue, and that template will fill in the bond orders and formal charges. That authoritative reference is the RCSB Chemical Component Dictionary. So in conjunction with OpenMM's very strict PDB reader, we've put together functionality in the Open Force Field Toolkit that parses the Chemical Component Dictionary to load the connection table information that we need. So to us, a protein is just a really large Kekulé structure suitable for SMIRNOFF parameter assignment.
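The template idea described above can be sketched in plain Python. This is a toy illustration only, not the toolkit's implementation: the two templates are hypothetical stand-ins written by hand rather than parsed from the Chemical Component Dictionary, and the names `TEMPLATES`, `TemplateMismatchError`, and `connection_table` are made up for this example. The point is the strictness: atom names must match the template exactly, otherwise loading fails loudly instead of guessing.

```python
# Hand-written toy templates keyed by residue name (hypothetical stand-ins,
# not real dictionary entries). Each maps atom names to formal charges and
# lists bonds as (atom name, atom name, bond order).
TEMPLATES = {
    "HOH": {
        "atoms": {"O": 0, "H1": 0, "H2": 0},
        "bonds": [("O", "H1", 1), ("O", "H2", 1)],
    },
    "TOY": {  # formaldehyde-like toy residue with one double bond
        "atoms": {"C": 0, "O": 0, "H1": 0, "H2": 0},
        "bonds": [("C", "O", 2), ("C", "H1", 1), ("C", "H2", 1)],
    },
}


class TemplateMismatchError(ValueError):
    """Raised when a residue cannot be matched to a template exactly."""


def connection_table(residue_name, atom_names):
    """Return bond orders and formal charges for a residue, or fail loudly."""
    template = TEMPLATES.get(residue_name)
    if template is None:
        raise TemplateMismatchError(f"no template for residue {residue_name!r}")

    expected = set(template["atoms"])
    found = set(atom_names)
    if len(found) != len(atom_names) or found != expected:
        # Refuse to guess: missing hydrogens or odd names are reported.
        raise TemplateMismatchError(
            f"{residue_name}: missing {sorted(expected - found)}, "
            f"unexpected {sorted(found - expected)}"
        )

    return {
        "formal_charges": dict(template["atoms"]),
        "bonds": list(template["bonds"]),
    }
```

For example, `connection_table("TOY", ["C", "O", "H1", "H2"])` returns the double bond from the template, while the same call with a hydrogen missing raises `TemplateMismatchError` rather than silently inferring bond orders.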
And while you might say, wow, that's a lot of work for something that RDKit could kind of do already, what's even more exciting is that building this infrastructure for proteins opens the door for us to load and parameterize user-defined polymers in the future. So I want to take a quick detour and show you what a loaded protein looks like in the Open Force Field Toolkit. In the first box here, I'm showing our residue iterator. I have this philosophy that as you move away from the PDB specification and towards anybody holding a force field, the odds that you can find two people who agree on what a residue is become astronomically low. So we expose these default iterators for residues and chains. They're very lightweight, and you can customize them if you like. They're really just there for your convenience. In the Open Force Field Toolkit, for parameter assignment we still only look at the graph, so these residues don't affect parameter assignment. Residue objects also have a little API where you can query their atoms and a few other things, and we're hoping that this will make it more convenient to deal with proteins in the toolkit. We've also done significant work to build in logic for conversions and round trips, so that hierarchy information will survive going to other formats. In the first box, we are sending an Open Force Field molecule and topology over to OpenMM, and we see that OpenMM does recognize that there's one chain, some number of residues, and the atoms and bonds. If we go further into that OpenMM topology and ask about a specific atom, that atom knows about its residue information. We have a similar thing here: instead of OpenMM, we've sent that protein to an RDKit Mol, and we can query the atoms of that RDKit Mol and see that they know their residue information.
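The idea of a lightweight, customizable residue iterator can be sketched in plain Python as a grouping over per-atom metadata. This is a toy illustration under stated assumptions, not the toolkit's API: the atom dictionaries, the metadata keys, and the function `iter_groups` are hypothetical. It shows why such iterators are a convenience view only, since nothing here touches the molecular graph used for parameter assignment.

```python
from itertools import groupby

# Toy atoms for a two-residue fragment; only the metadata matters here.
ATOMS = [
    {"name": "N", "metadata": {"chain_id": "A", "residue_number": 1, "residue_name": "ALA"}},
    {"name": "CA", "metadata": {"chain_id": "A", "residue_number": 1, "residue_name": "ALA"}},
    {"name": "C", "metadata": {"chain_id": "A", "residue_number": 1, "residue_name": "ALA"}},
    {"name": "N", "metadata": {"chain_id": "A", "residue_number": 2, "residue_name": "CYS"}},
    {"name": "CA", "metadata": {"chain_id": "A", "residue_number": 2, "residue_name": "CYS"}},
    {"name": "SG", "metadata": {"chain_id": "A", "residue_number": 2, "residue_name": "CYS"}},
]


def iter_groups(atoms, keys=("chain_id", "residue_number", "residue_name")):
    """Yield (identifier, atoms) for runs of consecutive atoms sharing metadata.

    Changing `keys` changes what counts as a group, e.g. ("chain_id",)
    iterates over chains instead of residues.
    """
    def identifier(atom):
        return tuple(atom["metadata"][key] for key in keys)

    for group_id, members in groupby(atoms, key=identifier):
        yield group_id, list(members)


if __name__ == "__main__":
    for group_id, members in iter_groups(ATOMS):
        print(group_id, [atom["name"] for atom in members])
```

Because the grouping is derived from metadata on demand, two users who disagree about what a residue is can each pass their own `keys` without affecting anything else.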
So this functionality is really exciting, and we're just finishing off the final stages before we put this in a production release. You can try out the pre-alpha now, and we expect to begin a one-month RC period in the next week or two, and then to have the final release. Okay, so at this point you've sat through a boring rant about PDBs. Maybe some of you people are weird and you like that. And then you looked at a bunch of pictures of Jupyter notebooks, so I'd feel wrong if I let you go home empty-handed, thinking that that's all that we had for PDB functionality. As part of our alpha testing for the next toolkit release, I asked our scientific communicator Josh, who's been laboring away thanklessly for months on documentation prose and page margins and color shades, to double check that our infrastructure would eventually be able to handle non-standard proteins. What I'm going to show you is very pre-production. But if you'd like to learn more, we could run a follow-up workshop on this topic, and we'll make sure that you can vote for it in the coming poll. Okay, so here's an alpha helix on the left. This is five alanines, a cysteine, and then five more alanines. And here's a dye, and this is called fluorescein-5-maleimide. It's got this bond in the red circle, and this bond really likes to stick to cysteines. With some connection table manipulation, we can combine these to make the product of their coupling, shown here on the right. So this central cysteine in the alpha helix has gained a bond to the dye. Well, it would be really neat if we could simulate this. But if I look from the perspective of any single force field, a normal protein force field would look and say, no thanks, I can't deal with this messy modified amino acid in the middle, I can't recognize that thing.
And if I look from the perspective of just a small molecule force field, it's going to say, well, yeah, sure, I suppose I could handle that, I'll put some numbers on these alanines and these backbone torsions, but it's not going to be pretty. So if only there was some way to let the protein force field handle the stuff that it knows, and then delegate anything that it can't cover to the small molecule force field. An interesting thing about these SMIRNOFF format force fields we keep talking about is that they act directly on Kekulé structures, so they don't care about atom types or residues. And if you read the fine print, you can actually add two SMIRNOFF force fields together. When you do that, you put the parameters that you want to be generic in the first force field, and the parameters that you want to be specific, which will override the generic parameters, in the second force field. So we could, in theory, load a small molecule force field like Sage and then append a protein force field, and the molecules that this combined force field parameterizes would get protein parameters for the recognized residues and small molecule parameters for the unrecognized residues. Like I mentioned before, it just so happens that we'd worked with Dave Cerutti a while ago to make a SMIRNOFF format port of Amber's ff14SB, where the parameters were based on SMIRKS patterns that looked for entire amino acids. We can make a single force field by joining Sage to this port of Amber's ff14SB, and it basically just worked. On the left, I'm showing all the bonds that were assigned to this modified protein of interest, and which force field they came from. All of the bonds and angles and torsions and van der Waals terms were successfully assigned from either the small molecule or the protein force field. The protein-looking parts, the alanines and the backbone, got protein parameters; those are in yellow for Amber.
And the unrecognized parts got Sage parameters; those are in green. In this case, it's actually kind of interesting to see that the cysteine got protein parameters, even though it's part of a larger modified residue, because the substructure happens to look exactly like CYX, the disulfide-bonded cysteine residue. And all of the dye and the rest of the modified amino acid got valence parameters from Sage. Now, the partial charges were a bit tricky, because we didn't have a library charge for the modified amino acid. So over here on the right, we had to go and do a separate step where we made a library charge by splicing out the modified residue, capping it and charging it using AM1-BCC, and then turning that into a new library charge in this combined force field. And here it is simulating: the things that seem like they should be planar stay planar, and nothing unfolds. So things don't look too bad at all. We would love to run a force field workshop on hackery like this, so feel free to vote for this in follow-up polls. I'll say at this point that we wouldn't recommend that you actually use this workflow for publishing or anything. Sage and ff14SB have never met, so combining them is maybe a little bit scientifically dubious. Not to mention that our ff14SB port is a little bit unwieldy. But the force field combinations that we did here are very similar to how we're going to be making the Rosemary force field. We're starting to use parameter overriding to titrate a small number of protein-specific parameters into a small molecule force field like Sage, to wind up with a force field that captures the most important aspects of protein structure while also remaining well suited for small molecules. So, on this high note, I'm going to turn the talk over to Lily to talk about the details of exactly how we're going to be making Rosemary and what else is on the roadmap for the future.
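The parameter overriding described above, where generic parameters are listed first and specific ones later so that the last match wins, can be sketched in plain Python. This is a toy model under stated assumptions: real SMIRNOFF force fields match SMIRKS patterns against the molecular graph, whereas here the "patterns" are simple hypothetical predicates on a bond description, and all parameter ids and values are invented for illustration.

```python
def bond(elements, residue=None):
    """Describe a bond by its two elements and the recognized residue it sits in, if any."""
    return {"elements": frozenset(elements), "residue": residue}


# Generic small-molecule parameters come first (hypothetical ids and values).
SMALL_MOLECULE = [
    {"id": "sm-any", "source": "small molecule", "length": 1.50,
     "matches": lambda b: True},
    {"id": "sm-C-S", "source": "small molecule", "length": 1.81,
     "matches": lambda b: b["elements"] == frozenset({"C", "S"})},
]

# Protein-specific parameters come second so they can override the generic ones.
PROTEIN = [
    {"id": "prot-ALA-C-N", "source": "protein", "length": 1.46,
     "matches": lambda b: b["residue"] == "ALA"
     and b["elements"] == frozenset({"C", "N"})},
]


def combine(generic, specific):
    """Append the specific parameter list after the generic one."""
    return list(generic) + list(specific)


def assign(bond_description, parameters):
    """Return the last parameter in the list that matches (last match wins)."""
    chosen = None
    for parameter in parameters:
        if parameter["matches"](bond_description):
            chosen = parameter  # later matches overwrite earlier ones
    if chosen is None:
        raise LookupError("no parameter matches this bond")
    return chosen
```

With `combined = combine(SMALL_MOLECULE, PROTEIN)`, a backbone C-N bond inside a recognized alanine is assigned the protein parameter, while a C-S bond in an unrecognized dye falls back to the small-molecule parameter, which mirrors the yellow and green coloring described in the talk.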
Um, so hi everyone. I'm Lily. I just joined Open Force Field at the start of this month as the science lead. Today I'd just like to show some of the amazing work that the science team has done over the past year, what we're currently working on, and some exciting future developments to come. That is, if I can change this slide. Oh, you may need to click in the window to get it. Yeah. No, I am doing that. Let me cut off the share and try again. Okay, could you request once more? Oh, great. Um, cool. So, at the last meeting, Simon introduced the development that went into the Sage force field, which was released at the end of August. As David and Jeff have both introduced, our next big goal is the release of a biopolymer force field, Rosemary. Rosemary will be the first OpenFF force field to incorporate parameters that support proteins and other biopolymers alongside small molecules. Jeff, I still need to be able to change slides. Oh, I'm sorry. Let me restart the share and then we can take one more try. Okay, can you request once more? Is that me? Yep, that's you. Okay, fantastic. Thank you. Yeah, sorry about that. Cool. Yeah, so the next big goal is Rosemary, which has been really a massive project. It's been led and driven by Chapin Cavender, but honestly, many people across the entire organization have contributed to the science and infrastructure and all the other needs of building our protein force field. For example, as has been shown by Jeff, a crucial part of this project is an efficient, intuitive, fully featured interface for working with proteins in the OpenFF Toolkit. To build this, OpenFF scoped out all the necessary features by listening to and working with stakeholders in the community. The infrastructure team, particularly Jeff and Iván, spent a significant part of the last year on this really incredible refactor and extension of the toolkit. And the result is the impressive support that Jeff just showed.
And the additional software features are really just one part of the Rosemary project. With Rosemary, we are focusing on adding several new biopolymer-specific parameters, such as the backbone torsions in a protein. To fit these new parameters, we've really been able to capitalize on the existing quantum chemistry pipelines laid out before to systematically generate testing and training data, such as optimized geometries and torsion profiles. This itself has been kind of a back-and-forth process, where as we generate this QM data, we use it to guide us in selecting which parameters to optimize, and where we might need to generate more data to look more into the science. For example, one question we've had to ask is how specifically a single torsion parameter should apply. With the SMIRNOFF format, we can choose to model all protein backbone torsions with, say, one parameter, or to have different unique torsions depending on the side chain involved, or possibly even the rotameric form of the side chain. To dig into this, Chapin computed torsion drives of several capped peptides. He looked particularly at one rotamer of alanine, two rotamers of proline, and two rotamers of tryptophan, to see how different the energy profiles were between them. The side chains were constrained to the particular rotameric forms, and the QM energy was calculated at various phi angles on the x axis and psi angles on the y axis. He also labeled the secondary structure motifs on this plot. With alanine, the torsion drive pretty much looks as we expect, with minima at the gauche conformations. When he looks at the most populated rotamer of tryptophan, it actually does look quite similar to alanine, except for this additional minimum here at about psi equals zero, which is likely driven by the side chain. Looking at the second rotamer of tryptophan, however, we see quite a dramatic difference.
For example, this minimum at psi equals zero has flipped to about 180 degrees here. And the differences are particularly highlighted if you plot the profiles of the energy differences. The difference between the tryptophan rotamers on top is much more dramatic in scale than between the most populated rotamers of tryptophan and alanine. Ultimately though, while looking at these QM energy profiles is useful, what OpenFF actually cares about is whether we can follow and reproduce the shape of this energy profile with MM parameters, specifically with some number of torsion parameters. So it's probably more useful to compare the different rotamers looking at a few different quantities: the differences in RMSE between the energy profiles of the QM energy, of the MM energy as computed by the Sage force field, and of the MM target. The MM target is the difference between the QM and the MM energies, but without the torsion parameters involved. So the target is basically the energy that we would use to fit a torsion parameter. Interestingly, if we look at just the differences between rotamers of the same side chain, and also between alanine and tryptophan, we see that the RMSE is actually quite low between the MM targets, which implies that they actually wind up being fairly similar. Once you compare either rotamer of proline with any other side chain, however, the differences increase dramatically. So, given the relatively small differences between rotamers but the large differences between proline and every other side chain, the initial plan for Rosemary is to train backbone torsions individually for each side chain, and to generate some rotameric data for validation, but to treat the possibility of fitting independent torsions for each rotamer as a goal for a future release. This QM data is still being generated. Once that's completed, Rosemary will move on to the fitting stage.
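The comparison described above can be sketched numerically: the MM target is the QM energy minus the MM energy evaluated without the torsion term, and two profiles are compared by RMSE. This is a simplified sketch under stated assumptions. It uses a one-dimensional angle scan rather than the two-dimensional phi/psi scans from the talk, every profile is shifted so its minimum is zero before comparison (an assumed convention), and the energy profiles are synthetic cosine functions chosen only to make the arithmetic visible, not real data.

```python
import math

ANGLES = [math.radians(a) for a in range(-180, 180, 30)]


def relative(profile):
    """Shift a profile so that its minimum is zero."""
    lowest = min(profile)
    return [energy - lowest for energy in profile]


def mm_target(qm, mm_without_torsion):
    """Residual a torsion term would need to reproduce: QM minus torsion-free MM."""
    return relative([q - m for q, m in zip(qm, mm_without_torsion)])


def rmse(profile_a, profile_b):
    """Root-mean-square error between two profiles on the same grid."""
    squared = [(a - b) ** 2 for a, b in zip(profile_a, profile_b)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic example: two "rotamers" share the same intrinsic torsion term but
# feel different non-torsion (e.g. steric) contributions.
torsion = [1.0 + math.cos(3 * angle) for angle in ANGLES]
other_a = [0.5 * math.cos(angle) for angle in ANGLES]
other_b = [2.0 * math.cos(angle) for angle in ANGLES]

qm_a = [t + o for t, o in zip(torsion, other_a)]
qm_b = [t + o for t, o in zip(torsion, other_b)]

qm_rmse = rmse(relative(qm_a), relative(qm_b))
target_rmse = rmse(mm_target(qm_a, other_a), mm_target(qm_b, other_b))
```

In this constructed case the raw QM profiles differ noticeably, while the MM targets nearly coincide because the torsion-free MM energy absorbs the difference. That is the situation in which one shared torsion parameter would be sufficient.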
And here the currently existing small molecule parameters and the new protein parameters will be fit at the same time, so that the entire force field is self-consistent. That brings us to the final but very essential part of the Rosemary project, the benchmarking stage. Chapin has been leading a huge effort to curate the experimental data sets for evaluating protein force fields. Based on discussions with domain experts in the theoretical and experimental community, we've decided for now to focus on NMR observables such as chemical shifts and scalar couplings. And as Jeff said earlier, we're also working with Open Free Energy to automate protein-ligand binding free energy calculations, which are being designed to run on Folding@home. Since Rosemary will be a single force field that can handle both proteins and small molecule ligands, this will provide a means for routine benchmarking against protein-ligand data sets. This will give us confidence that our parameters can provide accurate results and can also be used for a diversity of targets. Besides fitting protein parameters, the team at Open Force Field has been working on many other projects, exploring the many other aspects of force field fitting. The parameters of a force field can be broken down into three groups, which have actually aligned with the major focus of each force field release so far: the valence parameters, which were refit in Parsley, and the van der Waals parameters, which were refit in Sage along with another valence refit. And now we're looking at the electrostatics. We're currently working on two major projects that may or may not make it into Rosemary, depending on how confident we are about the quality, but that we think will be hugely valuable going forward. The first is moving away from QM-based methods for electrostatics, towards using machine learning models such as graph convolutional networks for predicting partial charges.
So graph charges, and graph models as a whole, are very appealing for a couple of reasons. Firstly, especially in light of the forthcoming release of Rosemary, they are likely to scale much better to larger molecules than QM methods. It would be very useful to be able to apply an efficient, unified charge model to both macromolecules and small molecules. The AM1-BCC method that we currently use becomes quite slow for molecules larger than 150 atoms, and it's really prohibitively expensive for anything on the scale of proteins. Secondly, a known flaw of QM-based methods and AM1-BCC is that they give us charges that are conformer dependent. So the same molecule can get different charges if different conformers are generated, and this does affect simulation results and makes them a bit less reproducible. With a properly trained graph network model, we may be able to support generating partial charges that are the same quality as AM1-BCC. We could also remove this conformer dependence, and it could be much faster. Over the past year, Simon and the infrastructure team have put together a few software tools, both for training graph models and for applying them with the OpenFF Toolkit. We're now looking at answering scientific questions, such as what would be the best features to incorporate initially, and selecting relevant hyperparameters. It's quite exciting, because graph models as a whole just show a lot of promise with force fields. The espaloma project and package, spearheaded by Yuanqing Wang, has been able to learn and predict not just partial charges but also valence parameters simultaneously. And it's been able to produce pretty self-consistent biopolymer and small molecule force field parameters that perform quite well on test protein-ligand systems.
And really excitingly, espaloma has been able to predict AM1-BCC partial charges with lower error than the differences between the toolkit backends used by Open Force Field. So that's really encouraging. And it might mean that we should aim for a similar charge model, given all the other advantages of this approach, such as scalability and giving us conformer-independent charges. The second big electrostatics project I'd like to talk about is adding off-site charges with virtual sites. So it's quite difficult for the classical atom-centered fixed-charge model to accurately represent parts of the electrostatic surface around a molecule. For example, this methyl bromide has a sigma hole here on the electrostatic surface, at HF/6-31G* that is. If, however, you project the surface generated by atom-centered AM1-BCC charges, like those in Sage, the sigma hole is missing. This and similar issues can result in clear and systematic errors in simulation properties such as hydration free energies. So, knowing that we can add virtual sites to improve the anisotropy and give us an electrostatic surface that corresponds with QM much more closely seems like a clear direction forward for improving our simulation results. So following this hypothesis, the team at Open Force Field, primarily Simon, Owen and Trevor, have been working towards the goal of adding virtual sites to a force field release. As discussed by Jeff, the OpenFF Toolkit now has robust support for virtual sites, and we're now able to train and test a set of virtual site parameters. A proof-of-concept study on the enthalpy of mixing of halogens and pyridines was quite promising. In both Open Force Field releases so far, for mixtures containing chlorine, the enthalpy of mixing is systematically overestimated in simulation compared to the experimental value.
Adding virtual sites and retraining the charges brought these properties much more in line with experimental values. So these data points over here are mixtures containing pyridine. In both OpenFF releases, the enthalpies of mixing are calculated at about zero in simulation, even though they have much lower values in experiment. Again, adding virtual sites brought them much more in line with experiment, but it did actually worsen the enthalpies of mixing of these data points here. So while this proof of concept is quite promising and exciting, there's still a bit of work to do before virtual sites can make it into a general force field release. In terms of benchmarks, as David has said, it was quite encouraging to see that adding virtual sites resulted in generally modest improvements overall for our benchmarks, looking at the calculated energy, geometry and torsion fingerprint metrics over the standard industry benchmark set. And this was, again, particularly surprising for the gas phase energetics, as we hadn't really expected to see a lot of improvement here, making this even more encouraging. So yeah, we're quite excited about these three projects, virtual sites, graph charge models and the protein parameters, but a lot more has been going on in parallel. The dedicated team at Open Force Field has been running many other studies into advancing our force fields. And these have looked at quite fundamental questions, such as the modifications that we can make to the force field fitting process, and even what level of theory we should be using to generate the QM data that we use to fit. And this was a necessary question to ask, because we do want to ensure that we are using the best method possible to get the best data possible for training our force fields.
And there are hundreds of post-Hartree-Fock methods, basis sets and functionals that could all be used, and they all vary in accuracy across chemical regions and in computational cost. So Hyesu and Pavan curated a set of molecules covering a diverse range of chemistry and calculated torsion profiles using a number of different methods, shown here. And after doing all that work, surprisingly, they found that the current default method that we're already using actually works quite well. That's B3LYP-D3BJ with the double-zeta basis set. The RMSE in torsion profiles was only about 0.1 kilocalories per mole worse than the best functional that they tested. And quite importantly, it was also the fastest functional that they tested. So given this optimal balance between quality and speed, we decided to keep using this method for generating QM data. So looking back at the overall fitting pipeline, we've also been looking at updates to the fitting process itself. This included both the initial values that we use to fit and updates to how we construct the targets and objective function for optimization. The first project, looking at initial values, solves a couple of different problems. When Parsley was released, we realized that our fitting procedure had been resulting in unphysically low sulfonamide valence angles, especially in simulation. And actually, these angles had been getting lower with each force field release. This problem was fixed in Sage by excluding vibrational frequency targets when fitting parameters, so a different update to the force field fitting process. Another systematic solution is to use the modified Seminario method to derive initial bond and angle values from the QM Hessian matrix instead. This gives us much more physically intuitive bonds and angles, and actually also results in some improvement on the geometry targets on our standard benchmark set.
This was actually part of a systematic exploration of potential updates to the fitting process. Another thing we looked at, as David mentioned earlier, was explicitly including dihedral deviations when we fit to optimized geometry targets. Combining this change with the modified Seminario method in the fitting process showed greater improvements in the geometry and torsion fingerprint benchmarks than either change alone. So these are really great examples of advances that we can make to force fields without needing new parameters or input data, just by changing the fitting process itself and measuring performance on standard benchmarks. And we have quite a few other studies for improving force field fitting in the works. One really cool direction is applying surrogate modeling to the optimization process. Instead of using simulation data to optimize directly, as we currently do, the simulations are first used to construct a surrogate model. The surrogate model is optimized instead, and the solution is checked iteratively using simulations before a solution is accepted. This addresses two problems with our current optimization process. Firstly, it's quite likely that the solutions we currently get from our simulation-based optimizations are in relatively close local minima. And secondly, training to simulation-based physical properties is just quite expensive computationally. So Owen has carried out a really exciting proof-of-concept study where he used surrogate modeling to refit van der Waals parameters from the first Parsley force field. This surrogate fit resulted in a much lower objective function, as shown by the black crosses here, and also gave sigma and epsilon values that were dramatically changed from the initial Parsley force field. So this is pretty cool. It suggests that the optimization did manage to at least explore different minima.
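The surrogate loop described above can be sketched in a few lines. This is only a toy illustration under stated assumptions, not the Open Force Field implementation: `expensive_objective` is a hypothetical stand-in for a simulation-based objective, the single parameter `sigma` is made up, and the surrogate is a simple quadratic through the three best points rather than whatever model the actual study used.

```python
def expensive_objective(sigma):
    """Hypothetical stand-in for a costly simulation-based objective."""
    return (sigma - 3.2) ** 2 + 0.5


def fit_quadratic(points):
    """Fit a*x**2 + b*x + c exactly through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d1 = (y1 - y0) / (x1 - x0)  # first divided differences
    d2 = (y2 - y1) / (x2 - x1)
    a = (d2 - d1) / (x2 - x0)   # second divided difference
    b = d1 - a * (x0 + x1)
    c = y0 - a * x0 ** 2 - b * x0
    return a, b, c


def surrogate_optimize(objective, initial_samples, tol=1e-6, max_iter=20):
    # 1. Spend the expensive evaluations once to seed the surrogate.
    evaluated = [(x, objective(x)) for x in initial_samples]
    for _ in range(max_iter):
        # 2. Build the cheap surrogate from the three best points so far.
        lowest = sorted(evaluated, key=lambda p: p[1])[:3]
        a, b, _c = fit_quadratic(sorted(lowest))
        if a <= 0:
            break  # surrogate has no minimum; stop rather than extrapolate
        # 3. Optimize the surrogate instead of the expensive objective.
        candidate = -b / (2 * a)
        if any(abs(candidate - x) < tol for x, _y in evaluated):
            break  # surrogate minimum already checked: converged
        # 4. Check the proposed solution with the real objective.
        evaluated.append((candidate, objective(candidate)))
    best_x, best_y = min(evaluated, key=lambda p: p[1])
    return best_x, best_y, len(evaluated)


best_x, best_y, n_evals = surrogate_optimize(expensive_objective, [2.0, 3.0, 4.5])
print(best_x, best_y, n_evals)
```

The point of the design is that the expensive objective is only called to seed the surrogate and to verify each proposed solution; all of the searching happens on the cheap model.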
So we're quite excited to explore surrogate modeling further and see how it can improve how we fit our force fields. Another direction that could really change up our process is Trevor's work on using automated chemical perception to determine the parameters to fit in the force field. So basically this is the idea that you can start from one very general parameter, such as one for every single bond or angle or torsion in the force field, and split it out automatically to apply to different chemistries as needed. This would be much more systematic and efficient than the manual process that we currently follow. It would hopefully reduce the likelihood of grouping dissimilar chemistries together with overly broad parameters, or of supporting redundant values. And in fact, Trevor has been doing a lot of work on an alkane data set, and comparing it to Sage has shown that we can achieve close force field fits with far fewer bond and angle parameters than the Sage force field currently has. So there's a lot of room for automated parameter perception to really advance our force fields. And yeah, there are many more projects that Open Force Field is working on than we can really show here. Just to name a few, and I will note that the dotted lines here don't quite correspond with Jeff's key: the projects here have definitely already been started and have already had a lot of work and effort invested into them. Some of the works here look at tweaking the parameters themselves, like explicitly enumerating torsion multiplicities, work done by Jessica Maat and Pavan Behara, or using Wiberg bond orders to interpolate improper torsions. And we're also looking at using internal coordinates as fitting targets in optimization, work that was originally started by Hyesu Jang.
So, yeah, we believe that everything on this page, and all of these directions that we're exploring, will be really valuable in advancing our force fields and our process, and hopefully will show up in a force field release in the near future. And we're always looking around for more ideas. So please do stay for the discussion at the end of the keynote if you have ideas on where we should go after Rosemary. Different projects will require more or less effort for varying levels of payoff, so we do have to choose which to prioritize with the resources that we have available. And this kind of planning is an organization-level decision, and something that we really depend on Diego for. So with that I'll hand it to Diego for a broader perspective on the long-term goals of Open Force Field and the strategy that we need to get there. Thank you so much, Lily. Yeah, hi everyone. I'm Diego Nolasco. I'm the scientific project manager at OMSF, and for now I'm being shared between OpenFF and OpenFE. And it's been a really cool journey for me since January. So today I'm not talking about project management. I'm talking a little bit about strategy. The OpenFF initiative is a sci-tech organization, and in this way, our endeavor involves both science and technology. We can divide our efforts into two equally important fronts: making our models more general and/or useful, and providing additional features and functionalities for our users. The balance between these two possibilities is defined only by the amount of resources available to us. If we choose to be a low-cost institution, we will be forced to commit ourselves... Sorry to interrupt, Diego. Yeah, you need to click to take over the screen. Oh, I'm clicking. Yeah, could you please go back to the slides? Yes, this one. Thank you so much. I'm going to try my best. Could you please go back? Nice. Thank you. So I was saying that.
Yeah, if we choose to be a low-cost institution, we will be forced to commit ourselves much more to maintaining and gradually developing a few foundational core products than to the possibility of innovating on a daily basis. If we assume a somewhat more expensive positioning, we can lean much more towards new features without, of course, neglecting the maintenance of the quality of what has already been released. In view of that, a well-defined strategy is crucial. Dealing with the unknown is a must. Some do this by relying on luck, hoping that tomorrow will magically bring the dream results. Others simply engage, taking responsibility for building each step that will lead them to their most audacious goals, rather than waiting for the future to arrive. Let me show you how I believe we should align organizational strategy with our concrete plans. So what are our organizational value disciplines? Companies that have taken leadership positions in their industries over the past decade have generally done so by narrowing, not broadening, their focus. They focused on delivering superior customer value according to one of three value disciplines, product leadership, user proximity, or operational excellence, while meeting minimal industry standards in the others. At OpenFF we are not that different. In discussions with the OpenFF team leads, we ranked 21 organizational priorities related to the aforementioned value disciplines. The results indicated that our initiative should focus on product leadership, with user intimacy as a secondary goal and operational excellence in last place. For us, product leadership refers to the accuracy and usability of our force fields and core tools, whereas proximity with users refers to hands-on support in building custom solutions.
Given our unique position as an open source initiative, the resources we put into product leadership will compound, and we will get a disproportionate return on investment compared to custom solutions for individual users. For example, as we improve the quality of our force fields, more external packages will contribute resources toward interoperability, which will benefit everyone. What product leadership really means is that we want to produce the best force fields and software in this space and be recognized as doing so. This means we need to work closely with you, our core users, to ensure we meet your needs, but also that our operations need to be excellent. That's of course not enough, though. The force fields and tools themselves need to be outstanding. The goal to excel in the force field area must play a key role in how we prioritize our software and infrastructure. Product leadership depends on the usability of our models and the accuracy of their predictions. We could be covering more chemistry, we could be building more accurate parameters and new functional forms, and we could be exporting them to more formats. This is a matter of time and resources, of course. To promote proximity with users, we need to expand the scope of our communication channels. This goes beyond holding meetings with our boards. We need to build channels that allow personalized and specialized service, especially when it comes to requesting new features. It is also extremely necessary to give differentiated attention to the pursuit of operational excellence. If we manage to optimize our processes accordingly, defining specific teams for specific tasks, such as development and maintenance, we will end up promoting more than specialization. We will be promoting reactivity and speed. Finally, it is quite evident that all these possibilities require resources, both human and financial. So why should we have a strategy?
Our strategy must be the source of reasoning for our decision making. It is the strategy that should guide the process of building our project portfolio, and this should be based on how we will add value to our users. Projects in turn need to allow the delivery of results that, when analyzed collectively, reflect our mission. Our daily work must be focused on achieving this mission. Of course, we need constant monitoring so we recognize when we go off track and can readjust. Our strategy must serve as the key basis for everything we do, but also must be something that adapts to reality as the project goes on. What is a project for? The Open Force Field Initiative is a project-driven organization. This means that we must always be planning, executing, or studying projects. Because of that, the most important definitions of my current professional life are these: projects are ways of delivering the organization's strategy, and a project is a temporary endeavor that aims for a single outcome and has limited resources. I repeat these phrases every day. I need to internalize these two definitions like I internalized my daughter's name. I do this to empower myself when I need to decide in favor of the projects I manage and consequently in favor of our organizational strategy. Projects serve to translate this strategy into reality. Projects are where we actually take action and make decisions. When we do this in an organized and appropriate way, we always have the chance to reduce the risks inherent to the process and speed up the launch of our products to our user community. We also need to understand that time and cost are extremely closely related. Continuous improvement must also be considered in our project ideas, as it is through this that we will pave the way towards excellence. Thus, we need to learn all the time, at every opportunity, because this is the mindset that will drive us to improve every day and that will ensure that we continue on the right path.
To continue to succeed into the future, we must continue to innovate in a sustainable way. So, how does strategy affect our daily lives? The Open Force Field Initiative, or something similar to it, has been dreamed of by different players who are interested in building a solid bridge between academia and industry. Even here in Brazil, professors like me have always conjectured about the possibility of designing some kind of organization that would work as a link between research groups with the potential to do science and develop technologies, and companies with the budget to offer support. So please know, we work in a pioneering organization. If you are part of the OpenFF team, feel privileged, as your role is to contribute to the solidification of a truly audacious undertaking. Hopefully, many of our pharma partners feel the same way about the effort as a whole. We're excited about the way this project allows us to share and interchange ideas and data and work together as what sometimes feels like a global team with no single organizational home. I would like you to understand how we, as an organization that brings together professionals with different skills, are ahead of our time. If you, like me, have been in the workforce for over 20 years, I'm sure you agree with me when I say that the Open Force Field Initiative is a genuinely avant-garde initiative. Instead of hierarchical structures, we implement interdisciplinary structures. This implies understanding that we act as if we were our own bosses, as we contribute, each in our own way, unique and complementary skills. Instead of bureaucratic bonds, we implement value-based relationships where we seek to provide value to one another. Instead of activity management, we implement self-managing teams, as each of us is self-motivated and highly inspired in doing what we do.
Instead of centralized information, we implement distributed information, since transparency is one of our pillars. Instead of sector results, we value collective results, as we understand that we are free to do what we do because we trust our coworkers will take care of the other tasks that need to be performed. Instead of a physical environment, we seek excellence in the virtual environment, which allows us to achieve considerable financial savings. In short, we are able to reach extraordinary levels of achievement because we have the freedom to make the most of our abilities, which means that we feel free to put our knowledge and our abilities at the service of this organization with the attitude of someone who knows they have the freedom to do so. All of this is the result of a judgment-free environment. And how can you help us foster and sustain our strategy? Well, this is an easy one. Just understand that whatever your role is, you can get involved or inspire others to get involved with the initiative. We need more scientists, more developers and more industrial partners, so we can actually make the future we dream about a reality. I honestly hope you understand that what I am doing now is an invitation, a request I would even say, for you to engage in broadcasting the purpose of the Open Force Field Initiative: translating cutting-edge science around molecular modeling and simulation into technology. There is so much great science and engineering to do, and we need and want your continuing help doing it. It has been, and I hope it continues to be, a pleasure to build rational solutions to the problems of humanity at your side. Thank you very much. David, back to you, I think. Thank you. We're at the end of our time, or coming up on the end of our time, and so I just want to wrap up by recapping a little bit.
We've told you about a bunch of the exciting science that's going on, or at least given you a flavor of some of it, ranging from where we're heading with the protein force field to different aspects of science that are being explored: virtual sites, graph charge models, the modified Seminario method for better starting parameters, and a lot more in this space. We hope you're excited about where things are headed. We certainly are, and there's so much left to be done. And we're convinced that we're going to be able to continue improving force field accuracy as we work through this. And on the infrastructure side there's a lot going on as well. BespokeFit is out now. Lots of different improvements are coming, including for biopolymers, and support for the other science going on: Folding@home benchmarking, interfaces and Interchange. So we hope you're excited about all of this. We certainly are, and we appreciate you taking the time to join us today. It's an exciting time, and we hope that you continue to interact with us and take advantage of what we're building. With that we're going to wrap up and move to discussion. Please raise your hands if you have anything you'd like to chime in on. Here are some ideas for things we could talk about, but also if you have questions on things that we've covered, we're happy to take questions. One thing that's come up: if you're interested in support for non-protein polymer force fields, so maybe polymer developments for non-biopolymers, there's a funding opportunity that's been brought to us in that area. So if you're working in industry and have interest in that area, please reach out, as we could potentially connect you up with something. So please weigh in on these. We're also interested in input on several other areas, so how should we combine parameterized components?
Where should we plan to go after graph charges and virtual sites? And bigger picture, what would you like to see Open Force Field look like in three years? So there's a question about non-protein polymer support, including lipids. Yeah, that could be one. I've shared a possible funding opportunity in the consortium advisory board channel, so that could be a place to look. Nucleic acids might be more in the biopolymer area, but they could fall within the scope of that. Maybe we have hands up. I don't see any hands up yet. Oh, we've got Chris Bayly. Go ahead, Chris, I should have unmuted your mic. Okay. So firstly, I have tons of questions, but I'm sure other people have them too. I'll just begin with one, and that is with the biopolymer force field. One of the things that makes it difficult to apply a SMIRNOFF force field is that the SMIRNOFF approach depends upon using substructure search to identify where to apply parameters. But with a biopolymer, once you know it's a histidine, you actually almost don't need that kind of format anymore, in that you know from the graph all the torsions, they're all in a library. So this whole aspect, for biopolymers with residues, seems to be library structured. Could one adapt the Open Force Field approach so that with biopolymers perhaps we don't need to use this substructure-based matching, but instead take advantage of library parameters and library charges? So on that I would say two fronts. Number one, that's how tleap works, that's how a lot of existing tools work, and so there wouldn't really be an advantage to reinventing that, given that the infrastructure already exists. The second is, I don't know if you've looked at our ff14SB port, but we do take this library approach there, and what results is kind of a large, unwieldy force field.
Because we have to cover different protonation states and consider cases where things are C-terminal or N-terminal or both, we end up with this explosion of very large parameters, and it makes the force field sort of hard to manipulate and hard to reason about. Whereas we can look at our small molecule force field and point at one particular torsion and say, hey, maybe we should split this up in different chemical contexts. With a library-based force field, these parameters are going to be so large, and so specific in their initial construction, that they will be really hard to improve in a gradual way. Thanks. I have another one. Please. I'd happily cede the floor to other people who want to ask questions. In Lily's presentation on the virtual sites, it really struck me, when we looked at the overall objective metrics and how well they improved, that the improvements were modest. But that was kind of looking at the global parameters, and what I'm wondering is whether the global metric improvement is how you assess the overall force field. So with the virtual sites, this is looking at the objective function in terms of the overall improvements to the energies and torsions. It's kind of one of the others. Yeah, so a couple slides after this. Thanks, Jeff. There. And so I'm wondering, well, those are the metrics which drive the overall force field. I'm wondering whether it's worth saying that where we would look for improvements with the virtual sites is in the chemistries that are actually using the virtual sites. And that's something that I wouldn't mind seeing: in the ddEs and in the RMSDs and the torsion fingerprints, how much of an improvement was actually attained in the chemistries which were really using and exercising those virtual sites? Thanks, Chris, that's a really good point. And honestly, Simon would be a better person to talk about this.
Yeah, my perspective is that because the charges were refit to account for the virtual sites, we do also want to check that, at least, they don't make the benchmarks worse or detract from results for standard chemistries that we don't want to be affected by the virtual sites. I completely agree with you. And I guess that's what I'm saying: this slide that I'm looking at is a slide that shows me that we haven't ruined anything. You know, everything became a hair better. But I think what would really be convincing to me, as somebody who's done a lot of small molecule modeling in industry, is knowing that in the chemical moieties that were really suffering from not having the virtual sites, I've got a major improvement. Then that is the justification for all the overhead and weight of putting in those virtual sites, even if, over the thousands and thousands of molecules that don't have any virtual site, it at least hasn't broken anything. And I guess where I'm going with this also is, as I was hearing in the presentation today, we're going to talk about custom fitting or focused fitting on certain areas of the force field or certain chemistries. Ultimately we've got to show we haven't ruined the larger force field, but what might be really helpful to see is how much we've improved the area that we focused on. I think that would be a good idea, and we actually had a question from a member of the governing board the other day about where exactly we're planning on putting virtual sites. And so yeah, when we talk about, oh well, what if we only look at the subset of molecules where we do get virtual sites, or chemistries where we do get them.
Absolutely. I don't know if you mentioned it, but we're aiming to be a little bit conservative with the initial release. I think we're looking at sigma holes and aromatic nitrogens, and then things like lone pairs on sulfurs or oxygens may come later. So, not a direct answer to your question, but maybe a little more context about how exactly they'll be used. Because that's a whole science project, or series of science projects, that can and should be done: where exactly do virtual sites provide the most gains? This is a good area of interface with Danny Cole's group and their work on bespoke force fields for individual molecules, because they have quite a bit of insight from that on where virtual sites seem to be helping. So in some cases we can draw on their molecule-specific findings and try to generalize a bit. But yeah, the infrastructure supports putting them in pretty arbitrary places, so this is a good place for people to get involved with spin-off science and see where they're going to provide the most gains. I'm going to go back to our discussion slide. There's a lot of great discussion happening in chat. Oh, and I see I'm misaddressing my messages. But yeah, if anybody else wants to ask a new question, feel free to raise your hand for a new question on voice chat, or we can continue in the text chat as well. I do see some. It sounds like twice today now we're thinking about ways that we could take advantage of residue information to make something like a library parameter that matches a whole residue, and that recognizes that whole residue either by substructure or by PDB residue name and atom name information. And so again I do want to re-emphasize that it's a big part of our philosophy that we don't want to be looking at cosmetic things like residue names during parameter assignment. It's important that we look only at the chemistry.
And it's by doing this that we'll be able to wind up using a consistent set of tooling to gracefully handle modified proteins and things like that, where the force field melds nicely between unnatural components and natural components. If we start treating the protein as whole substructures or something like that, I think we do run a bit of a risk that the arbitrary small molecule part of the force field and the protein part of the force field won't end up being compatible. Any other input on these discussion points or anything else? Oh, this is also a good time to mention, do we have a slide on this, the follow-up workshop topics. We've assembled a poll, and we'll post that in the general channel, and we'll also send a follow-up email to the invitation to this meeting. And so you'll have one week to vote on follow-up topics that you'd like to be the subject of an interactive workshop. Yeah, so we did this before and it worked well. We went really fast through a lot of things today, and if you're interested in hearing more about some of them, you're going to get a chance to vote on what you'd like to hear more about, and we'll have a follow-up workshop on that. Alberto, you have your hand up. Yeah, and Alberto, we've enabled your microphone, but you need to turn it on from your end. Can you hear me now? Yes. Okay, for the virtual sites, couldn't we look at our QM data and find examples where the electrostatic potentials differ between QM and what we currently have, and use that to identify places where we could help with virtual sites? That's a great question, and I'll let Lily take the first crack at that. But if she declines, I can jump in. Yeah, that's a good question. That would be an interesting approach. We've been targeting low-hanging fruit so far, but that would be, I think, a great way to move in the future. Jeff, did you have anything else to say?
Yeah, so one interesting thing is that, as we talked about, our QC stack is a little bit particular and has to deal with some interesting constraints, and so generally when we run our QC jobs we're not saving the wave functions or the output electrostatics grids. But shortly before he left, I think Simon did recognize the importance of having a large, diverse set of QM jobs where we have this information, where we can reconstruct the electrostatics grid or the electron density grid. And so before he left he put together a very large data set of molecules that could be good candidates for virtual sites or other sorts of charge fitting. It took a while to run on QCArchive, but now that it's available, we can draw on that to do exactly the kind of thing that you said. And Pavan has posted the identifier of the data set in the chat if you want to go check it out. Cool. Well, thank you so much, everyone. We've reached the end of our time, but you're probably all on Slack, and you can feel free to contact us in other ways. Again, be watching your email and the general channel on Slack for the link to the follow-up workshop poll. We know that all of Europe is about to go on vacation for two months, so we'll probably be scheduling these follow-up workshops to happen in late August, or a little bit later, and that will also give our presenters time to assemble software that you can actually install and to distribute notebooks. Thanks, everybody. See you later. Thank you. Thank you. Thank you.