All right, thanks, everyone. My name is Jeff Wagner. I'm the Open Force Field software scientist working on the Open Force Field toolkit, and today I'm going to walk us through a demo of what exactly the toolkit does, along with a high-level view of some of the implementation details of what's going on in the background. I want to start off with some terminology, because I know it isn't totally familiar to folks who aren't internal. We are going to be referring to the SMIRNOFF specification. That's a language for defining parameterization strategies, and it includes both SMIRKS-based parameters for per-particle assignment and global parameters, such as electrostatic cutoffs. The general goal is to have a keyword for every decision you can make that would change the system's energy. We're going to be talking a lot about OFFXML files, because that's what we're using currently. But what the SMIRNOFF specification describes is an object model, which doesn't necessarily need to be XML, so don't get tied to that; it could change in the future. XML is just one way to contain the information. An OFFXML file uses the SMIRNOFF language to describe a particular strategy. It has SMIRKS-based per-particle parameters, you can add on parameter libraries like TIP3P water, and it's modular: you can take bits from one force field and stick them into another, and within a certain set of rules you will actually get forces and energy components from both. So the Open Force Field toolkit is basically the parameterization engine that can read a SMIRNOFF-format force field and a molecule in various formats and present to you a parameterized system in OpenMM format. For probably three quarters of the users of the toolkit, this is all you need to understand.
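To make the point that SMIRNOFF describes an object model rather than a file format, here is a minimal Python sketch. The class and field names (`ForceFieldModel`, `BondParameter`, `vdw_cutoff`) are hypothetical and are not the toolkit's API; the XML writer only loosely imitates OFFXML element names and is not a faithful OFFXML implementation, and the SMIRKS string and numbers are illustrative. The same in-memory object is serialized two ways, once as XML and once as JSON.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field


@dataclass
class BondParameter:
    """One SMIRKS-based bond parameter (length in angstrom, k in kcal/mol/A^2)."""
    smirks: str
    length: float
    k: float


@dataclass
class ForceFieldModel:
    """Toy in-memory force field: one global setting plus per-bond parameters."""
    vdw_cutoff: float  # global (not per-particle) parameter, in angstrom
    bonds: list = field(default_factory=list)

    def to_xml(self) -> str:
        # One possible serialization, loosely modeled on OFFXML element names.
        root = ET.Element("SMIRNOFF")
        ET.SubElement(root, "vdW", cutoff=f"{self.vdw_cutoff} * angstrom")
        bonds = ET.SubElement(root, "Bonds")
        for p in self.bonds:
            ET.SubElement(bonds, "Bond", smirks=p.smirks,
                          length=f"{p.length} * angstrom",
                          k=f"{p.k} * kilocalories_per_mole/angstrom**2")
        return ET.tostring(root, encoding="unicode")

    def to_json(self) -> str:
        # A second serialization of the very same object model.
        return json.dumps(asdict(self), sort_keys=True)


ff = ForceFieldModel(vdw_cutoff=9.0,
                     bonds=[BondParameter("[#6X4:1]-[#1:2]", 1.09, 680.0)])
xml_text = ff.to_xml()
json_text = ff.to_json()
```

Either text can be regenerated from `ff`, which is the sense in which the object model, not the file, is the real content.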
That is, you come up with a molecule that you're interested in simulating and a force field, in the SMIRNOFF format, that describes the physics you want to use, and the Open Force Field toolkit presents you with a system that's ready for simulation. For people who just want to use good parameters, we don't want to stand in the way and say you need to go compile things or make all these decisions about your force field. We just want you to be able to use it. To that end, we really push the conda distribution channel. It's a way to make sure that you get the latest software, and that it comes down with all the dependencies you need, in versions that are compatible with your operating system and with each other. It's really brilliant how conda works, and so we strongly encourage people to use it. I know it's kind of new to the scene, and people might be more familiar with virtual environments or with having a single root environment. But conda's great. You don't need to talk to your system administrator to get things installed. You can install it right in your home directory, and in fact that is recommended, or at least in some directory that you have write access to. And it lets you make any number of environments that you want. Probably everyone has experimented with, instead of installing things in the top-level system bin directory, just putting a bin directory in their home and putting things in there until they mess up dependencies and something breaks. That's the spirit of conda. You can make as many environments as you want, and if you mess one up, it's easy to just reinstall everything. You can get any solvable set of dependencies in one single environment, and if it's unsolvable, then you just need two environments and you can get everything you want running. Conda is not available for everyone, though. We know that some companies don't let you reach out to Anaconda Cloud, the online package repository.
So for you, one month ago, we started making single-file installers. A single-file installer effectively contains all the data that you would have downloaded from the internet when you run conda install for the Open Force Field toolkit, but it's all zipped up in one big file. When you put it on your system, it all gets expanded and the individual packages are installed locally. Basically, you should get the same software as if you had run a conda install over the internet at the moment when we bundled all of these things together. You can find these single-file installers on our GitHub release page. Every time we release a new version (0.4.1, and 0.5.0 recently), part of the release process is now to make a new single-file installer that you can go grab. Finally, for people who really want to install from source, you can use setup.py or a pip develop install. We really recommend you don't do that, because resolving dependencies can be difficult. We have RDKit on the back end, and that depends on Boost, which is famously difficult to rejigger. So even if you're a developer, if you can conda install all of your dependencies, then just do a pip develop install of the Open Force Field toolkit on top of them. Okay, so here we go, time for the live demo. This is the first live demo I've ever done in my life, and there actually is no plan B, which is exciting. I'm hoping it'll be kind of like the guy on the left, though I'm concerned that this is more like the guy on the right, and we don't have a first aid kit up here. You've probably noticed that in the meeting channel I posted these exact instructions. We talked about how great conda is, so we should put our money where our mouth is: if you want to run the same notebook that I'm going to be running here, you can go ahead and use these commands. I've got my terminal ready, and I have cheated here: I downloaded these packages this morning, so they're cached on my computer.
That way we don't have to sit here and wait for it to search the internet. Okay, it's resolved. It found compatible versions of everything for my platform, Mac OS X. A lot of people are probably familiar with conda as a way to get the version of Python that they want, and they sort of assume that it's like a Python virtual environment, but really it's more than that. It can handle different versions of Python, Fortran, GCC. If we look at what it's pulling down right now, it actually is pulling down a recent version of Python. So this is really, really flexible. It's more than just a specific Python executable that has what you want; it has everything. Okay, so I just went from zero. A moment ago I couldn't have loaded the Open Force Field toolkit. I couldn't have loaded GROMACS. I couldn't have loaded anything else. And here I just said, hey, I'd like Open Force Field; I'd like this really troublesome package called nglview that ruins presentations; MDTraj; GROMACS; and smirnoff99Frosst, which is our first family of force fields. I'm going to activate this conda environment, and now I have access to all of these things. GROMACS was not installed on my computer before just now, and now I have GROMACS. There's one annoying extra command because nglview is really finicky. For those who are following along at home, you can probably run the notebook without nglview; it's just for visualization. The simulations that will be running in a minute run just fine whether or not you can visualize them, and you can go back and look at them using VMD. Okay, great. So I'm going to go through two workflows using the Open Force Field toolkit today, and they should be representative of two different use cases. In the first one, you don't want to build your own force field; you just want to use a good force field, no questions asked, to set up a protein-ligand simulation. I'm also going to be a little bit transparent about this.
I had not been a huge user of the Open Force Field toolkit until Tuesday of this week. As a developer, I had made certain decisions which resulted in Lee-Ping, Yudong, and Simon Boothroyd contacting me very angrily, and my excuse was kind of like, yeah, I actually haven't really had to use it. So this was a genuine test of whether I can go get some data from out in the wild and turn it into a useful simulation; in some sense, eating my own dog food that I've been dishing out for several months now. This is a Jupyter notebook, so you're going to see Python code in cells. First I'm just going to do all the imports for the whole notebook so we can get them out of the way. And I want to make a guarantee to you that I'm not going to be the Benihana chef. I don't know if folks have been to Benihana. It's a restaurant where you sit at a counter, and in the middle of the counter is a griddle. There's a chef who is also a performer. While he's preparing your food, he'll do tricks with a spatula over here; he's made a little volcano out of sliced onions, put some oil in there, and lit it on fire, and everyone's saying, wow. And meanwhile, with his other hand, he's taking a stick of butter and throwing it into the vegetables when you're not looking. I feel like a lot of tech demos go this way: it's amazing and it works automatically, and then you go home and try to do it with your own vegetables and things go horribly wrong. So whenever possible, I'm going to explain when I'm using magic, that is, code that's off screen. RDKit doesn't count; you can install RDKit. But if I've written a function somewhere, I'm going to declare that I had to do a little bit of extra work for a certain step. For example, I'm declaring that I've cheated and brought in my own function to find waters that are clashing with a ligand. In this process, I'm solvating a protein, and I'm doing it in the absence of a ligand.
So when I add the ligand coordinates in later, I need to remove waters that are clashing with it. Function defined. Now let me introduce the data. It's not my data. A month and a half ago, I was at the Gordon conference in Vermont, and folks from the German Merck (it was Christina and, I think, Daniel) presented some nice protein-ligand structures that they had put out with IC50s and such, in the format that they found convenient. So I don't have data on my computer that I'm going to use for this; I'm just going to go grab their data. If you're following along, you can actually go look at the exact data in their repository, but in this live example we've just downloaded it. For each protein, they have many ligands, so I'm just going to take the first ligand. When you see an exclamation point in the code I'm running, that means don't run it in Python; run it in the terminal on the back end here, which is running in the environment from the instructions I sent out. Great. So we've taken the first ligand out of this file, and it's just in SDF format. I'll declare some magic right now. The protein and ligand are already in the same coordinate system; the ligands are already docked to the protein. And before they were uploaded, these were prepared with the Schrödinger suite, so you can't come in and use my workflow with just a raw PDB structure. These were already protonated, they had gaps filled, and so on and so forth. And here's the scary part: NGLview works. Okay, good. It's 50-50 for people following at home whether or not NGLview will actually work for you; it's a package I really don't like very much. All right, so I have arbitrarily picked the first ligand in the file. It looks like a ligand, it's got some diverse chemistry, and we're going to go ahead and parameterize it with our force field. We also have the first receptor in the file.
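In the demo, the first ligand was pulled out of the multi-molecule SDF file with a shell command. The same step can be sketched in plain Python, relying only on the SD-file convention that each record ends with a line containing `$$$$`. The helper name `split_sdf_records` is hypothetical; a cheminformatics toolkit such as RDKit can also read multi-record SDF files directly.

```python
def split_sdf_records(text):
    """Split SD-file text into a list of records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            # Close the current record, keeping its delimiter.
            records.append("\n".join(current) + "\n$$$$\n")
            current = []
        else:
            current.append(line)
    # Tolerate a final record that is missing its trailing '$$$$' delimiter.
    if any(l.strip() for l in current):
        records.append("\n".join(current) + "\n")
    return records
```

Writing `records[0]` to its own file reproduces the "take the first ligand" step; looping over all of the records is the natural starting point for treating every ligand instead of just the first.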
You'll notice that NGLview doesn't recognize the end caps on this protein, so it treats them like ligands, but they do get parameterized. All right, here's the plan for the first workflow. We're going to parameterize this ligand using the Open Force Field toolkit. We're going to solvate and parameterize the protein using OpenMM. We're going to combine those two systems using ParmEd. We're going to remove clashing waters, because, yeah, magic. And then we're going to run a simulation and look at it, all while I'm standing up here. So first of all, I've imported the Open Force Field modules. What a lot of people have been concerned about is that we throw this nasty-looking message if you have not installed the OpenEye toolkits, but it's a warning, not an error. It means we tried to load the OpenEye toolkits, which we prefer because they're faster and in several edge cases they do the right thing with weird molecules. If we can't load them, we'll use RDKit and AmberTools instead; those were installed in that terminal session at the very beginning of my talk, so all of that came down from the internet right then. Now we'll use some of the Open Force Field objects to parameterize this ligand. We've loaded it from a file, and I'm going to hold on to its positions for later. I'm specifying a force field to load: this is the smirnoff99Frosst 1.1.0 force field, and I'll talk a little bit more about these force fields later. I'm doing two bits of magic, which came from foresight, because later on I'll be converting this to Amber format through ParmEd and then to GROMACS. GROMACS gets really mad if we don't give it atom names. For a normal force field this isn't a problem, because you have atom types that you can use. For us it is a problem, because we don't really care about atom names.
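SMIRNOFF parameters are assigned by SMIRKS matching, so the toolkit has no need for atom names, but GROMACS expects them. The naming loop isn't shown in the talk; a minimal sketch of one reasonable scheme (element symbol plus a per-element counter) follows. It works on a plain list of element symbols as a stand-in for iterating over the molecule's atoms, and the function name is hypothetical.

```python
from collections import defaultdict


def assign_atom_names(elements):
    """Give each atom a unique name: element symbol + per-element counter (C1, C2, H1, ...)."""
    counts = defaultdict(int)
    names = []
    for symbol in elements:
        counts[symbol] += 1
        names.append(f"{symbol}{counts[symbol]}")
    return names
```

PDB-format atom names are limited to four characters, so a very large molecule (one that would need a name like C1000) would need a different scheme.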
So I've manually assigned atom names with this loop. Also, ParmEd is going to complain if we apply constraints before we combine the systems, so we'll have ParmEd apply constraints later. This is something that only popped up with the 1.1.0 force field, because that's when we decided that hydrogen bond constraints were part of our force field. We turn our ligand into a topology and then we run create_openmm_system. I'm going to keep talking about workflow number one, but one point I want to drive home is that this is the only code block that uses the Open Force Field toolkit in workflow number one. That might seem a little bit disappointing, because I'm supposed to be up here talking about the Open Force Field toolkit, but the point is that you should go home and use this without writing hundreds of lines of code with our toolkit. It should be something that just slips in somewhere: we can update the toolkit and provide new force fields, but your code shouldn't need to change. It's just a quiet, hardworking part of a larger workflow. All right, so in the background, we parameterized this molecule. Our force fields do not currently define charges based on SMIRKS; we use the AM1-BCC charge model, and the toolkit automatically detected that we did not have OpenEye installed, so we were not going to be using QUACPAC. So it fell back to antechamber and sqm from AmberTools; again, those came down during the installation process right at the beginning of the talk. Our design goal is that this code should just work. You shouldn't have to think about the science you're doing or anything. This is eight lines of code, and I'm disappointed because three of those lines are hacks for GROMACS. This should be five lines of code, and we're going to try to minimize that even more in the future. One of our philosophies is that we want to be really strict with our input.
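The fallback described here (prefer OpenEye, otherwise use RDKit plus AmberTools) can be sketched as a priority list of backends, each with the Python modules and command-line programs it needs. This illustrates the idea only and is not the toolkit's own code; the backend labels and the module and executable names in `PREFERENCE` are assumptions.

```python
import importlib.util
import shutil


def module_available(name):
    """True if the module can be found without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package raises ModuleNotFoundError, a subclass of ImportError.
        return False


def first_available_backend(candidates):
    """Pick the first backend whose requirements are all present.

    candidates: ordered list of (name, modules, executables), most preferred first.
    Returns the backend name, or None if nothing is available.
    """
    for name, modules, executables in candidates:
        if all(module_available(m) for m in modules) and all(
            shutil.which(exe) is not None for exe in executables
        ):
            return name
    return None


# Assumed preference order mirroring the talk: OpenEye first, then RDKit + AmberTools.
PREFERENCE = [
    ("OpenEye", ["openeye"], []),
    ("RDKit + AmberTools", ["rdkit"], ["antechamber", "sqm"]),
]
```

`first_available_backend(PREFERENCE)` returns "OpenEye" where OpenEye's Python package is installed, "RDKit + AmberTools" where only those are present, and None otherwise.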
That is, if you're trying to feed us a molecule and there's ambiguous chemistry in there, we want to keep you from making a mistake. Instead of having you parameterize the whole thing, run a simulation, and then realize in the manuscript preparation phase that you didn't simulate the molecule you thought you did, if there's ambiguous protonation or even ambiguous stereochemistry in your molecule, we're going to yell at you about it so that you can fix it. And, like I said, we do want to calculate charges without any user intervention required. All right, so that was part one: the Open Force Field toolkit just prepared our ligand, and we're not going to see it again for the remainder of this first workflow. I'm using some basic OpenMM functions to solvate the protein. Here I'm loading the receptor, defining an OpenMM force field, creating an OpenMM Modeller object, having it add solvent (and that includes ions as well), and then creating an OpenMM system. So now we have an OpenMM system for the ligand and an OpenMM system for the protein. Again, I'll remind you that the magic in this step is that the protein was already prepared: it already had Amber-compatible residue names. And we're going to have ParmEd apply constraints later, so we're not rigidifying water here, even though we want that in the end. One thing that frequently pops up on the issue tracker is that we have a Topology and a ForceField object, and OpenMM also has a Topology and a ForceField object. In the age of wildcard Python imports, where you just write from whatever import star, people start getting very confused. This is something we need to improve our documentation about. I started drawing a diagram of what you could convert to what, but I decided that that's not how we should be doing this, or at least that this diagram needs a lot more work.
So we'll just say: if your topologies are giving you trouble, make sure that it's an Open Force Field topology or an OpenMM topology, as the situation requires. In this next step, we are combining the receptor and the ligand, and we're using ParmEd for this. The great thing about ParmEd is that once you have these systems in ParmEd format, to combine them you just say system one plus system two. It's such a moral victory. They could have had a combine function that takes system one and system two like any other function, but the satisfaction of taking these horribly complex MD systems, just putting a plus sign between them, and getting the reasonable result is amazing. You'll feel it when you get to this part. I don't think that it commutes, though, and in fact one troublesome thing, and this is a little bit of magic that I've thrown in, is that if these things have different box vectors, which one would you expect it to use? So here I've had to specifically say: use the box vectors from the solvated protein, not the box vectors from the ligand. Again, this is a workflow where I've kind of stumbled over these things, and hopefully you can implement something very similar to this exact logic, but we do consider this part of our user experience. Yeah, so the question is: right now the list of force fields is very short, so what happens when we get lots of them? This is kind of answered by our force field distribution strategy. Number one, we do not have a function to list all available force fields yet, but that's a really good idea and we should put that in, preferably soon.
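The box-vector pitfall can be illustrated with a toy class whose `+` operator concatenates atoms in operand order and keeps the left operand's box. `ToySystem` is hypothetical, and this is not a claim about how ParmEd's structures handle boxes internally; the takeaway is only that the result depends on operand order, so it is safest to set the box explicitly from the solvated protein after combining, as was done in the demo.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToySystem:
    atoms: list
    box: Optional[Tuple[float, float, float]] = None  # box edge lengths in nm, or None

    def __add__(self, other):
        # Atoms are concatenated in operand order; only the left operand's box survives.
        return ToySystem(atoms=self.atoms + other.atoms, box=self.box)


protein = ToySystem(atoms=["N", "CA", "C"], box=(7.0, 7.0, 7.0))
ligand = ToySystem(atoms=["C1", "O1"])  # parameterized in vacuum, so no box

complex_system = ligand + protein  # order matters: this inherits the ligand's box (None)
complex_system.box = protein.box   # so set the box explicitly from the solvated protein
```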
And two, we make it flexible: in the same way that conda installed a bunch of things like GROMACS at the beginning, one of the things conda installed was a package that contains nothing but SMIRNOFF-format force fields. We happen to publish that one, but so that we are not a central authority for who gets to make SMIRNOFF force fields, anybody can make conda packages with SMIRNOFF force fields. So a search function still needs work, but in terms of being able to add force fields from different sources, we have the machinery out there, and later today we'll actually pull in another conda package that way. All right, before, I had shown you the receptor and the ligand; here is the solvated structure, which all got made in the course of this presentation. If I zoom in on the ligand (again, the coordinates were already docked before we started), the protein was solvated in the absence of the ligand, so you see there's this water here that really shouldn't be overlapping the ligand. So I'm going to use that dirty magic from before and run the function to resolve these clashes and delete any waters that are too close. And now, if we zoom in, all the clashes with the ligand have been resolved. So we have combined the protein system and the ligand system. Right now they're both ParmEd objects, and ParmEd can output to lots of formats. Right now I want it to output to OpenMM, but we're going to revisit that in just a second. So I've output it to OpenMM, and now I'm going to simulate the complex; this is just a few OpenMM object manipulations. What's going to happen is that we minimize the energy of the complex first and hold on to those minimized coordinates, and then, yeah, there's no time for equilibration because it's a live demo, so we're just going to set the temperature to 300 kelvin and hope for the best; in my testing it's been fine.
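The clash-removal helper used in this step was written off screen for the demo, so its exact logic isn't shown. A minimal sketch, assuming coordinates in angstroms, a hypothetical 2.0 Å cutoff, and no periodic images, is to drop every water that has any atom within the cutoff of any ligand atom. The function name and cutoff are assumptions.

```python
import math


def remove_clashing_waters(ligand_xyz, waters, cutoff=2.0):
    """Drop every water with any atom closer than `cutoff` to any ligand atom.

    ligand_xyz: list of (x, y, z) tuples.
    waters: list of waters, each a list of (x, y, z) tuples (O, H, H).
    cutoff: threshold in the same units as the coordinates (assumed angstrom).
    Returns (kept_waters, removed_indices).
    """
    kept, removed = [], []
    for i, water in enumerate(waters):
        clash = any(math.dist(a, b) < cutoff for a in water for b in ligand_xyz)
        if clash:
            removed.append(i)
        else:
            kept.append(water)
    return kept, removed
```

The double loop scales as (water atoms) x (ligand atoms), which is fine for one ligand; for larger jobs a spatial index such as a k-d tree, plus periodic-image handling, would be the natural next steps.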
That's going to take about three minutes to run, so I'm going to talk about some stuff in the meantime. We have had some discussions about how exactly we want people to use force fields. People who have done the examples with the Open Force Field toolkit have seen that we import the force field from a test force field subdirectory, and we update that whenever we feel like it. It's for our tests, so when we make a change to the SMIRNOFF format, we need to update that file, and we go ahead and make a bunch of changes by hand. One day we sat down and said, wow, what if somebody publishes a paper with that force field? What was in it? How could they say what force field they used, if it's changing all the time? So we realized that we need to be very strict about releasing force fields and knowing what's inside of them. Now all of the force fields that you're going to see come out will have version numbers on them, and on the repository where these force fields come from, they're all going to have DOIs. So if you go and do full science and you need to say exactly what you did, you can give the force field name with a version, which should be a unique identifier, and even more so, we have DOIs for all of them that get automatically created when we cut a release. Yeah? Can you also add a hash to the XML files? Yeah, so the question was, can we add a hash to the XML files? And yes, we could. We thought about whether or not that's necessary, because in our format you could technically encode the same force field using non-identical text: one file could define distances in angstroms and the other in nanometers, but they could encode the same parameters. So it becomes a little bit of a philosophical question. I think if things get ugly we probably will start using a hash, but for now we don't see the need. I grab an XML file from somewhere... Oh, okay, no problem.
Yeah, so I guess the main concern is that I edit force field files all the time, and sometimes I forget which version I'm actually using. So if there were a hash published online, I could do a quick check to make sure that mine actually matches the one that's out there. John Chodera says you would have to hash an object representation of the force field and not a file. And yes, that is technically correct. Like I said, the XML format we're currently using happens to be something we can convert our data into, but the real content is the object model of the force field. But yeah, I think that's an important consideration: don't hash the XML, hash the numbers. So you'd actually need to hash the binary data inside the force field object to generate the real hash, to a certain extent? Yeah, more or less. This is actually a really interesting discussion; we should probably continue it at length, but maybe not right now. Okay, we can talk offline. Yeah, thank you for the idea. Right, so like I said, more force fields will become available in these conda data packages: packages that don't really have a lot of functionality, but do allow you to import these force fields. And the great thing is that the Open Force Field toolkit and the Open Force Field force fields are going to evolve together. So if one day we do some cost-benefit analysis and say, wow, polarizability sure is worth adding, and we provide polarizable parameters, we'll release that as a new force field, and in conjunction, the toolkit will be able to support that force field. So to you, the user, those eight lines of code should not change, but the systems that come out will be using polarizability. Again, we don't want you to be inconvenienced by our changes. If you just want to use the best force field we can offer, you can just use the best force field we can offer.
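The suggestion to hash the numbers rather than the XML can be sketched as follows: normalize every length to one unit, round to a fixed precision, serialize deterministically, and hash that. The function name, the parameter dictionaries, and the choice of six decimal places are assumptions for illustration. Parameter order is deliberately preserved, because in SMIRNOFF the last matching parameter wins, so reordering parameters can change the assignment.

```python
import hashlib
import json

# Conversion factors to the canonical length unit (angstrom).
TO_ANGSTROM = {"angstrom": 1.0, "nanometer": 10.0}


def parameter_hash(bond_parameters, length_unit="angstrom", digits=6):
    """SHA-256 of the parameter values, independent of file text and length unit.

    bond_parameters: ordered list of {"smirks": str, "length": float}.
    """
    scale = TO_ANGSTROM[length_unit]
    normalized = [
        {"smirks": p["smirks"], "length_angstrom": round(p["length"] * scale, digits)}
        for p in bond_parameters  # list order is kept on purpose
    ]
    # sort_keys and fixed separators make the serialized text deterministic.
    payload = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

This does not address the other half of the problem: two differently written SMIRKS strings can match the same chemistry, so comparing SMIRKS as text is still a simplification.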
We had covered charge generation a little bit before. Right now we use AM1-BCC. There are several downsides to this: it depends on the initial conformer of the molecule that you bring in, and even then, different implementations give different results. So we'd like to move away from this, and I'm really excited about the idea of graph-based charges. That would be something like taking the SMILES of a molecule and determining charges strictly from that, because then we wouldn't have toolkit differences or platform differences; we'd just get the same charges every time. I think that would be really nice, because right now you do get slightly different results if you use OpenEye versus RDKit, or depending on which platform you're running on. Right, so between the two toolkits, and Dave Mobley touched on this a bit, you should expect to see differences in the file formats that are supported; we don't really trust RDKit to read MOL2. You're going to get slight differences in partial charges. The OpenEye toolkit is substantially faster, though we're hoping that the time you spend parameterizing your system is small compared to the time you spend simulating it; everyone's going to have a different use case there. They canonicalize SMILES differently in some cases, which makes it difficult to consistently deduplicate molecules if some of them were generated using OpenEye and some using RDKit. RDKit's SMILES output is also not super stable over time: one day you will get one set of SMILES for a molecule database, and then they'll put out an update that maybe fixes some bugs, and you'll get different SMILES. That makes RDKit a tiny bit inconvenient, because your datasets will go out of date if you identify things by SMILES. And then there are a couple of edge cases in stereochemistry definition.
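To illustrate why graph-based charges would be reproducible, here is a toy in which charges depend only on elements and bonds, never on coordinates: each bond shifts a fixed amount of charge between its two atoms, so the molecule's total charge is conserved by construction. The increment table is made up for illustration; this is not a real charge model and not what Open Force Field plans to use.

```python
# Hypothetical bond charge increments (units of e), keyed by an ordered element pair.
# For a pair (X, Y) with value d, the X atom gets +d and the Y atom gets -d.
BOND_INCREMENTS = {("C", "O"): 0.30, ("C", "H"): -0.05, ("O", "H"): -0.40}


def graph_charges(elements, bonds):
    """Partial charges from the molecular graph alone (no coordinates, no conformer)."""
    charges = [0.0] * len(elements)
    for i, j in bonds:
        a, b = elements[i], elements[j]
        if (a, b) in BOND_INCREMENTS:
            d = BOND_INCREMENTS[(a, b)]
        elif (b, a) in BOND_INCREMENTS:
            d = -BOND_INCREMENTS[(b, a)]
        else:
            d = 0.0  # unknown pair: no charge transfer in this toy
        # Equal and opposite changes, so the total charge never drifts.
        charges[i] += d
        charges[j] -= d
    return charges
```

For methanol (`["C", "O", "H", "H", "H", "H"]` with the C-O, three C-H, and O-H bonds) this toy gives the same charges on every run and platform, which is the property the talk is after, independent of how good the numbers are.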
The big one is whether or not stereochemistry is defined around trivalent nitrogens, but there are some other edge cases where a molecule that OpenEye accepts as having no ambiguous stereochemistry is rejected by RDKit. Again, we expect that this represents less than 1% of the molecules you would try to feed in. Okay, so at this point our OpenMM simulation should be done. I'm hiding the water here, but it was present in the simulation. So in the last few minutes we parameterized the protein, parameterized the ligand, combined the systems, resolved clashes, and started running a simulation. That's great, but lots of people think that OpenMM is weird. So what about using ParmEd to convert to other simulation packages? Here I'm going back to the ParmEd complex structure that we saved right after the merge and the water clash resolution, and I'm just going to instruct ParmEd to save it out as GROMACS-format .top and .gro files, and then run a series of console commands to take these through a simulation. This will also take a few minutes, but you can see that GROMACS is running in the background. Some magic here: the MDP files were already prepared, thanks to Dennis de la Corte, who just submitted a cool example that walks you all the way from the Open Force Field toolkit to a running simulation in GROMACS at the end. I'm not going to claim that the way I have modified these files makes them appropriate for anything other than generating a trajectory during a live demo, but these are real GROMACS simulations, running on my laptop right now. We used RDKit on the back end and carried these partial charges around a bit, so there's a rounding error, and we do have to use the -maxwarn flag: there's about a 0.005 net charge in the box, which we need to resolve. But I wanted to disclose that.
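Rather than passing -maxwarn to get past the net-charge warning, one option is to fix the rounding residual before writing the topology: round each charge to the number of decimals the file will hold, then spread the leftover, one smallest unit at a time, over the atoms. Putting it on the largest-magnitude charges first is an arbitrary policy chosen here for illustration, and the function name is hypothetical.

```python
def round_charges_to_total(charges, total=0, digits=4):
    """Round partial charges to `digits` decimals while keeping their sum exactly `total`."""
    if not charges:
        return []
    scale = 10 ** digits
    units = [round(q * scale) for q in charges]   # integer multiples of 10**-digits
    residual = round(total * scale) - sum(units)  # charge lost or gained by rounding
    step = 1 if residual > 0 else -1
    # Visit atoms from largest to smallest |charge|, adjusting one unit per visit.
    order = sorted(range(len(charges)), key=lambda i: abs(charges[i]), reverse=True)
    k = 0
    while residual != 0:
        units[order[k % len(order)]] += step
        residual -= step
        k += 1
    return [u / scale for u in units]
```

For example, three charges of 1/3 and one of -1.0 round to 0.3333, 0.3333, 0.3333, -1.0, which sums to -0.0001; the function moves that last unit onto the -1.0 atom, giving -0.9999 and an exactly neutral total.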
So, like David said, and this is a big point that I want to drive home: ParmEd's great. You can do magical stuff; you can put a plus sign between two systems to get a new system out, and it's what you expect. However, there are serious structural differences between simulation packages, and there's a lot of edge-case behavior. Every once in a while we find something and say, oh wow, maybe some parameters were being ignored when we thought we were faithfully exporting to Amber. So we're interacting with the community, getting in contact with the developers, and pushing bug fixes, and everyone's been really receptive. This is good, because if we're having this problem, then lots of other people are having it as well. But if you really want to make sure that you're using exactly the parameters assigned by the Open Force Field toolkit, OpenMM will do that; other simulation packages start getting hairy if you move through ParmEd. Another problem, as we saw with the fact that I had to artificially not constrain hydrogen bonds even though I really wanted to, is that different simulation engines have different definitions of parameterization. We say parameters are things that affect the energy of a system, and we're going to be really strict about it. That means hydrogen bond constraints are part of our force field, but in Amber they're just a run option, and the same goes for GROMACS. So in a way, we're trying to overdefine a system when we export to Amber. The same thing applies to the electrostatics cutoff: we define it as part of the force field, and other people define it as an option that you set at runtime. These are difficulties we're going to have to overcome.
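One piece of that legwork is making sure the engine's run options reproduce settings that SMIRNOFF treats as part of the force field. Below is a minimal sketch that turns a few such settings into GROMACS .mdp lines. The input dictionary keys are hypothetical; `constraints`, `rvdw`, and `rcoulomb` are standard GROMACS .mdp options, with cutoffs in nanometers. This is deliberately incomplete: cutoff scheme, long-range electrostatics, and any switching function would also have to be checked against the force field.

```python
# Values GROMACS accepts for the .mdp "constraints" option.
GROMACS_CONSTRAINTS = {"none", "h-bonds", "all-bonds", "h-angles", "all-angles"}


def mdp_lines_from_force_field(settings):
    """Render force-field-defined settings as GROMACS .mdp lines.

    settings (hypothetical keys):
      "constraints": one of GROMACS_CONSTRAINTS
      "vdw_cutoff_angstrom", "electrostatics_cutoff_angstrom": floats in angstrom
    """
    constraints = settings["constraints"]
    if constraints not in GROMACS_CONSTRAINTS:
        raise ValueError(f"unsupported constraints value: {constraints!r}")
    return "\n".join([
        f"constraints = {constraints}",
        # GROMACS expects cutoffs in nm; 1 nm = 10 angstrom.
        f"rvdw = {settings['vdw_cutoff_angstrom'] / 10.0:.3f}",
        f"rcoulomb = {settings['electrostatics_cutoff_angstrom'] / 10.0:.3f}",
    ])
```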
Again, in OpenMM these are things that we encode; we bake them into the system that we produce. But in other simulation packages, if you're going to say you used exactly this version of the smirnoff99Frosst force field, you'll want to do a little bit of legwork to make sure that the run options were the same. Okay, our GROMACS simulation is done. I don't have time for the jokes. Here I've loaded the trajectory from the local directory using MDTraj. This is a GROMACS simulation from the GROMACS that I installed 20 minutes ago, and I don't think we're going to see anything exciting, but it ran. So, my conclusions from this first workflow (it's by far the longest, so I'll probably still finish on time): using the Open Force Field toolkit in this process required eight lines of code, three of which were cheap hacks that I'm going to build workarounds for. That's what we want your experience to be. If you just want to parameterize a molecule and run a good simulation, you don't need to write hundreds of lines of code with our toolkit. We want to be small and invisible. We're going to try to keep things very consistent, so that we can release new versions of the toolkit and new versions of the force field without breaking your workflow; it'll just sit where you put it and work for years. The only magic I performed when installing literally everything used in this demo was that I had downloaded it all earlier; if I hadn't, it would have taken several minutes to pull down lots of megabytes of files. So conda's really great. I just got GROMACS for free. I got the Open Force Field toolkit. I got this notebook visualizer that's probably working for 50% of the people trying to follow along. If you want, I can work with you to experiment with it, but it's really cool what you can get done. When we did the simulation with OpenMM, we never left Python.
So if I want to run simulations in a loop, say for all the ligands in that SDF file instead of just the first one, I would never have had to leave Python. I could have split the file and run these OpenMM system preparations in a loop, and all of it could have been automated easily: no shell scripts, everything in one single environment. With ParmEd, we did have to do a little bit of additional legwork. We had to drop out of Python and have some run-parameter files prepared, but it was still pretty simple. I'm not really happy that I had to do quite so many hacks to get this example working, things like removing the constraints, which people might not know about beforehand and which I only just learned. So I had a great experience eating my own dog food that I've been serving out for a long time, and I hope to have resolved all of these little inconsistencies into normal toolkit behavior in the next release. Okay, yes, a question. Before we move on to the second workflow, the question is: is there a workflow, or an error check or sanity check, built in for when we move our systems into GROMACS and back? For instance, here you ran a simulation in OpenMM, and then the same system was exported and run in GROMACS. Could we put together a piece of code that quickly checks whether both energies are exactly the same, or eventually whether the RMSDs are the same? Something like that. Yeah, I think that would be a really good validation check. It comes down to what we want to prioritize, because I think it's really important for our effort that we be able to faithfully export to other simulation formats. But at the same time, if we start being really rigorous about supporting ParmEd, then we've de facto become the ParmEd developers, which isn't a bad thing, but it's something we want to devote effort to very deliberately. So on energies, number one: yes, we should absolutely do energy checks.
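A minimal sketch of the energy check discussed here, assuming you have already obtained per-component energies, in the same units, from both engines (for example from OpenMM force groups and from `gmx energy` output; that extraction step is not shown). The component names and tolerances are illustrative choices, not an existing toolkit feature.

```python
import math


def compare_energies(reference: dict, other: dict, rel_tol=1e-3, abs_tol=1e-2) -> dict:
    """Return the energy components that disagree between two engines.

    Both dicts map component names (e.g. "bond", "nonbonded") to energies
    in the same units. A component missing from one side is reported too,
    since a silently dropped term is a typical export failure.
    """
    mismatches = {}
    for name in sorted(set(reference) | set(other)):
        if name not in reference or name not in other:
            mismatches[name] = (reference.get(name), other.get(name))
            continue
        a, b = reference[name], other[name]
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[name] = (a, b)
    return mismatches
```

An empty result means the two engines agree within tolerance; a non-empty one points straight at the term (say, nonbonded energies shifted by a different cutoff) that needs investigating.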
And we should be able to say, aha, we get identical energies, or, no, it turns out that it didn't record the cutoff, or you have to set it manually. So number one, we should do that. And number two, the bigger question is: if we do find a difference, how exactly do we approach it, and what level of effort can we put into it? Another question: it looks like it was figuring some things out from the file extension you used, like .gro. How do I know what the list of extensions is? How come it wasn't .gmx or something like that? Yeah, that's all from the ParmEd documentation. I started writing this notebook on Tuesday, and this is the first time I've ever exported a ParmEd system, so it's pretty good documentation if you need to look this up. Workflow number two is quite a bit quicker, which should line up well with the time. Here we're going to look at the expert-user level: a user who's interested in modifying aspects of the force field, or who wants to inspect which parameters are being applied to which molecule. I'm going to start with a quick reminder about the Smirnoff specification. This is an old graphic and there are several problems with it, but it's really good at illustrating exactly how parameters get applied. Over here in this first section, we're looking at the bonds described by a Smirnoff-format force field. Each bond is described by a SMIRKS pattern, which is a pattern that can be matched onto the molecule, followed by the parameters that get applied: the length of the bond and the k value for the bond. Here, this is the blue parameter, and it matches these carbon-hydrogen bonds in the molecule. So this is what our parameters look like. If you were to crack open a file, there have been slight differences since, but you would see big lists of SMIRKS followed by the relevant parameters for bonds, and there are more parameters for torsions.
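When several patterns match the same bond, Smirnoff resolves the overlap by order: the parameter that appears last in the list wins, so generic patterns go first and specific ones later. The toy below mimics that rule. Element pairs with a `*` wildcard stand in for real SMIRKS (real matching needs a cheminformatics toolkit), and the parameter ids and values are made up for illustration.

```python
from typing import List, Optional, Tuple

# Toy stand-in for a SMIRKS pattern: two element symbols, "*" matches any element.
Pattern = Tuple[str, str]


def matches(pattern: Pattern, bond: Tuple[str, str]) -> bool:
    """True if the pattern matches the bond in either direction."""
    def ok(p: str, element: str) -> bool:
        return p == "*" or p == element

    forward = ok(pattern[0], bond[0]) and ok(pattern[1], bond[1])
    backward = ok(pattern[0], bond[1]) and ok(pattern[1], bond[0])
    return forward or backward


def assign(parameters: List[dict], bond: Tuple[str, str]) -> Optional[dict]:
    """Return the last parameter in the list whose pattern matches the bond."""
    chosen = None
    for param in parameters:
        if matches(param["pattern"], bond):
            chosen = param  # a later match overrides any earlier one
    return chosen


# Generic parameter first, more specific one later (values are made up).
bond_parameters = [
    {"id": "b_generic", "pattern": ("*", "*"), "length_angstrom": 1.50, "k": 300.0},
    {"id": "b_ch", "pattern": ("C", "H"), "length_angstrom": 1.09, "k": 680.0},
]
```

With this ordering, a C-H bond picks up `b_ch`, while a C-O bond falls back to `b_generic`.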
I will say, if you crack open the most recent file, we've made a format change so that units are never ambiguous. In the old format the values were unitless, and we just stated what all the units were in the header of the section. That's really scary, and we don't want to do that, because we had already made some mistakes in some hand-transcribed force fields. So instead we make it very unambiguous: you can even have combinations of different units in the same section. You should be able to look at this and know exactly which parameters are being used. Along with this, we had found several things that affect the system energy that we hadn't defined as part of the force field before. Things like the electrostatics cutoff just didn't have a place to be specified in the previous format, but in the new format they do. So anybody who has tried an old script with the new toolkit will have seen this: I'm loading a 0.1-format Smirnoff file, and it's giving me alarm fatigue by explaining, oh my gosh, you didn't specify a 1-3 scaling factor, so I'm going to assume it was this value and make that explicit. We should probably remove these warnings, but the point is that as we've improved the Smirnoff specification, we've become more thorough about exactly which aspects of the system energy we keep track of. We're not going to need that force field; I just wanted to show you those warnings. Great, so we're going to reload the molecule from the first section, and here I'm doing a quick hack. We consider AM1-BCC to be part of our force field, but a lot of people have spent a lot of time working out charges, or they want a molecule to be charged consistently for comparison with some other package. So there is a workaround where you can bring your own charges, define them as a NumPy array or in the text of a file, and bypass AM1-BCC charge generation during system creation.
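With a unit attached to every value, a number can be interpreted without looking anywhere else in the file. A minimal sketch of that idea, assuming values written as `"<value> * <unit>"`; it understands only two length units and is a toy, not the toolkit's own unit handling.

```python
# Conversion factors to nanometers for the few units this sketch understands.
_LENGTH_TO_NM = {"angstrom": 0.1, "nanometer": 1.0}


def parse_length(text: str) -> float:
    """Parse a string such as '1.09 * angstrom' and return the length in nm."""
    value_str, sep, unit = text.partition("*")
    if not sep:
        # A bare number is exactly the ambiguity the explicit-unit format avoids.
        raise ValueError(f"no unit given in {text!r}")
    unit = unit.strip()
    if unit not in _LENGTH_TO_NM:
        raise ValueError(f"unsupported length unit {unit!r}")
    return float(value_str) * _LENGTH_TO_NM[unit]
```

Refusing a bare number, rather than silently assuming a section-wide unit, is the design choice that prevents the hand-transcription mistakes mentioned above.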
So this is all stuff that we had done above; I just had it as a backup in case things failed. Okay, so let's say we want to play around with the ligand. Here's our ligand. It has this hydroxyl on one end of the molecule, and when I play the simulation, this hydroxyl energetically likes to be coplanar with the ring. As an example of changing parameters, what if we wanted to tweak things so that it wants to be perpendicular to the ring? I can iterate through the force field object. Over here I'm loading the most recent version of smirnoff99Frosst, and I'm going to say, hey, tell me which parameters you would apply to this ligand. Then I'm going to walk through the proper torsions, that is, the terms over four sequentially bonded atoms 1-2-3-4. I want you to tell me if atoms 1 and 2 are H and O, or if atoms 3 and 4 are O and H; these are the torsions that could involve the hydroxyl hydrogen. So I go through with this conditional statement, and it prints out: yes, I applied two proper torsions to hydroxyls. There's only one hydroxyl in the molecule, so this must be it. We get the SMIRKS of the pattern that matched the hydroxyl, so we can inspect the reasoning behind why this parameter got applied, and then we see the parameters for the torsion. So I can take the SMIRKS, go to the original force field, and say, give me the handler for your proper torsion parameterization, and then give me the object corresponding to this parameter that got applied. I get an object called hydroxyl torsion, which is the parameter from the file, and then I'm going to mess with it. I want it to be perpendicular, so I'm going to set its periodicity to 2, its phase to 180, and its k value to -10. Now I want to apply these new parameters, tweaked from the original file, to the molecule.
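It is worth checking on paper why these numbers make the hydroxyl perpendicular. A Smirnoff proper torsion has the periodic form U(φ) = k·(1 + cos(n·φ − phase)). The scan below assumes a periodicity of 2 together with the phase of 180 degrees and k of -10 (kcal/mol assumed), and finds the angle of lowest energy. It is a standalone illustration, not toolkit code.

```python
import math


def torsion_energy(phi_deg: float, k: float, periodicity: int, phase_deg: float) -> float:
    """Periodic torsion term U = k * (1 + cos(n*phi - phase))."""
    phi = math.radians(phi_deg)
    phase = math.radians(phase_deg)
    return k * (1.0 + math.cos(periodicity * phi - phase))


def minimum_angle(k: float, periodicity: int, phase_deg: float) -> int:
    """Scan whole degrees 0..359 and return the first angle of lowest energy."""
    return min(range(360), key=lambda a: torsion_energy(a, k, periodicity, phase_deg))
```

With n = 2 and phase = 180°, the term is k·(1 − cos 2φ). A negative k therefore rewards |cos 2φ| = 1 with cos 2φ = −1, i.e. φ = 90° or 270°, which is the perpendicular arrangement. A positive k would instead favor 0° or 180°, the coplanar one.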
I'm going to do that with a little bit of dirty magic, where I just take OpenMM's energy minimizer, apply it to the system, write out a file, and visualize it. And here we go: this terminal hydroxyl. When we minimize the system from the starting geometry corresponding to what we saw in the simulation, where the hydroxyl wanted to be coplanar with the ring, the hydroxyl now ends up perpendicular. That's not special in my mind. You could have gone into ParmEd, or cracked open any system file format and changed one number. So let's do a similar trick, but for all of the H-X-H angle terms. Instead of those being 109.5 degrees, let's dial them down to 50 degrees, just to do something really visible. And sure enough, I've done that. The code was as simple as: give me the object corresponding to this angle term in the force field, and then dial that equilibrium angle down to 50 degrees. Put that through the same parameterization and minimization, and we get this horribly mangled molecule with hydrogens artificially close to each other around these angles. It might have been scientifically more interesting to vary the k value, but you wouldn't have been able to see anything. So I'll come in with a quote from Aperture Science: we do what we must because we can. You would probably never want to do this, but it gives you a sense that you can programmatically iterate through all the terms in a force field and tweak them until you get the geometry you want. Finally, I'm going to show one last cool capability that we put in, which is that we like to yell at people about their input. We like to say, you didn't define stereochemistry, or I can't figure out the bond orders in this molecule. People frequently want to start a simulation from a PDB file, and those are notoriously sketchy about the information they actually encode. So here I'm taking this PDB file.
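The H-X-H edit described above amounts to "find every angle parameter whose end atoms are hydrogens and overwrite its equilibrium angle." A minimal sketch, assuming parameters are plain dicts with `smirks` and `angle_deg` keys (a hypothetical stand-in for the toolkit's parameter objects) and that the SMIRKS are written in the simple `[#1:1]-[...:2]-[#1:3]` form. The regex does not understand general SMIRKS, and the example entries are illustrative, not copied from smirnoff99Frosst.

```python
import re

# Matches SMIRKS written as "[#1:1]-[<anything>:2]-[#1:3]" (bond symbol - or ~),
# i.e. angles whose two outer atoms are hydrogens. Other spellings are missed.
HXH = re.compile(r"^\[#1:1\][-~]\[[^\]]*:2\][-~]\[#1:3\]$")


def squash_hxh_angles(angle_params: list, new_angle_deg: float = 50.0) -> list:
    """Set the equilibrium angle of every H-X-H parameter; return their SMIRKS."""
    changed = []
    for param in angle_params:
        if HXH.match(param["smirks"]):
            param["angle_deg"] = new_angle_deg  # modified in place
            changed.append(param["smirks"])
    return changed


# Hypothetical angle parameters for illustration.
angle_params = [
    {"smirks": "[*:1]~[#6X4:2]-[*:3]", "angle_deg": 109.5},
    {"smirks": "[#1:1]-[#6X4:2]-[#1:3]", "angle_deg": 109.5},
    {"smirks": "[#1:1]-[#8:2]-[#1:3]", "angle_deg": 104.5},
]
```

Because the edit happens on in-memory objects, re-running parameterization and minimization is all that's needed to see the effect; no force field file is rewritten.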
I don't know how to name this molecule, but I'm going to describe it using SMILES and then use that to supplement the basic connectivity information coming from the PDB. So now, without loading an SDF or a MOL2 file, which would have told me the bond orders explicitly, I'm able to pull this molecule in and energy-minimize it with the original parameters. The geometry doesn't change, because it was minimized to begin with. But in a similar way, I can take the substituent groups coming off the ring and say that instead of their equilibrium geometry being coplanar, I want them to be perpendicular, again just because it's visible; you'd never really want this. I can go through and find every proper torsion that ends in O, meaning O in either the first or the fourth position. This returns a whole bunch of them. I don't want to type all of these in manually, but because this is a Python object, I can just capture all of these parameter objects. Then, in the next block, I run through them and set them all to prefer perpendicular instead of coplanar, changing the phase and changing the k value to override any other energetic factors. And now I have programmatically changed the force field to achieve the geometry that I want. So the big takeaways here are that the 0.3 Smirnoff specification update, as I've shown you, gets us a lot closer to completely describing the energy of a system via the contents of a file that defines the parameter assignment strategy. The force field object model is really handy. You can go in and start messing with parameters without having to change the text of a file or anything. You can see how parameters get assigned based on SMIRKS matching and do whatever you want. There wasn't any file input or output in the process of doing this; OpenMM does its energy operations directly from this Python interface.
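Collecting "every proper torsion that ends in O" is a filter over the per-torsion assignments. A minimal sketch, assuming the assignments are available as a dict from atom-index quadruples (i, j, k, l) to the SMIRKS of the applied parameter, plus a list of element symbols. Both structures are hypothetical simplifications of what a labeling step might return, and the SMIRKS strings are illustrative.

```python
def torsions_ending_in_oxygen(elements: list, labeled_torsions: dict) -> list:
    """Return unique SMIRKS of torsions whose first or fourth atom is oxygen.

    elements: element symbol for each atom index.
    labeled_torsions: (i, j, k, l) atom indices -> SMIRKS of the applied parameter.
    Several torsions can share one parameter, so duplicates are dropped
    while keeping first-seen order.
    """
    found = []
    for (i, _j, _k, l), smirks in labeled_torsions.items():
        if "O" in (elements[i], elements[l]) and smirks not in found:
            found.append(smirks)
    return found


# Hypothetical labeled torsions for a small fragment.
elements = ["H", "O", "C", "C", "H", "C", "O"]
labeled_torsions = {
    (0, 1, 2, 3): "[#1:1]-[#8:2]-[#6:3]-[#6:4]",  # H-O-C-C: O is not terminal
    (1, 2, 3, 4): "[#8:1]-[#6:2]-[#6:3]-[#1:4]",
    (1, 2, 3, 5): "[#8:1]-[#6:2]-[#6:3]-[#1:4]",  # same parameter again
    (5, 3, 2, 6): "[#6:1]-[#6:2]-[#6:3]-[#8:4]",
}
```

Deduplicating by SMIRKS matters because the edit is made to the force field parameter, not to each individual torsion: changing one parameter object changes every torsion it was applied to.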
So you never have to drop out and run a bash script or do file I/O; you get this for free. This API, with some performance shortcuts, was used for the first round of parameter optimization, so it's actually being used and is performant enough to help us put out a force field. And even cooler, this gives you a way to bridge cheminformatics and force field science. In the future, we hope to make our own system class that gets around some of these problems. Right now, if I want to tweak a parameter, reparameterize the molecule, and see how the geometry changes, I have to go through the arduous process of doing SMIRKS-based matching over and over again every time I tweak the parameters, but that really shouldn't be necessary. We should be able to remember which SMIRKS got assigned to which topological terms, so that we can reevaluate energies really quickly and have an intermediate system object that enables much more efficient optimization. There are a few things we need to resolve on the way to that, but it's one of our big goals right now, because it's going to enable a lot more force field science. All right, I'm overjoyed that we got through the live demo. I just have a couple more points. One common question: there are all sorts of capabilities that we'll be presenting as work goes on, and you might ask, what do I need for my use case now? If you want to take molecules and just parameterize them using the best force field we can give you, you just need the Open Force Field toolkit, plus whatever machinery you need to get your OpenMM system into the format you want. If you want to perform bespoke parameterization using your own machines, you're going to be waiting a little bit. We have many of the components that will come together into that, but we don't have a dedicated person working on it. I know this is going to come up in my one-on-ones, so I figured it's good to mention now.
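The system class described here is a goal, not an existing API. The hypothetical toy below illustrates the idea: SMIRKS matching (the expensive step) is done once and stored as a mapping from bonded atom pairs to parameter ids, so tweaking a parameter only requires re-evaluating the energy. Only harmonic bonds are included, with the convention U = k/2·(r − r0)², and all names are illustrative.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class CachedBondSystem:
    """Hypothetical system that remembers which parameter applies to each bond."""

    parameters: Dict[str, Dict[str, float]]  # parameter id -> {"k": ..., "length": ...}
    assignment: Dict[Tuple[int, int], str]  # (atom i, atom j) -> parameter id, matched once
    positions: List[Vec3] = field(default_factory=list)

    def energy(self) -> float:
        """Harmonic bond energy, U = k/2 * (r - r0)**2, summed over assigned bonds."""
        total = 0.0
        for (i, j), pid in self.assignment.items():
            p = self.parameters[pid]
            r = math.dist(self.positions[i], self.positions[j])
            total += 0.5 * p["k"] * (r - p["length"]) ** 2
        return total
```

Editing `parameters["b1"]["length"]` and calling `energy()` again touches no pattern matching at all, which is the speedup an optimizer would exploit.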
If you want to use your own machines to make your own force field, in theory you could do that right now, but it would be a big headache to bring together the disparate places where the fitting has been done. One of our goals in the coming timeline is to start merging these and making them more automated, so that should become easier as time goes on. In terms of the pace of toolkit development, the initial release of the RDKit-backed toolkit was on April 7th. We've made several releases since then, at a pace of one every several weeks to a month. The next one's going to be a little delayed, because I spent a lot of time preparing for this, but you can always go to our release history page and see detailed descriptions of everything that's changed. We try to be good about semantic versioning, so we're not going to change the API unless we're incrementing something other than the last version number. So this is a good way to keep up with toolkit progress. If you want to see what's coming in the future, we try to keep things reasonably organized in milestones. 0.5.0 was the most recent toolkit release. Right now we're working on 0.5.1, which is a whole host of bug fixes and things that I found while making this notebook. The next major feature that we're looking to implement is library charges. This would be a SMIRKS-based way to apply charges, but there are some theoretical questions we need to resolve about exactly how to make sure that each molecule's charges add up to an integer net charge. Finally, I want to point out that we are active participants in the open source ecosystem, and this is really exciting. We pushed bug fixes back to OpenMM while we were developing GBSA support, when we realized that some problems in our energy tests weren't our problems, they were theirs.
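To make the integer-charge question concrete: charges pieced together from per-fragment library entries generally won't sum to the molecule's formal charge. The sketch below shows one simple, debatable fix, spreading the residual uniformly over all atoms. It is an illustration of the problem, not the approach the toolkit has chosen (the talk says that choice is still open), and the example charges are made up.

```python
def neutralize_to_integer(charges: list, net_charge: int) -> list:
    """Shift every partial charge equally so the total equals net_charge.

    One naive strategy among several possible ones: the residual is spread
    uniformly, which slightly perturbs every atom, including ones whose
    library charge was already exact.
    """
    if not charges:
        raise ValueError("no charges given")
    residual = net_charge - sum(charges)
    shift = residual / len(charges)
    return [q + shift for q in charges]
```

The drawback is visible in the docstring: a uniform shift changes charges on atoms that had nothing to do with the mismatch, which is one reason the question needs more thought than a one-line fix.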
We've worked a lot with MolSSI on QCArchive, being early friendly users and getting architecture support from them. I had folks from BioSimSpace in my kitchen two days ago, and we were doing a hackathon to see if we can interconvert molecule objects. I hope we can make that easier, because BioSimSpace is a platform that connects a lot of really interesting tools that help you prepare systems and manipulate molecules. And as much as we don't want to become maintainers, we are getting into the nitty-gritty of ParmEd, finding some of the bugs in there, and pushing fixes back upstream. So it's nice that we're making tools for you, but we're also making the other tools that you might be using better at the same time. For the industry folks, I'll be putting up a schedule tomorrow for one-on-ones. During the one-on-ones, I want to make sure that you can get the toolkit running on a computer that you control, and that we can talk about what your workflows need. Maybe your workflows look exactly like this one, but probably they don't. I can tell you about our capabilities, or we can prioritize new capabilities if we don't do what you need. And with that, I'll wrap up. Does anyone have any questions? Just a quick technical question: when you load a ligand from the PDB and then supply the topology with SMILES, do the atoms have to be in the same order, or do you align the graphs? We align the graphs.
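Aligning the graphs means finding an atom mapping between the PDB's connectivity (elements and bonds, no bond orders) and the SMILES-derived molecule, and then copying bond orders across that mapping. The sketch below uses a plain backtracking search as a generic illustration; it is not the toolkit's implementation, and it is exponential in the worst case, so it suits only small molecules. For symmetric molecules several mappings are equally valid, and any one of them gives the same chemistry.

```python
def _adjacency(n, bonds):
    """Neighbour sets for atoms 0..n-1 from a list of (i, j) bonds."""
    adj = [set() for _ in range(n)]
    for i, j in bonds:
        adj[i].add(j)
        adj[j].add(i)
    return adj


def align_graphs(pdb_elements, pdb_bonds, ref_elements, ref_bonds):
    """Map PDB atom indices onto reference (e.g. SMILES-derived) atom indices.

    Requires equal elements and degrees, and every bond to be preserved.
    Returns {pdb_index: ref_index}, or None if the graphs don't match.
    """
    n = len(pdb_elements)
    if n != len(ref_elements) or len(pdb_bonds) != len(ref_bonds):
        return None
    adj_a, adj_b = _adjacency(n, pdb_bonds), _adjacency(n, ref_bonds)
    mapping, used = {}, set()

    def extend(a):
        if a == n:
            return True
        for b in range(n):
            if b in used or pdb_elements[a] != ref_elements[b]:
                continue
            if len(adj_a[a]) != len(adj_b[b]):
                continue
            # Every already-mapped neighbour of a must map to a neighbour of b.
            if any(mapping[x] not in adj_b[b] for x in adj_a[a] if x in mapping):
                continue
            mapping[a] = b
            used.add(b)
            if extend(a + 1):
                return True
            del mapping[a]
            used.discard(b)
        return False

    return dict(mapping) if extend(0) else None


def transfer_bond_orders(mapping, pdb_bonds, ref_bond_orders):
    """Give each PDB bond the order of the reference bond it maps onto."""
    orders = {}
    for i, j in pdb_bonds:
        a, b = mapping[i], mapping[j]
        key = (a, b) if (a, b) in ref_bond_orders else (b, a)
        orders[(i, j)] = ref_bond_orders[key]
    return orders
```

For example, acetaldehyde heavy atoms listed as O, C, C in a PDB file can be aligned with the SMILES `CC=O` (atoms C, C, O), and the C=O double bond then lands on the PDB's first bond.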