All right. Hello everyone, and welcome to Open Force Field's second follow-up workshop. This is going to focus on the use of the new Interchange package, which Matt Thompson is the lead developer for. Before we get started, let me give you a little bit of background on how this is going to be run. Unlike our annual workshop, which was just a presentation of information with not much room for feedback, these follow-up workshops are intended to be an avenue for feedback. And so we have a large Q&A and developer support session at the end. We encourage you to interrupt the speaker if you have a question: you can post it in the chat, or you can just say it verbally when there's a natural pause, and we'll be asking for questions. We ask you to be a little bit patient. Not only will the speaker be executing code, which brings with it all of the joys and sorrows of a live demo, but we would also like you to be executing this code as well. So this is going to be a more technical session. It isn't just going to be a presentation of science, but also getting into the nitty-gritty technical aspects of how to use Open Force Field tools.
This meeting will be recorded for the first portion, the prepared material, which today is a number of Jupyter notebooks that we'll be going through together. Anything you say during that portion, you should assume is public. It's going to be uploaded to YouTube. After the prepared material, we will turn off the recording, and then you can maybe feel a little bit more free during Q&A, but at no point in this meeting is there an expectation of privacy. Open Force Field does not sign non-disclosure agreements. And so, you know, you may be trying this out on your own data, which we strongly encourage you to do, and that may be project data or something.
You can try to contact us in a more private venue, but again, nothing here is going to be super private. So basically, assume that this whole meeting is a public space.
The general plan for today is that we're going to have one to two hours of prepared material presented by Matt, then an extended discussion and Q&A period, and then a developer support session. If you're having trouble setting things up on your own computer, an Open Force Field developer can hop into a separate Zoom call with you and get it going, and then you can come back into the main room. And if you try to put in your data and you get a new and exciting error message, or there's something fundamental that you need that Open Force Field doesn't provide, this latter portion of today's meeting is going to help us collect use cases so we can make tools that are more useful for you.
On the Confluence page for today's workshop (Matt, could you paste the Confluence page? I've somehow lost it) we have materials for today prepared, and we have install instructions so that you can get this running on your own computer. All that's required is the conda user-space package manager. So please, if you haven't already, go ahead and begin executing the instructions on that page. Today you'll be checking out a Git repository and building a conda environment that will let you follow along. If for any reason your computer is unable to get everything running the same way that you see the presenter, Matt, showing it, we have Binder links available where you'll get a free cloud instance with all of the dependencies installed. And unlike yesterday's workshop, for people who were there, if you do have trouble installing, just use the Binder for the prepared-material part of the session and save the install troubleshooting for the developer support part later on.
An Open Force Field developer will help you get everything set up on your own machine. And then finally, there is one additional advanced notebook that we've added to the materials that won't be in the Binder package, so we may or may not get to it. We'll see how the timing is going. It's an advanced notebook showing how to use virtual sites in Interchange, and whether or not we get to it, we will update the Binder link after this meeting to contain that additional notebook. So, does anybody have any questions about the running of this workshop? If not, I'll turn the floor over to our senior software scientist, Matt Thompson. He's the lead developer of Interchange, and he's been with Open Force Field for a number of years now, doing a lot of the critical work that keeps things running and installing. So, Matt, if you like, please take it away.
Thanks. Excuse me. Thanks, Jeff, for the introduction. I will show off my technical prowess by picking the right window to display as I screen share. Do you see my Confluence page here, or do you see something else?
Yep, we see your Confluence page.
Okay, cool. So I am just on this Confluence page that we've distributed, and again, the link is in the chat. At the bottom, there's a link to a GitHub repository that includes all of the materials I've prepared for this workshop. One of them is the README, which serves as the structure and loose agenda for today. These are the installation instructions again, but the stuff we'll actually be covering is mostly high-level. Our objectives include summarizing the key objectives of the project and painting around the boundaries of what the current scope of supported functionality is. What else we cover depends on the time we have left and user interest.
There are some more experimental features that are a little bit more forward-facing that we might have time to get to, but those are far from reliable enough for use in production. In the meantime, we'll spend our time going through the high-level API of Interchange. Believe it or not, in about two hours I won't really have time to get into the nitty-gritty of the internals. But depending on how much people want to hook into those internals for their own work, that's certainly something I'd be happy to work on with people outside of the workshop. So those are the high-level objectives.
Because there's a good bit of stuff to go through, I don't think it would be wise to put everything into one notebook. So, unfortunately, things are scattered across a bunch of notebooks, but I think it makes each one of them a little bit more digestible. Those are the notebooks, mostly in the order we're going to go through them. The first thing we're going to go through is units, which is not specific to Interchange; it applies to the entire stack. Then we're going to talk a little bit about the objectives and design. We're going to make some Interchange objects, and we're going to show how we can export them to different objects and file formats that are used in the simulation engines that you use. And then we go a little bit out of order and go through the protein-ligand example, which takes a mostly prepared system of a docked ligand in a protein used in some protein-ligand benchmarks. I believe the files were used for benchmarking Parsley, and possibly Sage as well. We'll go through that and show how we can go from those input files to OpenMM and other engines using just Open Force Field tools, mostly the Toolkit and Interchange.
Then we take a step back and go to a notebook where we explore just a little bit of the internals, specifically the potential handlers. And then, depending on the time left and what people want to see, we might do a little bit with virtual sites, or at that point we might go into the last developer hour. So that's a rough picture of what we will be going through today. I believe I shared my desktop and not a window, so you should see my terminal now.
Yep.
Cool. Okay, so I'm going to assume that we have made our conda environments by now. I did the same thing as well, and we will want to activate the environment. I hope I spelled this right. Okay, cool. And from here... oh, sorry, I need to go into the repo. Here, I will pull up not a specific notebook, but if I just point it to the notebooks directory, it'll pull up the directory. And from here, I'll open each of the notebooks one by one.
So I believe I said the first thing I was going to go into is units. It's a design decision: Interchange specifically, but also a lot of the rest of the OpenFF software, tries to tag things that might have physical meaning with units. So instead of just being numbers like floats or NumPy arrays, these are unit-wrapped objects called quantities. There are a few unit packages out there. We provide our own, which happens to be a pretty thin wrapper around Pint with just a couple of tweaks to make our stuff easier and a little bit more friendly to computational chemistry. Most of that package should behave similarly to Pint, if you're familiar with it or have used it in other packages. For basic usage, we'll want to import this unit. This is technically an object, but it behaves like a namespace. You'll use from openff.units import unit a lot in your code.
You make these quantities with the Quantity constructor, which for almost all uses will just take the actual values you're wrapping and then a unit object. You pass those to the constructor and you get the object out. Like I said, this unit thing acts like a namespace and has a bunch of units dangling off of it. If it's a commonly used unit, you can type unit-dot, start typing it out, and tab-complete, and it will probably be there. So here this is just nanometers; you don't need to import anything more than the unit. As a shorthand, you can also just multiply the number by the unit object, and this cell just demonstrates that these are the same thing.
Some of the other things you'll probably want to do with them are convert them to a different unit, and there are a few different use cases for this. If you use quantity.to and pass it a different unit, it will convert that quantity into a different quantity. It will not do it in place, so here, if I really needed that quantity to be in the new unit, I would either need to store it back into itself or create a different object in memory. If you want to talk to something that does not want units associated with the values, like if you're writing to some file format and you just need the number and want to trust that you have the right units, there is .magnitude, which is a property, or a property-like thing, I suppose. That will just spit out whatever the number is; it effectively strips the unit and does not do anything particularly smart or special around that, so it's very, very fast.
If you want to be sure that you are writing something out with particular units, you can use m_as and pass it a unit, and it will do the conversion and then strip the unit out. So in this case you can see that if I just do .magnitude, or its alias .m, I just get the number that I had, with no knowledge of the units, but if I want to convert it to angstroms and strip the units out, I can do that. And just to show it works, if you want to do m_as and ensure that it's still nanometers, just in case you did something earlier in your code, that works as expected as well.
We often find that serialization of our objects is very important: serializing force fields, molecules, topologies, and so on. Pint does a little bit of magic to make this work pretty well out of the box. You can call the built-in str function on any quantity object and it will give out a string with a pretty sane representation. This is one that's obviously human-readable, but conveniently it's also machine-readable, or at least Pint-readable, so I can pass that string directly back to the Quantity constructor and it will parse the value, the units, and all that. This is a very small example, but this serialization round trip is basically what we do at scale for these thousands-of-lines-long force field files, for example.
Okay, so you might have some existing code that uses OpenMM, and OpenMM has its own units package, which works great, and we want to make sure that we offer interoperability between those. So we have a small submodule here that offers these two functions, from_openmm and to_openmm. They do what I hope it seems like they would do, what it says on the tin, just based on the names: they convert OpenMM quantities to OpenFF quantities and OpenFF quantities to OpenMM quantities.
I make a quantity object of 24 meters, and I can convert it to an OpenMM quantity that should be analogous in meaning. You can see here that when I print it out, it still appears to be 24 meters, and it's an OpenMM quantity. Then I can go in the reverse direction and call from_openmm on that, and I get out just what I started with, which is hopefully not too surprising, but that's what we expect out of a round trip.
Depending on what libraries you're talking to and what versions of them, there might be some cases where you think, I'm not sure if this positions vector is in OpenFF units or OpenMM units. You can do an isinstance check, or check the type of it, and have some branching logic from there. I've spent a good bit of time writing that code over and over again, so I wanted to provide a shortcut for it, and that's in the form of this ensure_quantity function. It takes two arguments: one is a quantity where you're not quite sure if it's OpenMM or OpenFF, or potentially in the future some other unit package, and the other is just a string saying which one you want it to be. Separately, you can use the question mark operator in Jupyter notebooks to pull up the docstring help of a function or a class, if one is written. It works the same if you do question-mark-thing or thing-question-mark. In this case, I guess I didn't write a docstring, but it pulls out the function signature, which I think should give you about as much information as you need. These next two cells demonstrate what happens if you call this function passing these two different arguments. You can see that if I tell it to ensure that it's an OpenFF quantity, it does that, and the same thing with OpenMM.
For cases in which you would effectively convert to what it already is, it makes an effort to short-circuit out of there. So if something's already an OpenMM quantity and you want it to be an OpenMM quantity, it won't do any round-tripping; it will just recognize that and short-circuit out.
A last couple of things about OpenMM. OpenMM sometimes provides array-like things as kind of weird objects, I'm assuming for reasons that relate to making stuff work faster on GPUs or something. Something I find myself encountering a lot is that positions from, say, a PDB file object are a list of these Vec3 objects, and these are in OpenMM units. The logic inside this from_openmm function is meant to handle that, so you can see that if I convert from the OpenMM object to an OpenFF quantity, I get a wrapped NumPy array, which hopefully is something that people here are pretty familiar with.
I spent all this time talking about this because it's not something that's just used in Interchange; as of the most recent version, it's used throughout the Toolkit as well. So if we pull up the Toolkit, we ensure that we're on version 0.11 or newer, and we do something that probably everybody in the audience does all day long: we make a molecule object from SMILES. We can see, cool, we have a molecule that looks like a quiz from O-chem 1, where you have to pull out the name of it. Say we want to generate conformers for it and assign partial charges to it. Those objects that are carried along with the molecule are now both OpenFF unit quantities. The conformers attribute is a list of them, just to ensure some partial compatibility with how things worked in the past, because that's a list of conformers, but you can see this is using this units solution.
So, any questions about the units before we move on to the next notebook?
Hey Matt, this isn't directly about the units, but I'm curious, can you say a little bit more about how the conformers are generated?
For stuff like conformer generation and partial charge assignment, a lot of the Molecule API internally calls these toolkit wrappers, which are classes that let the molecule objects call RDKit or the OpenEye toolkits, or some other toolkits as well, to do the heavy lifting. Things can get slightly more involved if you have an OpenEye license, but for this workshop we aren't installing OpenEye or dealing with the licenses there. So what is in these environments, and what's in all of your conda environments, are RDKit and AmberTools. If memory serves, the default behavior here is for the conformers to be generated by RDKit, with I think RDKit's EmbedMultipleConfs or something like that, using some sensible defaults. And then I believe the AM1-BCC charge assignment happens with AmberTools; I think sqm is the program that does it. There's a lot more that we could talk about with that, I suppose, but the short of it is that this is all wrapping other cheminformatics toolkits.
Okay, got it. Thanks so much.
Yeah. And one thing that I'll add is that we highly prioritize interoperability of our molecule objects with RDKit and OpenEye. If anybody was at the workshop yesterday, we had to connect two molecules using a new bond, and we just sent it off to RDKit and did it over there.
And this is why we don't have a lot of more flexible functionality in our own API: we want to encourage people to take an OFFMol, go to RDKit, modify it in strange and interesting ways, and then just bring it back to an OFFMol when they're ready to do physics with it. I think I saw Chapin unmute for a sec, so he might have a question.
I have a question, Matt. The unit namespace is shared between OpenFF units and OpenMM units. So is there an easy way to import both of those into the same Python session?
Yeah, good point. What Chapin is saying is that you'll see a lot of existing code that imports this unit namespace, or unit thing, whatever it is, from OpenMM, and what I'm telling people to use in the OpenFF stack is the same name imported from openff.units. There's a collision here, so personally I just use an alias. Since I mostly want things in OpenFF units, I import the OpenMM one under an alias, something like this. And then, if I know how to type, it looks a little bit funny, but you can work with this just like you were doing before. Does that answer your question, Chapin?
Yeah, thank you.
Okay, yeah, there's definitely a potential for that.
Okay, now let's move on to the objectives notebook. This has the least code of any notebook: zero code. This is setting the stage for the actual aims of the project itself. Interchange is a Python package provided by the Open Force Field Initiative, with the high-level objectives being storing, manipulating, and converting molecular mechanics data. There's a core class of the same name. It does the storage, it provides a bunch of methods, both high-level and low-level, to do manipulation, and it also has some high-level functionality for exporting things out to different engines and formats. Its existence is in large part to provide a discrete state that results from doing your typing.
That's before you go into anything that's, you know, OpenMM-specific. Just to continue on the OpenMM usage example: if you've spent time with OpenMM, you know it has a great API for doing all sorts of stuff, but one of the tricky things is that you're dealing with a bunch of objects. You have your positions, you have your topology, you'll have a system, and then eventually you'll have your simulation and your context and all that other stuff as well. If you're doing a lot of stuff at once, or maybe some experiments, it can get a little bit difficult to keep track of everything and how everything is associated with each other. So one of the things that Interchange does is provide a single object that serves as a container for all of those, and the next couple of sections go through each of those individual components.
The focus in general, at the moment, is applying SMIRNOFF-style force fields to chemical topologies: a group of molecules, including potentially just one molecule if you're doing gas-phase stuff. And then we also want to export out to engines. The current state, at a very high level, is that we support OpenMM, I would say, pretty well. The other things, GROMACS and Amber, are a little bit more in development, depending on the use case, and then we also have a little bit of functionality for LAMMPS, depending on how much people are interested in it. To very broadly give a picture of where the current functionality is: straightforward physics calculations are well supported for all these engines. Stuff like polarizability is not supported anywhere, nor is it supported by the SMIRNOFF spec at the moment. And then there's the in-between stuff, you know, how will this protein-ligand system work in this engine?
Stuff like that varies engine to engine. Later today we'll be going through a protein-ligand example in OpenMM, for example; I would doubt that works in the current LAMMPS export. Also, CHARMM is not something we have support for right now, but depending on the interest of our user base, that's something that we could provide.
Okay. So, the key components of an Interchange object are some representation of the physics that results from applying the force field to the topology, and it also stores box vectors, positions, velocities, and the topology itself, this chemical topology. For most users, this is the bridge between loading up a molecule data set and a SMIRNOFF force field into the Toolkit and then finding a way to use those in your engine of choice. And yeah, this diagram unfortunately cuts off the words Toolkit and Interchange in ways that I couldn't find out how to fix.
This is a handful of different user flows. You may have an OFFXML, or SMIRNOFF-style, force field. You load that into the Toolkit and you get this ForceField object. Analogously, you'll end up with a Molecule object in the Toolkit, or potentially a number of Molecule objects if you want something that's condensed phase. You can come from a SMILES string, an SDF file, a PDB file, a bunch of different things that are all pretty well documented in the Toolkit documentation. If you're unfamiliar with that, there's a nice cookbook for getting stuff into molecules that I'd encourage you to take a look at. And for those, there's Interchange.from_smirnoff, which takes those two objects and does everything with them. Optionally, there are also the box vectors, positions, and velocities; two out of those three can be inferred from the topology object that's passed to from_smirnoff.
That works if there are positions on the molecules and if the topology has box vectors, but you can also just set those on the Interchange object as well. And then once you have that state created, from there you can do manipulation and inspection, but you probably just want to get out to running some simulations, and there are a few high-level functions that handle that directly. So that's the end of this notebook. Are there any questions on this before we move on to the next one?
I might add: Matt mentioned that we have these different exporters on the right; we can go to Amber or OpenMM or GROMACS. And like he said, we have different levels of confidence in each; on this figure it's indicated almost as dotted lines. In terms of our confidence about exporting to OpenMM, it's very, very high for all of our released force fields. We actually did a little sleight of hand in the last release. Folks who use the Open Force Field Toolkit in their workflows have seen the create_openmm_system method, and actually all of it now runs through Interchange. So when we say that we're fairly confident in our export to OpenMM, we're so confident that the Toolkit, when you run create_openmm_system, just makes an Interchange and then converts that to an OpenMM system. So, yeah. And Matt, I wonder, do you want to mention the from_openmm method here?
That's a good point. So, I was talking all this about how Interchange tracks internal state, but I only talked about one way to get there. That's partially because the motivation of this is to help get OpenFF stuff into other engines. But from_smirnoff is not the only way to create an Interchange object, and we hope that in the future there will be a bunch of different methods that do this. There is from_foyer, for one.
That takes in, I believe, a Foyer force field and an OpenFF topology, along with some helper functions to interoperate between OpenFF topologies and mBuild compounds. And there's also from_openmm, which takes in populated OpenMM objects, at the very least a system and a topology, and creates an Interchange from that. You may wonder why you would actually want to do that, but there are a lot of interoperability hurdles and pathways that make that useful in some cases. OpenMM has a ton of interoperability and a bunch of file loaders. And there's also openmmforcefields out there, which provides a nice way to get a version of GAFF and espaloma, and I think a couple of other force fields as well, into parametrized systems, but only into OpenMM. So from_openmm is another one; it's a little experimental and we may or may not have time to get to it later today. But the key point here is that there are going to be a bunch of different from-X methods to create these Interchange objects.
All right, thank you. If there are no more questions... let's see, we're at 40 past, so we still have a little bit of time before the first break. I believe the next notebook... let me check that one. Okay, cool. All right, so we're going to get back to actually working with some code, and, nope, that's not that one. So I am in the construction notebook, and what we're going to be talking about here is mostly actually using Interchange.from_smirnoff. For reasons that I'm in the process of fixing right now, the actual import of Interchange takes a few seconds, which I think is not good, but I'm working on fixing it. Anyway, once you have that imported, we can check out the docstring for from_smirnoff. There's a lot of complexity here, and unfortunately the type signatures right now are a little bit hard to read, but the important bit is that the required arguments are force_field and topology.
The force field object is a SMIRNOFF force field, using the ForceField object provided by the Toolkit. The topology is ideally a Topology, but we also allow you to pass in a list of molecules. In that case, the first thing that happens is that the list is converted to a Topology using the Toolkit API. There are some other optional arguments that are there largely for compatibility with existing Toolkit behavior. Those are a little bit less important, so we're not going to go through them today. But again, the main point here is force field and topology, and that's what you need. We're going to ignore this OpenEye warning.
When you call it, internally it will use the Toolkit's SMARTS matching logic, in the same way the Toolkit has been doing for, I suppose, several years now. It does this by directly calling the same Toolkit APIs that were called before. If you recall, one of the things that makes SMIRNOFF a little bit unique is the use of direct chemical perception, which has a lot of cool features and effects that the actual scientists will do a better job of explaining than me. One of the consequences is that there's no explicit concept of an atom type, which you might be familiar with if you've used basically any other MD engine out there. By extension, atom types don't exist as very important first-class things in Interchange. There are ways to bring them in for compatibility with other typing schemes, but I probably won't get into that today.
If you're working with Interchange a lot, what you will probably find is that, as is pretty common when working with MD stuff, a lot of your time goes into system preparation, and a lot of the time here will still be system preparation. A little bit less than before.
And of the time you do spend preparing your inputs, most of it will be getting things into structures that the Toolkit understands. Because once you're there, once you have your force field and topology, it's the same method that you would call every time.
So, okay, let's keep going. This block of code does, I think, just what we did a notebook or two ago: we load a molecule from SMILES. I don't know the name of this molecule, but it looks vaguely drug-like, let's say. For reasons that will become clear later, I want to get two conformers on this molecule, so I call generate_conformers and tell it, please give me two conformers. And because this molecule is big enough to have two chemically distinct conformers, it has those. The default representation here for something with multiple conformers displays them like an MD trajectory; this is using NGLview, I believe. So we can see, cool, we have our two different conformers. Then I want to make a topology from this molecule, and then I have one of the two inputs I need. The other one is a force field. This is a small molecule, so let's just use a small-molecule force field; let's load up Sage.
And yes, this will be a time. Currently Interchange implements everything that's in the SMIRNOFF specification except for GBSA models, which we could prioritize if that is important enough to enough people. The nice thing about the current state of support is that, because almost all of the SMIRNOFF spec is implemented, new force fields that are developed using the current specification are de facto supported. There are a handful of features that show a lot of promise for various scientific reasons but, for other reasons, have not made it into mainline OpenFF force fields.
Good examples of this are virtual sites and WBO-based valence parameter interpolation. These are features where you could write a force field that has them in it, and Interchange supports those. As long as you follow the specification, you can expect Interchange to understand it. It's not a SMIRNOFF section per se, but everything is in place for biomolecule support. And so this is all to say that once any of these features finds its way into Sage or into Rosemary (I believe after Rosemary is Thyme), or if you want to write your own SMIRNOFF force field, as long as you follow the SMIRNOFF spec, these are supported by Interchange. That would also include a bespoke fit. If you want to refit torsions for a specific ligand, the result of doing that is that you actually create a separate force field, and you can just load that up into Interchange just like before. And I believe there's a bespoke workshop in, I believe, a couple of weeks that I think will be of interest to a lot of people. Okay, enough of that. We can do what I said we can do and call Interchange.from_smirnoff, pass it our force field, pass it our topology, and we get this Interchange object out. The repr just displays a very small summary of information: I have some potential handlers; I did not specify periodicity on my topology or pass in a box argument to from_smirnoff, so it does not know anything about periodicity, so it is non-periodic; and in my topology I have 23 atoms. Okay, let's actually go through some of, sorry, before we do that, let's hit this point that Jeff mentioned earlier. You may be familiar with using ForceField.create_openmm_system. This cell is a little bit of an update. If you're familiar with it, ForceField.create_openmm_system takes in a topology, like Jeff said earlier.
Now, a lot of that heavy lifting that was previously done by the toolkit is done by Interchange, so when you call ForceField.create_openmm_system, internally an Interchange object is created and then it is converted out to OpenMM. So these two lines of code do basically what the toolkit does now when you call it. Oh, sorry, there are two similar points here that can be made. One of them is the point Jeff made about what happens internally in the toolkit. The other is that, if you have a force field and a topology, you can call Interchange.from_smirnoff, or you can also call ForceField.create_interchange, and those will have the same effect: they combine the force field and the topology into an Interchange object. Okay, let's step through some of these components now. There are five components, zero of which are required, but at least two of which you're probably going to want to exist. We have a topology, which stores chemical information independently of the force field. We have handlers, or potential handlers as they're often called internally. These map the force field parameters onto what is actually applied, what is actually stored in the final systems that you run. The force field is not stored directly anywhere in Interchange; what is stored is the result of applying a force field to a topology. We also have positions and velocities, which are positions and velocities, and then we also have box vectors, which is just the information about the periodicity. Like I said, none of these are required. You can create an empty Interchange, but I'm not totally sure why you would do that; there's no information there, so I don't know what you could do with it. The topology attribute directly carries a toolkit Topology. So this does not act like a toolkit topology, it just is one.
The API and the functionality are the same if you do interchange.topology dot whatever, compared to if you call that on the topology that you probably prepared beforehand. So, for example, in a topology we may be curious about the number of atoms, the number of bonds, or the SMILES that represents the first molecule. The SMILES of the first atom is probably not very useful. In the future, there may be a separate way of doing this in Interchange; we may not always use the toolkit's Topology object, but for now it's worked great. I also have these potential handlers, which are stored as a dictionary mapping between a string that identifies effectively the type and a potential handler object, of which here you see a bunch of subclasses. The exploration notebook later goes into a lot more detail about these, because they're a little more complicated than I can get to in just a couple of cells. So we'll go a little deeper into those later. And then we also carry positions along. These are just OpenFF quantities. Interchange has a little bit of logic inside of from_smirnoff to read from each of the molecules whether they have conformers. If all the molecules have conformers, Interchange assumes that those are the positions you want things to have. So, in this case, the positions of the Interchange object are the same as the zeroth conformer of the molecule. For what it's worth, by convention, for most use cases of the toolkit it will be the zeroth, or first, conformer that's read. Just be aware of that if you have a lot of conformers and it's really important which one is actually used. And yeah, this just shows that I have an OpenFF quantity, the same object we were talking about a few minutes ago.
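The conformer rule described above can be sketched in a few lines. This is a toy model in plain Python, not Interchange's actual implementation: the function name infer_positions and the nested-list data layout are assumptions made for illustration, and units are left out entirely.

```python
from typing import List, Optional, Tuple

Coordinate = Tuple[float, float, float]   # one atom: (x, y, z)
Conformer = List[Coordinate]              # one conformer: one row per atom


def infer_positions(molecule_conformers: List[List[Conformer]]) -> Optional[Conformer]:
    """Stack the zeroth conformer of every molecule, or return None.

    molecule_conformers[i] is the (possibly empty) list of conformers of molecule i.
    """
    # If any molecule lacks conformers, there is no complete set of positions.
    if any(len(conformers) == 0 for conformers in molecule_conformers):
        return None
    positions: Conformer = []
    for conformers in molecule_conformers:
        positions.extend(conformers[0])   # by convention, the zeroth conformer is used
    return positions


# Hypothetical two-atom molecule with two conformers (coordinates are made up).
conformer_0 = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
conformer_1 = [(0.0, 0.0, 0.0), (0.0, 0.1, 0.0)]

with_conformers = infer_positions([[conformer_0, conformer_1]])
without_conformers = infer_positions([[conformer_0, conformer_1], []])
```

In the first call every molecule has conformers, so positions come back; in the second, one molecule has none, so the result is None and positions would have to be set later.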
And these positions look like they certainly could be atomic positions for this molecule. If you want to not use the zeroth conformer, you don't have to; you just have to set the positions again. The setter should take anything that is array-like and of the shape number of atoms by three, as you would expect. So this code just demonstrates how we can take the second conformer in the molecule and pass it to the positions setter, and this is immediately updated in the Interchange object. If you wanted to, you could look at, I'm not 100% sure about the, okay, so the first atom has slightly different positions. Switching the conformers is maybe not the most interesting thing in the world, but if you're doing some more involved system preparation, where you have all of your molecules, so you have a big protein-ligand complex, and then you have some ions and all this water and stuff, you can, if you want, prepare your positions with some external tool like Packmol or gmx solvate or something like that. You can create the Interchange object without really caring about the positions, because you can just go and set those positions later. Okay. Lastly, there is also a box object. It can be None if the system is non-periodic, but if the system is periodic it's expected to be, I believe, a three-by-three matrix that represents the periodicity. And again, from_smirnoff makes some effort to infer things based on the information it's provided. So, like I said earlier, we did not set box vectors on our topology. If we wanted to, we could have, but let's just say this is a gas-phase system that we care about, so we don't have box vectors, and we can see that the resulting Interchange object does not have box vectors. However, if we change our mind later, we can just set those, pretty similar to setting the positions.
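To make the "number of atoms by three" requirement concrete, here is a minimal sketch of a setter that checks shape before accepting new positions. ToySystem is a hypothetical stand-in written for this illustration; it is not the Interchange class, and the real object works with unit-bearing quantities rather than bare floats.

```python
from typing import List, Optional, Sequence


class ToySystem:
    """Hypothetical container; not the real Interchange class."""

    def __init__(self, n_atoms: int) -> None:
        self.n_atoms = n_atoms
        self._positions: Optional[List[List[float]]] = None

    @property
    def positions(self) -> Optional[List[List[float]]]:
        return self._positions

    @positions.setter
    def positions(self, value: Sequence[Sequence[float]]) -> None:
        # Accept anything array-like, but insist on shape (n_atoms, 3).
        rows = [list(map(float, row)) for row in value]
        if len(rows) != self.n_atoms or any(len(row) != 3 for row in rows):
            raise ValueError(f"expected shape ({self.n_atoms}, 3)")
        self._positions = rows


system = ToySystem(n_atoms=2)
system.positions = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]   # accepted

try:
    system.positions = [(0.0, 0.0, 0.0)]                # wrong number of rows
    rejected = False
except ValueError:
    rejected = True
```

A box setter could follow the same pattern with a three-by-three check, which is the shape mentioned above for periodic systems.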
And then finally, if you want, you can set velocities, just like setting positions. I can't imagine this having fundamentally different behavior than positions, but it's also not been tested a lot. I think in some cases, if you're doing some really involved free energy stuff or some other involved science, you may want to explicitly store the velocities of all of your atoms for tracking whatever you're doing. Okay, that's the end of that notebook. We're coming up on the top of the hour again, so this is a good time to take a break, I think, but maybe we should open the floor for questions here, Jeff. I'll open the floor for questions real fast, and actually I'll make one comment, which is about this behavior where an Interchange by default has coordinates if all of the molecules in the topology had coordinates. This is a big change from what we used to do. To everyone who was told, "Open Force Field, it's easy: just go load your SDF, run create_openmm_system, and simulate," and then tried to simulate the OpenMM system and learned the painful lesson that it has no idea what its coordinates are, despite the fact that you just loaded the 3D molecule: this behavior in Interchange, where it's made with positions built in as long as every molecule in the topology had some positions that it could take, should make your life way, way easier. Yeah, so many of our examples have weird position arrays floating around that we have to keep track of separately, and hopefully those days are behind us now that we're converting over to Interchange. Great. Does anybody else have questions or comments about the construction notebook? Okay. If not, Matt will queue up the next notebook and we'll take the first of our breaks. Matt, do you think a five-minute break would be good? Yeah, sure. I just wanted to get a cup of water. Okay, great.
We'll take a five-minute break now, and we will see you back at five past the hour. Thank you very much, everyone. All right. Thanks, everyone; we'll go ahead and resume momentarily. Let me turn the screen share back over to Matt, and we'll move on to our next notebook. Matt, could you review the table of contents for today as you get back into it? Yeah, all right. So far we've gone over a lot of background: we've talked about units and some objectives of the project. We have made some Interchange objects, and next we are going to do some exports, which should be one of the shorter notebooks. After this we'll actually do the more complicated work of running a protein-ligand example. Then we'll look a little bit into the internals, and then, depending on time, we might do virtual sites or we might do something else. Okay. So, I've talked a lot by now about how the goal is to store a lot of information, find ways to get information in, and track internal state in some way. But that doesn't actually solve your problems. We need to end up talking to MD engines and different file formats and stuff. So Interchange offers a lot of high-level APIs for doing these conversions, and we're just going to briefly step through some of these. In my first block of code, I make a biphenyl molecule, which is, I think, one of the molecules OpenFF likes, for reasons that the scientists would do a better job of explaining than me. I load up Sage, and then I call from_smirnoff. You may see, because I'm kind of lazy sometimes, that I pass in a molecule as molecule.to_topology(). Like I was saying earlier, this will do the same thing as if I just pass it a list of molecules, a list of one; those will do the same thing. Okay, on to some of the exports. One is a PDB writer.
If I remember correctly, this uses OpenMM's PDB writer. In general, especially with stuff like PDBs that are kind of complicated and have a few sharp edges around them, I'm not super interested in making the world's hundredth native PDB parser. So for this, and in some of these other cases, we'll rely on external tools. Let me see if I can show that this did something sensible. It looks the same because it is basically the same information, but here's the NGLView representation of that molecule. Oh, okay, that's exactly what I had prepared. All right. If we want to talk to GROMACS, we have .to_gro and .to_top, which write out the .gro coordinate file and the topology file, respectively. In this case, we're getting a warning due to a current quirk of GROMACS not explicitly supporting vacuum simulations or gas-phase simulations. I think for most MD settings it's basically the same if you just pass it a really big box. So, in this case, assuming you might be doing single-molecule stuff, Interchange will actually just set a somewhat arbitrary five-nanometer box. You can work around this just by setting a box yourself if five nanometers is too small for whatever reason. But if this causes issues, of course, please reach out and we'll work through them. And again, here I'm just showing with NGLView that the file has what you would hope it has. I may have shown this off a little bit earlier, but there's a to_openmm function. I guess there are actually a few, but the high-level one on Interchange creates an OpenMM System.
Those of you who are really familiar with OpenMM know that it's very, very flexible, which is one of its features, but it can also lead to some complexities, because for most things there are a handful of different ways to get something done. Simulating molecules is no exception to that. So, for compatibility with things that aren't the simplest NonbondedForce use case, I wrote to_openmm to actually export a handful of custom nonbonded forces. This splits out the van der Waals and electrostatic interactions into separate forces, and a quirky consequence of that is that the 1-4 interactions need to be added back into the system by custom bond forces. This is something that you can turn off if you don't prefer it. Based on discussions I've had with OpenMM developers, it should not have any impact on performance. But maybe you're using some tool that expects a NonbondedForce instead of all this other stuff, so there's an argument that controls that. And if I recall correctly, this argument is set to true when this is called inside the toolkit. I would not say that one is better than the other; I think they're both fine. But the toolkit, since its inception, or since it had this functionality, always prepared its forces in this way, so that's why it's there in the toolkit: just for maintaining a little bit of similarity with previous behavior. So that's that. The Amber exports are very similar to the GROMACS exports. I'm less experienced with Amber than I am with OpenMM or GROMACS, but as I understand it, these two files are somewhat analogous to the two GROMACS files.
And this does the exports accordingly. You can also write out LAMMPS. Like I said earlier, LAMMPS is comparatively less used by our audience and our partners, and I don't believe anybody in OpenFF uses it internally, so this is less battle-tested than the other ones, but it can write out a LAMMPS data file. And then finally, this is not MD per se, but there's a lot of interest from machine-learning-adjacent groups in getting vectorized representations of these parameters out, so we have APIs for that. One detail that becomes pretty clear if you spend a little time with this is that, in other things, like doing deep learning on a photo or a video, you can have everything in one representation. That idea does not map very well onto parameterized MM systems. I guess just take my word for it, because I'm not finding the right way to express this, but it becomes a lot simpler if you split this all out, rather than getting one matrix out for all of the physics parameters at once. It's a lot easier to get them handler by handler: one for the bonds, one for the torsions, one for the van der Waals, and so on. And for each of these there are two separate ones. One gets you the force field parameters, and the other gets you what's called the system parameters. In this case, for the bonds, biphenyl has three unique bond parameters using Sage. My guess is these are carbon-carbon in a ring, carbon-carbon between rings, and carbon-hydrogen. And these are, I believe, structured in this case as force constant, then equilibrium bond length. That's why this happens to have two columns. Different handlers will have different shapes, but the number of rows here is the number of unique parameters in the force field.
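The relationship between the two matrices can be sketched with plain lists. The numbers below are placeholders, not Sage's actual parameters, and the assignment of rows to bond types follows the speaker's guess; only the shapes are the point. Biphenyl (C12H10) has 12 in-ring carbon-carbon bonds, 1 carbon-carbon bond between the rings, and 10 carbon-hydrogen bonds, for 23 bonds in total.

```python
# Force field parameters: one row per unique bond parameter,
# columns are (force constant, equilibrium bond length).
# All values are hypothetical placeholders with units stripped.
force_field_parameters = [
    [700.0, 1.39],   # row 0: carbon-carbon within a ring
    [600.0, 1.48],   # row 1: carbon-carbon between the rings
    [800.0, 1.09],   # row 2: carbon-hydrogen
]

# Which parameter row each of biphenyl's bonds is assigned (assumed ordering).
bond_to_parameter_row = [0] * 12 + [1] * 1 + [2] * 10

# System parameters: the same columns, expanded to one row per bond.
system_parameters = [
    list(force_field_parameters[row]) for row in bond_to_parameter_row
]

n_unique, n_bonds = len(force_field_parameters), len(system_parameters)
```

Here the force field matrix is 3 by 2 and the system matrix is 23 by 2; a handler with a different functional form would have a different number of columns.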
And then these are expanded out into the system parameters. Here the number of rows is going to be equal to the number of bonds. Like I said, if I do a different handler, you can see it has some similarities, but it's going to have a different number of columns, just because torsions behave differently than bonds. These are all unitless representations, because if you're going to be passing this off to some machine learning library, it's going to want stuff in just raw arrays without anything wrapped around them. So those units would get stripped off pretty quickly anyway. This functionality does not have a clear use right now, but it may at some point in the future go into some kind of next-generation force field fitting tool. So if you're interested in using some sort of vectorized representation of this stuff, please get in contact with us. We'd be interested in seeing how we can facilitate that. Okay, that's the end of this notebook. Are there any questions on exports before we move on to the protein-ligand example? I think I might emphasize, and we don't need to look at the files now, that we showed the correctness of the exports in this notebook by loading up, say, the GRO file in NGLView and seeing that it had the right coordinates, but all of these exports have been exporting the physics parameters as well. You can check for your engine of choice when you run this on your own computer, but if you crack open, say, the prmtop file, you should see all the physics values in there as well. So, here's the prmtop file in all of its Fortran glory. If there are no questions, I think we can move on to protein-ligand. All right, great. First, I want to give credit.
I believe this example was largely built off of a toolkit showcase example that I believe you wrote, Jeff, maybe three years ago, and it's slowly been progressing over time as the stack has moved around. What we're going to do here is take just one protein-ligand combination from a data set that was used to benchmark some OpenFF force fields. We're going to parameterize it, we're going to solvate it, and then we're going to export the system to a few different engines. This is not production-ready, for a couple of reasons relating to details that I'll point out when I get there. Of course, if you want to build something off of this, I would strongly encourage you to spend some time validating the exports and the details of the contents that you get out of this. All right. First things first, we have a big block of imports to run. These warnings we can all ignore. Okay, let's grab some files. Like I said, we're pulling this from a protein-ligand benchmark data set. We just have these hard-coded links here, and we're just downloading these files. We have a protein in PDB format and a ligand in SDF format. I'm not going to say what is a preferred format and what is not, per se, but this is probably the most common pairing of molecules to file formats. And then you can see I have a bunch of cruft in here, but I have also downloaded these two files. Okay, we have to do a little bit of work before going straight into the toolkit.
I think there's a large interest in having the toolkit, at some point in the future, parse multi-molecule PDB files, something that has various other things in it, like crystal waters and cofactors and so on, that are not just a protein. But for now the toolkit wants PDB files to have just one protein in them. So what we're going to do is use MDTraj to slice out the crystal waters, I believe, which I don't think is something you necessarily want to do, but it's what we need to do to get us running. Okay, so this used MDTraj to pull out just the protein, which I happen to know is chain ID zero in this PDB file, and then we save that out to slice.pdb. So this is the file that we're actually going to be using for the next few steps. Okay, so the toolkit now offers from_polymer_pdb, which builds a molecule from a PDB file. There's a good bit of complexity involved with this, and I'm not really the person to best communicate it. If you're interested in learning more about this functionality, the stuff it supports, and maybe the stuff it doesn't support right now, there's some documentation around that. I believe the version 0.11.0 release notes would be one good place to look. Also, yesterday Josh Mitchell ran a workshop that did some other stuff but spent a little bit of time talking about the biopolymer functionality in the toolkit. I think at some point that recording will be live, so if you want to learn more, that's something I would encourage you to check out. But now that we can load it, let's load it up. This one might take a few seconds; I actually forgot how long this one takes to run. But I will skip ahead a little bit to the next section, because I know this next one takes a little while to run.
Who knows if you'll be able to do that at this particular moment in time. So, you know, you want a nice trick? I was going to say, for people who are used to running Jupyter notebooks, NGLView does a fun trick in live demos where, if you try to execute many cells simultaneously, none of them will render in NGLView until the last cell is done executing, which is often very inconvenient, and I've learned that the hard way. Well, I guess now is when I learned that the hard way. Okay, so we've loaded the protein, and the protein looks good. It looks like a protein. When Rosemary is released, this next bit will no longer be quite as relevant. But if we want to run protein-ligand simulations using SMIRNOFF, we need a force field that supports proteins and biopolymers. In principle you could use Sage to do that, but that would not be very good; you don't want to use a general force field for proteins. In this case we're going to use a SMIRNOFF port of ff14SB. There are a couple of quirks around how the impropers are handled, and total parity with Amber's results, as measured by the energies, I believe is not quite there, but it's pretty close. Certainly it's fine for instructive purposes, I would argue, but please don't use this in production and assume that it is a total one-to-one conversion of ff14SB. All that being said, it's something that we can load into a ForceField object, so we will do that. That took a little while to load because it's a large force field, and then we will create the Interchange object from this, which will take a little while to run as well. I think in this case it's just because there's a lot of stuff to do; there's a lot of SMARTS matching. Okay, so we have a protein Interchange now. This is the result of applying this ff14SB port to the protein that was loaded up by the toolkit.
And I just want to emphasize here that the way you do this is kind of the same no matter what force field you use and no matter what topology you use. But we don't just want a protein in vacuum, so let's keep going. Let's load up the ligand into a Molecule object, like we've been doing before. Then I can load up Sage and make an Interchange, again basically the same as we were doing before. This may take a little while to run, probably because it needs to run AM1-BCC calculations on the ligand. I'm not sure how big this ligand is, but that can take a few seconds to run. Out of this we'll get a ligand Interchange. And then, from here, we can just add these two Interchange objects together. In Python in general, adding objects is pretty unspecified, so you can't just do that out of the box, but when you're writing your own code you can override what happens when you call plus, and Interchange has done that. I'm emitting a warning around this because this is a place where a lot of mistakes can be introduced, but we've tested it for some cases, and what we've tested so far looks good, I believe. This adds them together: it attempts to add the topologies together, merge the potential handlers, combine the positions, and make some inference about the box vectors. So out of these separate parameterization blocks, we completed each of those and just added them together. So now we have a docked Interchange that includes the protein and the ligand. And then, just to, yeah. So, again, because this is already prepared, and the ligand is already docked, this will look nice out of the box. If you just made the molecule, generated a bunch of conformers, and it was just somewhere in space, it wouldn't look very good. But yeah, this looks pretty good.
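For anyone who has not overridden the plus operator before, here is a minimal sketch of the Python mechanism being described. ToyInterchange is a hypothetical class invented for this illustration; the real Interchange also merges topologies, potential handlers, and box vectors, none of which is modeled here.

```python
import warnings
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyInterchange:
    """Hypothetical stand-in, not the real Interchange class."""

    atom_names: List[str] = field(default_factory=list)
    positions: List[Tuple[float, float, float]] = field(default_factory=list)

    def __add__(self, other: "ToyInterchange") -> "ToyInterchange":
        # Python calls this method when it evaluates `self + other`.
        if not isinstance(other, ToyInterchange):
            return NotImplemented
        warnings.warn("Combining objects is error-prone; validate the result.")
        # Build a new object and leave both inputs untouched.
        return ToyInterchange(
            atom_names=self.atom_names + other.atom_names,
            positions=self.positions + other.positions,
        )


# Made-up atoms and coordinates, standing in for a protein and a ligand.
protein = ToyInterchange(["N", "CA"], [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0)])
ligand = ToyInterchange(["C1"], [(1.0, 1.0, 1.0)])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    docked = protein + ligand
```

Returning a new object rather than mutating the left operand is a design choice of this sketch; it keeps the separately parameterized pieces reusable.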
Okay, then the last thing we need to do is add some water to make this a little more useful. I don't think I want to go through every line of the cell. Basically, it's using a Packmol wrapper provided by OpenFF Evaluator and doing a little bit of math around figuring out the number of waters and so on. To get this running quicker, I set this to have a density of 500 kilograms per cubic meter. That is about half the density of water, so this is intentionally not a realistic water density; I just wanted to make this less likely to crash and not run for a few minutes. Once all those inputs are prepared, this function parses the Packmol output as an MDTraj trajectory. What I care about from that MDTraj object is the coordinates and the box vectors, so I will save those into xyz and box. Separately from that, I will create a water Interchange. I will load up Sage again. Sage includes TIP3P parameters, so this is effectively just using TIP3P. Then I make a topology out of the number of water molecules that I have and the water molecule that I made. Okay, so now I have a different Interchange object representing my waters, and the work that I did earlier means the positions of those waters should work with my docked protein. So I will take that docked Interchange, add it to the water Interchange, and then use the positions and box setters to take in the information that went through this long interoperability trek. Okay, so here we can see this looks pretty good. We have, I think, maybe a nanometer or two of water buffer, and it definitely still looks like we have a ligand docked in a protein in what might be a reasonable location. So that was system prep, but it also shows how, depending on how you prepare your systems, using the plus operator might be useful.
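The "little bit of math" for the number of waters is not shown in the recording, so the following is only one plausible version of it, under stated assumptions: a cubic box, an optional excluded volume for the solute, and a target mass density. The function name, the 8 nm box edge, and the excluded-volume argument are all hypothetical.

```python
AVOGADRO = 6.02214076e23      # 1/mol
WATER_MOLAR_MASS = 0.018015   # kg/mol


def n_waters(box_edge_nm: float, density_kg_m3: float,
             excluded_volume_nm3: float = 0.0) -> int:
    """Number of water molecules that fill a cubic box at a target density."""
    # 1 nm^3 = 1e-27 m^3
    volume_m3 = (box_edge_nm ** 3 - excluded_volume_nm3) * 1e-27
    mass_kg = density_kg_m3 * volume_m3
    # Truncate to whole molecules.
    return int(mass_kg / WATER_MOLAR_MASS * AVOGADRO)


# Hypothetical 8 nm cubic box, at the demo density and at a realistic one.
half_density = n_waters(box_edge_nm=8.0, density_kg_m3=500.0)
full_density = n_waters(box_edge_nm=8.0, density_kg_m3=1000.0)
```

Halving the density halves the number of molecules that have to be packed, which is why the lower value makes the packing step quicker in a live demo.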
From here, the exports out to the different engines are relatively straightforward. For the OpenMM topology, the Topology object from the toolkit provides a to_openmm method that converts the OpenFF topology to an OpenMM topology, and we can call that directly here and get that out. That's most of what you need to run simulations in OpenMM. You'll still want the positions and the box vectors, but those are on your Interchange, and you can pass them on to the setters as appropriate. We can also call the different exports out to the other engines using the same API that I described in the previous notebook. Unfortunately, these are a little bit slow, so for a very large system like this they will take longer than I want to wait for in a notebook. I hope soon I can make these a little more performant. In general, I would not expect performance to be one of the best features of Interchange. Wrapping everything in units and writing this in Python means it's not going to be as fast as some C++ or Rust parser or writer, but I'm sure that something like this can at least be exported in under a minute or so, ideally a few seconds. Anyway, that's my explanation of why these are all in "if False" blocks, so they're not actually running. Partly for my own testing, there's a module in Interchange called drivers that provides high-level functions that take in an Interchange object and compute single-point energies from different engines. This is useful for a bunch of testing stuff that I don't think I need to go into here. But for cases like this it can be useful to spot-check things, to make sure my bond energy isn't, you know, 10 orders of magnitude higher than you'd expect it to be.
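A spot check of this kind amounts to comparing per-component energies between two reports. The sketch below does that with plain dictionaries; it does not use the drivers module, and the energy values, component names, and the 0.1 tolerance are all made up for illustration.

```python
from typing import Dict


def compare_energies(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Absolute difference for every energy component present in both reports."""
    return {key: abs(a[key] - b[key]) for key in a.keys() & b.keys()}


# Hypothetical single-point energy reports from two engines (arbitrary units).
openmm_report = {"Bond": 12.30, "Angle": 45.60, "Torsion": 7.80, "vdW": 20.10}
amber_report = {"Bond": 12.30, "Angle": 45.61, "Torsion": 7.80, "vdW": 21.40}

differences = compare_energies(openmm_report, amber_report)

# Flag components that differ by more than an (assumed) tolerance.
tolerance = 0.1
suspicious = sorted(name for name, diff in differences.items() if diff > tolerance)
```

A flagged component is a prompt to look closer, not proof of a bad export; differing engine settings can produce a gap even when the parameters were written correctly.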
You could also use this to compare the energies reported by the different engines, by the different exports. Like I said earlier, unfortunately the Amber and GROMACS exports for the large system are a little bit slow right now. So, to demonstrate this, I will just do it for that ligand Interchange we made earlier. Out of this, I believe right now these are just handled as pandas DataFrames. These report the different components of the potential energy function as reported by, in this case, only OpenMM and Amber, because GROMACS is not installed, since that would make the Binder image a little bit bigger. If GROMACS were installed, it would find it and add those rows as well. You can see here that the bond, angle, and torsion numbers are very, very close, basically exact by MM standards. The van der Waals and electrostatics are a little bit quirky. It's just hard to get these engines to agree on exactly what nonbonded methods should be used and so on. I recognize that those numbers are a little bit off, but I discourage you from assuming that this means the van der Waals parameters or partial charges are written out incorrectly; it's just a quirk of how these engines are called. Before I open up for questions, I did want to point out that this example spends a lot of time doing system preparation. By no means is it the case that the workflow here is the only way to do it. For example, you may have your own favorite way of adding waters into your system. Maybe you want to use gmx solvate, or, I think, there's addSolvent in OpenMM. There are a bunch of ways to do this. Really, all that you need to get out of it is just the information that you really need: you need the positions of your waters.
And then probably also the corresponding box vectors. So you can imagine this cell being written in a bunch of different ways, depending on what you're most familiar with or what you prefer. Okay, I think I will open it up to questions. Have people been able to run this notebook on their machines as I've been going through it? Do people have other questions about usage, or just general questions? I'd be happy to help out with anything. I wanted to point out one thing, actually. We have shown these exports, where we're transferring information between these different engines. The reason this is sometimes tricky is not a technical problem of which data to take from which fields in, say, a prmtop, and how to convert it into the fields of a GROMACS itp or top file. In several cases, these files contain fundamentally different fields, and so we have to do a little bit of guessing when we change from one format to another. And what we'll probably wind up with in a couple of months, once we start getting more user reports and start understanding which of these information differences cause confusion... Matt, could you scroll up to ensure_unique_atom_names? Yeah. So in some formats, atom names and atom types are different, and in some formats they're one and the same. For Open Force Field, we don't care what an atom name is. We just have bonds between particles, and the particles are identified by an integer index. So when we're exporting to a format that expects globally unique atom names in the entire file in order to identify the bond parameters, or unique atom names within a construct called a residue, it can get complicated, and this is a place where we expect user workflows to have a little bit of trouble.
So here we have a keyword argument that users may actually need to use, called ensure_unique_atom_names, which controls how strictly we police atom names. This can make your exports work correctly. If you're combining components from different sources, some engines will require unique atom names. This is similar to the combine_nonbonded_forces keyword argument that Matt showed in the previous notebook. So we're anticipating that our documentation is going to have a sort of bag of dirty tricks for when you're using Interchange in a workflow and the wrong numbers are coming out, or the specific engine that you're exporting to is making funny noises. We'll have a little checklist, a set of tricks that you can use to try to make the different engines happy with these conversions. All right. Well, hopefully this example demonstrated in some part how this might be something you can drop into some of your workflows if you're doing stuff with proteins and ligands, which I think most people probably are. The final point that I would like to make is that here there was a non-trivial amount of work associated with having these different components, applying different force fields to them, and making the different topologies. When Rosemary is released, one of the key objectives is having a self-consistent biopolymer and small molecule force field. Probably somewhere in there will be a water model. I don't want to make any guarantee around that, but if a water model is included as well, then an example like this becomes a lot simpler, because we can prepare just one topology with everything in it. The force field will then include both the small molecule parameters and all of the torsions and library charges that model proteins well.
So you could do that all in one step, and if a water model isn't included it will still be simpler, because that will be two steps instead of the several we needed here. Based on the time, I'll jump right into the next notebook, and that will probably take us maybe not to the hour but a little bit closer. Okay, I am in the exploration notebook now. This is split out to spend a little bit more time digging into the details of these potential handlers. I start out with the same code block we used earlier: we load up a biphenyl and parameterize it with Sage. I wanted to look at the atom indices to get a visual representation of this. You can access the atom indices programmatically from the Molecule API, but since this is a small molecule and there are just a couple of atoms I'm interested in, I wanted to look at it. So I borrowed a little block of code from the RDKit Cookbook that displays the molecule with the atom indices. One small detail is that it didn't want to put out a C0, so I had to add one to all the indices. The toolkit indexes everything starting at zero, so you'll see stuff is off by one, and that's why. It's also worth mentioning, and I only saw this yesterday while warming up, that with this molecule representation this really is a biphenyl. It's not four fused rings, it's just two rings, and the atom labels for the hydrogens coming off of the biphenyl are overlapping. There we go. That's great. Thank you. Okay, so we have these handlers I've talked about a little bit before, and I made a reference to how these correspond a little bit to the parameter handlers in SMIRNOFF force fields.
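The index-labeling trick mentioned above can be sketched as follows. This is a minimal sketch in the spirit of the RDKit Cookbook recipe, not the exact notebook cell: label_atoms_one_based is a hypothetical name, and the stand-in classes exist only so the block runs without RDKit installed. The shift by one is needed because RDKit treats an atom map number of 0 as "no map number", so atom 0 would otherwise be drawn without a label.

```python
def label_atoms_one_based(rdmol):
    """Annotate each atom with its zero-based index plus one, in place."""
    for atom in rdmol.GetAtoms():
        # Map number 0 means "unset" in RDKit, so shift every label by one.
        atom.SetAtomMapNum(atom.GetIdx() + 1)
    return rdmol


# Stand-ins that mimic only the RDKit methods used above.
class _StandInAtom:
    def __init__(self, idx):
        self._idx = idx
        self.map_num = 0

    def GetIdx(self):
        return self._idx

    def SetAtomMapNum(self, number):
        self.map_num = number


class _StandInMol:
    def __init__(self, n_atoms):
        self._atoms = [_StandInAtom(i) for i in range(n_atoms)]

    def GetAtoms(self):
        return self._atoms


labels = [atom.map_num for atom in label_atoms_one_based(_StandInMol(3)).GetAtoms()]
print(labels)
```

With RDKit and the toolkit installed, the same function could be applied to the result of molecule.to_rdkit() before displaying it in the notebook; remember that every drawn label is then one higher than the toolkit's atom index.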
It's not exactly one to one, because different partial charge assignment methods are split out into different handlers in SMIRNOFF force fields, but we want to keep them in one handler in Interchange. Some of them do directly correspond, and ultimately all the information in the force field has some corresponding representation in an Interchange object. To make these, I made a base class specifying a few required fields. One of these is type, which is a string that just says what type of handler it is. This is repeated information, but it makes deserialization practical. There's also expression, which is an algebraic representation of whatever potential is used in that handler. So for the van der Waals handler, that expression is going to be the Lennard-Jones 12-6. This is not parsed in the mathematical sense, but it is used to keep track of what is compatible with what. So if you made, say, a 15-6 Lennard-Jones-like potential because you think it has better physics, you can certainly get it into Interchange, but some of your exports might not work: depending on what's supported, there will be a check to make sure it looks like a 12-6 Lennard-Jones. There's another field that specifies the supported parameters, which is useful for talking to the parameter handlers and knowing what I'm expected to know how to process. But most of the data is stored in two other fields that are dictionaries: the slot map and the potentials. These are both maps. The slot map is a mapping from topological locations, which I call slots for shorthand, to unique identifiers of applied parameters. The potentials field then maps from those identifiers to the parameters themselves. So we can look at that.
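The two-level mapping described above can be mimicked with a few standard-library dataclasses. This is a toy model written for illustration, not Interchange's own classes: the class and field names follow the talk (type, expression, slot map, potentials, topology key, potential key), while the SMIRKS string and parameter values are placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TopologyKey:
    """A slot: a location in the topology, identified by atom indices."""
    atom_indices: tuple


@dataclass(frozen=True)
class PotentialKey:
    """An identifier for an applied parameter, not the parameter itself."""
    id: str
    associated_handler: str


@dataclass
class Potential:
    """The actual numbers prescribed by the force field."""
    parameters: dict


@dataclass
class Handler:
    type: str
    expression: str
    slot_map: dict = field(default_factory=dict)    # TopologyKey -> PotentialKey
    potentials: dict = field(default_factory=dict)  # PotentialKey -> Potential


bonds = Handler(type="Bonds", expression="k/2*(r-length)**2")
ring_cc = PotentialKey(id="[#6X3:1]:[#6X3:2]", associated_handler="Bonds")
# Placeholder values, not taken from any released force field.
bonds.potentials[ring_cc] = Potential({"k": 1.0, "length": 1.0})
for pair in [(0, 1), (1, 2), (2, 3)]:
    bonds.slot_map[TopologyKey(pair)] = ring_cc

# Slot -> identifier -> parameters
params = bonds.potentials[bonds.slot_map[TopologyKey((0, 1))]].parameters
print(params, len(bonds.slot_map), len(bonds.potentials))
```

Three slots share a single stored potential here, which is the point of routing the lookup through an identifier instead of storing the parameters once per slot.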
Okay, so let's try and work through this with the bond handler. With the bond handler I have a couple of these required fields: the type is just Bonds, and the expression is the harmonic bond form that we're familiar with. The next two fields are not used in Sage directly, but they demonstrate how each individual handler might add a little bit of information. If you're going to do bond order interpolation, SMIRNOFF requires a couple more fields to know the type of interpolation and how you're actually assigning the fractional bond orders. These are processed from the force field directly. But the actual important stuff here is in the slot map. I said this is a mapping, so we use a dictionary. The keys are TopologyKey objects, which store atom indices, and the values are PotentialKey objects, which know just a little bit about uniquely identifying the parameters. So let's pluck out the very first one by unwrapping the keys into a list and grabbing the first value. This topology key is associated with the bond handler, and it includes these two atom indices. It is just saying "the bond between these two atoms"; it doesn't know anything about the physics. We want to map this onto an identifier of the physics, and that identifier will not be the physics directly but a key pointing to it. So here I look up in my slot map, passing it the first topology key, and out of that I get my potential key. The potential key is just a little object that knows what handler it's associated with and some unique identifier of that potential. In the case of SMIRNOFF force fields, you can uniquely identify parameters within an individual handler with just the SMIRKS pattern, so this is the SMIRKS of what was applied there. I think this is just a pretty general aliphatic SMIRKS.
You could imagine that for other typing schemes this could be, you know, two atom types squished together, if you're doing OPLS or something Amber-style. Okay, so that's the information included in there. Once you have that potential key, it maps onto a Potential object. I know there are a lot of layers to this, but there's a point to all of it. The Potential object is what actually stores the parameters. So out of all this work we've gotten to the point where we know, for the bond between atoms zero and one, this is the force constant and this is the equilibrium length. These are the values prescribed by the force field. The mapping has these extra layers because we want to de-duplicate the mapping between the potential keys and the potentials themselves. My slot-to-potential-key mapping is going to have as many entries as there are slots, but the second mapping I can de-duplicate a lot, which helps with storing stuff in memory. Okay, so with all this information I also wanted to show one way you could do something a little bit more useful with it. I think this might be one cell that had a slight modification since the release that I tagged yesterday, but it's pretty close, and I think if you follow along you'll be able to see the differences. We can take the code from the previous few cells and wrap it all up into a function that takes in an Interchange and a couple of atom indices. Then you can automate what we did in the previous few cells.
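A wrapper function of the kind described above might look like the sketch below. It assumes the attribute names used in the talk (handlers, slot_map, potentials, atom_indices, id, parameters); these may differ between Interchange releases. The function name get_bond_parameters, the stand-in object, the SMIRKS strings, and the numbers are all hypothetical, chosen so the block runs without OpenFF installed.

```python
from collections import namedtuple
from types import SimpleNamespace

# Lightweight stand-ins for the key and potential objects described in the talk.
TopologyKey = namedtuple("TopologyKey", "atom_indices")
PotentialKey = namedtuple("PotentialKey", "id associated_handler")
Potential = namedtuple("Potential", "parameters")


def get_bond_parameters(interchange, atom_indices, handler_name="Bonds"):
    """Return (identifier, parameters) for the bond between two atom indices."""
    handler = interchange.handlers[handler_name]
    # Accept the two atoms in either order.
    wanted = {tuple(atom_indices), tuple(reversed(atom_indices))}
    for topology_key, potential_key in handler.slot_map.items():
        if tuple(topology_key.atom_indices) in wanted:
            return potential_key.id, handler.potentials[potential_key].parameters
    raise KeyError(f"no bond found between atoms {tuple(atom_indices)}")


ring = PotentialKey("[#6X3:1]:[#6X3:2]", "Bonds")
bridge = PotentialKey("[#6X3:1]-[#6X3:2]", "Bonds")
handler = SimpleNamespace(
    slot_map={
        TopologyKey((0, 1)): ring,
        TopologyKey((1, 2)): ring,
        TopologyKey((5, 6)): bridge,
    },
    # Arbitrary placeholders, not Sage values.
    potentials={
        ring: Potential({"k": 1.0, "length": 1.0}),
        bridge: Potential({"k": 2.0, "length": 2.0}),
    },
)
stand_in = SimpleNamespace(handlers={"Bonds": handler})

smirks, params = get_bond_parameters(stand_in, (6, 5))
print(smirks, params)
print(len(handler.slot_map), "slots share", len(handler.potentials), "stored potentials")
```

Calling the function for two different bonds and comparing the returned length or k values gives the kind of side-by-side inspection the notebook performs on the biphenyl.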
And you can get a little bit closer to interesting scientific questions. What I've done here is call this function on the Interchange and ask: what's my equilibrium bond length for the bond between atoms zero and one, and what's the same thing for the bond between five and six? I picked out five and six because, from the picture earlier, five and six here is the same as six and seven there. That's the bond between the two phenyl groups. You could also modify this a little bit. Maybe the bond length isn't the thing that you're most interested in; maybe you care about the force constant because something is too floppy or not floppy enough. You can do the same thing and see that I have slightly different force constants here as well. I wanted to do this to demonstrate how you could use stuff like this to inspect your systems. Here I also rewrote it again slightly to look instead at the torsions within each ring and between the rings. I don't know for these systems how much the planarity is defined by the proper versus the improper torsions, but you can see that the force constants here are different by quite a bit. Okay, so that's the end of this notebook. Again, hopefully you can see a little bit more of the internals here and how you might be able to use this in your research or your workflows. If there are no questions, then we might go for our second and final break of the workshop. We'll take another five minute break and come back on the hour. After the hour we'll have a choice. This exploration notebook I would consider a bit advanced.
I think it's cool, because it goes into some of the neat architecture and shows you the inner workings of Interchange. The next notebook that we could run is also a little bit niche. It's about virtual sites and how they're represented in Interchange. So we'll have the choice of running that. But maybe what we'll do when we come back is have a more open discussion and Q&A, in case people have larger topics that they're wondering about. Then we can take a vote as to whether to go on to the virtual sites notebook or extend the discussion and developer support time. So thank you very much, everyone. We'll be back on the hour. Folks can correct me if I'm wrong, but I believe this year is the last year that the US will go through daylight saving time. I've noticed, especially in the age of Zoom, that it's difficult to give absolute times for people in different time zones. So we always give times in reference to the hour, and thankfully we don't have any Venezuelans or anybody who's offset by 30 minutes in the consortium or, I think, in our audience yet. But yeah, I look forward to not having daylight saving anymore. I believe we'll do the cycle once more in the United States and then we're done. I think after the next switch it would be permanent. Yeah. We'll do the cycle once more. There's summer time, which is a good time with late sunsets, and then there's winter time, which is a sad time with early sunsets. I don't think anybody on earth actually likes winter time. I don't know why we still have it. So I think we're going to do winter time this year, and then when we come back it's summer time permanently. I could be wrong. I'm optimistic. Okay. Well, we'll see when Matt gets back.
One thing I did want to ask about: we showed this objectives slide, and one of the important things on it that we could use a little bit of feedback on, and that might be worth some discussion, is this objectives diagram. If you look at the right side of it, which I think is the part that we're emphasizing today, we're talking about the interoperability with a bunch of engines. Right now we have all of these arrows coming out of Interchange. If all of these arrows were bidirectional, so that we could both export to Amber and also import from Amber and then immediately export to a different format, we would basically be ParmEd. It's a big goal, but in the long run we do want all of these arrows to be bidirectional. I would wonder, from some of our partners here today, if you're using ParmEd and you're thinking about using Interchange to replace ParmEd, which of these pathways are most important for you? Are you using the existing formats, or do your workflows have internal formats that you would need to convert parameterized molecules to and from? If anybody wants to chime in, we'd love to know which pathways to prioritize. Yeah, I'll go first. We're primarily going, in essence, from the OpenFF stuff through to Amber formats, and then taking the Amber formats into OpenMM. The reason we do that rather than go directly into OpenMM is that we also want to allow users to prepare their own prmtop files externally and load those in. So rather than having two completely separate paths, one starting from a molecule and one starting from a prmtop, we just take everything to a prmtop file. I see. So you send everything to prmtop, then you use the Amber suite to combine all the components into a single system, and then you send that to OpenMM? No, we build all of the components ourselves.
We use a combination of the OpenFF Toolkit and ParmEd to combine all the components together into a single prmtop, and then we just take that through into OpenMM. Oh, I see, you have OpenMM directly read the prmtop, because it has a prmtop reader. Okay, exactly. Yeah. Cool. So, you know, with OpenMM we had just described some of our difficulties, where we see particles as integer indices in a larger topology, whereas a lot of the existing engines see parameters being applied between atoms of certain types. Do you encounter that a lot when using ParmEd on a larger scale in conjunction with OpenFF-exported molecules? I don't think so. At the moment we're not heavy users of ParmEd. We used to use it a bit more, but at the moment most of the stuff we do uses openmmforcefields. We use that code to go from molecules straight into a topology and then to Amber-format topology and coordinate files. So at the moment we're fairly light-touch users of ParmEd, just the occasional tweak. Okay, excellent. In putting together these examples, we've noticed that I think we didn't fully understand the underlying principles of how to combine components from different sources, for example what pathways we would take if we want an Amber-parameterized protein and a ligand parameterized with something else. We found that the big commonality was that, before Interchange, if we were setting up a protein-ligand simulation, we would basically find ourselves using either ParmEd or openmmforcefields. Both of those are a little bit tricky. ParmEd doesn't like that we don't respect atom names and atom types, and openmmforcefields is, I think, intended to be a stopgap provided by the Chodera lab while OpenFF takes off. So we are targeting a lot of the functionality that we see as essential from those packages in a protein-ligand workflow, and we're aiming to replace it with Interchange.
Your feedback would be especially valuable if you start integrating Interchange into your workflows, because we would love to know what's slow, what's inconvenient, and what is convenient, so that we can work better for you. The one thing that's likely to be fairly important for us is actually preserving atom names. We've got a whole swathe of machinery to start from a protein, go through various prep stages, and let users tweak it, and then eventually feed it out through the machinery into OpenMM, run dynamics, and read the output. Things always get moved around a little bit, and so we've got a complicated set of atom mapping code. About half of that code base is mapping code these days. But that relies heavily on the fact that if this atom is called C14 in the input, it's going to be called C14 in the output, and that's how I can tell it's the same atom. Yeah. Matt, could you post the docstring for something with ensure_unique_atom_names in the chat? So basically yes, we ran into the same thing, where it was hard to predict when we could preserve atom names and when we absolutely had to change some atom names so that the exported file would have the correct physics. Shortly before our last big release, we put in a change. You saw the keyword ensure_unique_atom_names, and it was either False or True, and we realized that True and False don't provide the granularity that you actually need to use this in the real world. So now you can provide either True, False, or the name of a residue iterator. So you could say, you know, don't ensure unique atom names anywhere.
That's False. True means ensure unique atom names everywhere. And then the string "residues", for example, would say: there's an iterator on this molecule called residues, and within each residue make sure that the atom names are unique, but between residues they can repeat. So that's part of our effort to not completely obliterate everybody's atom names. Part of the issue we've had is that we're using the Amber formats for historical reasons, and the pain we have with them is, of course, that they completely blow away all of your residue numberings and identities. So it's, you know, which aspartate in the output was the aspartate in the input? Again, that's not necessarily always trivial. Yeah, oh man, and then Open Free Energy put in an eleventh-hour feature request to respect insertion codes, which we hadn't even considered before. So now we respect insertion codes as being a defining characteristic. Yeah, as long as you support residue names, numbers, and insertion codes, that's going to make me very happy indeed. Okay, great. I think you should be happy, but I'd love feedback if the new tool ends up doing something that you don't like. Thanks. Does anybody else have a story about which export pathways would be really important for them? This is basically going to guide our development priorities for the next year, if we can hear from you. No strong opinions? Okay. We'll continue prioritizing Amber and OpenMM as top priority, and GROMACS as well. Nobody's really come and made a compelling case for CHARMM or LAMMPS yet, so we'll probably keep those on the back burner. We'll prioritize the top three and wait until somebody complains about CHARMM and LAMMPS not being supported or having issues. Okay, cool. So we're going to get into the virtual sites notebook after this, which is a bit of a niche topic and may not apply to everyone.
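The three settings described above (False, True, and the name of a residue iterator such as "residues") can be illustrated with a small check. This sketches only the semantics as described in the discussion, not the toolkit's implementation: names_are_unique is a hypothetical helper, and atoms are modeled as plain (residue id, atom name) pairs.

```python
from collections import Counter


def names_are_unique(atoms, scope):
    """Check atom-name uniqueness at a given scope.

    atoms: list of (residue_id, atom_name) pairs.
    scope: False (nothing enforced), True (unique across the whole system),
           or "residues" (unique within each residue only).
    """
    if scope is False:
        return True
    if scope is True:
        counts = Counter(name for _, name in atoms)
    elif scope == "residues":
        counts = Counter(atoms)  # the same name may repeat in another residue
    else:
        raise ValueError(f"unsupported scope: {scope!r}")
    return all(count == 1 for count in counts.values())


# Two residues that reuse the same atom names, as in a typical protein.
atoms = [(1, "CA"), (1, "CB"), (2, "CA"), (2, "CB")]
checks = {str(scope): names_are_unique(atoms, scope) for scope in (False, True, "residues")}
print(checks)
```

The protein-like example passes at the "residues" scope and fails at the global scope, which is why a plain True/False switch was too coarse: enforcing global uniqueness would force renaming atoms that downstream atom-mapping code relies on.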
So does anybody have any bigger topics, like guidance questions, or questions about any of the previous notebooks that we've looked at today? As always, feel free to put questions in the chat. And yeah, Matt, I think we can go ahead and get started with the virtual site exploration notebook. All right. Okay, we should be back to my Chrome window with a bunch of notebooks open. So this largely builds off of stuff we've done already. The point I keep repeating is that, as long as you're not using GBSA, Interchange will take in whatever SMIRNOFF-style force field you pass it and whatever toolkit topology you provide. So here we'll mostly be going through stuff that's similar to what we've already gone through. It's just that the force field has virtual sites, and we'll see how that affects the exports a little bit. To do this, I actually want to run some simulations. So we have a function that takes in a few objects and runs a short simulation in OpenMM, and somewhere along the way it writes out a trajectory that we can look at. Okay, so what we're going to do is start with a ligand. This requires a few virtual site definitions, which we will write out. This is all described in... I just got the "your internet is unstable" message. If I was lost for a while, please interrupt me and let me know if I need to go back. You stuttered a little bit, but you sound good now, and I don't think you need to repeat anything. So there's documentation about the virtual site support in SMIRNOFF in the specification, and a little bit more user-facing docs in the toolkit. This might look complicated and dense and long, but the short of it is that we have three different types of virtual sites here.
We're going to add a couple of lone-pair virtual sites onto a sulfur, and the others are going to be sigma-hole-correcting sites or something like that: we're going to add a bunch of virtual sites around halogens, specifically along the bond between a halogen and the carbon that it is bonded to. That information is all encoded in this XML-like string. Remember that with SMIRNOFF force fields you can load a bunch up at once. You can squish together water models, or tack a bespoke force field onto Sage, and what we're doing here is just adding these virtual sites to Sage. I wouldn't say that we're using Sage now, because we're using a modified form of it, but we're using something that's based on it. And it should go without saying, but these numbers have no scientific basis. The physics here are probably worse than not having virtual sites, so these are not trained at all. I did not run the cell; that was the problem. This is a fairly large ligand, so it will take a little while to create the Interchange. I think most of the time is spent doing the AM1-BCC on this, which will take a few seconds. But here you can see the ligand. We have this sulfur in the middle that will have a couple of virtual sites tacked onto it to do something with the ESP around the lone pairs that actually do exist on sulfur. Then we have the fluorines on the left and the chlorines on the bottom, and there should be some sigma-hole-type virtual sites on these. I'm going to run these cells ahead of my talking, because we can see this one is taking a little while.
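The placement of a site "along the bond between a halogen and the carbon" can be written out as a small geometry sketch. It assumes the site sits on the bond axis, offset from the halogen on the side pointing away from the carbon (the usual sigma-hole picture); that sign convention, the helper name place_on_bond_axis, and the coordinates and distance are assumptions for illustration, not values from the notebook's XML.

```python
import math


def place_on_bond_axis(parent, neighbor, distance):
    """Return a point `distance` beyond `parent`, on the line from `neighbor` through `parent`.

    Assumption for this sketch: a positive distance places the site outside
    the bond, on the far side of the parent atom.
    """
    direction = [p - n for p, n in zip(parent, neighbor)]
    length = math.sqrt(sum(component * component for component in direction))
    if length == 0.0:
        raise ValueError("parent and neighbor must not coincide")
    return tuple(p + distance * component / length for p, component in zip(parent, direction))


# Arbitrary coordinates in angstroms: carbon at the origin, chlorine along x.
carbon = (0.0, 0.0, 0.0)
chlorine = (1.7, 0.0, 0.0)
site = place_on_bond_axis(chlorine, carbon, distance=0.3)
print(site)
```

In a running simulation the engine recomputes this position from the parent atoms at every step, which is why the virtual-site handler has to carry geometry information in addition to charges and van der Waals parameters.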
So again, this from_smirnoff call looks very similar to what we've done before. One difference here is that virtual sites are added as a separate handler, so I want to assert that they're in there. They are tricky because virtual sites need to know about geometry, but they also potentially affect the electrostatics and van der Waals interactions, so there's a lot of crosstalk between those handlers, especially when you're exporting stuff out. We just want to run a couple of picoseconds in OpenMM. To do that, we're going to get an OpenMM topology and an OpenMM System. There's a small quirk in our infrastructure right now. You may remember that earlier I used interchange.topology.to_openmm to get an OpenMM topology out. That's because the toolkit already has this way of going from the OpenFF topology to an OpenMM topology, so I just use it. The thing is, virtual sites don't exist in molecules. They're a construct of applying a force field to a molecule, so in that sense there's no way that the toolkit can formally know about virtual sites. You can't put a virtual site on a Molecule object. You can only deal with that at the Interchange level, because you need to know a little bit of the physics associated with it as well. So if I call interchange.topology.to_openmm, I will not get my virtual sites, because again the toolkit does not know about them, or rather the Molecule object does not know about them. To make that actually happen, there's a separate interchange.to_openmm_topology method that does basically the same thing as the toolkit's to_openmm, but it does include the virtual sites. I understand this is a little bit confusing, because there are two things that look pretty similar but are not quite the same.
I get it, and I hope in the future we can find a way to make this a little bit smoother, but for now we need to call this method to ensure we get the virtual sites. There's nothing different about creating the OpenMM System; the handling of virtual sites is done in there. Then there are just some checks. This is not needed to run the simulation, but I wanted to get out the number of virtual sites. You can do this by querying the OpenMM topology. I want to make sure that the number of virtual sites in the topology is the same as the number of virtual sites in the System, and also the same as the number of slots in the virtual site handler. Those are all the same. In this case it's five virtual sites everywhere. And then we can run a simulation. It took two seconds because we didn't run it for very long. You can see I just have this little molecule moving around in space, and the physics of this movement look reasonable, I would argue. If we look at the actual molecule, this particular PDB file, or the way it's parsed through MDTraj, doesn't really know about virtual sites. They just look like different particles, so some extra bonds are drawn. You can see on the sulfur here, the sulfur is the yellow and the two virtual sites are these yellow spheres. These bonds are obviously non-physical, because you don't really have bonds between virtual sites and the next carbon over. So we can see that the virtual sites are on the sulfur here. Then this is a chlorine, and you can see a virtual site was added. I guessed it was on the other side of the bond, but you can see it. And it's the same story here with, I think, these fluorines. So, cool, we have virtual sites from our force field.
We ran it through everything that we were doing before, and we actually have virtual sites in our output. Bad force field parameters aside, I would anticipate that the virtual sites are actually going to have an effect on the physics. So this shows that it all works. In the future, hopefully there's a mainline OpenFF force field that includes virtual sites. Short of more extensive testing and validation of all the various possible cases of virtual sites, Interchange basically supports these with whatever force field is provided to it. Okay. To get a little bit more involved, I wanted to show off how, if you want to jump through a couple of hoops, you can also take this ligand, which has virtual sites on it, and solvate it in water using a four-site water model. So now there are virtual sites on the water molecules as well. We're currently discussing ways that Open Force Field might distribute water models, or potentially other types of commonly used force fields, in SMIRNOFF format. Nothing is set in stone right now, so I can't make any guarantees around it, but I would like to be able to ship this XML without digging into my tests. I suppose in cases like these, just passing around XML snippets in examples is okay. But anyway, this is a SMIRNOFF encoding of TIP4P. Just like before, where I tacked the virtual site XML onto Sage as it was being loaded, I just tack this on. From here I load everything up and I have this one force field object. I wanted to give it a different name to avoid accidentally confusing myself. This is all in one notebook, but this is Sage plus the virtual site bit plus TIP4P. This block of code should look pretty similar, because I basically stole it from the protein-ligand example from a few minutes ago. I won't go over that again.
This takes the ligand, finds a number of water molecules, and solvates them into the MDTraj trajectory. While this runs, though, I will make a subtle point here. So I loaded TIP4P into the force field, so now there's one object in memory that knows I'm going to use virtual sites. This entire block of code here does not know anything about virtual sites. It doesn't know that we want to use TIP4P on our water, and it doesn't know that we want to use these virtual sites on our ligand. In principle, I could use the same block of code and apply TIP3P. And I believe one of the examples in the Interchange repo does this, where we build everything with a three-site water model, and then, because those molecule objects are the same whether you're going to use a three-, four-, five-, or whatever-site water model, you can just reuse that topology, applying a different force field to it. So again, because the solvation stuff all ran without virtual sites, you can use different force fields along the way there. Okay, and then what I do is stuff that I think I've shown plenty of times before. Again, I make a topology. I set my box vectors and some positions. I make an Interchange, passing in the stuff that I made. And then, just to show off that we can run a simulation, I call this same thing. This one took a little bit longer since I have waters, and this representation is a little bit more involved, but I want to show that it definitely looks like the ligand still has these virtual sites on it. And the water looks pretty funky, because visualizing four- and five-site water models is kind of funky. Even though this is at a goofy low density, the very early trajectory physics that you get out of it is at least more sensible than just blowing up, so from this I have some reasonable confidence that the simulation will actually run.
Okay, so again, this example shows off how, once you get the virtual site parameters into a SMIRNOFF force field, Interchange just handles them. And then, depending on how you actually call your engine, you might want to do slightly different things, so we have to jump through a couple of hoops to get the virtual sites into the OpenMM topology. But again, once virtual sites hopefully exist in a mainline OpenFF force field at some point in the future, Interchange is ready to handle them. And if you want to write your own SMIRNOFF force field using virtual sites, or port some existing force field that uses virtual sites into SMIRNOFF, it should be pretty well supported. Okay. That's the virtual site notebook. Any questions on this, or anything else people would like to discuss? If there's nothing at all... Yeah, Matt made helpful mention of a couple of things. Oh, thanks. Yeah, Matt made mention of a few things: more or less, Interchange currently smoothly handles everything in a released OpenFF force field. Because all of the OpenFF tools are both deployment tools and research tools, we build out the infrastructure for our next generation of force fields before those force fields are released, because our infrastructure is used to fit the force fields. That's how we make sure that people get the same parameters applied as we use in training. So, for our future plans regarding force field releases: we hear a lot from our advisory board that they'd really like us to include virtual sites, that OPLS already includes virtual sites, so why don't we? And it is a big topic. So here you can see we have the infrastructure ready, but the numbers are all made up; those are just from mashing on a keyboard. And to do the training is a bit difficult, and we're working on the infrastructure to do the training correctly.
You know, a poor man's virtual site implementation could be trained using just electrostatics, which we can calculate using our quantum chemistry implementation, but for a better one, and to be consistent with the previous OpenFF force fields that we've done, we should be fitting not just to electrostatics but also to physical property data. And it's hard to get enough physical property data for, for example, sulfur-containing molecules, to know the densities of mixtures of sulfur-containing molecules, such that we could train a virtual-site-containing force field on them. We're in a data scarcity position right now. And what that means is that for the Rosemary force field we're looking at improvements to electrostatics, but they're still going to be atom-centered point charges. So what we're trying to do with ANA is move to something better or more scalable than AM1-BCC: either higher quality, or an implementation like a neural net that can look at larger molecules and assign AM1-BCC charges that are consistent with how we've trained our force field, but for things like big polymers, and that would let people do amazing things in materials science and formulation chemistry. So, yeah, again, just to temper expectations: we're super excited about virtual sites, we've got them implemented in the toolkit and in Interchange, and we can run simulations, but it will be some time before we come out with a parameter set that includes virtual sites. If nobody has questions or comments, we'll get into developer support time. So first, thank you all for joining us today. It was really great that we had a good audience. If you are trying to use Interchange and you run into any trouble, we encourage you to talk to us on the Interchange issue tracker on GitHub.
This is a good way for us to make sure that we understand the details of your issue, and that we can direct future development to resolve what you need. You can find further examples, as Matt is showing, in the Interchange examples folder in the main repository. This is separate from the workshop repository that you were sent today, but the main repository is not too hard to find: it's in the openforcefield organization and is called openff-interchange. If you'd like more urgent support, on our Open Force Field Slack there's a channel called tech support that you can write into, and we'll try to handle your problems as quickly as possible. So that's for quick usage questions. If it's a feature request, you'll probably just get redirected to the issue tracker to sketch it out so we can put it on our roadmap in detail. Right, and so I'll stop the recording.