All right, thanks everybody. So the last talk before lunch is about one of the near-term plans for improving the quality of general torsions in the force field, and how this involves all of the different working parts that we've been assembling over the last year. We're going to trade off between a few people here, because a lot of people have contributed to this and only some of them are mentioned on the slide. The basic idea is that we're going to take sets of molecules that are hopefully representative of the space we would like to perform well on, enumerate the relevant protonation and tautomeric states, and fragment them in some useful way that gives us one-dimensional and two-dimensional torsion drive capability, looking at both individual bond rotations and couplings between them. Lee-Ping will tell us more about why that's important, and how he was able to use this to achieve high accuracy in his last force fields. We'll generate these torsion scans using high-level QM in parallel and deposit them in an open database. Daniel Smith at MolSSI will tell us about the workflows for doing that and the public database, QCArchive, that he's been engineering as part of a larger effort that we've been very lucky to synergize with. Then we'll fit these: the first generation of force fields will be fit with ForceBalance, and the second generation maybe with Bayesian methods that Chaya has been working on. And there are a lot of different pieces of code that we've been working on to pull all of this together. So I'm going to go ahead and yield to Lee-Ping to tell us more about why we're driving one- and two-dimensional torsions and what the tools involved are.
Thank you, John. I will only speak for a pretty short time, about some subcomponents of this subproject. As you'll see, this involves multiple interrelated software tools, and the part that I'd like to report progress on, which we achieved in the last year, is the following task: given a molecule, an initial conformation, a quantum calculation specification, and selected torsions to drive, produce an n-dimensional potential energy profile in which the selected torsions are constrained at the specified values and the orthogonal degrees of freedom are minimized. I should mention that this really is a subproblem of the larger problem, because if you want to generate a lot of high-quality torsion data, you also have to identify which degrees of freedom to drive, which is also a very difficult problem that Chaya will be telling us about a little bit later. This first part was achieved through a combination of open-source tools that I will briefly tell you about. First is a tool called geomeTRIC. This is an open-source tool for geometry optimization.
It is actually very similar to the ForceBalance that I already told you about, but instead of optimizing force field parameters, we're now optimizing geometries. The code talks to quantum chemistry codes by passing them the Cartesian coordinates and getting back the energies and gradients in Cartesian coordinates. geomeTRIC itself implements several internal coordinate systems, determines the internal coordinate steps, and takes steps to ensure that all of the constraints you specify are satisfied in the course of the minimization. The code was really written to implement a new coordinate system called translation-rotation-internal coordinates, with the design goal of being efficient at optimizing things like water clusters and supramolecular complexes. That's not directly relevant to torsion drives, but that's why this software was written in the first place, and you can see that the performance is better than several other codes and other coordinate systems. So we're going to be relying on geomeTRIC as a sort of low-level tool, and the quantum chemistry code, like Psi4, is the even lower-level tool that geomeTRIC calls to obtain the energies and gradients at a given level of theory. Sitting one level higher than geomeTRIC is a tool called TorsionDrive. TorsionDrive generates this grid of constrained energy minimizations through the idea of wavefront propagation. This figure is supposed to illustrate the difference between what I would call a unidirectional scan of constrained minimizations and wavefront propagation.
Imagine you had a starting geometry, maybe at this corner of the grid, and you wanted to generate a grid of two-dimensional minimized structures. You might pick a leading dimension and a trailing dimension. In the leading dimension you optimize along these values, keeping the other value constrained, and then for every one of these values that you optimize, you start new optimizations in the trailing dimension. That's what I'm referring to as a unidirectional scan. In the wavefront propagation idea, you start with an initial point and start optimizations at all of the nearest-neighbor grid points, and then at those nearest-neighbor grid points you repeat the cycle again. In this process you will end up starting new optimizations at grid points where you already have minimized energies, but because you have changed the initial conditions of the minimization, you might pick up new local minima. You repeat this recursively until you end up with a complete surface. The advantages of using wavefront propagation are, first, that you end up in general with lower-energy structures.
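The wavefront idea can be illustrated with a small toy model. The Python sketch below is not TorsionDrive's implementation: the one-dimensional periodic grid, the two-basin energy function, and the `toy_constrained_opt` stand-in for a constrained optimization are all assumptions made for illustration. It only shows how relaunching optimizations from neighboring grid points can replace a higher-energy minimum that a one-direction scan stays stuck in.

```python
import math

# Torsion grid in degrees; index arithmetic below treats it as periodic.
GRID = [-180 + 15 * i for i in range(24)]


def toy_constrained_opt(i, start_basin):
    """Hypothetical stand-in for a constrained geometry optimization.

    The 'geometry' is reduced to a basin label: 1 means an intramolecular
    contact is formed (lower energy, only stable for |phi| <= 90), 0 means
    it is broken (only stable for |phi| >= 30). The optimizer stays in the
    starting basin whenever that basin exists at this torsion angle.
    """
    phi = GRID[i]
    stable = {0: abs(phi) >= 30, 1: abs(phi) <= 90}
    basin = start_basin if stable[start_basin] else 1 - start_basin
    energy = (1.0 - math.cos(math.radians(phi))) - 1.5 * basin
    return energy, basin


def unidirectional_scan(start_i, start_basin):
    """Walk the grid in one direction, seeding each point from the last."""
    n = len(GRID)
    result = {}
    basin = start_basin
    for step in range(n):
        i = (start_i + step) % n
        energy, basin = toy_constrained_opt(i, basin)
        result[i] = (energy, basin)
    return result


def wavefront_scan(start_i, start_basin, tol=1e-6):
    """Relaunch optimizations at both neighbors of every improved point."""
    n = len(GRID)
    best = {start_i: toy_constrained_opt(start_i, start_basin)}
    frontier = [start_i]
    n_opt = 1
    while frontier:
        improved = set()
        for i in frontier:
            for j in ((i - 1) % n, (i + 1) % n):
                energy, basin = toy_constrained_opt(j, best[i][1])
                n_opt += 1
                # Keep the new result only if it is new or strictly lower.
                if j not in best or energy < best[j][0] - tol:
                    best[j] = (energy, basin)
                    improved.add(j)
        frontier = sorted(improved)
    return best, n_opt


scan = unidirectional_scan(0, 0)
wave, n_opt = wavefront_scan(0, 0)
lowered = [GRID[i] for i in range(len(GRID)) if wave[i][0] < scan[i][0] - 1e-6]
print("angles where the wavefront found a lower minimum:", lowered)
print("optimizations used:", len(GRID), "for the scan vs", n_opt)
```

In this toy, both scans start at -180 degrees in the broken-contact basin. The one-direction scan stays in the higher-energy basin from -90 to -30 degrees, while the wavefront version revisits those points from the other side and lowers them, at the cost of running more optimizations.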
It is more robust, in the sense that if you have a starting structure, you are guaranteed to end up with a certain potential surface, whereas in the unidirectional scan the result is going to depend on the scan direction you choose as well as your choice of leading and trailing dimensions. And if we are running these optimizations in a distributed manner, which QCArchive is doing, you may actually be able to get the wavefront propagation result in less wall time. Because these optimizations are launched in waves, you can end up with a converged result with fewer successive sets of optimizations than in the unidirectional case, even though the total number of optimizations is larger. The total number of optimizations in the wavefront propagation in this case is four times the number of grid points, whereas in the left case it's just the number of grid points. So you pay a factor of four in computational cost, but you get the benefits that I've listed on the right. Here are some examples with one-dimensional and two-dimensional torsion drives. This is a modified version of a molecule that Chaya provided, and we carried out a unidirectional scan as well as a TorsionDrive scan, shown in the two panels here. The blue curves indicate the potential energy of the minimized structure, and the orange curves indicate the number of optimization steps needed to reach a minimum. So don't really worry about the jaggedness of the orange curve; that just shows you how many steps the algorithm needs. What I would like you to pay attention to is the asymmetric nature of the curve on the left. The reason for that is that as you drive the torsion in a chosen direction, your minimized structures get stuck in higher-energy local minima until, for example, you break some kind of intramolecular non-bonded
interaction, and then you're in a different local minimum. The shape of the curve is going to depend on the choice of direction, whereas in the torsion drive there really is no direction that you're choosing. So I would argue that the quality of the result is better on the right-hand side. I'd also like to show you an example with a two-dimensional torsion drive. This code, which was originally written to support only two-dimensional drives, can now support an arbitrary number of dimensions, so we can do one and two, and in some cases even go up to three. In the case of two, you can first see that a two-dimensional drive contains richer information than a pair of one-dimensional drives. It also shows you that if you perform a unidirectional scan, or I guess you could call this a bidirectional scan, in geomeTRIC alone, you do end up getting stuck in some of these high-energy local minima, indicated by the red regions here. Whereas if you use TorsionDrive's wavefront propagation, you do spend a factor of four more in CPU time, but you uncover the local minima that were previously in the red, and you end up with a more reproducible result. So these tools are basically in place. Assuming that you have the molecule and the torsion that you want to scan, the integration of these four tools, including the QCArchive that Daniel will tell you about later, should be able to generate this data for you using completely open-source means. And now I'll hand off the mic to Chaya, who will tell you more about the molecular...

Hi, I'm Chaya Stern.
I'm a graduate student in John Chodera's lab, and I will talk to you about some of my work on fragmenting molecules and different aspects of the torsion drive pipeline. Before we start fragmenting molecules and driving the torsions, we want to enumerate the protonation and tautomeric states of these molecules. Why do we want to do that? First of all, most molecules exist in an equilibrium of different protonation states. Another reason is that different protonation states will change the torsion profile of certain bonds, so it's important for us to enumerate them. Currently I'm using OpenEye to enumerate these states, and it does introduce some unreasonable states. Because of the way the torsion pipeline works, we do an AM1 calculation, and that filters out many of the unreasonable states, but it is a pretty expensive filter. We are working with Marcus Wieder, a postdoc in our lab, on a more efficient way of generating reasonable tautomer states. Now, once we have the states we want to get torsion parameters for, we want to fragment these molecules in such a way that the torsions we want to drive are computationally tractable. So we want fragments that have one to three rotatable bonds, because we want to reduce both the computational cost and the conformational distribution that these molecules have, and we do want some overlap so that we can capture the torsional and configurational distribution of the molecule. Now, what are some pitfalls when you try to fragment molecules? I'm going to walk through this biphenyl example to demonstrate what we want to be careful about. In this biphenyl molecule we're looking at the central rotatable bond, and what I'm showing here is a torsion scan of that central bond, as you can see.
This is a rotatable bond; it's fully rotatable. Now, as you take this molecule and generate different protonation states for it, the torsion profile of that bond changes drastically. Is there a way to point? So if we look here, that's the neutral form, and the torsion profile looks very similar to the one we saw on the last slide. But if you have the cation here, you see that this barrier height increases and this barrier height decreases. Here you introduce an anion, and this looks closer to a double bond or an aromatic bond. And as you go to the zwitterion, this is a totally aromatic bond. If we look at the Wiberg bond order, which we calculate using AM1, you see that it goes from something close to a single bond up to 1.5, which is an aromatic bond. Here I'm just showing that the reason the zwitterion has a bond order of 1.5 is that the bond is actually conjugated; it's part of the extended conjugated system. Now, most cheminformatics tools, if you're going to be cutting rotatable bonds, will label this as a rotatable bond, and you would think that you can fragment there. The important point is that changes far away from the rotatable bond you're looking at can change the electronic properties of that bond. Yes?

And on the previous slide, how are you choosing your QM functional for these calculations?
So for this one I used B3LYP, and the basis set was, I think... So currently, the level of theory that we're using for QM is based on the benchmark study that was done by the Genentech scientists, who showed that B3LYP is actually pretty good.

I'm just wondering because, in my very limited experience with quantum, it seems like protonation states can also change which functional best describes things. It's something that I wonder about personally a lot, so I was wondering what you'd seen. Thanks.

One additional question as well. How are you handling things where you've got parts of the torsion space that are totally inaccessible? You can get certain systems where, if you rotate them, they'll react, for example. How do you handle torsions where you can't get a complete torsion profile all of the time? You can get a reaction occurring, and then you've got to be able to detect that that happened if you're going to automate this.

Right. So currently, in the pipeline.
We have two different ways of running the torsion scan in the end-to-end workflow. One is doing the full rotation, but we also do these restrained scans, where for certain bonds, if they have a bond order above a certain amount, we don't use TorsionDrive's wavefront propagation; we're just going up the barrier by five to ten degrees. In the case of rearrangements, like what you're concerned about, I think we're also computing the Wiberg bond orders after the calculations, so that we can make sure the bond graph is preserved, and we can then label or filter those after the fact.

That's because, as I say, you need to be sure you're not going into a completely...

Definitely. So I calculate the bond orders, use those bond orders to get a chemical graph, and then check whether it matches the initial chemical graph. That will also pick up protons that migrate.

Thanks.

So currently I'm working on the kinase inhibitor set to get started with this. I fragmented them, and, these are a little small, but what I'm showing are Wiberg bond orders calculated for these fragments, and how different fragments will have different values. These are some of the fragments of dasatinib. If you look at this bond, you've got a bond order of 1.02 here and 1.17 there. Now, 1.02 is very close to a single bond, and for 1.17, from the different torsion drives that I have done, the profile will look different. The change in the bond order doesn't look as drastic as what it does to the torsions. And if we look at this one here, I thought this was interesting: the larger fragment actually had a bond order that was much different from that of a smaller fragment, and this has to do with
how substituents further away are either electron-donating or electron-withdrawing, so depending on which substituents you take away or add on, that can change things further away. What we're trying to do is have a fragmentation scheme that gives you fragments small enough that your torsion drives are not terribly expensive. They're still going to be expensive, but we want to reduce the computational cost, and at the same time we don't want to obliterate the chemistry that we're trying to generate data for. So these are the criteria that we want when we're fragmenting molecules. We want to have a central rotatable bond. We want all the substituents right next to it, because that gives us the immediate chemical environment. We also want the correct resonance structure, which basically means the correct bond order, the correct electronics, in that bond. And we don't want to go above one, two, or three rotatable bonds, for computational feasibility. Then there are some things we do not want to fragment: ring systems, certain functional groups (there is a list of functional groups that we don't want to fragment), and extended pi-electron systems, and this is where the Wiberg bond order can help us out. So currently I have an initial fragmentation scheme where you take your molecule, calculate the Wiberg bond orders using an AM1 calculation, and then use cheminformatics tools to find the rotatable bonds. Once you find the rotatable bonds, you build around them, keeping the rings and certain functional groups. Once you have those fragments, you check the bonds that are going to be fragmented: what's the Wiberg bond order on that bond?
And if it's above a certain threshold (what that threshold should be I'm still optimizing), we keep that bond. You keep on doing that recursively until you can fragment your molecule. Now, this algorithm...

Thank you. The Wiberg bond order will depend on the geometry, probably pretty strongly. So do you do a minimization in the first step as well?

That's a very good question, and I definitely looked at that a lot. For the bond orders that are involved in conjugation, that's where there is a lot of variance with the geometry. For the bond orders that are not really involved in conjugation, it's actually pretty tight. But again, it's AM1; we're not doing a full QM geometry minimization, because that would be too expensive. The assumption I'm going on right now, which still has some limitations, is that because it's the conjugated bonds whose bond orders have higher variance, I should be able to pick that up without doing the geometry optimization.

Well, somehow you have to define the structure for which you're calculating the bond orders, and it will be critical in that first step that you actually minimize it.

Yeah, of course. I use the minimal structure from OpenEye.

Yeah, the lowest. You might still want to do an AM1 on that first; a minimization at the AM1 level, I think, would probably be good.

Maybe Christopher Bayly can clarify this. We're using his AM1-computed Wiberg bond orders, and I believe there is a geometry optimization performed. Can you pass the microphone over to Christopher, who could comment on the methodology?
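The bond-order rule in the fragmentation scheme described above can be sketched in a few lines of Python. This is a hedged illustration, not the fragmenter code: the bond table, the `build_fragment` helper, and the 1.2 threshold are hypothetical (the speaker notes that the threshold is still being tuned), and hydrogens and the functional-group rules are left out. Starting from a central rotatable bond and its direct neighbors, the fragment keeps growing across any boundary bond that is in a ring or whose Wiberg bond order exceeds the threshold.

```python
def build_fragment(bonds, central, wbo_threshold=1.2):
    """Grow a fragment around a central bond.

    bonds:   dict mapping frozenset({a, b}) -> {"wbo": float, "ring": bool}
    central: tuple (a, b) with the two atoms of the rotatable bond
    The threshold value is an assumption made for this sketch.
    """
    # Build an adjacency map from the bond table.
    neighbors = {}
    for bond in bonds:
        a, b = tuple(bond)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Start with the central bond and everything directly attached to it.
    frag = set(central)
    for atom in central:
        frag |= neighbors[atom]

    # Do not cut ring bonds or bonds with a high Wiberg bond order:
    # extend the fragment across them until nothing changes.
    changed = True
    while changed:
        changed = False
        for atom in list(frag):
            for other in neighbors[atom] - frag:
                info = bonds[frozenset((atom, other))]
                if info["ring"] or info["wbo"] > wbo_threshold:
                    frag.add(other)
                    changed = True
    return frag


# Hypothetical linear chain 0-1-2-3-4-5-6-7 with made-up bond orders.
toy_wbo = [1.00, 1.00, 1.05, 1.00, 1.40, 1.00, 1.00]
toy_bonds = {
    frozenset((i, i + 1)): {"wbo": w, "ring": False}
    for i, w in enumerate(toy_wbo)
}

fragment = build_fragment(toy_bonds, central=(2, 3))
print(sorted(fragment))
```

With these made-up numbers, the bond between atoms 4 and 5 has a bond order of 1.40, so the fragment is extended across it instead of being cut there; the ordinary single bonds at the edges are where the cuts land.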
Yeah, so Chaya is really trying to generalize this approach of using the Wiberg bond order to drive these kinds of decisions. I think AM1 is very cheap, and you can run it on the whole small molecule before fragmentation.

Yeah, of course. That's what I do.

And you could even run it on several different geometries of that as well. You could do that, but another thing you could do is, if you just got the AM1 single points at the various different geometries, you might be able to more quickly detect when you have this kind of variance in bond order, which flags the kind of bond that you don't want to break.

Right, which I had looked at, and which basically showed me that if you have a bond that you don't want to break, that's where the variance is going to be. Okay, so we want to use the Wiberg bond order to ensure two things. One is that we're not breaking conjugated systems. The other is that if we have a bond that's highly conjugated, we want to be able to extend out around that central bond to the point where we don't lose that. That's slightly different from just looking at each rotatable bond and making sure that you're not fragmenting the conjugated system. This is still a work in progress. What I do here is this: I have my initial molecule.
That's this one, where I have the calculated Wiberg bond orders. Then we did the initial fragmentation scheme, and we can recalculate the Wiberg bond orders on the fragments and look at how much they change. In this case it actually changes quite a bit: that's 1.09 and that's 0.99. That's something I would want to avoid, so then you just randomly add on different parts of the molecule until you arrive at a bond order that is within the variance of these bond orders, and that's how you find the fragments that have similar torsional profiles to the parent molecule. Okay, so this is just a little bit of statistics on the kind of fragments we get out using this scheme. Here we're looking at the fragmentation of 43 FDA-approved kinase inhibitors. Using the initial scheme, we got 295 fragments from those 43 molecules. If we look at how many fragments each molecule produces, most of them are reasonable; that's the distribution. Within the set I didn't find that many overlapping fragments. This is the distribution of the number of heavy atoms in the fragments, and the number of rotatable bonds, which in this case is between one and three for most of them. There are very few that have four or five rotatable bonds. Now, in this set I did not expand the states; I just took the neutral forms and fragmented them. When I expanded the states, we arrived at around double the number of fragments. But if we look at the number of states that get generated, we find that many of these fragments actually overlap with each other. So we are generating a whole bunch more states.
But we're only doubling the number of fragments that we actually have to drive the torsions for. And looking at it from the point of view of expanded states, for congeneric series we will probably have a lot more overlapping fragments. So the amount of computation that you would need when you add more members of a congeneric series will not increase that much, given that we're going to be keeping all the data in the database and that data will be reused. This is just an overview of around how many torsion drives you need for a drug-like molecule, using imatinib and dasatinib as examples. Here I'm showing the fragments that are shared among the imatinib and dasatinib states. As you can see, with imatinib you've got 16 states but only 15 unique fragments, and then there are the 1D and 2D torsion drives that we get. For the 2D torsion drives, what I'm currently doing is finding the rotatable bonds and then just doing the 2D torsion scans combinatorially, because I want to make sure I'm capturing all the correlations between the torsions. There might be more intelligent ways of doing it so that you don't have to do all of that, but currently I'm using a little bit of brute force for this. Okay, so now I'll move on to another topic. Now that we are generating all these fragments and running the torsion drives, we need a way to store the data and make it reusable for the entire community. Daniel Smith will speak about the QCArchive project, which ensures that.
For the database, however, we need a way to index these molecules such that they are usable by the Open Force Field, cheminformatics, and QM communities. For that I wrote cmiles, which generates these indices for molecules in the QC database. The first issue that cmiles is trying to address is that when you generate a chemical graph, the indices on the nodes are arbitrary. But for the QM calculations that you want to run, your XYZ matrix cannot be arbitrary; the order needs to be the same every time so that you can find the matches between molecules, or tell that they're equivalent. So what we are doing is generating SMILES that have these tags. These tags are the map indices on the atoms of the molecule, and the map indices correspond to the order in the XYZ coordinates. In cmiles there are also utility tools that help you reorder the geometry you have into that order. If you have a new chemical graph, it helps you map it onto the geometry so that you can always recover the order that you initially submitted to the database. That makes it easy to recover molecules that you already have. Another issue is that these indices need to be the same if we're going to be using them to search. But the problem, as most of you are aware, is that different packages each produce something they call canonical SMILES.
They're only canonical within the package, and in some cases within the package version. So to ensure that we always get the same indices for the data set that we're generating, cmiles will pin the toolkit versions, and we will be distributing it as a Docker image, so that we're always getting the same canonical SMILES. And since we were building this tool anyway, we figured that we also want to have some standardized representation of all tautomers and protomers of a molecule, so cmiles also does that. I would say it's really good to have. It might not be absolutely crucial right now for the project, but it's a feature that can be very useful if you want to search the database in the future and get, say, all the protomers of acetic acid. But the problem with a standardized representation of tautomers is that each one that I have looked at still has some limitations, and none of them covers everything. There might be other indices that I'm not aware of that people can point me to, but from what I've looked at, InChI is supposed to standardize tautomers, but there are certain cases, keto-enol being a well-known one, that it doesn't capture. RDKit also has a new standardization function, but again, it doesn't capture everything. OpenEye also now has a new standardization function, but there are some cases, like these isoindoles, that it doesn't necessarily capture. So currently we have all of these indices in cmiles, and I'm again working with Marcus to have something more robust than these. I think it can be a useful feature in the future. So now I'm going to hand it over to Daniel.

Okay, any questions?

Cheminformatics is hard, dudes.

All right, go ahead
and introduce yourself.

Great. So, I'm Daniel Smith. I am a software scientist at MolSSI, so I'm actually not formally part of the Open Force Field project; I'm with a different organization. But we have a really nice project called QCArchive, which synergizes very nicely with the requirements of the Open Force Field. We've heard a lot about QCArchive and what it does, so I wanted to walk people through a particular example so we can see exactly how it works, what it does, and what the limitations and benefits of this kind of approach are. I should note that QCArchive is much more low-level than everything else we've covered so far. Everything so far has been very high-level: I want, for example, to get a whole bunch of torsion energy surfaces. Well, how is it that we do this? How do we marshal tens of millions of CPU hours and make sure that we never repeat those CPU hours across this project? So this is after you've fragmented a molecule and done everything else, including preparing its original states, and I want to do a single torsion drive on a single molecule. Or really it's a question not of how I do one of these, but of how I do thousands. At the very beginning, fragmenter itself will say: okay, I finally have a full three-dimensional molecule that I want to run a torsion drive on. That will be submitted to a central server somewhere from one of our clients, which are Python-based. These are also purely REST interfaces, so if you want to interface with them through JavaScript or something else, you're more than welcome to. Once you have submitted a single quantum chemistry torsion scan, it goes through a client and finally ends up on the server. When it ends up on the server, the server first asks: have I ever seen this before? If I've seen it before, I immediately return it. If it's a new computation,
I start up a TorsionDrive. Since TorsionDrive uses wavefront propagation, we're not able to immediately break this up into independent tasks. So TorsionDrive is in fact a service, where a service is able to make very small, very CPU-lightweight decisions about exactly what should happen. In this case, the request goes to TorsionDrive, which decides: should we start a new wavefront propagation, or are we complete? How many more computations do we need? Usually TorsionDrive will say, hey, I need maybe 50 or 60 geometry optimizations, and I want you to go out and distribute these across not just one local cluster with many nodes, but maybe many local clusters with many nodes. In this fashion we're able to aggregate not only a single cluster but multiple clusters together, from multiple PIs or perhaps a very interested power user in the community who would like to contribute. So this can go out to local clusters, supercomputers, AWS, you name it. We hook into everything, because these things are effectively embarrassingly parallel. You wind up with these compute clusters, which will spin up geomeTRIC and, in this particular case, Psi4 to actually execute the geometry optimization. The results go through a serialization process and go back to the server. TorsionDrive is then asked whether the result is complete or whether we need to compute new results. Finally, once this process has finished its iterations, TorsionDrive will say, I'm complete, and it'll shut down. The next time the user queries the server, they're able to get this entire torsion drive data back. Because this is a bit more asynchronous, we usually wait for a continuous request: I need this piece of data. Do you have it? Can I give it back to you?
If not, should I compute it?
So this is obviously quite a large software stack that attempts this goal, and we have a few different pieces throughout. The client interface that you might find on your laptop, probably underneath fragmenter, which is actually going off querying and asking for the compute, is called QCPortal. We have a central server, which hosts all of the data and is able to submit new computations, called QCFractal. We use a variety of distributed workflow tools depending on your supercomputing cluster, or whether it's AWS; this is not one size fits all, so we're able to implement and delegate to any number of these.
Then we go through a small program called QCEngine, and QCEngine is kind of the heart of everything. We're able to take a single representation and farm it out to different QM programs, semiempirical programs, or things like TorchANI if you want ANI-1, or any force field. So we're able to run these workflows just by changing out two lines in an input. If I want to run it with semiempirical methods or quantum mechanics, etc., I'm very flexible in this regard.
And then of course we can always go back and offer different services. If you don't want to use QCPortal as the client, if you want to have a web interface or something else, you can make raw REST calls to the client,
sorry, to the server, to get this exact data.
Really at the heart of all this is something that we call the quantum chemistry schema. Effectively, this is what allows us to go through all these serialization and distribution processes: we have a central schema, which gives very short descriptions of quantum chemistry molecules, and we're able to have many different back ends that give us the exact same result back. What's really nice about this is that we're no longer writing ASCII text files and we're no longer parsing ASCII text files. We're going directly to the quantum chemistry codes and asking, what is every single quantity that you have? We serialize that and give it back.
So for example, a lot of times you just want to know what the energy of a B3LYP model is, and that's usually great. But what if you want to dig in a little bit more? What if you want to go back and get, say, the dipole moments, the quadrupoles, the Wiberg bond orders? Usually this means that you have to parse this kind of awkward ASCII text file. The schema goes around that, meaning that we have a formal way of getting these objects back. So there's no more parsing, and no more breakage from ASCII parsing. This is online now, and we're continuously expanding its capabilities and the number of codes that hook into the back end.
Alongside the schema we have the quantum chemistry engine. What this engine does is take the schema and apply a single program to it. So in this case I'm able to simply take this task, run QCEngine, which is a Python program, and get the same result back from, say, Psi4, Q-Chem, and NWChem in the exact same format. So there's no question of how do I parse this thing, or how do I get it back.
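The idea of one serializable task description handed unchanged to different back ends can be sketched in plain Python. The top-level field names below (`molecule`, `driver`, `model`, `provenance`, `return_result`) follow the quantum chemistry schema layout as I understand it; `run_task` is a hypothetical stand-in that does not call any real program, and it returns no real numbers.

```python
import copy
import json

# One task description: molecule, what to compute, and at what level of theory.
task = {
    "molecule": {
        "symbols": ["H", "H"],
        "geometry": [0.0, 0.0, 0.0, 0.0, 0.0, 1.4],  # flat x, y, z list in bohr
    },
    "driver": "energy",
    "model": {"method": "b3lyp", "basis": "6-31g"},
}

SUPPORTED = ("psi4", "qchem", "nwchem")


def run_task(task, program):
    """Hypothetical dispatcher: same input, same output layout, any back end."""
    if program not in SUPPORTED:
        raise ValueError(f"unknown program: {program}")
    result = copy.deepcopy(task)                 # the output echoes the input
    result["provenance"] = {"creator": program}  # who produced it
    result["return_result"] = None               # a real back end fills this in
    return result


# Switching programs changes one argument; the result keys stay identical.
results = {name: run_task(task, name) for name in SUPPORTED}

# Because everything is plain JSON, results survive a trip over the wire.
wire = json.dumps(results["psi4"])
restored = json.loads(wire)
```

With the real QCEngine package, the corresponding call is `qcengine.compute(task, "psi4")`, with additional schema bookkeeping fields that this sketch leaves out.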
In addition, we can also hook into, for example, semiempirical codes if you want. Currently we have TorchANI, if you want to run ANI-1 to see how well that does for your test set. We also have things like force fields, so we're able to run UFF as one of the first approximations that we use for testing. That is incredibly useful for exercising the whole stack instead of waiting for quantum mechanics to actually compute. We find that this is three orders of magnitude faster, which allows us to develop our software so much faster.
I also wanted to go a little bit into the API itself and how a QCPortal call would work. Again, this is very low level. This would generally be inside fragmenter, but if you want to access this project directly, this is how.
First of all, you build yourself a client and connect to a website somewhere. That website will usually be the MolSSI central server, but if you want to have private servers as well, that's perfectly doable. The first thing you can do is list all the collections that you have. A collection is very analogous to a workflow, except that it's very static in nature, and usually it will give you data back in a very specific format for the client. In this case we have the Open Force Field workflow that we've been working with Chaya to create. Within that, it looks like we're dealing with the ChemPer project, and we are switching between Psi4 and RDKit compute, basically between testing and production. There are also other things called data sets, which are different kinds of collections that are much more tabular in nature: I have a very large number of molecules, and I want to compute a single method for them.
So there are a lot of different types of collections that you can use.
Once you look at the collections, in this case I want to open up the OpenFF workflow, and I get the ChemPer Psi4 data set back. The ChemPer Psi4 data set is about 15 gigabytes at the moment, so obviously you don't want to pull all of this at the same time. What you do instead is pull a metadata object that you can deal with in Python, and then pull the data that you're really interested in.
Like I said, this interface is very specific to fragmenter itself. In this particular case, whenever I create the Open Force Field workflow, I've already determined all the fragmenter options, all the RDKit options, and exactly how I want this. Once I have done that, I can add new fragments by simply specifying what my initial molecule looks like, in this case an ethane fragment, again in the quantum chemistry schema. It's going to go off, submit it to the server, and start the cascade of compute to actually evaluate this, in a single line of code. In this particular test set I think we ran perhaps 134 torsion scans, which resulted in maybe 80,000 QM computations. So this is a single job out of hundreds that we can add to this.
We can also pull these results back quite easily. For example, one of the very specific things that we want is the torsion energy profile. So instead of having to look at, well, we have torsion drives; inside the torsion drives are optimization runs; inside the optimization runs are your individual quantum chemistry computations; instead of dealing with all that, you can just say "give me the final energies," and I can plot this directly and see what the torsion energy profile was for this particular job. So I'm able to pull back not all the data but individual pieces of that data very easily, so that I'm not hogging huge bandwidth.
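The nesting just described (torsion drive, then optimization runs, then individual computations) and the "give me the final energies" shortcut can be illustrated with a small made-up record. Everything here is an assumption for illustration: the record layout and `final_energies` are hypothetical, and the energies and zeroed geometries are placeholders, not computed values.

```python
import json

# Hypothetical nested record for one torsion drive on an ethane-sized fragment.
# Keys of "optimizations" are dihedral angles in degrees; each optimization
# holds its trajectory of single-point results (energy plus 8 atoms x 3 coords).
record = {
    "dihedral": [0, 1, 2, 3],
    "optimizations": {
        "-60": {"trajectory": [
            {"energy": -79.190, "geometry": [0.0] * 24},
            {"energy": -79.200, "geometry": [0.0] * 24},
        ]},
        "0": {"trajectory": [
            {"energy": -79.180, "geometry": [0.0] * 24},
            {"energy": -79.195, "geometry": [0.0] * 24},
        ]},
        "60": {"trajectory": [
            {"energy": -79.185, "geometry": [0.0] * 24},
            {"energy": -79.200, "geometry": [0.0] * 24},
        ]},
    },
}


def final_energies(record):
    """Map each dihedral angle to the last energy of its optimization."""
    return {
        int(angle): opt["trajectory"][-1]["energy"]
        for angle, opt in sorted(record["optimizations"].items(),
                                 key=lambda item: int(item[0]))
    }


summary = final_energies(record)

# The summary is what you would plot; it is far smaller than the full record.
full_size = len(json.dumps(record))
summary_size = len(json.dumps(summary))
```

The size gap in this toy case is small in absolute terms, but it grows with the number of grid points, optimization steps, and stored properties.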
This is a couple of kilobytes versus many gigabytes. I should note again that within this run we actually have all the Wiberg bond orders already there. I think we were talking about what would happen if a hydrogen is moving. Well, we can go back and pinpoint exactly when that hydrogen moved, where it sat, and what happened to it, via the bond orders. So we have all that data that we can go back to.
I should note too that the Quantum Chemistry Archive project is not just about torsion data. It's actually for all kinds of data. In this particular case we're doing torsions, but we also have bonds, angles, and impropers within the Open Force Field collaboration. We're also doing electrostatic potential data; not quite yet, but definitely very soon. It looks like this was a snapshot from before I was quite done with this slide, so I'll say that's about it.
The other point that I wanted to make real quick is that what's nice about the software stack is that if you want to run it on your laptop, you can do so, and if you want to run it in a massively distributed fashion, you can do so as well. A lot of times you might want to say, I want to run a torsion drive with ANI-1 on my laptop, and that's perfectly feasible with the computational resources available. So it's incredibly elastic in nature, able to handle small projects all the way up to very large projects. There are all kinds of other capabilities that I was going to list here, but basically any kind of quantum chemistry data that you want, we can compute it and we can organize it in a fashion that's useful, and the project has the flexibility to do it.
And I believe that's all I have. If you'd like to talk to me more about this, please come and find me.
Questions on the QCArchive part? I think this is going to be a fantastic resource for the community, and especially for enabling machine learning.
All right, just a couple more slides before lunch. So, the last thing... Oh, yes, please go ahead.
Where's that data going to sit?
Sorry, currently it's going to sit at TAC, the Tech Advanced Computing Center. And it should be open for anyone, so anyone can pull from that central location.
Thank you. I should ask if there are any questions from Zoom as well.
All right. So once all that data is retrieved, obviously it goes, in the first generation, into ForceBalance to parameterize things. There's an ab initio target, I believe. Actually, maybe you can just walk us through this slide very briefly?
Sure. Okay. So I'll go through this a little bit quickly. Because the final result of a torsion drive is a set of ab initio single-point energy calculations, it can be directly translated into the ab initio target type, so that ForceBalance can fit force field parameters to reproduce these energies and, optionally, gradients. The automation of data flow from QCArchive into ForceBalance still needs to be written, but I think that should be pretty straightforward because they were really designed to interact with each other. The use of torsion drive data to parameterize force fields has been done before, and I'm definitely not the first. These are just some examples where ForceBalance has been used to do that.
These figures you've already seen in the previous talk, where we parameterized a force field to reproduce these torsional energy profiles, for a protein force field on the left, and we are currently in the middle of adding parameters for phosphorylated amino acids on the right. I would like to point your attention a little bit closer to the scatter plot of the QM versus MM energies that we have here. Mainly I'd like to show you that in the initial scatter plot you have a lot of blue points that are below the diagonal line, which means that you have conformations where molecular mechanics is predicting a lower energy than quantum mechanics. That means that if you run a simulation using the initial parameters, you'll get a wrong equilibrium structure. And that is, I think, first and foremost the type of problem that reparameterization should correct.
So that's why the conformations and energies in the data set are not equally weighted. We have larger weights for lower-energy points, and we also increase the penalty on conformations where the molecular mechanics energy is lower than the quantum mechanics energy, so that after a few optimization cycles you see that the predictions of the force field are not just close to the diagonal line but mostly above it.
And lastly, even if you have a lot of data, that's not going to completely prevent your force field from going somewhere far away where it is no longer accurate. So a good idea is that after you fit your parameters, you perform additional molecular mechanics minimizations, compute quantum mechanical data at the resulting structures, and then add this quantum mechanical data as extra targets.
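The weighting idea described in this passage (larger weights for low-energy points, and an extra penalty wherever MM falls below QM) can be written down as a small objective function. This is a sketch under stated assumptions, not ForceBalance's actual functional form: the exponential weight, the `kT` scale, and the `below_penalty` factor are illustrative choices, and energies are assumed to be in kcal/mol relative to a common reference conformation.

```python
import math


def point_weight(e_qm_rel, kT=1.0):
    """Larger weight for conformations with lower relative QM energy."""
    return math.exp(-e_qm_rel / kT)


def objective(e_qm, e_mm, kT=1.0, below_penalty=10.0):
    """Weighted mean squared error with an asymmetric penalty.

    Points where the MM energy is below the QM energy are multiplied by
    `below_penalty`, because those are the ones that can create a wrong
    equilibrium structure in simulation.
    """
    numerator = 0.0
    total_weight = 0.0
    for qm, mm in zip(e_qm, e_mm):
        w = point_weight(qm, kT)
        asym = below_penalty if mm < qm else 1.0
        numerator += w * asym * (mm - qm) ** 2
        total_weight += w
    return numerator / total_weight


# Illustrative relative energies (kcal/mol), not real data.
e_qm = [0.0, 1.0, 5.0]
mm_above = [0.0, 2.0, 5.0]  # MM overestimates the middle point by 1
mm_below = [0.0, 0.0, 5.0]  # MM underestimates the same point by 1

loss_above = objective(e_qm, mm_above)
loss_below = objective(e_qm, mm_below)
```

Both trial force fields miss the middle point by the same amount, but the one that dips below the QM energy is penalized ten times more, which pushes the fit toward the "mostly above the diagonal" behavior described in the talk.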
This is basically going to root out and remove spurious molecular mechanics minima that are far away from your quantum mechanics training data. This wasn't my idea. After doing this for a few cycles, if you no longer encounter spurious molecular mechanics minima, then with some reasonable confidence you can say that the parameters are at least pretty good for this degree of freedom. So that's how we plan to use this data in the short term.
Great. Thanks so much, Lee-Ping. So, after feedback from the October virtual meeting, we are working on a bespoke torsion fitting tool that will reuse many of the same components but run on your laptop or on local computing resources. The idea is that for a free energy calculation, for example, you might have a congeneric series that you'd like to generate all the fragments for and parameterize very rapidly to refit high-quality torsions. We're hiring somebody to lead this project and are working on that right now, and it might be something we talk more about to get some more feedback about your compute environments. We'd also love input on what compound sets we should prioritize for the first sprint of generating data and fitting for the first improvement of torsions, so that might be a good afternoon discussion topic as well.
And with that I'd like to thank everybody: all the folks who spoke and all the folks who have contributed. I also want to thank the folks at Pfizer for a lot of conversations about the fragmentation schemes that they had used, which very heavily influenced our fragmentation pipeline. Any questions, I'm happy to answer them, either from Zoom or from the audience here.