Okay, so today I will talk about fragmenting molecules in a way that doesn't destroy the chemistry around the bond of interest, while still capturing nonlocal through-bond effects for torsion scans.
So why do we need to fragment molecules? First, to avoid the high computational cost of running quantum chemistry torsion scans. Most quantum chemistry methods, such as Hartree-Fock or DFT, scale poorly with molecular size. What I'm showing here is data that we have from QCArchive: on the x axis you see the number of heavy atoms, and on the y axis the CPU time for one gradient evaluation. As you can see, the cost scales poorly with molecular size. Now, if you overlay the distribution of FDA-approved small molecules, which were taken from DrugBank, what you see is that most drug-like molecules have roughly 25 heavy atoms. From our estimate, that would mean an average torsion scan costs around one million CPU seconds. In comparison, a molecule that has 15 heavy atoms costs approximately 300,000 CPU seconds, which is an order of magnitude less. So we want to fragment to reduce computational cost.
But in addition to reducing computational cost, we also want to avoid nonlocal, through-space intramolecular interactions that can convolute the torsion scan. We want the torsion scan to mostly include the 1-4 interactions and the level of conjugation around the bond, and we want nonlocal through-space effects to be taken care of by other parameters in the force field, such as the nonbonded Lennard-Jones terms.
So what are some problems that we can run into when we fragment molecules? Let's take a look at this example, the biphenyl, and we'll look at the central bond.
What I'm showing on the left is the torsion scan of the central bond, and as you can see, it can rotate. Now, if you protonate the nitrogen on this molecule, the barrier heights increase. If you deprotonate the oxygen, they increase some more, and when the molecule is in the zwitterionic state, you end up with a scan that looks more like a double bond than a single bond. Now if you take these molecules and run molecular mechanics, all the torsion scans look alike, as I'm showing on the right, which is not what we want. So why is this happening? If you draw the resonance structures for this biphenyl, what you see is that the central bond is actually part of the conjugated system, and it is an aromatic bond.
So we see the problem, but how is this a problem when we are trying to automate fragmentation and the parameterization of torsions? One is that most cheminformatics toolkits will label the central bond as a rotatable bond. Another is whether we can use the same torsion parameters for these four torsions: they are the same torsion type, but they obviously have very different torsion scans. And most relevant for this discussion: how do we ensure that we do not naively destroy the chemical environment by fragmenting off a remote substituent that has a strong effect on the bond we want to run QC torsion scans for?
So we can use the AM1 Wiberg bond order, which is a cheap measure of electron population overlap between two atoms. It is calculated by taking the quadratic sum over the occupied orbitals on atoms A and B in the bond. The Wiberg bond order gives us a number, and that number corresponds closely to what chemists think about when they think about bond multiplicity.
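As a rough illustration of the "quadratic sum" mentioned above: in an orthonormal basis (which semiempirical methods such as AM1 assume), the Wiberg bond order between atoms A and B is commonly written as the sum of squared density matrix elements P[mu, nu], with mu running over orbitals on A and nu over orbitals on B. The sketch below is a minimal NumPy version of that formula. The function name, the `basis_to_atom` mapping, and the toy H2 density matrix are assumptions made for illustration; this is not the code used in the talk.

```python
import numpy as np

def wiberg_bond_order(density, basis_to_atom, atom_a, atom_b):
    """Sum of squared density-matrix elements between orbitals on two atoms.

    Assumes `density` is expressed in an orthonormal basis and that
    `basis_to_atom[i]` gives the atom index that basis function i sits on.
    """
    basis_to_atom = np.asarray(basis_to_atom)
    on_a = np.where(basis_to_atom == atom_a)[0]
    on_b = np.where(basis_to_atom == atom_b)[0]
    # Off-diagonal block of the density matrix coupling atom A to atom B.
    block = np.asarray(density)[np.ix_(on_a, on_b)]
    return float(np.sum(block ** 2))

# Toy example (hypothetical): H2 with one s orbital per atom.
# The doubly occupied bonding orbital has coefficients (1/sqrt(2), 1/sqrt(2)),
# so the closed-shell density matrix is P = 2 * c c^T.
c = np.array([1.0, 1.0]) / np.sqrt(2.0)
density_h2 = 2.0 * np.outer(c, c)
wbo_h2 = wiberg_bond_order(density_h2, [0, 1], 0, 1)
print(wbo_h2)
```

For this toy density matrix the result is a bond order of one, which matches the chemical picture of a single bond. In practice the density matrix would come from an AM1 calculation in a quantum chemistry package rather than being written by hand.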
So what I'm showing here are the Wiberg bond orders for this series of molecules. In the neutral molecule we end up with a bond order that's close to 1, and it increases to around 1.5 for the zwitterion. If we plot the barrier heights against these Wiberg bond orders, what we see is that the relationship is pretty linear. That raises a couple of questions. One: can we use this relationship to interpolate torsion parameters for the same torsion type in different chemical environments, where the environment is changed by a remote substituent? And two: can we use the Wiberg bond order as a surrogate to determine whether the way we fragmented the molecule destroyed the chemistry of the central bond?
To see how general this linear relationship between the Wiberg bond order and the torsion barrier is, I generated a set of substituted phenyls. At the X1 position we have functional groups that span electron-donating and electron-withdrawing groups, as shown on the bottom, and we have the same full spectrum at the X2 position, so this is a combinatorial set. Then, for each substituent at the X1 position, we calculated the Wiberg bond order for this bond in the chemical environments of all the different substituents at the X2 position. What I'm showing here is what the distributions of Wiberg bond orders look like. My pointer isn't working, so I'll just describe which one to look at. On the y axis is the substituent at the X1 position of the molecules on the left.
Say for NMe2, we have this distribution in burgundy, and what's in that distribution are all the molecules that have NMe2 at X1 but all the other substituents at X2. For Y1 and Y2, we also have nitrogen there. What you can see from this plot is two things. First, there's definitely a trend: as the substituents become more electron-withdrawing, the Wiberg bond order tends to be lower. Second, for each substituent, the Wiberg bond order changes depending on what's at the X2 position.
We then took representative molecules from this set and ran torsion scans on them, and what we found was that we still have this linear relationship. Again, on the x axis you see the Wiberg bond order and on the y axis the torsion barrier heights. I'm showing a few substituents, and you see that the linear relationship still holds. If we look at all the other substituents that we had in the set, we still see that relationship. What this is telling us is, first, that we should be able to use the Wiberg bond order as a surrogate for torsion barrier heights, so we may be able to interpolate using Wiberg bond orders when it comes to fitting torsion parameters. And second, when we fragment molecules, the relative change in the Wiberg bond order will tell us whether we have significantly changed the chemistry around the central bond that we want to run torsion scans for.
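The interpolation idea above can be sketched as a simple least-squares line through (Wiberg bond order, barrier height) pairs, which is then evaluated at a new bond order. This is only a minimal sketch under the assumption that the relationship is linear within one torsion type. The function names are hypothetical, and the numbers are placeholders chosen for illustration, not data from the talk.

```python
import numpy as np

def fit_barrier_vs_wbo(wbos, barriers):
    """Least-squares line: barrier ~= slope * wbo + intercept."""
    slope, intercept = np.polyfit(
        np.asarray(wbos, dtype=float), np.asarray(barriers, dtype=float), deg=1
    )
    return float(slope), float(intercept)

def predict_barrier(wbo, slope, intercept):
    """Estimate a barrier height for a bond order not in the training series."""
    return slope * wbo + intercept

# Placeholder values for illustration only; NOT results from the talk.
example_wbos = [1.00, 1.10, 1.20, 1.30]
example_barriers = [10.0, 30.0, 50.0, 70.0]  # arbitrary energy units

slope, intercept = fit_barrier_vs_wbo(example_wbos, example_barriers)
estimated = predict_barrier(1.15, slope, intercept)
print(slope, intercept, estimated)
```

A separate line would presumably be needed for each torsion type, since the talk only reports the linear trend within a series that shares the same central bond.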
So the fragmentation scheme that we're using right now is this. We take a molecule, find the rotatable bonds, and calculate the Wiberg bond orders. For our initial fragment we use the scheme developed by scientists at Pfizer, which is in this reference: you build out so that you keep all the 1-5 atoms around the central bond, and you also make sure not to fragment rings or specific functional groups that we don't want to break up. Once we have that minimal fragment, we recalculate the Wiberg bond order for the central bond. If the difference between the Wiberg bond order of the fragment and that of the parent molecule is within a user-set threshold, we say that we have arrived at a fragment that is representative of the chemistry in the parent molecule. If the difference is above the threshold, we have to start building out around the fragment. What I'm showing here is a fragment where the Wiberg bond order for the highlighted bond is 0.99, while in the parent molecule it is 1.09. According to the data I showed previously, a 0.1 difference in the Wiberg bond order does lead to a significant change in the torsion scan. So now we start building out around this bond until we reach a Wiberg bond order that is within the disruption threshold.
So what disruption threshold should we be using? For that, I generated a benchmark set so that we can validate the fragmentation scheme and determine what the parameters for our scheme should be.
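The build-out loop described above can be sketched as follows. This is a simplified illustration, not the actual implementation. It represents the molecule as a plain adjacency dictionary, takes the bond order calculation as a user-supplied callback (`fragment_wbo`, hypothetical), and grows the fragment one shell of bonded atoms at a time. The real scheme also preserves rings and functional groups and caps open valences, all of which are omitted here. The `initial_radius` default is an assumption standing in for the "1-5 atoms" rule.

```python
from typing import Callable, Dict, FrozenSet, Set, Tuple

Graph = Dict[int, Set[int]]  # atom index -> indices of bonded neighbours

def atoms_within(graph: Graph, seeds: Set[int], n_bonds: int) -> Set[int]:
    """All atoms reachable from `seeds` in at most `n_bonds` bonds."""
    selected = set(seeds)
    frontier = set(seeds)
    for _ in range(n_bonds):
        frontier = {nbr for atom in frontier for nbr in graph[atom]} - selected
        if not frontier:
            break
        selected |= frontier
    return selected

def grow_fragment(
    graph: Graph,
    bond: Tuple[int, int],
    parent_wbo: float,
    fragment_wbo: Callable[[FrozenSet[int]], float],
    threshold: float,
    initial_radius: int = 2,  # assumed stand-in for the minimal-fragment rule
) -> Set[int]:
    """Grow a fragment around `bond` until its bond order matches the parent."""
    fragment = atoms_within(graph, set(bond), initial_radius)
    while abs(fragment_wbo(frozenset(fragment)) - parent_wbo) > threshold:
        grown = atoms_within(graph, fragment, 1)
        if grown == fragment:
            break  # nothing left to add: the fragment is the whole molecule
        fragment = grown
    return fragment

# Toy example (hypothetical): a 13-atom chain with the bond of interest
# between atoms 6 and 7 and a remote group at atom 3 that shifts the
# bond order from 0.99 to 1.09 once it is included in the fragment.
chain: Graph = {i: set() for i in range(13)}
for i in range(12):
    chain[i].add(i + 1)
    chain[i + 1].add(i)

def toy_wbo(atoms: FrozenSet[int]) -> float:
    return 1.09 if 3 in atoms else 0.99

fragment = grow_fragment(chain, (6, 7), parent_wbo=1.09,
                         fragment_wbo=toy_wbo, threshold=0.03)
print(sorted(fragment))
```

In the toy example the minimal fragment misses the remote group, so the loop adds one more shell and then stops, without pulling in the rest of the chain. Growing shell by shell is one simple choice; ordering candidate atoms by their effect on the bond order would be another.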
For that, I filtered DrugBank based on size, number of rings, and some other properties, and arrived at around 700 approved molecules. Then I took those molecules and did an exhaustive fragmentation, which means generating every possible fragment from the molecule without fragmenting rings. For each one of those fragments I generated a set of conformers and then calculated the Wiberg bond orders for those conformers. What we ended up with were distributions of Wiberg bond orders for every fragment that contains the specific bond we're looking at. In this case we're looking at the parent molecule at the upper left, and at the highlighted bond between the sulfur and the ring. These distributions come from all the fragments of that parent molecule that contain this bond. If you look at them, you see some clustering of fragments, and when you look at the fragments in the different colored clusters, you find that there are important remote chemical changes that cause these shifts in the Wiberg bond order. The colors are based on a score for how far each distribution is from that of the parent molecule; in this case I'm using the MMD, which is the maximum mean discrepancy. On the right, the x axis is this MMD distance score, and the y axis is the estimated CPU time of one gradient evaluation for the fragment. We want to end up with fragments in the lower left corner, which are cheap.
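The distance score mentioned above, the maximum mean discrepancy, compares two sets of samples through a kernel. Below is a minimal sketch of a squared-MMD estimate for one-dimensional samples such as per-conformer Wiberg bond orders. The Gaussian kernel and the `bandwidth` value are assumptions, since the talk does not say which kernel or estimator was used, and the sample arrays are synthetic placeholders rather than real bond orders.

```python
import numpy as np

def mmd_squared(x, y, bandwidth=0.01):
    """Biased (V-statistic) estimate of squared MMD with a Gaussian kernel."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(-1, 1)

    def kernel(a, b):
        # Pairwise Gaussian kernel between two column vectors of samples.
        return np.exp(-((a - b.T) ** 2) / (2.0 * bandwidth ** 2))

    return float(
        kernel(x, x).mean() + kernel(y, y).mean() - 2.0 * kernel(x, y).mean()
    )

# Synthetic placeholder samples for illustration only; NOT data from the talk.
rng = np.random.default_rng(0)
parent = rng.normal(1.09, 0.01, size=200)         # parent-like distribution
close_fragment = rng.normal(1.09, 0.01, size=200)  # similar chemistry
far_fragment = rng.normal(0.99, 0.01, size=200)    # shifted bond order

score_close = mmd_squared(parent, close_fragment)
score_far = mmd_squared(parent, far_fragment)
print(score_close, score_far)
```

A fragment whose bond order distribution overlaps the parent's gets a score near zero, while a shifted distribution gets a clearly larger score. The absolute values depend on the kernel bandwidth, so scores are only comparable when computed with the same settings.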
In other words, we want fragments that are not too expensive but are also close to the chemical environment of the parent molecule; we don't want to end up with small fragments that are too far away, or with large fragments. Now, with our fragmentation scheme, we end up with the fragment that is circled in red, which is good: it's small, and its chemical environment hasn't moved too far from that of the parent.
After looking at all these molecules, what we found was that certain bonds are more sensitive than others, which makes sense, because bonds that are part of conjugated systems will be more sensitive to remote substituents. What I'm showing here are three representative molecules. The more red the bond is, the more sensitive it is. I'm also showing which functional group, which chemical moiety, it is sensitive to. So on the left, the red bond is sensitive to the protonated nitrogen, and so on.
So now that we have this benchmark set, which ended up with roughly 300,000 fragments, we can use it to see what kind of threshold we should use. What I'm showing here are the molecules from the benchmark set, fragmented using different thresholds. On the left I'm showing a very small threshold. With a very small threshold you do end up with fragments that are very close to the parent, but you also end up with larger fragments, and the computational cost is too high. If you use a threshold that's too big, like 0.1, you end up with many small fragments, but for many of them the distance score is just too high.
A threshold of 0.03 is where you end up with the most fragments in the lower left quadrant, where they're not too expensive and their distance score is not that high.
Now we compare these distance scores to just using the simple rule-based scheme that we have from the Pfizer group. For many cases that scheme does find good fragments, but not for the cases where you have bonds that are sensitive to remote substituents. In this plot we're looking at the difference in those distance scores. Sometimes the simple Pfizer scheme does perform slightly better than our scheme, but in those cases it's only a little bit better. We have many that are equally good. And in many cases, shown in the blue part of the distribution, we do better, and those are the molecules where we have the sensitive bonds.
After looking through the set, we also found which chemical groups induce these nonlocal effects. What I'm showing at the top of the slide are the functional groups that we found to induce these long-range through-bond effects. Again, this is not exhaustive; it only covers functional groups that were in the set to begin with, and there might be other groups that induce long-range effects. The figures at the bottom show what this looks like. The molecule with the central bond highlighted in blue is the parent molecule, the highlighted bond is the bond that we're looking at, and the circled group is the functional group that it is sensitive to.
The blue distribution is the distribution of Wiberg bond orders calculated at the central bond from Omega-generated conformers. The orange distribution is what we get when we just take the minimal fragment without recalculating the Wiberg bond order of the central bond. As you can see, its distance score is 0.54, which in this context is a pretty large score. The green one is the fragment that you get from recalculating the bond order and then growing out. In this case, it found the protonated nitrogen, and now the distance score is a lot smaller. What I'm showing on the right is another example, but this time we're looking at the protonated oxygen.
So, to summarize: what I've shown is that the Wiberg bond order is a good descriptor of a bond's chemical environment. It is correlated with the torsion barrier height, and we can use it as an indicator of whether removing a remote substituent altered the chemical environment. We also found these eight chemical groups that have long-range effects. For future directions of this project, one is looking at interpolating torsion force constants from the Wiberg bond order. And currently, the scheme that we're using, where you calculate the Wiberg bond order and then have to recalculate it, is cheaper than running torsion scans but still somewhat expensive. So maybe we can use machine learning models, trained on the exhaustive fragmentation data set that we have, to learn which chemical groups and which bonds we need to be careful about and should not fragment.
And with that, I would like to acknowledge our collaborators, the Open Force Field Initiative.
I also want to thank Chris Bayly for initiating this project, my program and NSF MLC for funding, and of course the QCArchive project, where all the torsion scans that were run for this project are available. And with that, I can take questions.
Any questions?
Aside from fragmentation, thinking of using some of the same ideas for parameter interpolation: what other specific chemical series or environments would you want to look at next if you had more time, beyond the substituted phenyl set?
Yes. So what I have now are substituted phenyls. I think we should look at biphenyls, and we should also look at different kinds of conjugation, like hyperconjugation. So, some combinatorial sets of smaller fragments with different substituents at the different positions around the central bond. Think about the 1-4 or 1-5 positions, and look at, say, oxygens, nitrogens, and fluorines, and just general functional groups that we know are donating or withdrawing around those central bonds. Does that make sense?
Yeah, thanks.
And I think it should be possible to design a set that covers a large chemical space.
Any other questions?
I've got a quick question. How conformationally sensitive are the WBOs, and how important is it exactly which conformation you use to compute them?
That is a very good question. I don't have slides to show in this talk, but I did do an extensive analysis of that. Let me go back a slide. Okay, so the bonds shown here in red, the ones that are more sensitive to remote substituents, are also the ones that are more sensitive to conformation.
So not all bonds are equal; the Wiberg bond order is not equally sensitive for all bonds. These ones are sensitive to conformation. I didn't mention it here because there wasn't enough time, but we can use something called ELF10 Wiberg bond orders, which try to integrate out the conformation dependency by removing conformers that have strong intramolecular interactions, and that removes some of the fluctuations. Those Wiberg bond orders, I found, are in general pretty consistent. So you don't really need to think about the conformations: they're very reproducible, and you'll get pretty much the same Wiberg bond order every time you calculate it. That is the one we're using in the fragmentation scheme, and it is implemented in OpenEye.
Okay, thanks.