 Hello everyone, and welcome to the fifth webinar in the BioXL Best Practices in QM Simulation workshop. This is the final webinar of this year. As a reminder for those who have been to the other webinars and also to let those who are new know what the goals are at this workshop as a whole. We're really looking at exploring what is a good practice and also bad practice with things to avoid when performing QM simulation for biomolecular systems. So we've invited a range of expert speakers to share their experience based on expertise about applying QM in a variety of areas. So with that, I would like to introduce today's speaker. I'm very pleased to be able to welcome Professor Adrian Mulholland to speak today. Adrian heads the Centre for Computational Chemistry at the University of Bristol. And he has significant expertise in computational enzymology, having built up a body of work over the past 25 years in this area. In particular, developing and applying multi-scale techniques, especially QM for enzymology. And Adrian's interests also span antimicrobial resistance, drug metabolism, biocatalysis, and all things enzyme, many things enzyme related. Adrian founded and led until a while ago the UK's high-end or high-performance computational biomolecular simulation consortium, HEC BIOSIM. He was the chair of the 2016 Gordon Conditional Chemistry Conference, the inaugural Lakshmi Raman Lecture at the University of Pittsburgh. Sorry, it's American, not UK, 2019. And recently he was awarded the John Maurek Thomas Medal for outstanding and innovative work in catalytic science. So with that, I will hand over the virtual floor to Adrian. Okay, so, yes, QMMM efforts. So a long time ago when I was a graduate student, I became interested in enzyme reactions and having spent a little bit of time in a drug company. I was interested in how understanding enzyme mechanism could potentially be used in a pharmaceutical context and also in herbicides and, you know, as other biological targets. And I went to work with Graham Richards in Oxford to do my PhD. I went back to the idea of Linus Pauling, the idea that if you understand the transition state for a reaction, for an enzyme catalyzed reaction, you potentially have a route to designing inhibitors. That's an old idea, which, as we'll see, have some merit that enzymes stabilize transition states. So if you can make a stable molecule that resembles a transition state, then that should bind tightly to the active site and out compete the substrate should have higher affinity. And the corollary of that is if you can design something that stabilizes a transition state, it should be a catalyst. And I said, well, take this. He gave me this paper from guy called Martin Carplus and Ron Elba. He said this is a way to find transition states in big molecules. So why don't you implement this and apply it to an enzyme called triasphosphatizomerase. Because Jeremy Knowles across the road works on that enzyme. And this would be an interesting case to potentially test inhibitor development. Shortly into my PhD, this guy Martin Carplus managed to publish my PhD. So he had looked at the reaction mechanism of triasphosphatizomerase using the charm program, the early QMM interface in that and found a mechanism, many details of which remain relevant and correct today. And he'd used QMM and it seemed to me that this was a good approach and Graham went and Martin was kind enough to accept me into his group. Graham supported me to visit there and I went on to work with Martin in future. I think it's fair to say that QMM methods only really became respectable back in 2013. Of course, Carplus, Levitt, Arshall got the Nobel Prize for multi scale methods with QMM exemplifying those. They have, I think, proved their use and their worth in lots of different applications. Now, of course, there are different types of QMM methods. That's some something that I'll touch on today and you've seen examples of that in the previous webinars in this excellent series. The basic principle is very simple. Of course, that we use an electronic structure method in the active site, focusing the computational effort, particularly there and allowing for, for example, modeling of reactivity in the active site and include the effect of the environment more approximately, for example, by MM point atomic charges, invariant charges and include the interactions generally and everything I'm going to talk about the QM region is polarized by the MM region. There is rightly a strong focus on the boundary between those two regions, often in terms of electrostatics, but something I'd particularly point out is that in all the QMM implementations that I'll talk about, we're including non bonded interactions through typically the Lennard-Jones terms and the choice of those Lennard-Jones parameters to represent van der Waals type interactions is also very important, particularly, for example, of course, as the chemical nature of some of the atoms changes during the reaction. So the Lennard-Jones parameters that you use need to represent all the chemical states through the reaction, unless you're going to change those parameters somehow during your simulation. And typically we've stuck to simple partitioning methods. There are other methods available and we've tested some of those. But in many cases, one can get away with using a link atom, a so-called link atom, which is generally a hydrogen atom placed along the covalent bond between the QM and the MM regions, and then including MM bonded terms between those atoms that are involved any one MM atom. So what have we done with this? Well, I'm going to talk about a couple of what, to me, are classic enzymes and I think these guinea pigs, these enzymes which many groups have worked on now, are important in terms of testing aspects of QMM modeling and the sensitivity of the results to some of the parameters that you may use. And they're also good as tests versus experiment. When we do know mechanistic details fairly with some confidence, we can start to make predictions about experimental observables in terms of its selectivity, for example, or rate of reaction for direct comparison with experiment. And I particularly highlight this example, pH-BH, parahydroxybenzoate hydroxylase. Now, so this is a flave-independent monooxygenase, and this enzyme and some of its similar enzymes like phenylhydroxylase are quite promiscuous. They'll accept a variety of, for example, halogenated substrates. So there can be substituent fluorine substitutions made on this ring, which alter the rate of reaction. And we can compare with experiment and the basic comparison is in terms of transition state theory. So when we calculate a rate coefficient, or sorry, when we measure a rate coefficient, we can compare it in principle with a calculated activation free energy. We should bear in mind that there are dynamical corrections to transition state theory or recrossing coefficient. And of course, tunneling can be important and needs to be taken into account. Generally, and we've looked at this, this transmission coefficient is fairly close to one in terms of classical reactivity. And the most important factor is in terms of determining the rate of reaction is the activation free energy. And what can do calculations on enzymes like pH-PH to see how good your predictions of rate coefficients are, to see whether we are in fact doing predictive calculations. And this is going back many years now. What we were able to do calculations for, this is phenylhydroxylase, do calculations of activation barriers with a low-level QMM method, and find that those barriers do indeed correlate with the log of the K-Cat of the rate coefficient for the enzyme, the experimental rate coefficient. And that shows that this method is indeed predictive. Now, in absolute terms, you'll see these barriers are very high. They're too high. And that's because the AM1 method is approximate. But it does show that even with a low-level calculation, one can start to make sensible and useful predictions about enzyme activity. The other thing I would say here is that these are potential energy barriers. Now, those are calculated, in this case with an approximate reaction coordinate, where we're simply using the difference between two bond lengths as the reaction coordinate. And it is a potential energy barrier, so no effect of entropy, for example. So the activation entropy presumably is similar for all these different phenols. Citrate synthase, another classic example. So citrate synthase is the enzyme responsible for generating most of the energy that comes out of digesting your food, in that it is the position in the citric acid cycle in which two atom units enter for ultimate oxidation. Now, we think we've identified, again, from QMM calculations, the mechanism of reaction, which involves deprotonation of astral CoA, via an enolate intermediate, which is a nucleophile, to form this molecule over here, citriol Coenzyme A. Now, the important thing about citriol Coenzyme A is that it is a kinetically competent intermediate. In other words, you can give this to the enzyme and it will turn it down and make citrate about as fast as it would make it from the natural substrates. So it is an intermediate and that means that its energy should be consistent with the experimental kinetics. So when we predict a mechanism involving formation of citrate Coenzyme A, we want to make sure, we want for the mechanism to be reasonable, the energy of this intermediate has to be lower than the activation apparent, activation energy from experiment for the reaction. Now, this is a rather busy slide, but what it shows is our QMM potential energy profiles for formation of citriol CoA in the enzyme, going from acetyl CoA over here to citriol CoA over on this side. And we believe, so there's a number of profiles on here, this purple one and this dark blue one are with MP2, so Abinicio QMM methods, and they predict that the energy of citriol CoA is indeed relatively low. But remember, so we need the energy of this intermediate to be lower than the activation energy to believe the mechanism. And if we look at the B3 lip QMM result, we can see that the energy predicted energy of citriol CoA is nearly 25 kilocalories per mole relative to the substrate complex, well above the activation barrier. And so if you went on the B3 lip QMM result here, you would say that this mechanism is wrong because the energy of this kinetically competent intermediate is too high to be consistent with the experimental kinetics. So that gives an example of where it's necessary to go beyond density functional theory to make mechanistically sound predictions. DFT is our workhorse for many calculations, but it is not systematically improvable. And we don't know a priori whether results with one functional should be better or worse than results with a different functional. There's nothing to say that one is better than another, all other things being equal. So how well can we do. Well, if we take two of these guinea pig enzymes, charisma, mutase and parahydroxybenzoate hydroxylase, we can do calculations where we can test the level of QM theory in our calculations to see how well we can predict the barrier. We know the mechanisms for these two enzymes well. These are well characterized. This is charisma mutase is a Claisen rearrangement reaction. PHPH is an electrophilic aromatic substitution reaction. Now if we do calculations with a variety of different QM methods, you can see is the barrier differs greatly. Now this is the sort of table which computational chemists always get stick for presenting because it is lots of acronyms and lots of numbers. But in essence, focus on the numbers. This is charisma mutase. And this is the average barrier calculated from multiple QMM calculations for multiple pathways. And this is PHPH the other enzyme of these two. And you can see immediately the barrier is as expected much too high. The experimental activation enthalpies which are known in these cases are about 12 kilocalories per mole. The barrier with Hartree Fock is much too high. On the other hand, the barrier with B3 lip is too low as is often the case not always but often DFT gives you barriers that are too low. Localized MP2 calculations. These barriers are a little bit too low compared to experiment. But at the highest level of QM theory, using local couple cluster calculations with an approximate treatment of triple excitations, we get activation enthalpies, which are close to the experiment within about a kilocalorie per mole. In other words, these barriers are chemically accurate. These are first principles, calculations being QMM calculations. And we know that couple cluster involves the lowest degree of approximation to the Schrodinger equation of any of these amnesia methods we've tested. We know in the right cases that it can provide chemically accurate predictions of activation barriers just as it is doing here. Now what do these numbers in parentheses mean? Well, this is the standard deviation. So we've sampled here multiple configurations from low level QMM dynamic simulations and calculated multiple barriers. And why is this an activation enthalpy? Well, that's because we've calculated potential energy barriers and added corrections for zero point energy and for thermal vibrations. And you can see that within the apparent accuracy of the calculations, the uncertainty due to conformational variation, the agreement is indeed within chemical accuracy. Now those are activation enthalpies, not free energies. So we went on to calculate activation free energies and we've done this by using the low level molecular dynamics, low level QMM method to calculate activation free energies. So we can use methods such as AM1 or DFTB, which are, as you know, much less computationally demanding. We can use those in MD simulations, including in, for example, umbrella sampling MD simulations and use those to estimate activation entropies very roughly by looking at the difference between the activation free energy and the activation enthalpy at the low level. The difference gives you the estimate of the activation entropy. And the long and the short there is that for charisma mutase our estimate of the entropic contribution to the barrier the T delta S contribution is probably fortuitously but is very close to experiment. When this was work with Walter Teal, the late Walter Teal, we were able to show that the activation free energy is again within chemical accuracy of experiment. So what does that mean? Well it means that QMM methods in the best cases with the highest level QM methods such as couple cluster can give you results for activation barriers which are about as accurate as can be measured in the lab. So remember that you don't measure an activation barrier in the lab. You derive it using transition state theory from experimental kinetics. But the agreement between the experimental barrier apparent and the calculated barrier from QMM in the best cases is within chemical accuracy. That's about as accurate as it can be measured. So if we can do chemically accurate QMM calculations. First of all, it implies that the theoretical framework transition state theory which underlies the comparison we're making is a good framework. Dynamical effects can't be very large. If we're getting such good agreement based on free energy barriers. Importantly it shows that with QMM methods you can make reliable predictions about reaction mechanisms in enzymes. There are lots of mechanisms in the biochemistry textbooks that are wrong. I'm afraid because those mechanisms were written down at a time when all one could do is speculate from experimental kinetics and perhaps from music genesis and write down what looked a reasonable mechanism. But enzymes use some quite unusual chemistry and unfortunately intuition and well informed guesses are often wrong. So to be confident I would say of any enzyme mechanism it is necessary to do a good calculation and good QM calculation to show that you're getting a sensible mechanistic prediction. And at that level we can be confident about our predictions. We can also start to understand how catalysis arises. So this is pH pH. This is calculation done in Walter Teal's group. And this is the OH group in flight. So this is the aromatic ring which is going to be substituted. So the OH group is going to attach itself to this C3 here. And what we find is the transition state has an interaction which is specific and individual to the transition state. So hydrogen bond is formed with the OH group with this conserved proline residue and that's only formed in the transition state, not in the reactant, not in the product. It is specifically a catalytic interaction. It is an example of how or what Linus Pauling proposed of how an enzyme stabilizes a transition state. Similarly in charisma mutates. If we look at the electrostatic properties of the ground state compared to the transition state. So this is this blob here is the electrostatic potential difference between the reactant and the transition state. If there was no difference the blob would be green everywhere. It is where it is red or blue. That shows that there is a significant difference in electrostatic potential between the ground state and the transition state. And what the enzyme does is to place charged groups in positions specifically to recognize that difference and stabilize the transition state relative to the ground state. So how do we know that well if we take multiple pathways from this enzyme QMM pathways calculated in a similar way to what I showed you earlier. We can look at the stabilization provided by the protein as the reaction proceeds. So remember in this case in charisma mutates a simple class and rearrangement reaction. And it means that we have a very straightforward separation from between the enzyme and the reacting system. The QM region in this case is simply the substrate or transition state or product as it reacts and the MM region is the surrounding protein and solvent. We calculate potential energy barriers sampled from low level QMM MD. We can see that the amount of stabilization this is going from reactants over here on the left to products on the right. And the transition state is here in the middle. What you can see is the stabilization provided to the QM region as the reaction proceeds is maximal at the transition state. And the amount of stabilization varies. So different configurations have slightly different amounts of transition state stabilization. And the amount of stabilization determines the catalytic properties of that particular enzyme configuration. So the barrier for reaction the instantaneous barrier for reaction in the enzyme depends on how well the transition state is stabilized. That is the crucial factor in determining how fast the reaction proceeds in the enzyme. There's a good correlation here between the transition state stabilization and the barrier to reaction. Of course, that barrier in reality will be dominated by those configurations that have a lower barrier. And we can we can go on to show that this is indeed the origin of catalysis in this enzyme. The catalysis strictly defined is the lowering in barrier or the acceleration of rate by a catalyst compared to an uncatholized reaction. In the case of charisma mutate, we're able to do a very straightforward comparison between the reaction, which takes place in water, the unimolecular reaction of charisma and in the enzyme. We can calculate by the same approach I've already talked about multiple barriers in water and multiple barriers in the enzyme and see if we those do account for the difference in barrier, the apparent difference in barrier observed experimentally. So here's our average calculated bar in the enzyme 12 kilocalories per mole at this level of theory, which is B3 lip QMM calculation, and it's significantly higher in water. So this is again one standard deviation over multiple pathways. So we can see that the difference between the barriers and the two environments is indeed significant. And this arises because the enzyme is better set up to stabilize the transition state. So we can see that multiple pathways again. The barrier to reaction correlates with the transition state stabilization ability of that particular configuration, whether it's a water environment or an enzyme environment. Now, overall, catalysis, the difference in rate comes from transition state stabilization and also some confirmation effects. Again, we can use QMM methods to estimate this. Overall, the difference in activation free energy is about nine kilocalories per mole or a difference in activation enthalpy about eight. So it's primarily an energetic effect. And from a QMM calculations I've just shown you we can see that the enzyme is better stabilizing the transition state. Water is on the whole. And that's crucial to lowering the barrier. We also can analyze the difference in the energy of the substrate between the two environments and we find that there is a small difference in the energy of the substrate in a reactive confirmation in the two environments. And that contributes a small amount of barrier lowering to and also from QMM MD simulations, we can analyze the relative free energies of different confirmations of the substrate reactive and unreactive confirmations. That also is important. So all of those contribute to catalysis in the enzyme. All of those effects, I would say, are probably due to the fact that the enzyme is set up to recognize the transition state. By binding the transition state it favors a reactive confirmation and it favors a confirmation which is closer to the transition state. So this emphasizes that transition state stabilization is at the heart of enzyme catalysis in this case for charisma mutates. And the conformational effects and so on are probably a consequence of transition state stabilization rather than something distinct from it. But we can also do calculations. So charisma mutates, as I say, a guinea pig enzyme a test bed for QMM calculations. And finally, have we tested the level of QM theory and QM calculations. We've compared against, for example, large scale QM calculations. This is calculation in using the one tap linear scaling QM method. This was working collaboration with Chris scolaras and Mike pain. The goal to do here was to model the reaction within a large QM region of fully QM calculation 2000 atoms in the system, optimize the transition state within that system. And we were able to show that the results from a large scale QM and calculation are very similar to those from a QMM calculation. And differences arise from the different levels of QM theory treatment. So this is an indication that the QMM approach works well for explaining charisma mutates catalysis. We also tested this, for example, with decomposition analysis of QM calculations. Okay, so I've shown that it's possible to get chemically accurate results for QMM calculations on enzymes. Now those calculations that I showed you used local approximations, which are well defined, but they're difficult calculations to employ because it means the user has to define domains, domains which don't change during a reaction. And that requires some expert knowledge and experience. An easier way a much more straightforward way to access high level potentially chemically accurate QMM treatments is to use projector based embedding. So this is a technique developed by my colleague Fred Mamby and his collaborator Tom Miller and coworkers. And it allows for the rigorous treatment of a high level region, for example, with couple cluster within a density functional region, a larger density functional region. And what we've done is to incorporate this within a QMM framework. So this is an additional level of QMM treatment. We have a DFT region within our MMM environment. And within the DFT region, rigorously, we have embedded a couple cluster or MP to region. Now, why is that important? Well, if we look at going back to citrate synthase, and this is for the first step of the reaction, if we look at barriers to that reaction, the QMM and barriers calculated with different QM methods, you can see the barriers differ enormously. Hartree-Fock again very high, B half and half lip, somewhere down here in the middle, B three lip, PBE and LDA. These are very different barriers, even between the density functionals, they're differing by maybe as much as 10 kilocalories per mole. If we use projector based embedding, all these QMM calculations, in other words, if we define a QM region within our density functional theory region, that dependence on functional disappears. So we calculate potential energy barriers here with density functional theory. And then we can use any of those DFT QMM profiles with projector based embedding and get very similar answers. So we can predict those barriers with confidence. So let's look at charisma mutate charisma mutate again reacting to product in this stationary arrangement reaction. The barriers differ enormously between different density functionals, all respectable functionals. None is better or worse a priori than any of the others. But the barrier here is much too low, and the barrier up here much too high. So if we take any of these QMM calculations and define a high level, in this case SCSMP2 abinacea region within our DFT region, that dependence on the functional disappears. So this use of abinacea methods within a QMM context with DFT M M methods means that you remove the dependence of results on the choice functional. So it is a route to removing uncertainty due to DFT. Other things we should test. Well, we should test the size of our QM region. We've done this for reaction of charisma mutate again. Different numbers of water molecules give different barriers. So as the size of the QM region changes the barrier changes. And this gives an example of how much that's likely to be. It's a smaller variation when we do the calculations with projector based embedding. I would say a couple of things here. One is that the variation in the reaction energy and barrier is much less dependent on the size of the QM region than it is on the choice of density functional here. Functional. So one should focus more attention on a good QM treatment on the size of the QM region in this case. The other is that while it's attractive to think that a barrier should converge with ever increasing size of a QM region that isn't necessarily the case. So some of the limitations of a QMM approach will in fact increase with system size. So as we scale up the size of a QM region within a QMM calculation. Of course the QMM boundary becomes larger. And if there is excessive polarization, for example of that boundary, that error will increase. So it isn't necessarily a good test of a QMM calculation simply to increase the size of a QM region. And that's doubly the case if you're dealing with a heterogeneous environment like an enzyme where the charge of the system may differ as you increase the size. So it's projective based embedding not only to look at simple organic reactions but also to look at metalloenzymes. So we've recently looked at Rubisco. And again, we find that we can use projective based embedding to remove the uncertainty in barrier and reaction energy, due to the choice of density functional. That's an indication of how well we can predict barriers in absolute terms, but generally we're often interested in understanding selectivity. So not the absolute barrier but rather the difference in barrier. So if the same reactant can form two different products, can we predict the difference in activation energy? Well, we can. So if we take an example of lethal synthesis of fluorocetrate by citrate synthase. Now, what the enzyme does, it removes preferentially one of two prociral protons. It's a very subtle difference. It removes this proton rather than this one. And by doing so it forms this form of fluorocetrate, which is toxic. This inhibits a connotase and this is what makes fluorocetrate, fluoroacetate poisonous. So we've used, for example, as rat poisons. Using QMM methods, we're able to calculate the barrier for deprotonation of ASTAR CoA for either the PRO-R or the PRO-S hydrogen. And that the difference in barrier against sampling multiple profiles, calculating the barrier as an MP2 level, the difference between the two energy profiles is consistent with the experimentally observed difference in specificity. Another example of specificity. So P450s, these are enzymes that are important in biocatalysis and also in, for example, drug metabolism. These are classic examples where we know from experiment what the selectivity is. So for example, P450-CAM, given propene, forms only the epoxide. Given cyclohexene, P450-LM2, similar with CAM, forms a mixture of these two products, hydroxylated and epoxide products. So why is that difference observed and can we reproduce that rather subtle effect? Well, we think we know the mechanisms involving compound one in both cases. And what we should find is that we get a lower barrier from our calculations for epoxidation of propene, whereas we should have similar barriers, roughly speaking for hydroxylation and epoxidation for cyclohexene. So that's quite a subtle effect. So these are details of our QMM calculations. We've done quite a lot of work on P450s. We're working with the same sorts of setups we've used before using B3 lip or other density functions, B3 lip in this case, embedded within a QMM framework. What's crucial and what I really want to emphasise is that it's absolutely essential to consider the conformational behaviour of the substrates as well as the protein. So if we look at MD simulations of cyclohexene, basically the substrate can tumble and rotate. This is relatively small substrate and it can move around within the active site. What does that mean? Well, that means that, similarly for propene, you can see that as you would expect a small molecule in a relatively large active site, it can move around quite freely. So what do we do about modelling a reaction? Well, in contrast to the previous cases, we can't run MD here, large amounts of MD with QMM because we're using DFT for our QM method. So we have to do our sampling with an MM-MD model and take configurations from those MD simulations and use those to model reactivity. Now, to do this, most of the configurations are not reactive, so we need to choose configurations which are reactive, that they are relatively close to the transition state. But we have to be careful that we're not being subjective in this choice. So we have to sample the coordinates, configurations, such that we are taking into account the free energy penalty for forming a reactive confirmation and I'll come back to that. So we select confirmations and from our knowledge of the transition states for these processes, we're able to define what rough distances and angles would be for reactive confirmations and doing so we can sample and model different configurations and their reactivity. And what you can see here is that the barrier for epoxidation for cyclohexene in P450 cam is very dependent on the choice of starting configuration, differs by 8 or more kilocalories per mole, and of course the lower barriers will dominate reactivity. And it's important to take that into account. So you need enough snapshots, enough configurations, and then it's important to Boltzmann weight those barriers, lower barriers. We do an exponential average of the different barriers. We can take into account that the more reactive configurations dominate reactivity. So, overall, how do we do? Well, if we use B3 lip, we don't get very good answers. If we include dispersion corrections, we get much better answers. The difference in barrier between epoxidation and hydroxylation is about right for cyclohexene, which is encouraging, but it is not correct for propene. And why is that? Well, it is probably due to limitations of B3 lip in terms of its underestimation of barriers for hydroxylation, but also potentially because we can only sample a limited number of configurations and model the reactions of a limited number of configurations at this level. So, basically, worry about and pay attention to your QM method, and its accuracy, and ideally test it against high level ovenitio QM methods, and certainly pay a lot of attention to configurational sampling, whether you're calculating potential energy barriers or free energy barriers. And this is important in practical applications such as drug metabolism, where if we look at, for example, P452C9 for these clinically approved drugs, we know that, for example, for dichlofenac, this product is favored over this one. For ibuprofen, we can see different products, major products and minor products, because we know the populations of the two products, we know what is the likely energy difference between the two pathways, similarly for warfarin. So if we take our QMM approach, sampling with MD and calculating potential energy barriers with DFT, how do we do? Well, for ibuprofen, we correctly reproduce the preference for abstraction from this carbon C3 over this one. And for warfarin, again, we identified the correct pathway as having a lower barrier, but for dichlofenac, we do not. In this case, the difference in intrinsic reactivity is overridden by a difference in binding affinity. In other words, we know that reaction at this position has a lower barrier intrinsically, but experimentally, this is what's observed. This configuration, placing the four primed carbon close to compound one, must dominate over this. So it's very important also to consider potential differences in binding orientation and binding affinity. Another example of conformational effects, fatty acid amide hydrolase. So again, we've identified what we believe here is the reaction mechanism involving a neutral lysine as a base. And what we find is doing, calculating multiple barriers with QMM, only a small number of those configurations have barriers that are consistent with experiment. Most of them have barriers that's too high, and they're too high by at least 10 kilocalories per mole, which we think is outside the range of error of B3 lip based on calculations I've already shown you. So what's going on here? Well, what we've done is analyzed multiple barriers from multiple configurations, and we can identify different families of configurations. Some of those with significantly lower barriers than others, and it is those configurations with lower barriers, which will dominate reactivity, and we can characterize those. So how do they differ? Well, there's a difference in terms of substrate confirmation, and there's a difference in terms of transition state stabilization. This is a cooperative conformational change, and reaction involves not just the chemical change, but the cooperative conformational change to achieve a reactive configuration. And we can estimate what that cost is just from a simple population analysis from MD and the reactive configurations, which are not sampled very often in MD. They are seen, so we can estimate their energy, and the reactive configuration is about three kilocalories per mole higher than the predominant confirmation of the substrate, the non-reactive confirmation. And that three kilocalories per mole energy difference is not enough to overcome the difference in barrier. So reaction proceeds through a conformational change, and then reaction via this reactive configuration. So that's another case where sampling configurations for your initial state is absolutely crucial. We were also able to use QMM calculations to predict covalent binding, so we were able to look at different potential orientations of a covalent inhibitor in the enzyme and predict that only one of these two configurations allowed reaction. In other words, only one of them had a low barrier. Gratifyingly, this prediction was born out later on by solution of the crystal structure. And QMM methods, I think a very important application of QMM methods is to understand covalent reactivity, as we've recently applied to the SARS-CoV-2 main protease with Vicent Molinaire and others. Another example of a prediction, so low-level QMM-MD simulations, they have their place. Here we did QMM-MD umbrella sampling of the reaction of phosphide dehydrogenase and found that this interaction between a positively charged histidine and methionine is only observed in the transition state. And this appears to be a catalytic interaction. Wilfred van der Donk was brave enough to do mutation experiments, including mutation to non-natural amino acids, and showed that these mutations changed the catalytic properties, but not the binding properties of the enzyme. So again, that's a prediction from QMM-MD, which has been verified by experiment. We can look at quantum tunneling. Again, low-level QM methods allow extensive calculations, so we can apply methods like true laws, charm rate, variational transition state theory framework to calculate kinetic isotope effects. Again, in good agreement with experiment. We've looked at antibiotic resistance, in particular looking at resistance to Carver-Penem antibiotics. So this is an emerging problem. Bacteria that produce enzymes that break down Carver-Penem are a significant health risk, because there may be no effective available antibiotic. Looking at the mechanism for reaction in class A enzymes, which we believe we identified involving several functional groups within the active site. We can calculate the free energy surface for the reaction, and do this in a two-dimensional sense. This is using AMBA, using DFTB, and we were able to calculate free energy barriers for deacylation. So the newer enzymes, which are Carver-Penemases, are able to break down the acyl enzyme with a low barrier, and so deacylate deficiency. We did calculations. So this is work of Vivi Herbanon, and Mark Vanderkamp, Jim Spencer, and others. So these calculations of full free energy surfaces here took about 20 days, but what they're able to do is to distinguish between the older enzymes up here in blue, which have high barriers. Experimentally, we know to deacylation, and the newer enzymes down here, which are efficient Carver-Penemases. So these are the older enzymes. Sorry, these are the newer enzymes, which are a threat. These are the older enzymes, which have high barriers, which are inhibited by Carver-Penemases. So these QMM free energy surfaces are able to predict the activity of these enzymes to a specific substrate. But we wanted to reduce the time of the calculation. So we did what is the opposite of what one would normally do. So we reduced the amount of sampling we did in our QMM MD simulation. So simply looking at the minimum energy path, just following the minimum energy path across the surface, rather than calculating the whole surface, we were able to reduce the time per calculation down to about 38 hours. And still the predictions are significant. They're significant in showing a much lower barrier for the Carver-Penemases and for the inhibited enzymes. And we can reduce sampling still further. So going down to only a few picoseconds per step for sampling, we can still reproduce this difference in barrier. And that means that these calculations are predictive, but they can be done on a time scale where they are competitive with experimental assets, just in sheer time measurement, which I think is interesting. What's also important is because the calculations reproduce the difference in barrier, they're capturing the essential chemistry correctly. So within those active sites, within the calculation is something that explains why the enzymes are different in terms of their activity against different antibiotics. And an example of how we can identify those features is further work that Vivian has done with Mark, which is to look at class D enzymes. And there we find that small differences in the hydration of the catalytic base here account for the experimental difference in barrier against a particular type of antibiotic. So the better hydrated, the carboxylate group, the less reactive it is, and that subtle difference in hydration accounts for the difference in reactivity. Now, I could go on, but I'm conscious that I should take questions and I would like to allow some time for questions from one implied that an approximate CC algorithm. So, the accuracy of couple cluster, well, there are of course different implementations of couple cluster. We have used some different applications, there's been some interesting work recently on limitations of couple cluster that I've seen. That was the best method that we were able to access in those QMM calculations. I'm not saying that it's perfect. And we are of course dealing with single determinant system here. But what we're striking is that the, without any fitting, the agreement between those couple cluster calculations and experiments is really very good. Whereas at lower levels of QM theory is less good. So, it's not the be all that end all. But I would say that for practical applications, a method such as SCS MP2 provides a very good balance of accuracy and computational efficiency. So on the whole we would tend to use SCS MP2 rather than some flavor of couple cluster. But with projector based embedding, you can use whatever method you wish. And if you don't like a perturbative treatment of triples, you could go on and use a different method, probably at a higher computational cost. Thank you. In the interest of time, I'm not going to unmute people individually. So please feel free to continue as I think you're about to do. On the commentary be offered on the extent of locality and validity. Well, that's, I think that's the discussion for another time. So all localization schemes have their limitations. And, you know, can go into lived in analysis and everything else. So those really are details of the QM treatment. They're important questions. But so I think slightly more detailed than we can go into for this seminar. What I would say is important in this context, whether you're using local approximations as we did in the first applications, whether you're using projector based embedding. You certainly test different domains and different embedded regions, respectively, to test the effects on the results. You chemical intuition helps, but certainly you want to know that you are embedding a large enough region. How do we find active and non active ligands? I'm not sure I understand that question. So usually we're finding ligands from. So we've been testing against experimental data on active and inactive ligands. And in a QMM calculation, I guess you're not usually very interested in, in ligands that that are completely inactive alternative substrates. So many enzymes, particularly mammalian enzymes are very specific and may not turn over a variety of substrates. So material enzymes that are more promiscuous are good places to look for alternative substrates. How do you decide where the QMM boundary should be? Well, that's a very good question. There are now quite a number of published examples from many groups of different classes of enzymes, which help you perhaps to see what good choices have been made before. Some sorts of chemical system are much harder to partition than others. So it's, it's hard to partition DNA, for example. Relatively speaking, compared to a protein and protein, usually if you can deal with side chain reactivity, the partitioning can be relatively straightforward. It's more difficult if you have to partition the peptide backbone. I think in time, an automated way and people have been working on approaches to do this, of where you can draw the line most effectively, is, is a good and general approach that we need. But for the moment, generally, I would say chemical intuition in most practical applications. Ligand protein interactions, yes, they certainly are important. Solvent effects. Oh, well, let me, let me show you. I did want, I would like to say something about testing of water models. So we can use, as I alluded to in the title, we can use QMM methods, not just known in reactivity contexts, but also in contexts like the prediction of binding affinities of different ligands. So if you think of a free energy cycle like this one, where typically you're interested in understanding the relative binding affinity of ligand A. Sorry, Adrian, at the moment, at the moment, I think you're not sharing whatever you think you're doing. So, when you, when you're doing a calculation of, you know, what does one ligand bind better than another so we might be comparing ligand A over here with ligand B over here, and we want to know by doing some sort of FEP or other free energy calculation, we'd like to know, does A or B bind more tightly to the protein. So ideally we would do this, not with an MM force field, but with a QM force field and you can do this. You can do this by perturbing from an MM treatment on to a QM treatment of your ligand and for a fixed geometry of the ligand that's particularly easy, and then you only have to worry about changing the QMM interaction and define a lambda parameter, which will change from, if it will work properly, change from a QM description to an MM description or vice versa, using replica exchange thermodynamic integration. And we can, by doing so, we can work out the difference between a QM and an MM representation and quantify it, calculate that delta G. But for the same ligand in different environments, we can, for example, test the limitations of the MM description, for example, the lack of electronic polarization. And if you're interested in doing these calculations, these can be run in the SIA program. I've given a link here. This is a tutorial, which you can look up if you like. And what we've done is to test perturbations from a QM description to an MM description of water within water. And we find that different QM methods give different free energies, not surprisingly. For example, Hartree-Fock reflecting is a very polarized nature, gives a large free energy difference. The smaller this free energy difference, the more consistent the QM method is with a particular MM water model. And you can see if I just pick one here, so BLIP and TIP4P, the calculated free energy difference is very small, zero within the accuracy of the calculations. So that means that this combination of a QM method of BLIP, in this case, with TIP4P is a good combination. They're consistent with one another. So we can do the same thing for water molecules within a protein complex, in this case influenza neuraminidase. And if you just focus on these numbers on the right, this is where we've done the perturbation with BLIP. And we're looking at water molecules in different crystal structures. The fact, the important thing is that these numbers are different for different water molecules. So the QMM perturbation is different in different environments. Compare that to the perturbation in bulk water, which is zero. And it's different. This means that the water molecules on the whole within the protein are more polarized than in bulk water. And of course, this is something that's not taken into account in an MM force field. And we can do this, not just for looking at stable bound complexes, but for looking at, for example, free energy profiles for unbinding, for looking at the change in polarization along an unbinding pathway. Right, I will stop those and go back to questions. Are there easy to follow tutorial on QMM? There are. So I would point to, so HEC biosim was mentioned before. I would point to CCP biosim. So if you look at CCP biosim, sister organization of HEC biosim, CCP biosim organizes tutorials in various biomolecular simulation methods, including QMM. QM zone dependence. So Robert asks about the dependence on the QM zone selection. Well, that goes to the point that I was making with charisma mutates and the size of the QM region there. So not all properties will converge with the size of the QM region. You know, we might get very large QM regions in which the boundary effects are going to be much larger than they are in a small region. So while some properties at the center of those regions should become, should converge, other properties may not. And I think it's particularly important to bear in mind that if you change the size of a QM region in a protein, then you may change the charge of that region. Unless you pay very careful attention to neutrality. And that may force rather unusual QM choices on you. So I think one should look for convergence. In general, as I've mentioned, I think, for example, if you're doing DFT QMM calculations, the effect of the functional is much more important factor in the accuracy of the calculations than the size of the QM region. So long as your QM region is of reasonable size. Choice of basis set. Well, with basis set the larger the basis set the better the result in general. The calculations are less sensitive to basis set and have an issue of methods, but that's something to test. And if you have a basis set effect, use the larger basis set. Zero partial charges on. And that's a very Gaussian question. I'm afraid without looking into detail. I'm afraid I don't know we tend not to use Gaussian for these Gaussian for other things but not, not for QMM and basis set extrapolations for CCSD what there are extrapolations have been made. And I would recommend looking at some of the papers we cite in our projector bedding projector based embedding techniques and our 2006 and Govanta paper. So some starting references on that. The harmonic qualities of low vibrational modes. It is not trivial. No. And in general, we would not. We don't use harmonic estimates of we certainly wouldn't use them for entropy. They'll be highly inaccurate for entropy. We don't use them specifically for estimates of zero point energy in some cases. And we have looked at an hominicity corrections in those cases. Can you use an electric field? Well, yes, in principle, you can use an electric field and some of the catalytic properties of enzymes come from the electric fields at active sites. How is the transition state and product state sampled in an MD simulation. We would apply for a transition state, you need to apply some restraints and we typically we would apply restraints to the forming and breaking bonds to stay close to a transition state like confirmation. How long is your MD simulation. Well, I've shown several different MD simulations there. In the case of low level QMM MD simulations where we do umbrella sampling directly. Those make up to nanoseconds. As we showed in our example on the, the assay of carbapenemases. You can you can reduce sampling significantly below that in some cases and still get good results. As always, with an MD simulation longer is in principle better so long as the structure is not being disrupted by the simulation by inadequate force field. How many snapshots. Well, again, the more the better. I think that that's usually limited by the amount of time needed for every calculation of a of a barrier. So, perhaps of the order of 10s of configurations. Julia asks, is there any frontier term in the projector based embedding Hamiltonian for the QM region. There isn't a from frontier term so we project a based embedding approach project that region out rigorously. So there is no frontier that whole region is. density is scooped out. So, there is no, there's no equivalent of a QMM boundary they are rigorously separated by the projection, the projection that's done. And Gung asks in Orca, there is linear scaling deal PNO CCSD linear scaling method. We have not done that comparison directly there's been nice work by Frank Nazar and others. Those results look promising and I think offer another route to high level. QMM calculations. I think the fundamental point here is that we need different levels of treatment for different aspects of the problem. So, the sampling has to be done at a low level whether that's MM or low level QMM. And in some cases, you have to calculate profiles with DFT. And then those profiles have to be out corrected with a higher level of an issue method such as one of the varieties of couple clusters we talked about before. And the question about what software do we use for QMM MD FEP. Well, we, we've used different programs for different applications. So my background was in charm so most of the early applications use charm. We use AMBA very much nowadays as well. And we use those for, for, we use AMBA for, for example, DFT be QMM simulations. We've used for DFT QMM calculations. We've used an interface which was been originally developed by Jeremy Harvey, who's now Luthan, now head of school at Luthan, called COMMA, which is an effective way to couple together a program like Jaguar with Tinker. And for our high level calculations, we're using mold pro. So no single piece of software does everything that we want at the moment. So the free energy calculations that I showed at the end for the QM to MM perturbations. We use the SIA program developed by Christopher Woods. So Islay asked about which program would you recommend for preparing input. It's a very important question. Setting up your model correctly. The beginning is crucial that include for a protein that means checking protonation states, potentially testing different protonation states of ionizable residues. It also means solvating the system and having water molecules in sensible places. One thing I would point out here is a tool called Enlighten, developed by my colleague Mark van der Kamp, who's in biochemistry in Bristol. Enlighten is a tool specifically to prepare a protein crystal structure for a QMM simulation. One of the CCP biosim tutorials which Mark led a training class on is on the use of enlightened. So I think it's a very useful tool for practical setup of proteins for QMM. Andrea asks about DFTB sampling and activation energy estimation. That is certainly can certainly be a problem. DFTB is very useful method and invariance and DFTB 2 and 3. They're very useful methods in many contexts, but of course they do have limitations. You can have problems where protons may move in ways that are unrealistic parts of the system where you don't want proton transfer to take place. Something I would always say is check the structures. So always check structures, always look at your structures because something may be going on that you're not expecting. Occasionally that can find something interesting and new and nice and more often it may be an error, but yes, you're quite right. There can be errors due to that. Is the type of QMM important? Embedding important. So just so Maxime asks this. So yes, I would say so as I think the electrostatic effects of the protein environment are actually absolutely crucial. So I think it is necessary to include at least the electrostatic effects of the nearby residues to get an effective description of the environment in the active site. So polarizability. Well, not many have not been many applications to end zones of QMM calculations with a polarizable MM force field. We have tested MM polarizable force fields. It adds an extra level of complexity and potentially an extra level of self consistency. So this is the definition of the two different regions. I think there is a need for more accurate MM models for more physically realistic models. We showed at the end, the effects of polarization on drug binding. It's significant and when the QM method is highly accurate, then certainly it's important to to go back and improve the description of the MM method to improve accuracy further. But I think I would say that there aren't a lot of applications of that sort yet. And as I tried NAMDI QMM but it's scaled badly. We haven't really used the NAMDI GOES quantum interface yet or has been used effectively by some people. I guess the question there would be is the scaling due to the QM method or the MM method. That's worth looking at. Hello, Kami. Thank you for joining. In your calculations you use one method over another. The basis set effects. This goes back to the basis set question from before. Generally B3 lip or other density functional methods are less sensitive to the choice of basis set. We tend to test that with single point calculations with that functional with different basis sets. We're choosing the smallest basis that we can get away with for the geometry optimization. MP2 of course is much more sensitive to the choice of basis set. And so there really we need something beyond augmented double zeta at least I would say. Keir, how are you? Projector based embedding seems to do a good job. But can you reproduce activation enthalpies and entropies for Arrhenius plots? Well, that's a very good question. I think that would be a challenge. I think potentially within a hybrid scheme, but that would mean sampling of course at different temperatures. And then generating multiple profiles at different temperatures. So I think that's a very good question. I would say, and as you know, better than I do, methods such as EVB are more computationally efficient. So if you're looking at temperature dependence of rates specifically, as opposed for example to heat capacities, then you might be better off doing an EVB calculation. Hannah asks if there are enhanced sampling methods to sample reactive conformations more frequently in an MD simulation. Yes. We have used metadynamics for example to do that and umbrella sampling. The difficulty really is knowing what the reactive configuration looks like. If you do know what the reactive configuration looks like, in other words, what the coordinates are that are important in determining the barrier, then you can use a method like metadynamics or umbrella sampling to drive formation of that. And also, importantly, you can use that method to estimate the free energy difference between the ground state and the reactive configuration. Sarah asks about the interaction with the drug. Well, proteins interact with drugs through polar interactions such as hydrogen bonds, hydrophobic interactions, electrostatic interactions, hydrophobicity and hydrophobic effect are very important in drug binding. Francesco asks about the potential applications of QMM to paralysis of plastics. Is it practical to use projector based embedding and corrected with Delta G from simple MD collaboration? I'm not quite sure what you're proposing there, but what I would say is that I think for any system where a potential energy barrier is useful, you get a stable barrier without hysteresis. In other words, you can go backwards and forwards and get similar barriers. Then projector based embedding is potentially a good way to do single point calculations to get more accurate barriers and potentially just as we showed for enzymes, you can combine that with estimates of entropy and so on from other methods. Next question from most of the cases B3 functional gives the correct barrier height and consistent with experimental data. Well, depends what you mean by correct. In most cases B3 lip gives barrier heights that are somewhat too low, sometimes too high and sometimes very close. Consistent with experimental data. Well, if it's too low, you can argue that it's consistent with experimental data. What should you do? Well, I would say in any DFT calculation, you should look at the potential dependence on your choice of functional test different functionals and ideally use a method such as projector based embedding to remove the sensitivity to the choice of functional. I think I've already heard that question from Julia. I think if you are in terms of what's an acceptable error, if your error is comparable to the error of the experiment, and we can estimate some of the errors from experiment, then probably your method is usefully predictive, and that I would say is an acceptable error. And I will stop there. Thank you very much. Of course, thank you very much, Adrian. We had several people, those people saying thank you for the talk, very nice talk and golden information. So really appreciate you taking the time. It's been a pleasure to have you as a speaker here. And with that, I will just mention that we have two more webinars that are part of this workshop on this theme in the new year. So please follow the, for anybody attending, please follow the website for that. So with that, thank you again very much. And yes, best wishes. Thank you. All right. Well, have a good Christmas. Thank you. Bye bye. Okay. Bye. Bye everyone.