So, good afternoon and welcome everyone to the third BioExcel Best Practices in QM/MM Simulation workshop webinar. I don't want to take too much time away from today's speaker, so I'm going to say less than at the previous webinars, but I wanted to reiterate, or state for those who are here for the first time, the goals of this webinar and of the whole workshop: to share and promote best practice in methodology and approach for simulating biomolecular systems using QM/MM. So it's really about sharing good protocols and methodology, warning about the pitfalls, and also looking in some cases at the software that's used. Today's speaker is Ulf Ryde, professor of theoretical chemistry at Lund University. Ulf has a very wide range of interests in terms of the systems that he studies: ligand binding, protein structure and function, the effect of proteins on bound metals, ligands and chromophores. He and his group use a variety of methods, including ab initio coupled-cluster methods, density functional theory, various kinds of quantum refinement approaches and free energy perturbation, and he also has an interest in accurate force fields and parameterizations. I've also highlighted here a list of software programs, which I thought was interesting to mention: Ulf's group has developed a number of methods and associated software implementing them, as listed here. So I'm really looking forward to Ulf's talk, which is going to be about obtaining accurate structures and energies with QM/MM. With that, I will hand over to Ulf. Thank you very much. I hope you can all hear me. First of all I want to thank the organizers for organizing this very nice workshop, and for inviting me and giving me the opportunity to tell you about our experience with QM/MM methods: how to obtain accurate structures and energies with QM/MM.
Since this is a series of many different lectures, I will not give much of an introduction. Instead, I will concentrate on the methods that are special for our group, those that were developed in our group or that we find very useful. So I will give a very, very short introduction, then I will describe our methods to combine QM/MM calculations with experimental data, I will tell you a little bit about our QTCP and big-QM methods, and then I will discuss how you can use QM/MM calculations to obtain energy components, to understand the results you actually got. In the introduction, I will just remind you what QM/MM is. QM/MM is a method where we do QM for a small part of the system; our systems in this talk will all be proteins. Then you use MM for the rest of it. The philosophy is simply to add QM and MM energies and forces, avoiding double-counting of any terms, and the hope is of course to combine the accuracy of QM methods with the speed of MM methods. In practice, at least we obtain this in a very simple way, with this equation: you take the QM energy of the QM system, add the MM energy of everything, and then you subtract the MM energy of the QM system to avoid double counting. So it's essentially a subtractive scheme, and it's essentially the same philosophy as used in the ONIOM method, although we call our approach the ComQum approach. That was the very short introduction. Now I will describe how we combine QM/MM methods with experimental data in order to obtain accurate structures. We have done many computational studies of protein systems, and when you do computational studies you need some sort of structure: you can decide to start with experimental structures, or you can use computational structures.
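The subtractive bookkeeping just described can be sketched in a few lines. This is only an illustration of the arithmetic; the function name and the toy numbers are mine, not ComQum's API, and in a real calculation the three inputs would come from QM and MM programs.

```python
# Sketch of the subtractive (ONIOM-style) QM/MM energy.
# The three inputs would come from real QM and MM codes; here they
# are plain numbers, so the function only shows the bookkeeping.

def qmmm_energy(e_qm_of_qm_region, e_mm_of_everything, e_mm_of_qm_region):
    """E(QM/MM) = E_QM(QM region) + E_MM(whole system) - E_MM(QM region).

    The last term is subtracted so the QM region is not counted twice.
    """
    return e_qm_of_qm_region + e_mm_of_everything - e_mm_of_qm_region

# Toy values in kJ/mol, purely illustrative:
print(qmmm_energy(-500.0, -120.0, -80.0))  # -540.0
```

The same expression applies to forces, term by term, which is why the scheme is so easy to implement on top of existing QM and MM programs.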
The good thing with experimental structures is that they are in some sense the true structure, but the bad thing is that there may be problems with them and you cannot directly use them in the calculations. The computational structures you can use directly in the computations, although you don't know if they are the true structures, and some of the methods can be rather inaccurate. What we noticed was that all our projects start by re-optimizing a crystal structure, and then we asked ourselves: can we do something better than that? To answer this question, we have to look at what these structures actually are. Since almost all structures we use come from X-ray crystallography, let's concentrate on that first. When you determine a crystal structure of a protein, you have to obtain your protein and grow your crystals, which of course is hard, and then you collect your data. The important thing to notice here is that the raw data from protein crystallography is not what you find in the PDB file; it is not the coordinates. The raw data are the structure factors that you collect. With these, you also need to determine the phases, and the phases in principle cannot be measured directly. So we have a phase problem, which introduces a bias towards the current model. There are different ways to solve this phase problem, and once you have the phases you can compute an electron density map. In this electron density map, the crystallographer then tries to build a model of the protein, building the coordinates. Once you have built the model, you start the refinement, where you try to optimize this model. This is an iterative cycle of refinement and model building, where you go back and forth, and we will therefore concentrate on the refinement part.
So crystallographic refinement is essentially a geometry optimization where your target function describes the difference between the measured and calculated structure-factor amplitudes. You try to minimize something like the crystallographic R-factor, or often a somewhat more sophisticated maximum-likelihood function, but the R-factor is essentially determined by the difference between the observed structure factors and the ones you can calculate from your current model. From a theoretical point of view, it is quite interesting that what is optimized is actually not only a target like this: normally a molecular mechanics function is also added. If you combine a crystallographic function and a molecular mechanics function, you need some sort of weight factor telling you how large the importance of the MM term is relative to the crystallographic term; this is often called the wA factor. There are different ways to determine it, but what a crystallographic program normally does is to run a short MD simulation and ensure that the forces from the X-ray part and from the MM part are of the same magnitude. So in some sense you can say that experimental crystal structures are actually 50% theoretical from the start; the crystallographer may not want to discuss or admit that, but it is a fact. And why do we need this MM term? Simply because the accuracy of the crystallographic data is normally not good enough, at least for medium- and low-resolution structures. You could say that you need the MM potential to get chemically reasonable bond lengths and angles.
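The two quantities just mentioned can be written down as a sketch. I am assuming the standard definition of the R-factor and the convention (used e.g. in CNS) that the wA factor multiplies the crystallographic term; the exact functional forms and weighting conventions vary between refinement programs.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor over structure-factor amplitudes:
    sum |F_obs - F_calc| / sum F_obs."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)

def refinement_target(e_mm, e_xray, w_a):
    """Combined refinement target: MM restraint plus weighted
    crystallographic residual.  Which term carries the weight is a
    program convention; here wA scales the X-ray term."""
    return e_mm + w_a * e_xray
```

A perfect model gives an R-factor of zero; in practice well-refined protein structures end up around 0.2, and any change that moves the model away from the data, such as turning on the MM restraint, raises it slightly.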
To illustrate this: if you have a rather bad structure and optimize it without the MM term, you can get something that looks totally crazy; then you add the MM term back, and you get something that looks chemically reasonable again. So you can say that the MM potential determines the bond lengths and angles, whereas the crystallographic data determine the general fold and the dihedrals. Note also that this MM term will move the structure a little bit away from the crystallographic raw data, meaning that once we turn it on, we increase the R-factor. When we change only this small group, you see that we change only the fourth decimal, but it increases all the time; if we did the same thing for the whole protein, we would have an extensive effect. So this MM potential is quite important, and there are of course several problems with this approach. One is that, for a computational chemist, molecular mechanics is the least accurate theoretical method: it's fast, easy to understand and easy to manipulate, but it's not very accurate. Another problem is that the results will of course depend on the potential. The crystallographer normally doesn't discuss it, but there is still a dependence on the MM potential used, and often you can at least read in the PDB file and in the papers which one it was. So if you have a bad potential, you will get a bad structure, which I'll show with this example: the original crystal structure had a very strange short contact, and if you optimize it with another potential, you get a more reasonable structure; there is a very big difference in this case. The third problem is that different parts of the crystal structure will have different accuracy. The MM potentials used for the amino acids, the nucleic acids and other standard parts of the structure are normally very good, because they are based on accurate experimental data.
Whereas for the other parts of the structure (substrates, inhibitors, metal sites), which quite often are the most important parts, the potential is much less accurate, because you don't have any good experimental data. In the worst case, this molecular mechanics potential is constructed by the crystallographers themselves, who are not theoretical chemists but are trying to do computational chemistry. So how could you solve these problems? What we suggested almost 20 years ago is the method of quantum refinement. It's a very simple approach: we start from the normal crystallographic refinement energy function and simply replace the MM potential with a QM/MM function, that is, we replace the MM potential with a QM calculation for a small but interesting part of the protein. Equivalently, you could say that we take a normal QM/MM energy function and add the crystallographic penalty function as a restraint. In both cases you get this energy function, which is implemented in our ComQum-X program, and which we call quantum refinement. Does this work? That's not so easy to prove because, as I said, the restraint will always take you a little bit away from the crystallographic data. To prove that we can improve a structure, we decided to use a protein for which two crystal structures are available at different resolutions: one very good structure at atomic resolution, and one at a lower, medium resolution. Then we used only the lower-resolution data, refined it with our method, and checked whether we could bring it closer to the higher-resolution structure. We selected this structure of cytochrome c553, which contains a metal site (that's what we put in the QM system): a heme group with an iron ion, and methionine and histidine ligands. Unfortunately, there were some differences between the two crystal structures.
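Combining the earlier equations, the quantum-refinement target has the following shape. This is a sketch of the arithmetic only; the variable names are mine, and "system 1" denotes the small, interesting part of the protein treated with QM.

```python
def quantum_refinement_target(e_qm1, e_mm_all, e_mm1, e_xray, w_a):
    """Quantum-refinement target: a subtractive QM/MM energy
    (QM for system 1, the site of interest) with the weighted
    crystallographic penalty added as a restraint."""
    e_qmmm = e_qm1 + e_mm_all - e_mm1   # subtractive QM/MM energy
    return e_qmmm + w_a * e_xray        # plus the X-ray restraint
```

Setting w_a to zero recovers an ordinary QM/MM optimization; making it very large drives the model back toward the raw crystallographic data, so the weight controls the same trade-off as in classical refinement.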
There were differences of up to 0.3 Å in the metal-ligand distances. And we can see that our method, which is this one here, brings the low-resolution structure much closer to the high-resolution structure, just by adding the QM data. The basis of this is of course that DFT methods give very good structures: already the vacuum structure is much closer to the high-resolution data than the low-resolution structure is. We could also show that we improved the R-factors and that we removed the difference densities when we refined it. Okay, so the method works. What can we then use it for? What we have especially done is to try to see what you actually don't see in a crystal structure, for example the hydrogen atoms: the protons are not seen in the crystal structure, but perhaps you can deduce them by using this quantum refinement. To prove that, we once again need a good test case. We selected the enzyme alcohol dehydrogenase, which converts alcohols to aldehydes. In the active site it contains a zinc ion, two cysteine ligands and a histidine ligand, and then the substrate binds to the zinc ion. Here we could have either the protonated alcohol or the deprotonated alkoxide, with this proton removed. From kinetic experiments it is known that with the proper coenzyme and the proper alcohol, the pKa is about 6, and since the crystal structure was obtained at pH 8, we know that it should contain the alkoxide. Then we see whether we can deduce that from quantum refinement. What you do is separate refinements, one with the alcohol and one with the alkoxide, and then you see which one fits the experimental data best, for example by looking at the Rfree factor.
Now, this has been done: this is the alcohol, this is the alkoxide, and we have done it for different values of the weight factor between the crystallographic energy function and the QM/MM energy function. When we compare, we need to use the same weight; we mainly look at this value of three and compare with the corresponding three here. You can then see that the Rfree factor is higher for the alcohol than for the alkoxide, so the alkoxide fits better. We can also look at the strain energy, which is the energy of the QM system when it is optimized in the protein (with the quantum refinement) relative to in vacuum; it tells us how much the structure is strained by the crystallographic data. We can see that the strain energy is much smaller in the alkoxide case than in the alcohol case. This is in kJ/mol. Finally, we can look at the distances, especially the zinc-oxygen distance, because an alkoxide should give a short distance. Looking at the vacuum structures, the alkoxide has a very short distance, 1.93 Å, whereas the alcohol has a longer distance in vacuum, 2.29 Å. Then we look at the quantum-refined structures and see that the quantum-refined structure of the alkoxide is very close to the vacuum result, whereas the alcohol is rather far from the vacuum data. We can also see that as we increase the weight of the crystallographic data, we move away from the vacuum data, so there is an inconsistency between that structure and the vacuum data. All three criteria point directly to the alkoxide being the better interpretation, which is also the truth according to the kinetic data. So we can determine protonation states in crystal structures by using quantum refinement. Here is a more recent application with the same aim: we look at nitrogenase, the enzyme that cleaves the N≡N triple bond in N2.
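The strain-energy criterion just used can be written down directly. This is a sketch; the argument names are mine, and I assume the usual sign convention, in which a relaxed vacuum minimum makes the strain non-negative.

```python
def strain_energy(e_qm_at_refined_geometry, e_qm_at_vacuum_minimum):
    """Strain energy in kJ/mol: single-point QM energy of the QM
    system at its quantum-refined (in-protein) geometry, minus its
    energy at the vacuum-optimised geometry.  A large value means
    the crystallographic data force the site into a strained,
    chemically unlikely structure."""
    return e_qm_at_refined_geometry - e_qm_at_vacuum_minimum
```

Comparing two candidate interpretations, the one with the smaller strain energy (together with the lower Rfree and the smaller deviation from the vacuum distances) is taken as the better fit, as in the alkoxide case above.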
It has a formidable active site consisting of seven iron ions, one molybdenum ion, nine sulfide ions and a central carbide ion. Here we are interested in one of the ligands of the molybdenum, which is a homocitrate with three carboxylate groups and one alcohol group, and the question is what the protonation state of this group is. Looking at the crystal structure, there could be either zero, one or two protons, and after some investigation we found that two structures were most likely. One has a single proton, shared between the alcohol and one of the carboxylates, so it sits essentially in between these two. In the other structure we have the same shared proton here, but we add another proton on this side, here. We can see that what happens then is that this group bends a little bit outwards, and from the crystallographic difference density we can directly see that this is not right: we get a positive density saying that there is too little electron density here and too much electron density here, and we don't see that for the other structure. So we can directly say that this is the better interpretation than this one. Then we can ask the question: can we also see electrons, that is, can we deduce the oxidation states of the metals? To do that we have been looking at manganese superoxide dismutase. It contains a manganese ion with three histidines, one carboxylate, and then a solvent molecule that could be either water or OH-, which might change when we oxidize or reduce the enzyme. Here we look at two crystal structures that were said by the crystallographers to be one reduced, Mn(II), and one oxidized, Mn(III). If we start with the Mn(II), reduced structure, we can see directly that in this case it is a water ligand, so it is protonated.
You can see that water gives a better Rfree factor and residue R factor (a more local property of this group), better distances and a lower strain energy than the OH- interpretation. So that's fine; the reduced structure works very well. But if we look instead at what was said to be the oxidized structure, we get chaotic results. It turns out that Mn(II), the reduced state with water, gives the best Rfree factor; Mn(II) with OH- gives the best residue R factor, although this one is almost the same; this one also gives the best strain energy; but the smallest difference in the distances compared to the vacuum structure is given by the Mn(III) structure with OH-. Discussing these results a little bit with a crystallographer, it turns out that what happens when you beam X-rays onto proteins is that the X-rays knock electrons out of the structure; those electrons float around and go to sites that can be reduced, and one site that can be reduced is the manganese ion. So this structure is actually photoreduced: they start with the oxidized structure, probably with OH-, and during data collection it slowly turns into the reduced structure. What you see in the crystal structure is actually a mixture of those two states, and that is why quantum refinement doesn't give any clear answer. What we can say is that quantum refinement is a good way to detect photoreduction in a structure. But until recently we couldn't sort out what we really see, because we have a mixture of two different states. Early this year we solved this problem by doing quantum refinement with multiple configurations: we simply add two versions of the QM system, one with, for example, the oxidized state and one with the reduced state, and then we can actually sort out what we see in the crystal structure.
To show that, we turn back to nitrogenase, but to the other metal cluster in the enzyme, called the P-cluster, which is an electron-transfer site and essentially contains a double Fe4S4 cubane: eight iron ions and seven sulfides, because this one is shared by the two subclusters. There is a very accurate crystal structure of it, in which they interpreted it as a 20-80% mixture of the fully reduced state and a two-electron-oxidized state, which also differ in the protonation of nearby residues. When you oxidize it, one serine residue is deprotonated and also a backbone amide group is deprotonated, and both of them bind to one iron ion each. So this iron moves from here in the reduced state to here in the oxidized state, and this one moves from here to here. In the original crystal structure, they gave the same coordinates for all atoms except those two iron ions, which are the only thing they could discern from the crystallographic data. By using our quantum refinement approach, we can actually get structures of both states on top of each other; this is what is shown here. We can see that some atoms move quite a lot, up to 0.74 Å, although the crystallographers couldn't discern that, because these are not as heavy atoms as the iron ions. We can also see that we have improved the crystal structure in terms of the electron density maps, especially around this cluster: this color is for the original crystal structure, whereas the green and red are for the quantum-refined structure. It is also improved in terms of the RSZD score, the real-space Z score based on these difference maps, and the strain energies are very strongly improved. Okay, so this was what you can do with X-ray structures; of course you can do essentially the same thing with other experimental structural data. We have shown that we can do it for neutron structures.
That is quite simple, because in neutron crystallography you have the same type of energy function, and you can actually use the same program as for the X-ray data. So you can just set up an energy function like this. If you only use neutron data, it is only this part, but when you determine a neutron structure you normally also determine an X-ray structure from the same crystal, and then you do a joint refinement, adding another term for the X-ray data as well. This is what you normally use for neutron structures, and we have done some applications of this method too. Here we turn to a protein called lytic polysaccharide monooxygenase, a copper enzyme that degrades cellulose. It contains two histidine ligands in the active site, and the copper also binds the amino terminus, which is connected to the first histidine. In the structure we are looking at, they have a peroxide or superoxide ligand bound to the active-site copper ion, and they have different structures in the two subunits of the protein: this is one subunit, this is the other. The original crystallographers argued that the amino terminus is actually deprotonated, to an NH- group. This is what the crystallographers suggest, but in our calculations based on the neutron structure (what you see here is the neutron analogue of the electron density, i.e. the nuclear density), we find no evidence for the deprotonation of this NH2 group. Instead we get bad densities, and the real-space Z score is much higher for that case than for our suggested interpretation. The next extension is NMR, and once again it is simple, because it uses the same type of energy function; you just change from the crystallographic data to the NMR restraints. It once again uses the same program, so we can just set up an energy function like this and refine against it.
The difference is that with NMR you normally do MD simulations, and you get not one structure but an ensemble of structures. This is the original ensemble of structures of our test case, which is protein S, containing a calcium site. In the original structure they didn't care about the calcium site: these green atoms are the ligands of the calcium, and up here it doesn't look at all like a calcium site. When we do quantum refinement on it, we get nice calcium structures as well. We can see that in the original structure they had very strange distances between the calcium and its ligands, whereas in our quantum-refined structure we get much more reasonable calcium-oxygen distances. Finally, we have tried to do the same thing with EXAFS, extended X-ray absorption fine structure. This is something quite different, so it is much harder to incorporate QM calculations, mainly because the normal analysis doesn't use an MM potential: it doesn't give any detailed structure, only a description of shells of metal-ligand distances with a resolution of about 0.1 Å. But we have shown that we can combine it with QM data in such a way that the QM calculation provides all the coordinates, and then EXAFS improves the detailed distances to the different metals. We have used it to determine the best structure of the three copper sites in a multi-copper oxidase. They were studying what is called the peroxy adduct, and the spectroscopic experiments gave two possible interpretations of the structure: one with the peroxide in the middle of the cluster, like this, and one with it on the side. We could optimize these structures either just as a complex or within the protein with QM/MM methods, either with a QM-cluster model or with a QM/MM model, and in both cases we could clearly show that this structure, with the peroxide in the middle, fits the experimental data much better than the structure with the peroxide on the side.
Okay, so that was the first part of my talk, about structures. Now we turn to energies, and it turns out that energies are much harder to get right than structures. We are looking for an accurate and reproducible, but also cheap, method based on accurate QM calculations. There are many reasons why it is hard to get stable energies in proteins, and I will discuss two of them. One is what we call the local-minima problem: if we minimize structures, as we normally do in QM/MM, the optimizer will go to the closest local minimum. So if you start here you go here, if you start here you go here, but from here you go to another local minimum. That is a very big problem if you have a big protein, because then all atoms affect the energies. Even if you change or add one extra hydrogen bond of a water molecule in the periphery of your protein, that can change your energy by 20 kJ/mol in a way that is probably totally irrelevant for your results. And this is, as I said, a very serious problem, because you can never spot such effects by eye. There are different ways to attack this problem: you can run back and forth between the starting state and the final state until the energy converges; you can optimize as little as possible outside the QM system; you can base your calculations on many snapshots from an MD simulation. But of course the best thing would be not to do minimizations at all, but instead to compute free energies, in principle sampling up here, where you don't have the local-minima problem. If you want free energies, we know in principle how to get them from statistical mechanics: you do free energy perturbation using either exponential averaging, thermodynamic integration, the Bennett acceptance ratio or something like that. Unfortunately, you then need QM/MM MD simulations, and unless you have very, very much computer time, that is normally too expensive.
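Of the estimators just listed, exponential averaging (the Zwanzig formula) is the simplest to write down. A minimal sketch, with invented energy gaps; in practice the gaps would come from snapshots of an MD trajectory.

```python
import math

def zwanzig_fep(energy_gaps, kT):
    """Exponential-averaging free energy perturbation:
    dG(A->B) = -kT * ln < exp(-(E_B - E_A)/kT) >_A,
    where the average runs over snapshots sampled in state A and
    energy_gaps holds the E_B - E_A values for those snapshots."""
    avg = sum(math.exp(-g / kT) for g in energy_gaps) / len(energy_gaps)
    return -kT * math.log(avg)

# If every snapshot gives the same gap, dG equals that gap:
print(zwanzig_fep([4.0, 4.0, 4.0], kT=2.5))  # approximately 4.0
```

The exponential average is dominated by the snapshots with the smallest gaps, which is exactly why convergence becomes difficult when the perturbation is large, the point made below about the MM to QM/MM step.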
We heard in a previous talk about people doing QM/MM MD simulations, but we have never had enough computer time for that, so we are looking for a faster alternative. How do we do that? What we want is the free energy difference between two states at the QM/MM level, and, as I said, we cannot afford to compute it directly. What we can afford is the same thing at the MM level: you run MD simulations at the MM level and do free energy perturbation from state A to state B; that is a standard method. Once you have the MD simulations at A and B, you can also do a free energy perturbation in method space, from MM to QM/MM. Then you get this energy and that energy, and you can use the fact that you have a thermodynamic cycle: going this way around the cycle gives the same free energy as this leg. So without doing the QM/MM MD simulation, you can still get free energies at the QM/MM level. We designed this method 15 years ago and called it QTCP, QM/MM thermodynamic cycle perturbation, because we use the thermodynamic cycle. But when we started to publish it, it turned out that, of course, Warshel had done it 10 or 15 years before, and he called it the reference-potential approach. And it works fairly well. The problem is to converge these MM-to-QM/MM energies without doing MD simulations, which means we cannot divide the perturbation into many small steps as you normally do here. The way we handle that is that we normally have to keep the QM system fixed, and then obtain the entropy of the QM system from frequency calculations instead. Here I will show some applications of this method and compare it to other methods for obtaining energies that are more accurate than standard QM/MM energies. Our test case is [Ni,Fe] hydrogenase; it contains one nickel ion and one iron ion.
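The QTCP cycle closure itself is just arithmetic. A sketch with invented numbers; in a real application each of the three legs would come from a free energy perturbation like the one sketched earlier.

```python
def qtcp_free_energy(dG_mm_A_to_B, dG_mm_to_qmmm_A, dG_mm_to_qmmm_B):
    """Close the QTCP thermodynamic cycle:

        dG_QM/MM(A->B) = dG_MM(A->B)
                         + dG_MM->QM/MM(B) - dG_MM->QM/MM(A)

    i.e. the classical A->B free energy corrected by the two
    method-space perturbations at the end states, so that no
    QM/MM MD simulation is ever needed."""
    return dG_mm_A_to_B + dG_mm_to_qmmm_B - dG_mm_to_qmmm_A

# Invented numbers in kJ/mol:
print(qtcp_free_energy(40.0, -3.0, -5.0))  # 38.0
```

If the two method-space corrections happen to cancel, as in the hydrogenase example below, the QM/MM free energy coincides with the MM one, but as the speaker notes, that is coincidence rather than the typical case.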
The iron ion has unusual ligands, two cyanides and one carbon monoxide, whereas the nickel has four cysteine ligands. It performs a very simple reaction, taking two protons and two electrons and forming a hydrogen molecule, and naturally the protons bind to the cysteines in some intermediate states. The interesting question is of course which of these four cysteines they bind to. That is what we tried to determine in the first step, using a number of different types of QM and QM/MM methods. The only thing you should see here is that you can get essentially any result within a range of 120 kJ/mol, depending on what you do. So we designed something even simpler: we protonate this cysteine residue, and the question is whether the proton stays on the cysteine or goes to the nearby histidine. This is a movement of a proton by only 0.6 Å, and you will see that we still have big problems calculating the relative energy of these two states. But that is what we try to do in the following. First I show you the QTCP results, because we use them as the calibration. These are the two states, and here we have the MM free energy: going from here to here costs 40 kJ/mol. Then this curve is the difference between the two method-space energy terms; you can see that for this reaction it essentially goes to zero at the end point, but that is not typical, that is just by chance. We then use this method as a reference for other, cheaper methods, because doing the free energy perturbation is quite time-consuming: it takes about a week to do all the calculations needed for this. One way to speed it up is to skip the free energy perturbation at these two points and instead do a single-point QM/MM energy calculation; then we get the QM/MM free energy method of Yang, or the QM-FE method of Jorgensen. Another way to simplify it is to use an idea from ligand binding.
For ligand binding, a much-used method is MM/PBSA, which says that you can get a free energy by taking the intermolecular energies, the electrostatics and van der Waals terms, from MM calculations. Then you take away all the water molecules and treat solvation with a continuum model, either Poisson-Boltzmann or generalized Born, and to this you add a non-polar solvation energy from the surface area and an entropy from frequency calculations. With QM/MM you can replace the two MM energy terms by the corresponding QM/MM energy terms, and then you get what is called the QM/MM-PBSA approach. We have tried all these approaches for the hydrogenase system, with different sizes of the QM system and different protonation states, and also for a related system, nitrite reductase. Then we compare with the QTCP method and see whether these methods reproduce the QTCP results. You can see that if we leave out the MM-to-QM/MM free energy perturbation, we get slightly worse, but I would say still acceptable, results. With the QM/MM-PBSA method you get somewhat larger errors, but still perhaps acceptable, and it doesn't matter much whether we take many snapshots or just one minimized structure. However, if we take the raw QM/MM data, either with the surroundings fixed or free, there are quite large errors, up to 30 or even 70 kJ/mol. If you take QM-cluster calculations, QM with only point charges, or put the QM calculation in a solvent, either a continuum solvent or the full protein treated as solvent, you still have errors of 30-40 kJ/mol on average. From this we conclude that QTCP is the best, but that these, and especially this one, are very simple methods: just a post-processing of the QM/MM results, something that normally takes about 10 minutes, can actually give more accurate results. So what we normally do now is the QM/MM calculation, then QM/MM-PBSA, and then QTCP. Okay, that was one problem with the energies; now we turn to another problem.
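The MM/PBSA estimate just described is a sum of terms; here is a sketch of the bookkeeping. The term names are mine, and real implementations average these contributions over MD snapshots rather than evaluating them once.

```python
def mm_pbsa(e_ele, e_vdw, g_polar_solv, g_nonpolar_solv, T, S):
    """MM/PBSA free-energy estimate (kJ/mol):
      electrostatics + van der Waals (from MM),
      + polar solvation (Poisson-Boltzmann or generalized Born),
      + non-polar solvation (from the surface area),
      - T*S (entropy from frequency calculations).
    For QM/MM-PBSA, the first two terms are replaced by the
    corresponding QM/MM interaction energies."""
    return e_ele + e_vdw + g_polar_solv + g_nonpolar_solv - T * S
```

The appeal is that every term is a cheap post-processing step on structures you already have, which is why it serves as the quick first estimate before the more expensive QTCP calculation.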
And that is that the energies depend on the size of the QM system. This is a well-known problem. For example, the Ochsenfeld group has many times done calculations where they make the QM system bigger and bigger and see how QM-cluster or QM/MM calculations converge with respect to the size of the QM system. What they typically see is that you need something between 500 and 1,000 atoms before you get converged results, and typically also that QM/MM results converge a little better than QM-cluster calculations. We have done a similar study, once again for the nickel hydrogenase case. In this case we added residues one by one, from one up to 40 residues. We added them either by a distance criterion — just taking the residues closest to the active site — or by an energy criterion: we did one QM/MM calculation, took the MM energy component of each residue in the neighbourhood, and added those that gave the largest components. And then we did the calculations either in vacuum or in a continuum solvent with a dielectric constant of four. First you can see that in both cases the two curves come together: when you have a large enough system, the influence of the continuum solvent decreases. That's what the Siegbahn and Himo groups have said many times for QM-cluster calculations, and the same applies here, although the convergence is a bit worse. You can also see that they seem to converge — this one converges here, and this one a little worse — but the interesting thing is that they do not converge to the same result. Here we converge to 120 kJ/mol, here to 60 kJ/mol. So even though both these systems are of a size of around 400 atoms, they still do not coincide; they differ by 60 kJ/mol. That is of course horrible. The question is which one is best, but before saying anything about that, let us concentrate on these results. So this is the big 40-residue system.
From these results we can get the energy contribution of each residue. That's what we do in the next figure: we plot the energy contributions against the distance to the active site, and we colour them according to whether the group is neutral (the green points) or charged. You can see how the energies fall off — the green ones like an r⁻³ curve and the charged ones like an r⁻² curve, which is what we expect for the interaction of a charge or a dipole with the dipole in the active site. And you can see that we have quite large energies even beyond 15 Å. We can do the same thing for the QTCP calculations, or rather for any MM calculation, and we see big energy components even beyond, say, 30 Å from the active site. Looking a little closer at this bump in the curve, we see that these all come from solvent-exposed charges on the surface of the protein. From mutation experiments it is known that such charges should not influence energies inside the active site, so this is some sort of problem with standard MM calculations. What many groups have done, especially the Warshel and Åqvist groups, is to remove solvent-accessible charges. If we do that, we get the green curve, and then we see that the energy becomes constant after, say, 12 Å from the active site. We have obtained similar results when doing free-energy perturbation for ligand binding at the MM level: we can actually make the protein smaller and smaller, down to a 15 Å radius from the binding site, without changing the binding energy. These are seven different binding energies, and this is the mean absolute deviation in kJ/mol. You see that on average we do not change the binding energies by more than 0.5 kJ/mol if we remove everything in the protein beyond 15 Å. That's quite amazing. What do we learn from this?
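The r⁻² and r⁻³ falloffs just described can be checked numerically by fitting a power law to per-residue components in log-log space. A minimal sketch, assuming clean single-power-law data (real per-residue components would scatter around the trend):

```python
import math

def fit_power_law(distances, energies):
    """Fit |E| = A * r**(-n) by least squares in log-log space and
    return the exponent n. Charged groups should give n ~ 2
    (charge-dipole) and neutral groups n ~ 3 (dipole-dipole)."""
    xs = [math.log(r) for r in distances]
    ys = [math.log(abs(e)) for e in energies]
    m = len(xs)
    mx, my = sum(xs) / m, sum(ys) / m
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope  # |E| ~ r**slope, so the falloff exponent is -slope
```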
From this we learn that if we want stable energies, we should include neutral groups up to 4–6 Å or so, and we should include all charged groups that are not on the surface of the protein — the buried charges. And then, what I haven't shown you yet is that another problem with QM/MM calculations is the junction atoms, and we should try to move them away from the active site. Using this information, we designed the big-QM approach to get converged QM/MM energies. It simply says that after doing a QM/MM calculation, we should do a single-point calculation on a much bigger QM system, selected by moving junctions at least two residues away from the minimal QM system, including all groups within 4.5 — or, as we normally say now, 6 — Å of the minimal QM system, and including all buried charges in the protein. For most proteins, if you do something like this, you end up with something between 800 and 1,000 atoms, and that's something you can do with today's QM software: you can do a single-point energy calculation with 1,000 atoms in one day, with a decent basis set, on a single core. That's what we normally say that you should do — the big-QM approach. And it gives much more converged results. Here we have the same test system as I showed in the first slide of this section. Here we have the bad convergence when we added residues with the distance criterion; they converge to 120 kJ/mol, irrespective of whether we did it in vacuum or with COSMO solvation. And down here — let's see, the brown and blue — these are the two curves where we added residues according to the energies instead. The green ones are the big-QM results, and you see that they are stable from a small QM system up to a big QM system to within, say, 5 kJ/mol. So it's a much more stable method. Finally, some comments about geometry optimization.
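The big-QM selection rules just stated — everything close to the minimal QM system, plus all buried charges — can be sketched as a simple filter. The data layout below is hypothetical; real code would measure distances from coordinates and decide burial from solvent accessibility:

```python
def select_big_qm(residues, minimal_qm_ids, cutoff=6.0):
    """Select a big-QM region: everything within `cutoff` angstrom of
    the minimal QM system, plus every buried charged residue.
    `residues` maps residue id -> {'min_dist': float (angstrom to the
    minimal QM system), 'charged': bool, 'buried': bool} -- an assumed
    layout for this sketch."""
    selected = set(minimal_qm_ids)
    for rid, info in residues.items():
        if info['min_dist'] <= cutoff or (info['charged'] and info['buried']):
            selected.add(rid)
    return selected
```

Surface charges beyond the cutoff are deliberately left out, matching the observation above that solvent-exposed charges should not influence active-site energies.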
So once again we go back to the QM-cluster calculations and compare the structures you get from QM/MM and from QM-cluster calculations. In QM-cluster calculations you normally fix some atoms, and we have tested three different ways: fixing the hydrogen atom you converted, or the carbon atom you converted, or fixing both the carbon and two hydrogen atoms. And you see that we have very big movements in the structures and essentially no convergence, whereas if you do QM/MM calculations, you have nicely converged structures already from the beginning. This is the average movement — so these are the structures, and these are the corresponding energies. Once again you can see that the QM/MM energies are fairly stable, but not perfectly stable, and the jumps we see here are caused by the junction atoms: when we move a junction away from the active site, we often get a jump, and that's why it's not exactly stable. With the QM-cluster calculations you instead have this very strange curve up to this size. So QM/MM is much more stable, and you can use quite small QM systems to obtain good structures; for energies you then need to use big-QM to get stable results. And finally, I will say a few words about energy components. What do I mean by that? I mean that one big advantage of QM/MM calculations is that you have information about the full protein, and from that information you can learn a lot about the reaction you are studying. So here I will show you a test case where we studied glutamate mutase, which performs a seemingly simple reaction, moving this carboxylate group from this carbon to this carbon. But it is chemically a hard reaction, and when you have a chemically hard reaction you normally do it by a radical mechanism, and when you have a radical mechanism you often use coenzyme B12.
B12 is something similar to a heme group — not exactly — with a cobalt in the middle, and then a ligand here, which in our case is adenosyl. It binds through this carbon to the cobalt, so we have a cobalt–carbon bond, and what happens in the reaction is that we cleave this cobalt–carbon bond homolytically: we start from cobalt(III) and a closed-shell system, and we go to cobalt(II) and a carbon radical. That's the reaction we are studying. We start from a crystal structure, which does not have an intact cobalt–carbon bond — already in the crystal structure the bond is cleaved — and one of the big issues here is to obtain the structure of the cobalt(III) minimum with an intact bond. But we will concentrate on the reaction. We do the reaction first in vacuum; we have done it with three different coenzymes, but this is the interesting one for this reaction. We see that in vacuum we have a big step in energy: it costs about 150 kJ/mol to go from the intact cobalt–carbon bond to the cleaved bond. If we do it in the enzyme, this point moves up here, and we have a very shallow minimum that is only 10 kJ/mol lower than the dissociated state. In most studies one would be satisfied with this, because it is close to the experimental suggestion of about zero kJ/mol. But what we think is interesting is to understand why we have this enormous effect of the enzyme. To understand that, we can use energy contributions and exploit the fact that we have all the information in the QM/MM calculations. So here we try to understand how we go, in the enzyme, from −143 kJ/mol — this curve — up to 8 kJ/mol in the QM/MM calculations. What we can do in a first step is to use our QM/MM energy. This QM/MM energy consists of two parts. One is the QM energy — in our case the QM energy with a point-charge model — which is 33 kJ/mol.
And the MM part is the difference between those two terms, which is 25 kJ/mol. So we can first say that we have an MM effect, coming mainly or only from the surroundings, of 25 kJ/mol. Then we can take this energy — the QM energy of the QM system in the point-charge model — and do a single-point calculation on the same system without the point-charge model. The difference between those two points is the electrostatic effect, because all the electrostatics between the QM system and the MM system is treated in the QM calculation through this point-charge model. So we have an electrostatic part of the energy of 34 kJ/mol. Next, we can take this — a QM calculation of the isolated QM system without point charges, but optimized in the enzyme — and compare it with the same model optimized in vacuum. That is essentially the geometric effect of the surrounding protein. In this case this term is very big, 56 kJ/mol; in most other enzymes it is a small term, but here it is the dominating part. And then in this special enzyme we have something we call the cage effect, because the adenosyl ligand cannot dissociate fully from the QM system, as it does in vacuum. So we have 20 kJ/mol just because it is kept close to the corrin ring, but that term is normally not so interesting in other enzymes. So that's a first understanding: we have divided it into a QM part, an MM part, an electrostatic part and a geometric part. In the next step we can try to understand these terms as well. The MM part, since it is MM, is completely pairwise decomposable, so we can break it down into any two-body interactions in the protein. It is normally not so useful to do that, but we could, for example, divide it into interactions within the QM system and interactions between the QM system and the surrounding MM system.
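The bookkeeping of this decomposition can be written out explicitly. This is a sketch of the accounting only — the sign conventions and argument names are assumptions, not the exact scheme used in the study:

```python
def decompose_qmmm(e_qm_ptchg, e_mm, e_qm_bare_prot_geom, e_qm_vacuum_opt):
    """Split a QM/MM reaction energy into the components discussed above.
    All inputs are reaction energies (product minus reactant, kJ/mol):
      e_qm_ptchg          QM system in the point-charge model
      e_mm                remaining MM part
      e_qm_bare_prot_geom isolated QM system, protein-optimized geometry
      e_qm_vacuum_opt     same QM system re-optimized in vacuum"""
    return {
        'total': e_qm_ptchg + e_mm,                          # full QM/MM energy
        'mm': e_mm,                                          # MM effect of the surroundings
        'electrostatic': e_qm_ptchg - e_qm_bare_prot_geom,   # effect of the point charges
        'geometric': e_qm_bare_prot_geom - e_qm_vacuum_opt,  # protein-imposed strain
        'vacuum': e_qm_vacuum_opt,                           # intrinsic QM reaction energy
    }
```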
We divide the MM system into parts: one that is optimized by MM — the green one — and another that is fixed at the crystal structure. This term is 8 kJ/mol; the interaction between the QM system and system 3 is zero, because they are far apart. The interaction within system 2 is 11, the interaction between systems 2 and 3 is zero — which is by chance, it could be larger — and the interaction within system 3 is zero because we keep that system fixed. If we were interested, we could go on and understand the large terms in more detail as well; because it is an MM term, we can divide it completely, but it is normally not that interesting. More interesting is the second term, the electrostatic term. To understand this — the difference between the QM calculation with the point-charge model and without it — we could delete group after group from the point-charge model and see how much that affects the energy. That is rather expensive, because it requires two QM calculations for each point. What is much simpler is to do an ESP fit — an electrostatic-potential charge fit of the QM system — and put the charges into an MM calculation; then you can calculate the energy component of each atom or each group in the protein. It is normally most interesting to divide it into residue contributions, and that's what I have done here: I show how much each group in the neighbourhood of the QM system contributes to this electrostatic term. You can see that it is mainly the charged groups, and that a positively charged group has a positive effect and a negatively charged one a negative effect above the plane of the corrin ring, and the opposite below it — so in some way there is an effect along this coordinate. Again, you can understand it in more detail, but normally you don't need to.
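The cheap per-residue analysis just described — ESP charges on the QM system interacting with the MM point charges of each residue — is a plain Coulomb sum. A minimal sketch with an assumed data layout (no cutoffs, no periodicity):

```python
import math

def residue_electrostatic_components(qm_charges, qm_coords, mm_residues):
    """Coulomb interaction of ESP charges fitted to the QM system with
    the point charges of each MM residue. `mm_residues` maps residue
    name -> list of (charge, (x, y, z)) tuples -- a hypothetical layout.
    With charges in e and coordinates in bohr, energies are in hartree."""
    components = {}
    for name, atoms in mm_residues.items():
        e = 0.0
        for q_mm, pos_mm in atoms:
            for q_qm, pos_qm in zip(qm_charges, qm_coords):
                e += q_qm * q_mm / math.dist(pos_mm, pos_qm)
        components[name] = e
    return components
```

Sorting the resulting dictionary by magnitude gives the residue contributions shown in the figure.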
Then we can concentrate on the geometric effect — the effect of optimizing the QM system in the protein versus in vacuum. To understand the contributions to this energy, you can do partial geometry optimizations: keep this part of the structure as in the protein, optimize only this part, and you get the effect of this part; then you do the same thing for all the parts. Doing so, you can see that there is essentially no effect from the corrin ring or from the lower axial ligand; essentially all the effect comes from the ribose group, and a little from the adenosine group. You can continue to understand the effect in more detail within this ribose group, and then we see that the big effect is actually strain in an angle here — this angle is the main origin of the big geometric effect. Another thing you can do is to put in other coenzymes. I said this is the one we have used all the time up to now, adenosylcobalamin, but we can cut off the adenine group to get ribosylcobalamin, or cut off the whole ribose group and keep only a methyl group, giving methylcobalamin, which is also a coenzyme used in the body. We can see that the big effect comes from this group and not from that one: you need, so to say, a handle at these hydroxyl groups to keep the geometry strained in the protein. You can also look at which energy components are important: you can turn off the electrostatics or the van der Waals terms and see which has the biggest effect on the energies. And finally, you can see which parts of the protein are important for this geometric effect by making in silico mutants — you mutate different groups to glycine and see how much the geometry is affected by these mutations. By doing that, we could see that a few groups are important, especially this one and that one, and a side chain from the coenzyme itself. And that's what I wanted to show you today.
So my conclusion today is that — oops, I shouldn't do that, sorry — QM/MM is a good method to calculate structures, especially when combined with experimental data. It's much harder to get accurate energies: you need proper sampling and you need solvation to get accurate free energies. You should employ QM/MM free energies to avoid local minima, and you should use big-QM to get stable results. And QM/MM is a good method to understand the effects of the surroundings. Finally, just two short comments on how we normally do QM/MM applications today. We optimize the structure with QM/MM, using a rather small QM system in the optimization. We normally keep the surroundings fixed, but we check with free calculations, and if we see a big effect from the surroundings, we put that group into the QM system and run the fixed calculations again, because we want to have all the important effects in the QM system. We use big-QM to converge the energies. We use frequencies to get free energies, or entropies, of the QM system, and we use QTCP to get free energies of the surroundings. Regarding the methods, we normally optimize structures with pure DFT methods with dispersion corrections and small basis sets, and we do single-point calculations with larger basis sets to get more accurate energies. We always test both a pure DFT method and a hybrid DFT method; if they give the same results, that's fine. If they don't, you have problems, and you need to calibrate your energies with some better method — local coupled-cluster, or some sort of multi-reference method if you have open-shell systems. And finally, I want to thank all the people who have done all this work during the last 20 years; there are lots of people whose results I have presented. I also want to thank these agencies for money, these for computer time, and of course you for your attention. Now I'd be happy to take questions.
All right, thank you very much — a very interesting presentation. I thought it was interesting how you highlighted the role that QM plays in these different kinds of calculations, and how you not only looked at the results but really lifted the lid on some of the underlying reasons why methods perform better or worse in certain contexts. So thank you very much for that nice talk. I think we do have some questions — are you able to see the questions that have been asked in the question box on the control panel? Yes, I do. Okay, so read out the first question, and I can unmute the person who asked it after you answer. Okay, the first question is: did you validate the method for the protonation prediction against structures obtained by neutron diffraction? That's a good question. Let's see what we have done. For the test case we used, we didn't validate against a neutron structure, because we didn't have any neutron structure of that system. Today we could do it; so far we haven't, but it's a very good point — that is something we should test. Okay, thank you. I think the person who asked the question may have had a connection problem, but they can get your answer from the recording, so I'll be sure to follow up and let them know that you answered it. Thank you. Is there another question? The next one: what do you think will be the impact of neural-network potentials combined with MM potentials, in an NN/MM setting, for the problems you described, such as crystal-structure refinement? For the quantum refinement method it would be an alternative, because then you would use the NN potential instead of the QM method together with the crystallographic potentials, and that is an interesting approach. It could probably be accurate. Hopefully — I think — the QM method will still be more accurate, so there will still be room for using the quantum refinement method. That's my guess. Thanks.
I will unmute the person who asked that question so that you can respond. There's no audio for the answer — oh, he says he has no audio; in that case I will mute him again. Okay. Then the next one: very interesting application of QM/MM. Let's see — is this QM/MM crystal-refinement software easy to use? Can it be used as a black box, or does it need advanced experience, such as DFT functional selection and QM/MM parameterization? I would say that it's actually easier to set up a quantum-refinement calculation than a standard QM/MM calculation, because you use only the quantum part, so the only thing you need is quantum mechanics, and DFT calculations are easy to set up. And normally, since we look at structures, they are much less sensitive to the functional than energies are; a pure functional with a small split-valence basis set with polarization functions is normally good enough. In that way it's more or less a black box. The hard part is to set up the crystallographic refinement; we are using the old, somewhat old-fashioned CNS software, and that is a little harder to set up. We are planning to move to the more widely used Phenix software, where setting up the calculation is more or less black box, and then it should be rather simple. Okay, thank you. On to the next question. Yeah, the next one says: your electrostatic contribution is surely dependent on the choice of model system and system partitioning? I would guess that it's a little dependent, but I wouldn't say very dependent. Of course, if you make the QM system very much bigger, it will swallow all the interesting groups, and then it's hard to get the contributions; but if you have a fairly small QM system, I don't think it depends too much on the size of the model system or on the partitioning. That's my guess. Okay, thanks. Do we have any other questions? No, that was the last one.
Yeah, if there are none from the audience any more — I also don't see questions coming in via the interface — then maybe I can ask two general questions. The first is about the approach to calculate free energies, where you try to obtain the free energy of the process at the QM level by doing it first at the MM level and then perturbing from the MM level to the QM level. Have you checked — is it possible to check, for a simple system where you are able to run the QM calculations, or even with a semi-empirical method — whether the overlap between the MM and the QM ensembles is good enough to actually close that cycle? Have you checked the closure of the cycle, or are you aware of such checks? Yes, we have actually — we started to work on this rather recently, when we began to do ligand binding with QM/MM methods. There we have done it all the way from one method to the other, running full QM/MM MD simulations. And I should say, there is a big problem with the overlap. Our conclusion was that the best thing is normally still to use a reference-potential method, but to divide the perturbation into several steps and actually run some QM/MM MD simulations. Thank you. Then, since no questions are coming in, I can ask another question from my list. The other question concerns the decomposition into energy terms, where you decompose the interaction between the MM atoms and the QM atoms by doing ESP fits. The ESP charges, I think, will be polarized, so you will have higher-order, many-body effects in the charges you obtain — at least if you obtain the ESP in the QM/MM calculation — whereas for the amino acids and all the other residues, the charges have been optimized as two-body potentials that in some average way capture these many-body effects.
So isn't there some risk of double counting, or of missing a specific term? What we see is that normally the ESP model is a good model of the QM system, meaning that you get essentially the same energy difference between the reactant and product states. If you don't get that, then you cannot use the method — but we always check that first. And since this energy decomposition is not a very accurate approach anyway, what you normally want to see is which residue is most important, and for such a qualitative statement I think it's a very good method; it's good enough. So this quantitative aspect also comes back, I think, to the question that was asked before — Jonathan's question: how dependent is this on the force field? Different force fields can have quite different partial charges; some force fields are quite polar, others much less so. So the question is whether the choice of force field will control the answer in terms of the energy decomposition. Have you also played with varying the MM part of the QM/MM calculations? Not so much — I don't think we have ever tried different MM potentials. But what we normally see is that even if the charges look different — the actual charges are different — the electrostatic potential from those charges is much more similar. Yeah, because at longer distances it is probably the dipole and the higher-order multipoles that dominate, and those are reproduced well. Yes, I see. That is good — you don't want too much sensitivity to that. All right, thanks a lot for answering the questions and also for the talk. I will now close my microphone. Yeah, if I may be allowed, I also have a question. So BioExcel — one of the things BioExcel is funded to do is to try to promote the uptake of software that enables usage of the high-performance computing machines being deployed across the EU.
As part of that — you mentioned, for example, SNIC, the Swedish centre that provides computing infrastructure. I was interested in how you highlighted that QTCP is clearly, as you said, very time-consuming — a calculation might take a week or weeks. You also highlighted very nicely, through the systematic analysis of the effect of the QM-region size, the desirability of having this kind of 800–1,000 atoms. That's more compute. And there is software — CP2K is just one example of a QM application — that in principle could scale quite well with that kind of large region size, that large number of QM atoms, to do large-scale parallel calculations with short turnaround times. You mentioned running a single-core calculation for a single-point energy. So my question really is: is access to sufficient compute one of the barriers to using QTCP or large QM systems, and do you think it's important for researchers to ask for and use more compute, to do these kinds of larger-scale, more accurate calculations? It is always good to have more computer time — you will always use the computer time you get. But I think we are not at the limit. For big-QM we could probably still increase the QM system to, say, 2,000 atoms, and that would probably improve the convergence. But I think the most important use of computer time is probably calibration — doing coupled-cluster and other high-level calculations — but also free energies. In the end you want to do free energies at the QM/MM level with MD simulations, and that is the next step; then you need very much computer time. Yes. Okay, thank you. I think a couple more audience questions have come in.
So then: the electrostatic charges are always described as point-like charges? Yes, that is what we have done so far. We have started to play a little with polarized embedding and multipole embedding, and if you have the right software that is fully possible, and it will probably improve the results a little. But we have very little experience with that for enzyme systems so far. This person cannot use a microphone; however, before going on to the next question, I can mention that they just responded to your answer, saying: yes, but the quantum correction considers the possibility to involve a higher-order description. Yeah, I guess that depends on what program you are using. We are using Turbomole, and I think it can use a multipole model for the surroundings, and polarizabilities. There are still quite few programs that can do that. I hope that answers your question, but I'm not sure. Thank you. I think there's one more question, from Mara: did you check which effect is stronger — close QM/MM contacts to the active site, or charged residues at 10 Å distance? I guess that depends on the system, but I would say that both are important, and depending on the system either can be more or less important. Normally, I would say that the closer residues are more important. That's my guess. Okay, Mara, I will unmute you in case you want to respond. Thanks — the question was answered, thank you. Okay. I think those are all the questions, unless you see any that I have not spotted. No. Okay, in that case I will just briefly say something about the upcoming webinars. We have a couple of upcoming webinars: first of all, the next webinar of this workshop, on December 10th, by Maria João Ramos, about studies on enzyme-catalyzed reactions.
And a couple of days before that we also have a webinar by a BioExcel colleague of ours, Mitya Morozov, who will be talking about the interface he has developed to allow GROMACS to couple with CP2K to perform QM/MM calculations — that could also be of interest to the people who are here. Okay, so with that, I would like to thank you again very much for this really nice talk, and also thank everyone who attended. I hope to see you again at a future session.