So, for the day: I'm Michael Seaton, based at STFC Daresbury in the UK, and today is on biomolecular simulations with molecular dynamics, and I believe we've got the first of our lecturers in, one of many. So, our first lecturer today is Professor Anela Ivanova, who is based at the Faculty of Chemistry and Pharmacy at Sofia University, and I believe the talk is going to be on the basics of the method. So, Anela, are you ready to present your slides? Yes, thank you, Michael. Okay, good morning to everyone. I am glad to virtually meet you again. In the next hour, I'm going to introduce you to the main ideas of the molecular dynamics method: to what the meaning of this method is, to the underlying theory, and to the main concepts related to its implementation, so that we can do biomolecular simulations with molecular dynamics. Since most of you marked during the registration that you have either no experience with molecular dynamics, or basic, or intermediate experience, I'm really going to try to stick to the basics of the method, and for those of you who would like a more advanced discussion, I hope we will be able to have that at the end of the lecture, during the time for questions. Now I think I'm going to switch off the camera so that we don't put additional load on the connection, and then I'm going to share the screen. If for some reason you don't hear me or don't see the presentation, Michael, please tell me. Okay, I guess you should be able to see the screen now. Yeah, that's fine. Then, before actually starting with the method of molecular dynamics, I will introduce several concepts that are inevitably needed to understand how a molecular dynamics simulation functions. I would like to start by answering the question: what can we gain from a molecular dynamics biosimulation? Well, here is just a list of information we can gain from performing such a simulation on bioactive systems.
For example, if we are interested in the shape and the size of a biomacromolecule, be it a protein, a membrane assembly of lipids, a DNA fragment, an RNA, or some other supramolecular assembly, we can learn from the molecular dynamics simulation what the shape and size of this biomacromolecule are, with a very special focus on the natural environment. I'm going to talk more about that later during the lecture. If we want to know how biomolecules aggregate, for example when we want to deliver drugs in the human organism: usually, nowadays, this is done by aggregating the drugs inside a biocompatible carrier. So this can also be investigated by molecular dynamics. What are the characteristic times of formation, the shape, and the lifetime of an aggregate? These can also be answered, in a way, by MD simulations. Or, very often, we would like to know how strongly a drug binds to a macromolecule, very often to a protein, to an enzyme or some other target macromolecule, in order to judge whether this drug will be able to replace the natural substrate of the macromolecule. This again can be investigated by molecular dynamics. Or, how can a drug traverse the cell membrane into the cell and do its job inside the cell? This can also be studied by molecular dynamics. The rates and conditions of conformational transitions of biomacromolecules are also important aspects, ones that reveal the mechanisms of functioning of these macromolecules in organisms. What are the magnitude and the nature of the forces acting on the system of interest to us? What is the supramolecular structure of biopolymers, and how is it related to their function? You know that this is very important: the supramolecular structure determines whether the biological function of a certain biomolecule will be displayed or not.
Last in the list, but not least, we are very much interested, as Professor Ducinova was saying yesterday, in the bioactivity or potential bioactivity of new compounds that may be used as drugs or have other biological functions. This information can also be obtained from molecular dynamics simulations. Of course, this is by far not everything that we can do with MD; there is much more. Nowadays, this method is becoming more and more popular, and it is starting to be one of the irreplaceable tools in the toolbox when studying biologically active systems and when doing research in this direction. I guess with this I was able to convince you that it's an important method. When we start planning a molecular dynamics simulation, we should always think about several factors that might influence the setup, the carrying out, and the outcome of such a simulation. First, we have to think: what are the characteristic times of the processes that we would like to observe? Just to briefly summarize the previous slide: with molecular dynamics, we are actually monitoring the natural motion of molecules, very often in a liquid environment, or in the presence of ions, or whatever they are embedded in within the organism. These movements of the molecules happen at different timescales, and it is important to think in advance what timescale we would like to achieve. Here I have put an arrow, so to say, showing increasing times and some of the bioprocesses that correspond to them. The fastest ones, of the order of femtoseconds, are the vibrations of the bond lengths in the molecules. All the atoms move, they vibrate, and this happens within femtoseconds; this is a very fast process. Then, conformations of molecules are generated by rotation about covalent bonds, which typically takes about picoseconds. Then we go to the slower processes.
For example, if we would like to observe the transport of a small molecule or of ions in an ion channel across the cell membrane, or a lipid molecule diffusing in a liquid medium, or the water relaxing around a biomolecule, this would normally take on the order of nanoseconds. So, slower. Even the smallest proteins fold, attain their spatial structure, quite slowly, of the order of some microseconds. Here is where this arrow ends, because nowadays, if we would like to keep the atomistic detail in a molecular dynamics simulation, this is about the slowest timescale that we could achieve: of the order of tens of microseconds, with the computing power that was demonstrated to you yesterday. If you would like to take a look at slower phenomena, like the folding of normal-sized proteins, which are rather large, then it may take up to milliseconds. If we would like to monitor the synthesis of proteins in the ribosome, these are already very, very slow processes, taking times of the order of seconds. If we would like to observe a biochemical reaction or protein folding inside the cell membrane, this is already the domain of biology, and it may take up to days. What can we do if we would like to observe such processes by molecular dynamics simulations? Here it's a bit tricky: we can do so-called coarse-graining of our models. This means we have different levels of complexity, or detail, of the molecular models themselves. In the so-called fully atomistic, or all-atom, models, we treat each atom in the molecule separately, as chemists are used to doing. However, it is also possible to unite several atoms into one particle, one effective particle, which is called a bead. This process is termed coarse-graining, and then we can treat this effective particle as one center which interacts with the other such centers. Okay, you can visualize this most easily by taking a protein. Proteins, as you know well, are built of amino acids, bound in a certain way.
Each amino acid contains a number of atoms, but we can imagine that, okay, the structural detail is not so important, and we can treat the whole amino acid residue as one particle, one bead; so this is one level of coarse-graining. Of course, we have to say that by coarse-graining you usually decrease the resolution of your results, so you will not be able to see the atomistic details, but you can observe slower phenomena this way. Even further, we can give up the molecular representation altogether, so we can say: I don't need discrete molecules in my model, I can describe the phenomena happening therein with a mathematical function. These are called mesoscopic simulations, and using them you can also observe phenomena from the realm of biology. So this is something you have to decide. What is the timescale? If you are willing to describe faster phenomena, then you may remain with the atomistically detailed models. If you are willing to describe slower phenomena, then you might need to coarse-grain your system at a certain level. The two are always related. Also, consider the timescale of the process: okay, I have a big system, but I'm interested only in the conformations of some protein side chains which are on the surface, and they rotate fast, so I can stay with the atomistic model but just do short simulations. So that's how you usually make this decision of what kind of model to treat. The rule of thumb here is that slow phenomena usually require long simulations and/or coarser models, or maybe sometimes both. Okay. Professor, before we continue, sorry to interrupt: there's a little text box that keeps appearing and disappearing that says "please move this window away", a little way from the top of the screen. Can we get rid of it? Maybe I can share the screen by sharing the... okay. Hang on. Yeah, there's still that little window, yeah. Okay, let me try like this. Here's the slideshow, and then I stop the sharing and repeat it.
Yeah, you might have a little window somewhere that you've got to close, that's sort of... Yeah, it happened to me also yesterday, but I couldn't find the source of it. Ah, okay. Okay, share. No, I think it's gone. Excellent, okay. Okay, we can go on. Thank you, Michael. Okay, then one more classification, which we have to introduce beforehand, because it's another decision you have to make when you start the simulation. The computational methods that can be combined with molecular dynamics may be divided into two main classes. First, the classical methods: they use the laws of classical mechanics to describe the motion of molecules, the so-called molecular mechanics force fields; I'm going to talk about them a little bit more in a minute. The advantage of these classical methods... okay, let me first say what they mean. A classical method is a method that neglects the discrete quantum structure of matter. So if you have a carbon atom, it will just be a particle, a bead, interacting with the other particles in the system. What is the advantage of such a simplification? You can treat quite large models. I have written here up to 1 million atoms, but nowadays people are already starting to look at models of up to, let's say, 5, 6, 10 million atoms on an HPC system. Of course, if you are treating an event that relies on quantum effects that can happen in your system, for example if you need to describe tunneling, you cannot use such classical methods. But fortunately, biological systems obey, more or less, the laws of classical mechanics, and usually we are interested in phenomena that can be treated with this approach. So from now on, in the lecture, I'm going to focus on these classical methods. Just to mention the other class, the quantum methods: there, the discrete structure of the atom is taken into account.
This means, of course, the nucleus is treated as a classical charged particle sitting there in the middle of the atom, but the quantum properties of the electrons are taken into account. You can also do that with molecular dynamics, but those are somewhat different implementations; we are not going to talk about them today. What would be the limitation of such quantum molecular dynamics simulations? I would put here up to 500 atoms, but maybe up to 1,000 would be the limit on a supercomputer. So here one has to decide: I want a big model, I want my protein in water with ions, then I will have to stick with classical methods. Or: I want to see very precisely how this drug molecule binds to an enzyme active site, and I especially want to monitor the spin properties of this ligand-protein complex, then I have to do quantum dynamics. Okay, so we have made this decision; we are going along with the classical approach. Then this is a very important picture to put in your head: the concept of the potential energy surface. I guess you are all familiar with this, but nevertheless I'm going to introduce the terms that I'm going to use later on. So the potential energy surface is the surface that the molecules travel along when they move by their natural movements, by the thermal fluctuations, by the kinetic energy they possess at a certain temperature. This potential energy surface represents the energy of a system as a function of its structure. By changing the structure of a molecule, or of any supramolecular assembly, we inevitably change the energy of the system. And we are usually interested in the stationary points along this potential energy surface. So there are a lot of minima on the potential energy surface, a lot of maxima, and the molecules can move between these minima by crossing certain energy barriers, that is, by moving through some maxima.
Usually the minima are the target points, especially of biomolecular simulations, because the minima are where stable structures are actually formed. Most of the time the structures will correspond to minima on the potential energy surface. You have here an example of a protein that has two minima, and even a third one here where it is almost completely unfolded. Of course this is a very simplified representation of a potential energy surface of a protein; they are usually much, much more complex, with many such minima separated by barriers of different heights. In any case, every potential energy surface has a so-called global minimum. It is the state, the structure, to which the minimum energy corresponds. In our case the most stable conformer of this protein is this one, and so this is the global minimum on our potential energy surface. I would like to reveal, even at this point, that the molecular dynamics method is devised in such a way as to be able to naturally visit such minima on the potential energy surface. So you will get the so-called sampling of the potential energy surface, which means that you will visit the minima on the potential energy surface. Okay, so we need this, the potential energy surface, because the molecules are moving all over it, depending on the heights of the barriers and on the conditions, I mean temperature, pressure and so on, where the molecules are. Now, a couple of words more about the molecular mechanics method. This is a method which is used to describe molecules using the laws of classical physics, of classical mechanics; that is why it is called molecular mechanics. Within this method, we represent molecules as a system of bound mechanical objects. So the atoms are these beads that I was talking about, and they are bound. We know that in reality they are bound by covalent bonds, but within the methods of molecular mechanics, bonds are represented as, let's say, springs.
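The idea of minima and a global minimum on a potential energy surface can be sketched numerically. A minimal toy example, assuming a made-up one-dimensional double-well function (not any real biomolecular surface), finds the local minima on a grid and picks out the deepest one:

```python
import numpy as np

# Toy 1-D "potential energy surface": a double well with two minima,
# one slightly deeper than the other (the global minimum).
def potential(x):
    return x**4 - 2.0 * x**2 + 0.3 * x

x = np.linspace(-2.0, 2.0, 4001)
U = potential(x)

# Local minima: interior grid points lower than both neighbours.
is_min = (U[1:-1] < U[:-2]) & (U[1:-1] < U[2:])
minima_x = x[1:-1][is_min]
minima_U = U[1:-1][is_min]

# The global minimum is the lowest of all local minima.
global_x = minima_x[np.argmin(minima_U)]
print(minima_x)   # two minima, near x = -1 and x = +1
print(global_x)   # the deeper (global) one, on the negative side
```

A real biomolecular surface has thousands of such minima in a very high-dimensional space, which is exactly why a statistical sampling method like MD is needed instead of a grid search.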
So you can imagine that the vibration of a bond is actually these two beads here attached to a spring of a certain strength, and this spring can deviate from its equilibrium position, but it always returns to the most preferred bond length, the equilibrium position. We represent the vibration of the angles in a similar way. We represent the torsional potentials with a classical expression as well, because we need to be able to describe the different conformations of the molecules. Then, of course, in biomolecules there are also the so-called non-bonded interactions, the most important among them being the electrostatics. This is the interaction between differently charged particles. And, of course, the van der Waals interactions, the dispersion. So, in the end, we may assign a classical mechanical function to describe each of these interactions in the molecules, and when the molecules are moving, there will be a contribution from all these movements to the potential energy of the system. The combination of the type of mathematical functions that we are using to describe all these interactions, and some parameters, and I'm going to explain in a minute what these parameters are, actually provides the total potential energy of the system. So, if we know this number, the potential energy, for different structures, we can construct the potential energy surface of our system with a molecular mechanics force field. So, the combination of the types of potentials and the parameters is called a force field. Force fields are derived by different research groups, and therefore there are different force fields which exist in the literature and which people are using. Usually these force fields have abbreviations, and that's how they are known; I have listed just a couple of them here. There are many force fields, I don't know exactly how many, but in a minute we will briefly focus only on the most popular ones which are atomistic.
I'm not going to talk about the coarse-grained force field models; from now on, only about the atomistic ones, which are often used for biosimulations. Okay, so let me just illustrate the types of mathematical functions that are most often used to describe the different motions that can happen within molecules. For the stretching of bonds, of course, it's very convenient to use the harmonic potential. So we have the spring, we have its strength, described here by the force constant, and we have the equilibrium bond length, which is l0, and this means that this bond can vibrate within this harmonic potential. For the angles, the same approach: usually a harmonic potential is used, just with a smaller force constant, because angle bending is softer. The torsional potential, on the other hand, is periodic. That is why a periodic function is usually employed to describe the energy change upon rotation about covalent bonds. We have here the height of the barrier, we have the equilibrium angle, and we have the number of minima along this torsional potential. To describe the electrostatic interactions, no surprise, usually the Coulomb law is used, where we assign some partial charges to the atoms in our system, and of course the electrostatic energy is inversely proportional to the distance between these particles and proportional to the magnitude and sign of their charges. For the van der Waals interactions, the dispersion, there is a very popular potential, the Lennard-Jones potential, probably you know it, which has an attractive part, and a repulsive part when the particles are too close together, so that they don't merge. This is actually a representative form of a force field, but usually some things are circled here: these are the so-called parameters. These are numerical values that we need to take from somewhere in order to be able to calculate the separate energy contributions. So how are these parameters derived?
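The energy terms just described can be sketched in a few lines. This is a minimal illustration of the functional forms only; the force constants, charges, and Lennard-Jones parameters used below are made-up values for demonstration, not parameters from any real force field:

```python
import math

def bond_energy(l, k_b, l0):
    """Harmonic bond stretching: E = 1/2 k_b (l - l0)^2."""
    return 0.5 * k_b * (l - l0)**2

def angle_energy(theta, k_a, theta0):
    """Harmonic angle bending: same form, softer force constant."""
    return 0.5 * k_a * (theta - theta0)**2

def torsion_energy(phi, V_n, n, phi0):
    """Periodic torsional potential: E = V_n/2 [1 + cos(n*phi - phi0)]."""
    return 0.5 * V_n * (1.0 + math.cos(n * phi - phi0))

def coulomb_energy(q1, q2, r, ke=1.0):
    """Coulomb law between two partial charges (ke set to 1 for brevity)."""
    return ke * q1 * q2 / r

def lj_energy(r, epsilon, sigma):
    """Lennard-Jones: r^-12 repulsive part plus r^-6 attractive part."""
    sr6 = (sigma / r)**6
    return 4.0 * epsilon * (sr6**2 - sr6)

# At the equilibrium bond length the bond term vanishes, and the
# Lennard-Jones well of depth -epsilon sits at r = 2^(1/6) * sigma.
print(bond_energy(1.0, 500.0, 1.0))          # 0.0
print(lj_energy(2**(1/6) * 1.0, 0.5, 1.0))   # -0.5
```

The total potential energy of a configuration is simply the sum of such terms over all bonds, angles, torsions, and non-bonded pairs, which is what a force field evaluates at every step.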
Usually these values are derived in such a way as to be able to reproduce the experimental data for a certain series of molecules. These molecules are used to parameterise the force field, as we say. So these are numbers, and putting in all those numbers guarantees that when we obtain the energies, they will be able to reproduce the experimental behaviour, the experimental properties, that were used to derive these parameters. What kinds of experimental data are used? Different ones: mostly structural parameters, taken from X-ray and NMR or whatever experimental methods you have, and then some thermodynamic parameters: density, enthalpy and so on. Some of these values cannot always be secured by experimental measurements, for example the energy barriers for conformational transitions; then usually high-level quantum chemical calculations are performed to provide the data for these parameters. Also the charges on the atoms: they are not directly experimentally measurable, so they are generated from quantum chemical calculations. That is how we end up with the force field: we have chosen the potential functions and we have the parameters. This is something that the user usually doesn't do; the force field is ready, the parameters are already introduced as libraries in the software package. But you should know what the force field is and what it looks like, because sometimes some of the parameters are missing, and then it is the task of the user to provide them. Therefore, I'm just going to briefly mention the three most popular atomistic force fields that are used for biosimulations. Well, the father of all biomolecular force fields is AMBER. It was developed by scientists at Cornell University in the United States. And here is something very important to remember for force fields.
Depending on what kinds of compounds were used to derive their parameters, force fields are suitable for describing the same or similar classes of compounds. You cannot use AMBER, for example, to describe metal complexes; it is not suitable for that, because it was devised for proteins and nucleic acids. From version 95 on, AMBER is also appropriate for small organic molecules, like, for example, drugs. They also have a version for carbohydrates and another version for small organic molecules, so it is very, very popular for biosimulations. Then we go to CHARMM. It is also quite frequently used for biosimulations. They have focused more on parameters for lipids, so if you would like to simulate a lipid membrane, then CHARMM would be your best choice. It works okay for proteins, and it can also be used for nucleic acids. It is developed mostly at the University of Maryland in the United States. The third one, which has been gaining more and more popularity for biosimulations during the last maybe five or six years, is OPLS. It was originally derived, based on AMBER, to describe proteins and organic liquids. This is a specificity of OPLS: if you need properties of liquids, then it would be a good idea to use it. Recently, they also provide a very decent parameter set for the description of lipids. It is developed in the group of Professor Jorgensen at Yale University, again in the United States. Of course, there are other force fields; you can read the literature and make a choice. Here I'm just mentioning the most popular ones. Okay. Now it is important to say a couple of words about the environment, because bioprocesses usually don't take place in the gas phase, but in a liquid environment. So it is very important to also include this environment explicitly, with the solvent molecules, in the calculations. You will hear more details during the training about how this can be done and what is important to observe.
Of course, the most natural environment in the living organism is water, so usually we put at least water around the biomolecules that we are modeling. There are also force fields that are derived especially for water, and maybe the most popular among them is the so-called TIP3P water model. There, each atom in the water molecule is a separate particle interacting with the other particles around it. So you have to choose the appropriate force field for water as well. Okay, so now we have our biomolecule, we have surrounded it with solvent molecules, and we have put it in, for example, a cubical compartment. However, at the walls of the compartment, the particles feel different forces than the particles in the center of this cubical compartment, which is called the simulation box. These so-called finite-size effects can cause artifacts in our simulations, so we need to eliminate them. For example, if you have charged particles moving in the system, then the particles with like charges will try to stay as far apart as possible in the box. This is an artifact; it's not what happens in nature. That's why biosimulations are usually done under so-called periodic boundary conditions. What does this mean? Your simulation box is replicated periodically in all directions. This provides you with a continuous solution, or solid, or whatever you would like to simulate; it eliminates the finite-size effects, and it provides you with a more realistic model. Okay. So now we have the periodic boundary conditions and we have all the molecules inside. For example, this is a part of a lipid bilayer with some proteins embedded in it; this is a DNA fragment with a solvent, water, and some counterions, sodium chloride. That is how the resulting models normally look. And the sizes nowadays are 10 to the power of five, 10 to the power of six atoms.
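The periodic boundary conditions just described are usually implemented with the minimum-image convention: each particle interacts with the nearest periodic image of every other particle, so the walls of the box disappear. A minimal sketch, assuming an orthorhombic box (real MD codes also handle triclinic boxes):

```python
# Minimum-image distance under periodic boundary conditions: a particle
# near one wall of the box is a close neighbour of a particle near the
# opposite wall, because its nearest periodic image is right next door.

def minimum_image_distance(r1, r2, box):
    """Distance between two points using the nearest periodic image."""
    d2 = 0.0
    for a, b, L in zip(r1, r2, box):
        dx = b - a
        dx -= L * round(dx / L)   # wrap the separation into [-L/2, L/2]
        d2 += dx * dx
    return d2**0.5

box = (10.0, 10.0, 10.0)
p1 = (0.1, 5.0, 5.0)
p2 = (9.9, 5.0, 5.0)
print(minimum_image_distance(p1, p2, box))  # 0.2, not 9.8
```

This is why a particle leaving the box on one side simply re-enters on the other, giving the continuous medium mentioned above.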
So these are quite large models, and their potential energy surface will be rather complex: it will have many such minima and many such maxima, with the minima separated by different barrier heights. So we need some statistical approach to be able to sample the potential energy surface properly, to have structures that correspond to the different minima on this complex potential energy surface. That is why we need an approach from statistical mechanics, and this approach, in our case, will be the molecular dynamics method. But before we go to the method, very briefly: we always have a lot of particles in the model, so we have to do statistics on them; we have to use the approaches of statistical thermodynamics. This automatically leads to another decision that you have to make: which statistical mechanical ensemble you would like to simulate in. Here are the two most popular statistical ensembles that are used for biosimulations. The first is the so-called canonical ensemble, and here are the quantities that are kept constant, or have constant average values, in this ensemble: the number of particles, the volume of the system, and the temperature. Okay, some biochemical reactions may take place in such an ensemble, but not many. Most biochemical processes actually happen under the conditions of the so-called isobaric-isothermal ensemble, where the number of particles, the pressure, and the temperature are maintained constant. So this is actually the most popular ensemble for biomolecular simulations. But again, you have to think about what kind of process you are interested in and under which conditions it takes place, and then choose the appropriate ensemble. So this was the introductory part, and we can now actually set up the scheme of a classical biomolecular simulation. At the first step, we select our level of detail: either an atomistic or a simplified coarse-grained approach.
Then we select the force field: which one is the most appropriate for our system. Then, the other general method for performing statistical mechanical simulations is Monte Carlo, but we are not going to talk about it today; most biomolecular simulations are done by molecular dynamics. So we have set up our model and our method of choice, and we perform the simulation. As a result of this simulation, we get the so-called molecular dynamics trajectory. This trajectory is a file, and this file contains an ensemble of structures that are, let's say, a representative sampling of our potential energy surface. These are usually a lot of structures, several hundred thousand normally, so we have to process this significant amount of data somehow. This is usually done by performing statistical analysis on the trajectory files. From them, we can extract the necessary physical properties of our system, we can average them, we can do different kinds of processing, and finally we can draw some conclusions about the properties of our system. What do we use these properties for? We can compare to experimental data, to assist, for example, the interpretation of some complex experimental data; we can explain some experimental phenomena; or we can be even bolder and predict some new properties that were not known for the system so far. And now, finally, we start introducing the method of molecular dynamics itself. Okay, what is the essence of the method? Nothing special about it: the method of molecular dynamics is a solution of the classical equations of Newton, the equations of motion, as they are called. And this is the very familiar second law of Newton, which relates the force acting on a particle to the acceleration of this particle. So, as long as we know the force, the particle is automatically accelerated, following the second law of Newton, and it starts moving. But how do we know the force?
We can calculate the force as the negative derivative of the potential energy with respect to the coordinates of the particles in our system. And where do we get the potential energy from? From the force field. So this is the basis of the method. It sounds very simple, but as you will see on the next slide, the implementation is not that simple, and there are several things that we need to consider. By solving these equations of motion sequentially, we can monitor the movement of the particles in our system over a certain time period, and we can record the data at some time intervals in a trajectory file, which we then process. What would be the accessible time periods for current MD simulations? The atomistic ones normally go up to 5 microseconds, the coarse-grained ones normally up to 20-50 microseconds, and the mesoscopic ones can go up to 1 millisecond. How do we perform a molecular dynamics simulation? There are several stages that we have to follow. First, we do the so-called energy minimisation of the initial configuration. The initial configuration is the initial structure of our system. You will hear more about that in the next lecture and in the practical training, but somehow, in a graphical program for example, you have generated the initial structure of your model, you have put it in a periodic box, and you have applied periodic boundary conditions. Then you do the energy minimisation. This is required in order to eliminate some unnatural repulsive forces and to take your system to the nearest energy minimum on the potential energy surface. Then you heat the system. This means you assign velocities to the particles that correspond to a certain temperature. This is done gradually; you will hear more about it again in the next lecture. Then you equilibrate your system in the chosen statistical ensemble. Since we are following the natural laws of Newton, if you just leave your system in the chosen statistical ensemble, it will inevitably reach equilibrium, the equilibrium state.
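The relation between force and potential energy, F = -dU/dr, can be checked numerically. A minimal sketch, assuming a Lennard-Jones pair potential with made-up unit parameters, compares the analytic force against a central finite difference of the energy:

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy with illustrative unit parameters."""
    sr6 = (sigma / r)**6
    return 4.0 * epsilon * (sr6**2 - sr6)

def lj_force_analytic(r, epsilon=1.0, sigma=1.0):
    """F = -dU/dr = 24*epsilon*(2*(sigma/r)^12 - (sigma/r)^6)/r."""
    sr6 = (sigma / r)**6
    return 24.0 * epsilon * (2.0 * sr6**2 - sr6) / r

def lj_force_numeric(r, h=1e-6):
    """Central difference approximation of -dU/dr."""
    return -(lj_energy(r + h) - lj_energy(r - h)) / (2.0 * h)

r = 1.5
print(lj_force_analytic(r), lj_force_numeric(r))  # the two agree closely
```

The same idea, a gradient of the force-field energy with respect to every coordinate, is what an MD engine evaluates at each step to drive the motion; the sign convention means the force pushes particles downhill on the potential energy surface.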
All these three stages are preparatory for your simulation. Then you start the actual accumulation of configurations, of low-energy structures, by calculating the forces and then solving the equations of motion. This propagates the system in a certain direction. Then you record a snapshot, and then you do it again and again, many times, until you are satisfied with the time period you have covered. Then you stop the simulation and start the statistical processing of the recorded configurations. Something that is important as a check of whether your simulation is doing well: you have to know that the energy and momentum of the system should be conserved during the simulation. You may monitor the energy, and sometimes, though rarely, the velocities, to check that everything is fine; there should be no drift in the energy of your simulation. The equations of motion on the previous slide look very simple, but in fact they are second-order differential equations, which are very complex for such large systems, and this necessitates a numerical solution of the equations of motion. Here is a very brief presentation of the approach that is used to integrate them. The algorithms for the numerical solution of the equations of motion are usually called integrators. Maybe the most popular one for biosimulations is the so-called leapfrog algorithm. It relies on a Taylor series expansion of the coordinates of the particles around a certain time, adding small time increments. So this is an important parameter in these simulations; it's called the time step. It's usually a very small number, of the order of one to two femtoseconds, because it is a Taylor expansion. Then the new positions are generated from the velocities half a step ahead and from the current coordinates.
The velocities at the half step ahead we calculate from the velocities of the particles one half step before, and from the current forces acting on the particles, which we have calculated from the potential energy. Then we also obtain the velocities at the current time step, by a simple average of the velocities half a step before and half a step after the current moment. I'm not going to go into details about that, but it's a robust algorithm and it's reversible in time. The drawback is that the way the velocities of the particles are obtained is a bit awkward. There is some improvement with the so-called velocity Verlet algorithm, which is considered to be maybe the most stable and reliable integrator of the equations of motion among the Verlet family. Here the coordinates, the velocities and the forces are finally all saved at the same time step. This is the advantage. The disadvantage is that it needs a little bit more storage. Okay, so far we know how to integrate the equations of motion, but this gives neither the canonical nor the isothermal-isobaric ensemble. We now need to extend the equations of motion with the so-called thermostats. The thermostats are algorithms which allow maintaining a constant average temperature in our system. You can imagine it as coupling the system to a heat bath, as it is done in thermodynamics. I'm not going to go into detail about the formulae, about how exactly the equations of motion are extended. I'm just mentioning here the most popular algorithms that are used for maintaining constant temperature. One of them is the Berendsen thermostat. It's very robust, it's very stable, it can always bring your system into equilibrium in the canonical ensemble, whatever initial structure you start from, but it has the main drawback that it does not sample properly from the canonical thermodynamic ensemble.
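For comparison with leapfrog, here is a sketch of one velocity Verlet step, again on a toy harmonic oscillator in reduced units (names and values are illustrative, not from the lecture). Note how positions, velocities and forces all end up at the same time step, which is the advantage mentioned above.

```python
def velocity_verlet_step(x, v, f, force, dt, mass=1.0):
    """One velocity Verlet step: x, v and f are all synchronous at the end."""
    x_new = x + v * dt + 0.5 * f / mass * dt ** 2   # position update uses old force
    f_new = force(x_new)                            # recompute force at new position
    v_new = v + 0.5 * (f + f_new) / mass * dt       # velocity uses old and new force
    return x_new, v_new, f_new

# Harmonic oscillator F = -x (k = m = 1); exact energy is 0.5 for x0 = 1, v0 = 0.
force = lambda x: -x
dt, x, v = 0.01, 1.0, 0.0
f = force(x)
for _ in range(1000):
    x, v, f = velocity_verlet_step(x, v, f, force, dt)

energy = 0.5 * v ** 2 + 0.5 * x ** 2
assert abs(energy - 0.5) < 1e-3   # no energy drift, as expected for Verlet schemes
```

The extra storage cost the lecture mentions corresponds to keeping the force array `f` between steps.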
Nevertheless, for biosimulations people are using it all the time, and it has been shown over numerous simulations that there are no problems with the structures generated using the Berendsen thermostat. There is an implementation in GROMACS that corrects this problem with the sampling, and it is called the velocity-rescale thermostat. If you would like to be very, very thermodynamically correct, then you can use the so-called Nosé-Hoover thermostat, which is a different approach to correcting the equations of motion. Maintaining constant pressure in the system is done by allowing the volume of our periodic box to fluctuate. It resembles a virtual piston acting on our system which can maintain the average pressure. Again, one of the most popular barostats is the Berendsen algorithm: very simple, very robust, and usually used to equilibrate your system, to bring it into equilibrium before you start sampling, before you start the so-called production part of the simulation. Again, it doesn't sample exactly the isothermal-isobaric ensemble. So if you want to be very correct, you can use the Parrinello-Rahman barostat, which is also available in the GROMACS package, and not only in GROMACS but in most of the packages. So we now need to talk about some specifics related to speeding up the simulation. Up to this level all the basics are set. You choose your system, you choose your force field, you choose the periodic boundary conditions, you choose the thermostat and barostat, you choose your time step, you choose the length of the simulation, and you are ready. However, if you have a million atoms in your model, then it will be quite time-consuming to do the calculations, and there are several techniques to speed these calculations up. I'm just very briefly going to mention those. First of all, we can apply a cut-off to our potential energy.
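The weak-coupling idea behind the Berendsen thermostat can be shown with a heavily simplified sketch: velocity rescaling only, with no interparticle forces or real dynamics, in reduced units with kB = m = 1. Everything here (function names, parameters) is illustrative, not production code.

```python
import math
import random

def berendsen_scale(velocities, target_T, dt, tau, kB=1.0):
    """Berendsen weak coupling: rescale velocities so the kinetic temperature
    relaxes toward target_T with coupling time constant tau (1D particles)."""
    n = len(velocities)
    kinetic = 0.5 * sum(v * v for v in velocities)
    current_T = 2.0 * kinetic / (n * kB)            # equipartition, one dof per particle
    lam = math.sqrt(1.0 + (dt / tau) * (target_T / current_T - 1.0))
    return [lam * v for v in velocities], current_T

# Start from a "too hot" velocity distribution (T ~ 4) and couple to T = 1.
random.seed(0)
velocities = [random.gauss(0.0, 2.0) for _ in range(1000)]
for _ in range(200):
    velocities, T = berendsen_scale(velocities, target_T=1.0, dt=0.01, tau=0.1)

_, T = berendsen_scale(velocities, 1.0, 0.01, 0.1)  # measure the final temperature
assert abs(T - 1.0) < 0.05
```

In a real simulation the rescaling is interleaved with the integrator steps; the point of the sketch is only the exponential relaxation of the temperature toward the target, which is also why Berendsen does not reproduce canonical fluctuations.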
For example, if we have one particle here and another particle very far from it, the interaction energy between them is for sure essentially zero. So why calculate it? It takes a lot of computer time. So we can impose a cut-off on the potential: just check the distance between the particles, and if it exceeds a certain value, the cut-off, then we can skip the calculation of the potential energy and the forces, which is very time-consuming. This is what is done. Usually values of the order of 1.2 nanometers are assigned as the limiting inter-atomic distance beyond which the potential energy is not calculated. Here I have shown different techniques by which you can smoothly make the potential energy go to zero. Usually this is recommended to avoid some artifacts in the simulation. Okay, but in order to impose the cut-off, you need to check all the particles in the system to decide which fall inside the cut-off and which fall outside. This is also a slow process. So in order to speed it up, there is the so-called Verlet list technique, suggested by Verlet. You don't check all the particles in the system; you check particles that fall inside a sphere that is a bit larger than the cut-off, usually by 0.2 nanometers in radius. Then you make a list of all those particles that are inside this Verlet sphere, and you have your neighbour lists. These are the neighbours for which you need to calculate the inter-particle interactions. There is also another approach where you just split your periodic box into cells. The particles are sorted by their coordinates, and you list them according to these cells. This speeds up the calculations very much. One more thing. Okay, I have one particle here, and if I need to calculate the interactions of this particle with all the other particles, it's again slow. So the so-called minimum image convention is used.
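A minimal sketch of building such a Verlet neighbour list in a cubic periodic box follows (the O(N²) construction shown here is the naive version; real codes combine it with cell lists). Function names, box size, cut-off and skin values are all illustrative.

```python
import random

def build_verlet_list(positions, box, cutoff, skin=0.2):
    """Verlet neighbour list: keep all pairs within cutoff + skin, so the list
    remains valid until some particle has moved roughly skin/2."""
    r_list2 = (cutoff + skin) ** 2
    pairs = []
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                dx = a - b
                dx -= box * round(dx / box)   # minimum image in a cubic box
                d2 += dx * dx
            if d2 < r_list2:
                pairs.append((i, j))
    return pairs

random.seed(1)
box = 5.0
positions = [[random.uniform(0.0, box) for _ in range(3)] for _ in range(50)]
pairs = build_verlet_list(positions, box, cutoff=1.2, skin=0.2)
assert pairs and all(i < j for i, j in pairs)
```

Between list rebuilds, the force loop then runs only over `pairs` instead of over all N(N-1)/2 candidates.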
I calculate the interactions of this particle only with its closest neighbours, either in its actual box or in the so-called periodic images. I forgot to mention that whenever a particle leaves the box, it enters from the other side, so we keep the number of particles in our system constant. So particle one will interact with particle two, and with the images of particles five, four and three. This also saves a lot of time during the calculation of the forces. Okay, by imposing this cut-off we, however, lose part of the energy. We can never be exactly sure when the energy becomes zero for all particle pairs in our system. If we think that this loss is big for the van der Waals energy, for the dispersion interactions, we can apply the so-called long-range correction. So we calculate the potential energy up to the cut-off exactly, using the force field, and we do an integration to estimate the long-range part of the van der Waals energy, using the so-called radial distribution function. This is a specific analysis that can be done on our system. Okay, this regains part of the van der Waals energy, but it is for sure not enough to compensate the missing part of the electrostatic energy, because the electrostatic interaction is very long-ranged. A particle here can still feel electrostatically a particle over there, because the dependence is one over r, the distance between the particles. To compensate for the missing part of the electrostatic energy beyond the cut-off, there is a special scheme called particle mesh Ewald. It is based on the method suggested by Ewald to treat the electrostatic interactions in crystals. I'm not going to go into detail; I'm just going to tell you that the electrostatic energy is split into three contributions: the direct one, which is just the Coulomb law applied directly up to the cut-off we have imposed on the potential.
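For the van der Waals case, the long-range correction integral has a closed form if one assumes the radial distribution function is approximately 1 beyond the cut-off (a uniform fluid). A sketch in reduced Lennard-Jones units follows; the function names and parameter values are illustrative, and the cross-check simply integrates 2*pi*N*rho*U(r)*r^2 numerically.

```python
import math

def lj_tail_correction(n_particles, density, cutoff, epsilon=1.0, sigma=1.0):
    """Standard analytic tail correction to the total LJ energy beyond the
    cutoff, assuming g(r) ~ 1 for r > cutoff."""
    sr3 = (sigma / cutoff) ** 3
    return (8.0 / 3.0) * math.pi * n_particles * density * epsilon \
        * sigma ** 3 * (sr3 ** 3 / 3.0 - sr3)

def u(r):
    """LJ pair potential in reduced units (epsilon = sigma = 1)."""
    sr6 = r ** -6
    return 4.0 * (sr6 * sr6 - sr6)

# Numerical cross-check: midpoint integration of 2*pi*N*rho*u(r)*r^2 dr.
numeric, r, dr = 0.0, 2.5, 0.001
while r < 60.0:
    rm = r + dr / 2.0
    numeric += 2.0 * math.pi * 100 * 0.5 * u(rm) * rm * rm * dr
    r += dr

assert lj_tail_correction(100, 0.5, 2.5) < 0.0      # attractive tail lowers the energy
assert abs(lj_tail_correction(100, 0.5, 2.5) - numeric) < 0.01
```

No such finite correction exists for bare 1/r electrostatics, which is exactly why the Ewald-based schemes described above are needed instead.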
Then the long-range part, which is calculated in Fourier space, in inverse space so to say, minus some non-physical self-interaction, which is subtracted from the energy at the end. This is also sped up by assigning the charges not at the real positions of the atoms, but distributing them on the nodes of a regular mesh; that is why it is "particle mesh". This speeds up the procedure in Fourier space very much and allows you to calculate the long-range part of the electrostatic energy in a different way. Just some final notes before we go to the end. In biosimulations, it is quite common that we would like to impose the so-called restraints or constraints on some parts of our model system. For example, we take a protein. We know its crystal structure, and we would like to keep the crystal structure as it is, but we would like to see how water molecules assemble around the protein, or how ions assemble around it, or how ligands bind to this protein structure. Then we may impose restraints on the atoms of the protein. What is this? We just impose an additional harmonic potential that forces the atoms of the protein not to move, to keep their positions fixed during the MD simulation. This is achieved by setting a big force constant for the harmonic potential. These are the so-called restraints, and we decide whether to use them or not. There are also, however, the so-called constraints. The size of our time step in the integrator is limited by the frequency of the fastest vibration in our system. Usually the fastest vibrations are those of the hydrogen-containing bonds. Okay, but there is nothing much of interest in the vibration of the hydrogen-containing bonds. That's why we can use these algorithms to constrain the bond lengths to their initially defined values. Not necessarily all of them; we may constrain only the hydrogen-containing bond lengths. The most popular algorithm is maybe the one called SHAKE.
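The harmonic position restraint just described is simple enough to sketch directly: an extra potential 0.5*k*|r - r0|^2 per restrained atom, whose force pulls the atom back toward its reference position. All names and the force constant here are illustrative.

```python
def restraint_energy(pos, ref, k=1000.0):
    """Extra harmonic energy 0.5 * k * |r - r0|^2 for one restrained atom."""
    return 0.5 * k * sum((p - r) ** 2 for p, r in zip(pos, ref))

def restraint_force(pos, ref, k=1000.0):
    """Restraint force -k * (r - r0), pointing back toward the reference."""
    return [-k * (p - r) for p, r in zip(pos, ref)]

# An atom displaced 0.01 along x from its crystal position feels a pull of -10.
f = restraint_force([1.01, 2.0, 3.0], [1.0, 2.0, 3.0])
assert abs(f[0] + 10.0) < 1e-6
assert f[1] == 0.0 and f[2] == 0.0
```

The larger the force constant `k`, the stiffer the restraint; a rigid constraint (as enforced by SHAKE, discussed next) is the limit where the geometry is fixed exactly rather than penalised.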
The essence of this method is to add an additional force which is along the direction of the bond and counteracts exactly the deformation force along this bond. By adding the two forces, the bonds are constrained to their initial bond lengths. This is the essence of the method. Okay, now we know how to do the simulations. We can make some choices, reasonable ones. We have the trajectory. It contains a lot of data. So what can we extract from an MD trajectory? Here I have just grouped the main types of analysis that can be done. We can extract average values of structural parameters, thermodynamic properties, whatever you can think of. There are some physical properties that are related to the standard deviations of the quantities that we have in our trajectory, or to the fluctuations of these quantities, for example the heat capacities. We can generate the so-called radial distribution functions by counting the number of particles of type B around particles of type A and finding the preferred distance at which they are located. This is, in essence, the radial distribution function. From its profile we can judge how the molecules or the atoms are structured in our system. This is a typical profile for a liquid, where the first neighbours are here at this preferred distance, the second neighbours are here, and there are maybe some third neighbours. We can do cluster analysis. This groups configurations that look alike into representative structures, and in this way extracts the preferred structure of our biomolecule. We can compute correlation functions. This is an analysis that allows us to track, I would say, the memory of our system: whether a certain property B depends on another property A at a different moment in time. These are very powerful analyses. I have no time to go into detail, but with them you can identify whether there are some periodic processes in your system.
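The "count particle pairs, normalise by the ideal-gas expectation" recipe for the radial distribution function can be sketched as below. This is an illustrative toy (uniform random points, cubic periodic box, my own function names), so the resulting g(r) should simply fluctuate around 1 instead of showing liquid-like peaks.

```python
import math
import random

def radial_distribution(positions, box, r_max, n_bins):
    """g(r) from a pair-distance histogram, normalised by the ideal-gas count,
    for one configuration in a cubic periodic box (minimum image)."""
    n = len(positions)
    dr = r_max / n_bins
    hist = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a, b in zip(positions[i], positions[j]):
                dx = a - b
                dx -= box * round(dx / box)
                d2 += dx * dx
            d = math.sqrt(d2)
            if d < r_max:
                hist[int(d / dr)] += 2          # count the pair for both particles
    rho = n / box ** 3
    g = []
    for k in range(n_bins):
        shell = 4.0 / 3.0 * math.pi * ((k + 1) ** 3 - k ** 3) * dr ** 3
        g.append(hist[k] / (n * rho * shell))   # observed / ideal-gas neighbours
    return g

random.seed(2)
box = 10.0
pts = [[random.uniform(0.0, box) for _ in range(3)] for _ in range(500)]
g = radial_distribution(pts, box, r_max=4.0, n_bins=8)
assert abs(g[-1] - 1.0) < 0.1   # ideal gas: no structure, g(r) ~ 1
```

In a real analysis the histogram is accumulated over many trajectory frames, and the positions of the first and second peaks give the preferred neighbour distances the lecture describes.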
We can also estimate transport coefficients, like for example the diffusion coefficient. If we would like to see how fast our molecule is moving in the medium, we can estimate this diffusion coefficient, and many more, just to give you an idea. So, at the end, some advantages and some drawbacks of the method. I hope I was able to convince you that it's a very powerful technique with which you can monitor the natural behaviour of systems of different size and complexity. It may be parallelised efficiently, because you can just split the atoms into chunks and send the calculations to different cores. It can monitor both static and time-dependent phenomena. It can be extended for some very non-trivial simulations: for non-equilibrium processes, to monitor phase transitions or rare events, or to estimate the free energy, which you are going to hear about on Thursday. It can be both classical and quantum. Of course, there are some drawbacks as well. It is a common problem with complex potential energy surfaces that model systems may be stuck in a deep local minimum. Then there are the so-called enhanced sampling techniques, which are required to cross higher energy barriers; you might need to apply them. The storage requirements are enormous: tens to hundreds of gigabytes for a single trajectory are not uncommon. There is some tricky implementation of the QM/MM methods; you are going to hear tomorrow how the colleagues from the Raspberry Lab overcome these obstacles. Entropy-related properties are not trivial to estimate, because molecular dynamics is a very structure-based method, but it's not impossible; you just need special techniques. Here are the facilities and the software packages. Of course, the choice of the particular computational approach will depend on the properties you want to see and on the available computational resources. Usually multi-core servers are needed to run such simulations.
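The diffusion coefficient mentioned above is typically obtained from the Einstein relation, MSD(t) = 2*d*D*t in d dimensions, by fitting the mean squared displacement. A minimal sketch on a 1D random walk (where D = 1/2 is known exactly) follows; the function names and numbers are illustrative.

```python
import random

def msd(trajectory, lag):
    """Mean squared displacement at a given time lag, averaged over time origins."""
    disps = [(trajectory[t + lag] - trajectory[t]) ** 2
             for t in range(len(trajectory) - lag)]
    return sum(disps) / len(disps)

# 1D random walk with unit variance per step: MSD(lag) = lag, so D = 1/2.
random.seed(3)
x, traj = 0.0, [0.0]
for _ in range(200000):
    x += random.gauss(0.0, 1.0)
    traj.append(x)

lag = 10
D = msd(traj, lag) / (2.0 * 1 * lag)   # Einstein relation, d = 1 dimension
assert abs(D - 0.5) < 0.05
```

For a real trajectory one computes MSD over a range of lags and fits the linear regime, since short lags are dominated by ballistic motion rather than diffusion.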
Here I have listed some of the more popular packages that are used for molecular dynamics simulations. Of course, we have to acknowledge the support of PRACE and of these four organisations for allowing us to accomplish this training. I hope it was in some way useful for you. With that, I would like to thank you for your attention and, of course, to tell you that we will be waiting for you at the training and Q&A session in the afternoon. I'm sorry that I overstepped the time a little bit, but I think we have a break. I will be happy to answer your questions if there are some. Thank you very much, Anela. Yes, there are actually some questions. Shall we go through them now, or shall we wait until the afternoon? Let's go through some of them now, because there are about a dozen or so questions. Yes, I can. Should I read them from the chat? I could read them out for you, actually. So the first question is from Hamid. He's asking whether we can combine, say, a simple coarse-grained model for most of the amino acids with an atomistic level of detail for a more interesting part of a protein. Yeah, okay. It's an interesting question. It's by far not a trivial technique. What people usually do in these cases is use hybrid methodologies, hybrid models. For example, they treat the interesting part of the protein by quantum methods, and then they use atomistic force fields to describe the rest of the protein. Of course, you can also combine such quantum methods with a coarse-grained force field and then effectively obtain the result you are asking about. It's a great question, yeah. Okay, so the next question was from Daniel, asking if you could comment on the accuracy of the TIP4P water model, whether it's more accurate than TIP3P? Okay, it's another good question. Yes, the short answer is TIP4P is more accurate than TIP3P.
But we come again to the question of whether we would invest more computational effort to gain some accuracy. It depends very much on the property that you're interested in. TIP4P has one interaction site more, so it will add about 25% to the computational effort. If you are just interested in the structure of your molecules, then it doesn't pay off, from my experience at least. However, if you would like to compute thermodynamic properties, if you would like to have a very accurate density of your system, or enthalpies, or, how do you say, properties of the entire system, collective properties, then TIP4P would pay off; it's worth the extra effort. Okay, and then we've got Fbenal. I'm sorry, I don't know your first name. But they're asking: are there any specific recommendations for the use of solvent models according to the selected force field, or can any solvent model be used with any force field? Thank you very much for this question. I forgot to mention: usually force fields for biosimulations are derived together with, or for, a certain water model. For all three force fields that I was showing you, AMBER, OPLS and CHARMM, the authors recommend particular water models. Normally they were derived for a specific model, such as TIP3P, but you can also provide them with TIP4P. You'll have to check: usually in the papers on the force field they also recommend the appropriate solvent model. Yeah. So, and then we've got a question from Pedro asking: are wall and finite-size effects eliminated with periodic boundary conditions? Yeah, usually yes. There are no edges in your system, so particles can smoothly move across box boundaries, and there are no finite-size effects in this respect. But, of course, your properties will be limited both by the size of the periodic box and by its shape.
Because the periodic box is automatically replicated, this symmetry, the shape of the periodic box, will be imposed on your system. And also, if the box is too small, there still may be some artifacts. In order to be sure that your box is large enough, it is recommended that you perform simulations with at least three gradually increasing sizes of the periodic box. Then you monitor at least one of the properties that you are interested in, and if you don't see any change, the box is big enough. Okay. No, that's a good answer, actually. That's very good. I mean, Aparna was asking if you could explain again about those finite-size effects, but I think you probably... Maybe I answered, but... I think you might have done, yes. Yeah. Okay. And then Neff Bernal has got another question, which is: how can one validate or measure the reliability of a periodic simulation? Okay. That's also a very important question, but I think you will hear more about that from Dr Petkoff, who is going to give the next lecture. Very briefly: you always track the energy. Just be sure that the energy is conserved. There should be no drift upwards or downwards; any drift in the equilibrium part of your simulation is bad. Either energy is leaking, or there are some repulsive forces that should not be there. If you see such a thing, you have to check what's going on. There are also several other properties that are usually monitored along your trajectory. If you are working in an NPT ensemble, then of course you check the pressure and the temperature. They should fluctuate around stable average values, and not just any average value, but the rational one that you have assigned. For example, if you say you want to simulate at body temperature, then your average temperature should be 310 Kelvin. That's it.
And you have to monitor at least one structural property, because usually the structure relaxes more slowly than the energy and the thermodynamic properties. So you have to verify, for example, that the root mean square deviation of the coordinates of your atoms with respect to the initial structure has also reached a steady fluctuation around a certain average value. These are the minimum set of properties that are usually used to validate the reliability. That's great.
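The root mean square deviation check mentioned here can be sketched in a few lines. This is the bare definition only (my own function name); production tools additionally superimpose each frame onto the reference before measuring, which is omitted here.

```python
import math

def rmsd(coords, ref):
    """Root mean square deviation of coordinates from a reference structure
    (no least-squares fitting/superposition, which real analysis tools also do)."""
    n = len(coords)
    s = sum((a - b) ** 2
            for atom, ratom in zip(coords, ref)
            for a, b in zip(atom, ratom))
    return math.sqrt(s / n)

ref = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
moved = [[0.0, 0.0, 0.0], [1.0, 0.3, 0.4]]   # second atom displaced by 0.5
assert abs(rmsd(moved, ref) - 0.5 / math.sqrt(2)) < 1e-9
```

Plotting this quantity frame by frame against the initial structure, and waiting for it to plateau, is exactly the equilibration check described in the answer above.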