 I'm going to introduce Professor Pacheo Petkoff who's from the Faculty of Physics at Suffolk University and he's going to give the lecture on Introduction to Molecular Dynamics Simulations with Gromax. Let's remember again the simple simple explicit solvent MD simulation. You don't need to remember it by heart. There is a logic you did already explained. First, we need to prepare our solute model, what we are going to simulate protein, DNA, membrane and so on and so forth. Then there is assignment of the fourth where you set parameters to all particles and they are interact. After that simulation box shrine, this is the part of space where the simulation will take place. Usually in relations proteins and other other movie, we investigate our aqua solution which means that we need to add molecules explicitly in the water because we are doing explicit solvent MD. And of course most of the cases we have to account for the strength of the media where the molecules are solvated which means we need some ions. Depending on the problem, they can be different. Once we have prepared this box with the solute and solvent, we need to minimize the energy as it was already explained because we equilibrium molecule we need to start from some local energy minimum. Then because solvent is randomly distributed in the simulation box because usually it's done like that, the only the only important parameter here is the den of the system. And if this requirement is fulfilled, then we consider the simulation boxes well prepared. That's why we need to equilibrate the solvent which means that we restrain the motion of most of the heavy atoms of the protein solute which let's let's think about it as a protein and allow all other particles solvent particles to move. And usually people prefer to increase to increase the temperature gradually to warm up the system. Once the system has reached the temperature, we re-beach and run which is usually done with stable pressure control and temperature control algorithms just to acquire proper distribution of the of the velocity and of the velocities of the protein and solvent particles. And then we are ready to do some production run. These are usually the the computational demanding part and according to your system and time you want to simulate you have to have access to some computational resources and then it was quite good how the trajectory is analyzed and what is the aim of the trajectory analysis previous lecture. Initial configuration. Of course, when we simulate interactions caver of some molecules then we need 3D structure will set the initial conditions of the procedure. Most popular free free available source of such structures protein data bank. It was mentioned already yesterday. They eat almost almost all crystallographic structure taken to be found there or if you are able the knowledge and of course you need it. You can track your model by some modeling software. Then structure crystallographic structure usually be refined. Sometimes loops in the proteins the structure are missing out and so on and so forth. And for this you it is more confused dedicated software could packages one people around me use and it's very very convenient for such refinement. And finally, when molecules are have correct models, then we can assemble the desired configuration. We can start start next steps in the protocols and in the simple protocol we just discussed. For this, usually molecular dynamic is used today. I will try to present you how to do it with groma. It is the best molecular dynamics packages in the world. You know already that what it is. It is a primary design for biochemical molecules. It is fast because of algorithmic and processor specificizations. It is a parallel application. It have limited memory and shared memory models implemented there to accelerate the simulations. It can be run on many computer architectures from a laptop to a supercomputer. It has a nice user guide and reference manual. On more of the slides I will present you, there will be a link to a manual where you can read further details for the method which is implemented. Then coordinates. Of course, we need a three-d structure of the molecule which means that we know the coordinates of the atoms or particles if it's some coarse grain. Gromax supports three coordinates file formats. Gromax protein data format and Gromax 96. Usually people use Gromax or PDB to see examples of those formats. Both Gromax and PDB files can keep the particle coordinates and the boxes where the structure is defined. Gromax can save the information about the particles velocities as well. If you see my pointer, first three columns with this floating point coordinates and the second three are the velocities are different for those two file formats in Gromax files and in Gromax program itself. The units of land are meters and these are in nanometers per picosecond while in format the units of land are angstroms. In both formats are defined by a line. In one case, this is the matrix of the box vector while in the PDB file we have vector slangles between them. Okay, to set parameters we have a tool which is called PDB to GMX. It coordinates a force field parameters and writes the typology files. You can see that we can use include files and so on and so forth. Four force fields are supported umber, charm, gromos, and ls. It is very important to keep in mind that these two don't need that hydrogens in the structure. Hydrogens are added to predefined and according to the protonation stay amino acid residues. It can set in addition virtual sites which are used to remove a fast moving degree of freedom of hydrogens and improper topology file has exhaustive descriptions of intermolecular interactions. Particles have assigned charges, masses, thunder walls parameters, non-bonded interaction parameters, bonded constraints and so on. And so forth one simple molecular gromax topology you can see on the screen. First line includes the force field definition. Then in the red is the description of the of the morig, namely urea in this example. We have sections which describe atoms, bonds, the hydros, the improper and improper the hydros I described in this section. The difference is the function is assigned to this force field term. And we have of the water molecules name of the system and section where where the number of copies of each molecules described in the in the topology are presented in the box. We can use include direct to and conditional pre-processor practices you will see to easy control and nicely nicely prepared topology files. As I already said, this part here describes the molecule type. While this is the common information for the system, we can put when we can when we use many times one topology description for a molecule, it is very convenient to prepare the topology file which is called our case urea ITP and depending on what we want to impose restraints or not, we can define in the in the mdp file those strings here or whether we want to impose those constraints those restraints or not. And finally, we have this set of files to be included in the topology and topology can look very nicely and readable. Then for defining simulation box in if we want to simulate molecule in in in solution, best choice is periodic conditions as it was already mentioned to minimize the edge effects. We we need in mind that they are they are too optimized. The first is the the sufficient size of the box in order to minimize to minimize the artifacts of conditions, which means the distance between the images of the hour of our salute molecule and the other is the number of our system. Bigger is the number of bigger the numbers longer the conditions will last. That's why we need to optimize the the number of molecules in the water in the box. If we have a globular shaped protein, for instance, the box shape is a spherical one, but we can't apply periodic boundary conditions per sphere. That's why in Gromach, there are shapes which approximate the spherical shape. Namely, do the caedron and tender octahedron boxes. In the table, you can see the distance between the images in periodic boundary conditions and the volumes of the corresponding simulation cell. Here are the angles and and the box vectors with respect to the diameter of the system. The box can be assigned by edit.conf, it just to convert the format of some file from Gromach, GB and vice versa. But most often we use it to simulation box. It sets box vectors, angles between vector and so on. It can translate the coordinates, rotates the coordinates and velocities align the structure in the so it's very, very useful, very useful tool. Here you can see an example. Despite of command-line arguments defining the input and output file, you can use minus option which defines the minimum distance between the solid and the box. And if to define cubic box, the result would be look like this or octahedron like this. And according to the user to set this here, please keep in mind that for sure, be enough to minimize the interaction with the periodic images. We already defined the box, then we need to add solvent molecules there. The tool is gmxsolvate. It can be used just to generate box of solvent or it can it can fill the simulation box with solvent molecules. And optionally, it can modify the topology of the system, layer of solvent around the scale, those are things that are important but if needed, they are needed in more special cases. Important thing is here add the corresponding number of solvent molecules in the molecules section of the topology. Otherwise, the coordinates in topology file would not be consistent. Here is an example how to use it. We have initial input data which are the box protein there and the topology. Then we can put in this command. We have box with water and the topology modified in England. If you are curious how this tool works, it fills the entire box with the molecules in the direction of groma, predefined cubic template of solvent molecules with equivalent standard conditions. Then using this template, the box is filled entirely with this solvent molecules and according to conditions, the molecules with contacts with the solute molecules removed. Adding ions could be done by using gen ion tool which randomly replaces solvent molecules with monotonic ions. We need to have a neutral system which can do it by setting the number of ions such that the total charge of the system is set to zero. As an input, it takes a portable binary input file which I will describe in the next slide. Apart from that, it needs to know the names and the number of the negative ions, the names of the ions and concentration. As well as the charges of both types of ions. The output is a core file and optionally, again, we can pass the name of the topology file to defy it accordingly with the with the change number of solvent molecules and ions in the box. Well, binary input file gives the entire the whole information about the system environment which means the coordinates molecular topology and text files gather the information from those files and produces binary input file. Example how this can be done with Gromax tools. We have protein in a box corresponding to topology. Here we invoke the Grompy tool with empty molecular dynamics parameter file. In this case, both values all parameters have molecular dynamics parameter default values and those will be used but the only thing that Genion wants to know is the topology of the system which means that this this default set is sufficient enough for this operation. So we have the portable binary input file here and use it to generate to add ions or more precisely to substitute some certain number of water molecules with ions. Here, in this case, we just say that we want 0.15 moles per liter with neutralization and then we pass the topology file and give the name of the output coordinate file. Here we have the coordinates file and the topology modified accordingly. Here the ions are not shown but they are not well visible. If we hide the water molecules from the calculation, we can see how the in this particular case how the ions are distributed in the simulation simulation box. We have prepared our simulation with solute solvent there at the conditions we wanted now to run energy minimization position restraint and production runs. And for this we will need the main computational engine called which MD simulation it can be run for integral equations of motion minimizations, stochastic dynamics particle, test particle insertions, frame iterations, normal mode analysis, pulling some group of molecules, applying force to some to some defined group of molecules in the system, replica exchange and so on and so forth. It has mandatory input and this portable binary rim file. If we continue simulation, we can pass the corresponding checkpoint file where the system, the state of the velocity coordinates, other parameters of the algorithms are saved from time to time in checkpoint files during the run of this program. There are many other options controlling the parallel execution verbosity and many, many, many other things depending on the task you use MD run for. As an output, it gives binary file energy temperatures, temperatures is not a typo. You will see in a moment why we can, we need many more than one temperature to be recorded. Of course, some terms of force field, there are binary files for writing trajectory, log file, polydicament in the checkpoint file and many, many other depending on the task we are solving with. Then let me find you again, this portable binary is prepared by Gromax preprocessor, Grompp, because as we will see in a while, we can define, we need to define some groups of more to be treated in a special or used in some algorithms. These, those groups are in the index file, which is with very simple format. There is a section name, some string embraced with squared brackets and then the indices of the, of the atom. Molecular dynamics parameter, molecular parameters file is, as I already said, set a text file, which has following syntaxes, we have a keyword, then assignment sign and the value want to set to the, to the parameter. With this file, we actually tell to the what we want to do, what calculation will be used, what type of constraints to be applied, what are the cutoffs and interactions, how we control the pressure by which algorithm, temperature, how frequently we, we will write coordinates in the trajectory. We can modify the Hamiltonian for coupling and decoupling, with coupling decoupling parameter to do some calculations. For, to be, to, to which group if we need, if we want to some force to change the temperature with simulation time, you want to apply some related annealing and many, many other parameters. Okay, we now know how to tell the simulator what to do. Let's look at the program implementation just to, to have some knowledge how roughly the machinery works. To solve Newtonian equation, we, we need to calculate the acting on each particle as it was already world. And by taking derivative of the, of the energy, energy function, as an example, I have taken here the force field and as you know, there are two types of interactions bonded and bonded interactions are two, three, four, according to what we want to calculate, bond stretching, gangl, changing, bending bonds and bond. But those are well defined as you already have seen in the topology file, which we have list, list of atom, which can be just which can we need over and calculate the corresponding forces. But with these interactions, the situation is more, more complicated. First of all, we have, we have Van der Waals and Coulomb interactions, which are, which have infinite, infinite range of, but a, yeah, they, the interaction energy and forces molecular bandwidth distance. This is a good thing. The forces are some of forces calculated by, by pair interactions. And as was already described, the idea of, of introduction of truncated schemes. Atoms, we, to have just a linear, a number of calculations with, with respect to the number of atoms, we need to introduce some occasion, radius, cut off radius. And here comes the question, how, where, which, with what error we calculate these interactions. This is the purple, the purple, the purple color term. And we can judge if it is finite or not, taking this integral. Oh, this, this is the, the common form for the potential alpha here for two, for Coulomb interactions and six for Leonard Jones interactions. Unfortunately, when alpha is smaller than three, which means Coulombic interaction, the, the, the term is not finite, which means that we are not honestly applied truncations to the, to the, to static treatment procedure. Then for, for Vander Waals, this is, we know, we know almost what, what, what is the error of, of truncation. And in case we can apply switching of the, either of the forces or the potential in grommacs, this is done by, by such, adding such, such function here, you can look at the manual, they are simply derived here. Just to mention one here is the switching radius where the, where the switching function, the switching function starts. And we need to apply, as was already mentioned in the previous lecture, particle mesh error algorithm. I'm not going to explain it. I am just want to remind you that our, this algorithm, the accuracy of the algorithm has four important parameters. Those are splitting parameters, which, which sets the, the distance where, where the, the, the directs can be considered negligible interpolation order. Yes, this is the order of polynomials used to interpolate the, the, and, and the resulting forces and potential cutoff rates, which, which defines the, where we can use the direct summation to, and for the other particles we will use Fourier. Based procedure, you will solve the equation in the reciprocal space. And Fourier spacing parameter, which is the constant of the, of the grid mesh of the Fourier grid, I'm sorry, which, and all these parameters can be set in the molecular, in the, in the MDP file with key, key words. And here we can set the, we define treated either by truncated scheme scheme or use PM algorithm, the cutoff radius for it to static PM interpolation order Fourier grid constant and relative precision, the, the same can be defined for the Van der Waals, Van der Waals interactions. To do it faster and utilize the power of the computing machines. We need parallel implementation. And to have a distributed memory model need to define, to define, to divide the system in domains domain is run on separate MPI, MPI rank. In, in, in a domain, we can use shared memory, parallelization, either by the calculations to GPU or multi trading on with course. And the last level of production is SIMD instructions, which mean that we can, we can do operation of more than one, more than array element within one, one CPU cycle. And how, how many elements we can do operation with in one of the same, depends on the with vector register instructions, modern, modern support extended instruction sets for such vector operations. You can look what does SCC mean AVX, AVX 2, AVX 512 and so on and so forth. Not, not only vision is supported, you, we can reply or even if we have three, three array can, we, in one cycle we can multiply one, one with another and add the, the, the third one. This operation is called fuse multiply add, you can take, compare and so on and so forth. This is a very useful implemented neighbor search algorithms, implementation, force, signals, calculations and, and they, gromacs can benefit from, can benefit from the vectorization supported by the, by the simulation on. You can see here some, for instance, less scheme is implemented as a list of clusters and representation is done according to the, to the vector register. And three here, we actually label the atoms in, atoms or particles interact or not, but without but looping over all, all the, over the entire lists in search to compute sometimes zeros, but utilizes very efficiently the vector, the vector operation and actually have a huge benefit. In treat, in long range, interactions, treatment, particle mesh errors, parallel implementation allows use of dedicated, those are multiple program, multiple data. You, on the left hand side, you, you can see the, the flowchart of the calculations in the real space, in the PME, in the reciprocal space. Those, this runs on, on nodes called PP ranks and the reciprocal calculations in the PME, PMX. And with this, with these arrows here, you can see the conditions between two sets of ranks in the beginning of them with the coordinate size and to, to the PME nodes and once they are calculated, they are sent back to the, in the, to the particle, particle ranks for, for execution x steps in the iteration, iteration, iteration algorithm. This allows to control the inter process communications, the PME is computationally demanding due to some operations. That's why offloading on the dedicated ranks has a huge benefit for ability of the, of the software and move the load from PP nodes to PME nodes by changing the cut for agri spacing in such a way that we keep that constant, which cutoff is not important when anymore, you can use it to disbalance between two groups. And of course, the particles in the, in the domain, the particles in the domain are not, are not uniformly, the particles are not uniformly distributed and they are domains with, with, where more computations are needed and are, let's say, less computational demanding. In this case, have dynamic load balancing, which is implemented by varying the domain we use flexible, flexible domains are used. We came to energy emission, how to set it, the parameter, which is important, it's possible to for the, for the algorithm is called integrator. Three algorithms are available, steepest descent conjugate gradient memory version of BFGS. Algorithm first, first the steepest descent is relatively, relatively quickly approaches minimal vicinity, but the virgins is very slow, because sometimes it can miss them so on and so forth. Contrugate gradient on the other hand has to be used when we are pretty close to the local minimum. And then it, it's the minimum very accurately. It's worth mentioning here that constraints, the conjugate gradient algorithm is in such a way that can't work with applied constraints. That's why we, we need to use flexible water and remove all bond angles and so on constraints. This, the last algorithm has faster convergence with respect to the conjugate gradient, but it's, it has only serial implementation. Algorithms controlled by this set of parameters here, the tolerance is the upper value, EM toll is the upper value of the maximum force, EM step is the maximum step for steepest descent. We, you have opportunity to run steepest descent steps due to the gradient and these parameters sets this frequency and the number of correction steps in quasi Newtonian algorithm can be set by this parameter here. Finally, we came to the performance of our MD simulation depends on the so, which means how fast or how many steps we can calculate per year of work log time. And simulation, simulation itself because the, we need to set the, we need to set the integration time step and multiply the, we will have the, the simulate time per, per unit work log time. That's why the, the, the type of the simulation is important as integration algorithm is important. What is the, the, the software performance is that the simulation performance depends on the integrated as I already said, then we need control the pressure and temperature. And of course, we need initial atoms velocities are randomly generated according to the Boltzmann distribution. Leave frock and velocity integrators are available and control by integrator and the parameter if you want to use leap frock, just set integrator to MD or MDVV if you prefer velocity where the dynamics can be, stochastic dynamics integrator can be used as a thermostat in you just said stochastic dynamics integrator equals SD. How, how are lists are assigned, assigned and how many, and what, how frequently they the parameters are controlled by our list and, and is the limit. If you already know that the, the time step is limited by the fast motion in the system. If you use standard, standard simulation without constraint potential in the Boltzmann angle and so on and so forth, the, the, the maximized time step into Kelvin is, is one thing to say, if you apply constraints only do the, to the bonds connecting hydrogen and heavy atoms, you can use two things. If you fix the all bonds, you can reach let's say four femtoseconds type step and apply v-sites then, then you use five femtoseconds time step constraints, constraints, constraints algorithms available are subtle, shake links and parallel, parallel links version just to, to know what, what the, the parameters for, what are the parameters setting the links. The, this is a non-interactive, two-step, two-step algorithm faster, faster than shake and it has nice parallel implementation according to, to the order, to the order of the different amount of need to be transferred between, between domain. Virtual sites are, virtual sites are massless atoms with position in this function of the positions of the other particles. They can have assigned charges and Van der Waals parameters and they are used, the, the, they are used in integration and forces are, and, and actually they are used to, to distribute the acting forces on this dummy atom to the atoms which, which construct the, the v-site. For protein, this can be automatically generated by PDB to GMX. Temperature control, I'm sorry I'm running, I'm, I, I saw that I'm running out of time. I have, I have more slides but I think in, in, in three, five minutes I'll be done. Uh, temperature control can be, uh, temperature, uh, system can be coupled to Berenzen, thermostat, gliris scale, Nusekhuvar, they were all explained. You have to keep in mind that Berenzen is stable but breaks the principle of partitioning theorem. Nusekhuvar, uh, generates proper, uh, proper, uh, ensemble. During practical say, session you will see how we can, uh, set, set the, the, and its parameters. Here I just mentioned that for, for proper, uh, proper, uh, velocity distribution, uh, grammar allows, uh, pressure controls group, group of atoms. Then, uh, if we want to, uh, do simulate it annealing, we just get the protocol how, uh, how temperature will, uh, change during the, during the simulation. Pressure control could be done by the thermostat for an L-Raman, Martin Tukermann to bias client. Uh, and, uh, of course, we can, uh, we can ask for constant surface tension but this works only with Berenzen for now. Then usually is used, uh, during the equilibration phases and for an L-Raman, uh, in the restraints that we can apply our positional angle, the hydraulic distance, restraints. We use, uh, restraints during, uh, uh, equilibration and heating. That's why I'm mentioning, uh, this is controlled by, uh, topology file and, uh, how, how the workflow looks from the, uh, from the Gromach's tools point of view. Uh, we start with one, uh, new file, uh, then we, uh, add, uh, molecules, change, coordinate, and so on and so forth. And each, which each tool we change, uh, coordinates, uh, somehow, uh, while the topology is changed only when it's generated, uh, out-solvent and, uh, ions, uh, here, uh, integrator, uh, the, the set of setting up integrator determines the energy minimization, uh, protocol, then, uh, molecular dynamics. Here we apply the strains with MVT and here the integrator is a leapfrog, uh, and it's important to set one, to set one parameter with, uh, tails, uh, the constraints to be applied, uh, in the beginning of, uh, simulation. Internally, Gromach's, uh, uh, it doesn't matter what unit cell, uh, shape of the unit cell, he, uh, is set, uh, Gromach's always, uh, using rectangular representation, output files, uh, usually, uh, the, the, the coordinates and the trajectory look, uh, ugly for something different, uh, from the, uh, rectangular boxes used, then we need to process the trajectory before looking at it by, uh, TRG conf, uh, conf tool, uh, of course, uh, important to know how to do free energy calculations. Please, uh, uh, keep, keep in mind Moro's, uh, introduction to free energy calculations with PMX. PMX is a platform which uses, uh, uh, uses, uh, access to run, to run simulations. Thank you for, I apologize for, uh, keeping you hungry, uh, lunchtime, lunchtime, and thank you. Okay, thank you, Pedro. Um, there's one question in, in the chat, um, so we can probably whistle you that now, which is, do you usually, do you usually do your trajectory analysis using the XTCR files? Yeah, it's, it's very, um, uh, very, very important question. Uh, during, uh, if, uh, you just, uh, need to look, uh, look at the trajectory, visualize it, uh, uh, HTC, uh, format is more convenient, uh, because it, uh, this is, uh, the data, but the advantages, uh, to have, uh, uh, to have, uh, but, uh, uh, if you have, if you want to have a good, good compression level, you, you will accuracy of the coordinates by if, uh, if, if the accuracy of the coordinates is, is, is very important in this, uh, you need to use files even being, being bigger than XTC files. Yeah. Yeah, it's, it's, yeah, I understand it's that kind of between sort of, have it, you know, between sort of, um, the, you know, how much, once and how, and sort of how much, well, and then she fits onto your, onto disk space, I guess. Yeah, yeah, but if you, uh, if you set accuracy of the XTC file comparable to the TRR file, the, the size of the file almost equal. Yeah. It's according to our, to our, okay. Yeah, no, that makes a lot of sense. Okay.