which is titled "Demonstration of how to prepare and run an MD simulation with GROMACS on a supercomputer". And [name unclear] is going to lead us through that.
So. Okay, before we reach the stage at which we will run some simulations, let me first remind those who are not familiar with the Linux command prompt what the most commonly used commands are for navigating the file system and working with files. Why do we need that? First of all, all the supercomputers I know run some Linux distribution. That is why the access we have is to a Linux command prompt, and through it we submit commands and do the job we want to do. Moreover, as you have already seen, the GROMACS interface is a command line: we execute commands and control the execution by setting command-line options and parameters.
For listing the contents of the current directory there is ls, and pwd prints the current directory. You can make a directory with mkdir or remove an empty one with rmdir. You can erase a directory recursively, copy files and directories recursively, rename them, create soft links, and create an empty file with touch. You can display the content of a file on the standard output with cat, or, more conveniently, with less, where you can navigate and scroll up and down. The head and tail commands print something from the beginning or from the end of a file, and finally, with tail -f, we can display a file while it is being changed.
If we want to connect to a computer or server remotely, we use Secure Shell, or SSH. In the Linux shell the command-line client is the ssh command. If you use your current user name you just type ssh and the host; with some other user name it is ssh username@host. Sometimes, for security reasons, the SSH server on the machine accepts connections on a port other than the default one. The default port for SSH is 22. If you want to use a port other than the default, you can specify it with the -p option.
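The file-handling commands listed above can be tried safely in a scratch directory. The following is a minimal sketch; the directory and file names are made up for illustration, and the ssh lines are left as comments because the user name, host and port are placeholders.

```shell
# Scratch directory so nothing in your real files is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

pwd                           # print the current directory
mkdir demo                    # create a directory
touch demo/notes.txt          # create an empty file
printf 'line %s\n' 1 2 3 4 5 > demo/notes.txt
ls -l demo                    # list the directory contents
cp -r demo demo_copy          # copy a directory recursively
mv demo_copy demo_backup      # rename (move) it
ln -s demo/notes.txt latest   # create a soft link
cat latest                    # print the whole file
head -n 2 latest              # first two lines
tail -n 2 latest              # last two lines
rm -r demo_backup             # erase a directory recursively

# Remote login (not executed here; user, host and port are placeholders):
#   ssh username@host
#   ssh -p 2222 username@host
```

For long files, `less demo/notes.txt` gives a scrollable view, and `tail -f` keeps printing lines as a log file grows.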
If we want to transfer files from the local file system to the file system of the remote server, we can use either the sftp client, which is a file transfer protocol secured by SSH encryption, or the scp command, which behaves in the same way as cp, the copy command in Linux, except that you should specify the server you want to transfer files to or from. Another useful tool is rsync. It is usually used for backing up a file system; it preserves users and timestamps if so set, and it can compress the data before sending and uncompress them when they arrive at the other machine. This happens behind the scenes, but sometimes you can get an improvement in the speed at which you transfer the files.
Okay, what is the general picture? A high-performance computing system usually has computing resources, that is compute nodes, and storage. The storage is visible from the nodes, and we have access to them via a login node. Usually, on the login node you communicate with the batch system, scheduling jobs to be executed on the compute nodes, as was shown yesterday. The batch system controls the jobs, their execution time, the resources to be reserved for the jobs and many other technical things. If you want to use an HPC system, the system administrators usually give the user access to the login nodes and some permissions to use queues or partitions defined in the batch system, in order to use some resources.
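The transfer tools mentioned above (scp, sftp, rsync) can be sketched as follows. The account, host, port, file names and remote directory are all placeholders, and the helper `run` is a hypothetical wrapper that only prints each command, so the sketch is harmless to try as it stands.

```shell
remote="username@host"     # placeholder account and host
file="structure.pdb"       # placeholder local file
target="work/demo/"        # placeholder remote directory

# "run" only prints the command; change its body to "$@" to really transfer.
run() { echo "+ $*"; }

run scp "$file" "$remote:$target"            # push a file to the server
run scp "$remote:${target}results.log" .     # pull a file from the server
run scp -P 2222 "$file" "$remote:$target"    # non-default port (capital -P for scp)
run rsync -avz "$file" "$remote:$target"     # -a keeps times/permissions, -z compresses
run sftp "$remote"                           # interactive transfer session
```

Note that scp takes the port with a capital `-P`, while ssh itself uses a lowercase `-p`.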
This is the general idea, and this connection here, from your local machine to the login nodes, is usually via SSH. Okay, some helpful information for newcomers. If you want to use a nickname for your connection, with the desired port and user name on a remote host, this can be specified in the config file in the .ssh directory in your home directory. You start with the nickname of the connection, and then you describe the remote host you want to connect to and the user name. If you need to use some public key... no, it's my mistake, I'm sorry: this is not a public key, this is the private key. Okay, let me share it again. Here is your private key; the public key is in the .ssh directory on the remote machine. Sometimes you have to go by SSH through some other machine; if that is needed, you can do a local forward. In general, one can use those five lines to make it more convenient, so as not to type every time the entire user name, the host name or the IP of the remote machine.
Batch system, submitting jobs. Yesterday you saw how to do it with the Torque scheduler, or Torque batch system; today we will use Slurm. In this example file we describe the job. It is convenient to set the name of the job, the partition you want resources from, how many nodes, what the wall time is, that is, how long the job can be kept active, and the files where the standard output and standard error of the job will be written. The following lines you can take as a normal shell script, which describes the sequence of commands that the job consists of. In many cases predefined environment variables are loaded via a module system, for instance which compiler to use and which MPI library; even the software package you want to use can be defined in this way. Once you load the environment, you can execute the commands here in the script. This is, more or less, the basic interaction with an HPC system. Do you have any questions?
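A job description of the kind just outlined might look like the sketch below, which writes a small Slurm script to a temporary file and checks its syntax. The partition name, the module name, the wall time and the mdrun command line are assumptions for illustration, not the values used in the session.

```shell
jobfile=$(mktemp)

# Everything between the EOF markers is the job script itself.
cat > "$jobfile" <<'EOF'
#!/bin/bash
# Job name, partition (placeholder), number of nodes and wall time limit.
#SBATCH --job-name=em
#SBATCH --partition=batch
#SBATCH --nodes=1
#SBATCH --time=00:30:00
# Files for standard output and standard error; %j expands to the job ID.
#SBATCH --output=em.%j.out
#SBATCH --error=em.%j.err

# Load the environment (the module name is a placeholder).
module load GROMACS

# The actual work: an ordinary sequence of shell commands.
srun gmx_mpi mdrun -v -deffnm em
EOF

sh -n "$jobfile" && echo "job script syntax ok: $jobfile"

# On the cluster the script would then be submitted with:
#   sbatch "$jobfile"
```

The `#SBATCH` lines are comments to the shell but directives to Slurm, which is why the rest of the file can be read as a normal shell script.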
If not, then let me start. Let me introduce the system to you and show this example. This will be [protein name unclear]; it has an active site defined here, it has 24 positively charged amino acid residues and 21 negatively charged ones, so the net charge of the molecule is plus three. Then we will execute the sequence of commands we already discussed, and I will show you how this is done. Let's start with connecting to the machine. I just want to share another desktop. Okay, first we will connect to the machine, the HPC system. This one is located at the Jülich Supercomputing Centre; it is the DEEP-EST prototype system. It consists of three modules: the cluster module, the extreme scale booster module and the data analytics module. I have decided to conduct the practical session there because there we have both a cluster-like system and a system accelerated with GPUs.
Okay, and how does it go? I just type ssh deep, with the nickname defined in the manner I have already shown, and now it asks for the passphrase of my private key. Then we have a prompt here. This is my home directory. We will work in the work space. In demo there are some files: GROMACS, which I have just compiled in its latest version, and the place where we will execute our commands. On the local machine we have a directory where the initial structure, taken from the PDB and then refined, is located. I will use VMD for visualizing the coordinates. And this is the structure without hydrogens.
Now we can proceed by manipulating the structure with GROMACS either locally or remotely. Because many people I know use workstations which run under Windows, they prefer executing all of the commands for preparing the structure, the input files and so on remotely. I will proceed in this way, but of course, if the resources of your workstation or laptop are sufficient, you can do that locally.
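Logging in by nickname, as done above, relies on an entry in the SSH client configuration. The sketch below writes such an entry to a temporary file rather than to the real `~/.ssh/config`; the nickname `mycluster`, the host name, the user name and the key file name are all hypothetical.

```shell
cfg=$(mktemp)

# Five lines: nickname, real host name, user, port and private key.
cat > "$cfg" <<'EOF'
Host mycluster
    HostName login.example.org
    User jdoe
    Port 22
    IdentityFile ~/.ssh/id_ed25519
EOF

cat "$cfg"

# Try it without touching your real configuration:
#   ssh -F "$cfg" mycluster
# Once the entry is in ~/.ssh/config, "ssh mycluster" and
# "scp file mycluster:directory/" pick up these settings automatically.
```

`IdentityFile` points to the private key on the local machine; the matching public key belongs in `~/.ssh/authorized_keys` on the remote side.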
Now we need to send the structure to the remote machine. Of course I can use scp, then the file name, then deep, and then the directory, which I can copy here, and send it. Again it asks for the passphrase of the private key, and so on. Or I can use the graphical environment to open the directory. It can be done in the file explorer: go to Other Locations and Connect to Server, then the host, deep, and then the directory. "Unable to access host"... just a moment. Why? Because it's here. Okay. Now, because the passphrase is already registered, we see the content of the remote directory. Of course here it is the same, and in this one I can just copy the structure there.
Then, once we have the structure there, we can run the pdb2gmx command to parameterize the system and get the topology and the coordinates. First: you can have different versions of GROMACS installed, and which one is active is set by sourcing one file that sets the environment. It is like this: source, and then the GMXRC file of the GROMACS build for the cluster module. If you use one particular version many times, you can just put this in your .bashrc or in your tcsh resource script, and so forth. Now we have gmx defined and we can start. You don't need to remember all of the options by heart; you just need to know what to do in general. The pdb2gmx command has options; I tried to describe them, but you can use the help. This is because I have to specify the compiler environment and the parallel environment, the parallel libraries; this is ParaStation MPI, and it depends on the particular machine you are using.
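Activating one particular GROMACS build and asking a tool for its help text can be sketched as below. The installation prefix is an assumption (a hypothetical default that you would replace with the path on your machine); on a cluster, the compiler and MPI modules usually have to be loaded first.

```shell
# Hypothetical installation prefix; override by setting GMX_PREFIX beforehand.
GMX_PREFIX=${GMX_PREFIX:-$HOME/opt/gromacs}

# GMXRC sets PATH and related variables for this particular build.
if [ -f "$GMX_PREFIX/bin/GMXRC" ]; then
    . "$GMX_PREFIX/bin/GMXRC"
fi

if command -v gmx >/dev/null 2>&1; then
    gmx_state="available"
    # Description, input/output files and options of the tool
    # (the same text as "gmx pdb2gmx -h").
    gmx help pdb2gmx | head -n 40
else
    gmx_state="missing"
    echo "gmx not found under $GMX_PREFIX; source the right GMXRC first"
fi
```

To make one version the default, the same `. /path/to/bin/GMXRC` line can be added to `~/.bashrc`.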
And now you have a very nice description of what the tool does and its peculiarities, and here you can see the options specifying the input files, the output files, and so on and so forth. In this example we will use the CHARMM36 force field. You can download it from the CHARMM website; they have prepared archives with the files already in working form, you just extract the archive here and you can use it in the same way I will.
So: pdb2gmx, -f our initial structure, -o we can set to, for instance, ff.pdb. Then it asks which force field is to be used. This is the force field located in the current working directory, and there are others already in the installation directory, which you can use according to your taste and the simulation you do. Here we just type 1. Then it asks for the water model. Force fields are not compatible with all water models; in this case pdb2gmx helps you, as it says which one is recommended, and of course this one is compatible with the CHARMM force field: it has Lennard-Jones parameters on the hydrogen atoms, as is done for the proteins. So here we type 1, and then we have successfully generated the topology.
Now we have the topology, an include topology file where the position restraints to which the protein will be subject are defined, and a coordinate file with the corresponding atom names according to the force field and the coordinates taken from the input configuration. And, not to forget, there are hydrogens added, because the initial structure was without hydrogens. If we want to have a look at it, we just get it and view it with VMD, which is ff.pdb, and here you can see that the white lines are the newly added hydrogens. What should be mentioned here: pdb2gmx explains which protonation state of each histidine it will use according to the environment, here is the distance matrix, and the disulfide bonds that are introduced. By default the termini, the N-terminus and the C-terminus, are charged, which is what we have at neutral conditions. People often ask how to set the pH of the solvent or the simulation box. This is done by calculating the
pKa of the groups and setting their protonation states according to the desired pH. As you know, in proteins the side chains of ionizable amino acids change their protonation states according to the pH of the environment.
Now we have the structure. What does the topology look like? You can see it here. You can check that the force field is properly set. Then, the protein molecule has one chain, and its name is Protein_chain_D. Then the atoms: you can check whether the total charge is okay. It is very, very useful to look at the comments here; this character, the semicolon, denotes that the symbols following it are not read, it is the comment sign in the topology format. And you can see that the total charge is plus three. Here are the bonds, function type 1, and so on. At the end of the file you can see the position restraint topology, which is included if we define POSRES; the water topology, where, if we want to restrain the positions of the water, we can define that; the ions topology; the name of the system; and, in the molecules section, that there is one copy of this molecule.
Okay, now we will assign the box: gmx editconf, -f ff.pdb, -o box.pdb, then the minimum distance, -d, let's put it at 2, and -c centers the protein in the box. If you want to align the principal axes with the box vectors, you can add the -princ command-line option. Then we can just execute it. Here the system size is calculated, the center of the system, the box vectors and the volume. We want to set the center, and the coordinates of the center in GROMACS are at half of the box vectors, so the molecule is shifted by these displacements; the new center is located here, and these are the new box vectors. Just a moment, I forgot to specify cubic; this is the box type. Here are the box vectors and the volume of the system. You can experiment: of course we can use an octahedron box, and then you can see the difference: here it is 430 and here we have 550. This is the advantage that we were talking about. I will work with a cubic box, because otherwise I will need to post
process the coordinate file to fit the box; that is why I will use a cubic box. The other useful thing is that files are not overwritten: if a file with the same name as one specified as an output is present in the directory, it is backed up with these hash signs, so you can go back if you want.
Okay, now we have placed the protein in the box. We can have a look, and I encourage you, at each step, to see what happens with your system. This is box.pdb. You can't see the box, but with pbc box, here, the box will be drawn, and this is how the protein is placed. Everything looks okay, so we can continue with adding solvent to the simulation box. This can be done with gmx solvate: -cp is the solute, which is box.pdb; -cs is the file which will be used to fill the box with the solvent. If you are curious, we can go to the GROMACS installation directory and see what it is: this is share/gromacs/top, and here we have this file, which is nothing else than a box with water atoms, and it is used as a template to fill the box. The defaults: gmx solvate -h. No, this is not the proper version, it is the one for GPUs; the right one is under work, in demo, the GROMACS build for the cluster module, bin/gmx. Here, I did that to show you: ml, the compiler, ParaStation MPI.
Okay, the option here, -cs: by default this is the file we have been looking at, and it is located in the GROMACS top directory, which means that if we use SPC water we don't need to specify the argument of this option. Then -o waterbox.pdb, and we want to modify the topology, -p; topol.top is not needed, because by default the -p option looks for the topol.top file. Then we can see the report: what the volume is, what the density is, how many molecules are added. In the topology, as I showed in the example, we have added this number of water molecules. Okay, let's look at the coordinate file, which is waterbox.pdb. Here you can see that the box is filled with water. Of course, with pbc box we can draw the box again, and here everything is clear. Don't be afraid of these atoms which are out of the box; due to periodic
boundary conditions those atoms will just be translated by the length of the box. If you want to display the periodic images you can do it in VMD, and then, up and down, you see that everything looks okay.
Now, then, we need to add ions. Let me first use a command, defined by me, which erases the backup files. We will use the genion command. You can see that a tpr file, the portable binary run input file, is needed as input here. We need to create it. How? By invoking the gmx grompp command, the GROMACS preprocessor, just a moment, which takes the GROMACS parameters file, the coordinates of the solvated system, and, if we use position restraints, we need to supply the coordinate file again, and so on and so forth. The only mandatory inputs are the configuration and the GROMACS parameters. For the parameters we will use an empty file; by default grompp looks for grompp.mdp. Be careful: check that the file is empty. You can also use the touch command to change the time attributes of a file, for instance touch topol.top: here you can see that the modification time has changed, but nothing has changed in the file. That is why it is not very dangerous to invoke this command on an already existing file, I mean, if the timestamp is not important.
Then gmx grompp: -c, we need waterbox.pdb, and the topology, topol.top. We have some warnings because we have an empty parameters file and everything has its default value. Again, you don't need to remember the parameters by heart, because you can just use this mdout.mdp file, which is the output file with the parameters set by the preprocessor. Here you can see the complete set of GROMACS parameters, so you know what you want to set and modify, and you can produce this file any time you want.
Okay, we have the topol.tpr file and we can use it as input for the genion command: -s; of course topol.tpr is the default name, so we don't need to specify it, but since some of you see this procedure for the first time, it's better just to have in mind that such files are needed. Okay, then we can see the genion options: here is the input; if some index file is needed you can supply it; the topology file to be modified; and the output structure will be written in the file with the name defined with
the -o option. Okay, here you can see the defaults. With -pname and -nname we set the names of the ions, and with -pq and -nq we set the charges of the positive and negative ions, respectively. If you want to modify this distance here, this is the minimum distance between the ions and the non-solvent molecules. The tool places the ions randomly, and if you want to set the seed of the random generator you can do it with this option. And then there are two options which are very convenient if you just want to neutralize the system and set a concentration: -conc, the concentration in mol per liter, and -neutral. We want to modify the topology, and the output is a waterbox PDB file named after the concentration. It asks which is your solvent group; by default it is called SOL, from solvent, and we just type 13. You can see which solvent molecules are substituted with the ions.
If you need some other ions to be specified, you can look here, for instance, just a moment, in the force field we are using, ions.itp, and here you have them defined: sodium, potassium, caesium, chloride, calcium and zinc ions are available at the moment in this force field. If you want to put in other types of ions you just need to specify the name, for instance zinc. Thus we have the topology modified accordingly; here the ions have been added. And of course we can have a look at the coordinates with VMD: waterbox... okay, my fault. And of course we can visualize the protein, for instance: protein, then create another representation for the ions, and here we can use van der Waals spheres, and you can see how the ions are placed. They are placed only in the solvent, and the distance between the solute and the ions is at least 6 ångström, as defined by the default argument of genion.
Okay, here we have almost everything. Now we need to minimize the energy, and we will need one mdp file. We can just modify the mdout.mdp file if we wish, but there are many lines in it, and usually, once you have done some simulation, you prepare your mdp, use it next time and change the parameters accordingly. I will call it em.mdp. Okay, em.mdp. What do we have here? We define
flexible water and position restraints on the crystal structure; we will keep it restrained until we finish with the heating ramp. Then we use the steep integrator, and the tolerance, which, since this is a demonstration, let's put here at something like 200, just to make the minimization faster; we will equilibrate the system well later. Then, how we treat the interactions: we use PME for the Coulomb interactions and a switching function for the van der Waals interactions; here are the switching radius and the van der Waals radius. The neighbor radius for constructing the pair list is 1.2, and every tenth configuration will be written to the trajectory. Because this is just an energy minimization, there is no temperature definition, and so on and so forth. Okay, and this is the mdp file.
Next, gmx grompp: -f em.mdp, -c is the coordinate file, the waterbox with ions, -p is the topology, and the output file will be named em. Because we have position restraints we need to specify the coordinates again. And it is very important to read the warnings, because sometimes some parameters can be set wrongly or not accurately enough, and so on and so forth. Here we have only a note that the output file will be about 16 megabytes if we do all these steps, but the procedure will converge faster than that.
Okay, and we need to describe the job for running GROMACS on the machine. We will use the cluster module, which consists of CPU nodes connected by an InfiniBand switched network; then the account from which the time will be taken, and the reservation. I should not forget to thank Dr.
Estela and Peter Nisen from JSC for providing us with some resources to make this demonstration possible. Then, we will run it on, let's put, 24 tasks, because there are 24 cores and each core has two threads, as far as I remember; but you will see that in the log file GROMACS tells us what the architecture of the machine is. If you need to do it correctly, you just go to the description of the system, and you can see the sockets, Xeon Gold, this version, and so forth.
Okay, working with Slurm. There is a command to see the info about the partitions: sinfo. Here you can see the partitions defined, how many nodes are allocated, how many are idle, drained, and so on and so forth; this is the information Slurm gives us. If you have a reservation: scontrol show reservations. We use this one here; we take this name and put it here. Usually you don't use a reservation; today is exceptional, just to ensure that we will have enough resources for this demonstration. Usually you just submit the job to the queue, and what happens with the queue you can see with the squeue command. Here there is a job ID: every job submitted to the system has a unique identifier, this is the ID; then the partition from which the resources are requested, the name, the user, the status, where R means running, the elapsed time, how many nodes and which nodes.
Then with sbatch we can submit the job, and again: CF means the job is being configured now. If you want to see only your jobs in the queue, the option is -u. Of course, if you don't remember your user name, there is a Linux command, whoami, which tells you under which user name you are logged in, or you can do it like this: take the output of whoami and give it to the -u option.
The job has finished. This is the output. Here you can see, because I have set the -v option, which means verbose, the information on which step it is at, what the displacement is, what the potential energy is and what the forces are; and the most important thing is to have the minimization procedure converged. Yes. The other output file, this is em.edr, where the energies are saved. We can see
the energy using gmx energy, -f em.edr, and we just want to plot the potential energy, which we select and then end with an empty line. Of course we can download it here; where is it... energy.xvg, the default name, and it can be visualized with the xmgrace command: xmgrace energy.xvg. You can see here how the potential energy is minimized with the steps.
Okay, we have minimized the system. Now we need to do the heating and equilibrate the solvent. We can do it with this eqpr.mdp file. With this define, as you know, we say that we will restrain the positions of the heavy atoms in the protein. Then the initial time and the simulation time; the integrator is leap-frog, and we will do about 400... 4 picoseconds here; we write the positions every 1 picosecond, and the other parameters are the same. This is an NVT simulation: here we have thermostats coupled to the bath for the ions, the solvent and the protein, and since this is done in NVT, the pressure coupling we have set to no.
Okay, then we take the minimized structure, which is em.gro, as the initial coordinates of the next simulation: -c em.gro, then the topology, then eqpr. Here again, because of the restraints, we need this one as well. Here you can see that there are no warnings, and again we need to submit the job. In the shell script, eqpr.sh, I need to copy the reservation here. Let's have a look at it again. Sure, let's try on four nodes, with eight tasks per node and six OpenMP threads per MPI task. This is the way we tell it how many MPI tasks we want to run and how many threads are to be associated with each task. Then we just submit eqpr.sh. With tail -f we can monitor the output. We can see here, because we told GROMACS to be verbose, that it tells us what the imbalance in the force calculation between the PME and PP ranks is, what the current time step is, and what the estimated time is at which the simulation will end.
Okay, we have some time. Just let me check if there are questions in the chat. The question is whether you could have position restraints on something else: how could you apply position restraints to, say, some other kind of molecule? Okay, okay,
let me clarify that, just a moment. This is set in the topology. Okay, this section here, position_restraints: we just specify the atom with which index, what type of restraint, type 1, and what the force constant is along x, y and z. This constant can be 1000, which is not too tight and not too loose. If you want to do a very nice equilibration, you can decrease the force constant in steps, logarithmically, for instance 1000, 100, 10, and then without restraints. This number here is the index of the atom as defined in the topology: here 5 is this atom, and so on, as it was here: 1, 5, 7, 10. Here we have 1, which is the nitrogen of the N-terminus, and we have the others here; as you can see, only the heavy atoms are restrained. Of course, if you want to restrain, for instance, a ligand, then in its topology you just need to define some position restraints in this section, where you can specify which atom is to be restrained and with what force constant. I think that's fair enough.
Okay, the simulation with the restraints has finished. I want to emphasize that we have defined temperature annealing, which means that the temperature was increased linearly from 0 to 310 over the first picoseconds of the simulation. Now we can see how the temperature goes: gmx energy -f eqpr.edr. We just want to see the temperature of the system. Of course you can plot the temperature of all the thermostat groups defined here, but let's see the average temperature. You may notice that here you can use either the number or the name of the quantity we want to plot. Let me... temperature; temperature in this case is 16. Okay, we just need to take it, and xmgrace temperature.xvg. As we start from zero kelvin and linearly increase the temperature up to 310 kelvin, it then oscillates and so on, but the average, which can be calculated, is 310.
Okay, then we need to run the equilibration, which means that we will use eqnpt... I don't have such a file: cp eqpr.mdp eqnpt.mdp. Let's modify that so as not to use an
annealing protocol; we can set it to no or just delete it. Okay, then the pressure coupling. It is not in the template; let me take it from my mdp files. Here, in eqnpt.mdp, is the NPT coupling. If your system is unstable you can put Berendsen here, and Berendsen here as well. In this case the velocities from the input file will be used. And again, the bigger the system, the longer the equilibration; let's do it for 500 picoseconds.
And again grompp: -f eqnpt; as coordinates we take the output of the previous simulation; and -o eqnpt. And again we can just take this job file here and modify it: cp eqpr.sh eqnpt.sh; the reservation; let's try to run it on 16 nodes, and here we have eqnpt and eqnpt. I want to tell you that -deffnm by default looks for the tpr file with this name, and the base names of the output files are equal to this string; this is a convenient option. sbatch eqnpt.sh. And again we see the resources... I might have misspelled the name of the file. You've got it: eqnpt.sh, just a typo. Just correct it; here we have eqnpt.sh. Let's look at it again.
And for the production run you just need to set up the proper mdp: from the NPT mdp we make the production mdp. What we need for sampling: we just put here v-rescale and [unclear], and of course adjust the number of steps you need, for instance for 500 nanoseconds, I think; yes, 500 nanoseconds. Of course, for a production run it is better to test first, to find the limit up to which the simulation scales, in order to have good performance.
We have already simulated the system, and you can see the RMSD of the protein here over 500 nanoseconds. There are some conformational changes here. It is useful to see the RMSD of the pocket, this is the active site here, because it's an enzyme. And you can look at the root mean square fluctuations here, and of course you can see that there are some flexible parts and others that don't fluctuate much, here with the red points. This analysis, by the way, was done with [unclear]. And here you can see the radius of gyration.
As far as I remember, the crystallographic one is around one; I can't remember exactly. Then the moments of inertia, and, for the production run, the temperature, the total energy and the pressure; the average density is here. Many, many other things can be done, but unfortunately it might be that I was presenting too slowly. I hope that at least the people who are new to this are satisfied.
Okay, yeah, two more questions, if we've got a few minutes. So one of the questions was: if you make a mutation in a protein that has previously been equilibrated, would you have to re-equilibrate?
Mutations... For a system which is equilibrated and then mutated, we need to equilibrate again after the mutation. If you mean to warm up the system again, it is not... I think, ah, no, no, you will heat it again; it's better to just keep the molecule restrained, and then, yeah, it's better like that.
Okay, and the other question was about, when you put in a job submission, when you request particular resources, how does that map onto nodes and processes per node? That's going to depend on the machine, really.
Yes, it depends on the machine, but there are some suggestions: the optimal number of threads to be used per MPI process is between 2 and 6, as far as I remember. And then, if you want to utilize all the threads in the socket or on the node, you have to set the appropriate number of MPI processes per node, which means per cores, and then cores per processor... no, if you know the number of cores in the node, it's already adjusted; and then threads per core you should experiment with. It's better not to oversubscribe the hardware threads, and, as well, in many cases it's better to pin the threads, just to satisfy the locality requirement. What else? I don't know.
I think that's about it. I mean, I suppose the only thing, one thing I spotted in the job submission scripts for the machine you were using, is that you, I think,
you specify the number of cores per node that you're going to use, and I guess that's something you can do, but on certain machines that's something...
Yes, yes, of course. But it's better to consider the scaling of the code, just to play with it a bit to see how it behaves on the system. As we have experimented, GROMACS behaves very well even on KNL, on Xeon Phi processors; we got... yeah, it deals quite well with many threads.
I think that's about it, then. Thank you very much.
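As a recap, the preparation steps demonstrated in this session can be collected into one script. This is a minimal sketch under stated assumptions, not the exact commands from the session: the file names, the salt concentration, the step limit and the cutoff values marked "assumed" are illustrative, and the answers piped into pdb2gmx depend on how the force fields are numbered in your installation. It is meant to be run in the directory that holds the force field and the input structure.

```shell
# Energy-minimisation parameters along the lines described in the talk.
cat > em.mdp <<'EOF'
define        = -DFLEXIBLE -DPOSRES   ; flexible water, restrained protein
integrator    = steep
emtol         = 200.0                 ; loose tolerance, demonstration only
nsteps        = 5000                  ; assumed upper limit
nstxout       = 10                    ; write every 10th configuration
cutoff-scheme = Verlet
coulombtype   = PME
rcoulomb      = 1.2                   ; assumed
vdw-modifier  = force-switch
rvdw-switch   = 1.0                   ; assumed
rvdw          = 1.2                   ; assumed
rlist         = 1.2
EOF

prepare_system() {
    pdb=$1
    # Topology and hydrogens; "1" and "1" answer the force-field and
    # water-model prompts (numbering depends on the installation).
    printf '1\n1\n' | gmx pdb2gmx -f "$pdb" -o ff.pdb -p topol.top || return 1
    # Cubic box, protein centred, 2 nm between protein and box edge.
    gmx editconf -f ff.pdb -o box.pdb -c -d 2 -bt cubic || return 1
    # Fill the box with water and update the topology.
    gmx solvate -cp box.pdb -cs spc216.gro -o waterbox.pdb -p topol.top || return 1
    # genion needs a .tpr file; an empty parameter file is enough for that.
    : > empty.mdp
    gmx grompp -f empty.mdp -c waterbox.pdb -p topol.top -o topol.tpr || return 1
    # Neutralise and add salt (0.15 mol/l is an assumed value).
    echo SOL | gmx genion -s topol.tpr -p topol.top -o ions.pdb \
        -neutral -conc 0.15 || return 1
    # Run input for the minimisation; -r is the restraint reference.
    gmx grompp -f em.mdp -c ions.pdb -r ions.pdb -p topol.top -o em.tpr
}

input=${1:-}
if command -v gmx >/dev/null 2>&1 && [ -n "$input" ] && [ -f "$input" ]; then
    prepare_system "$input"
else
    echo "gmx or input structure not available; wrote em.mdp only"
fi
```

The resulting `em.tpr` would then be run through the batch system, for example with `srun gmx_mpi mdrun -v -deffnm em` inside a job script, and the potential energy extracted afterwards with `echo Potential | gmx energy -f em.edr -o energy.xvg`.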