Good afternoon and welcome to the latest installment in BioExcel's webinar series. My name is Adam Carter, I'm from the BioExcel project, and today we're very happy to welcome Simone Meloni, who's going to be talking about multiple time scales in atomistic simulations. I'm just going to give you a very quick introduction to BioExcel now, no more than five minutes, for those of you who are less familiar with what we do, and then I will hand over to Simone for the rest of the talk.

For those of you who don't know about BioExcel, we are a centre of excellence for computational biomolecular research, and there are three main pillars to the work that we're doing at the moment. The first is biomolecular software: one of the things we're trying to do is to improve the performance, efficiency, and scalability of certain key codes that are widely used in the biomolecular research community. In the project at the moment we have some of the lead developers from GROMACS and HADDOCK, and people working on the QM/MM interface in CPMD. These codes are central to what we do, but we're also interested in other codes as well. Another important part of the project is usability: the usability of these codes and how they can be used as part of wider biomolecular research workflows. We've got work going on in the project to make it easier to integrate these codes and others into workflow environments, including the ones shown here. The final part of what we're trying to do is to promote best practices and to train end users in aspects related to these codes, to usability, and to the workflow platforms as well. That consultancy and training is one of the aspects I'm involved in. As part of the consultancy we also have interest groups, which are open for anyone to join. If you're interested in any of these things (I think the hybrid methods group might be of interest to some of the audience for this webinar) you can see there are a number of different interest groups here that you can join to keep in touch with what we're doing in all of these areas.

One little pointer, rather last minute now, but registration is still open: if you can join us in Amsterdam next week, we are holding our first community forum event. It will be based at the Lloyd Hotel and Cultural Embassy. This is a networking event with a programme of interactive working groups aligned with the BioExcel interest groups that I've just mentioned on the previous slide. It's free, so if you're interested in finding out more or registering, you can go to the URL here.

All right. Simone has indicated that he is happy to take questions as he goes along in the talk today, but it's probably easiest if you save most of your questions until the end; we'll make sure there's plenty of time at the end for you to ask them. The easiest way to do that is to type your question into the question box at the side of your screen in GoToWebinar; it looks similar, not identical, to the control panel you see here. If you click where it says "Questions", you can type your question. Then at the end, if you have a microphone and you want to ask your question directly, I'll open your mic and you can address Simone directly; otherwise, I'll read out your question and Simone can answer it then.
If you're watching this webinar as a recording on YouTube and you have some questions for us later on, you can ask those at BioExcel's forum, ask.bioexcel.eu, and we'll make sure that somebody passes your question on to Simone or answers it directly there.

I'd now like to introduce today's presenter. Simone Meloni is a research professor at the Department of Mechanical and Aerospace Engineering at the University of Rome, La Sapienza. He works on applications of computational statistical mechanics to fundamental and technological problems, especially related to energy technologies: materials for third-generation solar cells, energy storage, energy scavenging, and so on. He has developed techniques for the simulation of rare events, that is, events which are too infrequent to be observed on a time scale accessible to brute-force simulations. He has also developed non-equilibrium techniques, which in principle allow you to exactly compute statistical quantities in atomistic and molecular systems out of equilibrium. So I'm now going to hand over to Dr. Simone Meloni. Simone, I will make you the presenter and let you take it from here.

Hi, Adam. Hi, everybody. First of all, I would like to thank the organizers for organizing this webinar and giving me the chance to present part of the work I've done with some collaborators, on a topic which I find quite interesting in general. As Adam said, I'm happy to take questions during the seminar, especially if answering them helps you follow what I'm telling you, so feel free to ask them; or, if you prefer, you can ask at the end and we'll try to be as exhaustive as possible.

What Adam was saying before about the objectives of BioExcel connects nicely with what I'm going to tell you now. One of the points Adam made is that an objective of BioExcel is to improve the efficiency of the codes, which of course is a key issue. Still, you have to keep in mind that some of the problems we want to address in chemistry, physics, biochemistry, and so on exceed the time scale affordable by present, and possibly future, generations of supercomputers. For these we need special techniques, which are the subject of my seminar today.

Let me first give you a short outline of the seminar. I will start by trying to convince you that you might need special techniques for dealing with this kind of problem: I will first give you an example of a simple but well-known rare-event problem in molecular simulations, and then I will try to convince you that there are cases in which the standard techniques are not enough. I will then present a couple of different techniques to address the problem of reconstructing the free energy, or studying rare events, in a high-dimensional space, and I will conclude by showing you a couple of illustrations of the use of these techniques.

Let me start from the apparent toy problem of the isomerization of the alanine dipeptide. Isomerization is essentially the change of conformation from, roughly speaking, the extended one into the folded one; essentially this can be described as a rotation around one or a few dihedral angles of the molecule. So if you're interested just in this problem, you can coarse-grain the description of the process, passing from a description in terms of the positions of all the atoms,
that is, three times the number of atoms composing the alanine dipeptide, to a description in terms of one, two, or a few dihedral angles, which makes you more focused on the problem you really want to address. The typical questions you ask about this concern the statistics of the transition: what is the probability that the system stays in configuration one or configuration two? How much time does it take to go from configuration one to configuration two? And what is the path, the most likely trajectory, to go from configuration one to configuration two? This is the kind of question I would like to address.

So let me introduce a couple of observables which we typically use to describe this process. If we decide to describe the process in terms of a set of collective variables, the observables which can best describe the process (the dihedral angles in the example I was showing before), one natural question to ask is: what is the probability of observing a given value of a dihedral angle? For example, the probability of observing a value of 180 degrees of this dihedral angle, which measures the stability of that state with respect to the stability of the state in which the dihedral angle is no longer 180 degrees but zero. Already this example tells you how important the probability can be. But of course the probability is also important all along the process, because it tells you which configurations are most likely to be visited along the isomerization process, and so, implicitly, the reactive path.

The probability changes quite rapidly when you change configuration, so very often we use its logarithm instead, and when you express the logarithm of the probability in thermal units, k_B T, where k_B is the Boltzmann constant, you end up with what is usually called the Landau free energy. The Landau free energy is a kind of free energy in which the relevant observables describing your system, on top of the temperature and so on, are the set of collective variables describing your system in the coarse-grained description; once again, in this case it will be the free energy of the dihedral angles.

Why is this free energy important? Well, for example, it's crucial to estimate the time tau, or the rate, of the isomerization process. This can be described via an Arrhenius-like or transition-state-theory equation, in which you have a pre-exponential factor tau-infinity and an exponential term which depends on the barrier: the difference in free energy between the initial minimum and the top of the free energy profile before you end up in the other state, divided by the thermal energy available to the system. So once you know the free energy landscape of the system, in principle you are also able to estimate the kinetics of the relevant transitions using an equation like this, which requires, once again, identifying the extremal points of your free energy landscape: the minima and the transition state.
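As a compact reference, here is a hedged LaTeX reconstruction of the two relations just described; the notation (z for the collective variables, ΔF‡ for the barrier, τ∞ for the pre-exponential factor) is mine, chosen to match the spoken description rather than any slide:

```latex
% Landau free energy of the collective variables z (e.g. the dihedral angles)
F(z) = -k_{\mathrm B}T \,\ln P(z)

% Arrhenius / transition-state-theory estimate of the transition time,
% with \Delta F^{\ddagger} the free energy difference between the initial
% minimum and the top of the free energy profile:
\tau = \tau_{\infty}\,\exp\!\left(\frac{\Delta F^{\ddagger}}{k_{\mathrm B}T}\right)
```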
In principle one can imagine using brute-force molecular dynamics to compute this free energy: one can run a very long trajectory, construct the histogram of the trajectory (that is, count how many times the trajectory visits each cell of a discretization of the collective-variable space), take this as an estimate of the probability, normalize it, and take the logarithm, which gives the estimate of the free energy. And once you have the free energy, as I told you before, you can in principle also compute the rate of the process; a minimal sketch of this histogram approach is shown below.

The point is that in the presence of barriers much higher than the thermal energy, the time your system takes to visit the other minima of the free energy, starting from the initial one, scales exponentially with the barrier. Typically this time is much longer than the characteristic times of your system: the characteristic time of a normal system is connected to its vibrational spectrum, femtoseconds or so, while the relevant process, say alanine dipeptide isomerization, can be in the range of microseconds, or even longer for macromolecules. So this brute-force approach doesn't work, because however efficient your molecular dynamics code may be, the time needed to run such a long simulation exceeds the capability of your supercomputer. This means you typically need some acceleration technique, and there is an array of acceleration techniques one might want to use: umbrella sampling, constrained molecular dynamics (the blue moon ensemble), metadynamics, boxed molecular dynamics, forward flux sampling, and many others.

One problem, or one characteristic, of these techniques is that they are typically able to target only a limited number of collective variables, one, two, or a few of them, and sometimes this is insufficient. Let me give you an example in which it is not sufficient, and once again let me resort to the alanine dipeptide problem. Here is illustrated the, say, stretched configuration, and here are a couple of folded configurations of the alanine dipeptide, and the question is: what is the trajectory, and what is the barrier, to go from this one to one of these two? We consider the case in which we coarse-grain the process in terms of two dihedral angles only, Phi and Psi, or with four dihedral angles, adding Theta and Zeta to the first two. If you compute the ideal path from the initial state to the final one with two and with four collective variables, you actually obtain two different free energy profiles, which differ both in the barrier for the isomerization process and in shape, which seems to suggest a different mechanism for the process.

Now you may wonder which of the two descriptions is the good one, the one with two or the one with four collective variables, and there is actually a way to address this question rather precisely, which is by investigating the committor function at the putative transition state of your process. What is the committor function? The committor function is the probability that, starting from a given configuration, the system ends up in the product before the reactant.
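As an aside, here is a minimal Python sketch of the brute-force histogram estimate described at the top of this passage; the function name and the example value of k_B T are my own illustrative choices:

```python
import numpy as np

def free_energy_from_trajectory(cv_traj, bins=72, kBT=2.494):
    """Estimate the Landau free energy F(z) = -kBT ln P(z) from a
    brute-force trajectory of a collective variable (e.g. a dihedral).

    cv_traj : 1D array of collective-variable values along the trajectory
    bins    : number of histogram bins discretizing the CV space
    kBT     : thermal energy (here in kJ/mol at ~300 K, as an example)
    """
    # Histogram of visited CV values -> estimate of the probability P(z)
    counts, edges = np.histogram(cv_traj, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    # Avoid log(0) in bins the trajectory never visited
    P = np.where(counts > 0, counts, np.nan)
    F = -kBT * np.log(P)
    # Free energies are defined up to an additive constant:
    # shift so the global minimum sits at zero
    F -= np.nanmin(F)
    return centers, F
```

This only converges when the barriers are at most a few k_B T, which is exactly the limitation discussed above.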
Coming back to the committor: if you sit in one configuration here and shoot a number of trajectories, you may end up with a given probability in the product, say along the green trajectory, and with the complementary probability in the reactant. Now, you expect that if you are on the transition state the committor function is 50%: you have the same probability of going into the product as into the reactant. So if you have an ensemble of configurations belonging to the transition state (for example, you run a constant-temperature molecular dynamics under the constraint, or restraint, that the system stays at the top of the free energy profile), you get the gray ensemble in the case of four collective variables and the black one in the case of two, and from each of these configurations you can shoot a set of trajectories. So for each configuration you can compute the committor function, and what you expect is that if your transition state is a good one, the committor values of all these configurations are peaked around 0.5, that is 50%, while if your collective-variable set is not good, the values are distributed, for example, in a uniform way, in any case not peaked around 0.5.

What we observe is that even for such a simple system, which has been investigated plenty of times, two collective variables give a committor distribution which is quite uniform, which, as I just told you and as we will see in a second in a bit more detail, is an indication of an insufficient set of collective variables. With four collective variables the distribution of the committor is well peaked around 0.5, which is an indication that you now have a set of collective variables sufficient to address the problem you want to investigate.

Now let me show you why the collective variable or variables you pick can be insufficient. We can split the problem into a taxonomy of four cases, and let me illustrate these four cases in a simplified scenario in which you have two degrees of freedom and you decide to use one of them for coarse-graining the process. Each of these panels represents one of the cases, and each panel is composed of three figures. The first, main figure is the energy landscape, shown by iso-contour lines in the two-dimensional space; here you typically have two attractive basins, A and B, say A is the reactant and B is the product, and the dashed line represents the putative transition state computed with the only variable you consider for the coarse-graining, Q in our example, so Q* represents the transition state in the coarse-grained Q variable. The top panel is the projection of the energy along this dashed line, that is, the projection of the energy along the remaining degree of freedom when the one used for the coarse-graining is constrained to be at the transition state. And the bottom panel is the corresponding distribution of the committor function.

Let's consider the first case. Here the attractive region of basin A is on the left of the putative transition state, the attractive basin of the product B is on the right, and you have a minimum of the energy along the dashed line in correspondence with the transition state itself, which, remember, is a saddle point, so it is expected to be a minimum along N-1 directions. If you shoot a set of trajectories from configurations belonging to this transition state, you expect that 50% of the time they will end up in B and 50% of the time in A. So in principle you expect the committor distribution to be a delta function centered around the
0.5 value, but usually, because you have errors and things of that kind, you get something which is a broad distribution, still peaked around 0.5.

Now let us move to case two, same column, lower row. In this case you don't have the attractive basin of A on the left and that of B on the right; rather, the attractive basins overlap if you describe them in terms of only one collective variable Q. For example, if you sit at this point you expect that most of the trajectories shot from here will end up in A; if you sit at this point here, most of the time they will end up in B. So instead of a saddle point you have a maximum along the S direction, and most of the time your system, if constrained to be at the putative transition state, won't stay in the center but in the region of attraction of A or the region of attraction of B. The committor distribution in this case is typically peaked at 0, corresponding to trajectories starting from this region, which have essentially zero probability of ending up in B, and at 1, corresponding to trajectories shot from this region, which have essentially 100% probability of ending up in B. So you see that using just one single collective variable to describe the transition state, or the entire process, is in this case highly insufficient. You can guess just by intuition that a good representation of the process requires the transition state to be a diagonal line here, cutting the two basins in two.

But of course this is not the only possibility. Here is a third case, one in which the two attractive basins are still on the left and right of the putative transition state, but there is a wide, flat region in between. In this case, if you shoot from this region you end up in A, if you shoot from this region you end up in B, but if you shoot from the center you have essentially 50-50 probability of ending up in A or B. So you get a distribution of the committor which is rather uniform, like this, which is actually a situation similar to the one with two collective variables we considered for the alanine dipeptide. This means that the alanine dipeptide with just Phi and Psi as the dihedral angles for the coarse-graining of the process belongs to this scenario.

Finally, there is a fourth scenario, one in which your putative transition state is completely buried in one of the basins, like this. The real trajectory is expected to pass along this line, but if you consider just the one collective variable you place the putative transition state differently, so your maximum of the energy is here, and in this case the committor function is peaked at a single value; here, I think, there is an error on the slide: it is reported as peaked at zero, but it should be peaked at one. So you see that you can have a completely wrong representation of your process if you use an insufficient or inadequate set of collective variables to describe it.

So the question I want to address today concerns the cases in which you need a significant number of collective variables: your process is so complex that you cannot easily guess a small number of them, like the four dihedral angles in the earlier example, and you need to go beyond this limit to many of them, which typically cannot be handled with standard techniques. I will present two techniques to address this problem: one is the single-sweep method, in its improved version, and the second is the string method.
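Before moving on to those techniques, here is a toy Python sketch of the committor estimation just discussed. `propagate`, `in_reactant`, and `in_product` are hypothetical user-supplied functions (a short MD segment with freshly drawn momenta, and indicator functions for the two basins), not part of any package used in the talk:

```python
def committor_estimate(x0, propagate, in_reactant, in_product,
                       n_shots=100, max_steps=100_000):
    """Estimate the committor p_B(x0): the fraction of trajectories shot
    from configuration x0 that reach the product basin B before the
    reactant basin A."""
    hits_B = 0
    for _ in range(n_shots):
        x = x0
        for _ in range(max_steps):      # safety cap, since this is a sketch
            x = propagate(x)            # advance the system one short segment
            if in_product(x):           # reached B first
                hits_B += 1
                break
            if in_reactant(x):          # fell back into A first
                break
    return hits_B / n_shots

# Over an ensemble of putative transition-state configurations, a histogram
# of these estimates peaked at 0.5 indicates good collective variables; a
# flat or bimodal histogram indicates an insufficient set.
```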
Let me start with the single sweep. The objective of the single-sweep method is to reconstruct a free energy landscape in a high-dimensional space, when you have many collective variables, say five, six, seven, eight, or whatever. What standard techniques usually do is reconstruct the free energy landscape while the system explores the space. This is the case of metadynamics, for example: you drop Gaussians, that is, you introduce a history-dependent potential which makes it less likely to revisit regions of the space which have already been visited, and the Gaussians you drop themselves allow you to reconstruct the free energy landscape at the end of the simulation. The single-sweep approach is different: you introduce an accelerated dynamics just to identify the regions of the space which are important, because they contain metastable states (minima of the free energy) and the regions which connect one metastable state to another, and you reconstruct the free energy landscape in this subdomain of the entire space in a second step.

Let me first speak about the first step. For the exploration one can use plenty of different techniques, including metadynamics, which I just mentioned; we use what is called temperature-accelerated molecular dynamics. You extend the standard molecular dynamics over the atoms with a dynamics which allows the value of the collective variables to change. Let me explain each of these two dynamical equations in detail. If you consider just the first part of the first equation, plus the thermostat term attached to run constant-temperature molecular dynamics, this is standard molecular dynamics. What we add is a harmonic-like term which forces the system to visit configurations consistent with the current value of the collective variables: as soon as the instantaneous value of the collective variables differs from the target value, a harmonic force pushes the system toward configurations compatible with the current value. So essentially the system is driven by two forces: the physical force, and a restraining force which limits the region it can explore, forcing it to visit only configurations close to the current value of the collective variables. Then we have a complementary equation for the current value of the collective variables themselves, the z, which is distinct from the value you would compute from the instantaneous configuration. Once again, the dynamics of the z is driven by the same potential; the first term, the physical potential, doesn't depend on z and so adds no force on the dynamics of the collective variables, and only the second, harmonic term contributes.

Now, in these dynamics we have two tunable parameters: the mass (the inertia) of the collective variables, and the temperature of the two thermostats. I will avoid describing the details, because running these dynamics properly requires some understanding of the dynamics of the system and some tuning. In general, we set the mass of the collective variables large enough that their characteristic times are much slower than the characteristic times of the atoms, and we set the temperature such that the thermal energy available to the collective variables is much higher than the thermal energy available to the atoms.
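A hedged LaTeX reconstruction of the two coupled equations as described; the notation (kappa for the coupling constant, theta(x) for the collective-variable functions, and barred symbols for the fictitious mass and temperature) is mine, and the slide may differ in detail:

```latex
% Physical atoms x, thermostatted at the physical temperature T,
% restrained harmonically to the current CV values z:
m_i\,\ddot{x}_i = -\frac{\partial}{\partial x_i}\Big[\,V(x)
    + \frac{\kappa}{2}\sum_k \big(\theta_k(x)-z_k\big)^2\Big]
    + \mathrm{thermostat}(T)

% Slow, heavy auxiliary variables z, thermostatted at \bar{T} \gg T,
% feeling only the harmonic coupling (V(x) does not depend on z):
\bar{m}\,\ddot{z}_k = \kappa\,\big(\theta_k(x)-z_k\big)
    + \mathrm{thermostat}(\bar{T})
```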
Let me show you why in a minute. First of all, with this choice of the masses, the collective variables essentially move in the field of the average force of the atomic configurations; it is as if they were driven by an average force, and this average force is the mean force in the approximation of a perfect separation of time scales between atoms and collective variables. So it is as if the collective variables move over the free energy landscape. And if the temperature is large enough, the collective variables can easily overcome the barriers possibly present on that landscape: in principle you can set the temperature such that the system is able to overcome barriers of a height compatible with the thermal energy deriving from the non-physical temperature T'. In other words, you are accelerating the exploration of the free energy of the system, computed at the physical temperature, by imposing this alternative temperature only on the collective variables. Once again: I am exploring the physical free energy at the physical temperature, but in an accelerated fashion. For example, by doing this on the toy problem of the Müller potential, within a time affordable by a standard molecular dynamics I can explore the regions around the minima: starting from the deepest minimum, I am still able to explore the metastable minima and also the region which connects the absolute minimum to the other state, so also the putative transition state.

The following problem is to reconstruct the free energy. The idea here is that the free energy is nothing else than a function, so in principle I can expand it on a suitable basis set. What we do is expand the free energy over a Gaussian basis set, and in order to reconstruct it I have to determine the coefficients of the expansion. Since we allow the basis functions, the Gaussians in this case, to have a variable width, which better suits the shape of the free energy landscape, the widths of the Gaussians have to be determined as well; we can also make the Gaussians anisotropic, with different widths along different directions in collective-variable space. So we have to find a way to obtain the c_k and sigma_k, and what you can imagine doing is to minimize a suitable objective function with respect to c and sigma. The objective function we introduce is the difference between minus the gradient of the free energy defined by this expansion and the mean force, which we can obtain with any of many standard techniques: you can compute the mean force by constrained molecular dynamics, by restrained molecular dynamics, and in many other ways. This objective function can have a non-trivial weight, and what we often do is use a weight such that points with a very high mean force are less important in the definition of the objective function; so the weight is of the type one over F squared. Of course we have to add a small value to this F, because in the proximity of the minima F can go to zero and the objective function can become unstable; the way you avoid this problem is to add a little delta. At this point, the task is just to find an approach to minimize this objective function E(c, sigma), and to do it over the relevant range of values of the collective variables identified before, along the accelerated molecular dynamics.
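Again as a hedged LaTeX reconstruction of what was just described (the isotropic form; the anisotropic variant replaces sigma_k^2 with a full covariance matrix):

```latex
% Single-sweep representation of the free energy on K Gaussians
% centered at points z_k deposited during the exploration:
F(z) \;\approx\; \sum_{k=1}^{K} c_k\,
    \exp\!\left(-\frac{\lVert z - z_k\rVert^2}{2\sigma_k^2}\right)

% Objective function: match -\nabla F to the measured mean forces f_i,
% with weights down-weighting high-force points (\delta regularizes the
% weight where f_i \to 0 near the minima):
E(c,\sigma) = \sum_i w_i\,\big\lVert \nabla F(z_i) + f_i \big\rVert^2,
\qquad
w_i = \frac{1}{\lVert f_i\rVert^{2} + \delta}
```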
For example, along our accelerated molecular dynamics we identify points; we don't drop any Gaussians or anything like that, we just deposit points distributed over the interesting regions, say at a constant distance from each other, which means we get a good and uniform description of the free energy landscape. These define the positions z_k of the Gaussian functions introduced before, and then we are left with minimizing the objective function.

The minimization can be done in many different ways; we do it by simulated annealing. This means that you start from a trial value of c and sigma and give these values a random displacement, so from the initial c' and sigma' you obtain c'' and sigma'', and then you accept or reject this move by a standard Metropolis criterion, in which you introduce an epsilon which sets the probability that a move uphill in energy is accepted. Little by little you reduce the epsilon, such that at the end of this iterative procedure you end up in one metastable minimum of this parameter-space landscape. Once you have the optimal c and sigma, you can reconstruct the free energy landscape just as the linear combination of the Gaussian functions.

Here I am presenting three reconstructions of a test potential, the Müller potential, used to check how the approach works. The one in the top panel is the vanilla approach, the one in which you expand on a Gaussian basis set using the same sigma for all points and weight all points in the same way, without the special weight I introduced before; that is, the case in which w_i equals 1 for all the points in the summation. Already at this level you have a qualitatively good reconstruction of the potential surface: black is the exact one, red is the reconstructed one, and you see that red already presents metastable states in proximity to the real metastable states, and a transition state close to the real one; still, you see plenty of features which are quite different. So as long as you're interested in a qualitative description of the process, the vanilla approach is already okay. However, since there are strong asymmetries in this potential, using a single Gaussian width, the same for all the Gaussians and isotropic, is probably inefficient. So we replace the single width with a covariance matrix: sigma_k is now an n-by-n matrix, and we allow all the diagonal and off-diagonal terms, apart from symmetries, to be different, and we optimize each of them. If you introduce this asymmetry in the directionality of the Gaussians, you get a very nice reconstruction of the energy landscape; but still, since you are giving the same weight to regions of relatively low and relatively high energy, the reconstruction is sometimes not accurate, like in this region. If on top of this you add the non-uniform weighting factor mentioned before, you obtain a very accurate reconstruction of the energy all over the landscape of interest.
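A toy Python sketch of the simulated-annealing minimization just described; the function and parameter names are my own, and the geometric cooling schedule is an illustrative choice:

```python
import numpy as np

def simulated_annealing(objective, params0, eps0=1.0, cooling=0.999,
                        n_steps=10_000, step_size=0.05, rng=None):
    """Minimize an objective E(params), e.g. the single-sweep objective
    over the Gaussian coefficients c and widths sigma, by simulated
    annealing with a Metropolis acceptance criterion.

    eps0 plays the role of the fictitious 'temperature' epsilon in the
    talk: uphill moves are accepted with probability exp(-dE/eps), and
    eps is slowly reduced so the walk settles into a (local) minimum
    of the parameter-space landscape.
    """
    rng = rng or np.random.default_rng()
    params = np.asarray(params0, dtype=float)
    E = objective(params)
    eps = eps0
    for _ in range(n_steps):
        # Random displacement: params' -> params''
        trial = params + step_size * rng.standard_normal(params.shape)
        E_trial = objective(trial)
        dE = E_trial - E
        # Metropolis criterion: always accept downhill, sometimes uphill
        if dE <= 0 or rng.random() < np.exp(-dE / eps):
            params, E = trial, E_trial
        eps *= cooling   # little by little, reduce epsilon
    return params, E
```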
Now let me show you single sweep at work in a real case. Here we were interested in a material for hydrogen storage, alanates in particular, and in processes involving hydrogen transfer, H-transfer, in this complex material, which in principle can involve plenty of degrees of freedom. We wanted to reconstruct the free energy landscape using as collective variables the coordination numbers of aluminum (the green spheres) with respect to hydrogen (the white spheres around them). We selected two collective variables, the number of hydrogens around each of the two aluminum atoms exchanging one proton, reconstructed the free energy landscape in this two-dimensional space, and checked how the accuracy behaves with increasing size of the basis set. You see that already with just 25 Gaussians over this region, the maximum error is of the order of 0.01 electron volts; since these calculations were done with DFT, density functional theory, this error is at the limit of the accuracy of DFT, so going beyond it was essentially useless. This shows that the calculation of just 25 mean forces, which is what is needed for the reconstruction with a 25-element Gaussian basis set, is sufficient for a very accurate reconstruction of the free energy. We didn't do a comparison with other techniques, because that was not the objective of the paper, but essentially this cost us on the order of 80 picoseconds of molecular dynamics simulation, which is a significant time; still, other techniques requiring a tessellation of the space, which scales exponentially with the number of collective variables, would have been much more expensive.

The same approach has been applied in the field of biochemistry. Here what people were interested in (I was not part of this article) is the diffusion path of carbon monoxide out of myoglobin. Myoglobin is at the same time a really simple and a really complex protein, which can bind oxygen and carbon monoxide, and it has four metastable states, known experimentally, located in the structure. By reconstructing the free energy landscape with respect to the position of the center of mass of the carbon monoxide using the single-sweep method, people were able to identify the metastable states, the local minima of the free energy in myoglobin, and these match very nicely the experimentally identified positions. Knowing the positions of these metastable states and the free energy over the entire space, one can also reconstruct the diffusion paths, the paths connecting the metastable states that have been identified; and the paths people found coincide quite well with results obtained, with huge effort, by brute-force simulations and other simulation techniques. So even going beyond a three-dimensional free energy landscape, within what is doable, this approach is in quite good agreement with known experimental and computational results.

However, one often has to go beyond this significant but limited number of collective variables. I told you that the dimensionality at which you can use single sweep is typically four, five, six, seven, eight; but sometimes you need tens, hundreds, thousands, or even more collective variables, and I will give you an example shortly. In this case one has to give up reconstructing the entire free energy landscape, which is way too complex, and one should be happy with
reconstructing just the most probable path connecting known reactant and product. I won't develop the entire theory here, but one can prove that, within the coarse-grained representation, that is, within the collective-variable space, this most probable path obeys the equation shown here. Let me tell you what this equation means: essentially, it means that the force orthogonal to the path is equal to zero, apart from a matrix which I will explain in a minute. If you think about it for a moment, this is actually quite intuitive. Imagine walking from one valley to another along a mountain path: what you do is stay in the lowest possible place at each level of progress along your path. This is what the equation is telling you: as I progress, I sit at the lowest possible place, where the orthogonal force is zero, where the energy is at a minimum at that level of progress; not the absolute minimum of the energy, but the minimum at that level of progress. It's exactly like walking along a mountain path. Now, since we are passing from a coordinate representation to a coarse-grained, collective-variable representation, we are perhaps performing a non-canonical transformation, and we are certainly performing a reduction of the space; the metric matrix in front is exactly the result of this fact, the fact that we are perhaps rotating and reducing the dimensionality of the space, and it depends on the gradients of the collective variables with respect to the coordinates. But this is a minor technical point.

Now, taking once again the toy problem from before as an initial example, what we want to compute is the optimal path to go from the local minimum here to the absolute minimum here. What one can imagine having is just a first guess of this path, for example a straight line whose initial and final points lie in the attractive basins of the reactant and of the product, and in principle we would like a technique which brings us to the ideal path, the black one shown here. I told you already that the ideal path obeys this equation, which means that at each point it has zero orthogonal force. One can introduce a dynamics for each of these points, for example a first-order dynamics that moves each point in the direction of the orthogonal force: if at a given iteration the force orthogonal to my path is non-zero, by moving along that direction I reduce the orthogonal force, and so, iteration by iteration, I converge to the ideal path. This is a kind of steepest-descent algorithm in which I replace the force with its component orthogonal to the present iteration of the path.

One difficulty is that to compute just the orthogonal component of the force I need to compute the tangent to the path and subtract the projection of the force along it. If my path is coarsely discretized, the accuracy with which I can compute the tangent is actually quite low, and this can lead to severe numerical instabilities. This has been known for a while; perhaps you know the nudged elastic band method, in which you also discretize the path into images, and the way they avoid computing the tangent is by connecting successive images along the path with springs, which essentially
counterbalance the force that would move the images toward the two minima. This method has been quite successful since it was introduced, but it also suffers from some numerical instabilities, so one can try to do a bit better, which is the objective of the string method in collective variables. We would like to replace the equation which suffers from this potential instability problem, caused by the orthogonal component of the force, with something which doesn't have it. The key point is to notice that the actual force acting on the coarse-grained representation of my system, at any point of the putative path at a given iteration, consists of two components. The actual force is the red one; there is one component, the orange one, that is orthogonal to the path, so it goes in the direction in which I want the point to go; and then there is a tangential component along the path, whose only effect is to bring the points closer to each other, while if I want a good description of my path I would of course like them to be uniformly spaced.

So what one can imagine doing is to replace the single-step approach, in which I compute the force acting on a point and remove the tangential component, with a two-step process. In the first step I move the system along the entire force; if I do this, for example, this point will be moved here, and this one here, and if I do it for all the points, they end up at the next iteration of the path, but they are no longer at the same distance from each other; in particular, the region around the transition state gets underpopulated, so the accuracy with which I can compute the transition state becomes quite low. But then I know what to do: since the only effect of the tangential component of the force is to break the uniform distribution of the points, I can redistribute the points to make them equidistant again. For example, I can replace the continuous, ideal description of the path with a polygonal curve, the curve which simply joins the points; I can compute the length of this polygonal path, divide the length by the number of intervals, and place the points at equal distances, each at the position corresponding to its interval. By this approach, without having to compute any tangent or any orthogonal component of the force, I can still achieve the objective of a good description of the reaction path, without any numerical instabilities.
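A minimal Python sketch of this two-step update (full-force move, then equal-arc-length redistribution); the function names are mine, and the metric matrix mentioned above is taken as the identity for simplicity:

```python
import numpy as np

def string_step(images, mean_force, dt=0.01):
    """One iteration of the string method in collective variables:
    (1) move every image along the FULL force, so no tangent or
        orthogonal projection is ever computed;
    (2) redistribute the images at equal arc length along the
        polygonal curve joining them.

    images     : (n_images, n_cv) array discretizing the path in CV space
    mean_force : function returning minus the free-energy gradient at a
                 CV point (computed, e.g., by restrained MD)
    """
    # Step 1: evolve each image with the full (not projected) force;
    # the endpoints simply descend toward the two minima.
    moved = images + dt * np.array([mean_force(z) for z in images])

    # Step 2: reparametrize to equal arc length along the polygonal path
    seg = np.linalg.norm(np.diff(moved, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])      # arc length at each image
    s_target = np.linspace(0.0, s[-1], len(moved))   # equally spaced targets
    new_images = np.empty_like(moved)
    for d in range(moved.shape[1]):                  # interpolate per CV
        new_images[:, d] = np.interp(s_target, s, moved[:, d])
    return new_images
```

Iterating this step until the images stop moving converges to the most probable path, with no tangent estimate and hence none of the associated instabilities.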
Let me now show you a couple of examples in which a large number of collective variables has actually been necessary, and in which I and other colleagues used the string method in collective variables, showing its efficiency and importance. The first example is a bit more in my field and concerns the wetting of textured surfaces. Here, in brown at the bottom, you see a solid surface which is not flat: it has square pillars on top of it; and in blue you see a liquid put in contact with this textured surface. Perhaps against your intuition, if the solid is hydrophobic and the pillars have the proper width, height, and spacing, the liquid won't invade the cavities underneath but will remain suspended on top of the pillars. What people are interested in is the relative stability of the state in which the liquid is suspended versus the one in which it enters the cavities, and the rate, that is, the barrier, for this wetting transition to take place.

We have studied these wetting transitions using simple collective variables, like the number of liquid particles inside the cavities, which is an implicit, indirect measure of the amount of liquid in the cavities; with standard techniques like umbrella sampling one can compute the free energy as a function of the number of liquid particles inside the texture. Or one can use collective variables which are extensions of this simple one: the number of particles in this region is just the average density of liquid in the region, but you may want to use a density field instead, a local density. You can consider the space inside the texture, chop it into little boxes, and compute the number of particles box by box. And you can use different tessellations: for example, for the surface with three-by-three pillars, we can chop the space into a single region, into nine regions, essentially one per pillar, or into 864 regions, with many boxes per pillar along the horizontal and vertical directions.

You can compute the free energy in all these cases, and you see that the free energies have really different profiles. For example, with a single collective variable you don't even have a well-defined maximum, and the free energy of the final state, the one in which the liquid completely wets the cavity, is well above a hundred k_B T, much higher than what you conclude with the other descriptions of the collective variables, which should not be the case. But you see something even more surprising. What I'm showing here is the Euclidean distance, computed in terms of proper observables which I will avoid describing here, between successive images along the wetting path, using one (red line), nine (blue line), and 864 (green line) collective variables. What you expect is that the distance between successive images is roughly the same all along the process, and indeed with roughly 900 collective variables, 864, you get a uniform variation of the collective variables from one image to the next. On the contrary, with fewer collective variables you see sudden jumps in the distance between successive images, which means that at those points there is a big difference in configuration between the previous and next image; something important may be happening in between, and you are missing it. So you see that in this case you are forced to use on the order of 900 collective variables; you cannot stick with what is accessible to standard approaches, one up to a handful. This also means that single sweep won't be enough; you need something different, and we used the string method. We were able to obtain an accurate description of the wetting path, which proceeds with the liquid touching the bottom surface at only one point, which was unknown before, and then spreading over the entire surface.

You have similar cases also in problems more related to biological systems. This is the case, for example, of the hydrophobic collapse of proteins, a work done around ten years earlier by Eric Vanden-Eijnden and David Chandler, among others.
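Both this wetting example and the next one use density-field collective variables, that is, particle counts in a grid of boxes. Here is a minimal Python sketch; the 12 by 12 by 6 grid is just an illustrative factorization of 864, not necessarily the tessellation used in the actual work:

```python
import numpy as np

def density_field_cvs(positions, box_lo, box_hi, shape=(12, 12, 6)):
    """Vectorial collective variables for wetting: the number of liquid
    particles in each little box tessellating the region between the
    pillars (864 boxes in the talk's finest description).

    positions : (N, 3) array of liquid-particle coordinates
    box_lo/hi : opposite corners of the textured region of interest
    shape     : number of boxes along x, y, z
    """
    # Bin edges along each Cartesian direction
    edges = [np.linspace(lo, hi, n + 1)
             for lo, hi, n in zip(box_lo, box_hi, shape)]
    # Count particles box by box
    counts, _ = np.histogramdd(positions, bins=edges)
    return counts.ravel()   # one collective variable per box
```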
Chandler's idea, his intuition, was that the collapse of hydrophobic proteins is not driven by internal degrees of freedom, that it cannot be described in terms of dihedral angles, bond angles, and things like that, but rather that it is driven by fluctuations of the density of the water: the water instantaneously gets rarefied around the protein, and this gives room for the protein to collapse; only when this takes place can the protein actually collapse. What they did was put a toy protein, just a set of Lennard-Jones beads, into a water sample, and discretize the simulation box into 130,000 collective variables, once again a density field, the density in each of these cells. Of course, with 130,000 collective variables there is no technique other than, perhaps, the string method which allows you to compute anything, neither the free energy landscape nor even the reactive path. Using these vectorial collective variables and the string method, they were able to reconstruct the collapse path and to identify that the transition state corresponds to the initial bending of the protein, the first closure of the protein starting from the straight configuration, and this happens in correspondence with a reduction of the water density near the protein; only when this happens can the protein fold. I don't show the image here, but with these collective variables they obtained a committor distribution well peaked around 0.5, which confirms that such a complex set of collective variables is actually needed to describe the folding of a hydrophobic protein in water.

This brings me to the conclusions of my seminar, which are the following. I hope you are now convinced that even in simple processes or reactions, one or two collective variables are sometimes not sufficient, so you may need a larger number of collective variables, which means a more complex space to explore. There are techniques to validate whether the collective variables you had in mind are sufficient, namely the committor analysis; but the committor analysis only tells you whether they are sufficient or not, and you may still have to explore a larger space. Now, if the scientific problem you want to address is one of these cases in which you need a large number of collective variables, I am proposing two possible techniques. If the number of collective variables is still limited, a few of them, say five, you might still want to reconstruct the entire free energy landscape, which gives you a lot of information, and in this case you can use the single-sweep method, in which you separate the problem of exploring the space from the problem of reconstructing the free energy landscape. If you are working in very high dimension, tens, hundreds, thousands of collective variables, you may want to limit your attention to the reconstruction of the most probable reaction path and use the string method, which scales linearly with the number of collective variables, which is essentially affordable (it has been used already with 100,000 collective variables), and which is numerically very stable, so it should guarantee convergence in a reasonable time. And with this I would like to thank the organizers once again for this opportunity, and all of you for attending this webinar, and I pass the word to Adam.

Thank you very much, Simone. We have our first question in the question box already; I'd encourage anyone else who has questions to ask them now. I don't know how much time people have
for it today, but any questions we don't get round to we can always follow up later. So our first question is from Alberto. The question is: the presenter showed us the advantages of the single-sweep approach, but what are the limitations or disadvantages with respect to other approaches for reconstructing free energy landscapes, and are there cases where free energy landscapes are difficult or unsuitable for the single-sweep method? So I think it's really a question about the constraints, well, the conditions, under which you can use the methods you've described.

One limitation, if you wish, is that other techniques, for example metadynamics, allow you to get an idea of the free energy as the trajectory goes: already at an intermediate stage you can reconstruct the free energy up to the present time, and you can discover, for example, that you have already visited multiple minima with given relative free energies. In principle single sweep, being a two-step approach, requires that you also do the second step, the reconstruction of the free energy, and to do that you first need to compute the mean forces, which means running simulations on purpose for that. So this might be a limitation. However, if you do the exploration step in the way I was showing before, evolving your system according to these equations under the condition of a good time-scale separation between the physical atoms and the collective variables, the distribution of z is actually also an estimate of the free energy; you need to rescale by the temperature, but that is really simple. So this limitation might not be a real limitation.

What is a bit more tricky, and you cannot see it from my slides because I didn't report the details, is that underneath this minimization of the objective function there is essentially a fitting problem: a non-linear one if you also optimize sigma, while if it were only for c it would even be a linear least-squares fit. Underneath there is a coefficient matrix, and that matrix can be ill-conditioned; if it is, you might run into trouble minimizing the objective function, and unfortunately you don't know this beforehand, so you don't know about this problem before running your simulations. This might be a limitation. And of course a limitation this approach shares with other techniques is that, with the selected collective variables, you may have trouble identifying well-defined stable states: you can still run short of the number of degrees of freedom needed to represent your system. You can just go further than other methods, but this can still be insufficient.

Okay, hopefully that answers the question, Alberto; do let us know if you have any follow-up. Okay, Alberto says thank you, that's great. Any other questions from the floor? I'll ask one quick one myself; anyone else will need to type theirs in now if they have more before we wrap up. Just a quick one from the computational side: I was wondering how the techniques you've described scale compared to more traditional techniques. Can you also use them on very large machines with many processors?

We are using these techniques, for example, on Marconi, which is the Italian supercomputing machine, made of Knights Landing processors, so low-frequency, low-energy, with plenty of cores, and the techniques I was showing you are actually well suited for this. For what concerns single sweep, the
computationally expensive part is the calculation of the mean forces, and you have to compute a mean force for each of the points in the summation. For example, in the case of the 25 points I was showing before, we had to compute the mean force at 25 values of the pair of collective variables. These are 25 independent runs that you only need to combine at the end, so in principle you can combine the intrinsic parallelism of your molecular dynamics engine, classical or ab initio, with the embarrassing parallelism coming from the independence of the mean-force calculations at these points, and you get essentially perfect scalability.

This is true for the string method as well. Let me come back here: you see that before we advance the string from one configuration to the next, we need to compute the mean force at each of the images discretizing the continuous path; typically we use between 32 and 64 images, and before the final step of moving the configurations in the coarse-grained representation, these are completely independent calculations. So once again you can exploit the embarrassing parallelism of independent calculations. We have been running this kind of calculation on many European machines, present and past; for example, I'm currently running at CSCS, I was running some years ago with a colleague at ICHEC in Ireland, and we have been running on different generations of CINECA machines. So I would say they do well.

Great, thank you very much. Okay, I haven't had any more questions in the question box, so I think we will wrap it up for today. Thank you very much indeed, Simone, for your presentation today, and thank you all for coming along to listen. A reminder that if you have any other questions later, or if you just want to ask about any of the other things BioExcel is working on, do check out our forums; you can find a link to them from the main BioExcel webpage. Okay, thank you all for coming along, and hopefully see you all again soon. Bye.