Welcome to this talk on accelerating sampling in GROMACS using the AWH method. In this presentation I will first give an introduction to sampling in general, then discuss how the AWH method works, then give some short notes on how to use it, and finally show an application: computing a potential of mean force for permeation of small molecules through the protein aquaporin.

So let's start with a general problem formulation. As most of you probably know, energy landscapes of biomolecules are high-dimensional and often rough, and this causes issues with sampling the landscape, as you might be stuck for quite a long time in local minima. Very often, simulation times, which can now reach the order of microseconds, are not sufficient to sample the states one is interested in for a particular problem at hand. A typical situation one could encounter is that, for instance, you know the beginning and the end state of some process of a molecule you're interested in, and then the questions could be: what are the paths connecting these, and what are the free energy barriers, if any, between these states? It could also be that you only know the beginning state and you want to know what motions of interest, or other states of interest, there might be. In both these cases, what can be very useful is if you can come up with a reaction coordinate that characterizes the differences between the states, or that characterizes progress in some direction from the beginning state. On the right I've shown an example, which is just an artist's impression of a free energy landscape, where you start out in a minimum on the left and you want to go to a minimum on the right, which you may or may not know the location of.
It's useful if you can come up with a general direction in which this process goes, but if the barriers are too high and the process too slow, one needs an enhanced sampling method to accelerate going from one state to the other. Such a method is what I'll discuss in this talk today.

As I said, the typical issue in molecular dynamics simulations is that interesting events often happen on microsecond or longer time scales, so we need billions of time steps of two femtoseconds. This is inconvenient because you spend a lot of time computing, and often the time scales are even longer than that, so one needs to do something else to sample faster. But if you look at what's actually happening, in many cases the events one is interested in are themselves often quite fast: you have to wait a long time for such an event to happen, but when it happens, it's quite fast. When that's the case, it is in principle possible to use many independent simulations, because a short simulation can sample a short event. And if you have some kind of control over where and how you sample, you can smartly bias a simulation, as I'll show, to get more events. In that way you can make more efficient use of compute time and also get a shorter time to solution, since you increase the number of events and wait less long. So that's a general approach that provides a lot of advantages.

Here's a schematic depiction of the sampling problem. This is in one dimension, with a double-well potential, where this single red particle does not have enough kinetic energy to cross the barrier within the reasonable running time of this animation. If you know general reaction kinetics, you'll know that the time to cross the barrier goes up exponentially with the height of the barrier, so you can easily get extremely long time scales if the barrier is just a bit too high.
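The exponential dependence of the waiting time on barrier height can be made concrete with a quick Arrhenius-style estimate; the attempt-time prefactor below is a hypothetical placeholder, only the exponential scaling matters:

```python
import math

kT = 2.494  # thermal energy in kJ/mol at roughly 300 K

def waiting_time(barrier_kj_per_mol, tau0_ns=1.0):
    """Arrhenius-style estimate of the mean waiting time for a barrier
    crossing: tau = tau0 * exp(barrier / kT). tau0 is a hypothetical
    attempt-time prefactor (here 1 ns); only the ratio between two
    barrier heights is meaningful in this sketch.
    """
    return tau0_ns * math.exp(barrier_kj_per_mol / kT)

# Raising the barrier from 10 kT to 20 kT multiplies the waiting time
# by e^10, i.e. about 22,000x -- microseconds quickly become seconds.
ratio = waiting_time(20 * kT) / waiting_time(10 * kT)
print(round(ratio))  # 22026
```

This is why "a bit too high" a barrier can push an event far beyond reachable simulation times.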
So you can wait for a very long time here and nothing will happen. Here it would of course be advantageous to make something happen, instead of just waiting. This is a cheap simulation, but for a very big biomolecular simulation this could waste a lot of compute time, just vibrating in a single minimum.

The trick we're going to use here is well known, and rather simple and elegant: add a bias potential to make the effective potential flat, or close to flat. Looking at this double-well potential, the black line here, the sampling is limited simply by the high barrier in the middle. If one can remove that barrier by adding this blue potential, one gets a flat landscape, which makes the sampling much faster, because you just diffuse over a flat landscape. But the problem is that the optimal blue potential would be equal to minus the original potential, the dashed black line here, which is exactly what you don't know. So to make this work, you effectively need to put in the answer to part of the question you're interested in. Since you don't know the answer to start with, you need an iterative method to solve this.

What I'll present today is the accelerated weight histogram method, but there are more methods: there's metadynamics, there's adaptive biasing force; very many methods are available in the literature for solving this problem. Some originally started out rather ad hoc, but some have been improved and now have a more solid statistical-mechanics basis behind them. These are probably three of the most popular ones at the moment, but I'll discuss the accelerated weight histogram method, which has been implemented natively in GROMACS, which we've been working on and I'm one of the developers of.
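The bias-potential trick can be sketched in a few lines; the double-well form used here is a hypothetical stand-in for the potential in the figure:

```python
import numpy as np

# Hypothetical double-well potential with a barrier at x = 0
# (a stand-in for the black line in the figure; energies in kT).
def V(x):
    return 10.0 * (x**2 - 1.0)**2

x = np.linspace(-1.5, 1.5, 301)

# The ideal bias is minus the free-energy landscape itself; adding it
# makes the effective potential flat, so the system diffuses freely.
bias = -V(x)
V_eff = V(x) + bias

print(V_eff.max() - V_eff.min())  # 0.0 -- perfectly flat
```

This is circular, of course: writing down the perfect bias required knowing V, which is part of the answer one is after, and that is exactly why an iterative method is needed.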
So what makes this method somewhat different from other methods is that the target distribution is a central quantity. You can decide beforehand what kind of distribution you want to sample along one or more reaction coordinates. That's unlike many other methods, where you set a bias but don't directly control the target distribution. In a good method there should always be a target distribution, but it might be given implicitly, or only have a single, fixed shape; here you can set it freely. It can, but doesn't have to, depend on the free energy: you can aim for a flat landscape, as I just illustrated on the previous slide, but it can also be some enhanced-temperature sampling. The method has initial exponential convergence, and later the error goes down as one over the square root of the number of samples. This is a big advantage: in the end stage of any method the error should go down as one over the square root of the number of samples, but initially, when the errors are much larger than the thermal energy kT, you can converge exponentially, which is much faster, and this is controlled automatically. There's only one, uncritical, convergence parameter, as I'll discuss later. Most of the work on the method has been done by a former PhD student of mine, Viveca Lindahl, and the reference for the method is given here.
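To illustrate the freedom in choosing the target distribution, here is a small sketch, assuming a discretized coordinate grid and a current free-energy estimate F in units of kT; the scaled-exponent form of the tempered target is an illustrative choice, not necessarily the exact formula used in GROMACS:

```python
import numpy as np

def flat_target(F):
    """Uniform target: sample all values of the reaction coordinate
    equally, i.e. completely flatten the free-energy landscape."""
    return np.full_like(F, 1.0 / len(F))

def boltzmann_target(F, scale=0.5):
    """Tempered (Boltzmann-scaled) target: keep part of the landscape's
    shape, which corresponds to sampling at an effectively higher
    temperature. F is the free-energy estimate in kT; scale is in
    [0, 1), where 0 recovers the flat target. (Illustrative form.)
    """
    w = np.exp(-scale * (F - F.min()))
    return w / w.sum()

grid_F = np.array([0.0, 8.0, 2.0, 6.0])  # toy free-energy estimate
print(flat_target(grid_F))       # uniform weights
print(boltzmann_target(grid_F))  # minima get more weight than barriers
```

The point is that the target is an explicit input: the bias is then chosen so that sampling follows whatever target you picked.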
Schematically, the method works as follows. We extend the ensemble with a reaction coordinate, which can be a one- or multi-dimensional λ. Then one runs a biased simulation for a short while, which provides new samples, and from those you can estimate what the real distribution is. Initially you have some initial bias, usually zero, no bias at all, which is of course incorrect, because there is some potential, some barrier, that you want to remove. So by looking at what bias you put in and what samples you get out, you can estimate the real distribution; this is done with efficient updates. Having this information, one can update the bias, and here the target distribution is included explicitly; you get out a new bias, and then one iterates around this loop until you're happy with the convergence.

In formulas it's shown here how this goes: at the top again the process, and at the bottom you can see that the updates of the free energy estimate go with the collected samples. You have here this ΔW, the collected new weights, and you divide by the total number of samples and the target distribution, which comes in explicitly. This tells you how much your samples deviate from what you expected, and you correct the free energy with that, which gives an update size that decreases as one over the total number of samples, as it should. So how does AWH apply this bias?
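The free-energy update just described can be sketched as follows; the grid discretization and this particular form of the weight-histogram update are a simplified reading of the scheme, not the literal GROMACS code:

```python
import numpy as np

def awh_update(f, W, dW, dN, target):
    """One schematic free-energy update on a coordinate grid.

    f      -- free-energy estimate per grid point (units of kT)
    W      -- size of the reference weight histogram (total weight N)
    dW     -- newly collected sample weights per grid point (the ΔW)
    dN     -- number of new samples in this update
    target -- target distribution per grid point (sums to 1)

    The observed new weights are compared with the weights the target
    distribution predicts; points that collected more weight than
    expected get their free energy lowered, and the correction shrinks
    as ~1/N because the histogram W grows with every update.
    """
    f = f - np.log((W * target + dW) / (W * target + dN * target))
    W = W + dN
    return f, W
```

If the collected weights exactly match what the target predicts, the free-energy estimate is left unchanged, which is the fixed point of the iteration.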
Initially we use a harmonic umbrella potential, which is moved using Monte Carlo moves. By default we apply a kind of Boltzmann-weighted convolution of these umbrella potentials, these Gaussians, which produces a nice smooth bias. This is done on a regular grid, considering only close neighbors, so it's rather fast: we don't have to track the positions of all the biases, they sit nicely on the grid, which makes for efficient code.

Here's an example of how this works for the double-well potential. You see how the blue bias potential slowly converges to the black line, and at the top we have the initial phase, with exponential convergence towards the target distribution. This simulation stops right after the initial phase, where you can see you've already got a quite good estimate of the free energy, and already quite diffusive motion over the landscape. The method would now shift to the final phase, where you get the normal 1/√t, or 1/√N, behavior to converge the free energy exactly to the landscape.

Then there's the update size. Here you can see, in a simple toy example, how the update size initially goes down exponentially and later goes as 1/t, and the weights need to be updated consistently with this. On the left is the update size; on the right are the weights, which look a bit strange initially: if you don't change the update size, the weights of samples go up initially, but these initial weights get weighted out exponentially because of the exponential initial phase, so in the end you get a flat weight distribution, as it should be, with all samples weighted equally. This provides very nice quick initial convergence and correct convergence at the end. So the initial update size is basically the only parameter you need to set, after
choosing a reaction coordinate. It determines how fast you flatten out the landscape. In GROMACS this is provided by setting two parameters, which together really set only one quantity, because the update size itself is not an intuitive parameter: instead we set the diffusion along the reaction coordinate and an initial error estimate. These parameters are not critical, and the default values we have in GROMACS work quite okay. At the bottom here you can see in the formula how they actually contribute to the initial update size; one can play around with either of the two, since they go into the same parameter in the end. The only thing that can go wrong is this: with parameters that are far too large you get very slow convergence, because the update size is very small, so not much happens; with parameters that are too small the system might be pulled apart, but you'll see that. So there's quite a large range to play with, things can work well, and if it goes wrong, you'll notice.

Okay, then a final method note before we go to applications: one can also use multiple, what we call, walkers contributing to the same bias and free-energy estimate. There's no reason to limit the sampling to a single simulation; you can run multiple, and they all contribute to the same weights and the same free energy updates. They run independently, apart from sharing the bias, which of course has to be updated so that each copy feels the effects of the other copies. Then the question is how many copies can be run, and how the convergence depends on the number of copies; I'll get back to that in the application. Here's an example, I think with four walkers, though it's a bit difficult to see in these fast movies. You see it converges much faster now, and they all happily move over the landscape to create this smooth potential. This is on the same time scale as the original movie, so it goes much faster; you see it actually goes
more than four times as fast, but I'll get back to that later.

Okay, then the application I would like to show to highlight the method. This is on the protein aquaporin, which, as the name says, allows permeation of water through a cell membrane. It consists of four monomers, each with a channel in the middle. In this case we're interested in a bacterial channel which, depending on the sequence, can also conduct ammonia, so we are looking at slight mutations which make it easier for ammonia to permeate, and we're interested in how that functions; that is what the study referenced here is about. In this channel one has a water chain and a very fancy selection mechanism that allows only water to permeate and no other molecules, except that, in this case, if you change something, NH3, ammonia, can go through. So it's also of interest to understand how the channel selectivity actually works.

As a reaction coordinate we use, rather simply, the z coordinate of the system, the direction through the pore. For water we can use equilibrium simulations, since water is the solvent, so we don't need biased simulations; but for ammonia we use the AWH method. Two notes to make here, which are general points about the method: if you do umbrella sampling, which you could also do, you can block rearrangement of the pore, or in general block rearrangement in the system, because you fix an important coordinate, which makes it more difficult for the rest of the system to cross barriers. You could also force an ion through, as was done decades ago when simulation times were far too short, but that can lead to non-equilibrium artifacts and hysteresis, because you might pull the system out of equilibrium.

Okay, here are some results shown as PMFs. Blue is the original, wild-type aquaporin and red is a double mutant, and then you
can see that on the top we show results for the CHARMM force field and on the bottom for Amber, which differ a bit. Especially for CHARMM you can see that the mutant makes the water chain much more structured compared to the wild type, and you can see quite some differences for ammonia in the way the permeation goes: the mutant is less permeable for ammonia. As you see, you can get very detailed PMFs from these simulations, because with this method the ammonia moves back and forth through the channel and samples very well.

Then the time to solution: here one can see how the convergence goes as a function of time, for different numbers of walkers. With a single walker you get this black dashed line, but with two walkers the time to solution more than halves; it's nearly four times as short, so you get super-linear scaling. We think that's because a single walker goes into one minimum, then moves to the next minimum and hangs around there for a long time, increasing or decreasing the free energy estimate there, so you get time correlations and lags; whereas with multiple walkers this is much less of an effect, since on average they sample different parts of the free energy landscape. For this case you can go up to 32 walkers and still have a gain; 64 seems to be getting worse, which is a bit strange, but there's quite some gain up to quite high numbers. That's our general experience with biomolecular simulations: gains up to quite high numbers of walkers. So this reduces your time to solution significantly and can lead to nice parallelization.

If we look at what the time to solution is: here we set as a target an error in the PMF, from the beginning to the end of the channel, of half a kilojoule per mole. Then we see two curves, for different numbers of walkers. You see initially a quite rapid drop of time to
solution, because of the gain from going from one walker to two, and then the gain gets smaller as you go to more walkers. Here we've also looked at a recent improvement of GROMACS, where, in collaboration with NVIDIA, we have enabled direct communication between GPUs, through NVLink and more, for the communication needed when we compute things on the GPU. This allowed us to use four GPUs per walker instead of just one, so we were able to run this showcase on the whole of the machine Puhti at CSC in Finland, which is one of the few machines with direct links between GPUs, and we can get down to a time to solution of six hours, which would otherwise be far longer with one walker and one GPU per walker.

So then, to conclude: we have a very robust accelerated sampling method here in AWH, which has few parameters, and, as discussed, there are good checks of whether the method actually converges or not, and of whether there are problems, especially with your reaction coordinate. There's only a single parameter that controls convergence, the initial update size, and it's not very sensitive. I've only shown 1D here; there's an example of a 2D case for DNA shown here on the side. It supports multiple walkers, which can give quite high parallelization efficiency, with super-linear scaling when going from one walker to more, which makes it work efficiently in parallel; and you can freely choose the number of ranks to use, which is also nice. As a general note, the challenge is still to come up with a good reaction coordinate; that's always a problem, and it can be a large problem for any system. But at least, if you have such a reaction coordinate, we now have a very good method in GROMACS that can sample along it quite efficiently. With that, I would like to thank you for your attention.