Good afternoon everyone, and welcome to the next edition of the BioExcel webinar series. My name is Rossen Apostolov and I will be today's host. As you may already know, BioExcel is the leading European Centre of Excellence for Computational Biomolecular Research. In our webinar series we feature notable scientists and their work, we invite developers of software applications and tools related to the field of computational biomolecular research, and we present major achievements and results of work done in our centre, which we hope you will find useful and of interest for your own research. If you would like to learn more about the centre and our activities, you can visit our website at bioexcel.eu. Today our guest is Berk Hess, one of the core developers of GROMACS, which you are all very familiar with, and he will present a method he has been developing over the last several years: the accelerated weight histogram (AWH) method for accelerated sampling. Berk is also working in BioExcel, and this method is one of the big outcomes of our work. Berk is a professor of theoretical biophysics at KTH, the Royal Institute of Technology in Stockholm, Sweden, and he has been designing algorithms for GROMACS for over two decades. Currently he is very interested in enhanced sampling methods, one of which he is presenting today, and in studying wetting of surfaces at the molecular scale. Berk, welcome to the webinar series. Thank you, Rossen, for the introduction. Let's see if I can forward the slides. Yes, very good. Okay, so today I'll introduce the AWH method: first how the method actually works, then how to use it in GROMACS, then some example applications, and at the end, as Rossen said, there will be time for questions.
The first thing I should say is that most of the work on this method has been done by my former PhD student, Viveca Lindahl, who developed the method: she took the original accelerated weight histogram method as used in more theoretical physics fields, adapted it for biomolecular simulations, implemented everything, and added several features. She also wrote the reference manual section listed on this slide, as well as several manuscripts using the method. So if you want to read about all the details of the method itself, I refer you to the reference manual and to the publications listed here. But let's start with the kinds of problems where one would want to use such accelerated sampling methods. As most of you will probably know, the free energy landscapes of biomolecules are nearly always high dimensional, and they are often quite rough. A typical situation is that you are interested in two different states of a protein. If you are somewhat lucky, you know the beginning and end states you are interested in, and then the questions could be: what are the paths connecting these two states, which could tell you what the mechanisms are; and, if you know a path, what is the free energy profile between the two states, which might tell you how difficult it is to go from one state to the other. One might even be interested in the dynamics between the two states, which is harder, but can also be investigated with somewhat more effort. A more difficult situation is that you only know the beginning state and want to find some unknown other state of the system, which is an even harder case.
Because these free energy landscapes are rough and have quite high barriers, it can take a very long time if you just run a normal simulation, as many of you will be aware. So if there are ways of accelerating the sampling between states, that could give a major speed-up of your simulations, and you might be able to solve problems you could not solve before, or save a lot of computer time. What is necessary for the method I will present today, as for many similar methods, is some kind of reaction coordinate to investigate the system with: you need some handle on your system. You usually cannot just say "I want to accelerate sampling, period", because then the question is what you actually want to accelerate. There are a few methods that are completely general, like increasing the temperature or temperature replica exchange, but many biomolecules would actually unfold at higher temperatures, so that does not help you in finding paths between the states you are interested in. So to be able to do something you need some kind of reaction coordinate, but that will be very system specific. For most of this talk I will assume you know one or more reaction coordinates, and then the question is how one can accelerate sampling along such a coordinate. That is what we will discuss today, and in particular one method for doing it. A general technical challenge in molecular dynamics simulations is that interesting events often happen on timescales of the order of microseconds to seconds for biological events, and for industrial polymers the timescales can be even longer. So we might need billions of time steps of a very short two femtoseconds, or maybe a bit more if you use virtual sites in GROMACS. That seems a bit of a waste: doing billions of steps just waiting to see transitions between one state and another.
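To put those numbers in perspective, here is a quick back-of-the-envelope calculation, assuming the 2 fs time step just mentioned (the microsecond and millisecond targets are simply example timescales):

```python
# Rough estimate of how many MD integration steps are needed to reach
# biological timescales with a 2 fs time step.
dt_fs = 2.0  # typical MD time step, in femtoseconds

def steps_needed(target_seconds: float, dt_fs: float = dt_fs) -> float:
    """Number of integration steps to cover `target_seconds` of simulated time."""
    return target_seconds / (dt_fs * 1e-15)

print(f"1 microsecond: {steps_needed(1e-6):.0e} steps")  # 5e+08 steps
print(f"1 millisecond: {steps_needed(1e-3):.0e} steps")  # 5e+11 steps
```

So a single millisecond event really does mean hundreds of billions of steps, which is why waiting for it is not an option.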
But if we actually look at what is happening, the event one is interested in, the transition itself, can often be quite fast. So there are options to accelerate things, by smartly using many independent simulations, or by biasing simulations to get more events. If the event itself is fast, then you need a way of generating more of those fast events, which means that in total you could simulate for a shorter time. I will show you both of these aspects today. This can lead to more efficient use of computer time, since you spend less time waiting, and it can also lead to a shorter time to solution, if you are interested in getting an answer within a given amount of time and can run your problem in parallel. This method, and accelerated sampling methods in general, depend on the fact that the actual transitions can be quite fast while you spend a lot of time waiting. This is shown in this very simplified movie, which does not look at all like a biomolecule or a polymer system, but is very representative of many cases one encounters: the particle is stuck in one minimum when the free energy barrier is much higher than the thermal energy kT. You can wait for a very long time, starting out in the left minimum, and you would never see the right minimum unless you can simulate one or many orders of magnitude longer. But if you know how this picture looks, then there are several ways of improving the situation. If you know exactly how it looks, as drawn here, you could just move your system from the left minimum to the right minimum; then you would at least sample the other part, but that still might not tell you how high the barrier in the middle is. So we would like a method that samples the whole range you are interested in.
If you draw it this way and you know everything, then you could very simply say: I can add a bias potential to my system, here the blue potential, or rather its negative, which brings the barrier down exactly to zero. If you could apply such a bias potential, your free energy landscape becomes completely flat and your particle would move freely from left to right. You have of course now modified your landscape, so you will not be sampling according to the distribution you might like to have, the Boltzmann distribution of the original system; but since you know what bias you have applied, you can correct for it. You can exactly correct the weights of your samples for the bias, simply with the Boltzmann factor of the bias potential (or minus the bias potential), so that is rather straightforward. The question then becomes: for a general system, or a general reaction coordinate, how do I come up with a bias potential that works effectively? This is a problem because the bias potential is both the input and the output of the method: if you knew what the bias potential should look like, that is, the actual free energy profile, the black line here, then you would already know your answer and would not need to do anything. In practice you never know this, so the issue is that the bias potential, or the free energy, is both the input and the output of the method. The way to solve such problems, where some unknown is both input and output, is usually an iterative solver, and several solvers for this type of problem are available, which you may or may not be familiar with: for instance metadynamics, adaptive biasing force, the accelerated weight histogram method I will present today, and many more in the literature. Okay, so how does the accelerated weight histogram method work?
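Before getting to that: the correction step just described, reweighting each sample by the Boltzmann factor of the applied bias, can be sketched like this (a minimal illustration of the principle, not the GROMACS implementation; the observable, samples, and bias function are made up):

```python
import numpy as np

kT = 2.494  # thermal energy in kJ/mol at roughly 300 K

def unbiased_average(observable, samples, bias_potential):
    """Estimate the unbiased average of `observable` from samples drawn
    with an extra bias potential V(x): weight each sample by exp(+V/kT),
    which undoes the factor exp(-V/kT) present in the biased ensemble."""
    x = np.asarray(samples, dtype=float)
    w = np.exp(bias_potential(x) / kT)
    return float(np.sum(w * observable(x)) / np.sum(w))

# Sanity check: with zero bias the weights are uniform, so we recover the plain mean.
xs = np.array([0.1, 0.2, 0.3])
avg = unbiased_average(lambda x: x, xs, lambda x: np.zeros_like(x))
print(avg)  # close to 0.2
```

With a nonzero bias, samples taken where the bias pushed the system get down-weighted accordingly, so averages converge to those of the original, unbiased ensemble.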
Here, unlike in many other methods in this field, the central quantity is the target distribution: the distribution you would like to have for the system. From the start you set out what distribution you want along your reaction coordinate, or multiple reaction coordinates, and the method is set up such that, if it converges, you get the target distribution you asked for. An advantage of the AWH method is that the target distribution can be chosen completely freely; it can, but does not have to, depend on the free energy. You could for instance choose a flat distribution, which is often a reasonably good choice (not always), as I show in this very simple example: you flatten out the landscape and get a freely moving degree of freedom on the flat landscape, and hence a flat distribution. But one could also choose a distribution that depends on the free energy, which certain forms of metadynamics have, where you get a distribution at an effectively higher temperature; or you could apply cutoffs, as I will show later. Another nice feature of the accelerated weight histogram method is that the initial convergence is exponential. This cannot continue forever, since, as you probably know, the error in sampling usually goes as one over the square root of the number of samples. So you cannot have exponential convergence forever; but while the errors are much larger than the thermal energy you can have exponential convergence, and later, as the errors get to the order of the thermal energy, you switch to the slower final convergence rate. This is fully automatically controlled, which is very convenient, and then there is only one, uncritical, convergence parameter left in the method.
Okay, so here is a schematic drawing of what the AWH method does. You extend the ensemble with a coupling parameter lambda, which couples to your reaction coordinate; this can also be multi-dimensional. The idea is that you start out with an initial bias, which could be zero if you know nothing, but could also be some good initial guess; then you run a biased simulation; from that you estimate the real distribution, because you know what you are sampling; and then you update the bias, with the idea of converging towards the target distribution. This of course requires some physics and mathematics to work correctly. The idea is that we have efficient updates, and, as I said before, the target distribution to converge to is explicitly included. Given this framework, there are actually not many choices to be made, because everything has to be consistent according to statistical mechanics. If one works this out, one gets the scheme drawn here: one collects samples with a certain bias, say f0 to start with, and then one needs to generate from these samples a new bias, f1. The correction to the bias, from f0 to f1, is given by statistical mechanics, which produces the formula shown at the bottom: the change in free energy is minus the log of one plus the fraction of samples one collected, this delta W, divided by what one expected given the target distribution. This gives a correction of the order of one over the total number of samples, fully consistent with the statistical mechanics of the recorded samples. If you would like to read more about the basics, I suggest the reference provided on one of the first slides, which explains the whole method in detail. But in this formula for the free energy correction there is no choice to make:
this is the only way that provides a consistent bias converging towards the target distribution you have set. So if one knows this formula, it is simply a matter of iterating: collecting samples, correcting the free energy, collecting samples. There are some parameters involved; one can decide how long to collect samples between updates, and we generally update quite often, because there is no reason to postpone it. Strictly speaking one would have to do complete equilibrium sampling in between, long enough to get a fully flat distribution, but in practice that is not really a problem. The next question is how AWH applies the bias we just discussed. The initial setup we had used a harmonic umbrella potential which is moved using Monte Carlo; that is somewhat similar to umbrella sampling, but instead of having many independent simulations with one umbrella in different spots, you move the umbrella around. The more elegant way, which we implemented in the first official release of the method in GROMACS, is a convolution of the Gaussians produced by these umbrellas, the Boltzmann-weighted combination of the umbrella potentials, so to say. This produces a smooth bias potential over the whole landscape, still based on umbrellas at given points: we have a regular grid of umbrella potentials which are convolved into a smooth biasing potential. The advantage, in comparison to metadynamics, is that we always have a fixed grid, so we know how far these umbrellas affect each other and we can simply index them. We can have a very fine grid, and correspondingly a very high force constant (which requires a fine grid), without much extra computational cost, which is convenient; it also makes the processing in the analysis very easy, since the points are equally spaced. Okay, so here is an example of how AWH
works. This is again the simple system with two minima, and you see that the red particle now goes from left to right and builds up this biasing potential in blue, which over time becomes very similar to the black line, the true landscape, and it converges nicely, as you see; if one ran longer, the difference would go to zero. At the top a scaling factor is given: that is how the transition from the initial to the final stage is controlled. This is done by having two different stages in the method. There is an initial stage with constant update size. This is not consistent with the formula I showed before, but initially you have very non-equilibrium sampling: you are pushing the system around forcefully. In this stage, every time we cross the whole range, which is easy to define in a one-dimensional case, and which is marked by the colored crosses in the plot on the right, we divide the update size by a factor of three. As you see in the plot, every time we cross, the update size goes down by a factor of three, which gives exponential convergence initially, because each crossing brings the update size down a lot. But this cannot continue forever, because statistical mechanics tells you that in the end the error should go as one over the square root of the number of samples. So we switch to a final stage with constant update weight; then, as you see in the plot on the right, the update size goes as one over t, which means the error goes down as one over the square root of the number of samples, or the square root of time. The point where the switch is made is exactly where the initial-stage weight would exceed the final-stage weight, which is something you do not want, since the initial weight should certainly not be higher than the final weight, because the initial samples are somewhat
biased due to the non-equilibrium pushing of the system, whereas in the final stage the system moves around freely. So the method and the system automatically find the right point in time, and the right update size, to switch from the initial stage to the final stage, which is very convenient because you do not need to control that; it comes out of the method. If one looks at how the sample weights then change: on the left is the update size, on the right the sample weights; notice it is a log plot. The initial samples are weighted down very strongly, so if you stay in the final stage for some time, all the initial non-equilibrium samples are weighted out completely. One could also explicitly ignore them, but that is not really necessary, since they are weighted out exponentially, as seen in the plot on the right. Okay, so now to some examples of the method. One instructive example is DNA base pair opening, which we have studied. This is an event that happens on the millisecond timescale in experiments, so it is too slow to simulate with molecular dynamics, even with GROMACS, which is very fast. Closing, on the other hand, is orders of magnitude faster, so that is reachable on molecular dynamics timescales. From these two rates one can already see that the open state is ten to the power four times less probable than the closed state, so in experiments one never directly observes the open state; simulations are needed to characterize it, to see what it looks like and what the process involves. What we have done is define a reaction coordinate, which is the central hydrogen-bond distance between donor and acceptor in the base pair of interest; a typical closed and open configuration is shown on the left. Finding the reaction coordinate is
actually usually a difficult problem. In this case it is rather straightforward: if one knows about DNA, one can guess that this hydrogen bond controls the process. In many other applications, finding the reaction coordinate is by far the most difficult problem, but that will not be discussed here; finding a good reaction coordinate is rather method independent. Okay, so how do you do this in practice? The AWH method currently only supports pull coordinates; we plan to extend this in the future to other types of reaction coordinates and more pull coordinate options. So one needs to define pull coordinates, which you might be familiar with. In this case one defines two groups to pull between; I have just named them by residue ID here, but each group is actually only one nitrogen atom in each of the residues, so those are the two atoms we pull between. Then we set up a reaction coordinate which is a distance, and the specific thing for the AWH method is that the pull coordinate type is now not what it might normally be, a harmonic potential or something else: we say that there is an external potential provider, and we have to specify which provider that is, namely awh. Then GROMACS knows that the potential for this pull coordinate comes from somewhere else, and that this is the AWH method. Now we can look at the parameters for AWH itself. This is the simplest setup one can come up with. One has to say that one wants AWH; then we have one bias with one dimension; we set the coordinate index, which is the pull coordinate index, here one, referring back to the pull coordinate. You can link to different pull coordinates, and you can also have multiple biases, or multiple dimensions, which can link to different coordinates. We
have to specify the interval we want to sample: a start and end of the range, which in this case is 0.25 and 0.6 nanometers in this file. Then we have to give the force constant for the harmonic potential, which is an important parameter: it should give an umbrella, or bias, potential with a higher curvature than your free energy landscape, otherwise you cannot control your sampling, so in general you want this to be high. As I said before, having it high is not an issue, since there is not a strong cost to having a fine grid; the only constraint is that you should still be able to integrate your system with the given time step, and in this case we are quite close to that limit. Then the initial update size is controlled by a diffusion constant and an initial error setting; I will discuss those two parameters shortly, and in combination they really control only one quantity. That is the information you need to give, apart from a few standard things, such as how often you want output, which I have left out here. Later I will also show what happens if we want multiple simulations to share information; that is given by the share options, of which there are not many: you can just say that you want multiple simulations to share the bias. So this is what one has to set up in practice. Then we can look at what comes out of the method. Oh, sorry, first we look at the initial update size, since that is the only parameter that controls convergence. There are two parameters, but we are actually setting only one quantity, the initial update size. Since the initial update size is not an intuitive quantity, certainly not to the user, and even I, as an experienced user of the method, would not know what its magnitude should be, we set it
indirectly, by setting a diffusion constant, which tells how quickly the system moves along the reaction coordinate, and an estimate of the initial error. The numbers given here are quite typical: an initial error of five kilojoules per mole, for instance, and a diffusion constant of five times ten to the minus five nanometers squared per picosecond. Luckily these are not very critical: you can vary the initial update size, through these parameters, by quite a lot, and it will not affect your convergence too much, which is a very nice property of the method. If the parameters are chosen such that the update size comes out very small, you get slow convergence; you will probably notice that the system almost does not move, which is not a desired property. If, on the other hand, the update size comes out very large, the system might be pulled apart, which is also not something you want, but that is easy to observe. So both issues are rather easy to spot, and there is quite a big region in between where one can choose the parameters without much effect on the convergence speed. This is all because the initial convergence is exponential, before you go over to the one-over-square-root-of-t regime. There could still be an issue that your system converges slowly because the molecular motion itself is slow, but one can easily figure this out by trying a few values, since the initial phase usually goes quite quickly and you can see what happens. Okay, so what do you get out of the method? There is a tool called gmx awh which extracts different quantities from the energy file, since that is where they are all stored. You get several quantities out of it.
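Before going through those outputs: collecting the options discussed above into one place, a minimal .mdp fragment for such a setup might look as follows. The group names are hypothetical placeholders and the force constant is only illustrative; the range, diffusion constant, and initial error are the values mentioned above.

```
; pull coordinate whose potential is provided by AWH
pull                           = yes
pull-ngroups                   = 2
pull-group1-name               = donor_N       ; hypothetical index-group names,
pull-group2-name               = acceptor_N    ; one nitrogen atom each
pull-ncoords                   = 1
pull-coord1-groups             = 1 2
pull-coord1-geometry           = distance
pull-coord1-type               = external-potential
pull-coord1-potential-provider = awh

; AWH itself: one bias, one dimension, linked to pull coordinate 1
awh                            = yes
awh-nbias                      = 1
awh1-ndim                      = 1
awh1-dim1-coord-index          = 1
awh1-dim1-start                = 0.25     ; nm
awh1-dim1-end                  = 0.6      ; nm
awh1-dim1-force-constant       = 128000   ; kJ mol^-1 nm^-2, illustrative
awh1-dim1-diffusion            = 5e-5     ; nm^2/ps
awh1-error-init                = 5        ; kJ/mol
```

The run input is then prepared with gmx grompp as usual, and the AWH data end up in the energy file, from which gmx awh can extract them.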
One is the potential of mean force, the black line in this plot. Then there is the convolved bias, the red curve, which is quite similar to the black one, but has the convolution with the umbrella potential in it; that means it is slightly smoothed, as you see, with slightly less sharp features than the PMF. That is the bias you actually apply. The deconvolution happens automatically, fully consistently with the method, and you get the PMF out right away. Then, and we will look into this in more detail on the next slide, there are the different distributions: a coordinate distribution, a reference-value distribution, and the target distribution. And there is also something we call the friction metric, which tells you how difficult it is to move the system along the reaction coordinate. This is quite a useful quantity: in this case it tells you that around 0.4 nanometers there is a difficult region, which for the DNA opening is where the hydrogen bond breaks and reforms. When the hydrogen bond is intact, it is easy to move the system, and when it is open, it is also easy to move; the difficult part is finding the pairing again, which is not so surprising, but the method actually tells you this: the friction is high there because, once the base pair is opened, it is difficult to find the hydrogen-bonding partner again. Okay, then we can look in a bit more detail at the distributions that come out. As I said, the AWH method is based on the target distribution; in this case it is chosen flat, the blue line: we say we want a flat distribution along the reaction coordinate. On the left one can see that after 20 nanoseconds the distribution is still quite far off. We
have a coordinate distribution and a reference-value distribution, which again differ by the convolution with the umbrella potential, so the red line, the reference-value distribution, is slightly less sharp than the coordinate distribution. You see that initially, after 20 nanoseconds, the red line does not match the blue line very well, which means you have not converged the sampling yet. In this case the DNA did get opened, but it does not easily go back to the closed state. If one simulates longer, say ten times longer, the distribution gets a lot flatter; not perfect, but a lot better. Notice, though, that the free energy estimate is not biased by the same amount that the distribution is off, because that is taken into account: the answer after 20 nanoseconds is actually quite accurate already, even though the distributions do not match perfectly; it is simply that the system spends somewhat more time on one side than on the other. One detail you can see here, where the arrows are pointing, is that the coordinate distribution has a peak at short distance that deviates from the reference distribution. This is typical of the case where the curvature of the free energy landscape is higher than that of the bias potential you provide. It tells you that if you want these to match more closely, you would need a stronger force constant, which here is getting close to the integration limit. But again, the deconvolution takes this into account, so it does not affect the results much in this case, since the two do not differ by very much; the sampling is still fine. Okay, then we can look at the results one gets. In this case we looked at different force fields, and in the different colors you can see free energy profiles for different base pairs. You can also see that the free energy
difference is quite high, so the sampling is accelerated by several orders of magnitude: if you just waited, you would never see this base pair opening. So the method works as it should. Okay, then we can also use multiple simulations to sample the free energy landscape: there is no reason to limit the sampling to a single simulation, we can run many. Then the question is how many we can usefully run. We run the sampling independently in many simulations, but they share the same bias: the whole biasing machinery is still a single one, but we collect samples from many simulations. An example is given in this movie: there are now many particles (I thought there were twelve, but it is difficult to see how many; maybe there are six), and they sample much faster: they move around freely over the landscape and converge the potential quickly. To analyze how well this works, one has to look at each particular case. Here is a plot for this DNA example with 1, 2, 4, 16 and 32 walkers, and you see that there is a big gain going from one to two, from the black to the red line: that is a factor of four, more than a factor of two, so we actually get superlinear scaling here. This is because a single walker gets stuck here and there, which is unfortunate; so it is actually good to use multiple walkers, since you get a better effect than with a single one. After that the scaling is linear up to 16 or so; at 32 it does not gain much anymore. With this we had a showcase example for BioExcel, where we had 20 different base pair openings in different sequences, and we could show that we could do it in one hour using this ensemble method, compared to 34 hours without the sharing, to get to an error of half a kilojoule per mole. This was run on 614 nodes of the CSCS Piz Daint machine in Switzerland. This sharing is also very convenient if you have a big system and just a single
workstation with a GPU: you can still share, and that will accelerate the sampling quite a lot. A final example is aquaporin. Here we looked at how water permeates the channel versus ammonia; some bacterial versions of aquaporin allow ammonia to pass as well, and we were interested in why this is the case. Aquaporin is a multimeric system of four identical subunits, so there are four pores, and you can run four independent AWH biases, one in each monomer; the method supports that as well. For water you actually do not need any biased sampling, because water is simply present in the channel, so you can directly look at the distributions; but if you want to look at ammonia, you need some biasing. In this case we used AWH. You could use umbrella sampling, but then you actually lock the system in place, so it cannot move: if the ammonia cannot move, the residues around it cannot move easily either. You could also push something through, but then you get non-equilibrium artifacts. Here you see the results, with the CHARMM force field at the top and AMBER at the bottom. In CHARMM, for instance, ammonia has a higher barrier to pass through relative to water, while in AMBER the two are relatively similar; the force fields give different answers here, which makes it difficult to conclude something final. But you see the method works very nicely. For CHARMM, for instance, you see far more structure in the PMF for the water case, which probably makes water less permeable and makes it relatively easier for ammonia to get through, so the selectivity changes in the direction indicated by experiment. In any case, you can get very detailed free energy profiles, as you see. Okay, then finally a comparison with other methods. Here I will just discuss AWH versus PLUMED, which is a very popular plugin, which has also featured in a webinar before, and
It is used together with GROMACS as well. So: AWH is very robust and easy to use, because it controls itself to a large extent and there is only one critical parameter, which is not even so critical to set. AWH is limited in the sense that it currently only acts on pull coordinates, unlike PLUMED, which supports a much larger range of reaction coordinates. But the advantage is that AWH is fully integrated into GROMACS, and therefore AWH, like the existing center-of-mass pulling code, works efficiently in parallel; you can run this on a parallel machine very efficiently, with very little overhead. We would also like to extend the pull code with atomic contacts, which is probably a useful extra feature, and with linear combinations of pull coordinates, or maybe also nonlinear ones. Suggestions for other extensions are welcome here, but it is not our goal to cover all the options that PLUMED provides; for that there is PLUMED, which does a good job of it. But the integration brings a lot of performance advantages, plus the advantages of AWH itself.

Okay, so then finally, to conclude. AWH is a very robust accelerated sampling method. Since it has a target distribution, you can see whether things worked: if you get the target distribution, things worked; if you don't get the target distribution, there is an issue, which might be with your reaction coordinate. If there is a jump in the distribution somewhere, it means there is some issue with your reaction coordinate at that point; you can also see this from the friction metric. Another advantage is that the target distribution can be chosen freely. It could be flat, but consider the multi-dimensional case, which I show here on the right, where we ran AWH with two different reaction coordinates. There you can get a two-dimensional landscape, but then you would want to set a cutoff.
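A sketch of how such a cutoff on the target distribution can be requested in the `.mdp` input; the numeric value is a placeholder (roughly 20 kT is about 50 kJ/mol at 300 K — check the units for your GROMACS version):

```
; Flat target distribution only up to a free-energy cutoff above the minimum
awh1-target        = cutoff
awh1-target-cutoff = 50    ; kJ/mol, placeholder (~20 kT at 300 K)

; Other target choices include:
;   awh1-target = constant          (flat everywhere)
;   awh1-target = boltzmann         (scaled Boltzmann, via awh1-target-beta-scaling)
;   awh1-target = local-boltzmann
```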
Because otherwise, if you ask for two-dimensional sampling, you get two-dimensional sampling in a rectangle, which might not be what you want; with a cutoff of, for instance, 20 thermal energy units (kT), you get sampling only in the region you are interested in. There are several other choices of target distribution as well, which allow you to control the sampling. There is only one parameter that controls convergence, and this parameter is not very sensitive, which is also very nice, and it is easy to check and control. We support one, two, three and four reaction-coordinate dimensions; one or two are quite manageable, three is probably already getting problematic sampling-wise. It supports multiple walkers, as I've shown, and it works efficiently in parallel. But as a final note: the challenge now is still, or maybe completely, the reaction coordinate, because AWH basically fixes the sampling issue once your reaction coordinate is defined. The issue is still coming up with a good reaction coordinate; if your reaction coordinate is bad, then no method will sample that well. So the challenge is still coming up with good reaction coordinates, but if you have one, then AWH is a very efficient and convenient method to work with. Okay, so that's all for my presentation; now it's time for questions.

Thank you, Berk. Can you go to the slide which shows how to ask questions? So now we have an opportunity for the audience to speak to Berk. We have a question by Rajat, so I will try to connect him to audio; let's see if we can hear each other. Hi Rajat, can we hear each other? Yeah, hi Rossen, that was a great talk. Berk, you're on line, please go ahead. Hello Berk. Yes, I'm here; what's your question? I wanted to know how you define large and small values for the diffusivity and the energy, so whether there are a few orders of magnitude, because, let us say, we have a reaction coordinate along which I do not have a good estimate for the diffusivity. Yes.
How low can we go? Well, if you don't know anything, that is easy: you just try some values. In the end there is only one initial update size, so you just tweak one of the two parameters and see what you get. Either it falls apart, or it goes very slowly, and then you can change it; you try some values until you get proper convergence. This often doesn't take much time, because you will see quickly what happens. So if you have no clue, you just try; otherwise you take the default. I think there is a default value which works quite okay for many cases, but trying a few is not a problem, since it doesn't affect the final answer; you can really just see if it works. Playing around with three different values will tell you enough. Thank you, Berk. Rossen, can I ask another quick question? Yes, please. So, Berk, what were the pull groups that you used for the aquaporin example? Sorry? What were the pull groups that you used for the aquaporin example? Oh, the groups; I would have to look at the details. Of course, one is the ammonia, or the water, which we also tried. The other, I think, is most of the pore-lining residues, so probably the groups closest to the pore over a large span. You see, you don't want to pull relative to the outside of the molecule, because then it might react in ways you don't want, so it should be something that is close. It turned out that even that is not too critical in this case, because unless the initial update size is very high, the system won't react too strongly. This might be a problem in other cases, of course; for this case it is not that critical, I'd say. Thank you, Berk. Okay, we have a question by Luca. Let's see, Luca, can we hear each other? Maybe. Yes, I can hear you. There you are; yes, go ahead. Right, so I was wondering if there is a simple way to estimate statistical uncertainties in the free energy
landscape? That's a very good question; I haven't discussed that in this talk. The more advanced a method is, the more difficult it gets to estimate errors. One thing one can do is look at the time evolution, but what we usually tend to do is run a few independent simulations, like four or so, and get the uncertainty from that. That is the safest way to do it. The more information a method uses, and AWH uses quite a lot, the more difficult it is to extract reliable error estimates. So I would usually say: run independent simulations, which you often want to do anyhow for checking. All right, so one can get a rough guess by looking at how the free energy changes over time, but there could be a systematic component in that? We would usually also look at that, but for a final error estimate, for a paper or so, one would want to run independent simulations. Thank you. Which you can, by the way, average again: you can take the error estimate from that, and if you do four simulations, you can take their average to get an even lower error. But then maybe you can also use the multiple walkers for this? No, because those are all coupled. The coupled case converges faster, but it is even more difficult to get the error from that. I have seen this before with, for instance, metadynamics simulations with multiple walkers; there, people talk about error estimates, but you won't know unless you run multiple copies of the simulation, because all walkers contribute to the same quantity, so they are not independent and you can't get an error estimate from that. Same here with AWH: you get lower errors, but you don't know what they are unless you run multiple realizations. Okay, thank you very much. All right, and we have a question by Maria. Maria, can you say something, so we can check whether your audio works? Okay, maybe we don't have an audio
connection with Maria, so I will read her question. Her question is: can the free energy be accurately recovered even in the case of a very high diffusivity convergence parameter D? Is it critical that this parameter is set in a certain range? No; or, well, the answer is yes, it can be accurately recovered. This parameter doesn't affect the final result; it only affects how quickly you converge to the final result. These two parameters, which control the initial update size, only affect the initial phase, so the final result is independent of the diffusivity parameter you set, and of the actual diffusivity of the system; those don't even need to match. It is nice if they match, because then things are easier to understand, but it doesn't really matter. Of course, if your system diffuses very slowly, the error is going to be higher, since the error basically goes as one over the square root of the number of crossings of your interval; if the system has very low diffusivity, crossing the interval takes longer. But that is an inherent issue with sampling; it is not specific to this method. Okay, thanks Berk, and we have a question by Arjun. Let's see whether Arjun's audio is working. Arjun, can you hear us? Probably yes, but... so Arjun hears us, but we can't hear you. You're unmuted; probably the microphone is not working. Okay, I will read his question: would you suggest using this method for a large protein with large open and closed states, using center-of-mass pull coordinates? Well, that's a difficult problem. This method works well in general, but it doesn't make the problem much easier. So I would say yes, it works, but things can be very slow, because the system still has to diffuse: in the end this method biases away the free energy barriers, and your system will then diffuse over the flat energy landscape, if you chose flat as the target distribution. And this diffusion is going to be very slow in the case of a protein with two large domains, so
it'll work, but it will be very slow. Having said that, I don't know of any other good method to tackle this problem, so I would think it is quite hard unless you have a lot of computational time; but this method won't do any worse than other methods, I'd say. It just takes a long time. Okay, thanks Berk. We don't have other questions in the queue. I have one question: would you give any tips, for example some recommendations, about choosing reaction coordinates, which is a challenging step? That is very system-dependent, so that is the critical question in most cases: what is a good reaction coordinate? But for that you have to know your system. If people gave me a particular system I could say something, but it is of course not my job here to provide consultancy on that; this is a hard problem. What I would really like to have is a method that can automate, or help with, the search for better reaction coordinates. But AWH already helps in a certain sense, since, as I said during the presentation, you can see when the distribution does not converge to the target distribution; that is one thing. The other thing is that the friction metric has been very helpful in finding where there are issues in the reaction coordinate. If you see a high peak in the metric and you don't understand why — like in this DNA case, where we knew why it is there: it is because you open and close, or you need to break and re-form, the hydrogen bond, so that is natural — then we might be able to improve the reaction coordinate there. One can actually also, which I haven't presented here, apply a nonlinear transformation of the reaction coordinate, which is in one of the papers, so that you spend more time in the difficult region; that can actually help. So this friction metric can often tell you where there are issues in your reaction coordinate. It might tell you where there is a problem; it doesn't directly tell you how to
improve the reaction coordinate, but it has helped us in many cases to identify issues with reaction coordinates. So at least you can see in which particular area there are issues with your reaction coordinate, and that might help you find issues quicker and come up quicker with better reaction coordinates. Thanks; yeah, it is a challenging problem. There is one request from Arjun that just came in: if possible, to give a tutorial with an example in the future. But what is meant there? I mean, there were examples here; does he mean an interactive tutorial? Right; we actually have an AWH tutorial that we use at workshops, a BioExcel workshop, but we haven't published it yet. I was checking whether it was online, but it isn't yet. So we should go through the one or two tutorials that we have, so that we can publish them and people have a tutorial to play around with. On the other hand, the method is quite simple to use, so if you want to apply it to your system, it is not hard to do just by using the manual. Okay, great; this is probably exactly what he is looking for, and he thanks you for it. Okay, it looks like we don't have more questions, and that is all then, Berk; thank you again for the great presentation. To the next slide: just a heads-up for the community that in November we will have a webinar with Steve Crouch from the Software Sustainability Institute in the UK. It is a very interesting topic, which concerns software development in general: how to build your software, how to maintain it, how to work with the communities around it to ensure that in the long term it is a sustainable product. Some of these ideas have heavily influenced GROMACS development as well, so I would recommend to all of you who develop your own codes to attend this webinar. And subscribe to our mailing list and visit our website; we will have more webinars coming, we are in discussion with the presenters, so keep an eye on
Twitter and on the website as well. Thanks to everybody today, and thank you again, Berk; we'll see each other soon. Bye.