You can start sharing your presentation. Yeah, you can share the screen again. Now it's gone, of course. Okay, so let's try this one. Can you see this? Yes. Okay, so should we go full screen now? Yes, perfect. Right. So first of all, let me thank you all, and especially the organizers, for giving me the opportunity to present some of the work that we have been doing for almost 20 years, or actually maybe even closer to 25, on transition path sampling and its applications to complex molecular processes. I'm going to focus a little bit on biomolecular processes, because that's the theme of the workshop. I prepared a pretty long lecture, but we'll see how far we get. So I will introduce very briefly the notion of rare events and why we need something like transition path sampling in the first place. I will show you how you can actually sample paths, how you can then analyze them, and how you can extend transition path sampling to reaction networks or multiple-state systems. Then I will hopefully also show some advanced developments. Only if I have time will I get to the kinetic constraints part, but I think I won't even get there, considering it's only 15 minutes. Okay, this is a workshop, so there is probably also a pedagogical part; if you have questions or something is unclear, please raise your hand, or maybe even interrupt me. Okay, let's go. So the first thing we are concerned with... okay, sorry, I have a menu here that I don't want to have.
The first thing is that we have molecular dynamics in our simulations, and this is governed by Hamilton's equations, or Newton's equation as you can see here: the acceleration is governed by the gradient of the potential, and this potential is usually approximated by a classical form with bonded interactions and non-bonded interactions, which are usually the Coulomb and van der Waals interactions. The bonded interactions are governed by bonds, angles, and dihedrals, and here you see a nice movie of such a molecular simulation, as you have probably all done previously. Now, classical MD is capable of resolving systems at the atomistic level. Then we can do statistics and obtain free energy landscapes, we can get the stability of structures, if the force field is right of course, and we can even get transition states between stable states. Furthermore, classical MD is also capable of predicting kinetics, and that is important for observables like rates, but also for understanding: you can get realistic information about transitions and even transport properties like diffusive behavior. Now, classical MD has two important sources of error, and I think this has been discussed before: we have the sampling problem, meaning that we need to cover enough of phase space to make a proper estimate of our statistics, and we have the systematic error from the force field. I'm going to focus on the first part, the sampling problem, in this talk. One of the causes of the sampling problem is that current MD on all-atom systems is still limited to below the millisecond timescale; I think Anton is still the record holder here, and most activated events can take much longer. Why is that? Because there are large barriers between states, and this is a picture that must be very familiar to you. So we have a free energy landscape.
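To make the MD propagation concrete, here is a minimal velocity-Verlet integrator for Newton's equation in one dimension. This is my own sketch, not code from the talk; the harmonic potential, unit mass, and step size are arbitrary choices for illustration.

```python
import numpy as np

def velocity_verlet(x, v, force, dt, mass=1.0, n_steps=1000):
    """Integrate Newton's equation m*a = -dV/dx with the velocity-Verlet scheme."""
    traj = [x]
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass      # half kick
        x += dt * v                   # drift
        f = force(x)
        v += 0.5 * dt * f / mass      # half kick
        traj.append(x)
    return np.array(traj), v

# Toy harmonic potential V(x) = x^2 / 2, so the force is simply -x.
traj, v_end = velocity_verlet(x=1.0, v=0.0, force=lambda x: -x, dt=0.01)
# Total energy should stay close to the initial value of 0.5.
energy = 0.5 * v_end**2 + 0.5 * traj[-1]**2
```

The symplectic character of velocity Verlet is what keeps the energy bounded over long runs, which is why it (and close relatives) underlies most MD engines.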
This is the collective variable Q, or the order parameter as it is sometimes called, and here is the free energy on the y axis. You see that there is a stable state A and a stable state B, a transition state in between, and pathways or trajectories of the system can linger in A for a while and undergo transitions to go to B. When you see this as a time series, it will usually look like this: there is a long period where it stays in A, then it suddenly jumps to B, and then it goes back again. This is known as a rare event: the transition itself is quick compared to the molecular timescales, while the system stays relatively long in the stable states. Now, if you have a high-dimensional system, as we usually have in biomolecular systems, a transition state search is actually futile; it is very difficult to find transition states because there are too many of them. What you usually need is some sort of enhanced sampling technique that is able to bring you from A to B without waiting so long. The way to do that is to define a reasonably good reaction coordinate using collective variables and then do some enhanced sampling; I'm sure you have all heard about umbrella sampling, and of course most famous in this community is metadynamics. Now, this is sometimes troublesome, and this is my attempt to show why. Here we have a couple of stable states in a two-dimensional free energy landscape: you see two stable states and one metastable state. But suppose that you only know about the x variable; then you would not even see this third state. Moreover, if you only know about y, the third state is also hidden, and what's even worse, it lies outside the A and B minima. So what happens if you start, for example, in this particular stable state and then move in this direction?
It's not guaranteed at all that you escape into the right well, and it could be even worse: if you only push along y, you go completely in the wrong direction and you will always stay in the same position. This is very similar to the famous cow analogy. Here is a little cow, and of course it's surrounded by flies. The idea is that if you use the position of the flies as a coordinate, you cannot expect the cow to follow; that is of course rather unlikely. So it is not a good reaction coordinate. There is another way of looking at it, in a more complicated landscape like this one, where we again have a minimum here and a minimum there, and the reaction coordinate does not only run along q but also needs to go into q′. So there is a saddle point here, but if you only push in the q direction and the system doesn't relax quickly enough into q′, then you actually get huge amounts of hysteresis, and you end up with the wrong estimate for the transition state. Now, for this type of system it's clear that you need methods that circumvent this problem, and one way to do that is to use a so-called two-ended method, where you create pathways, trajectories, between two predefined states. That is what is known in this case as transition path sampling, and I'm going to focus on that. So I'm going straight to what we actually do. The transition path sampling method is an importance sampling of the so-called rare event path ensemble. It yields a path ensemble that can be analyzed to extract mechanisms, reaction coordinates, kinetic rate constants, and also free energies. Here is a nice cartoon showing how this is done. And just to remind you about the philosophy behind this: it is almost the opposite of what you would normally do.
The TPS philosophy is that you start with sampling the path ensemble, so all the relevant trajectories; from those you extract the mechanism; then, from the mechanism and by sampling more paths, you get kinetic rates; and in the end you actually arrive at a free energy. This is certainly reversed from the normal way of doing things, where you would first do free energy calculations and only look at the kinetics later. Now, of course, you want an exponential speed-up. The rare event timescale is extremely long; if you have milliseconds or even seconds or hours to simulate in real time, this is of course undoable. But in path sampling you focus on these very fast transition paths themselves, and therefore you get an exponential speed-up. This is only possible if there is a separation of timescales, but in many cases that holds. The advantages with respect to other path methods are that you get unbiased dynamics, meaning all trajectories are real molecular dynamics trajectories, you get exact rates, and the promise is that you are independent of collective variables. There are also software packages available, and I want to point out OpenPathSampling and PyRETIS here. Okay, let's dive a little more into the details of the method. We first need to define a transition path probability density that we can sample. To start, we define the path itself. This is the bold X, which has a parameter that is the length of the path, and it consists of a high-dimensional vector for each of the time frames of the system: the system at time zero, time slice one, etc. So this is a discretization in time. Here you see a two-dimensional version of it, with positions r and momenta p, and a time-discretized path like this.
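The factorized path probability described next (initial density times a product of short-time transition probabilities) can be sketched in a few lines. This is my own toy, not from the talk: overdamped Langevin (Brownian) dynamics in a hypothetical harmonic potential V(x) = x²/2, for which the short-time propagator is a Gaussian.

```python
import numpy as np

def log_path_probability(path, grad_v, dt, beta=1.0, D=1.0):
    """log P[x] = log rho(x_0) + sum_i log p(x_i -> x_{i+1}) for overdamped
    Langevin dynamics, whose short-time propagator is Gaussian with drift
    -D*beta*dV/dx*dt and variance 2*D*dt (normalization constants dropped)."""
    log_p = -beta * 0.5 * path[0] ** 2            # Boltzmann initial density for V = x^2/2
    var = 2.0 * D * dt
    for x0, x1 in zip(path[:-1], path[1:]):
        mean = x0 - D * beta * grad_v(x0) * dt    # deterministic drift
        log_p += -(x1 - mean) ** 2 / (2.0 * var)  # Gaussian transition probability
    return log_p

lp_smooth = log_path_probability(np.array([0.0, 0.1, 0.2]), grad_v=lambda x: x, dt=0.01)
lp_jumpy = log_path_probability(np.array([0.0, 1.0, 2.0]), grad_v=lambda x: x, dt=0.01)
```

The smoother path has a far higher log-probability: large jumps between slices are exponentially suppressed by the Gaussian propagator, which is exactly why unreactive wiggling dominates the unconstrained path ensemble.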
On that path you can then define a path probability, by taking a configurational probability for the first time slice, depicted by this circle here, so a density for the first time slice, and then short-time Markovian transition probabilities to go from the first slice to the next, and so on along the path. This is the initial distribution, usually a Boltzmann distribution, and this is a short-time Markovian propagator, which is usually either an MD propagator or, for example, Langevin dynamics; you can also use Brownian or Monte Carlo dynamics. That is the path probability. Now, the next step is to identify the stable states, and we usually do that with indicator functions. Here is a complex, high-dimensional free energy landscape again. The idea is to identify within that landscape some stable states, and they should be fairly stable, so at low free energy. You define each of them using an indicator function: if the configuration is in A, the indicator function is one, and otherwise it is zero. Then you can define this path probability distribution as a restricted or constrained distribution, where P[x] is the same path probability as before, you put the two indicator functions on the first and the last time slice, and then of course you also normalize, otherwise you don't get a distribution; the Z here is a normalization factor, similar to a partition function. Now, if you have a probability like this, you can set up an importance sampling: you can construct a Markov chain by proposing a trial move and then accepting this trial move with a Metropolis-Hastings criterion like this. And it turns out that you can write this as follows.
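The role of the two indicator functions can be shown with a tiny sketch (my own, with hypothetical 1D state definitions): a path belongs to the restricted A-to-B ensemble only if its first frame is in A and its last frame is in B.

```python
# Hypothetical 1D state definitions: A is x < -1, B is x > 1.
h_A = lambda x: float(x < -1.0)
h_B = lambda x: float(x > 1.0)

def is_reactive(path):
    """A path contributes to the A->B transition path ensemble only if the
    product h_A(x_0) * h_B(x_T) in the restricted distribution is nonzero,
    i.e. the first frame lies in A and the last frame lies in B."""
    return h_A(path[0]) == 1.0 and h_B(path[-1]) == 1.0

print(is_reactive([-1.5, 0.0, 1.5]))   # starts in A, ends in B: reactive
print(is_reactive([-1.5, 0.0, -1.5]))  # returns to A: not in the ensemble
```

Everything else about the path probability is untouched; the indicators simply zero out all non-reactive paths, which is what makes the ensemble "rare event" specific.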
First of all, because of the form of the indicator functions, the h_A and h_B come out of the min function, so you always need to obey the constraints. But there is this complicated fraction to worry about, with the path densities of the new and the old path and also the generation probabilities of the old and the new path. This sounds almost impossible, but the magic of TPS is that it all becomes much simpler if you use a generation move based on the same underlying dynamics. And I see now that I actually changed this slide by coincidence. So what we do is create a new path, not by a general proposal that just moves a configuration around, but with a so-called shooting move. The shooting move is a special move that works as follows: if I have an initial pathway, the solid line here, I select a random slice on it, say this point here, and I change this slice a little bit; in practice we only change the momentum slightly, by a small delta p. Then I integrate my equations of motion forward in time, and also backward in time, and I do this until I reach the end of the pathway, or until I reach a state; I'll come back to that later. If this is a valid pathway, you can accept it; if it's not valid, you reject it. So this becomes extremely simple. Why does it become so simple? The generation probability, as I told you, is the probability to select a certain frame x(τ′) and change it a little bit. Then you have a forward shot, which is given by the propagator, but you also have a backward shot, which is the time-reversed move.
What do you do in the reverse move? You reverse all the momenta and integrate backward in time. It turns out that backward integration is the same as a momentum reversal followed by forward integration. Now, if you assume a symmetric generation probability, so this part is symmetric, the only thing you end up with, once you fill in all these generation probabilities, is this large fraction with the generation probabilities, the acceptance probabilities, and the path probabilities. Then you use the fact, you don't assume it, you know it, that the system obeys microscopic reversibility. This means that if you go from a point x to a point y, and you divide the transition probability for this particular transition by that of the reverse process, the ratio must be equal to the ratio of the Boltzmann distributions. This is a property of the dynamics in phase space. If you plug this in here, everything cancels; this is of course the lucky part of why TPS works. You only end up with this particular acceptance ratio, which, for constant energy at the shooting point, even reduces to this very simple expression. So, long story short, what you do in a standard TPS algorithm is: choose a random slice, change its momentum slightly, and integrate forward and backward, so you get a trial path. This trial path here is obviously not correct because it goes back to the initial state, so it is rejected, and the acceptance step takes care of that; then you do this again and again, you calculate averages over this path ensemble, and you repeat this indefinitely. Now, this standard shooting algorithm is only valid when you don't change the energy, the Hamiltonian, of the slice that you chose; but of course you do want a finite delta p to get a new path.
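The full shooting cycle just described (pick a slice, perturb the momentum, shoot forward and backward, keep the trial only if it connects A to B) can be sketched end to end. This is my own minimal toy, not production TPS code: Langevin dynamics in a hypothetical double well V(x) = (x² - 1)², with flexible path length (segments stop as soon as they enter a state) and a simple accept/reject on the endpoints.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy double well V(x) = (x^2 - 1)^2 with states A: x < -0.8 and B: x > 0.8.
force = lambda x: -4.0 * x * (x * x - 1.0)
in_A = lambda x: x < -0.8
in_B = lambda x: x > 0.8

def shoot_segment(x, v, dt=0.01, gamma=1.0, beta=3.0, max_steps=5000):
    """Integrate Langevin dynamics until a stable state is reached
    (flexible path length), returning the visited (x, v) frames."""
    frames = []
    sigma = np.sqrt(2.0 * gamma * dt / beta)      # thermal noise amplitude
    for _ in range(max_steps):
        v += dt * (force(x) - gamma * v) + sigma * rng.standard_normal()
        x += dt * v
        frames.append((x, v))
        if in_A(x) or in_B(x):
            break
    return frames

def two_way_shot(path, dpv=0.2):
    """One shooting move: pick a random slice, perturb its momentum, shoot
    forward and backward (via momentum reversal), accept only A->B trials."""
    i = rng.integers(1, len(path) - 1)
    x, v = path[i]
    v = v + dpv * rng.standard_normal()           # small momentum change delta p
    fwd = shoot_segment(x, v)
    bwd = shoot_segment(x, -v)                    # backward shot = reversed momenta
    trial = [(xb, -vb) for xb, vb in reversed(bwd)] + [(x, v)] + fwd
    if in_A(trial[0][0]) and in_B(trial[-1][0]):
        return trial, True
    return path, False                            # rejected: count the old path again

# Seed path: a crude A-to-B interpolation just to start the Markov chain.
paths = [[(x, 1.0) for x in np.linspace(-0.9, 0.9, 40)]]
n_accepted = 0
for _ in range(50):
    new_path, ok = two_way_shot(paths[-1])
    paths.append(new_path)
    n_accepted += ok
```

Note the rejected-move bookkeeping: the old path is appended again, exactly as in Metropolis Monte Carlo, so ensemble averages remain correct.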
And it works because you do change your path a little bit each time. At the barrier, paths are more likely to commit to either of the stable states, whereas if you shoot from inside one of the stable states you rarely get a new transition; so you select shooting points that are close to the transition state. You can actually devise many different sampling algorithms, and one of the most important for biomolecular systems is the so-called one-way shooting algorithm. It works like this: you start with a path, the black path here, you select a frame, and now you only shoot one way, either forward or backward. The reason is that with deterministic dynamics, two-way shooting has a fair chance of staying close to the old path, so you have a high acceptance ratio. But for a diffusive path, where the friction is much higher, this is not necessarily the case, and you end up with a very low acceptance ratio. So instead of two-way shooting you do one-way shooting, which means you only shoot forward, then in the next iteration you probably shoot backward, and after a few iterations you end up with a completely new path. Now, this only works if you sample enough, so you need some sort of check, and this check is often done with decorrelation pathways, or decorrelation trees. You start up here and you do all these shooting moves; the number of MC steps, the shooting steps, is indicated here, forward moves in red and backward moves in green, and you can see how this evolves over time. In the end you have sampled completely different pathways; this is called the decorrelation of paths. We have to be a bit careful, because the word decorrelation is a little overused here: what it means is that the new path has no frames in common with the previous path.
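That operational notion of decorrelation (no frames in common with the earlier path) is easy to check programmatically. A minimal sketch, with hypothetical (x, v) tuples standing in for trajectory frames:

```python
def is_decorrelated(path_a, path_b):
    """Two paths are 'decorrelated' in the one-way-shooting sense used here
    when they share no frames: every frame has been regenerated by shooting."""
    return len(set(path_a) & set(path_b)) == 0

old = [(-0.9, 1.0), (0.0, 1.2), (0.9, 0.8)]
partial = [(-0.85, 0.9), (0.0, 1.2), (0.9, 0.8)]   # still shares two frames
fresh = [(-0.95, 1.1), (0.05, 0.7), (0.85, 0.9)]   # no frames in common

print(is_decorrelated(old, partial))   # still correlated
print(is_decorrelated(old, fresh))     # fully decorrelated
```

In a real one-way-shooting run you would track, for each path in the Markov chain, how many moves it took since the last fully regenerated path; that count is what the decorrelation tree visualizes.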
Now, this is only one example of a shooting algorithm, and in fact we now have many shooting algorithms; here is a very recent paper that reviews all the different approaches. I'm not going to go into too much detail, but I do see a hand raised. Please go ahead. Hi, yes, can you hear me? Yes. Okay, I was wondering what is on the x axis of the decorrelation pathway plot; I've never seen that graph. This one? Yeah, it's a complicated graph, good question. On the y axis is Monte Carlo time, but not only Monte Carlo time, and on the x axis is the simulation time. So here is the initial path, or actually I should start here: this is the initial path, from time zero to time T, the last time slice. When you move to the next pathway, you have a different path, and it can be longer; this is of course a feature of shooting moves with flexible path length. The reason we use flexible length is that it is much more efficient than always integrating to the end of fixed-length trajectories. Okay, is that a bit clearer? It indicates how long you need to reach one of the stable states, then? Yes, exactly; the longer it is, the longer you have to wait before you enter the stable states. In the original versions the length of the path was always fixed, but we found out over the years that a flexible path length is very much more efficient, because you don't have to keep simulating after you have already ended up in this region. Okay, I'll go further now. Are there more questions? No? Okay. For some reason it doesn't want to go to the next slide. Yeah. All right, one important thing is how we define states. Here is a slide that explains what you should do. Say you have these two basins, two stable states; the x and y here are different order parameters or CVs.
If you use two different CVs for the definitions of the two states, this can actually lead to trouble. If you're here, in this dark yellow region, you are in both A and B at the same time; this is obviously not good, so there's a big cross here. Here is a similar situation that you don't want: if you define your yellow block as state A, but there is some overlap with the basin of B, that is asking for trouble, so this is also not good. And here is another similar situation, where you think you have already reached B, but you have not, and you actually go back to A; this is also not what you want. What you do want is the last version: you want to make sure that configurations are in A or in B, but there should be no overlap between the basin of A and the stable-state definition of B, and vice versa. This means you have to be careful when defining the states, and the rule of thumb is that you want to be as strict as possible, meaning that you should be certain that once you enter a stable state, you also stay there. That is the rule of thumb you need to obey. Okay, so this was a long introduction; sorry for the length of it. Over the years there have been many applications, and here is a short overview: applications to chemical reactions in solution, glass transitions, micro-phase separations, enzymatic reactions, reactions in lipid membranes, nucleation problems, and also biomolecular conformational changes. I'm going to talk a little bit more about these applications, and I'm going to show you a long example, the famous example of the photoactive yellow protein.
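The "no overlap" rule above can be turned into an automated sanity check before any sampling is run. A minimal sketch with hypothetical box-shaped state definitions on a 2D CV grid:

```python
import numpy as np

def states_are_disjoint(h_A, h_B, samples):
    """Sanity check for state definitions: no sampled configuration may be
    assigned to both A and B at once (the overlap failure mode above)."""
    return all(not (h_A(p) and h_B(p)) for p in samples)

# Hypothetical states defined as boxes along one CV in the (x, y) plane.
h_A = lambda p: p[0] < -0.5
h_B = lambda p: p[0] > 0.5
grid = [(x, y) for x in np.linspace(-1, 1, 21) for y in np.linspace(-1, 1, 21)]

ok = states_are_disjoint(h_A, h_B, grid)          # disjoint definitions pass

# An overly greedy B definition that reaches into the A basin fails the check.
h_B_bad = lambda p: p[0] > -0.75
bad = states_are_disjoint(h_A, h_B_bad, grid)
```

In practice one would run this check over frames from equilibrium runs in each basin, which also catches the subtler failure mode where a "state" boundary is crossed by fluctuations that later recross.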
And how we actually conduct these transition path sampling simulations, and how we think about extracting information from them. Okay, so a little bit of background. This is a small protein, the photoactive yellow protein. It comes from a bacterium, and it is a signaling protein that warns the bacterium against harmful UV light: it gives a signal and the bacterium can respond by swimming away. The question is: how does it do that? Well, it has a chromophore, p-coumaric acid, which is pointed out in this yellow box, and which is of course capable of absorbing light. This is embedded in a hydrogen-bonded pocket, a binding pocket I should say, where the chromophore, in yellow here, is held in place by this glutamic acid 46 and a tyrosine, and the chromophore is covalently bound to the protein backbone. Okay, so that's the machinery of the protein. Then there is a photocycle, which has a ground state that is yellow; that's why it's called the yellow protein. The protein absorbs a blue photon, and the whole thing shifts to the red, a state called pR. What happens then is a cis-trans isomerization, or from trans to cis I should say. Because of that, a proton transfer follows, and you get partial unfolding into a signaling state called pB, which is blue-shifted. Then there is signal transduction and ground-state recovery. What we tried to find out is how this mechanism for amplifying the signal actually happens. We studied two steps: the proton transfer, and the partial unfolding. For the proton transfer we first did some QM/MM simulations; this is fairly old work, using CPMD in its QM/MM version with the BLYP functional and an old GROMOS force field.
What you can see here is the proton clearly being transferred. Maybe I should show it again. So at the beginning, this is the p-coumaric acid again, with the chromophore here in yellow; the tyrosine is hydrogen-bonded to the chromophore. What you can see is that there will be a proton transfer over here, and there you go, you can see the proton; there it went. The donor was the glutamic acid. So this is how we can identify the transition mechanism. This uses very short paths, but you should keep in mind that this reaction time is on the microsecond level, so we have a separation of timescales of about a million. That makes complete sense, but it also shows you that you can speed up this realistic calculation quite a bit. Well, we didn't linger too long on this part; I'm just showing that we can do a proton transfer. In later work we looked at the partial unfolding. You start with this proton-transferred state, where the proton is already on the p-coumaric acid, on the chromophore; then there are a couple of intermediate states, dubbed Iα and so on, and then there is a fork in the mechanism, where the glutamic acid moves into the solvent, and then you get partial unfolding of the binding pocket. We can actually do transition path sampling of all these intermediate steps. Here you see some statistics: each of these steps needed a separate path ensemble, and we have paths on the order of hundreds of picoseconds to nanoseconds; in this study we could do a few hundred of them, maybe not enough, but we do get decorrelated paths. This was already an old study, I think ten years ago now, where we had already done a couple of microseconds of MD, and at that time that was quite a bit.
Okay, so here is the movie again. What you can see here is the proton transfer and then the solvation of the glutamic acid. At the end you see the solvation of the chromophore. These are the steps you can visualize in this particular way. Now, the question is: if you have this information, what can you do with it, and how do you extract information about the reaction coordinate? For that we devised a method that makes use of the so-called committor. The committor is a high-dimensional function: it is a probability as a function of a certain configuration; it can also be defined on a phase space point, but here it is defined on a configurational point, called x. If you initiate a trajectory from x with randomized velocities, it has a certain probability to end in B rather than in A, and this probability is called the commitment probability, or the committor. Now, if this probability is one half, meaning it is equally likely to go to B as to A, you can call this a transition state; in fact, this is the operational definition of a transition state. In this way, if you take all the pathways that you collect and screen them for points where the committor is one half, which could be over here, you get the so-called dividing surface. The intersections of the transition pathways with this committor-one-half surface form the transition state ensemble, the TSE. This is very useful, because this committor-one-half surface is the embodiment of the reaction coordinate: it embodies what distinguishes the reactant from the product state and how you actually go from one to the other.
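The committor definition translates directly into a shooting-style estimator. As a sketch (my own toy, not the protein system), take a symmetric 1D random walk between absorbing states A (site 0) and B (site L), for which the committor is known analytically to be p_B(k) = k/L:

```python
import numpy as np

rng = np.random.default_rng(7)

def committor_estimate(k, L=10, n_shots=2000):
    """Estimate p_B(k): the fraction of trajectories started from site k
    that reach site L (state B) before site 0 (state A)."""
    hits_B = 0
    for _ in range(n_shots):
        x = k
        while 0 < x < L:
            x += 1 if rng.random() < 0.5 else -1   # unbiased step
        hits_B += (x == L)
    return hits_B / n_shots

p_mid = committor_estimate(5)     # exact committor of the symmetric walk is 5/10 = 0.5
p_near_B = committor_estimate(8)  # exact value is 8/10 = 0.8
```

A configuration with an estimate near one half is a member of the transition state ensemble; in a molecular system each "shot" is a short MD run with freshly randomized velocities, so this is by far the most expensive part of a committor analysis.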
There have been many attempts to do reaction coordinate analysis based on this. I'm not going to talk about most of them, but I would like to point out the one that we use most, the one that was developed by Baron Peters and Bernhardt Trout a while back. The idea is that we know the committor is the optimal reaction coordinate; the problem is only that it is a very high-dimensional function, and it is very difficult to gain insight from it. So usually we need some sort of dimensionality reduction: you find the low-dimensional parameter combination that best represents the committor. The way you do this is by interpreting each TPS shooting attempt as part of a committor calculation. Suppose you have done all the transition path sampling: then you know precisely where each shooting point was and where the shot ended, and each of these is a committor attempt. What you can then do is use this information to optimize a reaction coordinate model, denoted r, as a function of a number of collective variables q. You write down a likelihood maximization: the likelihood of a committor model as a function of r, which is based on the variables q, which in turn are evaluated at the exact coordinates of the shooting points, for the shots that go to B versus the ones that go to A. You can visualize this as follows: the red points here go to B and the green points go to A, and you can clearly see that there is a separation. The vector, the variable combination, that best represents this separation is indicated here, and it is the one that reproduces the data in the best way.
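The likelihood maximization can be sketched with synthetic data. This is my own minimal version of the Peters/Trout-style scheme, not their code: the committor model is p_B(q) = (1 + tanh(a0 + a·q))/2, and its parameters are fitted to binary shooting outcomes by gradient ascent on the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_reaction_coordinate(q, went_to_B, n_iter=3000, lr=0.5):
    """Maximize sum_B log p_B + sum_A log(1 - p_B) over shooting-point
    outcomes, with the committor model p_B(q) = (1 + tanh(a0 + a.q)) / 2."""
    n, d = q.shape
    X = np.hstack([np.ones((n, 1)), q])       # prepend the constant a0
    y = went_to_B.astype(float)
    w = np.zeros(d + 1)
    for _ in range(n_iter):
        p = 0.5 * (1.0 + np.tanh(X @ w))      # model committor at each point
        w += lr * (X.T @ (y - p)) / n         # gradient of the log-likelihood
    return w

# Synthetic 'shooting points': the outcome is governed mainly by the first CV.
q = rng.normal(size=(400, 3))
went_to_B = (q[:, 0] + 0.1 * rng.normal(size=400)) > 0
w = fit_reaction_coordinate(q, went_to_B)
p_model = 0.5 * (1.0 + np.tanh(np.hstack([np.ones((400, 1)), q]) @ w))
accuracy = np.mean((p_model > 0.5) == went_to_B)
```

The fitted weight on the first CV should dominate the two irrelevant ones, which is exactly the kind of ranking used below to pick the few order parameters (out of 78) that matter.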
You can do that for the helix unfolding, the first step in the partial unfolding of PYP, where we took 78 order parameters, and we found that these three or four were the most discriminating: the RMSD of the alpha helix, which is just a proxy for how intact the helix is; the number of waters around the tyrosine; the distance between a nearby residue and the proline; and another distance, a hydrogen bond involved in the helix; they are indicated here as well. You do the likelihood maximization and find, in this case, that this combination is the description that best describes the transition, and you can then further test that with a committor analysis. We can do the same for the solvent exposure transition. Here is the exposure of the glutamic acid to the solvent; that is the first step. But as I said, there is a fork in the mechanism: it could also be that the chromophore goes out first, but this turned out to be rather unproductive, in the sense that the barrier of the next step would be very high, and most of the flux goes through the other path into the final state. We can also do the reaction coordinate analysis here, and we find that the first transition is mainly determined by the distance between the chromophore and the glutamic acid, whereas the other transition has a more complicated reaction coordinate. But this shows you that you can analyze these fairly complicated transitions using path analysis, the committor analysis of the path ensemble. It's important to note that the rate-limiting step here is about 16 kT, which is about a millisecond. Okay, here is a nice visualization of the entire transition path ensemble that we figured out. Okay, I see that I'm already over time.
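The quoted numbers (a 16 kT barrier corresponding to roughly a millisecond) can be sanity-checked with a simple Arrhenius-style estimate. The 1/ns attempt frequency below is my own order-of-magnitude assumption, not a number from the talk:

```python
import math

barrier_kT = 16.0            # rate-limiting barrier quoted in the talk
attempt_frequency = 1.0e9    # assumed prefactor of ~1/ns (order of magnitude)

rate = attempt_frequency * math.exp(-barrier_kT)   # Arrhenius-style rate, s^-1
timescale = 1.0 / rate                             # waiting time, seconds
# timescale comes out around 10 ms, i.e. the millisecond regime quoted above.
```

The exponential factor exp(-16) is about 1e-7, which is also the separation of timescales that makes direct MD impractical and path sampling so effective here.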
I've been talking for a long time; it's actually pretty hard to tell from this side of the screen. I'm thinking about what I should do now, because there's quite a bit of material to cover. I'm going to speed up a little bit; maybe, let's say, 10 minutes more, something like that. Also, Peter, maybe you can take a break? We can take a break. Well, otherwise talking for two hours in a row is a very long thing. Yeah, that's what I'm saying. So maybe we should have a short break, and maybe already take some questions. Yes, I think that's a good idea, because then we can also skip a few things, maybe. Okay, go ahead; any questions up to now? I was wondering about the state of automation. Can you hear me? Yeah, very low. How about now? Yes, slightly better. Better now. Great. I was wondering about the state of automation of all these steps, since that last paper you showed us was from 2010; apparently there's a consortium, OpenPathSampling. Are tools being developed to automate these processes? Yeah, very good point, indeed. This was actually an old paper; of course there is newer work as well. But what you want is indeed a software code or package that can deal with this, and that is indeed done by the OpenPathSampling code. And now I have to share this again; how do I do that? Sharing. Working. Can you see this or not? Yes. Okay. So this is a slide which shows you what OpenPathSampling aims to do. It is a Python library for the path sampling itself. At the moment it works with GROMACS and OpenMM, and we also have LAMMPS support. It basically uses, at this moment, MDTraj-type functionality for analyzing trajectories. And it allows you to...
It allows you to define stable states and trajectory ensembles, and even set up all kinds of reaction networks. It can also calculate rate constants and do analyses like path densities, and analyze the reaction mechanism. Okay, so that's OpenPathSampling. By the way, can you apply this to ligand binding? Yes, sure — of course you can. But what you always have to take into account is that you need to be able to define stable states, and you should be sure that the pathways between the two states do not become indefinitely long. Okay, I think we can talk about this more later. I'm also not sure what I should do next — so maybe the chair can help me a little here: what is the plan now? Otherwise there are maybe ten more minutes of talk. Yeah, and then we stop for a break around 11:30. Okay, all right. Talking about ligand binding: I don't have an example of ligand binding as such, but I'm sure you can do this. We looked at protein dissociation, which is not really ligand binding, but there we can actually sample pathways — let me show this for a second. Okay, the screen sharing is really struggling. So here is an application to protein dissociation. Here you can see a nice transition of how the dissociation works, and you can do this with TPS. You can already see that the system lingers for quite a while in a kind of metastable state before it really dissociates. You can actually generate hundreds of trajectories here, because these are very diffusive trajectories, as you might imagine. And when we analyze this, we can see that the dissociation occurs by different mechanisms.
We termed them aligning, hopping, and sliding trajectories, which indicate different ways in which the two proteins connect to each other. You can understand the nature of this mechanism and identify transition states in exactly the same way as I already pointed out: by analyzing reaction coordinates, you can find out what the important coordinates are. In this case, for the sliding paths, for example, it turns out that a crucial salt bridge is involved in the mechanism. Of course, identifying such a coordinate is probably not enough — you also want to really look at what these transition states look like. In some of these transition states you can identify this salt bridge, which really plays an important role, and in others you can show that there are bridging waters, and these bridging waters actually make the protein more mobile, so it can slide easily over the surface and even roll over the surface of the other protein. Okay, any questions? Sorry, I'm a little bit confused now — should I take some more questions? Actually, there is one more question here in the chat. I can read it for you, or you can read it yourself. In the last lectures, we learned some ways to accelerate the simulation process. I was wondering if there would be a way to do so here by reducing the number of unsuccessful transition attempts. Maybe the person who asked can explain the question a little. Are you talking about the number of rejections in the path sampling steps? Yes, exactly. Okay, yeah.
So you must realize that the way transition path sampling works is as a Monte Carlo algorithm: it creates a Markov chain in path space. And for any Monte Carlo method you need some rejections, because if you make changes to the system that are too small, you can have 100% acceptance, but you don't get anywhere. This is the eternal dilemma between exploration and exploitation: either you make big steps and are never accepted, or you make too-small steps and are always accepted but never get anywhere. So there's a fine line, a Goldilocks zone, where it's just right — where you have sufficient acceptance and sufficient exploration. The way to think about this is that your acceptance should usually be around 40 to 50%. In practice we get a little lower than that, because, depending on the type of shooting algorithm — and especially for the diffusive pathways that we have here — you cannot expect much higher than 50%, basically. So the answer to your question is: if you had a way to reduce the number of unsuccessful transition attempts, that would be great, but because we're using this shooting algorithm, we are already pretty high up in the acceptance. Now I should say that this is not completely correct, because we have assumed an unbiased, uniform shooting-point selection, where you choose shooting points from anywhere along the pathway, and this can be quite detrimental: if your pathway is long and a substantial part of it lies inside the stable states, then clearly your acceptance will be lower. So one of the important points is that you choose the right algorithm — that's why I said in the beginning that there are many shooting algorithms, and a good algorithm is important.
One of the ones we now usually use is the so-called spring shooting — I skipped all of that — because it turns out that if you have a diffusive process and you use uniform one-way shooting, you get extremely bad decorrelation. The reason is that if you have a system like this, with a barrier, a very strong stable state, and then a large plateau before you actually reach the final state, then all the trajectories that start over here in this regime give you pathways like this in the tree, which really don't help — they don't show any decorrelation. What you really want is to focus your shooting on this part around here. So how do you do that? Well, one way is to devise new methods, and one such method is the spring shooting method, where you propose new shooting points based on the old shooting point by shifting it a little. This is akin to what aimless shooting does, but that is probably leading too far. Okay, to be honest, I wanted to say a lot more, but I found it extremely difficult — normally if I give a pedagogical lecture, it is based on existing courses. So I'm trying to figure out what to say now. I think it's important to show a little further what we can actually do, and that is to go to rate constant calculations — if I may, Angelo, is that okay? Yeah, okay. So, we have now discussed how you can sample path ensembles and how you analyze them, but we haven't gotten to the transition rate constant yet. Many years ago we also looked into that — it's also part of the first transition path sampling papers — but I want to show you a slightly different approach, which is transition interface sampling.
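To make the shooting-move discussion concrete, here is a minimal toy sketch of two-way shooting TPS on a 1D double well with overdamped Langevin dynamics. It uses a deliberately simplified acceptance rule — any trial path connecting A and B is accepted, omitting the path-length correction needed for strictly correct flexible-length sampling — so it illustrates the acceptance-ratio behavior discussed above rather than a production algorithm.

```python
import math, random

random.seed(2)
dt, beta = 1e-3, 3.0                              # time step, inverse temperature

def force(x):
    return -4.0 * x * (x * x - 1.0)               # from V(x) = (x^2 - 1)^2

def in_A(x): return x < -0.8
def in_B(x): return x > 0.8

def segment(x):
    """Run overdamped Langevin dynamics until a stable state is reached.
    Backward segments are generated with the same dynamics, which is a
    common simplification for time-reversible overdamped dynamics."""
    seg = [x]
    for _ in range(20000):
        x += force(x) * dt + math.sqrt(2.0 * dt / beta) * random.gauss(0.0, 1.0)
        seg.append(x)
        if in_A(x) or in_B(x):
            break
    return seg

def shoot(x):
    """Two-way shooting from x: glue a backward and a forward segment."""
    back, fwd = segment(x), segment(x)
    path = back[::-1] + fwd[1:]
    if in_A(path[-1]) and in_B(path[0]):          # orient the path A -> B
        path = path[::-1]
    return path

# Generate an initial transition path by shooting from the barrier top.
path = shoot(0.0)
while not (in_A(path[0]) and in_B(path[-1])):
    path = shoot(0.0)

# TPS loop: accept a trial path only if it again connects A and B
# (simplified; the exact detailed-balance correction is omitted here).
n_trials, n_accepted = 200, 0
for _ in range(n_trials):
    trial = shoot(random.choice(path))            # uniform shooting-point pick
    if in_A(trial[0]) and in_B(trial[-1]):
        path, n_accepted = trial, n_accepted + 1

acc = n_accepted / n_trials
print(f"acceptance ratio: {acc:.2f}")
```

Note how shooting points picked near the path ends (inside or next to the stable states) almost always produce rejected trials — exactly the effect that biased selection schemes like spring shooting are designed to avoid.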
This is based on the notion that you can create a very long MD trajectory and label every point by the state the trajectory last visited. I'm sorry, am I sharing the screen? Sorry about all of this. Okay, can you see something now? Yes, now we can. All right. So here is a picture of a very long trajectory, and I colored it according to the state the trajectory was last in. If you do this, it is the perfect way to count transitions, and you don't have the problem of recrossings: all these excursions that leave a state and come back to the same state are just recrossings, not transitions. Only if you really go all the way to the other state are you sure that you are committed to it, and then you have crossed — that is the rare event. If you count this way, you are sure that the rate you calculate is exact. This goes under several names: in the original paper we called it the overall-states definition, but in the literature it is also known as the core-set definition. What it does is that you only count a transition if you really enter a new state. Now, if you cut off all the parts that are in state A and state B, you end up with all these little loops, and of course there are many more loops close to A than there are paths that make it all the way across. So what you can do is calculate the flux that you are really interested in — the pathways going from A to B — by introducing a set of interfaces. You make a foliation of phase space, with these curved interfaces here, each parameterized by a parameter lambda, indexed from 0 to n. And on a particular interface, like this green one, I can define pathways.
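The overall-states (core-set) counting just described is simple to implement: assign a label only when the trajectory is inside a core, and register a transition only when the label changes to the other core. A minimal sketch with a hand-made 1D trajectory (the core definitions and data are illustrative only):

```python
def count_transitions(xs, in_A, in_B):
    """Count A->B and B->A transitions with the overall-states (core set)
    definition: a transition is registered only when the trajectory enters
    the *other* core; excursions that return to the same core
    (recrossings) are not counted."""
    counts = {"A->B": 0, "B->A": 0}
    last = None                      # last core visited
    for x in xs:
        here = "A" if in_A(x) else ("B" if in_B(x) else None)
        if here and here != last:
            if last is not None:
                counts[f"{last}->{here}"] += 1
            last = here
    return counts

# A trajectory with several barrier recrossings but only one true
# round trip between the cores A (x <= -0.9) and B (x >= 0.9).
xs = [-1.0, -0.5, 0.5, -0.5, 0.6, -1.0, 0.5, 1.0, 0.5, -0.2, -1.0]
counts = count_transitions(xs, lambda x: x <= -0.9, lambda x: x >= 0.9)
print(counts)
```

A naive dividing-surface count at x = 0 would report six crossings for this trajectory; the core-set count correctly reports one transition each way.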
So this is my pathway going through the interface, and it either goes on to the final state or to the next interface, or it goes back. By sampling pathways on this interface — you can see one being rejected now — under the condition that they have to cross this interface, I can actually compute the rate. How do I do that? I compute the so-called crossing probability: the probability that a path that crosses lambda_i for the first time after leaving A also reaches the next interface — this purple one. That's the definition. And then there is an exact expression: the rate is the flux through the first interface times the probability of reaching the last interface given that you crossed the first one. This is almost a definition. But what you can then do is a staging algorithm: you replace this overall crossing probability by a product of conditional crossing probabilities, where each factor is not so small anymore — on the order of 0.1 to 1 — and that is very doable to calculate. You compute these in each of the ensembles and you get an exact expression with very good estimates, where you can identify and quantify the error in a proper way. This is akin to umbrella sampling or constrained-MD sampling, but it is based on paths, completely taking the history into account — there is no Markovianity assumption and no approximation in this. Okay, here is a small application of that work: a DNA base pair rotation, from work we did a couple of years ago, where we looked at the transition from the Watson-Crick base pair to the Hoogsteen base pair. If you do this with conjugate peak refinement — this is work by Nikolova and coworkers — you see a transition like this; that is a minimum energy pathway.
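Written out, the TIS factorization just described reads (using the standard TIS notation, where Phi_A,0 is the flux through the first interface):

```latex
k_{AB} \;=\; \Phi_{A,0}\, P_A(\lambda_n \mid \lambda_0),
\qquad
P_A(\lambda_n \mid \lambda_0) \;=\; \prod_{i=0}^{n-1} P_A(\lambda_{i+1} \mid \lambda_i),
```

where each conditional crossing probability \(P_A(\lambda_{i+1} \mid \lambda_i)\) is estimated in its own path ensemble. Because each factor is of order 0.1 to 1 rather than astronomically small, the staged product is what makes the rare-event rate tractable.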
But if we do transition path sampling, it shows that the transition actually happens in a completely different way: it wiggles much more, the base goes to the outside and then enters again. There are a couple of mechanisms at work: this inside mechanism, but also a mechanism where the base goes outside into the solvent, rotates, and goes back in. And we can calculate the rate constant of this process by calculating these crossing probabilities. We put them all together here in one master curve, and in the end we get an estimate for the crossing probability. Comparing this with the experiments, there is a mismatch. In this work we attributed that, probably, to a deficiency of the force field: the thermodynamics is reproduced very well, but the rate is a little bit off, and it is still unclear whether this is a force-field issue or some missing transition. Now, we can extend this to multiple states. The thing is that if you have multiple states, it is very difficult to sample them one by one — or at least you don't want to do that individually. What you can do instead is a multiple-state version of TPS, where you start from a state A and sample pathways that end up in any of the other states. We can play a similar game here: calculate the crossing probabilities, and then compute all the different fluxes and transitions — the crossing probabilities for the final interfaces. And we can combine this using TIS and multiple-state TIS; the multiple-state version simply allows all these different pathways in one simulation.
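Once the rates between the states are known, extracting the relaxation timescales from the resulting rate matrix is standard linear algebra. A minimal sketch with a hypothetical three-state network (the rate values below are made up for illustration, not taken from any real system):

```python
import numpy as np

# Hypothetical three-state rate matrix, U <-> I <-> N, rates per microsecond.
# With dp/dt = K p, entry K[i, j] is the rate from state j to state i;
# the diagonals make each column sum to zero (probability conservation).
k_UI, k_IU = 2.0, 1.0        # U -> I, I -> U
k_IN, k_NI = 5.0, 0.1        # I -> N, N -> I
K = np.array([
    [-k_UI,        k_IU,          0.0],
    [ k_UI, -(k_IU + k_IN),      k_NI],
    [  0.0,        k_IN,        -k_NI],
])

# Relaxation timescales are -1/eigenvalue for the nonzero eigenvalues of K;
# the slowest one corresponds to the overall interconversion process.
eigvals = np.linalg.eigvals(K).real
timescales = np.sort(-1.0 / eigvals[eigvals < -1e-12])[::-1]
print("relaxation timescales (microseconds):", timescales)
```

The zero eigenvalue corresponds to the stationary distribution; the remaining eigenvalues give the observable timescales, which is how a multi-exponential experimental signal points to hidden intermediate states.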
And these rates can then be used in a Markov state model, and then you can identify, in the normal way, how the transition takes place. Angelo, I noticed that I'm completely out of time now, am I right? Well, we have until 11:30, but we thought maybe... I'm going to show you a few more things, because I'm getting a little bit anxious here. I wanted to show you that we can do this for the folding of a protein, the famous Trp-cage. It's a small protein that folds on the microsecond timescale. It's a two-state folder with an experimentally measured folding time of about four microseconds, but it actually shows a couple of timescales, which indicates that there are intermediate states. What we can do is apply our MSTIS method to this. We get a large rate matrix between all these different metastable states, and then we identify the slowest process, which turns out to be the overall folding-unfolding transition, from U to N or from N to U, depending on how you look at it. But there is also a fast process, which has to do with an intermediate state, which I label here as an intermediate. We can identify the complete flux network: starting, in this case, from the unfolded state, we can identify which pathways carry the most flux through the network. And we can also identify what the fast, small timescale corresponds to, and these actually coincide with the experimental results. This also allowed us to identify a metastable state that was previously unknown. So this shows how you can do this using advanced methods in the path sampling framework. This is one of the last slides, showing the evolution of path sampling algorithms.
So we can handle intermediates using these multiple-state TIS approaches. We can improve the convergence with RETIS, the replica exchange version of TIS. If you have a large number of replicas, you can do single-replica TIS, which is a single-replica version of the replica exchange. And most recently we looked at the virtual interface exchange, which allows us to get a complete path ensemble from a single path sampling simulation. All this uses advanced reweighting schemes that allow reconstruction of the unbiased dynamical trajectory ensemble, and if you do that, you can also reconstruct free energy landscapes. I'm going to stop soon. I apologize for the rather chaotic way of presenting, but it's extremely difficult to compress this. The last thing I want to show you is an example of something we did more recently, together with the group of Gerhard Hummer in Frankfurt: learning sampling and reaction coordinate analysis together. I've been talking about doing transition path sampling and then trying to get information out of it afterwards, but you can also learn on the fly how to improve your sampling by adapting where you put your shooting points. This uses reinforcement learning: you learn where the committor is using a neural network, you collect your pathways this way, getting better and better estimates of the transition state, and once you have the entire committor surface, you can distill a working model for it using symbolic regression. This also makes the mechanism interpretable. We did that for methane clathrate formation — methane hydrate formation — where you can actually see a crystallization process. So this is not a biomolecular system, but it's very interesting: we can learn and sample the committor p_B, and it is validated quite well.
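The idea of steering shooting points toward the transition-state region can be caricatured without any neural network: estimate the committor p_B by direct shooting and look for the point where it is closest to 1/2. The following 1D double-well sketch is a stand-in for, not an implementation of, the reinforcement-learning scheme; all parameters are illustrative.

```python
import math, random

random.seed(3)
dt, beta = 1e-3, 3.0                       # time step, inverse temperature

def force(x):
    return -4.0 * x * (x * x - 1.0)        # from V(x) = (x^2 - 1)^2

def reaches_B_first(x):
    """One trial shot: integrate until state A (x < -0.8) or B (x > 0.8)."""
    for _ in range(20000):
        x += force(x) * dt + math.sqrt(2.0 * dt / beta) * random.gauss(0.0, 1.0)
        if x < -0.8:
            return False
        if x > 0.8:
            return True
    return False                            # did not commit; count as A

def committor(x, n_shots=80):
    """Brute-force committor estimate: fraction of shots reaching B first."""
    return sum(reaches_B_first(x) for _ in range(n_shots)) / n_shots

# Scan candidate shooting points and keep the one with p_B closest to 1/2.
# A learning scheme would instead move the shooting points using a fitted
# committor model, but the target -- the p_B = 1/2 surface -- is the same.
candidates = [i / 10.0 for i in range(-7, 8)]
x_ts = min(candidates, key=lambda x: abs(committor(x) - 0.5))
print("estimated transition state near x =", x_ts)
```

For this symmetric potential the p_B = 1/2 point sits near the barrier top at x = 0; in high-dimensional systems the learned committor model replaces the brute-force scan, which is what makes the approach practical.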
We can also find out what the important coordinates are, and then construct, using symbolic regression, a model for this particular process, which includes the size of the nucleus, the temperature, and a surface property in the sense of the number of waters. And this shows you what the state of the art is: it combines sampling, analysis, and creating insight in one framework. I think I should stop here, and I would like to end with the conclusions. With TPS we can sample an unbiased ensemble of reactive trajectories. We can do committor-based analysis that yields the reaction coordinates. We can use TIS to calculate kinetic rate constants, and with the multiple-state versions we can sample a full reaction network. We can reweight the path ensemble to evaluate the full free energy landscape. And in the last slides I showed you how you can simultaneously sample and analyze using machine learning. Except for that last part — the machine learning business is not in OpenPathSampling itself, but it's a framework that builds on OpenPathSampling — and you can go to the website if you're interested. And these are the acknowledgments. I want to acknowledge the OPS team, at the top here: David Swenson, Jan-Hendrik, John, and Frank Noé are involved. From the UvA, I want to thank Fedem for his work on the protons, Arjen for the crystallization work, Jocelyn for the DNA, and Bernd for his AI input. And then my collaborators: Titus, Christoph, Gerhard, and Roberto. So I want to thank you for your attention, and I'll take some questions now. Okay. Thank you very much, Peter, for the very nice talk. So maybe we have time for a couple more questions.
I was wondering about that last application: could you take a reaction of which you don't know the mechanism, nor how many steps it has, apply this, and then finally, using that symbolic regression, figure out how many steps the reaction has? I mean, does the symbolic regression give you the equation for the reaction? Yes — the whole idea is indeed that, for an unknown process, you find out what the actual reaction coordinate and the collective variables are, and then, based on that analysis, you also want an equation that combines these collective variables into a single expression. A simple example is of course a linear expression. In fact, in all the applications we have seen so far, people assume linearity — it's always assumed — but it's not clear that a linear form actually works. In many cases it doesn't. Let me quickly show you another slide — do we see this or not? Is that visible? Where's the share button? Okay. Sorry about this. So look at the lower part here. Here's a famously difficult potential — this is called the Zorro potential, if you want to know. If you start here, you first have to move in the x direction and then go back, and this is very difficult to capture with a linear reaction coordinate. One way of doing it is to come up with a committor representation that can handle this, and here you can clearly see that nonlinear reaction coordinates are important.
But if you do this with symbolic regression, then by combining all these different functions you can also end up with nonlinear functions — exponentials, logarithms, fractional forms, or much more complicated expressions — and these allow for that nonlinear behavior. So the answer to your question is yes: you can actually start to understand how these collective variables together build up a reaction coordinate. Thank you. Okay, I think we can stop here, because we have a small break of 15 minutes, and then we start with the next talk, from the students. So I would like to thank Peter again for this very nice presentation.