— screen-sharing, or wait until maybe they can, so they start recording when they start, and then you switch on your presentation. All right, okay. Yeah, just because some people are still coming in — more audience, more fun. Definitely, yes. The stream is on now, I think you can start sharing the screen, and yeah, I think we can start.

So welcome back, everyone. My name is Angelo, for those who haven't seen me in the first days of this workshop. We have a slight change of program, because the first speaker of this afternoon, Edina Rosta, is not connected yet, so we start with the second speaker, and we will come back to her in case she connects later — we don't know what happened, sorry, we didn't receive any communication from her. So the first speaker of this afternoon session, the last session of this workshop, is Professor Stefan Wolf from the Albert Ludwigs University of Freiburg in Germany. Stefan will tell us about protein–ligand binding dynamics studied by atomistic simulation, and we are very much looking forward to his talk. So Stefan, please, the stage is yours.

Thank you, Angelo. Hello, everybody, thank you for connecting for the last session of this conference, and I'd like again to thank the organizers for giving me the opportunity to show my research here in this framework. At the start I'd like to come back to a couple of points that have been raised in the introductory lectures on Monday by Alessandro Laio and Alessandra Magistrato. Currently, in the field of molecular dynamics simulations, we are facing two kinds of frontiers. The first one is the frontier of increasing length scales that we want to simulate. This is addressed quite straightforwardly by essentially adding more computational power through stronger parallelization, and we now frequently see simulations of systems with several million atoms — we have seen some of these simulations here at this conference, for example by Gerhard Hummer or Syma Khalid.

The second frontier that we want to address is the one of longer and longer time scales, and this is not that easy to address, essentially. The major reason for this is the physics underlying our simulations: because we want to numerically solve Newton's equations of motion for complex molecular systems, we require an iterative scheme of solution, and an accurate numerical integration essentially requires a step size of about one femtosecond. With this we are slaved to the performance of CPUs, especially to their clock frequencies, and these frequencies have been stuck at about three gigahertz for the last ten years, so there hasn't been much progress in this field. The smallest biologically relevant time scale that we can currently address is the microsecond range — that's what we frequently see — and already for these kinds of simulations this means that our computers need to perform one billion iterations for one trajectory. If, of course, you have the money at hand for custom-made hardware such as that of D. E. Shaw Research, then you can invest 100 million US dollars to build a machine which can go higher, but as Alessandro Laio already said, the maximum that can be reached even with these kinds of machines is about 10⁻¹ seconds. However, in nature we find processes as well which are happening on second, minute and hour scales, and it would be important to know about their dynamics too.

My focus of interest here is essentially the dynamics of ligands, especially drugs: the time scales that they have for binding to their receptor or target protein, and especially for unbinding. Why is that? Well, what happens if you take a drug? It essentially goes into your digestive system, then it starts to go into your bloodstream and diffuses around in your body to the respective target protein — here, in this case, that would be HSP90, as shown in the lecture that we had a couple of days before. What then happens is that locally you get an equilibrium between bound and unbound states, displayed down here as a two-state free energy landscape. The common way of designing a drug has always been essentially to find a drug which is a very good binder, that means it has a very high affinity to the respective target protein — in other words, the free energy of the bound state is significantly lower than the one of the unbound state. Okay, roughly 15 to 20 years ago, people started to screen the literature and checked what kind of properties the good drugs have — the ones which made it successfully through the drug creation process — and it turned out it was not the good binders, which everybody was focusing on, but the slow unbinders which made it through this process. Why is that? Well, free energies are stationary distributions, or some measure for them; however, our body of course is not stationary, it is essentially a non-equilibrium environment, and what our
body does, as soon as we throw something into it, is to try to get it out quickly again. That means even if you have drugs in your bloodstream, and most of them find their target protein and bind there, the unbound state will always be depleted by our body: you essentially suck out the population of the unbound state, and of course the system then tries to locally re-equilibrate, so more and more even of the bound ligand will go out of its binding site, even though the binding affinity hasn't changed. Unless you can slow down this process, you do not achieve that the ligand can stay somewhere. As a consequence, you need to give more drug, and more drug means more substance in your body which can bind somewhere else and cause side effects — something that you don't want. If you want to predict the unbinding kinetics and rates of such a compound bound to its target protein, then you need to be able to predict rates which are in the minute to hour range, and again this is way out of what we can do with full molecular dynamics simulations. So what can we do about that?
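The depletion argument above can be made concrete with a toy two-state model plus a clearance term. The following Python sketch is purely illustrative (not from the talk); all rate constants are invented. It shows that under constant clearance of the unbound state, it is the unbinding rate — not the affinity alone — that controls how much ligand stays bound at long times:

```python
import numpy as np

# Toy model: bound <-> unbound ligand, with first-order clearance of the
# unbound state (the body "sucking out" the unbound population).
# Integrated with simple Euler steps; all rates are made-up values.
def simulate(k_on=10.0, k_off=1.0, k_clear=5.0, dt=1e-3, t_max=10.0):
    bound, unbound = 0.0, 1.0          # start with all ligand unbound
    for _ in range(int(t_max / dt)):
        d_bound = (k_on * unbound - k_off * bound) * dt
        d_unbound = (k_off * bound - k_on * unbound - k_clear * unbound) * dt
        bound += d_bound
        unbound += d_unbound
    return bound

# A slower unbinder (smaller k_off) keeps more ligand bound at long times,
# even though the clearance keeps draining the unbound state.
slow = simulate(k_off=0.1)
fast = simulate(k_off=10.0)
print(slow > fast)  # → True
```

Note that both variants have the same `k_on`; only the residence in the bound state differs, which is exactly the "slow unbinder" effect described above.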
Well, coming back again to the introductory lecture from Alessandro Laio, we need to find a way to essentially coarse-grain the full Hamiltonian dynamics of this molecular system into a much simpler model which allows us a faster and more efficient integration. The first hallmark that we need to check is: does our system follow a so-called timescale separation? That means, is the slowest degree of freedom — that would be the binding and unbinding — uncoupled on its timescale from everything else, meaning the random collisions of the ligand with water molecules and the thermal fluctuations of the protein and the ligand? If this is the case, then we can assume Markovianity for the system: there is essentially no correlation between the motion of interest — the ligand binding and unbinding — and the so-called bath, which is everything else that is fast. Then we can describe this system in theory by a Markovian Langevin equation. The Langevin equation was already introduced by Ralf Eichhorn a couple of days ago, and essentially what it says is that the dynamics of your system, projected onto a given reaction coordinate x — for our protein–ligand system that would simply be the distance of the ligand's center of mass from its binding site — is driven by the gradient of the free energy; then you need to account for friction in this molecular environment as well, and the ligand will receive fluctuations from its surroundings, too. The fluctuation amplitude is coupled to the friction by the fluctuation–dissipation theorem, and the fluctuation follows a random process, drawn from a Gaussian distribution.
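The Markovian Langevin picture just described can be sketched as a one-dimensional overdamped integrator. This is an illustrative Python sketch, not the speaker's code; the double-well potential and all parameter values are made up for demonstration:

```python
import numpy as np

# Overdamped Markovian Langevin dynamics along a 1D reaction coordinate x:
#   Gamma * dx/dt = -dG/dx + xi(t),  with  <xi(t) xi(t')> = 2 kT Gamma delta(t-t')
# (fluctuation amplitude tied to friction by the fluctuation-dissipation theorem).
kT = 2.5          # kJ/mol, roughly room temperature
gamma = 1.0       # friction coefficient (arbitrary units)
dt = 1e-3

def dG_dx(x):
    # model free energy G(x) = (x^2 - 1)^2: a "bound" and an "unbound" well,
    # with a barrier of height 1 (below kT, so crossings are frequent)
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(1)
x = -1.0                                    # start in the left ("bound") well
noise_amp = np.sqrt(2.0 * kT * dt / gamma)  # fluctuation-dissipation relation
xs = []
for _ in range(200_000):
    x += -dG_dx(x) / gamma * dt + noise_amp * rng.normal()
    xs.append(x)
xs = np.array(xs)

# With kT above the barrier, both wells get visited:
print(xs.min() < -0.5 and xs.max() > 0.5)  # → True
```

The point of such a coarse model is exactly what the talk exploits later: once G(x) and the friction are parameterized, this integration is orders of magnitude cheaper than atomistic MD.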
If we want to describe our dynamics on the level of this kind of equation, we need a method that allows us to parameterize both the free energy and the friction coefficient. Free energies are well investigated — it is well established how you can assess them — but friction coefficients are more tricky, and there are not so many methods for obtaining them. So we wanted to create an approach that gives us both at the same time, and for this we wanted to use the power of biased simulations, because we essentially need to accelerate this process technically. Here we rely on so-called targeted molecular dynamics (TMD) simulations. This is also a biasing method, but you are not pulling your ligand out of the binding site by a harmonic restraint like, for example, in steered MD or umbrella sampling; instead you are using a Lagrangian equation of motion of the first kind — so really a constraint. In this case we are using a constant-velocity constraint to bring our system from one state to the other. The constraint is realized by a Lagrange multiplier, which essentially acts as an additional constraint force added to the system and helps us achieve this smooth translation. The advantage of this method is that you have a linear stepping through x over time, and so you get a very nice sampling especially of the transition states, which are one of the ingredients defining the unbinding. Of course, these kinds of simulations are now carried out in non-equilibrium, and the second law of thermodynamics implies that if we integrate this constraint force along the reaction coordinate — in this case the distance of the ligand from its binding site — we arrive at the work, and this is larger than the free energy that we are interested in, because we need to account for dissipation.

Now, how can we include this constraint force into our Langevin picture? Well, we assume — so this is an ansatz that we make — that we can just add the constraint force that we calculate from full atomistic simulations along this reaction coordinate on top of the Markovian Langevin equation. Then the constraint force cancels out all other forces acting on the system: it counteracts the free energy gradient, the friction and the fluctuation, and it enforces a slow, adiabatic change with constant velocity — this constant velocity is the v_c that I give here. Of course, you now have the problem that your constraint might interact with the fluctuations of the system; however, in an ensemble of N simulations, if you perform an average over this ensemble, the fluctuations will cancel out, because this is a zero-centered Gaussian process, and you end up with this top equation. Then you can identify the mean velocity that the system takes in this ensemble of trajectories with the constant velocity that we applied in these simulations. If you integrate this equation along x, you finally end up with an expression where on the left side is the free energy of the system; then comes the mean work that you need to apply, minus an integral over the friction factor times the velocity. This second term on the right side is essentially the dissipative work that appears while you are performing these kinds of non-equilibrium simulations.

Now we needed a way to rationalize this dissipative work better, and what we did was essentially to take a step back and go to another principle of modern non-equilibrium statistical mechanics, which is the Jarzynski equality, introduced to us already by Édgar Roldán in his introductory lecture. The Jarzynski equality is, in a way, the second law of thermodynamics transformed from an inequality into an equation. Now, the problem with the Jarzynski equality: it is beautiful, and it is mathematically most likely completely correct, but the exponential mean on its right side converges extremely badly, so it is hard to apply to a real system. However, if you perform a cumulant expansion and truncate it after the second moment, you arrive at an expression where you can identify the dissipative work with the variance of the work in the ensemble of simulations that you did. This truncation is exact in case the work follows a Gaussian distribution — so this is another ansatz, or approximation, essentially a requirement for our model which needs to be fulfilled — but we go along with that. Now we have on one hand our Langevin-based approach and on the other side our derivation which comes from Jarzynski, and we combine both together and say: okay, in case the dissipative work can be expressed as a term proportional to the variance of the work, we can calculate the variance as the square of the integral over the fluctuations of the constraint force while we are pulling the ligand out of its binding site. If you rearrange this integral slightly and perform some substitutions of the integration borders and variables, you actually end up with an expression for — sorry, one second. [Siren outside.] That was on its way to the hospital we've got just around the corner; I think they had a major issue there.
Okay, sorry for the break there. Now you can derive — or define — a friction coefficient for these non-equilibrium simulations which is based on this combination of the Langevin equation and Jarzynski, and the beautiful thing about this expression is that it is essentially an integral over time of the autocorrelation function of these force fluctuations in the ensemble of non-equilibrium simulations. This represents a generalized form of the well-understood equilibrium friction coefficient, which you can calculate out of force fluctuations as well, but extends it into the realm of non-equilibrium. Now, of course, we first want to check if this derivation is of any help when applying it to molecular dynamics simulation problems, and as physicists we like simple test systems, so we started with a very simple system: a sodium chloride ion pair in water. The reason for this is that this is a two-body problem: you only have rotational symmetry, so there are no hidden degrees of freedom; the bath is quite well understandable — it is just the surrounding water; and the mass which enters the equation is just the reduced mass of the ion–ion system, so this is quite straightforward as well. You can see here on the right side a graph: in black, the true free energy of the system as a function of the inter-ion distance — you can see a quite nice bound state on the left side, and on the right side the unbound state — and this comes out of unbiased simulation. If you just perform 200 nanoseconds of free MD simulation, which is very quick, essentially just a day on a standard computer, you get a nicely converged free energy profile. Now, in blue you see the mean non-equilibrium work from 500 trajectories at a given pulling velocity — it's actually 10 meters per
second, which is rather quick, so this requires only 100 picoseconds per trajectory. In red you see our friction-corrected estimate from this non-equilibrium work, and you see a very nice match between the true free energy and this correction — the free energy is within the error range of the estimate. The best that we can actually expect in these kinds of simulations is to be within one kT of the true free energy; for the physicists, one kT corresponds to about 2.5 kilojoules per mole, so this prediction is quite good, I would say. We then took a look at the friction factors and how they behave along x, and we were quite surprised, because first of all the friction profile looks similar to the free energy profile, but you see details which are different. When we then tried to better understand these intermediate maxima between 0.4 and 0.6 nanometers — what is happening there — we found, by a time series analysis of the underlying autocorrelation functions and by simply watching what happens in the simulation, that these peaks in friction correspond very well to positions where the hydration shell around the ions is highly perturbed. On the right side you see the probability of water molecules to reside at given positions for a fixed bound ion state — that corresponds to the very left of the graph on the left side, where the friction is quite small — and you see a high probability for the water molecules to sit in quite defined positions. At the first maximum, here at 0.4 nanometers, which is close to the transition state in the free energy as well, you see that this ordering of the water shells has decreased significantly. So it seems that this high friction is
coming from the water molecules not being able to properly arrange themselves around the ions, and through this you get a situation with a lot of force fluctuations. Most interestingly, these effects seem to be dependent on the pulling velocity, and this allows us essentially to account for intrinsic velocity dependences of these friction factors — which, by the formulation of the Langevin equation, should not exist, because there is no velocity dependence in the friction factors themselves. However, we think that because we base our analysis on the Jarzynski equality, which is supposed to hold for any kind of non-equilibrium perturbation, these friction factors are the right ones to correct our non-equilibrium work for the given perturbation that we apply to the system. Plus — and that's the beautiful part — the friction factors account for degrees of freedom of the system which are not found in the definition of the free energy, because they are essentially integrated out when you write down the free energy. So the friction factor allows us to better understand the overall dynamics and physics on the microscopic scale of the given Hamiltonian. We then thought: okay, cool results, let's do this with protein–ligand systems. We applied the approach there, and we failed miserably in the beginning — it didn't work out at all; the friction that we predicted was way too high. What was the issue here? Well, we had forgotten one thing, which is that ligands, when they are going out of the protein, might follow different pathways. A pathway through the protein means that such a ligand experiences contacts with different kinds of amino acids, and the different amino acids
of course exhibit different vibrations, and therefore the fluctuations will be different along the different pathways, which means again that different friction factors appear for the different pathways. So the normal distribution of work that we assume for one pathway breaks down: if you now mix pathways of ligands going out of a binding site — say we have two pathways — then we will see the overlay of two Gaussians, which essentially renders our approach useless. So we thought: okay, can we maybe rationalize this in a way that we separate trajectories according to the pathways taken through the protein, and then perform this friction correction independently for the different pathways? A prerequisite for this, of course, was that we needed to be able to define and follow the different paths through the protein, and the method of choice that we used for this was principal component analysis. That works, in our case, the following way: we calculate the minimal distance of any atom of a given amino acid to any atom of the ligand during unbinding, and then create a covariance matrix of these distances. The diagonalization of this covariance matrix is essentially a rotation of the initial coordinate system onto the axes of maximal variance, and the respective eigenvectors then serve for the generation of principal components, which give us a new reference coordinate system along which we can search for the pathways that the ligands take through the protein. There is one catch, though, from the side of theory: this covariance matrix contains means, and we are doing non-equilibrium simulations — non-equilibrium means there are no time-independent means — so there is no clear definition of what we need to do here with these covariances. Therefore we essentially extended PCA into the non-equilibrium range by taking
— or viewing — these covariance matrices as observables from non-equilibrium simulations. Based on an approach that Dellago and Hummer proposed, we can now reweight these non-equilibrium-derived covariance matrices to obtain the respective equilibrium-based covariance matrices, and within the new PCs that we can generate, we see that defined pathways appear. For example, going from the start state at A to the end state at U, you see that the system prefers to go either by the bottom, along PC2, or by the top. Interestingly, we found that we only need two PCs to explain most of the variance in the system, and we can then really sort the pathways that the ligand is taking by projecting the different trajectories onto these new principal components, sort the trajectories according to pathways, and perform the friction correction independently for these pathways. It is possible to do that by hand if you have something like 200 trajectories; for more, we are currently using machine learning approaches as well, to train the machine to detect the pathways.
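The pathway separation described above can be sketched with plain numpy. This is an illustrative example, not the group's actual pipeline: the "trajectories" are fabricated contact-distance vectors for two synthetic pathway classes, and the non-equilibrium reweighting of the covariance matrix (Dellago–Hummer) is omitted for brevity:

```python
import numpy as np

# Sketch: separate unbinding trajectories into pathways via PCA on contact
# features. Each trajectory is summarized by a vector over n_res "residues";
# pathway A touches residues 0-9, pathway B touches residues 10-19.
rng = np.random.default_rng(2)
n_res = 20
path_a = np.zeros(n_res); path_a[:10] = 1.0
path_b = np.zeros(n_res); path_b[10:] = 1.0
X = np.vstack([path_a + 0.1 * rng.normal(size=n_res) for _ in range(100)] +
              [path_b + 0.1 * rng.normal(size=n_res) for _ in range(100)])

Xc = X - X.mean(axis=0)                 # center the features
cov = Xc.T @ Xc / (len(X) - 1)          # covariance matrix of the contacts
evals, evecs = np.linalg.eigh(cov)      # diagonalize = rotate onto max-variance axes
pc1 = evecs[:, -1]                      # eigenvector of the largest eigenvalue
proj = Xc @ pc1                         # project trajectories onto PC1

# The two pathway classes fall on opposite sides of PC1 (sign may be flipped):
labels = proj > 0
print(abs(labels[:100].mean() - labels[100:].mean()))  # → 1.0 (clean separation)
```

Once trajectories are grouped this way, the dissipation correction can be applied per pathway, restoring the Gaussian-work assumption within each group.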
Now, how does a typical unbinding look for a test system? In this case we used two different kinds of systems; one is trypsin–benzamidine, and you can see that this is a typical free energy profile that we see: you've got one transition state at two — (five more minutes? yeah, five more minutes, okay, thank you) — and then you essentially end up with something like an unbound state here after four. We find that these friction factors correspond very well with the build-up of hydrogen bonds of water molecules surrounding the ligand, and this shows us quite a nice picture: the influx of water molecules is mediating the friction that the ligand experiences while going out of the binding site. Okay, now — we started out this project hoping to be able to predict kinetics for the system based on Langevin simulations, so what happens if we now put these profiles into the Langevin equation and integrate it? For this we used an approach by Bussi and Parrinello as an integrator for Langevin dynamics. The advantage is that this is very fast: you can calculate one microsecond of simulation time within four minutes on a single CPU, and you arrive at binding and unbinding kinetics which deviate by a factor of only four, essentially, if you compare this to a fully atomistic MD — which in this case would take something like two days, or several days, to get to the same range. Okay, so we can predict microseconds very quickly, but again, we wanted to go to minutes and hours; how can we do that? Well, we can rely on one advantage of our coarse-grained model, which is that, in difference to
the real proteins, in our simulations the free energy profiles and friction profiles do not denature if you turn up the temperature. So we went to a reformulation of Kramers' law for the transition rates into an expression which allows us essentially to run Langevin simulations at higher temperature and then rescale the respective kinetics to lower temperatures, and by this we boost the sampling of binding and unbinding events in our Langevin simulations by a considerable factor. With this, we could not only assess the second-range binding kinetics of the trypsin–benzamidine complex with an accuracy of a factor of roughly three — which is well within the best that you can achieve; a factor of 10 is essentially the maximum accuracy you can expect in these kinds of simulations — but also, for an HSP90–inhibitor complex which has a binding rate in the range of half a minute, we could predict the binding rate again within a factor of four. We are a bit worse for the unbinding kinetics, but again, a factor of 20 is very good if you compare this to any other method which exists. Now, in case you want to be even faster — we found, just as a side note — if you want to rank different ligands from a small compound library according to their unbinding rate, that is, predict which ligand binds so tightly that it takes a long time to go out of the binding site, then it is essentially just necessary to look at the final non-equilibrium unbinding work, and this is already quite a good scoring function for your unbinding mechanism. And with this, essentially, I'd like to conclude my talk. I hope that I could convince you that by this combination of methods — which we now coined, or termed, dissipation-corrected targeted molecular dynamics (dcTMD) simulations — for reading out free
energies and friction factors, and using them as input to temperature-boosted Langevin equations, we can now reach time scales of at least minutes, and probably hours as well; and that non-equilibrium PCA then allows us to obtain principal components not only from equilibrium simulation data, but from non-equilibrium simulations and non-equilibrium data in general. The next steps are, essentially, that we want to better understand this kind of pathway separation — it is still a bit heuristic in the way we do it, and we are still working on a solid theoretical basis to understand what is happening there — and we really want to push the time limits of these kinds of simulations into the hour range, and apply these simulations to more complex systems, for example ion transport through membrane channels, velocity-dependent friction of lubricants, or deciphering pathways of allostery within proteins. Last but not least, I'd like to thank the group that I'm in — especially my boss, Professor Gerhard Stock, as well as Simon, Miriam, Benjamin and Matthias, and our PhD students, to whom essentially all the data that I showed you is attributable — together with my experimental collaborators, and finally I'd like to thank our funders as well. One very last thing: all that I showed you right at the moment — and I'm quite excited to share this with you — is research which will now be carried out within a new so-called Forschergruppe, a research unit from the German Research Foundation (DFG), which was just granted today. So we will apply dcTMD to more complex systems, try to better understand the respective foundations of non-equilibrium simulations and systems, and combine this with approaches from other researchers of our university as well as from the Fraunhofer institutes, which are more
working on the materials science side of coarse-graining non-equilibrium situations and systems, to get a better fundamental understanding of them. We are also now hiring PhD students, so if you're interested in obtaining a PhD in this field, or interested in working on this, please drop me a line or write me an email. And with this, I'd like to finally thank you for your attention, and I'm looking forward to your questions.

Thank you very much, Stefan, for the very interesting talk, and for staying in time. Actually, we have already three questions, so I go to the first one, by Drabani Tarafder, who is asking: how do you model the mass m along the reaction coordinate x of the Langevin equation? — Yeah, thank you, that's a very good and tricky point. For the sodium chloride ion pair, of course, because this is a two-body system, we know it is the reduced mass of the system. For the masses of the ligand–protein system, that's a bit tricky. We started with plugging in the reduced mass of the ligand itself and of the reference group in the protein that we used as an anchor to pull out — or push out — the ligand, and then we realized this mass is way too small. The way we rationalize this is essentially that, unlike in the two-body problem, the mass that you need to take into account is not only the masses of the ligand and the anchor group, but that of the full protein as well: because you are pulling on a part of the protein, the whole protein will of course react to the force that you apply, due to the fact that all atoms are connected to all the other atoms within the protein itself. However — and that's a nice advantage here — for all these simulations we checked in what kind of diffusion regime we are, and we are in the overdamped regime. If you are in the overdamped
regime, the mass does not play a role anymore for the kinetics of the system, and this actually allows us to tune the mass up to ridiculously high values. Usually we take the reduced mass of the protein–ligand complex and multiply it by a factor of 10, and this allows us to go to larger time steps — even up to 10 femtoseconds — and nevertheless we retain the same kinetics of the system. So we are quite lucky that we can use this as a physically sensible hack to accelerate our simulations.

Thank you. The next question, by Davide Bassani: thank you very much for the very nice and interesting talk; what would be the time required for a screening of, let's say, 1000 ligands using your technique? — So, I can tell you for one ligand. One ligand requires — that depends a bit on the complexity of the protein as well — I would say 400 simulations, and usually you need to pull over one or two nanometers, so one or two nanoseconds per simulation; that means essentially for one ligand you should calculate in total about 800 nanoseconds to get a good basis for your analysis. Of course, you can parallelize this very well over different computers, so one ligand in this case would require — if the queue is empty in your supercomputing center of choice — I would say one week, if you want to have the detailed picture. If you want to do this very fast, quick and dirty, with the scoring function, you need only a fraction of that: 30 pulling simulations per ligand, for 100 picoseconds each; that can be done on one desktop workstation within roughly six hours, and from there you can take it. So it is computationally more demanding than docking, but — there's a nice new review from the Wade group, from the HITS in Heidelberg, Germany, and
we're roughly ten times faster than all the other algorithms that are on the market right at the moment.

Okay, thank you very much. Another question, by Isabell Louise Grothaus: can you please explain again the contact PCA on slide 15 — which PCs did you use?

Sure, one moment, we need to go back. So, for the separation here — this is now from this HSP90 protein — we used for the path separation only PC1 and PC2, because, as you can see, PC3 is pretty much flat; it doesn't contain any information. For the simulation systems that we used, we found that two PCs are normally sufficient to explain roughly 80 percent of the overall covariance, which is usually the number you need to know where to set your cutoff. These PCs are built on an analysis of all amino acids that are approached by the ligands in any simulation during the unbinding process, within a cutoff radius of 4.5 angstroms.

Okay, thanks. Then Andrea Pasquadibisceglie is asking: do you think that the — sorry, could it be predicted with ML methods? Sorry, you mean the cumulative — I don't know, sorry, I need to check as well, one moment. Maybe, Andrea, you can ask the question directly; you can unmute yourself. Yeah, I mean, do you think that the rate for the unbinding could be predicted with machine-learning methods? Actually, sorry, now I can read it, thank you. There are approaches for this as well right at the moment. I'm not working on this myself, I have to say, but the Wade group — especially Daria Kokh, who is a researcher with Rebecca Wade at HITS in Heidelberg — is doing these kinds of approaches right at the moment. We are more interested in
methods for learning the pathways themselves during these kinds of unbinding events, but this kind of research does exist.

Okay, thank you. One more question, by Ali Hasan Ali: how would the electronic polarization affect the friction induced by the solvent?

Thank you, that's a good question, and something I would actually like to look at. For my simulations, the major problem I see is not so much in determining the friction; but you can see here that, when you go to Kramers' law for transition rates, the transition barrier height is the major variable that you need to get right to obtain the right kinetics, and I see for some protein–ligand systems that I significantly overestimate the transition barrier. There's a nice paper out from the Carloni group in Jülich right at the moment showing that this is an issue: some ligands show significant polarization effects on transition barriers, and you can even tune them, so this can make an error in the transition barrier height on the level of up to a factor of four. This is significant, and one of the things I'd like to try in the future is how I can improve these kinds of predictions by explicitly using polarizable force fields. This is definitely something the field needs to evolve towards if we want to get the rates right.

Okay, and the last two questions, both from the same attendee. The first question is: could you briefly describe how the constant-velocity constraint force is calculated?

Yes, sure, thank you. This is actually done — again, one moment, let's go back — this is essentially first-year theoretical physics: Lagrangian equations of motion of the first kind. That means you are introducing this constraint here, which says that the system needs to account for
a linear translation of the center of mass of the ligand itself over time. This constraint is then fulfilled in the form of a Lagrange multiplier, which acts as a constant force on the center of mass of the ligand. Now, here we face one problem: it's very nice to calculate these kinds of constraints on the level of a free energy along a reaction coordinate, but this information needs to go back, of course, into the full dimensionality of the Hamiltonian, and this is tricky, because the projection there is not bijective. What people essentially do is distribute this force, by a mass-weighting factor, onto the different atoms of the ligand, so each ligand atom receives its part of the constraint force as an additional force term directly in the integration. I was skeptical in the beginning whether this really works to enforce unbinding, but it turns out that ligands then really move out first with the residue which is least bound, least fixed into the binding site by contacts; so it is indeed the case that the weakest link breaks first in the simulations. This way of calculating, or applying, the constraint to the ligand is in a way a workaround, but it can be done with any MD simulation program that uses constraints — the normal constraints that you also use for fixing your hydrogen–heavy-atom bonds. We are using GROMACS here, with distance-type constraints.

Okay, thanks. And then the last question: if you have a landscape of G(x) and gamma(x) for the reaction coordinate, would it be faster to use the Smoluchowski equation to calculate times, such as the mean first passage time? I haven't tried
to solve the Smoluchowski equation, I have to say; that would be something to try. What I did play around with was whether Kramers' rate law can be directly applied to the respective free energy landscape, and the problem there is that it doesn't work, mainly because — if we go back just to the example of the sodium chloride ion pair — the bound state looks very much like something you could treat with a harmonic approximation, but if you zoom in, you realize that you are not able to fit a harmonic potential down there: either you get the left side wrong, which is the r to the power of minus 12 wall, or the right side wrong, which is the r to the power of minus 6 part, and there is no way to fit a harmonic approximation down there. So, for Smoluchowski I would need to check, but Kramers definitely doesn't work.

Okay, thank you very much. So, thank you again for the talk and for answering the questions, and we reconvene in ten minutes, at three p.m. sharp. I don't know if Edina is online already — I saw her. I'm here, yes. So, Edina, maybe you can try whether your slides work. Can you see my slides? Yes, perfect. Can you see my mouse moving? Yes, I do. Okay, perfect, I think everything is set. So, we start at three sharp. Okay, perfect, thank you. Hello — oh, sorry, I was unmuted. You can share your screen, Edina. Okay, I was just wondering, if I start to play the presentation, whether it's going to share the right screen, because this sometimes... So then we'll just start the show — one second, I will start in a minute. Okay, just to make sure: did it go to full-screen mode? No, it didn't actually; it was not in full-screen mode. Can you try again? Okay, this is the important thing, so let me just try again — I think you just have to click on the — perfect — no, it disappeared again. Yeah, I just cancelled
it now, but you can start. All right, so then let me start. Okay, perfect, and then I'm going to share again — no wait, I'll do just the presentation. Yes — sorry, it's not working — no, no, it's working, okay, just leave it; I'll introduce you and then you can go. Okay, sorry.

So, welcome back everyone. The last speaker of today, and of this workshop, is Edina Rosta from King's College London in the UK, and Edina will tell us about irreversible Monte Carlo methods to sample biomolecular simulations. So, Edina, please go ahead; you have 30 minutes, and then questions. Thanks.

Okay, perfect, so let me try to play the presentation again. First I'd like to thank the organizers for inviting me to this very nice workshop, which had to be held online, and it is really a pleasure to give this presentation today. I would just like to mention that my affiliation is now University College London, as I moved during the lockdown in July. — Edina, sorry to interrupt: the audio is not optimal. — Sorry, what is not optimal? — The audio. — Maybe you have to speak closer to the microphone, I don't know. — I'm sorry, I'm not able to change that, because I pre-recorded the presentation; I'm really sorry if it's not optimal, but hopefully it's going to be fine; I will try to make it louder.

Much of the research in my group has been motivated by looking at phosphate catalysis. Even though phosphate chemistry is quite simple — just the bond breaking and bond formation of these simple groups is chemically very simple — in biology, and in life, this is probably one of the most important classes of catalytic reactions used by enzymes. Here we look at various biological processes, almost all of them very important in reproduction, energy storage and transfer, signaling, and metabolism, and as
we can see, even these coronavirus proteins encode many of these enzymes, which we will be talking about — for example, a polymerase for reproduction, and an ATP-hydrolyzing enzyme, the RNA helicase, that I will be talking about in more detail. In particular, in my group we do QM/MM simulations to study these catalytic reactions, as well as MD simulations, and we look at sampling, which is very challenging for these complex systems, especially with DFT-based QM/MM calculations. Another focus in my group is metal ions, in particular magnesium: to understand how these magnesium ions are involved, and what their essential function is in catalyzing these reactions. I will also be talking about this briefly. Now, to provide a bit more of an overview of the research going on in my group, I would like to mention the different areas we are working on. One of the key areas is novel enhanced sampling algorithms; this is very important because of the complexity of the problem and how long QM/MM simulations take, so it's very important to be able to use enhanced sampling simulations, and here we are developing novel algorithms focusing on Markov state models and coarse-graining Markov state models. Today I will not have time to talk about that topic, but I would like to mention one of the new areas we're working on: irreversible Monte Carlo samplers. As I've already mentioned, we look at phosphate catalysis in biomolecular simulations, and here I will be talking about the RNA helicase of the coronavirus as one example of a catalytic enzyme that uses phosphate catalysis, and about how we can now use the understanding we have developed of these systems in modeling the structure of this protein. Unfortunately, I will not have time today — this is a very short seminar — to talk about our work on computational molecular design for materials applications; these are really nice
collaborations with several experimental groups in Cambridge.

So today, first I'd like to tell you about the enhanced sampling algorithms we have been developing. The basis of this is the classical Monte Carlo algorithm from the 1950s, developed by Edward Teller and colleagues. The idea is to use statistical methods to calculate multidimensional integrals — in particular, partition functions — and to find the relevant probability distributions and sample them; in practice, we want to sample the Boltzmann distribution of the complex system we are studying. The most standard approach is the Metropolis–Hastings algorithm, where we want to generate a new configuration x_j, and we accept it from our previous configuration x_i with the acceptance probability given by this formula here, which everybody knows — the Metropolis–Hastings criterion. What we want to do now is develop novel ways of generating these probability densities, where we can show that we still converge to the correct Boltzmann distribution that we want to sample, but we can do this more efficiently than the original algorithm. This project is done by Fahim Faizi, a PhD student from the CANES program, supervised by George Deligiannidis and me. What we are looking at is irreversible Monte Carlo algorithms, which sample these probability distributions in a different way that introduces some irreversibility into the algorithm. You could picture this as making a duplicate of our system, where one copy has a flow in one direction and the other counterpart has a flow in the other direction; the overall system would not be reversible, but the final distribution that we find would still be the correct one. What we do in practice is the following: the idea is that we want to satisfy the balance
condition, which is given here in this equation. This condition is the necessary and sufficient condition to make sure that we have the target distribution, and it is most typically satisfied through reversible algorithms, where the detailed balance condition holds. Detailed balance, however, is not a required condition — it is required only if we restrict ourselves to reversible algorithms. Here, instead, we want to look at irreversible algorithms, where the detailed balance condition is not satisfied, and the way to do that is the lifting framework of Turitsyn et al. The idea of this framework is that we can imagine our system as a kinetic network of nodes, and we make a copy of this system by assigning a spin to each of our states, the nodes of the network; this spin can have the value +1 or -1, and all other connectivities are otherwise the same. We can then define the transition matrix on this new extended space. This new transition matrix contains our intra-replica transitions, denoted T+ and T-, which are just the normal transitions within each network, and, to actually change the spin, we can also make a move between replicas — the inter-replica transitions — where we go from a state j with spin +1 to the same state j with spin -1; this Lambda is therefore a diagonal matrix of transition probabilities. Now, if we construct this matrix appropriately, we can ensure that, even though the detailed balance condition is not satisfied within either of these transition matrices, the skewed detailed balance condition is satisfied instead. Therefore, within one of these networks we have a flow which is compensated by the irreversible flow on the other side, the other counterpart. What can be shown is that if we satisfy
this skewed detailed balance condition, which looks like this, then this also leads to the satisfaction of the balance condition, and this ensures that we have the right target distribution. To accomplish this, we use the lifting framework originally developed by Turitsyn et al., and we modify our original transition probability matrix T_ij by multiplying it with a skewness function. This skewness function has to satisfy two properties: it has to be a probability, between zero and one, and it has to have this asymmetry property here, which we need to make sure is correct. Once we have such a function, we can ensure that skewed detailed balance is satisfied, so we have the right equilibrium distribution, and then we can choose the skewness function in such a way that we achieve more optimal sampling. Originally we started this project by generalizing Sakai and Hukushima's skewness function, which was developed for the Ising model; we generalized it so that the spins can take more than two values, which allows us to extend it to more complex systems. In addition to the Metropolis–Hastings type of acceptance probability, we also apply these irreversible algorithms to the Gibbs sampler, where, if we already know the target distribution — the relative probabilities of our proposed new states — we can draw from that distribution directly, so the move is always accepted. An even faster and more efficient version of this is the Metropolized Gibbs sampler, where we basically want to stay in the current state as little as possible: we exclude the current state from the proposal, use a Metropolis criterion to accept that move, and otherwise use the Gibbs sampler to move to another new state. So we have developed a more generalized version of these irreversible algorithms using the Metropolis–Hastings, the Gibbs sampler, and
the Metropolized Gibbs sampler algorithms in our recent paper in JCTC, and we applied it to the Ising model, to the Potts model — the generalized version of the Ising model — and also to a two-state system with a simple 1D model potential. This was our simple 1D model potential, shown here, and these are the trajectories we get when we sample it. This is the reversible Metropolis Monte Carlo algorithm; you can see the trajectory along x: we stay either in the valley corresponding to -1 or in the valley corresponding to +1, and we see this two-state behavior with very few exchanges, mostly staying in the metastable states. Now, if we use our irreversible sampling algorithm, keeping everything else the same, we see many more transitions of the system, and these transitions also translate into a smaller error: this is the variance in measuring the free energy profiles with the two methods, and we can see that we generally have smaller errors measuring this potential. Also, the autocorrelation function of the x coordinate decays faster in this case, so we have a more efficient irreversible sampler, and, importantly, it is at the same cost as the reversible algorithm.
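As an aside, the reversible-versus-irreversible comparison just described can be sketched in a few lines of code. The following is a minimal illustration only, not the implementation from the paper: a lifted Metropolis walker on a discretized double-well potential (the grid, inverse temperature, and potential form are arbitrary choices made for the sketch). The walker carries a direction spin, always proposes a move in its current direction, and flips the spin on rejection; this satisfies skewed detailed balance, so the marginal over x still converges to the Boltzmann distribution while the motion within each well is ballistic rather than diffusive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative discretized double-well potential (not the one from the talk):
# minima near x = -1 and x = +1, barrier at x = 0.
x_grid = np.linspace(-2.0, 2.0, 201)
beta = 3.0
U = (x_grid**2 - 1.0) ** 2

def lifted_metropolis(n_steps, start):
    """Irreversible ('lifted') Metropolis walk on the grid.

    The state is extended by a direction spin eps in {+1, -1}: moves are
    always proposed in the current direction, and the spin flips only on
    rejection.  This satisfies skewed detailed balance, so the x-marginal
    still converges to exp(-beta * U).
    """
    i, eps = start, 1
    traj = np.empty(n_steps, dtype=np.int64)
    for t in range(n_steps):
        j = i + eps
        if 0 <= j < len(x_grid) and rng.random() < np.exp(-beta * (U[j] - U[i])):
            i = j        # accepted: keep moving in the same direction
        else:
            eps = -eps   # rejected (or hit the wall): reverse the spin
        traj[t] = i
    return traj

traj = lifted_metropolis(200_000, start=100)
hist = np.bincount(traj, minlength=len(x_grid)) / len(traj)
boltzmann = np.exp(-beta * U)
boltzmann /= boltzmann.sum()
# hist agrees with boltzmann up to sampling noise, and the trajectory
# crosses the barrier frequently despite the irreversible dynamics.
```

Here the rejection-flip rule plays the role of the inter-replica move (the Lambda matrix in the talk's notation) in its simplest 1D form; in higher dimensions one has to choose a lifting coordinate, which is exactly where the generalized skewness function comes in.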
We have also adapted these irreversible sampling algorithms for simulated tempering. This is the example on the same 1D model potential, where we looked at 32 temperatures, and what we can see is that in the standard Metropolis–Hastings algorithm the temperatures are sampled more slowly than in our irreversible Metropolis–Hastings or irreversible Metropolized Gibbs sampler algorithms; similarly, the x coordinate — the position — is sampled more frequently in the irreversible algorithms. This is also shown if we plot the autocorrelation functions of the temperatures or of the energy. Another example we applied this to is alanine-5 in explicit water; again we tried the three algorithms, and we see enhanced sampling of the temperature range, and the autocorrelation functions of the energy are also improved compared to the standard reversible Metropolis–Hastings. So, to summarize this part: our work was motivated by developing more general irreversible Monte Carlo methods, building on the work of Turitsyn et al. We generalized Sakai and Hukushima's skewness function, which was originally designed for the Ising model, in such a way that the skewness function can now use any observable of interest as the lifting coordinate; this way we are not limited to the Ising model but can treat other complex systems with multiple states. In addition, we constructed these irreversible Monte Carlo methods not just for Metropolis–Hastings but also for the Gibbs sampler and the Metropolized Gibbs sampler, which are more efficient, and finally, in very recent work, we have also generalized these irreversible Monte Carlo methods to simulated tempering simulations.
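Since the sampler comparisons above are made via autocorrelation functions of the temperature index or the energy, here is a small self-contained sketch of how such curves, and the integrated autocorrelation time, can be estimated from a sampler's output. The AR(1) series below is a synthetic stand-in for an energy trace, not data from the talk.

```python
import numpy as np

def autocorr(series, max_lag):
    """Normalised autocorrelation function C(t) of a 1D time series."""
    x = np.asarray(series, dtype=float) - np.mean(series)
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - t], x[t:]) / ((n - t) * var)
                     for t in range(max_lag)])

def integrated_time(acf):
    """Integrated autocorrelation time, truncated at the first negative value."""
    end = np.argmax(acf < 0) if np.any(acf < 0) else len(acf)
    return 1.0 + 2.0 * acf[1:end].sum()

# Stand-in "energy trace": an AR(1) process with lag-1 correlation 0.9.
rng = np.random.default_rng(1)
e = np.empty(200_000)
e[0] = 0.0
noise = rng.standard_normal(len(e))
for t in range(1, len(e)):
    e[t] = 0.9 * e[t - 1] + noise[t]

acf = autocorr(e, 200)
# For AR(1) with phi = 0.9, C(t) = 0.9**t analytically, so the integrated
# time should come out near (1 + 0.9) / (1 - 0.9) = 19.
```

A faster-decaying C(t), and hence a smaller integrated time, is what "the autocorrelation functions are improved" means quantitatively: fewer correlated steps per effectively independent sample.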
I'd like to come back now to the biomolecular simulations, and in particular to magnesium. It is not only important because of its essential function in catalysis; it is also very important structurally for finding drug molecules and helping inhibitor design. Here are three examples of small-molecule inhibitors that bind directly to the metal ions: an example on HIV integrase, which binds some of the best drug molecules — ones that patients can take for 20-plus years, making AIDS a chronic disease; an RNase H, a classic two-metal-ion-catalyzed enzyme whose catalytic mechanism we have looked at in the past; and another example, an antibiotic, cycloserine, bound to two magnesium ions in D-alanine–D-alanine ligase, used against tuberculosis — here we have an ongoing project with Luiz Pedro de Carvalho's lab at the Crick Institute. I would like to come back to this work with Peter Cherepanov at the Crick Institute. In the Cherepanov lab they were able to obtain some excellent-resolution cryo-EM structures for both the wild-type HIV integrase and a double mutant in which two mutations lead to drug resistance: when this glutamine and this glycine are mutated, even though they are actually very far from the drug molecules, they lead to resistance against the first-generation compounds, while the second-generation drugs are affected very little. When we looked at these structures, what we could tell is that there is a hydrogen-bonding network, which is further stabilized by this serine mutation even though it is quite far away, and this hydrogen-bonding network interacts with the glutamate residue that coordinates one of the important catalytic magnesium ions. This leads to a weakening of the binding of the drug molecule to these metal ions, and to a slight change in the coordination of the metal ions, which can be seen, for example, in how this water molecule here is dislocated. Then this change in the water
position and flexibility — the generally increased flexibility of this pocket — can also be seen, for example, by following the distance of this water molecule to the carboxylate groups that coordinate the magnesium ions. What we see is that in the double mutant this water molecule exchanges and is much more flexible in comparison with the wild type, and together this leads to the drug resistance. The difference in how the first- and second-generation drugs respond to these changes can also be explained by how tightly they are packed in this active site.

As time is short, I would like to quickly tell you about some of the recent work we have been doing on the SARS coronavirus-encoded enzymes. A very long polypeptide is encoded by the RNA of SARS-CoV-2, encoding several proteins, and several of them have already been used as targets for drug discovery, like the proteases or the structural proteins, the spike proteins. However, there are quite important, essential enzymes encoded here that have to do with phosphate catalysis. One of these is the RNA-dependent RNA polymerase, RdRp, which is the target of remdesivir, the currently approved drug for COVID-19. The enzymes that follow it are nsp13, an RNA helicase, and nsp14 and nsp15, which are also phosphate-catalytic enzymes — an exoribonuclease and an endoribonuclease. We decided to look at the helicase, because this enzyme is structurally quite different from the human enzymes: the closest structural homolog is only about 26 percent homologous, so it is a very promising drug target, as an inhibitor would likely not inhibit the human enzyme. The function of this enzyme is to help unwind double-stranded RNA, and this is done with the help of ATP: ATP binds at the interface of the two domains called RecA1 and RecA2, and the enzyme pulls a single-stranded
RNA chain from the 5-prime to the 3-prime direction while hydrolyzing ATP to ADP and phosphate. When we started our work on this enzyme, the only available crystal structure was 6JYT, which is really of the SARS-CoV enzyme; however, it's essentially identical, one residue different, so this was an excellent starting point, and it looked like the enzyme functions as a dimer. Meanwhile, two other crystal structures came out, both in the apo form; 6ZSL is another relatively high-resolution structure, where the dimerization now looks completely different. And just very recently, a new cryo-EM structure came out in Cell: in this cryo-EM structure we have the nsp12 RNA-dependent RNA polymerase with some of the accessory nsp proteins, and we can see two RNA helicases bound to this complex. What this tells us is that the enzyme likely does not function as a dimer, neither like 6JYT nor like 6ZSL, but rather it forms a complex with the nsp12 RdRp protein, and therefore we can consider its function as a monomer. Now, 6JYT and 6ZSL are apo structures — there is no ATP, magnesium, or RNA bound. The very recent Cell paper does have ATP bound; nevertheless, we think that this structure currently does not capture the ATP coordination very well — it was only modeled, and that part of the active site is probably at very low resolution, possibly around 3 to 6 angstroms — so we currently think that some of our modeled structures might actually capture the ATP coordination better; and there is no single-stranded RNA chain actually bound in the structure. Now, to begin our modeling, we looked at around the top 1000 closest sequences homologous to our RNA helicase sequence; out of these, there were about 950 unique sequences, and we looked at the sequence conservation. In addition, we also
looked at the domains of this RNA helicase. We already know that it has a domain 1 which is very different from most of the other folded protein domains available in the Protein Data Bank, or even from sequences available in the UniProt database, and then it also has the RecA1 and RecA2 domains; we can see a lot of matches for that area — these are typical ATPase domains that actually coordinate the ATP and perform this helicase activity. To get back to domain 1: it is the area that coordinates the single-stranded RNA chain, and we found only around 100 sequences that offer any type of match to this domain; if we look very closely, the only reasonable matches are basically all coronavirus proteins. If we look at the cumulative number of residues showing a decent alignment to our residues, what we see is that, in the beginning, a lot of our sequences — about 850 of our 950 — have residue similarity to the top 400 residues, and then the residues matching domain 1 come from only around 100, or fewer, sequences, which all correspond to coronaviruses; in between there is no alignment, so there are no sequences that match more than this area. This makes domain 1 very unique and offers a very good opportunity for this RNA helicase as a drug target: domain 1 is very specific to coronaviruses, and there are no other organisms with a good alignment to this domain. Also, using this alignment, we can identify some of the important motifs, and here I would highlight the catalytic motif that helicases generally have, where this aspartate residue is the likely proton acceptor from the water that cleaves the ATP. To be able to have a catalytically competent
structure in which we can model ATP and the RNA, we have been looking at already available structures. To do that, we looked at the residues coordinating ATP near its binding site, where we can see that they are very highly conserved; then we took the available structures, which show very small sequence similarities — the closest one had only 11 percent similarity — and with the help of these structures we were able to model our ATP-binding pocket. What we see here is that the phosphate group is coordinated by the magnesium and several arginine fingers, and this motif here shows the aspartate coordinating the serine and the water molecule that directly coordinates the magnesium, as well as the glutamate that coordinates the catalytic water molecule. Unfortunately, this binding pocket is therefore not very good for drug discovery, because ATP is a very negatively charged ligand, which is not a good ligand for drug purposes; and, unfortunately, the purine part of the ATP is not very specifically recognized — only really a few groups interact with it. The reason for that is that this enzyme actually has a second function, an NTPase function, which helps with the capping of the RNA, so it basically accepts any kind of NTP — nucleotide triphosphate — molecule that it can then hydrolyze; therefore this end of the ATP is not very well recognized. Now, to model the RNA as well, we looked at the available crystal structures and identified that the RNA binds very similarly in all of them, with the 3-prime end coordinating the RecA1 domain and the 5-prime end coordinating the RecA2 domain, and with that we were able to build a homology model for the RNA. However, the problem with this is that the RNA recognition is not very conserved, as many of the interactions are made by the backbone of the protein, and therefore
the side chains do not need to be conserved. Lastly, when we looked at the conservation, we identified certain residues, both in the RecA1 domain and in the RecA2 domain, which are highly conserved and shown here also structurally, and in the sequence alignments of other helicases — sorry, two minutes — okay, so that helps us to position the RNA, and with that we were able to model both the apo — okay, I'm just going to skip ahead. So, with that, I would like to thank everybody who contributed to this work. In fact, the work on the coronavirus was a very nice collaboration; from the start we had some fantastic volunteers who were helping us, and we were doing all the simulations together, identifying problems and solving these problems together. Here is the list of people: Sarah Harris from Leeds, Geoff Wells from UCL, Nadia Elghobashi-Meinhardt from the Technical University of Berlin, Elisa Frezza from Paris, Andrei Pisliakov from the University of Dundee, as well as two of my students, Dénes Berta and Magd Badaoui, and also volunteers from different areas, Thomas Mercer and Rachel Kerber, whom we thank very much, as well as the experimental groups performing synthesis of small molecules. With that, I would like to thank again my group members — Magd and Dénes, whom I have mentioned, and Pedro, who has been contributing to both of these projects — and all collaborators on the projects I couldn't talk about, and thank you very much for your attention; I'm happy to take questions.

Angelo, you're muted. Yes, I was muted, sorry. We have a question by Drabani Tarafder: what is the best QM-level model for magnesium ions? So, usually we use DFT with B3LYP, and that works quite well for us. I'm not sure exactly what is meant by the best QM-level method — I'm sorry, I'm looking here and there because my big screen is not here — so, I'm not sure exactly what is meant by the best QM-level method, but I
have to say most of the DFT methods are quite good, whereas with semi-empirical methods we haven't had much luck.

Okay, thanks. Do we have other questions? If you have other questions, just type them, or unmute yourself and ask. Okay, if not, let's thank Edina again. So, we close. First of all, I would like to thank all the speakers for these talks, in spite of this difficult situation that we are all in; I think it was very nice, and also for us, as I said at the very beginning, it was a sort of