Yeah, good morning. Wonderful to see you all here. Thanks for getting me here, and thanks for the brilliant introduction, which really makes things a bit easier for me. First of all, I should mention that I did not make this semantic distinction between repeatability, replicability, and reproducibility. To wrap it up right away: none of them do we have. So whatever I'll be talking about fits all three of them equally well. Unfortunately I wasn't able to take part in the previous discussion, so maybe I'll say something which has already been discussed; please give me some feedback and stop me from saying too many redundant things. I should also point out that the protein you mentioned I have not simulated yet; too complicated.

Okay, so just a few scattered thoughts to spur the discussion. Actually, I got into this kind of discussion many years ago with a mathematician. Mathematicians try to help us with our simulations, and I have encountered this several times: as soon as you give them a real problem, they don't come back. But this mathematician, a quite clever guy, Peter Deuflhard in Berlin, pointed out that we cannot do these simulations in the first place, that they don't make sense at all. And the reason is, of course, the chaoticity of the system; chaoticity really is the right word. All of you know this, and it was also alluded to in the introduction. If you start any reasonable molecular dynamics system with more than two atoms twice, with only a small difference in the last digit, even in double precision, and you just let the two copies run, you will find that they diverge in phase space within a few picoseconds, quite quickly. I'm not saying they are then completely decorrelated, but they are quite distant in terms of RMSD in phase space. That happens more or less immediately. And so this eminent mathematician argued that whatever you do, you can never be accurate enough to calculate the true, real trajectory. That was kind of a surprise to me, because I had never thought about it that way. The reason was that I implicitly knew, and that's what I also told him, that nature doesn't do it that accurately either. If nature doesn't do it, I don't have to do it either. That was my main argument, and it was a strange experience: I never got it across to this mathematician. He always insisted that I would make errors, that these errors would destroy my trajectory after a few picoseconds, and that it therefore doesn't make sense to talk about any trajectory at all: don't do it. Yes, they sent a probe to Mars; that, in his view, is not a chaotic system, so there you may simulate, but here, don't. He still doesn't know better, but okay. So that was my kind of...

Yes, please. Does statistics help in his view, what exactly is the problem? I'll get to that. In my view, statistics is the key to everything here, of course. But that is also something I didn't get across to him. I'll get back to it.

So essentially, that brought me to think a little more about what we are actually doing. We had the same question before: what do we mean by reproducibility? Certainly we don't mean reproducing the identical trajectory. You all know we can do that, of course, but then we need the same seed for the same random number generator, we need the same CPU, and we must not do it on a parallel computer, because otherwise we don't reproduce. All these things make it simply impractical to reproduce a simulation bit for bit.
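Since the chaoticity argument is so central, here is a minimal toy sketch of it (my own Python illustration, not any production MD engine; the cluster size, time step, velocities, and perturbation are arbitrary choices): two copies of a small Lennard-Jones cluster whose initial coordinates differ by one part in 10^12 drift apart in phase space.

```python
import numpy as np

def lj_forces(x):
    """Pairwise Lennard-Jones forces in reduced units (epsilon = sigma = 1)."""
    f = np.zeros_like(x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            r = x[i] - x[j]
            r2 = r @ r
            fij = (48.0 / r2**7 - 24.0 / r2**4) * r   # -dV/dr along r
            f[i] += fij
            f[j] -= fij
    return f

def run(x0, v0, steps=20000, dt=1e-3):
    """Plain velocity-Verlet integration; returns the coordinate trajectory."""
    x, v = x0.copy(), v0.copy()
    f, traj = lj_forces(x), []
    for _ in range(steps):
        v += 0.5 * dt * f
        x += dt * v
        f = lj_forces(x)
        v += 0.5 * dt * f
        traj.append(x.copy())
    return np.array(traj)

g = np.arange(2) * 1.12                    # 2x2x2 cube near the LJ minimum distance
x0 = np.array([[a, b, c] for a in g for b in g for c in g])
rng = np.random.default_rng(0)
v0 = rng.normal(0.0, 0.5, size=x0.shape)   # some thermal motion, same in both runs

x1 = x0.copy()
x1[0, 0] += 1e-12                          # "last digit" perturbation

ta, tb = run(x0, v0), run(x1, v0)
rmsd = np.sqrt(((ta - tb) ** 2).mean(axis=(1, 2)))
for step in (0, 2000, 5000, 10000, 19999):
    print(f"step {step:5d}   rmsd {rmsd[step]:.3e}")
```

The printed RMSD between the two copies grows roughly exponentially until it saturates at the scale of the cluster itself, which is exactly the divergence the mathematician had in mind. So what do we want to reproduce?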
And what I learned from the mathematician: if you can't really get forward, just define your way out. So the definition I came up with at that time was simple. I called them relevant observables; you may also call them essential or stable observables: all those which, despite the fact that the individual trajectories differ, turn out to be the same within a certain error range. Only those can be interesting as observables in a meaningful sense. Typically these would be averages: RMSFs, average distances, average structures, some spectra, transition rates, average transition rates, or mean first passage times. And of course free energies, which are hopefully experimental observables. Anything which you calculate and which can be measured reliably should also be reproducible. So any variable which does not show this chaotic behavior, because the chaos is averaged out: those, I guess, are the observables we should strive for, and that is exactly what we need.

But of course we are afraid, because we all know that estimating errors can be very, very hard in simulations. By the way, I strongly disagree with the suggestion that you should not spend too much time learning how to calculate errors because there is a black box which does it for you. No. As soon as you rely on black boxes, your error estimates may come out too small, and there is nothing worse than errors estimated too small. We know it from the free energy business: the errors we calculate are typically too small, because we rely on black boxes which rest on some mathematical theorem but forget the assumptions along the way.

Okay, at the end of the day we of course want to get to functional mechanisms, and those too should be reproducible and stable in the sense I have mentioned. One example I would like to show is simply a case of an observable which we calculated a long time ago: the rupture force needed to break the non-covalent bond between a ligand and a receptor, which can be measured by AFM experiments. And what I would like to share here is what I think we should really aim at when it comes to quality: can we predict something which can be measured, ideally before it is measured, and ideally measured by another group? Can we predict it with a stated error margin, such that our error bars overlap with the experimental ones? I guess that is the gold standard we should all aim for.

That is the example here. This axis is how fast you pull the ligand and receptor apart, and that is the force you need to do it. The blue is experimental data by Simon Schroding and co-workers, and the green is simulation data; obviously we are limited to short timescales, which is why we have to pull fast. But you can see that we start to overlap here, and within the error bars, which simply come from 20 repetitions of each simulation, we get quite nice agreement. I guess that is really the ultimate test, and ideally it should be against an observable, the force here, which has never been used by anybody to calibrate the force field in use. If that works, it validates our general approach, and I guess we can then start to put some trust into the simulations. Many things can still go wrong, but that is what we should aim for, I think.
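Those error bars from 20 repetitions illustrate the general recipe. As a minimal sketch (my Python illustration, not the actual analysis behind the figure): run independent repeats that differ only in their random seed, and report the mean of the observable together with a standard error computed across the repeats.

```python
import numpy as np

def run_simulation(seed, n_frames=5000):
    """Stand-in for one MD run; returns a synthetic time series of some
    observable (say, a ligand-receptor distance in nm). Replace with a
    real simulation in practice."""
    rng = np.random.default_rng(seed)
    return 1.0 + 0.05 * np.tanh(np.cumsum(rng.normal(0, 0.02, n_frames)))

n_repeats = 20
means = np.array([run_simulation(s).mean() for s in range(n_repeats)])

estimate = means.mean()
stderr = means.std(ddof=1) / np.sqrt(n_repeats)   # error bar across repeats
print(f"observable = {estimate:.4f} +/- {stderr:.4f}")
```

The point of using repeats rather than a single long run is that the repeats are independent by construction, so the spread across them is an honest, assumption-light error estimate, exactly the opposite of a black box.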
So, I mentioned it already: is nature reproducible anyway? It is not, and there are in fact several reasons. First, of course, proteins are not in vacuum; they sit in an environment. Other proteins are around, water is around, solvent molecules are bumping on the surface of the protein, and so on. So already in the classical world, which our simulations live in at least as far as the nuclear motion is concerned, nature is by no means reproducible, so why should we be, in that strong sense? The second reason, on top of that, is of course quantum mechanics: even if the protein were in vacuum, it would still not always follow the same trajectory, because a trajectory is not even defined in quantum mechanics. From that also follows, from an evolutionary perspective: if I want to achieve a certain function as a protein, I had better be robust against these perturbations; if I am not, then it is probably not a reliable protein function. And that makes our life a little easier. We just have to be as inaccurate as nature is; we can afford a certain level of inaccuracy, and actually that is what we all rely on.

Okay, so much for that. Another piece of thought I would like to bring up is what you might call numerical reproducibility, and that of course links to what I said about random numbers. If you don't start with the same seed for the random number generator, you don't get the same trajectory. It depends on what precision you are using. It depends on the sequence of the operations: if you add a large number to a small number and then add another small number, you typically get a different result than if you add the two small numbers first and only then the large number. Floating-point addition is not associative; this can kill your trajectory, and you have to take care of it. That specifically means that as soon as you go to a parallel machine, where the sequence of operations is not always uniquely determined, already from that fact alone you get different results which are not bitwise reproducible. And of course if you change the hardware: there is an IEEE standard on floating point, but it leaves a few options open when it comes to rounding up or down, so if you go to a different CPU, or to a GPU, you may get different results. There may even be bugs in the GPU, as we have seen in the past with CPUs.
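The summation-order effect is easy to see for yourself; a three-line illustration (any language with IEEE doubles behaves the same way):

```python
# Floating-point addition is not associative, so a parallel reduction whose
# operation order varies between runs is already non-reproducible bitwise.
big, small = 1.0e16, 1.0
print((big + small) + small == big + (small + small))   # False
print((big + small) + small - big)                      # 0.0: small terms lost
print(big + (small + small) - big)                      # 2.0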
On top of that come the approximations we are using, which we discussed at breakfast, and I think that is probably where we might want to spend most of our thought when it comes to handing over a simulation package as a complete box, if you want. We all use a number of approximations, and that is a little different from, say, quantum chemistry, where things are a little easier. I'm not saying they are really easy, but I think they are easier compared to what we are doing, because the methods used there are really very well defined. That is not so much the case for us. Take electrostatics calculations: we all use PME or something similar, other times we use cut-offs, we may use fast multipole methods. These are different approximation schemes, and they give different kinds of errors. All of them may be accurate to, say, 10 to the minus 6 in the forces, but the errors are of a different kind, and some errors may be malicious while others are benevolent and may even help us. So just specifying an accuracy is not enough, I'm afraid. For example, with a cut-off you can have a pretty accurate force, but because it is systematically wrong, always in the same direction, the error can accumulate into large artifacts, whereas with PME things tend to average out a bit more beneficially. So even with the same numerical accuracy you probably get quite different accuracy in your relevant observables; I'll come back to this with a small illustration in a moment. And at the end of the day, multiple time-stepping is another approximation, and so on, you name it. Because our field is so computationally demanding and deals with such complex systems, it is also methodologically really complex, and we have to get all of this under control.

As was mentioned for quantum chemistry, I think it does help to look at fields which are, in terms of standardization, more advanced than we are. That is not our fault; it is just that our job is much harder. Besides quantum chemistry it may also help to look into solid state physics, for example, where there are also standards for electronic structure calculations, with the bar set pretty high. But note that they work in electron volts rather than kilojoules per mole, so the errors we care about are much, much smaller: they would come to us and say, 0.2 electron volts, you are super accurate. I was surprised that their standards, in that sense, are not as high as ours. It is really true that in our field a fraction of a kT can make a big difference; actually that is what very often happens in proteins. An antibiotic we have studied binds in the tunnel of the bacterial ribosome, but not in ours, because there is just one hydrogen bond more than in our ribosome. So the difference between life and death, both for the bacteria and for us, is just one hydrogen bond, a few kilojoules per mole or so. That sounds really frightening, and this is for a system of about two million atoms, which means that for each atom the required accuracy is really tremendous. That makes our life really hard, much harder, I think, than in quantum chemistry or solid state physics.

Okay, and finally, I listed here what you may call practical reproducibility. We will never fully get there: you see, if you run a simulation for a year or so on a huge machine, you will simply not get anybody to repeat it. It is a similar problem to the one in elementary particle physics: if you build a huge accelerator, nobody will build a second one just to reproduce the data. That will not happen, and it is a bit like that in our field too. We just have to be aware of it; I'm not saying there is something fundamentally wrong with that, it is just a fact.
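Here is the promised illustration of the cut-off versus PME point: a toy calculation (invented numbers, no real electrostatics) of why the kind of error matters even at equal magnitude. A systematic per-step bias grows linearly with the number of steps, while an unbiased random error of the same size grows only like its square root.

```python
import numpy as np

rng = np.random.default_rng(1)
n_steps, eps = 10**6, 1e-6            # per-step error magnitude, same for both

systematic = eps * n_steps                       # always the same sign: adds up
random_typical = eps * np.sqrt(n_steps)          # expected random-walk scale
random_actual = abs(rng.normal(0, eps, n_steps).sum())  # one realization

print(f"systematic drift:              {systematic:.2e}")   # ~1e+00
print(f"random drift, expected scale:  {random_typical:.2e}")  # ~1e-03
print(f"random drift, one realization: {random_actual:.2e}")
```

Both schemes in this toy are "accurate to 1e-6 per step", yet after a million steps the systematic one is wrong by order one and the random one by order a thousandth. That is the sense in which a cut-off error can be malicious while a PME-like error is comparatively benevolent.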
And, I didn't list it on the slide, but since it was also mentioned: I think what is really crucially important in our field is error calculation, and how we deal with bad statistics. It was pointed out that it is really hard to come up with good statistics in our field. We are happy if we see a functional conformational transition maybe once, or maybe twice; it is, by the way, not exceptional that it takes a year of simulation to get a paper actually written, not talking about getting it published. So dealing with a small number of events is, I think, very critical in our field. One additional term I would like to throw in here concerns which statistics, which probability theory, we need, and in my view there is only one, and that is Bayes. We need to look much more closely at Bayes, because Bayes is the only correct way to calculate errors. I would claim, and this is a bit controversial, but this is a discussion here: there is only one way to add up two numbers, three plus four is always seven, and likewise, if you deal with probabilities, there is only one way to do it right, and that is by Bayes. Therefore I think our field should learn the correct way, and this I miss very much. That just came to my mind when we were talking about error bars. So no, I don't want black boxes for error bars. Okay, I guess that's about it.
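To sketch what doing the errors by Bayes can mean for this few-events problem (a hedged illustration with made-up numbers, using SciPy): if k transitions were observed in a total simulated time T, a Poisson likelihood combined with a Jeffreys prior gives a Gamma posterior for the transition rate, and hence honest error bars even for one or two events.

```python
from scipy import stats

k, T = 2, 10.0                                     # 2 events in 10 microseconds
posterior = stats.gamma(a=k + 0.5, scale=1.0 / T)  # Gamma(k + 1/2, rate = T)

lo, hi = posterior.ppf([0.025, 0.975])
print(f"rate estimate: {posterior.mean():.3f} per microsecond")
print(f"95% credible interval: [{lo:.3f}, {hi:.3f}]")
# Unlike the naive k/T +/- sqrt(k)/T error bar, the posterior stays well
# defined and honest even for k = 1 or k = 0.
```

The design point is that the assumptions (Poisson counting, choice of prior) are written down explicitly, rather than buried inside a black box.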
Just one brief thought on where all I was talking about, and what is discussed here, becomes I think very important. I think the field is mature enough to take one more step, from just looking at individual systems and finding out how these systems actually work, which we do all the time and which is highly exciting, towards looking at many systems at once and trying to draw conclusions from a whole ensemble of them. We are of course not the first to do that, but never before was our field mature enough to collect that much data. In physics, for example, people discovered, I think it was Hertzsprung, that certain stars have certain colors, and that these relate somehow to their absolute brightness. Only from surveying a large number of stars were they able to conclude how stars evolve. That was a big achievement, and it was only possible by going beyond what was done before, namely looking at individual stars and trying to make something of them one at a time. In sequence alignment the same thing happened: looking at many sequences taught us a lot. Looking at many structures, same thing again: it taught us, for example, that structures evolve much more slowly than the sequences underneath, which is a bit puzzling, but that is exactly what we learned from looking at many structures. The same holds for molecular dynamics: trying to see what this dynamics space actually looks like, and how proteins are distributed in it. And there it is of course absolutely crucial to understand, if you do several simulations with several force fields and other approximations, how the position of a protein in dynamics space, its dynamics fingerprint, actually depends on those approximations, and what the real differences between proteins are. So I am just saying we should work the field in a more systematic way than by looking at individual proteins.

As a brief preview, this is what we term the dynasome. Rather than simulating just one protein, we went to something like a few hundred, and the question we tried to answer is: if we have two proteins of different structure, is their dynamics similar? A seemingly completely crazy question, because of course the dynamics is different; you cannot even compare the dynamics of two different structures, right? And I am saying yes, you can; you just need to come up with the right metric. The dynamics may be very similar despite the fact that the underlying structures are very different. If you explore that, you can lay the proteins out in an abstract dynamics space; for lack of time I won't go into detail, it is just a kind of graph-theoretical connection of these hundred proteins. Each dot is one protein, and they are laid out here, projected down to two dimensions, according to their similarities in this graph: if two are connected, their dynamics is very similar, and if they are far apart, their dynamics is very dissimilar. What we have done here is to mark, for example, the signaling proteins, the transcription proteins, the proteinases. They are not uniformly distributed over the whole dynasome space, as we call it, but cluster in certain regions, and that makes it possible to predict function just from looking at the dynamics, in this sense: if you give me a new protein, I do a simulation and place it in this space. If it ends up here, I predict it is a signaling protein; if it ends up there, I would probably predict it is a glycosidase. I don't have the slide, but if you do that, you can raise the success rate of structure-based function prediction from something like 30% to 50% by including the dynamics. So that actually helps. And it is also something you can see in such a graph: how much does a protein move in this strange space just due to numerical inaccuracies, just due to the choice of force field, just by repeating the same simulation with a different seed? What is the error range of its position? That is also something we address, and I think it can only be done by repeating many simulations and seeing how much it varies. And with that, I think my time is running out.

Very interesting, your comments about the LHC, where they don't run it twice. The LHC experiment was actually designed around completely separate teams, and I think we should learn from that, to be honest: when we want to run a simulation to predict an observable, we should have two separate teams. And I think, as referees, when somebody wants to run just one simulation, we should say: no, don't do that, one simulation is not useful. The LHC teams gave out their results within a month of each other, and the physics community could check that there was significant agreement between them; without that, the months of simulation would have been completely worthless. Then how does that work in practice? Well, we can leave that for the discussion session. I also just want to say, I think, I'm not suggesting just a paper...
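To make the dynasome prediction step concrete, here is a toy sketch of how such a nearest-neighbour vote in dynamics space could look (all protein names, descriptor values, and function labels below are hypothetical, not the actual data or pipeline):

```python
import numpy as np

# hypothetical dynamics fingerprints (e.g. PCA eigenvalue spectra, RMSF
# statistics, transition counts), one vector per annotated reference protein
fingerprints = {
    "prot_A": np.array([0.90, 0.10, 0.30]),
    "prot_B": np.array([0.80, 0.20, 0.40]),
    "prot_C": np.array([0.10, 0.90, 0.80]),
    "prot_D": np.array([0.15, 0.85, 0.70]),
}
functions = {"prot_A": "signaling", "prot_B": "signaling",
             "prot_C": "glycosidase", "prot_D": "glycosidase"}

def predict_function(query, k=3):
    """k-nearest-neighbour vote using a plain Euclidean metric."""
    dists = sorted((float(np.linalg.norm(vec - query)), name)
                   for name, vec in fingerprints.items())
    votes = [functions[name] for _, name in dists[:k]]
    return max(set(votes), key=votes.count)

new_protein = np.array([0.85, 0.15, 0.35])   # fresh simulation, unknown function
print(predict_function(new_protein))         # -> "signaling"
```

In the real setting the fingerprints would of course come from simulations, and, as discussed above, one would first have to check how stable each protein's position is against seeds, force fields, and numerical noise before trusting any such prediction.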