What we're going to do today is finish up the part where we've been speaking about protein folding, in particular the kinetics of protein folding. That covers the last few chapters in the book, 19 to 21. Then tomorrow, Wednesday, and Thursday, I'm going to speak about a couple of concepts that are not included in the book, in particular docking, a little bit about nucleic acid structure, and free energy calculations. These are important concepts that you will see in the labs, but they're also slightly more applied, things you might actually do in your work in the future. We went through a lot last week, so I figured we can start by rehashing the discussion, and then I have a couple of slides to repeat a few of the key concepts from Friday. So are you good to run this, or should I ask the questions? Well, you can read the questions and start going through them. I think it is on Wednesday that we're going to try a different setup where I have you discuss without me. Monday morning coma. Starting at the top is always a good alternative. Right, and that's if you focus on the transitions. I was thinking more of the properties of these three states. So how would you characterize them? The key difference here is really between native and molten globule, because random is, well, random. But the molten globule in particular has far more structure than we thought a few decades ago. I would even say it likely has more structure than what the book thought 20 years ago, too. So what happens when you go from molten globule to native is really that you lock all the details in place.
Already in the molten globule state, the hydrophobic residues have turned inward, the hydrophilic ones face the surface, and some sort of secondary structure has started to form. All that has happened. But the thing that then sometimes happens, for specific sequences of amino acids, is that you can make the second transition and really lock everything into place. That will only happen if you have the right amino acids in place. So a molten globule will virtually always form, but the native state depends on having a sequence that will be happy in it. Let's continue. Question two, who would like to take that one? Hardly, maybe. But the point is, and I think that's a real good one, it's so close that it's almost indistinguishable. If I just showed you ten examples of proteins where half of them were molten globules and half of them were native, you would probably be hard-pressed. There might be one where you could see that it was a bit unfolded, but in general you would not even spot the difference just by looking at the structure. Structurally, it's very similar to native. The real difference is that some side chains haven't fallen into place, and there is some water left inside the structure. Things that appear to be details to us, right? You just haven't really locked it in. A good analogy could be sticking a key into a lock. If you stick the wrong key into a lock, it will still look right: you have a key and a lock, and yet it doesn't open the door. I spoke a bit about the homopolymer and heteropolymer concepts. Why are they important? For a homopolymer, the entropy of being spread out is, in effect, infinitely good, no matter what the sequence is, and it's hard to beat infinity. There is no finite energy that could beat an infinitely good entropy, so by definition such chains would always prefer to be spread out. Heteropolymers, then.
And that's, well, yes. This is a homopolymer, right? Homopolymers are boring. They're very useful, but they're boring: repeating units, similar to the fibrous proteins we've looked at. They repeat and reach large scales, but they don't have the magic of the proteins you brought up, the specificity and the function. So the hetero part is what gets you specificity. And of course, if we start to randomly swap out amino acids, we lose that specificity. So how are side chains packed in the native state? It might be a slightly unclear question, but I'll have you answer it first. How closely? Is there any water left inside a protein in the native state? Nothing. Everything that isn't side chain has been pushed out. The other thing one could say is, again, specificity. If you think about side chains, say my fingers are side chains between two helices, they always pack in a very well-defined way. It's not that they might pack either this way, or that way, or a third way, if it is a well-defined protein. If this were not the case, if they didn't have a unique, specific, and consistent packing, you would never be able to determine an X-ray structure of that protein, because each of the billions of copies of the protein would have the side chains arranged in a slightly different way. This is, well, I was about to say it's a problem. It's of course important, because it gives the protein stability. But if you think about things like bioinformatics simulations, it is a problem. If I have a bioinformatics model where the side chains should be packed this way, but for whatever reason my model has them packed that way, you're never going to be able to just take a computer and fix up that structure. Because to pack the side chains in a different way, you would need to completely unfold the protein and then refold it again.
So while I might have the right backbone and everything, it might occasionally seem that getting the side chains right is just a bit of final polishing. But the side-chain packing is really part of why the protein folded into that structure in the first place. Now, not in reality, but in bioinformatics, you can of course cut off all the side chains, keep the backbone, and then use a computer program to try to come up with a different way of packing them. That works reasonably well, maybe 80 percent accurate if you measure the chi angles or something. But in reality, the protein would need to completely unpack to repack the side chains in a different way. And this is also why it matters so much if you swap out, say, an alanine for a tryptophan. Because by definition, if this was a stable protein, that alanine was well packed before, right? And now you're going to try to squeeze in a tryptophan. There isn't room, unless you're right next to a binding site where the tryptophan can stick out. Similarly, if you have a structure that needed a tryptophan and you swap it out for a glycine, there is too much room, and the protein is going to hate vacuum. You can't just put water there or something. So it's very common that we try to swap out side chains systematically. For instance, if I have a helix, I more or less systematically swap out side chains around the helix, and then I see when this has an effect on the function. Say it's a big protein, let's assume a three-helix protein, and I think there is some sort of dimerization interface, so there would be two copies of that protein. But I don't know quite what that interface looks like; I only know that there are three helices in my sequence. Call them one, two, and three. What you can do is randomly make mutations in each of the helices.
And then I might find that helix one and helix three are really sensitive if I start introducing tryptophans, but in helix two it doesn't really matter. Then I know that the interface should involve helices one and three, right? That's why we so frequently do these scans: histidine scanning, alanine scanning, anything that has to do with systematically changing residues to see if we can perturb the structure. We spoke a bit about what happens during folding. And when we talk about folding here, I'm particularly talking about the folding to the native state. But which transition is that: between molten globule and native, or between random coil and molten globule? Not only is molten globule to native the important transition, I would even argue, with a bit of hand-waving, that random coil to molten globule isn't really that well-defined as a transition. It's pretty much only downhill: a hydrophobic collapse. We might have some structures forming, but the part where we need to wait and actually get over a barrier is going to be molten globule to native. And that leads to two questions here: enthalpy and entropy. How do they vary? The enthalpy comes down pretty much continuously, right? This has to do with a concept I mentioned at the very beginning of the course. In all condensed phases, atoms start from the ideal-gas case, where we say there are no interactions, and as they approach each other, the reason they approach each other is that they start to interact favorably. So the energy drops. And it drops relatively continuously until you get to the point where the atoms would start clashing into each other in a very dense state. It's kind of featureless, smooth all the way. Entropy, how does that vary when you go from unfolded through molten globule to native? And where does that drop come from? Exactly.
When you get to the point where the backbone, primarily, and at some point also the side chains, start to constrain each other. When you start out completely stretched out, yes, you have slightly less entropy once you've collapsed a bit, but there is still a lot of freedom, right? But at some point, the chain starts to intersect with itself. Or look at a Ramachandran diagram: once you've started to form an alpha helix or something, suddenly there are very few parts of the Ramachandran diagram that are accessible. When you get to the point where you've really collapsed, the amount of freedom you have drops astronomically. That's where the entropy goes. It will drop all the way, but most of the drop happens when you start to bump into the other parts of your own chain. Once you have started bumping into your own chain, the entropy is low; it will keep going down a little bit, but by then you have paid most of the entropy penalty. And the point here is that, taken individually, each of these curves is fairly boring, or featureless, or whatever you would call them. But when you combine them, you get exactly this effect: the energy goes down, the entropy goes down like that, and that gives us a free energy that looks something like this, where we actually have a transition barrier between, say, the molten globule and the native state. So based on that, what is the main free energy barrier for A, folding, and B, unfolding? And can you say something about the time you would expect these processes to take? I spoke a little bit about that on Friday, when I hinted that most chemical processes speed up if you increase the temperature, right? Unfolding is that type of process: the warmer it is, the faster it will go.
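To make the picture concrete, here's a minimal Python sketch of how two smooth, featureless curves can combine into a barrier. The functional forms and numbers are assumptions for illustration, not fitted to any real protein: enthalpy drops smoothly as contacts form, entropy drops fastest as the chain starts bumping into itself, and the difference G = H − T·S develops a maximum between the collapsed and native states.

```python
# Sketch with assumed functional forms: smooth H(q) and S(q) along a
# folding coordinate q (0 = unfolded, 1 = native) still produce a
# free-energy barrier in G = H - T*S.

def enthalpy(q):
    # smooth, featureless drop as favorable contacts form (units of kT)
    return -60.0 * q**3

def entropy(q):
    # chain entropy, collapsing fastest once the chain self-intersects
    return 40.0 * (1.0 - q**2)

T = 1.0  # temperature folded into the units, so kT = 1

def free_energy(q):
    return enthalpy(q) - T * entropy(q)

qs = [i / 100 for i in range(101)]
gs = [free_energy(q) for q in qs]
barrier_q = max(range(101), key=lambda i: gs[i]) / 100
print(f"barrier near q = {barrier_q:.2f}, "
      f"height above unfolded = {max(gs) - gs[0]:.1f} kT")
```

Neither input curve has any feature at the barrier position; the maximum appears purely from the mismatch in how fast the two curves drop.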
While folding is abnormal in the sense that it goes faster the colder it is. The problem, though, is that at some point you can't cool it any further, because then things will start to freeze. In general, this will of course depend on the height of the barrier. But the key thing is that the folding barrier is entropic, and entropy is related to searching. When things take a very long time, it's virtually always related to entropy: it takes a long time to test all the possibilities before you randomly find the right one. Compare that to the unfolding barrier at exceptionally low temperature: if we for a second forget about cold denaturation, you could imagine that at a low enough temperature unfolding would take a very long time too. But at the temperatures where proteins actually work, and again, we also have cold unfolding in practice, these energy barriers are going to be relatively small compared to the thermal energy available. So searching takes time, which is of course related to Levinthal's paradox, right? Levinthal's paradox is very much about searching, not about needing a very high enthalpic barrier to get over. There is no intuition here; this was an experimental result. We can observe experimentally that if you increase the temperature, the folding process is slower. What I then did was use a number of equations to show that the fact that folding is slower at higher temperature is really due to entropy: we derived the temperature behavior from it. But it's not obvious. It's an exceptionally non-obvious result. And it is an experimental result, not my hand-waving. Related to that, there are two parts we spoke about: stabilization and kinetics of folding.
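The anti-Arrhenius behavior of folding can be sketched with a transition-state-style rate expression, k = A·exp(−(ΔH‡ − TΔS‡)/RT). All the numbers below are illustrative assumptions, not measured values: an entropy-dominated folding barrier (ΔS‡ strongly negative, ΔH‡ small and negative) makes the rate fall as temperature rises, while a normal enthalpy-dominated unfolding barrier speeds up with temperature.

```python
import math

R = 8.314  # gas constant, J/(mol K)

def rate(dH, dS, T, A=1e6):
    """Transition-state-style rate: k = A * exp(-(dH - T*dS) / (R*T)).
    dH in J/mol, dS in J/(mol K); A is an arbitrary prefactor."""
    return A * math.exp(-(dH - T * dS) / (R * T))

# Folding: barrier dominated by the entropy loss of the search
# (dS << 0), small negative dH -> slows down with T (anti-Arrhenius).
# Unfolding: large positive activation enthalpy -> speeds up with T.
for T in (280.0, 300.0, 320.0):
    kf = rate(dH=-20e3, dS=-150.0, T=T)   # folding (assumed numbers)
    ku = rate(dH=+80e3, dS=+50.0, T=T)    # unfolding (assumed numbers)
    print(f"T = {T:.0f} K   k_fold = {kf:.3g}   k_unfold = {ku:.3g}")
```

The sign of the effective activation enthalpy is the whole story: when ΔH‡ is negative and the barrier is carried by −TΔS‡, raising T raises the barrier and slows folding down.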
If we first go back to the stabilization part, when we spoke about these energy gaps: why is it important to have an energy gap in the first place? That's one part of it. You don't want a smooth landscape where you gradually slide out of the native state. But you could of course imagine having many low-energy states. What would happen then? We would lose the specificity, right? So it's related to the fact that we need proteins to be specific, or rather, we don't necessarily need them to be specific in the context of this course, but if proteins did not have a specific structure, they would not have a specific function, and then they would be fairly useless to the body. Then you can start to think: what would happen if this gap was too small? That you already answered, right? Then we could gradually slide out of the native state, or jump up to the next state with just kT units of energy. So the gap has to be appreciably larger than kT. And what can happen if this gap, or rather the barrier, is sometimes too large? I didn't ask that explicitly, but we did go through it on some slides. Give me one second, and we'll take your question. That had to do with these things, right? We need to be able to cross this barrier as the temperature goes down, before we pretty much stop moving. Because eventually the protein will freeze: there are so many other interactions in the protein, van der Waals, electrostatics, and everything, and as they all start to find their partners, you lock more and more of the structure in place, until eventually we're in the situation where we have everything packed. The only problem is that it's packed in the wrong way.
There is one thing I would like to do differently, but I can't change it now, because to fix it, I would first need to destroy everything else. That will of course happen eventually, it would just be very costly. So such proteins would likely be weeded out by natural selection. You had a question? All right. Okay, good. So it's kind of twofold here, right? You need the energy gap for the native state to be stable, but you also need the landscape to be such that we can actually reach that state. And I'm well aware that that's a bit counterintuitive, contradictory even. Based on that, we made some arguments about the probability that a random sequence folds. Do you remember what number I gave you? So how accurate do you think that number is? To tell the truth, I don't know. I should have looked it up, but I haven't bothered doing that so far. The point is that it doesn't really matter. I might be off by a factor of 1,000. If it's 10 to the minus 8, it would be one sequence in 100 million that folds. Let's for argument's sake say that I'm off by a factor of 1,000: then it would be one sequence in 100,000. In practice, that's still nothing. I could be off by four orders of magnitude, even five, and it would still just be one random sequence in 1,000. Only if I'm off by more than a factor of 10,000 or so might we start to be in trouble. And that's a reason why I have shown you some of these mathematical estimates: don't be afraid of making assumptions. The way I got to the factor of 10 to the minus 8 is that I assumed that the energy gap has to be significantly larger than kT, so let's pick 10 or 20 kT. And again, 20 is not really more accurate than 10, but the point is that it's not 2 kT. So based on that, if you ever need to create new proteins, don't try randomly; you need the help of a computer. But even then, how will a computer be able to get a new fold to form?
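Why the gap has to be many kT and not 2 kT follows directly from Boltzmann statistics. Here's a small sketch, with an assumed number of competing non-native states, of the native-state population for different gap sizes:

```python
import math

def native_occupancy(gap_kT, n_competing):
    """Boltzmann population of a native state sitting gap_kT below
    n_competing roughly iso-energetic non-native states."""
    return 1.0 / (1.0 + n_competing * math.exp(-gap_kT))

# n_competing = 1e6 is an illustrative assumption for the number of
# compact decoy states, not a measured value.
for gap in (2, 10, 20):
    p = native_occupancy(gap, n_competing=1e6)
    print(f"gap = {gap:2d} kT -> native occupancy ~ {p:.3g}")
```

With a 2 kT gap the native state is essentially never occupied, because the huge number of competing states wins on entropy; at 20 kT it dominates. Whether you pick 10 or 20 kT barely changes the order-of-magnitude conclusion, which is exactly the point made above.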
So are our molecular dynamics or bioinformatics fold recognition algorithms so accurate that they manage to hit that one in, let's say, 10 to the power of 8 sequences? So why does it work, then? Because sometimes it does work; we can occasionally build new proteins. It's still so rare that it tends to be a very high-impact paper when it happens. But yes, we cheat. We look at what nature has done and we try hard to reuse it. And that comes back to many of the things we've seen, right? If there are beta sheets, and I showed you there are two or three ways to stitch up beta sheets that are much more common than the others, then let's try to build sequences that stabilize those folds. Same thing with alpha helices. Rather than worrying about individual amino acids, if you want something stable, say a Rossmann fold, pick sequences that already form a Rossmann fold, right? So you don't need to build your sequences at the individual amino acid level; you can reuse entire parts of proteins. You could even go further than that and reuse an entire protein. Imagine you would like to have something, say a four-helix bundle, that should bind something. Pick an entire four-helix bundle: you already know that it's stable as a four-helix bundle. Then try to introduce small mutations so that maybe you can at least change the binding site. That's 10 percent of the residues, or 5 percent, while you keep 95 percent of the residues, and hopefully those will still keep the entire fold stable. That's not obvious, because as we saw, I think it was on Thursday, occasionally these small defects can screw up an entire fold. But it's much, much better than trying to build an entire four-helix bundle completely from scratch. So don't start with an entire structure. See if you can get things done by just changing one amino acid.
If that doesn't work, try to change two or three amino acids. Only if you've tried everything else and nothing works might you want to start using programs such as Rosetta and try to build a brand new structure. But then you should expect that to take two or three people a year or so, and lots of experimental trial-and-error rounds. Having said that, there are cases where this works. David Baker's group in particular has done some beautiful things where they've been able to build new enzymes. That might not sound that sexy, but they've taken a process that is slow, for which there is no known native enzyme that accelerates it, and created a protein from scratch that acts as a catalyst for a process no native protein has yet catalyzed. They're not insanely efficient, but the point is that they do work as enzymes. So in a few cases at least, we can do this biotech and really engineer specific enzymatic activity. What molecules are involved in in vivo protein synthesis? Ribosomes is one of them. What does the ribosome do? And where do those amino acids come from? tRNA. Ribosomes are important for another reason: they are a very common target of antibiotics. Prokaryotic and eukaryotic ribosomes are a bit different, and if you can create drugs that disrupt the process in the ribosome, but preferably only the bacterial one, then you destroy the entire machinery the bacterium needs to produce its proteins. You kill the bacterium, but because eukaryotic ribosomes have different properties, if your drug does not bind the eukaryotic ribosome, it will only kill the bacterium, not the human. There are other molecules involved, which ones? Polymerase, which ones?
RNA polymerase, which does what? Sorry, I think that was, in a sense, right. There are two parts here. One, we need to copy DNA to DNA. The other is that we need to read DNA by copying it to RNA. So there are two polymerases, which does what? Exactly. That's the transcription part, while the DNA polymerase just copies DNA to DNA. There is a fourth molecule that I didn't mention on Friday, but that we have talked about before, not so much for synthesis, but at least for the biogenesis of membrane proteins: translocons. The ribosome attaches to the translocon, which helps it insert the protein into the membrane, or really translocate it and push it out of the cell. We also spoke a little bit about chaperones. Exactly. So what type of molecule is a chaperone? Think of these classes: is it a catalyst, an enzyme? How does it work? But in what way? So is it a catalyst? Not really, because it uses ATP; we need to feed it with energy. I also spoke a bit about these folding models. First, forget about folding: why do we use models in the first place? To simplify, but simplification per se doesn't really buy us anything. And to, well, yes, sample, but the scientific goal of a model is what? To explain and understand, right? Something would be too complicated to understand if we looked at the entire protein, and therefore we try to understand it with some sort of conceptual picture. The idea with models is that you distill things down so that we only keep the features that are common, not necessarily for all proteins, but for many proteins. What we saw already up here is that we have both entropy and enthalpy, and they both drop. And we also said that the free energy depends on the difference between them. So the height of this barrier is what determines how long it takes to fold.
It has to do with these reaction rates. So what determines if this barrier is high or low? Well, if it's very low, we don't have a problem; then we fold quickly, right? But Levinthal's paradox said that folding would take an astronomically long time. Why would it take an astronomically long time, and how does that relate to this barrier? Searching takes a very long time, right? And searching is equivalent to the entropy drop. If you have lots of freedom, you have a high entropy, and then you need to search a very long time before you find a very small state; you have a very large drop in entropy. If we have a very large drop in entropy, this barrier is going to be high. So is there any way around that? We can't really choose not to search, right? So why, that's good, thank you, actually, this is good. What I'm trying to answer is: why do we need these models in the first place? And what I'm arguing is that the reason we need them is that we want to understand Levinthal's paradox. We want to understand why proteins fold, but before starting to look at the details, let's take a step back. So I argue that the reason folding takes time is that there is going to be some sort of very high free energy barrier. And as Ellen said here too, the reason this barrier is very high is the very large drop in entropy from the searching; if this drop is large, the barrier is going to be high. And I can't choose not to search; that's not an alternative. I need to search, and there will be a drop in entropy. So the only way I can make this barrier lower is what? Given that we have to search, what is it that compensates for the entropy drop? The energy, right?
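The searching argument is the classic Levinthal back-of-envelope estimate. Here's a minimal version in Python; the residue count, the number of backbone states per residue, and the sampling rate are all conventional assumed numbers for this kind of estimate, not measurements:

```python
# Back-of-envelope Levinthal estimate: if each residue independently
# samples a few backbone conformations and sampling is sequential and
# random, the time to stumble on the single native conformation is
# astronomical.
n_residues = 100          # a modest single-domain protein (assumed)
states_per_residue = 3    # coarse Ramachandran basins (assumed)
sampling_rate = 1e13      # conformations tried per second (assumed)

n_conformations = states_per_residue ** n_residues
seconds = n_conformations / sampling_rate
years = seconds / (3600 * 24 * 365)
print(f"{n_conformations:.2e} conformations ~ {years:.2e} years")
```

The answer comes out many orders of magnitude longer than the age of the universe, which is exactly why an unguided random search cannot be how folding works, and why the entropy drop has to be compensated along the way.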
So the only way to explain why this barrier is not astronomically high is that the energy goes down, too. But it's not enough for the energy just to drop. Imagine that I had an energy that dropped like that instead, immediately. It sounds awesome: I get the energy drop right away. But the problem is that then the energy drops too fast. If you take that red curve minus this green curve, all I would have created is a molten globule with a very low free energy, because I already collected all the gain from the drop in energy, and then I still have to pay the entropy barrier. I actually just made the barrier worse. The other extreme, if you had something that looked like that: you could argue that if all the energy came from very specific side-chain interactions, so that it's not until everything is perfectly packed that I get a good drop in energy, then there would be no energy compensation along the way. I would end up with a very large barrier here, and somewhere over here, when I'm already in the native state, I would have an extremely deep native state. Then we would have the energy gap problem: the gap would be too deep. Yes, it would be great once I'm in the native state, but I would never find the native state, because the barrier would be too high. So the only scenario where this barrier is going to be reasonable is if the enthalpy and the entropy drop by roughly the same amount. Roughly. If it were exactly the same amount, they would cancel, right? But they need to be coupled closely enough that the barriers correspond to the ballpark of ten kT or so. And we know proteins fold. So to explain this, all we need to do is come up with one or several models that would explain how it might be possible for these two quantities to drop together.
So we need to play around with things, toy with models, and see whether we can come up with some model that would explain why the entropy and enthalpy drop by roughly the same amount. That doesn't mean that those models are right. There are two things we could do here. First, you can of course go into the lab, and we will do that for some of them, and argue that some are much better than others. But occasionally, and Einstein coined this term that I really love, a Gedankenexperiment, you don't need to make an experiment at all, because all you need is to show that in theory there is at least one way to do something. It doesn't mean that's the way it actually happens, because there could be an even better way. If I want to prove that it's possible for me to get to the faculty club, all I need to do is walk there. There might be an even better route, but that's details. That might sound horrible, but remember that the point here is not to describe exactly how hemoglobin folds, but to show that for proteins in general, there are ways by which they can fold. And then we came up with a couple of models. It's going to turn out that different proteins exhibit a little bit of all of these models. I think I'm going to show one simple case today where diffusion-collision turns out to be true. So what are they? What are the features of these models? Exactly. And what part would that solve? I think I argued that this is probably the least accurate of the three models we have. So what barrier would it explain and solve? And conversely, why don't we think it's such a great model? Look at the very first line: this is a process that's already very fast, right?
So it's important, but it's focusing on a barrier that's not the most important one. It wouldn't really explain how we get across the part where it takes so long to form the secondary structure elements and the detailed packing. For proteins in general, it does describe the collapse to the molten globule quite well, how a random chain would collapse. You are losing chain entropy at an astronomical rate, but by doing so, you get rid of the entropy loss that was related to the waters facing hydrophobic regions and everything. We just don't have that much good enthalpic interaction yet. So you should be aware of it, but according to me at least, it's by far the least accurate of the three. Diffusion-collision, then. In what case do you think this could be a good model? We definitely know that in some cases individual alpha helices can form very quickly; there are examples of that. But that's just one helix, that's not a protein. And some beta sheets are slow to fold, while other beta sheets can be quite fast. So can you imagine the type of protein where this would be a reasonable model: a large or a small one? You can probably imagine a small protein that consists of two or three helices or sheets, a few secondary structure elements, right? They will form quickly, and if they then bump into each other, try a few different arrangements, they will find the right way to pack. So for a small protein, this might actually work quite well. The problem is that as the protein becomes larger, the stability of one secondary structure element starts to depend on the stability of adjacent secondary structure elements, and you start having very large sheets.
You might remember that I said sheets can be non-local, right? You can have one strand, then lots of helices and other things, and then comes the second strand. In sequence space, those strands are going to be very far away from each other, and then, at least to me, that's a bit more difficult to explain with diffusion-collision. So what does the third model, nucleation-condensation, say? Right. And "a place" is key, right? Not one secondary structure element, but one region in space. Or rather than a region in space, we could even say a contact between a few key residues. And that should of course be the same every time: if I take the same protein and fold it again, if I could stop the camera and look at the transition state, I would hope to see the same handful of residues involved in that transition state every time. So in this model, what is it that determines how long it takes for the protein to fold? The key residues need to find each other, right? And how much free energy does it cost while this transition state is growing? Because once it has grown enough, I get over some sort of free energy barrier, and then it's downhill. So in this case, I would need to understand the free energy barrier of that folding nucleus. The nucleation has to do with forming it, and the condensation with more residues condensing on top of it. We usually use "condensing" about water, but in terms of physics, nucleation is any kind of reaction where a core starts to form, and condensation is when we expand it. So the point is that the names of these three models basically say what they are; it's trivial. So what's this Anfinsen versus Levinthal business? Infinite time, or whatever. And what was that about? The argument that there is simply too much searching to do.
But then I think Anfinsen said that that's not the problem; the problem is whether this would happen in time. Actually, let me swap them around. Anfinsen's idea was only about the free energy: that the native state is the global minimum of free energy. We now know that's not necessarily true for large proteins, but for small proteins it still holds. And that's not just my hand-waving. One way we've been able to show it is that we can actually simulate the folding of small proteins, and there is no way we could do that if some magic cellular machinery were involved. But that also means Anfinsen really only deals with what happens at thermodynamic equilibrium, up to infinite time; he did not worry about the transition barriers. Levinthal's argument, on the other hand, was very much focused on what can happen in practice, and that had to do with searching time: it would simply take too long to search all conformations. The argument then is, just as I argued here, that the only way we can get around Levinthal's paradox is if the enthalpy and the entropy drop roughly at the same time. There have to be some sort of guided pathways along which folding proceeds, and that is what the folding models are about. We occasionally call these folding funnels; I think fifteen years ago that was a very popular concept. The folding funnel would really mean the pathway, the guided route, say along the nucleation-condensation model, that this specific protein would roughly always follow. So is that true? Well, for now it's just a model that might or might not be true. What we're going to try to do today is look a little more at the models, in particular nucleation-condensation and these chevron plots, and see whether we can prove experimentally that these models actually have anything to do with reality. So we're going to be speaking more about folding rates.
I will deliberately repeat both the Arrhenius plots and the chevron plots a little; there might be some slides here I decide to skip. Then we will start to approach this question: how can we study the transition state? Because that's really the crux. If I can understand the transition state and the free energy around it, then we're good. It might also turn out that the free energy barrier is so bad that Levinthal's paradox can't be explained by it, but as you can probably guess, I wouldn't be doing this lecture if we couldn't explain it. We're going to talk a little about these folding funnels, and we're really going to crack Levinthal's paradox today, which I think is great fun. Then we're going to see that the energy gaps are not quite as simple as they were on Friday, because there's also a relation here: for proteins, it's not enough that they fold, they also need to fold fast. In some cases, if not just the energy gaps but the energy levels are in the wrong place, that can lead to very slow folding because you misfold first. And then we're going to look a little at the temperature dependence of folding kinetics. Already on Friday I spoke a bit about different states, right? Rather than worrying about that, let me show you a slide and then talk a little about these states. So what are the different states A, B, C, D and E here, and how would you characterize them? I'll be nasty and pick the easiest one myself: the coil. The fully stretched-out random coil is up there; that's a very high free energy state that we don't like. Now you can't pick that one. The quicker you are, the easier time you're going to have; we're going to need five different people to answer this, one each for A, B, C, D and E. A is some sort of... okay, I'll buy that. A is the molten globule, because it's just downhill from the extended state, right? Anybody else want to pick an easy state?
We can go to the other end. Yes? E sounds like a native state. That's great, yes. Then it starts to get a bit more difficult. D is some sort of transition state to the folded state, but we also have B. What would B be, and what would C be? An intermediate, a folding intermediate. And B would be the first transition state, leading to the folding intermediate, while D would be the second transition state, between the folding intermediate and the native state. Even this is an extreme simplification. In general, as a protein starts to fold, there are going to be some states that are neither transition states nor the most stable states, but where the protein is reasonably happy while it's searching. Those are the type of folding intermediates in C. Can we observe C in an experiment? We can, and the reason is that, first of all, I have said nothing about time here. If you have a slow membrane protein that takes 10 minutes to fold, C might have a lifetime of a minute. Or it might have a lifetime of a microsecond. But the point is that it has a lifetime; it is stable over some timescale. If it's a microsecond you're going to need some very advanced technology, laser spectroscopy or something, but there are femtosecond time resolution experiments today. So in theory, because it is a local minimum, we can observe it. It might be hard, but we can. On the other hand, B or D we can never observe, right? Because there is no timescale over which they are stable. They are always balancing on a knife's edge: the second we reach B, we're going to fall down either to A or to C, and similarly with D. So C might very well affect how fast the process is, it might help it along, but what really determines how fast it is going to be is in particular D. Why? Exactly. And sorry, I could even modify that: I would say it's the difference between C and D, right?
Because that's the barrier we're going to need to get over. This is also a barrier, but it's much lower, so crossing it will be much faster. Is this true for proteins in general? Is there only one intermediate state, or are there more? Or can you imagine proteins where there are no intermediate states? There are certainly cases, as I said, very large complicated structures, where we know there are intermediates. But there might be more than enough proteins where we can essentially ignore them. Because in practice, any time we have a hydrogen bond, realistically there are going to be intermediates. The question is whether these intermediates are so short-lived that we can ignore them, at least in our model. And this is a very open question. So what do we do if we don't really know? Well, one way is to just assume something and create a simple model. Again, as I said, don't be afraid of assuming; you just need to remember what you assumed, so you know where you might have screwed things up. We can assume, and see how far that takes us. I'm even going to be radically more simple: let's assume that folding is a simple two-state process. Then I can get rid of the intermediate states. So there is some sort of unfolded state here, and for now let's not care whether it's coil or molten globule. There is some state 1 where we are not folded, there is a state 3 where we are folded, which has slightly lower free energy, and there is some transition state 2 in between that we need to get over to fold. The reason for doing this is that then there are only three energies we need to worry about. And we can simplify even further: let's just assume that the difference between them is so large that once we've folded, we're not really going to unfold. That's a bit of an approximation, but bear with me for now. You can show that with this type of process...
...the probability of a molecule going over is related to exactly the kinetics and the rates we looked at before, right? The probability that a molecule has gone over is an exponential in time. There are some complicated constants in there that have to do with the timescale of the fundamental process and the free energy and so on, but let's not worry about the specific numbers. We can just say that there is some time constant tau by which it happens; that time constant includes all those other constants, but I don't care exactly what they are now. So this says that the probability that we are not yet folded decreases exponentially as molecules go over the barrier. And the probability that we are folded: well, we just said that there are only two stable states plus the transition state, so the likelihood of being folded is by definition one minus that term. As time goes to infinity, because I assumed that molecules don't fall back, the probability that a molecule is folded approaches one, and the probability that it's not folded approaches zero. Conversely, at very short times, the probability of not being folded is one and the probability of being folded is zero. If this applies to very small proteins, in theory we should be able to test it. And we have. These are simulations we ran on Folding@home a few years ago; I think I have a movie on the next slide. This small protein is a toy protein called BBA5, and this is how creative scientists are: it's a protein that contains two beta strands and an alpha helix, so beta-beta-alpha. You can imagine how many other designs they tried before this one.
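As a minimal sketch of this two-state picture (the value of tau below is just an illustrative choice, not a measured one):

```python
import math

def p_unfolded(t, tau):
    """Probability a molecule has not yet folded after time t in a
    two-state model with no refolding back: exp(-t/tau)."""
    return math.exp(-t / tau)

def p_folded(t, tau):
    """Probability the molecule has folded: 1 - exp(-t/tau)."""
    return 1.0 - p_unfolded(t, tau)

tau = 5.0  # time constant in microseconds (illustrative)

# At t = 0 nothing has folded; as t grows, everything eventually has.
print(p_folded(0.0, tau))    # 0.0
print(p_folded(tau, tau))    # 1 - e^-1, about 0.63
print(p_folded(50.0, tau))   # very close to 1.0
```

All the unknown prefactors from the barrier crossing are swallowed into tau, which is exactly the simplification made in the lecture.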
It was designed in Barbara Imperiali's group. They designed small proteins to be very stable so that they fold very quickly, and they made a whole sequence of them; this is one that is stable and that we've also run quite a few simulations on. So what you're going to see here is a small protein with water around it; I might even have shown you this movie before. What happens relatively quickly in this case, after 40 nanoseconds or so, is that first we form the helix, and then we gradually form the sheet, but the sheet is still packed the wrong way. Then it unfolds, the sheet turns around and packs the right way, and then we're happy. So in this case we can almost guess that this looks more like diffusion-collision: the secondary structure elements appear to form roughly independently. I'll come back to that in a second. Today this is probably something we could fold in a computer, but it would take forever. Forget about that time scale: this protein takes a few microseconds at least to fold, in the ballpark of four or five microseconds. But if the time constant of folding is four microseconds, then simulating this protein for four microseconds just means that the probability it will have folded after that time is one minus e to the minus one. e to the minus one is 0.37, so that's about 0.63, roughly two thirds. And two thirds is not that good odds, right? To get to 100%... well, you would never get to 100%. To get to 90% or so we would probably need to simulate for 10 to 15 microseconds, and then I would still just see one event. So that's very inefficient.
But one advantage of these simple processes, with just two states, is that I can approximate the kinetics with this equation, and I've also said that each molecule is independent of the other molecules. If we just draw this (sorry, this shouldn't be black here): this axis is time, going out to 50 microseconds, and this is the probability that we've folded, with one up here. After 40 to 50 microseconds I would be at 95%, so it's almost guaranteed that a given protein has folded. But of course, at shorter times the likelihood that we've folded gets smaller and smaller. Down here at, say, four microseconds it's going to be 0.37 or so, which is not that good. If you do an experiment and it resolves once, how much are you going to believe it? It's just one result; you would need to do it 10 times, and then we're at 500 microseconds, which is expensive. But look at how the red curve behaves down here at very small times. If you magnify that part, at very small times we can approximate the red curve by its derivative, one of these simple approximations we already used in physics. And if we look at this not over a scale of microseconds but from 0 to 10 nanoseconds, you see how good the linear approximation is. At small times the likelihood that you've folded goes, in this case, from 0 to 0.001, that is 0.1%. So if I simulate this for 10 nanoseconds, there is a 0.1% likelihood that I will see it fold. I'm not sure about you, but I don't like those odds. But the events are independent. And while doing something at 50 microseconds would require a very large supercomputer, 10 nanoseconds I can run on my laptop. So instead of trying to run one very long simulation, what if you run, say, 10,000 very short simulations? The probability per simulation here is roughly one in 1,000.
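The small-time limit used here is just the first-order Taylor expansion 1 - exp(-t/tau) ≈ t/tau. A quick numerical check (tau = 10 microseconds is an illustrative value consistent with the 0.1% figure in the lecture):

```python
import math

def p_folded_exact(t, tau):
    """Two-state folding probability: 1 - exp(-t/tau)."""
    return 1.0 - math.exp(-t / tau)

def p_folded_linear(t, tau):
    """First-order Taylor approximation, valid when t << tau."""
    return t / tau

tau_us = 10.0   # time constant in microseconds (illustrative)
t_us = 0.010    # 10 nanoseconds expressed in microseconds

exact = p_folded_exact(t_us, tau_us)
approx = p_folded_linear(t_us, tau_us)
print(exact, approx)  # both are about 0.001, a 0.1% chance per short run
```

The two numbers agree to better than one part in a thousand at this timescale, which is why counting rare events in many short runs gives the same information as one long run.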
If you run 10,000 such simulations, we would expect to see roughly 10 events, and they are all independent of each other. So what we started to do at Stanford about 15 years ago, in the Pande group, was to use screensavers all over the world and have hundreds of thousands of computers each run small simulations. Before that, nobody had bothered, because one computer gives you 10 nanoseconds and we need 50 microseconds. But this way we could decouple things: instead of having the computers connected together in a supercomputer, we let 100,000 computers work on the problem independently. The only catch is that this only works if the approximation I made on the last slide is true, that it's a first-order transition and the events are independent of each other, which is of course not strictly true. But the point is that it works, and that's the structure we got. We've done larger proteins since; I think this group has gone up to 60 or 70 amino acids or so. The other thing we realized at the time is that there are other pieces of hardware. This small chip is a processor from IBM called Cell, which was part of the PlayStation 3: IBM made the processor, and Sony built the PlayStation 3 around it. Sony even helped port this code to the PlayStation 3, so we had tons of gamers all over the world contributing time on their PlayStations. It's kind of fun, because this is the only scientific program that was ever allowed to run on the PlayStation 3; we even had it signed by Sony. And it's not just the PlayStation. What also happened is that you got all these graphics processors, GPUs, and they're wonderful; the only problem is that you only have one of them in your computer.
Now we have supercomputers with thousands of these processors, but because we could rely on each simulation running without talking to the others, we could use all this hardware: all types of home computer hardware from gamers all over the world, and I think Sony was even nice enough to design a screensaver. My kids don't even use their PS3 anymore; they got a PS4. Sorry, the PS3 is too old for my 12-year-old son. We have one in the lab, though; it's been running this screensaver for a long time. And the cool thing is that it works. We've even had users in the US building special rigs, atlasfolding.com: one of them basically built some 10 motherboards with four GPUs in each, contributing the power and cooling himself. The Folding@home project is still around, and I would say it's still an awesome and important project. One challenge is that modern computers have gotten better and better at saving energy. This used to be a free lunch for us when everybody had a desktop at home: those desktops were running 24-7 and not powering down anyway, so people were happy to donate the cycles. Today most people have laptops, and even the desktops, like an iMac, drop into power-save mode, so the project hasn't grown quite as much. On the other hand, what has happened is that you have the cloud, like the Amazon Elastic Compute Cloud, and even most supercomputers have started to work much more in this way. My point is that simulation today has much more to do with sampling. The largest computers in the world in a few years are going to have in the ballpark of a million compute units and GPUs, and then you're going to need techniques like this. But what I haven't shown you yet is whether this worked, and this is what we saw.
Remember that with 10,000 simulations, one simulation is just noise, right? But let's take the average over these simulations. If I stop the clock after one nanosecond and ask how many of my 10,000 simulations have folded: oh no, zero. That's bad, because my model said that we should start at zero and then go up linearly, right? But after one nanosecond not a single one had folded. Of course, we don't necessarily run all of these in parallel; we collect the simulations over a few weeks. Say that after five nanoseconds, still not a single one out of these 10,000 had folded. They did not all run at exactly the same time, but when you look at all the simulations together, suddenly you go: wait a second, funny, there's one simulation that folded within 10 nanoseconds, just one in 10,000. That could of course be a freak simulation; it could be noise, just one outlier sample. But then, after 20 nanoseconds, something like 0.3% of them had folded. So the point is that the probability that we have folded goes up, and this slope effectively describes the fraction of proteins that has folded as a function of time, right? If you trust me a bit here (it's not the world's best fit, because the standard errors are fairly large), it definitely goes up, and it's not unrealistic to argue that it's a line. So is that what we predicted? Not quite, right? This red arrow, there's something wrong here. The argument you could make is that maybe it takes on the order of 10 nanoseconds to actually cross the barrier, because the atoms have to move into place and they can't move infinitely fast. Another way to describe this is that we probably have intermediate states as we are crossing the barrier, right?
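The rate estimate from many short runs can be sketched like this; the counts and times below are made-up numbers in the spirit of the lecture, not the actual Folding@home data:

```python
import random

random.seed(0)

tau_true_ns = 5000.0   # "true" folding time constant, 5 microseconds
t_sim_ns = 20.0        # length of each short simulation
n_runs = 10_000        # number of independent short runs

# In the small-time limit each short run folds with probability
# roughly t_sim / tau, independently of all the others.
n_folded = sum(random.random() < t_sim_ns / tau_true_ns
               for _ in range(n_runs))

# Estimate the rate from the slope: k = events / (runs * time per run),
# and the folding time constant is just its inverse.
k_est = n_folded / (n_runs * t_sim_ns)  # events per nanosecond
tau_est_ns = 1.0 / k_est
print(n_folded, tau_est_ns)  # roughly 40 events, tau in the few-microsecond range
```

Even though no single 20-nanosecond run is remotely long enough to see typical folding, the ensemble of rare events recovers the microsecond time constant, which is the whole trick behind the distributed approach.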
But still, it's a decent model. Fundamentally it does work: each molecule appears to cross the barrier independently of the others, and the fraction folded goes up roughly linearly. Most simulations would not fold, but a few of them do, and I can even calculate the rate from this, which comes out in the ballpark of a five-microsecond folding time. The experimental value, I think, is seven or so. Given how insanely simple this model is, I'm kind of amazed that it works so well. So does that mean that diffusion-collision is true? How would you assess that? What was the key idea of the diffusion-collision model? That the parts form independently. And if you know a little mathematical statistics, we can test that. If two events are independent, what is the probability of both of them occurring? Right. If the probability of me winning the lottery today is 1%, and the probability of me, say, getting a grant next week is some other number, they are likely not dependent on each other, and then I should multiply the probabilities. On the other hand, if the probability of both of them occurring is not the same as the product, then there was a dependence. With the simulations I can calculate this, because I can check: is the helix present? Is the sheet present? Are both the helix and the sheet present, in each of these simulations? So what is the probability of both the helix and the sheet being present across that ton of simulations? What is the probability of the helix being present, not caring about the sheet? And the probability of the sheet being present, not caring about the helix? Then we compare them, and we get pretty much a perfect fit, which means that they appear to be independent. So this protein folds according to a diffusion-collision model.
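As a sketch of that independence test (the per-run structure flags here are synthetic stand-ins for what you would actually measure in the trajectories):

```python
import random

random.seed(1)

# Synthetic per-simulation observations: does each short run show a
# formed helix and/or a formed sheet? Generated independently here,
# which is exactly the null hypothesis of diffusion-collision.
p_helix, p_sheet, n = 0.30, 0.20, 100_000
runs = [(random.random() < p_helix, random.random() < p_sheet)
        for _ in range(n)]

p_h = sum(h for h, s in runs) / n            # P(helix), ignoring the sheet
p_s = sum(s for h, s in runs) / n            # P(sheet), ignoring the helix
p_both = sum(h and s for h, s in runs) / n   # P(helix AND sheet)

# If formation is independent, P(both) should match P(helix) * P(sheet).
print(p_both, p_h * p_s)  # the two numbers should nearly coincide
```

With real trajectories a systematic gap between P(both) and the product would mean one element's formation depends on the other, ruling out the simple diffusion-collision picture.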
The other thing you can then do, and again I'm showing this not because BBA5 itself is important but to show you the power of simple models: another question that has perplexed the entire field for a long time is how important the water is for proteins. We know that they can't fold without water, but as the protein is folding, is there structure in the water too? Is the water helping the protein to fold? What did we say at the end? There should not be any water molecules in the final structure, right? But are the water molecules leaving first, or are they leaving while we are making the transition? The neat thing with simulations of these toy proteins is that we can start to answer questions like this. What we can do is stop the clock in a bunch of these simulations, remove all the solvent water, and add back new, random solvent water. If there was some important structure in the water that helped the protein fold, I just destroyed it, and then the probability of folding after I've added back the new water should be much lower. And here too, it doesn't really change: if I stop the clock and replace the water structure, a simulation that was on its way to cross the barrier still crosses it. So effectively I have the pathway as the protein goes from the molten globule to the folded state, and I can fix states all along this pathway and ask whether it would still continue along the path if I change something. So here, again, a simple computer simulation tells us there is likely not that much specific information in the water for this particular protein.
You can also try to remove the water completely. Why would you do that? Simulate in vacuum? That seems stupid: we know that proteins in general can't fold without water. So why on earth would you simulate that? There are only two possible outcomes, right? Either you reproduce the thing we know is true from experiments, or you don't. Now, what does it tell you if you don't reproduce what is known from experiments? Remember, our simulation is a model, and in creating the model we've made a ton of simplifications. There is always this nagging feeling: what if I simplified too much? The way to check that you didn't simplify too much is to make sure you can reproduce things that you know are true. This is a fundamental concept that experimentalists in particular are very good at, and you need to remember it too: controls, both positive and negative. In any scientific paper and experiment you're going to see a positive control and a negative control. If we simulate this protein under conditions where it should not fold, we make sure it doesn't fold; and if we do something where we expect folding, we make sure that does happen. Because if your positive and negative controls fail, there is something wrong in your model. And this applies to everything in this course, even sequencing. Say you were testing a hypothesis about a mutation in a cancer tumor, to see whether a specific new drug could help, and you include four mutations that you know have nothing to do with the disease.
If your model then predicts that those four are the ones explaining your disease, there is something wrong with your model. So always, whenever you have a chance, try to use controls, both positive and negative ones. They're basically a reality check that your model didn't throw out the baby with the bathwater. And again, this happens a lot in experiments too. So what do you do if your positive and negative controls fail? First I would redo the experiment, because things can go wrong. That's particularly true for experiments; I would certainly redo the simulation too, but simulations are easier in the sense that you can control them: if you made a mistake in the simulation, you're going to make exactly the same mistake next time. They're reproducible. In an experiment, well, we work with frog eggs, and occasionally you're unlucky and that particular frog was not happy, so the frog eggs didn't express the protein. You don't necessarily throw out everything because of one failed experiment; you redo it two or three times just to make sure the problem wasn't a fluctuation in the experiment. But of course, in the long term, if your results are not compatible with the experiment, you have to understand why. Either there is something in the experiment you didn't account for, or there is something in the model you didn't account for. The experiment is always correct under the conditions where it was made, but you might of course not have thought about something; you didn't consider that you had a contamination in the lab, but you did. The other thing we can do here is study exactly what happens as we're crossing this barrier and moving into the folded state, which we could never do in the lab, because this is literally the transition state, right?
But the problem is that we have a bunch of simulations, at least 10 of them, and they all cross this transition state at different points in time. So what we do is slide the x-axes so that the zero point is exactly when we say that the protein is folded, when it has reached a low RMSD to the native structure. The zero-to-one axis here is just an arbitrary scale for a bunch of criteria. When this blue curve is zero, it means we are far away from the folded state; when it is one, it means the root mean square deviation of the atoms has reached the folded state's value. The red curve is the radius of gyration, meaning how compact the protein is. When it goes to one, I'm not literally saying the radius is one; one means it has the same radius as in the folded state, and zero would be the unfolded state. So the protein's radius goes down exactly as the atoms fall into place. And the one thing that comes a little after is the solvent density in the interior of the protein. What appears to happen is that as the protein folds and the side chains find each other, the very endgame, the last thing that happens, is that we push out the handful of waters that were occupying the core. But again, this is just one protein. Sadly, this approach only works for very small proteins; even with Folding@home we start to see that the assumptions are not strictly true, and then you get very long time scales where we can't really study things that well. It's 10:20; I'll take 10 more minutes and then give you a break, and I'm just going to rehash a little of what we said about experimental folding rates.
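The alignment trick described here, shifting each trajectory's time axis so that t = 0 is its own folding event and then averaging the normalized observables, can be sketched roughly as follows, with toy traces standing in for real simulation data:

```python
# Sketch of aligning trajectories at their individual folding times.
# The traces here are synthetic; real ones would carry RMSD, radius
# of gyration, core solvent density, etc., one value per frame.

def align_at_folding(trajectories, fold_times):
    """Shift each trajectory so its folding event sits at offset 0,
    then average the observable over trajectories at each offset."""
    aligned = {}
    for traj, t_fold in zip(trajectories, fold_times):
        for t, value in enumerate(traj):
            aligned.setdefault(t - t_fold, []).append(value)
    return {dt: sum(v) / len(v) for dt, v in sorted(aligned.items())}

# Two toy "compactness" traces that fold at frame 3 and frame 5.
trajs = [[0.0, 0.2, 0.6, 1.0, 1.0, 1.0, 1.0],
         [0.0, 0.1, 0.3, 0.5, 0.8, 1.0, 1.0]]
folds = [3, 5]

profile = align_at_folding(trajs, folds)
print(profile[0])  # 1.0: both trajectories are "folded" at their own zero
```

Averaging in this shifted frame is what lets one say that, for example, the core waters leave slightly after the chain has become compact, even though the raw trajectories fold at wildly different times.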
Ideally we would like to have some sort of time-resolved measurement, but time-resolved measurements can only observe states that are metastable. If I have a state that's stable for on the order of a millisecond, I can use a stopped-flow experiment or something and see what happens after one or a hundred milliseconds, but I can't see the transition states. To understand the transition states we instead had to look at the energy barriers: what is the energy barrier if we go from unfolded over the barrier to folded, or vice versa? And then I also said on Friday, fittingly since we are in this building, that one very simple way of doing this is an Arrhenius plot. The reason you make a plot like that (sorry, I need to go back) is that you have something here that is a function of constants, which we don't care about, times an exponential of a constant that I would like to find out, divided by temperature. On a modern computer you could of course fit this with an arbitrary exponential function, but then it's difficult to see with your eyes how good the fit is. It's much easier to always try to transform things so that you get a linear curve. And the inverse of the exponential is the logarithm, right? So I take the logarithm of k: on the left side I have the logarithm of k, the logarithm of the exponential disappears, and the prefactor just becomes a constant in the curve. So the logarithm of k equals a constant, minus the barrier height (we can fold R into the constants) divided by T.
And that's exactly the point: if we plot the logarithm of k as a function of one over T, we get straight lines, and the slope of the line corresponds to the barrier height divided by R. That works great in some cases. The only problem is that, particularly at the midpoint where we have both folding and unfolding, it's very difficult to separate the folding from the unfolding process, because they're mixed up: some proteins go from left to right and others from right to left, folding back and forth. Unless you have some way of removing the protein that has already folded, it's going to be virtually impossible to measure. Sorry, these slides I will skip. So instead we went on to study the apparent folding rate, the effective rate at which the entire process appears to happen. I would like to determine this effective rate constant for going from unfolded to folded, but I need to account for the fact that some molecules also go back. The way I did that is to look at the entire equilibrium between folded and unfolded. The equilibrium constant is the quotient of the rate from unfolded to folded and the rate from folded to unfolded; at equilibrium that corresponds to the difference in free energy between them, and it's also related to the number of molecules in each state. Then we did a bunch of math that you can follow in detail in the book if you want. The whole point is that I want to extract the time constant: how does the number of folded or unfolded molecules vary as a function of time? So I look at the change in those molecule numbers over time, which is the folding flux minus the unfolding flux that I need to subtract.
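Written out, the two-state rate equations being gestured at here look like this, using $k_f$ and $k_u$ for the folding and unfolding rate constants and $N_U$, $N_F$ for the unfolded and folded populations (the notation is mine; the book's symbols may differ):

```latex
\frac{dN_U}{dt} = -k_f N_U + k_u N_F, \qquad N_U + N_F = N_{\mathrm{tot}}
```

Substituting $N_F = N_{\mathrm{tot}} - N_U$ gives a single first-order equation,

```latex
\frac{dN_U}{dt} = -(k_f + k_u)\,N_U + k_u N_{\mathrm{tot}},
\quad\Rightarrow\quad
N_U(t) = N_U^{\mathrm{eq}} + \bigl(N_U(0) - N_U^{\mathrm{eq}}\bigr)\,
e^{-(k_f + k_u)\,t},
```

so the population relaxes exponentially with the observed rate $k_{\mathrm{app}} = k_f + k_u$, which is exactly the "slightly surprising" sum of both rates mentioned next.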
Then I use the top equation, because again I don't know exactly what both N_A and N_B are, and after a bit of work you end up with a differential equation. You then see that the number of molecules is proportional to a constant that we don't really care about, times an exponential. And as a slightly surprising result, it turns out that the rate constant in this exponential is actually the sum of both rate constants. Again, it's a bit counterintuitive, but it means that the apparent rate, the total relaxation rate, is the sum of the folding and unfolding rates. I don't expect you to be able to prove that; my only reason for showing it is that I don't want to just drop the result and say "trust me". And you can, it's not hard, spend 10 minutes going through it in the book if you want. What that led us to, which is the final thing I'm going to say before the break, is these chevron plots. The chevron plots look very similar to the Arrhenius plots, and I do the same thing: I plot something as a function of, in this case, denaturant concentration; I could of course use temperature denaturation too. But this effective rate now measures both the folding and unfolding rates. So one end, at very low denaturant, is dominated by folding, and at very high denaturant concentration (or temperature) it's dominated by unfolding. And then we have some sort of midpoint, right, where we have a bit of both, and that's why we don't get down to the crossing point. These are, I wouldn't say trivial, but straightforward to measure in the lab. So if I give you a new sequence, you can hopefully in a day or so measure this denaturation curve and calculate how fast folding goes in both directions. And when you've measured that for one mutant, I'm going to give you a second mutant, which is another day. And a third mutant, and a fourth mutant, and a 593rd mutant.
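The claim that the apparent rate is the sum of the folding and unfolding rate constants can be checked numerically without redoing the book's derivation. A minimal sketch, with made-up rate constants, that integrates the two-state kinetics and reads off the apparent rate from the decay toward equilibrium:

```python
import math

def relax(kf, ku, nu0, nf0, t, dt=1e-4):
    """Forward-Euler integration of dNf/dt = kf*Nu - ku*Nf."""
    nu, nf = nu0, nf0
    for _ in range(int(t / dt)):
        flux = kf * nu - ku * nf
        nu -= flux * dt
        nf += flux * dt
    return nu, nf

kf, ku = 3.0, 1.0                 # invented folding/unfolding rate constants
nf_eq = kf / (kf + ku)            # equilibrium folded fraction (total = 1)

# start fully unfolded and watch the deviation from equilibrium decay
_, nf1 = relax(kf, ku, 1.0, 0.0, 0.5)
_, nf2 = relax(kf, ku, 1.0, 0.0, 1.0)

# apparent rate extracted from the exponential decay of (nf_eq - nf)
k_app = math.log((nf_eq - nf1) / (nf_eq - nf2)) / 0.5
print(k_app)  # ≈ kf + ku = 4
```

The counterintuitive sum appears because the system relaxes toward equilibrium from either side, and both processes drain the deviation.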
And somewhere along that road there is going to be a PhD thesis, because we collect tons of these curves. The point is that for different mutants these curves will be shifted a bit up, down, left, or right, and what we're going to look into after the break is that based on those shifts, I can draw conclusions about what happened to the free-energy barriers, both folded to unfolded and unfolded to folded, which is going to help us identify the transition states. But it's 10:26 now. Let's meet here at 11 sharp and we will continue. Let's get started again. We have one more hour, 20 slides. The chevron plots are what we're going to spend a little bit of time working on. This is not hard, but it requires a bit of thinking about what we can get out of all these curves. And as I mentioned, when it comes to understanding a process, you need to consider all the components of the process. So this is practical to understand: we want to use these plots to understand what parts of the protein form the transition state, and the only way we can understand that is by making mutations and looking at the changes. The challenge here is not necessarily exceptionally complicated, but there are many things changing. You have a wild-type residue and you have a mutated residue. For each of these, you have an unfolded state, a folded state, and a transition barrier. And that means we have processes both from unfolded to folded and from folded to unfolded, both for the wild type and for the mutant. So there are really four things we need to think about here, and that's what makes it a bit more cumbersome, but it's not exceptionally advanced math. So the beautiful thing here: we have a green curve, sorry, those are the book's colors, and green and red is a really stupid color choice given that 7% of males are color blind. The green curve here is the wild type.
And if we forget about the red curve for a second, there is some sort of halfway point where, if you extrapolate unfolding to native and native to unfolded, well, you don't even have to extrapolate it, right? It's exactly where they have the same speed, that crossover point. Let's call it some sort of C0, because that is the midpoint concentration of the denaturant, or temperature, or whatever it is. That is exactly the halfway point. For the wild type, that corresponds to the transition region: you're on the edge of the knife, equally likely to fall down in either direction. The only problem is that this doesn't tell us anything by itself. It just says that at this point there are just as many molecules folding as unfolding, but we don't know anything about the state itself. So then I introduce one mutation in a randomly chosen location, which is not so randomly chosen, because if I pick one truly at random, it will likely have no effect at all. But sooner or later there will be some position in this molecule where, if I mutate that residue, things will change. And here we have one where it changes: the entire curve has moved a bit to the left and maybe a bit up. Now there are a couple of things to think about. What I really would like to know is whether a residue is part of the transition state or not, but that's a bit of a jump, so we're going to take this in a few steps. Let's take this one first: if we are moving from the unfolded to the native state, how have we changed the balance between the unfolded and the native state? That should be read off at this midpoint, right? Exactly halfway. Because we're interested in the wild type; the mutant is just an experiment I do to see what happens. And if I'm interested in the unfolded-to-native process, that's the left side of these curves. So on that side I have a slope in the green curve and a slope in the red curve.
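The shape of a chevron plot can be sketched with a simple two-state model, assuming (as is conventional) that ln k_f and ln k_u each vary linearly with denaturant concentration; all parameter values here are invented for illustration:

```python
import math

def ln_k_app(denat, ln_kf0, mf, ln_ku0, mu):
    """Apparent rate on a chevron plot: k_app = k_f + k_u, where the
    folding rate slows and the unfolding rate speeds up (log-linearly)
    as denaturant is added."""
    kf = math.exp(ln_kf0 - mf * denat)   # folding arm
    ku = math.exp(ln_ku0 + mu * denat)   # unfolding arm
    return math.log(kf + ku)

# hypothetical wild-type parameters (made up for illustration)
ln_kf0, mf, ln_ku0, mu = 5.0, 1.5, -3.0, 1.0

# midpoint C0: the denaturant concentration where k_f == k_u
c0 = (ln_kf0 - ln_ku0) / (mf + mu)

# at the midpoint the observed rate sits ln(2) ABOVE either extrapolated
# arm -- the chevron never comes down to the crossing point
arm_at_c0 = ln_kf0 - mf * c0
print(ln_k_app(c0, ln_kf0, mf, ln_ku0, mu) - arm_at_c0)  # ≈ ln 2 ≈ 0.693
```

This is the quantitative version of "we don't get down to the crossing point": at the midpoint both processes contribute equally, so the measured rate is twice either arm.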
Each of these slopes corresponds to the logarithm of a rate, right? And the reason I have logarithms of the rate is that the rate was proportional, sorry, the slide says equal, but it should be proportional, to an exponential of minus the free energy of the barrier divided by RT, or kT. So the logarithm corresponds to some ΔF divided by kT. And if this has to do with unfolded to native, I think, yes, let's see. Oh yes, I even have it. So if I go from unfolded to native, what determines how fast that is, is how quickly I can get over that barrier, right? If I'm going from unfolded to native, what matters is how much energy I need to get across that barrier. And that is the simple old stuff. The only problem here, sorry, that is not a problem at all: the right-hand side is something I measure in this diagram, and that means I can deduce the free-energy difference, the barrier I need to get over. The only problem is that that's not enough, because there are two things that can change this barrier. What can alter that barrier? It's not a trick question; it's written on the left-hand side. It's not more difficult than that. Right. So then, of course, if the mutation has changed the transition state, bingo, that is exactly what I was looking for. The only problem is that if I just introduce a mutation, I don't know that it changed F‡, right? It might have changed F_U instead. So if this barrier is lower, it could be because either we reduced the free energy of the transition state or I increased the free energy of the unfolded state. You can think of this as an equation: I know the difference, but what I was interested in is F‡, not F_U. And the obvious way to get F_U: so now I have compared the transition state to the unfolded state, but I'm also going to need to compare the unfolded state to the folded state.
Because that will tell me whether I only changed the transition barrier or whether I changed the unfolded state. And that is a bit more complicated, because now I need to check how easy it was to go from unfolded to folded versus from folded to unfolded. What I'm really interested in is the difference between the free energy of the native state and the free energy of the unfolded state. And again, I just use that equation: it's going to be minus RT times the logarithm of k unfolded-to-native, plus RT times the logarithm of k native-to-unfolded, because there are processes going in both directions. According to the logarithm laws, I'm allowed to take the quotient of those two and use the logarithm of the quotient instead of the difference of the logarithms, right? So the difference in free energy between the two states is going to be minus RT times the logarithm of the quotient of these two rate constants, k unfolded-to-native over k native-to-unfolded. And you can actually read this off: take the red curve, and look at where we have native-to-unfolded and where we have unfolded-to-native, and see how different those processes are at the midpoint. One of them is at the midpoint, we had it there, right? And the other one, extrapolated to the midpoint, would be there. So the difference between those two tells me the difference in free energy between native and unfolded. And this is the pièce de résistance. If I now compare these, because here too I don't know whether it was the native or the unfolded state we changed, you could imagine that maybe I changed the native state instead, let's look at the quotient of the two. That's a free energy divided by a free energy, so it's just going to be a dimensionless number. If whatever I did to my residue did not change anything at all here, well, then it's going to be zero.
And if it's zero, I don't really care what you did to the native state or anything else: you did not change the free-energy barrier at all, it's not going to influence the rate at which the protein folds, so the residue is definitely not part of the transition state. If this number starts to be large, say one or so, then what I'm saying is that this residue influenced both the transition state and the native state, and the important part is that it influenced the transition state. Whereas if I only influence the unfolded state, I'm not going to see that difference. So these phi values, which were introduced by Alan Fersht: the point of them is that all I need to do is, well, no, you're not going to do this yourself; you ask your students, of course. And you tell your students, I need 20 more chevron plots, so try all these mutants for this protein, and then you ask the next students, you two try 20 more next week. And I'm not joking: Alan Fersht has made a gigantic career out of a protein called barnase, and at some point, at a conference, I was a few years older than you are now, I realized that in one plot he was showing there were something like 10 PhD theses, because each PhD thesis consists of adding a few hundred more points. Which might sound horrible, but we've learned so much about protein folding from this. So what these phi values tell you is that if they are high, technically they can even be above one, but that's very rare, it really means that the residue was part of the transition state. Whereas if the value is close to zero, well, you might have influenced the stability, either because you improved the stability of the native state or reduced the stability of the unfolded state, but that is just stability; it's not the kinetics of folding.
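The comparison just described, the change in the folding barrier divided by the change in overall stability, is the phi value, and it can be computed directly from the four measured rate constants. A minimal sketch with invented numbers for the two limiting cases:

```python
import math

R, T = 8.314e-3, 298.0  # gas constant in kJ/(mol K), temperature in K

def phi_value(kf_wt, ku_wt, kf_mut, ku_mut):
    """phi = ddG(TS - U) / ddG(N - U), both computed from measured
    folding (kf) and unfolding (ku) rate constants."""
    # change in the folding barrier (transition state relative to unfolded)
    ddG_barrier = R * T * math.log(kf_wt / kf_mut)
    # change in overall stability (native relative to unfolded), K = kf/ku
    ddG_stability = R * T * math.log((kf_wt / ku_wt) / (kf_mut / ku_mut))
    return ddG_barrier / ddG_stability

# hypothetical mutant that slows folding 10x but leaves unfolding alone:
# the destabilization is felt fully in the transition state -> phi = 1
print(phi_value(kf_wt=100.0, ku_wt=1.0, kf_mut=10.0, ku_mut=1.0))  # → 1.0

# hypothetical mutant that only speeds unfolding 10x: a pure stability
# change with no effect on the folding barrier -> phi = 0
print(phi_value(100.0, 1.0, 100.0, 10.0))  # → 0.0
```

Phi = 1 says the mutated residue is as structured in the transition state as in the native state; phi = 0 says it only reports on stability, not on the kinetics of folding.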
So with phi values, a value near one means that whatever residue this was, it was part of the transition state, part of the state that determines the speed at which folding happens. It's not that the other residues are unimportant; they are still just as important for the stability of the protein, but they were not the residues that control the rate of protein folding, the rate of formation. And this is a central concept that you see throughout biology, well, biochemistry in particular: phi values of proteins. Normally we only talk about them in the context of protein folding, though I guess you could use them to study any sort of transition between two states. So what Alan Fersht did is that he spent a large part of his career studying a small protein called barnase, and for this protein they have been able to map out specifically which residues are the first ones to form the transition state. So that combination, phi-value analysis, should ring a bell, and you should say Alan Fersht. Barnase is a bacterial ribonuclease, if I recall correctly, which cuts RNA into smaller pieces, and it has a partner protein called barstar; they bind to each other, so it's also a traditional example of a very tight protein-protein interaction, tight binding and everything. It's a classical protein used by several groups, not just Fersht's. And this approach can be used for much more: Fersht's work was very much about understanding protein folding in general, but it can be applied to a whole bunch of other examples. Let's see, I think I have transition states for the folding of outer membrane proteins, these OMPs. Remember that I said there were these other classes of non-alpha-helical membrane proteins, and that we didn't really understand how or why they fold, because they can't be inserted through the translocon.
Talk about a difficult transition state, right? We don't even know where the process is happening. But by studying this you can map out which residues in these beta sheets are responsible, which parts form first. You make chevron plot after chevron plot after chevron plot, and then you map out that there is a small band right in the middle of the sheets that starts to fold, and that's what has produced this model: you have the entire chain almost condensed together, lying interfacially, the secondary structure elements gradually forming and being pushed together, and then the entire beta barrel inserts into the membrane and gradually expands. We don't have any structures of this; these are not intermediate states in the traditional sense, but by using phi-value analysis we can understand them. There is no way we can see the structure, right? So all we've done here is take the final structure, the stable one, and then color the residues whose interactions have to happen first. But exactly what this state looks like, we don't know. We can guess roughly that, again, if these parts have to be in contact, there has to be some sort of beta sheet formed there, but exactly what the loops and other things look like, we have no idea. Yep, so this has to do with the phi value, which in this case runs from one to minus two: one would mean that the residue is very much determining the transition state relative to the others, and minus two would indicate that it is actually destabilizing that state instead. Both one and minus two are important for the folding process, while zero would mean it's not important for the folding process. We're not going to go into negative phi values right now, but you can think of this as a scale that determines how important the residues are. There are other things happening upstairs here.
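The scale just mentioned, running from minus two through zero up to one, can be summarized as a rough classifier; the thresholds below are my own illustrative choices, not standardized cutoffs:

```python
def interpret_phi(phi):
    """Rough reading of a phi value on the scale discussed above
    (thresholds are illustrative, not standardized)."""
    if phi > 0.7:
        return "largely forms the transition state"
    if phi < -0.5:
        return "destabilizes the transition state"
    if abs(phi) < 0.2:
        return "not important for the folding rate"
    return "partially formed in the transition state"

for phi in (1.0, 0.5, 0.0, -2.0):
    print(phi, interpret_phi(phi))
```

Note that both the high-positive and the negative residues matter for the folding pathway; only the near-zero ones are pure stability reporters.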
Actually, Mikael Oliveberg, who I think I mentioned, is very interested in protein misfolding, which is intimately related to a bunch of diseases such as ALS. Here they've taken small proteins that consist of four beta strands and a few helices, and one of the questions they had was: where does this structure start to form? Does it start N-terminally or C-terminally? And you recognize the plots now, right? Every small mark here is an experiment done at a certain concentration of guanidinium hydrochloride, and one mutant will produce one curve like this. That might take an afternoon, well, a few hours or so, but the point is you can probably do a couple of these curves in a week. And what you then get, in this case it's easier because the values lie between zero and one: zero would mean the residue has no influence, that it is definitely not part of the transition state, while dark blue would mean that it is very much part of the transition state. And that's interesting, because you see that folding does not start N-terminally or C-terminally; it appears to start right in the middle of the molecule. So what they then did: there are two parts of the protein. Normally the N-terminus starts here, you go through the entire protein, and the C-terminus stops fairly close to the N-terminus. The idea they came up with is that if the N-terminus is very close to the C-terminus in space, imagine you have part of the protein here, N-terminus there, C-terminus there; to make this a little easier I'm going to color it a different way, so let's say the second half of the protein is red. When you draw this in space, it turns out that the N-terminus is very close to the C-terminus; in 3D they are very close. So what
they did is, well, that's funny: can we take these genes and instead join them the other way? The idea is that the end of the red part is anyway close in space to the beginning of the green part, so this should only require a very small loop; effectively I've introduced a break, but only in a loop where the ends were anyway relatively close to each other. And the cool thing is that it works. They can do circular permutations of proteins and show that they will still fold; in this case I think the folding nucleus was even more pronounced in the second construct. Yes, I think it's down here, I should have drawn this from another view, but it's down there, sorry, you don't see it that well. The reason they were interested in this has to do with the formation process of beta sheets and why they stabilize proteins, because I think it's the beta sheets in particular that are related to these aggregation and misfolding diseases. But I think it's also a beautiful example of the importance of phi values. And note that they did not determine the structure of the protein themselves; the structure had already been determined by another group. This is a very convenient method: even if I hand-waved a bit through the mathematical derivation, this is not difficult research. You're sitting with a spectrometer or something, just measuring how much folded protein you have as you titrate in more denaturant. So these are cheap and simple experiments that provide super deep insight into very advanced processes, and I would say that's how most really good science works: anybody can buy a super expensive experiment; the best science is what you can do in any undergraduate chemistry lab and still get very advanced results. So based on that, we're going to take a second look at Levinthal. There are two possible views here: we can say
that we side with Anfinsen, and that the native state is really the lowest free-energy state; or we can focus entirely on Levinthal and say that the native state is really about fastest folding, or at least that it is the lowest-energy state we can access the fastest. You already have it in the lecture notes, so I'm not going to ask you the question, but maybe we actually have a little bit of both here, and I would particularly argue that stable structures tend to lead to rapid folding pathways. Because if you think about the protein, if you have a stable beta sheet, having a well-defined, good beta sheet helped create the stability in the structure for the Oliveberg protein I showed you, and as you're forming this beta sheet, it helps you get the rest of the protein right. So if you have some sort of good starting point, things will usually go the right way, which is of course related to nucleation condensation. This is not a proof, it's a bit of hand-waving, but based on this we can look a little more at nucleation condensation, and I'm going to argue that at least in one case it can explain why folding happens. You already know the argument about how long folding takes. Sorry, I've now changed this to time instead of rate constants; you could argue that I really should make up my mind and consistently use either time or rates, and that's a very well justified complaint. The only problem is that's not how it works in reality: we sometimes look at rates and then use k, and we sometimes look at times. I so wish the world weren't that way, but it is. In this case we're going to look at time, and the only difference is of course that we get rid of the minus sign, so that the time it takes for something to fold depends on these free-energy barriers. And as we said before the break, if I'm going to argue that there is a way for at least most proteins to fold in finite time, we're going to need to find an
explanation for why this free-energy barrier is not as high as it would have been by default, if we had to search everything. And you can argue about what this elementary time is; it's the time of some sort of fundamental step, say elongating a helix or forming part of a beta sheet. I don't care that much exactly what it is; you can say it's a microsecond and that will work too, but it can't be a second. We also argued before the break that the reason this works is that both energy and entropy drop during folding, and they drop at roughly the same speed; the only question is what happens first. What I'm going to argue, and what I'll show on the next slide, is that the whole concept of the folding funnel is that there are going to be pathways that fulfill this, where the energy and entropy drop at roughly the same rate, so that the balance never becomes too extreme. What I'm going to go through here will initially look like a tiny difference, but tiny differences are important when they show up in exponents. So the way we solve Levinthal's paradox is by nucleation condensation, and let's do the naive part first. Say I have a small protein, or a small region in a larger protein, and X marks the spot. We have no idea what this region's shape is, but assume that this part is now native and the rest is not yet native. The number of good contacts, well, this is going to be super complicated, so we introduce a simple model. I don't know exactly what the interactions here are, and it's certainly not useful to try to make a computer model and calculate them exactly. But we already said that the energy, the enthalpy, drops roughly linearly, right? So let's just say that the energy of the protein, which is negative because contacts are favorable, is roughly proportional to the number of residues in this region. So the number of interactions, the amount of
favorable interactions, that is going to be roughly proportional to that volume, which is roughly proportional to the number of residues in that volume, which is roughly proportional to some sort of radius cubed, right? This is not necessarily a sphere, but when we're talking about volumes in space and we have no idea of the shape, the best we can say is that it should be roughly proportional to some length cubed, and that's all we know about it. The problem is that this alone is not going to help us: if we went through all the math, all it would say is that the enthalpy goes down and the entropy goes down, and it wouldn't really tell us a whole lot more. But look at nucleation condensation: I said that at some point I need to start forming this nucleus. So let's erase this complicated picture for a second, but remember the first line there, that the energy was roughly proportional to the volume, which was proportional to r cubed. Now take another protein, but at the stage where we are just forming this transition state and there are only a handful of residues in place. We don't really have an inside volume yet; we might have one or two residues, but there is no interior to this part yet. So if I add one more residue here, or one more residue there, in general this residue is not going to be paired with an interior; the only thing available might be the surface area of the small volume we've already formed that it can interact with, but most of the new residues will not yet be inside the volume. You can imagine that if you're gradually forming ice, at the very first stage the amount of ice you form is going to be proportional to the surface of the ice crystals that have already formed, because that is the part you can condense onto. So initially, very early in the game, the number of interactions is going to
be roughly proportional to the area, which is roughly proportional to the radius squared, and that in turn is the first expression raised to the power of two-thirds, right? You could also say that this is the number of residues that have reached the native state, raised to the power of two-thirds. You buy that? So, in general, throughout the process there is going to be one term proportional to the first expression, proportional to the number of native amino acids, and one term proportional to the second expression, which was again roughly the first expression raised to two-thirds because of that difference in area, so roughly proportional to the number of residues raised to the power of two-thirds. Which of these terms is going to be largest as n grows? The first term is going to be much larger. So to leading order you could say, okay, we're just going to ignore the second term because it's much smaller. Effectively what this does, and again, low energy is good, is that the energy drops significantly, but early on it doesn't drop quite as efficiently, because we don't really have that volume yet. But before we ignore the second term: every argument I made here holds for entropy too, except that the freedom is the opposite. The entropy corresponds to the residues that have not yet locked into this state, right? So here too there is going to be one part proportional to the volume and one part proportional to the area. And when we combine the energy and the entropy, the blue parts, the ones proportional to the number of residues, will roughly cancel, because both of them are proportional to the number of residues, while for the n-raised-to-two-thirds terms, initially the entropy will drop a bit
faster than the energy, which creates the small barrier. But the point is about that barrier. Everything Levinthal said, you might remember the argument: there were two or three states per residue, the exact number was not important, and then with 100 residues you raise it to the power of 100, right? So in Levinthal's argument you had the number of residues in the exponent, and that's a number that grows very quickly. And this is roughly the part where you might sigh and feel that you wasted the entire course: we spent 11 lectures going through that, and instead of having n, we now have n raised to the power of two-thirds in the exponent. It looks like nothing, a tiny difference that can't make any difference. But it does, because it's in the exponent. This means that instead of Levinthal's result of 10 to the power of 10 years, proteins will fold in seconds or minutes. It's a bit absurd, but just do the math: instead of 100, try entering 100 raised to the power of two-thirds; if it were the square root, it would be 10 instead of 100, right? And that is really the solution to Levinthal's paradox: you don't get n in the exponent, you get a number that still grows, but much more slowly than n. And this we can test experimentally. If you plot folding times on the y-axis as a function, not of n, but of n raised to the power of two-thirds, then virtually all proteins end up on this line, within maybe a factor of two at least. Pure alpha helices fold slightly faster than pure sheets, but all of them obey this, given that they are reasonably small proteins. It would not work if n were suddenly 100,000; what would happen? Well, 100,000 raised to the power of two-thirds starts to be a very large number. So this is also one of the reasons, and this is of course related to evolution and natural selection, that if
proteins are too large, they would take too long to fold. So you need to keep proteins reasonably small, n around 100 to 200 residues or so, because that gives reasonable folding times, and this is really a third reason why protein domains have the size they do. Remember we said that domains appear to be the evolutionary units; in bioinformatics they appear to be the folding units; and now we also see that they appear to be units just large enough that things can fold quickly. All three of these belong together. They are the units of evolution because that makes sense folding-wise: if they were larger it would be too inefficient, and if they were much smaller it would also be inefficient, because in evolution you would end up making tons of mistakes, you would get things that don't fold. So we need domains large enough that we're fairly confident they will fold even when I put them in another surrounding, as part of another protein, but also small enough that they stay in this 100-to-200 residue ballpark, so I know they will fold. Did you follow that? I think it's a kind of beautiful result. And note, this is effectively what I said on the last slide, that this limits the size of the folding units: the rate of folding is entirely determined by the transition barrier, not by the midpoint or the equilibrium. Now, what says that my model here is correct? Very little. I've shown that it's compatible with experiments, but the mere fact that it's compatible with experiments doesn't mean it's correct; that certainly goes for intelligent design too, and I think it's not difficult to make a very fancy model that would theoretically be compatible with experiments. But there is also a simplicity factor, call it Occam's razor or something: if I can make a simpler model that describes reality better, it's likely the
preferred model. In this particular case, all I have shown is that there is at least one way, or a few ways, by which proteins can fold without running into these astronomically high barriers. I haven't shown that it's the best possible path; in theory there could be even better paths, and if there were, folding would have to be even faster, because that would correspond to even lower free-energy barriers. But I have shown that you can have free-energy barriers low enough to be compatible with experiments, at least. What this means is that folding is not at all random: you are guided by these funnels, and you need a few key contacts early on that help the protein form the right shape, or you're going to get stuck in an incorrect state. So what happens if we get stuck in an incorrect state? Well, sometimes you can have a chaperone take care of you, or at least theoretically you could. But if we needed chaperones for every single protein, that would be exceptionally inefficient for the cell; we don't want cells to have to spend another few units of ATP on every single copy of every single protein they fold. So small proteins need to be able to fold without assistance, and then there are a handful of very large structures that, for whatever reason, are just so important that there is no way to achieve that functionality without helping them, and then you might need chaperones. But even 30 or 40 years later, for small proteins you don't need all that complicated machinery, so Anfinsen is still right that the lowest free-energy minimum determines the state of the protein. So then one could argue that we're almost done, but not quite. Let's have a look at that chevron plot again. The x-axis was the denaturant, the badness: being out here is bad, meaning we're going to unfold, and over here we're going to fold. And for all of
For all of those plots — it might not have been obvious in all of them — in theory, as the concentration of badness, guanidinium hydrochloride, goes to zero, we would expect the folding rates to just keep going up. Over here we would really be in the range where the protein should be super stable, right? And it would make sense that if you're super stable, you're definitely going to want to be folded. But something bad happens: we start to fall off instead of going up. Things don't go quite as fast as theory would predict under perfect conditions. That's a bit strange. You could of course argue that something complicated could be happening at the transition states, but there is more to it than that.

So in general, on the left-hand side, what has happened is that our protein is happier: for whatever reason the guanidinium hydrochloride is not disrupting our structure, we can pack better, the energetic interactions are better — everything is better for us. What will that have done to the energy levels in the protein — are they going to be worse or better? Better. And if they're better, they should be lower, right? So what happens is this — let's see, I might even have a slide of it, but I'll draw it here, it's easier. You start from the unfolded state, here in black, and normally all states are up there: it's bad, and none of them are going to be more stable than our unfolded state. If all the states you can imagine are up here, they are worse; occasionally you might reach one of them, but we are very quickly going to go back to our unfolded state. That's what happens on the right side of the diagram from before: we prefer to be unfolded because that is the lowest free-energy level. But now, when you make things better and better for the protein — we move to the left-hand side — the free-energy levels are going down: it's better to be folded, and suddenly you have this native state clearly lower than your unfolded state.

Is that good or bad? That's awesome. We love this; this is nirvana, because now we're going to be folded, and being folded is much better than the other state. So now — I'll go back to the previous part — we get the result that the less guanidinium hydrochloride you have, the better it is, because everything is going to want to be in that beautiful state. But what suddenly happens when things become too good is that all the M states there, the misfolded ones, also become better than the unfolded state. You could argue that misfolding sounds horrible, but remember what I showed you this morning: the structure packed that particular way is the best one, but an alternative packing is going to be a pretty good state too — not quite as good, but pretty good. And if that is now better than my unfolded state, sometimes I will get stuck in it by mistake. Mind you, this is still a higher free energy; it's just not the best state, and it's not necessarily a prion or anything either, so it's not going to take forever to escape. But the problem is that maybe 20 or 30 percent of my molecules now go to the misfolded states instead of the folded one, and to fix them up they have to go back from the misfolded states to the unfolded state, over an energy barrier, and then make a second transition — hopefully without getting stuck a second time. The more misfolded states I have up here, the more likely it is that I visit one of them by mistake — or sorry, not get stuck, but visit them. And if I have visited them, rather than going directly to folding in a millisecond, I'm going to need to unfold the misfolded state and then fold again. That describes exactly what we see happening here: your protein has become too stable.
You're folding into beautiful, low free-energy states, but they're not the free-energy state we were looking for — not the best one. And again, we're not talking about prions here; they will fold, it's just not going to be quite as efficient as it was over here. The reason I'm showing you this is that these are experimental results: we see this experimentally for virtually all proteins. That goes to justify why you really have this distribution of energy levels — remember that I said it really helps to have one state with a clear energy gap, clearly lower than the other ones. Here we see that, not just from theoretical hand-waving but in experiments: if you start having multiple states that are good, that initially sounds great, but it actually hurts us, because we're no longer guaranteed to uniquely find the right one. They will fold, but not as efficiently. And of course, if all these energy levels are super deep down in the floor, then we run the risk that some of the misfolded states are so low that they essentially never unfold, and then you end up with prion-like things — for actual prions, the misfolded state probably has an even lower free energy.

And this is really how it is. The only difference for real proteins is that we're not doing this in one dimension; we have more than one dimension. So what we typically do in experiments, if we would like to measure something, is create a two-dimensional energy landscape — it's not really fake, but we might take one distance restraint that I can measure, and maybe a disulfide bridge, which I know is binary, or a hydrogen bond whose distance I can measure. That way I can find some sort of general coordinates for the protein, and in this case the lowest free energy is down here, and it's much noisier up there. In an experiment it's difficult to do this for more than two dimensions, but with our computers we can easily go to a few hundred dimensions, and we have done that.

This is a protein called NTL9, which takes roughly 1.5 milliseconds to fold — now we're talking about something that's fairly slow. We've been able to fold this — or rather Vincent Voelz was able to fold this with Folding@home a few years ago. The green one here is the native structure and the blue one is the folded simulation, and they managed to find that without having any information about the native state. So whatever the mistakes in our models are, they're good enough that we can actually predict the structure of real proteins. The cool thing is that when we do this, all the simplified models I showed you — I wouldn't say we throw them out the window, but it's more complicated: it's not one dimension. What happens is that you start out somewhere — say here, where we have the fully extended states — but when you simulate this, you realize there are lots of states that we tend to visit quite frequently, and you can even cluster them: you recognize what all the distances within a state are and see that this is a configuration we observe very commonly. If you start mapping this out, you can basically define these states and then start looking at the transitions between them. So all these letters are intermediate states — clusters of intermediate states — and in the computer we can even see, if you are in A, how likely it is that you will go to F, to G, back to D or B. The size of the arrows describes how commonly we see each transition in the simulation. The point is that if we are in A, we virtually always make a transition to M — sometimes L, but M is by far the most common — and if we are in M, we virtually always go directly to N, which is the folded state, the native one.
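The state-and-arrow picture above can be sketched as a toy Markov state model. The state labels and all the transition counts below are invented for illustration:

```python
# Toy Markov-state-model sketch: rows are "from" states, columns are "to"
# states, and the numbers are made-up transition counts from a simulation.
states = ["A", "L", "M", "N"]          # A: extended, N: native (hypothetical)
counts = [
    [10, 15, 70,  5],   # from A: mostly to M, sometimes L
    [ 5, 20, 60, 15],   # from L
    [ 2,  3, 15, 80],   # from M: almost always straight to N
    [ 0,  0,  5, 95],   # from N: the native state is nearly absorbing
]

# Row-normalize the counts into transition probabilities ("arrow widths").
T = [[c / sum(row) for c in row] for row in counts]

# Follow the most probable arrow from the extended state until we reach N.
path, current = ["A"], 0
while states[current] != "N":
    current = max(range(len(states)), key=lambda j: T[current][j])
    path.append(states[current])
print(" -> ".join(path))   # A -> M -> N
```

The dominant path falls straight out of the normalized counts; blocking the A-to-M arrow (say, by a mutation) would reroute flux through the second-widest arrow instead, exactly as described below for mutated proteins.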
So this looks a bit more like a spider's web or something — it's a network. Fundamentally it's more complicated than the previous plots; it's not just a single one-dimensional line. Proteins in general have tons of intermediate states, particularly if they're slow folders, but the fundamental concepts still apply: there are intermediate states and there are transitions between those states. The one thing that is more complicated here, though, is which path you take to folding. It's not quite that all roads lead to Rome, but all roads lead either to or from Rome: this is by far the dominant path, but we definitely have another folding path here too, right? So real proteins have more than one folding path. What if there is now a mutation in this protein that disrupts that transition? What happens is that suddenly most of the protein probably takes this other path instead. And this is very much true for any ligands you're designing too: the second you start to change the protein, you might not completely kill the transition, but the protein will always prefer the best transition — the big fat arrows here — and if you block the best transition, we're going to have to go with the second best. So folding will still happen, but slower; it might take 10 milliseconds or something instead.

So, to sum this up — and I think I'm actually going to be able to let you go 10 minutes early today — there have been a lot of things today. The reason protein folding works is that at room, or at least body, temperature, the energetic attractions outweigh the entropic cost of folding, but they have to go down at roughly the same pace, so that we balance energy and entropy; the slight deviation between energy and entropy is what causes these folding barriers. If you start increasing the temperature, the cost of the entropy becomes too severe — the same reason you don't get ice at high temperatures. At 100 degrees centigrade, the relative cost of being in the folded state, due to the entropy, is too high. At low temperature — well, that's not quite as easy, but remember that I said the energy is always attractive. One way to think about it is that the energy would always like to collapse, but on average you're going to collapse to something that is not correct — think of those low-lying misfolded states — and the entropy creates a bit of resistance, so that you can't immediately fall into them; the entropy creates barriers everywhere. At very low temperature this resistance becomes too small, you just favor the low energy and completely ignore the entropy, and that also means you get stuck in some sort of intermediate states: you're not really going to fold. And the second you start to design proteins, you need to be aware of this.

So is this how you would like to design proteins, if I asked you to design one? Well, we have the whole bioinformatics approach, right? This is Mike Levitt, who made this slide a bunch of years ago, and I like it because it's beautiful: what you really want to do is go from a DNA sequence not even to a folded protein but to a function — you want to do something — and the only way you can express and insert that in the cell is via nucleic acids. Then you need to trust the cell; the cell needs to help you. You can't do all these parts in the lab, particularly not if you want, like, a gram of it or something. One of the things that's fun is how different these processes are in computers versus in life — and don't think so much about computers here; think about your brains. Going from DNA to RNA, well, that's trivial: you just change T to U, right? I don't think you've written any predictors to predict RNA sequence from DNA sequence.
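The two "trivial in silico" steps — transcription as T to U, and translation as codon lookup — really are a few lines of code. Only a handful of codons are included here for illustration:

```python
# Minimal central-dogma sketch: transcription is a character substitution,
# translation is a dictionary lookup over codons. Tiny codon table for demo.
CODON_TABLE = {
    "AUG": "M", "UUU": "F", "GGU": "G", "UAA": "*",  # * = stop codon
}

def transcribe(dna: str) -> str:
    # DNA -> RNA: just change T to U.
    return dna.upper().replace("T", "U")

def translate(rna: str) -> str:
    # RNA -> protein: read codons three bases at a time until a stop.
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "?")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

rna = transcribe("atgtttggttaa")
print(rna)             # AUGUUUGGUUAA
print(translate(rna))  # MFG
```

Trivial for us, as the slide says, while the cell needs RNA polymerase and the entire ribosome for the same two arrows.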
It's not exactly something that would get you high grades. In vivo, though, that step is insanely complicated: you need real machinery, stitching different sequences together and everything, so this is extremely hard to do in the cell. The same goes for RNA to protein: you could have done that after the first lecture — just read the codons and insert the amino acids — but in the cell, that's the ribosome. So both these steps are, model-wise, very easy for you to do, yet exceptionally difficult for the cell. On the other end, the final step, once you have your nascent chain — as we've seen here, the folding, understanding all these interactions and why it happens — is super complicated in theory, but in the cell, well, you just throw the chain out into the cell water and it folds automatically.

I think that, to be successful here, a good strategy is to pick that pair, that pair, that pair — try to stay in the green parts of the slide — and it's not just a joke. The DNA part and everything around it we can easily do in computers; we don't need any fancy experiments there. But try to piggyback on nature any time you can. Large parts of this course have been more focused on the biophysics part, while the bioinformatics course was focused on information — and instead of "information", I think the better word to describe it is "evolution", right? The point is that evolution has already selected the best chains for you. The point is not that physics is better than evolution or that evolution is better than physics — if you said that only evolution was important, only the cell, you'd be doing things the hard way. So try to pick the easier side in each of those steps: sometimes the information aspect is easier, sometimes the physics aspect is easier. And by all means, don't forget that sometimes it's far easier to do things in the lab. There is a quote — I forget by whom — that an advanced simulation can frequently replace a day in the lab with a week in the computer. The point is that there are tons of experiments that are really cheap, easy, and simple to do; there are no brownie points for replacing those with complicated theory and simulation. The thing to find is a simple model, so that you replace complicated things with simple models.

So that's all we had today. The point here is that kinetics and thermodynamics not only can be but must be unified in real proteins; you can't have one without the other, or it would not be a real protein. The reason we understood the transition states and cracked Levinthal's paradox is really that we got it from experiments, but the only way we could understand those experiments was that we created these chevron plots. This is typically how our experiments are designed — you might not think about it now, and it's again a bit unfortunate that we don't necessarily teach science the way we actually do it. In real science there is never an obvious, finished experiment where you just run it and interpret the results. The way science works, both in industry and in research, is that first you need to think: what is it that I would like to understand? If you don't know what you could understand, there's no point in designing experiments yet. Then: where might I be able to get at this? And again, if it's trivial, it's trivial — the reason you're going to be paid high salaries is that you're going to do non-trivial things. Then you need to figure out how to translate this non-trivial thing, with equations or something, into something that can be measured — whether that is high-throughput sequencing or something else is irrelevant; it's the model part — and then find an experiment by which you could measure it. When you've done that, you can ask your more junior co-workers to do the experiment for you — gradually you will have other people doing that — but you need to learn to think in terms of models, because a model is just as important in an experiment as it is in the computer. There are a bunch of study questions that we will go through tomorrow. The other thing I'll do tomorrow is speak about docking and drug design for real — the tools that people use today, both at SciLifeLab and in pharmaceutical companies, to design real drugs. And for once, I'm going to let you go five minutes early.