So, discussion items for today, based on yesterday, let's get started. Pick them in whatever order you want; I'll turn on the lights in the meantime. The reason it's important to think about all of these is that it depends on what your goal is. If your goal is to understand evolution, and that is important in many ways, you should obviously focus on the evolutionary aspect, right? If there are similarities, it's because there are similarities in evolution. Something you will talk about a lot in the final course this semester is paralogs and orthologs. You probably touched on that a little already in the bioinformatics course. Then you can treat genes and everything at a much higher level. So basically: at what point do genes start to diverge in function? Or at what point do you end up with speciation, so that you have the same function but suddenly in two different species? And this is something that just in the last few years has become exceptionally hot when it comes to understanding disease. Because despite almost 20 years of the human genome project and everything, we're not even close to starting to understand disease. That doesn't mean it has failed; it's just one of those problems that is a couple of orders of magnitude harder than we originally thought. On the other hand, you might be working with drug design in a pharmaceutical company. And if you're working with drug design in a pharmaceutical company, what is the thing you should think about? In that case you're much more focused on structure: I need something that binds to this channel to do something. And then, yes, it's interesting that in theory nature might have evolved something, but you're not nature, you're not following evolution. You're trying to design a structure, and you want it to be stable. There is a very large project, have you heard about the Human Protein Atlas?
So their idea is that they want to create a map of every single one of these roughly 20-25,000 genes you have in the body and where in your cells they end up. Of course, it would be great if we had the exact function and everything, but that's not realistic. But can we at least say: this is a membrane protein; this is something that goes into your nucleus, we might think it's related to DNA somehow, but we don't know the exact function? If you do this for 20,000 proteins, then you can say: oh, in this particular form of cancer, protein 14 is overexpressed and tends to go into the nucleus instead of staying out in the cell. So just having this at a very low resolution is remarkably useful. What they did is that they wanted to create antibodies against every single protein in the body, but that's a pretty difficult endeavor. So what they ended up doing was to take a very small protein, protein A, roughly 60 amino acids, a completely different protein. Then you systematically replace a small stretch of amino acids in protein A with fragments from all the other proteins, a few at a time. And then you develop antibodies against the small protein, which we know folds; it's a small, well-controlled protein. That way you develop a gigantic library of antibodies against these so-called epitopes, small patches on the surface of a protein. So in that case you're using biotech, but somehow you need a scaffold, something to express things on. Yes, you pick a small protein and hope that it's stable. I wouldn't say that it's functional convergence, because the protein is not functional, but in that case you're using the fact that it's a small stable structure; you reuse that. So the fact is, thinking about proteins from different viewpoints is exceptionally important in everything from biotech to cancer treatment.
And actually, you could argue functional convergence matters too. Suppose you're now working on a cancer project and you've been successful: you've managed to develop a new antibody. Is that risk-free? Well, apart from the stuff we talked about yesterday, what if you have functional convergence? There may be other proteins in your cells that have just evolved a very similar fold. Even though the amino acids might be completely different, if their overall folds are similar, their hydrophobicity patterns are likely similar, and then there is a clear risk your antibody might stick to those too. So one viewpoint is not more important than the other; it depends on what you're doing. Structural evolution, give some examples. So the point is that it's either artificial or natural; the natural example would be fetal hemoglobin or something, right? A molecule evolves, nature randomly introduces mutations, and when these mutations somehow cause the protein to have a better function, they tend to survive. Is this something you can use in the lab? You might not call it structural evolution, but you do. In bioinformatics we occasionally use what we call genetic algorithms. The idea with a genetic algorithm is that one part of your program introduces random changes, and then you somehow need to assess whether these changes are good or bad. If they're slightly better than before, we accept them. So you basically try to mimic evolution. But that's in bioinformatics. Could you imagine doing this in the lab? You do. There is a fairly wide range of methods where you're trying to evolve proteins. Say you have bacteria and you would like them to produce ethanol; that's very important today in the biotech industry, to create fuels efficiently.
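The accept-if-improved loop just described can be sketched in a few lines of Python. The fitness function below is a toy placeholder (it just rewards an alternating hydrophobic pattern), not any real scoring scheme, and the sequence is an arbitrary example:

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fitness(seq):
    # Toy stand-in for a real scoring function: reward hydrophobic
    # residues at even positions and polar ones at odd positions.
    hydrophobic = set("AVILMFWY")
    return sum(1 for i, aa in enumerate(seq) if (i % 2 == 0) == (aa in hydrophobic))

def evolve(seq, generations=1000, seed=0):
    """Keep a random point mutation whenever it does not lower fitness."""
    rng = random.Random(seed)
    best = fitness(seq)
    seq = list(seq)
    for _ in range(generations):
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)   # random change
        new = fitness(seq)
        if new >= best:
            best = new                       # accept: same or better
        else:
            seq[pos] = old                   # revert: the change was harmful
    return "".join(seq), best

final_seq, final_fit = evolve("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
```

Directed evolution in the lab follows the same mutate-screen-select structure; the bacteria supply the mutations and the growth assay supplies the fitness function.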
Because bacteria can produce ethanol far more efficiently than we do it on the cornfield. So could you then somehow introduce random mutations in the bacteria and very rapidly screen through generations, because you know how fast bacteria grow, right? And then somehow select for the bacteria with the mutations that appear to have been good and helped you produce more. And then you go back and do another iteration. And this works. The specific catch with bacteria and ethanol is that at some point, when they produce too much ethanol, the bacterium dies. But this is definitely used in the biotech industry in general to rapidly evolve bacteria: you know what function you want, but you don't know how to achieve it, so you let the bacteria randomly accumulate mutations until you move in the right direction. What's the relation between that and sequence evolution? I kind of said it already, but... If I have my big protein here, can't I just say that I want an arginine here, and then it would be great if I had, whatever, a histidine and a leucine there? Can't I just replace the amino acids and say what I want there, anything I want? Why not? So this is the problem, right? If it were just a matter of adapting my structure, I would find the answer in 10 seconds on my computer. But the problem, as I've alluded to a couple of times and that we're going to prove today, or at least hand-wave about today, is that random sequences won't fold into proteins. So yes, it would be great if I could take my sequence and put it in that particular fold; I would get a great binding site. But that doesn't mean that that sequence will fold into that fold. So that's the problem. This is one-way, in a way. It's much harder to say, given a structure, what sequence we should have. Given a sequence, nature will fold that sequence into some structure, or into a random chain.
And this is what makes it a bit difficult. Evolution has to happen on the sequence level, and that has its result on the structure level. There's kind of an equal sign between them, but it's an equal sign that's still very difficult for us to treat, whether with mathematics, in the lab, or with simulations. So why do protein domains have the sizes they do? Right, that's what we talked about: the stabilization energy of a protein is pretty much independent of its size. And that means that if you keep increasing the size, the likelihood of having a small error somewhere keeps going up. And surprisingly enough, this actually helps you. If you're going to do protein design, at first sight this seems like a completely impossible, insurmountable landscape, right? You could design any type of protein. But I would argue any efficient protein design is going to be a very small domain. Because small domains are stable, they fold quickly, and they don't have too many side effects. So when I say protein design, don't think of a ribosome; think of a four-helix bundle. Designed proteins are typically super small. At that point, it's not a completely impossible endeavor. Do you remember these CHAMP peptides, the computationally designed helical peptides in membranes? They achieved exactly this, right? You manage to design in a specific pattern, and then you need to be a bit lucky for it to fold and everything. This is not science fiction. We do design proteins today, and you might very well end up doing it during your thesis projects. Oh, number five, I didn't mention that specifically, but let's speculate about it: what do we mean by sequence-to-fold fitting? One way to think about this is a simple four-helix bundle. Just up, down, up, down, right? Or, well, that drawing has five. But it's a fairly simple fold.
You can pack lots of things that way. Remember early in the course, when I said that proteins never have knots, but there was this one exception? If, on the other hand, I make a structure that threads into itself and has a knot in it, the number of sequences for which this will actually be possible and good is basically zero, whereas the number of sequences for which the simple bundle will work is going to be billions. That's why this fold is going to be more common, and nature will keep reusing it: any sequence that would only fold into the knotted fold will hardly ever be able to fold, because during folding it would sometimes have to thread into itself. But there are going to be billions of alternatives that will be stable as the bundle. So this means that some folds simply fit more sequences. They're liberal; they accept more things. And this is important, because if you're now going to design your protein, you need to start somewhere. Say I asked you to design a new protein to, whatever, bind to some receptor, because I would like to enable or disable this receptor. It doesn't matter exactly what I would like to do; there is some known receptor in nature, and you should design a new protein to fit it. How would you do that? You could start with some known ligands of the receptor, but I'll be a bit nasty and say we don't know any ligands. So you need to start at the start. We definitely need the structure of the receptor, right? And that's great, because you're now the head of the research part of the pharmaceutical company. So just go down to the X-ray lab and put three people on this. Six months later they have a structure of the receptor. It costs a couple of million dollars, and I'm only half joking: there were several companies that spent billions of dollars on the first GPCR structures.
So seriously, millions of dollars is nothing in pharma. Now you have a structure, but we're still going to need something that binds to it, right? As you will see later, we're going to talk a little bit about docking, and we're going to talk more about MD of real proteins. So there are ways we can calculate whether interactions are good, based on all these interactions I showed you earlier in the course. But you can't start from nothing. If you try to start from scratch and create something that binds to a protein, the likelihood that you're going to get that to fold later is going to be zero. So you need some template, a scaffold to start building from. Which scaffold would you pick, the knot or the four-helix bundle? Pick something simple, right? Now, it's not going to be trivial to create a structure that's stable here. Or could it be? Maybe you could place, say, four cysteines to create crosslinks, to super-stabilize it in whatever form you would like. So imagine having a helix there and a helix there, and then you create a disulfide bridge there and a disulfide bridge there. That's likely going to be stable. Because your goal is not to find the best possible sequence; you just need a small scaffold. You might have 50 residues in it, use four of them to force two disulfide bridges, and then you have 46 residues or so left to play around with. So stabilizing a known simple fold is frequently not impossible; much easier than you think. Stabilizing and folding a general protein when you don't have any idea about the structure, that's hard, as you will see. That was sequence-to-fold fitting. What are the typical stabilization energies of a protein? Do you see how sloppy I am? For the last two or three lectures, we haven't bothered to say "free" energy.
Everything we talk about at this point is free energy. Yes, that's correct, and that's when we talked about lattices. But I'm more thinking in terms of numbers: what is the stabilization energy? We talked about it in terms of defects, right, how many defects you can introduce in a protein. But you should think about this much more concretely. You know your free energy diagram, right? So we have delta G there, some sort of unfolded state here, some sort of transition state, and then some sort of folded state here. The stabilization energy of a protein is basically: how much energy can you add before it unfolds? The stabilization energy would be roughly that gap. If the energy increases by more than that, it's going to unfold. So that's the only thing that keeps the protein stable in its native fold. And what's the rough ballpark of that? Yeah, as I said, one to two hydrogen bonds. It will of course depend a little bit on the protein. But just look at hemoglobin or myoglobin; imagine the number of hydrogen bonds you have in all those helices. And if you destroy two of them, you're no longer going to have a stable protein. So this comes back to the point that proteins are not fundamentally stable molecules. Almost anything you do to a protein will destroy it, sadly. So why is that not governed by the Boltzmann distribution? There are two parts here. What I draw here is governed by the Boltzmann distribution; that's the stabilization energy of a given sequence, right? But if you start replacing amino acids in the sequence, if you do a mutation to destabilize it, that's not governed by the Boltzmann distribution. The point is, any time I say Boltzmann, you need to have this exchange between states, right? Boltzmann is about multiple states. And a protein does not change its amino acids.
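To put a number on "one to two hydrogen bonds": in a two-state picture, the Boltzmann factor for a given sequence gives the equilibrium fraction of molecules that are unfolded at any instant. The ΔG of 15 kJ/mol below is just an assumed ballpark, not a measured value:

```python
import math

R = 8.314e-3  # molar gas constant in kJ/(mol*K)

def fraction_unfolded(delta_g_kj, temperature_k=310.0):
    """Two-state folded/unfolded equilibrium: the unfolded-to-folded
    ratio is exp(-dG/RT), where dG is the stabilization free energy."""
    k_eq = math.exp(-delta_g_kj / (R * temperature_k))
    return k_eq / (1.0 + k_eq)

# Assumed stabilization of ~15 kJ/mol, very roughly 1-2 hydrogen bonds:
# a fraction of a percent of molecules are unfolded at body temperature.
f = fraction_unfolded(15.0)
```

Note that at ΔG = 0 the two states are equally populated, so losing those one to two hydrogen bonds really does mean losing the protein.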
A protein never spontaneously changes alanine for glycine in position 47. And that's why we made this long derivation from statistics and ended up with something that looked remarkably like a Boltzmann distribution, but it's not formally a Boltzmann distribution. You're not going to get more mutations in a single given chain because you increase the temperature, right? That's what the Boltzmann distribution would say. And it's not that a single chain is in equilibrium between compositions; it's over many proteins that you see something like 14% alanine but only 6% tryptophan. We spoke a little bit about typical sizes of helices and sheets. Roughly what are those sizes, and what was the argument that led us to limit them? Sorry? Yeah, 6 and 10 or so. Those are the typical average sizes. I think that's a little bit on the low side, but it's not more than a factor of 2 off, and anything within a factor of 2 is fine when you're talking about principles. So I would say there are two whys there. Why are they not smaller? Yeah, but we don't care about function right now; now we only care about physics. Sorry. So this is the classical free energy argument, you're quite right. If this is delta G versus length, at first we only pay, and then we start gaining things back. And unless you get below the zero line, and this might happen slightly earlier there, but if you're below 3 or 4 residues, it's going to be extremely unlikely that the delta G is negative. Then you will of course have some helices that might be 5, 6, 7 residues; occasionally you just have two turns or something. But what we derived was the average size, and the average size will be longer than the smallest ones. It's the same thing for beta sheets: there is a natural lower limit to their length. So what is the reason, then, that they are not infinitely long, if you keep gaining energy here? Yes.
Do you see how this is different? They're two completely different phenomena. The first phenomenon was the Boltzmann distribution for a single given sequence. If you give me a sequence with 5,000 residues that all prefer to be alpha helical, what's going to happen? It's going to be alpha helical. So in the Boltzmann world, once you've decided what sequence you have, helices can be infinitely long. But nature does not let you specify the sequence. So the upper limit is a completely different argument: in practice, in proteins, you can't choose, and it's just going to be very unlikely to have that type of sequence. That will of course influence the average length we actually observe. But do you follow the difference? There are two completely different reasons for the lower limit versus the upper limit. If you pick a very long stretch of residues that all want to be alpha helical, the free energy will keep going down, right? But the problem is, if you assemble things more or less randomly, well, not really randomly, but by evolution, the way mutations happen is more or less random. And if you pick 5,000 residues, by the time you have roughly 2 or 3 residues after each other that don't want to be helical, you will break the helix. So the likelihood of nature ever having assembled 5,000 residues in a row without ever having two of them that don't like to be helical... I'm not saying it's impossible, it's just extremely unlikely. Now, if you give me such a sequence, yes, it could be a very long helix. And you know what? There are long helices. There are some receptors on the surface of bacteria that are very long coiled-coil helices. You also have coiled-coil helices in your hair, right? Those are exceptionally long alpha helices. It's just that those are not typical proteins.
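The random-assembly argument can be made quantitative with a simple independence assumption. The per-residue probability of 0.9 below is invented for illustration, not a measured propensity:

```python
def prob_unbroken_run(n_residues, p_helical=0.9):
    """Probability that n consecutive, independently drawn residues
    are all helix-compatible (per-residue probability is an assumption)."""
    return p_helical ** n_residues

def mean_run_length(p_helical=0.9):
    """Expected length of an unbroken run before the first helix
    breaker appears (geometric distribution): 1 / (1 - p)."""
    return 1.0 / (1.0 - p_helical)

p_short = prob_unbroken_run(10)    # ~0.35: ten-residue helices are plausible
p_long = prob_unbroken_run(5000)   # astronomically small: never by chance
```

With p = 0.9 the mean unbroken run is 10 residues, right in the observed range, while a 5,000-residue run essentially never arises by random assembly; that is why very long helices are confined to special repeat sequences like coiled coils.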
And in that case, we have a protein assembled from just 3, 4, 5 types of residues that are all known to be very alpha helical. But for a general globular protein, the likelihood that a random sequence would form a very long helix is so extremely small that the average length of helices ends up being, say, in the range of 10 to 20. So what is GFP? I have to make a small confession here. GFP is super popular in this department; we love it. We use it in particular for that purpose, remember when I talked about these ways we measure insertion of proteins into membranes? We use GFP there too. It's a wonderful small marker. And I have to confess, if you look at it historically, we've been fairly lucky: a fair number of our senior professors have had the privilege of serving on the Nobel committee, and there is a pretty significant overlap between research interests in this department and what tends to get awarded the Nobel Prize. So we have GFP, we have structural biology, we have the membrane proteins. And this is, of course, not because we're biased; it's just that we've been lucky to recruit some really skilled professors active in these areas. But sadly, Roger Tsien died just a year ago, so he will not be able to continue that work. Allosteric modulation, I spoke a little bit about that in two different ways. What is it, and what is a particular function? An amplifier. Actually, it can dampen too. Have any of you studied electronics? Do you know what a transistor is? The way a transistor works is that you let a small current control a large current; it's basically the amplifier in your hi-fi equipment. An allosteric modulator works the same way. These ion channels we work with are very sensitive to allosteric modulation. Normally, you have a small ligand binding here that decides whether the channel should open or not, and that determines entirely whether your channel is open.
If you go home and have a glass of wine tonight, the alcohol will bind roughly there. If you only take ethanol, nothing is going to happen; the receptor will not open from the ethanol alone. That's like trying to steer a transistor without having any voltage across the main gate. But if you have the alcohol present and then this ligand binds, the protein will respond a factor of 10 faster. The alcohol can actually dampen the receptor too, so that you don't get as much of an effect, and those were the effects we saw with anesthesia. You have a similar effect in hemoglobin, right? There you have a change, and it doesn't have to be ligand binding, although in the case of hemoglobin it is: binding one oxygen changes its ability to bind the next oxygen. We're not entirely sure why, but this appears to be a universal feature of biology: almost all proteins are somehow involved in or affected by allosteric modulation. When it comes to these channels, we even think that the five different subunits open by allosterically modulating each other. When one subunit opens, it forces all five of them to open; it's highly cooperative. There's still so much research remaining to be done here, and a whole lot of diseases likely depend on parts of this allosteric regulation failing. And essentially, this is what you try to do with a whole lot of drugs, right? One way of making drugs could of course be to add lots of agonists, ligands, whatever the main thing is that you want to influence here. The other way, if a channel somehow doesn't open enough, and this is the really attractive approach, is that you could add an allosteric modulator to improve the channel opening just a little bit. The channel still won't open unless it should, when the ligand binds. Such allosteric modulator drugs can be way better than just normal brute-force drugs. So that was allostery.
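Cooperativity of the hemoglobin kind is classically summarized by the Hill equation, where a coefficient n > 1 encodes "binding one ligand helps bind the next". The concentrations and the coefficient below are purely illustrative:

```python
def hill_saturation(ligand, k_half, n):
    """Fractional saturation versus ligand concentration.
    n = 1: independent binding sites; n > 1: positive cooperativity."""
    x = (ligand / k_half) ** n
    return x / (1.0 + x)

# Both curves pass through 0.5 at the half-saturation point...
half_coop = hill_saturation(26.0, 26.0, 2.8)
half_indep = hill_saturation(26.0, 26.0, 1.0)
# ...but the cooperative one is much steeper just above it.
gain = hill_saturation(40.0, 26.0, 2.8) - hill_saturation(40.0, 26.0, 1.0)
```

That steepness is the switch-like behavior the transistor analogy points at: a small change in ligand concentration produces a large change in response.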
What are the folding units of proteins, and how do we know? Yes, or you could say that the domain is the fold; we define a domain as the folding unit of a protein. So what is the domain then? It's the cooperative unit, right? The part that has to be involved for folding to happen. If you don't have the entire folding unit involved, you're not going to fold. And if you have more than one folding unit, these parts can fold independently. So if you have a very long chain, a gigantic chain with many domains, what can happen when folding starts is that this part starts folding, and this part, and that part, and that part, at four different places in the chain at the same time. And as they gradually fold, they will eventually diffuse together to form a larger unit. Do you see the relation to membrane proteins? There you might have one part we would even call a transmembrane domain. The transmembrane domain folds one way, the helices need to diffuse together, while you might have separate extracellular and intracellular domains. And of course, the extracellular and intracellular domains need to be independent of each other; they can't feel each other when they fold, so they need to be able to fold independently. The other thing we showed yesterday, by comparing the amount of energy and what the phase transition looks like, basically the slope of how the energy changes, is that these units are relatively large. It's not just one helix; the entire domain appears to be involved. You can't start by gradually folding two residues into a single helix or something. Protein folding in general is quite a cooperative effect; not just cooperative, it's all-or-none, like a phase transition. The other part is that this is intimately related to evolution: the domains are also the evolutionary units.
And how are those two things connected? Why is it important that the evolutionary unit is also the independently folding one? Nature does this, and not only nature, scientists too. These ion channels I showed you: the first ones we were able to study were all prokaryotic, bacterial. There's nothing wrong with bacteria, but they don't have very good health insurance. So here's a bacterial channel. We can call it GLIC, the Gloeobacter violaceus ligand-gated ion channel; that's its name. Fun if you're a scientist, it's proton-gated and everything, but it's not very important for disease, of course. What you would really be interested in is a human channel, say the glycine receptor, GlyR. We know that it should look roughly the same. The only problem is that we can't determine its structure. That's unfortunately very common: human proteins are generally much less stable than prokaryotic ones. Their membranes are more complicated, there are more mutations, they likely have to be more mobile; this is just a way of saying we don't know exactly why. But human proteins, and in particular human membrane proteins, are very unstable. On the other hand, it would be super important to get a structure of the glycine receptor, right? In particular, if we could understand the transmembrane domain with the channel and everything. So how could you determine a structure of the human glycine receptor, at least part of it? The way this works, you have five subunits here: one, two, three, four, five. Each of these subunits has a transmembrane and an extracellular domain. So what if you now do a bit of gene surgery? You take the extracellular domain from the bacterial one, because we can bind antibodies and everything to that, and then you take the transmembrane domain from the human protein. Let's draw the human part in red; then you end up with something that looks red here and black there.
Remember, there are five subunits in total. Then we can express this in a bacterium, and you get a structure of the human protein. In this case, the extracellular domain we can usually stabilize with an antibody or something else; we already have antibodies for the bacterial one, so we can recognize this part with antibodies and hopefully that carries over to the chimera. This works. It was published roughly five years ago. This type of chimeric protein construction is one of the first things people do before we can get structures of the real human ones. In this case, remember, the bacterial protein is stable; we know how it behaves. And it's quite common that extracellular domains can be very large, so hopefully we can make this one crystallize really well by stabilizing it. If this part crystallizes, hopefully the other part will just tag along and crystallize well, too. It's not a guarantee. But if the completely red one would be very unstable, and this one is very stable, hopefully the chimera can at least be intermediately stable. Yes, but there are really good predictors for that; you probably looked at or designed some of them in your bioinformatics course, so that's not very difficult. The other thing to realize is that you shouldn't be so worried about testing things. Because, sure, going into the lab and determining the structure, that will take a year. But stitching the two genes together, and we'll show you this after Easter, normally you don't do that yourself, you just order the gene, and that will take three or four days to get. Then we have the genes. It might cost $100, well, maybe slightly more because this is a long one, but it's far cheaper than the salaries of people in my lab. After that, we inject this into the nucleus of a small frog egg, and once I have it in the nucleus of a frog egg, I put it in an incubator for three or four days.
And then I can have this overexpressed in the frog egg. We put the frog egg under a microscope and measure the current across its membrane. If this one worked, I'm going to see a current when I add, say, whatever activates it; in this case there isn't a classical binding site, but I said it's modulated by alcohol, so I can check that. So I can very easily check: is there a working channel in this frog egg now? And if I overexpressed it, 99.9% of the channels in the frog egg are going to be this one, right? So I can very easily check functionally: did this form a channel? Is there a channel here working? If the answer is yes, then it's probably worth spending half a year trying to get a structure of it. But testing the function might take two full weeks if we're lazy. It's cheap. The worst that can happen is that it fails; I bet they tried 10 different places to stitch this before they found one that worked. So don't be afraid of failing. Failing is part of science. What enthalpy and entropy do during folding? I don't think we talked about that; I'm going to talk about that today, so let's skip it. Cold denaturation. Yeah. But what is it good for? Or rather, how does it relate to stability? You could, of course, say that it's a bad thing, right? In general, you don't want proteins to be denatured. But wouldn't it be better if proteins were super stable, instead of having a stabilization of one to two hydrogen bonds? Shouldn't you have 50? No, you don't want proteins to be too stable, right? So it's not that cold denaturation is something inherently bad that you never want. It's part of the life of being a protein; it's part of the stability balance. Just as you will eventually denature if things become too warm, things will denature when it grows too cold. This is very much related to this energy-entropy balance.
I know that I touched briefly upon it yesterday, but since we're going to go through it in more detail today, I will save that for today. Yes, this has to do with... if you go back in the book, I don't have a cursor here. If you look at basic hydrophobicity versus hydrophilicity and the solvation energies of most molecules, it turns out that it's not just a linear curve. It ends up being a very shallow curve where you tend to have a maximum of the free energy of solvation in water, and that corresponds to a minimum in the free energy of the protein. And of course, if you start deviating from this to one side or the other, the protein will eventually unfold. The alternative would be a curve that is monotonic, so that, say, the lower the temperature got, the more stable you would be. And if the protein just kept getting more and more stable the colder things got, that would be a problem too: for an animal living under very cold conditions, its proteins would become too stable. That doesn't happen; instead, the flip side is that things tend to denature when it becomes too cold, and that's not good either if you live under cold conditions. So how do we fix that? Cold shock proteins. So why don't you have cold shock proteins? Because that would cost energy for your body; it would be a waste, the same way a bacterium doesn't have a nervous system, because that would be a waste for the bacterium. So today we're going to talk about what actually happens. We're going to get back to this molten globule; I know that I mentioned it several lectures ago, and I did not forget it. I have been lying a bit to you.
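That shallow stability curve with a maximum can be written down with the Gibbs-Helmholtz relation including a heat-capacity term. The parameter values below are invented for illustration, though they are in a realistic order of magnitude:

```python
import math

def stability(T, dH_m=300.0, T_m=330.0, dCp=8.0):
    """Folding free energy dG(T) in kJ/mol for a two-state protein.
    dH_m: unfolding enthalpy at the melting temperature T_m (in K);
    dCp: heat-capacity change on unfolding, kJ/(mol*K). Assumed values."""
    return dH_m * (1.0 - T / T_m) + dCp * ((T - T_m) - T * math.log(T / T_m))

# The curve is a shallow hump: positive (folded) only inside a window,
# crossing zero at high T (heat denaturation) AND low T (cold denaturation).
g_body = stability(310.0)   # positive: folded at body temperature
g_hot = stability(345.0)    # negative: heat denatured
g_cold = stability(250.0)   # negative: cold denatured
```

Because the heat-capacity change on unfolding is large and positive for proteins (hydrophobic groups becoming exposed to water), ΔG(T) always bends over like this, so cold denaturation is a built-in consequence of the energy-entropy balance rather than a separate phenomenon.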
There are not just two folded states — sorry, there are not just folded and unfolded states. The molten globule is halfway in between. And then we're going to talk a little bit about kinetics and actually look into when and why proteins are stable. We're going to get really close to cracking Levinthal's paradox today. And then I think we'll talk a little bit about chaperones too. But let's talk about denaturing. Remember the first few lectures, when I said: what happens if you just put oil in water? You get the hydrophobic effect, right? And it's super fast. And then I kind of alluded to that, yeah, these things are going to explain protein folding too. But what is it that I've been saying the last few days? If you just pick a random set of hydrophobic amino acids, what's going to happen? You're going to get some sort of blob that's hydrophobic, just like a blob of oil, right? But that's not a protein. So on the one hand we have a state that we can call completely unfolded or denatured. We have some complete native state here. And then we have something halfway in between, a blob or something that has formed, but it's not a protein. In solution it's not going to be stretched out. Remember, that's why we had this argument about the average end-to-end distance of a chain. If you have 100 residues, the likelihood of finding it in one single fully extended state is so small that you will never observe it. So in practice, on the way to folding we're going to have some sort of blob-like structure before we get to the native one. So there are at least three states here we need to think about. We've already argued that we're not going to have an extended coil. And when you start doing experiments on this, it turns out the experiments say: once you've added a little guanidinium hydrochloride or something, at least the protein no longer works.
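As an aside, the end-to-end distance argument is easy to check numerically. Here is a minimal sketch (my own illustration, not from the lecture) of a freely jointed chain with an assumed Cα–Cα step length of 3.8 Å: for N segments, the freely jointed chain gives ⟨R²⟩ = N·b², so a 100-residue random coil averages around 38 Å end to end, an order of magnitude shorter than the ~380 Å fully extended chain.

```python
import random
import math

def end_to_end(n_steps, bond_length=3.8):
    """One freely jointed chain: sum n_steps random unit vectors scaled by bond_length."""
    x = y = z = 0.0
    for _ in range(n_steps):
        # uniform random direction on the unit sphere
        zc = random.uniform(-1.0, 1.0)
        phi = random.uniform(0.0, 2.0 * math.pi)
        s = math.sqrt(1.0 - zc * zc)
        x += bond_length * s * math.cos(phi)
        y += bond_length * s * math.sin(phi)
        z += bond_length * zc
    return math.sqrt(x * x + y * y + z * z)

def mean_square_r(n_steps, n_chains=2000):
    """Average R^2 over many independent chains."""
    return sum(end_to_end(n_steps) ** 2 for _ in range(n_chains)) / n_chains

if __name__ == "__main__":
    # For a freely jointed chain <R^2> = N * b^2, so sqrt(<R^2>) ~ 38 A for
    # N = 100, b = 3.8 A -- far below the ~380 A fully stretched length.
    print(round(math.sqrt(mean_square_r(100)), 1))
```

The point of the sketch is only the scaling: the typical coil is compact compared to the stretched chain, so the fully extended state is essentially never observed.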
And then when you put it in an NMR machine or something, it's going to look almost like the native one. So people — well, not I, but the generation or two prior to me — got very confused when they started doing experiments on this. It appeared that unfolded proteins can be almost like a native protein; it's just that they don't work for whatever reason. You still have secondary structure. And you know what we talked about yesterday? That for any random sequence, as long as you have some residues that like to be alpha-helical, you're going to see a bit of alpha helix. But a bit of alpha helix or some small beta-sheet parts is not the same thing as being in a stable protein state. And to really, really fully unfold it — so think of a normally denatured protein as one that you kicked a bit so that it doesn't work — to really unfold it and stretch it out completely, you're going to need to add like five molar of guanidinium hydrochloride or something, to really throw the kitchen sink at it. And at that point you're going to start forming salt bridges: every single peptide bond in the backbone will start to form salt bridges with guanidinium. You've completely altered the electrostatics. And that doesn't really happen in your body. So you can actually measure this. There are two things: you can change the temperature, and you can keep adding some very strong denaturant like guanidinium hydrochloride. At low temperatures — body temperature or lower — and when you don't have horrible amounts of guanidinium hydrochloride, this is where you're going to have the native state of a real protein. I don't care about things that don't form proteins for now. If I just keep increasing the temperature a little bit, I'm going to get to this perturbed state. It doesn't really work anymore, but if you put it in an NMR machine, it's going to look like a protein. So this is what we call the molten globule state.
And you can literally think of this — think of the native state as being an ice crystal or something. Actually, I know ice is not a good example; think of it as being almost solid. And here we melted it a bit, but it still sticks together, right? It's on its way to loosening up. And then if we really keep cranking up the guanidinium hydrochloride and everything, you get this completely coil-like structure where you've destroyed all protein structure in it. Here you're not going to have any helix or anything anymore. So when you change both denaturant and temperature, there are a bunch of different transitions here we want to understand. The coil is pretty boring in the sense that it's just a stretched-out chain. We've talked a lot about, and I've shown you a lot of, native states. So let's spend a couple of slides on this molten globule and see if we can understand it. When I was roughly your age, that's when all the experiments started happening and we started to understand more and more about the molten globule. I'm going to argue that if you just look at this experimentally, what you're going to find is that the main chain is still fairly ordered. You will have broken the secondary structure a little bit, but to a first approximation you have roughly the right shape in the main chain and roughly the right order of the secondary structure elements. The hydrophobic core is there. So think of this like the oil blob that has formed in water: we have turned all the hydrophobic amino acids to the inside. The density is roughly the density of the protein. So you don't have tons of water in it — a little bit of water, but the protein is definitely compact, almost like a football. The volume and the size of the protein are roughly what they should be in a folded state. And then there are two types of transitions. Let's start with when you move from the coil to the molten globule.
This is a fairly unspecific transition — this is entirely the hydrophobic effect, actually: turn the hydrophobic residues to the inside, the hydrophilic ones to the surface. And that's not at all a phase transition; it's fuzzy, gradual. But when you move from the molten globule, which is not really that well defined, into the native state, that's a very sharp all-or-none transition. This is boom, the protein is folded, and then you get function. And as you start increasing the temperature again, you go back, and then it no longer works. So for proteins, the phase transition is moving from this unspecific collapsed globule state to the native state. So we actually made a small table here. For the molten globule, there are tons of things that are like the native state, and other things that are like the completely unfolded state. It's compact and has a hydrophobic core — there's not tons of water — but the structure is not rigid. It's going to be hard to get a structure: you can't take a molten globule and put it in an X-ray experiment or cryo-EM and get a structure of it. The structure is not well defined. On the other hand, you do have secondary structure. So you can put this in an NMR machine if you have lots of money, or in a CD spectrometer if you're smart, and you will see the amount of alpha helix and sheet and everything. So this comes back to the things we talked about the last few days: secondary structure happens everywhere, but that doesn't necessarily make a protein. And it's like the unfolded state in the sense that when you move from the native state to this molten globule, something dramatic happens — there are going to be some free energy barriers involved, so there's a very definite change there. But when you keep moving to the entirely unfolded state, there aren't really that many things that happen.
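That sharp all-or-none behavior is what a simple two-state model produces. Here is a small sketch of folded fraction versus denaturant concentration, using the standard linear extrapolation ΔG(D) = ΔG(water) − m·[D]; the numbers (`dG_water`, the m-value) are made up but of typical magnitude, not taken from the lecture:

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol K)

def fraction_folded(denaturant, dG_water=20.0, m=8.0, T=298.0):
    """Two-state model: dG(D) = dG_water - m*[D] (kJ/mol); f = 1/(1+exp(-dG/RT))."""
    dG = dG_water - m * denaturant
    return 1.0 / (1.0 + math.exp(-dG / (R * T)))

if __name__ == "__main__":
    # The transition is cooperative: essentially fully folded at 1 M,
    # essentially fully unfolded at 4 M, with the sharp change around
    # the midpoint where dG = 0 (here 2.5 M).
    for conc in (0.0, 1.0, 2.5, 4.0, 5.0):
        print(conc, round(fraction_folded(conc), 3))
```

The cooperativity comes from the m-value: because ΔG crosses zero steeply, the population flips from almost all folded to almost all unfolded over a narrow concentration range, which is exactly the all-or-none signature.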
So in this sense, the unfolding is really when you move from the native to the molten globule state. If you just keep adding stuff after that, there isn't really a barrier — there is no sharp second transition from the molten globule to the fully unfolded coil. Just as we had this hydrophobic core, you have partial side chain ordering: tryptophans are going to be buried and everything, and the side chains will have started to pack — not completely, but they're trying to pack. But there is still some water inside it, and it doesn't work yet. If you take an ion channel and keep heating it until you get some molten globule, it's not going to be an ion channel anymore. There's no biological function anymore. Oh, sorry, that was the last one: there's no unique side chain packing. Disulfide bonds may have started to form, but it's not functional yet. I found this — I don't know, this is not from the book. So, yes, what do you think? All these things are equilibria, right? In general — and don't think of the egg here; eggs are a bit special because those proteins end up in some very deep minima — but in general, if you are here and gradually reduce the temperature again, it's going to work. And this is what happens in your body: when your proteins are synthesized in the ribosome, as I'm going to argue, they pretty much start out here. When they're coming out of the ribosome, the first thing that happens when they see the water is that they end up here. And then, when your proteins fold over a millisecond or something, they go from the molten globule to the native state. So now we're heading into folding, but yes, all these things are reversible, and that's what Christian Anfinsen got the Nobel Prize for. Exactly. So the problem here is that you now have like 4 or 5 molar of guanidinium hydrochloride.
So you're now going to need dialysis or some other way to get rid of the guanidinium hydrochloride — which, again, if you were to inject guanidinium hydrochloride into somebody's body, they would of course die, right? But in a test tube, if you can just extract the protein from the guanidinium hydrochloride, it will spontaneously fold back. So this is one of those things that is about as far from obvious as you can get. Remember what I said: in the 1950s and 60s, when Anfinsen and others were working on this, it was so remarkable that nobody believed it. You could definitely imagine that the body uses energy to build a specific protein into the specific structure you want. So the mere fact that proteins fold simply by finding the lowest free energy state, and that this process doesn't require any biological machinery — I find that completely amazing. So is it universally true? Will any protein fold? These membrane proteins, for instance, might need a translocon, right? On the other hand, we're not sure: does the translocon really use energy, or does it just catalyze things to make sure that folding happens in a realistic time? We don't know that for a whole lot of proteins. There are chaperonins that we're going to talk about later today, so I would say this is a bit unclear. I would not say that we've proven that proteins definitely need machinery to fold; they may need machinery for it to happen sufficiently fast, but that's a slightly different question. Yep? We'll see that in a second; I'll come to that in a few slides. It's a good question. So this is an NMR experiment where people have been able to do NMR as a function of different concentrations of — I think it's actually guanidinium hydrochloride, or it might be temperature. So this is a sequence of, let's see here... Basically, that is the native state, and then you go up to increasing denaturation.
I think this is actually a function of time or something, and then you go towards the more and more molten state. Do you see the difference? Hardly, right? So think of it like this: if that is the fully native, beautifully packed state, what essentially happens is something like this. When you start to destroy a bit of the side chain packing, you might have a few water molecules coming in. The point is that the helices are intact, and what you see here on this scale is that the helices are intact. You're not going to see that there is a minor distortion in the hydrogen bonds, that they've started to move just a little bit. So when you measure overall properties, this is a transition that really corresponds to the protein just starting to unfold. And the part that really decides whether something is a protein is not as obvious as you might think. If I were to take the sequence and just throw it in water, it would, in well under a millisecond — probably a microsecond — move from the unfolded to the molten globule state, something like this. But for the actual thing that decides whether this is going to be a functional protein, we need to go in the other direction. We need to go from something like this, which has roughly the right shape but hasn't really finished all the pieces of the puzzle, and now we need to pack everything super well and get rid of the water. So what decides whether things will pack, whether they will form stable proteins, is really this final step. What do you think that will depend on? There's a free energy difference. So do you think it's going to be a fast or slow process? I will come back to some slides about this, but it's more fun to have you think a little bit about it before I give you the answer. Now you said transition state, so you kind of assumed that there will be some sort of barrier in between.
That's reasonable, right? I already hinted that it's some sort of transition. So there will be a barrier here, and I can even say right now that it's going to be a reasonably high barrier. So what type of barrier is it? When we talk about barriers — if it's a free energy barrier, there are only two components of free energy: enthalpy and entropy. So you said entropy. Why do you think it's going to be an entropy barrier? That's quite correct: you can think of it as a searching problem, right? If this were an energy barrier, it would mean that we needed a lot of energy to surmount it. But it's not really energy. It's just that you're going to need to try tons of different combinations before you find the right one. And that's an entropy barrier. I'll get to that, yes, but we'll come back to it in a second. And here we start to enter some pretty deep stuff. Earlier in the course I said that proteins are heteropolymers. Do you remember that? So what does that mean? Yes — heteropolymers are polymers of mixed units. But why do proteins have to be polymers in the first place? They are, of course: the peptides are just sequences of amino acids, right? But why on earth do you need to stick the amino acids together in the first place? Couldn't you imagine having proteins formed by the amino acids without the amino acids being in a string? Well, of course it's a difficult question, because we know that this is what nature does. But there's a famous concept called a gedanken experiment — a thought experiment. In this case you can't even do the experiment, but it's interesting to think about what it would mean.
It's very much related to searching. Let's have a look at this. So again, this is not an experiment we can do, but we can toy with the idea. We imagine a protein consisting of completely disconnected monomers versus a simple chain of the same small molecules — in this case we're not going to worry about different types of polymer units. To a first approximation, if they interact in roughly the same way, the energy is going to be the same, right? So we don't care about the energy. So this will have to be related to the searching and the entropy. I'll go through and derive this very quickly. At some point we want to understand when things fold: you start out with something that's very stretched out, and then you have some sort of density order parameter that decides how folded you are. When you have folded, the density is going to be close to one. If you don't like to call it density, think of it as how folded the protein is. And we want to understand how the entropy changes when you go from unfolded to folded. You can very roughly assume that every monomer can move in some sort of small volume, V or V prime. And of course, if this volume is large, what's going to happen to the entropy? It's going to increase, and that's good. But if the volume is limited, we're going to have a lower entropy, which is bad. Let's start with the chain, because that might be easier. In the chain, with the monomers bound together, the available volume depends on the density of the monomers: rho here is basically that density. If each monomer has some sort of volume omega and there are n of them, and we divide by the total volume V, that's the density we have right now. When we're stretched out, that's relatively low, and as you're folding it increases.
And if we are along a chain, this is not really going to change that much. There's going to be some constant for every monomer, and the amount of volume available for each monomer is the part of space that hasn't been filled up with the density yet — so something like one minus the density. It's not going to change a whole lot, because if you're tethered both to something before you and after you, you can move a little bit in your surroundings, but you don't really have a whole lot of freedom. On the other hand, if you're in a large cloud, everything is completely free to move, right? So the amount of space available per monomer would be the total volume, minus the volume used by the monomers, divided by the number of monomers. And if you simplify that a bit using the density equation, you end up with something that depends much more strongly on the density than the chain does. And again, this is just some arbitrary volume to within a factor, but we know that entropy is proportional to the logarithm of volume. If we just plot these two as a function of rho — we don't really care about all the constants, so set them to one for now — then the entropy starts out very low at high density for both. The reason why it's very low here is that you're perfectly packed, right? When rho is one, you essentially don't have any freedom at all. So it's going to be super low here, in the protein. And as we're decreasing the density — moving to the left here — the entropy goes up, because you're freer and freer: when you move, there are fewer and fewer things you bump into.
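Written out a bit more formally — a rough sketch with all the constants lumped into a single factor c, and S in units of k per monomer — the hand-waving above becomes:

```latex
\rho = \frac{N\omega}{V}
\qquad \text{(density of $N$ monomers of volume $\omega$ in total volume $V$)}

\text{Tethered chain: free volume per monomer} \;\propto\; \omega\,(1-\rho)
\;\;\Longrightarrow\;\;
\frac{S_{\mathrm{chain}}}{Nk} = \ln\!\bigl[c\,(1-\rho)\bigr]

\text{Free monomers: free volume per monomer} = \frac{V-N\omega}{N}
= \frac{\omega\,(1-\rho)}{\rho}
\;\;\Longrightarrow\;\;
\frac{S_{\mathrm{free}}}{Nk} = \ln\!\Bigl[c\,\frac{1-\rho}{\rho}\Bigr]
```

As ρ → 0 (full dilution), the free-monomer entropy diverges like −ln ρ, while the chain entropy stays finite: being tethered into a backbone is what caps the entropy of the unfolded state.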
And at some point, again, these free monomers can move away from each other. So what's going to be their best state? When they're infinitely far away from each other. And when they're infinitely far away from each other, how large will the entropy be? Yes, it's going to be infinitely high. The problem is that to fold the protein, you're going to need to move in the other direction. So if the unfolded state is infinitely good, that's a problem, right? You will never, ever fold if the unfolded state is infinitely good. On the other hand, if you have something tethered together on a chain, you start out roughly the same way, beautifully packed and everything, but at some point you get to where you can't separate things further, because you're tethered along the same chain, right? So the fact that proteins are polymers puts a limit on the amount of entropy you can gain in the unfolded state. If you didn't have that, the unfolded state would always be infinitely good and you would always end up unfolded. So you might at first think that the unfolded state can't be important for folding, but it is: if the unfolded state were too good, you would never be folded. This effectively means that the unfolded state can't be too good, and that's why proteins have to be polymers. If proteins were not polymers, we could not form stable structures from them. Of course we don't know exactly how nature evolved that, but you could imagine that in a completely arbitrary world we might be able to create biomolecules out of something other than amino acids, right? And the reason why amino acids form these molecules is that they have these properties: they polymerize, and then they form stable structures. If they didn't polymerize, we would not have life. So this sounds good — so any polymer should form really interesting life-like molecules, such as this one?
No — it's pretty boring, right? So being a polymer is not sufficient. Polymers are necessary — what do you call it in mathematics? A necessary but not sufficient condition. And the reason is that you can just take one of these homopolymers and calculate: since we know the entropy, and I said the energy is roughly constant, we can plot the energy — roughly constant — minus temperature times this entropy, and that should be the free energy. And then we can see what happens for a couple of different temperatures. So what happens with a homopolymer as you gradually reduce the temperature? The curve gets lower, you have some sort of minimum here, but you see that it just changes smoothly. There's no clear barrier or transition or anything. So homopolymers are remarkably boring and irritating. Compare with something like water: to have a phase transition, there should be an abrupt change, right? At some high temperature it should be best to be here, and at some low temperature it should suddenly be best to be there. So we're going to need something like this — it needs to be much more specific. How do you get that specificity? You mentioned it earlier: the side chains, right? Because proteins are not homopolymers; they are heteropolymers. All these side chains, which at first sight appear to be random, create this extremely specific pattern that makes sure that one state is stable but not another. So they're going to create some very deep well in the free energy landscape. And there are some classical images here — I think it's Ken Dill who drew this image originally. It's one of these things where so many people have stolen it on the internet, me included, that he doesn't get credit for it anymore. So this has probably gone through 5 or 10 copies.
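Going back to the homopolymer argument for a second, you can check the "boring" free energy numerically. In this sketch (my own illustration, arbitrary units) the energy is taken as constant and the entropy is the tethered-chain form S(ρ) ~ ln(1−ρ), so F(ρ) = E₀ − T·ln(1−ρ); at no temperature does an interior barrier between two minima appear:

```python
import math

def free_energy(rho, T, E0=-1.0):
    """Homopolymer sketch: constant energy minus T times chain entropy,
    S(rho) ~ ln(1 - rho) in units of k per monomer."""
    return E0 - T * math.log(1.0 - rho)

def has_interior_barrier(T, n=200):
    """Scan density in (0, ~0.99): a barrier would show up as a local
    maximum of F with lower values on both sides."""
    f = [free_energy(i / (n + 1) * 0.99, T) for i in range(1, n + 1)]
    return any(f[i - 1] < f[i] > f[i + 1] for i in range(1, n - 1))

if __name__ == "__main__":
    # F(rho) changes smoothly and monotonically at every temperature:
    # no two minima, no barrier, hence no sharp folding transition.
    for T in (0.5, 1.0, 2.0):
        print(T, has_interior_barrier(T))
```

This is the numerical version of "no clear barrier or transition": with a homopolymer's smooth entropy, E − TS can shift with temperature but never splits into two competing states.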
So the way to think about this: you can't draw this in 3,000 dimensions, of course, but there is some sort of landscape here that we're sampling, both in terms of energy and entropy. And most of the barriers here are really going to be about searching — that's entropy. But at the final point, what really stabilizes you is that there's going to be one very well-defined, unique state that also has a very low energy. Because there is no question about it: is the entropy good or bad here, when you're native? Bad. It's horrible entropy-wise. You're taking these beautiful molecules that can have an almost infinite number of configurations and forcing them into one single state. It's the worst thing you can do. And the only occasion where that will be good is if the energy is very nice and stable there in the end. So now we're getting close to those free energy barriers I spoke about, right? Once you get to the molten globule, you have this collapse, but it was unspecific. The side chains — yeah, they're hydrophobic and play around together, but they're not really that well packed. What happens in the native state is that they eventually find each other and lock things in. And once they're locked in, they're hardly going to move at all anymore. So that's one of those differences between the molten globule, which was this middle state that I erased, and the real native state. In the real native state, you start to lock in the side chains. If you take a real protein, put it in a crystal and determine the structure, you're going to see the specific conformations of the side chains. Why does that mean they're specific? Couldn't you imagine that this differs from one X-ray crystal to another? Sorry — I went back a little bit here to test your knowledge about X-ray. So what is an X-ray structure — a structure you get from an X-ray crystal? How do you get it? It's a diffraction pattern.
And how do you get a diffraction pattern? How many molecules cause the diffraction pattern? Billions or trillions, right? So the only way you will be able to resolve these side chains is if, among not just billions but trillions of copies of the protein, every single copy has the side chain in exactly the same conformation. And that would not happen unless the packing was remarkably unique — so unique that you can't pack them any other way. So once proteins are actually folded, they're much more like a solid than a liquid. Although, as you saw in those computer simulations — and as you're going to see in your own computer simulations — that's not quite true. If you do this in an X-ray crystal, you might be doing it at liquid nitrogen temperature, 100 Kelvin, and then it's going to be fairly solid. But at room temperature they will move a little bit — they stay packed, though. The side chains will keep their interactions, and you're going to see this in your own computer simulations of a protein. This, I would argue, is something where we've actually changed the view quite a lot over the last 20 years, mostly based on computer simulations: proteins are much more flexible. When Martin Karplus and many others did the first computer simulations in the mid-1970s, people originally even thought they were wrong, because the protein was floppy — not as beautiful and well-formed as in the X-ray structures. And then eventually people started to realize that proteins actually do move. I think that was one of the reasons why they got the Nobel Prize: we learned things about proteins that we couldn't learn just from X-ray structures. Yes? Sorry, say that again. So, it's like dancing: you're holding onto your partner, right? You can still move a lot, but you're not moving randomly. It's not like in water.
In water you would constantly break hydrogen bonds and reform them with something else. That doesn't happen here. The side chain interaction partners stay roughly the same — yes, they might lose grip for a fraction of a second, but the pattern stays the same. Let's see, it might be easier to show here. In some sort of molten globule state, the side chains will just flop around, and they will occasionally be on the left side versus on the right side. What happens in the native state is that they stay packed, and they stay with the same interactions. If there is a salt bridge here or a hydrogen bond there, they're going to be maintained. It doesn't mean that they won't move at all. So this leads to something else: if you are here, you're now native, and we start to denature you. What happens as you go towards some sort of unfolding? Is that going to be good or bad? That's always bad, right? Because you're losing things — you're losing energy here — and you haven't really gained any entropy yet. Imagine you're sitting in a crystal: when you start to pull things apart, initially you're just destroying interactions, you're not gaining anything back. This is what's stabilizing the protein. Because if this started out positive, proteins would spontaneously unfold, right? So the packing and the strong interactions stabilize the protein in the native state. But of course at some point you're going to get over a barrier, because I keep adding energy, right? And suddenly there is enough space that, say, my tryptophan side chain can rotate. And then I start to gain entropy. And the second I cross this barrier, it's going to be much better to be in this higher-entropy state where you're a bit freer to move. So effectively we've created a phase transition. And this goes back to your answer about what type of barrier this is. Let me put it this way: there are two barriers here.
There's one to the left and one to the right, right? So which barrier is entropic, going from what to what? Exactly: if you start out here and you want to fold, you're limited by entropy. If you are folded and going to the left, on the other hand, it's energy. So the barrier has different characteristics depending on which way you go. And you know this: how difficult is it to unfold a protein? Pretty much anything works — heat it, or add guanidinium hydrochloride or something; it's super easy to unfold proteins. Folding them is hard. Folding them is a searching process. And you know this — I bet you could draw this, but you would have to think a little bit about it. To understand this — and you're going to hate me for this — delta G equals E minus TS. You can probably start to imagine that that equation is going to be somewhat important to know on the exam. Same thing here. Let's say that the x-axis here is density, so you're folding as you go to the right. As I said, the energy starts out high and then, as you somehow pack things and fold, it gets better and better. Why would the energy eventually go up again? At some point you start hitting other atoms, right? If you're infinitely far away it's going to be zero, and if you're very close it starts to go very high. So there will be some sort of minimum here — exactly where, we don't know. The entropy — that was roughly the curve we drew five minutes ago, right? And even if you don't want to derive those equations, you can hand-wave about this: if you're completely folded, you don't really have any freedom at all, and eventually, when you're unfolded, the entropy goes up. The exact shape we don't know. I'm going to argue that most of it happens when you actually start to move the side chains, so that the main effect will come in the middle.
But you could imagine that this is relatively smooth too. If you now take the energy minus temperature times that entropy — at some temperature, we don't know which right now — you're going to end up with something that looks roughly like this: some sort of minimum there and some sort of minimum there. Because the entropy part is not exactly proportional to the energy — these parts happen over a narrower interval. And what I'm going to argue is that that is really this denatured or molten globule state, and this is the native state. And now we have ourselves a small free energy barrier. This type of barrier is going to give you some sort of all-or-none transition, because we start here in the denatured state, right? We showed yesterday that we're going to need some sort of folding unit — we can't fold one residue at a time. So under some conditions we're going to move from D over that barrier — I guess once upon a time you would call this symbol a sharp; now it's just a hashtag for Twitter — and at some point you're going to get down to the native state that stabilizes you. And when you denature, you're going to go from the native state to the denatured or molten globule state. And then you can just separate this: depending on which direction you go, is the energy going to help you or hurt you, and is the entropy going to help you or hurt you? Why is it S-shaped? This is what I mentioned: if you start out with something that's very ordered, you basically have zero entropy. And it's not a continuous process. Think of my fingers as some sort of tryptophan side chains. I separate them a little bit — I still can't really move, right? But at some point, when the side chains are no longer forced to be packed, suddenly they can move a lot. And once I'm already here, if I separate them another five centimeters, I don't get a lot more freedom.
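The E, S, and F curves sketched on the board are easy to reproduce numerically. In the sketch below, all functional forms and numbers are my own illustrative choices, not the lecture's: the energy improves smoothly with density, while a sigmoidal side-chain entropy drops over a narrow density range; the resulting F = E − TS develops a molten-globule minimum, a barrier, and a lower native minimum at high density:

```python
import math

def entropy(rho, center=0.5, width=0.02):
    """Sigmoidal side-chain entropy: high when loose, drops sharply on packing."""
    return 1.0 / (1.0 + math.exp((rho - center) / width))

def free_energy(rho, T=1.0):
    """F = E - T*S with a simple smoothly improving packing energy E = -2*rho."""
    return -2.0 * rho - T * entropy(rho)

def barrier_and_minima(T=1.0, n=400):
    """Locate interior local maxima (barriers) and minima of F on a density grid."""
    rhos = [i / n for i in range(n + 1)]
    f = [free_energy(r, T) for r in rhos]
    maxima = [rhos[i] for i in range(1, n) if f[i - 1] < f[i] > f[i + 1]]
    minima = [rhos[i] for i in range(1, n) if f[i - 1] > f[i] < f[i + 1]]
    return maxima, minima

if __name__ == "__main__":
    # Because the entropy drops over a narrow density range while the energy
    # changes smoothly, F(rho) splits into two basins (molten globule and
    # native) separated by a barrier: an all-or-none transition.
    maxima, minima = barrier_and_minima()
    print("barrier near rho =", maxima)
    print("molten-globule minimum near rho =", minima)
    print("native F(1.0) =", round(free_energy(1.0), 3))
```

The design point is exactly the S-shape argument from the lecture: a smooth E minus a sharply dropping T·S is what creates two competing minima; make the entropy drop gradual (wider sigmoid) and the barrier disappears, just as for the homopolymer.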
So any type of packing tends to be like that: discrete, happening in a fairly narrow regime. First the side chains are constraining each other, but once they're no longer constraining each other, they relatively suddenly gain a lot of freedom. Will this be universally true? It's a great question. Will this happen for any sequence? So what determines whether there is a folded state? I would actually say it's the exact opposite: you get a native protein when you have a well-defined folded state that is created by the side chain packing. So in general, if you just randomly select side chains, will they pack beautifully? Right, and if they don't pack beautifully, you're not going to get this property that the entropy drops over a fairly narrow range. You're not going to get the beautiful stabilization, and then you're not going to have a smooth, nice protein. So here we start to see the problem: most things will likely not be proteins unless you just happen to have very good packing. And if you think it's difficult to build a 10,000-piece jigsaw puzzle, you can imagine how difficult it's going to be to pack amino acids in three dimensions, right? It's super hard. I'll come back to that after the break, but I'll spend a couple more slides here first, and then we'll have more fun stuff later today. It's a great question, because you were just one slide ahead of me. What is the native state of a protein in general? What defines it? Should it be unique? That's probably pretty good, right? Because otherwise it's not really well-defined; if you have 100 states, that's not going to work. It should be closely packed. It should have low energy, because if the energy were not really low, what would happen? On a spring day you walk out into the sun and you denature. That would be a bit of a bummer. So the energy of the native state has to be low. Two hydrogen bonds is not a lot, but again, if you start going to too high energies, it's not going to work.
So the problem is you pretty much need all of these things. And that means that anytime you randomly start changing amino acids too much, you will likely break one of them. You might not be able to pack it; if you can't pack it, you're not going to get a good packing energy; it's no longer going to be unique. So sadly enough, you need all of these properties at once. You can calculate a bunch of these things, either in simulations or in experiments. For instance, the number of native contacts. So in all of these plots, they somehow measure the free energy. On the y-axis you have end-to-end distance, and on the x-axis you have the number of native contacts. And this is for a number of different mutants of a protein. With these color scales, the plots show the free energy; the blue regions are low free energy, and that's good. And the exact location of that minimum for all these variants is going to depend a little bit on the number of native contacts and on the end-to-end distance of the chain, right? So things are going to be very sensitive to the exact amino acids you have in the packing. This is a great place to take a break. After the break we're going to start looking at what this actually tells us about why proteins fold, how fast proteins fold, and whether protein folds really are unique, because they might not be in some cases. It's 10:24. Let's take 20 minutes, so let's meet at a quarter to 11. So, we spoke about the native state. I got a really good question here over the break: all these things we talked about, ion channels moving and everything, doesn't that violate the assumption that there should be one well-defined unique state? All these channels have multiple states, right? And that's quite true. I would argue that there are two ways to look at that.
One of them is that for these native states, the differences between substates are so small, roughly kT, that they will be part of the natural motion. One way is, of course, to say that that whole ensemble is the state. If you really wanted things to be unique, you would need to go all the way down to the individual spin of every single electron in the system, and yes, that may be interesting in physics, right? But in biology it's pointless. So at some point we say that things are in a native state when they are essentially the same state biologically. Even simpler: what if you have a valine? You have two CH3 groups. If I swap those around, technically that's a different state, but chemically it's exactly the same state. So at that point I would say, yeah, that's the same state. You might also worry that if you have different numbers of particles, it's a different state. So the entire definition becomes a bit fuzzy. One of the reasons for this fuzziness is that when most of these concepts were derived 30 years ago, there were very few proteins, if any, for which we had good structures of multiple states. We've since learned much more about the dynamics of proteins, that they move; don't think of proteins as bricks, think of them as miniature machines. I would still argue that we like the concept of a native state, but who knows, this might change in 10 years. One of the things that my team is pushing a lot, in cryo-EM for instance, is to move away from thinking about protein structures as single structures, and instead think of them as an ensemble of structures, maybe 10 or 100 structures, all with different probabilities. I have no idea whether this will gradually take over, but that's how science moves; paradigms gradually change. You remember that curve from before the break, right? The energy and entropy created this slight bulge in free energy.
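Before going back to the curves: the ensemble picture a moment ago, a handful of conformations with different probabilities, can be made concrete in a few lines. This is a hedged sketch with invented numbers: the state energies (in units of kT) and the per-state observable are mine, chosen only to show how Boltzmann weights turn a set of structures into ensemble-averaged properties.

```python
import math

# Four hypothetical conformations of one protein; energies in units of kT
# relative to the most stable state (all values invented for illustration).
energies = [0.0, 0.5, 1.2, 3.0]
radius = [1.00, 1.02, 1.10, 1.40]      # some per-state observable, e.g. relative size

# Boltzmann weights and normalized probabilities
weights = [math.exp(-e) for e in energies]
Z = sum(weights)                        # partition function
probs = [w / Z for w in weights]

# An experiment sees the ensemble average, not any single structure:
avg_radius = sum(p * r for p, r in zip(probs, radius))
```

The point of the design: substates within roughly kT of each other all carry appreciable probability, so "the native state" is really this weighted family of structures rather than one rigid brick.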
Earlier in the course we had these super abstract curves of... Yes? So what types of energy do I mean? First, this is all the energy in the system, right? The kinetic energy is not really going to change, because the kinetic energy reflects the average velocity of the atoms, and that's roughly the same; you don't automatically cool a protein when you fold it. But I would say that up here you start forming lots of hydrogen bonds and so on, and the very end game here is all packing, when it comes down to the side chains. These are all the energy terms. And of course this is a highly simplified curve. Imagine a 300,000-dimensional system; that would be extremely rugged in practice. Rugged, uneven, meaning you have lots of sharp peaks. And you will be able to see energies in the simulations you're going to do later in the labs. Trust me, they're going to be rugged. They're going to be very noisy. But we could also do something else. When we first defined temperature and all these things, remember that we had these plots where we plotted entropy, essentially the number of states something can be in, as a function of the energy. And what this effectively corresponds to, you can read up on it in the book if you want to, because it's not entirely obvious, is the following. As the energy is going down, that is what's happening here, right? If we don't put the density on the x-axis, but instead put the energy, the variation of the red curve, on the x-axis, we can check roughly how it changes relative to the entropy. So for each point here, you plot that red point against that green point, that red point against that green point, et cetera. You're going to end up with a curve that looks roughly like the red one here: you go down in energy, and that means a reduction in entropy. And the reduction in entropy mostly happens over a relatively narrow range.
And then we've had the entire drop in entropy, right? After that, the entropy doesn't really change as I keep reducing the energy; this would be the final packing. But we also had the concept that the slope of such a curve is some sort of temperature, right? Or one over the temperature. So what's going to happen is that as we're moving down here, this effectively corresponds to going down in temperature. And the question is, under what circumstances can I reach that native state, the really best possible state down there? Because I said, I hand-waved a bit, but I said that we wanted the protein structure to be unique. And I would argue that there are two things that can happen here. One is what we have here on the right: as you're moving down, we are gradually going down in entropy, gradually, gradually, gradually. As the slope of this curve increases, that means some sort of temperature is going down, right? And I'm going to argue that this has to do with those characteristic temperatures that depended on the amino acid composition. Don't worry, I don't expect you to follow this right now. But what happens here is that you get lower and lower energy, smoothly and gradually, right? And I have lots of states here, because the entropy is only gradually being reduced. So there's lots of freedom, and I'm just gradually reducing that freedom, gradually squeezing the protein a little bit. It's nice and smooth. And at some point I just stop. So this corresponds to gradually freezing something, but without making ice; I'm not making a phase transition. It's just that your molecules will eventually stop moving because there is no space left to move. Nothing all-or-none happened here. There was no sharp transition or anything; at some point there is simply no more freedom available. So you've gradually frozen something here.
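The claim that the slope of the S(E) curve is one over the temperature can be checked on the simplest toy system I can think of: N independent two-state units, each costing an energy eps when excited. This model and all its numbers are my own assumption, not from the lecture slides; it just shows 1/T = dS/dE working numerically (with k_B set to 1).

```python
import math

# Toy system: N independent two-state units, eps per excited unit (invented numbers)
N, eps = 1000, 1.0

def ln_omega(m):
    # ln of the number of microstates with m of the N units excited: ln C(N, m)
    return math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)

# 1/T = dS/dE, estimated as a finite difference at energy E = m * eps
m = 200
beta = (ln_omega(m + 1) - ln_omega(m)) / eps   # this is 1/T at that energy

# Consistency check: in the Boltzmann distribution at this temperature, each
# unit is excited with probability p satisfying beta*eps = ln((1 - p) / p),
# and that p should come out equal to the occupancy m/N = 0.2 we started from.
p = m / N
```

Lower m means lower energy, and the finite-difference slope (hence 1/T) grows, which is exactly the "moving down the curve corresponds to going down in temperature" statement from the board.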
You just stop moving. Yes. Yes. So in physics you have something called glassy systems, where you just get entangled in yourself. There is no well-defined structure, but the average properties are there. But the other possibility could be that you're gradually moving down, and towards the end game there are some special structures. You get to a point where you have very few states, and at some point there are going to be one, two or three states that have much lower energy than all the others. Think of that as the very deep well in the free energy landscape. And suddenly I find it, and I jump all the way down to some super low state there. This is going to be unique and well-defined, and I'm very constrained. But here's the problem. As I'm moving down here, I lose both energy and entropy, right? So if there is now a free energy barrier to jump to these states, I need to get reasonably close to them so I can start to find them before I've lost too much energy. That's probably a bit of hand-waving, so let's try to describe it in a different way. In some sort of general free energy landscape, as I'm moving down here, my entropy goes down, right? So I get a better and better free energy, but at some point I won't really have much thermal energy left, and I might be low enough that I already have really good energies and interactions and I can't get across the last few barriers. I get stuck on the way. So to really find that good state, I also need to be able to get there. And that's slightly different from free energy; getting there is a matter of whether you can get across the barriers. So to be a good protein, on the one hand, yes, I need a well-defined unique state, but I also need to be able to find it. I need both properties. So what will it depend on, whether we have the left or the right?
So I'll come back to these temperatures in a second; this will depend on the composition of your molecule, right, the side chains. Under some conditions, for some side chains, we will have a protein that has a well-defined state, and in other cases we won't. And remember that when we talked about different mutations and different stabilization energies, not the Boltzmann distribution itself, I could define some sort of characteristic melting temperature, or characteristic temperature for the chain. And what I'm going to hand-wave about here, the book goes through in a bit more detail. What basically happens is that, depending on the characteristic properties of the chain, if I can get sufficiently close and really go over this transition while I still have enough energy, so that the melting or characteristic energy of the chain is relatively low, then I will be able to find that state. But if the whole structure is so large and complicated that I keep getting stuck before I can actually find anything, then it's going to be different. You might have 10, 50 or even 100 states up here where, technically, one of them is lower than the others, but the differences between them are relatively small. So there you might very well populate several of them. The way the book talks about this, and the way I'm going to talk about it in a few slides, is that this case is good, this gives you well-defined states. But what do you have in an ion channel? You almost have something more like this, right? You might have more than one state that is biologically relevant and important. So the way the book phrases it, which is not strictly true but is a good first approximation, is that if you take a protein chain and just randomly assemble it into a structure, there are going to be extremely few structures that have super low energy, because things will in general bump into each other.
If you do not believe me, look at the molecular simulations you're going to do in a couple of labs. The energies, particularly at the beginning, are going to be astronomical, because anytime two atoms bump into each other the energy goes up. And in the cases that do appear to form nice proteins, either based on the physical hand-waving I just went through or simply based on what we see in experiments, it appears that this native structure, if it is stable and really good and well-defined, is characterized by a very low energy, an energy significantly better than that of the second-lowest structure. Again, what would happen if the energy of the second-lowest structure were very close? The protein would keep switching, right? And in some cases that's okay, if the switching is not biologically bad, say for an ion channel that should flicker a little bit. But if one of these states were significantly different, say you started to unfold a helix, you would not spend enough time in the native state, and that would be a remarkably inefficient protein. So it's the fact that this gap is large compared to kT that gives us an energy barrier and lets us stay in a stable state. And that is caused entirely by the packing of the specific amino acids. It's not true for a homopolymer; it's true for a heteropolymer, where we have specific amino acids. And in particular, it's not true for random heteropolymers such as random polypeptides. If you go into the lab and randomly string together amino acids, you're not going to get a nice protein, which is a bummer, because that destroys 99% of all the great ideas you could have in protein engineering. If you randomly change things, you will quickly destroy your proteins. Do you think this is true in general? Does it hold that proteins have well-defined minimal best states? Let's look at it. Oh, I think I might even have a slide with the temperature here.
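The "it would keep switching" argument is just the two-state Boltzmann distribution. Here is a hedged two-line sketch, treating only the native state and the second-lowest state and measuring the gap in units of kT; the specific gap values tried are my own choices, not numbers from the lecture.

```python
import math

# Fraction of time spent in the native state for a two-state system
# whose second state sits gap_kT above it (gap measured in units of kT).
def native_fraction(gap_kT):
    # p_native = exp(0) / (exp(0) + exp(-gap)) = 1 / (1 + exp(-gap))
    return 1.0 / (1.0 + math.exp(-gap_kT))

small_gap = native_fraction(0.5)   # gap ~ kT: the protein keeps switching
large_gap = native_fraction(10.0)  # gap >> kT: essentially always native
```

With a gap of about half a kT you are native barely 60% of the time, exactly the flickering, inefficient protein described above; push the gap to 10 kT and the native state is occupied more than 99.9% of the time.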
You know what, this is a little bit theoretical, so in your interest I'll skip it; I'm not going to ask about it on the test. But basically, if you're interested in physics, the process that happens in the system here on the right is related to amorphous systems, what you would call vitrification: you get stuck in a glass-like state before you have a chance to fold. The reason I decided to bring this up is that this is what you're going to see in cryo-EM. In cryo-EM I want the water to stop moving, but I don't want the water to form an ice crystal, because if it does, the entire pattern I see will be the ice crystal pattern. So I want the water to stop moving but stay randomly oriented, and that is exactly this type of vitrification process. But if that were to happen in a protein, the entire chain would get stuck in something like a molten globule. It would not be well-defined, and if I repeated this a hundred times, I would get stuck in a hundred different low states, and the likelihood that all of these hundred different low states would have the same biological function is basically zip, nada, nil. So someone fond of order would then say: if I claim that sequences that fold into stable, neat proteins do so because the native structure has an energy gap to the rest, and I also think that this is caused by natural selection, then I should be able to roughly calculate these probabilities. And you actually can. The probability to have one state at a very low energy should be roughly the Boltzmann factor, e to the minus the energy gap divided by, in this case, not quite kT, because I'm not changing temperature but changing between different amino acid sequences. But it doesn't matter, it's the same form of Boltzmann factor, so let's say kT. And if our energy gap is now 10 to 20 times kT, we say the likelihood for a random sequence to
fold is a Boltzmann factor on the order of 10 to the minus 8. So just one tip from the coach: if you're going to do proteins, doing them randomly is a pretty bad idea, because you'd have to wait forever before you got one working protein to publish a paper on. That's why we all start from something that's known, or let a computer design a good sequence for us. But I haven't really proven that folds are unique. So what would be the likelihood of having two such states? Can that happen? What's the likelihood of having three such states? 10 to the minus 24, right? And here's the thing: this actually does happen for proteins, and you've seen it with these strange diseases. Amyloid diseases such as prion diseases are likely due to proteins having multiple stable states. And that's why I figured I should bring up the question: this doesn't at all disagree with our ideas about proteins. For virtually everything we've tested in the lab, and that has been done for tens of thousands of structures, the idea of a well-defined unique state holds. But again, there is no single rule in biology without an exception, and these are the exceptions. It's going to be extremely rare, and it's bad for the body, which is why it's usually related to disease, but it can happen. And the problem with all of these cases is that, unfortunately, it now comes down to defining what a native state is, right?
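The gap-probability estimate above fits in three lines. One assumption is mine: I pick 18 kT as a representative gap inside the quoted 10 to 20 kT range, which is what reproduces the "10 to the minus 8" figure; independence of the deep states is also assumed, as in the lecture's argument.

```python
import math

# Probability that a random sequence has one state sitting ~18 kT below the
# rest, estimated as a Boltzmann factor (18 kT is my assumed mid-range gap).
p_one_state = math.exp(-18.0)        # ~1.5e-8, the "10^-8" quoted in class

# Treating deep states as independent, extra deep states multiply the odds:
p_two_states = p_one_state ** 2
p_three_states = p_one_state ** 3    # ~3.5e-24, matching the quoted ~10^-24
```

So two or three comparably deep states are astronomically unlikely for a random sequence, which is why multi-state folds like prions are rare exceptions rather than the rule.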
Here we've said that it's the native state as long as we maintain the function; I don't really consider those different states. But these disease states are apparently so different that they start having a different function, and in the case of prions and amyloids, the really lowest free energy state is something even worse. Mad cow disease is another example: prions are basically proteins that aggregate, and something bad happens. Nature tries to avoid it, but it can't always avoid it for you. I think I'm going to stop the kinetics there; I'll talk a little bit more about kinetics maybe at the end of the lecture. But to send you off with a little bit more biology, I figured that what the book doesn't do, since the book is all physics, is talk about how this happens for real, in vivo. There's another reason why the book doesn't do this. Can you imagine why? Sorry? Exactly: most of these structures were not known 15 years ago. Actually not just 15; when I did my PhD, most of these structures were not yet known. This one actually was. Most of these structures we've learned about in the last 15 years. Many of them we had understood from a biological or functional point of view for many, many years, but good high-resolution structures, in particular good high-resolution structures of the complexes, came much later. This one here is a newer structure, quite recent; it's not easy to determine these structures. So the very first part, and I know you know about this, is when we replicate DNA, copying DNA. That's a protein that binds to the DNA and makes sure that we basically unzip the DNA, and when we've unzipped the DNA, we make copies by having new pieces of DNA bind to each strand. It's all done by a protein. What type of protein is this?
It's an enzyme, the same type of enzyme as some of the ones I spoke about yesterday. And note that we're not using ATP or anything here; we're not putting energy into this process. So technically, if the DNA double helix were to open up on its own, you would eventually have new bases added, but that would take almost forever, because DNA is supposed to be stable. So what the molecule here does is help stabilize this unfavorable state while we're breaking the hydrogen bonds, so that we can copy the DNA under reasonably safe conditions. This is an insanely amazing machine. If you think your computers are good: this thing basically doesn't use any energy, it just builds things from building blocks, and it almost never makes an error. So today, one of the things going on in large-scale storage is that people are actually trying to use DNA for computer storage. The encoding is actually surprisingly easy; the problem is that to get your storage back, you pretty much have to sequence the DNA, which is of course still expensive nowadays. But if you could do it, the amount of information you could store is far better than any optical medium, than anything you could imagine. I have no idea what's going to happen in the long term. You could even imagine having DNA-based computers, because again, computing is less about pushing information around than about maintaining information, and that's what you're doing here. You have A, G, C and T; if you can move these around, copy the information and steer what happens, you can start to solve simple computational algorithms with DNA. They're not going to be fast, but the energy efficiency is probably five orders of magnitude better than the best GPU you can imagine. Could you imagine using this for anything, DNA polymerase? If you discovered this molecule in the lab, what would you do, apart from publishing and getting a nice paper out of it? So: you can use this to speed things up if you
want to copy DNA in general, particularly to sequence it, or if you want to produce DNA for whatever reason. The only problem was that initially you had to add more polymerase enzyme after every single cycle, because at the end of each cycle you need to add heat, and what happens when you add heat to a protein? You denature it. So you had to add new enzyme after every single cycle. It was expensive, slow, boring. If you needed to solve that, what would you do? Now you're thinking like an engineer-biologist: find a protein from an organism that, as I hinted yesterday, lives under very hot conditions. There are organisms like that, in particular a bacterium called Thermus aquaticus, found in a Yellowstone hot spring. So Taq DNA polymerase is a DNA polymerase from this bacterium, and this bacterium is just happy at 70 degrees centigrade. It loves it; it will survive something like 80 degrees centigrade. So if you use the DNA polymerase from Taq, you just keep heating and cooling, heating and cooling, and the DNA will keep binding and unbinding, and it will keep duplicating. You don't need to do anything; it's completely automatic. That's what Kary Mullis got the Nobel Prize for: the polymerase chain reaction. I'm not sure whether you've thought about getting a Nobel Prize yourselves, but most Nobel Prizes are awarded for a single paper, one single idea; it's not a lifetime achievement award. One invention. And this is an example of such an idea. At some point, one person in the room had an idea: wait a second, couldn't we try to find a protein that's thermostable? It's just a crazy idea; I bet there could be 50 reasons why it wouldn't work, but somebody was crazy enough to say, you know what, it just might work, let's try it. And in this case it did work. The next part is RNA polymerase. And you're not going to believe me, but I actually made this slide before he got the Nobel Prize. So RNA polymerase copies
DNA into messenger RNA. We have a piece of DNA in there, yes. These are super complicated molecules, and they also consist of multiple subunits; you can see all the beta sheets and alpha helices there. We haven't even started to talk about DNA or RNA structure. I'll do a little bit about it later, because it's related to some diseases, but these molecules are significantly more complicated than the simple protein structures I showed you. How many domains do you think there are in a molecule like this? One, or many? Many. In molecules like these, there are going to be many domains. The third part of this holy trinity is the ribosome, the 70S ribosome. This particular structure is from Tom Steitz; Venki Ramakrishnan was also one of the pivotal figures in determining the first structures of the ribosome. I would never have thought I would see a complete structure of a ribosome, because when I was your age we drew them as blobs, a large and a small subunit; that's all we knew. And this is a marvel: RNA entering and being decoded. You can just imagine the number of binding sites here. First you need to make sure the RNA binds, then you need to stabilize the transition states to polymerize the peptide chain, then you need to release the tRNAs from the amino acids, and then you need to let the chain fold gradually in the exit tunnel here. And all of this needs to happen in a matter of seconds. It's simply amazing how it works. So think about it: a few seconds, and then you will fold the protein in vivo through this entire process. Until roughly five years ago, all of these structures were determined with X-ray crystallography. What's happening now is that everybody is switching over to cryo-EM, and in particular for ribosomes, everybody is using cryo-EM. If you're interested in this, we have a very large cryo-EM team at SciLifeLab working on determining ribosome structures. So what use would it be to
understand ribosome structures? Much simpler than you might think: anytime you want to understand bacterial infection. This is really the factory of life, the production of proteins. Prokaryotic and eukaryotic ribosomes are quite different, so if you could design drugs that specifically disrupt the production of proteins in bacteria but not in humans, if you could specifically knock out the protein production chain in bacteria, you could develop entirely new generations of antibiotics. What you see here is that protein production in vivo is a bit more complicated than in vitro. In vitro we just assume that we have the entire chain from the start, and it's just a matter of deciding what fold the chain adopts. What happens if you try to do this in vivo? This is a small enzyme called luciferase. It doesn't matter what it does; the point is that you can monitor its function by the light you see here. So here you have the protein working. But when you start to produce it, you see that nothing happens, and then boom, after some 10 to 15 minutes, things start happening, and when you stop the synthesis, you're here. The point is that synthesis can be remarkably slow; in this case it takes 15 minutes, and that's likely because the protein folds very slowly, and the whole process on the ribosome is slow. So many proteins are far more complicated to fold than the simple ones we've looked at. The reason this still works is that you have way more than one ribosome in your body, so this is a pipeline process, and as long as you have many proteins in the pipeline, you can still produce much more than one protein per 10 minutes. But this is also why you're using a huge part of the energy in your body just to keep producing proteins. And we can do a fair number of experiments, for instance with a stop codon or something, but this readout of course assumes that we keep the pipeline going, right?
If I just stop the production in the pipeline, the signal is not going to increase anymore. So even though the folding and production of one protein can be slow, I can still monitor this on a fairly finely resolved time scale. The other problem is, I hate to break this to you, but I've been lying. All the stuff I said about Christian Anfinsen, that small proteins just fold, that it's simply a matter of finding the free energy minimum, that it's all physics and you can avoid worrying about life: it's not quite true. For small proteins this is fine. They fold gradually, co-translationally, and you don't really have to worry about a whole lot of misfolding, or about a super large construct, if you look at one small domain at a time. It's a beautiful, simple world that we can understand. The problem is that even if this were true in principle for all proteins, some of them get so complicated that the free energy barriers become so high that we can't really get over them, and then it becomes an academic question whether they would fold. If they won't fold in 100 years, in practice the answer is no. So there are a fair number of cases, and this we know for a fact, where we need some sort of enzymes, catalysts, to make things fold faster. It could even be that in some cases things won't fold at all, even if we had an infinite amount of time, unless we have helper proteins involved. For instance, there are proteins that are so large and hydrophobic that they start to fold before the entire chain has been produced, because the N-terminal part comes out of the ribosome before the C-terminal one; there's nothing we can do about that. But what if the chain is so hydrophobic that the N-terminal parts start to clump together? We might not be able to fold the protein once everything is out, because the ribosome forces this to happen from the N- to the C-terminus. In that case you might be stuck; you might never get to the right fold unless you can somehow
keep the protein from folding until you've produced the entire chain, and then let it fold. When I was your age this was not known; we had the first models, and people argued that it could be important. Today we have structures. We have chaperones, GroEL and GroES. They're basically large molecules, almost a small dustbin with a lid, that are hydrophobic on the inside. What they do is bind these small hydrophobic aggregates, and then, with the protein chain bound on the inside, they provide an environment that basically allows these misfolded states to unfold, and then they release them again. The released chain is in an unfolded state, and then it can fold to the right conformation. It's simply amazing that they work, but they do. There are a fair number of groups that have spent their entire careers understanding all the molecular details of this. Strictly they're not pure enzymes, because they do use a bit of energy to open and close the lids here and bind the protein. But the actual unfolding part, I would argue, is an enzymatic process: you're just providing a nice environment that stabilizes the unfolded state, so that the chain can unfold completely and try again. But of course this is not good; it costs us energy, time and effort to correct things. So nature tries to avoid it, and it likely only happens for proteins where there is simply no other good way to reach the structure, where we have a large and complex chain. Having something that folds by itself will always be better. So with this, we should also be able to solve Levinthal's paradox. If we, and I'm hopefully remembering this right, had a 100-residue chain, and say there were 2 to 3 conformations per residue, there should be at least 2 to the power of 100 states, probably many more, and that would take longer than the age
of the universe to fold but we know how to solve this now right because how would you solve this with the ribosome to see terminus do we really need to test every single possible state? no so this is a great idea and there was Philips who first suggested this that folding likely starts from the end terminal and then we gradually progress along the entire chain it's a beautiful model it completely solves living task paradox there's only one problem with it that it's not true proteins don't fold that way we know that folding, you can basically kill the end terminus and then, well, remove the first helix or something, the protein will still fold so proteins don't start they don't start folding from the end terminus I think it's a beautiful example don't be afraid of having ideas but once you have ideas don't be afraid of testing them and if your idea doesn't agree with reality go with reality wonderful idea I would have loved to have had that idea too but it doesn't work at all so this makes it a bit complicated we still don't understand why proteins can find these states because we understand a whole lot more about the stabilization we can talk for hours about entropy versus energy but how on earth can they find these states we've just spent two weeks working around the problem but what Leventhal essentially says based on what you know now when we first talked about Leventhal we just talked about these states and how many states you could try but you have now learned much more about kinetics versus thermodynamics and that has some huge implications because what Leventhal is essentially saying is that there is no possible way you can test every single possible state period so what Leventhal is you can't test every single state in the Boltzmann distribution and automatically and I think we will all accept that automatically that means that if you can't test every single possible state you are not testing every single possible state and if we are not testing every 
single possible state, there is no way we can guarantee that we end up in the thermodynamically lowest possible free-energy state. So what Levinthal is saying is that proteins are essentially under kinetic control, in the sense that a state is only going to be the native state if we can get there in a reasonable time. That violates half the things we've said — you could even argue that Levinthal is contradicting what Anfinsen said, that Anfinsen was wrong and Levinthal is correct. So it becomes a question of whether you can find some sort of pathway that makes it possible to reach your native state. That's what we saw in the figures just after the break: in some cases you have such horrible energy barriers that a given chain will not fold, and even though there is a very low state, we might not be able to reach it. Then that's not the native state — the protein is not going to have a well-defined state. Or you could argue it has a well-defined lower state, but it's not the state we can reach in practice, and then it's biologically uninteresting. So the biologically interesting native states are the ones that are both well defined and reachable.

Well, that depends — the problem is what you mean by the native state. What would happen if these were just local minima — would that lead to any problems? Prions. The problem is that in general that's very bad for you, because sooner or later you might reach that other state, which is really bad. So in one way you need both of these things. I think we can all agree that we need to get to our native state — otherwise it's pointless — but once you are there, you don't want to get away, and the danger is that if you are not in the lowest state, you are going to get away. If Christian Anfinsen had been completely wrong, I wouldn't have talked so much about him. You kind of need both: without Anfinsen, proteins would not be stable, but without Levinthal, they would not be able to fold in reasonable times.

Yes — you mean that, effectively, you had a second state that's completely impossible to ever reach? Yes, but then we start getting into definitions: if it's completely impossible to ever reach a state, does that state even exist? An essentially infinite energy barrier — exactly. But here's the problem: with a barrier of thousands of years, suddenly you have a small mutation and it's hundreds of years — you will survive that — and then you have a second mutation and it's now ten years. You start being very fragile. But if you have a super-stable, beautiful state, you are going to survive the first five mutations. So what would nature gain from the alternative? Proteins would not be as stable, so nature would likely select against it. I'm not saying it's impossible — it has to do with these probabilities — it's just that my bet is that natural selection would select against it. It's not a good property to have; it's much better to have well-defined, unique, stable proteins. Exactly — and again, this is not just hand-waving; we've tested this for a fair number of proteins. Christian Anfinsen's early experiments are just one example: in general, all small domains actually do appear to reach their global minimum of free energy. It's not at all obvious — it's an amazing result that connects physics, chemistry and life together.

Let me take two or three more slides — I think it will be a more interesting discussion for you if I show the next two or three. The problem is that, one way or another, proteins have to fold, so I'll give you two or three different models. You could have five hundred models, but these are the three popular ones that people have used to try to explain how folding can happen. And the reason for having these models is that protein folding appears to be too slow — and the reason something is slow is that you have a very high energy barrier, right?
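Levinthal's counting argument from a moment ago is easy to make concrete. Here is a minimal Python sketch using the usual textbook assumptions — three conformations per residue and roughly 10^-13 seconds to sample one state are illustration values, not exact numbers from the lecture:

```python
import math

# Levinthal-style estimate: exhaustive random search over conformations.
residues = 100
conformations = 3 ** residues          # ~5e47 possible backbone states
time_per_state = 1e-13                 # seconds per state, roughly one bond rotation
total_seconds = conformations * time_per_state

age_of_universe = 4.3e17               # seconds, ~13.8 billion years

print(f"states to test : {conformations:.2e}")
print(f"search time    : {total_seconds:.2e} s")
print(f"vs. universe   : {total_seconds / age_of_universe:.2e} x its age")
```

Even if you weaken these assumptions by ten orders of magnitude, the random search still overshoots the age of the universe by a huge factor — which is the whole point of the paradox.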
And how could we make this process faster? We're going to need to reduce this energy barrier. So the idea is to come up with a handful of different models that might be able to reduce it. If you can show that a given model is possible — well, there might be something even lower, but if that model is possible, that's good enough: we can then say that the energy barrier is at least not higher than that one. So I'm not saying these are the best possible models, but they are three examples of models that people have used to try to explain why things actually work in practice.

What Levinthal and others have said is that you can talk about so-called folding pathways, or folding funnels: rather than randomly jumping over all the barriers — if you were hiking here, you would probably try to walk in the valleys, right, to avoid scaling all the peaks — can you find some good path down that doesn't require you to climb them? Think about what happens when you go from the unfolded state down to the folded one: the energy goes down, which is good for you, but the entropy also goes down, and if the entropy drops before the energy compensates, the free energy goes up — that's when you get a peak, and that's bad. So you somehow want to find a path where you bring down the energy and the entropy at roughly the same pace; that will be a good path without any horrible barriers. And if the energy goes down too quickly, you get stuck in a local minimum. All three models try to explain how you can gradually gain energy and gradually gain ordering.

By far the easiest of these is what is called the diffusion–collision model, and it's almost what I would have guessed based on the helix–coil transition: a random chain will start to form some stretches of helix very quickly, so within micro- to milliseconds you form some small segments of alpha helix and beta sheet. And we've said throughout this course: think hierarchically, right? Once you have these small elements, you suddenly don't have 100 residues but three to five secondary-structure elements, and those can diffuse together and collide to form the protein. That should solve the problem, because it's a hierarchical search: first you form local structure, then global structure. That means there is no gigantic uncompensated loss of entropy at any one step, and you gain some energy early on. There are some proteins that fold this way. Could you even imagine testing this? Let's see — I might have a slide on that; no, I don't — but we've actually shown it in simulations. What you can do is ask: what is the probability of having the helix formed, versus the probability of having the beta sheet formed? If those probabilities are independent of each other — so that the probability of having both the helix and the sheet is just the product of the helix and sheet probabilities — then they form completely independently, and you can show that the model holds. We actually showed this in simulations some ten years ago for small proteins. So there are cases where it's true — very simple proteins — but in general, for large proteins, it doesn't work.

Another way of thinking about it is the molten globule: you could imagine some sort of super-quick hydrophobic collapse. All your residues form this oil droplet, with the hydrophobic stuff on the interior, and once that's done you've solved most of the search problem — you could argue all the contacts are now based on hydrophobic interactions — and then you finally form the secondary structure at the end. So everything collapses first, and the secondary structure forms gradually afterwards. The problem is that this doesn't work — not the molten globule. NMR experiments and everything show that you start having
secondary structure already at this early stage. Sadly, I would say this model is definitely wrong — period, it doesn't work. But that is again a great way of testing theories: come up with fifty of them and then prove that forty-nine don't work.

And another way — now you're starting to think like physicists — is to consider phase transitions. You start having rapid folding, but not from a random coil: you begin by making a few key interactions — not necessarily disulfide bridges, but say hydrogen bonds. We know that some helical structure forms easily, even if it's not really well ordered, and then, once you have say four or five residues making contact, more residues gradually condense onto this nucleus — just as molecules condense onto a crystal when it forms. The point is this gradual growth; the unifying theme of all these models is that you need something that grows gradually, in a directed way. That would explain how you could gradually make the transition over to the folded state, extending the structure step by step. For larger proteins, this nucleation-and-growth picture is the one we usually believe in today, and after Easter I'll show you how it cracks Levinthal's paradox.

The other problem, though, is that to understand which of these three models is correct, we somehow need to learn something about the intermediate state. If we could say something about the intermediate, we might be able to discriminate: if I can show in the lab that all the helices and sheets form independently, exactly as we saw in the simulation, that's strong evidence for the diffusion–collision model. If I can show that there is no secondary structure early on, it would support the molten globule model — unfortunately we found the opposite: there is secondary structure early, which is an argument against it. For the nucleation model, we would somehow need to show that there are unique contacts formed very early in the core — can I somehow identify those contacts? Yes? Sorry — you can mutate some residues. You can mutate some residues, but the problem is that this is a transition state, and you can't really determine the structure of a transition state, right? The transition state is the most unstable state, so it's going to be a bit of a pain: you will never have a situation where 100% of the molecules in your test tube are in the transition state, so I can never measure on it directly. If it's unstable, it moves quickly in one of the two directions — so a "rapid versus slow" arrow isn't entirely wrong here; the slow part is that gradually adding more residues is basically a search problem. There could still be a search along the path, but you're right that "slow" might not be the best word.

The universal way of talking about these things is this: there are two kinds of states along the way. Folding intermediates are semi-stable structures; an intermediate is not fundamentally different from, say, the two different states of our ion channel — it's a state you reach somewhere along the way, from which you might be able to move on to another state. These are by definition observable, because they sit in a local free-energy minimum. They might not be very stable, but under the right conditions you can measure them. Transition states, on the other hand, are the opposite: they are peaks in the free-energy landscape. You cannot have a stable transition state, because the second it is stable it is no longer a transition state. So somehow we need to measure effects indirectly from this bottleneck, or try to measure how fast things happen, because
that's the only information we can get. Transition states are the peaks in the free-energy landscape, and we talked about the fact that peaks determine how fast things happen. So the only way to gain information about them is to draw conclusions from how fast things happen under different conditions — that's all we have. If we're going to move at all, we have to move between, say, the molten globule and the native state one way or another, and we can cross this barrier in both directions. So there are a bunch of tricks we can use to determine experimental rates. You can do time-resolved measurements of folding, using light for example. One way to do this in a time-resolved fashion is to have two syringes, mix their contents together, and push the mixture through a tube at a very high flow — several meters per second. Then it depends on where along the tube I measure: if I measure here, I'm looking at molecules that mixed a millisecond ago; if I measure there, molecules that mixed two milliseconds ago; a bit further along, three milliseconds ago. These time-resolved measurements are not as hard as you might think — they're very easy to do with this type of stopped-flow kinetics. And then you need to go into your protein and guess: should we try to mutate that residue, and see what it does to the rate — do things happen faster or slower now? If there is no effect whatsoever on the rate, that residue was likely not important to the transition state. This is hard — people spend their careers trying to understand a single protein this way. But we also want to understand more than the experiment — we want to understand what this does to folding — so we're going to need to think a little bit about these energy barriers.
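The arithmetic behind that flow-tube trick is just distance over speed. A tiny sketch with illustrative numbers — the 5 m/s flow speed and the observation points are assumptions, not the lecture's values:

```python
# Flow-tube timing: at a known flow speed, the distance from the mixing
# point fixes how long ago the molecules were mixed. Illustrative numbers.
flow_speed = 5.0                      # meters per second (assumed)
positions = [0.005, 0.010, 0.015]     # observation points, meters from mixer

for x in positions:
    age_ms = x / flow_speed * 1000.0  # time since mixing, in milliseconds
    print(f"at {x * 100:.1f} cm from the mixer: mixed {age_ms:.1f} ms ago")
```

So observation points half a centimeter apart give millisecond-scale time resolution without any fast electronics.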
Say we call the unfolded — or molten globule — state A and the folded state B, with a barrier between them. One barrier height is what we cross going from A to B, and the other is what we cross going from B to A, right? For each of these barriers we know what the free energies are, and from them we can calculate the rate constants — the flow per unit time from A to B and from B to A. It's just a bunch of mathematics. In principle we can measure these, if you have a clever way to measure how fast folding happens versus how fast unfolding happens. I'm not going to tell you how to measure it — there is a reason we are in this building: if you discover this kind of thing, you might get the building named after you, and the Nobel Prize. Svante Arrhenius, who was a Swede, came up with this way of analyzing reactions. Notice that these are exponentials: in the rate constants we measure, the temperature sits in the denominator of the exponent. So if we are really after the free energies, we can plot the logarithm of the rate constants against one over the temperature, and in principle we should be able to read the free-energy differences off the slopes. I don't care about the specific free-energy values here; what matters is that, depending on the temperature, one of these barriers is going to be more important than the other — otherwise proteins would never fold, right? At very low temperatures we are to the right here: the lower the temperature gets, the more things fold to native — let's forget about the black curve for a second. On the other hand, looking at unfolding: if I increase the temperature, I unfold more and more. That makes sense too — proteins should unfold at high temperature. But the problem is more complicated than that, because if you are unfolded and then you fold, can you go back from folded to unfolded? You do, because this is statistics: some fraction always goes the other way. In any test tube, some molecules are folding while others are unfolding, so you cannot measure folding and unfolding of the same protein separately at the same time. If you only wanted to measure folding, then the second a protein folded you would somehow need to remove it from the experiment so it couldn't unfold again — and that is difficult to do, to say the least. So these plots are really nasty to work with. But what I am saying is: below that temperature you fold faster, above it you unfold faster, and the net effect is very easy to measure — do we, overall, see more protein folding or more protein unfolding? I can just look at the total amount of folded protein, right? You can actually estimate these energies from these diagrams; in the interest of time I'm going to skip the derivation, since we've mostly hand-waved through this. Instead: if I want to calculate how the measured values in the previous plot — the logarithm of the rate constant — change with temperature, I should just calculate that derivative, and then it's a matter of looking at which terms depend on temperature and which do not. In general, the energy part of the free energy doesn't depend much on temperature, but then there's the TS term — the entropy multiplied by the temperature. If you work this out, then to a first approximation the derivative of the log rate constant with respect to inverse temperature depends mostly on the energy of the barrier, and it's negative — don't worry about the exact shape. What this means is exactly what I told you on the previous slide.
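To show what "read the barrier off the slope" means in practice, here is a hedged sketch: synthetic rate constants generated from the Arrhenius law, followed by a least-squares fit of ln k against 1/T. The pre-exponential factor and activation energy below are made-up illustration values, not numbers from any real protein:

```python
import math

# Arrhenius analysis sketch: k = A * exp(-Ea / (R*T)).
R = 8.314          # gas constant, J/(mol*K)
A = 1.0e9          # pre-exponential factor (assumed)
Ea = 50_000.0      # activation energy, J/mol (assumed)

temps = [280.0, 290.0, 300.0, 310.0, 320.0]        # kelvin
lnk = [math.log(A) - Ea / (R * T) for T in temps]  # ln k at each temperature
inv_T = [1.0 / T for T in temps]

# Least-squares slope of ln k versus 1/T; for Arrhenius data, slope = -Ea/R.
n = len(temps)
mx = sum(inv_T) / n
my = sum(lnk) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(inv_T, lnk))
         / sum((x - mx) ** 2 for x in inv_T))

print(f"fitted slope : {slope:.1f} K")
print(f"recovered Ea : {-slope * R / 1000:.1f} kJ/mol")  # recovers the 50 kJ/mol we put in
```

With real data the points scatter around the line, but the logic is the same: the slope of the Arrhenius plot is the barrier height divided by the gas constant.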
When you increase the temperature, unfolding goes faster, because you have to climb up in energy to leave the native state. Is that normal or abnormal? Let's go back to the plot — it's probably easier. If I increase the temperature here, this process goes faster: is that a normal chemical reaction? It's a completely normal one — any time you have a chemical reaction and add some fire to it, you expect it to go faster. Folding, on the other hand: if you decrease the temperature, it goes faster. That is a pretty abnormal process — there are very few chemical processes that go faster because you add some ice, but this is one. And if you go through the math, you can show that this falls out of the temperature dependence of the free energies and the character of the barrier. Just by looking at these energies, which we can learn from the Arrhenius plots, we know that the energy of the transition state is higher than the energy of the native state. That makes perfect sense: to get from the native state to the transition state we had to add energy — we had to heat it. But it also says that if you are at this horrible transition state and keep adding more energy, you do fall into a free-energy well, yet the energy itself keeps going up: energy keeps increasing the further you get from the folded state. At this point it's not just hand-waving based on equations — we actually see it in the Arrhenius plots. And because the energy keeps going up, I know the stabilization is caused by all the interactions we have: native states have lower energy, and there is a monotonic increase in energy as you go first to the molten globule and then to the really unfolded states. You can do exactly the same thing for entropy — I'm not even going to try to take you through that math; I think the book does — and entropy behaves exactly the opposite way. It's actually the same number, but since entropy acts in the other direction, you have the highest entropy in the unfolded state — which is really good — intermediate entropy at the transition state, and really low entropy in the native state. Again, this is based on the shape of the Arrhenius plots, not just on my hand-waving. Of course, this was for one protein; I'm not going to show you 999 more Arrhenius plots — you can find plenty of them online if you search. Based on those experiments you can basically establish these concepts: the barrier to folding, going from a denatured to a native state, is entropic — it's a search process — whereas the barrier to unfolding is energetic, because you're breaking all those beautiful interactions. In the first half of today I hand-waved about this from physics and what seemed reasonable; from Arrhenius plots we can derive that it's true. So this is not just a good idea — it's a highly schematic simplification, but it works, and that's why it works in computer simulations too. You have an unfolding barrier and a folding barrier, and they're very different in nature.

And here is where I would argue folding is a bit different from, say, an ion channel switching between states. When you have a molecule such as an ion channel that moves between two well-defined states, it's usually not wandering randomly from one to the other: some external factor, such as a ligand binding, suddenly shifts the positions of these states. Without the ligand bound we have one well-defined best state; when we bind the ligand we get a slightly different best state; and in both conditions it's well defined which one is best.

We have two more slides, so let me just introduce the way we're actually going to solve Levinthal after Easter. Nobody — I'm so sorry — almost nobody uses Arrhenius plots today, at least not in protein folding, because they're too complicated to work with. What you do instead is this: take a test tube and measure the total amount of protein folding per unit time together with the total amount unfolding. You can go through a bit of mathematics — the book does it; in the interest of time I won't — and ask: how many molecules per unit time appear to move from the unfolded to the folded state? There are two contributions: a positive contribution from the molecules that are folding, and a negative contribution from the ones that are unfolding. And at the end the math is super simple: the total apparent rate constant is just the sum of the two rate constants. That means you don't have to worry about extracting molecules from the sample — you measure everything at once, and instead of two separate curves you get one. Effectively we're measuring the total rate; whether it comes from folding or unfolding, I don't care. And here is what that gives you — in this case the x-axis is a denaturant, guanidinium hydrochloride. At very high denaturant concentration this is mostly going to be unfolding, of course, so you see a very fast reaction that is unfolding. As I reduce the denaturant concentration, at some point pretty much nothing happens — I'm at equilibrium — and when I reduce the concentration even further, the protein starts to fold, so suddenly the reaction goes in the other direction.
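That "one curve instead of two" point can be sketched numerically. Assuming the standard two-state picture in which ln k_f falls and ln k_u rises roughly linearly with denaturant concentration — all slopes and intercepts below are invented for illustration — the apparent rate k_obs = k_f + k_u traces out the V shape:

```python
import math

# Two-state folding kinetics versus denaturant concentration.
# ln(kf) decreases and ln(ku) increases linearly with denaturant;
# the parameters are illustrative, not from the lecture or a real protein.
def k_obs(denaturant):
    kf = math.exp(5.0 - 2.0 * denaturant)    # folding rate, slows with denaturant
    ku = math.exp(-5.0 + 1.5 * denaturant)   # unfolding rate, speeds up
    return kf + ku                           # apparent rate = sum of the two

for c in [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    print(f"[denaturant] = {c:.1f} M  ->  ln k_obs = {math.log(k_obs(c)):+.2f}")
```

Plotting ln k_obs against denaturant gives a V with folding dominating on the left arm and unfolding on the right; with these made-up slopes the minimum sits near 2.9 M, where the two rates are comparable.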
Measuring exactly at the minimum is going to be a bit difficult, but measuring the total rate of the reaction, in whichever direction it runs, is so much easier — anybody can do this. These are called chevron plots — the only reason for the name is the pattern on military uniforms — and we'll see them, well, not next week, but after Easter. If you do a PhD in some groups, in this department or elsewhere, you might do a PhD that basically consists of producing, say, four hundred chevron plots. These chevron plots are what we use to determine how fast proteins actually fold and whether you can change the folding — and in particular, we can use them to capture information about the transition states. Because when you start making mutations, what happens is that these plots — ha, I need to draw a chevron here — will move in one direction or the other, and from those offsets I can decide: did I change the stability of the transition state, or did I change the stability of the entire protein? I can't capture the structure of the transition state directly, but I can say: if changing residue 47 changed the stability of the transition state, then residue 47 was apparently part of the transition state. This is really cool protein engineering, and we can use it both to crack Levinthal's paradox and to design proteins that fold faster. I won't go into any details there — I didn't even plan to. There are a bunch of study questions here, and I think this is pretty much the last part of the kinetics review. I'll show you how we solve Levinthal's paradox early next week, after Easter, but apart from that I'm going to head a little more into DNA and practical research. Do you have any questions for me?

Either tonight or tomorrow I'm going to send out one of the exams I've given in previous years. I have to confess there is a reason I didn't give it to you at the start: these exams change a bit, and I will add some new material after Easter. Don't focus your studies on old exams — it's bad for you, and I try to change the exams so the question types don't simply repeat. I think I told you this already, but when it comes to the exam I separate it into two parts. If you know the study questions, you will pass the exam, period. There are lots of them, I'm not going to ask you all of them, and I'm not promising exactly the same formulations — but I'm not going to try to trick you: if you know these questions, you will pass the course. On the other hand, if you want to ace the course, there will be a couple of questions about more conceptual things, to force you to think — and those might very well be things I haven't even gone through here. Ideally I'll find some example where I give you data from an experiment: try to model this, try to solve this. Because that's the type of problem you'll face if you do a PhD or sit in industry in a couple of years, and being able to think is important.