So, good morning. Let's see: did you have fun on Friday? Did you learn anything? As you realized, when it comes to a defense you don't have a whole lot of time, so the opponent in particular jumps pretty quickly into the deep end of the pool. Most of the questions there were of course way more advanced than you would expect at the undergraduate level. But the key message you might want to take from it is that a surprising amount of cutting-edge research really goes back to very simple things: Boltzmann distributions, ΔG's, understanding how you can derive ΔG's. I think at least three or four of the questions that Christian got from Tom were really about how you can measure this, and whether you can go back to the equations and prove why it's right. So the whole point is that just because you graduate and start doing PhD research, that doesn't really mean you do more advanced stuff. You're applying it to more advanced problems, but the equations are pretty much the same. You don't necessarily use more advanced equations; it's just that you have to think more about them.
I would suggest that we start by discussing some of the things we covered on Friday. I'm gonna come back to some of these things today too, because I'm well aware that today things are starting to get a bit more complicated. We will pretty much finish the discussion about protein kinetics today. Tomorrow I will likely be speaking a bit about free energies, design of proteins, possibly a little bit about drug design, and then on Friday I'm likely gonna finish up by talking about docking, high-throughput studies, basically everything you can do in industry. I think we're only gonna do two more labs this week, actually, because Darin and Bjorn have been happy with you. I might actually try to move the docking part to tomorrow so that you have a chance to go through that before the lab on Thursday, because I think their plan was to give you Friday afternoon off. Then early next week we'll finish up everything: I might talk a little bit about our research, I'm gonna do a repetition of the entire course, and then next Friday we have the exam. Things are going fast.

Discussions: pick a question and get started. Oh yes, we do have handouts, there you go. Handouts, but those aren't gonna help you a whole lot with this slide, actually. The danger with choosing what you answer is that you tend to answer the things you already know rather than the things you don't know, which is a really stupid strategy. So I'll suggest doing it slightly differently. Let's get started and take question number one, and then we'll just go around the table. Right: we frequently talk about denaturation or something. Which one is the most common of the non-folded states?
Yes. So the whole thing is that it's a bit bad that we keep talking about folded versus unfolded states, when it turns out that the really completely unfolded state, where every single trace of protein structure is destroyed, doesn't really occur in nature. You can create it in the lab with high temperature and a very high concentration of denaturant, but in your body you will get a molten globule the second something exits the ribosome. So real protein folding is mostly a question of how you transfer between the molten globule and the native state, or back. And we're gonna come back a whole lot more to that today, because the big question that we really haven't solved, I'm not sure if you remember it, going all the way back to Levinthal, is that we cannot understand why proteins fold so fast. What is it really that determines when a protein can fold and how it can fold? We haven't solved that. We're gonna try to solve that up to the break today. But the key thing is that it's really this transition between molten globule and native state that determines every single property of the proteins we see: what are the characteristics of the barrier, what is it that stabilizes protein structure, and everything. And the reason why that is important is that it's a second problem by the time you start trying to design a protein or something. It's not enough that the protein you've designed has a neat binding site for molecule X if you want to catalyze some process involving molecule X. You're also gonna need this protein to fold reasonably fast, or otherwise you're just gonna have a sequence. It's not gonna be a protein. That turns out to be a bit of a challenge, but we'll cover that in a lot of detail today.

Tarinas: in an NMR experiment, which one of these would the molten globule be most similar to? Yes. Because I'm a bit nasty, I'll just go around the table.
I'm not so interested in the answer; I'm more interested in how you think about it. Yes. So the whole idea with the NMR experiment is of course that you can get approximate results under realistic conditions, that is, at room temperature or something. In an X-ray crystal you would need to have it at 100 kelvin or so. So the key thing isn't necessarily that it's an NMR experiment, but that it's under realistic conditions. So what does the molten globule look like? No, you don't necessarily have high temperature. If you increase the temperature a bit you might go from native to molten globule. If you go to really high temperatures and high concentrations of denaturant you would eventually reach the random coil. But my point with this question is really that in practice it's virtually indistinguishable. At room temperature, if you have a look at molten globule versus native, and you look at all these average properties like the amount of secondary structure... This is not an NMR course, and I don't expect you to know NMR, but NMR helps you derive contacts between residues. You can see, for instance, that residue 47 must be close to residue 396, et cetera. In all these average properties there's hardly going to be a difference in NMR. It's going to look not just similar to, but virtually identical to the native state. It's very hard to detect that first type of unfolding in NMR. In an X-ray structure you would see it, because, well, first, the molten globule wouldn't really crystallize, because if it crystallizes it would likely have to fold, right? But the point is that at first sight, if I just showed you two pictures, I should have that in the slide.
I'll do that next year. If I just showed you two pictures, I bet you couldn't separate molten globule from native in most cases. The devil is in the details: very small details of side-chain packing and everything determine when you really click in and go to the fully folded state.

Well, no. The key thing with the hydrophobic effect is that when you start with a completely random coil, the random coil collapsing to a molten globule is going to happen instantly. At any sane temperature that you would ever look at in NMR, you're going to have the chain collapse to this intermediate state. Sorry, the other part you asked about was... So the hydrophobic effect is mainly what explains the transition from random coil to molten globule. When you are in the molten globule you pretty much no longer expose any hydrophobic amino acids, and you even have most of the secondary structure formed. The only thing that really hasn't formed is the detailed side-chain packing, and you need to push the last water out of the structure. Now mind you, so far that's something you're gonna have to take my word for, but I will show some computer simulations where that does indeed seem to be the case, that pushing out the water is the last step.

No, not really. The first caveat here is that with cryo-EM and all these things, for cryo you need to freeze something, right? You typically plunge-freeze it, which means that you don't really get normal ice; you get a vitrified system around it. If you actually go all the way and determine a structure, yes, of course you're gonna see these tiny details, that there are differences in the amount of water content. But look at these biophysical properties, the first approximation of what is in contact with what and how much secondary structure you have: there are virtually no differences there. So if we continue with three: why is it important that proteins are polymers?
The number of microstates would otherwise be so large. Well, having a large number of microstates is good, right? The key thing with folding is that it's not one state. You're always looking at at least two states. So of course, if something is good in one state, transition-wise it's going to be bad for the other state. That proteins are polymers has to do with the fact that the number of microstates can become very large. But in what states would it become very large? Yes, because the second you fold, by definition, well, if you have something folded you've already started to have the residues in the proximity of each other, so it's not going to be an infinite number of states there. But in the unfolded state, if they're not connected, they would on average be infinitely far away from each other. And this is actually very common. The second you start to design proteins, or design an allosteric modulator, something that should change an interaction, there are two things you can do. If you have a closed state and an open state, for instance in a protein channel, and you would like the channel to be more open, there are two strategies: you can either stabilize the open state or destabilize the closed state. In many cases it's kind of 50-50; different drugs do different things, and both of them can work really well. So, related to that.
Why is it important that proteins are heteropolymers? There's a keyword that I'd like you to use, or to think in terms of, that helps you a lot: specificity. Because we talked about how you need all these side-chain interactions and packing, that it has to click, it has to fit like pieces in a puzzle. If you do not have that specificity, you're just gonna have a hydrophobic collapsed blob, a.k.a. a molten globule. The reason why you get native states is that all the pieces in the puzzle fit perfectly, and that will of course depend on how the pieces fit together, right? If the pieces do not fit together specifically, you're not gonna have one specific state that's better than the others.

All right, Sarah: side chains. How are side chains packed in the native state? Well, answer it anyway. Reason about it. What can you say? First, that's a very bad answer. You should never ever say that you don't know something, because you do know; I would bet you know something about it. In general, when it comes to solving difficult problems, and now I'm not talking about classes, I'm talking about life in general, or research in particular, saying that you don't know anything about something is giving up. And we're gonna go through some equations later today: no hard problem was ever solved instantaneously. So when you start working on a hard problem, you start by asking, what do I know about the problem? Focus on what you do know about it rather than what you don't know about it, and see how far it can go. So what do you know about side-chain packing in native states? If you look at lysine and leucine in a globular protein, which one is going to be on the inside? Okay, so you do know something about it. So don't say you don't know anything about it. Let's start over. And why is that the case?
Hydrophobic. So first, we have hydrophobic things on the inside. The second one, we say, is packing. Based on what we just said about the molten globule and the native state, can you say anything about the density or the interactions? And also on what we just said about heteropolymers. Mm-hmm, and that goes back to the keyword I talked about, which is specificity. In terms of side chains, these are specific interactions. It's not just a random blob. And when these are specific, can you even guess something about the density, if you compare the molten globule to the native state in particular? If you were to guess how high the density is compared to, say, water or something, or the packing efficiency, if a hundred percent would be perfect packing in a native state? Yeah. Now, I wouldn't say that it's higher than water, because the density of water is roughly one, right? The average protein is slightly lighter. But if you think in terms of percent, it's not going to be a hundred percent; say 80. The point is that this packing is high. If you compare to the molten globule, there you start having water or something in it, and that is actually the one difference you might be able to see in an NMR experiment. Once you get down to the molten globule, there is some water and everything inside, so you probably drop to 60 percent. The whole protein swells a bit, like a sponge or something. And the thing that happens when you actually go to the fully native state is that you squeeze out that water and things really click in place. So think of this as a three-dimensional jigsaw puzzle, but one that's densely packed. It's very specific.
You have hydrophobic against hydrophobic, or hydrophilic against hydrophilic. If you ever have a charged side chain inside a protein, you're virtually always gonna have a charge of the opposite sign there, to form one of these so-called salt bridges.

Yes. No, better than if you compare to water, right? Water really is a liquid. It's easy to move things around in it, because those are small molecules. Proteins are by definition much larger, there are much longer-range connections and everything. Just think: if you have two amino acids packed like that, they will not be able to go straight through each other to that other state. That is the big problem in bioinformatics and structure prediction: if you make some mistakes with the side chains, you're pretty much gonna need to unfold the entire protein to get it to fold again. So if you have to choose, it is more similar to a solid than a liquid. And that's actually an important concept, because it relates to what we're gonna talk about today: phase transitions. Protein folding really is like a phase transition. You can think of the molten globule more as a liquid, but the fully native state more like a solid. Something really happens there. There's a barrier we're going over, and we're forming something different.

So let's continue. Alloy: how does the enthalpy vary from the unfolded state, through the transition state, to the native state?
It will actually go down, and in particular in the last stage it will go down quite a lot. And then you could argue: well, if this was the only effect in the protein, then you would very rapidly slide downhill, and you would fold an entire protein in way less than a microsecond; you could fold large proteins in your simulations. The reason why that doesn't suffice to explain it is the next question, which is how the entropy varies from the unfolded state, through the transition state, to the native state.

Now you're probably thinking about the free energy. Just look at the entropy part. Don't think too much: you have this large chain and it's collapsing. What does that do to the entropy? Is it more or less ordered? So the entropy goes down. And you know what, we're gonna come back to this several times today. This is the key. Because, definition-wise, what is free energy? Enthalpy minus temperature times entropy, right? And we just realized that both of these go down. So whether we have barriers and whether we are stable will then depend on the balance between them. There's going to be a very delicate balance in many cases, and that's going to turn out to crack all these problems in the end, explaining why some proteins fold and why others don't.

We're going to come back to eight and nine too, but we can cover them anyway, because this is repetition. So, based on our hand-waving on Thursday: what is the main free energy barrier when you fold proteins? I didn't prove it, I just hand-waved. Hmm, but if you have to choose one of these components, enthalpy or entropy, when you fold the protein, which is the main barrier for you?
Have a second guess: entropy. The hand-waving argument was basically based on the fact that this is the process where you need to search a lot, right? You need to find exactly which side-chain conformations should fit in. The second you have the side-chain conformations, the very endgame is going to be relatively quick, because then you can just slide down in the free-energy landscape and improve the energy. But the whole point is that we need to search before we really start getting the benefit of better interactions. So when you fold the protein, the main problem is really that we need to search, i.e. entropy.

So then, during unfolding? This is not fair, because this is easier for you; that's the reason for having both these questions, of course. During unfolding the barrier would rather be enthalpy. Can you reason about that? Why is that reasonable? Right, and it's exactly the same process, of course. But now, once we have all these beautiful interactions, with the first step backwards we're taking we have to break the beautiful interactions. That costs us lots of energy, right? But we're not really going to gain any entropy, because in that last part we don't really have a whole lot of freedom. Eventually, when we get over the barrier, we're going to get all the entropy of the unfolded state. So when we're folding we have to search, and that's entropy-limited; when we unfold we have to break interactions, and then the energy, or enthalpy, is primarily the barrier.

You know what, let's skip ten for now, because I'm going to talk a whole lot more about that today, and the part I talked about last week was a bit too much hand-waving. So let's move to 12. Roughly, what is the probability of random chains to fold?
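The bookkeeping behind the enthalpy and entropy questions can be written out in a few lines. This is only a sketch of the hand-waving argument, not a derivation: the double-dagger notation for transition-state quantities is standard but was not used in the lecture, and the approximate equalities are assumptions that hold only as far as the argument itself does.

```latex
\begin{align*}
\Delta G &= \Delta H - T\,\Delta S
  && \text{(folding: } \Delta H < 0 \text{ and } \Delta S < 0\text{)} \\
\Delta G^{\ddagger}_{\mathrm{fold}}
  &= \Delta H^{\ddagger}_{\mathrm{fold}} - T\,\Delta S^{\ddagger}_{\mathrm{fold}}
  \approx -T\,\Delta S^{\ddagger}_{\mathrm{fold}} > 0
  && \text{(search: entropy is lost before the interactions pay off)} \\
\Delta G^{\ddagger}_{\mathrm{unfold}}
  &= \Delta H^{\ddagger}_{\mathrm{unfold}} - T\,\Delta S^{\ddagger}_{\mathrm{unfold}}
  \approx \Delta H^{\ddagger}_{\mathrm{unfold}} > 0
  && \text{(interactions are broken before entropy is regained)}
\end{align*}
```

Since both terms of the first line are negative for folding, the sign of the total depends on which one is larger in magnitude, which is the delicate balance mentioned earlier.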
Dorinis. No, 12, so we're skipping the energy gaps. Right, do you remember roughly what probabilities we spoke about? The key thing for a random chain to fold is that it's not just a matter of the chain being stable; it also has to fold. The transition-state barriers cannot be impossibly high, because then you will never fold in practice. What I argued there was that the stabilization energies we saw of course have to be related to kT, and we compared that to the distribution of these energy defects we had. It turns out that you got something that was on the order of 10 to the minus 8 or so. Whether it's 10 to the minus 7 or 9, I don't really care. The whole point is that it's not 10 to the minus 1 or 2. It's 10 to the minus a large number, which means that it's, to a first approximation, zero. And that leads to two things, at least. Most random sequences that you just synthesize in the lab, they're going to be great at doing what? Where will they end up, if you look at question number one: random coil, molten globule and native states? If you just synthesize random sequences, where will they end up? Why would they be random coil?
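To get a feeling for what a probability on the order of 10 to the minus 8 means in practice, here is a minimal sketch of the arithmetic. The value `P_FOLD` is just the order of magnitude quoted above, and the library sizes and function names are made up for illustration; it also assumes every random sequence folds or fails independently of the others.

```python
import math

P_FOLD = 1e-8  # order-of-magnitude probability that a random chain folds (from the lecture)


def expected_folders(n_sequences, p=P_FOLD):
    """Expected number of foldable sequences in a random library of size n."""
    return n_sequences * p


def prob_at_least_one(n_sequences, p=P_FOLD):
    """Probability that a random library contains at least one folder.

    Computes 1 - (1 - p)**n with log1p/expm1 so that tiny p stays accurate.
    """
    return -math.expm1(n_sequences * math.log1p(-p))


# Hypothetical library sizes, chosen only to show the trend.
for n in (1, 10**3, 10**6, 10**8):
    print(f"n = {n:>9d}: expected folders = {expected_folders(n):.2e}, "
          f"P(at least one) = {prob_at_least_one(n):.2e}")
```

Under these assumptions you would need a library of roughly a hundred million random sequences before you expect even a single folder, which is why, to a first approximation, the probability is zero.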
Right. A random sequence should on average contain roughly 50% hydrophobic and roughly 50% hydrophilic residues. Now, of course, in theory it's a possibility to have only hydrophilic residues in a very long chain, and then it might be stretched out. But if you have roughly 50% hydrophobic residues, they're gonna want to collapse because of the hydrophobic effect. That really does not have anything to do with protein folding, and this is a scary part, because we spent all this time early on in the course studying the hydrophobic effect. It makes so much sense that you're turning the hydrophobic side chains to the inside, but that doesn't give you a protein. It just gives you a blob with hydrophobic things on the inside, which is very similar to a molten globule, because you might form small stretches of helix or so. So a random chain would fold to the molten globule, but it would not be able to go through this phase transition to a really stable, locked-down native state, because for that you need specific side chains. So that's one part. And this is a problem the second you start to, sorry, not synthesize, but design proteins. Based on this probability, if you had to design a new protein sequence to catalyze a reaction or something, what would you do? Right, and why is that? Probabilities are dangerous. It's very hard to wrap your head around them, in particular very large and very small numbers. If something is 10 to the minus 8? Forget about it.
It's zero. The probability that you are so smart that you will be able to design a protein fold that's stable is virtually zero. Now, there are some computer programs that can occasionally do it; I'll get back to that in a second. But the whole point is that by far the easiest approach is to pick something that's existing and just try to modify it a little bit. If it's a gigantic protein and you just change a handful of residues around a binding site, then if you're lucky this might still have the same overall fold, and you've just changed the binding site. Will that always work? No, because you might be unlucky, right? It might be that one of the two or three residues you're changing is a very important residue for the protein's stability, and then you've just destroyed your protein. You're gonna see that in the lab very quickly. So what would you do then? Yes. Well, actually, that's not such a bad idea, to try a different fold, because if your job is to catalyze reaction X, remember that your job is to catalyze reaction X, not necessarily to fold something into fold Y. But the other thing you can do, that computer programs are reasonably good at today, is to pick the entire fold and then tell the computer to design a sequence that both binds something here and is stable in this entire fold. This is not a trivial problem, but it's an easier problem for the computer than trying to design something from scratch. Here we know what we're gonna fold into, and we just try to design specific interactions that are stable in this fold as a native state. That's typically what you do in protein design today. Sure, and in particular if you're gonna need to bind something large, then you need your protein to move or breathe a bit to bind around something. But I'll come back to that later this week when we talk more about design and drug design in general.

Thirteen: what molecules are involved during real
protein synthesis in vivo? So what does the ribosome do? If we skip the ribosome for a second, what is the first step we need, to get from DNA to something? Yes, and that's done with a protein called DNA polymerase. There's a reason why I keep bringing those molecules up: that is an insanely complicated process, right? First you need to bind a protein to DNA. You need the help of other proteins to recognize the sites you're gonna bind to, and you're gonna need to cleave the DNA. The number of processes this goes through is insane. The second process is not DNA but RNA, so there you have RNA polymerase involved. What does RNA polymerase do? This is the protein that copies the genetic information from DNA to RNA, so that you can take it to the molecule you mentioned first, which is the ribosome. The ribosome actually often connects to membranes, and there are several membranes in your body. So the protein-machinery part is what? Yes, the endoplasmic reticulum, right? What happens is that you have these translocons sitting in this membrane, which is a membrane inside the cell. The ribosome connects to the translocon and either pushes the chain out to become a globular protein in the cell, or pushes it into the ER membrane. And if it's going to be a membrane protein that should move all the way to the plasma membrane, your outer membrane, then it's a really complicated process how the protein has to move in that membrane. We actually do not know exactly how that happens. There are all these guesses that it might depend on the concentration of cholesterol or the thickness of your membrane, but we don't really know how membrane proteins diffuse inside the cell. We do know that there are some membrane proteins that are only expressed in the plasma membrane, so there are differences between them, but exactly how? That's a good PhD project.
Well, no, there's probably a hundred good PhD projects there. A chaperone, then: what does that do? It's a kind of... yes. And in particular, what type of proteins or intermediate states usually bind to the chaperones? A chaperone usually appears to take care of misfolded states or unstable states, things that are not on the main folding pathways. It basically rounds these up and pushes them in the right direction instead. That is also something we're going to come back to today; there is a reason why that speeds up folding.

And then we had three small folding models that I will come back to today, but let's describe them. Diffusion-collision first. You're on the right track: you have secondary structure, and it's even plural, secondary structures, as you said. So what is the key thing? You can almost guess it from the name, diffusion and collision. The whole point is that of course they are connected by a chain, right? But the secondary structure elements form independently; that's the keyword. And when they have formed independently they diffuse around. Of course they're still connected by a chain, so they can't move infinitely far apart. And then occasionally, well, they collide, and when they collide in an advantageous position they will form larger stable elements. We're gonna see some simulations of that today.

We had this intermediate folding model that says hydrophobic collapse. I would argue this is by far the worst, but you should know about it anyway. The problem with the hydrophobic collapse is that in many ways it describes the transition from random coil to molten globule. For that transition I would argue it is pretty good. But after that it's pretty much "magic happens" and then everything starts to fold. And as we've seen from NMR and other experiments, the molten globule is not just completely random and hydrophobic.
There is a surprising amount of secondary structure and some correct contacts and everything. So it was a good idea at the time, but it's not really true anymore.

And the last of the three models is called nucleation-condensation. That's also something you can occasionally see in physics, but in terms of proteins, what does that mean? Right. Think of this like a crystal of ice or something forming: you need to have some sort of first ice crystal that has started growing, what they call a crystallization grain or something. Once that has started to form, you're gonna get more things condensing on the surface, and you start to grow something. It's gonna turn out that we will never really see that state, because it is a very expensive transition state, but it's gonna correspond really well to all the things we are seeing. So nucleation-condensation, while a bit complicated, is going to be the main explanation for everything.

And that relates to 18, which is the last thing we discussed. I realize it's a hand-waving and abstract slide about Anfinsen and Levinthal, and thermodynamic versus kinetic stability. Anfinsen said that proteins can form spontaneously because they are reaching, or aiming for, the most stable state, and then Levinthal sort of pointed out that the search is too large to actually happen. This relates to thermodynamic and kinetic stability, because it doesn't matter which state is theoretically the most stable; the only one that matters is the one that you can actually reach. Right, and this is important, because I feel so sorry for Levinthal in the literature: everybody just says that Levinthal merely pointed out there's a problem. He didn't merely point out that this is a problem, right? He also suggested that by definition.
There is no way. While Christian Anfinsen might be right in the sense that it's a global minimum, the point is that in practice, for proteins in general, there is no way you can sample every single possible state. Period. And you know what, the second you realize that, that's also a lot of the answer to the problem, because if you cannot sample it, by definition, you cannot sample it. The only way you can fold the protein is then if it's somehow a guided process. The proteins that do fold mostly fold reasonably fast, with some sort of kinetic stability: it happens because it can happen fast enough. Now, for whatever reason that we haven't explained yet, and that's what we're looking at today, it might actually be the case that these fast folders also tend to lead to the global thermodynamic minimum. But whether proteins can fold biologically has to be a matter of kinetic stability. It's completely uninteresting whether they can fold in a billion years.

So we're gonna talk about that a lot today, and we're gonna speak a bit about folding rates and Arrhenius plots; that's important at Stockholm University, I guess. We're gonna look at both these transition states and intermediate states and come back to what they were. Folding funnels are a very key concept. This really goes back to what I just told you: a folding funnel is some sort of guided path where we are able to fold fast enough. And we're gonna use this nucleation-condensation model to really explain Levinthal's paradox and why proteins fold fast enough. And this actually works out experimentally too; it's really cool. And then I'm gonna talk about the energy gaps and a little bit about the temperature dependencies.

So rather than having a boring slide here, I'm gonna move to that one and have you explain this. What's here at A, B, C, D and E? Well, we can have something before A and something after E too. Say this is some sort of free energy as a function of some process.
What are these different states? E is a global minimum, which in terms of proteins you would call the native state. Good. Needless to say, the longer you wait, the harder it's going to be for you to answer; you were smart and picked by far the easiest one. C could be a molten globule, and another way to name that, in terms of what I had on the previous slide when we talked about intermediates and transition states, is that it's an intermediate state. Why is it an intermediate state? It's not a global minimum, but it's a clear state in the sense that it's locally stable. It's something that is possible to observe experimentally, depending on what these barriers are. And hey, this might only be stable for a millisecond, but it is stable for a millisecond, so with some experiments we should be able to monitor it. It might be hard, but it's possible.

Theoretically, it could of course be something like a prion, but for now we're gonna skip that. Actually, that's a very good comment: everything is in the eye of the beholder, right? What we call a native state is some sort of biologically active state, and for a prion you could of course say, well, that's the native state and that's the disease state. But to nature there is no such distinction; that just has to do with the time scales it takes to go over D. In terms of simple protein folding, this is an intermediate and that's the native state.

So what's A, then? Well, if there is no other really stable state up here, this would be some sort of denatured state, and whether that's a random coil or not I don't really care about now, but that would be a denatured state. So what are B and D? Yes, B and D are transition barriers. Or rather, if the barrier is this entire part, what would that specific state up there be? That's the definition, but there is another word for it. So, Dorinis, you almost had it: if the entire part is the barrier, what do you call that specific
The transition state. So the entire part is usually called the transition barrier, and by the transition barrier we could also mean how many kilocalories per mole it is. The specific conformation there, the specific state, is usually called the transition state, which is a key concept in chemistry and everywhere else. So we have two transition states here, B and D, and I drew B fairly low here. So this would be the core collapse, if A is denatured and C is a molten globule. Why is B low, if A is some sort of denatured state and C is an intermediate, a molten globule or something? Is it reasonable that this barrier is fairly low in B? Why? Yes, but you can also relate this to what we know experimentally about how fast things happen. We know experimentally that the collapse from some sort of completely unfolded state to the molten globule, this is primarily the hydrophobic collapse, right, should happen very fast. Not necessarily instantly, but pretty close; I would even say it should happen almost immediately. So you have a very low transition state in B, possibly so low, in the ballpark of kT, that it happens almost directly. Well, so all these states would be at equilibrium after a long time, right? But equilibrium might mean that you have 99.9999999% of things here. Equilibrium just means that things aren't changing anymore. Equilibrium is boring in that sense, but equilibrium has to do with thermodynamics. Now we're talking about kinetics, and kinetics has to do with how fast things happen. Even when you are at equilibrium, you're still gonna have things going from A to C and from C to A. It's just that the relative concentrations don't change. And this is the problematic part here, because at equilibrium you only care about A, C and E. But now we're talking kinetics, and that's hard. With kinetics we have to care about D too. So what is D?
Yes, or at least, this is the transition state between the molten globule and the native state. And what does that do in particular in this process? I would argue that's possibly the most important state in the entire picture. The state D here, actually, I'll ask you. You just said that B was something, and D is very similar to B. So what type of state is this? Well, yes and no. It is quite correct that it's a transition state. It's a transition state between what? Well, this is not the unfolded state. What did we call this state around here? Well, it is the intermediate, the molten globule, right? So this is really the transition state during the main part of the folding. Is this gonna be a slow or a fast process? Much slower. And we talked about this very briefly a couple of lectures ago: when you have one barrier that's significantly higher than the others, you typically call that the rate-limiting or rate-determining step, because this other one is very quick. So the height of this barrier is going to determine how quickly you fold. And it is quite right that if this was lower we would unfold quicker. But we would also do something else quicker. We would also fold quicker, right? So that in turn is going to depend on the balance between C and E. I didn't put labels here, but what would this be? Why is the free energy very high to the left here, before A? Yes, and why does the free energy eventually become very high there? No, well, you would still have the bonds; it's not a nuclear reaction. So remember, this is the normal kind of random coil state, but eventually you would get a very high free energy because eventually you would stretch out the chain completely, right? And then you're back to only having one state.
So if you start to go very far, you're gonna go up in free energy. So this is some balance where we have lots of disorder and so many states that the free energy is relatively low. Why does the free energy go up after the folded state? Well, at some point we would start pushing atoms into each other, right? So if this is some sort of function of density or something, at some point we're gonna start having bad energetic interactions and probably even fewer states. I know that this is not entirely easy, but try to think in terms of this, because if you understand what these different states are and what they do, that's gonna serve you way better than just being able to recite long lists of what they do. Yep? No, but as I said, this barrier is going to be in the ballpark of kT, and if the barrier is in the ballpark of kT, effectively you're not going to see it, right? This is going to happen so extremely fast. First, nothing in nature is ever perfect. Even here you will always have one side chain that might need to cross another side chain or something. But again, think kT: it will happen so fast that the second this comes out of the ribosome, a couple of, well, no, a microsecond later you're collapsed. So for all intents and purposes you're right. You're never really gonna feel this as a barrier at room temperature. So that brings us to something else. I'm not sure whether you have studied this if you are in physics. Let's see if I have a pen here. Let's forget about A and B for a second. Let's live in a simple world where you have one state here and one state here, and then you have a barrier between them. Then you can ask, as a function of time: if you start with all your molecules in C, how long does it take until you start having molecules in E, and how quickly does that concentration grow?
So this is something that turns out to be called an exponential process. Forget completely about kinetics and everything: initially, if you have everything here, there are going to be lots of molecules that can move over. So the derivative of the process is going to be proportional to how many molecules you have left. But as molecules move over the barrier, you're gonna have fewer and fewer molecules here. And since the derivative was proportional to how many molecules you had, that's gonna mean that the speed drops. And if you remember your mathematics, what is the function that is its own derivative? The exponential. So let's imagine for a second that proteins were so simple and remarkably beautiful that folding was a simple process: you had some sort of state here, a gigantic barrier, think of this as a hundred kT, and then a much lower, beautiful folded state. In such a simple world, the probability, the fraction of proteins that have not yet folded, is going to go down exponentially. So that probability starts at one, then decays exponentially, and will eventually reach zero. This tau is the time constant. For now, we have no idea how fast that is. And that also means that the probability that a protein has folded is gonna be one minus that exponential. So you start out not having folded any protein, and then you gradually go up, and then eventually, as most proteins have folded, this rate is gonna start decreasing. This is true.
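As a compact restatement of the argument just given, the two-state picture can be written as a first-order rate equation. This is only a sketch: the symbols $P_u$ (fraction not yet folded) and $P_f$ (fraction folded) are notation introduced here, and the back reaction (unfolding) is assumed to be negligible.

```latex
\begin{align}
\frac{dP_u}{dt} &= -\frac{P_u(t)}{\tau}, \qquad P_u(0) = 1, \\
P_u(t) &= e^{-t/\tau}, \\
P_f(t) &= 1 - P_u(t) = 1 - e^{-t/\tau}.
\end{align}
```

The first line says that the rate of loss is proportional to what is left, and the exponential is the function that satisfies that.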
This has nothing to do with protein folding specifically. This is what you call a first-order reaction or a simple two-state transition, and if you don't know anything else about a chemical process, that's usually what you start out assuming. That leads to some profound things that, surprisingly, it took us a very long time to realize. Even though we don't know what that tau is, let's take an arbitrary protein like the one I showed here and assume that it's a two-state model. I can't do that because I haven't proved it, right? Or can I? Well, so that's kind of a problem, right? But first, never ever be afraid of simplifying and guessing. The key thing is that this is probably among the simplest proteins we can have. What we can of course do is assume that this holds and see whether it leads to reasonable things, and at some point we're eventually gonna need to compare this to experiment to see if the model is right. But if we want to understand folding, it doesn't get any simpler than this. And you could of course imagine here that this is a molten globule, this is some sort of transition state, and this is a really folded state. So let's just assume that and see what happens. In particular, there's a complicated diagram here; forget about it. Let's just look at this diagram. I should put this on a white background. The red line here is one minus an exponential as a function of time, and here this goes out to 50 microseconds. So this is the protein BBA5, which is a small designed protein. It was designed to be super stable by the Barbara Imperiali group at, I think it's Berkeley. It's a really complicated name: it's beta-beta-alpha, that's why it's called BBA. Can you imagine why it's number five? Kind of the fifth mutant
They tried, right? So they probably made mistakes at least four times before they found something that was stable and folded really fast. But this is a protein where we know experimentally that the folding time is in the ballpark of 10 microseconds. So this tau is roughly 10 microseconds, and again, that's because we can measure experimentally how much protein has folded, and that's exactly the proportion, or the probability, of being folded. So the beautiful thing is that if you could just go upstairs in the lab this afternoon and run a 50 microsecond simulation of this, you should fold it. Actually, you could argue that you don't need 50 microseconds, right? The folding time is just 10 microseconds. What would happen if you ran a 10 microsecond simulation of this protein? No, you would be there, right? So the probability would be roughly two-thirds. In one-third of the cases you still wouldn't have seen it fold, and that would be completely in agreement with experiment. And 10 microseconds, that's a pretty darn long simulation. You're running simulations in the nanosecond range. But you know what, if this is a simple first-order reaction, all we're saying is that there's just a single barrier, and whether things happen is really going to relate to whether we get enough energy to go over that barrier. So if we then start to look at very, very small numbers here, if you just plot this from zero to ten nanoseconds, the red line is virtually linear, right? So the probability of folding in ten nanoseconds is going to be like ten to the minus three. Ten microseconds gave a probability of roughly one, so a thousand times shorter simulation gives roughly a thousand times smaller probability. Now, how many of you would run a simulation if there's only one chance in a thousand that the protein would fold? That's not worth it, right?
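The numbers quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the simple first-order model with a time constant of 10 microseconds, the ballpark figure mentioned for BBA5; the function name `folded_fraction` is just a label chosen here.

```python
import math

def folded_fraction(t, tau):
    """Probability that a two-state folder has folded by time t (same units as tau)."""
    return 1.0 - math.exp(-t / tau)

TAU_US = 10.0  # assumed folding time constant in microseconds (ballpark from the lecture)

p_10us = folded_fraction(10.0, TAU_US)   # simulation as long as one time constant
p_10ns = folded_fraction(0.010, TAU_US)  # 10 ns expressed in microseconds

print(f"P(folded within 10 us) = {p_10us:.3f}")   # 0.632, roughly two-thirds
print(f"P(folded within 10 ns) = {p_10ns:.5f}")   # 0.00100, about one in a thousand
```

For times much shorter than tau the exponential is nearly linear, so the probability is approximately t / tau, which is why a thousand times shorter run gives a thousand times smaller probability.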
That's essentially zero, but it's only essentially zero. It's not exactly zero. So if you take this very small number, 0.001, and now you perform ten thousand such simulations instead, how many folding events do you expect to see? Ten. Ten thousand multiplied by one thousandth, right? So the expectation value over ten thousand runs is to see ten folding simulations. And that's essentially what we and others have done. That's where you use these things like Folding@home. So that is one where we actually had ten thousand simulations, and this is one of them that folded. The problem is of course that out of those ten thousand, nine thousand nine hundred ninety did not fold. But we can still identify the ones that do fold, because we know what the folded state is. It's a beautiful way to look at folding statistics. And what I meant when I said that this is a bit embarrassing is that this is not exactly new physics or chemistry, right? This has been known for a hundred years. It's just that we haven't really been used to thinking of protein folding that way. The book doesn't do it either. So these statistical approaches are something that came pretty much in the last 15 years, and this has been the driving force behind some very large computing projects. You know what this is? That's a chip.
That's actually a fairly old chip by now, but that's a graphics processor. Graphics processors are very different from normal computing chips, but they are insanely good at high throughput. You have this type of chip in lots of, well, game consoles. You certainly have them in modern graphics cards, and that is so not a modern graphics card anymore. So what we and others started to do at Stanford some 15 years ago was to realize that the traditional way of running a simulation, and you might have done it too, is to run in parallel or get a larger supercomputer. The only problem is that the largest supercomputer in Sweden only has in the ballpark of 60,000 to 70,000 processor cores. And then it's going to be very hard, because if you make a longer simulation you're going to need shorter and shorter wall-clock times between your steps, and it's simply a problem you can't solve. So what my colleague did in particular, the idea is to run these as screensavers over not just one or ten or a hundred computers, but over hundreds of thousands of computers, first in the US and then all over the world. And the idea is that this is how we could collect ten thousand simulations over months. Even that early project, ten thousand simulations, needed three months. That would now be one fifth of the supercomputer at KTH for three months for one project. They would never give me that much time. Actually they might, but only once. And this project is now up to roughly half a million to one million clients. That has gone down a bit for complicated reasons, but for a while the aggregate computing power we had in Folding@home was equivalent to the ten largest supercomputers in the world together. There have been some insane contributors that have built entire racks filled with GPUs and computers just to help us fold. So in one way it's a horrible way of doing it: not necessarily large and fast simulations, but a lot,
and lots and lots of very tiny simulations. And the funny thing is that this enables us to study many of these concepts we talked about, for instance this folded fraction and whether simple folding is a first-order reaction. You know what, I know what the folded state is, so I can plot this as a function of simulation time, averaged over ten thousand simulations: how many simulations have folded. And this is like zero, 0.2, 0.4, 0.6 percent. So the bad thing is that initially nothing has folded, and then eventually this rate starts to go up a bit. It fluctuates heavily. So the point here is that it's not really a first-order reaction. Had this been a first-order reaction, we should start at zero and go straight up, right, as an approximation of the exponential. And there appears to be some sort of lag: it takes at least ten nanoseconds for anything to fold, and that's probably the time it takes to actually go over the barrier. Atoms do not move infinitely fast. So a friend of order could say that, okay, it's not a first-order reaction, forget about it. But the cool thing is that apart from that lag time, it works fairly well with the linear approximation. Certainly not with 99.9% accuracy here, but it does really seem that, apart from that finite time, it's a first-order process. The slope here is in the ballpark of five microseconds, and experimentally we talk about six to seven microsecond folding times. You could of course argue that at very, very long times it's something completely different. It's just that, based on what we see, this does explain the folding rate, right? Five versus seven might not sound like a great fit, but we're talking about exponential dependencies, and we only have a 20 to 30 percent error. This corresponds to an error in the barrier of, well, remember what happens if you make an error in the ballpark of kT?
You're making roughly a factor three error, so this corresponds to having an error in the folding barrier that is much smaller than kT. So we have data, and this data appears to explain the experimental results. That doesn't mean that something completely strange or magic couldn't happen on much longer time scales, but we haven't seen that, and this would explain the experiments. You can't prove that. You could of course imagine, again, that magic happens here: after one microsecond things started to unfold and then they would refold or something, right? But that comes back to Occam's razor. This is the simplest model you can imagine, and the simplest model you can imagine, when we simulate it, does indeed lead to folding rates that correspond to the experiment. So unless we have other evidence, there is no particular reason to believe that we need some more complex model. You could of course imagine that this was not a first-order reaction at all, and then this would not fit at all. But everything we can see in the simulations points to the fact that, at least for small proteins, crossing this barrier from a molten globule to a native state is well described by a first-order model. And there are no bonuses for getting more complicated models unless you need them. The reason why I'm showing this: yes, there might be some scientists that you know that were involved in these projects, but this is not just to push our own research. What I love with these simulations is that, rather than having the book hand-waving about it, we can actually check some things. And this is something where Sarah looked at probabilities of helices and sheets.
So this is over the simulation, I think, and each cross here corresponds to a frame or something. So that is the probability of having both the helix and the sheet in a frame, and this is the probability of having the helix in that frame multiplied by the probability of having the sheet. Does this tell you anything? I'm not sure how much mathematics and statistics you've taken. They're not just proportional, right? They're even identical, because the slope is one. So when the probability of a pair of events is, not proportional, sorry, identical to the product of the probabilities of the individual events, what does that mean? They're independent. So the probability of having the helix is completely independent of the probability of having the sheet, and vice versa. The helix doesn't care if the sheet has formed, and the sheet doesn't care if the helix has formed, because if they depended on each other, this probability would be more complicated than just the product of the individual probabilities, right? No, well, yes, because the probability of helix and sheet is equal to the probability of having the helix multiplied by the probability of having the sheet, and this tells you something about the folding model. So what type of folding model is this? Mm-hmm. Right, so this is diffusion-collision. And what's beautiful is that, rather than arguing that some small proteins might undergo this, we see it. It is diffusion-collision. It's a perfect diffusion-collision model, which is a bit of a bummer in a sense, because it doesn't really seem to be representative of larger proteins. But this is a very small one. This is not the main point, but you can actually use this for studying more complicated things. One of the reasons why we first started studying this protein is that we wanted to understand what happens when a protein folds. Would a protein fold in vacuum in general?
No. So the water is kind of important, at least to have a hydrophilic environment, right? But then you could ask: when you're on your way to folding, the structure of the protein of course matters, but all your water has oriented around the protein. Is the structure of the water important? Well, it's not an entirely easy question, right? Because you could certainly imagine water forming hydrogen bonds or something. Does the hydration shell that was so important in the hydrophobic effect, does the water play any role whatsoever in protein folding? And that's again something that we can check. You can take each of these structures, rip out the solvent, and add new water in random orientations, and it turns out that the probability of continuing to fold is exactly the same as it was. So only the protein structure appears to matter for the folding, not the water at all. The water is merely a passive medium. Now, this has to do with the things that you do in simulations too, right? You can remove all the water molecules and then add in new water molecules exactly the same way you did when you set up the simulation. And of course these water molecules will adapt a little bit to the protein, but if there was any sort of special arrangement in the old solvent layers, we would have destroyed it, and this doesn't appear to affect the folding significantly. Again, this is just for one small protein.
So these are not general results, but these are things that, thus far at least, have been virtually impossible to probe experimentally. So at least for this simple protein, water does not seem to play an active structural role in protein folding. It plays a role for hydrophobicity and hydrophobic interactions. And the final thing that this enables us to do: remember the hand-waving when I said that the last step is really the water leaving the building? We can check this. This is a bit of a complicated plot, because we had lots of folding trajectories, right, and they fold at different times, so I can't just average them all. But if I have some sort of folding criterion, for instance the RMSD of the protein, I can take my different trajectories and slide them and shift the time scale, so that this is exactly the point where I consider each simulation to have folded. And then it turns out that the first thing that happens here is that the radius of gyration of the core goes down. You might actually see that it goes up here, but all these things have completely different y scales, so I've just set it so that one corresponds to the value in the folded state and zero to the value in the unfolded state. So what happens first is that the radius of gyration of the core approaches the value it should have in the folded state, and the RMSD of the atom coordinates approaches the value in the folded state. And the last thing that happens, roughly one or two or three nanoseconds after that, is really that the solvent density in the core goes down. So for this small protein, the last thing that happens in folding is that the water leaves, and when the water has left the hydrophobic core, we are really stable in the folded state. Now of course, this is just one protein. It's not universally true and everything, but that's an example of how fairly simple computer simulations can teach us a lot. Or not necessarily teach us a lot, but we can see that these simple hand-waving or
paper-and-pen arguments actually hold in practice, at least for simple proteins. Yes? Yes, so the problem here is that RMSD would be measured in angstroms, right? Radius of gyration would technically also be measured in angstroms, but those are completely different scales. RMSD might go from, say, five down to two or three or one; the radius of gyration would maybe go from 30 to, whatever, 10. So they would have completely different scales. And there would be a completely different y scale for the solvent density. This is a density; that is not something that you could even put on the same scale. So you would end up with three different super complicated plots where some things go up and some things go down. So what you typically do in these cases is say: I don't really care what the specific RMSD is, I don't care what the specific radius of gyration is. You rescale them so that the value they have at one end, unfolded, is zero, and the value they have at the other end is one, and then it only matters how quickly they go from zero to one. And had these been different plots, I bet we wouldn't have been able to say that this difference is statistically significant, but we actually could show that it was. So the RMSD here is measured in each frame: what is the value relative to the known, experimentally determined folded state? Exactly. And that's a really important question. If we had not known what the folded state was, there is no way this would work. Why? The problem is that we started with 10,000 simulations and only 10 of them folded. And how can you say which ten have folded if you don't know what the folded state is? Which is of course nice in a way, in that it enabled us to study protein folding. It's a horrible letdown in another way: we could never use this approach to fold a protein if we don't know what the folded state is. Bummer, right? Major bummer. I will come back and talk about that tomorrow.
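The rescaling described above, where each observable is mapped so that its unfolded value is 0 and its folded value is 1, can be sketched as follows. The function name `rescale` and the trace values are hypothetical and chosen only to match the rough ranges mentioned in the lecture (RMSD from about 5 to 1, radius of gyration from about 30 to 10); they are not data from the actual simulations.

```python
def rescale(values, unfolded_value, folded_value):
    """Map an observable so that the unfolded reference is 0 and the folded reference is 1."""
    span = folded_value - unfolded_value
    return [(v - unfolded_value) / span for v in values]

# Hypothetical traces along one folding trajectory (illustrative numbers only).
rmsd = [5.0, 4.0, 2.0, 1.0]                 # decreases on folding
radius_of_gyration = [30.0, 20.0, 12.0, 10.0]  # also decreases on folding

rmsd_scaled = rescale(rmsd, unfolded_value=5.0, folded_value=1.0)
rg_scaled = rescale(radius_of_gyration, unfolded_value=30.0, folded_value=10.0)

# Both now run from 0 (unfolded) to 1 (folded) and can share one y axis.
for a, b in zip(rmsd_scaled, rg_scaled):
    print(f"{a:5.2f} {b:5.2f}")
```

Because the folded reference value is an input, this step only works when the folded state is already known, which is exactly the limitation raised at the end of the paragraph.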
There are some really cool techniques. Initially, we had some hopes that we should be able to detect changes in heat capacity or something. If you remember from the book, when you undergo a phase transition there is always a change in heat capacity. Didn't work. But there are some way cooler ways to study this, and I'll come back to that later. The key thing here is that we can at least study some things about how fast things happen. At least for simple proteins, it appears that we can approximate this really complicated folding transition by having a molten globule state, then a free energy barrier, and then we go down on the other side of the free energy barrier. Now, that's not always the case. A friend of order would certainly say that this is not going to be true for larger proteins. But even for a larger protein we could hope that, to a first approximation, there is at least one barrier that's going to be larger than the other ones. So even if it's more complicated in practice, we can usually approximate it that way. And then, if we don't know exactly what the folded state is, the alternative is that, rather than doing this in simulation, we can of course try to do this experimentally, right?
Experimentally, we should just see that as a function of time, in this case with for instance fluorescence. So this is a folding time that goes from 0 to 500 milliseconds, and then you have a curve here that describes a fraction from 0 to 1. This could for instance be the fluorescence of a protein that has some fluorescence. There are a bunch of techniques that you can use. 500 milliseconds is, well, reasonably fast, but not super fast. There are a bunch of very special techniques, for instance stopped-flow kinetics, where you push in two syringes so that you mix something and a reaction starts to happen, and then you can measure the fluorescence in this test tube. You can even have a very long tube here, because if the flow continues in the tube and you measure very close to the mixer, the molecules that are flowing there would only have had time to mix for a millisecond or something. If you measure very far out, you will measure molecules that have had time to mix for maybe a hundred milliseconds. It's complicated, because these can use quite a lot of protein, since you have to keep pushing. But the whole point is that there are experimental techniques to study how things happen very quickly. The only problem is that all these techniques just measure whether things have folded. If this was an intermediate state, we might be able to measure the intermediate state, but they do not measure the transition state. And what you really would like to get, this is a protein they have in the book.
I forgot what it's called. CheY, I think, Luis Serrano. So, a complicated protein, and what we are interested in, if this is some sort of nucleation-condensation model, is whether we can identify some sort of core residues here that really are the first ones to start folding. And all these techniques are completely useless for that. They will only tell us about stable states, but what we're really after is the transition state. So rather than going home, we're gonna try to get to that. There's gonna be a tiny amount of mathematics here, but less than you think, and it's not that difficult. So the first thing: if we want to understand anything that has to do with rates, it's no longer just about equilibrium and thermodynamics. We're really gonna need to study how fast reactions happen, and we talked about that a few weeks ago, right? There are lots of fancy names for this, chemical transition state theory and everything, but this is really based on the following: you have one state A, you have another state B, and you have some sort of barrier, F# or something, between them. The rate of going from A to B is proportional to an exponential of how high this barrier is, and the rate of going from B to A has to do with the barrier from B, sorry, with the difference in free energy between the barrier peak and state B. You remember that? We have no idea what these constants k0 are for now, but we don't really care about them. That means that this is a complicated reaction, but these rate constants we can measure experimentally, if we can just measure how fast something happens. For instance in this plot, right: how fast does the actual folding happen? That we can measure.
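Written out, the two rates described above take the following form. This is a sketch in notation chosen here: $F_A$ and $F_B$ are the free energies of the two states, $F^{\#}$ is the free energy at the barrier peak, and the same prefactor $k_0$ is assumed for both directions.

```latex
\begin{align}
k_{A \to B} &= k_0 \exp\!\left(-\frac{F^{\#} - F_A}{RT}\right), \\
k_{B \to A} &= k_0 \exp\!\left(-\frac{F^{\#} - F_B}{RT}\right), \\
\frac{k_{A \to B}}{k_{B \to A}} &= \exp\!\left(-\frac{F_B - F_A}{RT}\right).
\end{align}
```

The last line shows that the barrier height cancels in the ratio of the two rates, which is why equilibrium measurements only report on the stable states and say nothing about the transition state.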
So if I measure those constants and then take the logarithm, I'm gonna find that the logarithm of k is, up to a constant, minus the difference in free energy divided by RT. So there's a 1 over T proportionality there, and if I plot the logarithm of k versus 1 over T, I should be able to get these free energies as the slope. And that's called an Arrhenius plot. Way older than protein folding; a very, very simple and famous concept in chemistry. So this is a temperature scale, 1 over temperature, which means that this is low temperature and that is high temperature, and then the logarithm of a rate constant. I bet you've seen plots like this, at least if you've taken chemistry. But you usually don't see plots with two lines in them. You usually see plots with one line in them. So just for a second, look at the red one here. What the red one says is how fast you go from the native to the unfolded state. At very high temperature this happens fast, and as the temperature goes down, it's slower and slower. That is completely normal. That's how most chemical reactions work, right? If you don't know anything about a chemical reaction and you would like it to happen faster, what can you usually do? Increase the temperature. That's always a good bet. But if you look at the blue line, what does the blue line mean? Right. Do you know any chemical reactions like that? They're not very common, right? So this is a chemical reaction that goes faster the lower the temperature is. Well, they do happen, but again, it's not common.
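To make the slope idea concrete, here is a minimal sketch of extracting an apparent barrier from an Arrhenius plot by fitting a straight line to ln k against 1/T. It assumes the barrier does not change with temperature over the fitted range, which is the ordinary straight-line case (like the red line) and not the curved behaviour of the blue one. The function name `arrhenius_fit`, the barrier of 10 kcal/mol and the prefactor are all hypothetical values used to generate synthetic data.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)

def arrhenius_fit(temps_kelvin, rates):
    """Least-squares line through (1/T, ln k); returns (apparent barrier in kcal/mol, ln k0)."""
    x = [1.0 / t for t in temps_kelvin]
    y = [math.log(k) for k in rates]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # ln k = ln k0 - barrier / (R T), so the slope equals -barrier / R.
    return -slope * R_KCAL, intercept

# Synthetic rates generated from a known, hypothetical barrier and prefactor.
TRUE_BARRIER = 10.0  # kcal/mol (assumed for illustration)
TRUE_LN_K0 = 30.0    # assumed for illustration
temps = [280.0, 290.0, 300.0, 310.0, 320.0]
rates = [math.exp(TRUE_LN_K0 - TRUE_BARRIER / (R_KCAL * t)) for t in temps]

barrier, ln_k0 = arrhenius_fit(temps, rates)
print(f"apparent barrier = {barrier:.3f} kcal/mol, ln k0 = {ln_k0:.3f}")
```

With real folding data the fitted line would be applied separately to the folding and unfolding rates, and a negative apparent barrier from the fit would correspond to a reaction that speeds up as the temperature drops.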
It's very counterintuitive. The only other problem is that these are not two independent reactions. If you go from native to unfolded, well, now you're in an unfolded state, so there is also a probability that you will go back from unfolded to native. And in particular, if you're sitting right in the middle, right at this temperature, the net effect is going to be, well, most experiments would tell you that nothing happens, because you have exactly the same rate of folding as rate of unfolding, right? And that's going to be pretty hard to measure. If you're sitting here, you're not going to measure anything; nothing in the system changes, which makes it very, very hard to measure. Exactly, things go both left and right. And it's not just that. Even if you are at one of these positions here, where the blue curve is higher than the red one, what virtually any experiment is going to measure is the total fraction of the folded state and the total fraction of the unfolded state. There are very few experiments that are able to measure these two concentrations independently. So in particular for protein folding, Arrhenius plots are complicated. It's really hard to come up with good experiments that measure only folding but not unfolding, or only unfolding but not refolding. It is possible to design them, but it's a pain, so it's going to turn out that we frequently try to avoid this. But for a second, let's assume that we could get this; in some cases we can get it. What does this give us? What can you do with folding rates? Good answer.
Good answer, so either you thought about this or you read up on it, and it doesn't really matter which. Whenever you go through something complicated, keep going back and asking: why am I doing this? If you don't know why you're doing it, there's no way you can understand what we're doing. In any type of complicated derivation, and as you're going to see on the next slide we're going to try to derive an equation, if you don't really know what you're after you're not going to get there. You can sit for a month and play around with equations, but it's not going to help. So normally when you play around with equations you need to somehow understand what we're after. What we're after here is, first, to understand these barriers. We want to understand when folding happens and what type of barriers we have at low or high temperature. So it's somehow energies we want to get to. The good news is that we can measure this k experimentally as a function of temperature, and once we can measure k we can also get the slope, and as we saw in the Arrhenius plot the slope is related to these differences in free energy. Even more than that, we also have the temperature dependence, how that changes as a function of temperature, and that really does describe when folding happens faster or slower. So this is a very common way to approach things: we have something that we can formulate with an equation, as we did on the previous slide, but we can also measure the same thing experimentally, so we can take this equation and see if we can study those things. This is not as hard as it looks. What we had in the previous plot is that we know how this k changed as a function of temperature, right? And we plotted the logarithm of it. So the slope of both those curves, red and blue, is really the derivative of that
with respect to one over temperature. And again, this is not some fancy invention of mine; in the previous plot we had the logarithm of k plotted against one over T, so literally the slope is going to be that derivative, because k was proportional to an exponential of minus delta F divided by RT, right? If I take the logarithm of that, I get some constant I don't care about, minus something that's proportional to one over T. So if I plot that against one over T, I get a straight line with the difference in free energy divided by R as the slope, up to the sign. Had I plotted against T instead, I would have gotten a curve that goes as one over T. No, not at that point. Of course, this was known by Arrhenius, but Arrhenius didn't work on protein folding. This is true for any chemical reaction in general. But we know what the expression for that constant is: k was k zero multiplied by an exponential, right? So if we take the logarithm of that, it's going to be the logarithm of the constant in front, k zero, and then the logarithm and the exponential cancel out, so we're left with the argument that we had in the exponential: minus the difference between F sharp at the barrier and F in the original state, divided by RT. And the derivative with respect to one over T: well, one over T is T raised to the power of minus one, right? So d of one over T is minus one divided by T squared, times dT. Simple mathematics, although you might need to refresh it a bit. To a first approximation this constant, k zero, doesn't really depend a whole lot on temperature. If you don't trust me there, go and look up what Svante Arrhenius, who got the Nobel Prize, wrote more than a hundred years ago, and then you get these really complicated expressions. So what is the derivative of these free energies?
With respect to temperature and everything. And you know what, if I had just seen that, if I was not teaching this course, it would probably have taken me several days to come up with. This is what I said: you need to know what we're after, right? If you don't know what you're after, you could spend a month looking at one equation after the other and you wouldn't know what to get. But the point is, if somewhere in the back of your mind you're trying to relate this to something we can measure, like an energy, then you either start playing around with equations or you start looking up what we have seen before. There were some chapters early on in the book where, just for fun, we started to look at how the free energy varies with temperature, with derivatives, and one of those equations the book came up with is this one. When the book originally brought this up, it was just a curious fact that the derivative of the free energy divided by temperature, with respect to temperature, is minus the energy divided by T squared. Really obvious result, right? Not. But suddenly you're standing here saying, you know what, we need to express this derivative in terms of an energy, and this is exactly what that derivation gives us. Again, I bet this would have taken me a week to come up with, and if you didn't know that you were looking to convert this derivative into an energy, it would have taken you a year at least, if you could ever do it. So if we assume that the prefactor does not vary with temperature, all this stuff simplifies: this derivative is roughly the energy barrier, not the free energy barrier anymore but the difference in energy between the transition state and my original state, divided by R, with a minus sign in front. So if you go back and think about the Arrhenius diagram for a second, where the derivative corresponds to the difference in energy: is the derivative positive or negative?
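Written out, the steps just described look roughly as follows. The symbols are a notational assumption: $F^{\#}$ and $E^{\#}$ stand for the free energy and energy of the transition state ("F sharp"), $F$ and $E$ for the starting state, and $k_0$ for the prefactor, which is taken as independent of temperature.

```latex
\begin{align}
k &= k_0 \exp\!\left(-\frac{F^{\#}-F}{RT}\right)
 \quad\Rightarrow\quad
 \ln k = \ln k_0 - \frac{1}{R}\,\frac{F^{\#}-F}{T} \\
\frac{d}{dT}\!\left(\frac{F}{T}\right) &= -\frac{E}{T^{2}},
 \qquad d\!\left(\frac{1}{T}\right) = -\frac{dT}{T^{2}}
 \quad\Rightarrow\quad
 \frac{d\,(F/T)}{d\,(1/T)} = E \\
\frac{d\ln k}{d(1/T)} &\approx -\frac{E^{\#}-E}{R}
 \qquad (\text{taking } k_0 \text{ as independent of } T)
\end{align}
```

So the sign of the slope in the Arrhenius plot is the opposite of the sign of the energy difference between the transition state and the starting state.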
Well, that depends on what reaction we are looking at, right? The unfolding rate grows with T, as in a normal reaction, but the folding rate drops with T. So on the one hand we know that the energy of the transition state must be higher than the energy of the native state; that has to do with the unfolding. But the fact that the folding rate drops with T goes in the other direction. That means that the energy of the transition state is lower than that of the unfolded state. So this is a bit strange: you have a completely unfolded state that has very high energy, a transition state that has an intermediate energy, and a native state that has an even lower energy. So in terms of energy there is no barrier whatsoever. It's all downhill, straight downhill. And we know that because otherwise you would not have seen this Arrhenius plot with the two different slopes. Then it turns out you can do exactly the same thing but express it in terms of entropies instead. That's a bit longer derivation; I even think the book skips it, and there's absolutely no point in doing it here. So you will just have to trust me that you can do exactly the same thing for entropy and show that the entropy is also highest in the unfolded state, drops when you go to the transition state, and drops even more when you go to the native state. And this is no longer based on hand-waving or anything. This is based on the reaction rates.
We see that in the Arrhenius plots. So for any normal reaction, if we forget about proteins for a second, for normal molecules we would expect the reaction in both directions to go faster as the temperature goes up. In that case you really would have had the transition state be special, both in terms of enthalpy and entropy. But the special thing with protein folding is that both the enthalpy and the entropy go down when we go to the native state. In particular, if you're going from the native to the denatured state, entropy-wise that's going to be great. So the only reason this does not happen instantaneously must be that there is a barrier in the other component (I should have written enthalpy there): the enthalpy is higher in the transition state. Same thing when we fold: when we're folding, we're going from the denatured to the native state, and enthalpy-wise it's downhill, so that should be good and instantaneous. It doesn't happen instantaneously, and that means there must be a barrier in just the other component, i.e. entropy. This is not hand-waving; it's based on the Arrhenius plots. This is likely a good place to take a break, but do you follow what this means for proteins? No, both happen in proteins all the time. Which one is most dominant is a very good question. We should take it.
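The enthalpy-versus-entropy argument above can be summarized in two lines, assuming the usual decomposition $F = E - TS$ and writing $U$, $\#$ and $N$ for the unfolded, transition and native states (the subscripts are notation introduced here, not taken from the slide):

```latex
\begin{align}
\Delta F^{\#}_{U\to N} &= (E^{\#}-E_U) - T\,(S^{\#}-S_U),
 & E^{\#}-E_U &< 0,\quad S^{\#}-S_U < 0 \\
\Delta F^{\#}_{N\to U} &= (E^{\#}-E_N) - T\,(S^{\#}-S_N),
 & E^{\#}-E_N &> 0,\quad S^{\#}-S_N > 0
\end{align}
```

In the folding direction the energy term is negative, so a positive barrier can only come from the $-T\,(S^{\#}-S_U)$ term: the folding barrier is entropic. In the unfolding direction the entropy term helps, so the barrier has to come from the energy term.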
We'll take a break here in a minute, but both happen all the time. Depending on the temperature one of them will dominate over the other, but at any normal temperature you will always have some protein molecules unfolding even if 99% of them are folded. So on average, at equilibrium, the total number of molecules unfolding and the total number of molecules folding is going to be the same. But you always have both processes happening because there are finite energy barriers between them. We frequently talk about protein folding, but protein folding and protein unfolding are really the same thing, right? We want to understand what it is that makes it stable in one direction and not stable in the other direction. If you just want to understand how quickly folding happens, you could argue that it's only important to look at the blue curve. But that would not help you if the unfolding was also so fast that everything you folded instantly unfolded. So for the proteins to stay folded, first we have to fold them, but then it has to be slow for them to unfold. It's not enough that they fold fast; it has to be better to stay folded. This is a great place for a break, and after the break I'll talk a little bit more about curves like this, trying to reason about what's really happening here, why this gives rise to specific barriers, and how we can use this to interpret things. The take-home message you can think more about is really that it is a balance between enthalpy and entropy, which is going to complicate some things but also solve things for us. It is 10:35; should we meet here at 11? Let's get started again. So where we were before the break is that we went through all those equations and studied the enthalpy and the entropy, right?
And if we forget about the equations and look at the shape of the curves, this is really what we arrived at. We can start at the denatured state, the transition state, and the native state, and this completely extreme random coil state we don't really care about for now. So looking at these three dashed lines, what we could do just by looking at the Arrhenius plot is show that the energy goes down: we start from the denatured state, it's lower in the transition state, and it must be even lower in the native state, simply based on the rates of folding. Similarly with entropy: it starts high, it goes down in the transition state, and it goes down even more in the folded state. And you can think of each of these two curves as two parts, right? You have one part that's roughly linear and then you have some higher-order term. The whole point is that, depending on the temperature, the linear parts will usually cancel each other. So the fact that both of them go down just means that there is no significant difference in free energy. And that means it's going to be these non-linear parts, in particular the fact that the entropy has a relatively sudden drop, that give rise to this barrier. That's why we have one type of unfolding barrier and a second type of folding barrier. The folding barrier happens here, and it is mainly entropy, while the unfolding barrier comes in the other way, and that's mainly energy, because the energy starts going up here quicker than the entropy goes up, while on the other side of the barrier the entropy goes down quicker than the enthalpy goes down. Yes, folding rates we could see before protein structures, but of course then we weren't really sure what a folded state is, right? Do we know that there's a unique physical state?
We had no idea about side chains packing or anything; it wasn't really until we got the first X-ray structures that we could talk about something being folded in the sense that it really was one unique state. Now it might seem obvious, but when Christian Anfinsen in particular started to study these cases, this was roughly at the same time that we also started to see that there really is one state. Until the point where you realize that there is one state, it is not an obvious question to ask how that state is stabilized. But there was one problem with this, and what was the problem? Why don't we start measuring this for lots of proteins? No, that's the first approximation. Well, that we're going to see, right? We're going to see whether it holds up, whether those lines are nice and beautiful and linear. The problem is that it's very hard to measure folding and unfolding rates in isolation. So the Arrhenius plots for a process as complicated as this one, in particular where we frequently exist at the halfway point where they go roughly 50/50, can be really hard to determine. This might sound like a strange title, but if you have a real experiment, what you are measuring is not the rate of folding; you're basically measuring the rate of folding minus the rate of unfolding, right? That is the apparent folding rate: how quickly do we appear to approach the equilibrium?
We start out with something that's not at equilibrium. The book spends a little bit of time going through this, and I'll just show you the equations because it's a bit of a strange concept. What we know, if you look in terms of chemistry and equilibrium constants, is that the proportion of molecules in the B state relative to the number of molecules in the A state corresponds to how quickly molecules move from A to B divided by how quickly molecules move from B to A. That in turn, because they're rate constants, is related to the difference in free energy between those two states, and the prefactor cancels out. Again, this is just based on what we had in the last few slides when we said how we express a rate constant. In particular, note that the transition state's energy cancels out, because at equilibrium this is just the probability of the molecules being in each state. You can also show that this corresponds to, at infinity when we've reached equilibrium, how many molecules are in B versus how many molecules are in A. But note that this relation between the rate constants holds even when we're not at equilibrium. So can we do something with that? Well, the first thing we are interested in is some rate, something happening as a function of time. So let's start to study how quickly, say, the number of molecules in the A state changes as a function of time. The change in the number of molecules in the A state: first, it will be reduced by the number of molecules leaving the A state, right? And the number of molecules leaving the A state is going to be the rate constant from A to B multiplied by the number of molecules
we already have in A. But we're also going to gain molecules that move from B to our A state, and that's again the number of molecules in the B state multiplied by that rate constant. A and B here, as you will probably realize in a second, are going to correspond to folded and unfolded, but for now it's just two states in a simple chemical reaction. Then there are a bunch of things you can do here. First, we don't really know what NA and NB are, right? But we know that the sum of them must be some sort of N0, where N0 is the total number of molecules. N0 as a function of time is really constant, but it's going to be NA as a function of time plus NB as a function of time, so we can always take NB and express it in terms of NA. Well, there's a constant too, but that constant is not a function of time. Ideally I would like to know what kA to B and kB to A are, but for now consider this as playing a bit with the equation to see where we can get. We also know that the number of molecules in state A at infinite time multiplied by kA to B equals the number of molecules in B at infinite time multiplied by kB to A; I'm using the equilibrium relation from before. And then, as a special case, I also know that N0 is NA at infinity plus NB at infinity. If I take all that and insert it in that equation, I get that N0 times kB to A equals NA at infinity multiplied by the sum of kA to B and kB to A. This is a lot of symbols.
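For reference, the relations just listed can be written compactly as below. The notation is an assumption chosen to match the spoken names: $k_{A\to B}$ and $k_{B\to A}$ are the two rate constants, $N_0$ the total number of molecules, and a superscript $\infty$ marks values at infinite time.

```latex
\begin{align}
\frac{dN_A}{dt} &= -k_{A\to B}\,N_A + k_{B\to A}\,N_B,
 \qquad N_0 = N_A(t) + N_B(t) \\
N_A^{\infty}\,k_{A\to B} &= N_B^{\infty}\,k_{B\to A},
 \qquad N_0 = N_A^{\infty} + N_B^{\infty} \\
N_0\,k_{B\to A} &= N_A^{\infty}\,k_{B\to A} + N_B^{\infty}\,k_{B\to A}
 = N_A^{\infty}\,\bigl(k_{A\to B} + k_{B\to A}\bigr)
\end{align}
```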
I'll show you where this leads us. We don't really care what N0 is, and we don't really care what kB to A is, but the point is that we can express N0 times kB to A as a constant multiplied by NA at infinity, where the constant is the sum of both these rate constants. That means that in this equation all the stuff we have on the right side simplifies to the sum of the two constants multiplied by something that only contains NA. So what I did here is get rid of the number of molecules in B. Now I have something that says how the number of molecules in state A varies as a function of time: this is just a big constant multiplied by something that only depends on state A. Does this look complicated? Many of these equations end up looking complicated because we have lots of things in them. But all we're really saying here is that the derivative of NA is, well, there's a minus sign, but forget about that, a constant multiplied by NA itself. So we come back to a function that is its own derivative, which is what? The exponential, right? So we integrate this, and we say that NA is, again, some constant, forget about what that constant is for now, multiplied by an exponential of that factor times time, plus another constant we don't really care about. The neat thing is that here we're only looking at one state, and this is how the number of molecules in this state changes as a function of time in total. And it turns out that the rate constant there is the sum of the forward and backward constants. So the apparent rate for how fast the total thing happens is the sum. This looks really crazy. A rate constant can't really be the sum of both the backward and forward ones, right? It is. It would be very easy to think that it's going to be a difference, but this has to do with how the two terms add up. You do not have to know this derivation.
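To check numerically that the apparent rate is the sum and not the difference, here is a small Python sketch. It integrates the two-state rate equation with plain Euler steps and compares the result with the exponential solution. The rate constants and molecule counts are arbitrary illustration values, and the function names are hypothetical.

```python
import math

def relax_two_state(n_a0, n_total, k_ab, k_ba, t_end, steps=100_000):
    """Integrate dN_A/dt = -k_ab*N_A + k_ba*N_B with simple Euler steps."""
    dt = t_end / steps
    n_a = n_a0
    for _ in range(steps):
        n_b = n_total - n_a              # N_B follows from conservation
        n_a += dt * (-k_ab * n_a + k_ba * n_b)
    return n_a

def exponential_solution(n_a0, n_total, k_ab, k_ba, t, k_apparent):
    """N_A(t) = N_A(inf) + (N_A(0) - N_A(inf)) * exp(-k_apparent * t)."""
    n_a_inf = n_total * k_ba / (k_ab + k_ba)
    return n_a_inf + (n_a0 - n_a_inf) * math.exp(-k_apparent * t)

# Assumed illustration values: everything starts in state A.
k_ab, k_ba = 3.0, 1.0
n_total, n_a0, t = 1000.0, 1000.0, 0.5

numeric = relax_two_state(n_a0, n_total, k_ab, k_ba, t)
with_sum = exponential_solution(n_a0, n_total, k_ab, k_ba, t, k_ab + k_ba)
with_difference = exponential_solution(n_a0, n_total, k_ab, k_ba, t, k_ab - k_ba)

print(numeric, with_sum, with_difference)
```

The numerical result should track the curve that uses the sum of the two constants and sit far from the one that uses the difference.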
It's not advanced mathematics. If you want to, I think the book goes through it in even more detail. The key thing is that we can express some sort of apparent constant as the sum of these two. The neat thing with this apparent constant is that it does not depend on just going forward or just going backward. For a complicated reaction that goes both ways, this apparent constant determines how quickly the amount of, for instance, either the folded or the unfolded state changes. So now we don't have to measure folding separately from unfolding. I can do any of my fluorescence experiments on how much folded protein I have as a function of time, or conversely how much unfolded protein I have as a function of time. The second you can measure that experimentally with any technique, we know what this apparent folding rate is, because we see how quickly that changes as a function of time. There are like a dozen different experiments we can do to measure this. Fluorescence is a simple one. Secondary structure content might be another one, although that's probably harder. Any time you can do something with stopped flow or measure something with a laser, you can get time resolution in these things down to microseconds or shorter. And this leads to plots that a couple of you probably recognize: chevron plots. How many of you have heard about them? Some of you, I think, from biophysics. This is not as complicated as it looks. Or rather, in a way this is more complicated than those very simple Arrhenius plots, but they're way easier to measure. So we have something that we're changing, in this case the concentration of a denaturant, say guanidinium hydrochloride, and this is my observed rate.
How quickly is the reaction happening? So here we have something that measures, say, folding, and here we have the process measuring unfolding. The whole point is that I'm just measuring how quickly a reaction happens; I don't really care whether it's the forward or backward reaction. So when we're measuring how quickly we're unfolding something, I can keep adding denaturant and measure how quickly, say, the fluorescence decays. The more denaturant I have, the faster I'm going to unfold, so the speed of this reaction goes up. But if we do the opposite instead and I start diluting this solution, more and more and more, then I'm going to fold faster and faster, and I'm going to see that we get more and more, say, fluorescence. So the whole complication, or trick, here is really just that we plot these in the same plot. What then happens is that you get one very clear line here, and the reason this line is so clear is that it's a logarithmic scale. So this is 10 to the power of 3, and the unfolding is 10 to the power of minus 5. If you add or subtract 10 to the minus 5 from 10 to the power of 1 or so, that point is not going to change. It's the same thing up here: when one process is much faster than the other, the slow one is not going to influence it. But right here at the crossover point you have some molecules that fold and some molecules that unfold, so the total aggregated rate here is going to be slightly higher. Here you really see that it's a sum of the two constants. The only reason for doing this is that it's easier to measure. It's harder to understand than the Arrhenius plot, but understanding is something you only have to do once. Once you've understood it, you can just keep measuring this in the lab and get a factor of a thousand more data, I think, than for the Arrhenius plots. Yes? Yes, completely different axes, so here
So the problem with the Arrhenius plots was that I had to measure things as a function of temperature, because I was measuring one reaction in particular. Here I just pick a completely arbitrary concentration, say six molar guanidinium hydrochloride, and then I measure how quickly what is in that case primarily an unfolding reaction happens. I just measure the speed of the reaction; I don't care what reaction it is, whether it's going backward or forward. The whole point is that this is easier to get at in the lab. For now, you don't know what this curve tells you, or at least I wouldn't. Obviously you have something that tells you that there is some reaction happening, and here you're very much going in the folding direction and there you're very much going in the unfolding direction, but it's not obvious how to use this, right? I don't expect it to be, at least. These are called chevron plots because of the characteristic pattern on military jackets, right? They're strange the first time you see them. And secondly, you will never ever see a plot like that; you're going to see a diagram with a dozen plots in it, and we'll see in a second why a dozen plots. So remind me again: why are we doing this and what are we really after? Yes, so we started with the Arrhenius plots, and that helped us to understand how high the energy is versus the entropy. We could see that the energy was going down but the entropy was also going down, and we still haven't really been able to capture the transition state. In particular, we haven't found a good way to measure the transition state in the lab. This will be just a little bit complicated, but the beauty of these plots is that they help us capture the transition state. So why can't you just capture the transition state directly and measure it? Right, the molecule will hardly spend any time in the transition state, right?
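The chevron shape can be reproduced with a few lines of Python. This is only a sketch under a common simplifying assumption that is not stated in the lecture: that the logarithm of each rate constant changes linearly with denaturant concentration. The end-point rates of 10 to the 3 and 10 to the minus 5 are the ones read off the slide; the slopes and function names are made up for illustration.

```python
import math

LN10 = math.log(10.0)

def folding_rate(denaturant_molar, k_f0=1.0e3, m_f=LN10):
    """Folding rate constant; assumed to drop tenfold per molar of denaturant."""
    return k_f0 * math.exp(-m_f * denaturant_molar)

def unfolding_rate(denaturant_molar, k_u0=1.0e-5, m_u=LN10):
    """Unfolding rate constant; assumed to rise tenfold per molar of denaturant."""
    return k_u0 * math.exp(m_u * denaturant_molar)

def observed_rate(denaturant_molar):
    """What the experiment sees: the sum of the two rate constants."""
    return folding_rate(denaturant_molar) + unfolding_rate(denaturant_molar)

# Chevron plot data: log10 of the observed rate from 0 M to 8 M denaturant.
concentrations = [0.5 * i for i in range(17)]
chevron = [(c, math.log10(observed_rate(c))) for c in concentrations]

# The bottom of the V is where folding and unfolding are equally fast.
midpoint = min(chevron, key=lambda point: point[1])[0]

for c, log_k in chevron:
    print(f"{c:4.1f} M  log10(k_obs) = {log_k:6.2f}")
```

Far from the midpoint the observed rate is essentially the faster of the two constants, which is why the two arms look like clean straight lines on the logarithmic scale; only near the crossover does the sum differ visibly from either one.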
So I can never measure a property of the transition state directly. Let's have a look at this. Or rather, you know what, let's go back to this. Assume that you pick your protein X, whatever X is, and you measure this curve: how quickly does protein X fold or unfold as a function of the concentration of denaturant? This is the wild type of the protein; you get one curve like this for the protein. But we are interested in studying this transition state, in particular whether we can find the five, six, maybe ten residues that are really part of the transition state. To some extent the folding part here measures how quickly we get over the transition barrier, the entropy part, right, when we fold, and the unfolding part measures how quickly you go from the native state to the denatured state. So these branches measure how easy it is to fold versus unfold. In general, if we change something in the protein, for instance if I mutate a leucine into an alanine, the position of these branches will move around a bit. It might be easier to fold or it might be easier to unfold. Typically, fluorescence is one great example. You can use anything you can measure that is somehow a simple property, because I don't really want to know exactly what the structure is; I just want to know that there is some sort of structure. Anything that's functionally related to the protein will work fine, too. In most cases you don't need the structure itself to be time resolved; I just need to measure how quickly it folds. Yes? So this is a picture taken from the book. Green here is the wild type, and then let's assume that we just make a mutant, and that's the red curve. You see the curve has moved a bit both in x and y. So what we really want to do is say, for a given residue, whether this residue is part of the transition state or not.
And the only way I can do that is that I'm going to need to see if it changes things, right? What we have on the y scale here is the logarithm of this rate constant. And we know that, well, it's not equal, there should be a proportionality sign there, the rate constant has to do with the free energy differences divided by RT. So if something happens faster, it's because the free energy barrier is lower, right? So just for argument's sake, let's say that I mutate residue number 29 and then realize that when I mutate residue 29 the folding process goes much faster. So that's my question to you: what can we say about that? The problem is that we can't necessarily say a whole lot about that. Instinctively it feels obvious, right? If the folding process goes faster, I've stabilized something, and that should be awesome; that means that this must be part of the transition state. But I might just have changed something that was also stabilizing the entire folded state, or destabilized the unfolded state. So just because something is better for the protein, that's not proof that it was part of the transition state; it might just have stabilized everything instead. So what I somehow want to capture is how much we affect the unfolded-to-native transition versus how much it affects the other part, the native-to-unfolded transition, and it turns out that we can get both of these. Sorry? Yes. And this is the first equation I'm going to throw up. If we are looking at the unfolded-to-native transition, that corresponds to these two slopes, right? I'm starting in the green, and when I mutate I move to the red, so this curve has moved down a bit. So first I'm looking at what the barrier is. The barrier corresponds to the free energy in the transition state minus the free energy in the unfolded state, right?
But that barrier is of course itself a difference, and now I want to check how much this barrier changes. Does it go up or down? And I'm going to argue that that's proportional to RT times the change in the logarithm, and the logarithm again is what we had on the y scale here. Well, you're not used to seeing it in this form, right, but take the exponential of both sides. What I'm saying here is that the rate constant from unfolded to native is proportional to an exponential of minus the barrier divided by RT, right? And that's just the definition of a rate constant. So this is not a complicated equation. We just took the rate constant definition and solved it for the free energy. The beautiful thing here is that R is just a constant, the temperature we know, and the shift, the delta here, is the difference in the logarithm. Well, the logarithm we already have on the y scale, so at this particular concentration, exactly how much did I move this curve down? That tells you how much the barrier from the unfolded to the native state changed. That's pretty awesome, right? It's by no means obvious. You have a really complicated plot that you're getting from an experiment where we're just measuring rate constants, and that plot tells you how much the free energy barrier from the unfolded to the native state changed. I can't say how high the barrier is; saying that is much harder because that's an absolute number. But the difference I can say. The reason why I can't say how high it is in absolute terms is that I would have this prefactor, right, and I don't know what the prefactor in the rate equation is. But in the difference the prefactor cancels out. So if I mutate my leucine into an alanine, directly from this plot I can say how much that changes the barrier. But the only problem here is this: say I reduce this barrier.
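In symbols, the step of solving the rate expression for the barrier looks like this. The notation is assumed: $k_{U\to N}$ is the folding rate constant read off the chevron plot at a fixed denaturant concentration, $F^{\#}$ and $F_U$ are the free energies of the transition state and the unfolded state, and "wt" and "mut" label the wild type and the mutant.

```latex
\begin{align}
k_{U\to N} &\propto \exp\!\left(-\frac{F^{\#}-F_U}{RT}\right)
 \quad\Rightarrow\quad
 F^{\#}-F_U = -RT\,\ln k_{U\to N} + \text{const} \\
\Delta\!\left(F^{\#}-F_U\right) &= -RT\,\Delta \ln k_{U\to N}
 = -RT\left[\ln k_{U\to N}^{\text{mut}} - \ln k_{U\to N}^{\text{wt}}\right]
\end{align}
```

The unknown prefactor sits in the constant, which is the same for wild type and mutant and so drops out of the difference. A curve that moves down (slower folding) means the barrier went up.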
There are two things that can have happened: either I dropped the free energy in the transition state, or I increased the free energy in the unfolded state. Which is it? Well, the folded state doesn't even enter here, right? We only have the unfolded state and the transition state. So if this number dropped, is it because the free energy in the transition state went down or because the free energy in the unfolded state went up? Exactly, we have absolutely no idea. And that's of course the problem, right? It's a beautiful definition that is completely useless on its own. It would of course be great to say that if I make a change here and this barrier dropped, the free energy of the transition state dropped, so I stabilized my transition state, and then by definition this residue must have been part of the transition state. But I can't say that. It might just have been that my mutation destabilized the unfolded state instead, and then this residue absolutely wasn't part of the transition state. So it's not enough. What else do we need? We need to compare this with the folded state, right? To make sure that this was not just an effect of changing the unfolded state, I also need to look at the difference between the native state and the unfolded state, because if that did not change, then native versus unfolded was constant, and in that case it was just the transition state I affected. But this is pretty much the same thing. So now I have native versus unfolded, but I can use that equation again.
If I go from unfolded to native, that is like going from unfolded to the transition state and then from the transition state to native, right? So I just end up with two equations like that, but the second one of them will have a minus sign. First I go from the unfolded state to the transition state; that's the same as before. But then I go from the transition state to native, and that's like minus going from native to the transition state, right? So I end up with exactly the same thing, but I get two terms like that. And if you know your logarithm laws you can even take those two logarithms and put them inside one expression: the difference between two logarithms is the logarithm of the quotient. So the change in stability of the folded state relative to the unfolded state is again a constant times the change in the logarithm of the ratio between these two rate constants. And since I'm already plotting the logarithm, it's actually just the difference between these two lines. Here we talk about native versus unfolded, so now I take the shift in that curve minus the shift in that curve. So rather than just having a small difference, I end up with a larger one here. You're going to need to think about this for a couple of minutes, I'm sure. Sorry, we use that a lot. Well, how on earth are we going to use this? The whole trick is going to be to compare these two, and I'm going to throw out a definition here. We introduce something called phi: we take the first one, the change in the barrier from unfolded to native, and divide it by the second one, the change in the stability of the folded state. So let's see what this means. If we don't change the transition state at all, sorry, if phi is roughly zero, that means that the change in the difference between the transition state and the unfolded state is virtually zero, right?
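Written out, the relations described in words above look as follows. This is only a sketch: it assumes a simple two-state rate law with an unknown pre-factor $A$ that the mutation does not change, and the symbols $k_f$, $k_u$ (folding and unfolding rate constants) and the labels wt/mut (wild type and mutant) are notation chosen here, not taken from the slides.

```latex
\begin{align}
k_f = A \exp\!\left(-\frac{\Delta G_{U\to\ddagger}}{RT}\right)
  \quad&\Longrightarrow\quad
  \Delta\Delta G_{U\to\ddagger}
  = -RT \ln\frac{k_f^{\mathrm{mut}}}{k_f^{\mathrm{wt}}} \\
\Delta\Delta G_{U\to N}
  = \Delta\Delta G_{U\to\ddagger} - \Delta\Delta G_{N\to\ddagger}
  &= -RT \ln\frac{k_f^{\mathrm{mut}}/k_u^{\mathrm{mut}}}
                 {k_f^{\mathrm{wt}}/k_u^{\mathrm{wt}}} \\
\phi &= \frac{\Delta\Delta G_{U\to\ddagger}}{\Delta\Delta G_{U\to N}}
\end{align}
```

The pre-factor $A$ drops out of every line because only ratios of rate constants appear, which is why the differences are measurable even though the absolute barrier height is not.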
That could of course mean that I changed both the transition state and the unfolded state equally, but that's fine, because if I've done that I haven't really destabilized or stabilized the transition state relative to the unfolded state. So when phi is zero, whatever mutation I introduced does not change the transition state. On the other hand, if phi starts to be very high, sorry, I should say it can at most be about one, if this term is roughly the same as the second term, what does that mean? Well, that means that the entire difference I introduced here was really a difference that shows up already in the transition state. So this particular residue is going to be right in the middle of the transition state. By definition, the transition state is when I start having the first few residues forming a core of the protein. If the entire difference in free energy I'm getting shows up already in the transition state, this residue is one hundred percent in the transition state. If phi is 0.5, so fifty percent, then whatever residue you had, fifty percent of the difference shows up already in the transition state and the rest only in the fully folded state. So this is a way that I can start to say something for each residue.
I need to do a mutation, but once I've done a mutation I can ask, for instance for my residue 29: is residue 29 really part of the transition state? This is really cool, because we're measuring something you can't measure. There is no way we can access the transition state directly, but by comparing the forward and the backward barriers with these measurements I can actually come up with this number. Alan Fersht, who introduced this, has probably had like a hundred students who spent their entire PhD doing curves like this. Alan is famous for a lot of things, but in particular for a very small protein called barnase. It's a bacterial ribonuclease, and that's where the name comes from, but forget about that; that's not really relevant. This is a small, fast-folding protein. It's very stable, and it's actually very famous for binding an inhibitor called barstar, and this protein-protein interaction is remarkably stable. So it's one of these, not toy proteins, but very, very common proteins for doing simple structural studies on. What Alan Fersht did was introduce this so-called folding transition state analysis. Nowadays it's just called phi-value analysis, because it's so extremely common. With phi-value analysis, Alan's team has basically taken every single residue in these proteins and mutated it to probably every other residue you can imagine, to try to determine: that residue right there in the middle, is this one part of the transition state? Is that one part of the transition state? Is that one part of the transition state? Why do we do this? I'll come back to that; that's always the best question to ask yourself when there are difficult things. So we want to study these folding models, right?
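The per-residue bookkeeping described above can be sketched in a few lines of Python. This is a minimal sketch, not the procedure from any particular paper: it assumes two-state kinetics with a pre-factor that the mutation leaves unchanged, the function names are made up for illustration, the rate constants in the usage example are invented numbers, and the `min_ddg` cutoff is a hypothetical guard against dividing by a stability change that is too small to be meaningful.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol K)


def ddg_barrier(k_wt, k_mut, temperature=298.15):
    """Change in a free energy barrier (kJ/mol) from two rate constants.

    Uses ddG = -RT ln(k_mut / k_wt); the unknown pre-factor cancels
    because only the ratio of the rate constants enters.
    """
    return -R * temperature * math.log(k_mut / k_wt)


def phi_value(kf_wt, ku_wt, kf_mut, ku_mut, temperature=298.15, min_ddg=1e-9):
    """Phi = ddG(unfolded -> transition state) / ddG(unfolded -> native).

    kf_* are folding rate constants, ku_* unfolding rate constants,
    all measured at the same denaturant concentration.
    min_ddg is a hypothetical cutoff (kJ/mol) chosen for this sketch.
    """
    ddg_u_ts = ddg_barrier(kf_wt, kf_mut, temperature)  # unfolded -> TS
    ddg_n_ts = ddg_barrier(ku_wt, ku_mut, temperature)  # native -> TS
    ddg_u_n = ddg_u_ts - ddg_n_ts                       # unfolded -> native
    if abs(ddg_u_n) < min_ddg:
        raise ValueError("stability change too small to define phi")
    return ddg_u_ts / ddg_u_n


if __name__ == "__main__":
    # Invented numbers: the mutation slows folding tenfold and leaves
    # unfolding untouched, so the whole effect is already in the barrier.
    print(phi_value(kf_wt=100.0, ku_wt=0.01, kf_mut=10.0, ku_mut=0.01))
```

A mutation that only speeds up unfolding gives phi close to zero with the same function, which is the other limiting case discussed above.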
And remember, both on Thursday last week and this week I've kind of argued that this nucleation-condensation model that we talked about is going to turn out to be the one that describes proteins best. There are certainly some good hand-waving arguments for it, and even some physical arguments that I'm going to go through shortly. It has turned out to be good in a hand-waving sense, but we haven't really shown it. Remember, even the simulations I showed you before the break appeared to be more diffusion-collision, right? But if the nucleation-condensation model is true, we should be able to identify some sort of nucleus that forms sooner than the rest of the protein, and that's going to be the transition state. So the idea of going after all this is that we really want to see: is there a nucleus? One possible outcome is that you take a protein and mutate every residue, and the phi value of every single residue is pretty much zero. What would that mean? Right. For this BBA5 protein that we worked with, if it's a simple diffusion-collision, first-order model, I would guess that all the phi values are going to be zero or very close to zero, so there is no real transition state that we have to stabilize. On the other hand, you might find five or ten residues in a center. And remember, you could do this without actually knowing what your structure is, right?
You could in principle do this already on the sequence level. If you have a large protein it's going to be hard to identify which residues to start mutating, but you can do it. So even before you know the structure you can say which residues should be part of some sort of folding nucleus. In the cases where we see proteins where some of these phi values are much higher than others, this is a very strong indication that this is a protein where you have a couple of key residues that need to be the pioneering residues that form the first folded core of the protein. I wouldn't mention this, of course, unless it worked, and there are some really cool things. These are Alan Fersht's studies. They have been amazingly important, and people have done simulations on them too. But the really cool part is that we can start using this to ask questions that are pretty hard. Remember that we spoke about beta-barrel membrane proteins? Or rather, I hardly spoke about them, right, because I said that, well, there are some beta-sheet membrane proteins, but we kind of forget about them because they're so rare. The only problem is that we know that they don't fold through the translocon; somehow they have to be inserted directly. So this is a paper from a few years ago, I think it was 2009 or so, where a bunch of researchers used phi-value analysis on outer membrane pore proteins. So this is a beta sheet with one, two, three, four, five, six, seven, eight, nine strands, I think. And then they have started taking this protein, residue one, two, three, four. Do you see how many mutants there are? There are mutants everywhere. And then you start measuring. In this case you have some sort of concentration of urea here, and here you have some absorbance or something. So here you don't see the, oh, well, sorry, here.
You have the chevron plots. This is because you are measuring a wavelength of maximum absorption or something, and then you translate that into the chevron plots. And then you're going to have a ton of different chevron plots for all the different mutants; this is just a small snapshot. But in this case you know roughly what the structure should be, so each and every one of these mutants we can also map onto the protein, and then we can start to define what the folding nucleus is. So here you can color things by the phi value. Yellow here means that they stabilize the transition state, and red would actually mean that they destabilize the transition state. I'm not going to go into what that means right now. But it turns out that there are a couple of key interactions, a couple of beta-sheet interactions, that have to form first. This hydrogen bond is important, these hydrogen bonds are important, and that really helps this sheet. So you start as something like a coil, you gradually move into the membrane, you start to form a few key interactions, and eventually you fold into the membrane. Yes? Oh, that's a good question. In principle RNA folding is quite different from protein folding. In many cases it's actually easier to predict the structure of RNA than of proteins. On the one hand RNA is much more floppy and flexible, but RNA structure is usually local structure, so local secondary structure is really critical for RNA, and there are far fewer long-range interactions. So in principle you can do it. I don't think I've ever seen it. Let's see. People have certainly simulated RNA and RNA folding. I'm not sure whether there is a whole lot of experimental work where people have looked at how fast RNA folds. It's an interesting question.
We can look it up. But the main point here is that phi-value analysis, just studying these transition states, enables us to understand some really complicated problems. There is no way we can study this in a computer simulation; it's too slow. There is no way we can study this directly in an experiment, because we're never going to be able to capture these horrible transition states. And the only thing we know is that these proteins don't insert through the translocon. So most of our understanding of beta-barrel membrane protein folding, of how these insert, is really based on phi-value analysis, a super simple biophysical technique. The only problem is the amount of work. I remember once hearing Alan present at a winter school. He's an amazing scientist, and there's lots of really fun stuff. And then you see this plot, and it's completely full of points, and I think he was talking about a recent PhD student, but then you start realizing that that plot probably contained 20 PhD theses' worth of work. So there are some PhD projects that were simply about adding another 10 points to the plot. But that is of course what it occasionally takes to solve a very large research problem. Alan could certainly be a Nobel Prize candidate. These inventions might start to be a bit too old, in the sense that they now seem obvious, but it's a remarkable advance in biophysics. But we have closer examples: Mikael Oliveberg. Some of you have taken courses with him. Mikael and his group work on protein misfolding, and Mikael is very interested in what it is that causes proteins to fold or misfold, and the amount of charges on the surface.
I'm not going to have time to go into the details of his work, but what they have done is take these simple proteins and then start doing phi-value analysis. In some cases you find really beautiful folding nuclei. Blue here means that the phi value is virtually one. And it gets even cooler; you might not even be able to see the difference here. So this is a beta-sheet protein, but to really prove this folding nucleus you can do a circular permutation of it. Do you see here that the, no, you don't see it here, but the N- and C-termini here are very close to each other. But it's certainly not the case that you start folding at the N-terminus and finish at the C-terminus. You start folding in the middle. So what happens now if you do a circular permutation of all the residues, so that you connect the N- and C-termini to each other and make the cut up here instead? With the N- and C-termini here, it's a completely different protein sequence, right? You take the first half of the sequence and put that at the end of the protein. And it folds to the same structure, and we can show that again with phi-value analysis. And again, this is just one paper. Each and every point here is where you've measured a rate constant as a function of denaturant, and every curve is a new mutant. So there's an insane amount of work behind these studies. But it's cool, because it enables us to see things you can't see. I think that's all I'm going to say about phi-value analysis. You don't need to be able to re-derive those equations, but you should understand what this means in terms of stabilizing the transition state versus stabilizing the folded versus the unfolded state. Not because you're necessarily going to become experts on phi-value analysis, but because if you understand that, you've understood a whole lot about the folding and unfolding processes. So with that you should start to be in pretty good shape. In
principle. Well, in principle we don't understand anything, right? Because remember what I said last time: we still have this problem of Anfinsen versus Levinthal. Is it that the native state is the lowest free energy state, or is the native state the fastest-folding one? So all I did since last Thursday was really to say: you know what, forget about low free energy for a second, let's look entirely at kinetics. And I cheated, and none of you found out. What I'm going to argue is that this could really be both. You will not be able to have a really fast-folding protein unless the native state has a very low free energy. And you will not be able to have a native state that is a very clean lowest free energy state that you actually can reach, because if you can't reach it, it's a philosophical question whether it's a protein, unless it's a fast-folding one. So there are some indications, both experimentally and theoretically actually, that stable structures will virtually always, not always but virtually always, lead to nice rapid pathways. And this is really how Levinthal and Anfinsen get reconciled. They don't say the same things, but in practice they're going to talk about the same structures. Stable proteins have low free energy, virtually always the lowest free energy, the global free energy minimum, and that also means that they are the fast-folding proteins. Can we solve that? Well, we still have Levinthal's problem, right: if we don't know anything, folding should take far too long. And we know what this time is. Now we talk about time rather than rate constants, but they're equivalent; it's just a minus sign in the exponent. So the time goes up exponentially with the folding barrier, right? So what if you should formulate Levinthal in another way? What is Levinthal saying? Simpler than that: rather than saying something about time, what does Levinthal say about delta F? No. So if delta F was large, what would that mean?
If delta F were large for real proteins, the barrier would be high and real proteins would not fold, and that's obviously not the case, right? That's what he pointed out. Levinthal pointed out that in practice they do fold quickly. So what does Levinthal actually say about delta F? We don't know exactly why it's smaller than we would expect, but delta F sharp is significantly smaller than we would expect from searching all states. So for some reason we're going to need to find a way to explain why delta F is relatively low. From the top of the peak to either side? Well, that depends on what we're talking about. Levinthal primarily talked about protein folding, right? It actually turns out that protein unfolding can be quite quick. So in this case delta F sharp would be from the transition barrier down to the unfolded state, when we talk about folding. Sorry, I should probably have drawn that. And you also know what F is: F is this balance between enthalpy and entropy, right? So somehow we're going to need to explain that the two terms go down at roughly the same pace, because if one of them went down significantly quicker than the other, you would end up with a large difference, and that can't happen. The only way this can be small is if enthalpy and entropy drop roughly at the same time. Which they kind of do; but if they dropped at exactly the same rate, the difference would be zero, and it's not exactly zero. So what Anfinsen said, and what we kind of assumed when we first studied this, is that first you have the chain collapsing, and then, when the chain has collapsed, you start forming all this side-chain packing. That's what I said even at the beginning of this lecture, right?
That can't be true. Because if that were true, we would get a gigantic penalty from the loss of entropy on chain collapse, but we would not start gaining any energy until we packed it. So somehow we're going to need the chain collapsing, but we also need to start forming these contacts at the same time. Otherwise it would be too costly. Do you see the analogy with nucleation-condensation? We're going to need to start forming some good contacts early on, or it's going to be too costly. And this is the concept of folding funnels that both Levinthal and others started with: there has to be some sort of pathway. I think of this as a road in the valleys between the mountains. There has to be some pathway that is reasonably guided, where I don't pay a gigantic penalty in free energy, because otherwise it could never happen. All of you could crack this now. It's not hard mathematics, but it's one of those things that require you to think a little bit. If both energy and entropy drop, well, we're not going to be able to get further unless we start looking at how this happens. But how can we know that this is right? I keep talking about nucleation-condensation, but that's just something that I pulled out of my back pocket. How do we know that's true? Well, that doesn't tell you that nucleation-condensation is correct, right? So no, I can't really do that. First, nucleation-condensation is a model. There's nothing that says this model is right. But remember one of the properties I mentioned about free energies: in general, there might be multiple pathways a process can take. And if there are different free energy barriers, which one is going to be the dominant one?
The one with the lowest barrier. So all I need to come up with to explain this is to show that there is at least one possible way of folding where we would have a relatively low barrier. That doesn't mean that it's the lowest one, but if I have shown that there's one possibility, I've shown that the barrier doesn't always have to be high. There might be something even better. It's going to turn out that this matches experiments relatively well, but all I want to show now is that nucleation-condensation provides one way of explaining why these barriers do not get very high. So that is necessary for the model to be right, but not sufficient to prove it. So remember what nucleation-condensation did, and sorry, these pictures are a bit faint. You start by somehow forming a very small core here, and then we gradually add more residues to the outside of this core. Once you start having a fairly large volume here, each amino acid residue inside this core is going to have good interactions. Contacts or interactions or whatever you call them; think of a hydrogen bond or a hydrophobic interaction. The number of those is going to be roughly proportional to the number of residues in the native state, that is, the volume we have in here, and that volume is roughly proportional to the radius cubed. But when we form the very first contacts, we don't really have any volume yet. If I add something on the surface here, it's only the part facing inwards that's going to be good; every surface residue will also have something that faces the outside. So initially the number of interactions only goes up roughly as the area, which corresponds to r squared here. This is hand-waving, mind you. Well, the book hand-waves too. And that, I would argue, means that for the energy of these interactions there's going to be some sort of term proportional to the number of residues in the native
state, and then a second term corresponding to the number of residues in the native state raised to the power of two thirds. Do you agree with that? So the energy will go down; the blue line, to first approximation, goes down proportionally. But initially we're not going to gain that much, because we don't really have any interaction partners yet. It's not until you actually start to form a slightly larger nucleus that the energy starts going down a bit. You can make exactly the same argument about entropy. I can show you that here. Again, to first approximation the entropy will drop as the number of residues that are bound in the native state, because those residues can't move anymore. But on the other hand, for the first few residues that you bind, we will lose even more entropy initially, when we start forming something, because we don't really have any volume yet, but I have to start by freezing a couple of residues. And you can actually show that this second term is going to be proportional to the area. So both for the energy, or enthalpy, and for the entropy, the main term is proportional to the number of residues, and then there's a second term that's proportional to the number of residues raised to two thirds. Now, this will of course depend on temperature, because here I have just delta S, not T times delta S. But if we then take the difference between these two curves at temperatures around room temperature, the blue lines will roughly cancel each other. So the only effect remaining is going to be that red line minus that red line, and that is just going to give you a small free energy barrier. Well, that didn't really help a whole lot, did it? I spent all these slides, you sat through two and a half hours today, and the best I could come up with is that rather than saying that
this is an exponential raised to the number of residues, as we said already eight lectures ago, this is now an exponential raised to the number of residues to the power of two thirds. Tiny difference. So let's go back for a second. Why did I originally say that it was an exponential raised to the number of residues? Well, that has to do with Levinthal's paradox, right? If each residue has, say, two or three conformations, it doesn't matter what the constant is, the total number of possible conformations is then two or three raised to the number of residues. So we have N in the exponent. All this gives us is that we still have some factor, I don't care what the factor is, and now we have an exponential with N raised to the power of two thirds. Do you remember that slide I showed you about large numbers and small numbers? Have a look at it again. It's very hard to grasp large and small numbers. If you put an N here, this is going to be the difference between raising something to the power of 60 versus raising it to the power of 100. That's an insane difference. You're talking about seconds instead of ten to the power of something years. So just because it looks almost the same doesn't mean it is; it's a gigantic difference. And this explains Levinthal's paradox. The second we get N to the power of two thirds there, it's not a problem anymore. We've solved it. This doesn't mean that it's true. All I'm saying is that at least with one of these folding models it is possible to come up with, sorry, it is possible to come up with a gradual drop in both enthalpy and entropy at the same time.
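To get a feeling for how different an exponent of N and an exponent of N to the two thirds really are, a few lines of Python are enough. This is a rough sketch under stated assumptions: three conformations per residue for the exhaustive search (the "two or three" mentioned above), a coefficient of 1 in front of N^(2/3) for the nucleation estimate, and no absolute time scale at all. The function names are made up for this illustration, and everything is kept as base-10 logarithms so that the numbers stay printable.

```python
import math


def log10_levinthal_states(n_residues, states_per_residue=3):
    """log10 of the number of conformations in an exhaustive search,
    i.e. log10(states_per_residue ** n_residues)."""
    return n_residues * math.log10(states_per_residue)


def log10_nucleation_factor(n_residues, coefficient=1.0):
    """log10 of exp(coefficient * N**(2/3)), the barrier factor in the
    nucleation-style estimate. coefficient=1.0 is an assumption."""
    return coefficient * n_residues ** (2.0 / 3.0) / math.log(10.0)


if __name__ == "__main__":
    for n in (50, 100, 200):
        full_search = log10_levinthal_states(n)
        nucleation = log10_nucleation_factor(n)
        print(f"N={n:4d}  exhaustive search ~ 10^{full_search:.1f}   "
              f"nucleation ~ 10^{nucleation:.1f}")
```

Multiplying either factor by whatever microscopic time step one believes in turns it into a folding time; the point of the sketch is only the gap between the two exponents, not the absolute numbers.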
So they cancel each other out. So these free energy barriers will be proportional to N, the number of residues, raised to the power of two thirds. Sorry, N is raised to two thirds; it's not multiplied by two thirds. Have a little fun and put some numbers into this tonight, just to get an idea. Look at the exponential with a calculator. So how do we know that this is correct? Well, there are kind of two steps here, right? The first one is that as theoreticians we are very happy, because we've now shown that there is at least one possible model that can do this. It could of course be that this model is still a factor of 10 off or something, but no theoretician worth their salt cares about a factor of 10, right? I guess you've seen this: 'right within an order of magnitude' is frequently translated as 'wrong' by an experimentalist. But in particular when you're talking about the exponential of something, right within an order of magnitude is all we care about. The other cool thing is that we can measure this. And it turns out that if you take these lines, say 1.5 to 0.5 multiplying N to the two thirds, virtually all the small proteins measured fall within this range. A pure beta sheet also falls within this range. A pure alpha helix folds a bit faster, and that's because a pure alpha helix doesn't really have any folding nucleus or anything. Remember these one-dimensional transitions we spoke about? So it's kind of reasonable that an alpha helix, which is almost one-dimensional, folds a bit faster. This is a profound plot. What do we have on the x-axis here? Yes, and just above the x-axis you have what N originally is, right? So here we have 12 residues going out to 200 residues, so some of these proteins are starting to be fairly large. And if you look at the corresponding times, when you go from say 90 to 200 residues, the logarithm of t goes from 0 to 10. So that means that the time goes up how much?
The folding time? Yes, by e to the power of 10, right, which is a gigantic number. So why do you think we don't have proteins that are significantly larger? Why don't we have a whole lot of domains with 400 or 600 or a thousand residues? Right, because here you start to have something that takes probably several, I'm not sure what the pre-factor here is, but, oh, sorry, here you have the calibration: that's 10 nanoseconds. So if you go up here you start getting into seconds or something, right, or minutes. If you are at a minute and take something that takes another factor of e to the power of 10 longer, and another e to the power of 10 longer, you're very soon going to be in the range where something would take 10 years to fold. Nature can't do that; it takes too long. So therefore nature doesn't really have protein domains that are significantly larger than this size. What does nature do instead? Yes. So the reason why you have multiple domains is based on physics. Now of course it's also based on genomics and how mutations happen, right, but they go hand in hand. It's not one or the other. So what this really means is that folding, I think the folding funnel is a really great concept for thinking about folding as a guided process. I think I have a slide that I can show you; I'll come back to this in a second. Yes. Think of an energy landscape like this, where you're really going down. You see that it's almost guided here along the blue paths. And this is actually something taken from a very small peptide.
I think. So you don't randomly search the entire landscape, but you relatively quickly find some folding pathways. And that means that the real proteins you see are proteins that have evolved to have fairly well-defined folding pathways. If they did not have well-defined folding pathways, they would never get here. That leads to something else. Oh, sorry, I'm going to go back; there's just one thing I wanted to comment on in this one. Eventually, as you reduce these denaturant concentrations more and more, you don't get almost infinitely fast folding rates. Why not? Why does this start to go down here a bit? The experiments don't seem to fit the equations. That's kind of related to what we saw in the simulation I showed you too, right? At some point it's going to take some time for the actual folding to happen. There might be some folding intermediates or some rearrangement. So you don't keep going up here. Eventually you're not limited entirely by the concentration, or lack, of denaturant, but there will be some small internal processes. The other thing that you should be able to explain is this: when does a protein misfold, and what happens then? This is my unfolded state. And remember, I think you did this lab in two dimensions where you looked at energy diagrams, right? So you have one native state here, and then lots of other states up here. First, what happens if all of these states are higher than the unfolded state? Then by definition it's not going to like to be folded.
That's just a polypeptide. That can happen at high temperature, for instance. If you now start to reduce the temperature, suddenly the native state has lower free energy than the unfolded state, but there are lots of states up here. This is going to be awesome, because if you ever go from the unfolded state to one of these states, you're very quickly going to go back, because they're worse. And then eventually, when you go into the native state, you're going to love it. But if the temperature drops too much or something, suddenly all these states are going to start having lower free energy than the unfolded state. And that corresponds exactly to this: the packing is not correct here. Suddenly you have the fingers packed like that. I can't take that state and move directly to having my fingers correctly packed, right? But this is an awesome state. You have some good interactions here and you have some good entropy. So for this one to unfold now, we have to start paying a penalty. I have to break interactions here, and I have to pay, pay, pay, so I go back. And then when I go back, I just get stuck in another state, because there were more misfolded states. So misfolding usually happens when we have many states that are more stable than the unfolded state. And eventually, when I've tried this enough times, I might be able to find the really nice folded state. This is the reason why large proteins usually fold much slower. They will fold; they will eventually find that state.
So it's not necessarily prions or anything. But since we keep having so many things that go off the pathway and have to be corrected, it will take longer to get here. And this is where chaperonins help large proteins that get stuck on the way. The chaperonins bind these misfolded states and push them back to the unfolded state, so that they will hopefully then go in the right direction. And of course, if you end up having two states that are much, much lower than the other ones here, then you might have a prion or something, but that's a separate chapter. We're pretty much done, but I'm just going to show you one small real example. Yes, can I take two more minutes? There are some cool things that you can do with real, modern examples. This is a protein called NTL9. It doesn't matter what it does, but this is a much larger protein; we're talking about 50 residues or so. It has a folding time of about a millisecond. But this too has been possible to study with Folding@home, with a completely different form of models. I won't have the time to describe those models. But the cool thing that we can see in modern simulations is that, sadly, all those curves I showed you have been pure lies. A real protein doesn't fold like that. It's not one-dimensional.
You can't draw this as a single free energy curve where you have to go through all the states. So here the red one might be the molten globule; the size of these rings corresponds to how large the state is. But from that state there are lots of other states you can go to. Each letter here is a state that they have observed in simulations. So this is an entire mesh network, a spider's web. The size of each arrow says how large the rate constant is. So you can see that the dominant folding pathway is to go from a to m to n, for native. But you can certainly also imagine going from a to g to i to n. Both of them happen. And depending on what mutations you make and everything, one of these might be stronger than the other. You can even imagine having something that blocks one of these transitions, and then it would take another pathway.
This is pretty cool, because these types of things are something that we've never been able to study with proteins before. And it might very well be that protein folding in general, and how efficiently a protein folds, is really determined by the connectivity of these states. If you have lots of states that are connected in lots of ways, those states are more likely to be on the folding pathway. Lots of fun research. I can upload this paper on the website too. Oh, and that reminds me. Sorry, I forgot to upload the slide copies. I'll do that too after lunch.
So in this case you have two folding pathways. The larger proteins are, the more pathways they appear to have. And for these proteins you actually start to see it in the simulations. I'm not going to show the video, but I can put up a link to it.
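To make the network picture a bit more concrete, here is a minimal sketch of the two pathways mentioned above (a to m to n, and a to g to i to n). The rate constants are made-up numbers in arbitrary units, not values from the NTL9 simulations, and the toy network only has forward transitions, which a real one would not. All it shows is how the relative rate constants split the traffic between pathways, and what happens when one transition is blocked.

```python
# Toy folding network. State names follow the slide (a, m, g, i, n);
# every rate constant below is a hypothetical, illustrative value.
RATES = {
    "a": {"m": 3.0, "g": 1.0},  # a -> m is the "thick arrow"
    "m": {"n": 2.0},
    "g": {"i": 1.0},
    "i": {"n": 1.0},
    "n": {},                    # native state, treated as absorbing
}


def pathway_probabilities(rates, start="a", target="n"):
    """Probability of following each loop-free path from start to target.

    At every state the next step is chosen with probability
    k_ij / sum_j k_ij (the branching ratio). This is only exact for a
    forward-only network like the toy one above.
    """
    results = {}

    def walk(state, path, prob):
        if state == target:
            results["-".join(path)] = prob
            return
        outgoing = rates.get(state, {})
        total = sum(outgoing.values())
        if total == 0:
            return  # dead end that never reaches the target
        for nxt, k in outgoing.items():
            if nxt in path:
                continue  # skip loops; the toy network has none
            walk(nxt, path + [nxt], prob * k / total)

    walk(start, [start], 1.0)
    return results


def block(rates, src, dst):
    """Return a copy of the network with the src -> dst transition removed."""
    return {
        s: {d: k for d, k in out.items() if (s, d) != (src, dst)}
        for s, out in rates.items()
    }


if __name__ == "__main__":
    print(pathway_probabilities(RATES))
    print(pathway_probabilities(block(RATES, "a", "m")))
```

With these made-up rates, three quarters of the folding events go through m and one quarter through g and i. Once the a to m transition is blocked, everything is rerouted through the second pathway, which is the "it would take another pathway" scenario.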
These larger proteins start to show a little bit of nucleation-condensation folding rather than pure diffusion-collision, so as we move to larger proteins the theories actually fit much better.
You can think of this temperature dependence in another way. What really happens at high temperature, the reason why we don't fold at high temperatures, is that the entropic resistance becomes too large. We lose too much entropy. And the reason why we don't fold at too low temperatures is that the entropic resistance is suddenly too small. That just means that we get trapped in the closest local energy minimum, and then we don't have enough entropy to get out of it. I'm not sure how useful it is to think of it that way.
I think I will stop there. There's just one fun slide that I got from Michael that I like to use. Remember that I spoke to you about protein folding in vivo and in vitro? This is not so much a result as a fun way to think about it. You go from DNA to RNA to protein to folded protein, right? Let's look at what happens in life. You all know how to get from DNA to RNA. How do you get from DNA to RNA? Yes, T to U, right? AGCT to AGCU. It's not particularly hard to write a computer program to do that. In biology, this is an insanely complicated problem, with all these different states and everything involved. And then RNA to protein: well, it's just triplets, you just write down the genetic code. You could write a computer program that does this in 10 minutes. In the cell, that's the ribosome, and it's an insanely difficult problem too. On the other hand, going from protein sequence to folded protein is insanely hard to do in a computer. Basically, don't try to do it. It's trivial in nature: just find the free energy minimum. I'm not sure why, but it's a pretty fun way to look at things, and it's of course also somehow related to the difference between physics and bioinformatics.
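Just to show how little code the "easy in a computer" steps take, here is a rough sketch of T to U and of the triplet lookup. It assumes the input is the coding strand, uses the standard genetic code, and simply reads triplets from the first base until a stop codon; it does not search for a start codon or handle anything the cell actually has to deal with. The function names are only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (first base varies slowest). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """DNA coding strand to RNA: replace T with U."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Read triplets from the first base and stop at the first stop codon.

    Simplification: no start-codon search, and trailing bases that do
    not form a full triplet are ignored.
    """
    protein = []
    for i in range(0, len(rna) - 2, 3):
        aa = CODON_TABLE[rna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    rna = transcribe("ATGGCCTGA")
    print(rna, translate(rna))
```

The last step on the slide, sequence to folded structure, has no comparable ten-line version, which is exactly the point of the comparison.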
Whatever you do in life and work, try to stay in the green squares rather than in the red. And I'm not kidding there: it's surprising how common it is for people to, for instance, try to fold proteins with brute force. It's not a particularly good way to try to predict something unless you absolutely need to. And the same thing here: understanding exactly how this happens might be interesting as a problem, but in practice, if you're just going to predict it, doing this in a computer is that much easier.
That's all I had for today. This covers book chapters 19 to 21, mostly 19 and 20 actually, I think, and then we have a bunch of study questions for you. And that's pretty much all I'm going to say about kinetics. This is a lot of physics, so my plan is that tomorrow I'm going to take a fairly long discussion in the morning and try to go through this. But read up as much as you can, and then we'll talk about it. Then I will probably spend the second half of the lecture tomorrow talking a little bit about free energies and the things you're going to do in the labs. Because now you know free energies in theory; the question is how you measure them in practice. Do you have any questions?