Let's see, some of you, Sarah and Samara, were away last week. I hope the lecture recordings have worked, one or the other. I know that one didn't work, but we recorded the second half of the Friday lecture this weekend, so lecture A2 should be up now. What we're going to do this week is go back a bit and combine protein structure, the stuff I talked about last week, both globular and membrane protein structure, with all the physics we studied, what was it, two weeks ago. The idea is that we're now going to start looking at the concepts we brought in, folds, fold stabilization, what folds occur, why they occur, but study this using the physical methods, in particular Boltzmann distributions, free energies, and everything. And it turns out that most of the things I've been hand-waving about, we can actually prove, or at least argue that they're reasonable. On Thursday, then, we're going to have a second lecture. So you're actually going to be free both Tuesday and Wednesday this week, and that's because I have a board meeting tomorrow. I have a bit of reading material, if you're interested, that I uploaded on the course website, and I'll tell you more about that when we talk today. And then on Thursday, I'm going to go into, I wouldn't say heavy math, but it's a bit conceptual; there are going to be lots of new concepts. So I would strongly recommend you to start reading the book ahead of Thursday's lecture. The beautiful thing on Thursday is that all those things that we were just hand-waving about early on in the course, like Levinthal's paradox, why do we have the proteins we do, what is the probability of a sequence being a protein, we can actually prove all that with math, which is pretty cool. And it actually works. And most of that is fairly new: today, we're going to speak a lot about experimental results just from the last five or 10 years, and even the theoretical results about protein stability are only like 20 years old or something. It's fairly new, actually. And then in the labs this week, today, Björn and I are going to start you out with simple but real molecular dynamics simulations. And later this week, you're going to be working with real proteins. We're also trying to sync with Marta to have you visit the cryo-EM facility. And the final thing is that on Thursday, we're going to have an opponent here, Tom Cidam from the University of Utah, working with free energy techniques and everything in simulations. He has also worked with drugs for a very long time because he's in pharmaceutical chemistry. And then on Friday, my idea is that you're going to join a thesis defense at KTH, actually. It's pure coincidence that we happen to have it right in the middle of the course, but in particular, if you're at the master level and considering doing a PhD, it can be pretty fun to have seen one. But let's get started with the study questions that I put up at the end of last week's lecture. I'm not going to go through them as always; pick one, answer it. Some of these things I might not have talked about, but in that case we'll discuss them here. Number 14, what is the fluid mosaic model? Right. So I wouldn't say extremely old. The very first models had the membranes being almost rigid, right? And we actually have all these textbook models where the lipid chains are exactly straight.
So this fluid mosaic model also has a name: Singer and Nicolson proposed it. And it's not that old, we're talking about the 1970s or something. Well, OK, it is that old. Overall, I would say it's fairly good, because what the fluid mosaic model really introduces is that proteins diffuse in your cellular membranes. And that's true. We can see that in the microscope facility down here on gamma 3 and everything, and it's absolutely beautiful. However, where you're quite right is that if you compare this, say, to a pure lipid bilayer, which is sadly what you also see in the textbooks, a real membrane is not a pure lipid bilayer. There's tons of protein in it, and in particular, in your membranes there is something that you won't find in a bacterial membrane, which is what? There are actually two things. Cholesterol. Cholesterol is one. What does cholesterol do to a membrane? Yes, it makes it much more rigid. And that's likely, well, of course, that's not a coincidence. This rigidity likely helps us have, for instance, particular receptors and everything on membranes. Occasionally, you have receptors where you need, say, two receptors to be fairly close to each other. There has been a battle going on for like 20 years about whether you have these more rigid domains in membranes that you occasionally call lipid rafts or something. I'm not going to go into that discussion because that could easily take two hours. But there are ways that human and eukaryotic membranes are quite different from bacterial ones, which is probably related to the fact that we are multicellular organisms that need to communicate in different ways. The other thing that membranes in higher organisms have is a cytoskeleton. So you have these protein parts and everything, literally like a scaffold that keeps the whole cell more rigid. But overall, you're right. The fluid mosaic model is important because it's the first modern model. It's not entirely true; you're quite right there too that really modern models would go more into lipid rafts and rigid domains and everything. But overall, this is pretty good. And that relates to some other model. In 15, the Popot-Engelman model. No? So Popot-Engelman, that model too has a different name. There is a reason why I didn't write that name. It's called the two-stage model. And did you say the two-stage model, does that ring a bell? So that had to do with how membrane proteins are inserted, or rather, how proteins become membrane proteins. So this was actually also a good comment. No, so Popot-Engelman was earlier than that. What Popot-Engelman, and this is based on a classical paper, I should reveal; I'll try to upload that on the site tonight. Popot-Engelman was based on the idea that membrane proteins, they looked so simple, or at least when they had this model, they looked simple. So they came up with the idea that could it be that you have individual helices being inserted in a membrane, and that is determined based on the sequence of each helix. Every single helix has to be hydrophobic enough. And then in the membrane, these helices diffuse and find each other, even though they're connected by a loop. A loop between two membrane protein helices can be 100 residues or something, so they don't have to be that close. And this is what they showed by taking a larger membrane protein, cutting the gene in two parts, expressing them separately, and knocking out the original gene.
But then the cells still had the activity, meaning that these two halves must somehow spontaneously insert in the membrane and find each other in the membrane. Now, the part that they didn't include was, as you say, the translocon and everything. So the whole point is just that membrane protein helices diffuse together and find each other. Today, we know that the way things actually do insert in the membrane is with the help of these translocons, but Popot-Engelman didn't talk about that. And that is kind of important. Number 16, what is the difference between a channel and a pump? Yup. Channels provide a facilitated pathway for the ions to go through the membrane, following the concentration or chemical gradient. Pumps work against the gradient, actually creating the difference across the membrane. Yes. And what does the pump use for that? Yes. So the whole idea: a pump is not spontaneous. A pump uses energy to go to something that is further away from equilibrium, while the ion channels restore things to equilibrium. So that's kind of stupid. Why do you have the ion channels in that case? Wouldn't it be easier to just have the pumps? Well, you already wasted the energy with the pump, right? So wouldn't it be easier to just pump it directly where it should be? The key thing is that if you only had a pump, if I wanted to draw something on that whiteboard, it would happen like this. Pumps are so slow. Pumps are at least a factor of 100 slower, if not a factor of 1,000 slower, than a channel. And the idea is, remember these things that we showed you with the sodium-potassium ATPase, right? You have to go through a sequence of 10 different conformations of the unit to use the energy. It's an exceptionally slow process to keep moving these ions, but the beautiful thing with the channel is that you just open a hole and they go through right away. So channels are super fast. Compare that to charging a battery versus just using the charge: it usually takes longer to charge the battery than to use it. So that's where we might want to go next: do you have some ballpark ideas how fast these processes are? I know that I did talk about it. For which one? No, that I would say is too fast. Let's start with the first one. How long does it take for an ion to go through a channel? It's good that you're guessing. There is a saying, I think it comes from Richard Feynman originally, that any physicist worth his or her salt should be able to estimate anything in the world within an order of magnitude without really knowing, just by thinking about it. Should we try to do that? So how quickly do atoms move? Well, you could look that up easily with a diffusion coefficient, right? An ion that is not hindered by anything would move several nanometers in a nanosecond. On the other hand, even though, to a first approximation, we can assume that a channel is just a hole that is not hindering the ion at all, it is of course not a completely random motion in any direction. To actually go through that hole, we would need to go in the right direction. So let's assume that that might reduce it by an order of magnitude or something. Then we're talking about, say, 10 to 15 nanoseconds or something for a single ion to go through a channel. That's pretty much the ballpark we have. How long does it then take for a channel to gate? That is, whether the channel actually opens or closes. Well, you could start to estimate that the same way, but that's harder.
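If you want to redo that transit-time estimate with actual numbers, here is a minimal back-of-envelope sketch; the diffusion coefficient and membrane thickness are typical values I'm assuming, not numbers from the lecture slides.

```python
# Rough order-of-magnitude check of the ion-transit estimate above.
# Assumed values (not from the lecture): D ~ 1e-9 m^2/s for a small ion
# in water, and a membrane/channel length of ~5 nm.

D = 1.0e-9      # diffusion coefficient, m^2/s (assumed typical value)
L = 5.0e-9      # distance to cross, m (assumed membrane thickness)

# Simple 1D diffusion-time estimate over a distance L: t ~ L^2 / (2 D)
t = L**2 / (2.0 * D)
print(f"~{t * 1e9:.1f} ns to diffuse across {L * 1e9:.0f} nm")   # ~12.5 ns

# Same ballpark as the 10-15 ns quoted above, and consistent with measured
# single-channel currents of a few pA, i.e. on the order of 10^7 ions per second.
```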
So then you start to need to know what the typical energies in a protein are, and how long it takes for a protein to change conformation, because when a channel gates, you literally have to move helices or something to close or open the channel. The very fastest nerve responses we have in the body are in the ballpark of, say, 1 to 10 milliseconds or something, but that of course has to involve a whole cascade of these channels opening or closing on the way. So divide by, let's say, 1,000 or something, and then you're down to a microsecond or so. Even that would be a bit fast. I would say 10 to 100 microseconds for the fastest channel gating. And that actually corresponds roughly to the time scales you saw in those simulations of the voltage-gated channels, 100 microseconds or something. And that is very much related to 13: how long does a larger structural transition in a membrane protein take? Well, for something opening or closing you're talking at least 10, if not 100, microseconds, maybe a millisecond or something. But we rarely need to wait longer than that. And most of that time is spent doing what? Waiting, why? Well, yeah, partly that's right. It has to do with the Boltzmann distribution. And I wouldn't necessarily say so much energy; it's more about entropy and searching for packing. So it's based on the probability of attaining that free energy. It's not just building up energy. A lot of this is searching, and that has to do with the entropy. So you could say building up free energy, although that's not entirely correct. Let's pick something else. What are the functions of membrane proteins? So transporting things in general, could be ions, water, or even entire proteins? Right, and there's something else they might transport, which you might not think of as transporting something. Is it a signal? Yes. So if something happens on the outside, that should result in something happening on the inside of a cell. And this could be cell growth or a signal for the cell to, well, cell division is cell growth, of course. But anything that really should happen on the inside as a consequence of something happening on the outside: unless you need a specific molecule, you don't necessarily have to open the cell. There is something else that happens in a human cell. What do you have? In a human cell, in contrast to bacteria, you have different compartments of the cell. You have the entire cell, and then you have the nucleus, right? So what do you have in the nucleus of the cell in particular? Nuclear material, but yes, what type of nuclear material? DNA, yes, genetic information, right? And at some point, when you start to read the DNA and you get the RNA, you're gonna need to get this information out to the ribosomes and everything, you're gonna need to transport things across membranes. So there's a gigantic complex called the nuclear pore complex, with like 50 to 100 proteins. And they literally allow gigantic molecules to go through very selectively, while still keeping other things out. We still don't really, let's hear. There are models of structures like these now, but they're so gigantic that you can't just take the entire 50-protein complex and determine an X-ray structure of it.
So what people have systematically done there is determined structures of each part, and then they frequently use low-resolution data, say cryo-EM or something, to determine how these parts fit together relative to each other. So there are models of the nuclear pore complex based on a combination of cryo-EM and NMR and X-ray crystallography and bioinformatics. Extremely cool. I'll see if I can find that paper for you too. But it's a good example of how you need to combine computers with experimental data. Yes, but the communication between cells happens with chemical signals. Well, there's one example. So that would mean one cell releasing something that a second cell binds. And of course, both of those things are related to membrane proteins. There are even cooler things. For instance, well, it was a relatively mild winter, but if any of you caught the flu: the flu is really a virus infecting your cells. And the way the virus infects your cells is that the virus has genetic material on the inside, right? And then it has special fusion-aiding proteins on the surface of the virus that help the viral membrane attach to your cellular membrane. And then it fuses with your cellular membrane and delivers its genetic material on the inside of your cell. But it's the same thing there. It's all proteins on the surface. So, other questions. Is this your work? Yep. You said that these somehow interact with membranes, but probably not at one main binding site; it's more of an indirect effect. So that's good. And actually this is a great or bad question, depending on how you see it, because there are like three or four possible answers. At the very last lecture next week, I will probably spend that lecture talking about modern research, both stuff that we are working on and other people in the lab and the department. The important thing here is that this appears to be a combined effect: there are things that enter the membrane, but it's not just the membrane. They are interacting, and this somehow leads to an effect on the proteins. Exactly what that effect is, I haven't told you yet. But it's a complicated interplay between the lipids and the proteins. And the reason we know that the lipids have to be involved is that the more hydrophobic an anesthetic is, the better, well, the more hydrophobic a molecule is, the better it works as an anesthetic. The Meyer-Overton rule. One, how are membrane proteins stabilized? Anybody want to have a go at that? So hydrophobic everywhere, that's a good start. Which means that the actual stabilization or interaction has to do with what? Yes, packing in particular, right? It has to do with packing hydrophobic side chains. Which is problematic, because it's all Lennard-Jones interactions. It's exceptionally difficult to get this right. And this is one of the reasons why it's much harder to predict membrane protein structure. There are fewer clear signals that, say, a positive charge should be interacting with a negative charge or something. For globular proteins, we can very frequently find beautiful signals, but that rarely works in membrane proteins. Now I was answering two, which is fine. Also, traveling through the membrane might be such a high barrier that once it's in the membrane... Yes, and that relates to the biogenesis that we can talk a little bit about.
So biogenesis literally means biological origin. So what happens, if you go beyond this simple Popot-Engelman model and look at the real biogenesis of membrane proteins, how does that happen? Yes. And this is something I didn't mention. Why do you think you have the translocon? Right, and this is something fundamental in nature. If you didn't need the translocon, there is no way it would be expressed in all these organisms, right? You could of course argue that this happens slightly faster, but there is a very large cost to producing a protein and expending the energy and the ribosome and everything. So if we didn't need this, I would bet there would be tons of organisms, in particular bacteria, that would have gotten rid of it. So we do appear to need translocons to insert things in membranes, and I don't think that that's entirely based on kinetics. Some of this is actually stability, in particular when it comes to these helices I showed you, like the ones with the four arginines. So I was cheating there. I'm gonna come back to that next Friday. I'm not sure whether I should be happy or sad that none of you discovered it. Remember that we talked about this biological hydrophobicity scale and that it was cheaper than we expected, right? But the cost of inserting a single arginine was still positive. And there were these helices in the voltage sensors that had four arginines in them. And even if it's not quite as expensive to insert them from water into a membrane as into a pure hydrocarbon, we certainly don't gain energy from it. So how on earth can we actually insert a helix with four arginines? Because both according to Popot-Engelman and according to biogenesis, we do insert the helices one by one. And this is a helix that has a positive delta G. It should not insert. And yet it does. I'll talk about that next Friday. That's something that's been troubling us the last decade, but I think we've solved it. Related to that, what sequences become membrane proteins? That's a pretty smart answer, but it wasn't what I was looking for. So what is it that defines those 40%? I would say 30, but 40 would be fine too. Yes, so hydrophobicity goes an extremely long way. You're gonna get two thirds of it right just by looking at hydrophobicity. But in addition to hydrophobicity, because in particular for large globular proteins there are parts on the inside that are hydrophobic, something that works exceptionally well for membrane proteins is to start looking for segments that have a length of roughly 20, because a helix that goes through a membrane has to be roughly 20 residues long. After that, you need a small loop, and then a stretch of 20 again, and then a smaller or longer loop, and then a stretch of 20 again. Then you also have these patterns where you want positive residues on the inside of the membrane, right? So what we typically use to predict this is something called a hidden Markov model, which is just a machine learning algorithm that is frequently used for speech too, for instance. And this is very similar to speech in that you have different words that are separated, but there is a grammar too, and you can't combine words any way you want. So that's why it's surprisingly efficient to predict membrane protein topology.
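To make that concrete, here is a toy sketch of the simplest possible version of the idea: a sliding window of hydrophobicity rather than the actual hidden Markov models real topology predictors use. The sequence, window length, and threshold below are purely illustrative assumptions.

```python
# Toy transmembrane-helix finder (a sketch, not a real predictor):
# scan a sequence with a ~19-residue window and call any window whose mean
# Kyte-Doolittle hydrophobicity exceeds a crude threshold a candidate helix.

KD = {'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5, 'Q': -3.5,
      'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5, 'L': 3.8, 'K': -3.9,
      'M': 1.9, 'F': 2.8, 'P': -1.6, 'S': -0.8, 'T': -0.7, 'W': -0.9,
      'Y': -1.3, 'V': 4.2}

def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Return (start, end) indices of windows with high mean hydrophobicity.
    Overlapping hits would need to be merged in any real use."""
    hits = []
    for i in range(len(seq) - window + 1):
        mean_h = sum(KD[a] for a in seq[i:i + window]) / window
        if mean_h > threshold:
            hits.append((i, i + window))
    return hits

# Hypothetical example: a hydrophobic 20-mer flanked by polar loops.
seq = "MKKDDSA" + "LLIVALLAVFLGAILLIVAL" + "KRDDSEQK"
print(candidate_tm_segments(seq))
```

A real predictor would add exactly the grammar the lecture mentions: alternating helix and loop states of sensible lengths, plus the positive-inside rule for the loops.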
We can predict the helices and where the loops are, but then predicting how these helices are packed together, that's way harder. So we already spoke a bit about the translocon. How does the translocon alter the free energy of insertion? No, no, what did we say about free energies? This was, of course, a trick question. What do you know about free energies and states? Exactly. We could literally have a black box of magic there, and you know that it can't influence the free energy. It can certainly alter the kinetics of insertion, but it cannot alter the free energy of the inserted state. It likely reduces the barrier, but not the free energy of insertion. And actually, I should probably have said free energy of insertion there, yeah. So I spoke a little bit about voltage- and ligand-gated channels here. Those aren't really questions; I should probably have phrased them as questions or something, but since I will come back to both of those next week, we might skip that for now. I think we had the difference between a channel and a pump. I think that covers most of it. What I'm going to speak about today is primarily fold stability and structural evolution. So that's related to all these folds that we've been seeing in the last few lectures, but now we're going to go through the physics and not just hand-wave, but motivate this. This is primarily chapters 15 and 16, but I might go into the beginning of chapter 17 too. And there are going to be some things that you see here that look exactly like Boltzmann statistics, and that's going to make us so happy that we recognize it, until we realize that's completely wrong. And at the end, I might have time to talk a bit about transitions and allosteric modulation in particular. So as we spoke about last week, we came to the conclusion that it's kind of fun that, as amazing and diverse as the universe of proteins is, protein sequences in particular, there are surprisingly few folds. And again, just to give you a gut feeling of the orders of magnitude: how many sequences are there? How many sequences are known? So this is a good question. Let's start from the beginning. How many genes do you have? Do you have any idea how many genes a small bacterium might have? There was actually a very fun paper published just a few weeks ago by Craig Venter, where they had created, I'll dig that up for you too, they created an artificial organism: they took a bacterium and made something that could live with as few genes as possible. That was just a few hundred. And that's why, sorry, that's why bacteria are much more beautiful than you are. It's pretty amazing to get something to live with just a few hundred genes; you're talking about a few hundred proteins that can sustain life, all these processes we go through. So if we take something seemingly simpler, like a pine tree or Norway spruce or something, any idea how many genes that contains? 200,000. So the Christmas tree is roughly 10 times more complicated than you are. And that's pretty fun. We don't know why. One possibility could be that there is not a whole lot of evolutionary pressure on Christmas trees. I guess it's important not to be beautiful, while a bacterium is the opposite end: there is extreme evolutionary pressure, because keeping up all these different genes costs.
So from one point of view, you could say that there are in the ballpark of a couple of hundred thousand completely different sequences, but then there are all these variations. In particular, now that you can do high-throughput sequencing: a hundred million human genomes are not a hundred million copies of exactly the same sequence, right? There are variations there. So when it comes to individual sequences, you're talking about hundreds of millions that have already been sequenced, I would even say probably close to a billion or so. For all intents and purposes, that's infinity. There are way too many sequences. And for most of them, we just know that it might be, it says putative protein. We don't know what most of them do. It's a huge challenge to get some sort of automated annotation working, but that's bioinformatics. And then we come down to the number of proteins for which we know the structure. How many, what's that? How many entries are there in the Protein Data Bank? Yes, a hundred to 200,000, it depends a bit on how you count. But then when you start talking about folds, that is, distinct topologies into which these proteins can fold, you're talking about a thousand, maybe 2,000 or something. That's an extremely small number compared to, say, a billion sequences or so. So this is the important question that we're gonna try to answer: why do most sequences seem to fit a relatively small number of folds? And the first thing you can do is a bit of observation. If you start doing a bit of statistics on the Protein Data Bank, it's far worse than this, or better: it turns out that it's not a matter of 2,000. It's a matter of 100 or 200, maybe 300 or 400 folds that account for almost all proteins. This is the classical 80-20 rule: 20% of something explains 80%. So it's not just a thousand; when you start studying proteins, and that's why I actually showed you some of these folds, it's not that they occur in one or two proteins. The reason why I showed you some folds last week is that you're gonna see those in hundreds, if not thousands, of proteins. They're extremely common. And that turns out to be the case for RNA, too. You can actually do structure determination and structure prediction on RNA. We usually don't do it, but it's kind of fun. We know much less about RNA than proteins. But DNA is the extreme opposite. For DNA, there's pretty much only a single fold. You can have slightly different conformations, A, B, or Z DNA, but you're not gonna see this diversity in DNA. And for all of these, you're gonna have tons of homologs that are related and end up in the same fold. You could also, in some cases, have some sort of functional convergence into folds. If you wanna have a protein that should bind something on the inside, you can either imagine having, say, a four-helix bundle like this, or you could have two small beta sheets, both of them forming a pocket. I guess in theory, you could imagine something having one beta sheet and then one layer of alpha helices. But somewhere there, you run out of options, right? If you just want a small pocket, you have roughly three ways. You might have two different ways of putting the helices or sheets, but we're talking at most 10 different ways to create something small that binds something on the inside. So in that case, that's not evolution in the sense that they are related.
It's just that to get a specific function, there simply aren't that many different ways to do it. And in some cases, there are also gonna be some very clear physical restrictions: for instance, if you want something to go through a membrane, you have to have a hole and you have to have something around that hole to shield it from the membrane. That doesn't mean that all membrane proteins are evolutionarily related, but physics simply puts some very strict limits on the diversity you can have if you want a hole through something. And that leads us to, say, two or three different alternatives. Why are proteins similar? Based on the bioinformatics course, this is what we should say, right? That everything is based on evolution and then things have gradually diverged, and we might simply not have had time to diverge into that many different folds. Because at some point, if you are in fold A and you should somehow evolve to fold B, at some point you will actually need to change the fold, and that's gonna be a very big step. So it might seem reasonable that it's hard to move between folds and therefore we haven't explored as many folds. Another possibility is what I just told you: there simply might not be that many folds that are actually useful to have. So you could imagine there being another billion folds, but those billion folds are not functionally important or good, and therefore nature hasn't explored them. But what we are more and more ending up with in this course is that, based on physics, there simply aren't that many folds that are possible. And of course there is some truth to all three of these points, and it depends in what sense you're looking at protein similarity, but try to keep all three of them in mind. In this particular course we're likely more here; in bioinformatics you might be up here; and from a medical perspective you might very well end up in the functional convergence. And the same for DNA, working with only four building blocks? So this is of course partly related to it. In DNA you have less diversity in the type of building blocks. And the base pairs on the inside are just one part of the building block, right? You have all the phosphates in the backbone of DNA, and that's exactly the same. So the whole polymerization in DNA, that has to do with the backbone, that's identical, and that's why it has a much more regular structure. On the other hand, you do have RNA, where we don't have as regular a structure. So in general it's hard to say. But yes, there's some truth to that. So when it comes to proteins there are a bunch of, well, the book goes into quite a lot of detail here, so I'll just cover it briefly. Because we have much fewer folds than you'd think, it works really well to try to classify or divide things into folds. Say, all of these are bundles of helices, and then some of these are globin-like. Remember that I spoke about hemoglobin and myoglobin? It's an extremely common fold, kind of tetrahedral. You have these up-down helix bundles, and then you might have a bunch of beta sheets in different forms, and then you have this mixture of helices and sheets. The reason why this makes sense is that there are relatively few folds. So if you're not doing bioinformatics, one possibility is of course to go straight for structure prediction or try to find something related. That's gonna work great if you have a close homolog with a known structure; then you're gonna find it.
But in many cases these homologs might have diverged so much in sequence that you can't really identify them. But what if you take your sequence and essentially try to thread it: I take the sequence I have and see, would my sequence be stable in this fold or that fold or that fold, or just a part of a fold? And that's also something that's very commonly used in bioinformatics. I think Arne might have spoken a little bit about fold recognition, right? So the whole idea with fold recognition is that there are so few folds, so rather than trying to explore all of the universe, just look at the thousand folds or so that we have and see whether you fit in one of those. In theory your protein could be fold 1001. The likelihood of that is, well, to a first approximation, zero. So it appears the folds we have are fairly simple permutations of helices and sheets. They're all characterized by these stable, nice local patterns with lots of hydrogen bonds that we spoke about last week. There are some very fundamental hydrophobic patterns, that you typically have one side of a layer that's hydrophobic and the other side is hydrophilic. And as we spoke about last week, these sheets really wanna form larger continuous sheets. Having an isolated strand is extremely bad. And then based on that, the book spends at least one full chapter going through fold classification databases. This can occasionally be important in bioinformatics. I think it's rare that you're gonna sit and work directly with these databases, but I figured I could at least update you, because some things have happened. There are two big databases. One is called CATH and the other one is called SCOP. But SCOP is nowadays SCOP2, a completely new database. The reason why I mention this is that CATH stands for Class, Architecture, Topology, Homology. And the whole point is that this is a hierarchical division. So class and architecture: well, class would be these four, all alpha, all beta, alpha/beta, or alpha and beta. And the architecture and topology would be the overall way in which things are organized. And eventually you get down to the homology, where there actually are evolutionarily related sequences. CATH is entirely automated, and if you just give CATH a new structure, a computer will sort it instantly. SCOP is the extreme opposite. SCOP1 was for a long time handled by Alexey Murzin, who came from the crystallography community and everything, and they pretty much sorted things by Alexey sitting down and looking at it on his screen. I love SCOP. Most people do. SCOP relies on the fact that Alexey Murzin is a very smart person. So in SCOP there are some beautiful cases where, at first sight, proteins appear to be completely unrelated, but there is actually some very important functional relation between them. The computer mixes that up all the time; Alexey gets it right. But then there are other cases where Alexey absolutely thought that they should be related and he was wrong. And then of course there are still errors there. So these are both databases you might hear about at some point, but they're just hierarchical classifications of protein structure, where the upper part relies on the structural concepts that we've seen in this course and the lowest part relies heavily on bioinformatics, sequence relatedness. You need both. And when we're talking about this relatedness, that brings us straight to the next point: evolution.
Forget about, well, evolution of course happens on the sequence level, right? But what decides whether a mutation is successful or not? And how does that selection work? Right. So remember what was the central dogma of molecular biology? Sequence, structure, function. So there is something else there that we've left out before. You just said it: there is also an arrow from function back to sequence, right? And that's evolution. But the reason why we have this function is of course based on the structure. So the vast majority of mutations have no effect whatsoever. Some of them do have minor effects. By far the worst case: if a mutation influences the organism's ability to breed, then of course it's gonna have an instant effect in one generation. Many of these effects are much weaker. And in particular, some of them might not even strike you as a disease or something until you're 50 years old. And in that case, there's hardly gonna be any evolutionary pressure at all. That's, for instance, why we have all these prion diseases and everything. They don't really enter that much into evolution. There are a bunch of examples that I've talked about before, and I might bring them up. Hemoglobin is a great example. Llama hemoglobin binds oxygen more tightly than horse hemoglobin. And this has been shown experimentally. There are a few residues mutated so that it binds oxygen harder. And why is that? Sarah wasn't here last week, so we'll ask her. Where would you expect to find a llama? Have you seen a lot of them in Stockholm? Exactly. And what's the thing with oxygen and high altitude? Exactly. So for this organism, since there is too little oxygen in the air, it's a big evolutionary advantage if you have hemoglobin that binds more of it. Here at sea level, it doesn't matter because we're not really deprived of oxygen. Same thing, fetal hemoglobin is different from adult hemoglobin. And this is even cooler because this is something that occurs in all of us, right? Depending on how old we are, there have to be differences in gene expression. And this is one way you're very different from bacteria, all these introns and exons: as humans, we need a system that regulates gene expression. A bacterium doesn't, or well, a bacterium might regulate a few things, say when it divides, but the more complicated your cell is, the more advanced mechanisms you might need to regulate things like this. There are a bunch of other cool examples. I think I have one of them here. Hemoglobin, which we're gonna talk about later, and I'm gonna talk about the allosteric regulation of hemoglobin after the break, but there is a famous example of a mutation in hemoglobin called sickle cell anemia. Discovered in the 1960s, if I recall correctly. So the disease is old. It's a disease that strikes people particularly in the Horn of Africa. And it's caused by a mutation: if you get these mutations on both chromosomes, then you die. If you get them on one chromosome, you're gonna have a significantly lower ability to bind oxygen in the red blood cells. And if you look at these red blood cells in a microscope, they appear collapsed. They literally look like small sickles rather than the classical red blood cell shape. Why on earth would that ever survive evolution?
Right, so for whatever reason, this means that the people who have these mutations are much less sensitive to malaria, which again, in Stockholm or the US or something, is completely pointless, we don't see it here. But in a particular region of the world, instead of having a negative evolutionary pressure, which we would have in most of the world, in this small region it's a positive evolutionary pressure, and that's why it has survived. We know exactly what these mutations are, although I don't remember where they are, but you can look that up. If you look at the hemoglobin molecule, there are like two or 300 mutations related to disease. So when you start looking into things like that, you pretty quickly start asking the question: are there some systematic differences between prokaryotic and eukaryotic, or even vertebrate, organisms? And I would actually say there are. In these higher organisms, while the folding patterns seem to be similar, eukaryotic proteins usually have more domains, they might have more complicated domains, and there are simply more things we need to do in eukaryotes. That is not entirely good. Well, it is good, but take these channels I spoke about: this was the first channel structure determined, called KcsA, a bacterial channel. That pretty much landed Rod MacKinnon the Nobel Prize. This is Kv1.2, which is the eukaryotic one. The K stands for it being a channel conducting potassium ions. Any idea what the V stands for? Voltage gated. So KcsA is also a K channel, but it is not voltage gated; it's regulated by pH, which makes sense from one point of view, because a bacterium doesn't really have a nervous system or anything, so you don't need to conduct nerve signals. The other thing: which one of these do you think is most expensive to produce? The cool thing is that the sequence here in the center is pretty much the same sequence here. You have these four lined up; it's a tetramer of four identical subunits. But what we've done in the human one is that we've added an extra domain to each of these. There are four helices here and then another two helices here. The two helices here are similar to the prokaryotic one. So this is an extra domain that we've added in humans to get it voltage regulated. It costs way more, but that also enables you to have cool things like a nervous system. This one is much more efficient, but less fancy. And of course, a bacterium is all about optimizing efficiency, not about being fancy. What we've also seen, and we don't quite know why, but it appears to be the case, is that eukaryotic proteins in general are more complicated. They are less stable, they move more, they have longer loops. They're a royal pain to try to overexpress and crystallize. I don't know how many examples there are of that; well, the nicotinic acetylcholine receptor that I showed you, people spent decades trying to get that to crystallize. We never really got further than eight or nine angstrom diffracting crystals, which means you could never get a structure of it. After 20 years, people gave up. The prokaryotic structures, GLIC and ELIC, that I showed you, took a couple of years, and people had crystals of them. So for some reason, it appears that it's much easier to stabilize and determine things for prokaryotic structures, and that's why we use them as models. Now, sorry, yes. Are they overexpressed in some particular cells, or how is that done?
So that's also a good question. Any time you have a eukaryotic protein, the first thing you try to do is overexpress it in a bacterium. Because the thing is, if I have a protein, I just need to know the sequence, right? To overexpress this in a bacterium, I wouldn't even do it myself. I would go to a facility across the road here, send them the sequence, and say I would like some of this overexpressed. Three weeks later, I would get a flask with one liter of it, and I would pay like 500 kronor or something. Super cheap, easy. The problem is that it's at least 50-50 that they would come back to me and say, sorry, didn't work, because this is a protein that the bacterium might not naturally use or anything. For whatever reason, it might not even fold. And then you might go to a slightly more complicated bacterium-based system, or you might go to baker's yeast, Saccharomyces cerevisiae, or, sorry, brewer's yeast. That might work better, it's a eukaryote, but the problem there is that in at least one third of the cases, the answer is still gonna be: nope, didn't work, for whatever reason. Didn't fold correctly. And then eventually, yes, you get into human cell lines, competent cells or something. For a start, a normal minus 80 freezer is not gonna be enough, you need minus 140 degree freezers, you need a super complicated lab, you need ethical permits. And then the batch you get from this, after paying 100 times more and waiting 10 times longer, is gonna be a thousandth of the amount of protein. So when it comes to overexpressing proteins in human cells, you have these entire labs and people standing there for weeks and years trying to do it; it's insanely difficult. But of course, at the end of the day, you might not have any alternative. If you express it in bacteria and do get protein, is it the same as in humans? Well, we know that it's the same sequence. You can't be guaranteed that it has the same fold. So what you frequently try to do then, for instance to determine a structure, is to be smart: can I find mutations that stabilize this protein, that make it more stable, that cut out a loop, or, well, in this case you might ask, what if we cut out all the voltage sensors? That might very well help, but then you don't really have your protein anymore, so that would be pointless. So you hope for the best, basically. You try a handful of mutants, you frequently try to learn from the prokaryotes, and then you hope that your mutants didn't destroy your protein too much. But there is certainly a long history of examples where this has gone wrong and ended up with crystallization artifacts. We even had an example of a crystallization artifact on Friday, in the first structures of these voltage sensors. That one wasn't caused by stabilizing mutations, though, but you had the third and fourth helices lying in the interface instead of sitting straight through the membrane. Mistakes happen. Just like you have bugs in a computer program, there are mistakes in experiments too. Oh, I wouldn't know, I think that had to do with, well, the problem is that when it comes to actually crystallizing these proteins, you don't have the membrane, you have them in a micelle environment, or detergent. So what you typically do to crystallize these is that you have a gigantic antibody that you get to bind to certain parts on the surface of the membrane protein.
And since this antibody is not a membrane protein, it's a gigantic globular protein, if you're lucky, you can get the antibody to crystallize. And as part of these gigantic crystals, you're also gonna have small parts where you have detergent and a membrane protein attached to your antibody. So you're really getting the membrane protein structure, which is what you're actually after, as a byproduct of the X-ray structure of the antibody. But now we're getting into complicated things about X-ray crystallography. It's hard. It's super hard. All of this stuff, you might think it's so common in these courses that we talk about things from the 1850s or the 1950s or something, but there's some pretty fun new stuff here. 2011, just four or five years ago, there was a really cool paper in Science, and I've uploaded this on the site, where they showed that over just the last, well, few hundred years or something, as we've started to pollute the Hudson River around New York, you can actually show that the Atlantic tomcod has developed resistance to PCBs. And this is typically the way evolution happens. So up here in the pure, beautiful, well, at least purer waters, you pretty much just have one type of allele. But what has then happened, for some reason, is that there has been a duplication of this gene, and the closer you get to New York here, the more they have of the second type of the protein. It's a receptor, and this second receptor for whatever reason is not sensitive to the PCBs, which I guess is pretty important if you live in the Hudson River as a fish. But remember, it might be all of a hundred years, but we certainly haven't polluted the Hudson River for a million years, right? So evolution, and this is definitely structural evolution. You can look at the paper. What's actually pretty nice with these Nature and Science papers is that they're extremely brief. They're three, four, five pages and they're written for a fairly broad audience. It's a beautiful example: there is a second type of the receptor. We don't know its structure, but we do know that there are some key sequence differences, and over the course of not more than a couple of hundred years, this means that you're now resistant to something. So evolution can happen pretty quickly. And as you say, sorry, it's not just that we have a second type. You see, the second type has pretty much taken over, right? The original has died out. So that leads us to another topic, structural stability, which is very much related to what I spoke about for membrane proteins, right? What is it that determines whether a particular protein is stable or not? And you know some of this: it's definitely based on the hydrogen bonds. If you have lots of nice hydrogen bonds in a structure, for instance inside these beta sheets or something, that's gonna be good for you. It's kind of like this art thing: you know a good protein structure when you see it, but that's not quite enough. We can't have too many loops and coils in the interior, but this kind of begs another question. It's not just that we can't have too many. You virtually never ever see them, right? It's extremely rare. Even if you have one helix that then has two, three loop residues and goes on to another helix, that never happens on the inside of the protein. Those loops always seem to occur on the outside. You never have loops in the center. That's kind of strange.
It's reasonable that the loops overall must face water, but it's not just that. It's a bit strange: it makes sense that it costs, well, money I would say, it costs energy to have these defects. But we have 100,000 protein structures; why do you never ever see them? That's kind of strange. It's not just that it's rare. It seems to be forbidden. And as efficient as evolution is, evolution is not 100% efficient. You can also start to think of this in terms of layers. I'm actually gonna show later on that, based purely on probability, you could imagine almost any sequence folding into a simple one-layer protein. But I'm gonna argue that the reason we don't see it is that it's pretty much entirely useless. Why? Why can't it really have any function? If you think of globular proteins in particular, you can have a small blob that's, well, hydrophilic on one side, and then it has to be hydrophilic on the other side too. And that's certainly nice if you like small blobs, but that blob isn't really gonna do anything. Well, technically, I guess it could bind something on the side of the blob, and then when you've bound something on the side of the blob, you've bound something on the side of a blob. Which, from an evolutionary perspective, is pretty hard. It's very hard to do anything with. It's like trying to build a machine from a single piece of metal that you can't really move or do anything with. Two layers, this is awesome. The second you have two layers, you can separate things. You can have an inside and an outside. In a globular protein, this could be like two layers of beta sheets: a hydrophobic pocket on the inside, hydrophilic on the outside. Even a membrane protein you could think of as two layers: you can have a beta sheet in a membrane that's hydrophobic on the outside, hydrophilic on the inside. Even our simple ion channels are almost two layers. With three-layer folds, you can have two cavities. You can have one layer of helices and one layer of sheets, or, sorry, helix, sheet, helix or something. There are certainly lots of ways you can imagine having these three layers, and slightly more complicated structures. These membrane proteins I showed you are somewhere between two and three layers, right? Two layers might not be thick enough in a membrane protein to shield, say, the ion pore from the surroundings, and having slightly more protein helps you shield it better. By the time you get to four layers, you can pretty much forget about it. Why is that? The problem is, in particular if you're thinking globular, that on average we have roughly 50% amino acids that are hydrophobic and roughly 50% that are hydrophilic. And if you start alternating layers, hydrophobic, hydrophilic, hydrophobic, hydrophilic, we're going to get to the point where we're burying hydrophilic amino acids. You could, of course, say that we can have large parts that are entirely hydrophobic. Either those would become membrane proteins, or at some point it's simply going to be unlikely to have that many purely hydrophobic residues in a stretch after each other. I'm going to come back to this probability soon. So five layers, forget about it, sorry. If somebody predicts a single domain to be five layers of protein structure, it's wrong. I'm sure there is an exception, but I can't think of one immediately.
With four layers, changing conformation would be much more complicated. Exactly. The second you want to change, the second you do anything... proteins might appear extremely complicated when you look at them, but when you start looking at the details, and we'll see that after the break, they're actually surprisingly simple. You can't do things if things are too complicated. So a layer would be, say, one beta sheet or a set of helices right next to each other. It's not uniquely defined, but the concept frequently has to do with there being one hydrophobic side and one hydrophilic side or something. It's not a mathematical, unique definition, but that's why you never see these proteins with a billion residues that are one gigantic blob. What nature has to do in that case is fold these as smaller subdomains for stability. And that's how you get things like the nuclear pore complex. The nuclear pore complex, as I mentioned, is like 100 different proteins that are all connected together, but each of them has to fold as a separate domain or they're not going to be stable. So far, we've spent a lot of time in the course conceptually asking how a given sequence can adopt a fold. But given how few folds and how few possible layers there are, perhaps the better question to ask ourselves is: given a fold, because there are so few of them, which sequences in your genome can fit that fold? Because if you just start creating random sequences, the question is which of your random sequences will be able to fit in fold one or two or three, up to 1,000. There aren't really more than 1,000 different ones. So either we fit one of those folds, or this sequence is not gonna be one that folds into a stable protein. And now we start getting into this part: there are gonna be very few sequences that fit folds. The reason for this 80-20 rule or something is that some simple folds can accommodate lots of sequences. You kind of saw that even in the 2D computer lab, I think, and that's likely why 20% of your folds are so common. We see them over and over again, say the four-helix bundle or something. It's an easy way to pack helices. The globin fold, same thing there. Simple things are simple. They don't require very special amino acids. They don't require a disulfide bridge. If something is simple, it's gonna be easy to fold lots of sequences into it, and that's why we're gonna see them more, based just on probability. It doesn't really matter what you need to do: the second you need to do something, the second you say, for this fold to be stable I need a cysteine exactly here, you start reducing the number of options exponentially. For now, you're gonna need to take my word for it. It is exponential; you will see that after the break. And that is what you see, for instance, in the Greek keys. I know I already mentioned this, but the reason why we have this pattern on the urns is exactly the same thing. There aren't that many different ways to have a sequence be periodic in two dimensions without lifting your pen. You could imagine something even simpler where you just go up, down, up, down, up, down. I uploaded the Jane Richardson paper from Nature 1977. And again, by your standards it might be old, but I was only five years old when this paper was published. And it's the same thing.
A simple pattern can accommodate lots of sequences, and that's why we see it. And you do not have to take my word for that. We're gonna show it. Sequence patterns. You know what, let's forget about the fibers and membrane proteins for a while and just look at the globular proteins. And you can imagine that the black dots here are hydrophobic amino acids and the white dots are hydrophilic amino acids. Remember that we already spoke about looking at a helix or a sheet: you can't have too few residues. If you have too few residues, you're not gonna be stable. If you only have, say, two residues that would like to be in a helical shape, you're never gonna be stable as a helix. But based purely on the free energy argument, we said that once you've started a helix, it's downhill in free energy. The more residues you add that would like to be helical, the better the free energy is gonna be. But there is something that's stopping us from having helices that are a thousand units long, and that has to do with probability. So let's look a little bit at that. I'll come to that in a second. What we really wanna get at is why it appears that, for some reason, these defects are rare. You could certainly imagine having a small defect. You could imagine having a proline in the middle of a helix that will lead to a kink in the helix. We do see that now and then. So they're not entirely impossible, it's just that they appear to be extremely rare. Proline in a helix is a good example. You pretty much lose one hydrogen bond, and you kind of destroy the hydrogen bond in the next turn a little bit, so maybe 1.5 hydrogen bonds. Now, that is certainly not free. You're gonna pay five to 10 kcal per mole for that. But what's 5 to 10 kcal/mol between friends? Say you have the nuclear pore complex or something, something with 10,000 residues; that must be completely insignificant to the whole protein, right? But this seems to occur throughout nature. It's always a matter of a single defect, and that single defect makes a fold much less favorable, no matter the size of the entire protein. So there appears to be something here that's independent of size. And we don't really know what that is yet. Let's see. Sorry, it might have been a slide that I skipped here. Sorry. And then, what do you mean by 5 to 10 kcal? It is a lot of energy, but compared to, well, how many hydrogen bonds might there be in a large protein, right? There might be 500 hydrogen bonds. And why on earth is it an Armageddon-style disaster if you lose 499 instead of 500 hydrogen bonds? But seriously, it can't make such a difference, right? They are also 5 to 10 kcal each. No, but obviously, I think you're quite right, because obviously it does matter, right? But at face value, if you had to make a prediction, if you didn't know how many hydrogen bonds there were in this protein, you couldn't say whether we had 499 or 500. For some reason, 499 is completely hopeless, bad, while with 500 you have a beautiful, stable protein. That's kind of strange, right? So obviously, at some point this has to do with the free energies; we're gonna need to think a little bit in terms of enthalpy and entropy. And it's partly related to the fact that if you have something that's very restrictive, so that there are only a few conformations you can adopt, that's gonna be related to the entropy. Then you can probably only accommodate very few sequences.
For instance, if you absolutely cannot have a tryptophan because you're so densely packed, then the second you have a tryptophan, you're no longer going to be stable in that fold. You could also imagine folds that have horribly bad energy because we couldn't form, say, a salt bridge. So it appears that folds that are simple in some way have far more freedom and can accommodate lots of sequences in low-energy states — but so far this is just hand-waving. When you see this, can you think of any statistical expression that might decide whether things are stable or not? And that's an awesome idea, which is completely wrong — I would have guessed it too. It's Boltzmann: we know it's exponential, it has to do with energy and entropy, divided by kT. The only problem is that it doesn't work. The whole axiom behind Boltzmann is that we visit different states, right? You do not visit different states here. There is no equilibrium. There is no detailed balance. Detailed balance means that the flow from one state to the other equals the flow back. Here there is no equilibrium — we don't see both. For some reason we see one type of sequence but not the other. So this is not a Boltzmann distribution. It walks like a Boltzmann distribution, it quacks like one, but it can't be one. Experimentally, it still seems to hold up, though, and that's kind of interesting. So the book formulates this as something called the multitude principle, which I think is a good formulation: the more sequences that can fit a given architecture or fold without disturbing that fold's stability, the higher the occurrence of that fold in native proteins, because there are simply more random sequences that can fold into it. And that appears to hold. Remember the 80-20 rule: some folds are much more common than others, so it's not entirely random. I will come back to this, maybe not until after the break, but the easiest way to get started is with something very simple, so let's look at our helices and sheets. We also know that there are approximately equal numbers of hydrophobic and hydrophilic amino acids. And if you start thinking about these so-called layers: to form layers, we would need one side of the sheet to be hydrophobic and the other side hydrophilic. That means we need some sort of repeat with a period of two: hydrophobic, hydrophilic, hydrophobic, hydrophilic. For a helix, we have on average 3.6 residues per turn, so if you want one side that's hydrophobic and one side that's hydrophilic, we're going to need repeats of roughly three or so. The beautiful thing is that these sequence patterns are simple enough that we can start calculating probabilities. So let's assume we have these small stretches and ask: what is the fraction of non-polar residues? Call it P; it's going to be roughly 0.5. The probability of having, say, eight non-polar residues in a row would then be eight of those P's multiplied together. But if we're going to have exactly eight, I also need something polar before it, with probability one minus P, and one polar residue after it as well. So the probability of having exactly eight non-polar residues is: one polar, eight non-polar, and one polar.
Otherwise it would be nine or ten non-polar residues if we didn't have the polar ones flanking it. So the probability of this occurring we can formulate in terms of P, with r being the number of residues in the stretch. And then the book, loving math, instantly puts up that expression. I'm going to skip through this a little and just give you the overall reasoning. To calculate the average length of such a stretch, we sum the number of residues r multiplied by that weight, and we do that for r equals two, three, four, five, six, and so on. If you only have one of these residues, we don't really have a pattern, so we skip r equals one. Those weights might not sum to one, so we also need to normalize by the sum of the weights in the denominator. You end up with series that run from r equals two to infinity, and if you know your maths you can actually evaluate them, which I'm not going to do here. The book goes through a bit of algebra with a couple of tricks: you shift the sums so they start from one, and then you show that the upper sum is related to the derivative of the lower one. If you love algebra it's a fun exercise, but I'm not going to go through it because it's somewhat beside the point for this course. The result is that the average stretch length is (2 − P)/(1 − P). If P equals 0.5, that is exactly three, so we're going to need repeats in the ballpark of three. And in this case it's not really individual residues; it's the sequence length over which we go from hydrophobic to hydrophilic and back, if that's the pattern we're looking for. So with a bit of hand-waving: for alpha helices, if the repeat is three or four, you get something like 10 to 12 residues; for beta strands, three repeats would be roughly six or seven residues. This is not super important because it is hand-waving, but the point is that it pretty much holds in nature. And the reason I go through it is so you realize what limits the sizes of these elements. On the low side, if the elements are too short, they're not going to form stable helices or sheets — you don't have enough hydrogen bonds. As the element gets longer and longer, it becomes less and less probable to have a very long hydrophobic-hydrophilic pattern that matches a sheet or a helix. It does happen: there are certainly alpha helices in nature, long stretches in receptors and so on, that can be 50 residues. They're just not the norm. Even membrane-protein helices are only about 20 residues, right? But there you have something else going on: lots of hydrophobic amino acids. So on the low side, things are limited by stability; on the high side, things are limited by probability — it's simply unlikely to have that many residues in a specific pattern. This limits the practical freedom we have; we're going to be fairly constrained when it comes to lengths. It turns out you can do this for loops too, and loops turn out to be even shorter than helices or sheets.
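Written out, the average the book computes looks roughly like this (a sketch in my own notation — the weight w(r) of a stretch of exactly r non-polar residues flanked by polar ones, and the mean ⟨r⟩, are labels I'm introducing):

\[
w(r) = (1-P)\,P^{r}\,(1-P), \qquad
\langle r\rangle = \frac{\sum_{r=2}^{\infty} r\,P^{r}}{\sum_{r=2}^{\infty} P^{r}}
= \frac{\dfrac{P}{(1-P)^{2}} - P}{\dfrac{P^{2}}{1-P}}
= \frac{2-P}{1-P},
\]

where the common (1 − P)² factors cancel between numerator and denominator. With P = 0.5 the average uninterrupted stretch comes out as exactly three residues, which is the "ballpark of three" in the text.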
And the point of this is that even a completely random sequence can form one layer. All you need is a small helix — that will happen now and then; it's not that unlikely. Then you need a small turn — that can happen too — and then another helix or a sheet. If you just generate random sequences and throw them in a CD spectrometer, you are going to see some helix and sheet content. If you could determine an NMR structure, you would occasionally even see a helix, then a loop, then another helix or sheet. So it's not that a random sequence will always be random coil. What won't really happen is that they start forming larger structures: they're not going to have multiple layers. So there appears to be something very fundamental between one layer and two layers. And what is that? I already talked a little about it, so it might not be obvious from this slide. In terms of protein structure, what is the difference between a small structure that is just one layer, say a single sheet, and a larger structure that has at least two layers? That's kind of right. But think about what you're doing to at least some amino acids: you bury them. There are going to be some amino acids that no longer face water, and that turns out to be the key difference. If you start looking at this type of stability energetics, it's not just about one hydrogen bond: even a one kcal difference starts to be really bad for proteins — that's in the ballpark of kT. And what we do know, and what certainly is governed by the Boltzmann distribution, is that an individual amino acid's preference for being in water versus being buried in the protein interior (or, equivalently, in alcohol or oil) is very much related to how hydrophilic or charged it is, or how hydrophobic it is. That's what we went through early in the course, because there, for an individual amino acid, the transfer can go in both directions: you have equilibrium and you have detailed balance. You might have 99.999% being either buried or solvated, but at least in theory there is detailed balance — it can sample both states. So on the single-amino-acid level, this works fine. Now let's think about what would happen for a protein if we took a single amino acid and could turn it either outward or inward. Let's pick something simple: leucine and serine. These are two residues of pretty much exactly the same size, where leucine is entirely hydrophobic and serine is hydrophilic. Say we start with the residue on the inside of a well-folded protein and ask how much it costs to move it out into water. For serine, it's pretty much the same: it's relatively happy on the inside, but it's also relatively happy in water. Leucine, on the other hand — if we need to turn leucine out to face water, we're going to pay a lot, at least two kcal or so, because leucine hates water; it's very hydrophobic. But what this means is that every fold where serine is happy on the inside should also work with leucine, because if I move leucine from the outside to the inside, I gain those two kcal, right?
For a second, let's ignore whether these residues interact in bad ways on the inside and just look at whether they are exposed or not. For serine it doesn't really matter: it's fine in my protein. Leucine is going to be even better off. So a fold with serine inside will also work for leucine, and a fold with leucine inside will work for even more sequences, because there are more sequences it can stabilize. So let's separate this into two parts. One part is the stability of the rest of the protein — call that delta F. And now let's go back and look at the stability of the whole protein, going from the unfolded to the folded state. I can think of the total stability as the stability of the rest of the chain, delta F, plus some small delta epsilon that is just the contribution of one specific residue. Do we know anything about this? I need to draw here. So now we come back to the physics: delta F plus delta epsilon is the stability of my protein. You must know something about that, right? Assuming this is a real protein that can and does fold, that sum must be negative — otherwise it would not fold. So we already know something: this protein is going to be stable as long as delta F plus delta epsilon is smaller than zero, i.e. as long as delta F is smaller than minus delta epsilon (just subtract on both sides). If you now introduce a mutation, the protein can still fold as long as delta epsilon for the mutation fulfills that condition: I may have introduced a perturbation, but the perturbation is not so bad that it cancels the stability of the rest of the chain. The second that is no longer true, the entire protein is not stable and we are unfolded. And this follows from assuming only very simple things, without knowing any details of the protein. So the question really becomes: how many sequences are there for which this is true? If it's true for a sequence, that sequence is still going to be stable; if the mutation does not fulfill it, it's not going to be a stable protein anymore. And here things become a little more complicated, but not a whole lot. We don't really know anything about delta F, the stability of the rest of the sequence. If you look at random sequences, what is its distribution? No — why would it be Boltzmann? It's certainly a good guess in this course; if you don't know anything else, guess Boltzmann, I would do that too, but in this case it's not true. This is just a random distribution; we don't know exactly what it is. What is right is that for each individual amino acid the contribution is random, and delta F is the sum over the entire sequence. So it's going to be a sum over lots of random terms — that's all we know about it. Now, if there is one thing you know about a sum over lots of random terms, what will its distribution end up being? Normal. Yes.
And it becomes normal in the limit of an infinite number of terms, which in physics tends to mean around five or ten. So all we know is that we have some sort of Gaussian distribution: over sequences there is an average value of this stability delta F, and there's going to be some standard deviation sigma. For now they're just an average and a standard deviation; we have no idea what they are, and it turns out we don't really care. If you remember your mathematical statistics — I'm actually not sure we require you to have taken mathematical statistics — for a random distribution, every point along this curve is a possible outcome, and the probability of falling in a given region corresponds to the integral, the area under the curve. So all I want to know is: for what sequences is this true? For what sequences is delta F smaller than minus delta epsilon? Delta F is the variable with the random distribution, and minus delta epsilon is some sort of constant. So what is the probability of being somewhere on the curve below this constant? Forget about the sign of delta epsilon for now; this is just an integral of the random distribution from minus infinity up to that constant, whatever it is. And you can probably already see that that's a fairly small area under the curve, right? Well, that's just my reasoning — why shouldn't the area go all the way up there? It's a bit of hand-waving. So there are two possibilities for this minus delta epsilon: it can be smaller than zero or larger than zero. If you think about it, delta epsilon was the stabilization energy of this individual amino acid. If minus delta epsilon is smaller than zero, then delta epsilon is positive, and that's a bad change; if minus delta epsilon is larger than zero, it's a good change — with the mutation you're making, you're actually improving the sequence. So we can already see that if you introduce a random mutation, more sequences stay stable when it's a good mutation than when it's a bad one. That alone doesn't tell you a whole lot; for now we have no idea where this limit sits in the plot. But the beautiful thing is that we know it's a Gaussian, and a Gaussian is really just an exponential function. As long as we know there is some average of delta F and a standard deviation, we can just start integrating, even without knowing where the limit is. The book spends a couple of pages on this; it's not difficult, you don't need a whole lot of mathematics. And it turns out you get a large constant times an exponential raised to minus delta epsilon divided by sigma squared over the average of delta F. Remember that we're integrating over delta F here, so minus delta epsilon is just a constant; from the point of view of the integration, it's completely irrelevant what delta epsilon is. And then we have sigma squared — sigma is the standard deviation, so sigma squared is the variance.
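Spelled out, the step the book does here looks roughly like this (a sketch in my own notation, treating delta F over random sequences as Gaussian with mean ⟨ΔF⟩ and standard deviation σ, and taking ⟨ΔF⟩ as a positive characteristic energy; the last step keeps only the term linear in Δε when expanding the exponent):

\[
P(\text{still stable}) \;=\; \int_{-\infty}^{-\Delta\varepsilon}
\frac{1}{\sqrt{2\pi}\,\sigma}\,
\exp\!\left[-\frac{\left(\Delta F-\langle\Delta F\rangle\right)^{2}}{2\sigma^{2}}\right]
\mathrm{d}(\Delta F)
\;\approx\; C\,
\exp\!\left[-\,\frac{\Delta\varepsilon}{\sigma^{2}/\langle\Delta F\rangle}\right],
\]

where C collects everything that does not depend on Δε. The combination σ²/⟨ΔF⟩ has units of energy and plays exactly the role that kT plays in a real Boltzmann factor.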
And delta F here means the average value of the stabilization energy of the chain. The beautiful thing is that it doesn't matter exactly what sigma is or what the average delta F is. If you want to calculate a specific value, it matters, but all we really care about now is that there is some sort of large constant in the denominator, and that delta epsilon sits in the exponent. So what this shows you is that it looks like a Boltzmann distribution, right? An exponential raised to minus some energy difference — the stabilization energy of this individual residue — divided by some complicated expression, which we can call kT for now. It has to have units of energy, so let's just treat it as a constant characteristic energy. So this looks exactly like a Boltzmann distribution. It's related to the Boltzmann distribution in the sense that the individual energies come from one, but it's really an approximation from the central limit theorem applied to Gaussian-distributed random energies — it's not strictly a Boltzmann distribution. What it means is that if we increase delta epsilon by just a tiny amount, the number of sequences that remain stable — the probability that the sequence stays stable — goes down exponentially. So if we're talking about something fairly simple, say one hydrogen bond, five kcal, let's call it six because that makes the numbers easier: six divided by kT, which is about 0.6, so you're talking about e to the minus ten. We just lost a factor of roughly 25,000 in the number of sequences that can be stable in this fold. If you lose two hydrogen bonds, it's 25,000 squared. It might not look like a lot on the slide, but it goes down exponentially. I have one or two more slides and then we'll take a break. Now, a fair objection would be: but what about the size of the protein? Can't all these complicated expressions somehow make this depend on the size of the protein? Because here I kind of assumed that the denominator was some sort of constant. So let's look at that. Delta F, the average stabilization energy — how does that change with the size of a protein? Yes, or it should at least: to a first approximation it's reasonable that the total stabilization energy of the entire chain is proportional to the size of the protein, right? And the variance? Well, the standard deviation — the fluctuation — of a random sample goes as the square root of the sample size; that's a fairly fundamental result in statistics. So if sigma goes as the square root of the size and you square it, sigma squared is roughly proportional to the size of the protein, and delta F is proportional to the size of the protein. So no matter how large your protein is, this ratio is pretty much constant. Sorry, is it? Is delta F roughly the same for different proteins — are they all stable by about the same five kcal? Good point, thank you — but now we're talking about two different things. One is the total stabilization energy of the entire chain; the other is the question of whether, if you introduce a small mutation in one amino acid, the protein will no longer fold. Those are two different things.
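If you want to play with the numbers yourself, here is a minimal Python sketch (the per-residue mean and spread are invented illustration values, not measurements) showing both the exponential loss of stable sequences as the defect grows and the fact that σ²/⟨ΔF⟩ does not depend on chain length:

    import numpy as np
    from scipy.stats import norm

    def stable_fraction(n_res, mu=1.4, s=1.0, defects=(0.0, 0.5, 1.0, 1.5)):
        """Fraction of random sequences with dF < -defect for a chain of n_res residues.
        mu, s: hypothetical mean and std of the per-residue random energy (kcal/mol)."""
        mean_dF = n_res * mu            # <dF> grows linearly with size
        sigma = np.sqrt(n_res) * s      # sigma grows as the square root, so sigma^2 ~ size
        kTc = sigma**2 / mean_dF        # characteristic energy: independent of n_res
        print(f"n = {n_res:4d}, kTc = {kTc:.2f} kcal/mol")
        for d in defects:
            p = norm.cdf((-d - mean_dF) / sigma)   # Gaussian tail: P(dF < -d)
            print(f"  defect {d:3.1f} kcal/mol -> stable fraction {p:.2e}")

    for n in (50, 100, 200):
        stable_fraction(n)
    # Each extra 0.5 kcal/mol of defect cuts the stable fraction by roughly exp(-0.5/kTc),
    # and kTc comes out the same no matter how long the chain is.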
So when we talk about the stabilization energy of a protein: if you have two helices in the same construct, their total stabilization energy will, to a first approximation, be twice that of a single helix, right? But now we're rather asking, for a given sequence, how likely it is to fold — which is a slightly different question. Look it up; this is a great question, Eloy. If you look at calorimetry, for example, the energy you need to denature a protein is roughly proportional to the size of the protein. That's experimental, and it's kind of like water: it takes twice as much energy to boil twice as much water. So the strange thing is that this sigma squared divided by the average delta F plays the same role as kT — but it is not kT. I think this is a great place to take a break; I'll continue with this slide afterwards. We have found something new that looks like kT but isn't quite kT: some sort of characteristic stabilization energy — and that, Eloy, is what you were looking for — a characteristic energy in a protein that appears to be very small and independent of the size of the protein. Think about that. It's 10:34 now; should we meet back here in 25 minutes, just before 11? So, before the break I got to the point where we have something that looks exactly like Boltzmann statistics, and mind you, we are going to use it as if it were Boltzmann statistics. But somewhere you should have a slightly bad conscience if you call it that, because it doesn't come from detailed balance; it's an approximation to the way the stabilization energy is distributed. This strange thing, sigma squared divided by the average delta F, looks exactly like kT, and it is what decides things — and, as we also showed before the break, it's independent of system size. If the energy defect associated with your mutation is smaller than this characteristic kT-like quantity — call it kTc, with Tc a characteristic temperature — you're fine; if it's larger, it's likely to destabilize your protein, through the same exponential relation. And this is important because, even if we have no idea what it is a priori, we can measure it: start introducing random mutations in a sequence and ask, does it fold or not? Do that for every position, try all 19 other amino acids. We at least have a rough idea what delta epsilon should be based on whether a hydrophobic or hydrophilic amino acid was swapped in, right? So we can collect tons of statistics: how large a distortion do I have to introduce before the protein starts to unfold? That way you can actually measure the term, and it turns out to correspond to an energy in the ballpark of 0.7 to 0.8 kcal/mol, which would be a characteristic temperature around 350 Kelvin. This is not the temperature at which you conduct your experiment; it's a constant. But it makes sense in lots of ways, because what does 350 Kelvin correspond to? About 75 to 80 degrees centigrade, right? And that's roughly where proteins start to denature. So this fits beautifully with everything we see about proteins.
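As a quick sanity check on those numbers (using the gas constant R ≈ 1.99 × 10⁻³ kcal mol⁻¹ K⁻¹):

\[
k T_c \;\approx\; 1.99\times10^{-3}\ \tfrac{\mathrm{kcal}}{\mathrm{mol\,K}} \times 350\ \mathrm{K} \;\approx\; 0.7\ \tfrac{\mathrm{kcal}}{\mathrm{mol}},
\]

which is indeed the 0.7 to 0.8 kcal/mol range quoted above, and 350 K is about 77 °C — right where many proteins start to denature.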
And it appears to explain why the actual stabilization energies we have in proteins are so small. This is based on a fairly deep result called the Shakhnovich–Gutin theorem, which the book loves, not least because of its connection to Alexei Finkelstein's group. Don't bother too much with the name. The point is that energy defects should be compared to this characteristic energy kTc, with Tc around 350 K, rather than to the energy of the entire protein, no matter how large the protein is. And this is another way of looking at protein sizes. If you have 1,950 residues, the probability that at least one of them is mutated or not ideal starts to be fairly large, right? And if that happens, you have to throw away that entire protein — and if it's bad enough, the organism dies, because it won't be able to produce whatever that protein did. On the other hand, if you have lots of small proteins, 50 to 100 amino acids or so, it's much less likely that any one of them carries a large defect. That's indirectly another constraint on protein size: they can't be too small, because they need to be large enough to do their job, but once proteins become too large, we run into the likelihood of randomly acquiring defects, and it's simply very improbable to produce only perfect proteins throughout your body. What we have not shown so far, but what you can do experimentally, is to measure this characteristic energy and compare it to the random distribution of all possible energies — and that is when you see how extremely small it is. The spread of delta F would correspond to temperatures of thousands of Kelvin or so. Again, these are experimental results that we back-interpret using our equations. So based on these results, we know that the likelihood of a random sequence — or even of a sequence with a single defect — still giving a stable protein is fairly low. Even very small defects can lead to horrible things. And if you think of an entire protein with 1,000 residues, each and every one of those residues is a potential defect, right? So the probability that a random 1,000-residue sequence will fold is almost zero. We will talk more about that on Thursday, but it turns out to be the case: those probabilities are 10 to the minus 10 or something, so virtually never. And again, this is about free energies, not energies; many of these things have to do with packing. Left- versus right-handed sheets, which we talked a bit about, or a proline in an alpha helix, are examples: they can happen, but they're going to be bad. This also means that structures with more conformational freedom, if those structures are nice and happy, can absorb many mutations without them really being defects, because the structure can still accommodate them. And if a structure can accommodate those changes, they're not really defects — delta epsilon might even be stabilizing, or close to zero. That's why we see these simple structures in so many different proteins. And at the end of the day, it means they're going to be likelier in this probability distribution. So it's exactly the same quasi-Boltzmann effect that explains the multitude principle as before.
Quasi-Boltzmann, you can call it. There's a beautiful, simple example. Have you thought about all these structures you've seen? They virtually always have alpha helices on the outside and beta sheets on the inside; you hardly ever see a structure with the beta sheet on the outside and the helix on the inside. And you can account for that with entropy. Take a long sequence of length n and put a helix somewhere in it; assume n is large, so the helix is a small part of the sequence. To a first approximation, there are roughly n different places we can put that helix. But if, instead of one helix, we use two small beta strands: the first strand can go in roughly n places, and the second in roughly n places as well — say n over two — so that's roughly n squared divided by two. So for short segments such as beta strands, there are many more ways to arrange them on the inside than for one long helix. That's why we typically have the large elements, helices, on the outside and the small elements, beta strands, on the inside: sheets contain fewer residues per unit length, while a helix packs more residues. As always in biology, there is no rule without an exception, and the exception here is a cute little protein called GFP. Have you heard about GFP? The green fluorescent protein consists of one of these beta barrels with a roughly helical segment on the inside. It absorbs blue or ultraviolet light and then, typically, emits green light. This is a super cool protein that we and other groups use in tons of cases; much of the fluorescence we observe in membrane-protein insertion experiments is based on GFP. The reason you get fluorescence is a small chromophore, a short chain on the inside that can adopt different conformations; if you replace it with slightly different chains, you can get the protein in lots of different colors — blue fluorescent protein, green, purple, yellow, you name it. And apart from being very fun at the department Christmas party, you can use this for some pretty cool stuff. There are animals that use this — lots of animals fluoresce, either because it's related to the way they produce energy or because they use it to scare other animals — jellyfish, for instance; this one is from Monterey Bay, probably. They use a protein very similar to the original GFP. I'm not sure where it was first discovered, but it was really explored by the group around Roger Tsien, who asked: can we tune these things, and can we then get these proteins to bind to different tissues? First they mutated these proteins into a whole range of different colors. This is actually a small petri dish with different protein variants — do you see all the different colors? You can tell from the light alone which protein you have where. And this might be more fun for the Christmas party.
But the cool thing is what happens if you combine these proteins with other domains — say something that looks almost like an antibody, something that binds a specific type of tissue. Binding a specific tissue is easy; there are several proteins like that, but on its own it doesn't really help you. If you hook the part that binds a specific type of tissue up with something that has a specific color, you now have proteins that let you color things based on tissue type. You can make fluorescent peptides, for instance, that highlight nerves: you add these peptides and then you can see what is a nerve. Roger Tsien got the Nobel Prize in, what was it, 2008, if I recall correctly — not so much for this, but for the discovery and development of these fluorescent proteins. Could you use this for something? This is fairly new stuff, from around 2011, and it has been used experimentally in surgery. If you're a surgeon, this is how you would see some tissue. There are some tumor cells here, and you probably can't see which cells are tumor cells, right? So what you would do as a surgeon is cut, and cut a lot — the more the better — just to be safe, because you absolutely don't want to cut straight through a tumor and leave half of it. So you will, by definition, cut too much: you're going to cut away some healthy tissue, and sorry, there is no way around it. It's better to cut too much and have some side effects so the patient lives than the alternative, right? Or you could use these peptides, turn on the ultraviolet light, and then you see the tumors. Well, you don't really see the tumors — you see the fluorescent proteins bound to the tumors; they carry a domain that binds some protein that is overexpressed in the tumor cells, I have no idea exactly which. But that's pretty cool: suddenly you've turned an extremely difficult job into something almost a five-year-old could do — cut away the green stuff. But there's a problem: right in the middle, you probably don't want to cut exactly at the green stuff; you want some margin. And what if there are important nerves here? This might be very close to your spine or something. You want to cut enough, but you don't want to cut too much. The cool thing is that you can color the nerves too. So you can basically say: cut the green stuff, but not the yellow stuff. You have one dye that binds the tumor and another dye that binds the axons. They've commercialized this, of course, and it has been used in actual surgeries, at least experimentally. I'm not sure how far along it is on the market — you can definitely buy the dyes — but I bet this is something that's going to be used throughout the world in ten years or so. What I want to show here is that these discoveries are maybe 15 years old, the Nobel Prize came less than ten years ago, and this is used in surgery on patients today. So the distance between very fundamental groundwork — what wavelength gets emitted, depending on the surroundings inside your protein — and the clinic is short.
Very simple physical chemistry can have some pretty far-reaching implications. Good — that is pretty much what I'm going to say about stabilization. But since I have a little more time than I expected, I'm going to move into chapter 17 and do one part of it, which has to do with molecular interactions and transitions. The reason for starting this now is that on Thursday we'll start at 9 a.m. with the structure seminars, as always, so I will only have two hours for the actual lecture. If we do this part now, I'll have plenty of time for the rest on Thursday. And while I remember it, before I jump in: I've also talked to Marta, and if you want to, we can plan a study visit to the cryo-EM facility, which is finally up and running again. There are two times that work: either Wednesday before lunch, or Thursday after the seminar by Thomas Cheatham, at 1 to 2 p.m. What works best for you? Do you want to do this on Wednesday, or do you prefer to have all of Wednesday off and do it Thursday afternoon? Thursday afternoon, good. I'll send Marta an email and add it to the course schedule. It's completely voluntary — I'm not going to ask anything about it on the exam — but it might be fun to see a new, state-of-the-art facility working with these things. So: this far, all the proteins we've looked at were single proteins. What is the stability of one protein? What determines whether one protein is folded? But in many cases we have structural transitions, and you might also have one protein binding to another protein. In principle, those are the same type of interactions, and the good thing is that all the same theory holds. If you have an interface between two proteins, whether those two proteins bind is decided by exactly the same principles: one small defect can destroy a complete binding interface, just as it can destroy the stability of a protein. No new theory needed. If a protein moves between two states — say the open and closed states of an ion channel — it's the same thing there. It's very easy to make one of these states much more stable than the other, and that's why very small changes, like changing the voltage across a membrane, can suddenly make the open state more stable, or why binding one small ligand to a gigantic protein can still cause the entire protein to open the channel. This is used throughout your body for some very important functions. The field has gone through a number of different models for how proteins interact. A long time ago there was a hypothesis, going back to Emil Fischer, called lock and key. Lock and key is a super simple model: you have some sort of substrate — another protein or a small molecule — that should bind to a large receptor (it doesn't need to be a membrane protein; it could be anything), and the way it works is that you literally have a lock and a key. If the key fits the lock perfectly, they're happy and bind: you reduce the exposed hydrophobic area or something, and you get a very stable, nice complex. You could even imagine something like hemoglobin binding its heme group that way.
It appears to fit this really beautifully. The only problem is that it is wrong. So that is lock and key, the traditional model for an enzyme-substrate complex. What people found out fairly early, in particular with hemoglobin and myoglobin, is that hemoglobin has one shape without the heme group bound — which you call the apo structure — and another, the holo structure, when the heme group is bound. To a first approximation they look the same, but there are small differences, so it appears that the heme group somehow pushes on the protein and changes its shape a little. That is what you call induced fit — Koshland's model: you have an enzyme that roughly fits, but it's only when the substrate binds the receptor or enzyme that you form the full complex, with the enzyme having adapted itself to the substrate. The induced fit model has been the dominant model for the last 30 years, in particular when it comes to docking, which we're going to do next week — you'll do a small lab on it too. That is the traditional view of the world in docking, and it works fairly well. The only problem is that it's wrong too. That's not how it works either, because based on what you know, changing the shape of the protein would take a lot of free energy, right? That free energy would somehow have to come from binding, so this would likely be a very slow process. As we started to see more and more simulations, we have very much moved over to a view of the world where the protein or enzyme, in its unbound form, naturally breathes: sometimes it has, say, a square hole, and a little later, maybe a microsecond later, it has changed to the other shape. So you have an equilibrium between these two conformations. It might spend most of its time in one and, say, 10% of the time in the other. So rather than first binding something and then forcing the protein to adapt its structure, what likely happens is that the conformations are continuously interconverting — maybe 99% of the time in one and 1% in the other — but when the protein happens to be in the right one, this small red molecule, the ligand, can bind, and once the ligand has bound, we're locked in and suddenly spend 99% of the time there instead. This model is called selected fit, or conformational selection. The whole point is that the protein always visits multiple states; binding just changes the relative stability of the states. In theory you might still move back, but once you have bound the ligand, you have changed the stabilities so much that you spend 99% of the time in the ligand-friendly state and only 1% in the other. Do you see how much better this fits with statistical mechanics — that it's a distribution of states? This is the picture you should think of in the future. We might still do the docking lab the old way, but we should have a slightly bad conscience about it. And the reason this matters is that we can predict much cooler things. For instance, do you remember those ion channels I showed you, the ones where part of the protein can move up or down? I'll show you some research results on that next week, but it turns out we can stabilize them in one state by binding a toxin — but you can't bind that toxin in the wrong state.
You have to wait until the channel is spontaneously in the up state, for instance; then the toxin can bind, and then we have locked it into the up state. It's not the toxin itself that forces it up. So this rhymes much, much better with modern experimental data. And it's related to the question I think you asked before the break, that proteins need to be simple because they need some sort of simple motion. We see this. Here is a small protein, a glutamate binding protein. This is ten-year-old work from when we developed methods to predict docking, because this situation happens frequently. It might look like a mess to you — it might be a mess; let's see if I get it right. The red structure here is one structure of the protein. It's a glutamate binding protein — do you have any idea what it might bind? Yes, the yellow molecule, glutamate. The red structure is the form without glutamate bound, which is called the apo structure, and the blue structure is the structure with glutamate bound, which is called the holo structure. So the yellow belongs to the blue, and the red structure was determined without it. There's a fairly large change between the structures, something like six ångström RMSD between them. And this is not an extreme example — it does happen; it happens with proteins that bind DNA, for instance, that they change a lot upon binding. You can see that the topology of this small protein is essentially one half here, just two small beta strands in between, and then the other half of the protein on the right. And you can show, if you simulate it, that this protein breathes all the time; it opens and closes. That's actually good, because there's a fairly narrow, small cavity in here — if it were always in the closed state, the glutamate would never be able to get in. The fact that it opens and closes gives you a fairly open cavity. But if you happen to have a glutamate bound on the inside when it closes, that stabilizes the closed state a lot, and suddenly the closed, glutamate-bound structure — the blue one, let me get the colors right — becomes the more stable state. Now you've selected: the glutamate has helped select the fit. That appears to be the way most proteins bind, and it works fairly well for predicting binding. For a molecule as small as this we could even have done it in a computer simulation, but what we did here is much simpler: you can use very simple computer programs and, just based on the shape of the protein, predict that this part has to move relative to that part. Then, with a web server run that takes five minutes, you can predict where the binding site should be and roughly how the ligand should bind — it works remarkably well. And what we did in this particular work was to use the red structure: knowing only the red structure and where the binding site is, can we predict what the blue structure should be? Which is important, because there are a whole lot of cases where we have one of the structures in the Protein Data Bank but not the other.
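A minimal way to put numbers on this selected-fit picture (a Python sketch with invented free energies, not data for any particular channel or enzyme): two conformations in equilibrium, and binding simply adds a favorable term to one of them, which is enough to flip the populations.

    import numpy as np

    kT = 0.6  # kcal/mol, roughly, at room temperature

    def populations(dG_open, dG_bind=0.0):
        """Boltzmann populations of a two-state (closed/open) protein.
        dG_open: free energy of the open state relative to closed (kcal/mol).
        dG_bind: extra stabilization of the open state when the ligand is bound."""
        g = np.array([0.0, dG_open + dG_bind])   # closed, open
        w = np.exp(-g / kT)
        return w / w.sum()

    # Without ligand the open state is rare; a -4 kcal/mol binding term inverts that.
    print("apo :", populations(dG_open=+2.0))    # roughly 97% closed, 3% open
    print("holo:", populations(dG_open=+2.0, dG_bind=-4.0))   # roughly 3% closed, 97% open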
I have a question. This model implies that the protein can move between the state in which it is capable of binding the substrate and the one in which it is not. And this is an equilibrium, and you said that the binding-competent state is probably less populated, maybe 1% or 10% of the population. But then wouldn't we have seen this much sooner in basic enzyme kinetics experiments? Because the effective concentration of the enzyme should then be only a fraction — like 10% or 1% — of the nominal concentration you put in. So when I said 1%, that is of course just a number; you should rather think that these numbers can be 10 to the minus six. You can have extreme stabilization effects, and if they're large enough, it goes from nothing to all. Technically it is an equilibrium, but the equilibrium distribution can be so extreme that you only observe one of the states. My point is that in enzyme kinetics experiments, when you want to estimate, say, the catalytic rate, you always depend on the concentration. Yes. But the point is that you haven't specified which enzyme molecule the ligand is going to bind to, right? The more enzyme molecules you have free, the more enzymes are available to bind to. And in reality these molecules also have diffusion rates: it takes a finite time for the ligand to diffuse into the site, and in many cases it will diffuse out again because it doesn't bind perfectly. So the more enzyme you have, the more enzyme molecules the ligand will visit, and the larger the probability that one of them gets locked in. There is another very famous example of these structural transitions: hemoglobin. What you're seeing here is a small movie — actually it's not a movie, it's just an alternation between two different states. Remember what we spoke about last week: hemoglobin and myoglobin are similar, but myoglobin has one subunit and hemoglobin has four. That's not a coincidence. Myoglobin sits out in your muscles, but hemoglobin is in your blood, and the problem is that hemoglobin has to work in two, or even three, environments. The first is when it's just carrying oxygen in the blood: there you want the oxygen to stay bound at equilibrium, neither releasing nor picking up more. The second environment is your lungs: there you want hemoglobin to bind more oxygen, to take oxygen up, right? But in the third environment, once you reach the muscles, you want hemoglobin to donate its oxygen to myoglobin. And this is the problem, because for the first two environments, the better hemoglobin binds oxygen, the better — but that goes exactly against the third need: the better hemoglobin binds oxygen, the worse it's going to be at delivering oxygen to myoglobin. In the best case, to a first approximation, if they were equally good, you would not get more than 50% efficiency: half the oxygen would stay on hemoglobin and you would keep transporting 50% of your oxygen back to the lungs. That would be horribly inefficient; there's no way nature would do that.
So what people discovered fairly early with hemoglobin — and this was long before you had structures — is that there appear to be two different states, which you call tense and relaxed. You occasionally see this drawn as squares and circles in books; I think the book does it too. What it seems to correspond to is that you get one of these states in the deoxy case, an environment with very little oxygen, and when you change over to the oxygenated environment, with high oxygen pressure, you get the other state. Based on modern structures, you can see that there is a slight difference: the four domains rotate a little relative to each other, right? But it's just a tiny rotation. It also appears that the heme group is flat in one case but a bit distorted in the other. And this matters because hemoglobin, as a molecule, is actually not that great at binding oxygen on its own. By default, at very low oxygen pressure, hemoglobin's affinity for oxygen is somewhat low, so the saturation stays low. As the oxygen pressure goes up, you eventually start binding some molecules — not particularly happily — but when an oxygen binds, you first change the shape around that heme group, which causes that part of the protein to change shape a bit, and then the entire protein starts to move over to a conformation that likes binding oxygen better. So you kind of wake hemoglobin up: once the first oxygen has bound, it's slightly easier to bind the next one. The more oxygen that has bound, the more it likes to continue binding, up to four, one per subunit. Myoglobin has nothing like this; it has a completely normal saturation curve: when nothing is bound it's really good at binding, but as you approach the point where almost all myoglobin molecules already have an oxygen, it gets harder and harder to load more. So if you look at these saturation curves, you have one classical curve for myoglobin, while the hemoglobin curve is pretty much the opposite of what you'd expect for normal saturation: initially, when nothing is bound, it's hard to start binding, but as it approaches saturation, hemoglobin just keeps getting better and better. So what does the body do with this? When you are in your lungs — what is the oxygen pressure there, and how well will hemoglobin bind oxygen? Right, the oxygen pressure in the lungs is high, so in the lungs we're out here on the curve, and hemoglobin loves to bind oxygen. But as you move to the muscles, the oxygen pressure is low and we're down here. If you're down here and bound to the red curve's protein, it would be far better to be bound to the blue curve's protein, right? So the cool thing: when you're in the lungs, hemoglobin steals oxygen, which it should, and once you're out in the muscle, hemoglobin donates its oxygen to myoglobin.
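If you want to see where those two curve shapes come from, here is a small sketch using the Hill equation (the half-saturation pressures and the Hill coefficient are textbook ballpark values, not fitted data): myoglobin follows a simple hyperbola, while hemoglobin's cooperativity is often summarized with a Hill coefficient of roughly 2.8.

    def saturation(pO2, p50, n=1.0):
        """Fractional saturation versus oxygen pressure (Hill equation).
        n = 1 gives the simple hyperbolic, myoglobin-like curve."""
        return pO2**n / (p50**n + pO2**n)

    for p in (5, 20, 40, 100):                   # oxygen pressures in mmHg (muscle -> lungs)
        myo = saturation(p, p50=2.8)             # myoglobin: non-cooperative
        hb = saturation(p, p50=26.0, n=2.8)      # hemoglobin: cooperative
        print(f"pO2 {p:3d} mmHg: myoglobin {myo:.2f}, hemoglobin {hb:.2f}")
    # At low pressure (muscle) myoglobin is nearly saturated while hemoglobin lets go;
    # at high pressure (lungs) hemoglobin loads up almost completely.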
So this gets you exactly the effect you need: a binding affinity for oxygen that is high in the lungs, where there is plenty of oxygen, and low in the muscles, where you want to donate your oxygen to myoglobin. This is a classic example of what you call allosteric modulation: some other, external factor — a third party, if you like — regulates the primary process, in this case the binding affinity. In this particular case it's the MWC model, Monod–Wyman–Changeux, who worked this out at the Pasteur Institute. There is actually still quite a lot about hemoglobin we don't understand; it's a remarkably fun molecule, and you can actually simulate the oxygen binding and unbinding. Allosteric modulation is an extremely broad concept that occurs throughout nature. Those ligand-gated ion channels — I haven't told you about them yet, but I will try to — are very sensitive to allosteric modulation. The thing that actually opens such an ion channel is a neurotransmitter, always a neurotransmitter. So if you followed my advice and did the experimental test and had some alcohol this weekend: that alcohol did not open your ion channels. Alcohol can't open the channels. But alcohol acts as an allosteric modulator: with a high concentration of ethanol in the membrane and around these channels, the channels appear to become more sensitive to the neurotransmitter. The alcohol itself can't open them, but it acts a bit like a transistor, increasing the sensitivity to the neurotransmitter. I'll spend a lecture on that and all the research around it. So, the last transition we're going to cover — and I have about ten slides remaining, so we'll finish on time — is folding and denaturation, because these too are transitions, plain transitions. You might not like the denatured state; we might think it's functionally hopeless, which of course is true, and you might say it's not really a protein. But from a physical point of view, denaturation, going away from the folded protein, or folding, going toward it, is just a structural transition. The denatured state might correspond to lots and lots of microstates, but it's a plain transition that we can treat like any other. We can look at it either through thermodynamics, the free energies, or through the kinetics; I'm going to save the kinetics for Thursday's lecture. And we can observe these transitions in lots of different ways. The classical way to kill a protein — well, there are two ways. You can certainly boil it; the problem is that when you boil it, bad things can happen that make it difficult to unboil. It's not entirely easy to unboil an egg, for instance. Or you can use guanidinium chloride at a very high ionic concentration — note that this is not millimolar, this is molar: one, two, three, four molar. At a four molar concentration of guanidinium chloride or urea, the fluorescence here drops extremely quickly: you have pretty much destroyed your protein, it's unfolded. The neat thing with these salts is that you can usually purify the protein again — remove the salt — and then you see the protein refold. So it's a very neat way to study folding transitions. And you virtually always see that type of S-curve: virtually nothing happens, virtually nothing happens, and then in a fairly narrow range everything happens, and then nothing happens again.
That is usually a very strong sign of a highly cooperative transition, which connects to phase transitions: it's an all-or-nothing transition, in the sense that if you only make minor changes the protein is stable, but then suddenly everything goes over. It's not that each protein becomes a little more denatured, and a little more, until eventually all proteins are fully denatured. What happens instead is that first all proteins are stable, and then we reach the point where one protein is completely denatured, then two proteins are completely denatured — it appears to be one protein at a time that is completely destroyed. And again, the classical way of doing this is with salts; you can certainly do it with pH too. It's easy to destroy things with temperature, but what happens with eggs is that you usually get bad things happening with the disulfide bridges and other groups. Technically you can actually unboil an egg, but it's not easy, and you can't do it just by lowering the temperature again. You already know the most famous person behind these experiments: Christian Anfinsen. And one thing you will realize is that a whole lot of these seminal, amazing papers have incredibly boring titles — "Reductive cleavage of disulfide bridges in ribonuclease" doesn't exactly sound like a Nobel Prize, right? But it is, because it gets at some very simple, basic properties of protein structure, the disulfide bridges in particular. What he really showed is that by removing the denaturant or changing the pH back, you can actually get your original protein structure back. Nobel Prize, 1972. Today, by far the most common ways of looking at this are CD spectroscopy and calorimetry — and remember, this is the reason I showed you all those curves about thermodynamics and phase transitions earlier. These are not gradual transitions. It's not entirely obvious from the experiments, but I'm going to argue very strongly that these are all-or-none transitions: a protein is either native or denatured. Proteins do not like to be half denatured. And I'm at least going to argue that we do not see semi-folded states, period. Depends on — it could be a sort of complete fold, just not...? So first, I would say that's extremely rare. If a mutation starts to destroy a protein — remember, delta F plus delta epsilon — in general you destroy the protein. But this couples to another concept: what is the folding unit? You could certainly imagine that a large protein, a long sequence, has one domain here and another domain there, so a mutation might destroy one domain while the other is still stable. That's one part of it. The other part is that we might introduce a small defect where delta epsilon is very small, so that yes, we pay for it, but we pay so little that the protein can still be stable. But in general these mutations are bad enough that you completely destroy the protein. And you can measure this in a very simple way: big insulated vessels where we measure how much energy I have to put in as I increase the temperature, and how, for instance, the heat capacity changes, because there we see these transitions. At different pH values, I get the transition at different places.
And again, a change in heat capacity is usually an indication that something happened. But that doesn't really prove that it's cooperative. It doesn't prove that it's a phase transition. It just means that something happened in a fairly narrow range. So this is exactly the question you were asking, right? Do I know what happens when it unfolds? Do I unfold or destabilize the entire protein, or do I unfold or destabilize one helix, or one helix turn, at a time? Well, we don't know. Let's try to calculate it.

From the calorimetry it's very easy to calculate the specific heat, right? That is, how much energy do I need to add per mole of protein when I'm unfolding it? So I want to know: what is the specific heat per melting unit? Is a melting unit roughly the size of the protein or the domain? Is it much larger? If this melting unit is equal to the full protein, it's going to be an all-or-none transition, right? Then it's the entire protein. If this melting unit is much larger than the protein, well, then it's going to involve multiple protein molecules, like these fibrous proteins or aggregates. And if the melting unit is much smaller than the full protein, then we have exactly what you were arguing, that you might have a protein that unfolds in small parts. So we somehow need to get at the size of this melting unit, because if we can calculate it, I can compare it to the experimental value.

And this is another one of those derivations that look hard the first time you see them. I'm always a bit torn when it comes to equations, how much I should introduce them. This is not how I would work with equations in reality. So what would you do here? Don't look at the notes. Well, right, then you would need to start sitting down and working with equations. Do not be afraid of making mistakes. You make mistakes when you work with equations. The whole point is that you're going to need to follow your gut feeling. Do I feel like this is going forward? Do I feel that the more I work with this, the more I learn about my system? Or do I feel that I'm just sitting here juggling letters and numbers and equations and it's not teaching me anything? The only thing you can know is that if it feels as if it's going forward, keep going. This might take days or weeks or months. But, of course, we're not going to continue this lecture until June 1st. So on the one hand we could hide all these equations, but I still want to try to teach you the feeling of how you work with them.

The specific heat is going to be related to the enthalpy, right? The energy we need to put into the system. But whether something is folded is going to be related to the free energy. So at some point, and it's going to be fairly easy if we forget about those pressure terms for a sec, we're going to need to introduce: what is the energy of the native state of my protein? What is the entropy of that state? And similarly, we're going to need to define what these are in the molten, or denatured, state. We can call them E prime and S prime. And the second you've done that, we can write down the free energies of each state. These are just definitions. Even this is something you would not do in one minute, well, perhaps this you might be able to do in one minute. But after this, you have to sit down and ask: are there other states we want to care about?
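Written out compactly, these definitions are just the following (using the lecture's E, S for the native state and E', S' for the molten state; this is only a restatement of what was just said):

```latex
F_N = E - T S, \qquad F_M = E' - T S',
\qquad
\Delta E = E' - E, \quad \Delta S = S' - S, \quad
\Delta F = F_M - F_N = \Delta E - T\,\Delta S .
```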
Do we care about the transition states? Why don't we care about the transition states here? Yes, because here we're looking at equilibrium, right? We don't care. The key thing is: transition states are coupled to what? What has to be important for you to care about the transition state? Kinetics. The second you start talking about time, or how fast, you need to think about the transition state. If you're thinking about equilibrium, only the free energies of the states matter. And that's why we can instantly say: forget about the transition state for now.

At some point we're going to need to say, as a function of these things, how much protein is folded. Or we could of course say how much protein is molten; it doesn't matter, it's just one or the other. And since we're usually doing this experiment by denaturing something, it's easier to think that we start from 0% molten and eventually we have melted everything. That's the only reason why I say probability of being molten here. I could just as well have said probability of being folded.

And here it's easy. I have just two states: a native state and a molten state. The probability of being in the molten state is just the Boltzmann factor for the molten state divided by the partition function, and my world consists of only two states, so the partition function is just the sum of the Boltzmann factors for the native and molten states, right? It looks like lots of equations, but it's actually very simple. Then you can do a bit of math. We don't know what E is and so on, but if you start simplifying these things, after a while you realize you don't care what the exact energy is, and it's much easier to introduce the difference. All these quotients here correspond to divisions between exponentials, and that corresponds to differences between the factors you have inside the exponentials. So if you sit down and use a little bit of your exponential laws, you can show that this corresponds to one divided by one plus a Boltzmann factor, but now with the differences instead of the absolute numbers. So all that really matters are the differences: I do not have to care about the energy and the entropy of the specific states, just how those things differ across the transition, delta E and delta S. Keep that expression in mind for a second, because that is what we can get from the theory side. We can't really get much further there for now. But remember that we have an expression for the probability of being in the molten state as a function of some stuff.

But there is another way we can get at this, right? If you look at this phase transition in a diagram, we know that before the transition we're not really molten, and after the transition, to a first approximation, we're completely molten. So the change in the probability of being molten when you move from one temperature to another is really the change in melting divided by the change in temperature. And if, to a first approximation, we go from zero to 100%, the change in probability is one, right? So the derivative of this expression is roughly one over the change in temperature, the temperature range over which we saw the change.
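As a sketch of the algebra being described here (two states only, so the partition function has two terms, and dividing through turns absolute free energies into differences), together with the experimental observation at the end:

```latex
P_{\mathrm{molten}}
  = \frac{e^{-F_M/k_B T}}{e^{-F_N/k_B T} + e^{-F_M/k_B T}}
  = \frac{1}{1 + e^{(F_M - F_N)/k_B T}}
  = \frac{1}{1 + e^{(\Delta E - T\,\Delta S)/k_B T}},
\qquad
\frac{dP_{\mathrm{molten}}}{dT} \approx \frac{1}{\Delta T}
\ \text{across the transition.}
```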
Do you remember what those temperature ranges were when we spoke about water versus protein? I know that this was the first week, it was so elegant. This had to do with these collective transitions. In a very large system such as water, this temperature range goes to almost zero, 10 to the minus 10 degrees. For a very small system such as a protein, it's a fairly large range, say 10 Kelvin or something. So for a protein this is certainly not zero. My only point here is that the delta T range over which you see this transition in an experiment is something we can easily observe. And that means we can estimate the derivative of that expression from the experiment: the derivative is roughly one over delta T.

But remember, on the previous slide we also had a mathematical expression for this P-molten, right? So if we differentiate that and see if we get anything fun (and we know what it should be), we might be able to solve for other fun quantities inside it. And then you get these equations. This is not as hard as it looks. Seriously, it's not. The first thing when you get these large expressions: what is actually the variable? There is only one variable here, and that's T. Everything that is not called T is a constant. So yes, it's a large expression, but it's mostly constants. The only things you need to know here are: what is the derivative of a fraction, what is the derivative of an exponential, and what is the derivative of that inner term? Yes, there will be a little bit of mathematics to do here, but it's mostly bookkeeping. It's not really hard.

And then you get a large expression squared, another large expression, and then a small expression, and that's because you use the chain rule, right? First it's the derivative of the entire expression, then the derivative of the exponential, and then the derivative of the inside of the exponential. And you can actually show that these two large expressions correspond to P-molten and one minus P-molten. And then you get that small factor on the inside. That is not entirely obvious; you would need to spend 15, 20 minutes with the math to do this. Or just open Mathematica. It is not a super complicated expression, right?

If you do this in an experiment, you know what P-molten is, because that's something we can roughly measure. How can you measure it? Roughly is the key word. Well, you can't directly measure it, right? You're saying, wait, it's too complicated. But we don't need to know what this is as a function of temperature. We just need one point where we know roughly what it is. In the fully folded state it's zero and in the fully molten state it's 100%, but if you're entirely in one of those states, that product is zero and you no longer have any derivative, right? So at zero and 100%, yes, we know what it is, but it doesn't tell us anything about the transition. So we're gonna need some point in the transition where we know roughly what it is. Put it in the middle, right? Halfway. To a first approximation, if you're exactly halfway through the transition, what is this? 50%.
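Carrying out the differentiation being described, with delta E and delta S treated as roughly temperature independent, the two large expressions combine to exactly P times (1 minus P):

```latex
\frac{dP_{\mathrm{molten}}}{dT}
  = P_{\mathrm{molten}}\,\bigl(1 - P_{\mathrm{molten}}\bigr)\,\frac{\Delta E}{k_B T^{2}},
\qquad
P_{\mathrm{molten}}\bigl(1 - P_{\mathrm{molten}}\bigr)\Big|_{T = T_m}
  \approx \tfrac12 \cdot \tfrac12 = \tfrac14 .
```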
It might not be exactly 50%, but we're not after precision here; what we eventually want to know is whether the melting unit is significantly larger than a protein, roughly the size of a protein, or much smaller. So we're talking about an order of magnitude, right? Don't worry about 5 or 10% here and there. So if you're roughly halfway through the transition, we know what P-molten is, we know what the temperature is, and we know what the whole expression is, right? Because that's what we had: one over delta T. So then we can solve for how large the melting energy of a protein is. Exactly at the halfway point we know the transition temperature, and we know that to a first approximation the probability is 50%. And that simplifies everything, because it's gonna be 0.5 multiplied by 0.5, which is 0.25. So the only thing that survives is that last factor, 0.25 multiplied by that term. But we knew from the experiment that we can approximate this derivative by one over the temperature range over which folding happens. And then you just solve for it, and you get that the difference in energy for a folding unit is roughly four times the Boltzmann constant multiplied by the temperature squared, divided by the range over which we fold things.

But we also know the change per molecule, because we know what delta H is in total, right? If you have five moles of molecules, we know how much energy we put in; divide that by the number of molecules and we know how much energy we have per protein. And then it's just a matter of comparing these two. And it turns out, well, of course there are exceptions, but I would argue that in 99% of the cases they match. And that means that the folding unit you have is really one domain. That is not the same thing as one entire chain; it might be two or three layers, a small part of a protein. That is what we usually draw. Maybe 99% is a bit extreme, let's say 90 to 95.

The exceptions are that occasionally you have gigantic long chains, and in that case the first part of the chain might fold as one unit, the second part as one unit and the third part as one unit. Occasionally nature needs to do that. There are some problems with it, because if you make a mistake in one part of the chain, you're gonna need to throw away the other two thirds too. The advantage is that the parts belong together, which might be good for their interactions, but nature doesn't wanna combine too many domains in one chain. And that's why you frequently have proteins where, for instance, those ion channels I showed you, you actually have four separate chains that come together.

And that also brings us to this whole concept of what a protein domain is. There are a couple of different ways we can define it. From physics or biophysics, you like to think of a domain as a folding unit. But evolutionarily, you could define a domain in a different way. How would you define a domain purely from bioinformatics? An evolutionary unit, right? And it actually turns out, when you think of mutations, you frequently think of individual amino acids, but, and I'm not sure if I already went through this in the course, the way evolution frequently happens is that you have domain swapping. Nature cuts out an entire domain from one gene and puts that into another gene.
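As a minimal numerical sketch of exactly that comparison (every number here is an illustrative assumption, not a measurement from any particular protein):

```python
import numpy as np

# Compare the melting energy per cooperative unit (from the transition width)
# with the calorimetric melting energy per molecule. Illustrative numbers only.
kB = 1.380649e-23     # Boltzmann constant, J/K
NA = 6.02214076e23    # Avogadro's number, 1/mol

Tm = 330.0            # midpoint of the melting transition, K (assumed)
dT = 8.0              # width of the transition, K, read off the curve (assumed)

# Energy per melting unit, from dE ~ 4 * kB * Tm^2 / dT:
dE_unit = 4.0 * kB * Tm**2 / dT

# Total calorimetric enthalpy per mole, divided by Avogadro's number,
# gives the melting energy per protein molecule:
dH_molar = 250e3                      # J/mol from the calorimeter (assumed)
dE_molecule = dH_molar / NA

print("per melting unit :", dE_unit, "J")
print("per molecule     :", dE_molecule, "J")
print("ratio            :", dE_unit / dE_molecule)
```

If the ratio comes out around one, within a factor of two or so, the melting unit is roughly one protein or one domain; a ratio far above one points towards aggregates, and far below one towards piecewise unfolding.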
Remember those voltage-gated channels I showed you? That voltage sensor? What happened there is not that you added one residue at a time until, boom, suddenly you get a voltage sensor. You took that very simple protein, and at some point in nature there was a mutation where those four other helices suddenly became part of the same gene. So you're adding an entire domain at a time, never just an amino acid. This is also the strongest argument against intelligent design, because the intelligent-design argument is essentially that if you have something complicated like an eye, then until the eye actually works, there is absolutely no evolutionary advantage from having those proteins, which of course would be quite true. Why would the body keep a protein around while mutating it over billions of years until you have formed an eye? The second you have an eye, it's an evolutionary advantage, but at some point you need to go from no eye to an eye, and until you are at that point, it's not an advantage. But the way evolution works is not like that. Evolution and nature do not add one amino acid at a time. You have the domains around, and you combine the domains in new ways.

Both definitions are fine; it depends what you're talking about. A domain is either a folding unit, as here, or it can be an evolutionarily related part. What do you think is gonna happen when you have things like prions here? The melting units will be a lot larger, which means that this delta E suddenly gets a lot larger, right? And that enters into this. Because the melting energy per unit for a large aggregate is so much higher, these prions or plaques become super stable. They're very, very hard to degrade. You can degrade them, but it costs a huge amount of energy, which is of course good if you wanna form something stable, and bad when it's a plaque that you don't really want.

Yep. So from the last slide, I missed something. Let's go back. If you missed it, I bet that half of you missed it but didn't speak up. It's a question about why, when you said... Forget about the last line for a second. What we get from here is based on a single molecule, because the behavior of a single molecule is really governed by the Boltzmann distribution, right? You're either folded or unfolded; your single molecule can visit one state or the other. So on this slide we're working entirely with one molecule and the probability of observing it in each of the two states. And that's also why this delta E is the energy for one molecule: it enters the probability for one molecule to be observed in that state. Now, the actual expression might of course then be an ensemble average and everything, but since the probability refers to a single molecule, the energy is also per molecule. This is the folding energy for a single molecule.

On the second slide we took it from the extreme opposite direction. There we're looking at the ensemble, because that is what I can measure, and we realized there is a way to at least approximate the derivative of the probability of being molten based on a very simple experiment. At that point, this is just an intermediate result.
We can't do anything with it yet, because I don't know what this derivative is analytically. The reason for bringing that slide up is that if I can simplify the expression, if I know what it is, then I should eventually be able to solve for the energy, right? And this is really what I wanna get at: what is the melting energy for one folding unit? What we showed is that, well, apparently it's the derivative of that expression that's relevant. So I take that derivative, I spend some time calculating it, and since this is based on the expression for a single molecule, that delta E is still the energy of folding a single molecule, or sorry, melting a single molecule. And since I have two different expressions for the same thing, I can set them equal. Once they're equal, well, there are a couple of approximations: I had to assume that we're exactly halfway, and if you work through the Boltzmann distribution, the midpoint is not gonna be exactly 0.5, so this is a bit of an approximation. Then I just set the two sides equal. Forget about the last line here again. If those two sides are equal, I can show that the energy of melting a single molecule depends on the temperature (squared) at which the melting happens, divided by the temperature range, the width of the transition. And both of those I can get from the experiment.

So suddenly this is an expression for how much it costs, what the melting energy is per melting unit. And the neat thing is, of course, that on its own it might tell me that it is six kilocalories, and that alone doesn't tell me anything. But I also know how much energy I spent in total on the melting, and that might be, well, six kilocalories, not per mole, but six kilocalories, period. And if I know that I had exactly one mole in the setup where I made the experiment, then I know that I have spent six kilocalories per mole melting it. And since these appear to be the same within a factor of two or so, the melting unit is typically one entire protein for small proteins.

So that leads to the classic: why did the chicken cross the road? Why does the protein unfold? And that has such an easy answer. Yes, because after taking this course, you say it unfolds, that's a chemical reaction, and when do chemical reactions happen? Yes: it's favorable because the free energy in the unfolded state is lower, right? That's why it unfolds, period. And we already explained this a little bit with the hydrophobic effect earlier on in the course, and you see that we keep connecting back to these very simple things. Dave Chandler in particular keeps doing insanely good work just understanding the hydrophobic effect better on simple water systems. There is an amazing amount of good science here, and he keeps publishing on it. And you can show this experimentally: in this expression, delta E goes up with temperature because of the hydrophobic effect, and delta S is positive when we unfold, because the more unfolded you are, the more different states you can be in, right? So this is just gonna be a balance of these two terms. And when you start studying this, well, the entropy effect is relatively easy; it will drop a bit with temperature, but it's not a gigantic effect. And in most cases the delta E says it's better to be in the folded state rather than unfolded; you have more favorable interactions inside a protein. So what does that normally give you?
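Stated as a one-line condition, this balance is just a restatement of the free-energy criterion above, in the same delta E and delta S notation:

```latex
\Delta F_{\mathrm{unfold}} = \Delta E - T\,\Delta S < 0
\quad\Longleftrightarrow\quad
\text{unfolding is favorable at equilibrium.}
```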
Well, if you look at protein stability as a function of temperature, where are proteins usually stable? Ballpark. Yes. And when would a protein become unstable? That's 350, 400 Kelvin or something. When the temperature goes up, something bad happens here, and that means the protein is no longer stable, it unfolds. But you can actually show something more, because of the temperature dependence in delta E. I'm not gonna go into all the details, because this is another one of those spots where, I think, the black curve here measures the free energy when you are unfolding it, and I will keep making sign errors if I try to reason about it on the fly. The point is that as the temperature drops, under some circumstances it appears as if the total free energy goes through zero again. I know I already mentioned this briefly earlier in the course. So for some proteins there appears to be just a range in which they are stable. If you keep dropping the temperature enough, you should be able to unfold proteins at low temperature too. And this has been published in a number of cases.

I think the book has this example for apo-myoglobin. Could you imagine what apo-myoglobin is? I don't think he explains it, but based on what I talked about with apo and holo molecules. Well, yes, it's a molecule that's either unbound or doesn't have its prosthetic group, right; in this case it's myoglobin without its heme group. Here it's stable, and here you're measuring the heat capacity. When there are peaks like that in the heat capacity, it's an indication that something happened, a phase transition or something. So if you start heating this to around 60, 70 degrees centigrade, bad things happen and the protein unfolds. But as you see, when you go to lower and lower temperatures, here it gets really hard to measure because the water is freezing, right? But the way the heat capacity goes up here is telling you that something is starting to happen. In this case they were probably able to have a little bit of salt in the sample so they could get to minus 10 degrees centigrade or something, but below that I would assume that things froze. There are a bunch of more recent studies; this is from 2009 with ubiquitin, where they've been able to show the same thing. There you can go to much lower temperatures with modern techniques, and you see the cold denaturation there too. That's why they... Good question.

So this is what happens. Forget about the right part. This is a strange plot in a way, because the scale is negative up here and positive down here. At normal room temperature, all these different proteins have negative delta H, which means that the energy is better in the folded state. It's better to be in the folded state than in the unfolded state; you would lose energy, or enthalpy, if you were to unfold it. But this is highly temperature dependent. As you reduce the temperature, it's still negative, which is good, but for some of these proteins it eventually starts to become positive, right? So suddenly, when you are here, for myoglobin in this particular case, myoglobin actually pays a penalty in energy for being in the folded state. Energy-wise it would be better to be unfolded here. And this has to do with the exact packing and the interactions. And you also see that this depends on the protein.
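A minimal sketch of such a stability curve: the standard two-state expression with a constant heat-capacity difference dCp between the unfolded and folded states (this particular functional form is my addition, not something written out in the lecture, and the numbers are illustrative). Because of dCp, the unfolding free energy crosses zero both at high temperature and again at low temperature:

```python
import numpy as np
import matplotlib.pyplot as plt

# Protein stability curve with both heat and cold denaturation.
# Standard two-state form with constant dCp; all numbers are illustrative.
Tm = 333.0      # heat-denaturation midpoint, K
dHm = 250e3     # unfolding enthalpy at Tm, J/mol
dCp = 8e3       # heat capacity of unfolding, J/(mol K)

T = np.linspace(240.0, 380.0, 500)
# dG of unfolding: zero at Tm, negative above Tm (heat denaturation) and,
# because of dCp, negative again at low temperature (cold denaturation).
dG = dHm * (1.0 - T / Tm) + dCp * ((T - Tm) - T * np.log(T / Tm))

plt.plot(T, dG / 1000.0)
plt.axhline(0.0, linewidth=0.5)
plt.xlabel("temperature (K)")
plt.ylabel("dG of unfolding (kJ/mol)")
plt.show()
```

With these made-up numbers the curve dips below zero again just above the freezing point of water, which is the kind of cold-unfolding temperature being discussed for myoglobin here.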
So for myoglobin, this starts to happen at zero to 10 degrees centigrade, and there water is still liquid. That is the reason why, for myoglobin, you could actually see this on the previous slide. If you had RNase, well, RNase would only cross over at minus 40 or 50 degrees centigrade, so for RNase it would likely be very difficult to see this cold unfolding. But for myoglobin and the cytochromes you can do it. Now, this has nothing to do with biology; it just has to do with the temperature dependence of the interactions, and it will happen for any system. Proteins have evolved so that they are stable in the range where we typically need them. I'm not sure about you, but my body is rarely at zero degrees centigrade. If you're around 37, all these proteins are gonna be nice and stable. So normally, for biology, this is not particularly relevant. Proteins are stable in the range where we need them.

But this sounds stupid. Wouldn't it be much smarter if we could just have all of these be really negative, so that you never ever had cold unfolding? Would be awesome, right? Much better stabilization of the proteins. Then proteins wouldn't be as sensitive to small defects or anything. If proteins had stabilization energies of one megajoule, or megacalorie, per mole: oh, they would fold. They would fold instantly. It's beautiful, right? They're super stable. Exactly, because for everything in biology, sooner or later you're gonna pay. We would become heat factories, because when they fold, all that beautiful energy would be released as heat, which would probably be pretty bad for your cells. And even if we ignore that for a while, at some point you're gonna need to degrade this, just degrading food in your stomach. For every single piece of meat where you had to degrade protein, you would need to pay that one megajoule per mole, and there is no way you would be able to produce that much energy. So all of biology works on the fact that proteins need to be stable, but just stable enough, not too stable. Because for every single kilocalorie that you're gaining, you will also be paying that kilocalorie at some point. It might seem that stability is just a good thing, but it isn't. You should only be stable enough, not more.

No, I agree, that's the way it is. This of course also has to do with the fact that all these numbers depend on whether you're looking at the free energy change when you go from folded to unfolded, or from unfolded to folded. And make no mistake about it, and that's a bad thing with recordings: I must have made five or six sign errors today. It happens. You make sign errors when you just talk about it. The key thing is that once you're sitting down and designing an experiment, you take 20 minutes, go through the design and check: my delta H there, for this particular calculation I'm doing right now, is a delta H from what to what? What is the start? What is the end? You can get it right, it's just that if we were to do that on every slide here, we would take five hours per day. So it's not hard, but it requires a bit of bookkeeping. And I agree that it's bad that the book keeps flipping it around. But when you take that minus T delta S into account, you get all these curves that you see: all of them have some sort of maximum stability point, and then the stability starts to go down again.
The only consolation I can give you is that it's usually fairly easy to figure out what the sign should be, because you know that proteins should be stable in a narrow range, right? So in this case: what is this delta G for? Is this a delta G for folding or for unfolding? Because, well, sorry, in this case it's actually negative up here. We have a narrow regime here where it's negative, so it's for folding. Had this been a positive axis, you would have been right.

You might think that this cold unfolding is just a curious artifact, but nature cares: there are cold shock proteins, and there are actually heat shock proteins too. So there are some organisms, for instance fish in the Antarctic, whose body temperature is around, well, zero; it can even be below zero degrees centigrade. There are fish that have an average body temperature of minus two degrees centigrade, because they swim in the ocean, and it's not like they are heat factories that heat the interior of the fish to 37 degrees centigrade. These organisms have, first, evolved proteins that are in general more stable at low temperatures. Second, both these and other organisms tend to express a special type of protein that protects other proteins from low temperature, and they're called cold shock proteins. This gene regulation usually goes up when, I'm not sure if you can have an extremely cold winter in the Antarctic or something, but then you express more of these proteins to protect the other proteins from unfolding. There is the opposite too, of course: heat shock proteins. Any idea where you might see heat shock proteins? Yes. There are amazingly cool bacteria that actually live in volcanoes, and they thrive around 70, 80, 90 degrees centigrade, at a point where other bacteria would start to die, right? These are bacteria that you can't kill just by boiling water. They might start to be slightly uncomfortable at 100 degrees centigrade, but it certainly doesn't kill them. But again, these are exceptions in nature. They occur, but they're not common.

And that brings me pretty much to the study questions. Sorry, I ran over by four minutes. I'm going to go through these on Thursday, so I won't cover them now. Do you have any questions for me?