I love this part because right now we're in the middle of the course, where we can actually talk about real proteins, but we can do it in slightly more advanced terms and with a much better understanding than we had at the beginning of the course. So I'm going to continue a bit of the theme. Yesterday I finished up talking about membrane proteins, and today I'm going to talk about lots of different proteins, but not necessarily focusing on the individual proteins. I'm going to keep my voice down a little bit so as not to disturb Joe too much. They're having their course on scientific writing there. And the reason for looking at more proteins is partly related to what you did in the bioinformatics course. One protein is certainly important function-wise, but if you want to understand nature, you need to understand the diversity of all proteins. So why do you have 20,000 genes in your body? It's because they do different things. And why did those 20,000 proteins end up the way they did? Speaking about the number of genes. So let's just build the genes there. You have roughly 20,000 genes. Is that a lot or a little? It's way less than we thought, right? A small bacterium can get by with a few thousand genes, possibly even fewer. I think there's a paper, I think it's part of the lecture today, or it might have been yesterday too, where Craig Venter a few years ago showed that they could design an artificial genome. So they created an organism with a minimal genome, the smallest set of genes they could imagine that still gave a functioning bacterium. If you look at the other extreme, Norwegian spruce, do you have any idea how many genes there are in a Christmas tree? Roughly 200,000. And you thought humans were fancy. You're roughly one-tenth as fancy as a Christmas tree. We have no idea, well, we do have some ideas why, but part of it, of course, has to do with evolutionary pressure. There is less evolutionary pressure on Christmas trees.
Bacteria, on the other hand, have to be lean, super efficient machines, or they're not going to be able to survive. So today we're going to talk a little bit about the whole concept of folds. And I'm going to keep talking about folds in general. We're going to talk a little bit about evolution and protein sizes, why things have the size they do. And then we're going to start looking at the whole concept of folds and the distribution over folds. This is going to look just like Boltzmann distributions, but it's going to turn out that that's completely wrong. Well, not completely wrong, because it's going to be related to it, but it's not a Boltzmann distribution. And we're going to talk about this duality between sequence and structure, and eventually start getting a little more into stabilization. Why are proteins stable? When are they stable? What happens if you destabilize proteins? What is the most normal way you can imagine to destabilize proteins? Mutations. Mutations in general destabilize proteins. In a few rare cases, they might stabilize them. So the reason for looking at stabilization is not just a theoretical exercise where we are interested in the physics; it's going to tell us something about what mutations are likely to stabilize or destabilize proteins. Let's see. Oh, yes, there. I should have some lecture slides, notes for you. Nine, nine, nine. Pass those around. And then we're gradually going to get a bit into transitions between states, and depending on how much time we have, we'll get to some of the allosteric modulation either today or tomorrow. Yes. You've done this a couple of times now. So I'm going to take a step even further back and let you handle this. Pick some questions and talk about them. I'm more than happy to help you answer them if I can, or comment on your answers at least.
Some of them I might not explicitly have talked about yesterday, like 11 or 12, but that's a good reason either for you to look them up or for us to talk about them this morning. So I have time to wait. Pick one. Well, you can pick one. The first one or the easiest one or the most difficult one, anything you like. So all these things are right, but these are large questions, and again, as somebody said, let's break it down into parts. The first thing you said is super important: you can't have unsatisfied hydrogen bonds facing the lipids. And that per se is going to cause a strong, strong effect that favors one secondary structure, which is what? Alpha helices. That is the main reason, I would say, why membrane proteins are 95 or even 99% alpha helical. So then we can tick one part off. That explains why we stabilize the secondary structure we have. The second thing you mentioned was what? So well, yes and no. If you compare this to a globular protein, right, how is this different from a globular protein? Right, but that's the definition of globular versus membrane protein. So what is the stabilization of a globular protein? The main stabilizing effect. Yes, hydrophobic collapse, that's right. You need to have the hydrophobic parts on the inside and the hydrophilic parts on the outside. That is obviously not true for a membrane protein. Yep, but that per se is not the stabilization effect, just being hydrophobic. So what is it that stabilizes membrane proteins? Van der Waals packing. Or aggregation, whatever. And it's a much, much, much weaker and more delicate stabilization than in globular proteins in general. So it's more complicated to predict and everything. And I think that partly covers the two other differences in interactions compared to globular proteins. One thing I can mention there that I didn't have time for, well, we indirectly mentioned it. Are there never any hydrophilic or polar residues in membrane proteins? At the edges, I would say yes.
So facing the membrane surface. Why do they occur there? Can you imagine any advantage with that? So imagine I'll draw two cases. First, this is my big droplet of oil. And then I'm going to draw my helix here. And then a second helix here. I can even assume that they're connected. And then I'll draw a second case. Here's a membrane. We have oil. What's the difference here, in terms of structure and stability? Which one do you think is going to be most stable? Why? So there are two parts here, right? It's quite right that one part is that you want to be hydrophobic to stay in here. But again, if you're hydrophobic everywhere, and if you're hydrophobic in the loops, you can slide up or down, right? And that's pretty bad. So, surprisingly, it's good to have hydrophobic things here, but it's also important not to have hydrophobic things up here. Because that's what creates the edge: you can't slide the helix down and you can't slide the helix up. So it's stuck in the membrane, exactly where we want it to be. And there are a number of very common patterns you see in proteins. So first, the number of residues in here that are alpha helical is roughly in the ballpark of 20. Now, I thought I had an eraser. Well, I'll get that later. So you have roughly 20 amino acids in the middle. And then you frequently have motifs such as glycine, glycine, proline, glycine. They do two things. First, both glycine and proline break helices. And if you put four helix-breaking residues after each other, you can imagine that that helix is dead. So it creates a very sharp boundary of the helix. It's helix here and then it's no longer helix. And the other thing is that many of these residues, proline is one of them, but also tryptophan, are very common at the end, not even out in the loops, but at the very end of the helix in membrane proteins. Because it helps anchor the membrane proteins vertically.
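As a rough check on why about 20 residues is the right ballpark for a membrane-spanning helix, here is a minimal sketch. It assumes the textbook rise of about 1.5 Å per residue and 3.6 residues per turn for an ideal alpha helix, and a hydrophobic membrane core of about 30 Å. The core thickness is an assumed typical value, not a number from this lecture, and real membranes vary.

```python
import math

# Textbook geometry of an ideal alpha helix.
RISE_PER_RESIDUE_ANGSTROM = 1.5   # translation along the helix axis per residue
RESIDUES_PER_TURN = 3.6

# ASSUMPTION: typical thickness of the hydrophobic core of a lipid bilayer.
ASSUMED_CORE_THICKNESS_ANGSTROM = 30.0


def helix_length_angstrom(n_residues: int) -> float:
    """Length along the axis of an ideal alpha helix with n_residues residues."""
    return n_residues * RISE_PER_RESIDUE_ANGSTROM


def residues_to_span(thickness_angstrom: float) -> int:
    """Smallest number of residues whose helix is at least this long (no tilt)."""
    return math.ceil(thickness_angstrom / RISE_PER_RESIDUE_ANGSTROM)


def number_of_turns(n_residues: int) -> float:
    """Number of helical turns in an ideal alpha helix of n_residues residues."""
    return n_residues / RESIDUES_PER_TURN


if __name__ == "__main__":
    n = residues_to_span(ASSUMED_CORE_THICKNESS_ANGSTROM)
    print(f"Residues needed to span the assumed core: {n}")
    print(f"Length of a {n}-residue helix: {helix_length_angstrom(n):.1f} Angstrom")
    print(f"Helical turns: {number_of_turns(n):.1f}")
```

Under these assumptions a straight, untilted helix needs 20 residues to cross the core. A tilted helix, or a thicker membrane, would need a few more.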
So do you know some typical functions of membrane proteins? Any particular function that might be different in membrane proteins compared to globular ones? Or fibrous ones, would that matter? Yes, that's quite right. If you would use one name for all that, I would say transport processes. Anything that's going to move things across something. And that's what you call transport. There is a famous theoretical argument, actually by Onsager, from the 1960s, I think it was, where he actually proved that to maintain any type of process, in particular life, you need a system with anisotropy. And what the membrane creates is this anisotropy. Because if you have a homogeneous, isotropic system, it's theoretically impossible to keep any sort of process going in it, even if you feed it with energy. You will reach equilibrium. So what all these membranes actually help us do is create separate compartments in the body and in your cell. And that separation is what makes it possible for us to have things like processes you can use to send signals. Because it would be completely pointless to try to transport ions across a membrane if the two sides were connected, right? Because it would equilibrate anyway. So what is biogenesis of membrane proteins? I have to confess we're somewhat biased, being in the department we're at. Yeah, that's perfect. There's only one small question there. How does the mRNA that is supposed to form a membrane protein find the translocons that are attached to the membranes? No, not translocons. Ribosomes, I mean. Because there are ribosomes everywhere, right? So how do you find the right ribosome and bring it to the translocon? No, we don't know. We don't know yet. So for many proteins, of course, things that should be translocated and moved to the inside of the cell, that is quite fine. But some things should stay in the ER.
So there have been some studies, and I should look this up, because I don't think it has changed in the last three or four years. But until a few years ago, we didn't know. And there was one argument that some of the residues that are very hydrophobic are encoded by multiple codons. And there was an argument by a colleague of ours in Israel who argued that there is an imbalance in the codons. There are certain codons, in particular codons that have lots of uracil in them, and those codons would be overrepresented in membrane proteins. So his argument was that it might actually be the RNA codons that help you decide which ones to target. But that was one bioinformatics study. I find it exceptionally interesting because it's such a crazy idea. And crazy ideas are very frequently wrong, but just occasionally they are also right. There are very few amazing ideas that turned out to be obvious from the start. They were frequently seen as crazy. So it's a great research topic, but there is so much we don't know about signaling and targeting in cells. The other thing is that the second those proteins have been secreted and everything, how do we decide which proteins go where? So what's the signal peptide? I know we didn't cover it yesterday. Right, that's the function, but what is it? So it's just a stretch of amino acids at the start of a sequence, right? And they're usually hydrophobic. And it can be a pain to separate them from membrane proteins because they look just like membrane proteins. But they are frequently not quite hydrophobic enough to actually be inserted in the membrane. I think that was the biogenesis. What sequences become membrane proteins? And you think this is easy, and we can take the obvious answer first, but it's not quite as easy as the obvious answer. So what is required for an alpha helix to end up in a membrane protein? So that's the obvious answer.
They should be hydrophobic. But then, sorry, it wasn't yesterday, but the day before yesterday, I showed you all these S4 helices in the voltage-gated ion channels. So based on that definition, all of you would be dead. You would not have any voltage-gated channels. So obviously that definition is not right. It's not that I would mark you wrong on the exam if you said that they're hydrophobic. But my point here is that to every complicated question there is a simple answer, and it's wrong. So apparently the hydrophobicity of an individual helix is not enough to decide whether it goes into the membrane or not. So the voltage sensors I showed you had four helices, and it was the fourth helix that was very polar, charged even. It's not a coincidence that it's the fourth helix. Can you imagine in any way how the body does this? Assuming that this was the first helix, that it was the first segment, do you think that sensor could be a membrane protein? And you're quite right. So it's very important that it's the last, the fourth helix that can be a bit more polar, even charged. So if you look at what comes before, that helix had four or five charges, depending on how far down you go. But before that, S3 has two charges, negative ones instead of positive. And that's critical. So why does the fourth helix insert? Well, no, not by itself, because we just said that, right? If this was the first helix, it would not insert. So it can interact with the third helix that is already there. But now I'm just moving between layers of turtles or something. So why is the third helix there? So, nota bene, the third helix did not have as many charges. But it's still not great to have that third helix there. So what happens in these cases is that the first helix, the first segment, has to be clearly hydrophobic. That is what drives insertion.
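To illustrate why the hydrophobicity of an individual helix gives the wrong answer for a charged S4-like segment, here is a minimal sketch that averages the Kyte-Doolittle hydropathy scale over a segment. Both sequences are hypothetical illustrations, not real protein sequences, and a plain average over one published scale is a simplification chosen for this example; it is not the method used in the studies mentioned in the lecture.

```python
# Kyte-Doolittle hydropathy scale (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

CHARGED = set("DEKR")

# HYPOTHETICAL segments, 20 residues each, made up for illustration.
HYDROPHOBIC_SEGMENT = "LLIVALAVLGFLLIAVGLLL"   # plainly hydrophobic helix
S4_LIKE_SEGMENT = "LRVIRLVRVFRIFKLSRHSK"       # a positive charge at every third position


def mean_hydropathy(segment: str) -> float:
    """Average Kyte-Doolittle hydropathy over all residues in the segment."""
    return sum(KYTE_DOOLITTLE[aa] for aa in segment.upper()) / len(segment)


def count_charged(segment: str) -> int:
    """Number of Asp, Glu, Lys and Arg residues in the segment."""
    return sum(1 for aa in segment.upper() if aa in CHARGED)


if __name__ == "__main__":
    for name, seg in [("hydrophobic", HYDROPHOBIC_SEGMENT),
                      ("S4-like", S4_LIKE_SEGMENT)]:
        print(f"{name:12s} mean hydropathy {mean_hydropathy(seg):5.2f}, "
              f"charged residues {count_charged(seg)}")
```

Judged one helix at a time, the S4-like segment scores far below the plainly hydrophobic one, so a single-helix rule would predict that it stays out of the membrane.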
And then they're connected. But that connection does not mean you can insert anything, because it would just get stuck in the translocon or something. So what you can then do is that the second helix is probably also quite hydrophobic. I haven't checked S2 specifically. Then the third helix, yes, technically it has those two charges. But apart from those two charges, it has a ton of very hydrophobic residues. So all those other hydrophobic residues will help pull in those two charges. And we usually call that a marginally stable helix. So well, it's not really happy in the membrane, but it can be pushed into the membrane if we absolutely force it. And now we have this environment with three helices, and in particular two negatively charged residues. And that gets the fourth helix in, even though the fourth helix in itself would not like to be there. So it's right in the sense that for a membrane protein as a whole, the transmembrane domain has to be hydrophobic, but it has to be hydrophobic enough as a whole. We can allow individual helices towards the end of the sequence to break this pattern if they are stabilized by other, more hydrophobic helices. Ah! No. We don't know. We have... It is widely believed, meaning that I and some other people think... If you look at an individual translocon, there is not enough space there to have more than one alpha helix. And that would seem to answer your question. They have to be inserted one by one. But the problem is that if you look at many of these helices, the loops... Let's see here. If we start to insert this helix first, this is number one and then number two and then number three. And we're pushing it in that direction. So the first helix can go there easily, right? But the second helix, if we just push that in the same way... Wait a second, that's not going to work. Assuming that these are short loops. So you somehow would need to turn it around or insert those two helices as a pair or something.
Or imagine if you had a very long loop here, you could imagine that you insert helix two the wrong way first and then have it turn around in the membrane. Can you imagine how expensive it would be to have an entire membrane protein turn around in a membrane? Is that going to happen? Well, so... Full disclosure, if anybody had asked me that and I didn't know the answer, I would say, obviously, no, that can't happen. There appear to be proteins that can do it. It's very rare, but they can. We have no idea why. And we have absolutely no idea how it does it or why it's turning around. But under some conditions, some membrane proteins appear to be able to turn around in a membrane. I have no idea. Or could you maybe have a pair of translocons or something help create a larger channel? We don't know. So this relates to another concept: is the entire topology and everything decided co-translationally or post-translationally? Are we inserting every helix exactly as it is being translated? Or are we somehow first creating the helices, maybe inserting them the wrong way, and then fixing this up later? And the truth is that it's probably a bit of both. Again, these are still very inconclusive results. They're exceptionally hard studies to do, but there are a couple of examples of both. So we know almost nothing about it. It's a great... It's a great topic, probably ten PhD theses. So we don't know. It's certainly not an eight question. We don't know. All we know is that there are, again, some examples in the literature where we know that the way things first insert, the protein would not work, and then you wait 10 minutes or something. We're talking about long times compared to molecular times, and then the protein works. So for whatever reason, something happened to the protein when it was in the membrane, so it fixed itself up.
Likely there were some helices turning around or something. Whether this is going to be the case for entire helices or you're going to have smaller re-entrant segments, I find this much easier to believe if you think about small re-entrant segments, which you probably saw in the bioinformatics course. So there are certainly some cases where you can insert a small segment by dipping down into an already existing protein, but we don't know. That's the answer. You have to do research on it if you want to find out. Yep? With the titratable states, is it possible for them to... because deprotonated, it's uncharged... can they deprotonate and stabilize? It is possible, yes. And remember what we said about protonation? It's actually a surprisingly good question. When we compared the positive and negative charges in membrane helices, which one was easier to insert? Why? Well, positive inside, that has to do with the location where we have the loops, right? And now we're talking about things inside. If you absolutely have to insert charges into the membrane and you're not allowed to neutralize them, would you try to insert something positive or negative? Do we have any other suggestions? Yeah, why? So what did a lipid look like? Well, there are some charged lipids, but most of them are zwitterionic, and slightly below the entire head group region you have these carbonyl groups. And the carbonyl groups... I'm writing out this. Let's draw a lipid here. So first you have some dipole up here, and then you have the long chains. And roughly there you have the carbonyl groups, so they're a bit further down. And the carbonyl groups are characterized by being a carbon with a double bond to an oxygen, and then the chains continue down. That oxygen is going to have what type of partial charge, at least? Just like in water, it's going to be slightly negative, right? So if you're a positively charged residue, you can interact with the carbonyl groups.
If you're negatively charged, on the other hand, you're going to need to stretch all the way out to the phosphates, which are even further out. So you will end up distorting the membrane more with negative charges than with positive ones. And you can actually see this pattern in bioinformatics. There is a strong pattern from the positive-inside rule, but if you remove the systematic difference between inside and outside, you can definitely see that there are very few charges in membrane proteins, but the positive ones are overrepresented. So in general, it's always going to be better to insert positive than negative ones. The other alternative, as was mentioned, is that you can neutralize them. I would say, and again, this is recent, we did these calculations a few years ago, positively charged residues, arginine and lysine, are virtually always going to retain their charge, so they stay protonated in the membrane. Negatively charged ones are 50-50. It's just borderline whether they keep their charge or not. But that's not going to help you. Why? Otherwise, it's a great idea. Can't you just neutralize them? Then they're no longer charged, and they're going to be advantageous to insert. The point is that you're going to pay to neutralize an aspartic acid or glutamic acid. And you're going to pay almost as much in energy as it would cost to insert it charged in the membrane. So the only question is: are you paying by neutralizing it, or are you paying by inserting it in the membrane? There is no such thing as a free lunch. We talked about translocons yesterday, too. That's a fun one. And also not quite as easy as it might look. Here's another question where I would suggest breaking it down if you find it difficult. So where is it sitting and deciding that? Yeah, it's a membrane protein. That's pretty much a helix channel, roughly, right? We don't call it a helix channel. But it's a membrane protein that sits right in the middle and helps things insert.
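Going back to the "no free lunch" point about neutralizing a charged side chain: a rough sense of the price can be had from the standard relation between pKa and free energy, where moving a titratable group to its minority protonation state costs about RT ln(10) per pH unit of distance from its pKa. The sketch below uses approximate textbook pKa values for side chains in bulk water and an assumed pH of 7. These are illustrative assumptions, not the membrane calculations referred to in the lecture; pKa values shift in a membrane environment.

```python
import math

R_KCAL_PER_MOL_K = 1.987e-3   # gas constant in kcal/(mol K)
TEMPERATURE_K = 298.15        # ASSUMPTION: room temperature
ASSUMED_PH = 7.0              # ASSUMPTION: neutral aqueous solution

# ASSUMPTION: approximate textbook side-chain pKa values in bulk water.
# The boolean says whether the group is an acid (neutral when protonated).
SIDE_CHAINS = {
    "Asp": (3.9, True),
    "Glu": (4.3, True),
    "Lys": (10.5, False),
    "Arg": (12.5, False),
}


def neutralization_cost_kcal(pka: float, ph: float, is_acid: bool) -> float:
    """Free energy to convert the charged form into the neutral form at a given pH.

    Acids (Asp, Glu) are neutral when protonated, bases (Lys, Arg) when
    deprotonated. A positive value means the charged form dominates.
    """
    rt_ln10 = R_KCAL_PER_MOL_K * TEMPERATURE_K * math.log(10.0)
    ph_units = (ph - pka) if is_acid else (pka - ph)
    return rt_ln10 * ph_units


if __name__ == "__main__":
    for name, (pka, is_acid) in SIDE_CHAINS.items():
        cost = neutralization_cost_kcal(pka, ASSUMED_PH, is_acid)
        print(f"{name}: about {cost:.1f} kcal/mol to neutralize at pH {ASSUMED_PH}")
```

With these assumed values, neutralizing any of the four side chains costs several kcal/mol, and arginine is the most expensive, which is at least consistent with the statement that arginine keeps its charge in the membrane.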
So how does it alter the free energies? So you answered the question completely correctly, but because of your correct answer, your first statement was incorrect. And it's a great illustration, because I wanted you to fall into that trap. The mistake you made is that you did not break it down. You tried to answer it immediately, and then you rushed over it. So what is the first thing you said after that? You said that it doesn't alter the free energy. Then you said that it lowers the barriers. The last time I checked, that's a free energy. So it does alter it. But the point is that there are many things here, right? I'm going to need to go and get a... If you start out here, and then there's a barrier, and then an inserted state, right? Anytime I ask you about a free energy, draw a curve like that. So you know what free energy is. There are at least three free energies involved here. So you have one, two, and three. And what you meant by your first answer is that it does not alter the free energy difference between state one and state three. And that's quite correct. But it does change the transition state. So it does change the barrier, but it does not change the difference between one and three. So again, don't move too fast. Break it down into pieces so you realize what question you're answering first. Number seven and eight aren't so much questions. It was not a coincidence. Again, full disclosure, these are pet proteins of mine. We love them, but they're not just random channels. They're fundamentally important building blocks in your nervous system. Voltage-gated channels conduct electrical nerve signals. And where are they conducting that type of nerve signal? Well, but all cells have membranes. The body is conducting it inside a cell, right? Along a cell. And again, nerve cells, in contrast to all other cells, have very large physical extents. So they conduct it inside the cell. Voltage-gated channels actually occur in...
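The three-state picture from the translocon question above (start, barrier, inserted state) can be made concrete with a minimal numerical sketch. All free energy values are hypothetical numbers chosen for illustration. The equilibrium between state one and state three depends only on their difference, while the rate depends on the barrier, here through a simple Arrhenius-like exponential factor.

```python
import math

RT_KCAL_PER_MOL = 0.593  # RT at roughly 298 K

# HYPOTHETICAL free energies in kcal/mol, for illustration only.
G_STATE_1 = 0.0                  # helix outside the membrane
G_STATE_3 = -3.0                 # helix inserted
G_BARRIER_NO_TRANSLOCON = 15.0   # transition state without help
G_BARRIER_TRANSLOCON = 10.0      # transition state lowered by the translocon


def equilibrium_constant(g_start: float, g_end: float) -> float:
    """Ratio of end-state to start-state populations; the barrier never enters."""
    return math.exp(-(g_end - g_start) / RT_KCAL_PER_MOL)


def relative_rate(g_start: float, g_barrier: float) -> float:
    """Boltzmann factor for crossing the barrier (prefactor omitted)."""
    return math.exp(-(g_barrier - g_start) / RT_KCAL_PER_MOL)


if __name__ == "__main__":
    k_eq = equilibrium_constant(G_STATE_1, G_STATE_3)
    speedup = (relative_rate(G_STATE_1, G_BARRIER_TRANSLOCON)
               / relative_rate(G_STATE_1, G_BARRIER_NO_TRANSLOCON))
    print(f"Equilibrium constant (same with or without translocon): {k_eq:.0f}")
    print(f"Insertion speed-up from the lower barrier: {speedup:.0f}x")
```

With these made-up numbers, lowering the barrier by 5 kcal/mol speeds insertion up by a factor of several thousand, while the populations of state one and state three at equilibrium are untouched.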
It's a super cool concept, actually, the whole concept of excitability. That you could take a cell and somehow put it in an electrically active state. Which, of course, is the signal we see in the EKG and everything, but it's everywhere. It's in the peripheral nervous system. It's in your central nervous system, your brain. Actually, the reason why we all exist is that when a sperm fertilizes an egg, it's the voltage-gated channels that help close the egg so that other sperm don't come in, so we wouldn't exist without them. All your heartbeats, voltage-gated ion channels. Ligand-gated channels, on the other hand, mediate the nerve signals that we cause by the excitability, but they move them between cells. So you release something, and by sensing this uptake, we can cause a new nerve signal in the next cell. And I might have touched upon that briefly on Tuesday, but can you imagine why people are so fascinated by ligand-gated ion channels nowadays and why they have become so hot? They're outstanding drug targets. So there's an entire new field, well, it's not that new anymore, called neuropharmacology, specifically developing drugs to tune your nervous system. Because until roughly when I was in school, the nervous system was just one big black box, the brain and everything. We had no idea about the brain. And many of the diseases that we tend to treat with psychiatry and everything are, of course, disorders. So just the way that right now we're pretty good at fixing you up with antibiotics or something if you have an infection... There are, of course, psychiatric diseases and everything, but we don't have any good ways of treating them. And I bet, give this two more decades, and for the vast majority of psychiatric diseases we will be able to go in, find mutation differences, correct levels of neurotransmitters, or whatever it might be.
The difficulty here is that traditionally we try to design drugs by just shutting processes off. And that's not going to work here, because if you start shutting off nerve signals, bad things will happen. So we're going to need to find ways to tune nerve signaling. I didn't mention this either, but one of the reasons why colleagues of mine are super interested in ligand-gated ion channels is that they're working on better anesthetics. And I did mention that anesthesia is important. But why do you need better anesthetics? There are tons of them. So we don't really need better anesthetics for you. We can take any of you and go and sedate you, oh, I shouldn't, but... The problem is that anesthesia is a pretty rough process on the body. So you're sedating a patient, but this completely disrupts all the signaling in lots of places where you didn't intend to disrupt it. So what suddenly happens is that as the patient is lying on the table, suddenly the blood pressure goes up. And it's actually not enough with anesthesia, because you also want to make sure that the patient does not feel anything. And the muscles, because if the muscles are not relaxed, it's very difficult to cut in them. And you want to make sure that the patient is amnesic, that they do not remember anything from the procedure. So suddenly we're talking about a cocktail of three or four drugs. This still works great with you. No problem, whatever. We give you 50% extra of the drugs. But then suddenly you have a severely obese 80-year-old patient with a blood pressure problem. This starts to get complicated. Or if you're a surgeon and, say, you have patients in the army, patients who are severely wounded or something, they've lost a lot of blood, and you're now also going to sedate them during this fairly rough process. And this body has just lost half its blood. Those patients don't wake up again.
And this is difficult enough that even for completely normal, healthy patients, it happens roughly once in a thousand or so that you have a bad response to the anesthesia, and the patient dies. It happened a few years ago. Even at Stanford, where my colleagues are, there was some sort of TV show anchor who went in to do a standard procedure, and he died on the operating table. Bad things happen. So again, my colleague there, who is actually working in the OR and sedating people three days a week, would never have voluntary surgery. Things can go bad. And this is why you actually have anesthesiologists, a special doctor just for sedating the patient. So what happens is that this doctor realizes that your blood pressure is going down. But that's not that bad, because we have drugs to keep your blood pressure up. So that means that we now give you a fifth drug. But the drug that kept your blood pressure up had some other problem. I'm not an MD, but... And then we give you a sixth drug. So this works when you are in the operating room, and then we pull you out to recovery. And then you now have six drugs that are all balancing things, and they're all wearing off at different rates. So people frequently die, well, not frequently, but some of these deaths actually happen in recovery. Because suddenly the drug that kept your blood pressure up wore off faster than the one that repressed your blood pressure. So what we would like here... Ideally you would like a drug that's tunable. Rather than having drugs that wear off in 15 minutes, imagine if you could have a drug and a counter-drug, so that you can specifically decide, oh, I want to sedate this patient 5% deeper, like a dial. And then, the breathing is starting to go down here, so let's reduce the level of anesthesia by 5%. And I can't do that if you're about to die.
I can't reduce it and say, well, the patient is going to start breathing in 15 minutes. No, in 15 minutes the patient will be dead. So I think there's a lot of interest in finding drugs that we can fine-tune, where there are fewer side effects and everything, so that we can perform surgery on more and more elderly people. And this actually works. It's one of the reasons for all these new anesthetics. So today we don't hesitate to perform surgery on a 90-year-old. You would never have done that a generation ago. But they're important. They're also super important in lots of drug abuse. P-type ATPases. They're important in a whole lot of physiological processes, in particular one. So what do they do? The main function. All right, this is an entire class of proteins, too, though we spoke specifically about one. What do they transport? Ions. What ions? Given the name, you can probably already guess that. This is the one that uses ATP for the transport. But there's a very common one. What did we call it? If you know the name, you're going to know what it transports. Na-K. Na-K. Sodium and potassium. So the sodium-potassium ATPase. Which is the big... You could almost end it in your body. This is the one that creates the imbalance in ions that creates the nerve potential everywhere. That is what keeps our entire nervous system going. We have a gigantic group out the side of the lab, if you're interested in this, working with super-resolution microscopy, where they can actually get the resolution down to 10 nanometers, so we can see where in your nerve cells specific individual copies of these molecules are located, in live cells, and also see how their transport works and everything. It's super cool. I could ask you to explain the role of voltage-gated and ligand-gated ion channels. Because they have these distinct roles. One is responsible for the signaling inside cells, and the other one is responsible for the signaling between cells. And that is...
I also think it's a beautiful example of how evolution has worked. Because, again, the more you think about it, between cells you're going to need some sort of chemical signaling. Inside cells, chemical signaling is really inefficient. It's much more efficient to do it with electrical signaling. So I spoke a little bit about anesthesia, but this is another trick question. We have no idea, really, how anesthetics work. We know nowadays that they bind to specific binding sites as allosteric modulators in these channels that I showed you. But we don't know what it is that makes you go unconscious. Another great research topic for a PhD, well, not just one. Ten PhDs. Not just bind, so that... They do, but... Let's get things right here. So the ligand-gated ion channels, the fundamental way they work is that they have one extracellular domain, so a domain on the outside of the cell, where you have the neurotransmitters binding. And when these neurotransmitters bind, they will cause almost an earthquake in the entire protein and eventually open up the pore, the channel, in the transmembrane domain. And then the ion channel starts to conduct ions. Both anesthetics and all these drugs of abuse and everything bind to the same channels, but they don't bind to that primary, normal binding site, which you occasionally call the agonist binding site. Instead there is some sort of secondary binding site, and nowadays we tend to call these allosteric binding sites. No matter how much anesthetic or alcohol or something you bind, the channel will not open, period. So it can't open the channel. And I think that was one of the reasons why it took us so long to understand how they work. Because my obvious guess would have been that it binds to the same site and opens the channel, right? They don't. But they kind of act like a lubricant.
So that if you have the anesthetic bound, the channel responds much more quickly and to much lower levels of the agonist, the normal ligands, the neurotransmitters. So it somehow primes the channel. It makes the channel more receptive to them, for some anesthetics. Other molecules close the channel. Remember this thing I said about the Dr. Jekyll and Mr. Hyde attitude of these channels? They are complicated. It's not just one channel. Take the GABA channel, which I'm going to talk a little bit more about when we talk about our research later on in the course. It's a small channel with five subunits. And now we're not talking about all the channels, just one, the GABA channel. There are 17 different genes in your brain for these subunits. So just for the GABA channel, there are 17 to the power of five different ways you can assemble these channels. And we know that only some of them are sensitive to anesthetics and everything. But this is likely the reason for all the diversity in your brain, why we have different types of nerve cells in different parts of the brain, in the central versus the peripheral nervous system. We know surprisingly little about it. But I'm going to tell you a little bit about our research towards the end of the course. But the point is that with both anesthesia and all these things, it's an allosteric process. So you have a small secondary molecule that changes the way the normal molecular binding works, or at least the effect of the molecular binding, which is super important and common in biology. We're going to talk more about it later today. I did not tell you how long it takes for things to happen, well, indirectly I did, but I'll be a bit nasty. Try to answer these. Even if you don't remember it, try to reason about what times we must be talking about. Don't just say a number. Try to argue why you think it happens on that timescale. For which one? So number 12, then.
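As a quick check on the GABA subunit count mentioned above, here is a minimal sketch of the arithmetic. The first function is the "17 to the power of five" from the lecture. The second function is an added assumption, not something stated in the lecture: it treats the pentamer as a ring and counts arrangements that differ only by rotation as the same channel. The function names are hypothetical, and neither number says anything about which assemblies actually occur in a cell.

```python
from math import gcd


def ordered_assemblies(n_genes: int, n_subunits: int) -> int:
    """Upper bound: every subunit position independently picks one gene."""
    return n_genes ** n_subunits


def ring_assemblies(n_genes: int, n_subunits: int) -> int:
    """Assumption: subunits sit in a ring, so rotated copies are identical.

    Counted with Burnside's lemma over the cyclic rotations of the ring.
    """
    total = sum(n_genes ** gcd(k, n_subunits) for k in range(n_subunits))
    return total // n_subunits


upper_bound = ordered_assemblies(17, 5)   # 17^5 from the lecture
distinct_rings = ring_assemblies(17, 5)   # fewer, once rotations are merged
print(upper_bound, distinct_rings)
```

Either way the count is in the hundreds of thousands or more for a single channel type, which is the point being made about diversity.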
Yeah, I think you were in the right ballpark. Well, why would you say 200 microseconds? Yeah, okay. No, but actually it's a super good answer, because this is of course how I reason about these things too, right? I compare them to other things I know. So an upper limit here could somehow be the time it would take to fold a normal protein. This is a smaller transition that needs to happen quicker. Can it take a second? Why not? What would happen if it took a second for a channel to gate? Yeah, it would take like half an hour to move a finger. So obviously it has to be way faster than a second for these things to happen. And you can say that even without knowing anything about the structure. Good. So we have some sort of upper limit. And then if you reason a bit more, there are going to be tens of thousands of channels that need to open on the way down. So these have to be things that are way faster than milliseconds. So a few hundred microseconds, I think, is a great estimate. But it also has to be slower than, say, just adding one more residue to an alpha helix or something, right? Now think about the time it takes for an individual ion to go through a channel. Actually, you could calculate the acceleration and everything. I'll spare you the details. Think ten nanoseconds. And I guess you could compare it to, say, adding one more turn to an alpha helix or something. The reason to mention this is that membrane proteins, and in particular ion channels, are so insanely efficient that to the right type of ion, it just looks almost like a hole. There's nothing whatsoever that stops it. It goes through at, like, billions of ions per second. Remember that at the beginning of the course I also talked about this difference between sodium and potassium. The reason why this is so amazing is that to the right ion, the smaller one, sorry, to the right ion, the larger one, it's just a hole.
There's not even a speed bump. The wrong ion, though, never goes through. Not even once per second. It has an insane efficiency in terms of the speed at which it conducts ions, combined with an even more insane efficiency in that it never, ever makes a mistake. And you're doing this with something in the ballpark of 20 alpha helices or so. Oh, sorry. Exactly, open or close. Yes, gating, because if you say opening, you mean opening, right? If you say closing, you mean closing. And if you want to talk about the process where the channel either opens or closes, where it changes from one state to the other, we use that word for it. Which is much easier to say than "open or close" all the time. So channel gating is the process when it's changing between those two states. So how long does a larger structural transition in a membrane protein possibly take, then? Yeah, number 12. So we said in the ballpark of 200 microseconds or something. But again, we're within an order of magnitude. Because it would be significant if you started to go up to the level of several milliseconds, right? My reaction time is maybe in the ballpark of, well, 0.1 or 0.05 seconds or something, depending on whether I need to do it voluntarily or not. So individual channels have to open hundreds of times or even a thousand times faster than that. So the ballpark of hundreds of microseconds. But no, it isn't exact. Some of the channels here might very well gate in 20 to 50 microseconds. There will likely be some other ones that take a millisecond. So there isn't a unique number. In many cases, we don't know. It will vary from channel to channel. So the main point of these questions is to teach you to reason about it. To realize what is reasonable and what is not reasonable. The point is not that it's 200 instead of 500. 500 might very well be a better number. I don't care.
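The order-of-magnitude argument above can be written down in a few lines. This is only a sketch of the reasoning, using the numbers quoted in the lecture (a reaction time of roughly 0.05 to 0.1 seconds, and channels that must open hundreds to a thousand times faster than that). The function name is hypothetical, and the result is a plausible range, not a measurement.

```python
def gating_time_bounds(reaction_times_s, speedups):
    """Bracket the channel gating time from a reaction time and a speed-up.

    reaction_times_s: plausible whole-body reaction times, in seconds.
    speedups: how many times faster a single channel has to be.
    Returns (fastest, slowest) gating time in seconds.
    """
    fastest = min(reaction_times_s) / max(speedups)
    slowest = max(reaction_times_s) / min(speedups)
    return fastest, slowest


# Numbers quoted in the lecture, used here as rough inputs.
fastest, slowest = gating_time_bounds((0.05, 0.1), (100, 1000))
print(f"gating time roughly {fastest * 1e6:.0f} to {slowest * 1e6:.0f} microseconds")
```

The bracket runs from tens of microseconds to about a millisecond, so 200 and 500 microseconds are equally defensible answers, while one second is not.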
Because the point is that it's not one second, and once you start there, you can start working your way down so that you have a rough idea whether the process can be important or not. And if you think about it, it's roughly the same reasoning we used when we talked about Gibbs versus Helmholtz free energies: that in practice the pressure and volume fluctuations are not going to be important for proteins. I never calculated that. You didn't calculate it either, I think. But we just said, you know what? Compared... yes, you're right. There are six orders of magnitude of difference, so there is no point in even worrying about it. So it's important to have this gut feeling about different concepts. So all I'm asking for is to have a gut feeling about these things. I would never say that you're wrong even if you were off by an order of magnitude. But if you say that it takes 10 seconds for a single ion to go through a channel, it's not going to work. So number 13 is a bit related to the things we talked about here. How long does an even larger structural transition take? I even said it earlier today. You can be talking about 10 minutes, even more. It can be exceptionally slow. And compare this to the beta sheets that we talked about before. This means that there must be some gigantic energy barriers involved. Such as the energy barriers if we need to move things into membranes and everything. And I'm sorry that it sounds like hand-waving. The reason is that it is hand-waving. We don't know that much about it yet. That goes back to the research front. We spoke a little bit about these different models. What is the fluid mosaic model? The fluid part in particular? Yes. It's a better answer than you think. This really is a two-dimensional liquid. It's completely fluid. There is no rigidity whatsoever. The membrane proteins will literally diffuse around. You can actually track this with the super-resolution experiments I talked about.
The ion channels, or the pumps in that case, are actually diffusing along the membrane as a function of time. The fluidity here comes from the lipids. The lipids are amphiphilic, meaning that you have one hydrophilic part facing the water, and then you have the hydrophobic tails facing each other. The lipid bilayer itself will actually be in what you call a liquid crystalline phase. That's not part of the course. You will have a two-dimensional liquid. Just as molecules are free to move around in water, lipids are completely free to move around, but they can only move around in the bilayer. They can't move out of the bilayer. You have a two-dimensional liquid. But then of course we start to assemble things into the cell. You might have membrane proteins. You might have some sugars, cholesterol and other things. Some of those molecules might even bind a bit to each other or interact. You might think of lots of different regions in the cell. But again, if you were to look at this in a super-resolution microscope, it would move around. And today this might seem obvious with the computer simulations. But this was so not obvious in the 1970s. An alternative you could have imagined is: couldn't membranes be completely rigid? Most other things our cells produce are rigid. Bone is rigid. Collagen is rigid. If you had to take a bet, you could just as well have guessed that the entire membrane folds into a specific structure and then all the proteins sit exactly in place. And the fluidity is both a curse and a blessing. It's a blessing because things need to work that way. It's a curse because this makes it very difficult to study membranes. You can't image membranes live, because there is no average structure in a membrane. And again, not that you need to know it, but there is another name for the fluid mosaic model: the Singer-Nicolson model, after the guys behind it.
And then we had the other model, called the Popot-Engelman model, which also had another name, and the reason why I say Popot-Engelman is that if I tell you the other name it's going to be so much more obvious what it does. What was that other name? No guesses? Do you remember what the name was? Two-stage. If you don't remember anything else, remember that Popot-Engelman corresponds to the two-stage model, and if you do that it will likely ring a bell and you'll realize it was the one where the helices first insert and then find each other. Is that model true, based on what we talked about this morning? Yeah. Not really, right? There are exceptions to it. In some cases there are helices that need to find each other first; that S4 helix obviously could not insert if it had to insert completely isolated. And this is of course the reason why people have spent so much time showing that it's not strictly true that things insert completely independently and then find each other. But I think the point of models is not that they should be exact and accurate. The point of a model is that it provides a conceptual way of describing things that helps us understand them. And I think the Popot-Engelman model is outstanding in the sense that it is roughly correct, but the point is that there are exceptions to it. And that gets us to 16. How are pumps different from channels? Hm? Yeah. The channels are simple, boring. They're just holes in the membrane. In many channels, of course, these holes are selective, so they only let through one type of ion. But in general a channel can never transport anything against a gradient in concentration or voltage, whatever it may be. Channels can only help equalize things. I already went through the outline of what we're going to talk about today. So let's jump straight into it. We spoke about the fold universe a little bit on Tuesday.
And in particular the classical quote from Cyrus Chothia, "a thousand folds for the molecular biologist", which sounds so much better than "1,500 folds for the molecular biologist". And as we always say, if you think that there are few genes in your genome, 20,000, this number is insane. Well, they're not identical structures, right? But there are only in the ballpark of 1,000 different ways to fold all the proteins in your body. Or in the Norwegian spruce. And being able to sustain these complicated life processes with such a small number of folds, I would never have guessed it. So the complicated question that we end up with, which we didn't really answer on Tuesday because on Tuesday we just observed this, is that we know that there is a huge diversity in sequences. And for some reason all these sequences end up forming just a handful of folds. If you have fairly large hands, at least. Or rather, we know that it happens but we don't know why. As we saw yesterday, there are a handful of typical folds. The four-helix bundle was one of them. The book likes to talk about this 20/80 rule, which I actually think is quite good: 20% of the folds account for 80% of the proteins. So if you thought 1,500 was low, we're talking about 200 to 300 folds that describe almost everything. So now we're down to 300 different ways to fold all the amino acids in your bodies to create you. This is mostly true for RNA. There are a bunch of folds for RNA. DNA, in contrast, depending on how you count it, can have one or three folds. There are the A, B and Z forms of DNA. So for DNA there is only one stable structure. And now we could argue that there are three possible answers to this question. This could be caused by evolutionary divergence. That is, if you pick a fold, say the globin fold, maybe it is the case that every single protein that has the globin fold was at one point in time related.
So that all the hundreds of proteins in your body that have the globin fold, their sequence identity might be so extremely low nowadays that you can't even identify that they are related anymore, but at one point in time they must have been related, and that is the reason they have that fold. They have diverged so much that you can't even say that they are related anymore. You could also say that it's some sort of functional convergence: that the globin fold is so awesome, or maybe the fatty acid binding protein is a better example. If you have this small pocket that you want to be hydrophobic on the inside and hydrophilic on the outside, it's so obvious to just do that with two beta sheets. Nature favors simplicity, right, and there aren't that many simple folds. So maybe evolution has converged: the sequences were completely different from the start, not evolutionarily related, but they somehow gradually converged to the most efficient folds from the standpoint of function. And the third possibility you can imagine is that maybe, just maybe, there aren't that many ways to fold a protein. There are some folds that are better than others. So which of these three do you think is true? All of them, to some extent. I would argue we obviously know that there are a lot of evolutionary relationships in cells, so there are going to be tons of proteins that are related. And there are definitely cases of convergence; I think the ion channel is the obvious example. If you need to push ions through a membrane, you're going to need to create a hole that is more hydrophilic on the inside, so that it will shield the ions from the hydrophobic environment. The amount of imagination we can have there is limited. There has to be a hole in the middle and there has to be something around it. But what we're going to spend a little bit more time talking about is this third one.
There are surprisingly few different folds that will create stable proteins, which is intimately related to the role of mutations and other things that we're going to see, and to why proteins form the folds they do. Did you talk about fold patterns in the bioinformatics course? I'll spend a little bit of time talking about it. So already in the late 70s or early 80s, when we started to get more and more structures, we realized that at some point we want to classify these structures. We somehow want to say that all the globin folds are obviously the same, and then at some level we want to be able to say that, well, we have some class of parallel beta sheets versus orthogonal beta sheets or something. There may be no sequence identity whatsoever, so let's just look at the structures of these proteins. We want to be able to classify them. And there aren't that many. We also talked about mixed alpha/beta or alpha+beta proteins, TIM barrels, etc. You can dig down into this as much as you want. So there are a bunch of ways you could imagine classifying this. And if there are a bunch of ways to do things, scientists will not agree on one way of doing it. People are going to find a few alternatives. There are two databases that you might want to have heard of. One of them is called SCOP and the other one is called CATH, and again, this is more bioinformatics. I'm not going to ask you specific details about the databases. The reason why these are different is that CATH is very much, mostly, automatic classification. We try to let a computer classify proteins, which is awesome, in particular today when you have 130,000 structures. You need computers to do this. As for saying whether your beta sheets are parallel or anti-parallel, whether it's alpha and beta or just alpha or just beta: if I have the structure, you could probably write a program to do this.
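To make "you could probably write a program to do this" concrete, here is a toy sketch of the very top level of such a classification. It is not how CATH or SCOP actually work. It assumes the structure has already been reduced to a per-residue secondary-structure string in which `H` means helix, `E` means strand and any other character means coil, and the 15% threshold and the function name are arbitrary choices made for illustration.

```python
def classify_class(ss: str, min_fraction: float = 0.15) -> str:
    """Assign a coarse structural class from a secondary-structure string.

    ss: one character per residue; 'H' = helix, 'E' = strand, else coil.
    min_fraction: hypothetical cutoff for counting helix or strand content.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    helix_fraction = ss.count("H") / len(ss)
    strand_fraction = ss.count("E") / len(ss)
    has_helix = helix_fraction >= min_fraction
    has_strand = strand_fraction >= min_fraction
    if has_helix and has_strand:
        return "alpha and beta"
    if has_helix:
        return "mainly alpha"
    if has_strand:
        return "mainly beta"
    return "few secondary structures"


# Made-up strings, not real proteins.
print(classify_class("HHHHHHHHCCHHHHHHHH"))
print(classify_class("EEEEECCEEEEECCEEEEE"))
print(classify_class("EEEECHHHHHHCEEEECHHHHHH"))
```

A rule this crude is exactly where the borderline cases come from, for instance a beta protein with one extra helix on the outside, which is the kind of case where manual curation makes a difference.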
And then of course there will be some more difficult things, but there is some sort of very large class, which is just alpha or beta or alpha and beta, and then a set of large architectures that correspond roughly to the examples I've been showing you, and then there's going to be the topology, which describes the exact number of sheets and helices you have, and on the lowest level there is going to be the homology, which is what you've done in the bioinformatics course. But at the class level they don't need to share any sequence identity whatsoever, so these are descriptions of what the proteins look like, the shape, not the amino acids. SCOP, on the other hand, is the opposite. It's a database where people sit and look at proteins. It's completely insane. Alexey Murzin spent a few decades of his life doing this. But as good as computers are, computers will not be as good as a really good scientist, and Alexey was an outstanding X-ray crystallographer when he did this. So while it might sound bad to do this with humans: if you're interested in just doing statistics and everything, CATH is fine, but if you're really interested in using this for biological conclusions and everything, SCOP is amazing. Because there are of course examples that appear to break the pattern, but if you know enough about the biology and if you have read all the scientific papers about this protein, you actually know that it belongs better in that class, and it's just a freak of nature why it has that extra alpha helix on the outside. So yes, it has an alpha helix, but it is still classified as an all-beta protein, because that is the family it really belongs in. So I love to use SCOP, but Alexey has gradually handed this over to others. It came from the Cambridge group, and it's definitely worth browsing around and looking at these databases. Yep? Sorry, those? No, I think that they faked this a bit, actually. I figured... no, they might very well be, they might very well be. I've
never thought about that. They probably are real structures. They probably are real, but of course shown from the right angle, and then I bet that they cut off a domain. They are real, because that's a TIM barrel, right? This is a dimer they used to make sure that it looks like an S. They might have faked the dimer, that I don't know, but they are real structures. The other thing that's going to happen, and why these databases are important, is that structures undergo evolution too. Or of course it's not the structure that undergoes evolution, it's your genome that undergoes evolution. But evolution happens because of what? So what was the central dogma? And where is evolution in this picture? So at this point, this is evolution, which is important. Or actually it's not so much evolution, it's rather natural selection, right? And there are a bunch of cool examples. Llama hemoglobin binds oxygen harder than pony or horse hemoglobin. Why, and what difference does that make? At high altitude you have less oxygen, which means that you have a lower oxygen pressure, so it helps you to bind it harder. But shouldn't it always be good to bind oxygen harder? Why don't all organisms do that? Because you're also going to need to release the oxygen to the myoglobin. So the equilibrium for a llama is different than it is for most horses. Another one nature has is that fetal hemoglobin is different from adult hemoglobin. Why? Well, in that case, why on earth would you even have hemoglobin in the first place in a fetus? Where does it get its oxygen? It needs to steal the oxygen from the mother, so the fetal hemoglobin needs to have a much higher affinity for oxygen. They don't necessarily mix the blood, right, but it goes through the placenta and everything, so you need to have a hemoglobin that is able to steal the oxygen from the mother so that the fetus gets it. But the second you're born, these genes tend to be shut off, since they would be inefficient to have. The other thing that we appear to see
is, and I showed you some examples of that on Tuesday, that eukaryotic and in particular vertebrate proteins are more complicated than prokaryotic ones. Those ion channels, for instance: the first ion channel structures that people determined were from prokaryotes. In prokaryotes you don't have voltage, you don't have nerve signaling, but you have almost exactly the same type of potassium channels in bacteria. In bacteria they're controlled by pH: you just change the pH and they will open or close. They consist of four small domains, and that's pretty much three helices per domain. What has then happened in a human is that you've added an entirely separate unit, the voltage-gated unit. The central part of the channel is exactly the same as in bacteria, but this extra unit means that in humans we can control this with the voltage across the membrane. So humans, or more advanced vertebrates in general, frequently have more advanced and complicated functionality. I wouldn't say that it never happens, but it's rarer with multi-domain proteins in bacteria. So can you imagine why bacteria don't like these advanced proteins? Wouldn't bacteria want to be like us? So the point is, yes, that we are the inefficient organism and they are the efficient organism. A bacterium can't afford a process that would require seconds to happen; they need to go through a generation in 15 minutes. So bacteria, I would say, are in many ways the pinnacle of evolution, because anything that is not super efficient has been killed by natural selection. And that's also why they have exceptionally small genomes, just a few thousand genes. You only have the genes that you absolutely need for the bacterium to survive. So any advanced things like this, right, there's no way bacteria would ever have them. So the overall pattern of these things is that, as I mentioned with the
ion channels, right, we frequently use the bacterial ones as model organisms, but there's more embroidery in human proteins. Human proteins are frequently less stable. We don't really know why. It's always harder to work with human ones. Oh yes, sorry, I even had an image of this bacterial channel. And it's bad, because the coloring is off here, sorry about that. But do you see the central pore here, and even those loops there? It's exactly the same gating mechanism there. And the fact that you have these helices tilting here, it's exactly the same way the helices are placed here. But then this domain has this part added, and that domain has that part added. So these are the voltage sensors that instead help push it open. It's a very common difference between eukaryotes and prokaryotes. How fast do you think this happens, hmm, the evolution of structures? How long do you think it would take for a protein to evolve a new functionality? They have the same function, right? Yeah, they have the same function in the sense that... everything depends on what you mean by function. Same function in the sense that they both conduct potassium ions. This one gates by changing the pH, that one gates by changing the voltage. This one is not sensitive to voltage changes. So function, yes, the effect is the same, but they're controlled in different ways. But take a guess: if you imagine that we had high evolutionary pressure, how fast can things happen? Yeah, remove the "million" part. There's a really cool paper, seven years ago now, in Science. It's tomcod in the Atlantic, and apparently, if you live very close to the Hudson River... the level of PCB in the Hudson River is insane. Don't eat fish from the Hudson River. And in basically a few generations, and they can actually trace this based on where they fished the tomcod, there are certain genes that have been changed, so that the tomcod living here developed new genes, so that the proteins are resistant to the PCBs. It's basically environmental toxic
chemicals in the environment, basically pollution by humans, and you can specifically see the alleles and everything. Super cool study. I uploaded a copy. It's really an evolutionary study, so it's a bit off from this course. I don't expect you to read the entire paper, but if you want to read it, it's on the website. Oh, that's good. It was a while ago. I think they're comparing this to older samples of the fish or something. But the point is that this can happen super fast. And the other obvious thing is the timeline: when did we start releasing PCBs? It was not a million years ago, it was in the 50s. Well, it's not really a new function. It's not that we've developed completely new proteins; we have adapted the proteins' functionality. I don't know what specific effect PCB would have on these proteins, but for whatever reason, nature has evolved the functionality a bit so that the fish are no longer as sensitive to PCBs, and that can happen in a few decades. But this is of course an example where the evolutionary pressure must have been insanely high around here, right? It was likely so polluted that fish couldn't even live there. Another example where you see this is actually around the Chernobyl reactor in Ukraine: animals have apparently acquired mutations so that they can withstand radioactivity. So I guess the negative way of looking at the world is that humans polluting the world is not the end of nature. Nature is pretty good at that thing. That doesn't necessarily mean that we will be around, but nature will likely be around. So the only thing we're going to achieve is that we might kill off the humans, but nature will survive. But let's go back to this thing: if there is some sort of structural evolution here, or if we say that there are a handful of structures that are better in some sense than others, why are those better? What we said earlier, I think it was already in the first week, was that we argued for the
importance of hydrogen bonds. Having lots of hydrogen bonds is good. And that means that, to create lots of hydrogen bonds, we can't really put a whole lot of loops or coil on the inside, because if we started putting those loops on the inside, we would not have all the beautifully paired alpha helices and beta sheets. So floppy things that can't form a lot of good hydrogen bonds or secondary structure, we have to place them on the outside. So the cores need to be regular alpha helices and beta sheets. And already there we've started to discard an insanely large fraction of all the proteins you could have, right? They have to be regular on the inside, with loops on the outside. The edges of, in particular, beta sheets, but also the ends of helices, must face water, roughly by the argument I made about the membrane proteins: at the end of an alpha helix, the peptide groups, that means the hydrogen bonds, are not paired. At the edge of a beta sheet, if this beta sheet did not face water there, well, all these unpaired hydrogen bond partners would be placed right in the middle of a protein. That would be astronomically expensive. So now we've also said that it's not just the loops that have to be on the surface; the ends of things also have to be on the surface. And that means that it's going to be much more difficult for us to create something very large, because if we created very large proteins, we would have too many of these edges on the inside of the proteins. So just as we're talking here, we're seeing the number of possible folds dropping away. Helices and sheets must be separate. "Must" might be a strong word, but look at these classifications where we mix them, right? There are ways of mixing helices and sheets. Remember, there were two common folds. Sorry, Rossmann was one of them and the other one was the TIM barrel. But are they... if you really look at those
folds, sequence-wise, I buy it: helices and sheets are mixed, they're perfectly mixed. But if you look at the actual structure, are they mixed? Because here's a Rossmann fold, right: the entire sheet is one region, and then the helices are one region there and the helices are one region there. So even in these structures that mix them perfectly in sequence, the secondary structure elements are still discrete, so that in practice the helix and sheet regions are largely separate. And then there are even fewer ways we can combine them. Because if you now have a small structure with one sheet region and two helix regions, yeah, you can have helix-helix-sheet or helix-sheet-helix, and that's pretty much it. And then you can choose a little bit how you orient them, orthogonal or parallel, and somewhere around there my imagination runs out. You can probably find something with a loop or something, but the point is that even with something fairly large like this, there aren't that many. The reason why this is a bit large is that this is also a dimer, so if you cut this in half, there aren't really that many ways to organize it. And what this somehow hints at is that these are rules, but they're not hard laws. The point is that any time you make one of these exceptions, you're going to be paying. And you can of course afford to pay now and then, but you can't always pay for all four of them. So in general, exceptions, so-called defects that violate these rules, are going to be costly. And what does nature think about things that cost? It's bad, right? Because if you now have a protein that has this cost, say an edge inside the protein, and then one of your offspring suddenly has the reverse, where they put it on the outside, that is going to be a more efficient protein. It will cost less energy to fold it. It's going to have an evolutionary advantage, and it will change. It might not be an insane evolutionary pressure, so it might take a million years, but eventually it will change. So we already spoke
a little bit about these layers. The problem with one single layer: it's by far the most simple, but you can't really do anything with it, right? One floppy beta sheet. The first thing is that it's going to be floppy like a piece of paper. And the second thing... I'm not sure whether you saw the news the other day. It was Scott Pruitt of the EPA. He had asked to get a $70,000 bulletproof desk and was apparently rejected, and then people were making jokes about this, because a bulletproof desk doesn't really help you: you can just go around the desk and shoot the guy instead. It's the same problem here, right? One layer, yes, well, what are you going to do with this layer? You can't separate anything from anything else, because things can just diffuse to the other side. Two layers: we're in nirvana. This is the fatty acid binding proteins. It's the smallest thing you can imagine that is great for shielding, and it is also small and efficient. And while I said beta sheet here, it doesn't say sheet, it says layer. The myoglobin fold: you have one small cavity, one small hole on the inside where you bind the protoporphyrin group, the heme group, and then an outside. So if you want to keep things simple, stay there. Three layers: well, that would be the Rossmann folds, right? And there are cases where you could have two cavities there. If you need more than one binding site, technically you could have two proteins, but if you need lots of these, well, having something with two cavities is efficient and good. So you will definitely see these too. Four layers: now it starts to get complicated. Now you're going to need to bury some hydrophilic amino acids here, because if everything was hydrophobic, they would not be stable and everything. I can't immediately think of any protein that has four layers. They do exist, but the reason that I can't think of one is of course a good indication: there is no obvious reason why you would absolutely need four layers for something that you can't do with two or three
at least. So that's going to be complicated to create; you need a very special composition of amino acids. Five layers: forget about it. I will eat my left shoe if you can find a five-layer protein. Don't take that as a challenge; I bet there is one exception in the PDB, but I bet you can't find ten. Yes, divide it over the 130,000 structures. There is five difference per individual in one form, because it's not being too expensive, right? So nature needs to keep things simple. And with that argument, any time you're going to need a gigantic protein, such as one of the ion channels, anything in a human in particular, it has to be decomposed into smaller units, and in effect those smaller units are going to be the folding units.
I will show you one more slide and then I will take a break. I love the way the book brings this up, because the point is that we always start from a sequence and ask how it will fold, but I think the much more interesting question is the opposite: if you have a given fold, say a globin fold, what sequences would fit that fold? Because we know there are very few protein folds. And I think you tend to think that any sequence will form a protein, and that's actually wrong. If you just randomly string together amino acids, sorry, you can do it till the day you're dead; you're not going to get a protein on average. So there are exceptionally few sequences that will even be stable in folds, and that means it's much more interesting to think about this: if the folds are so special, what is it that determines that a sequence is so special that it will actually find one of those folds? Most sequences will not form a protein. And it turns out that some simple folds can host, not any, but lots of sequences, and that's likely why they're so common. And as we're going to talk about after the break, if there is some sort of defect, one of the exceptions I had on the last slide, if you have a fold that somehow needs to
create a very tight loop or something, there are going to be very few amino acids that can fit in that loop, only glycine basically. And now you've just reduced the number of sequences you can put in that loop astronomically, right? Because it's only if you have a glycine in those positions that you can even form that fold and have it be stable. So the more defects you have, the more you're going to need some very special amino acids, maybe even some disulfides or something to stabilize it, and that means that fold can only accommodate very few sequences, and it's going to be much less likely to find it by evolution or spontaneous folding. Good liberal folds that don't really have defects can host, not any sequence, but many more sequences, and that's why they're going to be more common. But we're going to talk more about that after the break. It's 10:27, so let's meet here at 11.
So, the concept I brought up here: we somehow want to understand why some sequences fit these common folds that we see everywhere, and what it is in these sequences that makes them work that way. One example that we already spoke a lot about is the Greek key. The same thing here: for this fold to even be possible to create, we need things that like to be in beta sheet all the way here, or they're not going to be stable in the beta sheet part. But equally well, we also need some super tight turns for the inner turns here. They should only be 2-3 residues; if they're more than that, they're going to start perturbing the next turn. And then on top of that we need a large turn, a large stretch of amino acids here, that prefers to be coil or turn. That might seem obvious, but you might need 10 amino acids here. What if there are 3 or 4 of them that suddenly prefer to be in an alpha helix? Then you wouldn't get a Greek key, right? So even for this fairly simple structure to form there are lots of restrictions: glycines there, things that
don't want to be helical there, and all the beta sheets have to be the same length.
The same thing if we compare different types of proteins, globular and membrane proteins. Globular can be a little bit mixed up, hydrophobic versus hydrophilic: hydrophobic, hydrophilic, hydrophobic, hydrophobic. Well, this is probably more alpha helical than beta sheet, right? So one side of the helix might be hydrophobic and the other one might be hydrophilic. Or you could imagine that every second one is hydrophobic and every second one is hydrophilic. For a membrane protein it's much more common to have one hydrophobic region, then a hydrophilic region, then a hydrophobic region, because they correspond to the transmembrane segments. And the fibrous proteins, on the other hand, can have a mix of hydrophobic and hydrophilic, but they need to have the same repeat happening many, many, many times. And again, we're seeing all the freedom we thought we had in the sequences slip away.
And we've kind of seen this a couple of times. I know that when you see this, these defects don't seem so bad. It can't be that bad. What if there's one unpaired hydrogen bond? We can survive one bad hydrogen bond, right? It's just 5-10 kJ per mole. And what is the total stabilization, the total energy at least, of an entire protein? That must be astronomically higher, right? If you have 500 hydrogen bonds in a protein, how on earth can 1 or 2 hydrogen bonds matter? It doesn't make sense. Same thing with beta sheets: the beta sheets are large, so it shouldn't make a difference. And yet when we start looking at it, with these small differences, you always see things going over that way; you virtually never see something like that with beta sheets. I even talked about these left- versus right-handed turnovers. It's a very small difference; it's slightly better to go one way. And what always happens when there are two ways to do it and one of them is slightly better? We
never see the bad turnover, although it's just one or two hydrogen bonds that matter. So there's something here we don't understand. The difference should be small, and, well, the difference is small, just 5-10 kJ per mole, and yet that small energy is enough to decide between never seeing a protein and a super stable fold. So the stabilization energy of proteins is surprisingly low.
You can think of this in other ways too; you can think of it in terms of entropy. If it's a very rigid, small fold, that's a very low entropy state: there's only one specific way we can put all the amino acids there. While if you have a fold with a bit of flexibility, you would have much higher entropy: you have more freedom, and that would create a lower free energy. So there's also an effect for folds that can choose between many different conformations, at least local conformations (because if you start to unfold, it's no longer that fold): if you have some sort of built-in flexibility in the fold, that's likely good for you too. Don't worry, we're going to come back to this when we talk about protein folding transitions.
And here you should be happy; you've seen this before, right? How do you determine how likely things are, depending on whether you're looking at differences in energy or free energy? The probability of seeing something is proportional to an exponential of minus delta E, or delta F, depending on whether you're looking at energy or free energy, divided by kT. The only problem is that it's completely wrong. It's completely wrong here, astronomically wrong. There is no detailed balance. And here we have this term: why on earth does he keep bringing up detailed balance? What was the assumption, under what conditions did the Boltzmann distribution work? Equilibrium. And equilibrium requires some sort of exchange, right? That you're visiting different states, and
arginine never visits a serine state. The second you have picked your amino acids, you're stuck with those amino acids, so we never change between different things. So the problem is that the Boltzmann distribution can't directly explain why some amino acids are stable and others aren't. And this detailed balance has to do with the concept that you move both to the left and to the right over the barrier all the time, and equilibrium means that the flow over the barrier from left to right is the same as from right to left. That's why detailed balance was so important when I introduced it. So you need to have an exchange between the states, and we don't have that.
Alexei likes to call this the multitude principle, which I think is a fitting name. And his suggestion, the way to think about this, is that the more sequences that can fit a fold without disrupting that fold or introducing defects in it, the more frequently you're going to see that fold. Because again, think of that as a liberal fold: it will accept almost any amino acid. While a fold that is very picky is not going to be seen as often; basically, a restaurant that's very picky with its guests is going to have fewer guests. So there is still something that's similar to the Boltzmann thing: defective things are not impossible, just as high energy things are not impossible; it's just that they're less common. And we've seen this with helices and sheets and everything. There are a limited number of folds for globular proteins; you could argue that that was also the case when we saw that we only had helices and sheets on the secondary structure level.
Yep? So remember, now we were specifically talking about when we're looking at different sequences. If you have one sequence and you think about what all the possible states are that this sequence can fold into, and which one is going to be most stable, then you have detailed balance, and then we have an equilibrium between different states, and
then the Boltzmann distribution definitely applies. But if we're asking whether there should be an arginine or a serine in this particular sequence, that is not something we can use the Boltzmann distribution for directly, because we don't change them.
And think about the amino acids themselves: we have roughly the same number of hydrophobic and hydrophilic amino acids in your genome; it's almost 50-50. And that starts to matter already at the secondary structure level. So what fractions of amino acids create heli..., sorry, what patterns of hydrophobic versus hydrophilic create different types of secondary structure? Or, if you look at the amino acids that prefer to be in alpha helix versus the amino acids that prefer to be in beta sheet, what determines the size of these elements, how large they are and how stable they are? So what do we need to get an alpha helix? Yes, but four residues doing what? Four residues that want to be in an alpha helix, and similarly for a beta sheet. So when does the alpha helix start or end? If you want to ask how likely it is to have 20 residues in an alpha helix: at the start you need to have something that does not want to be in an alpha helix, then some things that want to be in an alpha helix, and then other things that don't want to be in an alpha helix again.
This gets complicated if we're going to start looking at the secondary structures here and everything, because the repeat here might be 3.6 residues per turn, and the repeat here would be two residues. This is an excellent example where we don't want to confuse ourselves with all that detail, so let's create a much simpler model. Let's just talk about these repeating patterns that I mentioned. So let's say that we have some sort of red pattern that is what we're interested in, and the blue dots are something
that is not. This can be alpha helix or anything; it could be anything, because what I'm now interested in is how large these regions are, and why. The book goes through this in a bit more detail, but I'm going to skip through it a little bit just to get the idea. Here I said polar versus non-polar, but forget about the specific property: P is some sort of property, and p is the probability that a residue has that property (say, it likes to be in alpha helix), between 0 and 1. If we then want to ask how long these groups of elements are that we see, well, to get something like, what is it, 1, 2, 3, 4, 5... to get 8 elements, we need to have something that is not P, and the likelihood of having that is 1-p; then we need 8 of them that are P, and instead of 8 we can say r, so that means r such terms; and then one more term that should not be P, right? That's the probability of having r such dots after each other.
And then, if you want, you can do the math. It's not super complicated. I am a physicist, I do think this is beautiful, but even I wouldn't remember this. But just as you need to know your amino acids, I've done enough math and physics that if I see that, I think, oh, there are rules for these series, and I would look it up in my mathematics handbook, and then I would find out that for that particular series, the sum of p to the power of r, with r from 1 to n, there is a formula. Then you use this formula, and then you need to recognize that the r in front of it complicates things a bit, right? But you can take this normal series and take the derivative of it; then the r is going to fall down here, and then you have to adjust for that. It's actually fun, but it would probably take me a while, so just follow the book. This has to do with the fact that we teach math in different ways from how we do math. I think it's a super fun exercise, but I'm not going to bother you with it, because it's not
central. But if you do all that, that expression is the likelihood of having a certain length, and because we want to calculate the average length, weighted with the likelihood of having each length, it simplifies to this expression. Again, I don't expect you to know it by heart, but the point is that it's a fairly simple expression that just depends on what the probability p is of having something in the red state. And if you then take p and say that it's roughly 0.5, if it's something that is 50-50, the average length we would have of these stretches is roughly 3. That doesn't seem to ring a bell or anything, but this is a gross simplification: for any type of property, if you just put residues after each other and the probability is roughly 0.5, you're going to see that the average length of such elements is roughly 3. And then we could argue that in an alpha helix, well, the repeating unit we needed... you're not going to have an alpha helix before four residues; you said 4, and this would work with 4 too. So 3 times 4, that would be 12 residues, and that's maybe a little bit short, because again this was an approximation. But the point is that we don't have 100 residues in an alpha helix, and the reason is not that you can't have it. If you actually do take 100 residues that would prefer to be alpha helical, they would form an alpha helix. But when we just randomly put residues after each other in the genome, the likelihood that you would never ever have anything that did not want to be alpha helix is very low.
So remember that earlier in the course we had this argument about the shortest length for stability of the structures, and the shortest length had to do with the barrier, and now we can start to draw conclusions about the longest stretch. For the beta sheet, again, the repeating unit there would be 2, right? To get back to the same side you need roughly 2, and 3 times that would be
roughly 6, and that's also a bit short, so this underestimates a bit. But the sizes we have of these elements are not given by evolution or anything; it's fairly simple physics. They can't be too small, because what would happen with the secondary structures? We would pay in free energy, because we would only get the barrier and we would not get the gain. And here's the difference: they can't be too short because that's bad for physics; the reason why they can't be too long has to do with probability and the genome. It's very unlikely that we put hundreds of residues after each other that would all like to be beta sheet. So there are different things that determine the lower limit to the size of secondary structures and the upper size of secondary structures. And you can do that exercise; this actually applies to loops too. Then you would have 3 states, right? The loops should be neither helix nor sheet. So loops too have these characteristic lengths; it would be rare that you would have 500 residues and none of them would ever like to be helix or beta sheet. And with that we've removed even more of the structural freedom we had: you're not going to have very large elements, and you're not going to have very small ones.
Yes? So this is, again, a probabilistic upper limit. Your genome is not entirely randomly organized, but when you combine this with natural selection, in your genome you know what the fraction of residues is that prefer to be in alpha helical shape. So it's not really an upper limit, apart from the number of expressed amino acids in your genes, but the probability will of course be so low that in practice you're never going to see it. There are some exceptions, but the longest helix you would see in a normal protein might be 30 residues long. There are some exceptions, when you have coiled helices, the alpha helices in your hair; again
it's an exception.
But we still haven't answered the question that we were after: that 5-10 kilojoules, or 1 kcal, why does that kill a protein? We did study this the very first week: we talked about how much we gained when we took an amino acid and put it on the inside of a protein, or solvated it in oil or something, right? What we're going to go through in the next few slides is very related to that, but it's kind of the opposite. And of course there is a connection to the Boltzmann distribution; it's just not as simple as we thought. So rather than worrying about that... well, actually, you can see a good illustration here: how common is it to have residues on the inside of proteins, in particular, what is the likelihood of having something in the core versus on the surface? And then you see that you have lysine and arginine here on the surface, and in the core you only have the hydrophobic ones. So there's a not perfect but very good correlation between how hydrophobic things are and whether they occur in the core or on the surface. And that is definitely related to the Boltzmann distribution, because that had to do with individual existing amino acids.
Let's pick one fold, anything, myoglobin, and let's say that there is a particular residue, 47, that could be either serine or leucine. Then we need to go back to what we did the very first week and say: serine likes water and oil by roughly the same amount, so serine can go either way. Leucine, on the other hand, much prefers oil. So if we are on the inside of the protein and would like to expose something to water, serine doesn't care; for leucine, on the other hand, we would pay in kT, which is bad. But if you now think about this inside a protein: every single fold that has serine on the inside will also work with leucine on the inside, because leucine is going to be better to have on the inside. But then there might also be other folds. Let's
say my myoglobin had leucine in this particular position. If I now take this fold with leucine and change it to serine, it's not obvious that it's going to be stable, because that's worse. So leucine will always be better than serine on the inside; I can always go that way, but I can't guarantee that I can go the other way. Sometimes I can and sometimes I cannot. So there is a relation between how stable a fold will be for a sequence and what the hydrophobicity, or the solvation, of the individual amino acids is, and this was related to the Boltzmann distribution.
So if you think about this entire protein now, my myoglobin: for the rest of the protein, let's say that the stabilization free energy is delta F. Whether it's 500 or 5 billion doesn't really matter, but it's a large number. And the defect would then just be delta epsilon. So what we are interested in is: what is the total free energy of the entire system, and what is the role of that small delta epsilon? And what I'm going to argue (you can actually prove this mathematically, but you can also show it based on experiments) is that if you take any number of protein sequences you can imagine, mix them up, and somehow we could measure the free energy, how much it would take to fold them into structures, some would be positive and others would be negative. There is a...
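As a rough illustration of that thought experiment, here is a minimal sketch in Python. It assumes a toy model that is not taken from the lecture or the book: every residue contributes an independent random amount to the folding free energy, drawn uniformly from a made-up range that is slightly unfavorable on average. The function name, the range and the arbitrary energy units are all hypothetical.

```python
import random
import statistics


def folding_free_energies(n_sequences, n_residues, low=-2.0, high=2.4, seed=0):
    """Toy model (an assumption for illustration only).

    The folding free energy of each random sequence is the sum of
    independent per-residue contributions drawn uniformly from
    [low, high], in arbitrary energy units. Negative means stable.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        sum(rng.uniform(low, high) for _ in range(n_residues))
        for _ in range(n_sequences)
    ]


energies = folding_free_energies(n_sequences=5000, n_residues=100)

mean_f = statistics.mean(energies)      # average folding free energy
sigma_f = statistics.stdev(energies)    # spread between sequences
stable_fraction = sum(e < 0 for e in energies) / len(energies)

print(f"mean free energy  : {mean_f:.1f}")
print(f"standard deviation: {sigma_f:.1f}")
print(f"fraction below 0  : {stable_fraction:.3f}")
```

With these made-up numbers most random sequences come out with a positive folding free energy, and only a small tail ends up below zero. That small tail is the part of the distribution the rest of the argument is about.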
Have you heard about the central limit theorem in mathematical statistics? If you take lots of things and add them up, the sum will always be normally distributed. Any type of statistical distribution, if you just draw lots and lots and lots of samples, you're going to end up with a distribution that looks like a Gaussian. And for that you need an infinite number of samples, which we always joke about and say in physics is roughly 12 or so. If you look at the height distributions or length distributions of students or anything, it's going to look that way, and it will look that way for proteins too. And you're going to need to trust me for a second when I say that in general this is going to be positive, so general sequences won't even form proteins. But of course there will be some small fraction, the black part here, that will be stable in this fold, because we have seen the myoglobin fold. So those are the ones that are stable.
But let's then assume that I introduce one of these defects. The important thing is going to be when I cross this zero bar, right? If I stay below zero, well, it's not good that I introduce a defect, but in total I will still fold my protein. But the second this effect is large enough that I cross zero, the protein will no longer be stable. So the part that matters is not this entire thing; for the individual mutation, what matters is: will I cross the zero bar or not? And since this is going to be very close to zero here (we're not down by the door there), this is a very small component. And again, if you're into mathematics: since you know the shape of this one, and you have an average, which is delta F, and we also have the standard deviation from the center, you can formulate this as a Gaussian. And if you formulate it as a Gaussian, the likelihood that we are still on the stable side, below zero, so that the defect here doesn't screw up the total stabilization of the entire protein, so that I actually stay in a folded state, would be an integral
of this distribution I showed, and you're going to do a bunch of mathematics. Because this is an exponential, you will eventually end up with an exponential expression here that depends on the small energy, and in the denominator you're going to have the properties of the distribution: the standard deviation of the distribution and the average value of the distribution. Don't worry, I don't expect you to know this by heart. The proportionality constants we don't care about, but the point is that the likelihood that a fold will be stable with the sequence you just gave me is going to be an exponential that depends on the energy and then on sigma and delta F.
You actually know what this is. Have you seen things that look like this? Of course you have. This looks, walks and quacks like a Boltzmann distribution, right? It has exactly the same type of shape. But there is something that you don't see in this one, which is what? It doesn't say kT, right? So it's not the Boltzmann distribution; it's just very much like a Boltzmann distribution, because it's coming from similar properties, but there is no kT here directly. So if this delta E, the defect energy, goes up, the number of folds that will still be stable, well, this probability, will go down, right? And this is the other reason why two weeks ago I had this corny slide where I showed you how quickly the exponential function grows. It grows tremendously fast. So you might think that the energy is low, but as the energy starts going up here, the probability will very quickly be so low that things will never be stable.
You can actually see something else from this that is pretty cool. So we said delta E for a hydrogen bond might be, say, 5 kilojoules per mole or something. What is all the stuff you have here in the
denominator? I don't have this equation on the next slide, so just have a look at the equation first: delta E in the numerator, and then a denominator that is a quotient between sigma squared and delta F. Delta F, the average energy, is something that is proportional to the size of the protein. Do you agree with that? If the protein is twice as large, the energy in the protein is likely twice as large. When it comes to probabilities and standard deviations, I know that you haven't talked that much about this in the course. I actually have a lecture on that, at least half a lecture, later in the course; understanding statistics is important, even if you're not going to be mathematical statisticians. The standard deviation sigma usually goes up as the square root of the size, so sigma squared will go up as the size of the protein. But the denominator we had here was a quotient. So if sigma squared goes up as the size of the protein, and the energy also goes up as the size of the protein, this term will be independent of the size of the protein, right? Because you have a quotient of two things that both grow with the size. So what we're really saying with this equation is that the probability to be stable only depends on the defect; it does not depend on the size of the protein.
Yes? Sorry. So sigma is a standard deviation, and the standard deviation (we have a lecture on this, trust me for now) will grow as the square root of the number of samples. It's not so obvious. So sigma squared, which is the variance, will grow as the number of samples. And the second we've seen that, you can almost forget the denominator; it's just some sort of constant there. So for the likelihood that a protein is stable, whether your protein is 5000 residues or 15 residues, the only thing that is important is those 5 kilojoules per mole. If you have a protein that consists of 100 residues, that can probably work; it's just that you're going to need to make sure that you don't make the mistake in 20 places.
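The size argument can be checked numerically with a small sketch. The assumptions here are for illustration and are not the lecture's own numbers: the folding free energy over random sequences is taken to be a Gaussian with mean `a * n` and variance `b * n` for a protein of `n` residues, with hypothetical values for `a` and `b` in arbitrary energy units, and a defect of energy `defect` simply shifts every sequence up by that amount. The exact Gaussian tail ratio is then compared with the exponential form, exp(-defect / (sigma^2 / delta_F)).

```python
import math

A_PER_RESIDUE = 0.5  # hypothetical mean free energy per residue (assumption)
B_PER_RESIDUE = 4.0  # hypothetical variance per residue (assumption)


def characteristic_energy(n_residues, a=A_PER_RESIDUE, b=B_PER_RESIDUE):
    """sigma^2 / delta_F, the quantity in the denominator of the exponent."""
    mean_f = a * n_residues
    variance = b * n_residues
    return variance / mean_f  # the factors of n cancel


def stable_fraction(n_residues, defect=0.0, a=A_PER_RESIDUE, b=B_PER_RESIDUE):
    """P(F + defect < 0) for F ~ Normal(mean=a*n, variance=b*n)."""
    mean_f = a * n_residues
    sigma = math.sqrt(b * n_residues)
    z = (mean_f + defect) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # lower Gaussian tail


def exact_penalty(n_residues, defect):
    """Factor by which the defect reduces the stable fraction (exact tail)."""
    return stable_fraction(n_residues, defect) / stable_fraction(n_residues)


def exponential_penalty(n_residues, defect):
    """The Boltzmann-like estimate exp(-defect / (sigma^2 / delta_F))."""
    return math.exp(-defect / characteristic_energy(n_residues))


for n in (100, 400, 1600):
    print(
        f"n={n:5d}  sigma^2/dF={characteristic_energy(n):.2f}  "
        f"exact={exact_penalty(n, 5.0):.3f}  "
        f"exponential={exponential_penalty(n, 5.0):.3f}"
    )
```

In this toy setup the characteristic energy is the same for every protein size, and the exact penalty moves closer to the exponential estimate as `n` grows. The exponential is only the leading behaviour of the Gaussian tail, so some mismatch for small proteins is expected.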
Imagine you had a protein of 50,000 residues: you can't make a mistake in 50,000 places. So the size of the protein is not relevant; it's only the individual defect energies that matter. And (I will come back to this kT_c later) you can actually think of this denominator as some sort of characteristic energy. The reason is that if the thermal energy around you is higher than this energy, we're going to unfold the protein, right? Because if the stabilization energy of a protein corresponds to kT, and we have more energy than that at room temperature, the protein will not be stable. So, and this is beautiful, you can use temperature as a scale for energy: the stabilization energy for most proteins tends to correspond to 350 to 400 Kelvin, which matches quite well with the fact that you can denature them by heat at roughly these temperatures. That's a parenthesis for now. But the important thing is: never compare with the stabilization of the entire protein, just the defect; it's around the zero mark here. And that also means that the vast majority, all the things up here, will not fold into proteins. You can go and synthesize as much as you want; it's only if you by chance happen to end up in the black region, the region below zero here, that you will fold the protein. Everything else is just going to be a waste of money and synthesis. Then of course we haven't said how large this area is, but it's smaller than you think.
We spoke a little bit about packing; I will skip that. And here too, I will skip this; the book goes into some curiosity... no, actually, sorry, this I will include, because it's fun. You can do the math here, but: is it more common to have sheets or helices inside proteins? In a helix, because you keep rotating around, for a given length you need roughly twice as many residues as in a sheet; the sheet is more stretched
out. So let's say a factor two to make it easy. So if we take the same amount of chain and make it either helix or sheet, well, having lots of beta sheets means that you have six small components instead of three large ones. Which one will be able to organize in more different ways? You will have more entropy if we have smaller things that we can organize in different ways, so in terms of entropy that's going to be a better fold. So we normally want beta sheets on the inside and then maybe alpha helices on the outside. In particular, the interior must be hydrophobic, which is slightly easier to achieve with beta sheets, so there are going to be many more sequences that we can fold that way. So in general you would not expect to see alpha helices on the inside of the protein. That's roughly okay: if you look at the Protein Data Bank, it's much more common to have beta sheets on the inside of proteins. You will virtually never see a protein that is beta sheet on the outside, then a helix or two on the inside, and then another beta sheet. And it's always good to make those checks that things are actually true. But then, there is always the exception to the rule in biology: GFP, a very cute protein. A beta sheet, and then a helix right in the middle. Can you imagine why it folds this way?
This is not really many different beta sheets; it's effectively just one beta sheet, so you don't have a whole lot of freedom in organization here, and they have to go around each other. This would just be a curiosity in physics, just to show that it can happen, but it turns out that this is a super important protein, because it's fluorescent. Not only is it fluorescent, it's a protein that's fluorescent and that will bind different chromophores. We can express this: if I inject this gene into you, you would be fluorescent, if we have the right chromophores. So this is used in biochemistry labs all over the world, including this one; we love GFP here. And it's even cooler than that. Normally it would emit green light when you irradiate it with ultraviolet, and that's the G, the green. But depending on what motifs and structure you have here, you can get it to bind different chromophores; you can get it to be blue, green, yellow, cyan, anything you want. And there was a group that was really skilled at using this protein. These things do occur in nature; this is one version of green fluorescent protein. Roger Tsien, unfortunately he died very young a few years ago, so sad, but his lab... this is literally a small Petri dish with lots of different versions of green fluorescent protein. You can add as many colors as you want.
And somewhere here you would think: yeah, that's really fun if you're a researcher, but grow up, why are you sitting with coloring pencils? The thing is that this is one of the coolest discoveries in recent years, because you can use it, and they have used it, to mark things in live tissue. Let's say (I think, yes, I do have it) you're a surgeon and you're going to cut a patient that has a cancer tumor or something. I'm not sure about you, but can you find the cancer here on the left side? But if you
now create a green fluorescent protein and combine that with antibodies or something, so that it will bind specifically to the cells that belong to the cancer tumor, then you get that view: green is cancer. Can you imagine how efficient you're going to be at cutting here? And again, if it's a very early stage tumor it doesn't matter, but if this is advanced you can't cut out the entire patient; the patient would die. Or if it's in the brain or something. So you want to be able to do minimally invasive surgery, but it's still very important to actually get all the cancer cells, so you probably want a bit of margin here. Sorry, this was not in the brain but ovarian cancer. But there might be another complication here: what if there is a nerve? We know that there are nerves going here, and if you cut the nerves you're going to make the patient limp. But you can add more GFPs. So now we add another GFP that marks just the nerves, and they have been doing this in patients, so you can basically tell the surgeon: cut green but not blue. And the cool thing is that we're not talking about theory; this works. So it's amazingly cool what you can do with this one small, simple protein. So he got the Nobel Prize for a whole lot of this work. You start to see after a while that there is a certain pattern between the things that we find in this department and the Nobel Prize, because half the Nobel committee is the colleagues upstairs. Let's see, we're doing good on time. Yes? No, it's not enough to do it with GFP alone. GFP works really well as a selective marker, in that you can get it to shine a specific color, but to get it to bind to a specific tissue you typically need an antibody that you know will bind to certain proteins that are expressed, say, on the surface of a nerve cell. Then you get this antibody to bind to the nerve cell, and you can tag along the GFP as a specific fluorescent marker. And then you tune your GFP so that this
particular one will be red, and you pair that with antibody X, and then you know that anything that antibody X binds to is now going to be red. Then you create another antibody Y and mix that with another GFP, and that way you can say that anything that antibody Y binds to is going to be, say, yellow or whatever. So this is also very much related to the thing I said about antibody design, right? You want to be able to design antibodies that very specifically target proteins or anything that's on the surface of the cells. It's not that difficult to identify cancer tumor cells, for instance, because they express a huge amount of things on the surface. Ideally you would even like to selectively kill them; in some cases we can, but not in this case. We're going to spend another part on interactions and gradually get a little bit more into transitions. I don't have quite as many slides tomorrow, so let's see. I promise we're going to end sharp at noon today. One of the things that we spoke about with the study questions this morning was transitions between states of proteins, and I spoke about that when we talked about ion channels too. Proteins in general would be exceptionally boring if they did not bind and do things. This is something that you might not have seen that much of in structural bioinformatics. The problem is that it's hard. As simple as you thought you were, with only 20,000 genes, for each of those proteins there are going to be 20,000 partners that it can interact with, so that's 400 million possible pairwise interactions. And in some cases there are more than two partners in an interaction; there could be three, and then you multiply by another 20,000. So while you might feel that the number of proteins is limited, the number of potential interactions between things is enormous. But we don't know a whole lot about it, because in some cases we have X-ray structures of interacting things, but in
general it's hard, and it's even harder if you don't know what the protein was interacting with, because for X-ray we would need to co-express them or something. So this started very early, actually even before we had structures of proteins. The obvious question was what happens when things bind, even these ion channels that I spoke about yesterday. And you might have seen this in textbooks: you have some sort of protein, an enzyme with an active site, and then we have our magic substrate here, and it fits perfectly, as a lock and key. If you have the right key, the key is going to fit in the lock, and we are very happy and it's a beautiful world and we are very proud. Tell me if you ever find a protein like that, because that would be interesting. They don't exist. But remember that I said that models are never right, but they can be useful, and this was a very useful model because it helped us to think about it. So this is called the lock and key hypothesis: historically important, but no real protein works that way. Then Daniel Koshland came up with a model called induced fit, which for a very long time is how we've been thinking about this. Here you have the same type of enzyme, but do you see what happens there? What is the fit between yellow and green here? Not particularly great, right? But somehow, when the green substrate here binds to the enzyme, the enzyme will change shape and fit the substrate really well. So when the substrate comes here and pushes in, the substrate will induce a fit, so that these two together, when they are forced to be together, will suddenly adapt and fit really well. You can imagine a crowd of students: if there is a crowd of five students and a sixth student comes in, you will adapt and make room for the sixth one. That is probably good for a whole lot of other reasons, because it will also mean that once you have adapted here you've likely found a more stable state, just as with protein folding. If this
state is now more stable, you're not going to want to release the substrate and go back until some sort of reaction has happened in your substrate. When that reaction has happened, then we want to release it, so then it's good if it would release. That is a model that a whole lot of us have been very happy with for a few decades. The more modern structures we see, the more we're giving up on this too. The reason for that, if you start to look at our ion channels, is that it's so much more complicated, because we also need to think about these barriers: how quickly do things bind? For the ion channels I spoke about, all these with anesthetics, we actually know that they kind of breathe: the channel is sometimes open and sometimes closed. And it actually has to be that way, because imagine that you really wanted your ion channel to stay closed. Then you would need a very large free energy barrier, right, so that we stay closed here. But two seconds later somebody comes around screaming that we have a nerve signal that has to be delivered, and suddenly you need to get over that barrier. Think of the amount of energy you would need to waste in your nervous system, because now you would need to spend a huge amount of energy to suddenly get to this point instead. And now we're here, phew, the nerve signal was delivered, your orders are now to go back, and then we need to go back here. It would be so inefficient that you couldn't do it. So what rather happens is that you have fairly low barriers, so that under normal circumstances the channel would sometimes be open and sometimes be closed. It's not specific to channels: it holds for G-protein coupled receptors and most of these structures. Remember the tyrosine kinase receptors I spoke about, that somehow spontaneously dimerize and then diffuse away again. And the reason why we're learning more about this is that we can see many more things with time-resolved spectroscopy and
everything. The proteins move all the time, and in most cases when two proteins bump into each other they might release again, but that does not mean they don't bump in. So what we've learned now is how most of these processes appear to happen: it's not that my small molecule that would normally open the channel forces the channel to open up, like forcing the door open. It's more like a door opening and closing all the time, and then you put your foot in the door, and when my foot is there you can't close it. So a molecule that would normally be, say, 90% closed and 10% open, when I put my small ligand there, is suddenly the opposite: 90% probability to be open, 10% probability to be closed. So the ligand selectively stabilized one state, but I didn't really have to alter the barrier, and I didn't have to force it across the barrier; the barrier was so low that this could spontaneously happen, and then the binding here just alters the equilibrium a bit. This is called selected fit, and I would say that 90% of the proteins we look at today follow it. This is the model by which proteins interact. Is this important for you beyond a mere theoretical exercise in physics? Any drug you're going to need to design in the future will have to be based on these mechanisms. You're going to need to find a drug that alters an equilibrium between states. If a channel is too closed and you want to open it, you're going to need to find a way to alter that equilibrium. Can you favor the state you want or disfavor the state you don't want? Remember, you can actually do both. Any time you're going to do something, you have three states to play with: start, middle, end. Do you want more of that one, less of that one, or do you want more of the transitions? Those are the things to play with here, and virtually all of them can be tuned by binding the right molecule. I'm going to show you a couple of examples. Glutamate binding protein: can you
guess what the yellow part is? It's glutamate. We wrote a paper on this. The entire protein here undergoes about half a nanometer of motion every time it binds; there's a very large change. And it's almost like you have a hinge here in the middle, right? The blue part is one protein, and you just have two beta strands here in the middle. You can probably guess something about the rigidity of this structure: you have one fairly rigid domain here, then it's very floppy here, and then very rigid here. So somehow nature has, over billions of years of evolution, coded for this: if you want something to move, you're going to need some small, very flexible region there in the middle. And Laura, I think some of you talked about her research project, is actually working on this type, not this particular protein but similar types of proteins involved in growth factors, cancer and everything. And it appears that in many cases it is these intermediate states, when such a small hinge goes through a motion, that really determine the binding state to which the antibodies bind, where we can identify this as cancer and tumor growth. It appears that the transitions are important, not just the states of proteins. And I'm so not saying that you should start doing computer simulations for all of these things, but there are a ton of experimental methods that are very good at indirectly getting access to these transitions. So don't assume that your structure is everything; we also need to understand how proteins move. I promised you another protein a few lectures ago: hemoglobin. Some of you might know this. It is one of the coolest transitions, or concepts of allostery, in nature. Do you see that there's a slight difference between two structures here? I'm just alternating between two structures. First, hemoglobin is the more complicated brother of myoglobin. There are four subunits here, and each subunit is a globin
fold, and the hemoglobin molecule needs all of them, so you have four binding sites for oxygen. And then you have two shapes here: oxy form, deoxy form, oxy form, deoxy form. People have been able to determine crystal structures of this in a saturated, oxygenated state with lots of oxygen, where you can even see that the heme group changes shape a bit, and then in another crystal under deoxygenated, oxygen-poor conditions. So depending on whether you have oxygen bound or not, the hemoglobin will change its structure ever so slightly. What was the other thing we said about hemoglobin? We had hemoglobin in your lungs versus myoglobin in your muscles. What happens is that this is the same type of allosteric modulation as we had with anesthetics, but it's more complicated because it's the oxygen itself that is the allosteric modulator. Normally hemoglobin is not particularly fond of oxygen. It will bind it, but it doesn't really love it, and that means that the first oxygen molecule will bind, but not strongly. But what happens when you start binding more oxygen? The mere fact that you're binding oxygen induces a slight shift in the hemoglobin structure, and as hemoglobin undergoes this shift it moves to a state with higher affinity for oxygen. Any normal molecule would behave like myoglobin here. Forget about the red one for a second: on the x-axis we increase the amount of oxygen, and on the y-axis the saturation is effectively how much oxygen I have bound. Any normal molecule that would like to bind oxygen is going to be super happy when you start having oxygen, but eventually you saturate, because most of your molecules will already have bound oxygen, right? So the effect is strongest in the beginning when you bind oxygen. Hemoglobin behaves in exactly the opposite way: when you don't have a whole lot of oxygen, it doesn't really help that you
want to bind, because there isn't a whole lot of oxygen to bind, so the affinity is going to be low. But as the oxygen level goes up, suddenly myoglobin, sorry, hemoglobin wakes up, and in your lungs, where you have very high oxygen pressure, hemoglobin will be fully saturated. So what's going to happen is that in the lungs hemoglobin starts binding oxygen, and because there is lots of oxygen it's suddenly going to be really good at binding oxygen. In the lungs it will steal any oxygen it can, and then it moves out in the bloodstream to your muscles. But in the muscles your oxygen pressure is low, so there will be an individual oxygen molecule here and there that unbinds. If hemoglobin behaved like myoglobin it would stay there; only the individual molecule here and there would unbind. But what's now happening is that the foot-in-the-door effect disappears. Suddenly you no longer have the foot in the door, and as the oxygen starts to unbind, hemoglobin will relax back, and then it doesn't really like to bind oxygen anymore, and it's going to start releasing even more oxygen. So when hemoglobin is in an oxygen-poor region it's going to want to release all its oxygen, and then myoglobin takes over. We should be pretty thankful for this, because if this was not true we would not be alive. This was a famous paper in the 1960s by Monod, Wyman and Changeux. Changeux is still alive, and we're even colleagues, working together. I'm actually a bit sorry that they never got a Nobel Prize for this. There is this whole culture around Nobel and everything; I think this discovery is too old, and this will never get a Nobel Prize. It would have been worth one, and I've no idea why they didn't get it. With the two first co-authors being dead, Changeux would be worth a Nobel Prize, but he'll not get it, at least not for this. So that brings us to the last part. Yes, we have 10 minutes. I probably won't finish all
slides. The last part: to understand all these transitions, whether it's between allosteric states or whether it's folding and unfolding, we're going to need to start to look a little bit at what happens when you actually fold. How quickly do things happen? We'll cover this in much more detail tomorrow, but I'm at least going to introduce it. We already separated the concepts of thermodynamics and kinetics, right, and now we're going to move over much more to kinetics, where we study how fast things happen. Most of the things that we look at, just as with hemoglobin, have some sort of S-curve: things start happening, but the actual action happens in a fairly narrow range. This is characteristic when things are cooperative; the parts help stabilize each other somehow. There are a ton of different ways that we can unfold proteins, and we'll come back to them. I also already told you about Christian Anfinsen, who did the original experiment here, where he could show that you could repeatedly get ribonuclease to refold, and then you just measure it with CD spectroscopy and show that it's actually in its native state again. But all of these things don't really tell us a whole lot about the protein folding, if you really want to drill down and understand what happened in the proteins. With all these all-or-none transitions, beautiful as they are, all we really know is that it happens very fast. We don't really know: does the protein fold gradually, just very fast, or is it something else? I argued that the beta sheet formation was a phase transition, but you just believed me; we haven't proven it. So we don't really know whether it is possible to have any sort of state in between here be stable, and that's hard to say conclusively, because by definition the intermediate states are difficult to measure. They will never be stable; they can be at best semi-stable. None of this is obvious from experiments. We can measure most of these things; calorimetry is by
far the most common. Calorimetry basically measures how much heat we're putting into things and how the heat capacity changes. And here too, if you start doing this for a protein, the protein will undergo some sort of transition that will depend, for instance, on the pH and salt concentrations, and after this transition it will have denatured and behaves slightly differently. But again, it doesn't tell us anything more; it just says that there is some sort of transition that happens in a narrow range. Whether that has intermediate states or something, we don't know. That's a super hard problem that people have spent decades on understanding, and there are some pretty neat ways that we can get some information about it. Van 't Hoff started studying this, and his question was one that we already touched upon: what is a protein? I hand-waved a little bit the first time, and we still haven't answered it. Will five residues be a protein? Somewhere we need a protein to fold, right, and have a stable state. But again, if you say that the protein folds, what is the melting unit? We had these gigantic proteins with huge numbers of amino acids; do all of those amino acids fold together, or do you have smaller beads that fold along the chain? So the question is what really is the folding or melting unit: how large is the part that really is a protein core? And I already hinted that you might have things in higher organisms that consist of multiple units; then they are likely independent. If the melting unit, or folding unit, is the entire protein, then it is definitely an all-or-none transition. If the melting unit is significantly smaller than the entire protein, then we can have the protein folding or unfolding in smaller parts. And you could imagine that the unit required is actually much larger than the protein, and then you would need aggregates of many proteins for it to be stable. If you think that this is strange, many of these prion diseases probably belong down here. We can calculate this, and
this is going to be another equation, and I promise I won't ask you to derive this one, but it is a beautiful example of what you can do with fairly simple mathematics and models. What we would like to understand is what happens when you fold. I will deliberately move forward, and then I am going to move back to the other slide. If you have some sort of process and you go through it, well, there is something happening here over a fairly small temperature range where we measure the energy. If this is a protein and we are going to measure the probability that it is unfolded, then below this temperature it is folded, after this temperature it is unfolded, and in this range we can somehow talk about what fraction of the protein has unfolded. Then we are going to worry about the different states. In this case I won't worry about the barrier, so there is some sort of native state and there is some sort of molten or unfolded state, and that means that we can talk about the free energies of these states. If we simplify the world and say it only consists of these two states, the probability of being in the molten state would be the Boltzmann factor of that state divided by the sum over both states to normalize it, right? And rather than having to work with absolute energies, it is nice to introduce the differences between them. If you do this and do a bit of exponential math, you can formulate this as a slightly different expression, but the important part is that only the differences in entropy and energy go in there. It is not hard, but it takes a bit of time. So that means that if we know the free energy difference between the states, then we can say what the likelihood is of being in the molten one. And as always, we are going to need to find something else that describes these probabilities, and that is where we had this curve. Any time I have a probability or something, I want to know how fast this changes, and
we are definitely looking at the temperature here, right? Let's make a horrible approximation: if the transition happens over a narrow range, and over that range we go from 0% molten to 100% molten, let's approximate this derivative with a straight line. If the fraction changes from 0 to 1, the slope is just going to be 1 over the temperature difference. So then I have an expression for the derivative of the molten fraction that we can get from an experiment, and I had an expression, not of the derivative but of the probability itself, on the previous slide. So then I just differentiate that expression, and you end up with a long beautiful expression that we are so not going to derive. Again, this is not hard; it's just that it will take you 10 minutes to do. And instead of having these large expressions, you can notice that part of it corresponds to the molten probability we had two slides ago. So now we can formulate the derivative on the one hand from the probability of the protein being molten, and on the other hand from the fraction we saw in this curve. And the cool thing is that now we have two numbers we can compare. At the midpoint of the transition, let's say that the molten probability is roughly 0.5. You are more than welcome to disagree with me and pick 0.25 or 0.75; it's not really going to change things substantially. But right when we are making the transition we are roughly halfway molten, and then you can put in a number for that probability and say that the slope depends on the stabilization energy. And then we get two numbers. One number we get from the experiment: per melting unit we get this much energy. But I also know how much energy we have in total, because if you use the calorimeter and I have 1 million molecules and I need to add 10 million kilojoules, it's 10 kilojoules per molecule, right? So I don't know what the melting unit is, but I know how much energy I am spending per
melting unit, and I know how much energy I am spending per protein, and now we can compare them. And it turns out that for virtually all, and this is more than hand-waving, for virtually all small proteins this fits perfectly. So for a small protein, the entire protein folds as one unit, and for large proteins the melting unit will correspond to one domain, if you have a large protein with three or four different domains. Again, you have seen these domains in bioinformatics, and here is the funny part: what is a domain? Well, in bioinformatics we like to think of domains as the parts that evolution carries over, and when we look at this, the domains are also the independent folding units in proteins. This is of course not a coincidence, because if it is very rare and very difficult for random sequences to fold, what would you do if you were nature? Don't try to swap out the individual amino acids; swap out the entire domain, which will be a stable folding unit. We are going to spend a little bit more time talking about this. I will repeat the last two or three slides, but there is a fun story here at the end that I will go through. All this has to do with denaturation, and we are going to talk a lot more about denaturation tomorrow. When do proteins denature? If you just increase the temperature, well, they will eventually unfold, because you are boiling them, right? One has to be a bit careful with delta S here, whether we are talking about folding or unfolding, but in general when you unfold things delta S is positive; the entropy goes up, and that is good because you have more freedom and the protein is more mobile. But delta S will also drop with temperature, which means that as you go down in temperature you might have the opposite effect. It will be more complicated if you start drawing all these plots, and again, I am going to repeat this tomorrow. But if you start drawing all these plots, you are already well aware that the protein has a region where it is stable, and if you increase the temperature too much here we are no
longer going to be stable. But just the mere shape of these curves indicates that if you keep dropping the temperature, at some point you should cross the line at very low temperatures too. So proteins should not just denature when you boil them but also when you freeze them, and there are a few very rare examples where you can show cold denaturation of proteins: if it becomes too cold, the protein will unfold. Have you ever seen this in an experiment? That is a good one, but that is probably more due to the fat, actually. So if I show you a bunch of real proteins here, these are experimental curves, and the shapes are interesting, right? For the cytochrome there it is definitely going down, and for the myoglobin too, and you can probably guess where the other ones should go. So why on earth didn't we extend the scale to the left? Water freezes at 0 degrees centigrade, so it is hard to do experiments. You would have to add some salt or something to be able to go lower, but then you add lots of salt and the protein is going to denature from the salt, so it doesn't really work. But in principle, apart from the fact that water freezes, this is true: you would have cold denaturation of proteins. And there are a few cases where this is important: cold shock proteins. There are fish, for instance, that live in the Antarctic, and due to the salinity of the water there the water can be minus 10 degrees centigrade, and these fish express special types of proteins with a special amino acid composition to make sure that they are stable even at very low temperatures. There are similar effects at very high temperatures for bacteria living in volcanoes, which need their proteins to be stable at 70 degrees centigrade. We are going to come back to those bacteria later when we talk about nucleic acids. But that's all I had for today: a bunch of study questions, and
tomorrow we are going to be looking at the really fun kinetics, because tomorrow we are going to be able to couple the most complicated things, what happens on the scale of individual molecules, with experiments that we are actually doing upstairs here. Super fun.
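For anyone who wants to play with the melting-unit argument from today before tomorrow, here is a minimal Python sketch of it. This is not from the slides: it assumes a strict two-state model (native and molten only) with an unfolding enthalpy and entropy that do not depend on temperature, and the function names and the example numbers (400 kJ/mol, 330 K) are hypothetical, picked only for illustration. Under those assumptions the slope of the molten fraction at the midpoint is dH / (4 R Tm^2), so approximating the slope by 1 / width gives the energy per melting unit, which can then be compared with the calorimetric energy per protein.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol K)


def molten_fraction(T, dH, Tm):
    """Two-state Boltzmann probability of the molten state at temperature T (K).

    dH is the unfolding enthalpy in kJ/mol (assumed temperature independent),
    and the unfolding entropy is fixed to dH / Tm so that the midpoint is Tm.
    """
    dG = dH * (1.0 - T / Tm)  # dG = dH - T * dS, with dS = dH / Tm
    return 1.0 / (1.0 + math.exp(dG / (R * T)))


def vant_hoff_enthalpy(Tm, width):
    """Energy per melting unit (kJ/mol) from the width of the transition (K).

    Uses the straight-line approximation: slope at the midpoint = 1 / width,
    combined with the two-state result slope = dH / (4 R Tm^2) at P = 0.5.
    """
    return 4.0 * R * Tm ** 2 / width


def melting_units_per_protein(dH_calorimetric, dH_vant_hoff):
    """Ratio of energy per protein to energy per melting unit.

    About 1: the whole protein melts as one unit. Larger than 1: several
    independent units (domains). Smaller than 1: the unit is an aggregate.
    """
    return dH_calorimetric / dH_vant_hoff


if __name__ == "__main__":
    Tm, dH = 330.0, 400.0  # hypothetical example values
    for T in (310.0, 325.0, 330.0, 335.0, 350.0):
        print(f"T = {T:5.1f} K  molten fraction = {molten_fraction(T, dH, Tm):.3f}")
    # Read the transition width off the curve, then recover the energy per unit.
    h = 0.01
    slope = (molten_fraction(Tm + h, dH, Tm) - molten_fraction(Tm - h, dH, Tm)) / (2 * h)
    dH_vh = vant_hoff_enthalpy(Tm, 1.0 / slope)
    print(f"energy per melting unit = {dH_vh:.1f} kJ/mol")
    print(f"melting units per protein = {melting_units_per_protein(dH, dH_vh):.2f}")
```

If you pick 0.25 or 0.75 instead of 0.5 for the midpoint probability, the factor P(1 - P) changes from 0.25 to about 0.19, which is the sense in which the choice does not change things substantially.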