What we're going to talk about today is mostly protein structure, real protein structure, but we're going to analyze it in terms of everything we've learned about free energy and entropy. Today I'll first talk about fibrous proteins, which are by far the most common proteins, but likely the ones you've heard least about. Then we'll talk quite a bit about water-soluble proteins. Tomorrow I'll deviate completely from the book and talk about membrane proteins. But before we do that, given that you've had ten days off, this is a good occasion to recap. There might be some things in the simulation part that I didn't have time to go through, but we'll get to those when we get there. So let's start from the top. What's the difference between thermodynamics and kinetics? Yes. So in general, there is one important word to remember about thermodynamics: it applies at equilibrium. Strictly speaking that's not entirely true, since there is a whole field of non-equilibrium thermodynamics, but everything we've talked about, the Boltzmann distribution and so on, describes things at equilibrium. The problem is that reaching equilibrium can take a very long time, even infinitely long. Kinetics, formally, deals only with how fast processes happen. That's complicated. Why is kinetics complicated? Exactly. It's difficult to measure kinetics, and it's even harder to calculate kinetics from the energy levels. That means that both you and most other people have historically ignored kinetics; we try to avoid it when we introduce thermodynamics.
The problem is that, in particular when it comes to proteins and lots of things in biology and the life sciences, kinetics is in many cases more important than thermodynamics. Because kinetics deals with speed, it also deals with what can practically happen, while thermodynamics describes what will theoretically happen after an infinite amount of time. What features of the energy landscape determine thermodynamics and stability? This is a bit of a trick question, so let's break it down into parts. We start with thermodynamics, which is about what will happen after an infinite amount of time. What parts of the energy landscape determine what happens after an infinite amount of time? In principle everything matters, but are the peaks important after an infinite amount of time? No. The fraction of your states that will be anywhere close to the peaks after an infinite amount of time is zero. So after an infinite amount of time, what describes the important states are the troughs, the low values, the minima. Those describe the stable, metastable, or intermediate states.
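As a toy numerical illustration (the energies here are invented, not from the lecture): at equilibrium, only the depths of the minima enter, through the Boltzmann distribution. The barriers never appear.

```python
import math

kT = 2.5  # thermal energy at ~300 K, in kJ/mol

def boltzmann_populations(energies):
    """Equilibrium populations from the Boltzmann distribution:
    only the energies of the minima enter, never the barriers."""
    weights = [math.exp(-e / kT) for e in energies]
    z = sum(weights)               # the partition function
    return [w / z for w in weights]

# Two minima, 5 kJ/mol apart. However high the barrier between them
# is, the equilibrium populations come out the same.
pops = boltzmann_populations([0.0, 5.0])
```

The population ratio is exp(5/2.5) = exp(2), about 7:1 in favor of the deeper well, regardless of how slowly that equilibrium is reached.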
Stability: the only real stability is thermodynamic stability. Kinetics deals with things that might happen in finite time, right? But if at some point you will move over to another state, you were not stable. You could have been stable for a billion years, if we're talking about the timescales of the universe, but if you will eventually move to another state, that is not stability. So the only real stability is thermodynamic stability. On the other hand, if we're a bit sloppy and think as life scientists, you could argue that in our particular case, while it's not real stability, the relevant stability is kinetic stability, because we don't care what would happen in our bodies after 100 years. Now, the kinetics: what determines the kinetics? What part of the energy landscape? Exactly. You see that when we start looking at this, thermodynamics and kinetics look to be two sides of the same coin, and of course they are, but they are determined by completely different features of the energy landscape. And that answers question three: what's most important for kinetics is the energy barriers. So if we start applying these ideas to some of the things we looked at, what is the point of separating the initiation versus elongation free energies, both for helices and sheets? Why do we even do this?
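The claim that barriers set the kinetics can be made concrete with an Arrhenius-style factor (the barrier heights below are illustrative, not from the lecture):

```python
import math

kT = 2.5  # thermal energy at ~300 K, in kJ/mol

def relative_rate(barrier):
    """Arrhenius-style factor exp(-dE/kT): the rate of crossing a
    barrier is set by the barrier height, not by the well depths."""
    return math.exp(-barrier / kT)

# Two processes that end in the same minimum but cross different
# barriers: the lower barrier is crossed vastly more often.
ratio = relative_rate(10.0) / relative_rate(30.0)  # = exp(8), ~3000
```

So a modest 20 kJ/mol difference in barrier height changes the rate by three orders of magnitude, while leaving the equilibrium populations untouched.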
And if you're completely lost, you can go back ten seconds and think about what the last question was. Exactly. So that's why we did this, both for the helix and the beta sheet. For the helix, it was the first few residues, the first four residues that I had to put into a helical turn while only losing energy. Same thing with the beta sheet: I needed to define the part where it's only uphill. And by definition, once I reach the point where the uphill slope stops, I am at the barrier. So the reason we wanted to identify this, both for helix and sheet, was to determine, not the specific height of the barrier, but the character of the barrier: what is it that we need to get across for this to happen? You see that this was really mostly about kinetics, and that is how this differs from the first week of the course, when we only looked at thermodynamics and the Boltzmann distribution. How did we determine those numbers? That's related to what I just said, but you can expand a bit on it. Yes, that's how we specifically did it for the helix, but I was thinking more generally: what is it that we identified? What was that peak of the initiation barrier? What type of state is it? A transition state. And what is the definition of a transition state? In one way it's the worst state you still have to pass through, but it's also the best one: it is the highest point along the best path. Then we talked a little about what type of transition this is. The alpha helix folding in particular, what type of transition was that? Was it a phase transition? There are tons of ways of classifying transitions, but one useful distinction is this: a transition can be continuous, it can be highly cooperative, or it can be all-or-none.
Continuous here would basically mean that each residue can be alpha helix or not independently: whether residue 14 is alpha helix would not depend on whether residue 13 or 15 is. Cooperative means that if your neighbors have turned to alpha helix, it's going to be more favorable for you to turn to alpha helix too, but there is still some sort of equilibrium between helix and coil along the chain. And all-or-none would mean that you can't be stable with only some residues in the alpha-helical conformation: if a helix starts to form, the chain has to go over entirely to helix or back to coil. So if you look at the equilibrium between helix and coil, what type of transition was that? It's a highly cooperative transition. It's not continuous, because it does matter whether your neighbors have formed helix; that has to do with the hydrogen bonds, right? It's easier to add one more residue when a couple of residues have already formed hydrogen bonds. So the mere presence of helix will induce more helix. But it was not an all-or-none transition: it is possible to have an equilibrium along the chain where some parts are helix and some parts are coil. We talked a little about typical folding times, but rather than specific times, is this a fast or a slow transition? It's relatively fast, yes, even very fast. Now think about beta sheet folding: how was that different, in terms of kinetics in particular? This is related to the next question, so we might take both at once. This is a bit more difficult. Do you see that things are not quite as trivial anymore? In principle you're just applying F = E - TS, but there are lots of things to keep in mind, and the best strategy is to break the problem down into parts.
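To make "cooperative but not all-or-none" concrete, here is a hedged Zimm-Bragg-style toy model (the chain length and the s and sigma values are illustrative choices, not numbers from the lecture): each helical residue contributes a statistical weight s, and every new helical segment pays an initiation penalty sigma, which is exactly the separation of initiation and elongation discussed above.

```python
from itertools import product

def helix_fraction(n, s, sigma):
    """Zimm-Bragg-style toy: enumerate all helix/coil patterns of a
    short chain. Each helical residue contributes a factor s, and
    each new helical segment pays the initiation penalty sigma."""
    z = 0.0       # partition function
    h_sum = 0.0   # Boltzmann-weighted count of helical residues
    for conf in product((0, 1), repeat=n):
        w = 1.0
        for i, r in enumerate(conf):
            if r:
                w *= s
                if i == 0 or conf[i - 1] == 0:  # a segment starts here
                    w *= sigma
        z += w
        h_sum += w * sum(conf)
    return h_sum / (n * z)

noncoop = helix_fraction(8, 1.0, 1.0)   # no initiation penalty: 0.5
coop = helix_fraction(8, 1.0, 0.01)     # costly initiation: much less helix
```

With sigma = 1 every residue is independent and the helix fraction is exactly one half; with sigma << 1, starting a segment is expensive, so helix appears in cooperative stretches, yet mixed helix/coil states still carry weight, so it is not all-or-none.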
So if you're thinking about a transition, any transition, there are in principle three things you want to think about: what is the state before, what is the state after, and what is the state in between that you need to go through? If we're talking about kinetics, for the beta sheet it's particularly that state in between. So what is the difference between the beta sheet and the alpha helix there? The beta sheet was definitely an all-or-none transition, and it is a formal phase transition. That's probably more related to the next question, but if you look at the kinetics, how fast is this compared to an alpha helix? What's the character of this barrier? It depends. That might sound like a non-answer, but it isn't: beta-sheet folding can be exceptionally fast or exceptionally slow. It spans orders of magnitude. The alpha helix, and this is probably why I asked about that specific time, is fast, but the point is that it occasionally takes nanoseconds or, in a few rare cases, maybe 100 nanoseconds or a microsecond, and not much more than that. It's never going to take a second to fold an alpha helix. Beta sheets can easily span six to nine orders of magnitude. And we already argued that the beta sheet appears to be a phase transition. We didn't formally prove it, but we hand-waved, and we definitely showed that it exhibits all the features of an all-or-none transition. So if you had to guess the time of beta-sheet folding, what would you guess? We already said that it spans several orders of magnitude, but for normal beta sheets? This is a good exercise; I'm well aware I didn't specifically mention it before Easter. What would you guess? Why? Guessing is good, and the reason this is a good exercise is that it trains you in estimating things.
I would say a microsecond is a bit fast, because that's down in the region where alpha helices fold. There might be some beta sheets that fold that fast, but most proteins don't fold in a microsecond. The upper limit here would of course be the time in which the bulk of normal proteins fold, and that might be a second. There are certainly proteins that fold more slowly, but I would say about a second is an upper limit for a normal protein, and maybe a millisecond if it's fast. So let's say millisecond to second. This one we didn't answer, but I will go through it in a second: what terms do we need to model a real biomolecule, a protein? That has to do with energy functions and force fields. You're going to need some way to describe your energy. So what is the energy in a protein? You say bonded terms, bonds and angles, so what are the terms? This is the reason we spent the first week going through those terms. If we look at our favorite equation again, all of these terms go into that E. That is why we went through them: it's not just a theoretical exercise that it was fun to know what, in principle, the interactions in a protein are. To describe and understand that E, we have things like bonds and angles that we need to calculate, but in practice they're so stiff that they're not that important. That is why we spent so much time on the Ramachandran diagrams and the torsions: the torsions are the ones that, depending on their conformation, will actually change the energy. There will be states that are better or worse.
The bond energies are so stiff that the bonds are always going to sit at their equilibrium values, but as a protein moves between different torsions, the E is going to change. Same thing with electrostatics. And hydrogen bonds are pretty much electrostatics, right? In a favorable conformation you will have a low E because you can form many bonds. And then, of course, you shouldn't bump into things either, so we also have the van der Waals repulsion. But just having E doesn't make us very happy. Let's see if there was a question about this. If you have your E, you can do an energy minimization, but that's all you can do. This is a function of a lot of variables: if you have 10,000 atoms, which is a super small system, you have 30,000 variables. So you have a function of 30,000 variables that you're going to try to find the minimum of. Some of you might have studied advanced calculus at university, and there are a couple of ways to try to find the minimum of a function, but even finding the global minimum of a one-dimensional function is impossible in general. You can find a local minimum, but there is no finite algorithm that will always find the global minimum even of a one-dimensional function; you have to do an exhaustive search. You can imagine how difficult this is going to be with 30,000 variables, and that's a small system. We frequently work with membrane proteins, 200,000 atoms in the system; that's 600,000 degrees of freedom, a function of 600,000 variables. So you're never really going to minimize this. The only thing we use energy minimization for is to get rid of the peaks: we want to get rid of the collisions where atoms clash into each other. Partly for practical reasons: if you try to submit a model to the Protein Data Bank and you have two atoms clashing into each other, the reviewers are going to say that's a crap model, you haven't done your homework, it's not a realistic model of the protein, paper rejected.
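The point that minimization only slides downhill into the nearest local minimum can be shown with a minimal steepest-descent sketch on a one-dimensional double well (the potential is invented for illustration; real minimizers are multidimensional and smarter, but the limitation is the same):

```python
def minimize(grad, x0, step=0.01, n_steps=5000):
    """Steepest descent: always move downhill along -gradient.
    It stops in the nearest local minimum and never crosses a barrier."""
    x = x0
    for _ in range(n_steps):
        x -= step * grad(x)
    return x

# Double well E(x) = (x^2 - 1)^2 + 0.3 x: the deep (global) minimum
# lies near x = -1, a shallower local minimum near x = +1.
def dE(x):
    return 4 * x * (x * x - 1) + 0.3

x_left = minimize(dE, -2.0)    # started left: finds the global minimum
x_right = minimize(dE, +2.0)   # started right: stuck in the local one
```

Which minimum you end up in depends entirely on where you start; the barrier between the wells is never crossed, no matter how many steps you take.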
On the other hand, from a physical point of view it's not really that important, because we know that those atoms won't really overlap; sure, it looks nicer if they're offset a bit. The reason it's still important, if you want to study things theoretically, is that in the next step we're going to run real simulations. If you have two hydrogens that are overlapping, say 0.01 angstrom away from each other, what's going to happen when you try to calculate the forces, which are the derivative of your potential? Instead of a normal, sane force, this force is going to be enormous. They're going to repel each other; it's basically a nuclear explosion you're trying to simulate. You get a force so high that, within one step, your molecule will have a velocity that is roughly the speed of light. Apart from the fact that we're not doing this relativistically, within two steps your simulation will have crashed; you can't integrate things at the speed of light, and the temperature would be a billion kelvin or something. So that's not going to work very well. We need energy minimization to tame this, to make sure that we don't start in a part of the energy landscape with almost infinitely high energy. But that's also why energy minimization is not particularly interesting: it's a necessary clean-up step when you start, but it's not really going to help us. Why does energy minimization not help us? Shouldn't it be great to have a low energy? Why doesn't that tell you everything about the structure? Let's assume for a second that you had a snazzy new method, let's call it the Lindahl algorithm, that could find the global energy minimum of the function. This is where you need to remember the difference between energy and free energy.
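The force blow-up at overlapping atoms is easy to see numerically. Here is a sketch with a Lennard-Jones pair force (the epsilon and sigma values are illustrative, not taken from any particular force field):

```python
def lj_force(r, epsilon=0.5, sigma=3.0):
    """Force magnitude F = -dV/dr for a Lennard-Jones pair
    (illustrative parameters; distances in angstrom)."""
    sr6 = (sigma / r) ** 6
    return 24 * epsilon * (2 * sr6 * sr6 - sr6) / r

f_contact = lj_force(3.5)   # typical contact distance: a modest force
f_clash = lj_force(0.01)    # overlapping atoms: an astronomical force
```

Because the repulsive term scales as r to the minus thirteenth power, moving from 3.5 angstrom to 0.01 angstrom inflates the force by more than thirty orders of magnitude, which is exactly what makes the first integration step explode.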
Minimizing that term does absolutely nothing for you. This is why we looked at the partition function, right? If you know the partition function, you know the free energy of all the states, and then we know everything. But knowing everything about the energy term while having no idea about the entropy term doesn't help you. So the reason we need to do simulations is the second term. How do we calculate the second term? That's the problem: you cannot calculate it from a single structure. Why not? The point is that entropy is not a property of one state. It's like taking a still image from a movie: you can't calculate the entropy of somebody walking around from a snapshot, because what is the entropy of a single state? Zero. The volume is one, right? And the logarithm of one is zero. It's just one state. Entropy is only defined when you have something moving between different states; that's how entropy was first introduced in the lab, by accounting for different states. So entropy only enters when you have a molecule moving between different states, and the only way to get that term is by letting the molecule sample all those different states in some sort of simulation. That's why you need to do the simulation. It's very easy to think that a simulation is a way to generate a beautiful movie, and I think the entire field, including my group, is guilty of that; it looks super sexy when you have these proteins moving around. But that's not what you're trying to do. The protein didn't move in exactly that way. The reason to do the simulation is to get the entropy so that we can calculate the entire free energy. And then, yes, as a nice bonus, you might also get a movie that shows roughly how the protein might move. But the real point of the simulation is sampling: the simulation samples the energy landscape.
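The "entropy of a snapshot is zero" argument can be written down directly with the Gibbs entropy formula over sampled state populations (the populations below are invented for illustration):

```python
import math

def gibbs_entropy(probs, k=1.0):
    """Gibbs entropy S = -k * sum p ln p over the states a molecule
    is observed to visit."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)

s_snapshot = gibbs_entropy([1.0])       # a single state: S = 0
s_sampled = gibbs_entropy([0.25] * 4)   # four equally visited states
# For W equally likely states this reduces to Boltzmann's S = k ln W.
```

One state gives exactly zero, and only by sampling several states does the entropy term, and with it the free energy, become nonzero; that is what the simulation is for.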
And that's also related to this: what we get with the sampling is that, if you sample the important parts of the energy landscape well enough, I can calculate pretty much any observable, from cryo-EM, neutron scattering, X-ray diffraction, diffusion, binding free energies and so on, by averaging over all of those states. If I sample well enough, I'm going to get a pretty good average. And pretty much anything you can measure in the lab corresponds to a free energy. What you do not get is an exact prediction of the motion of an individual molecule, but that's not your goal either. Now, you could argue that all those things had to do with equilibrium, and that's mostly thermodynamics. Is the kinetics correct in a simulation? That relates to a couple of the slides I skipped because we didn't have much time; let me bring them up again. This is from the lecture notes before the break. The simulation is not an actual prediction of the motion of an individual particle, and the reason has to do with the chaotic properties of these processes: if you have a minute difference in starting conditions, you're going to take another path. But the point is that if I start with a reasonably good approximation, the average properties of the molecule are still going to be okay. Roughly the speed with which this methyl group rotates is going to be roughly correct, and the same for the speed with which a particle diffuses. So, in theory, you could imagine using completely different methods. You could imagine just trying to enumerate all the states in your phase space or in your landscape: first I try to put the molecule this way, and then I try to put the molecule another way.
If I test every single possible way you can assemble it, I would know every single state. The problem is that I can't test every single conformation of a protein. But if you make a highly simplified model, like putting small beads on a lattice (I think you will even have a lab on this), the phase space and energy landscape are so much smaller that you actually have a chance of testing every single possible state. This works great. But you lose something. This works great for sampling all the troughs: you will sample all those low-lying minima in the landscape. So if you're only looking at equilibrium properties, you don't necessarily need to simulate motions. The reason this approach has fallen somewhat out of grace is that it only works for a small theoretical chain. You could imagine doing this for a protein by just turning the Ramachandran torsions, phi and psi, but what's going to happen for a real protein if you randomly start turning the torsions in the backbone? It might clash, and "might" is the understatement of the decade there. Say that I'm a large protein, one chain, and we turn a torsion in the middle of me. In general, if I start turning, I'm going to swing the entire rest of the protein around, right? It's a gigantic motion: a small torsion change here has a very large effect later in the chain. In some cases that works, but what if my protein is folded back on itself? If I now start to turn, it's going to bump into something. Bumping into the chain itself is one thing, but a real protein is in water, right? The second you make a gigantic turn that moves atoms two nanometers, what is the likelihood that you bump into something? It's not just possible, it's certain. You will always bump into something. So this type of simple method is not going to work for proteins.
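The "beads on a lattice" idea can be made concrete: for a tiny chain on a 2-D square lattice you really can enumerate every self-avoiding conformation, with excluded volume enforced by forbidding revisits (the chain length and lattice here are of course illustrative):

```python
def count_conformations(n_bonds, path=((0, 0),)):
    """Exhaustively enumerate every self-avoiding conformation of a
    chain with n_bonds bonds on a 2-D square lattice.
    Stepping onto an occupied site is forbidden (excluded volume)."""
    if n_bonds == 0:
        return 1
    x, y = path[-1]
    total = 0
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nxt = (x + dx, y + dy)
        if nxt not in path:       # the chain may not overlap itself
            total += count_conformations(n_bonds - 1, path + (nxt,))
    return total

states = count_conformations(4)   # the whole "phase space" of a tiny chain
```

A four-bond chain has exactly 100 self-avoiding conformations; the count grows roughly geometrically with length, which is why exhaustive enumeration is hopeless for a real protein.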
And then it turns out, almost as a backdoor, that it actually works quite well to just use Newton's equations of motion: we calculate the acceleration, from the acceleration we calculate the velocity, and from the velocity we get new positions. I will skip through this so we get to a movie. We end up with something where it appears that we're really tracking the motion of particles, and of course we are. This gets us two things. The most important is that each motion here is quite small, so we're not going to bump into things. That has nothing to do with trying to track the real motion; it's simply a fairly efficient way to sample the free energy landscape, and a way that guarantees that we sample it according to the Boltzmann distribution. Then, as a pure bonus, because we now have realistic velocities on all the particles, we can actually also get kinetics from this. We can determine, for example, how frequently this protein opens, or, when it folds, how long it takes for that helix to form. There is a certain randomness here: sometimes in the simulation the helix might fold in 10 nanoseconds, other times it might take a microsecond. But if you simulate for a long time, we should be able to calculate the average time it takes for the helix to get over the barrier, because we also sample the peak, the transition state of the barrier, according to the Boltzmann distribution. So simulations are kind of a double-edged sword: they sample the Boltzmann distribution, like the simple statistical sampling does, but on top of that we get some information about the kinetics too. We can calculate kinetics on average. But the goal is never to predict the exact motion of one particular atom; that I can't do. I also skipped through these things a bit.
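The acceleration-to-velocity-to-position loop above is the velocity Verlet integrator, the workhorse of MD. A minimal sketch on a harmonic "bond" (force constant, time step, and duration are arbitrary illustrative choices):

```python
def velocity_verlet(x, v, force, dt, n_steps, mass=1.0):
    """Velocity Verlet: force -> acceleration -> velocity -> position,
    the standard scheme for integrating Newton's equations in MD."""
    a = force(x) / mass
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt   # new position
        a_new = force(x) / mass           # force at the new position
        v += 0.5 * (a + a_new) * dt       # new velocity
        a = a_new
    return x, v

# A harmonic "bond" with F = -x, integrated for 10 time units.
x, v = velocity_verlet(1.0, 0.0, lambda q: -q, dt=0.01, n_steps=1000)
energy = 0.5 * v * v + 0.5 * x * x   # stays ~0.5, the starting energy
```

Note the step is tiny compared to the oscillation period; that is the MD trade-off mentioned above: many small moves that never jump into a clash, at the price of needing very many steps.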
Once you do that, if we sample things correctly, I can calculate, for instance, the binding free energy of a small molecule, because I literally sample all states. In a simulation, all the things you control in the experimental lab upstairs, you're going to need to control too, because they determine what we are actually simulating. For instance, are you simulating at 200, 300, or 500 kelvin? That's going to influence the outcome, right? This is described by what's called a thermodynamic ensemble, which is a fancy word that goes quite deep into physics, but the easy way to think about it is that it simply describes the conditions under which we're simulating. Fixing the temperature might seem obvious, but it's not the only possibility, because if I fix the temperature, I'm going to need to either feed energy into the system or take energy out of it: if I'm increasing the velocities of particles, I'm adding energy to the system, and if I'm decreasing the velocities, I'm taking energy out. That corresponds to a system that can exchange heat or energy with its surroundings. Another option is a completely isolated system where the total energy is constant. A physicist might love that. The only problem is that if your protein does not start in the lowest possible energy state, it will move to a lower energy state, because the interactions are better; it's good if we can form a hydrogen bond. But then the protein's potential energy drops, and since the total energy in the system has to be constant, what happens to the kinetic energy?
It has to go up, and then you're increasing the temperature, which is what would happen physically: if I take a stone and drop it, friction generates heat. But that's typically not what we want for a protein; in our case, we would like to keep the temperature constant. Same thing with volume and pressure: should the volume be constant, or do we want the pressure to be constant? That's also a conjugate pair. The only way I can control pressure, unless I change the number of particles, is to change the size of the system. If I shrink the system, I increase the pressure, because I have the same number of particles in it, and if I expand it, I reduce the pressure. But then, of course, I'm changing the volume. So I can keep either the volume or the pressure constant, not both. You typically see the ensemble defined with three letters. NVE means the total number of particles N is constant, the volume V is constant, and the energy E is constant. NPT means we keep the number of particles, the pressure, and the temperature constant. The only reason for introducing this is that you might see it in the labs. You can also do NVT, constant number of particles, volume, and temperature. And in principle, instead of a constant number of particles you can have a constant chemical potential, but we're not going to go through that. So this really describes the physical rules of your simulation: what is it that you want to sample? In chemistry it's usually NVT or NPT. In principle it should be NPT, but sometimes the pressure changes are so small that it's easier to use NVT. And then we said that this is all about statistics, so we're going to need to sample a lot.
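The statement that fixing T means pumping heat in or out can be illustrated with the crudest possible thermostat, a single velocity-rescaling step (a toy sketch with reduced units, not one of the thermostat algorithms used in production MD codes):

```python
import math
import random

def kinetic_temperature(vels, mass=1.0, k_b=1.0):
    """Instantaneous temperature of 1-D particles from kinetic
    energy, using <m v^2 / 2> = k_B T / 2 per degree of freedom."""
    return mass * sum(v * v for v in vels) / (len(vels) * k_b)

def rescale(vels, t_target, mass=1.0, k_b=1.0):
    """Crudest NVT 'thermostat': scale every velocity so the kinetic
    energy matches the target temperature. The scaling adds or
    removes energy, i.e. exchanges heat with a virtual bath."""
    lam = math.sqrt(t_target / kinetic_temperature(vels, mass, k_b))
    return [lam * v for v in vels]

random.seed(1)
vels = [random.gauss(0.0, 2.0) for _ in range(1000)]  # a "hot" start
vels = rescale(vels, t_target=1.0)                    # cool to target
```

Real thermostats do this more gently and with the correct fluctuations, but the bookkeeping is the same: changing velocities is changing the energy of the system.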
You will run some small simulations yourselves, but in general, to do this for large proteins, you need supercomputers, which thank God we have. Later on, in one of the labs, you're going to try running some real small simulations. That means I've pretty much covered most of the things we skipped last week. There's one final thing I didn't tell you about: we had four slides about the coil. Let me go back and go through that. Why do we care about the coil? What protein is active as a coil? None. So you might think we could just skip these four slides, save a few equations, and happily focus on protein structure. So why should we care about the coil? We talk about native states of proteins. What do they correspond to? The beta sheets and the alpha helices, that would be the native state, right? And the coil is definitely not beta sheet or alpha helix. So, in contrast to the native state, what type of state is this? Non-native, or denatured. So this is the non-native state, which might seem completely pointless to study. Why do we care about the non-native state? Let's take a step back. What would you like to understand about proteins, their function, and how they fold? You would like to understand how proteins fold and what happens when they fold, right? Which proteins are stable, and why is a protein stable? If you introduce a mutation, will the mutant be more or less stable than the wild type? So our goal, or your goal, is to understand the process of protein folding and stability. And as I've said a couple of times, when it comes to understanding a process, what is the prerequisite? You can't understand any process by looking at just one of the states; that's simply not possible. You need a before and an after. So this is the before state.
You might not spend any time in the before state once the protein has folded, but we can't understand why it prefers the native state if we don't understand why the denatured state is less favorable. So the coil corresponds to the denatured state, and later this week we will specifically look at the kinetics of these transitions. But when we do that, we have to know what we're talking about when we say the coil, or the completely stretched-out state. The first thing, which I've already mentioned a couple of times, is that the coil will never look like the fully extended chain. You will never have an all-trans structure. Why? The entropy: that's just one specific state, right, and it would correspond to entropy zero. So technically you could have it, but it's just one state out of a gazillion; it's not going to happen. In general, the coil is going to look like a random tangle, but you have no idea how large it is. It turns out that we can estimate that with fairly simple means. We start at one end and then move one bond length at a time in completely random directions, because this corresponds to a completely disordered state. On average, what is then the distance between the two endpoints? That tells you roughly the diameter of this shape, and in particular how it increases as you increase the length of the chain. And this is surprisingly easy to work out. It's going to be a bit of math, but not very hard. First, we describe each step as a vector: when you move from the first point to the next, that's a vector in three dimensions, x, y, z. Then you add the second vector, and the third, and the fourth, et cetera.
And the chain can go out of the whiteboard too. That means that the total vector h is the sum of all the individual bond vectors: you start at one point and end up at the other. And then we would like to know: what is the average length of h? Well, the easy way is to square it and then take the square root, because the square is a scalar number. And squaring corresponds to squaring that entire sum. Then you can expand it: we have one sum multiplied by a second sum, so every term gets multiplied by every other term, including itself. We can always split that into two parts: first, all the terms where a vector is multiplied by itself, as one sum; and then a double sum where each vector is multiplied by some other vector. That probably doesn't look a whole lot simpler yet. So what are the r_i's and r_j's? For one particular state we can't say, but we're interested in the average, right? In statistical mechanics we frequently write averages with these angle brackets (I forgot one bracket there, sorry about that). So I take the average of the whole expression, which is the average of the first term plus the average of the second term, and I can always move the average inside the sums. So first I have the sum of the averages of each vector dotted with itself: that's just the squared segment length. And then I have the sum of the averages of each vector dotted with some other vector. And here's the beauty. If I take a random vector, because again everything here is random, then on average...
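The derivation sketched on the board, written out as equations (I'm using $b$ for the fixed segment length; the lecture just calls it $r$):

```latex
\vec h = \sum_{i=1}^{N} \vec r_i
\quad\Longrightarrow\quad
\langle h^2\rangle
  = \Big\langle \Big(\sum_{i}\vec r_i\Big)\cdot\Big(\sum_{j}\vec r_j\Big)\Big\rangle
  = \sum_{i}\langle \vec r_i\cdot\vec r_i\rangle
  + \sum_{i\neq j}\langle \vec r_i\cdot\vec r_j\rangle
% For random, uncorrelated segment directions the cross terms average to zero,
% and each self term is just the squared segment length b^2:
\langle h^2\rangle = N\,b^2 + 0
\quad\Longrightarrow\quad
h_{\mathrm{rms}} = \sqrt{\langle h^2\rangle} = b\,\sqrt{N}
```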
In general, if you take one vector, multiply it by another, and average over all possible orientations, the result is zero. The dot product really describes how correlated the two vectors are: if they're parallel it's positive, if they're orthogonal it's zero, and it can be negative. And if they are independent, like two independent statistical variables, the cross term disappears: there is no correlation between two random vectors. So the whole complicated expression reduces to the first term: the squared segment length r squared, and there are N such elements. The point here is that I don't really care exactly how long r is, but the mean square of the sum goes up as the number of elements. And this was a square, right? So if I take the square root, the average length increases as the square root of the number of elements, or the number of residues. So if you take a protein chain and go from, say, 10 residues to 100 residues, a factor of 10, it only gets roughly a factor of 3 longer (the square root of 10 is about 3.2). So the size doesn't go up that quickly with length. Can you see some approximations here? Well, there are a couple. First, the torsions of your amino acids: you can't move amino acids in any possible direction; this is restricted by the allowed torsions. It turns out that we can compensate for that with some sort of effective segment length, which would probably correspond to about two amino acids or so; that's the correlation length along the chain. But there is another, more severe problem. A real chain in a small space is in general going to bump into itself, right? And this is a concept called excluded volume: you've already filled up some of the volume with protein, so that volume is not free; you can't put other things there. This is a super complicated physical problem.
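The square-root scaling derived above is easy to check numerically. Here's a minimal sketch of the freely jointed chain from the lecture (the function and parameter names are my own): generate many random chains of unit-length segments and measure the root-mean-square end-to-end distance.

```python
import math
import random

def end_to_end(n_steps, n_chains=2000, seed=0):
    """RMS end-to-end distance of freely jointed chains of unit-length segments."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(n_chains):
        x = y = z = 0.0
        for _ in range(n_steps):
            # draw a direction uniformly on the unit sphere
            cos_t = rng.uniform(-1.0, 1.0)
            phi = rng.uniform(0.0, 2.0 * math.pi)
            sin_t = math.sqrt(1.0 - cos_t * cos_t)
            x += sin_t * math.cos(phi)
            y += sin_t * math.sin(phi)
            z += cos_t
        total_sq += x * x + y * y + z * z
    return math.sqrt(total_sq / n_chains)

r10, r100 = end_to_end(10), end_to_end(100)
print(r10, r100, r100 / r10)  # the ratio should be close to sqrt(10) ~ 3.16
```

For a self-avoiding (excluded-volume) chain, which comes up next, you would instead expect the ratio to be about 10^0.588 ≈ 3.9 rather than 10^0.5 ≈ 3.2.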
And you can actually solve it, not just numerically: you can derive it. Paul Flory did this, I think in the 1960s. And you can show that for a real chain in three dimensions, the size goes up not as the square root, N to the power of 0.5, but with an exponent of about 0.588 and then lots more digits (Flory's own estimate was 0.6). Flory was a physical chemist who is pretty much the father of everything we know about modern polymer physics and polymer chemistry, and that was really the birth of what then became biophysics, when we started to understand polymers and amino acids. There's actually this amazing book about polymer physics that he wrote in the 1970s, and it still pretty much describes the entire field; not that much has happened since Flory, because he more or less solved it. We're not going to spend any more time talking about this, but the point is that our super simple model's N to the power of 0.5 is pretty darn close to a value you would get the Nobel Prize for. With that, back to what we're going to do today. Do you have any other questions about the stuff we did last week or before the break? And we even have some light for now. Today we're going to be back in chemistry: we'll be looking at lots of protein structures and doing relatively little physics, but I'll connect back to the physics, and then towards the end of the week we'll look a whole lot more at kinetics. So in particular, if you feel that your knowledge of statistical mechanics and kinetics is a bit rusty, try to read up on it this week. You don't have to do it for today's lecture, but it will help you on Thursday and Friday. And I've also made sure Wednesday is completely blank for you.
So you have four lectures and two labs this week, a full schedule, so use that Wednesday to study if you haven't had a chance to read the book yet. Let's jump straight into the protein part. I, the book, and quite a few other people would classify proteins at a very broad scale into three classes. The normal proteins that you've seen most of are the water-soluble globular ones, and that name really has to do with their shapes, the globin-like folds. The reason they seem most common is that they were the easiest ones to determine structures for. The first protein structures that Kendrew and the others at the MRC determined were all globular, because they're easy to crystallize, and they also have a clear, repeating, well-defined structure. But that's a common misconception, and a common danger in science: we frequently only believe the things we see, or rather, we assume that the things we see are representative of everything that can exist in nature. And that can easily fool you, right? You never observe the transition states, and therefore you might think they don't exist. There are other classes of proteins. There are membrane proteins, which we're going to talk about tomorrow; they're much more difficult to crystallize, and I'll tell you why when we get there. And then there are all these fibrous proteins: the building materials, the scaffolds in your body. They are by far the most abundant proteins, but we haven't seen that much of them. So today I'm going to start with the fibrous proteins, then the globular proteins, and then we'll save membrane proteins for tomorrow. Most of the pictures I'm going to show you were generated not with VMD but with a small program called PyMOL. The point is that there are a bunch of these free tools you can download, and I would strongly encourage you to do that.
In particular, the second you start to look at proteins in your studies, download the structure. Just by looking at the structure, you can usually understand quite a bit about a protein, and it's so much more instructive to look at it in three dimensions and rotate it around than to look at my two-dimensional images. The other reason for using these viewers comes back to the things we talked about the first week, I think: you can choose how you want to represent a protein depending on what you're interested in. You can choose, for instance, to draw some sort of space-filling surface. Initially, that might appear not to show you a whole lot of detail about the protein, but you're going to see later today that just seeing the shape of the surface is frequently quite important, for instance if you want to see how two alpha helices can pack. Now, the problem with this is that you don't really see much information about the secondary structure elements or how the amino acids are positioned, right? You can't really see whether something is a helix or a sheet. So occasionally we might just want to trace the backbone. In this particular case it's all green, which is a bit stupid; I should have used rainbow coloring. In a rainbow-colored structure, you go from blue at the N-terminus to red at the C-terminus, so just by coloring it this way you can see where the chain starts and where it ends. If you have a very large structure, such as an ion channel, you might want to color the chains in different colors, so you can see the four different chains and how each chain is positioned. In the second sub-panel down here, you're also seeing the secondary structure, which again can be super important.
And I think in the last case down there we even see the protoporphyrin group, the heme group. Again, it depends on what you're interested in: if I'm interested in oxygen binding in hemoglobin, this is probably going to be pretty important, but if I'm just interested in the general shape of the protein, well, maybe I want the general shape plus the specific positions of the two heme groups. It all depends on what you're after. So it's usually worth taking a few hours to learn how to use one of these programs. I have to confess that while the author of VMD is a very close colleague nowadays and we work really well together, I have a preference for PyMOL because that's what I learned. And when I was your age, I learned a program called RasMol, which is ancient by today's standards. The point is that programs develop all the time; just pick something you're reasonably comfortable with and learn to use it. It's a useful tool. The other thing you can do with these programs is make drop-dead gorgeous illustrations. I think this one is a bacteriophage, this is a ribosome, and I think this is an illustration of what happens in some of these new nanopores people are trying to design for sequencing. I'm not sure if you know about this; it's not really part of this course, but I'll describe it super briefly. The standard techniques to sequence DNA usually depend on copying and amplifying it, right, and then doing some sort of chemical or fluorescence detection. But what you can do instead is take a graphene layer or something and cut a very small hole in it. These graphene layers are super sensitive to currents, and DNA is a fairly charged molecule whose specific charge depends on each base. So in theory, you should be able to take a DNA strand and pull it through this hole.
And depending on which base is passing through, you should be able to detect variations in the current. This works in theory; people are doing experiments, but there is nothing commercial on the market yet. The idea is that you should be able to take a piece of DNA, pull it through the pore super quickly, and have computers detect the currents super fast as a way of reading it. But that's still very much research. So that brings us to our fibrous protein friends. You have probably not seen a whole lot of fibrous proteins in VMD or any other molecular viewer. Why don't you see these proteins? Can you get by without the light? Good, I'm sure we can; that's the advantage of spring. Will you pull the curtain there a bit? Oops, not quite. Good, a bit more light, natural light even. So, the fibrous proteins are the structural building blocks of your body. That partly means they're super boring, because they have to be large, right? There's no point in having structural building blocks that are only a nanometer across, because then you don't really get any structure. For these to form a structure, they need to be millimeter-sized or so, and that means they're gigantic by molecular scales. The only way to get there is to have elements that repeat and repeat and repeat and repeat. But just sitting and looking at all those repeats can be pretty boring. Still, this is the stuff that forms pretty much everything you are, or at least what you feel you are: skin, hair, nails, fibers, shells, claws. It's mostly protein, large proteins. And the way they form is pretty much by building structures hierarchically, very much like going from amino acid to secondary structure to tertiary structure to quaternary structure, but now we're going to add three or four more levels, which don't really have as well-defined names.
And just to give you a feeling for this, I'm going to show you a couple of them. Silk. Silk is pretty much pure beta sheets, built from a small repeating unit of serine, glycine, and alanine (roughly glycine-alanine-glycine-alanine-glycine-serine, at least something like that), and this forms anti-parallel beta sheets. I'm not a silk expert, and there are some other proteins that give different types of silk slightly different properties, but it's pretty much only beta sheets, almost crystalline, but not quite. All those glycines also mean it's a super flexible structure, right? And that's one of the things that gives silk its extremely soft properties. Now, I'm not sure if it's still on the market today, but if you go and buy some shampoo or something, it occasionally says it contains silk protein. That sounds fancy; would you pay more for that? Silk protein, do you think it has anything to do with silk? Well, it does in a way. But the point is that you're not getting it from spiders or anything; it's just this protein, and if you mass-produce it, the cost is probably on the order of a dollar per kilo. So it's nothing fancy at all, just a cheap protein. I have no idea whether it actually sticks to your hair or whether the softness comes from it; it's pretty much a marketing gimmick. It's not natural silk. Another example is collagen, and the main reason for showing it is that it's a triple-chain helix. So you see that there are three chains there, but it's not an alpha helix. You have three chains that are full of proline, so this is not going to form any other stable structure.
But the backbones of these proline-rich chains can hydrogen bond to the backbones of the two other chains, right? For this to work, you need to express proteins where a large fraction of the residues are proline; it's pretty much polyproline. And collagen is roughly a quarter of all the protein in your body: bone, teeth; it's hard, very rigid. You probably know already that proline can pretty much only sit in one place in the Ramachandran diagram; it can't really move that much. So if you want to create some large, stiff building block, this is great. But a single triple helix is tiny: the three chains grouped together are maybe 1.5 nanometers wide and maybe 300 nanometers long. I'm not sure about you, but my bones are a bit larger than that. So what then happens is that you take those triple helices and group them into slightly larger super-chains, a super-helix of, I don't remember whether it's six or nine. Now they start to be maybe 10 nanometers across, and then they group into something even larger. And these you can actually see; I think this is an electron microscopy image, not cryo, where you actually see these super-fibrils in your teeth. And even here you see that many of these fibrils are then grouped into even larger structures. So it's the same principle as secondary structure: the body builds things hierarchically, by coiling, then coiling at a higher level, then coiling at an even higher level. But this is also why it's pretty boring, because it's just a hierarchy of things that look pretty much the same, and as you go down one level, it turns out that there's another layer, and another layer, and another layer. So it's important, but it's not really that biologically fancy; it just creates some large, solid building blocks. But the fact that it's not fancy doesn't mean that it's not important.
So remember there was a glycine-proline-proline repeat. If you take that glycine and mutate it to something else, it turns out the chains are no longer that stable, and you can get brittle bone disease. That's just a single mutation in the collagen genes, and because of it the chains no longer form stable structures. These structures are rigid; they don't undergo transformations, folding kinetics, and all the fancy things we're going to look at later, but that doesn't mean they're not important. So the theme we've seen so far is that nature likes to take small building blocks and keep winding them together at higher and higher levels. There is a very general concept when it comes to helices. An individual alpha helix, you already know, is the backbone wound into a helical shape, right? But now think of yarn: if you have two thin threads and you want to create a larger structure, a rope, it usually works pretty well to keep coiling them around each other. And you can do that with alpha helices too. It's normally pairs of helices, but it can be more. So we look at two helices here, top and bottom. And if we look at them from the top, you might remember those diagrams we called helical wheels. (I'm not sure what's wrong with the pointer; it might be out of battery.) We label the residues a, b, c, d, e, f, g. Why do the residues have that order around the wheel? Why am I jumping between positions rather than just going a, b, c, d in sequence? Right: there are roughly 3.6 residues per turn. So I start at a, then move about 100 degrees to b, then c, then d. So a, b, c, d is one turn, and e, f, g is the next turn of the helix. And then I do exactly the same thing in the second helix here.
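The helical-wheel geometry just described can be made concrete with a few lines (a sketch of my own; the heptad labels a–g follow the convention on the slide): with 3.6 residues per turn, each residue advances about 100 degrees around the axis.

```python
DEG_PER_RESIDUE = 360.0 / 3.6  # 3.6 residues per turn -> ~100 degrees per residue

def wheel_angle(i):
    """Angular position (degrees, in [0, 360)) of residue i on the helical wheel."""
    return (i * DEG_PER_RESIDUE) % 360.0

heptad = {label: round(wheel_angle(i)) for i, label in enumerate("abcdefg")}
print(heptad)  # {'a': 0, 'b': 100, 'c': 200, 'd': 300, 'e': 40, 'f': 140, 'g': 240}

# 'a' (0 deg) and 'd' (300 deg) are only ~60 degrees apart on the wheel,
# i.e. their side chains point out from the same face of the helix.
sep = abs(wheel_angle(3) - wheel_angle(0))
sep = min(sep, 360.0 - sep)
print(round(sep))  # 60
```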
That means the a's and d's in particular are going to be in contact. And if we now extend this into a long helix, you're going to have these contacts between a and d-prime, a-prime and d, all along the interface; it's always the same residues, spaced roughly three to four residues apart, that form the contacts. This actually happens in quite a few proteins. This is myosin, which is very common in muscle fibers; when your muscles contract, these molecules even look like they're walking along the actin filament, which is a really cool mechanism. This part out here is just two coiled alpha helices. And in theory you should have 3.6 residues per turn here. Why? Because that's the stable conformation of an alpha helix, right? But it's very frequently only about 3.5: the helix is wound just a little bit tighter. Why? Hold that thought; you're going to see it in two slides, but it has to do with straightness. So let's look at the helix first. I'm not sure about you, but to me it's not easy to understand helices just from a low-level representation where I see all the atoms, and if I were to draw all the atoms colored according to atom type, it would be even more complicated. Too much information kills you; you can't understand things if you see all the details. If I want to understand the packing of two helices, I'm interested in the packing, which is mostly steric interactions, so it's actually much better to make a space-filling plot. And if you do that, it turns out that we have some sort of ridges. The things sticking out are, of course, the side chains, and they run roughly along this 3.6-residues-per-turn line.
And if you draw that out, there's one residue here and then another one roughly 100 degrees along, and if you start marking these bumps, there are going to be some ridges here and here, while the areas between them are more valley-like. And if you do the math, or just measure it on the plot, the ridges run at roughly plus 45 degrees relative to the direction of the helix axis, or at minus 25 degrees; those are the directions along which the side chains stick out. So if you had two helices that you would like to pack well together, what would you need to do? Let's look at the next slide. Here we have one helix, where the points are those ridges, and here's another helix: one set at plus 45 degrees and one at minus 25 degrees. So if you take one helix, turn it around, and push it on top of the other one, it's going to look like that. And that's horrible, right? You see that they're overlapping everywhere. But if you now take this helix and turn it roughly 20 degrees, do you see what happens? The ridges of one helix fit into the valleys of the other. So based on this, you would predict that helices should like to pack at roughly 20-degree angles relative to each other. And Francis Crick predicted this already in 1953. So what was cool about doing that in 1953? Would it have been as cool in 1963, or '65? Exactly, right? It was before we determined the first structures of proteins. Being able to predict that: in hindsight it's obvious, but most scientists sticking out their neck tend to get it chopped off, because most predictions turn out to be wrong. So that's an amazing prediction, because if you just think about helices in general, it would seem to make sense that they should be parallel, right?
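You can get a rough feeling for where those ridge angles come from with a crude geometric model (my own sketch, not from the lecture; the side-chain radius is an assumed value, and the signs depend on which way you view the surface). A ridge formed by residues i, i+3, i+6, ... or i, i+4, i+8, ... tilts away from the helix axis by the ratio of circumferential to axial displacement per step; the magnitudes come out in the same ballpark as the roughly 45 and 25 degrees quoted above.

```python
import math

RISE = 1.5      # Angstrom rise per residue along the helix axis
TWIST = 100.0   # degrees of rotation per residue (3.6 residues per turn)
RADIUS = 5.0    # assumed radius (Angstrom) out to the side chains -- my guess

def ridge_angle(step):
    """Angle (degrees) between the ridge through residues i, i+step, ... and the axis."""
    dtheta = (step * TWIST) % 360.0       # nearest-image angular offset per step
    if dtheta > 180.0:
        dtheta -= 360.0
    arc = math.radians(dtheta) * RADIUS   # circumferential displacement per step
    axial = step * RISE                   # displacement along the axis per step
    return math.degrees(math.atan2(arc, axial))

print(round(ridge_angle(4)))  # i,i+4 ridge: ~30 deg, ballpark of the ~25 quoted
print(round(ridge_angle(3)))  # i,i+3 ridge: ~-49 deg, ballpark of the ~45 quoted
```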
If I didn't know anything, I would guess that they should be parallel. It looks more ordered, and our brains are biased towards seeing order. But look at a real protein, and you've actually seen this in all these structures: the structures almost look a bit disordered, like hemoglobin, right? The helices seem to be all over the place. They're not all over the place; they're usually at about 20 degrees to each other, because that's how helices pack well. Which, of course, we have since confirmed with structures. So if you take two helices and cross them at roughly 20 degrees, they will form quite stable structures. But now suppose you would like to extend this all the way up to the roof and down through the floor. To keep the helices at roughly 20 degrees, if they were completely straight, they would diverge from each other, right? It would just be a gigantic X. So to keep the helices at roughly 20 degrees everywhere, what do you need to do? You need to keep turning them around each other. If I could take both of these and push them together up to the third floor and down into the basement, then locally the helices would always cross at 20 degrees. The other alternative, if I wanted to keep them straight, is to have the amino acids at a repeat of 3.5, winding each individual helix slightly tighter instead. But this is why you see these long coiled structures; they always make this rope-like structure, twisting all the way, so that they stay at roughly 20 degrees of local packing. And since a helix is already a coil, we call this a coiled coil. And just as in a rope, the handedness alternates: you'll see that the helix itself is right-handed, but the supercoil is left-handed; they go in opposite directions. When you have coiled coils like this, it's very common to have lots of leucines in them.
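The 3.5-versus-3.6 point can be checked with a quick calculation (again a sketch of my own): track where the same heptad position ends up after seven residues. With 3.6 residues per turn, the interface stripe drifts about 20 degrees per heptad, so straight helices can only keep packing if they supercoil around each other; with 3.5 residues per turn, seven residues is exactly two turns, and the stripe runs straight along the axis.

```python
def drift_per_heptad(residues_per_turn):
    """Angular drift (degrees, in (-180, 180]) of a heptad position after 7 residues."""
    drift = (7 * 360.0 / residues_per_turn) % 360.0
    return drift if drift <= 180.0 else drift - 360.0

print(round(drift_per_heptad(3.6)))  # -20: interface spirals slowly around the helix
print(round(drift_per_heptad(3.5)))  #   0: interface runs straight along the axis
```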
And in particular, every seventh residue is often a leucine. There's even a name for this: leucine zippers. So what do you think these leucines do? What do you know about leucine as a residue? Remember, I think it was the very first lecture where I almost apologized for repeating fundamental things about amino acids. Do you think I care, at this point, how many carbons are in the side chain of leucine? It's irrelevant. But what is the property of leucine? It's not a polar residue; it's a fairly large hydrophobic residue. And that's what I mean: I'm not going to ask you to draw the chemical structure of an arginine or, say, a tryptophan, but you need to understand the physical properties of the residues. Because the point is that when you see something like this, a bell should ring. If these are helices that we typically see in water, what would happen if the leucines were turned towards the water? That would be a high free energy, right? So you'd like to turn the leucines away from the water. And if they repeat every seven residues, pairs of these leucines can turn towards each other. And it's definitely not forming any hydrogen bond or anything; it's just two hydrophobic residues pairing up with each other instead of being exposed to the water. The reason they're called leucine zippers is that you can imagine the whole two-helix structure forming just like closing a zipper. And in bioinformatics you can actually use this, not to detect secondary structure, that would be pointless, but to detect coiled coils, which is actually important. I actually published a paper on this with my father almost 15 years ago; he was a medical microbiologist with very little idea about protein structure.
But the point is that we were able to predict that some of these proteins actually form coiled coils, which was important for understanding recognition. It's also common to see quite a bit of cysteine in these helices. If you have two cysteines, one in each helix, their thiol groups can form disulfides; remember that we talked about disulfides stabilizing structures. If you then oxidize them, well, the leucine zippers help the helices find each other, but the disulfide bridges are super strong covalent bonds, so they really lock the structure in. And we actually use this. Alpha-keratin is what forms hair: alpha helices. You grow roughly, I think, 10 turns of alpha helix per second in each hair, and you can actually do the math: that corresponds to your hair growing about half a millimeter per day (or a tenth of that at my age). But here too, as thin as a hair might be, an alpha helix is far thinner. So here too you see the hierarchy: the alpha helices form coiled coils, then you form super-coils of eight coiled coils or so, that forms a matrix, then a macrofibril, and then hundreds of these macrofibrils eventually form the cortical cells that are stacked into your hair. So in an average hair there are going to be many thousands of alpha helices, but it is all alpha helices. So now, if you go to the hairdresser and you would like to change the shape of your hair: a permanent wave, yes. How do you think a permanent wave works? Chemically. You need two chemicals, right? First you reduce all the disulfide bridges, which makes your hair floppy, and then you re-oxidize and reform the disulfide bridges in a new shape. And there are related processes elsewhere, not with disulfide bridges in alpha-keratin exactly, but using similar chemistry.
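The hair-growth arithmetic above can be checked on the back of an envelope, assuming standard alpha-helix geometry (1.5 Å rise per residue, 3.6 residues per turn, so about 5.4 Å of length per turn) and the lecture's figure of 10 turns per second:

```python
RISE_PER_TURN_NM = 0.15 * 3.6   # 1.5 Angstrom/residue * 3.6 residues/turn = 0.54 nm/turn
TURNS_PER_SECOND = 10           # figure quoted in the lecture

growth_nm_per_s = TURNS_PER_SECOND * RISE_PER_TURN_NM   # 5.4 nm of new helix per second
growth_mm_per_day = growth_nm_per_s * 86400 / 1e6       # convert nm/s to mm/day

print(round(growth_mm_per_day, 2))  # ~0.47 mm/day, close to typical hair growth rates
```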
So for instance, with clothes that are supposed to be non-iron, you essentially create cross-links that hold shapes that are stable; not thermodynamically stable, but stable over months or years. Elastin is another protein; you probably haven't seen it, but it's worth a quick look. It's a bit like collagen but with a more complicated shape, and here too you have lots of cross-links, via modified lysine residues. This creates a very elastic protein: native elastin occurs in things like blood vessels, and you can even make artificial blood vessels and other biomaterials from it. One reason this matters is that, here too, deficiencies in the relevant enzymes can create very brittle blood vessels. The interesting thing about the fibrous proteins is that, since they're so repetitive, there's frequently not that much information in the protein structure. They're not forming something large and complicated like hemoglobin. And because the repeating units are so small, they look trivial and boring, and there's very little genetic information in, say, bone, precisely because the repeating unit is so small. But that of course means that while mutating one amino acid in a typical large protein is generally fine, you're not going to die, a single mutation in one of these repeating units gets repeated so many times that it can frequently have disastrous effects. And that's why there are more diseases than you'd think related to mutations that alter the structure of fibrous proteins. But that's all I'm going to say about fibrous proteins. It's 20 minutes past 10; it's a little bit early, but before I head into globular proteins, this is a great place to take a break.
And now I'm going to call Paula and see whether she can fix the lights, for tomorrow at least. So let's meet at 10 minutes to 11, and then we'll talk about the water-soluble proteins. The rest of the day, we're going to speak about globular proteins, the classical, beautiful proteins. (No, I'm out of juice for that marker.) There are a ton of different ones. Globular proteins are beautiful, but they're also complicated; there was a reason it took 20 years to determine the first structures. They're not simple, periodic, and hierarchical like the fibrous proteins, so there's way more diversity, way more to understand in the structure. It's easy, when you look at them, to think that some parts of the structure are disordered. There can of course be disordered parts, but in general, if a region is disordered, you're not going to see it in the structure at all. Why? We spoke a little bit about this; was it the first or second week? How do you typically determine these structures? For a large protein you would never do NMR; this one is small enough that NMR might work. And in general, you're not going to be able to determine a structure accurately enough with computational methods that experimentalists would believe you. Yes, so the vast majority of structures in the Protein Data Bank are determined by X-ray crystallography, and we did go through that. X-ray crystallography only works because you have crystals. When you scatter X-rays, the scattering depends on the periodic distances between atoms, and if you have a very regular pattern, you get these characteristic spots from constructive interference. So X-ray crystallography will only give you a well-defined pattern in the first place
If you have lots and lots, and we're talking about billions or gazillions, of repeating structures that have exactly the same conformation. If they were in different conformations, if each small part of the crystal had a protein in a different conformation, you would not have the repeating property. And then it would just be a smeared-out gray shape on the, well, it's no longer a film, it's a CCD nowadays. So the only reason why you get these dots is that there are regular patterns. And the problem is if you now had a protein, and let's say you have a large ordered part, but then you have a small N-terminal part that's just a floppy chain sticking out. So that's what it's gonna look like in one molecule. And the next molecule is gonna look like that. And the third molecule is gonna look like whatever. And forgive me for not drawing the next three billion ones. You're not gonna see this. It's not regular. So what's gonna happen in the structure is that they're gonna say, well, we couldn't see the first 20 residues, and the numbering in the protein starts at 21. So if you're seeing something, and it might be some super strange, complicated loop that looks that way, the only reason you're seeing it is that there were a gazillion copies of this in the crystal, and they all had the loop in exactly that conformation. I have no idea why the loop looks that way, but don't for a second think it's random. For some reason, that loop was well-defined, stable, and all of them looked exactly that way. It could be that it was binding to a copy of itself or something, but it looks exactly that way in all of them. So that means that if you see it, it's gonna be fairly rigid. That doesn't mean that it's necessarily rigid in water, but it will have been rigid in the crystal. The other thing is that for most of the real ones, so far we've only looked at either alpha helices or beta sheets, but in general, you're gonna have both.
Some parts that are helix and some parts that are sheet. And there are a number of ways that you can draw this. In general, the best thing is, of course, if you have one of these 3D viewers and you can look at them, but if you're gonna draw them in print, what's frequently quite useful is one of these schematic structures. So here you have round red parts for helices and triangular blue parts for sheets. And I'm not sure about you, but even here, can you imagine what those eight beta strands do? You can almost see that they're gonna need to create some sort of continuous beta sheet there, right? And then it's helices outside. And if it's helix, sheet, helix, sheet, sorry, sheet, helix, sheet, helix, sheet, helix, can you imagine whether those strands are parallel or anti-parallel? So we have two different answers. Why do you think they're parallel versus anti-parallel? Right, so if you have a strand that goes out of the whiteboard, and then you have a helix that needs to go back into the whiteboard again, then the next strand is gonna be parallel to the first one. So these are super simple, but the cool thing is that you can learn a lot about the structure and what type it is just by looking at it. And again, this is way easier for me to see than starting to look at a large protein. The schematic doesn't really exist, there are just amino acids bonded together, but it helps at least me understand what the principles of the structure are. And here you have a more complicated beta sheet, or two beta sheets sticking together, and some helices outside it. You can do it in 3D too, but in many cases, actually I would even say here, the two-dimensional version is probably easier to understand than the three-dimensional version. Whereas the more data you add, at some point you're not gonna see the forest for the trees. So let's start, we're gonna need to start looking at one.
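As an aside, the 1D content of such a schematic, just the order of helices and strands along the chain, is easy to handle programmatically. Here is a minimal sketch; the per-residue string convention (H for helix, E for strand, dashes for coil, similar to DSSP's single-letter codes) and the example string itself are my own illustration, not from the lecture:

```python
import re

def elements(ss):
    """Extract secondary-structure elements from a per-residue string.

    ss uses H for helix, E for strand, '-' for coil (a DSSP-like
    single-letter convention; the example string below is made up).
    Returns a list of (type, start, end), end exclusive, 0-based.
    """
    return [(m.group(0)[0], m.start(), m.end())
            for m in re.finditer(r"H+|E+", ss)]

# A made-up alternating alpha/beta pattern, like the schematic on the slide:
ss = "--EEEE--HHHHHHH--EEEE--HHHHHHH--EEEE--"
elems = elements(ss)
print([t for t, _, _ in elems])  # -> ['E', 'H', 'E', 'H', 'E']
# Strands and helices strictly alternate along the chain:
print(all(a != b for (a, _, _), (b, _, _) in zip(elems, elems[1:])))  # -> True
```

The alternating E, H, E, H, E pattern is exactly the signature that, in a real structure, hints at a parallel sheet with helices packed outside it.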
We always start with alpha helices, so let's start with beta sheets for once. Because they're actually easier than the alpha helices. Beta structures are surprisingly simple. They pretty much only exist as continuous sheets, in particular if they're pure beta structures. And the reason for that is that every time you break a sheet, you end up exposing lots of residues with unpaired hydrogen bonds, right? Remember when we talked about beta sheet formation: it's cheap to add to an existing beta sheet, it's expensive to create a new one. So you want fairly large beta sheets. And the structures typically have one, maybe two stacked beta sheets, but rarely more. We'll come back to that, okay? Anti-parallel beta sheets are by far the most common ones, in particular when it's pure beta structure. Why? Well, if you had to have them parallel, right? Because again, we're saying that this is pure beta structure, not mixed alpha-beta. If you come to the end of one strand, you're somehow gonna need to move back. And yes, theoretically you could have another beta sheet or something, but it gets more complicated. It's so much easier to just go up, down, up, down, up, down. It's a more local structure. And you might recall that we talked about these things, that helices, sorry, beta sheets. Sorry, a bit of a headache today, not from partying, but a bit of flu over the weekend, so I'm a bit more confused than normal even. Beta sheets, they were slightly twisted because the residues were not exactly at 180 degrees. And that's also gonna mean that there is a slight difference depending on whether you're moving to the right or to the left when you move from one strand to another. So they typically have this right-handed geometry. And that might seem like a curiosity. But could you imagine ever using that, if you know that one of them is much more common than the other?
If you had to do a bioinformatics prediction of the structure here, you're likely gonna be pretty good at predicting the sheet, because these are residues that definitely have to be sheet, and these are residues that definitely have to be sheet. And you will likely also detect that they're parallel. But how good are you at predicting the specific coil structure? Getting that right in bioinformatics is almost impossible. So when you see that, and you're doing a predictor and you have to choose between right versus left, what do you pick? Exactly. Which could be wrong, but if you don't know, pick the most common one. There are two ways to pack beta sheets. We can pack them either orthogonal or aligned, also very easy, right? Either they are aligned or they are orthogonal. Both of these create two layers: you have one layer in front and one layer in the back. This occurs in our old friend, the fatty acid binding protein. And the one on the right is very common, say, in immunoglobulins. It's very common for beta sheet proteins to have two layers; in fact, they essentially only have two layers. So why is it good to have two layers? I might have mentioned that before the break. Exactly: you need to form an inside, right? You create a space on the inside that is different from the outside. And in the case of the fatty acid binding protein, that means it can bind something hydrophobic on the inside while it's hydrophilic on the outside. If you only had one layer, what could you do? Well, that's kind of like trying to build a house when you're only allowed to use one wall. You can't really do a whole lot of things with one wall. You can build a fence with a wall, but that's it. Actually, you can't even build a fence, because there's only one wall stretched out and it's linear. You can just walk around the wall, so it's not particularly useful. And it's the same thing here. You need two layers to form anything.
There are a few cases where you can have one layer and then something around it, but it's rare. The one exception could be if you have one layer that wraps around and binds itself. So couldn't you imagine having three or four layers? If you think about that, what do you think would be good and what would be bad about having more than two layers? We'll do this properly later on, but for now it's hand-waving. So it's complicated. Good or bad? Because it's gonna mean there's basically only one state you can build that way, right? So it's bad entropy-wise and everything. It's gonna be more complicated to fold. What's good about it? Not sure, maybe. Would it be super good to have a fatty acid binding protein that could bind two fatty acids? Maybe. There are examples. But there isn't really a whole lot that would be good about it. We can usually get the functionality with just two layers, and growing to more than two layers usually doesn't buy you a whole lot. And that's where evolution seems to have stopped: we don't need more than two layers to get functionality. We'll get back to the immunoglobulins. This is another small beta sheet protein that you have in the lens in your eye, gamma crystallin, a super small protein. If you look at this, well, you obviously see the beta sheets. There's even a tiny helix up there. And these things might look disordered, but remember what I said: the only reason we see them is that they are ordered in the crystal. What do you think about all these loops up here? Do they look disordered? They kind of do, but still. There is a very clear pattern here that you might not see that well. And same thing here, it usually pays to draw things schematically. I thought I had that, no, there we go. Aha. There we have it. So we start with number one there. Number one goes to number two, and then we go down here.
But when you come back up, number three takes a step back and goes to number four. So I'm sorry, but I deliberately lied a little bit to you. When you have these anti-parallel sheets, it looks really neat to imagine a structure that just goes up and down, up and down, up and down. But such a structure would be very floppy. You would have a long sheet, and it would be hard to do anything with it. There is another problem. Let me tear up one of the old slides here. Lecture five, nobody needs lecture five. Let's try to create a pocket like the one I had on the previous slide. Let's create my fatty acid binding protein here. I'll turn it around that way and create some sort of pocket. You might need another beta sheet. There's another beta sheet. Something like that. So I'm not sure about you, but if you try to put something in this pocket, there's kind of a hole in the middle of the pocket, right? Things are not gonna stay here if you put them in the pocket. Yes, you can get something in there, but it would fall straight through. It's completely accessible to the water on the inside and the outside. Fail. But what if we had these loops? If these loops turn in a bit, we can actually use the loops to close the pocket. So what might initially look as if it's disordered and everything, these loops very much close things up so that you get an inside that is isolated from the outside. Yes. Do you see them? So they are in the crystal structure. What conclusion do you draw from that? They're crystallized. So at least in the X-ray structure, they are regular. They are a bit mobile, but in general they're much less floppy than you think. But for this to happen, it's actually good to have some loops that are slightly longer.
And what this pattern creates, first, is a pattern where this loop can interact with the second loop, and that loop can interact with the fourth up there. So while it might initially look a bit strange that you're jumping over things this way, it actually creates a very nice regular structure where you can have multiple loops binding together. It's almost as if there was a sheet-like structure up in the loops. So the dark blue loop here is binding to the light blue loop, and similarly on the other side. This has a name. Jane Richardson found this out in the 1970s, and they're called Greek keys. There is a, I didn't print it, sorry, I should have printed that. There are a couple of papers either today or tomorrow. It might actually be tomorrow; I'll try to print them for tomorrow. There's also a super short Nature paper where she noticed it. And the reason why they're called Greek keys is that it corresponds to the shapes you'd see on Greek urns. It's exactly the shape. You take the protein, then you draw, you make a loop, and then you go back. So why on earth does this occur both on Greek urns and in proteins? It's not a coincidence. That's quite right. And well, can't you do that anyway? If I draw something there and then I draw something up and then I go back. Whatever, I need a better pen. So it is a long line, and it's ordered and compact, but there is a key concept here. What is it that the line does not do? It does not cross itself. So you can draw it without lifting your pen, and it will never, ever cross itself. And it's the same thing here. In fact, it's just a two-dimensional shape, and it will never have to cross itself. So what's so bad with crossing itself? Well, how many dimensions do we live in? Two or three? Good, good, I needed to check. But what would be required for things to cross in three dimensions?
You would basically need the protein to tie a knot as you're folding it, right? How likely do you think that is to happen? It's not gonna happen. So you need proteins to be able to fold so that independent parts can fold, and you're not gonna have a knot in a protein structure. Theoretically you could, but it's so unlikely that it's essentially never gonna happen. And that's why nature has used this. At some point, Greek artists found out it's a beautiful, regular shape. Nature had slightly different driving forces, in particular entropy: being able to fold things and not having insanely high entropic free energy barriers. But effectively it's a very regular structure that you can fold. So sit down with paper and pen and try to draw that. And again, then you can form hydrogen bonds up here too, and down there. You can create a beautiful rigid structure and still have a bit of loops that help you close your pockets. And that's what you do in the fatty acid binding protein. Oh, here we even see the, do you see the small fatty acid there? And here we have all these long loops, and in this case even the small helix up there, that really help to close it. So it's quite closed here, and then you have a bunch of hydrophobic side chains here too, so it's not as accessible there as you might think. And I think it's oleic acid, yes. So there are two ways. Again, the hierarchy: the simple up and down, we sometimes call that a beta meander. And this meandering has to do with the shapes of rivers when they go back and forth. I'm not even sure whether the book includes it. And the other alternative for organizing beta sheets is the Greek keys. And that's pretty much it: if you find a beta pattern in a protein, it's gonna be either a meander or a Greek key. In theory, you could, of course, come up with lots of other shapes here.
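To put a rough number on "lots of other shapes", here is a toy enumeration. The definitions are my own simplification: a sheet layout is just the left-to-right order of the four sequence-consecutive strands, a meander is a layout where every consecutive pair of strands sits next to each other, and I label a layout "Greek-key-like" if it has two short connections plus one connection spanning three strand positions (one common way of describing the Greek key):

```python
from itertools import permutations

def jumps(sheet_order):
    # sheet_order lists the strands (numbered 1-4 along the sequence)
    # in their left-to-right order within the sheet.
    pos = {s: i for i, s in enumerate(sheet_order)}
    # Position change when going from strand k to strand k+1 in sequence.
    return [pos[s + 1] - pos[s] for s in range(1, len(sheet_order))]

def classify(sheet_order):
    sizes = sorted(abs(j) for j in jumps(sheet_order))
    if sizes == [1, 1, 1]:
        return "meander"          # up-down-up-down, all sequence neighbors
    if sizes == [1, 1, 3]:
        return "greek-key-like"   # one long connection spanning 3 positions
    return "other"

counts = {"meander": 0, "greek-key-like": 0, "other": 0}
for layout in permutations((1, 2, 3, 4)):
    counts[classify(layout)] += 1
# Note: mirror-image layouts are counted separately here.
print(counts)  # -> {'meander': 2, 'greek-key-like': 6, 'other': 16}
```

Even in this crude counting, only 8 of the 24 orderings are "regular" in the meander or Greek-key sense, and those are essentially the only ones observed.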
But it turns out, and there are a few other shapes depending on how you classify them, but there are very few. If you have four beta strands, there are gonna be something like two to the power of four different ways to arrange them, but there are very few of them that you observe in nature. And we don't really know why we observe some and don't observe others. Yes. So this would be more complicated, of course, because here you should probably take the loops into account too. It's a more complicated structure. Yeah, so I lied. It's called simplification. Never forget it. How many here thought that that was an easy derivation that we did, and that you would be more than happy to make it more complicated? Good, I thought so. And that's what we apply in particular in physics all the time, right? I could not, and I'm serious, if you asked me to do it now, I could not put down all the equations and decide what the transition state of this looks like. If you even wanted to include the alpha helix, I could hardly do this with a computer even, right? It gets too complicated. And what we wanted to understand, the only thing we wanted to understand, is why beta sheet formation can be so slow. And we could understand that with the simplest model. So there is a classical quote by the physicist Wolfgang Pauli, that you should always simplify as much as possible but never more. There is something called the Pauli point, which is a joke that that is the point where you have simplified as much as possible but not more. The only way to find out is to start with a simple model. You should never ever make a model unnecessarily complicated. The worst thing that can happen is that you realize, oh my God, this was too crude a model, I can't predict why beta sheet formation would be so slow. In that case, take a step back and try to understand the simplifications you made.
Is there something where you think you went too far? Then you might have to try something slightly more complicated, but always simplify. And this is not limited to physics. You might think so, and it's easier for me to describe it this way in the course, but it applies to anything you do in chemistry, life science, protein folding, sequencing. Do you have any idea what even happens in sequencing? The complexity of these molecules and their binding to different glass surfaces, and then you're gonna need to account for the different fluorescence of the different probes. It's hundreds of orders of magnitude more complicated than these models. So there too, we're constantly simplifying. We're assuming that there is no crosstalk in the fluorescence. We're always assuming things to get a simple model, and then we just hope that the noise, the errors that we introduce, are not too severe. It's the same thing here. That's the reason I showed you this excluded volume thing. We made a horribly simple approximation where we ignored the limited flexibility of proteins, I ignored that things cannot overlap, and I still got an exponent of roughly 0.5, while the exact result would have been 0.588. So you frequently lose much less than you think from these horrible approximations. The other thing, well, it's not really a problem: if you look at beta sheets, there is surprisingly little diversity in beta sheets. You virtually never see mixed parallel and anti-parallel sheets, and it has to do with the thing I said before: what would you get from that? It would be a more complicated structure, and there's no obvious gain from it. Now, while I mostly showed you anti-parallel ones, when we look at mixed alpha-beta there will of course be some parallel beta sheets too, and that's the case where you need a mix of helices and sheets. Same thing with the left-handed crossover being more rare than the right-handed one.
It mostly has to do with this slight twist of the sheets. It would be slightly more expensive to cross over the wrong way. It's not horribly more expensive, but again, why? If there is a cheaper way to do it, why pick the slightly more expensive way? And what all this comes back to is that these are the properties of the amino acids. The amino acids that prefer to be in beta sheets are not completely floppy and flexible, so the properties of the amino acids in beta sheets will lead to preferred patterns in the crossover. It will lead to either parallel or anti-parallel beta sheets. So beta sheets are not quite as boring and simple as the fibrous proteins, but there's surprisingly little diversity in them. And then I have to confess that I lied again. Anti-parallel beta sheets are by far the most abundant, loops in principle do not overlap, and knots are in principle not allowed. But remember the other thing that I said? To every rule in biology, there's an exception. And this is the exception for beta sheets: pepsin. You see that the green loop there goes that way, and the yellow part sticks in through that loop. So pepsin actually contains a knot. If you had to guess something about the structure, do you think it would be stable or floppy? Somewhat stable, very stable, or super stable? Super stable. Where do you think you have pepsin? In the stomach. Why would you need a protein to be super stable there? Yeah, it's like pH two or something. It's an environment that's so horrible that it pays off. I would bet, I don't know actually, but the folding time of pepsin is probably very slow. Normally that in isolation would be bad, but because it makes the protein more stable and means that it's not gonna unfold that easily, it pays off in the long run. So there can be exceptions. Yes, sorry, say that again. What does? Yeah, it looks the same.
As far as I know, I'm not aware of pepsin having two states or anything. And this is a small protein, so it just has to fold spontaneously; the chain spontaneously has to find this conformation. That's why I said it's likely to take quite a while for it to fold, but this protein is small enough that at some point the chain will likely spontaneously fold into that knot. If it was much longer, it would be very unlikely for it to find that position. But this is a small enough protein that it will fold this way spontaneously. But as I said, that's the only exception that I'm aware of where you actually have a knot. The other thing, if you absolutely wanna classify things here and go further down: if you just have four strands, and they are parallel or anti-parallel, there aren't really that many ways you can organize them, even if you start to think about three dimensions. So here you can imagine that two of them are slightly further out from the whiteboard and two are further in. These are the most common ones; there are a few more if you're allowed to be in three dimensions. And of these, these two, the Greek keys, are super common. The beta meander occurs, and all the other ones are rare exceptions in the PDB. So if I now give you four helices, sorry, four strands, and ask you to arrange them into a beta sheet, where would you invest your money? And that's how a bunch of programs work, for instance Rosetta, when you're trying to build structures ab initio. It shows that sometimes bioinformatics is perfectly fine if you only wanna predict an existing structure. But what if you want to design a new structure? You want to design a brand new protein that should be stable and do something. You can't do that just by looking at nature, because there isn't any homologous structure you can find. But if you have a choice, try to bet the same way nature has been doing its betting.
So if I had to create something stable, I would primarily try to pick things the same way nature did it. The cool thing is that David Baker's group a few years ago actually were able to design proteins where you had groups of beta strands that were paired in ways that have never been found in nature before. So it is possible to get around this, but I think that was more a proof of principle. If you have the choice, go with the way nature has designed it. The other thing to be aware of with beta sheets, which makes them slightly more complicated, is that there is nothing that forces a beta sheet to be formed from just one sequence or, well, one subunit. So here you have a green subunit and a blue subunit. And this might seem complicated. Do you see how the coloring here kind of helps you? If you didn't have the coloring, you would have no idea that this was two different proteins. So some of those things that might seem blatantly obvious in those visualization programs, like seeing that there are actually two completely different chains here, are not obvious at all. They are even different molecules, right? This chain has a beginning and an end, and the blue chain has a beginning and an end. But the beta sheet, well, you don't see it here, but you can probably bet almost anything that there are gonna be hydrogen bonds here, right? So effectively the sheet extends from one molecule into the next molecule. And here we are a bit sloppy as chemists or biochemists: we frequently think of this as one molecule, as a dimer. This is the type of dimerization you get, for instance, in these misfolding diseases when you start forming plaques: existing beta sheets tend to favor growing the beta sheet into more molecules. So if you wanna create a molecule that should dimerize or something, having an accessible beta sheet in it is usually a pretty good bet.
And here the complication is that when you're seeing something in a crystal, well, it could be that it's a dimer in the crystal, but in solution it would break apart. Or it could be the other obvious thing: in the crystal they're isolated, but in solution they would form dimers. So we even have this problem, right, Dari, wait, I was about to say, one of my students, you know Dari, he's working over the weekend here, he is working on expressing ribosomes to do cryo-EM on them. And then you're adjusting the concentration you have of the ribosome and the concentration you have of your DDM, the detergent. And depending on what concentrations you have of the protein and detergent, you can end up with cases where the protein starts to dimerize. And in that case it's not the natural dimerization; you don't want that. Or they can break apart so that they're no longer dimers. So these things can be very fragile. Yep. So what kind of mutations cause that? So they're not necessarily mutations. We know very little about plaques, and the cause of these plaques is not necessarily mutations. This is something that always happens, but the activation barriers are likely so high that it will take decades for it to start to form. And even when it's forming, the formation process is fairly slow. So we all, I'm sorry to say, we all get plaques. If you were to dissect my brain now, you would probably start to see them. You're young enough that you would only have trace amounts. By the time we're 80 or something, it's gonna be visible in any brain. But the brain is also fairly good at rerouting nerves and everything, so that we can get by with some deterioration of our cognitive functions. And then under some conditions, it becomes so bad that it starts to grow too fast, even when you're young.
In Creutzfeldt-Jakob disease, for instance, there's a mutation where the prion likely starts to form more easily, which accelerates the growth of the plaques. But this process happens all the time anyway. Now, this is a different type of protein; it's not related to the plaques. So it's not that the plaques themselves are fatal, it's just that if they start to form too early, when you're too young, they start to deteriorate your cognitive functions too early in life. I think we already spoke quite a lot about helix versus sheet formation, but I'm gonna sum this up again, because now we're gonna move over to helices. All the things that we've seen here are based on the fundamental features of the helix versus the sheet. So if we particularly look at the sheets, there are non-local hydrogen bonds. That's critical to have this stabilization, particularly for the parallel beta sheets. It's even more important to drive dimerization or something that you have these flexible interacting strands. The flexible strands were the whole reason why we could form these loops and everything, and why we can form these fairly small pockets. That we have a fairly strong all-or-none transition is also important for them to actually be stable. If you look at the beta sheets, they're frequently more stretched out than the alpha helices and everything, so if it was very easy to start breaking those beta sheets, the structure would not be that stable. So there have to be lots of constraints to form a large overall structure. As we're gonna see later on, it's much less common to have a structure that just consists of two alpha helices, because it's gonna be too floppy. Alpha helices, on the other hand, just to rehash: virtually all, well, all hydrogen bonds are local. So you form a super stable alpha helix, and it's beautiful in many ways, but it's not really a protein yet. And the problem is that once you're here, you've already used all your hydrogen bonds.
You don't really have any free remaining hydrogen bonds, well, except in the side chains. So the helix itself is stable, but it's gonna be a bit harder to determine how helices will interact. We had these coiled coils, but even the coiled coils are fairly boring; it's just a pair of helices. So there are relatively few constraints on the rest of the structure here, so this is gonna have to depend more on weaker van der Waals packing and everything between multiple helices. This very much goes back to the 3D organization. With the sheets, they're parallel or anti-parallel, and then you have orthogonal or aligned structure, and boom, you have a bunch of really convenient small proteins you can form. With alpha helices, you can pack these helices in tons more ways, which means that it's gonna be more complicated. So having said that, let's jump straight into the alpha helix organization. This diversity means that they're much harder to classify than all-beta structures. If you look at hemoglobin, that's almost a nightmare, and that's a super simple protein. KcsA, a ligand-gated ion channel, well, that's a little too complicated. If you go to a super simple structure, just four helices, you have what are called four-helix bundles. So let's start somewhere down there, right? Even with those four helices, there are a bunch of different ways you can pack them. Cytochrome c, tobacco mosaic virus coat protein, and hemerythrin, an oxygen-binding protein. Do you see a pattern with all these helices? Are they parallel or anti-parallel? Right, so if you take one helix down here, it's very rare to have a loop that goes all the way up; it would be too many residues in the loop. So when you take one helix in one direction, you pretty much need to put the next helix in the other direction, and then you go down, up. On the other hand, there are no obvious brownie points for forming an extended two-dimensional stretch of helices.
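The local-versus-non-local distinction in that summary is easy to make concrete. Here is a toy bookkeeping sketch of my own (not a real structure calculation): in an ideal alpha helix the backbone donor-acceptor pairs are always residue i to residue i+4, while in a simple two-strand antiparallel hairpin the pairing partner depends on how far you are from the turn:

```python
def helix_pairs(n_res):
    # Ideal alpha helix: C=O of residue i hydrogen-bonds to N-H of i+4.
    return [(i, i + 4) for i in range(1, n_res - 3)]

def hairpin_pairs(n_res):
    # Toy antiparallel hairpin: residue i pairs with residue n_res + 1 - i
    # (ignoring the turn residues and the alternating H-bond geometry).
    return [(i, n_res + 1 - i) for i in range(1, n_res // 2)]

helix_sep = {j - i for i, j in helix_pairs(20)}
hairpin_sep = {j - i for i, j in hairpin_pairs(20)}
print(helix_sep)    # every helix H-bond spans the same short sequence distance
print(hairpin_sep)  # hairpin contacts span many different sequence separations
```

For a 20-residue stretch, the helix separations are all 4 (purely local), while the hairpin contacts range from 3 up to 19 residues apart, which is exactly why sheet hydrogen bonds constrain the global fold in a way helix hydrogen bonds do not.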
So with helices, it's much more common that you group them into a small bundle of three, four, or maybe five helices. So if you look at these helices, do you see the crossing over there of the dark versus the light blue one? Roughly 20 degrees, right? So it's not a mistake in the structure, even if it is not beautiful. In one way, the cytochrome c structure here actually makes more sense to me than the hemerythrin structure. Which structure looks most disordered? The middle one, right? And do you see this large stretch here? This must obviously be an artifact in the crystal, right? So when you see this in the PDB, what should you say when you open it in your molecular viewer? Is this disordered or ordered? You're both right. So let's start: why do you think it's disordered? If you were to run this in a simulation, how would this part behave? It would move around a lot. So why do you think it's ordered? So how can both of you be right? So I said if you take one of these proteins alone, it's gonna behave that way. But you might have some sort of crystal packing here, so in the multiple copies of this it might be rigid. We'll get back to that in a second. Yes, that's just the anti-parallel one. If you go through them, the cytochrome one, which was on the very left, there is huge diversity in these domains. And one of the key things is that they're frequently metal binding, which means that they can be important in electron transport. That's where you have cytochrome oxidase, for instance. We, oh my God, this was in 2002, at Stanford, we actually got a project for all that. We got a research grant from DARPA to do bioinformatics and structure prediction on a small organism called Shewanella oneidensis MR-1, which we argued is special because this organism contains more cytochrome domains than any other known organism. Why on earth do you think we got that grant? Well, it binds heavy metals. So what is DARPA?
The US Defense Advanced Research Projects Agency. So in particular, heavy metals. There's an idea that this bacterium can be used to chew up radioactive contamination, which I guess DARPA had an interest in. The statute of limitations has probably passed by now, so I can confess this: as bioinformaticians, we were completely uninterested in the heavy metals, but we were interested in developing methods to do protein structure prediction. We could predict any structure, and if they were happy for us to predict this structure, we got money for it. That's good. So, it's a bacterium. Imagine if you have radioactive fallout, and radioactive material is usually uranium or some other heavy metal, right? But it's everywhere. And you can have a bacterium, and bacteria, they grow. They grow exceptionally fast. And these bacteria would then grow and bind the heavy metals, and then reduce them or something. So the idea is: can you basically use the bacteria to bind and reduce the heavy metals, so that you can take care of the radioactivity this way? To tell the truth, I think our research project was about as fuzzy as it sounds when I describe it here, but they bought it. The point is that these four helices likely create some sort of environment where they could bind heavy metals down here or something, hence the large diversity of them. TMV, tobacco mosaic virus. Tobacco plants: you see these black parts of the leaves here? It's a disaster if you're growing tobacco, because you need the leaves, right? It's destroying your tobacco. And, well, yes, whether we should grow tobacco at all is a separate discussion. But if you're a farmer in the south of the US or in Cuba or somewhere, this is how you make your livelihood, so tobacco mosaic virus is a disaster. Already in the 1950s, I think it was, we were able to determine the structure of this virus. You see these rods here? This was one of the first virus structures determined in an electron microscope.
And you can even magnify that significantly more. So now we're in the small, small part down inside the rod here; the scale bar is 50 nanometers. Given what we can do with electron microscopy today, it's kind of sad how low resolution that was. What you see there is a structure that looks roughly this way. Do you see that there's something spiral shaped? You can probably almost see it here. Do you see that each of these units is a four-helix bundle? But in a rod like this, there's gonna be less space on the inside, right? And that's where you have your disordered part, the loop. Technically it is disordered, but because you're gonna need to pack thousands of them this way, and they pack relative to each other, they're gonna form a very stable and well-defined structure in this rod. And then you just repeat thousands of these proteins around each other. And then you have something else: the red part in here, what is that? It's RNA. So what this virus will do is infect a cell in the plant, and then these will unfold and release the RNA. What does that RNA code for? It codes for more of this protein. And that's it. I think that viruses are some of the most, I was about to say organisms; a virus is probably one of the most beautiful things that exist in nature. Because it sits right at the limit: viruses, they're not really alive, right? It's life in the sense that it can reproduce, but it needs a host to reproduce, so it's arguably not life. But you could argue, from evolution's point of view, that this is nature taken to its essence: the point of the DNA, or RNA in this case, is just to make more RNA. It doesn't get simpler than this. It's just RNA and the protein it produces. Most other viruses are slightly more complicated, but this is as simple as it gets: just four helices. And this creates the entire coat. Why do you need this coat in the first place?
Because RNA is very fragile. And it was Rosalind Franklin who determined this structure. That one she actually got proper credit for, in contrast to DNA, which Watson and Crick stole. Hemoglobin, which is similar to one of the bundles that I showed you; I'll get back to the hemerythrin. Hemoglobin is a complicated protein. It was one of the first ones that we got the structure for, because we knew that it's so important. Why is hemoglobin important? What does it do? It binds iron, and that iron binds oxygen. So it carries oxygen in your blood, and that means it's a very abundant protein. This is actually a protein that consists of several subunits; we'll come back to that later in the course. And in each subunit, you have this red group that is actually not protein at all. This is a protoporphyrin; when it binds the iron, we call it the heme group. That's a separate molecule that the protein will bind, which creates the binding pocket where we can have oxygen. And you know what? Let's make it slightly easier by just looking at one of the subunits. This subunit is what we call the globin fold. The full hemoglobin consists of four of them. There is actually a related protein that we'll come back to in a second. So this consists of six alpha helices, and when you just look at it that way, it appears to be completely disordered. But each of these crossings of alpha helices has an angle of either roughly plus 20 degrees or, what is the other one, 55 degrees I think. Anyway, every single crossing between two helices is arranged to pack the helices well, which in this case creates a small binding pocket on the inside. Do you see how many helices we needed to create a binding pocket here? Beta sheets were far more efficient at creating a simple binding pocket, but for whatever reason nature didn't do this one with beta sheets.
So we almost create two layers here, and then we create something that binds the heme group. The reason why it binds the heme group is that in two of these helices we have specific residues, in particular histidine. That histidine comes down, and the nitrogen there, with its free electron pair, creates a very nice binding environment for the iron down there. And then you will be able to bind the oxygen there. Myoglobin, which is a closely related molecule, looks almost exactly like hemoglobin, but it's only a single subunit. So this could almost be myoglobin. We'll come back to that later in the course, but there has to be something important here; there's some important difference between hemoglobin and myoglobin. You said that hemoglobin carries oxygen, so what does myoglobin do? And we know nature tends to like redundancy, so your body happily has multiple proteins doing the same thing. Oh, so you have a good protein and a bad protein? I'm not sure what you think evolution would have done in that case. So which one looks simple, clean, and efficient: the one with one subunit or four subunits, if you had to choose? One subunit is definitely easier, right? So there must be a reason why nature has kept something with four subunits. There is one difference here: myoglobin binds oxygen in your muscle tissue, while hemoglobin carries oxygen in the blood. On the other hand, you have muscle tissue in lots of places, including very close to your lungs and everything. So which one of these should be more efficient at carrying oxygen? Oh, sorry, not again: which one should be most efficient at binding oxygen? So let's assume that hemoglobin is better at binding oxygen. Then hemoglobin would bind the oxygen in your lungs, and you would carry it to your muscle, where it would not release it, because hemoglobin is better at binding it. Doesn't sound like a very good setup. So do we have any other suggestions?
We had one suggestion here that hemoglobin is better at binding oxygen than myoglobin. Can you think of the other extreme, an alternative theory? Myoglobin is better at binding oxygen. So let's test that hypothesis: the oxygen in your lungs would immediately bind to the myoglobin near the lungs, there would be no oxygen on the hemoglobin, and the rest of your body would not get any oxygen. Doesn't work either. There would be a horrible inefficiency, because you would bind oxygen everywhere. Yep. How low the affinity of hemoglobin is? I think we will come back to that later on. But the reason for these things is a really cool mechanism: hemoglobin is good at binding oxygen when you have high oxygen pressure. Where do you have high oxygen pressure, lots of oxygen? In the lungs. But in surroundings where you have low oxygen pressure, out in the muscles, hemoglobin is not very good at binding oxygen. So this creates a cycle where the hemoglobin will bind oxygen where there's lots of oxygen, and release it where there is very little oxygen. And that's why you're gonna need these four subunits. We'll come back later, of course, to what the difference is. But you see, any time nature does something complicated, there is a reason for it. It's not just that it's a complicated molecule, because after, again, 4.3 billion years of trial and error, if there was an easier way to do it, nature would likely have found it. And now we're gonna do a bit of bioinformatics, it seems. Bear with me for one minute here, and then we'll get back to the structure. You know about introns and exons, right? And that the exons get stitched together from the DNA into the mRNA, and then you create the proteins. So if we now look at this beautiful structure of alpha helices: how do you think these secondary structure elements are correlated with the introns and exons?
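As a side note, the loading-and-unloading cycle described above is often summarized with the Hill equation. Here is a small sketch (not from the lecture; the p50 and Hill coefficient values are textbook-style assumptions) comparing a non-cooperative, myoglobin-like curve with a cooperative, hemoglobin-like one:

```python
# Sketch: fractional oxygen saturation via the Hill equation,
#   Y = p^n / (p50^n + p^n),
# where p50 is the pressure at half-saturation and n is the Hill coefficient.
# Assumed, textbook-style values: myoglobin is non-cooperative (n = 1,
# p50 ~ 2.8 torr); hemoglobin is cooperative (n ~ 2.8, p50 ~ 26 torr).

def hill_saturation(p_o2, p50, n):
    """Fraction of binding sites occupied at oxygen pressure p_o2 (torr)."""
    return p_o2**n / (p50**n + p_o2**n)

lungs, muscle = 100.0, 20.0  # rough oxygen partial pressures in torr

for name, p50, n in [("myoglobin", 2.8, 1.0), ("hemoglobin", 26.0, 2.8)]:
    delta = hill_saturation(lungs, p50, n) - hill_saturation(muscle, p50, n)
    print(f"{name}: delivers {delta:.0%} of its capacity between lungs and muscle")
```

The point of the sketch is the difference, not the absolute numbers: the myoglobin-like curve is nearly saturated at both pressures, so it delivers almost nothing, while the sigmoidal hemoglobin-like curve loads in the lungs and unloads in the muscle.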
If you had to guess. In particular, you only have introns and exons in higher organisms, such as humans, right? And humans also have this pattern that human proteins tend to have more complicated structure; we frequently have structures with multiple domains. Given that, it would seem completely obvious that the blue part here would be one exon, and then we'd have another exon, and then a third exon as parts of the protein. It makes so much sense, but it's completely wrong. Those three exon boundaries, you see that they cut right in the middle of the helices. There is no correlation whatsoever between exons and the structure of proteins, which is not entirely surprising if you think about what happens: the exons get stitched together at the RNA level, long before we use the RNA to create a chain of amino acids. So exons have nothing to do with the complexity of structure. Even though they're correlated, of course, in the sense that we have exons in higher organisms, which also have more complicated structures, they have nothing to do with each other. Hemoglobin in particular tends to have three exons. You can actually get some weak heme binding even when just expressing one of them, but that won't bind the oxygen. We also talked quite a bit about these helical ridges and grooves, right? And I showed that in one case you can pack them at minus 25 degrees; in the other case you can actually turn them a lot in the other direction, and then you get something like 45-degree packing. If that is not obvious, you can actually do the experiment by taking a copy of the book or something, printing these two, and then seeing which ways you can turn them. But in general, when you have two sets of ridges here, there are gonna be two alternative ways you can pack them, and one of them is gonna be roughly minus 25 degrees and the other one roughly plus 25. I will come back to that tomorrow when we talk about membrane proteins.
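As an aside, the quantity these packing classes refer to, the crossing angle between two helix axes, is easy to estimate from coordinates. A minimal sketch (my own illustration, not lecture material), fitting each axis as the dominant direction of the backbone coordinates:

```python
import numpy as np

def helix_axis(coords):
    """Dominant direction of a set of backbone coordinates (first principal
    component), a crude but common way to define a helix axis."""
    pts = np.asarray(coords, dtype=float)
    centered = pts - pts.mean(axis=0)
    # Right-singular vector with the largest singular value = axis direction.
    _, _, vt = np.linalg.svd(centered)
    return vt[0]

def crossing_angle(helix1, helix2):
    """Unsigned angle in degrees between two helix axes.

    Note: real packing analyses use a *signed* crossing angle (hence the
    minus 50 / plus 20 convention); the sign needs the inter-axis vector
    and is omitted here for brevity."""
    a, b = helix_axis(helix1), helix_axis(helix2)
    cos_ang = abs(np.dot(a, b))
    return np.degrees(np.arccos(np.clip(cos_ang, -1.0, 1.0)))

# Two ideal straight "helices", the second tilted by 20 degrees:
z = np.linspace(0.0, 10.0, 8)
h1 = np.c_[np.zeros_like(z), np.zeros_like(z), z]
theta = np.radians(20.0)
h2 = np.c_[np.zeros_like(z), z * np.sin(theta), z * np.cos(theta)]
print(crossing_angle(h1, h2))  # approximately 20 degrees
```

With real structures you would feed in the C-alpha coordinates of each helix; scanning all helix pairs in the PDB this way is what produces the two-peak histogram mentioned above.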
You might think that this is stupid, simple physics; it's not. It was used in a very cool high-profile paper about a decade ago. And you can actually do statistics on this in the Protein Data Bank. For all these things I showed you, if you look at the relative orientations of the axes of pairs of helices, and this is probably two decades old, there are two obvious peaks over all the structures in the Protein Data Bank. And they correspond to exactly those two relative orientations: roughly minus 50 or plus 20 degrees between the two helical axes. So if you're a bioinformatician and you're trying to predict the relative orientation of two helices, there are pretty much only two packings you should choose between. And this is what we're gonna use for one of the examples tomorrow. If I give you a membrane protein, and you now have five, six helices and you're gonna need to predict how these are stitched together: well, you can't move them that much in a protein, and for each pair of helices there are pretty much only two alternative orientations in which you can pack them. So you can use this, and Bill DeGrado did, to design membrane proteins. Helices: I mentioned this very briefly before the break, remember these helical wheels? I showed you a completely different slide before; in that case it was colored, sorry, it's not colored in this case. Take the residues here and put them in an order so that roughly every three to four residues you have something hydrophobic, and in between you have hydrophilic things. If you then look at these helical wheels, you can have hydrophilic things here and hydrophobic things there. So hydrophobic and hydrophilic. If you take two of these helices, they will likely turn their hydrophobic parts toward each other. And you can even have more than two: say you have four of them, and roughly a quarter of each helix is hydrophobic.
Then you can end up with a small bundle of four helices that are hydrophobic on the inside. And when you do this, you can define what's called a hydrophobic moment. Just as a dipole has to do with the difference in charge, a negative charge on one side of a molecule and a positive charge on the other, which we tend to draw as a small arrow, here too we use the same concept. We usually don't draw the arrows, but you can imagine an arrow pointing from the hydrophobic to the hydrophilic residues. So this helix would have an arrow pointing in that direction, and that helix would have an arrow pointing in that direction, and then they should both turn the tails of their arrows toward each other. So you can calculate some sort of hydrophobicity index for molecules. Why on earth would you do this? Well, it turns out that there are quite a few helices that are so-called amphipathic, or amphiphilic: hydrophobic on one side and hydrophilic on the other. That means they can form some sort of three-, four-, or five-helix bundle that has a hydrophobic binding pocket on the inside while the outside is hydrophilic. And this particular molecule looks very much like a protoporphyrin, a heme group. So what if you make a small four-helix bundle like that, and it binds a group like that? What do you think that protein could do? Carry oxygen. What could you imagine that being useful for? Artificial blood. So why would you like artificial blood? The general problem is that blood doesn't keep very well, and it's hard to get people to donate blood. There is another problem with donated blood: disease and infections, right? We are fairly good at screening blood, but historically there have been mistakes.
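To make the hydrophobic moment concrete: the standard recipe sums per-residue hydrophobicities as vectors rotated 100 degrees per residue (one turn of an ideal helix is 3.6 residues). A rough sketch, where the scale values are a Kyte-Doolittle-style subset chosen for illustration:

```python
import math

# Illustrative subset of a Kyte-Doolittle-style hydrophobicity scale
# (positive = hydrophobic, negative = hydrophilic).
HYDROPHOBICITY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "A": 1.8, "G": -0.4,
    "S": -0.8, "Q": -3.5, "E": -3.5, "K": -3.9, "R": -4.5,
}

def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Magnitude of the summed per-residue hydrophobicity vectors, each
    rotated by 100 degrees per position along an ideal alpha helix."""
    x = y = 0.0
    for i, aa in enumerate(sequence):
        angle = math.radians(degrees_per_residue * i)
        h = HYDROPHOBICITY[aa]
        x += h * math.cos(angle)
        y += h * math.sin(angle)
    return math.hypot(x, y)

# A sequence with hydrophobic residues roughly every 3-4 positions puts
# them on one helix face, giving a large moment; the same residues in two
# consecutive blocks spread around the wheel and largely cancel.
print(hydrophobic_moment("LKKLLKLLKKLLKL"))  # large
print(hydrophobic_moment("LLLLLLLKKKKKKK"))  # small
```

The arrows in the lecture correspond to the (x, y) vector before taking the magnitude; two amphipathic helices pack with those vectors pointing toward each other's hydrophobic faces.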
A lot of people got HIV infections that way, and there are even some religious groups that, for whatever reason, don't accept blood from another human; I think some of them do actually accept artificial blood. Which molecule do you think is gonna be more efficient, this one or hemoglobin? Yeah, hemoglobin is far better, because this molecule is not gonna have any of the fancy features of hemoglobin, with different affinity depending on the oxygen pressure. But if the difference is between dying and not dying, I kind of prefer the latter. So, artificial blood: you can have artificial molecules that bind iron and carry oxygen. You can also use this type of molecule; remember that I said they're hydrophobic on the inside, right? What if you add a fat molecule, not to this particular protein, but to a similar protein? If you add fat on the inside, then you get a way to dissolve fat in water, which you normally can't; that's the whole definition of fat, that it doesn't dissolve in water. And then you could use this as an emulsifier. Say, for instance, imagine having a margarine that only contains 50% fat. The rest of it would have to be, well, water-soluble things. And without an emulsifier, getting a product that's 50% water and 50% fat is not entirely easy. Actually, it's super easy: you just add a detergent or something, right? But that's not ideal if we're gonna eat it. I'm not sure about you, but I don't like to eat soap. So you need an emulsifier you can eat. Proteins make great emulsifiers. There's only one small problem: can you imagine what the drawback is? Sure. Yeah. Why not just express the hemoglobin gene? Well, if you were expressing it in mammalian cells: producing anything in mammalian cells is a pain, because mammalian cells are so complicated, and the second you start trying to overexpress things, they die and everything.
It's far more convenient to express things in bacteria. Bacteria are super liberal about everything, and you can express a hundredfold more protein in bacteria. The other thing is that hemoglobin would be complicated, with the introns and exons; just getting the folding right for hemoglobin would be a royal pain. So if you ever want to produce anything in kilogram amounts, you want a simple protein. So if you think about these emulsifiers, do you have any idea what the worldwide market for these things is? You're talking about insane amounts of money here, right? So there's a lot of protein design that actually goes into the food industry. So what is the problem with low-fat margarine? What happens if you try to fry in it? This is why you can't fry in most of those products: if you heat it up, the protein unfolds. What happens when the protein unfolds? It releases the fat, and that's when you get this problem that the fat clumps together and there is something watery left over. There is a new generation of these emulsifiers, and they're getting better all the time. If you look at many of these products now, they say you can cook with them, and that's because they've been able to improve the emulsifiers by making them more thermostable. If they're thermostable enough, you can heat them to at least 200 degrees centigrade or something, and then you can cook with them. I have no idea what company came up with it, but I can imagine they made a lot of money. This is something else: an immunoglobulin. You're looking at roughly $15 billion. By far one of the hottest areas. Let's see if I can draw it. Immunoglobulins are what you have in antibodies. This is probably the hottest contemporary trend in drug design today, because antibodies can be made super specific: you can get them to target only one specific epitope or something, and then you can kill those cells. There's a ton of research going on here, particularly targeting cancer cells.
But it's not just cancer cells. Ideally, you would even like to be able to isolate just these small domains, because the whole protein is somewhat hard to work with. You can't design the entire protein, but you can try to design just the variable regions here. They're almost entirely beta sheets, and this particular protein is what you have in Humira. The real name is adalimumab; Humira is the trademark. Humira is, or at least was, I think it still is, but in 2016 this was the most sold pharmaceutical drug in the entire world. And if you look at these drugs, I should update this: the 2014 sales were 13 billion dollars, and I think my 15 billion figure was from 2016. Do you know what these numbers are? Yes, they're years, but which year? From 2014, part of this was projections into the future. No, the drugs were already released. So what do you think will happen with the sales of Humira in a few years? Patent expiration. They're so expensive to develop, but if you can sell a product for $15 billion per year, it's worth a billion or two to develop, right? So this is how the pharmaceutical industry works, and that's why they spend so much money on research projects that mostly fail: when they win, it's like $15 billion times ten, because they sell it for ten years. $150 billion for one protein. Now, of course, if everybody were allowed to copy your protein the second you had designed it, that wouldn't work; that's why we have patent protection. But it takes so long to do all the trials and everything that by the time you get it to market, you frequently only have five or ten years when you can sell it. So what's gonna happen now, and I haven't seen this for Humira specifically, is that there are gonna be a bunch of generics, where people just copy the molecule and get it approved.
It will likely take two or three years to get those approved, and if I know them right, I would assume that there are also tons of patents on the formulation and the dose and everything, but in a few years there are gonna be copies of it, and then the price is gonna drop to a tenth. But if you go into the pharmaceutical industry: virtually all of these high-ranking drugs are biological, what you call biologics today. A traditional drug would be a small hydrophobic molecule; these modern top-selling drugs are proteins, because they're more specific and you can tailor-make them. Yep, I should know, I think I even have some notes here: it's rheumatoid arthritis, psoriatic arthritis, ankylosing spondylitis, Crohn's disease, ulcerative colitis; so a bunch of arthritis-related and inflammatory diseases, which is not entirely a coincidence. If you were designing drugs, what type of disease do you target if you wanna make 150 billion? Cardiovascular is good; do we have any other suggestions? Autoimmune. Autoimmune, cancer. Diabetes is good. So what's common to all these diseases? Middle-aged, overweight people in the western countries. Lifestyle diseases are great because you can't cure them: if it's something you take to reduce your cholesterol, you will keep taking the drug for the next 30 years. The worst, and again, this might sound harsh: the worst possible drug you can imagine is, of course, one that cures the patient. Because if you just cure the patient, they take the drug for a week and then they stop taking it, right? And this might sound cynical, but I hate to break it to you: these companies are not in this business because they like charity.
Everybody's frequently talking about how horrible the pharmaceutical companies are and everything, but I'm not sure about you: compare this in the grand scheme of things. There are some companies making pharmaceuticals and some making weapons; in the grand scheme of things, I wouldn't necessarily say that the people making this type of drug are the nasty ones, right? But of course, they're in the business to make money. So it's their stockholders, who force them to make money, who are nasty. And who are the stockholders? Basically you, through your retirement benefits and accounts. Because when you invest in a mutual fund, a stock fund, you pick the one with the best return, and a pharmaceutical investment sounds good. And those fund managers are gonna focus on the companies making lots of money. So the problem is that it's very easy to think that this is some nasty third party, but it's really you who are nasty: you demand good retirement benefits, and for that reason the companies try to charge a lot. Ultimately, we could of course fund all this research with taxpayer money, but we choose not to. That's why there's not a lot of research going into, say, malaria: the average person who gets malaria is not rich, and, again, you don't get that much malaria in the US or Sweden, and therefore we don't invest our taxpayer money. Yep, I'm not an economist, I can't say. There are certainly lots of things that are wrong here, but my point is that this is a complicated business. It's very complex. And again, every time there's a mistake with a drug, all of us scream that there have to be more tests and everything, and we sue the companies. And then we're so upset that it takes 15 years for them to get the drug to market. Well, it was our own laws, the rules we asked the state for, that make it difficult to get drugs out.
And one thing that has happened is, of course, that there's a lot more research required for drugs, and a whole lot of it is computational. Actually, I would say roughly half of the research that goes into a drug is computational nowadays, because you need to screen so many things. We'll come back to drug design later in the course, but it's complicated: not just technically, but legally, financially, and morally. I have a few more slides that I rushed; can I take five minutes? The final thing is that there are a few proteins where we mix alpha helices and beta sheets. The alpha helical parts look like alpha helices, of course, and the beta sheet parts look like beta sheets. There are really two common patterns here. One is that they are mixed, alternating alpha, beta, alpha, beta along the chain. The other is that you have one alpha helix domain and a separate beta sheet domain. That last one is pretty boring, because one part is alpha helix and one is beta sheet, and you've already seen both. So the only interesting one is the mixed one. And that's where we have, for instance, the TIM barrels, right? You have a strand, and then you need an alpha helix to go back the other way so that the next strand can be parallel to the first one. Alcohol dehydrogenase is another such fold; do you see that you have parallel beta sheets in there? This is the molecule with which you break down alcohol, actually, and there's quite a lot of genetic variation: in Asia, for instance, it's common to have a slight deficiency in alcohol dehydrogenase. Or I would actually argue that we are the deviant ones, because most humans don't need to break down that much alcohol; but in the northern parts of the world, I guess it helped us nutrition-wise or something.
But if you have a deficiency in alcohol dehydrogenase, you're gonna get more drunk if you drink alcohol. There are a couple of important folds here that are useful to know, but in the interest of time, I think I will save those for tomorrow. I have four more slides; I'll cover those tomorrow and then go on to talk about how stable folds are, too. So for the study questions for tomorrow: skip the Rossmann fold, which I haven't told you about yet, and the number of folds, which we haven't talked about yet. But all the other ones you should be able to go through.