Good. I hope the others didn't get scared away by the math, because today there's no more hard math. Now we're going to make much more use of our knowledge and look at proteins. But before we do that, I figured let's spend a couple of minutes talking about things from Wednesday. And given that there are so few of you this morning, we don't necessarily have to go through exactly these questions, but we can start from the top here. And if you have other questions, ask me. The point is, I hate to break it to you, but I know the answers to these questions already. So this is about what you want to know. But let's get started, anyway. So I've been pitching this whole difference between thermodynamics and kinetics and arguing that we need to care about both of them. What's the difference? Is there any difference? Right. So that's the fundamental difference. And that's very much related to the next question, which takes us one step forward: what features of the free energy landscapes are related to thermodynamics and stability? So energy minima, right? Or in particular, differences between energy minima. Absolute values don't matter at all. The absolute values, that's pretty much the difference between centigrade and Kelvin, right? In some equations it helps to have Kelvin, when things are proportional to the temperature. But with free energies, you're virtually always only interested in relative free energies. It's even going to turn out, once it comes to computational approaches, that it's very difficult to calculate an absolute free energy. Because calculating an absolute free energy of, say, an ion in water... oh, sorry, did I lock somebody out? No? Well, never mind. So calculating an absolute free energy in water, that would correspond to literally growing a charge in water, right? You would never do that. What would happen in practice is that you would take that charge from vacuum and move it into water.
And the second you include the word "move" or "change" or something, it's always a relative free energy. And in one way, we're happy, because then we only have to care about the Boltzmann distribution. You only have to care about the low-lying levels. But that is not sufficient. So that's related to the third question. So what's different with kinetics? Oh, sorry. You'll get it. Continue here in the meantime. So why are the energy barriers important for kinetics? Right. And in particular, I think we're going to come back to that in a few questions. But what do we call these states that really determine how fast things happen? Transition states, right? Transition states are kind of strange animals, because a transition state is a state you would never, ever observe experimentally. It's the worst possible state. You would never, ever catch a molecule at the transition state. It's the most unstable state you can imagine. But the better that worst state is, the easier it's going to be for the reaction to happen. So when it comes to understanding kinetics and understanding why things happen, we are super interested in understanding the transition states. And later on, both in this course and others, it's actually going to turn out that we frequently try to use different experimental techniques to indirectly get information about the transition state. But you will never be able to determine the structure of a transition state unless you stabilize it, because the transition state itself is as unstable as it gets. So if we start using these things to talk about the free energies of structures: I went into some detail, both for helices and sheets, to try to separate initiation from elongation, or extension, free energies. So let's start with helices. Roughly how large are they, these free energies, initiation and elongation? Yes, and the initiation? But the initiation can't be negative.
If the initiation is negative, then you wouldn't have a barrier, right? So the elongation, I would say, and again we're talking ballpark here, right? A single kcal per mole. Initiation, roughly how large is that? And I already hinted at what the sign has to be. Four, or again, two to four or something like that. Factors of two are not important here. But the point is it's not 0.1 and it's not 10 to 20. It's single kcals. Oh, sorry. And how did we determine those numbers? Right, so in this particular case, it was CD spectroscopy. But if you think about this more generally, how were we able to do it? Again, if we go back a week to when we started this course, if I had asked you to try to determine a model for the free energy barrier to start to create an alpha helix, I bet you wouldn't have been able to do it, right? It's not something easy to get at. So how did we get to that? You can even argue, forget about the alpha helix. I'm more thinking of the general approach we used. So what did we start doing? We started by creating something, right? What did we create? Exactly, so you created a super simple model. And in this case, it was just: what is the main thing that stabilizes it? What is the main thing that destabilizes it? And in that case, free energies, or even before we went into free energy, energy versus entropy. And any time you create a model, you're gonna feel very sloppy. Because the whole point of models is that you want to be sloppy, in the sense that you want the model to be super simple, which of course is not sloppy at all. The point of making a simple model is that you wanna focus on the real essence of the problem. What is the really important part of the problem? And don't care about the details.
And then you work with your model and you see: based on this model, could I predict some properties that might be non-trivial in the model, but that are measurable in a very simple experiment? So what we did in this particular case was that our model let us make predictions about how many residues should be helical as a function of temperature, right? So then we go from the model to the experiment and make a prediction about the experiment. But in this case, the point is not that we want to replace the experiment with a prediction, because the experiment is trivial to make. On the other hand, the free energies I had in my model were not trivial to get at. And now I can of course go the other way, right? Because I can measure that in an experiment, I can now solve my model to get at the free energy that I was interested in. And virtually everything you do in the lab, and again, you can forget about physical chemistry or anything here, is related to this. If you're doing, say, sequence determination with high-throughput sequencing, you are making a model that typically has to do with fluorescence. So based on not just individual nucleotides, but a sequence of three nucleotides, what fluorescence pattern do you expect to see? And then we solve this for all triplets of nucleotides. And then you basically end up with a large equation. And then of course you have a machine that determines these patterns at a super high speed. But because you now have a model for what fluorescence you would expect to see, we can then go back and see what sequences we had in the sample. So this is universal for almost anything you want to understand on the molecular level. Create a super simple model. The simpler the models are, the better.
One of the amazing brains in physics who started this whole tradition of simplified models was a Russian called Lev Landau. You might have heard of him at some point if you studied physics. And Landau was really amazing in this respect. Think super-simplified things, just the bare essence of the problem. Frequently things where you just have spins relative to each other, and that can explain phase transitions. But then we start going into physics, so I won't expand on that. Sorry. Yep. So what is it that they measure? In particular, why do you want an initiation term? You could argue that for every single residue, right, there is some part stabilizing the helix and there is some part destabilizing it. So what is it that we try to get at by creating a separate term? I call it the initiation here, but what does the initiation part really measure? The cost for what? The cost for starting to form a helix, right? And already with hand waving, we could say that if there wasn't any cost initially, if we started by gaining things, that's great, you would just go downhill. Everything would be helical. There is gonna be some sort of initial cost that we have to pay. Now, of course, there aren't any fundamental differences in the stabilization energy between the first, second, third, fourth, fifth, and sixth residue in a helix, right? But because you don't have as many stabilizing hydrogen bonds when you start, there are some things that will be different initially. So what we really wanted to get at here is: can I explain where this initial bump comes from? Why does it cost energy? Because that's the barrier I need to get over. Once you've actually started to form the helix, I would argue that elongation is not quite as important. Elongation will tell you a little bit about how fast the helix will grow, but the important part here is gonna be how much it costs to start creating the helix.
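To make the initiation-versus-elongation picture concrete, here is a minimal sketch of a two-state helix-coil chain in the style of the Zimm-Bragg model, which may differ in detail from the model derived in the lecture. Every helical residue gets a weight s = exp(-dG_elong/RT), and every new helical stretch pays an extra weight sigma = exp(-dG_init/RT). The function names and the numerical values used below are illustrative assumptions, not measured parameters.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def log_partition(n_residues, s, sigma):
    """Log of the helix-coil partition function for a chain of n_residues.

    Each helical residue carries weight s; the first residue of every
    helical stretch additionally carries the initiation weight sigma.
    """
    w_coil, w_helix = 1.0, 0.0  # weights of chains ending in coil / helix
    log_z = 0.0
    for _ in range(n_residues):
        new_coil = w_coil + w_helix                 # add a coil residue
        new_helix = w_coil * sigma * s + w_helix * s  # start or extend a helix
        total = new_coil + new_helix
        # Rescale at every step so long chains do not overflow.
        w_coil, w_helix = new_coil / total, new_helix / total
        log_z += math.log(total)
    return log_z


def helical_fraction(n_residues, dg_elong, dg_init, temperature=300.0, h=1e-5):
    """Average fraction of helical residues, theta = (1/N) dlnZ/dln(s).

    dg_elong and dg_init are free energies in kcal/mol (hypothetical inputs).
    The derivative is taken numerically with a central difference.
    """
    rt = R_KCAL * temperature
    s = math.exp(-dg_elong / rt)
    sigma = math.exp(-dg_init / rt)
    s_hi, s_lo = s * (1.0 + h), s * (1.0 - h)
    dlnz = log_partition(n_residues, s_hi, sigma) - log_partition(n_residues, s_lo, sigma)
    return dlnz / (math.log(s_hi) - math.log(s_lo)) / n_residues
```

With an initiation penalty of a few kcal/mol, a 50-residue chain in this sketch is almost entirely helical when elongation is slightly favorable and almost entirely coil when elongation is slightly unfavorable, which is the cooperativity discussed above. Without any initiation cost, each residue behaves independently and the fraction reduces to s/(1+s).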
So what would happen if the elongation was just a cost? Yep, no? If the elongation was a cost, if you had to pay to elongate the helix. Because free energy is almost symmetric, right? Any reaction can go in the other direction. And if it was a cost to elongate the helix, I would just shrink my helix and I would gain free energy. So the shorter it is, the better it would be, and you would unfold the helix. So I went into some detail on what type of transition alpha helix folding is. You could argue that it's not a phase transition. It's a highly cooperative transition and it's also a very local transition, but it's not formally a phase transition. An important difference is that helix and coil can coexist. So you can have a very long sequence and part of it will be helix, and then it will grow a bit, turning coil into helix, and the helix will increase, and then it will shrink a bit. So you can have a mix of helix and coil. So, that's actually a very good question. A gradual transition just means that you can have coexistence of things in between, right? Most transitions are gradual. If you gradually heat water, that's a gradual transition. A gradual transition does not have to be cooperative, but the helix transition is relatively cooperative, in that if you start to form helical residues, it will be favorable to form more helical residues. And that is because you're gonna pay a little bit initially, but then it's gonna be cheaper to add more residues. So cooperative transitions can be gradual, and an extremely cooperative transition would be a phase transition, but cooperative just means that if one part of the molecule has already made the move, it's easier for the second part to make the move. But rather than emphasizing the cooperative part, what I would stress is that you can have coexistence between helix and coil. That actually makes the helix transition relatively boring.
The helix transition is not really that related to protein folding, surprisingly enough. So you're gonna need to have some sort of stable structure that will be partly helix, partly coil. If you just randomly mix residues into a chain and put it in water, you will frequently get a little bit of helix. In general, you will not get a protein. You had a question. So that depends on the temperature and it depends on your residues, right? And this actually comes down to the fact that you wrote secondary structure predictors in the bioinformatics course. One of the problems there when it comes to secondary structure prediction is that it's very hard to get the prediction right at the interface where the helix stops and you start having a turn. And that's frequently because it's not that well-defined experimentally. It will depend a little bit on the rest of the structure. You might have some residues that on average spend 50% of their time in the helical state. So the helices can easily start to fray a bit at the ends if you pick any random sequence. In proteins it's a bit different, because in proteins the rest of the structure can stabilize them. That will generally be the case, yes. Did you have a question too? I mean helices and sheets as helix and coil? Any scale. We'll come to beta sheets in 10 seconds. So let me just finish off the alpha helices first. So the point was, with the alpha helix, we haven't really yet come to the point where we start talking about large proteins. When it comes to large proteins, you have the rest of the protein stabilizing it too. But with an alpha helix you can create a sequence of any arbitrary length and just throw it in water, and you can measure that you're gonna have a certain fraction of alpha helix and a certain fraction of coil. So let's get to the beta sheets then. So how do beta sheets, and the beta sheet kinetics in particular, differ from alpha helices? We spoke about the structure and stability last week already.
You can take that. A larger activation barrier, a larger energy barrier. Yeah. So this is a proper phase transition in the sense that once you start forming a beta sheet, there's gonna be a very strong driving force to keep forming beta sheet. Now my entire derivation here was based on the fact that we have some simple average residue, right? I didn't try to account for the fact that we have 20 different residues. Some of them wanna be in beta sheets and some of them would prefer to be in helix or coil. In general for a protein, it would be more complicated, because for a protein a beta sheet is finite, in the sense that at some point you will get to the end of the beta sheet. You will have residues that would prefer a different state, in particular alpha helix. You would also have different parts of the protein stabilizing or destabilizing it. And we also have the statistical effect that I touched upon, was it earlier this week, I think, that to form a beta sheet, you need residues that prefer to be in a beta sheet, right? And if you keep adding residues, the probability that you never hit a residue that would rather not be in a beta sheet eventually drops. So there will be some sort of finite extent to the sheet. But when it comes to your question about the coexistence, think of the coexistence in terms of the part of the structure that is stable as a beta sheet. If you start to unfold this, for instance by raising the temperature, you will very quickly see that the beta sheet completely unfolds. You won't really have the beta sheet fraying and becoming slightly less stable. It will of course take a finite time, because the beta sheet will not completely unfold in one attosecond. But once you start unfolding, it's just a matter of time before the entire beta sheet unfolds. For the alpha helix, on the other hand, if I start increasing the temperature or something just a little bit, I might unfold 10% of the helix.
But then it's going to be stable there. I can keep it there and I will still have 90% of the helix around on average. Right. So the whole point is that if I just stop my process and I don't heat it more, I can keep it there. The intermediate state is going to be happy and stable. And that's the gradual transition part, right? That you can stop halfway. With the beta sheet, again, we have never said that this has to be an instantaneous reaction. It takes time to, say, melt ice into water or freeze water into ice. But even so, if you just stop halfway and wait, you're not going to stop the process. The process will still happen. And that's really the phase transition property: it's going to be either-or, you can't stop it halfway. Yep. So for a beta sheet, that would be true. For an alpha helix, it's not the case. For an alpha helix, again, if you have lots of residues that are reasonably stable as alpha helix, and I then start to raise the temperature, well, let's say that I heat it so much that on average 50% of them will stay in helix. Exactly which 50% of the residues are helical, that would even vary over time. So you would have this helix-coil equilibrium. Some of them would go from coil to helix. Others would go from helix to coil. And as long as I keep these conditions, you would, on average, have 50% of them in helix. So that starts to touch upon the fact that helices are not quite as stable as beta sheets, because you have a gigantic number of non-local hydrogen bonds in a beta sheet. On Wednesday, we said that you would get a very high initiation barrier for the beta sheet, right? But what does that barrier also do if you go in the other direction? You get a very high stability barrier too, right? So it's equally hard to unfold the beta sheet. But we're gonna talk much more about protein folding and kinetics later on, when we have entire proteins.
At this stage, we're just looking at typical secondary structures. So how long does beta sheet formation typically take? Yeah, so all bets are off. This could take from seconds to years. And when it's a matter of years, we had all these relations to amyloid diseases and everything, plaques that can form. So I also spoke briefly about how the end-to-end distance varies with the number of residues in a coil, and made a short derivation of that. So roughly how did the end-to-end distance vary, and why is that important? Yes, but roughly, if I increase the number of residues in a coil, how does the end-to-end distance increase? Yes, as the square root of the number of residues, to first approximation. It's slightly larger than the square root in practice. The reason this is important is that size exclusion chromatography and lots of other things, all that stuff basically builds on this. And it's the same thing when you're trying to sort DNA. But you can also use it to start understanding why we don't want proteins that are infinitely large. Because at some point, if you wanna build larger structures and you try to build them from one coil, you're gonna end up with an insanely complicated puzzle to build the entire structure. And it doesn't really get that much larger. We'll talk about that too later on. And somewhere there, I told you that we would stop the mathematical part and start to think more about real proteins. We're gonna come back to that, and in a future lab, I think it's next week or so, you're actually gonna simulate real proteins. But the connection I made there went back to the parts I started with in the course, when we talked about the interactions we have in real biomolecules. So if we now wanna model a real biomolecule, what do we need, and why in particular? Think in terms of free energy. So why do you need the energy? And what's in the energy? So the point is, we basically wanna go backward here too, right?
Free energies are something we can measure in the lab. And now we're again making this connection. We're creating a simple model of how a protein or a structure behaves, so that I can then make predictions about the free energies we would expect to see in the lab. Those should hopefully be possible to measure, and then I can go back and at least confirm my model. Hopefully I can use it to understand some parts of the model, and long-term we wanna be able to use this at very high throughput, say, instead of measuring the free energy of interaction with drugs, I can calculate it. And as you mentioned, we certainly need the energy, and particularly the enthalpy. And what energy terms do we need? We basically need everything, right? You've had this in a few lectures; I've spoken about them before. But that's why we went through all these things. We need a model for the bonds, we need a model for the angles. And this is the reason why we ended up with these very simplified models. Remember what I said 10 minutes ago: the simpler a model is, the better, as long as it captures the essence of the problem. And our reason for having these very simplified models is that you don't wanna go into quantum chemistry unless you absolutely have to. Keep a simple model and focus on the essence of the problem. In this case, the essence of the problem is that I wanna understand proteins. You don't get any brownie points here for using the most complicated mathematics or using time-dependent relativistic Schrödinger equations. That's fine if you wanna understand physics, but we wanna understand chemistry. So the proof here is in the eating of the pudding. If we can make good predictions about proteins, our model is good enough, for those cases at least. And then we had electrostatics, we had Lennard-Jones interactions and everything. As you've probably seen in the last two lectures, I've had a whole lot of movies about water, and we need to include all the water.
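The energy terms listed above (bonds, angles, electrostatics, Lennard-Jones) can be sketched as a handful of very simple functions. This is a minimal illustration using the functional forms commonly found in classical force fields, not the parameters of any specific one; the function names and the idea of passing parameters directly are assumptions made for the example. Units are kcal/mol, angstrom, and elementary charges.

```python
import math

# Coulomb conversion factor in kcal*angstrom/(mol*e^2), for charges in vacuum.
COULOMB_K = 332.06


def bond_energy(r, r0, k):
    """Harmonic bond stretching around the equilibrium length r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Harmonic angle bending around the equilibrium angle theta0 (radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def coulomb_energy(q1, q2, r):
    """Electrostatic interaction between two point charges at distance r."""
    return COULOMB_K * q1 * q2 / r


def lennard_jones_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones: steep repulsion at short range, weak attraction beyond."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

For two unit positive charges 1 angstrom apart, `coulomb_energy(1, 1, 1.0)` is a few hundred kcal/mol, and the energy grows as 1/r when they are pushed closer together. A full model would sum terms like these over all bonds, angles, and atom pairs, water included.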
Why do you need to include all the water? Right, so that's one way of describing it. It's quite correct. But another way of thinking about it: if you think about the free energy, there are two parts, right? There is enthalpy and entropy. And the problem with all those beautiful quantum chemistry approaches is that they ignore the entropy. And at this point in the course, if somebody tells you, yeah, let's create a model and ignore everything about entropy, you should hopefully say that there is no way that model will ever be able to predict anything about proteins. So the problem here is that to be able to get the real free energy, I'm gonna need to include all the entropy effects. And then you're gonna need to include all the possible ways in which matter can organize itself. So we can't just ignore the water, even if in many cases you're gonna end up with 95% of what you're calculating on being just water, which feels really stupid. But if you exclude that, you're not gonna have a real system. We'll come back to that, sorry, two questions from now. And what we typically do in simulations and all modeling is this sort of energy minimization. That's usually where you end up starting. That sounds like a good thing, right? Minimizing an energy sounds like a good thing, particularly if we somehow wanna get to free energies. But what's the relationship between the free energies and this energy minimization? Well, can you approximate the free energy with the energy? In general, not. Unless you are super close to a folded state or some sort of stable state, right? So if you've already packed your entire protein, then you could argue that the degrees of freedom are not really gonna change much, or the flexibility of my protein is not gonna change much.
At such a point, you might be able to just energy minimize it and, again, hopefully assume that the entropy doesn't change much. That is occasionally the case. If you have a large protein that you've determined with X-ray crystallography, what do you get when you determine a structure experimentally? Sorry? Nope. It's a great answer. That's the answer I hoped for. But you don't get coordinates when you solve a structure experimentally. You don't even get a density of the electrons. Actually, you would with cryo-EM. So in X-ray crystallography, you would get structure factors. And those are dots on a... it used to be a film, today it's a CCD device. Those are images in Fourier space. In cryo-EM, you would have super noisy images in real space. And everything after that is a model. Hopefully it's a pretty good model. People have spent decades developing amazing models. But there is more modeling involved in an experiment than you think. So in the case of X-ray crystallography, you then need to try to create a model that connects the coordinates with the structure factors and everything, and derive this back. And at the end of the day, hopefully you can create a reasonable model, but it's not gonna be perfect. And you are certainly not gonna get the individual contacts between atoms. So even when you determine an X-ray structure, you typically energy minimize it at the end. Because on the fine scale, to get a good model, the structure factors are not enough. So you throw it in a computer and tell the computer to minimize it. So what does that get you? And what is it that you don't get from it? You remove the bad parts, right? You remove overlaps, collisions, things that would later cause, say, a simulation to crash. And what is it that you don't get? So you get a local minimum in what? In enthalpy, or in energy. In principle, it doesn't tell you anything about free energy.
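A minimal sketch of what "you get a local minimum" means, assuming a made-up one-dimensional double-well energy function and plain steepest descent. Real minimizers work on thousands of coordinates and often use smarter algorithms, so the potential, the step size, and the function names here are purely illustrative.

```python
def energy(x):
    """Hypothetical tilted double-well potential with two minima."""
    return x ** 4 - 2.0 * x ** 2 + 0.3 * x


def gradient(x):
    """Analytical derivative of the energy above."""
    return 4.0 * x ** 3 - 4.0 * x + 0.3


def minimize(x_start, step=0.01, n_steps=5000):
    """Steepest descent: repeatedly take a small step downhill."""
    x = x_start
    for _ in range(n_steps):
        x -= step * gradient(x)
    return x


x_right = minimize(1.5)   # ends in the shallower well near x = 0.96
x_left = minimize(-1.5)   # ends in the deeper well near x = -1.04
```

Starting at `x = 1.5`, the minimizer slides into the nearest well and stays there, even though the well on the other side of the barrier is lower in energy. It removes the worst part of the starting point but never crosses a barrier, and it says nothing about how much time the system would spend in each well.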
And if you wanna do a small experiment later on in the course, once you start simulating proteins, you can just take a peptide sequence stretched out, put it in water, and energy minimize it. It's hardly gonna move. It's certainly not gonna fold just because you energy minimize it. And that brings me to the question: what does a simulation do? Most courses and, I think, tutorials would just start by saying that a simulation is somehow calculating how a protein moves. Does it? Sure. Okay, energy minimize it. You just get the local minimum. Yes, you know what? I can explain this in a much nicer way if we answer this question first, and then I'll come back to minimization. But it's a good question. So what do simulations do? Here's the keyword: you sample. Instinctively, the first time anybody sees a simulation, in particular when you see movies on a screen, right, you think that you're somehow predicting the motion of an atom. You can't predict the motion of an atom. Either because of Heisenberg's uncertainty principle, or you could argue that we can't determine it that accurately. And there are these chaotic properties, so that you can't predict exactly how anything will move over long time scales. That's not what we use a simulation for and that's not what we're interested in. And the reason I've been tormenting you with statistical mechanics and math for a week is that simulations provide a computational microscope, so that we can look into these Boltzmann distributions. You can actually make predictions in a computer about what states the molecule should be in, what states the molecule should not be in, how fast things will happen. Will things bind? Well, binding is not a yes-or-no thing on an atomic level, right? You're gonna talk about what fraction of the molecules will bind, what the free energy is. And to get that, we're gonna need to sample the Boltzmann distribution.
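As a minimal sketch of what "sampling the Boltzmann distribution" means, here is a Metropolis Monte Carlo walk over a few discrete states with made-up energies. The Metropolis acceptance rule is one standard way to sample; it is used here as an assumption for illustration and is not necessarily the exact rule from the lab. The state energies, function names, and the value of RT are hypothetical.

```python
import math
import random

RT = 0.596  # kcal/mol, roughly room temperature


def boltzmann_populations(energies):
    """Exact populations p_i = exp(-E_i/RT) / Z for comparison."""
    weights = [math.exp(-e / RT) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


def metropolis_populations(energies, n_steps, seed=0):
    """Estimate state populations by Metropolis sampling.

    A trial state is drawn uniformly; downhill moves are always accepted,
    uphill moves with probability exp(-dE/RT).
    """
    rng = random.Random(seed)
    state = 0
    counts = [0] * len(energies)
    for _ in range(n_steps):
        trial = rng.randrange(len(energies))
        delta = energies[trial] - energies[state]
        if delta <= 0.0 or rng.random() < math.exp(-delta / RT):
            state = trial
        counts[state] += 1  # count the current state after every step
    return [c / n_steps for c in counts]


energies = [0.0, 0.5, 1.5]  # hypothetical state energies in kcal/mol
sampled = metropolis_populations(energies, 200_000)
exact = boltzmann_populations(energies)
# Relative free energy of state 1 versus state 0 from the sampled populations.
dg_01 = -RT * math.log(sampled[1] / sampled[0])
```

The point of the sketch is the last line: once the states are visited in proportion to their Boltzmann weights, a relative free energy falls out of a ratio of populations, and no individual trajectory needs to be "right".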
And that's what you did in the lab on Wednesday too, when we suddenly started introducing entropy, right? In this case, it was a super simple rule you had to use to sample your distribution. There are lots of ways you can do this. But what we always use simulations for is to sample things statistically. And if you just sample things statistically, in principle, the initial state should not matter, right? If you just simulate long enough, we should sample every single possible conformation of the molecule. You will eventually sample them according to the Boltzmann distribution, and then you should be happy. But the problem is that long enough can be a very long time. It could even be infinity. So if we now start with a random free energy landscape here, of course, we would expect the molecules to spend a lot of time in the low-lying regions, right? It should spend lots of time there. And then it should spend a little less time there, a little less time there, and the molecules should virtually never be there. There are two things that will happen here. Say I just happen to give you an initial state, that's that black dot. Now you're gonna start out with an extremely skewed distribution, because you could argue that statistically you should spend one out of 100 billion steps here. The problem is that your first step is there. You're never gonna get an accurate simulation here unless you simulate hundreds of billions of steps, because you started out with something that was extremely unrepresentative. And that's of course because my initial state was a bit screwed up. So one of the reasons we're doing energy minimization is to avoid completely disturbing your statistical distribution before you even start the simulation. So that's one of the reasons. The other, yes, let's just assume that I'm an experimentalist and I didn't even energy minimize my structure, right? That can happen.
So in general, if you take a large protein, it will contain at least 10,000 atoms. If I give you a first approximation of 10,000 atoms, and they're gonna be packed, right, think of the likelihood that just two of them happen to be slightly too close: any pair out of 10,000 multiplied by 10,000. So the probability that one pair is not perfectly packed is suddenly quite high, right? So I wanna make sure that you at least start with something that's reasonable. It doesn't have to be the lowest state, but you don't wanna start with something that's extremely unfavorable. The other part is that when it comes to sampling these different states, there are a bunch of different models, laws, equations you can use. What you used in this lab was that you basically accepted or rejected moves according to the Boltzmann distribution. You call that Monte Carlo. For large proteins that doesn't really work that well, because if you randomly move things, on average, you're gonna bump into something else. So that would be very inefficient. So in practice, it's gonna turn out that a very efficient way to sample things is to use Newton's equations of motion. Calculate the forces. If you have a potential, the negative derivative of that is your force. If we know the force, we know how the velocity changes. If you know how the velocity changes, we know how the position changes. But let's now assume that, again, you were maybe not at the peak there, but you were right here in an extremely steep part of the energy landscape. And that might be because you took two oxygens or, worse, let's say two ions, two positively charged ions. Remember that we talked about hundreds of kcals per mole when they're close to each other. And now we're taking two positive ions and I put them, let's say, not one but a hundredth of an angstrom away from each other. So they're gonna be extremely close.
You're gonna have an insanely strong electrostatic overlap. That will mean that you're gonna have a force between these two that is almost like a nuclear device. I can calculate that force, at least if the computer can handle the number. This will give you an acceleration that's so high that you can't even dream of it. That acceleration in turn will give you a velocity that's gonna approach the velocity of light, maybe. What's gonna happen to your protein? The entire protein would explode, right? And again, if you simulate this long enough, eventually you would hope that the protein would refold and everything, but then you're gonna be waiting a very long time. In general, a large protein you can't even fold in a simulation. So it's the same thing here. You don't wanna start out with something that's too horrible. You wanna start out with something that's reasonable. But the point is I just wanna cut off the upper half of the landscape. I don't really care about whether I'm in the absolutely lowest state or 1% away from the lowest state. So think of this as maximization avoidance rather than really finding the true minimum. Yes? Well, I'll ask you the question: will it? So what is it that you sample according to? If you don't sample according to the Boltzmann distribution, we will not be able to calculate free energies, right? Otherwise you're just randomly picking states. So we have to sample according to a law, and that law is Boltzmann's distribution. So according to the Boltzmann distribution, will you spend more time in the low-lying energy states or not? Yes, you will. This is actually a much better question than you think, because this is kind of a problem. Because on the one hand, we need that. If that is not true, we're not gonna be able to calculate real free energies.
But on the other hand, if the Boltzmann distribution says I'm gonna spend 95% of my time here, let's say that's the native state, you're gonna spend weeks simulating that state. And of those weeks, you're gonna spend virtually everything but one day in that state. That's a bit stubborn, because after one day we're probably gonna know the state really well, and you still keep the simulation running and you're gonna spend all your time here. And what if I would like to understand how expensive it is to move from there to, say, there? In one way, it's very smart what you said. Can't we gradually force the system to move over this barrier and measure each state somehow, right? You can, but it requires quite a bit of math and then you need to be smart. So in general, while this is required for the Boltzmann distribution, first you would need to deliberately skew the distribution to force it to sample the other parts, but you need to remember how you did that so that at the end of the day you can back-calculate what the free energy was, so you'll get what the barrier was. There, you mean? So that depends on how high the peak is. In general, I would avoid that too. That's a local maximum in energy. Stick to the local minimum. A typical energy minimization might take one minute on a computer. It's super quick. I have to confess, it has happened that I don't run an energy minimization if I think the protein looks really good, but it doesn't cost you anything. And you will never be wrong, because starting in the states where you expect to spend a lot of time can never be wrong. So I can't lose, and it costs me 60 seconds. To me, that's a pretty good deal. We can come back to this later on, but the whole idea and the reason why this is interesting is that suddenly we get a way for computers to start to calculate things that are very non-trivial.
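The idea of skewing the distribution and then back-calculating the free energy can be sketched on a toy landscape. This is only an illustration under stated assumptions: the three states, their free energies, and the bias values are hypothetical numbers, the populations are computed exactly rather than sampled, and kT is taken as roughly 0.59 kcal/mol near room temperature.

```python
import math

KT = 0.592  # kcal/mol, roughly room temperature (assumed value)


def boltzmann_populations(energies, kT=KT):
    """Normalized Boltzmann populations for a list of state energies."""
    lowest = min(energies)  # shift by the minimum for numerical stability
    weights = [math.exp(-(e - lowest) / kT) for e in energies]
    total = sum(weights)
    return [w / total for w in weights]


def unbiased_free_energies(biased_populations, bias, kT=KT):
    """Undo a known bias: F_i = -kT ln(p_biased_i) - bias_i, up to a constant."""
    raw = [-kT * math.log(p) - b for p, b in zip(biased_populations, bias)]
    lowest = min(raw)
    return [f - lowest for f in raw]


# Hypothetical landscape: native state, barrier, alternative state (kcal/mol).
true_free_energy = [0.0, 6.0, 2.0]
# Hypothetical bias that deliberately lowers the barrier and the second state.
bias = [0.0, -5.0, -1.5]

plain = boltzmann_populations(true_free_energy)
biased = boltzmann_populations([f + b for f, b in zip(true_free_energy, bias)])
recovered = unbiased_free_energies(biased, bias)

if __name__ == "__main__":
    print("unbiased populations:", [f"{p:.2e}" for p in plain])
    print("biased populations:  ", [f"{p:.2e}" for p in biased])
    print("recovered free energies:", [round(f, 6) for f in recovered])
```

Without the bias the barrier state is almost never visited. With the bias it is visited often, and because the bias is known it can be subtracted again to get the original free energy differences back.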
For instance, how should you design a drug to open a channel? How can I force the channel to, say, move to an open state? Can I predict whether a protein binds another protein? Well, that binding is just free energy. If I want to design an antibody to bind to an arbitrary receptor that's important for cancer or something, how does that antibody bind? Well, an antibody is a protein, and the antibody will bind to the surface because it has lower free energy. And as long as you can design this molecule or something, you can start to literally build Lego-like drugs that are much more advanced than traditional drugs. I will come back to that later today. So what we're mostly gonna do today: I'm gonna be looking at real proteins, some assemblies, and I'm gonna try to start discussing what these secondary structure elements really do and why they act the way they do. So the book has a couple of chapters on this that are horribly outdated. Anything that has to do with proteins or structural biology is gonna be horribly outdated the second it's a year old. So I will mostly talk about newer stuff here. The general concepts are similar to the book. And in your body, there are typically three classes of proteins. And one of them we will almost only hear about in this course, and then we're gonna spend quite a bit of time talking about the other two. So going from left to right: this is typically what you think about as a protein. Small blob, a couple of thousand atoms. The poster child here is likely hemoglobin, which binds oxygen in your blood. Proteins do things. You probably already know that from the bioinformatics course. It might sound stupid, but any molecule in your body that does something is probably a protein. There are some exceptions, such as hormones and other things that are small molecules. But if you just get a random name of a molecule, you have no idea what it is and it does something.
Your first guess should be that it's a protein. It likely is. Enzymes are proteins that work as catalysts. And the most common proteins are water soluble and you call them globular. That partly has to do with the globin fold that the first protein has, but you can use globular as a synonym for water soluble. They were also the first proteins whose structures we were able to determine, because you actually could crystallize them, as I spoke about earlier in the course. The last 20 years, or 25 maybe, we've learned a whole lot more about membrane proteins, in particular because we eventually found ways to crystallize them, which is much harder than you think, because you essentially have to crystallize something in oil and oil doesn't crystallize very well. I have to confess that I'm horribly biased here and this is my love in life. And a whole lot of this department spends all our time working on membrane proteins. What's so amazing with membrane proteins is that they are the doors and windows of your cells. Anything that's gonna need to get in or out of your cell, either physically, such as an ion, or just a signal. If you want to signal something from the outside of the cell to the inside of the cell, hey, tell the cell to divide, somehow this is gonna have to go through the membrane. So something creates an effect here that causes another effect on the inside. We're gonna come back to that, but that's gonna be next week. And the third class of proteins are the fibrous proteins, the building materials of your cells. You rarely see them in the Protein Data Bank because they're large and pretty boring. You might think that they're not really structural, and they're not structural in the sense that they create small specific effects like hemoglobin, but they are structural in the sense that they create very large structures: bone, skin, hair, all these things. It's protein. There are a handful of tools you will use later on in the course to study proteins.
The problem is that as a physicist, you can say what you said before, that a protein is just a set of coordinates. Technically that's true. It's just a set of coordinates and atom types, right? But as you've seen earlier on in this course, we can choose to approach protein modeling on a whole lot of different levels. And rather than just studying these long text files, it makes a whole lot of sense to try to visualize proteins to understand them. VMD is a great program that colleagues of mine develop at the University of Illinois at Urbana-Champaign. I like another small program called PyMOL. All these programs are free. You will play around with them later on in the course and you can just download them. And the first thing I always do when I get a protein is that I open it in one of these viewers and start looking at it. So in this case, you have a large ribosome or something with lots of DNA and RNA in it. And what these programs allow you to do is visualize proteins at different levels. You might be interested in the surface, or you might be interested in drilling down and emphasizing the secondary structure elements, or looking at a particular binding site. So they're really useful when it comes to understanding structures. In some cases, you're gonna have a protein that can move between two states, say open versus closed, right? And what you can do in these programs is superimpose them. So you see both the open and closed states, and then you will likely see what the main difference between the states is. You can, say, plot the open in blue and the closed in green. The main thing that's happened the last few years is that we've been able to determine some amazingly large protein assemblies such as virus capsids or other things. I think that's a bacteriophage capsid. Ribosomes. If you do simulations, you can even simulate what happens to the structures.
We just take an entire DNA molecule and start to pull it through a nanopore or something. So there's pretty much no end to what you can do with these simulation programs, and they create amazingly beautiful images. But what I'm gonna start doing, I'm gonna start talking about the fibrous proteins in particular, because they're gonna be the least important ones, and then we'll get on to the globular ones and then do membrane proteins next week. So fibrous proteins are important because you wouldn't exist without them. This is actually pretty different if you start comparing eukaryotes, in particular protozoans, sorry, metazoans and mammals and vertebrates and everything, right? The higher the organisms are, the more structure you have in an organism, while the opposite would be a small bacterium. So you tend to have, I hope you have bone, you have skin, you have hair and everything. So all these structures that somehow build you up. They are typically less specific biologically. Now of course your bone structure is important, right? But it's not the bone itself, it's that the bone creates the framework on which you build other things. It's not that your hair transports oxygen. So they tend to have physical, structural roles rather than a chemical reaction role. We're gonna talk a little bit about all these different filaments, tubules and everything, but one key difference here: how large is a typical protein? Forget about the fibrous proteins for a second, but the other proteins we've looked at, what is the typical length scale of a protein? That would be a very large one, so maybe 10 nanometers. But you're in the nanometer range, right? And what are the length scales of these things? Centimeters to meters, right? So we're talking about gigantic structures here. So somehow you're gonna need to find a way to hierarchically build super large structures from really small ones.
This is gonna be pretty much everything you have in your body: hair, nails, shells, claws. Hopefully you don't have claws. Anything you have in higher animals. They're typically large proteins. Of course not just one protein, but you have to somehow build one protein that aggregates with a second one, that aggregates with a third one, to create larger and larger and larger structures. And they're typically very regular. The first one is a classical one, silk. Silk is just protein. Well, not quite, there are some other minor things, but to a first approximation, silk contains lots of beta sheets. So you've all touched beta sheets. And it's a pretty boring pattern, mostly alanine and glycine, and then you have a couple of serines. So it creates one relatively hydrophilic interface with the serines, and then a hydrophobic interface. And that causes the silk to just build layer after layer after layer. And they create this very, very smooth pattern. So it's not really a crystal, but it's close to a crystal; it's just not gonna be perfectly packed. Whenever you go down and buy some shampoo or something, they frequently tell you that they have silk protein in it, as if that should be something fancy. They just have these amino acids. That's nothing to do with silkworms. So silk protein just means that they have added a bit of beta sheet to it. I'm not sure, it might help provide a bit of luster in the hair or something, but you certainly don't get silk protein from silkworms. It's just random protein. But you can make artificial silk, probably not as fancy as the real thing. I'm not sure why, but it's an example of a very simple structure that animals just create from proteins. You probably don't produce a whole lot of silk, but you certainly produce collagen. So collagen is another one of these very large structures that you frequently have in bone, teeth, skin, very hard things.
This is a very special type of secondary structure that you otherwise never see. You have lots of prolines in it. And prolines typically hate to be in any secondary structure. But if you have multiple chains like this right next to each other, you can create some sort of super helix where each of these chains on its own is not a helix. But when you put three of these next to each other, do you see how they can form lots of hydrogen bonds between them, and then you have some extra water stabilizing it? So you don't have a single hydrogen bond inside each chain, but the chains hydrogen bond to each other. This is 25% of you. I'm sorry, we're not that fancy. Pretty boring structure. So anything that's hard in your body is typically collagen. They are in the ballpark of 15 to 20 angstroms wide, and then you see they can be very long. Very is a relative measure here, but 300 nanometers or so. And then you can, of course, aggregate more and more of these with each other. So this was one, two, three peptides. Let's take a bunch of these. They're groups of three, and I think here I put six or so of them together. So now you see it's not really a helix in the way we've said before, but it's a structure that in turn consists of a smaller structure that in turn consists of an even smaller structure. So you have this entire large ring here where each part consists of these triplets of peptides, and each of those triplets of peptides in turn consists of chains. So you start to have quite a few chains here. This is still on the molecular scale, but at some point you can start going into an electron microscope and actually start to see these. I forget what the length scale here is, but this is dentine. So this would be a very, very small part of your tooth. Yep? Well, so proline is not 25%. You're quite right. Proline is not common in normal globular proteins. But remember, how do you determine how common proline is? What determines how common proline is? Codons.
So you're gonna have at least 1/64th of the codons that are proline, right? So there is a lot of proline in your body. But this is another example where you start seeing the first genetic effects. So what can happen is that for these chains to work, you needed lots of glycine in them. I think I had that on the previous slide. Glycine, proline, proline. The glycine was very flexible, and that flexibility was required to make this stable. If you mutate that glycine to anything else, the packing here is not gonna be as efficient, and then these structures are gonna be much looser. So there is a classical disease called brittle bone disease, and that's caused specifically by that one mutation. I mentioned that last lecture too when I talked about the amyloid diseases. So you do start to see the pattern here. You have random mutations in your body that start to influence the stability of protein structures: the genetic effect causes a structural effect that in turn causes a functional effect. All genetic diseases are based on that. You somehow perturb your proteins. What happens if you perturb them too much? Yeah, you die. Then the organism doesn't survive. So virtually all genetic diseases are related to things that perturb you a little bit. Not so severely that you die. I'm not an expert in brittle bone disease, but I could imagine that when you're relatively young it doesn't hurt you that much, because our bones are fairly strong. We have a wide margin. So you likely have time to reproduce and get offspring, but rather than having problems with your bones being brittle when you're 80, you might get it when you're 40 or 50. It's not really gonna influence your ability to produce offspring. So therefore it will survive.
This concept, when you have a secondary structure or something that in turn creates something on a slightly higher level, we see that in lots of other places. I mentioned already, when I talked about Linus Pauling and the way that they determined the secondary structure, remember that I said that they found some other strange bands that they really couldn't explain. And these bands were caused by the fact that you had a helix, but then the helices were coiling up into even larger structures too, right? And that put them off a bit because they thought that there should be some sort of 5.1 angstrom periodicity. That's called a coiled coil. So you have helices that in turn wrap around each other to form some second, larger structure. Extremely common in nature whenever you have alpha helices. And one of the places where that occurs is actually in your muscles. So you have these myosin proteins in your muscles. When your muscles contract, this is actually super cool. It's basically one protein walking along another protein. It goes super fast. So whenever my muscles contract, what happens is that you have these two proteins where one of them starts walking along the other, based on pure chemistry. I don't think I have a movie on this, but if you go on YouTube and look for myosin or something, there are simulations of these, and you would think that it's fake, but obviously it's not fake. We know that our muscles do work. They frequently end up having 3.5 residues per turn. So this twist is slightly tighter. And the reason for that is simply that you wanna create strong bonds between them, and with 3.5 residues per turn you make two full turns in seven residues, so you get a periodicity that's nicer. If it was 3.6, you would find it difficult to get back to the same place. And in this case, when you are walking along it, right? You want one binding site there and then you want the second binding site there.
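The arithmetic behind the 3.5 versus 3.6 residues per turn argument can be sketched in a few lines. This is just an illustration that treats the helix as an ideal screw; the function name is made up for the example.

```python
def angular_drift(n_residues, residues_per_turn):
    """Angle in degrees by which residue n misses the starting face of the helix.

    Returns a value in (-180, 180]; zero means residue n sits exactly
    on the same face as residue 0.
    """
    angle = (360.0 * n_residues / residues_per_turn) % 360.0
    return angle if angle <= 180.0 else angle - 360.0


if __name__ == "__main__":
    for residues_per_turn in (3.5, 3.6):
        # Follow every seventh residue along four heptads.
        drifts = [angular_drift(7 * k, residues_per_turn) for k in range(1, 5)]
        print(residues_per_turn, [round(d, 1) for d in drifts])
```

With 3.5 residues per turn, seven residues are exactly two turns, so every seventh residue lands on the same face. With 3.6, seven residues cover 700 degrees instead of 720, so each heptad falls another 20 degrees behind and the binding sites slowly wander around the helix.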
You don't want the next binding site to be on the opposite side. But that's more of a parenthesis. So why do helices pack this way? Well, this comes back to this hierarchical approach that I've talked about so many times. So we started from simple interactions, and then rather than worrying about Lennard-Jones and electrostatics separately, we started to talk about hydrogen bonds. Then we went from hydrogen bonds to say, hey, rather than worrying about all the detailed interactions, we can think of helices and how this creates a sort of regular structure at a higher level. But a helix doesn't really exist in itself, right? A helix is just your way of describing a regular packing pattern of atoms. So even when you draw a helix, that band doesn't exist. It's just something you're drawing in the molecular viewer to help you see this periodicity. And these periodicities keep coming back. So you could argue that nature isn't very imaginative, or you could argue that nature somehow needs to use regular patterns to create higher-level building blocks. So when it comes to understanding how multiple helices pack, there have to be some sort of regular patterns here too, right? And now we start moving beyond secondary structure up to something higher. It's typically called supersecondary structure. For instance, when two or three helices are packed. And if you just plot the helix, well, you can draw a helix like that. And I'm not sure about you, but to me this is a bit too much information. When you see all the bonds here, it's just gonna be a complicated pattern, right? You don't even see the hydrogen bonds here. Lots of side chains sticking out. And to me it would not be obvious how you can take two structures like this and pack them. Here's where a molecular viewer can help you, because if we then go into the molecular viewer and say, you know what, if I'm another molecule approaching this helix, let's just look at the surface.
And if you start to look at the molecular surface of a helix, that's roughly what I would see if I took a water molecule and rolled it over. Because of this helicity, that things go around, the side chains will start to pack in fairly regular patterns. So you have one set of side chains that goes here, and then it goes on the backside and it comes out again. But between those side chains you're gonna have the part where there aren't side chains. So you essentially create ridges of side chains and valleys between the ridges of the helix. And since the side chains are discrete, there's one side chain there, one side chain there, one side chain there. These side chains are kind of the mountains of the helix, and between the mountains there are valleys. So you could argue there's one valley here, or you could say there is a valley here. These valleys are always located between the side chains. And you can calculate roughly what these are; if you move from one side to another there are a couple of different values. So what happens now if you take two helices like this? How should you pack them? Right, so pack a valley against a ridge. And this is what helices do. So basically this is one helix and that's a second helix, and let's say that the solid lines here, either green or red, are the ridges. The dots here are the side chains, and then this dotted region here, blue here, those are the valleys. So if these two helices should now interact, we're gonna need to take one of them and turn it around 180 degrees, because here we're seeing the face of both of them. If you just put them on top of each other, it would be completely horrible. Everything would bump into everything. That's the third picture. That's not gonna happen. But if you then take one helix and start to turn it around, you're gonna pack the ridges of one into the valleys of the other.
And you see here how the black and white dots suddenly do not overlap, but they pack very nicely against each other. This means that any time you have helices interacting, what has happened here? So are they parallel? No. They make an angle with each other. That's always gonna be the case. You will never see two helices packed perfectly parallel, because of the interactions. And when you first start looking at structures, you might think that structures are not really perfect, because any time you look at structures there's gonna be some complicated pattern. It's not perfect, right? It's actually the opposite. They are perfect. They are perfectly packed, so that there's basically no space whatsoever between these two helices. It's densely packed inside. Now of course, remember what I said, helices don't really exist. This is just nature's way of making sure that atoms are perfectly packed, with as many favorable interactions as possible between them. And then it's a super complicated structure with a set of covalent bonds in one helix and a set of covalent bonds in the other. So somewhere at this point, at least for me, I can't really keep track of things anymore if I start to look at individual atoms. And this is where we use this hierarchical approach, that we talk about helices. And when I talk about a helix, I might not even care about the hydrogen bonds inside it. And at some point it actually even makes sense not to think about individual helices but to start to look at both of these helices together. So we keep building our tower here into higher and higher order concepts in how we talk about structure. But you will always see that helices make an angle against each other. How many different angles like that do you think you can make with a helix? Well, so if they're not overlapping, you could turn it to the left. You could actually turn it a bit to the right too.
And in this case they are parallel. You could imagine that they are anti-parallel too, but there are basically two, maybe four ways of doing this. So if you now wanted to predict any way that you could pack structures, how should we do it? Well, one way could be to make a very expensive large simulation, put helices in water and try to simulate how the helices interact. But in principle there are only four states, right? There are only four ways helices can pack. So you could use a method like the one you used in the first two labs. You only have a small handful of states, and you now need to determine which one of these four states is best. And you might think that this is super simplified physics and everything, but there are a fair number of very advanced drug design papers that have used just this. You pick out the helical packings you have, and we can steal that from the Protein Data Bank. And if I now want to create a new helix, well, just check which one of the four packing patterns is gonna be best. The simplified models are way more powerful than you think. Cheating is good; it's allowed to cheat in practice. One of the most important such coiled-coil proteins is what you call alpha keratin. And this is a pretty boring protein. The entire protein is just a coiled coil. So you have one helix here in blue, a second helix in green. Do you see that they make an angle to each other? And then these residues, roughly every seventh residue is a leucine. Actually, not roughly: every seventh residue is a leucine. What do those leucines do? What do you know about leucine? Nope, that's lysine. So if you imagine you had this in water, what would happen if the red ones were lysine? They're charged, right? They would interact with the water. But leucine is the opposite. Leucine is a hydrophobic residue. So if you have this periodic leucine every seventh residue, and this is now in water, what will the leucines do? They will like to pack together, right?
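The every-seventh-residue leucine pattern is easy to look for in a sequence. Here is a minimal sketch; the toy sequence below is made up for the illustration and is not a real keratin sequence, and the function names are hypothetical.

```python
def leucine_heptad_fraction(sequence, offset):
    """Fraction of positions offset, offset+7, offset+14, ... that are leucine (L)."""
    positions = range(offset, len(sequence), 7)
    if not positions:
        return 0.0
    hits = sum(1 for i in positions if sequence[i] == "L")
    return hits / len(positions)


def best_heptad_register(sequence):
    """Offset (0 to 6) at which leucines recur most regularly every seventh residue."""
    return max(range(7), key=lambda offset: leucine_heptad_fraction(sequence, offset))


if __name__ == "__main__":
    # Made-up toy sequence: two leading residues, then a leucine every 7 residues.
    toy_sequence = "KA" + "LEKAKEA" * 4
    register = best_heptad_register(toy_sequence)
    print("best register:", register)
    print("leucine fraction:", leucine_heptad_fraction(toy_sequence, register))
```

A register where the fraction is close to one is a hint that one face of the helix carries a hydrophobic stripe, which is what the zipper picture relies on.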
Do you see that you get a hydrophobic part here? I was about to say it's kind of like a hydrogen bond, but it's not really a hydrogen bond. These aren't making hydrogen bonds. This is just a hydrophobic effect, as if these were small drops of oil. But because they stabilize each other, they're now gonna help keep your entire helix pair together. So this literally creates a zipper. You're zipping this up. There's also a relatively high fraction of residues in this that tend to be cysteine. And I've now actually said it, but how would you stabilize this? By disulfide bonds. How will the disulfide bonds be formed? So we now start forming disulfide bonds between these. We're gonna create a very nice, relatively rigid rod. You have quite a lot of this protein too, because this is pretty much what hair consists of. It's a bit more complicated because you have cuticles and then cortices. I'm so not gonna ask you about all this. You have to drill down; a hair might be 0.1 millimeter, right? So you have to drill down a couple of levels before you get all the way down to the keratin. But it's a fairly simple structure. So you have this coiled coil of helices. You have one helix here in yellow, another one in red. They coil up in pairs, just like I described in the last picture. And then you end up having two of these pairs of helices, and they coil up, actually in the other direction. So now you have four helices. And then you have these groups of four helices that then form a structure of eight or nine helices. This is still a very small structure, but eventually you build this up hierarchically to larger and larger and larger things, until we get to the point where you can actually sense it. You can even measure it, that it's in the ballpark of 0.1 millimeter. Remember what I said about Max Perutz: when he had this idea about alpha helices, he went home and measured on a horse hair. The horse hair is entirely alpha helical, to a first approximation.
There is some dirt and some other contaminants in it. So based on this, there are two cool things. First, you can actually measure how fast alpha helices form. And they form in the ballpark of 10 turns of alpha helix per second, which is slower than how fast we said an alpha helix would form. One complication here is of course that you actually need to synthesize the protein too. But by any standards, it's growing pretty quickly. Can you say anything else about those cysteines? You were quite right that you form disulfide bridges. Do you ever use that anywhere? Hair conditioner, maybe, but there is a more special thing where we specifically use the disulfide bridges. A permanent wave. So if you would now like to shape your hair, what would you do? First you're gonna need to break the existing disulfide bridges, right? So you add a reducing agent to your hair that breaks the disulfide bridges. And then you shape the hair in whatever mechanical shape you would like to have it. And then you add an oxidizing agent that reforms the disulfide bridges in the new shape you would like your hair to have. And you've now changed the built-in structure of the hair to be, for instance, wavy. This will of course eventually disappear; eventually they will break spontaneously. But that's exactly how you create a permanent wave. First you reduce the disulfide bridges and then you recreate them. Another cool protein, which has been related to some recent scandals, both in Sweden and elsewhere, has to do with artificial organs. Yep. So that's kind of similar when it comes to ironing clothes. What happens in general with any type of structure, and again, this is not limited to hair, cotton is similar, although cotton is not really protein, is that when you're adding water and then drying something out, right? This will cause the molecules to start aggregating and binding.
You will form some sort of hydrogen bonds. And these hydrogen bonds will in principle be formed randomly. And when things are formed randomly, the structure is also gonna be random. Think of entropy: a flat structure will not form completely randomly, right? So what you're doing with the heat is that you're adding energy. It gets you across some energy barriers, so you can break things. And while you're breaking things, you're also applying some sort of mechanical pressure, so you're forcing it to either stretch out or you push down on it. And then you allow it to cool, but you're now forcing it to be in this more ordered state when you cool it. And because it's now in a more ordered state, you will hopefully keep that ordered state. So it's not really different whether you're ironing your hair with a small piece of equipment or you're ironing your shirts. You're changing hydrogen bonds, yes. Consider that a practical application. Oh, and actually, there's something else. Do you know iron-free shirts or clothes? Do you know how you create them? So normally, why do clothes or anything get disordered, right? That's a natural process. It will happen because hydrogen bonds break and reform spontaneously. So if you take a shirt or something and just let it hang, it's going to be reasonably flat and nice. But if you now put this in a random shape, say I take the shirt, I just curl it up and then push it down in a drawer or something, what's going to happen over a couple of days or weeks is that when hydrogen bonds spontaneously break and reform, they will now be stable in this new state that you're forcing it to be in, right? But what if you could now add some sort of treatment so that you create stronger, not hydrogen bonds, but any form of bonds between these multiple layers you have in the cotton or something? And that's exactly what you do.
So you're adding chemicals that stabilize the interactions between these layers. And because these are much stronger than the normal hydrogen bonds, they will stay intact. And that's how you get iron-free materials. Elastin is another one of these: it's a fibrous protein that's highly elastic. The exact structure here is more complicated, but it's similar to collagen in the sense that you're creating something hierarchical. And here you have cross-links between these, and this gives us this property: normally you're going to have a fairly disordered structure, but if you pull this to the right and left, if you pull this out, you're going to create a much more stretched-out structure, and then when you relax it again, you no longer pull it, it will curl back up. So this is like a miniature rubber band. There are a bunch of genetic diseases here too. If you have a deficiency of some of the enzymes that create these modified lysines, you're going to have very, very brittle vessels, and that can lead to things like aortic rupture. If you get an aortic rupture, you typically die almost instantly, unless you're in a hospital. You basically have a couple of minutes. But another place where this is important is that as we get older and everything, there are a lot of blood vessels you would like to replace. How do you typically do that today? So as we get older, we typically get problems with blood vessels, either because they're clogged around your heart, or there are lots of other reasons why you can have problems with blood vessels. For instance, when it comes to aortic ruptures, they can simply start being brittle. So it's very common that you have a blood vessel that you want to replace. The typical operation is what's called a bypass operation in your heart. So where do you get the blood vessel from? Typically you tend to take this from your own body, right?
So you pick a vein, roughly from your leg or something, which is a pretty major operation down there too, but again, you can survive that, and it's more important to have it in the heart. No, the vessels are very different, of course, right? But the whole point is, by picking something from your own body, it's at least going to be biocompatible. You're not going to have any reaction where your immune system starts to push out your own vessels. But the problem is, if you're 85 years old and you need a bypass operation, on average the blood vessels in your legs are not going to be in outstanding shape either. So in principle, you could, of course, have a transplant and get it from some other patient, but then you get problems with the immune system attacking this. So the last decade, there's been a lot of work on different types of artificial vessels, and there have been some fairly big scandals, in particular in Sweden, where MDs have picked vessels from deceased patients, and then you try to clean this from all cells, and then you treat this with your own stem cells, and the idea is that they should somehow build a new cellular matrix on top of this. Because there is so much money in this, people have basically committed research fraud: they claimed that it worked, and it didn't really work. But long term, if you could create this type of biomaterial, if you could make it completely artificial, if it's just a protein, your body would not push it out. The immune system will attack things like receptors and everything. Pure protein is compatible with your body. So if you could create ways of doing this artificially, it would be an amazing spare part factory for humans. We're not there yet.
There are a couple of cases where this works, but in general it doesn't. But it would be far better if we could produce this with biotech rather than picking it either from animals or deceased patients. But I think that's all I'm gonna talk about when it comes to fibrous proteins. They're not really that important for protein folding, and they typically don't fold and refold. It's pretty rare that you're ever gonna be in a drug company and try to design a drug against hair or something, right? We don't target that type of protein. But there are lots of genetic diseases that create pretty severe effects with them. But the really cool part is globular proteins. Let me give you two minutes here: I will just introduce this part and then I'll give you a break. Globular proteins are good and simple. It's a great introduction to protein structure because in principle there are just two types of structure we need to understand. We need to look at alpha helices and we're gonna need to look at beta sheets, and then we're gonna mix these helices and sheets in a bunch of different combinations. And here too we're gonna approach this hierarchically. What you see here is of course fake, right? It's fake in the sense that I've drawn things like helices and sheets to try to make sense of the structure. But these are really just blobs of very efficiently packed atoms. If you started looking at something like this and you could only see the surface, we couldn't make sense of it. So what your body is doing is that we're combining elements of this to create, say, a specific binding pocket here or here, or we're creating a protein that can transport something in your blood. And this is then a mix of physicochemical properties. If you're gonna transport something hydrophobic, you have to be hydrophobic on the inside. But it's also gonna be related to biophysics and energetics.
You need to be able to fold this protein. If the protein is not stable you will never fold it, and it's not gonna have the role we want. So virtually all early x-ray structure work started by trying to make sense of these structures, make sense of how helices are combined with sheets, how they are stabilized, and in particular what these structures do internally. But that we're gonna talk about after the break. It's 20 past 10 so let's take 20 minutes and reconvene at 20 to 11. So we were talking about globular proteins. There are a bunch of different ways both the book and others try to make sense of this. I think that these cartoons and all the topology stuff are old. You're gonna see it now and then, so I just want you to have seen this. The point here is that, as I mentioned before the break, ultimately this is just atoms being packed. But now we're gonna move even one layer further up. Now we typically don't even talk about two helices, but you're gonna talk about a number of helices combined with beta sheets in complicated patterns. And again, you're interested in the patterns, right? You're not interested in all the details, because then you wouldn't see the forest for the trees. And you can certainly look at a protein like that. It's called a TIM barrel. I will come back to what that is later. If you just look at this on the surface you could hardly see anything. But when I've colored it in different ways you can start to see: first it's a sheet and then a helix and then a bit of a sheet again. So by alternating this way all these sheets can pack against each other, right? And you can draw that. You could even imagine drawing that from the top: you start at the N-terminus and then you have this triangle, that's a sheet, helix, sheet, helix, sheet, helix, sheet, helix, helix breaking the pattern. And by drawing this super schematically you can kind of see how all these sheets would pack together.
So the idea with this thing is that here you can also see it without even seeing the entire sequence, right? Forget about the stuff you were seeing down there. Here you can also immediately see that, oh, all these structures, they're forming some sort of extended sheets. So think schematically and think about how structure is organized on a higher level. So let's start with beta sheets here, because we typically always start with alpha helices. Beta structures are simple in a way. They're only continuous sheets. Remember that we said before the break that helices make angles to each other. Sheets are actually simpler in that sense because they're planar to a first approximation. There's a slight twist of the entire sheet that we talked about, but to a first approximation, they're planar. By far the most common ones are anti-parallel sheets. So what's an anti-parallel sheet and why do you think they are more common? They're directly opposing each other, but you can have very stable things even with parallel. So these would be parallel sheets. The sheet there and then something between a sheet. So the problem with parallel sheets, right? You can have a strand and then you need to come up with something else, and then you can have a strand and then you need to come up with something else. So you need to go back all the time, but if you run anti-parallel, you can go up, down, up, down, up, down, up, down. So it's much easier to have a pure beta sheet if you're anti-parallel and you just have these tiny turns directly between them. If you are parallel, you're gonna need to find a way to cross over by basically moving back, and it turns out that one of these crossovers is slightly more common than the other. Why do you think that there is a right-handed geometry here? Yeah, it's the chirality of amino acids, right? Just the fundamental properties of amino acids mean that one of these is more likely than the other. So if you now design a structure predictor, could you use that?
Right, so the point is that if you know that there's a beta sheet but you're not quite sure what the crossover should be, again, use statistics. One of them is far more common than the other. And this is something we actually use in predictors. So when it comes to higher, not your secondary structure but tertiary structure predictors, we don't try to predict structures from scratch. You probably did that for the beta sheets, sorry, the secondary structure. But in general, when it comes to predicting structure, the extreme case of that would be to put an entire chain in a simulation and try to simulate how the protein folds. That would take forever and it would not be very accurate. So how would you predict structures if you had to predict the structure of an entire protein? Yes, in general, if I have a homolog, it's great to use homology modeling. But even if you don't have a homolog, we can still kind of cheat, right, and look at the pieces that occur in real structures. And let's try to borrow those pieces and see if they fit my sequence. Essentially threading, yes. So the whole point is that you can cheat: things that don't occur in nature likely should not occur in your protein either, with a few exceptions. There are cases where people have been able to build new structures that don't occur in the PDB and then show that they can actually stabilize them and produce them. Beta sheets can pack in two ways. You can have this crisscross pattern where they're orthogonal, and you see here how they're all anti-parallel because it's a pure beta sheet structure. It's virtually impossible to have a parallel beta sheet structure be pure beta sheet. Or you can pack them in an aligned, parallel way. There are a couple of common structures here that can be just fun to know about. This is a fatty acid binding protein that creates a small pocket. It's this orthogonal packing.
This beta sandwich is actually immunoglobulins that we frequently, well, we use them as a way to boost your immune system. It's not technically a vaccine, but you would typically inject immunoglobulins to help your immune system. There is another type of similar aligned structure. It's called gamma crystallin. This is what creates your eye lens. So you remember, anything in your body that does anything, it's almost always done with proteins. So this just happens to be a fairly transparent protein, but it's a protein that's also somewhat elastic, particularly when you're younger, because your muscles can change the shape of your lens. And that's basically how you focus the image on the retina. Super simple, small structure. But this is a bit more complicated. Remember I said before that anti-parallel beta sheets are nice because they can just go up, down, up, down, up, down, up, down, right? Does this one go up or down? You see the pattern here: you're starting from the N-terminus and then you go up and then you go straight down. So good, that's the pattern. And then the small turn and up again. But you see when we move from three to four here that we're now jumping over the first turn we made there. This is really stupid. And this pattern actually keeps coming back on the backside here: five to six is small, but then seven to eight is jumping over five and six in the opposite direction. This looks remarkably stupid. I'm actually gonna argue that it's very smart. I'll tell you why in a second. This was shown by Jane Richardson in the 1970s and she called it the Greek key motif. And the reason why it's called Greek key is that you actually see this on urns, this very traditional Greek pattern.
But if you look at the white trace within the black part here, do you see that you have something small here and then the large one that goes over it, and then the small inside a large? You always have the large overlapping the small one. So if you were now an artist in ancient Greece, why would you draw that pattern? What's special with that pattern? You draw it without lifting the pencil, or whatever tool they used to put it on the urn, right? So it's a continuous pattern. What does that correspond to for your amino acid sequence? So it's a single chain, right? Because lifting your pen would correspond to breaking the chain. So if you go back there, this means that you can draw the entire protein without lifting your pen. But couldn't we do that for a simple sequence? I could draw the sequence like this too, right? Then I don't have to lift the pen. There are two differences here. The stuff I just drew on the board here, how stable would that sequence be? Well, it would be locally stable in that it would be a beta sheet, right? But in theory, if the end of the beta sheet somehow bound to the first part, you could imagine almost creating a barrel, but you're not gonna have anything on top of the barrel and you're not gonna have anything on the bottom of the barrel. So by having these slightly larger loops here, these loops will basically help you create a lid and a bottom on the protein. And in particular, the upper red loop helps stabilize the lower red loop because they're likely bound to each other. So this is gonna help you create a more stable structure, in that the loops also help stabilize the structure. And in particular, if you wanna create a small pocket to carry something in, well, you know, if you have a pocket and there is no bottom in the pocket, things tend to fall out.
So there are a bunch of structures like this, and this is by far one of the most famous so-called supersecondary structure motifs. Have you seen any other supersecondary structure motifs today? Sorry? Coiled coils, yes. So that's two alpha helices. So there is a well-defined structure, a regular pattern that is just above the secondary structure. And the reason why we call it supersecondary is that it's not really a protein yet. These are common recurring patterns. They're gonna be awesome anytime you're gonna try to predict something. They're also gonna be awesome if you wanna create a particular binding site, but it's not really a full three-dimensional protein structure yet. I already spoke a little bit about this fatty acid binding protein, but I figured I should show you one where we actually have a fatty acid bound. So here's the fatty acid binding protein. Here we actually happen to have two helices. Sorry, I lied a bit when I said it was a pure beta sheet protein. What do the helices there do? They kind of create a lid, right? And you don't see that beautifully, but there's a bit of a helix down here, so here you also create a bottom. If you didn't have this lid and bottom, what would happen? We would somehow expose the inside to water, and this water would be bad because the inside should be hydrophobic. The white or light gray part you have here, that's a fatty acid. That's gonna be transported and eventually used to produce lipids or something. So here we even see that, I think it's, yes, it's oleic acid. This is a slightly different pattern. You see that this one actually just goes up, down, up, down, up, down. So it turns out there are two of these motifs. This one is called the beta meander. So meandering is this process that rivers go through, and this is why rivers flow back and forth. So the beta meander is one common supersecondary structure and the Greek key is the other one.
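The difference between the two motifs just described can be written down as a small check on how far the chain moves sideways at each connection. This is only an illustrative sketch, not something from the lecture: the function name `classify_sheet` and the convention of listing strand numbers in their left-to-right order within the sheet are assumptions made here, and the Greek key rule is deliberately restricted to the simple four-strand case (three hairpin-connected strands plus one long connection that jumps back over them).

```python
def classify_sheet(spatial_order):
    """Classify a small antiparallel sheet from its strand layout.

    spatial_order lists strand numbers (1 = closest to the N-terminus)
    in the order they sit side by side in the sheet (hypothetical convention).
    """
    n = len(spatial_order)
    if sorted(spatial_order) != list(range(1, n + 1)):
        raise ValueError("expected each strand number 1..n exactly once")

    # Where each strand sits in the sheet (0 = leftmost position).
    position = {strand: i for i, strand in enumerate(spatial_order)}

    # Sideways distance covered by each connection along the chain:
    # 1 means a tight hairpin to the neighbouring strand.
    jumps = [abs(position[s + 1] - position[s]) for s in range(1, n)]

    if all(j == 1 for j in jumps):
        return "meander"  # up, down, up, down with only tight turns
    if n == 4 and jumps in ([1, 1, 3], [3, 1, 1]):
        return "greek key"  # one long loop crossing over the hairpins
    return "other"


print(classify_sheet([1, 2, 3, 4]))  # strands side by side in chain order
print(classify_sheet([4, 1, 2, 3]))  # strand 4 folds back next to strand 1
```

In this picture the meander only ever uses tight turns, while the Greek key has exactly one long connection, which is the large loop that can act as a lid or bottom.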
Greek key in particular you should know about because it's such a beautiful structure. So here it is again, with crystallin. So what type of supersecondary structure was that? That's the Greek key, right? The whole point of the Greek key: one loop goes over the other loop, and then you can continue to draw it continuously. And it also means that you first go in one direction and then the other loop goes in the other direction. So just as you did here, you keep going back and forth, right? It almost seems a bit stupid when you look at the secondary structure, but this helps you create more interactions between the loops. The reason why I've been going through all this has to do with how proteins in general fold and are stabilized. We don't really have that many local topologies. Just as you just have sheets, helices and coil at the lowest level, there are a handful of supersecondary structure motifs. So nature somehow seems to reuse things and work very hierarchically. And as I said, there are only gonna be certain sorts of connections between these secondary structures that are used. For the same reason, we virtually never see mixed parallel and anti-parallel beta sheets. In nature, if you're gonna have them anti-parallel, you can just go up, down, up, down, up, down. That's one type of structure. If you're gonna need to have them parallel, you're gonna need to alternate between helices and sheets, and that leads to different classes of structures. So it's much more hierarchical than what we would think just from random amino acids. And as we're gonna come back to later, stable proteins require stable building blocks, because there are so many atoms in a protein that if you had 100,000 atoms that just needed to spontaneously arrange to find a stable minimum, that would never happen. It would be too complicated a puzzle to solve in terms of entropy. Yes. So first thing, the key word is almost, right?
So if you start having a sheet, then you can have anti-parallel, anti-parallel, anti-parallel. The likelihood of then having one helix and then going back and suddenly being parallel, it's simply very rare. We don't see it happening in evolution. It will happen now and then, it's not never, but it's simply very, very rare. And I think you will see the reason later: they tend to form slightly different structures. The anti-parallel beta sheets are more common. It's partly because they're based on these hairpins. That's what you see in these meanders too, right? So you can form a small turn and then make them very stable. I think that's because, well, not just I think, there are some other people who think so too, they likely form quicker. We also know that loops can't overlap and you can't really have a knot. So a knot would correspond to, literally, if you have a piece of thread or something and you physically make a knot in it. That protein would never ever fold, because the end of the chain would somehow need to spontaneously go in and tie something. It would be extremely unlikely for that puzzle to work. But as I've told you a couple of times, there is no rule in biology without an exception. Pepsi, small protein and sign. That actually has a knot. If you go through the blue chain there and, let's see, you trace it all the way to the end, to the red, do you see how the chain here goes through the loop? So with this one, if you take the blue end and the red end and just pull, at the end of the day you would actually have a small knot on the chain. But with most proteins, if you just pull them, you would just have a straight chain. So with all this, and the reason why we go through this, is that you can start to look at topologies, these small supersecondary structure patterns that we can randomly create: just pick four beta strands and then try to think of different ways to combine them.
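The "pick four strands and think of different ways to combine them" exercise can be done by brute force. The sketch below is an assumption-laden illustration, not the counting used on the slide: it only looks at the side-by-side order of the strands in one flat sheet, ignores strand direction and crossover handedness, and treats a layout and its left-to-right mirror reading as the same arrangement. The function name `distinct_strand_orders` is hypothetical.

```python
from itertools import permutations


def distinct_strand_orders(n_strands=4):
    """List the distinct side-by-side orders of n strands in one sheet."""
    orders = set()
    for perm in permutations(range(1, n_strands + 1)):
        # Reading the same sheet from right to left gives the reversed
        # tuple, so keep only one representative of each such pair.
        canonical = min(perm, perm[::-1])
        orders.add(canonical)
    return sorted(orders)


orders = distinct_strand_orders(4)
print(len(orders), "distinct arrangements of four strands")
for order in orders:
    print(order)
```

Under these simplifications the list is short enough to inspect by hand, which is the point: the space of small topologies is tiny, so it is meaningful to ask which of them actually show up in real proteins.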
I'm gonna argue that this is every single possible way you can combine the beta strands. The meanders might be the simplest ones, and then you have Greek keys, and it's gonna turn out that some of these are very common, and some of them we see now and then. And all the other ones you virtually never ever see. So now you're educated and know that this is partly based on evolution, right? But why has evolution only picked these three states? They have lower free energy. And this is also something I'm gonna come back to, that there's this duality: on the one hand physics, potentials, forces, interactions, and on the other hand evolution, natural selection, bioinformatics. At the end of the day, because they are connected, you could use either to predict them. You could try to predict these based on actually calculating the free energies, or in this case you could just look in the Protein Data Bank, and it turns out we're almost only gonna see those three. So here's a famous example of what David Baker did, I think it was almost a decade ago. They actually picked, I think it was two or three of these others, so combinations of beta strands that have never ever been observed in nature. And then they managed to show that with computers, by specifically designing in pairs of residues to make, say, that residue stable, they could actually create a sequence of amino acids that folded to a stable small structure that had never been observed in nature before. So why would that be important? Apart from showing that the physics makes sense and everything, you can create protein structures that haven't been formed before. Partly, well, to force them to be less degraded by enzymes you would likely need some sort of non-natural amino acid, but as we will come back to later on in the course, most drugs in nature are small hydrophobic molecules. They're good in many ways, but there are typically side effects to drugs.
And most things, if your body would like to achieve something, your body usually does it with proteins. So proteins are typically much more specific than drugs. A protein that binds to another protein is gonna be better at it than a drug. So by designing proteins as drugs, we can create very specific drugs. But of course, by definition, we need to design the drug because it's something your body doesn't do yet, right? And that means that you're typically starting from scratch. We're gonna need to create something that nature hasn't yet thought of. And that makes it hard to predict. The other thing beta sheets can do is that they can dimerize. And this is exactly what we said about these amyloids, right? So you have one molecule with lots of beta sheet that's green and another, in this case, exact copy of the same molecule. Normally, the right-hand side of the green beta sheet would be unpaired. And this is exactly the same molecule turned around 180 degrees. But if they now pair up against each other, do you see how they can then form hydrogen bonds all the way here? So we now got rid of two of those unpaired edges, which I showed you on Wednesday are not really that advantageous. So beta sheets frequently drive, in this case, dimerization, but in other cases, oligomerization. So longer and longer and longer structures. And that's how you created those amyloid plaques. Beta sheets tend to stabilize other beta sheets. Could this happen with a helix too? Normally not, right? Because there isn't really any unpaired side in a helix. So with a helix, you won't really have this chain effect. The side chains could stabilize it, but in a helix, all the hydrogen bonds are paired. With a beta sheet, you have the effect that hydrogen bonds on the inside are paired, but hydrogen bonds at the edges are not paired. And that's an effect you don't have in helices.
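The bookkeeping behind "we got rid of two of those unpaired edges" is simple enough to spell out. The following is a minimal sketch under a stated simplification: every isolated sheet is assumed to have exactly two edge strands with unpaired backbone hydrogen bonds, and sheets that associate edge to edge are assumed to form one continuous sheet. The function name `unpaired_edges` is hypothetical.

```python
def unpaired_edges(n_sheets, paired_edge_to_edge):
    """Count sheet edges whose backbone hydrogen bonds are left unpaired.

    Simplifying assumption: each isolated sheet has two exposed edges,
    and edge-to-edge association fuses the sheets into one long sheet.
    """
    if n_sheets < 1:
        raise ValueError("need at least one sheet")
    if paired_edge_to_edge:
        return 2  # one continuous sheet, only its two outer edges remain
    return 2 * n_sheets  # every sheet keeps both of its own edges


# Two separate molecules versus the dimer described above.
separate = unpaired_edges(2, paired_edge_to_edge=False)
dimer = unpaired_edges(2, paired_edge_to_edge=True)
print(separate - dimer, "unpaired edges removed by dimerization")
```

The same count shows why the effect chains: however many sheets join, the fused structure still has two exposed edges, so there is always a free edge for the next molecule to bind. A helix has no corresponding unpaired edge in this picture.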
So helices will not usually drive dimerization. Yep, exactly. So the one exception to that would of course be the coiled coil. But you only have two. So in a way, it's a dimer, but you're not gonna be able to go to more than that. You get these pairs because you have the leucines; that's a bit of a hydrophobic effect. The leucine effect is not gonna be as strong as, in this case, say 15 hydrogen bonds. So this is gonna be a stronger effect. And the other thing is that you can now have larger and larger and larger beta sheets: one combined to two, combined to three, combined to four, combined to five. So you get chain effects. With the coiled-coil helices, you pair them up two and two. But once you've done that pairing, there isn't really any strong driving force to pair up the pairs. But it's certainly a similar effect, in that they're happier together. So I think that sums it up a little bit on the beta sheet part, but this comes back to the differences between helices and sheets. And this is why it's so useful to think about these conceptually. You see, the alpha helices are local hydrogen bond interactions. And I know that you're all aware of that, but think about what it means. So the local hydrogen bond interactions versus non-local, I'm arguing that's causing exactly the feature you saw in the last slides. Because if they're all local, they can only pair up with something that's local, right? That will never drive any large-scale aggregation. But because beta sheets have non-local interactions, as long as you can find favorable such interactions, they're gonna aggregate and occasionally form very large structures. And partly related to how those interactions are formed, this typically means that alpha helices are much smaller and more rigid. They're isolated, small, relatively stable cylinders, at least when they are in proteins, while beta sheets are floppier. And it's not a coincidence that the first structures we determined were all alpha helical. Why do you think?
They're smaller. Can you imagine that the rigidity is important? So they're easier to crystallize. And this is something that you need to be aware of. It seems that this is something that each generation of scientists goes through: we think that only the things we see exist. Because if you start to work in a lab now and start to overexpress and crystallize proteins, there are lots of proteins, but you're gonna fail for lots of them while you succeed for others. The problem is that this is not random. You're gonna succeed for the ones that are alpha helical. So the first protein you determine is gonna be purely alpha helical. The second one you determine is gonna be purely alpha helical. By the time you come to 100 proteins, and it turns out that all proteins are alpha helical, what prediction would you make? At that point, you're gonna say that proteins are always alpha helical, because you can't see the beta sheets. And this goes back to lots of other famous parts of science, such as the discovery of bacteria and everything, right? It's very hard to make predictions about the things we haven't been able to determine yet. But you need to be aware that this is always something that biases us. We think that the things we see explain everything in nature. What this means on a larger scale though: because they are flexible and everything, but they have all the hydrogen bonds between strands, there tend to be lots of global constraints with beta sheets. And that's partly what gives rise to these phase transition properties and everything. And this is actually why I started with helices here, I'm sorry, with sheets, because when it comes to understanding how they're organized three-dimensionally, beta sheets are kind of easier. All the organization of beta sheets had to do with pairing up their hydrogen bonds one way or another, right? So that you formed nice layers, sheets, something on the inside, something on the outside.
So what would happen for helices though? For helices, all bets are off, right? If I say that you have one sheet here and then a second sheet, your prediction about the interactions of those sheets should be that either they make hydrogen bonds and form a larger sheet, or they somehow pack and form two layers, right? For alpha helices, well, in general, how do alpha helices interact? Right, so one way or another they're gonna need to pack, and we saw a little bit before the break that they're gonna make some sort of angle, either 20 or 40 degrees, relative to each other, but then the devil is gonna be in the details, the exact properties of the alpha helices. So when you look at an alpha helical structure, I wouldn't say that it appears more disordered, but it's more complicated. How do you predict the exact packing and everything here? It's also gonna turn out that you have quite a lot of diversity in helical structures. So the simple case is that if you just take a couple of helices and make them parallel or anti-parallel, they're gonna be easy to understand. In general, helices will not be exactly parallel, and particularly if you put something in a membrane, it's gonna turn out that it's very nice to have helices that go straight through the membrane. So you're gonna have a number of helices that make some angle to each other and they create some sort of inside and outside, in particular if it's an ion channel. We will talk more about that later. You're gonna have these globular proteins like hemoglobin and myoglobin where it appears to be a mess, but what this mess creates is some sort of binding pocket on the inside of them. But these three structures don't really have anything in common, apart from the fact that they're created entirely of alpha helices. So let's start with the easy ones, four helices.
It's hard to pack three helices perfectly, there are some exceptions, but four helices are kind of the simplest poster children. And I think I follow the book, I don't remember if I follow the book here, but there are three very common helical structures. One of them is called the cytochrome c fold, one of them is called the TMV coat protein, and then something called hemerythrin, an oxygen-binding protein. Don't worry, I don't expect you to know what these are. If you just look at these, which ones do you think are more stable? Which one looks nicer? And which one looks worst? So why on earth would you have a protein like that one? This looks really crappy. Nature can't have had a good morning when it created that with evolution. So one thing that holds for all of these is that they are all anti-parallel. So just as sheets can be parallel or anti-parallel, here, I always start with blue at the N-terminus and go to red at the C-terminus. So you see you go up, down, up, down. And that's gonna be true for all of these. Because if you have one helix and you would like the second helix to be parallel, you would need some other type of structure in between. So all common helical bundles are typically anti-parallel. They go in one direction, then they turn around, make a small loop and go down again. So if we start with the cytochrome c fold here, this is a very common fold that is typically involved in electron transport. And electron transport might not sound that important, but your entire energy system in your cells, or in leaves with photosynthesis and everything, has to do with electron transport. So it's some of the most important energy processes that we have in life. This type of protein, or large cytochrome domains, are very common in some bacteria actually.
And among these bacteria, there's a particular one called Shewanella oneidensis that is known to be able to bind heavy metals. So what you typically have on the inside of this is some sort of binding site that you can bind heavy metals to. Why on earth do you think that I have a slide that talks about bacteria and heavy metals? So why would you ever want to digest heavy metals? Contaminations, right? If you wanna clean up contaminations. So this was a long time ago, when I was a postdoc at Stanford, and we had a grant from DARPA, which is actually the US military research branch. And, well, I can confess it now because it's so many years ago: scientists are only interested in their science. But when it comes to funding your science, you basically need to look left and right for some agency or anybody that is interested in pretty much anything. And then as a scientist, you have to argue that whatever you do is super important to what they're interested in and try to get funding. So at the time, we were just interested in protein folding and understanding the fundamentals, the stability of proteins. But it turns out that the US military in particular was super interested in these programs. So if you could somehow use bacteria to clean up radioactive metals in particular. And then we managed to create some sort of proposal that argued that this was super important, and we managed to get some money for it. I think DARPA has one annoying thing, that they tend to pull grants when they need the money for war. So I think after four years, they decided that they needed the money for the Gulf War instead or something, but we managed to create some good papers, on completely unrelated things, that I think were far better science than the bacteria. The second protein is not as stupid as you might think. So tobacco mosaic virus. What is a virus? Do all of you know that? So is a virus alive? No.
How do we define whether something is alive? Well, can a virus reproduce? Yeah, so then you would argue that it is alive? Well, it kind of does. Not quite self-replication. Yes, you had. Right. Yeah. I'm sorry to say, but you can't self-replicate. So most of us aren't alive then, which would be a bit of a bummer. The point with this argument is that there's not necessarily a unique definition. With life, we typically say that there should be some sort of turnover of energy in an organism or something. You can define that any way you want. We typically don't consider viruses to be alive. The cool thing with a virus is that a virus is pretty much just optimized genetic material, RNA. And if you just had the RNA isolated, what would happen with RNA in isolation? As I told you in the very first lecture, RNA degrades, right? So we're somehow gonna need to protect our RNA, and you protect it with some sort of coat. You're gonna need to enclose it. What do you think that coat is made of? Protein. So tobacco mosaic virus is a virus that attacks tobacco leaves. That's why it has the name. And this had some huge effects in the US, on crops in particular. So people wanted to learn more about it, to fight it. And it was one of the first proteins, viruses, for which we were able to determine the structure. So this is an electron micrograph. So you see these rods? These rods are the virus. And if you magnify that even more, on the scale of 15 nanometers or something, these are the rods. It's super noisy and everything. This was a completely different type of electron microscopy than what we can do now. If you start to magnify that even more: the person who came up with the structure of this was Rosalind Franklin of DNA fame. And here's what the virus looks like. I think we have a, yes, we have a picture of Rosalind too. So the red part here on the inside is the RNA.
And then you just have one single protein that's repeated over and over and over and over again. It doesn't get any simpler than that, right? So in this protein, well, you can probably almost see it here, right? Do you see that there are helices here? It's packed. But if you're gonna pack the helices circularly, it somehow has to be narrow on the inside and broader on the outside. Otherwise, you're not gonna get good packing in a cylinder. And that's why you had this strange shape of the virus, sorry, a strange shape of the four-helix bundle, so that it appeared to almost be disordered, because you don't have room for four full helices on the inside. And that's where we needed those loops. And then on the outside, we needed to cover it a bit better, so there it is a bit expanded. So if you look at this RNA, how many proteins does it need to code for? One. So you have a life form that only needs to code for one protein. And then it infects a cell and injects this RNA in the cell nucleus and starts producing one type of protein. And then you create more viruses. I mean, viruses are among the most beautiful things we have in nature. I'm well aware of all the diseases they cause and everything, but from a structural, evolutionary point of view, can you imagine a more optimized life form than one that just codes for one protein, four helices? And that replicates itself. It does need other organisms to survive. But you could argue that's a good thing, right? You've even optimized it so that you don't even have to pay for your own subsistence. You just borrow some other organism and let them do the work. Well, that's bad for them. Not if you're the virus. So this relates to something else. Most viruses are not optimized to kill their host organisms. They're optimized to make sure that they produce as much virus as possible. And if the host organism dies too quickly, that's gonna be bad for the virus, because you want to produce more virus.
Could you imagine something else that's a problem with just having one small coat protein? A single mutation could disrupt it, or: this is gonna be a very simple repeating surface, right? So I know that this is not a virus that would attack you, but your immune system could likely recognize that surface fairly easily. The only thing it could do is of course to try to change the composition of amino acids on its surface, but this would be a fairly easy virus for your immune system to recognize. And that's why some other viruses, such as HIV, have a much more complicated coat, with different proteins expressed on the coat. Because the way a virus survives is that, as your immune system learns to recognize the surface of the virus, the virus has to mutate to survive. The third part is, sorry, the third of these four-helix bundles is hemoglobin, which was, again, one of the very first proteins for which we determined the structure. Here too you can treat this at a couple of different levels, and we've talked about protein structures, so I won't go into details there, but hemoglobin is special in that it wouldn't work in isolation. You're gonna need this small heme group, or technically it's called a protoporphyrin until you bind an iron, and then it's called a heme group. This heme group is where you bind oxygen in your blood. And then hemoglobin actually consists of four chains that are identical. There are some very important reasons for that that I'll come back to later. So this is a tetramer, exactly the same gene coded four times. Why on earth would nature do that? It's kind of stupid for something to be a tetramer. Couldn't you just have four isolated proteins? This is gonna be a more complicated structure to fold, right? Yeah, so we'll come back to that later, but the point is there has to be some sort of functional advantage.
We don't know what that is yet, but if there was not some advantage to that, there's no way nature would ever have a more complicated protein, a more complicated fold. So one way or another, there has to be an advantage to this, and we'll come back to that functionally. It's a bit of a story. I think that's gonna be way later on in the course. But as you said, this is gonna be important to how you bind oxygen in blood. I'll give you a clue already. Do you think hemoglobin is good at binding oxygen? Sorry? That's not stupid. Shouldn't it be good at binding oxygen? So that's the problem, right? Actually, it's great if hemoglobin is good at binding oxygen. The better it binds oxygen, the more oxygen you will bind. That's great in the lungs. The problem is that if you just bind oxygen, that just means that you have a lot of oxygen bound, right? At some point, you're gonna need to release the oxygen. And that's what this process is gonna be related to. That's related to allosteric modulation, which we will talk about after Easter. Hemoglobin looks almost boring unless you know this fact, which you now know, that helices tend to cross each other at predetermined angles. Virtually every single angle between helices in this hemoglobin fold has been optimized so that the side chains are packed perfectly against each other. And just as the small beta sheets did, do you see that we've essentially created a small pocket for the heme group to bind? Now, the heme group needs to be fairly far out in this pocket, because we also need the oxygen to access it here. And the way the heme group is bound, well, we don't really form normal bonds here, but you see that you have a histidine there, and then you typically have a histidine there. So these histidines tend to coordinate with the iron and create a binding site for the iron.
So if you wanna understand interactions like this, you would actually need quantum chemistry, because here it gets super complicated. There is the large charged iron atom, and then this iron atom can go through oxidation, so is it gonna be an iron with charge two or charge three? There are lots of electron transport processes you need to understand if you wanna understand this in detail. But here we're gonna focus on the protein properties. The second protein whose structure was determined at the same time was myoglobin. And myoglobin actually only consists of one of these subunits instead of four. So apparently you can bind oxygen without having all four subunits. What's myoglobin? So myoglobin binds oxygen in muscles, right? So at some point you're gonna need hemoglobin to release oxygen to myoglobin. And that's where nature uses this one versus four subunits. So this should be easier, right? You just have a small simple peptide sequence, and then this peptide sequence folds up into this protein, and all this is encoded by a simple gene. So what do genes look like? And how are genes translated into proteins? You might have talked about this in bioinformatics, but I like to bring this up together. Do you know what introns and exons are? So this far we've pretended that we could just start from an amino acid sequence, and that this amino acid sequence somehow came directly from bases encoded in DNA. But it's not that easy. Only the exon parts actually code for proteins. At some point we will have cut out lots of these introns. When the book was written, and I'm not even sure whether the book mentions this, we knew much less; today we know much more about the introns, that the introns are frequently used to regulate the amount of the genes we express and everything. But how are these exons stitched together into a protein?
Sorry, the RNA is spliced, but I'm thinking more like this: if you think about protein structure, what do these parts correspond to? That makes a lot of sense, right? There is a famous saying that to every complex question, there is a simple answer, and it's wrong. If you see this, it would make perfect sense that that's one helix, a second helix, and one more helix. It has nothing whatsoever to do with structure. So in the case of hemoglobin, the first exon is the red one, the second is the blue, and the third one is the yellow. So the splicing occurs right in the middle of the helices. The whole exon and intron coding, that occurs way before you start folding the protein. So that has nothing whatsoever to do with any structure. And people in the 80s went into some detail to prove that some of these exons can actually bind a heme group in isolation, et cetera, but then it won't bind the oxygen. Which again is not a coincidence. Why? Sure, that's from a structural and functional point of view. You could argue that we need the entire structure. But what if exon two bound a heme group and it could bind oxygen? What would nature likely have done? Yeah, nature would have optimized away exons one and three, right? So we need all three exons for it to work. Nature rarely allows unused cruft to tag along. In many cases, we have no idea why things are tagged along. We might not understand why it's stabilized. But you can almost always count on the fact that if you see something in nature, it's because nature has, through billions of years of evolution, realized that it's advantageous to have whatever you're having here. So as I mentioned a little bit before, all the packing you see in hemoglobin is related to these ridges. And tilting one helix relative to another by minus 25 degrees is common. Or sorry, ah, I had it there. I need to make those fonts easier to read.
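To make the exon and intron bookkeeping from a moment ago a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the toy gene, the exon coordinates, and the function names `splice` and `junction_residues` are assumptions, not real hemoglobin data. It only shows that the exons are joined at the nucleotide level, long before folding, so the junctions land on whatever residue they happen to land on.

```python
def splice(gene, exons):
    """Join the exon segments (0-based, end-exclusive) into the coding sequence."""
    return "".join(gene[start:end] for start, end in exons)


def junction_residues(exons):
    """Return the 0-based residue index at which each exon-exon junction falls."""
    junctions = []
    coding_length = 0
    for start, end in exons[:-1]:
        coding_length += end - start          # nucleotides of coding sequence so far
        junctions.append(coding_length // 3)  # three nucleotides per residue
    return junctions


# Hypothetical toy gene: exons in upper case, introns in lower case.
toy_gene = "ATGGCTGAA" + "gtaagtcccag" + "CTGAAAGCT" + "gtatgtttcag" + "CTGTAA"
toy_exons = [(0, 9), (20, 29), (40, 46)]

mature = splice(toy_gene, toy_exons)
print(mature)                        # coding sequence with the introns removed
print(junction_residues(toy_exons))  # residues where the exons were stitched together
```

If you compared those junction residues with where the helices start and end in a real structure, the point from the lecture is that you should not expect them to line up.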
It actually turns out that there are two common crossing angles, at roughly 40 degrees and minus 25 degrees. And if you just do the statistics in the Protein Data Bank, it's gonna turn out that you can see these things very clearly. There's one peak in the distribution here and one peak in the distribution here. So what you see evolutionarily by looking at structures in the PDB corresponds exactly to the rough crossing angles we actually see if you just optimize the structure. So evolution ends up there because we seek low free energy. It might be just ever so slightly more stable. That's actually a good question. Anyway, this plot is probably 25 years old. They might very well have changed. If I saw two peaks like this, I wouldn't necessarily make any strong prediction about it. And I would also argue that this one might be slightly wider. So I think it might be just ever so slightly more stable, but if there is a difference, I would say it's like 60/40. And they're so similar, right? You could never ignore that one and just focus on this one. You can ignore the low-level regions here, but in practice, you would always need to take both of those into account. So no, I wouldn't put any significance to it. One is slightly more common than the other. So there are some other ways we can use these helices. Here I just spoke about what nature does. But can we somehow use the way helices pack to create something else or do something with it? So we've already spoken a little bit about when helices pack relative to each other. You already mentioned that. So when do helices pack relative to each other? For two helices to start interacting, they should have some sort of hydrophobic side, right?
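As an aside on that 60/40 estimate for the two crossing-angle peaks: if you treat the two peaks as two states populated according to the Boltzmann distribution, a population ratio translates directly into a free energy difference. The sketch below is just that back-of-the-envelope calculation in Python. The function name, the 300 K temperature, and reading the peak areas as equilibrium populations are all assumptions made for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def free_energy_difference(p_a, p_b, temperature=300.0):
    """G_a - G_b in kcal/mol from two Boltzmann populations p_a and p_b."""
    return -R_KCAL * temperature * math.log(p_a / p_b)


delta_g = free_energy_difference(0.6, 0.4)
kt = R_KCAL * 300.0
print(f"dG = {delta_g:.2f} kcal/mol, kT = {kt:.2f} kcal/mol")
```

With a 60/40 split the difference comes out at well under one kT, which is the quantitative version of saying that you can never ignore one of the two peaks.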
Remember that the first week I spoke about these helical wheels: if you deliberately place hydrophobic residues on one surface of a helix and hydrophilic ones on the other, you can create surfaces of a helix that would like to be away from water. So we can do that, and you can even use a slightly different term for it. So these are helical wheels. And here we've put lots of leucines here and then lots of glutamic acids and lysines on the outside. So you're going to be very hydrophobic on the inside and very polar on the outside. We spoke a little bit about dipoles in electrostatics. A dipole would mean that you have a minus charge on one side and a plus charge on the other side. And then you can calculate what the dipole moment is, which is basically the difference in charge multiplied by the length. Here you could similarly decide that, say, zero corresponds to hydrophobic and one corresponds to hydrophilic. Then you could define a moment across a helix to say how polar a helix is, right? Normally there is not a simple, unified way of calculating that, but you could imagine having some sort of arrow that points away from the hydrophobic to the hydrophilic part, and the larger the difference is, the larger this arrow would be. So in this case you would have very large hydrophobic moments. And this means that if you have two such hydrophobic moments, just as dipoles like to pair up, they would also like to pair up and turn the hydrophobic parts toward each other. So one way is that you could of course create two helices like that, or if we have a four-helix bundle, what if we make a quarter of each helix hydrophobic on the inside, right? So that all four of them would like to pair up. If you have four helices like that, we can create a pretty neat hydrophobic binding pocket on the inside. And then I have a hydrophilic surface on the outside.
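The arrow idea can be sketched in a few lines of Python. This is only one possible convention, not a unified definition: it assumes an ideal alpha helix with 100 degrees of rotation per residue, uses the crude scale from above (0 for hydrophobic, 1 for hydrophilic), and the set of residues counted as hydrophobic is an assumption. Each residue contributes a vector pointing out from the helix axis, weighted by its score, and the length of the sum is the moment.

```python
import math

HYDROPHOBIC = set("AVLIMFWC")  # assumption: a crude choice of hydrophobic residues


def hydrophobic_moment(sequence, degrees_per_residue=100.0):
    """Per-residue hydrophobic moment on a 0 (hydrophobic) / 1 (hydrophilic) scale."""
    scores = [0.0 if aa in HYDROPHOBIC else 1.0 for aa in sequence]
    mean = sum(scores) / len(scores)
    x = y = 0.0
    for i, score in enumerate(scores):
        angle = math.radians(i * degrees_per_residue)  # position on the helical wheel
        x += (score - mean) * math.cos(angle)
        y += (score - mean) * math.sin(angle)
    return math.hypot(x, y) / len(sequence)


amphipathic = "LKKLLKKLLKLLKKLLKK"  # leucines on one face, lysines on the other
segregated = "LLLLLLLLLKKKKKKKKK"   # same composition, but not sorted by face
uniform = "LLLLLLLLLLLLLLLLLL"      # no polarity difference at all

for name, seq in [("amphipathic", amphipathic),
                  ("segregated", segregated),
                  ("uniform", uniform)]:
    print(name, round(hydrophobic_moment(seq), 3))
```

The amphipathic sequence gives the longest arrow, the uniform one gives no arrow at all, and the segregated one lands in between even though it has exactly the same amino acid composition as the amphipathic one. It is the placement around the wheel that matters, not the composition.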
And you know what, let's then pick something roughly like a heme group. Helices were good at binding heme groups. I told you that you needed a cysteine or histidine or something for it. So let's add some histidines here on the inside and then make my small four-helix bundle bind a heme group. So if you create something to bind a heme group, what would this molecule do? It would bind oxygen. There are a number of companies that are working on this now to try to create artificial blood. And those are slightly more complicated structures than this, but the whole point is that you're basically reusing an idea nature had. And nature's idea was to use heme groups, right? But hemoglobin was too large and complicated. So with protein engineering, let's try to create a simpler structure that nature really hasn't come up with. We can design a structure that ideally has the right properties. Yes. So in general, the dipole moment of a helix goes from the N terminus to the C terminus, right? So that goes along the helix. While these hydrophobic moments typically go across, because here we're looking from start to end. So the hydrophobic moment would go from one side of the helix to the other side. That would be perpendicular to the helical axis. So the helical axis goes straight into the wall here, or out from the wall. But the hydrophobic moment here would go in the plane of the... So what would be good with artificial blood? I wouldn't say a problem, but real blood has a whole lot of other things than just red blood cells and heme groups, right? That's good for you, because you need it to survive. But if you have a major bleeding or something, in the acute phase, the main point is not that you need a better immune defense. What you're dying from when you're losing blood is that you need to be able to carry oxygen in your blood.
And in large parts of the world, we certainly have lots of problems with blood-borne diseases and everything. HIV, for instance, you can't really detect until a couple of weeks after the infection. So there's always a risk. Anytime you get blood from another human, there is a risk that you will get an infection. And for a long time in the 1970s, there were a lot of people getting hepatitis C through blood because we didn't have any good ways of detecting it. So one thing people do is that we occasionally freeze your own blood ahead of an operation, but if you're in a major accident and you've lost too much blood, giving you artificial blood would be a very safe way to treat you. And there are also a fair number of religions where it's simply against people's religion to accept blood. And there are even people who willingly die rather than accept blood. Artificial blood, on the other hand, is not blood. It's just a protein you're adding. You use the same thing, although not artificial blood, if you want to use emulsifiers. So what's an emulsifier? An emulsifier is a way to try to mix fat or oil with water, right? Oil doesn't dissolve in water, as we saw in this course. So if you can now create some sort of small pocket that is hydrophobic on the inside but hydrophilic on the outside, you could carry the fat in these pockets, right? And then you could increase the solubility of fat in water. Detergent is basically an emulsifier. Most dirt is very fatty. But by dissolving this fat in the detergent, you can then transport it with the water. And we frequently use this in low-calorie food today, because we want less fat. Well, you want a small amount of fat for frying, but you don't want it to be 100% fat. There is one big problem with that. What is that? If you look at, say, low-calorie margarine or something. So what low-calorie margarine does is that you have fat that doesn't really consist of fat.
You've mixed lots of water into it, right? What happens if you try to fry or heat low-calorie margarine? It burns, right? Or you get some sort of, something happens to it so that you, well, would you typically get a precipitate in the pan or something? It burns. So why do you think that happens? Think about it. So I said that, so how did I create the emulsifiers? What do the emulsifiers consist of? Protein. What happens to protein if you heat it? You denature the protein, right? So as long as it's room temperature, it's gonna be fine. You can spread it on bread or anything. But if you heat this protein, the whole idea that you had, you're actually trying to fry mostly in water, but you had a small amount of fat. But if you're now destroying the emulsifiers, the water and fat separates. And then the water quickly evaporates and I think it's frequently the protein that ends up burning. So there has been a lot of work in the food industry to try to create more stable proteins. Because if you could create a protein that could withstand 100 degrees centigrade, you can fry it. Actually, no, you could boil it in 100 centigrade. To fry you would probably need something that could stay in 250 degrees centigrade. But there are certainly some low calorie products nowadays that you can at least boil. Yep. Yes, but then the protein would likely just go through your entire intestine tract and go out the natural way. You wouldn't necessarily need to be taken up by the body. But then there's also most of this stability. It's not just pure protein design. You're using lots of other chemicals. It's very important in food nowadays. So that brings us to this other topic that protein design in general, right? There are some pretty cool things we can do if we start designing proteins artificially. Yep. So if you just take a piece of meat and put it in a pan without fat, what's gonna happen? Yeah, so boiling the meat, you mean? So the meat, oh, that's a good question. 
So the amount of water in meat is, I guess, probably slightly lower than you have in these fat products. In general, meat is mostly amino acids, mostly muscle, right? I actually don't know exactly what happens inside the meat. The reason you get this special surface on the meat has to do with the Maillard reaction, and it's even more pronounced when you barbecue meat. This is a very, very special organic reaction that has to do with the specific amino acids you have in the meat. But the point is that proteins are so specific, much more specific than other molecules. If you can start to design proteins, you have a pretty amazing toolbox you could do things with in biotech. Do you know what this is? This is lots of money. Lots and lots and lots of money. This is a part of an antibody. It's a designed antibody. Adalimumab, sorry, difficult word to pronounce. I'm sure you've never heard of that. What does an antibody do? It binds to something that's considered bad for whatever reason. And what does it do then? Right, so you can argue it's the bait in our immune system, right? One arm of the antibody is basically used to activate your immune system, and then by regulating the binding parts of the antibody, we can train the immune system to recognize different things and basically start killing cells. The immune system is pretty blunt in that way. It's just a killer machine. But somehow you need to decide that you should only kill the bad things and not the good things. Antibodies, as you see here, are predominantly beta sheet structures. Do you see also that there are multiple domains here? There's some sort of domain here and then something up here and then something up there. At this point, at least for me, it starts to get difficult to keep track even of the beta sheets here, right?
So now you're gonna need to move up to some sort of even higher organizational level. And antibodies are frequently drawn as Ys, and the reason for these Ys is that you have some small and large subunits. There are lots of disulfide bridges between them, and the whole point is that these variable parts up here are the ones that are trained to recognize the pathogens. And then you can reuse 95% of the structure, which is the part that should activate your immune system, and just change the 5% to target different pathogens. With this antibody approach: there are lots of diseases, right, that the body isn't very good at treating, or there are cells that for whatever reason you would like to kill, for instance a cancer cell. So what if I could now tailor-make an antibody where I redesign this part to bind to something that I think is bad but that your body hasn't been trained to recognize as bad yet? It could, for instance, be a cancer cell. Cancer cells express lots of things on their surface that might not be expressed quite as much in normal cells. So then I could tailor-make antibodies that would bind to the bad things, and then your immune system would do the rest of the job. Just kill the cells. You could go straight throughout your body and, with razor precision, kill just the cancer cells, nothing else. That would be pretty amazing, right? This is already on the market. So how is this different from a normal drug? It's more specific. It's a protein drug, right? How does a normal drug work? A normal drug, a benzene molecule, I can't even draw benzene. If anybody says drug, think benzene. It's slightly more complicated. You might typically have two rings and then some other things sticking out, but a normal small drug is a small, simple organic compound. It can't have too many atoms.
That will of course hopefully activate some receptor, but it's also gonna bind in 500 other places. The reason why we approve something as a drug is that it hopefully doesn't have too severe side effects, or at least the side effects should not be as bad as the disease. This, on the other hand: can you imagine how specific the binding of this will be compared to such a small drug? You're talking about billions of times more specificity. So this would be amazing. The only problem is, how do you need to take this? You need to inject it. So this adalimumab that you've never heard of, this is something called Humira. And Humira sold, actually this number is old. Last year, in 2015, Humira sold for over $14 billion. Do you know what Amgen is? Amgen is one of the world's largest pharmaceutical companies. This single drug accounts for 65% of all Amgen profits. Can you imagine the risk to the company? For each of these companies, every single drug like that is make it or break it. If this fails, the company dies basically. No, it's not quite that bad, because they have some margin, but that's also why they need to charge so much for it, because before they found Humira, they probably had another 50 drugs that failed, right? The companies tend to go for blockbusters now. This is in general one of the absolutely strongest trends in modern drug design: the really powerful and really efficient new drugs tend to be biologicals. Have you heard of biologicals in the other courses? So biological drugs, or biosimilars, are drugs that work like normal proteins. So it's not the traditional approach; the traditional approach would be this one. And the traditional way of finding a drug is to go into the Amazon and realize that some strange plant tended to, no, I'm serious. Some strange plant tended to have an effect on, for instance, reducing your heartbeat or something. And we know that that was good in some cases.
And then you spend 10 years trying to purify the substance. That substance is not gonna be good enough. And then you need to go into the lab and optimize that substance even more. But historically, most drugs exist because we started from something that happened in nature. And this is a fundamentally different way of designing drugs. One of the reasons why we have to do that today is that most historical drugs would never have been approved on the market today. Aspirin, there's no way aspirin would have been approved today. There are far too many dangerous side effects and deaths. But of course, because it's already on the market and you don't even need a prescription for it, we tend to accept that. There's not a chance it would have gone through even stage one today. And because of these increasing requirements for lower and lower side effects and better efficiency, that's why we tend to go with biologicals. The other thing we can do with biologicals, of course, is that you can make them even more specific, right? But there is also a problem for these companies. Could you imagine another antibody having a slightly different sequence of amino acids but still binding to roughly the same epitope? And this is what's called biosimilars. So biosimilars just mean that you can have another antibody that actually has a different sequence. It just tends to bind the same way. And then it becomes pretty difficult to patent. You just spent 50 billion developing your drug and you have a patent on it. I just create another antibody with a different sequence of amino acids. And now everybody will buy my drug instead. So it becomes very different, I wouldn't say difficult, because they're pretty good at making patents here. What they typically do as a company is that they create a carpet bombing of 100 patents on formulations and dosage and everything. But learning how to protect these drugs is a very big thing in the pharmaceutical industry.
And even this is a fairly horrible, simple drug. Imagine if you could create a small molecule that was just three or four beta strands and an alpha helix. Because this one is still following the rules of your immune defense. What if you could create an even simpler drug? And that's, I would argue, where most of the development is happening today: people try to create smaller and more efficient things. And ideally, you would of course like something that you didn't have to inject, right? Then they can really start to make money. If 14 billion per year isn't enough. Oh, sorry. Yes, I think I did share that one. Formally, Humira is actually used for rheumatoid arthritis, and there are a bunch of psoriatic diseases. And I think it was recently approved for use in some eye-related diseases too. So that's the other thing these companies do: you would like to extend the protection of your drug, right? So you can find a new disease and prove that it's good for this one too. Or you can slightly change the amino acid sequence. Basically, if I was making 14 billion per year or something, I would probably try to extend this patent protection too. Does this mean that companies are bad? Pharmaceutical companies? They certainly make a shitload of money from this, right? So this is the problem. I'm certainly not trying to defend these companies. In particular, these large companies have nothing more to do with research than pure marketing companies. I bet that Amgen didn't, well, I know that Amgen didn't develop this. It was a smaller company called AbbVie. And AbbVie in turn probably bought the original invention from some other place. So what happens is that these large companies just buy drugs that have been successful in smaller companies, and then they do the marketing. So the amount of research in the large companies has actually gone down.
But on the other hand, there's a very vivid market for research, and there are so many diseases that we can treat now that we couldn't treat a few years ago. And here's the problem: I wouldn't necessarily defend this and say that it's great that drugs are expensive, but if you can choose between making fighter jets or making drugs that cure cancer, I wouldn't necessarily blame the company that's making drugs that cure cancer, right? So at some point we have to choose, because our requirements for regulation are so high and everything. Well, that drug they're making a lot of money from. But for every such drug, there are a thousand drugs that fail. So unfortunately, I would say that for the whole pharmaceutical industry, this whole pattern where drugs are becoming more and more expensive to develop and more and more expensive to buy, it's failing. And overall, we're gonna need completely new approaches to drug design. Many of them are likely gonna be based on personalized medicine and genetics, but that's kind of gonna be your job for the next 10 years, to solve this problem. We also, yes, I think I have 10 minutes on my hands. I'll see if I can go through the last few slides. So now we talked about pure beta sheet proteins and pure alpha helical proteins, but there are also mixed ones. And I kind of like the mixed ones. They're beautiful. You can't mix a single beta strand directly with alpha helices. Why? This beta strand would have to make hydrogen bonds with something, right? And a single beta strand can't make hydrogen bonds with the helices. So then there are gonna be two ways of doing this. You could either have alpha, beta, alpha, beta, alpha, beta, so you alternate them back and forth, or you could have a larger protein where, say, the first part is only alpha helix and the second part is only beta sheet.
You probably spoke about this a little bit in bioinformatics and classification, did you? Well, okay, nope, then I'll... So the simplest ones are the alternating ones. And one way of doing that is, in particular, that we wanna use parallel beta sheets. So here you have the beta sheet on the inside and the alpha helices on the outside. So what you do there is that you have one beta strand going up, and then you go on the outside and have a helix going down. And then you go back and you can now have a second beta strand that goes parallel to the first one. It goes up again. So by alternating them, you're using the alpha helices to get back to the start of the beta sheet. This is called a TIM barrel, and the barrel is the inside. Yeah, oops, I shouldn't throw that. This is another structure where slightly... So here it's more complicated. So here we have one structure. Here you have sheet, helix, sheet, helix. So you end up with some sort of sheet stuck between two sets of helices. But in the other part of the structure you have a pure beta sheet domain, right? Yes. I would say they are coils. Because when you say turn, I would usually reserve... I might very well be sloppy and say turn at some point. But the real turns are these super tight turns you have in beta sheets, where you just need two residues to reset and go back in the opposite direction. So here I would call them coils or loops. So even between... Parallel beta sheets. No, but if you have parallel beta sheets you would need some sort of coil. But typically, having that long coil is bad, right? So you're typically gonna have some sort of... Anytime you see parallel beta sheets you typically see these structures. You have one strand and then you need a helix to go back, and then you have one strand and then you need a helix to go back. Since it's Friday, I think that some of you... Since you are so fond of labs, you're gonna do some labs on this course tonight. You know what this is?
This is what you use to break down alcohol in your liver. And I would suggest that you do some experiments on this purely in the interest of science, of course. So there are some... If you look at the worldwide population, there are actually some genetic differences here. So in parts of the world, in particular in Asia, parts of the population at least have a deficiency in alcohol dehydrogenase. I guess it's good if you want to get drunk, because you don't need to drink as much. For some reason, in the Nordics it's the opposite. So in the Nordics you have this vodka belt in Northern Europe and everything, and people frequently have lots of alcohol dehydrogenase. Why? I have no idea. Normally it's like... Alcohol is not typically something you feed on. So you could argue that it's not something humans should need. But I guess for historical reasons we have... Alcohol is a very important part of culture, and being able to use crops that have been fermented or something. It's just a fluke of nature. Yeah. And there are... I don't know. That's a separate chapter. I'll come back to that when we talk about membrane proteins. Alcohol is an amazing molecule just from a cultural perspective. There are 4,000 years of human culture related to ethanol, both use and abuse. So there are a couple of these folds that I think are good to have heard of. The Rossmann fold is this... You saw the Rossmann fold on the last page, but I didn't call it that. So the Rossmann fold is really this concept where you alternate between sheets and helices, sheets and helices, so that all the beta strands are parallel. And then you get one very nice beta sheet between two layers of helices. And it was Mike Rossmann who found it. This is very common in binding sites, particularly for nucleotides and everything. And one of the reasons is that up here you have the ends of both the helices and the sheets, so you can have fairly nice polar regions up here.
So here, right at the end of the beta sheet, is typically a very, very nice way to create a binding site. It's revolution. So when you see anything like that, if you see any structure like that and I ask you to predict where the binding site is, at the end of the sheets or the end of the helices is usually a very good guess. The likelihood that the binding site is there is zero. So I would say that there you could have a binding site, or you could have a binding site up here, or you could have a binding site up here. Because that means unpaired hydrogen bonds, and that's also frequently going to be good. Yes. So I think we can show... Yes, I'm sorry. So what you frequently end up having here is that you first have either helices or sheets, right? And if you have a binding site, you're going to need some sort of flexibility in the structure. So the mere fact that you have loops, so that you can somehow change the structure a little bit, means that you have some free amino acids that are not necessarily bound inside the secondary structure. And with these amino acids, depending on what type of amino acids you have here, you can create something that has good binding properties for whatever you want to bind. And the way to see this structure is: if you want your amino acids to be free, right, you're going to need to have them at the edges of your secondary structure. In theory, well, the same thing here. Well, actually, no, this is just beta sheets. If you look at helices, remember the thing I told you about the ion channels: because you have these dipole moments in the helices, say at the end of the helix, something happens. Once you're in the middle of a helix, to actually interact with the helix you would typically have to break its secondary structure, and helices don't like that. No secondary structure would like to be broken apart. So the other structure that you saw in the ADH is that you can have one part with just helices and another part with just sheets.
And then you would typically have a few alpha helices, a few beta sheets, a few alpha helices, a few beta sheets. They also occur. And when it comes to protein structure prediction, we typically classify proteins in a few very large classes that I think we have now gone through. So you can typically say that something is purely alpha helical, or you can say that something is pure beta sheet, or there could be these two ways of mixing helices and sheets. So the reason I bring this up: do you see the hierarchy again? So now we're even larger; now we're not just looking at supersecondary structure. Now we start to look at folds. We have multiple helices and sheets organizing, but these are still just complicated blobs of atoms. But as we keep moving up, the hierarchies keep coming back, and organizing things in hierarchies makes it easier for us to understand. These occur in one very common place that I will deliberately speak a little bit about later in the course. Last year the students felt that they would have liked more about DNA. These are typically very useful for binding DNA. So in particular these beta sheets, they tend to bind very beautifully in these ridges in the DNA. The zinc finger is one of them, and there's this TATA-binding protein, that's another one. And the reason why it's called TATA is literally T-A-T-A; it's not really more complicated than that. And unfortunately I hate, at some point I hate the fact that you have a psychopaths. Why would you need to bind to DNA? Initiate transcription, right? Could you imagine any other reason to bind to DNA? Could you imagine any other way to bind to DNA? Or prevent transcription, right? Because you can also say, this is a gene you would no longer like to express. And then by having, not the TATA-binding protein then, but having something else bind there, you could prevent it from being expressed. So that's also something you could use, say in cancer.
If some genes are expressed too much, could we somehow block those genes so they're not expressed so much? And whether you do that with a protein or with something else, somehow you're going to need to find a way to interact with the protein. And whatever you want to create to bind here would have to bind better than this protein. Because if you don't bind better than this protein, this protein is going to knock out your molecule. So you're going to need to find something that has a lower free energy of binding. And now this is a pretty complicated, large binding surface, right? So if you would like to create an antibody or something that binds better than this, it's not entirely trivial. I have three more slides, so I'll do those so that you can have a weekend, and then I'll finish. Some of these beta sheets can at first sight appear to have quite irregular structures. I showed you this last week; I spoke about these knots, sort of cysteine knots, right? Where you have the disulfide bridges that create small and very regular structures. Not all proteins look like that. These are some neurotoxins that literally have an unstructured long loop there. Why on earth would nature have a long loop there? Well, but if the loop is not important, right? Nature would just cut out the loop and produce a smaller protein. So there's a lot of research today where you talk about disordered proteins. So what are disordered proteins? But that's kind of strange, right? There is no point in just carrying around amino acids that aren't used. Exactly on that. And now we're very close to the research front. So in just the last five or six years, people have increasingly found that what happens with most of these irregular, disordered proteins is that they become ordered when they bind something else. So if you just look at the neurotoxin itself, it's going to appear to be very disordered. And they're typically so disordered, you can't even determine their structure.
You can have an NMR structure and have a very rough model. But what then happens when this binds to something else in the body that it is targeting is that suddenly this might form stable beta sheets or something. And that's why you need the residues. So I'll just finish off with one thing. I've shown you a couple of folds, right? And so how many of these folds do you think there are in nature? If you look at small parts of structure, proteins or something. So there was a very famous paper that I think we have a link to in Mondo (otherwise I'll make sure we add one) by Cyrus Chothia from Cambridge. And the title was A Thousand Folds for the Molecular Biologist. And Cyrus predicted that there are only in the ballpark of 1,000 different folds. Despite all the diversity you can imagine we have in peptides and amino acid sequences, right? But for all these amino acid sequences, for whatever reason, there are only roughly 1,000 folds that nature tends to reuse for all of them. And there are many reasons for that, which we will get back to discussing later. But one of them is likely that there aren't really that many ways, if you consider 1,000 to be limited, that you can form stable small structural building blocks or relatively large structural building blocks. Now, Mike Levitt wrote a paper in PNAS almost 10 years ago, it is 10 years ago, where we actually followed up on this. And again, it's like 30 years later, right? No, 20 years later. And this appears not to hold exactly. There are more than that; depending on how we classify folds, I would think there are probably 1,500 to 2,000 folds today. But the point is that there are not 100,000. And we probably know at least a factor of 100 more proteins today. So the number of new folds we discover is not really growing that much. There are a few new folds every year. And that's extremely relevant to the bioinformatics course that you went through. Because that means that, we certainly discover new protein structures, right?
But what we are now increasingly seeing is that most new proteins we find are similar to something we already knew. So there are fewer and fewer white spots on the map. And that's really how we're going to be able to design proteins in general. Most things are no longer white spots on the map. So, I think we'll finish there. This is one cool thing. If you start to design proteins, well, you know that the structure is encoded in the sequence. But if you take a large alpha helical structure and change one amino acid, that's not going to turn into a beta sheet, right? But if I change 100% of the residues into something that's stable as a beta sheet, then it should be a beta sheet. So at some point, there's going to need to be a crossover here. And Lynne Regan, at Yale University at the time, showed that for a small protein, they managed to find one single protein where they changed less than 50% of the residues. This probably doesn't sound that impressive to you, that by changing half the residues, they could turn this into something else. So they started with this protein and then by changing less than half the residues, they could turn, let's see here. Yes, they could turn the partly beta sheet protein into a four-helix bundle. No, of course, they changed it in various places. They carefully picked the residues they changed. But based on what you know about bioinformatics, if two proteins share 50% sequence identity, that's extremely high, right? That's each my left true territory. So if I had ever given you this sequence and said there is a protein that looks like that, and I have a second protein that shares 50% sequence identity with that one, every single one of you should have said, based on your bioinformatics knowledge, that it obviously has the same structure. It doesn't.
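The 50% sequence identity figure above can be made concrete with a small calculation. This is a minimal sketch, assuming the two sequences have already been aligned to equal length with `-` as the gap character; the function name and the choice to ignore gapped columns in the denominator are assumptions for illustration, and other definitions of percent identity are in use.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned (non-gap) columns where both residues match.

    Assumes seq_a and seq_b are rows of one pairwise alignment, so they
    must have the same length. Columns with a gap in either row are
    skipped (one of several common conventions).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    # Made-up toy sequences, not taken from any real protein.
    print(percent_identity("ACDEFGHIKL", "ACDEFAAAAA"))
```

In the toy pair, five of the ten aligned positions match, so the two sequences share 50% identity. The number by itself says nothing about which residues differ, and that is exactly the point of the example in the lecture.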
So there is no rule in biology without exceptions; structure is sensitive. If you pick your residues carefully, then even though in general, on average, you would assume that they had the same structure, there are cases where relatively small differences, or at least smaller differences than you would expect, can lead to a completely changed structure. Yes. So you're of course right in the sense that evolution typically doesn't work that way. But again, there are plenty of cases where more than 50% of the residues in a protein have changed and they're still homologous. So the reason why evolution survives is typically because we need to reuse things, right? And it's unlikely that that protein would have exactly the same function as that protein. But I think we'll come back to that when we talk more about evolution later, because this starts to influence what function does. There are a number of chapters booked, sorry, study questions, that you can go through. I will come back and talk more about protein structure next week, because we're going to become more and more biological. Björn and Dori have a third lab for you this afternoon that I think still builds on statistical mechanics, but they are gradually going to catch up and have you work with real proteins there too. So have a nice weekend.