What I'm going to talk about today: we'll start with a bit of recap from last time, and then we're going to get into beta sheets. I'm actually going to repeat the alpha helices a bit, because I realized there were lots of equations yesterday. In the second part today, I'll have a chance to speak about modeling of real biomolecules. And then for the rest of this week, we're going to be looking at real proteins, so by now this is suddenly going to become very realistic. Oh, sorry. My bad. Send those around. But before that, we have these relatively short questions that we can actually use. I think this might even be a useful way to start discussing the thing I mentioned in my mail on Sunday: in particular, try to introduce all these concepts without resorting to the fancy definitions. The fancy definitions are great in some ways. For instance, in an exam, I am a physicist in the sense that rather than writing half a page describing something, it's beautiful to just state the definition, period. And then you're done with it. You get these questions, you can answer in two seconds, and you know you're right. But that is not necessarily the same as understanding it. And when it comes to understanding, I think it helps a lot to be able to express things in normal words. That's why we have this small challenge that I mentioned, about Richard Feynman. So I'll actually let you lead this. Start where you want, but let's focus around these five letters. What have you learned in the course? What is energy? In one way, this is something we haven't really introduced. Energy is a concept that, superficially, you probably all know in some way, right? But if you can't really explain what energy is, how much do we really know? Exactly.
But physically, energy is really just a concept. It's something we've defined: a conserved property that appears to describe these things very well, and that you can convert into different forms, potential energy, motion and kinetic energy, chemical energy, or anything else. In principle, this is a fairly complicated concept. The only reason you don't think of it as complicated is that it aligns extremely naturally with everything you see in the real world: chemical energy in gasoline, potential energy if you're raising something above the floor. But that's just because we were lucky. It's similar to that temperature factor T we saw, which just happens to correspond very closely to something you all feel exactly what it is, because you feel that it's warm, right? But don't mistake those coincidences for something being simple. That just means you were lucky and could understand it. Then you talked a little bit about what types of energy you could have. So in our case, what energies are relevant? What energies do you need to consider? No, free energy comes much later. We're still talking about, and actually that's a really good point. So sorry, I made one mistake here: I called this energy, and I should be ashamed of myself. But this is a really good example, because in practice we all say energy, because it's too complicated to say anything else. Forget about energy; we should actually not call it that. We can call it E, but then you might call it potential or something. So if we for a second stick to potential energy, what would that be? Partly, but I would say it's an energy that is specified as a function of the coordinates. The simple example would be a brick that I'm lifting one meter from the floor. But inside a protein, all the specific conformations of your atoms, bonds, angles, torsions, all of that is potential energy.
Electrostatics, because you have an energy that depends on the proximity or distance between two particles. Lennard-Jones is just based on where they are. That's by far the most important energy: typically when I say energy in biomolecules, we frequently mean potential energy. In principle, there is kinetic energy too. The book goes into some detail about this, and I actually haven't covered it. But the energy is a sum, right? The energy is a sum of potential energy and kinetic energy. So in all these Boltzmann factors and everything, the full energy enters into the exponential, right? No, no, for now we're just calling it energy; energy is all energy. But I'm saying that if we for a second pretended there was just potential energy, then it would be easy. No, E is all types of energy. Yes, it's the sum of everything. If in this case we just talk about potential energy and forget all the other ones, this still looks fairly easy. In general, there are many more forms of energy than potential energy. We typically don't care about, say, chemical energy, because we're not really looking into chemical reactions here. But the second you have a chemical reaction, there would be some sort of chemical energy. Even that chemical energy you could describe as a quantum mechanical part, right? So chemical energy is in a way also potential energy, but suddenly it depends on the electrons that change when you have a reaction. But the energy also has a kinetic component, and the kinetic component is super simple. You all know this from high school. What is the kinetic energy of a particle with mass m? No, not mc squared. That would be the rest-mass energy or something; it's mv squared divided by two, right? We're not doing relativity here, so for a second let's forget about mc squared.
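As a concrete illustration of "energy specified as a function of the coordinates", here is a minimal sketch of the two nonbonded terms just mentioned. The parameter values (epsilon, sigma, the charges) are made up for illustration; real force fields tabulate them per atom type.

```python
import math

def lennard_jones(r, epsilon=0.2, sigma=3.5):
    """Lennard-Jones pair energy; epsilon (kcal/mol) and sigma (A) are illustrative."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 ** 2 - sr6)

def coulomb(r, q1, q2, k=332.06):
    """Coulomb pair energy in kcal/mol, with r in Angstrom and charges in units of e."""
    return k * q1 * q2 / r

# The potential energy depends only on where the atoms are: at the
# Lennard-Jones minimum, r_min = 2^(1/6) * sigma, the energy is exactly -epsilon.
r_min = 2 ** (1.0 / 6.0) * 3.5
print(round(lennard_jones(r_min), 6))   # -0.2
print(coulomb(3.0, 0.5, -0.5) < 0)      # opposite charges attract -> True
```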
The kinetic energy is really complicated, because it will depend on temperature and everything, and on average it's going to correspond to 300 Kelvin. But that doesn't mean that every single atom has exactly that velocity. That's an ensemble, and you can even show that the velocities follow a Gaussian distribution. Some particles are moving very fast to the right, others are moving very fast to the left, and some aren't moving at all. So in principle we don't know that. However, since the energy is a sum of terms, when you have that sum in the exponential, it separates into a product of two factors, right? So the entire kinetic part will really be a constant, and then you have an exponential that depends on the potential energy. You can factor it out. So you get one part of the partition function that has to do with the distribution of the potential energy, and another part that has to do with the distribution of the kinetic energy. So when we're studying these distributions of where molecules exist and everything, we can usually ignore the kinetic energy. But in principle, kinetic energy would factor into the energy too. Then, related to what you were saying, there is a relation. For all these reasons, because I and everybody else keep confusing energy with potential energy, or with free energy, or at least we confuse others, there is a separation between E and H, too. And H was what? Sorry? Enthalpy. Yes, and what is the difference between enthalpy and energy? Enthalpy includes expansion work. Exactly. So the energy would correspond to a system that is not completely isolated, but can only exchange heat with the surroundings; with enthalpy, you can also do work on or receive work from the surroundings. To avoid confusion with free energy, it's actually much better to speak about enthalpy, because when you speak about enthalpy, it is very clear that you do not mean free energy, right? So if there is the slightest possibility of confusion, call it enthalpy, even if you ignore that PV term.
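The factorization argument, that a sum in the exponent becomes a product of Boltzmann factors, so the kinetic part cancels out of any ratio of conformational probabilities, can be checked numerically in a few lines (energies in units of kT, values arbitrary):

```python
import math
import random

kT = 1.0  # work in units of kT
random.seed(0)

# exp(-(K + U)/kT) always equals exp(-K/kT) * exp(-U/kT):
for _ in range(5):
    K = random.uniform(0.0, 5.0)  # kinetic energy of a sampled state
    U = random.uniform(0.0, 5.0)  # potential energy of a sampled state
    joint = math.exp(-(K + U) / kT)
    product = math.exp(-K / kT) * math.exp(-U / kT)
    assert abs(joint - product) < 1e-12

# Because the kinetic factor is the same constant for every conformation,
# it cancels in any ratio of conformational probabilities:
U1, U2, K = 1.0, 3.0, 2.5
ratio_full = math.exp(-(U1 + K) / kT) / math.exp(-(U2 + K) / kT)
ratio_potential_only = math.exp(-U1 / kT) / math.exp(-U2 / kT)
assert abs(ratio_full - ratio_potential_only) < 1e-12
print("kinetic part cancels:", round(ratio_full, 4))
```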
Of course, I won't remember to do that because I'm too old; I call it energy. But I feel a little bit guilty when I do. Still, that part is fairly easy. Both of these correspond to classical quantities that you could understand with high school mathematics. Actually, you could understand the others with high school mathematics too; it's just that we don't bring them up in high school. So then we have S, F and G, which are what? And why are they important? Why do we spend so much time on them? Yes, but now you're just reciting the definitions. I'm not interested in the definitions. I'm interested in you explaining this to your grandmother. Yeah, so freedom is probably a good word. And in particular, I like to avoid this whole concept of disorder, because I think it's a stupid definition that leads you wrong. The whole concept of disorder forces you into trying to understand entropy on a high level. Don't try to understand entropy on a high level. It's something you define on the lowest level. In particular, the word I like most for entropy is volume, because it's related to the available volume of the system on the microscopic level. And that, in turn, is related to the fact that the way we look at the world on a large scale and the way we look at the world on a microscopic scale are different. You might have two glasses, and they appear to have the same water level. But, of course, internally these systems can have very different configurations, right? The particles can be arranged in completely different ways. And the point is that on the large scale, in the big world, how likely we are to observe or measure something depends on how many different ways there are to achieve it on the small scale. If it's very difficult to achieve something on the small scale, it's going to be very unlikely to measure it on the large scale.
While if there are tons of ways you can achieve a certain state, it's going to be much more likely to see. And the problem is that both those systems can still have exactly the same energy, so you cannot separate them just by looking at the energy. We need some way of describing this: some states are more likely than others, even though they have the same energy, right? Because there is a multiplicity, many available microscopic states, many "volumes", in which the system can be. And that is really what we are describing with entropy: how many ways you can organize something internally. And that enters, for instance, in a protein. As you've seen on some of these slides, there are some tricks that I actually don't talk about. Remember that I said, when you look at the hydrogen bonds, you take one hydrogen bond and form it either in vacuum or in a protein, and I told you that when you do this inside a protein, there is virtually no difference in energy. That is not exactly true, but to a first approximation it's true for the folding of an entire protein, too. There is no clear difference in energy when you fold a protein; you can't explain protein folding by energy. It's really a combination of energy and entropy. Because even if you have the same energy, there are differences in how many states you have available. And that's going to decide whether reactions happen or not. When you combine these things, you get the whole story. Is there something else? Does anybody else want to say anything about entropy? Do you feel that you understand it? Send me a link to it, and we can put that up on the website. I think there are two ways of approaching this. I was about to say, anybody who says they understand it is kind of lying. At some level, you have to accept that this is a definition.
You know what you're talking about when we're talking about these low-level states. And I think that's important: understanding the concept of these microstates, and that there is more than one way to skin a cat, to organize a system to achieve a certain energy. The second you understand that the microstates are the microscopic volume, you accept the fact that the entropy is just the logarithm of that. That's a definition. On that level, you don't really have to understand entropy more. The microstates, I think, you can understand conceptually. Understanding logarithms conceptually, I wouldn't say it's impossible, but it's not entirely easy. As I told you yesterday, the reason we take the logarithm is so that we can add entropies instead of multiplying them, because if we worked with the probabilities of states, we would have to multiply them instead. And then we spent some time, no, actually I did not show it, but I had some links: you can actually show that if you assemble these expressions for free energies, the free energy will determine how much energy really is available to perform work. Because you could, of course, argue that if I put a rock on the floor here, the rock has a potential energy relative to sea level. But that work is not really available, because I can't go through the floor to get that free energy. But if I have a rock two meters above the floor and drop it, that energy is available. It will convert into, well, essentially heat at the end. So what are those expressions? Sorry, what was that? They're both free energies, but one of them actually brings the volume into the equation. Yes? The equation. And do you remember the expressions for them? Almost. Minus the. So the first minus is a plus. The plus. Although this, of course, depends on what volume you're talking about, but normally we talk about the volume of our small system.
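A tiny sketch of why the logarithm is taken: microstate counts of independent subsystems multiply, so their entropies add, and the macrostate with the most microstates is the one you are most likely to observe. Coin-flip arrangements stand in for real microstates here, purely for illustration.

```python
import math
from math import comb

def entropy(W, kB=1.0):
    """Boltzmann's definition S = kB * ln(W), with W the number of microstates."""
    return kB * math.log(W)

# Microstate counts of two independent subsystems multiply,
# so the log turns the product into a sum of entropies:
W_a = comb(10, 5)   # ways to pick which 5 of 10 "coins" land heads
W_b = comb(8, 4)
assert abs(entropy(W_a * W_b) - (entropy(W_a) + entropy(W_b))) < 1e-12

# The macrostate with the most microstates dominates what you observe
# on the large scale: 5 heads of 10 can happen 252 ways, 10 heads only 1 way.
print(comb(10, 5), comb(10, 10))  # 252 1
```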
And the best way you can think of that is that if you expand the volume against a constant external pressure, you're increasing the internal energy of the system, right? But normally we ignore this volume. And why do we ignore this volume and pressure effect? It's not significant in our case; we're not building nuclear weapons. But if you are building nuclear weapons, please, please, please don't forget it, because it might be kind of important. This is also very much related to the Boltzmann distribution that we talked a bit about. You probably know the Boltzmann distribution by now. But that also means that we start encountering this kB. And sorry, question two isn't really so much about where kB enters, but why do we have these constants? I talked a little bit about that yesterday. Had you sat down and designed modern physics from scratch, you would measure temperature in energy units. Physicists have a tendency to love doing things very simply. So physicists, for instance, love to think of energy in units of kT, because if all your energies are in units of kT, the denominator here is always going to be one, right? And then if your energy is 10 kT, it doesn't really matter what the specific temperature is; you know that the probability of that happening is going to be e to the minus 10, which is a relatively small number. But this also means that in terms of our normal temperatures, kBT defines some sort of natural barrier height, right? That's what I talked a little bit about yesterday. And remember this factor three; it's more important than you think. I've even had at least two students who got these questions at their defense, and I got it myself at my defense, too. It's really fun, because you get somebody with a PhD in physics or something.
And then you ask them: well, if this barrier increases by kT, how much less likely is the crossing to happen? It's fascinating, because these students are super smart. They've spent four or five years with equations, but they fail some of these fundamental gut-feeling things. And if you don't have these gut feelings, you do not really understand it. Anybody can learn 500 equations by heart; that's not hard. But understanding the equations and knowing what they mean, that's important. In particular, yesterday we spoke about barriers, right? So how do you determine these barriers? Well, we typically try to get them from experiments and parametrize them. Are those barriers exact? No. Actually, I don't think I have any examples here, but later today we will talk about real proteins, right? So what if a barrier that was supposed to be 2 kT, because you have a slight error, ends up being, say, 3 kT? That's just two and a half kilojoules per mole. It's not a large error. Exactly, right? But suddenly you're making a factor 3 error in the number of crossings you have across that barrier. And then you see the same person report rates with three decimals. That happens all the time. And you realize that there is no way the barriers are that accurate. We talked a little bit before about how simulations occasionally give wrong results, and they do. And this is frequently the problem: people have absolutely no idea what the inherent approximations in their model are. Now, the sad thing is that this is not limited to simulations. In many cases it's actually even more common in experimental research, because we have no idea how dependent all of us are on models, equations, and everything. There is no such thing.
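The arithmetic behind that gut feeling, that a 1 kT error in a barrier changes the predicted crossing rate by a factor e, roughly 3, can be sketched in a couple of lines. The absolute prefactor is irrelevant because it cancels in the ratio:

```python
import math

def relative_rate(barrier_in_kT):
    """Boltzmann factor for crossing a barrier, in units of an (unknown) prefactor."""
    return math.exp(-barrier_in_kT)

# Mistaking a 2 kT barrier for 3 kT (about 0.6 kcal/mol or 2.5 kJ/mol at
# 300 K, a tiny error in energy) changes the predicted rate by a factor e:
error_factor = relative_rate(2.0) / relative_rate(3.0)
print(round(error_factor, 2))  # 2.72, i.e. roughly the "factor three"
```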
Have you ever looked at a protein structure in the Protein Data Bank? There is no such thing as an experimental structure there. The only experimental thing you have in the Protein Data Bank is the structure factors. Everything else is a model. The only question is what computer program and what parameters you used for your model, how you tried to fit the atomic positions to your structure factors. But that's not experimental data. And that's scary, because even I frequently think of these structures as experimental; they are models based on experimental data. No? So that's up to the group submitting it. There was a great example two or three years ago where people used new programs, went back to the PDB, and re-refined structures against the deposited structure factors, so that they could obtain better models by taking the old experimental data but using better computer programs on it. But that's essentially up to the submitter, which in turn means that it's buyer-beware. If you download something from the PDB, you had better look at what group submitted it and ask whether you believe in the accuracy of the model. Hopefully it's right; if it's two angstrom resolution, it's hopefully a good model, but there are no guarantees whatsoever. Are you thinking about a simulation? That's a much deeper question. The first thing is, you can't validate it, because even an experimental structure might have a resolution of one or two angstrom, right? Typically the backbone motion you see is going to be much smaller than that. But I think you're asking the wrong question. You know what, let me come back to that after the break. It's a super good question, but think about it: how can you know that the motion you're seeing in a simulation is exact or correct? You can't. But that's not a problem, and I'll tell you why, based on what we spoke about yesterday. So I think that covers the very simple part there.
We spent a bit of time examining how a system's properties change with energy, or how the entropy in particular changes with energy or temperature. I'm not going to ask you to draw plots like that by heart, because they're hard, they're not natural. But the beautiful thing is that we haven't assumed anything about the systems, and you can still start drawing conclusions about when things are stable and when they are not. This is, of course, part of the remarkable power of these very simple expressions. They do tell us which reactions happen and which do not. And that's why all these experimental results at the end of the day usually come down to trying to convert our experimental data into free energies, so that we can say whether things happen or not. That in turn led to these phase transitions that I'll come back to and talk a little about. And then we spoke a bit about transition rates and barriers, which we're going to come back to in a couple of minutes. Exactly, and that's actually what I cheated on and didn't tell you yesterday. The transition barriers are ultimately determined by the fact that for a reaction to happen, we need to get across the barrier. These barriers are always free energies. Always, always, always free energies, never enthalpies or energies. And to get over that barrier, we need to assemble enough energy to get over it. And that's pretty much going to follow the Boltzmann distribution, right? Sorry, it is the Boltzmann distribution. But then, of course, it's not going to be enough for one molecule out of a mole to make it over for something to happen. So we had some arguments that you need at least a substantial part of your molecules to move over. This becomes really complicated, because all real reactions have multiple barriers and multiple steps.
The beautiful thing is that, as much of an approximation as this is, it's an approximation that works simply beautifully. Small, simple proteins fold in a single step. This works, and we've even published things on this in the last 10 years. You can show that you have a folded state and an unfolded state, and there appears to be a single barrier between them. That's certainly not true for a ribosome or something, but for something small, this is not a bad approximation. I don't have the slides today, but I can dig up some examples for tomorrow. It's not just a model; it fits really well. So, one important property of free energy: a free energy only depends on the state. I think I touched upon that last week, but I probably didn't emphasize it enough. That's a very important physical concept. A free energy only depends on the state, not on the way you got there. That also means that each state has a free energy. The caveat is that, of course, it's up to you to normalize your scale. The free energy difference between two states is always well defined; you might not be able to calculate it, but as a physical concept, it's well defined. Whether you can also assign an absolute value, deciding what the free energy of a state is compared to absolute zero or something, that becomes even harder. But the free energy can't depend on the path you take to the state. Why? Assume you have two states A and B, and we say that A is zero; it's always good to start from zero. And then, when I go from A to B, suppose there is one way that costs, say, 5 kcal, and another way that costs 3 kcal. What is the problem with that? No. Now we're talking about free energy, so this includes everything. So if I go from A to B along the 3 kcal way, how much did I pay? 3 kcal. Then I take the other path back. How much did I gain? No.
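For the two-state picture, the folded and unfolded populations follow directly from the free energy difference via the Boltzmann distribution. A minimal sketch, with an illustrative stability of 5 kcal/mol (the specific number is an assumption, not a value from the lecture):

```python
import math

R = 0.0019872  # gas constant in kcal/(mol K)

def folded_fraction(dG_unfold, T=300.0):
    """Fraction folded for a two-state (folded/unfolded) system.

    dG_unfold: free energy of unfolding in kcal/mol (positive = folded stable).
    """
    K_eq = math.exp(-dG_unfold / (R * T))  # [unfolded] / [folded]
    return 1.0 / (1.0 + K_eq)

# With dG = 0 the two states are equally populated; a few kcal/mol of
# stability already makes the protein almost entirely folded at equilibrium.
print(folded_fraction(0.0))              # 0.5
print(round(folded_fraction(5.0), 4))
```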
Well, it's 5 kcal down along the other way, right? So I just gained 5 kcal. 3 minus 5 is minus 2: I just gained 2 kcal for free. Good, I'll take the 3 kcal way up again, and then I just gained another 2 kcal. That's a perpetuum mobile, right? But energy is conserved. If I start at A, go to B, and then go back to A, I can't have gained free energy. That's impossible. It would violate conservation of energy and all the rules of physics. So that's why it's very simple to see that the free energy can't depend on the path; it only depends on the state. That is not as trivial as you might think. We're going to talk more about this when we come to membrane proteins. Yes, no, but it's a free energy landscape. Think of it as free energy. And I know this was bad, because I showed you an energy landscape the very first day. At that point, we hadn't introduced entropy yet, so I called it an energy landscape, mea culpa. It's a free energy landscape. It's always a free energy landscape. Technically, you could draw an energy landscape, but it's only the free energy that matters. Well, if you start somewhere in a landscape, no matter what path you take, once you get back to the same point, you are at the same height, right? It doesn't matter what path you took. Applying this to helices and sheets is what we're going to do today. So what I'm going to do is, sorry, this is mostly the stuff that I talked about yesterday. Let's get back to the helices, which I'm well aware I went through relatively quickly yesterday. Already in the first week, we looked a lot into thermodynamics, and compared how thermodynamics influences the distribution of helices and sheets, and what we thought was expensive or not. So, first: thermodynamics is occasionally a bit ill-defined. What, in general, is the difference between thermodynamics and kinetics?
That's certainly right. You could argue that kinetics involves temperature too; if you don't have any temperature, you're not going to get over any barriers. Thermodynamics primarily concerns what happens at equilibrium. Equilibrium doesn't necessarily mean that nothing happens anymore; it just means a stationary state, so that the distribution doesn't change. You can, of course, still have individual particles moving between states. A great example of thermodynamics would be how much protein is unfolded versus how much is folded. In particular, this means: what part of the free energy landscape is thermodynamics primarily concerned with? Yes, the valleys, right? Because at equilibrium, that's where we're going to spend our time. If you cut off those red peaks, to a first approximation it's not really going to influence your distribution. Kinetics, on the other hand, has to do with how fast things happen, and by definition we're looking at things as they happen. Technically, we don't even need to assume equilibrium. Remember that when I spoke early on about the Boltzmann distribution, there is a different way of formulating it as detailed balance: not just looking at the distributions between states, but standing on the knife's edge and counting how many particles go left versus how many go right. Then you're looking at the transitions instead. And the transitions deal with what parts of the free energy landscape? Yes, the peaks. On average, we're not going to spend any time there, right? But these peaks determine how fast reactions happen, or if they happen at all. Because, again, in terms of physics, it's just a matter of how fast they happen. In biology, anything that would take more than 100 years is not going to happen, because the organism will die first. Well, so that's what I brought up yesterday, right?
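The point about peaks and biological time scales can be made quantitative with the exponential dependence on barrier height. The ~1 ps attempt-time prefactor below is an assumed illustrative value, not a number from the lecture:

```python
import math

def mean_waiting_time(barrier_in_kT, t0=1e-12):
    """Mean time to cross a barrier: t = t0 * exp(barrier / kT).

    t0 is an assumed ~1 ps attempt-time prefactor, purely illustrative.
    """
    return t0 * math.exp(barrier_in_kT)

# The rate is one over the waiting time, so it carries the same
# exponential with the opposite sign:
rate = 1.0 / mean_waiting_time(10.0)
assert abs(rate - 1e12 * math.exp(-10.0)) / rate < 1e-9

# With this prefactor, a ~50 kT barrier already takes longer than a
# century, i.e. effectively "never" on a biological time scale:
seconds_per_year = 3.156e7
print(mean_waiting_time(50.0) / seconds_per_year > 100)  # True
```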
That the probability of going over a barrier goes down, and the time it takes to go over a barrier goes up, exponentially with the barrier height. So it's going to look almost exactly like the Boltzmann distribution, but you have to be a bit careful about whether you're speaking about times or transition rates. The transition rate is one over the time. That means transition rates are going to have exactly the same kind of expression, but then we're talking about transition rates, not populations. So in principle yes, but you have to be a little bit careful with your definitions. What I said about a helix is that for systems in general, such as ice and water, it's very expensive for phases to coexist, for the same reason that it's expensive for a drop of oil to coexist with water: you end up having a very large surface. That's because it's a two- or three-dimensional system. A helix is one-dimensional: you can grow the length of each helical segment without increasing the surface between the helical part and the coil part. That's a very deep result in physics, but the point is that in a one-dimensional sequence, such as helix and coil, the helical parts can coexist with the coil forever. There is nothing that penalizes it. And if you would like to describe that properly, instead of with hand-waving, the way to do it would be the free energy, right? And the free energy would be the enthalpy, the initiation energy, minus the temperature times the entropy, which is the logarithm of the number of states. And the problem is how on earth you determine the number of states in a helix; you can't really do that. So what you do is say, well, you know what, let's just say that each residue here has one degree of freedom or something. It doesn't matter; it's just going to be a scale factor.
And that's why we then end up with the result that if you have N total residues, but a small number n of those have been locked into the helical part, the remaining entropy is the logarithm of that difference. And for very long chains in particular, this term is going to be so much larger that it's perfectly fine to start initiating a helix even though we don't turn the entire thing into a helix. I spent some time talking about this helix-coil mixing. The point of showing that is just that we know, by definition, that at some point we're going to have the length where we're balancing on the knife's edge, right? That is, the elongation free energy is exactly zero: if the helices were to become any longer than this, we're not going to be happy anymore, and if they were any shorter than this, we're not going to be happy either. So that's exactly halfway through the transition. And just by making some very simple assumptions, say that the helix can start anywhere, which is of course not true, because some residues are going to be more likely than others to start forming a helix, and then saying that the delta F for forming the helix should be exactly zero, we can actually solve for this n, the number of residues. And that number is related to things we can measure: an initiation energy that we can determine experimentally, because we can measure n experimentally. The number of residues that are helical we measure with CD spectroscopy, and then we solve for the initiation energy. By definition, remember that we're talking about residues, right? And they have their degrees of freedom in the Ramachandran diagram. One residue will have to be the first in the helical state. It might be only an epsilon before another, but one residue will have to go before another.
So also remember, when I say the initiation energy, I'm not necessarily talking about one residue. If one residue happens to be in the helical spot of the Ramachandran diagram, do I have a helix? No. So what is the definition of a helix? Yes: the second we have that first hydrogen bond to the residue four positions up the chain, that is a well-defined event. You either have it or you do not. Before we have it, we have not yet formed the helix; the second we have it, we have formed the helix. So the initiation energy is specifically the free energy that corresponds to getting to that point. And in general, this is very bad, right? There are no sequences whatsoever where this would be favorable. You pay an enormous penalty in entropy by freezing those residues in, and you're only getting one small hydrogen bond. This is really bad. So by definition, this is going to be the worst possible point. After this point, it will just start getting better; if it kept getting worse, the free energy would always be positive, and then we would never form the helix. The point, though, is that this is possible to define well. You could, of course, at the end of the day this is up to you, have defined a helix as the state where you have at least five hydrogen bonds. The only problem with that is that that state is not going to be the highest free energy, right? And remember what we said about transition rates: it's the highest free energy along the pathway that determines whether the reaction happens or not. So if you define a state that doesn't correspond to the worst state, it's kind of pointless. And I find this somewhat fun: are these states important? How frequently would you see them in a simulation? Virtually never, right? They are the states on the knife's edge. You would essentially never see these states in a simulation.
The second you get there, you're going to start forming the next helical residue instantly. So surprisingly, although these are the red points on the free energy landscape, we will never want to visit them. But they are super important to identify, because they determine how large our barriers are. And it's through these states that everything else I mentioned yesterday follows; it's just about solving it. Once we know how many residues you have in an average helix, it's just a matter of testing this for different helices, and then you can extract all these numbers. Say that the initiation energy is roughly 4 kcal/mol, positive, meaning bad. And the elongation energy might be minus 2 kcal/mol or something; it of course depends on the residue. And you can see that the entropy part here is also roughly 2 kcal/mol, but unfavorable, so we gain roughly four or five with a hydrogen bond and then lose half of it back because we're freezing the residue into a helical state. Twenty years ago, or even 30 years ago, it was super important to study all these helix-coil transitions. The reason for that was that computers were not fast enough; you could not simulate a helix folding. So the only way to conceptually understand what happens at the smallest levels, why protein structures form, was really to sit down with paper and pen and work through this conceptually. I think it might be fun to know historically. A long time ago, we even had a lab on this. The reason why you don't have it is that you will learn much more just from looking at what happens in a computer, where you will see all the atoms in a model. So that's partly a historical parenthesis, and it's also a parenthesis because for a few years people tried this as a sort of fancy bioinformatics prediction algorithm. Modern bioinformatics is much more efficient because you use evolution and many more sequences.
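With those ballpark numbers (+4 kcal/mol to initiate, roughly -2 kcal/mol net per added residue), a quick sketch shows why the first step is the worst point on the pathway and everything after it runs downhill. The linear model is my simplification of the lecture's rough estimates:

```python
# Free energy of a helix of n residues in the crudest possible model:
# one fixed initiation penalty plus a constant net gain per residue.
F_INIT = 4.0     # kcal/mol, rough initiation penalty from the lecture
F_ELONG = -2.0   # kcal/mol, net gain per residue (H-bond minus entropy cost)

def helix_free_energy(n):
    """Free energy (kcal/mol) of a helix with n >= 1 residues locked in."""
    return F_INIT + n * F_ELONG

profile = [helix_free_energy(n) for n in range(1, 6)]
print(profile)  # [2.0, 0.0, -2.0, -4.0, -6.0]
# The maximum sits at the very first step: the transition state is right
# at initiation, and every residue added after it makes things better.
```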
So please, please, please, never ever use this and say you learned it in Stockholm. We also spoke a bit about the rate of formation, and that comes back to how fast an entire helix might form. I would even say 100 nanoseconds is generous; it might be faster than that, 15 nanoseconds. So helices form super fast. You could easily simulate a small helix folding in a couple of hours on a computer. It's just not particularly interesting, because I know that that helix will form. But it could be a fun exercise if you want to do it in the labs. And later this week, you're gonna start having the first labs where you are working on real amino acids in water and everything. You can certainly do this; it's not hard. I also mentioned that for helices we spend roughly half the time on the first step and the other half on elongating. But the take-home message is, A, that it's super fast, and B, what type of phase transition is helix folding? It's not a phase transition. And that is exactly because it's effectively a one-dimensional system, so the phases can coexist happily all the way. So the second part is the beta sheets. And these are... Did any of you try to design a beta sheet predictor in the bioinformatics course? Did it work well? Ah, okay, that's not bad. But in general, it's harder to predict beta sheets than helices. Everything is harder with beta sheets. It's harder to keep them stable in simulations. It's harder to predict them, in particular the sheets and turns and everything. They're just a pain. And experimentally, they can take weeks to fold. It can actually be far worse than weeks, as we'll see shortly; it can be hundreds of years. And sometimes it just takes a millisecond or a microsecond. There are certainly many helices we can see folding in a simulation. So this is kind of strange: the number of orders of magnitude here must be like five or ten or so. Well, ten.
So the first question is... Just as for the alpha helices, we might want to understand: is the beta sheet formation limited by the initiation or the elongation? For helices, it was roughly 50-50, right? So why do you think it's probably the initiation? Right. And that also has to do with the fact that if we were limited by the elongation, the elongation would always be something that had energy barriers, right? And if elongation always had energy barriers, it would cost energy to elongate all the time, and that would likely never result in any sheets. So just from what you're seeing here, this is also a beautiful example. We don't even have any numbers here; we're just saying that it varies a lot. Just based on that, we can say that there must be some very high free energy barrier involved, much, much higher than for the helices. And the cool thing is that when you have this really high and clear barrier... we're actually going to show that beta sheet formation appears to be a beautiful first-order phase transition. So it wasn't just for fun that I showed you the phase transition stuff yesterday. So for the beta sheets, as always, the second you're going to start doing something, in particular when it has to do with modeling, the hardest part is that you don't know anything. So you're going to need to introduce something and define it. When it comes to something as complicated as a beta sheet: what do you mean by a beta sheet? What are the different components? And what is it really that we're talking about? Usually when you fail at things in theory or equations, it's because you didn't think enough about your definitions early on. So for a typical beta sheet, there are many ways we can look at it, but I would argue that the first really big component is when you have one beta strand that is forming hydrogen bonds to the next beta strand, right?
You could, of course, in principle, think of having a strand in isolation, but then it's not really a sheet. And for this to happen, you're certainly going to need some sort of turn between them. And if you now want to grow this, well, the first thing that's going to happen is that you're going to need to form another turn, and you're going to need to keep adding residues so that you form a third strand. And after the third strand, you're going to need a third turn, et cetera, et cetera, and then you keep adding one strand at a time. So what I'm really showing here is that you can imagine just having an isolated strand first, and then you have two strands, and then three strands and four strands. It's a concept of adding strands. And we're going to need to look into what these turns mean, too. I found some really recent articles about predicting the beta sheet turns, because they seem to be more difficult. So it depends. The beautiful thing with beta turns is frequently that you have these super tight turns that I mentioned earlier in the course; occasionally, these turns are so short that they're almost part of the sheet, right? So technically, what most of these predictors do is that they don't really predict the turn; they predict the beta sheet, and then they identify that there is a glycine or something in between, and that glycine must be the turn, because you can't have a proline or whatever in the turn. The other part is that it's not super hard to get the turns roughly right, but if you get them one or two units wrong, you're going to screw up the entire sheet. If you have an alpha helix and predict the start or the end of the alpha helix two units wrong, it's just going to be a helix that's a little too short or too long, right? You still have roughly the right helix. If you put this turn two units later, you're going to have shifted the entire sheet, right?
So the hard part is not necessarily that they're hard to predict, but that if you get them just a little bit wrong, you make some fairly large errors in your structures. How likely do you think this structure is going to be: we have a beautiful strand, and then you just have a turn, but no hydrogen bond or anything formed? Is it a good or a bad state? Compared to this one, where both strands are paired, here you have just forced this into a turn, but you don't have a single hydrogen bond or anything to stabilize it. That's probably one of the worst possible states you can imagine. Why on earth do we include that in the picture? It's not an important state, right? Exactly right. So it's a bad state that we have to cross on the way to get to something. And that's usually a very good indication that it could be an important transition state. The state itself you will never see, but you need to understand the properties of that state to get to the other side. We typically call this one, and not just we, there are quite a few other people in the world who do so too, a beta hairpin, for obvious reasons. Not that I use lots of hairpins, but I have a 10-year-old daughter, so we have 500 of them or so. If you compare sheets with helices, there's quite a lot of fun physics here. We're not gonna talk so much about it, but beta sheets are actually two-dimensional, while helices are one-dimensional. And in particular, if you remember those arguments about ice and water, they work equally well in two dimensions. So if you have a beta sheet in its surroundings, the interface between the sheet and the surroundings will grow with the number of residues. This means that the phases cannot coexist and you need to have a real phase transition, without knowing anything else, just from the fact that anything in two or more dimensions cannot coexist. There will be a first-order phase transition here.
And being in two dimensions means that there are gonna be two types of interfaces between our sheets and the surrounding world. On the one hand, we have the edges here. These residues are not paired against anything, right? The ones in the middle form hydrogen bonds on both sides, but the white residues here sit on an edge. And you also have the other edge, in particular for a large sheet: the edge that corresponds to all the turns. So when we're gonna look at the energies with the surrounding world, those are the ones we're gonna need to look at. And here's where I would actually argue that beta sheets are gonna be easier than helices. We're gonna introduce two small things. Sorry, three small things I'm gonna introduce. If you have a real sheet, think about these schematics: if you have a residue that is completely on the inside, one that forms hydrogen bonds with one neighbor to the right and one neighbor to the left, those would be the small black dots on the previous slides. I'm not gonna keep going back and forth. So the free energy of a residue inside, that would be f_beta. I typically use lowercase letters to denote something that's small, and you can probably imagine that the beta here has to do with beta sheets, but you could use absolutely any letter you want. We have no idea what this is; we just define it. But then there is gonna be a difference for a residue that is exposed on the surface, that only has one hydrogen bond. That's gonna be a different free energy. But on the other hand, some things are also similar, right? It has to have the same conformation in the Ramachandran diagram and everything. So rather than saying that this is some, say, e for edge, it's much easier to say, well, to first approximation, this is exactly the same, but there is also a small difference here, delta, which is the extra edge free energy.
There is something that's different on the edge. For now, we have no idea what the sign is. So we have f_beta if we are on the inside, and f_beta plus delta f_beta if you're one of the white residues in the previous plot, exposed so that you only have hydrogen bonds on one side. And then we will say that every time there's one of these turns, whether that is one or two or three residues is of course an interesting question if you're interested in the atoms, but in terms of stability we just care about the fact that we have to turn. So let's just say there's one turn and that the energy of this turn is U. And so far, we haven't bothered at all about the signs. That is both good and bad. Not bothering about the signs saves you some work when you define things, but when you just see symbols, it's very easy later to assume that they're gonna be positive. And that's not necessarily true, because we just defined them. So then I will just throw out here and say that delta f_beta has to be larger than zero and U has to be larger than zero. And now I would like you to explain to me why that is true. The edge has one less hydrogen bond. Mm-hmm. And therefore it would be greater than zero. Yes, that's right. Another way of thinking about it, it's what you call reductio ad absurdum in mathematics: assume the opposite. What would the opposite lead to? If this was negative, it would be better to be on the edge than on the inside of a sheet, right? Would you ever have any sheets? Right, it would be better to be isolated strands than to form hairpins. That would be completely... and we know that that's not what proteins look like in nature. Ergo, delta f_beta has to be larger than zero. So your argument is perfectly fine too, but the fact that we know from nature that this is not the case is in general stronger than saying that my gut feeling says it's reasonable.
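To keep the bookkeeping straight, here are the same definitions written out in one place (my notation, following the lecture's lowercase convention; the sign of f_beta itself is only argued later, but for sheets to exist at all the interior must be favorable):

```latex
% Free energies per structural element, relative to the coil state:
%   f_\beta                   : interior residue, hydrogen-bonded on both sides
%   f_\beta + \Delta f_\beta  : edge residue, hydrogen-bonded on one side only
%   U                         : one turn
\begin{align}
  f_\beta &< 0 && \text{(interior residues must be favorable, or sheets never form)} \\
  \Delta f_\beta &> 0 && \text{(the edge is missing a hydrogen bond, so it is worse)} \\
  U &> 0 && \text{(otherwise chains would happily turn back and forth)}
\end{align}
```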
So let's try that argument for U then. Why does the turn energy have to be larger than zero? Yes, we would just turn back and forth and be the happiest sequence in the world, right? That doesn't really happen inside proteins either. So just from knowing that these structures do form, we now know what the signs are. And then I would argue that there are two scenarios here. I shouldn't have told you what the scenarios are, because you're pretty good at reasoning about this. But let's go through them. So now you look at f_beta plus delta f_beta; this is the total free energy of a residue on the edge, right? If that sum is smaller than zero, then we would just have a single very long beta hairpin. It's not completely isolated: this is one strand forming hydrogen bonds to one other strand, but you just have one long hairpin. Can that happen? Yeah, it happens in nature. There are lots of proteins where you just see one big hairpin. Is it very common? No. And the reason is that it's still relatively costly to build beta sheets this way and you don't really get a whole lot of the energy back, right? We're gonna pay a very large penalty and then get a little free energy back, just enough so that we're stable, but it's not really a whole lot. There are cases when you need that, and I'm gonna show you a protein like that. The other scenario is that f_beta plus delta f_beta is larger than zero, and that actually means that the hairpin is bad, right? So the only reason why this is good is that after the hairpin, it will eventually get better and better the more strands we keep adding. And if you look at the structures you might have seen in the Protein Data Bank, beta sheets are usually relatively large. You usually have four, five, six strands.
So this second scenario appears to be, at least in general, more common in the Protein Data Bank. Let's start by looking at it. That doesn't mean the first one is unimportant, but this is likely gonna be more important. So would you then say, if we're now gonna look at how bad or good beta formation is, that the single hairpin is the worst possible transition state you can have? Well, we just said that a single strand with one turn is bad, but the more residues we keep adding to the hairpin, the worse it gets. So I would actually say the opposite: once you have one full hairpin and one more turn, that's gonna be even worse. Because every residue we're adding to this hairpin makes it worse. So obviously it's not gonna be infinitely long, and at some point we have this hairpin plus one more turn. The only problem is that we're not sure how long this hairpin should be, right? Because you could argue that, in that case, the best thing should be a hairpin that's just one residue long. But on the other hand, we know that beta sheets don't look that way. You're not gonna have beta sheets where the strands are just one residue long and then go back and forth. But if it's bad to add residues, why are beta strands of a finite length, say 10 residues or so? Well, the point is that this is bad for the first single hairpin, right? But as we're gonna add more and more strands, it will help a bit to have the strands slightly longer. So to be able to say what the transition state is, we need at least a guesstimate of how long the shortest strand should be. This is easier than you think. It's not at all the type of mathematics we had yesterday. So consider the case when a single hairpin is not stable, just as we had before. For the free energy here, we need one turn.
And here we're not separating things into entropy versus enthalpy; these are just free energy terms. So we have the free energy of the turn, U, which we know is positive. And then there are two beta strands, right? So there are two times n residues, and each of these residues faces an edge, so each contributes f_beta plus delta f_beta. You follow me? But that was just the first hairpin, which was bad. The question is what happens if we add one more entire strand. Adding that strand maintains the number of edges, because we're just moving the edge one step further out, right? So we're gonna get one contribution of n times f_beta, the internal free energy of n residues, and then we're gonna need to form one more turn, so we need U. And the shortest possible strand length we can have, that would be when this adds up to exactly zero. If it's positive, we just keep paying the more strands we add, and then we would never form a sheet. If it's negative, it's gonna be even better, so the longer the better, right? But we're looking for the shortest strand length we can have, because that's gonna be the worst, and we're looking for a bad transition state here. And that would be when those two terms balance. When they balance, the smallest number of residues is the free energy of a turn divided by minus the free energy of an internal residue: n_min = U / (-f_beta). And once we've done that, we can take exactly the transition state you talked about: a single hairpin with a following turn. So that is a turn, then the residues inside the hairpin, and then another turn. But now we say that the number of residues is the shortest possible, because that would be the worst; if it's longer, it's gonna be more stable. The shortest strand length, well, we had that on the previous slide, so I'm not gonna go back.
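That balance condition fits in a few lines of code; the numbers here are placeholders I made up just to have something concrete, not measured values:

```python
# Shortest viable strand length: adding a full strand of n residues costs
# one turn (U > 0) and gains n interior residues (n * f_beta, f_beta < 0).
# The break-even point U + n * f_beta = 0 gives the minimum strand length.

def n_min(U, f_beta):
    """Minimum strand length n_min = U / (-f_beta), with U > 0, f_beta < 0."""
    assert U > 0 and f_beta < 0, "sign conventions from the lecture"
    return U / (-f_beta)

# Hypothetical numbers: a 5 kcal/mol turn penalty and -0.5 kcal/mol per
# interior residue give strands about ten residues long, which is roughly
# the strand length you actually see in real sheets.
print(n_min(U=5.0, f_beta=-0.5))  # 10.0
```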
Well, I will go back. Just take that, and insert n_min = U / (-f_beta). And then it turns out that most of these U's are gonna cancel, and the only part remaining is really gonna be n_min and delta f_beta, which becomes that expression. And then the book spends a lot of time, I think it's two pages, to prove that there is no lower-energy transition state. Why is that important? Right, because, well, I could of course go through any amount of mathematics here and prove this in excruciating detail. But if there was another pathway that had a lower maximum free energy, which pathway would nature follow? Yeah, so then I would just have spent a lot of time proving something that was completely irrelevant. It's only the lowest transition state that determines the speed of your reaction. So that's why, in this case, it's actually important to at least make it plausible that there is nothing lower. So here too, we're gonna have something very similar to the alpha helices: depending on how high this barrier is and on the number of residues, we're either gonna get beta sheets that are very stable, or ones that are intermediate, or ones that are so bad that they will never ever fold. "There's something I don't quite understand: Oliver said that the transition state should look like one extended strand and one turn. Why is that not the transition state, and why do you instead go to the full hairpin plus one more turn?" Because remember, we had these two cases, right? You could certainly imagine the case where the hairpin itself was stable, but as I said, that's not very common to see in the Protein Data Bank. And if the hairpin itself was stable, then the second you formed the hairpin, you would already be in a good state.
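For the record, here is that substitution written out, with the transition state being the hairpin (one turn plus two strands of n_min edge residues) plus one extra turn. This is my reconstruction of the book's algebra from the quantities defined in the lecture, so check the signs against the slides:

```latex
\begin{align}
F^\ddagger &= \underbrace{U + 2\,n_{\min}\left(f_\beta + \Delta f_\beta\right)}_{\text{hairpin}}
            + \underbrace{U}_{\text{extra turn}},
  \qquad n_{\min} = \frac{U}{-f_\beta} \\[4pt]
&= 2U + 2\,\frac{U}{-f_\beta}\left(f_\beta + \Delta f_\beta\right)
 = 2U - 2U + \frac{2U\,\Delta f_\beta}{-f_\beta}
 = 2\,n_{\min}\,\Delta f_\beta \;>\; 0 .
\end{align}
```

The two turn penalties cancel against the interior part of the strand terms, leaving only the edge penalty of the minimum-length hairpin, which is positive since delta f_beta > 0 and f_beta < 0.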
And in that case, the transition state would just be a single strand and then one turn. But in some cases, it turns out that even the first hairpin itself is bad: even when you have formed the entire hairpin, you're still going uphill. So the problem is that when you have formed the hairpin, we haven't reached the transition state yet; it's not until you add at least one more turn that you reach it. I would actually argue that even this is, of course, a simplification, right? Because depending on what residues you have, you could in theory imagine some very complicated sequence where even three strands would not really be stable, but when you add the fourth strand, it would be stable. I think that would be so complicated that we will likely never ever see it folding. But you can certainly imagine there will be some small beta sheets that fall into this case. Still, the most common ones are gonna be the larger ones, because if the other case were more common, we would never see the really large beta sheets; it would be better to stick to one long hairpin. And exactly why that is the case is, of course, a good question, but that's just based on observation of nature. "So it's just based on observing nature?" No, you could go through and do exactly the same thing for the first case. Actually, you know what? That's a great exercise, and I just realized that this might be something really good to put on the test. Please do it. It's not hard; in some ways, it's actually easier. But the point I wanna get to... hang with me for one more slide and I'll tell you why. I'm not gonna go through all the details we went through for the alpha helices, because this is actually very easy.
We know that the time to cross this initiation barrier is gonna be some time constant, which we have no idea about, times the exponential of this free energy, right? And you can go into some details: the initiation can happen anywhere in the sheet, so we can divide by the number of residues N. And we can also say that since the initiation is entirely time-limiting, it's just the initiation we care about, so for the total formation time, to first approximation, the time it takes for a beta sheet to form, it's really only the initiation that matters. We forget about the elongation here; it's completely irrelevant, partly based on experiments. And this expression is exactly what we had on the previous slide: two U times delta f_beta, divided by minus f_beta. Oh, this is complicated. But if it's complicated, maybe we should ask: do you know what U is? Do you know what delta f_beta is? Well, U is just some sort of turn energy, right? That's a constant, and at least to first approximation it should not depend on the amino acid; it's mostly the free energy cost of freezing something into a turn. The delta f_beta is a cost that has to do with the lack of a hydrogen bond on the edge of the sheet, or a slight difference in entropy. To first approximation, that's not gonna depend critically on the amino acid either. While this f_beta tells you how happy a particular amino acid is to be in a beta sheet. That will depend a lot on the amino acid. Some amino acids will hate it, others will love it, right? So basically, if you simplify this horribly, you could say that this is a constant C divided by minus f_beta. And you know what? There you actually have it. Sorry, I used a different constant A here, my bad.
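To see how violently that exponential reacts to the per-residue stability, here is a toy evaluation. The prefactor and all parameter values are arbitrary choices of mine; only the ratios between the resulting times matter:

```python
import math

KT = 0.6  # kcal/mol, roughly RT at room temperature

def folding_time(U, d_f_beta, f_beta, N, prefactor=1e-9):
    """Toy beta-sheet initiation time: tau = (A / N) * exp(F_barrier / kT),
    with F_barrier = 2 * U * d_f_beta / (-f_beta). Seconds, with a made-up
    1 ns prefactor A."""
    barrier = 2.0 * U * d_f_beta / (-f_beta)
    return (prefactor / N) * math.exp(barrier / KT)

# Same turn and edge penalties; only the per-residue stability f_beta varies:
for f_beta in (-2.0, -1.0, -0.5, -0.25):
    print(f_beta, folding_time(U=4.0, d_f_beta=1.0, f_beta=f_beta, N=100))
# A factor of 8 in f_beta moves the barrier from 4 to 32 kcal/mol, which is
# roughly twenty orders of magnitude in time: nanoseconds versus millennia.
```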
So what we've shown here, and this is also why we don't really care so much about all the specific details (you will get a slightly different expression if you do this for one strand, I don't remember), the point is: the time for a beta sheet to form depends exponentially on the stability of the individual residues in the beta sheet. Which will vary a lot depending on the residues. And if you have, say, a factor of 10 in the stabilization energy, which you can easily have, suddenly the argument of the exponential is scaled by 10. And whether that's an exponential of plus 10 or of minus 10 makes a gigantic difference, right? There should be some sort of kT in there too, but this is a proportionality, so I don't really care about it. So the special thing with beta sheets is that you're gonna have an exponential spread of folding times. That's why occasionally they fold in milliseconds and occasionally they take years. Exponentials are more complicated than you think. I know I had a slide about that two weeks ago; just realizing how much an exponential function changes with its argument is one of these things that still frequently fails me. And my reason for this argument relates to what you asked two slides ago: what I primarily wanted to use this for is to show that it justifies the extremely wide distribution of folding times between different beta sheets. Since most of the beta sheets we see are the long beta sheets, I'm actually more interested in understanding why it's so difficult to form some of the very long beta sheets. My prediction, and, caveat, I haven't done the math to carry it through here, is that if you just do this for a single beta strand and one turn, you're in general gonna get much, much shorter folding times. So a single hairpin will fold relatively quickly, if that single hairpin is stable.
But that doesn't mean that all beta sheets behave like that. Some of them take much, much longer to fold. So a single hairpin can fold in a millisecond. And the reason I say that is that we have folded single hairpins in much less than a millisecond. And also, I think that is what most designed proteins use. So that's another very big danger; I'll come back to that later. Are designed proteins representative of real proteins? That's my question. So, beta sheet summary. The cool thing is that unstable sheets are extremely slow to form. I wish I wouldn't really call them unstable; say, sheets that are not extremely stable, like the ones you asked about. In general, such sheets can take weeks to form, while the super stable ones form in a millisecond or shorter. And if you're designing a protein, you can imagine what type of residues you're gonna pick, right? You're gonna pick that stable type of residues. That doesn't mean it's representative of a real protein. There's a gigantic, or at least a significant, free energy barrier. And hopefully you're gonna believe me when I say that this is a first-order phase transition. The way to prove this is really to study how the energy changes with temperature and show that you end up with a very, very narrow temperature regime over which the transition happens. Yep, so that's a good question. What do you think? I'm gonna ask that question in two slides because I was kind of serving this up. So let's just compare the helices and sheets here for a second. The funny thing is that the alpha helix kind of cheats. The alpha helix avoids the phase transition by being able to coexist between the helix and the coil. And saying that it cheats, of course, it's just a freak of nature that the structure looks like that. But because it avoids this phase transition, that's why it has so much lower barriers and why it folds so much faster.
And the height of the beta sheet barriers, and now things get difficult because this is exactly the question I had: I would argue, and there are many people who think like I do, but not all of them, that these extremely high barriers are likely related to protein misfolding. You've probably heard about prions and everything before, right? So the first answer to your question is no: a normal protein is not gonna be functional if it takes a week for it to fold in a cell, because the turnover would be too slow; it would cost too much energy. Your cell is all about trying to optimize things to make sure that the protein production factory is reasonably efficient. And trust me, a week to create a protein is not efficient. But what if... there are some proteins where, by definition, in this case, in the native form, you hardly have any sheets folded, because they're not really stable. But what can then happen, very slowly, is that suddenly you get a beta sheet. What do you think happens when you take two proteins like that and put them next to each other? Suddenly the beta sheet finds even more beta sheet partners and it gets even happier. It's a bit of a groupie in that sense. So the problem is, and I'm not sure whether this happens in general, but in this particular case, the protein is no longer able to be torn down by proteases. The beta sheet is too stable. And that's kind of problematic, right? Because suddenly you have proteins... as I mentioned, the body is all about efficiency. It should be cheap and efficient and quick to produce a protein, but you also can't spend too much energy when it comes time to degrade your protein, because that would be too costly, right? And this is a protein that you can't degrade. It's misfolded, it's bad. The natural way to treat that is to degrade it. You can't degrade it. It's gonna stay around.
And what's worse, although in this particular case I'm not sure it does, there are lots of cases like this where they will start to aggregate, because the beta sheets recruit further beta sheets. So what likely happens in this case, based on what we know about free energy, is that the native state is actually not the lowest free energy state. It's the lowest free energy state we can reach over a reasonable free energy barrier. But eventually you might be able to get over an even larger free energy barrier and find an even lower state. That can take weeks. So what happens is that you might sit here, where we are really happy, but eventually, over a long time, we might get enough energy to get over there. For the first protein where this happens, this is not gonna be a problem, because it will take a very long time, right, by definition. But what do you think happens if you now have lots of proteins like this in the cell already? Is it gonna make it easier or harder for more proteins to become like this? So the more proteins like this you have in the cell, the quicker it happens, right? So that's kind of stupid. Why has nature done that? Shouldn't this be a gigantic problem in nature? Yeah, but that's not good. We have natural selection and everything. Why on earth would you have this? This shouldn't survive natural selection. Well, the first thing is that people historically had children before they were 30, right? Natural selection only works if the disease, or whatever it is, prevents you from reproducing. So there's not gonna be any natural selection pressure here. The other problem is that historically, apart from the last 100 years, humans tended to die at 40 or 50. This is not a problem when you're 40. It's kind of nice not to die when you're 40, considering I'm 44, but... The problem is, of course, that as we get older and older, these things suddenly start to appear.
And when you're 80, a whole lot of these modern diseases, Alzheimer's and everything, we learn more and more are related to protein misfolding. And this might happen over decades, which, again, historically has not been a problem, but it's starting to show up because we stay healthy longer, right? So, and again, now I'm not talking about this specific protein, but if you have this protein and this happens in, say, cattle, and this particular protein is in the brain, do you think it's a smart idea to grind that brain down and feed it to other cattle? What's gonna happen to this protein in the stomach? Exactly, this protein is gonna survive all the enzymes, right? So it will go straight out into your blood. What's this protein gonna do in your blood? It's gonna make more copies of the misfolded form. It's gonna force that conversion. And this is essentially what happens in mad cow disease. The real disease in cattle is called BSE, bovine spongiform encephalopathy. In humans it leads to Creutzfeldt-Jakob disease and other things. It's super complicated, it's not just one disease, it's an entire class of diseases. Mikael Oliveberg at Stockholm University, they're working a lot on ALS, for instance. And that's also related to protein folding versus misfolding. So don't think of this as one protein, but like hundreds of different proteins. And I would guess that we're gonna see way more things like that happening. The problem here is that it's super hard to study them experimentally, because we can't wait 80 years for experiments, right? But there are a bunch of experiments people can do where you see in particular the equilibrium between monomers and, say, dimers. And then it's about, can you gradually move to larger and larger constructs? Can you show that they start to form what you call oligomers, with, say, small parts of these aggregates?
And then eventually you form longer and longer filaments, protofibrils, fibrils. And it's really only when you get to the stage of these gigantic structures, say a micrometer or so large, that they start showing up in a microscope. At that point you start seeing what you call plaque in a microscope in a brain sample. But it's protein. This was super debated when it first came up, because people found that there were new infectious agents that were neither bacteria nor viruses. And if you think about it, that contradicts at least the last 150 years of science. If you boil something, if you heat it enough, it should no longer be infectious, right? So it's an infectious agent that survives at least reasonable heating. It's an infectious agent that is definitely not alive. It's not a bacterium and it's not a virus. And somehow it could transmit disease anyway. So this is one of those things that could become, well, a world plague within weeks if we didn't know what it was. And then there were of course a number of researchers who started studying this and arguing that this was really based on proteins. And it was in part this type of work that Stanley Prusiner eventually got the Nobel Prize in Medicine for. But it was hotly debated for a decade. There is another lesson there: just because everybody disagrees with you doesn't mean you're wrong. Because if he had given up, we wouldn't have known anything about it. Big caveat here too: this does not mean that this is exactly right. There's a huge amount of research remaining to be done here. We know very little about it. There are, of course, very large genetic components here too. In some families this is much more prevalent than in others, which is kind of natural because it's based on the amino acid sequence, right? But exactly why? We don't know. Cool, but pretty nasty diseases.
Sorry, say that again. The... We will talk about that in the next lecture, because in the last two lectures this week I'm going to speak about practical protein structures. So yes, there are plenty of examples like that. Mostly in things like hair, bone, and everything where you need to form very large extended structures. So there are certainly naturally occurring fibrils. Well, I would say that it takes weeks to extend them, and they grow very slowly. But that's because you need something like hair, for instance, right? The hair has to extend to, not biological, but physical scales. It has to be centimeters or half a meter. And that is protein. It takes a while to grow it. What do you think hair consists of? Alpha helices. No, it's not pure, but it's mostly alpha helices. And you know what? This is a very funny exercise that you can do ahead of that lecture. Estimate how quickly your hair grows. You know what the dimensions of your hair are. You also know the dimensions of an alpha helix. And you're going to get that it takes roughly five nanoseconds per residue you're adding to an alpha helix. And that leads to your hair growing, what is it, a millimeter per week or something? Or a centimeter. Yes? So here you mean? No, so the scary thing is that this is exactly the same protein. When you first fold this sequence, it will look like that, and that's going to be the biologically active form. And then something happens. In very rare cases, it can spontaneously refold into another form. So this is exactly the same sequence. But it's a very good question from another point of view, the genetic component: if we take 10 people and look at the sequence of this particular protein in their genes, those sequences are not going to be identical. There will be minor differences, and most of those differences are going to be in loops and unimportant regions, right?
Unimportant regions like this one, loops. It's probably fairly easy to change a residue here. It's not going to affect the overall fold of this protein too much, because it's not too structured. So if nature now starts to randomly exchange residues here, it's not really going to affect the stability of that protein so much. But it might have a very large implication on whether we form this form or not. The only problem, as I said, is that this typically happens by the time you're 50. Or, well, this actually probably starts to grow earlier. I'm sorry to say, but you probably have your brains full of these by now. But that's fine, because you're here, right? For this to develop all the way up here, it's going to take another 40 years. And by the time you get there, you're certainly not going to have any more kids. So it's not really going to hurt you, and every single brain of somebody who's retired is going to have some of these fibrils, and that's fine. But in rare cases, this becomes so severe that it starts growing too quickly, and it could even hurt you in your 40s or 50s. And that's based on the amino acid sequence you have in the protein. But the actual process is not the change of an amino acid. And again, this is just one example. There are lots and lots and lots of them. Just before the break, I'm going to go through one more thing, because now I'm actually going to finish up. We had alpha helices, beta sheets, and there is a third structure. There's a third structure that you don't think of as a structure, but it is a structure. And this is very much related to the lab yesterday, because I realized that Doreen Bjarne started talking about diffusion, and they're going to talk about kinetics today, while my order was kind of the opposite. But after the break, we're then going to go over to show how we model real proteins. I'm going to take 10 or 15 minutes on this first, if that's okay with you. So the coil is the boring structure in a way, and it is much less well-defined.
But it's actually just as important, or in particular it's important to understand it a bit. The coil is not a long stretched-out linear chain. Why is it not a long stretched-out linear chain? Sorry? Well, you can certainly, if you have a protein, take all the amino acids and put them in the spot of the Ramachandran diagram that corresponds to a beta strand, right? A 500-residue-long protein. So don't guess. What determines whether this is a favorable state or not? Get back to your equations. What determines whether things happen or not? Free energy. So if we analyze the free energy here, to a first approximation, well, whether it's stretched out or coiled up, we're not talking about something that's well-packed. There's water all over the place anywhere, right? So to a first approximation, there's not going to be any difference in energy or enthalpy. So forget the energy and enthalpy. What happens to the entropy? Right, because you have 500 residues and you're freezing every single residue into one specific state and can't move it. And that's an extreme loss of entropy, right? There is no way you would like to do that. You want as much disorder as possible. And you get that disorder because the collapsed state is, of course, not one state; there are billions of collapsed states where it's more or less random, while the fully stretched-out state is just one single state. If you do this in a simulation, within like 10 minutes you're going to see the chain collapse. And 10 minutes in a simulation is like a femto-, no, nanosecond at least. Within one nanosecond, the chain will collapse. It's super quick. But that leads you to another question. If you had 100 residues, and each residue was like five ångström, it would be 500 ångström point to point if it were stretched out. That's definitely not the case here.
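The counting argument above can be made concrete on a toy lattice. A small sketch: enumerate every 8-step walk on a 2D square lattice (overlaps allowed, matching the ideal-chain simplification) and count how many are fully stretched:

```python
from itertools import product

# Enumerate every 8-step walk on a 2D square lattice. "Fully stretched"
# means the end-to-end distance equals the contour length n, which only
# the perfectly straight walks achieve.
steps = [(1, 0), (-1, 0), (0, 1), (0, -1)]
n = 8
total = stretched = 0
for walk in product(steps, repeat=n):
    x = sum(s[0] for s in walk)
    y = sum(s[1] for s in walk)
    total += 1
    if x * x + y * y == n * n:  # end-to-end distance == contour length
        stretched += 1
```

Only the 4 perfectly straight walks (one per lattice direction) out of 4^8 = 65536 conformations are fully stretched; the entropic penalty for stretching grows exponentially with chain length.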
So the question is, how large is this ball going to be on average? What is the average distance between the two ends? That's essentially what you did with your random walk, right? You start at one point and you walk randomly and see how far you got. And what results did you get from these random walks? Yes, and how did that scale? It's not just linear, right? No, it's the square: the square increases linearly, right? So the end-to-end distance increases as the square root of the time. You can do exactly the same thing if we start at one point in the chain, but now we're not thinking of this as motion over time; we think of it as placing this chain some way in space. Is this a hard or easy problem? Why is it hard? It's many degrees of freedom, but the degrees of freedom are the nicer part, because we can just ask what the average structure is. Can I place my beads any way I want to? And that limitation is what? Well, it has to be what you call self-avoiding, right? If I put one residue somewhere, I can't go back and put another residue on top of it. That is really complicated, but you know what? What do you do if you're a physicist and something is really complicated? We cheat, we simplify, right? So I think this sounds pretty good: let's ignore this for now and see what happens. And then it's actually relatively easy. h squared, well, my entire chain consists of small segments of length r. For now you can imagine that's one amino acid each, but it doesn't necessarily have to be. This could be any chain, any type of segments that are connected. And I'm also gonna avoid the complication that I can't bend the amino acids any way I want. So let's just think of an ideal chain that I can bend any way I want. The total distance here, that's really a vector h.
And if I just sum up all the small vectors here, so I move from there to there to there to there, the end vector h I'm gonna get goes from the starting point to the end point. And if I now square that, because I'm not really interested in whether it points northwest or southeast or anything, I'm just interested in the length, I square that exact expression. And this is a long, long, long sum, of course, but it turns out that there are only two sorts of terms in that sum. One part is where I multiply each segment vector with itself, and the other part is where I multiply one segment vector with another segment vector. So there's gonna be one sum over all the squares of each vector, and then one sum of cross terms. Simple mathematics. If it's hard to do for general n, you can expand this for an n of, say, 10 and do the math. Does that make it easier? No? Well, I would argue that it does. Sorry, I'm missing a bracket there. The average of this, because again we're looking at the average, I'm not looking at a specific conformation here, the average that we would measure experimentally, would be the average of that expression, which is the average of these two sums that I showed on the last slide. And I can take that average and move it to the inside of the sum; that I'm allowed to do. And that means there are two parts here. One of them is the sum of the average segment length squared. And if all the segments are the same, that's just the segment length r; the index doesn't matter. So that's just r squared, or the average length squared. And then there's the sum where one vector is multiplied by another vector. What is that? Have you taken vector analysis?
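Written out, the derivation described above looks like this (using the lecture's symbols: n segments, each a vector r_i of length r):

```latex
\vec{h} = \sum_{i=1}^{n} \vec{r}_i
\qquad\Longrightarrow\qquad
\langle h^2 \rangle
  = \Big\langle \sum_{i} \vec{r}_i \cdot \sum_{j} \vec{r}_j \Big\rangle
  = \underbrace{\sum_{i} \langle \vec{r}_i \cdot \vec{r}_i \rangle}_{=\, n r^2}
  \;+\; \underbrace{\sum_{i \neq j} \langle \vec{r}_i \cdot \vec{r}_j \rangle}_{\text{cross terms}}
```

For an ideal chain the directions of different segments are independent, so each cross term is r² ⟨cos θ⟩ = 0, leaving ⟨h²⟩ = n r².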
If you haven't taken vector analysis, do you know how you calculate the scalar product between two vectors? That's the length of one vector multiplied by the length of the second vector multiplied by the cosine of the angle between them. So in principle, this would be the length of the first vector, which is r, and the length of the second vector, which is also r, and then the average of the cosine. What is the average of the cosine? Yeah, if you take a cosine and move across an entire period, the average is zero. You can think of this from a much easier point of view. What is the average correlation between one vector and another vector if they are completely random? There is no average correlation. If they are independent, then on average you're not gonna move systematically more to the left or more to the right or anything, right? It's completely random. So, with the caveat that this is super simple, that entire term disappears. And this term just says you have n segments, each segment has an average length r, and we square that. So that really means that the average length of the chain increases as the square root of the number of residues. The individual segment length we don't really care about here, because that's based on the amino acid or something. And the average volume, well, that's roughly proportional to the third power of the length, right, if you have some sort of reasonably spherical blob. And that's gonna be the number of residues raised to three halves then: the square root of n, raised to the power of three. Do you think this is a good approximation? How large do you think the error is? Well, not quite that good. Because again, we made some pretty hard approximations, right? We completely ignored the fact that two residues can't overlap, for instance. So no, it's not gonna be one percent.
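The ⟨h²⟩ = n r² result is easy to check numerically. A minimal sketch with freely jointed unit segments in 3D:

```python
import math
import random

def random_unit_vector(rng):
    # Uniform direction on the sphere: uniform z plus uniform azimuth.
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    s = math.sqrt(1.0 - z * z)
    return (s * math.cos(phi), s * math.sin(phi), z)

def mean_h2(n_segments, n_chains=2000, seed=1):
    # Build many freely jointed chains of unit segments (r = 1) and
    # average the squared end-to-end distance h^2.
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_chains):
        hx = hy = hz = 0.0
        for _ in range(n_segments):
            x, y, z = random_unit_vector(rng)
            hx += x; hy += y; hz += z
        acc += hx * hx + hy * hy + hz * hz
    return acc / n_chains

# Ideal-chain prediction: <h^2> = n * r^2, so mean_h2(n) / n should be ~1.
```

With a few thousand chains the ratio mean_h2(n)/n comes out close to 1, confirming that the cross terms average away.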
Think, oh, well, now you're getting there. So for the average coil length there are some caveats. First, I said that segments can have any orientation. That's not true. In an amino acid, you can't have the next amino acid turn back 180 degrees on the chain. Forget about overlap; this has to do with the torsion angles. You simply cannot do that. But of course, there is nothing here that said that a segment had to be one amino acid, right? I can decide on some segment length. If this is a long chain, I could argue that, say, 10 amino acids is enough so that at that point I can turn any way I want. So you can actually define what you call a contour length or correlation length, and use that as the segment size. Then it doesn't really matter what the length of each amino acid is; the result is still gonna be proportional to the number of segments you have. And that's why these very simple approximations are so beautiful. And there are some models where you can keep the angles fixed and just allow rotations around the bonds, much more realistic, and you get the same results. So don't knock the power of these super simple simplifications. They usually get you all the results you need, and it's much less effort than trying to do it exactly from the start. But of course, at some point you might wanna check how good that approximation really is, right? And if you do that, you're gonna need to get into this excluded-volume part. This is hard, this is super hard. I have derived this once in my life. This is something Paul Flory did. He was a physicist who went into polymer physics and polymer chemistry in the 30s and 40s. And if you do this, eventually, rather than having the number of residues raised to 0.5, it's gonna be roughly 0.588 and then lots and lots and lots of decimals.
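You can also get a feel for the excluded-volume correction numerically. The sketch below rejection-samples short self-avoiding walks on a cubic lattice and fits an effective exponent; for chains this short you only get a rough number somewhere above the ideal 0.5, not Flory's 0.588:

```python
import math
import random

def saw_mean_r2(n_steps, n_samples=3000, seed=2):
    # Simple (rejection) sampling of self-avoiding walks: grow a random
    # walk step by step and discard it the moment it revisits a site.
    # Surviving walks are uniformly distributed over all SAWs.
    rng = random.Random(seed)
    moves = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    acc, got = 0.0, 0
    while got < n_samples:
        pos = (0, 0, 0)
        visited = {pos}
        ok = True
        for _ in range(n_steps):
            dx, dy, dz = rng.choice(moves)
            pos = (pos[0] + dx, pos[1] + dy, pos[2] + dz)
            if pos in visited:
                ok = False
                break
            visited.add(pos)
        if ok:
            acc += pos[0]**2 + pos[1]**2 + pos[2]**2
            got += 1
    return acc / n_samples

# Fit <R^2> ~ n^(2*nu) from two chain lengths; nu > 0.5 because the
# chain swells when overlaps are forbidden.
r2a, r2b = saw_mean_r2(8), saw_mean_r2(16)
nu = 0.5 * math.log(r2b / r2a) / math.log(2.0)
```

Rejection sampling dies exponentially for long chains, which echoes the lecturer's point that doing this properly in a computer is surprisingly hard.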
This is an insanely hard derivation, and it's so hard that he got the Nobel Prize largely for these studies. Paul Flory pretty much founded the entire modern way of using physics to look at simple polymers like plastics, because this is super important in plastics: it has to do with how much space plastics will fill out. And this in turn led to the part that we're gonna look at after the break, using physics to study real proteins. All these things, looking at Ramachandran diagrams, the degrees of freedom a protein can have: Paul Flory did not do that himself, but a bunch of researchers, in particular Shneior Lifson and people at the LMB in Cambridge, took these very simple polymer models, that is homopolymers, because plastics are the same monomer repeated, and moved them over to proteins, because proteins are also polymers. It's just that they're heteropolymers, which makes them way more complicated. But in principle, the physics is the same. I don't think you're gonna do a lab on this. At some point we considered that, but it's actually surprisingly hard to calculate this even in a computer, because you need to draw so many random numbers when you sample these models. A long time ago when we did this, you needed to run such long simulations that you could have problems with many of the simple random number generators. Because random number generators in computers are not perfectly random, there are correlations. And if you have those correlations, you don't get the exact result here. So this is the one thing I can promise you that I will never, ever ask you to derive. If I had to, I couldn't do this without a book today. It's hard, very hard. This is a beautiful place to take a break. It's almost 10.40 now. Should we meet here at 11? And then I will spend an hour after the break to go into real proteins.
So what I'm gonna bring up is how we use modern programs, MD simulations, energy minimization and everything, to apply all this knowledge, but now it's no longer gonna be model systems. Now it's gonna be real systems. And that, in principle, you can apply to all the proteins I'm gonna show you later this week, including the fibers. Let's take our 20 minutes. I will head on to a slightly different part. So here we're kind of gonna complete the theory part. Instead of looking at simple toy systems, we're gonna start looking at real proteins. To a first approximation, you can consider a real protein as a blob or a point-like particle. That actually turns out great in many cases. You have no idea. All of these things, I said, don't think that this is just some hardcore detailed physics. What I just derived before the break, the average end-to-end distance of a chain as a function of the chain length: could you imagine a place where you could use this? Design a small molecule? Much simpler than that. Forget the advanced physics. Gels. Anytime you're filtering something in a gel, it has to do with size, right? Depending on how long the molecule is, your DNA or something, you can actually derive the relation between the length of a molecule and the distribution you will see in your gel. And you're gonna have this square-root behavior. Actually, it's gonna be raised to 0.588, but experimentally you're not gonna see the difference. So this is used in a range of very simple applications. Anytime you're filtering or trying to separate anything, it matters how large molecules are. And the beauty there is that you don't really care about the prefactors, because the prefactors you can always calibrate. If you know one molecule, you know exactly where that molecule is. You can say, okay, I have my protein X here.
And then the rest of the locations of these bands is just gonna be the relative distribution of them. And the relative distributions are determined by these exponents. But my point here is: don't knock the very simple theory. You get a surprising amount of reuse out of some of the simple theory. But we would not have this class if we weren't looking at details. Inside that protein there are, of course, all these regular elements, right? And the reason why I show it this way is that this is just our conceptual way of looking at it. These elements don't exist. There are no things that are helices; these ribbons don't exist in a structure. The way it would look is rather something like this, where you look at all the chemical bonds inside a molecule. And actually, even the chemical bonds, in one way, well, we're not doing quantum mechanics, so we could argue that those concepts too are just in your mind: this is really just one big sea of atoms. Everything else is really just in your mind. Now, hopefully those models are pretty good. And what is the definition of a good model? It's useful. That it is useful. It doesn't have to reflect reality. There are only two types of models: the ones that are useful to you and the ones that aren't. If a model that isn't useful to you reflects the world, well, if you're just into theory, you might still think that it's beautiful, say, string theory or something. But if it's not useful in practice, from the point of view of this course, we don't really care. So, all these concepts: this is actually the same protein shown from exactly the same orientation. I think I magnified it a bit in the last two pictures here. But at least to me, it's not obvious where the helices are here. It's hard, right? And when you just look at these secondary structure elements, it's so easy to fool yourself that it's always obvious how it's gonna fold or something.
But the way nature works is that you have that hydrogen interacting with that oxygen and nitrogen and everything. And this is why it's hard. The actual folding from these interactions is really gonna be about packing, trying to pack all these residues. And the second you hear the word packing, or order, or something, you should start raising your entropy senses, because packing has to do with searching and sampling parts of phase space. Now, there is some good news and there is some less good news. The good news is that you actually understand all the theory you're gonna need, and you've applied it already in the labs. You can define different states for a system. You can define simple rules by which your system interacts. You can sample the system by these different rules. And there is particularly one of these rules that I haven't even brought up in class, but all of you managed to handle it really well: these Monte Carlo simulations. You know all the statistical physics you're ever gonna need to study proteins. The only caveat is that this far you've only applied it to very simple toy systems. The challenge now is that you're gonna start applying it to real systems. But the good news there is that it's just more paperwork. It's not really conceptually harder. So, you know all these interactions. If we look at this system, what I did the first week is that I went through this, but you probably don't remember that because you didn't see it in this form. Do you follow those equations? Sorry, I should bring up the microphone here, I heard a bit of an echo of my voice. So that's the mathematical, or physical, way of formulating it. Physicists love to have this well defined. The first term just says that we have a sum over all the bonds, and there is some constant, and then the distance minus the equilibrium distance raised to the power of two. So that's just the harmonic potential: it costs energy to deviate from the average bond length.
For every angle it costs something to deviate from the average angle. We went through that the first lecture, I just didn't show you the equations. Then there are the torsions, these rotation potentials. Well, you could formulate anything as a Fourier series if you have an infinite number of terms here. And infinite in this type of physics is like three, possibly four, to a first approximation. Actually, you know what, compared to 6,947,843 terms the approximation is relatively speaking just as good, because both of them are infinitely far away from infinity. But the point is that we only need this to be accurate to, say, within 0.1 kilojoule or something. If you want anything better than that, you need to go to quantum chemistry. These impropers are something that I haven't brought up, but there are a couple of cases where you need to keep a group planar. Take, say, a carboxyl group: we know it should be planar. And then you have electrostatic interactions, and you have these so-called Lennard-Jones interactions that say that atoms can't go into each other, and that at very long distance all atoms, even noble gases, attract each other. And here it says V, but you should read this as E. This is an energy. What type of energy is this? It's a potential energy function that describes the system. In principle, you could use anything, or you could use quantum mechanics if you want to. What would be the problem with using quantum mechanics here? Yes: if you use an entire supercomputer, you could calculate the energy for one state. The advantage with this is that we can calculate it in one tenth of a millisecond for an entire protein in water, and that gives us a chance to sample the entropy, the many states. And that leads to another question that we haven't really resolved. We've done a bunch of approximations here, right?
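As a sketch, the terms just described can be written as toy Python functions. All the force constants and parameters here are made-up illustration values (loosely in nm and kJ/mol units), not any real force field; only the functional forms match the description above:

```python
import math

def harmonic_bond(r, r0=0.153, k=250000.0):
    # E = k/2 * (r - r0)^2: it costs energy to deviate from the
    # equilibrium bond length r0 (same form is used for angles).
    return 0.5 * k * (r - r0) ** 2

def torsion(phi, k=10.0, n=3, phi0=0.0):
    # One Fourier term of a rotation profile; real force fields keep
    # "like three, possibly four" such terms per torsion.
    return k * (1.0 + math.cos(n * phi - phi0))

def lennard_jones(r, epsilon=0.5, sigma=0.34):
    # Strong repulsion at short range (atoms can't overlap), weak
    # attraction at long range (even noble gases attract).
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def coulomb(r, qi, qj, ke=138.935):
    # Electrostatics between point charges; ke is Coulomb's constant
    # in kJ/mol * nm / e^2.
    return ke * qi * qj / r
```

The total potential energy E is then a sum of these terms over all bonds, angles, torsions, impropers, and nonbonded pairs, which is exactly why it can be evaluated so much faster than quantum mechanics.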
So if you're gonna fold a protein or something, what do you think is the largest shortcoming? Is it that, as I said yesterday, we really need to sample all of phase space, the entire partition function? We can never do that. So that's one approximation we have: we can't sample the whole world. And the other approximation we've done is that we've introduced a very approximate way of describing the interactions in our system. Which one is the worse one? Yes, but that is so not an obvious answer. People have been fighting over this for 40 years. And for a long time the argument was: we just need a little bit better potential functions and a little bit better sampling, and we will be able to fold proteins. It took people 30 years before we got there. And even I was a bit surprised that it turned out we were like five orders of magnitude short in the sampling. But modern computers and distributed computing have solved that. So today we can actually sample the phase space, the partition function, well enough that we can start to see this. We sample so well that we can see that, based on the approximate interactions we have, our results fall, for instance, slightly outside the experimental numbers. But you know what's much more common? That you realize that the standard error in the experimental result is so large that you can't really say. And that comes back to these structure factors. It's really dangerous to think that an interpreted experimental structure is a result. Because what if you used a really crappy model to determine that structure from the structure factors in the first place? Anybody who ever tries to compare to that is gonna get a bad result, because that model is bad, right? And that's why I mentioned this example. People have actually gone back and tried to re-refine proteins in the Protein Data Bank and improve the structures. And again, this is certainly not to discredit either the experiments or the theory, but this is difficult.
And there are no easy answers in the world; everything, even an experimental result, includes a model. And if you don't know what the model is, you haven't understood what people are doing. The other problem here is that you can't really simulate a protein on its own. Well, I could. I could take a small protein, put it in a computer and simulate it. What would be the problem with that? Well, the first thing: most proteins wouldn't fold in vacuum, right? Because all these things we said about the energies assumed that we have water; proteins have evolved to be stable in water. Some proteins would do okay. Membrane proteins, for instance, would be fairly happy in vacuum, because vacuum doesn't have hydrogen bonds and the membrane environment wouldn't have hydrogen bonds either. But if you start by throwing out the water, you could simulate a billion years and it wouldn't be realistic anyway. This is actually not a protein. You know what this is? It's a very old result from when I was a postdoc at Stanford. We had a small collaboration with a researcher at UC Davis in California. And she was collaborating with a company that was very interested in these molecules. Do you have any idea what company it is, based on this molecule? These small molecules are ethanol, roughly 10%. And this is actually a tannin. So that would be red wine. The company Gallo sponsors a very large part of their enological faculty. I have no idea if she ever got anything interesting out of it, but I helped her set up the simulation so that they could simulate red wine tannins in an alcohol-water mixture. So that's essentially red wine you're simulating. The only problem is that this is not red wine, because you don't have the water. That would be red wine, right? It's even a bit red. It's a burgundy, I guess. And the point is, to even be able to claim that you're trying to model something realistic, you're gonna need the water. How much of this system is water? The vast majority, right?
So when it comes to simulating proteins, it's mostly about simulating water with a tiny amount of protein in it. If you don't do that, you're screwed. The other problem is that this water appears to be strangely square for water. Water would usually not look like that. The problem is that I can't simulate an entire bottle of red wine. That would take, I have no idea, 10 to the power of 30 molecules or something. Sorry, it won't work. So to be able to simulate something, you're gonna need a very small system. But if you have a very small system, you would have horrible surface tension or something that would change everything. I think you applied this a little bit in your diffusion lab, right? You designed something where, when a molecule went out on the left, it came in on the right again. And that's what we do here, too. So the water that goes out on the left there instantly comes in on the right. That's, of course, an approximation. But this molecule is gonna feel that there are one or two nanometers of water around it, and water shields electrostatics extremely well because it has an epsilon of 80. So this molecule is not really gonna see its next periodic neighbor if the box is large enough. So that works very well. And that way you can have a very small box. And the same goes for all forces: that atom is really attracting an atom over there. So all these interactions correspond to having an infinite number of periodic copies of your system. But the beautiful thing is, you actually only have to do the calculation for your central system there. So that's like a 40-year-old approximation that works really well. What people do today: you can do this for a gigantic membrane protein, a ligand-gated ion channel that we're gonna talk more about on Thursday or Friday. You first put this in a membrane, the gray part here, and then there is water and everything around it in blue here. So that's a much larger system.
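The wrap-around trick from the diffusion lab, and the matching minimum-image distance between particles, are one line each. A minimal sketch for one axis of a cubic box:

```python
def wrap(x, box):
    # Periodic boundary: a particle leaving on the left re-enters on
    # the right (and vice versa).
    return x % box

def min_image(dx, box):
    # Minimum-image convention: measure the distance to the nearest
    # periodic copy of the other particle, never more than box/2 away.
    return dx - box * round(dx / box)
```

Applied per axis, this is all the bookkeeping needed to make a small box behave as if it had an infinite number of periodic copies around it.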
This might be 150,000 atoms or something, and you can simulate it for one or 10 microseconds today. And you can see the channel opening. You can test, if you perturb a residue, whether that causes the pore to open or not. You can't fold a protein like that. Something really tiny we could fold, but this would likely take a millisecond, well, more than a millisecond to fold. But there is a surprising number of very biological processes you can study this way. Then it is important to keep all your approximations in mind. Unless you're sampling the relevant part of phase space, you're not doing anything; you're just collecting beautiful images. And it's so easy to be seduced by these beautiful images, just because they look beautiful. But what is that again? No, trick question: that's not a tannin. This is a set of coordinates in a computer program that I'm drawing as if they were a tannin. It's not a tannin. And of course, if I screwed up my coordinates, and I was a young postdoc at the time, so I very well might have, it would still look like a tannin, but it might be a really bad model. I might have screwed up the charges; I might have put a positive charge on that oxygen. It's gonna be a horrible model in that case, but if I draw it red, it will still look like an oxygen. So, thank God, you're usually gonna use programs that are relatively stable. We and other groups developed these programs, and they have of course been carefully tested, right? But no matter whether you're using somebody else's program, you're still applying a model. And that model can contain errors, just as an experiment can contain errors. By default, you would have a small cubic box. I'm just gonna mention that in general there are other shapes; if you go into solid state physics, you can actually take non-cubic boxes. This is, for instance, a rhombic dodecahedron, and you can pack those boxes in space too. 
It's possible to do these things in most simulation programs. And the reason you do this is that the shape is more spherical, so for a given amount of water you can reduce the size of the box, because you're essentially cutting out the water in the corners. But I don't think you're gonna see that in the labs. Sorry, was there somebody who almost interrupted me with a question? Mm-hmm. Yes. And the primary advantage of these programs is that they have been tested well and are reasonably easy to work with. For instance, the programs that we are writing are roughly three million lines of source code, so they're gigantic programs. And the reason why they're so large is that these types of simulations are slow to run; the programs themselves are fast, but you typically try to run them on supercomputers or graphics cards, and it's simply hard to write those programs, and very easy to make mistakes. But I would say that from a scientific point of view, they're reasonably user-friendly. Compared to a program like Microsoft Word, they're horribly user-hostile, but that's kind of the normal state of programs in academia. Academics don't care so much about user interfaces; they care about the gory stuff. But there are lots of other programs, and not everything is gonna be about simulations. Later in the course, we will look into things like docking. In many cases, you just wanna screen: given this protein, where is it likely that a small molecule will bind? There are commercial programs that can do that in 30 seconds. And they're super beautiful, with the most amazing user interfaces you can imagine. And if you're sitting in a company, you have to pay something like $50,000 per year per seat to use these programs. Because, in contrast to Microsoft Word, they're not gonna sell a million copies of these programs; they sell maybe a hundred or a thousand copies. 
And the pharmaceutical companies that use these programs, of course, save millions of dollars by not having to do the experiments, so they are more than happy to pay $50,000 per copy. So what is it that we try to do in these simulations? I showed you this movie before, but I like to reuse it; it's far easier than making a new one. This landscape was really a free energy landscape, right? And what you are doing in a simulation... you might think that you're simulating a molecule. That's not what you do. You're really exploring the free energy landscape. And your simulation is good enough if you have explored all the relevant parts of the free energy landscape. So missing that part is gonna be kind of bad. Missing that part is gonna be kind of bad. Missing the red part is not really gonna influence your results. In theory, what you see here could of course have that atom bumping into that atom, or two hydrogens interacting horribly, but this molecule spends virtually all its time in the conformations where it's really happy, right? So we're gonna spend 99% of our time down here, and we never spend any time in the red region. So while we would ideally like to sample every single point in this plot, in practice we have to make do with the really blue ones here, the low-lying energies. Is there a way to sample more? Could you think of a way, if I still would like to sample more of the free energy landscape? So I start there. What would happen then? Boom. Okay. So what happens now? Then you don't move. Okay. So that didn't really work. Boom. Boom. Boom. Boom. Boom. I can only sample that deep blue part. Do I sample a bit? I do sample a bit, right? Why do I sample a bit? Why is the molecule not entirely frozen? No, well, yes. But we'll get to that in a second. 
Why, if this were at absolute zero, would I be completely stuck there? The reason why this moves is that we have a finite temperature, in this case roughly 300 Kelvin. What would happen if I increased the temperature? Yes. The higher the temperature is... eventually, maybe at 1,000 Kelvin, I might sample all of this region, and at, say, 10,000 Kelvin I would even sample the red part. What's the problem with that? Well, that state does exist, in a way. But how relevant is it to simulate? I would sample all of my phase space, but I would be simulating how a protein behaves at 10,000 Kelvin. That's not particularly relevant for biology, right? So everything we're collecting, the whole ensemble, would not be a natural state. So yes, it's a great way to sample more of your energy landscape, but I'm not sampling it correctly; it's the wrong temperature. So that won't work. There are tricks to sample more of your energy landscape, but the whole idea is that how much you sample is a relation with temperature. And the good part is that, as bad as it might sound, if you don't care about the red part, it might very well be enough. If you want to understand how a protein moves, it might be enough to study this part. If you want to understand how the protein folds, you might need to sample both that part and that part. But unless you're going to simulate how a protein behaves in a nuclear explosion, you're probably not interested in the red part. So this approximation works much better than you'd think. What you frequently need to do... you could imagine doing exactly what you said, the simplest algorithm you could think of: starting from a completely arbitrary point in the energy landscape, we would somehow like to find the lowest free energy, right? And that is deliberately a free energy. The problem is that I can't really calculate a free energy, because a free energy depends on entropy. 
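The temperature argument can be made concrete with the Boltzmann factor from yesterday. A small sketch; the 20 kJ/mol height of the "red region" is an arbitrary illustrative number:

```python
import math

KB = 0.0083145  # Boltzmann constant in kJ/(mol K)

def boltzmann_weight(delta_e, temperature):
    """Relative probability of visiting a state delta_e (kJ/mol)
    above the minimum, at the given temperature (K)."""
    return math.exp(-delta_e / (KB * temperature))

# A "red region" sitting 20 kJ/mol above the blue minimum:
for T in (300, 1000, 10000):
    print(T, boltzmann_weight(20.0, T))
```

At 300 K that region is visited only a fraction of a permille of the time; at 10,000 K it is visited almost as often as the minimum itself, which is exactly why the high-temperature ensemble is unphysical for biology.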
I can't measure entropy directly. The only thing I can calculate easily in a simulation is the potential energy, from the equation I showed you before; the potential energy I can calculate at any time. So let's forget about the free energy for a second and just look at the potential energy. I might be able to find the lowest point. The only problem is that that doesn't work in practice. Because if I am here, every algorithm in mathematics that tries to find something lower will only find something that's locally lower, a local minimum of the function. The only way to find the lowest point in the entire space is to explore every single point in the entire space, because there might always be something even lower beyond the next energy barrier. I can't predict that. But still imagine a very, very simple, naive algorithm: you start at some point, it could be that red point there, and then you just try to walk downhill. The second the terrain starts going up again, you turn. And you just keep doing this, changing direction so that you're always pointing downhill. And pointing downhill is very easy, because I know what my potential energy is; I can calculate its gradient, or derivative, and just go in the direction that is steepest downhill at every point. It's a very simple algorithm, though not necessarily the world's most beautiful one. So let's take an example here. These curves correspond to points with the same potential value, so the gradient is orthogonal to them. Of course, if I'm super lucky, starting here in a very smooth landscape, I could go directly to the minimum. But in general, I would zigzag back and forth and gradually approach the minimum. The beautiful thing is that this works really well. It's not hard. Do you have any idea what this is going to do to a protein? How much will it change? 
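The walk-downhill idea is steepest descent. A toy sketch on a two-dimensional potential; the elongated quadratic valley is my own illustrative choice, not the lecture's exact example:

```python
import numpy as np

def steepest_descent(grad, x0, step=0.01, tol=1e-6, max_iter=10_000):
    """Minimal steepest-descent sketch: walk along -gradient until the
    gradient is almost zero. Finds a *local* minimum only; it can
    never cross an energy barrier."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - step * g
    return x

# Toy potential V(x, y) = (x - 1)**2 + 10 * y**2, an elongated valley
# like the level curves on the slide; gradient = (2(x-1), 20y)
grad = lambda p: np.array([2 * (p[0] - 1), 20 * p[1]])
print(steepest_descent(grad, [5.0, 3.0]))  # converges near (1, 0)
```

Note how the steep y-direction relaxes in a few steps while the shallow x-direction takes hundreds: that zigzagging along the valley is exactly why steepest descent is "not the world's most beautiful algorithm".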
If I take a protein from the Protein Data Bank and run an energy minimization on it, what will happen? Should we try? You're going to need to look here, because I think it starts the second I start the slide. The largest change you saw was that the beta sheet turned roughly 30 degrees. There are two reasons for this. First, we're only going to find a local state. Second, I bet that this protein was already energy minimized, likely with Amber, because that's what people used to do in the 1980s. It's just that they probably didn't energy minimize it very well, or they might have done it without water or something, so we end up in a slightly different state. But these differences were minor motions in side chains and so on. So energy minimization is not really going to give you a different structure. You can do this on a very large scale: we can take proteins, distort them, and see whether we can get back to the native state. The red curves here, up here and more to the right, correspond to bad structures. And if we start from the red, energy minimization moves us a bit to the left, so we get structures that are slightly better. The measure here is called the distance matrix error, which I won't go into in detail. But the short story is that with energy minimization you can get rid of some of the really bad things in a structure. If two atoms are colliding, energy minimization will move them away from each other. But energy minimization will never get over an energy barrier. It will never realize that, oh, these two helices should really be packed that way instead. It can't, by definition. So why on earth do we bother with energy minimization? Well, we don't really, but we'll do it anyway; I'll tell you why in a second. For now, let's give up on energy minimization. That's not going to solve anything. 
But then you might say, you know what, based on everything I said, it's much worse than that. Can you even simulate a protein? Forget the fact that stupid people like me might try to do it; think conceptually. If you have a single particle of air, because that's the level of detail at which you're trying to track this, can you track a single particle of air? It's going to be a chaotic system, right? Have you seen the Lorenz attractor? Unfortunately, this is not a movie, but it is a mathematical function in 3D. The point is that if you start drawing a curve here, you will either go around in the same lobe, or occasionally you will switch over to the other one, and then back again, and you can't predict when you switch. If you make an infinitesimal change here, and it can be arbitrarily small, you will end up with a different trajectory. And when you swap between these two lobes of the attractor is completely impossible to predict. If you run this in a computer, the more digits of floating point accuracy you have, the longer you can track the particle. But the trajectories deviate exponentially. So at some point, even with a billion digits, which no computer would have in hardware, after say a billion steps you could still not predict it. This is the way nature works. And you know what, even assuming that this were possible, and that we had a perfect computer with an infinite number of digits, there is the Heisenberg uncertainty principle. You can't know both the exact position and the exact velocity of a particle in quantum mechanics; it's impossible. And to predict the motion, you need to know both the starting velocity and the starting position. So by definition, it's impossible to predict the motion. So now we should again close the course and go home. This is completely impossible. Why does this work, then? That's a good question. 
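The exponential divergence is easy to demonstrate numerically. A sketch using simple Euler integration of the Lorenz system; the parameter values are the classic ones, while the step size and perturbation are my own choices:

```python
import numpy as np

def lorenz_deriv(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

dt = 0.001
a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-10, 0.0, 0.0])   # an arbitrarily small perturbation

for _ in range(20_000):               # integrate 20 time units
    a = a + dt * lorenz_deriv(a)
    b = b + dt * lorenz_deriv(b)

print(np.linalg.norm(a - b))  # the 1e-10 difference has grown by orders of magnitude
```

The separation grows roughly exponentially until it saturates at the size of the attractor, which is exactly why no finite number of digits lets you track the exact trajectory forever.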
It's deeper than that. You would be surprised by the number of students at the PhD level who think they can simulate a protein. You're not simulating a protein. Remember everything I said yesterday about the partition function: if we know the partition function, we know everything. Simulations are models based on predicting properties that we calculate from the partition function. The reason why I run a simulation is not to trace the exact paths of the particles; it's really to sample states from the partition function so that we can calculate these average properties. There is absolutely nothing here that has to be an actual sequence of states in nature. You're calculating statistical averages. That's why it's called statistical mechanics. And sadly, there is a generation of people who think that they're actually tracking individual particles. For starters, you don't even know the initial conditions: the way we normally set the initial velocities is from what's called a Maxwell-Boltzmann distribution, so we assign the velocities randomly based on the temperature. But that's fine, because if we are calculating average properties from samples of the energy landscape, then as long as you generate states that sample the energy landscape correctly, we're gonna be fine, right? Think of that attractor: there are two lobes to it, and if we should spend on average 50-50 of the time in each lobe, then as long as your simulation spends 50-50 in each lobe, you will likely get the statistical properties correct. You don't rely on the individual trajectory of a particle being exact; it's just the average sampling that has to be correct. So we don't really rely on knowing the exact trajectory. We're just sampling parts of the landscape, and as I'm gonna show you in a while, it turns out that using exactly the same equations as if we were moving particles is a very nice way to sample the energy landscape. 
But we're not predicting the individual particles. Now, in the labs so far you have used a couple of different sampling methods. I would argue that the simplest one is Monte Carlo simulation, and in many cases it's the best. That's what you did in all your simulations, and you more or less derived it yourself. And there's a fun story here: you decided whether to accept states or not based on the Boltzmann distribution, right? Do you remember that? That has a very special name: it's called the Metropolis criterion, and the first author of the research paper was Metropolis. That work was developed as part of the Manhattan Project and stayed classified in the US for almost 30 years, because in the 1940s it was a pretty big feat to be able to simulate physics in computers. It was not until the 1970s that it was declassified. It isn't classified anymore; I'll see if I can find it for you. It's relatively hard physics, because he goes through all the derivations. But we're talking about fairly modern science here. Monte Carlo really just relies on one thing: you make smart moves that correspond to natural things that might happen, and then you decide whether to accept them or not. If you're lucky, this works really well. In a very simple protein model on a lattice, you could have things that move on the lattice. In a real protein, if you're just moving small loops or something, you can probably come up with an algorithm that suggests different ways of moving that loop. The advantage is that this is super efficient: it samples a large part of phase space very quickly. What is the problem with it? The problem is if you have a water molecule here, right? What's going to happen when you move the chain? You're instantly going to bump into water. And it's actually even worse than that. 
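The Metropolis criterion itself is only a few lines. A minimal sketch for a single coordinate in a harmonic potential; the potential, move size, and kT value are illustrative choices of mine, not from the lecture:

```python
import math
import random

KT = 2.5  # roughly kT in kJ/mol at 300 K

def metropolis_step(x, energy, move_size=0.5):
    """Suggest a random move; always accept downhill moves, accept uphill
    moves with probability exp(-dE/kT). This samples the Boltzmann
    distribution."""
    x_new = x + random.uniform(-move_size, move_size)
    d_e = energy(x_new) - energy(x)
    if d_e <= 0 or random.random() < math.exp(-d_e / KT):
        return x_new  # accepted
    return x          # rejected: count the old state again

random.seed(1)                      # for reproducibility
energy = lambda x: 50.0 * x * x     # a stiff harmonic "bond"
x, samples = 0.0, []
for _ in range(20_000):
    x = metropolis_step(x, energy)
    samples.append(x)

mean = sum(samples) / len(samples)
print(mean)  # close to 0 by symmetry
```

Note the crucial detail in the rejection branch: a rejected move means you count the *old* state again, which is exactly what keeps the sampled distribution Boltzmann-correct.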
If you do this inside a big protein, where the motions are very small, and you try a large move here, you will bump into the next beta strand. And remember the picture I showed you just after the break: you're in a space that's completely crowded with atoms, so if you take any atom and try to move it, you're going to bump into something. So this algorithm is going to "work" really well: you suggest a move and the answer is no. You suggest a move and the answer is no. You suggest a move and the answer is no. You will never, ever be able to move anything, because no matter how you try to move, you will bump into something. So for a small simplified system, particularly one without water, this is awesome. The second you add water, you're dead, because you're never going to get any accepted moves. But all I was really trying to do was to sample conformations that fulfill the Boltzmann distribution, right? You tried that in the first lab. You can't use any acceptance criterion you want: you need to sample things according to the Boltzmann distribution, or your averages are going to be garbage. But one very simple way to sample according to the Boltzmann distribution is the normal equations of motion: force equals mass times acceleration, right? It's possible to show that this reproduces the Boltzmann distribution. These are the laws that particles in reality follow, so it's kind of obvious that they reproduce the Boltzmann distribution describing reality. Here you can forget about those other things; this is essentially a tiny protein. That must be the world's smallest helix, and it's actually in one of those non-cubic boxes: a very small part of an alpha helix moving in water. It's hard to follow here, but you definitely have side chains and everything moving, and because each move here is so small, you can definitely have the water. 
The problem is that your moves are going to be much smaller, so you need to spend way more steps on sampling things. The idea here is really that we know that the path, or the trajectory as we call it, of an individual particle over a long time is incorrect. But if the errors are small over reasonably long stretches, the averages will be fine. And it's the averages we are after; we're not interested in the specific motion of one particle, because that's subject to statistical fluctuations anyway. And these equations you actually solved already in school: Newton's second law, force equals mass times acceleration. You know the potential, that long V equation. If you know the potential, you can calculate the force: that's just minus the derivative of the potential with respect to position. And we know the mass of every single atom, so we know the acceleration of every single atom. If you know the acceleration of every atom, you know how the velocity changes from one time to a very, very short time later: that change in velocity is the acceleration, which is the derivative of the velocity, multiplied by the small time step. Now you know how the velocity changes. And then we do the same thing: if we know how the velocity changes, we can integrate that and see how the position changes, because the velocity is the derivative of the position. So now we have updated the positions; we went from time t to time t plus delta t in the simulation, and we've moved all the particles in the system. And then you need to recalculate all the forces. If you want to be slightly more careful, this is how you formulate it mathematically, but it's exactly the same information: force is mass multiplied by the second derivative of the position, and that force is also minus the gradient of the potential with respect to each atomic position. 
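The update scheme just described (acceleration from the force, then velocity, then position) can be written in a few lines. A sketch on a one-dimensional harmonic oscillator; real MD codes use the closely related leapfrog or velocity Verlet schemes:

```python
import math

def md_step(x, v, mass, force, dt):
    """One integration step: a = F/m, then v += a*dt, then x += v*dt."""
    a = force(x) / mass
    v = v + a * dt
    x = x + v * dt
    return x, v

# Harmonic oscillator: V = 0.5*k*x^2, so F = -dV/dx = -k*x
k, m, dt = 1.0, 1.0, 0.01
force = lambda x: -k * x

x, v = 1.0, 0.0
for _ in range(int(2 * math.pi / dt)):  # roughly one oscillation period
    x, v = md_step(x, v, m, force, dt)

print(x, v)  # back close to the starting point (1, 0)
```

Updating the position with the *already updated* velocity makes this the symplectic (semi-implicit) Euler scheme, which is why the energy stays bounded over a full oscillation instead of drifting.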
Rather than the equations, I'll show you this instead. This is what happens in reality. And I know you are silent because you are amazed by the amount of biology you're learning from this, right? This explains protein folding and everything. Each step here is two femtoseconds, and that's roughly how long a time step you can take. It's pointless; we're not learning anything. The good news is that the rendering took something like 10,000 times longer than actually running the simulation. And I'm only showing one water molecule, but this water isn't isolated; it's part of a larger simulation. Let's see if I can start that. So this is a real simulation of 10 picoseconds. I remember, because I ran it a couple of years ago; today I could run this on my laptop. It takes roughly the same amount of time to run the simulation for 1,700 atoms as it took to render the movie on my computer. So this is still a relatively short simulation, and I'm well aware we're still not learning anything about protein folding here. But you can certainly start to calculate the properties of hydrogen bonds, average lifetimes of hydrogen bonds, viscosity, diffusivity of water, and so on. So suddenly you're at least in the physical chemistry realm. And, sorry, I think it's trying to play the movie backwards, but my hard disk won't have that. The point is, these are not trivial properties; they are not properties you can easily calculate analytically. And even though these water models are very simple, they're actually reasonably accurate. We might be 10% off in viscosity, which is the worst property. And the only reason why we're 10% off is that I want a very simple water model, because later I'm going to put proteins in it: I don't want to waste my time on the water, I want to focus on the proteins. There are great water models that get the freezing point of water and everything exactly right. But what we want to simulate is not really the water, it's a protein. 
This is a small part of a protein called the villin headpiece. So what is this protein doing here? Yes. I have to confess that I'm the source of the spin: I'm just telling PyMOL to rotate this while rendering, so that you can see all sides of it. This one is actually surrounded by water too; I'm just not showing the water, because then you would only see the water, not the protein. This was the first protein that people managed to fold, around 1998, in a microsecond simulation. So this protein, we can predict how it folds. It's not trivial; it's something like 35 residues. But what's it doing here? Well, this is actually the native state. And my point in showing the native state: is this one microstate or many microstates? Yes. This is a beautiful example, right? We are exploring the volume of microstates that corresponds to the macrostate that we call the folded state. The folded state is what a measurement would see; microscopically, there are thousands or billions of states that the molecule explores. And they're not bumping into each other; it's really exploring the low-lying regions of the energy landscape. Before I start the movie on the next slide, you can think of this in a slightly different way. When people did this in 1998, it was a landmark paper. At that time, I remember when I was a student and then went over to Stanford, there was a lecturer, Jay Ponder, a really smart guy, who argued that, based on the accuracy we have in the potentials, if there is an error of 0.05 kcal or something in each term, then when you sum everything up it's very easy for the errors to reach 10, 20, 30 kcal. And if the errors are that large, then no matter how much computing time we spend, we will likely never be able to fold proteins. 
So in the late 1990s, it was a very open question whether we could fold proteins at all, no matter how much computing time we spent. On that 1998 project, which this movie is not from, they used four to six months on one of the largest supercomputers in the US, all thousand nodes of it. It was an insane amount of computing time, and they got two very high-impact papers. But based on what you know about the energy landscape, is it really smart to run one single long simulation? Why should you instead run lots of them and average? Right, because we are not tracing the specific motion of one particle; we are sampling an energy landscape. If you're going to map all of Stockholm, one possibility is to buy a Porsche and drive very quickly through the city; another is to hire a thousand students and give them a map each. The latter alternative is likely going to be more advantageous. So a bunch of groups in the world, including us at Stanford, started looking into other ways of sampling this, and this led to one of those projects called Folding@home, where we borrowed screensaver time, not on a thousand computers, but on hundreds of thousands of computers all over the world. Each of them was slower, but aggregated they were something like a factor of 10 more powerful. And then you run a hundred thousand simulations. Of course, most of them won't fold, but some do. This is a different protein called BBA5, roughly 23 residues, and I've drawn the backbone here. Sadly, PyMOL is not going to update the secondary structure assignment during the simulation, so you're not going to see proper beta sheets and helices. There is also a large amount of water around it, but I've removed most of it so you can see the protein. And there's a timer up there. It's going to be very... oops, sorry, what did I do? 
No, sorry, I'm not going to do it that way. Ah, I need to click. So this one starts exploring phase space as an extended chain. It very quickly coils up: within less than 10 nanoseconds it has collapsed. And then you see the helix forming. Did you see how quickly that happened? The helix forms fast, and there's actually a beta sheet too, but it's in the wrong direction. It keeps exploring states here; it's not too happy with the beta sheet, and that beta sheet is actually slightly wrong. It occasionally unfolds a bit, and then eventually, here somewhere around 40 nanoseconds, it's going to be very happy. And this is within three angstroms of the native state. This happened super quickly: 40 nanoseconds. If folding happened in 40 nanoseconds, you could do it on your computer any time. The problem is, of course, that only 10 simulations out of 100,000 folded. And this is just the statistics of transition barriers, because the expected folding time of this protein is about five to 10 milliseconds. And if you do that math, you would expect, if you run 100,000 short simulations, to see roughly 10 folding events. And this is the curse: it is statistical mechanics. We need a huge amount of simulation time, because the hard part is waiting until you have enough energy, and that comes back to what we talked about: the energy barriers and the probability of crossing them, right? Once you have amassed enough energy, the actual process of folding is fairly quick. This is almost, and note that I say almost, a two-state transition: the helix formed slightly earlier, and then the beta sheet formed, so you could argue that there are two somewhat different processes. But if you measure this experimentally, it still works reasonably well to approximate it as two-state folding. Today, in Folding@home, we can fold proteins up to roughly 80 residues and predict the structures exactly. 
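Barrier crossing is approximately a Poisson process, so the expected number of folding events is easy to estimate. A sketch with hypothetical numbers in the spirit of the lecture; the 500 ns run length is my assumption, since only the folding time scale and the number of runs are stated:

```python
import math

def expected_folding_events(n_sims, sim_length, mean_folding_time):
    """Expected number of independent short runs that fold at least once,
    assuming folding is a Poisson process with the given mean waiting
    time (all times in the same unit, here seconds)."""
    p_fold = 1.0 - math.exp(-sim_length / mean_folding_time)
    return n_sims * p_fold

# 100,000 runs of ~500 ns each, against a ~5 ms mean folding time:
print(expected_folding_events(100_000, 500e-9, 5e-3))  # roughly 10 events
```

Since each run is far shorter than the mean folding time, the per-run probability is essentially t/tau, and you need the enormous number of runs to see a handful of events.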
And going from 20 to 80 residues is not a factor of four in difficulty, because we're talking about exponentials, right? So it's insane how efficient this has become. And this goes back to that little trajectory: what you saw on the last slide was an example trajectory that one molecule could follow under some conditions. That does not mean that every molecule will follow exactly that path. It's not a prediction of the motion; it's a chaotic process, and we're sampling states from that chaotic process, just as with the attractor. The important thing for understanding kinetics is really probabilities: if you are in a half-folded state, what is the likelihood of reaching the fully folded state? If anybody says that this will always fold, or that this can never fold, they're wrong: it's all based on statistics and likelihoods. The good thing is that computers are pretty good at that, and they're getting better every single year. And there's a concept people frequently call the shadow trajectory: the simulated path stays, over a reasonable time, sufficiently close to some possible real path, even though we can't predict that path exactly. And the good take-home is: we don't care. As long as you're sampling from the correct distribution in your energy landscape, all your average calculated properties are gonna be correct. So focus on the statistics, not the individual samples. In practice, there are gonna be a handful of small complications, because you're gradually gonna start simulating small alpha helices and simple proteins. We're not gonna be able to fold things, because then you would have to wait three weeks; the idea is to have you focus on analyzing and understanding things. But there are some things that complicate matters a bit, because it's so easy to just push the button and say that you're simulating something. 
If you don't do anything in particular, what will happen is that energy is conserved, right? Forget about cut-offs and those details; Ari and Björn will likely go through that. In general, if you have a protein that's not perfectly happy where it is right now, then as you're searching the landscape you will likely find lower and lower values of the potential energy, because the protein packs itself better. That's great: the potential energy goes down. But as the potential energy goes down and the total energy is conserved, the kinetic energy goes up. The temperature goes up. So suddenly you're simulating your protein at 400 Kelvin, and at 400 Kelvin your protein will unfold. So that didn't really work. You need to somehow fix the temperature to make sure that you simulate at room temperature, say 300 Kelvin. That's what happens in the lab too, in a test tube: the test tube can exchange heat with its environment, and you need to describe that in a simulation as well. Similarly, if you want the pressure or volume to be controlled, you should describe that too; but for our simple systems, forget about it, it's not relevant for what you're doing. The temperature is, though. A very simple way to control temperature uses the fact that temperature is just kinetic energy. You know every single velocity in the system, so I can calculate what the temperature is and say, oh, it's 310 Kelvin at this particular step. So let's scale all my velocities down a little bit, because that's gonna reduce the temperature. I don't wanna scale it perfectly to 300 Kelvin in one step, but do it smoothly: if the temperature is too high, I take all my velocities down a little, and if the temperature is too low, I increase them a little. That will very quickly get you roughly the correct temperature. You're not gonna need to do this yourself; programs do this for you. 
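That simple velocity-scaling idea is essentially the weak-coupling (Berendsen-style) thermostat. A sketch, with the coupling strength and the particle numbers as arbitrary illustrative choices:

```python
import math
import numpy as np

KB = 0.0083145  # kJ/(mol K)

def rescale_velocities(v, masses, target_t, coupling=0.1):
    """Nudge all velocities a fraction `coupling` of the way toward the
    target temperature, instead of jumping there in one step."""
    n_dof = 3 * len(masses)
    kinetic = 0.5 * np.sum(masses[:, None] * v ** 2)   # (1/2) sum m v^2
    current_t = 2.0 * kinetic / (n_dof * KB)           # from KE = (3N/2) kB T
    lam = math.sqrt(1.0 + coupling * (target_t / current_t - 1.0))
    return v * lam, current_t

rng = np.random.default_rng(0)
masses = np.full(100, 18.0)            # 100 water-like particles
v = rng.normal(0.0, 1.0, (100, 3))     # arbitrary (far too hot) start

for _ in range(100):
    v, t = rescale_velocities(v, masses, target_t=300.0)

print(t)  # has relaxed smoothly toward 300 K
```

Each call moves the temperature only 10% of the remaining distance to the target, so the relaxation is geometric rather than an abrupt jump, which is the "smoothly" part of the description above.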
But the point is that things suddenly change for a realistic system, and this is the reason we had the simple labs we had: for a simple lab you only need to think about a few settings, while for a realistic system there can easily be 50 settings to think about. Björn and Ari will take you through that, and you won't have to think about most of them. But this is the reason why these settings files are so long: you need, for instance, to think about the temperature and pressure coupling, how you set up your system, and how you're handling all the interactions. So the take-home message for this: it's certainly possible to simulate almost anything if you have a powerful computer, but you also need to remember that, at the end of the day, this is statistical mechanics. Let me repeat that: it says statistical ahead of mechanics. And that means that just because you see something, it doesn't mean that you've seen a protein move, or that you've seen a protein fold. No matter how beautiful these plots look, these are numbers in a computer, and that's a model, just as much as a model you derive with pen and paper. Now, that doesn't mean that models are useless, but you have to realize that it's about statistics. If we see something several times, I say 10 here, but even that is probably a bit simplistic: you really should calculate a proper standard error. And if, based on the statistical model you have, you can say the likelihood is at least 95%, or 99%, or even 99.9%, then you can say that within the statistical significance you had in your model, this is something that you see in the simulation. But note that I say within the statistical significance: it's never binary, never simply right or wrong, it's all about statistics. And conversely, even a single observation that looks convincing at face value can be just a fluctuation.
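A minimal sketch of the "proper standard error" point, with invented numbers standing in for a quantity measured in several independent repeat simulations: report the mean plus or minus the standard error, and only call a difference "seen" when it clears a couple of standard errors.

```python
import math

# Sketch of the statistics point: never report a single observation.
# The numbers below are invented distances (say, in nm) from six
# hypothetical independent repeat simulations.
def mean_and_sem(xs):
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)   # sample variance
    return m, math.sqrt(var / n)                    # standard error of the mean

repeats = [1.92, 2.05, 1.88, 2.10, 1.97, 2.01]
m, sem = mean_and_sem(repeats)
print(f"{m:.2f} +/- {sem:.2f}")
# An apparent difference from some reference value is significant only
# if it exceeds roughly two standard errors (~95%, assuming normal stats).
```

The exact threshold (95%, 99%, ...) is your choice; the non-negotiable part is that the error bar exists at all.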
And if you're just seeing something once, I would argue that your first gut feeling should be that it was likely just a fluctuation. In some cases you can still say something: if you predict that a protein never folds, then seeing it fold once in a simulation is likely enough to falsify that hypothesis. But the one take-home message you really should catch here is: don't use simulations for show and tell. We sometimes call them atomic microscopes, but I don't really like that word, atomic or molecular or even computational microscope. This is not a microscope. It is, in a way, the world's most powerful theoretical model, but theoretical models have to be used with proper error estimates. So what you would do in a typical simulation is this. You get a structure. It could come from the Protein Data Bank, but if you're working together with an experimental group, they might well give you a brand new structure that hasn't been published yet. In most cases there are going to be lots of missing parts: most X-ray structures don't contain hydrogens, for instance; there might be some side chains missing, maybe a loop; there could be some atoms that they couldn't resolve. Bad things happen. There are other programs to fix most of these things, and most can be handled automatically, though not all. For instance, for the protonation and titration that I mention here, the challenge is that these depend on the pH, and no computer in the world can decide what pH you should run your experiments at; that depends on what you're trying to reproduce. Item three here is to prepare a topology. This is an extremely labor-intensive and complicated part, because this small word, parameters, is really where your entire model goes: every single value for every single constant, how you're modeling the force field, all the setup, everything. It's gigantic.
But the advantage is that there are programs that do this for you, so in many cases it will take two seconds: you hit a button and, boom, you get all the parameters you need to start a simulation. Then you need to embed your protein in whatever solvent you're going to use. This could be water for a soluble protein, or a membrane if it's a membrane protein, and then you might even need water around the protein and membrane. And then there is something strange: energy minimization. Why do I say energy minimization? I just told you a couple of slides ago that energy minimization is pretty much useless; it's not going to help us get a better structure, right? Well, that's true, but that's also not really the point here. I could in principle start a simulation right away, but the problem is that there is always something overlapping just a little bit. For instance, those hydrogens that were missing: the second I add a hydrogen, I might happen to put it on top of another hydrogen. Suddenly you have two charged hydrogens that repel each other extremely strongly. That strong force leads to a very large acceleration, and that acceleration might be so large that, if you're unlucky, this small hydrogen in the next step suddenly has a speed equivalent to half the speed of light. That will make the entire protein system crash one or two time steps later, you'll have atoms all over the place, and suddenly the computer says the coordinates are not a number; it can't even calculate them. That's probably a reasonably good model of what would happen if you actually could put two atoms on top of each other, like in a hydrogen bomb or something. The reason this happens, of course, is that we're starting from a completely unrealistic structure.
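You can see why an overlapping pair blows up with a few lines of arithmetic. This sketch (invented reduced units, not any particular force field) evaluates the Lennard-Jones force at decreasing separations: near the minimum the force is essentially zero, but at a tenth of an atomic diameter it is astronomically large, and one integration step then launches the atom.

```python
# Sketch (invented parameters) of why overlapping atoms crash a run:
# the Lennard-Jones repulsion goes like 1/r^13 in the force, so the
# force on a misplaced hydrogen diverges as the separation shrinks.
def lj_force(r, epsilon=1.0, sigma=1.0):
    # F(r) = -dU/dr = 24*epsilon*(2*(sigma/r)^13 - (sigma/r)^7)/sigma
    sr = sigma / r
    return 24.0 * epsilon * (2.0 * sr**13 - sr**7) / sigma

for r in (1.12, 0.9, 0.5, 0.1):
    print(f"r = {r:4.2f}  F = {lj_force(r):.3e}")
# Near r = 2^(1/6)*sigma ~ 1.12 the force is ~0 (the energy minimum);
# at r = 0.1*sigma it is ~10^14 in these units.
```

Multiply that force by even a femtosecond time step and you get the half-the-speed-of-light hydrogen from the story above.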
So when we say energy minimization here, what I really mean is probably energy maximum avoidance, but that's a bit too difficult to say. The whole point of the energy minimization is to do exactly what you saw in the movie some slides ago: it moves the atoms ever so slightly to make sure we don't have any horrible overlaps that would cause really bad things when we start to simulate. The point of energy minimization is not to move to any particular minimum; it's just to move away from those really bad states. The second we've done that, we run a small relaxation simulation. We want to make sure that the water packs around the protein, and if the structure is not perfectly correct, we want to give it a chance to adapt to the solvent and the force field. After that, you just run the production simulation you are aiming for. And then it says: analyze the data. What should you analyze? Well, that's the challenge: it depends on why you ran the simulation. At this stage it's very common to see people who have done a complete simulation and only then ask, what am I supposed to analyze? If that happens to you, you went wrong already at item zero, because before you do an experiment, you need to decide why you're doing the experiment. This is not fundamentally different from going down to a microscopy lab and starting to take tens of thousands of completely random images: you're unlikely to get some amazing result from completely random images. You need to decide first what you are imaging and why. In exactly the same way, if you're doing a simulation, you need to decide: what are you simulating? Why are you simulating it? What is your hypothesis? And what analysis might be able to either prove or disprove that hypothesis? That is something you need to think about before you even get the structure.
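The "maximum avoidance" idea can be sketched as a steepest-descent loop on two Lennard-Jones particles that start badly overlapping. The step rule and parameters below are invented for illustration; real minimizers are fancier, but the principle is the same: follow the downhill force with a capped step size until the horrible overlap is gone.

```python
# Sketch of "energy maximum avoidance" (invented parameters): steepest
# descent on the separation r of two Lennard-Jones particles that start
# nearly on top of each other. We follow the force downhill with a
# capped step, which walks the pair out of the catastrophic overlap.
def lj_force(r, epsilon=1.0, sigma=1.0):
    sr = sigma / r
    return 24.0 * epsilon * (2.0 * sr**13 - sr**7) / sigma

r, step_limit = 0.3, 0.05                 # badly overlapping start
for _ in range(1000):
    f = lj_force(r)                       # f > 0 pushes the pair apart
    dr = max(-step_limit, min(step_limit, 1e-4 * f))   # capped descent step
    r += dr
    if abs(f) < 1e-6:                     # effectively relaxed
        break
print(round(r, 3))                        # near the minimum at 2^(1/6) ~ 1.122
```

Note that the cap on the step size is doing the real work: the initial force is enormous, and an uncapped step would be exactly the crash we are trying to avoid.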
So please try to avoid ending up at the last step not really knowing what you should analyze or why. Unfortunately, this is also by far the hardest part for any program to help you automate. But there are also some amazing things we can do this way. Suddenly we have an extremely powerful modeling tool that we can, and will, use in lots of different ways in the labs to study realistic systems. In contrast to, say, a normal microscope, in a simulation I can literally put a flag on every atom and see exactly what happens. I can compare the folding of different proteins to see the mechanism by which they fold. We will get back to that later in the course, but when Finkelstein wrote this book, it was mostly a matter of pen and paper and ideas that were very difficult to prove. Now we can show exactly how a protein folds, under a particular model, by using computer simulations. And that is pretty much what I had for you today. There are a bunch of study questions here; it's primarily book chapter nine for the helices, sheets, and coil. The last part about simulating real biomolecules is not covered in the book, but follow the lecture notes and I think you will be fine. You will get a chance to do some labs on this, if not this week then at least next week. Thank you, and see you tomorrow.