Today we're going to finish off the theory part, at least the heavy theory part. By Friday we'll be back to proteins, but then we'll be looking at real protein structures and start to consider what we can do with them: how we can predict, simulate, and really understand biology. So bear with me if you felt there have been far too many equations so far. There will be some equations later in the course, but we're going to become much more biological in a couple of days.
We spent quite a lot of time yesterday rederiving the Boltzmann distribution for the general case. I spoke about alpha helices, but I never got a chance to get to beta sheets, so I'm going to repeat the alpha-helical part. I didn't quite forget about your study questions either, but since we were right in the middle of derivations I didn't have that many of them. This is roughly what I had on the very last slide. So let's do a shorter version of the study part, and then I will go through and repeat the complicated, or at least semi-complicated, part about alpha helices.
So what are all these letters? Let's start somewhere. What is E? Energy, yes, and in particular a potential energy. I wouldn't call it enthalpy, for a specific reason, but it's energy as opposed to free energy. Are there any other letters you use for energy? What is H? Sorry, I didn't catch that. Enthalpy, yes. So what's the difference between enthalpy and energy here?
Yes, enthalpy also includes the part where we're doing external work on the system, or the system is doing external work on the surroundings. That was actually not what I was thinking about when I mentioned that there are other letters. As I've told you a couple of times, what I like about physics is that if you don't know something you can just define it. Don't be afraid of defining things. In theory, if I ever ask you to derive something, you could of course say that you would like to use E for entropy, as long as you make a very clear definition that that's what you're doing. That's perfectly fine. The rest of the world is going to have some major trouble understanding your derivations, but as long as it's clearly defined you can use absolutely any letter you want.
That's good, but it's also bad. It's good when it's your freedom; it's bad when other people use the freedom. The problem is that there is a small truckload of different letters that people use for energy in particular. Later on you will see that we sometimes use V, in particular for potential energy. Sometimes we're going to use U to denote an energy. Sometimes we use a small epsilon when it's small, right? It's a small energy.
It's just stupid. It would be easier if we all stuck to one letter, but in many of these equations you suddenly need to account for several different energies, and then it's very nice to have a handful of letters that you can reuse. It would be good if people just stuck to these letters, but as you will see later today, and I'm very sorry, suddenly we're going to use U for free energy. It's just that in the book you need a letter, and then you pick a letter. The solution to this is to be very careful about your definitions, and if you want something, define it. Don't assume that E means energy just because you see E in the formula.
A very long time ago I was a PhD student at the Royal Institute of Technology here in Stockholm. As a PhD student you're a TA, and you just have to sit down and correct exams. These physics students are pretty smart, but some of them are also pretty lazy. In a typical physics course there isn't that much to learn by heart, so they are allowed these gigantic formula books. This was a course on atomic physics, where you frequently use the letter Z for the atomic number, and this student probably hadn't read that much of the course. There was some formula where you saw the letter Z, and he searched through his formula book until he found the partition function. If you remember from yesterday, we used Z for the partition function. So he just happily inserted the partition function everywhere it said Z. That's not entirely stupid, because the formulas themselves are good, right? But be careful with your definitions, because if he had just followed the definitions he would have seen that it was a completely different Z.
So that brings us to the other letters. What is S?
Entropy, and that is just defined as Boltzmann's constant times the logarithm of the number of microstates. I know that in the book we frequently use V for the number of microstates. Personally I like to use omega, because it's not really a volume, right? At least to me, a capital V normally means volume, while omega is something you never use anywhere else. Same thing here: you can use absolutely any letter you want as long as you've defined it properly.
Then we have F and G. So what's the difference between them, and which one is which? Normally that is settled by how we define them, but both chemists and physicists are sloppy. When you actually want to make the distinction you use either F or G, but in many cases in physics you always use F, because you don't really care about pressure. Same thing in chemistry: we always need to take pressure into account, so you use G. So it's very dangerous. When you see both of the letters you can assume that G includes the enthalpy and F doesn't, but if you just see F it can be either. So be careful with the definition. It comes down to the old saying that standards are good, so everybody should have their own.
And then there was the other question: why do we have kB?
Or the gas constant R; it's very much related to the answer you just had. Yes? Here, you mean? No. Taking the pressure and volume into account would be F versus G, right? You move from E to H and from F to G, and that takes the pressure and volume term into account. But this is different.
Boltzmann's constant: you could say that it's a fundamental natural constant, but really it has to do with the scale we've chosen for temperature. As we said, temperature comes out naturally from how much the entropy increases when you add energy, and that just happens to correspond perfectly to the intuitive concept you all know as temperature. In physics the obvious way to do this would be to measure temperature in units of energy and always have kB multiplied by T. That would require us to change every single thermometer in the world to units of energy. It would require us to reteach the entire population what temperature is and say, well, there is no such thing as temperature, this is really energy. That's never going to happen. So in practice we have to use Boltzmann's constant to translate between temperature as we've known it for several hundred years and the way we derive these things thermodynamically. It's just a scale constant to translate from kelvin to joules.
But then we have the other one, R, the gas constant. With all the energies I talk about I'm sloppy, and you're going to be sloppy too. What is the stabilization energy of a hydrogen bond? Or rather, what is roughly the free energy of a hydrogen bond? Per mole, yes. And you know roughly what a mole is, right?
Avogadro's constant. So think about it: the actual energies we're speaking about in biological systems are insanely low, something times 10 to the minus 25 or 23 at least, right? It would be absurd to always have to say, well, my energy is five times 10 to the minus 24 or something. Apart from the fact that it would take lots of space, we would keep making mistakes. It's simply pointless. And you never go into the lab and say that we added six billion five hundred ninety-six thousand molecules to the sample. Chemists work with moles; chemists don't really work with molecules. In physics it makes lots of sense to talk about the number of molecules or the number of atoms, but in chemistry we prefer to work with moles, and that's of course why you use the gas constant. It's basically the Boltzmann constant with Avogadro's constant included in it. That also means that the second we include it, we can start to talk about energies per mole. But do not forget that "per mole", because the second you say that the stabilization energy of a hydrogen bond is 5 kcal, you're off by a factor of 10 to the power of 23. Energies in these molecular systems are tiny.
So how do a system's properties, plural, change with energy or temperature? This question is deliberately a bit fuzzy. What do we mean when we say something changes with energy? What is the main property that changes if you increase the energy in a system? Volume, yes, but in particular the thing we talked about all day yesterday in these curves. Right, and how does it change?
Right, so if we increase the energy there are more ways to divide this energy in the system, and that means that more microstates in the system will be populated. So the system will be more disordered; entropy goes up. All these things basically say the same thing, but there are different ways of formulating it: higher entropy means that more microstates are populated, there are more ways to divide the energy, and so on. You can think of this in terms of toys: if you just have two toys it's difficult to make a mess of them; if you have ten thousand toys it's very easy to make a mess of them.
So what is a phase transition, if you use this formulation with free energy? There are two parts there. What are phases, first? Different states of matter, right. You can think of water vapor, liquid water, or ice. And if you look at a general phase diagram in physics, you can have pretty much any state under any conditions. Now, the likelihood of having, say, ice at a hundred degrees centigrade is fairly low, but it's not exactly zero, because we're now thinking in terms of the Boltzmann distribution. And the reason why some of these states are more or less likely is of course that the free energies of the different states vary, right?
So phase transitions happen when these curves cross each other. Normally you're going to have not just most of the system but the vast majority of the system in the lowest free energy state. But suddenly, when you pass zero degrees centigrade, ice is no longer the lowest free energy state; liquid water is. It's not really that the water changes or that the ice changes. The free energy of water is smooth and the free energy of ice is smooth, but because they cross, it suddenly becomes better to move over to the other state.
So let's draw this. We can say that this is the free energy as a function of temperature. As the temperature increases, the free energy of, for instance, ice starts out low and then gradually goes up, right? Because when you increase the temperature it becomes less and less favorable to be in this very ordered state; the entropic term gets worse. But this is smooth all the way, even around zero degrees centigrade. Water, on the other hand, starts out pretty bad, because at low temperature the disorder in water doesn't really help you and the hydrogen bond terms are going to punish you. As we move to higher and higher temperatures, this disordered state becomes better. If you look at either of these curves, there is absolutely nothing special that happens at zero degrees centigrade. The point is that they cross each other, right? Here it's better to be red, and here it's better to be green.
The reason why I keep pushing this simple physical chemistry is that, as you're going to see later today, beta sheets in particular undergo exactly the same type of transition. So large parts of protein folding really correspond to phase transitions.
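A minimal sketch of the crossing argument, not something from the slides. It assumes the free energy difference between water and ice is linear in temperature, with a constant melting enthalpy of about 6.01 kJ/mol and a melting entropy of about 22.0 J/(mol K). The function names are made up for the illustration.

```python
# Assumed, temperature-independent values for melting ice (illustrative only).
DELTA_H = 6010.0  # enthalpy gained by ice on melting, J/mol
DELTA_S = 22.0    # entropy gained by ice on melting, J/(mol K)


def free_energy_difference(temperature):
    """F_water - F_ice in J/mol; smooth (here linear) in temperature."""
    return DELTA_H - temperature * DELTA_S


def crossing_temperature():
    """Temperature in kelvin where the two free energy curves cross."""
    return DELTA_H / DELTA_S


def stable_phase(temperature):
    """The phase with the lower free energy at this temperature."""
    return "water" if free_energy_difference(temperature) < 0 else "ice"


for t in (263.0, 273.0, 283.0):
    print(t, round(free_energy_difference(t), 1), stable_phase(t))
print("curves cross near", round(crossing_temperature(), 1), "K")
```

Nothing special happens to either curve at the crossing; only the sign of the difference changes, and with it the state that holds the vast majority of the system.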
It's suddenly better to be in another state. Related to this, what are transition barriers? That's going to be a little bit important: transition rates and barriers. So what is the barrier? The amount of energy here, right, the activation or free energy barrier you need to get over. And how is that related to transition rates? Yes, per unit time. Once you have a barrier we can somehow calculate, or at least estimate, the time it takes to go over that barrier. But rather than talking about times it frequently makes much more sense to talk about rates, that is, how many molecules per unit of time go over. The higher the transition rates are, the quicker reactions happen.
So what determines how fast the reaction happens? Which free energy barrier? You said the largest one. Why the largest one? You're partly right and partly wrong. You're certainly right in the sense that if I have a complicated process where you start out here and then have to go through multiple states, and there is one very high barrier, then if you have to go through the barriers sequentially it's the largest one that's going to dominate completely. But if we go back to my example, if I need to get to the kitchen to get my coffee, there are multiple ways I can get to the kitchen. What determines how quickly this reaction happens? Yes, but in this case you had one starting point and a single pathway you could take to the end point. Assume now that you had a different system, because this is a gross oversimplification. Remember that real chemistry has hundreds of thousands of dimensions, right?
So I have a starting state here and a finishing state here. Imagine now that there are lots of different ways I can take between them. This is not drawn as a function of free energy; you can think of it as some sort of conformational landscape. Along each of these ways there would be one barrier: one there, one there, one there. Which one is going to determine how quickly my reaction happens? Suddenly it's the smallest barrier. Why? Exactly: the smallest barrier is going to correspond to the highest transition rate. The point is that if there are 10,000 molecules per second that take this path, yes, there might be another two per second that take the next path, but that doesn't really matter, right? In principle we can sum them all up, but we're going to have the largest transition rate over the barrier that's lowest. So if the barriers are sequential, it's the highest barrier that's most important; if they can occur in parallel, it's the lowest barrier that's going to be important.
This relates to what you already asked about enzymes. What does an enzyme do? How does the enzyme catalyze a reaction? An enzyme lowers the barrier somewhere, right? You can of course still take the path without the enzyme; that path is still available. But since there is now a second pathway with lower barriers, virtually all the molecules are going to take that one. We'll come back to enzymes later on.
Rather than asking you about the last question here, I will recap and go through the alpha helix kinetics a bit, and then we'll head on to beta sheets. I finished a bit quickly in the last 10 to 15 minutes yesterday. Where we were is that we wanted to understand how a helix forms. What is the stabilization energy of the helices? What are the average properties of the helices?
Can I somehow connect this to experiment? Because all I did yesterday was hand-wave. I said, you'll have to trust me that the initiation cost is roughly two hydrogen bonds, and so on. We have no idea whether that is correct or not; those were rough approximations from me as a physicist, and then I pretty much told you to trust me for now. The point of comparing to experiments today is that we will be able to see whether these are roughly correct or whether we have to correct these terms. We will also be able to see how quickly helices form. This is not going to be correct to more than about one order of magnitude, but as you will quickly learn, it's the orders of magnitude that are important. If something happens a factor of 10 faster or slower, that hardly matters, but when things can happen a factor of a hundred thousand faster or slower you start seeing differences. And as you will see, there are some substantial differences between helices and sheets.
One thing I said relatively early on: as much as we talked about phase transitions, helix formation is technically not a phase transition in the physical sense. Phase transitions happen because it's very bad in nature to mix different phases; you end up with a large surface tension between them. Helices are one-dimensional systems, and that means that the so-called surface, the boundary between helix and coil, is always going to be finite, just individual residues. So helix and coil can coexist. Initially you might think that's bad, but it's wonderful, because it makes it so easy to do measurements on helices. What you do is take a small sequence and put it in a CD spectrometer, then change the temperature or the salinity or something, and measure what fraction of the small sequence has moved over to helix. Yes? Yes, so what normally happens in physics, right?
The point where you've mixed any two phases, say water and ice, is going to be one of these local maxima, so it's always going to be better for you to move over to more water or more ice, and eventually to a hundred percent water or a hundred percent ice. This mixing is bad because along the surface you're going to have neither water nor ice, so you perturb the hydrogen bonds in that specific case. But when it comes to the helix, if I increase the amount of helix I'm not increasing the surface area between helix and coil, and similarly if I reduce the amount of helix I'm not changing the area between helix and coil. So I can adjust the amount of helix without really adjusting that energy term at all. This is not a course on Landau phase transitions or anything like that. The only reason I need to bring this up is that, since we've talked so much about phase transitions, the natural way to interpret this is as a phase transition, and technically it's not. So just remember that.
Yesterday I started talking a little bit about the equations, but you can think about this from the other end. If I want to understand something about a helix, I need to find a way to measure things. So you could equally well start from the CD spectroscopy part, right?
I can change some properties and thereby change the amount of helix. But that just tells me that I have 14% helix under some conditions, and that doesn't really help much in understanding it. So I need to find some way to translate the number of helical residues I have into the stabilization free energies that we talked about. What we said yesterday is that there were two terms for this helix: delta F helix corresponds to a delta F initiation plus the number of residues multiplied by a delta F elongation. I'm well aware that we used small f's yesterday. The initiation term is constant, and the way I formulated it yesterday, the elongation term is roughly constant per residue. This is of course not going to be exactly true, because it will depend slightly on the type of amino acid you have and everything, but to first approximation there is a term that's proportional to the number of residues.
If we're now looking at the sample in the CD spectrometer, there will be some helices in it, but we don't know exactly how long they are. What we do know is that at equilibrium they're going to have their equilibrium length: they don't want to get longer and they don't want to get shorter. So what I argued a little bit yesterday is, first, that the equilibrium point is where this delta F elongation becomes zero.
At that point the free energy we're gaining from the hydrogen bond is exactly the same as what we're losing in entropy by elongating the helix a bit more. But when we're looking at a real sample of, say, a hundred residues, I also have to take into account that the helix can start anywhere from residue 1 through 99, or actually 1 through 96, because we need at least four residues in a helix, and then it can stop anywhere from residue 4 to roughly 100. That means there are roughly 100 by 100 positions where we can (a) start and (b) stop a helix, right? There are roughly 100 squared ways of forming a helix here. Technically we need a factor of one half to say that it has to start before it stops, and if I wanted to be really accurate I should also take into account that helices should be at least three or four residues long, but that just adds noise. We're not interested in the noise; we're interested in the main principles.
So the entropy from the number of different ways we can form a helix really corresponds to Boltzmann's constant times the logarithm of the number of microstates. Here's where I would have used omega; the book uses V, so I will stay with V. But V sounds like a volume, and we're counting different possibilities here, not a volume. And then we skip that factor of two just to make it simpler.
So it's the logarithm of N squared, and when that exponent of 2 is inside a logarithm we can move it outside as a factor. That's roughly two times Boltzmann's constant times the logarithm of N, the length of the entire sequence including the non-helical part. Then we know that delta F helix consists of the initiation cost, the cost of starting a helix, minus the entropic part here, and then we just start solving a bit. We know that when the helix doesn't get any longer, say we are at 19 units and would like to get to 20, we are just at the point where the delta F of elongation is zero; it's not going to gain us anything more to make it longer. Then it's just a matter of inserting that in the equations we have. You can forget about that part for now.
What this gives us is that N zero, the number of residues in the entire sequence when we have roughly halfway helicity, is just going to be the exponential of the initiation free energy divided by 2 kT. We don't know what the initiation energy is. I hand-waved about that yesterday and argued that it was roughly two times the stabilization energy of the hydrogen bond. The point here is that we can measure this N zero from CD spectroscopy, and then we can extract the initiation barrier.
So why do I keep hiding this part? Well, for a long time, in particular in CD spectroscopy, rather than work with all those long equations people used two parameters, sigma and s. We're not really going to repeat them that much, but these are parameters you frequently see. Both sigma and s have to do with the propensity of different residues to either start forming a helix or extend the helix. For now the exact definition doesn't matter.
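Written out compactly, the steps of the argument above look roughly as follows. The symbols are my own choice for this summary: $N$ is the total sequence length, $n$ the number of helical residues, $N_0$ the length at roughly halfway helicity, and the factor of one half inside the logarithm is dropped as in the lecture.

```latex
\begin{align}
  \Delta F_{\text{helix}} &= \Delta F_{\text{init}} + n\,\Delta F_{\text{elong}} \\
  S &\approx k_B \ln N^2 = 2 k_B \ln N \\
  \Delta F_{\text{elong}} = 0 \;\Rightarrow\;
  \Delta F_{\text{helix}} &\approx \Delta F_{\text{init}} - T S
    = \Delta F_{\text{init}} - 2 k_B T \ln N \\
  \Delta F_{\text{helix}} = 0 \;\Rightarrow\;
  N_0 &= \exp\!\left(\frac{\Delta F_{\text{init}}}{2 k_B T}\right),
  \qquad
  \Delta F_{\text{init}} = 2 k_B T \ln N_0
\end{align}
```

The last line is the one that connects to experiment: a measured $N_0$ gives the initiation free energy directly.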
So forget sigma for a second. The neat thing is that once you know N zero, we can calculate either sigma or simply solve for this initiation energy. Of course this is going to depend on your amino acid. For a common amino acid it is on the order of a couple of kcal per mole. It's going to be positive because it's a barrier we need to get over, right? And if that barrier is in the ballpark of four kcal per mole, the free energy of the hydrogen bonds inside the helix is in the ballpark of minus two kcal per mole.
So was I right or wrong yesterday? I said that this f H was roughly the stabilization energy of a hydrogen bond, right? That was wrong. What's the energy of a hydrogen bond? Ballpark of five kcal per mole. So I was off by roughly a factor of two. Now, considering the number of approximations we made on the way here, I think this is pretty good. But there are a few things that are fundamentally different, hence the factor of two. So why do you think it's two instead of five? What are the things I've ignored? Partly that, but in particular I lied a bit yesterday. I said, you know what, let's assume that a helix folds in vacuum, and then we get all the energy of the hydrogen bonds. What this is an indication of is that for any real helix there will be some hydrogen bonds in the unfolded state too, so you're not going to gain quite as much from making the hydrogen bonds in the folded state. But again, anything that's within a factor of two I'm happy with in principle.
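To put numbers on the relation between N zero and the initiation free energy, here is a small sketch. It assumes room temperature (298 K) and per-mole energies, so the gas constant replaces Boltzmann's constant; the function names are hypothetical and only serve the illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol K)


def crossover_length(f_init, temperature=298.0):
    """N0 = exp(F_init / (2 R T)), with F_init in kcal/mol."""
    return math.exp(f_init / (2.0 * R_KCAL * temperature))


def initiation_free_energy(n0, temperature=298.0):
    """Inverse relation: F_init = 2 R T ln(N0), in kcal/mol."""
    return 2.0 * R_KCAL * temperature * math.log(n0)


n0 = crossover_length(4.0)
print(f"N0 for F_init = 4 kcal/mol: {n0:.1f} residues")
print(f"F_init recovered from that N0: {initiation_free_energy(n0):.2f} kcal/mol")
```

Under these assumptions an initiation cost of four kcal per mole corresponds to an N zero of about 30 residues, and the second function is the direction you would actually use in the lab: from a measured N zero to the barrier.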
It's the right ballpark. But of course, yes, there were some horrible approximations I made. Once we know that, we can also estimate the entropy we're losing by putting one residue in a helical conformation. That's also in the ballpark of two kcal per mole, and again this will of course vary a bit for the specific residue. Do you see the cool part here? By mixing one dose of experiment and one dose of theory, you can take a super simple measurement and translate something you measured in the lab into a very deep understanding of the properties of molecules. This is something that's, I wouldn't say impossible, but very hard to calculate in a simulation, and you could understand it by doing a one-dollar experiment with almost the cheapest piece of equipment we have in the lab.
Of course, here I'm showing this for helix folding, but this comes back in virtually anything you do in chemistry and life science. A large part of research is about attacking a very difficult problem, a problem where you really can't measure the answer straight out. Take a sequencing machine: if you would like to know whether the base is A, G, C or T, there isn't really any direct adenine measurement where you can see that it's adenine. You need to come up with a simple model based on fluorescence or free energy or something, a super simple model of how you expect things to behave, and then connect it to a measurement. If this works, you can go back and say which one of the four bases you have in the machine. Virtually all measurements come back to this. You can't just do the measurement and you can't just do the theory; you need to combine a simple model with measurements. Models should always be simple. You might think that these are dirty or ugly models because they're so simple, right? There is a famous saying, usually attributed to Einstein.
You should always simplify as much as possible, but never more. The beauty is in the simple models, because they help you understand. Yes? Yes, sorry. Yes, because this is the point where the helix doesn't want to get any longer, right? What would happen if your elongation energy was still negative? Then you would grow the helix. So at some point, when the helix is happy, the elongation energy will be roughly plus minus zero. Here we're not quite in the beauty, because yesterday I kind of hinted that this should be constant. It's not going to be constant; it will depend a bit, and at some point you're not going to gain anything from making the helix any longer.
What you can look at is what happens with the helicity, and this is another reason why people use these parameters. Depending on what the free energy of elongation is, you're going to have a gigantic shift in the helicity. The temperature also enters here. You see that we're really talking about going from minus 0.1 to plus 0.1 kT here; those are extremely small differences in how much this amino acid would like to be in a helix. The second the residue starts to slightly prefer being in a helix, because we have many residues in the helix, over roughly 0.2 units of kT we will go from zero percent helix to almost a hundred percent helix. So the helix is an extremely collective property: either you tend to have mostly coil or you tend to have mostly helix. In this way it kind of looks like a phase transition, but formally it's not a phase transition. It's an extremely cooperative process, but formally not a phase transition because of the width. Had this been water, for a large system you would eventually have this as a single point, right?
You would be water, water, water, water, and then suddenly ice, ice, ice, ice. Here you always have a finite width. But this is again why helices are so stable: if most of the residues in your structure prefer to be helical, then over a very narrow temperature regime you will go over to being almost entirely helical. That's important for your proteins; otherwise most of your helices would be kind of floppy.
Now, not all residues have this minus 2 kcal, right? Glycine, for instance, doesn't like to be in a helix, so you can measure it and show that it's positive, as expected. Proline, on the other hand, you see is plus three to five kcal. That starts to be really horrible; if you have two prolines after each other, it's pretty much never going to be a helix. But you can certainly survive a glycine or two. Alanine is slightly negative, and then you have a couple of large hydrophobic ones that are really negative. In principle you can use this to predict, given a sequence, whether it's going to be helix or not. It's going to be really, really crappy, but these were some of the very first attempts to predict protein structure.
I did already talk about the Chou-Fasman rules in bioinformatics, although at that time people didn't call it bioinformatics. People started to look at experimental properties of amino acids: how much they like to be in a helix, how much they like to be in a beta sheet, and how much they like to be in turns. Based on these simple measurements, you could then try to predict for an arbitrary sequence whether it should be helix or sheet. This was a tremendous advance in the 60s and 70s, because you got probably 40 to 50% accuracy in your secondary structure prediction. It was really the first time you could use paper and pen, or at least a computer, to predict what the structure should be. So why are you so much better today?
In bioinformatics, with your predictors, where do you get more information from? You cheat, right? You cheat by looking at actual proteins and evolutionary information. So the problem is that in principle the information is there in the physics, but if you only look at one helix at a time, you're not going to get all the contributions from the surrounding helices. That's why in practice you need to look at the entire protein. But that was just the stability. If you then move on to the kinetics, I'm going to argue that helices form super fast: on the order of hundreds of nanoseconds for a small typical segment. That means that for each helical residue you're adding, five nanoseconds might be a bit fast, but say well below a hundred nanoseconds at least. This is so fast that you can see helices growing in simulations, no problem whatsoever. Well, we kind of know what this is, right, because we went through the whole theory of barriers yesterday. The time should be proportional to some sort of time constant multiplied by the exponential of the free energy barrier divided by kT. And we know roughly what this free energy barrier is now, because that's what we measured in the last few slides. So if I look at a particular position, say position one, the time I would expect it to take to go across this barrier is going to be some sort of time constant, tau. We have no idea what this is, but it's some time that represents just the time it would take to move across the barrier if you had enough energy. And then an exponential of the free energy barrier divided by kT. Again, you can forget the sigma if you want, and just say that this is tau times the exponential of roughly 2 kcal per mole divided by kT, right?
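The barrier-crossing estimate described here, a time constant multiplied by the exponential of the free energy barrier over kT, can be sketched in a few lines. This is only an illustration: the function names are made up for this example, tau is left in arbitrary units because the lecture says we do not know its value, and the barrier heights in the loop are illustrative rather than measured.

```python
import math

# Gas constant in kcal/(mol*K), so that barriers can be given in kcal/mol.
GAS_CONSTANT_KCAL = 1.987e-3


def thermal_energy(temperature_k=300.0):
    """Return kT per mole (i.e. RT) in kcal/mol at the given temperature."""
    return GAS_CONSTANT_KCAL * temperature_k


def crossing_time(tau, barrier_kcal, temperature_k=300.0):
    """Estimate the time to cross a barrier: tau * exp(barrier / kT).

    tau is the (unknown) barrier-free time constant, in any time unit;
    the result is returned in the same unit.
    """
    return tau * math.exp(barrier_kcal / thermal_energy(temperature_k))


if __name__ == "__main__":
    # Hypothetical barrier heights, only to show how steeply the time grows.
    for barrier in (0.0, 1.0, 2.0, 4.0):
        slowdown = crossing_time(1.0, barrier)
        print(f"barrier {barrier:.1f} kcal/mol -> {slowdown:.1f} x tau")
```

With these assumptions a 2 kcal/mol barrier at 300 K slows the crossing by a factor of roughly 30 compared with tau, which is why such a low barrier is hardly noticeable in the kinetics.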
But here it becomes convenient just to use the sigma parameter. For a long sequence, though, you don't have to start in position one. You can start in the middle of the helix and then grow the helix to both sides. So the number of places where we can start is roughly the number of residues in the entire sequence, right, to first approximation. And we knew from two slides ago that you can also relate this to one over the sigma parameter or, if you don't like sigma, to this expression. So it turns out that the time it takes to initiate the helix anywhere is roughly going to be this tau times the exponential of f_init divided by 2kT, or, if we like to use the sigma, something like that. And then we need to propagate this to all residues. Let's just assume that, to first approximation, each step takes the same time, but we now have to do this for all residues. That too I can translate to this sigma parameter, and again, the only reason for using sigma is to avoid writing this long exponential expression everywhere. If you do this math and compare them, you will see that for a normal helix, under this approximation, you're going to spend roughly half the time starting it somewhere and roughly half the time extending it. The main take-home lesson is that helices start to form super quickly. There is an initiation barrier, but it's low. It's so low that you hardly see it. You could almost see helices grow, not quite, but almost. That also means that for large proteins, which take time to fold, helices are typically not going to be the time-limiting factor for protein folding. If things are happy in a helix, boom, they will form a helix right away. So to sum up, the helices form super quickly. Both initiation and elongation matter for the time
because they form so quickly that we're not going to be entirely dominated by the initiation barrier. It's possible to determine all these energies from very simple CD spectra. The free energy barriers we're talking about: this one kcal might be a bit optimistic, but not more than a handful of kcal. So less than the free energy of a hydrogen bond, which means that there is hardly any bump at all for a helix to start forming. And with this n_0, already here we can start to see that there's going to be a characteristic length of helices somewhere around 20 to 30 residues at equilibrium, just based on measurements. We will come back to this and prove more formally why 20 to 30 residues is going to be the most stable length of helices. Did you have a question? So let's see, we did this in two ways, right? What I did in the last few slides is that I said there is some sort of initiation barrier and some sort of elongation barrier, and then I asked what the stabilizing parts and the destabilizing parts are. The stabilizing part would primarily be the free energy corresponding to the hydrogen bonds, and the part that was purely a loss was the loss in entropy, because I'm forcing residues to become helical. So in that case the barrier has to come from the fact that the thing that costs you free energy to form an alpha helix is always going to be entropy. But the difference is that early on you're only paying that cost, and we're not gaining anything back from hydrogen bonds. It's not until we've formed like three or four turns that we actually start getting some hydrogen bonds that will favor it. Now, to make it slightly more complicated: the part that costs you is virtually only entropy, but the part where you gain, the hydrogen bond, that's a bit of a mix, right, because hydrogen bonds form in particular when we
take something from water, where you already have hydrogen bonds, and move it to the inside of a protein, where we're just redistributing hydrogen bonds. So the hydrogen bond formation itself contains both entropy and enthalpy, and now I realize I've probably confused you more, right? This gets super complicated if you try to look at each and every individual term in the free energy. So think about this hierarchically. When I talk about the alpha helices, and we're going to do this for beta sheets now, which will be slightly more complicated than the alpha helices, we don't really care that much about energy versus entropy anymore. So now I just talk about the free energy. What is the free energy of starting something? What is the free energy of extending it? Each of these free energies will internally have both enthalpy and entropy parts, but when I'm just thinking about the barriers, I try not to worry too much about the details. So that was alpha helices. Let's try to do this for beta sheets. The funny thing if you look at beta sheets is that it's not just that they can take 10 or 100 times longer than alpha helices. They can take millions of times longer to fold. It can take hours to weeks to form a small beta sheet. So there is something that's fundamentally different from alpha helices here, and that's what we would like to understand. This is not merely a matter of understanding the physical chemistry of these structures. As I'm going to tell you in a couple of slides, this turns out to be critical for a lot of diseases. There are lots of diseases in the body that appear to be related to beta sheets folding and misfolding, and it's very much related to the kinetics. And in other cases beta sheets will fold in a millisecond, almost as quickly as an alpha helix. We don't know why. So the first question we need to ask ourselves is: what happens if something is slow?
There are going to be some barriers involved here, right? Can you say something about the barriers just by comparing these two, that it can sometimes take a millisecond and sometimes weeks? You can. That's a rhetorical question, but what can we say about it? What can I say about the activation energy? Yes, I would say that it's higher in beta sheets, because alpha helices could fold in a fraction of a microsecond or so. But if it's sometimes very fast and sometimes very slow, what more can I say about the activation energy? There is going to be a very large range of activation energies for beta sheets. So we need to understand why they are sometimes gigantic and sometimes relatively low. Also, beta sheet formation actually appears to be a formal phase transition. It's going to be all or nothing. So what happens when we fold a beta sheet? If you avoid looking ahead at the next few slides, how would you try to understand beta sheet folding if you had no idea? How did we approach the alpha helix? You can measure the fraction of beta sheet in roughly the same way, with spectroscopy. I'm not going to go too much into the experiments here. The alpha helix was in a way a trivial process, because we had our small helix and we kept adding to the helix. One of the big questions is how beta sheets form, and what the intermediate states are. What happens on the way? And if you want to understand kinetics, what can we say about the kinetics? Well, you said that there had to be some barriers involved, right? So let's see. There has to be some sort of initial state here. I will approximate things and say that there is just one large barrier here and then some stable state. Why am I drawing just one barrier, in particular for all or nothing, as I spoke about yesterday and in the recap this morning? When can I approximate things with one barrier?
One path, or in particular when there's likely going to be one barrier that is the rate-limiting step, right? In practice there are going to be hundreds of small barriers, but let's try to focus on the main barrier. Now, could you imagine many ways of forming a beta sheet? Actually, you could, right? You could imagine having some start and end state, and there are lots of different potential paths between them. But in that case I would try to pick the path that is the best one. First we pick the best path, because to first approximation I can ignore the other ones, right? We're going to see most of the flux there. Once I have the path that is the best one, its barrier is going to be the lowest among all the different paths. So what is the main barrier along that lowest path? That is the barrier we want to understand, and if I can characterize how high it is going to be, then I will understand how quickly or slowly sheets will form. So in principle, at this point you should bring out paper and pen and start to sketch a couple of different beta sheets. Let's try to guess what would happen and what the different states would be here. So what would the states correspond to? Well, here you're going to have a full beta sheet, right? That's where we want to get. That's the trivial one. This one is something that's completely unfolded. But what do we have there, at the peak of the barrier? Forget about that plot for a second. In general, what type of state is this? It's a transition state. And what do we mean by a transition state? Is it a good or a bad state? How bad is it? In practice you can hardly ever measure it. This is about the absolutely worst state you will ever go through on the path there, right? So we're interested in finding the worst possible state along the best path. And here is where your intuition will come in a bit.
We need to think about the ways a typical beta sheet will form. From the start you have to accept that this is not going to be exact. This will be a model. Now, how good or bad your model is can only be assessed one way. What is that? How do you determine if your model is good or bad? Will it have predictive value? Will you be able to learn important things from it? At the end of the day you could say, hey, we should just throw all this in a computer and simulate it exactly. Then you will get the answer, but you won't necessarily get the understanding. The beauty of a simple model is that you get understanding. So if we just look at a beta sheet here, at some point we're going to need to start forming the beta sheet somewhere along the sequence, and eventually you can have a very, very large beta sheet, and then it will hopefully be stable. We know that they have to be stable, because you can see them experimentally. But the interesting thing is what happens when you start to form something, when you have just one hairpin. If you have one strand, one turn, and one more strand, that's what we call a beta hairpin. It's somewhere there that the start of a real sheet is going to be, right? It's not necessarily going to be exactly there. There are some things that are good and bad here. We will come back to them shortly. The point is that we want to find the part here where we keep paying, paying, paying. It gets worse and worse and worse. Then we see whether we can at least roughly estimate what this state would correspond to, and estimate which free energy parts contribute positively and negatively to this state.
As some of you might have read in the book, that's going to be roughly here: we've formed a hairpin and we've formed a second turn, but we haven't started adding more residues. I will motivate in a second why that's good or bad. If we compare sheets to alpha helices, one key difference is that they're two-dimensional. They have both a length and a width. That's fundamentally different from helices. The interface area grows with the number of residues, and the interface area you can think of as the unpaired hydrogen bonds, say. Those hydrogen bonds don't have partners to the right, and those residues don't have any partners to the left. So these residues have beta sheet on the inside but not on the outside, and the same thing there. The more residues we put here, the larger we make the beta sheet, and the larger that area is going to be. You will also have some residues here that have an interface with the surroundings. And if you like the physics, the wonky part here is that this actually means that suddenly you're in more than one dimension, and then phases can't coexist, and that's why you're going to have a phase transition. What this means is that we need to look a little bit at what happens when we interact with the surroundings. What happens at the edges of these sheets, and what happens in these bends, or turns, that you make between two strands? Yes? Technically everything is three-dimensional. But in the way we're working with the model, right, even an alpha helix is three-dimensional. But at the level of the model, when you determine whether a residue is in alpha helix or coil, you're just looking along the sequence.
That's a one-dimensional property, even though the atoms themselves, if you look at the entire amino acids on a low level, are three-dimensional. This all has to do with the scale you're looking at. A famous example in physics would be a very thin line. The line is one-dimensional, right? So an ant walking along a line would be a one-dimensional process. But if you now magnify this, the line might really be a rope or something, and if you magnify it enough, to the ant it is going to appear three-dimensional. So it's all a matter of how much you magnify it. A protein would be three-dimensional, the entire tertiary structure. The secondary structure for an alpha helix would be one-dimensional, a beta sheet two-dimensional, and the sequence is always one-dimensional. But it's just a model. And again, the point here is not the main physics, but trust me that this will have properties as if it was a phase transition. So let's look a little bit at these sheet edges and bends. What happens there? Here's where this can appear super complicated, but I would actually argue it's very easy. What do you do if you don't know anything about something? You start to define things, right? And how lucky or unlucky we're going to be depends a little bit on how you make your definitions. If you make good definitions, things are going to be easy. In this case you can define things pretty much any way you want. It's not going to make a gigantic difference in simplicity, but I think I would confuse you if I defined this in a different way from the book, so I will try to follow the book. There are not that many things we have to consider. Just as for an alpha helix, if you're moving something from a stretched-out coil into a secondary structure, that's a more well-ordered state. So that's in general going to cost us. But now we will stick to what I said five minutes ago.
Let's not worry about the specific differences between enthalpy and entropy here. Let's think in terms of free energy. And for now, to make your life simpler, let's not even worry about the sign. If I have a residue and move it from a stretched-out coil and put it in the beta sheet state, there will be a difference in free energy just from the change in flexibility. That's going to be the same whether it's on the edge of a sheet or on the inside of a sheet, to first approximation at least. So let's just call that something. The book calls it f_beta: the free energy of putting a residue inside a beta hairpin, so that it is now in a stretched-out conformation. We're not going to worry about the sign for now. It's a definition. But then there is of course a substantial difference depending on whether something is on the edge or not. Sorry, I need to go back. Once you're inside here, all these black residues will have hydrogen bonds both to the left and to the right, right? But anything that's on an edge here will only make one hydrogen bond. It doesn't matter whether it's just two strands or three strands. For the residues that are in the first and last beta strand there will be a difference. They will have one hydrogen bond less. Again, we're not going to worry about the specific entropy or enthalpy of hydrogen bonds, but there will be some difference there. You could call that f_edge if you wanted to. You can define it as anything. The book likes to call it Delta f_beta, because it's a small extra change. So that's the extra difference in free energy you get when you are at an edge. So for now, any residue we move to a beta sheet is going to have f_beta, and the ones that are on an edge will also have Delta f_beta. And again, at this point it's just definitions.
We don't know anything about it. So the residues inside will have f_beta, and for the residues on the edges of the sheet the free energy difference will be f_beta plus Delta f_beta. Do you follow me that far? Just definitions. The other part: we need to take these turns into account. Whether they're called bends or turns or something doesn't really matter, but I think you agree with me that there will be some difference for the residues here, whether that is because they're making fewer hydrogen bonds or because the entropy is different. Let's not worry about the details. We just say that this residue here is not the same thing as if it was inside, but it is probably similar to that one, and to that one. So for the residues that are part of this turn, let's say that there is some sort of free energy. Whether that is one, two, or three residues, I don't really care. But for every such turn I need to make, there will be some free energy of making that turn. Right now I don't care about the sign. We might or might not be able to measure it, but there is a free energy that corresponds to the turn. I just define that as U, and here you see the problem with physicists and their definitions. In many cases you would use U for an energy, and here we use U for a free energy. Live with it. People will do that all the time. People will define things in ways that are not how you would define them. In this case the book has made its definition, so let's follow it. And now I and the book argue the last part here: since beta sheets form, Delta f_beta must be larger than zero and U must be larger than zero. So this is the next step. I'm not at all asking that you take this for granted. But the second you've made your definitions, right?
You can sit down and try to reason about them. Do we know anything about our definitions? And I'm arguing that we know these two things. So if I'm claiming that something must be the case, what is the best way of proving that it's true? Right, in mathematics you would call it reductio ad absurdum: assume that the opposite is true and show that it leads to a ridiculous statement, or that it would not be compatible with what we can measure. I think the U one is easier. What would happen if the free energy of making bends was negative? Right, in that case all your sheets would just be making bends all the way. The more bends the better. Why should nature even bother with a beta sheet? Just make more turns all the time. We don't see that in structures, so in general that can't be true. I'm sure there could be one exception where it is true, but in general that won't happen. So we must be paying to make the turns. Turns in themselves can't be good. And now we just learned something: for one of our definitions, we've now said that it must be positive. I'm also saying that Delta f_beta must be larger than zero, and there, can we try to make the same reasoning? So what was Delta f_beta first? Right, Delta f_beta is basically the difference we would have when you have a neighboring strand on only one side instead of on both, right? So if Delta f_beta was negative, it would always be better to be on the outside of a sheet. You would never ever have any residues on the inside of a sheet, and in that case you would just have one long beta strand, because we just said that turns weren't good either, right? We don't see that in proteins. It doesn't happen. And again, for an individual residue it can be true, but it can't be true for beta sheets in general. So let's not worry about special cases, but focus on the general things. So that's good.
That means that we know a little bit about the things that we have. We now know something about Delta f_beta and we know something about U. Can we say something about f_beta, the free energy for a residue to be on the inside of a beta sheet? Dare to speak up. There is nothing bad about being wrong. Actually, being wrong is a great thing. You should be wrong, because that means you're having ideas. So what would happen if f_beta was always negative? And remember, I'm not talking about a specific residue, I'm talking about any residue. Then every single residue would always like to be in a beta sheet. Beta sheet would always be better than coil. Is that generally true? No, because there are some residues that don't like to be in beta sheets. On the other hand, if f_beta was always positive, we would never see any beta sheets. The reason I bring this up is that you will not always be able to make these types of arguments. Sometimes you will and sometimes you won't. So apparently we can't say anything about f_beta. It can have any sign. The third part, though, is the free energy at the edges. We might be able to say something about f_beta plus Delta f_beta. I would argue that there are two scenarios here: either it's larger than zero or smaller than zero. It could be exactly zero too, but we don't worry about that, it's just a single point. So let's think about what this means. If f_beta plus Delta f_beta was smaller than zero, that would mean that being in a beta sheet together with being at the edge is good. As long as we make a single turn, one large hairpin would be good, and the longer it is the better, because the residues are in the beta sheet. They're also at the edge.
So for all of these it would be favorable, as long as I can make one turn. This would be awesome. That might happen in proteins. If you look at the Protein Data Bank, you will occasionally see structures where there's just a beta hairpin. That can happen, it's not impossible. The other possibility is that, for a single residue, being in a beta sheet but also being at the edge is not really good, and that would mean that the hairpin itself is not really good. We keep paying. Forming the entire first hairpin is bad, and it's only when we start adding the third strand here, so that we start having residues that are fully on the inside of the sheet, that it starts to be good again. Both of these are possible. Which one is which? If you look at beta sheets and we want to understand why beta sheet formation can be so slow, for which one of these processes is it going to be slowest, most difficult, to form beta sheets? The second one, right? Because in the first case, the second you've formed the hairpin it's good. And here we have formed the hairpin, but even the entire hairpin itself is bad. And this is where you get to the point we had in the book. You need to have one entire beta strand, then a turn, then one entire second beta strand. It's still bad. I need to have a second turn, and for the turns themselves we said that U is positive, right?
So this is also bad, and finally, when I start adding the first residue here, I start to have some residues on the inside. So to first approximation, the worst possible state I'm going to have here, which is likely the transition state, is going to be something like this: two entire strands and two turns. This is a model. I guess there are some things that I've ignored there, but let's try to work with that and see where that gets us. So the free energy of this worst possible state would then be, well, we have some sort of turn, we have two entire strands, and n here would be the number of residues in each strand. All residues are in a beta sheet, and they're also along the edge, because we just have one long hairpin. Sorry, I'm jumping a little bit ahead here. This is not where I want to get first. On the next slide I'm going to go through and actually determine the free energy. The problem here is that we need to know, for a hairpin or for this in general, how long the strands here should be. We need to know the smallest length at which they start to be stable. Because if n here was zero, then I would only have turns, but I don't want n to be infinity either, right? So we need to determine how long they should be. It will be a small extra proof here. How do we determine how long this is? Well, we can calculate what the free energy of each of these parts is, right, and as long as the free energy is good, it's going to get longer and longer and longer. Initially we just pay, and at some point, when the difference is plus minus zero, you get the smallest length where they're actually stable, because the part we are gaining from the inside here is the same as the part that we lost from the turn. Yes, I haven't made the second turn yet.
I will come back to that in a second. Sorry, that was me skipping part of it here. I'll make that derivation more carefully. Eventually I want to determine what that transition state is, and I'll skip one slide ahead. To determine what this transition state is, I need to calculate the cost of having one full strand here, one turn, one more full strand, and one turn. That's exactly where we want to get. The only problem is that I know that I have two turns, but the question is: how many residues do I have in each strand, right? The number of residues here is going to influence what the free energy is, and we don't know that yet. Is this just two residues, or is it going to be 200 residues? That will in general influence how large the beta sheets are going to be and how easy it's going to be to fold them. So I was a little bit too quick here. To be able to say how large this free energy barrier is, we first need to determine the smallest length of these strands at which they actually like to form beta sheets. So we go back one step. Before we can get to the transition state, we need to determine how long hairpins want to be. If you just look at one hairpin, there is some sort of length here, and we're interested in the smallest length at which the hairpin starts to be stable. The main difference is that here we have one full strand there and one full strand here, but only one turn. So we have the turn, that's U, and then two times the number of residues we have in each strand, times the part for being in a beta sheet plus the part for being at an edge. Both strands are at edges, so we can group that together. So what happens if we now keep extending this beta sheet? For every strand I'm adding here,
I would add one turn and one strand, right? And the second time I would add one turn and one strand. One turn and one strand. One turn and one strand. So for this to be possible, the free energy of one turn plus one strand has to be at most zero. If I had to pay here, I would never form beta sheets. And in general, I only pay for the turn, so I need to gain something back from the strand. But the problem is that if the strand is too short, I'm not going to gain enough to counter what I paid for the turn, right? So there has to be some sort of minimum length to the strands of the beta sheet, or they can't be stable. Do you follow that? So if I now extend this, and this is where the beautiful definition helps me, I still have exactly the same number of residues in edges, because I still have one edge to the left and one to the right. So I don't need to worry about that. What happens when I'm adding one more strand here is that I'm taking n residues and putting them in beta sheet. That was this energy f_beta, by my definition. Now, if you had made another definition, you would end up with a difference here or something, and that works fine too. It's not the end of the world. It's just that this is slightly nicer. And I also know the cost here of making that turn.
Well, that was the cost U. So the smallest length I can allow here really has to do with the cost: what we're paying for the turn should equal what we're gaining back from the strand. So U should then be equal to minus n times f_beta. The smallest n I can allow is U, the free energy of the turn, divided by minus the free energy of putting a residue in a beta sheet: n_min equals U divided by minus f_beta. The signs here just have to do with how we defined the signs of the various components originally. So what I'm saying is that the case where the free energy is at most plus minus zero is the first case where I will be able to add one of these strands. I'm interested in the smallest length, the smallest n, that would allow this to be plus minus zero or better. Now, if you make n even larger, it would be very negative. That's fine too, but I'm after the smallest length that we can allow. And that would be when the total Delta F is plus minus zero. Notice I'm not talking about energy versus entropy now. There is one part that I'm just paying: I'm paying the free energy of the turn. And I'm gaining back the free energy of putting the residues in a beta sheet. So in this case f_beta has to be negative. Why? Otherwise these residues wouldn't like to be in a beta sheet in the first place, right? But that's also why we end up with the extra minus sign here.
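The bookkeeping behind this argument can be written down as a small sketch. It assumes only the model as defined in the lecture: every residue in the sheet contributes f_beta, residues in the two edge strands contribute an extra Delta f_beta, and every turn contributes U. The function names and the numerical values (U = 2, f_beta = -0.5, Delta f_beta = 1, in arbitrary free energy units) are made up for illustration and are not measured values.

```python
def sheet_free_energy(strands, n, u_turn, f_beta, delta_f_beta):
    """Model free energy of a beta sheet relative to coil.

    strands: number of strands (at least 2), each with n residues.
    Every residue contributes f_beta, residues in the two edge strands
    contribute an extra delta_f_beta, and each of the (strands - 1)
    turns contributes u_turn.
    """
    if strands < 2:
        raise ValueError("the model needs at least a hairpin (2 strands)")
    turns = strands - 1
    edge_residues = 2 * n  # one edge strand on the left, one on the right
    return turns * u_turn + strands * n * f_beta + edge_residues * delta_f_beta


def minimal_strand_length(u_turn, f_beta):
    """Smallest n for which adding one turn plus one strand costs nothing.

    Solves u_turn + n * f_beta = 0, i.e. n_min = u_turn / (-f_beta).
    """
    if u_turn <= 0 or f_beta >= 0:
        raise ValueError("the argument assumes u_turn > 0 and f_beta < 0")
    return u_turn / (-f_beta)


if __name__ == "__main__":
    # Hypothetical parameters in arbitrary units, for illustration only.
    U, F_BETA, DELTA_F_BETA = 2.0, -0.5, 1.0
    print("n_min =", minimal_strand_length(U, F_BETA))
    for n in (2, 4, 8):
        cost = (sheet_free_energy(3, n, U, F_BETA, DELTA_F_BETA)
                - sheet_free_energy(2, n, U, F_BETA, DELTA_F_BETA))
        print(f"n = {n}: adding a third strand changes the free energy by {cost:+.1f}")
```

Because the number of edge residues stays the same when a strand is added, Delta f_beta drops out of the difference, and the cost of one more strand is just U + n f_beta. That is the point of the "beautiful definition" mentioned above.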
Otherwise the number of residues would be negative at the end. So when these two are equal, I just set U equal to minus n f beta. That means that in the case where the total is exactly zero, U equals n multiplied by minus f beta, and I can solve for the number of residues just by dividing both sides by minus f beta. So the smallest number of residues for which this is stable is the free energy of the turn divided by minus the free energy of putting a residue on the inside of a beta sheet. That's just pure mathematics, a very simple formula. I just say that delta F prime equals delta F double prime. So all that gives me is the extra help we needed. Now we have an expression for the smallest strand length at which this can happen. So did you follow that? It's a model, it's an estimate, but now I know roughly how long the strands have to be. So now I can get back to what we were really after: identifying this transition state. We're going to use the formula from the last slide in a second. But if you now believe me that the worst possible state on the way to forming something larger is when we have one strand, one turn, one strand, one turn, then that corresponds to the free energy of making the first turn, then the two strands, which is two times the number of residues in a strand times f beta plus delta f beta, plus the second turn. So why is it important to have N min here? Why couldn't I just have any n?
Exactly right. Because this might very well happen for a much larger n too, but that would be a higher barrier. We're only interested in the lowest barrier, because that's what's going to dominate the time. If there are other secondary barriers that are even higher, we could not care less. The book then actually goes to quite some effort to prove that this is the lowest possible transition state. And that's just because if there was some other, even lower free energy barrier, then it could happen even faster. This is a general problem when you're working with transition states: unless you can prove that something is the lowest state, all you can hope for is that you found the lowest state. But if there is an even lower state that we ignored, that just means that the process will happen even faster. So this just proves that it can happen at least this fast. But then we know a couple of things from the last slide. We know that U was equal to minus N min times f beta, right? That's what we derived on the last slide. So we have one U there and one U there, and we have N min there. It's kind of irritating to have this N min, because that's a number of residues that's going to be virtually impossible to measure. So what we would like to do is get rid of the N min. What you do is basically use this formula twice. First we have N min equals U divided by minus f beta. That means that we can take that N min and replace it with an expression in U, right? And then we use the formula once again. If you use the formula twice, you can show that you end up with this expression. Not super difficult mathematics. So once you've removed that N min, the free energy barrier is a function of, well, what is it a function of? Let's see.
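The slide itself is not captured in the transcript. What follows is a reconstruction from the spoken description only, assuming the transition state is the hairpin with two turns (free energy U each) and two strands of N_min residues, where each strand residue contributes f_beta plus the edge penalty delta f_beta. The symbol names are assumptions chosen to match the speech:

```latex
\begin{align*}
N_{\min} &= \frac{U}{-f_\beta} \\
F^{\#} &= U + 2N_{\min}\,(f_\beta + \Delta f_\beta) + U \\
       &= 2U + 2N_{\min} f_\beta + 2N_{\min}\,\Delta f_\beta
          && \text{(first use: } N_{\min} f_\beta = -U\text{)} \\
       &= 2N_{\min}\,\Delta f_\beta
          && \text{(second use: } N_{\min} = U/(-f_\beta)\text{)} \\
       &= \frac{2\,U\,\Delta f_\beta}{-f_\beta}
\end{align*}
```

Read this way, the barrier grows with the turn cost and the edge penalty in the numerator, and shrinks as the sheet stability in the denominator grows in magnitude, which is the dependence discussed next.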
So it depends on the energy of the turn: if we increase the energy of the turn, we're going to increase the barrier. If we increase this delta f, the extra free energy that we had on the edges, then it's also going to be slower. That also makes sense, because if it's really, really bad to be in this hairpin state, the free energy barrier is going to be higher. But then in the denominator we have the stability, the free energy of being in a beta sheet. So the lower that one is, the more the denominator increases in magnitude, and that means that the free energy barrier goes down. Does that make sense? Right, so the more the residues like to be in beta sheets, the faster this process will be, but the worse the transition state is, the slower it will be. Can you say something about this already at this point? Is this an easier or a more complicated relation than it was for an alpha helix? Well, for an alpha helix we ended up with an expression saying that the free energy barrier was roughly this initiation barrier, right? And the initiation barrier was somewhere between, say, minus two and plus three to five kcal per mole. So for an alpha helix this did not really vary a lot. This is going to vary a lot, and that was what we were after when it came to understanding why beta sheets can have such a remarkably broad spectrum of rates. So let's do the calculation. You know this: there is going to be some sort of time constant. We don't care about that, and here's where you're going to see that it makes a lot of sense that we don't worry about it. And then there's going to be an exponential raised to plus this free energy barrier. It's roughly the same thing here.
You can initiate the beta sheet anywhere, and in this case it's going to turn out that the time is entirely limited by the initiation. The reason for that is this F hash, the free energy of the transition state. If you start to have a very good stabilization here, then because it occurs in the denominator, you're going to take the free energy barrier down to almost nothing right away. Same thing the other way: if this starts to become very small or even gets the opposite sign, this one will grow almost without limits. So you're going to have a very, very strong exponential relation. There's some sort of constant; we don't care that much about that constant, because that constant is going to be the turn and the edges. But because this now occurs in the denominator inside the exponential, if the stability here is good, you're going to form in no time whatsoever. If the beta sheet stability starts to be bad, it's literally going to take you forever before this forms. So compared to alpha helices, that's also why I could hand-wave and say that the residue doesn't really matter there. It's up to a microsecond, maybe a millisecond, but it's fast. The only question for alpha helices is whether it is fast or super fast. For beta sheets all bets are off. You can easily have a handful of residues that don't want to be in a beta sheet, and then this can take weeks. This means that if sheets are, well, I wouldn't say completely unstable, that's a bit wrong: if sheets are completely unstable, they wouldn't form at all, right? But if sheets are marginally stable, so that the total free energy of the entire sheet is just slightly below zero, it can take almost forever before the beta sheet forms. And this makes them very hard to do experiments on. Even in the lab, it's hard to say whether we have reached equilibrium, whether they have formed yet.
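To get a feeling for how strongly the sheet stability in the denominator controls the formation time, here is a minimal Python sketch. It assumes the barrier has the form 2 U delta_f_beta / (-f_beta) described in the lecture, and that the time goes as a prefactor times exp(barrier / kT). The value of kT (about 0.6 kcal/mol near room temperature) and all the numbers for U, delta_f_beta and f_beta are hypothetical illustration values, not data from the course.

```python
import math

KT = 0.6  # kcal/mol, roughly room temperature (assumed value)


def sheet_barrier(u_turn, delta_f_edge, f_beta):
    """Transition-state free energy 2*U*delta_f_beta / (-f_beta), in kcal/mol.

    f_beta must be negative, otherwise residues would not want to be in a sheet.
    """
    if f_beta >= 0:
        raise ValueError("f_beta must be negative for a stable sheet")
    return 2.0 * u_turn * delta_f_edge / (-f_beta)


def relative_time(u_turn, delta_f_edge, f_beta, kt=KT):
    """Formation time in units of the (unknown) prefactor: exp(+barrier / kT)."""
    return math.exp(sheet_barrier(u_turn, delta_f_edge, f_beta) / kt)


# Hypothetical turn cost and edge penalty; only f_beta is varied.
U_TURN = 2.0        # kcal/mol, illustration only
DELTA_F_EDGE = 0.5  # kcal/mol, illustration only

for f_beta in (-1.0, -0.5, -0.2, -0.1):
    barrier = sheet_barrier(U_TURN, DELTA_F_EDGE, f_beta)
    slowdown = relative_time(U_TURN, DELTA_F_EDGE, f_beta)
    print(f"f_beta = {f_beta:5.2f}  barrier = {barrier:6.2f} kcal/mol  "
          f"time / prefactor = {slowdown:.3g}")
```

With these made-up numbers, a tenfold weaker per-residue stability raises the barrier tenfold, and the time grows by many orders of magnitude, which is the qualitative point made above.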
Yes? So, sorry, that's why I said I was a bit sloppy here. If they literally are unstable, they wouldn't form. But if they're weakly stable or marginally stable, meaning that the total free energy is just slightly below zero, they will still form, but it will take forever. For alpha helices it's a bit different: if alpha helices are just marginally stable, they would still form very quickly, but they would likely unfold quickly too. So here you see the difference between barriers and troughs in the free energy landscape. Both for an alpha helix and a beta sheet you would have a delta G. For an alpha helix you would start out here, then you have some sort of barrier, and it would be lower there. And that pen sucks. For an alpha helix you would start out there, have some small barrier, and then it would be better to be in the alpha helix. For a beta sheet you could have something like this too, or you could have it look like that. So the free energy difference can be very small in both cases. For an alpha helix, it would still form fast, but it would also unfold relatively fast. For a beta sheet, you would start out here and it would take forever to fold. Once you have folded, though, you can be super stable, because it would take you even more forever to unfold. So here is an example where the free energy difference between the stable states is the same, but the free energy barriers that determine the kinetics, how fast it happens, are vastly different. And here you start entering into some philosophical questions. If this takes four weeks, it will still happen, right? If it would take 400 years, would it still happen? Well, technically it would in the lab, but the problem is that in your body you would die before it happens. So does that mean it can happen? You can't really define it, right?
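The two sketched landscapes can be put into numbers with a small two-state model. This is only an illustration under stated assumptions: the equilibrium population depends on the free energy difference alone (a Boltzmann factor), while the crossing times go as a prefactor times exp(barrier / kT). The prefactor of one microsecond, the kT value and both barrier heights are hypothetical.

```python
import math

KT = 0.6  # kcal/mol, roughly room temperature (assumed value)


def folded_fraction(dg, kt=KT):
    """Equilibrium fraction in the folded state; dg = G_folded - G_unfolded."""
    return 1.0 / (1.0 + math.exp(dg / kt))


def crossing_times(dg, barrier_fold, tau0=1e-6, kt=KT):
    """Return (folding time, unfolding time) in seconds.

    barrier_fold is the barrier height seen from the unfolded state; seen from
    the folded state the barrier is higher by -dg. tau0 is a hypothetical
    prefactor, not a measured value.
    """
    t_fold = tau0 * math.exp(barrier_fold / kt)
    t_unfold = tau0 * math.exp((barrier_fold - dg) / kt)
    return t_fold, t_unfold


DG = -1.0  # same stability for both cases, kcal/mol (illustration only)
cases = {"helix-like (low barrier)": 2.0, "sheet-like (high barrier)": 15.0}

for name, barrier in cases.items():
    t_fold, t_unfold = crossing_times(DG, barrier)
    print(f"{name}: folded fraction = {folded_fraction(DG):.2f}, "
          f"t_fold = {t_fold:.3g} s, t_unfold = {t_unfold:.3g} s")
```

Both cases print the same folded fraction, because thermodynamics only sees the free energy difference, while the folding and unfolding times differ by many orders of magnitude because kinetics sees the barrier.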
So normally when we speak about stability, and that's why I wrote down "stable" here, you think of thermodynamic stability. Equilibrium: that means that this is a more stable state than that one. If you're a physicist doing thermodynamics, you're only looking at the world at equilibrium. You only care about the local minima in free energy. Thermodynamic stability is a beautiful world. It's simple. You only have to care about the Boltzmann factors for all the local minima. The problem is that at equilibrium we're all dead. So in practice we're going to need to care about kinetics, and you can also think of things in terms of kinetic stability: the barriers are so high that in practice you can't get over them. Because if this barrier is so high that it would take you 400 years, you could effectively say that you're stable here too, right? Because you would never ever have enough energy to go across the barrier. That corresponds to walking straight through a wall. So you can talk about both kinds of stability, and we very frequently do. And this is actually a great example, because I wasn't intentionally sloppy here.
When I say unstable sheets here, I mean sheets that don't really appear to want to form in practice. So here I use the concept in terms of kinetic stability, and the problem is that we need to account for both. If you want to, you can compare helices and sheets a little bit more. I'm not going to focus too much on these phase transitions. Again, if you're a physicist this is kind of fun, but I want to head on to talk about real biology, and we should have a break here in a second. So the difference in folding is that alpha helices end up having much lower barriers that they can overcome very quickly. The free energy of beta sheets is not just a fun physical idea; here we actually start getting into disease. This is super important for lots of things, in particular this kinetic stability, and that's the last slide I want to show you before the break. We know nowadays that there is a fairly large number of diseases caused by protein misfolding. This was very strange when we first saw it. So, any gut feeling, based on last semester at KI: how does disease happen? There are two traditional ways you get disease from external factors. Forget about prions for now. In the old beautiful world, before we knew about this, there were viruses and bacteria, right? Those were the only two disease agents we had. Either it's a virus or a bacterium, and when you get a new disease, the first thing to determine is whether it is a bacterial infection or a virus. And what happened some 20 years ago?
It was really scary that we started to see the first handful of examples of very severe neurological diseases where there was some sort of infectious agent that we couldn't identify. Can you imagine that? Again, this is based on over 100 years of experience of knowing what the infectious agents in nature are, and suddenly there's a new infectious agent. Even if you heated it to 100 degrees centigrade, you frequently didn't destroy it. So imagine the amount of scare the entire world had, because if there is something new, some new type of virus or bacterium that we can't identify, that would basically upend everything that we know about life science. And what Stanley Prusiner and a bunch of other people were able to prove, about what is now termed Creutzfeldt-Jakob disease and what became much more famous as mad cow disease, is that this has to do with misfolding of proteins: proteins that appear to have two stable states. And these two stable states really correspond to these very high free energy barriers. You have proteins that normally would be in one state. You could even say that this is the state where they are biologically active. But this state might not be the lowest free energy. It's just a low free energy state in which the protein in the cell would be stable for a hundred years. So what's the problem? We're perfectly stable here. We can't ever get across that barrier. That means you can ignore that other state, right, because you're kinetically stable. But what if somebody who doesn't wish you well created a catalyst? So what if, instead of the green path, you could now take a blue path here that has a much lower free energy barrier? That doesn't require energy; a catalyst doesn't use energy. It's just some way you get rid of that really bad free energy barrier. Suddenly you would move over to this state. And if that is now a disease state where your proteins don't work,
Well, that's going to be pretty bad. And in particular if that state can't even be degraded by your body because it's super stable: you can't really unfold the proteins and recycle them. We know that this happens for a number of proteins today, and they all appear to be related to beta sheets. So what happens is that, one way or another, when you have more of these beta sheets available in your body, they appear to catalyze the expansion of more beta sheets, exactly this way, and that leads to misfolding of proteins. This is a gross oversimplification. But the problem is that you can't degrade these proteins. This probably happens in your body all the time, but it's a very small fraction of your proteins that undergo this. And again, sorry to say, in a hundred years you're all going to be dead. So it doesn't really matter if you have a small build-up of these proteins. If it's 0.01 percent of your protein population, it's not going to influence how your brain works, because you will be dead before it happens. But of course, if this happens naturally in your body and you can't degrade it, what now if you take these parts of the body and feed them to another animal? Proteases are what you have in your stomach to degrade proteins. So suddenly these beta sheets, or plaques, can't be degraded. And now, by the time you're 20 or 30, you start having a relatively large population of these misfolded states, which in turn will catalyze more misfolded states. Suddenly you will start building up these plaques faster, and it will hit you when you're 40 or 50. And that was the entire origin of mad cow disease: they fed cows cow brains. Very bad idea. There are a bunch of other diseases, kuru for instance, among cannibals.
It's bad to eat brain. It's probably illegal too. But it's not just this. There is a lot of debate today, but we know that there are a large number of diseases where we end up having these plaques that appear to be all beta sheets. Having said that, in a beautiful simple world we would think that these plaques are causing the disease. There is unfortunately increasing evidence now that they might just be a side effect in lots of diseases, like Alzheimer's. So they're important, but we don't know what they do. But it's definitely related to protein misfolding. It's one of the really big unsolved classes of diseases in the world. Yes? So I'm not sure I followed that question. So you have many of the proteins and they form these misfolded states, and your question was... You might be able to, but the problem is... This might not be a good example. You can imagine if you take this green curve and extend it all the way down here instead, right? If this is now a super stable, awesome state, it would cost you a lot of energy to move this up across this barrier. My gut feeling is that we might be able to digest a small part of the proteins, but not enough. The proteins are too stable. And this comes back to the thing that I've talked about before: from an evolutionary perspective, proteins should be stable, but not too stable. And this is an example of proteins that become too stable. I held you a long time here. It's 20 to 11. Think about this, and then we'll meet at 11 sharp and I'll continue. So, we spoke a little bit about misfolding of proteins, and there was one thing that none of you called me out on. I just violated something very important here that I told you about last week. There were two fundamental people we talked about from the 1960s who said some very important things about proteins and native states. Who were those people? Anfinsen, and the other one was... So what did Anfinsen say?
Well, not really sequence to structure to function; that's the central dogma. Anfinsen had a much more specific statement, which he eventually got the Nobel Prize for: folding can occur spontaneously. And if you formulate that in terms of free energy, Anfinsen said strictly that the native state corresponds to the global minimum of free energy. It's free energy, and it's global, and it's the native state. So what is a native state? That's actually a pretty good definition, but I would actually argue that it's wrong. So what is the native state? Native sort of means natural, right? So what is the natural state of a protein? The biologically active state, and I think that's what Anfinsen originally meant in the 1960s, right? So we're now saying that the biologically active state is the one into which the protein will fold, and that's a very biological concept. The really cool thing Anfinsen did is that he translated this into a very physical concept: that it corresponds to the global minimum of free energy. But what have we just said here? What's happened here is that this would be the biologically active state, the native state, but that is not the global free energy minimum. The global free energy minimum is some even deeper state. And this is not specific to a prion, but what this could correspond to is that this is the native state, the biologically active one. It's certainly a very low free energy state. But if you wait forever, there is a very high barrier that you might be able to get over to reach an even lower free energy state, which is bad, because this one is disease related. And this violates Anfinsen. So what we're now saying is that the native state does not necessarily correspond to the global free energy minimum, but the native state is kinetically stable, as I said before the break, right?
Normally, under the time scales we look at, if you don't eat brain, you would stay in this state. So for all intents and purposes, in your body this will appear to be the most stable state, just because it takes too long to go over to the wrong one. So today we frequently talk about folding funnels, where the native state is certainly a local minimum in free energy. It's a very low free energy, but it's the lowest accessible state. There might be something even lower that we can't get to under normal circumstances; we ignore those. Now the question is how common this is going to be. Does it happen in one in a hundred proteins, one in a thousand, one in ten thousand sequences? If you just randomly created sequences, how frequently do you think this would happen? We'll come back to that later on in the course. Rather than saying that you have a correct and an incorrect state, you can think of this as a protein that has two states, right? Both of them are really good. It's just that one happens to be slightly better than the other. So the normal notion is that a protein has one stable state; these proteins can have two stable states. It just happens that one of them is a bit bad. It's going to be rare, but not impossible. So that brings us on. We've studied alpha helices and we've studied beta sheets. But in particular when it comes to predicting structure, I frequently talk about the unfolded state or the non-native state. So when it comes to understanding proteins in general, there's one more state that we have to understand a bit, and that's the state that is neither helix nor sheet but really the coil, or the unfolded state, the completely random state. And this is a bit complicated. What can you say about a completely random state? Well, if the free energy was negative... well, first, negative compared to what? Absolute free energies are difficult. So that determines whether the reaction will happen or not.
So we can't necessarily say something about the specific free energy of the state. But when I think of unfolded states, I frequently draw them this way. This is very bad. That's not what a protein is going to look like in solution. How many states are stretched out? This means that all the Ramachandran angles, all the torsion angles, have to be stretched out. There are very few states like that. Most states would rather look like something like that. I need to steal more pens. Most states would look like some piece of yarn that you just throw on the floor. If you throw a piece of yarn on the floor, the likelihood that it will be stretched out is zero. So one very important concept, both for proteins and even for DNA, is the average end-to-end distance of a chain. Can you think of something you would like to use that for? We use this a whole lot in size exclusion experiments, chromatography. So if you have a gel that has some sort of very fine pattern inside it, and we know the average size of that pattern in our gel, that means the chains that are very long will get stuck in the gel, and the chains that are smaller will go through the gel quicker, right? So in lots of cases it's going to be important for us to somehow relate the length of either my protein or a piece of DNA to how large it will be on average. And it turns out that this is a really complicated thing initially, and then it's going to turn out to be really simple. So let's forget about a protein; a protein would be super complicated with all our torsion angles and everything. To keep things super simple, let's say that each amino acid is really a bead.
That's a point. And then we assume that you can turn. So we start at one amino acid here, and then I move to the next one. The average distance between two amino acids is roughly the same, and you can calculate exactly what it is, but we can just say that that's some sort of length r or l. And the second you've moved from one amino acid to another, you can move in any direction you want. So to a first approximation, let's ignore the fact that you might overlap. If you have worked with vector analysis this is actually easier; I know not all of you have. But if you start here and then you move all along this chain, you can say that you go from there to there. This is a vector h, and this vector h you write in bold, and that just means that it's a vector. In this case it's two dimensions; you could imagine it being three dimensions too. This is going to be a sum of the motion we have in the first amino acid and the second and the third, et cetera, and each of these is also a vector. So they are small arrows, and we just sum up all these arrows. And then I want to calculate the average length if you go from the start to the end, because that's going to correspond to the end-to-end distance of my protein. Well, I can calculate that too, right? The length here squared is going to be h squared; I will take the square root of it in a second. That squared length corresponds to taking that sum, and I don't care how many residues I have here, and squaring the entire sum. And if you know your mathematical laws, there are going to be two parts of this. First, that means that every single term in this sum will have to be multiplied by every single term in the sum, right?
So there's going to be one term here where I multiply each vector by itself, and then another large double sum where I multiply the vectors in pairs, when it's not a vector with itself. And that probably doesn't appear to make a whole lot of sense to you right now. But what I had on the previous slide was just one specific conformation of the protein. In practice the protein will move. So if I now take the average of this, it's going to be much easier. The average I denote with these big brackets. There should be a bracket there too. That's then going to correspond to one part that comes from the average of multiplying each vector with itself, and one part that is the average of multiplying one vector with another. So right now this is a super simple model. Forget about residues; think of this as a mathematical sequence of beads where I have some sort of length between each bead, that's r, and then I can place each bead in any direction from the previous bead. You can think of this as a super simple model for a protein or a super simple model for DNA. The length r here could correspond to the distance between two amino acids or between two bases. But at this point it's just a super simple model of a chain that's completely flexible. This is of course a horrible approximation, because in a protein you can't place any amino acid in any direction, and they can't overlap or anything. But don't worry; I like simple models, remember? This average at first sight sounds really complicated. But if you have an average of a sum, you can take the average of the first term and then the average of the second term, and I can always move an average inside or outside the sum; that's the same. So I'm going to have one sum here.
That is really the sum of the squared lengths. And this doesn't really change, because this is the length from one amino acid to the next. If you know the average length from one amino acid to the next, we know what that is. So that's just going to be the sum of the squared lengths of the segments. And then this one looks really complicated. Have you calculated products between vectors? If you have one vector here and one vector there, the scalar product is going to correspond to the length of that vector multiplied by the length of that vector and the cosine of the angle between the two vectors. But the direction here can be absolutely anything. What's the average of the cosine of any angle? If you average the cosine over an entire period, it's zero. So basically, if I start at one point and randomly move one unit in one direction, and then randomly move one unit in another direction, on average I won't have moved at all. So we can forget about that entire term. What this means is that the average squared length of any type of this small chain goes as the square of the length of each element multiplied by the number of elements. But here I had the square of it, right?
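Written out, the argument above looks as follows. This is a reconstruction from the spoken steps, assuming N segments that all have the same length r and completely independent random directions; the symbols N, r and the angle theta between two segments are notation chosen here, since the slide is not part of the transcript:

```latex
\begin{align*}
\mathbf{h} &= \sum_{i=1}^{N} \mathbf{r}_i \\
\langle h^2 \rangle
  &= \Big\langle \sum_{i=1}^{N}\sum_{j=1}^{N} \mathbf{r}_i \cdot \mathbf{r}_j \Big\rangle
   = \sum_{i=1}^{N} \langle \mathbf{r}_i^2 \rangle
     + \sum_{i \neq j} \langle \mathbf{r}_i \cdot \mathbf{r}_j \rangle \\
  &= N r^2 + \sum_{i \neq j} r^2 \,\langle \cos\theta_{ij} \rangle
   = N r^2
   \qquad \text{since } \langle \cos\theta_{ij} \rangle = 0 \\
\sqrt{\langle h^2 \rangle} &= r\,\sqrt{N}
\end{align*}
```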
So if I want to calculate the average length of my entire chain, point to point, then I should take the square root of this. You follow me? So the average length of a chain from the beginning to the end, the average point-to-point distance, will actually go as the square root of the number of residues. And that's why I don't really care that much about what the individual distance between two residues is. That's going to be one value in a protein, another value in a DNA chain. But on average, if you take a protein and make it twice as large, it's going to have roughly a factor 1.4 greater end-to-end distance. The average volume is then going to increase as, well, if that is the point-to-point distance, the average volume is going to be roughly that to the power of three. Now in practice this is a horrible model, because I assumed that you can turn any amino acid in any direction. I also didn't account for the fact that they can't overlap. If you want to do this accurately, you can calculate some sort of effective distance from one amino acid to the next, and you can even try to assume that you can rotate around bonds but that they can't overlap. The book goes into some details there. But the only reason why I bring this up is that this won't change anything fundamentally. To a first approximation, no matter what the size of these segments is, it's still going to go as the square root of the number of residues. But that's pretty crappy, because what else did I ignore? What happens if this residue starts to go back and overlap with the first residue? Will that be allowed? So the problem is that a real chain has to be self-avoiding. How do you calculate that? That's going to be super complicated, right?
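The square-root scaling for the simple bead model can be checked numerically before worrying about self-avoidance. The sketch below is only an illustration of the freely jointed chain described above: segments of fixed length in uniformly random 3D directions, with overlaps allowed. The function names, the segment length of 1 and the number of sampled chains are choices made here, not something from the lecture.

```python
import math
import random


def random_unit_vector(rng):
    """Return a uniformly distributed direction in 3D as an (x, y, z) tuple."""
    while True:
        # A normalized vector of three independent Gaussians is uniform on the sphere.
        x, y, z = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        norm = math.sqrt(x * x + y * y + z * z)
        if norm > 0.0:
            return x / norm, y / norm, z / norm


def mean_square_end_to_end(n_segments, seg_len=1.0, n_chains=500, seed=0):
    """Average squared end-to-end distance over n_chains random ideal chains."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_chains):
        hx = hy = hz = 0.0
        for _ in range(n_segments):
            ux, uy, uz = random_unit_vector(rng)
            hx += seg_len * ux
            hy += seg_len * uy
            hz += seg_len * uz
        total += hx * hx + hy * hy + hz * hz
    return total / n_chains


# Compare the sampled root-mean-square distance with r * sqrt(N).
for n in (25, 100, 400):
    rms = math.sqrt(mean_square_end_to_end(n))
    print(f"N = {n:4d}  sampled rms = {rms:6.2f}  r*sqrt(N) = {math.sqrt(n):6.2f}")
```

The sampled values fluctuate around r times the square root of N, so quadrupling the chain length roughly doubles the end-to-end distance. A self-avoiding chain would need an extra overlap check at every step, which this sketch deliberately leaves out.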
Because if you have thousands of residues here, for each residue you will now need to account for the fact that it should not overlap with any previous residue in space. And this comes back to some of the very first origins of computer calculations: self-avoiding chains. Sometime in the years after World War II, people started to look a whole lot more into polymer chemistry. Polymers are much simpler than proteins, but as I have alluded to already, proteins are polymers. So a whole lot of the modern knowledge about proteins really started out with simple polymers, and in particular there is a person who pretty much founded the entire field: Paul Flory. And Paul Flory really solved this problem, and the really cool thing is that he solved it analytically. There's a huge amount of mathematics involved in that. Rather than saying that this goes as the square root of the number of residues, which would be n to the power of 0.5, he showed that the exponent is larger, roughly 0.6, and more precise calculations later evaluated it to roughly n to the power of 0.588, and then lots of different digits there. So it's a very small correction. Paul Flory got the Nobel Prize, not just for this expression but for his entire formulation of modern polymer chemistry and polymer physics, how polymers behave. And Paul Flory's work later became the basis for some of the work, in particular in Israel, where people started working with proteins and understanding how proteins behave, and a particularly good group there was that of Shneior Lifson. Shneior Lifson never got the Nobel Prize, but his disciples Mike Levitt and Arieh Warshel, and Martin Karplus, got the Nobel Prize just three years ago for much of their studies of simulations of proteins. So there is a very continuous thread in how people have worked all the way from physics over basic chemistry, understanding how chains behave, to predicting proteins. So that almost completes the theory part. Or rather, it
doesn't at all. This completes the heavy, boring mathematical part, but now we can start looking at real proteins. Because your advantage is that you can stand on the shoulders of these giants who have spent 50 to 70 years understanding how proteins behave. So this is a general protein. This is the surface of a protein, colored by the charges, and I think every small red cross here is a water oxygen. Now we can of course start looking more into that and consider the secondary structure elements and, say, how the different helices here are stabilized. Why do we have helices here? Why do we have beta sheets up here? We can go down even further and, rather than looking at those secondary structure elements, start to ask: what are the electrostatic interactions between all these pairs? Why is this water stable in that particular location? On this level we can start to calculate each and every interaction between all the atoms in the system. The difference is that thus far we have worked with paper and pen and tried to understand this conceptually, right? But it's not really more difficult to do it here. As some of you had questions about: when I derived the stability of amino acids, I talked about a kind of average amino acid with average properties, right? But that's just because if we needed to do this derivation and take into account that amino acids have different properties, well, for the beta sheets we would have 20 f beta, 20 delta f beta, and I would have had at least 20 U's. More than that, because there are two or three residues involved in each turn, right? So, wait, we would have had 400 U's. Then we would also need to take into account how amino acids interact with each other.
It gets very complicated very quickly. But today we have computers for that. Once we have a computer we can pretty much take every single position of every single atom here and say: you know what, let's just include all the atoms. What is the interaction of, let's say, that hydrogen bond? What are the Lennard-Jones interactions between those atoms? What are the electrostatics, bonds, angles, torsions, everything here? Because suddenly we can evaluate all the energies. You can ideally sample multiple states, and then for all the things that we've now gone through on a conceptual level you can just let the computer solve this and make predictions about proteins. So this is partly very similar to bioinformatics in the sense that we use computers. The part where it's completely different from bioinformatics is of course that here we rely on the laws of physics and chemistry rather than just genetic information. So in practice, what should you use? Both, yes. When it comes to predicting the structure of an entire protein, trying to do this ab initio, from the ground up, is going to be insane. It will cost too much, and it's not going to be accurate. On the other hand, with any bioinformatics predictor you use, you don't really get the full three-dimensional structure; you're picking some parts of it up from the PDB, but at the end of the day you will still have to energy minimize it and get the structure to be stable. So I would argue that anybody building a prediction server or doing a project in practice uses both, but for slightly different reasons. What I'm going to argue is that to understand this for any type of system, we just really need to define what all the interactions in the system are, and then we know what all the energy terms are. We don't know what the free energy is, because we don't have the entropy quite yet.
So I'll get back to that in a second. These energy terms might look like complicated mathematics, but you have gone through almost all of these in the course. This is why I spoke about bonds. These are the angles and the torsions, and these are really just simple mathematical forms for how we formulate them. Both the bonds and the angles are the simplest possible harmonic interactions. A torsion should somehow be periodic, so expressing that as a sum of sine or cosine functions makes sense. But again, this is simpler than you think; there is not a single quantum mechanical part here. The improper torsion here just means that if you have a sequence of four atoms, you occasionally want to say that these four atoms should be in a plane. We can describe that too with a simple harmonic interaction. Then we just need a letter to describe what our angle is, and just to make this a bit fancy I can pick a fancy Greek letter. But this just means that there is something here that depends on how four atoms are placed relative to each other. And then you need electrostatic interactions, and we have our Lennard-Jones interactions. So why don't I use quantum chemistry here? It would be expensive to calculate, but this goes back to the drunkard looking for his keys under the street light, and somebody is helping him look for them, right? And asks: why are you looking under the street light? Well, I dropped them over there, but I can't see over there, so I'm looking under the street light. It's completely irrelevant that it would be expensive to calculate if we're picking some bad form instead of the correct one just because the correct one is expensive to calculate. So you're certainly right that it's expensive to calculate, but we also have to argue that we don't need it. In some cases you do need it. The problem is what we don't have on this slide. So here we don't have entropy, right?
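The energy terms described above can be written down in a few lines. The following is only a minimal sketch, assuming the common textbook functional forms (harmonic bonds, angles and impropers, a cosine torsion, Coulomb and 12-6 Lennard-Jones); the function names, parameter names and unit choices (nm, kJ/mol, elementary charges) are illustrative assumptions and are not taken from the slide.

```python
import math

# Electric conversion factor in kJ mol^-1 nm e^-2 (assumed unit system: nm, kJ/mol, e).
COULOMB_CONSTANT = 138.935


def bond_energy(r, r0, k):
    """Harmonic bond: energy grows quadratically away from the reference length r0."""
    return 0.5 * k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Harmonic angle, same form as the bond but in the angle theta (radians)."""
    return 0.5 * k * (theta - theta0) ** 2


def torsion_energy(phi, k, multiplicity, phase):
    """Periodic torsion expressed as a single cosine term."""
    return k * (1.0 + math.cos(multiplicity * phi - phase))


def improper_energy(xi, xi0, k):
    """Harmonic improper torsion that keeps four atoms close to a plane."""
    return 0.5 * k * (xi - xi0) ** 2


def coulomb_energy(r, q1, q2, eps_r=1.0):
    """Electrostatic energy between two point charges at distance r."""
    return COULOMB_CONSTANT * q1 * q2 / (eps_r * r)


def lennard_jones_energy(r, sigma, epsilon):
    """12-6 Lennard-Jones: steep repulsion at short range, weak attraction further out."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The total potential energy of a structure would then be the sum of these terms over all bonds, angles, torsions and atom pairs. None of the terms contains entropy, which is the point made at the end of the passage.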
So the only way to account for entropy is that we're going to need to account for the fact that proteins do not just exist in one microstate, and the problem with quantum mechanics is exactly what you did in the lab on Monday: you assume that there's only one microstate of a structure. Actually it's even worse than what you did on Monday, because on Monday you worked with multiple states in the lab at least, right? So quantum chemistry corresponds to picking just one of those states and calculating the energy perfectly in that one state, but ignoring all the other states. And pretty much everything we've gone through the last couple of days has to do with the fact that for biomolecules, both the challenge and the beauty of them is that there are many states, and we have to take the free energy into account. Otherwise there is no interesting biology. Yes? Yeah, so I would say we cheat. This effect that atoms can't overlap each other is really a quantum mechanical effect, but we approximate it with parameters that we get from experiments instead. But the main thing here is that there are very few cases in proteins where bonds form and break during the folding of a protein; disulfide bridges are the main exception. In general, proteins keep the bonds they have. It's just a matter of placing the protein in different conformations, and we need to understand how advantageous or disadvantageous all these different conformations are. And here's the problem. You could definitely imagine that you have some states that have the same energy, right?
But if in one state the molecule is much more free to move, that state is going to correspond to more microstates. It will have higher entropy. You're not going to do a lab on this quite yet, but eventually we're going to start doing some labs on this. The problem if you're going to simulate things is that it gets much more complicated, because you can't just have something in free space. To simulate something you're going to need to have water around it. How much water? Well, almost an infinite amount of water. If you want to simulate a test tube, you're going to have 10 to the power of 25 water molecules in the test tube. You can't simulate that in a computer; completely hopeless. So in theory you could imagine: could you take a small molecule and maybe have a sphere of water around it or something? You could do that too, but then you would have a surface tension in the sphere, and that's also not good. So what you can actually do is cheat a bit. Let's pretend that our system is a small box, and when this water molecule on the left there is about to escape on the left, what it really does is come in again on the right. So you're essentially creating a periodic system. And water is pretty good at screening molecules, right? Very high epsilon. So this molecule will probably not see its neighbor here or here; if you have just a little bit of water, the molecules won't see each other. So this makes it possible to put a small molecule in a computer simulation and calculate exactly how all the water will move. And a lot of our understanding today of how hydrogen bonds form, for instance, actually comes from computer simulations, because in a computer simulation we can see how all these waters orient around a small molecule here. Or forget about the small molecule.
We can take an entire ion channel. You can see how this ion channel opens or closes, and how the ions go through the ion channel. These are time scales that are much shorter than in the experiment, so they're extremely limited. But it's pretty cool, because you kind of have an atomic-level computational microscope, so you can look at molecules. You can put a flag on every single atom and see exactly what the molecules do. So what's the danger with that? It looks really beautiful, right? You can make super fancy rendered images. But it's a model, a prediction; it's a model, right? Don't fall in love with your models, because they could be wrong. As beautiful as these things are, in particular if you have DNA, you can make a super fancy system with DNA and everything. The danger is that it will look so good that it's obviously right, but what if you made a simple mistake in your model? So don't confuse your models with reality. They are amazingly useful, but they're not the same as reality. But the other part we're going to need to look into is that we need to account for this entropy, and the cool thing is that in a computer simulation we can do that. We can't do it in plain quantum chemistry unless it's time dependent. But in a computer simulation, we can give the small chain a finite energy, so that it has an energy corresponding to 300 kelvin, and then, as you see, it's moving around testing a couple of different conformations, right? So what this molecule is now happily doing is exploring the free energy landscape here. It's going to spend a lot of time in the blue parts here, and it will spend very little time in the red parts here. And if you set this up correctly, it's going to sample a Boltzmann distribution. This is a key thing. What does a simulation do? I just said it, but I want all of you to think about it, and we'll come back to this in a second when it comes to what it really is that you're predicting or simulating.
A simulation samples the Boltzmann distribution. You will pick states from your simulation, and, again within some statistical noise, the longer you simulate the better a sample you're going to get of the Boltzmann distribution. That's all a simulation does for you, but it's also something amazing. Because of course if you can sample your entire space, right, that's the partition function. If you sample things well enough, you know everything about your system. And on the atomic level I can calculate NMR parameters, I can calculate x-ray scattering; since I know everything about the atoms, I can calculate how any experiment should behave and predict what the experiment should look like. So if I just have a computer that's powerful enough, this is amazingly fancy. And that's of course the other key here: are computers fast enough? Or let me formulate it this way: were computers fast enough in the 1960s, when people started to formulate these things? So that's the horrible part, right? These things have taken decades to mature, because people started doing this probably 40 years before computers were fast enough to do it. And for a long time many of us felt that there was no way we would ever be able to do anything interesting. But computers get roughly twice as fast every 18 months, and there have been a lot of 18-month periods since 1960. So every decade we gain about another factor of 100 here. There is no other laboratory technique that gets a factor of 100 better every decade. It has been two decades since I was a PhD student, so we are about 10,000 times better than when I was a PhD student. And by the time you are my age, we're going to be a factor of 10,000 better than we are today. And that's why it's worth spending some time with these, right?
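As a quick check of the arithmetic behind the factor of 100 per decade, assuming exactly one doubling every 18 months (1.5 years), as stated above:

```latex
\[
2^{\,10/1.5} = 2^{6.67} \approx 100
\qquad\text{and}\qquad
\left(2^{\,10/1.5}\right)^{2} = 2^{13.3} \approx 10^{4}.
\]
```

So one decade gives roughly a factor of 100, and two decades give roughly a factor of 10,000.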
These things are gradually replacing lots of experiments. There are two things we're going to do, though. The first thing, before we get to a simulation, is a problem with the model you start from. Let's actually assume that you've done bioinformatics, so you have now built a model based on your bioinformatics predictor. And I hate to break it to you, but your bioinformatics predictors are not going to be perfect, so you will have made some small mistakes. You might have placed, say, whatever small C, O, H group; let's just say that that oxygen is now very close to some other oxygen there. So unfortunately that hydrogen bond is not paired up, and these two atoms are just a bit too close to each other, by a tiny amount, 0.1 ångström or so. The only problem is that there are so many interactions in these systems. When these atoms start to overlap and collide, the force between them will be so large that your protein would basically explode if you tried to simulate this. So we somehow need to fix up all the errors in proteins, and any predictor you have used has done this internally. To be able to start to simulate, you don't want to start out in that really horrible transition state where you should never be; you want to be in a normal state that you would expect to see at room temperature. We simply call this energy minimization. It's a really stupid name, because energy minimization would really mean that we would like to find the absolutely lowest energy. We don't care about the lowest energy. Why don't we care about the lowest energy? Shouldn't the lowest energy be the native state? I'll ask it again: the native state is the lowest what? Is it the lowest energy, or something else, if we're not sloppy with our definitions? Free energy, right? So energy and free energy are not the same.
The only thing I can calculate is the potential energy. I can't calculate the free energy. So worrying about finding the lowest energy is just worrying about finding the lowest value for half of your equation; completely irrelevant. So this is not really energy minimization. A proper name for this would be energy maximum avoidance. If you're going to simulate my walk, a bad initial condition is to place me in the middle of the wall, because it's not a realistic state. So we need to get away from these really horrible states. There is a whole range of fancy algorithms you can use for that in mathematics. In practice we don't really care, because we don't want the best possible minimum; we just want to avoid the maximum. So if I put you in the Alps and I tell you that you should not be at the peak of a hill, what do you start doing? Yes, you go downhill. That's exactly what we're going to do here, too. Wherever you start, you check: what is the direction in which I'm going most downhill? That comes from the gradient: the steepest downhill direction is the negative gradient. Take a step in that direction. Once you've taken a step in that direction, look around: which direction now keeps going downhill? As long as you can go downhill, you keep going downhill. You will only reach a local minimum, but it's going to be better than where you started. Yes? Exactly. It's the same thing, and the only way to find the global minimum would be to test every single local minimum. That's a super hard problem. And it's a problem
we're not interested in. I just want to avoid the really bad states. There are a bunch of algorithms that you can use to do this, but this is just a mathematical formulation of exactly what you said. I start at any random point, and then I determine the gradient at that point to find which direction is the steepest downhill; that would be that arrow. Then I take a step in that direction. How long this step is, well, that's up to you: you can let the computer decide it or just pick a constant step. Now I'm here, and then I do exactly the same thing. In that case pointing downhill would mean going in that direction, and then I take a step in that direction. Normally you let the step length be proportional to how steeply it's pointing downhill, so if we're going a lot downhill I can take longer steps. This will eventually lead you to a local minimum. And eventually here means that today we spend a couple of minutes on this, and then we're happy and we interrupt it, because I don't care about reaching the minimum. I just don't want to be at a maximum. Yo, today's first pendrive. What do you think this does to the structure? Did you just see this moving? Hardly, right? I'll show it to you again. Here's where we start from; this is a structure that I actually got from the Protein Data Bank, but it could have been from a predictor too. Say three, two, one, zero, minimize. You hardly change it at all, right? There are some side chains that move a little bit. You might be packing things a little bit. I think you end up extending the secondary structure ever so slightly. But this hardly changes the protein at all, and that's why all your predictors tend to do this, so that you have a reasonable starting state. So the whole point here is that you change the coordinates by a fraction of an ångström. These changes are so small that you can't really tell the structures apart. Both the initial and the end state
I showed you here are likely just as compatible with the experiments; you can't tell which one of them is closer to the experiment. So you fix up local mistakes with energy minimization. The reason why I'm telling you this is that it's something you're going to be doing either in bioinformatics predictors or in simulation. You do this a lot. And what you see is this: the red one here is when you start from structures that have been a bit distorted, so you have an entire distribution of errors, where zero here would be the native state. If you start to minimize, you move them all just a little bit closer to the native state, but you won't be able to get all the way there. Why can't I get all the way to the native state just by energy minimization? The most important thing first: we have tons of barriers, right? Going across a barrier would at some point mean going uphill, and I can't do that with energy minimization. What's the other reason? That's what you just told me two or three slides ago. What am I minimizing here? Energy, and the native state is defined by the minimum in free energy. So I can't get to a minimum in free energy by minimizing the energy, no matter how much computational power I throw at it. And that's why we're going to need the simulation, right? Because what the simulation will do is sample the entire phase space, or part of the phase space. Sorry, phase space is just a physicist's way of saying every possible conformation and every possible velocity of the atoms. It samples all the conformations that are possible. Now you're going to spend most of the time in the very low parts of the free energy, and you're going to spend less time in the high parts of the free energy. Why? So what did you find out in the lab on Monday? Björn told me that at least 70 or 17 of you handed in the report by yesterday evening.
So I think you've done it. What did you find in the lab? Where do you spend the most time? In the lowest energy states. Now of course on Monday you cheated, because that was only energy, not free energy. This afternoon you're going to do it for free energy, and you're going to account for that. So it's not just a postulate that a computer simulation will tell you the truth. The reason we know this is that as long as we sample according to the correct laws of physics, we will spend most of the time in the lowest free energy states. But the cool thing in the simulation is that you're not just going to get the lowest energy. You will actually see the protein move, and if you somehow sample higher free energy states you will see that. So then we're in an amazingly beautiful world, right? Assuming that we had a computer that was infinitely fast, you could just take the exact structure of the protein from the Protein Data Bank, or a new structure, and then simulate exactly how the protein would move, right? Do you see any problems with this? So let's make a very simple simulation here. Any time you're going to simulate anything or predict a motion, what does that depend on? You have made some assumptions, right? We certainly have a model for how our interactions work, so we need to hope that all these energy terms we have are good enough in some way. Let's assume they are; if we have an infinitely fast computer, you could assume that we did this with quantum mechanics. But there is something else that matters: the initial state, right? So this is a pendulum, well, kind of a simple version of a pendulum. If this is my initial state, what's going to happen? The coordinate of the pendulum is that it can move here, right?
If this is the initial state and the coordinate is there, nothing much is going to happen. You can also say that this is before my simulation starts. But I could also say that I start here, but I have actually given it a velocity. So there are two things that matter: the initial coordinates and the initial velocities. Just knowing the coordinates is not enough. You also need to know the velocities. So in what database do you find the velocities of a protein? You don't. So that's the other problem in general for a large system with lots of particles. We can of course approximate the velocities. How can you approximate the velocities of a protein? Any molecule will have some kinetic energy, right? That's the mass of each particle multiplied by the velocity squared, divided by two, and you sum this up over all i in the system. And we know that that should correspond to the temperature, right? So at 300 kelvin there will be a random distribution of velocities, and this is what you call the Maxwell-Boltzmann distribution in physics. If we know we are at, say, 300 kelvin, I can assign this random distribution of velocities to the particles. So I can pretty much give a protein random velocities. But this leads to another problem: I can't just randomly give a protein velocities, right? Because depending on which velocities I give it, it would go in different directions. Yes? So how would you raise the temperature in a simulation? By adding energy, right, and that too would have to be random. So the point is that you can't know the velocities of a protein. I'm going to make a small mistake here. Sorry, not a mistake, I'm going to make a small experiment here. I'm going to touch one of you. No, you didn't die. In biology, or in general in any type of simulation, if I picked a different starting condition for my pendulum, it would move in a completely different direction. And for most physical systems, this is super complicated.
Most physical systems are chaotic. This is the Lorenz attractor. Normally these particles move between these two halves of a mathematical expression, but if you make a minimal change in either the position or the velocity, it's going to be impossible to predict how it will move after a couple of turns. And this is what has led to all this popular science saying that the strokes of a butterfly's wings in one part of the world are going to create a hurricane in another part of the world. I have no idea if they've ever been able to prove that, but the point is that infinitesimally small differences in initial conditions will eventually lead to large deviations. To make this even worse, if you're working with quantum mechanics, the Heisenberg uncertainty principle tells you that it's impossible to know both the position and the velocity exactly at the same time. So what physics is basically telling you is that it's impossible to predict the future. That's a bit of a bummer. And also, you didn't die here, and you laughed at it. That was a really dangerous experiment, right? Because you all instinctively know that biological systems are stable against minimal changes. Essentially what we did here is that we altered the state, the position and velocity of some of your atoms, a bit. It should be impossible to predict what would have happened. So what is it that we're missing here? Probabilities, right.
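The sensitivity to initial conditions described for the Lorenz attractor can be checked numerically. This is a minimal sketch, assuming the standard Lorenz parameters (sigma = 10, rho = 28, beta = 8/3) and a fourth-order Runge-Kutta integrator; the starting point, the size of the perturbation and the step count are arbitrary illustrative choices.

```python
def lorenz_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz equations for a point (x, y, z)."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state, dt):
    """Advance the state by one time step with classical fourth-order Runge-Kutta."""
    def shifted(s, k, h):
        return tuple(si + h * ki for si, ki in zip(s, k))

    k1 = lorenz_rhs(state)
    k2 = lorenz_rhs(shifted(state, k1, 0.5 * dt))
    k3 = lorenz_rhs(shifted(state, k2, 0.5 * dt))
    k4 = lorenz_rhs(shifted(state, k3, dt))
    return tuple(
        s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def max_separation(delta=1e-9, dt=0.01, n_steps=4000):
    """Largest distance reached between two trajectories that start delta apart."""
    a = (1.0, 1.0, 1.0)
    b = (1.0 + delta, 1.0, 1.0)  # identical except for a tiny shift in x
    largest = 0.0
    for _ in range(n_steps):
        a = rk4_step(a, dt)
        b = rk4_step(b, dt)
        separation = sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
        largest = max(largest, separation)
    return largest


if __name__ == "__main__":
    print(max_separation())
```

Two trajectories that start a billionth of a unit apart end up on entirely different parts of the attractor, so the individual path is not predictable even though both trajectories trace out the same overall shape.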
So this goes down to the Boltzmann distribution. If you're looking at the solubility of a small compound in a test tube, the exact conformation of that molecule is not important, because we're looking at average properties. Everything in physics, and therefore also chemistry and biology, has to do with what you call an ensemble property, and this is really just a fancy word for average. The ensemble of a molecule is this small molecule moving over different parts of phase space. The whole point when you do measurements is that you don't just have one structure, as you've seen in the lab. You can have a range of different structures, and the population in different states will depend on the energy. If I now make a measurement, 90 percent of my measurement might come from one state, 5 percent from a second, 3 percent from a third, and then lots of other small states. So any measurement you make in the lab is going to be an average of all the observations you have in the entire ensemble. In most cases this will mean that virtually everything you see is going to come from the low-lying regions in the free energy, and you're hardly going to get any contribution at all from the high-lying regions. But hardly is not exactly the same as zero. Formally every single point, every single possible conformation of your protein or molecule, contributes just a little bit. But in practice you're going to have 99.99 percent from this part, so to simplify our lives we can ignore everything else. But that means that anything you do, any way your molecules interact, will be an average property. And the point when I touched your hand is that technically the molecules in your body ended up taking a slightly different path in phase space. But of course the average sampling of your states didn't change. You still spent 99.99 percent in the low-lying states, right?
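The idea that a measurement is a population-weighted average over states can be made concrete. The sketch below assumes a small set of discrete states with known free energies in kJ/mol; the energy values and the observable are invented for illustration and do not come from the lecture or the lab.

```python
import math

K_B = 0.0083145  # Boltzmann constant in kJ/(mol K)


def boltzmann_populations(energies, temperature=300.0):
    """Relative populations of discrete states from their (free) energies."""
    beta = 1.0 / (K_B * temperature)
    lowest = min(energies)
    # Subtracting the lowest energy avoids overflow and does not change the result.
    weights = [math.exp(-beta * (e - lowest)) for e in energies]
    partition_function = sum(weights)
    return [w / partition_function for w in weights]


def ensemble_average(values, populations):
    """Population-weighted average of an observable over all states."""
    return sum(v * p for v, p in zip(values, populations))


if __name__ == "__main__":
    # Hypothetical free energies of four states (kJ/mol).
    energies = [0.0, 7.0, 8.5, 12.0]
    # Hypothetical observable, for example the number of hydrogen bonds in each state.
    hydrogen_bonds = [10.0, 8.0, 7.0, 3.0]
    populations = boltzmann_populations(energies)
    print(populations)
    print(ensemble_average(hydrogen_bonds, populations))
```

The lowest state dominates the average, but every state contributes a little, which matches the statement that "hardly" is not the same as zero.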
So for any measurement, this means that nature effectively dampens out all these small fluctuations. It's not the path of individual atoms that matters. And this is also why simulations work. I have no idea how many people, even professors, think that a simulation predicts the path atoms take. You can't predict the motions of atoms. What simulations do is sample paths. This is a small helix, I think in water; actually I'm just looping this around. But the point is that it's not even relevant to predict how the individual water molecules move here, right? But the average properties, the average number of hydrogen bonds, the average interactions with this protein, that is important. And that we can measure in excellent ways in the simulation, even if I don't know what the exact initial velocities were. What's going to happen to these waters if I give them random velocities? Roughly five to ten picoseconds later the water will have forgotten that velocity. Now of course if I assign a new set of velocities, I'm not going to get exactly the same simulation. But the average properties in the simulation will be the same; the things I can measure, and that would correspond to things I could measure in the lab, would be the same. For instance the number of hydrogen bonds. But this means there are actually a number of ways you could conceivably do a simulation. Because if we want to take multiple states into account, I'm going to need to find a way to move from one state to another one, right?
That's not an easy problem. The only thing we have to demand is that we fulfill the Boltzmann distribution. I can't randomly pick states out of my back pocket. You have to obey the laws of the Boltzmann distribution, in particular this detailed balance: the probabilities of moving between states should maintain the Boltzmann distribution. That's what you were doing in the lab on Monday, and that's what you're going to do today. You just create a very simple rule: if you are in one state, determine how likely it is to move to another state. In physics you typically call that a Monte Carlo simulation. You just move between states in some sort of intelligent fashion. If each move fulfills the Boltzmann distribution, the entire distribution you get is also going to fulfill the Boltzmann distribution. The reason why we use that in the labs is that it's dirt simple and it's just one line of code. You could imagine the problem if you were to do that for a large protein with tens of thousands of residues and everything. If you randomly tried to move the protein to another conformation, what would happen on average? It would bump into itself, right? Or it would bump into water. If you randomly tried to pick another conformation of the protein, on average you would always bump into something else, because most conformations are not allowed. There are very few allowed conformations. So it turns out that for proteins and large molecules the best way of doing this is actually just to solve Newton's equations of motion. We know what the potential is, and then I can just take the gradient of the potential to calculate which direction each atom would like to move in. What are the forces? And then I calculate,
well, the forces correspond to accelerations, accelerations correspond to changes in velocities, and then I can keep moving the atoms a little bit. So it ends up looking exactly as if I'm actually simulating the exact motions of particles. And that leads to the other approach, which is completely dominant in how you actually simulate proteins or other molecules today. Yes, you had a question? On the previous slide you had something called stochastic or Brownian dynamics? I'll come to that in a second. So what I'm arguing is that one really good way of generating states that correspond to the Boltzmann distribution is simply to calculate for each particle exactly how it would like to move based on the forces around it. Now there are a couple of ways we can modify that. I can allow a little bit of noise, either in the velocities or in the forces. You can actually use this as a way to control your temperature, for instance by adding a bit of noise to all your velocities to make sure that you have an average velocity that corresponds to 300 kelvin. These are just ways you can modify the dynamics. And this is simple. Now that you understand the statistical mechanics, this is much simpler.
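For comparison with the dynamics-based approach, the Monte Carlo alternative described a little earlier can be sketched in a few lines. This assumes the standard Metropolis acceptance rule with a symmetric move proposal, which is one common way to satisfy detailed balance; the lecture does not say which rule the lab uses, and the two-state system, function names and parameters here are hypothetical.

```python
import math
import random

K_B = 0.0083145  # Boltzmann constant in kJ/(mol K)


def metropolis_step(state, energy_fn, propose_fn, temperature, rng):
    """Propose a new state and accept it with the Metropolis probability."""
    candidate = propose_fn(state, rng)
    delta_e = energy_fn(candidate) - energy_fn(state)
    # Downhill moves are always accepted; uphill moves with probability exp(-dE/kT).
    if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / (K_B * temperature)):
        return candidate
    return state


def sample_two_state(n_steps=200_000, delta_e=2.0, temperature=300.0, seed=1):
    """Sample a toy two-state system and return the population ratio high/low."""
    rng = random.Random(seed)
    energies = (0.0, delta_e)  # hypothetical energies in kJ/mol
    state = 0
    counts = [0, 0]
    for _ in range(n_steps):
        state = metropolis_step(
            state,
            lambda s: energies[s],
            lambda s, r: 1 - s,  # symmetric proposal: always try the other state
            temperature,
            rng,
        )
        counts[state] += 1
    return counts[1] / counts[0]


if __name__ == "__main__":
    print(sample_two_state(), math.exp(-2.0 / (K_B * 300.0)))
```

With enough steps the sampled ratio approaches the Boltzmann factor exp(-dE/kT), which illustrates that moves obeying detailed balance reproduce the Boltzmann distribution as a whole.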
This is a much simpler way than anything else we've done. So all I want to do is, given the initial coordinates and velocities, calculate how the system moves. We are well aware that there are errors in this, but as long as the errors are small over reasonably long paths and these paths are realistic, I should get good average properties of the atoms. The way I calculate this is to use some high school mathematics. First is Newton's second law. If I know where all my atoms are, I can calculate the potential. And again, in physics the force on anything is just the negative derivative of the potential with respect to position. There are going to be lots of atoms here. If you have 10,000 atoms there might be 10,000 squared interactions. This is why you're going to need a computer; it's an extremely complicated force to calculate on every atom. But again, if I know what the force on every atom is, and I definitely know what the mass of every atom is, then I can calculate the acceleration of every single atom. And if I know what the acceleration is, well, the acceleration is the derivative of velocity with respect to time, right? So if I know how the velocity changes, say it should change by one meter per second per second, I take a small time step and calculate how much the velocity should change over that one step. Velocity, though, is the derivative of position with respect to time. So then I can do the same thing again: over this step, how much did my position change?
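The update just described (force, then acceleration, then velocity, then position) can be sketched for a single particle. This is a minimal sketch, assuming a one-dimensional harmonic spring as a stand-in potential and reduced units; production simulation codes typically use closely related schemes such as leap-frog or velocity Verlet, and all names here are illustrative.

```python
def harmonic_force(x, k=1.0):
    """Force from the potential U(x) = 0.5 * k * x**2, i.e. F = -dU/dx."""
    return -k * x


def md_step(x, v, mass, dt, force_fn):
    """One time step: force -> acceleration -> new velocity -> new position."""
    acceleration = force_fn(x) / mass   # Newton's second law, a = F / m
    v_new = v + acceleration * dt       # velocity changes by a * dt
    x_new = x + v_new * dt              # position changes by v * dt
    return x_new, v_new


def run(n_steps=1000, dt=0.01):
    """Integrate a particle that starts at x = 1 with zero velocity."""
    x, v, mass = 1.0, 0.0, 1.0
    for _ in range(n_steps):
        x, v = md_step(x, v, mass, dt, harmonic_force)
    return x, v


if __name__ == "__main__":
    print(run())
```

For a real system the same loop runs over every atom, with the force obtained from the full potential, and the time step has to be small compared to the fastest motion in the system.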
Now you have just moved all your atoms by a tiny bit. Because you're using Newton's law here, these are frequently called Newton's equations of motion; simple classical mechanics. You can define this slightly more compactly with partial derivatives and gradients, but this is telling you exactly the same thing. Force is mass times acceleration, and this force you get from the negative derivative of the potential with respect to all the particle positions. This is going to be impossible to calculate manually, but trivial for a computer. And what you get from this is that as a function of time you're going to get different positions for your atoms. So let's do this. The potential is the one I showed you before, you know, all these interactions. And then we have a small molecule, that water molecule there, and it's taking steps. These steps have to be in the ballpark of femtoseconds. I know, it's amazing. We're understanding so much biology from this, right? It tells you everything about how proteins fold? It tells you absolutely nothing about how proteins fold. This poor water molecule didn't even rotate 180 degrees in 100 femtoseconds. And this is where people were in the 1960s when they started out with this. The cool thing today, even on a laptop: this is a rendered movie, but this is roughly how fast I could calculate this in real time on a laptop. Now I'm actually showing all the neighbors of that water molecule at room temperature. So in a couple of minutes on a laptop I can calculate a few picoseconds of real time. Here you're seeing how all the waters are making and breaking hydrogen bonds. Just using some very simple parameters for water, I can predict the density correctly. I can predict the viscosity. I can predict the diffusion coefficients. I could calculate what the energy of water is.
I could predict the phase transition of water. That's still a bit boring, because this is still physics, or maybe physics bordering on chemistry. It's definitely not biochemistry. But again, this was in less than one minute on a laptop. People would have killed for this in the 1960s.
And with supercomputers or other large machines you can calculate how a protein folds. When I was a little bit older than you are, in the late 1990s, Peter Kollman's group used something like six months of supercomputing time, and they were able to fold this protein called the villin headpiece. This was one of those game changers, because prior to that we weren't even sure whether it was ever going to be possible to fold proteins. But they could actually fold this protein in a computer. It's a real protein, roughly 30 residues. But you can imagine how much it costs to use six months of supercomputing time on one of the largest computers in the world, right? So when I was a postdoc at Stanford, we started to play around with other ways of doing this. Sorry, I'm going to run over by three or four minutes today so that I can finish the slides here. The reason why you need a supercomputer for this is that you need to take first one step, and then the second step, and then the third step, and then the fourth step, and you need to repeat this a couple of billion times. But do I really need to do it that way? Because I'm not simulating the motion of one protein, right? I'm sampling states from a distribution. Couldn't I do this with more simulations?
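One way to make that question concrete is a back-of-the-envelope sketch, assuming that folding behaves like a single-exponential (two-state) process with a mean folding time `tau`. Under that assumption, the chance that one short run of length `t` has folded is 1 - exp(-t/tau). The numbers below (a 10 microsecond folding time, 10 nanosecond runs, 10,000 independent runs) and all function names are illustrative assumptions, not measured values.

```python
import math


def fold_probability(t, tau):
    """Chance that one trajectory of length t has folded, assuming
    exponential kinetics with mean folding time tau (same units as t)."""
    return 1.0 - math.exp(-t / tau)


def expected_folding_events(n_runs, t, tau):
    """Average number of runs, out of n_runs independent ones, that fold."""
    return n_runs * fold_probability(t, tau)


def prob_at_least_one(n_runs, t, tau):
    """Chance that at least one of n_runs independent runs folds."""
    return 1.0 - (1.0 - fold_probability(t, tau)) ** n_runs


# Illustrative (assumed) numbers, all times in nanoseconds.
tau_ns = 10_000.0   # assumed mean folding time: 10 microseconds
t_ns = 10.0         # assumed length of each short run
n_runs = 10_000     # assumed number of independent small computers

p_single = fold_probability(t_ns, tau_ns)
print(f"single run folds with probability {p_single:.5f}")
print(f"expected folding events: {expected_folding_events(n_runs, t_ns, tau_ns):.1f}")
print(f"P(at least one folds):   {prob_at_least_one(n_runs, t_ns, tau_ns):.5f}")
```

With these assumed numbers each individual run folds only about one time in a thousand, yet among 10,000 runs it is almost certain that some of them do.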
You could, right? You can give them different initial velocities, and instead of having one simulation you could have two simulations sampling different parts in parallel, and they can run on different computers. Then you're going to need two supercomputers. Or you could have three simulations if you have three supercomputers. The problem is that supercomputers get larger all the time, but it's virtually impossible to get a single simulation to scale across all of one. So instead of running this on one supercomputer, could you imagine using 10,000 small computers, and letting each of them run a very short simulation? This is something we've been doing for 15 years as part of a project called Folding@home. We use distributed-computing screensavers and send out a small piece of a protein simulation to each and every one worldwide.
This is a protein called BBA5 that was designed to be super stable and to form two beta strands and a small alpha helix: beta-beta-alpha. There are roughly 400 atoms in the protein. The problem is that you can't fold the protein in vacuum, so we need to add something like 10,000 or 12,000 atoms of water. So we're mostly simulating water, with a tiny bit of protein in it. And then you run this on 10,000 computers, and of those 10,000 computers, in a handful of cases you see the protein fold. You're going to see an alpha helix forming soon here. You see that you form an alpha helix there; it was a pretty rapid collective transition. This is actually a beta sheet, but it's formed in the wrong way. And so we're at around 40 nanoseconds.
You're going to see this turn around and break, and then it will form in the right way. As you see now, we're losing the beta sheet, and now we're going to form the beta sheet in the right way. And then we cheat a bit and stop the clock when we just happen to fly by the native state. This only happens in one simulation out of a thousand, because in practice it would take in the ballpark of 10 microseconds for this to happen. So on average it won't happen, but in a few simulations we will see it happening. And that's exactly because we're sampling from a Boltzmann distribution. We can do far fancier things than this today, and you can actually use this to predict binding; we're going to come back to that later on in the course. But the point here is that a trajectory is not a prediction. A trajectory is really a sample from these distributions. And that's the one thing you're going to use later on in the course, because you're going to be running simulations.
There are a bunch of other questions that I won't go into in detail here, but that we will come back to later on. The cool thing with this is that we can actually simulate much more realistic things, but the realistic things also give you a bit of a problem. A physicist would put something in an isolated system, but in an isolated system you don't exchange energy with the surroundings, so the temperature is not constant. If at some point the potential energy goes down because we reach a more stable state, the kinetic energy would go up, so I would heat the protein. So I'm going to need to control this. I usually want to simulate something at a fixed temperature, say 300 kelvin. I usually want to stay at room temperature and at normal lab pressure, one bar, unless you want to simulate an atomic bomb, which is actually illegal on most supercomputers: they are covered by US export controls, and as part of these export controls we promise not to simulate pressures higher than a megabar or something. Maybe you would like to allow the salt concentration to change; then you're going to need to add or remove atoms. So there are actually quite a lot of complicated physical rules of the simulation that you would need to find a way to handle. Just as a simple example, you can control temperature, and the way you control temperature is by adjusting the velocities of the atoms to make sure that their distribution corresponds to 300 kelvin. So it's certainly possible to make these simulations match experiment very closely. And the cool part here, in contrast to quantum mechanics: we can definitely simulate things under conditions that should correspond exactly to an X-ray crystal or exactly to a cryo-EM experiment. Now you can make a simulation and predict what experimental observables you would expect to see in the experiments, and then you have started having a beautiful connection between theory and experiment again.
But this brings me to the last slide here. You won't be surprised that all this is based on the statistical mechanics that we've gone through. So what's the first part of that word? Statistical. Everything here builds on both statistics and the Boltzmann distribution. Just because you see something in a simulation doesn't mean that it's true, because you're now looking at the scale of individual atoms, and every now and then you will have a high energy. So everything in the simulations you need to see multiple times, and you need to make proper statistics about it. This is the part that we're going to need to be a little bit careful about. But if we can solve that, it's going to be an extremely powerful way to make predictions, accurate predictions about real systems. We can calculate how much it costs to solvate a compound. Well, could you imagine that could be useful? Why on earth is it useful to be able to calculate how soluble a compound is?
Drugs, right. Couldn't you just go into the lab and measure how soluble the compound is? Drug compounds need to be reasonably soluble; otherwise you can't administer them. You can certainly measure it in the lab, so why don't we do that? No, I would say that it's trivial to control the parameters in the lab. The problem is that you're going to need to start by making the compound. These compounds are typically super complicated organic molecules, and at this point I don't yet know what compound I want, right? I would like to design a good compound. So you're going to have people coming up with 10,000 new candidate compounds per week in the computer. You can synthesize them or you can order them, and they will cost something like 50,000 dollars per compound. If you go to your boss and say, "Boss, I just ordered 10,000 compounds for 50,000 dollars each," the next words are going to be "you're fired." And they will take four weeks to produce, right? But if you can give this to a computer and overnight calculate the solubility, and find that only 100 of these are soluble, those are the 100 you order. Computers are not going to be as accurate as the lab, but the advantage is that computers are cheap and fast. So we're increasingly using computers to screen properties like this.
You're going to do this later on in the labs. I won't go through exactly how to run a simulation now; we're going to spend a whole lot more time talking about this. But here are just a few study questions, so that we're finishing up the hard part of the mathematics, at least, and gradually moving over now to real biomolecules and how we can handle them in the computer. Tomorrow I'm going to be in Swindon, actually, to give a lecture, but on Friday we're going to be looking at real protein structures. And now we will understand a little bit about how to treat those protein structures.
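The temperature control described earlier in this lecture, adjusting the atomic velocities so that they correspond to 300 kelvin, can be sketched in its very simplest form as plain velocity rescaling. This is only an illustration under stated assumptions: the function names, the particle count, and the starting velocities are made up for the example, the temperature is computed from the kinetic energy with three degrees of freedom per atom, and real simulation programs generally use more careful thermostats than this.

```python
import math
import random

K_B = 1.380649e-23  # Boltzmann constant in J/K


def instantaneous_temperature(masses, velocities):
    """Temperature from the kinetic energy: (3N/2) * k_B * T = sum(0.5 * m * v^2).

    Assumes 3 degrees of freedom per atom (no constraints removed).
    """
    kinetic = sum(
        0.5 * m * (vx * vx + vy * vy + vz * vz)
        for m, (vx, vy, vz) in zip(masses, velocities)
    )
    n_dof = 3 * len(masses)
    return 2.0 * kinetic / (n_dof * K_B)


def rescale_velocities(masses, velocities, target_temperature):
    """Scale every velocity by the same factor so that the instantaneous
    temperature equals the target temperature."""
    current = instantaneous_temperature(masses, velocities)
    scale = math.sqrt(target_temperature / current)
    return [(vx * scale, vy * scale, vz * scale) for vx, vy, vz in velocities]


# Illustrative (assumed) system: 100 particles with roughly the mass of a
# water molecule and arbitrary random starting velocities in m/s.
random.seed(0)
masses = [2.99e-26] * 100
velocities = [
    (random.gauss(0.0, 500.0), random.gauss(0.0, 500.0), random.gauss(0.0, 500.0))
    for _ in masses
]

before = instantaneous_temperature(masses, velocities)
velocities = rescale_velocities(masses, velocities, target_temperature=300.0)
after = instantaneous_temperature(masses, velocities)
print(f"temperature before rescaling: {before:.1f} K, after: {after:.1f} K")
```

Because kinetic energy goes as the velocity squared, scaling all velocities by the square root of the temperature ratio brings the instantaneous temperature to the target.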