This is a very difficult problem. What we will do is one of the most simplified models out there, but even that should give some insight into what is required. The basic paradigm of protein folding is that you have an amino acid sequence, and that sequence dictates how the protein will fold. So the sequence dictates the structure, and the structure of the protein dictates its function: precisely how you fold determines what function you perform. If you do not fold properly, misfolding errors can cause a number of diseases.

All of this holds for a given set of conditions. If you change the pH a little, or if you make post-translational modifications like the ones we saw, the protein will go to slightly different states. So you can have a discrete set of conformations, but given a certain condition, whether it is ubiquitination or whatever other post-translational modification is done, there is a unique ground state.

The basic question we want to answer is: given a sequence, how do I go from that sequence information to a particular structure? You might say, well, I will do a random search over whatever possible structures I can obtain. But a quick back-of-the-envelope calculation will show you that a random search is pretty well impossible. For example, let me work with the compact random walk model in three dimensions, and let us say I have a protein with some 100 amino acids. The number of possible structures for a compact random walk like this goes roughly as 6 to the power of 100: at each step you have 6 possible links, and you have 100 such amino acids.
So that would typically be the number of structures, which is roughly 10 to the power of 77. Then regardless of how fast you search, even if you were to say that every structure is checked in a femtosecond, scanning this entire space of structures would take you much, much longer than the age of the universe. So folding cannot be a random search process; there has to be some principle behind it, and we will try to look at it in a very restricted sense.

Remember also the number of sequences for a 100 amino acid protein like this. You can have 20 possible amino acids at each position, so the sequence space is 20 to the power of 100, which is even larger than the structure space. So the first step in reducing the complexity and starting to think about this problem, which is what people did way back in the 1980s, Ken Dill and others, was to come up with a way of shrinking this space. What they said is: yes, I have 20 different amino acids, but let me not worry about that. I know that hydrophobic forces are a very important driver of folding. We argued that hydrophobic residues want to cluster together and leave the hydrophilic residues on the outside. So let me take that as one of the important drivers of the folding process, and use it to reduce the alphabet of the sequence space from 20 letters to 2. I will group all 20 amino acids into 2 classes: either they are hydrophobic or they are polar. The hydrophobic ones do not like to interact with water; the polar ones do.
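The back-of-the-envelope numbers above can be reproduced in a few lines. This is only a sketch of the estimate: the one-femtosecond search time is the lecture's generous guess, and the age of the universe (about 13.8 billion years) is a value assumed here for comparison.

```python
import math

N_RESIDUES = 100               # chain length used in the lecture
LINKS_PER_STEP = 6             # choices per step on a 3D lattice
TIME_PER_STRUCTURE_S = 1e-15   # one femtosecond per structure (generous guess)
AGE_OF_UNIVERSE_S = 4.35e17    # about 13.8 billion years (assumed value)

n_structures = LINKS_PER_STEP ** N_RESIDUES
n_sequences_20 = 20 ** N_RESIDUES   # full 20-letter amino acid alphabet
n_sequences_hp = 2 ** N_RESIDUES    # two-letter hydrophobic/polar alphabet

# Time needed to visit every structure once, and how it compares
# with the age of the universe.
search_time_s = n_structures * TIME_PER_STRUCTURE_S
ratio_to_universe = search_time_s / AGE_OF_UNIVERSE_S

print(f"structures          ~ 10^{math.log10(n_structures):.1f}")
print(f"sequences (20 aa)   ~ 10^{math.log10(n_sequences_20):.1f}")
print(f"sequences (H or P)  ~ 10^{math.log10(n_sequences_hp):.1f}")
print(f"search time / age of universe ~ 10^{math.log10(ratio_to_universe):.1f}")
```

Even with the two-letter alphabet the sequence space for 100 residues stays enormous, which is why the lecture moves to a 6 residue chain where everything can be listed by hand.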
So let me see how far I can go with a model like this in trying to understand the folding problem. This class of models and their derivatives are called HP models, for hydrophobic-polar. There is a whole literature on the hydrophobic-polar model going back 20 or 30 years now, but what we will do is a very simple application of it.

Let me consider a 6 amino acid protein, so a very, very small protein, which is not really much of a protein at all, on a 3 cross 2 lattice. Instead of doing 3D I will do 2D. So this is my lattice, I take a protein with n equal to 6, and I work within the HP model. What is the number of possible sequences in this paradigm? It is 2 to the power of 6, that is 64. So these are the possible sequences I can have for this very simple, very short protein.

What is the number of possible structures I can have on this 3 cross 2 lattice? What are the possible unique structures that I can draw? Remember this is a compact random walk, which means I have to occupy every lattice site exactly once. So let me draw one. Let us say I draw a protein like this; let me call this the pi structure. Here is the 6 amino acid protein, I have filled up all the available lattice sites, and I have got a structure like this. What other structure can I have? I can have a G, something like this. I want structures which are not related by translations, rotations or reflections, so I want unique structures. What else? I could have an S.
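The shapes drawn so far (pi, G and S) can be checked against a brute-force enumeration. The sketch below lists every self-avoiding walk that fills the 3 cross 2 lattice and then merges walks related by a reflection, a 180 degree rotation, or by reading the chain from the other end. The coordinate convention and the function names are illustrative assumptions, not something taken from the lecture slides.

```python
from itertools import product

W, H = 3, 2  # lattice width and height

def compact_walks():
    """All directed self-avoiding walks that visit every lattice site once."""
    sites = [(x, y) for x in range(W) for y in range(H)]
    walks = []

    def extend(path):
        if len(path) == W * H:
            walks.append(tuple(path))
            return
        x, y = path[-1]
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nxt in sites and nxt not in path:
                extend(path + [nxt])

    for start in sites:
        extend([start])
    return walks

def canonical(walk):
    """Smallest image of a walk under the lattice symmetries and chain reversal."""
    images = []
    for flip_x, flip_y in product((False, True), repeat=2):
        image = tuple(
            (W - 1 - x if flip_x else x, H - 1 - y if flip_y else y)
            for x, y in walk
        )
        images.append(image)
        images.append(image[::-1])  # same shape read from the other end
    return min(images)

walks = compact_walks()
shapes = {canonical(w) for w in walks}
print("directed compact walks:", len(walks))
print("distinct shapes:", len(shapes))
```

Treating a walk and its reverse as the same shape is a choice made here; it matters later, because a sequence threaded onto an asymmetric shape can have a different energy depending on which end it starts from.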
So I could have an S structure, and if you think about it, hopefully you can convince yourself that these are the only 3 structures possible on this 3 cross 2 lattice. Here is my pi, here is my S, here is my G. So I have a sequence space of 64 but a structure space of only 3, at least for this very simple case.

Now let me take a particular sequence. Let me draw the third structure anyway, 1 2 3 4 5 6. Let me take the sequence H P H P H P: one hydrophobic, one polar, hydrophobic, polar, hydrophobic, polar. That is one of the 64 possible sequences. The idea is that whenever a hydrophobic residue is in contact with another hydrophobic residue, that is energetically advantageous. So let me say that whenever I have a bond between a hydrophobic and a polar residue, it costs me some energy epsilon, and whenever I have a bond between a hydrophobic residue and the solvent, that also costs me energy epsilon. This is just the simplest case; you could take different energies and so on. What is the solvent? The solvent is all the sites around the protein. Everything is surrounded by solvent, so all of these are solvent molecules, in all of these structures. Whenever I have bonds like this, it costs me some energy epsilon. So I take this sequence and I put it onto these three structures.

Okay, hello everybody. In this lecture on the HP model there was a problem: the slides got stuck at one particular slide. It is just a small 10 minute segment.
So it is easiest, perhaps, if I just re-record this and you can watch it. I think where we had gotten stuck was that we were looking at this toy model, the HP model for protein folding. What we had said was that we consider a 6 amino acid protein on a 3 cross 2 lattice, which means the number of possible sequences is 64. But the number of possible unique structures, up to rotations and so on, is only 3, and we had drawn those structures: one was the pi, another was the S, and the third was what we were calling the G structure.

Now let us consider the sequence H P H P H P, and let me try to place it on these 3 structures. So here is the pi structure, here is the S structure and here is the G structure. Sorry, my drawing is pretty bad. The orange beads are H. So on the pi structure this is H P, then H P, then H P. On the S structure again I have H P, H P and then H P. And on the G structure let me start with H from here: H P, H P, H P.

We know how to calculate the energies of these structures. Remember that surrounding all of these structures is water; everything has water molecules all around it. So whenever a hydrophobic residue is in contact either with a polar residue or with water, it pays an energy cost.
And let us say, for the purpose of this argument, that the energy cost is epsilon irrespective of whether the contact is with a polar residue or with water. Let me get rid of these; we understand that surrounding all of this is water. So now let me count the number of such bonds that carry an energy penalty.

Here is my first hydrophobic residue. It has a bond with water here, a bond with water here, and, since these two are neighbors on the lattice, a bond with this polar residue here. So that is 3 epsilon: I pay an energy epsilon here, epsilon here, epsilon here. This hydrophobic residue is in contact with water here and with water here. This hydrophobic residue is in contact with water here and with this polar residue here. So again I pay epsilon here, epsilon here, epsilon here, epsilon here. That is 1, 2, 3, 4, 5, 6, 7. So this structure has an energy, by this definition of energy, of 7 epsilon. Remember, I am not counting bonds along the backbone, because those are present in every structure. I am only counting contacts between residues that are not neighbors along the chain but are adjacent to each other on the lattice, together with the contacts with water.

If I do the same thing for the S structure, this hydrophobic residue is adjacent to this polar residue, to water here and to water here. This one is adjacent to two waters. This one is adjacent to one polar residue and one water. So again I pay epsilon seven times, and the S structure also costs an energy of 7 epsilon. Similarly, if I look at the G structure, this hydrophobic residue has two bonds like this, this one has 1, 2, 3, and this one has 1, 2. So again this is an energy of 7 epsilon.
So that is the way to count the energies of these toy proteins. If I consider the sequence H P H P H P, what this says is that all three of these structures have the same energy, 7 epsilon. If you count the number of bonds that hydrophobic residues make with polar residues or with water, and assign an energy cost epsilon to each, then each of the three structures costs 7 epsilon. So their Boltzmann weights are all e to the power of minus 7 beta epsilon. None of the three structures is favored energetically; all three are equally probable, and therefore this sequence has no unique ground state.

Now let me consider a second, different sequence: P H P P H P. Let me again put this sequence on the three structures, the pi, the S and the G. Remember that my orange residues are hydrophobic. On the pi structure, how many bonds do the hydrophobic residues make with water or with polar residues? This one makes one bond with water here, and this one makes another bond with water there, which means you pay an energy cost of 2 epsilon. If you now place the sequence on the S structure, this hydrophobic residue makes two bonds with water and this one makes two bonds with water, so it costs an energy of 4 epsilon. On the G structure, again this one makes two bonds and that one makes two bonds, so this again has an energy of 4 epsilon.
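The bond counting for both sequences can be written out as a short sketch. The lattice coordinates, the end of each shape that the chain starts from, and the names `STRUCTURES` and `energy` are assumptions made for illustration; the counting rule itself is the one stated in the lecture (each hydrophobic residue pays epsilon per water neighbor and per polar neighbor that is not its neighbor along the backbone).

```python
# Chain paths on the 3 x 2 lattice, listed as (x, y) sites in chain order.
# Coordinates and starting ends are illustrative assumptions.
STRUCTURES = {
    "pi": [(0, 1), (1, 1), (2, 1), (2, 0), (1, 0), (0, 0)],
    "S":  [(0, 1), (0, 0), (1, 0), (1, 1), (2, 1), (2, 0)],
    "G":  [(1, 1), (2, 1), (2, 0), (1, 0), (0, 0), (0, 1)],
}

def energy(sequence, path):
    """Energy of an HP sequence on a path, in units of epsilon."""
    index_of = {site: i for i, site in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue  # only hydrophobic residues pay a penalty
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index_of.get(neighbor)
            if j is None:
                total += 1  # empty site: contact with water
            elif abs(i - j) > 1 and sequence[j] == "P":
                total += 1  # lattice contact with a polar residue, not a backbone bond
    return total

for seq in ("HPHPHP", "PHPPHP"):
    print(seq, {name: energy(seq, path) for name, path in STRUCTURES.items()})
```

With these conventions the sketch gives 7 epsilon on all three structures for HPHPHP, and 2, 4 and 4 epsilon on pi, S and G for PHPPHP, in agreement with the counts done by hand.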
So for this sequence P H P P H P, unlike the earlier sequence H P H P H P, the pi conformation has the lowest energy. It costs me 2 epsilon, while the other two cost me 4 epsilon. Which means that this sequence has a unique ground state, given by the pi structure.

And of course, knowing about Boltzmann weights, you can also write down the probability of that ground state as a function of temperature. The probability of the native state is nothing but e to the power of minus 2 beta epsilon, which is the Boltzmann weight of the pi structure, divided by the partition function, which is the sum of the weights of all three conformations: e to the power of minus 2 beta epsilon, plus twice e to the power of minus 4 beta epsilon. You can then plot the probability of this folded, or native, state as a function of temperature, where temperature is measured in units of epsilon by k_B. Qualitatively the curve looks nice: as you go to lower temperatures, you approach probability 1 of finding your protein in the folded state, while at high temperatures all three conformations become equally likely and the probability tends to one third. This sort of sigmoidal curve is very common in protein folding problems. If you experimentally determine the probability of the folded structure, you will often recover curves very reminiscent of these sigmoidal curves.

Having said that, remember that this is very much a toy model. There are so many amino acids, and I have just divided them up into two classes, hydrophobic residues and polar residues.
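Before moving on, the folding probability described above can be tabulated with a short sketch. It assumes the energies 2, 4 and 4 (in units of epsilon) for the pi, S and G structures and measures temperature as t = k_B T / epsilon; the function name `native_probability` is made up for this example.

```python
import math

def native_probability(energies, native, t):
    """Boltzmann probability of the `native` state.

    energies: dict of state name -> energy in units of epsilon
    t: reduced temperature k_B T / epsilon
    """
    e_min = min(energies.values())
    # Subtracting the lowest energy leaves the ratios unchanged
    # and keeps the exponentials well behaved at low temperature.
    weights = {name: math.exp(-(e - e_min) / t) for name, e in energies.items()}
    return weights[native] / sum(weights.values())

ENERGIES = {"pi": 2, "S": 4, "G": 4}   # sequence PHPPHP on the three structures

print("  t     P(native)")
for t in (0.2, 0.5, 1.0, 2.0, 5.0, 20.0):
    print(f"{t:5.1f}   {native_probability(ENERGIES, 'pi', t):.3f}")
```

For these three states the expression reduces to 1 / (1 + 2 exp(-2 / t)), which tends to 1 as t goes to zero and to 1/3 at very high temperature.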
But can I still learn something about what sort of structures are protein-like? I will explain this in the context of this very simple 3 cross 2 HP toy model. Remember that I have 64 possible sequences and 3 possible structures. The question I want to ask is: out of these 64 possible sequences, how many have a unique ground state? And if they do have a unique ground state, which structure is favored: the pi, the S or the G?

Because there are only 64 sequences, you could brute force it: write down all 64 sequences, place each one on the three structures, see whether it has a unique ground state, and if it does, which of the three it is. I will just say what comes out. It turns out that 9 of the 64 sequences have the pi structure as the ground state, 6 have the S structure, and only 3 have the G structure. And these are those 9 sequences: for example P H P H H H, or P H P P H P. You can read off the sequences from here. So that is 9 and 6, 15, and 3 more, 18. These 18 sequences have a unique ground state; the remaining 46 do not, so they are not really very protein-like sequences in that sense.
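The brute-force tally described above can be sketched as follows. The coordinates, the end of each shape the chain starts from, and the helper names are assumptions made for illustration. Because the G shape is not symmetric end to end, the tally depends on which end the chain is threaded from and on how ties are treated, details the lecture does not spell out, so the printed counts are best compared with the quoted 9, 6 and 3 rather than assumed to reproduce them exactly.

```python
from itertools import product

# Chain paths on the 3 x 2 lattice in chain order (illustrative assumptions).
STRUCTURES = {
    "pi": [(0, 1), (1, 1), (2, 1), (2, 0), (1, 0), (0, 0)],
    "S":  [(0, 1), (0, 0), (1, 0), (1, 1), (2, 1), (2, 0)],
    "G":  [(1, 1), (2, 1), (2, 0), (1, 0), (0, 0), (0, 1)],
}

def energy(sequence, path):
    """Energy in units of epsilon: H-water and non-backbone H-P contacts."""
    index_of = {site: i for i, site in enumerate(path)}
    total = 0
    for i, (x, y) in enumerate(path):
        if sequence[i] != "H":
            continue
        for neighbor in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = index_of.get(neighbor)
            if j is None or (abs(i - j) > 1 and sequence[j] == "P"):
                total += 1
    return total

def ground_state(sequence):
    """Name of the unique lowest-energy structure, or None if there is a tie."""
    energies = {name: energy(sequence, path) for name, path in STRUCTURES.items()}
    lowest = min(energies.values())
    winners = [name for name, e in energies.items() if e == lowest]
    return winners[0] if len(winners) == 1 else None

tally = {"pi": 0, "S": 0, "G": 0, None: 0}
for letters in product("HP", repeat=6):
    tally[ground_state("".join(letters))] += 1

print(tally)  # the None entry counts sequences without a unique ground state
```

Whatever the exact convention, the qualitative point of the lecture is what to look for in the output: most sequences have no unique ground state, and among those that do, the pi structure collects the largest share.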
Now, of the sequences that do have a unique ground state, 9 fall on the pi structure, which means that the pi structure is in some sense the most designable of the three possible structures, the pi, the S and the G. Remember, this is within this very toy-like HP model, but within it the pi motif is the best suited for yielding a unique ground state, so in some sense it is the most favorable protein-like structure.

You could ask: all right, you are saying this, but this is a very toy model. Does it carry over in some sense into real proteins? The answer is yes and no. Real proteins are extremely complex things, and a very simple toy model like this will of course not capture all that complexity. But such models do capture some important elements that go into the protein folding problem. For example, you know that in proteins a very common design motif is the alpha helix. What people have done experimentally is to come up with designable sequences. For example, here is a designable sequence: HP, PH, HP, PH, HP, PH, HP, PH, HP, PH. What they ensure is that there is a hydrophobic residue every three or four positions, because that is the repeat known experimentally from actual protein alpha helices. So basically, we know that in alpha helices in proteins, hydrophobic residues occur every three or every four residues.
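One way to see why a three-or-four repeat suits a helix is a helical wheel estimate. The sketch below assumes the standard alpha-helix geometry of 3.6 residues per turn, that is 100 degrees per residue, which is not stated in the lecture; the function name `hydrophobic_face_bias` and the comparison sequence are also made up for illustration. It measures how strongly the H residues point to one side of the helix, on a scale from 0 (evenly spread) to 1 (all pointing the same way).

```python
import math

DEGREES_PER_RESIDUE = 100.0  # 3.6 residues per turn (assumed standard helix geometry)

def hydrophobic_face_bias(sequence):
    """Length of the mean unit vector of the H positions on a helical wheel."""
    angles = [
        math.radians(i * DEGREES_PER_RESIDUE)
        for i, letter in enumerate(sequence)
        if letter == "H"
    ]
    if not angles:
        return 0.0
    mean_x = sum(math.cos(a) for a in angles) / len(angles)
    mean_y = sum(math.sin(a) for a in angles) / len(angles)
    return math.hypot(mean_x, mean_y)

designed = "HPPH" * 5     # the pattern HP PH HP PH ... from the lecture, written out
alternating = "HP" * 10   # same composition, but with a period of two

print("designed    :", round(hydrophobic_face_bias(designed), 2))
print("alternating :", round(hydrophobic_face_bias(alternating), 2))
```

In the designed pattern the hydrophobic residues sit at positions i, i+3, i+4, i+7 and so on, so they land on roughly the same side of the helix, whereas strict alternation spreads them around it. That is the sense in which the repeat every three or four residues is compatible with a helix that hides a hydrophobic face.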
So what people have done is that they have gone ahead and designed sequences with that sort of repeating structure, with hydrophobic residues repeating every three or four residues. And what they have shown is that these designed sequences do indeed fold into structures that look like the alpha helices of proteins, which is very nice. It says that simply by accounting for this hydrophobic-polar interaction, irrespective of what the actual amino acid is at each position, so you could replace this H by any one of the many possible hydrophobic residues, you can often make these sequences fold into alpha helices.

In addition, there is a functional role as well. What people have found is that these designed synthetic sequences often show some sort of enzymatic activity, which is great. So not only do they fold into things that look structurally correct, they also have a functional significance, in that there is some enzymatic activity to these designed sequences as well.

So again, remember that the full protein folding problem is of course very complicated. It is an unsolved problem as of today. But very simple toy models like the HP model we discussed can actually yield insights into what sort of physical forces, or physical considerations, go into determining the native, or folded, state of these proteins. They do not tell you all there is to know, but they tell you a major chunk: at least we believe that hydrophobic interactions play a major role in determining the folded state of these proteins.
All right, so thank you. I am sorry about the mix-up with this lecture video, but hopefully this helps you in understanding these HP models. All right. Bye.