So, this gave us a hint about the size of the secondary structure elements. The next thing we need to go after is these defects: understand why they are so expensive, and why it's not the Boltzmann distribution. To do that, we need to start by looking at amino acid side chains again, like we did in the first few lectures. This is where I basically compared the two types of distributions. On the x-axis here, I have the probability of certain amino acids occurring either on the inside of proteins or on the surface of proteins, so K, for instance, lysine, is very much surface, while some of the hydrophobic ones, like tryptophan, are very much inside. When we actually see them in specific folds, they do not change places in those folds, so that's not really a Boltzmann distribution. On the y-axis, on the other hand, I have the transfer free energy between water and oil, and here we have a pretty darn nice correlation, right? The charged ones, lysine and arginine, are very water soluble, and tryptophan is very fat soluble. So the y-axis, which is a correct Boltzmann distribution, appears somewhat correlated with this other stability thing that I claimed was not a Boltzmann distribution, and there is, of course, a reason for that. To study those small effects, I'm going to need to consider a particular residue and an energy landscape. So here is my usual landscape. Somebody should make statistics and see what they look like, my landscapes, not real ones. And let's say that this is the free energy of a particular residue to occur in different places, or of different residues. If I have a serine or a leucine, I'll start with serine. If I look at how common it is to have serine in oil versus water, it turns out that there is not really any difference. So if I calculate the cost of moving it from water to oil, it's going to be roughly 0 kcal per mole. It's just as happy in both places.
And that would kind of correspond, if this was a landscape where serine actually could move, to having two minima with roughly the same value. It's going to be happy in either. That doesn't really tell us a whole lot. But let's look at something else. Let's look at leucine, which is a clearly hydrophobic amino acid. And let's look at oil versus water there, too. In this case, leucine is going to be much happier in oil. So the delta G from water to oil is going to be roughly minus 2 kcal per mole. And I'm well aware, this does not look like a world of difference. It's just 2 kcal, so it's like a hydrogen bond. But here is the point. If we're looking at folds, every single fold that works with serine on the inside of a protein will also work with leucine on the inside, because leucine is going to be more stable there. So I can always replace serine with leucine in all those folds. But then there are going to be many more sequences where you have leucine in a particular position. Some of those folds might not have more than minus 2 kcal of stabilization energy. In that case, they will work fine if I keep leucine there, but I can't replace it with serine. So there will be many more folds where I can replace a serine with a leucine, or keep a leucine, than where I can replace a leucine with a serine. And that is starting to get close to this concept, explaining why even seemingly small energies matter: paying 2 kcal does not sound horrible, but it's certainly worse than paying 0 kcal. The way I would formulate that, and the book too, is that we think of an entire protein, whatever, imagine something, and then I have a small contribution. So I can think of the total delta F as the delta F for the rest of the protein, plus some small delta epsilon just for the residue I'm looking at. In this case of serine versus leucine, delta epsilon would be 0 for serine and minus 2 kcal for leucine, because leucine would stabilize it.
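To put a rough number on what 2 kcal per mole means for the water-to-oil transfer, here is a minimal sketch that turns a transfer free energy into a Boltzmann population ratio, exp(-delta G / RT). The function name `oil_water_ratio` and the temperature of 300 K are assumptions made for illustration; the 0 and minus 2 kcal per mole values are the ones quoted for serine and leucine.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def oil_water_ratio(delta_g_kcal, temperature_k=300.0):
    """Boltzmann ratio of oil to water populations for a given
    water-to-oil transfer free energy (kcal/mol).

    The 300 K default is an assumed temperature, not a value from the lecture.
    """
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


# Transfer free energies quoted in the lecture (kcal/mol, water -> oil)
for name, delta_g in [("serine", 0.0), ("leucine", -2.0)]:
    ratio = oil_water_ratio(delta_g)
    print(f"{name}: oil/water population ratio ~ {ratio:.1f}")
```

At this assumed temperature RT is about 0.6 kcal per mole, so a seemingly small 2 kcal per mole is already more than three times RT, which is why the ratio for leucine comes out well above one while serine sits at exactly one.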
And of course, that means that if I use leucine here, I could afford to have the delta F for the rest of the protein be 2 kcal worse. Now it sounds like I'm hand waving. You know what, that's because I am hand waving. So let's look at this as a proper probability distribution instead, and that might be easier, even though it's a bit of math.
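Before the math, the hand-waving argument can be made concrete with a toy count. This is only a sketch under stated assumptions: the stability criterion (total delta F below zero), the helper `is_stable`, and the evenly spaced set of hypothetical delta F values for the rest of the protein are all made up for illustration. Only the delta epsilon values of 0 for serine and minus 2 kcal for leucine come from the discussion above.

```python
def is_stable(delta_f_rest, delta_eps):
    """Assumed criterion: a fold is stable when the total
    delta F = delta F (rest of protein) + delta epsilon is negative."""
    return delta_f_rest + delta_eps < 0.0


# Per-residue contributions in kcal/mol, as quoted in the lecture
DELTA_EPS = {"serine": 0.0, "leucine": -2.0}

# Hypothetical folds: delta F of the rest of the protein, -5 to +5 kcal/mol
rest_values = [-5.0 + 0.5 * i for i in range(21)]

# For each residue, collect the hypothetical folds that stay stable
stable = {
    residue: [f for f in rest_values if is_stable(f, eps)]
    for residue, eps in DELTA_EPS.items()
}

for residue, folds in stable.items():
    print(f"{residue}: {len(folds)} of {len(rest_values)} hypothetical folds stable")

# Folds that tolerate leucine but would fall apart with serine
leucine_only = [f for f in stable["leucine"] if f not in stable["serine"]]
print("stable only with leucine:", leucine_only)
```

Every fold in this toy set that tolerates serine also tolerates leucine, and the extra ones are exactly those where the rest of the protein is up to 2 kcal short of being stable on its own. The counts depend entirely on the made-up spread of values, so only the one-way asymmetry is meaningful here.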