Let me start with a historical overview of secondary structure prediction. The idea is that for a set of known alpha helices, I can just plot the residues at each position. This small diagram is called a sequence logo. They're interesting, and they look cute. The most common residue floats to the top, and the more common it is, the larger the letter. The logo also measures the information content: if a particular position always has alanine, I'm going to draw that one very tall. This means that if I see a long stretch of residues, roughly 20 residues long, with lots of alanines and lots of leucines, it's likely an alpha helix. There are better methods, but this is a first approximation (a small sketch of the letter-height calculation follows at the end of this section).

We can do the same thing for beta sheets. You see a slightly different pattern there, valines in particular: valines don't like to be in an alpha helix as much. It's harder to predict beta strands, mostly because the alpha helix is a very local structure. It's local in sequence; it's going to be 20 residues next to each other. I can predict a single beta strand, but that strand has to form a beta sheet together with strands that might be very far away in sequence, so this is much harder. Predicting beta-sheet secondary structure involves some sort of long-range interaction. That makes the prediction algorithm harder to design, but today it has largely been cracked.

The book goes through, in quite a lot of detail, a beautiful algorithm called Chou-Fasman. When I was your age, I probably spent three lectures of this class understanding it. What Chou and Fasman did was look at known protein structures and compute average propensities. If we accept that one residue alone should never make or break a helix, but instead take the average of, say, three or four adjacent residues, and those residues virtually always occur in helices, we can calculate a running-average likelihood of a region being a helix, a sheet, or a turn. Then we compare those scores. As long as the helix gets a higher score than the beta sheet, we predict helix. At some point, when the beta sheet scores higher, we switch over and say: ah, now we form a beta sheet (the second sketch at the end of this section shows this comparison in code). The idea with Chou-Fasman is that it uses all the information we've talked about previously in the class: free energies, probabilities of forming turns. It was a beautiful method, the first bioinformatics method. The only problem is that it's quite labor-intensive to sit and draw all these diagrams. You can do it; you can even do it in Excel.

The reason we don't do it is that it's crappy. I'm sorry; I'm a physicist, I love Chou-Fasman, but it gets, at best, something like 40-50% accuracy in secondary structure prediction. Even a very simple modern bioinformatics method gets 60%, modern ones based on multiple sequence alignments instantly get 80%, and the state of the art is 90%. I think this, too, tells you something about how much the field has developed: I don't even think it's worth the time to teach you the simple physics-based method anymore.
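
To make the sequence-logo letter heights concrete, here is a minimal sketch in Python. It is not from the lecture; the formula (height = residue frequency times the column's information content, with no small-sample correction) is the standard one for logos, and the example columns are made up.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def logo_heights(column):
    """Letter heights for one alignment column of a sequence logo.

    Height of each residue = its frequency times the column's
    information content, I = log2(20) - H, where H is the Shannon
    entropy of the observed residue frequencies.
    """
    counts = Counter(column)
    n = len(column)
    freqs = {aa: c / n for aa, c in counts.items()}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    info = math.log2(len(AMINO_ACIDS)) - entropy  # in bits, max ~4.32
    return {aa: f * info for aa, f in freqs.items()}

# A column that always has alanine carries maximal information,
# so 'A' is drawn very tall:
print(logo_heights("AAAAAAAA"))   # {'A': 4.32...}
# A mixed column carries less information, so the letters are shorter:
print(logo_heights("ALAVLLAA"))
```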
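
And to show the Chou-Fasman running-average comparison in code, here is a second minimal sketch. The propensity values below are illustrative assumptions, not the published Chou-Fasman tables, and a real implementation would also score turns and apply nucleation and extension rules.

```python
# Illustrative helix/sheet propensities (in the spirit of Chou-Fasman,
# but these particular numbers are assumptions, not the published ones).
P_HELIX = {"A": 1.42, "L": 1.21, "E": 1.51, "V": 1.06, "G": 0.57, "P": 0.57}
P_SHEET = {"A": 0.83, "L": 1.30, "E": 0.37, "V": 1.70, "G": 0.75, "P": 0.55}

def predict_ss(seq, window=4):
    """Sliding-window average of helix vs. sheet propensity.

    For each position, average the propensities over a short window of
    adjacent residues and call whichever state scores higher:
    helix ('H') or strand ('E').
    """
    half = window // 2
    pred = []
    for i in range(len(seq)):
        chunk = seq[max(0, i - half): i + half + 1]
        h = sum(P_HELIX.get(aa, 1.0) for aa in chunk) / len(chunk)
        e = sum(P_SHEET.get(aa, 1.0) for aa in chunk) / len(chunk)
        pred.append("H" if h >= e else "E")
    return "".join(pred)

seq = "AAELAAVVLVVGPA"
print(seq)
print(predict_ss(seq))  # helix-rich start, strand-rich valine stretch
```

The switch-over the lecture describes happens exactly where the windowed sheet score overtakes the helix score, here in the valine-rich stretch in the middle of the toy sequence.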