So, this was just a sneak peek into the world of globular and fibrous protein structure. There is a wealth of structures available. And based on these structures and the concept of folds we started to talk about, I think it makes sense to ask: is it true that there are only a few folds that can explain most proteins? And in that case, roughly how many are there?
One of the researchers who spent a huge part of his career looking into structural biology in general, and some of these fold concepts in particular, was Cyrus Chothia at MRC Cambridge. He unfortunately died a few years ago. Cyrus wrote a seminal paper in 1992, called One Thousand Families for the Molecular Biologist, where he made the hypothesis that there are likely only in the ballpark of one thousand folds, and that these folds are being reused throughout nature for all the proteins in my body, all the proteins in your bodies, and even all the two hundred thousand proteins in the Norwegian spruce.
The question then is: was Chothia right? You could say both yes and no. There was an interesting paper a few years ago where Mike Levitt followed up on this. What Mike did is that he basically looked at new protein structures obtained over the last few years (well, the last few years as of 2007) and checked what fraction of them end up being new folds versus what fraction just reproduce old folds. The first observation is that already in 2007 we had in the ballpark of one thousand three hundred folds. It depends a bit on how you define a fold, of course. So from that point of view Chothia was wrong: if we already have one thousand three hundred, there are certainly more than one thousand. But Mike made another interesting observation: the fraction of new structures that end up being new folds is decreasing quite dramatically. So we have fewer and fewer new protein structures that fall in the white space of the map.
Most things appear to be the same as a fold we already knew. And based on that, if you extrapolate a bit, we might just have in the ballpark of two thousand folds, or if we're going to be really, really generous, let's say three thousand. It's certainly not ten thousand. And remember, in this class we're interested in orders of magnitude, right? From that perspective, Chothia was right. There are only in the ballpark of one thousand, or a few thousand, folds that nature is apparently reusing for everything.
We're going to come back a little bit to what that means, not in the next lecture but two lectures from now, after the membrane proteins, and discuss folding stability and in particular why there are so many sequences but most of them somehow end up in these same folds, which is starting to be related to bioinformatics. Nature appears to be a very conservative master here, and there is less freedom in protein structure than you might think, despite the gigantic freedom in protein sequence. That's going to be great if you want to design proteins, for instance, because it's not a matter of designing one out of two hundred thousand genes. We will design our own gene, but the goal of protein design should just be to make it stable in one of these relatively few folds we know, which surprisingly makes our job simpler.
So if this is true, every protein sequence should fall squarely in one fold, and it should be difficult for it to move over to another. In bioinformatics, that's mostly true. If two protein sequences share 30% of their amino acids, that is 30% sequence identity, I would pretty much eat my left shoe if they had different structures. But the interesting thing is that it matters where in the sequence these identities fall. Core elements of the secondary structure are going to be very conserved, while loops and some other regions might be much easier to change.
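To make the idea of sequence identity concrete, here is a minimal sketch in Python. It assumes the two sequences are already aligned (same length, with "-" for gaps) and counts identity only over columns where neither sequence has a gap; other definitions use a different denominator and give somewhat different numbers. The function names, the toy sequences, and the per-column region labels ("E" for strand, "L" for loop) are all made up for illustration and are not from any real protein.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over aligned columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def identity_by_region(aligned_a: str, aligned_b: str, regions: str) -> dict:
    """Percent identity computed separately for each region label.

    `regions` holds one label per alignment column (hypothetical annotation).
    """
    if not (len(aligned_a) == len(aligned_b) == len(regions)):
        raise ValueError("sequences and region labels must have equal length")
    result = {}
    for label in sorted(set(regions)):
        cols = [i for i, r in enumerate(regions) if r == label]
        sub_a = "".join(aligned_a[i] for i in cols)
        sub_b = "".join(aligned_b[i] for i in cols)
        result[label] = percent_identity(sub_a, sub_b)
    return result


# Toy example: identical in the strand columns, mostly different in the loop.
seq_a = "MKVLAAGDEKTY"
seq_b = "MKVLSPGNEKTY"
regions = "EEEELLLLEEEE"

overall = percent_identity(seq_a, seq_b)
per_region = identity_by_region(seq_a, seq_b, regions)
print(overall, per_region)
```

In this toy case the overall identity hides the fact that all the differences sit in the loop columns, which is the point about it mattering where in the sequence the identities fall.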
Lynne Regan, who is now at Yale University, made an interesting observation over 20 years ago. Her group deliberately went after a protein and checked how few amino acids you need to replace to still have the sequence end up in a new fold. What they did is that they took a mostly beta sheet protein and replaced less than 50% of the residues. And with those residues replaced, this turned into a pure alpha helical fold instead. So 50% sequence identity, and yet it's a different fold.
That might not strike you as extreme. But remember what this means: this fold and this fold, which are different, share 50% sequence identity. Two minutes ago, somebody, absolutely not me, said that they would eat their left shoe if 30% of the residues were identical but you had different structures. That's the case here. Very lucky it wasn't me.
The reason why this works is that they went after and changed the residues that had a large influence on the structure, while they kept the residues that had relatively little influence on the structure. And this is slightly different from how evolution would do it. Of course, evolution would end up changing things more randomly. And that's why the sequence identity threshold still works in bioinformatics. Nature does not decide to change the sequence, right? Evolution works because proteins are related. And if something would suddenly end up in a completely different fold, the organism would likely not survive.
So there are slightly different mechanisms, but I think it is worth remembering: normally, roughly 25% to 30% sequence identity will mean the same fold, but it is possible, in particular in protein design, if you decide which residues to change, to get around that rule.