So the first thing we have to accept, if we want more information, is that we're not going to get it from the Protein Data Bank. The Protein Data Bank is too small. We have to find ways to indirectly access structural information from UniProt, where we don't have structures. That's a bit of an oxymoron, but Debora Marks and Chris Sander came up with a beautiful way of solving an old problem.

This old problem is the following. If I have a sequence here, this sequence is going to have lots of residues in spatial proximity, right? And the idea is that if I have two residues that are neighbors in space, and one of those residues changes, it's likely that the other one will change too. How? Well, say that I have alanine, leucine, tryptophan, let's say lysine, leucine, alanine, glycine. And then I have alanine, leucine, tryptophan... I'm sorry, I'll add another one: glutamic acid, valine. Let's say that that lysine has changed to a leucine, so a plus charge has become neutral. Then glycine, and then glutamic acid, which is a minus charge. Let's say that whenever that lysine changes, that one also changes. This is interesting: if that residue changes, that residue always changes at the same time. So it could be that you have a plus charge here that is normally paired up with a minus charge, but whenever the plus charge disappears, the minus charge also goes, and vice versa. If we just look at alignments, it turns out this is very common. We call this covariation, or alternatively correlated mutations. The idea is that if two positions in the sequence always correlate, they are likely to be close to each other in space.

That's a dirt-simple explanation, so why didn't anybody come up with it before? Well, there is a problem here. Take residue A here. Now, A is not an amino acid.
It's just a site. If A correlates with B, and B correlates with C, and C correlates with D, and D correlates with E, this will also mean that A correlates with E. But it's not likely that all of them are close to each other. So the problem is that we can frequently end up with these very long chains where it looks like everything is correlated, but the sites are not really all in contact: A was close to B, B was close to C, C was close to D, and D was close to E, but A is not close to E.

To solve this required quite a lot of statistical mechanics, actually, and that's why it took a while. I would even argue that this was yet another example where the devil was in the details of the implementation. We all thought that this wasn't really possible, that it wouldn't work, until Debora Marks and Chris Sander showed that, hey, it does work, and a year later everyone had implemented this in their programs. This was another one of those 10% improvement steps: suddenly everyone used it, and it proved great.

So exactly how great are correlated mutations? Well, they were great enough that when David Baker implemented this in Rosetta, his group pretty much won the biennial competition where groups compare different algorithms. There was a whole range of proteins where they showed that they could determine structures to within two to three angstroms of the experimental structures. And now we are no longer talking about small toy proteins, but fairly large proteins, membrane proteins. It wasn't that they had solved the problem, but suddenly it was quite obvious that for large classes of proteins the problem is likely possible to solve with ab initio methods. The other amazing thing here is that Rosetta is freely available, at least for academia. So we can download and use these methods, as can all groups, including ours.
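To make the covariation idea concrete, here is a minimal sketch of how one could score pairs of alignment columns. It uses mutual information, which is one simple and common covariation score; it is not the statistical-mechanics method discussed above, and it does not address the chain problem. The alignment is made up for illustration (loosely following the lysine and glutamic acid example), and the names `ALIGNMENT`, `mutual_information`, and `rank_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2

# Hypothetical toy alignment, invented for illustration (not real data).
# Column 3 (K/L) and column 7 (E/A) always change together;
# column 0 (A/G) varies independently of both.
ALIGNMENT = [
    "ALWKLAGEV",
    "ALWKLAGEV",
    "ALWLLAGAV",
    "ALWLLAGAV",
    "GLWKLAGEV",
    "GLWLLAGAV",
]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i = Counter(seq[i] for seq in alignment)
    col_j = Counter(seq[j] for seq in alignment)
    pairs = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in pairs.items():
        # p(a,b) / (p(a) * p(b)), written with raw counts.
        ratio = count * n / (col_i[a] * col_j[b])
        mi += (count / n) * log2(ratio)
    return mi


def rank_pairs(alignment):
    """All column pairs, sorted from most to least covarying."""
    length = len(alignment[0])
    scores = {
        (i, j): mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (i, j), score in rank_pairs(ALIGNMENT)[:3]:
        print(f"columns {i} and {j}: {score:.3f} bits")
```

In this toy case the only pair with a nonzero score is the charged pair, columns 3 and 7, which is the kind of signal one would read as a candidate contact. Real alignments are much noisier, so a raw score like this is only a starting point.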
They use them all the time. A few years ago, this would have been the peak of the class, showing what bioinformatics can do. David has a ton of other structures. I won't have time to go through all of them, but they've been able to use this to determine folds that were previously not known, and a few years later it was shown that those folds were indeed correct.
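The chain problem with sites A to E can also be shown with a small sketch. This is a toy model, not the method used in the published work: it assumes five two-state sites where each site copies its left neighbor with probability `P_COPY` (an arbitrary value chosen here), so only neighbors interact directly. The function names are hypothetical. The sketch computes the exact mutual information between A and E, and then the same quantity conditioned on B.

```python
from itertools import product
from math import log2

P_COPY = 0.9  # assumed probability that a site matches its left neighbor


def chain_distribution(n_sites=5, p_copy=P_COPY):
    """Exact probability of every state of a chain A-B-C-D-E."""
    dist = {}
    for state in product((0, 1), repeat=n_sites):
        prob = 0.5  # the first site is 0 or 1 with equal probability
        for prev, cur in zip(state, state[1:]):
            prob *= p_copy if cur == prev else 1.0 - p_copy
        dist[state] = prob
    return dist


def marginal(dist, sites):
    """Marginal distribution over the given site indices."""
    out = {}
    for state, prob in dist.items():
        key = tuple(state[s] for s in sites)
        out[key] = out.get(key, 0.0) + prob
    return out


def pair_mi(dist, i, j):
    """Mutual information (bits) between sites i and j."""
    p_ij = marginal(dist, (i, j))
    p_i = marginal(dist, (i,))
    p_j = marginal(dist, (j,))
    return sum(
        p * log2(p / (p_i[(a,)] * p_j[(b,)]))
        for (a, b), p in p_ij.items()
    )


def conditional_mi(dist, i, j, k):
    """Mutual information (bits) between sites i and j given site k."""
    p_ijk = marginal(dist, (i, j, k))
    p_ik = marginal(dist, (i, k))
    p_jk = marginal(dist, (j, k))
    p_k = marginal(dist, (k,))
    return sum(
        p * log2(p * p_k[(c,)] / (p_ik[(a, c)] * p_jk[(b, c)]))
        for (a, b, c), p in p_ijk.items()
    )


if __name__ == "__main__":
    dist = chain_distribution()
    print("A-B:", pair_mi(dist, 0, 1))
    print("A-E:", pair_mi(dist, 0, 4))
    print("A-E given B:", conditional_mi(dist, 0, 4, 1))
```

A and E never interact directly in this model, yet their pairwise score is clearly above zero. Once B is taken into account, the A to E signal vanishes up to rounding error. Separating direct from indirect correlations in this spirit, but across all sites at once, is the part that needed the heavier statistical machinery.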