So that takes us to homology modeling. Remember, the idea is that if I have a query sequence where I don't know the structure, but there is a homologue, that is, a sequence with which we share a common ancestor, and at least one of those sequences has a known structure in the PDB, then I can build my structure on top of my neighbor's structure. In practice, this works great. This is an example of two PDB structures that share some 40% sequence identity. You can see that they're virtually identical. There might be some minor deviation here, but I'm not even sure that's real; it could just be experimental variation. In principle, we start by simply taking the backbone coordinates of the template and then adding in the sidechains. Replacing 75% of the sidechains sounds like a lot, but remember most of those changes are going to be small, and in practice there are usually only two or three positions I need to test for most sidechains. There are really good programs to do that sidechain prediction.

What determines the quality of the model? You could argue that it's the algorithms, but the deciding factor is whether I have a homologue of known structure, and ideally a homologue with really high sequence identity. Because even if I only have 30% sequence identity, I might trust that it's a homologue, but it might be fairly difficult for me to do that alignment. What if there is a large loop or insertion here, right? Then I might not be able to build that region very well. But if I have another relative, also with known structure, but with 85% sequence identity, then I will be able to build a much better model.

So this is an idea that the entire community spent a lot of time on some 15 to 20 years ago, and it has been a tremendous success. Take all 200 million sequences in UniProt and plot them in some sort of arbitrary space. There is no way we could ever determine structures of all of these. We simply don't have the time, and for most of them it's likely impossible. But instead of hoping to get structures of all of them, let's strategically try to determine structures of a handful of representatives. If we then assume that everything within a certain distance of each representative, and again, this is a completely arbitrary dimension I'm showing here, is close enough to be modeled, then in this schematic plot I would not be able to model that one, or that one, or that one. But all the other green dots I could predict as homology models. And of course I should keep these regions small enough that I can trust the quality of the models. This was roughly the goal of the structural genomics consortia, which determined close to 100,000 new structures. Based on those efforts from 20 years ago, this field has just continued. So we keep getting an increasing number of protein structures, but we are detecting fewer and fewer genuinely new folds, simply because we're gradually starting to have very, very good coverage here.

So homology modeling has gone from being a niche trick we could occasionally use when we got lucky to something that will work in perhaps 75 to 80% of cases if you have a new sequence. Again, homology modeling should be your first go-to method if you need to get the structure of a new protein fast.
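To make the build-on-the-template step a little more concrete, here is a minimal Python sketch of the idea using Biopython: align the query to the template sequence, inherit the backbone coordinates for every aligned position, and leave the sidechain placement to a dedicated rotamer-prediction program afterwards. The sequences, file name, and chain ID are placeholders, and the sketch assumes the template chain has no missing residues; real pipelines do considerably more (profile alignments, loop building, refinement).

```python
from Bio import Align
from Bio.PDB import PDBParser

BACKBONE = ("N", "CA", "C", "O")

def backbone_from_template(query_seq, template_seq, template_pdb, chain_id="A"):
    """Inherit backbone coordinates from an aligned template structure.

    Returns {query residue index: {atom name: xyz}} for aligned positions;
    unaligned regions (loops, insertions) are simply left unmodelled.
    """
    aligner = Align.PairwiseAligner()
    aligner.open_gap_score = -10      # simple global alignment; a real pipeline
    aligner.extend_gap_score = -0.5   # would use a profile/HMM alignment instead

    aln = aligner.align(template_seq, query_seq)[0]

    chain = PDBParser(QUIET=True).get_structure("tmpl", template_pdb)[0][chain_id]
    residues = [r for r in chain if "CA" in r]   # assumes no missing residues

    model = {}
    for (t_start, t_end), (q_start, q_end) in zip(*aln.aligned):
        for k in range(t_end - t_start):
            t_res = residues[t_start + k]
            model[q_start + k] = {a: t_res[a].get_coord()
                                  for a in BACKBONE if a in t_res}
    return model
```

The sidechains that differ from the template are then rebuilt on this copied backbone by a rotamer-search program, which is the sidechain-prediction step mentioned above.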
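The "cover sequence space with a handful of representatives" strategy from the structural genomics effort can be sketched as a greedy set cover: repeatedly pick the sequence whose structure, once determined, would bring the largest number of still-uncovered sequences within modelling range. The `within_range` function and the points themselves are placeholders for this schematic; in reality coverage is defined by sequence identity and alignability, not by distance in a 2-D plot.

```python
def pick_targets(points, within_range, budget):
    """Greedy sketch: choose up to `budget` representatives (the red dots)
    so that as many other sequences as possible (the green dots) fall
    within modelling range. `within_range(a, b)` stands in for
    "b could be homology-modelled from a's structure".
    """
    uncovered = set(range(len(points)))
    targets = []
    while uncovered and len(targets) < budget:
        # pick the candidate that newly covers the most sequences
        best = max(range(len(points)),
                   key=lambda i: sum(1 for j in uncovered
                                     if within_range(points[i], points[j])))
        targets.append(best)
        uncovered -= {j for j in uncovered
                      if within_range(points[best], points[j])}
    return targets, uncovered   # uncovered = the dots we still cannot model
```

Whatever remains in `uncovered` corresponds to the dots in the schematic plot that cannot be modeled from any chosen representative.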