There's one final statistical concept I want to share with you, not because you need to know the background by heart, but because it will help you if you ever use this in practice. Say I give you two alignments, one on the left and one on the right here. You might already say that this one looks slightly better, and if you just score them with the substitution matrix I showed you before, you get 19 here and 20 here, so this one should be better in some way. But how much better is better? 20 is still better than 19, but 20 versus 19, could that happen by chance? That's hard to say. To do things properly, at least if we hand-wave a bit, we would have to look at the entire distributions of these scores. For instance, you could look at pure identities, as in dot plots, and compare those with the alignment scores from this BLOSUM matrix. What we see already here is that with the BLOSUM matrix we get slightly more of the higher scores. These are proteins that actually are quite similar; it's just that they might not share exactly the same residues, and a pure dot plot, where we are just counting identities, will notice that. Second, do you see the small red dot out here, something that is scored significantly higher than the others? This is where it gets a bit complicated in practice. To decide whether that is significant or not, we would have to compare our scores to the size of the database, the expected distribution of scores, and so on, and I could spend three hours talking about that. The take-home message is that we need to know how far up here we need to be before we can say that something is significant. Rather than taking you through all that math, I'm going to give you the answer, because you will see this in outputs.
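The difference between counting identities and using a substitution matrix can be sketched in a few lines of Python. The mini matrix below is purely illustrative (hypothetical values, not real BLOSUM62 entries); the point is only that conservative substitutions, such as leucine for isoleucine, earn partial credit that a pure identity count misses:

```python
# Sketch: pure-identity score vs. substitution-matrix score for two
# pre-aligned sequences. The matrix values are illustrative only,
# NOT real BLOSUM62 entries.

TOY_MATRIX = {
    ("A", "A"): 4, ("S", "S"): 4, ("L", "L"): 4, ("I", "I"): 4,
    ("L", "I"): 2, ("I", "L"): 2,  # conservative substitution: partial credit
    ("A", "S"): 1, ("S", "A"): 1,
}

def identity_score(seq1, seq2):
    """Dot-plot style score: just count identical positions."""
    return sum(a == b for a, b in zip(seq1, seq2))

def matrix_score(seq1, seq2, matrix, mismatch=-2):
    """Score each aligned pair of residues with the substitution matrix."""
    return sum(matrix.get((a, b), mismatch) for a, b in zip(seq1, seq2))

print(identity_score("ALS", "AIS"))            # counts identities only
print(matrix_score("ALS", "AIS", TOY_MATRIX))  # also rewards the L->I swap
```

With the matrix, the L-to-I substitution still contributes to the score, so similar-but-not-identical proteins end up higher than the identity count alone would suggest.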
The way we typically talk about this is with a so-called E-value, or occasionally a P-value. Of course, if you make lots of predictions, occasionally you would get a high score that happens purely by chance, right? So if I give you that red hit and say that Erik Lindahl thinks it's a good hit, my colleague might say: okay, I'm well aware that Erik might be a decent bioinformatician and make decent predictions, but what is the probability that he's wrong? What the E-value measures is this: if I give you a score here of, say, 360, what is the likelihood that I would have gotten that score purely by chance, if these two sequences were not evolutionarily related? So the E-value for a particular score is really the probability of getting this match by chance, and it should be a very low number. If you're talking about sequences, you might get an E-value of, say, 10 to the minus 10. If an E-value is 10 to the minus 10, you're safe: the chance that you got that result by coincidence is one in ten billion. I think I'm willing to take that bet. The problem, though, is that if we're talking about protein structures, we will virtually never get E-values that good. There you might get an E-value of 0.01. That's still pretty good, one in a hundred; you might be willing to take that chance. If the E-value is 0.1, that means there is a 90% probability that you're right, but a 10% probability that you're wrong, so if you make roughly ten such predictions, one of them is going to be wrong. I can't tell you exactly what E-value you should use; that will depend a little on the method, and in particular on whether you have a ton of data, as for sequences, or very little data, as for structures. But this is the value you will typically see in reports from alignments, scores, predictions and so on. A low E-value means that it's very unlikely that you got this top-scoring hit purely by chance.
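A bit of background on how E-values behave, under the standard assumption used in BLAST-style statistics: chance hits are approximately Poisson-distributed, so an E-value (the expected number of chance hits at least this good) converts to the probability of seeing at least one such chance hit via P = 1 − e^(−E). For small E the two numbers are nearly identical, which is why an E-value of 0.1 behaves like roughly a 10% risk of being wrong. A minimal sketch:

```python
import math

def pvalue_from_evalue(e_value):
    """Probability of at least one chance hit this good, assuming the
    number of chance hits is Poisson-distributed with mean e_value."""
    return 1.0 - math.exp(-e_value)

for e in (1e-10, 0.01, 0.1):
    print(f"E = {e:g} -> P(at least one chance hit) = {pvalue_from_evalue(e):.4g}")
```

For E = 0.1 this gives about 0.095, so the rule of thumb in the text (one wrong call in roughly ten predictions) comes straight out of the Poisson approximation.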