So let's move on to ways we can actually detect similarity, and in particular whether things are homologous, in practice. The first thing is to separate those two points: I can only detect similarity. Now, if the similarity is large enough I might personally draw the conclusion that two things are homologous, but that's always a guess. It can be a really good and safe guess, but all guesses can be wrong now and then.

The simple way of doing this is just to check whether amino acids or DNA nucleotides are similar, and I'll start with the version we normally use for DNA, because it's simpler and it's really only there we use it. If I have the bases of the two sequences on rows and columns here, I can put a red square wherever they match, or think of this as giving a score of plus one if they are identical and zero if they are not. Already for these random sequences that I drew here (actually they're not entirely random) this is a bit of a mess. The problem is that if you only have four bases, we're going to have lots of random hits, in principle roughly every fourth position on average, right?

So let's filter this a bit. Do you see this diagonal path here? This means that I have GCT lining up with GCT. So maybe I should add a filter here: unless I have at least three residues lined up, I'm going to filter the matches away and ignore them. And then I end up with this, and the difference is massive. Do you see, here I have GCT matching there, and then I have a streak of six residues matching there; it is just that I happen to have two extra residues that have been inserted in one of these two sequences. This is very common: occasionally we have single residues that have been changed, and in other cases I have residues that have been inserted or deleted.
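To make the dot matrix and the three-residue filter concrete, here is a minimal Python sketch. The two sequences are made up for illustration (they are not the ones on the slide), and the function names `dot_matrix` and `filter_diagonals` are hypothetical. The second sequence is simply the first one with two extra bases inserted after the initial GCT, to mimic the situation described above.

```python
def dot_matrix(seq_a, seq_b):
    """Score 1 where the two bases are identical, 0 otherwise.

    Rows follow seq_a, columns follow seq_b.
    """
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def filter_diagonals(matrix, min_run=3):
    """Keep only matches that belong to a diagonal run of at least min_run."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    kept = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            if not matrix[i][j]:
                continue
            # Skip cells in the middle of a run; count only from its start.
            if i > 0 and j > 0 and matrix[i - 1][j - 1]:
                continue
            length = 0
            while (i + length < n_rows and j + length < n_cols
                   and matrix[i + length][j + length]):
                length += 1
            if length >= min_run:
                for k in range(length):
                    kept[i + k][j + k] = 1
    return kept


def show(matrix, seq_a, seq_b):
    """Print the matrix with '#' for a match and '.' for no match."""
    print("  " + " ".join(seq_b))
    for base, row in zip(seq_a, matrix):
        print(base + " " + " ".join("#" if cell else "." for cell in row))


# Illustrative sequences (an assumption, not taken from the lecture slide).
seq_a = "GCTTACGGA"
seq_b = "GCTAATACGGA"  # seq_a with "AA" inserted after "GCT"

raw = dot_matrix(seq_a, seq_b)
kept = filter_diagonals(raw, min_run=3)

show(raw, seq_a, seq_b)
print()
show(kept, seq_a, seq_b)
```

For these two sequences the raw matrix has 25 matching cells, and after the filter only 9 remain: the GCT run on the main diagonal and the run of six, shifted two columns to the right because of the insertion.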
Now, in practice this is going to be very noisy, and it's mostly for DNA that I'm interested in this type of identity comparison. So at some point I want to move over from these bases to amino acids instead, in particular if I'm interested in protein structure.