Dear students, in this module we are going to talk about similarities and differences in biological sequences. You already know that by comparing two sequences you can measure how similar they are, or identify where they differ. Such a comparison can also help you understand how the sequences are related and, by looking at the differences between them, infer their evolutionary history.
So let's start by considering the protein sequence given here. Only a few amino acids are shown. Proteins are typically much longer, but this is just an example. We have M Q V K L F T, that is, seven amino acids. Now let's look at what similarity means precisely. If you have another protein sequence, let's say M Q V K L F T, then you can say that these two sequences are one hundred percent similar.
However, you may be comparing two sequences that differ in some of their amino acids or nucleotides. So let's consider that case. If you have another protein sequence, you can point to a specific difference: here, one sequence has leucine where the other has isoleucine. By looking at such specific differences you can see exactly how two sequences differ from each other. In this case the similarity will obviously be less than one hundred percent.
Now, in terms of relationship: if you again consider two identical sequences, the relationship is obvious, and these two peptides most likely come from the same protein or species. Next, let's consider another situation where there is slight variation in the last two amino acids.
As you can see here, the last two amino acids are different in the two sequences, so you can say that these sequences are partially related, or may be related, because they do not align completely and are not fully identical.
Next, you can look at the evolutionary history as well. If you have another sequence, you can say that these two sequences differ in one amino acid, and if you have yet another sequence, you can say that it differs in two amino acids, and so on. So we can tentatively assume that this sequence has given rise to this sequence, which in turn gave rise to the next. Evolution has then probably taken place in this order, and the sequence at the end is probably the one that has diverged the most from the original. In this way you can also look at the evolutionary history.
Now I would like to introduce exact matching to you. The idea of exact matching is that two sequences not only contain the same amino acids (or nucleotides, in the case of DNA and RNA), but also contain them in exactly the same order. So, as in the previous example, if you place two sequences against each other for exact matching, they should have exactly the same amino acids or nucleotides, in the same order.
However, as we just saw, there are cases where this does not happen, and then you need to accommodate the differences. That can be done by inexact matching, and there are various options that you can use, such as regular expressions, signature sequences and the PROSITE pattern database, which we will discuss in detail later.
So let's take a look at a PROSITE pattern. This is the pattern given in your textbook, chapter 3, page 44. Let's start with the hyphens. The hyphens separate the elements of the pattern.
So all the hyphens are essentially separating the amino acid positions from each other. Next, the letters refer to specific amino acids, for example R, D, K and C here. Next, X indicates any amino acid, that is, any one of the 20 different amino acids, and the numbers in brackets denote the repeat length. As you can see here, this X is repeated nine times, which means any nine amino acids in a row. However, the repeat length may be variable, and in that case the range is written like this: X, that is, any amino acid, may occur anywhere from 72 to 86 times.
So that is how the PROSITE pattern for the MAP kinase protein is written. MAP kinase is a very important protein. It phosphorylates many other proteins and is involved in regulating cellular processes such as mitosis and apoptosis. Using such PROSITE patterns, you can describe a whole family of related sequences with a single pattern.
In conclusion, exact matching lets you check whether two sequences are identical, while inexact matching lets you compare sequences more flexibly by tolerating variation between them.
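To make the ideas from this module concrete, here is a minimal Python sketch, not a definitive implementation. It counts position-by-position differences between two equal-length sequences and translates a simple PROSITE-style pattern into a regular expression. Several things in it are assumptions made for illustration: the function names are hypothetical, the variant peptide MQVKIFT (leucine replaced by isoleucine) is invented to match the kind of difference described above, and the MAP kinase pattern string is assumed to correspond to the one shown in the textbook, so please check it against chapter 3. The converter handles only the elements discussed here (hyphens, single letters, x, and bracketed repeat counts), not the full PROSITE syntax.

```python
import re


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b) or not a:
        raise ValueError("this sketch needs non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def differences(a: str, b: str) -> list:
    """Return (position, residue_in_a, residue_in_b) for each mismatch (1-based)."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string.

    Supported elements only: a single residue letter, x (any residue),
    and an optional repeat count such as x(9) or x(72,86).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):  # hyphens separate elements
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        symbol, low, high = m.groups()
        # "." matches any character, so it also accepts non-amino-acid symbols.
        token = "." if symbol.lower() == "x" else symbol.upper()
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


if __name__ == "__main__":
    reference = "MQVKLFT"   # peptide from the lecture
    variant = "MQVKIFT"     # hypothetical variant: L -> I at position 5

    print(percent_identity(reference, reference))        # 100.0
    print(round(percent_identity(reference, variant), 1))  # 85.7
    print(differences(reference, variant))               # [(5, 'L', 'I')]

    # Assumed MAP kinase signature; verify against the textbook before relying on it.
    mapk_pattern = "F-x(10)-R-E-x(72,86)-R-D-x-K-x(9)-C"
    print(prosite_to_regex(mapk_pattern))  # F.{10}RE.{72,86}RD.K.{9}C
```

The resulting string can be passed to `re.search` to scan a protein sequence for the motif. Notice that exact matching here is just the special case where `differences` returns an empty list, while the pattern allows any residue at the x positions and a variable gap length.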