Dear students, in this module I'm going to take you inside BLAST. How does the BLAST algorithm work? We will go through it step by step with the help of an example. But to start with, let me go over the background: what is the purpose of using BLAST?

The BLAST algorithm helps you search and compare, by alignment, various types of nucleotide and amino acid sequences. On one side you have the query sequence, and on the other side you have a database. Both the database and the query can be nucleotide or amino acid sequences, depending on how you want to perform the search, and there are multiple types of BLAST to choose from.

To understand how BLAST works, let's take a look at this example. Here you have a query sequence, which is the first thing you need in order to run BLAST; as I just mentioned, the query is on one side and the database is on the other. As you can see, the query here is an amino acid sequence, in this case a very small peptide.

The first step in the BLAST algorithm is to make a list of all possible words. What does that mean? If this is your sequence, then you can have PQG, a word of length 3, or QGE, or GEL, or ELV. So you make words of length 3 by sliding along the sequence one residue at a time. Here I've listed PQG, QGE, GEL and ELV: the four words that can be formed from this sequence.

Once you have obtained all the words from the sequence, you need to score them. How do you score? You simply put the sequence against one word and fill in the alignment matrix using the BLOSUM62 matrix, the blocks substitution matrix. You obtain the score for each word, one at a time. So here you have computed the score for PQG and put it here; next you compute the score for QGE and put it here, and so on and so forth.
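As a rough sketch of this first step, the short Python example below lists the overlapping words of length 3 and scores each word against itself by adding up substitution scores position by position. The query string `PQGELV` is an assumption reconstructed from the four words named above, `list_words` and `self_score` are illustrative names, and only the BLOSUM62 diagonal entries needed for this peptide are included. The lecture does not spell out exactly how the slide's scores were computed, so the totals produced here need not match the numbers shown on the slide.

```python
# Illustrative sketch only: the function names and the query string are assumptions.

# Diagonal (identity) entries of the standard BLOSUM62 matrix for the residues
# that occur in the assumed query. A real run would need the full table.
BLOSUM62_DIAGONAL = {"P": 7, "Q": 5, "G": 6, "E": 5, "L": 4, "V": 4}


def list_words(sequence, word_length=3):
    """Return every overlapping word of the given length, left to right."""
    return [
        sequence[i:i + word_length]
        for i in range(len(sequence) - word_length + 1)
    ]


def self_score(word, diagonal=BLOSUM62_DIAGONAL):
    """Score a word aligned against itself: the sum of its identity scores."""
    return sum(diagonal[residue] for residue in word)


# Assumed query: the shortest peptide containing PQG, QGE, GEL and ELV.
query = "PQGELV"
words = list_words(query)
scores = {word: self_score(word) for word in words}

for word in words:
    print(word, scores[word])
```

Keeping the word list and the scoring in separate functions makes it easy to swap in a full substitution matrix or a different word length later.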
Once you have computed the alignment score for each of these words, we set a threshold. The threshold means that we select only those words which have a score greater than some value X. In this case X has been chosen to be 11, and the only two words with a score greater than 11 are PQG and GEL, so you select those.

Once you have selected these words, you need to mutate them. PQG is mutated such that one amino acid is changed at a time, so PQG can become PEG, PHG and many others. Once you have mutated the word, you score again. As you already know, PQG had a score of 15; the mutated words are scored too, and if you do that you will find that PEG has a score of 13. So now you have PQG and GEL from the first round of scoring, and you also have PEG: PQG had a score of 15, GEL had a score of 12, and PEG has a score of 13. In this way you mutate these words for all possible combinations of amino acids and calculate their scores, and remember, only those words whose scores are above the threshold are kept.

In the end, as I just mentioned, we have three words: PQG, GEL and PEG. Now we go to the database with these three words and search it. In the database we look for hits; a hit simply means a place where one of these words matches within the database.

Once we have a hit, let's say this is the protein from your database and we get a hit for PQG, we try to extend the alignment. Let's see how we do that. If your query has this sequence and your database gives you something like that, then you can see that there is a match for one word, and there is a mismatch here between I and E. So BLAST computes the score for this four-amino-acid window. The scores for each position are given here, and the score of the high-scoring pair, simply called an HSP, is the sum of these scores.
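The mutation, thresholding and database-search steps described above can be sketched in Python as follows. This is a minimal illustration, not BLAST's actual implementation: `neighbourhood` and `find_hits` are hypothetical names, the database string is made up, and only the BLOSUM62 rows for P, Q and G are included, which is enough to mutate the word PQG. The scores come from the standard BLOSUM62 table, so they can differ from the numbers quoted from the slide; the threshold of 11 is the one used in the lecture.

```python
# Illustrative sketch only: names and the database string are assumptions.

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows of the standard BLOSUM62 matrix for P, Q and G only, with columns in
# the order given by AMINO_ACIDS. A real implementation loads the full table.
BLOSUM62_ROWS = {
    "P": [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3,
          -3, -1, -2, -4, 7, -1, -1, -4, -3, -2],
    "Q": [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3,
          -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],
    "G": [0, -2, 0, -1, -3, -2, -2, 6, -2, -4,
          -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],
}


def word_score(word, candidate):
    """Sum the substitution scores of candidate against word, per position."""
    return sum(
        BLOSUM62_ROWS[a][AMINO_ACIDS.index(b)]
        for a, b in zip(word, candidate)
    )


def neighbourhood(word, threshold):
    """Words differing from `word` at no more than one position whose score
    against `word` is strictly above the threshold (includes `word` itself
    when it passes)."""
    kept = {}
    for position in range(len(word)):
        for residue in AMINO_ACIDS:
            candidate = word[:position] + residue + word[position + 1:]
            score = word_score(word, candidate)
            if score > threshold:
                kept[candidate] = score
    return kept


def find_hits(words, database_sequence):
    """Return (word, start) for every exact occurrence of each word."""
    hits = []
    for word in words:
        start = database_sequence.find(word)
        while start != -1:
            hits.append((word, start))
            start = database_sequence.find(word, start + 1)
    return hits


neighbours = neighbourhood("PQG", threshold=11)
print(sorted(neighbours.items(), key=lambda item: -item[1])[:5])

database_sequence = "MKPEGAAPQGE"  # hypothetical database protein
hits = find_hits(["PQG", "GEL", "PEG"], database_sequence)
print(hits)
```

Each hit is only a seed: the extension step then scores the residues around it to build the HSP.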
In this example the sum is about 20, so your PQGI window has a score of 20.

Now suppose you have two HSPs within the database: this was one, and let's assume that at some other position you get another HSP. If the distance between them is less than 40, that is, 40 amino acids, then you perform a full dynamic programming alignment on the query and hit sequences. We compute the full alignment because this is probably a very good match. In this way BLAST reduces the computational cost by computing full alignments only for closely located HSPs.

BLAST is available online, and you can use it to quickly search, align and compare nucleotide and protein sequences. The results are tabulated for you, and you can look at the details of the statistical evaluation as well.
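Going back to the rule about two HSPs, here is a small Python sketch of the distance check that decides whether a full alignment is worth computing. The cutoff of 40 comes from the lecture; everything else is an assumption made for illustration: the names `close_pairs` and `needs_full_alignment`, the example positions, and the choice to measure distance as the difference between HSP start positions in the database sequence, which the lecture does not define precisely.

```python
# Illustrative sketch only: names, positions and the distance definition
# are assumptions, not BLAST's actual implementation.
from itertools import combinations

MAX_DISTANCE = 40  # cutoff from the lecture, in amino acids


def close_pairs(hsp_starts, max_distance=MAX_DISTANCE):
    """Return pairs of HSP start positions less than max_distance apart."""
    return [
        (a, b)
        for a, b in combinations(sorted(hsp_starts), 2)
        if b - a < max_distance
    ]


def needs_full_alignment(hsp_starts, max_distance=MAX_DISTANCE):
    """True when at least two HSPs lie close enough to justify the
    expensive full alignment of query and database sequence."""
    return bool(close_pairs(hsp_starts, max_distance))


hsp_starts = [12, 40, 130]  # hypothetical HSP start positions
print(close_pairs(hsp_starts))
print(needs_full_alignment(hsp_starts))
print(needs_full_alignment([12, 130]))
```

The point of the check is that the cheap position comparison runs for every database sequence, while the costly alignment runs only for the few sequences that pass it.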