So, in the previous part we had calculated the denominator of this expression, the probability distribution of the state given the information. Let me now show what the full expression turns out to be. Given the information (G, G, S), the numerator is (1/3)(1/4) and the denominator is (2/3)(3/4) + (1/3)(1/4), so the probability we get is P(x1 = P-bar | G, G, S) = (1/3 · 1/4) / (2/3 · 3/4 + 1/3 · 1/4) = (1/12) / (7/12) = 1/7. As a result of this we can now go back and ask what J1(G, G, S) is. Remember, J1(G, G, S) is the minimum of two terms: the cost of taking action C and the cost of taking action S. The cost of S is 1, and the cost of C is 2 times P(x1 = P-bar | I1). We have computed this probability to be 1/7, so the cost of C is 2/7, and since 2/7 < 1 the minimum is 2/7. Therefore J1(G, G, S) = 2/7, and the optimal action to take at time 1 when you have information (G, G, S) is to simply continue, that is, keep running the machine. Now we can do this again when the information is (B, G, S). Let me write this as case 2: i1 = (B, G, S), where B is z0, G is z1 and S is u0. Computing the probabilities again, it turns out that P(x1 = P-bar | B, G, S) is the same as before, namely 1/7.
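We can check this Bayes computation with a short sketch in Python. The model parameters here are assumptions on my part, reconstructed from the fractions quoted in the example: a working machine breaks with probability 1/3 in one period, an inspection reads G with probability 3/4 when the machine is working and 1/4 when it is broken, and after action S the machine begins the next period in the working state.

```python
from fractions import Fraction as F

# Assumed model, reconstructed from the fractions quoted in the text:
# a working machine breaks with prob. 1/3 in one period; inspection
# reads G with prob. 3/4 when working and 1/4 when broken; after
# action S the machine begins the period in the working state.
p_break = F(1, 3)
pG_working, pG_broken = F(3, 4), F(1, 4)

def posterior_broken(prior_broken, z):
    """Bayes update: P(x = P-bar | z), starting from the given prior."""
    like_broken = pG_broken if z == "G" else 1 - pG_broken
    like_working = pG_working if z == "G" else 1 - pG_working
    num = prior_broken * like_broken
    return num / (num + (1 - prior_broken) * like_working)

# After S the machine was just repaired, so before observing z1 the
# prior probability that x1 is broken is the one-step break prob. 1/3.
print(posterior_broken(p_break, "G"))   # 1/7, as computed above
```

Note that the prior after S does not depend on z0, which is why (G, G, S) and (B, G, S) give the same posterior 1/7. Running the same update with observation B instead of G gives 3/5.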
So, in this case J1(B, G, S) is also equal to 2/7, because a similar expression gets written out: in place of P(x1 = P-bar | G, G, S) we now have P(x1 = P-bar | B, G, S), but the numerical value is the same, 1/7. As a result J1(B, G, S) = 2/7 and mu1*(B, G, S) = C as well. That is the case when the information is (B, G, S). Now, what other cases are possible? Remember that z0 and z1 can each be B or G, and u0 can be C or S. In other words, i1 can take 8 possible values, so we really need to do this for each of the 8 possible values of i1; so far we have completed 2. Case 3: i1 = (G, B, S). In this case we can compute the probability that the machine is broken to be 3/5, and from that we get J1(G, B, S) = 1 and mu1*(G, B, S) = S. Case 4: i1 = (B, B, S). Here the probability again turns out to be the same as the one above, 3/5, so J1(B, B, S) = 1 and the optimal action in this case is to stop. So in cases 3 and 4 the optimal action is to stop and inspect, and the reason is this: the probability is now 3/5, and the cost of continuing when broken is 2, so 2 times 3/5 gives 6/5, which is greater than the cost of stopping. We are looking at the minimum of these two, so the minimizing action is to stop, and that gives a cost of 1.
So, now let us go to cases 5 and 6. Case 5 is when the information is (G, G, C). Here the probability that the machine is broken turns out to be 1/5, so the cost of continuing is 2/5, which is less than the cost of stopping. As a result the optimal action in this case is to continue: J1(G, G, C) = 2/5 and mu1*(G, G, C) = C. Then case 6, when the information is (B, G, C): here P(x1 = P-bar | B, G, C) turns out to be 11/23, so the cost of continuing is 22/23, which is still less than 1, the cost of stopping. So the optimal action from (B, G, C) is again to continue, and 22/23 is the optimal cost. The second-to-last case, case 7, is i1 = (G, B, C). Here the probability of the machine being broken turns out to be 9/13, and therefore the cost of continuing is now larger than the cost of stopping; the cost to go from (G, B, C) is 1 and the optimal action is to stop. In the final case, when the information is (B, B, C), the probability that x1 = P-bar is 33/37, the optimal action is to stop, and the cost to go is 1. So I have now written out all the remaining cases, 3 through 8. What you will observe in cases 3 and 4, that is, when i1 is (G, B, S) or (B, B, S), is that the probability that the machine is broken at time 1 is 3/5. This is to be multiplied by the cost; remember the cost of continuing with a broken machine is 2, so the cost of continuing is 2 times the probability under consideration.
So, that is 2 times 3/5, which gives us a cost of continuing of 6/5, and that is greater than the cost of stopping, which is 1. We are comparing 6/5 with 1, so the optimal action is to stop, and that applies in both of these cases. In case 5 the probability of the machine being broken is 1/5, so the cost of continuing is 2/5, less than the cost of stopping, and the optimal action is to continue. In case 6 the probability is 11/23, the cost of continuing is 22/23, and the optimal action is again to continue. In case 7 the probability that the machine is broken is 9/13, so the cost of continuing is twice that, 18/13, which is greater than 1; the optimal action is therefore to stop and the cost to go is 1. The same holds in case 8: the probability of the machine being broken is 33/37, and the optimal action is to stop. All right. There are some very intuitive observations that we can make here. For example, in the last case, if your inspection has given the reading bad on each of the last two occasions, the probability that the machine is broken is rather high, 33/37, so the optimal action for you to take at that time is to stop and inspect the machine. We continue to run the machine as long as the cost of continuing does not become too large, and for the cost of continuing to become too large, there has to be evidence that the machine is probably broken.
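All eight cases can be generated in one loop. As before, the transition and observation probabilities here are assumptions reconstructed from the numbers quoted in the example, as is the initial probability 1/3 that the machine starts out broken; the sketch reproduces the posteriors and J1 values of the eight cases.

```python
from fractions import Fraction as F

# Assumed model (reconstructed from the fractions quoted in the text):
p_break = F(1, 3)                          # working -> broken in one period
pG_working, pG_broken = F(3, 4), F(1, 4)   # P(read G | state)
p0_broken = F(1, 3)                        # initial prob. machine is broken

def bayes(prior_broken, z):
    """P(x = P-bar | z) from the given prior, via Bayes' rule."""
    lb = pG_broken if z == "G" else 1 - pG_broken
    lw = pG_working if z == "G" else 1 - pG_working
    num = prior_broken * lb
    return num / (num + (1 - prior_broken) * lw)

for z0 in ("G", "B"):
    for z1 in ("G", "B"):
        for u0 in ("S", "C"):
            if u0 == "S":
                prior1 = p_break                  # repaired, then may break
            else:
                post0 = bayes(p0_broken, z0)      # filter z0, then propagate
                prior1 = post0 + (1 - post0) * p_break
            p = bayes(prior1, z1)                 # P(x1 = P-bar | i1)
            J1 = min(2 * p, F(1))                 # cost of C vs cost of S
            action = "C" if 2 * p < 1 else "S"
            print(z0, z1, u0, p, J1, action)
```

The loop recovers the values worked out above: 1/7 and 3/5 for the S cases, and 1/5, 11/23, 9/13, 33/37 for the C cases, with the corresponding J1 and optimal actions.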
So, if the outcome of the inspection has so far always been good, G followed by G, then we really do not need to bother; we just continue, and that is the optimal action. We continue even when the earlier reading was bad and the current reading is good, because that is almost a borderline case: the cost of continuing is 22/23, very close to 1, but still less than 1. In this case the bad reading was further in the past and the good reading was more recent, and maybe that is why the optimal action is still to continue. When we switch these two, with the good reading in the past and the bad reading more recent, it turns out that the optimal action is to stop and inspect: the machine seems to have gone from the good state to the bad state, and you need to take a look at it. The same happens in case 8, where the machine has gone from a good to a bad reading and we need to take action S, stop and inspect the machine. So this tells us what the optimal thing to do at time 1 is, and we got these 8 cases because there were 8 possible values for the information at time 1. To summarize, the optimal policy at time 1 is to continue if the result of the last inspection was G and to stop if the result of the last inspection was B.
So, looking back at all of these cases, we find that the optimal policy actually has a very simple and intuitive form, which can be summarized as follows: the optimal policy at time 1 is to continue if the result of the last inspection was G, and to stop if the result of the last inspection was B. In other words, if the machine was in a bad state as per the last inspection, we stop; otherwise we continue. That is the optimal policy at time 1. Now let us go to the first stage, k = 0. Here we need to plug in whatever we have computed for J1: we have computed J1(i1) for the various values of i1, and that now enters the DP equation. At k = 1 the cost-to-go term was 0, whereas at k = 0 this term is not 0 anymore, so we need to plug it in as part of the DP equation. Let us write out the cost of continuing and the cost of stopping at stage k = 0. The cost of action C is the sum of the conditional expected stage-wise cost and the conditional expectation of the cost to go that we computed in the previous step. So once again we have a similar term, 2 times the probability that the machine is broken given the information and given that we take action C, but now plus the expectation of J1(I0, z1, C) given I0 and C. Written more explicitly, this starts as 2 P(x0 = P-bar | I0, C) plus P(z1 = G | I0, C) times J1(I0, G, C).
So, let me write this with z1 first and then the action, which is the way we have written it so far. The cost of taking action C is then 2 P(x0 = P-bar | I0, C) + P(z1 = G | I0, C) J1(I0, G, C) + P(z1 = B | I0, C) J1(I0, B, C). Now the cost of stopping, inspecting and repairing: this will have the expected stage-wise cost, which is 1 as before, plus an analogous term to the one above. Note that (I0, z1, S) is exactly the information vector I1, so the cost of S is 1 + P(z1 = G | I0, S) J1(I0, G, S) + P(z1 = B | I0, S) J1(I0, B, S). That is the expression; J1 at each of these values can be computed by putting in specific values for I0. Remember, I0 is just the information at time 0, which is simply the result of the first inspection, so it can take only two possible values, G and B. Let us put that in. Case 1 is I0 = G. In this case we can compute each of the required probabilities by a direct calculation. We need to compute the four probabilities related to z1, that is, the probabilities of seeing observation G or B at the next time step given the information and the action taken.
So, those probabilities need to be computed, and we also need the probability that the machine is broken given the information we have so far. Let us write these out: P(z1 = G | G, C) turns out to be 15/28; P(z1 = B | G, C) is 1 minus that, 13/28; P(z1 = G | G, S) is 7/12; P(z1 = B | G, S) is 5/12; and P(x0 = P-bar | G, C) is 1/7. These are the probabilities for the five terms I have highlighted. As a result of this, we can now write that J0(G) is the minimum of the cost of C and the cost of S. The cost of taking action C can be written out as 2 (1/7) + (15/28) J1(G, G, C) + (13/28) J1(G, B, C), and the cost of stopping is 1 + (7/12) J1(G, G, S) + (5/12) J1(G, B, S). From our previous calculation we have the values of J1 for each of these; those are the 8 cases we wrote out, and they can now be plugged into this expression in place of the J1 terms. Once we plug them in and do the remaining calculations, it turns out that J0(G) is just 27/28, and the optimal action when the information is G is to continue. So when the first reading is good, the optimal action is to simply continue.
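That arithmetic can be verified in a few lines; the J1 values and the observation probabilities below are the ones quoted above in the text.

```python
from fractions import Fraction as F

# J1 values computed earlier in the example, indexed by (z0, z1, u0):
J1 = {("G", "G", "C"): F(2, 5), ("G", "B", "C"): F(1),
      ("G", "G", "S"): F(2, 7), ("G", "B", "S"): F(1)}

# Probabilities quoted above for the case I0 = G:
pG_C, pG_S = F(15, 28), F(7, 12)   # P(z1 = G | G, C), P(z1 = G | G, S)
p_broken0 = F(1, 7)                # P(x0 = P-bar | G, C)

cost_C = 2*p_broken0 + pG_C*J1[("G", "G", "C")] + (1 - pG_C)*J1[("G", "B", "C")]
cost_S = 1 + pG_S*J1[("G", "G", "S")] + (1 - pG_S)*J1[("G", "B", "S")]
print(cost_C, cost_S)              # 27/28 and 19/12, so J0(G) = 27/28
```

Since 27/28 < 19/12, the minimizing action with information G is C, continue.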
A similar calculation can be done when the first reading is B. In that case, case 2 with I0 = B, it turns out that J0(B) = 19/12 and mu0*(B) = S: the optimal action is to stop. This says, basically, that if the machine appears to be in a bad state at the first inspection itself, then you stop, get a complete inspection done, and repair the machine. So the summary of this policy is: if we get a bad reading from the inspection, we stop and repair the machine; otherwise we just continue. We now have the cost to go J0 computed when the information is G and when the information is B. To compute the optimal cost J*, we need the probability that the information is G or B: J* = P(z0 = G) J0(G) + P(z0 = B) J0(B). The probability that z0 = G can be computed to be 7/12, and the probability that z0 = B is 5/12. Remember, these come from our observation equations: we use the probability of the initial state being P or P-bar, and from there we get the probabilities of observing good or bad. Once we substitute, we get, as you can check, J* = 176/144, which simplifies to 11/9. So this example gave us a complete demonstration of how one needs to do dynamic programming when we only have incomplete or imperfect information about the state.
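The final weighting can be checked in one line, using the values quoted above.

```python
from fractions import Fraction as F

J0_G, J0_B = F(27, 28), F(19, 12)   # costs to go from the two first readings
pG0, pB0 = F(7, 12), F(5, 12)       # P(z0 = G) and P(z0 = B)

J_star = pG0 * J0_G + pB0 * J0_B
print(J_star)                        # 11/9, i.e. 176/144 in lowest terms
```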
So, notice that we have taken a long time to solve this particular problem, and that is because the information vector grows with time and can therefore take a large number of values; we need to compute the DP equation at each step for each value of the information vector.