We will now calculate the optimal policy and the optimal value function, or cost-to-go, for the inventory control problem that we defined in the previous lecture. Once again we use the dynamic programming algorithm, so our task for now is to apply the DP algorithm to that problem. Let us write out a generic step and also the terminal step.

Recall that the terminal cost was assumed to be 0, that is, g_N(x_N) = 0, and that the horizon is N = 3. So for k = 3 we have J_3(x_3) = 0 for all values of x_3.

For an intermediate k, that is, for k = 0, 1, or 2, we can write

J_k(x_k) = min over u_k of E[ u_k + (x_k + u_k - w_k)^2 + J_{k+1}(max(0, x_k + u_k - w_k)) ],

where u_k is constrained to lie between 0 and 2 - x_k. The quantity inside the expectation is the stage-wise cost plus the cost-to-go. Remember that the stage-wise cost comprised two terms: 1 times u_k, which is just u_k, plus the squared term (x_k + u_k - w_k)^2. The cost-to-go is for the next time step, so it is J_{k+1}(x_{k+1}), and x_{k+1} is given by the state dynamics, x_{k+1} = max(0, x_k + u_k - w_k). That is J_k(x_k), and we will now evaluate it for every value of k.
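As a companion to the generic step, the ingredients of the problem can be written down as a few small functions. This is only a minimal sketch: the function and constant names are illustrative choices, not notation from the lecture, and the demand probabilities 0.1, 0.7, 0.2 are the ones used in the calculations of this lecture.

```python
# Problem data as used in this lecture. All names here are illustrative.
N = 3                                   # horizon
CAPACITY = 2                            # constraint: x_k + u_k <= 2
DEMAND_PMF = {0: 0.1, 1: 0.7, 2: 0.2}   # P(w_k = w)


def admissible_orders(x):
    """Orders u_k allowed in state x_k: 0 <= u_k <= 2 - x_k."""
    return range(0, CAPACITY - x + 1)


def next_state(x, u, w):
    """State dynamics x_{k+1} = max(0, x_k + u_k - w_k)."""
    return max(0, x + u - w)


def stage_cost(x, u, w):
    """Stage-wise cost u_k + (x_k + u_k - w_k)^2."""
    return u + (x + u - w) ** 2


def terminal_cost(x):
    """Terminal cost g_N(x_N) = 0 for every state."""
    return 0.0
```

Keeping the dynamics, the cost, and the constraint as separate functions mirrors how they enter the recursion: the constraint defines the set being minimized over, while the cost and dynamics sit inside the expectation.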
Let us write this for k = 2. First, notice that the amount of stock we order and the demand we face can take only non-negative integer values, so the stock that we have at the beginning of any time period can also take only non-negative integer values. The reason is simply that we can only order integer quantities and the demand is also realized in integers. Moreover, the constraint keeps x_k + u_k at most 2 and the demand is non-negative, so the stock never exceeds 2. After a little reflection it becomes clear that the inventory at any time step can in fact take only 3 possible values: x_k can be 0, 1, or 2.

So let us write out J_2(x_2) for each of these values, starting with x_2 = 0. J_2(0) is a minimum over u_2. Since x_2 = 0, the constraint 0 <= u_2 <= 2 - x_2 has 0 on the left-hand side and 2 on the right-hand side, so u_2 can be 0, 1, or 2. The quantity being minimized is the expectation of u_2 + (x_2 + u_2 - w_2)^2. Since x_2 = 0, I will remove that term altogether and write this simply as u_2 + (u_2 - w_2)^2, plus the cost-to-go at the last step.
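The claim that the stock can only be 0, 1, or 2 can also be checked by brute force. The sketch below assumes the initial stock is 0 and enumerates every admissible order and every demand value over the three periods; the function name and its arguments are illustrative.

```python
DEMANDS = (0, 1, 2)   # possible demand values w_k
CAPACITY = 2          # constraint: x_k + u_k <= 2


def reachable_states(x0=0, horizon=3):
    """Collect every stock level reachable from x0 within `horizon` periods."""
    current = {x0}
    seen = set(current)
    for _ in range(horizon):
        following = set()
        for x in current:
            for u in range(0, CAPACITY - x + 1):   # admissible orders
                for w in DEMANDS:
                    following.add(max(0, x + u - w))
        current = following
        seen |= current
    return seen


print(sorted(reachable_states()))   # prints [0, 1, 2]
```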
The cost-to-go at the last step has already been defined: J_3(x_3) = 0 for any state x_3, so that term is also 0. The final expression is therefore

J_2(0) = min over u_2 in {0, 1, 2} of E[ u_2 + (u_2 - w_2)^2 ].

The expectation here is over w_2, because w_2 is the only thing that is random. Remember that w_2 can take 3 possible values according to the probability distribution we wrote down: w_2 = 0 with probability 0.1, w_2 = 1 with probability 0.7, and w_2 = 2 with probability 0.2. Writing out the expectation in detail gives

J_2(0) = min over u_2 in {0, 1, 2} of [ u_2 + 0.1 u_2^2 + 0.7 (u_2 - 1)^2 + 0.2 (u_2 - 2)^2 ].

We want to minimize this over the possible choices of u_2, which are 0, 1, or 2. So let us do the computation.

If u_2 = 0, the first two terms are 0 and we are left with 0.7 times 1^2 plus 0.2 times 2^2, that is, 0.7 + 0.8 = 1.5.
If u_2 = 1, the same calculation gives 1 + 0.1 + 0 + 0.2 = 1.3; you can check this yourself.
If u_2 = 2, it gives 2 + 0.4 + 0.7 + 0 = 3.1.

Let us compare these 3 numbers. If I am at the beginning of time period 2 with 0 inventory, then ordering nothing gives a cost of 1.5, ordering 1 unit gives a cost of 1.3, and ordering 2 units gives a cost of 3.1. What this is effectively saying is that doing nothing is costly, and ordering too much is also costly. The right thing to do is to order the in-between number, which is 1; this gives the minimum cost for starting time period 2 with 0 inventory. In other words, u_2* = mu_2*(0) = 1, and the value function is the value of the expression at that choice, which is 1.3. So we have mu_2*(0) = 1 and J_2(0) = 1.3.

We are not done with stage 2 yet, because the same has to be done for the other values of x_2. Suppose x_2 takes the value 1. Remember that u_k lies between 0 and 2 - x_k; now x_2 = 1, so u_2 lies between 0 and 1, which means u_2 can take only 2 possible values, 0 or 1. The stage cost is again u_2 + (x_2 + u_2 - w_2)^2 with x_2 = 1, so

J_2(1) = min over u_2 in {0, 1} of E[ u_2 + (1 + u_2 - w_2)^2 ].

That is the expectation we need.
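The three candidate costs for x_2 = 0 can be reproduced with a few lines of code. This is a minimal sketch under the lecture's assumptions (J_3 = 0 and demand probabilities 0.1, 0.7, 0.2); the function name is illustrative.

```python
DEMAND_PMF = {0: 0.1, 1: 0.7, 2: 0.2}   # P(w_2 = w)


def expected_stage2_cost(x, u):
    """E[u + (x + u - w)^2]; the cost-to-go J_3 is zero, so nothing else is added."""
    return sum(p * (u + (x + u - w) ** 2) for w, p in DEMAND_PMF.items())


# Evaluate each admissible order for x_2 = 0 and pick the cheapest one.
costs = {u: round(expected_stage2_cost(0, u), 10) for u in (0, 1, 2)}
best_order = min(costs, key=costs.get)

print(costs)        # prints {0: 1.5, 1: 1.3, 2: 3.1}
print(best_order)   # prints 1
```

The same function can be called with x = 1 or x = 2, as long as only the admissible orders for that state are passed in.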
So this is now a minimum over u_2 in {0, 1}. Let us evaluate the expectation. Again, w_2 = 0 happens with probability 0.1, which gives 0.1 (1 + u_2)^2; w_2 = 1 happens with probability 0.7, which gives 0.7 u_2^2; and w_2 = 2 happens with probability 0.2, which gives 0.2 (u_2 - 1)^2. So the expression to minimize is

u_2 + 0.1 (1 + u_2)^2 + 0.7 u_2^2 + 0.2 (u_2 - 1)^2.

We can now compute this for each value of u_2. For u_2 = 0 it becomes 0.1 times 1 plus 0.7 times 0 plus 0.2 times 1, which is just 0.3. For u_2 = 1 it becomes 1, plus 0.1 times 2^2, which is 0.4, plus 0.7 times 1, plus 0.2 times 0; that is, 1 + 0.4 + 0.7 = 2.1.

Again we can compare the two. If you are starting with an inventory of one unit, ordering nothing costs you 0.3 and ordering an additional unit costs you 2.1. So the optimal thing to do in that case is to order nothing at all: mu_2*(1) = 0, and J_2(1) is the cost we just computed, 0.3.

Now let us do this for x_2 = 2, which is the only remaining value. Because u_2 must lie between 0 and 2 - x_2, u_2 can take only one value, u_2 = 0. In other words, there is nothing to optimize over: J_2(2) is simply the expectation of the same cost, u_2 + (x_2 + u_2 - w_2)^2, evaluated with u_2 = 0.
In that case, since x_2 = 2, this is simply the expectation of (2 - w_2)^2, which one can quickly evaluate: 0.1 times 4 plus 0.7 times 1 plus 0.2 times 0, which is 1.1. Consequently, mu_2*(2) is also 0 and J_2(2) = 1.1. This completes the calculation for time period 2.

One can now do the same for time period 1, so let us write this for k = 1. Again we have 3 possible states. Let us begin with x_1 = 0, so we are calculating J_1(0). This is a minimum over u_1 satisfying the constraints, and since x_1 = 0, u_1 can take any of the 3 values 0, 1, or 2. The expectation is now over w_1, of the same kind of expression as before. The stage-wise cost when the state is 0 is u_1 + (u_1 - w_1)^2, and to this we add the cost-to-go at time period 2. J_2 has to be evaluated at the next state, which is max(0, x_1 + u_1 - w_1); since we are assuming x_1 = 0, this becomes max(0, u_1 - w_1). So

J_1(0) = min over u_1 in {0, 1, 2} of E[ u_1 + (u_1 - w_1)^2 + J_2(max(0, u_1 - w_1)) ].

We want to evaluate this for each value of u_1, and to do that we substitute the values of J_2 that we found in the three pieces above.

I will do this calculation in detail once. With u_1 = 0, the first term u_1 is 0, and the u_1 inside the square and inside the max are 0 as well. With probability 0.1 we have w_1 = 0; then the first and second terms are 0 and the only thing left is the third term, J_2(max(0, 0)) = J_2(0). With probability 0.7 we have w_1 = 1; the second term is 1 and the max again evaluates to 0, so we get 1 + J_2(0). With probability 0.2 we have w_1 = 2; the second term is 4 and the argument of J_2 is once again 0, so we get 4 + J_2(0). Altogether the expectation is

0.1 J_2(0) + 0.7 (1 + J_2(0)) + 0.2 (4 + J_2(0)).

Substituting J_2(0) = 1.3 gives 2.8. One can similarly do this for u_1 = 1, where the cost evaluates to 2.5, and for u_1 = 2, where it evaluates to 3.68. Consequently, the optimal action at time 1 when the state is 0 is u_1 = 1: mu_1*(0) = 1 and J_1(0) = 2.5.

One can repeat the same sort of calculation for the other states; I will just write out the results. J_1(1) turns out to be 1.5 with mu_1*(1) = 0, and J_1(2) turns out to be 1.68 with mu_1*(2) = 0. This completes the stage k = 1.

One now needs to do this for k = 0 as well. We have a similar expression at k = 0, in principle for each possible value of the state at time 0. But since we are given that the initial state is 0, we only need to calculate J_0(0). You can check that

J_0(0) = min over u_0 in {0, 1, 2} of E[ u_0 + (u_0 - w_0)^2 + J_1(max(0, u_0 - w_0)) ].

Once again we can compute this. If you take action u_0 = 0, the expectation evaluates to 4; if you take action u_0 = 1, it evaluates to 3.7; and if you take action u_0 = 2, it evaluates to 4.818. All of this put together tells us that the optimal action in state 0 at time 0 is to order one unit of inventory: mu_0*(0) = 1 and J_0(0) = 3.7.

So J_0(0) = 3.7 is the optimal cost of our problem, the optimal cost that we incur starting from 0 inventory. The policy we need to adopt is to order one unit initially, and thereafter to order based on what the state gets realized as: at time 1 you order based on mu_1*, and at time 2 you order based on mu_2*. The intermediate J's that we computed serve only as intermediate calculations; the final total cost that we incur starting at stage 0 with 0 inventory is 3.7. With this, we have completed the analysis of the inventory control problem.
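The whole backward recursion can be carried out mechanically. The following is a minimal sketch, assuming the problem data used throughout this lecture (horizon 3, stock capped at 2, demand probabilities 0.1, 0.7, 0.2, zero terminal cost); the function and variable names are illustrative.

```python
N = 3                                   # horizon
CAPACITY = 2                            # constraint: x_k + u_k <= 2
DEMAND_PMF = {0: 0.1, 1: 0.7, 2: 0.2}   # P(w_k = w)
STATES = (0, 1, 2)                      # possible stock levels


def solve_inventory_dp():
    """Backward DP: returns cost-to-go tables J[k][x] and policies policy[k][x]."""
    J = {N: {x: 0.0 for x in STATES}}   # terminal cost is zero
    policy = {}
    for k in range(N - 1, -1, -1):      # k = 2, 1, 0
        J[k], policy[k] = {}, {}
        for x in STATES:
            best_u, best_cost = None, float("inf")
            for u in range(0, CAPACITY - x + 1):   # admissible orders
                # Expected stage cost plus cost-to-go at the next state.
                cost = sum(
                    p * (u + (x + u - w) ** 2 + J[k + 1][max(0, x + u - w)])
                    for w, p in DEMAND_PMF.items()
                )
                if cost < best_cost:
                    best_u, best_cost = u, cost
            J[k][x], policy[k][x] = best_cost, best_u
    return J, policy


J, policy = solve_inventory_dp()
for k in range(N):
    for x in STATES:
        print(f"k={k} x={x}: order {policy[k][x]}, cost-to-go {J[k][x]:.3f}")
```

Under these assumptions the printed table should agree with the hand calculation, including J_0(0) = 3.7 with an order of one unit at time 0. It also fills in J_0 for the initial states 1 and 2, which the lecture skips because the initial stock is given as 0.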