Hello everyone. In today's lecture we will apply the dynamic programming algorithm to a problem we have already seen: inventory control. Recall this problem from one of our earlier lectures. The inventory control problem had a finite time horizon, denoted by N, and the task was to decide how much stock, or inventory, to order at the start of every time period. Time was slotted from 0 to N, and the question was how much stock to order at the beginning of each of these time periods.

The assumption was that stock ordered at the beginning of time period k is immediately available in period k, but we do not know what the demand during period k is going to be. Things became known to us in the following sequence. First we came to know the amount of stock we had at time period k, denoted x_k; so x_k is the stock available at the beginning of period k. On the basis of that we chose u_k, the amount of stock to order at the beginning of the k-th period, and delivery was immediate. Following this, the demand w_k during period k was realized. We assumed that w_0, ..., w_{N-1} were independent. We had also assumed that demand that is not fulfilled can be backlogged, so any unfulfilled demand showed up as negative available stock. The amount of stock available was therefore given by the linear equation x_{k+1} = x_k + u_k - w_k.

The objective was to minimize a total cost, which had three components. There was a terminal cost R(x_N) for being left with inventory x_N at time N.
There was also a penalty r(x_k), which penalized holding excess (positive) stock x_k or, for negative stock, the shortage due to unfulfilled demand. Then there was the cost of ordering additional inventory, the purchasing cost, which took the linear form c times u_k. These three together made up our total cost.

If you recall, we also had a flow chart: x_k is the stock at time k and u_k is the stock you order at time k; this results in a cost r(x_k) + c u_k, and the inventory system then produces the stock at period k+1 from x_k, u_k and the demand w_k realized in period k. The objective was to minimize the total cost, which is the expectation of R(x_N) plus the sum over k from 0 to N-1 of r(x_k) + c u_k. That was the problem.

What we will do now is apply the dynamic programming algorithm that we studied in our previous lecture, so let us recap that as well. The algorithm asked us to define a sequence of functions. At the N-th stage the function was simply the terminal cost: J_N(x_N) = g_N(x_N). For k < N, J_k(x_k) was defined as the minimum value of a certain optimization problem. The minimum was taken over all actions u_k you can take at time k when you are in state x_k, that is, over u_k in U_k(x_k), and the objective was the expectation of the stage-wise cost plus J_{k+1} evaluated at the next state, which is given by the system dynamics: J_k(x_k) = min over u_k in U_k(x_k) of E[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) ].
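For reference, the inventory model and the recursion recalled above can be collected in one display. This is only a restatement in standard notation; the symbols g_k (stage-wise cost) and f_k (system dynamics) are assumed to match the names used in the previous lecture.

```latex
\begin{align*}
x_{k+1} &= x_k + u_k - w_k, \qquad k = 0, \dots, N-1, \\
\text{minimize}\quad & \mathbb{E}\Big[ R(x_N) + \sum_{k=0}^{N-1} \big( r(x_k) + c\,u_k \big) \Big], \\
J_N(x_N) &= g_N(x_N), \\
J_k(x_k) &= \min_{u_k \in U_k(x_k)} \mathbb{E}_{w_k}\Big[ g_k(x_k, u_k, w_k)
            + J_{k+1}\big( f_k(x_k, u_k, w_k) \big) \Big].
\end{align*}
```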
So the stage-wise cost plus J_{k+1} defines J_k, and in this way you recursively define all these functions from J_N all the way down to J_0. The algorithm told you that the optimal cost J*(x_0) is nothing but J_0(x_0), where J_0 is derived from these iterations. The functions J_k are called the cost-to-go, or sometimes the value function, and they play an important role in all of this.

Another object that plays an important role is the minimizer of this optimization problem. Let us denote the minimizer by u_k*. It is a function of x_k, so let us write u_k* = mu_k*(x_k); it minimizes the right-hand side of the recursion for each k and each x_k. Remember that J_k and J_N are functions, which means that this evaluation has to be done for every value of the state: the recursion has to be carried out for all x_k, and the terminal condition has to be defined for all x_N.

Now we will see a demonstration of this for the inventory control problem. Let us start writing out the dynamic programming algorithm for it. What does the algorithm ask us to do? It says first look at stage N, and at stage N what you need to do is very simple: J_N(x_N) is to be taken as the terminal cost. What is the terminal cost? Remember, it is capital R of x_N. So J_N(x_N) = R(x_N). Next, the algorithm asks you to compute J_k for each k from 0 to N-1.
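The backward recursion described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the lecture's own implementation: the function name `backward_dp`, the finite state set, and the toy instance below (states 0 to 2, stock clipped at zero, a penalty of 4 for ending with empty stock) are all hypothetical and chosen only so that the table of J_k values stays small. In particular, the toy instance clips stock at zero, so it is not the backlogged inventory model of the lecture.

```python
def backward_dp(N, states, actions, noise, g, f, g_N):
    """Finite-horizon backward recursion on a finite state set.

    noise is a list of (w, probability) pairs; f must map back into `states`.
    Returns (J, mu): J[k][x] is the cost-to-go and mu[k][x] a minimizing action.
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    # Stage N: the cost-to-go is the terminal cost, for every state.
    for x in states:
        J[N][x] = g_N(x)
    # Stages N-1 down to 0: minimize expected stage cost plus next cost-to-go.
    for k in range(N - 1, -1, -1):
        for x in states:
            best_u, best_val = None, float("inf")
            for u in actions(k, x):
                val = sum(p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                          for w, p in noise)
                if val < best_val:
                    best_u, best_val = u, val
            J[k][x], mu[k][x] = best_val, best_u
    return J, mu


# Hypothetical toy instance (NOT the lecture's model): one stage, stock in {0, 1, 2}.
J, mu = backward_dp(
    N=1,
    states=(0, 1, 2),
    actions=lambda k, x: [u for u in (0, 1) if x + u <= 2],  # order 0 or 1 unit
    noise=[(0, 0.5), (1, 0.5)],                              # demand 0 or 1
    g=lambda k, x, u, w: 1.0 * u,                            # unit ordering cost
    f=lambda k, x, u, w: max(x + u - w, 0),                  # clipped dynamics
    g_N=lambda x: 4.0 if x == 0 else 0.0,                    # penalty for ending empty
)
print(J[0], mu[0])
```

Note that both J and mu are stored for every state, which mirrors the point that the recursion has to be evaluated for all x_k and not only for the state that is eventually realized.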
But how does one do that in practice? One has to do it step by step for each k, beginning with k = N-1. So let me write k = N-1 here and then write J_{N-1}(x_{N-1}). What is this equal to? We can of course read it off from the formula, but let us think about it intuitively. x_{N-1} is the amount of inventory you have at the beginning of period N-1, and this is your last chance to take a decision. You take a decision at time N-1; after that the demand comes in and the system goes to state x_N at time N, at which point the problem ends. So if you are in state x_{N-1} at time N-1, what you really have is a tail subproblem of length one. The cost you incur in that problem is the stage-wise cost of stage N-1 plus a terminal cost, which is a function of whatever state you end up in at the end of the problem. The minimization here is therefore over the sum of these two costs: the stage-wise cost at stage N-1 and the terminal cost at stage N.

Intuitively this is exactly what you should be doing: minimizing the sum of the stage-wise cost and the cost you incur at the end of the problem. And that is precisely what the dynamic programming algorithm asks you to do as well. It asks you to minimize an expectation with respect to u_{N-1}, where u_{N-1} is chosen from the set of feasible actions U_{N-1}(x_{N-1}). The expectation of what? Of the stage-wise cost plus the terminal cost.
Now what is the stage-wise cost? If you order inventory worth u_{N-1}, the cost you incur is c times u_{N-1}; this is the cost of ordering inventory u_{N-1}. We also had the cost small r of x_{N-1}, which was the penalty for holding inventory or for having unfulfilled demand. So c u_{N-1} + r(x_{N-1}) is the stage-wise cost at stage N-1. Then you have a terminal cost, which is capital R of x_N. So inside the expectation you have the stage-wise cost plus R(x_N).

What does one do with this? Let us analyze it further, starting with the constraints. There is a constraint on the action: u_{N-1} has to be in capital U_{N-1}. What was capital U_{N-1}? It is the set of feasible actions or, equivalently, the set of amounts of inventory that you are allowed to order at stage N-1. The problem tells you that you can only order additional stock; you cannot give away or return any stock that you already have. Consequently, regardless of what x_{N-1} is, capital U_{N-1} only says that small u_{N-1} should be greater than or equal to 0. In other words, the constrained minimization can simply be written as the minimum over u_{N-1} >= 0 of the expectation of c u_{N-1} + r(x_{N-1}) + R(x_N).
So this is the expected cost, and you are trying to minimize it over all u_{N-1} >= 0. Now let us think very carefully about each of the terms in the cost. Since we have to minimize this expression, we need to ask which of these terms is a function of u_{N-1}. The other thing we need to ask, since we are taking an expectation, is which of these quantities is actually random. The expression has three terms, but which of them are in fact random?

Let us look at the second question first. The quantity u_{N-1} is a variable whose value we are choosing, and it is only a function of x_{N-1}. Now x_{N-1} here is not actually the realized state at time N-1; it is supposed to denote any generic state at time N-1. When we write each of these equations, we mean the first one to hold for all x_N, so x_N there is merely a representative variable that denotes any value of the inventory at time step N. Similarly, the second one is being written for all x_{N-1}. So what we have is J_N(x_N) = R(x_N) for all x_N, and J_{N-1}(x_{N-1}) equal to this minimum for all x_{N-1}. This effectively means that x_{N-1} is really a parameter as far as this optimization problem and this expectation are concerned.
So x_{N-1} is not the realized state and consequently is not random; it is only a parameter as far as this expectation and this optimization problem are concerned. As a result, r(x_{N-1}) can move out of the expectation. Now u_{N-1} is also a parameter as far as the expectation is concerned, because u_{N-1} is going to be chosen as a function of x_{N-1}. It does not have access to the demand that is going to be realized during time period N-1. All the randomness due to the demand in period N-1 is represented in x_N; it manifests as different values of x_N. So the only quantity that is random under this expectation is x_N.

In fact, let us write this out more explicitly. The minimization is over u_{N-1} >= 0 of the expectation of c u_{N-1} + r(x_{N-1}) + R(x_{N-1} + u_{N-1} - w_{N-1}). You can see from this that the randomness is all hidden in x_N, because the only thing that is random here is w_{N-1}, the demand during period N-1. So x_{N-1} is fixed and deterministic, u_{N-1} is again a parameter as far as the expectation is concerned, chosen as a function of x_{N-1}, and w_{N-1} is the random demand in period N-1. Consequently I can write this as the minimum over u_{N-1} >= 0 of c u_{N-1} + r(x_{N-1}) + E[ R(x_{N-1} + u_{N-1} - w_{N-1}) ].
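The chain of equalities described so far can be written out as follows. This is only a typeset restatement of the spoken steps: substitute the dynamics for x_N, then pull the two non-random terms out of the expectation.

```latex
\begin{align*}
J_{N-1}(x_{N-1})
 &= \min_{u_{N-1} \ge 0} \mathbb{E}\big[ c\,u_{N-1} + r(x_{N-1}) + R(x_N) \big] \\
 &= \min_{u_{N-1} \ge 0} \mathbb{E}_{w_{N-1}}\big[ c\,u_{N-1} + r(x_{N-1})
      + R(x_{N-1} + u_{N-1} - w_{N-1}) \big] \\
 &= \min_{u_{N-1} \ge 0} \Big\{ c\,u_{N-1} + r(x_{N-1})
      + \mathbb{E}_{w_{N-1}}\big[ R(x_{N-1} + u_{N-1} - w_{N-1}) \big] \Big\}.
\end{align*}
```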
So the expression is the minimum over u_{N-1} of this entire sum. Now what about r(x_{N-1})? Since x_{N-1} is a parameter, r(x_{N-1}) is not affected by this minimization: the minimization is only over u_{N-1}, and u_{N-1} is present in the first term and in the last term. So r(x_{N-1}) can go out of the minimization. Consequently this can be written as r(x_{N-1}) plus the minimum over u_{N-1} >= 0 of c u_{N-1} + E[ R(x_{N-1} + u_{N-1} - w_{N-1}) ]. If you know an expression for capital R, the value of c, and similarly an expression for small r (along with the distribution of the demand w_{N-1}), you can calculate this. That would then give us J_{N-1}(x_{N-1}) for all x_{N-1}. This is the cost-to-go at stage N-1.

What the dynamic programming algorithm now asks us to do is the same thing for k = N-2, that is, at stage N-2. Think of the time periods 0, 1, 2 and so on until N. What we have done so far is solve the problem on the tail of the time horizon that starts at time period N-1 and ends at the beginning of time period N. The algorithm now asks us to do the same thing starting from time period N-2. Fortunately, when we look at this longer stretch from N-2 all the way to N, the N-1 to N leg of it has already been solved.
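The stage N-1 calculation completed above can be checked numerically. The sketch below computes J_{N-1} in two ways, once with everything inside the expectation and once with r(x_{N-1}) pulled out of both the expectation and the minimization, and confirms that they agree. Everything numeric here is an assumption made for illustration: the piecewise-linear form of r, the choice R = r, the cost c = 1, the two-point demand distribution, and the restriction of orders to integers 0 to 5 are not given in the lecture.

```python
# Hypothetical data, chosen only for illustration.
C = 1.0                          # per-unit purchasing cost c
DEMAND = ((0, 0.5), (2, 0.5))    # (demand value, probability) pairs
ORDERS = range(0, 6)             # assumption: integer orders, capped at 5


def r(x):
    """Assumed stage penalty: 1 per unit held, 4 per unit of backlog."""
    return 1.0 * max(x, 0) + 4.0 * max(-x, 0)


def R(x):
    """Assumed terminal cost; taken equal to r for simplicity."""
    return r(x)


def J_last_direct(x):
    """min over u >= 0 of E[c*u + r(x) + R(x + u - w)], everything inside E."""
    return min(
        sum(p * (C * u + r(x) + R(x + u - w)) for w, p in DEMAND)
        for u in ORDERS
    )


def J_last_split(x):
    """r(x) + min over u >= 0 of { c*u + E[R(x + u - w)] }."""
    return r(x) + min(
        C * u + sum(p * R(x + u - w) for w, p in DEMAND)
        for u in ORDERS
    )


for x in range(-2, 4):
    print(x, J_last_direct(x), J_last_split(x))
```

With these particular numbers the two columns coincide for every starting stock, which is what the argument about x_{N-1} and u_{N-1} being parameters predicts.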
We already know from our earlier calculation what the cost-to-go is starting from stage N-1. What J_{N-1}(x_{N-1}) tells us is that regardless of which state we end up in, that is, regardless of what value x_{N-1} takes, we know the optimal thing to do and the optimal cost from that state onwards until the end of the time horizon. So whatever state x_{N-1} we end up in after taking some decision at stage N-2, we know the optimal cost from there onwards until the end of the time horizon.

As a result, we can now write the optimal cost starting at any arbitrary state at time N-2. J_{N-2}(x_{N-2}), the cost-to-go at time N-2 as a function of the state x_{N-2}, should therefore be the minimum over all u_{N-2}, where u_{N-2} is again constrained to be greater than or equal to 0, of the expectation of the stage-wise cost at stage N-2, which is again c u_{N-2} + r(x_{N-2}), plus the optimal cost that you would incur if you end up in state x_{N-1} at time N-1. That cost is simply the cost-to-go at time N-1. So the expression for J_{N-1} that we just computed on the previous slide is to be plugged in here, next to the stage-wise cost: J_{N-2}(x_{N-2}) = min over u_{N-2} >= 0 of E[ c u_{N-2} + r(x_{N-2}) + J_{N-1}(x_{N-1}) ].
What is this expression effectively telling us? It is telling us that the optimal thing to do at stage N-2 is to look at two terms: first, the stage-wise cost that you incur in this period, and second, the optimal cost that you incur afterwards as a function of where you land up after this period. Where you land up is your state x_{N-1}, so that optimal cost is given by J_{N-1}(x_{N-1}). You take the sum of the stage-wise cost and the optimal cost-to-go, take the expectation of that, and minimize it over all u_{N-2} >= 0. This then gives us the value function, or cost-to-go, at stage N-2. We will do this calculation in a little more detail in the next part.
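As a rough illustration of the stage N-2 step, the sketch below plugs J_{N-1} into the recursion for a two-period horizon. It is a sketch under assumptions, not the calculation of the next part: the cost functions, the value c = 1, the two-point demand, the integer order grid capped at `U_MAX`, and the helper names `q` and `best_order` are all hypothetical. The cost-to-go is written as a memoized recursive function so that J_{N-1} is evaluated at whatever state x_{N-1} = x_{N-2} + u_{N-2} - w_{N-2} happens to be reached.

```python
from functools import lru_cache

# Hypothetical data, chosen only for illustration.
N = 2                            # horizon: stages 0 and 1, terminal time 2
C = 1.0                          # per-unit purchasing cost c
DEMAND = ((0, 0.5), (2, 0.5))    # (demand value, probability) pairs
U_MAX = 5                        # assumption: integer orders 0..U_MAX


def r(x):
    """Assumed stage penalty: 1 per unit held, 4 per unit of backlog."""
    return 1.0 * max(x, 0) + 4.0 * max(-x, 0)


def R(x):
    """Assumed terminal cost; taken equal to r for simplicity."""
    return r(x)


def q(k, x, u):
    """Expected stage-wise cost plus cost-to-go when ordering u in state x at stage k."""
    return sum(p * (C * u + r(x) + J(k + 1, x + u - w)) for w, p in DEMAND)


@lru_cache(maxsize=None)
def J(k, x):
    """Cost-to-go J_k(x): terminal cost at k = N, otherwise the DP recursion."""
    if k == N:
        return R(x)
    return min(q(k, x, u) for u in range(U_MAX + 1))


def best_order(k, x):
    """A minimizing order u_k* = mu_k*(x) over the assumed order grid."""
    return min(range(U_MAX + 1), key=lambda u: q(k, x, u))


print(J(N - 1, 0), best_order(N - 1, 0))   # stage N-1, starting from empty stock
print(J(N - 2, 0), best_order(N - 2, 0))   # stage N-2, starting from empty stock
```

The cap `U_MAX` is only there to keep the search finite; with these numbers it is large enough that the minimizer is never at the cap for the states reachable from x_0 = 0.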