So, this expression that we have written here as $J_{N-2}(x_{N-2})$ equals the expectation of the sum of these terms. It can also be understood in the following way: it is the minimum of the expected cost in time period $N-2$ plus the expected cost in time period $N-1$, assuming that an optimal policy is chosen in period $N-1$. So, assuming that you do the optimal thing once you reach the beginning of time period $N-1$, what is the optimal cost you would have at time period $N-2$? Verbally, it is given by this sentence here: the expected cost in time period $N-2$ plus the expected cost that you would have assuming you are optimal from time $N-1$ onwards.
Now, one thing to note here again, and it goes without saying, is that this expression is to be computed for all $x_{N-2}$. So $x_{N-2}$ in this expression is a parameter. However, $x_{N-1}$ is now not a parameter. It was a parameter of the earlier optimization at stage $N-1$, when $k$ was $N-1$. Now, at $k = N-2$, $x_{N-1}$ is the realized state: the state that would get realized if you took action $u_{N-2}$ at stage $N-2$ starting from state $x_{N-2}$. So $x_{N-2}$ here is the parameter, $u_{N-2}$ is the decision, to be chosen as a function of $x_{N-2}$, and $x_{N-1}$ is random. More explicitly, this minimization is
$$J_{N-2}(x_{N-2}) = \min_{u_{N-2} \ge 0} E\big[r(x_{N-2}) + c\,u_{N-2} + J_{N-1}(x_{N-1})\big].$$
Now, once again, since $x_{N-2}$ and $u_{N-2}$ are not random, they come out of the expectation, and you have $c\,u_{N-2} + r(x_{N-2})$ plus the expectation of $J_{N-1}(x_{N-1})$. But $x_{N-1}$ is itself given in terms of $x_{N-2}$ and $u_{N-2}$: it is $x_{N-2} + u_{N-2} - w_{N-2}$. I can write this even more succinctly by taking $r(x_{N-2})$ outside the minimization, since it does not depend on $u_{N-2}$:
$$J_{N-2}(x_{N-2}) = r(x_{N-2}) + \min_{u_{N-2} \ge 0} \Big[ c\,u_{N-2} + E\big[J_{N-1}(x_{N-2} + u_{N-2} - w_{N-2})\big] \Big].$$
So this is the expression that we end up with.
Now, we can of course do this for every stage $k$. At an arbitrary stage $k$ you are looking at a tail subproblem like this: suppose you are at some stage $k$, this is $k+1$, and so on all the way to $N$. The optimal thing you would do from stage $k+1$ onwards is given to you through $J_{k+1}$, as a function of whatever state comes up there, and what we now want to compute is $J_k$. Notice that the optimal thing you do here is from $k+1$ onwards, all the way till $N$; it is not only in period $k+1$. That is something students often get misled by. Out here it was from $N-1$ till $N$, so it looked as if all you were doing was looking at one period, but that was just a special case, when $k$ was $N-2$. Now that you are at an arbitrary $k$, what you need to look at is the cost-to-go, which is the cost from $k+1$ onwards all the way till $N$. It is the optimal cost you would incur if you are in an arbitrary state $x_{k+1}$ at time $k+1$.
So, to compute the same quantity at time $k$, you look at the stage-wise cost in this stage and the optimal cost that you would get over this tail subproblem from $k+1$ all the way till $N$, take the sum of those two, and find the decision that gives you the optimal value of the sum. In English, what we have is the minimum of the expected cost in period $k$ plus the expected cost in periods $k+1$ till $N-1$, that is, up to the last period, which ends at $N$, assuming an optimal policy is used for these periods. That quantity, the one in the round brackets here, is captured by $J_{k+1}$. So, although in this sentence I have said that we are talking of an optimal policy that will be used in periods $k+1$, $k+2$, all the way till $N-1$, all of that is embedded in $J_{k+1}$.
Here the minimization is being done only over the action at time $k$, only $u_k$. We are not choosing the actions in the subsequent time periods, because the optimal things to do in those periods have already been computed, and we now know what the optimal cost is starting from period $k+1$. All of that computation has already been done when one derives $J_{k+1}$. Doing this, we now get $J_k$. In other words,
$$J_k(x_k) = \min_{u_k \ge 0} E\big[r(x_k) + c\,u_k + J_{k+1}(x_{k+1})\big],$$
and once again the quantity which is random is $x_{k+1}$.
So, again I will simplify this. I will write it as
$$J_k(x_k) = \min_{u_k \ge 0} E\big[r(x_k) + c\,u_k + J_{k+1}(x_k + u_k - w_k)\big],$$
and, repeating the steps that we have taken earlier, I will just skip to the final step:
$$J_k(x_k) = r(x_k) + \min_{u_k \ge 0} \Big[ c\,u_k + E\big[J_{k+1}(x_k + u_k - w_k)\big] \Big].$$
This, when done for all $x_k$, defines for us the cost-to-go at time period $k$. This completes the demonstration of the dynamic programming algorithm for this particular problem.
What we will do next is explore how we can potentially compute the optimal policy and the cost-to-go, or value functions, for this problem. So, let us dwell a bit on the complexity of doing this. To begin with, the first step did not cost us anything. It was simply a definition: $J_N(x_N)$ was defined as $R(x_N)$ for all $x_N$. From the next step onwards we have some amount of work to do. What we need to do is a minimization over all $u_{N-1}$, and this minimization is performed for every $x_{N-1}$. So there is an optimization to be done for every value of this parameter.
Now, if $x_{N-1}$ can take any real value, then one potentially has to do infinitely many optimizations, because one has to be done for every possible value of $x_{N-1}$. The same holds subsequently, and in fact it keeps getting more complicated: the minimization now has to be performed for every value of $x_{N-2}$, and in that minimization features $J_{N-1}$, the function that you found in the previous step. So it is imperative that we find the right function in the first step, so that it does not adversely affect our calculations in the next step. Nonetheless, this again has to be done for every $x_{N-2}$, and if your state space is infinite, like the space of real numbers or the space of integers, then this is a minimization we have to do for infinitely many values of $x_{N-2}$, and so on, at every generic step $k$.
As we do this for every value of the state, we also get something else. When we solve the optimization problem for every $x_{N-1}$, we get, as a corollary, the optimal $u_{N-1}^*$ as a function $\mu_{N-1}^*(x_{N-1})$. Since we get the optimal action as a function of the state, this also defines for us the optimal policy, or the optimal decision rule, for that stage.
So, this gives us the optimal decision rule to apply at stage $N-1$. Likewise, when we do this minimization here, we get the optimal $u_{N-2}^*$ as a function $\mu_{N-2}^*(x_{N-2})$, and in general this defines for us $u_k^*$ as a function $\mu_k^*(x_k)$. So this also gives us the policy. In either case, whether to find the policy or to find the optimal cost-to-go, what we would need to do in this kind of problem is infinitely many optimizations.
Now, there are some ways around it. One way is to discretize the state space and compute the optimum only at the discrete points. In that case, the quality of the discretization and the shape of the cost functions will determine how good the accuracy is and how bad the complexity is.
In a problem where the state space is infinite, what you would want in many cases is to see if you can formulate the problem in such a way that this quantity can be computed in closed form. In other words, what we hope for is that one does not really need to do this for every $x_k$: one does it for a token $x_k$, and the form that one deduces from there is the one that works for every $x_k$. We find a structural, closed-form solution of the dynamic programming equations, and that gives us the value function at each state and also the optimal policy. There are precious few problem structures where such computations are possible, where the optimal policy and the value function can be computed in closed form. We will see those in the subsequent lectures.
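To make the discretization idea concrete, here is a minimal Python sketch of backward induction on a grid for the backlogged model, $x_{k+1} = x_k + u_k - w_k$. It is only an illustration under assumptions that are not from the lecture: the function name `discretized_dp`, the quadratic costs, the demand distribution, the grid, and the use of linear interpolation (with values clamped at the grid ends) are all hypothetical choices.

```python
import numpy as np

def discretized_dp(grid, actions, demand_pmf, r, c, terminal, horizon):
    """Approximate J_0 and the decision rules on a finite grid of stock levels.

    Assumes the backlogged dynamics x_{k+1} = x_k + u_k - w_k and the
    stage cost r(x_k) + c * u_k. `grid` must be increasing.
    """
    grid = np.asarray(grid, dtype=float)
    J = np.array([terminal(x) for x in grid], dtype=float)  # J_N on the grid
    policies = []
    for _ in range(horizon):  # stages N-1, N-2, ..., 0
        J_new = np.empty_like(J)
        mu = np.empty_like(J)
        for i, x in enumerate(grid):  # one minimization per grid point
            best_cost, best_u = np.inf, actions[0]
            for u in actions:
                # E[J_{k+1}(x + u - w)]; J_{k+1} is interpolated between grid
                # points and clamped outside the grid (an approximation).
                expected_next = sum(
                    p * np.interp(x + u - w, grid, J)
                    for w, p in demand_pmf.items()
                )
                cost = c * u + expected_next
                if cost < best_cost:
                    best_cost, best_u = cost, u
            J_new[i] = r(x) + best_cost
            mu[i] = best_u
        J = J_new
        policies.append(mu)
    policies.reverse()  # policies[k] is the rule for stage k
    return J, policies

# Hypothetical data, chosen only to exercise the sketch.
grid = np.linspace(-4.0, 4.0, 33)
actions = np.linspace(0.0, 4.0, 17)
J0, policies = discretized_dp(
    grid, actions, {0.0: 0.3, 1.0: 0.4, 2.0: 0.3},
    r=lambda x: x ** 2, c=1.0, terminal=lambda x: x ** 2, horizon=3,
)
print(J0[len(grid) // 2], policies[0][len(grid) // 2])
```

The work per stage is (number of grid points) times (number of candidate actions) times (number of demand values), which is the accuracy versus complexity trade-off just mentioned: a finer grid gives a better approximation at a higher cost.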
Even though they are small in number, they also happen to be the most widely applied, and they yield some of the most interesting insights into problems like this.
What we will do in this lecture and the next is look at a slight variation of the inventory control problem, in order to illustrate more clearly the kind of calculations we can do. We will make a few assumptions. The first assumption is that the demand and the action, which is the amount of additional stock we order, can only take non-negative integer values. Earlier we had not made any such assumption, which means one could order any fractional amount of stock, and the demand could also take any value, including fractional values. What we are now assuming is that $w_k \in \mathbb{Z}_+$. What is $\mathbb{Z}_+$? It is the set of non-negative integers $0, 1, 2$, and so on. Similarly, $u_k$ can also take only non-negative integer values.
We had also earlier allowed the stock to go negative, which denoted unfulfilled demand: any demand that was not fulfilled was backlogged and recorded as negative inventory in our system. This time we will say that any demand that is not fulfilled is lost. As a result, the stock that you have in the next time period is given by a different equation. The state dynamics are now
$$x_{k+1} = \max(0,\; x_k + u_k - w_k).$$
So, what does this equation mean?
Well, $x_k$ is the stock that we have at time $k$, $u_k$ is the action, the amount of additional stock ordered, and $w_k$ is the demand. What this equation is saying is the following. The total stock you have at the beginning of the time period to fulfill the demand is $x_k + u_k$: the stock you had from the earlier time period, $x_k$, and the additional stock you ordered. If $w_k$ does not exceed $x_k + u_k$, then what you are left with at the end of time period $k$, which means at the beginning of time period $k+1$, is $x_k + u_k - w_k$. On the other hand, if $w_k$ exceeds $x_k + u_k$, then the additional demand is not backlogged, it is lost, and what you are left with is zero inventory. So $x_{k+1}$ is then equal to 0. This is now our state equation.
We will also assume that we do not have storage capacity for more than two units of stock, which means that the stock you have and any additional stock you order must in total be at most two units: $x_k + u_k \le 2$. As you can see, this manifests as a constraint on $u_k$: we cannot order any more than $2 - x_k$.
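The lost-sales state equation and the capacity constraint can be written as two small Python functions. This is a sketch only; the names `next_stock`, `admissible_orders` and `CAPACITY` are illustrative and not from the lecture.

```python
CAPACITY = 2  # storage capacity in units, as assumed in the lecture

def next_stock(x, u, w):
    """Lost-sales dynamics: x_{k+1} = max(0, x_k + u_k - w_k)."""
    return max(0, x + u - w)

def admissible_orders(x):
    """Integer orders u with 0 <= u <= CAPACITY - x."""
    return list(range(0, CAPACITY - x + 1))

print(next_stock(1, 1, 1))   # demand met, one unit carried over
print(next_stock(0, 1, 2))   # demand exceeds stock, the excess is lost
print(admissible_orders(0))  # all orders allowed from an empty store
```

With a full store, `admissible_orders(2)` contains only the order 0, which is the constraint $u_k \le 2 - x_k$ at work.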
This effectively means that $u_k \le 2 - x_k$, where $x_k$ is the amount of inventory we have. Remember also that $u_k$ is still required to be greater than or equal to 0, which is our requirement from earlier.
We will also assume that the holding or storage cost for the inventory that you are left with is given in the following form: the holding cost for the $k$th period is $(x_k + u_k - w_k)^2$. This term is one component of our stage-wise cost. The other component comes from the cost of ordering additional inventory. We will assume that the cost per unit of inventory is unity, so the cost of additional inventory is $1 \cdot u_k$. Finally, the terminal cost, which we often denote by $g_N(x_N)$, is 0. There is no terminal cost associated with this problem; for any state, the terminal cost is equal to 0.
Now, the demand takes non-negative integer values, so let us also specify the probability distribution of the demand, the noise in the problem. The probability that $w_k = 0$ is 0.1, the probability that $w_k = 1$ is 0.7, and the probability that $w_k = 2$ is 0.2. You can check that these three add up to 1, which means that effectively the demand can only be 0, 1 or 2, with these probabilities; there is no other possibility for the demand. We will also assume that the initial state $x_0$ is equal to 0.
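As a small worked illustration of how this demand distribution enters the stage-wise cost, the expected value of the quadratic term can be evaluated for each possible stock after ordering. The symbol $y = x_k + u_k$ is introduced here only as shorthand; it is not notation from the lecture.

```latex
% Expected quadratic term E[(y - w_k)^2], with y = x_k + u_k in {0, 1, 2}
\begin{align*}
y = 0:&\quad 0.1\,(0-0)^2 + 0.7\,(0-1)^2 + 0.2\,(0-2)^2 = 1.5,\\
y = 1:&\quad 0.1\,(1-0)^2 + 0.7\,(1-1)^2 + 0.2\,(1-2)^2 = 0.3,\\
y = 2:&\quad 0.1\,(2-0)^2 + 0.7\,(2-1)^2 + 0.2\,(2-2)^2 = 1.1.
\end{align*}
```

The expected stage-wise cost of ordering $u_k$ from stock $x_k$ is then $u_k$ plus the entry for $y = x_k + u_k$.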
That means we start with an initial inventory of 0. What one wants to do now is find the optimal cost over a certain time horizon, and we will take $N$ to be 3. With this we now have the full problem formulated. We have $w_k$, which has this particular distribution. The stage-wise cost is the sum of these two terms, $u_k + (x_k + u_k - w_k)^2$. The terminal cost $g_N(x_N)$ is identically equal to 0. The action we can choose is an integer, subject to constraints: it has to be greater than or equal to 0 and less than or equal to 2 minus the stock that you start with, $2 - x_k$. The state dynamics are given by this equation here, $x_{k+1} = \max(0,\; x_k + u_k - w_k)$. And we start the problem with empty inventory, that is $x_0 = 0$, and we will be solving it over 3 time periods, that is $N = 3$. We will explicitly solve this in the next class.
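For readers who want to check the hand calculation numerically, the formulation above is small enough to solve by brute-force backward induction. The following Python sketch is one possible way to do it and is not the derivation given in class; the function and variable names (`solve_inventory`, `DEMAND_PMF`, and so on) are illustrative assumptions.

```python
DEMAND_PMF = {0: 0.1, 1: 0.7, 2: 0.2}  # P(w_k = w), as specified above
CAPACITY = 2                           # x_k + u_k <= 2
HORIZON = 3                            # N = 3

def solve_inventory():
    """Backward induction over states {0, 1, 2} with zero terminal cost."""
    states = range(CAPACITY + 1)
    cost_to_go = {HORIZON: {x: 0.0 for x in states}}  # g_N(x_N) = 0
    policy = {}
    for k in range(HORIZON - 1, -1, -1):
        J_next = cost_to_go[k + 1]
        J_k, mu_k = {}, {}
        for x in states:
            best_u, best_cost = None, float("inf")
            for u in range(CAPACITY - x + 1):  # 0 <= u <= 2 - x
                # E[u + (x + u - w)^2 + J_{k+1}(max(0, x + u - w))]
                expected = sum(
                    p * (u + (x + u - w) ** 2 + J_next[max(0, x + u - w)])
                    for w, p in DEMAND_PMF.items()
                )
                if expected < best_cost:
                    best_u, best_cost = u, expected
            J_k[x], mu_k[x] = best_cost, best_u
        cost_to_go[k], policy[k] = J_k, mu_k
    return cost_to_go, policy

cost_to_go, policy = solve_inventory()
for k in range(HORIZON):
    print(k, cost_to_go[k], policy[k])
```

Because there are only three states and at most three admissible orders per state, each stage needs just a handful of evaluations, in contrast to the infinitely many optimizations of the continuous version.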