Welcome back. Let us continue with our model of inventory control. To recap: the state x_k captures the amount of inventory in our shop, that is, the stock available at the beginning of the kth period. The action u_k is the amount of stock you want to order. The order is placed at the beginning of each time period and is delivered immediately. The demand that comes to our shop in period k is denoted w_k; it is exogenous and random, so it is the noise in our system. The sequence of events is that first the state becomes known, then the action is chosen, which means the additional inventory is ordered, and then the demand gets realized. So the inventory has to be ordered before the demand for period k is realized.
Let us now write out a few more elements of this problem. We are going to assume that demand that comes to the shop but is not fulfilled can be backlogged. This means that if demand is not fulfilled in a particular time period, it can be fulfilled in subsequent time periods. Any unfulfilled demand therefore shows up as negative inventory. Although I said that the state x_k is the amount of stock that we have, we allow x_k to become negative, and a negative x_k represents unfulfilled demand present in the system that can be fulfilled in subsequent time steps.
Consequently, the state of the system at the next time step is given by a simple equation. Since we are allowing backlogging, the state can take positive as well as negative values. The state at time k+1 is equal to the stock available at time k, plus the stock that you order, minus the demand that gets realized: x_{k+1} = x_k + u_k - w_k. If w_k is larger than x_k + u_k, it means that the inventory you had at the beginning of period k plus the inventory you ordered at time k is not sufficient to meet all the demand. That is okay, because then x_{k+1} becomes negative, and there is backlogged demand present at period k+1.
Now let us come to the cost function. We assume that there is a cost r(x_k) incurred at each time period k. This cost is a penalty, and it can be interpreted in two ways. If x_k is positive, you are holding more inventory than you need, and r(x_k) is a penalty for positive stock, for example a holding or storage cost for excess inventory. If x_k is negative, r(x_k) is a penalty for negative stock, that is, for backlogged demand: you would want demand to be fulfilled immediately, so there is a shortage cost for unfulfilled demand. We also have a purchasing cost, say c times u_k, for the order u_k.
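The state update and the two-sided penalty described above can be sketched in a few lines of Python. This is only an illustration: the lecture does not fix a form for r, so the piecewise-linear penalty and the rates `holding` and `shortage` below are assumptions, as are the function names.

```python
def next_stock(x, u, w):
    """Stock at the start of period k+1: x_{k+1} = x_k + u_k - w_k.

    A negative result represents backlogged (unfulfilled) demand.
    """
    return x + u - w


def stage_penalty(x, holding=1.0, shortage=3.0):
    """Hypothetical r(x): holding cost on positive stock, shortage cost on backlog."""
    return holding * max(x, 0) + shortage * max(-x, 0)


x1 = next_stock(x=5, u=2, w=10)   # demand 10 exceeds the 5 + 2 units available
print(x1)                         # -3, i.e. 3 units of backlogged demand
print(stage_penalty(x1))          # 9.0, shortage penalty on 3 backlogged units
print(stage_penalty(4))           # 4.0, holding penalty on 4 units of stock
```

Any other function that penalizes both positive and negative stock would serve equally well as r here.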
In the purchasing cost c u_k, c is a constant: it is the per-unit cost of the order. Because we are working over a time horizon of N, it is possible that we end up with some inventory at the end of the horizon. So in addition there is a terminal cost: capital R(x_N) is the terminal cost of being left with inventory x_N at time N.
Once again we can distinguish, as I said, between open-loop and closed-loop models. In an open-loop model you choose all the order quantities u_0 to u_{N-1} before anything about the demand or the intervening stock is known, that is, before any of the uncertainties get realized. What you then have is a fixed ordering schedule that is not tuned to the amount of stock in the system. That would be an open-loop strategy. A closed-loop strategy is one where you choose the amount of stock to be ordered as a function of the information that you have. In this case that information is the amount of stock present at the beginning of the time period. So if you choose u_k as a function of x_k, you are using a closed-loop strategy.
Pictorially, we can represent this in the following way. You have your inventory system, and you have x_k, the stock at time k. This determines the cost at time k, which is r(x_k) + c u_k. To determine the cost you also need the action u_k.
The action u_k is the stock ordered at time k. When you have a certain stock x_k at time k and you order an additional stock u_k, the cost at time k is r(x_k) + c u_k. Together with w_k, the demand in period k, this results in the stock at time k+1, given by x_{k+1} = x_k + u_k - w_k. The problem then is to minimize the total cost: the terminal cost plus the stage-wise costs. Since what we choose at each time k is an order, and we are not allowed to return earlier stock, there is a constraint that u_k must always be greater than or equal to 0.
Now let us look at this cost function a little more closely and try to understand exactly what we are trying to do. On the face of it, the problem appears to be simply to decide how much stock to order at each time k. That way of thinking leads to a kind of dilemma. I have told you that you have the option of using the information about the amount of stock present in the system at the beginning of time k, so u_k can be chosen based on that stock. But the stock present at time k is not something definite. It is a random quantity, because it depends on the demands that have occurred up until time k and on the additional stock that you have ordered up until time k.
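To see concretely that the stock at time k is a random quantity, one can simulate the recursion several times with the same orders and different demand draws. The sketch below is illustrative only: the demand model (an integer between 0 and 10, equally likely), the starting stock, and the order quantities are all assumptions, not part of the lecture.

```python
import random


def simulate_stock(x0, orders, demand_sampler, rng):
    """Return the stock path x_0, ..., x_N for a given list of orders."""
    path = [x0]
    for u in orders:
        w = demand_sampler(rng)          # demand is realized after the order
        path.append(path[-1] + u - w)    # x_{k+1} = x_k + u_k - w_k
    return path


def uniform_demand(rng):
    # Assumed demand model: an integer from 0 to 10, each equally likely.
    return rng.randint(0, 10)


rng = random.Random(0)
orders = [5, 5, 5]
for _ in range(5):
    # Same starting stock and same orders, but a different path each time.
    print(simulate_stock(10, orders, uniform_demand, rng))
```

Each printed path starts at the same stock and uses the same orders, yet the later entries differ from run to run, which is exactly why the stock at time k cannot be treated as known in advance.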
Consequently, if you look at any particular time instant, you cannot know in advance what action you will be choosing. So it is not possible to pose this problem as one where we decide how much stock to order at each time k. Although that is how I stated the problem, when it is posed in this form I cannot tell you what u_k should be, because the actual stock to be ordered depends on the amount of stock that gets realized, and that is random. So we need to understand very carefully what exactly we are posing.
The distinction we need to make here is between what is called an action and what is called a strategy. The amount of stock that we would order at a particular time instant is what we call the action at time k. This action is chosen as a function of some information, and in this case the information is the amount of stock that we had at the beginning of the time instant. So the action is a function of information. What, then, are we really choosing when we solve a problem like this? To answer this question we can look at the problem in two different ways.
First is what happens in real time, when you are in the field and actually implementing whatever you have decided. As you go through those N time steps you would actually be ordering a certain amount of stock, so what you are choosing then is actions. The reason is that at that time you have the information available with you, and therefore you are in a position to decide what action to take.
However, if you look at the way this problem is posed, it is not posed in real time. It is posed over the N time periods that you would encounter, so it is effectively posed even before the first time period begins. It is posed on the prospective evolution of the system under the actions that you would take. Therefore this problem is not about choosing actions, but about choosing a plan based on which you would later take actions in real time. We cannot choose all the actions at time 0, because we do not have the information to choose them. What we can do is make a plan. We can say: if I have 100 units of inventory, I would order another 10; if I have 50 units, I would order another 20; if I have 150 units, I would not order anything at all; and so on.
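A plan of this kind is simply a function from the observed stock to an order quantity. The sketch below encodes the three cases just mentioned; the lecture only gives those three points, so the thresholds used to fill in every other stock level are an assumption, and the name `ordering_plan` is hypothetical.

```python
def ordering_plan(stock):
    """Hypothetical plan: the more stock on hand, the less is ordered."""
    if stock >= 150:
        return 0      # plenty of stock: order nothing
    if stock >= 100:
        return 10     # moderate stock: small order
    return 20         # low (or backlogged) stock: larger order


# The plan is fixed ahead of time; the action is only computed in the field,
# once the stock has actually been observed.
observed_stock = 50
action = ordering_plan(observed_stock)
print(action)   # 20
```

The function is written down before any stock is observed, while the number it returns is only known once `observed_stock` becomes available.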
So what you can do is come up with a plan that is contingent on the information that will be available to you when you are actually in the field and have to take the action. You cannot decide the action itself, but you can make a plan for every possible state that could get realized, and then implement that plan when the information in fact comes up. So the problem we have is not one of choosing the actions, but one of choosing our plan of action.
What do I mean by a plan? We are choosing a function mu_k at each time k such that u_k = mu_k(x_k). So this problem is that of choosing not the u_k's themselves, but the functions with which you can compute u_k in real time once the information x_k is made available to you. When we do closed-loop control, that is, when we choose actions as a function of information, the problem we end up with is one where we have to choose our plans, not the actions themselves. We cannot choose u_k; we need to choose mu_k.
So let me write the problem again. We are minimizing the expected cost: R(x_N) plus the summation over k from 0 to N-1 of r(x_k) + c u_k. The minimization is not over u_0 to u_{N-1}, but over mu_0 to mu_{N-1}. It is mu_0 to mu_{N-1} that is being decided.
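Written out in symbols, the problem just stated reads as follows. This is a transcription of the spoken formula using the quantities defined earlier (x_k, w_k, r, R, c, mu_k, horizon N); the expectation is taken over the random demands w_0, ..., w_{N-1}, and the initial stock x_0 is assumed to be given.

```latex
\min_{\mu_0,\dots,\mu_{N-1}} \;
\mathbb{E}\!\left[ R(x_N) + \sum_{k=0}^{N-1} \bigl( r(x_k) + c\,\mu_k(x_k) \bigr) \right]
\quad \text{subject to} \quad
x_{k+1} = x_k + \mu_k(x_k) - w_k, \qquad \mu_k(x_k) \ge 0, \qquad k = 0,\dots,N-1 .
```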
So what we are deciding is a sequence of such plans: a plan, for every time step, for how much inventory to order based on how much stock would be available at that time step. The sequence mu_0 to mu_{N-1} is called a policy. Individually, the mu_k's are often called strategies, and sometimes also decision rules.
Why did this happen? We initially posed the problem as choosing actions, that is, ordering a certain amount of inventory, but then realized that it cannot be posed that way because we do not have the information. The reason is that this problem is stochastic. When the problem is stochastic, the information that you will have in the future is a random variable, so its value is not determined at the time at which we are trying to make our decisions. If the problem were deterministic, that is, if the demand in every time period were known, then the state evolution would be known; we would know exactly how much inventory we would have and exactly what action we would have to take. However, the demand that will get realized in period k is not known to us at time 0, when we are preparing our plans. Because of this we cannot decide how much inventory has to be ordered. All we can do is make a plan: if this happens then I will do that, if that happens then I will do this, and so on.
So the stochastic nature of the problem has compelled us to lift the problem from the space of actions to the space of strategies. This is germane to stochastic decision problems: very rarely can such a problem be posed in terms of actions. Even the simplest stochastic decision problems get posed as problems where you have to choose strategies.
Now what makes this particular problem difficult, or different? Actions are simply real numbers or vectors, but strategies or policies are functions. The strategy mu_k is a function from the state space to the action space: it maps the state at time k to the action at time k. In our case the state space is the real line, and since we can only order non-negative quantities, the action space is [0, infinity). As a consequence, the problem that was earlier about choosing quantities has now become about choosing functions from the real line to [0, infinity).
So once you bring even a slight amount of stochasticity into the problem, it has to be lifted to a high-dimensional space. A problem that was on vectors gets lifted to a problem on functions. This is what makes stochastic problems significantly harder than their deterministic variants. The same problem in the deterministic world could have been written in terms of a sequence of actions to choose. Once there is stochasticity, there is no way to do that: you have to lift the problem to the space of functions, that is, the space of strategies or decision rules. In the next part, I will give you an idea of what the solution of this problem looks like.