Welcome, everyone. Today I will start a new topic, which is also the last topic of our course in optimization. The topic is called dynamic programming, or dynamic optimization. The theme of dynamic optimization is taking decisions over time, at multiple time instants one after the other, where you optimize not just the cost at each time but a consolidated cost over the entire time horizon under consideration. So, compared to static optimization, in which there is one cost function to be optimized at one particular time, in dynamic programming we have a cost function at each time, and what we want to optimize is the sum total of these cost functions over the entire time horizon. The complications in dynamic programming arise because this is not simply a separate optimization at each time step: the decisions you take at one time step impact the information that you have at the next time step and the decisions that you will take at the following times. As a result, what one has is a series of optimizations that are intricately coupled with each other, where decisions of the past feed into the decisions of the future. This kind of optimization therefore requires a study in its own right. It is not merely a corollary of static optimization, and it requires viewpoints and tools that are not present in static optimization. To explain what I mean by all this, let us take an example. The example is what is called inventory control.
Inventory control refers to the situation where you are, say, a shopkeeper or a shop owner who needs to decide the quantity of a certain item to be ordered at certain time instants within a horizon from 0 to n. You want to determine the quantity to be ordered at each of these time instants. What are the considerations involved in a problem like this? Well, the goal is to meet the demand, and it is also to optimize costs. You may also have other constraints. For example, you may have storage constraints, so that you cannot store a very large quantity of that item, or there may be constraints of perishability, so that the item needs to be disposed of before a certain time, and so on. To formulate this problem, let x_k denote the stock available at the beginning of the kth period. We will be making decisions over time periods: the time from 0 to n is divided into n periods, so this is one period, this is the end of the next period, and so on. This is how we will think of time. Time for us is going to be slotted and discretized like this. We therefore have to adopt a convention about when exactly we keep track of the state, that is, the stock available to us. Our convention is going to be that we keep track of the stock available at the beginning of the kth period. So when we talk of the stock x_k, say x_3, it is the stock at the beginning of period 3.
You could also adopt a different convention in which you take the stock at the end of the period, but our convention is that it is taken at the beginning of the period. Let us denote by u_k the stock ordered at the beginning of the kth period. Once again this is a convention: the stock is ordered at the beginning of the period, that is, at the left end point of the period. Depending on the problem, you may have complications, such as the stock taking some time to be delivered once you order it; say you order it and it comes to you after a certain number of periods. For simplicity, I am going to assume that the stock is ordered at the beginning of the period and is also delivered immediately. This means that our time periods are wide enough that the delivery time is insignificant in comparison. And let w_k be the demand for the item during the kth period. Demand is not something that we can attribute to either the left end point or the right end point; it is rather a property of the entire period. But since we have adopted this convention, we can relate the stock available at the beginning of the (k+1)th period to the stock available at the beginning of the kth period, the stock ordered at the beginning of the kth period, and the demand during the kth period. As a result, we have x_{k+1} = x_k + u_k - w_k. Here x_k is the stock available at the beginning of period k, u_k is the stock you have ordered, and w_k is the stock consumed over the kth period.
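The update rule above can be sketched in a few lines of Python. This is only an illustration: the function names `next_stock` and `stock_trajectory`, the starting stock, and the order and demand numbers are all made up for the example and do not come from the lecture.

```python
def next_stock(x_k, u_k, w_k):
    """Stock at the beginning of period k+1: x_{k+1} = x_k + u_k - w_k."""
    return x_k + u_k - w_k


def stock_trajectory(x0, orders, demands):
    """Roll the update forward; returns [x_0, x_1, ..., x_n]."""
    stocks = [x0]
    for u_k, w_k in zip(orders, demands):
        stocks.append(next_stock(stocks[-1], u_k, w_k))
    return stocks


# Illustrative numbers only: three periods, starting with 10 units in stock.
print(stock_trajectory(10, orders=[5, 0, 8], demands=[7, 6, 4]))
# -> [10, 8, 2, 6]
```

Note that the demands are passed in as known numbers here purely to show the bookkeeping; in the problem itself they are not known when the orders are placed.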
So, by the time you reach the left end point of the (k+1)th period, the inventory level, or the level of the stock, is x_{k+1}. Now we are going to assume that the demands w_0 to w_{n-1} are independent random variables. If you are not comfortable with random variables, that is okay; you can simply think of these as exogenous variables whose values we do not know. The important thing here is that w_k is realized during period k. So w_0 gets realized in this first period, w_1 gets realized in the next one, and so on, and w_{n-1} gets realized during the last period. Because the demand is realized during the period, when we are making the decision of ordering, say, the quantity u_1 at time 1, or the quantity u_0 at time 0, we are not aware at that time of the value of w that will be realized over that period. So u_1 has to be decided without knowledge of w_1. Of course, you would be aware of w_0, but not of the value of w_1. The order has to be decided without knowledge of the realized demand for that period. Because the demand is exogenous and we cannot control its distribution, we use the word noise for it. Noise is simply any randomness that comes from the environment and whose distribution cannot be chosen through the actions that we take.
Again, if you are not comfortable with probability, randomness, noise and so on, that is all right. Just bear in mind that in solving these problems, u_k is to be chosen before the value of w_k is known. These decisions have to be taken in order to optimize a certain cost, so let me write out a cost function. Let us say there is a cost r(x_k). This represents either a penalty for holding excess inventory, in the case when x_k is positive and you have a positive amount of stock left with you, or a shortage cost, that is, some notional cost for unfulfilled demand, in the case when x_k is negative. There is another cost term, which is a purchasing cost. Every time we order a quantity u_k, we incur a cost, say c times u_k, or more generally c(u_k), for purchasing u_k. These are all kept simple. Of course, r could also depend on k, and the purchasing cost, which here varies only with the quantity, could also vary with time. There is also a natural constraint: since we are talking of purchasing an item and not of disposing of items, u_k should be greater than or equal to 0.
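To make the two cost terms concrete, here is a minimal Python sketch. The lecture leaves r and c unspecified, so the piecewise-linear shape and the rates `holding_rate`, `shortage_rate` and `unit_price` below are assumptions chosen only for illustration.

```python
def holding_or_shortage_cost(x_k, holding_rate=1.0, shortage_rate=3.0):
    """r(x_k): penalty for excess stock (x_k > 0) or unfulfilled demand (x_k < 0).

    The linear rates are hypothetical; the lecture does not fix the form of r.
    """
    if x_k >= 0:
        return holding_rate * x_k
    return shortage_rate * (-x_k)


def purchasing_cost(u_k, unit_price=2.0):
    """c(u_k): cost of ordering u_k units, here assumed linear in u_k."""
    if u_k < 0:
        raise ValueError("u_k must be >= 0: we purchase items, we do not dispose of them")
    return unit_price * u_k


print(holding_or_shortage_cost(4))   # 4 units left over   -> 4.0
print(holding_or_shortage_cost(-2))  # 2 units of shortage -> 6.0
print(purchasing_cost(5))            # order 5 units       -> 10.0
```

Making the shortage rate larger than the holding rate is one way to say that unfulfilled demand hurts more than leftover stock; that too is a modelling choice, not something stated in the lecture.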
Think of what is going on here in terms of a block diagram. You have your inventory system. Going into it are the demand w_k, the stock at the beginning of the kth period, which is x_k, and u_k, which is the stock ordered, again at the beginning of the kth period. Then you have the cost that corresponds to period k, which for us is r(x_k) + c(u_k). With these inputs, the system moves to the next time period, and you get the stock at period k+1 as x_{k+1} = x_k + u_k - w_k. Remember once again that u_k has to be chosen without knowing w_k. Although these arrows are all pointing in at the inventory system, u_k has to be chosen based on x_k and the past, not as a function of w_k. What is the objective, then? The objective is to decide how much stock I should be ordering in order to minimize my total cost. In addition to these costs, let us assume there is also a terminal cost R(x_n). A terminal cost is simply a cost that you incur at the end of the time horizon that you are considering. When you reach time n, the end of the horizon, you have no more decisions to make; that is the time up to which you are looking ahead and taking decisions. If you are left with an inventory of x_n at that time, whether positive or negative, the cost that this inventory carries is what we call the terminal cost. So our total cost comes up as a sum total of three terms.
You have your cost of holding and your cost of purchase, and these two cost terms apply at every time instant. In addition, there is a terminal cost at the nth time instant. So the total cost that we incur is the expectation of the terminal cost R(x_n) plus the costs incurred in each time period. There are n time periods, denoted k = 0 to n-1, and the cost you incur in each of them is r(x_k) + c(u_k). That is, the total cost is E[ R(x_n) + sum over k = 0 to n-1 of ( r(x_k) + c(u_k) ) ]. The term r(x_k) + c(u_k) is the stage-wise cost, the cost at each time instant, and R(x_n) is the terminal cost. What we want to decide is the amount of stock to order at the beginning of each of the n periods, denoted 0 to n-1. The amount of stock to be ordered is our decision. So you would want to minimize the above cost, R(x_n) plus this sum, in expectation. But minimize by choosing what? You would like to minimize it by choosing the amount of stock to be ordered at each time. There is a bit of a catch here, and it is one thing I want to make sure all of you understand. Suppose I write the actions I want to choose, u_0 to u_{n-1}, under the minimization. It is correct that I want to choose the quantity to order at each time instant. The trouble with writing it this way is that these quantities are to be decided based on the inventory level that we would see at the beginning of each time period.
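Collecting the pieces, the minimization as first written, over the order quantities themselves, would read roughly as follows. This is a reconstruction from the spoken description, with the expectation taken over the demands w_0, ..., w_{n-1}; it is the formulation whose difficulty was just pointed out, not a final statement of the problem.

```latex
\[
\begin{aligned}
\min_{u_0,\dots,u_{n-1}} \quad
  & \mathbb{E}\Bigl[\, R(x_n) + \sum_{k=0}^{n-1} \bigl( r(x_k) + c(u_k) \bigr) \Bigr] \\
\text{subject to} \quad
  & x_{k+1} = x_k + u_k - w_k, \qquad u_k \ge 0, \qquad k = 0,\dots,n-1 .
\end{aligned}
\]
```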
So u_k is the stock to be ordered at the beginning of the kth period, and it would obviously be decided once you look at the inventory at the kth period. That inventory depends on the demand that has transpired up to that time from the start of the first period. So u_k is not simply a quantity that I can decide right now, because it depends on the inventory that will be available, and that inventory depends on the demands that get realized in all the previous time periods. The demand in those previous periods is not something that I know at the start of the problem. This problem has been posed assuming I am sitting at time 0 and looking ahead over the n time periods that I have considered. Since I am looking ahead, I do not have knowledge of what the exact demand is going to be. I do know its probability distribution, that is, what it is likely to be and with what probability, but I do not know what its value is going to be. As a result, the information on the basis of which the stock to be ordered is to be decided is itself not available to me when this problem is posed. I am actually in a fix, because I do not know what information I should be planning my action for.
The amount of stock to be ordered can be decided only once we have seen the information that is available, which is the stock available at the beginning of the period. Without that being provided to us, the amount of stock to be ordered is not something we can decide. As a consequence, this way of posing the problem is actually not well posed. The reason is that, sitting at time 0, I do not have a way of even talking about the action u_k, the stock to be ordered at time k, because the information on the basis of which that action is to be taken is not available to me at time 0. What does this mean, then? We are thinking of this problem sitting at the start of the time horizon. We know there are going to be demands realized over these periods, but their exact values are not known to us; we just know their probability distributions. Sitting here, we still want to be able to decide what action we should be taking, that is, what amount of stock is to be ordered in each of the time periods. And a specific action cannot be decided, because that action would depend on the inventory available, and that inventory in turn would depend on the demand that gets realized, which in turn is not known to us.
As a result, what do we see? We have to make these decisions to minimize a cost that is stated over the entire time horizon, yet we cannot possibly pose this as a problem of deciding the amount of stock to be ordered. The plain and simple reason is that the amount of stock to be ordered depends on information that will come up later, during the problem itself, and is not available to us at the start of the problem. What, then, is the alternative? Since the quantity to be ordered is going to be decided based on information that becomes available during the problem, what is it that we can actually decide at the start of the problem, even before the demand is realized? The thing that we can decide is not the quantity to be ordered, but what our plan is going to be for the quantities we would want to order, based on the information that we could potentially get during the course of the problem. We can decide how much quantity we would order if we had a certain amount of stock available at the start of the time period. That means what we can do is decide not a specific quantity but rather an entire plan, which tells us how much quantity should be ordered for each level of stock.
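One simple way to picture such a plan is as a lookup table from stock level to order quantity. The sketch below is purely illustrative: the stock levels, the order quantities, and the names `plan_for_period` and `order_quantity` are invented for the example, and the plan shown is not claimed to be a good one.

```python
# Hypothetical plan for one period: observed stock level -> quantity to order.
# The numbers are illustrative only; they are neither optimal nor from the lecture.
plan_for_period = {0: 12, 4: 8, 8: 4, 12: 0}


def order_quantity(plan, stock):
    """Look up the order for an observed stock level (assumed to be in the table)."""
    return plan[stock]


# The plan is fixed in advance; the actual order is known only once
# the stock level at the beginning of the period has been observed.
for observed_stock in (0, 8, 12):
    print(observed_stock, "->", order_quantity(plan_for_period, observed_stock))
```

The table can be written down at time 0, before any demand is realized, which is exactly what a single number u_k could not offer.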
So, if the level of stock is 100 units you would order so much quantity, if the level of stock is 20 units you would order so much quantity, if it is 30 units you would order so much quantity, and so on. We do not know what the level of stock will be; that is something that gets realized once the demand itself gets realized. But we can still plan for every possible level of stock that could get realized, and that is essentially what we are doing here. We are thinking of every possible level that could get realized, and for each such level we are trying to decide what the amount to order should be. This way of thinking about the problem means that what we are looking for is not merely an action, which would be simply a specific quantity that you want to order, but rather an entire plan of actions. The plan of actions is then a function: it tells you what action you should be taking once you get a particular piece of information. What we are therefore optimizing over is functions, functions that map your information, which in this case is the amount of stock that you have at the beginning of the time period, to the action that you want to take.
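To see what optimizing over functions looks like in practice, here is a minimal Python sketch that treats a plan as a function of the observed stock and estimates its expected total cost by simulation. Everything specific in it is an assumption made for illustration: the order-up-to rule in `order_up_to_policy`, the linear cost rates, the uniform demand on 0 to 10, and the use of the same linear form for the terminal cost. None of these come from the lecture, and the rule is not claimed to be optimal.

```python
import random


def order_up_to_policy(target):
    """A hypothetical plan: map the observed stock x_k to an order u_k >= 0."""
    def policy(k, x_k):
        return max(0, target - x_k)
    return policy


def stage_cost(x_k, u_k, hold=1.0, short=3.0, price=2.0):
    """r(x_k) + c(u_k) with assumed linear holding, shortage and purchase rates."""
    r = hold * x_k if x_k >= 0 else short * (-x_k)
    return r + price * u_k


def terminal_cost(x_n, hold=1.0, short=3.0):
    """R(x_n), here assumed to have the same linear form as r."""
    return hold * x_n if x_n >= 0 else short * (-x_n)


def average_total_cost(policy, x0, n_periods, n_runs, seed=0):
    """Monte Carlo estimate of E[R(x_n) + sum_k (r(x_k) + c(u_k))] under a policy."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        x, cost = x0, 0.0
        for k in range(n_periods):
            u = policy(k, x)        # decided from x_k only, before demand is seen
            cost += stage_cost(x, u)
            w = rng.randint(0, 10)  # demand w_k realized afterwards (assumed distribution)
            x = x + u - w           # x_{k+1} = x_k + u_k - w_k
        total += cost + terminal_cost(x)
    return total / n_runs


# Compare a few candidate plans; each one is a whole function, not a single number.
for target in (0, 5, 10):
    estimate = average_total_cost(order_up_to_policy(target), x0=0, n_periods=5, n_runs=2000)
    print(f"order up to {target}: estimated expected cost {estimate:.2f}")
```

The point of the sketch is the shape of the search space: the candidates being compared are rules that react to whatever stock level turns up, which is what makes the problem well posed at time 0.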