Now, let us dwell a little on the way we have reduced a stochastic control problem with imperfect information to one with perfect information. What have we done in this reduction? We observed that the information has a recursive structure to it: the information at time k+1 can be given as a function of the information at time k, the action at time k, and the observation at time k+1. So we treated this recursion as our new state equation, with the information vector at time k as the new state. The action remained the same, except that it is now chosen as a function of the information at time k, and the new observation z_{k+1} plays the role of the disturbance.

This simple observation may make you believe that the reduction is rather easy, and that as a result there is really no need to study problems of imperfect state information in any further detail, because we can always reduce them to problems of perfect state information by simply redefining the state. However, this reduction comes at a certain price, and the price we pay is a dramatic increase in the complexity of the problem. What is the state space of the reduced problem, if I may ask you? The state space at time k is the space of all possible information vectors up to time k; in our problem, this is the set of all observations up to time k together with all the actions taken before time k. At time k+1 you have one additional observation, and you have also taken one additional action by that time, so the information vector at time k+1 is longer, and the space of information vectors at time k+1 is correspondingly larger. This complexity keeps growing: at each time step, the state space of the reduced problem becomes larger and more involved.

If you recall, we had earlier discussed the set of all history-dependent policies for Markov decision processes, and there we saw how the number of such policies explodes exponentially, in fact super-exponentially, because the number of histories up to a given time itself grows very fast. The same kind of complexity explosion occurs when we carry out this reduction, because the information vector keeps growing with time, and therefore so does the set of possible information vectors. As a result, when we apply the dynamic programming algorithm to this problem, the number of values of I_k (or I_{N-1}) for which we need to solve the minimization also grows dramatically with time.
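To summarize this in symbols, writing I_k for the information vector, u_k for the action, and z_k for the observation (this particular indexing convention is an assumption on my part, not something fixed in the lecture), the recursion and the growing state space read:

```latex
I_{k+1} = \big( I_k,\; u_k,\; z_{k+1} \big), \qquad
I_k = \big( z_1, \ldots, z_k,\; u_0, \ldots, u_{k-1} \big) \in Z^k \times U^k .
```

With finite observation and action sets Z and U, the number of possible information vectors at time k is |Z|^k · |U|^k, so the state space of the reduced problem grows exponentially in k.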
So, as a result, while this kind of DP algorithm can be written down formally for every problem with imperfect information, the challenge lies in actually carrying it out in any computationally feasible or tractable way, because performing this minimization for every I_k is a horrendous task. We will nonetheless carry it out for an actual problem, and the problem we will choose is the machine repair problem.

What is this problem? We have a machine which can be in one of two possible states, denoted P and P bar. The state P corresponds to the machine being in proper, workable condition, and P bar corresponds to the machine being in improper condition; in short, P is the good state and P bar is the bad state. If a machine in state P is operated for one time period, there are two possibilities: it stays in P with probability two thirds, or it becomes improper and goes to state P bar with probability one third. If the machine is in the improper state P bar, it remains in P bar with probability one.

We will operate this machine for a total of three time periods; that is the operating life of the machine that we are considering here. Suppose the machine starts in state P. At the end of the first and second time periods the machine is inspected, and each inspection has two possible outcomes: G (good) or B (bad). The outcome G says that the machine is probably in the good state, and B says that it is probably in the bad state. The outcome probabilities depend on the state of the machine: if the machine is in state P, the inspection yields G with probability three fourths and B with probability one fourth; if the machine is in state P bar, it yields G with probability one fourth and B with probability three fourths. These probabilities are summarized in the short sketch below.
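As a minimal sketch of the model just described (the variable names T and Obs are my own, chosen for illustration), the transition and inspection probabilities can be encoded exactly as fractions:

```python
from fractions import Fraction as F

# One-period transition probabilities of the operating machine:
# from P it stays in P w.p. 2/3 and moves to Pbar w.p. 1/3; Pbar is absorbing.
T = {'P':    {'P': F(2, 3), 'Pbar': F(1, 3)},
     'Pbar': {'P': F(0),    'Pbar': F(1)}}

# Inspection outcome probabilities given the true state: a machine in P
# reads G w.p. 3/4, while a machine in Pbar reads B w.p. 3/4.
Obs = {'P':    {'G': F(3, 4), 'B': F(1, 4)},
       'Pbar': {'G': F(1, 4), 'B': F(3, 4)}}
```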
After each inspection, we have a choice between two possible actions. The first is C, continue: simply continue the operation of the machine, with no need for any further intervention. The second is S, stop: stop the machine and determine its state through an accurate diagnosis. Remember that the inspection only gives a probable outcome, probably good or probably bad, whereas here you perform an accurate diagnosis; and if the machine turns out to be in the bad state P bar, you bring it back to state P.

These operations incur costs. There is a cost of two units for starting a period in state P bar, and a cost of zero for starting a period in state P. The cost of the stop-and-repair action S is one unit, the cost of continuing is zero, and the terminal cost is zero. So if the machine starts a period in the bad state P bar, you incur a cost of two units; if it starts in the good state P, you incur no cost; there is no cost for continuing, but taking the stop-and-repair action costs one unit. A sketch of these conventions follows this paragraph.

We can represent the transitions of this machine through a figure with the two states. Starting in state P, the machine remains in P with probability two thirds, or moves to P bar at the next time step with probability one third; starting in P bar, it remains in P bar with probability one. That is the state transition at any time. In the inspection phase, state P yields the answer G with probability three fourths and B with probability one fourth, while state P bar yields G with probability one fourth and B with probability three fourths.

As you can see, the difficulty we face here is that we do not actually know the true state of the machine; we only know the outcomes of the inspections. If we keep the machine running when it is in the bad state, we incur the running cost of the bad state; on the other hand, stopping and repairing always costs one unit, a cost we must bear whenever we decide to stop. This is the basic dilemma of the problem. The problem for us, then, is to determine the optimal policy that minimizes the expected cost over the three time periods.
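Continuing the sketch above, the cost conventions and the effect of the stop-and-repair action can be written as small helpers (again, the function names are mine, not from the lecture):

```python
def operating_cost(state):
    # Cost of running the machine for one period, charged on the state
    # in which the period starts: 2 units in Pbar, 0 in P.
    return 2 if state == 'Pbar' else 0

def action_cost(action):
    # Stop-and-repair (S) costs one unit; continuing (C) is free.
    return 1 if action == 'S' else 0

def apply_action(state, action):
    # S diagnoses the machine accurately and restores it to P if needed;
    # C leaves the state unchanged.
    return 'P' if action == 'S' else state
```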
So, in other words, remember that the actions prescribed by the policy have to be chosen after we receive the results of the inspections, which happen at the end of the first and second time periods. Based on the results of the inspections at those two time points, together with the history of the problem up to that time, we want to determine the optimal action as a function of that information: should we continue, or should we stop, given what we know at that time?

The problem is therefore to find the optimal action after the first inspection, once its result is known, and the optimal action after the second inspection. What else do we know when taking the second decision? We know the result of the first inspection, the result of the second inspection, and also the action we took after the first inspection. So we want to find the optimal actions at two time steps: the first chosen after the first inspection, as a function of the result of the first inspection alone, and the second chosen after the second inspection, as a function of three pieces of information: the result of the first inspection, the result of the second inspection, and the action chosen after the first inspection. In the next lecture, we will formulate this problem as a stochastic control problem with imperfect state information.
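Before that, as a sanity check on the structure just described, here is a sketch of how one could brute-force this tiny problem by enumerating every policy of exactly this form: the first action u1 is a function of the first inspection result z1 alone, and the second action u2 is a function of the full information vector (z1, u1, z2). It reuses T, Obs and the helpers from the snippets above; all identifiers are mine, chosen for illustration, and this is an exhaustive enumeration rather than the DP algorithm itself.

```python
from itertools import product

def expected_cost(policy1, policy2):
    """Exact expected 3-period cost of a policy pair, by summing over
    all probability-weighted sample paths."""
    total = 0
    for x1, p1 in T['P'].items():               # state after period 1 (x0 = P)
        for z1, q1 in Obs[x1].items():          # first inspection outcome
            u1 = policy1[z1]
            y1 = apply_action(x1, u1)           # state entering period 2
            for x2, p2 in T[y1].items():        # state after period 2
                for z2, q2 in Obs[x2].items():  # second inspection outcome
                    u2 = policy2[(z1, u1, z2)]
                    y2 = apply_action(x2, u2)   # state entering period 3
                    cost = (action_cost(u1) + operating_cost(y1)
                            + action_cost(u2) + operating_cost(y2))
                    total += p1 * q1 * p2 * q2 * cost
    return total  # period 1 itself is free: the machine starts in P

# Enumerate every policy: 2^2 choices for u1, 2^8 choices for u2.
info2 = [(z1, u1, z2) for z1 in 'GB' for u1 in 'CS' for z2 in 'GB']
candidates = []
for aG, aB in product('CS', repeat=2):
    policy1 = {'G': aG, 'B': aB}
    for acts in product('CS', repeat=len(info2)):
        policy2 = dict(zip(info2, acts))
        candidates.append((expected_cost(policy1, policy2), policy1, policy2))

cost, policy1, policy2 = min(candidates, key=lambda t: t[0])
print('optimal expected 3-period cost:', cost)
print('optimal u1 as a function of z1:', policy1)
```

Even on this tiny example the point about complexity is visible: the second decision already ranges over 2 x 2 x 2 = 8 information vectors and 2^8 candidate maps, and both counts would grow exponentially with further periods.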