So, in the previous lecture we were discussing the problem of operating a machine that can be in one of two possible states. We get information about the machine through inspections, and we have to decide whether to continue operating it or to stop, inspect it thoroughly to determine its true state, and repair it if it is broken. What we will now do is model this problem as a stochastic control problem with imperfect state information; you can already see that it fits that framework quite well.

So, let us write out this model. First, the state: what should we take as the state of the system? Naturally, it is the quantity that determines the cost we incur, which here is really the state of the machine itself. So the state can be either P (the good state) or P bar (the bad state). The control space consists of two possible actions, which have already been specified for us: C (continue) and S (stop and repair).

Next, the system evolution. For this, let us go back to the figure we had. If we are in state P and we decide to continue, the machine simply keeps running, and we transition from state P to state P bar with probability 1/3 or remain in state P with probability 2/3. In other words, if we choose action C, that particular transition in the figure applies.
Similarly, if we are in state P bar and we continue to run the machine, we transition from P bar to P bar with probability 1. So, let us write that down as a state equation: the state at time k+1 is x_{k+1} = w_k, where w_k is a random variable; it is really just a placeholder that lets us describe the transition probabilities shown in the diagram. So we write P(w_k = P | x_k = P, u_k = C) = 2/3: if you took the action continue at time k while in state P, you remain in state P with probability 2/3. Likewise, P(w_k = P bar | x_k = P, u_k = C) = 1/3: under the same conditions you transition to state P bar with probability 1/3. Now, why did I introduce w_k here instead of writing x_{k+1} in terms of x_k directly? Because I wanted to write out an explicit state equation, the state dynamics. So these are my state dynamics: x_{k+1} = w_k, written almost trivially, where the noise w_k has the distribution just given. We can write something similar when we start in state P bar and continue.
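The state equation x_{k+1} = w_k can also be read as a recipe for simulating the machine one period at a time. Here is a minimal sketch in Python, assuming action C throughout; the function and variable names are my own, not from the lecture:

```python
import random

# One step of the dynamics x_{k+1} = w_k under action C (continue),
# sampling w_k from the transition distribution described above.
def step_continue(x, rng):
    if x == "P":
        # from the good state: stay good w.p. 2/3, break w.p. 1/3
        return "P" if rng.random() < 2 / 3 else "P_bar"
    # from the bad state, continuing keeps the machine bad w.p. 1
    return "P_bar"

rng = random.Random(0)
path = ["P"]
for _ in range(5):
    path.append(step_continue(path[-1], rng))
```

Notice that once the machine enters P bar it never leaves under C, which every sampled path reflects.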
So, writing u_k = C, the probability of transitioning from P bar to P, that is, from the bad state back to the good state, is P(w_k = P | x_k = P bar, u_k = C) = 0, as we have been told, and the probability that the machine remains in the bad state is P(w_k = P bar | x_k = P bar, u_k = C) = 1.

So far we have written the transition probabilities for the action continue. Now let us write them for the action stop and repair. To do this, we need to understand carefully what we mean by the state x_k: it is the state at the beginning of time period k. So time 0 is the beginning of the problem, then time 1, then time 2, and so on, and x_k denotes the state at the beginning of period k.

With this understanding, let us write the probability P(w_k = P | x_k = P, u_k = S) carefully. At the beginning of period k you are in state P and you choose to stop and repair. Since you are already in state P, the stop-and-repair operation has no effect; it keeps you in state P. The machine then runs for one period, and you land in state P again with probability 2/3, following the same probabilities as in the figure.
So, this probability is also 2/3, and similarly the probability of landing in state P bar, that is, of the machine going into the bad state after one run, is P(w_k = P bar | x_k = P, u_k = S) = 1/3, again the same as in the figure. You can see these two distributions are identical, and that is because continuing versus stopping and repairing makes no difference when the machine is in the good state P; the repair action has an effect only in the bad state.

So let us write that down: the probability that you transition to the good state from the bad state when you take the stop-and-repair action. You take action S at the beginning of period k while in the bad state P bar. This puts the machine back in state P; recall that the second action was to stop the machine, determine its state through an accurate diagnosis, and, if it is in state P bar, bring it back to state P. We assume all of this happens instantaneously at the beginning of the time period; in other words, we ignore the time elapsed during the repair. In that case it is as if you are starting from state P itself: although you were in P bar, the repair operation has taken you to P, and one more round of running the machine will take you either to P with probability 2/3 or to P bar with probability 1/3.
So P(w_k = P | x_k = P bar, u_k = S) = 2/3, and the probability of the machine again becoming broken is P(w_k = P bar | x_k = P bar, u_k = S) = 1/3. You can see once again that these two probabilities, in fact all three of these distributions, are identical, and with good reason, as I have just explained.

Now, since we do not know the initial state, we will assume a probability distribution for it, or equivalently a noisy observation of the initial state. We will assume the initial state is P with probability 2/3 and P bar with probability 1/3.

Next, what are the observations that we get? We write the observation equation as z_k = v_k, and since we receive these observations at the beginning of the first two time periods, this holds for k = 0 and k = 1. The observations are the results of the inspections. Going back to the figure: we have a state transition, and at the beginning of the first two periods we get the results of the inspections, good or bad. The inspections are noisy, so you may or may not be able to conclude the true state of the machine; you do not learn the true state unless you decide to stop and inspect after getting these observations.
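To summarize, the full transition model P(w_k | x_k, u_k) and the initial distribution can be collected into a small table. Below is a minimal sketch in Python using exact fractions; the variable names are my own, not from the lecture:

```python
from fractions import Fraction

P, P_BAR = "P", "P_bar"          # good and bad machine states
C, S = "C", "S"                  # continue, stop-and-repair

# trans[(state, action)][next_state] = P(w_k = next_state | x_k = state, u_k = action)
trans = {
    (P,     C): {P: Fraction(2, 3), P_BAR: Fraction(1, 3)},
    (P_BAR, C): {P: Fraction(0),    P_BAR: Fraction(1)},
    (P,     S): {P: Fraction(2, 3), P_BAR: Fraction(1, 3)},  # repair has no effect in state P
    (P_BAR, S): {P: Fraction(2, 3), P_BAR: Fraction(1, 3)},  # repair first resets the machine to P
}

# initial state distribution: P with probability 2/3, P bar with probability 1/3
init = {P: Fraction(2, 3), P_BAR: Fraction(1, 3)}

# sanity check: every conditional distribution sums to one
assert all(sum(d.values()) == 1 for d in trans.values())
assert sum(init.values()) == 1
```

The table makes the point of the last paragraph visible: three of the four rows are identical, and only the row (P bar, C) differs.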
So, the results of these inspections are simply the observations we get as part of the problem, captured by the observation equation z_k = v_k for k = 0 and 1. Now let me write out the probabilities with which these v_k occur. If the machine is in the good state, P(v_k = G | x_k = P) = 3/4 and P(v_k = B | x_k = P) = 1/4: when the machine is in good condition, the inspection answers good with probability 3/4 and bad with probability 1/4. Similarly, if the machine is in the bad or improper state, P(v_k = G | x_k = P bar) = 1/4 and P(v_k = B | x_k = P bar) = 3/4.

Now, finally, let us write out the costs that we will incur. We have three time instants, going 0, 1 and 2, hence two time periods, and there is no terminal cost at time 2. So all we have to do is minimize the cost over these two periods. The cost incurred is g(x_0, u_0), the cost at k = 0, plus g(x_1, u_1). Let us write out what kind of function this g is. We have been told that if you are in the good state and you decide to continue, continuing is costless: you incur no cost. If at any time you decide to stop and repair, then regardless of the state you incur a unit cost. That means g(P, S) = 1, which is also equal to g(P bar, S).
Now, if on the other hand you are in the bad state and you decide to continue running the machine, then you incur a cost of 2 units. If you recall what we had written earlier: there is a cost of 2 units for starting the period in state P bar, and a cost of 0 if you start in P. That is, running the machine in state P bar costs 2 units, running it in state P costs 0, and if at any time you stop and repair, choosing action S, you incur a cost of 1. We therefore have these four terms that define our cost: g(P, C) = 0, g(P bar, C) = 2, and g(P, S) = g(P bar, S) = 1.

Let us now finally write out the information we have at every time. The information at time 0 is simply the first observation: I_0 = {z_0}. Remember that z_0 is given through the observation equation z_k = v_k for k = 0 and 1, so we can find the probability of z_0 through the equations above. The information at time 1 is I_1 = {z_0, z_1, u_0}: it comprises the two inspection observations together with u_0, the action at time 0.
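The observation model and the one-period costs just described can likewise be written down as small lookup tables. Again a sketch with my own variable names, using exact fractions:

```python
from fractions import Fraction

P, P_BAR = "P", "P_bar"          # machine states
C, S = "C", "S"                  # actions
G, B = "G", "B"                  # inspection outcomes

# obs[state][outcome] = P(v_k = outcome | x_k = state):
# the noisy inspection reports the true condition with probability 3/4
obs = {
    P:     {G: Fraction(3, 4), B: Fraction(1, 4)},
    P_BAR: {G: Fraction(1, 4), B: Fraction(3, 4)},
}

# g[(state, action)] = one-period cost
g = {
    (P,     C): 0,   # running a healthy machine is free
    (P_BAR, C): 2,   # running a broken machine costs 2 units
    (P,     S): 1,   # stop-and-repair costs 1 unit regardless of the state
    (P_BAR, S): 1,
}
```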
So, what we want to do is minimize our total expected cost, which can be written as E[ g(x_0, mu_0(I_0)) + g(x_1, mu_1(I_1)) ], where u_0 is chosen as a function mu_0 of I_0 and u_1 as a function mu_1 of I_1. The expectation here is taken over x_0, w_0, w_1, v_0 and v_1, and we are minimizing over mu_0 and mu_1. So our policy comprises these two decision rules, mu_0 and mu_1. What we will do in the next class is solve this explicitly: we will substitute the different values of I_0 and I_1, write out the dynamic programming equations, and then compute what the optimal policy should be for this particular problem.
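As a quick sanity check on the model (not part of the lecture's derivation), we can already enumerate the expected cost of one simple fixed policy, namely always continuing and ignoring the observations; the optimal policy itself will be computed via dynamic programming in the next class. A sketch, with my own names:

```python
from fractions import Fraction
from itertools import product

P, P_BAR = "P", "P_bar"

init  = {P: Fraction(2, 3), P_BAR: Fraction(1, 3)}           # P(x_0)
trans = {P:     {P: Fraction(2, 3), P_BAR: Fraction(1, 3)},  # P(w_0 | x_0, u_0 = C)
         P_BAR: {P: Fraction(0),    P_BAR: Fraction(1)}}
g = {P: 0, P_BAR: 2}                                         # cost of continuing in each state

# E[g(x_0, C) + g(x_1, C)] with x_1 = w_0, enumerated exactly over all state paths
expected = sum(init[x0] * trans[x0][x1] * (g[x0] + g[x1])
               for x0, x1 in product((P, P_BAR), repeat=2))
print(expected)  # 16/9
```

Always continuing costs 16/9 in expectation; a policy that reacts to the inspection results can only do at least as well, which is what the dynamic programming solution will quantify.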