Hello, everyone. In the previous lecture, we were solving the secretary problem using the dynamic programming equation. If you recall, we had written out the dynamic programming equation separately for steps t < n and for t = n. For t = n, we wrote J_n(1) = H(1) = 1. Remember, 1 is the state in which the present candidate is the best you have seen so far. Similarly, J_n(0) = H(0) and J_n(Δ) = H(Δ), where Δ is the terminated state; since H(0) and H(Δ) are both 0, we concluded that J_n(0) = J_n(Δ) = 0 at the last time step n.

For t < n, we wrote that J_t(1), the reward-to-go or value function at time t in state 1, is the maximum of two quantities: the stage-wise reward plus the expected reward-to-go from continuing, which is what we get from action C, and the corresponding quantity from action Q. Working through those expressions, we eventually arrived at

J_t(1) = max{ t/n, J_t(0) },
J_t(0) = (1/(t+1)) J_{t+1}(1) + (t/(t+1)) J_{t+1}(0).

These two equations, together with the boundary condition J_t(Δ) = 0 for all t, give us a recursion that we need to solve in order to get the value function at each time.

One of the other things we were able to conclude was that if you are in state 0, that is, if the present candidate is not the best you have seen so far, then it is optimal to continue: there is still some chance that you will get a better candidate, so you continue to interview further candidates. In state 1 the situation is a bit more involved: if t/n > J_t(0), the optimal action is to stop, that is, to quit; if t/n < J_t(0), the optimal action is to continue; and if t/n = J_t(0), either action is optimal.

What this suggested was that an optimal policy has a certain form, namely: observe the first τ candidates, and then select the first one who is better than all the previous ones. Up until you hit time τ, you simply keep interviewing candidates one after the other; thereafter, you wait until you see a candidate who is the best you have seen so far, and at that moment you stop the search and make an offer to that candidate. Articulated as a policy, π* comprises μ*_1, ..., μ*_n, where in state 0 at any time t the optimal action is to continue, both for t < τ and for t > τ.
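Since the lecture states this recursion only verbally, here is a minimal numerical sketch of it in Python (the function and variable names are my own, not from the lecture): it computes J_t(1) and J_t(0) backwards from t = n and reads off the optimal action in state 1 at each time.

```python
# Backward dynamic programming for the secretary problem.
def secretary_dp(n):
    J1 = [0.0] * (n + 1)   # J1[t] holds J_t(1), for t = 1, ..., n
    J0 = [0.0] * (n + 1)   # J0[t] holds J_t(0)
    J1[n] = 1.0            # boundary: J_n(1) = H(1) = 1, J_n(0) = 0
    action = {}            # optimal action in state 1 at each time t
    for t in range(n - 1, 0, -1):
        # continuing: the next candidate is best-so-far w.p. 1/(t+1)
        J0[t] = J1[t + 1] / (t + 1) + t * J0[t + 1] / (t + 1)
        J1[t] = max(t / n, J0[t])
        action[t] = 'Q' if t / n > J0[t] else 'C'
    return J1, J0, action

J1, J0, action = secretary_dp(10)
print(action)  # 'C' for small t, switching to 'Q' past the threshold
```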
So, whenever you reach state 0, regardless of what t is, you just continue. In state 1, that is, when the candidate you have just seen is the best so far, you continue only so long as t ≤ τ. So, for t ≤ τ you continue in either state, 0 or 1, whereas for t > τ you continue only in state 0 and quit in state 1: once time exceeds τ, you wait for the moment where the candidate you are seeing is the best you have seen so far, and then you just make an offer.

Now, this was not a proof. It simply suggests that this could be an optimal policy, but it needs a proof. So, what we will do in today's lecture is actually prove that the optimal policy does take this form, and moreover get an idea of what the value of τ would be in terms of n. Let us now sort this out.

The first thing we need to ask is: is the optimal policy really of such a form? For this, let us observe a few things. What we will prove is that if it is optimal to continue in state 1 at a certain time τ, then it is optimal to continue in state 1 at all times before τ. Remember, the optimality of continuing is only in question in state 1; in state 0 we have already established that it is optimal to continue. That came about when we saw that, in state 0, the reward from quitting is 0 while the value of continuing is greater than or equal to 0, so continuing is always optimal there.

What we will now show is that the optimal policy has the property that if it is optimal to continue in state 1 at some time τ, then it is optimal to continue in state 1 at all times t < τ. In other words, if you are continuing to search when in state 1 at time τ, then you ought to have continued the search at all earlier times as well, in either state 0 or state 1, up until time τ.

To show this, let us assume that either J_τ(1) > τ/n, or J_τ(1) = J_τ(0) = τ/n. Looking at the equation J_t(1) = max{ t/n, J_t(0) } with t = τ: in the first case τ/n < J_τ(1), which forces J_τ(1) = J_τ(0), so J_τ(0) > τ/n; in the second case J_τ(0) = τ/n. Effectively, whenever either of these two cases holds, we have τ/n ≤ J_τ(0), and in each case it is optimal to continue.
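Expressed as a small piece of Python (a sketch of my own; the name mu_star and the encoding of actions as 'C'/'Q' are not from the lecture), the conjectured threshold policy is simply:

```python
# Conjectured threshold policy: always continue in state 0; in state 1,
# continue through time tau and quit from tau + 1 onward.
def mu_star(t, state, tau):
    if state == 0:
        return 'C'               # state 0: keep searching at every time
    return 'C' if t <= tau else 'Q'
```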
So, the action C is optimal at time τ in state 1; that is what our assumption gives us. Now, let us go back to the expression we have for J_t(0) and write it out for t = τ - 1:

J_{τ-1}(0) = (1/τ) J_τ(1) + ((τ-1)/τ) J_τ(0).

Let us analyze the right-hand side of this expression, recalling the assumption: either J_τ(1) > τ/n, or J_τ(1) = J_τ(0) = τ/n. If J_τ(1) > τ/n, then since J_τ(1) is the maximum of τ/n and J_τ(0) and it is strictly greater than τ/n, it must equal the second quantity, J_τ(0). So in either case we conclude that J_τ(1) = J_τ(0). Substituting this into the expression above collapses it:

J_{τ-1}(0) = (1/τ) J_τ(1) + ((τ-1)/τ) J_τ(1) = J_τ(1) = J_τ(0).

But we assumed J_τ(1) is either greater than τ/n or equal to τ/n; in other words, J_τ(1) ≥ τ/n, which in turn is strictly greater than (τ-1)/n. Putting these together, J_{τ-1}(0) > (τ-1)/n; in fact, we will not need the strict version, and will simply use J_{τ-1}(0) ≥ τ/n.

Now let us write out J_{τ-1}(1):

J_{τ-1}(1) = max{ (τ-1)/n, J_{τ-1}(0) }.

Comparing the two terms, J_{τ-1}(0) ≥ τ/n while the other term is (τ-1)/n, so the maximum is always the second term. Consequently, J_{τ-1}(1) = J_{τ-1}(0) ≥ τ/n > (τ-1)/n.

What is this basically telling us? It tells us that if either case of our assumption holds at τ, then the first case, J_t(1) > t/n, holds at τ - 1. So we can now apply the same logic once again for τ - 2, then again for τ - 3, and so on, and in each case we get that it is optimal to continue. Thus, if it is optimal to continue at time τ in state 1, then it is optimal to continue at all t < τ.
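This backward-propagation property can also be checked numerically, reusing the secretary_dp sketch from earlier (again a construction of my own, not from the lecture): for each n, the set of times at which continuing is optimal in state 1 should be an unbroken initial segment {1, ..., τ}.

```python
# Check the threshold structure: no continue-quit-continue patterns.
for n in range(2, 200):
    _, _, action = secretary_dp(n)
    continue_times = [t for t in range(1, n) if action[t] == 'C']
    if continue_times:
        tau = max(continue_times)
        assert continue_times == list(range(1, tau + 1)), n
print("threshold structure holds for n = 2, ..., 199")
```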
So, in other words, the optimal policy cannot have the form where you continue for some time, quit, and then continue again for some more time: if you are continuing in state 1 at a certain time, then you ought to have been continuing up until that time as well. That is what we have concluded so far. Now let us take this further and see what else we can say.

Let me first write out the form of the policy. The optimal policy has the form

μ*_τ(1) = C implies μ*_t(1) = C for all t < τ, and μ*_t(0) = C for all t.

So, when you are in state 0 you continue, and when you are in state 1 you continue up until time τ. As a consequence, we also find a few things about the value function, which can be easily calculated: J_t(1) = J_t(0) for all t ≤ τ, and in fact

J_1(0) = J_1(1) = J_2(0) = J_2(1) = ... = J_τ(0) = J_τ(1).

In other words, the value function is the same in each state and at each time up until time τ.

Now, what happens after time τ? After time τ it becomes optimal to stop when you are in state 1, so J_t(1) = t/n for all t > τ. We can also find what J_t(0) would be for t ≥ τ: substituting J_{t+1}(1) = (t+1)/n into the recursion,

J_t(0) = (1/(t+1)) · ((t+1)/n) + (t/(t+1)) J_{t+1}(0) = 1/n + (t/(t+1)) J_{t+1}(0).

This is the recursion for J_t(0), and we can work it out even further by substituting backwards, remembering that J_n(0) = 0. Doing so gives

J_t(0) = (t/n) ( 1/t + 1/(t+1) + ... + 1/(n-1) ).

Now, what would the value of τ be? Remember, τ is the largest time up to which the equality of the two value functions holds; that is, τ is the largest value of t for which J_t(1) and J_t(0) coincide. For those two to be equal, going back to our calculation, it is necessary that J_t(0) is greater than t/n.
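To make this concrete, here is a small sketch (my own, not from the lecture) that computes τ directly as the largest t for which the harmonic sum 1/t + ... + 1/(n-1) exceeds 1, and evaluates the closed form for J_t(0) at t = τ.

```python
# tau is the largest t with 1/t + 1/(t+1) + ... + 1/(n-1) > 1.
def tau_exact(n):
    s = 0.0
    for t in range(n - 1, 0, -1):  # accumulate the harmonic tail
        s += 1.0 / t               # s = 1/t + ... + 1/(n-1)
        if s > 1.0:
            return t               # first (i.e. largest) t where s > 1
    return 1

n = 100
t0 = tau_exact(n)
# closed form J_t(0) = (t/n) * (1/t + ... + 1/(n-1)) at t = tau
J_tau_0 = (t0 / n) * sum(1.0 / k for k in range(t0, n))
print(t0, J_tau_0)  # for n = 100 this gives tau = 37
```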
So, in other words, τ is the largest value of t for which J_t(0) > t/n. We have just calculated an expression for J_t(0); putting it in, we conclude that τ is the largest value of t such that

1/t + 1/(t+1) + ... + 1/(n-1) > 1.

Now, how can we estimate this expression? Remember, it is a function of n, because the sum runs up to n - 1. We can approximate it by an integral: writing τ = τ(n) to emphasize the dependence on n,

1/τ(n) + ... + 1/(n-1) ≈ ∫ from τ(n) to n of (1/x) dx = log( n / τ(n) ),

where log is the natural logarithm. We want this quantity to be greater than 1, so we ask: what is the value of τ(n) for which this logarithm becomes approximately equal to 1? We find that τ(n) = (1/e) n, and this approximation becomes better and better as n tends to infinity.

In other words, as the number of candidates becomes large, the optimal thing to do is to keep continuing until you have seen a fraction 1/e of them. Since 1/e = 1/2.718... ≈ 0.368, this is roughly 36.8 percent, or a little over a third: you keep continuing until you have seen roughly the first third of the candidates, and after that you make an offer to the best that you have seen so far.

So, what this tells us is that the optimal thing for a decision maker to do in this kind of stochastic control problem, when the number of candidates is large, is to observe the first 36.8 percent of the candidates and then make an offer to the best candidate seen up to that time. If you recall, I had mentioned at the start of the optimal stopping problem that there is an element of exploration involved in deciding what the optimal action is, and at the same time one has to be timely. That is exactly what is being seen here: you make the offer when you see the best candidate so far, but that does not mean you hurry; you need to explore for about the first 36.8 percent of the queue. So, this concludes our analysis of the optimal stopping approach to the secretary problem.
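As a sanity check on the 36.8 percent rule, here is a small Monte Carlo sketch (my own construction, not part of the lecture): it simulates the policy "skip roughly the first n/e candidates, then take the first best-so-far" on random permutations and estimates the probability of selecting the overall best, which should come out near 1/e ≈ 0.368.

```python
import math
import random

# Simulate the 1/e rule and count how often it picks the overall best.
def simulate(n, trials=100_000):
    tau = int(n / math.e)              # observe roughly the first 36.8%
    wins = 0
    for _ in range(trials):
        ranks = list(range(n))         # rank n-1 is the best candidate
        random.shuffle(ranks)
        best_seen = max(ranks[:tau]) if tau > 0 else -1
        chosen = None
        for r in ranks[tau:]:
            if r > best_seen:          # first best-so-far after time tau
                chosen = r
                break
        wins += (chosen == n - 1)
    return wins / trials

print(simulate(100))  # comes out near 1/e, i.e. about 0.37
```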
Notice that here, too, all we have been able to get is an approximate answer; we do not actually have the exact form of the optimal policy, and the reason is that the problem is harder to solve in closed form. Of course, one can attempt to solve it numerically in one way or another, but it is the approximation above that gives us the insight. In the next lecture, we will look at another stochastic control problem in which, remarkably, we are actually able to solve the problem in closed form, that is, to obtain the form of the policy in closed form; that problem is something that is very widely applied across all of engineering. So, that is coming up in the next lecture.