So, we are now going to cover one last topic, which is a generalization of our elementary renewal theorem, called the renewal reward theorem. In the setup of the elementary renewal theorem, we treated all cycles as having the same value: each return from state j back to state j completed one cycle, and all cycles counted equally. But in some applications, while going from state j back to state j, you could be accruing some reward or some cost in between. For example, in the battery case, when your battery went from full charge to full discharge, that completed one cycle, and you may be interested in how much work that battery did in the meantime. Suppose your battery is powering an electric vehicle. While the battery discharges from full to zero, the vehicle covers some distance, and that distance is a reward for you. That reward could be random: each time you charge the battery and run it down to zero, the vehicle may take different routes with different levels of traffic, so the distance traveled is naturally modeled as a random variable rather than a deterministic quantity.
Because of that, it is not necessary that the vehicle travels the same distance in every full-charge-to-full-discharge cycle; it could well be stochastic. In that case you may be interested in the average reward accumulated per unit time; in this example, the average total distance traveled. How do we characterize such things? Such questions can be answered with a similar setup, but we have to bring in the notion of a reward associated with each cycle. Let us formalize that. Consider a renewal process. Notice that now I am going to consider an IID sequence X_n without distinguishing the first cycle from the rest; all cycles are alike. Associated with each cycle X_j (what we earlier called lifetimes, I will now call cycles) is a reward R_j, and the rewards R_j are also IID. The reward may depend on the length of its own cycle: R_n could be a function of X_n, but it is independent of the other cycles and the other rewards. If we combine the pair into one quantity, say W_n = (X_n, R_n), then (W_n) is again an IID process: each cycle carries two things, the length of the cycle and the associated reward.
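As a concrete illustration of this setup, here is a minimal simulation sketch. The distributional choices are hypothetical (they are not specified in the lecture): battery lifetimes X_n are taken exponential with mean 2, and the reward R_n (distance driven) is proportional to the lifetime with some multiplicative noise, so R_n depends on its own X_n but the pairs (X_n, R_n) are IID across cycles.

```python
import random

def sample_cycle(rng):
    """Draw one (X_n, R_n) pair: cycle length and its reward.

    Hypothetical model: lifetimes are Exponential with mean 2.0, and
    the reward (km driven) is proportional to the lifetime with
    multiplicative noise, so R_n is a function of X_n, but the pairs
    (X_n, R_n) across cycles are IID.
    """
    x = rng.expovariate(1 / 2.0)         # cycle length X_n, mean 2.0
    r = x * rng.uniform(0.8, 1.2) * 30   # reward R_n, e.g. km driven
    return x, r

rng = random.Random(0)
cycles = [sample_cycle(rng) for _ in range(5)]
```

Each element of `cycles` is one W_n = (X_n, R_n) pair from the renewal reward process.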
Now I am interested in the accumulated reward. In the electric vehicle example, if I run my vehicle up to some time t, what is my accumulated reward by that time? There, R_n is the total distance traveled while the battery lasted for the duration X_n. So how do we define the total reward over a time t? For instance, suppose you want the reward after t = 10 days. One natural way is to count how many cycles have completed in those 10 days, that is, how many times you replaced the battery, and add up the rewards from each of those cycles. But you have to be a bit more specific: on the 10th day the current battery may have been replaced just yesterday and may still be running. One possibility is to count only the reward accumulated up to the completion of the previous cycle; alternatively, you could say the reward is obtained when a cycle starts, or when it ends. In the electric vehicle case, when you put a fully charged battery in the vehicle, you have earned nothing yet at that moment; only when the battery is exhausted do you look at how much distance was covered, so there the reward is earned at the end of the cycle. In other cases you might earn the reward as soon as the cycle starts. Suppose the reward is obtained at the beginning of each cycle. Then the total reward accumulated up to time t can be defined as C(t) = sum of R_n for n = 1 to m(t) + 1.
Here m(t) denotes the number of cycles completed in the interval [0, t]. The next cycle may already have started by time t, and since its reward is earned the moment the cycle starts, that reward must be included too; that is why the sum runs up to m(t) + 1. If instead the reward is obtained only at the end of each cycle, the total reward is C(t) = sum of R_n for n = 1 to m(t), because exactly m(t) cycles have been completed by time t and I should count only the cycles that finished within [0, t]. In contrast, with rewards earned at the beginning of a cycle, I must account for every cycle that has begun within [0, t]. So these are two possibilities; depending on the application, you can define it differently. There is also a boundary case: if a cycle ends exactly at time t, the next cycle starts at that same instant, so the beginning of the (m(t) + 1)-th cycle happens at time t and, under the beginning-of-cycle convention, its reward is included.
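The two conventions above can be sketched in a few lines. This is an illustrative helper (the function name and interface are my own, not from the lecture): given a list of (X_n, R_n) pairs, it sums either the first m(t) rewards (end-of-cycle convention) or the first m(t) + 1 rewards (beginning-of-cycle convention).

```python
def total_reward(cycles, t, reward_at="end"):
    """Accumulated reward C(t) for a list of (X_n, R_n) pairs.

    reward_at="end":   C(t) = sum_{n=1}^{m(t)} R_n, where m(t) is the
                       number of cycles completed by time t.
    reward_at="start": C(t) = sum_{n=1}^{m(t)+1} R_n, also counting the
                       cycle in progress at time t, whose reward was
                       earned at its start.
    Assumes the cycles listed cover the interval [0, t].
    """
    elapsed, reward = 0.0, 0.0
    for x, r in cycles:
        if elapsed + x <= t:        # this cycle finishes by time t
            elapsed += x
            reward += r
        else:                       # cycle still in progress at time t
            if reward_at == "start":
                reward += r         # the (m(t)+1)-th reward
            break
    return reward

cycles = [(2.0, 10.0), (3.0, 15.0), (1.0, 5.0)]
# By t = 4: one cycle completed (m(t) = 1), the second still running.
print(total_reward(cycles, 4.0, "end"))    # 10.0
print(total_reward(cycles, 4.0, "start"))  # 25.0
```

Note how the boundary case works out: a cycle ending exactly at time t satisfies `elapsed + x <= t`, so it is counted as completed, and under the "start" convention the next cycle's reward would then be added on the following iteration.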
That is a boundary case; most of the time one cycle will end strictly before t and the next will run past it, and the definition is unambiguous. Only when a cycle ends exactly at time t does the convention matter, and by our definition that cycle is included. Now that I have defined the total reward up to time t, I want the average. C(t) is the accumulated reward up to time t, and dividing it by t gives the average reward per unit time up to time t. You can imagine many applications where this fits. For example, suppose you run a factory with machines that need a lot of maintenance, and you want them working, because when they stop, your production stops. There are some critical machines in your operation that you do not want to be down at any time. So you operate a critical component until it fails, and as soon as it fails you replace it with a new one and operation resumes. While a component is operating, it produces something, and that production is a monetary benefit for you: say it produced 100 items during its lifetime, then those 100 items are the reward for that cycle. When it fails and you replace it, the next component produces another batch of items before it fails, and that is the reward of the next cycle.
Now what you want to see is the total number of items produced up to time t, and in particular the average number of items produced per unit time. This gives you a sense of how your maintenance and operations, through the cycle length of the critical component, affect your output. So we want to analyze the limit of this average reward. You might have noticed that throughout these studies we happily let t go to infinity, or n go to infinity. Why is that? Because finding this value at a finite t is very hard; it can be done, but it needs more sophisticated machinery than this course develops. With the tools we have, we can only work in what we call the limiting, or asymptotic, regime. That is still not bad, because it gives us real intuition about what is happening. Now comes the theorem. When (X_n, R_n) is an IID process, we call it a renewal reward process.
So, when there is a reward associated with the process, we call it a renewal reward process. The theorem says: if (X_n, R_n) is an IID process with E[|R_1|] finite and E[X_1] finite, then, as t goes to infinity, C(t)/t converges to E[R_1]/E[X_1] with probability 1. (Any guesses before proving? Yes, E[R_1]/E[X_1] is the natural guess; that is why we ask you to prove or disprove, not just guess.) And if you take expectations, the same limit holds: E[C(t)]/t also converges to E[R_1]/E[X_1]. So the long-run average reward equals the expected reward per cycle divided by the expected length of one cycle; because the process is IID, E[X_1] is in fact the expected length of every cycle. Now, does this renewal reward theorem imply our earlier elementary renewal theorem? Yes: take R_n = 1 for every n. Then C(t) is simply m(t) (or m(t) + 1, depending on the convention), so C(t)/t converges to 1/E[X_1], which is exactly the elementary renewal theorem. In other words, earlier we implicitly took the reward in each cycle to be 1, but the reward could be something else, and it could be stochastic; that is why this is a generalization of the elementary renewal theorem. Does the result itself make sense? C(t) is the total reward accumulated up to time t, and you are dividing it by the total time t.
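The theorem is easy to check numerically. The following sketch uses a hypothetical model of my own choosing: X_n exponential with mean 2 and R_n = 3 X_n (reward earned at the end of each completed cycle), so the theorem predicts C(t)/t tends to E[R_1]/E[X_1] = (3 * 2)/2 = 3.

```python
import random

def long_run_average(t_max, seed=0):
    """Simulate one path of a renewal reward process up to time t_max
    and return C(t_max) / t_max.

    Hypothetical model: X_n ~ Exponential(mean 2), R_n = 3 * X_n, so
    the renewal reward theorem predicts C(t)/t -> E[R_1]/E[X_1] = 3.
    """
    rng = random.Random(seed)
    elapsed, reward = 0.0, 0.0
    while True:
        x = rng.expovariate(1 / 2.0)   # next cycle length X_n
        if elapsed + x > t_max:
            break                      # cycle still in progress at t_max
        elapsed += x
        reward += 3 * x                # reward at the end of each cycle
    return reward / t_max

print(long_run_average(1e5))  # approximately 3.0
```

Since the reward of the one incomplete cycle at time t_max is a bounded leftover divided by t, it vanishes in the limit, which is why the two summation conventions give the same long-run average.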
That is the same as saying you can focus on a single cycle and look at the reward in that cycle. The ratio E[R_1]/E[X_1] is also an average reward per unit time: the numerator is the expected reward per cycle and the denominator is the expected cycle length. So both sides measure expected reward per unit time, one over the whole horizon t and one over a single cycle; because the cycles are IID, a single cycle carries all the information. Now apply this kind of result to the manufacturing plant example. The total reward up to time t is a random quantity, but once you know the expected reward per cycle and the expected cycle length, you can directly get the long-run average. And if you want more reward, you can either design your plant so that you get more reward per cycle, or design it so that the cycle time is smaller. That is all I wanted to say about the renewal process and the renewal reward theorem. There are other aspects covered in the book that you can read: for example, we defined the age process and the residual time process, and one can analyze those processes as well. Their properties are given in the book, and they can be derived with the same ideas we discussed in class; you can work through them yourself. We will stop here.