Okay, next let us move on to functions of random variables. I do not know how many of these points I will be able to cover today, but let us try to cover as much as possible because, as I said, all of this you would also have learnt in your IE621 course. We will now consider functions of random variables. A random variable, as I said, is itself a function: X is a function from the sample space Omega to R, the real line. But now we want to consider a function of this random variable itself. In fact, you have already seen one version of this: we defined Y = X - E[X]. Can we think of Y as a function of X? Yes, here we are basically subtracting a constant amount; that is one example. You may be interested in other functions, maybe Y = X squared, maybe Y = X cubed, all functions of random variables. Now let us take g to be any function and define Y = g(X). The underlying sample space remains the same: the new random variable is still defined on the same Omega. So Y is again a mapping from Omega to R, and what Y = g(X) means is that the new random variable at any omega is g(X(omega)). First I compute X(omega), and then I apply my function g. So this is basically a composition of functions: I take every point of my space Omega, first map it onto the real line using the random variable X, and then from there map it further using the function g. That is why g is applied at the point X(omega); it is a composition of two functions. This is very useful, especially when you do optimization in machine learning.
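The composition Y = g(X) described above can be sketched in code; the coin-toss sample space, the choice of X, and the function g here are all illustrative examples, not taken from the lecture.

```python
# A minimal sketch of Y = g(X) as a composition of maps on a toy
# sample space; all names here are illustrative.

omega = ["HH", "HT", "TH", "TT"]   # sample space Omega: two coin tosses

def X(w):
    """Random variable: number of heads in outcome w."""
    return w.count("H")

def g(x):
    """An arbitrary function applied to the value of X."""
    return x ** 2

def Y(w):
    """New random variable Y = g(X): first compute X(w), then apply g."""
    return g(X(w))

print([Y(w) for w in omega])   # [4, 1, 1, 0]
```

Note that Y is defined on the same sample space Omega as X; only the mapping into the real line has changed.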
You have to use such random variables and their functions. One example is the absolute error: let us say X is the quantity you measure, which could be positive or negative, but you do not care about the sign of the error; what bothers you is how far from the target you are. There you may want to consider the absolute value. Like in the previous example: you have some target here and you want to hit it exactly, but you may fall short or you may go beyond the point. What matters is how far you are from this point; whether you are undershooting or overshooting does not matter, and in that case you want to consider the absolute value. In machine learning terminology, that is your absolute error. There is another function called the hinge loss that we use in machine learning, where you truncate the negative values: you take max(0, x). Suppose I plot this. Say I have a line through the origin with a 45-degree slope, so the slope of this line is 1; here y is simply x, which is why it is a straight line. But I may want to truncate the negative values, so I redefine my function as y = max(0, x). In this case the negative part gets truncated, all of it becomes 0, and the graph is flat at 0 on the left and the 45-degree line on the right. So whenever you want to discard the negative values of X, you just put a max, and you can again think of Y as some function of X. Other simple things we will consider are linear functions, maybe Y = aX + b for some constants a and b. Now you see the motivation, right?
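The three transformations mentioned above can be applied pointwise to sampled values; the sample values and constants below are made up for illustration.

```python
# Sketch of the transformations from the lecture, applied pointwise
# to hypothetical realizations of X.

xs = [-2.0, -0.5, 0.0, 1.5, 3.0]        # hypothetical realizations of X

abs_err = [abs(x) for x in xs]          # absolute error |X|
hinge   = [max(0.0, x) for x in xs]     # hinge-style truncation max(0, X)

a, b = 2.0, 1.0                          # illustrative constants
linear  = [a * x + b for x in xs]       # linear function aX + b

print(hinge)    # [0.0, 0.0, 0.0, 1.5, 3.0] -- negatives truncated to 0
```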
So in certain cases I may have to deal with functions of random variables. There is an underlying random variable X, but I may not be interested in X itself, only in some function of it, and at that time I have to deal with such functions. Now suppose you have X and you know its PMF p(x) if it is discrete, or its pdf f(x) if it is continuous, as the case may be; you may also have its CDF F(x). You have defined a new random variable Y, and now I want to come up with the CDF of this new random variable, or its PMF, or, if it is continuous, its pdf. How do I find this based on what I know about X? For that we will simply use the relationships we already know. Suppose Y is my new random variable, which is a function of X, and I want to find its CDF. I take some value y and try to find the CDF of Y at the point y. This is nothing but the probability that Y is less than or equal to y, by definition, and now I replace Y by g(X). The condition g(X) <= y is the set of all those omegas such that X(omega), passed through the function g, is less than or equal to y. This is an event, and we have always defined probability on events. So the condition has been translated into an event, and on that event I know how to compute my probability. Through that I have the CDF of my new random variable Y. Now, for example, take a discrete random variable X; I am particularly focusing on the discrete case first, and later I will come to the continuous case.
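The recipe above, translating the condition g(X) <= y into an event on X and computing its probability, can be sketched for a discrete X; the PMF and the choice g(x) = x squared below are toy assumptions.

```python
# Sketch: F_Y(y) = P(g(X) <= y), computed directly from the PMF of X
# by summing over the event {x : g(x) <= y}. Toy PMF, Y = X^2.

p_X = {-2: 0.25, -1: 0.25, 0: 0.25, 1: 0.25}   # hypothetical PMF of X

def g(x):
    return x ** 2

def F_Y(y):
    """CDF of Y = g(X): add p_X(x) over all x with g(x) <= y."""
    return sum(p for x, p in p_X.items() if g(x) <= y)

print(F_Y(1))   # P(X^2 <= 1) = P(X in {-1, 0, 1}) = 0.75
```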
In the discrete case, how do we do this? We know that the probability of Y taking some value y is P(Y = y), and Y is nothing but g(X), so I have replaced Y by g(X). Now notice that this function g need not be one-to-one. For example, take the points in the domain and the range, with g as the mapping between them; it may happen that two different points go to the same value. So when I say g(X) = y, there could be two realizations of X that map to the same y. Because of that, when I have to deal with this, I need to take all those x's which get mapped to y and add their probabilities. Is that clear to all of you? If these two points are getting mapped to this y, then to compute the probability of this y I need to add the probability of this value as well as that value, and that is what we are doing here. Similarly, in the continuous case, I want to find the pdf of Y. But let me first ask one simple question. Suppose X is continuous, Y = g(X), and g is some arbitrary function. Is Y also continuous, or can it be discrete? It can be either; that depends on the function g. It may happen that g is such that if the value of X is above 0 I take it as 1, and if the value of X is below 0 I take it as minus 1. In that case Y takes only two values, 1 and minus 1, so Y is discrete even though X is continuous. So depending on the situation, you compute the CDF of Y and then accordingly go for its PMF or its pdf.
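The summing-over-the-preimage step can be sketched as follows; the PMF is a toy assumption, and g(x) = x squared is chosen precisely because it is not one-to-one.

```python
# Sketch: PMF of Y = g(X) for discrete X, adding the probabilities of
# all x's that map to the same y (g need not be one-to-one).

from collections import defaultdict

p_X = {-1: 0.2, 0: 0.5, 1: 0.3}    # hypothetical PMF of X

def g(x):
    return x ** 2                   # not one-to-one: g(-1) == g(1)

p_Y = defaultdict(float)
for x, p in p_X.items():
    p_Y[g(x)] += p                  # accumulate the mass of the preimage

# y = 1 collects the mass of both x = -1 and x = 1: 0.2 + 0.3
print(dict(p_Y))
```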
So in the continuous case, if you want to find the pdf of Y, what you do is this: you know that F_Y(y) is P(Y <= y), and you find this value through the relations we had. Then, if F_Y is differentiable at the point y, you differentiate it, and this gives you f_Y(y); this works provided F_Y is differentiable at that point. If it is not differentiable there, you cannot compute the pdf at that point, fine. Next, suppose you have the random variable X and this random variable Y = g(X). How do we compute the expectation of Y? You know the expectation of X: it is nothing but the integral of x times f_X(x) dx, from minus infinity to plus infinity. By definition, the expectation of Y is the integral of y times f_Y(y) dy, from minus infinity to plus infinity. So if I want to compute this, first I have to know the pdf of Y, and then I can find it. But that is not always necessary. What we can do instead is just replace y by g(x): the mass of g(X) is already accounted for by f_X, so you do not need f_Y at all; you can just use f_X here, integrating g(x) times f_X(x) dx, and this will still give you the expectation of Y. What is the good thing about this? I only need to know g, and I do not need to go back and compute f_Y, because computing f_Y is itself some task, right? As long as I know f_X, I can simply do this integration and get the expectation of Y. The expression looks almost like the one for the expectation of X: all I did was replace x by g(x), and I did not change the density of X.
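This shortcut, integrating g(x) against the density of X instead of deriving f_Y, can be checked numerically; the choice X ~ Uniform(0, 1) with g(x) = x squared is an illustrative assumption, where the exact answer is 1/3, and the crude midpoint Riemann sum stands in for the integral.

```python
# Sketch of LOTUS: E[Y] = integral of g(x) * f_X(x) dx, with no need
# to compute f_Y. Checked for X ~ Uniform(0, 1), g(x) = x^2, E[Y] = 1/3.

def f_X(x):
    return 1.0 if 0.0 <= x <= 1.0 else 0.0   # Uniform(0, 1) density

def g(x):
    return x ** 2

n = 100_000
dx = 1.0 / n
# Midpoint Riemann sum of g(x) * f_X(x) over [0, 1].
E_Y = sum(g((i + 0.5) * dx) * f_X((i + 0.5) * dx) * dx for i in range(n))

print(round(E_Y, 4))   # close to 1/3
```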
Because of that, this is sometimes called the law of the unconscious statistician: you simply change x to g(x) without worrying about whether you have actually handled f_Y correctly or not, but you still get the correct answer; while being "unconscious" you are getting the right thing. That is why it is also called LOTUS, okay. So you will see that in probability you do not need to compute everything repeatedly: if you have some function Y of X, you do not need to go and repeat the whole story, finding its PMF or pdf, just to compute its mean or variance. It is enough to know X, and you just use some properties.