So, in the last class, I talked about what a discrete random variable is and what a continuous random variable is, I introduced the notion of the probability density function for a continuous random variable, and then we talked about the expectation of a random variable. Moving on, the expectation is one characterization of your random variable: it tells you, in a sense, what outcome I am going to see on average when I perform this experiment. Another question you could ask is: okay, fine, this is the average value, but when I do the experiment, it is not as if I am going to see exactly the expected value; I am going to see different realizations of the random variable. How far away from the mean are these samples going to be? The mean gives an average characterization, but how far could the actual outcomes be from it? Will they be very close to the mean value or very far? To characterize this, we have another notion called variance.

So, as you see here, I am defining the variance to be the expectation of the squared deviation of X from its mean, Var(X) = E[(X - E[X])^2]. I am centralizing the random variable, that is, removing the mean value from it, so I am looking at the variation of X around its mean; and I do not want to care whether this variation happens on the right side or the left side of the mean, which is why I take the square and then look at the expected value. This is called the variance of the random variable. Notice that what you remove from X is a constant, the mean value, so X - E[X] is another random variable, what we call the centered random variable, and we have squared it and taken the expectation. If you just expand this, you will see that Var(X) = E[X^2] - (E[X])^2: the variance is the second moment minus the square of the mean. We call E[X^2] the second moment because the first moment is the simple expectation E[X]; in general, the m-th moment is E[X^m], and we will come to that. Now, if I give you a random variable and ask you to find the mean, that mean can be positive or negative. What about the variance? Being the expectation of a square, it is always going to be a non-negative quantity.
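As a quick aside, here is a minimal numerical sketch, not part of the lecture itself, that estimates these quantities from samples; the exponential distribution and the sample size are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Samples of a random variable X; the exponential distribution
# here is purely an illustrative assumption.
x = rng.exponential(scale=2.0, size=1_000_000)

mean = x.mean()                # first moment, E[X]
second_moment = (x**2).mean()  # second moment, E[X^2]

var_centered = ((x - mean) ** 2).mean()  # E[(X - E[X])^2]
var_moments = second_moment - mean**2    # E[X^2] - (E[X])^2

print(mean, var_centered, var_moments)
```

Both variance estimates agree (close to 4.0 for this scale parameter), and the variance comes out non-negative even though the centered samples x - mean take both signs.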
So now, we have seen the mean and the variance, and one of you asked about exactly this: the mean looks like some average quantity, but that is not what I am going to get when I perform the experiment. The mean is still a quantity of interest; it tells you globally what is happening, what kind of values I am going to observe on average. But what you might also be interested in is: if I perform my experiment, what is the probability that my outcome will be larger or smaller than the mean value? Suppose you take the example of the heights of a population, and say the mean value is 5 feet. If I pick an arbitrary person, it is not necessary that their height is going to be 5 feet; it may be more than 5 or less than 5. Now you may be interested in asking: what is the probability that the sample I pick has a value larger than this 5? How are we going to characterize such quantities?

This comes from something called the Markov inequality, which answers a slightly different question from the one I just posed: if I have a random variable, what is the probability that it takes a value larger than a certain number? So, suppose Y is a non-negative random variable. You may be interested in asking, what is the probability that Y is greater than or equal to c: you perform an experiment and you want to know whether the outcome is going to be at least c. For example, if you are going to put your money in a casino, or invest in the stock market, the return you are going to get is a random quantity, but you may want to ask whether your returns are going to be at least, say, 10,000 rupees. Note that this quantity is essentially the complement of the CDF: the CDF is P(Y <= c), but let us not worry about the equality here. Markov's inequality says that this probability is upper bounded by the ratio of the mean to c:

P(Y >= c) <= E[Y] / c.

Now let us plug in some value for c. Take c to be the expected value of Y itself; then the question I am asking is, what is the probability that Y is greater than or equal to its mean value, and the bound gives a value of 1. That has no meaning to me, because I already know that any probability is less than or equal to 1. Say Y has a mean of 0.5: if you ask for the probability that Y is greater than or equal to 0.5, this inequality gives a trivial answer that carries no extra information. But suppose you set c to 0.8 and ask, what is the probability that Y is greater than or equal to 0.8? Then the bound has some value, 0.5 divided by 0.8 = 0.625, and in that way it characterizes how the probability of being away from the mean shrinks. This is a simple relation, but it is one of the basic results that is useful in many, many scenarios.

So, let us try to understand why this is true, and notice that I have written this inequality only for a non-negative random variable. It comes straightforwardly once I write an indicator function 1{Y >= c}. Do all of you understand this notation? It is the indicator notation; I have defined a random variable which says: you perform an experiment.
If the outcome Y is greater than or equal to c, the condition is true and the indicator takes the value 1. If Y happens to take a value less than c, the condition is not true and the indicator becomes 0. In general, for an event A, the indicator 1{A} takes the value 1 if A is true and 0 otherwise; such a function is called an indicator function. So here, let us call this new random variable X. How does X depend on Y? Whenever Y takes a value greater than or equal to c, X takes the value 1; otherwise it takes the value 0. Y could be a continuous random variable taking any value, but through this indicator function you have defined another variable X which takes only 1 or 0: a binary random variable which is a function of Y.

Now, if I write 1{Y >= c} <= Y/c, is this relation true for a non-negative random variable? Let us see. Suppose Y >= c: the left quantity is 1, and the right quantity Y/c is greater than or equal to 1, so it holds. Suppose Y < c: the left side is 0, and the right side is some non-negative number, so the relation again holds. So whatever value Y takes, the relation holds. Now I take the expectation on both sides, and my claim is that the direction of the inequality remains the same. Why is this true? We had stated this property earlier: if there are two random variables X and Y such that X always dominates Y, that is, X >= Y with probability 1, then E[X] >= E[Y]. We had a name for that property, preservation of order, and I have applied it correctly here, because the relation holds with probability 1, irrespective of what value Y takes, for any given c. What is the expected value of the indicator? E[1{Y >= c}] is nothing but P(Y >= c). And on the other side, E[Y/c] = E[Y]/c; going from here to here I just used the scaling property, because c is a constant. Putting these together, P(Y >= c) <= E[Y]/c, and we are done.

One caveat: if I do not assume c to be strictly positive, the inequality may not hold. For a negative c, the left side P(Y >= c) is 1, since Y is a non-negative random variable, while the right side E[Y]/c is negative, so the inequality fails. To make the statement correct we have to explicitly require c > 0. But for a non-negative random variable, taking c negative does not make any sense anyway, because I already know this probability is 1; why would we need a bound?
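Here is a small Monte Carlo sketch, my addition rather than the lecture's, that checks the bound P(Y >= c) <= E[Y]/c on simulated data; the exponential distribution with mean 0.5 and the grid of c values are illustrative assumptions chosen to match the 0.5/0.8 example above.

```python
import numpy as np

rng = np.random.default_rng(1)

# A non-negative random variable with mean 0.5; the exponential
# distribution is an arbitrary illustrative choice.
y = rng.exponential(scale=0.5, size=1_000_000)
mean_y = y.mean()

for c in [0.5, 0.8, 1.0, 2.0]:
    tail = (y >= c).mean()  # empirical P(Y >= c)
    bound = mean_y / c      # Markov bound E[Y]/c
    print(f"c={c}: P(Y>=c) ~ {tail:.4f} <= E[Y]/c = {bound:.4f}")
```

For c = 0.8 the bound is the 0.625 computed above, while the empirical tail probability is much smaller, which also shows that the Markov bound can be quite loose.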
Then the next inequality is called Chebyshev's inequality. The Chebyshev inequality tells exactly what is the probability that my random variable X takes a value away from its mean by a certain amount. One note on notation: often I am going to use sigma^2 to denote the variance of a random variable; when I want to make specific which random variable the variance is associated with, I will write sigma_X^2, and when I do not need to specify, I will simply write sigma^2. Then, for any d > 0, we can write

P(|X - mu| >= d) <= sigma^2 / d^2.

So what is this basically saying? Here mu is the mean of X and sigma^2 is the variance of X. If I ask the question, what is the probability that the value taken by the random variable differs from its mean value by at least d, this statement says that this probability is upper bounded by the ratio of the variance to d^2.

Now, do you see any way this can be derived from the Markov inequality? How? I have deliberately put the absolute value here: both sides of |X - mu| >= d are non-negative, so if I square both sides, the order is preserved and the probability does not change, giving P((X - mu)^2 >= d^2). Now apply the Markov inequality: treat (X - mu)^2 as your Y; it is a non-negative random variable, because it is a squared value, and once I take the square I no longer need to worry about the absolute value. And what should c be here? d^2. Finally, what is E[(X - mu)^2]? The variance, and that is exactly the bound. So you see that the Chebyshev inequality is implied by the Markov inequality.

Now what is it saying? Let us take two scenarios: say I have two random variables, X1 with mean mu and variance sigma_1^2, and X2 with the same mean but variance sigma_2^2, where sigma_2 is larger than sigma_1. By our vague understanding of the definition of variance, in the second case the variation is more: X2 should be more spread out about its mean value. And indeed, if I increase sigma^2 in the bound, the bound on this probability increases, which means the probability of the random variable being spread away from mu can also be larger. So in that way, the Chebyshev inequality is capturing how your random variable spreads about its mean value.
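The following sketch, again my addition with illustrative distribution choices, checks the Chebyshev bound for two random variables sharing a mean but having different variances; as expected, the larger sigma gives both a heavier empirical tail and a larger bound.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, d = 0.0, 2.0

# Same mean, two different variances; normal distributions
# are an arbitrary illustrative choice.
for sigma in [1.0, 2.0]:
    x = rng.normal(loc=mu, scale=sigma, size=1_000_000)
    tail = (np.abs(x - mu) >= d).mean()  # empirical P(|X - mu| >= d)
    bound = sigma**2 / d**2              # Chebyshev bound sigma^2/d^2
    print(f"sigma={sigma}: P(|X-mu|>=d) ~ {tail:.4f} <= {bound:.4f}")
```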
We will see more about this when we look into specific distributions. Okay, fine. So next, I am going to look at something called the characteristic function, where j is the standard imaginary unit. Let us take a random variable X which has a pdf f(x); that means X here is a continuous random variable. I am going to define its characteristic function, for any given u, as

phi_X(u) = E[e^{juX}].

Here X is the random quantity; u is fixed, and j is your simple imaginary number, a complex number. By the definition of expectation, this is the integral of e^{jux} f(x) over x. Now, if you just look at f(x) as some function given to you, what does this quantity indicate in Fourier terms? You recall what the Fourier transform is: it is defined for a function, so give me a function and I will define its Fourier transform, which lets me go between the frequency domain and the time domain and vice versa. So if I treat the pdf f(x) as my function here, this quantity is nothing but the inverse Fourier transform of f(x), apart from a 2*pi factor which we have not considered here. So this is just a definition.

Okay, what is the consequence of this definition; why is it useful? Earlier I said that E[X^k], if it exists, is what I am going to call the k-th moment: k = 1 corresponds to the expectation, k = 2 corresponds to the second moment, and so on. By the way, if I say my variance is finite, is my second moment finite? The variance is given as the difference E[X^2] - (E[X])^2, so for the variance to be finite, the second moment must be finite; otherwise the variance is not going to be finite. So if the variance is finite, my second moment is finite, though I do not know whether the higher moments are finite or not. You will also see that a finite second moment implies a finite first moment (indeed, (E[X])^2 <= E[X^2], because the variance is non-negative). Okay, fine.

Now, if I want to find the k-th moment of a random variable, the characteristic function comes in handy. How is that? Suppose I know the characteristic function of a random variable X; all I need to do is take the k-th derivative of this function and compute its value at u = 0, and that is going to give me the k-th moment. Let me just write the relation. When I write phi_X^(k), with a superscript k, that means the k-th derivative, computed here at 0:

phi_X^(k)(0) = j^k E[X^k].

So this quantity is directly related to the k-th moment. You can see this directly: every time you differentiate e^{juX} with respect to u, a factor jX comes out; after the second derivative, another such factor comes; and after the k-th derivative, a factor (jX)^k has come out. When you plug in u = 0, the exponential term becomes 1 and you end up with j^k E[X^k]. This is just a computation; do look into it.
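To see this property in action, here is a symbolic sketch, my addition, that differentiates a characteristic function with sympy; the exponential distribution, whose characteristic function lam/(lam - j*u) I am assuming as the example, has k-th moment k!/lam^k.

```python
import sympy as sp

u = sp.symbols('u', real=True)
lam = sp.symbols('lam', positive=True)
j = sp.I  # imaginary unit

# Characteristic function of an Exponential(lam) random variable,
# used here as an illustrative assumption: phi(u) = lam / (lam - j*u).
phi = lam / (lam - j * u)

for k in [1, 2, 3]:
    # k-th moment: E[X^k] = phi^(k)(0) / j^k
    moment = sp.simplify(sp.diff(phi, u, k).subs(u, 0) / j**k)
    print(f"E[X^{k}] =", moment)  # 1/lam, 2/lam**2, 6/lam**3
```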
One good thing about the characteristic function, and why it is often of interest, is that every distribution has a unique characteristic function. So if you know a pdf, you already know how its characteristic function is going to look, and conversely, if you know a characteristic function, you can immediately say which distribution it belongs to. So I am going to say that two random variables have the same probability distribution if and only if they have the same characteristic function. This is slightly stronger than a statement about pdfs alone: it is about the probability distribution itself. If you have a probability distribution and you compute its characteristic function, you get a unique one; and if you have a characteristic function, you can associate a unique distribution with it. We are going to use this later, I think when we prove the central limit theorem.

So this was for the case where I have a pdf; we can also define something similar for a discrete case. For the special case when my random variable X is non-negative and integer-valued, we are going to define the z-transform,

phi_X(z) = E[z^X],

which converges at least for |z| <= 1. Again, for such a discrete random variable, we can show that the k-th derivative evaluated at z = 1 satisfies

phi_X^(k)(1) = E[X(X-1)...(X-k+1)].

For the characteristic function it was straightforward: the k-th derivative computed at 0 was directly related to the k-th moment. Here, for the z-transform, it is not so straightforwardly related: if you take the k-th derivative of this function in z and compute it at z = 1, this is what you get, and this expectation now involves all the lower moments. If you just expand the product X(X-1)...(X-k+1), it is going to have E[X^k], E[X^{k-1}], and at the end it will also have E[X]. And again, two random variables have the same probability distribution if and only if they have the same z-transform. Here X is a discrete random variable; non-negative, and in fact I should say a non-negative integer-valued random variable.
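As a closing illustration, my addition, the same derivative trick applied to a z-transform: for a Poisson(lam) random variable the z-transform is E[z^X] = exp(lam*(z-1)), which I am assuming as the example, and its k-th derivative at z = 1 gives the k-th factorial moment lam^k.

```python
import sympy as sp

z = sp.symbols('z')
lam = sp.symbols('lam', positive=True)

# z-transform (probability generating function) of Poisson(lam),
# used as an illustrative assumption: phi_X(z) = exp(lam*(z-1)).
phi = sp.exp(lam * (z - 1))

for k in [1, 2, 3]:
    # k-th factorial moment: E[X(X-1)...(X-k+1)] = phi^(k)(1)
    fact_moment = sp.simplify(sp.diff(phi, z, k).subs(z, 1))
    print(f"k={k}: factorial moment =", fact_moment)  # lam, lam**2, lam**3
```

For k = 1 this recovers E[X] = lam, and for k = 2 it gives E[X(X-1)] = lam^2, from which the second moment E[X^2] = lam^2 + lam and hence the variance follow.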