We were discussing continuous random variables yesterday. We said a random variable X is continuous if its probability law P_X is absolutely continuous with respect to the Lebesgue measure λ. In other words, every Borel set of Lebesgue measure 0 has P_X equal to 0. And then we stated, without proof, the Radon-Nikodym theorem: if P_X is absolutely continuous with respect to λ, there exists a measurable function f_X from R to [0, ∞), a non-negative measurable function, such that for every Borel set B, P_X(B) is given by the integral over B of f_X dλ. This is a Lebesgue integral; you can view the probability of the set, P_X(B), as the integral of this function over the set. This function, we said, is the probability density function of the random variable. So the Radon-Nikodym theorem is the result that asserts the existence of a density for a continuous random variable. Some people write this in notation as dP_X/dλ = f_X, as though this were the derivative of one measure with respect to the other; it is known as the Radon-Nikodym derivative. So this is just notation: the Radon-Nikodym derivative of the probability measure P_X with respect to the Lebesgue measure λ is your probability density function, the pdf. The interpretation is this: if you take a set of very small Lebesgue measure, tending to 0, its probability will also be very small. Divide that small probability by the small Lebesgue measure and you get a function.
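As a quick numerical sanity check of the relation P_X(B) = ∫_B f_X dλ, here is a minimal sketch. It uses the Exponential(1) density as an assumed example (the rate 1 and the set B = [0, 1] are chosen purely for illustration):

```python
import math

# Assumed example density: Exponential(1), f_X(x) = e^{-x} for x >= 0.
def f_X(x):
    return math.exp(-x) if x >= 0 else 0.0

# Approximate P_X(B) for B = [0, 1] by a trapezoidal sum on a fine grid.
n = 100_000
dx = 1.0 / n
total = sum(f_X(i * dx) for i in range(n + 1))
p_B = dx * (total - 0.5 * (f_X(0.0) + f_X(1.0)))

# For this particular density the probability has a closed form, 1 - e^{-1}.
exact = 1.0 - math.exp(-1.0)
assert abs(p_B - exact) < 1e-9
```

The point is only that the probability of a Borel set is recovered by integrating the density over it; the specific density is incidental.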
So, what you get will depend on where you are on the real line, and that is your probability density function; that is the interpretation of the Radon-Nikodym derivative. Are there any questions? That is really all there is to a continuous random variable. Then we started looking at some examples. The first example was our familiar uniform random variable, which puts a uniform measure on some interval [a, b], or [0, 1], whatever. Then we looked at the exponential random variable, which has the remarkable property that it is memoryless. If you know that the life of a light bulb, for example, is exponentially distributed, then given that the bulb has been glowing for a certain amount of time, the further time it will glow is distributed the same as the unconditional lifetime. We proved that it is memoryless; in fact, we also said that the exponential distribution is the only memoryless distribution among continuous random variables. So we will look at maybe a couple more examples. Example 3 is the Gaussian random variable. The Gaussian is probably the most important of all these continuous random variables: if you have to pick one distribution as the single most important among continuous random variables, it has to be the Gaussian. It is a two-parameter distribution, and for the Gaussian the probability density function is the easier thing to specify. So μ is some parameter in R, and σ is some parameter in R₊, strictly bigger than 0. It is parameterized by two things; remember, the exponential is parameterized by one thing, λ. Its density is given by f_X(x) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}.
So, this is for all x in R. If you just look at the density: you have your μ, which could be positive or negative (I am just drawing it as negative), and the distribution is symmetric around μ, as you can verify; then it falls off very quickly, like e^{−x²}. What value does it start at, at x = μ? It starts at 1/(σ√(2π)), and then it falls off very quickly. If your σ is small, then this peak will be quite high, but the rate of fall will be even quicker; it will be very peaked. Whereas if your σ is large, the curve will be more spread out. For what is known as a standard Gaussian, or standard normal (this is also known as the normal random variable), you have μ = 0 and σ = 1. Then f_X(x) = (1/√(2π)) e^{−x²/2}; this is called the standard Gaussian, and it is centered at 0. The Gaussian random variable occurs very commonly in practice; for example, if you measure the noise across a resistor and plot a histogram of how it is distributed, it will be distributed like a Gaussian. There are innumerable places where the Gaussian distribution occurs. The importance of the Gaussian pdf is due to two reasons. One is that it is what is known as a stable distribution, which we will study later: if you keep summing independent Gaussian random variables, you get a Gaussian random variable. This is something we will study later.
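The shape properties just described — symmetry about μ, peak value 1/(σ√(2π)), smaller σ giving a taller peak, and total integral 1 — can all be checked numerically. A small sketch, with μ = −1 and σ = 0.5 as assumed illustrative values:

```python
import math

def gaussian_pdf(x, mu, sigma):
    # f_X(x) = (1 / (sigma * sqrt(2 pi))) * exp(-(x - mu)^2 / (2 sigma^2))
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = -1.0, 0.5   # assumed illustrative parameters

# Symmetric about mu: f(mu + d) = f(mu - d).
assert abs(gaussian_pdf(mu + 0.7, mu, sigma) - gaussian_pdf(mu - 0.7, mu, sigma)) < 1e-12

# Peak value at x = mu is 1 / (sigma * sqrt(2 pi)); smaller sigma, taller peak.
assert abs(gaussian_pdf(mu, mu, sigma) - 1 / (sigma * math.sqrt(2 * math.pi))) < 1e-12
assert gaussian_pdf(mu, mu, 0.1) > gaussian_pdf(mu, mu, 1.0)

# Integrates to 1 (midpoint rule over mu +/- 10 sigma; the tail beyond is negligible).
n = 100_000
a = mu - 10 * sigma
h = 20 * sigma / n
total = sum(gaussian_pdf(a + (i + 0.5) * h, mu, sigma) for i in range(n)) * h
assert abs(total - 1.0) < 1e-6
```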
So, it is stable under summation. There is also a very important theorem known as the central limit theorem, which says that even if you keep adding independent random variables that are not distributed as Gaussian, you will eventually get to a Gaussian; that is a theorem we will do towards the end of this course. So the Gaussian random variable has very remarkable properties: it is stable, and it is an attractor, and therefore it is of great importance. That was just the pdf; as for the cdf of the standard Gaussian, it will look like the integral from −∞ to x of (1/√(2π)) e^{−y²/2} dy, is it not? This integral cannot be expressed in closed form; it is just whatever it is, an integral, and it is essentially the error function: the cdf of the standard Gaussian, often written Φ(x), equals (1 + erf(x/√2))/2, a simple rescaling of the error function. As I said, it is not available in closed form; there are some approximations available. In the old days these values were tabulated, just like logarithms were tabulated; but in this era of computers we do not need to — you just give it to MATLAB and get the values at whatever x you want. Are there any questions? This μ has the interpretation of the mean, and σ has the interpretation of the standard deviation; we will get to that much later. Any questions at this point? Yes, that is also an equivalent definition. We did not define it that way, but I think the book by Grimmett and Stirzaker says that X is a continuous random variable if F_X(x) can be written as the integral of some non-negative function.
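Indeed, in Python the standard Gaussian CDF can be evaluated through the built-in error function; the relation Φ(x) = (1 + erf(x/√2))/2 is the precise link between the two:

```python
import math

def phi(x):
    # CDF of the standard Gaussian via the error function:
    # Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

assert abs(phi(0.0) - 0.5) < 1e-15                 # symmetry: half the mass below 0
assert abs(phi(-1.0) + phi(1.0) - 1.0) < 1e-15     # Phi(-x) = 1 - Phi(x)
assert abs(phi(1.96) - 0.975) < 1e-3               # the familiar 97.5% point
```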
So, if you take any non-negative function f_X that integrates to 1, that should correspond to some random variable, because the properties of the CDF will be satisfied. You see, people figured out that certain distributions occur very commonly in the real world; the theory of probability came much later. People figured out that this kind of bell curve occurs everywhere, from heights of students in a class to resistor noise; they all follow this kind of distribution, and people knew this for a very long time. So I am just saying the CDF does not have a closed form; the PDF has a nice form. And obviously the PDF does not have any interpretation as a probability, as I said yesterday, because it could be very tall, very peaky; only when you integrate the PDF can you speak of probabilities. We will keep coming back to this; it is a very important distribution. I will give just one more example, that of the Cauchy distribution, the Cauchy random variable. I will not parameterize it; I will just give the unit version of the Cauchy, so to speak. The density is given by f_X(x) = 1/(π(1 + x²)), x ∈ R. If you want to parameterize it, you have to put (x − x₀)² and some γ here; that is just distracting, so let me not do that. As you can see, this is a valid distribution — of course, first of all you have to verify that it is a valid distribution: you have to integrate from −∞ to ∞ and show that this is equal to 1. By the way, even for the Gaussian that is not very easy to do; I forgot to mention that if you integrate the Gaussian density from −∞ to ∞, it is equal to 1.
But showing that is not the easiest of integrals to do: you have to square the integral and pass to polar coordinates, some complicated stuff, to get the answer equal to 1. But it is true. Here, for the Cauchy, it is much easier: if you want to verify that this is a valid probability density function, that the integral from −∞ to ∞ equals 1, this will be an inverse tangent, which is easy to do. This density is also symmetric around 0. The CDF of this will be an inverse tangent as well: you can write the integral from −∞ to y and get an arctangent formula for the cumulative distribution function. If you plot this density, it starts off at 1/π, is it not, and then it falls off rather slowly: for large x it falls off only as 1/x². The exponential distribution falls off as e^{−λx}, an exponential fall, which is very quick; the Gaussian is even faster, like e^{−x²/2}, so the Gaussian decays very, very fast, whereas this decays only like a power law, x^{−2}. With me? So there is a qualitative difference between this Cauchy random variable and random variables such as the Gaussian, the exponential, and the uniform. The uniform, for example, is bounded: it only takes values in a bounded set, [a, b] for example, so it is deterministically bounded in that interval. The exponential, Gaussian, and Cauchy are all actually unbounded; there is no deterministic bound on the values they can take. But for the exponential and the Gaussian, the probability of taking very large values falls off very rapidly: exponentially, or even faster than exponentially in the case of the Gaussian.
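The arctangent calculation just described is easy to carry out. A short sketch for the standard (unit) Cauchy:

```python
import math

# Standard Cauchy density and its arctangent CDF.
def cauchy_pdf(x):
    return 1.0 / (math.pi * (1.0 + x * x))

def cauchy_cdf(x):
    # F(x) = 1/2 + arctan(x) / pi
    return 0.5 + math.atan(x) / math.pi

assert abs(cauchy_pdf(0.0) - 1.0 / math.pi) < 1e-15   # density starts at 1/pi
assert abs(cauchy_cdf(0.0) - 0.5) < 1e-15             # symmetric about 0
assert abs(cauchy_cdf(1.0) - 0.75) < 1e-12            # arctan(1) = pi/4
# CDF tends to 0 and 1 at the two ends of the real line:
assert cauchy_cdf(-1e9) < 1e-6 and cauchy_cdf(1e9) > 1 - 1e-6
```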
Whereas in this case the probability of taking large values is not negligible; it only falls off as x^{−α}, here with α = 2. Random variables for which the probability of taking large values falls off slowly like this are called heavy-tailed random variables, because the tail falls off like a power law. The exponential and Gaussian random variables, in contrast, are examples of light-tailed random variables. So they are qualitatively very different: a heavy-tailed random variable like the Cauchy will take fairly large values with non-negligible probability. Are there any questions? There are many other distributions of continuous random variables, but most of them are derived from these important distributions or are generalizations of them. There are the gamma, chi-squared, lognormal, and so on; they are all variations of exponentials or Gaussians, or are obtained by transforming these random variables in some way, which we will get to later. At this point all you really need to know are the uniform, the exponential, and the Gaussian; the Cauchy is a bit of an odd one out because it is heavy-tailed, and it has some fairly bizarre properties we will see later. Are there any questions? This is called a two-sided Cauchy because it is symmetric around 0. There is also a one-sided Cauchy: if you put 2/(π(1 + x²)) and define it over x ≥ 0, then the negative part goes away and you have something that is twice as tall, falling off the same way; that is called the one-sided Cauchy.
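To quantify the heavy-tail point made above, compare P(X > t) for the standard Gaussian and the standard Cauchy as t grows. A small sketch:

```python
import math

def gaussian_tail(t):
    # P(X > t) for a standard Gaussian, via the complementary error function.
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def cauchy_tail(t):
    # P(X > t) for a standard Cauchy: 1/2 - arctan(t) / pi.
    return 0.5 - math.atan(t) / math.pi

for t in [2.0, 5.0, 10.0]:
    assert cauchy_tail(t) > gaussian_tail(t)   # the heavy tail dominates

# At t = 10 the difference is dramatic: the Cauchy tail is roughly 1/(pi t),
# still a few percent, while the Gaussian tail is astronomically small.
assert cauchy_tail(10.0) > 0.01
assert gaussian_tail(10.0) < 1e-20
```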
You can also have parameters: normally you write (x − x₀)² with a γ² and a γ — that is the parameterized version; I just wrote the standard version of the Cauchy. You can again verify that this is a valid pdf by substituting y = (x − x₀)/γ and integrating. So that is the parameterized version of the Cauchy, now that you asked. Just as for discrete random variables it is enough to specify the probability mass function, for continuous random variables it is enough to specify the probability density function, or the Radon-Nikodym derivative, which is the same thing, because you can integrate it to find the probability law. Are there any more questions? So we will move on to singular random variables. Singular random variables are a fundamentally different type of random variable: neither discrete nor continuous, nor any mixture of the two. This is probably something you will not have encountered so far. The reason people do not encounter it is that it is very bizarre and not of much real-world use, and for that reason I will not spend too much time on it; I will just tell you what it is and what it looks like. It is interesting for intellectual and academic purposes, but it is not of great practical use in engineering or statistics; mathematically, though, it is a perfectly valid probability measure on R. So let us see what it could be. A singular random variable is something that lies between a discrete random variable and a continuous random variable. A discrete random variable puts all its measure on some countable set, finite or countably infinite. What about a continuous random variable, on the other hand?
A continuous random variable has to put measure 0 on every countable set — very clear, right? Not only that, it has to put zero measure on all sets of Lebesgue measure 0. So even on the Cantor set, for example, any continuous random variable must have zero probability. Why? That is the definition. The singular random variable puts zero probability on all singletons, just like a continuous random variable, but it puts all the probability, probability 1, on a set of zero Lebesgue measure — an uncountable set of Lebesgue measure 0. That is how it works: a discrete random variable puts all its measure on a countable set; a singular random variable puts all its measure on an uncountable but Lebesgue-measure-0 set, such as the Cantor set. So let us write the definition of a singular random variable. For every singleton on the real line, it puts zero probability; this is similar to a continuous random variable, which also assigns zero probability to singletons. But here is the difference from a continuous random variable: there exists a Borel set F of zero Lebesgue measure which contains all the probability. This is the essential difference, because for a continuous random variable, if I tell you F is a Borel set such that λ(F) = 0, you can immediately conclude that P_X(F) = 0; but a singular random variable puts all its probability on some such Borel set, an uncountable set of zero Lebesgue measure. Now, why am I saying this F has to be necessarily uncountable? I have not said it in the definition; why? Because if F were countable, then since all singletons have zero probability, F would have zero probability by countable additivity.
So although we have not said this in the definition — you do not have to — it is clear that F is an uncountable set of zero Lebesgue measure which contains all the probability. And you already know an example of such an F: the Cantor set, which has zero Lebesgue measure. So you are looking for some measure which sits entirely on the Cantor set, such that singletons have zero probability. That is called a Cantor random variable, and this distribution is called the Cantor distribution; it is a very bizarre distribution. You remember the Cantor set has this fractal property: you remove the middle thirds sequentially and take the scattering of points that remains. You smear a uniform distribution on that Cantor set; it is very difficult to imagine what it would be like. On the Cantor set you are putting some kind of uniform measure — that is what this is. So the example is the Cantor distribution, or Cantor random variable. The way it works — I will show a picture in a little bit — is that the CDF will look like this. Let us say this is my [0, 1] interval; the middle third has been knocked off in the Cantor set. At 1 the CDF has to end up at 1, at 0 it starts off at 0, and it is uniform on the Cantor set in some sense: if you knock off the middle third, half the probability is sitting on the left side and half on the right side. So what should be the value of the CDF on the removed middle third? One half. So on this interval it will be 1/2.
So, half. Now similarly the left piece also gets chopped up — its middle third gets removed — and it has the same fractal property, so the value there should be half of what this is: if this level is 1/2 and that is 1, then here it should be one fourth, and similarly on the right it should be three fourths, and you keep doing this. Then you will have a little segment here, a little segment there, and so on. What you end up with is a perfectly valid cumulative distribution function. Let me try to pull up a picture here. You see this function? This is the Cantor distribution; this is how it looks. In fact, if you go down on the Wikipedia page on the Cantor function, this graphic shows how it is built up sequentially: you start off with the middle piece at height one half, then you put two more, then four more, eight more, and you keep doing this, and you get that function. That is the CDF of the Cantor random variable. You can show that it puts probability 0 on all singletons, but it is not a continuous random variable either, because all the measure is sitting on the Cantor set. This function is a perfectly valid CDF: it starts off at 0, ends up at 1, and it is actually continuous — not just right-continuous but in fact continuous. Why is it continuous? Remember, the CDF being continuous at a point is equivalent to saying that the probability of that point is 0; and here all singletons have zero probability, which means the CDF is continuous for all x. It looks a bit bizarre, but it is continuous everywhere; it does not have a single discontinuity. The problem is that it is not absolutely continuous.
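The construction just described — value 1/2 on the removed middle third, 1/4 and 3/4 on the next-level gaps, and so on — can be computed digit by digit. A sketch of the Cantor function (the CDF), based on the standard ternary-to-binary digit algorithm; the truncation depth is an assumption for illustration:

```python
def cantor_cdf(x, depth=40):
    # Cantor function on [0, 1]: expand x in base 3, stop at the first
    # digit 1, map the remaining digits 2 -> 1, and read the result in base 2.
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:
            value += scale   # landed in a removed middle third: flat part
            break
        value += scale * (digit // 2)   # ternary 0 -> 0, ternary 2 -> 1
        scale *= 0.5
    return value

# Flat at 1/2 across the removed middle third (1/3, 2/3):
assert cantor_cdf(0.4) == 0.5 and cantor_cdf(0.6) == 0.5
# 1/4 and 3/4 on the next-level gaps, as in the picture:
assert cantor_cdf(0.15) == 0.25 and cantor_cdf(0.8) == 0.75
# A genuine Cantor-set point: x = 1/4 has ternary expansion 0.020202...,
# so the Cantor function value is binary 0.010101... = 1/3.
assert abs(cantor_cdf(0.25) - 1.0 / 3.0) < 1e-9
```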
If it were absolutely continuous, we would have a continuous random variable; it is not. And it is not differentiable everywhere: it is only differentiable on those flat segments, where the derivative is 0. So it increases without a derivative, but it does not jump — you see, it does not jump at all, it is a continuous function — yet it does not have a derivative everywhere, and it increases exactly at those points where it does not have a derivative, increasing without jumps. That is the bizarre thing about this function. This function is called the devil's staircase, for whatever reason. I think that is a rather terrible name — not because of the devil part, but because it is not a staircase: a staircase is discontinuous, whereas this is a function continuous everywhere, with no discontinuities whatsoever. Anyway, this is just for some amusement value; I have not seen a serious application of this kind of singular random variable or singular distribution anywhere. The Cantor set does not contain intervals — yes, the measure sits on the Cantor set, that is what I am saying; it is an uncountable set. So the Cantor set is an uncountable set, and all the measure is sitting on this Cantor set, which is of zero Lebesgue measure. See, for a discrete random variable all the measure sits on a countable set, which is something very easy for you to imagine. For example, if I tell you that all the measure is sitting on the rationals, it may not be so easy to picture, but it is fine: you can enumerate them as q₁, q₂, and so on. Now all the measure is sitting not on a countable set, but on the Cantor set, with the Cantor distribution — and that is the CDF.
So the Cantor set is made up of an uncountable number of singletons, each of which has probability 0; but if you ask me what the probability of, say, the interval [0, 1/3] is, I will say it is one half. Just like any random variable, this assigns a probability to every Borel set. If you ask me the probability of an interval inside a removed middle third, it will be 0, because there are no Cantor points there. So it assigns probabilities to the Cantor points in some uniform way. You can think of it as taking a uniform distribution on [0, 1], writing it in binary, flipping the 1's to 2's, interpreting the result in ternary, and putting the measure on the Cantor set — it is the same thing; it is like a uniform distribution on the Cantor set. You generate a uniform on [0, 1], write it in binary, flip all 1's to 2's, interpret it as ternary, and that will be your random variable; that is the way to generate a Cantor random variable. And yes, that is true of continuous random variables also: for continuous random variables, all singletons have zero probability too, and you do have the total probability sitting on an interval, say [0, 1]. But here the problem is that the measure is sitting on a set of zero length; it is not sitting on an interval. Any questions? That is all I will say about this; it is just for amusement value, I guess. But it is not really just for amusement value, because it is a fundamentally different type of random variable, which normally people do not even talk about because it does not have many applications. I think it is good for us to know that these bizarre objects exist. And any probability measure on R can be decomposed into one of these three types, or some combination thereof.
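Incidentally, the binary-to-ternary recipe just mentioned can be written directly as a sampler. A sketch (truncating to 40 ternary digits, so this is only an approximation of the true Cantor random variable):

```python
import random

def cantor_sample(rng, depth=40):
    # Draw uniform binary digits, flip each 1 to a 2, and read the
    # resulting digit string in base 3: a (truncated) Cantor sample.
    x, scale = 0.0, 1.0 / 3.0
    for _ in range(depth):
        bit = rng.getrandbits(1)
        x += scale * (2 * bit)   # ternary digit 0 or 2, never 1
        scale /= 3.0
    return x

rng = random.Random(0)
samples = [cantor_sample(rng) for _ in range(10_000)]

# Every sample avoids the removed middle third (1/3, 2/3) ...
assert all(not (1/3 < s < 2/3) for s in samples)
# ... and roughly half the mass sits on each side, as the CDF suggests.
frac_left = sum(s <= 1/3 for s in samples) / len(samples)
assert 0.45 < frac_left < 0.55
```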
So there are 7 qualitatively different types of probability measures on R: discrete, continuous, and singular; then the three combinations taking two at a time; and then all three at a time. Any questions? Just as you can combine discrete and continuous — what is a random variable that is a mixture of discrete and continuous? You put some of your probability mass on discrete points: say some discrete atoms at 0 and one half or something. Whatever these atoms add up to, the remaining mass you smear over some interval. Just like that, you can take a continuous random variable, put some mass on atoms, and smear the remaining mass over a Cantor set; or you can do all three: put one mass at a point, put some of the mass on an interval, and whatever remains you put on something like the Cantor set. That will be a mixture of all three. The total measure of 1 has to be split between some countable set, some intervals, and some uncountable sets of zero Lebesgue measure; that is all there is to it. Sorry — yes, this is just one example: the Cantor distribution is just one example of a singular distribution. You can take any uncountable set of Lebesgue measure 0 and put your probability on it, and that will be a singular random variable, because then there is no probability left over. And if you put only half the measure on a Cantor set and the rest of the half on an interval, then it will be a combination of continuous and singular, and so on. That is it. Any more questions?
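For concreteness, here is a sketch of sampling from a mixture of all three types. The weights 0.3/0.4/0.3, the atom at 2.0, and the uniform interval [0, 1] are all invented for illustration, and the singular part is a crude 30-digit Cantor approximation:

```python
import random

def mixed_sample(rng):
    # Mixture of the three pure types (illustrative weights):
    # 0.3 discrete (atom at 2.0), 0.4 continuous (uniform on [0, 1]),
    # 0.3 singular (approximately Cantor-distributed).
    u = rng.random()
    if u < 0.3:
        return 2.0                   # discrete atom
    elif u < 0.7:
        return rng.random()          # absolutely continuous part
    else:
        x, scale = 0.0, 1.0 / 3.0
        for _ in range(30):          # singular (Cantor) part, truncated
            x += scale * 2 * rng.getrandbits(1)
            scale /= 3.0
        return x

rng = random.Random(1)
samples = [mixed_sample(rng) for _ in range(20_000)]
atom_frac = sum(s == 2.0 for s in samples) / len(samples)
assert 0.27 < atom_frac < 0.33   # the atom carries weight ~0.3
```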
So, next I want to move on to the topic of sigma algebras generated by a random variable. We have studied sigma algebras generated by collections of subsets: given some collection C of subsets, we know what σ(C) means — it is the smallest sigma algebra containing all the sets in C. Now we are going to talk about the sigma algebra generated by a random variable X, and I am going to tell you what that is. Suppose X is a random variable on (Ω, F, P). So, our favorite picture: this is your Ω, F, and P, you have your real line, and your X maps like that. Because this is a random variable, if you take any Borel set and look at its pre-image — let us say that is your pre-image — the pre-image of every Borel set is an event, by definition. Now, for each Borel set here, consider its pre-image; those pre-images form a collection of subsets here. What we can show is this: the Borel sigma algebra is a sigma algebra on R, and you are looking at all these Borel sets and taking pre-images — the pre-image of the Borel sigma algebra on this sample space — and you can show that this is a sigma algebra also. In particular, in notation I want to write it as follows. There is a homework problem that you did: if you have (Ω, F, P) and another measurable space, and a function from one to the other, the pre-images of all measurable sets form a sigma algebra on the domain — this is homework problem number 5 or so in your tutorial 1, if you remember. So consider the collection of subsets: all A ⊆ Ω for which A = X⁻¹(B) for some B in the Borel sigma algebra. You are taking Borel sets and looking at the pre-images A; each such A is a subset of Ω.
So these are all subsets of the sample space, correct? Let us call this collection something. You can show that this collection of subsets is a sigma algebra on Ω; this is something you have already shown in your homework: if you take pre-images of the sets of a sigma algebra, you get a sigma algebra. So you agree that this collection of subsets of Ω is a sigma algebra on Ω — everybody with me? But because X is a random variable, each one of these sets is also F-measurable, is it or is it not? So what can we conclude? Not only is this a sigma algebra, it is a sub-sigma-algebra of F: since X is a random variable, the above sigma algebra is contained in F. This is called the sigma algebra generated by the random variable X, and it is denoted by σ(X). Is this clear? I am taking every Borel set on the real line and looking at its pre-image here. I know the collection of pre-images must be a sigma algebra, but because X is a random variable, that sigma algebra must be a sub-sigma-algebra of F. It may be F itself, or it could be something smaller. That is called the sigma algebra generated by the random variable X. This is an important concept, and it is good to know at this stage. Essentially, the sigma algebra σ(X) contains all subsets of Ω which are pre-images of Borel sets. And intuitively, σ(X) contains all those events whose occurrence or non-occurrence is completely determined by the realization of X: when X is realized, I know, for every set in σ(X), whether the corresponding event occurred or did not occur. So σ(X) consists of those subsets of Ω whose occurrence or non-occurrence is completely determined by the realization of X. I will stop here.
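On a finite sample space the definition can be made completely concrete: the pre-images of the values of X partition Ω, and σ(X) consists of all unions of those blocks (every Borel set's pre-image is such a union). A toy sketch — the sample space and the map X here are invented for illustration:

```python
from itertools import combinations

# Finite sample space; X is given by a table (assumed toy example).
omega = {'a', 'b', 'c', 'd'}
X = {'a': 0.0, 'b': 0.0, 'c': 1.0, 'd': 2.0}

# Pre-images of the values of X partition omega into blocks.
blocks = {}
for w, v in X.items():
    blocks.setdefault(v, set()).add(w)
blocks = list(blocks.values())   # [{'a','b'}, {'c'}, {'d'}]

# sigma(X) = all unions of blocks (the union over the empty choice is the empty set).
sigma_X = set()
for r in range(len(blocks) + 1):
    for combo in combinations(blocks, r):
        sigma_X.add(frozenset().union(*combo))

# 2^3 = 8 sets: a sub-sigma-algebra of the power set of omega.
assert len(sigma_X) == 8
assert frozenset({'a', 'b'}) in sigma_X    # the event {X = 0}
assert frozenset({'a'}) not in sigma_X     # X cannot separate a from b
assert frozenset(omega) in sigma_X and frozenset() in sigma_X
```

Note that {a} is not in σ(X): since X(a) = X(b), no realization of X can tell you whether a or b occurred, which is exactly the intuition stated above.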