Back to transformations. I will talk about the Jacobian formula. You are considering a transformation $Y = g(X)$, where $g$ is a Borel measurable function. Last class we showed from first principles how to obtain the CDF of $Y$ given the probability law of $X$, and we worked out some examples. Today I am going to derive a formula for the density of $Y$ given the density of $X$, under some fairly special circumstances.

The special circumstances are these: $X$ is a continuous random variable, so it has a density $f_X$, and $g$ is a differentiable function that is monotone increasing or monotone decreasing, so it is invertible. This is a very special case; the formula is by no means applicable in other circumstances. But in this case you can get an explicit formula.

So $P(Y \le y)$, which is the CDF of $Y$, can be written as an integral, not really from minus infinity, but over all $x$ such that $g(x) \le y$: $F_Y(y) = \int_{\{x : g(x) \le y\}} f_X(x)\,dx$. Agreed? After all, the probability of any set is given by the integral of the density over that set, so I am integrating over the set where $g(x) \le y$. Eventually I am going to make a substitution in this integral, but forget the substitution for now; let me do that one step later. First let me write $F_Y(y) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx$. So why do you have that? Why is this true, exactly?
First, I have assumed that $g$ is monotone increasing. Then this is correct, because $g(x) \le y$ is the same as $x \le g^{-1}(y)$, so you are integrating over $x$ from $-\infty$ to $g^{-1}(y)$; $g$ also has an inverse. Now you make the substitution I mentioned: put $t = g(x)$, that is, $x = g^{-1}(t)$. Then $g'(x)\,dx = dt$, so $dx = dt / g'(x)$, which is $dt / g'(g^{-1}(t))$. So you get $F_Y(y) = \int_{-\infty}^{y} \frac{f_X(g^{-1}(t))}{g'(g^{-1}(t))}\,dt$. This is your standard change of variable.

Which means that $f_Y(y)$ must be equal to the integrand evaluated at $y$: $f_Y(y) = \frac{f_X(g^{-1}(y))}{g'(g^{-1}(y))}$. I am just pulling the integrand out.

Now, this holds for $g$ monotone increasing. If $g$ is monotone decreasing, you will get a minus sign here; you can just work it out. So in general you just put an absolute value, $f_Y(y) = \frac{f_X(g^{-1}(y))}{|g'(g^{-1}(y))|}$, and this will hold for any invertible $g$, monotone decreasing or monotone increasing.

So I have expressed the density of $Y$ in terms of the density of $X$, in terms of the inverse of $g$, and in terms of this term in the denominator, which is again a function of the derivative of $g$ and the inverse of $g$. Now, this term here is called the Jacobian of the transformation.
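Written out in symbols, the change of variable above looks as follows. This is only a restatement of the spoken steps, assuming $g$ is strictly increasing and differentiable with $g' \neq 0$, and reading the integrand as zero for values of $t$ outside the range of $g$; the absolute value in the last line is what covers the decreasing case.

```latex
\begin{align*}
F_Y(y) &= P\big(g(X) \le y\big) = \int_{-\infty}^{g^{-1}(y)} f_X(x)\,dx \\
       &= \int_{-\infty}^{y} \frac{f_X\big(g^{-1}(t)\big)}{g'\big(g^{-1}(t)\big)}\,dt
          \qquad \big(t = g(x),\ dt = g'(x)\,dx\big) \\
f_Y(y) &= \frac{d}{dy} F_Y(y)
        = \frac{f_X\big(g^{-1}(y)\big)}{\big\lvert g'\big(g^{-1}(y)\big) \big\rvert}
\end{align*}
```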
It is called the Jacobian of the transformation $g(x) = t$, which is why this formula is called the Jacobian formula. Now, the Jacobian is a concept you are already familiar with; you may just not know that it is called the Jacobian. You have done integrals since your 12th class or so. When you put $x = \sin\theta$ in some integral, you write $dx = \cos\theta\,d\theta$; that $\cos\theta$ is the Jacobian of the transformation. Just like here, I have put $g(x) = t$, so $g'(x)\,dx = dt$. If you put $x = t^2$, then $dx = 2t\,dt$, so the $2t$ is the Jacobian. The Jacobian essentially tells you, when you make a transformation from one space to another, how a small differential element in one space scales in the other space. It has to be scaled by the Jacobian in order to give you the small differential element in the other space.

This is just the one-variable Jacobian; there are also multivariable Jacobians. If you go from $\mathbb{R}^n$ to $\mathbb{R}^m$, or from $\mathbb{R}^n$ to another $\mathbb{R}^n$, you can write down the Jacobian for that transformation as well. Is this clear? The Jacobian is something you already know about; you may not call it the Jacobian, but you have been using it all along. There is really nothing new about this formula; I have just used a change of variable in the integration.

Again, I want to caution you that this formula is only useful under a very restrictive set of assumptions: $X$ has a density, and $g$ must be differentiable and monotone increasing or monotone decreasing. For example, take the example we had yesterday: $Y = e^X$, where $X$ is $N(0,1)$. $e^x$ is a monotone increasing differentiable function, so you can use this formula. So what will you have? $f_Y(y)$ is equal to $f_X$ of something, over the Jacobian. And $f_X$ is what?
$f_X(x)$ is $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$, but in place of $x$ I have to write $g^{-1}(y)$. What is $g^{-1}(y)$? $y = e^x$, so $g^{-1}(y) = \log y$. So the numerator is $\frac{1}{\sqrt{2\pi}} e^{-(\log y)^2/2}$. But I also have the Jacobian sitting in the bottom, so I have to compute $g'(g^{-1}(y))$. What does that give me? $g(x) = e^x$, so $g'(t) = e^t$, and therefore $g'(g^{-1}(y)) = e^{\log y}$, which is simply $y$. That is what you get in the denominator: $f_Y(y) = \frac{1}{y\sqrt{2\pi}} e^{-(\log y)^2/2}$. Actually I should put the absolute value of $y$, but $y$ is always positive, so I do not have to. This is valid for $y$ strictly bigger than 0. I think yesterday I wrote $y \ge 0$; please correct it, this is only for $y > 0$. So this is the same formula you got yesterday.

My impression is that while this formula works out all right in this specific case, I actually feel a lot more comfortable using the first-principles derivation, because there is much less scope for making a mistake; with the formula you may get something off while computing these things. So while I derive this formula for completeness, I never use it. That is my personal opinion: I just use first principles, computing the CDF, whenever I have to.

For example, even in a simple case: take the other example, where $X$ is $N(0,1)$ and $Y = X^2$. You cannot use this formula, because $x^2$ is not monotonically increasing or monotonically decreasing; as a matter of fact it has no inverse, $x^2$ is not invertible on $\mathbb{R}$. So in this case the formula cannot work. A very simple example, and you cannot use the formula as it stands.
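For the $Y = e^X$ example, the two routes can be compared numerically. The following is a minimal sketch, not part of the lecture: the function names are made up for illustration, the standard normal CDF is written with `math.erf`, and the first-principles density is obtained by numerically differentiating $F_Y(y) = P(X \le \log y)$.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """Standard normal CDF, written via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def density_by_jacobian(y):
    """Jacobian formula for Y = exp(X): f_Y(y) = f_X(log y) / y, for y > 0."""
    if y <= 0:
        return 0.0
    return normal_pdf(math.log(y)) / y

def density_from_cdf(y, h=1e-5):
    """First principles: F_Y(y) = P(X <= log y), differentiated numerically."""
    upper = normal_cdf(math.log(y + h))
    lower = normal_cdf(math.log(y - h))
    return (upper - lower) / (2.0 * h)

# Compare the two routes at a few points with y > 0.
for y in (0.5, 1.0, 2.0, 5.0):
    print(y, density_by_jacobian(y), density_from_cdf(y))
```

The two columns should agree to several decimal places, which is just the statement that the formula and the first-principles CDF computation give the same density here.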
That is because $g$ is not monotonic. Or rather, you have to massage it a little bit: you have to look at the positive side and the negative side separately and do something about it, which is a bit of a headache. If, on the other hand, you had $X$ exponential with parameter $\lambda$ and $Y = X^2$, then you can use the formula. This density lives only on the positive $x$ axis, and over $\mathbb{R}_+$ the map $x^2$ is monotonically increasing. So you can use the formula here; you can just do this as an exercise. But still I claim that you are better off using the first-principles derivation. Are there any questions on this?

Now I will do the $n$-dimensional version of this, a more general version. You have random variables $X_1, X_2, \dots, X_n$, and you define $Y_1$ through $Y_n$ as $Y_1 = g_1(X_1, \dots, X_n)$, $Y_2 = g_2(X_1, \dots, X_n)$, and so on up to $Y_n = g_n(X_1, \dots, X_n)$. Given the joint distribution of the $X_i$, you want the joint distribution of the $Y_i$. You can do a similar derivation here, but you have a multidimensional transformation: you are going from the space of $(x_1, \dots, x_n)$ to the space of $(y_1, \dots, y_n)$. So you will have a Jacobian in $n$ dimensions, and the Jacobian in $n$ dimensions is a determinant of partial derivatives. You might have seen this in other fields as well. I will not fuss much about the derivation; it is long but not very instructive. You are just looking for how the differential element $dy_1\,dy_2 \cdots dy_n$ transforms to $dx_1\,dx_2 \cdots dx_n$. So in the situation where each $(y_1, y_2, \dots, y_n)$ has a unique pre-image in the $x$ space, you have a similar Jacobian formula.
So: $X_1, X_2, \dots, X_n$ are random variables with joint density $f_{X_1, \dots, X_n}(x_1, \dots, x_n)$. Then you can write $f_{Y_1, \dots, Y_n}(y_1, \dots, y_n) = f_{X_1, \dots, X_n}\big(x_1(y), \dots, x_n(y)\big)\,|J(y)|$, where $x_i(y)$ is the $i$-th coordinate of the unique pre-image of $y = (y_1, \dots, y_n)$, which is what I was loosely calling $g_i^{-1}$. Let me write $J(y)$ properly. It is the determinant

$$J(y) = \det \begin{pmatrix} \dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_2}{\partial y_1} & \cdots & \dfrac{\partial x_n}{\partial y_1} \\ \dfrac{\partial x_1}{\partial y_2} & \dfrac{\partial x_2}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_2} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial x_1}{\partial y_n} & \dfrac{\partial x_2}{\partial y_n} & \cdots & \dfrac{\partial x_n}{\partial y_n} \end{pmatrix}$$

Does this agree with what I wrote earlier? There, the Jacobian term was in the denominator. The Jacobian of $y$ with respect to $x$ and the Jacobian of $x$ with respect to $y$ are just inverses of each other, so I am checking that the two formulas are consistent. If you have just one function, $g(x) = y$, then $g'(x)\,dx = dy$, so $g'(x)$ is $dy/dx$. In the one-variable formula I have $1/g'(x)$, which is $1/(dy/dx)$, which is the same as $dx/dy$. So the two are consistent: the Jacobian in the denominator there and the Jacobian here are inverses of each other. I was just wondering whether there has to be a power of $-1$ here. There does not; this is correct. So that is your formula, with the determinant taken in absolute value, just as in one dimension.

This is the Jacobian of the transformation from $y$ to $x$: it has the partial derivatives of the $x$'s with respect to the $y$'s. It plays the same role as the $1/g'$ term in the one-variable formula. You can think of this as the $2t$ in $2t\,dt$.
When you put $x = t^2$, $2t$ is your Jacobian; this is your multidimensional Jacobian. Did I make a mistake in the matrix? Wait a second. The first row should be $\partial x_1/\partial y_1$, $\partial x_2/\partial y_1$, and so on; yes, that is fine. The second row should start with $\partial x_1/\partial y_2$, and everything in that row is with respect to $y_2$. The $x$'s go that way, along the rows, and the $y$'s go that way, down the columns. That is correct now, sorry. It does not matter if you go the other way: the determinant, after all, is unchanged when you flip the matrix across the main diagonal. So if you want, you can write $\partial x_1/\partial y_1$, $\partial x_1/\partial y_2$, and so on along the first row. That is also good.

Since I have put down a formula which looks fairly unusable, let me show at least one example of a multidimensional transformation and apply this formula. This is a very famous example. I am going to take $X$ and $Y$ i.i.d. $N(0,1)$, which means $X$ and $Y$ are both standard Gaussians and independent. They both have the $N(0,1)$ density, $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ and similarly for $y$, and they are independent.

To give you a picture: there is some sample space, and $(X, Y)$ takes some value in $\mathbb{R}^2$. If this is your $y$ axis and this is your $x$ axis, $(X, Y)$ can take values anywhere here. It is a four-quadrant distribution: along the $x$ axis you have a standard Gaussian, along the $y$ axis you have a standard Gaussian, and they are independent. Now I am going to make a transformation. Say my point $(X, Y)$ realizes here. Then I define $R = \sqrt{X^2 + Y^2}$ and $\Theta = \arctan(Y/X)$, with the angle taken in the correct quadrant so that it lies in $[0, 2\pi)$. Essentially I am just looking at polar coordinates: that is my $R$ and that is my $\Theta$. Remember, $X$ and $Y$ are random variables that map from $\Omega$, and $(X, Y)$ may realize here, or here for that matter.
Wherever the point realizes, instead of looking at the Cartesian coordinates $x$ and $y$, I am going to look at the polar coordinates: the distance from the origin and the angle made with the $x$ axis. This is simply the standard Cartesian-to-polar transformation. Given that $X$ and $Y$ are i.i.d. standard Gaussians, I want the distribution of $R$ and $\Theta$; ideally, I want the joint distribution of $R$ and $\Theta$. Is the example clear, everybody? $R$ and $\Theta$ are also random variables, which is why I am using capital letters. This is capital theta, by the way, big theta; it is not an h with a circle around it.

Now I have to figure out the Jacobian of the transformation. Actually, you already know the Jacobian of the transformation from $(x, y)$ to $(r, \theta)$: you have done polar integrals, where $dx\,dy$ becomes $r\,dr\,d\theta$. If you want, you can compute it; let me do this. I am going to use $x = r\cos\theta$, $y = r\sin\theta$. So $J(r, \theta)$ is the determinant

$$J(r, \theta) = \det \begin{pmatrix} \dfrac{\partial x}{\partial r} & \dfrac{\partial y}{\partial r} \\ \dfrac{\partial x}{\partial \theta} & \dfrac{\partial y}{\partial \theta} \end{pmatrix}$$

Now, $\partial x/\partial r$ is just $\cos\theta$, $\partial y/\partial r$ is $\sin\theta$, $\partial x/\partial\theta$ is $-r\sin\theta$, and $\partial y/\partial\theta$ is $r\cos\theta$. So the determinant is $r\cos^2\theta + r\sin^2\theta$, which is just $r$. This is why we write $dx\,dy = r\,dr\,d\theta$: $r$ is the Jacobian of your polar transformation. As simple as that. Now you know what that big determinant is; it is just telling you how the differential element transforms.

Good. So now you can go and solve this. You have $f_{R,\Theta}(r, \theta)$ equal to... well, I have to write the density of $(X, Y)$ first.
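As a quick sanity check on the determinant just computed, the partial derivatives can be estimated by finite differences. This is only an illustrative sketch: the function names and the step size `h` are assumptions, and the matrix is laid out in the same order as in the lecture, with the $r$ derivatives in the first row and the $\theta$ derivatives in the second.

```python
import math

def polar_to_cartesian(r, theta):
    """The map (r, theta) -> (x, y) = (r cos(theta), r sin(theta))."""
    return r * math.cos(theta), r * math.sin(theta)

def jacobian_det(r, theta, h=1e-6):
    """Central-difference estimate of
    det [[dx/dr, dy/dr], [dx/dtheta, dy/dtheta]]."""
    x_rp, y_rp = polar_to_cartesian(r + h, theta)
    x_rm, y_rm = polar_to_cartesian(r - h, theta)
    x_tp, y_tp = polar_to_cartesian(r, theta + h)
    x_tm, y_tm = polar_to_cartesian(r, theta - h)
    dx_dr = (x_rp - x_rm) / (2.0 * h)
    dy_dr = (y_rp - y_rm) / (2.0 * h)
    dx_dt = (x_tp - x_tm) / (2.0 * h)
    dy_dt = (y_tp - y_tm) / (2.0 * h)
    return dx_dr * dy_dt - dy_dr * dx_dt

# The estimate should be close to r, whatever the angle.
for r, theta in [(0.5, 0.3), (2.0, 0.7), (3.0, 4.0)]:
    print(r, theta, jacobian_det(r, theta))
```

Up to finite-difference error, the printed determinant matches $r$ at every test point, independent of $\theta$.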
First of all, I know that $f_{X,Y}(x, y) = \frac{1}{2\pi} e^{-(x^2 + y^2)/2}$. The two factors of $\frac{1}{\sqrt{2\pi}}$ multiply to give $\frac{1}{2\pi}$. And why am I multiplying the densities? Because they are independent. Good. Now, $x^2 + y^2$ is simply $r^2$, so this becomes $\frac{1}{2\pi} e^{-r^2/2}$, times the Jacobian, which is $r$. And that is it: $f_{R,\Theta}(r, \theta) = \frac{r}{2\pi} e^{-r^2/2}$. This is my density.

Now, what happened to $\theta$? This is supposed to be a function of $r$ and $\theta$; $r$ is here, no problem, but there is no $\theta$ in the expression. It is constant in $\theta$; that is what it means. If you do not see $\theta$ in the answer, it is constant in $\theta$, which means there is a uniform distribution on $\Theta$. This is valid for $r \ge 0$ and $\theta \in [0, 2\pi)$. So this is the joint density; you do not see $\theta$ in the expression because you see it only in the constraint.

Now that you know the joint density, you can easily find the marginals of both $R$ and $\Theta$. If you want $f_R(r)$, that is $\int_0^{2\pi} f_{R,\Theta}(r, \theta)\,d\theta$, which is $r e^{-r^2/2}$ for $r \ge 0$. That is your density for $R$. And for $f_\Theta(\theta)$ you have to integrate the joint density over $r$ from $0$ to $\infty$; the integral of $r e^{-r^2/2}$ over that range is 1, since after all it is the integral of the density of $R$, so you get $\frac{1}{2\pi}$ for $\theta \in [0, 2\pi)$.

Is that not remarkable? Why is this remarkable? You find that $R$ and $\Theta$ are independent, because $f_R$ times $f_\Theta$ is $f_{R,\Theta}$. So what you see is that if $X$ and $Y$ are i.i.d. Gaussians, then even the polar coordinates $R$ and $\Theta$ are independent. It is a very remarkable property.
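The independence claim can be checked by simulation. The sketch below is an illustration rather than anything from the lecture: the seed, the sample size, and the two events $\{R \le 1\}$ and $\{\Theta \le \pi/2\}$ are arbitrary choices. It compares the empirical joint probability with the product of the empirical marginals, and also with the values implied by the densities above, $P(R \le 1) = 1 - e^{-1/2}$ and $P(\Theta \le \pi/2) = 1/4$.

```python
import math
import random

def sample_polar(rng):
    """Draw (X, Y) i.i.d. N(0,1) and return the polar coordinates (R, Theta)."""
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(0.0, 1.0)
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2.0 * math.pi)  # angle mapped into [0, 2*pi)
    return r, theta

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 200_000
samples = [sample_polar(rng) for _ in range(n)]

# Empirical probabilities of one event on R, one on Theta, and both together.
p_r = sum(r <= 1.0 for r, _ in samples) / n
p_theta = sum(t <= math.pi / 2 for _, t in samples) / n
p_joint = sum(r <= 1.0 and t <= math.pi / 2 for r, t in samples) / n

print("P(R <= 1):        ", p_r, "theory:", 1.0 - math.exp(-0.5))
print("P(Theta <= pi/2): ", p_theta, "theory:", 0.25)
print("P(both):          ", p_joint, "product of marginals:", p_r * p_theta)
```

Agreement between the joint probability and the product for one pair of events is of course only a spot check of independence, not a proof; the proof is the factorization of the density.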
This will not be true in general. It is not because of the independence of $X$ and $Y$ alone; it is also because of the i.i.d. Gaussian nature of $X$ and $Y$ that this is true. Good. So this $f_R$ is called the Rayleigh distribution, and $f_\Theta$ is just a uniform. If you have two i.i.d. Gaussians $X$ and $Y$, the polar distance is distributed as a Rayleigh random variable.

The Rayleigh PDF looks like this. If you plot $f_R(r)$ against $r$, it is only nonzero for $r \ge 0$. For very small $r$, the exponential term behaves like 1, so the function increases linearly. But at some point the $e^{-r^2/2}$ catches up and starts pulling it down. So you will have something like that, and you can figure out where the maximum occurs: differentiate and see. Then the density goes down like that.

This Rayleigh PDF has lots of applications in wireless communications. There is a concept in wireless communication known as fading. When you transmit a wireless signal from here to somebody over there, the signal propagates over space, and as it propagates it hits a lot of reflectors: all these desks and chairs and walls. The signal you receive at the receiver will be a linear combination of variously delayed copies of what you transmit. What you receive can in fact be modeled by two independent Gaussians, what is known as a complex Gaussian, and the total gain, the absolute value of the signal you get, is determined by this random variable $R$, which is Rayleigh distributed. So you often speak of Rayleigh fading in wireless communication, which is the simplest kind of model for a wireless channel. If you do wireless, you will encounter a lot of this stuff.
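To see where the maximum of the Rayleigh density sits, here is a small sketch that evaluates $f_R(r) = r e^{-r^2/2}$ on a grid. The grid range and spacing are arbitrary illustrative choices. Differentiating by hand gives $f_R'(r) = e^{-r^2/2}(1 - r^2)$, which vanishes at $r = 1$, so the grid search should land there.

```python
import math

def rayleigh_pdf(r):
    """Density of R = sqrt(X^2 + Y^2) for X, Y i.i.d. N(0,1)."""
    return r * math.exp(-r * r / 2.0) if r >= 0 else 0.0

# Grid search for the maximum on [0, 5] with spacing 0.001.
grid = [i / 1000.0 for i in range(5001)]
mode = max(grid, key=rayleigh_pdf)

print("location of the maximum:", mode)
print("height of the maximum:  ", rayleigh_pdf(mode))
print("exp(-1/2) for reference:", math.exp(-0.5))
```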
Incidentally, you can also show something about $R^2$. Call it some other random variable, say $\rho = R^2$. Show that $\rho$ is exponentially distributed. You can use the one-dimensional Jacobian formula, because $R$ is non-negative, so $r^2$ is monotonic there. So the power, so to speak, the $R^2$, the signal strength that you receive, will be an exponential random variable. Whatever power you send will be multiplied by some exponential random variable if you send a wireless signal over a fading channel. This is also something you will use a lot if you do wireless.

[Student question about why $R$ is not zero-mean when $X$ and $Y$ are.] No, see, $R$ takes values in $[0, \infty)$. $X$ and $Y$ are zero-mean random variables, but $R$ is not zero-mean. $X$ will be centered around 0 and $Y$ will be centered around 0. If you look at the two-dimensional PDF, just plot it in MATLAB and see, it will look very symmetric: a two-dimensional symmetric bell curve. Most of the points will lie in a nice circular region of, let us say, about $2\sigma$ around the origin, and your $R$ will be distributed as Rayleigh.

[Student question about whether the square root in $R$ is what pushes the mean away from 0.] No, I am saying that even without the square root it is an exponential distribution, whose mean is also positive. See, the radial distance will not be 0; the radial distance is almost surely positive. So you cannot expect that the mean will be 0.
No, you are not getting $(0, 0)$; you are getting something positive for $R$. Most of the time you are getting $R$ around that region, and $\rho = R^2$ will also be positive; it is exponentially distributed. So $R$ is a function of $X$ and $Y$, and $R$ does not have mean 0. It is not because of the square root: even if you take $R^2$, which is simply $X^2 + Y^2$, even $R^2$ will not have mean 0; it is exponentially distributed. You will get some values here, some values there, and so on, but most of the time your $R$ is roughly concentrated around 1: the Rayleigh density has its maximum at $r = 1$, just differentiate and check. It will not be very large, because the Gaussian goes down very rapidly, as dictated by this decaying curve. But taking values very close to 0 is also unlikely, because you are looking at a very small circle. So $R$ is not likely to be very small either.

[Student question about whether $X + Y$ would also fail to have mean 0.] $X + Y$? Correct, $X + Y$ will have mean 0, yes. But $X^2 + Y^2$ is going to be a non-negative random variable. $X$ and $Y$ take both negative and positive values, but $X^2 + Y^2$ will be positive almost surely, which is why the mean is strictly positive.

So this $\rho$ is exponential; it is what is known as the wireless channel gain under a Rayleigh fading channel. And if you are in a position to generate two independent Gaussians, you can automatically generate an exponential, and you can also automatically generate a uniform.
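The exercise of showing that $\rho = R^2$ is exponential can be carried out with the one-variable Jacobian formula. The following is a worked sketch under the lecture's assumptions, with $f_R(r) = r e^{-r^2/2}$ for $r \ge 0$ and $g(r) = r^2$ restricted to $r \ge 0$, so that $g^{-1}(\rho) = \sqrt{\rho}$ and $g'(r) = 2r$.

```latex
\begin{align*}
f_\rho(\rho) &= \frac{f_R\big(\sqrt{\rho}\big)}{\big\lvert g'\big(\sqrt{\rho}\big) \big\rvert}
              = \frac{\sqrt{\rho}\; e^{-\rho/2}}{2\sqrt{\rho}}
              = \tfrac{1}{2}\, e^{-\rho/2}, \qquad \rho > 0, \\
E[\rho] &= \int_0^\infty \rho \cdot \tfrac{1}{2}\, e^{-\rho/2}\, d\rho = 2 .
\end{align*}
```

So $\rho$ is exponential with parameter $\frac{1}{2}$, and its mean of 2 agrees with $E[X^2] + E[Y^2] = 1 + 1$, which is strictly positive as argued in the discussion.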
Conversely, if you are only in a position to generate uniform random variables, you can also generate Gaussians. How can you do that? Say you can only generate uniform random variables in $[0, 1]$; you have an algorithm to do that. You generate one uniform random variable $U_1$ in $[0, 1]$, multiply it by $2\pi$, and call that $\Theta$, which is uniform in $[0, 2\pi]$. Now, independently generate another uniform random variable $U_2$ in $[0, 1]$, take $-\log U_2$, and scale it to get $R^2$. You can show that if $U$ is uniform, $-\log U$ is exponentially distributed with unit parameter. But $R^2$ should be exponential with parameter $\frac{1}{2}$, so there is a scaling by 2: $R^2 = -2\log U_2$. Then you take the square root, and what is the square root of such an exponential? Rayleigh. So you have $(R, \Theta)$, with $R$ Rayleigh and $\Theta$ uniform. You put that point $(R, \Theta)$ in polar coordinates, read off the $x$ coordinate, read off the $y$ coordinate, and you have two independent standard normal random variables.

I think I may have got a scaling wrong on the board. The $\frac{1}{2\pi}$ for $\Theta$... no, that is in fact correct. The scaling problem is here, with $R^2$, but it is a very small problem: $R^2$ is exponential with parameter $\frac{1}{2}$. Is that true? I think that is right.
$R^2$ has to be exponential with parameter $\frac{1}{2}$, but $-\log U_2$ is exponential with parameter 1. So the fix is a factor of 2: halving the parameter means multiplying the random variable by 2, so $R^2 = -2\log U_2$ is exponential with parameter $\frac{1}{2}$, and therefore $R$ is Rayleigh distributed. Now you put this $(R, \Theta)$, generated like this, in polar coordinates: this is your $R$ and that is your $\Theta$. Then you simply read off $R\cos\Theta$ and $R\sin\Theta$. These will be $X$ and $Y$, and they will be independent, identically distributed standard Gaussians. So if you can generate a uniform, you can generate exponentials, you can generate Gaussians, you can generate Rayleigh, whatever you want. Is this example clear?

[Student question about the range of $\Theta$ and the constant in its density.] $2\pi U_1$ varies from 0 to $2\pi$, correct, and $f_\Theta$ will be $\frac{1}{2\pi}$ because $\Theta$ takes values in $[0, 2\pi]$. Let us say $U_1$ and $U_2$ are i.i.d. uniforms in $[0, 1]$. $U_1$ takes values in $[0, 1]$, so $\Theta$ is uniform in $[0, 2\pi]$. That is fine. The small problem was with $R^2$: $-\log U_2$ is exponential with parameter 1, you can show that, but what we want is parameter $\frac{1}{2}$, which is what $R^2$ has. So $-2\log U_2$ is the correct scaling; just do this computation to check. Then you can go ahead and put the point on the two-dimensional plane, read off the $x$ coordinate, read off the $y$ coordinate, and you will have two i.i.d. Gaussians.

So the chapter on transformations is over. Thanks.
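The recipe just described, usually known as the Box-Muller transform, can be sketched in a few lines. This is a minimal illustration assuming Python's `random.Random` as the source of uniforms; the function name, the seed, and the sample size are made up for the example. The scaling used is $R^2 = -2\log U_2$, so that $R^2$ is exponential with parameter $\frac{1}{2}$.

```python
import math
import random

def gaussian_pair(rng):
    """Turn two independent Uniform(0,1) draws into two independent N(0,1) draws."""
    u1 = rng.random()
    u2 = 1.0 - rng.random()             # lies in (0, 1], so log(u2) is finite
    theta = 2.0 * math.pi * u1          # uniform angle on [0, 2*pi)
    r_squared = -2.0 * math.log(u2)     # exponential with parameter 1/2 (mean 2)
    r = math.sqrt(r_squared)            # Rayleigh distributed radius
    return r * math.cos(theta), r * math.sin(theta)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 100_000
pairs = [gaussian_pair(rng) for _ in range(n)]
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

mean_x = sum(xs) / n
mean_y = sum(ys) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / n
cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n

print("means:      ", mean_x, mean_y)
print("variances:  ", var_x, var_y)
print("covariance: ", cov_xy)
```

The sample means and covariance should come out near 0 and the variances near 1, consistent with two independent standard Gaussians. Zero sample covariance alone does not establish independence; that comes from the factorization of the joint density derived in the lecture.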