Welcome back; today we will discuss characteristic functions. If X is a random variable, the characteristic function is defined as C_X(t) = E[e^{itX}], where t is some real number. So t is a real parameter, X is your random variable, i is of course the square root of minus 1 (i² = −1), and e^{itX} is a complex-valued random variable, which is something we have not really talked about so far. So you can actually take this as E[cos tX] + i E[sin tX]: cos tX and sin tX are real-valued random variables parameterized by t, and you take their expectations. Expectation of cos tX plus i times expectation of sin tX; now you know what that means, because these are all real-valued random variables. That is the definition of the characteristic function, and again it is defined for any random variable. For any random variable with probability law P_X, C_X(t) is nothing but ∫ e^{itx} dP_X(x). So given your probability law you can compute the characteristic function; again, e^{itx} is just cos tx + i sin tx, so you can evaluate the real and imaginary parts as usual.

So why do you need this characteristic function? The characteristic function is analogous to a Fourier transform, just as the moment generating function is analogous to a Laplace transform. In particular, when you have a density, if X is a continuous random variable with probability density function f_X, then C_X(t) = ∫_{−∞}^{∞} e^{itx} f_X(x) dx, because in the case of a continuous random variable the Radon-Nikodym theorem tells you that dP_X = f_X(x) dx. So this is like a Fourier transform of the pdf, except that the Fourier transform usually has a negative sign in the exponent; that is the only difference.

So why do you need this, why not just the moment generating function? The main reason is that the moment generating function does not always exist outside s = 0 (at s = 0 the moment generating function always equals 1). As we saw in the case of a Cauchy random variable, for example, the moment generating function was undefined for all other arguments. On the other hand, the characteristic function is always well defined and finite for any random variable. The reason that is true is the following; it is true more generally, but just look at it for a continuous random variable. If you take the absolute value of the integral, you can bound it above by the integral of the absolute value, and the absolute value of e^{itx} is 1. So |C_X(t)| ≤ 1 always. In particular this integral is absolutely convergent, hence uniformly convergent, for all real t, so the characteristic function is always well defined. That is one thing: you can handle even random variables such as the Cauchy, that is, heavy-tailed random variables whose tails decay more slowly than any exponential, for which the moment generating function is not defined outside s = 0, but the characteristic function will be defined.
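To make the definition concrete, here is a minimal numerical sketch (my own addition, not part of the lecture): it estimates C_X(t) = E[cos tX] + i E[sin tX] by Monte Carlo for a standard Cauchy sample and checks that |C_X(t)| ≤ 1, even though the moment generating function is undefined away from s = 0. The use of NumPy and the sample size are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(500_000)   # heavy-tailed sample; MGF undefined off s = 0

def char_fn_mc(t, samples):
    """Monte Carlo estimate of C_X(t) = E[cos(tX)] + i E[sin(tX)]."""
    return np.mean(np.cos(t * samples)) + 1j * np.mean(np.sin(t * samples))

for t in [0.0, 0.5, 1.0, 2.0]:
    c = char_fn_mc(t, x)
    print(f"t = {t:.1f}: C_X(t) ~ {c:.4f}, |C_X(t)| = {abs(c):.4f} <= 1")
```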
The other reason is that, at some level, inverting the characteristic function is a little easier than inverting the moment generating function, for the simple reason that inverting a Fourier transform is a little easier: you can write down an explicit formula that does not involve complicated contour integrals over the complex plane. In the case of the moment generating function you usually have to do pattern matching; here you can actually write down a more elementary inversion formula. That is the more practical reason.

There is one thing I want to point out, and I think my point will be clear if I give you an example. Let X be exponential with parameter μ, so f_X(x) = μ e^{−μx} for x ≥ 0. You are used to computing Fourier transforms in signals and systems, so you would expect the characteristic function to be the same thing, except that you have to be careful about the minus sign: wherever you have a minus sign in the Fourier transform you essentially have to get rid of it, that is, in whatever answers you know for Fourier transforms, replace the argument with −t and you should get the characteristic function. Yes, that is true, but I want to make some cautionary remarks; I have to give you some caveats here. In particular, if you write it out, here is my pdf and my characteristic function will be C_X(t) = ∫_0^∞ μ e^{−μx} e^{itx} dx. Now, this is an integral you would have computed in signals and systems. What you normally do there is combine the exponents and write this as ∫_0^∞ μ e^{−(μ − it)x} dx, and then the answer is μ/(μ − it), valid for all real t. In going from the first expression to the second, you are essentially treating μ − it as some real number a, and you are writing the integral down pretending that μ − it is a real number, whereas it is not: it is in fact a complex number. (Similarly, in signals and systems you get a plus sign instead; the Fourier transform of this one-sided exponential is μ/(μ + it), and that sign is the only difference.) Now, this step is not really correct. It so happens that, by sheer luck, you get this answer if you pretend that μ − it is a real number. What you really have is a function of a complex variable, and this integral should be treated as an integral over the contour given by the real line. So, strictly speaking, to evaluate this integral you have to perform contour integration. In fact, the Fourier transforms you have been evaluating in signals and systems are, strictly speaking, contour integrals. You see what I mean: what you do in signals and systems when you compute Fourier transforms this way is not really justified.
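As a quick sanity check (my own sketch, not from the lecture), one can evaluate the real and imaginary parts of the exponential's characteristic function separately by numerical quadrature and compare against μ/(μ − it), the answer the "pretend it is real" shortcut gives. The parameter values and the use of SciPy's quad are illustrative assumptions.

```python
import numpy as np
from scipy.integrate import quad

mu, t = 2.0, 1.5   # example parameter values, chosen only for illustration

# Real and imaginary parts of C_X(t) = integral_0^inf mu e^{-mu x} e^{i t x} dx,
# evaluated as two ordinary real integrals.
re, _ = quad(lambda x: mu * np.exp(-mu * x) * np.cos(t * x), 0, np.inf)
im, _ = quad(lambda x: mu * np.exp(-mu * x) * np.sin(t * x), 0, np.inf)

closed_form = mu / (mu - 1j * t)
print(re + 1j * im)      # quadrature of real/imaginary parts
print(closed_form)       # mu / (mu - it): the shortcut answer
```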
It so happens that in this case you get the correct answer: even if you perform the contour integration correctly, you get the same answer you get by treating it as a real number, by luck. There are examples where you will not get the correct answer if you just pretend that μ − it is a real number. Now that I am talking about that, let me give you the Cauchy example I mentioned. One of the nice things about the characteristic function is that it can help you handle things like the Cauchy distribution, which the moment generating function does not allow you to handle. So take the Cauchy, f_X(x) = 1/(π(1 + x²)). In this case C_X(t) = ∫_{−∞}^{∞} e^{itx} / (π(1 + x²)) dx. Now here you actually cannot pretend that it is a real number and do the integral; you will not get the answer that way. In fact, you might have met this function before: what you have, after all, is a function that looks like 1/(1 + x²), and you know its Fourier transform, or at least its functional form. From your signals and systems you know that the transform should look like e raised to minus some constant times |t|.

So let us go back to Fourier transforms from signals and systems for a little bit. Write f(x) for the function and F(t) for its transform. You know relations like this: if f(x) = e^{−ax} for x ≥ 0, the Fourier transform is 1/(a + it); and if f(x) = e^{−a|x|}, the transform is 2a/(a² + t²); you know these. Good. Now, the first transform you compute explicitly; again, in computing it you pretend things are real and add it up, and you get this answer. Then you invoke Fourier duality to say: oh, if you have something that looks like 1/(1 + x²), the transform should look like e^{−|t|}. I am missing some constants, I am surely missing some constants here, but I do not really care about them; I am just saying that if you are given a function that looks like the transform, you use duality to say that its transform should look like the original function. That is what you do. So why do you not integrate this function directly, pretending e^{itx} behaves like a real quantity? You see what I am saying: you never calculate the Fourier transform of 1/(1 + x²) by integration; you always invoke Fourier duality. The reason is that in signals and systems you explicitly calculate the transform by integrating precisely in those cases where you can get away with pretending that it is a real number; in cases like this, where you cannot get away with it, you are told not to integrate but to use Fourier duality or some other property to get the answer.
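If you want to check the claimed answer without duality or contour integration, here is a small numerical sketch (my own addition): since the Cauchy density is even, the imaginary part of the integral vanishes and the characteristic function reduces to a real cosine integral, which quadrature can evaluate and compare against e^{−|t|}. The use of scipy.integrate.quad and its limit parameter are just illustrative choices.

```python
import numpy as np
from scipy.integrate import quad

def cauchy_cf(t):
    """C_X(t) for the standard Cauchy, by direct quadrature.

    The density is even, so the sin part integrates to zero and
    C_X(t) = integral of cos(t x) / (pi (1 + x^2)) over the real line.
    """
    val, _ = quad(lambda x: np.cos(t * x) / (np.pi * (1 + x**2)),
                  -np.inf, np.inf, limit=400)
    return val

for t in [0.5, 1.0, 2.0]:
    print(t, cauchy_cf(t), np.exp(-abs(t)))   # should agree with e^{-|t|}
```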
Now, the fact of the matter is that in all these cases, blindly integrating while pretending the complex quantity is a real number is actually wrong; in some cases it gives you the right answer and in some cases it does not. This is something I want to point out. Actually doing this properly is possible, but you need contour integration. That is not something I will expect of you; I just want to mention that there is more to it than what you studied in signals and systems, since you are no longer an undergraduate studying signals and systems.

So if you have a function like 1/(1 + x²), you essentially have to use contour integration and then invoke the residue theorem, which you might have studied in complex analysis. What you do in evaluating something like this is the following: 1/(1 + x²) has two poles, a pole at i and a pole at −i, and you are basically integrating over the real line. So you perform contour integration over a big contour, the real segment closed by a large semicircle, compute that integral as 2πi times the residue at i (and similarly for the other pole, depending on which half-plane you close in), and then argue that the contribution of the arc goes to zero. Finally you get the answer you expect: e^{−|t|}; you have to do it separately for t > 0 and t < 0, and you get this answer. So even here you have to do contour integration; it so happens that you get this answer, but here you cannot even make it plausible by the shortcut, because you can no longer treat the exponent as a real quantity. Is the point clear? Even for the Fourier transform this is how you should do it, strictly speaking: even when you are finding the Fourier transform of the one-sided exponential you should actually do contour integration. Normally in signals and systems you pretend things are real, but you only do that in cases where you can get away with it, and in cases where you cannot, you use some other trick. I am just pointing that out. This is actually a very useful transform to know: for the standard Cauchy, e^{−|t|} is the characteristic function.

Any questions so far? See, in cases where the moment generating function exists in an interval, you can actually obtain the characteristic function by simply putting s = it. When the moment generating function exists in some interval around the origin, you know that it is an analytic function, and it is in fact analytic on the imaginary axis as well. So in those cases, for example in the case of the Gaussian, you can just blindly put s = it to get the characteristic function.

So let me just state some elementary properties. First, if Y = aX + b, then C_Y(t) = e^{ibt} C_X(at); this again follows from the definition. Second, if you have independent random variables X and Y and Z = X + Y, the characteristic function of Z is given by the product of the two characteristic functions: C_Z(t) = C_X(t) C_Y(t).
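For completeness, here is the residue computation sketched above, written out for t > 0; this working is my reconstruction of the step the lecture describes verbally (close the contour in the upper half-plane, where e^{itz} decays, and pick up the pole at z = i).

```latex
% Residue computation for the standard Cauchy characteristic function, t > 0.
\[
C_X(t) = \int_{-\infty}^{\infty} \frac{e^{itx}}{\pi(1+x^2)}\,dx
       = 2\pi i \cdot \operatorname*{Res}_{z=i}\,\frac{e^{itz}}{\pi(z-i)(z+i)}
       = 2\pi i \cdot \frac{e^{it\cdot i}}{2\pi i}
       = e^{-t}, \qquad t > 0.
\]
\[
\text{Since the density is real and even, } C_X(-t) = \overline{C_X(t)} = C_X(t),
\text{ so } C_X(t) = e^{-|t|} \text{ for all } t \in \mathbb{R}.
\]
```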
This is similar to the corresponding property for moment generating functions, except that you have an advantage. What is the advantage? It exists. The same thing is true for the moment generating function, but if you are dealing with, say, two Cauchy random variables, the moment generating function will not exist. So if X and Y are i.i.d. Cauchy and you want to figure out what their sum looks like, you cannot use moment generating functions, but you can use characteristic functions. In particular, you can do this example for yourself: if X and Y are independent and identically distributed according to the standard Cauchy distribution, you know their characteristic function. If Z is the sum of the two, its characteristic function will be e^{−2|t|}: you multiply the characteristic function of X with that of Y, which is the same thing. So C_Z(t) = e^{−2|t|}, which is of the same functional form, e^{−a|t|}, and therefore Z must also be a Cauchy distribution, with scale parameter 2. So using characteristic functions you can figure out that the sum of two independent Cauchys is also a Cauchy, which you cannot figure out using moment generating functions. The proof of the product property is straightforward, by the way: it is just E[e^{itZ}] with Z = X + Y, and the proof is exactly parallel to the one for the moment generating function.

So now you know a new result: you knew that the sum of two independent Gaussians is a Gaussian, and now you also know that the sum of two independent Cauchy random variables is a Cauchy, and you cannot figure that out using moment generating functions; you have to use characteristic functions. The Cauchy is also a so-called stable distribution: you keep adding independent Cauchys and you keep getting Cauchy distributions, just as for the Gaussian, which we already knew. For the Gaussian case you can use either the characteristic function or the moment generating function, because both exist, no problem.

So these two are elementary properties. What else? The next one is not an elementary property, but I will state it without proof: if M_X(s) < ∞ for s in (−ε, ε), then C_X(t) = M_X(it). This is just what I mentioned earlier. See, if your moment generating function is defined only at the origin, then bad luck; it may not say much about the distribution. But if the moment generating function is defined in a neighborhood of the origin, then some very nice properties kick in: the moment generating function is analytic, it extends analytically to a strip of the complex plane containing the imaginary axis (the jω axis, or whatever you want to call the vertical axis). So you can just substitute s = it and get the characteristic function.
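A quick simulation makes this concrete (my own sketch; the sample size and seed are arbitrary): the empirical characteristic function of Z = X + Y for i.i.d. standard Cauchy samples should match e^{−2|t|}, the characteristic function of a Cauchy with scale parameter 2.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(1_000_000)
y = rng.standard_cauchy(1_000_000)
z = x + y                                    # sum of two i.i.d. standard Cauchys

for t in [0.5, 1.0, 2.0]:
    emp = np.mean(np.exp(1j * t * z))        # empirical characteristic function
    print(t, abs(emp), np.exp(-2 * abs(t)))  # should be close to e^{-2|t|}
```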
So if X is normal(0, 1), we know M_X(s) = e^{s²/2}, and this is in fact valid for all s in the complex plane. So C_X(t) = M_X(it) = e^{−t²/2} for all t ∈ ℝ. The characteristic function of a standard Gaussian is e^{−t²/2}; it also looks like a standard Gaussian density, except for the missing 1/√(2π) factor. Now suppose you invoke the scaling property: if Y = σX + μ, then Y is normal(μ, σ²), and in that case the characteristic function is C_Y(t) = e^{iμt} C_X(σt) = e^{iμt − σ²t²/2}; I am just using the rule C_{aX+b}(t) = e^{ibt} C_X(at). So that gives you the characteristic function of a non-standard Gaussian with mean μ and variance σ². Now that you know this, if you have two independent Gaussians Y₁ and Y₂ with parameters (μ₁, σ₁²) and (μ₂, σ₂²), you can show that Z, the sum of the two, is also Gaussian, with the means adding up and the variances adding up. You can get it from here, but this result you could also get using the moment generating function, whereas in the Cauchy case you have to use the characteristic function, because the moment generating function does not exist. Is this clear?

The next result is, I guess, not particularly elementary, and we will not prove it: the inversion theorem. This is just like the inverse Fourier transform; it is a highly nontrivial theorem, by the way, and I will just state it for characteristic functions. If two random variables have the same characteristic function, then their CDFs are the same. Further, if X is a continuous random variable, then the pdf can be recovered from the characteristic function as follows: f_X(x) = lim_{T→∞} (1/2π) ∫_{−T}^{T} e^{−itx} C_X(t) dt. I believe this holds at every x where f_X is continuous. This is like the Fourier transform inversion, except that for Fourier inversion you are used to writing ∫_{−∞}^{∞}, which is not quite the same as the limit as T → ∞ of ∫_{−T}^{T}: if you write ∫_{−∞}^{∞}, the two limits can go to infinity at different rates. The statement with the symmetric limit is the correct one. So this holds where f_X is continuous, and if I remember correctly (this I do not remember exactly), if f_X has a jump, then the inverse transform converges to the midpoint of the discontinuity; this is again like your Fourier inversion theorem. That is something I do not completely remember, but I believe it holds at points of discontinuity: at a discontinuity, the inverse transform converges to the midpoint. Again, this is something you will not really use very often; it is usually much easier to do pattern matching and figure out what the density is, if at all there is a density, but this theorem tells you it can be done.
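To illustrate the inversion theorem, here is a minimal sketch of my own (the cutoff T and the use of SciPy are illustrative choices): plug the standard Gaussian characteristic function into the symmetric-limit formula and recover the N(0, 1) density numerically. The imaginary part of the integrand cancels because this characteristic function is real and even.

```python
import numpy as np
from scipy.integrate import quad

def gaussian_pdf_via_inversion(x, T=50.0):
    """Recover the N(0,1) density from C_X(t) = exp(-t^2/2) via
    f(x) = (1/2*pi) * integral_{-T}^{T} e^{-itx} C_X(t) dt (real part only)."""
    integrand = lambda t: np.exp(-t**2 / 2) * np.cos(t * x)   # Re[e^{-itx} C_X(t)]
    val, _ = quad(integrand, -T, T)
    return val / (2 * np.pi)

for x in [0.0, 1.0, 2.0]:
    exact = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
    print(x, gaussian_pdf_via_inversion(x), exact)
```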
So, given your characteristic function you can obtain your density whenever there is a density. In cases where there is no density, there is actually a more general way to recover the CDF. I do not know if I should state that; let me not, because I do not have that much time. The characteristic function is defined even when X is not continuous, and if you are given the characteristic function of a random variable that is not continuous, there is a way to invert it to obtain the CDF. Maybe I will put that in the notes; let me not spend class time on it.

Now, this is a very important theorem. I will call these the defining properties; you will very soon realize why these are the defining properties of a characteristic function. If X is a random variable with characteristic function C_X(t), then:

(a) C_X(0) = 1, and |C_X(t)| ≤ 1 for all t ∈ ℝ. This is something you will understand very easily: C_X(0) is obviously 1, it is the expectation of 1, you are putting t = 0; and I already argued that |C_X(t)| ≤ 1 (I did not write it out, but I argued it).

(b) C_X is uniformly continuous on ℝ, i.e., there exists a function φ(h) (maybe I should call it φ rather than ψ, since I use ψ for conditional expectation) with φ(h) → 0 as h → 0 such that |C_X(t + h) − C_X(t)| ≤ φ(h) for all t ∈ ℝ. This essentially says that the characteristic function is a continuous function of t, in fact a uniformly continuous function of t. Uniform continuity is a stronger form of continuity: uniform continuity implies continuity, but not the other way around, and uniform continuity is exactly what I have written here. Normally, by continuity you only mean that this absolute difference goes to 0 as h → 0, for each fixed t. Here you are demanding more: no matter what t you choose, the difference is bounded above by the same function φ(h), which goes to 0 as h → 0. In some sense you are saying that the bound on this difference does not depend on the value of t; irrespective of t, it is bounded above by φ(h). That is saying more than continuity; it is called uniform continuity.

(c) C_X is a non-negative definite kernel, i.e., for any n, any real t₁, t₂, …, t_n, and any complex z₁, z₂, …, z_n, we have Σ_{j,k} z_j C_X(t_j − t_k) z̄_k ≥ 0. This is a very fundamental property of the characteristic function; it does look a little hard to digest in the beginning, so I will help you parse it.
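Here is a small numerical illustration of property (c), using the standard Gaussian characteristic function as an example (the choice of C_X, of n, and of the random t's and z's is mine): form the matrix with entries C_X(t_j − t_k) and verify that the complex quadratic form is non-negative, equivalently that the matrix is positive semi-definite.

```python
import numpy as np

cf = lambda t: np.exp(-t**2 / 2)          # standard Gaussian characteristic function

rng = np.random.default_rng(2)
t = rng.uniform(-3, 3, size=6)            # arbitrary real t_1, ..., t_n
M = cf(t[:, None] - t[None, :])           # matrix with (j, k) entry C_X(t_j - t_k)

z = rng.normal(size=6) + 1j * rng.normal(size=6)   # arbitrary complex z_1, ..., z_n
quad_form = z.conj() @ M @ z              # the complex quadratic form from property (c)

print(quad_form.real >= -1e-12)           # non-negative, up to round-off
print(np.linalg.eigvalsh(M).min())        # equivalently, all eigenvalues of M are >= 0
```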
So, what you are saying is that you are considering a complex quadratic form. Fix any positive integer n, pick any real t₁, t₂, …, t_n you want and any complex z₁, z₂, …, z_n you want, and look at the sum above. Let me put this in matrix form, which is a little easier to digest: you look at the matrix whose (j, k)-th entry is C_X(t_j − t_k). The diagonal entries will all be 1, because on the diagonal you have t_j − t_j = 0 and C_X(0) = 1, and you will have the cross terms C_X(t_j − t_k) in the (j, k)-th entries. Then what you are really doing is sandwiching this matrix between the row vector of conjugates (z̄₁, z̄₂, …, z̄_n) and the column vector (z₁, z₂, …, z_n); you are looking at a quadratic form of that kind, come to think of it. And you are saying that no matter what the t_j's are and no matter what the z_j's are, this complex quadratic form is always non-negative. Is it clear what it says? I am saying the sum is exactly the same as this: you choose your t_j's and form this matrix, and the claim is that this matrix is a non-negative kernel in the sense that all complex quadratic forms built from it are non-negative.

So we have to prove these results. The first part is easy, quite trivial; (b) and (c) need some work. How am I doing on time? I probably have just two or three minutes; it says five minutes, so let me just do part (b), and part (c) we will do next class. I am trying to prove uniform continuity. You have |C_X(t + h) − C_X(t)| = |E[e^{i(t+h)X}] − E[e^{itX}]| = |E[e^{itX}(e^{ihX} − 1)]|; agree with that? Now you can show that this is less than or equal to E[|e^{ihX} − 1|], because |e^{itX}| = 1 and the absolute value of an expectation is at most the expectation of the absolute value. So this is what you have to look at. If you write e^{ihX} as cos hX + i sin hX, you can show that |e^{ihX} − 1| = √(2 − 2 cos hX), just a trigonometric identity, and then this equals 2|sin(hX/2)|. I am just expanding: the absolute value of a complex number is the square root of the real part squared plus the imaginary part squared; you use sin² + cos² = 1 and all that, and you end up with this. So essentially what we are saying is that this quantity is at most 2. So this expectation, E[|e^{ihX} − 1|], is my φ(h), and I have to prove that φ(h) → 0 as h → 0. Since the integrand is bounded, I can use the dominated convergence theorem and take the limit inside the expectation.
So it is dominated above by 2, I can take the limit inside, and |e^{ihX} − 1| obviously goes to 0 pointwise as h → 0. So by the dominated convergence theorem φ(h) → 0, and you get the result; that was the step I was missing. Maybe we will recap this next class. Let us stop here.
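For reference, here is the chain of inequalities from the last few minutes written out in one place; this is my reconstruction of the argument exactly as stated in the lecture.

```latex
% Uniform continuity of the characteristic function.
\[
\bigl|C_X(t+h) - C_X(t)\bigr|
  = \bigl|\mathbb{E}\!\left[e^{itX}\bigl(e^{ihX}-1\bigr)\right]\bigr|
  \le \mathbb{E}\bigl|e^{ihX}-1\bigr|
  = \mathbb{E}\!\left[\,2\left|\sin\tfrac{hX}{2}\right|\,\right] =: \varphi(h),
  \quad \text{independently of } t.
\]
\[
\text{Since } 2\left|\sin\tfrac{hX}{2}\right| \le 2 \text{ and } \to 0 \text{ pointwise as } h \to 0,
\text{ dominated convergence gives } \varphi(h) \to 0.
\]
```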