So, we were discussing the MMSE estimate. First of all, we defined the conditional expectation E[Y | X] as a random variable which is measurable with respect to the sigma algebra σ(X). In more plain language, E[Y | X] is a random variable which depends only on X; to say it a little more precisely mathematically, it is a σ(X)-measurable random variable. We call this ψ(X). We defined this only for the case when X and Y are discrete and for the case when X and Y have a joint pdf, that is, when they are jointly continuous. Only in those two cases could we define it: as an expectation with respect to the conditional pmf or the conditional density, after which we replace the observed value with the random variable X itself. That is how we defined it, in a somewhat elementary way.

We also said that the law of iterated expectation holds: E[E[Y | X]] = E[Y]. This we proved by explicitly writing it out for the case of discrete and for jointly continuous random variables. And we said that for any measurable g, E[(Y − ψ(X)) g(X)] = 0, whenever this expectation is defined; this again you can show by explicitly writing it out. Another way of saying it is that E[Y g(X)] is the same as E[g(X) E[Y | X]].

Now, this result we interpreted geometrically. You can view ψ(X) as an estimate of Y that depends only on X. Suppose X and Y are dependent random variables and you want to estimate Y, but you cannot observe Y; you can only observe X, and you want to somehow come up with an estimate of Y. If you view E[Y | X] as such an estimate, a random variable which estimates Y based only on X, then Y − ψ(X) represents your estimation error. What the property above says is that this estimation error is orthogonal to any function of what you observe.

We already established that the square integrable random variables form a Hilbert space, with the covariance playing the role of the inner product. In that sense I drew a picture: the σ(X)-measurable random variables form a subspace of this Hilbert space. I draw the whole space like R², but it is really the Hilbert space of square integrable random variables. If your random variable Y is some point in this space and you have to estimate it using only elements of the subspace — that is, you have to pick a σ(X)-measurable random variable which is the best estimate of Y — then geometrically it seems clear that you should pick the foot of the perpendicular from Y onto the subspace. This foot of the perpendicular is really what the conditional expectation is, so that Y − ψ(X), your estimation error, is orthogonal to the subspace of σ(X)-measurable random variables. That is the interpretation we gave.
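To make this orthogonality concrete, here is a minimal Monte Carlo sketch in Python. It is not from the lecture: the joint model (X uniform on (0, 1), Y = X² plus Gaussian noise, so that ψ(X) = X² exactly) and the test functions g are my own choices purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X = rng.uniform(0.0, 1.0, size=n)
Y = X**2 + rng.normal(0.0, 0.5, size=n)   # by construction E[Y | X] = X**2

psi_X = X**2                              # the conditional expectation psi(X)
error = Y - psi_X                         # estimation error Y - psi(X)

# E[(Y - psi(X)) g(X)] should be ~0 for any measurable g; try a few choices of g.
for g in (lambda x: x, lambda x: np.sin(5 * x), lambda x: np.exp(x)):
    print(np.mean(error * g(X)))          # all close to 0, up to Monte Carlo error
```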
So, this is a result you can prove explicitly by writing it out for discrete random variables and for jointly continuous random variables. But for more general cases, where X and Y may be anything — mixtures, singular distributions, whatever — we have not even defined the conditional expectation; we do not know how to define it. In fact, this orthogonality property is taken as the definition of conditional expectation in the general case. One can prove that for any two random variables X and Y there exists a σ(X)-measurable random variable such that the error, Y minus that random variable, is orthogonal to g(X) for every measurable g, and such a random variable can be shown to be unique except on a set of measure zero. So the conditional expectation exists uniquely in an almost sure sense. That existence can be proved, and that is how conditional expectation is defined in a more general setting: the orthogonality is taken as the defining equation. Is that clear?

Then we also said that ψ(X) is an MMSE estimate of Y: for any function h, E[(Y − ψ(X))²] ≤ E[(Y − h(X))²]. Any h(X) is a σ(X)-measurable random variable — some random variable in that subspace. What we are saying is that among all random variables in that subspace, the expected squared error — the mean squared error, as it is called — is smallest when you choose this particular ψ(X). That is again fairly clear because of the orthogonality interpretation: if this were an ordinary Euclidean space (it is not; it is a Hilbert space), you would obviously choose the foot of the perpendicular as the closest point, and that intuition carries over perfectly. Therefore ψ(X) is called the minimum mean squared error estimate, or MMSE estimate.

How do you prove this? Write Y − h(X) = (Y − ψ(X)) + (ψ(X) − h(X)) and expand:

E[(Y − h(X))²] = E[(Y − ψ(X))²] + E[(ψ(X) − h(X))²] + 2 E[(Y − ψ(X))(ψ(X) − h(X))].

Do you agree with this equation? I am just expanding everything out; I hope I have not made a mistake. Now look at the cross term: ψ(X) − h(X) is itself a function of X, some g(X), so by the orthogonality property the cross term is zero. The second term is obviously non-negative, so E[(Y − h(X))²] must be greater than or equal to E[(Y − ψ(X))²]. In fact, since the cross term is zero, E[(Y − h(X))²] = E[(Y − ψ(X))²] + E[(ψ(X) − h(X))²], which is simply Pythagoras' theorem in this space. So the result follows, and the proof uses only the orthogonality fact, which again you prove using the elementary definition for discrete and jointly continuous random variables.
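Here is a similar sketch, again using my own toy model rather than anything from the lecture, checking the MMSE property and the Pythagoras decomposition numerically: the mean squared error of ψ(X) is smaller than that of an arbitrary competing estimator h(X), and E[(Y − h(X))²] ≈ E[(Y − ψ(X))²] + E[(ψ(X) − h(X))²] up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

X = rng.uniform(0.0, 1.0, size=n)
Y = X**2 + rng.normal(0.0, 0.5, size=n)   # by construction psi(X) = E[Y | X] = X**2

psi = X**2                                # the MMSE estimate
h = 0.8 * X + 0.1                         # an arbitrary competing estimator h(X)

mse_psi = np.mean((Y - psi)**2)
mse_h = np.mean((Y - h)**2)
gap = np.mean((psi - h)**2)

print(mse_psi, mse_h)                     # mse_psi <= mse_h
print(mse_h, mse_psi + gap)               # Pythagoras: these agree up to Monte Carlo error
```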
But what you can show in more generality is the following. The orthogonality is taken as the defining property, and you have to show that such a ψ(X) exists and is unique up to a set of measure zero. In more general settings this is not something you have to worry about too much, but let me just indicate it. Consider E[(Y − Z)²] and take the infimum over all Z which are σ(X)-measurable; that is, Z ranges over random variables lying in that subspace, and you are looking for the particular Z which minimizes the squared error from Y. This infimum is well defined, and the point is that it is attained. The reason it is attained is that this is a closed subspace of a Hilbert space, and Hilbert spaces are complete — the space does not have any holes — so there must exist a Z which actually achieves the infimum. That minimizing Z is your conditional expectation, and it satisfies the orthogonality property by simple algebra. That is how you define conditional expectation in a more general setting, at least for square integrable random variables, which is exactly the brief indication I promised of how conditional expectation is defined in general. So the infimum is attained by some σ(X)-measurable random variable, attained because the space is complete, being a Hilbert space; that Z is called the conditional expectation, and it is unique up to a set of measure zero. Are there any questions? This concludes our study of conditional expectations.

Next we move on to the next module, on transforms. We will study three types of transforms: first the probability generating function, or PGF; then the moment generating function; and then, probably in most detail, the characteristic function. Good references for this are Grimmett and Stirzaker, chapter 5, and the OpenCourseWare lectures 14 and 17; of course, I will also upload notes.

These transform techniques can be thought of, at some level, as frequency domain techniques. Just as in signals and systems you have time domain techniques and frequency domain techniques, you can think of your pdfs and pmfs as time domain functions and these transforms as frequency domain functions; that is the rough level of the analogy, so your knowledge of signals and systems will come in handy in this section. The probability generating function, which is commonly used for discrete random variables, particularly integer valued random variables, is akin to the Z transform for discrete time signals. The moment generating function is similar to the Laplace transform, and the characteristic function is similar to the Fourier transform, in a rough sense that you will see properly. These transform techniques are very useful in a number of settings.
One simple use is that sometimes they help you compute things much more easily. For example, if you have to convolve densities or convolve pmfs, it is much easier — as in signals and systems — to go to the frequency domain, multiply, and invert the transform back. So computationally they are useful. They are also very useful in analyzing certain stochastic processes, like branching processes and random walks, and in proving limit theorems such as the central limit theorem and the law of large numbers. So they have a number of uses.

Let us take the probability generating function. Let X be an integer valued random variable, and define G_X(z) = E[z^X]; that is the definition of the probability generating function. Since X is integer valued, you can write this as G_X(z) = Σ_i z^i P(X = i). Here z is just a parameter; in general z could be a complex number, so what you are looking at is the sum of a complex series. If you were to look at the pmf as a discrete time signal, this looks very much like your Z transform — usually you have z^{−i} in the Z transform, but otherwise it is analogous — and you sum over all i.

Now, of course, when you write down a series like this there will be all sorts of questions about whether the series converges and what kind of convergence it is — it is a complex series, after all — so you have to talk about what the region of convergence is and whether it converges absolutely, uniformly, and so on. These convergence issues we will push under the rug a little bit for this course, because they require some knowledge of complex analysis. You already know from your study of signals and systems what kind of regions of convergence Z transforms have: in general they are discs or annular regions. Something similar is true here: there exists a radius of convergence. Let me state that. There exists R, possibly infinite, such that the PGF — meaning the series — converges for all z with |z| < R and diverges for |z| > R. Actually, I think I am missing a qualification here: I believe this particular statement is true for non-negative integer valued random variables; otherwise you could have z^{−i} terms and you may get annular regions of convergence. Let me confirm this, but I think the statement is true for non-negative integer valued random variables. This R may also be infinity, which means the series converges in the whole complex plane.

The one thing you can say for sure, though, is that the region of convergence includes |z| = 1. Because if |z| = 1, then |G_X(z)| ≤ Σ_i |z|^i P(X = i) = Σ_i P(X = i) = 1.
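As a quick sanity check of the definition, here is a small sketch of my own, using a Poisson pmf with a hypothetical parameter λ = 2 and an evaluation point z = 0.7 inside the region of convergence: the truncated series Σ_i z^i P(X = i) agrees with a Monte Carlo estimate of E[z^X].

```python
import numpy as np
from math import exp, factorial

lam, z = 2.0, 0.7                          # hypothetical Poisson parameter and evaluation point

# Truncated series: sum_{i=0}^{59} z^i P(X = i) with P(X = i) = e^{-lam} lam^i / i!
pgf_series = sum(z**i * exp(-lam) * lam**i / factorial(i) for i in range(60))

# Monte Carlo estimate of E[z^X]
rng = np.random.default_rng(2)
samples = rng.poisson(lam, size=1_000_000)
pgf_mc = np.mean(z**samples.astype(float))

print(pgf_series, pgf_mc)                  # the two agree up to Monte Carlo error
```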
So, for |z| = 1 you have absolute convergence, and therefore uniform convergence, so the region of convergence will certainly include the unit circle, whatever else it may or may not include. Normally, if this were some arbitrary discrete time signal, that would not necessarily be the case, but because the coefficients form a pmf, your region of convergence includes the unit circle. That much is clear. And whenever you are inside the region of convergence you have absolute convergence and therefore uniform convergence, so the PGF is well defined. That is as much as I am going to say about convergence issues. I am not going to dwell on this too much because we are mostly going to use the PGF as a tool rather than worry about its existence; existence, analyticity — you can discuss all sorts of things for this complex function, but we will use it mostly as a computational tool. I will confirm the correctness of the statement above; I am pretty sure it holds for non-negative integer valued random variables.

Let us do an example. Take P(X = i) = e^{−λ} λ^i / i! for i = 0, 1, 2, …, a Poisson random variable. Here G_X(z) = Σ_{i=0}^∞ z^i e^{−λ} λ^i / i!, which is simply e^{−λ} times the series for e^{λz}, so this becomes e^{λ(z−1)}. That converges for every z and is analytic in the whole complex plane; so this holds for any z in ℂ.

On the other hand, if you try doing this for a geometric random variable, P(X = i) = (1 − p)^{i−1} p for i = 1, 2, …, then you get G_X(z) = Σ_{i=1}^∞ z^i (1 − p)^{i−1} p = pz / (1 − (1 − p)z), and that holds for |z| < 1/(1 − p). That is your radius of convergence, or region of convergence. It clearly includes |z| = 1 — it always includes |z| = 1 — and it is a little bigger than the unit disc, how much bigger depending on p; for |z| beyond 1/(1 − p) the series diverges.

In terms of notation, I am using capital G for the generating function, with the random variable as a subscript, and z as the argument.

Now let us look at some properties. Perhaps property 0 should be that G_X(1) = 1, always; you can see this in both examples above. The first property is that if you take dG/dz and evaluate it at z = 1, you get the expectation of X: G_X′(1) = E[X]. So if you know the probability generating function, you differentiate it with respect to z, set z = 1, and you get E[X]. Why is this true? First of all, does this derivative always exist? The region of convergence always includes the unit circle |z| = 1, so you have an analytic function which you can differentiate, and you can just set z = 1. The way of proving the property is as follows.
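Here is a small symbolic sketch of my own, not from the lecture, verifying property 0 and property 1 for the two PGFs just derived — Poisson and geometric — using sympy: it checks G_X(1) = 1 and that G_X′(1) equals the known means λ and 1/p.

```python
import sympy as sp

z, lam, p = sp.symbols('z lam p', positive=True)

G_poisson = sp.exp(lam * (z - 1))                # PGF of Poisson(lam)
G_geom = p * z / (1 - (1 - p) * z)               # PGF of Geometric(p) on {1, 2, ...}

for G, mean in ((G_poisson, lam), (G_geom, 1 / p)):
    print(sp.simplify(G.subs(z, 1)))                       # property 0: G_X(1) = 1
    print(sp.simplify(sp.diff(G, z).subs(z, 1) - mean))    # property 1: G_X'(1) - E[X] = 0
```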
You are essentially differentiating the series term by term, getting Σ_i i z^{i−1} P(X = i), and setting z = 1 gives Σ_i i P(X = i), which is the expectation. The only thing I am not being very precise about is why you can take the derivative inside the summation. For a finite summation you can always do it; here it is justified because the series converges to an analytic function, in which case term-by-term differentiation is valid. I am pushing those details under the rug a little bit.

The second property: similarly, you can show that the k-th derivative at z = 1 gives E[X(X − 1)⋯(X − k + 1)]. It is a similar argument; analyticity on the unit circle implies this result.

Third property: if X and Y are independent and their sum is another random variable — now I have to be careful with names; let me call the sum Z and write the argument as s, so there is no confusion between the argument and the random variable — then G_Z(s) = G_X(s) · G_Y(s). If X and Y are independent discrete random variables, the pmf of their sum is the convolution of their pmfs, but convolving pmfs is the same as multiplying PGFs, just as when you convolve discrete time signals you multiply their Z transforms; it is the same result. I write s here just to avoid confusion with the random variable itself. The region of convergence will be the intersection of the two regions of convergence.

So, if you want to show that the sum of two independent Poisson random variables is again Poisson, you can just look at this directly: if one has parameter λ and the other has parameter μ, you can show that the product of the PGFs is the PGF of a Poisson with parameter λ + μ. In one shot you get it, without convolving anything. Obviously this extends to n independent random variables: you just multiply the PGFs. For example, if you want the PGF of a binomial random variable, you can take the n-th power of the Bernoulli PGF, because a binomial is a sum of independent Bernoullis. This is very useful. In your last quiz you had a problem on the negative binomial; as an exercise, try finding the PGF of the negative binomial and then proving that the sum of two independent negative binomials (with the same success probability) is another negative binomial. That is something you proved the hard way, I guess, but you can prove it more easily using PGFs.

The last property I will talk about here, property 4, is about a random sum. Let Z = Σ_{i=1}^{N} X_i, where the X_i are i.i.d. discrete — say positive integer valued — random variables and N is independent of the X_i. In this case you can show that G_Z(s) = G_N(G_X(s)). So if you want to find the PGF of this random sum of random variables, see the sketch and proof below.
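In coefficient form, multiplying PGFs is exactly convolving pmfs, so the third property can be checked numerically. Here is a short sketch using two small made-up pmfs of my own, which also rebuilds the Binomial(3, p) pmf as the three-fold product of the Bernoulli(p) PGF.

```python
import numpy as np

# Two small made-up pmfs on {0, 1, 2} and {0, 1}.
p_X = np.array([0.2, 0.5, 0.3])            # P(X = 0), P(X = 1), P(X = 2)
p_Y = np.array([0.6, 0.4])                 # P(Y = 0), P(Y = 1)

# PGF coefficients are the pmf values, so multiplying the PGFs of independent
# X and Y convolves the pmfs: this array is the pmf of X + Y.
print(np.convolve(p_X, p_Y))

# Binomial(3, p) as the 3-fold product of the Bernoulli(p) PGF (1 - p) + p*z.
p = 0.3
bern = np.array([1 - p, p])
binom3 = np.convolve(np.convolve(bern, bern), bern)
print(binom3)                              # [0.343, 0.441, 0.189, 0.027]
```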
So, you compose the PGF of N with the PGF of X. How do you prove that? Use iterated expectations. You have G_Z(s) = E[s^Z], which you write as E[E[s^Z | N]] by iterated expectation. If you fix N = n you can compute the inner expectation: given N = n, the sum becomes Σ_{i=1}^{n} X_i, and we know its PGF is simply G_X(s)^n. Replacing the fixed n with the random variable N, the inner conditional expectation is G_X(s)^N — I am skipping a small step here — and the outer expectation becomes E[G_X(s)^N], which is exactly G_N evaluated at G_X(s). That is what we wanted: G_Z(s) = G_N(G_X(s)). We do have to be a little careful with the region of convergence.

For example, suppose you are summing a geometric number of geometric random variables: say each X_i is geometric with parameter p, N is geometric with parameter q, and everything is independent. As an exercise, try finding the distribution of Z using this property. Let us stop here.
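Before working that exercise analytically, here is a simulation sketch of my own (using numpy's geometric generator, which is supported on {1, 2, …} like the geometric in class, and hypothetical parameters p = 0.4, q = 0.3, s = 0.8) comparing a Monte Carlo estimate of E[s^Z] with the composition G_N(G_X(s)).

```python
import numpy as np

rng = np.random.default_rng(3)
p, q, s = 0.4, 0.3, 0.8                     # hypothetical parameters and evaluation point
trials = 200_000

def geom_pgf(s, p):
    # PGF of a geometric random variable on {1, 2, ...}: p*s / (1 - (1 - p)*s)
    return p * s / (1 - (1 - p) * s)

# Z = X_1 + ... + X_N with N ~ Geometric(q) independent of the i.i.d. X_i ~ Geometric(p)
Z = np.array([rng.geometric(p, size=rng.geometric(q)).sum() for _ in range(trials)])

print(np.mean(s ** Z.astype(float)))        # Monte Carlo estimate of E[s^Z]
print(geom_pgf(geom_pgf(s, p), q))          # the composition G_N(G_X(s))
```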