In the previous module, 6.1, we listed several of the desirable properties of estimators. These are extremely fundamental properties one should always look for whenever doing statistical estimation, because if you show your result to a statistician they will immediately challenge you with questions like: is it unbiased? how efficient is it? is it a consistent estimate? You have to be aware of these potential questions that an educated person well versed in statistics could ask, because if you practice data assimilation as a form of estimation of unknowns, you cannot escape analyzing the properties of estimates; that is a fundamental base one has to build on. So, in this module we are going to turn to a particular method of estimation, least squares estimation. We have already talked about deterministic least squares; now we are going to talk about statistical least squares. The whole set of lectures is centered around least squares methods. Until now we dealt with deterministic least squares principles, because we considered the static deterministic problem and the dynamic deterministic problem; everything was deterministic up to this point. Now we have ventured into the world of stochastic or statistical estimation, where the model may be stochastic and the observations are always stochastic. This theory calls for at least a nodding understanding of the fundamental underpinnings of stochastic or statistical estimation, and one of the workhorses of statistical estimation methodology is, again, statistical least squares. We are not going to develop the grand theory of statistical least squares; instead we are going to illustrate its fundamentals using simple examples that are relevant to data assimilation. Let us start. Let z be an observation, and let z be a linear function of the unknown x plus a noise v, that is, z = Hx + v. Let v be such that its mean is 0 and its covariance R is known, with R symmetric positive definite (SPD). R is a covariance matrix, and in statistics the inverse of the covariance matrix is called the information matrix. To see why, consider the following: let σ² be a variance. If σ² is large, what does it mean? The random variable can find itself in a large domain, so the uncertainty is large. If σ² is small, the variance is very small, and there is only a small variation that the underlying random variable encounters.
So, we know the values much more confidently than when the variance is large. Now take the inverse of the variance: if σ² is large, 1/σ² is small; if σ² is small, 1/σ² is large. Therefore the inverse of the variance is called information; the two are inversely related, one being the reciprocal of the other, so larger variance means less information and smaller variance means more information. For the matrix case, we simply do not say "the inverse of the covariance matrix"; its other name is the information matrix. So if R is SPD, the information matrix exists. x is an unknown, and x could be random. For example, as we have already said, the unknown could itself be a random variable: a climate variable has a natural variability, so climate variables are random variables, and I am going to make observations of this underlying climate variable. v is the observation noise. So x is a random variable, and v is an additional noise coming from the observations; there are two random components here. I am going to assume that the observation noise and the underlying intrinsic variation in x are uncorrelated. That is a very standard assumption, and it is also a valid one: the climate does not change because I am trying to measure it. v depends on the instruments I use, but the climate itself has an underlying variability. For example, El Nino comes once every several years in some form of a cycle; while we cannot pin down the exact period of the cycle, we know roughly what happens, so we know the overall behavior but not the value of the period. The period with which El Nino occurs is therefore a random variable, so the El Nino phenomenon has a natural variation, and if El Nino varies naturally, the associated climate also varies naturally. So x is unknown, x is endowed with a natural variability, x is a random variable; v is additional randomness that I introduce by virtue of measuring, of gaining information about x. When you have multiple random variables in statistics, one of the things you have to worry about is the relation between the two random phenomena; the simplest possible assumption, which is also a very valid one, is that x and v are uncorrelated. Now define the residual r(x) = z − Hx; this sounds very familiar, it is what we had in deterministic least squares. I can now consider the weighted sum of squared residuals f(x) = (1/2) r(x)ᵀ R⁻¹ r(x), where R is the noise covariance; this is the energy norm squared of r(x), with weight R⁻¹. The explicit expression is f(x) = (1/2)(z − Hx)ᵀ R⁻¹ (z − Hx). I have stuck in a factor of one half; I have already argued that none of the arguments change, the only convenience of the half being that it simplifies the algebra a little. We can compute the gradient and the Hessian; we should be masters of this by now, since we have seen these terms several times: the gradient is ∇f(x) = −Hᵀ R⁻¹ (z − Hx) and the Hessian is ∇²f(x) = Hᵀ R⁻¹ H, equation (2) on the slide. If I set the gradient to 0 and solve the resulting equation (1), I get the least squares estimate x̂_LS = (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ z. If I assume z to be deterministic, then x̂_LS, the least squares estimate, is also deterministic. But in our case z is random: z is random because x is random and v is random, so z inherits randomness from two sources, and the two sources are uncorrelated. Therefore x̂_LS is random; this estimate is a random vector with its own underlying sampling distribution.
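To make the algebra above concrete, here is a minimal NumPy sketch of the same setup. Everything in it is a made-up illustrative value, not anything from the lecture slides: the sizes m and n, the operator H, the covariance R, and the "true" x are all chosen arbitrarily just to exercise the formulas.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (all values made up): m = 5 observations, n = 2 unknowns.
m, n = 5, 2
H = rng.standard_normal((m, n))            # observation operator
R = np.diag(rng.uniform(0.5, 2.0, m))      # SPD noise covariance
R_inv = np.linalg.inv(R)                   # information matrix of the noise
x_true = np.array([1.0, -2.0])
z = H @ x_true + rng.multivariate_normal(np.zeros(m), R)   # z = Hx + v

def f(x):
    """Weighted sum of squared residuals, f(x) = 0.5 (z - Hx)^T R^{-1} (z - Hx)."""
    r = z - H @ x
    return 0.5 * r @ R_inv @ r

def grad(x):
    """Gradient of f: -H^T R^{-1} (z - Hx)."""
    return -H.T @ R_inv @ (z - H @ x)

hess = H.T @ R_inv @ H                     # Hessian, independent of x

# Setting the gradient to zero gives x_hat_LS = (H^T R^{-1} H)^{-1} H^T R^{-1} z.
x_hat_ls = np.linalg.solve(hess, H.T @ R_inv @ z)
print("estimate            :", x_hat_ls)
print("gradient at estimate:", grad(x_hat_ls))     # ~ 0 at the minimiser
print("f(truth), f(estimate):", f(x_true), f(x_hat_ls))
```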
The estimate therefore has a distribution, and I would like to be able to ask the following questions: under what conditions is this least squares estimate unbiased, and so on; we would like to analyze the properties of this estimate. The first question is: is this estimate unbiased? So I am going to discuss unbiasedness. From the previous slide, this is the expression for the estimate, but please remember z = Hx + v; if I substitute this z into the expression, simple algebra leads to x̂_LS = x + (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ v. If I take expectations on both sides, the expected value of v is 0, so the expected value of the estimate equals the unknown, and hence the estimate is unbiased. What does this mean? The least squares estimate is unbiased in this case. Next I would like to compute the covariance of the estimate. x̂_LS being a vector, the covariance consists of the outer product: a column times a row, so you have to consider a matrix, and the expected value of a matrix is the expected value of every element of the matrix. In this particular case, from the equation above, I know that x̂_LS = x + (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ v, so using this relation in both factors of the outer product we can readily evaluate the expectation. It is very easy for me to draw these things and tell you, but I would like to emphasize that you should verify all this algebra yourself; that is very fundamental. Now look at this: the factor (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ is not random; the only random part is the outer product of the noise vector, and we all know the covariance of the noise vector is R. This R gets cancelled by one R⁻¹, and once it cancels, the other factor also collapses, leaving behind (Hᵀ R⁻¹ H)⁻¹. That is the expression for the covariance of the least squares estimate. Now I would like to go back to the previous slide: if you look at equation (2), the Hessian of f(x) is Hᵀ R⁻¹ H, and here the covariance of the estimate is (Hᵀ R⁻¹ H)⁻¹. So you can readily see that the covariance of the least squares estimate is simply the inverse of the Hessian of the objective function. It is a beautiful property, one of the most beautiful properties, relating the gradient, the Hessian, the estimate, the variance of the estimate, and so on. Therefore, what is the conclusion? The conclusion is that the least squares estimate in this setup is unbiased, I can also compute its covariance, and that covariance is the inverse of the Hessian of the objective function that the least squares estimate minimizes.
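Here is a small Monte Carlo sketch, under the same kind of made-up setup as before, that checks both conclusions numerically: the sample mean of the estimates approaches the fixed "true" x (unbiasedness), and their sample covariance approaches the inverse of the Hessian (Hᵀ R⁻¹ H)⁻¹.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 5, 2
H = rng.standard_normal((m, n))
R = np.diag(rng.uniform(0.5, 2.0, m))
R_inv = np.linalg.inv(R)
x = np.array([1.0, -2.0])                  # a fixed "truth" for this experiment

# x_hat_LS = A z with A = (H^T R^{-1} H)^{-1} H^T R^{-1}; repeat over many noise draws.
A = np.linalg.inv(H.T @ R_inv @ H) @ H.T @ R_inv
estimates = np.array([A @ (H @ x + rng.multivariate_normal(np.zeros(m), R))
                      for _ in range(20000)])

print("mean of estimates:", estimates.mean(axis=0))            # ~ x (unbiased)
print("sample covariance:\n", np.cov(estimates.T))             # ~ inverse Hessian
print("(H^T R^-1 H)^-1  :\n", np.linalg.inv(H.T @ R_inv @ H))
```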
Please remember that we have already talked about the relation between projections and least squares estimates, so I would like to bring this relation to projections to our attention. What is ẑ? Think of it like this: here is the span of H, which we have already talked about in our deterministic methods, and here is z. If I project z orthogonally onto the span of H, that is the orthogonal projection; if I project it at an angle, that is an oblique projection, and I could get oblique projections in several directions. There is only one orthogonal projection; all the other projections are oblique. An oblique projection occurs when you do weighted least squares, and the orthogonal projection comes in only when you do ordinary least squares; here we are doing weighted least squares, so we are going to get oblique projections. Whichever projection you have, ẑ = H x̂_LS, and the formula for x̂_LS is already known; substituting that formula gives the expression for the projection of z onto the span of H. I am going to concoct a matrix, a projection matrix P, given by P = H (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹. We have already seen this in the deterministic case; I am redoing it within the context of statistical least squares to show the relation between least squares, whether statistical or deterministic. It can be verified that this matrix P is idempotent; what does that mean? P² = P. Please verify that. This matrix is not symmetric, P ≠ Pᵀ; please verify that as well. Any idempotent matrix that is not symmetric gives rise to an oblique projection; that is the basic theory, and we have not proved that theorem here. If you are a person who wants to know more of the fundamentals, the fundamental result is this: any orthogonal projection matrix must be idempotent and symmetric, and any oblique projection matrix, as an operator, must be idempotent but not symmetric. The intrinsic relation between projection operators and matrices with these properties is developed in advanced books on matrix analysis and matrix theory; there are a number of books we have alluded to, and the book by Horn and Johnson is one of my favourites, with very nice proofs of these basic facts from operator theory as well as projection theory. When R⁻¹ = I, P = Pᵀ: it becomes symmetric, and it becomes an orthogonal projection. So this is something we have already come across in the deterministic context; I simply wanted to inform ourselves of the fact that the same properties carry over to the statistical counterpart.
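A minimal sketch, again with made-up H and R, of the two checks the lecture asks you to do: the weighted projection matrix is idempotent but not symmetric (oblique), while the ordinary one with R = I is both idempotent and symmetric (orthogonal).

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 2
H = rng.standard_normal((m, n))
R = np.diag(rng.uniform(0.5, 2.0, m))
R_inv = np.linalg.inv(R)

def proj(W):
    """Projection onto span(H) with weight W: P = H (H^T W H)^{-1} H^T W."""
    return H @ np.linalg.inv(H.T @ W @ H) @ H.T @ W

P_wls = proj(R_inv)        # weighted least squares  -> oblique projection
P_ols = proj(np.eye(m))    # ordinary least squares  -> orthogonal projection

print("P_wls idempotent:", np.allclose(P_wls @ P_wls, P_wls))   # True
print("P_wls symmetric :", np.allclose(P_wls, P_wls.T))          # False (oblique)
print("P_ols idempotent:", np.allclose(P_ols @ P_ols, P_ols))    # True
print("P_ols symmetric :", np.allclose(P_ols, P_ols.T))          # True (orthogonal)
```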
Now I am going to make a further simplification. Until now we assumed the covariance of v is R, where R is SPD; R is a symmetric matrix that in general can have nonzero off-diagonal elements. Now I am going to consider a very special case, in which the noise components of v are uncorrelated. What does that mean? v has components v₁, v₂, ..., v_m, and the covariance of vᵢ with vⱼ is 0 for i ≠ j, so all the off-diagonal elements of R are 0. Further, I am going to assume the variance of each component is σ², so they all have a common variance. In this case my R becomes σ² I, the identity matrix multiplied by σ²: a matrix whose off-diagonal elements are 0 and whose diagonal elements are all the same. And this makes sense. Why? I am going to make observations at different places and collate them into a vector; the observational errors at the different places where I measure are uncorrelated, and I am using the same instrument at every place, so they have a common variance. That is the physical import of this assumption. In this case R = σ² I, and if I substitute this into the expression for x̂_LS it becomes x̂_LS = (Hᵀ H)⁻¹ Hᵀ z; notice there is no σ², no R, because it magically cancels out; that is the nature of the expressions here. The covariance of this estimate is again the inverse of the Hessian, which is now σ² (Hᵀ H)⁻¹. Remember Hᵀ H: that is the Gramian, and Hᵀ H is symmetric. If I do an eigendecomposition of Hᵀ H, I get Hᵀ H = Q Λ Qᵀ, where Q is the matrix whose columns are the eigenvectors, Q Qᵀ = Qᵀ Q = I, and Λ is the diagonal matrix of the n eigenvalues. I know I am going a little fast, but I am sure we have come across this several times, so I do not want to overly repeat what we have already done; we met these things in the context of the discussion of the singular value decomposition, and the same ideas come over here because this is the Gramian. Given this, if Hᵀ H = Q Λ Qᵀ, its inverse is the inverse of the right-hand side; we already know the inverse of a product is the product of the inverses in reverse order, and since Q is an orthogonal matrix its transpose is its inverse. Combining all these results gives (Hᵀ H)⁻¹ = Q Λ⁻¹ Qᵀ, a very special formula for (Hᵀ H)⁻¹. Why am I interested in (Hᵀ H)⁻¹? Because (Hᵀ H)⁻¹ is the matrix that relates to the covariance of the least squares estimate. Now, if I consider a covariance matrix, the diagonal elements are the variances of the different components, so I am going to compute the trace of the covariance of the estimate. Please remember we have already talked about the trace: the trace of a matrix is the sum of its diagonal elements, and when the matrix is a covariance matrix the diagonal elements are all variances, so the trace of the covariance matrix gives you the total variance in the components of the estimate. So the trace of the covariance of x̂_LS is the total variance in all the elements of the estimated vector x̂_LS. We already know that equals σ² tr((Hᵀ H)⁻¹); σ² is a constant that multiplies the trace, which comes out from the definition of the trace. Now I utilize the eigendecomposition formula to substitute for (Hᵀ H)⁻¹. There is a lot of matrix algebra here, and we have also seen that tr(ABC) = tr(CAB) = tr(BCA), that is, the trace is invariant under cyclic shifts of a product; because of that, tr(Q Λ⁻¹ Qᵀ) = tr(Λ⁻¹ Qᵀ Q), and since Qᵀ Q = I this quantity becomes tr(Λ⁻¹), which is simply Σᵢ 1/λᵢ. So the trace of the covariance of x̂_LS is σ² Σᵢ 1/λᵢ.
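Here is a short numerical check of this trace identity, with an arbitrary made-up H and σ²; the comment at the end points ahead to the ill-conditioning discussion that follows, since a tiny eigenvalue of the Gramian would dominate the sum of reciprocals.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 6, 3
H = rng.standard_normal((m, n))
sigma2 = 0.25                                    # common noise variance sigma^2

gram = H.T @ H                                   # Gramian
lam, Q = np.linalg.eigh(gram)                    # H^T H = Q diag(lam) Q^T
cov_est = sigma2 * np.linalg.inv(gram)           # covariance of x_hat_LS

print("trace of covariance :", np.trace(cov_est))
print("sigma^2 * sum(1/lam):", sigma2 * np.sum(1.0 / lam))   # same number
# If one eigenvalue of the Gramian were close to 0 (near-singular H^T H),
# its reciprocal would dominate and the total variance would blow up.
```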
That is an absolutely beautiful result: the total variance in the estimate is simply σ² times the sum of the reciprocals of the eigenvalues of the Gramian Hᵀ H. So what does it tell you? If Hᵀ H is nearly singular, one of the eigenvalues is going to be close to 0, and when one of the eigenvalues is close to 0, 1 over the smallest eigenvalue explodes in your face; that means the covariance of the estimate is large. What does it tell you? Remember the condition number we talked about: this is something very similar. When does the condition number become large? The condition number, please remember, is the ratio of the largest eigenvalue to the smallest eigenvalue; if the smallest eigenvalue, while remaining positive, is very close to 0, that ratio is going to explode in your face, and that is exactly what is happening here when the condition number of the Gramian Hᵀ H is very large. So another interpretation, in the context of statistical least squares, is that the covariance of the least squares estimate can explode in your face if the matrix H is ill conditioned or nearly ill conditioned; that is exactly the conclusion coming from the last line of the slide. Now I am going to talk about estimating σ². I have been assuming my noise covariance is σ² I and that I know σ². What happens if σ² is not known? Then I should estimate it. Statisticians are very clever people; they will relax every possible assumption: what if this is known, how do we estimate that; what if that is not known, how do we estimate it? They will consider every possible combination of knowns and unknowns in the context of an estimation problem, as we have seen in the previous exercise. So the whole question is this: can I use this framework to estimate the observation noise covariance? Let me pose the problem like this. We have already seen that if we use satellite observations, we generally do not know the covariance of the satellite observations. So is there a way to formulate the estimation problem such that I can at least hope to estimate the variance of the satellite observations or radar observations? That is one way to think of this problem as a motivating question. I would like to be able to estimate σ², which is related to R by R = σ² I. To do this estimation I am going to go back to my residual e, which is the error in the estimate; sometimes I call it r and sometimes I call it e depending on the context, and I do not think that should bother you, I can call the same quantity by different names at different places depending on convenience, and I hope it does not throw you out of the boat. So e = z − H x̂_LS: this is the model counterpart, that is the actual observation, and the difference is the error. I can express this error as e = (I − P) z, where P is the projection matrix we talked about in the last slide. I can substitute z = Hx + v, and if I do the multiplication it turns out that this error equals (I − P) v.
Why is that? Because of the algebra given here: the product should produce terms involving (I − P) H x, but (I − P) H = 0, as argued on the slide; I am going to leave it to you to enjoy the argument. So the expected value of this error is 0, which means this error is unbiased, and it is the covariance of this error that I am interested in, because I want to estimate σ². In order to estimate σ², consider eᵀ e, where e is the error vector; what is that? It is the sum of the squared errors. I want you to remember and distinguish two things: e eᵀ is a matrix, because e is a column vector, but eᵀ e is a scalar, and eᵀ e is simply the sum of squared errors. The expected value of the sum of squared errors is given as follows. We have already derived the expression for e in the previous slide, e = (I − P) v, so I plug that value in here. I − P, with P the projection operator, is idempotent, so (I − P)(I − P) = (I − P); I want you to verify that I − P is idempotent because P is idempotent. Now vᵀ (I − P) v is a scalar, a quadratic form in v, and a scalar is equal to its own trace, because the trace of a scalar is itself; that is a mathematical trick we bring in to make the analysis go through. By the cyclic rule, the trace of the product is tr((I − P) v vᵀ). Now remember that E[v vᵀ] = σ² I; if you use that property, this becomes σ² tr(I − P), and the trace of a sum is the sum of the traces; you can see how many facts are used. The trace of I is m, because we are concerned with an m × m matrix, and the trace of P, the projection matrix, is n; that is proved here, and I have attached the proof for every one of these steps. So the expected value of the sum of squared errors is σ² (m − n). We are concerned with an over-determined system where m is greater than n; that is an underlying assumption we have made all through. Therefore, what could be the structure of an estimator for σ²? This is the resulting structure: σ̂² = eᵀ e / (m − n). Please remember e is the error, the residual in the estimate, so e is computable, and eᵀ e is a computable scalar; divide it by m − n and you get an unbiased estimate, a very good unbiased estimate, for σ². So look at this: using this framework we can not only assume σ² is known and estimate the unknown x; after having estimated x, we can also use the same framework to estimate σ². We can kill two birds with one stone; it is a very powerful device. I hope you go over this very leisurely and enjoy the moment, to understand the beauty, the structure, the arguments, and the power of this example.
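A minimal sketch of this variance estimator, with made-up H, true x, and σ²; m and n are chosen so the system is comfortably over-determined, and σ̂² = eᵀe/(m − n) should land close to the true value.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 200, 3                                    # over-determined: m > n
H = rng.standard_normal((m, n))
sigma2_true = 0.5
x = np.array([1.0, 0.0, -2.0])
z = H @ x + rng.normal(0.0, np.sqrt(sigma2_true), m)   # noise with R = sigma^2 I

x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)    # least-squares estimate of x
e = z - H @ x_hat                                # residual e = z - H x_hat

sigma2_hat = (e @ e) / (m - n)                   # unbiased estimator of sigma^2
print("true sigma^2     :", sigma2_true)
print("estimated sigma^2:", sigma2_hat)
```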
So now I am going to summarize everything we have seen. I am talking about the world of estimators: U is the set of unbiased estimators, shown in green, and L is the set of linear estimators, shown in red. Some estimators are linear, some are nonlinear; some are biased, some are unbiased; outside the green is biased, outside the red is nonlinear. The intersection of unbiased and linear is called linear unbiased, LUE: L for linear, U for unbiased, E for estimate. Among all the linear unbiased estimates, I am interested in the best linear unbiased estimate. Best in what sense? Best in the sense of minimum variance. That is what the Kalman filter rests on: Kalman filter estimates are BLUE, and that is the reason why we use them repeatedly; what gives us the power to use Kalman filters and make predictions is this very beautiful underlying property called BLUE, the best linear unbiased estimate. We have talked about unbiasedness, we have talked about linear estimators, and we have talked about "best": best relates to efficiency, linearity relates to the structure of the estimator, and unbiasedness is a property of the estimate. So far we have seen, via an example, the properties of the statistical least squares estimate, and a reader can very easily recognize the parallelism between deterministic least squares and statistical least squares. We are not only able to estimate the unknown; we are able to show it is unbiased, we are able to compute its variance, we are able to test the properties of the resulting estimate against the standards of unbiasedness and consistency, and we are even able to estimate an unknown covariance. This gave rise to the notion of the class of all unbiased estimators and the class of all linear estimators. Why are we concerned with linear estimators? Because computationally linear estimators are easy to compute; nonlinear estimators require more time, and if I can get very good results with linear estimators, why not? Unbiasedness is fundamental to any and every estimator, whenever and wherever possible. So I am interested in the intersection of these two properties, namely linear unbiased estimates, LUE. Once you confine yourself to LUE, you next want to talk about the other dimension, namely the variance of the estimate: if the total variance of the estimate can be minimized, then it is optimal in some sense, and such optimal estimates are called best linear unbiased estimates. So we are interested in BLUE; BLUE is a key characterization of estimators, and as we will see later when we do Kalman filtering, this notion of BLUE plays a very fundamental role in the definition of the filtering equations known as the Kalman filter equations. Now I am going to bring to your attention a very fundamental theorem called the Gauss-Markov theorem. The Gauss-Markov theorem relates to an inherent optimality of least squares estimates. Thus far we talked about some of the routine properties; the Gauss-Markov theorem takes the least squares estimate one step further and puts it on a pedestal. What does the Gauss-Markov theorem say? Among all the linear unbiased estimates, the least squares estimate is the best. So it attests to the importance of the least squares estimate within the framework of estimation theory; that is what is called the optimality of least squares estimates. Least squares estimates are optimal in a very natural sense, and the discovery of this natural property of least squares is called the Gauss-Markov theorem. Gauss, the famous German mathematician, lived in the late 1700s and early 1800s; Markov, a Russian mathematician, lived into the early 1900s, and in their honour, I believe, it is called the Gauss-Markov theorem. So let us try to formulate the basic aspects of the claim of the Gauss-Markov theorem. Let x be the unknown to be estimated. Let us pick a vector μ of the same size as x, and let us concoct a function φ(x) = μᵀ x, where μ is a fixed vector and x is the variable.
μᵀ x is an inner product, so φ(x) is a linear function of x, a linear functional. Now, what do we want to do? Even though μ is known, something I picked, x is not known; the components of x are not known. μᵀ x is essentially Σᵢ μᵢ xᵢ, so instead of estimating x directly, I would like to estimate this linear combination of the components of x. The problem is to estimate φ(x); I hope that is clear. Ultimately I am interested in estimating x, but I am considering the special case of estimating φ(x). You can think of φ as a functional: φ is a mapping from Rⁿ to R, a function to which, if I give x, it gives me μᵀ x, and φ is defined by μ. That is the idea here. What are we seeking? We are seeking a linear unbiased estimate of φ(x). Now, how is this related to estimating x? You can see μ is a vector with components μ₁, μ₂, ..., μₙ. Suppose I set μ₁ = 1 and all the other components to 0; then μᵀ x = x₁. Suppose I pick μ₂ = 1 and everything else 0; I pick out x₂, and so on. So if I know how to estimate μᵀ x, I know how to estimate every component of x; and if I can estimate every component of x, I can estimate x. Therefore the problem of estimating x is embedded in this problem of estimating a functional of x, and without loss of generality we can consider the problem of estimating a linear functional of x. We all know a linear functional is a mapping from a vector space, from Rⁿ to R, and so on. So, we have talked about the structure of what we want to do; let z be the data, the observation that carries information about x, and let aᵀ z be the estimator for φ(x). What do we want to do? That is x, and that is φ(x); we want to estimate φ(x), and to estimate φ(x) I need an estimator, and an estimator is a function of the observation. So I am going to concoct aᵀ z, where z is the observation, as an estimator for φ(x). φ(x) is a scalar and x is a vector; let a be another vector, and I consider aᵀ z as a potential candidate for estimating the value of φ(x). I hope that is all clear: a is in Rᵐ, z is in Rᵐ, μ is in Rⁿ, x is in Rⁿ, so you can see all the relations, all the players in this game. If aᵀ z is the structure of my estimator, I can compute its expected value: z has the standard formula z = Hx + v, a linear function of x, so aᵀ z = aᵀ(Hx + v), substituting for z. The expectation of a sum is the sum of the expectations, the expectation of v is 0, and so E[aᵀ z] = aᵀ H x. That comes out very nicely. Now, aᵀ z is an unbiased estimator only if φ(x) = μᵀ x equals the expected value of aᵀ z, which is aᵀ H x; please remember, for an unbiased estimate the expectation must equal the value I am seeking. Therefore, relating these two quantities tells you that μᵀ must equal aᵀ H, or Hᵀ a = μ. That provides the relation between the vector a that I use in the estimator and the vector μ that I use in the definition of the functional to be estimated: a and μ cannot be two arbitrary, unrelated vectors; they must be related through H.
H is the map that relates the state to the observation; I hope that is clear. Since it is an unbiased estimator, we already know the mean squared error is equal to the variance; please remember, in one of the earlier lectures we said that if the estimate is unbiased, the mean squared error equals the variance, so minimizing the mean squared error is the same as minimizing the variance. Therefore I am now going to compute the variance of the estimator aᵀ z. The variance of the estimator is the expected value of the squared error; I again substitute for z, and we already know that E[aᵀ z] = aᵀ H x, so if you simplify, the error is aᵀ v, where a is a constant vector and v is a random vector. We can write (aᵀ v)² = (aᵀ v)(aᵀ v) = (aᵀ v)(vᵀ a), because the inner product is symmetric; therefore this can be written as aᵀ v vᵀ a. If I take expectations, what do I get? To go from here to here is simple algebra, and I would like you to do all the basic algebra to see how each step comes from the previous one, why and how. The covariance of v is R, so this is aᵀ R a. R is a matrix and aᵀ R a is a quadratic form: the row vector, the matrix, the column vector, so it is a scalar, and the variance of a scalar is a scalar, so everything matches. So I have computed an expression for the variance of the estimator: the variance of aᵀ z is aᵀ R a. Yes, I know I am going a little fast, but what I am presenting is nothing unknown to you except the algebra, so I would like you to go through this algebra and convince yourself. In my regular class where I teach students I also do not go over this algebra in class, because it is part of the folklore of the course; you need to struggle and learn to do many of these matrix manipulations, and these exercises and simplifications are very educative. I hope you will pursue that. Now, what do we want to do? We want the best estimate, best in the sense that it minimizes the variance of the estimate. What is the estimator? aᵀ z. What is the functional being estimated? μᵀ x. What is the variance of the estimator? We have already seen it is aᵀ R a. Now I would like to minimize this variance with respect to a; a is the variable, the vector I picked to design my estimator aᵀ z. z is given; a is something I pick, and now I am going to take responsibility for how to pick a, for the conditions on the choice of a; that is where we are coming to. So I need to minimize this quadratic function with respect to a, but a is not a free variable: please remember, unbiasedness requires a and μ to be related, Hᵀ a must equal μ. Therefore I am not interested in minimizing aᵀ R a over arbitrary a; I am interested in minimizing aᵀ R a under the condition that Hᵀ a = μ. This is the constraint, an equality constraint, so I am interested in minimizing something subject to an equality constraint. Now, how many times have we seen constrained minimization with an equality constraint? What is the tool? The Lagrange multiplier. Now you see once again the importance
of multivariate calculus and optimization theory. So formulate this as a Lagrangian problem: a is the free variable with respect to which I do the minimization, and λ is the vector of multipliers with which I build the constraint into my Lagrangian function. This is the Lagrangian; compute its gradient with respect to a, compute its gradient with respect to λ, equate them to 0, and solve the equations simultaneously. The solution process is given here, and ultimately the minimizing a is a = R⁻¹ H (Hᵀ R⁻¹ H)⁻¹ μ. Wow, look at this. Now I want to go back; I know a lot of side steps were involved. z = Hx + v; I do not want to estimate x directly, I want to estimate a functional of x, namely μᵀ x. To estimate this I picked an estimator of the form aᵀ z; this estimator is unbiased provided Hᵀ a = μ, and it has variance aᵀ R a. This estimator is linear, it is also unbiased, so it is already an LUE; I want to introduce the B, I want to make it a BLUE, and to make it a BLUE I have to minimize the variance with respect to a. But unbiasedness requires a and μ to be related, Hᵀ a = μ, so this is the constraint and aᵀ R a is the function to be minimized. I combine the two: I minimize the function with this constraint using the Lagrange multiplier technique, and with a little bit of algebra and simplification I have found a formula for the optimal a, marked with a star. What is this value? The a that I should use in my estimator does not come out of the blue sky; it is decided by this structure, the starred formula. Now let us look at the structure: it depends on R, the covariance matrix of the noise; it depends on H, the linear map between the model space and the observation space; and it also depends on μ, the coefficient vector of the functional that we originally started with. So we have solved the problem: if I pick my a to be this and use that a in aᵀ z, the resulting estimator is not only linear, it is unbiased, and it is also minimum variance. So the linear unbiased minimum variance estimate of φ(x) = μᵀ x is given by aᵀ z = μᵀ (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ z. You can readily see that (Hᵀ R⁻¹ H)⁻¹ Hᵀ R⁻¹ z is the least squares estimate, so the result is μᵀ times the least squares estimate, μᵀ x̂_LS; that is the structure of the estimate that comes out. So if I pick μ = (1, 0, 0, ..., 0) I get the best estimate of x₁; if I pick μ = (0, 1, 0, ..., 0) I get x₂, and so on. By this formulation I can estimate any component of x, and hence all the components of x, and the resulting estimates are BLUE. That is the basic idea of the setup. So I would like to take you through some of the slides: in the Gauss-Markov theorem we have established linearity, we have established unbiasedness, and we have now also established that it is the best.
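Here is a small numerical sketch of the Gauss-Markov result just derived, with an arbitrary made-up H, R, and observation z; it checks that the optimal a satisfies the unbiasedness constraint Hᵀ a = μ and that aᵀ z reproduces μᵀ x̂_LS.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 2
H = rng.standard_normal((m, n))
R = np.diag(rng.uniform(0.5, 2.0, m))
R_inv = np.linalg.inv(R)
z = rng.standard_normal(m)                       # any observation vector

mu = np.array([1.0, 0.0])                        # mu = e_1: estimate the component x_1

# Optimal a from the Lagrangian solution: a = R^{-1} H (H^T R^{-1} H)^{-1} mu.
M = np.linalg.inv(H.T @ R_inv @ H)
a = R_inv @ H @ M @ mu

x_hat_ls = M @ H.T @ R_inv @ z                   # usual weighted least-squares estimate

print("constraint H^T a = mu  :", np.allclose(H.T @ a, mu))        # unbiasedness
print("a^T z == mu^T x_hat_LS :", np.allclose(a @ z, mu @ x_hat_ls))
print("minimum variance a^T R a:", a @ R @ a)
```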
Now I am going to further embellish this with some other claims, some extensions. Until now I only assumed that the noise v has mean 0 and covariance R; we did not explicitly invoke any Gaussian nature of v. So all the analysis so far depends only on the fact that v has mean 0 and covariance R; it need not be Gaussian. If, in addition, somebody tells you that v is indeed Gaussian, then there is a further extension of the Gauss-Markov theorem, and that extension is called the Rao-Blackwell theorem: it essentially tells you that the least squares estimate, with Gaussian noise, is the best among all estimators, linear and nonlinear. If v is not Gaussian, there may exist nonlinear estimators whose variance is smaller than that of the linear estimator. These are deeper results from statistical estimation theory, and they can be gleaned from the book by Rao that we referred to in a previous talk. So I believe I have given you a reasonably good picture of the properties of statistical least squares, as opposed to the deterministic least squares that we saw within the context of deterministic static estimation theory. My favourite book on this topic is again Melsa and Cohn, the same book that I have already referred to earlier: Decision and Estimation Theory, published by McGraw-Hill, a small book of no more than about 200 pages, but beautifully written with an engineering flavour, and whenever I have trouble with these topics I always fall back on Melsa and Cohn, or on Sage and Melsa; those are the two books. We also cover many of these things in chapter 14 of our book on data assimilation. With these you should be able to get a rather complete picture of the basic elements and properties of statistical least squares estimation and its underlying properties. Thank you.