So far we have reviewed the principles of statistical estimation. We covered the basic properties of estimates: unbiasedness, consistency, and efficiency. We then looked at two types of estimation problems: one within Fisher's framework, where there is no prior, and the other within the Bayesian framework, where there is a prior. In both cases there are observations. The ultimate challenge is this: if somebody gives you only observations, what do you do? And if somebody gives you observations together with some prior information or prior belief, what do you do? In both contexts we developed a least squares based estimation methodology: we talked about deterministic least squares, statistical least squares, and Bayesian least squares. Please understand that the primary theme unifying this whole presentation of solving inverse problems, whether static deterministic, dynamic deterministic, static stochastic, or dynamic stochastic, in all possible combinations, is the least squares method. With that knowledge of statistical methods of estimation, I am now going to present another thought process within the framework of estimation theory, called sequential linear minimum variance estimation. So far we insisted on simple least squares, in both the Bayesian and the non-Bayesian contexts. This is a slightly different perspective for formulating the estimation problem: linear minimum variance estimation. Its importance, and in particular the importance of sequential linear minimum variance estimation, comes from the fact that Kalman originally used this framework to derive the now famous Kalman filter equations. By going through the linear minimum variance derivation we are going to relate it to Bayesian least squares estimation. We will show that the Bayesian least squares estimation discussed in the previous lecture and the linear minimum variance estimation we are about to develop are two different facets of the same problem, and we will build a bridge from one to the other, both ways, thereby establishing a broader way of looking at the estimation principles used within the context of general dynamic data assimilation. It is in that spirit that we call this module "From Gauss to Kalman": why Gauss? least squares; why Kalman? linear minimum variance estimation. We will first develop linear minimum variance estimation from fundamentals, then build the bridge between Gauss's least squares and linear minimum variance estimation, thereby establishing the dual aspects of the estimation methodology; that is the goal of this chapter. Once we do this we will, in principle, have completed the fundamental principles underlying the derivation of the Kalman filter equations. We are going to do this not within the context of a dynamical model but as a very general discussion, and it is this generality that is attractive and that we are going to pursue. The linear minimum variance estimate is something similar to the Gauss-Markov theorem, which essentially relates to least squares estimates. Let Z = HX + V. The assumptions are: the noise V has mean zero and covariance Σ_V, which is SPD; X is random with a prior distribution whose mean is M and whose covariance Σ_X is SPD; and X and V are not correlated. These are summarized below.
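For reference, the measurement model and the standing assumptions just listed can be written compactly as follows. This is only a restatement of the sentences above, with M and Σ_X denoting the prior mean and covariance of X:

```latex
Z = HX + V, \qquad Z \in \mathbb{R}^{m}, \quad X \in \mathbb{R}^{n}, \quad H \in \mathbb{R}^{m \times n},
\\[4pt]
E[V] = 0, \quad \operatorname{Cov}(V) = \Sigma_V \ (\text{SPD}), \qquad
E[X] = M, \quad \operatorname{Cov}(X) = \Sigma_X \ (\text{SPD}), \qquad
E\big[(X - M)\,V^{\mathsf{T}}\big] = 0 .
```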
These are the basic assumptions needed to formulate our estimation problem. You can again see a Bayesian undercurrent in them: X is random, X has a prior, and Z is the observation. The prior gives some information and the observation gives some other information; whenever we have two pieces of information we would like to combine them optimally. The Bayesian framework concentrates exactly on that, and we have already seen it. Now we are going to look at another way of doing the estimation, Bayesian-like, but within the framework of linearity, unbiasedness, and minimum variance; that is the theme of this first part of the discussion. We have an estimator φ(Z) that produces the estimate X̂. In the Bayesian framework we constructed a cost function, used it to define what is called the Bayesian cost, and sought the structure of the estimator that minimizes it. Here we are going to proceed quite differently: we insist that the estimate X̂ is linear in the observation, that it is unbiased, and that it simultaneously possesses the minimum variance property. Earlier, in the Bayesian estimation, we showed that the conditional Bayesian cost is minimized by choosing the estimator to be the posterior conditional mean, and only after that did we demonstrate that the estimate is unbiased and of minimum variance. So there we first optimized the cost and then studied the properties of the estimate; here we start by explicitly prescribing the structure of the estimate. Let the estimate X̂ depend linearly on Z, that is, φ(Z) = AZ + B. Strictly speaking such functions are not called linear but affine: if B = 0 the map is linear, and if B ≠ 0 it is a linear term plus a constant term, which is an affine function. For simplicity, however, we call such a structure linear. I would like this estimate to be unbiased and of minimum variance. Now, please understand we only know Z. What is A, and what is its size? X̂, the left-hand side, is an n-dimensional vector and Z is an m-dimensional vector, so A belongs to R^(n×m) and B belongs to R^n. A and B are the two unknowns. We simply require X̂ = AZ + B and determine the two parameters A and B such that the properly chosen X̂ is unbiased and, in addition to being unbiased, also has minimum variance. So we determine two parameters such that the resulting estimate satisfies two conditions: unbiasedness and minimum variance. Let X̃ = X − X̂ be the error in the estimate; we seek to minimize the mean square error. Now X̃ᵀX̃ = (X − X̂)ᵀ(X − X̂) is an inner product, which is a scalar. A scalar is a 1×1 matrix, and the trace of a scalar is itself, so I can express this quantity as a trace. We also know that tr(AB) = tr(BA).
Therefore the third line follows from the second: the expectation operator and the trace operator commute, so the expectation of the trace is simply the trace of the expectation. The expectation of the term inside, viewed as the outer product X̃X̃ᵀ, is the covariance of the error, which we call P. Therefore the mean square error equals tr(P), where P is the error covariance, and it is this quantity that we seek to minimize. So when we talk about minimum variance we mean the following: X is the unknown, X̂ is the estimate, X − X̂ = X̃, and we want to minimize the sum of the variances of the individual components of X̃; that sum is exactly tr(P). That is the target we are working towards. Having explained what we are looking for, let us go back to the expression for the estimate, X̂ = B + AZ. Let us compute its mean: E[X̂] = E[B + AZ], where B is a constant and Z = HX + V. Substituting Z = HX + V, pulling out the constants A and H, and using E[V] = 0 and E[X] = M (we have already assumed the prior mean of X is M), we get E[X̂] = B + AHM. Where does unbiasedness come in? Unbiasedness requires E[X̂] = M: the unknown is a random vector with mean M, and by virtue of the estimate being linear in Z, the condition becomes M = B + AHM. That tells us B must equal (I − AH)M. Substituting this value of B back in, the estimate takes a new structure: X̂ = (I − AH)M + AZ, which can also be written as M + A(Z − HM). Look at that structure; we saw the same structure in the Bayesian framework already: the estimate equals the prior mean plus A times (Z − HM), where Z − HM is the innovation and A is the weight. So of the two unknowns, we have decided what B must be in order for the estimate to be unbiased, and the unbiased estimate must be of the form M + A(Z − HM). Now the error covariance P is E[(X − X̂)(X − X̂)ᵀ]. Since we know the structure of X̂, we can compute X − X̂ explicitly; substituting the structure gives the expression for the estimation error, the matrix whose expected value is P. These steps are summarized below.
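Written out, the chain of steps just described is:

```latex
E\big[\tilde{X}^{\mathsf{T}}\tilde{X}\big]
  = E\big[\operatorname{tr}(\tilde{X}\tilde{X}^{\mathsf{T}})\big]
  = \operatorname{tr}\!\big(E[\tilde{X}\tilde{X}^{\mathsf{T}}]\big)
  = \operatorname{tr}(P), \qquad \tilde{X} = X - \hat{X},
\\[4pt]
E[\hat{X}] = E[AZ + B] = AHM + B = M
  \;\Longrightarrow\; B = (I - AH)\,M
  \;\Longrightarrow\; \hat{X} = M + A\,(Z - HM),
\\[4pt]
\tilde{X} = X - \hat{X} = (I - AH)(X - M) - AV .
```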
So let us compute that matrix explicitly by substituting the starred expression. There are two factors; multiplying them out and simplifying, four terms contribute to the outer product (X − X̂)(X − X̂)ᵀ. Again, Z = HX + V, so Z − HM = H(X − M) + V. There is a fair amount of algebra here, but if we substitute this expression for Z − HM and take the expectation of the product, we obtain a sum of four expected values. The first term is readily seen to be Σ_X, the second is −Σ_X Hᵀ Aᵀ, the third is −A H Σ_X, and the fourth becomes A D Aᵀ, where for notational simplicity D denotes H Σ_X Hᵀ + Σ_V. Yes, there is a ton of simplification, but I think it is a worthwhile exercise; if you do it, you ultimately get the expression for the covariance of the estimation error given below. Now look at its structure: it contains Σ_X, which is known; H, which is known; and D, which depends on H, Σ_X, and Σ_V, all of which are known. Every quantity other than A is known. Notice also that the last term is quadratic in A while the other two A-dependent terms are linear in A, so you can think of P as a quadratic function of the elements of the matrix A, the unknown. Let us step back and recall what we want: a linear minimum variance estimate. Linearity said the estimator must be a linear function of the observation, so we posited X̂ = AZ + B; we forced unbiasedness, which gave a condition expressing B in terms of A; we substituted it, carried through the analysis, and computed the expression for the error covariance, which is a quadratic function of the elements of the matrix A, still to be decided. Every parameter is a form of control, like a knob we can turn to get what we want: we used B to force unbiasedness. Go back to what we wanted: the estimate should be linear and unbiased; linearity holds by the assumed structure AZ + B, and unbiasedness has been used to express B in terms of A and eliminate it. The only remaining condition is minimum variance, and the tool we have left is the set of unknown elements of the matrix A. We will fine-tune the elements of A so that tr(P) becomes minimum. Why tr(P)? P is a matrix whose diagonal elements are variances, so the total variance, the sum of the variances of the components of the estimation error, equals tr(P). The rest of the problem is therefore a mathematical one: minimize tr(P), the total variance, with respect to the yet-to-be-decided free parameter A.
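Carrying out the four expectations gives the expression just described:

```latex
P = E\big[\tilde{X}\tilde{X}^{\mathsf{T}}\big]
  = \Sigma_X \;-\; \Sigma_X H^{\mathsf{T}} A^{\mathsf{T}} \;-\; A H \Sigma_X \;+\; A D A^{\mathsf{T}},
\qquad D = H \Sigma_X H^{\mathsf{T}} + \Sigma_V .
```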
Now, what is the trace of P? It is essentially the sum of all the diagonal elements P_ii. Please remember that A has n × m elements, so in principle we have to do a minimization over n × m scalar variables; that is the computational problem of interest here, and we want to organize it conveniently. What can one do? Now that we have an expression for the trace, one way is to compute the derivative of the trace with respect to each element of A, set up the resulting conditions, and collate them to build the matrix A. Another, easier, way is to determine A not element by element but row by row (or column by column); that simplifies the structure of the minimization problem. There are several different ways to carry out the minimization, and I am going to talk about one particular form. Please recall that we have already learnt how to minimize a quadratic function; we have gained good experience in the minimization of quadratic forms. Given that understanding, I am going to use the tools of quadratic form minimization to determine the rows of the optimal A. To that end, consider the ith diagonal element of P. The expression on the slide gives the entire matrix, and P is a sum of four matrices, so the ith diagonal element P_ii is the sum of the corresponding (i, i) elements of the matrices on the right-hand side; that is the basic idea. So P_ii picks up the (i, i) element of Σ_X. Next look at the term A D Aᵀ: a little reflection reveals that its (i, i) element is given by the product of the ith row of A, times D, times the transpose of the ith row of A. A is a matrix, D is a matrix, Aᵀ is a matrix, and to get the (i, i) element of the product A D Aᵀ we take the ith row of A and the ith column of Aᵀ, which gives a quadratic form in the ith row of A. Therefore P_ii is the (i, i) element of Σ_X plus this quadratic form in the ith row of A, plus the contributions from the remaining two terms. For those remaining terms we can likewise write the (i, i) elements as −a_i* b_i*ᵀ and −b_i* a_i*ᵀ, where a_i* denotes the ith row of A and b_i* is the ith row of the n × m matrix Σ_X Hᵀ. Let me spend a couple of minutes on this; the two remaining terms are A H Σ_X and its counterpart Σ_X Hᵀ Aᵀ.
So let us look at those two terms, A H Σ_X and Σ_X Hᵀ Aᵀ (note that it is H Σ_X here, and Σ_X is symmetric). You can see that these two matrices are transposes of each other. Therefore, if I compute the (i, i) element of one, I can immediately infer the (i, i) element of the other. How do we compute the (i, i) element of A H Σ_X? Write it as A times the matrix H Σ_X. To get the (i, j) element of a product we take the ith row of the first factor and the jth column of the second; for a diagonal element we set i = j. Therefore the (i, i) element of A H Σ_X is the product of the ith row of A with the ith column of H Σ_X. Now, H Σ_X is one matrix and Σ_X Hᵀ is another, and the two are transposes of each other, so the ith column of H Σ_X is the transpose of the ith row of Σ_X Hᵀ, which is b_i*ᵀ. Therefore we can readily convince ourselves that the (i, i) elements of A H Σ_X and of Σ_X Hᵀ Aᵀ are a_i* b_i*ᵀ and b_i* a_i*ᵀ, and they enter P_ii with minus signs. Therefore the entire P_ii is given by the expression summarized below; I want you to understand that this is how we extract the ith diagonal element of the matrix P, and it is the sum of all the P_ii that gives the trace. So what is the idea here? If I want to minimize the sum, it is enough to minimize the individual terms; this works because each P_ii depends only on the ith row of A, so the terms can be minimized independently. In other words, we are going to minimize the individual terms P_ii in order to minimize the trace. Each P_ii can be written as a quadratic form: a_i* is a row vector, a_i*ᵀ is a column vector, D is the matrix, the cross term is −2 b_i* a_i*ᵀ, and the constant is (Σ_X)_ii. The factor of two arises because the two scalar cross terms are transposes of each other and hence equal, so they combine into −2 b_i* a_i*ᵀ. How does this look? It looks like xᵀ D x − 2 bᵀ x + c, a quadratic function, where x plays the role of a_i*ᵀ; x is a column vector and a_i* is the row vector, so x = a_i*ᵀ. With this association between x and a_i* you can readily see this is a quadratic function, and I would like to rewrite it in the form yᵀ D y − 2 b_i* y + c with y = a_i*ᵀ, a structure that is very well known to us.
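In symbols, with a_i* the ith row of A and b_i* the ith row of Σ_X Hᵀ, the ith diagonal element just assembled is:

```latex
P_{ii} = (\Sigma_X)_{ii} \;-\; 2\, b_{i*}\, a_{i*}^{\mathsf{T}} \;+\; a_{i*}\, D\, a_{i*}^{\mathsf{T}}
       = y^{\mathsf{T}} D\, y \;-\; 2\, b_{i*}\, y \;+\; (\Sigma_X)_{ii},
\qquad y = a_{i*}^{\mathsf{T}} .
```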
Now I am going to minimize P_ii, the expression on the right-hand side given by the star, with respect to y. That is a standard quadratic form: compute the gradient, which in this case is 2Dy − 2b_i*ᵀ, and set it to zero; the optimal y is D⁻¹ b_i*ᵀ. Changing notation back to A, this says a_i*ᵀ = D⁻¹ b_i*ᵀ, and please remember b_i* comes from Σ_X Hᵀ. Therefore I can now stack the columns a_1*ᵀ, a_2*ᵀ, ..., a_n*ᵀ; each of them has D⁻¹ as a common factor, and collecting all the b's, the stacked matrix is the product of D⁻¹ with the matrix whose columns are the b_i*ᵀ. This expression provides the optimal A. Recall a_1* is the first row of A, a_2* the second row, and a_n* the nth row, so the matrix obtained by stacking their transposes as columns is Aᵀ; instead of expressing A directly we express Aᵀ, which is convenient. Therefore the previous expression, at the bottom of the slide, can be succinctly written as Aᵀ = D⁻¹ H Σ_X. Taking the transpose of both sides, and remembering that Σ_X and D are symmetric matrices, we get A = Σ_X Hᵀ D⁻¹, where D, as we already know, is the symbol for the sum H Σ_X Hᵀ + Σ_V. So the optimal A takes this particular structure, which is very important: we have used the quadratic minimization principle to determine the A that minimizes the trace of P. Let me summarize where we are. We started with the linear structure, required it to be unbiased, and eliminated B. We then looked at the resulting structure of the estimation error and computed an expression for its covariance. We computed the total sum of the variances of the individual components of the estimation error, which is tr(P), where P is the covariance of the estimation error, and that covariance is a quadratic function of the elements of A. There are several different ways to do the minimization; we chose a particular one because we already know, and know very well, how to minimize quadratic forms, so we fell back on what we know. We converted the problem of minimizing tr(P), which is the sum of all the P_ii, into minimizing the individual terms in the summation; if you minimize the individual terms, the total variance is minimized. The minimum of each P_ii is obtained by an appropriate choice of the corresponding row of A, so by deciding each row of A for each element P_ii we collectively obtained all the elements of A. The optimal A is Σ_X Hᵀ D⁻¹, as collected below. Once we have that, please go back and recall the optimal structure we already have: X̂ = M + A(Z − HM). Here Z − HM is computable and M is known, so we substitute this value of A, and that leads to the final expression.
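To recap the step just described, the row-wise minimizers and their stacking are:

```latex
\nabla_y P_{ii} = 2 D y - 2\, b_{i*}^{\mathsf{T}} = 0
  \;\Longrightarrow\; a_{i*}^{\mathsf{T}} = D^{-1} b_{i*}^{\mathsf{T}}, \quad i = 1, \dots, n,
\\[4pt]
A^{\mathsf{T}} = D^{-1} H \Sigma_X
  \;\Longrightarrow\;
A = \Sigma_X H^{\mathsf{T}} D^{-1}
  = \Sigma_X H^{\mathsf{T}} \big( H \Sigma_X H^{\mathsf{T}} + \Sigma_V \big)^{-1} .
```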
So the optimal estimate that is linear, unbiased, and of minimum variance is X̂ = M + A(Z − HM), where A is the optimal weight matrix and Z − HM is the innovation. Not only have we obtained the estimate, we already know the structure of its error covariance P: go back to the earlier slide, where we have an expression for P in which everything is known except A. Now that we have determined the optimal A, substituting it into that expression (a good bit of algebra is involved) gives the covariance of the linear unbiased minimum variance estimate, that is, the covariance of the estimate whose total variance is minimum. The covariance takes the following form: there are two terms, the first being Σ_X. What is Σ_X? It is the covariance of the prior, and from it a second quantity is subtracted. You can therefore think of the posterior covariance as being smaller than the prior covariance; by combining the prior with the observation we have reduced the variance of the posterior. Look at the subtracted term: Σ_X is the prior covariance and Σ_V is the noise covariance, and the whole subtracted term is a symmetric positive semidefinite matrix, so tr(P) is no larger than tr(Σ_X). By lessening the variance we improve the quality of the posterior mean: I had a prior mean, I have an observation, and the posterior mean has a smaller variance than the prior mean; that is the result of combining two pieces of information, the prior and the observation. The structure given here is the Kalman filter structure, originally derived by Kalman in 1960, so the derivation we have gone through is a very important one that leads to the fundamental result in Kalman filtering techniques; a small numerical sketch appears below. Now I would like to build a relation between the Bayesian least squares solution and the linear minimum variance solution and introduce a sense of duality between the two. To that end, recall what we did in the Bayesian approach, which performs its calculations in what is called the state space, R^n. In the Bayesian approach, as you may recall from our previous analysis, the optimal estimate is a linear combination of the observation Z and the prior mean, and the weighting matrix is built from Σ_X⁻¹ and Hᵀ Σ_V⁻¹ H. A quick dimension check: H is m × n and Hᵀ is n × m, so Hᵀ Σ_V⁻¹ H is n × n, the same size as Σ_X⁻¹; it is Σ_V that sits in the middle of that product, while Σ_X enters through its inverse.
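Before turning to the Bayesian comparison, here is a minimal numerical sketch of the observation-space (linear minimum variance) formulas derived just above. The function name and the small test values are my own illustration, not from the textbook:

```python
import numpy as np

def lmv_estimate(m_x, Sigma_x, H, Sigma_v, z):
    """Linear minimum variance (observation-space) estimate.

    x_hat = m_x + A (z - H m_x),  A = Sigma_x H^T (H Sigma_x H^T + Sigma_v)^{-1}
    P     = Sigma_x - A H Sigma_x
    """
    D = H @ Sigma_x @ H.T + Sigma_v          # innovation covariance
    A = Sigma_x @ H.T @ np.linalg.inv(D)     # optimal weight (gain) matrix
    x_hat = m_x + A @ (z - H @ m_x)          # prior mean plus gain times innovation
    P = Sigma_x - A @ H @ Sigma_x            # posterior error covariance
    return x_hat, P

# tiny illustrative problem (values are arbitrary)
m_x = np.array([1.0, 0.0])
Sigma_x = np.diag([2.0, 1.0])
H = np.array([[1.0, 1.0]])
Sigma_v = np.array([[0.5]])
z = np.array([2.3])

x_hat, P = lmv_estimate(m_x, Sigma_x, H, Sigma_v, z)
print(x_hat)                              # updated estimate
print(np.trace(P), np.trace(Sigma_x))     # total posterior variance <= prior total variance
```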
So, as you can readily see from the calculations we have already done, Σ_X is the prior covariance and Σ_V is the observational (noise) covariance, and together they determine the covariance of the least mean square estimate coming from the Bayesian analysis. This pair, the estimate and its covariance, is called the state space formulation. When n is less than m it is useful to do the calculation in the state space R^n, because n is then the smaller of the two dimensions, so the computations in the state space are smaller than they would be in the observation space. This is the summary of the Bayesian least squares solution, equations 16.26 and 16.25 in chapter 16 of our textbook, LLD (2006). Linear minimum variance estimation, on the other hand, works in the observation space R^m; the observation space formulation is used when m is less than n. In the linear minimum variance case the structure of the estimate is the one we just derived, and the covariance of the estimate is as given above; both are again given in LLD. So now we have two types of results: one coming from the state space formulation within the Bayesian least squares setup, and the other from the observation space formalism that comes from linear minimum variance estimation. At the outset they look different, but the important part of the result is that they are indeed the same: two different versions of the same estimation problem, with the results given in two forms that at first sight look very different. We are now going to show that one can convert one result into the other by invoking a very simple matrix identity called the Sherman-Morrison-Woodbury formula. We have already used the Sherman-Morrison-Woodbury formula in the context of recursive least squares, when we did static deterministic problems. The same formula, a result from matrix theory developed during the 1930s and 1940s, becomes very handy for seeing the relation between these two formalisms; it is the bridge between the two. Let me restate: one is the state space formalism and the other is the observation space formalism, one coming from the Bayesian approach and the other from linear minimum variance estimation. Kalman derived his result using linear minimum variance estimation; people who came after him also derived the Bayesian formulation and then showed that the Bayesian formulation and the linear minimum variance formulation are one and the same, dual to each other: in one case you do the computations in the observation space, in the other in the state space. By proving the equivalence between the two, one gains a lot of freedom to do the computation in whichever of the two spaces is smaller: depending on which is smaller, I adopt this formulation or that formulation. That is the basic idea. The bridge is based on the Sherman-Morrison-Woodbury formula from matrix theory, which is given in Appendix B of our book; I have also talked about it extensively in our module on facts from matrix theory. Both formulations are written out side by side below.
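For comparison, here are the two formulations side by side. The observation-space form is the one derived in this lecture; the state-space form is my reconstruction of the standard Bayesian least squares result referred to above (equations 16.25 and 16.26 of LLD):

```latex
\text{State space (Bayesian LS, } n \times n \text{ inverse):}\quad
\hat{X} = M + \big(\Sigma_X^{-1} + H^{\mathsf{T}} \Sigma_V^{-1} H\big)^{-1} H^{\mathsf{T}} \Sigma_V^{-1} (Z - HM),
\qquad
P = \big(\Sigma_X^{-1} + H^{\mathsf{T}} \Sigma_V^{-1} H\big)^{-1},
\\[6pt]
\text{Observation space (LMV, } m \times m \text{ inverse):}\quad
\hat{X} = M + \Sigma_X H^{\mathsf{T}} \big(H \Sigma_X H^{\mathsf{T}} + \Sigma_V\big)^{-1} (Z - HM),
\qquad
P = \Sigma_X - \Sigma_X H^{\mathsf{T}} \big(H \Sigma_X H^{\mathsf{T}} + \Sigma_V\big)^{-1} H \Sigma_X .
```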
So let us start with the linear minimum variance analysis. Recall that D = H Σ_X Hᵀ + Σ_V, and consider D⁻¹. If we apply the Sherman-Morrison-Woodbury formula to it, we obtain an explicit expression for this inverse; I am not going to quote the formula here, as it is already given in the module on matrices. What does it say? Essentially the following: suppose I have Σ_V and I know Σ_V⁻¹, and I update Σ_V by adding the matrix H Σ_X Hᵀ; how do I compute (Σ_V + H Σ_X Hᵀ)⁻¹? That is exactly what the well-known Sherman-Morrison-Woodbury formula gives, and using it yields the expression for D⁻¹. Now I am going to do a little bit of juggling: multiply both expressions on the left by Σ_X Hᵀ (the two sides are equal, so the products are equal) and then carry out a sequence of simplifications. We arrive at the following identity: Σ_X Hᵀ times the inverse of the sum H Σ_X Hᵀ + Σ_V equals the inverse of the sum Σ_X⁻¹ + Hᵀ Σ_V⁻¹ H, times Hᵀ Σ_V⁻¹. This is a matrix identity that comes out as a result of applying the Sherman-Morrison-Woodbury formula. There is a lot of matrix algebra involved, but the general steps should be very clear by now; yes, there is a lot of checking to be done, and I hope you will take a few minutes to check all the details. In view of the fact that these two matrices are equal, let us label the first one A and the second one B (these are just labels for the two sides of the identity, not the gain and intercept used earlier). So what have we accomplished by using the Sherman-Morrison-Woodbury formula? We have shown that the matrix A equals the matrix B; that is the first step. Now I am going to import that relation. Look at the structure of the optimal estimate that comes from the linear minimum variance estimation, from Gauss to Kalman, which we discussed a few minutes ago: in it there is a matrix of exactly the form A. I am going to replace that matrix by the matrix B, which we have earned the right to do since A = B. If I carry out the multiplication and simplify, the result is the sum of two terms; considering the second term, a further sequence of simplifications reduces it, and combining that simplification and substituting back, we get the overall structure of X̂. If you look back, this is exactly the structure given by the Bayesian analysis. So what have we done? We started with the linear minimum variance formula for the optimal estimate, and from there we derived the Bayesian result by applying a matrix identity that arises out of the Sherman-Morrison-Woodbury formula; by retracing the steps we can readily see that the two formulas are equal. A quick numerical check is sketched below.
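As a sanity check (not a proof), the gain identity and the equivalence of the two posterior covariances can be verified numerically on random matrices; the dimensions and seed below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
H = rng.standard_normal((m, n))
# random symmetric positive definite covariances
Bx = rng.standard_normal((n, n)); Sx = Bx @ Bx.T + n * np.eye(n)
Bv = rng.standard_normal((m, m)); Sv = Bv @ Bv.T + m * np.eye(m)

# gain identity obtained via Sherman-Morrison-Woodbury:
lhs = Sx @ H.T @ np.linalg.inv(H @ Sx @ H.T + Sv)        # observation-space gain
rhs = (np.linalg.inv(np.linalg.inv(Sx) + H.T @ np.linalg.inv(Sv) @ H)
       @ H.T @ np.linalg.inv(Sv))                        # state-space gain
print(np.allclose(lhs, rhs))        # expected: True

# the two posterior covariance formulas also agree
P_obs = Sx - lhs @ H @ Sx
P_state = np.linalg.inv(np.linalg.inv(Sx) + H.T @ np.linalg.inv(Sv) @ H)
print(np.allclose(P_obs, P_state))  # expected: True
```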
Therefore linear minimum variance analysis is equivalent to Bayesian analysis. There are thus two equivalent ways of looking at Bayesian data assimilation, or the Bayesian way of estimation: one through linear minimum variance and the other through the classical Bayesian route itself. Now I am going to reformulate this as a Kalman filter, static case. Why static? The Kalman filter is generally talked about within the context of dynamics, but the Kalman filter update is applied at a given time; so if we understand the static case, we can apply it repeatedly in time to get the Kalman filter equations within the context of dynamic models. Let me quickly review the basic setup. Let X be the unknown, and let X̂⁻ be an unbiased estimate of X when there is no observation; that is the prior. The minus superscript denotes the prior and the plus superscript will denote the posterior. The prior information thus consists of the prior estimate X̂⁻ together with the prior covariance Σ⁻; the minus refers to the prior statistics. Now I am given the observation Z = HX + V, with the standard assumptions: the usual conditions on V hold, and X and V are uncorrelated. Following the linear minimum variance approach, I am now going to consider a posterior estimate X̂⁺ that is a linear function of the prior and a linear function of the observation: X̂⁺ = L X̂⁻ + K Z, where L and K are matrices. I am given two pieces of information, X̂⁻ and Z, and I would like to combine them to get X̂⁺. I am again going to remain within linear minimum variance estimation, as Kalman did. That is the posterior structure: the prior is given, the observation is given, and I would like to find X̂⁺. There are two unknowns, L and K, and I would like to choose them such that X̂⁺ is an unbiased estimate and also has minimum variance. You can see I am essentially repeating the linear minimum variance estimation, but this is the derivation Kalman originally gave in his paper. What is the difference? Earlier we assumed the estimate was X̂ = B + AZ; now we assume the posterior X̂⁺ = L X̂⁻ + K Z. The two are, in a sense, equivalent, and we are going to repeat the derivation; our aim is to guarantee that X̂⁺ is unbiased and also of minimum variance.
I have to impose two conditions, and they are imposed by selecting the two matrices L and K. So let us look at the condition for unbiasedness. With X̂⁺ = L X̂⁻ + K Z, unbiasedness requires E[X̂⁺] = E[X]. Now E[X̂⁺] = E[L X̂⁻ + K Z]; substitute Z = HX + V and simplify. Since the expected value of V is zero, that term goes away, and since the prior is unbiased, E[X̂⁻] = E[X]. Therefore E[X] must equal (L + KH) E[X], which essentially tells you that L + KH must be the identity, or L = I − KH. So one of the two matrices is expressed in terms of the other and H, and the structure becomes X̂⁺ = (I − KH) X̂⁻ + K Z, which can also be written as X̂⁺ = X̂⁻ + K(Z − H X̂⁻). That is the structure of the unbiased estimate, and you can see we are running very much parallel to the earlier linear minimum variance estimation, except that the structure is now linear in both Z and the prior X̂⁻, with the form dictated by the unbiasedness condition. From this structure we get the posterior error in the estimate, and from the posterior error we can compute the posterior error covariance: write the expression for the posterior error, multiply it by its transpose, take the expectation, and simplify; you obtain an expression entirely parallel to the one before, again involving the matrix D. I would now like to choose K such that the trace of Σ⁺ is minimum; the problem is the same as the one we just solved. The K that minimizes the trace of Σ⁺ is K = Σ⁻ Hᵀ (H Σ⁻ Hᵀ + Σ_V)⁻¹, and this K has a special name: it is called the Kalman gain, in honor of Kalman, who derived the filter for the first time in 1960. We substitute this optimal K back into the structure; and since the covariance Σ⁺ also has K in it, we can substitute the optimal K there as well. Doing so and simplifying, we obtain the posterior estimate and the posterior covariance, which in the traditional literature are written as X̂⁺ = X̂⁻ + K(Z − H X̂⁻), with K the Kalman gain and Z − H X̂⁻ the innovation, together with the posterior covariance Σ⁺ = (I − KH) Σ⁻. Again you can see that the posterior covariance subtracts a matrix from the prior covariance, so the posterior covariance is smaller than the prior; after combining the prior with the new information, the uncertainty in the estimate is reduced. That is why the posterior estimate is better, and this is the reason why the posterior mean is optimal within the context of the Bayesian framework. A small routine implementing this single update is sketched below.
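The static (single-update) Kalman step just derived can be written as a small routine; this is only a sketch in my own notation, not code from the textbook:

```python
import numpy as np

def kalman_update(x_prior, Sigma_prior, H, Sigma_v, z):
    """One static Kalman update: combine prior statistics with one observation z = Hx + v."""
    S = H @ Sigma_prior @ H.T + Sigma_v           # innovation covariance H Sigma- H^T + Sigma_v
    K = Sigma_prior @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_post = x_prior + K @ (z - H @ x_prior)      # prior estimate plus gain times innovation
    Sigma_post = (np.eye(len(x_prior)) - K @ H) @ Sigma_prior   # posterior covariance (I - KH) Sigma-
    return x_post, Sigma_post
```

Applied repeatedly in time, with a model forecast supplying the prior at each step, this update becomes the usual Kalman filter for dynamic models, which is exactly the point made above about the static case.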
So we have come to the end of this discussion, from Gauss to Kalman: linear minimum variance estimation. In the previous lecture we talked about the structure of the Bayesian analysis, the Bayesian optimal decision, and the resulting covariance from the Bayesian structure. Here we developed the theory of linear minimum variance estimation, we talked about the intrinsic relation between linear minimum variance estimation and Bayesian estimation, and we built the bridge between them. The bridge depends on a result from matrix theory that was developed independently by mathematicians in the 1930s and 1940s, the Sherman-Morrison-Woodbury formula. By using this already existing formula we were able to build the bridge between the state space formulation and the observation space formulation, and that essentially shows that the Bayesian method can be interpreted in one of two ways, either within the classical Bayesian framework or within the linear minimum variance framework. By introducing both, we now have more choices in terms of picking whichever is better from a computational perspective: we pick one or the other depending on which of m and n is smaller. With that, I think we have provided a broad overview of the fundamentals of linear minimum variance estimation as well as a derivation of the Kalman filter equations. We very strongly urge the reader to follow through the exercises and verify all the computations involved here. I am very cognizant of the fact that there are a lot of facts one needs to verify; in a class setting I would generally cover this linear minimum variance estimation in about two to three classes, giving all the details, but in this compressed video form I am hitting all the major steps and leaving the verification of the formulas as an exercise for you, so please do continue. I would also like to draw your attention to a particular way of looking at what is called the 3DVAR problem. Even though I have not yet introduced it, I think it is better to anticipate it here. I have a prior estimate, with a prior mean and covariance under a normal distribution, and an observation Z = HX + V. I am going to consider a function f(x) which is a sum of two quadratic forms: there are two pieces of information, one coming from the prior and one coming from the observations, and each contributes one quadratic term; the cost function is written out below.
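The 3DVAR objective alluded to here can be written as follows; this is my reconstruction in the notation of this lecture, with X̂⁻ denoting the prior estimate and the factors of one half being conventional:

```latex
f(x) = \tfrac{1}{2}\,(x - \hat{X}^{-})^{\mathsf{T}} \Sigma_X^{-1} (x - \hat{X}^{-})
     \;+\; \tfrac{1}{2}\,(Z - Hx)^{\mathsf{T}} \Sigma_V^{-1} (Z - Hx),
\\[4pt]
\nabla f(x^{*}) = 0
  \;\Longrightarrow\;
x^{*} = \hat{X}^{-}
      + \big(\Sigma_X^{-1} + H^{\mathsf{T}} \Sigma_V^{-1} H\big)^{-1} H^{\mathsf{T}} \Sigma_V^{-1} \big(Z - H\hat{X}^{-}\big).
```

The minimizer is the same estimate obtained above from the Bayesian and linear minimum variance arguments.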
So you can see there are two pieces of information coming together, and I am joining them in the least squares framework: f(x) is a sum of two quadratic forms. I can minimize f(x) with respect to x, and if you do that you will essentially get the Bayesian estimation results. Therefore you can readily see that the Bayesian framework with Gaussian distribution assumptions and the 3DVAR problem are essentially one and the same. It is a very instructive exercise to pursue, and you already know how to do the minimization of quadratic forms, so I very strongly urge you to carry out the minimization, find the solution, and look at its structure. All these derivations are further expanded in the book by Sage and Melsa (1971), Estimation Theory with Applications to Communications and Control, and we also delve deeply into many of these topics in chapter 17 of LLD. With this we have now come to the end of a discussion covering all the basic fundamental principles of statistical estimation, starting from the properties of estimates, through statistical least squares, maximum likelihood estimates, and Bayesian estimates, to linear minimum variance estimation. This is only a small sampling of results from statistical estimation theory; statistical estimation theory is a big ocean, and I wanted to provide a window of opportunity to look at the kind of results it provides and how some of these results are intrinsic to pursuing our goal in dynamic data assimilation. Thank you.