Okay, so let us continue with filtering. In the last lecture I introduced the derivation of the Kalman filter. There are multiple ways of deriving the Kalman filter; I tried to do the simplest derivation, going through an optimization approach. There is also a way to go through statistics and probability density functions, and I am going to make those connections now, but those derivations are a little more algebraically involved. If you thought the earlier one was simple, then go over it again: I would strongly recommend that you sit and try to derive the expressions in the notes, those five or six pages. It is not so difficult, and if you derive those expressions yourself you will gain confidence about what is happening. So let us summarize the Kalman filtering algorithm. Unfortunately, because this is an extra lecture, not many people are able to attend today, but I think it is an equally important, and probably more difficult, lecture than yesterday's. Let me summarize, because in this lecture I want to go beyond what I taught yesterday, which was the derivation of the Kalman filter. First of all, you have to understand that we are dealing with a stochastic process x which is governed by a difference equation. In a stochastic process you can do prediction; prediction is the reason you develop models in the first place. So we predict the state before receiving the measurement. This is my state prediction: it is what I expect, the best guess for the state.
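To fix notation, the prediction step just described can be sketched in a few lines. This is a minimal illustration for a linear model x(k+1) = A x(k) + B u(k) + w(k); the matrices and numbers below are made up for the sketch, not taken from the lecture's example.

```python
import numpy as np

def predict(x_upd, P_upd, A, B, u, Q):
    """Kalman prediction: propagate the updated mean and covariance one step.

    x_pred = A x_upd + B u     (predicted mean, the conditional mean given y up to k-1)
    P_pred = A P_upd A^T + Q   (predicted covariance; Q accounts for the state noise w)
    """
    x_pred = A @ x_upd + B @ u
    P_pred = A @ P_upd @ A.T + Q
    return x_pred, P_pred

# Illustrative 2-state system (hypothetical numbers)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([[0.0], [0.1]])
Q = 0.01 * np.eye(2)

x_pred, P_pred = predict(np.array([1.0, 0.0]), 0.1 * np.eye(2),
                         A, B, np.array([1.0]), Q)
print(x_pred)   # predicted mean
print(P_pred)   # predicted covariance
```

Note that even with zero state noise mean, Q is added to the covariance: the prediction is the best guess, but its uncertainty grows.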
Now, as we have seen, this prediction is nothing but the conditional mean of x given the information up to k-1. But it is not sufficient to characterize only the mean; I want to characterize the uncertainty associated with this random variable as well, so the covariance comes into the picture, and we have this expression for the covariance. This tells you that the covariance associated with the prediction is a function of the covariance associated with the previous updated mean: x hat of k-1 given k-1 is the previous updated mean, and the covariance associated with it is P of k-1 given k-1. This relates the uncertainty at the previous time step to the uncertainty at the current time step, so this is a stochastic process in which the density at the new time step is a function of the density at the previous time step. In fact, the viewpoint I am going to present today is that these densities can be viewed as Gaussian (normal) densities, and Gaussian densities can be characterized by mean and covariance, so it is sufficient for me to have an update of the mean and an update of the covariance. What I am actually doing is describing a stochastic process through the propagation of its mean in time and the propagation of its covariance in time. So this is the predicted mean, and this is the predicted covariance.

Now, when the measurement arrives, I want to fuse the measurement with the predicted estimate and do a correction: the correction step, or the update step. For that you need the gain, so the Kalman gain is computed. We saw the optimal Kalman gain, which I am calling L star; in the previous slides I had Lk, which is an arbitrary gain, and L star is the optimal gain. The optimal gain can be computed as the cross covariance of the estimation error and the innovation, post-multiplied by the inverse of the innovation covariance; epsilon is the estimation error and e is the innovation. It is very important that this is a post-multiplication, because these are all matrices. We had computed these quantities separately earlier; I am just putting the expressions together now, so there is a recursive expression to compute the gain. This gain, if you notice, is a function of the uncertainty associated with the prediction, P of k given k-1, and P of k given k-1 is in turn a function of Q. What is Q? Q is the uncertainty associated with the state noise. What is R? R is the uncertainty associated with the measurements. See, what are the three sources of uncertainty in your difference equation? One is the state noise wk. The second is the state you start from: this is a difference equation, and when you go from k-1 to k the starting state is itself uncertain, right back to the uncertain initial state. The third source of uncertainty is vk, the measurement noise. All three of them play a role in this gain computation. Now, strictly I should not be talking in terms of ratios, because this is one matrix multiplied by the inverse of another matrix, but in some sense the numerator here is the prediction uncertainty plus this Q which is coming from wk, and this R is coming, in some sense, in the denominator; if you take a scalar case you will see that a ratio really does appear, with R in the denominator. So what it means is that we are trying to find the fraction of the innovation which is due to w: see, Q appears in the numerator as well as in the denominator, whereas R appears only in the denominator. We want the gain to pick out the part due to w, and then we want to use that to correct the estimate.

Now, the update comes through the correction, Lk times ek. And where is the information about wk and vk contained? It is contained in yk; so yk minus the predicted output, the innovation, will carry that information. What fraction of it is used is decided by this gain matrix Lk, and this Lk matrix is used to fuse the measurements with the state estimate of the model. Remember, my model is running in parallel in my computer; the only thing that is coming from outside is the measurement y. The input u is something decided on the computer side: either a controller decides it or an operator decides it; as an operator you are tweaking it, so you know those values of u. So I do a correction here to my state, and this is the uncertainty associated with the correction, just as the earlier expression was the uncertainty associated with the prediction. Now, what I expect is that when I do a correction, the uncertainty should reduce: I have got some information from the plant, and I use that information to correct the state estimate. In some sense, x hat of k given k should be a better estimate of the state than x hat of k given k-1, so I would expect the uncertainty associated with k given k to be smaller than the uncertainty associated with k given k-1, and this is indeed the case. Why does the covariance grow in the prediction step? Because you have taken the expected value of wk to be 0: we have taken the best estimate that we have for wk, namely its mean, which is 0.
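The correction step can likewise be sketched. This is a minimal illustration under an assumed measurement model y = C x + v; C, R, and all numbers are hypothetical, and L here is the optimal gain L star written in its usual innovation form.

```python
import numpy as np

def update(x_pred, P_pred, C, R, y):
    """Kalman update: fuse the measurement y = C x + v with the prediction.

    e     = y - C x_pred          (innovation)
    Pe    = C P_pred C^T + R      (innovation covariance)
    L     = P_pred C^T Pe^{-1}    (optimal gain L*, post-multiplied by Pe inverse)
    x_upd = x_pred + L e          (corrected mean)
    P_upd = P_pred - L Pe L^T     (corrected covariance)
    """
    e = y - C @ x_pred
    Pe = C @ P_pred @ C.T + R
    L = P_pred @ C.T @ np.linalg.inv(Pe)
    x_upd = x_pred + L @ e
    P_upd = P_pred - L @ Pe @ L.T
    return x_upd, P_upd, L

# Illustrative numbers: a scalar measurement of the first state
C = np.array([[1.0, 0.0]])
R = np.array([[0.04]])
P_pred = np.array([[0.111, 0.009], [0.009, 0.091]])
x_upd, P_upd, L = update(np.array([1.0, 0.1]), P_pred, C, R, np.array([1.2]))
print(L)                                        # gain that weights the innovation
print(np.trace(P_upd) < np.trace(P_pred))       # True: the update shrinks the uncertainty
```

Notice how Q (buried inside P_pred) sits in both numerator and denominator of the gain, while R sits only in the denominator: a large R pushes the gain toward zero, so a noisy measurement is trusted less.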
In the covariance prediction we add Q, something equivalent to the state noise component, as compensation for the fact that the state noise has actually influenced x. No, no: what is unwanted for you is the measurement noise. wk is a real input, or you can say it is a stand-in, an imposter, for a real input: we have tried to create a model that imitates some real input, some uncertainty, which is influencing the state. We do not have an explicit measurement of it; we are trying to create a pseudo-compensation for it. See, yk brings in the effect of wk-1, and somehow I want to use that information and inject it into my estimate; this correction is an attempt to compensate for that.

Have we done this kind of thing earlier? We have, when we looked at time series models: ARX models, Box-Jenkins models. I am going to make the connection in today's class, hopefully. I am going to show that what we are developing is actually a form of time series model, and I am going to talk about something called the steady-state Kalman filter and show that the steady-state Kalman filter is nothing but a Box-Jenkins model. What was the attempt in time series modeling? Unknown input modeling, unmeasured disturbance modeling. Here it is the same thing. In the Box-Jenkins model, or in the ARMAX model, we had this ek; ek was the innovation, and the innovation was yk minus y hat of k given k-1. That ek and this ek are not different. Once I make the connection, you will realize that we are saying the same thing in two different languages: the same idea, different parameterizations, and different approaches that finally lead you to the same fundamental result. So actually, in your assignment, when you develop, say, an ARMAX model and convert it into state space form, you have identified a Kalman filter from data. I will come to that. Historically these two things probably emerged quite independently, and only now can we see those connections; or maybe those who developed them knew all those connections, and we are late in understanding them. If you see Ljung's book, he talks about the connections with Kalman filtering very clearly. The only thing I am somewhat disappointed about is that he has put it as an exercise problem; it should have been prominently there in the main part of the text, saying that this is actually a Kalman filter, it is not different. Unless somebody goes and solves those exercise problems, they will not realize that a Box-Jenkins model or an ARMAX model is a parameterization of the Kalman filter. Anyway, we will make these connections. So, is this clear? We are hopping in time: we go from time k-1 to k, and every time we find the new density function associated with the state.

Now, where does the density function come in? Till now I have only talked about mean and covariance. Before I associate a real density function with these quantities, let me say something about covariance reduction. The nice thing about positive definite matrices is that you can compare them: you can say that this matrix is greater than that matrix. Of course, you can do this only for positive definite (or negative definite) matrices, not for indefinite ones. A positive definite matrix can be associated with closed contours: suppose you take a simple case with x1 and x2 and draw the locus of points; what I am trying to show here is a comparison of x transpose P x. So here, see: what is a positive definite matrix?
A positive definite matrix P is one for which x transpose P x is greater than 0 for every x not equal to 0, and x transpose P x is equal to 0 only if x is equal to 0. With this I can compare two matrices: we say that matrix P1 is greater than matrix P2 if x transpose P1 x is greater than x transpose P2 x for every nonzero x, that is, x transpose P1 x minus x transpose P2 x is greater than 0, or, the other way around, x transpose P2 x minus x transpose P1 x is less than 0. Now, if you take the locus of points where x transpose P x equals a constant, this turns out to be an ellipse: in the two-dimensional case, with x1 and x2, you get closed ellipses for constants c1, c2, c3, with c1 greater than c2 greater than c3, and so on. So the loci of these points, for a given positive definite matrix, are ellipses, and by comparing ellipse sizes you can get an idea of the size of the uncertainty. To be careful here: for a covariance matrix P, the uncertainty ellipse is the locus where x transpose P inverse x equals a constant, and then a larger covariance gives you a larger ellipse and a smaller covariance gives you a smaller ellipse, so if P1 is greater than P2, the uncertainty ellipse of P1 encloses that of P2, and a smaller ellipse means smaller uncertainty. Well, I have not drawn this exactly: the ellipses I have drawn here are aligned along the axes, whereas for a general positive definite matrix they might be tilted. I have just created a cartoon which was convenient to draw; it was difficult for me to tilt the ellipses, and if one of you can explain how to tilt them I could improve this cartoon.

But the idea is that this matrix P of k given k is actually smaller than P of k given k-1. Why am I saying this? Let us go back: P of k given k is P of k given k-1 minus this quantity, Lk times the innovation covariance Pe times Lk transpose. Pe is a covariance matrix, and a covariance matrix is positive definite, so its inverse is also positive definite. And if you take a matrix Lk, multiply it by a positive definite matrix, and post-multiply by the transpose of Lk, the whole thing is again a positive definite matrix. So you are subtracting a positive definite matrix from another positive definite matrix, which means P of k given k should be smaller than P of k given k-1: you are removing a positive term, in some sense, from the original matrix. So the covariance associated with the estimate reduces in the update step. This is very critical; this is the very nice thing that happens.

So now we have developed a filter which systematically takes into account the stochastic description of the unknown signals; we have a way of doing optimal filtering. There are still some questions to be answered: this is optimal, but is it stable? The stability question is still not answered, but at least we have established that this is the optimal filter. I will show you that, at least under certain simplifications, you can establish stability; establishing full stochastic stability is a little more complex, and it can be done, but I will not do it as part of this course.
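The covariance-reduction argument can be checked numerically: the term subtracted in the update, L Pe L transpose, leaves a matrix that is smaller in the positive definite ordering (all eigenvalues of the difference are non-negative). The numbers below are illustrative, not from the lecture.

```python
import numpy as np

# Check that P_{k|k-1} - P_{k|k} = L Pe L^T is positive semidefinite,
# i.e. the update never increases the covariance (illustrative numbers).
P_pred = np.array([[0.111, 0.009], [0.009, 0.091]])
C = np.array([[1.0, 0.0]])
R = np.array([[0.04]])

Pe = C @ P_pred @ C.T + R                 # innovation covariance (positive definite)
L = P_pred @ C.T @ np.linalg.inv(Pe)      # optimal gain
P_upd = P_pred - L @ Pe @ L.T             # updated covariance

diff = P_pred - P_upd                     # equals L Pe L^T
print(np.linalg.eigvalsh(diff))           # eigenvalues of the removed "positive term"
```

Comparing eigenvalues of the difference is exactly the matrix ordering discussed above: P1 > P2 means x'P1x > x'P2x for all nonzero x, which holds when P1 - P2 has no negative eigenvalues.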
Still, I want to give you at least a flavor of how stability analysis can be done. Is this idea clear, that the covariance is reducing? The moment I do an update, the covariance reduces. So every time I do an update I get more information, the variance associated with the estimate shrinks, and I should get a better and better estimate as I collect more and more data.

Now, till now I never made an attempt to associate any specific distribution with these quantities. I am going to associate the most wonderful distribution, the easiest to handle: the Gaussian distribution, and then we will look at the same thing from the Gaussian viewpoint. Gaussian distributions have some wonderful properties. Why multivariate Gaussian distributions? Well, multivariate Gaussian distributions can be expressed in analytical form; I will show you what that form is. Then there is a very, very important statement. I do not know how many of you have done a first course in statistics, but if you have, you would have stumbled upon the central limit theorem. Loosely, the central limit theorem says that if there are multiple random variables simultaneously influencing some system, their combined effect is like a Gaussian random variable: each of them may have a different distribution, none of them individually Gaussian, and yet the combined effect can be shown to be Gaussian. Now look at the precise statement, because some of the words are very important. I have given a simplified version in colloquial language, but the right way of putting it is that a sum of many independent and identically distributed random variables can be approximated by a Gaussian distribution. This is a very powerful result: if you know that the uncertainty is arising from multiple physical sources, it is justified to model it as a Gaussian distribution.

Why am I saying this? Let us go back to our original model. This is a linearized model, and u here is the known input. I tried to give some rationale for w by saying that we start from a linear perturbation model, we have these disturbances, and we make the assumption that they are piecewise constant. But just imagine: when you are developing a model for a particular system, you cannot develop a model with respect to all the disturbances that affect it; you will develop a model with the relevant disturbances. Say I ask you to develop a model for the temperature distribution in this room. You will model the prominent disturbances, for example people walking in or out by opening the door, a random disturbance. But there could be some disturbance coming from electrical fluctuations causing some problem with the air distribution, and so on; you cannot model everything that is happening. All of these are independent sources of disturbance, and their combined effect can be thought of as Gaussian. So even though I have given this rationale, it is not always possible to model all the disturbances that are actually influencing the plant.

When you come up with this model, there are other sources of uncertainty as well. You actually started from a non-linear differential equation, you linearized, then you discretized, and then you got this equation. One thing is that I made the assumption that the disturbances are piecewise constant; they are not, so there is a residual disturbance which is not modeled, and you can treat that unmodeled residual as another independent disturbance. One more source: my original system is non-linear and I linearized it, so there are approximation errors coming from the linearization; that is another source of uncertainty in the dynamics. So there are multiple such sources, and then there are so many unmeasured disturbances which we cannot model. If you view this w as arising from multiple independent sources of uncertainty, then it is justified to model w as a Gaussian random variable; that is what the central limit theorem assures you.

The same thing is true about vk, when you are taking a measurement; errors will be committed through multiple sources. See, you have a signal: first of all there is signal conversion, you convert some temperature into some voltage; then you have an amplification circuit which amplifies it and filters some noise, and that adds some spurious signals; then you have transmission, you have to transmit the data either digitally or through analog, say 4 to 20 milliamps, and it will pick up some noise there; then, if it is analog transmission, you have A-to-D conversion in your computer, and it will pick up some noise there too. All these are independent sources, and their combined effect can be viewed as Gaussian; you are quite justified in doing that. And actually, as I have shown you in earlier lectures, if you try to estimate the distribution of a measurement noise, it comes out to be almost Gaussian.
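The loose version of the central limit theorem above is easy to see numerically. Below, a disturbance is built as the sum of twelve independent uniform sources, none of which is Gaussian on its own, and the standardized sum is checked against Gaussian benchmarks. A sketch with arbitrary choices, not tied to the lecture's example.

```python
import numpy as np

# Central limit theorem, numerically: each "disturbance source" is uniform,
# not Gaussian, but the sum of twelve independent sources is close to Gaussian.
rng = np.random.default_rng(0)
n = 100_000

s = rng.uniform(-0.5, 0.5, size=(n, 12)).sum(axis=1)   # combined effect
z = (s - s.mean()) / s.std()                           # standardize

skew = (z ** 3).mean()            # Gaussian value: 0
ex_kurt = (z ** 4).mean() - 3.0   # Gaussian value: 0 (a single uniform: -1.2)
frac_1sd = np.mean(np.abs(z) < 1.0)   # Gaussian value: about 0.6827
print(skew, ex_kurt, frac_1sd)
```

The excess kurtosis of a single uniform source is -1.2; for the sum of twelve it shrinks toward the Gaussian value 0, which is the quantitative content of the theorem.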
So Gaussian distributions are very, very important, and there is more to it than just the central limit theorem: they are also mathematically very convenient; they have wonderful mathematical properties. By the way, we have not missed out on anything about Gaussian distributions till now, because a Gaussian distribution needs only two things, mean and variance, and we already have formulas for propagating the mean and the variance; so attaching a Gaussian interpretation to what we have done is very easy. So the first point is that uncertainties arising from multiple independent sources can be conveniently modeled as Gaussian distributions. The second point is the very attractive mathematical properties: if you take a Gaussian random variable and transform it through a linear difference equation, the result is also a Gaussian random variable. Any linear transformation of a Gaussian random variable is again a Gaussian random variable. This property is wonderful, and it is not true of other distributions. And then, for Gaussian distributions, optimal estimates have simple forms. So what I am going to hint at is this: if you take the viewpoint that wk and vk are Gaussian white noise processes, and the initial estimation error in x0 also has a Gaussian distribution, then xk also has a Gaussian distribution: xk conditioned on y up to k-1, and xk conditioned on y up to k, both have Gaussian distributions, because Gaussian distributions transformed through linear equations again give you Gaussian distributions. That is the wonderful property.

Let us have a look at the Gaussian distribution. A Gaussian distribution requires only two things, mean and variance. So if x bar is the mean and P is a positive definite covariance matrix, then you probably remember this complex-looking formula: the density is proportional to the exponential of minus one half of (x minus x bar) transpose P inverse (x minus x bar). You can make a connection with a sum of squared errors: x minus x bar is like an error, weighted by the covariance inverse, P inverse. Now, what do you expect? If x is away from the mean, this quadratic term is large, and with the negative sign the exponential is small; if x is close to the mean, x minus x bar is close to 0, and the exponential is at its largest. So this is saying that larger errors are less probable, and values closer to the mean are more probable. The factor in front is only a normalizing factor: it makes sure that the density has its defining property, that the area under the curve is 1. So this is like the exponential of a weighted sum of squared errors, with the covariance inverse as the weighting matrix.

Now, the nice thing is this: if you take x as a Gaussian random variable, A as a constant matrix and b as a constant vector, and create a new variable z equal to A x plus b, you can show that z also has a Gaussian distribution; it is very easy to prove. Can you work this out? What will be the mean of z, if x bar is the mean of x? z bar will be A x bar plus b. What will be the variance? It is the expected value of (z minus z bar)(z minus z bar) transpose. Now z minus z bar is A times (x minus x bar), so (z minus z bar)(z minus z bar) transpose is A (x minus x bar)(x minus x bar) transpose A transpose. If I take expectations on both sides, this equals A times the expected value of (x minus x bar)(x minus x bar) transpose times A transpose, which is A P A transpose. So this is a very crucial property. I am talking about vectors here: x is a vector, and z equal to A x plus b is a new vector; if x has a Gaussian (normal) distribution, z also has a Gaussian (normal) distribution, and knowing one you can compute the other. That is very important.

Yes, Gaussian white noise: you can take wk to be Gaussian white noise, but white noise need not be Gaussian. What are the two properties of white noise? No, no: the mean is constant, and there is no time correlation; that is what is important. Mathematically it is possible to construct white noise with other distributions; what we normally use, for convenience, is the Gaussian. For example, there is something called the truncated Gaussian distribution, and it should be possible, at least in one dimension, to have white noise with a uniform distribution. No, not A x plus b itself: the filtering that is happening through the difference equation is linear filtering, since we have a linear difference equation throughout.
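The mean and covariance just worked out (z bar = A x bar + b, covariance A P A transpose) can be checked by simulation; the matrices here are arbitrary illustrative values.

```python
import numpy as np

# Check z = A x + b for Gaussian x: the sample mean approaches A x_bar + b
# and the sample covariance approaches A P A^T.
rng = np.random.default_rng(1)
x_bar = np.array([1.0, -1.0])
P = np.array([[1.0, 0.3], [0.3, 0.5]])
A = np.array([[2.0, 0.0], [1.0, 1.0]])
b = np.array([0.5, 0.0])

X = rng.multivariate_normal(x_bar, P, size=200_000)   # samples of x
Z = X @ A.T + b                                       # linear transformation

print(Z.mean(axis=0))      # close to A @ x_bar + b
print(np.cov(Z.T))         # close to A @ P @ A.T
```

A histogram of any component of Z would also look Gaussian, which is the part of the claim (Gaussianity is preserved, not just the first two moments) that the moment formulas alone do not show.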
Using this particular property, this fundamental property, and applying it to our difference equations, you can show that linear filtering of a Gaussian distribution again gives Gaussian distributions; A x plus b is not itself the filtering, but it is the fundamental property that gets applied. Yes, thank you: the negative sign was missed in the exponent on the slide, a fundamental error; sorry about that. Okay, so the notation I am going to use is N(x bar, P), where N stands for the normal distribution; only two things are required for a normal distribution, the mean and the variance.

There are some other properties of multivariate distributions. If you take two random vectors x and z, they are called uncorrelated if the expected value of (x minus x bar)(z minus z bar) transpose is 0; this is the general definition for any random variables, nothing to do specifically with Gaussians. And x and z are called independent if the joint density function of x and z can be expressed as p(x) times p(z). Now, given any two random variables, if x and z are independent, then they are uncorrelated; that you can show. But if they are uncorrelated, that does not mean they are independent, with the exception of the Gaussian case: for Gaussian random vectors x and z, if x and z are uncorrelated then they are independent, and if they are independent they are uncorrelated. This is true only for Gaussian random variables; in general it is not. Gaussian random variables have this special property: independence implies uncorrelatedness, and uncorrelatedness implies independence; they are one and the same property for Gaussian random variables.
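The warning that, outside the Gaussian case, uncorrelated does not imply independent can be seen with a standard counterexample; the pair below is hypothetical, chosen only for illustration: x uniform on (-1, 1) and z = x squared.

```python
import numpy as np

# Counterexample to "uncorrelated implies independent" for a NON-Gaussian pair:
# x uniform on (-1, 1) and z = x^2 are uncorrelated (E[x^3] = 0) but clearly
# dependent (z is a function of x). For jointly Gaussian vectors this cannot happen.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 500_000)
z = x ** 2

corr = np.corrcoef(x, z)[0, 1]   # near 0: uncorrelated
# Dependence shows up in higher moments: E[x^2 z] != E[x^2] E[z]
lhs = (x * x * z).mean()         # E[x^4] = 1/5 for this x
rhs = (x * x).mean() * z.mean()  # E[x^2] E[z] = (1/3)^2 = 1/9
print(corr, lhs, rhs)
```

The correlation is essentially zero, yet the factorization of expectations fails, so the joint density cannot factor into p(x) p(z).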
This is just a property I am stating here; all these properties have to be used when you do the analysis with this interpretation. Now, where am I going to use this? I am going to make a fundamental assumption: w is a white noise stochastic process, normally distributed with zero mean and covariance Q; v is zero mean, and this tilde symbol means "is drawn from", so v is drawn from a normal distribution with zero mean and covariance R; and x0 is drawn from a normal distribution with mean x hat of 0 given 0 and covariance P0, the initial uncertainty about the estimate that you give at time 0. So all three of them are normally distributed multivariate random variables. If I make this assumption, then, using the properties of Gaussian random variables, you can show the following. I am not doing the proof here; I will upload some references which you can refer to for further study. We wanted the conditional mean, and if you assume that w is a Gaussian random variable, v is a Gaussian random variable, and the initial condition is a Gaussian random variable, you can show that the conditional density of xk given y up to k-1 is Gaussian, with exactly this as its mean and this as its covariance, and we know how to compute that mean and covariance. The same is true about the conditional density of xk given y up to k. So if these three are Gaussian, these are also Gaussian. The proof is fairly simple, a lot of algebra, but you can prove it using these properties, and I will give you appropriate literature so that you can pursue it.

So one could actually derive everything assuming Gaussianity. I do not think Kalman did this; Kalman went about it the way we have, he did not assume Gaussianity. But what is wonderful is that you get the same result if you assume Gaussian distributions: the final result, the update rule, the Kalman gain calculation, nothing changes. A completely different viewpoint gives you the same optimal result, which is very wonderful. You can also show that the innovation sequence itself is a Gaussian random process, and it is zero mean; we have already found the mean and variance of the innovation sequence. So with this assumption, and the fact that linear transformations of Gaussian random variables give rise to Gaussian random variables, you can show that the conditional density of xk given y up to k-1 is Gaussian, the conditional density of xk given y up to k is Gaussian, and the density of the innovations is Gaussian, and you can compute all their covariances. Very nice; this works out only for the Gaussian, because linear transformations of Gaussian variables give rise to Gaussian variables, and that is not true of any other distribution.

Well, I am leaving some more interpretations here; you will need to reflect over them before you get all the meaning. There are two things you should note. What we are saying is that xk is a random process, and by this assumption we are actually able to find the conditional probability density function of xk. So what value does xk take? It is a random variable, and if you are given the distribution, xk is completely characterized. It is like saying that the temperature in this room today has a certain distribution.
If you take the temperature in this room as a stochastic process, the temperature has a distribution; what value it has taken today is a realization. If you characterize the entire distribution, you are done — you have said everything about that random process. But what happens is that we need a point estimate, one value that says what the best estimate is. If I give you the distribution, you will still ask: but what do you expect it to be today? That is a point estimate. So once you have the distribution characterized you have the entire information about x, and yet we need a point estimate. How do you construct one? There are different ways. I can say that I want the estimate which maximizes the conditional probability density function: which value of x maximizes the density function? That is the most probable value, and I am willing to accept the most probable value. That is called the MAP estimate, the maximum a posteriori estimate. The density is a lot of information which I cannot interpret directly; I want one number, or one vector, and that vector can be constructed from the density information in multiple ways — one of them is this maximization. You can show that the estimate that maximizes this conditional density function is nothing but the Kalman filter estimate: the Kalman filter estimate is the MAP estimate, the value of x that maximizes this density function. So you can derive the Kalman filter from this viewpoint: you start with Gaussian density functions, you develop this density function, you maximize it with respect to the estimate, and you will get the same formula.
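The claim that the MAP estimate coincides with the Kalman filter update can be illustrated in the scalar case: maximize the conditional density — equivalently, minimize the sum of the two weighted squared errors — by brute force, and compare with the update formula. All numbers here are hypothetical:

```python
import numpy as np

# Scalar illustration: the x that maximizes the posterior density
# N(x; x_pred, p_pred) * N(y; c*x, r) coincides with the Kalman update
# x_pred + K*(y - c*x_pred).
x_pred, p_pred = 2.0, 4.0      # predicted mean and variance
c, r = 1.5, 0.5                # measurement model y = c*x + v, v ~ N(0, r)
y = 4.2                        # observed measurement

def neg_log_post(x):
    # negative log posterior up to a constant: two weighted squared errors
    return (y - c * x) ** 2 / r + (x - x_pred) ** 2 / p_pred

grid = np.linspace(-10, 10, 2_000_001)
x_map = grid[np.argmin(neg_log_post(grid))]     # brute-force maximizer

K = p_pred * c / (c * p_pred * c + r)           # scalar Kalman gain
x_kf = x_pred + K * (y - c * x_pred)            # Kalman filter update

assert abs(x_map - x_kf) < 1e-4                 # they agree to grid resolution
```

The grid search is of course only for illustration; the point is that the maximizer of the Gaussian posterior and the Kalman update formula are the same number.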
You will get the same update rule; everything will be the same in the end. A completely different viewpoint gives you the same Kalman filter — that is actually the beauty of the Kalman filter: multiple viewpoints arrive at the same formula. There is one more viewpoint. One viewpoint is to maximize this density function; the other is to maximize the likelihood function. Given a density function there are different ways of constructing point estimates, and one of them is through the likelihood function. The log-likelihood function says that you are maximizing this, or equivalently minimizing a sum of squared errors. Which squared errors? You are constructing the estimate which minimizes the weighted norm of the true measurement minus the predicted measurement — the weighted norm of yk minus C xk, with this R inverse coming in as the weight. Why do R inverse and Pk inverse come in? Because these are Gaussian densities, so when I take the log-likelihood these quadratic forms appear. What is a Gaussian density? I can write it as one over some constant times the exponential of minus (x - x̄) transpose P inverse (x - x̄) (up to the usual factor of one half). This quantity, (x - x̄) transpose P inverse (x - x̄), I can call the weighted norm squared of x minus x̄. You can see it is a norm-squared function — not a norm, a norm squared — so it is like a distance squared, the distance from the mean value. That is what appears here. So, going back to our slides: when you take p(xk given Yk), all these are
Gaussian densities, and when you take the log of that you will get this norm with R inverse, because for the individual densities you can use all the properties of Gaussian densities — uncorrelatedness and independence all come into the picture. I am skipping the in-between steps; I will give you material on that. But the estimate can be viewed as the one that gives you the maximum likelihood estimate, or equivalently the one that minimizes this objective function. These derivations are also equally important. So what is the advantage of the Kalman filter? It generates maximum likelihood estimates and maximum a posteriori estimates when the noises are Gaussian. And if you do not want to attach the Gaussian-distribution interpretation, even then it gives you the minimum variance estimate: we derived everything through minimization of variance, and I never used Gaussianity there. So forget about the Gaussian distribution — we may not know what distribution it is, but it is still a minimum variance estimate. The minimum variance derivation did not require Gaussianity anywhere; that is very important. It is a minimum variance estimate that only requires two moments, the first moment and the second moment. That is why it is a very nice algorithm, and it is very easy to adapt to irregularly sampled systems, which are very common today and in which Kalman filtering is used. Take the example of a mobile phone, in which the data — the signal — is received in packets. Suppose you have a Kalman filter and you are estimating the state, where the state is the speech you want to reconstruct and deliver to the person who is listening. If data is missing and I have a model, I can do prediction: I can fill in the missing syllable or
missing data using x̂(k|k-1) or x̂(k-1|k-1), and whenever data comes I can use it and correct the estimate. So you can use all these ideas for signal reconstruction when data is missing — packets do get dropped when you are moving with a mobile phone — and yet, if you have a sound algorithm which can reconstruct the missing states in between using a Kalman filter, together with an uncertainty model for the measurement and for the state noise, you can manage. I am giving a simplistic viewpoint of the whole thing; it is probably much more complex than what I am saying, but it is possible to do this, or at least something equivalent. In process control I have done this and it works, and believe me, it is much easier than the pole placement approach: finding the poles that will give you the best rejection of disturbances is very difficult. Another difficult part is convergence. We talked about the optimal estimate; we said the maximum likelihood viewpoint gives you the best estimate, and so on. But as control engineers we first bother about stability, then performance. I talked about performance first — I said this is the best estimate — so you should ask me: what about stability? Will it converge? It might be the best estimate, but if it does not converge it does not make sense. Now, convergence is a tricky task here because this is a time-varying system: Lk is a matrix which changes as a function of time, so you cannot use simple eigenvalue analysis; we have to use Lyapunov stability analysis. I am going to take a simplified viewpoint: I am going to look at the Kalman filter as a deterministic system and ignore wk and vk. There are three sources of uncertainty: the initial state, the state noise, and the measurement noise.
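The missing-packet idea above amounts to running only the prediction step when no measurement arrives, and doing the full predict-plus-update whenever one does. A minimal numpy sketch — the model matrices here are hypothetical, chosen purely for illustration:

```python
import numpy as np

# Kalman filter step that tolerates irregularly sampled / missing data:
# when no measurement arrives (y is None), only the prediction runs.
Phi = np.array([[1.0, 0.1], [0.0, 1.0]])   # state transition (illustrative)
C = np.array([[1.0, 0.0]])                 # measurement matrix
Q = 0.01 * np.eye(2)                       # state-noise covariance (tuning)
R = np.array([[0.25]])                     # measurement-noise covariance

def kf_step(x, P, y=None):
    # Prediction: x_{k|k-1}, P_{k|k-1}
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q
    if y is None:                          # packet missing: keep the prediction
        return x_pred, P_pred
    # Update: Kalman gain, corrected mean and covariance
    S = C @ P_pred @ C.T + R
    K = P_pred @ C.T @ np.linalg.inv(S)
    x_upd = x_pred + K @ (y - C @ x_pred)
    P_upd = (np.eye(2) - K @ C) @ P_pred
    return x_upd, P_upd

x, P = np.zeros(2), np.eye(2)
# two consecutive packets dropped (None), then a measurement arrives again
for y in [np.array([1.0]), None, None, np.array([1.3])]:
    x, P = kf_step(x, P, y)
```

Notice that during the gap the covariance P keeps growing through the prediction step, so the filter honestly reports increased uncertainty until the next measurement corrects it.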
The state noise and measurement noise are inputs, and if those inputs are bounded we do not have to worry too much: a bounded input will give a bounded output. I am first worried about convergence with respect to error in the initial guess. So I am going to consider a restricted problem: I am going to say that it is a deterministic system — there is no error in the measurement, there is no state noise, and the only possible error is in the initial estimate of the state; that is, x̂(0|0) and the true x0 are different. Will the error between the true state and the estimate go to zero? (No — showing that the expectation of the error is zero does not mean it converges; those are slightly different notions, and we can continue this in class.) So let us look at the Kalman filter implemented as a deterministic system, in this simplified form: I have knocked off the measurement noise and the uncertainty in the states. The Kalman gain computations still use the recursive equations — these are called Riccati equations — so I have P(k|k-1) = Φ P(k-1|k-1) Φᵀ + Q, the Kalman gain is still computed using the same formula, and the update still uses the same formula; these equations stay the same. Q and R are some positive definite matrices which quantify uncertainty, but right now with no interpretation from the viewpoint of noise: let us look at them as tuning matrices. So my question is: if I have this Kalman filter, will the dynamics of the error be stable? This is my error dynamics — can you derive it? This is the true plant and this is the estimator; what is the relationship for the prediction error and what is the relationship for the update error? Just check whether you get these two — we just subtract
from this, that is, subtract the estimator equation from the plant dynamics, and you should get this equation: the Γ uk terms will cancel and you will get Φ into the bracket. Let me check that everyone is with me. I have the equation x(k+1) = Φ x(k) + Γ u(k), and separately x̂(k+1|k) = Φ x̂(k|k) + Γ u(k). If I subtract, I get x(k+1) - x̂(k+1|k) = Φ (x(k) - x̂(k|k)). The left-hand side is my ε(k+1|k) and the bracket is my ε(k|k); that is what I have done. The second equation follows from the update equation and is not difficult to derive. These two coupled equations define the error dynamics. I will eliminate ε(k|k) by combining the two equations, and I am going to work with the prediction error dynamics. If I show that the prediction error dynamics is stable, then the update error is also stable, because the two are related through a matrix — not a constant, but still only a linear transformation — so if one goes to zero the other also goes to zero; that is not a problem. So I have to show that this dynamics is stable, and I cannot apply the normal eigenvalue analysis here, because this matrix is time varying. I have to use something else, and that means a lot of linear algebra. First of all I am defining two matrices: Π(k|k-1), which is the inverse of P(k|k-1), and Π(k|k), which is the inverse of P(k|k). Second, I am skipping a huge amount of algebra in between and just giving you one result. This particular result is very famous in linear algebra: it is called the matrix inversion lemma, and it is used very widely in derivations of the Kalman filter.
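The subtraction above can be verified numerically: with a noise-free plant and the same known input driving both the plant and the estimator, the Γu terms cancel and the prediction error evolves as ε(k+1|k) = Φ ε(k|k). A small sketch with made-up matrices and an arbitrary fixed observer gain:

```python
import numpy as np

# Deterministic error dynamics check: eps_{k+1|k} = Phi @ eps_{k|k}.
Phi = np.array([[0.9, 0.2], [0.0, 0.8]])   # illustrative plant matrix
Gamma = np.array([[0.1], [0.05]])          # input matrix
C = np.array([[1.0, 0.0]])                 # measurement matrix
L = np.array([[0.5], [0.2]])               # some fixed gain (hypothetical)

x_true = np.array([1.0, -1.0])             # true initial state
x_hat = np.zeros(2)                        # wrong initial guess -> initial error

for k in range(5):
    u = np.array([np.sin(0.3 * k)])        # arbitrary known input
    y = C @ x_true                         # noise-free measurement
    x_hat_upd = x_hat + L @ (y - C @ x_hat)    # update step
    eps_upd = x_true - x_hat_upd               # eps_{k|k}
    # advance plant and predictor with the SAME input
    x_true = Phi @ x_true + Gamma @ u
    x_hat = Phi @ x_hat_upd + Gamma @ u
    eps_pred = x_true - x_hat                  # eps_{k+1|k}
    assert np.allclose(eps_pred, Phi @ eps_upd)   # Gamma*u cancels exactly
```

The relation holds for any gain L; optimality of the Kalman gain is a separate question from this purely algebraic cancellation.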
The matrix inversion lemma talks about the inverse of (A + BCD); the standard statement is (A + BCD)⁻¹ = A⁻¹ - A⁻¹B(C⁻¹ + DA⁻¹B)⁻¹DA⁻¹, and it can be proved very easily — just do the multiplication and you can show that the left-hand side times (A + BCD) gives the identity. It is not difficult; you just have to be patient and multiply all the matrices properly, and you will get I equal to I. But the result is very useful, and I am going to use it to prove a fairly complex inequality. What is this term here? It is the inverse of P(k+1|k). I am showing that this term is smaller than this quantity here plus another quantity, where each piece is a positive definite matrix — positive definite because P(k) is a positive definite matrix, so the inverse of P(k) is also positive definite. This inequality can be proved with a lot of algebra; you have to trust me on it — I can put up the derivations and you can have a look, but the end result I want to reach is more important. So there is a complex inequality which we have proved using the matrix inversion lemma. How is it going to help me? I am going to use it to define a Lyapunov function, and I am going to use Lyapunov stability analysis to prove convergence. Let us go back to Lyapunov analysis: a Lyapunov function is a positive definite function, and its difference along the trajectory should be negative definite. So I want to show that V(k+1) - V(k) is negative definite, so that as time progresses the Lyapunov function reduces. If I can construct a Lyapunov function for which I can show this, then I establish stability of the filter. So I am going to define a Lyapunov function V(k) = εᵀ P⁻¹ ε.
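The matrix inversion lemma itself is easy to check numerically. A sketch with arbitrary test matrices, chosen so that everything involved is invertible:

```python
import numpy as np

# Numerical check of the matrix inversion lemma:
#   (A + B C D)^{-1} = A^{-1} - A^{-1} B (C^{-1} + D A^{-1} B)^{-1} D A^{-1}
rng = np.random.default_rng(1)
n, m = 4, 2
A = np.diag([5.0, 6.0, 7.0, 8.0])          # well-conditioned A
B = 0.3 * rng.standard_normal((n, m))      # small perturbation factors
C = np.array([[1.5, 0.0], [0.0, 2.0]])     # invertible m x m block
D = 0.3 * rng.standard_normal((m, n))

lhs = np.linalg.inv(A + B @ C @ D)
Ai = np.linalg.inv(A)
rhs = Ai - Ai @ B @ np.linalg.inv(np.linalg.inv(C) + D @ Ai @ B) @ D @ Ai

assert np.allclose(lhs, rhs)               # left and right sides agree
```

The practical point of the lemma in filtering is that it converts the inverse of an n-by-n matrix into the inverse of a (usually much smaller) m-by-m matrix, which is exactly the structure of the covariance update.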
Here P is the covariance associated with ε, and we have said that Π equals P⁻¹, so this is nothing but εᵀ Π ε — I have just suppressed all the (k|k-1) subscripts. Notice that this is the same term which appears in the exponent of the Gaussian distribution; I am using that same term to define the Lyapunov function. So, moving on: I have defined the Lyapunov function V(k) as the error at (k|k-1), transposed, times Π — that is, P inverse — times the error at (k|k-1) again. What you can show is that V(k+1) - V(k) is always negative, because this difference can be shown to equal this term here on the right-hand side, involving this complicated matrix, and I could prove this because of the inequality that follows from the matrix inversion lemma. I am not giving you the proof; I am giving you a sketch of the proof. I can make Lyapunov stability arguments: I can construct a Lyapunov function through which I can show that the Kalman filter gives me a stable filter, if I view it as a deterministic filter — taking only the initial estimation error into account and forgetting about the measurement noise and the state noise. I have constructed a Lyapunov function whose difference V(k+1) - V(k) is always negative. Why? Because this Ω is a positive definite matrix. Why is Ω positive definite? You have to go back and argue it: P is a positive definite matrix, so P⁻¹ is a positive definite matrix, and this positive definite matrix into a positive definite matrix will be a
positive definite matrix. The middle factor involves the inverse of a sum: a sum of a positive definite matrix and a positive definite matrix is positive definite, and the inverse of that is positive definite, and then you post-multiply again by a positive definite matrix, so the entire thing is positive definite. So I can show that V(k+1) - V(k) is nothing but minus this quadratic form, which is always negative, because the quadratic form with a positive definite matrix is positive for any ε, and minus it is always negative. So this V(k) always reduces — this Lyapunov function always reduces — and this is a Lyapunov stable system. I have proved the stability of the Kalman filter, at least in a restricted sense: I have ignored the state noise and the measurement noise, but I have proved that at least the error due to the initial condition will go to zero — at least that is the sketch of the proof. (To a question:) Yes — here P and Q are some tuning matrices, and there is a formula for computing the Kalman gain, this formula here; so P and Q are some positive definite matrices, some tuning matrices, and for now do not associate the covariance viewpoint with them. Well, I am trying to compress things that I have understood over, say, seven or eight years into a few lectures, so just take this as sensitizing you: if you pursue this line you will understand it much more deeply, maybe much more than what I have understood. So this is a Lyapunov function, and if you make some more assumptions you can show that it is not just a stable observer but an asymptotically stable observer, and you can even show that it is an exponentially stable observer. I have outlined that here: if you can say that the covariance is bounded between some lower limit and some upper limit, then you can show that this V(k) is also bounded between a lower limit and an upper limit.
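The claimed decrease of the Lyapunov function can be observed numerically: iterate the Riccati recursion together with the deterministic error dynamics ε(k+1|k) = Φ(I - L C)ε(k|k-1), and watch V(k) = εᵀ P(k|k-1)⁻¹ ε fall. The matrices below are illustrative tuning choices, not from any particular plant:

```python
import numpy as np

# Deterministic Kalman filter: Riccati recursion + error propagation,
# with Lyapunov function V_k = eps^T P_{k|k-1}^{-1} eps.
Phi = np.array([[0.95, 0.1], [0.0, 0.9]])  # invertible state transition
C = np.array([[1.0, 0.0]])
Q = 0.05 * np.eye(2)                       # positive definite tuning matrices
R = np.array([[0.5]])

P = np.eye(2)                    # P_{1|0}: initial prediction covariance
eps = np.array([2.0, -1.0])      # initial prediction error (wrong guess)
V_prev = eps @ np.linalg.inv(P) @ eps

for k in range(30):
    S = C @ P @ C.T + R
    L = P @ C.T @ np.linalg.inv(S)                    # Kalman gain
    eps = Phi @ (eps - L @ (C @ eps))                 # update then predict
    P = Phi @ ((np.eye(2) - L @ C) @ P) @ Phi.T + Q   # Riccati recursion
    V = eps @ np.linalg.inv(P) @ eps
    assert V <= V_prev + 1e-9     # Lyapunov function never increases
    V_prev = V
```

This is only a numerical illustration of the inequality sketched in the lecture, not a proof; the algebraic proof goes through the matrix inversion lemma inequality and needs Φ invertible.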
And then you can prove asymptotic convergence of the error: the error dynamics is not only stable, you can show that it is asymptotically stable. This particular quantity you can show is always positive, and the error will asymptotically go to zero. So I have put up here the conditions for asymptotic stability: if you make one additional assumption — that the covariances are bounded, that they do not blow up — then you can prove even asymptotic stability. (To a question:) No, what I showed earlier was just stability; it only says the system is stable in the sense of Lyapunov. Strictly negative definite? No — what I showed is less than or equal to zero, so it can actually admit the positive semi-definite case: what if this Ω is rank deficient and ε lies in its null space? If it is rank deficient — positive semi-definite rather than positive definite — I am just raising the mathematical possibility. So for strict asymptotic stability you first have to show that V(k) is bounded; it is not sufficient to show what I showed earlier, but if you can establish this boundedness, then you can show asymptotic convergence. So let me stop here — it is already quite complex. What I want to do next is to connect these Kalman filters with the time series models that you have developed; that is my last task in this series of lectures. Once I have done that, I have closed the loop: we started with data-driven models and we came to Kalman filters, and I will now show that Kalman filters actually have a connection with the data-driven models — how they merge, and where the merging point is. We will come to that in the next lecture, and after I am done with that I am going to start control. So the
next part of the assignment — the computing assignment — is that you implement the Kalman filter.