So far we were concerned with a model that is linear but stochastic; the observations are also linear functions of the state; and the model noise, observation noise, and initial condition are all normally distributed. This is the classic LQG setting, and in this case we had a complete solution for the filtering problem. In this sense Kalman solved one of the fundamental data assimilation problems: assimilating data into an imperfect model, where the imperfections are captured by stochastic model noise. Now we are going to talk about extensions of Kalman's ideas to assimilating data when the data may be a nonlinear function of the state and the model itself may evolve according to a nonlinear map. So we are concerned with the extension to nonlinear stochastic models. The stochasticity comes from our assumption that the model noise is again white and that the observations are again corrupted by Gaussian observation noise. We again fall back on the Gaussian assumption for both the model noise and the observation noise; the initial condition is also random, and we assume it to be Gaussian as in the previous case. The primary difference is that because the model is nonlinear, the forecast loses the Gaussianity property right at the first step.
So we have to contend with non-Gaussian processes arising out of nonlinear systems, and that presents many challenges in the data assimilation process. We are going to show how to approach the filtering problem in this case. We will not be talking only about the first and second moments, the mean and covariance; we will be talking about the probabilistic characterization of the forecast and of the analysis. We will try to give an evolution equation for the forecast probability density and the analysis probability density. These are in general infinite dimensional problems, because we are trying to describe the evolution of a density function over the model state space, with all the associated mathematical challenges. That is what we are going to see first. So let us consider a nonlinear stochastic model, x_{k+1} = M(x_k) + sigma(x_k) w_{k+1}, where we also generalize the forcing: w_{k+1} is the model noise vector, which we assume to be r-dimensional; sigma(x) is the coefficient that multiplies the model noise, a state-dependent matrix of functions of size n by r, which we assume to be of full rank and which captures the model errors. If I take r equal to n and sigma(x_k) equal to the identity, then the model error is simply a sequence of state-independent Gaussian random vectors; in general, however, this is a state-dependent noise.
So we are assuming the model is driven by what in general could be a state-dependent noise process. Here r is variable; r can in principle be less than n. r refers to the degrees of freedom that the noise has, in terms of its ability to affect the evolution of the state. When r = 1 there is only one scalar noise that affects all the components of the state vector; when r = n there are n different noise components that can affect all the components of the state vector, depending on the structure of the state-dependent matrix sigma(x). So you can see from this setup that by appropriately choosing the matrix sigma and the value of r, one can realize quite a variety of assumptions relating to the nature and type of model noise in the system. The noise w_k has mean 0 and covariance Q_k, a matrix of size r by r. The special case is r = n with sigma(x_k) = I_n, the identity matrix: then there is no state-dependent noise, the noise is independent of the state, and it becomes pure Gaussian white noise. The initial condition is random again, a multivariate normal distribution with mean m_0 and covariance P_0. Given x_k, w_{k+1} is random, so I can compute the distribution of the vector sigma(x_k) w_{k+1}. Please realize this is an n-vector: it has mean 0, and its covariance is given by sigma(x_k) Q_{k+1} sigma(x_k)^T. So we have talked about the choice of the model error or model noise, and we have also talked about the choice of initial conditions.
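As a concrete illustration of this setup, here is a minimal simulation sketch (my own, not from the lecture) of the model x_{k+1} = M(x_k) + sigma(x_k) w_{k+1} with r < n; the particular map M and noise coefficient sigma below are illustrative assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, r = 2, 1              # state dimension n, noise dimension r (here r < n)
Q = np.array([[0.05]])   # r-by-r noise covariance Q

def M(x):
    # a hypothetical nonlinear model map
    return np.array([x[0] + 0.1 * x[1], 0.9 * x[1] - 0.1 * np.sin(x[0])])

def sigma(x):
    # a hypothetical state-dependent n-by-r noise coefficient (bounded)
    return np.array([[0.0], [0.5 + 0.2 * np.tanh(x[0])]])

def step(x):
    w = rng.multivariate_normal(np.zeros(r), Q)   # w_{k+1} ~ N(0, Q)
    return M(x) + sigma(x) @ w                    # x_{k+1}

x = np.array([1.0, 0.0])   # initial condition (could itself be sampled)
traj = [x]
for _ in range(50):
    x = step(x)
    traj.append(x)
traj = np.array(traj)
print(traj.shape)  # (51, 2)
```

Note that with r = 1 a single scalar noise drives both state components, exactly the situation described above.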
Now we are going to talk about the properties of the evolution of the state of the system, namely the forecast, when there is no observation; this is what is called the analysis of stochastic dynamics. We are interested in the conditional probability of x_{k+1} given the past. So, what is the probability that x_{k+1} will find itself in a set A? Here A is a subset of R^n; please realize that in some places we use A for a subset and in some places for a matrix, and the occasion will tell you what the symbol means. In this case A is a set. So what are we trying to ask? Given the past trajectory of the model from x_0 up to x_k, we want to find the probability that x_{k+1} belongs to a set A. Here x_k is the present: time k is the present, and time k+1 is the next step. So, given the state x_k, what is the probability that x_{k+1} lies in the set A at time k+1? Now, I am also given the state of the system at time k-1, at time 1, at time 0. Even though I am given the complete history from 0 to k, suppose this probability depends only on x_k and not on the past. Then, given the present, the future evolution of the state is independent of the past; the states of the system from 0 to k-1 play no role, once x_k is given, in determining what x_{k+1} is going to be. That kind of property is called the Markov property.
So what does the Markov property say? Given the present, the past is inconsequential in considering the future. The probability that at time k+1 the state will belong to the set A depends only on the current state x_k and not on the past. The model equation given in the previous slide in fact represents a Markov process, as is evident from the relation: if I am given x_k, I do not have to know anything else; M(x_k) can be computed, sigma(x_k) can be computed, and w_{k+1} can be generated. It is w_{k+1} that brings the randomness into deciding what x_{k+1} is going to be. So given x_k, the value of x_{k+1} does not depend on anything before x_k; it depends on x_k and the noise that comes into the system after time k. That is why the model is said to be a discrete time Markov model, and the process generated by this model is called a discrete time Markov process. The notion of being Markov is very fundamental; it is a stochastic generalization of the deterministic principle. What is the principle of determinism? If I have a differential equation and I know the state of the system at time k, that is in principle enough to compute the state of the system at time k+1, because the differential equation tells you the rate at which the system evolves starting from time k. Therefore the Markov property can in many ways be thought of as a natural extension of this fundamental property of deterministic dynamical systems. In this case we are concerned only with discrete time evolution; continuous time evolution in Markov process theory also exists, but that theory is a little more technical. In order to reduce the amount of mathematical technicality that one needs to know, we confine our attention to the analysis of nonlinear difference equations driven by a (possibly state-dependent) noise vector, which together describe a discrete time Markov process.
Therefore we are interested in what is called the one-step transition probability: given x_k, what is the distribution of x_{k+1}? This is the one-step conditional transition probability. If I am given x_k, then x_{k+1} is M(x_k) plus the noise term; given x_k, sigma(x_k) is known and the expected value of w_{k+1} is 0, therefore the mean is essentially the deterministic part M(x_k), and the covariance of the state is sigma(x_k) Q_{k+1} sigma(x_k)^T. So given x_k, x_{k+1} is Gaussian, and it has a density function whose expression is a little complex, but for explicit analysis I am giving the distribution in this particular form; that essentially tells you how the system evolves. Once I know the initial distribution and how the system goes from time k to time k+1, I should in principle be able to pull the system forward in time. So we are now going to talk about how, knowing the one-step transition probability, we can compute multi-step transition probabilities. Let us consider transitions from time 0 to time 1 to time 2. Assume I am in state x_0 to start with; I would like to find the probability that x_2 will be at the position shown in the figure. In order to go from x_0 to x_2 I have to go through an intermediary stage, namely the value of the state at time 1. From x_0 I can go to any point in the state space; for simplicity I am showing the state space as a vertical line, as if it were one dimensional, but the same argument applies in multiple dimensions.
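In the notation of this section, the one-step transition density just described can be written out explicitly as

```latex
p(x_{k+1} \mid x_k)
  = \mathcal{N}\!\big(M(x_k),\; \Sigma_k\big)
  = \frac{\exp\!\big[-\tfrac{1}{2}\,(x_{k+1}-M(x_k))^{\mathsf T}\,
          \Sigma_k^{-1}\,(x_{k+1}-M(x_k))\big]}
         {(2\pi)^{n/2}\,\lvert \Sigma_k \rvert^{1/2}},
\qquad
\Sigma_k = \sigma(x_k)\,Q_{k+1}\,\sigma(x_k)^{\mathsf T}.
```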
So x_1 refers to the state at time 1. With x_0 and x_2 fixed, to go from x_0 to x_2 I have to go through some intermediary x_1. With this, the probability of going from x_0 at time 0 to x_2 at time 2 is given by the conditional probability of x_2 given x_0. I would now like to argue as follows: I start from x_0 and I want to reach x_2, and x_1 can take any intermediary value. So I go one step from x_0 to x_1, and from the chosen x_1 I go to x_2. Since this x_1 can be any point in the state space, I multiply the conditional probabilities, p(x_2 | x_1) times p(x_1 | x_0), and integrate with respect to x_1. This sum total of the product of the conditional probabilities gives the two-step conditional probability, and more generally a multi-step conditional probability. This relation, expressing a multi-step transition in terms of one-step transitions, has come to be called the Chapman-Kolmogorov equation, and it is a very basic equation. I can now extend this to the general case. Instead of 0 to 2, assume I am in state x_q at time q, where q is some instant in time, and I want to reach x_k at time k. To go from q to k I have to go through some intermediary stage; think of that intermediary stage as time p, with state x_p. So you go from x_q to x_p and from x_p to x_k. Since x_p can take any value, with q and k fixed I integrate over x_p, and once I integrate I get the transition probability from step q to step k. So this is a composition of two multi-step transition
probabilities. I can choose p in many ways: I can go from q to q+1 and then from q+1 to k, which is the case p = q+1, or I can split it as q to k-1 and then k-1 to k, which is the case p = k-1. So I can reduce a multi-step transition to a sequence of one-step transitions. I would like to rewrite this equation recursively in this way: p(x_k | x_q) = integral of p(x_k | x_{k-1}) times p(x_{k-1} | x_q) dx_{k-1}. So the transition from q to k is related to the one-step transition from k-1 to k and the multi-step transition from q to k-1; this is a recursive relation, and using it I can compose multi-step transition probabilities involving any number of steps. This general equation is called the Chapman-Kolmogorov equation for probability density functions. So, a Markov process is uniquely defined by the initial distribution and the one-step transition mechanism; if the one-step transition mechanism is specified, I can build multi-step transition probabilities using the Chapman-Kolmogorov equation. That is the conclusion so far. Now, what is the statement of the nonlinear problem? I am given an initial condition p_0(x_0). Please understand that p_k(x_k) is the probability density of the state x_k at time k. In general this probability density will depend on k; the subscript on p refers to the time-varying density of the state x_k as the state evolves according to the dynamical system.
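A small numerical illustration (my own, not from the lecture) of the Chapman-Kolmogorov equation in one dimension: for a linear model x_{k+1} = a x_k + w with w ~ N(0, q), the two-step transition density is known in closed form, N(a^2 x_0, a^2 q + q), so we can check the grid composition of two one-step kernels against it. The values of a, q, and x_0 are assumptions for the example.

```python
import numpy as np

a, q = 0.8, 0.3                      # illustrative model coefficient and noise variance
x = np.linspace(-10, 10, 2001)       # grid for the state space
dx = x[1] - x[0]

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

x0 = 1.5
p1 = gauss(x, a * x0, q)             # one-step density p(x_1 | x_0)

# Chapman-Kolmogorov: integrate over the intermediary state x_1
p2 = np.array([np.sum(gauss(x2, a * x, q) * p1) * dx for x2 in x])

p2_exact = gauss(x, a * a * x0, a * a * q + q)   # closed-form two-step density
print(np.max(np.abs(p2 - p2_exact)))             # small discretization error
```

The same composition applies unchanged when the map is nonlinear; the closed form is then simply no longer available, which is the whole point of the discussion that follows.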
So what is the question? Given p_0(x_0), the initial state distribution, with x_0 normal with mean m_0 hat and covariance P_0 hat (please remember we are defining the initial analysis to be m_0 hat and the initial analysis covariance to be P_0 hat), this initialization represents the initial distribution. Given the initial distribution of the state, I would like to find the distribution of the state at time k; this is what is called the probability distribution at time k. I want to remind the reader that there are several probability density functions involved here: one is the initial probability density, another is the transition probability from k-1 to k, and another is p_k(x_k). Of these, we know the transition density from the model, and we are given the initial condition by external specification, which is equivalent to specifying the initial conditions for the dynamical system.
So given these two, our job is to compute the state probability density function at time k; this is called the state probability distribution of x_k, and I would like to be able to compute this quantity. We are now going to look at the means by which we can arrive at this evolution of the state probability density function in time. Let me state it once more: given the model, the forcing, and the initial condition, the model defines the one-step transition and the randomness of the initial condition is given. So you can see there are two sources of randomness, one coming from the choice of initial condition and the other from the one-step state transition; these two together determine the state probability density function p_k(x_k). Our ultimate goal is to find how the states of the model are distributed at any given time: what is p_k(x_k) for any k? Given this, I would like to start from what is called the joint density. If I have random variables, there always exists, in an appropriately chosen probability space, a joint density, and from it I can consider the marginal densities; I am assuming we are all familiar with the notions of marginal densities, conditional densities, joint densities, all basic fundamental concepts. So consider the joint density of the states from x_0 to x_k. Using conditional probability I can express this joint density as the product of another joint density and a conditional density: p(x_k | x_{k-1}, ..., x_0) times p(x_0, ..., x_{k-1}). But I have already assumed the process is Markov, so knowing x_{k-1}, I do not have to know how I got to x_{k-1}; the past is of no consequence in deciding the future if the present is known, and x_{k-1} is the present. So this conditional probability reduces to the one-step transition probability of the Markov process we are considering, and the other factor is the joint density of the states from 0 to k-1. You can now see that the joint density from 0 to k can be broken down into the product of the joint density from 0 to k-1 and the one-step transition from k-1 to k. This is a recurrence relation, and I can apply it again to the joint-density term on the right hand side. Applying it repeatedly, you can readily see that the joint density is expressible as the product of the conditional densities times the initial density (there is a typo on the slide: the factors should read p(x_i | x_{i-1})). So the joint density is the product of the one-step state transition densities, which are given by the model, and the initial density; I am expressing the joint density as a product of everything that I know. I am now going to look for a little more characterization of this joint density. We know p_0(x_0) is normal, and the one-step state transition probability, which is a conditional probability, is also normal; we have already argued the normality of the one-step transition probability for the model equations in the previous step. Therefore I can express the joint density as the product of k normal densities, referring to the k transitions from 0 to k, and the initial normal density. If I substitute the expressions for each of these normal densities and simplify, I get a constant times the exponential of minus one half of G_k times the initial density, where G_k is a sum of terms of the form x_i minus M of x_i
minus one, transpose, times the inverse of the covariance matrix of the one-step transition, times x_i minus M of x_{i-1}. That looks like a quadratic form, but it is nonlinear, much more than quadratic, because M is in general a nonlinear function; it becomes an actual quadratic form only when the model is linear. What do we mean by the model being linear? M(x_{i-1}) = M x_{i-1} for a matrix M; in that linear case it is a quadratic form, and if not, it is more complex than a quadratic function, a genuinely nonlinear function. Much of the difficulty in computing the joint density arises from this complex nonlinearity that enters into the description of the joint density. The constant c_k is given on the slide. So we have now computed the joint density, at least mathematically, in the form determined by the exponent G_k, whose expression is a complex nonlinear function. If you recall, we are in the middle of a discussion of nonlinear filtering. Because of the nonlinearity we cannot simply be content with the first and second moments; the complete solution is given by the entire probability density function for the forecast and for the analysis. Once you know the probability density function, you can compute any number of moments: first moment, second moment, and so on. This is largely because, even though the initial condition may be Gaussian, and the one-step transition probabilities of the nonlinear system defining the Markov process are also Gaussian, the state distribution at any given time is a highly nonlinear object and far from being normal. We are trying to get a handle on this important quantity, namely p_k(x_k). You may recall from the previous page what p_k(x_k) is: the probability density function of the state at time k. There are two k's, one for the state x_k, which is itself changing, and the subscript k on p, which tells you that the probability density function is also changing in time; not only the state but also its density evolves in time. It is this quantity which is of interest to us, and to get at it we are expressing the joint densities; to start with, let me go over some of the things we have already done. I am trying to compute the joint density of the state from time 0 to time k; using the recurrence we just saw, it can be expressed as a product of conditional densities and the initial density. The initial density is normal and each conditional density is normal, so the joint density is given by the constant c_k times exponential of minus one half times G_k, with the initial density normal with mean m_0 hat and covariance P_0 hat. The crux of the expression relates to analyzing what is contained in G_k, the complicated exponent given above, and most of the difficulty arises from the fact that M is not a linear function. In case M(x_{i-1}) = M x_{i-1}, the exponent becomes a simple quadratic function; because M is in general not a linear map, this exponent is in general more nonlinear than quadratic.
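Collecting the pieces above, one way to write the joint density consistent with this description is

```latex
p(x_0, x_1, \ldots, x_k)
  = c_k \, \exp\!\Big[-\tfrac{1}{2}\, G_k\Big]\; p_0(x_0),
\qquad
G_k = \sum_{i=1}^{k} \big(x_i - M(x_{i-1})\big)^{\mathsf T}
      \big[\sigma(x_{i-1})\, Q_i\, \sigma(x_{i-1})^{\mathsf T}\big]^{-1}
      \big(x_i - M(x_{i-1})\big),
```

with p_0(x_0) the Gaussian initial density with mean and covariance m_0 hat and P_0 hat. Each term of G_k looks quadratic, but because M appears inside it, G_k is a genuinely nonlinear function of the trajectory unless M is linear.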
So in general the exponent is not a quadratic function, and this is largely the major difficulty in trying to quantify the state distribution, the distribution of x_k at time k, which is p_k(x_k). But at least theoretically one can compute the joint density of the state from time 0 to time k. Once you have the joint density from time 0 to time k, I am still interested in finding not the joint density but the state density at time k, p_k(x_k). You know that p_k(x_k) is the marginal of the joint density: I integrate the joint density over all the variables other than x_k, that is, over x_0, x_1 through x_{k-1}. So there are k iterated integrals, and each of these iterated integrals is an integral over R^n, because R^n is the state space; when I say an integral over x_i, that is equivalent to an integral over R^n. So this is repeated integration in the n-dimensional space. Please recall we are not trying to do the actual integration; we are trying to develop the theory, and the theory can go anywhere, but ultimately we are interested in computing the probability density function of the state x_k at time k. We can now obtain a recursive form for p_k(x_k). Why can p_k(x_k) be expressed in a recursive form? p_k(x_k) is the probability density of the state x_k at time k. If I know p_{k-1}(x_{k-1}), the probability density of the state at time k-1, then from x_{k-1} I can go to x_k by the one-step transition rule: p_k(x_k) is the integral of p(x_k | x_{k-1}) times p_{k-1}(x_{k-1}), where the integration is over x_{k-1}. This essentially follows from basic probabilistic arguments. In particular, when k is 1, p_1(x_1), the probability density of the state at time 1, is given by this integral with the initial state distribution, that is, the initial condition, combined with the one-step transition probability. The initial density is Gaussian and the conditional density is Gaussian, but the conditional density is
a function of the model map, and the model map is highly nonlinear; therefore p_1(x_1) can in principle be expressed by this integral, but it is far from being Gaussian. That is the primary difference between the linear and nonlinear filters: much of the difficulty associated with nonlinear filtering comes, at least in part, from this inability to preserve normality under nonlinear transformations. Now let us give a little more life to this. p_0(x_0) is normal and p(x_1 | x_0) is normal, but if I substitute the normal expressions, the integral takes the form shown, and I now want to talk about the exponent. The exponent is alpha(x_1, x_0), given by this function, and it describes the product of the conditional density and the initial density. You can readily see that M, the model map, enters this product and makes it a nonlinear function, so I cannot simply read it off as Gaussian. This alpha(x_1, x_0) also has another term that comes from the initial condition. The initial-condition contribution is quadratic, but the contribution from the one-step transition is not, and it is the combination of these two terms that makes p_1(x_1) far from Gaussian. That is the real rub when it comes to nonlinear filtering, as you move from linear to nonlinear maps, from linear to nonlinear models. So by this we have seen that p_1(x_1) is not normal, and therefore p_k(x_k) is not normal; non-normality continues to dominate the show. Because the state distributions are not normal, it is not enough to compute the mean and the variance; I need to be able to compute the entire distribution.
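The loss of Gaussianity can be seen numerically. Below is a one-dimensional sketch (my illustration, with an assumed nonlinear map M(x) = x + 0.5 sin(2x) and assumed variances) of the recursion p_k(x) = integral of p(x | x') p_{k-1}(x') dx' on a grid: a Gaussian p_0 pushed through the nonlinear kernel develops a nonzero third central moment, i.e., skewness, which a Gaussian cannot have.

```python
import numpy as np

x = np.linspace(-6, 6, 1201)
dx = x[1] - x[0]
q = 0.05                                   # model noise variance (assumed)

def M(xv):
    return xv + 0.5 * np.sin(2.0 * xv)     # illustrative nonlinear map

def kernel(x_new, x_old):
    # one-step transition density p(x_new | x_old) = N(M(x_old), q)
    return np.exp(-(x_new - M(x_old)) ** 2 / (2 * q)) / np.sqrt(2 * np.pi * q)

p = np.exp(-(x - 1.0) ** 2 / (2 * 0.2)) / np.sqrt(2 * np.pi * 0.2)  # Gaussian p_0

for _ in range(3):                         # propagate three steps forward
    p = np.array([np.sum(kernel(xn, x) * p) * dx for xn in x])

mean = np.sum(x * p) * dx
var = np.sum((x - mean) ** 2 * p) * dx
skew = np.sum((x - mean) ** 3 * p) * dx / var ** 1.5
print(round(np.sum(p) * dx, 4), round(skew, 3))   # mass stays ~1; skewness != 0
```

The same grid recursion in n dimensions would require k iterated integrals over R^n, which is exactly the computational difficulty the lecture points to.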
So the nonlinear filter seeks to update more than the mean and the covariance, as the linear Kalman filter did. Let us go back: what did we do in the Kalman filter? We updated the forecast mean and the forecast covariance, and we updated the analysis mean and the analysis covariance; because everything is normal, by knowing the mean and the covariance I know the entire distribution. That is not the case here, and that is largely the difficulty. In principle one can compute p_k(x_k); the only way to compute these things is numerically, and of course there are very many good numerical integration packages one can utilize. But what we are seeking is a sequential algorithm. What is a sequential algorithm? As in the Kalman filter, I go from time k to time k+1: knowing the analysis and its covariance at time k, I compute the forecast and its covariance at time k+1; an observation comes in, and I recompute the analysis and its covariance at the next time instant. That sequential structure is what we are looking for.
So we would like to go from time k-1 to time k: we would like to update p_{k-1}(x_{k-1}) to p_k(x_k). If I can do that, that is what the sequential algorithm is all about. Please recall that each of these densities is a function; I am assuming the density functions are continuous. These are continuous functions defined over the n-dimensional space, their integral must be 1, and they have to be non-negative. So I am talking about non-negative functions whose integral is 1, defined over an n-dimensional space, and when n is large, say 10,000, 100,000, a million, you can see the associated difficulties in trying to maintain the non-negativity of such a function in numerical computation. These are some of the challenges one will find when trying to convert these ideas into numerical algorithms. So what have we accomplished? We have simply analyzed the model forecast with no observations: starting from the initial distribution and knowing the one-step transition probability of the underlying Markov process defined by the model, I am now, at least theoretically, able to express p_k(x_k). We started from p_0(x_0), we had access to p(x_k | x_{k-1}), and by combining these we now have an expression for p_k(x_k). No data is involved; it is purely model analysis. This idea of describing the evolution of p_k(x_k) is the stochastic dynamics part.
So the complete information one can hope to give in the case of stochastic dynamics is the evolution of the probability density function. Now let us bring in the filter; filter means observations. When I develop expressions for the nonlinear filter, I have to deal with two kinds of densities: the predicted density and the filtered density. What is the predicted density? It is the density of the state at time k+1 given all the observations up to time k, so it embeds the model as well as the observations; we are going to call this the predicted density. The filter density is going to be f_k(x_k): given all the observations from time 1 to time k, I would like the best estimate of the state at time k. Please recall the Kolmogorov-Wiener definitions: given all the information up to time k, estimating the state of the system at time k is called the filtering problem; given all the information from time 1 to k, trying to know the state of the system at time k+1 is the prediction problem. Here, instead of simply predicting the mean and the covariance, we have to predict the entire distribution itself. So this is the predicted density and this is the filter density. We have some idea of the state density evolution; now I would like to talk about the structure of the filter density.
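In the notation used here, the two densities just introduced can be summarized as

```latex
\text{filtered density:}\quad f_k(x_k) = p\big(x_k \mid z_{1:k}\big),
\qquad
\text{predicted density:}\quad
p\big(x_{k+1} \mid z_{1:k}\big)
  = \int_{\mathbb{R}^n} p\big(x_{k+1} \mid x_k\big)\, f_k(x_k)\, dx_k ,
```

where the second identity uses the Markov property of the model, anticipating the recursive form developed below.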
So f_k(x_k), the filter density, also changes in time, much like the state density did; both x_k and the function f_k evolve. By definition this is equal to the probability density of x_k given z_{1:k}, the observations from time 1 to time k. This can be written as the integral of the conditional density of the states x_0 through x_k given z_{1:k}, where the integration is over x_{k-1}, x_{k-2}, down to x_0, a repeated integration; that comes essentially from the basic definition of density functions. I want you to understand these are all mathematical possibilities: we first want to know that it is mathematically feasible to describe what we want, leaving the computational problem for after the feasibility study is complete. So this conditional density of the state at time k given the observations from 1 to k is essentially the marginal of the joint conditional density, integrated over x_0 to x_{k-1}; I think that should be clear from a basic probability argument. The predicted density is likewise: we saw the filtered density in equation 1, and we now give the predicted density in equation 2. Look at it: the joint predicted density conditioned on z_{1:k}, integrated over the states from 0 to k. These two densities make sense in principle, but they are not in recursive form. Our goal, to be able to say we have a nonlinear filter, is to rewrite these two equations 1 and 2 in recursive form, so that we arrive at a simple recursive structure. And who is going to provide the key to the recursive form? The Markov property, the underlying Markov property of the stochastic model; please understand that is the key. What does the Markov model essentially tell you? If I know the state at time k and if I know what comes after
time k knowing what comes after time k I should be able to precisely probabilistically predict what the state will be we would know the exact value but we would be able to tell the distribution of the state of the system at the next time the maca property depends critically on the one step state transition probability and we are going to exploit that property to be able to write equation 1 and equation 2 in a simple beautiful recursive form once that is accomplished at least in principle we would have solved the non-linear filtering problem to tell how to make the prediction of the density of the predicted state then given the predicted density and the distribution of the observation I am going to get the filter density the filter density represents the analysis the predicted density represents the forecast step so you can see essentially all the ingredients of the carbon filter are alive and well instead of computing the vectors and matrices which are the first and second moment we are going to have to update the entire functions over their dimensional space that is the key to understanding non-linear filter equations. 
So with this as the background, I am now going to manipulate the expressions for f_k, which is what I want. Let us start with the basic statement: given the observations from 1 to k, I have a probability density over the states x_0 through x_k, that is, p(x_0, ..., x_k | z_{1:k}). You can see the following: x_k is involved, and I am also interested in the trajectory of the system starting from x_0; I am conditioning everything on all the observations up to and including time k. Here z_{1:k} represents all the information obtained from time 1 to time k. Of course, inherent in this is the model information too, since going from x_0 to x_1 is governed by the model; so this conditional density is a mix of both the model and the observations. Now, using Bayes' rule, this conditional density can be written as

p(x_0, ..., x_k | z_{1:k}) = p(z_{1:k} | x_0, ..., x_k) p(x_0, ..., x_k) / p(z_{1:k}),   (3)

which follows from a simple application of Bayes' rule. Having applied Bayes' rule in (3), I am now going to express p(x_0, ..., x_k), the probability of the trajectory starting from x_0 and ending at x_k. We have already seen, using the Markov property, that the joint probability of the trajectory factors as

p(x_0, ..., x_k) = p_0(x_0) p(x_1 | x_0) p(x_2 | x_1) ... p(x_k | x_{k-1}).   (4)

What does this tell you? The probability of observing a particular trajectory of the system starting from x_0 is simply the product of the one-step transition probabilities that define the path, times the initial distribution. This is a stochastic analog of simply recursing the state equations.
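The Markov factorization (4) is easy to verify numerically. Here is a minimal sketch with an invented two-state chain; the initial distribution p0 and the transition matrix P are made-up numbers, not anything from the lecture:

```python
import numpy as np
from itertools import product

# Hypothetical two-state Markov chain: p0 is the initial distribution p_0(x_0),
# P[i, j] = p(x_{k+1} = j | x_k = i) is the one-step transition probability.
p0 = np.array([0.6, 0.4])
P  = np.array([[0.9, 0.1],
               [0.3, 0.7]])

def path_probability(path):
    """Equation (4): p(x_0, ..., x_k) = p0(x_0) * prod_i p(x_i | x_{i-1})."""
    prob = p0[path[0]]
    for prev, cur in zip(path, path[1:]):
        prob *= P[prev, cur]
    return prob

# Because every factor is a (conditional) probability distribution,
# the probabilities of all length-3 trajectories sum to exactly 1.
total = sum(path_probability(p) for p in product([0, 1], repeat=3))
print(path_probability((0, 0, 1)))   # 0.6 * 0.9 * 0.1
print(total)
```

The normalization check is the point: the factorized path probabilities form a genuine distribution over trajectories.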
Let me give a quick analog in a linear system: if x_{k+1} = M x_k and x_0 is given, then x_k = M^k x_0, so I can relate the state at time k to the state at time 0 through the k-th power of the transition matrix. That is what happens in the linear system; in the nonlinear system this cannot be done explicitly, but you can think of (4) as the analog of what happens in the linear case. For those who would like such a link, that is how to look at it: given the initial condition and the state transition map, together they define the trajectory. So (4), obtained using the Markov property, is the probability of observing the entire trajectory from x_0 through x_k, and it is the second term in the numerator on the right-hand side of (3).

Now I consider the first term in the numerator of (3), p(z_{1:k} | x_0, ..., x_k). What does it mean? I have all the observations from z_1 to z_k, and I am conditioning on the trajectory. We have already computed the probability of observing the trajectory, so given that a particular trajectory is realized, I can compute the probability of the observations conditioned on that trajectory; that is the whole idea, a very simple idea. We apply the manipulation of conditional probability again and again, so this can be written as

p(z_{1:k} | x_0, ..., x_k) = p(z_1 | z_{2:k}, x_0, ..., x_k) p(z_{2:k} | x_0, ..., x_k).

Now look at this. z_1 is the observation at time 1; x_0 and x_1 are the states of the system at times 0 and 1; x_1 depends on x_0, and z_1 depends on x_1. But z_1 does not depend on x_2. What is x_2? The state of the system at time 2. What is z_1? The observation at time 1, and the observation at time 1 does not depend on future states; I hope that is very clear. In view of this non-dependence of z_1 on states beyond x_1 (z_1 does not depend on x_2 through x_k), the conditioning on those states has no value and I can drop it. So I can rewrite the first factor as p(z_1 | x_1), with the rest of the term following. You can now see I am bringing a recursive structure into the system, and this recursive structure is again a consequence of the Markov property. Let me say it once more: z_1 is an observation that comes at time 1, and x_2 is a future state; the system does not have anticipatory power. In other words, today's observation of temperature does not take into account tomorrow's temperature; today's observation is measured today, from today's state and perhaps some of the past. Because of such simple arguments, even though I conditioned on z_{2:k} and on x_0 through x_k, today's observation depends neither on tomorrow's observations nor on tomorrow's states; therefore z_1 can at best depend on x_1, and the first term simplifies accordingly. That again comes from the Markov property, as we observed. Now compare: the left-hand side and the second factor on the right have exactly the same form, except that the left-hand side runs over observations 1 to k and the right-hand side over 2 to k. So what does that mean? I have exposed a recursive structure. This recursive structure can now be applied to the second factor on the right-hand side, and applying it repeatedly, opening it up (that is called iterating), we get

p(z_{1:k} | x_0, ..., x_k) = product over i = 1 to k of p(z_i | x_i).   (5)

So now I have (5) and (4), and I can substitute them into (3). Look at this: (4) expands the second term in the numerator on the right-hand side of the Bayes rule, and (5) expands the first term. I substitute (4) and (5) into (3) and simplify; the filter density is simply the marginal of this conditional density, integrated with respect to x_0 through x_{k-1}. So the entire expression for the filter density in full form is

f_k(x_k) = [1 / p(z_{1:k})] integral over x_0, ..., x_{k-1} of p_0(x_0) product over i = 1 to k of p(x_i | x_{i-1}) p(z_i | x_i).   (6)

Look at this now: I am multiplying by 1 over the probability of observing the first k observations, which comes from the denominator of the Bayes rule in (3); the numerator of the Bayes rule has been broken down, using the recursive property, into several factors. The structure is absolutely beautiful: the factors p(x_i | x_{i-1}) are the one-step model transition probabilities given by the Markov process, the factors p(z_i | x_i) are the conditional densities of the observations, p_0(x_0) is the initial density, and the integration is with respect to the k time variables x_0 through x_{k-1}.
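To see that this expression is computable in principle, here is a brute-force evaluation of (6) for a small discrete-state model. Everything here is invented for the sketch: a two-state chain, a binary observation, and a made-up likelihood table. The code literally sums p0(x0) * prod_i p(x_i|x_{i-1}) p(z_i|x_i) over all paths and then normalizes by p(z_{1:k}):

```python
import numpy as np
from itertools import product

# Hypothetical discrete-state model (all numbers invented for illustration):
p0 = np.array([0.5, 0.5])                 # initial density p_0(x_0)
P  = np.array([[0.8, 0.2],                # P[i, j] = p(x_{k+1}=j | x_k=i)
               [0.4, 0.6]])
L  = np.array([[0.9, 0.1],                # L[i, z] = p(z | x=i), binary observation
               [0.2, 0.8]])

def filter_density(z):
    """Equation (6): f_k(x_k) = (1/p(z_{1:k})) * sum over paths x_0..x_{k-1}
    of p0(x0) * prod_i p(x_i|x_{i-1}) p(z_i|x_i)."""
    k = len(z)
    unnorm = np.zeros(2)
    for path in product([0, 1], repeat=k + 1):   # path = (x_0, ..., x_k)
        w = p0[path[0]]
        for i in range(1, k + 1):
            w *= P[path[i - 1], path[i]] * L[path[i], z[i - 1]]
        unnorm[path[-1]] += w
    return unnorm / unnorm.sum()   # dividing by p(z_{1:k})

f = filter_density([0, 0, 1])
print(f, f.sum())
```

The path sum grows exponentially in k, which is exactly the lecture's point: the expression is conceptually simple but computationally hopeless without the recursive form.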
So this expression is complex, but it is easy to understand. What is the filter density? The density of the state at time k given all the observations, and it equals 1 over the probability of observing all the observations, times the k-fold integral along the path from x_0 to x_{k-1}, where the integrand is the product of the initial density, the one-step model transition probabilities, and the conditional densities of the observations given the states. Nothing could be more beautiful than this: we know the conditional densities of the observations, we know the state transition distribution, and we also know the initial distribution; the only thing we need to do is multiply them all together, and that gives an expression for the filter density. So it is not that we cannot characterize the filter density; it is simply that computing it is difficult, while conceptually it is possible. That is one part of the solution of the nonlinear filtering problem. Now I would like to come to the forecast density. Please understand: the filter density is the analysis, so we have done the analysis part; now I would like to do the forecast part. In the Kalman filter equations the forecast depends on the previous analysis and the analysis depends on the previous forecast, and it is this interdependency of forecast and analysis that makes the sequential method so beautiful and extremely effective.
So now let us try to compute the forecast density. What is the forecast density? Given the observations z_{1:k}, I would like to say something not only about what happened up to time k but also beyond k. What happened up to k is the filtering problem, which is done; now I want what happens beyond, and that is why this one is called the predicted density. Again, talking like a broken record: this conditional probability can be broken down by applying the rule of conditional probability, again to gain the recursive form. So

p(x_0, ..., x_{k+1} | z_{1:k}) = p(x_{k+1} | x_0, ..., x_k, z_{1:k}) p(x_0, ..., x_k | z_{1:k}).   (7)

I hope that transition is clear: it is a very simple probabilistic rule, expressing the conditional density as a product of two other related conditional densities; a very simple mechanism. Now let us come to the first factor of this product: the conditional density of x_{k+1} given the entire trajectory x_0 through x_k and the observations z_{1:k}. But the process {x_k} is Markov.
Since the process is Markov, x_{k+1} depends only on x_k and on nothing else; nothing else matters, so the conditioning in the first factor reduces to x_k alone, and the first factor becomes p(x_{k+1} | x_k). That comes from the discussion we have already had. The second factor is p(x_0, ..., x_k | z_{1:k}); integrating it over x_0 through x_{k-1} gives exactly the right-hand side of (6), the filter density. Let me spend a minute on that: the joint density of the path given the observations, marginalized, is identified as the filter density by (6). Therefore the forecast density can be written as

p_{k+1}(x_{k+1}) = integral over x_0, ..., x_k of p(x_{k+1} | x_k) p(x_0, ..., x_k | z_{1:k}),   (8)

a beautiful expression for the forecast density; and by decomposing the joint conditional density as in the previous argument (we have already done that decomposition), this becomes a (k+1)-fold multiple integral with the same integrand as in (6), multiplied by the extra factor p(x_{k+1} | x_k). It is a mouthful; the slide would probably take five minutes to write and I am trying to spend less than a minute on it, but nothing here is new: it is simply manipulation of conditional probabilities, that is all it is. Except for the size of the expressions, the ideas are extremely simple. Substituting the quantity from the previous slide, I can now regroup the integrand: it is the one-step transition probability times a quantity that is essentially f_k(x_k), so

p_{k+1}(x_{k+1}) = integral of p(x_{k+1} | x_k) f_k(x_k) dx_k.   (9)

Look at this; it is beautiful. What have we said before? The analysis is x_hat_k = x^f_k + K_k (z_k - h(x^f_k)); that is the data assimilation step. And the forecast is x^f_{k+1} = M x_hat_k; that is the forecast equation. We saw the embodiment of the analysis equation previously with respect to the filter density; equation (9) is the analog of the forecast equation. In the linear case the forecast at time k+1 is the model times the analysis at time k. Now look here: what is f_k? It is the analysis at time k. Why is it called the analysis? It is not a vector that gives the analysis; it is the analysis density, the filter density. There the model operates on the analysis; here the model operates on the analysis density. What is the model here? The one-step state transition probability. So if I multiply the analysis density by the one-step state transition probability and integrate over x_k, I get the forecast density: the analog of "forecast at time k+1 equals model times analysis at time k". So we have now considered both equations.

Let us think about the nonlinear filter a little more; some simplification is needed. Consider the conditional probability p(x_0, ..., x_k | z_{1:k}); by Bayes' rule it can again be written as

p(x_0, ..., x_k | z_{1:k}) = p(z_{1:k} | x_0, ..., x_k) p(x_0, ..., x_k) / p(z_{1:k}),   (10)

where, as before, the second factor in the numerator is the product p_0(x_0) p(x_1 | x_0) ... p(x_k | x_{k-1}). For the first factor, we already know the probability of the observations given the trajectory can be written as a product of conditional densities, and since z_k depends only on the state at time k,

p(z_{1:k} | x_0, ..., x_k) = p(z_k | x_k) p(z_{1:k-1} | x_0, ..., x_{k-1}),   (11)

by the same Bayes-type step (and I am correcting the indices here: the second factor runs over z_1 to z_{k-1} and x_0 to x_{k-1}). You can readily see how the probability of observing the whole set of observations given the trajectory takes the recursive form (11). Now substitute (11) into (10): you essentially get what you want, the filter density in the form given in Chapter 29 of our book, equation (29.2.1). With this we have seen the recurrence relation: f depends on p and p depends on f. Going back to equation (9), the predicted density depends on the filter density, and the filter density can in turn be rewritten in terms of the predicted density; these two equations together give the expression for the nonlinear filter in general terms. I am now going to simplify further.
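On a discrete state space, the forecast step (9) reduces to a vector-matrix product, which makes it easy to sketch. The transition matrix and the analysis density below are invented for the example:

```python
import numpy as np

# P[i, j] = p(x_{k+1} = j | x_k = i): hypothetical one-step transition
# probabilities for a two-state model (numbers invented for the sketch).
P = np.array([[0.7, 0.3],
              [0.1, 0.9]])

def predict(f_k):
    """Equation (9): p_{k+1}(x_{k+1}) = sum over x_k of p(x_{k+1}|x_k) f_k(x_k).

    On a discrete state space the integral becomes a sum, i.e. the analysis
    density multiplied by the transition matrix."""
    return f_k @ P

f_k = np.array([0.5, 0.5])    # a made-up filter (analysis) density at time k
p_next = predict(f_k)
print(p_next)
```

This is the density-space analog of "forecast = model times analysis": the transition kernel plays the role of the model operator.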
Now back to the filter density f_k(x_k); let me go back, and you can see there is a lot to be done. I am going to substitute the decompositions (10) and (11) to recover the filter density. From fundamental principles the filter density is given by the integral of the joint conditional density, which can be expanded by Bayes' rule; in the previous slide, in equations (10) and (11), we broke this down, which gives rise to a factor p(z_k | x_k) that comes out of the integral, with the remaining terms staying inside. If I carry out that integration, I get an expression of the form p(z_k | x_k) times p_k(x_k): the integral in its entirety gives rise to the predicted density p_k(x_k). Now look at this: it is the analog of the analysis step in the Kalman filter, where the analysis at time k is obtained from the forecast at time k. What is the analog here? The filter density depends on the forecast density and on the observations, and that gives the recursive form; this is the analog of the analysis step. Yes, it is easier said than done, but I hope you are able to keep track of all the major issues here: this is the relation that connects the forecast with the filter. So, substituting (11) into (10), this relation is again given by (29.2.1) in our book, and that is the important relation relating the predicted density to the filter density.
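The analysis step can be sketched the same way: multiply the predicted density pointwise by the likelihood of the new observation and renormalize. The likelihood table and the forecast density here are invented for the example:

```python
import numpy as np

# L[i, z] = p(z | x = i): a hypothetical likelihood table for a binary observation.
L = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def update(p_k, z):
    """Data assimilation step: f_k(x_k) is proportional to p(z_k|x_k) p_k(x_k);
    the normalizing constant is exactly sum over x of p(z|x) p_k(x),
    i.e. the probability of the new observation."""
    unnorm = L[:, z] * p_k
    return unnorm / unnorm.sum()

p_k = np.array([0.6, 0.4])    # a made-up forecast (predicted) density
f_k = update(p_k, 0)
print(f_k)
```

Alternating `predict` and `update` on a grid is exactly the recursion the lecture derives; the point-mass approach works for toy problems but scales exponentially with the state dimension.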
So in summary, what have we accomplished? Yes, some of you might think I have gone a little fast, but this is an advanced course; we will not be able to hand-carry you through every little step. We have shown all the major steps, and going from one step to the next is largely part of the exercises, which I hope you will pursue; this should give you a good big picture, modulo some of the algebra. Equation (12) relates the forecast to the filter, and equation (9) relates the filter to the forecast: (9) is the model forecast step, (12) is the data assimilation step. Let us spend one or two more minutes on this. The filter density at time k is built from the predicted density at time k: if x_k is known I can condition on it, giving the conditional density p(z_k | x_k) of the observation, and the remaining ratio enters as a multiplying factor, which is essentially there to ensure that the result is a proper density:

f_k(x_k) = [p(z_k | x_k) / p(z_k | z_{1:k-1})] p_k(x_k).   (12)
What is the condition for a density? The integral must be 1. So you can think of that factor as simply a multiplying constant: the ratio of the probabilities of observing the observations. Yes, the expressions are a little complex, but the basic idea, going from forecast to analysis and from analysis to forecast, now in function space, must be clear. It is this iteration in function space, an infinite-dimensional space, that makes the scheme impractical except in very simple cases. What are the simple cases? Linear Gaussian quadratic (LQG) is one case where this can be implemented, because it reduces to updating the means and the covariances. The literature has identified a handful of other cases, particular combinations of nonlinear systems and associated noise, where these integrals can be expressed explicitly in closed form. Other than these elementary cases, these equations are in general not easy to compute, and hence the difficulty of nonlinear filtering. I want to re-emphasize: it is not that we do not know how to do nonlinear filtering; this was done way back in the mid 60s. Our derivation follows the development in Bucy's work. The original paper was by Kalman, and the second paper by Kalman and Bucy: Kalman originally derived the filter in discrete time, while at the same time Bucy was deriving the filter in continuous time. When Bucy submitted his paper, Kalman's first paper was already under review and had been accepted, so the reviewers asked the two of them to get together and publish a joint paper. So the first paper is Kalman's, the second is Kalman and Bucy's, and Bucy worked on nonlinear filtering ever since the late 50s and early 60s; the derivation we have given here is adapted from Bucy's papers and a monograph he wrote.

So this is essentially meant to convey the following: the filtering problem, that is, the sequential data assimilation problem in a nonlinear system, is solved theoretically but not computationally. That is the story so far. We concentrated on deriving the filter equation and the predictor equation on function space; these are infinite-dimensional in nature and computationally extremely demanding. The next question: even if I spend a lot of effort to obtain the entire distribution, from a forecaster's perspective, what kind of forecast product do I develop from these probability distributions? If you think about it for a moment, more often than not we are used to interpreting the mean, and we have a reasonably good interpretation of the variance, but I do not know what it would mean for public consumption if I say the third moment is this and the fourth moment is that. The third moment relates to the skewness of the distribution; the fourth is called kurtosis. Skewness essentially tells you that the mean and the mode may be different, that the whole distribution can be tilted one way or the other. If the kurtosis is large, what does that tell you? The tails are thick. The kurtosis of the standard normal distribution is 3, and saying the kurtosis is larger statistically implies that the probability mass at very large values of the state variable is larger; and if the probability mass at very large values of the state variable is large, high-impact events can occur with larger probability. That is what larger kurtosis means; it is about the tails of the probability distribution. So if you are trying to develop a forecast product, we can generally only process the first and second moments; for the third and fourth I am not sure how we would use them in our interpretation of events that could occur. If you say the kurtosis is more than 3, I am not sure it is easy to interpret beyond the likelihood of rare events: large kurtosis means rare events can occur much more frequently than with smaller kurtosis; in principle, the potential for extremely rare events to happen with a higher frequency. That is all it means.

So, looking both at our ability and at the usage of statistical quantities to interpret random phenomena in nature, we generally settle on predicting the first moment, which is the mean, and the second centered moment, which is the variance. From that perspective, while we have in principle derived expressions for the update of the filter and predictor densities on the n-dimensional state space, more often than not we are interested simply in moment dynamics. What is moment dynamics? How the mean updates itself, or evolves, and how the variance evolves. Please go back: Kalman filtering is essentially the dynamics of the first two moments, the mean and the covariance, and that fits everything we generally know how to do and how to interpret in statistics. Given this, we are now going to look at how to derive the moment dynamics from the dynamics of the probability density functions; that is what we are after now. So consider a nonlinear model x_{k+1} = m(x_k) + w_{k+1} and a nonlinear observation z_k = h(x_k) + v_k. The forecast step: the conditional expectation is the best (minimum variance) estimate, as I hope is clear from our discussion of statistics, so we are going to compute the conditional expectation of the forecast,

x^f_{k+1} = E[x_{k+1} | z_{1:k}].

What does conditioning on z_{1:k} mean? I have been given all the observations from z_1 to z_k, and I would like to predict x_{k+1}. We have already seen in the derivation of the Kalman filter that the best estimate of the forecast is the conditional expectation of the state given all the observations; this expectation is taken with respect to the predictor density. But from the model, x_{k+1} = m(x_k) + w_{k+1}, and conditioned on the observations 1 to k the expected value of w_{k+1} is 0; therefore the forecast reduces to the conditional expectation of the nonlinear map of x_k, which I am now going to call m_hat(x_k):

x^f_{k+1} = E[m(x_k) | z_{1:k}] = m_hat(x_k).

Please understand: evaluating this conditional expectation is not easy, but such a conditional expectation exists. What is m_hat(x_k)? It is the average of the state passed through the nonlinear map, given all the observations; the expectation is with respect to the filter density of x_k. You can readily see that m_hat(x_k) is not equal to m(x_hat_k) unless m is linear; only in the linear case are they equal. So in general the conditional expectation we got in the previous step is not equal to m applied to the analysis. So what is the idea? I am going to seek approximations to the conditional moment. What is the basic idea? Suppose I have an analysis x_hat_k; I am going to approximate m_hat(x_k) around x_hat_k.
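The point that m_hat(x_k) differs from m(x_hat_k) unless m is linear is easy to check by Monte Carlo. In this sketch the map m(x) = x^2 and the standard normal density standing in for the conditional density of x_k are both invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def m(x):
    return x ** 2           # a deliberately nonlinear model map (invented)

# Samples standing in for the conditional density of x_k given the observations.
x = rng.standard_normal(200_000)

m_hat     = m(x).mean()     # m_hat(x_k) = E[m(x_k) | z_{1:k}]: close to 1 for N(0,1)
m_of_mean = m(x.mean())     # m(x_hat_k): the map applied to the mean, close to 0

print(m_hat, m_of_mean)     # the two disagree because m is nonlinear
```

For a linear m the two numbers would coincide exactly; the gap here (an instance of Jensen's inequality) is precisely why nonlinear filters must approximate the conditional moment rather than simply pushing the analysis through the model.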
So let us look at the idea. Suppose I have the initial condition: I know the analysis x_hat_0, and I would like to make a prediction x^f_1. That prediction is x^f_1 = E[m(x_0)] = m_hat(x_0), and in general this is not equal to m(x_hat_0). Therefore it is very difficult to compute m_hat exactly, because it is an integral relating to the conditional expectation; I can only approximate it. So what are we going to approximate? We are going to approximate m_hat in a small neighborhood around x_hat_k. What is x_hat_k? It is the approximate analysis known at time k, and I am going to approximate my forecast around it. So we seek an approximation of m(x_k) near x_hat_k. Define f_k = m(x_k) - m_hat(x_k), not to be confused with the filter density f_k(x_k). What is that? It is the error: m(x_k) is the actual value and m_hat(x_k) is the conditional expectation, so you can think of f_k as an anomaly (the hat denotes the conditional expectation). The conditional expectation of f_k given all the observations is 0, as you can readily see from the definition: taking the conditional expectation, E[m(x_k) | z_{1:k}] = m_hat(x_k), which is the definition, and applying this immediately implies that the anomaly has conditional expected value 0.
I am now going to define the forecast error, e^f_{k+1} = x_{k+1} - x^f_{k+1}. Substituting x_{k+1} = m(x_k) + w_{k+1} and the definition of the forecast, and combining the terms using the definition of f_k, we get

e^f_{k+1} = f_k + w_{k+1};

that is the forecast error. You can readily see how the approximation builds up: the forecast error has two terms, one due to f_k and another due to w_{k+1}, which is very similar to what we have in the linear case. Now consider the conditional expected value of e^f_{k+1}: substituting the two terms, the first is 0 by the definition of f_k, and the second is 0 by the definition of the noise; therefore e^f_{k+1} is unbiased, so x^f_{k+1} is an unbiased estimate, and when combined with the least squares property it becomes the minimum variance estimate; x^f_{k+1} is thus also a minimum variance estimate, based on the basic statistical machinery we have built. Next, compute the second-order properties of e^f_{k+1}; second-order properties relate to the covariance structure. The forecast covariance is

P^f_{k+1} = E[e^f_{k+1} (e^f_{k+1})^T | z_{1:k}] = E[(f_k + w_{k+1})(f_k + w_{k+1})^T | z_{1:k}].

If you multiply this out, f_k and w_{k+1} are uncorrelated, because f_k depends only on what happens up to time k while w_{k+1} is what happens after time k; so this reduces to

P^f_{k+1} = E[f_k f_k^T | z_{1:k}] + Q_{k+1},

where Q_{k+1} is the model noise covariance.
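The decomposition e^f_{k+1} = f_k + w_{k+1}, and the resulting additive covariance E[f_k f_k^T] + Q, can be checked by simulation. This is a scalar sketch; the map m, the spread of the analysis samples, and the noise variance Q are all invented for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
Q = 0.09                       # hypothetical model-noise variance

def m(x):
    return np.sin(x)           # a made-up nonlinear model map

x_k = 1.0 + 0.2 * rng.standard_normal(300_000)   # samples from a made-up analysis density
w   = np.sqrt(Q) * rng.standard_normal(300_000)  # white model noise, independent of x_k

x_next = m(x_k) + w                  # the stochastic model step
f_anom = m(x_k) - m(x_k).mean()      # f_k = m(x_k) - m_hat(x_k), the zero-mean anomaly

P_forecast = x_next.var()            # Var(e^f), since x^f is the mean of x_{k+1}
P_sum      = f_anom.var() + Q        # E[f_k^2] + Q: valid because f_k and w are uncorrelated

print(P_forecast, P_sum)             # the two agree up to Monte Carlo error
```

The agreement of the two numbers is the scalar version of the covariance identity above; the cross term vanishes in expectation because the anomaly depends only on the past while the noise is in the future.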
So P_{k+1}^f is the second moment. Let us look at what we have accomplished: on page 18 we have the forecast covariance. At this stage it is not an approximation; we have computed everything exactly, in principle. We will later see that f_k is not easy to handle, because f_k involves m̂, which has to be approximated; but at least in principle this is the forecast covariance. In other words, I am deriving the general expressions assuming everything is computable, without worrying about computational issues right now. The data assimilation step can now be set up as in the linear case; I am going to do the derivation from scratch, from the ground up. I posit an analysis of the form x̂_{k+1} = a + K_{k+1} z_{k+1}. You may remember from one of the earlier discussions of statistical estimation that this is the structure of a linear estimate: a is a vector, K is a matrix. So the analysis is expressed as a linear function of the observation, where a and K have to be determined to make the analysis unbiased and of minimum variance. With that in mind, I write the analysis error ê_{k+1} from the basic definition, which gives the equation for x̂_{k+1}. My job is to find a and K such that x̂_{k+1} is a BLUE. Taking the conditional expectation on both sides and forcing unbiasedness gives the value of a; please note that ĥ enters here, coming from the given expression for the update.
So this is the expression for the update: a is computed according to this relation. If I substitute this a into the expression marked with a star on page 21, I get a structure which, when substituted into the previous expression, yields the typical Kalman-type update. What is ĥ? It is given by a conditional expectation, and please understand it is as difficult to compute as m̂(x_k). Let us go back to page 18 to remind ourselves where it appeared: m̂(x_k) is the conditional expectation of m(x_k) given the observations z_{1:k}, and likewise ĥ(x_{k+1}) = E[h(x_{k+1}) | z_{1:k}]. These are the two difficult quantities to compute. Even though they are difficult to compute, we know mathematically that they exist, so we have derived the underlying expressions following the derivation of the linear Kalman filter, assuming all the complicated integrals can be evaluated. Now I am going to derive the moment dynamics. Let g_k be the difference h(x_{k+1}) − ĥ(x_{k+1}); from this definition, taking conditional expectations on both sides gives zero. The analysis error is x_{k+1} − x̂_{k+1}, and I already know the structure of x̂_{k+1} from the Kalman-type filter equation.
Using the definition of g_k, this can be rewritten: the analysis error is ê_{k+1} = e_{k+1}^f − K_{k+1}(g_k + v_{k+1}), the forecast error minus a correction, where g_k and v_{k+1} are random quantities. I want to compute the analysis covariance; the left-hand side must be P̂_{k+1}, the analysis error expression times its transpose conditioned on z_{1:k}. If I multiply these out, after a lot of algebra I get P̂_{k+1} = P_{k+1}^f − K A_k − A_k^T K^T + K D_k K^T, where A_k = E[g_k (e_{k+1}^f)^T | z_{1:k}], D_k = C_k + R_{k+1}, and C_k = E[g_k g_k^T | z_{1:k}]. There is a lot of notation here, but I have expressions for A_k, D_k, and C_k, and P_{k+1}^f is known; the only thing I do not know is K. So I am going back to the old homework: how do I make the trace of P̂_{k+1} minimal by an appropriate choice of K? The expression is quadratic in K, so this gives rise to a quadratic minimization problem: I need to find the K that minimizes the trace of P̂_{k+1} given on the previous page.
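The quadratic minimization just posed can be checked numerically: for P̂(K) = P^f − K A − A^T K^T + K D K^T with D symmetric positive definite, the trace is minimized at K = A^T D^{-1}, which is what the completing-the-square argument that follows yields. A small sketch with made-up matrices (the numbers are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# Illustrative stand-ins for the quantities in the derivation:
# P_f (forecast covariance), A = E[g e^T | z], D = C + R (symmetric positive definite)
Pf = np.eye(n)
A = rng.standard_normal((n, n))
M = rng.standard_normal((n, n))
D = M @ M.T + n * np.eye(n)           # symmetric positive definite by construction

def trace_P_hat(K):
    """Trace of P_hat(K) = P_f - K A - A^T K^T + K D K^T."""
    return np.trace(Pf - K @ A - A.T @ K.T + K @ D @ K.T)

K_star = A.T @ np.linalg.inv(D)       # candidate optimal gain

# Any perturbation of K_star increases the trace, since the residual
# quadratic term (K - K*) D (K - K*)^T has nonnegative trace.
for _ in range(5):
    K_pert = K_star + 0.1 * rng.standard_normal((n, n))
    assert trace_P_hat(K_star) <= trace_P_hat(K_pert) + 1e-12
```

The assertions pass for any symmetric positive definite D, which is exactly the completing-the-square statement in matrix form.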
By the method of completing the square, going back to the exercises done earlier in the context of Gauss to Kalman, I do exactly the same thing; the mathematics is entirely similar. So I rewrite the equation for P̂_{k+1} by completing the square. Look at the expression on the right-hand side: this is another method of minimizing. In the earlier case, what did we do? We minimized the (i,i)-th entry of P̂_{k+1} with respect to the i-th row of K. Here I am demonstrating another basic principle: we simply express the previous quadratic expression so that the K-dependence sits inside the product of two factors; this is called the method of completing the square. Now look at the structure: P̂_{k+1} is the sum of three terms. The first term is known and independent of K; the second term is known and independent of K; the third term depends on K. If a term does not depend on K, I cannot choose K to change it, so the only way to influence the result is to make the K-dependent term vanish. By picking K = A_k^T D_k^{-1}, the quadratic term vanishes, and the best, or optimal, covariance is P̂_{k+1} = P_{k+1}^f − A_k^T D_k^{-1} A_k, where A_k and D_k are matrices already known. I would like to emphasize that this derivation is a generalization of the linear filter derivation: if m(x) = Mx and h(x) = Hx, the derivation I have given over the past five or six slides reduces to the derivation of the Kalman filter moment dynamics. So, by camouflaging the difficulty of computing certain conditional expectations, we are able to derive the moment dynamics: the first-moment
dynamics and the second-moment dynamics, the mean and the covariance: forecast mean, forecast covariance, analysis mean, analysis covariance, pushing into the background the difficulties of the conditional-expectation computations. We know these quantities exist; by giving them a name, I do not have to worry about computability at this point and can simply carry on with the derivation. So we have completed the derivation of the dynamics of the first two moments for the general nonlinear filtering equations, paralleling the development in the linear Kalman case. How do we know it parallels the linear case? If you set m(x) = Mx and h(x) = Hx, our derivation essentially reduces to the Kalman filter equations; in that sense there is nesting, and in that sense it is a parallel derivation of the filter equations, especially the moment dynamics, for the nonlinear case. I hope this part is clear. In the earlier part we talked about updating the distributions; now we have talked about updating the first and second moments, assuming the filter density and the predicted density exist. They do; we may not know them exactly, but we can handle them mathematically, and that is what we have done. This derivation again parallels the development of linear minimum-variance estimation; it rests on the fundamental statistical fact that the conditional expectation is the best mean-square estimate. In that sense it unifies the derivation of the moments in both the nonlinear and the linear case. I hope the reader will appreciate the parallels and the role of conditional expectations. With this as background, I am now going to consider specific approximations, and that gives rise to
approximations to the moment dynamics. Until now, the moment dynamics I have considered are exact, in the sense that even though I do not know how to compute certain quantities, they exist; I ploughed through and got what I wanted. But once we realize certain quantities cannot actually be computed, we begin approximating, and we get the notion of approximate moment dynamics. There are several degrees of approximation: a first-order approximation rests on a first-order Taylor series expansion of the nonlinear quantities; a second-order approximation rests on a second-order Taylor series expansion of the nonlinear conditional expectations. First I am going to derive the second-order filter. What is the second-order filter? The filter equations are approximate, but accurate up to the second-order term of the Taylor series used in the approximation. You can see that wherever there is an approximation, Taylor always comes to the rescue. What is the advantage of using Taylor? I can cut off the approximation at any order of accuracy; in practice we generally handle first-order and second-order accuracy, because that is what is mostly used. To derive the nonlinear error terms, we are going to approximate them. What are the nonlinear error terms? Go back to slide 19: f_k = m(x_k) − m̂(x_k), the same expression given here too, and I am seeking second-order approximations of it. Even though I have written it here as capital F_k, this is what we earlier defined as f_k; g_k is the same kind of quantity with respect to h, but at time k+1. So I am now going to be concerned with the
forecast of m(x_k). What am I trying to do? I assume I know x̂_k, and I know that m̂(x_k) is not equal to m(x̂_k), so I am going to approximate m̂ in the neighbourhood of x̂_k. According to the second-order Taylor series, m(x_k) is m(x̂_k) plus the Jacobian D_m(x̂_k) times the error ê_k, plus a second-order term that depends on the Hessians: a vector of quadratic forms, one with respect to the Hessian of m_1, one with respect to the Hessian of m_2, and so on up to the Hessian of m_n. We have already seen in the module on multivariate calculus how to write second-order Taylor expansions for maps; this comes straight from one of those early slides. Now I take the conditional expectation of both sides of this equation given z_{1:k}, which gives m̂(x_k). Taking the conditional expectation, I get the first term and the expected value of the third term. The second term consists of the Jacobian at x̂_k times ê_k, and E[D_m(x̂_k) ê_k | z_{1:k}] = D_m(x̂_k) E[ê_k | z_{1:k}], which we have already shown to be zero. In view of that, even though there are three terms on the right-hand side, the conditional expectation leaves only two. Now I am going to give a little example to illustrate these calculations. Let y = (y_1, y_2) and let
the covariance of y be given by E[y y^T] = P, with entries σ_1², σ_12, σ_2², and let A be a symmetric matrix. So y is a random vector with this covariance matrix and A is a symmetric matrix. I now consider the quadratic form y^T A y, and I am trying to compute its expected value. Substituting and expanding, it is the sum of three expectations, because the expectation of a sum is the sum of the expectations: E[y_1²] = σ_1² and E[y_1 y_2] = σ_12, so the result must be a σ_1² + 2b σ_12 + c σ_2². It can be verified that this is simply the trace of the matrix A P. Why? The trace and the expectation commute, so E[trace(A y y^T)] = trace(A E[y y^T]); the trace remains invariant under cyclic permutation, so trace(A y y^T) = trace(y^T A y); and y^T A y is a scalar, whose trace is itself. Therefore E[y^T A y] = trace(A P), and we have come full circle. That gives the details of the calculation of the expected value of this quadratic form. Why are we interested in this computation? Go back to the second-order term on the previous slide: it is a vector, each component of which is a quadratic form, and ∇²m_1, ∇²m_2, and so on are Hessian matrices, which are symmetric. So y plays the role of ê_k and A plays the role of the Hessian; therefore the expectation of this vector of quadratic forms follows. Let me talk about that once more.
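The worked identity above, E[y^T A y] = trace(A P) for zero-mean y with covariance P and symmetric A, can be verified numerically; the numbers below are illustrative only:

```python
import numpy as np

# Covariance of y = (y1, y2) and a symmetric matrix A (illustrative values)
sigma1_sq, sigma12, sigma2_sq = 2.0, 0.5, 3.0
P = np.array([[sigma1_sq, sigma12],
              [sigma12, sigma2_sq]])
a, b, c = 1.0, -0.7, 2.0
A = np.array([[a, b],
              [b, c]])

# Component-by-component expansion: E[y^T A y] = a*s1^2 + 2*b*s12 + c*s2^2
expanded = a * sigma1_sq + 2 * b * sigma12 + c * sigma2_sq
assert np.isclose(expanded, np.trace(A @ P))

# Monte Carlo confirmation with zero-mean Gaussian samples
rng = np.random.default_rng(1)
y = rng.multivariate_normal(np.zeros(2), P, size=200_000)
mc = np.mean(np.einsum('ni,ij,nj->n', y, A, y))
assert abs(mc - np.trace(A @ P)) < 0.1
```

The exact check holds for any symmetric A; the Monte Carlo estimate converges to trace(A P) as the sample size grows.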
So what does this expectation have to do with our problem? The second-order term is one half of the expected value of the vector (ê_k^T ∇²m_1 ê_k, ê_k^T ∇²m_2 ê_k, …, ê_k^T ∇²m_n ê_k). The expectation of a vector is the vector of the expectations of its components, and each component has the form E[ê_k^T ∇²m_i ê_k], where ∇²m_i is a Hessian matrix; this looks exactly like the term in the example, which is why that particular example is so meaningful. I hope the relations are clear: I gave the example to show how to manipulate the expected value of the second-order term in the Taylor series expansion, which is a vector each of whose components is a quadratic form. If I know how to compute the expectation of a quadratic form, I can compute the expectations of the individual elements of this vector, and hence the example gives you a handle on how to compute the conditional expectation of the second term in the first equation at the top of slide 26. I hope that is clear; yes, it is a mouthful, and it takes more than five minutes to type even though I am trying to spend less than half a minute, but you know the basic steps. So, using that example, I am interested in computing the quadratic form: a row vector times a matrix times a column vector. From the previous example, the expected value of the quadratic form equals the trace of A times the expected value of y y^T; that is the equation I am trying to use here.
So here ∇²m_i is the matrix A, and the covariance P̂_k is the expected value E[ê_k ê_k^T]. We have thus computed the expected value of the term on the right-hand side of the first equation. With that, I can derive a vector of second-order corrections, each of whose terms is induced by a term of this type. The middle term was already zero; therefore, under the second-order approximation, the forecast is: the exact value is m̂(x_k), and its approximation is m(x̂_k) plus the second-order correction. That is the real kicker: this is called the second-order correction to the forecast. Let me write that down; you learn a lot of probability manipulations when you do these kinds of computations, which further helps you visualize the power of the statistical arguments and the interaction between statistics and matrix-vector manipulation. Let me anticipate: suppose I do not consider the second-order approximation but only the first order; then the ∇² terms are zero, and you can readily see that the first-order forecast is essentially m(x̂_k). That is the approximation; the actual value of the forecast is x_{k+1}^f = m̂(x_k), and I am replacing the latter by the former. At second order, what do I do? I add the second-order correction term; therefore the second-order forecast must be more accurate than the first-order forecast. When do you decide what is the right order of approximation? It depends on the degree of nonlinearity. Look at what the second-order correction term depends on: ∇²m_i is the Hessian of the i-th component of the model map. If the model is only mildly nonlinear, the
second derivatives may not be too large, in which case you can essentially get away with the first-order approximation. If your model is such that the Hessian of each component is strong, the first-order approximation won't cut it for making forecasts, and the second-order approximation is more meaningful. This essentially tells you that by appropriately controlling the order of terms in the Taylor series expansion, I can improve the accuracy of the forecast; this is the second-order accurate forecast. I also want to remind you that the first-order forecast is a lot easier. Why? I already know x̂_k from the previous step, and I simply evaluate the model map at it. To compute the second-order forecast I have to do a lot more arithmetic, so the second-order forecast is definitely more accurate but computationally more expensive. What does that bring you? The accuracy-versus-time trade-off. For anybody involved in approximating any quantity, the quality of the approximation versus the cost of computing it will be the ultimate judge in deciding the order one feels comfortable with. So, using a derivation that parallels the linear Kalman filter, by approximating m̂(x_k) around x̂_k we have arrived at a second-order correct forecast. Now, if I make a second-order correct forecast, what is its covariance? That is the next step: I have to talk about the covariance, and to compute it I again go back to the error in this forecast. The error in the second-order forecast can be written as f_k ≈ D_m(x̂_k) ê_k + η_k, where D_m is the model Jacobian, which we already know, and η_k comes from the second-order term. We
already know the error is e_{k+1}^f = f_k + w_{k+1}, therefore the forecast covariance follows very easily: P_{k+1}^f = E[e_{k+1}^f (e_{k+1}^f)^T | z_{1:k}]; that is the formula. When I apply it, since f_k and w_{k+1} are not correlated, the cross terms vanish and it reduces to two terms. f_k is given by the expression we have just seen, so if I substitute all of this and bulldoze through the details, I get the expression for the forecast error covariance when the forecast is second-order accurate. Now let us take some time to examine all the conditional expectations in it. η_k is quadratic in ê_k, and this is the most important part of the whole step: it gives rise to what is called the moment closure problem. What is the moment closure problem? If I want to compute the second moment, it depends on higher-order moments; if I want to compute the first moment, it depends on higher-order moments. Look back: x_{k+1}^f is the conditional expectation for the forecast, and the conditional forecast depends on the second moment. So the first moment depends on the second moment, and the second moment depends on the third and fourth moments. What does that tell you? You cannot compute these moments in closed form, because each lower moment depends on a higher moment. This computational difficulty has been around for a long time in all nonlinear problems; in turbulence especially they always deal with this problem, which is called the moment closure problem.
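Before turning to the closure issue, the second-order forecast correction derived above, x_{k+1}^f ≈ m(x̂_k) + ½ [trace(∇²m_i P̂_k)]_i, can be sketched numerically. A minimal sketch assuming a quadratic model map, for which the second-order Taylor expansion (and hence the correction) is exact; the function names and the map are illustrative, not the lecture's:

```python
import numpy as np

def second_order_forecast(m, hessians, x_hat, P_hat):
    """Second-order forecast: m(x_hat) + 0.5 * [trace(H_i @ P_hat)]_i,
    where H_i is the Hessian of the i-th component of the model map m."""
    correction = 0.5 * np.array([np.trace(H @ P_hat) for H in hessians])
    return m(x_hat) + correction

# Illustrative quadratic map m(x) = (x1^2 + x2, x1 * x2): second-order
# Taylor is exact, so the correction recovers E[m(x)] exactly.
def m(x):
    return np.array([x[0]**2 + x[1], x[0] * x[1]])

H1 = np.array([[2.0, 0.0], [0.0, 0.0]])   # Hessian of m_1
H2 = np.array([[0.0, 1.0], [1.0, 0.0]])   # Hessian of m_2
x_hat = np.array([1.0, 2.0])
P_hat = np.array([[0.5, 0.1], [0.1, 0.3]])

xf = second_order_forecast(m, [H1, H2], x_hat, P_hat)
# Exact expectations: E[x1^2 + x2] = 1 + 0.5 + 2 = 3.5, E[x1 x2] = 2 + 0.1 = 2.1
assert np.allclose(xf, [3.5, 2.1])
```

The first-order forecast would be just m(x_hat) = (3, 2), visibly biased relative to the true expectation whenever the Hessians and the covariance are nonzero.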
So what does it mean? If a second-order moment depends on second-order terms and on higher-order terms, one simply approximates by dropping the higher-order terms; that is the simplest moment closure approximation, and the same moment closure approximation arises here. Please understand: I am trying to compute P_{k+1}^f, which is a second moment. Where do the third-moment terms come in? Look at this: the term involving the product of the error and η_k. η_k is quadratic in the error and the error term itself is first order, so their product is a third-order term. You can readily see the moment closure problem coming in and derailing our ambition to improve the accuracy. What was our aim? To use the second-order term to improve the accuracy of the first moment, which we have already accomplished. Now, using that approximation for the first moment, I am trying to develop an approximation for the second moment, but the expression for the second moment depends on second moments of some quantities and third moments of related quantities. If I need the third moment and do not know it, I cannot compute the second moment; that is the issue here. So what are we going to do? The approximation idea is very clear: the moment closure problem is intrinsic to nonlinear problems, and we tackle it by simply closing our eyes and dropping all the terms we do not know. In obtaining a second-order approximation, we drop the third-order and higher-order terms; that essentially leaves only these two terms, with the remaining terms dropped. Once I drop them, my second-order approximation to the second moment is given by the resulting expression.
So there are two things to be considered: first, which moment is being approximated; second, what is the order of the approximation. The first moment depends on certain second-order terms; the second moment depends on higher-order terms. So the moment closure problem shows its ugly face. By adopting the simple solution of dropping the third-degree and higher-degree terms, we get the expression for the forecast covariance that goes with the second-order accurate forecast: this is the forecast moment dynamics. I would like to remind you that it is very similar to the Kalman filter equation, where the forecast covariance is M P̂_k M^T + Q_{k+1}. Here D_m is the Jacobian and D_m^T its transpose, and when m(x) = Mx we have D_m(x) = M. Therefore, even though we say this is second-order accurate, the covariance recursion looks like the linear Kalman case; you can see the analogy between this equation and that one. So I have derived the expressions for the approximate evolution of the forecast and of the forecast error covariance. Now we can do the same for the data assimilation step. I have the expression for the analysis: the analysis equals the forecast plus the Kalman gain times the innovation. Now look at this: to make a forecast I anchor on the previous analysis; to make the analysis I bank on the current forecast. The forecast depends on the previous analysis, and the analysis depends on the forecast; that relation still holds good.
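The closed (third-moments-dropped) covariance recursion and its linear analogue, side by side; a reconstruction of the slide formulas from the verbal description:

```latex
P_{k+1}^{f} \;\approx\; D_m(\hat{x}_k)\, \hat{P}_k \, D_m(\hat{x}_k)^{T} + Q_{k+1}
\qquad\text{vs.}\qquad
P_{k+1}^{f} \;=\; M \hat{P}_k M^{T} + Q_{k+1}
\quad \text{(linear case, } m(x) = Mx,\ D_m = M\text{)}.
```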
Therefore I can approximate h(x_{k+1}) as h(x_{k+1}^f) plus the Jacobian of h times e_{k+1}^f plus the second-order term: the first is the first-order term, the second is the second-order term. Taking the conditional expectation again, recall ĥ: ĥ(x_{k+1}) = E[h(x_{k+1}) | z_{1:k}]; all of these are conditional expectations. The conditional expectation of e_{k+1}^f is zero, therefore we obtain the second-order accurate expression for ĥ, where the ∇²h_i terms follow the same second-order Taylor series treatment; in view of the example, all the information has already been incorporated. This is the second-order correction term; I hope that is clear. I am doing exactly what I did for the model map, except that here the map is h. Substituting back: the second-order accurate analysis equals the second-order accurate forecast plus the Kalman gain times the second-order accurate innovation. Now, having approximated the analysis, I would like to approximate the analysis covariance, which is given by the expression in terms of A_k, g_k, C_k, and D_k; please remember these are the derivations I have already given, and I am now going to substitute and manipulate the whole thing. For g_k, I am going to express the first quantity by a second-order Taylor series and subtract the second; but note there is a term here that is missing, so I would like to go back to the definition of g_k. Look at the
top of page 23: g_k was defined as h(x_{k+1}) − ĥ(x_{k+1}), and that is what I now have to pull in here. Let us go back to page 23 once more: please remember it is the conditional expectation that is being subtracted from h(x_{k+1}); g_k is a kind of anomaly in the nonlinear conditional expectation, and that is what is being done here. If I expand both quantities in Taylor series and subtract, g_k takes the form shown, in which ξ_k, much like η_k previously, is given by an unwieldy expression; it essentially captures the error in the second-order approximation, and you can think of it as a second-order anomaly if you wish. A_k, by definition, after substituting this value of g_k, can be seen to equal this quantity; similarly I can show the expressions for C_k and D_k, then K_k, and, in the end, P̂_{k+1}, the analysis covariance. I know I am going a little too fast for many of you, but I want you to understand that these calculations are very simple; I keep repeating that. With this I am now going to summarize; this is the final summary of everything we have done so far. I am going to give you the second-order filter, in a form from which you can write the program right away. This is the model equation, this is the observation equation; look at this now: the model is nonlinear,
the observations are nonlinear, we have the standard assumptions about w_k and v_k, and the standard assumptions on x_0: x_0 has mean m_0 and covariance P_0. Then: this is the second-order accurate forecast, this is the second-order accurate forecast covariance, and this is the second-order accurate analysis. Please understand which terms make these second-order accurate: this is the second-order term that affects the forecast, and likewise this is the second-order term that affects the analysis; that is why both the forecast and the analysis are second-order accurate. Even though the covariance expressions look like the first-order ones, because of the moment closure we end up with the second-order accurate forecast covariance, the second-order accurate analysis covariance, and the second-order accurate Kalman gain as shown. Now look: everything is an approximation; the forecast, the forecast covariance, the Kalman gain, and the analysis are all approximations. If you think about it, this is the best you can do for a nonlinear system at second order. This filter has come to be called the second-order filter in the literature; it helps you approximate the evolution of the state as a function of time. You can really see that the forecast step and the analysis step go hand in hand, much like the Kalman filter equations do. So this is the sequential second-order accurate moment dynamics for the nonlinear filter: the second-order accurate evolution of the first and second moments of the forecast and the analysis, within the context of a nonlinear model and nonlinear observations.
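A compact sketch of one forecast/analysis cycle of the second-order filter summarized above, under the moment-closure approximation. All names (`second_order_filter_step`, the Jacobian/Hessian arguments) are illustrative rather than the lecture's notation, and the gain used here is the standard linearized Kalman gain, a common simplification of the exact A_k^T D_k^{-1} form:

```python
import numpy as np

def second_order_filter_step(x_hat, P_hat, z, m, Dm, Hm, h, Dh, Hh, Q, R):
    """One forecast/analysis cycle of a second-order filter (sketch).
    Dm/Dh return Jacobians; Hm/Hh are lists of component Hessians.
    Third and higher moments are dropped (moment closure)."""
    # Forecast with second-order correction: m(x_hat) + 0.5 * [tr(H_i P_hat)]
    x_f = m(x_hat) + 0.5 * np.array([np.trace(H @ P_hat) for H in Hm])
    Jm = Dm(x_hat)
    P_f = Jm @ P_hat @ Jm.T + Q
    # Analysis: Kalman-type gain with the linearized observation map
    Jh = Dh(x_f)
    S = Jh @ P_f @ Jh.T + R                        # innovation covariance
    K = P_f @ Jh.T @ np.linalg.inv(S)
    # Second-order-corrected predicted observation
    z_pred = h(x_f) + 0.5 * np.array([np.trace(H @ P_f) for H in Hh])
    x_a = x_f + K @ (z - z_pred)                   # second-order innovation
    P_a = (np.eye(len(x_hat)) - K @ Jh) @ P_f
    return x_a, P_a
```

With all Hessians zero and linear maps, a single step reproduces the classical Kalman update, which is exactly the nesting property discussed next.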
It turns out that these equations reduce to the Kalman filter equations when the model is linear; I am going to establish this now. If the model map is linear, M(x) = Mx, then D_M(x) = M and ∇²M_i = 0; I should say ∇²M_i = 0 for each component i, because if every ∇²M_i vanishes, the second-order terms are zero and the whole thing reduces to the Kalman filter equations. Therefore, in this case, the second-order filter reduces to the classical Kalman filter, and in this sense it is an extension. When do you say A is an extension of B? A is said to be an extension of B when, on specializing, A becomes B; that is called nesting. If an extension does not have this natural nesting property, the extension argument cannot hold much water. To say something is an extension of something else, I should be able to recover that something else from the extension by setting certain parameters to special values. In that sense you can see the consistency. Please understand: in my derivation of the approximate moment dynamics, I showed that our derivation parallels the Kalman filter derivation and reduces to the Kalman filter when you make the appropriate choices; here I am demonstrating the same thing again. This allows us to maintain the beauty of nesting when going from the special to the general, or from the general to the special. If you set all the second-order correction terms to zero, you get what is called the first-order filter; the first-order filter, in the literature, is called the extended Kalman filter.
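The linear-model specialization can be written out in one line; the form of the correction term is assumed from the summary of the second-order filter, since the slides are not reproduced here:

```latex
M(x) = Mx \;\Longrightarrow\; D_M(x) = M, \qquad \nabla^2 M_i = 0 \quad \forall i,
```

so the second-order forecast loses its correction term,

```latex
\hat{x}^f_{k+1}
  = M(\hat{x}^a_k)
  + \tfrac12 \sum_{i=1}^{n} e_i \,\operatorname{tr}\!\left(\nabla^2 M_i \, P^a_k\right)
  \;\longrightarrow\; M\,\hat{x}^a_k,
\qquad
P^f_{k+1} = D_M\,P^a_k\,D_M^{\top} + Q_k
  \;\longrightarrow\; M\,P^a_k\,M^{\top} + Q_k,
```

which are exactly the Kalman forecast equations; the analysis step is unchanged in form, so the whole cycle collapses to the classical filter.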
Many of you may have heard of the extended Kalman filter. What is the extended Kalman filter? Once the Kalman filter was announced in 1960-61, people were very soon interested in extending it to nonlinear cases. They met with lots of difficulties, so they started approximating, and the first approximation, developed within the context of space travel, was essentially the extended Kalman filter. In our notation the extended Kalman filter is simply the first-order filter, obtained from the second-order filter by setting the second-order correction terms to zero. This is the model dynamics and this is the observation, both nonlinear; this is the forecast step and this is the forecast covariance. I would like to remind you of two things. The expression for the forecast covariance is exactly the same as in the second-order filter, but the trajectories of the moment dynamics are going to be different: even though the expressions look the same, the actual values will differ, because the forecast trajectories in the two cases are slightly different; the second-order term alters the forecast trajectory. So one thing to remember: while the expressions may look the same, the actual trajectories will not, because the second-order term alters the trajectory compared with the first-order one. The forecast step is again similar to what we had, except that the second-order correction term is absent, and the data assimilation step again looks very much like the linear Kalman case: this is the Kalman gain and this is the analysis. These are again very similar to the second-order filter; the only difference is that the terms I marked as second-order approximation terms are absent in the table on page 35 compared with the one on page 34. That essentially completes our derivation of moment approximations to the nonlinear filter.
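A minimal sketch of one extended (first-order) Kalman filter cycle; the interface is my own, with M and h the nonlinear maps and DM, Dh their Jacobians:

```python
import numpy as np

def ekf_step(x_a, P_a, z, M, DM, h, Dh, Q, R):
    """One forecast/analysis cycle of the extended Kalman filter."""
    # Forecast: the mean goes through the full nonlinear model,
    # the covariance through the Jacobian; no second-order correction.
    x_f = M(x_a)
    A = DM(x_a)
    P_f = A @ P_a @ A.T + Q
    # Analysis: linearize the observation operator at the forecast.
    H = Dh(x_f)
    S = H @ P_f @ H.T + R               # innovation covariance
    K = P_f @ H.T @ np.linalg.inv(S)    # first-order Kalman gain
    x_an = x_f + K @ (z - h(x_f))
    P_an = (np.eye(len(x_f)) - K @ H) @ P_f
    return x_an, P_an
```

Note that the expressions match the second-order cycle with the correction terms deleted, which is exactly the point made above: the formulas look alike, but the trajectories they produce differ.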
So we have the general expression for the moment dynamics, its second-order approximation, and its first-order approximation. What can you do with them? If you are given a nonlinear system that is not too big, you can apply both the first-order and the second-order filter: take a small, simple problem, solve it with the second-order filter and with the first-order filter, plot the forecast trajectories from the first order versus the second order, and also plot the analysis trajectories from both cases. How does the analysis differ with the order of approximation? To me that is a very good and interesting exercise; it could be part of a classroom computer project on a chosen model. And it turns out that if you change the model, the qualitative and quantitative differences between these approximations may not hold across models. Therefore, when you want to apply nonlinear filters to a small-dimensional problem, it is better to do it in the different ways, first-order filter and second-order filter, and then compare the performance. That would be a very nice and interesting class project; in my teaching I often give these projects, and they are extremely educative. The systems I give are not large-dimensional, just two- or three-dimensional. What is a typical system I like? Suppose there is an object falling freely from the sky; it has been falling for so long that the acceleration is countered by the friction. Stokes' law comes into effect and the particle is descending with a constant speed.
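A minimal discrete-time sketch of this falling-body system under Stokes drag; the time step, drag coefficient, and the slant-range radar geometry are my assumptions, not values from the lecture:

```python
import numpy as np

G, K_DRAG, D_RADAR, DT = 9.81, 1.5, 100.0, 0.1   # assumed parameter values

def model_step(x):
    """Euler step for a body falling under gravity with Stokes drag.
    State x = [altitude, downward speed]; the speed relaxes toward the
    terminal velocity v_T = G / K_DRAG, where gravity and drag cancel."""
    alt, v = x
    return np.array([alt - DT * v,
                     v + DT * (G - K_DRAG * v)])

def radar(x):
    """Slant range from a ground radar offset D_RADAR from the drop point;
    this makes the observation a nonlinear function of the state."""
    alt, _ = x
    return np.array([np.hypot(D_RADAR, alt)])
```

Iterating `model_step` drives the speed to G / K_DRAG, which is the constant descent speed mentioned above, and `radar` supplies the nonlinear observation to assimilate.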
Vertically it has attained the so-called terminal velocity: if a particle is dropping down in a friction-filled medium, Stokes' law essentially tells you it will reach a terminal velocity at which the acceleration is countered by the friction. I put a radar at the bottom, observe the position and the velocity of the particle through the radar, and ask you to assimilate the radar observations into a nonlinear model for the free body. It is a very simple, educative model with which one can bring out various discussions relating to nonlinear filters and the quality of nonlinear approximations. So we have talked about the second-order filter and the first-order filter; I am now going to take a couple of minutes to talk about another Kalman filter variant. These exercises are taken from our book; this one is Exercise 29.5, and because it is important I am going to talk about it. It is called the linearized Kalman filter. What is the linearized Kalman filter? I have a model equation x_{k+1} = M(x_k). What do I do? I pick a state x_0 and compute the nonlinear trajectory x_1, x_2, ..., x_k; then I induce a perturbation of x_0. Call the unperturbed state the base state, or the bar state: the perturbed state is x_0 = x̄_0 + δx_0, and it evolves to its own x_1, x_2, ..., x_k. So what do we now know? Let me write the equation carefully: δx_{k+1} = D_M(x̄_k) δx_k. This is the propagation of the perturbation; in mathematics it is called the variational equation, and meteorologists call it the tangent linear system. We have already come across this equation in the context of 4D-Var, and this is that equation.
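The tangent linear system can be checked numerically: propagate a small perturbation once through the variational equation and once through the full nonlinear model, and compare. The toy map M below is my own illustration:

```python
import numpy as np

def M(x):
    """Assumed toy nonlinear map (for illustration only)."""
    return np.array([x[0] + 0.1 * x[1],
                     x[1] - 0.1 * np.sin(x[0])])

def DM(x):
    """Jacobian of M, used by the tangent linear (variational) equation."""
    return np.array([[1.0, 0.1],
                     [-0.1 * np.cos(x[0]), 1.0]])

def tangent_linear(x0, dx0, n):
    """Propagate the base trajectory and  dx_{k+1} = DM(xbar_k) @ dx_k  together."""
    xbar, dx = x0, dx0
    for _ in range(n):
        dx = DM(xbar) @ dx    # perturbation evolves through the frozen Jacobian
        xbar = M(xbar)        # base state evolves through the full model
    return xbar, dx
```

For a small δx_0, x̄_k + δx_k stays close to the perturbed nonlinear trajectory, which is exactly the first-order claim above.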
For this linear system, δx_0 is the initial condition. So what do I do now? I have a nonlinear system; I consider the base trajectory, I superimpose an initial perturbation on it, and I talk about the dynamics of the evolution of the perturbation, which is a linear equation. This linear equation essentially tells you how the superimposed initial perturbation propagates on top of the base state. So now forget about the original nonlinear model and consider this perturbation model: it is a linear model. Now let us assume I have originally been given observations z_k = h(x_k) + v_k. What am I going to do? I am going to consider increments of the observations as well: I linearize z_k along the base trajectory, so δz_k = z_k - h(x̄_k), where x̄_k is the base trajectory; to a first-order approximation, δz_k = D_h(x̄_k) δx_k. So look at this now: the model equation looks like x_{k+1} = A_k x_k, and the observation equation looks like z_k = H_k x_k. You have a linearized observation and a linearized model; now add some noise to the linearized model to get a stochastic version.
So this becomes a linearized stochastic model and a linearized observation. If you have a linear model and a linear observation, you can run the classical Kalman filter, and that filter is going to give you an approximate estimate of the forecast and of the analysis for the perturbed system. By adding the perturbation forecast and the perturbation analysis to the base, I get the actual values: if I know the increment and I know the base, adding the increment to the base gives me the actual state. So what is the order of approximation here? This is called the zeroth-order filter. We talked about the second-order approximation and the first-order approximation; this is the zeroth-order approximation to the nonlinear filter. What do we do? We simply create a linearized variational equation for the evolution of the perturbation about the base state, and we consider a linearized version of the observation; we throw the original models out, take the linear model and the linear observation as the given model, and run a classical linear Kalman filter. We compute the analysis, we compute the forecast, we add them to the base state, and we get an approximation to the actual forecast and an approximation to the actual analysis.
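One cycle of this zeroth-order scheme operates purely on the increments, with the Jacobians frozen along the precomputed base trajectory; the interface below is my own sketch:

```python
import numpy as np

def linearized_kf_step(dx_a, P_a, dz, A, H, Q, R):
    """Classical Kalman cycle on increments (the zeroth-order filter).

    A = DM(xbar_k) and H = Dh(xbar_k) are Jacobians evaluated on the base
    trajectory; dz = z - h(xbar) is the incremental observation.  The full
    state estimate is recovered as xbar + dx.
    """
    # Forecast the increment with the frozen linear dynamics.
    dx_f = A @ dx_a
    P_f = A @ P_a @ A.T + Q
    # Linear analysis on the increment: an ordinary Kalman update.
    S = H @ P_f @ H.T + R
    K = P_f @ H.T @ np.linalg.inv(S)
    dx_an = dx_f + K @ (dz - H @ dx_f)
    P_an = (np.eye(len(dx_f)) - K @ H) @ P_f
    return dx_an, P_an
```

Unlike the extended Kalman filter, the matrices A and H here never depend on the filter's own estimate; they are fixed in advance by the base trajectory.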
This is called the zeroth-order filter, or the linearized Kalman filter. It is a very interesting exercise: you can take a simple nonlinear problem, do it in the three ways, and look at the quality of the approximation. What do I gain, what do I lose, and what is the computational cost? This could be a very interesting exercise. This module has been a summary of our Chapter 29, and that completes our discussion of the nonlinear filter. Yes, this module is very dense, because nonlinear problems are not easy. On one hand we have the stochastic dynamics and the Markov property; on the other hand we have the nonlinearity. On one hand we have a conditional Gaussian distribution; on the other hand, that Gaussianity does not transfer itself to the predictive density. So we have tried to deal with the problem exactly as far as we could, and derived the update in the infinite-dimensional space; then we wanted to come to the real world of finite-dimensional computations. When we came from the infinite-dimensional space to the finite-dimensional space, we made a moment approximation, but even this approximation is riddled with what is called a closure problem, so we had to tackle the approximation at several levels. If you superimpose one level of approximation on another, ultimately, when the fog clears, you have essentially a second-order accurate filter, a first-order accurate filter, and a zeroth-order accurate filter. These three are considered to be meaningful approximations to the nonlinear filter problem. With this we come to the end of the discussion of nonlinear filters. Thank you.