So far we have solved deterministic least squares problems, both linear and nonlinear, well posed and ill posed, in what is called an offline mode. What is an offline mode? It is a way of solving problems where we assume all the observations are available at a given time. In some cases, however, we may not be able to wait until all the observations are collected. As and when observations come in, we would like to make an estimate contingent on the observations received so far, and improve the quality of that estimate as new observations come into play. This way of attacking the problem is called online, as opposed to offline. It is also called sequential, because we update the estimate as and when new observations arrive on the scene, and it is also called recursive. So online, sequential, recursive: these are the common terms used for the concept of updating an estimate when new information becomes available. We are still solving least squares problems, and I am going to provide a broad overview of this class of online algorithms.

Let me reemphasize the offline setting another way. We have assumed that the number of observations m is fixed: z belongs to R^m, meaning there are m observations given to us, and with those we have to solve the problem. We assume the vector z to be a vector of fixed length m. This treatment is known as the fixed-sample or offline version of the least squares problem. Why fixed sample? Because m, the number of observations, is fixed; z is the observation vector with m components, and once m is fixed, everything is fixed. So what does it mean? I go to the lab, make m individual measurements, collect them into a vector, close the lab, and then come back to do the analysis. That is what is called offline.

It is conceivable, on the other hand, that all the observations may not be known in advance. They may arrive in a sequence, one at a time, with a delay between successive observations. In that case it is prudent to ask: what is the optimal estimate x_LS based on m observations? I am now going to associate with the least squares estimate the number of measurements used in computing it: x_LS(m) is the least squares estimate of the unknown x, conditioned on having m observations. When a new observation comes in, I would like to be able to update x_LS(m) to x_LS(m+1). This gives you a flavor of what we mean by online as opposed to offline.

So let us formulate this problem. Let x be a vector in R^n, the unknown, and let z be a vector in R^m, the known observations. In the previous lectures we have already seen that x_LS(m) = (H^T H)^{-1} H^T z. I am assuming an overdetermined case here, and this denotes the optimal estimate based on m observations; it comes from the basic theory of linear deterministic least squares inverse problems. Now suppose a new observation z_{m+1} comes into play. I would like to be able to estimate x_LS(m+1): that is the new one, x_LS(m) is the old one, and I need to convert the old into the new when the (m+1)-th observation enters the picture. So this is the basic idea: as and when new information is given, how am I going to update my old belief into a new belief, a new estimate?
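To fix the notation just described, here is the offline estimate and the online question in compact form (my transcription of the spoken formulas, not copied from a slide):

```latex
% Offline (fixed-sample) linear least squares, overdetermined case m > n:
x_{LS}(m) = \arg\min_{x \in \mathbb{R}^{n}} \| z - Hx \|^{2}
          = (H^{T}H)^{-1} H^{T} z,
\qquad z \in \mathbb{R}^{m},\ H \in \mathbb{R}^{m \times n}.
% Online question: given x_{LS}(m) and a new observation z_{m+1},
% update to x_{LS}(m+1) without re-solving from scratch.
```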
So I have an old estimate and a new observation, and I would like to get a new estimate: that is the sequential problem. The original vector z consists of m components, denoted z_1, z_2, ..., z_m, and in that notation the new element is z_{m+1}. My model is the linear model: z = Hx is the old model, and z_{m+1} = h_{m+1}^T x is the additional row appended to the matrix H, because I need to be able to explain the new observation as well. Let me go back to my particle moving in a straight line. I had observations z_1, z_2, z_3, the rows [1 t_1], [1 t_2], [1 t_3] of H, and the unknowns z_0 and v. Now suppose somebody gives you a fourth observation z_4 at time t_4. Then z_4 = z_0 + v t_4: z_4 is the new observation and [1 t_4] is the new row, so in this case h_{m+1}^T = [1 t_4]. I hope the extension is clear now. The last row of H is meant to relate the new observation to the unknown through the model equations; that is how z_{m+1} = h_{m+1}^T x comes into play. Note that h_{m+1} is a column vector, so its transpose is a row, and [1 t_4] is that row. This gives the partitioned form of the model matrix, and z_{m+1} is the new observation; I want to emphasize that. So again, please remember: H is the old matrix, and h_{m+1}^T is the new row added to H to account for the new observation, as in the example I just discussed.

Now I would like to find the least squares estimate for this linear problem, so I am going to consider the residual. Until now we simply wrote the residual as r(x), because the number of observations was fixed. Now I am going to index the residual by the number of observations: the residual based on m observations is r_m(x) = z - Hx, and the residual based on m+1 observations, r_{m+1}(x), consists of the block z - Hx together with the new component z_{m+1} - h_{m+1}^T x, observation minus model. The first part comes from the old observations and is exactly r_m(x); the second is new. Therefore the new residual with m+1 observations is the old residual from m observations plus one new component, and you can already see the recursive nature: r_{m+1} depends on r_m and on the new information. The new residual vector uses m+1 observations, whereas r_m(x) is the residual from m observations; I have related the residual to the number of observations. We already know that x_LS(m) minimizes the norm of r_m(x); that is the optimal solution when I have m observations. The goal is to find x_LS(m+1), the minimizer of the norm of the new residual. We already know how to minimize the old one; we want to know how to minimize the new one. That is the key part of the problem formulation. Therefore I am going to follow the same route as in the linear least squares problem: consider f_{m+1}(x), the sum of squared residuals when there are m+1 observations. From a simple calculation, this norm equals the norm of the first m components plus the square of the new component that is added.
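Written out, the partitioned model and the split of the residual just described are as follows (my reconstruction from the spoken description):

```latex
\begin{pmatrix} z \\ z_{m+1} \end{pmatrix}
= \begin{pmatrix} H \\ h_{m+1}^{T} \end{pmatrix} x,
\qquad
r_{m+1}(x)
= \begin{pmatrix} z - Hx \\ z_{m+1} - h_{m+1}^{T}x \end{pmatrix}
= \begin{pmatrix} r_{m}(x) \\ z_{m+1} - h_{m+1}^{T}x \end{pmatrix},
```

so the squared norm of r_{m+1}(x) is the squared norm of r_m(x) plus the square of the new scalar component.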
I hope that is clear to you. For example, if I stack a vector x and a scalar y, we already know that the squared norm of the stacked vector is the squared norm of x plus y squared. Here r_m(x) plays the role of x, and the new component plays the role of y. Therefore the sum of squared residuals is the sum of two terms, and the first term is exactly the squared norm of r_m(x), which is f_m(x). So what have we got? A recursive, or sequential, relation: the sum of squared errors using m+1 observations equals the sum of squared errors using m observations plus an increment; this is equation (4). You can see that f_m is recursively defined in m, the number of observations. This additive structure, a recursive relation for the squared residual, is quite basic, and it brings out clearly the contribution of the new observation: the old term, the contribution of the new observation, and their total when all the observations are taken together.

So what do I want? I would like to find x_LS(m+1), which must be the minimizer of f_{m+1}(x). We already know the minimizer of f_m(x); knowing that, I would like to compute the minimizer of f_{m+1}. That is the mathematical problem. To find the optimal solution, I would like to compute the gradient of f_{m+1} in (4), and also the Hessian of f_{m+1}, and that is what we accomplish in this slide. If you consider the gradient, the gradient of f_{m+1} is the old gradient plus an increment term; the increment comes from the second term in equation (4). We already know the expression for the gradient of the first m terms from our previous analysis. If I substitute these two pieces, I get the expression for the gradient using m+1 observations, and I set it to zero. Setting it to zero gives a new set of normal equations: (H^T H + h_{m+1} h_{m+1}^T) x = H^T z + h_{m+1} z_{m+1}. Now look at this: if you set h_{m+1} equal to the zero vector, it becomes H^T H x = H^T z, the old problem using m observations; bringing in the (m+1)-th observation contributes the increments on both sides. Therefore I can express the optimal solution using m+1 observations as the inverse of this matrix times the right-hand side; call this equation (8). Now, h_{m+1} h_{m+1}^T is an outer-product matrix. I am assuming H has full rank, so H^T H is a matrix of full rank, and I am adding a rank-one outer-product matrix to it; that inverse times the right-hand side is the new estimate. You can see the beautiful structure in relation (8): I have an old term and a new term, and they mix beautifully. If I can compute this inverse, I have an expression for recursively updating the estimate when going from m to m+1 observations. The task, then, is to express the inverse of the sum.
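For the record, here is the chain from (4) to (8) as I reconstruct it from the description above:

```latex
f_{m+1}(x) = f_{m}(x) + \left(z_{m+1} - h_{m+1}^{T}x\right)^{2} \tag{4}
\nabla f_{m+1}(x)
  = 2\left(H^{T}Hx - H^{T}z\right)
  - 2\,h_{m+1}\left(z_{m+1} - h_{m+1}^{T}x\right) = 0
\;\Longrightarrow\;
\left(H^{T}H + h_{m+1}h_{m+1}^{T}\right)x = H^{T}z + h_{m+1}z_{m+1}
x_{LS}(m+1)
  = \left(H^{T}H + h_{m+1}h_{m+1}^{T}\right)^{-1}
    \left(H^{T}z + h_{m+1}z_{m+1}\right) \tag{8}
```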
So what is the real computational challenge here? Equation (8) looks very familiar; it is exactly the same equation we have seen before, except that there is one more term. So the question is this: am I going to compute the inverse needed in (8) from the ground up all over again, or can I update the old inverse by adding a correction term to obtain the inverse of the new matrix? That is the question one wants to ask and answer, and to that end I am going to invoke a very well known formula that we have already alluded to in the module on matrices, called the Sherman-Morrison-Woodbury formula. In order to apply it, I am going to slightly change the notation to make the discussion convenient: let us call the matrix H^T H as P, and let us simply call h_{m+1} as h, dropping the subscript. We already know from matrix theory that if H has full rank, then P = H^T H is SPD, symmetric and positive definite. Also, h h^T is an outer-product matrix, and outer-product matrices are always of rank one. Therefore the solution calls for computing the inverse of P + h h^T. So the question is: if I know the inverse of P, can I compute the inverse of the sum? The answer is yes, and that is where the Sherman-Morrison-Woodbury formula comes into play. The standard form given in the module on matrices tells us that (P + h h^T)^{-1} equals P^{-1} minus the ratio of two terms. Now look at the right-hand side: every term is known. The numerator, P^{-1} h h^T P^{-1}, is a product of matrices: P^{-1} is a matrix, h h^T is an outer-product matrix, and P^{-1} is a matrix. The denominator, 1 + h^T P^{-1} h, is a scalar: 1 is a scalar, and the quadratic form h^T P^{-1} h is a scalar. To simplify notation I am going to call the denominator alpha, alpha = 1 + h^T P^{-1} h, and interpose alpha^{-1} into the formula. Therefore the inverse of the sum is given by the inverse of P minus a correction term, and the correction term has a very beautiful structure. It is this formula that is going to enable us to go from offline to online. So you can see how the importance of matrix theory comes into play, through this ability to compute the inverse of what is called a rank-one update. Again I want to emphasize: if you are interested simply in applications, you can take these results for granted. But I am not simply giving you algorithms you could readily use; I am also going behind the theory. If you understand the theory very well, it will enable you to strike out newer algorithms by reformulating the problem in several different ways, and I am trying to provide many of the mathematical tools that have proven useful in the past in the discovery of several algorithms related to data assimilation.
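As a quick sanity check on the rank-one formula, here is a small numerical sketch; Python with NumPy is my choice here, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a full-rank H (m > n) so that P = H^T H is SPD.
m, n = 6, 3
H = rng.standard_normal((m, n))
P = H.T @ H
h = rng.standard_normal(n)          # new model row h_{m+1}, as a column vector

P_inv = np.linalg.inv(P)
alpha = 1.0 + h @ P_inv @ h         # scalar denominator 1 + h^T P^{-1} h

# Rank-one update: (P + h h^T)^{-1} = P^{-1} - P^{-1} h h^T P^{-1} / alpha
updated_inv = P_inv - np.outer(P_inv @ h, h @ P_inv) / alpha

# Compare with the inverse computed from scratch.
direct_inv = np.linalg.inv(P + np.outer(h, h))
print(np.allclose(updated_inv, direct_inv))   # True
```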
So that is the scope of these lectures. Therefore let me reconsider my problem. I am going to express the solution using m+1 observations with the formula already derived in equation (8), rewritten in the new notation; please remember that P = H^T H and h = h_{m+1}, so it is a simpler version of the same formula: x_LS(m+1) = (P + h h^T)^{-1} (H^T z + h z_{m+1}). Now I can apply the Sherman-Morrison-Woodbury formula to this inverse, which replaces it with P^{-1} minus the correction term; there is no longer an explicit inverse of the sum to compute. So I have expressed the solution as a product of two factors, each of which has two terms, and you can multiply these out. I would also like you to remember that P^{-1} H^T z = x_LS(m). If you multiply out and use this fact, the expression becomes a beautiful statement: the estimate using m+1 observations is the estimate using m observations plus a correction term; call this (12). The correction term looks pretty complex here, but it can be very easily simplified. To simplify it, remember that alpha = 1 + h^T P^{-1} h, and therefore h^T P^{-1} h = alpha - 1; call this (13). If you substitute (13) into (12) and simplify, you get the new estimate, and its structure is absolutely beautiful:

x_LS(m+1) = x_LS(m) + (1/alpha) P^{-1} h (z_{m+1} - h^T x_LS(m)).

This is the old estimate plus a correction term, and the correction term has a weight, (1/alpha) P^{-1} h. What is z_{m+1}? The new observation. And what is h^T x_LS(m)? The model-predicted value of the new observation using the previous estimate. So the difference z_{m+1} - h^T x_LS(m) is called the innovation, the new information gained from the new observation, and the weighted innovation is the increment to the estimate. Now you can readily see the recursive nature coming into play: if I have an estimate based on m observations and you give me a new observation, I extract the new information from that observation, weight it by a matrix, and I get the new estimate. This structure will occur again in Kalman filtering, but Kalman filters are discussed within the stochastic framework, whereas here I am doing everything deterministically: deterministic linear least squares problems with sequential update.
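To make the recursion concrete, here is a minimal recursive least squares sketch in Python; the function name rls_update and the sample numbers are mine, chosen to mirror the particle-on-a-line example:

```python
import numpy as np

def rls_update(x_est, P_inv, h, z_new):
    """One step of deterministic recursive least squares.

    x_est : current estimate x_LS(m)
    P_inv : current (H^T H)^{-1}
    h     : new model row h_{m+1} (1-D array)
    z_new : new scalar observation z_{m+1}
    """
    alpha = 1.0 + h @ P_inv @ h                 # scalar 1 + h^T P^{-1} h
    gain = (P_inv @ h) / alpha                  # weight (1/alpha) P^{-1} h
    innovation = z_new - h @ x_est              # observation minus prediction
    x_new = x_est + gain * innovation           # updated estimate
    # Sherman-Morrison-Woodbury update of the inverse for the next step
    P_inv_new = P_inv - np.outer(P_inv @ h, h @ P_inv) / alpha
    return x_new, P_inv_new

# Example: particle on a line, z = z0 + v*t, unknowns x = (z0, v).
t = np.array([0.0, 1.0, 2.0])
H = np.column_stack([np.ones_like(t), t])
z = np.array([1.0, 3.1, 4.9])                   # first three observations
P_inv = np.linalg.inv(H.T @ H)
x = P_inv @ H.T @ z                             # offline estimate x_LS(3)

# Fourth observation z4 = 7.2 arrives at time t4 = 3.0
x, P_inv = rls_update(x, P_inv, np.array([1.0, 3.0]), 7.2)
print(x)   # agrees (up to round-off) with the batch solution on all four
```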
So you can readily see a precursor to the Kalman filter even within the deterministic framework. We will show in one of the later lectures that the Sherman-Morrison-Woodbury formula is also used in the derivation of the Kalman filter equations; this formula is fundamental to the derivation of sequential estimates of the unknown, and that is the important thing I would like to emphasize here. I can rewrite the result in several different ways; for example, I can introduce a matrix K_m in place of (H^T H)^{-1} and rewrite its update using the Sherman-Morrison-Woodbury formula. You can verify these relations; they are given in chapter 8 of the book Dynamic Data Assimilation by Lewis, Lakshmivarahan, and Dhall, which is the basic textbook from which all these lectures are derived, and I am going to leave them as simple homework problems. So with that, the final recursive form of the estimation is this: the estimate using m+1 observations equals the estimate using m observations plus a matrix times the innovation, and the matrix itself is updated recursively; this update is exactly similar to the Kalman filtering update. So what does it mean? A new observation comes in; first update the gain (this matrix is called the gain matrix); once you have updated the gain matrix, the new estimate is the old estimate plus the new gain matrix times the innovation. This is the final recursive form for the estimate as a function of the number of samples involved.

I am now going to show how we can utilize this recursive scheme to solve a simple problem. Suppose you want to find your weight, and you have made m measurements of it, given in a vector z = (z_1, z_2, ..., z_m). It could be that you weighed yourself on m different scales, or that you used the same scale but made measurements in the morning, afternoon, and evening for several days together; z_{m+1} will then be the (m+1)-th measurement. If I consider the estimate of your weight based on the first m observations, the least squares model has the simple form z = Hx, where H is simply a vector of all ones: the unknown x is your weight, and z is the set of m measurements of it. So what is the best estimate of your weight? The average of all the measurements. And this brings out a very fundamental result: the average is the best linear least squares estimate of the unknown. Here the unknown is a scalar, n = 1, and m is much greater than 1, so this is an overdetermined problem. In this overdetermined problem your measured weight varies from one time to another; the best estimate of your intrinsic weight is simply the average of the measurements. That is what the formula says: the average is the best least squares estimate.
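In this scalar case the general formula collapses to the sample mean; a one-line check using the notation above:

```latex
H = (1, 1, \ldots, 1)^{T} \in \mathbb{R}^{m}
\;\Longrightarrow\;
x_{LS}(m) = (H^{T}H)^{-1}H^{T}z = \frac{1}{m}\sum_{i=1}^{m} z_{i}.
```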
If you are going to weigh yourself again the day after tomorrow morning, and that measurement is z_{m+1}, I can update the old estimate of the weight to the new one, and in this case the gain is simply 1/(m+1): x_LS(m+1) = x_LS(m) + (1/(m+1)) (z_{m+1} - x_LS(m)). I can rewrite this equation (18) as x_LS(m+1) = (m/(m+1)) x_LS(m) + (1/(m+1)) z_{m+1}; these are the same equation written in two different ways. Now you can really see that as m goes to infinity, the gain K_m = 1/(m+1) tends to 0, and the contribution of the innovation term becomes increasingly smaller; when that contribution goes to 0, your weight estimate has stabilized. In other words, x_LS converges as m goes to infinity. So this is a very simple illustration of the recursive linear least squares setup, the setup being one of estimating your unknown weight from m observations. It also further brings out the inherent optimality property of averages: the average has this intrinsic behavior of being the optimal estimate for unweighted linear least squares problems. This is perhaps one of the reasons why, whenever there are multiple opinions or multiple measurements, we take the average as the value we use to interpret them.

One can also compute the sample variance sequentially. Let me state the problem; the derivations are all given on the slides, and I am going to leave them as a reading assignment, but let me describe it quickly. Suppose I am given a set of data x_1 through x_k, and I want to compute the mean and the variance. Let mu_k be the mean and s_k^2 the variance of these k quantities; given k quantities, we know how to compute both by the usual formulas. Now suppose I give you a new observation x_{k+1}; the question is how to update mu_k to mu_{k+1} and s_k^2 to s_{k+1}^2. That is exactly the sequential least squares idea we are interested in: as new information is given, update the old quantities by incorporating it. I have derived the formula for the recursive estimate; I am not going to go over the derivation, you can readily read it, it is a reading assignment for you. The calculations on the next pages ultimately give rise to the formula I am looking for, and it is a beautiful one. What does it say? The sample variance using k+1 samples, s_{k+1}^2, equals a weight times the old sample variance s_k^2 based on k items, plus a weight times an innovation term, where the innovation term is the squared difference between mu_{k+1}, the average of the k+1 items, and mu_k, the average of the k items. So this gives rise to a very beautiful recursive formula for computing the sample variance, alongside the recursive update of the sample mean that we already saw in the previous example. The two recursive algorithms run in parallel, one for the mean and one for the variance, and together they very beautifully illustrate the notion of sequential, recursive, or online algorithms.
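Here is a minimal online mean-and-variance sketch in Python; the exact weights are my reconstruction for the biased sample variance s_k^2 = (1/k) sum (x_i - mu_k)^2, so they may differ from the convention used on the slides:

```python
def online_mean_var(data):
    """Update the sample mean and (biased) sample variance one item at a time."""
    mu, s2 = 0.0, 0.0          # mu_k and s_k^2 for the k items seen so far
    for k, x_new in enumerate(data, start=1):
        mu_old = mu
        mu = mu_old + (x_new - mu_old) / k            # recursive mean update
        if k > 1:
            # s_k^2 = ((k-1)/k) s_{k-1}^2 + (k-1) (mu_k - mu_{k-1})^2
            s2 = ((k - 1) / k) * s2 + (k - 1) * (mu - mu_old) ** 2
    return mu, s2

print(online_mean_var([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
# approximately (5.0, 4.0), matching the batch mean and biased variance
```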
So what have we illustrated? The application of recursive estimation to a very simple problem in statistics: if you get a new data item, how do you update your sample mean, and how do you update your sample variance? These two together illustrate very beautifully the notion of sequential, or online, updating. With that we come to the end of the discussion of sequential estimation. The notion of sequential estimation is absolutely fundamental: it arises naturally in solving statistical problems, in computing many of the standard statistical moments, the first moment (the mean), the centered second moment (the variance), and so on. The notion of recursive computation also occurs within the context of geophysical inverse problems: we considered a linear inverse problem and illustrated how one can utilize the recursive solution to solve linear deterministic inverse problems as observations become available, updating the old belief to incorporate the new measurements, thereby deriving a class of online, sequential, recursive algorithms. We also mentioned that this idea is inherent to Kalman filters: there is considerable similarity between the Kalman filter and this algorithm, but one works in the stochastic domain and the other in the deterministic domain, and both rest on the same fundamental matrix formula, namely the Sherman-Morrison-Woodbury formula. With that we conclude our discussion of sequential estimation. Thank you.