In this lecture we are going to provide a broad overview of a class of techniques that has come to be called 3D-Var. "Var" is an acronym for variational method. 3D-Var tells you we are concerned with problems in the three dimensional spatial variable at a given time, so time is not a factor: 3D means space only, no time, while 4D means 3 + 1, three for space plus one for time. So you can think of 3D-Var as a data assimilation scheme done over the spatial domain at a given time, and 4D-Var as a data assimilation scheme done as the system evolves in time in three dimensional space.

Again, why are we talking about 3, 4 and so on? Almost all the problems of interest in the geophysical sciences happen on and around the earth, so the space of interest is fundamentally a three dimensional space (x, y, z), and many of the physical processes evolve in time. Meteorologists in the early days, when data assimilation schemes were being developed, concocted this notion of 3 and 4 in order to bring out the distinction between when time is a factor and when it is not. The "Var" stems from the fact that some of the earliest thought processes that led to these schemes arose from the variational approach, which we indicated when we did the forward sensitivity method, the 4D-Var method and the adjoint method; that is the door through which these acronyms entered the vocabulary. But understand my thought: a rose smells the same by whatever name you call it.

So what is the idea? The underlying idea in 3D data assimilation is always that I would like to estimate the unknown true state of a field variable. I have a spatial domain with a grid embedded in it; I number the grid points 1 through n; at each grid point I have a value of the scalar variable I am interested in, a pressure or a temperature; and I collate all the values into a vector x. In the 1D case the grid points lie along one line, so there are n points along that line. In the 2D case I have nx points along x and ny points along y, so n = nx × ny; in this example, counting 1, 2, 3, 4 along each side, nx = 4 and ny = 4, so n = 16. In the 3D case I have a grid in each direction, nx by ny by nz, and therefore n = nx × ny × nz. So you can see that as you go from one dimension to two dimensions to three dimensions the size explodes. Let me give you a quick example: let nx = 100, ny = 100 and nz = 50; then n = 100 × 100 × 50 = 5 × 10^5, that is, half a million.

So the whole question is this: what should nx, the number of points along the x direction, be? Let us consider a spherical coordinate system built on the earth. The earth's diameter is roughly 8000 miles, so the circumference of the earth at the equator is roughly 24000 miles. If you consider a grid of 100 points covering those 24000 miles, the distance between two grid points is 24000 divided by 100, and that is a large distance: 240 miles.
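As a quick arithmetic sketch of these scalings (plain Python; the round figures are the ones used in the lecture, not a real model's):

```python
# How the state-vector size n and the grid spacing scale with resolution.
nx, ny, nz = 100, 100, 50            # grid points along x, y and z
n = nx * ny * nz                     # size of the state vector in R^n
print(n)                             # 500000, i.e. half a million

equator_miles = 24_000               # rough equatorial circumference
points_along_equator = 100
print(equator_miles / points_along_equator)   # 240 miles between grid points
```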
Now, the distance from Bangalore to Madras, for example, is roughly 200 miles. So there will be one grid point in Madras and one grid point in Bangalore and nothing in between; can that accurately capture the variations? No. So if you want better accuracy you have to increase nx, you have to increase ny, you have to increase nz. What is nz? nz is the number of vertical levels; in meteorology the number of vertical levels is generally limited to 50 or 100, that is all. But in global models n, the product of nx and ny with nz, is the total number of points, and that is why in the atmospheric and oceanographic sciences the value of n in a hurry becomes a million, becomes tens of millions.

So there are two things. One is how to solve a given problem conceptually. Second, knowing conceptually how to solve a given problem, how do we solve a problem of huge size for which we know there is an algorithm? Not every conceptual algorithm is directly applicable to problems of large size; the curse of dimensionality often limits what one can do. We have already alluded to these things.

So how do we decide the value of n? Whatever I want to do, I must be able to do in my lifetime. Earlier we saw that multiplying two matrices of size a million by a million on a machine with a teraflop of power takes about 12 days, just to multiply one pair of matrices (a naive product costs about n^3 = 10^18 operations; at 10^12 operations per second that is about 10^6 seconds, roughly 12 days). Can we afford to solve a problem for 12 days? For one day? For 6 hours? No: we want to be able to make forecasts every 6 hours, every 12 hours, every 24 hours. The time period within which you have to generate the forecast provides the time horizon, and the computer provides the computing power; the time horizon and the computing power together limit the value of n that you can consider in solving problems. So whenever we talk about problems of size n, I want you to keep track of this n: mathematically I can make n as large as possible, but to be able to put the problem through a computer there are other, physical limitations.

So let us pretend I have a global domain and a field variable of interest, and I have collated all the field values of interest into a vector of size n, where n relates to 1D, 2D or 3D problems as we have just discussed. R^n is called the model space. As we have been saying all along (our notation is consistent throughout), there are two pieces of information about the unknown: one is the background, or prior, and the other is the new observation. Again, this is nothing new; we have talked about it within the Bayesian context at great length.

Let me give an example. Suppose you are planning a trip to Paris in the middle of January and you want to pack the right kind of clothes. What do you do? You look at the almanacs, which give the average low temperature and average high temperature in the middle of January in downtown Paris, and that information is available. This day and age the almanac is essentially Google or Wikipedia; they all provide you lots of information. How is this information collected? These are summaries over a long time, and that is the information that is called the background: the background essentially comes from a summary of the prior information. It is also called climatology.
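As a minimal sketch of how a background could be extracted from such climatological records (the station values below are invented purely for illustration; in practice x_B and B come from long historical archives or previous forecasts):

```python
import numpy as np

# Toy "climatology": each row is one past year's mid-January maximum
# temperature (deg C) at three hypothetical locations; values invented.
records = np.array([
    [4.8, 5.2, 3.9],
    [5.5, 6.0, 4.4],
    [4.1, 4.9, 3.5],
    [5.0, 5.6, 4.0],
])

x_b = records.mean(axis=0)           # background: the climatological mean
B = np.cov(records, rowvar=False)    # n x n background covariance matrix
print(x_b)
print(B)
```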
So the background can come from several different directions: previous knowledge, climatology, previous forecasts, and so on and so forth. The mix of everything I know up until this time, if I can spread it over the grid and embed it in a vector, is what I call the background. z is the new information, and R^m is called the observation space; again, we have talked about this at great length on several occasions. So I am going to call x_B the prior, and you can see the Bayesian framework sneaking in: x_B has been derived from the previous forecast or the climatology. x is the unknown, and I am interested in trying to estimate, or find, x. x_B is the prior knowledge about x, so x − x_B = x̃_B, the difference between the unknown and the background. The expected value of x̃_B is 0, and the covariance of x̃_B is B, an n by n matrix. So B is an n by n matrix that encapsulates the spatial covariance underlying the current knowledge x_B, and that should be very clear to us from the previous slide; in optimal interpolation as well, what was C there? C was a covariance matrix derived from knowledge of a long time series.

So if you look at Google you can find the maximum temperature in downtown Paris at noon every day, and from this data you can compute B by doing simple statistical analysis. So I am assuming I know x_B and I know B. What is B? B represents the spatial variability of x_B. But when you and I look at Google before going to Paris in the middle of January, we do not worry about the full matrix B; we simply read that the maximum temperature in downtown Paris in the middle of January could be 5 degrees Celsius plus or minus 1. The 5 degrees is x_B, and the plus or minus 1 is an indication of its variability: the plus or minus gives the variation at a given location, and I also know the mean value.

So I am going to now assume p(x) is the prior: the prior has mean x_B and covariance B, and I am going to pretend it is normally distributed. Let us discuss the normal distribution for a moment. What is the beauty of the normal distribution? The normal distribution is, I think, the only distribution that is uniquely determined by the mean and the variance. There is a general result in probability theory: given a density function p(x), I can compute all the moments. What are the moments? First moment, second moment, third moment; central moments and non-central moments. What is the expression for the k-th moment? The k-th moment is the expected value of X to the power k, which (assuming a scalar variable) is E[X^k] = ∫ x^k p(x) dx. I can also consider central, or centred, moments, E[(X − E[X])^k] = ∫ (x − E[X])^k p(x) dx, and I can compute these for every k. So given p I can compute all moments, simply by evaluating the integrals: some integrals can be evaluated explicitly, and almost all of them can be evaluated numerically to a very high degree of approximation. Now let us consider the converse: how many moments should I know in order to be able to build the probability density function?
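Before turning to the converse, here is a small numerical sketch of the forward direction, computing sample versions of the raw and central moments from an invented scalar record:

```python
import numpy as np

# Sample k-th raw moment E[X^k] and k-th central moment E[(X - E[X])^k].
x = np.array([4.2, 5.1, 5.8, 4.7, 5.3, 6.0, 4.5, 5.5])   # made-up data
mean = x.mean()                           # first moment

for k in (1, 2, 3, 4):
    raw = np.mean(x ** k)                 # k-th raw (non-central) moment
    central = np.mean((x - mean) ** k)    # k-th central moment
    print(k, raw, central)                # k = 2: central moment = variance
```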
So the passage from distribution to moments I can always do; the converse question is a fundamental question that has come to be known as the moment problem. What is the statement of the moment problem? The moment problem asks the following question: how many moments should I know in order to be able to build the underlying probability density function? There is a very good treatment of this in the first volume of Feller's book, which essentially tells you that one needs to know infinitely many moments to be able to reconstruct p(x). So the converse problem, of going from moments to distribution, is a much tougher problem.

Within the context of this moment problem, I now want to mention that the normal distribution, the Gaussian distribution, is the unique distribution that is determined by the mean and the variance: if x is Gaussian, x is uniquely decided by the mean and the covariance (or, in the scalar case, the variance). Therefore in many meteorological applications, and statistical applications generally, if I am given a bunch of data, I can grind out of the data the sample mean and the sample variance or sample covariance; and if I have a sample mean and a sample covariance, I am going to pretend that the underlying random variable is normally distributed with the sample mean as the mean and the sample covariance as the covariance. That is the reason why, even though we only know the first moment and the second moment, we take a liberty — a big liberty — in assuming that the background is normally distributed with mean x_B and covariance B. It is a bit of a stretch: all we know is x_B and B, and to go from x_B and B to normality with x_B as the mean and B as the covariance is a leap of faith, but we often do that.

For those of us who would like to look at the specifics of the distribution function: to say p(x) is normal is to say p(x) has the functional form 1 over (2π)^{n/2} times the square root of the determinant of the matrix B, times the exponential of −J_b(x), where J_b is given by a quadratic function. Now please understand, we have seen this kind of quadratic function several times, over and over again: this J_b looks like the least squares criterion, and it comes as the exponent of an exponential function. Look at this now: least squares was invented by Gauss, and the Gaussian distribution is the normal distribution. So the least squares functional form turning up inside the functional form of the normal distribution is not an accident — it was all invented by the same person. It is this J_b, the mean-square-type objective function, that gives rise to the bell shaped curve. So this is the functional form, and it is very important to understand it.

So much for the information about the background; let us go to the observation. z is the observation, z = h(x) + v; z contains information about the state, so it is an indirect measurement of the state. I am going to assume v is normally distributed. Then, because of the additivity, and because of the uncorrelated nature of x and v, the conditional distribution p(z | x) is given by a normal distribution whose exponent is J_o. You can see the relation between (5) and (6) and the relation between (8) and (9): (5) and (6) relate to the prior, where the background is taken as the prior.
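Written out explicitly (a reconstruction consistent with the lecture's definitions; the 1/2 factors follow the usual convention), the two densities referred to as (5)-(6) and (8)-(9) are:

```latex
p(x) = \frac{1}{(2\pi)^{n/2}(\det B)^{1/2}}\,\exp\!\big(-J_b(x)\big),
\qquad
J_b(x) = \tfrac{1}{2}\,(x - x_B)^{\mathsf T} B^{-1} (x - x_B),
\\[6pt]
p(z \mid x) = \frac{1}{(2\pi)^{m/2}(\det R)^{1/2}}\,\exp\!\big(-J_o(x)\big),
\qquad
J_o(x) = \tfrac{1}{2}\,\big(z - h(x)\big)^{\mathsf T} R^{-1}\big(z - h(x)\big).
```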
So what is the assumption here? The prior information is normally distributed with known mean and known covariance, and the conditional distribution given x — that means the distribution of z given x — is given by (8) and (9). I want you to take a good look at J_b in (6) and J_o in (9); in particular I would like to draw your attention to J_o in (9). In J_b, the b refers to background; in J_o, the o refers to observation. And z − h(x) — I am sure you will recall — z − h(x) is the residual. So this is essentially the least squares criterion we have already used, but there we did not say anything about least squares; here it simply appears as the exponent of an underlying normal distribution. So I would like you to develop an appreciation that least squares and normality are intimately associated with each other.

We also make an assumption: x_B and v are uncorrelated. What is x_B? x_B is the background; so the background and the observation noise are uncorrelated. These are standard assumptions, and they are very meaningful: the cross-correlation of v and x̃_B is 0, that is, there is no correlation between v and x_B. So x_B (b refers to the prior) and z (which refers to the new information) are the two pieces of information about the unknown true state. Please understand: when we did not have a prior and only had observations, we knew how to do data assimilation for the static problem — we used least squares. Now we are going to invoke the Bayesian type of analysis: I have a prior, I have new information; we are going to use p(x), the prior, and the conditional distribution in Bayes' rule to get the posterior distribution, and that is the underlying philosophy of 3D-Var.

So recall Bayes' rule: the posterior is the prior times the conditional distribution, with p(z) a normalizing constant. I can now substitute: I have an expression for p(x) and I have an expression for the conditional distribution, and both are normal. If I do that, I get a multiplying constant times an exponential, exp(−J(x)), where J(x) is simply J_o(x) + J_b(x) and c is some normalizing constant. We have looked at this kind of ratio under normal distributions and done several such exercises in the chapter on Bayesian least squares estimation, so I am doing something very similar to what I did in the past.

Now, before I go further, what is this? That is the posterior, and you can think of the conditional distribution as a likelihood, à la Fisher — are you with me, please? Fisher did not have a background; he only had the conditional distribution, so he considered the notion of likelihood and maximized the likelihood. Here we have the posterior, and within the Bayesian context we have already seen that the posterior mean is the best estimate. So I would like to maximize or minimize the appropriate quantities given this setup; those are the optimization problems of interest. I hope expressions (11) and (12) are very clear now. What is (12)? I want to draw your attention to (12): it is simply the sum of two quadratic forms, J_o arising from the observation and J_b arising out of the background. So I would like to maximize p(x | z) with respect to x; that means I want to find the location where p(x | z) is maximum. Remember, p(x | z) is proportional, barring a constant, to exp(−J(x)), and J(x) = J_o(x) + J_b(x); so maximizing exp(−J(x)) is equivalent to minimizing J(x).
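In symbols (matching (11)-(12) as just described):

```latex
p(x \mid z) = \frac{p(z \mid x)\,p(x)}{p(z)} = c\,\exp\!\big(-J(x)\big),
\qquad
J(x) = J_b(x) + J_o(x),
\\[6pt]
x_A = \arg\max_{x}\, p(x \mid z) = \arg\min_{x}\, J(x).
```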
Therefore the algorithm reduces to minimizing this J: one term is the background term, the other is the observation term. When h(x) is linear, h(x) = Hx, both terms are quadratic; even when h(x) is not linear, the first term could be non-linear but the second term is still quadratic. I am interested in minimizing J(x), which is given in (13). This minimization problem has come to be called the 3D-Var problem: the three dimensional variational assimilation problem is essentially minimizing the sum of two quadratic forms, one coming from the background and the other coming from the observations. Minimizing J(x) with respect to x is an unconstrained minimization problem — a lot easier, since there is no constraint. We can compute the gradient: the gradient of a sum is the sum of the gradients (and the Hessian of a sum is the sum of the Hessians). So taking the gradient of J_o and the gradient of J_b, substituting, summing and equating to 0, I get the general equation. Now look at this: the right hand side involves x_B, z, the Jacobian of h, and R — everything known; on the left hand side B is known, R inverse is known, the Jacobian of h is known, but x I do not know. When h(x) = Hx this reduces to a linear system, which can be solved by any one of the methods we know; in general, (17) is a non-linear algebraic system whose solution gives you the optimal analysis. The only way to solve a non-linear algebraic system is by numerical methods, and there are tons of numerical methods one knows to solve (17).

In the special case — let me re-emphasize this — when h(x) = Hx, the Jacobian of h is simply H and everything simplifies: the optimal solution is given by a matrix times x equal to a known vector. This system is like Ax = b, and A is SPD; we already know how to solve SPD systems. And it is no accident that this is exactly the same relation we derived in the model-space formulation of the Bayesian scheme: this is the model-space formulation, and it is equivalent to the Kalman filter equation.

So what does this tell you? As we will see when we come to the Kalman filter equations, Kalman filtering essentially consists of the following steps: the background is replaced by a forecast; a new observation comes in; and the question of how to combine the forecast and the observation is answered by the Kalman filter equations. So you can readily see what happens at the filtering stage of the Kalman filter. The Kalman filter has multiple stages, and one of them is called the filtering phase, or filtering stage: the filtering stage of the Kalman filter is equivalent to 3D-Var. That is the story, and 3D-Var is very much related to the Bayesian framework. It is no surprise that we are coming back to Kalman here, because we have already established the relation between Bayesian least squares and the linear minimum variance estimate — one done in the model space, the other in the observation space — and we built the bridge between the two using the matrix identity called the Sherman-Morrison-Woodbury formula. So now, I believe, the whole thing is falling into place: Bayesian, linear minimum variance, 3D-Var, Kalman filter — all these are close cousins of each other, and even though the ideas come with different labels and different names, the underlying mathematics is very nearly similar, or the same.
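Returning to the linear special case, here is a toy numerical sketch (all sizes and values invented for illustration; note that it forms B inverse explicitly, which is precisely the practice questioned later in this lecture):

```python
import numpy as np

# Minimal linear 3D-Var, assuming h(x) = Hx: solve the SPD system
# (B^-1 + H^T R^-1 H) x = B^-1 x_b + H^T R^-1 z.
n, m = 4, 3
rng = np.random.default_rng(0)
B = 0.5 * np.eye(n)                  # background covariance (n x n, SPD)
R = 0.1 * np.eye(m)                  # observation covariance (m x m, SPD)
H = rng.standard_normal((m, n))      # linear observation operator
x_b = rng.standard_normal(n)         # background (prior mean)
z = H @ rng.standard_normal(n)       # synthetic observation

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
A = Binv + H.T @ Rinv @ H            # Hessian of J: SPD
rhs = Binv @ x_b + H.T @ Rinv @ z
x_a = np.linalg.solve(A, rhs)        # the analysis
print(x_a)
```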
So the solution of (19) gives you the analysis: the optimal solution is called the analysis, x_A. Please realize that "analysis" is the fancy term geophysicists use for the posterior estimate: what statisticians call the posterior estimate is essentially the analysis in the geophysical sciences.

We have also already seen, when we did statistical least squares — and I would like you to verify this — that the inverse of the Hessian of J is indeed the covariance of the analysis. So I have x_A, and I have P_A, where P_A is essentially the inverse of the Hessian: this is the analysis covariance. So I have the analysis x_A from the previous slide, given by equation (19), and the analysis covariance P_A, given by the inverse of the Hessian.

Why am I interested in the analysis covariance? Because in statistical methods the mean alone will not cut it: I need to be able to provide information about the underlying variability as well. In most stochastic predictions we are interested not only in the level given by the mean, but also in getting some feel for the underlying covariance. So I have both the mean and the covariance, and that is the best we could do: you gave me the first two moments of both the background and the observation, and I have combined them to create the first two moments of the analysis — the mean of the analysis and the covariance of the analysis.

How do I use the analysis covariance? The analysis covariance matrix is a matrix whose diagonal elements are the individual variances; their sum, which is called the trace, is the total variance across all the components, and it provides a good measure of the quality of the analysis. If that variance is large you cannot put too much faith in your analysis; if that variance is smaller, the analysis has a lot more credibility, a lot less variation.

What is an analysis? An analysis is a kind of prediction, and what good is a prediction if I do not know the degree of variability in it? For a lunar eclipse or a solar eclipse the variability is 0: we know the timing precisely. Can we predict the temperature at noon on March 15th in downtown London? We cannot — of course I can generate a number, but what good is that number unless I know the underlying variance associated with it? So: deterministic prediction versus stochastic prediction. In a deterministic prediction we are content with one number; in a stochastic prediction we need a number that gives the level, and we also need to give the variability. If you look at current predictions of local weather, they will say: tomorrow there is a chance of 1 inch of rain with 80% probability. The 1 inch is the measure of the mean level of rain; the 80% is a very, very high probability. So we need to give not only the level but also an associated variance, which is the measure of the variability — of our confidence — in the prediction. So in stochastic prediction I need both the mean and the variance or covariance.
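Collecting the two moments of the analysis in the linear case h(x) = Hx (a reconstruction consistent with the system derived above, with the trace-based quality measure written out):

```latex
x_A = \big(B^{-1} + H^{\mathsf T} R^{-1} H\big)^{-1}
      \big(B^{-1} x_B + H^{\mathsf T} R^{-1} z\big),
\qquad
P_A = \big(\nabla^2 J\big)^{-1}
    = \big(B^{-1} + H^{\mathsf T} R^{-1} H\big)^{-1},
\qquad
\operatorname{tr}(P_A) = \sum_{i=1}^{n} (P_A)_{ii}.
```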
Once we have this, there is a quick exercise: I can talk about the incremental form of 3D-Var in the model space. Let me run through this very quickly; it is largely related to the computational advantages and disadvantages of whichever formulation we want to follow. Let δx = x − x_B be the increment; this is called the incremental formulation. Substituting (23) in (20), my J takes the incremental form, where d = z − h(x_B) is what is called the observation increment, or the innovation. So we can work with the incremental variable, and it can be verified that the optimal increment δx is given by the solution of this system; in other words, if I minimize this, the minimizer is given by the solution of this system. What is the idea? The analysis is given by the background plus the increment, and the increment comes from this equation. That is another way of looking at it.

One of the difficulties in solving this system, as well as the previous one — let me go back — is that we need to know B inverse and we need to know R inverse. So from a computational point of view one has to ask oneself: I know x_B and B, but B inverse comes in as the weight in my objective function in the least squares. While I know B, my data assimilation scheme uses B inverse, and B is a very large matrix, n by n with n of the order of a million — the inverse of a million by million matrix! So computationally one has to ask the question: hey, are you really going to invert these matrices? Is there a way to reformulate the problem without worrying about B inverse, using B only? I hope you appreciate this problem. What have we been given? I know the background information, x_B and B, and I know the observation, z and R; but my J_o uses R inverse and my J_b uses B inverse. The theory is beautiful, but who is going to give you B inverse and R inverse? Look at this now: the left hand side matrix here involves these inverses, so where in the world are you going to get them? The question is: I like the formulation, but I do not like its computational demands; is there a way to recast it into an equivalent formulation that does not depend on some of these inverses? That is the question that drives what follows.

So I am going to call B inverse plus H transpose R inverse H the matrix A_M. B and R are SPD, and I am assuming H has full rank; therefore A_M is SPD. If A_M is SPD, this combined matrix has n eigenvalues, the least of which is positive, and its condition number is λ_1 / λ_n. Please understand that this is the matrix with which I have to solve A_M x = right hand side, where the right hand side is a known vector; so the quality of the solution depends on the condition number of this matrix A_M. Recall: the condition number is λ_1 / λ_n. While A_M is symmetric and positive definite, λ_n, though positive, can be very small, so the condition number can be very large. Let us take λ_n = 10^{−6} as an example: it is still positive, but it sits in the denominator, so the condition number will be very large. So the condition number of an SPD matrix becomes very large when the smallest eigenvalue, while remaining positive, becomes very small.
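A sketch of the incremental route on the same toy problem as before (invented sizes and values; the final print anticipates the conditioning difficulty just described):

```python
import numpy as np

# Incremental 3D-Var: work with delta_x = x - x_b and the innovation
# d = z - H x_b, then recover x_a = x_b + delta_x.
n, m = 4, 3
rng = np.random.default_rng(0)
B = 0.5 * np.eye(n)
R = 0.1 * np.eye(m)
H = rng.standard_normal((m, n))
x_b = rng.standard_normal(n)
z = H @ rng.standard_normal(n)

Binv, Rinv = np.linalg.inv(B), np.linalg.inv(R)
d = z - H @ x_b                        # observation increment (innovation)
A_M = Binv + H.T @ Rinv @ H            # the SPD matrix called A_M above
delta_x = np.linalg.solve(A_M, H.T @ Rinv @ d)
x_a = x_b + delta_x                    # analysis = background + increment

# The quality of the solve is governed by cond(A_M) = lambda_1 / lambda_n:
print(np.linalg.cond(A_M))
```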
In those cases I will have a computational headache: a large condition number leads to instabilities in computation. To tame the conditioning we use a suitable transformation called preconditioning. Preconditioning is the methodology by which I get around the challenges posed by an ill conditioned matrix, or a matrix with a large condition number. What is the matrix of interest here? A_M, the sum of B inverse plus H transpose R inverse H. These are all mathematical considerations: how am I going to develop a preconditioning to avoid the challenges of ill conditioned, or nearly ill conditioned, matrices? That is what we are going to discuss now.

We have assumed B is known and R is known. If B is known, let us pretend I know its Cholesky factors: B = G G^T. Every SPD matrix has a Cholesky decomposition, so G is called the Cholesky factor, or a square root of B; we have already seen that. I am now going to define a new variable u in terms of δx: δx = G u. You can think of this as a linear transformation, or you can think of it as a preconditioning transformation, whichever way you want to call it. Now look: if B = G G^T, then in (28) B inverse is G^{−T} G^{−1}, where −T means the transpose of the inverse. If I substitute this linear transformation, my J becomes the form shown: look at the first term — there is no B; B is gone in the new variable. And what is the minimizer? (30) is a quadratic function: compute the gradient, equate it to 0, and you get the optimal solution in the new variable u, given by (31). I hope you are keeping track of everything.

Now, what is the matrix here? It is I + Ā, where Ā = G^T H^T R^{−1} H G. Let us look at this matrix I + Ā. What is the general result in matrix theory? If λ is an eigenvalue of A, then 1 + λ is an eigenvalue of I + A; that is a fact. So let λ̄_1 ≥ λ̄_2 ≥ ... ≥ λ̄_n ≥ 0 be the eigenvalues of Ā (please understand: I use λ_1, ..., λ_n for A_M and λ̄_1, ..., λ̄_n for Ā). Then the eigenvalues of I + Ā are 1 + λ̄_1 ≥ 1 + λ̄_2 ≥ ... ≥ 1 + λ̄_n > 0. Now look at this: the least eigenvalue is 1 + λ̄_n, and 1 + λ̄_n can never be close to 0, because λ̄_n is non-negative. That means I am bounding the least eigenvalue from below, which means my condition number is not going to explode in this new space. Therefore I have tamed the condition number by a useful transformation. Such transformations are called preconditioning methods; preconditioning methods have been known for a long time, motivated essentially by the need to tame the instabilities that may arise in cases with a large condition number. So the condition number of the new matrix is (1 + λ̄_1)/(1 + λ̄_n), and the least eigenvalue is always bounded away from 0; this matrix is well tamed.
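A sketch of the preconditioned solve, this time with a correlated toy B so that the Cholesky factor is non-trivial (values invented; note that B inverse never appears on the preconditioned path):

```python
import numpy as np

# Preconditioning: with B = G G^T (Cholesky) and delta_x = G u, the system
# matrix becomes I + G^T H^T R^-1 H G, whose eigenvalues are 1 + lambda_bar,
# so the smallest eigenvalue is bounded below by 1.
n, m = 4, 3
rng = np.random.default_rng(0)
idx = np.arange(n)
B = 0.5 * 0.8 ** np.abs(idx[:, None] - idx[None, :])   # correlated SPD toy B
R = 0.1 * np.eye(m)
H = rng.standard_normal((m, n))
x_b = rng.standard_normal(n)
z = H @ rng.standard_normal(n)
Rinv = np.linalg.inv(R)
d = z - H @ x_b

G = np.linalg.cholesky(B)                     # B = G @ G.T; no B^-1 needed
A_u = np.eye(n) + G.T @ H.T @ Rinv @ H @ G    # preconditioned matrix I + A-bar
u = np.linalg.solve(A_u, G.T @ H.T @ Rinv @ d)
delta_x = G @ u                               # back to the model variable

# Same increment as the unpreconditioned solve, with a tamed matrix:
A_M = np.linalg.inv(B) + H.T @ Rinv @ H
print(np.allclose(delta_x, np.linalg.solve(A_M, H.T @ Rinv @ d)))   # True
print(np.linalg.cond(A_M), np.linalg.cond(A_u))
```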
If the matrix is well tamed, I should not expect any difficulty in numerically computing the solution when I use any meaningful method to solve it.

With that we come to the end of the discussion of so-called 3D-Var. In summary, 3D-Var is another way of looking at the Bayesian formulation, which we have already talked about within the context of statistical estimation theory, especially the development of Bayesian least squares theory. We are reinforcing it by using a new language prevalent in data assimilation in the atmospheric sciences, called 3D-Var. We talked about the nature of the solution; we talked about the potential for ill conditioning; and we talked about how to tame it using a preconditioning transformation, to induce better conditioning in the matrices so that I can get reasonably good, stable numerical results.

I would also like to quickly bring to your attention the Sherman-Morrison-Woodbury formula; you can see the importance of this formula. I want to tell you something: the Sherman-Morrison-Woodbury formula was developed in mathematics purely for the sake of interest in a matrix identity. What were they interested in? The following: if A is a matrix and I know A inverse, and I perturb that matrix by adding another matrix B, can I find the inverse of the perturbed matrix A + B simply as a function of A inverse and B? That is the question they were interested in. In other words, if I have a routine to compute the inverse of a matrix, I could plough the entire new matrix A + B through the algorithm and compute its inverse from scratch, but that is not what they were interested in: I know the inverse of A; A has been updated by B; how do I update A inverse to get the inverse of A + B? That is the idea, and the idea is very similar to what we do in a Taylor series. What is the Taylor series example? I know the value of the function at x and I would like to know the value of the function at x + h, and we simply write f(x + h) ≈ f(x) + h f′(x) + (h²/2) f″(x). So what is the idea, in general, both there and here? Knowing what I know, how can I extrapolate my knowledge to a neighbourhood of what I know? I know A inverse; A + B is a matrix close to A. If I know somebody close to somebody who is invertible, and I know that fellow's inverse, can I express my inverse in terms of it? Same thing here: I know the value of the function at the point x, I consider f(x + h) in a neighbourhood of x, and I extrapolate by knowing the value of the function and its derivatives; there, I simply want to use A inverse and B. That is the beauty of the Sherman-Morrison-Woodbury formula. When they developed it they had no idea whether it would be used at all; they did it for the sake of mathematics, for the sake of beauty. In fact the Sherman-Morrison-Woodbury formula plays a crucial role in data assimilation, especially in relating the data assimilation formulation in model space to the one in observation space. Why is this relation needed? The model space is n-dimensional, the observation space is m-dimensional, and seldom is m equal to n: either m is greater than n or m is less than n. What we would like to take advantage of is that it is always cheaper to perform computation in the smaller dimensional space.
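For reference, here is the identity in its common matrix form, together with the bridge it builds between the n by n model-space solve and the m by m observation-space solve (a standard statement, written with the matrices of this lecture):

```latex
(A + U C V)^{-1} = A^{-1} - A^{-1} U\,\big(C^{-1} + V A^{-1} U\big)^{-1} V A^{-1},
\\[6pt]
\big(B^{-1} + H^{\mathsf T} R^{-1} H\big)^{-1} H^{\mathsf T} R^{-1}
   = B H^{\mathsf T}\,\big(H B H^{\mathsf T} + R\big)^{-1}.
```

The inverse on the left of the second line lives in the n-dimensional model space; the inverse on the right lives in the m-dimensional observation space.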
If n is smaller, you do the operations in model space; if m is smaller, you do the operations in observation space. And who provides this freedom to go between these two worlds? The Sherman-Morrison-Woodbury formula — that is its beauty. It has no physics, it has no meteorology, it has nothing of that; it is a beautiful mathematical enterprise that helps you build the bridge between these two worlds, and it relieves us of the pain of having to do all the computation in only one world, irrespective of whether m is greater or less. It is this power of the Sherman-Morrison-Woodbury formula that I would like you to understand and appreciate, and to see how repeatedly we have used it in our discussion of data assimilation.

I would like to make one or two closing comments. NCEP in the United States, the National Centers for Environmental Prediction, uses the preconditioned incremental model-space formulation for its spectral model. That is a mouthful, so let me unpack it. What is the model they use? They use a spectral model for the phenomenon. What is a spectral model? Consider the primitive equations on a spherical domain. You express the solution in spherical harmonics, as a Fourier-type series; you substitute this expansion in spherical harmonics into the differential equations; and you reduce the system of partial differential equations to a system of ordinary differential equations for the amplitudes of the spherical harmonics. So for a field variable such as velocity, pressure or temperature, I am assuming I can encapsulate the spatial variation by a clever combination of spherical harmonic functions; what I do not know are the amplitudes. By substituting this series expansion on the spherical domain into the primitive equation model, I reduce the infinite dimensional partial differential equation to a finite dimensional ODE. The resulting finite dimensional ODEs have come to be called low order models, or spectral models: spectral models essentially result from applying Fourier analysis to the continuous domain (a minimal one-dimensional analogue is sketched below). So NCEP considers the spectral model, does a model-space formulation, and uses the incremental version. What is the incremental version? There is a background and I want the analysis; I want to go from prior to posterior, where the background is the prior and the analysis is the posterior; I express the analysis, the posterior, as background plus an increment, and it is that increment that is computed in the incremental formulation, as we discussed. Because of the computational problems, they use the preconditioned version. So: the preconditioned incremental model-space formulation of the spectral model. In contrast, at NASA the observation-space formulation is used. Different places develop programs and systems in different formulations because of their beliefs about the targeted applications they have in mind.
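To make the reduction concrete, here is a minimal one-dimensional analogue, with ordinary Fourier modes standing in for spherical harmonics and the heat equation standing in for the primitive equations (a deliberately simplified stand-in, not the operational model):

```latex
\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2},
\qquad
u(x,t) = \sum_{k=-K}^{K} a_k(t)\, e^{\mathrm{i} k x}
\;\;\Longrightarrow\;\;
\frac{d a_k}{d t} = -k^2\, a_k(t), \quad k = -K, \dots, K.
```

Truncating the expansion at K leaves a finite system of ODEs for the amplitudes a_k — a low order, or spectral, model.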
Finally, I would like to leave you with a very interesting computational project; it could make a very nice master's thesis, in fact. You have a 3D-Var problem: I can solve it directly as a 3D-Var type minimization, or I can use an iterative method. So it behooves us to ask the question and compare the results of 3D-Var with those of iterative methods; and I would like you to compare not only the quality of the results but also the computational requirements. I think these kinds of projects at the master's level would be very educational and very inspiring, letting people look at the different kinds of methodologies that could be brought to bear on the different types of data assimilation problems of interest in daily life. Thank you.