In the last module we talked about predictability in the context of deterministic models. A deterministic forecast has no spread: it is always a point prediction, such as the year, the month, the day, and the time at which a particular lunar or solar eclipse will occur. There is no spread and no variance; in the context of eclipses the phenomenon is perfectly predictable. We use deterministic models to make weather predictions, and until recently weather predictions have likewise been point estimates. But a point estimate by itself does not mean much: we need to be able to state the confidence with which the point estimate is a reasonable predictor. This measure of the quality of a prediction is usually conveyed by the variance associated with the value being predicted. Within a purely deterministic framework there is no notion of variance, so instead we worried about error: I would obtain one prediction starting from one state, and a different prediction starting from a nearby state, and I would like to examine the differences between predictions resulting from minuscule differences in the initial conditions. If errors in the initial conditions grow with time, the model exhibits extreme sensitivity, and I cannot make reliable predictions over long durations. In that case I have to determine the length of the period over which I have some confidence in the quality of the prediction, and that leads to the notion of a predictability limit: the time period within which I have a good idea of the goodness of the prediction.
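This error growth can be illustrated with a minimal sketch. The chaotic logistic map is used here purely as an illustrative stand-in for a weather model; the map, the tolerance, and the step counts are my assumptions, not from the lecture. Two trajectories are started a tiny distance apart and the separation is tracked until it exceeds a tolerance, giving a crude predictability limit.

```python
# Two trajectories of the chaotic logistic map x_{n+1} = 4 x (1 - x),
# started 1e-10 apart, to show exponential error growth.
def logistic_traj(x, n):
    traj = [x]
    for _ in range(n):
        x = 4.0 * x * (1.0 - x)
        traj.append(x)
    return traj

a = logistic_traj(0.3, 80)
b = logistic_traj(0.3 + 1e-10, 80)
errors = [abs(u - v) for u, v in zip(a, b)]

# The step at which the error first exceeds a tolerance (0.01 here)
# serves as a crude "predictability limit" for this toy system.
limit = next(n for n, e in enumerate(errors) if e > 1e-2)
print("initial error:", errors[0])
print("predictability limit (steps):", limit)
```

Beyond the limit the two trajectories are effectively uncorrelated, even though the model itself is perfectly deterministic.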
But the interest in prediction is moving towards stochastic analysis. What does that mean? "Tomorrow I am likely to have 10 centimetres of rain, with a confidence of plus or minus 20 per cent." That it is going to rain is taken for granted; the confidence is attached to the level, 10 centimetres. Ten centimetres of rain in 24 hours is an extreme event, and we are observing more and more extreme events all over the world. So one of the problems is: what are the probabilities of extreme events? If you make a prediction of an extreme event, what is the confidence interval, and what is the variance of the estimate? If I predict 5 millimetres of rain, what is the confidence in that? If I say tomorrow's temperature is going to be 105 degrees, what is the confidence there? So we move from deterministic prediction to probabilistic prediction, where I am interested in predicting not only the level but also the variability, that is, the uncertainty in the prediction. That is stochastic prediction. Motivated by this need for both the level and the spread of uncertainty, which is characterized by the variance, in this module we look at some of the basic issues relating to predictability from a stochastic view. Predictability relates to the ability to predict both the normal course of events and extreme or catastrophic events. Is tomorrow's weather going to be like today's? Are there likely to be chances of extreme events such as 10 centimetres of rain in 24 hours? We would like to predict both, so the predictability problem has this dual goal, normal and extreme, and it calls for assessing the goodness of prediction as measured by the variance of the prediction. What is the variance in the prediction of normal events?
What is the variance in the prediction of extreme events? Sometimes we may be able to predict normal events with better accuracy than extreme events, and it is the extreme events that generally catch us by surprise. These predictions are generated by models, so how do we calculate both the level and the variance of the predictions? The appropriate framework is probabilistic analysis, and this is what is called the probabilistic, or stochastic, view of predictability. The quality of a prediction is affected both by the inherent natural variability of the system and by the properties of the model used to generate the prediction. We all know climate has a natural variability. One of the interests in climate science is to quantify how the temperature of the earth has been rising and what maximum rise one would face in the next 50 years. What is the effect of the increasing carbon dioxide concentration in the atmosphere on the rise of global temperature? These are very interesting and practical problems. If you look at the predictions of temperature rise over the next 50 or 100 years, they vary anywhere from 2 degrees to 6 degrees. This wide range of variation reflects the properties of the models being used: different models have different characteristics, leading to different predictions. Ultimately the quality of prediction rests at the feet of the model. Why? It is the model that is used to extrapolate, and extrapolation is prediction. With this in mind, recall that we have already talked about the role of the model in deterministic predictability: one measure of predictability for a deterministic model is the computation of the Lyapunov exponents. If one of the Lyapunov exponents is positive, there is going to be trouble making longer-range predictions.
In the case of stochastic analysis, we need to analyze the stochastic factors that affect the prediction; that is the ultimate role of this module. But before that, we would like to classify events in some meaningful way. Some events are perfectly predictable. We have alluded to this several times, but in this context it is worth emphasizing: lunar and solar eclipses, the phases of the moon and their impact on ocean tides. We routinely predict the ocean tides for the days of the week or month, depending on the phases of the moon, and those predictions are pretty accurate. So some events are perfectly predictable, but many events are not. What does "not perfectly predictable" mean? When you go from deterministic to stochastic, the variance matters. If the variance of the prediction is large, the quality of the prediction is low; if the variance is small, the quality is high. Within the stochastic framework, this is related to the classical signal-to-noise ratio: a prediction is good if the signal-to-noise ratio is large. Why? The numerator is the signal and the denominator is the noise, so when the noise is low the ratio is high. What reduces the quality of prediction is the noise, and the noise can be thought of as errors in the prediction. Those errors arise from the properties of the model used to create the prediction. We alluded to this in the last lecture when discussing the Rayleigh coefficient, which in some sense can be thought of as an indirect measure of the signal-to-noise ratio. So what are examples of normal events? The maximum and minimum temperatures in major cities around the world tomorrow, or the prediction of the foreign exchange rate, from US dollars to euros for example.
For the conduct of the economy, I need to know the foreign exchange rate every day. The foreign exchange rate is a random process that depends on several factors affecting different currencies in different parts of the world and different economic conditions, and very good empirical mathematical models have been developed to predict it. A slight increase or decrease in the rate has enormous economic implications when you buy and sell. So these are some of the normal events: maximum temperature, minimum temperature, prediction of foreign exchange rates, and so on. The next class of events is the prediction of rare events, rare events with high impact. For example, we recently witnessed a high-impact rain event that affected certain parts of the state of Tamil Nadu. It is rare in the sense that it is considered a one-in-a-hundred-year event, and its impact was very high: continuous rain for several days led to heavy flooding that caused immeasurable misery to several segments of the community. So what are examples of rare-event prediction? What is the probability that an earthquake of magnitude 8.0 on the Richter scale will occur in the Los Angeles basin one year from now? What is the probability of having 15 inches of rain in Chennai in the month of December? Those are rare events. The analysis of rare events with high impact lies at the heart of what is called risk analysis. The insurance industry, for example, is always interested in risk analysis. We like to live close to the water's edge, so there is high population density along shorelines all around the world. But shorelines are always subjected to extreme weather from hurricane-like events, and when a disastrous hurricane comes and hits, there is not only loss of life but loss of property.
So when the insurance industry is subjected to heavy losses of both life and property, it has to pay out large amounts of compensation, and it would therefore like to estimate the probability with which a high-impact, low-probability event will occur. Based on that it assesses the risk in order to decide the premium needed to insure, say, a mansion on a shoreline anywhere in the world. So prediction in general relates to normal events as well as rare events. In the case of high-impact rare events it is even more important to make a good estimate, because of the enormous implications: loss of human and animal life plus loss of property. These are the motivations for trying to improve the quality of prediction and for understanding both the level and the variability. It is generally understood that the goodness of a prediction is a direct function of the amount and quality of the information used in generating it. This information set generally contains a model and observations of the process being predicted. The heart and soul of dynamic data assimilation is to calibrate the model prediction against the noisy observations; that has been the primary emphasis of this course. In this module we are primarily interested in quantifying the goodness of predictions generated by an assimilated model. What does "assimilated model" mean? I have estimated the unknowns, and the estimates have errors; the errors in the estimates induce errors in the forecast. Knowing the statistical properties of the errors that feed into the prediction, I would like to derive the statistical properties of the prediction, all within the stochastic context. Now, the sources of prediction error: this may be a slight repetition, but it is worth repeating the key factors. In deterministic models, prediction error is attributable to the model's sensitivity to errors.
It is related to errors in the initial conditions, the boundary conditions, and the parameters being estimated, a topic we pursued in the previous module. In a stochastic model, however, the uncertainty in the prediction may arise from uncertainty in the initial conditions, uncertainty in the random forcing, or uncertainty in the parameters. We are assuming here that the parameters are very well established, so I am simply relating the uncertainty in the forecast to the uncertainty in the initial condition and the uncertainty in the random forcing. In this module, then, we discuss the uncertainty in the prediction induced by these two sources. In a deterministic model, if the initial condition is random, the solution exhibits random behavior; if the forcing is random, the solution exhibits random behavior; and if both the initial condition and the forcing are random, the combination of the two also results in a stochastic prediction. So within the stochastic model setup we need to worry about the uncertainties in the prediction induced by these two sources of randomness. The third source of randomness, which we are not considering for the sake of simplicity, is randomness in the solution induced by the parameters. That case is often more difficult because the dependence of the solution on the parameters is almost always nonlinear: even if the model is linear, the solution depends on the parameters nonlinearly. What is an example? A quick one: suppose x-dot = ax, where a is the parameter, with initial condition x0. We already know the solution: x(t) = e^(at) x0.
So here, if x0 is stochastic then x(t) is stochastic, and x(t) depends on x0 linearly. However, if the parameter a is stochastic, that also induces stochasticity in x(t), but x(t) depends on a nonlinearly, through e^(at). So even though the equation is linear, the solution may depend on the parameter nonlinearly. One could analyze the whole spectrum of possibilities, with randomness arising from the initial condition, from the parameters, or from the forcing; to begin with, we consider an unforced model with error-free parameters. So what are the classes of techniques available for analyzing the solutions of a stochastic system and computing or quantifying the uncertainty in the solution? That is the goal of predictability in the context of stochastic systems. First is the class of analytical methods. Here the aim is to describe the evolution of the probability density function of the state of the system, from which forecast products can be generated. For example, if x(t) is the solution of the system at time t and p_t(x(t)) is the probability density of the solution at time t, then p_t evolves in time. There are two roles for t here: x depends on t, and the density function of x(t) may itself also depend on t, because p_t evolves in time. If p_t(x(t)) = p(x(t)), meaning the probability density does not change in time, that is what is called an invariant density or stationary density. But if x(t) varies according to a dynamical system, it is often the case that p also varies, so in general we are concerned with the time-varying density p_t(x(t)).
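The point that the solution x(t) = e^(at) x0 is linear in the initial condition x0 but nonlinear in the parameter a can be checked numerically. This is a minimal sketch with illustrative numbers of my own choosing:

```python
import math

# x' = a x has solution x(t) = e^{a t} x0: linear in the initial
# condition x0, exponential (hence nonlinear) in the parameter a.
def solution(a, x0, t):
    return math.exp(a * t) * x0

t = 1.0
base = solution(0.5, 1.0, t)
doubled_x0 = solution(0.5, 2.0, t)   # double the initial condition
doubled_a = solution(1.0, 1.0, t)    # double the parameter instead
print(doubled_x0 / base)             # 2: the map is linear in x0
print(doubled_a / base)              # e^{0.5}, not 2: nonlinear in a
```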
So suppose there is a means by which I can describe the dynamics of p_t(x(t)) based on the dynamics of x(t). Then I know the complete distribution of the state of the system at every time. What is the maximum information one can give about a random variable? Its probability distribution; there is no more information than the probability distribution with which one can endow a random variable. Once the probability distribution is known, we know everything about it: given the distribution, I can compute the mean, the variance, and any number of statistics I want to generate from it. So what are the forecast products? The mean is the level and the variance is the uncertainty, and I can create the forecast products, mean and variance, from p_t, provided p_t can be analytically derived. Unfortunately, it turns out that the ability to characterize p_t analytically is possible only for a very small class of uninteresting, simple textbook cases. That is the difficulty. What is the second approach? If I cannot do everything (and knowing p_t is everything), I would at least like to do some meaningful part of everything. That gives rise to the approximation of moments, which we have already talked about in the context of nonlinear filters: how to generate moment approximations for linear dynamics and for nonlinear dynamics. That is the next best thing: the evolution of the moments, or moment dynamics. The third method is called the Monte Carlo technique. What is the Monte Carlo technique?
If I am given a deterministic system whose initial condition is random, I can pick samples from the initial distribution, that is, pick different initial conditions from the given probability distribution, run the model forward in time, and create an ensemble of forecasts. That is the basis for the ensemble methods we have seen in the context of ensemble filters, a class of reduced-rank filters arising from ensemble methods. Once I can create an ensemble of forecasts at every time, I can compute an ensemble mean and an ensemble variance or covariance, and use these as the forecast product. So these are primarily the three ways of attacking stochastic predictability. The third, the Monte Carlo method, is useful in all cases, but it requires a lot of computational power, because I need to run the model in parallel for umpteen different initial conditions. How many initial conditions must I run in order to get meaningful estimates of the variance and covariance? That comes from large-sample statistical analysis: statistics tells you how many samples you must have in order to estimate the mean with a certain degree of accuracy, and how many samples you must have to estimate the variance with a certain degree of accuracy. So we can fall back on large-sample statistical theory to decide how many ensemble members we need in order to make decent estimates of the mean and covariance, based on which we can generate forecast products for public consumption. Now, relations to other parts of the lectures: the approach based on approximate moments I have already alluded to, but I would like to formalize it. An approach based on approximate moment dynamics was described in module 8.3 on nonlinear filtering.
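The Monte Carlo recipe just described can be sketched in a few lines. As an assumed test case (my choice, not from the lecture) we take the model x' = a x with a Gaussian initial condition, for which the exact forecast mean and variance are known in closed form, so the ensemble estimates can be checked against them:

```python
import random, math

# Monte Carlo forecast for x' = a x with random initial condition
# x0 ~ N(mu0, s0^2). Since x(t) = e^{a t} x0 is linear in x0, the exact
# answers are mean(t) = e^{a t} mu0 and var(t) = e^{2 a t} s0^2.
random.seed(0)
a, t, mu0, s0, N = -0.5, 1.0, 2.0, 0.3, 20000

# Sample initial conditions and push each through the (exact) model map.
ensemble = [math.exp(a * t) * random.gauss(mu0, s0) for _ in range(N)]
mean = sum(ensemble) / N
var = sum((x - mean) ** 2 for x in ensemble) / (N - 1)

print("ensemble mean", mean, "exact", math.exp(a * t) * mu0)
print("ensemble var ", var, "exact", math.exp(2 * a * t) * s0 ** 2)
```

For a real model the exact map would be replaced by a numerical integration of each ensemble member, which is where the computational cost mentioned above comes from.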
So even though we are not covering it here, many parts of this idea have already been covered, and I would like you to tap those results for the benefit of making probabilistic predictions. The sampling-based approach, which is the Monte Carlo approach, can be related to ensemble and reduced-rank filters, and that was covered in earlier lectures. There is another class of methods called unscented transformations. We will not have time to cover these, and there is an enormous literature in this area. The unscented transformation is a specific class of ensemble methods by which you generate not only a stochastic prediction but can also compute the moment statistics rather easily. Given that we have covered aspects of Monte Carlo methods coming from ensembles and aspects of moment dynamics in the context of ensemble filters, I am simply going to restrict our discussion to the class of analytical approaches. Admittedly this approach is very limited: even though it is called the analytical approach, and the equations for the evolution of the probability densities are clearly known, closed-form solutions are hard to come by except in simple cases. Still, starting from those equations one could use numerical methods to develop good forecasts and to assess the quality of forecasts by computing variances. So we have narrowed all our discussion down to one class of methods, namely analytical methods. Under what condition can the first class of analytical methods be applied? The model is deterministic, and the randomness in the solution comes entirely from randomness in the initial condition. That is the first case.
For simplicity I am dealing with continuous time; there is a corresponding analog of this treatment in discrete time. Let us start with an ordinary differential equation, x-dot = f(x, alpha), with initial condition x0, where alpha is the parameter; let us assume alpha is well known and error free. Given this differential equation, dynamical system theory tells you that the solution at time t can in principle be described by a map, phi_t^alpha. This map takes R^n to R^n and is called the state transition map: it relates the initial condition at t = 0 to the solution at time t, that is, it relates x0 to x(t). For a fixed t, the solution x(t) is simply a function of x0, so phi_t^alpha(x0) can be relabeled as g(x0). The emphasis here is this: x(t) is the solution of the system, and under mild conditions on f, such as a Lipschitz condition, it can be shown that the solution exists and is unique; if the solution exists and is unique, this is a representation of it. The solution can be represented by a transformation of the initial condition, and that transformation phi_t^alpha is what is called the state transition map. For example, if x-dot = Ax, in other words f(x, alpha) = Ax, where the elements of alpha make up the matrix A, the system is linear, and in this case the solution is x(t) = e^(At) x0. Therefore in this case phi_t^alpha is essentially e^(At).
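For the linear case the state transition map e^(At) can be computed explicitly. The following sketch builds e^(At) from a truncated Taylor series and applies it to an initial state; the matrix A (a rotation generator) and the initial state x0 are illustrative assumptions of mine, not from the lecture:

```python
# State transition map for the linear system x' = A x: Phi_t = e^{A t},
# computed here by the truncated Taylor series sum_k (A t)^k / k!.
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(A, t, terms=30):
    n = len(A)
    At = [[A[i][j] * t for j in range(n)] for i in range(n)]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = mat_mul(term, At)                                    # (At)^k
        term = [[term[i][j] / k for j in range(n)] for i in range(n)]  # / k!
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

A = [[0.0, 1.0], [-1.0, 0.0]]        # rotation generator
Phi = expm(A, 3.14159265 / 2)        # quarter turn
x0 = [1.0, 0.0]
xt = [sum(Phi[i][j] * x0[j] for j in range(2)) for i in range(2)]
print(xt)                            # approximately (0, -1)
```

This truncated series is only reasonable for small ||At||; production code would use a scaling-and-squaring routine such as the one in SciPy.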
So phi_t is not something unknown in the case of a linear system; I can write down exactly what the map is. In general, under mild conditions guaranteeing the existence of a solution, I can relate x(t) to x0, and that relation is defined by a transformation, a mapping, called the state transition map. Let p_0(x0) be the probability density function of the initial state x0, and let p_t(x(t)) be the probability density of x(t). The latter can be obtained directly using the formula from module 9.1; unfortunately we are not covering 9.1, so I will give you the formula here, as relation (3), and you do not have to worry about 9.1. We have already said that x(t) is a function of the initial condition: I can think of x(t) as g(x0), where g is a mapping from R^n to R^n related to phi_t^alpha. In the linear case once again, x-dot = Ax and x(t) = e^(At) x0, so I can write g as g_t(x0), because e^(At) depends on t. Now consider the relation: it can be shown that if the initial condition is random with probability density p_0(x0), then the probability density of the state of the system at time t, p_t(x(t)), follows by a simple exercise in probability theory, namely the transformation of random variables, with g as the transforming function. I can express the density at time t in terms of the density at time 0 times a multiplying factor, the magnification factor. Now let us look at what is happening here.
So p_0 is the initial density, and it is evaluated at g^(-1)(x(t)). Why g^(-1)(x(t))? Because the distribution of x0 is known, and g(x0) is the value of the solution at time t: g maps the initial state to the state x(t). Under mild conditions on the existence of the solution, it can be shown (I am not proving this; it is proved in several textbooks on probability theory under the name transformation of random variables, or transformation of random vectors) that formula (3) captures what we want. If I know the initial distribution p_0(x0), then by an inverse transformation (since x(t) = g(x0), we have x0 = g^(-1)(x(t))) the density at time t is p_0 evaluated at g^(-1)(x(t)) times a multiplying factor. What is the multiplying factor? It is the determinant of the Jacobian of g^(-1): g is a function, g^(-1) is a function, the Jacobian of g^(-1) is a matrix, and we take the determinant of that Jacobian. That is called the magnification factor, and this result is very well known in the probability literature. So I can compute the distribution of the solution at time t in terms of the initial distribution and the magnification factor. There are a number of examples one can use to illustrate this; they are given in a number of textbooks and basic courses on probability theory. I would refer the reader to some of these good books: Feller's book is a good example, and so is Papoulis's; there are a number of interesting textbooks on the transformation of probabilities. So (3) essentially gives you the analytical solution. To be able to use equation (3), what must I know? I must know the solution; in other words, I must be able to solve the system.
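For the scalar linear system x' = a x this formula can be checked directly: g(x0) = e^(at) x0, so g^(-1)(x) = e^(-at) x and the magnification factor is e^(-at). A sketch with an assumed Gaussian initial density (the numbers are illustrative, my own choice):

```python
import math

# Formula (3) for x' = a x with x0 ~ N(mu0, s0^2):
#   p_t(x) = p0(g^{-1}(x)) * |d g^{-1}/dx| = p0(e^{-a t} x) * e^{-a t}.
# This should coincide with the Gaussian N(e^{a t} mu0, e^{2 a t} s0^2).
def gauss_pdf(x, mu, s):
    return math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2 * math.pi))

a, t, mu0, s0 = 0.7, 1.0, 1.0, 0.5
m = math.exp(-a * t)                 # magnification factor |d g^{-1}/dx|

def p_t(x):                          # density at time t via formula (3)
    return gauss_pdf(m * x, mu0, s0) * m

x = 2.3                              # any test point
direct = gauss_pdf(x, math.exp(a * t) * mu0, math.exp(a * t) * s0)
print(p_t(x), direct)                # the two values agree
```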
So look at this now: I should be able to solve the system. What have we done? We have assumed the basic setup: I have a dynamical equation, x-dot = f(x, alpha); I am given an initial condition with a distribution p_0(x0); and I know the solution x(t) = g(x0), which means I know the solution explicitly as the function g. If you know the solution of the differential equation, then formula (3) on the previous page gives you the distribution of the solution x(t) at any t. Now I would like to reflect on that. Linear systems we can solve perfectly: x-dot = Ax is a linear system, and in the linear case the solution is always x(t) = e^(At) x0, so for all linear systems this formula is extremely useful and simple. But there are very few nonlinear systems that we can solve, very few nonlinear systems for which we know g. Please remember that the expression in (3) is conditioned on knowing g; if g is not known, (3) is not useful. So what is the summary? I have a differential equation, I know my initial conditions are random, but I can solve the differential equation and get the solution in closed form, x(t) = g(x0). Once I know g, I can use it to characterize the probability density function at time t via formula (3).
So that is the basic idea. The question now is: what if I am not able to solve the system? That is what is answered in this slide. We assume I have a differential equation with an initial condition that is random. Here I am going to change my notation a little: instead of p(t, x(t)) I will write p_t(x(t)); it really does not matter whether you attach the time as a subscript or as part of the functional dependence. So p_0(x(0)): the subscript 0 denotes the initial time, and the argument is the initial state. If I know the probability distribution of the initial state at time 0, I would like to compute the probability distribution of the solution at time t, p_t(x(t)), the distribution of the solution of the system, as written in (4) above. I am not going to derive this, but it can be shown that p_t(x(t)) is given by the solution of a partial differential equation: the partial of p with respect to t, plus the sum from i = 1 to n of the partial with respect to x_i of the product f_i p, equals 0. Now p is a scalar function; I want to make sure we understand this. p is a probability density function defined on R x R^n and taking values in R: the first argument of p is time, the second argument x is a vector, and the value of p is always a scalar. The density is non-negative, and its integral over R^n must equal 1, so that probabilities come out between 0 and 1, as we all know. f_i is the i-th component of f: recall f = (f_1, f_2, ..., f_n), where each component f_i is a function of x and alpha.
So I consider the product of f_i with p, take the partial with respect to x_i, and sum over i. What does the equation say? The time derivative of p plus this term must equal 0. This equation is called Liouville's equation, and it essentially expresses the conservation of probability mass: the probability density function may vary in time, but it must always be non-negative, and the total amount of probability must be preserved, equal to 1. It is this simple statement about the conservation of probability mass at all times that leads to this partial differential equation. Now please understand: the f_i are known; p_t(x(t)) is not known. The claim is that the solution p_t(x(t)) is obtained by solving the partial differential equation in (5), Liouville's equation. I am not going to indulge in the derivation, although the derivation itself is a very good exercise; there are lots of textbooks on stochastic dynamics that discuss Liouville's equation and its solution process. One of the standard ways of solving Liouville's equation is the method of characteristics. Those of you who have studied partial differential equation theory should be very familiar with the method of characteristics, so you can solve these equations quite readily; there are many good textbooks that deal with solving first-order equations by the method of characteristics.
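For the scalar linear model x' = a x, the method of characteristics gives the closed-form solution p(t, x) = p0(x e^(-at)) e^(-at) of Liouville's equation dp/dt + d/dx(a x p) = 0. The following sketch (with an assumed Gaussian p0 and illustrative numbers of my choosing) verifies by finite differences that this solution satisfies the PDE:

```python
import math

# Liouville's equation for x' = a x:  dp/dt + d/dx (a x p) = 0.
# Characteristics give p(t, x) = p0(x e^{-a t}) e^{-a t}; we check the
# PDE residual at a sample point with central finite differences.
a = 0.4

def p0(x):  # standard Gaussian initial density (assumed)
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def p(t, x):
    return p0(x * math.exp(-a * t)) * math.exp(-a * t)

t0, x0, h = 0.8, 1.3, 1e-5
dp_dt = (p(t0 + h, x0) - p(t0 - h, x0)) / (2 * h)
d_axp_dx = (a * (x0 + h) * p(t0, x0 + h)
            - a * (x0 - h) * p(t0, x0 - h)) / (2 * h)
residual = dp_dt + d_axp_dx
print("PDE residual:", residual)   # should be ~0
```

Note the factor e^(-at) in front: it is exactly the magnification factor from formula (3), so for this solvable case the Liouville route and the transformation-of-variables route agree.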
So the f_i are known, and p_t is the unknown, obtained by solving this partial differential equation. Solving it numerically would be a very difficult task: you have to discretize it, and you have to ensure the density is never negative and that the total probability always sums to 1. This preservation of probability mass at all times leads to lots of computational challenges if one tries to solve the equation in numerical form, and the equation can be solved in analytical form only for certain special classes of f_i. So again we are up against hard rock: we know what to do, but we may not be able to accomplish it, because solving Liouville's equation analytically is very difficult and numerically is also very challenging. These are some of the tools that are available. Liouville's equation is one way by which we can characterize the evolution of the probability density function of the solution of a differential equation whose initial condition is random. Once we are able to compute p_t, we know everything about x(t): we can compute the mean, the variance, and all kinds of statistics, and those can be used as the forecast product. That is how uncertainty in the forecast, within the context of a stochastic dynamical system, is quantified. So far we have talked about uncertainty arising only from initial conditions. Again, I have not shown you how to derive formula (3) or equation (5); I believe these could be motivating factors for the reader to engage himself or herself in different directions, and I hope you will pursue different lines of enquiry depending on your interests. One more way to rewrite this: by carrying out the differentiation in the second term, I can readily rewrite the equation as equation (6), which is called the continuity equation for the probability
mass over R^n. I have already discussed, and want to re-emphasize, that except in simple cases numerical methods are the only option for solving (6). Even here, designing numerical methods for this kind of partial differential equation is extremely challenging because of the constraint: you can solve it numerically, but not every numerical solution will preserve the probability mass, so your numerical method has to embed this condition as part of its design, and that is largely the challenge. Again, there are books written on how to solve partial differential equations of this kind; there is an enormous literature, and I encourage you to enquire into this line of work. Now I am going to go to the next level, where I assume there is randomness in the initial condition plus randomness in the forcing; there are two sources of randomness in this case. The proper way to describe such a stochastic differential equation is under the umbrella of the stochastic calculus defined by Itô, so these are called Itô-type stochastic differential equations. A typical Itô-type stochastic differential equation can be written in the form dx(t) = f(t, x(t)) dt + sigma(t, x(t)) dW(t). If sigma is 0, the second term vanishes, in which case the equation becomes dx/dt = f(t, x(t)), which can be written in differential form as dx(t) = f(t, x(t)) dt. So f(t, x(t)) dt is the deterministic part and sigma(t, x(t)) dW(t) is the random part. sigma is a state-dependent function that can also depend explicitly on time, and f likewise can be a complex nonlinear function of both time and state. dW(t) is the important newcomer: it is called the Wiener increment process, and it is 
a very important class of stochastic processes. What are the properties of the Wiener increment process? dW(t) in this case is a vector, so before I consider dW(t), let me talk about the dimensions of the quantities involved. x(t) lies in R^n. f is a map depending on two quantities: t, which comes from R, and x(t), which comes from R^n; its value lies in R^n, so f = (f_1, ..., f_n). sigma(t, x(t)) is an n-by-p matrix, and dW(t) is a vector with p components. The i-th component of this vector is called dW_i, and it is a random variable which is normally distributed with variance delta t, where delta t is a short interval of time. This kind of stochastic process is what determines the forcing term. Moreover, dW_i and dW_j are independent stochastic processes for i not equal to j, so the vector dW(t) is a collection of p Wiener increment stochastic processes which are stochastically independent of each other. What does stochastic independence mean? If x and y are random variables that are stochastically independent, then E[xy] = E[x] E[y], and the joint density is the product of the marginal densities. You can derive many properties from independence; independence is a very strong property. So, using the stochastic independence of this p-component vector, where each component is normally distributed with variance proportional to the small increment in time delta t, we obtain the special class of processes called Wiener processes. That is going to be the random forcing. So x(t) is random both because of the random forcing and because x_0 is random; x_0 is distributed according to the given initial distribution, so 
the initial distribution is random and the forcing is random. The forcing is magnified by the term sigma; sigma is the kind of matrix that scales the effect of the forcing. p is the size of the random vector. The least value p can take is 1, in which case sigma(t) is a vector and dW(t) is a single Wiener increment; at the other extreme p can be n, in which case there are n stochastically independent Wiener increment processes. With p as a variable between 1 and n, equation (7) describes a family of stochastic ordinary differential equations: in general, stochastic models of this kind can be represented in this form, and every equation of this type represents some form of stochastic model. Again, I am not going to prove this; it has been proved in several intermediate-level courses on stochastic processes. What is our interest? Our interest is in studying p_t(x(t)), the probability density of x(t) at time t; this is what we have so far denoted p_t(x(t)), and both are the same. I would like to derive an equation that governs the evolution of this probability density. That equation is the system given in (8). The left-hand side is the same as in Liouville's equation, as you can readily see; the right-hand side is the new term that comes into being, and it depends squarely on the matrix sigma. Please remember sigma is n-by-p, so sigma transpose is p-by-n, and therefore sigma sigma-transpose is n-by-n. Take the (i, j)-th element of this n-by-n matrix, multiply it by the density p, compute the second partial derivative of the product with respect to x_i and x_j, sum over i and j each running from 1 to n, and multiply by one half: that is the right-hand side. If sigma is 0, it reduces 
to Liouville's equation. When sigma is 0, (7) becomes a deterministic differential equation, in which stochasticity enters only through the random initial condition. So you can readily see that the left-hand side captures the stochasticity arising from the random initial condition, and the right-hand side captures the stochasticity coming from the random forcing. This is the very well known equation derived by Kolmogorov in the 1930s; it is called Kolmogorov's forward equation. In some circles in physics, equations of this kind were derived much earlier, in the early 1900s, starting from the days of Einstein, who tried to build a model for Brownian motion; Chapman, Kolmogorov, Fokker, and Planck are some of the names associated with it, and one of the earliest names in the physics literature for this equation is the Fokker-Planck equation. It was Kolmogorov in the 1930s who formalized everything by putting it on a very strong mathematical basis. The physicists had used a lot of good intuition to come up with very good mathematics, but Kolmogorov, as he often did, put everything on a beautiful mathematical pedestal: he developed the axiomatic approach to probability theory, he then developed an axiomatic approach to the analysis of stochastic processes, in particular the class of Markov processes, and he characterized the solutions of Markov processes. The Chapman-Kolmogorov equation is one form; the Kolmogorov forward equation is another equation related to Markov processes. In terms of Markov processes, the model (7) actually defines a continuous-time Markov process. We have already dealt with Markov processes in the context of nonlinear filtering in discrete time; this is in continuous time. So the Kolmogorov forward equation essentially tells you how the probability density of the state evolves in time when the model is subjected to two types of randomness: one coming from randomness in 
the initial condition, the other coming from randomness in the forcing function, the forcing function being of the type induced by Wiener increments. You may ask why we are particularly talking about randomness coming from Wiener increments; that takes us too far into stochastic modeling, which we are not going to get into at this stage. So a general description of a Markov model is given by (7) within the stochastic analysis framework, and it is within this general framework that the Kolmogorov forward equation describes the evolution of the probability density function. You can readily see, and it is worth repeating, that when sigma is 0, (8) reduces to Liouville's equation; therefore equations (6) and (8) are beautifully nested. You can also see the additivity property that comes into being: what happens if I have only initial-condition uncertainty, and what happens if I have both initial-condition uncertainty and randomness in the forcing. Unfortunately, there is no general equation for the evolution of the probability density function when there is stochasticity in the initial condition, stochasticity in the forcing, and stochasticity in the parameters. Analysis of stochasticity in the solution arising from stochasticity in the parameters is extremely hard and has been done only for very special cases; these are called random differential equations with random coefficients. There is a lot of literature within the engineering community, but the methods are ad hoc; there is no one grand theory that combines initial-condition uncertainty, forcing uncertainty, and parameter uncertainty. So uncertainty quantification is the primary goal of stochastic predictability analysis: if you make a prediction, I need to be able to understand the uncertainty associated with that prediction. Uncertainty analysis and stochastic predictability are related disciplines. So we have talked about some of the tools that one has at one's disposal 
to get a handle on this stochastic predictability. Again, you can see that solving the Kolmogorov forward equation is also extremely difficult, because it too is a partial differential equation, and again the mass conservation must be satisfied. Physicists have relied on Liouville's equation and Kolmogorov's equation in the past; for simple systems they have solved them very cleverly in special cases, so you can look into books that deal with stochastic processes in physics and the physical sciences, which discuss stochastic models that have arisen in the context of physical systems, especially those of interest in basic physics and basic chemistry. With this, I believe we have come to the end of our discussion of stochastic predictability. Stochastic predictability is a little more complex than deterministic predictability, because quantifying the evolution of the continuous function p_t(x(t)) is an infinite-dimensional problem: these densities are solutions of partial differential equations which are often extremely difficult to solve numerically. There have been several attempts, and there are reasonably good methods to solve these equations, but they are all rather demanding. Where do you go for good reading on these topics? Liouville's equation is derived in chapter 8 of Saaty's 1967 book, Modern Nonlinear Equations, one of my favorites; chapter 8 is a very succinct chapter that provides a beautiful discussion of the derivation of Liouville's equation. Jazwinski's book, Stochastic Processes and Filtering Theory, contains a very readable derivation of Kolmogorov's forward equation as well as the backward equation. Ludwig Arnold's 1974 book, Stochastic Differential Equations, published by Wiley, contains a very nice introduction to stochastic calculus and stochastic differential equations; that is also a favorite of mine, and I use Arnold's book in my courses on stochastic differential equations, stochastic modeling, and 
related topics. So Saaty is very specific to the discussion of Liouville's equation. Jazwinski talks generally about the whole of stochastic filtering: the Kalman filter and nonlinear filters are discussed at great length in his book, and he also describes in very good detail an introduction to stochastic calculus from an engineer's perspective, so Jazwinski would be a very good starting point for those of us interested in pursuing stochastic modeling and stochastic analysis. Of course, a much more rigorous mathematical treatment of stochastic analysis is the book by Arnold. With this we conclude our short introduction to stochastic predictability, and we have now come to the end of our course on dynamic data assimilation. Let me provide a broad summary of what we have done, where we are, and where you can go. For the data assimilation problem, first you need to create models, so model generation is a topic in itself: there are modelers whose primary aim is to develop models for various physical phenomena of interest. Then there are measurement people who develop methods for observation: they develop various sensors with which they can measure quantities of interest to various disciplines, such as pressure, temperature, humidity, and wind velocity. Observation people are always interested in what we are capable of observing directly: they can observe the energy radiated, they can observe the reflectivity from a cloud; certain observations are direct, certain observations are indirect. So the model describes the state, x for a static model or x_k for a dynamic model, and the observations are z_k. We have already seen that observations are related to the model through a function h, which is called the forward operator, and observations are noisy: z_k = h(x_k) + v_k. Then, once the models are available and the observations are available, we need to be able to calibrate the model. The calibration process is 
called data assimilation. With a static model we simply call it data assimilation; when the model is dynamic we call it dynamic data assimilation (DDA). So data assimilation relates to an estimation process; I am not going to repeat everything, but the entire course was on estimation, and the unifying theme is the least squares method. Once we have the estimate, we want to be able to generate a prediction; once we are able to generate a prediction, we want to be able to talk about the quality of that prediction: the predictability limit in the case of deterministic models, and the quantification of uncertainty in the prediction in the case of stochastic models. That, by and large, is our overall theme, and we largely confined ourselves to the estimation, or data assimilation, part of the whole story. So you can see how many different branches of science are involved: model development, development of observational systems, development of least squares methods for estimation, the ability to predict, and the ability to understand the quality of prediction. Together, all of these form the basic components of predictive science. Predictive science is fundamental to many aspects of human life, and within predictive science we concentrated on the estimation process. But we classified our presentation across various topics of DDA: we talked about the model being static or dynamic, the model being linear or nonlinear, the observations being linear or nonlinear, and we talked about the overdetermined case and the underdetermined case. In the overdetermined case, what is the idea? I have more data than the number of parameters to be estimated. 
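The quantification of forecast uncertainty for stochastic models mentioned above can be made concrete with a tiny simulation. The sketch below uses the Euler-Maruyama scheme on a hypothetical Ornstein-Uhlenbeck model, dx = -theta*x dt + sigma dW, one of the special cases for which Kolmogorov's forward equation has a closed-form answer: the stationary density is normal with variance sigma^2/(2*theta). All parameter values are invented for illustration; the ensemble variance, which plays the role of the forecast spread, should approach that prediction.

```python
import numpy as np

# Euler-Maruyama simulation of the Ornstein-Uhlenbeck SDE
#   dx = -theta * x dt + sigma dW,
# with a random (normal) initial condition.  Kolmogorov's forward
# (Fokker-Planck) equation predicts a stationary variance of
# sigma**2 / (2 * theta); the ensemble variance should approach it.

rng = np.random.default_rng(2)
theta, sigma = 1.0, 0.5            # invented model coefficients
dt, n_steps, n_paths = 1e-2, 500, 20_000

x = rng.normal(0.0, 2.0, size=n_paths)   # random initial condition
for _ in range(n_steps):
    # Wiener increments: independent N(0, dt) draws for each path
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)
    x += -theta * x * dt + sigma * dW

print("ensemble variance     ~", x.var())
print("Fokker-Planck answer  =", sigma**2 / (2 * theta))
```

This is the Monte Carlo view of uncertainty quantification: instead of solving the forward equation for p_t(x(t)), we carry an ensemble of solutions and read the forecast spread off the ensemble.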
If the dimension of the unknown to be estimated is n and the number of observations is m, then m greater than n is called an overdetermined system: I have a lot more data. In satellite meteorology the satellites keep streaming data; in radar meteorology the radar keeps streaming data; I have far more data than the number of variables I need to estimate. So many problems in satellite meteorology and radar meteorology lie in the framework of overdetermined systems. In an underdetermined system there are far fewer observations. For example, I would like to explore whether there is oil, or whether there is gold. How do they estimate the amount of gold in a mountain before starting to excavate? They drill holes, whether for exploration of oil, platinum, or rare metals, and take samples. So they drill some number k of holes, get samples from each hole, and using the samples, the topography, and the geological formation, somebody estimates that within this mountain, of such-and-such volume, there will be, say, 10^5 kilos of gold. Why do I need this estimate? Because, given the current price of gold, I can compute the net worth of the gold buried there; and if I know how much the hidden gold is worth, then I can see how much money I can spend to dig it out and process it and still make a profit, or how much money I can spend to pump oil and still make money. 
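The overdetermined case described above (m observations of n unknowns, m > n) is exactly the classical linear least squares problem. A minimal sketch, with an invented forward operator and made-up numbers:

```python
import numpy as np

# Overdetermined (m > n) linear estimation: m noisy observations of an
# n-dimensional state via z = H x + v, solved by least squares.
# All numbers here are invented purely for illustration.

rng = np.random.default_rng(3)
m, n = 200, 3                      # many more observations than unknowns
x_true = np.array([1.0, -2.0, 0.5])

H = rng.normal(size=(m, n))        # linear forward operator
v = 0.1 * rng.normal(size=m)       # observation noise
z = H @ x_true + v                 # noisy observations z = H x + v

# minimizes || z - H x ||^2 over x
x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)
print("estimate:", x_hat)          # close to x_true when noise is small
```

With many observations and small noise, the least squares estimate recovers the state closely; this is the situation of satellite and radar meteorology sketched above.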
So all the oil companies in the world want to estimate where the oil reserves are and what the amount of each reserve is, and then, based on the estimate and on the cost of production, they decide whether to drill a hole here or there, offshore or onshore, and so on. To drill one hole for petroleum with current technology, I was told it costs anywhere from 12 to 15 million dollars: one hole, of depth about 2 miles. The amount of money one can spend is limited, and therefore the number of observations that can be made is limited. In making predictions about natural resources hidden within the earth, the estimation has to be based on a small set of samples, simply because of the sheer cost of collecting data. In the case of a satellite, a satellite may cost you 200 million dollars, but once you put it up it may be live for 20 years, so you amortize the cost over time and take the advantage over time. So, depending on what is being measured and what the cost of making measurements is, the problem can be divided into overdetermined and underdetermined problems. We have also talked about online versus offline, and we have done various combinations: static linear models with linear observations, underdetermined and overdetermined; online and offline; dynamic models, linear and nonlinear, with linear and nonlinear observations; and we generally talked about overdetermined systems, both offline and online. 
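The underdetermined case sketched above (far fewer observations than unknowns, as with a handful of drill holes) can be illustrated the same way. With m < n the least squares problem has infinitely many exact solutions; a common convention, and the one `numpy.linalg.lstsq` follows, is to return the minimum-norm solution. The numbers below are invented for illustration:

```python
import numpy as np

# Underdetermined (m < n) estimation: only a few observations of many
# unknowns, as in the drilling example.  lstsq returns the minimum-norm
# solution among the infinitely many that fit the data exactly.

rng = np.random.default_rng(5)
m, n = 4, 10                       # only 4 "drill holes" for 10 unknowns
H = rng.normal(size=(m, n))        # forward operator (invented)
z = rng.normal(size=m)             # observations (noise-free toy data)

x_hat, *_ = np.linalg.lstsq(H, z, rcond=None)   # minimum-norm solution
print("residual ||z - H x_hat|| =", np.linalg.norm(z - H @ x_hat))
print("solution norm ||x_hat||  =", np.linalg.norm(x_hat))
```

The residual is essentially zero, since the data can be fit exactly, but the recovered state is only one representative of a whole family; extra information (a prior, a background state) is needed to pin down the rest, which is why underdetermined assimilation problems are harder.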
What are the specific classes of methods we have talked about? We have talked about the deterministic static least squares formulation, the stochastic static least squares formulation, the deterministic dynamic least squares formulation, and the stochastic dynamic least squares formulation. We have talked about online methods, which are sequential methods such as Kalman filtering, and offline methods such as 4D-Var. So we have covered quite a number of topics, and in addition we have emphasized the importance of mathematical tools from finite-dimensional vector spaces, matrix theory, optimization theory, multivariate calculus, matrix algorithms, and optimization algorithms. In about half the course we emphasized the importance of a thorough understanding of the fundamental mathematical principles that lie at the core of data assimilation. Why? Without these fundamental mathematical principles you cannot proceed to do data assimilation. While model building relates to domain-specific knowledge of the phenomena being modeled, and observation likewise requires good physics relating to the design of sensors, in our view data assimilation is a kind of engineering discipline: we are always interested in finding optimal estimates of unknowns. This leads to solving inverse problems, so the key idea is the inverse problem, and while trying to solve inverse problems we have talked about ill-conditioned as well as well-conditioned problems. Inverse problems are generally hard to solve, and a well-conditioned inverse problem is more easily solvable than an ill-conditioned one; we have given instances thereof. 
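The sequential (online) estimation mentioned above can be sketched in a few lines. Below is a minimal scalar Kalman filter for a toy model x_{k+1} = a*x_k + w_k with observations z_k = x_k + v_k; every number is invented, and this is only a sketch of the forecast/analysis cycle, not the full multivariate machinery covered in the course.

```python
import numpy as np

# Minimal scalar Kalman filter for x_{k+1} = a x_k + w_k,
# observations z_k = x_k + v_k.  Toy coefficients, for illustration.

rng = np.random.default_rng(4)
a, q, r = 0.95, 0.01, 0.25        # dynamics, model noise var, obs noise var

x_true, x_hat, p = 1.0, 0.0, 1.0  # truth, estimate, estimate variance
for k in range(100):
    # simulate truth and a noisy observation
    x_true = a * x_true + rng.normal(0.0, np.sqrt(q))
    z = x_true + rng.normal(0.0, np.sqrt(r))
    # forecast step: propagate estimate and its variance
    x_hat = a * x_hat
    p = a * a * p + q
    # analysis step: blend forecast with observation
    K = p / (p + r)               # Kalman gain
    x_hat = x_hat + K * (z - x_hat)
    p = (1.0 - K) * p

print("final estimate:", x_hat, " truth:", x_true, " variance:", p)
```

Note how the filter carries not only the point estimate x_hat but also its variance p, which is precisely the forecast-plus-spread view of prediction advocated in this module.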
So with this I think you get a good understanding of what this subject is all about; the sum is always greater than the parts. There are lots of parts to this story, but the sum total provides a broad overview of the area that has come to be called dynamic data assimilation. The whole course provides an introduction to data assimilation across these dimensions: static versus dynamic, linear versus nonlinear models, linear versus nonlinear observations, overdetermined versus underdetermined, offline versus online, and all the mathematical prerequisites thereof. Now, what is next? Next is a more detailed study of predictability; we only scratched the surface in two lectures. The predictability study of deterministic systems involves deeper results from nonlinear dynamics. A good introduction to nonlinear dynamics will help us understand the variation of the solution with respect to changes in parameters. For example, in x-dot = f(x, alpha) we assumed alpha is fixed, so it behooves us to ask how the solution varies as alpha varies. Say alpha belongs to a set B, a subset of R^p, so B is a p-dimensional set within which alpha lies; for every point in B the differential equation is defined, but it is important to understand how the solution varies as alpha varies over the parameter space. Much of the importance of nonlinear systems lies in the variation of solutions with respect to variation of parameters. Nonlinear dynamics also deals with the stability properties of solutions and the long-term behavior of solutions. If one has a thorough understanding of nonlinear dynamics and of stability, one will then be able to appreciate the behavior of model solutions with respect to changes in parameters. That is very fundamental, because once you change a parameter, some models may change their behavior drastically, and that leads to 
the analysis of chaotic systems; so that is something one can pursue. There are also other methods for approximating the solutions, and there are several of them. One is the unscented transformation; we have not had time to talk about it. There are also particle filters; a particle filter is a specialized form of Monte Carlo-type estimation for doing data assimilation. So there are unscented transformation based methods, particle filter based methods, and then a whole host of hybrid methodologies: once you know several algorithms, you can try to combine the better features of the various algorithms to create newer algorithms for estimation and data assimilation; these are called hybridized algorithms. These are some of the areas one could specialize in. There is a lot of literature on nonlinear dynamics and a lot of literature on the approximation of estimates in the context of nonlinear models in data assimilation, and these have large potential for developing newer results, newer masters and PhD theses, and potential publications. In this course we have confined ourselves to a broad introduction to the various tools and techniques for assimilating data in dynamic and static models. I am available for further help; anybody who is interested may contact me through the email address varajan at ou.edu (spelled in lower case, of course). I hope you find this course useful, and I will be very happy to interact with any one of you who wants to pursue this course online and contact me with your questions, both on the development and on the problems that are given to be solved. I hope you all enjoy and reap the benefits of reading and working through the several parts of the lectures. Thank you for this opportunity. Bye.