In the last couple of lectures, we saw the role of data assimilation and its importance to predictive science. Prediction is forecast. We also saw yesterday that not all processes can be predicted accurately. Some can be predicted rather precisely, but in many cases the forecast will not be perfect. An imperfect forecast is said to have errors. So today's lecture is about a classification of forecast errors, because this classification will help us see how to attack the problem of correcting forecast errors using data assimilation. It will also tell us what kind of errors need what type of tools to correct them, in order to make the forecast better. I would like to remind the reader that forecasting is a fundamental aim of data assimilation and that forecast errors are inherent in every forecast. In order to correct the errors, we need to have a handle on their classification.
To do that, I am going to start with the relation between the truth and the observation. The truth is the true state of mother nature. Observations are data obtained by sensing mother nature's true state. So let x*, a vector, be the unknown true state of the system under observation. For example, today's temperature in the city of London is a true state of mother nature. We are going to observe the true state, and z is called the observation. In general the observation z is an m-vector and the true state x* is an n-vector. The observation and the true state are related through a fundamental mathematical expression. In the linear case, z = H x* + v, where H is an m by n matrix and v is the observation noise. In the nonlinear case, z = h(x*) + v, where h is a nonlinear function. So the observation may be related to the true state linearly or nonlinearly. In either case there are going to be errors corrupting the observation.
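The observation relation just stated can be sketched numerically. This is only an illustration under assumptions that are not in the lecture: the dimensions, the numbers in x_star, the matrix H, the function h, and the noise level sigma are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable

n, m = 3, 2                      # state is an n-vector, observation an m-vector
x_star = np.array([15.0, 1.0, -2.0])   # hypothetical true state x*

# Linear case: z = H x* + v, with H an m-by-n matrix (values are illustrative).
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5]])

# Nonlinear case: z = h(x*) + v, with h mapping R^n to R^m (illustrative choice).
def h(x):
    return np.array([x[0] ** 2, np.sin(x[1]) + x[2]])

sigma = 0.1                              # assumed noise standard deviation
v = sigma * rng.standard_normal(m)       # additive Gaussian observation noise

z_linear = H @ x_star + v
z_nonlinear = h(x_star) + v
```

In practice only z is available. The sketch can subtract v to recover H x* only because it generated v itself, which is exactly what a real observer cannot do.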
We are assuming the errors are additive in nature. That is a simple way of dealing with observational errors, and this way of treating observational errors as an additive process has been around ever since the days of Gauss, which we talked about in the last class. So you can readily see that if you want to know the true state of mother nature, you can only sense it through devices. The devices output z, and the input to the devices is x*. So z contains information about the true state of mother nature x*, but it is corrupted by additive noise. We say z contains the information modulo the observation noise v.
This observation noise is in general unavoidable. It is also unobservable. In what sense? We will not be able to separate H x* or h(x*) from z. If we were able to separate h(x*) from z, we would have a filter that filters out the noise. In general such a filter is not easy to develop, because we may not know very precisely all the properties of the noise. We generally assume it is Gaussian distributed, that it is white, and so on.
So if you want to know the true state of mother nature, you have to observe her evolution. Observations contain the secrets about mother nature. That is not unusual. When you do not feel well, you go to the doctor. The doctor wants to estimate the true state of your physical system, and that estimate is obtained by making observations: blood pressure, temperature, various kinds of tests, and so on. So observations are indicators of the underlying true state of any system, be it human or nature.
Now let us turn to the other side, the model. What are models? Models represent abstractions of reality. Models represent our understanding of how mother nature works. A model reflects our cumulative understanding of the working of the system.
The need for a model stems from the fact that the forecast products are generated from model states or solutions. So to do forecasts, models are necessary. Models represent our understanding of mother nature. Our understanding is sometimes close to being perfect and sometimes not. A model and its solution in general depend on a number of factors pertinent to the behavior of the system being modeled. We have already talked about the role of parameters in models, and so on.
So now here come the two facets. There is reality as it is, and our sensing of reality in terms of observations. Models represent our understanding of how reality probably works, and there is probably a gap between the two. It is this gap between the actual reality and our understanding of how reality works that leads to forecast errors. If the model is perfect, the forecasts are perfect. If there is a gap between the model and the reality, that gap shows up in the form of forecast errors.
Now I would like to classify this gap between our understanding of reality and actual reality itself, and to emphasize the intrinsic dependence of the model solution on various factors. We have already seen that, for a dynamic model, the solution depends on the initial condition, the parameters, and the boundary conditions. So the model solutions are contingent on the values that we assign to these variables, because these variables control the model solution. Anybody who has done anything in differential equations knows that a differential equation has a general solution, and you get a particular solution by specializing the initial conditions. If we change the initial condition, the solution changes. In other words, the initial condition controls the evolution of the solution, and changing the parameters also controls the evolution of the solution.
So anything that can change the model solution is called a control variable. Control in principle refers to all the factors collectively that affect the evolution of the model solution, be it a static or a dynamic model. Let script C refer to a subset of R^L, where L is an integer and R^L is the space of real vectors of size L. That means any vector c belonging to script C is a control vector of dimension L, where L is the number of controls. Script C is called the control space. Every point in the control space corresponds to one solution of the model. If you change the control vector, the model solution changes. So ultimately the behavior of the forecast depends on the value of the control that we use.
I would like to quickly remind you that the control consists of initial conditions, boundary conditions, and parameters, anything that is part of the model. If I change any one of these factors and the solution changes, I call it a control. In a static model, the control represents only the physical parameters. In a static model there is no initial condition and there is no boundary condition. There is a bunch of parameters, and that is all. In that case we call the parameters alpha, where alpha is a p-vector, so L is equal to p and the control space is essentially the parameter space. In general, I want you to remember that the parameter space is only a subset of the control space. The control space consists of initial conditions, boundary conditions, and parameters, and the parameter set is a subset of all the controls. So in the static model there is no initial condition and no boundary condition, simply parameters. In the dynamic model, the control is a union of parameters, initial conditions, and boundary conditions.
So L is equal to p + n + q. Where is this coming from? Here p is the number of parameters, n is the number of initial conditions, and q is the number of boundary conditions. In nonlinear differential equations there is a branch called bifurcation analysis. In bifurcation analysis, script C represents the parameter space, and the analysis studies how the behavior varies as the parameters vary in that space.
So as c varies in script C, we get different instantiations of the model. Anybody who knows differential equations knows that if the initial condition changes, it represents a different model, and if the parameter changes, it represents a different model. So for a particular choice of the control we get one model within the class, and we call it an instance of that model, the instance being picked by the values of the control. The set script C in a sense denotes the set of all models. So now we are considering not one model but a class of models.
In general in science, a model does not mean one model. A model means a class of models. When we say primitive equation model, it is not one model but an infinite collection of models. The barotropic vorticity equation model is not one model but a collection of models, and likewise the shallow water model. The same holds for the harmonic oscillator. The harmonic oscillator is a generic thing: if we change the frequency the model changes, if the initial condition changes it changes, if you add friction it changes, and if you add forcing it changes. So by model we always mean an infinite class. I have to pick from this infinite class a particular model that can be utilized. The picking of the particular model from the class is done by specifying the values of the control, and the control consists of parameters, initial conditions, and boundary conditions, whatever applies, whether the model is dynamic or static.
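The harmonic oscillator example can be made concrete with a small sketch. The ordering of the control vector as (omega, x0, v0), the function name oscillator_solution, and all numerical values are assumptions chosen for illustration. The model here is the undamped, unforced oscillator, so there is one parameter, two initial conditions, and no boundary condition.

```python
import numpy as np

def oscillator_solution(c, t):
    """Displacement of an undamped, unforced harmonic oscillator at times t.

    c = (omega, x0, v0): one parameter (frequency) and two initial
    conditions (initial displacement and initial velocity).
    """
    omega, x0, v0 = c
    return x0 * np.cos(omega * t) + (v0 / omega) * np.sin(omega * t)

p, n, q = 1, 2, 0      # parameters, initial conditions, boundary conditions
L = p + n + q          # dimension of the control space

t = np.linspace(0.0, 10.0, 101)
c_a = np.array([1.0, 1.0, 0.0])   # one point in the control space
c_b = np.array([1.2, 1.0, 0.0])   # same initial state, different frequency

x_a = oscillator_solution(c_a, t)   # one instance of the model
x_b = oscillator_solution(c_b, t)   # a different instance of the same class
max_gap = np.max(np.abs(x_a - x_b)) # nonzero: changing c changed the solution
```

Each control vector picks one instance out of the class. Here the two instances start from the same state and drift apart only because the frequency differs.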
So now, with this as a background, I am going to define a classification of forecast errors. Let c be an instance of the model. In other words, I am representing a model by the choice of the control vector. So if c is a control vector, let x(c) denote the solution. As c varies, x(c) varies. Define z^m = h(x(c)). What does this mean? Here x(c) is the model output, h is the observation operator, and z^m is the model counterpart of the observation z.
Now I would like to distinguish the model counterpart, or model predicted observation, from the actual observation. The actual observation z comes from the meter that I read: a satellite, a radar, a voltmeter, an ammeter, whatever it is. But if I know the state and I know h, I can also predict what the model predicted observation ought to be. So there are two versions of the observation, the actual observation and the model predicted observation.
Let c* be the c such that x(c*) is x*, the true state. If I assume the model is perfect, my class of models includes the perfect model. The perfect model has to be parameterized, so let c* be the control that gives me the perfect model. I do not know what c* is, but I am assuming such a c* exists. If your modeling is good, such a c* exists; there is no question about that. So this is where the whole thing lies. There is a c* that corresponds to the true state of nature, where the model matches mother nature. But the c I have picked may not be c*. If c is not equal to c*, then x(c) is not equal to x(c*). Here x(c*) corresponds to the true state of nature and x(c) is the state predicted by the model that corresponds to the control c. These two in general need not be the same, and that is where forecast errors fundamentally arise.
So the forecast error can now be defined as the error in the model induced by the control vector c: e_f(c) = z^m - z. What is z^m? It is the model counterpart of the observation. And z, without any superscript, is the actual observation of mother nature's true state. The difference between the two is what I just talked about: what the model sees against what mother nature has. That difference is called the forecast error.
So the forecast error now has a description. Since z^m = h(x(c)) and z = h(x*) + v, we get equation 3: e_f(c) = [h(x(c)) - h(x*)] - v = b(c, c*) - v. The first term I am going to call b(c, c*), and v is the observation noise. What is this b? It is the deterministic part of the forecast error. So the forecast error consists of two parts. One is due to the unavoidable, unobservable random error v. The second is the deterministic part, which arises largely because of my inadequate knowledge about what mother nature does. She uses c*, I use c. If c is not equal to c*, there is going to be a bias. So you can think of b as a bias in the forecast, and the bias is a function of c and c*.
So this is the general framework. In other words, the forecast error always consists of two parts, a deterministic part and a random part. I also want to quickly add that we cannot touch the random part. We cannot annihilate it. The random part stays with the observation. So what is the best you can do if you want to reduce the forecast error? The only thing you can hope to do is annihilate b. If you can annihilate b, then you will be left only with the random error, which is uncontrollable.
So what is the basic idea of forecast error classification? I would like to understand what part of the forecast error I can control and what part I cannot control. We can only deal with the things that we can control. Do not worry about things that you have no control over.
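The split of the forecast error into a bias and a noise term can be checked numerically on a toy problem. Everything specific below is an assumption made for illustration: the toy model x(t) = x0 cos(omega t), the observation operator h, the values of c and c_star, and the noise level.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 5.0, 20)          # observation times

def model_state(c, t):
    omega, x0 = c                      # toy model: x(t) = x0 * cos(omega * t)
    return x0 * np.cos(omega * t)

def h(x):
    return x + 0.1 * x ** 3            # illustrative nonlinear observation operator

c_star = np.array([1.0, 2.0])          # control that nature uses (unknown in practice)
c = np.array([1.1, 1.8])               # our guess of the control
v = 0.05 * rng.standard_normal(t.size) # additive observation noise

z = h(model_state(c_star, t)) + v      # actual observation
z_m = h(model_state(c, t))             # model counterpart of the observation

e_f = z_m - z                                           # forecast error
b = h(model_state(c, t)) - h(model_state(c_star, t))    # deterministic part b(c, c*)

# With the perfect control, the bias vanishes and only the noise remains.
e_f_at_c_star = h(model_state(c_star, t)) - z
```

The arrays satisfy e_f = b - v, and at c = c_star the forecast error reduces to -v. In a real problem neither b nor v can be computed separately, because c_star and v are unknown; the sketch can do it only because it manufactured both.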
So the separation of the forecast error into a deterministic part and a stochastic part is very helpful in trying to design schemes for correcting forecast errors. That is the motivation for this. Now please go back: e_f(c) is given by equation 3, so you can think of it as e = b - v. Given e_f(c), what is it that we would like to do? What is the concept of forecast error control?
The basic idea is this. Here is a c. Using c, I generate a solution and get a forecast, which I will simply call x(c). But there is x*, the unknown true state, and x* is different from x(c). So the question is, how do I change x(c) to x*? We know x* depends on c*, so the only way to move x(c) to x* is to change c to c*, and that can be done by adding a perturbation delta c.
So if you want to annihilate the error, you have to change the control. You have to add a correction delta c to c. If c + delta c is equal to c*, the model counterpart becomes h(x(c*)), which is h(x*), the image of the true state. The observation z contains the same h(x*), so the truth minus the truth cancels itself, and what is left is v, the uncontrollable, unavoidable noise. In that case the forecast error is purely random.
So fundamentally, how can we improve the quality of the forecast? The only way is to find an increment delta c to the control which, when added to the control c, will annihilate the deterministic part of the forecast error. That is the fundamental relation one needs to bear in mind. This can be represented pictorially as follows.
In the control space, c represents the current belief about the model, and c* is the unknown truth. If I use c, I have picked a model solution x(c), and x(c) gives me the model counterpart z^m. But c* has x*, which gives the observation z. So how do I minimize the difference between z^m and z? In order to minimize the difference between z^m and z, I should minimize the difference between c and c*. That is where the control lies. So what is the increment I should add to the control in order to force x(c) closer to x*, which will in turn make z^m closer to z? This pictorial view is the basis for the classification of forecast errors.
Look at this mathematically now. As c tends towards c*, x(c) will tend towards x*, which then implies z^m will tend towards z. When z^m tends towards z, my model reflects mother nature, and I cannot do any better than that. This picture essentially tells you how one can hope to control the forecast error, which is largely due to the difference between the model forecast and the true state of the system.
With that as a basis, I am now going to provide the actual classification we have been talking about. Case one: the model is perfect. That means I have a thorough understanding of mother nature, but I did not pick my c to be equal to c*. I may have a total understanding of the process, but I may not know the initial state of mother nature, so c is not equal to c*. In this case the forecast error is largely due to the incorrect control. The model is capable of replicating mother nature, but I did not know the actual control mother nature uses. I only guessed it. So c is my guess and c* is her choice, and the difference between c and c* is reflected in the forecast error.
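For case one, the idea of finding an increment delta c that pulls c towards c* can be sketched with a generic Gauss-Newton iteration. This is not the forward sensitivity method or any other specific method from the lecture; it is a stand-in chosen because it is short. The toy model, the identity observation operator, the finite-difference Jacobian, and all numerical values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 5.0, 50)

def predicted_obs(c, t):
    # Perfect-model assumption: nature follows the same formula with c = c_star.
    omega, x0 = c
    return x0 * np.cos(omega * t)      # h is taken to be the identity here

def jacobian(c, t, eps=1e-6):
    # Finite-difference sensitivity of the predicted observations to the control.
    base = predicted_obs(c, t)
    J = np.empty((base.size, c.size))
    for j in range(c.size):
        dc = np.zeros_like(c)
        dc[j] = eps
        J[:, j] = (predicted_obs(c + dc, t) - base) / eps
    return J

c_star = np.array([1.0, 2.0])                      # nature's control (hidden in practice)
z = predicted_obs(c_star, t) + 0.05 * rng.standard_normal(t.size)

c_guess = np.array([1.1, 1.8])                     # our initial guess of the control
err_before = np.linalg.norm(predicted_obs(c_guess, t) - z)

c = c_guess.copy()
for _ in range(10):
    residual = z - predicted_obs(c, t)             # equals minus the forecast error
    delta_c, *_ = np.linalg.lstsq(jacobian(c, t), residual, rcond=None)
    c = c + delta_c                                # add the correction to the control

err_after = np.linalg.norm(predicted_obs(c, t) - z)
```

After the corrections, c sits close to c_star and the remaining forecast error is of the size of the noise, which no choice of control can remove.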
So in this case e_f(c) = b(c, c*) - v. Almost all the standard formulations of the data assimilation problem for deterministic static and dynamic models are of this type. What is it that we assume? We assume the model is perfect. Let us talk about that for a moment. No modeler believes that their model is incorrect, because if it were not correct they would not use it. If I am going to use the barotropic vorticity equation to describe a hurricane, scientists know that it captures 90 to 95% of reality and is very close to being perfect. If a scientist does not have that confidence in the model, they will not use it.
So much of the development in the forecast literature falls into this category, namely, the models are assumed to be perfect. Even with a perfect model, if there is a forecast error, it is largely due to the difference in control. If there is a forecast error only because c is not equal to c*, I have the ability to change the control and thereby force the forecast error to be purely random. So most of the standard formulations of data assimilation fall into this category. The well-known 3D-Var, 4D-Var, the forward sensitivity method, and nudging are some examples of data assimilation methods that fall under this category.
Case two: this is a much more difficult case. The model is imperfect, and my control is also not the same as c*. So there are two kinds of errors, one coming from the model not being perfect and another coming from the control not being perfect.
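The difference between the two cases can be illustrated with a noise-free toy comparison. All of it is assumed for the example: nature is taken to be a damped oscillator, the model is an undamped one, the observation operator is the identity, and a brute-force grid search over the control stands in for a real correction method.

```python
import numpy as np

t = np.linspace(0.0, 5.0, 50)

def truth(t, gamma, omega=1.0, x0=2.0):
    # Hypothetical "nature": an oscillator with damping rate gamma.
    return x0 * np.exp(-gamma * t) * np.cos(omega * t)

def model(c, t):
    # The model has no damping term, so it is perfect only when gamma = 0.
    omega, x0 = c
    return x0 * np.cos(omega * t)

omegas = np.linspace(0.5, 1.5, 101)    # grid over the frequency control
x0s = np.linspace(0.5, 3.0, 101)       # grid over the initial-condition control

def smallest_bias(target):
    # Smallest norm of the deterministic error over the whole control grid.
    best = np.inf
    for omega in omegas:
        for x0 in x0s:
            bias = model((omega, x0), t) - target
            best = min(best, np.linalg.norm(bias))
    return best

bias_case_one = smallest_bias(truth(t, gamma=0.0))   # perfect model
bias_case_two = smallest_bias(truth(t, gamma=0.3))   # imperfect model
```

In the first case some control on the grid drives the bias essentially to zero. In the second case no control does, because part of the error comes from the missing damping and not from the choice of c.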
So there are two types of errors that are confounded. These confounded errors are very difficult to separate. We cannot say this part of the error is due to the model and that part is due to the control. This confounding is a large headache. The forecast error is a confounding of the model error and the control error. In this case we need much more powerful techniques, and this is the most difficult case that one can deal with. It can be handled in one of many ways, depending on how one wants to postulate the correction to the model error. In other words, you have to decide to correct the model error in a particular way, and the way you choose to correct the model error will dictate the method by which you correct the forecast error.
This view of data assimilation as forecast error correction was proposed in a paper by Lakshmivarahan and Lewis in 2010. It is the basis for the paper on the forward sensitivity based approach to dynamic data assimilation. These three error classifications have been the subject of that paper, and the forward sensitivity method we proposed is one of the methods by which we can correct forecast errors as in case one. With that, we have concluded the analysis of the classification of forecast errors. Thank you.