So what I want to start this afternoon, and then continue tomorrow morning, is the discussion of what's referred to as data assimilation. I think I mentioned on the first day that the first data assimilation meeting for ecologists I went to spent most of its three days arguing about what ecologists meant by data assimilation, but in the physical environmental sciences it has a very specific meaning, which is this process of iteratively updating models as new information becomes available. That is different from calibration, which is constraining your model's parameters with data; I've known ecologists who want to call that data assimilation. It's also different from what I am calling data fusion, which we'll talk about on Friday morning, which is when we try to use multiple different types of data constraints simultaneously because we're trying to combine multiple sources of information. We've talked about calibration, we've talked about propagating uncertainty into forecasts and understanding it, so now we have the idea that we've made the forecast, new observations become available, and we want to assimilate that new information back in, feed it back to have additional constraints on the state of the system that we can then use to update our forecasts.

So what we're going to focus on over the course of these two lectures is this idea of what I'll call the forecast-analysis cycle. It's this iterative process, again, of predicting the future given our current understanding of the system: here's our current uncertainty, we project that from time one to time two, we compare that to observations, and we then do a statistical analysis to update that state. So the analysis is the process of combining our predicted understanding with new observations to update that understanding. In terms of those two steps, we essentially spent the morning talking about how the forecast step works: everything we talked about this morning, about how we propagate uncertainty in making a forecast, that is the forecast step. So I want to focus now on the analysis step, which is, given that, how do we do that update?

So, prior to observing how the future plays out, what is our best estimate of the future state of the system? If the state variable we're interested in is x, what's our best estimate of x at time t plus one? If you felt x at time t was your best predictor of x at t plus one, then you've subscribed to a random walk model. The point here is that prior to observing how the future plays out, our forecast is our best estimate of the future state of the system. That's why we made it. Everything we know about what we think is going to happen in the future is what we want to build into our forecast. Before the future becomes the present, that is our best understanding of what we think the future could be. If it wasn't, we would improve our forecast: if we knew something that wasn't in our forecast, we'd want to put it in our forecast. So we assume that we're putting all of our understanding into our forecast model, that it's accounting for all the uncertainties we have about the system, and so that's our best guess of the future. It's our forecast probability of x at time t plus one. Okay, so now we make observations of the system, y, but those observations are imperfect. Now what's the best estimate of the state of the system? Some balance between the two. Yes, and importantly, it's not the data.
So I think a lot of folks, before they get introduced to how data assimilation works, have the intuition that if I make a prediction and then observe some data, I should just reset my model to the data. That seems reasonable, but the data isn't perfect. The data is noisy. Maybe we shouldn't trust the data completely. We should instead think about how we combine information. So an example would, say, come from Catherine's work on phenology. She's trying to predict, say, the percentage of leaf on and leaf off through the season. If I look at a MODIS image right now, I'll get an estimate of LAI every week. Throughout the summer, that's going to be bouncing around because there's non-trivial observation error in the MODIS product. Should our estimate of whether leaf on has occurred or not bounce around mid-summer? No, we should trust the model that says it is the middle of the summer. All you're seeing there is observation error; you're not seeing any process. If you caused your model to follow all of the wiggles in that, you would just be following noise. So we don't want to just follow the noise in the data; we want to balance the data against the model.

So instead we think about, well, how do we get that? What do we want? Instead of just wanting the probability of x, which is where we started, our forecast, we now want to understand the probability of x given some new observations. So how do we get to the probability of x conditioned on some new data? Can we calculate this directly, the probability of the latent state x, the true state of the system, given the data? Not really. But what we can definitely calculate is the probability of observing some real-world data given some state variable. What do we call this? The observation model... the likelihood. That's what I was thinking: this is the likelihood. So if I have a likelihood for y given x, but what I really want is x given y, how do I get from one to the other? Bayes. Bayes, yes. We need Bayes' theorem. If we want to update our understanding of the state of the system given new data, we need Bayes' theorem: we need a prior and a likelihood to give us this updated posterior. The likelihood, in this case, yes, is mostly that observation error model. Where's the prior? The forecast. Good, you're paying attention.

So this, in a nutshell, is actually the key insight of how data assimilation works. Most of the time when we apply Bayes' theorem, we spend a lot of time thinking about the likelihood, about bringing the data in. Priors are sometimes uninformative; sometimes we have some prior data; sometimes we use expert elicitation. The prior gets some weight, but often not a lot, and it can come from lots of places. But in a forecast system, the prior is really important, because we just said it embeds everything we know about the system up to this point. It can be highly informative. And it's what we want to update. So the key idea is that the analysis problem is just Bayes' theorem, and it's Bayes' theorem where the forecast itself is providing the prior. I should say the forecast, not the model, because it's the forecast with all of the uncertainties propagated in that is our prior. So that's the key concept. Pretty much everything else after this is detail.
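To write that out compactly (this is my shorthand summary, not one of the slides), the analysis step is just:

\[
\underbrace{p(x_{t+1}\mid y_{t+1})}_{\text{analysis (posterior)}}
\;\propto\;
\underbrace{p(y_{t+1}\mid x_{t+1})}_{\text{likelihood (observation model)}}
\;\times\;
\underbrace{p(x_{t+1})}_{\text{forecast (prior)}}
\]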
It's detail and examples: armed with that key concept, you could actually go forth and say, I can write down the prior for my model out of that forecast however I want, I can write down the likelihood however makes sense, I can do this analysis using whatever Bayesian computational tools I have available to me, and I can customize it. What we're going to see over the rest of this lecture and the beginning of the next lecture is largely, in some ways, the history of how people have dealt with this problem through a series of specific special cases, specific simplifying assumptions that deal with certain problems efficiently. But one of the take-homes that we will return to at the end of this is to remember that there is nothing sacred about the current class of data assimilation models that you will find in the literature. One of the things I said earlier, I think I've said it multiple times, is that the data assimilation tools we borrow from the atmospheric sciences were tools optimized for one type of forecast problem. We might not have that problem. So it's completely fair game to say: if we take this concept, we might, as ecologists, optimize it differently for the different classes of problems we have.

With that, I want to dive into some of the ways this actually has been used. So again, conceptually: we have some estimate of the state, we forecast it into the future, we make some observations, and now we use Bayes' theorem to combine the two to get our updated estimate. That then serves as the initial condition for our next forecast. On the previous slide I showed a case where the data and the model don't completely agree with each other, so you're going to find a weighted balance between the two. But if you're doing this well, if you're forecasting well, the forecast and the observation shouldn't be wildly different, and what you're getting is this case where they're reinforcing each other. If your forecasts are good, it's not a matter of jumping to the data and ditching what the model predicted; it's more the data reinforcing the forecast. But if your model's wrong, then the data should pull you back to reality.

I'm going to start by walking through what I believe to be the simplest possible case of data assimilation as the starting point. It'll help you get your head around it, and it also will be the foundation for more complicated cases. So, simplest thing: whenever statisticians want to make life as simple as possible, they start assuming that everything that can be Gaussian distributed is Gaussian distributed, so we'll do that. We'll assume that our forecast is Gaussian distributed with some mean and variance. And as a place to get started, since we're starting at the analysis step, we'll assume that we know what our forecast is; it would be kind of hard to do data assimilation if you didn't know what your own forecast was. We're also going to assume that the observation is, again, Gaussian distributed with some observation error. There's no bias correction in here, there's no kind of calibration model in here; it's just that observations are distributed around the true value with some observation error. So again, the simplest possible observation error model. That's our likelihood, or data model. So we're going to assume that our observations are known, which seems like a good assumption, and again that the forecast mean and variance are known and that we know the observation error.
We'll assume that the observation error has been reported; we're not trying to estimate the observation error as we go. So the only unknown left here is the state of the system that we're trying to update. Now, this is a case of a normal prior times a normal likelihood. It's a conjugate combination of prior and likelihood, so it has an analytical solution. If you read chapter five, where I introduce Bayes, there's a box that goes through the derivation of normal times normal as an example of conjugacy. It wasn't an accident that I went through that derivation, because I knew I was going to need it here. So this is the actual posterior, but I should point out that I can rewrite it in a way that makes it a little easier to understand. Specifically, let's call P_f = 1/tau² the precision of my forecast and P_o = 1/sigma² the precision of my observations. If I do that, then the precision of my analysis is just the precision of the forecast plus the precision of the observations: P_a = P_f + P_o. And if you stare at this a while, you'll realize, yeah: one over that, one over that, added together, one over the whole thing. It just looks nasty when you're dealing with it in terms of variances, but if you deal with it in terms of precisions, all we're doing is adding the precisions together. Here I'm also simplifying to the case where I have only one observation, but I can deal with n observations just as easily.

Now the mean can similarly be simplified: the mean of the analysis involves the mean of the forecast and the data, and each of those is weighted by its precision relative to the total, mu_a = (P_f/P_a) mu_f + (P_o/P_a) y, where the precision of the analysis is just the total precision. So this works out to saying that, in the simplest example, normal forecast and normal observation error, the mean is a weighted average of the forecast and the data, weighted by their precisions: whichever is more precise gets more weight. And the precision of the analysis is the sum of the precisions, again that idea that they are self-reinforcing; the analysis is always more precise than either alone because it combines those two pieces of information. The posterior is just the weighted average of the prior and the data, each weighted by its precision.

So this has a couple of consequences. If we have imprecise data, then the posterior from the analysis will stay pretty close to the forecast. That was, again, the idea: if you have noisy data, you don't want to just be following the noisy data. You want to buffer the degree to which you follow the noisy data based on the degree to which the model says, yes, this is or is not a time where I expect to be certain. If I'm very confident that it's the middle of the summer and the data is really noisy, I largely ignore the data; that's actually the right thing to do. Likewise, if the model is not very confident but the data is really good, then your posterior is mostly going to look like your data. I think I said this earlier in the week, but I'll reinforce it here.
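Here's a minimal sketch of that precision-weighted update; the function and argument names (mu_f, tau2, y, sigma2) are just placeholders for the forecast mean and variance, the observation, and the observation error variance.

```python
def analysis_update(mu_f, tau2, y, sigma2):
    """Univariate normal-normal analysis: combine a Gaussian forecast
    N(mu_f, tau2) with a single observation y ~ N(x, sigma2).
    Returns the analysis (posterior) mean and variance."""
    p_f = 1.0 / tau2                      # precision of the forecast
    p_o = 1.0 / sigma2                    # precision of the observation
    p_a = p_f + p_o                       # precisions add
    mu_a = (p_f * mu_f + p_o * y) / p_a   # precision-weighted average
    return mu_a, 1.0 / p_a

# Noisy data (large sigma2): the analysis stays close to the forecast.
print(analysis_update(mu_f=10.0, tau2=0.5, y=14.0, sigma2=50.0))
# Precise data (small sigma2): the analysis moves toward the observation.
print(analysis_update(mu_f=10.0, tau2=0.5, y=14.0, sigma2=0.1))
```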
Even if you never do data assimilation, ever, an important take-home message about data assimilation is that, to be able to do it, we need to know the uncertainty in the data and we need to understand the uncertainty in the models. We cannot combine data and models if we don't know the uncertainties in each, and those uncertainties need to be robust and as complete as we can make them. If you have an estimate of the observation error in the data, but it's only considering part of the error and it's actually leaving tons of stuff out, then you're giving the data way more weight than it should get. Likewise with the model: you guys just saw propagating five sources of uncertainty into a prediction. If you propagate only one of those and you choose the wrong one, you may be way overconfident in your model. Let's say you only consider the initial condition uncertainty in the model you used in the last exercise; you would be giving that model huge weight that it didn't deserve. If you failed to include the parameter error, the process error, the driver error, you'd be giving that model way too much weight. So doing this in theory is simple: it's just a weighted average, we're done. Doing this in practice is about becoming obsessive about characterizing these uncertainties correctly, which is why we started the course with characterizing uncertainties, because you can't do data assimilation and forecasting without it.

So if that's our simplest possible analysis, let's talk through our simplest forecast. Technically our simplest forecast would be a random walk; I'm going to make this slightly more interesting by assuming that the process model has some slope to it, m, some rate of change. That might be decaying back to a background rate, it might be a rate of growth, but we're going to assume there's some constant: x at t plus one is some constant slope m times the current state, plus process error. If m is one, this is just our random walk model. We're going to assume that the process error is normally distributed with mean zero, so there's no bias in the process error, and with variance q. The reason for q is murky in the depths of the data assimilation literature, but it's always q for some reason, so we're going to stick with it because you'll see that in other literature. At this stage, with the simplest possible model, we're going to assume that the model's parameters and the process error are known. I'll point out here that it is not hard to augment what we're doing to consider that slope to be an unknown that needs to be updated. It is very hard to consider this q to be an unknown that needs to be updated. The reason is that, very similar to the math we worked out here, when we combine normals they end up as weighted averages of their uncertainties. If you put q in the numerator and q in the denominator, it blows up; you can't constrain it. If I put m in the denominator, fine, no problem, I can update m, but I can't update q if q is the weight, at least while only assuming normal-normal. There are ways to get beyond that, but it requires adding terms to the model.
In fact, this is something, I think I have in tomorrow's lecture some examples of doing this with some mechanistic process models with Megan's lab mate, and Reho and I have spent a lot of time obsessing over the fact that, in ecological problems, I would totally believe that we understand the data, that we could figure out the observation error. I might even be able to wave my hands and believe that I could constrain the model somehow prior to starting. But the idea that I knew the process error before I started is just ridiculous; there's no ecological problem where we know the process error before we start.

Process model, process error. In this case we now need to consider the initial conditions as what's going in, because we have a model that doesn't have drivers, so we don't have that term; it doesn't have random effects, so we don't have that term; and again we've assumed the parameters are known, so that's not a term either. So of the five terms we need to worry about, we have initial conditions and process error. What are the initial conditions? Well, that's what came out of the analysis, so I'm keeping this mu_a and tau_a, where that tau_a is just the analysis precision turned back into a variance, P_a = 1/tau_a².

So how do we figure out what our forecast is? We went over five different ways to make a forecast this morning; anyone want to think about which one might be useful here? You could technically use all of them, but one of them gives you the most elegant answer: yeah, we can derive this analytically. We can derive this analytically because this is normal, this is normal, and this is linear. Since it's linear, the analytical approaches to propagating the moments will work, and since everything is Gaussian, the Gaussian is fully defined by its mean and variance, so if we just know that mean and variance, then we know everything we need to know; we're not losing any information in higher moments. And since it's a linear transformation of a normal, it's going to come out normal on the other side, so we're not losing anything there either.

So how would we do that? Well, we kind of saw that this morning. The expected value of x at t plus one is the expected value of the process model. The expected value of the error is zero, and the expected value of the model is the constant times the expected value of x, which is mu. The variance: we've done this derivation already. The variance of x at t plus one is the variance of a linear model, so if we do this out in full, it's the slope squared times the variance of x, plus the variance of the error, plus two times the covariance between them. Now I'm going to make a simplifying assumption, which is that there is no covariance between the process and the error; it's actually not uncommon to find that your residuals are not correlated with your process. So that's approximately m squared times the variance plus the residual variance. And the variance of x is there because it's x at that point, our initial conditions; this is the variance in our initial conditions, which is what came out of the analysis; m squared comes from the process model; q is the process error. So our forecast involves initial condition uncertainty, process error, and the process model. Worth pointing out: if you had a linear model that did include other uncertainties and other processes but remained linear, you could expand this out to the full five-term derivation I did before just as easily. This is just the classic derivation for the simplest forecast model, but if you had a driver in that model, you'd just add that term in; it's going to again end up with some slope times the uncertainty in your drivers, and the same with your parameters, and the same with any random effects.
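Written out, the two forecast moments from that derivation, assuming the analysis at time t is N(mu_a, tau_a²) and the state-error covariance is negligible (my transcription of the algebra, not the slide itself):

\[
\begin{aligned}
\mu_f &= E[x_{t+1}] = E[m\,x_t + \varepsilon_t] = m\,\mu_a,\\
\tau_f^2 &= \operatorname{Var}[x_{t+1}] = m^2\operatorname{Var}[x_t] + \operatorname{Var}[\varepsilon_t] + 2m\operatorname{Cov}(x_t,\varepsilon_t) \;\approx\; m^2\,\tau_a^2 + q.
\end{aligned}
\]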
And so, overall, the probability of x at time t plus one, our forecast, is normal with a mean that's m times the analysis mean at time t, and with our forecast variance. So now we have the capacity to put these things together: we have a forecast step, where we update x from t to t plus one, and we have an analysis step, where we then update x as I wrote on the board here, again putting this in precision notation, summing the precisions and taking a weighted average of the data and the forecast. This has an iterative analytical solution. You can't write the whole thing out in closed form, but you can iterate between these two steps in a completely analytical way. There's no need to do sampling, there's no need to run ensembles: run the model once and propagate all the uncertainties analytically.

Cool. This analytical solution has a name: this is known as the Kalman filter, named after Rudolf Kalman, who got a Presidential Medal for it. Now, you don't get a Presidential Medal for just doing a bit of clever math; that bit of clever math actually enabled an enormous amount of technology in our society, and I'll come back to some of the examples of where the Kalman filter has been applied in the real world later in this lecture. I also cannot help but snark on the fact that I could not imagine the current administration giving a scientist any credit for anything; we lived in a more enlightened time.

So this is an example of applying a Kalman filter, just focusing on the means. Here the solid line is the true state of the system, the dots are data, and the data don't fall on the truth because some observation error has been added to them. The filter is the Kalman filter, trying to reconstruct the dynamics of that system through time, and it does a pretty good job of following it. When you get to places where there are gaps in the data, it kind of has no choice but to project across them. This was simple AR(1) simulated data fit with a Kalman filter where the slope was known. Now, that last image didn't have the confidence interval on it, so here is a separate plot looking at how the uncertainty in the Kalman filter evolves over time. What we see is that when there is data, the uncertainty reaches a steady state, an equilibrium, because in this particular case the observation error is not changing through time and the process error is known, so it reaches a steady state. These spikes correspond to those periods of missing data: when you have missing data, the analysis uncertainty increases during those data gaps.

Now, if you remember yesterday when Shannon talked about state space models, they had this behavior where there was a confidence interval around the data, but if there was a gap in the data it would balloon up and come back down. What do you notice is different about the shape of these uncertainties compared to what we saw in the state space model? They're not symmetric. Right. In the state space model, remember, the model could look ahead and see that there were constraints after the gap; it looks both forward and backward in time. That ability to look forward and backward in time is what characterizes a smoother. The state space model is a smoother; it uses information in both directions. The Kalman filter is a filter: it only progresses through time in one direction. It's designed to do iterative assimilation; it's designed to use information as it's coming in.
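Putting the forecast and analysis steps together, here's a minimal univariate Kalman filter sketch under the same assumptions (known slope m, process error q, and observation variance sigma2; these names are mine, and np.nan marks missing observations, which produces the growth in uncertainty during data gaps described above).

```python
import numpy as np

def kalman_filter_1d(y, m, q, sigma2, mu0, var0):
    """Univariate Kalman filter for x_{t+1} = m*x_t + eps, eps ~ N(0, q),
    with observations y_t ~ N(x_t, sigma2). Missing data are np.nan."""
    mu_out, var_out = [], []
    mu, var = mu0, var0
    for obs in y:
        # Forecast step: propagate the mean and variance analytically
        mu_f = m * mu
        var_f = m**2 * var + q
        if np.isnan(obs):
            # No observation: the analysis is just the forecast, so uncertainty grows
            mu, var = mu_f, var_f
        else:
            # Analysis step: precision-weighted average of forecast and data
            p_a = 1.0 / var_f + 1.0 / sigma2
            mu = (mu_f / var_f + obs / sigma2) / p_a
            var = 1.0 / p_a
        mu_out.append(mu)
        var_out.append(var)
    return np.array(mu_out), np.array(var_out)
```

Feeding this a series with a block of np.nan values in the middle reproduces the behavior described above: the analysis variance settles to a steady state where data are available and ramps up across the gap.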
It's not designed to go back and say, oh, now there are observations about the past, I'm going to revise the past. The Kalman filter does not revise past time steps that it has seen previously. When it gets to this data point it goes, oh, now I do know where I am; it doesn't say, oh, and I should go back and revise where I was previously. It's a forward operation, and because of that it has this characteristic shark-tooth, or saw-tooth, behavior to its uncertainty, as opposed to this, which I kind of likened to the balloon-animal model of uncertainty: it bulges out where you aren't constraining it with data.

So the Kalman filter was derived in the simplest possible case, for one variable. That may be of use to ecologists, but it was not going to cut it for atmospheric scientists, who have hundreds of thousands of atmospheric grid cells with dozens of state variables. They wanted something that could deal with more than one thing at a time, and so what we're going to do next is generalize this Kalman filter from the univariate case to a multivariate case. Before I dive into the math for doing that, I want to highlight some of the key concepts behind what we're going to gain from it. I talked about this earlier in the week, but it's worth reiterating. Imagine that my forecast model implies a covariance between two of my states. At this stage, with an analytical Kalman filter, it's not done as an ensemble; it would literally be done as a mean and a covariance. But the key point is that there is a covariance in the forecast, so that when we observe something about one of my state variables, I can use Bayes' theorem to update that thing, but I can also use the covariance to update other latent states. That's one of the advantages of the Kalman filter, of this kind of data assimilation approach: it allows us to easily update latent variables during the analysis step, and likewise, as I mentioned earlier in the week, to have mutual information. So if I did have an observation here, say about AGB, the update on each of these variables would be coming from the direct constraint on that variable and the indirect constraint through the correlations.

And this works not just over multiple state variables; it can also work over multiple locations. Let's say I am forecasting only one thing, but I'm forecasting that one thing at multiple locations in space. When I'm doing a forecast at multiple locations, you could easily see that my forecast for the population here and the forecast for the population there are positively correlated, because they're maybe responding to the same inter-annual variability, the same environmental drivers. So information at one location helps me constrain what's happening at other locations in proportion to the degree to which we forecast them to be correlated. So again, this allows us to do what I like to call models as scaffolds. The model itself, the thing that we're using to make the predictions, is ultimately contributing that covariance structure: a prediction that tells us how different processes in space and time should be related to each other, and that allows different types of data that may not be directly comparable to each other to be mutually informative, even if you can't necessarily make a scatter plot of how variable x is related to variable y,
because they're operating on different scales or are just disconnected in their process. Your understanding of how the process works, which is what you've built into the model, your hypotheses about how reality works, are used to link different data sets together through this data assimilation process.

So, generalizing this to a multivariate case, I'm going to try to keep the notation as similar as possible. If you haven't been thinking about matrix algebra in a while, you're going to have to take a few things for granted, but I'll try to highlight what I'm doing. mu_a and mu_f are our means; instead of being single numbers, they're now going to be vectors of means of length n. So I now care about n means. Those n may be n different locations, they may be n different variables that my model is predicting, or they could be some combination. So if I'm predicting five things at 20 locations, n, unwrapped, should be 100 long, and it's my responsibility to figure out how to wrap them back up to know which means go with which variables and locations; it's not actually hard. The error covariance matrices in the analysis and in the forecast are going to be P_a and P_f, which is what I was previously calling tau_a and tau_f, and those are n by n covariance matrices because there are n states. I then have some vector of observations that's length p. Key point: p and n don't have to be the same number. Like I said, you can have latent variables, states that are not observed directly; likewise you can have states that are observed with multiple observations. So p could be bigger than n, p could be smaller than n, p and n could be the same but all the observations are for just one state. It doesn't matter; what matters is that there is going to be a term that gives us a map between y and x, and that's going to be called H. Since I have p observations, the observation error is described by a covariance matrix R. It's not uncommon for that to be diagonal, to assume that our observations are independent, but the structure of the Kalman filter doesn't require that your observation errors be independent; in practice they often are, but they don't have to be. And H is what's called our observation matrix. I'll show an example on a slide very soon, but it's essentially just a lookup table that says which y goes with which x: the first x goes with the first y, the second x goes with the third y, whatever. It allows us to keep track of which goes with which. M is now our process model, and it can be n by n, so we now have a matrix process model. I don't care what part of ecology you come from, you've probably seen matrix models before: matrix population models; ecosystem biogeochemical models, all pool-and-flux models, can be represented in matrix form; food webs can be represented in matrix form. So matrix models, I'm going to take that as actually not that big of a limiting constraint.

And so our analysis problem comes down to a normal likelihood of the data given the latent state x and the observation error, times the forecast, which has some mean and covariance. Our analysis step is again normal times normal, just multivariate normal by multivariate normal. They're still conjugate, and so we can still multiply them together. Conceptually the answer will be the same: the precisions will add up and the means will be a weighted average of the means; it just gets uglier. So here's my variance: this inverse is just turning those covariances into
precisions; I'm adding those precisions up and inverting them again to get back to a covariance. So again, I'm just summing up the precisions. That same term shows up again in the denominator, so that's again the weight: here's the data weighted by its observation error, here's the forecast weighted by the forecast error. It's just uglier. It's very common to reorganize this in terms of a term K, which is called the Kalman gain, which is a reorganization that says, instead of thinking of this thing as being weighted against that thing, think of the analysis as the forecast plus some gain applied to the mismatch between the observations and the forecast.

Okay, so let's do a really simple example. Let's assume that we are forecasting three state variables, x1, x2, and x3, and let's assume that we observe two of them, the second and the third, so we don't observe the first variable, whether that's the first location or the first state variable. We'll also assume that our observation error is diagonal with the same sigma squared for every observation, and that they're independent, which is a simplifying case but a pretty common assumption for observation error. Now H, that observation matrix, again, what it's doing is mapping between the two: it's saying the first y maps to the second x, and the second y maps to the third x. So it's very common for these H matrices to just be zeros and ones mapping between these things. That said, they don't strictly have to be just zeros and ones. If you have some process where your forecast is on one grid and your observations are on a slightly different grid, which can happen with remote sensing, these numbers might be fractional, to say, I have to add up the remote sensing pixels that map to my observation, so they each get one-third weight; any sort of spatial re-gridding or downscaling is always handled in H. Any place where you need to bias correct: H being one essentially implies no bias, so if you need to bias correct, you can put that in there. Any place where there's proxy data: in the TDR example, there's a slope of the line between TDR and soil moisture, and that slope is what goes in there. In a more general sense, when you get beyond the most basic Kalman filter, H is just the observation process, the observation model. In more complicated examples, H can be a function that maps between the latent states in the model and the real-world observations. When Shannon was talking about state space models yesterday, I gave the example that if you're working with a remote sensor, H may be everything between our latent state, which may be the vegetation that's actually on the ground, and what we observe from the satellite, so it could be a complex function of the cloudiness of the air, the aerosols, the sensor bandwidths, and things like that. So it can get more complicated, but it's not uncommon for it to just be, like I said, a mapping variable.

The posteriors for x2 and x3 aren't going to be the super interesting ones; I'm going to focus on the posterior for x1, because we don't actually observe x1 directly. We end up with a case where the posterior mean for x1 involves the forecast mean for x1 plus some weights applied to the differences between what we forecast and what we observed for each of the two observed states, and those weightings are coming from these error covariance matrices.
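Here's a minimal sketch of that multivariate analysis step for the three-state, two-observation example. The matrix names follow the lecture's notation, but the actual numbers are made up for illustration.

```python
import numpy as np

# Forecast: mean vector (n = 3 states) and n x n forecast error covariance P_f
mu_f = np.array([2.0, 5.0, 1.0])
P_f = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.0, 0.3],
                [0.2, 0.3, 1.0]])

# We observe states 2 and 3 only; H maps the p = 2 observations onto the n = 3 states
H = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
R = 0.5 * np.eye(2)            # diagonal observation error covariance (same sigma^2)
y = np.array([5.8, 0.4])       # the two observations

# Kalman gain, then the analysis mean and covariance
K = P_f @ H.T @ np.linalg.inv(H @ P_f @ H.T + R)
mu_a = mu_f + K @ (y - H @ mu_f)     # forecast plus gain times the forecast-data mismatch
P_a = (np.eye(3) - K @ H) @ P_f      # analysis covariance

# x1 is never observed directly, but its mean and variance still update
# through the forecast covariances with x2 and x3.
print(mu_a)
print(np.diag(P_a))
```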
I think this is interesting and worth mentioning, because this math is going to be familiar to some of you: it's the exact same math that goes on in every spatial statistical model when you're doing kriging, and it's the same math if you do a time series model and project to an unknown state. If you thought of those x's as locations and P as a spatial covariance matrix, this is kriging. What I wanted to point out here is that the only real difference between spatial multivariate models, temporal time series models, and data assimilation models is really what P means. In a spatial model you have a covariance matrix, and the way you construct that covariance matrix is by writing down a function that describes how correlation changes as a function of distance; that's an empirically calibrated function just of distance. The reason you do that is because you have n observations but you need some way of filling in an n by n covariance matrix, and there's no way to fill in an n by n covariance matrix with only n observations unless you make some assumption about the structure of that covariance matrix. The common assumption in spatial statistics is that the correlation decays as a function of distance. In data assimilation you're doing essentially the same thing, but the way you're filling in that covariance matrix is with your forecast model. You're using your forecast model as the way of filling in that covariance matrix, and once you do that, under the hood it's just like kriging, just like time series. You can imagine, if you have a model making predictions in space, it's essentially borrowing information in space not based on distance but based on the similarity of your forecasts, which is probably implicitly a function of distance, but not explicitly.

Okay, the forecast step, I think, is actually a good bit more straightforward. The mean of the forecast, the expected value of x_f given x_a, is just M times the mean of the analysis. The covariance of the forecast has Q, our process error, plus, if you're not familiar with matrix algebra, this M times a covariance matrix times M transpose, which is essentially analogous to m squared times the variance; it's mathematically directly analogous, it's just that now we have a covariance matrix and M is also a matrix.

So, examples of using the Kalman filter. If, for example, M were the equations of Newtonian mechanics, where we do know the constants, then the Kalman filter is how your GPS gets you around. It's making observations of where you are in space with uncertainty, it's assuming that you are obeying the laws of physics in where you're going, so it's predicting where you're going to be at the next time point and constantly correcting that based on where it thinks you are. But it never believes the GPS satellite completely, which I think you all know is true, and it's also why sometimes you exit and the GPS thinks you're still on the highway: it has not rejected the hypothesis that you are still on the highway until it has accumulated enough data that it goes, aha, the observations are no longer compatible with that model. If M is Newtonian mechanics, you can also land on the Moon; that's why Kalman got the Presidential Medal. One of the first real-world applications of the Kalman filter was in the Apollo navigation system, because if you try to get from here to the Moon by dead reckoning, launch and never correct anything along the way, you are going to miss the Moon.
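Going back to that matrix forecast step for a moment, here's the matrix analogue of the earlier univariate sketch; M and Q are arbitrary placeholder matrices, not values from the lecture.

```python
import numpy as np

M = np.array([[0.9, 0.1, 0.0],     # matrix process model (made-up values)
              [0.0, 0.8, 0.2],
              [0.1, 0.0, 0.95]])
Q = 0.1 * np.eye(3)                # process error covariance

mu_a = np.array([2.0, 5.5, 0.7])   # analysis mean from the previous analysis step
P_a = 0.3 * np.eye(3)              # analysis covariance from the previous analysis step

# Forecast step: the matrix analogue of mu_f = m*mu_a and var_f = m^2*var_a + q
mu_f = M @ mu_a
P_f = M @ P_a @ M.T + Q
```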
Instead, you are observing where you are in space and readjusting based on an algorithm that's keeping track of where you are. And as my colleague Dave Moore, who leads the flux course I've taught at for a number of years, likes to say: data assimilation is not rocket science, but you can use it for that. Which is true. Data assimilation is used in often less innocent uses of rocket launches as well.

Okay, so, pros and cons of the Kalman filter. Big pro: it's analytically tractable, which makes it very computationally efficient, and it gives you some understanding of what's going on in data assimilation. Even as we start relaxing assumptions, the intuitions stay very similar: we'll start relaxing the math, but the idea that the analysis is going to be a weighted average between the data and the model is going to stay, and the idea that when we combine pieces of information we're going to be more confident than we were with either alone is going to stay. It depends only on the previous state, so the Kalman filter is Markovian in that regard: you need to know where you are, you don't need to know where you were n steps in the past. That's really handy; in fact, that's one of the reasons data assimilation can be helpful. Imagine I was trying to forecast the weather for the next few hours, because they update that every six hours. If I had to re-assimilate all of the data since we started numerical weather forecasting in the 60s until today just to predict the next six hours, then numerical weather forecasting would get harder every day. Instead, all of the information about everything that came before now is embedded in the forecast. We embed all of our understanding in the forecast, so we don't need to go back and look at the previous data; the past is in the past, which again makes this something we can handle. There are a lot of problems that are computationally infeasible to do as a smoother, where you have to fit all of the data at once; even if you're not making a forecast, you can do them as a filter by moving through the problem in time, because you only care about each observation at each point in time while updating your understanding. When you do go back and rerun an analysis over the past, that's a reanalysis, which, if you've ever used atmospheric data products, you may have used: for example, NCEP's reanalysis products or the ECMWF reanalysis products. Those are the analysis part of the data assimilation rerun after all the data became available and was cleaned up, so it's not the raw observations, it's that analysis step.

These next ones are either pros or cons depending on how you're thinking about the world: linear, normal. I tend not to believe the world is linear and normal, but sometimes it's a nice simplifying assumption. It does involve matrix inversion, and when that n by n matrix gets really big, the matrix inversion itself is the computational cost. Imagine if I wanted to assimilate a snapshot of the world at Landsat resolution, 30 by 30 meters: that's a lot of pixels, and I'd have to unwrap that into a vector of length n and then make an n by n covariance matrix between every single Landsat pixel on the planet. That's a very big matrix. It assumes that all the parameters are known, and it's, again, forward only. If we look at the history of the Kalman filter, the assumption that troubled the communities that were using it, which were largely engineers and atmospheric scientists and those
folks, the linearity assumption was the one that presented them with the most immediate problem. So if we think about what we learned this morning, how might we relax that linearity assumption? Taylor approximation, exactly. So what we're going to see is that the major flavors of data assimilation used by the community actually map fairly cleanly onto what we learned this morning about the alternative ways of propagating uncertainty in a forecast. The analytical moments approach maps onto the Kalman filter. The Taylor series approach maps onto what's called the extended Kalman filter; they could have been more original, but, you know, extended Kalman filter works. Can anyone guess, staying on the trend of lacking originality, what they called the ensemble-based one? Ensemble Kalman filter. You guys are brilliant, yes: the ensemble Kalman filter is just the same math, but now we're doing the forecast with an ensemble. And here they were more clever: if you extend this to Monte Carlo uncertainty propagation, that's called a particle filter. As for the last approach, to my knowledge no one has tried it.

So the last thing I want to finish with is the extended Kalman filter. Like I said earlier, this lecture covers the analytical approaches; we'll pick up with the numerical approaches, the ensemble Kalman filter and the particle filter, in the morning. The extended Kalman filter relaxes that linearity assumption, and it does it by saying mu_f is just whatever my nonlinear model predicts if you put the mean in. Technically that violates Jensen's inequality; it is an approximation, but I guess the counter-argument is, well, what else would you put in if all you know is the mean and the variance? We're then going to update the variance using a Taylor series expansion, and to do that we're going to need F, which is the Jacobian, the matrix of all the possible pairwise derivatives. Once we have that matrix of derivatives, which is essentially a matrix of slopes, the forecast step just switches from Q plus M times the initial condition uncertainty times M transpose to Q plus F times the initial condition uncertainty times F transpose. So instead of a fixed matrix of slopes we have a matrix of derivatives, which is still a matrix of slopes, but the key point is that when you run the normal Kalman filter, that matrix of slopes is always the same, whereas when you run the extended Kalman filter, that matrix of slopes gets re-updated every time step, because the derivatives are evaluated at a different mean every time step. You solve the derivatives analytically, but you have to plug in what the current mean is every time. That's why the extended Kalman filter works: you are re-evaluating those slopes at the current mean every time step.

Is this a decent approximation? Honestly, that's in many ways kind of a Calc 1 sort of question. If we have a bunch of derivatives and I want to approximate some process, then if I make my time steps small enough, approximating it with derivatives isn't too bad. The bigger the time step, the worse this approximation becomes, and the more nonlinear the process, the worse this approximation is. So if you have a highly nonlinear process and you're taking really big time steps, like if you want to update soil microbes annually, it's not going to work, because they have generation times on the order of hours; the time step would need to be appropriately short. Technically this can be extended to higher-order moments, but I'm not going to go through that derivation.
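As a sketch of that idea, here's an extended Kalman filter forecast step with a made-up nonlinear process model (discrete logistic growth); the model and the values of r, K, and q are illustrative assumptions, not from the lecture.

```python
import numpy as np

r, K, q = 0.3, 10.0, 0.05   # growth rate, carrying capacity, process error (illustrative)

def f(x):
    """Nonlinear process model: one step of discrete logistic growth."""
    return x + r * x * (1 - x / K)

def f_prime(x):
    """Derivative of f evaluated at x; in the multivariate case this is the Jacobian."""
    return 1 + r - 2 * r * x / K

# Extended Kalman filter forecast step
mu_a, var_a = 2.0, 0.4      # analysis mean and variance from the last update
mu_f = f(mu_a)              # mean: push the analysis mean through the nonlinear model
F = f_prime(mu_a)           # slope re-evaluated at the current mean, every time step
var_f = F**2 * var_a + q    # variance: Taylor series (linearized) propagation
```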
Key take-homes: the degree to which this works has a lot to do with the nonlinearity of the model and the time step, but technically it is biased because of Jensen's inequality; that can't be helped. And likewise, because of Jensen's inequality, the assumption of normality is also technically false, because if I put in a normal distribution as my initial condition and I apply a nonlinear transform to it, what comes out the other side can't be normal. The degree to which these are bad assumptions, again, has a lot to do with how nonlinear things are and how often you're doing the updating.
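A quick numerical illustration of those two take-homes, using an arbitrary convex transform (my example, not the lecture's): the mean of the transformed states is not the transform of the mean, and the transformed distribution is no longer normal.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=100_000)   # a normal initial condition

def f(x):
    """An arbitrary nonlinear (convex) transform standing in for a process model."""
    return np.exp(0.5 * x)

print(f(x.mean()))    # what the EKF mean update uses: f(E[x])
print(f(x).mean())    # the true forecast mean E[f(x)], larger here by Jensen's inequality
# The transformed sample is also right-skewed, i.e. no longer normal.
```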