So I changed my title a little, and I changed it kind of late. It was chaos, which isn't entirely related. There'll be a little bit of chaos at the beginning, but most of the talk will focus on the work I've been doing over the last 15 years or so related to weather forecasting. This is a large group project and, like I said, this has been going on for 15 years. Most of these people were at one time associated with the University of Maryland, which is my home institution. I'm not going to spend my whole talk telling you what each of these people did, but again, I want to emphasize that what I'm telling you today is really a large group effort.

So let me talk a little bit about weather forecasting, or the computational aspect of it. Numerical weather prediction is, if you Google that phrase, the meteorologist's term for what goes on behind the scenes in the weather forecast. A global weather forecast is based on running a computer model. It keeps track of a relatively small number of variables: temperature, velocity, pressure, humidity, ozone. Anyway, that's about it. But it keeps track of them at millions of grid points throughout the atmosphere. In fact, each time somebody gets a new supercomputer, the latest models keep track of more variables; there are nearly a billion variables now at the European Center. [Audience: Do you mean variables or grid points?] Grid points, actually. I mean, yeah, it's billions of variables in the system, but yes, nearly a billion grid points is what I meant to say. The inputs to this model are initial conditions, all these physical quantities at all the grid points, and the output is predicted values of these quantities at whatever future times you choose to run the model to.

So weather forecasting is limited, as I was saying at the beginning, by chaos, and different people mean different things by chaos. But I think the most important property to keep in mind is sensitive dependence on initial conditions or, from my point of view, exponentially sensitive dependence. It should be exponentially sensitive, and I'll say a little more precisely what I mean by that in a bit. But before I get into the weather forecasting, here's an apparatus that illustrates some principles of chaos. Most of you have seen this already, so I'm not going to take too long to introduce it, but this is a double pendulum that does various interesting things. In the linear theory of dynamical systems, let's say, traditionally we thought that we could describe the dynamics of systems like this in terms of a certain number of modes. There's a mode of oscillation that's like this, and, if I can get it to be a little bit more periodic, there's an anti-phase mode. If the dynamics were linear, everything would be a superposition of these modes, but this is a nonlinear system, and it can't be described so simply.

In fact, let me try to illustrate this principle of sensitive dependence on initial conditions. I'm going to try to start this so that this bar is horizontal and this bar is vertical. Let me just sort of keep track of the pattern. One way to keep track of the pattern is to watch this piece here, which will rotate sometimes clockwise and sometimes counterclockwise, and so you can count: it swung that way three times and then that way two times and so forth. Let's try to watch the pattern. Actually, let me put a little more energy in there.
Let me start with this one vertical and this one horizontal, and we can try to follow the pattern of what happens next. So I lost track; there were a couple of turns that way and then a few the other way. Anyway, let me try to start with the same initial conditions. I'm not sure if I convinced you. Okay. All right, let me start with the simpler one I'm used to. Okay, that time... See, now it started to turn. So it took a while, and it turned a couple of times that way and then a few times that way. Okay, let me start again. Yeah, I think it's just going to keep doing it. And even if I got a robot to do this a little bit more precisely, we could illustrate this a little bit better.

One more thing with this. It's just a game I like to play with this, and then I'll stop and go back to the weather. I'm going to start this going again with a lot of energy. Eventually it's going to lose energy, and at some point there's going to be a last time when that bottom piece swings over the top, through 360 degrees. Do you understand what I'm saying? I want you to try to predict, because part of the point of the talk is going to be how difficult chaos makes it to predict things. I want you to predict the last time that this little piece here is going to swing through 360 degrees, the last time it's going to swing over the top before it settles into this type of motion. When you think it's done it for the last time, I want you to clap. Clap your hands when you think this thing has done that for the last time. Let's start with a lot of energy. So already some people have lost on the bottom piece; I agree the top piece is long since done swinging over the top. Still going. Patience is rewarded in this game. You might, no, no, no. It's getting close now.

I can't really illustrate things quantitatively to you so much with this, but I claim that over a certain time scale, which is obviously a time scale of seconds and not hours, any discrepancy in the initial condition will grow exponentially. That is, if I had two identical ones of these things here and started them with similar initial conditions, they would, in a matter of seconds, be essentially decorrelated in what they're doing. So how can we expect to forecast as complex a system as the weather if we can't even forecast this thing very well? Well, fortunately the time scale is different. In a weather system, errors in specifying the initial conditions tend to double on a time scale of days. Estimates vary widely, and it depends on the size of the perturbation too, but roughly, think of the uncertainty in the forecast doubling every two days. So what that means, and this is exponential growth of uncertainty: after four days the uncertainty grows by a factor of four, but after six days it grows by a factor of eight, and after eight days by a factor of 16, according to this crude estimate. This is why we don't get weather forecasts for one month from now: by that time the uncertainty is more than a thousand times as large as it is now. And so the accuracy of the forecast is going to be limited both by the accuracy of the model and by the accuracy of the initial conditions themselves. If the dynamics were stable, the initial conditions might not matter so much, but here they matter very much.
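Just to spell out that doubling arithmetic, here's a minimal sketch of how an uncertainty grows if it doubles every two days; the two-day doubling time is only the rough estimate quoted above, and the code is purely illustrative.

```python
# Minimal sketch: uncertainty that doubles every two days (a rough illustrative estimate).
DOUBLING_TIME_DAYS = 2.0

def growth_factor(days: float) -> float:
    """Factor by which the initial uncertainty has grown after `days` days."""
    return 2.0 ** (days / DOUBLING_TIME_DAYS)

for days in (4, 6, 8, 30):
    print(f"after {days:2d} days: {growth_factor(days):,.0f}x the initial uncertainty")
# after  4 days: 4x;  after  6 days: 8x;  after  8 days: 16x;
# after 30 days: about 32,768x, well over a thousand times as large
```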
Still, there are limitations on what we can expect to do in improving weather forecasts, and the chaos sort of gives this limitation: if we were ten times more accurate with our model and ten times more accurate in our initial conditions, if we got everything ten times more accurate, then since the uncertainty grows by a factor of ten over roughly a week, we might be able to forecast one more week into the future. But we will never be able to forecast accurately a year into the future, what exactly the weather will be a year from today. It's completely infeasible to have that accurate initial conditions or model or what have you. But we've made a lot of progress over the last decades, and so I want to give you some idea of what we've done to try to make further progress.

This is a slide, and I know the type is probably too small to read, but it's just illustrating various factors that go into a weather forecast model. There are actually separate models for air, ocean, and land, and these models are coupled to each other. There are all sorts of land factors that need to be taken into account, including human factors; this picture here shows some smokestacks, but I'm told also that in places in the world where there is a lot of rice farming, the amount of moisture that's added to the atmosphere when the farmers flood their rice paddies is significant enough that the forecasters in those regions have to take it into account in the way they model things. Clouds, there's a picture of a cloud too; clouds are very difficult to model. A weather model is basically the Navier-Stokes equations at heart, or Boussinesq equations; there's convection and so forth. Lots of the physics is very well understood, but the physics of clouds is not well understood, so there's still a lot of crude approximation in the models.

Now, I'm a mathematician, my colleagues were physicists, and then we started talking to some of the meteorologists at our university. We came from the point of view of mathematics and of low-dimensional dynamical systems like this one. Trying to do something about weather forecasting by actually digging into the details of the model and improving the model seemed somewhat infeasible to us. The goal of our project really was to develop methods to improve the quality of forecasts without changing the models, to, at least when we started out, treat the model as a black box. The advantage of that, too, is that as a mathematician I could start working on the project without knowing any meteorology. Of course, it's been very interesting for me to learn some meteorology along the way, and as you get further along in this project, you have to start to know a little bit more about the physics of what you're modeling. But we focused on the methodology for what meteorologists call data assimilation, and that's how one forms the initial conditions for a forecast based on the available atmospheric data.

In my collaborations with laboratory physicists, I never thought of initial conditions as being an issue; you just measure them and you're done. But we don't have weather instruments at every model grid point throughout the atmosphere. We don't come close to measuring all the things that the model wants to know in order to forecast. It's a very heterogeneous network of observations, and so I'm emphasizing on this slide that it's not even just a matter of interpolation to the model grid.
The data is too sparse in certain parts of the atmosphere to make this a simple problem of forming the initial conditions. I want to show you a few pictures just giving you an idea of what sort of data we have to work with. The first is just another sort of cartoon of the various sources of the data that go into weather forecasts around the world. There are, of course, land weather stations. There are weather balloons. There are ships and buoys that radio in information. Commercial aircraft are an important part of the observing network, and then there are satellites; in more recent decades there has come to be a large amount of satellite data.

Let me give you an idea of the geographical distribution of these things. The yellow dots here are surface weather stations. They're measuring temperature and pressure and wind speed, all those variables. This is a fairly dense network over most, but not all, of the land portion of the world. But it's only telling us what's going on at the surface, not what's going on up above us. These are the weather balloons. A weather balloon is launched, it goes up, and at various heights in the atmosphere it sends back its data. It's very important, but the weather balloon network, which is also yellow dots here, is a lot sparser. It's very thin over the oceans. It's also very thin over the southern hemisphere. One thing I learned about meteorology is that the dynamics in the northern hemisphere and the southern hemisphere don't mix too quickly; there's sort of a barrier at the equator. There's some mixing going on. [Audience: How high does the weather balloon go up?] I'm not sure how high it goes up. It's a good bit higher than the airplanes fly. I'm not sure... 30 kilometers?

Then there's commercial aircraft. Now we start to get some more data over the oceans, but again this is very heavily concentrated in the northern hemisphere; we're getting very little over the southern hemisphere and over the poles. Ships and buoys: now we've got a bunch of data over the oceans, but it's still just surface data. So especially in the southern hemisphere, especially in the upper atmosphere, especially over the oceans, we still have a relative sparsity of data, and not nearly enough, as I said, to just interpolate to model grid points. A lot of these instruments are also not measuring all the variables involved.

That's especially true of satellites. Satellites can potentially give you global coverage. There's a lot of information on this slide, but the upper left figure is from some geostationary satellites. Most of the satellite data comes from satellites on a polar orbit, and this is the data from a particular day. So this particular satellite sent back a lot of data on that day, but it didn't cover the whole globe. And what are these things sending back, what can a satellite observe? This is actually the third bullet here, so I'm skipping around a little bit, but there are a couple of things it observes. Some of the satellites actually observe the surface of the ocean and make an inference of wind from the surface of the ocean, except when there are clouds above, in which case the clouds confuse things a great deal. So there are gaps in the data. But the main thing satellites are measuring is just radiation, and different temperatures and different amounts of moisture in the atmosphere affect the absorption of radiation.
So you can make some inferences about temperature and humidity based on these radiances, but it's a very crude sort of measurement. It's not nearly as accurate as having an actual instrument there to measure things. So there's lots and lots of satellite data, but using it profitably is more challenging than for the other types of data.

Okay, so let me get back to this idea of data assimilation, just conceptually what I mean, sort of independent of the weather problem. What do I mean by data assimilation? I've tried to convince you of the need for it in the weather forecasting problem. In data assimilation, we assume that we have some sort of forecast model for a system, some sort of physical system, and we have some ongoing time sequence of measurements, of observations or data. What data assimilation does is attempt to synchronize the model state with the physical state. It tries to get the model state to track close to the physical state, so that at any given point we can say, okay, let's stop here and forecast into the future where we don't have data yet. But we need to stay close in order to get a reasonable forecast. So we're trying at any given time to estimate the current state of the system based on the current and past observations. And using observations from the past is crucial to filling in the voids in the data we have at present. We're not measuring very well what's going on right now over the southern Pacific Ocean, but the weather that's there now circulated from some other part of the world where we did have more instruments over the past few days. So if we take into account the past several days of measurements, we can make inferences from that information about what's going on right now.

Here's a cartoon. I'm trying to describe this paradigm in a variety of ways, so the next few slides are maybe going to be a little bit redundant. But here's a cartoon of what's going on. Well, let me do this one. The horizontal axis here is time. The vertical axis is the state of the system, or as best I can represent the state of a system in one variable. The blue curve here represents what the system is really doing. This is what we're trying to infer; we don't know what the blue curve is. The black curves represent runs from our forecast model. So we forecast for a while, and then we get some observations. The blue dots here are observations. The blue dots are not the same as the truth, but they're hopefully reasonably close to the truth. The most misleading thing about this cartoon is that in general we're not measuring the system state directly. In general, the observations live in a different vector space than the model state does, and we have to go back and forth between them somehow. But let's just pretend in this case that we're directly measuring the system state. That's the blue dot here. So we have a forecast that said the system should be doing this; we have an observation that says the system was doing that. Neither one of them is exactly right. So what we do is somehow split the difference between the two. This red dot here represents our best guess as to what's going on right now, based on both the forecast and the current observations. Then we forecast from there. And again, this is sort of a first guess as to what's going on right now; we make observations and we move towards the observations.
We don't move all the way towards the observations, because we might in some cases overshoot. In this case, our forecast was very accurate, about as accurate as our observation. So we don't want to go all the way to the observation; we want to somehow interpolate between the two. So data assimilation is this whole cyclical procedure: we forecast, adjust to the observations (that's the red dot), use that as initial conditions for a new forecast, adjust, and hopefully we have enough information in those observations to keep tracking the system like this.

In weather forecasting, this adjustment is generally done every six hours. Why every six hours? Well, every six hours we get enough new observations to form significantly more accurate initial conditions than we had from extrapolating from six hours ago. I mean, six hours is maybe the right cycle: if we did this every hour, we wouldn't get much new information after one hour, but after six hours we can perform this procedure and get a significantly better forecast than we had six hours ago. It's also synchronized with the clock: certain of these instruments, like the weather balloons, go up every 12 hours, certain other instruments are on a sort of six-hour schedule, and some of the information just comes in whenever. So we forecast six hours into the future, we gather all the atmospheric observations we get over that six-hour time period, and we adjust our forecast to better fit the observations. I meant to highlight the word forecast; analysis is the term the specialists use for this adjustment step here. Forecast, analysis, forecast, analysis. I'm emphasizing those words because I'm going to use them, not too much, but occasionally, later on in the slides. The adjusted model state is then an initial condition for the next six-hour forecast, and we repeat the cycle. I highlighted this middle step in red because that's the interesting part. That's the part I've told you the least about: how do you actually do the adjustment in a quantitatively reasonable way? And that's the part we've concentrated on.

Okay, yet another representation. This is just the flow chart representation for people who like flow charts. I haven't told you how to start this cycle, but we make a forecast, we look at the observations, compare them against the forecast, do this assimilation or analysis step, get an updated or adjusted model state that initializes the next forecast, and cycle through.
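To make the adjustment step concrete before getting to the equations, here's a minimal one-variable sketch of interpolating between a forecast and an observation, weighted by their uncertainties. The function name and the numbers are invented for illustration; this is just the standard inverse-variance weighting, not anything specific to our system.

```python
# Minimal 1-D sketch of the analysis step: combine a forecast value and an
# observed value, weighting each by the inverse of its error variance.
# (All numbers are invented for illustration.)

def analysis(x_forecast: float, var_forecast: float,
             y_observed: float, var_observed: float) -> tuple[float, float]:
    """Return the combined (analysis) estimate and its variance."""
    gain = var_forecast / (var_forecast + var_observed)  # how far to move toward the observation
    x_analysis = x_forecast + gain * (y_observed - x_forecast)
    var_analysis = (1.0 - gain) * var_forecast           # uncertainty shrinks after the update
    return x_analysis, var_analysis

# Forecast says 30.0 degrees (variance 4.0); the observation says 28.0 (variance 1.0).
print(analysis(30.0, 4.0, 28.0, 1.0))   # -> (28.4, 0.8): closer to the more accurate observation
```

If the forecast were the more accurate of the two, the gain would be small and the analysis would stay close to the forecast, which is exactly the "don't move all the way to the observation" behavior in the cartoon.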
Okay, so now, I'm a mathematician, so there are going to be a couple of slides of equations and then we'll be done with equations. I want to formulate a reasonably precise mathematical problem, just to give you some idea of what's underlying the various methods I'm going to be talking about through the rest of the lecture. For the mathematical formulation, we're going to assume that we have a perfect forecast model. It's a little bit trickier to say what you should do when you know your model is making errors but you don't have such a good idea of the nature of those errors. I'm formulating this in discrete time; the time unit is the length of time between those assimilation steps. So this function F here says: forecast x, where x is some vector quantity, for one time unit. F is not something given by a formula; F means run the computer model. And we assume that at a given time t we have a vector of observations y sub t that is a known function of x sub t.

On this slide I'm highlighting in red all the assumptions, or at least some of the assumptions, that aren't true in real life. We really don't know exactly what this function H should be. In the case of the satellite data, H is given by what's called a radiative transfer model, and there are a lot of imperfections in that model. The model says: if the atmospheric state is this, then the observation should have been that. That's what this H does. And we assume that there's some uncertainty, some noise, in those observations, and we assume that noise is Gaussian. So written out as an equation, our observation vector y at time t is a function H of x plus what we assume to be Gaussian noise.

In that case, now that I've introduced a little bit of statistics into the problem, we can do a maximum likelihood estimate. There's nothing really sophisticated about what underlies this method; the problem is to put something vaguely resembling the theory into practice. So we do a maximum likelihood estimate for the trajectory of model states x sub t given the sequence of observations y sub t, and it amounts to minimizing a certain cost function. What's in this summation here? These terms are the kind of terms you see in the exponent of the probability distribution function associated with a Gaussian. These things in parentheses here, y minus H of x, that's the epsilon from the previous slide. That's supposed to be normally distributed with covariance R, so the PDF for that is e to the minus this stuff. When you do a maximum likelihood estimate, you multiply together a bunch of probability distribution functions, and when you multiply them together, you add the exponents. So minimizing this cost function is equivalent to the maximum likelihood estimate. And one of the goals of data assimilation is to take that nonlinear least squares problem and, instead of solving it from scratch each time we get new observations, take the solution we got from the observations up to time t minus 1 and use that somehow to at least approximately solve the problem at time t.
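For the record, here is one way to write down the model, the observations, and the cost function just described. The notation is my reconstruction from the spoken description rather than a copy of the slide.

```latex
% Forecast model in discrete time (F means "run the computer model one time unit"):
%     x_{t+1} = F(x_t)
% Observations: a known function H of the state plus Gaussian noise with covariance R:
%     y_t = H(x_t) + \varepsilon_t ,   \varepsilon_t \sim \mathcal{N}(0, R)
% Maximum likelihood estimate of the trajectory  <=>  minimize the cost function
\begin{equation*}
  J(x_0,\dots,x_T) \;=\; \sum_{t=0}^{T}
    \bigl(y_t - H(x_t)\bigr)^{\top} R^{-1} \bigl(y_t - H(x_t)\bigr),
  \qquad \text{subject to } x_{t+1} = F(x_t).
\end{equation*}
```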
Next I want to talk a little bit about variational data assimilation. This is a methodology that I'm going to contrast our own methodology with. Variational data assimilation has been state of the art for the last couple of decades at the larger weather services. The European Center, which has the most sophisticated data assimilation system in practice now, uses a method they call 4D-Var, and it solves this nonlinear least squares problem I formulated. It solves it not using the observations going all the way back to time zero, to the first observations, but using the observations from the last 24 hours. Everything in the summation that's from observations more than 24 hours ago gets replaced by some crude approximation to what it should have been, but the rest of this cost function is modeled fairly accurately over the last 24 hours' worth of observations. The result there is then used in the next minimization, which happens six hours later. So it's a very accurate method, but it's very computationally expensive: you're trying to minimize a nonlinear function of millions or billions of variables. And one thing I want to emphasize is that it requires not only a lot of computer time, but a lot of people time. In order to do this minimization, you have to basically differentiate the model in order to compute a gradient of this cost function, and differentiating a weather model is something that people spend months doing. Every time the service gets a new computer and goes to a higher resolution model, somebody again has to spend a great deal of time doing the sort of symbolic calculation that's needed in order to employ this numerical method. So we were after something not necessarily better than this, but simpler than this and comparable in quality to this variational approach.

I'm repeating myself to some extent, but I'm going to try to give you a variety of ways to look at what we're doing in data assimilation. There's a Bayesian way of thinking about it, for those of you who are familiar with Bayesian statistics. From the Bayesian point of view, what we really want to do is keep track of the probability distribution of possible model states. We want to acknowledge that we don't know exactly what the system is doing at any given time, and at least ideally, conceptually, we'd like to keep track of the probability distribution that says these are the most likely states, and this is the amount of uncertainty we have at a given time. So in the forecast step, we would like to evolve this probability distribution according to the forecast model. Then, at the analysis or assimilation step, what we'd like to do is combine that forecast probability distribution with the probability distribution, the Gaussian probability distribution, that we associated with the observations. Bayes' rule says we should multiply those two things together. So the forecast is the prior distribution in the Bayesian language; then we take into account new information and we get a posterior distribution.

This step of applying Bayes' rule is basically multiplying two PDFs; that's relatively simple. The problem is that with a large nonlinear model, we need to make some approximation for the forecast step. I can't actually forecast a probability distribution in a billion-dimensional space; I can't even represent a probability distribution in a billion-dimensional space on the computer with any reasonable resolution. So a lot of approaches make a Gaussian approximation, and that's what we're going to do at some level: we're going to assume all the probability distributions involved are Gaussian. Then we can represent a probability distribution just by a mean and a covariance, which is a lot less information than is involved in describing a general, arbitrary probability distribution. And if the forecast model is linear and the observation function is linear, this approximation is exact. In a linear model, a Gaussian distribution propagates to a Gaussian distribution, and if I multiply two Gaussian distributions, I get a Gaussian distribution. So everything in the Bayesian procedure I described before is self-consistent: if you start out with Gaussian distributions, you're still going to get Gaussian distributions, provided everything in the model is linear.
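Just as a reminder of why the Gaussian assumption makes the Bayes step easy, here is the one-variable version of "multiplying two PDFs." This is a standard identity written out for reference, not something taken from the slides.

```latex
% Forecast (prior) density times observation likelihood, in one variable:
% Gaussian times Gaussian is, up to normalization, again Gaussian.
\begin{equation*}
  \mathcal{N}\!\bigl(x;\, m_f,\, \sigma_f^2\bigr)\,
  \mathcal{N}\!\bigl(y;\, x,\, \sigma_o^2\bigr)
  \;\propto\;
  \mathcal{N}\!\left(x;\;
    \frac{\sigma_o^2\, m_f + \sigma_f^2\, y}{\sigma_f^2 + \sigma_o^2},\;
    \frac{\sigma_f^2\, \sigma_o^2}{\sigma_f^2 + \sigma_o^2}\right)
\end{equation*}
% The posterior mean interpolates between the forecast mean m_f and the observation y,
% weighted by the two variances, and the posterior variance is smaller than either one.
```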
And this is the essence of the Kalman filter. I'd heard about the Kalman filter for many years before I really understood what was going on, but that's basically what Kalman observed, and it turned out to be very important for applications. The equations for the Kalman filter come basically from completing the square; once you understand what's going on, it's just a certain amount of algebra to derive them. Going back to my cost function: if the forecast model is linear, that means the x sub t's at different times are related to each other by a linear operator, and if H is linear too, then everything here is a quadratic function of x. So it's just a matter of taking a sum of quadratics and completing the square, writing it as one quadratic, and that's where the Kalman filter equations come from. As I said, one then keeps track of a mean and a covariance; we think of the mean as representing the current state estimate, our most likely state, and the covariance quantifies the uncertainty in that estimate. The nice thing about this is that we're keeping track not only of our best guess, but of how accurate we think our best guess is.

Still, even having made this approximation, for a high-dimensional model forecasting that entire covariance matrix is infeasible. Taking a billion-by-billion matrix and doing some dynamics on it, which would be a matter of solving a partial differential equation associated with the ordinary differential equations of our forecast model, is still infeasible. But this covariance is really crucial. It helps us, first of all, quantify the relative uncertainty in the forecast values and the observed values. You remember that picture before, where I had the black curve and the blue dot and we put the red dot somewhere between them? We interpolate according to what we perceive to be the uncertainty: we've got two pieces of information, and if we think this one is ten times more accurate than that one, we interpolate closer to the one we think is more accurate. So the covariance matrix gives us a mathematically motivated way of combining information. It also allows us to make inferences about what the model is doing at one location from observations at another location; in places where we don't have observations, the covariance matrix allows us to make inferences. The inferences are basically of this type. Suppose the forecast for today is 30 degrees in Trieste, and maybe it's also 30 degrees for Venice, but maybe there's a weather station in Venice and not in Trieste. If we observe that the weather was actually two degrees cooler than forecast in Venice, the covariance matrix tells us how the temperatures are correlated between the two places, and we might infer that it's probably two degrees, or at least one degree, cooler than forecast here in Trieste as well.
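To illustrate that kind of inference, here's a minimal sketch of the standard Kalman analysis update with a two-variable state, temperature in Venice and in Trieste, where only Venice is observed. All the numbers, including the forecast covariance, are invented for illustration.

```python
import numpy as np

# State vector: [temperature in Venice, temperature in Trieste]; the forecast says 30 degrees in both.
# The forecast covariance says the two temperatures are strongly correlated.
# (All numbers here are invented for illustration.)
x_forecast = np.array([30.0, 30.0])
P_forecast = np.array([[1.0, 0.8],
                       [0.8, 1.0]])

# We observe only Venice: H picks out the first component. R is the observation error variance.
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])
y = np.array([28.0])          # observed: 2 degrees cooler than forecast in Venice

# Kalman analysis update: K = P H^T (H P H^T + R)^{-1}
K = P_forecast @ H.T @ np.linalg.inv(H @ P_forecast @ H.T + R)
x_analysis = x_forecast + K @ (y - H @ x_forecast)
P_analysis = (np.eye(2) - K @ H) @ P_forecast

print(x_analysis)   # about [28.4, 28.7]: Venice is pulled most of the way toward the observation,
                    # and Trieste is pulled down too, purely through the forecast correlation.
```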
So we want to keep track of some sort of covariance matrix, but not a billion-by-billion matrix; we want to quantify the uncertainty in a less data-intensive manner. The potential advantage we have over the variational methods is that the variational methods use a sort of climatological, constant-in-time uncertainty for the forecast. They can actually deal with a large covariance matrix, but it's a static covariance matrix; they use the same one each time. We want to better quantify, though in a cruder fashion, the time-varying uncertainty in a forecast, and I'm going to show you a couple of slides that illustrate how that uncertainty varies with time.

Our solution to this problem of quantifying the uncertainty is based on the notion of ensemble forecasting. Ensemble forecasting was started in 1992; the U.S. Weather Service, and in the same year the European Center as well, started producing what they called ensemble forecasts, where they made multiple forecasts from the same computer model. Each forecast was started with slightly different initial conditions. The reason for doing that at first was just to be able to put some sort of error bars on the forecast, to get some idea of the uncertainty in the forecast that's put out to the public. In the way we use an ensemble, we make things a little more precise: we have an ensemble of model states, and we can form a sample mean and a sample covariance from that ensemble. That sample covariance matrix associated with the ensemble is what we plug into the Kalman filter equations in order to do our method. But the U.S. and the other big weather centers didn't really start using ensemble forecasts in their data assimilation procedures until the last few years. So there was a long time between when they started doing these forecasts for the end product and when they started using them internally in their data assimilation procedures.

Here are a couple more pictures illustrating ensemble forecasts, just to give you an idea of what this looks like; there aren't too many more pictures in the talk, but there's not too much time either, so that's fine. This is what the meteorologists call a spaghetti plot. These curves are contours of pressure, and the value of the pressure at which to plot the contour is chosen to roughly correspond to where the jet stream is going across the United States here. This is a four-and-a-half-day forecast on a particular date back in 1995, and this figure was chosen because the agreement among these ten or so different four-and-a-half-day forecasts is much better than it usually is. All of them are agreeing that there's going to be a strong low pressure system over the eastern United States; above the curves is lower pressure, below is higher pressure. On this basis, the U.S. Weather Service, several days in advance, predicted a major snowstorm. It was the first time they'd ever predicted a major snowstorm that far in advance, and it was this ensemble forecast that gave them the confidence to do it.

Here's another forecast from a different date in the same year. This is only a two-and-a-half-day forecast, and the forecasts don't agree very well at all. So in part I'm just trying to give you an idea of how the forecast uncertainty can be very heterogeneous, both in time and in space. In this particular area the forecast uncertainty is quite large. On the other hand, this at least plausibly looks like maybe a one-parameter family of curves. If I could now make an observation at this time, it wouldn't take too many observations for me to figure out which of these curves is right and which is wrong. And that's maybe the key to why what we do works: we can combine this forecast with relatively little information in order to really home in on what's actually going on. [Audience: Is that the one with the 104 in Austin, on Halloween?] Well, it was in 1995 and it was Halloween night. Oh wait, no, I misread it. I'm sorry, it's October 21; I mislabeled it. I just looked at the fine print a little too quickly.

So what we do is based on what's now called ensemble Kalman filtering. This term was introduced by a scientist named Geir Evensen back in 1994. The idea here is that we make an ensemble of forecasts, as shown in the cartoon in the corner of the screen. What we do is make an ensemble of initial conditions chosen
according to the current state estimate. Remember, in this Gaussian approach, or the Kalman filter approach, we have a mean and covariance at any given time. We think of those as representing a Gaussian distribution, so we essentially sample that distribution, forecast forward in time, and think of the result as a sample from the forecast distribution. Now, we can only see uncertainty in the space spanned by these forecasts, but within that space we try to determine which linear combination of those forecasts best fits the observations. We go with the ensemble members, or at least put more weight on the ensemble members, that better fit the observations collected over the forecast time period than the ones that fit the observations worse. Here's another cartoon: we forecast an ensemble from time n minus 1 to time n, take a sample mean and covariance, apply something resembling the Kalman filter equations, form new initial conditions, and iterate forward.

Now, the bad news about this, and I alluded to this already, is that we can only quantify the uncertainty in whatever space is spanned by this ensemble of forecasts. In general, for practical reasons, the ensemble is going to be much smaller than the dimension of the model, and so our sample covariance is reduced in rank relative to the true covariance. But the good news is that we can then do all the linear algebra we need to do in this much lower-dimensional space, which is computationally much less expensive. If we have captured the space in which most of the uncertainty lies, we can actually get nearly as much information as we could out of a full-rank covariance matrix, but we can manipulate it a lot more quickly.

Now, the last ingredient in the system we have developed, which I'm going to describe over the next couple of slides, is what we call spatial localization. If we have a high-dimensional system that comes from discretizing a PDE, for example, it has some spatial extent to it. We may not be able to forecast an ensemble large enough to span all the plausible states globally, but we may be able to get a reasonable picture of what the possible states are locally or regionally. Furthermore, if the long-distance correlations in the model or the system are relatively weak, the sample covariance we take is going to be dominated by sampling error when we get to the weaker correlations, and so the correlations we see in the ensemble at long distances are actually going to be spurious. So we filter out the spurious correlations, and we do it in a very crude, very simple way: we estimate the state at a given geographical location using only the observations from nearby. We throw away the observations beyond a certain distance, which further simplifies the calculation, but fortunately it also makes it more accurate.
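To show where the ensemble's sample mean and covariance enter, here's a minimal sketch of one common ensemble Kalman filter analysis step, the "perturbed observations" variant in the spirit of Evensen's formulation. Our method does the analysis differently, as a weighted combination within the space spanned by the ensemble, but the role of the sample statistics is the same; all names and sizes here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def enkf_analysis(X_forecast, y, H, R):
    """Perturbed-observations ensemble Kalman filter analysis step.

    X_forecast: (n, k) array holding k ensemble members of an n-dimensional state.
    y: (m,) observation vector; H: (m, n) linear observation operator;
    R: (m, m) observation error covariance. Returns the analysis ensemble, shape (n, k).
    """
    n, k = X_forecast.shape
    x_mean = X_forecast.mean(axis=1, keepdims=True)
    A = X_forecast - x_mean                        # ensemble perturbations
    P = (A @ A.T) / (k - 1)                        # sample covariance (rank at most k-1)
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)   # Kalman gain built from the sample covariance
    # Perturb the observations so the analysis ensemble keeps an appropriate spread.
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=k).T
    return X_forecast + K @ (Y - H @ X_forecast)
```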
So our version of the ensemble Kalman filter chooses a grid of points, which, since time is running short, let's just think of as the model grid. Around each grid point we choose some region, and we assimilate only the observations from that region. We do a separate calculation for each grid point, using only the observations near it, so at each point we get an estimate of what we think the current meteorological variables are at that point. Loop over all the grid points, and now we have an estimate of what's going on around the globe. This can be done in parallel in a fairly simple manner. We initially called this method LEKF, for local ensemble Kalman filter; then we formulated an equivalent but simpler-to-implement method, LETKF, the local ensemble transform Kalman filter. One thing I learned from this exercise is that if you want to choose a unique acronym, go with at least five letters. LEKF stands for a lot of things; if you google it, you'll see all sorts of things that don't have anything to do with science. If you google these five letters, LETKF, everything you see will be related to our method.

So I'm going to show you, well, a few more slides here, but one slide to explain what the results are and then one slide of results. What we did several years ago is we got the model the National Weather Service used, or actually what was by then a few-years-old version of their model, and we went back and got the actual 2004 data. We ran our data assimilation procedure and their data assimilation procedure and compared the two. Just to give you an idea of the scale of things, we used a 60-member ensemble and an 800-kilometer radius: we threw away observations that were more than 800 kilometers from the particular point we were interested in.

So here are some results. Now, these meteorological graphs are kind of funny: they like to put pressure on the vertical axis, because pressure corresponds to the vertical coordinate spatially, so this axis is height in the atmosphere, with the bottom of the atmosphere here and the top there. The horizontal axis, which starts at zero here, is the forecast error. Our method is the dashed line with the circles; the National Weather Service method is the more solid line with the pluses; and this is the southern hemisphere, this the northern hemisphere. In the northern hemisphere we're basically the same; in the southern hemisphere we were actually doing better, and we think that's related to the fact that the data is much sparser in the southern hemisphere. So one conclusion we drew is that the sparser the data, the more advantage there is to our method relative to the variational method.

We were also very interested in something that could be done in practice, so we looked at the computational speed. The weather service can only devote a limited amount of time every six hours to this data assimilation step, and they have a supercomputer; we were able to do a comparable thing in 15 minutes on a Linux cluster, so it should run fast enough to be used operationally, and as I'm about to show you in a couple of slides, it is at this point being used. The scaling of the computational time is linear in a lot of things and quadratic in the ensemble size, so it doesn't grow too quickly as the problem gets larger.
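Going back to the local analysis loop described a moment ago, here's a minimal structural sketch of assimilating, at each grid point, only the observations within a fixed radius. This is my own illustration, not our actual code: `analyze_local` is a hypothetical stand-in for the ensemble analysis at one grid point, and the 800-kilometer radius is the value quoted above for the 2004 experiments.

```python
import numpy as np

LOCALIZATION_RADIUS_KM = 800.0   # observations farther than this from a grid point are ignored

def great_circle_km(p, qs, earth_radius_km=6371.0):
    """Great-circle distance from point p = (lat, lon) in degrees to each row of qs."""
    lat1, lon1 = np.radians(p)
    lat2, lon2 = np.radians(qs[:, 0]), np.radians(qs[:, 1])
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * earth_radius_km * np.arcsin(np.sqrt(a))

def local_analysis_loop(grid_points, obs_locations, obs_values, ensemble, analyze_local):
    """Loop over grid points, assimilating only nearby observations at each one.

    grid_points: (G, 2) lat/lon of model grid points; obs_locations: (M, 2) lat/lon of observations;
    obs_values: (M,) observed values; ensemble: the forecast ensemble, in whatever form the model uses;
    analyze_local: hypothetical function doing the analysis for one grid point from nearby observations.
    Each grid point's analysis is independent, so this loop parallelizes in a fairly simple manner.
    """
    analyses = []
    for g, point in enumerate(grid_points):
        dists = great_circle_km(point, obs_locations)            # distance to every observation
        nearby = np.where(dists <= LOCALIZATION_RADIUS_KM)[0]    # keep only the local observations
        analyses.append(analyze_local(g, ensemble, obs_values[nearby], obs_locations[nearby]))
    return analyses
```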
Okay, so more recently: those results I showed you were from the last decade, and we're continuing to develop any number of improvements and extensions to our LETKF method that I obviously don't have time to talk to you about. Meanwhile, many of the largest weather services, and I'm not saying we alone convinced them, are now using a hybrid between an ensemble method like ours and the variational methods that they developed over many decades. It makes perfect sense for them not to throw out their variational methods, and like I said, our goal wasn't to beat them so much as to provide complementary information relatively cheaply. But lastly, I'm pleased to note that the Italian weather service has adopted our particular algorithm and is currently using it for the regional model that they run in order to make their forecasts. So if the forecast for tonight's concert is correct, you can thank me tomorrow; if it's not correct, don't bring it up, it's the weather service's fault.

Okay, so let me wrap up now. I've tried to explain to you what data assimilation is; maybe that's the most important thing I've tried to explain, and I find it a useful paradigm for a lot of different problems. What it does, again, is estimate the state of some sort of physical system in cases where the entire state can't be measured directly, at least not all at one time, but where there's some sort of model available that allows us to relate information from past measurements to what's going on currently. The particular method we proposed is, I've tried to convince you, a practical method for data assimilation in very large systems; it scales well to high-dimensional systems. Also, relative to the variational methods, I want to emphasize that it's largely model independent: you can treat the model more like a black box, and you can take code developed for one model and port it to another model more simply. This is why it has some appeal to some of the smaller weather services around the world, the ones that don't have the resources to develop a variational method; our method appeals to people with fewer resources than the big weather centers. Here I remind you there's a page where we have lots of publications and so forth. And lastly, I'll leave you with a motto: you should forecast globally, but assimilate locally. Thank you.