The next speaker is Joe Tribbia. Joe Tribbia is a senior scientist here at NCAR, and Joe was the head of the section I'm in now before he became an emeritus. Joe has made many contributions in many different areas, and I personally think there is not a single topic to which he has not made a valuable contribution or whose details he does not know. Joe, please go ahead. I'm looking forward to your talk.

Okay. It's very nice to be here, even if it is virtually, but it does give me the opportunity to give a talk in my pajamas, so I really appreciate it. And I want to thank the organizers for asking me to give a talk. So I'm going to share my screen. Can you all see it? Can you see the screen? Yes, thank you. That's great.

Okay. So the topic I'm talking about is ensemble prediction. I really appreciated Will's question on ensemble size, because the issue I'm going to try to elucidate is how we get by predicting a large system with really rather small ensemble sizes. So I'll waste no more time and just mention that this talk relies on a number of slides I've gathered over the years from some of my friends doing ensemble prediction; they'll be credited on the slides as we go along.

The outline for the talk is a little bit of a breakdown of how we got to probabilistic prediction and ensemble methods. Then I'll talk a little bit about the theory behind some of the ways we construct ensembles in medium-range forecasting and, by extension, the ensembles that go into S2S prediction; most of those will involve vectors of various types. Then I'll talk a little bit about the successes and limitations of ensemble prediction, and a little bit about what might be next. So that's the outline. Here we go.

I wanted to mention three people who really made gigantic inroads in the area of probabilistic prediction. The first one is Ed Epstein, who was my chairman at the University of Michigan when I was in graduate school; he was chairman of the department. The next person was Chuck Leith, who was one of my bosses when I began at NCAR. And the third one is Henk Tennekes, who was a visitor at NCAR with whom I interacted many times, but who was also on the scientific advisory board of the European Center. His status as a board member allowed him to make some really substantial contributions in getting ECMWF to do ensemble prediction. So let me go through what each one of them did.

Ed Epstein fundamentally raised the problem of probabilistic prediction using a dynamical model. His approach became known as stochastic dynamic prediction, and it involved trying to predict both the mean, or most likely, forecast and the covariance of that forecast: a little bit about the mean of the forecast and a little bit about its reliability, the covariance information associated with the forecast skill. One of the benefits Epstein found was that by predicting the mean, the error was not as large in an RMS sense as it is for a deterministic prediction. The reason, of course, is that after a period of time a deterministic prediction varies as much as any random state drawn from the climatological distribution, just as the verifying weather does. So the error variance of the difference between the verification and the forecast can be as large as twice the variance of a climate prediction.
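As a quick check on that factor of two, here is the arithmetic, under the usual assumption that at long lead times the deterministic forecast and the verification behave like independent draws from the climatological distribution of anomalies, with zero mean and variance sigma squared:

```latex
% Deterministic forecast x_f and verification x_v as independent climatological draws:
\mathbb{E}\bigl[(x_f - x_v)^2\bigr]
  = \mathbb{E}[x_f^2] + \mathbb{E}[x_v^2] - 2\,\mathbb{E}[x_f]\,\mathbb{E}[x_v]
  = \sigma^2 + \sigma^2 - 0
  = 2\sigma^2 .

% Forecasting the climatological mean (x_f = 0 for anomalies) instead:
\mathbb{E}\bigl[(0 - x_v)^2\bigr] = \sigma^2 .
```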
If you predict the mean, on the other hand, the error only reaches one climatological standard deviation; the forecast is, on average, no farther from the truth than climatology is. That's because an enormous smoothing comes in from taking the mean of the forecast probability distribution, which filters out all the unpredictable aspects of the prediction.

Before the next person, let me go back here for a moment. One of the biggest problems was that Epstein used a moment prediction approach. He predicted the mean, and the forecast equation for the mean is an n-dimensional forecast, just like the forecast model itself. However, the covariance matrix is of order n squared. So if you have a large dynamical model, say a million degrees of freedom, you're talking about n squared, or 10 to the 12th, degrees of freedom, and that's an impossible computational task.

Leith solved that problem by showing you could use Monte Carlo samples to predict the mean and get by with order 10 Monte Carlo samples; that would give you essentially the same filtering capability of the mean forecast that Epstein had gotten in his stochastic dynamic prediction studies.

Finally, Henk Tennekes, doing work in the Netherlands and sitting on the board of the European Center, made the bold statement that no forecast is complete without a forecast of forecast skill. This invigorated the study of ensemble methods for predicting not only the forecast but also the forecast skill, which means you really have to address the question of the uncertainty in the forecast, the covariance of forecast error, as you go forward in time. And this research led to ways of using a small number of ensemble elements, despite the fact that you're trying to predict a probability distribution in a gigantic phase space, on the order of 10 to the 7th or 10 to the 8th degrees of freedom.

On the face of it, it would seem pretty impossible to get useful uncertainty information, or uncertainty quantification, out of an ensemble of fewer than 100 members, say the 50 the European Center is currently using. To illustrate that point: if you had a one-dimensional Gaussian distribution, you might need 10 ensemble members to span it. If you went to a two-dimensional Gaussian distribution, you might need 100 members to populate that probability distribution and predict it going forward. If you had an n-dimensional space, you might need order 10 to the n. So if n is 10 to the 6th, 10 to the 7th, or 10 to the 8th, that's 10 to the 10 to the 8th members, an impossible computational burden to address even once.

So how do we get around that? Well, what you really want to be doing is predicting the probability distribution forward in time. And how do we guarantee that the ensemble we generate will do a good job of predicting the probability distribution? I'm going to give you some theoretical guidance as to how you might want to do that.

So I'm going to try to tackle the large-dimension versus small-ensemble trade-off by getting some linear theoretical guidance. In order to do that, we start with a forecast model. I'm not going to care how large the dimension is, but I am going to make one simplification: we're going to look at a rather tight probability distribution initially, which is the case in prediction. We have relatively small errors at the initial time, so we have a relatively tight probability distribution, a relatively small spread of the ensemble elements.
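Going back for a second to Leith's Monte Carlo point, here is a minimal sketch of the idea; the Lorenz-63 stand-in model, the ten-member ensemble, and all the numerical settings are illustrative assumptions of mine rather than anything from the talk. At long lead times the single deterministic forecast's squared error grows toward roughly twice the climatological variance, while the small-ensemble mean's error levels off lower, near the climatological variance itself.

```python
import numpy as np

def lorenz63_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz-63 system, standing in for a forecast model."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def forecast(state, nsteps=1000):
    """Integrate the toy model nsteps steps (about five time units here)."""
    for _ in range(nsteps):
        state = lorenz63_step(state)
    return state

rng = np.random.default_rng(0)
eps = 0.1           # assumed initial-condition uncertainty
n_members = 10      # Leith-sized Monte Carlo ensemble
det_sqerr, ens_sqerr = [], []

for _ in range(100):                                      # independent forecast cases
    truth0 = forecast(np.array([1.0, 1.0, 1.05]) + rng.standard_normal(3), 400)
    truth = forecast(truth0.copy())                       # verifying "weather"
    analysis = truth0 + eps * rng.standard_normal(3)      # imperfect analysis
    det = forecast(analysis.copy())                       # single deterministic forecast
    members = [forecast(analysis + eps * rng.standard_normal(3))
               for _ in range(n_members)]
    ens_mean = np.mean(members, axis=0)                   # Monte Carlo ensemble mean
    det_sqerr.append(np.sum((det - truth) ** 2))
    ens_sqerr.append(np.sum((ens_mean - truth) ** 2))

print("mean squared error, deterministic forecast:", np.mean(det_sqerr))
print("mean squared error, 10-member ensemble mean:", np.mean(ens_sqerr))
```

The ensemble mean wins not because it is a better trajectory, but because averaging filters out the components that are no longer predictable, which is exactly the smoothing Epstein saw in the stochastic dynamic equations.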
Going forward with that tight initial distribution, we can use the fact that we're able to linearize the ensemble around a mean trajectory, or a particular trajectory close to the mean, and that allows us to write a probability distribution for z, the deviation from the central trajectory. The linearized equation for the deviation from that control forecast, or central forecast, is just a linear matrix equation. But the linearization is about a mean trajectory: embedded in here is not a constant basic state, but a basic state that is a single trajectory of this forecast model. Being a linear equation, it has a solution in terms of a linear operator: if we have a particular initial condition for z, a linear operator takes it from t equals zero to t equals t, and that linear operator depends on the matrix operator A evaluated along the trajectory x0(t). For further reference, I want to note that we can also map back from the condition at time t to our initial condition, and I'm going to call that map z0(z, t).

We're going to use that now in the solution of a probability density equation in phase space. The predictive equation for the probability density in phase space is called the Liouville equation, and it's just the continuity equation for the density in phase space. Just as in fluid mechanics we have a continuity equation, rho sub t plus the divergence of the velocity times the density rho, in physical space, the same mathematical arguments give you a continuity equation for the probability density in phase space. That continuity equation can be broken up into two parts: one is an advective part, and the other is a divergence part, a compressive part. I'm going to call this compressive part, which is a scalar that depends only on t, sigma of t. With that definition for the compressive part, the divergence in phase space, we can rewrite the probability density in a way that gets rid of the compressive part and leaves us only the advective part. We then predict this renormalized probability phi using an advection equation: phi sub t plus Az dot grad-z phi equals zero. This is merely an advection equation in phase space, and the advecting velocity is Az, the velocity of z in phase space.

To solve this equation, one uses the method of characteristics. The method of characteristics says: find how phi is carried along the characteristics of this flow. The characteristics of the flow are given by these two equations, which are merely our dynamical equation, and phi is conserved along that velocity field. So if phi at time t equals zero is some arbitrary function of z, then phi at z and t will be that arbitrary function of z mapped back to its initial point. Phi is constant along the characteristic curves, and we just map each point back to its original position.

What does that mean for a simple initial condition like a Gaussian? If we start with a Gaussian with covariance matrix lambda inverse, that's our initial Gaussian distribution in z. It's a Gaussian with zero mean, since we've taken out the mean.
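For reference, here is a reconstruction in equations of the chain just described; the symbols (z for the deviation, A for the linearized operator, M for the propagator, rho and phi for the two densities, sigma for the phase-space divergence) follow my reading of the slides, so treat this as a sketch rather than a transcription:

```latex
% Tangent-linear evolution of the deviation z about the control trajectory x_0(t):
\frac{dz}{dt} = A\bigl(x_0(t)\bigr)\,z ,
\qquad
z(t) = M(t,0)\,z(0) ,
\qquad
z_0(z,t) \equiv M(t,0)^{-1} z \;\;\text{(the map back to the initial time)} .

% Liouville (continuity) equation for the density rho in phase space,
% split into an advective part and a compressive part sigma(t) = div_z(Az) = tr A :
\frac{\partial\rho}{\partial t} + \nabla_z\!\cdot\!\bigl(Az\,\rho\bigr) = 0
\;\;\Longleftrightarrow\;\;
\frac{\partial\rho}{\partial t} + (Az)\!\cdot\!\nabla_z\rho = -\,\sigma(t)\,\rho .

% Renormalizing, phi = rho * exp( int_0^t sigma dt' ) obeys pure advection,
% which is solved by the method of characteristics:
\frac{\partial\phi}{\partial t} + (Az)\!\cdot\!\nabla_z\phi = 0
\;\;\Longrightarrow\;\;
\phi(z,t) = \phi\bigl(z_0(z,t),\,0\bigr) .
```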
At a later time, the probability density function is proportional to another Gaussian, a new Gaussian with a different covariance matrix, and that covariance matrix is given by this transformation of the original covariance matrix. The point of going through all of this is the following: the EOFs of this new probability density are the degrees of freedom that have the most variance at a given time, and the EOFs of that distribution, the EOFs of this covariance matrix, are what are called the singular vectors of the linearized dynamical system. The important aspect is that in this interpretation, the EOFs can form a basis for phase space, a basis for distributing our ensemble elements at the initial time. Given that, they give us a systematic strategy for constructing an initial ensemble and going forward in time with it; the EOFs at the later time are the evolved singular vectors. So using singular vectors in this form is a very strategically logical way of constructing an ensemble: it will construct an ensemble that spans the most important degrees of freedom in the probability distribution. Let me show you.

Now, singular vectors and bred vectors can be motivated not only from this probabilistic perspective, but also from a variational perspective, and the variational perspective is the one that's typically used to motivate the use of singular vectors. But I want to point out that the variational perspective outlined here, where we form the variance at a later time and then try to maximize it, looking for the degrees of freedom that grow most rapidly, emphasizes singular vectors much more as dangerous degrees of freedom, whereas the linearized Liouville equation motivates singular vectors much more as a natural basis for the PDF of the system. Basically, the variational problem here, with the constraint that the initial condition norm equal one, motivates an eigenvalue problem through the Rayleigh-Ritz criterion, and this is the eigenvalue problem you would get. The eigenvalue problem can be solved either for the most dangerous degrees of freedom looking forward, or for the most dangerous degrees of freedom that have grown from the past. In the first case, moving forward, you have singular vectors as implemented by the European Center. If you think of them as the most dangerous degrees of freedom that have grown up until the present time, then you motivate bred vectors as implemented originally by NCEP in their ensemble system.

Okay, moving on. This is, as I said, the motivation of bred vectors or singular vectors moving forward in time: start off with a small circular PDF, and at some later time the linearized dynamics stretches the PDF along the degrees of freedom that are the most dangerous. Or, looking at it the other way, the EOFs of the original probability distribution become the principal axes of this ellipsoidal covariance, and the leading EOF is the one that explains the most variance, stretched out along the elongated degree of freedom here.

Actually, at the European Center a much more general eigenvalue problem is solved to compute their singular vectors, one that involves weightings at the initial time and the final time, which gives you a slightly different eigenvalue problem than the one I portrayed on the previous slide.
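Again as a sketch of the two perspectives in equations, with the initial Gaussian of covariance Lambda inverse and the propagator M from before, and with C and E standing for whatever initial-time and final-time norms are chosen (generic placeholders here, not the European Center's specific choices):

```latex
% (i) Liouville / EOF view: advecting the Gaussian along the characteristics gives
p(z,0) \propto \exp\!\bigl(-\tfrac12\,z^{\mathsf T}\Lambda\,z\bigr)
\;\;\Longrightarrow\;\;
p(z,t) \propto \exp\!\bigl(-\tfrac12\,z^{\mathsf T}P(t)^{-1}z\bigr),
\qquad
P(t) = M\,\Lambda^{-1}M^{\mathsf T} ,
% whose EOFs (eigenvectors of P(t)) are the evolved singular vectors of the linearized system.

% (ii) Variational / Rayleigh-Ritz view: maximize growth subject to a unit initial norm
\max_{z_0}\;\frac{\|M z_0\|_E^{2}}{\|z_0\|_C^{2}}
\;\;\Longrightarrow\;\;
M^{\mathsf T}E\,M\,v = \lambda\,C\,v ,
% which reduces to the ordinary singular-vector problem M^T M v = lambda v when C = E = I.
```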
And really, what it has to do with is using metric norms that might be different at the initial time and the final time. But basically you're solving an eigenvalue problem, just as I pointed out there, which once again diagonalizes the covariance matrix, and that's a different perspective than the most-dangerous-degrees-of-freedom perspective. I want to emphasize that.

All right. So what's the difference between bred vectors and singular vectors? I constructed a very simple example in which a baroclinic channel with a localized baroclinic jet at one end is perturbed, and the singular vectors and the bred vectors are found for it. The singular vector here is this structure, and the bred vector that's constructed is this structure down below. As you can see, you get exactly what you might consider the necessary aspects of these vectors. This singular vector, in time, is going to evolve into a very dangerous degree of freedom; it's going to look very much like the bred vector structure below when it evolves. The singular vector looks very much like an adjoint mode associated with a normal mode that might look like this. So this is another way you should think about singular vectors: they're really much like the adjoint structures that will most effectively grow into degrees of freedom that look like unstable structures, unstable baroclinic waves in this case, and like covariance structures that evolve into realistic uncertainties at the final time.

Singular vectors are also very useful for picking out aspects of the flow that will be uncertain and will grow in the future. I constructed some model error states using a fraternal twin of that same evolving baroclinic flow I showed on the previous slide, and this is what the error growth looks like at the initial time, or very early into the forecast. The leading singular vectors very effectively pick out the areas where these errors grow in this system. Actually, I should put that the other way around and say the errors in this system grow where the singular vectors suggest they ought to grow, because these are the most ticklish degrees of freedom, the most delicate degrees of freedom, the ones that will give you the most dangerous growth going forward in time.

The point of doing all these things, finding exotic or dynamically constrained ways of choosing initial conditions for a smallish ensemble, is that predictability is flow dependent. The uncertainty structure on a given day is going to vary from one day to the next. One day the region of uncertainty will be very large, showing very large uncertainty, whereas on another day you might see very tight correspondence of the ensemble members, and so very little error growth. So this region here is a very certain region for the 500 mb height field, whereas at a later time this region in here is a very uncertain region for the predictability. Singular vectors do a very good job of picking out that uncertainty.

They also do a very good job of spanning the kinds of weather that might exist on a particular day. These are different ensemble members from the European Center at six days. What you can see is very large deviations of the ensemble members from one another, showing very great forecast uncertainty.
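As an aside on the bred vectors mentioned above, here is a minimal sketch of the breeding cycle itself, again with the Lorenz-63 toy standing in for the forecast model; the model, the rescaling amplitude, and the cycle length are illustrative choices of mine, not the NCEP configuration. A control and a perturbed forecast are run side by side, and the difference is periodically rescaled back to a small amplitude; the structure that survives is the bred vector, the direction that has grown most over the recent past.

```python
import numpy as np

def lorenz63_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # one forward-Euler step of the toy "forecast model"
    x, y, z = state
    return state + dt * np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def run(state, nsteps):
    for _ in range(nsteps):
        state = lorenz63_step(state)
    return state

rng = np.random.default_rng(1)
control = run(np.array([1.0, 1.0, 1.05]), 2000)      # spin the control onto the attractor
amp = 1e-3                                           # breeding (rescaling) amplitude
pert = control + amp * rng.standard_normal(3)        # arbitrary initial perturbation

for cycle in range(50):                              # repeated breeding cycles
    control = run(control, 200)                      # advance the control forecast
    pert = run(pert, 200)                            # advance the perturbed forecast
    diff = pert - control
    bred_vector = diff / np.linalg.norm(diff)        # structure that grew over the cycle
    pert = control + amp * bred_vector               # rescale back to small amplitude

print("bred vector (leading grown direction):", bred_vector)
```

Back now to the six-day ensemble.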
This is the six-day forecast. You can see there's a wide variety of forecast realizations that might exist, so the ensemble has a good chance of being representative at that time. To show you a case where it came in handy in the medium range: here is a forecast of the famous Lothar storm at 42 hours. This is the deterministic prediction, and this was the verification, a very much deeper trough. As you can see, the ensemble generated not only troughs that deep but in some cases troughs even deeper, showing how strong and how uncertain this forecast was. It did give a hint of the kind of extreme weather that was possible at this particular time, and some extreme weather did indeed occur.

Two points I want to mention before I quit, because my time is just about over. It turns out that a method called random field perturbations works nearly as well as some of these dynamically constrained methods. To show you that, here is an initial forecast uncertainty field for the 500 mb height field constructed using singular vectors, random field vectors, and ensemble transform vectors, which are pretty close to bred vectors, together with the estimate of the initial uncertainty. You can see the random field method does a good job of picking up the initial uncertainty, and, in terms of the evolved ensemble forecast, it does a good job of picking up the actual error, the uncertainty of the forecast, at the later time.

However, one aspect that is deficient in almost all forecasts is that ensemble predictions in and of themselves are not representative: they don't do a good job of predicting the forecast error beyond about two days, so they're deficient in error variance beyond two days. This is also shown by a rank histogram diagram, in which the verifying atmosphere falls outside the range of the forecast too often; the most populated histogram bins are the wings of the distribution, showing there's not enough variance in the ensemble. The only way around that is to add some stochastic forcing, which is a whole topic for another talk, so I'll end there, stating only that some means of emulating model forecast errors is needed in order to get representative ensembles at the medium range and certainly out to the S2S range.

So I'm going to close with the following conclusions. Probabilistic prediction requires a solution to the model-dimension versus ensemble-size problem, and singular, bred, and random field vectors solve this problem fairly well. Error prediction is representative out to about two to three days, but not longer. The missing variance due to model errors must be accounted for by stochastic terms or perturbed parameters in the model, and large ensembles and ensemble data assimilation methods are being explored to help minimize the impact of these errors. All of my talk was focused on the atmosphere, but it goes without saying that these techniques need to be explored in the coupled model domain. So I'll end there and take some questions. Thank you.

Thank you so much, Joe. What Joe just did is summarize at least 60 years of the theory of predictability in chaotic systems, as well as of representing forecast uncertainty in ensemble prediction systems, in 30 minutes. Thanks so much, Joe.