So our next speaker is Matt Newman, and are you here, Matt? Can you share your screen? You're on mute. Yeah, I know. I have a t-shirt, by the way, that says "you're muted," but I shouldn't wear that today. OK. Sorry, give me a moment. I want to introduce you. So our next speaker is Matt Newman. Matt studies climate prediction and predictability on timescales ranging from weekly to decadal, with an emphasis on the use and diagnosis of empirical models constructed from both observations and the output of climate models. He's also known to like to have a cup of coffee before going to work, and we have had many interesting conversations over a coffee in the morning. And his favorite food is blackberries. Remember that. That's pretty good. Thanks. Thank you. I'm looking forward to your talk. Yeah, so I'm actually back in my office for the first time in over a year just for this talk. So I got all the cobwebs out and we're ready to go here. All right. So I'm going to be talking about characterizing predictable dynamics. You'll notice a little asterisk there, and that's because what I'm going to do is use what are called empirical dynamical models rather than what people are more familiar with in terms of physical dynamical models. So in Richard Seager's immortal phrase, these are physics-free models. But first, I kind of wanted to go back to this. This always drives me a little nuts. You've seen this picture a number of times already, I think, in the last week. And in particular, it's this idea of saying, well, here we are in this S2S time scale, and this is where the predictability is coming from: the MJO, land surface data, and other sources. And I can't help looking at this thinking that something is missing on that time scale. And perhaps that something is ENSO. Because even though we think of ENSO as seasonal, it turns out that we can actually expect higher S2S skill when ENSO occurs. And so here's an illustration of that point.
This is the week-four skill of the IFS, which is the European operational model — the one that was operational in 2017 — from 20 years of hindcasts for the winter. Week-four skill for geopotential height on the top and for surface temperature over North America on the bottom. And there's two things that are pretty clear here. Number one, most of the skill is during ENSO years. So all the other things are nice, but it's also good to remember ENSO is the big dog, and when the big dog barks, that's when there's predictability. But there is some skill in non-ENSO years, and it can occur in different places, obviously, than ENSO. And so what we're interested in more generally here is looking at a situation where the average skill is very low. These values, on average for a weekly average, are typically 0.2 to 0.4. And what we'd like to know is, can we do better? Can we identify when the forecast will have higher skill? And that's important because for most users, this level of skill, especially in non-ENSO years, is really probably not enough to pay a lot of attention to. It's got academic interest, but maybe not so much practical interest. So what I'm interested in, what I'm gonna be talking about then, is predictability on climate time scales, kind of going back to what Judith talked about. One way to think of predictability is it's the limit at which your forecast probability distribution function for day-to-day weather looks identical to the climatological distribution. At that point, you're just predicting climatology, and that's about two weeks for daily weather. This is why we're interested in taking longer averages. And so again, it's worth thinking that climate predictability is not really the same problem as weather predictability. It's sort of a statistical mechanics problem. We're really interested in predicting aggregates of daily weather.
We don't expect to predict individual events. So you can already think that a lot of what's gonna be relevant for predictability on these timescales is going to involve some averaging. Now there's two ways to do this. You could treat predictability as a model construct and use perfect model studies, which some people may have talked about. But then there's this more fundamental question about whether there are real limits even to S2S prediction, to seasonal predictability. And so there's been a body of work, going back particularly to Madden and Leith and others, trying to estimate from the observations essentially some slow predictable signal on this weekly-to-seasonal timescale relative to fast weather noise. And weather noise, remember, is unpredictable. There's always gonna be a lot of it. So these signal-to-noise ratios are going to typically be low, and that of course is related to the fact that we have such low skill on average. So how could we estimate the signal-to-noise ratio? This, by the way, is just an illustration of how you can derive skill: you can actually derive an expected skill given a signal-to-noise ratio. Basically, as the signal gets larger relative to some spread, you would expect to have higher skill. So again, you could do this in a model. You can get the signal from the ensemble mean of the model, and you can get noise from its spread. That's pretty straightforward, but there's a question about whether this is realistic. Obviously, if the model, for example, doesn't have enough spread, it could actually be overconfident. If it has too much spread, it could be underconfident, and in that case its actual skill could actually be better than its estimated predictability. So again, we wanna do this in observations. So the approach we're following is basically estimating natural variability as weather noise. You're trying to get an estimate of what the weather noise is, and then anything beyond that is signal.
This goes way back to the early 60s, Gilman et al. and a couple other papers. And as is always the case, lurking in the background is Ed Lorenz, who basically gave them the idea, apparently. And the idea is that they were just looking at daily weather, and they found even with daily weather that the memory of the weather was only on the order of a few days. And you could do a pretty good job just by modeling it as something called red noise — a very simple autoregressive-one, AR1, process — where you have some time scale of memory, and then otherwise the variability is just being forced by white noise. So white noise means equal power on all time scales, no memory effectively. Madden then tried to do something similar, a little more sophisticated, looking at seasonal averages, trying to see if the seasonal average variability might be higher than what you might expect just by averaging the daily red noise; he found a small difference and suggested that might be predictable. There have been more sophisticated techniques since, but again, it's always important to remember that you have to make some assumption about the noise. You have to figure out empirically how to separate the signal and noise. So what we're gonna do is come up with a framework where we can do this empirically. And it's fairly simple. We're starting with the idea that we have a highly non-linear system. And if you had a GCM, obviously, you'd have a resolved part of the non-linearity, and then you'd have some non-linear portion which is parameterized — sub-grid scale diffusion, so on and so forth. And then there's always actually a residual that's not even parameterized. So you're throwing away in all contexts some little bits of noise, which relates to what Judith talked about last week.
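The red noise idea above can be sketched in a few lines. This is a minimal AR1 simulation with toy parameters (not values from the talk), just showing that the process has a single memory timescale while the forcing is white:

```python
import numpy as np

rng = np.random.default_rng(0)

# AR1 "red noise": x(t+1) = a*x(t) + white noise, where a = exp(-dt/tau)
# sets the memory timescale tau. The few-day tau here is illustrative only.
tau, dt, n = 3.0, 1.0, 100_000
a = np.exp(-dt / tau)

x = np.zeros(n)
for t in range(n - 1):
    x[t + 1] = a * x[t] + rng.standard_normal()

# The lag-1 autocorrelation of the simulated series recovers the memory
# parameter a; at longer lags it keeps decaying exponentially.
r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
```

Averaging such a series over a season gives the null hypothesis Madden tested seasonal variability against.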
So what we're gonna do is coarse-grain the system, because again, remember, we're not trying to predict instantaneous values of weather; we're trying to predict some sort of aggregate. So we're gonna be taking some averaging, and as we do, what we wanna do is separate the slow part, which we're hoping is predictable, from the fast part, which may be chaotically non-linear and may be unpredictable on the time scales that we're interested in. The daily weather is not predictable on two-week time scales, but maybe we can get something on the three- and four- and five-week time scale. So we do what amounts to a Taylor expansion, and you'll get a linear term, obviously, a linearized term, but you'll also get some noise terms as well. This is the part that's kind of unpredictable. Now, you get other terms, higher order terms, and those higher order terms could be the deterministic, predictable non-linearity, but we're gonna ignore them and see how far we get. We can collect terms like this. You'll get a noise term here. You'll also get a noise term that has a linear state dependence, which we're going to ignore, mainly because we're interested in doing this empirically. It doesn't impact anything I'm gonna talk about; I may return to it at the end. The only real difference is it allows for non-Gaussian statistics with linear predictability. And so we end up with a rather simple equation. Essentially, we're just gonna try to predict all the dynamics as a multivariate linear system. So the key thing here is this x is a state vector. It represents all the variables in the system at all grid points as a function of time. And so what we wanna do is see how well we can do — how much of the predictability of x can we capture in this fairly simple linear way?
And then given that, it also gives us some idea of what the predictable non-linearity that we've ignored is contributing to the system. So what does this linear operator represent? First, it's worth stressing: it's not a linearization. We're not linearizing the system. We're not assuming that the non-linear term is small; we're allowing for a potentially large non-linear term. What we're saying is that the time scale of the non-linear term is small. And that's what allows us to coarse-grain the system: we have relatively slow linear dynamical time scales, and the chaotic non-linearities are pretty fast. So on the time scales that we're trying to predict on, the weekly, monthly time scales, the non-linearities, while they provide a tremendous amount of variability, maybe don't provide so much predictability. So we can parameterize their effects. You can think, for example, of synoptic eddies feeding into a block to maintain it: the block is your slow time scale, and the synoptic eddies feeding in are fast. And it's that flux the eddies are providing which is allowing the block to maintain itself. So again, there's that kind of time scale separation. Barotropic versus baroclinic dynamics — that's also kind of a natural time scale separation. Another way to think of this, that Cecile Penland likes to use, is to think of it as the dynamical version of the central limit theorem. Again, we're doing aggregates. This is the key thing here. We're averaging over a lot of individual fast events, looking for some slower envelope of those events. And as you average events, even in a highly nonlinear system, you tend to make it more Gaussian. And when you make it more Gaussian, you make it more linear. It's also important to remember that the linear dynamics — again, this is a multivariate system — that operator is asymmetric. And it's asymmetric because the system is asymmetric.
If you have a shear in the system, then obviously you're gonna have a different advection time scale at one location than another. So the location matters. And also, different variables interact differently. Wind blowing on the ocean, for example, will drive changes in the sea surface height, but there's no sea surface height variable in the wind momentum equation. And so again, that kind of asymmetry allows the dynamics to be asymmetric. And that's gonna be important, because it means that when you look at the dynamical modes of the system — when you do an eigenanalysis of L — those eigenmodes are not orthogonal. And as a result, they can evolve in such a way that they cover up each other and then uncover each other. And that can give you transient anomaly growth even in a stable system. Okay, so in principle we could imagine deriving this system from first principles, but that would be kind of hard. But under this assumption — if we're in a system where the nonlinearities are mostly fast, so that on a slower climate time scale they're essentially unpredictable and their effects can be linearly parameterized; these are assumptions which we wanna test — then we can empirically model the system in this linear way, where x is some state vector representing a series of maps, remember, and then it's being forced by some white noise. The S indicates that it could also have some spatial structure — no temporal structure, though. And again, we can do this because, in principle — this is what I just said before, sorry — we can actually infer this in an inverse sense. If we have a system of this type, it implies a relationship between the lag covariability and the zero-lag covariability of the data. And so we can derive this linear operator from the data, just from the covariance statistics, just like we can do for a univariate AR1 system. It's exactly analogous.
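That inverse step can be illustrated with synthetic data. This is a sketch, not the operational code: I simulate a two-variable linear system with a known (asymmetric) operator, then recover the operator from the zero-lag and lag covariances via L = τ⁻¹ ln[C(τ) C(0)⁻¹]. All numbers are made up for the demo:

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(1)

# A known stable but asymmetric (non-normal) operator -- toy values.
L_true = np.array([[-0.2, 0.6],
                   [0.0, -0.4]])

# Simulate dx/dt = L x + white noise (Euler-Maruyama stepping).
dt, nsteps = 0.01, 400_000
x = np.zeros(2)
data = np.empty((nsteps, 2))
for i in range(nsteps):
    x = x + dt * (L_true @ x) + np.sqrt(dt) * rng.standard_normal(2)
    data[i] = x

# Inverse step: G(tau) = C(tau) C(0)^{-1}, then L = log(G) / tau.
lag = 100                       # tau = 1.0 model time unit
tau = lag * dt
X0, Xt = data[:-lag], data[lag:]
C0 = X0.T @ X0 / len(X0)        # zero-lag covariance
Ct = Xt.T @ X0 / len(X0)        # lag-tau covariance
L_est = logm(Ct @ np.linalg.inv(C0)).real / tau
```

Up to sampling error, `L_est` recovers `L_true` — the same calculation, done in a truncated EOF space of observed anomalies, is the LIM.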
And then the noise statistics — which I'm not gonna talk about today, although they're very important — you get from a balance relationship. Now, from a practical standpoint, this linear inverse model is computed in a low order space. It involves, effectively, an inversion of a covariance matrix. So we have to truncate into a low enough EOF space, a low enough dimension, that the inversion is tractable and we don't get very large errors. And one of the things that means is that the LIM is a low order model. It's on the order of tens of degrees of freedom, maybe a hundred, as opposed to the millions of degrees of freedom that you get in a GCM. That makes it a lot easier to run in a forecast sense. You can run forecasts with the LIM in basically less than a minute. Now, we're assuming that this is a good fit for the observations, and we can test that with something called the tau test. There are various ways to do this, but typically what we're looking for is that it doesn't really matter what lag we choose to derive this L. Essentially, we could look at a one-month lag covariance, or a two-month lag covariance, or a three-month lag covariance for monthly data, and in principle the linear operator should always be the same. There are other tests one can do. Basically, one can look at the spectra — one can compute, for example, a one-month lag covariance for the tropical Pacific and then test whether the resulting LIM gives spectra out to decadal timescales that match observations. To the extent that it does, and typically it does, the system is acting in a fairly linear way over month to month, year to year. I'm not gonna show that here. So how do we do this? How do we make forecasts? Well, this is a pretty simple equation. You can just integrate it forward.
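The logic of the tau test can be shown exactly with made-up matrices: if the data really come from a linear system, the lag covariance at any lag τ is exp(Lτ) C(0), so the operator backed out at different lags must agree. The L and C(0) below are arbitrary illustrative values:

```python
import numpy as np
from scipy.linalg import expm, logm

# Hypothetical operator and zero-lag covariance for the demo.
L = np.array([[-0.3, 0.5],
              [0.1, -0.6]])
C0 = np.array([[1.0, 0.2],
               [0.2, 1.0]])

# Tau test: back out L from the lag covariance at several lags; for a
# truly linear system the recovered operator is lag-independent.
max_diff = 0.0
for tau in (1.0, 2.0, 3.0):
    Ct = expm(L * tau) @ C0                         # model-implied lag covariance
    L_est = logm(Ct @ np.linalg.inv(C0)).real / tau
    max_diff = max(max_diff, np.abs(L_est - L).max())
# max_diff sits at machine precision: the tau test passes by construction.
```

With real data, sampling error and genuine nonlinearity make the recovered operators differ; the test is how much.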
Obviously, if you ignore the noise, then you have an ensemble mean forecast that can be computed very simply. Because this is an empirical technique, we have the usual problem of having enough training data. In practice, the easiest way to handle that is something called cross-validation. We typically take out 10% of the data, compute the linear operator — the LIM — and then use that LIM to make forecasts, to make hindcasts rather, for the missing 10%, and cycle through. So obviously that's different than in a GCM, where you just run the initial conditions forward for the model for, say, the last 20 or 30 years. Now, again, remember that the noise in this case is independent of the state of the system. So that means that when we do probabilistic forecasts with the LIM, all we're really looking at is a shift of the ensemble mean. We're predicting some sort of signal, and that signal is going to give a shift, and so you're going to see a change in the tail probabilities simply because of the mean shift. That actually allows you to look and see, well, how good a forecast system is this? If I compare it to a system which could potentially have changes in spread, are the probabilities better or worse or the same in the LIM as compared to a system where I could potentially have changes in the width of the PDF as well as in its position? So again, here's the complete forecast system. Now, we do want to consider that spread, so we're going to have an integrated noise term — we'll get into the details of that. This is essentially the forecast system including the ensemble. Each noise realization will give you a different member here, and again, you'll end up with a Gaussian distribution. For any given forecast, there is this forecast error, but that forecast error is only a function of the lead time. It's not a function of the state of the system. That means there is no spread-skill relationship.
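Here's a toy version of that point, with a hypothetical 2-variable operator and unit white noise: two very different initial states give different ensemble means, but (up to sampling) identical ensemble spreads at the same lead, because the noise never sees the state:

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(2)

L = np.array([[-0.3, 0.5],
              [0.1, -0.6]])     # illustrative stable operator

def ensemble_forecast(x0, tau, nmem=2000, dt=0.01):
    """Propagate nmem members of dx = L x dt + dW from the same x0."""
    members = np.tile(np.asarray(x0, float), (nmem, 1))
    G_dt = expm(L * dt)                       # one-step deterministic propagator
    for _ in range(round(tau / dt)):
        members = members @ G_dt.T + np.sqrt(dt) * rng.standard_normal(members.shape)
    return members

m_a = ensemble_forecast([2.0, -1.0], tau=1.0)
m_b = ensemble_forecast([0.1, 0.3], tau=1.0)

mean_a, mean_b = m_a.mean(axis=0), m_b.mean(axis=0)    # state-dependent shift
spread_a, spread_b = m_a.std(axis=0), m_b.std(axis=0)  # lead-time-only spread
```

The means differ substantially while the spreads agree to within sampling noise — the LIM's whole probabilistic forecast is a shifted Gaussian.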
So, to the extent that the LIM predictability is useful, it suggests that there may not be a spread-skill relationship on these time scales. So what is predictability in this context? Well, the nice thing about this is that I know what my forecast signal is — that's this term right here — and I know what my forecast noise is — those are the statistics I have here. So I can basically determine a signal-to-noise ratio at every forecast time. And given that signal-to-noise ratio — this is the actual math, but without getting into the details — I can use that predicted, expected signal-to-noise ratio to give me an expected forecast skill. And so I'm going to use this rho-infinity, as we like to call it, which is basically in this case the predicted anomaly correlation — one can derive other metrics from signal-to-noise — this predicted anomaly correlation, to stratify forecast skill. So I'm looking for forecasts where I predict that my skill is going to be higher, and I look at other forecasts where I predict that the skill is going to be lower. And I want to compare those predictions to what actually happened, both for the LIM and for operational numerical GCMs. So this is a calculation like that for ENSO. This is a LIM using tropical anomalies of SST, sea surface height, and wind, comparing it to the NMME — I'm looking at an ensemble mean of eight operational models — and we're looking at that for 1982 to 2010. So here on the left, I'm showing the month-six forecast. Along the equator, I'm showing a skill score. It's sort of an error measure, but basically a score of one would be perfect. And the blue line here is the LIM, and the red is the ensemble mean of the eight models; the little lines here are the individual models. The green is the predicted spatial variation. And so the first thing you can see is that in general, the predicted spatial variation is being mimicked by the LIM.
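The relationship behind that expected skill can be checked numerically. In this sketch (arbitrary signal and noise amplitudes, not from the talk), the verification is signal plus unpredictable noise and the forecast is the signal alone; the anomaly correlation then converges to S/√(S²+N²):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical signal and noise standard deviations.
sig, noi = 1.0, 2.0
n = 200_000

signal = sig * rng.standard_normal(n)           # the predictable mean shift
truth = signal + noi * rng.standard_normal(n)   # verification = signal + noise

actual_corr = np.corrcoef(signal, truth)[0, 1]
expected_corr = sig / np.hypot(sig, noi)        # S / sqrt(S^2 + N^2)
```

Because the LIM predicts its own signal and knows its noise statistics, it can compute this expected correlation at forecast time, before verification exists — that's what lets it flag forecasts of opportunity in advance.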
The LIM is basically picking up that same structure. Now we can also look in time and see how the evolved pattern correlation does. And what we find again is that there's a very high correlation between the LIM's year-to-year variation in skill, in blue, and the NMME ensemble mean's actual forecast skill. The correlation at month six is 0.8. And again, both of the variations are basically being predicted by the LIM. So the LIM is able to predict spatial and temporal variations in its own skill, and that actually seems to also match the spatial and temporal variations that we see in GCMs. All right, so that was seasonal. Now we'll switch to a sub-seasonal time scale. So this LIM is a little different. We're constructing this out of a lot of atmospheric variables; in particular, vertically integrated diabatic heating is a component of the LIM state vector. And we're gonna compare that to two operational models: the ECMWF model, the IFS, which was operational in 2017, and the CFS, which is actually operational now. And again, it's key to remember both of these are bias corrected, so they're already somewhat empirical. If you look at the skill of these models without bias correcting, the error is gonna be quite a bit bigger. All right, so here's the weeks 3-4 skill and the weeks 5-6 skill for mean sea level pressure for the 1999 to 2010 period, because that's all we have for the CFS. And again, you can see that in all three models there's a similar pattern of skill: a very clear maximum in the North Pacific, and a secondary maximum, or another maximum, in the Atlantic. And these maxima match very well; regions of lower skill are also lower in both the models and the LIM. The LIM typically has poorer skill when it's bad. So there's a tendency for the LIM that when its skill is low, its skill is really low. But when its skill is high, it's typically comparable with the coupled models.
And you can sort of see that here. But again, the LIM is basically capturing the spatial distribution of regions of high and low skill. Now we can also look at that temporally, and this is, in a way, the central figure of this talk. The LIM is identifying what we would call forecasts of opportunity. In other words, we look in the LIM and find the 10% of the time when the LIM is predicting that its skill is gonna be highest — so at the time of forecast, the LIM is saying this is gonna be one of my higher-skill forecasts. We can then look at what the LIM's actual skill is in those scenarios, and that's these dark orange bars here. And we can compare that to the LIM's skill when it's predicting low skill. And you can see it's definitely getting that stratification, even in the first week. And you can see that stratification as you go on through; obviously, there's a lot of sampling issues with 12 years of hindcasts. What's nice is that the LIM is also capturing the IFS, and to a lesser extent the CFS, variation in skill. So again, the LIM is picking out the higher IFS skill versus the lower IFS skill. And again, here's our higher IFS skill and lower IFS skill. So the LIM is not predicting just its own high-skill cases; it's predicting the model's high-skill cases. And this is a better metric, as it turns out — you can look in the paper if you're curious — than a spread-skill relationship. If you go into the IFS ensemble and try to identify high and low skill cases based on the spread-skill relationship, you won't get nearly this sort of separation. So again, it's basically suggesting that to the extent that the LIM is capturing the variability — sorry, capturing the predictability — of the real system, most of the predictability is just coming about because of this mean shift in the signal, not so much because of changes in spread. That's in the Pacific.
This is the NAO. The black line is the IFS skill. Now, actually, we've captured about the top 15% here. And so that black line represents the IFS skill for the NAO in the 15% of the cases when the LIM is predicting higher skill. And the orange is the LIM's own skill in those circumstances. And that skill is considerably higher than in all the remaining 85%, when the LIM is predicting lower skill — here's the IFS and here's the LIM. If you remember, like I said, when the LIM is bad, it's awful. Part of this has to do with the low order truncation of the LIM: we're not representing the entire system, but we're representing the part that's largely predictable, which is all we really care about on these timescales. So where is this coming from? How does the LIM do this? Is it magic? No, it's pretty simple. If you go back to what I was talking about earlier, this linear dynamics, like I said, is non-normal. What that means is that initial conditions, depending on how they're oriented, can extract energy from the system, from the basic state of the system, but they can only do that for finite time periods. They're not exponentially unstable. These initial conditions can initially be in a position where they can extract energy, but they evolve. And as they evolve, they change their shape and their pattern, and they evolve in such a way that they are no longer able to grow, and then they begin to decay. So these are all transient events, but these transient events can last for weeks, and that allows a lot of predictability. So for example — without, again, getting into the details of how you compute this — if this is the initial 250 millibar stream function pattern and this is the initial heating pattern in the LIM, 14 days later, this is the pattern that will evolve. This is essentially kind of a combination of ENSO and MJO heating.
Basically, the MJO heating starts kind of out of phase with the ENSO heating and ends up being in phase with it. And as that evolution occurs, there's a large amplification of the stream function anomaly. But if you were to look at, I don't know, day 40 or 50 — I forget exactly when — this will have decayed so that it is now less than it was initially. But that transient growth is important, because when it occurs, that means we're getting a larger signal. And so our signal-to-noise ratio can be particularly large if initial conditions have a particularly large projection on this pattern, or potentially this pattern or this pattern. And that's what's shown here. These little stoplight plots are showing the various cases when there was a large projection on one and/or two and/or three — so all three, just one and two, one and three, and so on. And again, these are stratified by three things. First, the black bar shows the LIM predicted skill in those cases. So you can see this is basically working: as the initial condition projects more on these initial patterns, the initial optimal structures, you get more signal growth, and so the expected skill is higher. The red shows the actual skill. So again, we're predicting the variation in skill. And this is kind of an old model — the MRF from 1998. It's somewhat worse than the CFS is today. But even in that case, we were still predicting the skill stratification of the actual model as well. Hey, Matt, could you wrap up in the next three minutes? Yeah, I am. I've got a clock and I'm almost on schedule. So this is just kind of an illustration of what this non-normal amplification looks like. Essentially, again, like I said, the eigenmodes of the system are non-orthogonal. They're not like EOFs. You can't partition the variability into the eigenmodes. One can't really talk about so much variability in this eigenmode and so much variability in that eigenmode, and add it up — it doesn't work that way.
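A minimal numerical sketch of this transient growth, using a made-up stable but strongly non-normal 2×2 operator: every eigenmode decays, yet the optimal initial condition (the leading right singular vector of the propagator) amplifies for a while before decaying:

```python
import numpy as np
from scipy.linalg import expm

# Illustrative stable, non-normal operator: both eigenvalues are negative
# (every mode decays), but the matrix is far from symmetric.
L = np.array([[-0.1, 2.0],
              [0.0, -0.3]])
assert np.all(np.linalg.eigvals(L).real < 0)

# Optimal initial structure for growth at lead t_opt: the leading right
# singular vector of the propagator G(t_opt) = exp(L * t_opt).
t_opt = 5.0
G = expm(L * t_opt)
_, _, vt = np.linalg.svd(G)
x0 = vt[0]                       # unit-norm optimal initial condition

# Track anomaly amplitude: it grows transiently, then decays below its
# initial value -- growth without any exponential instability.
times = np.linspace(0.0, 30.0, 61)
amp = np.array([np.linalg.norm(expm(L * t) @ x0) for t in times])
peak = amp.max()
```

The amplitude starts at 1, peaks well above it near t_opt, and ends below 1 — the same cover-up-then-uncover interference the eigenmode plots are showing.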
They cover each other up and then uncover each other. So for example, a lot of ENSO can be understood as this so-called four-year eigenmode — which is similar to theoretical eigenmodes that have been determined — and this two-year fast SST mode, which initially cover each other up. So the SST anomaly is pretty small, but here's this heat content anomaly. Nine months later, this mode has evolved, but somewhat slowly; this one has evolved quickly, changed sign, and decayed, and you end up with the SST amplification. Similarly, in the atmosphere, for the PNA for example: if this is the initial 200 millibar stream function anomaly, that seems to be a consequence of a structure that looks like this, which I'll just call internal — it's unrelated to SST, basically — and then here is a component which is related to SST, basically related to ENSO. So these two are covering each other up. You have a small initial stream function; meanwhile, the SST and heating in the tropical Pacific are large, and they're driving this growth, so that after 15 days there's a large amplification of the PNA. You'll notice this one has entirely flipped sign, while this one has stayed almost the same. So this is evolving more slowly, this is evolving quickly, and it goes from destructive interference to constructive interference, giving us transient growth and predictability. Okay, I'm off by a minute, not too bad. All right, so to conclude then: the predictable S2S variations, what we see, is that they are largely driven by these linear but non-normal dynamics. And we can make that claim because we've tested it. A low order linear model in fact reproduces, to a very good approximation, the GCM ensemble mean skill, and largely predicts both the LIM and operational model spatial and case-to-case variations in skill. Which means that for predictability, although there can be certain preferred states, those preferred states are being arrived at through linear dynamics.
They're largely being driven because certain initial conditions, if they're forced basically randomly, will evolve into particular states — those particular growing structures that give rise to more predictable signals. Which also means, because we're doing this in a LIM, that the initialization only needs to be correct in a fairly small subspace. So it's interesting: we want to try to get a perfect initial condition, but a lot of what we're getting in the initial condition on these longer time scales just generates noise. It does not necessarily yield predictability. So again, while these empirical models are physics-free, they do constrain physical dynamical models. We do want to be thinking about how you go from these highly nonlinear physical systems and arrive at a system where most of the predictability is largely linear. And I just want to add one thing — this is sort of a postscript, but it also says something important about S2S forecasts in general, because it's again useful to remember that on average, S2S forecasts have low skill. We're fighting against the weather noise, which is not predictable. And even in the best locations, the average skill is generally below what is considered to be useful. And that suggests the importance of this predictability problem, because while there are some users who could use small shifts of probability — so an anomaly correlation of 0.2 gives you a bit of a shift; maybe that's still useful — there are a lot of users, I would say most users, for whom most decisions are binary. And that's particularly true if you're not particularly rich. You don't usually get to come back from a bad forecast if you're not very rich. If you're a ski area and you have a low snow year, you can survive it. So these issues are important.
And so to me, it really suggests the importance — not just from a scientific perspective but from a societal perspective — of trying to identify these relatively few forecasts of opportunity and being able to confidently identify them ahead of time, so that we can go to people who need forecasts and say, this is a forecast that you can trust. And I'll stop there. Thank you very much. Thank you, Matt.