So we have our next plenary speaker, Arun Kumar from NCEP, who will be giving the next talk. Arun, whenever you're ready, you can share your screen. Arun will be talking about assessing predictability and prediction skill on S2S time scales. And I'll give you a two minute warning before 20 minutes, Arun. Okay, let's get you to see the screen. Yeah, it's full screen.

Thanks, Aneesha, and thanks to the organizers for inviting me to give this presentation. I work in the Climate Prediction Center, which is part of the U.S. National Weather Service. We are an operational center; we produce a lot of long range forecasts on sub seasonal and seasonal time scales. I picked the topic of predictability and prediction skill on S2S time scales because it's very important for us, both in quantifying the prediction skill of our various forecasts and in trying to understand what the predictability limits on a particular time scale would be. I will mainly give examples focused on the seasonal timescale, but the same methodologies and issues also exist for the sub seasonal timescale. So whatever I say in the context of seasonal predictions, you can translate into the context of sub seasonal predictions.

There are two different aspects of long range predictions on the S2S timescale. One is knowing what the prediction skill is. For example, CPC makes operational forecasts of seasonal mean surface temperature and precipitation anomalies, and we put these forecasts out every season. Users would like to know what the average skill of our seasonal predictions is, so it's important for us to do an objective assessment of the quality of the series of forecasts we have made. These assessments of prediction skill could be for operational predictions or for individual prediction tools. Prediction skill is a property of the prediction tool you are using, be it a dynamical model or an empirical prediction tool.

The other aspect of long range forecasts, or any forecast, is the predictability limit: how much, on average, you can predict or explain of the variance of a time series on a particular timescale. Predictability depends on the spatial location and on what time average you're looking at. It's a property of nature; it's not a property of the prediction tool you're using to provide the forecast. The connection between the two is that predictability is the upper limit of the prediction skill, of what the predictions can achieve with whatever forecast tool you have. Given the predictability limit, prediction skill is a realization of predictability using various forecast tools; again, they could be dynamical models or empirical prediction systems. Predictability is important in the sense that you get an estimate of how much room there is for improvement, and you can set the expectations of the user community in terms of how much these forecasts are going to improve. These are fairly controversial issues. You can estimate predictability using various approaches, and these efforts go back more than 40 years now. Here I am showing a screenshot of the paper by Roland Madden.
In fact, he made a first estimate in 1976 for the monthly mean temperature anomalies over the US. So these efforts go far back in time, but predictability still remains a controversial quantity to estimate.

To set the context, I'll give you a couple of examples of prediction skill. CPC, the Climate Prediction Center, has been making operational forecasts of seasonal surface temperature and rainfall starting in January 1995, so we have about a 25 year history of these archived operational forecasts, and you can assess how the skill varies in time or in space. This is an example of the time series of the skill of the seasonal surface temperature forecasts from 1995 to 2020. I'll just focus on the black curve; ignore the red curve for now. One panel is the time series of a particular skill measure we use, called the Heidke skill score, from 1995 to 2020, and the small inset panel is the 11-month running mean of that time series. The property of this skill measure is that if the forecast is perfect, the score is 100; if the forecast is the same as climatology, the score is zero; and if the forecast is wrong everywhere, the score is minus 50. The average of the black line is 14.7. The best this number can get is 100, as for an anomaly correlation, and this average seasonal forecast skill of 14.7 over the 25 year period is much below that maximum. So it is a fairly well established fact that the skill of long range forecasts, on both sub seasonal and seasonal time scales, is fairly low compared to the skill of weather predictions. You also see that there are lots of ups and downs: sometimes the skill is high, other times it is low. If you just focus on the black line, sometimes it goes up, sometimes it gets close to zero; there is a lot of variation in the skill of the individual forecasts. The message I want to convey is that the average skill over this 25 year period is about 15 out of a possible maximum of 100.

The same skill can also be shown as a spatial map. The skill for a particular time compares the forecast map over the US against the observed anomalies over the US, but you can also compute the skill measure in time and produce a spatial distribution, which is shown in the left panel here. This is the same skill score for the surface temperature over the US, evaluated over 25 years; the red colors are better skill, the yellowish colors are smaller skill. In general, you can see the spatial distribution of the skill of the surface temperature forecasts over the US going back 25 years. There are some regions where the skill is higher and other regions, like the Midwest and upper Midwest states, where the skill is lower based on the assessment over 25 years. The typical color range is from oranges to reds, and the skill over the better regions is about 25 to 30, which is again fairly low compared to what the maximum could be.

This is not just the case for the operational forecasts; you can look at dynamical forecasts also, an example of which is shown in the right hand panel. This is for the seasonal forecast system running at NCEP, called CFS version 2, and these are the hindcasts from 1982 to 2010.
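As a concrete illustration of the skill score properties just described, here is a minimal sketch in Python of a three-category Heidke-style score, assuming equally likely categories; the function name and the toy data are illustrative assumptions, not CPC's operational code.

```python
import numpy as np

def heidke_skill_score(forecast_cat, observed_cat, n_categories=3):
    """Heidke skill score for categorical forecasts, in percent.

    With n_categories equally likely classes (e.g. below/near/above
    normal terciles), the number of hits expected by chance is T/n.
    Perfect forecasts score 100, chance scores 0, and an all-wrong
    forecast scores -100/(n-1), i.e. -50 for three categories.
    """
    forecast_cat = np.asarray(forecast_cat)
    observed_cat = np.asarray(observed_cat)
    total = forecast_cat.size
    hits = np.sum(forecast_cat == observed_cat)
    expected = total / n_categories          # hits expected by chance
    return 100.0 * (hits - expected) / (total - expected)

# Example: tercile forecasts (0 = below, 1 = near, 2 = above normal)
fcst = np.array([0, 2, 2, 1, 0, 2])
obs  = np.array([0, 2, 1, 1, 2, 2])
print(heidke_skill_score(fcst, obs))  # 4 hits out of 6 -> 50.0
```

With three equally likely categories the chance expectation is one hit in three, which is what pins climatology-level forecasts at zero and an everywhere-wrong forecast at minus 50.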
For these CFSv2 hindcasts, you can look at the monthly mean rainfall, do an assessment of its skill over this period, and what's shown is an anomaly correlation. In the color scale here, the yellowish colors go from about 0.1 to 0.2 and the brownish colors are somewhere between 0.7 and 0.8. So again you can see a regionality in the skill, and in general, except for the very few places where the ENSO signal is strong, the skill is fairly low.

Those are some average assessments of the skill; let me give you a more specific example. This is a forecast made by CPC for the winter of DJF 2015-16, which was one of the biggest El Niños in the historical record. The SST anomaly is in the top left panel, and our forecast is shown in the right panel. The green colors indicate an enhanced chance of above normal rainfall anomalies, and the yellow and brownish colors indicate a chance of below normal anomalies. Those who are familiar with the ENSO signal will recognize that this is very much like the composite ENSO signal over the US. That was the prediction; what actually happened, what was observed for DJF 2015-16, is in the bottom left panel. In fact, instead of an above normal rainfall anomaly there was below normal rainfall over Southern California, shown in the reddish colors, and there was an above normal rainfall anomaly over Oregon and Washington along the northern west coast of the US. So the observations were almost opposite to what the forecast was. Given that this was one of the biggest El Niño years, you would expect the forecast to be better, and we had a fair bit of trouble verifying and communicating this forecast to the users.

That is an example of an individual forecast, and you can do the same for other years or other individual cases, but the bottom line is that there have been a lot of assessments of long range forecasts, be they on the sub seasonal or the seasonal timescale, and the general impression is that the prediction skill is fairly low. This raises a fundamental question: is the low skill an artifact of model biases? As you heard in the previous talks, there are issues with the models, and those might lead to a realization of prediction skill lower than the predictability. Or is it because of the natural constraint imposed by the predictability? The answers to these questions, on both sub seasonal and seasonal timescales, are fairly important.

You can use various tools and various datasets to understand the reasons for low prediction skill. You can look at multiple models, various dynamical systems or empirical prediction systems, assess the skill, and see if there is general agreement in the amplitude and the spatial variability of the skill. If every method starts to give you the same pattern and the same amplitude, then you might start to believe this is more because of the natural constraints on the predictability. To estimate the predictability and go beyond the prediction skill, you can also use modeling systems and multi model prediction systems. I'm going to give you some examples of assessing the prediction skill and the predictability of California rainfall during DJF based on the North American Multi-Model Ensemble (NMME). The models used are listed here; the details are not important, but there are seven different models.
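The anomaly correlation used in the hindcast skill maps above can be computed with a short routine like the following sketch; the array shapes and the synthetic stand-in data are assumptions for illustration.

```python
import numpy as np

def anomaly_correlation(fcst_anom, obs_anom):
    """Temporal anomaly correlation at each grid point.

    fcst_anom, obs_anom: arrays of shape (years, lat, lon) holding
    anomalies relative to each dataset's own hindcast climatology.
    Returns a (lat, lon) map of correlations.
    """
    f = fcst_anom - fcst_anom.mean(axis=0)
    o = obs_anom - obs_anom.mean(axis=0)
    cov = (f * o).sum(axis=0)
    denom = np.sqrt((f**2).sum(axis=0) * (o**2).sum(axis=0))
    return cov / denom

# Illustrative use with random data standing in for 1982-2010 hindcasts
rng = np.random.default_rng(0)
obs = rng.standard_normal((29, 20, 30))
fcst = 0.4 * obs + rng.standard_normal((29, 20, 30))  # weak "skill"
ac_map = anomaly_correlation(fcst, obs)
print(ac_map.shape, ac_map.mean())
```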
Each of these seven models has a different ensemble size for its seasonal predictions, so together they form a fairly extensive data set which can be used for assessment of prediction skill and predictability on the seasonal timescale. For the sub seasonal timescale you can use similar multi model databases, such as the S2S project hosted by ECMWF or the SubX project running in the US.

A very basic first step is to take all seven models, check their climatology over California, check the standard deviation of the rainfall over California, and look at the regression between Niño 3.4 SSTs and the rainfall anomaly over California. What's shown is just a vertically stretched display of the grid points along the California coast, indicated by the small black box in the top right panel. The message, by and large (the last panel is the observations), is that all the models have a basic sense of what the climatology of the rainfall is, what the standard deviation of the rainfall is, and what the ENSO regression or ENSO response over California is, although they all have different depictions of it. It depends on your viewpoint whether the glass is half full or half empty: you can believe or criticize these models on an individual basis in terms of their biases. If you look, for example, at the regression pattern in the observed anomalies, when an El Niño is present there is higher rainfall over the southern west coast, over California, and lower rainfall over Oregon and Washington, and almost all of the seven models replicate those patterns over the northern and the southern coasts of the western US. So this provides a basic assessment of what these models can do individually, and you can also form a multi model ensemble mean and look at what its skill or characteristics might be.

Hey Arun, you have five minutes including questions.

Yeah. So you can look at the signal-to-noise and the skill of these forecasts. The top panels here show the skill of the individual models in terms of anomaly correlation; just look at the last panel in the top row, which is the skill of the NMME. The message here is again that the skill is somewhere between 0.4 and 0.5 over the western coast, so the skill across all the models, or for the average of all the models, is fairly low.

You can also look at individual events. What we did was to pick 11 El Niño years between 1982 and 2015, ranked in terms of the amplitude of the Niño 3.4 anomaly; the last panel is the composite for all these events, and the bottom panel is the observations. The NMME ensemble mean forecast is very consistent across all 11 years: it's drier over the northern west coast and wetter over the southern west coast of the US. But the observed anomalies are all over the place: some events are opposite to the composite, some are of the same sign as the composite. The last three panels are the observed anomalies for 1982-83, 1997-98, and 2015-16, which were the three strongest El Niño years; they share some resemblance, but they can be very different. Another way to look at this is to examine the consistency among the models across all 11 El Niño years, so again just focus on the two panels at the right: the rightmost panel at the top is the observations, and the bottom one is the NMME.
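The signal-to-noise diagnostic mentioned above can be sketched roughly as follows: treat the ensemble mean as the predictable signal and the spread of members about it as noise. This is a minimal illustration with synthetic numbers, not the NMME processing chain; the shapes and variances are assumptions.

```python
import numpy as np

def signal_to_noise(ens):
    """Split an ensemble hindcast into signal and noise.

    ens: array of shape (years, members) for one grid point.
    Signal variance: variance of the ensemble mean across years.
    Noise variance: average spread of members about that mean.
    """
    ens_mean = ens.mean(axis=1)                    # signal time series
    signal_var = ens_mean.var(ddof=1)
    noise_var = ens.var(axis=1, ddof=1).mean()     # within-year spread
    return signal_var / noise_var

rng = np.random.default_rng(1)
years, members = 29, 24
signal = rng.standard_normal(years)[:, None]       # common predictable part
noise = 2.0 * rng.standard_normal((years, members))
print(signal_to_noise(signal + noise))  # roughly 0.25, up to sampling noise
```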
The color scheme in these consistency panels is that if all seven models agree in the sign of their anomalies, the color is either dark brown or dark green. While there is very good consistency among the NMME forecast systems, there is very little case by case consistency in the observations. What this is saying is that the NMME forecasts for all 11 El Niño years predict below normal rainfall anomalies over the northern part of the US west coast and above normal rainfall anomalies over the southern part, and these predictions are very consistent across the different El Niño years; the same kind of consistency does not exist in the observations, where there is a lot more variability. There is a lot of variation in the observations compared to the variability within the forecasts, and that is the basic reason the skill is so low for long range forecasts: there is a lot of noise in the system which cannot be predicted. So this indicates that, for long range forecasts, the low prediction skill might be because of low predictability limits.

It also depends on which region you're looking at. If you were to do the same kind of analysis over the southeast US, the prediction skill would be higher; the consistency among the ensemble forecasts would be about the same as what we're seeing over the western coast, but the consistency among the observations across different El Niño years would be much higher than we're seeing over the western US. So doing this analysis regionally also gives you a basic assessment of what the spatial variability in the predictability would be.

Given all these wonderful data sets we have for the seasonal forecasts and for the sub seasonal forecasts, you can do a lot of fundamental or basic science, looking at assessments of the forecast skill and at the predictability, and these assessments start to give you some estimate of what the predictability limits might be. It seems like the low prediction skill might be because of low predictability limits, but that's not guaranteed. All these estimates are model based, so there is a lot of room for erroneous conclusions: models may share common biases, all have errors, so they might all be giving low estimates, or some key physical processes or aspects of climate variability may not be represented in these models. We just have to keep repeating these assessments as the newer generations of models come along.
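The model-agreement count behind a consistency map of this kind can be sketched as below; the array shapes and the random stand-in data are assumptions for illustration.

```python
import numpy as np

def sign_agreement(anoms):
    """Count how many models share the majority sign at each point.

    anoms: array of shape (models, lat, lon) of ensemble-mean
    anomalies, one map per model. Returns a (lat, lon) count; a value
    equal to anoms.shape[0] means all models agree in sign.
    """
    pos = (anoms > 0).sum(axis=0)
    neg = (anoms < 0).sum(axis=0)
    return np.maximum(pos, neg)

rng = np.random.default_rng(2)
maps = rng.standard_normal((7, 20, 30))  # stand-in for 7 NMME models
agree = sign_agreement(maps)
print((agree == 7).mean())  # fraction of points with unanimous sign
```

The same count can be applied to the 11 observed El Niño years in place of the 7 models, which is the comparison that exposes the inconsistency in the observations.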
We build up a history of these assessments, and every 5-10 years we do a systematic evaluation of what the prediction skill and predictability limits are. Going beyond that raises a lot of interesting questions. Again, if you compare the consistency among the forecasts with that in the observations, you see a big difference, and this difference could arise if the atmospheric response to ENSO is more linear in the models than in the observations. So the questions you can ask are: How linear is the atmospheric response to ENSO, and can we get a handle on that? What is the influence of ENSO flavors (there is a central Pacific El Niño and an eastern Pacific El Niño) on the atmospheric response? How does the spread, or the noise, of the seasonal mean change during an El Niño? And so on. These are some very fundamental questions for our prediction skill and predictability on the seasonal timescale, and similar questions exist for the sub seasonal timescale. I'll stop there. Thanks for your attention.

Thanks, Arun. Yeah, really great points and questions for future research as well. Antje has a question. Antje, would you like to unmute and ask your question?

Yes, I tried to formulate it in the chat. How important do you think the skill levels are if you look at, say, the source of seasonal predictability, like ENSO, versus the skill in predicting the propagation of the signal elsewhere, to the extratropics? Do you have a feel, since you spoke about relatively low levels of skill, for where this reduction of the potential skill mostly comes from, and an implication of what could perhaps be improved in the future?

So the indication so far, or at least some level of consensus in the community, is that a lot of the errors might be coming from representing the tropical rainfall. Once you have errors in the tropical rainfall, and errors in the response in conjunction with the MJO and ENSO variability, those errors in the rainfall anomalies might lead to errors in the teleconnections to the extratropical latitudes. There could also be errors in the mean state between the extratropics and the tropics, which might influence the propagation of teleconnection patterns out of the tropics. But I guess the general consensus would be that a lot of the errors might be coming from tropical rainfall and air-sea interactions in the tropics, and there is a lot of focus, including field programs, devoted to improving those aspects of coupled models and their performance.

Great. Thanks. Thank you. Thanks. Judith, you have a question.

Thank you so much for your talk. I would be interested if you could comment a little on state dependent predictability associated with the PNA state, especially for the western United States, and on to what degree this is similar or not similar to the predictability from the NAO over Europe.

Right. So the assessments of predictability or prediction skill like the ones shown in this diagram are across all kinds of ENSO states: they include normal years, La Niña years, and El Niño years. The ideal would be to have a state dependent, conditional estimate of prediction skill, for example just for the El Niños, or just for the central Pacific or eastern Pacific El Niños, or conditioning the assessment of prediction skill on other factors.
Or an El Niño when you also have a positive or negative phase of the AO or NAO. That's one of the eventual goals, but it's very hard to pull out that information. Given we have only a 25 or 30 year history of the data, anything beyond assessing the skill across all 30 years, trying to condition it on individual parameters, gets very hard, at least on the seasonal timescale. We might have better luck on the sub seasonal timescale, where the verification sample might be larger. So yes, it is an important thing, but it's very hard to approach or answer given the forecast history data sets we have.

Okay, thanks. Thanks. Thanks, Judith. Thanks, Arun. Let's see if there are other questions in the chat. I had one, Arun, actually coming back to Shuie's question to Tim earlier this morning about using imperfect simple models to assess predictability. This seems to be dominating in our community, and how do we change this, in terms of understanding the true limits on predictability versus what imperfect models are giving us as limits on predictability?

Right, so the model based estimates are limited by whatever biases the models have, but you can use a range of methodologies. If you go back to Roland Madden's paper, it was based purely on the autocorrelation of the observed weather time series. That's one method that has been applied repeatedly as the observational data records have become longer and longer. So that's one way: use purely observational data sets on the seasonal timescale, and do the same on the sub seasonal. There are no model errors there, only the choice of methodology. You can also use linear, empirical forecast tools, for example connecting ENSO anomalies or MJO anomalies with other parts of the globe, do a linear reconstruction, and see what the estimates of predictability are. Those are purely observational approaches. And then, ultimately, you can have the hierarchy of models: you can do AMIP runs where the SSTs are prescribed but there are no air-sea interactions, or you can finally have coupled models, which have everything thrown in. So there is already, at least for the seasonal timescale, a hierarchy of approaches to estimate these things, and you just have to put it all together, step back, and see what answers you're getting, and whether you can make sense out of them or reach a more convincing conclusion. Yeah.

Great. Yeah. Thanks again. Thank you. Great talk and great set of questions for future research. Thanks.
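To make the purely observational approach mentioned in the last answer concrete, here is a minimal sketch in the spirit of Madden (1976): fit a lag-1 autocorrelation to a daily time series and estimate how much variance of a seasonal (90-day) mean would be expected from weather noise alone; interannual variance in excess of that is a rough measure of potential predictability. The AR(1) assumption and the synthetic series are illustrative simplifications of what the paper actually does.

```python
import numpy as np

def noise_variance_of_mean(x_daily, n_avg):
    """AR(1)-based estimate of the variance of an n_avg-day mean
    expected from weather noise alone (in the spirit of Madden 1976)."""
    x = x_daily - x_daily.mean()
    r1 = np.corrcoef(x[:-1], x[1:])[0, 1]          # lag-1 autocorrelation
    var = x.var(ddof=1)
    # variance of the mean of n_avg serially correlated samples
    k = np.arange(1, n_avg)
    factor = 1.0 + 2.0 * np.sum((1 - k / n_avg) * r1**k)
    return var * factor / n_avg

# Red-noise stand-in for a daily temperature series, r1 ~ 0.7
rng = np.random.default_rng(3)
x = np.zeros(10000)
for t in range(1, x.size):
    x[t] = 0.7 * x[t - 1] + rng.standard_normal()
print(noise_variance_of_mean(x, 90))  # expected noise variance of a
                                      # 90-day (seasonal) mean
```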